American Jewish History R+ Settler Colonialism Text Analysis

Word Clouding: Immigrant Jewish Female Settlers

Assumptions are the enemies of historians.  Well, at least, this historian. I had hoped that the topic modeling for immigrant female Jewish settlers would be gendered in a way that would provide stark contrast to their male counterparts.  You know that old dichotomy—masculine vs. feminine.  Even though I often write against such tropes, I nevertheless had thought that these stark contrasts would provide a rich and fertile discussion about gender on the frontier.  These hopes, however, were dashed.


As I looked at the word clouds, I found that, much like their male counterparts, female Jewish immigrants remembered their time in the Dakotas in terms of two key topics: family and the home. For many of these Jewish women, life in the Dakotas was much different than it had been in Russia. Back in the mother country, Jewish women were, by and large, the economic breadwinners while their husbands went to yeshiva. This was not always the case: some, like the Losk family, were upper-middle-class business owners, and the duties of the wives were much like those of their Western European coreligionists. But families like the Losks were the exception and not the rule. Many of the families who immigrated were poor and from shtetls. In the Dakotas, Jewish women’s lives mirrored those of many immigrant women. They worked in the fields, raised children, and, if they lived in larger cities, kept house while their husbands worked.


Word clouds: Jewish Women in the Dakotas (five images)

The word clouds do tell a gendered story. Traditionally, women transmitted Jewish culture to their children, and in interreligious marriages it was the mother who determined whether the offspring were considered halachically (by Jewish law) Jewish. One can see across the five word clouds that religion was a primary concern for these women. That concern also mirrors the rise of a stable Jewish life in the Dakotas. Initially, no temples existed, and Jewish settlers held religious services in homes or rented churches. Their desire to reconstruct Jewish life and ensure that it would continue for their children might be one explanation for this.

American Indians American Jewish History Native American History R+ Settler Colonialism Text Analysis

Word Clouding: Immigrant Jewish Male Settlers

Historical actors make history! Such axioms allow my project to include the lives of many Jewish settlers who, by and large, have been excluded from the overarching narrative of the American West. The entire dissertation views Jewish settlers as part and parcel of a much larger settler colonial project set into motion by the federal government through the Oregon Donation Land Act, the Homestead Act, and the Dawes Act. Immigrants were promised free land! Although it was not technically free (filing fees and $1.00 per acre), the ability to own land was largely unheard of in many of the settlers’ home countries. This was especially true for Russian Jews, who left their country for reasons that included pogroms and the inability to own land. For many Russian Jews, their lives were torn asunder.

Close reading of some of the documents reinforces, and then adds to, these historical narratives. For example, the family of Charles Losk fled Russia right after the Russo-Japanese War of 1904-1905 and chose to farm because, as they traveled across Europe, they had seen German farmers at work. To the Losk family, those German farmers led a peaceful life that was free of government intervention. Yet the word clouds do not show why they chose to move to North Dakota over more urban locales. In the case of Sigmund Shlesinger (below), the topics in the word clouds directly reflect the actual text.

Word cloud: Sigmund Shlesinger



Collectively, the Jewish male data set reflects a concern for family, land, farming, and traveling to the United States. Some of these things might not surprise historians. We can understand the travelogues. One might also think that topics such as farming and land would be natural fits for immigrant families coming to the Dakotas. That, in turn, makes topics like the seasons seem less ordinary. What is most fascinating, when one looks at the topics with special attention to gender, is that these men wrote a great deal about the family. It may suggest that Jewish men viewed the home and their family as a refuge from the outside pressure to assimilate.

Word clouds: Jewish male settlers (five images)

American Indians Native American History R+ Settler Colonialism Text Analysis

Word Clouding: American Indian Men

This Native American history project required me to add to my stop list words that might be considered sacred. Interviewers did not follow this practice when these oral histories were recorded in the 1970s, but historians do now. By some standards, creating an additional set of stop words means that I have already played with the results.

I view this as a crucial component of being a humanist. I am a humanist first and humanist à la digital second. Working with the sacred means that I must respect it and honor traditions over conventional digital humanities practices. I would love to tell you that I struggled with creating an additional list of stop words, but I did not even flinch. In fact, it was only after I created the stop list and saw the difference between the results before and after the additional list that I became concerned. At the end of the day, I am honoring the lives of the people that I study. Being a mindful digital humanist has been one of the largest takeaways that I had from this entire project.
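A minimal sketch, in base R, of how an additional stop list of this kind might be layered on top of an ordinary one. Everything here is an assumption for illustration: the names `base_stops`, `protected_terms`, `tokenize`, and `remove_stops` are hypothetical, the base list is a tiny stand-in, and `term_a` and `term_b` are placeholders, since the actual protected words are deliberately not reproduced.

```r
# Tiny stand-in for a conventional stop list (assumption, not the project's list)
base_stops <- c("the", "and", "of", "to", "a", "in", "i", "was")

# Placeholders only: the real protected terms are intentionally withheld
protected_terms <- c("term_a", "term_b")

# The working stop list is the union of the two
stop_list <- union(base_stops, protected_terms)

# Lowercase the text and split it on anything that is not a letter,
# underscore, or apostrophe; drop empty strings left by the split
tokenize <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z_']+"))
  words[nzchar(words)]
}

# Keep only the tokens that are not in the given stop list
remove_stops <- function(tokens, stops) {
  tokens[!tokens %in% stops]
}

# Invented sample sentence, not taken from any transcript
tokens <- tokenize("I was in the land of term_a and the family")

before <- remove_stops(tokens, base_stops)  # conventional list only
after  <- remove_stops(tokens, stop_list)   # with the additional list

# Which tokens did the additional list remove?
removed <- setdiff(before, after)
```

Comparing `before` and `after` in this way is one simple route to seeing how much the additional list changes what reaches the word cloud.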


Below are selections from single-person interviews:


Word clouds from single American Indian males (including Charles Little Dog and George Eagle Elk)

Taken together I can see how family, time, and land are very important.  Of course, in the third one down, I can also see “bullet,” which signifies a series of very important historical battles between the Lakota, Dakota, and the U.S. Federal Government.  In fact, I know that this person was at the Ghost Dance of 1890.


Then, when I run them all together, the collective data provide somewhat different results. Time, land, healing, and assimilation come to the forefront. Using digital humanities tools like R allows me to hone my argument as a Native American historian. The statistical analysis demonstrates that American Indians remember U.S. interventions on their land in stark terms: traditional and forced-assimilative pathways. “Land” is especially telling because it tells two stories: first, that land is part of who the Lakota and Dakota peoples are, and second, that the U.S. and white settlers endeavored first to seize their land and then to change their relationship to it.

Word clouds: all American Indian male interviews combined
Project Management R+ Text Analysis


As the incubator portion of this digital project comes to an end, I’d like to provide a bit of transparency about how and why I am using the text files.

As a historian and digital humanist, I am always plagued by the lack of sources. For those of us who work in Native American history, it is especially troubling, and it is even worse for those who examine the lives of 19th-century American Indian women.

Race and gender are very important to me as a historian. One way to get at how both of these categories operated in the data sets was to analyze them separately and then together. First, I split the data sets into the following groups: American Indian males, Jewish male settlers, and Jewish female settlers. I am still actively seeking American Indian female transcripts or journals. Analyzing the groups in this way allows me to understand how race and gender each shaped their historical experiences in the Dakotas.

From there, I analyzed all of the male text files and the female text files separately.  When I am able to locate more female American Indian files, I will be able to complete the American Indian data sets.  Once those are located, I will then process those with the same code as the others to look for similarities and differences in the topics by gender.
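The grouping described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the project’s actual code: the data frame `corpus`, its column names, the file names, the helper `top_terms`, and the short texts are all invented so the example runs on its own.

```r
# Hypothetical metadata table: one row per text file (all values invented)
corpus <- data.frame(
  file   = c("doc1.txt", "doc2.txt", "doc3.txt", "doc4.txt"),
  group  = c("american_indian_male", "american_indian_male",
             "jewish_male", "jewish_female"),
  gender = c("male", "male", "male", "female"),
  text   = c("land time land", "land healing",
             "family land farm family", "family home family"),
  stringsAsFactors = FALSE
)

# One shared function, so every group is processed with the same code:
# pool the texts, count the words, return the n most frequent
top_terms <- function(texts, n = 2) {
  words <- unlist(strsplit(tolower(paste(texts, collapse = " ")), "[^a-z']+"))
  words <- words[nzchar(words)]
  head(names(sort(table(words), decreasing = TRUE)), n)
}

# Separately: one result per race-and-gender group
by_group <- lapply(split(corpus$text, corpus$group), top_terms)

# Together: all male files pooled, then all female files pooled
by_gender <- lapply(split(corpus$text, corpus$gender), top_terms)
```

Because the groups are just rows in a table, adding American Indian female files later would mean adding rows, not changing the analysis code.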

Project Management R+ Text Analysis

Data Management


Being a digital humanist, and before that, a bit of a web nerd, I had file naming and data management conventions drilled into my head.  At least that is what I thought.  As part of our incubator, we had the opportunity to meet with Assistant Professor and Data Curation Librarian, Jennifer Thoegersen.


Here are just a few of the things that I learned:

  • You want a local copy and a backup copy: three copies in all, on two different media, with one copy stored remotely.
  • When dealing with human subjects, it’s important to think about private information—especially if they are alive.
  • For my project, which uses R, it is important to ensure that the names of the R output files match the text input and that there are no spaces or special characters in the file names (for example, airp730.pdf, airp730.txt, airp730.R).
  • Always keep a README file. It helps others who might not be familiar with my project to understand my choices. For example, I removed non-essential pages from the PDFs before I used OCR software.
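The file-naming advice in the list above lends itself to a quick automated check. The sketch below is one possible way to do it in base R; the helper names `has_bad_chars` and `stems` and the two "bad" file names are hypothetical, and the set of allowed characters (letters, digits, dot, underscore, hyphen) is an assumption rather than a rule from the workshop.

```r
# TRUE for any name containing a character outside the assumed safe set
# (letters, digits, dot, underscore, hyphen), which catches spaces too
has_bad_chars <- function(files) {
  grepl("[^A-Za-z0-9._-]", files)
}

# File name without directory or extension, e.g. "airp730.txt" -> "airp730"
stems <- function(files) {
  tools::file_path_sans_ext(basename(files))
}

inputs  <- c("airp730.pdf", "airp730.txt")
outputs <- c("airp730.R")

# Hypothetical examples of names that break the convention
bad_names <- c("air p730.txt", "airp#730.txt")

flags <- has_bad_chars(c(inputs, outputs, bad_names))

# Input stems that have no output file with the same stem
unmatched <- setdiff(stems(inputs), stems(outputs))
```

Here `flags` marks only the two hypothetical names, and `unmatched` is empty because every input shares its stem with the R file.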



Project Management R+ Text Analysis

Oral History + R

Oral history presents certain challenges that you never think about when you start a digital project. Over the past few weeks I have been struggling to make certain choices about how to handle and then process the text files. For example: Do I change the diction to standard English? Do I include both the questions and the answers for text analysis and topic modeling? And then there are the dreaded speech-to-text issues that seem to plague the process.

I decided that I needed to convert various words (cuz to because; differnt to different, and so on) so that I could standardize the texts.  Because I am also topic modeling, I did not want to add any more words to my stop list file when training the modeler.  I also was afraid that I would change some of the KWIC (key words in context) if I kept the words in their original form for the text analysis portion of my project.
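One way such conversions might be scripted is with a small lookup table applied word by word. This is a sketch under assumptions: the function name `normalize` and the table `replacements` are hypothetical, only the two pairs mentioned above are included, and matching is case-insensitive, so a capitalized "Cuz" would come out as lowercase "because".

```r
# Lookup table: nonstandard form -> standard form (only the pairs named above)
replacements <- c(cuz = "because", differnt = "different")

# Replace each nonstandard form where it appears as a whole word.
# \\b marks a word boundary, so "cuz" inside a longer word is left alone.
normalize <- function(text, map) {
  for (from in names(map)) {
    pattern <- paste0("\\b", from, "\\b")
    text <- gsub(pattern, map[[from]], text, ignore.case = TRUE, perl = TRUE)
  }
  text
}

# Invented sentence, not a quotation from any interview
standardized <- normalize("I left cuz it was differnt there", replacements)
```

Keeping the table in one place also documents every change made to the transcripts, which helps when the original wording matters for keyword-in-context work.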

In terms of keeping both the questions and answers for data mining, I decided to create two data sets: the original and one with just the answers of the interviewee. I will then compare and contrast to see how it shifts the results.
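Building the two data sets could look something like the sketch below. It assumes a transcript in which every line begins with a speaker label, `Q:` for the interviewer and `A:` for the interviewee. That layout is a hypothetical convention chosen for illustration, the sample lines are invented, and answers that run over several unlabeled lines would need extra handling.

```r
# Invented transcript lines with assumed "Q:" / "A:" speaker labels
transcript <- c(
  "Q: Where did you grow up?",
  "A: We lived near the river.",
  "Q: And your family?",
  "A: My family farmed the land."
)

# Data set 1: questions and answers together, speaker labels stripped
full_text <- sub("^[QA]:[[:space:]]*", "", transcript)

# Data set 2: only the interviewee's answers
is_answer    <- grepl("^A:", transcript)
answers_only <- sub("^A:[[:space:]]*", "", transcript[is_answer])
```

Running the same analysis on `full_text` and `answers_only` then shows how much the interviewer’s wording shifts the results.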

There is really no easy workaround for converting speech to text. It’s a slow process of training, and even more training, to ensure that the text matches the spoken file. In the past I have had to just transcribe it by hand because the software could not handle a thick New York accent. Wish me luck as I start to train Mac2Speech this Friday!

R+ Text Analysis

Regaining Literacy in R+

Some people may wonder why a historian would use computational analysis over a close reading of documents. My reason is simple: computational analysis provides me an opportunity to extract meaning from large volumes of data over time.   Historians look for change over time and R is a tool that helps me do that. In short, R helps me understand the past more clearly.

I really thought that all of my previous R+ training would come back. I learned that a computing language is much like a foreign language. If you do not use it, you lose it. I spent the better part of the week refreshing my R skills. Although it felt like starting over, it was important for me to go back to the basics. Like Jason,  I learned to code through experience.

If you are thinking about using R, or need a refresher, here are some of the sites that I reference a lot:

Next week I will attempt to explain if, and how, historians who use R are in any way different than their literature counterparts.



Project Management R+ Text Analysis

First project frights


I am worried that this entire academic enterprise is a blunder. In fact, before I posted this blog I shared my fears with Jason Heppler. He also understands the fears of writing a blog, making mistakes, and the pressure of perfection that all academics face—or at least self-imposed pressure. Luckily for me, Jason shared some great pointers for academic bloggers. Over the summer, I will write on a weekly basis about my first digital humanities project.

I was first introduced to text analysis and topic modeling through Cameron Blevins’ work on Martha Ballard’s diary that he completed when he was Matthew Jockers’s student at Stanford. Nearly a year later I had the good fortune of working with Amanda Gailey on a digital project about children’s literature and race as part of my internship for my certificate in Digital Humanities. As I encoded various newspapers from the Carlisle Indian Boarding School, it occurred to me that Native American history is ready for computational analysis. But I did not have the right archival sources at the time. The next fall, I took a Microanalysis class with Matthew Jockers at UNL in hopes that the training would allow me to move forward. It did—tenfold. At the same time, I discovered a series of American Indian oral history transcriptions and journals written by Jewish settlers. Since then I’ve been following the work of Lincoln Mullen and other historians like Kellen Funk. Friends have directed me to Ben Schmidt’s Bookworm project.

Other than a group project for Matthew Jockers’s course, this is my first digital humanities project, which means I am learning through trial and error. The University Libraries, Jockers, and other faculty affiliated with the Center for Digital Research in the Humanities have provided strong mentorship and support for my project. This summer I am part of the second iteration of the Graduate Student Incubator project that works with Liz Lorang, the Digital Projects Librarian for the CDRH. One of the first things we had to do was go through an annotated checklist. It included such things as defining the scope of the project, stating our research question, developing a communication plan, and developing a data management plan, to name a few. The checklist was geared to make us think more clearly about our projects, and I spent nearly eight hours crafting mine in hopes that a lot of upfront work would save me from bigger headaches as I moved through the project. I was wrong, because I had still been thinking about my project like a manuscript. That said, I learned a lot from Liz’s instructive comments. I thought one way that the blog might serve the larger graduate digital humanities community is to actively show the struggles and successes of this project in hopes that those who come across the space will avoid my own “lessons.”

Here are some helpful hints for conceiving your digital project:

  • It is not your dissertation. What I mean is that the way you must think about a digital project is entirely different than a dissertation or any other manuscript. For example, dates are important for scope in your written work, but may not be for your digital project.
  • Think about copyright immediately: not only for the archival sources you will pull from, but also for the license you will use if you provide source code. Standard archival agreements do not always meet the conditions that digital projects create. For example, as Liz rightly pointed out in my first draft: does clearance mean I have the right to use the materials in computational analysis and to publish about them and quote from them in limited form? Or do I also have permission to make my input data available (the transcriptions I create)?
  • When creating a project work plan, allow for some wiggle room in your schedule. Some things will take much longer than you think. Do you really think that you can knock out that code in a week? Don’t we all make mistakes? Pad your schedule to account for human error.
  • Various aspects of your project may require different licenses. The MIT License might work for one part of your project, but might not be frequently used for R or MALLET source code.