Project Management R+ Text Analysis

First project frights


I am worried that this entire academic enterprise is a blunder. In fact, before I published this post I shared my fears with Jason Heppler. He also understands the fear of writing a blog and making mistakes, and the pressure of perfection that all academics face, or at least impose on themselves. Luckily for me, Jason shared some great pointers for academic bloggers. Over the summer, I will write weekly about my first digital humanities project.

I was first introduced to text analysis and topic modeling through Cameron Blevins’ work on Martha Ballard’s diary, which he completed when he was Matthew Jockers’s student at Stanford. Nearly a year later, I had the good fortune of working with Amanda Gailey on a digital project about children’s literature and race as part of the internship for my certificate in Digital Humanities. As I encoded various newspapers from the Carlisle Indian Boarding School, it occurred to me that Native American history is ready for computational analysis. But I did not have the right archival sources at the time. The next fall, I took a Macroanalysis class with Matthew Jockers at UNL in hopes that the training would allow me to move forward. It did, tenfold. At the same time, I discovered a series of American Indian oral history transcriptions and journals written by Jewish settlers. Since then I’ve been following the work of Lincoln Mullen and other historians like Kellen Funk. Friends have also directed me to Ben Schmidt’s Bookworm project.

Other than a group project for Matthew Jockers’s course, this is my first digital humanities project, which means I am learning through trial and error. The University Libraries, Jockers, and other faculty affiliated with the Center for Digital Research in the Humanities have provided strong mentorship and support for my project. This summer I am part of the second iteration of the Graduate Student Incubator project, working with Liz Lorang, the Digital Projects Librarian for the CDRH. One of the first things we had to do was complete an annotated checklist. It covered, among other things, the scope of the project, our research question, a communication plan, and a data management plan. The checklist is geared to make us think more clearly about our projects, and I spent nearly eight hours crafting mine in hopes that a lot of upfront work would save me from bigger headaches as I moved through the project. I was wrong, because I was still thinking about my project as if it were a manuscript. That said, I learned a lot from Liz’s instructive comments. One way this blog might serve the larger graduate digital humanities community is to show the struggles and successes of this project openly, in hopes that those who come across this space will avoid my own “lessons.”

Here are some hints to keep in mind when conceiving your digital project:

  • It is not your dissertation. What I mean is that the way you must think about a digital project is entirely different from the way you think about a dissertation or any other manuscript. For example, dates are important for scope in your written work, but may not be for your digital project.
  • Think about copyright immediately. Consider not only the archival sources you will pull from, but also what license you will use if you provide source code. Standard archival agreements do not always cover the conditions that digital projects create. For example, as Liz rightly pointed out on my first draft: does clearance mean I have the right to use the materials in computational analysis, to publish about them, and to quote from them in limited form? Or do I also have permission to make my input data (the transcriptions I create) available?
  • When creating a project work plan, allow for some wiggle room in your schedule. Some things will take much longer than you think. Do you really think that you can knock out that code in a week? Don’t we all make mistakes? Pad your schedule to account for human error.
  • Various aspects of your project may require different licenses. The MIT License might work for one part of your project, but it might not be the license commonly used for R or MALLET source code.