Hello. Thank you to Jill O’Neil and the organizing committee for inviting me to be a part of this roundtable. As the only graduate student on the agenda, and a leader of the Digital Humanities Initiative here at the Graduate Center, I offer you a perspective from someone currently engaged in building a DH community from the ground up. As you will see from my examples, my philosophy is to, in the words of Kathleen Fitzpatrick, “do the risky thing” by experimenting with a data-driven approach to humanities research. Today I will show you three projects I am currently engaged in to demonstrate the methods I employ. As my title suggests, I structure each of these examples by explaining both the input, or the large data sets I investigate, and the steps needed to process this information into a coherent output.
I wouldn’t be a good grad student if I didn’t start with my dissertation. My project is a large-scale analysis of student writing in online open spaces using data from the Macaulay eportfolio system. My second example will be my collaborative digital project, the Writing Studies Tree, which is an online academic genealogy program I co-founded in 2011 and continue to develop with Ben Miller and Jill Belli here at the Graduate Center. And finally, I will end by showcasing the Journal of Interactive Technology and Pedagogy, for which I serve as an editor and co-authored a piece that won a Digital Humanities Award.
I came to the Graduate Center because of the Interactive Technology and Pedagogy program. I knew I wanted to study the intersection of educational technology and the writing process, but in 2009, when I was looking for graduate programs, very few institutions had invested in the digital humanities, and, as many critics have noted, digital humanities programs were not particularly interested in pedagogy. However, at MLA that year, I saw Matt Gold present on the “Looking for Whitman” project, his collaborative, cross-campus teaching experiment run on WordPress, and after speaking with Matt at the conference I knew CUNY would support the kind of work I wanted to pursue.
Once accepted, I became an Instructional Technology Fellow at Macaulay Honors College, a unique program across eight of the CUNY undergraduate campuses. As an ITF I help professors across the disciplines integrate technology into their courses in pedagogically sound ways. For over a decade, this program has maintained a multiuser WordPress install with BuddyPress built in, similar to the CUNY Academic Commons or the MLA Commons; however, its primary purpose is to support the creation of course sites and student-run blogs. All students in the Macaulay program are introduced to the possibilities of this system through their four required seminar courses. While they are called eportfolios, these are more like DIY course management systems, where the professor and ITF build the structure and use it to organize the materials for the course, but it is the students who generate the content through reflective blog posts and multimodal assignments ranging from interactive timelines and videos to walking tours with maps and audio guides, and much more. This experience prepares the students to create their own sites, which they all do in Seminar 2, but also encourages them to carve out their own space to showcase the work they do over their four years in the program.
I am using the archive of over 3000 Macaulay eportfolio sites as the data for my dissertation, which investigates student writing in online open spaces. This is where we come to input. In order to deal with this large data set, I first had to strip out any sites or posts that were marked private, and remove any identifying information in order to meet IRB requirements. This data is in MySQL, a database system that runs on my server, and I process it by selecting segments as test cases. For example, this is the raw data from one “Arts in New York City” seminar. I have extracted the content categories that contain relevant information for my purposes and transferred them to a text editor in order to strip out unwanted characters, such as leftover HTML code that would distort my results. After that I pare down the data again to create relationship tables in Excel. In this example I am using topic modeling to see what words students use in proximity to the word “art”. I used this same data set to look at how often students posted on the site and the length of their posts, and to compare the relationship between length and frequency of posts by student. These experiments usually start with a sketch pad and pencil to work through the information before messing with the data itself. This period of trial and error is both fun (I am playing with my data) and frustrating. I end up with dozens of spreadsheets and sample visualizations for each small subset of the data I am testing.
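To make the cleaning and proximity steps concrete, here is a minimal Python sketch. It is not the workflow described above, which used a text editor and Excel; it assumes the post bodies have already been exported from the database as plain strings, and it uses a simple window-based co-occurrence count as a stand-in for the proximity analysis. The function names, the five-word window, and the regular expressions are all illustrative assumptions.

```python
import re
from collections import Counter
from html import unescape

# Assumed patterns: anything in angle brackets is treated as leftover HTML,
# and a "word" is a run of lowercase letters or apostrophes.
TAG_RE = re.compile(r"<[^>]+>")
WORD_RE = re.compile(r"[a-z']+")


def strip_html(raw):
    """Replace leftover HTML tags with spaces and decode entities such as &amp;."""
    return unescape(TAG_RE.sub(" ", raw))


def words_near(posts, target="art", window=5):
    """Count words that appear within `window` tokens of `target` across all posts."""
    counts = Counter()
    for post in posts:
        tokens = WORD_RE.findall(strip_html(post).lower())
        for i, token in enumerate(tokens):
            if token != target:
                continue
            lo, hi = max(0, i - window), i + window + 1
            # Count every neighbor in the window except the target occurrence itself.
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts
```

Calling `words_near(posts)` on a list of post strings returns a `Counter` whose most common entries are the words that cluster around “art”; a table like that is the kind of relationship data the spreadsheet step produces by hand.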
But the results can be pretty beautiful. I have taken this grammatically incorrect caption from a popular subreddit for people who play with data, but the visualization you see is the result of my topic modeling experiment. This cloud contains words that appear in proximity to the word “art” in the students’ posts. This visualization was done in Gephi; the nodes are colored to group words that also appear in proximity to each other, the size of each word reflects how many times it appears, and the thickness of each line denotes how strong the connection is to the central word. This is one way to see writing in a new way.
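For anyone who wants to try a visualization like this, the following Python sketch shows one way to turn proximity counts into an edge list that Gephi can import as a spreadsheet. It is an illustration under assumptions, not the exact export used for this slide: the function name and the `min_count` threshold are hypothetical, and the `Source`, `Target`, and `Weight` column headers follow Gephi's CSV edge-table convention, where the weight can then drive line thickness.

```python
import csv
import io


def edges_csv(center, neighbor_counts, min_count=1):
    """Return CSV text with one weighted edge from `center` to each neighbor word."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    # Strongest connections first; ties are broken alphabetically.
    ordered = sorted(neighbor_counts.items(), key=lambda item: (-item[1], item[0]))
    for word, count in ordered:
        if count >= min_count:
            writer.writerow([center, word, count])
    return buffer.getvalue()
```

Raising `min_count` drops rare words before import, which keeps the cloud readable when the data set is large.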
The Writing Studies Tree (WST) is an online, open-access, crowdsourced database of scholarly relationships within writing studies. It started with a seemingly simple question in a graduate-level course in composition. We were reading texts that chronicled the history of our field and trying to translate that information into a timeline of important dates, places, and publications. But the scope quickly exceeded our tools – meaning our notebooks and classroom chalkboard. And the scope also extended beyond our initial lines of inquiry – we realized through our research and discussion that it was the relationships between scholars working in this field that made the history unique, vibrant, and dynamic. But how do you capture relationships? Well, we actually have a model – family trees. Over 8 million people per month visit family tree sites such as ancestry.com and genealogy.com, testifying to the enduring appeal of tracing one’s roots. And once we started looking, we found that in academia, STEM disciplines such as mathematics and neuroscience have for some time engaged in online genealogies that trace “parentage” via dissertation advisement. Yet the humanities have been slower to adopt this technology, in part because mentoring relationships in the humanities are often more complex, with ideas from coursework intertwining with suggestions from multiple thesis committee members to yield research projects separate from, though shaped by, the agendas of these various advisors.
To begin mapping this complex network of overlapping relations, we created the Writing Studies Tree. The site provides simple data entry forms for registered users to name a person or institution of higher education, and to create data-rich links from person to person or from person to school. This fixed data structure, combined with open editing privileges, allows the WST to rapidly aggregate small data-entry efforts into collective visualizations of the field, presenting its history anew and enabling scholars to identify patterns and movements in new ways.
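A fixed structure of this kind can be sketched in a few lines of Python. This is not the WST's actual schema: the class names, the field names, and the example relation labels are assumptions made for illustration. The only rule taken from the description above is that every link starts from a person and ends at either a person or a school.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    kind: str  # assumed values: "person" or "institution"


@dataclass(frozen=True)
class Relation:
    source: Node
    target: Node
    relation_type: str  # hypothetical labels, e.g. "mentored" or "studied at"


class RelationStore:
    """Hypothetical in-memory stand-in for the site's relations table."""

    def __init__(self):
        self.relations = []

    def add_relation(self, source, target, relation_type):
        # Links run person-to-person or person-to-school, never from a school.
        if source.kind != "person":
            raise ValueError("a relation must start from a person")
        if target.kind not in ("person", "institution"):
            raise ValueError("a relation must end at a person or an institution")
        relation = Relation(source, target, relation_type)
        self.relations.append(relation)
        return relation

    def relations_of(self, node):
        """Return every relation in which `node` is either endpoint."""
        return [r for r in self.relations if node in (r.source, r.target)]
```

Because every contribution has the same shape, many small entries from different users can be merged into one graph without any manual cleanup.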
First is the family view. In this view you can hover over the color-coded lines to reveal the relationship type, and you can re-center the tree by clicking on any person icon. Also, if you click on a name, it will take you to that person’s individual view, where you can edit and add information.
Second (and this may take a while to load), you can view the full network of both people and institutions. This has the same features as the family tree, plus some new improvements: there is now a legend, and you can hover on the icons and color-coded lines to see what they represent. Hovering for more than two seconds will highlight that one node and its local network (up to two relations away), and double-clicking any person or institution will link to that individual view.
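The “up to two relations away” highlight can be thought of as a breadth-first search that stops after two hops. The Python sketch below illustrates that idea only; it is not the site's own code, and it assumes the network is available as a dictionary mapping each name to the names directly connected to it.

```python
from collections import deque


def local_network(adjacency, start, max_hops=2):
    """Return the set of nodes reachable from `start` in at most `max_hops` relations."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return set(hops)
```

With a chain of four scholars A, B, C, and D, each linked to the next, hovering on A would highlight A, B, and C but leave D dimmed.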
As of this writing, the WST database contains a web of roughly 1,250 people, roughly 400 schools, and over 3,000 relations, and it continues to grow as new members sign on and contribute. We see this distributed approach to data-gathering as an essential ethical component of our project: by crowdsourcing disciplinary self-study and trusting site members to curate the archive, the WST encourages users to see themselves not just as individuals, but also as members of an evolving network of scholars taking part in the collective knowledge-making project of writing studies. Importantly, as more users contribute to the database, it will become increasingly accurate as a representation of the field. And I want to mention that we just received our third grant for this project from the Provost’s Digital Innovation Grants here at the Graduate Center. Thank you!
My third and final example is the Journal of Interactive Technology and Pedagogy, for which I am a member of the editorial collective and co-authored a piece that ran in Issue Four. This journal was inspired by the student projects that came out of our ITP certificate program, because many of these projects are multimodal and born digital, projects that cannot be showcased in traditional print venues. The goal of this journal is to remix scholarly publication to be more transparent and versatile; in other words, we seek both non-traditional input and output. In our “call for submissions,” not papers, we encourage potential contributors to use the medium in innovative ways, with a focus on the how and why. Because creating a webtext is still a new process for most academics, we mentor authors using an open peer review process, striving for transparency and a sustained commitment to pedagogy.
Take for example the piece I created with Roger Whitson and Kimon Keramidas. This started with Roger inviting JITP to be a part of his 19th-century literature class, wherein students would create their projects as if they were submitting them to our journal. We used Google Hangout to meet with the students, helping them understand the aim and format of the journal, and critiquing their projects midway through their creation process. The final product contains edited clips of these meetings, the course site, the final projects, and our reflections, both as polished articles and as “work-in-progress” Google docs that show our writing process. The viewer navigates these materials through an interactive timeline. As you can see, there are many types of data to deal with here, and the more interesting the submissions we receive, the more complex this process becomes. But the outcome is a new model for publishing.
Dealing with Digital, Data-driven Scholarship in the Humanities