Hello. Thank you to John Shanahan and the conference committee for inviting me to campus today, and to all of you for coming. I have three objectives for this presentation: to show you my journey as a scholar and teacher coming out of DePaul's Master's in English program, to introduce you to the digital projects I am currently working on, and to give you an opportunity to get some hands-on experience deconstructing digital projects as a way to identify the tools and techniques you can use in your own work.
I became interested in the digital humanities here at DePaul back in 2006, when we read Jerome McGann's "The Rationale of Hypertext" in Jonathan Gross's textual studies class. I attempted to construct my final paper for that class in hypertext, and as you will see from my examples, my philosophy continues to be, in the words of Kathleen Fitzpatrick, to "do the risky thing" by experimenting with a data-driven approach to humanities research. My goal is to demonstrate the experimental humanities theory, methods, and tools I employ as a model for the kind of work you can pursue as scholars in this field. As my title suggests, I structure each of these examples by explaining first the input, the large data sets I investigate as sources, and then the steps needed to process that information into a coherent output.
Because I am currently in the final stages of completing it, I will start with my dissertation. This project is a large-scale analysis of student writing in online open spaces using data from the Macaulay eportfolio system. My second example is my collaborative digital project, the Writing Studies Tree, an online academic genealogy I co-founded in 2011 and continue to develop with Ben Miller and Jill Belli at the Graduate Center. Finally, I will end by showcasing the Journal of Interactive Technology and Pedagogy, for which I serve as an editor and for which I co-authored a piece that won a Digital Humanities Award.
After DePaul, I chose to study at the Graduate Center because of its Interactive Technology and Pedagogy program. After teaching as an adjunct and integrating digital tools into my pedagogy, I knew I wanted to study the intersection of educational technology and the writing process, but in 2009, when I was looking for graduate programs, very few institutions had invested in the digital humanities, and, as many critics have noted, digital humanities programs were not particularly interested in pedagogy. So I pursued this connection by attending conferences and speaking to professors and students working in these fields to identify which institutions would support my work. Throughout my career at the Graduate Center I have been encouraged to experiment with digital methods: for instance, I encoded an Emily Dickinson poem in TEI, taking a hand-written manuscript and coding it to be machine readable, with annotations embedded in the code to communicate with human readers as well. I also did what Franco Moretti terms a distant reading of prefaces in eighteenth-century novels, producing visualizations of the terms authors used to legitimize these texts before the genre existed. And I created the online academic genealogy that I will show you in a moment. It is only through these experiments, and all the trial and error it took to understand how to build them, and how to theorize and communicate their relevance, that I was able to embark on my current research project. It is a process I term "failing forward," which I borrow from the programming world and which, for me, applies to any form of communication. This combination of theory and practice not only gave me the tools to formulate my dissertation project, it also helped me secure an Instructional Technology Fellowship at Macaulay Honors College, which was a key turning point in my academic career.
The Macaulay Honors College is a unique program across eight of CUNY's 24 undergraduate campuses. As an Instructional Technology Fellow (ITF) I work to help professors across the disciplines integrate technology into their courses in pedagogically sound ways, as well as assisting students in completing those assignments. For over a decade, this program has maintained a multiuser WordPress install, the same blogging platform that runs 20% of sites on the Internet at this moment. The primary purpose of this system at Macaulay is to support the creation of course sites and student-run blogs. The students generate the content through reflective blog posts and multimodal assignments ranging from interactive timelines and videos to walking tours with maps and audio guides, and much more.
I am using an archive of over 3,000 Macaulay eportfolio sites as the data for my dissertation, which investigates student writing in online open spaces. This case study challenges assertions about both the benefits and drawbacks of this practice through a mixed-methods approach that includes surveys and interviews with students, a distant reading of all 3,000 sites, and a close reading of six student-run sites.
A primary goal of my project is to investigate what Jill Walker Rettberg terms "dataism," the collection and use of data by corporations through the mining of personal content across devices. Course management systems and other educational technology platforms collect student data for a variety of uses, including product development, marketing, and advertising. In fact, this issue is of such pressing concern to all of us that President Obama recently proposed the Student Data Privacy Act to prohibit technology firms from profiting from information collected in schools. My objective is to subvert the system by mining student data for pedagogical purposes.
This is where we come to input. In order to deal with this large data set, I access the backend of the WordPress install, which is a database, and first strip out any sites or posts that were marked private, then remove any identifying information in order to meet IRB requirements (and the Institutional Review Board process is certainly something we can talk more about in the Q&A). This data lives in MySQL, a database system that runs on my server, and I process it by selecting segments as test cases. For example, this is the raw data from one "Arts in New York City" seminar. I have extracted the content categories that contain relevant information for my purposes and transferred them to a text editor, which is what you see in this first column. At this point, the data is incredibly messy and requires human intervention to strip out unwanted characters, such as leftover HTML code, that would distort my results. After that I pare down the data again to create relationship tables in Excel. In this example I am using a method called topic modeling to see what words students use in proximity to the word "art." However, I used this same data set to look at how often students posted on the site, the length of their posts, and the relationship between length and frequency of posts by student. Many of you will recognize this analysis as useful for assessment purposes, at the course or programmatic level. And while this may sound complicated to some of you (and easy to others), although it looks impressive it really only takes a cursory knowledge of MySQL and Excel to get this far. I taught myself both of these skills through a combination of free online tutorials and the help of my peers at the Graduate Center. These experiments usually start with a sketch pad and pencil to work through the information before messing with the data itself. This period of trial and error is both fun, because I am playing with my data, and frustrating. I end up with dozens of spreadsheets and sample visualizations for each small subset of the data I am testing.
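To make that workflow a little more concrete, here is a minimal sketch of the kind of extraction and cleaning step I just described: querying the WordPress posts table for public content, stripping leftover HTML, and counting the words that appear near "art." The connection details, database name, and window size here are placeholders for illustration, not my actual pipeline.

```python
# Minimal sketch: pull published post content from a WordPress database,
# strip leftover HTML, and count words that appear near "art".
# Connection details and database name are placeholders.
import re
from collections import Counter

import pymysql  # assumes the MySQL driver is installed

conn = pymysql.connect(host="localhost", user="reader",
                       password="secret", database="eportfolios")

with conn.cursor() as cur:
    # Only public, published posts; private content is excluded up front.
    cur.execute(
        "SELECT post_content FROM wp_posts "
        "WHERE post_status = 'publish' AND post_type = 'post'"
    )
    rows = cur.fetchall()

def clean(html_text):
    """Remove HTML tags and collapse whitespace left over from the export."""
    text = re.sub(r"<[^>]+>", " ", html_text)
    return re.sub(r"\s+", " ", text).lower()

# Count words that occur within a 5-word window of "art".
window = 5
near_art = Counter()
for (content,) in rows:
    words = re.findall(r"[a-z']+", clean(content))
    for i, w in enumerate(words):
        if w == "art":
            for neighbor in words[max(0, i - window): i + window + 1]:
                if neighbor != "art":
                    near_art[neighbor] += 1

print(near_art.most_common(25))
```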
But the results can be pretty beautiful. I have borrowed this grammatically incorrect caption from a popular subreddit of people who play with data, and the visualization you see is the result of my topic modeling experiment. This cloud contains words that appear in proximity to the word "art" in the students' posts. The visualization was done in Gephi; the nodes are colored by words that also appear in proximity to each other, the size of a word reflects how many times it appears, and the thickness of a line denotes how strong its connection is to the central word. For this method, the writing is my data, and I am processing that data through "distant reading," a method developed to get a zoomed-out view of large quantities of text, too large for an individual to reasonably read. So, from the backend of those WordPress sites I have extracted just the content of the posts and processed it through a series of trials, similar to an experiment in the hard sciences. This is perhaps why we call digital humanities centers "labs": we form hypotheses, test data, analyze results, and try to replicate those results. Typically these visualizations do not provide answers but rather more questions. In this case, I can see that students are writing experientially about their encounters with art (note the size and proximity of the words experience, walking, public, place, meet, and watched) and that they are interacting with a wide variety of art (see the words performance, gallery, photography, recitals, shows, dance, and opera), but I am also wondering about the significance of words like "identity" and "strong" in this visualization. This is just one way we can use computational analysis to see writing in a new way.
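For those curious how counts become a Gephi graph, here is a hedged sketch of one way to do it: build a co-occurrence network where edge weight records how often two words appear together, then export it in a format Gephi can read. The pair counts below are toy values standing in for real data, and the attribute names are illustrative rather than the exact ones behind my visualization.

```python
# Sketch: turn word co-occurrence counts into a network Gephi can style.
# The pair counts are toy data; attribute names are illustrative.
import networkx as nx

pair_counts = {("art", "experience"): 42, ("art", "gallery"): 35,
               ("experience", "walking"): 18}

G = nx.Graph()
for (a, b), weight in pair_counts.items():
    G.add_edge(a, b, weight=weight)  # edge thickness in Gephi can map to weight

# Node size can be driven by total frequency; store it as an attribute.
for node in G.nodes:
    G.nodes[node]["frequency"] = sum(
        d["weight"] for _, _, d in G.edges(node, data=True)
    )

# GEXF preserves the attributes, so Gephi can size nodes and weight edges directly.
nx.write_gexf(G, "art_cooccurrence.gexf")
```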
All three stages of my research include various types of media: videos, data visualizations, infographics, screenshots, and links to live sites. This work is hard to represent on paper, so I am constantly re-imagining what form my final product will take. From the beginning, I have been blogging my way through this process in order to provide an example for other scholars who want to engage in these methods, but also to promote an open, transparent approach to academic work. Hopefully these experiments will create useful products that allow us to argue for the changes we believe need to happen in higher education to better serve our students.
My next example is the Writing Studies Tree. The WST is an online, open-access, crowdsourced database of scholarly relationships within writing studies. It started with a seemingly simple question in a graduate-level course in composition. We were reading texts that chronicled the history of our field and trying to translate that information into a timeline of important dates, places, and publications. But the scope quickly exceeded our tools, meaning our notebooks and classroom chalkboard. The scope also extended beyond our initial lines of inquiry: we realized through our research and discussion that it was the relationships between scholars working in this field that made the history unique, vibrant, and dynamic. We found a model. Over 8 million people per month visit family tree sites such as ancestry.com and genealogy.com, testifying to the enduring appeal of tracing one's roots. And once we started looking, we found that in academia, STEM disciplines such as mathematics and neuroscience have for some time maintained online genealogies that trace "parentage" via dissertation advisement. So we used these models to build a humanities-driven genealogy project.
The site provides simple data entry forms for registered users to name a person or institution of higher education, and to create data-rich links from person to person or from person to school. This fixed data structure, combined with open editing privileges, allows the WST to rapidly aggregate small data-entry efforts into collective visualizations of the field, presenting its history anew and enabling scholars to identify patterns and movements in a variety of ways.
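To give you a feel for what "fixed data structure" means here, the following is a small sketch of the shape of the records: people, institutions, and typed links between them. The field and relation names are illustrative stand-ins, not the site's live schema.

```python
# Sketch of a fixed data structure for an academic genealogy:
# people, institutions, and typed, date-stamped relations between them.
# Field and relation names are illustrative, not the actual WST schema.
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union

class RelationType(Enum):
    DISSERTATION_ADVISOR = "dissertation advisor"
    MENTOR = "mentor"
    COLLEAGUE = "colleague"
    STUDIED_AT = "studied at"
    TAUGHT_AT = "taught at"

@dataclass
class Person:
    name: str

@dataclass
class Institution:
    name: str

@dataclass
class Relation:
    source: Person
    target: Union[Person, Institution]  # person-to-person or person-to-school
    kind: RelationType
    start_year: Optional[int] = None
    end_year: Optional[int] = None

# A registered user contributes one small, structured piece at a time.
entry = Relation(Person("A. Scholar"), Institution("CUNY Graduate Center"),
                 RelationType.STUDIED_AT, start_year=2009)
```

Because every contribution follows this same small shape, thousands of individual entries can be aggregated automatically into the field-wide visualizations you are about to see.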
This is the family view. In this view you can hover over the color-coded lines to reveal relationship type, and you can re-center the tree by clicking on any person icon. Also, if you click on a name, it will take you to that person’s individual view where you can edit and add information.
You can also view the full network of both people and institutions. This view has the same features as the family tree, plus some additional ones, such as the legend you see here and a nifty new trick: hovering on a node for more than two seconds highlights its local network (up to two relations away).
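The "local network" highlight is just a two-hop neighborhood of the hovered node. The live site computes this in the browser; the sketch below, with made-up names and toy edges, only shows the underlying graph idea.

```python
# Sketch of the "local network" highlight: every node within two relations
# of the hovered node. Names and edges are toy data for illustration.
import networkx as nx

wst = nx.Graph()
wst.add_edges_from([
    ("A. Scholar", "CUNY Graduate Center"),
    ("B. Mentor", "A. Scholar"),
    ("B. Mentor", "C. Colleague"),
    ("C. Colleague", "Another University"),
])

# ego_graph collects all nodes within `radius` hops of the center node.
local = nx.ego_graph(wst, "A. Scholar", radius=2)
print(sorted(local.nodes))  # the subgraph to highlight on hover
```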
At this time, the WST database contains a web of over 3,000 relations (~1250 people, ~400 schools) and it continues to grow as new members sign on and contribute. We see this distributed approach to data-gathering as an essential ethical component of our project: by crowdsourcing disciplinary self-study and trusting site members to curate the archive, the WST encourages users to see themselves not just as individuals, but also as members of an evolving network of scholars taking part in the collective knowledge-making project of writing studies.
My third and final example is the Journal of Interactive Technology and Pedagogy, for which I am a member of the editorial collective and for which I co-authored a piece that ran in Issue Four. This journal was inspired by the student projects that came out of the ITP certificate program I mentioned earlier: because many of these projects are multimodal and born digital, they are limited by the constraints of traditional print venues. The goal of this journal is to remix scholarly publication to be more versatile and transparent; in other words, we seek both non-traditional input and non-traditional output. In our "call for submissions," not papers, we encourage potential contributors to make full use of the medium through innovative compositions, with a focus on the how and why. Because creating a webtext is still a new process for most academics, we mentor authors using an open peer review process, striving for transparency and a sustained commitment to pedagogy.
Take for example the piece I created with Roger Whitson and Kimon Keramidas. It started with Roger inviting JITP to be a part of his nineteenth-century literature class, in which students would create their projects as if they were submitting them to our journal. We used Google Hangouts to meet with the students, helping them understand the aim and format of the journal and critiquing their projects midway through the creation process. The final product contains edited video clips of these meetings, the course site, the final projects, and our reflections, both as polished articles and as "work-in-progress" Google Docs that show our writing process. The viewer navigates these materials through an interactive timeline. As you can see, there are many types of data to deal with here, and the more interesting the submissions we receive, the more complex this process becomes. But the outcome is a new model for publishing.
So what infrastructure needs to be in place for experiments such as those I have shown you to exist? It takes collaboration among administrators, librarians, faculty, and students to establish the server space, open-access repositories, and data management plans these projects require.
[Slide: Dealing with Digital, Data-driven Scholarship in the Humanities. Overview: Dissertation, Writing Studies Tree, Journal of Interactive Technology and Pedagogy.]