Hello. Thank you to Maria and the search committee for inviting me to campus today, and to all of you for coming. I have three objectives for this presentation: for you to get to know me better through my history as a scholar and teacher working in digital humanities, to introduce you to the digital projects I am currently working on, and to give you an opportunity to get some hands-on experience deconstructing digital projects as a way to identify the tools and techniques you can use in your own work. As a leader of the Digital Humanities Initiative at the Graduate Center, I offer you the perspective of someone currently engaged in building a DH community from the ground up. As you will see from my examples, my philosophy is, in the words of Kathleen Fitzpatrick, to “do the risky thing” by experimenting with a data-driven approach to humanities research. To begin, I am going to show you three digital projects I am engaged in to demonstrate the experimental humanities theory, methods, and tools I employ. As my title suggests, I structure each of these examples by explaining both the input, meaning the large data sets I investigate as sources, and the steps needed to process this information into a coherent output.
Because I am currently in the final stages of completing this project, I am going to start with my dissertation. This project is a large-scale analysis of student writing in online open spaces using data from the Macaulay eportfolio system. My second example will be my collaborative digital project, the Writing Studies Tree, which is an online academic genealogy project I co-founded in 2011 and continue to develop with Ben Miller and Jill Belli at the Graduate Center. And finally, I will end by showcasing the Journal of Interactive Technology and Pedagogy, for which I serve as an editor and for which I co-authored a piece that won a Digital Humanities Award.
I came to the Graduate Center because of the Interactive Technology and Pedagogy program. I knew I wanted to study the intersection of educational technology and the writing process, but in 2009, when I was looking for graduate programs, very few institutions had invested in the digital humanities, and, as many critics have noted, digital humanities programs were not particularly interested in pedagogy. However, at MLA that year, I saw Matt Gold present on “Looking for Whitman,” his collaborative, cross-campus teaching experiment run on WordPress, and after speaking with Matt at the conference I knew CUNY would support the kind of work I wanted to pursue.
As a result of my work in the ITP program, I became an Instructional Technology Fellow (ITF) at Macaulay Honors College, a unique program across eight of CUNY’s 24 undergraduate campuses. As an ITF I help professors across the disciplines integrate technology into their courses in pedagogically sound ways, and I assist students in completing those assignments. For over a decade, this program has maintained a multiuser WordPress install, the same blogging platform that runs 20% of sites on the Internet at this moment. The primary purpose of this system at Macaulay is to support the creation of course sites and student-run blogs. All students in the Macaulay program are introduced to the possibilities of this system through their four required seminar courses. While it is called an eportfolio system, these sites are more like DIY course management systems when used as course sites: the professor and ITF build the structure and use it to organize the materials for the course, but the students generate the content through reflective blog posts and multimodal assignments ranging from interactive timelines and videos to walking tours with maps and audio guides, and much more. This experience prepares the students to create their own sites, which they all do in Seminar 2, but it also encourages them to carve out their own online space to showcase the work they do over their four years in the program.
I am using an archive of over 3000 Macaulay eportfolio sites as my data for my dissertation, which investigates student writing in online open spaces. This case study challenges the assertions about both the benefits and drawbacks of this practice through a mixed methods approach including surveys and interviews with the students, as well as a distant reading of all 3000 sites, and a close reading of six student-run sites.
A primary goal of my project is to investigate what Jill Walker Rettberg terms “dataism,” or the collection and use of data by corporations through the mining of personal content across devices. Course management systems and other educational technology platforms collect student data for a variety of uses, including product development, marketing, and advertising. In fact, this issue is such a pressing concern to all of us that just last month President Obama proposed the Student Digital Privacy Act in order to prohibit technology firms from profiting from information collected in schools. My objective is to subvert the system by mining student data for pedagogical purposes.
This is where we come to input. In order to deal with this large data set, I access the backend of the WordPress install, which is a database, and first strip out any sites or posts that were marked private, and remove any identifying information in order to meet IRB requirements (and the Institutional Review Board process is certainly something we can talk more about in the Q&A). This data is in MySQL, a database system that runs on my server, and I process it by selecting segments as test cases. For example, this is the raw data from one “Arts in New York City” seminar. I have extracted the content categories that contain relevant information for my purposes and transferred them to a text editor, which is what you see in this first column. At this point, the data is incredibly messy and requires human intervention in order to strip out unwanted characters, such as leftover HTML code, that will distort my results. After that I pare down the data again to create relationship tables in Excel. In this example I am using a method called topic modeling to see what words students use in proximity to the word “art”. However, I used this same data set to look at how often students posted on the site, the length of their posts, and to compare the relationship between length and frequency of post by student. Many of you will recognize this analysis as useful for assessment purposes, at the course level or programmatic level. And while this may sound complicated to some of you, it will also sound easy to others, because although it looks impressive it really only takes a cursory knowledge of MySQL and Excel to get this far. I taught myself both of these skills through a combination of free online tutorials and the help of my peers at the Graduate Center. These experiments usually start with a sketch pad and pencil to work through the information before messing with the data itself. This period of trial and error is both fun (I am playing with my data) and frustrating.
I end up with dozens of spreadsheets and sample visualizations for each small subset of the data I am testing.
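For those of you who would rather see the steps than hear them, here is a minimal sketch of that cleanup and counting process. To be clear, my own workflow uses MySQL and Excel; this is a stand-in written in Python, and every name in it (`clean_post`, `words_near`, `post_stats`, the five-word window) is an assumption made for illustration, not a record of my actual queries. It strips leftover HTML, counts the words that appear near “art”, and tallies how often and how much each student posts.

```python
import re
from collections import Counter
from html import unescape

TAG_RE = re.compile(r"<[^>]+>")   # crude match for leftover HTML tags
WORD_RE = re.compile(r"[a-z']+")  # lowercase words, apostrophes allowed


def clean_post(raw_html):
    """Remove HTML tags and entities, then collapse runs of whitespace."""
    text = TAG_RE.sub(" ", raw_html)
    text = unescape(text)
    return " ".join(text.split())


def tokenize(text):
    """Split cleaned text into lowercase word tokens."""
    return WORD_RE.findall(text.lower())


def words_near(tokens, target="art", window=5):
    """Count words within `window` tokens of each occurrence of `target`.

    The window size is an illustrative assumption, not a fixed rule.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def post_stats(posts):
    """Summarize posting frequency and mean post length per author.

    `posts` is an iterable of (anonymized_author_id, raw_html) pairs.
    """
    totals = {}
    for author, raw in posts:
        n_words = len(tokenize(clean_post(raw)))
        count, words = totals.get(author, (0, 0))
        totals[author] = (count + 1, words + n_words)
    return {
        author: {"posts": count, "mean_words": words / count}
        for author, (count, words) in totals.items()
    }
```

In practice the regular expression for tags will miss some debris, which is exactly why the human intervention I mentioned is still needed before trusting any counts.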
But the results can be pretty beautiful. I have taken this grammatically incorrect caption from a popular subreddit of people who play with data, and this visualization you see is the result of my topic modeling experiment. This cloud contains words that appear in proximity to the word “art” in the students’ posts. This visualization was done in Gephi; the nodes are colored by words that also appear in proximity to each other, the size of the word is how many times it appears, and the thickness of the line denotes how strong the connection is to the central word. For this method, the writing is my data, and I am processing this data through what Franco Moretti termed “distant reading,” which is a method developed in order to get a zoomed-out view of large quantities of text, too large for an individual to reasonably read. So, from the backend of those WordPress sites I have extracted just the content of the posts and processed it through a series of trials, similar to an experiment in the hard sciences. This is perhaps why we call digital humanities centers “labs,” because we form hypotheses, test data, analyze results, and try to replicate those results. Typically these visualizations do not provide answers, but rather more questions. In this case, I can see that students are writing experientially about their encounters with art (note the size and proximity of the words experience, walking, public, place, meet, and watched) and that they are interacting with a wide variety of art (see the words performance, gallery, photography, recitals, shows, dance, and opera), but I am also wondering about the significance of words like “identity” and “strong” in this visualization. This is just one way we can use computational analysis to see writing in a new way.
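If you want to get from word counts to a picture like this one, the bridge is an edge list. What follows is a hedged sketch, again in Python and again an assumption on my part: it turns proximity counts into a CSV with `Source`, `Target`, `Weight`, and `Type` columns, which is the kind of edge table Gephi can import. The function names and the `min_count` cutoff are hypothetical; the sizing and coloring of nodes happens inside Gephi, not in this code.

```python
import csv
import io
from collections import Counter


def edge_rows(center, near_counts, min_count=2):
    """Build one weighted edge from the central word to each nearby word.

    Words seen fewer than `min_count` times are dropped to reduce noise;
    the cutoff is an illustrative assumption.
    """
    rows = []
    for word, count in near_counts.most_common():
        if count >= min_count and word != center:
            rows.append(
                {"Source": center, "Target": word,
                 "Weight": count, "Type": "Undirected"}
            )
    return rows


def write_edge_csv(rows, handle):
    """Write the edges as CSV to any open text handle."""
    writer = csv.DictWriter(
        handle, fieldnames=["Source", "Target", "Weight", "Type"]
    )
    writer.writeheader()
    writer.writerows(rows)


# Usage with made-up counts; real counts would come from the cleaned posts.
sample_counts = Counter({"gallery": 3, "dance": 2, "the": 1})
buffer = io.StringIO()
write_edge_csv(edge_rows("art", sample_counts), buffer)
```

The weight column is what becomes line thickness in the visualization, so a stronger connection to the central word is simply a larger number in that column.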
All three stages of my research include various types of media: videos, data visualizations, infographics, screenshots, and links to live sites. This work is hard to represent on paper. Therefore, I am constantly re-imagining what form my final product will take. From the beginning, I have been blogging my way through this process in order to provide an example for other scholars who want to engage in these methods, but also to promote an open, transparent approach to academic work. Actually, one of my prospectus reviewers wrote that he had never seen a dissertation like this before. But this is happening in humanities departments worldwide. Hopefully these experiments will create useful products that allow us to argue for the changes we believe need to happen in higher education to better serve our students.
My next example is the Writing Studies Tree. The WST is an online, open-access, crowdsourced database of scholarly relationships within writing studies. It started with a seemingly simple question in a graduate-level course in composition. We were reading texts that chronicled the history of our field and trying to translate that information into a timeline of important dates, places, and publications. But the scope quickly exceeded our tools, meaning our notebooks and classroom chalkboard. And the scope also extended beyond our initial lines of inquiry: we realized through our research and discussion that it was the relationships between scholars working in this field that made the history unique, vibrant, and dynamic. But how do you capture relationships? Well, we actually have a model: family trees. Over 8 million people per month visit family tree sites such as ancestry.com and genealogy.com, testifying to the enduring appeal of tracing one’s roots. And once we started looking, we found that in academia, STEM disciplines such as mathematics and neuroscience have for some time engaged in online genealogies that trace “parentage” via dissertation advisement. Yet the humanities have been slower to adopt this technology, in part because mentoring relationships in the humanities are often more complex, with ideas from coursework intertwining with suggestions from multiple thesis committee members to yield research projects separate from, though shaped by, the agendas of these various advisors.
To begin mapping this complex network of overlapping relations, we created the Writing Studies Tree. The site provides simple data entry forms for registered users to name a person or institution of higher education, and to create data-rich links from person to person or from person to school. This fixed data structure, combined with open editing privileges, allows the WST to rapidly aggregate small data-entry efforts into collective visualizations of the field, presenting its history anew and enabling scholars to identify patterns and movements in a variety of ways.
First is the family view. In this view you can hover over the color-coded lines to reveal the relationship type, and you can re-center the tree by clicking on any person icon. Also, if you click on a name, it will take you to that person’s individual view, where you can edit and add information.
Secondly, you can view the full network of both people and institutions. This has the same features as the family tree, plus some new improvements: there is now a legend, and you can hover on the icons and color-coded lines to see what they represent. Hovering for more than two seconds will highlight that one node and its local network (up to two relations away), and double-clicking any person or institution will link to that individual view.
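To make “up to two relations away” concrete, here is a small sketch of how such a local network could be computed. This is not the WST’s actual code; it is a Python illustration under my own assumptions, with relations stored as hypothetical (source, relationship type, target) triples and a breadth-first search that stops after two hops.

```python
from collections import defaultdict, deque


def build_adjacency(relations):
    """Index relations so each node knows its direct neighbors.

    `relations` is an iterable of (source, kind, target) triples; the
    relationship kind is ignored here because only distance matters.
    """
    adjacency = defaultdict(set)
    for source, _kind, target in relations:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def local_network(adjacency, start, max_hops=2):
    """Return {node: hops from start} for nodes within `max_hops`."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return seen


# Usage with placeholder names; these are not entries from the database.
relations = [
    ("Scholar A", "mentored", "Scholar B"),
    ("Scholar B", "mentored", "Scholar C"),
    ("Scholar C", "mentored", "Scholar D"),
    ("Scholar A", "studied at", "School X"),
]
nearby = local_network(build_adjacency(relations), "Scholar A")
```

In this toy example, Scholar D sits three relations from Scholar A and so falls outside the highlighted neighborhood, while people and schools are treated as the same kind of node.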
As of this writing, the WST database contains a web of over 3,000 relations (~1,250 people, ~400 schools), and it continues to grow as new members sign on and contribute. We see this distributed approach to data-gathering as an essential ethical component of our project: by crowdsourcing disciplinary self-study and trusting site members to curate the archive, the WST encourages users to see themselves not just as individuals, but also as members of an evolving network of scholars taking part in the collective knowledge-making project of writing studies. Importantly, as more users contribute to the database, it will become increasingly accurate as a representation of the field. And I want to mention that we just received our third grant for this project from the Provost’s Digital Innovation Grants at the Graduate Center, which you will hear more about in the workshop portion of this event.
My third and final example is the Journal of Interactive Technology and Pedagogy, for which I am a member of the editorial collective and for which I co-authored a piece that ran in Issue 4. This journal was inspired by the student projects that came out of the ITP certificate program I mentioned earlier. Because many of these projects are multimodal and born digital, they are limited by the constraints of traditional print venues. The goal of this journal is to remix scholarly publication to be more versatile and transparent; in other words, we seek both non-traditional input and output. In our “call for submissions,” not papers, we encourage potential contributors to utilize the medium through innovative compositions, with a focus on the how and why. Because creating a webtext is still a new process for most academics, we mentor authors using an open peer review process, striving for transparency and a sustained commitment to pedagogy.
Take, for example, the piece I created with Roger Whitson and Kimon Keramidas. This started with Roger inviting JITP to be a part of his 19th-century literature class, wherein students would be creating their projects as if they were submitting them to our journal. We used Google Hangout to meet with the students, helping them understand the aim and format of the journal, and critiquing their projects midway through their creation process. The final product contains edited video clips of these meetings, the course site, the final projects, and our reflections, both as polished articles and as “work-in-progress” Google docs that show our writing process. The viewer navigates these materials through an interactive timeline. As you can see, there are many types of data to deal with here, and the more interesting the submissions we receive are, the more complex this process becomes. But the outcome is a new model for publishing.
So what infrastructure needs to be in place for experiments such as those I have shown you to exist? It takes collaboration from the administration, librarians, faculty, and students to work on establishing server space, open access repositories, and data management plans for these projects.
Licastro DH Workshop
Dealing with Digital, Data-driven Scholarship
in the Humanities