Journalists who want to move beyond Excel have many tools to choose from. The usual first step is a relational (SQL) database. While a relational database can be a good choice for some datasets, data analysts today are turning to newer tools to gain deeper insight. This talk shows how to use a graph database to analyze highly connected data, with examples drawn from U.S. Congressional data and political email archives. Starting with the U.S. Congress data, we’ll explore the dataset using Cypher, the Neo4j query language, to uncover legislator activity such as bill sponsorship and voting. Building on our knowledge of Cypher as we go, we’ll apply principles from social network analysis to find influential legislators and the topics over which they have influence. Finally, we’ll examine how to draw insights from the Hillary Clinton email dataset, released as part of a FOIA request earlier this year. We will explore this dataset as a graph of interactions among users, answering questions like: Who communicates with Hillary the most? What are the topics of these emails? You’ll learn how to visualize the results in the Neo4j browser to quickly make sense of the data as we explore.
The goal of this talk is to demonstrate database tools that any journalist can use to explore connected datasets and draw insights from them.