
GovHack - Collections of World War One: Connecting the Dots



  1. Collections of War: Connecting the dots for GovHack 2015. Aimee Maree Forsstrom – Erle Pereira
  2. OpenData and OpenGov: So our story begins with “You”, the happy hacker, heading out to join the OpenGov Hackathon to play with Open Data.
  3. OpenData and OpenGov: You arrive at the hackspace and announce… “I have a great idea”
  4. OpenData and OpenGov: You browse the OpenData sets and think... “Oh wow, there are so many… I feel like I am running around in circles...”
  5. OpenData and OpenGov: [Streaming tears when one realises they have to understand the connections in the data to build their app] But you're a hacker, so you don't give up...
  6. OpenData and OpenGov: Then you think… “But I just wanted to be creative, I just wanted to help… Now I don't know where to start?”
  7. So many data sets: So you wanted to make an app about World War One and you took a look at the datasets? How many World War One datasets do we have? How many formats are they displayed in? Where is the unique identifier?
  8. The problem is with the technology: So the problem we find is not a lack of enthusiasm from the hackers… It is the reality of the current state of Open Data sets and their disparate nature. So my team decided to help alleviate your frustration :D
  9. What was our Project, you ask? We wanted to look at sentiment analysis of the World War One Diaries and connect it to the Portraits of the Soldiers. But Louise Denoon suggested it would be great to see sentiment analysis of their Diaries (personal accounts) as opposed to Newspaper clippings or letters to Departments (official correspondence).
  10. What was the problem? Different datasets across different formats. Not enough time to understand the Schema. Even if we had more time, the Schema would be no help, as what we are trying to do is discover the connections across different data.
  11. The Solution? Traditional database methods are not going to solve this problem. We were facing the unanswered issue of Big Data: with so much information, how does the human mind comprehend it and make meaningful connections? We needed a better Database!
  12. Technology saves the Day: SQL just won't cut it; you need to understand your data before you can write a join… You need to understand your fields.
  13. It's all about the Relationships: With graph databases like Neo4j we can collate a variety of diverse datasets. Neo4j keeps track of the Relationships between the data for us. Instead of working out our data joins, we spend the time defining the question we want answered by our app.
  14. SQL Query vs Relationship Query: So instead of writing SELECT firstname FROM person WHERE person.nickname = 'John Brown Soldier' you would write MATCH (p:person) WHERE p.nickname = 'John Brown Soldier' RETURN p.firstname. Here p is not in the table person, or even 'is' a person; it simply has the label person. In SQL, by contrast, we need to know that nickname is a column in the table person, i.e. we need to know our schema.
  15. SQL Query vs Relationship Query: So instead of writing SELECT firstname, FROM person JOIN batallion ON person.batallionid = WHERE person.nickname= 'John Brown' AND order.rank = 'Soldier' you would write MATCH (p:person)-[:in]-(t:batallion) WHERE p.nickname= 'John Brown' AND o.rank = 'Soldier' RETURN p.firstname,
  16. Schema: SQL databases are schema-based and you need to understand the Schema. But in Big Data we don't have the time to understand the Schema; how can we ever know the full Schema?
  17. SQL vs Graphing: This means that in the old SQL Query world we would need to know that John was a Person and even a Soldier. In the graph query, p is not in the table person, or even 'is' a person; it simply has the label person. We have defined the relationship (the correlation) and are asking the database to graph the connection.
  18. Potential: In the world of Big Data we can no longer rely on knowing the schema of the information that we will be analysing. We need graph databases in order to start to see the correlations between disparate sets of data.
  19. Gov Hack 2016? What does this mean for next year's Gov Hack? We will be able to import all the datasets from WorldWarOne and give JSON access to the various disparate datasets, allowing the hackers that extra time and head space to create the app, to consume and create, aka what they do best.
  20. Research Potential: What does this mean for research? World War One researchers do not have to try to understand multiple dataset schemas. Because the Database does it for them, researchers spend their mental effort on drawing deeper connections and relationships. Understanding the web of knowledge. Unlocking new discoveries from old data.
  21. It's only a Prototype: Our team consisted of two people. We had big plans for a fancy frontend and more datasets, but time is always a factor. This is our technology showcase to display the potential of a new way to look at datasets, to enable hackers with the fewest barriers to data.
  22. Where do we go from here? We have created a Docker Image of Neo4j. Please visit our GitHub, please contribute, please fork.
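
The SQL-versus-relationship comparison in slides 14 and 15 can be sketched in plain Python. This is only an illustration under stated assumptions, not the team's prototype: the relational half uses the standard-library sqlite3 module, and the graph half is a hand-rolled, in-memory stand-in for a labelled property graph rather than Neo4j itself. The column names (batallion_id, name, rank), the sample rows and the helper match_person_in_batallion are all hypothetical; the table and label names keep the slides' spelling.

```python
import sqlite3

# All sample data below is invented for illustration only.

# Relational side: the join can only be written once both table
# schemas (columns and the foreign key) are known.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE batallion (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        firstname TEXT,
        nickname TEXT,
        rank TEXT,
        batallion_id INTEGER REFERENCES batallion(id)
    );
    INSERT INTO batallion VALUES (1, 'Example Batallion');
    INSERT INTO person VALUES (1, 'John', 'John Brown', 'Soldier', 1);
""")
sql_rows = conn.execute(
    """
    SELECT person.firstname, batallion.name
    FROM person
    JOIN batallion ON person.batallion_id = batallion.id
    WHERE person.nickname = ? AND person.rank = ?
    """,
    ("John Brown", "Soldier"),
).fetchall()
conn.close()

# Graph side: nodes carry labels and free-form properties,
# and each edge carries a relationship type.
nodes = {
    "n1": {"labels": {"person"},
           "props": {"firstname": "John", "nickname": "John Brown",
                     "rank": "Soldier"}},
    "n2": {"labels": {"batallion"},
           "props": {"name": "Example Batallion"}},
}
edges = [("n1", "in", "n2")]


def match_person_in_batallion(nickname, rank):
    """Hypothetical helper mimicking the pattern (p:person)-[:in]-(t:batallion)."""
    results = []
    for src, rel_type, dst in edges:
        if rel_type != "in":
            continue
        # The pattern in the slide is undirected, so try both orientations.
        for a, b in ((src, dst), (dst, src)):
            p, t = nodes[a], nodes[b]
            if ("person" in p["labels"]
                    and "batallion" in t["labels"]
                    and p["props"].get("nickname") == nickname
                    and p["props"].get("rank") == rank):
                results.append((p["props"]["firstname"], t["props"]["name"]))
    return results


graph_rows = match_person_in_batallion("John Brown", "Soldier")
print(sql_rows)
print(graph_rows)
```

Both halves return the same pair, but the graph half never names a foreign-key column: the connection between the person and the batallion is stored as an edge, which is the point the slides make about relationship queries.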