4. OpenData and OpenGov
You browse the OpenData sets and think...
Oh wow, there are
so many...
I feel like I am running
around in circles...
5. OpenData and OpenGov
[Streaming tears when one realises they have to
understand the connections in the data to build
their app]
But you're a hacker,
so you don't give up...
6. OpenData and OpenGov
Then you think... “But I just wanted to be
creative, I just wanted to help... Now I don't know
where to start!”
7. So many data sets
So you wanted to make an app about World
War One and you took a look at the datasets?
How many World War One datasets do we
have?
How many formats are they displayed in?
Where is the unique identifier?
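A first pass at those three questions can be scripted. The sketch below is only an illustration: the catalogue entries (dataset names, formats and column names) are invented, not taken from the real World War One datasets. It counts the datasets, tallies the formats, and looks for a column that every dataset shares, which is the closest thing to a unique identifier we could hope for.

```python
from collections import Counter

# Hypothetical catalogue: names, formats and columns are invented for illustration.
DATASETS = [
    {"name": "soldier_portraits", "format": "csv",
     "columns": ["image_url", "soldier_name", "service_number"]},
    {"name": "war_diaries", "format": "json",
     "columns": ["author", "entry_date", "text"]},
    {"name": "newspaper_clippings", "format": "xml",
     "columns": ["headline", "published", "text"]},
]


def summarise(datasets):
    """Count datasets and formats, and list columns common to every dataset."""
    formats = Counter(d["format"] for d in datasets)
    column_sets = [set(d["columns"]) for d in datasets]
    shared = set.intersection(*column_sets) if column_sets else set()
    return {
        "count": len(datasets),
        "formats": dict(formats),
        "shared_columns": sorted(shared),
    }


summary = summarise(DATASETS)
print(summary)
```

With this made-up catalogue there are three formats and no column shared by all three datasets, which is exactly the "where is the unique identifier?" problem.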
8. The problem is with the technology
So the problem we find is not a lack of
enthusiasm from the hackers...
It is the reality of the current state of Open Data
sets and their disparate nature
So my team decided to help alleviate your
frustration :D
9. What was our project, you ask?
We wanted to look at sentiment analysis of the
World War One Diaries and connect it to the
Portraits of the Soldiers
Louise Denoon suggested it would be
great to see sentiment analysis of their diaries
(personal accounts) as opposed to newspaper
clippings or letters to departments (official
correspondence)
10. What was the problem?
Different datasets across different formats
Not enough time to understand the schema
Even if we had more time, the schema would be
no help, as what we are trying to do is
discover the connections across different data
11. The Solution?
Traditional database methods are not going to
solve this problem
We were facing the unanswered issue of Big Data:
with so much information, how does the human mind
comprehend it and make meaningful connections?
We needed a better database!
12. Technology saves the Day
SQL just won't cut it: you need to understand
your data before you can write a join... You
need to understand your fields
13. It's all about the Relationships
With graph databases like Neo4j we can
collate a variety of diverse datasets
Neo4j stores the relationships
between the data for us, so we can query them directly
Instead of working out our data joins, we start
spending the time defining the question we
want answered by our app
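To show what collating diverse datasets into one graph might look like, here is a minimal Python sketch. The two tiny inputs (a CSV of portraits and a JSON list of diary entries), their field names, and the `same_soldier` relationship are all invented for illustration. The linking rule is written by hand in this sketch; it is an assumption about how such a link could be made, not something the database does on its own.

```python
import csv
import io
import json

# Hypothetical inputs in two different formats; contents are invented.
PORTRAITS_CSV = "soldier_name,image\nJohn Brown,portrait_001.jpg\n"
DIARIES_JSON = '[{"author": "John Brown", "text": "Rain again today."}]'


def build_graph(portraits_csv, diaries_json):
    """Load both sources as labelled nodes and link them by matching names."""
    nodes, edges = [], []

    # Each record keeps its own fields; no shared schema is imposed.
    for row in csv.DictReader(io.StringIO(portraits_csv)):
        nodes.append({"label": "portrait", "props": dict(row)})
    for record in json.loads(diaries_json):
        nodes.append({"label": "diary", "props": dict(record)})

    # Hand-written linking rule: a portrait and a diary with the same name.
    for i, a in enumerate(nodes):
        for j, b in enumerate(nodes):
            if (a["label"] == "portrait" and b["label"] == "diary"
                    and a["props"]["soldier_name"] == b["props"]["author"]):
                edges.append((i, "same_soldier", j))
    return nodes, edges


nodes, edges = build_graph(PORTRAITS_CSV, DIARIES_JSON)
print(edges)
```

Once the relationship exists as an edge, an app can follow it from a portrait to a diary without ever writing a join.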
14. SQL Query vs Relationship Query
So instead of writing
SELECT firstname
FROM person
WHERE person.nickname= 'John Brown Soldier'
You would write
MATCH (p:person)
WHERE p.nickname= 'John Brown Soldier'
RETURN p.firstname
Here p is not a row in the table person, and it does not even have to 'be' a
person: it is a node that simply carries the label person. In SQL we need to
know that nickname is a column in the table person, aka we need to know our schema
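The "label, not table" idea can be sketched in a few lines of Python. This is a toy model, not how Neo4j is implemented: the node data and the `match` helper are made up for illustration. Each node carries a set of labels and whatever properties it happens to have, and matching simply filters on both.

```python
# Toy nodes: labels are tags, properties are free-form. All data is invented.
NODES = [
    {"labels": {"person"},
     "props": {"firstname": "John", "nickname": "John Brown Soldier"}},
    {"labels": {"person", "soldier"},
     "props": {"firstname": "Tom", "nickname": "Tommy"}},
    {"labels": {"batallion"},
     "props": {"name": "9th Battalion"}},
]


def match(nodes, label, **where):
    """Return nodes that carry `label` and whose properties equal `where`."""
    return [
        n for n in nodes
        if label in n["labels"]
        and all(n["props"].get(key) == value for key, value in where.items())
    ]


firstnames = [
    n["props"]["firstname"]
    for n in match(NODES, "person", nickname="John Brown Soldier")
]
print(firstnames)
```

A node can carry several labels at once (Tom is both person and soldier here), and a node missing a property is simply skipped rather than causing an error.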
15. SQL Query vs Relationship Query
So instead of writing
SELECT person.firstname, batallion.name
FROM person JOIN batallion ON person.batallionid = batallion.id
WHERE person.nickname = 'John Brown'
AND person.rank = 'Soldier'
You would write
MATCH (p:person)-[:in]-(t:batallion)
WHERE p.nickname= 'John Brown'
AND p.rank = 'Soldier'
RETURN p.firstname, t.name
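The two styles can be compared side by side in Python. The sketch below is an illustration under assumptions: the table layout, the sample rows, and a rank stored on the person are all invented, and the graph half is a toy in-memory model rather than Neo4j itself. The SQL half runs a join through the standard `sqlite3` module; the graph half walks `in` relationships in either direction, as the undirected pattern does.

```python
import sqlite3

# --- SQL side: we must know the tables, columns and join key up front. ---
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE batallion (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE person (
    id INTEGER PRIMARY KEY, firstname TEXT, nickname TEXT,
    rank TEXT, batallionid INTEGER REFERENCES batallion(id)
);
INSERT INTO batallion VALUES (1, '9th Battalion');
INSERT INTO person VALUES (1, 'John', 'John Brown', 'Soldier', 1);
INSERT INTO person VALUES (2, 'Mary', 'Sister Mary', 'Nurse', NULL);
""")
sql_rows = conn.execute("""
SELECT person.firstname, batallion.name
FROM person JOIN batallion ON person.batallionid = batallion.id
WHERE person.nickname = 'John Brown' AND person.rank = 'Soldier'
""").fetchall()
conn.close()

# --- Graph side: labelled nodes plus explicit relationships (toy model). ---
NODES = {
    "p1": {"labels": {"person"}, "firstname": "John",
           "nickname": "John Brown", "rank": "Soldier"},
    "p2": {"labels": {"person"}, "firstname": "Mary",
           "nickname": "Sister Mary", "rank": "Nurse"},
    "b1": {"labels": {"batallion"}, "name": "9th Battalion"},
}
EDGES = [("p1", "in", "b1")]


def person_in_batallion(nodes, edges):
    """Yield (person, batallion) pairs joined by an 'in' edge, either direction."""
    for a, rel, b in edges:
        if rel != "in":
            continue
        for start, end in ((a, b), (b, a)):
            p, t = nodes[start], nodes[end]
            if "person" in p["labels"] and "batallion" in t["labels"]:
                yield p, t


graph_rows = [
    (p["firstname"], t["name"])
    for p, t in person_in_batallion(NODES, EDGES)
    if p["nickname"] == "John Brown" and p["rank"] == "Soldier"
]
print(sql_rows, graph_rows)
```

Both halves return the same answer. The difference is where the knowledge lives: the SQL query encodes the join key `batallionid`, while the graph query only names the relationship it wants to follow.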
16. Schema
SQL databases are schema-based, so you need to
understand the schema
But in Big Data we don't have the time to
understand the schema. How can we ever know
the full schema?
17. SQL vs Graphing
This means in the old SQL query world we
would need to know up front that John was stored
as a person, and even as a soldier
In the graph query, p is not in the table person, or even 'is' a
person; it simply has the label person. We have
defined the relationship (the correlation) and are
asking the database to graph the connection
18. Potential
In the world of Big Data we can no longer rely
on knowing the schema of the information that
we will be performing analysis on
We need graph databases in order to start to
see the correlations between disparate sets of
data
19. GovHack 2016?
What does this mean for next year's GovHack?
We will be able to import all the World War One
datasets and give JSON access to the
various disparate datasets, allowing the
hackers that extra time and head space to
create apps that consume the data, aka what
they do best
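As a rough idea of what that JSON access could look like to a hacker, here is a small Python sketch. The envelope shape (`count` and `results`), the field names and the sample row are assumptions made for illustration; they do not describe an existing API.

```python
import json

# Hypothetical query result: keys and values are invented for illustration.
ROWS = [
    {"firstname": "John", "batallion": "9th Battalion", "source": "war_diaries"},
]


def to_json_response(rows):
    """Wrap query rows in a small JSON envelope that an app could consume."""
    return json.dumps({"count": len(rows), "results": rows}, sort_keys=True)


payload = to_json_response(ROWS)
print(payload)
```

An app built on a response like this never needs to know which original dataset, or which format, a row came from.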
20. Research Potential
What does this mean for research?
World War One researchers do not have to try to
understand multiple dataset schemas
Because the database does it for them,
researchers can spend their mental effort on drawing deeper
connections and relationships
Understanding the web of knowledge
Unlocking new discoveries from old data
21. It's only a Prototype
Our team consisted of two people
We had big plans for a fancy frontend and more
datasets
But time is always a factor
This is our technology showcase to display the
potential of a new way to look at datasets, one that
gives hackers the fewest barriers to the data
22. Where do we go from here?
We have created a Docker image of Neo4j
Please visit our GitHub
https://github.com/erlepereira/govhackau2015
Please contribute
Please fork