Exploring the Networks in Open Public Data


  • Raw data is not always immediately useful to the wider public - using open data - discovering patterns - making sense of it
  • It’s worthwhile to explore the networks that emerge from the data you’re looking at. Various kinds of networks: - people in companies (who communicates with whom) - MPs, based on co-voting patterns - networks of companies
  • Open data is data that can be freely used, reused and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike. - http://opendefinition.org/ - http://opendatahandbook.org/en/what-is-open-data/index.html
  • - scrape the data - make it open - clean up the data - transform the data - make it usable [for the purpose]. How do we define an edge?
  • We want to choose those parts of the data from which we can deduce something - simple procedural decisions are out. Chose voting instances where there were notable differences of opinion. Noise = MPs who voted only a few times (throws off the percentages). Some votes are more important than others.
  • Harmony Centre. Greens / Farmers - choice: (a) join one of the two clusters; (b) isolation; (c) bridge between them
  • Strong voting discipline in the Harmony Centre. The majority of the rest do not vote the same (at this value of n%).
  • Far opposition / near opposition / coalition. Looks pretty, but does not give much useful information - almost a complete graph.
  • Does it look right at first sight? (the “sniff test”) Show it to domain experts. People can make pretty graphs - but what do they mean? - what can we explain or show via them?
  • The Greens / Farmers party is bridging between the strong opposition party Harmony Centre and the ruling coalition - they sometimes agree with the opposition, sometimes with the coalition. See slides 21, 22 re “live animation” showing what happens if you take them off the graph.
  • Learned from experts: not everything appears as a vote; some votes are more important than others - more insights -> better visualisations (more truthful, etc.). Some advanced visualisations will need more information - e.g., to define which laws are on which topics. Bringing in more data: - annotate nodes & edges with additional data / explanations of why this edge appears here - profiles for members of parliament (e.g., the TheyWorkForYou site in the UK) - linked data
  • another example of an open data graph visualisation
  • Another view of this data: http://www.slideshare.net/DERIGalway/valdis-krebs-social-network-analysis-19872007/15 The central red cluster corresponds to the company headquarters. Each vertex in the network represents an employee, colored according to the location they work at. Graph edges denote frequent, confirmed, work-related communications between employees. Cluster overlaps reveal which employees frequently interact with other locations, serving as boundary-spanners. This visualization helps to identify key connectors in the company [0].
  • What do we do with these visualisations next? = how do we use them (to have impact, explain data, …)
  • Social network visualisation & analysis allow us to see what was previously invisible. “Social Network Analysis” talk by Valdis Krebs - for more info re SNA and network visualization
  • Demo of how the Greens / Farmers party bridges the strong opposition party Harmony Centre and the ruling coalition - they sometimes agree with the opposition, sometimes with the coalition (edge connection criterion n = 25%)
  • Demo of how the Greens / Farmers party bridges the strong opposition party Harmony Centre and the ruling coalition. When the Greens / Farmers party nodes are hidden from the graph, there is no connection - the coalition and the Harmony Centre do not vote the same
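The bridging check described in the two demo notes above can be sketched in a few lines of Python. This is only an illustration, not the tool used in the talk: the node names (`hc1`, `gf1`, `co1`, ...) and the toy edge list are made up, and the function `reachable` is a hypothetical helper. It walks the graph breadth-first while ignoring a set of hidden nodes, so one can compare what is reachable with and without the bridging party.

```python
from collections import deque


def reachable(edges, start, hidden=frozenset()):
    """Return the set of nodes reachable from `start`, ignoring hidden nodes."""
    adjacency = {}
    for a, b in edges:
        if a in hidden or b in hidden:
            continue  # drop every edge that touches a hidden node
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# Toy graph (made-up nodes): hc* = Harmony Centre, gf* = Greens / Farmers,
# co* = ruling coalition. The only path between hc* and co* runs through gf1.
toy_edges = [("hc1", "hc2"), ("hc2", "gf1"), ("gf1", "co1"), ("co1", "co2")]

with_bridge = reachable(toy_edges, "hc1")
without_bridge = reachable(toy_edges, "hc1", hidden={"gf1"})
```

In this toy graph, `with_bridge` includes the coalition nodes, while `without_bridge` contains only the two Harmony Centre nodes, which mirrors what the demo shows when the Greens / Farmers nodes are hidden.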
  • Exploring the Networks in Open Public Data

    1. Exploring the Networks in Open Public Data
       Uldis Bojārs
       Institute of Mathematics and Computer Science, University of Latvia
       Using Open Data Workshop, Brussels, 20-Jun-2012
    2. About us
       • Institute of Mathematics and Computer Science, University of Latvia
         – http://www.lumii.lv/resource/show/170
         – Uldis Bojārs @CaptSolo
         – Valdis Krebs http://orgnet.com
         – Pēteris Ručevskis
    3. Network visualisation and analysis
       Applications:
       • discover interesting patterns
       • explore data in [more] detail
       Work from the Open Data Hackathon in Riga
       • analysis of Saeima voting patterns
       • http://opendata.lv
    4. Overview
       • Data needs to be Open
       • Pre-processing and filtering the data
         – selecting what to show
       • Data visualization
         – iterative process (visualize, refine, repeat)
       • What’s next?
    5. Open Data needed first (!)
       “Open data is data that can be freely used, reused and redistributed by anyone …” http://opendefinition.org/
       Data needs to be:
       • open
       • easy to use
       Still a problem in Latvia:
       • only a few datasets are open in an easy-to-consume form (PDF does not count :)
    6. http://titania.saeima.lv/LIVS11/SaeimaLIVS2_DK.nsf/0/9DEA96450E79B7E5C2257944007E589D?OpenDocument
    7. Pre-processing
       • Input:
         – raw vote data (scraped from the website) published at http://data.opendata.lv/
       • Output:
         – nodes (MPs)
         – edges (connections between them)
       • What is a connection?
    8. Defining graph connections
       • Connect MPs if they have voted similarly
         – disagreed on at most n% of decisions
       • Filter out cases where almost all MPs voted the same
       • Filter out trivial decisions
       • Filter out noise
    9. Node colour legend
       • Ruling coalition:
         – Zatlers’ Reform Party
         – Unity
         – the National Alliance
       • Opposition:
         – Harmony Centre
         – Greens / Farmers Party
       • a few non-party MPs
    10. MPs who always vote the same (n = 0%)
        Connection criteria too narrow
    11. MPs who disagree in less than 35% of cases
        Connection criteria too broad (everyone agrees, really?)
    12. Refining the visualisation
        • Need to find the right cut-off values (n%)
          – where patterns [start to] appear
          – and the visualisation makes sense
        • Show the results to domain experts
          – MPs, journalists, political researchers, …
        • Experts:
          – help improve visualisations
          – can discover new things for themselves
    13. MPs who disagree in less than 11% of cases
        Opposition parties [sometimes] vote the same
    14. MPs who disagree in less than 25% of cases
        Bridges appear between coalition and opposition parties
        (see slides 21, 22 re the bridging role of yellow nodes)
    15. What next?
        • Improve our understanding of data
        • Enhance visualisations
          – add clusters, etc.
        • Create multiple visualisations
          – different topics, changes in time, etc.
        • Bring in more data
          – explain nodes & edges
    16. Network visualisation example #1
        Donations to political parties
        http://www.thenetworkthinkers.com/2011/12/innovation-happens-at-intersections.html
    17. Network visualisation example #2
        Intra-company communication patterns
    18. Conclusion
        • Need more, useful Open Data
        • Discovering patterns, making sense of data
          – helping make sense = purpose of visualisations
        • Looking forward to collaboration re:
          – Using Open Data
          – Data Visualisation and Analysis
    19. More info
        • Uldis Bojārs uldis.bojars@gmail.com
        • Social Network Analysis talk / Valdis Krebs http://www.slideshare.net/DERIGalway/valdis-krebs-social-network-analysis-19872007
        • Smart Network Analyzer tool http://sna.lumii.lv/ in development at IMCS, University of Latvia
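
The connection rule from the “Defining graph connections” slide (connect two MPs if they disagreed on at most n% of decisions, after filtering out decisions where almost all MPs voted the same) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the code used for the Saeima analysis: the data layout (a dict mapping each MP to a dict of decision id -> vote), the function names `contested` and `build_edges`, and the parameters `max_majority` and `min_shared` are all hypothetical, and the toy votes are made up.

```python
from itertools import combinations


def contested(votes, max_majority=0.9):
    """Ids of decisions where no single option got more than
    `max_majority` of the votes cast (assumed filter for near-unanimous votes)."""
    tallies = {}
    for ballots in votes.values():
        for decision, choice in ballots.items():
            counts = tallies.setdefault(decision, {})
            counts[choice] = counts.get(choice, 0) + 1
    return {
        decision
        for decision, counts in tallies.items()
        if max(counts.values()) / sum(counts.values()) <= max_majority
    }


def build_edges(votes, max_disagreement, decisions=None, min_shared=1):
    """Connect two MPs if they disagreed on at most `max_disagreement`
    (a fraction, e.g. 0.25 for n = 25%) of the decisions both voted on."""
    edges = []
    for a, b in combinations(sorted(votes), 2):
        shared = votes[a].keys() & votes[b].keys()
        if decisions is not None:
            shared &= decisions  # keep only the decisions that passed the filter
        if len(shared) < max(min_shared, 1):
            continue  # too few common votes: treated as noise
        differing = sum(1 for d in shared if votes[a][d] != votes[b][d])
        if differing / len(shared) <= max_disagreement:
            edges.append((a, b))
    return edges


# Made-up toy data: three MPs, five decisions; decision 5 is unanimous.
toy_votes = {
    "mp_a": {1: "for", 2: "for", 3: "against", 4: "for", 5: "for"},
    "mp_b": {1: "for", 2: "for", 3: "against", 4: "against", 5: "for"},
    "mp_c": {1: "against", 2: "against", 3: "for", 4: "against", 5: "for"},
}

kept = contested(toy_votes)                     # drops the unanimous decision
narrow = build_edges(toy_votes, 0.0, kept)      # n = 0%
medium = build_edges(toy_votes, 0.25, kept)     # n = 25%
broad = build_edges(toy_votes, 0.75, kept)      # n = 75%
```

With this toy data the n = 0% threshold yields no edges, n = 25% connects only `mp_a` and `mp_b`, and n = 75% also connects `mp_b` and `mp_c`, which illustrates on a small scale why the cut-off value has to be tuned until patterns appear.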