WhyMCA HappyHour - EUHackathon Part II

The presentation for my speech at #WhyMCA HappyHour about the EUHackathon '11

Published in: Technology, Design

  • WhyMCA HappyHour - EUHackathon Part II

    1. 1. HappyHour WhyMCA “OpenData”, Rome, Dec. 15th 2011. EUHackathon: Hacking data to deliver meaningful information, Part II. Alessandro Manfredi
    2. 2. EUHackathon, Nov. 8-9 ’11. Long story short: we went, we coded, we had fun, got a prize, etc.
    4. 4. @matteocollina What these guys said @Giuliano84
    8. 8. @matteocollina What these guys said @Giuliano84: OpenData, Visualization, Meaningful Information
    9. 9. Data Sources Transparency Report
    10. 10. Data Sources: Crowd-Sourced; Transparency Report; Hackathon sponsor
    11. 11. Data Sources: Crowd-Sourced; Aggregated data; Unfiltered user reports; Transparency Report; Hackathon sponsor
    12. 12. Data Sources: Crowd-Sourced; Aggregated data; Unfiltered user reports (kind of a bloody mess); Transparency Report; Hackathon sponsor
    13. 13. So, how about the GTT?
    14. 14. So, how about the GTT? Transparency Report
    15. 15. Roadmap from 10k ft
        • Clean the data and remove noise
        • Combine data from different sources
        • Put everything in an easy-to-query format
        • Throw the result inside a DB
        • Build a nice interface to display meaningful information :-)
    16. 16. In practice (1/3)
        • Data from Google and OpenNet were already aggregated
          • Good: ready to use as information
          • Bad: not much to do with them
          • Bad: they were only about some countries (~75)
        • So we also filtered data from Herdict to get only reports relevant to these countries.
        • We combined data from both with some stats extracted from Herdict reports to provide country-specific information...
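The country filter described on this slide could be sketched as follows. This is only an illustration, not the team's actual code: the record fields (`country`, `isp`, `url`), the function names, and the sample data are all assumptions made for the example.

```python
from collections import Counter


def filter_reports_by_country(reports, covered_countries):
    """Keep only reports whose country is covered by the aggregated sources."""
    covered = {code.upper() for code in covered_countries}
    return [r for r in reports if r["country"].upper() in covered]


def reports_per_country(reports):
    """Count how many reports were filed from each country."""
    return Counter(r["country"].upper() for r in reports)


# Hypothetical sample: two reports from covered countries, one from elsewhere.
reports = [
    {"country": "IT", "isp": "ExampleNet", "url": "example.org"},
    {"country": "cn", "isp": "SampleTel", "url": "example.com"},
    {"country": "ZZ", "isp": "Nowhere", "url": "example.net"},
]
covered = {"IT", "CN"}  # stand-in for the ~75 countries in the aggregated data

kept = filter_reports_by_country(reports, covered)
stats = reports_per_country(kept)
```

Normalizing the country code before comparing keeps the filter robust to mixed-case input, which seems likely in user-submitted reports.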
    17. 17. Like...
    18. 18. Like... Content removal requests, Transparency Alert, Censored categories
    19. 19. We did something similar at the site-specific level... (2/3)
    20. 20. Like...
    21. 21. Like... website keywords, website preview, # of unreachability warnings
    22. 22. In practice (3/3)
        • Data from Herdict were a little bit messy
          • Good: direct user reports, a lot of data
          • Bad: not verified, confirmed, or ranked
          • Bad: user typos, non-existent ISPs, etc.
          • Bad: some obvious fake data
            • e.g., 600+ fake reports of palestine-info.co.uk being inaccessible from ISP [A-Za-z0-9]{8}
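One way to screen out reports like the fake ones above is sketched below. The slide only gives the ISP pattern; the rest is an assumption for illustration. In particular, the pattern alone would also match real eight-character names (such as "Vodafone"), so this sketch additionally requires the ISP name to be rare in the dataset. The field name `isp`, the function name, and the `max_occurrences` threshold are hypothetical.

```python
import re
from collections import Counter

# Pattern quoted on the slide: ISP names made of exactly 8 alphanumerics.
RANDOM_ISP = re.compile(r"[A-Za-z0-9]{8}")


def drop_suspicious_reports(reports, max_occurrences=1):
    """Drop reports whose ISP looks random AND is rarely seen in the dataset."""
    isp_counts = Counter(r["isp"] for r in reports)

    def suspicious(report):
        isp = report["isp"]
        looks_random = RANDOM_ISP.fullmatch(isp) is not None
        return looks_random and isp_counts[isp] <= max_occurrences

    return [r for r in reports if not suspicious(r)]


# Hypothetical sample: a real 8-letter ISP seen twice, two one-off random names.
reports = [
    {"isp": "Vodafone", "url": "example.org"},
    {"isp": "Vodafone", "url": "example.com"},
    {"isp": "aB3dE9xQ", "url": "example.org"},
    {"isp": "k2L0pQ7z", "url": "example.org"},
    {"isp": "Telecom Italia", "url": "example.org"},
]
clean = drop_suspicious_reports(reports)
```

The rarity check is a crude stand-in for real verification; a legitimate small ISP with a single report would also be dropped.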
    23. 23. In practice (3/3)
        • We considered only websites with more than <T> reports
          • and only (www.)?domain.<tld> with some exceptions, like [^.]*.blogspot.com or [^.]*.wordpress.com
        • We aggregated reports per-(ISP, country) and per-site
        • So that it was easy to get responses to queries like:
          • From which countries has website X been reported as unreachable?
          • From which ISPs in country Y has website X been reported as unreachable?
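The filtering and aggregation steps on this slide can be sketched roughly as below. This is an assumption-laden illustration, not the project's code: the regular expressions are my reading of the patterns on the slide, the record fields (`host`, `country`, `isp`) and all function names are hypothetical, and multi-label suffixes such as `.co.uk` are deliberately not handled.

```python
import re
from collections import Counter, defaultdict

# (www.)?domain.<tld>; the capture group drops a leading "www.".
PLAIN_DOMAIN = re.compile(r"(?:www\.)?([a-z0-9-]+\.[a-z]{2,})")
# Exceptions named on the slide: one label under blogspot.com / wordpress.com.
HOSTED_BLOG = re.compile(r"[^.]+\.(?:blogspot|wordpress)\.com")


def site_key(host):
    """Return a normalized site name, or None if the host is not accepted."""
    host = host.lower()
    match = PLAIN_DOMAIN.fullmatch(host)
    if match:
        return match.group(1)
    if HOSTED_BLOG.fullmatch(host):
        return host
    return None


def build_index(reports, min_reports):
    """Map site -> Counter of (country, isp), for sites above the threshold."""
    index = defaultdict(Counter)
    for report in reports:
        site = site_key(report["host"])
        if site is not None:
            index[site][(report["country"], report["isp"])] += 1
    return {site: groups for site, groups in index.items()
            if sum(groups.values()) > min_reports}


def countries_unreachable(index, site):
    """From which countries has the site been reported as unreachable?"""
    return sorted({country for country, _isp in index.get(site, {})})


def isps_unreachable(index, site, country):
    """From which ISPs in the given country has the site been reported?"""
    return sorted({isp for c, isp in index.get(site, {}) if c == country})


# Hypothetical sample data.
reports = [
    {"host": "www.example.org", "country": "IT", "isp": "A"},
    {"host": "example.org", "country": "IT", "isp": "B"},
    {"host": "example.org", "country": "CN", "isp": "C"},
    {"host": "foo.blogspot.com", "country": "CN", "isp": "C"},
    {"host": "a.b.example.org", "country": "IT", "isp": "A"},
]
index = build_index(reports, min_reports=2)
```

Keying the counts by `(country, isp)` under each site means both example queries reduce to a scan over one site's keys, which is the "easy-to-query" property the slide is after.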
    24. 24. http://www.sharpnod.es/
    25. 25. http://www.sharpnod.es/ Live Demo?
    26. 26. Cool things we didn't have time for
        • Keyword-based websites search
        • Selection of a temporal interval
        • A sort of “PLAY” button to visualize the evolution of the graph through time
        • ...many more :-)
    27. 27. Cool things we knew we wouldn't have time for
        Real-time reachability check using proxies located in several countries. How about using Tor with...?
            ExitNodes <Nodes-country-X-ISP-Y>
            StrictExitNodes 1
        Infer censorship applied by ISPs in higher positions in the internet graph.
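As a rough sketch of the Tor idea, the helper below only renders the two configuration lines quoted on the slide for a given list of exit nodes. The option names are copied verbatim from the slide and may be named or treated differently in current Tor releases, so the Tor manual should be checked before use. The function name is hypothetical, and the `{it}` country-code form in the example is my assumption about Tor's node-list syntax; picking nodes by ISP would presumably require supplying an explicit list of relays.

```python
def torrc_exit_fragment(exit_nodes, strict=True):
    """Render the slide's two torrc lines for the given exit node list."""
    if not exit_nodes:
        raise ValueError("at least one exit node is required")
    lines = [
        # Comma-separated node list, as in the slide's placeholder.
        "ExitNodes " + ",".join(exit_nodes),
        # Option name taken from the slide; may differ in newer Tor versions.
        "StrictExitNodes " + ("1" if strict else "0"),
    ]
    return "\n".join(lines)


# Hypothetical usage: restrict exits to one country.
fragment = torrc_exit_fragment(["{it}"])
```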
    28. 28. Q (&A)?
    29. 29. HappyHour WhyMCA “OpenData”, Rome, Dec. 15th 2011. Alessandro Manfredi, www.n0on3.net, @n0on3
