HappyHour WhyMCA “OpenData”
                      Rome, Dec. 15th 2011



       EUHackathon: Hacking data to deliver
            meaningful information
                     Part II



Alessandro Manfredi
EUHackathon Nov, 8-9 ’11

   Long story short:
      we went,
      we coded,
     we had fun,
     got a prize,
         etc.
@matteocollina
                  What these guys said
                     @Giuliano84




                 OpenData


             Visualization


     Meaningful Information
Data Sources
• Crowd-sourced: unfiltered user reports (kind of a bloody mess)
• Aggregated data
• Transparency Report (hackathon sponsor)
So, how about the GTT ?




                   Transparency
                   Report
Roadmap from 10k ft
• Clean the data and remove noise
• Combine data from different sources
• Put everything in an easy-to-query format
• Throw the result inside a DB
• Build a nice interface to display meaningful information :-)
In practice (1/3)
• Data from Google and OpenNet were already aggregated
  • Good: ready to use as information
  • Bad: not much to do with them
  • Bad: they were only about some countries (~75)
• So we also filtered data from Herdict to get only reports
 relevant to these countries.
• We combined data from both with some stats extracted
 from Herdict reports to provide country-specific
 information...
Like...
• Content removal requests
• Transparency Alert
• Censored categories
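The country-level combination from (1/3) could be sketched roughly as follows. This is only an illustration, not the code written at the hackathon: the field names (`country`, `accessible`, `removal_requests`) and the sample records are hypothetical.

```python
from collections import Counter

# Hypothetical aggregated data, keyed by country code
# (standing in for the already-aggregated sources).
aggregated = {
    "IT": {"removal_requests": 12},
    "DE": {"removal_requests": 30},
}

# Hypothetical crowd-sourced reports (standing in for Herdict data).
reports = [
    {"country": "IT", "url": "example.org", "accessible": False},
    {"country": "IT", "url": "example.net", "accessible": True},
    {"country": "FR", "url": "example.org", "accessible": False},
    {"country": "DE", "url": "example.com", "accessible": False},
]


def country_profiles(aggregated, reports):
    """Keep only reports from covered countries and attach simple stats."""
    covered = set(aggregated)
    relevant = [r for r in reports if r["country"] in covered]
    total = Counter(r["country"] for r in relevant)
    inaccessible = Counter(r["country"] for r in relevant if not r["accessible"])
    return {
        cc: {
            **stats,
            "reports": total[cc],
            "inaccessible_reports": inaccessible[cc],
        }
        for cc, stats in aggregated.items()
    }


profiles = country_profiles(aggregated, reports)
```

Reports from countries missing in the aggregated sources (here `FR`) are simply dropped, which mirrors the filtering step described above.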
In practice (2/3)
• We did something similar at the site-specific level...
Like...
• Website keywords
• Website preview
• # of unreachability warnings
In practice (3/3)

• Data from Herdict were a little bit messy
 • Good: direct user reports, a lot of data
 • Bad: not verified, confirmed, or ranked
 • Bad: user typos, non-existent ISPs, etc.
 • Bad: some obviously fake data
   • e.g., 600+ fake reports of palestine-info.co.uk being
     inaccessible from ISPs matching [A-Za-z0-9]{8}
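One possible way to catch this kind of noise is sketched below. It is a heuristic invented for illustration, not the filter actually used: the pattern alone would also match real names (an 8-letter ISP name such as "Vodafone" fits it), so the sketch only drops reports whose ISP name matches the pattern, appears exactly once in the whole dataset, and belongs to a burst of such reports against the same site. The field names and the `min_burst` parameter are assumptions.

```python
import re
from collections import Counter

# Pattern from the slide: ISP names made of exactly 8 alphanumeric characters.
RANDOM_ISP = re.compile(r"[A-Za-z0-9]{8}")


def drop_suspected_fakes(reports, min_burst=3):
    """Drop reports that look like a burst of one-off, random-looking ISP names."""
    isp_counts = Counter(r["isp"] for r in reports)

    def suspicious(report):
        # Random-looking name that was never seen anywhere else.
        return (
            RANDOM_ISP.fullmatch(report["isp"]) is not None
            and isp_counts[report["isp"]] == 1
        )

    burst = Counter(r["site"] for r in reports if suspicious(r))
    flagged_sites = {site for site, n in burst.items() if n >= min_burst}
    return [
        r for r in reports
        if not (r["site"] in flagged_sites and suspicious(r))
    ]


# Hypothetical sample data.
reports = [
    {"site": "palestine-info.co.uk", "isp": "a8Kd93Lq"},
    {"site": "palestine-info.co.uk", "isp": "Zx81mPq0"},
    {"site": "palestine-info.co.uk", "isp": "Qw3rTy77"},
    {"site": "palestine-info.co.uk", "isp": "Vodafone"},
    {"site": "example.org", "isp": "Vodafone"},
    {"site": "example.org", "isp": "Fastweb"},
]
cleaned = drop_suspected_fakes(reports)
```

With real data the threshold would need tuning, and a list of known ISP names would probably be a more reliable signal than the pattern.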
In practice (3/3)

• We considered only websites with more than <T> reports
  • and only (www\.)?domain\.<tld>, with some exceptions,
   like [^.]*\.blogspot\.com or [^.]*\.wordpress\.com
• We aggregated reports per (ISP, country) and per site
• This made it easy to answer queries like:
  • From which countries has website X been reported as
   unreachable?
  • From which ISPs in country Y has website X been reported as
   unreachable?
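A minimal sketch of the normalization, thresholding, and aggregation steps above, with the two example queries on top. Everything here is assumed for illustration: the field names, the in-memory dictionary instead of a DB, counting only "inaccessible" reports toward the threshold, and treating domain.<tld> as "exactly one dot" (which would wrongly reject multi-part suffixes such as .co.uk).

```python
import re
from collections import defaultdict

# Exceptions from the slide: per-user blogs hosted on shared domains.
HOSTED_BLOG = re.compile(r"^[^.]+\.(blogspot|wordpress)\.com$")


def normalize(host):
    """Return the site key for a host, or None if the host is ignored."""
    host = host.lower()
    if host.startswith("www."):
        host = host[4:]
    if HOSTED_BLOG.match(host) or host.count(".") == 1:
        return host
    return None


def aggregate(reports, threshold):
    """Build site -> {(country, isp): count}, keeping sites above the threshold."""
    per_site = defaultdict(lambda: defaultdict(int))
    for r in reports:
        site = normalize(r["host"])
        if site is not None and not r["accessible"]:
            per_site[site][(r["country"], r["isp"])] += 1
    return {
        site: dict(counts)
        for site, counts in per_site.items()
        if sum(counts.values()) > threshold
    }


def countries_reporting(index, site):
    """From which countries has the site been reported as unreachable?"""
    return sorted({country for country, _ in index.get(site, {})})


def isps_reporting(index, site, country):
    """From which ISPs in a country has the site been reported as unreachable?"""
    return sorted({isp for c, isp in index.get(site, {}) if c == country})


# Hypothetical sample data.
reports = [
    {"host": "www.example.org", "country": "IT", "isp": "ISP-A", "accessible": False},
    {"host": "example.org", "country": "IT", "isp": "ISP-B", "accessible": False},
    {"host": "example.org", "country": "CN", "isp": "ISP-C", "accessible": False},
    {"host": "foo.blogspot.com", "country": "CN", "isp": "ISP-C", "accessible": False},
    {"host": "cdn.static.example.org", "country": "IT", "isp": "ISP-A", "accessible": False},
]
index = aggregate(reports, threshold=2)
```

Pre-aggregating per (country, ISP) pair keeps both queries down to a dictionary lookup, which is the point of the "easy-to-query format" step in the roadmap.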
http://www.sharpnod.es/




       Live Demo?
Cool things we didn't have time for
• Keyword-based website search
• Selection of a temporal interval
• A sort of “PLAY” button to visualize the evolution of
  the graph through time
• ...many more :-)
Cool things we knew
we wouldn't have time for
    Real-time reachability checks using proxies
    located in several countries.
    How about using Tor with...?
    ExitNodes <Nodes-country-X-ISP-Y>
    StrictExitNodes 1


    Infer censorship applied by ISPs in higher
    positions in the internet graph.
Q (&A)?
HappyHour WhyMCA
       “OpenData”
            Rome, Dec. 15th 2011




Alessandro Manfredi
www.n0on3.net
@n0on3
