20110922 owf


Data Publica presentation at OWF 2011

Published in: Technology, Education

  1. Slide 2: Data Publica
     • Open Data Infrastructure
       • Christian Frisch, CTO
       • September 22, 2011
  2. Slide 3: Agenda
     • Data Publica – the Company
       • Elevator Pitch / Business Model
       • Open Data Catalog
     • Technology
     • Open Data Recycling
  3. Slide 4: Elevator Pitch
     • Data Publica is building the most complete and detailed inventory of electronic data in France (public or corporate, free or paid)
     • Data Publica develops and sells custom data sets, based on customer specifications
     • Data Publica operates a DataStore with
       • Ready-made data sets we develop ourselves
       • Data sets produced by third-party vendors
  4. Slide 5: Gathering information about data
     • Most complete and detailed catalog of French public data (launched in September 2010)
     • Third national catalog in the world
     • Built by hand, moving to automation
     • 170 editors, 5,500 files
     • Automated updates
     • 90% are spreadsheets
     • Data + metadata
     • Search engine
  5. Slide 6: Technology
     • Data Catalog
     • Catalog with metadata
     • Search (metadata & full text)
     • Social Activity (wiki)
     • Recommendation/Rating
     • Structured Data
     • Multi-format
     • Visualisation
     • API (geo)
     • Search (concepts)
     • Link Datasets
     • Linked Data
     • URI
     • Semantic
     [Architecture diagram; labels in extraction order: Recycle Opendata, Structured Data, Proprietary basic ontologies, ETL, Linked Data, link, Open data Raw, Usage/Social, Web Crawl / filter / classification, Scraping, Metadata]
  6. Slide 7: Recycling Open Data
     • Estimate: 170K spreadsheets on public sites in France (300K in the UK)
     • How do we make sense of this content?
  7. Slide 8: Recycling Open Data – Analysis
     • INRIA/Zenith to identify tables & metadata
       • Extract table (image recognition techniques)
       • Identify attributes & types (data columns)
       • Export structure and data (use DSPL format)
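The "identify attributes & types" step can be illustrated with a minimal Python sketch. This is not the INRIA/Zenith method, which the slide does not describe; it only assumes the table has already been extracted as rows of strings with a header row. The function names, the type labels, and the single ISO date format are all assumptions made for illustration.

```python
from datetime import datetime


def infer_type(values):
    """Guess one type label for a column of raw cell strings.

    Labels ("integer", "float", "date", "string") are illustrative only.
    """
    cleaned = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not cleaned:
        return "string"  # nothing to inspect, fall back to the loosest type

    def all_parse(parser):
        # True only if every non-empty cell is accepted by the parser.
        for v in cleaned:
            try:
                parser(v)
            except ValueError:
                return False
        return True

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    # Assumption: dates are written as YYYY-MM-DD.
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    return "string"


def describe_table(rows):
    """Return a list of {"name", "type"} dicts, one per column."""
    header, *body = rows
    columns = list(zip(*body)) if body else [() for _ in header]
    return [
        {"name": name.strip(), "type": infer_type(col)}
        for name, col in zip(header, columns)
    ]


# Made-up sample rows, for illustration only.
sample = [
    ["commune", "population", "revenu", "date"],
    ["Paris", "2234105", "25981.5", "2011-09-22"],
    ["Lyon", "479803", "22000", "2011-01-01"],
]
schema = describe_table(sample)
```

A real spreadsheet would need more care, for example decimal commas, thousands separators, and merged header cells, none of which this sketch handles.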
  8. Slide 9: Recycling Open Data – Publication
  9. Slide 10: Recycling Open Data – API Access
     http://api.data-publica.com/…/content.json?limit=10&filter={revenue_fiscal_par_foyer:{$gt:25000}}
     • Multi-format (JSON, XML, spreadsheet, CSV)
     • Geolocalized queries
     • Mashups
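The query on this slide combines a `limit` parameter with a `filter` expression using a `$gt` (greater than) operator. The Python sketch below shows one way such a URL could be assembled and read back. The endpoint used here is a hypothetical placeholder, since the slide elides the dataset path, and sending the filter as strict JSON with quoted keys is an assumption; the slide writes the keys unquoted.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(content_url, limit, filter_spec):
    """Append limit and a JSON-encoded filter to a content URL."""
    params = {
        "limit": limit,
        # Compact JSON; urlencode percent-escapes braces, quotes and "$".
        "filter": json.dumps(filter_spec, separators=(",", ":")),
    }
    return f"{content_url}?{urlencode(params)}"


def read_filter(url):
    """Decode the filter parameter of a URL built by build_query_url."""
    query = parse_qs(urlsplit(url).query)
    return json.loads(query["filter"][0])


# Hypothetical placeholder endpoint, NOT the real dataset path.
EXAMPLE_URL = "http://api.example.invalid/dataset/content.json"

# Same filter as on the slide: revenue_fiscal_par_foyer greater than 25000.
url = build_query_url(
    EXAMPLE_URL, 10, {"revenue_fiscal_par_foyer": {"$gt": 25000}}
)
```

Building the filter as a dictionary and encoding it once avoids hand-escaping braces and `$` in the query string.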
  10. Slide 11: Conclusion
     • Build visualisations
     • Access Open Data through API
     • Combine data from multiple sources
     [Diagram labels: Structured Data, Recycle Opendata, Raw]
  11. Slide 12: Contact
     • Christian Frisch
     • @c_frisch
     • www.data-publica.com
     • @datapublicatwit
