What is open data



Lecture on what is open data @



  1. DATAVIZ: VISUAL REPRESENTATION OF COMPLEX PHENOMENA. Data visualization & computational design @ Better Nouveau Workshop, 14/12/2011. What is Open Data? Lorenzo Benussi, TOP-IX Consortium, lorenzo.benussi@top-ix.org
  2. About me
     • Research & Business Development, TOP-IX Consortium
     • Fellow, NEXA Centre, Polytechnic of Turin
     • Fellow, Department of Economics, University of Turin
  3. Agenda
     1. Background
     2. Definitions
        I. Open Knowledge Definition
        II. Open Data Licenses
        III. Pricing models
        IV. Formats
     3. Examples
  4. Did you take the bus today?
  5. Background. Ref: National Geographic, http://ngm.nationalgeographic.com/big-idea/14/augmented-reality
  6. BIG DATA stylized facts (1)
     • $600 to buy a disk drive that can store all the world's music.
     • 5 billion mobile phones in use in 2010.
     • 30 billion pieces of content shared on Facebook every month.
     • 40% projected growth in global data generated per year vs. 5% growth in global IT spending.
     • 235 terabytes of data collected by the US Library of Congress in April 2011.
     • 15 out of 17 sectors in the United States have more data stored per company than the US Library of Congress.
     Ref: McKinsey, Big Data: The next frontier of innovation, competition and productivity (May 2011).
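The slide contrasts 40% yearly growth in global data with 5% yearly growth in IT spending. A quick Python sketch shows how fast that gap compounds. It assumes both rates stay constant over time, which the McKinsey figures do not promise, and the function names are only illustrative.

```python
import math

DATA_GROWTH = 0.40         # projected yearly growth in global data generated (slide figure)
IT_SPENDING_GROWTH = 0.05  # yearly growth in global IT spending (slide figure)


def data_per_it_dollar(years: int) -> float:
    """Relative amount of data each unit of IT spending must handle after `years`,
    assuming both growth rates stay constant (an illustrative assumption)."""
    return ((1 + DATA_GROWTH) / (1 + IT_SPENDING_GROWTH)) ** years


def years_to_multiply(factor: float, rate: float) -> float:
    """Years needed for a quantity growing at `rate` per year to grow by `factor`."""
    return math.log(factor) / math.log(1 + rate)


if __name__ == "__main__":
    for y in (1, 5, 10):
        print(f"after {y:2d} years: {data_per_it_dollar(y):6.2f}x data per IT dollar")
    print(f"data grows 10x in about {years_to_multiply(10, DATA_GROWTH):.1f} years")
```

Under these assumptions each unit of IT spending has to cope with roughly four times as much data after five years, and the volume of data grows tenfold in under seven years.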
  7. BIG DATA stylized facts (2)
     • $300 billion potential annual value to US health care, more than twice the total annual health care spending in Spain.
     • €250 billion potential annual value to Europe's public sector administration, more than the GDP of Greece.
     • $600 billion potential annual consumer surplus from using personal location data globally.
     • 60% potential increase in retailers' operating margins possible with big data.
     • 140,000-190,000 more deep analytical talent positions and 1.5 million more data-savvy managers needed to take full advantage of big data in the USA.
     Ref: McKinsey, Big Data: The next frontier of innovation, competition and productivity (May 2011).
  8. WEB (squared)
     1. Redefining Collective Intelligence: New Sensory Input
     2. Cooperating Data Subsystems
     3. How the Web Learns: Explicit vs. Implicit Meaning
     4. Web Meets World: The "Information Shadow" and the Internet of Things
     5. The Rise of Real Time: A Collective Mind
     Ref: Tim O’Reilly and John Battelle (2009), Web Squared: Web 2.0 Five Years On. http://www.web2summit.com/web2009/public/schedule/detail/10194
  9. "Digital technology could enable an extraordinary range of ordinary people to become part of a creative process." (The Future of Ideas, Lawrence Lessig)
  10. "When I say that innovation is being democratized, I mean that users of products and services, both firms and individual consumers, are increasingly able to innovate for themselves." (Democratizing Innovation, Eric von Hippel)
  11. The value of metrics (Hal Varian, Google’s Chief Economist)
      • Data
      • Information
      • Knowledge
      • Value
  12. (image only)
  13. DATA as a SERVICE
      Data are not locked inside applications; they are consumed on demand as a service. RESTful APIs make it possible to access data as a web resource (through a URI).
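As a minimal sketch of "data as a service", the Python WSGI application below exposes an in-memory dataset as a web resource addressed by a URI and returns it as JSON. The dataset name, its records and the `/datasets/<name>` URI scheme are hypothetical; a real service would read from a database and run behind a production web server.

```python
import json
from wsgiref.util import setup_testing_defaults

# Hypothetical dataset; in practice it would come from a database.
DATASETS = {
    "bus-stops": [
        {"id": 1, "name": "Stop A"},
        {"id": 2, "name": "Stop B"},
    ],
}


def app(environ, start_response):
    """Serve GET /datasets/<name> as JSON: each dataset is a URI-addressable resource."""
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    if environ["REQUEST_METHOD"] != "GET":
        status, body = "405 Method Not Allowed", {"error": "only GET is supported"}
    elif len(parts) == 2 and parts[0] == "datasets" and parts[1] in DATASETS:
        status, body = "200 OK", DATASETS[parts[1]]
    else:
        status, body = "404 Not Found", {"error": "no such resource"}
    payload = json.dumps(body).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(payload)))])
    return [payload]


def call(path, method="GET"):
    """Invoke the app in-process (no network) and return (status, decoded JSON)."""
    environ = {"PATH_INFO": path, "REQUEST_METHOD": method}
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body)
```

`call("/datasets/bus-stops")` returns the status and the records without opening a socket. To try it over HTTP locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` would serve the same resource at http://localhost:8000/datasets/bus-stops.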
  14. Business Models (New Generation Marketplace)
      A. Data owner: paid to publish / revenue share.
      B. Data user: pays for data delivery / transformation / analysis services.
      • Works with open and non-open data.
      • Provides data on-the-fly through APIs (even custom ones).
      • Sometimes the community of data curators is involved in maintaining and expanding the data through crowdsourcing (e.g. Factual).
      • Provides (web-based) tools to explore the data.
  15. What does open data mean? Open Data is a model for extracting value from public sector information by using the data to build new tools and create innovative services.
  16. PSI (public sector information) mines
      • The public sector produces and manages huge amounts of data; opening up PSI in the EU produces economic growth of €140 billion per year (aggregate).
      • Public data are the raw material for creating new products and services.
      Image: COURTESY/RON WHEELER. The 8,000-foot-deep Homestake Gold Mine in South Dakota is the site where scientists, including UC Berkeley researchers, plan to construct the world's deepest research center.
  17. data.gov
      “Openness will strengthen our democracy and promote efficiency and effectiveness in Government.” Transparency and Open Government, Memorandum for the Heads of Executive Departments and Agencies (2009)
      “[…] As you know, transparency is at the heart of our agenda for Government. We recognise that transparency and open data can be a powerful tool to help reform public services, foster innovation and empower citizens.” David Cameron, Letter to Cabinet Ministers (2011)
  18. “Information is the currency of democracy.” Benjamin Franklin (attribution)
  19. Raw data now! “... give us the unadulterated data, we want the data, we want unadulterated data. We have to ask for raw data now.” Tim Berners-Lee, advisor to data.gov.uk
  20. data.gov: leading examples
      • USA: data.gov
      • UK: data.gov.uk
      • Australia: data.gov.au
  21. Legislation in the EU, Italy and Piedmont
      • EUROPE: Directive 2003/98/EC of the European Parliament and of the Council of 17 November 2003 on the re-use of public sector information. “The evolution towards an information and knowledge society influences the life of every citizen in the Community, inter alia, by enabling them to gain new ways of accessing and acquiring knowledge.”
      • ITALY: Legislative Decree (Decreto Legislativo) no. 36 of 24 January 2006 and Law 96/2010.
      • PIEDMONT: Regional Government Resolution (Delibera di Giunta regionale) 36-1109, November 2010.
  22. WHY: civil society
      • Accountability
      • Transparency
      • Collaboration
      • Participation
  23. WHY: (digital) market
      • Innovation
      • Cooperation
      • Competition
      • Digital commons
  24. The first example in Italy: dati.piemonte.it
  25. apps4italy
      • All EU citizens can participate (!!), with €40K in cash prizes.
      • Building useful, innovative projects based on Italian public data (not only open data).
      • Four main categories (growing): 1. Ideas 2. Apps 3. Visualization 4. Datasets
      Ref: appsforitaly.org
  26. Open Data: definitions
  27. Open Knowledge Definition v1.1 by OKF
      A work is open if its manner of distribution satisfies the following conditions:
      1. Access
      2. Redistribution
      3. Reuse
      4. Absence of technological restriction
      5. Attribution
      6. Integrity
      7. No discrimination (persons or groups)
      8. No discrimination (fields of endeavor)
      9. Distribution of license
      10. License must not be specific to a package
      11. License must not restrict the distribution of other works
  28. Open Definition, http://opendefinition.org/okd/ (Version 1.1): Terminology
      The term knowledge is taken to include:
      1. Content such as music, films, books
      2. Data, be it scientific, historical, geographic or otherwise
      3. Government and other administrative information
      Software is excluded [...]
      The term work will be used to denote the item or piece of knowledge which is being transferred.
      The term package may also be used to denote a collection of works. [...]
      The term license refers to the legal license under which the work is made available. Where no license has been made, this should be interpreted as referring to the resulting default legal conditions under which the work is available (for example copyright).
  29. The Definition: A work is open if its manner of distribution satisfies the following conditions:
      1. ACCESS: The work shall be available as a whole and at no more than a reasonable reproduction cost, preferably downloading via the Internet without charge. The work must also be available in a convenient and modifiable form.
      2. REDISTRIBUTION: The license shall not restrict any party from selling or giving away the work either on its own or as part of a package made from works from many different sources. The license shall not require a royalty or other fee for such sale or distribution.
      3. REUSE: The license must allow for modifications and derivative works and must allow them to be distributed under the terms of the original work.
  30. 4. ABSENCE OF TECHNOLOGICAL RESTRICTION: The work must be provided in such a form that there are no technological obstacles to the performance of the above activities. This can be achieved by the provision of the work in an open data format, i.e. one whose specification is publicly and freely available and which places no restrictions, monetary or otherwise, upon its use.
      5. ATTRIBUTION: The license may require as a condition for redistribution and re-use the attribution of the contributors and creators to the work. If this condition is imposed it must not be onerous. For example, if attribution is required, a list of those requiring attribution should accompany the work.
      6. INTEGRITY: The license may require as a condition for the work being distributed in modified form that the resulting work carry a different name or version number from the original work.
  31. 7. NO DISCRIMINATION AGAINST PERSONS OR GROUPS: The license must not discriminate against any person or group of persons.
      8. NO DISCRIMINATION AGAINST FIELDS OF ENDEAVOR: The license must not restrict anyone from making use of the work in a specific field of endeavor. For example, it may not restrict the work from being used in a business, or from being used for genetic research.
      9. DISTRIBUTION OF LICENSE: The rights attached to the work must apply to all to whom it is redistributed without the need for execution of an additional license by those parties.
      10. LICENSE MUST NOT BE SPECIFIC TO A PACKAGE: The rights attached to the work must not depend on the work being part of a particular package. If the work is extracted from that package and used or distributed within the terms of the work’s license, all parties to whom the work is redistributed should have the same rights as those that are granted in conjunction with the original package.
      11. LICENSE MUST NOT RESTRICT THE DISTRIBUTION OF OTHER WORKS: The license must not place restrictions on other works that are distributed along with the licensed work. For example, the license must not insist that all other works distributed on the same medium are open.
  32. Open Data: prices
  33. A paradigmatic shift: information economy
      • The transition from a physically-based to a knowledge-based economic environment made information a primary wealth-creating asset.
      • Digital access to information seems to have changed the structure of many industries, promoting services-oriented business models based on disclosure and sharing of information and knowledge.
  34. A paradigmatic shift: PSI data mines
      • The public sector holds and manages huge amounts of data and information. Fostering access to those repositories enables new business opportunities that can broaden market volumes in such sectors.
      • PSI represents the raw material from which value-added products and services can be designed.
  35. The use/value of PSI
      PSI can be used and reused in many ways (non-rivalry in consumption):
      1. Broad range of sectors
      2. Different sets of actors
      3. PSI holders
      4. Private re-users
      5. Regulatory bodies
      6. Citizens
      Several supply chain configurations:
      1. Linear models (private re-users add value)
      2. User-generated content
      3. Information sharing between public bodies
  36. The price of PSI: the “free data” approach
      • The peculiar cost structure of digital data collection, processing and delivery (high fixed costs, zero marginal cost) strongly influences the pricing strategies PSI holders can adopt.
      • Pollock (2008): a price equal to marginal cost (i.e. PSI free of charge) is socially optimal provided that the elasticity of demand and positive externalities exceed a given threshold.
      ✓ Empirics: those conditions are likely to be met in most PSI domains.
  37. The price of PSI: cost recovery approach
      • Although a cost recovery regime may limit potential demand and distort competition, several critical issues could trigger its adoption:
      ✓ Underestimation of downstream demand and network externalities.
      ✓ Lack of long-run commitment to subsidizing PSI collection.
      ✓ Short-term decision making.
      ✓ Moral hazard (?).
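The cost structure described in the last two slides can be written down with a textbook cost function. This is a stylized sketch, not the model in Pollock (2008): assume a PSI holder bears a fixed cost F for collecting and maintaining the data, a constant marginal cost c for delivering one more copy (c ≈ 0 for digital distribution), and delivers q copies.

```latex
\begin{aligned}
C(q) &= F + c\,q, \qquad MC = \frac{dC}{dq} = c \approx 0, \qquad AC(q) = \frac{C(q)}{q} = \frac{F}{q} + c \\
\text{free data:} \quad p &= MC = c \approx 0 \;\Rightarrow\; p\,q \approx 0 \quad (F \text{ is funded from the public budget}) \\
\text{cost recovery:} \quad p &= AC(q) = \frac{F}{q} + c \;\Rightarrow\; p\,q = F + c\,q = C(q)
\end{aligned}
```

Whenever F > 0 the cost-recovery price sits above marginal cost, so users whose willingness to pay lies between c and F/q + c drop out. That lost demand is the limit on demand mentioned above, and it weighs more the more elastic demand is and the larger the positive externalities of reuse.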
  38. The price of PSI: possible scenarios
      Directive 2003/98/EC aims to foster PSI reuse mainly by promoting:
      1. PSI availability in digital format
      2. Transparency of reuse conditions and pricing
      3. Non-discrimination
      Which market configurations are likely to emerge? (MEPSIR, 2006)
      • Closed shop. Directive impact: minor; public sector bodies continue to control the supply chain. Main condition: information is strongly linked with the functioning of public bodies. Example: cadastral information.
      • Battlefield. Directive impact: non-negligible; new entrants step into the downstream market. Main condition: information is important while not strategic for the PA. Example: meteorological data.
      • [Unnamed scenario]. Directive impact: strong; the public sector enlarges its influence over the downstream stages. Main condition: digitalization offers new opportunities for value extraction. Example: legal information.
      • Playground. Directive impact: non-negligible; the public sector has only the role of information holder. Main condition: information reuse generates high demand volumes from citizens and firms. Example: traffic and transport information.
  39. The price of PSI: Externalities & Policy
      All pricing strategies carry potential risks of inefficiency for PSI holders (due to a lack of incentives to reduce costs and/or improve quality).
      • The importance of the regulatory framework
      • The central role of externalities
  40. Open Data: formats
  41. Linked open data and the Semantic Web
      “The Semantic Web isn't just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data.” (Tim Berners-Lee)
      1. Use URIs as names for things.
      2. Use HTTP URIs so that people can look up those names.
      3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
      4. Include links to other URIs, so that they can discover more things.
      Ref: http://www.w3.org/DesignIssues/LinkedData.html
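The four rules above can be sketched in a few lines of Python. Here a dictionary stands in for the Web: "looking up" an HTTP URI returns the triples published about it, and any object that is itself an HTTP URI is a link worth following. All `example.org` URIs and predicates are hypothetical; a real client would issue HTTP requests and parse RDF instead of reading a dict.

```python
from collections import deque

# Toy, in-memory stand-in for the Web of Data (hypothetical URIs).
WEB = {
    "http://example.org/city/Turin": [
        ("http://example.org/city/Turin", "http://example.org/vocab/name", "Turin"),
        ("http://example.org/city/Turin", "http://example.org/vocab/airport",
         "http://example.org/airport/TRN"),
    ],
    "http://example.org/airport/TRN": [
        ("http://example.org/airport/TRN", "http://example.org/vocab/name", "Turin Airport"),
        ("http://example.org/airport/TRN", "http://example.org/vocab/servesRegion",
         "http://example.org/region/Piedmont"),
    ],
    "http://example.org/region/Piedmont": [
        ("http://example.org/region/Piedmont", "http://example.org/vocab/name", "Piedmont"),
    ],
}


def look_up(uri):
    """Rule 3: dereferencing a URI returns the triples published about it (empty if none)."""
    return WEB.get(uri, [])


def is_link(term):
    """Rule 2: names are HTTP URIs, so any http(s) object is a link to another resource."""
    return term.startswith(("http://", "https://"))


def follow_links(start, max_hops=2):
    """Rule 4: discover related data breadth-first by following links up to max_hops away."""
    seen, queue, found = {start}, deque([(start, 0)]), []
    while queue:
        uri, hops = queue.popleft()
        for s, p, o in look_up(uri):
            found.append((s, p, o))
            if is_link(o) and o not in seen and hops < max_hops:
                seen.add(o)
                queue.append((o, hops + 1))
    return found
```

Starting from the city, two hops are enough to reach the region: the client never had the region's data in hand, it discovered it through links.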
  42. (image only)
  43. Linked open data: basic principles
      1. Everything has a name (people, locations, etc.).
      2. Every name starts with http://
      3. All data are described using RDF (Resource Description Framework, a W3C standard).
      Tim Berners-Lee's talk on linked data: http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
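To make the principles concrete, here is a small Python sketch that writes triples in N-Triples, one of the W3C serializations of RDF: names are written as `<IRI>`, literals in double quotes, and each statement ends with ` .`. It enforces the slide's rule that subjects and predicates are names starting with http://. It only handles plain string literals (no language tags, datatypes or blank nodes), and the `example.org` subject is hypothetical.

```python
def term(value):
    """Render an RDF term in N-Triples: HTTP names as <IRI>, anything else as a plain literal."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) triples, one N-Triples statement per line.

    Subjects and predicates must be HTTP names; objects may be names or literals.
    """
    lines = []
    for s, p, o in triples:
        for name in (s, p):
            if not name.startswith(("http://", "https://")):
                raise ValueError(f"not an HTTP name: {name!r}")
        lines.append(f"{term(s)} {term(p)} {term(o)} .")
    return "\n".join(lines)


if __name__ == "__main__":
    RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
    print(to_ntriples([("http://example.org/airport/TRN", RDFS_LABEL, "Turin Airport")]))
```

A real project would normally use an RDF library rather than hand-rolled serialization; the point here is only how little structure the format needs.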
  44. Data as an RDF graph
  45. The Vision: a global interconnected database
  46. The Vision: mix data on-the-fly
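The "global interconnected database" idea rests on one property: when two publishers use the same HTTP URI for the same thing, their graphs can be combined without any schema negotiation. The Python sketch below merges two hypothetical sources (all `example.org` URIs are invented for illustration) and answers a question that neither source can answer alone.

```python
# Hypothetical vocabulary and resources; both sources use the same HTTP URIs.
TRN = "http://example.org/airport/TRN"
CASELLE = "http://example.org/town/Caselle"
NAME = "http://example.org/vocab/name"
LOCATED_IN = "http://example.org/vocab/locatedIn"

# Source A (e.g. a transport agency) knows the airport and where it is.
source_a = {
    (TRN, NAME, "Turin Airport"),
    (TRN, LOCATED_IN, CASELLE),
}
# Source B (e.g. a municipal register) knows about the town.
source_b = {
    (CASELLE, NAME, "Caselle Torinese"),
}


def merge(*graphs):
    """With shared URIs (and no blank nodes), combining graphs is a set union of triples."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def town_names(graph, airport):
    """Join across sources: airport -> locatedIn -> town -> name."""
    return {
        name
        for town in objects(graph, airport, LOCATED_IN)
        for name in objects(graph, town, NAME)
    }
```

`town_names(source_a, TRN)` is empty, while `town_names(merge(source_a, source_b), TRN)` finds the town's name. Real RDF merges also have to keep blank nodes from different sources apart, which a plain set union does not do.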
  47. Linked data: hands on
      DBpedia provides Wikipedia information as Linked Data. Example, Turin airport: http://dbpedia.org/page/Turin_Caselle_Airport
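The `/page/` address on the slide is the human-readable view; the thing itself is named by a `/resource/` URI, and machine-readable RDF is served from a `/data/` address or via HTTP content negotiation. The Python sketch below builds those addresses and an RDF request without sending it. The URI patterns and the `.ntriples` suffix reflect DBpedia's conventions as assumed here and should be checked against the live service; the function names are hypothetical.

```python
from urllib.request import Request

DBPEDIA = "http://dbpedia.org"


def dbpedia_uris(title):
    """Map a Wikipedia-style title to DBpedia resource, page and data addresses.

    Assumes an ASCII title; non-ASCII titles would need percent-encoding.
    """
    key = title.strip().replace(" ", "_")
    return {
        "resource": f"{DBPEDIA}/resource/{key}",   # the name of the thing itself
        "page": f"{DBPEDIA}/page/{key}",           # human-readable HTML view
        "data": f"{DBPEDIA}/data/{key}.ntriples",  # RDF document (assumed suffix)
    }


def rdf_request(resource_uri, media_type="text/turtle"):
    """Build, but do not send, a request asking for RDF via content negotiation."""
    return Request(resource_uri, headers={"Accept": media_type})


if __name__ == "__main__":
    uris = dbpedia_uris("Turin Caselle Airport")
    for kind, uri in uris.items():
        print(f"{kind:8s} {uri}")
```

Passing the request to `urllib.request.urlopen` would, if the service behaves as assumed, redirect from the resource URI to an RDF document describing the airport.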
  48. Open Data: license
  49. Open Data licenses (1): OKF
      Open Knowledge Foundation licenses:
      1. Public Domain Dedication and License (PDDL): “Public Domain for data/databases”
      2. Open Data Commons Attribution License (ODC-By): “Attribution for data/databases”
      3. Open Data Commons Open Database License (ODC-ODbL): “Attribution Share-Alike for data/databases”
      Ref: http://www.opendatacommons.org/licenses/
  50. Open Data licenses (2): CC and IODL
      Creative Commons licenses (http://creativecommons.org/licenses/):
      1. CC Zero
      2. CC BY: Attribution
      3. CC SA: Share Alike
      4. CC BY-SA: Attribution and Share Alike
      Italian Open Data License (http://www.formez.it/iodl/):
      • IODL: Italian Open Data License (BY-SA)
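The two license slides boil down to two recurring obligations: attribution and share-alike. A small Python table makes that explicit and lets a re-user see which obligations they take on when combining datasets. The flags follow the slides' own summaries; the short identifiers and the `obligations` helper are hypothetical, and this is not legal advice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class License:
    name: str
    attribution: bool  # the source must be credited on redistribution
    share_alike: bool  # derived works must carry the same license


# Obligations as summarized on the license slides (identifiers are illustrative).
LICENSES = {
    "PDDL": License("Public Domain Dedication and License", False, False),
    "ODC-By": License("Open Data Commons Attribution License", True, False),
    "ODbL": License("Open Data Commons Open Database License", True, True),
    "CC0": License("CC Zero", False, False),
    "CC-BY": License("CC BY: Attribution", True, False),
    "CC-SA": License("CC SA: Share Alike", False, True),
    "CC-BY-SA": License("CC BY-SA: Attribution and Share Alike", True, True),
    "IODL": License("Italian Open Data License (BY-SA)", True, True),
}


def obligations(license_ids):
    """Union of the obligations a re-user accepts when combining datasets under these licenses.

    It does not check whether two different share-alike licenses are compatible
    with each other; that legal question is outside this sketch.
    """
    duties = set()
    for lid in license_ids:
        lic = LICENSES[lid]
        if lic.attribution:
            duties.add("attribution")
        if lic.share_alike:
            duties.add("share-alike")
    return duties
```

For example, mixing PDDL and CC0 data carries no obligations, while adding a single ODbL dataset brings in both attribution and share-alike.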
  51. Examples
  52. Two groups
      I. Transparency
      II. Information services
  53. Transparency
      • Public assemblies (parliament, councils)
      • Public budget and expenses
      • Public procurement
  54. Info services
      • Transportation
      • Environment
      • Cultural heritage
      Ref: http://traintimes.org.uk/map/tube/
  55. Food
  56. Kids
  57. Environment
  58. Transportation. Ref: http://traintimes.org.uk/map/tube/
  59. Data viz
      Ref: http://webdesignledger.com/inspiration/15-stunning-examples-of-data-visualization
      Ref: http://www.gapminder.org/
  60. Where to find open data
      Open (and not open) data archives: http://ckan.net/, http://it.ckan.net/
      Examples of Italian datasets:
      • Dati.gov.it: http://www.dati.gov.it/
      • 5T: http://biennaledemocrazia.it/dataset/
      • Dati Piemonte: http://dati.piemonte.it
      • ISTAT: http://dati.istat.it/
      • Enel: http://data.enel.com/
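CKAN, the software behind the archives listed above, exposes a JSON "action API"; in version 3 a dataset search is a GET on `/api/3/action/package_search` with a `q` parameter, and the reply wraps the matches in `result.count` and `result.results`. The Python sketch below builds such a URL and reads a reply, assuming the portal runs CKAN with the v3 action API. The base URL `http://ckan.example.org` and the sample payload in the tests are hypothetical.

```python
import json
from urllib.parse import urlencode


def package_search_url(base_url, query, rows=10):
    """Build a CKAN v3 dataset search URL: <base>/api/3/action/package_search?q=...&rows=..."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text):
    """Return (total match count, titles on this page) from a package_search JSON reply."""
    data = json.loads(response_text)
    if not data.get("success"):
        raise RuntimeError(f"CKAN call failed: {data.get('error')}")
    result = data["result"]
    return result["count"], [pkg["title"] for pkg in result["results"]]
```

In use, the URL would be fetched with `urllib.request.urlopen(url).read().decode("utf-8")` and the text passed to `dataset_titles`; `rows` controls page size, so `count` can be larger than the number of titles returned.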
  61. Tools and links
      ONLINE DATA VISUALIZATION
      • Google Visualization API: http://code.google.com/intl/it-IT/apis/chart/
      • Tableau Public: http://www.tableausoftware.com/public
      • Open Heat Map: http://www.openheatmap.com/
      ONLINE STORAGE + VISUALIZATION
      • Google Public Data Explorer: http://www.google.com/publicdata/home
      • IBM Many Eyes: http://www-958.ibm.com/software/data/cognos/manyeyes/
      • Google Fusion Tables: http://www.google.com/fusiontables/Home
      • Impure: http://www.impure.com/
      CURATION & LINKING
      • Google Refine
      • Data Wrangler: http://vis.stanford.edu/wrangler/
      OFFLINE TOOLS
      • R: http://www.r-project.org/
      • JavaScript library for data viz: http://thejit.org/
      • This one too: http://vis.stanford.edu/protovis/
      • Network / graph analysis / visualization: http://gephi.org/
      • Turing-complete language for data viz, aimed at visual artists: http://processing.org/
  62. Wrap-up
      1. Not all public data are open data.
      2. Public and government data are often “broken” (strange formats and ambiguous IP).
      3. Open Data makes sense if we put it in perspective: the rise of Big Data.
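One common form of "broken" public data is a spreadsheet export that uses semicolons as field separators, dots as thousands separators and commas as decimal marks, as Italian-locale tools often produce. The Python sketch below normalizes such a file into typed records with the standard `csv` module. The column names are hypothetical, and it assumes the whole file follows the Italian conventions (an English-style value such as `1.5` would be misread as 15).

```python
import csv
import io


def parse_italian_number(text):
    """Convert '1.234,5' (dot thousands separator, comma decimal mark) to 1234.5; '' to None."""
    cleaned = text.strip().replace(".", "").replace(",", ".")
    return float(cleaned) if cleaned else None


def load_records(raw_text, numeric_columns):
    """Read a semicolon-delimited export, trim header names and values, convert numeric columns."""
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=";")
    records = []
    for row in reader:
        # Skip the None key DictReader uses for surplus fields; missing fields become "".
        clean = {key.strip(): (value or "").strip()
                 for key, value in row.items() if key is not None}
        for col in numeric_columns:
            clean[col] = parse_italian_number(clean.get(col, ""))
        records.append(clean)
    return records


if __name__ == "__main__":
    raw = "municipality ; amount\nTorino;1.234,5\nCaselle;  12,75\n"
    for record in load_records(raw, ["amount"]):
        print(record)
```

Cleaning steps like this are usually the first, unglamorous part of any open data project; tools such as Google Refine, listed on the previous slide, do the same job interactively.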
  63. Everything is changing
  64. Thanks! lorenzo.benussi@top-ix.org