Open Data Trentino - Seminar at Universidad Simon Bolivar - 15th October 2013

Seminar on Open Data at Universidad Simon Bolivar presented by Lorenzino Vaccari. Authors: Juan Pane, Lorenzino Vaccari.
Contributions (CC-BY) from Maurizio Napolitano: Slides 7,8, 55,56,57 and from 61 to 69

Five parts:
1. Open Data: introduction
2. Open Data: Issues
3. Open Data in Trentino Project
4. Open data: Applications
5. Open Data: Semantic Issues

  • http://www.infogineering.net/data-information-knowledge.htm
    Knowledge
    Firstly, let’s look at Knowledge. Knowledge is what we know. Think of this as the map of the World we build inside our brains. Like a physical map, it helps us know where things are – but it contains more than that. It also contains our beliefs and expectations. “If I do this, I will probably get that.” Crucially, the brain links all these things together into a giant network of ideas, memories, predictions, beliefs, etc.
    It is from this “map” that we base our decisions, not the real world itself. Our brains constantly update this map from the signals coming through our eyes, ears, nose, mouth and skin.
    You can’t currently store knowledge in anything other than a brain, because a brain connects it all together. Everything is inter-connected in the brain. Computers are not artificial brains. They don’t understand what they are processing, and can’t make independent decisions based upon what you tell them.
    There are two sources that the brain uses to build this knowledge - information and data.
    Data
    Data is/are the facts of the World. For example, take yourself. You may be 5ft tall, have brown hair and blue eyes. All of this is “data”. You have brown hair whether this is written down somewhere or not.
    In many ways, data can be thought of as a description of the World. We can perceive this data with our senses, and then the brain can process this.
    Human beings have used data as long as we’ve existed to form knowledge of the world.
    Until we started using information, all we could use was data directly. If you wanted to know how tall I was, you would have to come and look at me. Our knowledge was limited by our direct experiences.
    Information
    Information allows us to expand our knowledge beyond the range of our senses. We can capture data in information, then move it about so that other people can access it at different times.
    Here is a simple analogy for you.
    If I take a picture of you, the photograph is information. But what you look like is data.
    I can move the photo of you around, send it to other people via e-mail etc. However, I’m not actually moving you around – or what you look like. I’m simply allowing other people who can’t directly see you from where they are to know what you look like. If I lose or destroy the photo, this doesn’t change how you look.
    So, in the case of the lost tax records, the CDs were information. The information was lost, but the data wasn’t. Mrs Jones still lives at 14 Whitewater road, and she was still born on 15th August 1971.
    The Infogineering Model (below) explains how these interact…
  • Use a strategy that is inclusive and that creates community and engagement around OGD, centered on use and need rather than format (from NAPO).
    It is more important to raise awareness of what can be done with open data than to spend that time trying to open it perfectly (use the example of Mexico (only an RDF endpoint) vs. Mexico DF or Chile).
    Rule of 1, 9, 90 of Wikipedia and OpenStreetMap:
    1 that creates
    9 that read and edit
    90 that consume
    Open data engagement should:
    * Be demand driven
    Are your choices about the data you release, how it is structured, and the tools and support provided around it based on community needs and demands? Have you got ways of listening to people's requests for data, and responding with open data?
    * * Put the data in context
    Do you provide clear information to describe the data you provide, including information about frequency of updates, data formats and data quality? Do you include qualitative information alongside datasets, such as details of how the data was created, or manuals for working with the data? Do you link from data catalogue pages to analysis of the data that your organisation, or third parties, have already carried out with it, or to third-party tools for working with the data?
    * * * Support conversation around the data
    Can people comment on datasets, or create a structured conversation around data to network with other data users? Do you join the conversations? Are there easy ways to contact the individual 'data owner' in your organisation to ask them questions about the data, or to get them to join the conversation? Are there offline opportunities to have conversations that involve your data?
    * * * * Build capacity, skills and networks
    Do you provide or link to tools for people to work with your datasets? Do you provide or link to How To guidance on using open data analysis tools, so people can build their capacity and skills to interpret and use data in the ways they want to? Are these links contextual (e.g. pointing people to GeoData tools for a geo dataset, and to statistical tools for a performance monitoring dataset)? Do you go out into the community to run skill-building sessions on using data in particular ways, or using particular datasets? Do you sponsor or engage with community capacity building?
    * * * * * Collaborate on data as a common resource
    Do you have feedback loops so people can help you improve your datasets? Do you collaborate with the community to create new data resources (e.g. derived datasets)? Do you broker or provide support to people to build and sustain useful tools and services that work with your data? Do you work with other organisations to connect up your data sources?
  • https://www.gov.uk/government/publications/open-data-charter/g8-open-data-charter-and-technical-annex
    Principle 1: Open Data by Default
    11) We recognise that free access to, and subsequent re-use of, open data are of significant value to society and the economy.
    12) We agree to orient our governments towards open data by default.
    13) We recognise that the term government data is meant in the widest sense possible. This could apply to data owned by national, federal, local, or international government bodies, or by the wider public sector.
    14) We recognise that there is national and international legislation, in particular pertaining to intellectual property, personally-identifiable and sensitive information, which must be observed.
    15) We will:
      establish an expectation that all government data be published openly by default, as outlined in this Charter, while recognising that there are legitimate reasons why some data cannot be released.
    Principle 2: Quality and Quantity
    16) We recognise that governments and the public sector hold vast amounts of information that may be of interest to citizens.
    17) We also recognise that it may take time to prepare high-quality data, and the importance of consulting with each other and with national, and wider, open data users to identify which data to prioritise for release or improvement.
    18) We will:
      release high-quality open data that are timely, comprehensive, and accurate. To the extent possible, data will be in their original, unmodified form and at the finest level of granularity available;
      ensure that information in the data is written in plain, clear language, so that it can be understood by all, though this Charter does not require translation into other languages;
      make sure that data are fully described, so that consumers have sufficient information to understand their strengths, weaknesses, analytical limitations, and security requirements, as well as how to process the data; and
      release data as early as possible, allow users to provide feedback, and then continue to make revisions to ensure the highest standards of open data quality are met.
    Principle 3: Usable by All
    19) We agree to release data in a way that helps all people to obtain and re-use it.
    20) We recognise that open data should be available free of charge in order to encourage their most widespread use.
    21) We agree that when open data are released, it should be done without bureaucratic or administrative barriers, such as registration requirements, which can deter people from accessing the data.
    22) We will:
      release data in open formats wherever possible, ensuring that the data are available to the widest range of users for the widest range of purposes; and
      release as much data as possible, and where it is not possible to offer free access at present, promote the benefits and encourage the allowance of free access to data. In many cases this will include providing data in multiple formats, so that they can be processed by computers and understood by people.
    Principle 4: Releasing Data for Improved Governance
    23) We recognise that the release of open data strengthens our democratic institutions and encourages better policy-making to meet the needs of our citizens. This is true not only in our own countries but across the world.
    24) We also recognise that interest in open data is growing in other multilateral organisations and initiatives.
    25) We will:
      share technical expertise and experience with each other and with other countries across the world so that everyone can reap the benefits of open data; and
      be transparent about our own data collection, standards, and publishing processes, by documenting all of these related processes online.
    Principle 5: Releasing Data for Innovation
    26) Recognising the importance of diversity in stimulating creativity and innovation, we agree that the more people and organisations that use our data, the greater the social and economic benefits that will be generated. This is true for both commercial and non-commercial uses.
    27) We will:
      work to increase open data literacy and encourage people, such as developers of applications and civil society organisations that work in the field of open data promotion, to unlock the value of open data;
      empower a future generation of data innovators by providing data in machine-readable formats.
  • You can contribute by reporting the status of PSI (Public Sector Information) in your country.
  • Based on the results of a crowdsourcing tool.
    If you want more details about the scoreboard, the list of indicators used in the score is public and the link is available on the slide.
  • Get more info about Tel Aviv.
    Until now everything is happiness and joy, like these kids in Tel Aviv: even in harsh weather, they see the potential to enjoy themselves and use the most basic instruments at their disposal to have fun, provided that their goal is to have fun.
  • BUT, working in an open data initiative can be scary too when the time comes. There are several issues, in several categories, that we need to deal with if we are to have a great project. Let us consider each issue as a kid that will trick-or-treat us in the project: if we are not ready, they will play tricks on us.
    Working in an open data project is like Halloween: you must know that the kids will come and trick-or-treat you, and you must be prepared with candies (solutions). Each kid will possibly (actually, for sure) ask for a different kind of candy.
  • Open Nuts!
    Open Government Data activities in Austria
    Gregor Eibl*, Brigitte Lutz**
    How does opening government data compare to opening nuts? Opening government data can be compared well with the act of opening a nut.
    The kernel of the walnut for example, which is protected by the hard shell and is a valuable food for animals, is a calorie-rich winter food supply for birds, squirrels and other rodents.
    Let’s assume that government data is a rich supply for third parties (businesses, NGOs, citizens, universities, other government agencies…), this information resource is often protected well from third parties. Making this data available and easy to use is one of the core claims in the debate of open government. You will see later in this reflection that the principles of open government data all have the aim to make these valuable information resources easily available.
    Just like cracking a nut to access the rich fruit.
    In this reflection we will shortly talk about the hard shell, which the open government data movement will have to crack. First the activities, which show what Austria has done so far to crack the shell and remove the barriers, and finally the first fruits, which became available through the efforts of opening the nut itself.
    1. The Hard Shell of Open Government Data
    Figure 1 depicts some barriers encountered during the implementation of opening government data in Austrian public administrations. Only through the opening process of government data, data islands were identified, which had to be integrated and harmonized. Other data sources are consciously kept secret, which explains the resistance of some data owners. Arguments were raised that opening can have unpredictable results (overwhelming feedback, new requirements,…) about the quality of data or new technical features.
    On the other hand administrations have to deal with missing resources, like lower budgets in the time of financial crisis, additional distribution cost with the open data portals and missing human resources capable of handling the new tasks.
    Raw data eliminates the possibility to publish only “censored” and non-critical data and information. Some administrations have concerns to publish their data in a quality that was sufficient for internal purposes, but not enough for the broad public audience.
  • http://static.fjcdn.com/pictures/Best_7327f7_1907506.jpg
    http://d3sdoylwcs36el.cloudfront.net/creative-commons-license-types-pros-cons.gif
  • Preparation Phase
    Open Data for Geographic datasets
    Deliberation of the guidelines, which are an Italian best practice
    The catalog (experimental)
    Start of the project
    First group of Data Sources
    The catalog online: http://dati.trentino.it
    Activities on organizational, legal and community issues
    Open Data Challenge
    Analysis of the semantic entities for the PAT
    Join the OKF CKAN community
    Platform evolution
    New processes development
    Data quality and reuse analysis / Impact assessment
    Cooperation with / involvement of the local ICT companies
    Modeling and implementing the semantic entities
    Semantic Tools development
  • The Open Knowledge Foundation (OKF) is a non-profit organisation founded in 2004 and dedicated to promoting open data and open content in all their forms – including government data, publicly funded research and public domain cultural content. The OKF CKAN project is the world’s leading open source data portal platform.
    CKAN (Comprehensive Knowledge Archive Network) is a powerful data management system that makes data accessible – by providing tools to streamline publishing, sharing, finding and using data. CKAN is aimed at data publishers (national and regional governments, companies and organizations) wanting to make their data open and available.
  • Besides the offline activities we have a social media strategy (Twitter, Facebook, etc.).
  • The Open Data Institute is catalysing the evolution of open data culture to create economic, environmental, and social value. It helps unlock supply, generates demand, creates and disseminates knowledge to address local and global issues.
    We convene world-class experts to collaborate, incubate, nurture and mentor new ideas, and promote innovation. We enable anyone to learn and engage with open data, and empower our teams to help others through professional coaching and mentoring.
  • Smart Citizen kit (see Barcelona)
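
The CKAN note above says the platform helps with "finding and using data". As a rough sketch of what finding a dataset could look like from code, the snippet below builds a search request for CKAN's Action API (`package_search`) and pulls the dataset titles out of a response. It assumes that the catalog at http://dati.trentino.it serves the standard CKAN Action API under `/api/3/action/`, which the slides do not state. The helper names `build_search_url` and `dataset_titles` are hypothetical, and the sample response is invented for illustration.

```python
import json
from urllib.parse import urlencode

# Assumption: the portal runs CKAN and serves the standard Action API here.
CATALOG_URL = "http://dati.trentino.it"


def build_search_url(query, rows=5, base_url=CATALOG_URL):
    """Return the URL of a CKAN package_search call for a free-text query."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url}/api/3/action/package_search?{params}"


def dataset_titles(response_text):
    """Extract dataset titles from the JSON body of a package_search reply."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    return [dataset["title"] for dataset in payload["result"]["results"]]


# Invented sample, shaped like a CKAN package_search response.
SAMPLE_RESPONSE = json.dumps({
    "success": True,
    "result": {
        "count": 2,
        "results": [{"title": "Weather stations"}, {"title": "Bus timetables"}],
    },
})

if __name__ == "__main__":
    print(build_search_url("meteo"))
    print(dataset_titles(SAMPLE_RESPONSE))
```

To query a live catalog, the URL from `build_search_url` could be passed to `urllib.request.urlopen` and the decoded body handed to `dataset_titles`; parsing is kept separate from the network call so it can be tried offline.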

Open Data Trentino - Seminar at Universidad Simon Bolivar - 15th October 2013 Presentation Transcript

  • 1. Open Government Data Seminar @USB* Lorenzino Vaccari (1), Juan Pane (2). (1) Autonomous Province of Trento, Trento, Italy, lorenzino.vaccari@provincia.tn.it. (2) University of Trento, Trento, Italy – Universidad Nacional de Asuncion, Asuncion, Paraguay, pane@disi.unitn.it. *This presentation is taken from the “Open Government Data Tutorial” presented at CLEI2013. Lorenzino Vaccari, Juan Pane 22/10/13 http://dati.trentino.it
  • 2. Goal of the Seminar • Introduce Open Government Data • Intro, Issues (Part 1) • If you need it, how can you organize it? • Real experience (Part 2) • Methods for opening data • Applications (Part 3) • Semantic Issues (Part 4)
  • 3. http://www.point-fort.com/index.php?2012/01/25/805-why-how-what
  • 4. What? Open data “is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and sharealike.” (Source: http://www.opendefinition.org)
  • 5. “open” = use, reuse, redistribution, commercial reuse, derivative works. BUT, may require: attribution, share alike. http://myfbcovers.com/uploads/covers/2012/06/09/16628a1094aa012f7c6e0025902480d2/watermarked_cover.jpg J. Gray (OKF): http://www.slideshare.net/jwyg/open-government-data-what-why-how
  • 6. The value is in its use
  • 7. Maurizio Napolitano: http://www.youtube.com/watch?v=YlkjrVAW43Q
  • 8. Is open data useful? Maurizio Napolitano: http://www.youtube.com/watch?v=YlkjrVAW43Q
  • 9. Open Data Benefits. Open data are the knowledge base to: • Improve economic growth and entrepreneurship, based on the development of digital services that reuse Public Sector Information • Answer social needs through the publication of innovative services and applications • Reduce the cost of public administration activities within Public-Private Partnerships (PPP) • Improve the transparency of the activities of public institutions and the participation of citizens in these activities
  • 10. Principles: Tim Berners-Lee (5 Stars of Linked Open Data) http://5stardata.info/ vs. Tim Davies (5 Stars of Open Data Engagement) http://www.timdavies.org.uk/2012/01/21/5-stars-of-open-data-engagement/
  • 11. 5 Stars of Linked Open Data (Tim Berners-Lee) http://5stardata.info
  • 12. 5 Stars of Open Data Engagement (Tim Davies): * Be demand driven; * * Provide context; * * * Support conversation; * * * * Build capacity & skills; * * * * * Collaborate with the community http://www.timdavies.org.uk/2012/01/21/5-stars-of-open-data-engagement/
  • 13. Create Community http://msnbcmedia.msn.com/j/MSNBC/Components/Photo/_new/pb-121007-spain-tarragona-pyramid-nj-02.photoblog900.jpg
  • 14. Open Government Data
  • 15. State of the Art: What is happening around us? Globally, in Europe, in Latin America
  • 16. Open Data Charter - G8. The principles are: Open Data by Default; Quality and Quantity; Usable by All; Releasing Data for Improved Governance; Releasing Data for Innovation https://www.gov.uk/government/publications/open-data-charter/g8-open-data-charter-and-technical-annex http://opensource.com/government
  • 17. OGD around the world http://census.okfn.org/ http://opensource.com/government
  • 18. http://census.okfn.org/country/ http://opensource.com/government
  • 19. OGD in Europe http://open-data.europa.eu/
  • 20. OGD in Europe screenshots http://epsiplatform.eu/content/european-psi-scoreboard
  • 21. OGD in Europe Insert table http://epsiplatform.eu/content/european-psi-scoreboard http://epsiplatform.eu/content/psi-scoreboard-indicator-list
  • 22. OGD in Italy http://www.dati.gov.it/content/infografica
  • 23. OGD in Latin America* *In Venezuela some OD projects have been started by the USB
  • 24. OGD: Part 2 - Issues Questions?
  • 25. http://evian-thesource.com/kids-having-fun/
  • 26. Open Data. Oh ohh. Barriers: Organizational, Legal, Adoption, Technical, Contextual http://www.wallpapermania.eu/wallpaper/trick-or-treat-cute-pumpkins-lanterns-halloween-wallpaper
  • 27. http://de.straba.us/wp-content/uploads/2012/08/barrieres_for_implementation_of_ogd.png
  • 28. Organizational Barriers: Not ready (lack of resources: IT, human). Don’t want to be ready. http://montcomediation.org/images/MCMC_MyWayYourWay.jpg
  • 29. Legal barriers. Open the Data: all data produced using public money has to be made publicly available (with exceptions) vs. Privacy: you cannot open data that could allow correlation of private personal data. Or the complete lack of legislation!
  • 30. Adoption barriers: Data is not contextualized. People are not informed. Opening data is a complex task; opening cleaned data is even more complex. Unclear licenses.
  • 31. Technical Barriers • Access to data: organizational; technical (downtimes, logins); payment fees • Fragmentation, incomplete data, scattered • Format • Cataloging, indexing, search • Lack of explicit semantics, metadata • Data is not reliable • Conflicting standards, models, ontologies
  • 32. Barriers. Zuiderwijk et al. 2010 (http://www.ejeg.com/issue/download.html?idArticle=255) listed 118 socio-technical impediments for opening data in the literature: • Findability • Usability • Understandability • Quality • Linking • Comparability and compatibility • Metadata • …
  • 33. Context Barriers: Privileged access to data. Other companies want to avoid privacy legislation. Transparency is bad for fraudulent business. http://img.gawkerassets.com/img/182n8vzdlg1iojpg/original.jpg
  • 34. http://netdna.webdesignerdepot.com/uploads/photo_manipulation/manipulation-9.jpg
  • 35. Part 3 - Real Experience Questions?
  • 36. http://goo.gl/T2Xp80
  • 37. The “Open Data in Trentino” project • The “Open Data in Trentino” project is a three-year initiative aimed at developing an open data infrastructure to enhance service innovation in Trentino, following the PAT strategy for ICT-enabled service innovation. The project will be developed in partnership between Trento RISE and the Autonomous Province of Trento (PAT), according to the PAT innovation model • Goals • Improved quality of life for citizens • Open Data and local businesses • Transparency • Improved efficiency and productivity
  • 38. Workplan - Steps
  • 39. Name (acronym); data type; file extension; description
    Comma Separated Value (CSV); tabular data; .csv; text format for exchanging tables, in which rows correspond to lines and the values of the individual columns are separated by a comma (or semicolon)
    Geographic Markup Language (GML); vector geographic data; .gml; XML format for exchanging vector spatial data
    Keyhole Markup Language (KML); vector geographic data; .kml; XML-based format created to manage three-dimensional spatial data in Google Earth and Google Maps
    Open Document Format (ODF); tabular data; .odc; format for storing and exchanging text documents, spreadsheets, diagrams and presentations
    Resource Description Framework (RDF); structured data; .rdf; XML-based, it is the basic tool proposed by the World Wide Web Consortium (W3C) for encoding, exchanging and reusing structured metadata, and it enables interoperability between applications that exchange information on the Web
    ESRI Shapefile (SHP); vector geographic data; .shp, .shx, .dbf, .prj; the ESRI Shapefile is a popular vector format for geographic information systems. The geographic data is normally distributed as three or four files (four if the coordinate reference system is specified). The format was released by ESRI as an (almost) open format
    Extensible Markup Language (XML); structured data; .xml; a markup format, i.e. one based on a mechanism that lets you define and control the meaning of the elements in a document or text through tags (markup)
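
The format table on slide 39 can be read as a lookup from file extension to format name and data type. The sketch below transcribes that table into a dictionary and classifies a file name by its extension. The English labels are translations of the slide's Italian terms, the `.odc` extension is copied from the slide as-is, and the function name `describe` is a hypothetical choice for this example.

```python
from pathlib import Path

SHAPEFILE = ("ESRI Shapefile (SHP)", "vector geographic data")

# Extension -> (format name, data type), transcribed from the slide's table.
FORMATS = {
    ".csv": ("Comma Separated Value (CSV)", "tabular data"),
    ".gml": ("Geographic Markup Language (GML)", "vector geographic data"),
    ".kml": ("Keyhole Markup Language (KML)", "vector geographic data"),
    ".odc": ("Open Document Format (ODF)", "tabular data"),
    ".rdf": ("Resource Description Framework (RDF)", "structured data"),
    # A shapefile is distributed as three or four companion files.
    ".shp": SHAPEFILE,
    ".shx": SHAPEFILE,
    ".dbf": SHAPEFILE,
    ".prj": SHAPEFILE,
    ".xml": ("Extensible Markup Language (XML)", "structured data"),
}


def describe(filename):
    """Return (format name, data type) for a file name, or None if unlisted."""
    suffix = Path(filename).suffix.lower()
    return FORMATS.get(suffix)


if __name__ == "__main__":
    for name in ["strade.SHP", "meteo.csv", "report.docx"]:
        print(name, "->", describe(name))
```

Matching on the extension alone is only a first guess: a `.xml` file may well contain GML or RDF, so a real catalog would also look at the content or at the metadata supplied by the publisher.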
  • 40. Technological platform (diagram; data sources shown: Municipality of Trento, Weather, GeoData, Statistics, Transport, etc.)
  • 41. Catalog http://okfn.org (2004) The Open Knowledge Foundation (OKF) is a non-profit organisation founded in 2004 and dedicated to promoting open data and open content in all their forms – including government data, publicly funded research and public domain cultural content.
  • 42. http://dati.trentino.it* (*Available for all the data providers of Trentino) Analysis: http://dati.trentino.it/stats Admin: http://dati.trentino.it/admin Harvesting: http://dati.trentino.it/harvest
  • 43. Services
  • 44. Trentino is also going to launch a challenge to build software applications and creative products (multimedia, audiovisual products, posters, illustrations) based on the datasets published on the http://dati.trentino.it open data catalog. #ODTChallenge will be the official hashtag for our first open data challenge in Trentino!
  • 45.
  • 46. 567 datasets provided by 10 departments of PAT (Agriculture, Culture, Geographical Data, Welfare, Weather Forecast, Social policies, Statistics, Transports, …), the Municipality of Trento, and Informatica Trentina. In the 7 months until now: 68,555 visits, 7,988 unique visits, 2,516 downloads; 62.64% new visitors, 37.36% returning visitors. Feedback: 20 reporting errors, 15 asking for new data, 10 new suggestions, 6 OD applications. NOW: ALL the departments demand to be involved, plus other local actors. 100% ENTHUSIASTIC REACTIONS
  • 47. Want to Know & Learn more?
  • 48. http://www.theodi.org/
  • 49. http://schoolofdata.org/
  • 50. http://opendatahandbook.org/pt_BR/
  • 51. http://www.od4d.org/category/open-data/how-to/
  • 52. http://schoolofdata.org/online-resources/
  • 53. Thanks to the project team! • General Manager: Isabella Bressan • Project coordinator: Lorenzino Vaccari • Organizational/Communication issues: Francesca Gleria, Roberto Cibin • Data gatherer: Luca Paolazzi • Catalog: Maurizio Napolitano, Samuele Santi • Semantics: Juan Pane, David Leoni, Alberto Zanella • Legal issues: Eleonora Bassi, Stefano Leucci • Communities: Maurizio Napolitano, Francesca De Chiara • System integration: Marco Combetto, Lorenzo Dallapè • Statistical Linked Data: Pavel Shvaiko
  • 54. OGD: Part 4 - Applications Questions?
  • 55. Apps4Italy
  • 56. Best Application: http://parlamento17.openpolis.it/
  • 57. Best Idea: Open Bilancio http://opendata.comune.fi.it/open_bilancio/
  • 58. What? DAL America Latina (2012): http://desarrollandoamerica.org/aplicaciones-2012/ DAL America Latina (2013): http://2013.desarrollandoamerica.org/appschallenge/
  • 59. http://limaio.innovacion.pe/ http://www.limaio.com/demo
  • 60. http://www.mysociety.org/2007/more-travel-maps/morehousing
  • 61. Johann MITTHEISZ (CIO of the City of Vienna): Total hours to develop 38 applications: around 2,600. The City of Vienna saved around 208,000 Euro. http://www.slideshare.net/BrigitteLutz/keynote-mittheisz-cio-stadt-wien/16
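
The three figures on the Vienna slide (38 applications, about 2,600 development hours, about 208,000 Euro saved) can be combined with some quick arithmetic. The slide does not say how the savings were estimated; the hourly rate computed below is only what the two totals imply when divided, not a figure given by the source.

```python
# Figures quoted on the Vienna slide (all described there as approximate).
applications = 38
development_hours = 2_600
estimated_savings_eur = 208_000

# Inference, not stated on the slide: savings divided by hours gives the
# hourly valuation that would make the two totals consistent.
implied_hourly_rate = estimated_savings_eur / development_hours

# Averages per application, for a sense of scale.
hours_per_app = development_hours / applications
savings_per_app = estimated_savings_eur / applications

if __name__ == "__main__":
    print(f"Implied rate: {implied_hourly_rate:.0f} EUR/hour")
    print(f"Average effort: {hours_per_app:.1f} hours per application")
    print(f"Average saving: {savings_per_app:.0f} EUR per application")
```

Read this way, the savings figure looks like the community's development time valued at a flat rate of 80 Euro per hour, which would be work the city did not have to commission itself.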
  • 62. The Open Data Ecosystem (and the OpenStreetMap case) 62 Lorenzino Vaccari, Juan Pane 22/10/13
  • 63. 63 Lorenzino Vaccari, Juan Pane 22/10/13
  • 64. OpenStreetMap OpenStreetMap project creates and provides geographical data, such as road maps, freely available to anyone. Behind the establishment and growth of the project have been restrictions on use or availability of map information across much of the world and the advent of inexpensive portable satellite navigation devices. OpenStreetMap is a free map of the world, created by someone like you 64 Lorenzino Vaccari, Juan Pane ~ 22/10/13
  • 65. http://tools.geofabrik.de/mc/?mt0=mapnik&mt1=googlemap&lon=11.12042&lat=46.07224&zoom=18
  • 66. http://haiti.ushahidi.com
  • 67. Watercolor maps: http://content.stamen.com/files/cartography/index_watercolor.html#18.00/46.07204/11.12097
  • 68. From maps to blankets… http://softcities.net
  • 69. Sharing Data Globally (the eHabitat example)
  • 70. 21st Century Challenges. Source: http://www.slideshare.net/angeled/geoss © GEO secretariat
  • 71. The Group on Earth Observations (GEO): 84 GEO members and 61 participating organizations. Source: http://www.slideshare.net/angeled/geoss © GEO secretariat
  • 72. GEOSS Data Sharing Principles • Full and Open Exchange of Data, recognizing Relevant International Instruments and National Policies • Data and Products at Minimum Time Delay and Minimum Cost • Free of Charge or Minimal Cost for Research and Education. http://www.geoportal.org/web/guest/geo_home
  • 73. “Venezuela is considered a state with extremely high biodiversity, with habitats ranging from the Andes mountains in the west to the Amazon Basin rainforest in the south, via extensive llanos plains and Caribbean coast in the center and the Orinoco River Delta in the east.” Source: Wikipedia
  • 74. GEOSS for biodiversity: http://www.eurogeoss-broker.eu/
  • 75. The eHabitat Model: http://ehabitat-wps.jrc.ec.europa.eu/ehabitat/
  • 76. OGD: Part 5 - Semantics. Questions?
  • 77. Linked Open Data: Available • Structured • Open formats • Dereferenceable • Linked. “The best data is open data” vs. “All data must be perfect”
  • 78. Lack of explicit semantics: the real meaning of the data stayed in the developer's mind when the data was created. http://goo.gl/npEHKr
  • 79. Lack of explicit semantics can lead to things like:
  • 80. Semantic heterogeneity: differences in the meaning of local data
  • 81. Issues when Opening Trentino Data • Each department has authority over only some part of the data. • Datasets were originally created for internal use only. • Datasets were created for a specific need. • Datasets were created with custom formats, both for structure (with some exceptions) and for data. • Lack of reuse -> duplication. • Lack of programmers. • We cannot (always) TELL them what to do or how to do it. • Data changes.
  • 82. Available • Structured • Open formats • Dereferenceable • Linked. Data Catalog. Entity Centric Semantic Layer.
  • 83. Entity centric: Added value • Aggregated data • Accurate data, manually curated • Unique identifiers, distributed perspectives • Re-think identifiers • Semantified values. Example diagram with two entity descriptions, E1 and E2; recoverable labels: name: Ignacio P. F.; name: Juan Pane; nationality: Italian; born in: Paraguay; lives in: Trento; date of birth: 1980; affiliation: Univ. Trento; affiliation: PF-UNA.
  • 84. Entities Real world: is something that has a distinct, separate existence, although it need not be a material (physical) existence. Has a set of properties, which evolve over time. Example: Mental: personal (local) model created and maintained by a person that references and describes a real world entity. Digital: capture the semantics of real world entities, provided by people. 84 Lorenzino Vaccari, Juan Pane 22/10/13
  • 85. Entity Centric Semantic Layer: addresses the integration problems due to semantic heterogeneity: • Different formats • Different identifiers • Implicit semantics • Homonyms, synonyms, aliases • Partial knowledge • Knowledge evolution. http://www.webfoundation.org/2011/11/5-staropen-data-initiatives/
  • 86. Entity-based Integration • Focus on entities as first-class citizens • Entities are objects important enough in our everyday life to be referred to by name • Each entity has its own metadata (e.g. name, latitude, longitude, …) • Each entity is related to many other entities (e.g. Einstein was born in Ulm, his affiliation was Charles University, Ulm is a city in Germany) • There are relatively “few” commonsense entity types (person, …, event) • There are many domain-specific entities (bus stops, cycling paths, ...) • All components have explicit semantics: schema, entities, attributes, values
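The entity model on this slide (metadata plus relations to other entities) can be illustrated with a minimal Python sketch. The `Entity` class, the identifiers `E1` to `E3`, the relation names and the `follow` helper are all hypothetical names chosen for illustration; the slides do not describe the actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical entity record: a type, its own metadata, and links to other entities."""
    entity_id: str
    entity_type: str
    attributes: dict = field(default_factory=dict)
    relations: dict = field(default_factory=dict)  # relation name -> target entity id


# The example from the slide: Einstein was born in Ulm, Ulm is a city in Germany.
germany = Entity("E3", "country", {"name": "Germany"})
ulm = Entity("E2", "city", {"name": "Ulm"}, {"part_of": germany.entity_id})
einstein = Entity("E1", "person", {"name": "Einstein"}, {"born_in": ulm.entity_id})

store = {e.entity_id: e for e in (einstein, ulm, germany)}


def follow(store, entity_id, relation):
    """Return the entity reached through `relation`, or None if there is no such link."""
    target = store[entity_id].relations.get(relation)
    return store.get(target) if target is not None else None


birthplace = follow(store, "E1", "born_in")
print(birthplace.attributes["name"])
```

Because relations hold identifiers rather than text, "Ulm" is stored once and every entity that refers to it points at the same record.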
  • 87. Importing pipeline, Macro Steps. 1. Domain analysis: study the needed entity types and adapt the knowledge base accordingly (first-time bootstrapping). 2. Import entities: • Semi-automatic tool. • Domain experts are expensive. • Human attention is a scarce resource. • Incremental enrichment and aggregation of entities.
  • 88. Open Data Peculiarities: All data comes from a CKAN repository (DCAT). Process one data file at a time. Each data file can be represented as a table. Each row in the table represents a (partial) entity. The format of the values might not be enforced in the data files. Not all data is relevant.
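The idea that each data file is a table and each row a partial entity can be sketched as follows. This is an illustration only: the CSV content is made up, the function name `rows_as_partial_entities` is hypothetical, and a real file would be downloaded from the catalog rather than read from a string.

```python
import csv
import io

# Made-up sample file; the second row has no value for "provincia".
SAMPLE = "nome,provincia,funivie\nAndalo,Provincia di Trento,3\nCanazei,,2\n"


def rows_as_partial_entities(text):
    """Read a CSV table and yield one dict per row, dropping empty cells."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {key: value for key, value in row.items() if value not in (None, "")}


entities = list(rows_as_partial_entities(SAMPLE))
for entity in entities:
    print(entity)
```

Dropping empty cells makes the "partial" nature of each row explicit: a missing attribute is simply absent, so a later aggregation step can fill it in from another file.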
  • 89. Importing tool process
  • 90. 1. Source Selection: import one data file at a time
  • 91. 2. Schema Matching: select a target type of entity, then define the correspondences between the input columns and the output attributes. Example input:
    nome | provincia | descrizione | funivie | lat | long
    Andalo (1047) | Provincia di Trento | Sorge su un'ampia sella prativa al centro... | 3 | 654463 | 712857
    Canazei (1450) | Trento Prov. | Situato all'estremità settentrionale della... | 2 | 511504 | 147444
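A very simple form of schema matching can be sketched with a hand-written alias table. This is an assumption for illustration, not the matching service used in the project: the target attribute names (`name`, `province`, `cable_cars`, ...) and the `match_schema` function are hypothetical, and the extra input column `note` is invented to show an unmatched column.

```python
# Hypothetical target attributes, each with the column names assumed to mean the same thing.
TARGET_ALIASES = {
    "name": {"nome", "name"},
    "province": {"provincia", "province"},
    "description": {"descrizione", "description"},
    "cable_cars": {"funivie"},
    "latitude": {"lat", "latitude"},
    "longitude": {"long", "lon", "longitude"},
}


def match_schema(columns, aliases=TARGET_ALIASES):
    """Map each input column to a target attribute when a known alias matches."""
    mapping = {}
    for column in columns:
        key = column.strip().lower()
        for attribute, names in aliases.items():
            if key in names:
                mapping[column] = attribute
                break
    return mapping


columns = ["nome", "provincia", "descrizione", "funivie", "lat", "long", "note"]
mapping = match_schema(columns)
unmatched = [column for column in columns if column not in mapping]
print(mapping)
print(unmatched)
```

Columns left in `unmatched` would be handed to a person to map by hand or discard, which fits the semi-automatic nature of the tool and the remark that not all data is relevant.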
  • 92. 3. Data Validation: applies format and structure validation, plus any automatic transformations needed to bring the input data into the expected format.
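As a sketch of what format validation with automatic transformation might look like, the function below cleans one row that has already been mapped to target attributes. The rules are assumptions for illustration: the attribute names, the `validate_row` function, and the decision to split a trailing parenthesised number off the name (as in "Andalo (1047)", whose meaning the slides do not state) are all hypothetical.

```python
import re

# Matches a name followed by a number in parentheses, e.g. "Andalo (1047)".
NAME_WITH_NUMBER = re.compile(r"^(?P<name>.+?)\s*\((?P<number>\d+)\)$")


def validate_row(row):
    """Return (clean_row, errors) for one row, applying simple transformations."""
    clean, errors = {}, []
    raw_name = row.get("name", "").strip()
    match = NAME_WITH_NUMBER.match(raw_name)
    if match:
        # Transformation: separate the name from the number embedded in it.
        clean["name"] = match.group("name")
        clean["number_in_name"] = int(match.group("number"))
    elif raw_name:
        clean["name"] = raw_name
    else:
        errors.append("name: missing")
    for key in ("latitude", "longitude"):
        try:
            clean[key] = float(row[key])
        except (KeyError, ValueError):
            errors.append(f"{key}: not a number")
    return clean, errors


clean, errors = validate_row({"name": "Andalo (1047)", "latitude": "654463", "longitude": "712857"})
bad_clean, bad_errors = validate_row({"name": "Canazei", "latitude": "n/a"})
print(clean, errors)
print(bad_clean, bad_errors)
```

Returning the errors next to the cleaned row, instead of raising on the first problem, lets a reviewer see everything that is wrong with a file in one pass.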
  • 93. 4. Semantic Enrichment (1/2): Entity disambiguation: transform text references into links to existing entities.
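Entity disambiguation can be illustrated with the two spellings that appear in the schema matching example, "Provincia di Trento" and "Trento Prov.". The sketch below is a toy approach under stated assumptions: the entity identifiers `E10` and `E11`, the abbreviation and stop-word lists, and the functions `normalise` and `disambiguate` are hypothetical, and a real service would use far richer matching.

```python
import re

ABBREVIATIONS = {"prov": "provincia"}  # assumed abbreviation list
STOP_WORDS = {"di"}                    # assumed words to ignore

# Hypothetical entities already present in the entity store.
KNOWN_ENTITIES = {"E10": "Provincia di Trento", "E11": "Provincia di Bolzano"}


def normalise(text):
    """Reduce a label to an order-independent set of expanded, lower-case tokens."""
    tokens = re.findall(r"\w+", text.lower())
    tokens = [ABBREVIATIONS.get(token, token) for token in tokens]
    return frozenset(token for token in tokens if token not in STOP_WORDS)


INDEX = {normalise(label): entity_id for entity_id, label in KNOWN_ENTITIES.items()}


def disambiguate(reference):
    """Return the id of the known entity a text reference points to, or None."""
    return INDEX.get(normalise(reference))


print(disambiguate("Trento Prov."))
print(disambiguate("Provincia di Trento"))
print(disambiguate("Verona"))
```

After this step both rows of the example table would carry the same link instead of two different strings, which is what makes later aggregation possible.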
  • 94. 4. Semantic Enrichment (2/2): Natural Language Processing: extract concepts and entity references from free text.
  • 95. 5. Reconciliation: run Identity Management algorithms to identify each row as a new or existing entity. Result: • No Match • Match • Multiple Matches. Action: • Use ID • New ID • Ignore Row
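The reconciliation step can be sketched as a function that maps the three possible results to an action. The pairing used here is an assumption, since the slide lists results and actions without linking them: no match gives a new ID, a single match reuses the existing ID, and multiple matches cause the row to be ignored until someone decides. Matching on a case-insensitive name is also a simplification, and `find_matches` and `reconcile` are hypothetical names.

```python
def find_matches(row, existing, key="name"):
    """Return the ids of existing entities whose `key` equals the row's, ignoring case."""
    value = row.get(key, "").strip().lower()
    return [
        entity_id
        for entity_id, entity in existing.items()
        if entity.get(key, "").strip().lower() == value
    ]


def reconcile(row, existing, next_id):
    """Decide what to do with a row: ("new_id" | "use_id" | "ignore_row", id or None)."""
    matches = find_matches(row, existing)
    if not matches:
        return "new_id", next_id
    if len(matches) == 1:
        return "use_id", matches[0]
    return "ignore_row", None


# Made-up entity store with a deliberate duplicate ("Trento" / "trento").
existing = {"E1": {"name": "Andalo"}, "E2": {"name": "Trento"}, "E3": {"name": "trento"}}

print(reconcile({"name": "Canazei"}, existing, "E4"))
print(reconcile({"name": "ANDALO"}, existing, "E4"))
print(reconcile({"name": "Trento"}, existing, "E4"))
```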
  • 96. 6. Exporting. At this point: • We know what to export. • All values for target attributes conform to the expected format. • All text has been semantified (NLP). • All textual references to entities are converted to links. • Each row has an identifier. (Diagram labels: v0, i, i+1.)
  • 97. 7. Publishing: put the semantified entities back into CKAN so that the entities can be Open Data and can be found in the same catalog as the original data. Developers can find the data files of the cleaned, aggregated entities, but can also interact with the entities via the Entitypedia APIs. 8. Visualization: Search and Navigation
  • 98. Semantic Layer: Services. Tool for aiding the “semantification” of the datasets in the catalog based on: • Schema matching services • Identity Management services • Entity Matching services • Global Unique Identifier services • Semantic search and indexing services • Natural Language Processing • Entity store
  • 99. Our Goal: TN, UK, BE, ES
  • 100. BEYOND: http://www.youtube.com/watch?v=Bq_ZWl1ZXA0
  • 101. Thanks! Grazie! Merci! Gracias! Kiitos! Dank u! Gràcies! Gratias! Danke! ευχαριστώ. Lorenzino Vaccari (1), Juan Pane (2). (1) Autonomous Province of Trento, Trento, Italy, lorenzino.vaccari@provincia.tn.it. (2) University of Trento, Trento, Italy – Universidad Nacional de Asuncion, Asuncion, Paraguay, pane@disi.unitn.it. We thank in particular CLEI 2013, Autonomous Province of Trento, TrentoRise association, Universidad Nacional de Asuncion, Universidad Simon Bolivar and University of Trento.