Data Strategies: Metadata, Open Data, Linked Data

In the age of Big Data, filtering mechanisms have to be professionalized to increase accessibility to data. This presentation, held at the Knowledge Management Academy in Vienna, shows how technologies derived from the Semantic Web can help to establish more efficient means of managing data and information.

Usage Rights

© All Rights Reserved

Data Strategies: Metadata, Open Data, Linked Data Presentation Transcript

  • 1. Data Strategies: Metadata, Open Data & Linked Data. Andreas Blumauer, CEO, Semantic Web Company. www.semantic-web.at, www.poolparty.biz
  • 2. About Semantic Web Company
      • Founded in 2001 in Vienna, Austria
      • >20 experts in linked data technologies
      • Product: PoolParty Suite (launched 2009)
      • Serving Global 500 companies & large NGOs
      • EU- & US-based consulting services
  • 3. Some customers we serve • Pearson • Daimler • Wolters Kluwer • Ministry of Finance (AUT) • GBPN • Credit Suisse • Council of EU • Education Services (AUS) • World Bank • Roche • Wood Mackenzie • REEEP
  • 4. Agenda
      • Intro
          • Data management – the current situation
          • Potential & benefits of Linked Open Data (LOD) – what is metadata, open data, linked data, what is linked open data?
      • Use Cases
          • Global Buildings Performance Network (GBPN) & BPIE
          • World Bank Thesauri
          • EIP on Water: Marketplace
          • Renewable Energy & Energy Efficiency Partnership (REEEP)
      • Q&A
  • 5. Which problems can be solved on top of big data?
  • 6. Common interests. Common topics. Water Management
  • 7. Common vocabulary? Common understanding? Wastewater treatment Wastewater treatment Water Management
  • 8. Globalisation + Localisation = Glocalisation Certification (Europe) Rating (U.S.)
  • 9. Common data? Questions in common? Energy management policies Search Water Management
  • 10. The Semantic Puzzle
  • 11. Data Analytics 2.0 The islands are now open for the experts
  • 12. Data management in the environmental sector – The current situation
      • Example: Buildings performance
      • "2012 saw the launch of an impressive number of online portals sharing data and analysis on energy efficiency in buildings" (Ingeborg Nolte, Senior Communication Manager at BPIE)
      • However: how can the value of so many (open) data sets, which are actually isolated from each other, be leveraged?
      • Will Excel be the ultimate solution?
  • 13. What's wrong with Open Data?
      <daycare id="Seven Dwarfs" address="..."> . . . </daycare>
      <kindergarten> <name>Seven Dwarfs</name> <address> <location>...</location> <street>...</street> <zip>...</zip> </address> </kindergarten>
      <child_care name="Seven Dwarfs"> <description>...</description> <text>...</text> </child_care>
      • Syntactic heterogeneity – different trees
      • Semantic heterogeneity – different tags and attributes (e.g. kindergarten, child_care, daycare)
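The two kinds of heterogeneity on this slide can be illustrated with a minimal Python sketch. It assumes three made-up records shaped roughly like the slide's snippets; the shared concept name `childcare_facility` and the helper names are hypothetical, not part of the presentation.

```python
import xml.etree.ElementTree as ET

# Three hypothetical records for the same facility, each using a different schema.
SOURCES = [
    '<daycare id="Seven Dwarfs" address="Example Street 1"/>',
    '<kindergarten><name>Seven Dwarfs</name>'
    '<address><street>Example Street 1</street></address></kindergarten>',
    '<child_care name="Seven Dwarfs"><text>Example text</text></child_care>',
]

# Semantic mapping: different tag names, one shared concept (assumed label).
TAG_TO_CONCEPT = {
    "daycare": "childcare_facility",
    "kindergarten": "childcare_facility",
    "child_care": "childcare_facility",
}


def extract_name(elem):
    """Syntactic mapping: the name sits in a different place in each tree."""
    if elem.tag == "daycare":
        return elem.get("id")
    if elem.tag == "kindergarten":
        return elem.findtext("name")
    if elem.tag == "child_care":
        return elem.get("name")
    raise ValueError(f"unknown record type: {elem.tag}")


def normalise(xml_text):
    """Turn one source record into a common dictionary shape."""
    elem = ET.fromstring(xml_text)
    return {"concept": TAG_TO_CONCEPT[elem.tag], "name": extract_name(elem)}


records = [normalise(source) for source in SOURCES]
```

Every source needs its own hand-written mapping here, which is the cost that shared vocabularies are meant to reduce.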
  • 14. What is metadata, open data or linked data? What is linked open data?
  • 15. Metadata Meta-metadata Metadata Data & Information
  • 16. Thesaurus / Ontology http://voc.org.com/core/355 altLabel Café Central altLabel http://voc.org.com/core/77 Wien broader http://voc.org.com/core/176 narrower http://voc.org.com/core/97 Das Central prefLabel (de) related Places Vienna Vindobona narrower Café Coffeehouse prefLabel prefLabel broader Gastronomy http://voc.org.com/core/54 related http://voc.org.com/core/44 prefLabel Innere Stadt
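A thesaurus like the one pictured can be thought of as a set of subject–predicate–object statements. The sketch below is an illustration only: it reuses the labels from the slide, but the `http://example.org/` URIs and the exact arrangement of broader, related and alternative labels are assumptions, since the original diagram cannot be recovered from the transcript.

```python
BASE = "http://example.org/concept/"  # hypothetical namespace, not the slide's URIs

# (subject, predicate, object) statements in the spirit of SKOS.
TRIPLES = [
    (BASE + "places", "prefLabel", "Places"),
    (BASE + "vienna", "prefLabel", "Vienna"),
    (BASE + "vienna", "altLabel", "Wien"),
    (BASE + "vienna", "altLabel", "Vindobona"),
    (BASE + "vienna", "broader", BASE + "places"),
    (BASE + "innere-stadt", "prefLabel", "Innere Stadt"),
    (BASE + "innere-stadt", "broader", BASE + "vienna"),
    (BASE + "gastronomy", "prefLabel", "Gastronomy"),
    (BASE + "coffeehouse", "prefLabel", "Coffeehouse"),
    (BASE + "coffeehouse", "altLabel", "Café"),
    (BASE + "coffeehouse", "broader", BASE + "gastronomy"),
    (BASE + "cafe-central", "prefLabel", "Café Central"),
    (BASE + "cafe-central", "altLabel", "Das Central"),
    (BASE + "cafe-central", "broader", BASE + "coffeehouse"),
    (BASE + "cafe-central", "related", BASE + "innere-stadt"),
]


def build_index(triples):
    """Index labels, preferred labels and broader links for quick lookup."""
    label_index, pref, broader = {}, {}, {}
    for subject, predicate, obj in triples:
        if predicate in ("prefLabel", "altLabel"):
            label_index[obj.lower()] = subject
        if predicate == "prefLabel":
            pref[subject] = obj
        elif predicate == "broader":
            broader[subject] = obj
    return label_index, pref, broader


LABEL_INDEX, PREF, BROADER = build_index(TRIPLES)


def resolve(label):
    """Return the preferred label for any known label, or None."""
    uri = LABEL_INDEX.get(label.lower())
    return PREF[uri] if uri else None


def broader_path(label):
    """Walk the broader links upwards and return the preferred labels passed."""
    uri = LABEL_INDEX[label.lower()]
    path = []
    while uri in BROADER:
        uri = BROADER[uri]
        path.append(PREF[uri])
    return path
```

With this data, `resolve("Wien")` yields the preferred label "Vienna", and `broader_path("Das Central")` climbs from the café to "Coffeehouse" and then "Gastronomy".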
  • 17. Data Analytics 3.0 – Connected islands based on standards
  • 18. What is linked data, what is linked open data? The Free Universal Construction Kit connects Lego®, Duplo®, Fischertechnik®, Gears! Gears! Gears!®, K'Nex®, Krinkles®, Bristle Blocks®, Lincoln Logs®, Tinkertoys®, Zome®, ZomeTool® and Zoob® with a low-cost 3D-printed adapter set. CC by Golan Levin (US), Shawn Sims (US)
  • 19. LOD as a giant knowledge base. Which policies in the area of renewable energy have helped to initiate projects and programmes in the agricultural sector that have ultimately led to a substantial improvement of the nutritional situation in a certain country?
  • 20. Application example #1: Energy Market Intelligence Scenario #1: I am an energy market researcher at the International Energy Agency (IEA). I inform policy makers about the situation in specific renewable energy areas to develop targeted energy support programs. For my research I need indicators about utilisation levels of all alternative forms of energy with regards to geographical and political categories. http://integrator.poolparty.biz/report_renewable/
  • 21. How does it work?
      • Articles about Renewable Energy
          • 72,018 documents
          • From ~300 web sources
      • Reegle Thesaurus: ~3,000 concepts
          • Traverse hierarchies below main categories (wind, solar, etc.) and classify documents
      • Geonames
          • Annotate documents with regard to their geographical entities
      • DBpedia
          • Look up several YAGO classes for all extracted geographical entities to assert additional categories, e.g. EU countries, French-speaking countries etc.
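The "traverse hierarchies below main categories and classify documents" step can be sketched roughly as follows. This is a minimal illustration, not the PoolParty implementation: the mini-thesaurus is hypothetical, and plain substring matching stands in for whatever entity extraction the real system uses.

```python
# Hypothetical mini-thesaurus: each concept maps to its narrower concepts.
NARROWER = {
    "renewable energy": ["wind energy", "solar energy"],
    "wind energy": ["offshore wind", "wind turbine"],
    "solar energy": ["photovoltaics", "solar thermal"],
}
MAIN_CATEGORIES = ["wind energy", "solar energy"]


def descendants(concept):
    """Return the concept plus everything below it in the hierarchy."""
    found = [concept]
    for child in NARROWER.get(concept, []):
        found.extend(descendants(child))
    return found


def classify(text):
    """Assign every main category whose subtree has a label occurring in the text."""
    lowered = text.lower()
    return [
        category
        for category in MAIN_CATEGORIES
        if any(label in lowered for label in descendants(category))
    ]
```

A document mentioning only "offshore wind" is still filed under "wind energy", because the match is inherited from the narrower concept.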
  • 22. How does it work? Semantic Search Geospatial Search PoolParty Semantic Integrator …. Data Visualisation
  • 23. Application example #2: Health Care Scenario #2: I am an information officer at the Global Health Observatory of the World Health Organisation. I inform policy makers about the global situation in specific disease areas to direct support to the required health support programs. For my research I need data about disease prevalence in relation to socio-economic factors. http://integrator.poolparty.biz/report_medicine/
  • 24. How does it work?
      • PubMed Articles
          • Cardiovascular Diseases: 39,911 documents
          • Neoplasms: 69,937 documents
          • Nervous System Diseases: 48,128 documents
      • MeSH: 26,700 concepts / 346,600 triples
          • Traverse hierarchies below disease main categories and classify documents
      • Geonames
          • Annotate documents with regard to their geographical entities
      • DBpedia
          • Look up the HDI (the Human Development Index is a composite statistic of life expectancy, education, and income indices used to rank countries into four tiers of human development)
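Once documents carry a disease category and a country, the looked-up indicator can be joined in. The sketch below shows the idea with placeholder data only: the countries, tiers and document list are invented for illustration, and the dictionary stands in for values that would be fetched from DBpedia.

```python
from collections import Counter

# Hypothetical annotated documents: (disease category, country found in the text).
ANNOTATED_DOCS = [
    ("Cardiovascular Diseases", "Country A"),
    ("Cardiovascular Diseases", "Country B"),
    ("Neoplasms", "Country A"),
    ("Neoplasms", "Country C"),
    ("Neoplasms", "Country C"),
]

# Placeholder lookup result, standing in for HDI tiers retrieved from DBpedia.
HDI_TIER = {"Country A": "very high", "Country B": "medium", "Country C": "medium"}


def documents_per_tier(docs, tier_by_country):
    """Count documents per (disease category, HDI tier) pair."""
    counts = Counter()
    for category, country in docs:
        tier = tier_by_country.get(country, "unknown")
        counts[(category, tier)] += 1
    return dict(counts)


summary = documents_per_tier(ANNOTATED_DOCS, HDI_TIER)
```

The resulting table of counts is the kind of input a data visualisation layer could chart by category and tier.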
  • 25. How does it work? Semantic Search Geospatial Search PoolParty Semantic Integrator Data Visualisation
  • 26. Data management in the environmental sector – The current situation
      • Example: Energy data
      • "It's necessary to split the responsibility for different data sets between different data providers." (Florian Bauer)
      • However: how can this 'splitting' be coordinated and how can additional positive network effects be stimulated?
  • 27. 5 stars of data standards
      • Publish Open Data in RDF reusing vocabularies which can be understood and combined by apps in unforeseen ways (e.g. visualization widgets)
      • link your data
      • use URIs to denote things
      • use non-proprietary formats (e.g., CSV instead of Excel)
      • make it available as structured data (e.g., Excel instead of image scan of a table)
      • make your stuff available on the Web (whatever format) under an open license
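The levels on this slide build on each other, which can be expressed as a small rating function. This is a sketch under the assumption that the levels are cumulative (a dataset only earns a star if all lower levels are met); the `Dataset` fields are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    on_web_open_license: bool      # available on the Web under an open license
    structured: bool               # structured data rather than a scan
    non_proprietary_format: bool   # e.g. CSV instead of Excel
    uses_uris: bool                # URIs denote things
    linked_to_other_data: bool     # links to other people's data


def star_rating(dataset):
    """Count stars cumulatively: stop at the first level that is not met."""
    levels = [
        dataset.on_web_open_license,
        dataset.structured,
        dataset.non_proprietary_format,
        dataset.uses_uris,
        dataset.linked_to_other_data,
    ]
    stars = 0
    for level_met in levels:
        if not level_met:
            break
        stars += 1
    return stars
```

For example, an openly licensed Excel file is structured but proprietary, so it stops at two stars even if it happens to contain URIs.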
  • 28. Licensing is key for open data (Source: http://www.licensius.com/blog/lodlicenses)
      Kind of license      Num.    %
      Not specified        132     39%
      Public Domain        69      21%
      Attribution          66      20%
      Share alike          35      10%
      Closed               16      5%
      With restrictions    5       2%
      Other                3       1%
  • 29. Use Cases
  • 30. Global Buildings Performance Network (GBPN) The Global Buildings Performance Network (GBPN) is a globally organised and regionally focused network whose mission is to advance best practice policies that can significantly reduce energy consumption and associated CO2 emissions from buildings.
  • 31. Goals: Launch of the GBPN global Knowledge Platform for the Energy Performance of Buildings (www.gbpn.org)
      • Share knowledge
      • Build awareness & showcase best practice
      • Stimulate collective research
      • Stimulate collective analysis from experts worldwide
      • Promote better decision-making
      • Help the building sector effectively reduce its impact on climate change
    Linked Open Data successfully serves these objectives!
  • 32. Technical Solution: Integrated view (& search index), annotation & mapping, publish, enrich. GBPN Knowledge Platform
  • 33. The GBPN Knowledge Platform. The GBPN Knowledge Platform is a Linked Open Data project that aims to open and connect with the best resources, data and information on buildings energy performance policies worldwide.
      • LOD-based GBPN Terminology: http://bit.ly/YSbD9S
      • GBPN News Aggregator Tool: http://bit.ly/13JLJqk
      • GBPN Policy Comparative Tool: http://bit.ly/X9Vihm
      • Report Database: http://www.gbpn.org/reports
      • The Laboratory: http://www.gbpn.org/laboratory
      • GBPN web blog: http://bit.ly/X9VSeW
    Live demo
  • 34. The World Bank Taxonomies & Thesauri http://vocabulary.worldbank.org/
  • 35. EIP on Water - Marketplace http://www.eip-water.eu
  • 36. reegle – country profiles http://reegle.info/countries
  • 37. Understanding synonyms & relations?
  • 38. Standardisation and consistency are key. Based on our experience in establishing knowledge broker portals we know:
      • There is a strong need to increase consistency when tagging climate and energy resources
      • We need to ensure the consistency of the message being delivered to the public, to avoid the confusion caused by using terms in different ways
      • This needs standardisation of the categories and tags used
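One way to picture consistent tagging is a synonym table that maps free tags onto preferred tags. The sketch below is illustrative only: the table entries are invented for this example and are not taken from the reegle thesaurus.

```python
# Hypothetical synonym table: every variant points to one preferred tag.
PREFERRED = {
    "pv": "photovoltaics",
    "solar pv": "photovoltaics",
    "photovoltaic": "photovoltaics",
    "photovoltaics": "photovoltaics",
    "wind power": "wind energy",
    "wind energy": "wind energy",
}


def normalise_tags(raw_tags):
    """Map free tags to preferred tags; set unknown tags aside for review."""
    known, unknown = [], []
    for tag in raw_tags:
        key = " ".join(tag.lower().split())  # ignore case and extra whitespace
        target = PREFERRED.get(key)
        if target is None:
            unknown.append(tag)
        elif target not in known:
            known.append(target)
    return known, unknown
```

Tags that are not in the table are returned separately rather than dropped, so an editor can decide whether they deserve a place in the vocabulary.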
  • 39. reegle thesaurus
  • 40. The trusted Clean Energy LOD Cloud http://blog.semantic-web.at/
  • 41. reegle tagging API. blog.okfn.org/2013/04/08/sustainable-energy-policy-demands-sustainable-open-data/ Have a look at http://api.reegle.info – using the API is free!
  • 42. Impact: reegle.info users per year, 2008 to 2012 (not including datasets reused on other sites) [chart; vertical axis from 0 to 3,000,000 users]
  • 43. Contact Andreas Blumauer CEO, Semantic Web Company +43 1 4021235 a.blumauer@semantic-web.at www.semantic-web.at www.poolparty.biz Partner network