20120419 Linked Open Data and Team Science, McGuinness, Chicago


This talk introduces Linked Data and the Semantic Web through two examples: the Population Sciences Grid and SemantAqua, a semantically enabled environmental monitoring application. It shows a few tools and the semantic methodology, and opens a discussion of LOD and team science.

Published in: Education


  1. Linked Open Data as an Enabler for Team Science
     Deborah L. McGuinness, Tetherless World Senior Constellation Chair, Professor of Computer and Cognitive Science, Rensselaer Polytechnic Institute, Troy, NY & CEO, McGuinness Associates, Latham, NY
     Science of Team Science; LOD and Team Science, April 19, 2012
  2. Background
     – Semantic Technologies (technological support for encoding meaning in a form computers can understand and manipulate) are maturing and increasing in usage
     – Computational encodings of meaning can be used to help integrate, link, validate, filter, …; essentially, to make smarter, more context-aware applications
     – Semantic Technologies enable linking data, and linked data provides a way of connecting and traversing information, nodes, graphs, webs, …
  3. Linked Data
     • Linked Data is quite simple and follows principles set out by Berners-Lee in http://www.w3.org/DesignIssues/LinkedData.html
       – Use URIs as names for things.
       – Use HTTP URIs so that people can look up those names.
       – When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
       – Include links to other URIs so that they can discover more things.
     • Introduction by examples and then discussion
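The four principles above can be illustrated with a small sketch. It is not code from the talk: it models statements as plain Python tuples rather than using an RDF library, every `http://example.org/...` URI is a hypothetical placeholder, and `describe` only stands in for what an HTTP lookup of a URI might return.

```python
from urllib.parse import urlparse

# Hypothetical HTTP URIs used as names for things (principles 1 and 2).
EX = "http://example.org/"
HAWAII = EX + "state/Hawaii"
USA = EX + "country/USA"

# Statements as (subject, predicate, object) triples, the shape RDF uses.
TRIPLES = [
    (HAWAII, EX + "vocab/label", "Hawaii"),
    (HAWAII, EX + "vocab/partOf", USA),
    (USA, EX + "vocab/label", "United States"),
]


def is_http_uri(name):
    """True if the name is an HTTP(S) URI that a client could look up."""
    return urlparse(name).scheme in ("http", "https")


def describe(uri, triples=TRIPLES):
    """Stand-in for dereferencing a URI: return the statements about it (principle 3)."""
    return [t for t in triples if t[0] == uri]


def linked_uris(uri, triples=TRIPLES):
    """Objects that are themselves URIs: the links to follow next (principle 4)."""
    return [o for (_, _, o) in describe(uri, triples) if is_http_uri(o)]


def crawl(start, triples=TRIPLES):
    """Follow links from a starting URI and collect every URI discovered."""
    seen, todo = [], [start]
    while todo:
        uri = todo.pop(0)
        if uri in seen:
            continue
        seen.append(uri)
        todo.extend(linked_uris(uri, triples))
    return seen
```

Calling `crawl(HAWAII)` starts from one name and discovers the other through the `partOf` link, which is the "discover more things" behaviour the fourth principle describes.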
  4. Population Sciences Grid Goals
     • Convey complex health-related information to consumer and public health decision makers for community health impact
     • Inform the development of future research opportunities effectively utilizing cyberinfrastructure for cancer prevention and control
     McGuinness, D., Shaikh, A., Lebo, T., Ding, L., Courtney, P., McCusker, J., Moser,. Morgan, G.D., Tatalovich, Z., Willis, G., Contractor, N., and Hesse, B. 2012. Towards Semantically-Enabled Next Generation Community Health Information Portals: The PopSciGrid Pilot. In Proceedings of Hawaii International Conference on System Sciences 2012.
  5. Semantic Web Perspective on Initial PopSciGrid Goals
     • How can semantic technologies be used to integrate, present, and analyze data for a wide range of users?
     • Can tools allow lay people to build their own demos and support public usage and accurate interpretation?
     • How do we facilitate collaboration and “viral” applications?
     • Within PopSciGrid:
       – Which policies (taxation, smoking bans, etc.) impact health and health care costs?
       – What data should be displayed to help scientists and lay people evaluate related questions?
       – What data might be presented so that people choose to make (positive) behavior changes?
       – What does the data show? Why should someone believe that?
       – What are appropriate follow-up questions to support actionability?
  6. Foundations: The Tetherless World Constellation Linked Open Government Data Portal
     [Diagram labels: Create, Convert, Enhance; TWC LOGD; Query/Access; LOGD SPARQL Endpoint; Community Portal; output formats RDF, RSS, JSON, XML, HTML, CSV, …; Data.gov deployment]
  7. What is an Ontology?
     [Figure: ontology spectrum with labels Catalog/ID; Terms/glossary; Thesauri “narrower term” relation; Informal is-a; Formal is-a; Formal instance; Frames (properties); Value Restrs.; Disjointness, Inverse, part-of…; General Logical constraints]
     Ontologies Come of Age, McGuinness, 2001, and from AAAI Panel 99: McGuinness, Welty, Uschold, Gruninger, Lehmann. Plus basis of Ontologies Come of Age, McGuinness, 2003.
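To make the more formal end of that spectrum concrete, here is a minimal sketch of what formal is-a, formal instance, and disjointness let a program check. It is illustrative only: the class names and the `IS_A`, `INSTANCE_OF`, and `DISJOINT` tables are hypothetical, and a real system would more likely use an ontology language such as OWL together with a reasoner.

```python
# Hypothetical toy ontology; none of these names come from the talk's ontologies.
IS_A = {
    "RedWine": "Wine",
    "WhiteWine": "Wine",
    "Wine": "Beverage",
}
INSTANCE_OF = {"bottle42": "RedWine"}
DISJOINT = {frozenset(("RedWine", "WhiteWine"))}


def superclasses(cls):
    """All ancestors of cls, following is-a transitively (assumes no cycles)."""
    out = []
    while cls in IS_A:
        cls = IS_A[cls]
        out.append(cls)
    return out


def is_instance(individual, cls):
    """Formal instance: true for the asserted class and every superclass of it."""
    direct = INSTANCE_OF.get(individual)
    return direct is not None and (direct == cls or cls in superclasses(direct))


def violates_disjointness(individual, extra_class):
    """Would also typing the individual as extra_class contradict a disjointness axiom?"""
    direct = INSTANCE_OF[individual]
    mine = {direct, *superclasses(direct)}
    theirs = {extra_class, *superclasses(extra_class)}
    return any(frozenset((a, b)) in DISJOINT for a in mine for b in theirs)
```

With an informal is-a (for example, a thesaurus "narrower term" link) a program could not safely draw the `is_instance("bottle42", "Beverage")` conclusion; the formal reading is what licenses it.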
  8. Inference Web: Making Data Transparent and Actionable Using Semantic Technologies
     • How and when does it make sense to use smart system results & how do we interact with them?
     [Figure labels: (Mobile) Intelligent Agents; Knowledge Provenance in Virtual Observatories; NSF Interops: SONET, SSIII (Sea Ice); Intelligence Analyst Tools; Hypothesis Investigation / Policy Advisors]
  9. Foundations: Web Layer Cake
     [Figure: Semantic Web layer cake annotated with related work. Recoverable labels:]
     • Visualization APIs; S2S; Govt Data
     • Inference Web, Proof Markup Language, W3C Provenance Working group formal model, W3C incubator group; Inference Web IW Trust; Air + Trust
     • DL, KIF, CL, N3Logic …
     • Ontology repositories (ontolinguag); Ontology Evolution env: Chimaera; Semantic eScience Ontologies; MANY other ontologie
     • OWL 1 & 2 WG: Edited main OWL Docs, quick reference, OWL profiles (OWL RL); Earlier languages: DAML, DAML+OIL, Classic
     • RIF WG; AIR accountability tool
     • SPARQL WG; earlier QL: OWL-QL, Classic’ QL, …
     • Govt metadata search; Linked Open Govt Data; SPARQL to Xquery translator
     • RDFS materialization (Billion triple winner)
     • Transparent Accountable Datamining Initiative (TAM
  10. PopSciGrid Workflow
      [Workflow diagram labels: Ban coverage; CHSI 2009 archive; Archive; SemDiff; CSV2RDF4LOD Direct; CSV2RDF4LOD Enhance; derive; Publish; visualize]
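The conversion step in a workflow like this turns tabular rows into RDF statements. The sketch below is not CSV2RDF4LOD and does not reproduce its options or output; it is a standard-library illustration of a naive "raw" conversion in which each row becomes a subject and each column a predicate. The base URI, the column names, and the sample values are hypothetical placeholders, not data from the talk.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/dataset/demo/"  # hypothetical base URI


def escape_literal(value):
    """Escape the characters that N-Triples requires escaping inside a string literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_ntriples(row_number, row):
    """Turn one CSV row (a dict of column -> value) into N-Triples lines."""
    subject = f"<{BASE}row/{row_number}>"
    lines = []
    for column, value in row.items():
        predicate = f"<{BASE}vocab/{quote(column.strip())}>"
        lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


def csv_to_ntriples(text):
    """Convert CSV text with a header row into a list of N-Triples lines."""
    reader = csv.DictReader(io.StringIO(text))
    out = []
    for i, row in enumerate(reader, start=1):
        out.extend(row_to_ntriples(i, row))
    return out


# Placeholder input; the numbers carry no meaning.
SAMPLE = "state,value\nHawaii,1\nNew York,2\n"
```

An "enhanced" conversion would go further than this, for instance by typing values and mapping columns to shared vocabulary terms instead of minting a predicate per column.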
  11. PopSciGrid Example: State - Hawaii
      • Extensible Mashups via Linked Data
        – Diverse datasets from NIH
        – Potentially linking to other content (e.g. “unemployment rate”)
      • Accountable Mashups via Provenance
        – Annotate datasets used in demos
        – Feed users’ comments back to the government contact (e.g. %)
        – Annotation capabilities coming (and more)
  12. PopSciGrid II
  13. Reflections
      Successful, but …
      • What if we could allow data experts to build their own demos?
      • What if we could allow non-subject matter experts to function as subject-literate staff?
      • What if team members could interchange roles (and thus make contributions in other areas)?
      • What technological infrastructure is required?
      • Claim: all of this is being done now, but not at scale
  14. Updates and Motivations from a Computer Science Perspective
      Old:
      • Raw conversions
      • Per-dataset vocabularies
      • Custom queries
      • Custom data management code
      • Limited use because of Google Visualization licenses
      • State-level data
      New:
      • Enhanced conversions
      • Vocabulary reuse
      • Generic queries
      • Re-usable data management code
      • Unlimited use of new open source visualization toolkit
      • State and county-level data
  15. RDF Data Cube Vocabulary
      • For publishing multi-dimensional data, such as statistics, on the web in such a way that it can be linked to related data sets and concepts using RDF.
      • Compatible with the cube model that underlies SDMX (Statistical Data and Metadata eXchange).
      • Also compatible with:
        – SKOS, SCOVO, VoiD, FOAF, Dublin Core Terms
      • Integrated with the LOGD data conversion infrastructure
      • Integrated with other tooling like Stats2RDF
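As a rough illustration of how a single statistic looks in this vocabulary, the sketch below emits Turtle text for one `qb:DataSet` and its `qb:Observation` resources. The `qb:`, `sdmx-dimension:`, and `sdmx-measure:` terms are from the published vocabularies; everything under the `ex:` prefix, the function names, and the sample values are hypothetical. The sketch also omits the data structure definition (`qb:structure`) that a complete cube would declare, so treat it as a fragment rather than a well-formed cube.

```python
PREFIXES = """\
@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .
@prefix ex: <http://example.org/cube/> .
"""


def observation_turtle(obs_id, dataset, area, period, value):
    """One observation: two dimension values (area, period) and one measured value."""
    return (
        f"ex:{obs_id} a qb:Observation ;\n"
        f"    qb:dataSet ex:{dataset} ;\n"
        f"    sdmx-dimension:refArea ex:{area} ;\n"
        f'    sdmx-dimension:refPeriod "{period}" ;\n'
        f"    sdmx-measure:obsValue {value} .\n"
    )


def cube_turtle(dataset, observations):
    """Turtle for a dataset plus its observations, given (area, period, value) tuples."""
    parts = [PREFIXES, f"ex:{dataset} a qb:DataSet .\n"]
    for i, (area, period, value) in enumerate(observations, start=1):
        parts.append(observation_turtle(f"obs{i}", dataset, area, period, value))
    return "\n".join(parts)


if __name__ == "__main__":
    # Placeholder values only; they are not figures from any dataset.
    print(cube_turtle("demo", [("areaA", "2009", 1.5), ("areaB", "2009", 2.5)]))
```

Because the area is a URI rather than a string, the same observation can be linked to other data sets that describe that area, which is the linking benefit the slide points to.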
  16. County average life expectancy (Summary Measures of Health
  17. SemantEco/SemantAqua
      • Enable/Empower citizens & scientists to explore pollution sites, facilities, regulations, and health impacts along with provenance.
      • Demonstrates semantic monitoring possibilities.
      • Map presentation of analysis
      • Explanations and Provenance available
      http://was.tw.rpi.edu/swqp/map.html and http://aquarius.tw.rpi.edu/projects/semantaqua
      Screenshot callouts:
        1. Map view of analyzed results
        2. Explanation of pollution
        3. Possible health effect of contaminant (from EPA)
        4. Filtering by facet to select type of data
        5. Link for reporting problems
        6. Now joint with USGS resource managers; expanded to endangered species; now more virtual observatory style
  18. System Architecture
      [Architecture diagram; recoverable label: Virtuoso access]
  19. Originally developed for VSTO, now in SSIII, SESDI, SESF, OOI …
      The Virtual Solar-Terrestrial Observatory: A Deployed Semantic Web Application Case Study for Scientific Research. Proc. 19th Conf. on Innovative Applications of Artificial Intelligence (IAAI-07), http://www.vsto.org
  20. Discussion
      • Semantic Technologies and Linked Data are powering a wide array of applications, many in Big Science, Team Science, or at least interdisciplinary science
      • Labeled graphs as powered by structured data can be a nice corpus for evaluation
      • Tools and methodologies are ready for use
      • We love to partner in these areas
      • What do you need or want from linked data?
      Questions? - dlm @ cs . rpi . edu
  21. Extra
  22. Directions
      • Incorporation of TWC data Quality Facts label (Zednik et al.)
      • Use of DataFAQs automated data quality framework (Lebo et al.)
      • Additional provenance inclusion / usage (Inference / Provenance Web)
      • Annotation / Collaboration facilities (Michaelis et al.)
      • Other data sets? Or exposition of other parameters?
      • Partners in additional topic areas
  23. Enabling Subject Area Exploration and Hypothesis Generation
      • What factors influence prevalence (and under what conditions)?
      • Within smoking, should we focus on prevalence, packs sold, quit rate, hospital admission diagnosis, other?
      • What is prevalence (definition)? And how is it measured (overall / in this data set)?
      • What are the conditions under which the data was obtained (date, sample set, extenuating conditions, …)?
      • What other data might we include? And how might we show that data?
      • What should be represented? And how should it be manipulated?
      • What tools and services do people benefit from to explore? Encode? Act?
  24. Semantically-enabled advisors utilize:
      • Ontologies
      • Reasoning
      • Social
      • Mobile
      • Provenance
      • Context
      Patton & McGuinness et al., tw.rpi.edu/web/project/Wineagent
  25. Semantic Sommelier
      Previous versions used ontologies to infer descriptions of wines for meals and query for wines.
      New version uses:
      • Context: GPS location, local restaurants and wine lists, user preferences
      • Social input: Twitter, Facebook, Wiki, mobile, …
      Source variability in quality, contradictions exist, maintenance is an issue… however new models emerging
  26. • Semantic Technologies: ready for use
        – Tools & tutorials available; deep apps, future planning may benefit from consultants
      • Context-aware, semantic apps are the future
      • The Semantic Web enables…
        – New models of intelligent services
        – E-commerce solutions
        – M-commerce
        – Web assistants
        – …
      New forms of web assistants/agents that act on a human’s behalf requiring less from humans and their communication devices…