The Web Observatory Extension: Facilitating 
Web Science 
Collaboration through Semantic Markup 
Presented by: 
The Tetherless World Constellation 
Rensselaer Polytechnic Institute, Troy, NY 
With thanks to the extended RPI Tetherless World Team
Agenda 
I. Welcome and Overview RPI Web 
Science Research Center 
II. Building a Web Observatory 
III. Schema.org 
IV. Recommendations of Schema.org 
vocabulary
RPI Web Science Research 
Center 
PRESENTER: 
Dominic DiFranzo 
PhD Student 
difrad@rpi.edu 
Please feel free to follow along: 
For more information, please see 
http://tw.rpi.edu/web/web_observatory 
To participate via hackpad, please see < 
https://hackpad.com/WSNet-Webinar-Schema.org-Vocabulary->
Building a Web Observatory 
Boston WOW meeting
Schema.org 
Schema.org vocabulary extension 
Web Observatory Vocabulary
Schema.org vocabulary extension 
Web Observatory Project
Department of Health and Human Services' 
Developer Challenge 
A group from RPI TWC won first place in the competition using semantic 
technologies and software developed in-house, such as csv2rdf4lod, LODSPeaKr, 
Farrah, and DataFAQS.
Questions? 

Building A Web Observatory Extension: Schema.org


Editor's Notes

  • #2 For those who may be just joining us, the topic of today’s telecon is Health Web Science. We just heard a presentation from our collaborators in Scotland, and now we’ll be telling you about our activities at Rensselaer. First, some perspective: we live in an era of data proliferation. More instruments are capturing data and more entities are publishing data, often making the data discoverable and widely available. Along with the data explosion, more tools and strategies are emerging for finding, using, and making sense of this next generation of widely available, massively growing datasets. Note that when we say widely available, what we generally mean is available on the Web. Our talk today will focus on our Health Web Observatory. As some of you may know, the Web Science Trust, a global not-for-profit institution, established the Web Observatory. The Web Observatory is intended to be a global data resource for the advancement of economic & social prosperity. In this contribution, we will take one of our current use cases, environmentally motivated water research, and use this scenario to describe some current challenges in data interoperability and often unanticipated usage issues in the evolving data ecosystems. We will also discuss some emerging semantic strategies for data usage in broad and diverse settings and then discuss evolving trends. KEYWORDS: [1920] INFORMATICS / Emerging informatics technologies, [1946] INFORMATICS / Metadata, [1970] INFORMATICS / Semantic web and semantic integration, [1958] INFORMATICS / Ontologies.
  • #3 Today, we will be presenting and discussing the following: Welcome and Overview; RPI’s Health Web Science research agenda; Building a Health Web Observatory; Examples from the Observatory; Related Research. Our first presenter is Prof. Deborah McGuinness, who will speak about the SemantEco Portal and the Semantic Workflow.
  • #4 Bios:
    Joanne Luciano: Research Associate Professor at the Tetherless World Constellation at Rensselaer Polytechnic Institute, Deputy Director of the Web Science Research Center at RPI, a pioneer in the use of computational methods in psychiatry and in the application of semantic technologies in the biomedical and life sciences.
    Deborah McGuinness: Professor Deborah McGuinness is a Tetherless World Constellation Chair and Professor of Cognitive Science and Computer Science. She is a leading expert in knowledge representation and reasoning languages and systems and has worked in ontology creation and evolution environments for over 20 years. Deborah's main research thrusts are in languages, tools, and environments for the semantic web.
    Jim McCusker: Jim McCusker is a PhD student under Professor McGuinness focusing on Biomedical Semantics. His current interests are data and provenance interoperability in life sciences. He has worked as a software developer for 11 years in bioinformatics, high performance computing, data mining, natural language processing, and supply chain auditing.
  • #9 In June 2012, HHS issued the first of its seven challenges calling for developers “to make high value health data more accessible to entrepreneurs, researchers, and policy makers in the hopes of better health outcomes for all.”
    HHS wanted Metadata: "... application of existing voluntary consensus standards for metadata common to all open government data". RPI TWC submitted:
      • DCAT - W3C Data Catalog Vocabulary: Version controlled on GitHub. Extracted from their CKAN as input to converter.
      • VoID - W3C Vocabulary of Interlinked Datasets: Organized datasets by source, dataset, version. Provided links to data dumps, Linksets to LOD.
      • PROV - W3C Provenance Interchange Model: Captured during CKAN extraction, retrieval, conversion, and publishing.
      • Dublin Core Metadata Terms: Annotated subjects based on descriptions.
    HHS wanted Classification: "...classify datasets in our growing catalog, creating entities, attributes and relations that form the foundations for better discovery, integration...". RPI TWC presented:
      • Bottom-up vocabulary and entity reuse: Vocabulary created for each dataset. Enhanced datasets shifted to reuse vocabulary and entities from other datasets. Three stub vocabularies for top-level reuse.
      • NCBO (National Center for Biomedical Ontology) Annotations: annotator/annotator.py SADI service; data/source/bioontology-org/annotator-description-subject/version/retrieve.sh
    HHS wanted Liquidity: "new designs ... that form the foundations for ... liquidity". RPI TWC provided:
      • 2B triples among 1M URIs
      • Dataset Linked Data: Machine and Human views (via conneg)
      • Faceted search of datasets
      • Dataset dumps (.ttl.gz): For each dataset, and for the whole thing.
      • Dataset query (http://healthdata.tw.rpi.edu/sparql)
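
As a rough illustration of the "Dataset query" item in the notes for slide 9, the sketch below builds (but does not send) a SPARQL Protocol GET request against the endpoint URL quoted there. The query itself is an assumption: it supposes that datasets are typed as `dcat:Dataset` and carry a `dcterms:title`, which the notes suggest but do not confirm. The helper name `build_sparql_request` is hypothetical, and the endpoint may no longer be live.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request

# Endpoint URL as quoted in the speaker notes; it may no longer be reachable.
ENDPOINT = "http://healthdata.tw.rpi.edu/sparql"

# Assumption: datasets are typed dcat:Dataset and titled with dcterms:title.
QUERY = """\
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?dataset ?title WHERE {
  ?dataset a dcat:Dataset ;
           dcterms:title ?title .
} LIMIT 10
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build, without sending, a SPARQL Protocol GET request (hypothetical helper)."""
    # The SPARQL Protocol passes the query text in the "query" URL parameter.
    url = endpoint + "?" + urlencode({"query": query})
    # Content negotiation: ask for the standard JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
```

Passing `request` to `urllib.request.urlopen` would actually send the query. Changing the `Accept` header is the same content negotiation ("conneg") mechanism the notes mention for serving machine and human views of a dataset.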