An Introduction to the Semantic
Web and Linked Open Data
Kristi L. Holmes, PhD
Twitter: @kristiholmes
Layne Mark Johnson, ...
Information Overload
We humans have always applied
tools to our work to make things
work easier…
Simple Machines
“a web of data that can be
processed directly and
indirectly by machines”
-Tim Berners-Lee
At its heart, the Semantic
Web is really about
extending standard Web
technologies to better deal
with data on the Web.
If...
How the Semantic Web works
Anakin Skywalker is Luke Skywalker's father.
How the Semantic Web works
XML and RDF are at the heart of the Semantic Web.
They give computers a structure in which to l...
An ontology is simply a vocabulary that describes
objects and how they relate to one another. A schema
is a method for org...
Using languages designed for data
RDF | OWL | XML
Semantic web: describes methods and
technologies to allow machines to
understand the meaning or “semantics”
of information...
Let’s talk about the data…
The Semantic Web isn't just about
putting data on the web. It is about
making links, so that a ...
The 5 Stars of Linked Open Data
★
★★
★★★
★★★★
★★★★★
http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/
http://www....
Is your data 5-star??
The 5 Stars of Linked Open Data
http://5stardata.info
The growth of Linked Data
2007 2008
2011
http://lod-cloud.net
What kind of things
are available as
linked data?
The LOD Cloud
Models and standards that allow for greater
data exchange (and flexibility!)
It takes layers and layers of
metadata, logic...
Building a web of data
http://geology.com/articles/night-satellite/satellite-photo-of-europe-at-night-lg.jpg
Data Creators...
Ok! Now let’s dig into a few good
examples of how we can put
these things to work
Linked Open Data and
Biomedical Research: A Survey
of Current International Efforts
Kristi L. Holmes, PhD
Twitter: @kristi...
The evolving ecosystem of
information
Courtesy Mike Conlon, U Florida
Projects.
Research Networking
Ontology
Research Networking
Information about scholars is optimized using a Web-based
infrastructure of standards and technologies...
The Semantic Web
& Researcher Networking
• Increasing recognition of the value of semantic web standards
• Increasing mome...
Recommendations and Best Practices
for Research Networking
The Research Networking Recommendations were approved by the CT...
Research Networking Systems
• VIVO, Profiles, SciVal Experts, Stanford’s
CAP, Iowa’s Loki
• Encourage your RN provider to ...
Profiles
• text
http://catalyst.harvard.edu/spotlights/profiles.html
VIVO
This work is funded by the National Institutes of Health, U24
VIVO enjoys a robust open source, open
community space ...
www.ctsaconnect.org CTSAconnect
Reveal Connections. Realize Potential.
CTSAConnect Project
Goals:
– Identify potential col...
www.ctsaconnect.org CTSAconnect
Reveal Connections. Realize Potential.
Merging VIVO and eagle-i
• eagle-i is a...
OpenPHACTS
Open PHACTS Project
• To reduce the barriers to drug discovery in industry,
academia and for small businesses, ...
OpenPHACTS
Open PHACTS Project
• Develop a set of robust standards…
• Implement the standards in a semantic integration hu...
http://skr3.nlm.nih.gov/SemMed/index.html
Outreach and
adoption
activities
Education
and
training
Ontology
and
controlled
vocabulary
expertise
Relationships
with
ve...
Tools & Apps.
Search
Visualizations
Work efficiencies
Analysis and evaluation
Search
• VIVOsearch and CTSAsearch
• VIVOsearchlight
• AgriVIVO – FAO of the UN
• Search across
– Land Grant institutions
...
http://vivosearchlight.org/
@mileswortho
Visualizations!
http://xcite.hackerceo.org/VIVOviz/
@hackerceo
Inter-Institutional Collaboration Explorer
Make work easier
SPARQL Query Builder
Are you using Linked Open Data?
What are your hopes for this
collection of technologies?
How can you get involved?
Open data, open tools, open process
Thank you!
Acknowledgements:
• Carlo Torniai & Melissa Haendel – OHSU
• Tony Williams ...
Linked Open Data_mlanet13
Presentation at 2013 Medical Library Association Annual Meeting
Layne Johnson and Kristi Holmes

  • My favorite tool…
  • Here we're talking about different machines: computers.


    Describe typical web page, limitations of HTML

    Connecting documents, not concepts; not easy to traverse across disparate data sources


    From DERI (http://www.deri.ie/about/press/coverage/details/?uid=194&ref=214):
    The semantic web is a term coined by world wide web inventor and DERI advisory board member Tim Berners-Lee, to describe the “web of data” that enables machines to understand the semantics, or meaning, of information on the web.
    It involves the insertion of machine-readable metadata into web pages to give information on how they are related to each other, enabling automated agents to access the web more intelligently and perform tasks on behalf of users.
    Berners-Lee has defined the semantic web as “a web of data that can be processed directly and indirectly by machines”.
  • Anakin Skywalker is Luke Skywalker's father.

    It's easy for you to figure out what this sentence means -- Anakin and Luke Skywalker are both people, and there is a relationship between them.

    You know that a father is a type of parent, and that the sentence also means that Luke is Anakin's son.

    But a computer can't figure any of that out without help. To allow a computer to understand what this sentence means, you'd need to add machine-readable information that describes who Anakin and Luke are and what their relationship is.

    This starts with two tools -- eXtensible Markup Language (XML) and Resource Description Framework (RDF).

    XML is a markup language. It complements HTML by adding tags that describe data. These tags are invisible to the people who read the document but visible to computers.

    RDF does exactly what its name indicates -- it provides a framework to describe resources, written in this example as XML tags. In RDF terms, pretty much everything in the world is a resource.

    To do this, RDF uses triples, here serialized as XML tags (RDF/XML is one of several RDF syntaxes), to express this information as a graph. These triples consist of a subject, property and object, which are like the subject, verb and direct object of a sentence. (Some sources call these the subject, predicate and object.)

    So far in this example, the computer knows that there are two objects in this sentence and that there is a relationship between them. But it doesn't know what the objects are or how they relate to one another.
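A minimal sketch of the triple described in this note, using only the Python standard library. The `http://example.org/starwars/` namespace and the `fatherOf` property are hypothetical names made up for illustration; the output is one line of N-Triples, a line-based RDF syntax, shown here instead of RDF/XML because it is shorter to read.

```python
from typing import NamedTuple

# Hypothetical namespace, used only for this illustration.
EX = "http://example.org/starwars/"


class Triple(NamedTuple):
    subject: str    # the resource being described
    predicate: str  # the property (relationship)
    obj: str        # the resource the subject is related to


def to_ntriples(t: Triple) -> str:
    """Serialize a triple whose three terms are all URIs as one N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


# "Anakin Skywalker is Luke Skywalker's father." as subject, predicate, object.
anakin_is_father = Triple(
    EX + "AnakinSkywalker",
    EX + "fatherOf",
    EX + "LukeSkywalker",
)

print(to_ntriples(anakin_is_father))
```

Because every term is a URI, another dataset could refer to the same Anakin or the same Luke simply by reusing the URI.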
  • Another obstacle is that computers don't have the kind of vocabulary that people do.

    It is difficult for a computer to know the connections between different words and concepts and to infer meanings based on context.

    In order to understand what words mean and what the relationships between words are, the computer has to have documents that describe all the words and logic to make the necessary connections.

    In the Semantic Web, this comes from schemata and ontologies.
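As a rough illustration of what an ontology adds, the sketch below applies two kinds of rules to a single fact: "father is a type of parent" (a sub-property rule) and "parent is the inverse of child" (an inverse rule). All URIs and property names are hypothetical, and the tiny rule engine is a stand-in for illustration, not how a real OWL reasoner is implemented. It derives `childOf` rather than "son", since concluding "son" would need an additional fact about Luke.

```python
# Hypothetical namespace and property names, for illustration only.
EX = "http://example.org/starwars/"

facts = {
    (EX + "AnakinSkywalker", EX + "fatherOf", EX + "LukeSkywalker"),
}

# Toy "ontology": fatherOf is a kind of parentOf; parentOf is the inverse of childOf.
SUB_PROPERTY_OF = {EX + "fatherOf": EX + "parentOf"}
INVERSE_OF = {EX + "parentOf": EX + "childOf"}


def infer(known_facts, sub_property_of, inverse_of):
    """Apply both rule kinds repeatedly until no new triples appear."""
    known = set(known_facts)
    while True:
        new = set()
        for s, p, o in known:
            if p in sub_property_of:
                new.add((s, sub_property_of[p], o))
            if p in inverse_of:
                new.add((o, inverse_of[p], s))
        if new <= known:  # nothing new was derived, so stop
            return known
        known |= new


inferred = infer(facts, SUB_PROPERTY_OF, INVERSE_OF)
for triple in sorted(inferred):
    print(triple)
```

The one stated fact yields two derived ones: Anakin is a parent of Luke, and Luke is a child of Anakin.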


  • From DERI (http://www.deri.ie/about/press/coverage/details/?uid=194&ref=214):

    The Semantic Web involves publishing in languages specifically designed for data: Resource Description Framework (RDF), Web Ontology Language (OWL), and Extensible Markup Language (XML).






    HTML describes documents and the links between them. RDF, OWL, and XML, by contrast, can describe arbitrary things such as people, meetings, or airplane parts.
  • Semantic/ontology definitions (below RDF), Elly in RDF example for visual, point out links.
    content from the VIVO team – http://vivoweb.org
  • Go to http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/ (linked on arrow) and view data levels and examples – emphasize costs and benefits
  • ★ make your stuff available on the Web (whatever format) under an open license
    ★★ make it available as structured data (e.g., Excel instead of image scan of a table)
    ★★★ use non-proprietary formats (e.g., CSV instead of Excel)
    ★★★★ use URIs to identify things, so that people can point at your stuff
    ★★★★★ link your data to other data to provide context
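The star levels above can be read as a cumulative checklist, where each star assumes the ones before it. The sketch below encodes that reading; the `Dataset` fields and the `star_rating` function are hypothetical names chosen for this example, not part of any published tool.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical yes/no description of how a dataset is published."""
    open_license: bool          # ★ on the Web under an open license
    structured: bool            # ★★ structured data, not an image scan
    non_proprietary: bool       # ★★★ e.g., CSV instead of Excel
    uses_uris: bool             # ★★★★ URIs identify things
    linked_to_other_data: bool  # ★★★★★ links out to other data


def star_rating(d: Dataset) -> int:
    """Count stars in order, stopping at the first criterion that is not met."""
    criteria = [
        d.open_license,
        d.structured,
        d.non_proprietary,
        d.uses_uris,
        d.linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


scanned_table = Dataset(True, False, False, False, False)
csv_file = Dataset(True, True, True, False, False)

print(star_rating(scanned_table), star_rating(csv_file))
```

Under this reading, a dataset without an open license earns no stars, however well structured it is.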
  • What kind of things are available as linked data?
  • It takes layers and layers of metadata, logic and security to make the Web machine-readable.

    Most visual representations of these layers involve a stack -- sort of a tower of blocks that represent all the layers.

    The stack changes and evolves as the concepts behind the Semantic Web develop.
  • We want to use linked open data concepts to provide data as RDF at URIs. This is critically important for building a web of data.

    Predicates have addresses; sites point to objects in other triple stores.

    Resolve queries across triple stores – “show investigators whose genetic work is implicated in breast cancer.” VIVO won’t have information linkages between breast cancer and disease. Other resources will. But VIVO can link to external sources. “Mike worksOn GeneY”

    Archives. Data Aggregators. Publishers. Institutional repositories.

    So now we turn to tools
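The cross-store question in the note above ("show investigators whose genetic work is implicated in breast cancer") can be sketched as a join over two small sets of triples. Everything here is made up for illustration: the `http://example.org/` URIs, the `implicatedIn` property, and the sample data; only "Mike worksOn GeneY" echoes the note. A real deployment would more likely run a federated query against actual triple stores.

```python
# Hypothetical namespace and data, for illustration only.
EX = "http://example.org/"

# Triples a research networking system might hold: who works on what.
vivo_store = {
    (EX + "Mike", EX + "worksOn", EX + "GeneY"),
    (EX + "Ana", EX + "worksOn", EX + "GeneZ"),
}

# Triples an external resource might hold: which genes relate to which disease.
external_store = {
    (EX + "GeneY", EX + "implicatedIn", EX + "BreastCancer"),
}


def investigators_for_disease(people_triples, gene_triples, disease):
    """Join the two stores on the shared gene URI."""
    genes = {
        s for s, p, o in gene_triples
        if p == EX + "implicatedIn" and o == disease
    }
    return sorted(
        s for s, p, o in people_triples
        if p == EX + "worksOn" and o in genes
    )


print(investigators_for_disease(vivo_store, external_store, EX + "BreastCancer"))
```

The join works only because both stores use the same URI for GeneY, which is the practical reason for giving things addresses.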
  • This is a simplified version of the ecosystem of information we are creating.  Additional elements not depicted are concepts and events.
  • VIVO enables collaboration and understanding across an institution and among institutions

    VIVO harvests much of its data automatically from verified sources so it is accurate and current, reducing the need for manual input.

    The rich information in VIVO profiles can be repurposed and shared with other institutional web pages and consumers, reducing cost and increasing efficiencies across the institution.
    Data is housed and maintained at the local institutions. There it can be updated on a regular basis.
    Search results are faceted so information can be located rapidly and with less time spent sorting through information.

    Profiles are largely created via automated data feeds, but can be customized to suit the needs of the individual.
    Profiles are richer in content than typical [web pages or] social networking sites and will rank higher in general internet searches.

    Across institutions, VIVO provides a uniform semantic structure to enable a new class of tools that use the data to advance science, such as visualizations, search, and discovery.


    Each institution provides its own VIVO system and data. Local governance determines data to be provided.

    VIVO structures data in RDF triples using the VIVO ontology. Moreover, the recommendations state that as a general principle the profile data should be publicly available as Linked Open Data. This announcement demonstrates the CTSA Consortium’s recognition of the value of semantic web standards and increasing momentum in support of semantic web technologies to facilitate research discovery. Examples of applications that consume these rich data include visualizations (Katy’s viz URL), enhanced multi-site search (VIVO search URL), and VIVO Searchlight (searchlight URL). Other utilities are in development across a wide range of functionalities.
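To give a feel for how an application might consume profile data published as RDF, here is a small sketch that only builds the text of a SPARQL query. It assumes, for illustration, that people are typed as `foaf:Person` and named with `rdfs:label`; the function name `build_person_query` is hypothetical, and no endpoint is contacted.

```python
def build_person_query(name_fragment: str, limit: int = 10) -> str:
    """Return a SPARQL SELECT query for people whose label contains a fragment."""
    # Escape backslashes and double quotes so the fragment stays inside the literal.
    safe = name_fragment.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?person ?label WHERE {\n"
        "  ?person a foaf:Person ;\n"
        "          rdfs:label ?label .\n"
        f'  FILTER(CONTAINS(LCASE(STR(?label)), LCASE("{safe}")))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


query = build_person_query("Holmes", limit=5)
print(query)
```

The resulting string could be sent to any SPARQL endpoint that exposes data in this shape; which endpoint and which vocabulary terms apply depends on the local installation.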

  • Strong open source development component to the project – this is reflected in part by the top-notch applications that were submitted to a recent call for applications by the project
  • Data are reused and repurposed in a wide array of tools and settings.
    Cornell University has done a stellar job of this – using VIVO data to provide current information about faculty and their interests for department and college websites; University of Florida reuses data from their VIVO for their CTSI member database – a move that other institutions are making, as well.
  • Linked Open Data_mlanet13

    1. 1. An Introduction to the Semantic Web and Linked Open Data Kristi L. Holmes, PhD Twitter: @kristiholmes Layne Mark Johnson, PhD @LayneJohnson The day after May the Fourth, 2013
    2. 2. Information Overload
    3. 3. We humans have always applied tools to our work to make things work easier…
    4. 4. Simple Machines
    5. 5. “a web of data that can be processed directly and indirectly by machines” -Tim Berners-Lee
    6. 6. At its heart, the Semantic Web is really about extending standard Web technologies to better deal with data on the Web. If the WWW is for people, the Semantic Web is for machines George Thomas and Jim Hendler, http://www.data.gov/communities/node/116/blogs/142 Data modeled as bidirectional relationships Semantic Web Value Proposition… Web-based infrastructure of standards and technologies which allows for a distributable, machine readable description of data that allows for stronger data and smart web application linkages
    7. 7. How the Semantic Web works Anakin Skywalker is Luke Skywalker's father.
    8. 8. How the Semantic Web works XML and RDF are at the heart of the Semantic Web. They give computers a structure in which to look for information and define relationships between resources. http://computer.howstuffworks.com/semantic-web
    9. 9. An ontology is simply a vocabulary that describes objects and how they relate to one another. A schema is a method for organizing information http://computer.howstuffworks.com/semantic-web
    10. 10. Using languages designed for data RDF | OWL | XML
    11. 11. Semantic web: describes methods and technologies to allow machines to understand the meaning or “semantics” of information on the web. -- W3C director Sir Tim Berners-Lee Ontology: a formal representation of the knowledge by a set of concepts within a domain and the relationships between those concepts. -- Wikipedia
    12. 12. Let’s talk about the data… The Semantic Web isn't just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other related data. http://computer.howstuffworks.com/semantic-web
    13. 13. The 5 Stars of Linked Open Data ★ ★★ ★★★ ★★★★ ★★★★★ http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/ http://www.w3.org/DesignIssues/LinkedData.html AVAILABILITY & VALUE
    14. 14. Is your data 5-star?? The 5 Stars of Linked Open Data http://5stardata.info
    15. 15. The growth of Linked Data 2007 2008 2011
    16. 16. http://lod-cloud.net What kind of things are available as linked data? The LOD Cloud
    17. 17. Models and standards that allow for greater data exchange (and flexibility!) It takes layers and layers of metadata, logic and security to make the Web machine-readable. http://computer.howstuffworks.com/semantic-web
    18. 18. Building a web of data http://geology.com/articles/night-satellite/satellite-photo-of-europe-at-night-lg.jpg Data Creators, Data Aggregators, & Data Consumers Repositories. Tools. Applications. Workflows
    19. 19. Ok! Now let’s dig into a few good examples of how we can put these things to work
    20. 20. Linked Open Data and Biomedical Research: A Survey of Current International Efforts Kristi L. Holmes, PhD Twitter: @kristiholmes Layne Mark Johnson, PhD @LayneJohnson May 5, 2013
    21. 21. The evolving ecosystem of information Courtesy Mike Conlon, U Florida
    22. 22. Projects. Research Networking Ontology
    23. 23. Research Networking Information about scholars is optimized using a Web-based infrastructure of standards and technologies which allows for a distributable, machine readable description of data that allows for stronger data and smart web application linkages across many universities, agencies, societies both within the US and abroad. Why is this important? Linked data infrastructure allows for • Visualizations, research and clinical data integration, and deep semantic searching across multiple types and sources of data • By breaking data out of traditional database silos, research networking platforms promote a network effect within a single site and across multiple sites – The value of the network increases with the amount of linked data and applications that are available to consume the linked data.
    24. 24. The Semantic Web & Researcher Networking • Increasing recognition of the value of semantic web standards • Increasing momentum in support of semantic web technologies to facilitate research discovery • Recommendations for researcher networking recently endorsed by the CTSA Consortium Steering Committee represent a new standard in researcher networking. • Examples of applications that consume these rich data include: visualizations, enhanced multi-site search. Other utilities are in development across a wide range of topic areas.
    25. 25. Recommendations and Best Practices for Research Networking The Research Networking Recommendations were approved by the CTSA Consortium Executive and Steering Committee on October 25, 2011. Recommendations for Research Networking: • Recommendation: All CTSAs should encourage their institution(s) to implement research networking tool(s) institution-wide that utilize RDF triples and an ontology compatible with the VIVO ontology. • Recommendation: Information in people profiles at institutions should be publicly available as data as a general principle, specifically as Linked Open Data. To ensure quality of information, authoritative electronic data sources versus manual entry should be emphasized. Institutions will vary in the amount of information that they will include and make publicly available but the value is enhanced by the quality and quantity of information. • Recommendation: Monitoring of the research networking landscape, technology, and tools should continue to be overseen by experts from the CTSA consortium (e.g., the Research Networking group of the Informatics KFC). https://www.ctsacentral.org/recommendations-and-best-practices-research-networking
    26. 26. Research Networking Systems • VIVO, Profiles, SciVal Experts, Stanford’s CAP, Iowa’s Loki • Encourage your RN provider to meet the recommendations for Researcher networking – Better visibility – Enhanced utility
    27. 27. Profiles • text http://catalyst.harvard.edu/spotlights/profiles.html
    28. 28. VIVO This work is funded by the National Institutes of Health, U24 VIVO enjoys a robust open source, open community space to support implementation, adoption, and development efforts around the world. See http://vivo.sourceforge.net
    29. 29. www.ctsaconnect.org CTSAconnect Reveal Connections. Realize Potential. CTSAConnect Project Goals: – Identify potential collaborators, relevant resources, and expertise across scientific disciplines – Assemble translational teams of scientists to address specific research questions Approach: Create a semantic representation of clinician and basic science researcher expertise to enable – Broad and computable representation of translational expertise – Publication of expertise as Linked Data (LD) for use in other applications
    30. 30. www.ctsaconnect.org CTSAconnect Reveal Connections. Realize Potential. Merging VIVO and eagle-i • eagle-i is an ontology-driven application for collecting and searching research resources. • VIVO is an ontology-driven application for collecting and displaying information about people. • Both publish Linked Data. Neither addresses clinical expertise. • CTSAconnect will produce a single Integrated Semantic Framework, a modular collection of ontologies that also includes clinical expertise. [Diagram labels: eagle-i, Resources; VIVO, People; Coordination; Semantic; Clinical activities]
    31. 31. OpenPHACTS Open PHACTS Project • To reduce the barriers to drug discovery in industry, academia and for small businesses, the Open PHACTS consortium is building the Open PHACTS Discovery Platform. This will be freely available, integrating pharmacological data from a variety of information resources and providing tools and services to question this integrated data to support pharmacological research. Guiding principle is open access, open usage, open source - Key to standards adoption - http://www.openphacts.org/
    32. 32. OpenPHACTS Open PHACTS Project • Develop a set of robust standards… • Implement the standards in a semantic integration hub • Deliver services to support drug discovery programs in pharma and public domain • 22 partners, 8 pharmaceutical companies, 3 biotechs • 36 months project, through March 2014 Guiding principle is open access, open usage, open source - Key to standards adoption - http://www.openphacts.org/
    33. 33. http://skr3.nlm.nih.gov/SemMed/index.html
    34. 34. Outreach and adoption activities Education and training Ontology and controlled vocabulary expertise Relationships with vendors/data providers Programming & technical support Understand data structure Libraries Libraries are supporting (& contributing!) to work areas in a variety of ways related to core mission and service areas
    35. 35. Tools & Apps. Search Visualizations Work efficiencies Analysis and evaluation
    36. 36. Search • VIVOsearch and CTSAsearch • VIVOsearchlight • AgriVIVO – FAO of the UN • Search across – Land Grant institutions – CTSA Consortium Schools – State university systems; Big 10, Big 12, etc.
    37. 37. http://vivosearchlight.org/ @mileswortho
    38. 38. Visualizations! http://xcite.hackerceo.org/VIVOviz/ @hackerceo Inter-Institutional Collaboration Explorer
    39. 39. Make work easier
    40. 40. SPARQL Query Builder
    41. 41. Are you using Linked Open Data? What are your hopes for this collection of technologies? How can you get involved?
    42. 42. Open data, open tools, open process Thank you! Acknowledgements: • Carlo Torniai & Melissa Haendel – OHSU • Tony Williams – OpenPHACTS, RSC • CTSA Research Networking Affinity workgroup • VIVO Project
