MIT CSAIL Linked Data Ventures Class: Linked Open Data for Entrepreneurs 2013


A presentation to MIT CSAIL's Linked Data Ventures class on 12 March 2013.

Published in: Education

  1. 1. Linked Data: Opportunities for Entrepreneurs Dr. David Wood @prototypo 12 March 2013
  2. 2. David Wood: B.S. Mechanical Engineering; B.S. Electrical Engineering (equivalency); M.S. Astronautical Engineering; Aeronautical & Astronautical Engineer; Ph.D. Software Engineering
  3. 3. David Wood. Columns: company, founded, products, disposition. @𝛑 Plugged In Software 2002 2005 ongoing ongoing
  4. 4. David Wood. Columns: company, founded, products, disposition. @𝛑 Plugged In Software RDF Database 2002 RDF Database Management 2005 RDF Usage ongoing Linked Data ongoing Management
  5. 5. “more anterior sectors of the prefrontal cortex are distinctively recruited when altruistic choices prevail over selfish material interests” - Jorge Moll et al.
  6. 6. “For it is in giving that we receive.” - Saint Francis of Assisi
  7. 7. Consistently late to rapidly changing markets (music, electronics, cafés, e-books)
  8. 8. Pop Quiz
  9. 9. Pop Quiz
  10. 10. Innovator's Dilemma
  11. 11. Innovator's Dilemma
  12. 12. May 2001
  13. 13. 08 Oct 2007, 07 Nov 2007, 10 Nov 2007, 28 Feb 2008, 31 Mar 2008, 18 Sep 2008, 05 Mar 2009, 27 Mar 2009, 14 Jul 2009, 22 Sep 2010
  14. 14. Sep 2011
  15. 15. We’ve Seen This Before
  16. 16. YouTube: watch videos, publish videos, share videos, rate videos, discuss videos. HDTV: better watch videos.
  17. 17. Linked Data: use data, publish data, share data, rate data, discuss data. RDBMS: use data.
  18. 18.
  20. 20. 32
  21. 21. Publishing
  22. 22. Credit: Bradley P. Allen, Elsevier Labs
  23. 23. ✔ DocBook 5 ✔ LaTeX ✔ XHTML 5 ✔ ePub 3. Credit: Bradley P. Allen, Elsevier Labs
  24. 24. Open Government
  25. 25. US EPA: • Cloud-based Linked Data provision of 3 core programs: 2.9M facilities; 100K substances; 25 years of toxic pollution reports • FISMA compliant • 16 Callimachus templates • Official launch Feb 2013
  26. 26. From EPA; From Wikipedia; Open Street Map
  27. 27. Life Sciences
  28. 28. Active PURLs for Clinical Study Aggregation. David Wood (1) and Tom Plasterer (2). Contacts: 1, 2 Tom.Plasterer@astrazeneca.com

      The problem: No coordinated view of clinical study information. Information is distributed across departments, subsidiaries and government data sources.

      The solution: Gather, convert, aggregate and format for display. 3 Round Stones and AstraZeneca created a system to allow coordinated views of distributed clinical trial information. The system extended the Callimachus Project, an Open Source management system for Linked Data. Persistent URLs, or PURLs, were used to provide globally unique and resolvable identifiers for each clinical study. The PURL concept was extended to enable PURLs to have multiple targets and for the results of each target to undergo arbitrary transformation. PURLs which have such capabilities are called Active PURLs. Information sources relevant to clinical studies were identified, regardless of whether their location was internal or external to the pharmaceutical company's network. Active PURLs were used to resolve data sources having HTTP endpoints capable of returning XML or textual results. Each information source is dynamically transformed into Resource Description Framework (RDF) formats and all sources' results are then merged into a single, temporary graph of RDF data. Information is rendered to end users as coordinated HTML descriptions of each clinical trial using the Callimachus template engine. Machine-readable versions of the data are also available.

      How semantic technologies help: Linked Data techniques can help to address both the availability of clinical trial information and provide a means to build effective information systems using it. Linked Data techniques allow for "cooperation without coordination". Publishers of data provide context for use by third parties in other portions of a distributed enterprise. Users of Linked Data can combine information from multiple sources. Subsequent publication can create a virtuous circle of positive feedback, allowing researchers, informaticists and support staff to collaboratively and distributively build a reusable knowledge base.

      User experience: (1) Users resolve a URL that provides a unique identifier for a clinical study, drug, chemical or other concept managed by this system. The user may be presented with the URL on HTML pages, search for it via full-text techniques or discover it via semantic search. (2) Users are presented with a dynamically generated Web page representing aggregated clinical study information. Users are isolated from the complex and distributed information environment.

      Diagram labels: User resolves a single URI to an Active PURL; Multiple targets queried independently; HTTP-accessible endpoints capable of returning XML or textual content; Convert XML or textual results to RDF; Render RDF to HTML via template.

      Challenges: Distributed queries have many known limitations, such as the introduction of multiple single points of failure in any given PURL resolution. HTTP timeouts, auth/auth errors or other network failures can slow or stop a pipeline from returning correctly. Similarly, distributed queries can result in variant query-time performance due to complex network and endpoint performance variances. Proactive caching and cache management strategies can improve runtime performance and protect end users from the limitations inherent in a distributed query architecture. Caching of intermediate results from endpoints has not yet been implemented.

      Further poster headings: References; Next steps.
  29. 29. Your Opportunity? • Linked Data warehouses: 10B USD annually • Linked Data supply chains: 205M USD annually (Web analytics), 6B USD annually (enterprise) • Linked Data analytics: 16B USD annually
  30. 30.
  31. 31. Credits: Batman Treaty Signing (public domain) (Batman_signs_treaty_artist_impression.jpg); Centro Universitario de Ciencias Exactas e Ingenierías, Universidad de Guadalajara (public domain); Spreadsheet Photo, Casey Serin (CC-BY licensed); LOD Cloud Diagrams, Richard Cyganiak, Anja Jentzsch (CC-BY-SA); Earth weather analysis image, NASA Goddard SFC (CC-BY); Publisher emerging content architecture, Copyright (c) 2011 Elsevier, used with permission. Corporate logos, Darkon Movie Poster, BBC screenshots, CAMC credit card image and book covers © their respective owners and used under Fair Use for educational purposes.
  32. 32. Credits: Mundaneum images, Copyright © Collection Mundaneum - Mons, courtesy of the Mundaneum Archives Centre; Chasm Photo, Travis S. (CC-BY-NC licensed); Supply Chain Image, Kevin Krejci (CC-BY licensed); Sharing Squirrels Image, leezie5 (CC-BY-NC-ND licensed); Envirofacts screenshot, a US Government Work of the US EPA, used with permission; Linked Data book cover, Copyright (c) 2012-13 Manning Publications Inc., used with permission. All other photos and drawings © 2010-13 3 Round Stones Inc or David Wood, released under a CC-BY-SA license.
  33. 33. This work is Copyright © 2011 3 Round Stones Inc. It is licensed under the Creative Commons Attribution 3.0 Unported License. Full details at: You are free: to Share (to copy, distribute and transmit the work); to Remix (to adapt the work). Under the following conditions: Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.
  34. 34. Linked Data: Opportunities for Entrepreneurs Dr. David Wood @prototypo 12 March 2013
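
The Active PURL flow described on slide 28 (one identifier, several targets queried independently, each result converted to RDF-style triples, everything merged into a temporary graph and rendered as HTML) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Callimachus implementation: the `example.org` URIs, the `TARGETS` table, and all function names are hypothetical, the targets are local stand-ins rather than real HTTP endpoints, and a Python set of tuples stands in for an RDF graph.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical study identifier and vocabulary, invented for illustration.
STUDY = "http://example.org/study/123"
EX = "http://example.org/vocab/"


def xml_to_triples(subject, xml_text):
    """Turn each child element of the XML root into one (s, p, o) triple."""
    root = ET.fromstring(xml_text)
    return {(subject, EX + child.tag, (child.text or "").strip()) for child in root}


def resolve_active_purl(subject, targets):
    """Query every target independently and merge results into one temporary graph."""
    graph, failures = set(), []
    for name, fetch in targets.items():
        try:
            graph |= xml_to_triples(subject, fetch())
        except Exception as err:  # e.g. timeouts, auth errors, malformed XML
            failures.append((name, err))  # one failing target does not stop the rest
    return graph, failures


def render_html(subject, graph):
    """Render the merged triples for one subject as a simple HTML table."""
    rows = "".join(
        f"<tr><td>{escape(p)}</td><td>{escape(o)}</td></tr>"
        for s, p, o in sorted(graph)
        if s == subject
    )
    return f"<h1>{escape(subject)}</h1><table>{rows}</table>"


# Stand-in targets; a real deployment would call HTTP endpoints instead.
def registry():
    return "<study><title>Example trial</title><phase>2</phase></study>"


def internal():
    return "<study><sponsor>Example Pharma</sponsor></study>"


def broken():
    raise TimeoutError("endpoint did not respond")


TARGETS = {"registry": registry, "internal": internal, "broken": broken}

graph, failures = resolve_active_purl(STUDY, TARGETS)
page = render_html(STUDY, graph)
```

Recording failures per target, rather than aborting, is one way to soften the "multiple single points of failure" problem the poster lists under Challenges; the caching it mentions as future work is deliberately left out of this sketch.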