Sentara Linked Data Workshop - Sept 10, 2012


One-day workshop for Sentara Healthcare on using a Linked Data approach for enterprise architecture. Topics include Open Government Data initiatives; a demo of the Weather Health Web application; leveraging open data from NIH, NLM, NOAA, EPA and HHS; and Callimachus Enterprise, a Linked Data Management System for the enterprise.

Published in: Business

  1. 1. Integrated Data for Improved Personal Health Delivery 10-September-2012 Presenters: Bernadette Hyland, David Wood & Luke Ruth Email. Twitter: @BernHyland This presentation:
  2. 2. Today’s Agenda
     • 9.00-9.20 - (All) Introductions
     • 9.20-9.45 - (Phil) Goals & objectives
     • 9.45-10.30 - (Bernadette) Value proposition of Linked Data, update on government data publishing initiatives, Health Datapalooza
     • 10.30-11.10 - (David) Intro to enterprise linked data, a resource oriented approach to interoperability
     • 11.10-11.30 - Break
     • 11.30-12.00 - (Luke) Review of Weather Health app development
     • 12.00-12.45 - Lunch
     • 12.45-1.30 - (David) Web of data architecture, Callimachus
     • 1.30-2.15 - (All) Building support within Sentara, use cases for Weather Health (Phase I), Q&A
  3. 3. Introductions ...• Sentara team• 3 Round Stones team • Dave Wood, PhD - Enterprise Architect • Bernadette Hyland - Sr. Solutions Architect • Luke Ruth - Software Engineer • ... All specialists in Web architecture & Linked Data
  4. 4. Customers & Affiliations: Environmental Protection Agency, Health & Human Services, Government Printing Office
  5. 5. • Linked Data is about publishing and consuming data using international data standards • Based on a 20-year-old idea • A system of linked information systems
     Why am I speaking on Linked Data and sharing today? I’m here in my role as the co-chair of the W3C GLD WG. I’m a serial entrepreneur in this space, having founded several companies that led some of the most widely used Open Source projects for Linked Data, including Mulgara, OpenRDF/Sesame, PURLs 2.0 and Callimachus. I’ve authored a couple of peer-reviewed chapters in these books, which are available in hardcopy or for free via the Web.
  6. 6. What ideas involving data access, sharing & re-use can we help nurture?
  7. 7. Jeff Pollock, Oracle: Businesses are in future shock • Needs changing at faster pace • Affordable Care Act, new regulations, changes in global economy accelerating changes • Information increasingly more central to the operation of any business
     In a dynamic economy, we have to adapt quickly. We cannot change people or hardware fast enough. We have to take a new approach in software to deal with this. This is a quote from a director at Oracle. Credits: (c) Random House
  8. 8. "If information systems are to keep up with business, we need to change more than technology - we need to change how people deal with technology." - Jeff Pollock
     Of course, Jeff also said "Changes in behavior have to be well-motivated and show some visible value immediately."
  9. 9. Goal for improved health delivery ...• Harness larger & more complex datasets to evaluate the potential for health impacts• More accurately predict factors that contribute to illness or diagnose disease
  10. 10. DATA drives every decision we make daily & every decision others make on our behalf
      What is happening to data? We are sharing it ... The Web is a natural place to publish information for public dissemination. The modern Web is an information system owned by no one and yet open to vendors, governments and private citizens. The Web of documents has been a great place to share HTML and PDF. However, we are entering the Web of Data. This is how we’ll share most open data in the next decade.
  11. 11. “We’re moving from managing documents to managing discrete pieces of open data and content which can be tagged, shared, secured, mashed up and presented in the way that is most useful for the consumer of that information.” -- Report on Digital Government: Building a 21st Century Platform to Better Serve the American People
      Governments around the world are defining detailed digital services plans that are based on open data and open APIs to deliver government and private digital services. At the highest level, government executives in the UK, EU, US, India and Brazil are committed to managing open data and content in a way that is useful for the consumer of that content. The question is HOW?
  12. 12. Sharing Worldwide
      We are sharing documents and data worldwide, routinely with people we don’t know. If achieved, it will transform how governments interact with one another, between nations and how they serve their citizens in the 21st Century. Using the Web to solicit input and inform decision making, and ultimately, to create a more transparent and accessible government is a very, very worthwhile goal.
  13. 13. Who is sharing their data ... ? Small and large commercial and government organizations, NGOs, non-profits ... plus many universities.
      Governments in the last few years have been responding to Open Government initiatives that mandate publishing open government data. Some are careful, slow-moving entities who simply needed to find real solutions to real problems.
  14. 14. Retailers. Goal: Improve click-throughs on search results
  15. 15. Book Publishers. Goals: Improve internal manuscript pipelines, expose additional ways of finding and using content
  16. 16. New Media
  17. 17. Governments. Goals: Governmental transparency and/or improved internal efficiencies (data warehouses)
  18. 18. Common business need ...• The ability to integrate & manage large amounts of data in a rigorous & transparent manner• Discovery through interaction of scientific communities, including biomedical informatics & evidence-based medicines
  19. 19. How many are doing it ... the Web of Data • No one vendor owns it • It scales ... to Web-scale • Doesn’t require a super model • Based on International Data Exchange Standards (RDF, SPARQL)
      Scope: Bigger than any other deployed system. Infinitely adaptable: Changes piecemeal and allows for ad hoc additions & changes. Ownership: Nobody owns it.
  20. 20. Let’s look at some ‘versions’ of the Web. It should be said here that Tim Berners-Lee, the recognized “father” of the WWW, doesn’t like the idea of versioning the Web. I happen to agree, but I understand why people do it. As we talk about these versions of the Web, you may want to think of this as a continuum with significant waves, each with its own benchmark technologies, rather than specific versions with distinct start and end points. Nova Spivack of Radar Networks created this.
  21. 21. RDF is a lingua franca for data exchange
      Not all Open Government content is Linked Data. A relatively small percentage of open data is 4-5 star linked data; however, it is growing exponentially. Use of structured data is actively promoted by international standards groups like the W3C and by major search engines such as Google, Yahoo!, Bing and Yandex.
  22. 22. [Diagram labels: Semantic Technologies, Semantic Web, Linked Data]
      Linked Open Data is a small, pragmatic portion of the greater body of Semantic Technologies & international standards for data.
  23. 23. The 5 Stars of Open Linked Data. Guidance per Tim Berners-Lee, W3C
      ★ Make your stuff available on the web (any format)
      ★★ Make it available as structured data (e.g. Excel instead of image scan of a table)
      ★★★ Use a non-proprietary format (e.g. CSV instead of Excel)
      ★★★★ Use URLs to identify things, so that people can point at your stuff
      ★★★★★ Link your data to other people’s data to provide context
      Credit:
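As an illustration of the fourth and fifth stars, here is a minimal sketch that writes two facts in N-Triples, a line-based W3C syntax for RDF, using only the Python standard library. It is not taken from the workshop: every URI under example.org and the `locatedIn` predicate are hypothetical, and the DBpedia URI is assumed here only as an example of a link into someone else's dataset.

```python
# Hypothetical identifiers: everything under example.org is invented for illustration.
HOSPITAL = "http://example.org/id/hospital/1"
LOCATED_IN = "http://example.org/vocab/locatedIn"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Assumed link target in another publisher's dataset (the fifth star).
DBPEDIA_NORFOLK = "http://dbpedia.org/resource/Norfolk,_Virginia"


def uri(value: str) -> str:
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def triple(subject: str, predicate: str, obj: str) -> str:
    """Format one statement; obj must already be a formatted term."""
    return f"{uri(subject)} {uri(predicate)} {obj} ."


triples = [
    # Fourth star: the hospital is identified by a URL that others can point at.
    triple(HOSPITAL, RDFS_LABEL, literal("Example General Hospital")),
    # Fifth star: the hospital is linked to a resource published by someone else.
    triple(HOSPITAL, LOCATED_IN, uri(DBPEDIA_NORFOLK)),
]

if __name__ == "__main__":
    print("\n".join(triples))
```

Each output line is one subject-predicate-object statement, so a consumer can merge this file with data from other publishers without agreeing on a shared schema first.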
  24. 24. 5 Stars of Open Linked Vocabularies. Guidance per Bernard Vatant (Mondeca)
      ★ Publish your vocabulary on the Web at a stable URI
      ★★ Provide human-readable documentation and basic metadata (e.g. creator, publisher, date of creation, last modification, version number)
      ★★★ Provide labels and descriptions, if possible in several languages, to make your vocabulary usable in multiple linguistic scopes
      ★★★★ Make your vocabulary available via its namespace URI, both as a formal file and human-readable documentation, using content negotiation
      ★★★★★ Link to other vocabularies by re-using elements rather than re-inventing
      Credit:
  25. 25. Why is RDF important? • It is an international standard for publishing data on the Web (public and private) • Data exchange model • It is the future of the Web • ... because it is how we share and reuse data
      Leading publishers, HCLS scientists, library scientists, new media, old media and retailers have all committed to structured data for improved search & access.
  26. 26. WE’VE SEEN THIS BEFORE
      Like HTML and RDF, credit cards have a human-readable side and a machine-readable side.
  27. 27. Each HTML page is paired with a machine-readable data representation.
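One common way to pair a page with its data is HTTP content negotiation: the same URL returns HTML or RDF depending on the `Accept` header. The sketch below only builds the two requests with Python's standard library and sends nothing over the network. The resource URL is hypothetical, and whether a given server honours `text/turtle` is an assumption that has to be checked per service.

```python
import urllib.request

# Hypothetical resource URL; example.org does not actually serve RDF.
RESOURCE_URL = "http://example.org/id/hospital/1"


def build_request(url: str, want_data: bool) -> urllib.request.Request:
    """Build a GET request asking for Turtle (data) or HTML (page)."""
    accept = "text/turtle" if want_data else "text/html"
    return urllib.request.Request(url, headers={"Accept": accept})


# Same URL, two representations: one for people, one for machines.
page_request = build_request(RESOURCE_URL, want_data=False)
data_request = build_request(RESOURCE_URL, want_data=True)

if __name__ == "__main__":
    # Nothing is sent; this only prints what each request would ask for.
    for req in (page_request, data_request):
        print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing either request to `urllib.request.urlopen` would perform the actual fetch against a server that supports negotiation.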
  28. 28. Open Government Data: 3 brief years ... • Starting in 2008, a few heads of state directed open government data to be published on the Web • In September 2011, Presidents Obama (USA) and Rousseff (Brazil) endorsed the Open Government Partnership • 7 other nations launched their government’s National Plans during the meeting of the UN General Assembly
      Beginning in 2008, a couple of heads of state directed open government data to be published on the Web. In September 2011, President Obama and President Dilma Rousseff stood with other heads of state to endorse the principles of the Open Government Partnership and launch their governments’ Open Government National Plans during the meeting of the UN General Assembly. In addition to Brazil and the US, nations who have made commitments include: Indonesia, Mexico, Norway, Philippines, South Africa, and the UK.
  29. 29. What is next for Data? • Structured data on the Web is rapidly becoming mainstream • Government authorities are funding more Linked Open Data projects, especially for weather, human health and scientific research • In 2012 we’re seeing Apps Challenges, hack-a-thons, funding ($1M-$200M)
      What’s next? We are already seeing signs of the things to come. Structured data on the Web is quickly becoming mainstream. There have been many well-publicized triple challenges, hack-a-thons and apps challenges; they are popping up everywhere. Organizations with mission critical applications based on relational technologies are creating a layer above their traditional architectures and building Linked Data-driven Web apps. Web apps based on LD are beginning to replace data warehouses.
  30. 30. Publishing data in 2012 & beyond ...• Good = Use Data Standards (RDF) to publish metadata about data and models• Better = Use a Linked Data approach to publish all your open data on the Web• Best = Link your data + models using a Linked Data approach• Web architecture, Web-scale
  31. 31. [Diagram: Linked Data Cloud. Labels include Open Government Data, CDC, EPA, DBpedia, US Census, PubMed, NLM, Clinical Ontology, Business Ontology, Social Media (Facebook, Twitter), Internal Data, Portal, Physicians, EMR, Services, Locations, Condition Specific Clinical Data]
  32. 32. Methodology
      1. Define target population and clinical data from electronic medical record
      2. Identify sources of open government data related to environmental, weather, and other variables related to chronic pulmonary disease exacerbations
      3. Combine open content from NLM, PubMed, Medline to support education
      4. Leverage a Linked Data approach, using Open Source and international data exchange standards (RDF)
      5. Alert patient of possible hazardous conditions and recommend appropriate actions
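The last step of the methodology above, alerting a patient to possibly hazardous conditions, can be sketched as a small rule over an air quality reading. This is an illustration, not the Weather Health implementation. The category names and breakpoints follow the EPA Air Quality Index scale, while the `Patient` record and the alert thresholds (above 100 for patients with chronic pulmonary disease, above 150 otherwise) are assumptions made for the example.

```python
from dataclasses import dataclass

# EPA Air Quality Index categories as (inclusive upper bound, name).
AQI_CATEGORIES = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
]


@dataclass
class Patient:
    """Hypothetical anonymized record; not the project's EMR schema."""
    patient_id: str
    has_chronic_pulmonary_disease: bool


def aqi_category(aqi: int) -> str:
    """Map an AQI value to its EPA category name."""
    for upper_bound, name in AQI_CATEGORIES:
        if aqi <= upper_bound:
            return name
    return "Hazardous"


def should_alert(patient: Patient, aqi: int) -> bool:
    """Assumed rule: alert sensitive patients above 100, everyone above 150."""
    threshold = 100 if patient.has_chronic_pulmonary_disease else 150
    return aqi > threshold


def alert_message(patient: Patient, aqi: int) -> str:
    """Return an alert text, or an empty string when no alert is needed."""
    if not should_alert(patient, aqi):
        return ""
    return f"Air quality today is '{aqi_category(aqi)}' (AQI {aqi})."


if __name__ == "__main__":
    print(alert_message(Patient("p1", True), 120))
```

In a real deployment the threshold would be a clinical decision, and the AQI value would come from a live EPA feed rather than a literal.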
  33. 33. Iterative Approach• Initial POC delivered May 2012 (60 day sprint) • EMR (anonymized) • EPA air quality • Doctors listing (spreadsheet)• Demo’d at Health Datapalooza, Washington DC in June
  34. 34. Health Data Initiative Forum III, Health Datapalooza: Using EMR and Linked Open Data to Manage Chronic Asthma and COPD
  35. 35. Conceptual MODEL: Patients with chronic pulmonary disease that are educated and notified of adverse environmental, weather, and geographic conditions are . . . better able to respond and proactively manage their condition.
  36. 36. Value PROPOSITION MODEL • Decrease in costly Emergency Department visits • Reduce hospital re-admissions after treatment • Improve self-care and medication compliance • Awareness of triggers and disease management
  37. 37. Big data ecosystem includes complex data
      A phased approach to delivering a successful Weather Health Explorer application starts with selecting data sources that are both available and reliable. For these reasons, authoritative government sources, including the National Library of Medicine (NLM), the National Oceanic and Atmospheric Administration (NOAA) and the US Environmental Protection Agency (EPA), have been selected for use in this project.
  38. 38. Leverage Linked DATA, OPEN SOURCE & STANDARDS. SEMANTIC FRAMEWORK [Diagram labels: Web of Data, CDC, DBpedia, SMS, EPA, PubMed, US Census, NLM, Email, Web, EMR]
      Callimachus is a Linked Data Management platform that takes full advantage of RDF and data-driven navigation. Created with Web 2.0 developers in mind. Governments are providing citizens access to open government data; corporations can provide the public, customers, suppliers and regulators with timely information on the corporation; research portals etc.
  39. 39. Today’s Asthma Forecast [Screenshot labels: Anticipate and Prevent; Current EPA Data; Patient Admission Data by Date; Historic EPA Data at Admission]
  40. 40. Progress Update• June - Sept 2012 • Designed Weather Health Web application • Identified data sources (NIH, NOAA, EPA) • Created a Web based application with live data feeds from NIH, NOAA & EPA • Hosted on the cloud using a linked data management system, Callimachus
  42. 42. The NLM will function as the primary source for drug-related information. The NLM publishes multiple APIs that could be of use to this project, but the most immediately beneficial will probably be one called DailyMed. DailyMed is an API that offers access to current Structured Product Label (SPL) information for drugs.
  43. 43.
  44. 44. Drug information may also be taken from a service called MedlinePlus, which is organized and distributed by the National Library of Medicine, National Institutes of Health, and the Department of Health and Human Services. Upgrades are currently being done to MedlinePlus which will include the ability to return an XML document as opposed to a search results page. This feature would be extremely useful and, if fully functional, may make MedlinePlus the logical choice for primary drug information.
  45. 45. Hosted on cloud [Architecture diagram labels: Public users; HTTP/HTTPS; Callimachus (application); M2.2XLarge; Additional attached storage, EBS - 50 GB; Periodic snapshots (backup), S3 - 50 GB; Off-site backups; Monitoring Service; Application-level monitoring; System-level monitoring; SNS; Email/SMS notifications; Administration]
  46. 46. In summary, Weather Health ... • Leverages internal and external structured data on the Web • All data from authoritative sources • Involves a combination of static and dynamic data • Hosted on the cloud using AWS • Created using a linked data management system • Callimachus enables Web 2.0 internal or contract developers to combine data sources & quickly build a web UI for Web or mobile devices
      The Weather Health application can also serve to warn patients of drug interactions or advise them on dosage. There is also opportunity for smaller modules within the application, such as pill identification using imprint data. This application was built using Callimachus, a data platform for data-driven applications. Callimachus allows Web 2.0 developers within Sentara or external developers to combine multiple data sources and quickly build a Web UI. The basic architecture for the Weather Health solution involves a combination of both static (or pseudo-static) and dynamic data.
  47. 47. LUNCH BREAK!
  48. 48. Web of Data• Resource oriented approach to data interoperability• Callimachus Overview• Maturity of ecosystem • Development environments, reporting tools, databases, hosting, commercial support & training• Next steps, an iterative approach
  49. 49. A History of Silos • 1970s: A neat little package ($ cat foo.txt | grep blah | sort) • 1980s: Client-Server • 1990s: The Early Web
  50. 50. The Next Great Leap Extending the Ubiquitous, Universal Client reusable applications Expanding the Universal Connection Explaining the Web of Data Logic Providing the Universal Database
  51. 51. Writing Business Applications Data formatted Code written 1970s 1980s 1990s 2000s
  52. 52. Requirements of The Informatics Landscape: Maximal Agility
      • Must span the entire drug development lifecycle
        • and back (post-market surveillance to discovery)
      • Must support large and very heterogeneous data
        • single nucleotide polymorphisms to countries
      • Will change as new science emerges & new regulations come into play
        • Medline just under 1M articles/year
      • Must be able to work with multiple, international regulatory bodies
        • Emerging markets
      • Partners, customers and collaborators will change
        • and will have divergent technical aptitudes
      • Must be able to interoperate with pre-competitive consortia
        • Can they perform common tasks for the community
      • Must be able to work with legacy data
        • Lots of unmined gems here!
      Slide credit: Tom Plaster, PhD, AstraZeneca R&D | RDI
  53. 53. Improving Internal Interoperability. Scientists, Clinicians, Informaticists can now freely interoperate as:
      • The PURL server provides a central identity management authority for resources that are of value (need to persist) across the enterprise. The Persistent URLs are used to connect resources found in multiple locations
      • The vocabulary server provides a way of harmonizing concepts across different domains
        • Where possible, public vocabularies are used
        • Where not, they’re extended
        • We don’t want to develop and maintain vocabularies
      Slide credit: Tom Plaster, PhD, AstraZeneca
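The idea behind a PURL server can be sketched as a lookup table that redirects a persistent path to wherever the resource currently lives. This is a toy model under stated assumptions, not the PURL server's actual API: the paths, the target URLs, and the choice of a 302 redirect are all hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical PURL table: persistent path -> current location of the resource.
PURL_TABLE = {
    "/purl/compound/123": "http://lab-a.example.org/compounds/123",
    "/purl/study/42": "http://archive.example.org/studies/2012/42",
}


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Location) the way a redirecting resolver might."""
    target = PURL_TABLE.get(path)
    if target is None:
        return 404, None
    # 302 is an assumed choice; a real resolver may offer other redirect types.
    return 302, target


def move(path: str, new_target: str) -> None:
    """Repoint an existing PURL; the persistent path itself never changes."""
    if path not in PURL_TABLE:
        raise KeyError(path)
    PURL_TABLE[path] = new_target


if __name__ == "__main__":
    print(resolve("/purl/study/42"))
    move("/purl/study/42", "http://new.example.org/studies/42")
    print(resolve("/purl/study/42"))
```

Because clients only ever store the persistent path, a resource can move between systems without breaking the links that point at it.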
  54. 54. • Callimachus is a framework for data-driven applications based on Linked Data principles• Callimachus allows Web developers to easily create data driven applications for the Web• It is Open Source (FLOSS)•
  55. 55. Tools & best practices?• Large and small vendors are involved in Linked Data • From Oracle, IBM to 3 Round Stones • Listing of active research projects & deployments See • Best practices, see
  56. 56. W3C HCLS. The mission of the Semantic Web Health Care and Life Sciences Interest Group (HCLS IG) is to develop, advocate for, and support the use of Semantic Web technologies across health care, life sciences, clinical research and translational medicine
      • Activities:
        • Continue to develop high level (e.g. TMO) and architectural (e.g. SWAN) vocabularies.
        • Implement proof-of-concept demonstrations and industry-ready code.
        • Document guidelines to accelerate the adoption of the technology.
        • Disseminate information about the group’s work at government, industry and academic events and by participating in community initiatives.
      • Use Cases/Domains
        • Drug Discovery
        • Electronic Lab Notebooks
        • Comparator Arm Data
        • Patient Data Ownership
        • Biotech Acquisition
        • Supply Chain Automation
        • Web Integration
        • Bio-surveillance
        • Co-development
      Reference:
      Slide credit: Tom Plaster, PhD, AstraZeneca
  57. 57. This work is Copyright © 2011-2012 3 Round Stones Inc. It is licensed under the Creative Commons Attribution 3.0 Unported License. Full details at:
      You are free:
      • to Share: to copy, distribute and transmit the work
      • to Remix: to adapt the work
      Under the following conditions:
      • Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
      • Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.
      This presentation is licensed under a Creative Commons BY-SA license, allowing you to share and remix its contents as long as you give us attribution and share alike.