Linked data and the future of scientific publishing


Presentation to NFAIS Webinar on "Linked Data: What It Is, What It Does and The Future of Information Discovery", delivered 2012-10-25.

Published in: Technology

  1. Linked Data and the Future of Scientific Publishing
     Bradley P. Allen, Elsevier Labs
     Presentation to NFAIS Webinar: “Linked Data: What It Is, What It Does and The Future of Information Discovery”
     2012-10-25
  2. Scientific knowledge in a post-print world
     “Our new knowledge does not consist of a careful set of works that have passed through a series of gates. … Our new knowledge is not even a set of works. It is an infrastructure of connection.”
     David Weinberger. 2011. Too Big to Know: Rethinking Knowledge Now That the Facts Aren’t the Facts, Experts Are Everywhere, and the Smartest Person in the Room Is the Room. Basic Books, New York, NY.
  3. “Infrastructure of connection” = linked data
     Type of data: what the literature is about
       Content inputs: XML; long-form free text; short-form free text; tables; images; video; audio
       Linked data outputs: asset metadata; citations; classifications; clusters; entities; relations; language models; probabilistic graphical models
       Benefits: better discoverability; better visualization and understandability; better integration for use in information solutions
     Type of data: how the literature is being used
       Content inputs: article views; search queries; user behavior; social media streams; user interest profiles
       Linked data outputs: article-level metrics; sentiment analysis; ranking and impact metrics
       Benefits: provides the researcher insight about her career; provides institutions data about their performance and impact; provides publishers data for optimizing our business
  4. Linked data as standards and best practices
     “Linked data is just a term for how to publish data on the web while working with the web. And the web is the best architecture we know for publishing information in a hugely diverse and distributed environment, in a gradual and sustainable way.”
     Jeni Tennison. 2010. Why Linked Data for 140
       1. Use URIs as names for things
       2. Use HTTP URIs so that people can look up those names
       3. When someone looks up a URI, provide useful information, using the standards
       4. Include links to other URIs, so that they can discover more things
     Tim Berners-Lee. 2006. Linked Data
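The four rules on slide 4 can be illustrated with a toy, in-memory "web of data". This is only a sketch: the `WEB` dictionary stands in for real HTTP lookups, and every `example.org` URI and value in it is invented for illustration. The Dublin Core and FOAF property URIs are real vocabulary terms.

```python
# Toy "web of data": every key is an HTTP URI used as a name (rules 1 and 2).
# All example.org URIs and the values below are invented for illustration.
WEB = {
    "http://example.org/article/1": [
        ("http://purl.org/dc/terms/title", "A study of lipid metabolism"),
        ("http://purl.org/dc/terms/creator", "http://example.org/person/42"),
    ],
    "http://example.org/person/42": [
        ("http://xmlns.com/foaf/0.1/name", "A. Researcher"),
    ],
}


def look_up(uri):
    """Rule 3: looking up a URI returns useful statements about it."""
    return WEB.get(uri, [])


def discover(start_uri):
    """Rule 4: follow links to other URIs to discover more things."""
    seen, queue = [], [start_uri]
    while queue:
        uri = queue.pop(0)
        if uri in seen:
            continue
        seen.append(uri)
        for _predicate, value in look_up(uri):
            # A value that is itself a known URI is a link worth following.
            if value in WEB:
                queue.append(value)
    return seen


if __name__ == "__main__":
    for uri in discover("http://example.org/article/1"):
        print(uri, look_up(uri))
```

Starting from the article URI, the traversal reaches the author record only because the article's statements link to it by URI rather than by a plain-text name.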
  5. Scientific publication as linked data
     [Diagram labels: Acquire; Transform, Enhance, Index, Analyze, Compose; Deliver; Document; Media object; Entity record; Asset metadata; Relational metadata; Provenance metadata; Linked data]
  6. Linked data is increasingly important in science
  7. The challenge for publishers
       • Create greater online engagement with our content and platform
       • Semantically enrich our content and enhance value of discovery services compared to the same and similar content at other platforms
       • Drive additional usage (in journals and books, in downloads and interactivity)
       • Improve our ability to be a partner in research, and as a publisher that adds value
       • Improve our connection with the scientific community through productive collaborations that improve search and discovery for all researchers
  8. Elsevier’s approach to linked data
       • Expose existing asset and subject metadata as linked data in Web pages to aid discovery
       • Embrace linked data principles while leveraging our existing content production workflow and infrastructure
       • Leverage partners for content enhancement and knowledge organization
       • Reuse Web-standard vocabularies, taxonomies, ontologies and entity resources where possible
       • Collaborate in building needed authoritative resources for identity resolution and metrics
       • Deliver benefits across the complementary use cases of researcher and practitioner
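One way to picture the first bullet on slide 8 (asset and subject metadata exposed as linked data in a Web page) is the sketch below. The slides do not say which syntax or vocabulary was used. JSON-LD and the schema.org `ScholarlyArticle` type are assumptions chosen here for brevity, and the function name, metadata fields and `example.org` URIs are hypothetical.

```python
import json

# Hypothetical article record; every value here is invented for illustration.
EXAMPLE_METADATA = {
    "uri": "http://example.org/article/1",
    "title": "A study of lipid metabolism",
    "authors": ["A. Researcher"],
    "date": "2012-10-25",
    "subjects": ["http://example.org/concept/diabetes"],
}


def to_jsonld_script(metadata):
    """Render asset and subject metadata as an embeddable JSON-LD script tag."""
    doc = {
        "@context": "https://schema.org",
        "@type": "ScholarlyArticle",
        "@id": metadata["uri"],
        "headline": metadata["title"],
        "author": [{"@type": "Person", "name": n} for n in metadata["authors"]],
        "datePublished": metadata["date"],
        # Subject metadata: each concept is named by a URI, not just a label.
        "about": [{"@id": uri} for uri in metadata["subjects"]],
    }
    body = json.dumps(doc, indent=2)
    # Escape "</" so a string value cannot close the script element early.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    print(to_jsonld_script(EXAMPLE_METADATA))
```

Because the subjects are URIs from a shared vocabulary, a crawler or discovery service can connect this page to other pages about the same concept without parsing the article text.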
  9. Creating smart content by extracting & linking
     [Diagram labels: Asset Metadata; Usage; Entities; Citations; Relations]
  10. Methods for extracting and linking content & data
      First column:
        • Very mature, but hard to scale
        • Crowdsourcing is a possible solution, but quality control is a challenge
      Second column:
        • Variable degrees of maturity, but huge strides through machine learning research and practical application on the consumer Internet
        • Data-driven, so the more data the better
        • Models can be used to build applications, can be a new type of publication
      Third column:
        • Language-driven, so challenging to generalize and scale
        • Crucial to realize promise of ease of integration
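As a reference point for the trade-offs on slide 10, the sketch below shows about the simplest form of extracting and linking: a dictionary lookup that maps surface forms in text to concept URIs. It is not a method described in the slides. The vocabulary, the `example.org` URIs and the function name are all hypothetical.

```python
import re

# Hypothetical vocabulary mapping surface forms to concept URIs.
VOCABULARY = {
    "diabetes": "http://example.org/concept/diabetes",
    "hypertension": "http://example.org/concept/hypertension",
    "high blood pressure": "http://example.org/concept/hypertension",
}


def link_entities(text, vocabulary=VOCABULARY):
    """Return (start, end, surface form, URI) for each vocabulary match."""
    # Try longer labels first so multi-word terms win over shorter ones.
    labels = sorted(vocabulary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(label) for label in labels) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), m.group(0), vocabulary[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    sample = "Hypertension and diabetes often co-occur."
    for match in link_entities(sample):
        print(match)
```

A lookup like this only finds the exact labels it was given, which is the kind of limitation that makes hand-built approaches hard to scale and data-driven models attractive.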
  11. Packaging linked data for content production
      [Diagram labels: tag:satelliteWrapper + XML Schema; rdf:RDF + namespaces; sat:Satellite; Concept schemes; SKOS Generator; LDR; RDF Generator; Tags; Region Tags; Statement 1 (Diabetes); Statement 2 (Hypertension); Para1-Statement-1 (Diabetes); Para2-Statement-2 (Hypertension)]
      Example RDF statements:
        • Tags from a taxonomy for a given document
        • Document sections relevant to a given concept
        • Document sections providing answers to a given question
        • Learning objects compliant with a given state educational standard
        • Genes mentioned in a given document
        • Documents supporting or disputing conclusions of a given document
        • Concepts that are in the areas of expertise for a given author
        • ...
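The first example statement on slide 11 (tags from a taxonomy for a given document or region) might be serialized along the lines below. This is a sketch, not the packaging format from the slide: the satellite wrapper and the `sat:` namespace are not reproduced because their definitions are not given, and the document and concept URIs are hypothetical. `dcterms:subject` and `skos:prefLabel` are real vocabulary terms, and the output follows N-Triples syntax.

```python
DCT_SUBJECT = "http://purl.org/dc/terms/subject"
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"


def triple(subject, predicate, obj, literal=False):
    """Format one statement as an N-Triples line."""
    if literal:
        # Escape the characters that would break a quoted literal.
        escaped = (
            obj.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        )
        obj_term = '"' + escaped + '"'
    else:
        obj_term = "<" + obj + ">"
    return "<{}> <{}> {} .".format(subject, predicate, obj_term)


def tag_statements(region_uri, concepts):
    """Yield statements tagging a document region with taxonomy concepts."""
    for concept_uri, label in concepts:
        yield triple(region_uri, DCT_SUBJECT, concept_uri)
        yield triple(concept_uri, SKOS_PREF_LABEL, label, literal=True)


if __name__ == "__main__":
    # Hypothetical region and concept URIs.
    for line in tag_statements(
        "http://example.org/doc/1#para1",
        [("http://example.org/concept/diabetes", "Diabetes")],
    ):
        print(line)
```

Keeping the tags as separate statements about a region URI means they can be produced and updated without touching the XML of the document itself.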
  12. Infrastructure for storing and publishing linked data
      [Architecture diagram labels: Loader (REST); Data Spaces: Annotation Satellites, Asset Satellites, Vocab Satellites, 3rd Party Data; Pipeline Coordination; Pipeline Services (Hadoop EMR): N-Quads, RDF, Ontology, JSON, Transform, Extract, Reasoning, Interlinking, Validation; Discovery Services: Amazon S3, MongoDB, SIREN/SOLR, Virtuoso Triplestore; Discovery API; Analytics; Atom Feed; Admin & Monitoring; Ontology Service; SPARQL Endpoint; A&E Service (REST); Load Balance & Failover (Akamai GTM & Amazon ELB)]
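The role of quads and data spaces on slide 12 can be pictured with a small in-memory stand-in for a triplestore. This sketch assumes that each data space corresponds to a named graph, which the slide does not state. The `QuadStore` class, its methods and all URIs are hypothetical and do not represent the API of Virtuoso or any other product named on the slide.

```python
from collections import namedtuple

# A quad is a triple plus the name of the graph (data space) it belongs to.
Quad = namedtuple("Quad", "subject predicate obj graph")


class QuadStore:
    """In-memory stand-in for a triplestore; graph names act as data spaces."""

    def __init__(self):
        self._quads = set()

    def load(self, quads):
        """Add an iterable of (subject, predicate, obj, graph) tuples."""
        self._quads.update(Quad(*q) for q in quads)

    def match(self, subject=None, predicate=None, obj=None, graph=None):
        """Return quads matching the pattern; None acts as a wildcard."""
        pattern = (subject, predicate, obj, graph)
        return sorted(
            q for q in self._quads
            if all(p is None or p == v for p, v in zip(pattern, q))
        )


if __name__ == "__main__":
    store = QuadStore()
    store.load([
        ("http://example.org/doc/1", "http://purl.org/dc/terms/subject",
         "http://example.org/concept/diabetes",
         "http://example.org/graph/annotations"),
        ("http://example.org/doc/1", "http://purl.org/dc/terms/title",
         "A study of lipid metabolism",
         "http://example.org/graph/assets"),
    ])
    for quad in store.match(graph="http://example.org/graph/annotations"):
        print(quad)
```

Filtering on the graph position is what lets annotation, asset and vocabulary data be loaded by separate pipelines and still be queried together or apart.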
  13. Integrating content & data services with linked data
  14. Delivering linked data through multiple online services
      Organization | Main driver | Example | Benefits | Linked data
      S&T Journals | Making the article more engaging and informative through visualization and linking | Article of the Future | Understanding, Discovery | Entities, Citations, Relations
      Books | Making the book more engaging and informative through visualization and linking | Brain Navigator | Understanding, Discovery | Entities
      A&G Research | Making the discovery of relevant content easier and more engaging | Lipids SciVerse App | Discovery, Integration | Entities, Asset Metadata
      A&G Institutional | Making data about the production and use of scientific content easier to understand | SciVal Spotlight | Understanding | Entities, Citations, Usage
      Corporate Alternative Fuels | Making the exploration of design alternatives easier | Elsevier Biofuels | Discovery | Entities, Citations
      Bibliographical Databases | Automating the indexing of content for traditional discovery channels | Embase | Discovery | Asset Metadata, Entities
      Engineering & Technology | Making the discovery of technology trends and sources easier | Illumin8 | Discovery | Entities, Citations, Relations
      Pharma Biotech | Rich integration of content and data in support of research and design workflows | Target Insights | Discovery, Understanding | Entities, Citations
      HS CDS | Delivering actionable information in the context of medical decision making | Order Sets | Integration | Entities, Relations
      GCR | Making the discovery of relevant medical content easier and contextual | Clinical Key | Discovery | Entities, Asset Metadata
      NHP | Making the delivery and organization of medical content easier to integrate with educational workflows | General Education Platform | Discovery, Integration | Entities, Asset Metadata, Relations
  15. Challenges in implementing linked data
        • Access to content and data
            – Usage data not integrated or leveraged
            – Hard to stage content for modeling and analytics
        • Integration
            – Adoption of standards across silos and legacy systems
            – Globalization/localization of knowledge organization systems
            – Named entity registries for identity resolution for accreditation, provenance and trust
        • Quality control
            – Lack of clean external data
            – Gaps in linked data resources
            – Bugs in knowledge organization systems
        • Production
            – Manually intensive knowledge engineering
            – Balancing production validation and rapid iterative development
            – Relation extraction needed but capabilities are minimal at best
            – Tools for syntactic rather than semantic validation
        • Sharing
            – Culture and legacy
            – Business model disincentives
            – Identifier, URI and namespace governance
        • Human resources
            – Scarcity of data scientists, language engineers
  16. Trends within Elsevier today
        • Increasing acquisition of data and text analytics capabilities
        • Shifting dependence from partners to in-house resources for content enhancement and knowledge organization
        • Innovation in new knowledge organization systems (some through integration of existing ones)
            – Two main design emphases: taxonomy for discovery, ontology for understanding and integration
        • Emergence of shared smart content infrastructure based on linked data principles
  17. Smart content is a bridge to the future of publishing
        • Smart content allows publishers to create new products and services through structuring content for better discovery, insight and utility
            – The value is in the structure, not the content
            – Creating that structure is hard work
            – The kind of hard work that publishers have traditionally focused on
        • Consumer Internet businesses are using text and data mining to add structure to content today… quickly and on the cheap
        • Publishers, societies and libraries both large and small can use the same techniques to follow suit
  18. Thank you
      Bradley P. Allen
      b.allen@elsevier.com
      bradleypallen on twitter, github