II-SDV 2017: The Springer Nature SciGraph – Building a Linked Data Knowledge Graph for the Scholarly Publishing Domain

We are pleased to introduce Springer Nature SciGraph, the new Linked Open Data platform aggregating data sources from Springer Nature and key partners from the scholarly domain. The platform will initially collate information from across the research landscape, such as funders, research projects, conferences, affiliations and publications. Additional data, such as citations, patents, clinical trials and usage numbers, will follow over time. This high-quality data from trusted and reliable sources provides a rich semantic description of how information is related and enables innovative visualizations of the scholarly domain.

Published in: Internet

  1. 1. Building a Linked Data Knowledge Graph for the Scholarly Publishing Domain | II-SDV | Markus Kaindl | Nice, France | April 25, 2017
  2. 2. SN SciGraph | II-SDV 2017 | Markus Kaindl | Nice, France | April 25 2017
      Agenda
      • Intro: Springer Nature; SN SciGraph
      • Motivation: Integration & Discoverability; Linked Open Data Publishing
      • Roadmap: Status Report; Looking Ahead
  3. 3. #1 Intro: Springer Nature; SN SciGraph
  4. 4. #1.1 Intro: Springer Nature; SN SciGraph
  5. 5. Formed in May 2015 through the merger of Nature Publishing Group, Palgrave Macmillan, Macmillan Education and Springer Science+Business Media
  6. 6. [Pre-Merger] Springer Science+Business Media brands
  7. 7. [Pre-Merger] Macmillan Science & Education brands
  8. 8. We publish a lot of science (since 1815): 13M documents; 7M articles, 4M chapters; 4k journals, 700k books
  9. 9. A Rich History (timeline, 2012-2016): NPG Linked Data Platform; Nature Ontologies Portal; Springer Materials; Springer Conferences; Subject Pages; SciGraph prototype; Nero Project; Linnaeus Project; Springer Protocols; CURI Semantic Annotation Project
  10. 10. #1.2 Intro: Springer Nature; SN SciGraph
  11. 11. Growing Landscape of Knowledge Graphs
  12. 12. Product Vision: We create the largest state-of-the-art linked data aggregation platform for the scholarly domain. In doing so, we increase content discoverability and provide data tools and services for researchers, authors, editors, librarians, data scientists, funders, conference organizers, and many others by adding value across all content types.
  13. 13. From Content to Data
      • Content base (PDF, XML, EPUB, HTML, TIFF): we publish content
      • Knowledge Graph: we manage knowledge
  14. 14. Three areas of knowledge we care about (diagram relations: reads / writes, is about, interested in)
  15. 15. Diagram relations: Located at, In proceedings, Cites, Has learning resource, Has topic
  16. 16. #2 Motivation: Integration & Discoverability; Linked Open Data Publishing
  17. 17. #2.1 Motivation: Integration & Discoverability; Linked Open Data Publishing
  18. 18. What a KG enables: overview
      1. INTEGRATION via Linked Data
         • Springer Nature unified domain model and bibliographic data (journals, articles, books, chapters, protocols, MRWs, etc.)
         • Ingestion and normalization of third-party datasets
         • Consolidation of existing Linked Data efforts
      2. DISCOVERABILITY
         • Metadata delivery & validation
         • Better end-user applications
         • Linked Dataset publishing
  19. 19. Publishing Life Cycle (Researcher): Research / Manuscript Creation, Manuscript Submission, Peer Review / Proposal Stage, Planning, Production, Publication, Distribution / Sales, Discovery
  20. 20. Increase Discoverability → New services
      • New platform by integrating existing Linked Data activities
      • Conference Proceedings
      • Nature.com ontologies
      • Semantic search on our platforms
      • handling morphological variations, synonyms, generalisations, etc.
      • metadata provided by graph and deposited into search back ends
      • Dynamic semantic publishing
      • Repurposing content based on context and metadata
      • Linking article and non-article content (e.g. research data)
      • Provide rich web metadata snippets for SEO (e.g. to improve ranking by using Schema.org in JSON-LD)
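The last bullet on this slide, rich web metadata snippets using Schema.org in JSON-LD, can be illustrated with a minimal Python sketch. It is only an assumption of what such a snippet might look like: the function names, the choice of properties and all record values are placeholders, not SciGraph's actual output.

```python
import json


def article_jsonld(title, doi, authors, journal, date_published):
    """Build a Schema.org ScholarlyArticle description as a JSON-LD dict."""
    return {
        "@context": "https://schema.org",
        "@type": "ScholarlyArticle",
        "headline": title,
        "sameAs": f"https://doi.org/{doi}",
        "author": [{"@type": "Person", "name": name} for name in authors],
        "isPartOf": {"@type": "Periodical", "name": journal},
        "datePublished": date_published,
    }


def as_script_tag(data):
    """Wrap the JSON-LD in the script element that crawlers look for."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Placeholder values for illustration only, not a real record.
example = article_jsonld(
    title="An Example Article",
    doi="10.0000/example.2017.001",
    authors=["A. Author", "B. Author"],
    journal="Journal of Examples",
    date_published="2017-04-25",
)
snippet = as_script_tag(example)
print(snippet)
```

Embedding the resulting script element in an article's HTML page is the usual way to expose such metadata to search engines.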
  21. 21. Increase Discoverability → Indexing control
      • Improving coverage of external indexing, e.g. on Scopus or Web of Science
      • For the first time, we check automatically whether content was ingested by a third party
      • Include other online databases, such as selected discovery services
      • Extend this check to all books & chapters plus journal articles
  22. 22. #2.2 Motivation: Integration & Discoverability; Linked Open Data Publishing
  23. 23. The Web of Data
      • From hypertext pages to the Web of Data
      • TED Talk, Tim Berners-Lee: The Next Web
      • Data is relationships, not only properties
      • The more data you connect → the more you will find out
  24. 24. The Web of Data: Why publish Linked Open Data?
      • Increase discoverability and ranking through linking
      • “The future library catalogue is the Internet!”
      • Drive traffic and usage to our platforms
      • Make our data machine-readable
      • Be part of the LOD cloud
  25. 25. The Global Linked Data Cloud (2017): http://lod-cloud.net/
  26. 26. Linked Open Data Publishing: making science more accessible. Feb. 2017 Data Release
      At a glance:
      • 150M triples / 32 GB download size
      • CC-BY-NC license
      Metadata about:
      • Articles 2012-2016 (5M) + abstracts
      • Grants (200k)
      • Journals (3k)
      • Subjects (3k)
      • Core Ontology
  27. 27. Linked Open Data Publishing: making science more accessible. Upcoming Data Releases:
      1) Complete SN article archive + revised licensing strategy (based on user feedback)
      2) Core metadata for books & chapters + conference series and event info
  29. 29. Promotion Activities around Data Release
      1. Press release on March 9 (before London Book Fair), announcing the project, pointing to portal page and data
         → position ourselves as open research supporter
         → encourage usage of dataset in the linked data community as early on as possible
      2. London Hackathon, focused on these datasets, together with selected partners and co-organized with Digital Science
  30. 30. #3 Roadmap: Status Report; Looking Ahead
  31. 31. #3.1 Roadmap: Status Report; Looking Ahead
  32. 32. Summary
      > Collaborative effort: Springer Nature ∞ Digital Science
      > Supporting internal use cases, but also contributing to an emerging web of linked science data
      > Semantic Web technology stack allows for scalable and expressive enterprise-level metadata management
      > Not just publications but a wealth of other related data
  33. 33. (Data) Collaboration is Key
  34. 34. ETL Architecture: main features [in evolution]
      Tech stack
      > Airflow framework (Airbnb)
      > Amazon S3 to make backups
      > GraphDB triplestore (staging and presentation)
      > Elasticsearch and APIs
      Components & Principles
      > Graph must be ‘ephemeral’
      > Data source versioning algorithm
      > Identity Persistence service
      > Validation via SHACL (TopBraid API)
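One reading of the "ephemeral graph" and "validation" principles on this slide is that the graph is never patched in place: it is rebuilt from validated source snapshots whenever needed. The following Python sketch illustrates that idea under stated assumptions. The `Shape` class is a toy stand-in for a SHACL node shape (it only checks required predicates), and all identifiers such as `ex:Article` are hypothetical; the slides do not show the actual pipeline code, which runs on Airflow, GraphDB and the TopBraid SHACL API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    """Toy stand-in for a SHACL node shape: required predicates per type."""
    target_type: str
    required: frozenset


def validate(triples, shapes):
    """Return (subject, missing_predicate) pairs for every violated shape."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, {}).setdefault(p, set()).add(o)
    violations = []
    for shape in shapes:
        for s, props in by_subject.items():
            if shape.target_type in props.get("rdf:type", set()):
                for pred in sorted(shape.required - set(props)):
                    violations.append((s, pred))
    return violations


def build_graph(sources, shapes):
    """Rebuild the whole graph from source snapshots; nothing is patched in place."""
    graph = set()
    for name in sorted(sources):
        problems = validate(sources[name], shapes)
        if problems:
            raise ValueError(f"source {name!r} failed validation: {problems}")
        graph.update(sources[name])
    return graph


# Hypothetical source snapshots and shapes, for illustration only.
sources = {
    "articles": [
        ("ex:article1", "rdf:type", "ex:Article"),
        ("ex:article1", "ex:title", "An Example Article"),
        ("ex:article1", "ex:publishedIn", "ex:journal1"),
    ],
    "journals": [
        ("ex:journal1", "rdf:type", "ex:Journal"),
        ("ex:journal1", "ex:title", "Journal of Examples"),
    ],
}
shapes = [
    Shape("ex:Article", frozenset({"ex:title"})),
    Shape("ex:Journal", frozenset({"ex:title"})),
]
graph = build_graph(sources, shapes)
print(len(graph), "triples loaded")
```

Because the graph is a pure function of its inputs, rebuilding from the same snapshots yields the same graph, which is what makes it safe to discard and regenerate.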
  35. 35. Progress with speeding up our ETL code (processing metadata for 5.5 million articles)
      • September 2016: 11 days
      • October 2016: 24 hours
      • November 2016: 11 hours
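The timings on this slide imply a sizeable speedup: 11 days is 264 hours, so the October run was 11 times faster than September and the November run 24 times faster. The short Python calculation below makes that arithmetic explicit. The throughput figures are derived here from the slide's numbers and are not stated in the talk.

```python
HOURS_PER_DAY = 24
ARTICLES = 5_500_000

# Run times for processing metadata for 5.5 million articles, from the slide.
runs = [
    ("September 2016", 11 * HOURS_PER_DAY),  # 11 days expressed in hours
    ("October 2016", 24),
    ("November 2016", 11),
]

baseline = runs[0][1]
report = []
for label, hours in runs:
    speedup = baseline / hours
    per_second = ARTICLES / (hours * 3600)
    report.append((label, hours, speedup, per_second))
    print(f"{label}: {hours} h, {speedup:.0f}x vs. September, "
          f"about {per_second:.0f} articles/s")
```

In throughput terms, this corresponds to roughly 6 articles per second in September, 64 in October and 139 in November.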
  36. 36. #3.2 Roadmap: Status Report; Looking Ahead
  37. 37. Ongoing Work
      ● Build internal/external APIs (JSON-LD)
      ● Release more data (all articles, books & chapters) in other formats with refined license (CC0)
      ● Create tools for analytics, reporting, visualization and interactive exploration of the graph
      ● Entity extraction: chemical substances, places, people, events, links to research data
  39. 39. Data Roadmap 2017
      1. Journals + Articles data
      2. Institutions (GRID)
      3. Books + Chapters data
      4. Field of Research categories (FOR)
      5. Conferences
      6. Disambiguated authors
      7. Citations / References
      8. Research grants and OA funding information
      9. Download + reader numbers
      10. Concepts + chemical substances
      11. Patents + clinical trials
      12. Links to research datasets
  40. 40. Explore the data both as Graph and via Linked Data Browser
  43. 43. Thanks
      Markus Kaindl, Senior Manager Semantic Data & Product Owner SN SciGraph
      Email: markus.kaindl@springernature.com
      SN SciGraph: http://www.springernature.com/scigraph
      Public Forum: https://groups.google.com/forum/#!forum/scigraph-public
  45. 45. APPENDIX
  47. 47. ETL Architecture (Named Graphs)
      ✴ Extraction
      ✴ Validation
      ✴ Identity Persistence
      ✴ Updating / Replacing named graphs
      ✴ Versioning service (md5 checksum, timestamps, origin version, etc.)
      ✴ Integration (union graph)
      ✴ Inference
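The versioning and named-graph steps listed on this slide can be sketched in Python as follows. This is a minimal in-memory illustration, assuming that each data source maps to one named graph and that a graph is replaced only when the md5 checksum of its source payload changes. The class, the `parse_lines` helper and the sample data are hypothetical; the real service works against a triplestore.

```python
import hashlib
from datetime import datetime, timezone


def checksum(payload: bytes) -> str:
    """md5 is used here only for change detection, not for security."""
    return hashlib.md5(payload).hexdigest()


class NamedGraphStore:
    """Toy in-memory store: one named graph per source, plus version metadata."""

    def __init__(self):
        self.graphs = {}    # graph name -> set of triples
        self.versions = {}  # graph name -> md5, timestamp, origin version

    def load(self, name, payload, origin_version, parse):
        """Replace the named graph only if the source payload has changed."""
        digest = checksum(payload)
        current = self.versions.get(name)
        if current is not None and current["md5"] == digest:
            return False  # unchanged source: keep the existing graph
        self.graphs[name] = set(parse(payload))  # replace, never merge in place
        self.versions[name] = {
            "md5": digest,
            "loaded_at": datetime.now(timezone.utc).isoformat(),
            "origin_version": origin_version,
        }
        return True

    def union_graph(self):
        """Integration step: the union of all named graphs."""
        merged = set()
        for triples in self.graphs.values():
            merged |= triples
        return merged


def parse_lines(payload):
    """Hypothetical parser: one whitespace-separated triple per line."""
    for line in payload.decode("utf-8").splitlines():
        if line.strip():
            s, p, o = line.split(maxsplit=2)
            yield (s, p, o)


store = NamedGraphStore()
v1 = b"ex:article1 ex:title Example\n"
v2 = b"ex:article1 ex:title Example\nex:article2 ex:title Another\n"
changed_first = store.load("articles", v1, "2017-02", parse_lines)
changed_again = store.load("articles", v1, "2017-02", parse_lines)
changed_update = store.load("articles", v2, "2017-03", parse_lines)
print(changed_first, changed_again, changed_update)
```

Replacing a whole named graph per source keeps updates simple: stale triples from an older source version cannot linger, and the union graph is always the sum of the current versions.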
  48. 48. With our Knowledge Graph we are able to show which books in our database, within a given set of copyright years and collections, are from authors who co-authored with persons from a given institution
  49. 49. With our Knowledge Graph we are able to show top universities/institutions with many SN authors within a given set of copyright years
  50. 50. With our Knowledge Graph we are able to show how many affiliated researchers choose to publish with SN, and how often they use their licensed content in their citation practice
  51. 51. With our Knowledge Graph we are able to show funding information grouped by subject area in all available taxonomies (FOR, PMC, Nature subjects, LCC, Dewey, etc.)
  52. 52. With our Knowledge Graph we are able to show which conferences take place in subject areas that show an increase of funding over the last 5 years in a given geographical area
  53. 53. With our Knowledge Graph we are able to show grants in a given country, grouped by subject area in all relevant taxonomies (e.g. product market codes), over the last x years
  54. 54. With our Knowledge Graph we are able to show which Springer Nature articles cite researchers from a given university per subject area
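Questions like the ones on these appendix slides come down to following relations through the graph and aggregating the results. As a rough illustration, the Python sketch below answers the "top institutions with many SN authors within a given set of copyright years" question over a handful of in-memory triples. The predicate names (`copyrightYear`, `hasAuthor`, `affiliatedWith`) and all identifiers are hypothetical, since the slides do not show the SciGraph ontology; in practice such a question would presumably be posed as a query against the triplestore.

```python
from collections import defaultdict

# Hypothetical relations and identifiers, for illustration only.
triples = [
    ("pub:1", "copyrightYear", "2015"),
    ("pub:1", "hasAuthor", "person:a"),
    ("pub:1", "hasAuthor", "person:b"),
    ("pub:2", "copyrightYear", "2016"),
    ("pub:2", "hasAuthor", "person:c"),
    ("pub:3", "copyrightYear", "2012"),
    ("pub:3", "hasAuthor", "person:d"),
    ("person:a", "affiliatedWith", "org:x"),
    ("person:b", "affiliatedWith", "org:y"),
    ("person:c", "affiliatedWith", "org:x"),
    ("person:d", "affiliatedWith", "org:y"),
]


def objects(subject, predicate):
    """All objects linked from a subject via a predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def top_institutions(years):
    """Rank institutions by distinct authors of publications in the given years."""
    authors_by_org = defaultdict(set)
    pubs = {s for s, p, o in triples if p == "copyrightYear" and o in years}
    for pub in pubs:
        for author in objects(pub, "hasAuthor"):
            for org in objects(author, "affiliatedWith"):
                authors_by_org[org].add(author)
    counts = {org: len(authors) for org, authors in authors_by_org.items()}
    # Sort by author count (descending), then by name for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


ranking = top_institutions({"2015", "2016"})
print(ranking)
```

The other questions follow the same pattern with different relations, for example joining grants to subject areas or citations to affiliations.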
