20191119_The OpenAIRE Research Graph

with Paolo Manghi (OpenAIRE)

Published in: Science

  1. 1. The OpenAIRE Research Graph: Bringing scholarly communication back into the hands of scientists. Paolo Manghi, Institute of Information Science and Technologies, Consiglio Nazionale delle Ricerche. @openaire_eu. OpenAIRE-Connect Review, 23rd of April, 2018 - Brussels
  2. 2. Materializing the Open Science Graph. [Diagram labels: Project, Community, Funder, Funding, Product, Publication, Research Data, Software, Other research products, Organization, Source; Harvesting, Guidelines, Mining, Deduplication, End-user feedback, Scientific product catalogue; Research Infrastructures, Publishing, IT] OpenAIRE Advance 1st Review | Luxembourg | 10 Oct 2019
  3. 3. The OpenAIRE research graph: providing an open metadata research graph of interlinked scientific products, with Open Access information, linked to funding information and research communities. Properties: Open, Complete, De-duplicated, Transparent, Participatory, Decentralized, Trusted
  4. 4. De-duplicated: metadata records corresponding to equivalent objects (scientific products, organizations) are merged. More information about the de-duplication framework used by OpenAIRE can be found by searching Zenodo for: • “De-duplicating the OpenAIRE Scholarly Communication Big Graph” (poster) • “GDup: De-Duplication of Scholarly Communication Big Graphs”
  5. 5. Complete: community-trusted sources. Academic Graph … and more
  6. 6. Participatory • Rely on quality scholarly communication sources of different kinds • Include solutions and content from any interested and known content provider in scholarly communication. Source types: Institutional repositories, Aggregators, Data archives, Software repositories, Research infrastructure sources, Funder grant databases, Authors & Orgs entity registries, Publishers & journals
  7. 7. Transparent • Metadata in the graph includes provenance when harvested and reliability indicators when obtained from mining
  8. 8. Decentralized • Preservation and ownership beyond OpenAIRE: exchanged with other graph initiatives • Broker Service: redistributed via subscription and notification to contributing data sources (provide.openaire.eu) • Openly accessible via APIs (develop.openaire.eu)
  9. 9. Trusted (November 2019) • Authors in the loop to enrich their ORCID record • Validation of end-user “claims”
  10. 10. Populating the Graph
  11. 11. Harvesting: Revised Classification of Research Products. Publications: Article, Preprint, Report, … Datasets: Dataset, Collection, Clinical Trials, … Software: Research Software, … Other Research Products: Service, Workflow, Interactive Resource, … Sources: Institutional/publication repositories, Journals/publishers, Data repositories, Other products repositories, Software repositories. Workshop Técnico OpenAIRE / LA Referencia | 29-30 October, 2019 | Costa Rica
  12. 12. Open Science publishing: bridging RIs and Scholarly Communication. Transparency and reproducibility. [Diagram labels: e-Infrastructures and Research Infrastructures (Dataset, Method, Thematic Service, Experiment); Publishing the experiment (Input Dataset, Input Method, Output Dataset, Experiment product, Thematic Service Parameters); Scholarly Communication infrastructure (Experiment repo, Data repo, Method repo, Publications); Research data, Software, Workflows, Publications; IT; Harvesting]
  13. 13. Open Science publishing workflows • EPOS Research Infrastructure. Reproducibility, Transparency, Seamless publishing
  14. 14. Pre-processed sources • Article-dataset links: 480Mi links • CrossRef enriched (DOIBoost, Academic Graph): 85Mi publication records • Published every 6 months (new versions to be published next week)
  15. 15. Context Propagation. [Diagram: entities Product, Source, Country, Project, Organization, Community, Funder; relationships supplementedBy, fundedBy, hostedBy (institutional repository), located, funds (National Funder), jurisdiction, ofInterest; counts shown: 157K, 8Mi, 10K]
  16. 16. Production: Open Access CAPs. BETA: Open Science CAPs. [Bar charts comparing Old CAP and New CAP (PROD vs BETA) for literature, research data, software, and other; values shown: 110Mi, 30Mi, 1Mi, 10Mi, 100K, 180K, 3Mi, 7.5Mi] Harvested content • Data sources 10K+ • Records ~480Mi • Publication full-texts ~12Mi (Springer N. coming) • Links (also text-mined) ~960Mi
  17. 17. Liaisons • Microsoft Research (being drafted) • Unpaywall (ongoing) • ORCID membership (November 2019) • RDA IG Open Science Graphs for FAIR Data: FREYA, ResearchGraph, OpenCitations, Open Knowledge Research Graph; IG Session at RDA Helsinki 2019 (15th of October 2019). Also shown: Academic Graph
  18. 18. BETA Graph Open Consultation (http://beta.explore.openaire.eu) • October-November 2019: OpenAIRE Research Graph open for consultation; collecting feedback via Trello (operational end of September) • December 2019: OpenAIRE Research Graph in production
  19. 19. Trello for feedback
  20. 20. Thank you! Paolo Manghi paolo.manghi@isti.cnr.it
  21. 21. Architecture, technologies, and infrastructure
  22. 22. Architecture and technologies: today. [Diagram: subsystems: Aggregation, De-duplication, Information Inference, Data provision, Publishing, Data Monitoring, Front-end; steps: Collect, Transform, Clean, Populate, Identify equivalent products and organisations, Merge equivalent objects, Extract full-text, Enrich graphs with links, Propagation, Text-mining of the full-texts and the graph to derive new semantic links; data: Data Sources, Metadata records, files, cleaned records, Full-text cache, Native graph “slices”, Native graph, Action Sets (similarity rels), Deduped graph, Copy of deduped graph, Action Set (inferred links), Enriched graph]
  23. 23. Task 9.1. System administration - infrastructure. Before Jan 2018: Public System 20 srv, 122 CPU, 320 GB, 8 TB; Mining System 21 srv, 406 CPU, 2 TB, 385 TB; Data provision System 23 srv, 154 CPU, 430 GB, 23 TB; Testing System 5 srv, 30 CPU, 100 GB, 3 TB. Second set of figures: Public System 44 srv, 274 CPU, 905 GB, 20 TB; Mining System 22 srv, 414 CPU, 2.2 TB, 388 TB; Data provision System 23 srv, 154 CPU, 430 GB, 24 TB; Testing System 14 srv, 86 CPU, 302 GB, 9 TB
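
Slide 4 states that metadata records corresponding to equivalent objects are merged, and slide 22 mentions action sets of similarity relationships. The following is a minimal Python sketch of that idea, assuming records are plain dictionaries and similarity arrives as pairs of record ids. The function name `merge_equivalent`, the field names, and the sample identifiers are hypothetical, and the grouping is a generic union-find, not the GDup framework itself.

```python
from collections import defaultdict


def merge_equivalent(records, similarity_pairs):
    """Group records linked by similarity pairs and merge each group.

    records: dict mapping a record id to a dict of metadata fields.
    similarity_pairs: iterable of (id_a, id_b) judged equivalent.
    Returns a dict mapping a representative id to the merged record.
    """
    parent = {rid: rid for rid in records}

    def find(x):
        # Follow parent links to the group root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similarity_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller id as the representative (an arbitrary choice).
            parent[max(root_a, root_b)] = min(root_a, root_b)

    groups = defaultdict(list)
    for rid in records:
        groups[find(rid)].append(rid)

    merged = {}
    for root, members in groups.items():
        # Record which originals were merged, then take the first value
        # seen for each field, visiting members in sorted id order.
        representative = {"merged_from": sorted(members)}
        for rid in sorted(members):
            for field, value in records[rid].items():
                representative.setdefault(field, value)
        merged[root] = representative
    return merged


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = {
        "r1": {"title": "GDup", "doi": "10.0000/example"},
        "r2": {"title": "GDup"},
        "r3": {"title": "Another product"},
    }
    print(merge_equivalent(sample, [("r1", "r2")]))
```

Keeping the list of merged ids on the representative is one way to retain a trace of the original records; how OpenAIRE actually resolves conflicting field values is not described in the slides.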
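
Slide 15 names relationships such as supplementedBy and fundedBy under the heading Context Propagation, but does not spell out the propagation rules. As a rough illustration only, the sketch below assumes one possible rule: if a product is fundedBy a project and is supplementedBy another product, the supplementing product inherits the same fundedBy link. The rule, the function name `propagate_funding`, and the sample identifiers are assumptions, not taken from the slides.

```python
def propagate_funding(edges):
    """Infer new fundedBy edges along supplementedBy relationships.

    edges: list of (source, relation, target) triples.
    Returns a sorted list of inferred triples that are not already present.
    This is a single pass; it does not chain through several hops.
    """
    # Index the projects that fund each product.
    funded_by = {}
    for source, relation, target in edges:
        if relation == "fundedBy":
            funded_by.setdefault(source, set()).add(target)

    existing = set(edges)
    inferred = set()
    for source, relation, target in edges:
        if relation != "supplementedBy":
            continue
        # Assumed rule: the supplementing product inherits the funding context.
        for project in funded_by.get(source, ()):
            candidate = (target, "fundedBy", project)
            if candidate not in existing:
                inferred.add(candidate)
    return sorted(inferred)


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    graph = [
        ("publication1", "fundedBy", "projectA"),
        ("publication1", "supplementedBy", "dataset1"),
    ]
    print(propagate_funding(graph))
```

Returning only the new triples keeps inferred links separate from harvested ones, in the spirit of the separate action set of inferred links shown on slide 22.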
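
The figures on slide 23 can be totalled to compare the two sets of numbers. The sketch below copies the server, CPU, and storage values from the slide and assumes that the second set describes the infrastructure after January 2018, which the transcript does not state explicitly. RAM is left out because the slide mixes GB and TB values; the names `totals` and `growth` are hypothetical.

```python
# (servers, CPUs, storage in TB) per system, copied from slide 23.
BEFORE_JAN_2018 = {
    "Public": (20, 122, 8),
    "Mining": (21, 406, 385),
    "Data provision": (23, 154, 23),
    "Testing": (5, 30, 3),
}

# Assumption: the second, unlabeled set of figures is the later state.
SECOND_SET = {
    "Public": (44, 274, 20),
    "Mining": (22, 414, 388),
    "Data provision": (23, 154, 24),
    "Testing": (14, 86, 9),
}


def totals(systems):
    """Sum servers, CPUs, and storage across all systems."""
    return tuple(sum(column) for column in zip(*systems.values()))


def growth(before, after):
    """Difference in totals between two sets of figures."""
    return tuple(a - b for b, a in zip(totals(before), totals(after)))


if __name__ == "__main__":
    print("before:", totals(BEFORE_JAN_2018))
    print("second set:", totals(SECOND_SET))
    print("difference:", growth(BEFORE_JAN_2018, SECOND_SET))
```

Under that assumption, most of the added servers and CPUs fall in the Public and Testing systems, while the Mining and Data provision systems change little.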
