
Introduction to OpenAIRE services and the OpenAIRE Research Graph

Introduction to the OpenAIRE services and the OpenAIRE Research Graph from Paolo Manghi - OpenAIRE Open Innovation Call

Published in: Science

  1. 1. OpenAIRE Services. Paolo Manghi, Istituto di Scienza e Tecnologie dell’Informazione, CNR. @openaire_eu
  2. 2. Building the graph and Dashboards. Audiences: Research communities, Researchers (All), Content providers, Innovators, Research managers, Funders. Content Providers: Publications repositories, Data repositories, Hybrid repositories, Registries, OA Journals, Software repositories, Research Infras. Intake: Harvesting, Uploading, Brokering (Guidelines, Terms of Use). Research Graph Services: Validation, Cleaning, De-duplication, Inference. Graph entities: Project, Community, Funder, Funding, Product (Publication, Data, Software, ORP), Organization, Source. OpenAIRE Dashboards.
  3. 3. Architecture and technologies: today. Aggregation subsystem: collect metadata records and files from data sources, transform and clean them into cleaned records, keep a full-text cache, and populate the native graph. De-duplication subsystem: identify equivalent products and organisations (Action Sets of similarity rels) and merge equivalent objects into the deduped graph. Information Inference subsystem: extract full-text and text-mine the full-texts and a copy of the deduped graph to derive new semantic links (Action Set of inferred links); enrich graphs with links; propagation; enriched graph. Data provision subsystem: native graph “slices”. Publishing subsystem: front-end. Data monitoring.
  4. 4. Round-table of Open Source Technologies
  5. 5. Resources. First listing: Public System: 20 srv, 122 CPU, 320 GB, 8 TB; Mining System: 21 srv, 406 CPU, 2 TB, 385 TB; Data provision System: 23 srv, 154 CPU, 430 GB, 23 TB; Testing System: 5 srv, 30 CPU, 100 GB, 3 TB. Second listing: Public System: 44 srv, 274 CPU, 905 GB, 20 TB; Mining System: 22 srv, 414 CPU, 2.2 TB, 388 TB; Data provision System: 23 srv, 154 CPU, 430 GB, 24 TB; Testing System: 14 srv, 86 CPU, 302 GB, 9 TB.
  6. 6. OpenAIRE technical staff (40+ members)
  7. 7. The OpenAIRE Research Graph
  8. 8. Materializing the Open Science Graph. Entities: Project, Community, Funder, Funding, Product (Publication, Research Data, Software, Other res. products), Organization, Source. Processes: Harvesting (Guidelines), Mining, Deduplication, End-user feedback. Research Infrastructures, Publishing, IT. OpenAIRE Advance 1st Review | Luxembourg | 10 Oct 2019
  9. 9. The OpenAIRE research graph: providing an open metadata research graph of interlinked scientific products, with Open Access information, linked to funding information and research communities. Properties: Open, Complete, De-duplicated, Transparent, Participatory, Decentralized, Trusted.
  10. 10. Complete: community-trusted sources (Academic Graph, … and more)
  11. 11. Harvesting/transformation workflows. Data Collection Workflow: for each source (Source A, Source B), a sub-workflow collects native XML and transforms it into cleaned XML, following the Guidelines. Monitoring: Data Quality/Expectations across sources, within sources, etc. • Workflow templates and workflow executions (scheduled) • Provenance • Types of products • Etc. Transformation: • Moving from XML to JSON frameworks: XSLT to JSON, XML to JSON
  12. 12. Fine-grained classification of Research Products. Publications: • Article • Preprint • Report • … Datasets: • Dataset • Collection • Clinical Trials • … Software: • Research Software • … Other Research Products: • Service • Workflow • Interactive Resource • … Sources: Institutional/publication repositories, Journals/publishers, Data repositories, Other Products repositories, Software repositories. OpenAIRE-Advance Review, January 2019
  13. 13. Pre-processed sources. Article-dataset links: 480Mi links. CrossRef enriched: 85Mi publication records (DOIBoost, Academic Graph). Published every 6 months (new versions to be published next week). Generating and maintaining dumps over time • Versions • Incremental
  14. 14. Mining • MapReduce on HDFS/Spark • 13 million full-texts • Java/Python framework. Find new metadata and links: • Identification of links to entities (URLs, PIDs) • Semantics for documents, datasets, software • Semantics of links • Links to web docs • Etc. Collect Open Access PDFs: • Pro-actively collect pre-prints • Identify Open Access versions
  15. 15. Context Propagation. Entities: Product, Source, Country, Project, Organization, Community, Funder. Relationship labels: supplementedBy, fundedBy, hostedBy (institutional repository), located, funds (National Funder), jurisdiction, ofInterest. Figures shown: 157K, 8Mi, 10K.
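The propagation idea on this slide can be sketched in a few lines. This is a minimal sketch, assuming that community tags travel from a product to the products linked to it by supplementedBy; the dictionary layout and the function name `propagate_context` are hypothetical and are not taken from the OpenAIRE code base.

```python
from collections import deque


def propagate_context(contexts, supplemented_by):
    """Spread community tags along supplementedBy links until nothing changes.

    contexts: dict mapping a product id to a set of community ids (assumed layout).
    supplemented_by: dict mapping a product id to the ids of products that supplement it.
    """
    # Work on a copy so the caller's data is left untouched.
    result = {pid: set(tags) for pid, tags in contexts.items()}
    queue = deque(result)
    while queue:
        pid = queue.popleft()
        for target in supplemented_by.get(pid, []):
            current = result.setdefault(target, set())
            missing = result[pid] - current
            if missing:
                current |= missing
                queue.append(target)  # the target changed, so revisit its links
    return result


# Hypothetical example: a publication supplemented by a dataset,
# which is in turn supplemented by a piece of software.
contexts = {"pub1": {"community-a"}, "data1": set(), "soft1": set()}
links = {"pub1": ["data1"], "data1": ["soft1"]}
enriched = propagate_context(contexts, links)
```

The loop stops once no product gains a new tag, so cycles in the link graph do not cause endless iteration.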
  16. 16. De-duplication (BETA Content). Deduplication techniques (MapReduce based, Java) • Improving results by adding context. More information about the de-duplication framework used by OpenAIRE can be found by searching on Zenodo for: • “De-duplicating the OpenAIRE Scholarly Communication Big Graph” (poster) • “GDup: De-Duplication of Scholarly Communication Big Graphs”
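To make the de-duplication step concrete, here is a minimal single-machine sketch of a common pattern: block records by a cheap key, compare titles pairwise inside each block, and merge matches into groups. It is an illustration only, not the GDup framework; the blocking key (first word of the title), the 0.9 threshold, and the function names are assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(title):
    """Lower-case the title and keep only letters, digits, and spaces."""
    return "".join(ch for ch in title.lower() if ch.isalnum() or ch == " ").strip()


def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def deduplicate(records, threshold=0.9):
    """Group record ids whose normalized titles are near-identical.

    records: dict mapping a record id to its title.
    """
    # Blocking: only records sharing the first title word are compared.
    blocks = {}
    for rid, title in records.items():
        words = normalize(title).split()
        blocks.setdefault(words[0] if words else "", []).append(rid)

    parent = {rid: rid for rid in records}
    for ids in blocks.values():
        for a, b in combinations(ids, 2):
            ratio = SequenceMatcher(None, normalize(records[a]), normalize(records[b])).ratio()
            if ratio >= threshold:
                parent[find(parent, a)] = find(parent, b)  # merge the two groups

    groups = {}
    for rid in records:
        groups.setdefault(find(parent, rid), set()).add(rid)
    return sorted(groups.values(), key=sorted)


records = {
    "r1": "GDup: De-Duplication of Scholarly Communication Big Graphs",
    "r2": "GDup: de-duplication of scholarly communication big graphs.",
    "r3": "De-duplicating the OpenAIRE Scholarly Communication Big Graph",
}
groups = deduplicate(records)
```

Blocking keeps the number of pairwise comparisons manageable, which is the reason frameworks of this kind can be distributed with MapReduce.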
  17. 17. Production: Open Access CAPs. BETA: Open Science CAPs. [Bar charts comparing Old CAP and New CAP (labelled PROD and BETA) for literature, research data, software, and other; bar labels: 110Mi, 30Mi, 1Mi, 10Mi, 100K, 180K, 3Mi, 7Mi.] Harvested content • Data sources: 12K+ • Records: 450Mi • Publication full-texts: 11,6Mi (Springer N. coming) • Links (also text-mined): 680Mi
  18. 18. How to access the services
  19. 19. API and access. Bulk: OAI-PMH; dumps in Zenodo for large datasets. HTTP Search: Search REST APIs. Linked Open Data: SPARQL, LOD dumps. http://develop.openaire.eu Average unique visitors per month: 25,000. Average hits per month: 2,2Mi. Workshop Técnico OpenAIRE / LA Referencia | 29-30 October, 2019 | Costa Rica
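For the bulk route, a client pages through OAI-PMH `ListRecords` responses using resumption tokens. The sketch below builds the request URLs and parses one page; it assumes a placeholder base URL (`https://example.org/oai`, not an OpenAIRE endpoint) and a hand-written sample response, and it makes no network call.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request; a resumption token replaces all other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return the record identifiers on a page and the token for the next page, if any."""
    root = ET.fromstring(xml_text)
    identifiers = [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hand-written sample page, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:1</identifier></header></record>
    <record><header><identifier>oai:example:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("https://example.org/oai")
ids, token = parse_page(SAMPLE)
next_url = list_records_url("https://example.org/oai", resumption_token=token)
```

A harvester would repeat the fetch-and-parse step until a page comes back without a resumption token.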
  20. 20. DOIBoost. Result: DOI. Preprint: 10.5281/zenodo.1492766. Software toolkit: 10.5281/zenodo.1492210. Dataset dump: 10.5281/zenodo.1438356.
  21. 21. Scholexplorer
  22. 22. BETA Graph Open Consultation http://beta.explore.openaire.eu • October-November 2019: OpenAIRE Research Graph open for consultation; collecting feedback via Trello (operational end of September) • December 2019: OpenAIRE Research Graph in production • Identify errors/inconsistencies (semi-)automatically • Crowd-sourcing
  23. 23. OpenAIRE Stand-Alone Services
  24. 24. Access use-cases: APIs and web portal. Harvesting of article-dataset and dataset-dataset scholarly links from other sources (API). WebUI: link discovery/navigation. API: link search/resolution. 17,5Mi literature objects, 50,7Mi datasets, 481,3Mi Scholix links; 40Mi hits/month (~1Bi hits since Jan 2018)
  25. 25. Scholexplorer • Numbers: 17,5Mi literature objects, 50,7Mi datasets, 481,3Mi Scholix links • API adoption: 40Mi hits per month
  26. 26. Access use-cases: APIs and web portal. Harvesting of links from other sources (API). API: link search/resolution. WebUI: link discovery/navigation. 40Mi hits/month (~1Bi hits since Jan 2018)
  27. 27. Zenodo: Content & Usage • Data: 141TB • Files: 3.5M • Records: 1,389,303 • Largest File: 516GB • Largest Dataset: 2.5TB • Visitors: 2M / year. Growth
  28. 28. ArgOS: machine-actionable data management planning. Powered by OpenDMP. Is it a questionnaire management system? Definite no! • Articulated handling of a DMP: publishing, discovery, reuse, statistics on DMPs • Actionable DMPs: validation of statements via external services • Collaborative DMP composition: researchers in the loop
  29. 29. Amnesia • Amnesia is a data anonymization tool available at https://amnesia.openaire.eu. Amnesia can be used locally or on-line; on-line is for demos and training, not safe • Offers true anonymity and not pseudo-anonymity: k-anonymity and k^m-anonymity • Numbers in 2019 till now: 33K hits, 7K uses of the on-line service, 470 installations
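As a reminder of what k-anonymity requires: every combination of quasi-identifier values in a table must be shared by at least k rows. The check below is a minimal sketch of that definition only; it is not Amnesia's API, it does not cover the km-anonymity variant, and the sample rows and column names are invented for illustration.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs in at least k rows."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in counts.values())


# Invented, already-generalized sample data.
rows = [
    {"age": "30-39", "zip": "561**", "diagnosis": "A"},
    {"age": "30-39", "zip": "561**", "diagnosis": "B"},
    {"age": "40-49", "zip": "562**", "diagnosis": "A"},
    {"age": "40-49", "zip": "562**", "diagnosis": "C"},
]

two_anonymous = is_k_anonymous(rows, ["age", "zip"], 2)
three_anonymous = is_k_anonymous(rows, ["age", "zip"], 3)
```

Here each (age, zip) combination appears twice, so the table is 2-anonymous but not 3-anonymous with respect to those two columns.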
  30. 30. OpenAIRE Dashboards
  31. 31. High-Level View: Harvesting (Guidelines), Research Infrastructures, Publishing, IT
  32. 32. Services for Content Providers http://provide.openaire.eu • Repository registration and validation • Repository Usage Statistics • Repository Broker Service
  33. 33. Broker Service • 24 repositories have defined at least one subscription • Integrate with repositories (Zenodo) and aggregators (LA Referencia) • Towards Plan S implementation (PDF brokering)
  34. 34. Example of record enrichments: From LaReferencia to OpenAIRE
  35. 35. Topics • Topics have data sources as targets • Events regard an object in a given data source • Data sources: publication repositories from OpenDOAR, data archives from re3data.org. Event (potential notification): • Message • Topic • TargetRepository • Trust
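The event structure listed on this slide maps naturally onto a small record type. The sketch below uses the four fields named on the slide; the topic strings, the numeric trust scale, and the `notifications` helper are hypothetical and only illustrate how a subscription might select events for one repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A potential notification, with the four fields named on the slide."""
    message: str
    topic: str
    target_repository: str
    trust: float  # assumed to be a score between 0 and 1


def notifications(events, repository, topic_prefix, min_trust=0.0):
    """Select the events a repository would receive for one subscription."""
    return [
        e for e in events
        if e.target_repository == repository
        and e.topic.startswith(topic_prefix)
        and e.trust >= min_trust
    ]


# Invented events with invented topic names.
events = [
    Event("Abstract available for record X", "ENRICH/MISSING/ABSTRACT", "repo-1", 0.9),
    Event("Possible project link for record X", "ENRICH/MORE/PROJECT", "repo-1", 0.4),
    Event("Abstract available for record Y", "ENRICH/MISSING/ABSTRACT", "repo-2", 0.95),
]
selected = notifications(events, "repo-1", "ENRICH/", min_trust=0.5)
```

Filtering on trust lets a repository manager start with high-confidence events and lower the threshold later.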
  36. 36. Events. Enrichments: properties or links that are not available in the records (Merge, Inference, Claims). Additions: records that should be in the repository but are NOT in the repository (Deduction from authors, Deduction from affiliation). Alerts: wrong links, end-user feedback.
  37. 37. Broker User interfaces
  38. 38. Usage statistics service for Content Providers
  39. 39. ● Join OpenAIRE Usage Statistics ○ enable “usage metrics” for your data source ○ download & configure tracking plugin in your data source ○ confirmation by OpenAIRE once usage events are tracked in PIWIK ● or enter SUSHI endpoint to let OpenAIRE collect COUNTER reports. Steps: Metrics, Download tracker, Configure, Deploy & Test, Validation & Confirmation
  40. 40. Enable Metrics for content providers
  41. 41. Summarized Usage Statistics on the content provider level
  42. 42. Research Community Dashboard and Gateways. Research Community Dashboard: the Community Manager configures criteria of inclusion into the Community Gateway. Community Gateway: the Researcher can Search-Navigate-Monitor Research Products. Gateway as-a-Service. IT
  43. 43. Criteria for inclusion • Subjects of pertinence • Provenance (data source) + criteria • Zenodo communities • Projects • Propagation via relationships: Publication «supplementedBy» Data/Software; Project «funds» Publication/Data/Software. New criteria • Via ORCID • Others?
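The direct inclusion criteria on this slide can be read as a simple predicate over a product's metadata. The sketch below is an assumption-laden illustration: the field names, the criteria dictionary, and the "any criterion matches" rule are hypothetical, and propagation via relationships is left out.

```python
def matches_community(product, criteria):
    """Return True if a product meets at least one inclusion criterion.

    product: dict with optional 'subjects', 'data_source',
             'zenodo_communities', and 'projects' keys (assumed layout).
    criteria: dict of sets configured by a community manager (assumed layout).
    """
    subjects = set(product.get("subjects", []))
    communities = set(product.get("zenodo_communities", []))
    projects = set(product.get("projects", []))
    return bool(
        subjects & criteria.get("subjects", set())
        or product.get("data_source") in criteria.get("data_sources", set())
        or communities & criteria.get("zenodo_communities", set())
        or projects & criteria.get("projects", set())
    )


# Invented configuration and products.
criteria = {
    "subjects": {"marine biology"},
    "data_sources": {"repo-1"},
    "zenodo_communities": {"ocean-data"},
    "projects": {"project-42"},
}
by_subject = {"subjects": ["marine biology"], "data_source": "repo-9"}
by_project = {"projects": ["project-42"]}
unrelated = {"subjects": ["astronomy"], "data_source": "repo-9"}
```

Calling `matches_community(by_subject, criteria)` or `matches_community(by_project, criteria)` selects the product, while `unrelated` is left out.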
  44. 44. Monitoring trends and impact MONITOR Funding impact Funding attraction Open Science impact Open Access impact Research Impact 28 Funders in BETA
  45. 45. Monitoring trends and impact MONITOR Funding impact Funding attraction Open Science impact Open Access impact Research Impact 28 Funders in BETA Funders • Trends in research fields: new (multidisciplinary) disciplines Institutions • OA/OS behavior, ability to attract cross-funder grants Projects • Success, interconnections, possible liaisons Funders • Recent and past EC and other funders’ activities (representing various funding levels) • Checking compliance to funder mandates Institutions • Collaboration network (by institution) via projects and products • Ability to attract funds from different funders Projects • Check if projects are eligible for Post-Grant APC funding • Compare project portfolio against that of other similar institutions (anonymized)
  46. 46. Search and discovery portal http://explore.openaire.eu http://beta.explore.openaire.eu
  47. 47. Thank you! Paolo Manghi paolo.manghi@isti.cnr.it
