DiggiCORE: Digging into Connected
          Repositories
              Petr Knoth
       Knowledge Media Institute
          The Open University



                1/38
Outline
1. Connecting by aggregating Open Access (OA) publications
 •   Why aggregate, and who is it for
 •   The added value of aggregations
2. The CORE system
3. Supporting research in mining databases of scientific
   publications




                              2/38
The rapid rise of OA articles




 The graph (from Laakso and Björk's paper, BMC Medicine 2012, 10:124) shows
 the number of papers published in three different types of online open access
 journals from 2000 to 2011.

                                         4/38
Growth of Open Access repositories




                         5/38
Why do we need aggregations?
“Each individual repository is of limited value for research: the real
power of Open Access lies in the possibility of connecting and tying
together repositories, which is why we need interoperability. In
order to create a seamless layer of content through connected
repositories from around the world, Open Access relies on
interoperability, the ability for systems to communicate with each
other and pass information back and forth in a usable format.
Interoperability allows us to exploit today's computational power so
that we can aggregate, data mine, create new tools and services,
and generate new knowledge from repository content.”
                                                   [COAR manifesto]


                                6/38
Access to information according to the level of abstraction




[Diagram: architecture of an aggregation. Labels: Repository (three
instances); Metadata Transfer Interoperability; Aggregation of Metadata and
Content; Semantic Enrichment; OLTP and OLAP; Interfaces; and three access
levels: Analytical information access, Transaction information access,
Raw data access.]


                                                                       7/38
Who should be supported by aggregations?

• The following user groups (divided according to the level of
  abstraction of the information they need):
   •   Raw data access. Developers, digital libraries (DLs), DL researchers, companies …
   •   Transaction information access. Researchers, students, life-long learners …
   •   Analytical information access. Funders, government, business intelligence …




                                     8/38
What is it all about?




                        9/38
Outline
1. Connecting by aggregating Open Access (OA) publications – why,
   how, what for?
2. The CORE system
3. Supporting research in mining databases of scientific
   publications




                              10/38
CORE objective

 CORE aims to provide a technical infrastructure for Open Access
 scholarly publications that will support access and reuse of scholarly
 materials at different levels of abstraction.




                               11/38
CORE functionality

              Content harvesting, processing




                             12/38
CORE functionality

                             Semantic enrichment




                     13/38
CORE functionality




              Providing services




                              14/38
What does CORE provide at different access levels?

[Diagram: the aggregation architecture (Repositories; Metadata Transfer
Interoperability; Aggregation of Metadata and Content; Enrichment; OLTP and
OLAP; Interfaces) annotated with the CORE applications at each access level:]
 •   Analytical information access: Repository Analytics
 •   Transaction information access: CORE Portal, CORE Mobile, CORE Plugin
 •   Raw data access: CORE API


                                                       15/38
CORE Applications
CORE Portal – Allows searching and navigating scientific publications
aggregated from Open Access repositories




                                   16/38
CORE Applications

CORE Mobile – Allows searching and navigating scientific publications
aggregated from Open Access repositories




                          17/38
CORE Applications
CORE Plugin – A plugin to a system that provides recommendations for
related items.




                                 18/38
CORE Applications
Repository Analytics – An analytical tool supporting providers of
Open Access content (in particular, repository managers).




                                   19/38
20/38
CORE Applications
CORE API – Enables external systems and services to interact with the
CORE repository.


• Search service
• PDF and plain text service
• Similarity service
• Classification service
• Citation service




                                  21/38
CORE Applications
CORE API registered users:
British Education Index
Cottagelabs
UKCORR
Europeana
ULCC
Library, The Open University
Los Alamos National Laboratory, USA
University of Manchester Library
Universidad de los Andes, Bogotá, Colombia
UNESCO



                                 22/38
CORE visits (October 2012)




More than 6000 visits per day




                23/38
Outline
1. Connecting by aggregating Open Access (OA) publications – why,
   how, what for?
2. The CORE system
3. Supporting research in mining databases of scientific
   publications




                              24/38
Objective


Software for exploration and analysis of very large and
fast-growing amounts of research publications stored
across Open Access Repositories (OAR).




                           25/38
DiggiCORE networks




Three networks: (a) semantically related papers,
(b) citation network, (c) author citation network
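
As an illustration of how network (c) relates to network (b), the sketch below derives an author citation network from a paper-level citation network. It is a minimal example under stated assumptions, not DiggiCORE's implementation: the toy data, the function name `author_citation_network`, and the choice to skip self-citations are all hypothetical.

```python
from collections import Counter

# Hypothetical toy data (not from CORE): paper id -> authors,
# and paper id -> ids of the papers it cites.
paper_authors = {
    "p1": ["Ada", "Ben"],
    "p2": ["Cy"],
    "p3": ["Ada"],
}
paper_citations = {
    "p1": ["p2"],
    "p2": ["p3"],
    "p3": ["p2"],
}


def author_citation_network(paper_authors, paper_citations):
    """Return a Counter mapping (citing author, cited author) to the
    number of paper-level citations that connect the two authors."""
    edges = Counter()
    for citing_paper, cited_papers in paper_citations.items():
        for cited_paper in cited_papers:
            for citing_author in paper_authors.get(citing_paper, []):
                for cited_author in paper_authors.get(cited_paper, []):
                    # Assumption: self-citations are left out of the network.
                    if citing_author != cited_author:
                        edges[(citing_author, cited_author)] += 1
    return edges


network = author_citation_network(paper_authors, paper_citations)
for (citing_author, cited_author), weight in sorted(network.items()):
    print(f"{citing_author} -> {cited_author}: {weight}")
```

Each paper-level citation contributes one edge per (citing author, cited author) pair, so the edge weights count how often one author's papers cite another's.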


                          26/38
The problem of result transparency

Google Scholar




Microsoft Academic Search




                            27/38
DiggiCORE objectives

Allow researchers to use this platform to analyse
publications.
Why?
•   To identify patterns in the behaviour of research
    communities
•   To detect trends in research disciplines
•   To gain new insights into the citation behaviour of researchers
•   To discover features that distinguish papers with high impact



                               28/38
Questions the system can help answer
•   What are the attributes of high-impact publications?
•   Do these attributes differ in the humanities, social sciences and
    computer sciences?
•   What are the features of research groups within disciplines and
    how do these features relate to contributions generated by the
    group?
•   What are the attributes of high-impact authors and what is their
    role within the group?
•   What are the dynamics of successful research groups?



                                29/38
Questions the system can help answer
•   What is the mechanism of cross-fertilisation within disciplines,
    especially between the humanities and the sciences?
•   Who are the authors whose work is worth monitoring because
    they contribute to the achievements of their own discipline and
    also inspire other disciplines?
•   How should a novice in the discipline get acquainted with its key
    achievements?
•   How should he/she search for the most important publications?




                               30/38
Challenges
•   Technical issues of quick Open Access harvesting
•   Lack of understanding of Open Access licenses by publishers and
    academics
•   Explaining the added value of full-text vs. metadata aggregations:
    • User experience
    • Text-mining




                                31/38
The power of full-text aggregations (WorldCat vs CORE)




                          32/38
Text-mining
“There are currently over 144,000 full time equivalent academic
professionals (teaching and research) working in UK higher
education. Using data from the Higher Education Statistics Agency
(HESA) for UK academic salaries, the median salary for a UK
academic falls into a band of between £42k and £55k, which
translates to between £26 and £33 per working hour. If text mining
enabled just a 2% increase in productivity – corresponding to only
45 minutes per academic per working week (and looking at CIBER’s
analysis of the impact of eJournals, this is very much an
underestimate), this would imply over 4.7 million working hours and
additional productivity worth between £123.5m and £156.8m in
working time per year.” [McDonald & Kelly, 2012] – JISC report on
text-mining
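
The arithmetic behind the quoted estimate can be checked with a short sketch. The number of academics, the 2% gain, the 45 minutes per week and the hourly rates come from the quote; the figure of 44 working weeks per year is an assumption made here because it reproduces the quoted totals, and is not stated in the quote.

```python
# Figures taken from the quoted passage.
FTE_ACADEMICS = 144_000
PRODUCTIVITY_GAIN = 0.02           # 2% increase in productivity
MINUTES_SAVED_PER_WEEK = 45        # per academic per working week
HOURLY_RATE_LOW_GBP = 26
HOURLY_RATE_HIGH_GBP = 33

# Assumption (not in the quote): 44 working weeks per year.
WORKING_WEEKS_PER_YEAR = 44

# Working week implied by "2% corresponds to 45 minutes".
implied_hours_per_week = MINUTES_SAVED_PER_WEEK / 60 / PRODUCTIVITY_GAIN

# Total hours saved per year across all academics.
hours_saved_per_year = (
    FTE_ACADEMICS * MINUTES_SAVED_PER_WEEK / 60 * WORKING_WEEKS_PER_YEAR
)

# Value of the saved time at the low and high hourly rates.
value_low_gbp = hours_saved_per_year * HOURLY_RATE_LOW_GBP
value_high_gbp = hours_saved_per_year * HOURLY_RATE_HIGH_GBP

print(f"Implied working week: {implied_hours_per_week:.1f} hours")
print(f"Hours saved per year: {hours_saved_per_year / 1e6:.2f} million")
print(f"Value: £{value_low_gbp / 1e6:.1f}m to £{value_high_gbp / 1e6:.1f}m")
```

Under these assumptions the result is about 4.75 million hours per year, worth roughly £123.6m to £156.8m, which agrees with the quoted "over 4.7 million working hours" and "between £123.5m and £156.8m" up to rounding.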
                              33/38
Cost of Gold OA




             http://rossmounce.co.uk/2012/09/04/the-gold-oa-plot-v0-2/


                                  34/38
Summary
•   Aggregations should serve the needs of different user groups.
•   Transparency is crucial.
•   Machine access to publications provides many new
    opportunities.
•   Many services can be part of the infrastructure,
    but they should all work with the same data.
•   CORE aims to
    • prepare the way for innovative open access services
    • demonstrate the benefits of programmable access to
        publications
    • data mine publications for impact characteristics
                                35/38
Partners




Advisory Board



                 36/38
Questions?




             37/38
38/38

DiggiCORE: Digging into Connected Repositories

  • 1. DiggiCORE: Digging into Connected Repositories Petr Knoth Knowledge Media institute The Open University 1/38
  • 2. Outline 1. Connecting by aggregating Open Access (OA) publications • Why agregate and who is it for • The added value of aggregations 2. The CORE system 3. Supporting research in mining databases of scientific publications 2/38
  • 3. Outline 1. Connecting by aggregating Open Access (OA) publications • Why agregate and who is it for • The added value of aggregations 2. The CORE system 3. Supporting research in mining databases of scientific publications 3/38
  • 4. The rapid rise of OA articles The graph (from Laasko and Bjork's paper - BMC Medicine 2012, 10:124) shows the numbers of papers published in three different types of online open access journals from 2000 to 2011. 4/38
  • 5. Growth of Open Access repositories 5/38
  • 6. Why we need aggregations? “Each individual repository is of limited value for research: the real power of Open Access lies in the possibility of connecting and tying together repositories, which is why we need interoperability. In order to create a seamless layer of content through connected repositories from around the world, Open Access relies on interoperability, the ability for systems to communicate with each other and pass information back and forth in a usable format. Interoperability allows us to exploit today's computational power so that we can aggregate, data mine, create new tools and services, and generate new knowledge from repository content.’’ [COAR manifesto] 6/38
  • 7. Access to information according to the level of abstraction Metadata Transfer Interoperability Metadata OLTP Analytical Semantic Enrichment Repository information access Interfaces Aggregation Transaction Repository information access Content OLAP Raw data access Repository 7/38
  • 8. Who should be supported by aggregations? • The following users groups (divided according to the level of abstraction of information they need): • Raw data access. Developers, DLs, DL researchers, companies … • Transaction information access. Researchers, students, life-long learners … • Analytical information access. Funders, government, bussiness intelligence … 8/38
  • 9. What is it all about?
  • 10. Outline 1. Connecting by aggregating Open Access (OA) publications – why, how, what for? 2. The CORE system 3. Supporting research in mining databases of scientific publications
  • 11. CORE objective CORE aims to provide a technical infrastructure for Open Access scholarly publications that will support access and reuse of scholarly materials at different levels of abstraction.
  • 12. CORE functionality Content harvesting, processing
  • 13. CORE functionality Semantic enrichment
  • 14. CORE functionality Providing services
  • 15. What does CORE provide at different access levels? [Figure: the architecture diagram from slide 7 annotated with CORE services. Raw data access: CORE API. Transaction information access: CORE Portal, CORE Mobile, CORE Plugin. Analytical information access: Repository Analytics.]
  • 16. CORE Applications CORE Portal – Allows searching and navigating scientific publications aggregated from Open Access repositories
  • 17. CORE Applications CORE Mobile – Allows searching and navigating scientific publications aggregated from Open Access repositories
  • 18. CORE Applications CORE Plugin – A plugin for systems that provides recommendations for related items.
  • 19. CORE Applications Repository Analytics – An analytical tool supporting providers of open access content (in particular repository managers).
  • 21. CORE Applications CORE API – Enables external systems and services to interact with the CORE repository. • Search service • PDF and plain text service • Similarity service • Classification service • Citation service
  • 22. CORE Applications CORE API registered users: • British Education Index • Cottagelabs • UKCORR • Europeana • ULCC • Library, The Open University • Los Alamos National Laboratory, USA • University of Manchester Library • Universidad de los Andes, Bogotá, Colombia • UNESCO
  • 23. CORE visits (October 2012) More than 6000 visits per day
  • 24. Outline 1. Connecting by aggregating Open Access (OA) publications – why, how, what for? 2. The CORE system 3. Supporting research in mining databases of scientific publications
  • 25. Objective Software for exploration and analysis of very large and fast-growing amounts of research publications stored across Open Access Repositories (OAR).
  • 26. DiggiCORE networks Three networks: (a) semantically related papers, (b) citation network, (c) author citation network
  • 27. The problem of result transparency Google Scholar Microsoft Academic Search
  • 28. DiggiCORE objectives Allow researchers to use this platform to analyse publications. Why? • To identify patterns in the behaviour of research communities • To detect trends in research disciplines • To gain new insights into the citation behaviour of researchers • To discover features that distinguish papers with high impact
  • 29. Questions the system can help answer • What are the attributes of high-impact publications? • Do these attributes differ in the humanities, social sciences and computer sciences? • What are the features of research groups within disciplines and how do these features relate to contributions generated by the group? • What are the attributes of high-impact authors and what is their role within the group? • What are the dynamics of successful research groups?
  • 30. Questions the system can help answer • What is the mechanism of cross-fertilisation within disciplines, especially between the humanities and the sciences? • Who are the authors whose work is worth monitoring because they contribute to the achievements of their own discipline and also inspire other disciplines? • How should the novice in the discipline get acquainted with key achievements in the discipline? • How should he/she search for the most important publications?
  • 31. Challenges • Technical issues of quick Open Access harvesting • Lack of understanding of Open Access licenses among publishers and academics • Explaining the added value of full-text vs metadata aggregations: • User experience • Text-mining
  • 32. The power of full-text aggregations (WorldCat vs CORE)
  • 33. Text-mining “There are currently over 144,000 full time equivalent academic professionals (teaching and research) working in UK higher education. Using data from the Higher Education Statistics Agency (HESA) for UK academic salaries, the median salary for a UK academic falls into a band of between £42k and £55k, which translates to between £26 and £33 per working hour. If text mining enabled just a 2% increase in productivity – corresponding to only 45 minutes per academic per working week (and looking at CIBER’s analysis of the impact of eJournals, this is very much an underestimate), this would imply over 4.7 million working hours and additional productivity worth between £123.5m and £156.8m in working time per year.” [McDonald & Kelly, 2012] – JISC report on text-mining
  • 34. Cost of Gold OA http://rossmounce.co.uk/2012/09/04/the-gold-oa-plot-v0-2/
  • 35. Summary • Aggregations should serve the needs of different user groups. • Transparency is crucial. • Machine access to publications provides lots of new opportunities. • We can have many services that are part of the infrastructure, but they should work with the same data. • CORE aims to • prepare the way for innovative open access services • demonstrate the benefits of programmable access to publications • data mine publications for impact characteristics
  • 37. Questions?
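
The productivity estimate quoted on slide 33 can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the quote, plus one assumption that the quote does not state: 44 working weeks per year, chosen here because it lands close to the quoted totals. The constant and function names are illustrative.

```python
# Figures taken from the quote on slide 33 (JISC report on text-mining).
ACADEMIC_FTE = 144_000          # full time equivalent academics in UK HE
MINUTES_SAVED_PER_WEEK = 45     # the "2% increase in productivity"
HOURLY_RATE_LOW_GBP = 26        # lower bound of the hourly salary band
HOURLY_RATE_HIGH_GBP = 33       # upper bound of the hourly salary band

# Assumption, not stated in the quote: 44 working weeks per year.
WORKING_WEEKS_PER_YEAR = 44


def annual_hours_saved(fte, minutes_per_week, weeks_per_year):
    """Total working hours saved per year across all academics."""
    return fte * minutes_per_week * weeks_per_year / 60


hours = annual_hours_saved(
    ACADEMIC_FTE, MINUTES_SAVED_PER_WEEK, WORKING_WEEKS_PER_YEAR
)
value_low = hours * HOURLY_RATE_LOW_GBP
value_high = hours * HOURLY_RATE_HIGH_GBP

print(f"Hours saved per year: {hours / 1e6:.2f} million")
print(f"Value per year: £{value_low / 1e6:.1f}m to £{value_high / 1e6:.1f}m")
```

Under that assumption the result is consistent with the quoted "over 4.7 million working hours" and with a value range of roughly £123m to £157m per year; a different number of working weeks would shift both figures proportionally.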

Editor's Notes

  1. What happens in the box: A metadata interoperability layer, metadata, content, enrichment, presentation layer
  2. All text mining takes place at this phase
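
As one illustration of the kind of text mining that could run in the enrichment phase, the network of semantically related papers from slide 26 might be built by comparing the texts of papers pairwise. The sketch below is a minimal example under stated assumptions: it uses cosine similarity over raw term counts, which is a common baseline and not necessarily the method CORE or DiggiCORE uses. The corpus, the paper identifiers and the threshold are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lower-case the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_pairs(papers, threshold):
    """Return (id, id, score) edges for pairs scoring at or above threshold."""
    vectors = {pid: term_counts(text) for pid, text in papers.items()}
    ids = sorted(vectors)
    edges = []
    for i, left in enumerate(ids):
        for right in ids[i + 1:]:
            score = cosine_similarity(vectors[left], vectors[right])
            if score >= threshold:
                edges.append((left, right, score))
    return edges


# Hypothetical toy corpus; a real aggregation would supply full texts.
papers = {
    "p1": "open access repositories aggregate research papers",
    "p2": "aggregating open access research papers from repositories",
    "p3": "citation behaviour of high impact authors",
}

# Hypothetical threshold; only p1 and p2 share enough terms to be linked.
edges = related_pairs(papers, threshold=0.5)
for left, right, score in edges:
    print(f"{left} -- {right}: {score:.2f}")
```

The pairwise loop is quadratic in the number of papers, so it only suits small collections; an aggregation at the scale described in the slides would need indexing or approximate nearest-neighbour search instead.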