CORE: Aggregating and Enriching Content to Support Open Access

            Petr Knoth
        The Open University




              1/52
Outline
1. Aggregating Open Access (OA) publications – why, how, what
   for?
2. The CORE system
3. Supporting research in mining databases of scientific
   publications (DiggiCORE)




                              2/52
Growth of items in Open Access repositories




                         4/52
Growth of Open Access repositories




                         5/52
Growth of articles in OA journals




                           6/52
Growth of OA journals




                        7/52
Green Open Access - statistics




                       8/52
Why do we need aggregations?
“Each individual repository is of limited value for research: the real
power of Open Access lies in the possibility of connecting and tying
together repositories, which is why we need interoperability. In
order to create a seamless layer of content through connected
repositories from around the world, Open Access relies on
interoperability, the ability for systems to communicate with each
other and pass information back and forth in a usable format.
Interoperability allows us to exploit today's computational power so
that we can aggregate, data mine, create new tools and
services, and generate new knowledge from repository content.”
                                                   [COAR manifesto]


                                9/52
Access to information according to the level of abstraction
[Figure: architecture diagram. Components, from source to user: Repository (×3); Metadata Transfer Interoperability; Aggregation (Metadata, Content); Semantic Enrichment; OLTP and OLAP; Interfaces; Analytical information access, Transaction information access, Raw data access]

                                                                       10/52
Who should be supported by aggregations?

The following user groups (divided according to the level of
abstraction of information they need):
   •   Raw data access.
   •   Transaction information access.
   •   Analytical information access.




                                    11/52
Who should be supported by aggregations?

• The following user groups (divided according to the level of
  abstraction of information they need):
   •   Raw data access. Developers, DLs, DL researchers, companies …
   •   Transaction information access. Researchers, students, life-long learners …
   •   Analytical information access. Funders, government, business intelligence …




                                     12/52
Layers of an aggregation system


                                Interfaces

                 OLTP                           OLAP

                                  Enrichment

              Metadata                          Content

   Metadata Transfer Interoperability




                                        13/52
Layers of an aggregation system
                   APIs (REST, SOAP, XML-RPC), UIs, Dashboards    Statistics


                                Interfaces

                 OLTP                                OLAP

                                  Enrichment
                                                                 Catalog records
              Metadata                               Content

   Metadata Transfer Interoperability
                                                                   Annotations

    OAI-PMH, OAI-ORE …             Dublin Core, XML, RDF …       PDF, Word …


                                        14/52
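The layers slide above names OAI-PMH as a transfer protocol and Dublin Core as a metadata format. As a minimal sketch of what the metadata layer receives, assuming a record in the standard oai_dc format, the Python below pulls a few fields out of one harvested record. The sample record and the function name parse_record are made up for illustration; this is not CORE's own code.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A made-up record, shaped like one <record> in a ListRecords response.
SAMPLE_RECORD = """
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:example.org:123</identifier>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>An example paper</dc:title>
      <dc:creator>Doe, Jane</dc:creator>
      <dc:creator>Roe, Richard</dc:creator>
      <dc:identifier>http://example.org/123.pdf</dc:identifier>
    </oai_dc:dc>
  </metadata>
</record>
"""

def parse_record(xml_text):
    """Return the identifier, title, creators and links of one record."""
    record = ET.fromstring(xml_text.strip())
    dc = record.find("oai:metadata/oai_dc:dc", NS)
    return {
        "id": record.findtext("oai:header/oai:identifier", namespaces=NS),
        "title": dc.findtext("dc:title", namespaces=NS),
        "creators": [c.text for c in dc.findall("dc:creator", NS)],
        # dc:identifier often carries a URL; whether it points at the
        # full text varies between repositories.
        "links": [i.text for i in dc.findall("dc:identifier", NS)],
    }

parsed = parse_record(SAMPLE_RECORD)
```

Dublin Core elements are repeatable, which is why creators and links come back as lists.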
Access to information according to the level of abstraction
[Figure: aggregation architecture diagram, as on slide 10]

                                                          15/52
Related systems




     16/52
Aggregation projects – BASE
[Figure: aggregation architecture diagram, as on slide 10, shown for BASE]

                                                       17/52
Aggregation projects – OAIster/WorldCat
[Figure: aggregation architecture diagram, as on slide 10, shown for OAIster/WorldCat]

                                                       18/52
Aggregation projects – RepUK
[Figure: aggregation architecture diagram, as on slide 10, shown for RepUK]

                                                       19/52
Aggregations need access to content, not just metadata!

• Certain metadata types can be created only at the level of the
  aggregation
• Certain metadata can change over time
• Ensuring content:
   • accessibility
   • availability
   • validity
   • quality
   • …



                               20/52
Aggregation projects – CiteSeerX
[Figure: aggregation architecture diagram, as on slide 10, shown for CiteSeerX]

                                                       21/52
Should an aggregation system support all three user types?

            This can be realised by more than one system,
                          provided that
                    the dataset is the same!




                             22/52
Outline
1. Aggregating Open Access (OA) publications – why, how, what
   for?
2. The CORE system
3. Supporting research in mining databases of scientific
   publications (DiggiCORE)




                              23/52
CORE objectives
• CORE aims to provide a comprehensive technical infrastructure
  for Open Access scholarly publications that will support access
  and reuse of scholarly materials at different levels of abstraction.
• A nation-wide aggregation system that will improve the discovery
  of publications stored in British Open Access Repositories (OARs).




                                24/52
What does CORE provide at different aggregation levels?
[Figure: aggregation architecture diagram, as on slide 10]

                                                         25/52
CORE functionality




                     26/52
CORE functionality
Step 1: Metadata and full-text harvesting



                       Content harvesting, processing




                                    27/52
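Step 1 can be illustrated for the metadata part. The sketch below assumes the repository exposes a standard OAI-PMH endpoint and walks a ListRecords result page by page using resumption tokens. The names harvest and fake_fetch and the example.org URL are hypothetical; the fetch callable stands in for a real HTTP client so the example runs without network access. Downloading the full texts themselves is not shown.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def harvest(base_url, fetch, metadata_prefix="oai_dc"):
    """Yield the identifier of every record, following resumption tokens.

    `fetch` is any callable that takes a URL and returns the response
    body as a string (an assumption made for this sketch).
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        for header in root.iter(OAI + "header"):
            yield header.findtext(OAI + "identifier")
        token = root.find(".//" + OAI + "resumptionToken")
        # A missing or empty token means the list is complete.
        if token is None or not (token.text or "").strip():
            break
        # The token is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

def fake_fetch(url):
    """Stand-in repository serving three made-up records over two pages."""
    second_page = "resumptionToken=page2" in url
    ids = ["oai:example.org:3"] if second_page else ["oai:example.org:1", "oai:example.org:2"]
    records = "".join(
        f"<record><header><identifier>{i}</identifier></header></record>" for i in ids
    )
    token = "<resumptionToken/>" if second_page else "<resumptionToken>page2</resumptionToken>"
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
        f"<ListRecords>{records}{token}</ListRecords></OAI-PMH>"
    )

harvested = list(harvest("http://example.org/oai", fake_fetch))
```

Passing the fetch function in as a parameter keeps the paging logic testable; a real harvester would also need error handling, rate limiting and incremental (date-based) requests.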
What does CORE provide at different aggregation levels?
[Figure: aggregation architecture diagram, as on slide 10, annotated with: semantic similarity, citation extraction, classification, …]

                                                         28/52
CORE functionality
Step 2: Semantic enrichment




                                      Semantic enrichment




                              29/52
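The slides list semantic similarity among the enrichment tasks but do not say how it is computed. One common baseline, swapped in here purely as an illustration, is cosine similarity over term-frequency vectors. The function names and the three sample texts are hypothetical.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Count lower-cased alphabetic tokens in a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

# Made-up snippets standing in for harvested full texts.
docs = {
    "a": "open access repositories aggregate research papers",
    "b": "aggregating research papers from open access repositories",
    "c": "protein folding simulation on graphics hardware",
}
vectors = {name: term_vector(text) for name, text in docs.items()}

sim_ab = cosine_similarity(vectors["a"], vectors["b"])
sim_ac = cosine_similarity(vectors["a"], vectors["c"])
```

Here sim_ab is high because the two texts share most of their terms, while sim_ac is zero because they share none. Note that this only works with access to the content itself, which is the point the talk makes about aggregating more than metadata.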
What does CORE provide at different aggregation levels?
[Figure: aggregation architecture diagram, as on slide 10]

                                                         30/52
CORE functionality
Step 3: Providing a set of services on top of the aggregation




                        Providing services




                                    31/52
CORE applications

 •   CORE Portal
 •   CORE Mobile
 •   CORE Plugin
 •   CORE API
 •   Repository Analytics




                            32/52
What does CORE provide at different aggregation levels?
[Figure: aggregation architecture diagram, as on slide 10]

                                                         33/52
CORE Applications
CORE Portal – Allows searching and navigating scientific publications
aggregated from Open Access repositories




                                   34/52
CORE Applications

CORE Mobile – Allows searching and
navigating scientific publications
aggregated from Open Access
repositories




                                35/52
CORE Applications
CORE Plugin – A plugin to a system that provides recommendations for
related items.




                                 36/52
What does CORE provide at different aggregation levels?
[Figure: aggregation architecture diagram, as on slide 10]

                                                         37/52
CORE Applications
CORE API – Enables external systems and services to interact with the
CORE repository.




                                  38/52
What does CORE provide at different aggregation levels?
[Figure: aggregation architecture diagram, as on slide 10]

                                                         39/52
CORE Applications
Repository Analytics – An analytical tool supporting providers of
open access content (in particular, repository managers).




                                   40/52
What does CORE provide at different aggregation levels?
[Figure: aggregation architecture diagram, as on slide 10, annotated with the CORE applications next to each access level]
   •   Analytical information access: Repository Analytics
   •   Transaction information access: CORE Portal, CORE Mobile, CORE Plugin
   •   Raw data access: CORE API

                                                         41/52
CORE statistics
• Content
   • 5.4M records
   • 192 repositories
   • 402k full-texts
• Started: February 2011
• Budget: £140k




                           42/52
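Two ratios follow from the statistics slide by simple arithmetic. The inputs are the rounded figures from the slide, so the results are approximate.

```python
records = 5_400_000       # "5.4M records"
repositories = 192
full_texts = 402_000      # "402k full-texts"

full_text_share = full_texts / records            # fraction of records with a full text
records_per_repository = records / repositories   # a mean, not a typical value

print(f"{full_text_share:.1%} of records have a full text")
print(f"{records_per_repository:,.0f} records per repository on average")
```

This prints roughly 7.4% and 28,125 respectively.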
Outline
1. Aggregating Open Access (OA) publications – why, how, what
   for?
2. The CORE system
3. Supporting research in mining databases of scientific
   publications (DiggiCORE)




                              43/52
Partners




Advisory Board



                 44/52
Objective


Software for exploration and analysis of very large and
fast-growing amounts of research publications stored
across Open Access Repositories (OAR).




                           45/52
DiggiCORE networks




Three networks: (a) semantically related papers,
(b) citation network, (c) author citation network


                          46/52
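Networks (b) and (c) are related: given a paper citation network and the authors of each paper, an author citation network can be derived from it. The sketch below shows one straightforward way to do that, assuming every author of a citing paper is credited with citing every author of the cited paper. The toy data and the function name are hypothetical, and the slides do not state how DiggiCORE builds its networks.

```python
from collections import Counter

# Made-up data: the authors of each paper, and the papers each paper cites.
authors = {
    "p1": ["Ada"],
    "p2": ["Ben", "Cy"],
    "p3": ["Ada", "Cy"],
}
cites = {
    "p2": ["p1"],
    "p3": ["p1", "p2"],
}

def author_citation_network(authors, cites):
    """Return weighted directed edges (citing author, cited author) -> count."""
    edges = Counter()
    for citing, cited_papers in cites.items():
        for cited in cited_papers:
            for a in authors[citing]:
                for b in authors[cited]:
                    # Self-citations such as (Ada, Ada) are kept on purpose;
                    # filter them out here if they are not wanted.
                    edges[(a, b)] += 1
    return edges

network = author_citation_network(authors, cites)
```

In this toy example Cy cites Ada twice (through p2 and p3), so that edge carries weight 2, while every other edge has weight 1.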
DiggiCORE objectives

Allow researchers to use this platform to analyse
publications.
Why?
•   To identify patterns in the behaviour of research
    communities
•   To detect trends in research disciplines
•   To gain new insights into the citation behaviour of researchers
•   To discover features that distinguish papers with high impact



                               47/52
Questions the system can help answer
•   What are the attributes of high-impact publications?
•   Do these attributes differ in the humanities, social sciences and
    computer sciences?
•   What are the features of research groups within disciplines and
    how do these features relate to contributions generated by the
    group?
•   What are the attributes of high-impact authors and what is their
    role within the group?
•   What are the dynamics of successful research groups?



                                48/52
Questions the system can help answer
•   What is the mechanism of cross-fertilisation within
    disciplines, especially between the humanities and the
    sciences?
•   Who are the authors whose work is worth monitoring because
    they contribute to the achievements of their own discipline and
    also inspire other disciplines?
•   How should a novice in the discipline get acquainted with key
    achievements in the discipline?
•   How should he/she search for the most important publications?



                               49/52
Summary
•   The rapid growth of OA content is both an opportunity and
    a challenge.
•   Aggregations should serve the needs of different user groups.
•   Aggregations need to aggregate content, not just metadata.
•   Many services can be part of the infrastructure, but they
    should all work with the same data.




                               50/52
Thank you!




Yes we can!
   51/52
52/52
1 of 52

Recommended

DiggiCORE: Digging into Connected Repositories by
DiggiCORE: Digging into Connected RepositoriesDiggiCORE: Digging into Connected Repositories
DiggiCORE: Digging into Connected Repositoriespetrknoth
1.1K views38 slides
MapR lucidworks joint webinar by
MapR lucidworks joint webinarMapR lucidworks joint webinar
MapR lucidworks joint webinarTed Dunning
1.3K views21 slides
Towards an Infrastructure for Mining Scientific Publications by
Towards an Infrastructure for Mining Scientific PublicationsTowards an Infrastructure for Mining Scientific Publications
Towards an Infrastructure for Mining Scientific Publicationspetrknoth
870 views38 slides
Linked Open data: CNR by
Linked Open data: CNRLinked Open data: CNR
Linked Open data: CNRDatiGovIT
680 views32 slides
Linked data functional genomics by
Linked data functional genomicsLinked data functional genomics
Linked data functional genomicsMikel Egaña Aranguren, Ph.D.
1.1K views45 slides
High level-api in tensorflow by
High level-api in tensorflowHigh level-api in tensorflow
High level-api in tensorflowHyungjoo Cho
3.5K views38 slides

More Related Content

Similar to CORE: Aggregating and Enriching Content to Support Open Access

Open Archives Initiatives For Metadata Harvesting by
Open Archives Initiatives For Metadata   HarvestingOpen Archives Initiatives For Metadata   Harvesting
Open Archives Initiatives For Metadata HarvestingNikesh Narayanan
3.8K views14 slides
Data repositories -- Xiamen University 2012 06-08 by
Data repositories -- Xiamen University 2012 06-08Data repositories -- Xiamen University 2012 06-08
Data repositories -- Xiamen University 2012 06-08Jian Qin
826 views21 slides
Organic.Edunet Repository Tools by
Organic.Edunet Repository ToolsOrganic.Edunet Repository Tools
Organic.Edunet Repository ToolsHannes Ebner
835 views21 slides
Digitisation and institutional repositories 3 by
Digitisation and institutional repositories 3Digitisation and institutional repositories 3
Digitisation and institutional repositories 3Libsoul Technologies Pvt. Ltd.
723 views23 slides
OLAP & DATA WAREHOUSE by
OLAP & DATA WAREHOUSEOLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEZalpa Rathod
73.2K views37 slides

Similar to CORE: Aggregating and Enriching Content to Support Open Access(20)

Open Archives Initiatives For Metadata Harvesting by Nikesh Narayanan
Open Archives Initiatives For Metadata   HarvestingOpen Archives Initiatives For Metadata   Harvesting
Open Archives Initiatives For Metadata Harvesting
Nikesh Narayanan3.8K views
Data repositories -- Xiamen University 2012 06-08 by Jian Qin
Data repositories -- Xiamen University 2012 06-08Data repositories -- Xiamen University 2012 06-08
Data repositories -- Xiamen University 2012 06-08
Jian Qin826 views
Organic.Edunet Repository Tools by Hannes Ebner
Organic.Edunet Repository ToolsOrganic.Edunet Repository Tools
Organic.Edunet Repository Tools
Hannes Ebner835 views
OLAP & DATA WAREHOUSE by Zalpa Rathod
OLAP & DATA WAREHOUSEOLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSE
Zalpa Rathod73.2K views
OLAP & Data Warehouse by Zalpa Rathod
OLAP & Data WarehouseOLAP & Data Warehouse
OLAP & Data Warehouse
Zalpa Rathod4.2K views
Enterprise linked data clouds by damienjoyce
Enterprise linked data cloudsEnterprise linked data clouds
Enterprise linked data clouds
damienjoyce421 views
Contributing to the Smart City Through Linked Library Data by Marcia Zeng
Contributing to the Smart City Through Linked Library DataContributing to the Smart City Through Linked Library Data
Contributing to the Smart City Through Linked Library Data
Marcia Zeng2.3K views
Text mining in CORE (OR2012) by petrknoth
Text mining in CORE (OR2012)Text mining in CORE (OR2012)
Text mining in CORE (OR2012)
petrknoth1.2K views
Net flowhadoop flocon2013_yhlee_final by Yeounhee Lee
Net flowhadoop flocon2013_yhlee_finalNet flowhadoop flocon2013_yhlee_final
Net flowhadoop flocon2013_yhlee_final
Yeounhee Lee2.5K views
ESI Supplemental Webinar 2 - DataONE presentation slides by DuraSpace
ESI Supplemental Webinar 2 - DataONE presentation slides ESI Supplemental Webinar 2 - DataONE presentation slides
ESI Supplemental Webinar 2 - DataONE presentation slides
DuraSpace2K views
Data Mining: Data mining and key definitions by Datamining Tools
Data Mining: Data mining and key definitionsData Mining: Data mining and key definitions
Data Mining: Data mining and key definitions
Datamining Tools742 views
Putting it all together for digital assets by Jon Morley
Putting it all together for digital assetsPutting it all together for digital assets
Putting it all together for digital assets
Jon Morley268 views
The ARIADNE interoperability framework, component architecture and registry s... by ariadnenetwork
The ARIADNE interoperability framework, component architecture and registry s...The ARIADNE interoperability framework, component architecture and registry s...
The ARIADNE interoperability framework, component architecture and registry s...
ariadnenetwork1.1K views
Building a Data Discovery Network for Sustainability Science by Robert H. McDonald
Building a Data Discovery Network for Sustainability ScienceBuilding a Data Discovery Network for Sustainability Science
Building a Data Discovery Network for Sustainability Science
Robert H. McDonald1.2K views
Real-Time Data Flows with Apache NiFi by Manish Gupta
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFi
Manish Gupta19.8K views

More from petrknoth

Qui Bono? Cumulative advantage in open access publishing by
Qui Bono? Cumulative advantage in open access publishingQui Bono? Cumulative advantage in open access publishing
Qui Bono? Cumulative advantage in open access publishingpetrknoth
50 views18 slides
CORE APIv3 by
CORE APIv3CORE APIv3
CORE APIv3petrknoth
284 views20 slides
OAI Identifiers: Decentralised PIDs for Research Outputs in Repositories by
OAI Identifiers: Decentralised PIDs for Research Outputs in RepositoriesOAI Identifiers: Decentralised PIDs for Research Outputs in Repositories
OAI Identifiers: Decentralised PIDs for Research Outputs in Repositoriespetrknoth
516 views23 slides
UKRI OA policy requirements for repositories and how to meet them by
UKRI OA policy requirements for repositories and how to meet themUKRI OA policy requirements for repositories and how to meet them
UKRI OA policy requirements for repositories and how to meet thempetrknoth
404 views24 slides
Enabling Educators to Locate High-Quality Teaching Resources by
Enabling Educators to LocateHigh-Quality Teaching ResourcesEnabling Educators to LocateHigh-Quality Teaching Resources
Enabling Educators to Locate High-Quality Teaching Resourcespetrknoth
237 views11 slides
Tracking compliance of the REF2021 policy with the CORE Repository Dashboard by
Tracking compliance of the REF2021 policy with the CORE Repository DashboardTracking compliance of the REF2021 policy with the CORE Repository Dashboard
Tracking compliance of the REF2021 policy with the CORE Repository Dashboardpetrknoth
416 views38 slides

CORE: Aggregating and Enriching Content to Support Open Access

  • 1. CORE: Aggregating and Enriching Content to Support Open Access Petr Knoth The Open University 1/52
  • 2. Outline 1. Aggregating Open Access (OA) publications – why, how, what for? 2. The CORE system 3. Supporting research in mining databases of scientific publications (DiggiCORE) 2/52
  • 3. Outline 1. Aggregating Open Access (OA) publications – why, how, what for? 2. The CORE system 3. Supporting research in mining databases of scientific publications (DiggiCORE) 3/52
  • 4. Growth of items in Open Access repositories 4/52
  • 5. Growth of Open Access repositories 5/52
  • 6. Growth of articles in OA journals 6/52
  • 7. Growth of OA journals 7/52
  • 8. Green Open Access - statistics 8/52
  • 9. Why do we need aggregations? “Each individual repository is of limited value for research: the real power of Open Access lies in the possibility of connecting and tying together repositories, which is why we need interoperability. In order to create a seamless layer of content through connected repositories from around the world, Open Access relies on interoperability, the ability for systems to communicate with each other and pass information back and forth in a usable format. Interoperability allows us to exploit today's computational power so that we can aggregate, data mine, create new tools and services, and generate new knowledge from repository content.” [COAR manifesto] 9/52
  • 10. Access to information according to the level of abstraction [Diagram labels: Repository (×3), Aggregation, Metadata Transfer Interoperability, Metadata, Content, Semantic Enrichment, OLTP, OLAP, Interfaces; access levels: raw data access, transaction information access, analytical information access] 10/52
  • 11. Who should be supported by aggregations? The following user groups (divided according to the level of abstraction of the information they need): • Raw data access. • Transaction information access. • Analytical information access. 11/52
  • 12. Who should be supported by aggregations? The following user groups (divided according to the level of abstraction of the information they need): • Raw data access: developers, DLs, DL researchers, companies … • Transaction information access: researchers, students, life-long learners … • Analytical information access: funders, government, business intelligence … 12/52
  • 13. Layers of an aggregation system [Diagram labels: Interfaces, OLTP, OLAP, Enrichment, Metadata, Content, Metadata Transfer Interoperability] 13/52
  • 14. Layers of an aggregation system [Diagram labels: Interfaces, OLTP, OLAP, Enrichment, Metadata, Content, Metadata Transfer Interoperability; example labels: APIs (REST, SOAP, XML-RPC), UIs, Dashboards, Statistics, Catalog records, Annotations; OAI-PMH, OAI-ORE …; Dublin Core, XML, RDF …; PDF, Word …] 14/52
  • 15. Access to information according to the level of abstraction [Diagram labels: Repository (×3), Metadata Transfer Interoperability, Metadata, Content, Enrichment, OLTP, OLAP, Interfaces; access levels: raw data access, transaction information access, analytical information access] 15/52
  • 17. Aggregation projects – BASE [Same layer diagram as slide 15] 17/52
  • 18. Aggregation projects – OAIster/WorldCat [Same layer diagram as slide 15] 18/52
  • 19. Aggregation projects – RepUK [Same layer diagram as slide 15] 19/52
  • 20. Aggregations need access to content, not just metadata! • Certain metadata types can be created only at the level of the aggregation • Certain metadata can change over time • Ensuring content: • accessibility • availability • validity • quality • … 20/52
  • 21. Aggregation projects – CiteSeerX [Diagram labels: Repository (×3), Metadata Transfer Interoperability, Metadata, Content, Enrichment, OLTP, OLAP, Interfaces; access levels: raw data access, transaction information access, analytical information access] 21/52
  • 22. Should an aggregation system support all three user types? This can be realised by more than one system, provided that the dataset is the same! 22/52
  • 23. Outline 1. Aggregating Open Access (OA) publications – why, how, what for? 2. The CORE system 3. Supporting research in mining databases of scientific publications (DiggiCORE) 23/52
  • 24. CORE objectives • CORE aims to provide a comprehensive technical infrastructure for Open Access scholarly publications that will support access and reuse of scholarly materials at different levels of abstraction. • A nation-wide aggregation system that will improve the discovery of publications stored in British Open Access Repositories (OARs). 24/52
  • 25. What does CORE provide at different aggregation levels? [Diagram labels: Repository (×3), Metadata Transfer Interoperability, Metadata, Content, Enrichment, OLTP, OLAP, Interfaces; access levels: raw data access, transaction information access, analytical information access] 25/52
  • 27. CORE functionality Step 1: Metadata and full-text harvesting Content harvesting, processing 27/52
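A minimal sketch of what the metadata half of this harvesting step could look like, assuming the repositories expose OAI-PMH with Dublin Core (`oai_dc`) records, as the layers slide suggests. The function names, the sample record and the rule of treating `dc:identifier` values ending in `.pdf` as full-text links are illustrative assumptions, not a description of how CORE itself is implemented.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Standard OAI-PMH and Dublin Core XML namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def parse_records(xml_text):
    """Extract identifier, title and candidate PDF links from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = rec.findtext(".//dc:title", default="", namespaces=NS)
        # Assumption: a dc:identifier ending in ".pdf" points at the full text.
        pdf_links = [
            e.text
            for e in rec.iterfind(".//dc:identifier", NS)
            if e.text and e.text.lower().endswith(".pdf")
        ]
        records.append({"id": ident, "title": title, "pdf_links": pdf_links})
    return records


# Hypothetical response from a hypothetical repository at example.org.
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example paper</dc:title>
          <dc:identifier>http://example.org/papers/1.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))
    print(parse_records(SAMPLE))
```

A real harvester would also need to follow OAI-PMH resumption tokens and download the linked full texts; both are left out here to keep the sketch short.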
  • 28. What does CORE provide at different aggregation levels? [Diagram labels: Repository (×3), Metadata Transfer Interoperability, Metadata, Content, Enrichment, OLTP, OLAP, Interfaces; access levels: raw data access, transaction information access, analytical information access; annotation: semantic similarity, citation extraction, classification, …] 28/52
  • 29. CORE functionality Step 2: Semantic enrichment Semantic enrichment 29/52
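One common way to compute semantic similarity between two full texts is cosine similarity over term-frequency vectors. The sketch below illustrates that general technique only; the slides do not say which similarity measure CORE uses, and the tokenisation and function names are assumptions made for the example.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-case the text and count alphabetic tokens (a deliberately simple tokeniser)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    p1 = term_vector("Aggregating open access repositories")
    p2 = term_vector("Open access repositories and aggregation systems")
    p3 = term_vector("Liver anatomy")
    print(cosine_similarity(p1, p2))  # shares three terms with p1, so greater than zero
    print(cosine_similarity(p1, p3))  # no shared terms
```

In practice the raw counts would usually be replaced by TF-IDF weights so that very common words contribute less to the score.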
  • 30. What does CORE provide at different aggregation levels? [Diagram labels: Repository (×3), Metadata Transfer Interoperability, Metadata, Content, Enrichment, OLTP, OLAP, Interfaces; access levels: raw data access, transaction information access, analytical information access] 30/52
  • 31. CORE functionality Step 3: Providing a set of services on top of the aggregation Providing services 31/52
  • 32. CORE applications • CORE Portal • CORE Mobile • CORE Plugin • CORE API • Repository Analytics 32/52
  • 33. What does CORE provide at different aggregation levels? [Diagram labels: Repository (×3), Metadata Transfer Interoperability, Metadata, Content, Enrichment, OLTP, OLAP, Interfaces; access levels: raw data access, transaction information access, analytical information access] 33/52
  • 34. CORE Applications CORE Portal – Allows searching and navigating scientific publications aggregated from Open Access repositories 34/52
  • 35. CORE Applications CORE Mobile – Allows searching and navigating scientific publications aggregated from Open Access repositories 35/52
  • 36. CORE Applications CORE Plugin – A plugin for systems that provides recommendations for related items. 36/52
  • 37. What does CORE provide at different aggregation levels? [Diagram labels: Repository (×3), Metadata Transfer Interoperability, Metadata, Content, Enrichment, OLTP, OLAP, Interfaces; access levels: raw data access, transaction information access, analytical information access] 37/52
  • 38. CORE Applications CORE API – Enables external systems and services to interact with the CORE repository. 38/52
  • 39. What does CORE provide at different aggregation levels? [Diagram labels: Repository (×3), Metadata Transfer Interoperability, Metadata, Content, Enrichment, OLTP, OLAP, Interfaces; access levels: raw data access, transaction information access, analytical information access] 39/52
  • 40. CORE Applications Repository Analytics – An analytical tool supporting providers of open access content (in particular repository managers). 40/52
  • 41. What does CORE provide at different aggregation levels? • Analytical information access: Repository Analytics • Transaction information access: CORE Portal, CORE Mobile, CORE Plugin • Raw data access: CORE API 41/52
  • 42. CORE statistics • Content • 5.4M records • 192 repositories • 402k full-texts • Started: February 2011 • Budget: £140k 42/52
  • 43. Outline 1. Aggregating Open Access (OA) publications – why, how, what for? 2. The CORE system 3. Supporting research in mining databases of scientific publications (DiggiCORE) 43/52
  • 45. Objective Software for exploration and analysis of very large and fast-growing amounts of research publications stored across Open Access Repositories (OAR). 45/52
  • 46. DiggiCORE networks Three networks: (a) semantically related papers, (b) citation network, (c) author citation network 46/52
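To make the relationship between networks (b) and (c) concrete, here is a small sketch that derives an author citation network from a paper citation network plus authorship data. The data structures, names and toy data are hypothetical and chosen only for illustration; the slides do not describe how DiggiCORE builds its networks.

```python
from collections import defaultdict


def author_citation_network(citations, authors):
    """Derive weighted author-to-author citation edges.

    citations: dict mapping a citing paper id to a list of cited paper ids
    authors:   dict mapping a paper id to a list of author names
    Returns a dict {(citing_author, cited_author): number_of_citations}.
    """
    edges = defaultdict(int)
    for citing, cited_papers in citations.items():
        for cited in cited_papers:
            for a in authors.get(citing, []):
                for b in authors.get(cited, []):
                    # Self-citations are kept; filter a == b if they are unwanted.
                    edges[(a, b)] += 1
    return dict(edges)


if __name__ == "__main__":
    # Toy paper citation network (b): p1 cites p2; p3 cites p2 and p1.
    citations = {"p1": ["p2"], "p3": ["p2", "p1"]}
    authors = {"p1": ["Ada"], "p2": ["Bob", "Cy"], "p3": ["Ada"]}
    print(author_citation_network(citations, authors))
```

The network of semantically related papers (a) could be built in the same edge-list form by linking pairs of papers whose text similarity exceeds a chosen threshold.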
  • 47. DiggiCORE objectives Allow researchers to use this platform to analyse publications. Why? • To identify patterns in the behaviour of research communities • To detect trends in research disciplines • To gain new insights into the citation behaviour of researchers • To discover features that distinguish papers with high impact 47/52
  • 48. Questions the system can help answer • What are the attributes of high-impact publications? • Do these attributes differ in the humanities, social sciences and computer sciences? • What are the features of research groups within disciplines and how do these features relate to contributions generated by the group? • What are the attributes of high-impact authors and what is their role within the group? • What are the dynamics of successful research groups? 48/52
  • 49. Questions the system can help answer • What is the mechanism of cross-fertilisation within disciplines, especially between the humanities and the sciences? • Who are the authors whose work is worth monitoring because they contribute to the achievements of their own discipline and also inspire other disciplines? • How should a novice in the discipline get acquainted with key achievements in the discipline? • How should he/she search for the most important publications? 49/52
  • 50. Summary • The rapid growth of OA content provides both an opportunity and a challenge. • Aggregations should serve the needs of different user groups. • Aggregations need to aggregate content, not just metadata. • We can have many services that are part of the infrastructure, but they should work with the same data. 50/52
  • 51. Thank you! Yes we can! 51/52
  • 52. 52/52