Universitätsbibliothek




BASE – a powerful search engine
    for Open Access documents

                 AIMS@OA Week

                          25 Oct 2012

             Friedrich Summann
          Bielefeld University Library




Overview
   BASE – the OA search engine

 Harvesting OAI-PMH and its challenges

 Metadata Aggregation and Data Quality

 Processing Subject Repositories



            Harvesting Background


BASE (Bielefeld Academic Search Engine)

• started in 2002, active since 2004
• 2900 repositories harvested via OAI-PMH
• 2337 repositories indexed
• 37.4 million documents included
• 3.1 million documents automatically classified
• Lucene/Solr Index
• VuFind end-user GUI



Repositories: Geographical Distribution


[Map: regional totals in millions (m): 14.0, 15.9, 2.9, 0.26, 0.45, 2.5]



     BASE search features

• Truncation
• Search History
• Sorting
• Drilldown
• Linguistic Tools
  (Stemming, Eurovoc Thesaurus)



            Repository Typology
• Institutional Repositories (35 %)
• Thesis and Dissertation Servers (11 %)
• Subject Repositories (1 %)
• Electronic Journals (21 %)
• Digital Collections (6 %)
• Others (Videos, Audios, Datasets etc.) (2 %)



                  BASE Interfaces

• Query REST interface
• Repository Metadata interface
• Data Delivery Interface (Repository based, DDC
  of aggregated Metadata) (under construction)




Overview
  BASE – the OA search engine

 Harvesting OAI-PMH and its challenges

 Metadata Aggregation and Data Quality

 Processing Subject Repositories




My Conclusion:

OAI-PMH Harvesting is easy

But:

Putting things (results) together
   is the real challenge



            Harvesting: Challenges and Pitfalls


Repository does not respond (temporarily, or only for specific verbs)
Responses are not valid XML
Harvesting breaks off (especially with large repositories)
Incremental harvesting does not work
No deletion information, added records
Wide variety of field contents
Changes in behavior (base URL, contents)
Metadata point only to a reference or citation
Link to the document does not work
Full-text access is restricted (not OA)
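
Several of the pitfalls above (invalid XML, broken harvests, missing deletion information) can be made visible in a small harvesting loop. The following is a minimal sketch, not the BASE harvester: the function name `harvest` and the injected `fetch` callable are assumptions made for illustration, and only the OAI-PMH elements `ListRecords`, `resumptionToken`, `header` and the `noRecordsMatch` error code come from the protocol itself.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def harvest(base_url, fetch, metadata_prefix="oai_dc", from_date=None, max_pages=1000):
    """Yield (identifier, is_deleted) pairs from a ListRecords harvest.

    `fetch` is any callable that maps a URL to the response body (hypothetical
    hook, so that the HTTP layer and its retry policy stay outside this sketch).
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # incremental harvesting
    for _ in range(max_pages):  # guard against repositories that loop forever
        url = base_url + "?" + urlencode(params)
        try:
            root = ET.fromstring(fetch(url))
        except ET.ParseError as exc:
            raise RuntimeError(f"response is not well-formed XML: {url}") from exc
        error = root.find(OAI_NS + "error")
        if error is not None:
            if error.get("code") == "noRecordsMatch":
                return  # nothing new since from_date
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {url}")
        for header in root.iter(OAI_NS + "header"):
            identifier = header.findtext(OAI_NS + "identifier")
            yield identifier, header.get("status") == "deleted"
        token = root.findtext(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
        if not token or not token.strip():
            return  # last page reached
        # resumptionToken is an exclusive argument: no other parameters allowed
        params = {"verb": "ListRecords", "resumptionToken": token.strip()}
```

Passing `fetch` in as a parameter keeps the loop testable without network access; a real harvester would also persist the last token so that a broken harvest of a large repository can be resumed.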




Overview
  BASE – the OA search engine

 Harvesting OAI-PMH and its challenges

 Metadata Aggregation and Data Quality

 Processing Subject Repositories



          dc:language: Variety of Metadata Values

     Analysis: European Repositories, Oct. 2009
     804 different values in 4720585 tags

Top values                        Rare values
en – 1385175                      ; – 3
eng – 511085                      ? – 3
spa – 345658                      at;deu – 2
de – 319937                       enm;eng – 2
en_GB – 178381                    FRA – 2
ger – 166587                      fr_BE – 2
eng; – 102678                     Andere Sprache – 2
FR – 95798                        cat, spa, fra, eng. – 2
…
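
The spread of dc:language values suggests a normalization step before indexing. The sketch below is one possible approach, not the method used in BASE: the lookup table `TO_ISO639_3` is a deliberately tiny, hypothetical excerpt, and the function name `normalize_language` is an assumption.

```python
import re

# Hypothetical, incomplete lookup table: two- and three-letter codes
# mapped to ISO 639-3. A real table would cover the full code lists.
TO_ISO639_3 = {
    "en": "eng", "eng": "eng",
    "de": "deu", "ger": "deu", "deu": "deu",
    "fr": "fra", "fre": "fra", "fra": "fra",
    "es": "spa", "spa": "spa",
    "ca": "cat", "cat": "cat",
    "enm": "enm",
}


def normalize_language(raw):
    """Return the sorted ISO 639-3 codes recognized in a raw dc:language value."""
    codes = set()
    # Values may hold several codes separated by ';', ',', '/' or whitespace.
    for part in re.split(r"[;,/\s]+", raw.strip().lower()):
        part = part.strip(".")
        # Drop region subtags such as en_GB or fr-BE.
        primary = re.split(r"[_-]", part)[0]
        if primary in TO_ISO639_3:
            codes.add(TO_ISO639_3[primary])
    return sorted(codes)
```

Values that the table does not recognize (for example `?` or `Andere Sprache`) yield an empty list, which makes them easy to collect for manual review.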



           dc:type: Variety of Metadata Values

       Analysis: German Repositories, Sept. 2009
       2772 different values in 1394089 tags

Top values                     Rare values
Dataset – 588525               Software – 7
Artikel – 192306               Kulturkarten – 7
Rezension – 113924             Composition – 7
Text – 73210                   Interactive Resource – 4
Text.Thesis.Doctoral – 30201   Interview – 3
Article – 29278                Media – 1
Miszelle – 27060               content analysis – 1
NonPeerReviewed – 24688        Anniversary Publication – 1
ResearchPaper – 16046          qualitative research – 1
Dissertation – 15531
…
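
A frequency analysis like the one above can be paired with a mapping onto a small controlled vocabulary, so that the unmapped remainder shows where curation effort is needed. This is an illustrative sketch only: the target terms and the `TYPE_MAP` table are hypothetical and are not the vocabulary used by BASE.

```python
from collections import Counter

# Hypothetical mapping from raw dc:type values (lowercased) to target terms.
TYPE_MAP = {
    "artikel": "article",
    "article": "article",
    "rezension": "review",
    "dissertation": "doctoral thesis",
    "text.thesis.doctoral": "doctoral thesis",
    "dataset": "dataset",
    "software": "software",
}


def summarize_types(values):
    """Return (mapped, unmapped) Counters for an iterable of raw dc:type values."""
    mapped, unmapped = Counter(), Counter()
    for raw in values:
        key = raw.strip().lower()
        if key in TYPE_MAP:
            mapped[TYPE_MAP[key]] += 1
        else:
            # Keep the original spelling so the value can be reviewed by hand.
            unmapped[raw.strip()] += 1
    return mapped, unmapped
```

Calling `unmapped.most_common()` on the second result lists the values that still need a mapping decision, most frequent first.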




Overview
  BASE – the OA search engine

 Harvesting OAI-PMH and its challenges

 Metadata Aggregation and Data Quality

 Processing Subject Repositories



        Subject Repositories: Registries



  Disciplinary repositories
http://oad.simmons.edu/oadwiki/Disciplinary_repositories

 OpenDOAR



    Subject Repositories in BASE


   The Big Ones:
   • arXiv.org (Physics)
   • CERN Document Server (Physics)
   • PubMed Central (Life Sciences)
   • CiteSeer (Computer Science)
   • ELIS (Library Science)
   • RePEc (Economics)
   • EconStor (Economics)
   • SSOAR (Social Sciences)
   ...



The BASE Approach: Automatic Classification



                   Contents for Classifier Feed


dc:description: 30 to 40 % of metadata records have a dc:description
with relevant abstract information

Document full text (if accessible)

setSpec contains DDC and LCC codes

dc:subject contains a lot of subject-oriented information
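
Where setSpec values carry DDC codes, they can be pulled out with a simple pattern and used as labelled input for the classifier. The sketch below assumes a `ddc:NNN` naming convention for sets; the slides do not spell out the actual format, so the pattern and the function name `ddc_from_setspecs` are assumptions.

```python
import re

# Assumption: DDC classes appear as setSpec values of the form "ddc:NNN".
DDC_PATTERN = re.compile(r"^ddc:(\d{3})$", re.IGNORECASE)


def ddc_from_setspecs(set_specs):
    """Return the sorted three-digit DDC classes found in a record's setSpec values."""
    found = set()
    for spec in set_specs:
        match = DDC_PATTERN.match(spec.strip())
        if match:
            found.add(match.group(1))
    return sorted(found)
```

Records for which this returns an empty list would have to rely on the other sources named above (dc:description, dc:subject, full text).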



        Building the Knowledge Base




Mapping of frequently used classifications

LCC
ELIS classification
ArXiv classification

DDC codes: ~400,000 documents = 1.4 %




DDC classes distribution in Harvesting Results



  Subject-based Browsing




The End. Thank you!


Mail: friedrich.summann@uni-bielefeld.de
