EASTER

 Evaluating Automated Subject Tools
       for Enhancing Retrieval

                    Douglas Tudhope
                Hypermedia Research Unit
                 University of Glamorgan




JISC Automatic Metadata Generation Meeting, London, May 25, 2010
Background

•   EASTER is an 18-month JISC project funded under the Information
    Environment Programme 2009-11.

•   Started April 2009 and involves eight institutional partners

•   Aim is to test and evaluate a range of current tools for automated
    subject metadata generation

•   Anticipated outcomes:
     –   better understanding of limitations and what is possible
     –   recommendations for services employing subject metadata in JISC community
Rationale – problems, issues, relevance

•   EASTER investigates the creation and enrichment of subject
    metadata using existing automated tools.

•   Subject metadata are the most important in resource discovery, yet
    most expensive to produce manually. In addition, they are more
    difficult to generate automatically compared to formal metadata
    such as file type, title, etc. They are widely used in retrieval and NLP tools.

•   Due to the high cost of evaluation, automated subject metadata
    tools are rarely tested in live environments of use.

•   Challenge facing digital collections, institutional repositories, and
    aggregators of how to provide high quality subject metadata at
    reasonable costs.
Intute testbed

•   Test-bed is Intute http://www.intute.ac.uk
    - a collection of websites (mostly)
    However, results are intended to be generally applicable

•   Tools for automated subject metadata generation
    will be tested in two contexts:
     –   Intute cataloguers in the cataloguing workflow;
     –   end-users of Intute who search for information


•   Task-based end-user retrieval study will examine contribution of
    automatically assigned terms and manually assigned terms
Methodology

•   A methodology for evaluating such tools is intended as a significant
    project outcome/contribution

•   Low consistency between cataloguers, and between indexing done at
    different times, is a recognised problem

•   EASTER methodology includes creating an enhanced ‘gold
    standard’ test collection by careful manual cataloguing and expert
    review by cataloguers and users. Provision for consideration of
    automatic indexing output within enhanced gold standard in
    methodology.
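
The two measurement ideas on this slide can be sketched in a few lines. This is a minimal illustration, not the EASTER methodology itself: it assumes Hooper's consistency measure (shared terms divided by all distinct terms, the same as the Jaccard coefficient) for agreement between cataloguers, and plain precision, recall and F1 for scoring automatic terms against a gold standard. The function names, the sample terms and the way the gold standard is merged are all assumptions made for the example.

```python
def hooper_consistency(terms_a, terms_b):
    """Inter-indexer consistency: shared terms / all distinct terms."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 1.0  # two empty term sets agree trivially
    return len(a & b) / len(a | b)


def precision_recall_f1(assigned, gold):
    """Score automatically assigned terms against gold-standard terms."""
    assigned, gold = set(assigned), set(gold)
    hits = len(assigned & gold)
    precision = hits / len(assigned) if assigned else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if hits else 0.0
    return precision, recall, f1


# Illustrative terms only; not taken from the EASTER test collection.
cataloguer_1 = {"painting", "portraits", "renaissance"}
cataloguer_2 = {"painting", "portraits", "italy", "oil painting"}
automatic = {"painting", "italy", "museums"}
gold = cataloguer_1 | cataloguer_2  # hypothetical merged gold standard

print(hooper_consistency(cataloguer_1, cataloguer_2))
print(precision_recall_f1(automatic, gold))
```

Taking the union of both cataloguers' terms is only one possible way to build an enhanced gold standard; an expert review step, as described above, could equally remove or add terms before scoring.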
Candidate Tools

Initial candidate tools (a subset will be selected after review)

1) Temis Categorizer (French SME – inhouse)
2) KEA -- new version Maui (Waikato)
3) TextGarden
4) TerMine (NACTEM)
5) KnowLib’s automated classifier (Lund)
6) Scorpion (OCLC)
7) iVia project’s libiViaClassification (UC Riverside)
Candidate Tools

Initial candidate tools (a subset will be selected after review)

1) Temis Categorizer                  (machine learning, classification)
2) KEA (http://www.nzdl.org/Kea/) -- new version Maui (indexing)
3) TextGarden (http://kt.ijs.si/Dunja/textgarden/)
4) TerMine (http://www.nactem.ac.uk/software/termine/) (noun phrase)
5) KnowLib’s automated classifier                            (classification)
    (http://www.it.lth.se/knowlib/auto.htm)
6) Scorpion
    (http://www.oclc.org/research/software/scorpion/default.htm)
7) iVia project’s libiViaClassification
    (http://ivia.ucr.edu/manuals/stable/libiViaClassification/5.4.0/)
Progress

•   Distinguish 3 subject domains associated with different thesauri
•   VETERINARY              - CAB Thesaurus
•   VISUAL ARTS             - AAT
•   POLITICS                - HASSET, (IBSS?)

•   KEA/Maui      thesauri and training set
•   AutoClass     thesauri – need to consider main classes to classify
•   TERMINE       none
•   TEMIS         thesauri and training set depending on mode
                  (IPR of thesauri for commercial use an issue)

•   Conversion of thesauri to SKOS format underway
•   Web crawler for EASTER purposes implemented
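
As a rough idea of what a thesaurus-to-SKOS conversion involves, the sketch below serialises a small in-memory thesaurus as Turtle using the standard SKOS properties `skos:prefLabel`, `skos:altLabel`, `skos:broader` and `skos:related`. It is not the project's converter. The input layout (BT/UF/RT lists per term), the `http://example.org/thesaurus/` namespace, the `slug` helper and the sample terms are assumptions for illustration, and the terms are not taken from the CAB Thesaurus, AAT or HASSET.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/thesaurus/"  # hypothetical namespace


def slug(term):
    """Turn a term label into a simple URI local name."""
    return term.strip().lower().replace(" ", "-")


def to_skos_turtle(thesaurus, lang="en"):
    """Serialise {term: {"UF": [...], "BT": [...], "RT": [...]}} as Turtle."""
    lines = [f"@prefix skos: <{SKOS}> .", f"@prefix ex: <{BASE}> .", ""]
    for term, rels in sorted(thesaurus.items()):
        props = ["a skos:Concept", f'skos:prefLabel "{term}"@{lang}']
        props += [f'skos:altLabel "{uf}"@{lang}' for uf in rels.get("UF", [])]
        props += [f"skos:broader ex:{slug(bt)}" for bt in rels.get("BT", [])]
        props += [f"skos:related ex:{slug(rt)}" for rt in rels.get("RT", [])]
        lines.append(f"ex:{slug(term)} " + " ;\n    ".join(props) + " .")
    return "\n".join(lines)


# Made-up sample records in a BT/UF/RT layout.
sample = {
    "cattle": {"UF": ["cows"], "BT": ["livestock"]},
    "livestock": {},
}

print(to_skos_turtle(sample))
```

The sketch does not escape quotation marks inside labels or handle characters that are invalid in Turtle local names, so real thesaurus data would need a proper RDF library or additional escaping.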
Lessons learned
Preliminary stages – provisional general observations

 •   Subject metadata generation tools typically complex layered
     software. Require maintenance to stay current. Installation may not
     be trivial. Resource implications.

 •   General subject metadata generation tools often require tuning and
     adaptation for different contexts and subject domains?
     Resource implications.

 •   Subject metadata generation for what purpose? Classification,
     indexing, annotation associated with different use cases.
     E.g. browsing and search require different metadata for best results.
     An individual tool may not deliver all use cases.

 •   Possibilities for pipelining different approaches (tools) in sequence
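
One way to picture pipelining is a term-extraction stage feeding a controlled-vocabulary matching stage. The sketch below is a toy illustration under stated assumptions: the n-gram extractor is a naive stand-in and does not reproduce the behaviour of TerMine, KEA/Maui or any other candidate tool, and the stopword list, vocabulary and function names are invented for the example.

```python
import re

STOPWORDS = {"the", "of", "and", "in", "a", "to", "for", "on"}


def extract_candidates(text, max_len=3):
    """Stage 1: naive candidates = word n-grams with no stopword at either end."""
    words = re.findall(r"[a-z]+", text.lower())
    candidates = set()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] not in STOPWORDS and gram[-1] not in STOPWORDS:
                candidates.add(" ".join(gram))
    return candidates


def map_to_vocabulary(candidates, vocabulary):
    """Stage 2: keep candidates that match an entry term; return preferred terms."""
    return sorted({vocabulary[c] for c in candidates if c in vocabulary})


def pipeline(text, vocabulary):
    """Run the two stages in sequence."""
    return map_to_vocabulary(extract_candidates(text), vocabulary)


# Hypothetical mini-vocabulary: entry term or synonym -> preferred term.
vocabulary = {
    "oil painting": "oil paintings",
    "oil paintings": "oil paintings",
    "portrait": "portraits",
    "portraits": "portraits",
}

print(pipeline("A guide to the oil painting of portraits in Europe", vocabulary))
```

Keeping each stage as a separate function means either one could be swapped for a real tool without changing the other, which is the practical appeal of running tools in sequence.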
STAR/STELLAR Projects also relevant
Information Extraction from archaeology grey literature (AHRC)

   ‘Rich’, semantic indexing of Archaeology fieldwork reports (ADS
    OASIS Grey Literature) with respect to the English Heritage
    extension of the CRM Conceptual Reference Model (Ontology),
    making use of EH thesauri/glossaries and the GATE NLP tool.


   Transforms GATE XML annotations to RDF triples conformant to
    conceptual model, allowing cross search with datasets.


   In progress
    Web service interface planned to NLP semantic indexing


   STAR terminology services (based on SKOS vocabularies)
    JavaScript widgets browser neutral
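
The annotation-to-RDF step can be sketched roughly as follows. This is an assumption-heavy illustration rather than the STAR implementation: the input is a simplified, made-up XML layout (not GATE's actual serialisation), the `http://example.org/` namespaces stand in for the real CRM-EH and dataset URIs, and only an `rdf:type` and an `rdfs:label` triple are produced per annotation, in N-Triples syntax.

```python
import xml.etree.ElementTree as ET

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
MODEL = "http://example.org/model#"      # hypothetical class namespace
DATA = "http://example.org/annotation/"  # hypothetical instance namespace


def annotations_to_ntriples(xml_text):
    """Emit one rdf:type and one rdfs:label triple per <annotation> element."""
    triples = []
    for ann in ET.fromstring(xml_text).iter("annotation"):
        subject = f"<{DATA}{ann.get('id')}>"
        # Escape backslashes and quotes so the literal stays valid N-Triples.
        label = (ann.text or "").strip().replace("\\", "\\\\").replace('"', '\\"')
        triples.append(f"{subject} <{RDF_TYPE}> <{MODEL}{ann.get('type')}> .")
        triples.append(f'{subject} <{RDFS_LABEL}> "{label}" .')
    return triples


# Made-up input; element and attribute names are assumptions.
sample = """<document>
  <annotation id="a1" type="Find">pottery sherd</annotation>
  <annotation id="a2" type="Period">Roman</annotation>
</document>"""

for triple in annotations_to_ntriples(sample):
    print(triple)
```

Triples in this form can be loaded into any RDF store alongside dataset records expressed against the same conceptual model, which is what makes cross search possible.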
STAR/STELLAR Projects also relevant
Information Extraction from archaeology grey literature (AHRC)


   Archaeology domain specific but investigating generalisation to
    cultural heritage more generally
    eg classical art history domain (with OUCS)


   STELLAR (AHRC) investigates generalising data mapping tool
    and producing linked data (with ADS)
    http://hypermedia.research.glam.ac.uk/kos/star/
    http://hypermedia.research.glam.ac.uk/kos/stellar
Grey Literature Information Extraction
(Andreas Vlachidis)

•   Looking to extract CRM-EH period, context, find, sample entities
•   Aim to cross search with archaeology datasets
CRM-EH Entities and Events (Example)
Contact

EASTER project website

http://www.ukoln.ac.uk/projects/easter/

Project publications
http://www.ukoln.ac.uk/projects/easter/dissemination/



dstudhope@glam.ac.uk

Easter JISC metadata May25 DT

  • 1. EASTER Evaluating Automated Subject Tools for Enhancing Retrieval Douglas Tudhope Hypermedia Research Unit University of Glamorgan JISC Automatic Metadata Generation Meeting, London, May 25, 2010
  • 2. Background • EASTER is an 18-month JISC project funded under the Information Environment Programme 2009-11. • Started April 2009 and involves eight institutional partners • Aim is to test and evaluate a range of current tools for automated subject metadata generation • Anticipated outcomes: – better understanding of limitations and what possible – recommendations for services employing subject metadata in JISC community
  • 3. Rationale – problems, issues, relevance • EASTER investigates the creation and enrichment of subject metadata using existing automated tools. • Subject metadata are the most important in resource discovery, yet most expensive to produce manually. In addition, they are more difficult to generate automatically compared to formal metadata such as file type, title, etc. Wide uses in retrieval and NLP tools. • Due to the high cost of evaluation, automated subject metadata tools are rarely tested in live environments of use. • Challenge facing digital collections, institutional repositories, and aggregators of how to provide high quality subject metadata at reasonable costs.
  • 4. Intute testbed • Test-bed is Intute http://www.intute.ac.uk - a collection of websites (mostly) However results intended to be generally applicable • Tools for automated subject metadata generation will be tested in two contexts: Intute cataloguers in the cataloguing workflow; end-users of Intute who search for information • Task-based end-user retrieval study will examine contribution of automatically assigned terms and manually assigned terms
  • 5. Methodology • A methodology for evaluating such tools is intended as a significant project outcome/contribution • Low reliability rates between cataloguers and different times of indexing is a recognised problem • EASTER methodology includes creating an enhanced ‘gold standard’ test collection by careful manual cataloguing and expert review by cataloguers and users. Provision for consideration of automatic indexing output within enhanced gold standard in methodology.
  • 6. Candidate Tools Initial candidate tools (a subset will be selected after review) 1) Temis Categorizer (French SME – inhouse) 2) KEA -- new version Maui (Waikato) 3) TextGarden 4) TerMine (NACTEM) 5) KnowLib’s automated classifier (Lund) 6) Scorpion (OCLC) 7) iVia project’s libiViaClassification (UC Riverside)
  • 7. Candidate Tools Initial candidate tools (a subset will be selected after review) 1) Temis Categorizer (machine learning, classification) 2) KEA (http://www.nzdl.org/Kea/) -- new version Maui (indexing) 3) TextGarden (http://kt.ijs.si/Dunja/textgarden/) 4) TerMine (http://www.nactem.ac.uk/software/termine/) (noun phrase) 5) KnowLib’s automated classifier (classification) (http://www.it.lth.se/knowlib/auto.htm) 6) Scorpion (http://www.oclc.org/research/software/scorpion/default.htm) 7) iVia project’s libiViaClassification (http://ivia.ucr.edu/manuals/stable/libiViaClassification/5.4.0/)
  • 8. Progress • Distinguish 3 subject domains associated with different thesauri • VETERINARY - CAB Thesaurus • VISUAL ARTS - AAT • POLITICS - HASSET, (IBSS?) • KEA/Maui thesauri and training set • AutoClass thesauri – need to consider main classes to classify • TERMINE none • TEMIS thesauri and training set depending on mode (IPR of thesauri for commercial use an issue) • Conversion of thesauri to SKOS format underway • Web crawler for EASTER purposes implemented
  • 9. Lessons learned Preliminary stages – provisional general observations • Subject metadata generation tools typically complex layered software. Require maintenance to stay current. Installation may not be trivial. Resource implications. • General subject metadata generation tools often require tuning and adaptation for different contexts and subject domains? Resource implications. • Subject metadata generation for what purpose? Classification, indexing, annotation associated with different use cases. Eg browsing and search require different metadata for best results. An individual tool may not deliver all use cases. • Possibilities for pipelining different approaches (tools) in sequence
  • 10. STAR/STELLAR Projects also relevant Information Extraction from archaeology grey literature (AHRC) • ‘Rich’, semantic indexing of Archaeology fieldwork reports (ADS OASIS Grey Literature) with respect to the English Heritage extension of the CRM Conceptual Reference Model (Ontology), making use of EH thesauri/glossaries and the GATE NLP tool. • Transforms GATE XML annotations to RDF triples conformant to conceptual model, allowing cross search with datasets. • In progress Web service interface planned to NLP semantic indexing • STAR terminology services (based on SKOS vocabularies) JavaScript widgets browser neutral
  • 11. STAR/STELLAR Projects also relevant Information Extraction from archaeology grey literature (AHRC) • Archaeology domain specific but investigating generalisation to cultural heritage more generally eg classical art history domain (with OUCS) • STELLAR (AHRC) investigates generalising data mapping tool and producing linked data (with ADS) http://hypermedia.research.glam.ac.uk/kos/star/ http://hypermedia.research.glam.ac.uk/kos/stellar
  • 12. Grey Literature Information Extraction (Andreas Vlachidis) • Looking to extract CRM-EH period, context, find, sample entities • Aim to cross search with archaeology datasets
  • 13. CRM-EH Entities and Events (Example)
  • 14. Contact EASTER project website http://www.ukoln.ac.uk/projects/easter/ Project publications http://www.ukoln.ac.uk/projects/easter/dissemination/ dstudhope@glam.ac.uk
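
Slide 5 notes that low consistency between cataloguers is a recognised problem. As a rough illustration only, and not the EASTER methodology, consistency between two cataloguers can be sketched as a simple overlap ratio: the subject terms both assigned, divided by all distinct terms either assigned. The function name and the example term lists below are hypothetical, invented for this sketch.

```python
def indexing_consistency(terms_a, terms_b):
    """Overlap ratio between two sets of assigned subject terms.

    Returns shared terms divided by all distinct terms used by either
    cataloguer (a Jaccard-style ratio). Terms are compared
    case-insensitively after trimming whitespace.
    """
    a = {t.strip().lower() for t in terms_a}
    b = {t.strip().lower() for t in terms_b}
    union = a | b
    if not union:
        # Neither cataloguer assigned any term: treat as full agreement.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical terms assigned to the same resource by two cataloguers.
cataloguer_1 = ["Sculpture", "Bronze", "Renaissance"]
cataloguer_2 = ["sculpture", "bronze", "Italy", "casting"]

score = indexing_consistency(cataloguer_1, cataloguer_2)
print(f"consistency = {score:.2f}")
```

The same ratio could in principle be applied between a tool's automatically assigned terms and a manually built gold standard, although exact string matching ignores near-synonyms and broader/narrower relations in a thesaurus.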