Easter JISC metadata May25 DT

Presentation to the JISC meeting on automated metadata tools, London, 25/05/10

Transcript

  1. EASTER: Evaluating Automated Subject Tools for Enhancing Retrieval
     Douglas Tudhope, Hypermedia Research Unit, University of Glamorgan
     JISC Automatic Metadata Generation Meeting, London, May 25, 2010
  2. Background
     • EASTER is an 18-month JISC project funded under the Information Environment Programme 2009-11.
     • It started in April 2009 and involves eight institutional partners.
     • The aim is to test and evaluate a range of current tools for automated subject metadata generation.
     • Anticipated outcomes:
       – a better understanding of the limitations of current tools and of what is possible
       – recommendations for services employing subject metadata in the JISC community
  3. Rationale – problems, issues, relevance
     • EASTER investigates the creation and enrichment of subject metadata using existing automated tools.
     • Subject metadata are the most important metadata for resource discovery, yet the most expensive to produce manually. They are also more difficult to generate automatically than formal metadata such as file type, title, etc., and have wide uses in retrieval and NLP tools.
     • Due to the high cost of evaluation, automated subject metadata tools are rarely tested in live environments of use.
     • The challenge facing digital collections, institutional repositories, and aggregators is how to provide high-quality subject metadata at reasonable cost.
  4. Intute testbed
     • The test-bed is Intute (http://www.intute.ac.uk), a collection of (mostly) websites; however, the results are intended to be generally applicable.
     • Tools for automated subject metadata generation will be tested in two contexts: Intute cataloguers in the cataloguing workflow, and end-users of Intute who search for information.
     • A task-based end-user retrieval study will examine the contribution of automatically assigned terms versus manually assigned terms.
  5. Methodology
     • A methodology for evaluating such tools is intended as a significant project outcome/contribution.
     • Low indexing consistency between different cataloguers, and for the same cataloguer at different times, is a recognised problem; a minimal consistency-measure sketch follows this slide.
     • The EASTER methodology includes creating an enhanced 'gold standard' test collection through careful manual cataloguing and expert review by cataloguers and users, with provision for considering automatic indexing output within the enhanced gold standard.
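As a rough illustration of the inter-indexer consistency issue noted above, the sketch below computes Rolling's consistency measure (equivalent to the Dice coefficient) between two cataloguers' subject term sets. The term sets are invented examples, not EASTER data.

```python
# Hypothetical illustration of inter-indexer consistency between two cataloguers'
# subject term sets, using Rolling's measure (the Dice coefficient on term sets).

def rolling_consistency(terms_a: set[str], terms_b: set[str]) -> float:
    """Return 2*|A ∩ B| / (|A| + |B|); 1.0 means identical indexing."""
    if not terms_a and not terms_b:
        return 1.0
    return 2 * len(terms_a & terms_b) / (len(terms_a) + len(terms_b))

# Invented example term sets, not drawn from the project's data.
cataloguer_1 = {"animal welfare", "veterinary medicine", "cattle"}
cataloguer_2 = {"animal welfare", "livestock", "cattle", "disease control"}

print(f"Consistency: {rolling_consistency(cataloguer_1, cataloguer_2):.2f}")  # 0.57
```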
  6. Candidate Tools
     Initial candidate tools (a subset will be selected after review):
     1) Temis Categorizer (French SME – in-house)
     2) KEA – new version Maui (Waikato)
     3) TextGarden
     4) TerMine (NACTEM)
     5) KnowLib's automated classifier (Lund)
     6) Scorpion (OCLC)
     7) iVia project's libiViaClassification (UC Riverside)
  7. Candidate Tools
     Initial candidate tools (a subset will be selected after review):
     1) Temis Categorizer (machine learning, classification)
     2) KEA (http://www.nzdl.org/Kea/) – new version Maui (indexing)
     3) TextGarden (http://kt.ijs.si/Dunja/textgarden/)
     4) TerMine (http://www.nactem.ac.uk/software/termine/) (noun phrase)
     5) KnowLib's automated classifier (classification) (http://www.it.lth.se/knowlib/auto.htm)
     6) Scorpion (http://www.oclc.org/research/software/scorpion/default.htm)
     7) iVia project's libiViaClassification (http://ivia.ucr.edu/manuals/stable/libiViaClassification/5.4.0/)
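Comparing tools as different as these within one evaluation harness suggests wrapping each behind a common interface. The sketch below is purely hypothetical: `SubjectTool`, `suggest_terms`, and the dummy implementation are invented for illustration and are not part of any listed tool or of the EASTER codebase.

```python
# Hypothetical common wrapper interface for evaluating candidate tools uniformly.
from abc import ABC, abstractmethod
from collections import Counter

class SubjectTool(ABC):
    """Common interface: given document text, suggest subject terms."""

    @abstractmethod
    def suggest_terms(self, text: str, max_terms: int = 10) -> list[str]:
        ...

class DummyFrequencyTool(SubjectTool):
    """Placeholder implementation (most frequent words), standing in for a real tool."""

    def suggest_terms(self, text: str, max_terms: int = 10) -> list[str]:
        words = [w.lower() for w in text.split() if len(w) > 4]
        return [w for w, _ in Counter(words).most_common(max_terms)]

# An evaluation harness could then loop over SubjectTool implementations and
# compare their suggestions against the manually catalogued 'gold standard'.
tool = DummyFrequencyTool()
print(tool.suggest_terms("Automated subject metadata generation for digital collections"))
```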
  8. Progress
     • Three subject domains have been distinguished, each associated with a different thesaurus:
       – VETERINARY – CAB Thesaurus
       – VISUAL ARTS – AAT
       – POLITICS – HASSET (IBSS?)
     • Input requirements per tool:
       – KEA/Maui: thesauri and training set
       – AutoClass: thesauri – need to consider which main classes to classify into
       – TerMine: none
       – TEMIS: thesauri and training set, depending on mode (IPR of the thesauri for commercial use is an issue)
     • Conversion of the thesauri to SKOS format is underway (a minimal conversion sketch follows this slide).
     • A web crawler for EASTER purposes has been implemented.
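A minimal sketch of the kind of SKOS conversion mentioned above, assuming the Python rdflib library; the namespace and the broader/narrower sample entries are invented, and the project's actual conversion workflow may differ.

```python
# Minimal sketch: turning a simple broader/narrower term list into SKOS RDF.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

EX = Namespace("http://example.org/thesaurus/")  # hypothetical namespace

# (term id, preferred label, broader term or None) – invented sample entries
terms = [
    ("cattle", "Cattle", "livestock"),
    ("livestock", "Livestock", None),
]

g = Graph()
g.bind("skos", SKOS)

for term_id, label, broader in terms:
    concept = EX[term_id]
    g.add((concept, RDF.type, SKOS.Concept))
    g.add((concept, SKOS.prefLabel, Literal(label, lang="en")))
    if broader:
        g.add((concept, SKOS.broader, EX[broader]))
        g.add((EX[broader], SKOS.narrower, concept))

print(g.serialize(format="turtle"))
```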
  9. Lessons learned
     Preliminary stages – provisional general observations:
     • Subject metadata generation tools are typically complex, layered software. They require maintenance to stay current, and installation may not be trivial. Resource implications.
     • General subject metadata generation tools often require tuning and adaptation for different contexts and subject domains. Resource implications.
     • Subject metadata generation for what purpose? Classification, indexing, and annotation are associated with different use cases, e.g. browsing and search require different metadata for best results. An individual tool may not deliver all use cases.
     • Possibilities exist for pipelining different approaches (tools) in sequence; a minimal pipeline sketch follows this slide.
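A minimal sketch of the pipelining idea, assuming two stages: naive candidate phrase extraction followed by matching against a controlled vocabulary. The vocabulary, concept identifiers, and sample text are invented, and none of the listed tools is used here.

```python
# Minimal two-stage pipeline sketch: candidate extraction -> vocabulary matching.
import re

VOCABULARY = {  # hypothetical thesaurus: preferred label -> concept id
    "animal welfare": "C001",
    "disease control": "C002",
    "cattle": "C003",
}

def extract_candidates(text: str, max_len: int = 3) -> set[str]:
    """Stage 1: naive candidate extraction as contiguous word n-grams."""
    words = re.findall(r"[a-z]+", text.lower())
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    }

def match_vocabulary(candidates: set[str]) -> dict[str, str]:
    """Stage 2: keep only candidates that are preferred labels in the vocabulary."""
    return {c: VOCABULARY[c] for c in candidates if c in VOCABULARY}

text = "The report discusses disease control and animal welfare on cattle farms."
print(match_vocabulary(extract_candidates(text)))
# e.g. {'disease control': 'C002', 'animal welfare': 'C001', 'cattle': 'C003'}
```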
  10. STAR/STELLAR projects (also relevant)
      Information extraction from archaeology grey literature (AHRC):
      • 'Rich' semantic indexing of archaeology fieldwork reports (ADS OASIS grey literature) with respect to the English Heritage extension of the CIDOC CRM Conceptual Reference Model (ontology), making use of EH thesauri/glossaries and the GATE NLP tool.
      • Transforms GATE XML annotations into RDF triples conformant to the conceptual model, allowing cross-search with datasets (a minimal sketch of this step follows this slide).
      • In progress: a web service interface to the NLP semantic indexing is planned.
      • STAR terminology services (based on SKOS vocabularies): browser-neutral JavaScript widgets.
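A minimal sketch of the annotation-to-triples step, assuming a simplified inline XML annotation format and placeholder URIs; the real GATE output format and the CRM-EH ontology URIs differ, so every name below is illustrative only.

```python
# Minimal sketch: simplified XML annotations -> RDF triples (placeholder schema).
import xml.etree.ElementTree as ET
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

CRMEH = Namespace("http://example.org/crmeh#")    # placeholder for the ontology
DATA = Namespace("http://example.org/report/1#")  # placeholder for report instances

# Invented example of the kind of annotated snippet an NLP pipeline might emit;
# "Context" and "Find" are simplified stand-ins for real CRM-EH entity types.
xml_snippet = """<doc>
  <Annotation id="a1" type="Context">pit fill 104</Annotation>
  <Annotation id="a2" type="Find">iron nail</Annotation>
</doc>"""

g = Graph()
g.bind("crmeh", CRMEH)

for ann in ET.fromstring(xml_snippet).iter("Annotation"):
    subject = DATA[ann.get("id")]
    g.add((subject, RDF.type, CRMEH[ann.get("type")]))
    g.add((subject, RDFS.label, Literal(ann.text)))

print(g.serialize(format="turtle"))
```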
  11. STAR/STELLAR projects (also relevant)
      Information extraction from archaeology grey literature (AHRC):
      • Archaeology domain-specific, but investigating generalisation to cultural heritage more broadly, e.g. the classical art history domain (with OUCS).
      • STELLAR (AHRC) investigates generalising the data mapping tool and producing linked data (with ADS).
      http://hypermedia.research.glam.ac.uk/kos/star/
      http://hypermedia.research.glam.ac.uk/kos/stellar
  12. Grey Literature Information Extraction (Andreas Vlachidis)
      • Looking to extract CRM-EH period, context, find, and sample entities.
      • Aim is to cross-search with archaeology datasets.
  13. CRM-EH Entities and Events (Example)
  14. Contact
      EASTER project website: http://www.ukoln.ac.uk/projects/easter/
      Project publications: http://www.ukoln.ac.uk/projects/easter/dissemination/
      dstudhope@glam.ac.uk
