Introducing an automated
subject classifier
Pru Mitchell, Tine Grimston
Robert Parkes
With thanks to: Phil Anderson, Leidos
#vala16 #s27
Cunningham
Library
• Services
• ACER staff
• ACER students
• Education
community
• Indexing services
Australian
Education
Index
• First print edition 1957
• Available on Informit as A+
Education, ProQuest,
Taiwan
• Indexed by ACER staff and
external contract indexers
Indexing varies with staffing levels and budget
“an increasingly onerous
task”
[Chart: yearly values for 2006 to 2015, vertical axis from 0 to 10,000]
Production
steps
1. Identification of potential
sources
2. Acquisition of identified
sources
3. Selection of relevant material
from these sources
4. Cataloguing or indexing of
selected material
5. Quality assurance of indexed
records
6. Dissemination of records to
users
The product
Indexing
database
Cunningham
catalogue
One vocabulary to bind them
• AEI
• EdResearch
Online
• Australian
Education
Research
Theses
• IDP Database
• Learning
Ground
Australian
Thesaurus
of
Education
Descriptors
Web docs
Books
Journal articles
Conference papers
Machine
learning
Automated
classification
Why
• More to index
• Less staff time available
• Increasing metadata feeds
instead of print journals
• Increase efficiency
Our story
2009 First journal metadata
2011 Information online
presentation
2012 Increased metadata
replacing print journals
2013 Feasibility study
2014 Initial installation in June –
followed by continuous
refinement of system
What is the
classifier?
Two Processes
1. Training:
Uses past data to create
models of how each subject
term should be used
2. Classifier:
Uses the models to assign
subjects to new records based
on article title, abstract and
journal title
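
The two processes can be sketched in Python. The slides do not say which algorithm the classifier uses, so the word-count scoring below is only an illustrative stand-in. The names `train` and `classify`, the sample records and the descriptor strings are all hypothetical; the default cap of 13 terms simply mirrors the maximum reported later in the deck.

```python
import re
from collections import Counter, defaultdict

def tokens(record):
    """Lower-cased words from the three fields the classifier reads."""
    text = " ".join([record["title"], record["abstract"], record["journal"]])
    return re.findall(r"[a-z]+", text.lower())

def train(past_records):
    """Training: build one word-count model per subject term from past data."""
    models = defaultdict(Counter)
    for record in past_records:
        words = tokens(record)
        for term in record["subjects"]:
            models[term].update(words)
    return models

def classify(models, record, max_terms=13):
    """Classifier: score each subject term against a new record, best first."""
    words = set(tokens(record))
    scores = {}
    for term, counts in models.items():
        total = sum(counts.values())
        # Share of the term's training words that also occur in this record.
        score = sum(counts[w] for w in words) / total if total else 0.0
        if score > 0:
            scores[term] = score
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:max_terms]

# Hypothetical training data, for illustration only.
past = [
    {"title": "Reading instruction in primary schools",
     "abstract": "Teachers used phonics to improve reading.",
     "journal": "Literacy Journal",
     "subjects": ["Reading instruction", "Primary education"]},
    {"title": "Algebra in secondary classrooms",
     "abstract": "Students solved algebra problems in groups.",
     "journal": "Mathematics Education Review",
     "subjects": ["Mathematics education", "Secondary education"]},
]
models = train(past)
new_record = {"title": "Phonics and early reading",
              "abstract": "A study of phonics teaching.",
              "journal": "Literacy Journal"}
suggested = classify(models, new_record)
```

In this toy setup the suggestions come only from the reading-related training record, because the new record shares no words with the algebra one. A production system would presumably use a stronger model and the full thesaurus.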
Training the classifier
• Selection of past records - not all are suitable
Running the classifier
What the human indexer sees
How the
classifier has
performed
• Provides a useful set of
descriptors on the majority
of records
• Average of 11.7 major
descriptors assigned per
record (Max=13)
• Average of 6.5 “correct”
major descriptors per
record
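
As a rough reading of these two averages, the share of suggested major descriptors that were judged "correct" can be worked out directly. This is a sketch under an assumption: it divides one average by the other, which is not necessarily the same as averaging a per-record rate, and the function name `precision` is illustrative.

```python
def precision(correct, assigned):
    """Fraction of assigned descriptors that were judged correct."""
    if assigned <= 0:
        raise ValueError("assigned must be positive")
    return correct / assigned

avg_assigned = 11.7  # average major descriptors assigned per record (from the slide)
avg_correct = 6.5    # average "correct" major descriptors per record (from the slide)

ratio = precision(avg_correct, avg_assigned)
print(f"{ratio:.1%} of suggested major descriptors judged correct on average")
# prints "55.6% of suggested major descriptors judged correct on average"
```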
Findings
A particular challenge:
Horse-Girl Assemblages:
Towards a Post-Human
Cartography of Girls' Desire in
an Ex-Mining Valleys
Community
[Discourse, 35(3)]
• Classifier performance greatly
dependent on abstract length,
style and level of detail
• ACER indexes a wide variety of
material, some of which is not
necessarily easy to index
using ATED
• The specific topic of an article
might only have a more
general term in ATED
• Quality vs efficiency
Workflow improvements
Classifier use increasing due to workflow improvements
Publisher
feeds
• Taylor & Francis 2009--
• SAGE 2013--
• Wiley 2013--
• Springer 2013--
• Inderscience 2013--
• Emerald (in negotiation)
• Many publishers can provide
a metadata feed of education
journals
• All in XML, but all different
from each other
• 24,138 articles received in
feeds in 2015, up from 5,006
in 2010
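
Because every feed is XML but each has its own layout, one possible approach is a per-publisher mapping onto the three fields the classifier reads. The sketch below is an assumption, not the ACER workflow: the element names, the `FIELD_PATHS` table and the publisher keys are invented for illustration and do not describe any real publisher's schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical mappings from common field names to each feed's element paths.
FIELD_PATHS = {
    "publisher_a": {"title": "title", "abstract": "abstract", "journal": "journal"},
    "publisher_b": {"title": "meta/name", "abstract": "meta/summary", "journal": "source"},
}

def normalise(xml_text, publisher):
    """Convert one article's XML into a common title/abstract/journal record."""
    root = ET.fromstring(xml_text)
    record = {}
    for field, path in FIELD_PATHS[publisher].items():
        node = root.find(path)
        # A missing or empty element becomes an empty string.
        record[field] = (node.text or "").strip() if node is not None else ""
    return record

# Two invented feeds describing the same article in different layouts.
feed_a = """<article>
  <title>Phonics and early reading</title>
  <abstract>A study of phonics teaching.</abstract>
  <journal>Literacy Journal</journal>
</article>"""

feed_b = """<record>
  <meta><name>Phonics and early reading</name>
        <summary>A study of phonics teaching.</summary></meta>
  <source>Literacy Journal</source>
</record>"""

record_a = normalise(feed_a, "publisher_a")
record_b = normalise(feed_b, "publisher_b")
```

Keeping the mapping in a table means a new publisher feed needs a new entry rather than new parsing code, which may help as more feeds are added.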
Lessons
• Indexing from the abstract
• Thesaurus structure
• Metadata
• Process simplification
• Prioritisation
• Indexer experience
• Curation
• Skill set required in team
What next?
• Ongoing development of
workflows
• Possible changes to our
database structure
• More publisher feeds
• Other ways to get bibliographic
metadata into the workflow, e.g.
RSS feeds, search alerts from
databases
• Develop selection processes
further
• Documentation and
Questions
library@acer.edu.au

Introducing an automated subject classifier

  • 1. Introducing an automated subject classifier Pru Mitchell, Tine Grimston Robert Parkes With thanks to: Phil Anderson, Leidos #vala16 #s27
  • 2. Cunningham Library • Services • ACER staff • ACER students • Education community • Indexing services
  • 3. Australian Education Index • First print edition 1957 • Available on Informit as A+ Education, ProQuest, Taiwan • Indexed by ACER staff and external contract indexers Indexing varies with staffing levels and budget “an increasingly onerous task” [Chart: vertical axis 0 to 10,000; horizontal axis 2006 to 2015]
  • 4. Production steps 1. Identification of potential sources 2. Acquisition of identified sources 3. Selection of relevant material from these sources 4. Cataloguing or indexing of selected material 5. Quality assurance of indexed records 6. Dissemination of records to users
  • 6. Indexing database Cunningham catalogue One vocabulary to bind them • AEI • EdResearch Online • Australian Education Research Theses • IDP Database • Learning Ground Australian Thesaurus of Education Descriptors • Web docs • books • Journal articles • conf papers
  • 8. Automated classification Why • More to index • Less staff time available • Increasing metadata feeds instead of print journals • Increase efficiency Our story 2009 First journal metadata 2011 Information online presentation 2012 Increased metadata replacing print journals 2013 Feasibility study 2014 Initial installation in June – followed by continuous refinement of system
  • 9. What is the classifier? Two Processes 1. Training: Uses past data to create models of how each subject term should be used 2. Classifier: Uses the models to assign subjects to new records based on article title, abstract and journal title
  • 10. Training the classifier • Selection of past records - not all are suitable
  • 12. What the human indexer sees
  • 13. How the classifier has performed • Provides a useful set of descriptors on the majority of records • Average of 11.7 major descriptors assigned per record (Max=13) • Average of 6.5 “correct” major descriptors per record
  • 14. Findings A particular challenge: Horse-Girl Assemblages: Towards a Post-Human Cartography of Girls' Desire in an Ex-Mining Valleys Community [Discourse, 35(3)] • Classifier performance greatly dependent on abstract length, style and level of detail • ACER index a wide variety of material, some is not necessarily easy to index using ATED • The specific topic of an article might only have a more general term in ATED • Quality vs efficiency
  • 15. Workflow improvements Classifier use increasing due to workflow improvements
  • 16. Publisher feeds • Taylor & Francis 2009-- • SAGE 2013-- • Wiley 2013-- • Springer 2013-- • Inderscience 2013-- • Emerald (in negotiation) • Many publishers can provide a metadata feed of education journals • All in XML, but all different from each other • 24,138 articles received in feeds in 2015, up from 5,006 in 2010
  • 17. Lessons • Indexing from the abstract • Thesaurus structure • Metadata • Processes simplification • Prioritisation • Indexer experience • Curation • Skill set required in team
  • 18. What next? • Ongoing development of workflows • Possible changes to our database structure • More publisher feeds • Other ways to get bibliographic metadata into the workflow – eg RSS feeds, search alerts from databases • Develop selection processes further • Documentation and
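
The two processes on slide 9 (training on past records, then classifying new ones from article title, abstract and journal title) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the record keys, the `MIN_TERM_USES` cutoff value and the simple word co-occurrence scoring are hypothetical, and this is not the algorithm developed by Leidos or used by TeraText.

```python
import re
from collections import Counter, defaultdict

# Hypothetical cutoff: terms used fewer times than this in past records get no model.
MIN_TERM_USES = 2


def tokens(record):
    """Lower-cased words from article title, abstract and journal title."""
    text = " ".join([record["title"], record["abstract"], record["journal"]])
    return re.findall(r"[a-z]+", text.lower())


def train(past_records):
    """Training: build one word-weight model per subject term from past indexing."""
    term_uses = Counter()
    word_counts = defaultdict(Counter)
    for record in past_records:
        words = set(tokens(record))
        for term in record["descriptors"]:
            term_uses[term] += 1
            word_counts[term].update(words)
    # Weight = share of the term's past records that contain the word.
    return {
        term: {word: n / term_uses[term] for word, n in counts.items()}
        for term, counts in word_counts.items()
        if term_uses[term] >= MIN_TERM_USES
    }


def classify(models, record):
    """Classifier: score each modelled term against a new record, best first."""
    words = set(tokens(record))
    scores = {
        term: sum(weight for word, weight in model.items() if word in words)
        for term, model in models.items()
    }
    ranked = [(term, score) for term, score in scores.items() if score > 0]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))
```

A sketch like this makes one property of the approach visible: a term with too few past uses never gets a model, so it can never be suggested, however relevant it is to a new record.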

Editor's Notes

  1. This paper presents the outcomes of a 2014-15 trial of automated subject indexing at the Australian Council for Educational Research. I will give some background to the project, which involved the integration of a machine learning subject classification tool into our indexing process. Tine will give you more detail on the trial and implementation stages and discuss the findings and insights we gained. Rob, who has adopted the classifier, trained it, fed it, kept it ticking and monitored its every motion, has provided technical detail and analysis of findings in the paper itself. Thanks are also owed to Phil Anderson of Leidos for developing the machine learning algorithms.
  2. Established in 1930, the Australian Council for Educational Research (ACER) is a not-for-profit organisation providing educational research services and products. The Cunningham Library at ACER (the Library) has a research level collection in Australian education. We have a mission to support the work of ACER staff working in educational research and assessment services, as well as the education community at large. Recently ACER has become a private higher education provider and we are now offering academic library services as well. The library is still very much a physical presence, situated off the Atrium in the Melbourne office. We are proud of our physical collection dating back to the 1930s. But luckily space-wise (and for the sake of our 8 interstate and overseas offices, and our predominantly remote graduate students) the new collection is predominantly digital. As well as all this, we maintain an expert indexing team who provide a range of products and services.
  3. Our flagship product is the Australian Education Index (AEI), a bibliographic database containing over 200,000 entries and abstracts. Increasingly it has full-text material, with 58% of records since 2000 having a DOI, persistent URL or scanned PDF. Selecting and indexing Australia’s education literature is labour intensive and thus an increasingly expensive activity. Curating the ever-growing range of documents and assigning thesaurus terms to metadata records are intellectually demanding processes, as well as being time consuming. The goal of capturing and indexing comprehensively all research in Australian education is indeed an ‘onerous task’: “The sources are numerous, and the task is growing as years pass.” This quote is from the preface to the first edition of the Australian Education Index (Radford, 1958), so not much has changed. This reality of ever-increasing content to index and decreasing budget and staffing levels has been a long-term mantra for the indexing team, and really has to be addressed. For the sake of indexer sanity, subscriber satisfaction and budgets, something in the equation has to change.
  4. In any process change it is helpful to break an issue into its component parts, and the process of producing the Australian Education Index involves the following six information tasks – familiar to all involved in collection building. A rigorous selection process ensures comprehensive coverage of significant Australian education research. The challenge of curating and indexing the literature on Australian education required by administrators, teachers and students is one of even greater complexity and cost, as types and sources of literature increase and the topics related to education expand. While these six steps of production for the AEI have been constant since 1958, there have been changes in the way they are performed over the intervening years. This has been in response to both the changing formats of the resources being indexed, and the format of the Index itself.
  5. This is a screenshot of part of an AEI record, just so I can highlight the 4 subject descriptor fields that are of interest to the subject classifier project. Major descriptors are the concepts discussed in the work, its ‘aboutness’; all major descriptors come from our thesaurus. Minor descriptors are also from the thesaurus, but they describe aspects such as the educational level of those involved in the research, and the methodology of the research. Geographic descriptors refer to the location the content is ‘about’, rather than the place of publication, although they may be the same in some content. Identifiers are terms NOT in the thesaurus but which the indexer would like to have used, so they are candidate terms.
  6. At the heart of all our indexing services is a vocabulary. The Australian Thesaurus of Education Descriptors (ATED) is the source of the major and minor subject descriptors, and is basically the glue that holds ACER’s indexing services together. A hierarchically-structured thesaurus of concepts across all levels of education from preschool to higher education, ATED is used to index and search the subject matter of the AEI and its subsidiary databases, as well as the Library’s catalogue. It is searchable free of charge online, and can be purchased in hard copy or as an electronic dataset to be embedded into an organisation’s own information services. ATED is updated on a six-monthly basis. As at February 2016, it contains over 10,000 terms, around half of which are preferred terms, and half are references. [Library Catalogue: 60,000 records. Indexing MasterDatabase: 214,000 records; contains all indexing records (books, reports, journal articles, theses, conference papers, book chapters etc.). Relevant records for each separate database or index product are tagged and filtered from the MasterDatabase into subject or audience specialist databases: IDP is the Database of Research in International Education, Learning Ground is our indigenous education database and BOLDE covers Blended, Online Learning and Distance Education.]
  7. While the value of providing the Index is not disputed, as I said, increasing costs together with a decrease in indexing output mean that support for the professional indexers was required. A typical strategy in cases like this is to investigate ways of automating the process. In our case, subject classification was considered the most complex aspect of indexing, so it was the obvious place to start. The quest for automated indexing is not new. A 1965 monograph by Stevens, entitled Automatic indexing: a state-of-the-art report, contains almost 200 pages of experiments [in ‘automatic assignment indexing, automatic classification and categorisation, computer use of thesauri, statistical association techniques, and linguistic data processing’ (p.1).] ‘Automatic indexing’ as a concept was actually added to ATED in 1984, with a related term ‘computational linguistics’. Sad it has taken 30 years to actually get the concept. [described as: A branch of linguistics concerned with the use of computers for the analysis and synthesis of language data - for example, in machine translation, word frequency counts, and speech recognition and synthesis (ATED, 2013, p. 25).] In a nutshell, machine learning involves using an existing set of documents (corpus) to train a computer about what constitutes an appropriate response (in this case, which thesaurus term) so that when it encounters a new document it can suggest an appropriate response. In 2010 I was involved in a proof-of-concept project trialling automated metadata for web-based education services using the Schools Online Thesaurus (ScOT) with Flinders University. At that time we concluded that “automated classification based on artificial intelligence is useful as a means of supplementing and assisting human classification, but is not at this stage a replacement for human classification of educational resources” (Leibbrandt et al, 2010).
  8. Our story is about two interweaving threads: publisher journal feeds and automated classification. In 2009 Taylor & Francis changed their policy of providing us with “free for indexing” print journals. We began to receive XML files of journal article metadata instead of print journals. In February 2011 Lance Deveson (our previous library manager) and I attended the Information Online conference in Sydney and listened to a presentation about the Parliamentary Library’s automated indexing project. This sparked our interest, because we were looking for ways to increase our indexing efficiency to help us cope with the increasing amounts of relevant material being published and reduced funding for indexing staff. In order for ACER to even consider automated classification we would need a pool of metadata to be classified. The journal metadata from publishers seemed to be a promising source. We started to actively negotiate with other publishers for more journal feeds. We also spoke to the Parliamentary Library about their experience. We hoped we could use automated classification on our journal feed material to produce a set of terms that could be used by a human indexer as the basis of the final assigned terms. In November 2012 we got a quotation from SAIC (now Leidos) to investigate the application of automated classification to journal data at ACER. Funding for this was approved in August 2013 and the feasibility study was undertaken from September to December 2013. After significant dialogue back and forth, we received a feasibility report. We were satisfied that an “automated classifier” could work with our journal feed data to produce useful terms for our purposes. In January 2014 we requested a quotation for implementing “automated classification” into our system. This expenditure was approved in March 2014, and in June 2014 the components were installed on Robert’s and my computers. Since then we have been continuously refining our workflows.
Reference: Revolutionising digital content ingest: building a newspaper clippings collection using practical automation to assist with selection and classification. Judy Hutchinson, Roxanne Missingham, Information Access Branch, Parliamentary Library; Philip Anderson, SAIC Pty. Ltd
  9. So what exactly is the Classifier? It is not hardware, and not a software “product.” It does require the installation of TeraText software, but the Classifier itself is a set of custom programs built to suit our specific data and to work with our existing DBTextworks system. The system works with XML. The feasibility stage of the project was about the training process: creating and testing models by learning from past indexing records. It was found that the best fields to learn from were article title, abstract and journal title. There was some testing using the full text of articles, but it was found that better results were obtained when just using the abstract. The implementation stage of the project was about the workflow of using the training models to actually assign terms to new records, and get those terms into DBTextworks so indexers can see and use them.
  10. The training program uses information from past records to create 4 models, one for each of the descriptor fields. We need to keep retraining to include newly indexed material and improve results over time. We also need to ensure the latest update of our thesaurus is being used, as ATED is updated periodically with new and/or modified terms. Because the Classifier “learns” from what subjects have been assigned to records in the past, we did a significant amount of work on our subject data to improve what the Classifier is learning from. Firstly, we cleaned up existing data to make sure only currently valid terms were used in each field: Major Descriptors, Minor Descriptors, Identifiers and Geographical. Secondly, we created a set of the “best” records to use for training. For example, we excluded old records with no abstract, or with one-sentence abstracts.
  11. Our indexers do their indexing in DBTextworks. Records to be classified need to have their article title, journal title and abstract exported in XML format. That XML is then run through the classifier, which uses the models to assign and add suggested terms. A new XML file which includes the suggested terms is then imported back into DBTextworks. Initially we had to manually export a batch of records, run the classifier and then re-import the updated records. Rob, with his IT background, was able to come up with a much more streamlined way to do this. We can now process single records or batches of records with the click of a button. We search for record(s) in DBTextworks, then click the “run classifier” button on the resulting form. Rob has programmed this “run classifier” button to export XML data from the selected record(s), place it in a certain folder, run the classifier program, put the results in another folder, then import the new XML back into the record(s). A human indexer then checks the suggested subjects and completes the record.
  12. This is a screenshot of PART of the data input screen used by indexers. The classifier populates the right hand fields with suggested terms arranged by their weight: the bigger the number, the more confident the classifier is of that term. Terms above a certain threshold weight are also shown in the left hand fields. In the Major Descriptors field we can see that only the terms with a weight of more than the threshold of 10,000 were assigned to the left hand fields; the others were not. The human indexer keeps “correct” terms, deletes “incorrect” terms and adds “missed” terms to the left hand fields. They then mark the record complete and save it.
  13. Performance detail is very complex; you can see the full paper for some detailed performance information. Performance is different in each of the 4 subject fields. Major descriptors are currently performing best. We have some limited ways to continue improving the number of correct terms in all fields over time. Overall we believe the classifier provides a useful set of descriptors for most records, and it does make indexing quicker and increase our efficiency. “Missed” terms: because the training models only consider terms that have been used at least a certain number of times, the classifier never assigns “new terms” itself. New terms need to be added by human indexers a certain number of times before the classifier can learn and assign the term. This is the reason for some “missed” terms, and one of the reasons for poorer performance in the identifiers field, which is where new terms usually first appear. “Incorrect” terms: occasionally some terms assigned by the classifier are completely wrong, such as it once strangely assigning the term 'fairy tales' to an article from a psychometrics journal. More common, though, are terms that, while not totally wrong, are deemed by the human indexer to be too general and therefore not needed. If the classifier assigns both 'Leadership' and 'Educational leadership', the indexer might choose only the more specific term. Also, there are some terms in ATED that are very similar but have different meanings. For example, the classifier struggles to decide between the separate ATED terms 'Scholarship' and 'Scholarships'. We are finding the classifier a useful tool, but it can still be improved.
  14. Classifier performance is greatly dependent on abstract length, style and level of detail. The classifier performs best when a quality abstract is available. ACER indexes a wide variety of material, some of which is not necessarily easy to index using ATED terms. If an article is difficult for a human indexer to index, then the Classifier will usually also perform poorly. For example, this article from Discourse has a title which doesn’t really give good clues to what the article is about. The specific topic of an article might only have a more general term in ATED. The classifier is unlikely to perform well on an article if its main topic is not even in ATED, such as a specific concept from physics or mathematics. Quality vs efficiency: if we do lots of “classifier” indexing, which has the most efficient workflow, we can certainly increase the amount of indexing throughput. However, there will always be lots of indexing, such as conference proceedings, book chapters and reports, that doesn’t have metadata readily available in a useable form. These items are quite possibly more important to index as they aren’t available elsewhere. Creating an indexing record for these items is more labour intensive and time consuming. We can run the classifier after the indexer populates all the bibliographic fields in the indexing record, but by the time they have cut and pasted or manually typed in all that data, experienced indexers report that running the classifier may not be significantly quicker than manually assigning terms. What balance should we strike between the efficient “classifier” indexing and the less efficient, but possibly more important, other material?
  15. We are using the classifier more and more. 41% of our latest batch of indexing used the classifier. The biggest increase in usage comes as a result of the “run classifier” button developed by Robert and mentioned earlier. There has been a string of tweaks and enhancements to processes, dating right back to the time of the feasibility study. Most of these changes have been a result of collaborative discussions, which are time consuming but get everyone’s agreement to changes. We have had some vigorous discussion about our indexing standards and methods and even the scope of our index. Some of the many tweaks not already mentioned include: the Classifier has now been implemented in the catalogue as well as our indexing database; improved data entry screens; changes to data entry guidelines for some fields; many changes to the structure of indexing databases and usage of particular fields; and troubleshooting the classifier, the publisher feeds and DBTextworks, because if one thing changes in one place it can cause unexpected problems somewhere else. All in-house indexers now have the programs available on their computers.
  16. The workflows for dealing with publisher feeds and implementing the classifier are dependent on each other, so I just wanted to show a slide with some information about the publisher feeds. Image acknowledgement: https://plus.google.com/+SteveThomas/posts/SL77ii3qWa6
  17. You may be wondering what there is for you to learn from this project, which is so uniquely tailored to an obscure Australian education index. Well, let me suggest some lessons. Indexing from the abstract: there is an obvious challenge for scholars, editors and publishers to think carefully about how dependent you are on your abstract in a machine-oriented landscape. Thesaurus structure: there are particular challenges for vocabulary owners who want their thesaurus, taxonomy or subject headings to be read by machines. We have not gone anywhere near the linked data model for ATED as our own systems have no way of accommodating this. We know now that the current classifier algorithms do not take into account ATED’s reference structure, so that is another challenge. Metadata: machines like clean, consistent metadata. They are not accommodating like humans. You will clean and clean and clean, so make sure you can export and import large chunks of your metadata. Thank goodness for DBTextWorks. As with any change management project, the human aspect is all important. Ours was the classic: fund the technology and a bit for the consultant, but no funds for time release/backfill for training, setting up, conducting the research required, or making a user-friendly interface. That was left to the already overloaded team. We thought we were dealing with the subject classification step of the indexing process, but in fact the change that got the highest votes from the indexers was the reduction in manual keying, or cutting and pasting of data. A not insignificant improvement. We also thought classification was the most complex step, but in fact we are beginning to regard curation as the real challenge. Given we can’t index everything, how do we prioritise across so many variables? Do we run the risk of doing the easier articles from feeds at the expense of the more important, but slower to index, material such as book chapters and web-based conference proceedings?
Finally – team members who are prepared to stick at long, tedious data cleansing projects, to motivate those reluctant to change, and librarians with advanced technical skills on your team make an amazing difference.
  18. Watch the statistics: Can we arrest the sliding quantity without reducing the quality of our index? When will we recoup the time invested in setting this up?
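
The weight threshold and indexer review described in note 12 can be sketched as follows. This is an illustration only: the function names and the example weights are hypothetical, the threshold value of 10,000 is the one mentioned in note 12 for major descriptors, and the strict "greater than" comparison is an assumption.

```python
# Threshold mentioned in note 12 for major descriptors; the comparison rule is assumed.
THRESHOLD = 10_000


def split_suggestions(weighted_terms, threshold=THRESHOLD):
    """Rank all suggested terms by weight and pick those pre-assigned for the indexer.

    Returns (ranked, assigned): 'ranked' mirrors the right hand fields,
    'assigned' mirrors the left hand fields.
    """
    ranked = sorted(weighted_terms.items(), key=lambda pair: pair[1], reverse=True)
    assigned = [term for term, weight in ranked if weight > threshold]
    return ranked, assigned


def review(assigned, final_terms):
    """Compare pre-assigned terms with the terms the human indexer finally kept."""
    return {
        "correct": [t for t in assigned if t in final_terms],
        "incorrect": [t for t in assigned if t not in final_terms],
        "missed": [t for t in final_terms if t not in assigned],
    }


# Hypothetical weights for one record.
suggested = {"Leadership": 8_200, "Educational leadership": 15_400, "Fairy tales": 10_300}
ranked, assigned = split_suggestions(suggested)
outcome = review(assigned, ["Educational leadership", "Principals"])
```

Counting the "correct", "incorrect" and "missed" lists across many records is one plausible way to produce per-field figures like those on slide 13, although the paper itself should be consulted for how performance was actually measured.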