This VALA 2016 conference paper presents the outcomes of a 2014-15 trial of automated subject indexing at the Australian Council for Educational Research. The integration of a machine learning classification tool has resulted in streamlined workflows and increased use of machine-readable data. Insights were gained into the decisions human indexers make in using a controlled vocabulary, and into the importance of quality abstracts and metadata.
3. Australian Education Index
• First print edition 1957
• Available on Informit as A+ Education, ProQuest, Taiwan
• Indexed by ACER staff and external contract indexers
• Indexing varies with staffing levels and budget: "an increasingly onerous task"
[Chart: records indexed per year, 2006 to 2015, scale 0 to 10,000]
4. Production steps
1. Identification of potential sources
2. Acquisition of identified sources
3. Selection of relevant material from these sources
4. Cataloguing or indexing of selected material
5. Quality assurance of indexed records
6. Dissemination of records to users
6. Indexing database
One vocabulary to bind them: the Australian Thesaurus of Education Descriptors is used across
• Cunningham catalogue
• AEI
• EdResearch Online
• Australian Education Research Theses
• IDP Database
• Learning Ground
[Diagram: material types feeding in - web docs, books, journal articles, conference papers]
8. Automated classification
Why
• More to index
• Less staff time available
• Increasing metadata feeds instead of print journals
• Increase efficiency
Our story
• 2009: First journal metadata
• 2011: Information Online presentation
• 2012: Increased metadata replacing print journals
• 2013: Feasibility study
• 2014: Initial installation in June, followed by continuous refinement of the system
9. What is the classifier?
Two processes
1. Training: uses past data to create models of how each subject term should be used
2. Classification: uses the models to assign subjects to new records based on article title, abstract and journal title
13. How the classifier has performed
• Provides a useful set of descriptors on the majority of records
• Average of 11.7 major descriptors assigned per record (max = 13)
• Average of 6.5 "correct" major descriptors per record
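Those two averages imply a rough per-record precision for the classifier's suggestions. This is a back-of-envelope calculation from the figures on the slide, not a metric reported in the paper itself:

```python
def classifier_precision(assigned: float, correct: float) -> float:
    """Per-record precision: fraction of assigned descriptors judged correct."""
    return correct / assigned

# Trial averages from the slide: 11.7 major descriptors assigned per record,
# of which 6.5 on average were judged "correct" by human indexers.
avg_precision = classifier_precision(11.7, 6.5)
print(f"{avg_precision:.0%}")  # roughly 56%
```

In other words, just over half of the suggested major descriptors survived human review, which fits the framing of the classifier as an assistant to the indexer rather than a replacement.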
14. Findings
A particular challenge: "Horse-Girl Assemblages: Towards a Post-Human Cartography of Girls' Desire in an Ex-Mining Valleys Community" [Discourse, 35(3)]
• Classifier performance greatly dependent on abstract length, style and level of detail
• ACER indexes a wide variety of material, some of which is not necessarily easy to index using ATED
• The specific topic of an article might only have a more general term in ATED
• Quality vs efficiency
16. Publisher feeds
• Taylor & Francis, 2009 onwards
• SAGE, 2013 onwards
• Wiley, 2013 onwards
• Springer, 2013 onwards
• Inderscience, 2013 onwards
• Emerald (in negotiation)
• Many publishers can provide a metadata feed of education journals
• All in XML, but all different from each other
• 24,138 articles received in feeds in 2015, up from 5,006 in 2010
17. Lessons
• Indexing from the abstract
• Thesaurus structure
• Metadata
• Process simplification
• Prioritisation
• Indexer experience
• Curation
• Skill set required in team
18. What next?
• Ongoing development of workflows
• Possible changes to our database structure
• More publisher feeds
• Other ways to get bibliographic metadata into the workflow, e.g. RSS feeds, search alerts from databases
• Develop selection processes further
• Documentation and
This paper presents the outcomes of a 2014-15 trial of automated subject indexing at the Australian Council for Educational Research. I will give some background to the project which involved the integration of a machine learning subject classification tool into our indexing process.
Tine will give you more detail on the trial and implementation stages and discuss the findings and insights we gained.
Rob, who adopted the classifier, trained it, fed it, kept it ticking and monitored its every motion, has provided the technical detail and analysis of findings in the paper itself. Thanks are also owed to Phil Anderson of Leidos for developing the machine learning algorithms.
Established in 1930, the Australian Council for Educational Research (ACER) is a not-for-profit organisation providing educational research services and products. The Cunningham Library at ACER (the Library) has a research level collection in Australian education. We have a mission to support the work of ACER staff working in educational research and assessment services, as well as the education community at large. Recently ACER has become a private higher education provider and we are now offering academic library services as well.
The library is still very much a physical presence, situated off the Atrium in the Melbourne office, and we are proud of our physical collection dating back to the 1930s. But luckily for our shelf space (and for the sake of our 8 interstate and overseas offices, and our predominantly remote graduate students) the new collection is predominantly digital.
As well as all this, we maintain an expert indexing team who provide a range of products and services.
Our flagship product is the Australian Education Index (AEI), a bibliographic database containing over 200,000 entries and abstracts. Increasingly it includes full-text material, with 58% of records since 2000 having a DOI, persistent URL or scanned PDF.
Selecting and indexing Australia’s education literature is labour intensive and thus an increasingly expensive activity. Curating the ever-growing range of documents and assigning thesaurus terms to metadata records are intellectually demanding processes, as well as being time consuming.
The goal of capturing and indexing comprehensively all research in Australian education is indeed an onerous task. "The sources are numerous, and the task is growing as years pass." That quote comes from the preface to the first edition of the Australian Education Index (Radford, 1958), so not much has changed. This reality of ever-increasing content to index and decreasing budget and staffing levels has been a long-term mantra for the indexing team, and really has to be addressed. For the sake of indexer sanity, subscriber satisfaction and budgets, something in the equation has to change.
In any process change it is helpful to break an issue into its component parts, and the process of producing the Australian Education Index involves the following six information tasks – familiar to all involved in collection building.
A rigorous selection process ensures comprehensive coverage of significant Australian education research. The challenge of curating and indexing the literature on Australian education required by administrators, teachers and students is one of even greater complexity and cost, as types and sources of literature increase and the topics related to education expand.
While these six steps of production for the AEI have been constant since 1958, there have been changes in the way they are performed over the intervening years. This has been in response to both the changing formats of the resources being indexed, and the format of the Index itself.
This is a screenshot of part of an AEI record - just so I can highlight the 4 subject descriptor fields that are of interest to the subject classifier project.
Major descriptors are the concepts discussed in the work, and all of them come from our thesaurus. They capture the 'aboutness' of the record.
Minor descriptors are also from the thesaurus, but they describe aspects such as the educational level of those involved in the research, and the research methodology.
Geographic descriptors refer to the location the content is 'about', rather than the place of publication, although the two may coincide.
Identifiers are terms NOT in the thesaurus but which the indexer would like to have used - that is, candidate terms.
At the heart of all our indexing services is a vocabulary. The Australian Thesaurus of Education Descriptors (ATED) is the source of the major and minor subject descriptors, and is basically the glue that holds ACER’s indexing services together.
A hierarchically-structured thesaurus of concepts across all levels of education from preschool to higher education, ATED is used to index and search the subject matter of the AEI and its subsidiary databases - as well as the Library’s catalogue. It is searchable free of charge online, and can be purchased in hard copy or as an electronic dataset to be embedded into an organisation’s own information services. ATED is updated on a six-monthly basis. As at February 2016, it contains over 10,000 terms, around half of which are preferred terms, and half are references.
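The structure described above (preferred terms with hierarchical relationships, plus references pointing to preferred terms) can be sketched in code. This is a minimal illustration of how a thesaurus lookup of that shape works; the terms and relationships below are invented for the example, not taken from ATED:

```python
# Toy thesaurus: preferred terms carry broader/narrower/related
# relationships; a reference carries only a USE pointer to the
# preferred term a searcher or indexer should switch to.
thesaurus = {
    "Automatic indexing": {"type": "preferred",
                           "broader": ["Indexing"],
                           "related": ["Computational linguistics"]},
    "Indexing": {"type": "preferred",
                 "narrower": ["Automatic indexing"]},
    "Machine indexing": {"type": "reference",
                         "use": "Automatic indexing"},
}

def resolve(term: str) -> str:
    """Follow a USE reference to its preferred term; preferred terms resolve to themselves."""
    entry = thesaurus.get(term)
    if entry is None:
        raise KeyError(f"{term!r} is not in the thesaurus")
    return entry["use"] if entry["type"] == "reference" else term

print(resolve("Machine indexing"))  # Automatic indexing
```

Roughly half of ATED's 10,000 terms are references of this kind, which is what makes a controlled vocabulary usable by searchers who do not know the preferred form.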
[Library catalogue: 60,000 records. Indexing Master Database: 214,000 records, containing all indexing records - books, reports, journal articles, theses, conference papers, book chapters etc. Relevant records for each separate database or index product are tagged and filtered from the Master Database into subject- or audience-specialist databases: IDP (Database of Research in International Education), Learning Ground (our Indigenous education database) and BOLDE, covering Blended, Online Learning and Distance Education.]
While the value of providing the Index is not disputed, as I said, increasing costs as well as a decrease in indexing output meant that support for the professional indexers was required. A typical strategy in cases like this is to investigate ways of automating the process. In our case, subject classification was considered the most complex aspect of indexing, so it was the obvious place to start. The quest for automated indexing is not new. A 1965 monograph by Stevens, entitled Automatic indexing: a state-of-the-art report, contains almost 200 pages of experiments [in 'automatic assignment indexing, automatic classification and categorisation, computer use of thesauri, statistical association techniques, and linguistic data processing' (p.1).]
'Automatic indexing' as a concept was actually added to ATED in 1984, with a related term 'computational linguistics'. Sadly, it has taken 30 years for us to actually put the concept into practice.
[described as:
A branch of linguistics concerned with the use of computers for the analysis and synthesis of language data - for example, in machine translation, word frequency counts, and speech recognition and synthesis (ATED, 2013, p. 25).]
In a nutshell, machine learning involves using an existing set of documents (a corpus) to train a computer about what constitutes an appropriate response (in this case, which thesaurus term to assign), so that when it encounters a new document it can suggest an appropriate response. In 2010 I was involved in a proof-of-concept project with Flinders University trialling automated metadata for web-based education services using the Schools Online Thesaurus (ScOT). At that time we concluded that
“automated classification based on artificial intelligence is useful as a means of supplementing and assisting human classification, but is not at this stage a replacement for human classification of educational resources” (Leibbrandt et al, 2010).
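The train-then-classify cycle described above can be illustrated with a generic multi-label text classifier. This sketch uses scikit-learn, not the TeraText-based custom programs ACER actually used, and all of the training records and descriptor names are invented for the example:

```python
# Generic illustration of the two processes: (1) train per-descriptor
# models from past human-indexed records, (2) rank descriptors for a
# new record by model probability.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# 1. Training corpus: past records (title + abstract text) with their
# human-assigned descriptors. Invented data for illustration only.
records = [
    "Literacy outcomes in primary school reading programs",
    "Numeracy assessment and mathematics teaching in secondary schools",
    "Reading intervention improves literacy in early childhood",
    "Secondary mathematics curriculum and numeracy standards",
]
labels = [["Literacy"], ["Numeracy"], ["Literacy"], ["Numeracy"]]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(records)
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(labels)
model = OneVsRestClassifier(LogisticRegression()).fit(X, y)

# 2. Classification: rank candidate descriptors for a new, unseen record
new = vectorizer.transform(["Early literacy and reading development"])
probs = model.predict_proba(new)[0]
ranked = sorted(zip(mlb.classes_, probs), key=lambda p: -p[1])
print(ranked[0][0])  # the top-ranked suggested descriptor
```

In a production setting the ranked suggestions would then go to a human indexer for review, which is exactly the supplementing role the 2010 conclusion above envisaged.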
Our story is about two interweaving threads – Publisher journal feeds and automated classification
In 2009 Taylor & Francis changed their policy of providing us with “free for indexing” print journals. We began to receive xml files of journal article metadata instead of print journals.
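Because every publisher's XML schema differs, each feed needs its own mapping into a common record shape before indexing. The element names and sample article below are invented for illustration; the real feeds each used their own schema:

```python
# Sketch of normalising one publisher's article-metadata XML into a
# common record dict. A separate mapping like this would be written
# per publisher feed.
import xml.etree.ElementTree as ET

feed_xml = """<articles>
  <article>
    <title>Guided inquiry in the school library</title>
    <abstract>A study of collaborative inquiry units.</abstract>
    <journal>Access</journal>
    <doi>10.0000/example.1</doi>
  </article>
</articles>"""

def parse_feed(xml_text: str) -> list[dict]:
    """Extract title, abstract, journal and DOI from each article element."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": art.findtext("title", default=""),
            "abstract": art.findtext("abstract", default=""),
            "journal": art.findtext("journal", default=""),
            "doi": art.findtext("doi", default=""),
        }
        for art in root.iter("article")
    ]

print(parse_feed(feed_xml)[0]["title"])
```

The title, abstract and journal fields extracted this way are the same three inputs the classifier later uses to suggest subjects.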
In FEB 2011 Lance Deveson (our previous library manager) and myself attended the Information Online conference in Sydney and listened to a presentation about the Parliamentary Library’s automated indexing project.
This sparked our interest, because we were looking for ways to increase our indexing efficiency to help us cope with the increasing amounts of relevant material being published and reduced funding for indexing staff.
In order for ACER to even consider automated classification we would need a pool of metadata to be classified. The Journal metadata from publishers seemed to be a promising source.
We started to actively negotiate with other publishers for more journal feeds. We also spoke to the Parliamentary library about their experience.
We hoped we could use automated classification on our journal feed material to produce a set of terms that could be used by a human indexer as the basis of the final assigned terms.
In November 2012 we obtained a quotation from SAIC (now Leidos) to investigate the application of automated classification to journal data at ACER. Funding was approved in August 2013, and the feasibility study was undertaken from September to December 2013.
After significant dialogue back and forth, we received a feasibility report. We were satisfied that an “automated classifier” could work with our journal feed data to produce useful terms for our purposes.
In January 2014 we requested a quotation for implementing “automated classification” into our system. The expenditure was approved in March 2014, and in June 2014 the components were installed on Robert’s and my computers.
Since then we have been continuously refining our workflows.
Reference
Hutchinson, J., Missingham, R. (Information Access Branch, Parliamentary Library) & Anderson, P. (SAIC Pty Ltd). Revolutionising digital content ingest: building a newspaper clippings collection using practical automation to assist with selection and classification.
So what exactly is the Classifier? It is not hardware, and it is not an off-the-shelf software “product”. It requires the installation of TeraText software, but the Classifier itself is a set of custom programs built to suit our specific data and to work with our existing DBTextworks system. The system works with XML.
The feasibility stage of the project was about the training process -- creating and testing models by learning from past indexing records. It was found that the best fields to learn from were article title, abstract and journal title.
There was some testing using the fulltext of articles, but it was found that better results were obtained when just using the abstract.
The implementation stage of the project was about the workflow of using the training models to actually assign terms to new records, and get those terms into DBTextworks so indexers can see and use them.
The training program uses information from past records to create four models, one for each of the descriptor fields.
We need to keep retraining to include newly indexed material and improve results over time.
We need to ensure the latest update of our thesaurus is being used. ATED is updated periodically with new and/or modified terms.
Because Classifier “learns” from what subjects have been assigned to records in the past we did a significant amount of work on our subject data to improve what the Classifier is learning from.
Firstly, we cleaned up existing data to make sure only currently valid terms were used in each field:
Major Descriptors
Minor Descriptors
Identifiers
Geographical
Secondly we created a set of the “best” records to use for training. For example we excluded old records with no abstract, or one sentence abstracts.
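The two cleanup steps above can be sketched as follows. This is an invented illustration of the principle (the field names, record structure and abstract-length cut-off are assumptions, not our actual rules): drop descriptors that are no longer valid thesaurus terms, and exclude records without a usable abstract from the training set.

```python
# Hypothetical set of currently valid ATED terms.
valid_terms = {"Literacy", "Primary education", "Educational leadership"}

records = [
    {"abstract": "A long, detailed abstract about literacy programs in primary schools.",
     "major_descriptors": ["Literacy", "Reading (superseded term)"]},
    {"abstract": "", "major_descriptors": ["Educational leadership"]},
]

def clean_record(record, valid_terms):
    """Drop descriptors that are no longer valid thesaurus terms."""
    record = dict(record)
    record["major_descriptors"] = [
        t for t in record["major_descriptors"] if t in valid_terms
    ]
    return record

def is_training_quality(record, min_abstract_words=10):
    """Exclude records with no abstract, or only a very short one."""
    return len(record["abstract"].split()) >= min_abstract_words

training_set = [clean_record(r, valid_terms)
                for r in records if is_training_quality(r)]
print(len(training_set))  # only the record with a usable abstract survives
```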
Our indexers do their indexing in DBTextworks.
Records to be classified need to have their Article title, Journal title and Abstract exported in XML format.
That XML is then run through the classifier which uses the models to assign and add suggested terms.
A new XML file which includes the suggested terms is then imported back into DBTextworks.
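The export-classify-import round trip above can be illustrated with Python's standard XML library. The element names here are invented for illustration; they are not the actual DBTextworks or TeraText schema.

```python
import xml.etree.ElementTree as ET

# Build a hypothetical exported record with the three fields the
# classifier reads: article title, journal title and abstract.
record = ET.Element("Record")
ET.SubElement(record, "ArticleTitle").text = "Reading skills of primary students"
ET.SubElement(record, "JournalTitle").text = "Australian Journal of Education"
ET.SubElement(record, "Abstract").text = "A study of literacy in primary schools."

def add_suggested_terms(record, suggestions):
    """Mimic the classifier step: append weighted term suggestions
    to the record before it is re-imported."""
    for term, weight in suggestions:
        el = ET.SubElement(record, "SuggestedMajorDescriptor")
        el.text = term
        el.set("weight", str(weight))

add_suggested_terms(record, [("Literacy", 14200), ("Primary education", 9800)])
xml_out = ET.tostring(record, encoding="unicode")
print(xml_out)
```

The re-imported XML carries both the original bibliographic fields and the new weighted suggestions, which is what lets the indexer see them side by side in the data entry screen.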
Initially we had to manually export a batch of records, run the classifier and then re-import the updated records.
Rob, with his IT background, was able to come up with a much more streamlined way to do this. We can now process single records or batches of records with the click of a button.
We search for record(s) in DBTextworks, then click the “run classifier” button on the resulting form.
Rob has programmed this “run classifier” button to export XML data from the selected record(s), place it in a certain folder, run the classifier program, put results in another folder then import the new XML back into the record(s).
A human indexer then checks the suggested subjects and completes the record.
This is a screenshot of part of the data input screen used by indexers.
The classifier populates the right-hand fields with suggested terms arranged by weight: the bigger the number, the more confident the classifier is of that term.
Terms above a certain threshold weight are also copied into the left-hand fields.
In the Major Descriptors field, you can see that only terms with a weight above the threshold of 10,000 were assigned to the left-hand field.
The human indexer keeps “correct” terms, deletes “incorrect” terms, and adds “missed” terms to the left-hand fields, then marks the record complete and saves it.
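The threshold behaviour described above amounts to a simple split of the weighted suggestions. A minimal sketch, using the 10,000 threshold from the screenshot and invented term weights:

```python
THRESHOLD = 10_000  # weight above which a term is pre-filled into the left-hand field

# Hypothetical classifier output for one record: term -> confidence weight.
suggested = {"Literacy": 14200, "Primary education": 9800, "Fairy tales": 1200}

def prefill(suggested, threshold=THRESHOLD):
    """Terms above the threshold are copied into the working field;
    everything else stays as a right-hand suggestion for the human indexer."""
    assigned = {t for t, w in suggested.items() if w > threshold}
    review = {t for t in suggested if t not in assigned}
    return assigned, review

assigned, review = prefill(suggested)
print(assigned)  # only the high-confidence term is pre-filled
```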
Performance detail is very complex; see the full paper for detailed performance information.
Performance is different in each of the 4 subject fields. Major descriptors are currently performing best. We have some limited ways to continue improving the number of correct terms in all fields over time.
Overall we believe the classifier provides a useful set of descriptors for most records, and it does make indexing quicker and increase our efficiency.
Because the training models only consider terms that have been used at least a certain number of times, the classifier never assigns “new terms” itself.
New terms need to be added by human indexers a certain number of times before the classifier can learn and assign the term. This is the reason for some “missed” terms, and one of the reasons for poorer performance in the identifiers field, which is where new terms usually first appear.
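The minimum-usage behaviour described above can be sketched as a frequency cut-off over past term usage. The cut-off value and counts here are invented; the real threshold is internal to the classifier's training step.

```python
from collections import Counter

MIN_USES = 5  # hypothetical: times a term must be humanly assigned before it is learnable

# Invented past-usage counts for three terms.
term_usage = Counter({"Literacy": 120, "Educational leadership": 45, "NAPLAN": 3})

def learnable_terms(usage, min_uses=MIN_USES):
    """Only terms assigned at least min_uses times are visible to training,
    so a newly introduced term cannot be suggested until indexers have
    used it enough times."""
    return {term for term, n in usage.items() if n >= min_uses}

print(learnable_terms(term_usage))  # 'NAPLAN' is still too new to be learned
```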
“Incorrect” terms
Occasionally some terms assigned by the classifier are completely wrong, such as it once strangely assigning the term 'fairy tales' to an article from a psychometrics journal. More common though are terms that while not totally wrong, are deemed by the human indexer to be too general and therefore not needed. If the classifier assigns both 'Leadership' and 'Educational leadership' the indexer might choose only the more specific term. Also, there are some terms in ATED that are very similar but have different meanings. For example - the classifier struggles to decide between the separate ATED terms: 'Scholarship' and 'Scholarships'
We are finding the classifier a useful tool, but it can still be improved.
Classifier performance is greatly dependent on abstract length, style and level of detail; the classifier performs best when a quality abstract is available.
ACER indexes a wide variety of material, some of which is not easy to index using ATED terms. If an article is difficult for a human indexer to index, then the Classifier will usually also perform poorly.
For example, this article from Discourse has a title which doesn’t really give good clues to what the article is about.
The specific topic of an article might only have a more general term in ATED. The classifier is unlikely to perform well on an article whose main topic is not in ATED at all, such as a specific concept from physics or mathematics.
Quality vs efficiency - If we do lots of “classifier” indexing which has the most efficient workflow, we can certainly increase the amount of indexing throughput. However there will always be lots of indexing such as conference proceedings, book chapters and reports that don’t have metadata readily available in a useable form. These items are quite possibly more important to index as they aren’t available elsewhere. Creating an indexing record for these items is more labour intensive and time consuming.
We can run the classifier after the indexer populates all the bibliographic fields in the indexing record, but by the time they have cut and pasted or manually typed in all that data, experienced indexers report that running the classifier may not be significantly quicker than manually assigning terms.
What balance should we strike between the efficient “classifier” indexing and the less efficient, but possibly more important, other material?
We are using the classifier more and more. 41% of our latest batch of indexing used the classifier.
The biggest increase in usage comes as a result of the “run classifier” button developed by Robert and mentioned earlier.
There have been a string of tweaks and enhancements to processes, dating right back to the time of the feasibility study.
Most of these changes have been a result of collaborative discussions, which are time-consuming but get everyone’s agreement to changes. We have had some vigorous discussion about our indexing standards and methods, and even the scope of our index.
Some of the many tweaks not already mentioned include:
The Classifier has now been implemented in the catalogue as well as our indexing database
Improved data entry screens
Changes to data entry guidelines for some fields
Many changes to structure of indexing databases and usage of particular fields
Troubleshooting the classifier, the publisher feeds and DBTextworks: if one thing changes in one place, it can cause unexpected problems somewhere else.
All in-house indexers now have the programs available on their computer.
The workflows for dealing with publisher feeds and running the classifier are dependent on each other, so I wanted to show a slide with some information about the publisher feeds.
Image acknowledgement: https://plus.google.com/+SteveThomas/posts/SL77ii3qWa6
You may be wondering what there is for you to learn from this project which is so uniquely tailored to an obscure Australian education index.
Well let me suggest some lessons:
There is an obvious challenge for scholars, editors and publishers – think carefully about how dependent you are on your abstract in a machine-oriented landscape
There are particular challenges for vocabulary owners who want their thesaurus, taxonomy or subject headings to be read by machines. We have not gone anywhere near the linked data model for ATED, as our own systems have no way of accommodating this. We know now that the current classifier algorithms do not take into account ATED’s reference structure, so that is another challenge.
Metadata – machines like clean, consistent metadata. They are not as accommodating as humans. You will clean and clean and clean, so make sure you can export and import large chunks of your metadata. Thank goodness for DBTextworks.
As with any change management project, the human aspect is all important. Ours was the classic – fund the technology and a bit for the consultant, but no funds for time release/backfill for training, setting up, conducting the research required, or making a user-friendly interface. That was left to the already overloaded team.
We thought we were dealing with the subject classification step of the indexing process, but in fact the change that got the highest votes from the indexers was the reduction in manual keying and cutting and pasting of data - a not insignificant improvement.
We also thought classification was the most complex step, but we are beginning to regard curation as the real challenge. Given we can’t index everything, how do we prioritise across so many variables? Do we run the risk of doing the easier articles from feeds at the expense of the more important, but slower to index, material such as book chapters and web-based conference proceedings?
Finally: team members who are prepared to stick at long, tedious data-cleansing projects and to motivate those reluctant to change, and librarians with advanced technical skills, make an amazing difference to a project like this.
Watch the statistics: Can we arrest the sliding quantity without reducing the quality of our index?
When will we recoup the time invested in setting this up?