The growing pains of a controlled vocabulary
Introduction
Karen Loasby
Information architect
Worked for the BBC for 4 years on search, navigation, metadata and content management projects
Before that, 2 years at the Guardian newspaper, archiving the paper and arranging content on the website
MSc in Information Science from City University, London
Agenda
Background
The problem
Formal classification vs. Folk tags
Our middle ground
What happened
Learning points
Questions
Background
Content management project
Regional websites
Need for metadata
Authors around the UK
Problem
Faceted classification system
Authors to tag
Central control
But …
Journalists are the specialists – know the domain and the vocabulary.
Formal classification
Pre-determined terms
Centralised control
Rich relationships
Folk tags
What is it then?
Folksonomy, ethnoclassification, social classification, social categorisation and so on
Comparing approaches
Formal
High maintenance
Consistent/predictable
Rich relationships
Can be artificial
Folk
Low maintenance
Quirky/surprising
Less added value
Real user language
A role for both
Where we are using folk tagging
And where we won’t
Trust & Authority
High value to business
Missing motivation from users
Broad domain/user base
To avoid tyranny of the minority
An experimental middle ground
Centralised control of terms
But encouraging absorption of user language
Higher maintenance than folk tags
Cheaper than professional cataloguing
BBC Experience
Semi-automatic classification
Terms suggested from the CVs
Terms are OK
The suggested terms do not describe the content
Search or browse for terms
Send suggestion to the CV team
Terms are OK
Send suggestion to the CV team
CV team evaluate suggestion
Say no to the term – change the classification on the content object
Add to CV as a variant term or preferred term
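The last step of this workflow, where the CV team evaluates a suggestion, might be sketched roughly as follows. This is only an illustration under stated assumptions, not the BBC system: the class `ControlledVocabulary`, the function `evaluate_suggestion`, the decision labels and the example place names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ControlledVocabulary:
    """Hypothetical CV: maps every known label to its preferred term."""
    preferred_for: Dict[str, str] = field(default_factory=dict)

    def add_preferred(self, term: str) -> None:
        # A preferred term maps to itself.
        self.preferred_for[term.lower()] = term

    def add_variant(self, variant: str, preferred: str) -> None:
        # A variant (synonym, colloquial name) points at an existing preferred term.
        key = preferred.lower()
        if key not in self.preferred_for:
            raise KeyError(f"unknown preferred term: {preferred}")
        self.preferred_for[variant.lower()] = self.preferred_for[key]

    def lookup(self, label: str) -> Optional[str]:
        # Returns the preferred term for a label, or None if the label is unknown.
        return self.preferred_for.get(label.lower())


def evaluate_suggestion(cv: ControlledVocabulary, suggestion: str,
                        decision: str, preferred: Optional[str] = None) -> Optional[str]:
    """Apply the CV team's decision: 'reject', 'variant' or 'preferred'."""
    if decision == "reject":
        # Term refused: the content object would be re-classified instead.
        return None
    if decision == "variant":
        if preferred is None:
            raise ValueError("a variant needs a preferred term to point at")
        cv.add_variant(suggestion, preferred)
    elif decision == "preferred":
        cv.add_preferred(suggestion)
    else:
        raise ValueError(f"unknown decision: {decision}")
    return cv.lookup(suggestion)


cv = ControlledVocabulary()
cv.add_preferred("Newcastle upon Tyne")  # made-up example data
accepted = evaluate_suggestion(cv, "Toon", "variant", preferred="Newcastle upon Tyne")
rejected = evaluate_suggestion(cv, "asdf", "reject")
```

Here a journalist's colloquial suggestion is absorbed as a variant that resolves to the existing preferred term, while a rejected suggestion leaves the vocabulary unchanged.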
Operational system
8,000 requests in 10 months
From 160 journalists
An average of 50 terms per user
However this varied wildly. Our top user has suggested 476 terms
Graph showing variation between teams
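As a quick check on the figures above, the average and the spread can be recomputed from the quoted totals. This small sketch assumes that each request corresponds to one suggested term; the variable names are illustrative only.

```python
# Figures quoted on the slide.
requests = 8000      # requests in 10 months
journalists = 160    # journalists making requests
top_user = 476       # terms suggested by the most active user

# Assumption: one request equals one suggested term.
average = requests / journalists
ratio = top_user / average

print(f"Average per journalist: {average:.0f}")
print(f"Top user relative to the average: {ratio:.2f}x")
```

The top user's count is well above the mean, which fits the point that the average hides wide variation between users.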
Growth in the CVs
Up 15,000 terms in 10 months
Most growth in person/proper names
People, venues and organisations
Up by 50% to 35,000
Growth of facets
Types of terms
Mostly good
Only 200 terms actually rejected
Synonyms vs. entirely new terms
New for names (only 2% synonyms)
Synonyms for subject (15% synonyms)
Location – needed colloquial terms
Resourcing
Handling the requests from journalists
First 3 months – one IA
Subsequently 2 to 3 junior IAs
Too much – how to reduce?
Lessons learnt
Success with the journalists
They suggested terms!
Got the faceted classification
Began to suggest terms in “our” format
Some did engage at a detailed level
Lessons learnt
Difficulties for journalists
The system looks as if it is totally automatic, because it is part of a content management system
“Journalists are people too”
Users struggle with a system that tags content objects rather than pages
Example Subject: Pregnancy
Lessons learnt
Difficulties for journalists, cont.
They find it boring
Makes the aim of “finding and re-use” harder to achieve
Needed to do more pre-emptive work for them
Lessons learnt
Number of terms suggested depends on
Type of facet
Dynamism of content
Scope of the content
Enthusiasm of users
Next?
High value facets still need control
Make use of the metadata(!)
Sell the message
Federated management
Earlier in production
And for folk tagging?
Thanks to the IA team for their analysis work:
Jon Carey
Adil Hussein
Christine Rimmer
Thank you
Questions or comments?
Karen Loasby [email_address]