The growing pains of a controlled vocabulary

Published in: Technology, News & Politics

    1. The growing pains of a controlled vocabulary
    2. Introduction
        • Karen Loasby
        • Information architect
        • Worked for BBC for 4 years on search, navigation, metadata and content management projects
        • 2 years previously for the Guardian newspaper archiving the paper and arranging content on the website
        • MSc in Information Science from City University, London
    3. Agenda
        • Background
        • The problem
        • Formal classification vs. folk tags
        • Our middle ground
        • What happened
        • Learning points
        • Questions
    4. Background
        • Content management project
        • Regional websites
        • Need for metadata
        • Authors around the UK
    5. Problem
        • Faceted classification system
        • Authors to tag
        • Central control
        • But …
        • Journalists are the specialists: they know the domain and the vocabulary.
    6. Formal classification
        • Pre-determined terms
        • Centralised control
        • Rich relationships
    7. Folk tags
        • What is it then?
        • Folksonomy, ethnoclassification, social classification, social categorisation and so on
    8. Comparing approaches
        • Formal
            • High maintenance
            • Consistent/predictable
            • Rich relationships
            • Can be artificial
        • Folk
            • Low maintenance
            • Quirky/surprising
            • Less added value
            • Real user language
    9. A role for both
        • Where we are using folk tagging
        • And where we won’t
            • Trust & authority
            • High value to business
            • Missing motivation from users
            • Broad domain/user base
            • To avoid tyranny of the minority
    10. An experimental middle ground
        • Centralised control of terms
        • But encouraging absorption of user language
        • Higher maintenance than folk tags
        • Cheaper than professional cataloguing
    11. BBC Experience (diagram labels, in order)
        • Semi-automatic classification
        • Terms suggested from the CVs
        • Terms are OK
        • The suggested terms do not describe the content
        • Search or browse for terms
        • Send suggestion to the CV team
        • Terms are OK
        • Send suggestion to the CV team
        • CV team evaluate suggestion
        • Say no to the term – change the classification on the content object
        • Add to CV as a variant term or preferred term
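
The suggest-and-review loop on this slide can be sketched in code. The following Python is a minimal illustration, not the BBC system: the names `ControlledVocabulary`, `Decision` and `review` are hypothetical, the model assumes each term maps to exactly one preferred term, and the variant "Expecting a baby" is invented for the example (only the subject "Pregnancy" appears in the deck).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Decision(Enum):
    REJECT = "reject"              # say no; the content object gets reclassified
    ADD_AS_PREFERRED = "preferred" # the suggestion becomes a new preferred term
    ADD_AS_VARIANT = "variant"     # the suggestion becomes a synonym of an existing term


@dataclass
class ControlledVocabulary:
    # Maps every known term (lower-cased) to its preferred term.
    terms: Dict[str, str] = field(default_factory=dict)

    def add_preferred(self, term: str) -> None:
        self.terms[term.lower()] = term

    def add_variant(self, variant: str, preferred: str) -> None:
        if preferred.lower() not in self.terms:
            raise KeyError(f"unknown preferred term: {preferred}")
        self.terms[variant.lower()] = self.terms[preferred.lower()]

    def lookup(self, term: str) -> Optional[str]:
        """Return the preferred term, or None if the term is not in the CV."""
        return self.terms.get(term.lower())

    def review(self, suggestion: str, decision: Decision,
               preferred: Optional[str] = None) -> bool:
        """Apply the CV team's decision; return True if the CV changed."""
        if decision is Decision.REJECT:
            return False
        if decision is Decision.ADD_AS_PREFERRED:
            self.add_preferred(suggestion)
            return True
        if preferred is None:
            raise ValueError("a variant needs the preferred term it points to")
        self.add_variant(suggestion, preferred)
        return True


cv = ControlledVocabulary()
cv.add_preferred("Pregnancy")
cv.review("Expecting a baby", Decision.ADD_AS_VARIANT, preferred="Pregnancy")
print(cv.lookup("expecting a baby"))  # resolves the variant to "Pregnancy"
```

Keeping variants as pointers to a preferred term is one way to absorb user language without giving up central control: authors can type their own wording, while stored tags stay consistent.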
    12. Operational system
        • 8,000 requests in 10 months
        • From 160 journalists
            • Average per user of 50 terms
            • However this varied wildly. Our top user has suggested 476 terms
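
The figures on this slide can be checked with a few lines of arithmetic. This is a sketch using only the numbers quoted above; the variable names are illustrative, and the monthly rate assumes requests were spread evenly over the 10 months.

```python
requests = 8000      # term requests received in 10 months (from the slide)
journalists = 160    # journalists who made requests (from the slide)
top_user = 476       # terms suggested by the most active user (from the slide)
months = 10

mean_per_user = requests / journalists   # matches the quoted average of 50
top_vs_mean = top_user / mean_per_user   # how far the top user sits above the mean
top_share = top_user / requests          # fraction of all requests from one person
per_month = requests / months            # assumes an even spread over time

print(f"mean per user: {mean_per_user:.0f}")
print(f"top user vs mean: {top_vs_mean:.1f}x")
print(f"top user share of requests: {top_share:.4f}")
print(f"requests per month: {per_month:.0f}")
```

The top user alone accounts for roughly 6% of all requests, which is why a mean of 50 says little about the typical journalist.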
    13. Graph showing variation between teams
    14. Growth in the CVs
        • Up 15,000 terms in 10 months
        • Most growth in person/proper names
            • People, venues and organisations
            • Up by 50% to 35,000
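
A short calculation shows how much of the overall growth the name facets could explain. It rests on an assumption the slide does not state outright: that "Up by 50% to 35,000" describes the people, venues and organisations terms taken together.

```python
names_now = 35_000        # name terms after 10 months (from the slide)
names_growth_rate = 0.50  # "up by 50%" (from the slide)
total_growth = 15_000     # growth across all CVs (from the slide)

# Assumption: the 50% figure applies to the name facets as a group.
names_before = names_now / (1 + names_growth_rate)
names_added = names_now - names_before
share_of_growth = names_added / total_growth

print(f"name terms before: {names_before:,.0f}")
print(f"name terms added: {names_added:,.0f}")
print(f"share of total growth: {share_of_growth:.0%}")
```

Under that reading, the name facets started at about 23,300 terms and contributed roughly 78% of the 15,000 new terms.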
    15. Growth of facets
    16. Types of terms
        • Mostly good
            • Only 200 terms actually rejected
        • Synonyms vs. entirely new terms
            • New for names (only 2% synonyms)
            • Synonyms for subject (15% synonyms)
            • Location – needed colloquial terms
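
To show why colloquial variants matter at tagging time, here is a small sketch of per-facet normalisation. Everything in it is hypothetical: the `VARIANTS` table, the function names, and the sample terms ("Brum" as a colloquial name for Birmingham, "Morning sickness" as an unknown subject) are illustrations, not data from the BBC vocabularies.

```python
from typing import List, Optional, Tuple

# Hypothetical per-facet tables: term (lower-case) -> preferred term.
VARIANTS = {
    "location": {"birmingham": "Birmingham", "brum": "Birmingham"},
    "subject": {"pregnancy": "Pregnancy"},
}

Tag = Tuple[str, str]  # (facet, term)


def normalise(facet: str, term: str) -> Optional[str]:
    """Return the preferred term for `term` in `facet`, or None if unknown."""
    return VARIANTS.get(facet, {}).get(term.strip().lower())


def tag(raw_tags: List[Tag]) -> Tuple[List[Tag], List[Tag]]:
    """Split author-entered tags into accepted tags and new-term suggestions."""
    accepted, suggestions = [], []
    for facet, term in raw_tags:
        preferred = normalise(facet, term)
        if preferred is None:
            suggestions.append((facet, term))     # would go to the CV team
        else:
            accepted.append((facet, preferred))   # stored under the preferred term
    return accepted, suggestions


accepted, suggestions = tag([("location", "Brum"), ("subject", "Morning sickness")])
print(accepted)
print(suggestions)
```

A colloquial location resolves silently to its preferred form, while an unknown subject becomes a suggestion for review rather than a free-floating tag.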
    17. Resourcing
        • Handling the requests from journalists
        • First 3 months – one IA
        • Subsequently 2 to 3 junior IAs
        • Too much – how to reduce?
    18. Lessons learnt
        • Success with the journalists
            • They suggested terms!
            • Got the faceted classification
            • Began to suggest terms in “our” format
            • Some did engage at a detailed level
    19. Lessons learnt
        • Difficulties for journalists
            • System looks as if it is totally automatic, as part of a content management system
            • “Journalists are people too”
            • Users struggle with a tagging system based on content objects rather than pages
    20. Example Subject: Pregnancy
    21. Lessons learnt
        • Difficulties for journalists, cont.
            • They find it boring
            • Makes the aim of “finding and re-use” harder to achieve
            • Needed to do more pre-emptive work for them
    22. Lessons learnt
        • Number of terms suggested depends on
            • Type of facet
            • Dynamism of content
            • Scope of the content
            • Enthusiasm of users
    23. Next?
        • High value facets still need control
            • Make use of the metadata(!)
            • Sell the message
            • Federated management
            • Earlier in production
        • And for folk tagging?
    24. Thanks to the IA team for their analysis work:
        • Jon Carey
        • Adil Hussein
        • Christine Rimmer
    25. Thank you. Questions or comments? Karen Loasby [email_address]
