Wikipedia as controlled vocabulary

The Essentials of Metadata and Taxonomy - Henry Stewart Event

The Next Wave: Using Wikipedia as a Controlled Vocabulary

* Leveraging an online resource for internal use
* Integrating pre-existing unique identification numbers (UIDs)
* Inherited relations
* Capturing and cataloging
* Risks and remedies
Chris Sizemore, BBC Future Technology & Media, and Silver Oliver, BBC Future Technology & Media

Wikipedia as controlled vocabulary Presentation Transcript

  • 1. Wikipedia as controlled vocabulary. Chris Sizemore, Silver Oliver, BBC
  • 2. I’m about ‘Victorians’
  • 3. BBC Topic Page I’m about ‘Victorians’ Outside the BBC BBC silo #1 BBC silo #3 BBC silo #2
  • 4. BBC Topic Page I’m about ‘Victorians’ viktorianisch V 잊도 r 이안 Ελληνικά NY Times, flickr, wikipedia Outside the BBC BBC silo #1 BBC silo #3 BBC silo #2
  • 9.
    • An index language exists primarily to:
    • Allow an indexer to represent the subject matter of documents in a consistent way
    • Bring the vocabulary used by the searcher into coincidence with the vocabulary used by the indexer
    • Provide means whereby a searcher can modulate the search strategy to attain comprehensive or selective results as user needs dictate
    F.W. Lancaster Vocabulary control for information retrieval
  • 10. Could Wikipedia be used as a universal language for identifying subjects?
  • 11. Story of Wikipedia-as-CV
  • 12. Story of Wikipedia-as-CV: personal origins
  • 13.  
  • 14. Story of Wikipedia-as-CV: personal origins We needed a system to categorise movie & TV reviews
  • 15. Story of Wikipedia-as-CV: personal origins So of course we built a categorisation system from scratch -- including its own controlled vocab
  • 16. Story of Wikipedia-as-CV: personal origins And when people saw the system, they always said: “Hey, that reminds me of Internet Movie Database…”
  • 17.  
  • 18. Story of Wikipedia-as-CV: personal origins It struck me that the way Internet Movie Database is set up isn’t dissimilar to the structure of a thesaurus or a very flat taxonomy…
  • 19. Story of Wikipedia-as-CV: personal origins But it’s one where the emphasis is on “related to”, not broader/narrower, synonym, antonym, etc
  • 21. Story of Wikipedia-as-CV: personal origins From then, I couldn’t help but be drawn to websites where the structure is clearly: “a single primary Concept per page -- and pages for related Concepts link to each other”
  • 22. Story of Wikipedia-as-CV: personal origins Could those “one Concept per page” webpages be used as “terms” as in a controlled vocabulary?
  • 23. Are some websites actually “indexing languages” in disguise?
  • 24. conText -- a Wikipedia-as-CV auto-categoriser prototype
  • 25.  
  • 26. conText -- a Wikipedia-as-CV auto-categoriser prototype: http://sells.welcomebackstage.com:5000/item/submit
  • 27.  
  • 29. Demo of conText -- a Wikipedia-as-CV auto-categoriser prototype: Take text from audience!
  • 31. Wikipedia is already being used across the Web as a form of subject identification & disambiguation, in a grassroots way: in the form of hyperlinks embedded by authors in blog posts, news articles, music reviews, etc everywhere!
  • 32.
    • http://en.wikipedia.org/wiki/British
    • http://en.wikipedia.org/wiki/Science_fiction
    • http://en.wikipedia.org/wiki/BBC
    • http://en.wikipedia.org/wiki/Time_travel
    • http://en.wikipedia.org/wiki/Dr_who
    • http://en.wikipedia.org/wiki/Tardis
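
A minimal sketch of how links like these could be harvested as subject tags, assuming the content is available as HTML. The class name `WikipediaLinkCollector`, the function `subjects_of`, and the sample markup are hypothetical illustrations, not part of conText or any BBC system; only the Python standard library is used.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, unquote


class WikipediaLinkCollector(HTMLParser):
    """Collects the Wikipedia article titles that a page links to."""

    def __init__(self):
        super().__init__()
        self.subjects = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        parsed = urlparse(href)
        host = parsed.netloc.lower()
        # Only article links such as http://en.wikipedia.org/wiki/Time_travel count.
        if host.endswith(".wikipedia.org") and parsed.path.startswith("/wiki/"):
            title = unquote(parsed.path[len("/wiki/"):])
            if title and title not in self.subjects:
                self.subjects.append(title)


def subjects_of(html):
    """Return the Wikipedia titles linked from an HTML fragment, in order."""
    collector = WikipediaLinkCollector()
    collector.feed(html)
    return collector.subjects


# Hypothetical blog-post fragment, invented for illustration.
sample = (
    '<p>The <a href="http://en.wikipedia.org/wiki/BBC">BBC</a> series about '
    '<a href="http://en.wikipedia.org/wiki/Time_travel">time travel</a> in the '
    '<a href="http://en.wikipedia.org/wiki/Tardis">Tardis</a>. '
    '<a href="http://example.org/about">About</a></p>'
)
print(subjects_of(sample))  # ['BBC', 'Time_travel', 'Tardis']
```

The non-Wikipedia link is ignored, so what remains is a list of terms drawn from one shared vocabulary, which is the grassroots tagging behaviour the slides describe.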
  • 34. These days, by convention, when you link to Wikipedia from your webpage, more than saying “go and have a look at this other page”, you are more likely giving a definition to a concept referred to in your content… Also used in this way for specific domains are Internet Movie Database (for films & TV programmes), MySpace (for bands), Amazon (for books), etc
  • 35. For general knowledge, though, Wikipedia is becoming the Web’s de facto controlled vocabulary
  • 36.
    • http://en.wikipedia.org/wiki/Heerlen
    • http://en.wikipedia.org/wiki/Beethoven
    • http://en.wikipedia.org/wiki/Amsterdam
    • http://en.wikipedia.org/wiki/Van_Gogh_Museum
  • 37.
    • An index language exists primarily to:
    • Allow an indexer to represent the subject matter of documents in a consistent way
    • Bring the vocabulary used by the searcher into coincidence with the vocabulary used by the indexer
    • Provide means whereby a searcher can modulate the search strategy to attain comprehensive or selective results as user needs dictate
    F.W. Lancaster Vocabulary control for information retrieval
  • 40. Wikipedia pages provide the best scope notes in the world Wikipedia-as-CV benefits from being developed through a social process, maintained and kept current by the Wikipedia community Each concept represents a consensus view and its meaning can be understood simply by reading the associated Wikipedia page
  • 41. Wikipedia pages provide the best scope notes in the world For each Concept, the document edit history, the discussion around the concept's definition, & the debate are important here…
  • 42.  
  • 43.
    • An index language exists primarily to:
    • Allow an indexer to represent the subject matter of documents in a consistent way
    • Bring the vocabulary used by the searcher into coincidence with the vocabulary used by the indexer
    • Provide means whereby a searcher can modulate the search strategy to attain comprehensive or selective results as user needs dictate
    F.W. Lancaster Vocabulary control for information retrieval
  • 45. So, we can tag pretty accurately semi-automatically with globally unique subject identifiers using this approach… So what? Un-silo your content repository quickly and cheaply, by connecting it to the Web via Wikipedia
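
One way to picture the "un-silo" step is an index keyed by Wikipedia URL that spans every repository. The sketch below assumes each silo already stores Wikipedia URLs as subject tags; the silo names, item identifiers, and the function `build_subject_index` are invented for illustration and do not describe any actual BBC system.

```python
from collections import defaultdict

# Hypothetical silos: item identifier -> Wikipedia URLs used as subject tags.
SILOS = {
    "news": {
        "news/101": ["http://en.wikipedia.org/wiki/Beethoven"],
    },
    "radio": {
        "radio/7": [
            "http://en.wikipedia.org/wiki/Beethoven",
            "http://en.wikipedia.org/wiki/Amsterdam",
        ],
    },
    "archive": {
        "archive/55": ["http://en.wikipedia.org/wiki/Amsterdam"],
    },
}


def build_subject_index(silos):
    """Map each Wikipedia URL to the (silo, item) pairs tagged with it."""
    index = defaultdict(list)
    for silo_name in sorted(silos):
        for item_id, subject_urls in silos[silo_name].items():
            for url in subject_urls:
                index[url].append((silo_name, item_id))
    return dict(index)


index = build_subject_index(SILOS)
for url, items in index.items():
    print(url, items)
```

Because the key is a public URL rather than an internal code, the same lookup also works for any external site that tags its content with the same Wikipedia page.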
  • 46.  
  • 47.  
  • 48.  
  • 49.  
  • 50. Now playing vs. the Web
  • 51.  
  • 52.  
  • 53. Now playing vs. the Web Why not bring BBC Archive materials into this service via Wikipedia-as-CV tagging and a linked data bridge between Wikipedia & MusicBrainz?
  • 54.  
  • 55.  
  • 56. By using Wikipedia-as-CV, you can get your repository onto this diagram quickly, for free
  • 57.  
  • 58.
    • An index language exists primarily to:
    • Allow an indexer to represent the subject matter of documents in a consistent way
    • Bring the vocabulary used by the searcher into coincidence with the vocabulary used by the indexer
    • Provide means whereby a searcher can modulate the search strategy to attain comprehensive or selective results as user needs dictate
    F.W. Lancaster Vocabulary control for information retrieval
  • 63.
    • A Web-scale, globally accessible index language accidentally exists:
    • It encourages multiple indexers across the Web to represent the subject matter of any content in a consistent way
    • It brings the vocabulary used by info seekers into coincidence with the vocabulary used by indexers -- the searchers ARE indexers, and vice versa
    • It provides means whereby a searcher can modulate a search and/or browse strategy to attain comprehensive or selective results as user needs dictate
    • It adds Web-scale navigation & cross-reference possibilities
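
The idea of modulating a search between selective and comprehensive results can be sketched as walking "related to" links between concepts. The `RELATED` table below is a tiny hand-made stand-in for links between Wikipedia pages (it is not real Wikipedia data), and `expand` is a hypothetical helper: zero hops gives a selective search, more hops a broader one.

```python
from collections import deque

# Hypothetical "related to" links between concepts, invented for illustration.
RELATED = {
    "Dr_who": ["Tardis", "Time_travel", "BBC"],
    "Tardis": ["Dr_who"],
    "Time_travel": ["Science_fiction"],
    "Science_fiction": [],
    "BBC": [],
}


def expand(term, hops):
    """Return the term plus every concept reachable within `hops` links."""
    seen = {term}
    queue = deque([(term, 0)])
    while queue:
        current, distance = queue.popleft()
        if distance == hops:
            continue  # do not follow links beyond the requested radius
        for neighbour in RELATED.get(current, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return seen


print(sorted(expand("Dr_who", 0)))  # selective: just the term itself
print(sorted(expand("Dr_who", 1)))  # broader: directly related concepts
print(sorted(expand("Dr_who", 2)))  # broader still: two links away
```

The expanded set of terms can then be used as the query against an index keyed by Wikipedia identifiers.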
  • 67. Chris Sizemore, Silver Oliver, BBC. Wikipedia as controlled vocabulary: Wikipedia is a controlled vocabulary. Much thanks! Questions, comments, & constructive criticism?
  • 68. Chris Sizemore Silver Oliver BBC Wikipedia as controlled vocabulary http://flickr.com/photos/deniscollette/1817034358/