Wikipedia as controlled vocabulary
Chris Sizemore, Silver Oliver (BBC)
BBC Topic Page: I’m about ‘Victorians’
  • Inside the BBC: BBC silo #1, BBC silo #2, BBC silo #3
  • Outside the BBC: NY Times, flickr, Wikipedia
  • In other languages: viktorianisch, V 잊도 r 이안, Ελληνικά
An index language exists primarily to:
  • Allow an indexer to represent the subject matter of documents in a consistent way
  • Bring the vocabulary used by the searcher into coincidence with the vocabulary used by the indexer
  • Provide means whereby a searcher can modulate the search strategy to attain comprehensive or selective results as user needs dictate
(F.W. Lancaster, Vocabulary control for information retrieval)
Could Wikipedia be used as a universal language for identifying subjects?
Story of Wikipedia-as-CV: personal origins
  • We needed a system to categorise movie & TV reviews
  • So of course we built a categorisation system from scratch, including its own controlled vocab
  • And when people saw the system, they always said: “Hey, that reminds me of Internet Movie Database…”
  • It struck me that the way Internet Movie Database is set up isn’t dissimilar to the structure of a thesaurus or a very flat taxonomy…
  • But it’s one where the emphasis is on “related to”, not broader/narrower, synonym, antonym, etc.
  • From then, I couldn’t help but be drawn to websites where the structure is clearly: “a single primary Concept per page, and pages for related Concepts link to each other”
  • Could those “one Concept per page” webpages be used as “terms”, as in a controlled vocabulary?
Are some websites actually “indexing languages” in disguise?
conText: a Wikipedia-as-CV auto-categoriser prototype
http://sells.welcomebackstage.com:5000/item/submit
Demo of conText: take text from audience!
Wikipedia is already being used across the Web as a form of subject identification & disambiguation, in a grassroots way: in the form of hyperlinks embedded by authors in blog posts, news articles, music reviews, etc. Everywhere!
http://en.wikipedia.org/wiki/British
http://en.wikipedia.org/wiki/Science_fiction
http://en.wikipedia.org/wiki/BBC
http://en.wikipedia.org/wiki/Time_travel
http://en.wikipedia.org/wiki/Dr_who
http://en.wikipedia.org/wiki/Tardis
These days, by convention, when you link to Wikipedia from your webpage, you are less likely saying “go and have a look at this other page” and more likely giving a definition to a concept referred to in your content…
Also used in this way for specific domains are Internet Movie Database (for films & TV programmes), MySpace (for bands), Amazon (for books), etc.
For general knowledge, though, Wikipedia is becoming the Web’s de facto controlled vocabulary
http://en.wikipedia.org/wiki/Heerlen
http://en.wikipedia.org/wiki/Beethoven
http://en.wikipedia.org/wiki/Amsterdam
http://en.wikipedia.org/wiki/Van_Gogh_Museum
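The idea that Wikipedia links embedded in a page already act as subject identifiers can be sketched in a few lines. This is a minimal illustration only, not how conText works: the class name `WikipediaLinkCollector`, the function `wikipedia_subjects`, and the sample HTML are all assumptions made up for this example. It uses only the Python standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, unquote


class WikipediaLinkCollector(HTMLParser):
    """Collects the page titles of Wikipedia links found in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.subjects = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        parsed = urlparse(href)
        host = parsed.netloc
        is_wikipedia = host == "wikipedia.org" or host.endswith(".wikipedia.org")
        if is_wikipedia and parsed.path.startswith("/wiki/"):
            # The part after /wiki/ is treated as the "term" for the concept.
            title = unquote(parsed.path[len("/wiki/"):])
            if title and title not in self.subjects:
                self.subjects.append(title)


def wikipedia_subjects(html):
    """Return Wikipedia page titles linked from `html`, in order of first appearance."""
    collector = WikipediaLinkCollector()
    collector.feed(html)
    return collector.subjects


# Hypothetical content: three Wikipedia links and one unrelated link.
sample = (
    '<p>The <a href="http://en.wikipedia.org/wiki/BBC">BBC</a> series about '
    '<a href="http://en.wikipedia.org/wiki/Time_travel">time travel</a> in a '
    '<a href="http://en.wikipedia.org/wiki/Tardis">police box</a>, see '
    '<a href="http://example.org/review">this review</a>.</p>'
)

print(wikipedia_subjects(sample))  # ['BBC', 'Time_travel', 'Tardis']
```

Links to other sites are ignored, so only the Wikipedia-backed concepts end up as candidate subject tags.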
Wikipedia pages provide the best scope notes in the world
  • Wikipedia-as-CV benefits from being developed through a social process, maintained and kept current by the Wikipedia community
  • Each concept represents a consensus view, and its meaning can be understood simply by reading the associated Wikipedia page
  • For each Concept, the document edit history, the discussion around concept definition, & the debate are important here…
 
So, using this approach we can tag fairly accurately, and semi-automatically, with globally unique subject identifiers… So what?
Un-silo your content repository quickly and cheaply by connecting it to the Web via Wikipedia
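A rough sketch of what "un-siloing" via shared identifiers could look like: once items in separate repositories are tagged with the same Wikipedia URLs, they can be joined on those URLs. The repository names, item IDs, and function names below are hypothetical and exist only for illustration; the Wikipedia URLs are ones shown earlier in the talk.

```python
from collections import defaultdict


def build_subject_index(repositories):
    """Map each Wikipedia URL to the (repository, item) pairs tagged with it."""
    index = defaultdict(list)
    for repo_name, items in repositories.items():
        for item_id, subject_urls in items.items():
            for url in subject_urls:
                index[url].append((repo_name, item_id))
    return index


def cross_silo_subjects(index):
    """Keep only the subjects that appear in more than one repository."""
    return {
        url: refs
        for url, refs in index.items()
        if len({repo for repo, _ in refs}) > 1
    }


# Hypothetical silos and item IDs, tagged with Wikipedia URLs.
repositories = {
    "silo_1": {
        "programme-001": [
            "http://en.wikipedia.org/wiki/Beethoven",
            "http://en.wikipedia.org/wiki/Amsterdam",
        ],
    },
    "silo_2": {
        "article-042": ["http://en.wikipedia.org/wiki/Beethoven"],
        "article-043": ["http://en.wikipedia.org/wiki/Time_travel"],
    },
}

shared = cross_silo_subjects(build_subject_index(repositories))
for url, refs in shared.items():
    print(url, refs)
```

Here the only subject common to both silos is the Beethoven page, so that URL becomes the join point between `programme-001` and `article-042`.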
 
 
 
 
Now playing vs. the Web
Why not bring BBC Archive materials into this service via Wikipedia-as-CV tagging and a linked data bridge between Wikipedia & MusicBrainz?
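One possible first step towards such a bridge is to turn a Wikipedia URL into a linked data identifier. The sketch below assumes DBpedia's convention of naming resources after English Wikipedia page titles; the function name `dbpedia_uri` is made up for this example, and the onward lookup into MusicBrainz is not shown.

```python
from urllib.parse import urlparse

DBPEDIA_PREFIX = "http://dbpedia.org/resource/"


def dbpedia_uri(wikipedia_url):
    """Derive a DBpedia resource URI from an English Wikipedia page URL."""
    parsed = urlparse(wikipedia_url)
    if parsed.netloc != "en.wikipedia.org" or not parsed.path.startswith("/wiki/"):
        raise ValueError("not an English Wikipedia page URL: " + wikipedia_url)
    # The page title after /wiki/ is reused as the resource name.
    return DBPEDIA_PREFIX + parsed.path[len("/wiki/"):]


print(dbpedia_uri("http://en.wikipedia.org/wiki/Beethoven"))
```

Non-English or non-article URLs are rejected here, because this simple title-based mapping only covers English Wikipedia pages.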
 
 
By using Wikipedia-as-CV, you can get your repository onto this diagram quickly, for free
 
A Web-scale, globally accessible index language accidentally exists:
  • It encourages multiple indexers across the Web to represent the subject matter of any content in a consistent way
  • It brings the vocabulary used by info seekers into coincidence with the vocabulary used by indexers: the searchers ARE indexers, and vice versa
  • It provides means whereby a searcher can modulate a search and/or browse strategy to attain comprehensive or selective results as user needs dictate
  • It adds Web-scale navigation & cross-reference possibilities
Wikipedia is a controlled vocabulary
Chris Sizemore, Silver Oliver (BBC)
Much thanks! Questions, comments, & constructive criticism?
http://flickr.com/photos/deniscollette/1817034358/
