schema.org and One Hundred Years of Search


A talk from the London SemWeb meetup, hosted by the BBC Academy in London, Mar 30 2012.

See also!/kansandhaus/status/185064835694862337

Published in: Technology, Education


  1. schema.org and One Hundred Years of Search: Libraries, Media and The Semantic Web
     BBC Academy, March 28th 2012, London
     Dan Brickley <>
     Friday, March 30, 2012
  2. In 20 minutes
     • Introduce you to the schema.org initiative
     • Revisit the Web before the Web of 1912
     • Use this to describe what's new with schema.org, ... and the practical choices we face when scaling to billions of users and pages
  3. Intro: Dan Brickley
     • Ex-W3C, helped start Semantic Web project
     • Worked on RDF/S, FOAF, SKOS & other standards around W3C
     • Currently working on <> project
     • See also <>, @danbri
  4. Back to 1912
  6. ■ The Republic of China is proclaimed.
     ■ Albert Berry makes the first parachute jump from a moving airplane.
     ■ Prague Party Conference: Vladimir Lenin and the Bolshevik Party break away from the rest of the Russian Social Democratic Labour Party.
     ■ France establishes a protectorate over Morocco.
     ■ RMS Titanic strikes an iceberg in the northern Atlantic Ocean.
     ■ Paramount Pictures, the oldest American motion picture studio still in operation, is founded.
     ■ Albania declares independence from the Ottoman Empire.
     ■ First Balkan War.
     ■ Alan Turing, British mathematician, is born.
     ■ Semantic search over structured data goes mainstream, in Belgium.
     source:
  7. Credit and thanks: W. Boyd Rayward
  8. Sample queries from 1912
     Moteur Diesel. Philosophie des mathematiques. Les pecheries au Maroc et sur la cote d'Espagne. Finances Bulgares. Gyroscope. Culte de feu. Motocolture (garden). Evolution de la dent humaine. Emigration italienne. Casier civil. Chemin de fer de Bagdad (railroad...). Planete Mars. Suffrage universel. Nevrose traumatique. Eugenism. Le saumon; Saumons manques et repeches. Boomerang. Fabrication de la cyanamide. Emigration des Juifs. Intoxications par le tabac. Quantite d'huile d'olive importee en Belgique. Jurisprudence des compagnies d'assurances en Angleterre, Hollande et Danemark...
  10. Search before search
      • Paul Otlet, "the man who dreamed the Internet", http://
      • "The International Centre organises collections of world-wide importance. These collections are the International Museum, the International Library, the International Bibliographic Catalogue and the Universal Documentary Archives. These collections are conceived as parts of one universal body of documentation, as an encyclopedic survey of human knowledge, as an enormous intellectual warehouse of books, documents, catalogues and scientific objects."
      • Start at for the full story
  11. Libraries, media & ...?
      • Universal Decimal Classification (UDC) used in many 1000s of libraries today
      • In BBC archive for 40 years, as Lonclass
      • Shows the challenge and promise of structured description
      • So what's in Lonclass? What's not in Lonclass!
  22. Compositional Semantics
      • 656.881:301.162.721 “LETTERS OF APOLOGY”
      • 656.881 “LETTERS (POSTAL SERVICES)”
      • 656.881:06.022.6 “RESIGNATION LETTERS”
      • 654.192.731TV-AM “TV AM (TELEVISION AM)”
      (this work pre-dated modern linguistics, never mind computing...)
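The composite codes on this slide join simpler codes with a colon, which UDC uses as its relation sign. A minimal Python sketch of that idea follows. It assumes a tiny lookup table built only from the labels shown on the slide (it is not an extract from Lonclass), and the function names are made up for illustration.

```python
# Labels copied from the slide. Components whose own labels are not shown
# there (301.162.721, 06.022.6) are deliberately left out of the table.
KNOWN_LABELS = {
    "656.881": "LETTERS (POSTAL SERVICES)",
    "656.881:301.162.721": "LETTERS OF APOLOGY",
    "656.881:06.022.6": "RESIGNATION LETTERS",
}


def split_code(code):
    """Split a composite code into its colon-separated components."""
    return [part for part in code.split(":") if part]


def describe_code(code):
    """Pair each component with its label, or None when the table lacks it."""
    return [(part, KNOWN_LABELS.get(part)) for part in split_code(code)]


def shares_component(code_a, code_b):
    """True when two composite codes have at least one component in common."""
    return bool(set(split_code(code_a)) & set(split_code(code_b)))


apology = "656.881:301.162.721"
resignation = "656.881:06.022.6"
print(describe_code(apology))
print(shares_component(apology, resignation))
```

Both composite codes share the component 656.881, so a search for postal letters can reach apology letters and resignation letters without either phrase being a term in its own right.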
  23. Archives and classification
      • Lonclass tells a story of the world; of this country at least; and a lot about the rest
      • It is huge: 1000s of terms, composite sentence-like codes, and rather sparse
      • It began with UDC in the 1890s, and remains key to the BBC's media archives even today
  25. And now for something new.
  26. • Search engine collaboration: Google, Bing, Yahoo! & Yandex
      • Simple factual data for better search
      • Launched June 2011, schema
      • 300 classes, 261 properties & growing
      • discussions: W3C WebSchemas group
  27. Example: Google Rich Snippets
      From:
      See also Yandex's
  28. On IMDB:
      <div id="content-2-wide" itemscope itemtype="">
        <div class="txt-block">
          <h4 class="inline">Stars:</h4>
          <a onclick="(new Image()).src='/rg/title-overview/star-1/images/b.gif?link=%2Fname%2Fnm0010930%2F';" href="/name/nm0010930/" itemprop="actors">Douglas Adams</a>,
          <a onclick="(new Image()).src='/rg/title-overview/star-2/images/b.gif?link=%2Fname%2Fnm0048982%2F';" href="/name/nm0048982/" itemprop="actors">Tom Baker</a> and
          <a onclick="(new Image()).src='/rg/title-overview/star-3/images/b.gif?link=%2Fname%2Fnm3035100%2F';" href="/name/nm3035100/" itemprop="actors">Hans Peter Brondmo</a>
        </div>
        <div class="star-box" itemprop="aggregateRating" itemscope itemtype="">
      Linked Data: see for markup describing Douglas Adams as a (jobTitle, birthDate, description, performerIn, ...).
  29. What’s in the schema?
      • Classes (types) e.g. LocalBusiness, Person, Organization, VideoObject, TVSeries...
      • Properties (attributes) e.g. openingHours, transcript, productionCompany, streetAddress
      • That’s all: a dictionary of terms, used for annotating data within normal Web pages
  30. (Diagram of types) CreativeWork, event, UserInteraction, LocalBusiness, intangible, place, Organization, CivicStructure, Landform
  31. Another example:
  32. <div itemscope itemtype="">
        <span itemprop="name">GreatFood</span>
        <div itemprop="address" itemscope itemtype="">
          <span itemprop="streetAddress">1901 Lemur Ave</span>
          <span itemprop="addressLocality">Sunnyvale</span>,
          <span itemprop="addressRegion">CA</span>
          <span itemprop="postalCode">94086</span>
        </div>
        <span itemprop="telephone">(408) 714-1489</span>
        <a itemprop="url" href=""></a>
        Hours:
        <meta itemprop="openingHours" content="Mo-Sa 11:00-14:30">Mon-Sat 11am-2:30pm
        <meta itemprop="openingHours" content="Mo-Th 17:00-21:30">Mon-Thu 5pm-9:30pm
        <meta itemprop="openingHours" content="Fr-Sa 17:00-22:00">Fri-Sat 5pm-10:00pm
        Categories:
        <span itemprop="servesCuisine">Middle Eastern</span>,
        <span itemprop="servesCuisine">Mediterranean</span>
      </div>
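To show what a consumer might recover from markup like the restaurant example, here is a rough Python sketch using only the standard library's html.parser. It is not how any search engine processes pages, and it skips parts of the microdata specification (itemref, itemid, URL resolution). The class name MicrodataSketch is hypothetical, and the itemtype URLs in the sample are assumed, since the transcript shows them empty.

```python
from html.parser import HTMLParser

# Elements with no end tag; they are never pushed onto the open-element stack.
VOID_TAGS = {"meta", "link", "img", "br", "hr", "input"}
# Elements whose property value comes from an attribute rather than from text.
VALUE_ATTR = {"meta": "content", "a": "href", "link": "href", "img": "src"}


class MicrodataSketch(HTMLParser):
    """Collect itemscope/itemprop annotations into nested dictionaries."""

    def __init__(self):
        super().__init__()
        self.items = []   # top-level items found in the page
        self._open = []   # one frame per open (non-void) element

    def _current_item(self):
        # The innermost open element that started an item, if any.
        for frame in reversed(self._open):
            if frame["item"] is not None:
                return frame["item"]
        return None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        owner = self._current_item()
        frame = {"tag": tag, "item": None, "prop": None, "owner": owner, "text": []}
        if "itemscope" in attrs:
            item = {"type": attrs.get("itemtype") or "", "properties": {}}
            frame["item"] = item
            if prop and owner is not None:
                owner["properties"].setdefault(prop, []).append(item)
            else:
                self.items.append(item)
        elif prop and owner is not None:
            if tag in VALUE_ATTR:
                value = attrs.get(VALUE_ATTR[tag]) or ""
                owner["properties"].setdefault(prop, []).append(value)
            else:
                frame["prop"] = prop  # value is the element's text content
        if tag not in VOID_TAGS:
            self._open.append(frame)

    def handle_data(self, data):
        # Text belongs to every open element that is collecting a text value.
        for frame in self._open:
            if frame["prop"] is not None:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        # Close the nearest matching open element, tolerating unclosed children.
        for index in range(len(self._open) - 1, -1, -1):
            if self._open[index]["tag"] == tag:
                break
        else:
            return
        while len(self._open) > index:
            frame = self._open.pop()
            if frame["prop"] is not None:
                text = "".join(frame["text"]).strip()
                frame["owner"]["properties"].setdefault(frame["prop"], []).append(text)


# Sample based on the slide; the two itemtype URLs are assumptions.
SAMPLE = """
<div itemscope itemtype="http://schema.org/Restaurant">
  <span itemprop="name">GreatFood</span>
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="streetAddress">1901 Lemur Ave</span>
    <span itemprop="addressLocality">Sunnyvale</span>,
    <span itemprop="addressRegion">CA</span>
  </div>
  <meta itemprop="openingHours" content="Mo-Sa 11:00-14:30">Mon-Sat 11am-2:30pm
  <span itemprop="servesCuisine">Middle Eastern</span>,
  <span itemprop="servesCuisine">Mediterranean</span>
</div>
"""

parser = MicrodataSketch()
parser.feed(SAMPLE)
parser.close()
restaurant = parser.items[0]
print(restaurant["properties"]["servesCuisine"])
```

Every property is stored as a list because a name such as openingHours or servesCuisine can repeat, and the nested address item shows how one description can sit inside another.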
  33. schema.org scope
      • In-page structured data for search
      • Not asking an unconstrained “so, how do we describe cars?”, but “how can we improve markup on existing pages that describe cars?” (or Comics, SoftwareApps, Sports, ...)
      • Simplify publisher/webmaster experience
      • Record agreements between search engines
      • Central use case: augmented search results
  35. schema.org and UDC
      • In many ways the opposite of UDC
      • Small (by contrast), pragmatic, Web-based
      • Yet by Semantic Web standards and culture, it is a big centralised schema
      • The art is finding ways to decentralise without creating chaos
      • We don't want to re-invent UDC, or Wikipedia; but integrate such things into simple descriptive templates for search
  36. Lots missing! e.g. sports
      • Current vocabulary emphasizes points of interest on a map and sporting activities rather than sports content as entertainment
      • We also have terms to describe videos, TV shows etc., ...but no sports-specifics yet
      • How deep to go? How to integrate with existing vocabulary? How to identify players, teams, kinds of football? Video clips for that hand of God goal?
  37. Job postings (done), rNews (done), Comics, Learning, ScholarlyArticle, Software, Events, Genealogy, Real Estate, eCommerce, Health, Sports, Transport, Vehicles, Comments, Datasets, Bio, ... (+bugfixes, integration, ...)
  38. Everything overlaps *
      • We added JobPosting; what if the job was sports-related?
      • We're adding educational markup; does it help describe sports education, training?
      • Is there a sports perspective on the health/medical vocabulary we're working on?
      • Can't coordinate everything! Pragmatism...
      * intertwingularity
  39. Practicalities
      • Delegation to external sources for enumerations and detail
        • e.g. country codes from UN FAO or Wikipedia/DBpedia/Wikidata
      • We don’t want to create big enumerations
        • all the countries? sports? things that go on maps?
      • Decentralised subclassing & property values
  40. Process
      • Search partners retain ultimate oversight
      • W3C hosts community group, discussion, wiki and proposal tracking
      • Web Schemas group: planning monthly telecons at W3C, based around proposals
      • Evolving, pragmatic, collaborative
  41. Compositional Semantics revisited
      • If we have SportsCentre and Karate, can we describe a Karate Club?
      • If we have recipes vocab, and medical vocab, and restaurants, can we describe allergy-free food?
      • If the UN has country codes and Wikipedia lists religions, ... then we just re-use those
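One way to read the Karate Club question is as a small central type combined with a term identified elsewhere. The Python sketch below illustrates that reading only. The property name "activity" is hypothetical (it is not a term from the vocabulary), the type list is a hand-picked subset of names that appear in these slides, and the DBpedia URI simply stands in for "an external source".

```python
# A deliberately small set of central types, using names from the slides.
CENTRAL_TYPES = {"SportsCentre", "LocalBusiness", "Organization"}

# Hypothetical property name, used only for this illustration.
ACTIVITY_PROPERTY = "activity"


def describe_thing(name, central_type, external_terms):
    """Combine one central type with terms identified by external URIs."""
    if central_type not in CENTRAL_TYPES:
        raise ValueError(f"unknown central type: {central_type}")
    for uri in external_terms:
        if not uri.startswith(("http://", "https://")):
            raise ValueError(f"external term must be a URI: {uri}")
    return {
        "type": central_type,
        "name": name,
        ACTIVITY_PROPERTY: list(external_terms),
    }


karate_club = describe_thing(
    "Example Karate Club",                   # made-up name
    "SportsCentre",
    ["http://dbpedia.org/resource/Karate"],  # external identifier for the sport
)
print(karate_club)
```

No KarateClub type is needed: the central vocabulary stays small, and the detail lives with whoever maintains the list of sports.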
  42. And libraries
      • If the library world share their controlled vocabularies as open SKOS linked data
      • ...can we plug them directly into descriptions?
      • of videos? news? scholarly articles? (yes)
      • Why re-invent when you can collaborate?
  43. WebSchemas public-vocabs list
      • process
      • Looking for rough consensus and incremental improvements
      • Realistic examples, simplicity for publishers, and re-use of existing vocabulary are important
      • <>