Leveraging the semantic web meetup, Semantic Search, Schema.org and more


A history and description of the adoption of Semantic Search by the major search and social engines. Covers schema.org, the knowledge graph and status to date (July 30, 2013). Presented from a search engine point of view.

Published in: Technology, Education

  1. 1. Leveraging the Semantic Web, Schema.org, Semantic Search and more San Diego Semantic Web Meetup By: Barbara Starr Twitter: @BarbaraStarr Email: bstarr@algebraixData.com
  2. 2. Meta Information: ME • Pursued a doctorate in Artificial Intelligence from South Africa in the '80s. • Recruited to build intelligent/predictive trading systems on Wall Street. • Migrated to government-based contracts, several of which turned into real-world products such as SIRI (PAL from DARPA) and WATSON (Acquaint; IBM Watson Labs was a team member). • From the vantage point of a semantic technologist, I keenly watched the evolution of the Semantic Web. • “Shocked into the real world” when working as a consultant at Overstock. • Today: SVP Product Management, AlgebraixData. By: Barbara Starr Twitter: @BarbaraStarr Email: bstarr@algebraixData.com Linkedin: http://www.linkedin.com/in/barbarastarr My favorite author: Isaac Asimov Favorite book: I, Robot Favorite character: MULTIVAC
  3. 3. Additional Metainformation For the purpose of this talk: same-as MY ROBOT or Artificially Intelligent Entity or Search Engine OWL I explain things from a Search Engine Point of View! 
  4. 4. SEARCH ENGINE POINT OF VIEW How can I exploit metadata or “semantic search”??
  5. 5. SEARCH ENGINE POINT OF VIEW I can directly extract information to enhance SERP displays. Examples: SearchMonkey (2008), Rich Snippets (2009), tiles.
  6. 6. SEARCH ENGINE POINT OF VIEW I can search directly on consumed metadata!
  7. 7. SEARCH ENGINE POINT OF VIEW I can provide direct answers to queries by searching on consumed, verified and validated information
  8. 8. SEARCH ENGINE POINT OF VIEW I can even aggregate answers or deduce them (like a timeline of events)
  9. 9. SEARCH ENGINE POINT OF VIEW I can even use it in conjunction with machine learning techniques, e.g., to train other components. I can detect relevancy signals, i.e., what content to show to what audience. I can use it to assist in interpreting a user query. Penn Treebank tagset?
  10. 10. SEARCH ENGINE POINT OF VIEW Really interesting in terms of exposing long tail content too. It makes things findable for me when pages are published with structured markup! I meant the beer brewer in Arizona
  11. 11. SEARCH ENGINE POINT OF VIEW I’m a Search Engine Robot. I could really use this stuff, and it is like the Tower of Babel out there! Microdata, Microformats, RDFa: multiple conflicting vocabularies that I will have to align internally, and multiple syntax formats as well. Prior to Schema.org (i.e., June 2011). GoodRelations for e-commerce?
  12. 12. SEARCH ENGINE POINT OF VIEW Time to get Serious!
  13. 13. What has been the history? Percentage of URLs with embedded metadata in various formats: five-fold increase between March 2009 and October 2010; another five-fold increase between October 2010 and January 2012. RDFa exploded in 2012. Source: Peter Mika, Yahoo
  14. 14. Current state of metadata on the Web • 31% of webpages, 5% of domains contain some metadata – Analysis of the Bing Crawl (US crawl, January 2012) – RDFa is most common format • By URL: 25% RDFa, 7% microdata, 9% microformat • By eTLD (PLD): 4% RDFa, 0.3% microdata, 5.4% microformat – Adoption is stronger among large publishers • Especially for RDFa and microdata • See also – P. Mika, T. Potter. Metadata Statistics for a Large Web Corpus, LDOW 2012 – H. Mühleisen, C. Bizer. Web Data Commons - Extracting Structured Data from Two Large Web Corpora, LDOW 2012
  15. 15. Prolific growth of the LOD Cloud
  16. 16. Timeline of RDFa and Semantic Web Adoption As of Semtech 2011 Inevitable passage of Semantic Web adoption – culminating in schema.org
  17. 17. SEARCH ENGINE POINT OF VIEW Align and consume many vocabularies that may not be of interest to search engines? Rather, mandate vocabulary and syntax (microdata). A search engine alliance has the power to MANDATE vocabulary and syntax! Initial alliance: Google, Yahoo, Bing. Then Yandex and subsequently Pinterest.
  18. 18. Sample portion
  19. 19. SEARCH ENGINE POINT OF VIEW On the other hand: not wise to ignore standards bodies like the W3C. No mandate on syntax.
  20. 20. SEARCH ENGINE POINT OF VIEW Did I tell you I don’t like spam?
  21. 21. SEARCH ENGINE POINT OF VIEW Make sure you are not cloaking by feeding one set of information to me and another to human users! Ensure your data feeds match information with the structured markup or “metadata” on your web pages.
  22. 22. SEARCH ENGINE POINT OF VIEW Serving RELEVANT ANSWERS is IMPERATIVE, and central to my very being!
  25. 25. SEARCH ENGINE POINT OF VIEW Adding context in search verticals really helps me serve up relevant information (it seriously increases my recall), as does geospatial information. Consumed information: Structured Data Dashboard. Google’s “Search Verticals”. Notice any correlations? I would advise you to!
  26. 26. SEARCH ENGINE POINT OF VIEW I also have a pretty good understanding of big data and web intelligence, so I can leverage them! OH! And be sure to check out Moore's law. SIRI. “Amazing fact: same amount of computing to answer one Google Search query as all the computing done, in flight and on the ground, for the entire Apollo program!”
  27. 27. SEARCH ENGINE POINT OF VIEW I can leverage metadata for better image search. SIRI. I can combine it with computer vision techniques. I can enhance the user’s shopping experience.
  28. 28. SEARCH ENGINE POINT OF VIEW INTRODUCING THE KNOWLEDGE GRAPH. Know rather than recognize? Symbolic reasoning vs. stochastic reasoning (the latter is more like NLP or PageRank).
  29. 29. SEARCH ENGINE POINT OF VIEW And speaking of the knowledge graph or knowledge carousel! Folks finding answers on my page never even have to click through to yours! I can even now start to derive associations or relationships between entities.
  30. 30. SEARCH ENGINE POINT OF VIEW Check out this great highlighter. The information is available only to me and not to any other search or social engines! Can you believe I have been accused of hijacking semantic markup? I find it so helpful that I would really like to be able to keep all that validated verified information to myself!
  31. 31. SEARCH ENGINE POINT OF VIEW I have since created the structured markup helper, extended my data highlighter to include the following types of entities (check your webmaster tools for this), and added support for JSON-LD as well as microdata!
  32. 32. SEARCH ENGINE POINT OF VIEW My social counterparts have been leveraging structured markup (RDFa) for their Open Graph protocol for quite some time. The Open Graph Protocol enables you to integrate your Web pages into the social graph. They are also leveraging it in their newly released Graph Search! Not only that, they are even building an entity graph not dissimilar from my knowledge graph! Example of crowdsourced entity graph info source: places.
  33. 33. SEARCH ENGINE POINT OF VIEW My social counterparts ought to have a field day in terms of both targeted advertising and in creating engaging user experiences by leveraging their more recent innovations.
  34. 34. SEARCH ENGINE POINT OF VIEW Knowledge Graphs are now ubiquitous, and the term has become common vernacular! (Slide images: LinkedIn snapshots added; PubMed; Knowledge Graph.)
  35. 35. SEARCH ENGINE POINT OF VIEW I am starting to use hashtags in search so I can merge topics and entities in graphs, like some of my social counterparts! (Slide images: LinkedIn snapshots added; PubMed; Knowledge Graph.)
  36. 36. SEARCH ENGINE POINT OF VIEW I am even now measuring my trending “entities” in my top charts, rather than “strings”.
  38. 38. SEARCH ENGINE POINT OF VIEW Check the list to see what is coming out next! Schema.org is dynamic and is growing! Mark up information not yet consumed by search engines to get the advantage of extra lift when it is adopted.
  39. 39. SEARCH ENGINE POINT OF VIEW Thank you for your time! And just by the by, this technology is still in its nascent stages. Can you imagine what I will be able to do soon? Barbara Starr Email: bstarr@AlgebraixData Twitter: @BarbaraStarr Resources to help you! Make sure to use them wisely! Remember, if you want to make the search engines happy, put yourself in their shoes! PageRank is now only 1 of over 200 signals that Google uses!
  40. 40. Resources at this point in time. Caveat: Some training may be required for some of the tools. Programming Languages: JavaScript: Microdatajs Live microdata. PHP: Microdataphp. Ruby: RDF Microdata RDF Lib plugin. PerlRuby: RDF Microdata Gem Mida. Java: Sindice any23 library. Publishing: Form Based tools: Schema Creator, Microdata generator. Standalone tools: Web.instadata. Editors: Topbraid Composer, Protege. Platforms: Drupal, Joomla, Wordpress (about 7 of them), Virtuoso, Topbraid Composer. Validators, Testers and More: Check.rdfa.info, Sindice Inspector, Rich Snippets Testing Tool, Bing Validator, Structured data Linter, Online Parser/viewer and RSS generator, Validator.nu, Google Structured Data Tester.
  41. 41. GoodRelations Resources. GoodRelations: resources, generators, validators, and more.
  42. 42. More Resources From the mouth of
  43. 43. Other Semantic Web Resources. Caveat: Some training may be required for some of the tools. OpenCalais – can extract information about people, places and things. AlchemyAPI – named entity extraction, topic recognition, keyword tagging, and more. Cogito – Expert System. Franz Inc. – Gruff. Pool Party. JSON-LD playground. YAHOO! Glimmer. Topbraid Composer. Many more. For more info contact: Barbara Starr Twitter: @BarbaraStarr Email: bstarr@algebraixdata.com Linkedin: http://www.linkedin.com/in/barbarastarr
  44. 44. By Barbara Starr Twitter: @BarbaraStarr Linkedin: http://www.linkedin.com/in/barbarastarr E-mail: bstarr@algebraixdata.com Bye for now
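
Slides 5 and 6 describe a search engine extracting structured markup directly from a page. As a rough illustration only, the sketch below pulls flat schema.org microdata out of an HTML fragment using Python's standard library. The class name, the function name and the sample product are hypothetical and not taken from the talk, and nested items (itemscope inside itemscope) are deliberately not handled.

```python
from html.parser import HTMLParser


class MicrodataExtractor(HTMLParser):
    """Collects itemprop name/value pairs from flat (non-nested) microdata."""

    def __init__(self):
        super().__init__()
        self.item_type = None      # value of the first itemtype attribute seen
        self.properties = {}       # itemprop name -> extracted value
        self._pending_prop = None  # itemprop still waiting for its text content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemtype" in attrs and self.item_type is None:
            self.item_type = attrs["itemtype"]
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if "content" in attrs:
            # e.g. <meta itemprop="sku" content="EX-123">
            self.properties[prop] = attrs["content"]
        else:
            # the value is the text that follows the opening tag
            self._pending_prop = prop

    def handle_data(self, data):
        if self._pending_prop and data.strip():
            self.properties[self._pending_prop] = data.strip()
            self._pending_prop = None


# Hypothetical page fragment, for illustration only.
SAMPLE_PAGE = """
<div itemscope itemtype="http://schema.org/Product">
  <span itemprop="name">Example Kettle</span>
  <meta itemprop="sku" content="EX-123">
  <span itemprop="description">A kettle used as sample data.</span>
</div>
"""


def extract_microdata(html):
    """Returns (item type, properties dict) for a page with one flat item."""
    parser = MicrodataExtractor()
    parser.feed(html)
    parser.close()
    return parser.item_type, parser.properties


if __name__ == "__main__":
    print(extract_microdata(SAMPLE_PAGE))
```

A production crawler would use a full microdata parser; the point here is only that the name/value pairs are machine-readable without any natural-language processing.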
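
Slide 21 advises making sure that data feeds match the structured markup on the page. One possible way to check this, sketched under the assumption that both the feed record and the page's markup have already been reduced to plain dictionaries, is a field-by-field comparison. The function name and the field names are hypothetical.

```python
def find_mismatches(feed_record, markup_properties):
    """Returns {field: (feed value, page value)} for every feed field whose
    value is missing from, or different in, the page's structured markup."""
    mismatches = {}
    for field, feed_value in feed_record.items():
        page_value = markup_properties.get(field)
        # Compare as trimmed strings so that 24.99 and "24.99 " count as equal.
        if page_value is None or str(page_value).strip() != str(feed_value).strip():
            mismatches[field] = (feed_value, page_value)
    return mismatches


if __name__ == "__main__":
    # Hypothetical data: the feed and the page disagree on the price.
    feed = {"name": "Example Kettle", "sku": "EX-123", "price": "24.99"}
    page = {"name": "Example Kettle", "sku": "EX-123", "price": "19.99"}
    print(find_mismatches(feed, page))
```

An empty result means every field in the feed is confirmed by the page; anything else is worth fixing before a search engine notices the discrepancy.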
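
Slide 31 mentions support for JSON-LD alongside microdata. A minimal sketch of producing a JSON-LD block for a schema.org type follows; the helper name and the sample person are hypothetical, and the output is just one reasonable way to embed the data in a page.

```python
import json


def build_json_ld(schema_type, properties):
    """Wraps a schema.org description in a JSON-LD <script> element."""
    document = {"@context": "https://schema.org", "@type": schema_type}
    document.update(properties)
    # Escape "</" so a property value can never close the script element early;
    # "\/" is a valid JSON escape for "/".
    body = json.dumps(document, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # Hypothetical entity, for illustration only.
    print(build_json_ld("Person", {"name": "Example Person", "jobTitle": "Editor"}))
```

Unlike microdata, the JSON-LD block sits apart from the visible HTML, which is why keeping it consistent with what human visitors see matters all the more.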
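
Slide 32 refers to the Open Graph protocol, which carries page metadata in meta elements whose property attribute starts with "og:". The sketch below shows how a consumer might read those tags with Python's standard library; the class name, the function name and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class OpenGraphExtractor(HTMLParser):
    """Collects <meta property="og:..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        if prop.startswith("og:") and "content" in attrs:
            self.tags[prop] = attrs["content"]


# Hypothetical page head, for illustration only.
SAMPLE_HEAD = """
<head>
  <title>Example page</title>
  <meta property="og:title" content="Example page">
  <meta property="og:type" content="article">
  <meta property="og:url" content="http://example.com/page">
  <meta name="description" content="Not an Open Graph tag">
</head>
"""


def extract_open_graph(html):
    """Returns a dict of Open Graph properties found in the HTML."""
    parser = OpenGraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.tags


if __name__ == "__main__":
    print(extract_open_graph(SAMPLE_HEAD))
```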