Smx advanced-william-slawski-final

  1. #SMX #15A @bill_slawski Advanced Technical SEO: Schema & Structured Data, JavaScript Schema, Structured Data & Scattered Databases Such as the World Wide Web
  2. Sergey Brin at the Web 2.0 Conference 2005. Credit: James Duncan Davidson/O'Reilly Media, Inc. Source: https://www.flickr.com/photos/x180/50329318/in/set-1076331/ Inventor of DIPRE (Dual Iterative Pattern Relation Expansion)
  3. Extracting Patterns and Relations from Scattered Databases Such as the World Wide Web* http://ilpubs.stanford.edu:8090/421/1/1999-65.pdf *A provisional patent filed by Sergey Brin on March 10, 1999
  4. The Vision Behind Brin’s DIPRE: If these chunks of information could be extracted from the World Wide Web and integrated into a structured form, they would form an unprecedented source of information.
  5. Google Maps: A Proof of Concept Semantic Database. Generating structured information http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=7,788,293.PN.&OS=PN/7,788,293&RS=PN/7,788,293
  6. Structured Data Collected About Local Entities: name, phone number, address, business hours, reservations policy, parking availability, acceptable payment options, other information
  7. Share these #SMXInsights on your social channels! 1. Brin’s 1999 DIPRE Algorithm Extracted Patterns & Relations from the Web 2. Google Maps Did the Same for Local Entities
  8. Table Search at Google https://research.google.com/tables?hl=en&ei=vKQBW8idBcLVpgOe3KXQDw&q=longest+wooden+pier+in+California Query: What is the longest wooden pier in California?
  9. The WebTables Project at Google: Because each relational table has its own “schema” of labeled and typed columns, each such table can be considered a small structured database. The resulting corpus of databases is larger than any other corpus we are aware of, by at least five orders of magnitude.
  10. The WebTables Project: Google Experimental Table Search; WebTables: Exploring the Power of Tables on the Web; Applying WebTables in Practice - Research - Google; Introducing Structured Snippets, now a part of Google Web Search
  11. Share these #SMXInsights on your social channels! 1. The WebTables Project Learns Semantics From Data Tables Across the Web 2. Relational Tables are Considered Small Structured Databases 3. Tables that do well in Table Search may lead to Structured Snippets (Needs Testing!)
  12. Using the Web as a Database: In 2005, Google published a blog post
  13. Using the Web as a Database: In 2005, Google published a blog post “With the Knowledge Graph, we’re continuing to go beyond keyword matching to better understand the people, places and things you care about.”
  14. Using the Web as a Database: In 2005, Google published a blog post From https://en.wikipedia.org/wiki/Poland
  15. Question-Answer Queries: Identifying entities using search results
  16. Schema Markup http://schema.org/TouristAttraction
  17. Schema Extensions https://www.gs1.org/1/smart-search-demo/
  18. More Schema Extensions http://www.edmcouncil.org/financialbusiness
  19. Crowdsourcing Ontologies with Biperpedia: “We describe Biperpedia, an ontology with 1.6M (class, attribute) pairs and 67K distinct attribute names. Biperpedia extracts attributes from the query stream, and then uses the best extractions to seed attribute extraction from text.” ~ Biperpedia: An Ontology for Search Applications
  20. Schema Resources 1. Semantic Search Marketing: Aaron Bradley’s Google+ Community. 2. Schema.org Extensions: Learn about how extensions work. 3. Schema.org Community Group: A place to discuss changes to Schema.
  21. Share these #SMXInsights on your social channels! 1. Schema Extensions are an opportunity for Growth in many industries 2. Ontologies built from Query Streams like Biperpedia are Crowdsourced & Optimized for Search
  22. LEARN MORE: UPCOMING @SMX EVENTS. THANK YOU! SEE YOU AT THE NEXT #SMX. Bill Slawski, SEO by the Sea, Go Fish Digital
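
The local-entity fields on slide 6 map fairly naturally onto the kind of Schema.org markup shown on slide 16. The following is a minimal Python sketch of that mapping, not something taken from the talk. It assumes the business is a restaurant (so the Schema.org `Restaurant` type and its `acceptsReservations` property apply), and the helper name `local_business_jsonld` and every business detail are hypothetical.

```python
import json


def local_business_jsonld(name, phone, street, city, region, hours,
                          takes_reservations, has_parking, payments):
    """Build a schema.org Restaurant description as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Restaurant",
        "name": name,
        "telephone": phone,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
            "addressRegion": region,
        },
        "openingHours": hours,
        "acceptsReservations": takes_reservations,
        "paymentAccepted": ", ".join(payments),
        # Parking has no dedicated property here, so it is expressed
        # as a generic amenity feature (an assumption of this sketch).
        "amenityFeature": {
            "@type": "LocationFeatureSpecification",
            "name": "Parking",
            "value": has_parking,
        },
    }
    return json.dumps(data, indent=2)


# All business details below are invented for illustration.
markup = local_business_jsonld(
    name="Example Pier Cafe",
    phone="+1-555-0100",
    street="1 Example Street",
    city="Oceanside",
    region="CA",
    hours="Mo-Su 08:00-20:00",
    takes_reservations=True,
    has_parking=False,
    payments=["Cash", "Credit Card"],
)
print('<script type="application/ld+json">')
print(markup)
print("</script>")
```

The printed `<script>` element is what would be placed in a page's HTML, so the same name, address and phone details a visitor reads are also available to search engines in structured form.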

Editor's Notes

  • My role in this session is to introduce Structured Data and Schema at Google.
  • Google is possibly best known for the PageRank algorithm, invented by founder Lawrence Page, after whom it is named. What appears to be the second patent filed by someone at Google was the DIPRE (Dual Iterative Pattern Relation Expansion) patent, invented and filed by Sergey Brin. He didn’t name it after himself (BrinRank) as Page did with PageRank.
  • The provisional patent filed for this invention was the whitepaper “Extracting Patterns and Relations from Scattered Databases Such as the World Wide Web.” The process behind it is set out in the paper, and it starts with a list of five books, with their titles, authors, publishers, and years published. Unlike PageRank, it doesn’t involve crawling webpages and indexing links from page to page along with anchor text. Instead, it involves collecting facts from page to page: when it finds pages that contain properties and attributes of these five books, it is supposed to collect similar facts about other books on the same site. Once it has finished with a site, it is supposed to move on to other sites, look for those same five books, and collect more books. The idea is to eventually know where all the books on the Web are, along with facts about those books that could be used to answer questions about them.
  • This is where we see Google being concerned about structured data on the web, and how helpful knowing about it could be.
  • When I first started out doing in-house SEO, it was for a Delaware incorporation business, and geography was an important part of the queries that my pages were found for. I had started looking at patents, and ones such as this one on “Generating Structured Information” caught my attention. It focused upon collecting data about local entities, or local businesses, and properties related to those.
  • If you’ve heard of NAP (name, address, phone number) consistency, and of mentions being important to local search, it is because local search was focused upon collecting structured data that could be used to answer questions about businesses. Patents about location prominence followed, which told us that a link counted as a mention, as did a patent on local authority, which determined which website was the authoritative one for a business. But it seemed to start with collecting structured data about businesses at places.
  • So the DIPRE algorithm focused upon crawling the web to find facts, and Google Maps built that into an approach that could be used to rank places and answer questions about them.
  • If you haven’t had a chance to use Google’s experimental table search, it is worth trying out. It can answer questions using data tables from across the web, such as “What is the longest wooden pier in California?” The answer is the one in Oceanside, a town next to the one I live in. Table search comes from the WebTables project at Google.
  • Database fields are sometimes referred to as schema, and table headers, which tell us what kind of data is in a table column, may also be referred to as “schema.” A data-based web table could be considered a small structured database, and Google’s WebTables project found that there was a lot of information to be found in tables across the Web.
  • Try out the first link above when you get the chance, and do some searches on Google’s table search. The second link is the paper that described the WebTables project when it first started out, and the one that follows it describes some of the things that Google researchers learned from the project. We’ve seen structured snippets like the one above grabbing facts to include in a snippet (in this case from a data table on the Wikipedia page about the Oceanside Pier).
  • When a data table column contains the same data as a column in another table, and the first doesn’t have a table header label, Google might learn a label for it from the second table (this is considered a way to learn semantics, or meaning, from tables). These are truly scattered databases across the World Wide Web, but through the use of crawlers, that information can be collected and become useful, as the DIPRE algorithm described.
  • In 2005, the Official Google Blog published this short story, which told us about Google sometimes answering direct questions in response to queries at the top of Web results. I don’t remember when these first started appearing, but I do remember definition results about a year earlier: you could type “Define:” followed by a word, or ask “What is” before a word, and Google would show a definition. There was a patent that described how they were finding definitions from glossary pages, and how to ideally set up those glossaries so that your definitions might be the ones that end up as responses.
  • In 2012, Google introduced the Knowledge Graph, and told us that they would be focusing upon learning about specific people, places and things, and answering questions about those, instead of just continuing to match keywords in queries to keywords in documents. They told us that this was a move to things instead of strings, like the books in Brin’s DIPRE or local entities in Google Maps.
  • We could start using the Web as a scattered database, with questions and answers from places such as Wikipedia tables helping to answer queries such as “What is the capital of Poland?”
  • And knowledge bases such as Wikipedia, Freebase, IMDb and Yahoo Finance could be the sources of facts about the properties and attributes of things such as movies, actors and businesses, where Google could find answers to queries without having to find results that had the same keywords in the document as the query.
  • In 2011, the Schema.org site was launched as a joint project from Google, Yahoo, Bing, and Yandex, providing machine-readable text that could be added to web pages. This text is meant to be read by machines only, much like XML sitemaps are, and gives search engines an alternative channel of information about the entities pages are about, and the properties and attributes on those pages.
  • While Schema was introduced in 2011, it was built to be extendable, so that subject matter experts could add new schema, like this extension from GS1 (the inventors of bar codes in brick and mortar stores). If you haven’t tried out this demo from them, it is worth getting your hands on to see what is possible.
  • Another Schema extension is one from the Financial Industry Business Ontology, which can be used on sites for banks and other organizations that provide financial services. If you are in an industry that doesn’t have much in the way of Schema developed for it, that may not be a problem as much as it might be an opportunity waiting to happen.
  • In 2014, Google published their Biperpedia paper, which tells us how they might create ontologies from query streams (sessions about specific topics) by finding terms to extract data from the Web about. At one point in time, search engines would do focused crawls of the web starting at sources such as DMOZ, so that the index of the Web they were constructing contained pages about a wide range of categories. By using query stream information, they are crowdsourcing the building of resources to build ontologies about. This paper tells us that Biperpedia enabled them to build ontologies that were larger than what they had developed through Freebase, which may be partially why Freebase was replaced by Wikidata.
  • The Google+ group I’ve linked to above has members who work on Schema from Google, such as Dan Brickley, who is the head of schema for Google. Learning about extensions is a good idea, especially if you might consider participating in building new ones, and the community group has a mailing list, which lets you see and participate in discussions about the growth of Schema.
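
The DIPRE loop described in the notes above (start from a few known books, learn how pages present them, use those patterns to find more books, repeat) can be sketched in a few lines of Python. This is a heavily simplified illustration and not Brin's implementation: it treats each line of text as one record, uses the whole line around a known pair as the pattern, and leaves out the URL handling and the safeguards against overly general patterns that the paper discusses. The function names and the sample lines are invented for illustration.

```python
import re


def learn_patterns(lines, known_pairs):
    """Return (prefix, middle, suffix) contexts in which known pairs occur."""
    patterns = set()
    for line in lines:
        for title, author in known_pairs:
            start = line.find(title)
            if start == -1:
                continue
            author_at = line.find(author, start + len(title))
            if author_at == -1:
                continue
            middle = line[start + len(title):author_at]
            if not middle:
                continue  # nothing separates the fields, so no usable pattern
            prefix = line[:start]
            suffix = line[author_at + len(author):]
            patterns.add((prefix, middle, suffix))
    return patterns


def apply_patterns(lines, patterns):
    """Return every (title, author) pair matched by any learned pattern."""
    found = set()
    for prefix, middle, suffix in patterns:
        regex = re.compile(
            re.escape(prefix) + "(.+?)" + re.escape(middle)
            + "(.+?)" + re.escape(suffix)
        )
        for line in lines:
            match = regex.fullmatch(line)
            if match:
                found.add((match.group(1), match.group(2)))
    return found


def dipre(lines, seeds, rounds=3):
    """Alternate between learning patterns and collecting pairs."""
    pairs = set(seeds)
    for _ in range(rounds):
        new_pairs = apply_patterns(lines, learn_patterns(lines, pairs))
        if new_pairs <= pairs:
            break  # nothing new was found, so stop early
        pairs |= new_pairs
    return pairs


# Two invented "sites" that format book listings differently.
lines = [
    "<li><b>Foundation</b> by Isaac Asimov</li>",
    "<li><b>Dune</b> by Frank Herbert</li>",
    "Title: Dune / Author: Frank Herbert",
    "Title: Hyperion / Author: Dan Simmons",
    "Unrelated sentence about piers in California.",
]
seeds = {("Foundation", "Isaac Asimov")}
books = dipre(lines, seeds)
print(sorted(books))
```

Starting from a single seed, the first round learns the list-item format and picks up Dune. Because Dune also appears in the second format, the next round learns that format too and picks up Hyperion, which never appeared alongside the seed.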

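The idea from the WebTables notes above, that an unlabeled column can borrow a header from a labeled column holding the same data, can also be sketched briefly. The sketch below uses Jaccard overlap between column values with a fixed threshold. That measure and threshold are assumptions made for illustration, not the method the WebTables papers describe, and the function names and sample data are hypothetical.

```python
def jaccard(a, b):
    """Share of distinct values that two columns have in common."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def infer_label(unlabeled_values, labeled_columns, threshold=0.5):
    """Return the label of the best-overlapping labeled column, or None."""
    best_label, best_score = None, 0.0
    for label, values in labeled_columns.items():
        score = jaccard(unlabeled_values, values)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


# Columns taken from an imagined table that does have headers.
labeled_columns = {
    "capital": ["Warsaw", "Berlin", "Paris", "Madrid"],
    "country": ["Poland", "Germany", "France", "Spain"],
}
# A column from another table that has no header.
mystery_column = ["Warsaw", "Paris", "Madrid", "Rome"]
label = infer_label(mystery_column, labeled_columns)
print(label)
```

Here the headerless column shares three of five distinct values with the "capital" column, which clears the threshold, so it inherits that label. A column with no meaningful overlap gets no label at all.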