#SMX #15A @bill_slawski
Advanced Technical SEO: Schema & Structured Data, JavaScript
Schema, Structured Data & Scattered Databases Such as the World Wide Web
Sergey Brin at the Web 2.0 Conference 2005.
Credit: James Duncan Davidson/O'Reilly Media, Inc.
Source: https://www.flickr.com/photos/x180/50329318/in/set-1076331/
Inventor of DIPRE (Dual Iterative Pattern Relation Expansion)
Extracting Patterns and Relations from Scattered Databases Such as the World Wide Web*
http://ilpubs.stanford.edu:8090/421/1/1999-65.pdf
*A provisional patent filed by Sergey Brin on March 10, 1999
The Vision Behind Brin’s DIPRE
If these chunks of information could be extracted from
the World Wide Web and integrated into a structured form, they would form an
unprecedented source of information.
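
The bootstrapping loop behind DIPRE can be illustrated with a minimal sketch. It is a simplification, not Brin's implementation: the paper's patterns also take URL prefixes and pattern specificity into account, which are omitted here. The two toy pages, the seed pair, the fixed "title before author" order, and the function names are all assumptions made for illustration.

```python
import re

def find_patterns(pages, known, ctx=4, max_gap=20):
    """Turn each occurrence of a known (author, title) pair into a
    (prefix, middle, suffix) text pattern. Assumes title precedes author."""
    patterns = set()
    for page in pages:
        for author, title in known:
            start = page.find(title)
            while start != -1:
                title_end = start + len(title)
                a = page.find(author, title_end)
                if a != -1 and a - title_end <= max_gap:
                    prefix = page[max(0, start - ctx):start]
                    middle = page[title_end:a]
                    suffix = page[a + len(author):a + len(author) + ctx]
                    if prefix and middle and suffix:
                        patterns.add((prefix, middle, suffix))
                start = page.find(title, start + 1)
    return patterns

def apply_patterns(pages, patterns):
    """Use each pattern to pull new (author, title) pairs out of the pages."""
    found = set()
    for prefix, middle, suffix in patterns:
        regex = re.compile(
            re.escape(prefix) + r"(.+?)" + re.escape(middle)
            + r"(.+?)" + re.escape(suffix)
        )
        for page in pages:
            for title, author in regex.findall(page):
                found.add((author, title))
    return found

def dipre(pages, seeds, rounds=3):
    """Alternate between learning patterns and extracting pairs."""
    known = set(seeds)
    for _ in range(rounds):
        new = apply_patterns(pages, find_patterns(pages, known))
        if new <= known:  # nothing new was learned, so stop
            break
        known |= new
    return known

# Toy corpus: two "sites" that format their book lists differently.
pages = [
    "<li><b>Foundation</b> by Isaac Asimov</li>\n"
    "<li><b>Dune</b> by Frank Herbert</li>\n"
    "<li><b>Emma</b> by Jane Austen</li>",
    "<tr><td>Dune</td><td>Frank Herbert</td></tr>\n"
    "<tr><td>Neuromancer</td><td>William Gibson</td></tr>",
]
seeds = {("Isaac Asimov", "Foundation")}
books = dipre(pages, seeds)
print(sorted(books))
```

Starting from one seed, the first round learns the list-item pattern and picks up the other books on the first page; the second round finds one of those books on the second page, learns the table pattern there, and collects a book the first page never mentioned.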
Google Maps: A Proof of Concept Semantic Database
Generating structured information
http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=7,788,293.PN.&OS=PN/7,788,293&RS=PN/7,788,293
Structured Data Collected About Local Entities
•Name,
•Phone number,
•Address,
•Business hours,
•Reservations policy,
•Parking availability,
•Acceptable payment options,
•Other information
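
As a rough illustration of why consistent structured data about a local entity matters, the sketch below normalizes the name, address, and phone number (NAP) from several mentions and checks whether they agree. This is not how Google processes local data; the `Mention` class, the normalization rules, and the sample business are all hypothetical.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Mention:
    """One mention of a business found somewhere on the web."""
    name: str
    address: str
    phone: str

def normalize(m):
    """Reduce a mention to a canonical form so trivial differences vanish."""
    name = re.sub(r"\s+", " ", m.name).strip().lower()
    address = re.sub(r"[^\w\s]", "", m.address).lower()  # drop punctuation
    address = re.sub(r"\s+", " ", address).strip()
    phone = re.sub(r"\D", "", m.phone)                   # keep digits only
    return Mention(name, address, phone)

def nap_consistent(mentions):
    """True when every mention normalizes to the same name/address/phone."""
    return len({normalize(m) for m in mentions}) == 1

# Hypothetical mentions of the same made-up business.
a = Mention("Joe's Diner", "12 Main St., Dover, DE", "(302) 555-0100")
b = Mention("joe's  diner", "12 Main St, Dover DE", "302-555-0100")
c = Mention("Joe's Diner", "12 Main St., Dover, DE", "(302) 555-0199")

print(nap_consistent([a, b]))     # same business, different formatting
print(nap_consistent([a, b, c]))  # the third mention has another phone
```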
Share these #SMXInsights on your social channels!
1. Brin’s 1999 DIPRE Algorithm Extracted Patterns & Relations from the Web
2. Google Maps Did the Same for Local Entities
Table Search at Google
https://research.google.com/tables?hl=en&ei=vKQBW8idBcLVpgOe3KXQDw&q=longest+wooden+pier+in+California
Query: What is the longest Wooden Pier in California?
The WebTables Project at Google
Because each relational table has its own “schema” of labeled and typed
columns, each such table can be considered a small structured
database.
The resulting corpus of databases is larger
than any other corpus we are aware of, by at least five orders
of magnitude.
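
To make the "each table is a small structured database" idea concrete, here is a minimal sketch that reads an HTML table, treats the header row as the schema, and answers a "longest" question from the rows. The `TableParser` class is an illustration built on Python's standard `html.parser`; the pier names and lengths are made-up placeholder values, not real data, and this is not the WebTables pipeline.

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collect a simple HTML table as a header list plus a list of row dicts."""

    def __init__(self):
        super().__init__()
        self.header = []   # column labels: the table's "schema"
        self.rows = []     # one dict per data row
        self._row = None   # cells of the row being read, as (text, tag)
        self._tag = None   # "th" or "td" while inside a cell
        self._cell = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td"):
            self._tag, self._cell = tag, []

    def handle_data(self, data):
        if self._tag:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._tag == tag:
            if self._row is not None:
                self._row.append(("".join(self._cell).strip(), tag))
            self._tag = None
        elif tag == "tr" and self._row is not None:
            cells = [text for text, _ in self._row]
            if all(kind == "th" for _, kind in self._row):
                self.header = cells
            else:
                self.rows.append(dict(zip(self.header, cells)))
            self._row = None

# Placeholder table; names and numbers are invented for the example.
html = (
    "<table>"
    "<tr><th>Pier</th><th>Length (ft)</th></tr>"
    "<tr><td>Pier A</td><td>1200</td></tr>"
    "<tr><td>Pier B</td><td>1900</td></tr>"
    "</table>"
)
parser = TableParser()
parser.feed(html)
longest = max(parser.rows, key=lambda row: int(row["Length (ft)"]))
print(parser.header, longest["Pier"])
```

Once the header row is read as a schema, a question like "which is the longest" becomes an ordinary query over typed columns.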
The WebTables Project
Google Experimental Table Search
WebTables: Exploring the Power of Tables on the Web
Applying WebTables in Practice - Research - Google
Introducing Structured Snippets, now a part of Google Web Search
Share these #SMXInsights on your social channels!
1. The WebTables Project Learns Semantics From Data Tables Across the Web
2. Relational Tables are Considered Small Structured Databases
3. Tables that do well in Table Search may lead to Structured Snippets (Needs Testing!)
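
One way a table column without a header might pick up a label from a labeled column elsewhere is by comparing the values the two columns share. The sketch below uses Jaccard overlap for that. It is an assumption about how such matching could work, not the method used in the WebTables project; the function name, threshold, and sample columns are illustrative.

```python
def infer_label(unlabeled_values, labeled_columns, threshold=0.5):
    """Return the label of the labeled column whose values overlap most
    with the unlabeled column, or None if no overlap reaches the threshold."""
    target = {v.strip().lower() for v in unlabeled_values}
    best_label, best_score = None, 0.0
    for label, values in labeled_columns.items():
        other = {v.strip().lower() for v in values}
        union = target | other
        score = len(target & other) / len(union) if union else 0.0  # Jaccard
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None

# Labeled columns seen in other tables (sample values).
labeled = {
    "Capital": ["Paris", "Berlin", "Warsaw", "Rome"],
    "Country": ["Poland", "France", "Germany"],
}

print(infer_label(["Warsaw", "Paris", "Berlin"], labeled))
print(infer_label(["Red", "Green", "Blue"], labeled))
```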
Using the Web as a Database
In 2005, Google published a blog post
“With the Knowledge Graph, we’re continuing to go beyond keyword matching to better understand the people, places and things you care about.”
From https://en.wikipedia.org/wiki/Poland
Question-Answer Queries
Identifying entities using search results
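
The difference between matching keywords and answering from facts about entities can be shown with a very small sketch. The fact store, its single entry, and the matching rule are assumptions for illustration only; a real system would need entity recognition and far richer query understanding.

```python
# A tiny store of (entity, attribute) -> value facts, e.g. taken from a table.
FACTS = {
    ("poland", "capital"): "Warsaw",
}

def answer(query):
    """Return a stored value when the query names a known entity and attribute,
    regardless of how the rest of the query is worded."""
    words = query.lower().rstrip("?").split()
    for (entity, attribute), value in FACTS.items():
        if entity in words and attribute in words:
            return value
    return None

print(answer("What is the capital of Poland?"))
print(answer("poland capital"))
print(answer("What is the capital of Mars?"))
```

Both phrasings of the Poland question resolve to the same fact because the lookup is keyed on the entity and the attribute, not on the exact string of the query.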
Schema Markup
http://schema.org/TouristAttraction
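
As a minimal sketch of what Schema markup for this type can look like, the snippet below builds a JSON-LD block for a `TouristAttraction` and wraps it in a script tag. The `@context`, `@type`, `name`, `description`, and `url` keys are standard Schema.org vocabulary; the helper function and the attraction itself (name, description, URL) are placeholders invented for the example.

```python
import json

def tourist_attraction_jsonld(name, description, url):
    """Build a JSON-LD script block describing a TouristAttraction."""
    data = {
        "@context": "https://schema.org",
        "@type": "TouristAttraction",
        "name": name,
        "description": description,
        "url": url,
    }
    return (
        '<script type="application/ld+json">\n'
        + json.dumps(data, indent=2)
        + "\n</script>"
    )

# Placeholder values, not a real attraction.
snippet = tourist_attraction_jsonld(
    "Example Pier",
    "A placeholder wooden pier used for illustration.",
    "https://example.com/pier",
)
print(snippet)
```

The block is meant for machines rather than visitors: it sits in the page source and states which entity the page is about and which properties it has.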
Schema Extensions
https://www.gs1.org/1/smart-search-demo/
More Schema Extensions
http://www.edmcouncil.org/financialbusiness
Crowdsourcing Ontologies with Biperpedia
We describe Biperpedia, an ontology with 1.6M (class, attribute)
pairs and 67K distinct attribute names.
Biperpedia extracts attributes from the query stream, and then uses
the best extractions to seed attribute extraction from text.
~ Biperpedia: An Ontology for Search Applications
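
A rough sketch of mining (class, attribute) pairs from a query stream is shown below. It assumes a single "attribute of entity" query pattern and a tiny hand-made entity-to-class lookup; both are illustrative simplifications and should not be read as the extraction method the Biperpedia paper actually uses.

```python
import re
from collections import Counter

# Hypothetical lookup from entity names to classes.
ENTITY_CLASS = {"poland": "Country", "france": "Country", "dune": "Book"}

# Matches queries such as "capital of poland" or "what is the capital of poland".
PATTERN = re.compile(
    r"^(?:what is the\s+)?(?P<attr>[\w ]+?)\s+of\s+(?P<entity>[\w ]+)$"
)

def attribute_counts(queries):
    """Count how often each (class, attribute) pair shows up in the queries."""
    counts = Counter()
    for query in queries:
        match = PATTERN.match(query.strip().lower().rstrip("?"))
        if not match:
            continue
        cls = ENTITY_CLASS.get(match.group("entity").strip())
        if cls:
            counts[(cls, match.group("attr").strip())] += 1
    return counts

queries = [
    "What is the capital of Poland?",
    "capital of France",
    "population of Poland",
    "author of Dune",
    "cheap flights to Paris",
]
counts = attribute_counts(queries)
print(counts.most_common())
```

Attributes that many searchers ask about for the same class rise to the top of the counts, which is the sense in which the query stream crowdsources the ontology.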
Schema Resources
1. Semantic Search Marketing – Aaron Bradley’s Google+ Community.
2. Schema.org Extensions – Learn about how extensions work
3. Schema.org Community Group – A place to discuss changes to Schema
Share these #SMXInsights on your social channels!
1. Schema Extensions are an opportunity for Growth in many industries
2. Ontologies built from Query Streams like Biperpedia are Crowdsourced & Optimized for Search
THANK YOU!
SEE YOU AT THE NEXT #SMX
Bill Slawski
SEO by the Sea
Go Fish Digital
#SMX #15A
@bill_slawski

Smx advanced-william-slawski-final


Editor's Notes

  • #2 My role in this session is to introduce Structured Data and Schema at Google.
  • #3 Google is possibly best known for the PageRank algorithm, invented by and named after founder Lawrence Page. What appears to be the second patent filed by someone at Google was the DIPRE (Dual Iterative Pattern Relation Expansion) patent, invented and filed by Sergey Brin. He didn’t name it after himself (“BrinRank”) the way Page did with PageRank.
  • #4 The provisional patent filed for this invention was the whitepaper, “Extracting Patterns and Relations from Scattered Databases such as the World Wide Web.” The process behind it is set out in the paper, and it starts with a list of five books, with their titles, authors, publishers, and years published. Unlike PageRank, it doesn’t involve crawling webpages and indexing links from page to page along with their anchor text. Instead, it involves collecting facts from page to page: when it finds pages that contain properties and attributes of these five books, it is supposed to collect similar facts about other books on the same site. Once it has finished there, it is supposed to move on to other sites, look for those same five books, and collect more books. The idea is to eventually know where all the books on the Web are, along with facts about those books that could be used to answer questions about them.
  • #5 This is where we see Google being concerned about structured data on the web, and how helpful knowing about it could be.
  • #6 When I first started out doing in-house SEO, it was for a Delaware incorporation business, and geography was an important part of the queries that my pages were found for. I had started looking at patents, and ones such as this one on “Generating Structured Information” caught my attention. It focused upon collecting data about local entities, or local businesses, and properties related to those.
  • #7 If you’ve heard of NAP consistency, and of mentions being important to local search, it is because local search was focusing upon collecting structured data that could be used to answer questions about businesses. Patents about location prominence followed, which told us that a link counted as a mention, along with a patent on local authority, which determined which website was the authoritative one for a business. But it seemed to start with collecting structured data about businesses at places.
  • #8 So, the DIPRE algorithm focused upon crawling the web to find facts, and Google Maps built that into an approach that could be used to rank places and answer questions about them.
  • #9 If you haven’t had a chance to use Google’s experimental table search, it is worth trying out. It answers questions using data tables from across the web, such as “what is the longest wooden pier in California”, which is the one in Oceanside, a town next to the one I live in. It comes out of the WebTables project at Google.
  • #10 Database fields are sometimes referred to as schema, and table headers, which tell us what kind of data is in a table column, may also be referred to as “schema”. A data table on the web could be considered a small structured database, and Google’s WebTables project found that there was a lot of information to be found in tables across the Web.
  • #11 Try out the first link above when you get the chance, and do some searches on Google’s table search. The second paper is one that described the WebTables project when it first started out, and the one that follows it describes some of the things that Google researchers learned from the project. We’ve seen Structured Snippets like the one above grabbing facts to include in a snippet (in this case from a data table on the Wikipedia page about the Oceanside Pier).
  • #12 When a data table column contains the same data that a column in another table contains, and the first doesn’t have a table header label, it might learn a label from the second table (this is considered a way to learn semantics, or meaning, from tables). These are truly scattered databases across the World Wide Web, but through the use of crawlers, that information can be collected and become useful, as the DIPRE algorithm described.
  • #13 In 2005, the Official Google Blog published this short post, which told us about Google sometimes answering direct questions at the top of Web results in response to queries. I don’t remember when these first started appearing, but I do remember definition results from about a year earlier: you could type “Define:” followed by a word, or ask “What is” before a word, and Google would show a definition. There was also a patent that described how they were finding definitions on glossary pages, and how to ideally set up those glossaries so that your definitions might be the ones that end up as responses.
  • #14 In 2012, Google introduced the Knowledge Graph, which told us that they would be focusing upon learning about specific people, places and things, and answering questions about those instead of just continuing to match keywords in queries to keywords in documents. They told us that this was a move to things instead of strings, like the books in Brin’s DIPRE or the local entities in Google Maps.
  • #15 We could start using the Web as a scattered database, with questions and answers from places such as Wikipedia tables helping to answer queries such as “What is the capital of Poland”.
  • #16 And knowledge bases such as Wikipedia, Freebase, IMDB and Yahoo Finance could be the sources of facts about the properties and attributes of things such as movies, actors and businesses. Google could find answers to queries there without having to find results that had the same keywords in the document as in the query.
  • #17 In 2011, the Schema.org site was launched as a joint project from Google, Yahoo, Bing, and Yandex that provided machine-readable text that could be added to web pages. This text is intended for machines only, much like XML sitemaps are, and it provides an alternative channel of information to search engines about the entities pages are about, and the properties and attributes on those pages.
  • #18 While Schema was introduced in 2011, it was built to be extendable, and to let subject matter experts add new schema, like this extension from GS1 (the inventors of bar codes in brick and mortar stores). If you haven’t tried out this demo from them, it is worth getting your hands on to see what is possible.
  • #19 Another Schema extension is one from the Financial Industry Business Ontology, which can be used on sites for banks and other organizations that provide financial services. If you are in an industry that doesn’t have much in the way of Schema developed for it, that may not be a problem as much as it might be an opportunity waiting to happen.
  • #20 In 2014, Google published their Biperpedia paper, which tells us how they might create ontologies from query streams (sessions about specific topics) by finding terms about which to extract data from the Web. At one point in time, search engines would do focused crawls of the web starting at sources such as DMOZ, so that the index of the Web they were constructing contained pages about a wide range of categories. By using query stream information, they are in effect crowdsourcing the building of ontologies. This paper tells us that Biperpedia enabled them to build ontologies that were larger than what they had developed through Freebase, which may be partially why Freebase was replaced by Wikidata.
  • #21 The Google+ group I’ve linked to above has members who work on Schema from Google, such as Dan Brickley, who is the head of schema for Google. Learning about extensions is a good idea, especially if you might consider participating in building new ones, and the community group has a mailing list, which lets you see and participate in discussions about the growth of Schema.