Structured Data in Web Search

Structured Data on the Web
Alon Halevy
Google
May 23, 2014
Joint work with: Jayant Madhavan, Cong Yu, Fei Wu, Hongrae Lee, Warren Shen
Anish Das Sarma, Rahul Gupta, Boulos Harb, Zack Ives, Afshin Rostamizadeh,
Sree Balakrishnan, Anno Langen, Steven Whang, Mohamed Yahya, and others

Structured Data in Search Results

Set Queries
Chicago restaurants

The Knowledge Graph
[Figure: Knowledge Graph entry for Brazil, with its capital Brasilia]

Query Reformulation
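[Figure: the Knowledge Graph entry (Brazil, Brasilia) alongside queries that ask for it in different forms: “Brazil capital”, “What is the capital of Brazil”, “Google, tell me the capital of Brazil”; and other Brazil queries: “Brazil nuts”, “Culture of Brazil”, “Google, will Brazil win the world cup?”]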
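To make query reformulation concrete, here is a toy Python sketch. It assumes a hypothetical in-memory knowledge graph and two hand-written query patterns, and it is not how Google's system works. It only illustrates how differently phrased queries can map to one (entity, attribute) lookup, while unrelated Brazil queries find no answer.

```python
import re
from typing import Optional

# Hypothetical toy knowledge graph: (entity, attribute) -> value.
KG = {("brazil", "capital"): "Brasilia"}

# Hypothetical hand-written patterns that pull (entity, attribute) out of a query.
PATTERNS = [
    # Question style: "what is the capital of brazil", "google, tell me the capital of brazil"
    re.compile(r"^(?:google,\s*)?(?:tell me\s+)?(?:what is\s+)?the\s+(?P<attr>\w+)\s+of\s+(?P<ent>\w+)\??$"),
    # Keyword style: "brazil capital"
    re.compile(r"^(?P<ent>\w+)\s+(?P<attr>\w+)$"),
]


def answer(query: str) -> Optional[str]:
    """Return a Knowledge Graph value if the query matches a known pattern, else None."""
    q = query.lower().strip().strip('"“”').strip()
    for pattern in PATTERNS:
        m = pattern.match(q)
        if m:
            value = KG.get((m.group("ent"), m.group("attr")))
            if value is not None:
                return value
    return None


for q in ["Brazil capital", "What is the capital of Brazil",
          "“Google, tell me the capital of brazil”", "Brazil nuts",
          "Culture of Brazil", "Google, will Brazil win the world cup?"]:
    print(f"{q!r} -> {answer(q)}")
```

Hand-written patterns obviously do not scale. The sketch only shows why reformulation matters: three phrasings reach the same fact, while "Brazil nuts" matches a pattern but names no known attribute.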

Other Sources of Data
[Figure: the query “Brazil capital”, the Knowledge Graph (Brazil, Brasilia), and further sources (tables, text), e.g., the sentence “The population of Brasilia is 2207718 according to the GeoNames geographical database.”]

Answer Queries Directly from Web?
[Figure: the query “Brazil capital” and the sources that could answer it: the Knowledge Graph (Brazil, Brasilia), databases, tables, and text]

The Web vs. the Knowledge Graph

Tables, Tables
[Figure: the query “Brazil capital” with the Knowledge Graph (Brazil, Brasilia), databases, tables, and text, highlighting two table projects]
• Fusion Tables: enabling a broad range of users to create tabular content
• WebTables: finding good HTML tables on the Web
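Finding good HTML tables starts with filtering, because many HTML tables are used for page layout rather than data. The sketch below is a minimal heuristic filter in that spirit. The thresholds and rules are our own assumptions, not the WebTables classifier. It takes a table already parsed into rows of cell strings.

```python
def is_number(text: str) -> bool:
    """True if the text parses as a number (thousands separators allowed)."""
    try:
        float(text.replace(",", ""))
        return True
    except ValueError:
        return False


def looks_relational(rows: list[list[str]], min_rows: int = 3, min_cols: int = 2) -> bool:
    """Hypothetical heuristic: does this grid of cell strings look like a data table?"""
    # Too few rows to be a data set.
    if len(rows) < min_rows:
        return False
    # Every row must have the same number of cells, and enough columns.
    if len({len(r) for r in rows}) != 1 or len(rows[0]) < min_cols:
        return False
    # Layout tables tend to have many empty or paragraph-length cells.
    cells = [c.strip() for r in rows for c in r]
    noisy = sum(1 for c in cells if not c or len(c) > 80)
    if noisy > len(cells) // 4:
        return False
    # Treat the first row as a header: it should hold non-empty, non-numeric labels.
    return all(h.strip() and not is_number(h) for h in rows[0])


print(looks_relational([["Country", "Capital"], ["Brazil", "Brasilia"], ["Peru", "Lima"]]))
```

A production system would learn such rules from labeled examples rather than hard-coding them. The hand-written version just makes the distinction between data tables and layout tables explicit.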

• City planning
• Sustainability: water, coffee, …
• Crisis response
• Advancing public discourse (e.g., gun control)
• Data philanthropy: corporations are encouraged
to contribute data for the good of society.

Background for Coffee Examples

Fusion Tables
google.com/fusiontables
[SIGMOD 2010, SIGMOD 2012]
• Goal: an easy-to-use database system that is
integrated with the Web.
• Key: support common workflows
– Easy upload (CSV, KML, spreadsheets)
– Sharing (even outside your company)
– Visualizations front and center
– Easy publishing
• Goal 2: Fusion in the data cloud (discover
others’ data and combine it with yours).
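Easy upload implies guessing a usable schema from raw files. As a hedged illustration, here is a CSV upload step that guesses whether each column is numeric. The function name and the two-type scheme are our assumptions, not the Fusion Tables implementation, and KML and spreadsheet input are not covered. The sample data is made up.

```python
import csv
import io


def is_number(text: str) -> bool:
    """True if the text parses as a number (thousands separators allowed)."""
    try:
        float(text.replace(",", ""))
        return True
    except ValueError:
        return False


def infer_column_types(csv_text: str) -> dict[str, str]:
    """Guess 'number' or 'text' for each column of a CSV string with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    types = {}
    for column in reader.fieldnames or []:
        # Ignore blank cells when deciding the type.
        values = [(row[column] or "").strip() for row in rows]
        values = [v for v in values if v]
        types[column] = "number" if values and all(is_number(v) for v in values) else "text"
    return types


sample = 'Country,Value\nA,"1,000"\nB,250\nC,\n'
print(infer_column_types(sample))  # {'Country': 'text', 'Value': 'number'}
```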

Big Data for Regular People
Table Facts:
English poverty rates:
32,000 wards with a total of 1.8
million vertices
Colors indicate poverty levels
2011 Rioting:
2,100 incidents
Colors indicate the addresses of
riot incidents and of rioters
Best UK Internet Journalist
Knight-Batten Award for
Innovations in Journalism

Join with Population Data:
What is a City?
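"What is a City?" is exactly where joins break: two tables may name the same place differently, or mean different geographic units by "city". The sketch below assumes hypothetical record lists and a deliberately crude name normalization. It shows a join that reports unmatched names instead of silently dropping them. It cannot settle the deeper question of whether two tables mean the same unit.

```python
def normalize_city(name: str) -> str:
    """Hypothetical normalization: lower-case, collapse spaces, drop a leading 'city of '."""
    n = " ".join(name.lower().split())
    if n.startswith("city of "):
        n = n[len("city of "):]
    return n


def join_on_city(left: list[dict], right: list[dict], key: str = "city"):
    """Inner-join two lists of records on a normalized city name.

    Returns (joined_rows, unmatched_left_names) so that failed matches stay visible.
    """
    index = {normalize_city(r[key]): r for r in right}
    joined, unmatched = [], []
    for row in left:
        match = index.get(normalize_city(row[key]))
        if match is None:
            unmatched.append(row[key])
        else:
            joined.append({**match, **row})  # left-hand values win on conflicts
    return joined, unmatched


incidents = [{"city": "City of Springfield", "incidents": 3},
             {"city": "Shelbyville", "incidents": 1}]
population = [{"city": "springfield", "population": 1000}]
print(join_on_city(incidents, population))
```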

Big Data Integration
Table Facts:
Texas Counties 2010 Census:
254 counties with 543,000 vertices
Colored by various demographics
See the SIGMOD 2012 paper for details on scaling map visualizations

Search Engine for Data Sets
research.google.com/tables
[VLDB 2008, 2011, 2014]

Long-Term Goal:
A Data-Guided Decision Engine
• Support decision making:
– Healthcare debate
– Should I install solar in my house?
– Which charity should I contribute to?
• Show relevant data
– Expose facets of the decision and enable drilldown
– Show opposing views
• Manually curated examples of decision engines:
– Justfacts.com, followthemoney.com, decide.com

HTML Lists
See Elmeleegy et al., VLDB 2009
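An HTML list often hides a table: each item packs several fields into one line of text. The cited work uses a considerably more sophisticated segmentation. The sketch below is only a naive baseline that splits items on an assumed set of delimiters and pads rows to a common width.

```python
import re

# Hypothetical delimiters: commas, semicolons, pipes, or a spaced hyphen / en dash.
SPLIT = re.compile(r"\s*[,;|]\s*|\s+[-\u2013]\s+")


def list_to_table(items: list[str]) -> list[list[str]]:
    """Split each list item into fields and pad rows to a common width."""
    rows = [[field for field in SPLIT.split(item.strip()) if field] for item in items]
    width = max((len(r) for r in rows), default=0)
    return [r + [""] * (width - len(r)) for r in rows]


print(list_to_table(["Alice, Boston | 2019", "Bob, Denver"]))
# [['Alice', 'Boston', '2019'], ['Bob', 'Denver', '']]
```

Fixed delimiters fail as soon as a field itself contains a comma, which is why real list extraction needs more than splitting.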

The Deep Web [Madhavan et al., VLDB 2008]
[Figure: example Deep Web sites: Tree Search, Amish quilts, Parking tickets in India, Horses]
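Deep Web content sits behind HTML forms. One way to reach it is to "surface" it: submit forms ahead of time and index the result pages. The sketch below assumes a hypothetical GET form whose input values are already known. The cited work is selective about which inputs and values to submit, whereas this sketch simply enumerates combinations up to a cap.

```python
from itertools import islice, product
from urllib.parse import urlencode


def surface_urls(action: str, inputs: dict[str, list[str]], max_urls: int = 100) -> list[str]:
    """Enumerate GET URLs for combinations of form-input values, up to max_urls."""
    names = sorted(inputs)
    combos = product(*(inputs[name] for name in names))
    return [f"{action}?{urlencode(dict(zip(names, values)))}"
            for values in islice(combos, max_urls)]


for url in surface_urls("https://example.com/search", {"state": ["IL", "TX"], "type": ["quilt"]}):
    print(url)
```

The cap matters because the number of combinations grows multiplicatively with each form input.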

Other Sources of Data
• Spreadsheets
• CSV files
• Tables embedded in PDF
• XML, RDF
• Visualizations
• Online databases (Fusion Tables, Tableau, …)
Each source has its particularities, but most
problems are common to all.

Data Optimized for Page Layout

Tabular Data Optimized for Site Layout
See [Ling et al., IJCAI 2013] for stitching tables within a site.
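A site may split one logical table across many pages, for example one page per region. The cited paper addresses stitching properly. The sketch below is only a naive baseline. It assumes that fragments of the same table share identical header text (after trimming and lower-casing) and live on the same host.

```python
from collections import defaultdict
from urllib.parse import urlparse


def stitch_tables(tables):
    """Group (page_url, header, rows) triples by site and normalized header.

    Each stitched row gets the source page URL appended as a final column.
    """
    groups = defaultdict(list)
    for url, header, rows in tables:
        site = urlparse(url).netloc
        key = (site, tuple(h.strip().lower() for h in header))
        for row in rows:
            groups[key].append(list(row) + [url])
    return dict(groups)


pages = [
    ("https://example.com/a", ["Name", "Price"], [["x", "1"]]),
    ("https://example.com/b", ["name", "price "], [["y", "2"]]),
    ("https://other.org/c", ["Name", "Price"], [["z", "3"]]),
]
for key, rows in stitch_tables(pages).items():
    print(key, rows)
```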

The Big Challenge
• Analyze natural language text as it pertains to
structured data.
• Different from (open) information extraction,
which builds databases entirely from text.
• Good news: natural language parsing
technology is now scalable.

Step 1: Annotating Columns
[Venetis et al., VLDB 2011]
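Annotating a column means guessing which class its cells belong to, e.g., that a column of names holds countries. The cited work relies on class labels mined at Web scale with a more principled model. The sketch below swaps in a plain majority vote over a tiny hypothetical isA dictionary and returns nothing when too few cells agree.

```python
from collections import Counter
from typing import Optional

# Hypothetical isA data: lower-cased entity -> classes it belongs to.
ISA = {
    "brazil": ("country", "football team"),
    "colombia": ("country",),
    "ethiopia": ("country",),
}


def annotate_column(cells: list[str], isa: dict, min_support: float = 0.5) -> Optional[str]:
    """Label a column with the class shared by the most cells, if enough cells agree."""
    if not cells:
        return None
    counts = Counter()
    for cell in cells:
        for cls in isa.get(cell.strip().lower(), ()):
            counts[cls] += 1
    if not counts:
        return None
    label, support = counts.most_common(1)[0]
    return label if support / len(cells) >= min_support else None


print(annotate_column(["Brazil", "Colombia", "Ethiopia"], ISA))  # country
```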

Step 2: Understanding Relationships

Dictionary of Attributes
• I want the list of all attributes that countries
may have.
• Freebase doesn’t have coffee production.
• Is this an ontology?
– Not quite! I want an ontology suited for search.
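One simple way to seed such a dictionary, assumed here purely for illustration, is to count what users ask about known entities of a class. The sketch below uses a single hypothetical pattern, "[what is] the <attribute> of <entity>", over a small made-up query list. It ignores the harder problems of synonyms, misspellings, and attribute quality.

```python
import re
from collections import Counter

# Hypothetical query pattern: "[what is] the <attribute> of <entity>".
PATTERN = re.compile(r"^(?:what is )?the (?P<attr>[a-z ]+?) of (?P<ent>[a-z ]+)$")


def mine_attributes(queries: list[str], entities: set[str]) -> Counter:
    """Count candidate attributes that queries ask about for entities of one class."""
    counts = Counter()
    for query in queries:
        m = PATTERN.match(query.lower().strip(" ?"))
        if m and m.group("ent") in entities:
            counts[m.group("attr")] += 1
    return counts


queries = ["What is the capital of Brazil?", "the coffee production of El Salvador",
           "the population of brazil", "the end of the world"]
print(mine_attributes(queries, {"brazil", "el salvador"}))
```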

Biperpedia: Ontology for Search Applications [VLDB 2014]

Comparing to Freebase Coverage

Tower of Babel: Internet Style
“In 2013, the coffee production of El Salvador dropped by 20% due to the coffee rust disease.”
[Figure: the sentence above alongside the queries “Coffee production el salvador 2013” and “El Salvador exports coffee 2013”, and the sources Knowledge Graph, tables, and text]
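A naive baseline makes the vocabulary problem concrete: score a query by the fraction of its words that literally occur in a sentence. This scorer is our illustration, not a method from the talk. The query with the exact wording gets full credit, while the "exports" query loses credit even though it arguably asks about a closely related fact.

```python
import re


def tokens(text: str) -> set[str]:
    """Lower-cased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def coverage(query: str, sentence: str) -> float:
    """Fraction of the query's tokens that literally appear in the sentence."""
    q = tokens(query)
    return len(q & tokens(sentence)) / len(q) if q else 0.0


sentence = ("In 2013, the coffee production of El Salvador dropped by 20% "
            "due to the coffee rust disease.")
for query in ["Coffee production el salvador 2013", "El Salvador exports coffee 2013"]:
    print(query, coverage(query, sentence))  # 1.0, then 0.8
```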

Conclusions
• This was a talk about Big Data:
– Millions of people creating data sets
– Billions of people seeing the data and being impacted by it
• Get out there and find your favorite
application.
• Dreams do come true:
– At least as it pertains to structured data on the
Web!

References
• Fusion Tables: SIGMOD 2010, 2012
• WebTables: VLDB 2008, 2009, 2011
