SlideShare a Scribd company logo
1 of 25
Download to read offline
Slide 1 
International Semantic Web Conference 
Riva del Garda, Italy, 22.10.2014 
Semantic Web Challenge – Big Data Track 
Extending Tables with Data from 
over a Million Websites 
Oliver Lehmberg, Dominique Ritze, Petar Ristoski, 
Kai Eckert, Heiko Paulheim, Christian Bizer
Slide 2 
Extend a local table with additional columns 
using different types of Web data. 
Region 
Un-employment 
Alsace 11 % 
Lorraine 12 % 
Guadeloupe 28 % 
Centre 10 % 
Martinique 25 % 
GDP 
per Capita 
Population 
Growth 
45.914 € 0,16 % 
51.233 € -0,05 % 
19.810 € 1,34 % 
59.502 € 1,76 % 
NULL 2,64 % 
+ 
Goal
Slide 3 
Operation 1: Extend Local Table with Single Column 
Given a local table and keywords describing the extension 
column, add the extension column to the table and fill it 
with data from the Web. 
„GDP per Capita“ 
Region Unemployment 
Alsace 11 % 
Lorraine 12 % 
Guadeloupe 28 % 
Centre 10 % 
Martinique 25 % 
… … 
GDP per Capita 
45.914 € 
51.233 € 
19.810 € 
59.502 € 
21,527 € 
… 
+
Slide 4 
Operation 2: Extend Local Table with Many Columns 
Given a local table, add all columns to the table that can 
be filled beyond a density threshold. 
Region Unemp. 
Rate 
Alsace 11 % 
Lorraine 12 % 
Guadeloupe 28 % 
Centre 10 % 
Martinique 25 % 
… … 
GDP 
per Capita 
Population 
Growth 
Overseas 
departments 
… 
45.914 € 0,16 % No … 
51.233 € -0,05 % No … 
19.810 € 1,34 % Yes … 
59.502 € NULL NULL … 
NULL 2,64 % Yes … 
… … … 
+ 
density >= 0.8
Slide 5 
Types of Web Data Used 
Microdata 
(schema.org) 
Wiki Tables 
HTML Tables 
Linked Data
Slide 6 
Billion Triple Challenge Dataset 2014 
4 billion triples crawled from 47,000 websites.
Slide 7 
Web Data Commons - Microdata Corpus 
250 million triples from 463,000 websites. 
 Extracted from Common Crawl 2013 web corpus 
 2.2 billion HTML pages from 12.8 million websites 
 Mostly using the schema.org vocabulary 
 Main topics 
 Products 
 Reviews 
 Organisations / LocalBusiness 
 Events 
Download: http://webdatacommons.org/structureddata/
Slide 8 
Web Data Commons – Web Tables Corpus 
Around 1% of all HTML tables contain structured data. 
 we used 35 million English HTML tables. 
 extracted from the Common Crawl 2012 web corpus 
 selected out of 11.2 billion raw tables
Slide 9 
Web Data Commons – Web Tables Corpus 
 Column Statistics 
Column #Tables 
name 4,600,000 
price 3,700,000 
date 2,700,000 
artist 2,100,000 
location 1,200,000 
year 1,000,000 
manufacturer 375,000 
counrty 340,000 
isbn 99,000 
area 95,000 
population 86,000 
 Subject Column Values 
Value #Rows 
usa 135,000 
germany 91,000 
greece 42,000 
new york 59,000 
london 37,000 
athens 11,000 
david beckham 3,000 
ronaldinho 1,200 
oliver kahn 710 
twist shout 2,000 
yellow submarine 1,400 
Download: http://webdatacommons.org/webtables/
Slide 10 
WikiTables 
1.4 million tables from English Wikipedia. 
 extracted by Northwestern University 
 from the 2013 Wikipedia XML dump 
 only tables, no infoboxes 
Download: http://downey-n1.cs.northwestern.edu/public/
Slide 11 
Internal Data Model: Entity-Attributes-Tables 
 One entity per row 
 Subject Column = Name of the entity 
 HTML tables: Most unique string column, break ties by taking leftmost. 
Rank Film Studio Director Length 
1. Star Wars –Episode 1 Lucasfilm George Lucas 121 min 
2. Alien Brandwine Ridley Scott 117 min 
3. Black Moon NEF Louis Malle 100 min 
 Table generation from Linked Data and Microdata 
 generate one table per class and website 
 subject column: rdfs:label, foaf:name, x:name 
 we exploit common vocabularies
Slide 12 
Indexed Tables 
 Selection Conditions: 
1. Minimum size of 3 columns and 5 rows 
2. Subject column detection successful 
 Total # of tables: 36.3 million 
 Total # of PLDs: ~ 1.5 million 
 Total # of triples: 3.0 billion
Slide 13 
The Mannheim Search Joins Engine (MSJE) 
Collection of tables Table Normalization 
Table Storage Table Index 
1. Table 
Indexing 
Input query table Table Preprocessing 
Search 
2. Table 
Search 
3. Data 
Consolidation 
Data collection 
User Preferences 
Consolidation 
MultiJoin Top k Candidates
Slide 14 
The Search Operator 
The Search operator determines the set of relevant Web tables. 
 Table Ranking 
 subject column value overlap 
 extended Jaccard Similarity (FastJoin) 
 Select TopK Tables 
 1000 tables in the single column experiments 
Relevant
Slide 15 
Multi-Join Operator 
The MultiJoin operator performs a series of left-outer joins 
between the query table and all tables in the input set. 
No. Region 
1 Alsace 
2 Lorraine 
3 Guadeloupe 
4 Centre 
Unemploy 
11 % 
12 % 
28 % 
10 % 
Unemploy 
NULL 
NULL 
NULL 
9.4 % 
GDP 
45.914 € 
51.233 € 
NULL 
NULL 
GDP per C 
45.000 € 
NULL 
19.000 € 
59.500 €
Slide 16 
Consolidation Operator 
The consolidation operator merges corresponding columns 
and fuses values in order to return a concise result table. 
 Column Matching 
 Combination of label- and instance-based techniques 
 Conflict Resolution 
 Strings: majority vote 
 Numeric values: average, 
median, clustering and vote 
No Region Unemploy GDP 
1 Alsace 11 % 45.914 € 
2 Lorraine 12 % 51.233 € 
3 Guadelo 
upe 
28 % 19.000 € 
4 Centre 10 % 59.500 €
Slide 17 
http://searchjoins.webdatacommons.org
Slide 18 
Result: Extend with Single Column
Slide 19 
Provenance Summary
Slide 20 
Provenance Details
Slide 21 
Evaluation Results 
Author Head‐quarter 
Industry Area Capital Code Currency Popu‐lation 
Ingre‐dient 
Cast Director Genre Year Artist Team 
Book Company Country Drug Film Song Soccer 
Player 
100% 
80% 
60% 
40% 
20% 
0% 
coverage 93% 94% 94% 100% 100% 100% 94% 100% 87% 94% 97% 97% 96% 99% 88% 
precision 96% 96% 94% 95% 100% 94% 96% 64% 89% 85% 97% 86% 97% 95% 67% 
Coverage: Percentage of entities for which a value was found. 
Precision: Manually evaluated using Wikipedia, IMDB, Amazon.
Slide 22 
Result: Extend with Many Columns 
505 columns are added 
and filled with data from 2071 tables.
Slide 23 
Provenance Summary
Slide 24 
Provenance Details for “area (sq. km)”
Search Joins bring together Web Search and DB Joins. 
Slide 25 
Conclusion 
 The prototype shows that simple queries are feasible. 
 The Web is one application domain for search joins, 
corporate intranets are the other. 
 The overlooked Big Data Vs: Variety and Veracity

More Related Content

What's hot

2013 open analytics-meetup-mortar
2013 open analytics-meetup-mortar2013 open analytics-meetup-mortar
2013 open analytics-meetup-mortarOpen Analytics
 
Graph Structure in the Web - Revisited. WWW2014 Web Science Track
Graph Structure in the Web - Revisited. WWW2014 Web Science TrackGraph Structure in the Web - Revisited. WWW2014 Web Science Track
Graph Structure in the Web - Revisited. WWW2014 Web Science TrackChris Bizer
 
Building an open democracy with open data +
Building an open democracy with open data + Building an open democracy with open data +
Building an open democracy with open data + Irina Bolychevsky
 
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?Martin Hepp
 
How links can make your open data even greater
How links can make your open data even greaterHow links can make your open data even greater
How links can make your open data even greaterCristina Sarasua
 
[Databeers] 06/05/2014 - Boris Villazon: “Data Integration - A Linked Data ap...
[Databeers] 06/05/2014 - Boris Villazon: “Data Integration - A Linked Data ap...[Databeers] 06/05/2014 - Boris Villazon: “Data Integration - A Linked Data ap...
[Databeers] 06/05/2014 - Boris Villazon: “Data Integration - A Linked Data ap...Data Beers
 
Uk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcaseUk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcaseRDTF-Discovery
 
The Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open DataThe Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open DataOntotext
 
Putting the L in front: from Open Data to Linked Open Data
Putting the L in front: from Open Data to Linked Open DataPutting the L in front: from Open Data to Linked Open Data
Putting the L in front: from Open Data to Linked Open DataMartin Kaltenböck
 
The methods and practices of Linked Open Data
The methods and practices of Linked Open DataThe methods and practices of Linked Open Data
The methods and practices of Linked Open DataDongpo Deng
 
Registration / Certification Interoperability Architecture (overlay peer-review)
Registration / Certification Interoperability Architecture (overlay peer-review)Registration / Certification Interoperability Architecture (overlay peer-review)
Registration / Certification Interoperability Architecture (overlay peer-review)Herbert Van de Sompel
 
Web Data Extraction: A Crash Course
Web Data Extraction: A Crash CourseWeb Data Extraction: A Crash Course
Web Data Extraction: A Crash CourseGiorgio Orsi
 
20180226 data driven smart governance
20180226 data driven smart governance20180226 data driven smart governance
20180226 data driven smart governanceDongpo Deng
 
MuseoTorino, first italian project using a GraphDB, RDFa, Linked Open Data
MuseoTorino, first italian project using a GraphDB, RDFa, Linked Open DataMuseoTorino, first italian project using a GraphDB, RDFa, Linked Open Data
MuseoTorino, first italian project using a GraphDB, RDFa, Linked Open Data21Style
 
Data.dcs: Converting Legacy Data into Linked Data
Data.dcs: Converting Legacy Data into Linked DataData.dcs: Converting Legacy Data into Linked Data
Data.dcs: Converting Legacy Data into Linked DataMatthew Rowe
 
Linked data experience at Macmillan: Building discovery services for scientif...
Linked data experience at Macmillan: Building discovery services for scientif...Linked data experience at Macmillan: Building discovery services for scientif...
Linked data experience at Macmillan: Building discovery services for scientif...Michele Pasin
 

What's hot (20)

Linked data life cycles
Linked data life cyclesLinked data life cycles
Linked data life cycles
 
2013 open analytics-meetup-mortar
2013 open analytics-meetup-mortar2013 open analytics-meetup-mortar
2013 open analytics-meetup-mortar
 
The Semantic Data Web, Sören Auer, University of Leipzig
The Semantic Data Web, Sören Auer, University of LeipzigThe Semantic Data Web, Sören Auer, University of Leipzig
The Semantic Data Web, Sören Auer, University of Leipzig
 
Graph Structure in the Web - Revisited. WWW2014 Web Science Track
Graph Structure in the Web - Revisited. WWW2014 Web Science TrackGraph Structure in the Web - Revisited. WWW2014 Web Science Track
Graph Structure in the Web - Revisited. WWW2014 Web Science Track
 
Building an open democracy with open data +
Building an open democracy with open data + Building an open democracy with open data +
Building an open democracy with open data +
 
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
 
How links can make your open data even greater
How links can make your open data even greaterHow links can make your open data even greater
How links can make your open data even greater
 
[Databeers] 06/05/2014 - Boris Villazon: “Data Integration - A Linked Data ap...
[Databeers] 06/05/2014 - Boris Villazon: “Data Integration - A Linked Data ap...[Databeers] 06/05/2014 - Boris Villazon: “Data Integration - A Linked Data ap...
[Databeers] 06/05/2014 - Boris Villazon: “Data Integration - A Linked Data ap...
 
Uk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcaseUk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcase
 
WCIT2010
WCIT2010WCIT2010
WCIT2010
 
The Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open DataThe Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open Data
 
Putting the L in front: from Open Data to Linked Open Data
Putting the L in front: from Open Data to Linked Open DataPutting the L in front: from Open Data to Linked Open Data
Putting the L in front: from Open Data to Linked Open Data
 
The methods and practices of Linked Open Data
The methods and practices of Linked Open DataThe methods and practices of Linked Open Data
The methods and practices of Linked Open Data
 
Registration / Certification Interoperability Architecture (overlay peer-review)
Registration / Certification Interoperability Architecture (overlay peer-review)Registration / Certification Interoperability Architecture (overlay peer-review)
Registration / Certification Interoperability Architecture (overlay peer-review)
 
Web Data Extraction: A Crash Course
Web Data Extraction: A Crash CourseWeb Data Extraction: A Crash Course
Web Data Extraction: A Crash Course
 
20180226 data driven smart governance
20180226 data driven smart governance20180226 data driven smart governance
20180226 data driven smart governance
 
MuseoTorino, first italian project using a GraphDB, RDFa, Linked Open Data
MuseoTorino, first italian project using a GraphDB, RDFa, Linked Open DataMuseoTorino, first italian project using a GraphDB, RDFa, Linked Open Data
MuseoTorino, first italian project using a GraphDB, RDFa, Linked Open Data
 
Data.dcs: Converting Legacy Data into Linked Data
Data.dcs: Converting Legacy Data into Linked DataData.dcs: Converting Legacy Data into Linked Data
Data.dcs: Converting Legacy Data into Linked Data
 
Linking Open Data
Linking Open DataLinking Open Data
Linking Open Data
 
Linked data experience at Macmillan: Building discovery services for scientif...
Linked data experience at Macmillan: Building discovery services for scientif...Linked data experience at Macmillan: Building discovery services for scientif...
Linked data experience at Macmillan: Building discovery services for scientif...
 

Viewers also liked

Infographic - Food and Beverage Barcode Labeling
Infographic - Food and Beverage Barcode LabelingInfographic - Food and Beverage Barcode Labeling
Infographic - Food and Beverage Barcode LabelingLoftware
 
The Role of Technology in Food Processing Compliance and Traceability
The Role of Technology in Food Processing Compliance and TraceabilityThe Role of Technology in Food Processing Compliance and Traceability
The Role of Technology in Food Processing Compliance and TraceabilityBlytheco
 
Open Knowledge Repositories: Enablers of Data Integration across Business Col...
Open Knowledge Repositories: Enablers of Data Integration across Business Col...Open Knowledge Repositories: Enablers of Data Integration across Business Col...
Open Knowledge Repositories: Enablers of Data Integration across Business Col...Monika Solanki
 
Realising the Potential of Algal Biomass Production through Semantic Web an...
Realising the Potential of Algal Biomass Production   through Semantic Web an...Realising the Potential of Algal Biomass Production   through Semantic Web an...
Realising the Potential of Algal Biomass Production through Semantic Web an...Monika Solanki
 
Building Ontologies for Algal Biomass Operations 2012
Building Ontologies for Algal Biomass Operations 2012Building Ontologies for Algal Biomass Operations 2012
Building Ontologies for Algal Biomass Operations 2012Monika Solanki
 
Querying Linked Data and Büchi automata
Querying Linked Data and Büchi automataQuerying Linked Data and Büchi automata
Querying Linked Data and Büchi automataKonstantinos Giannakis
 
From Biomass to Energy via Semantic Web and Linked data
From Biomass to Energy via Semantic Web and Linked dataFrom Biomass to Energy via Semantic Web and Linked data
From Biomass to Energy via Semantic Web and Linked dataMonika Solanki
 
The potential role of open data in supply chain integration
The potential role of open data in supply chain integrationThe potential role of open data in supply chain integration
The potential role of open data in supply chain integrationChristopher Brewster
 
The Internet of Lettuces: Legibility, Data and Alternative Food Networks
The Internet of Lettuces: Legibility, Data and Alternative Food NetworksThe Internet of Lettuces: Legibility, Data and Alternative Food Networks
The Internet of Lettuces: Legibility, Data and Alternative Food NetworksChristopher Brewster
 
Representing Supply Chain Events on the Web of Data
Representing Supply Chain Events on the Web of DataRepresenting Supply Chain Events on the Web of Data
Representing Supply Chain Events on the Web of DataMonika Solanki
 
The curious case of Blockchain Technology
The curious case of Blockchain TechnologyThe curious case of Blockchain Technology
The curious case of Blockchain TechnologyRitesh Mehrotra
 
Linked data driven EPCIS Event-based Traceability across Supply chain busine...
Linked data driven EPCIS Event-based Traceability across  Supply chain busine...Linked data driven EPCIS Event-based Traceability across  Supply chain busine...
Linked data driven EPCIS Event-based Traceability across Supply chain busine...Monika Solanki
 
Detecting EPCIS exceptions in linked traceability streams across supply cha...
Detecting   EPCIS exceptions in linked traceability streams across supply cha...Detecting   EPCIS exceptions in linked traceability streams across supply cha...
Detecting EPCIS exceptions in linked traceability streams across supply cha...Monika Solanki
 
Consuming Linked data in Supply Chains: Enabling data visibility via Linked P...
Consuming Linked data in Supply Chains: Enabling data visibility via Linked P...Consuming Linked data in Supply Chains: Enabling data visibility via Linked P...
Consuming Linked data in Supply Chains: Enabling data visibility via Linked P...Monika Solanki
 
Global Supply Chain Innovation Summit, Shanghai
Global Supply Chain Innovation Summit, ShanghaiGlobal Supply Chain Innovation Summit, Shanghai
Global Supply Chain Innovation Summit, ShanghaiDeborah Weinswig
 
Semantic web and Linked Data
Semantic web and Linked DataSemantic web and Linked Data
Semantic web and Linked DataHyun Namgoong
 
Linked data driven EPCIS Event based Traceability across Supply chain busine...
Linked data driven EPCIS Event based Traceability across  Supply chain busine...Linked data driven EPCIS Event based Traceability across  Supply chain busine...
Linked data driven EPCIS Event based Traceability across Supply chain busine...Monika Solanki
 
Linking transformations in EPCIS governing supply chain business processes
Linking transformations in EPCIS governing supply chain business processesLinking transformations in EPCIS governing supply chain business processes
Linking transformations in EPCIS governing supply chain business processesMonika Solanki
 
How Blockchain Can Help Retailers Fight Fraud, Boost Margins and Build Brands
How Blockchain Can Help Retailers Fight Fraud, Boost Margins and Build BrandsHow Blockchain Can Help Retailers Fight Fraud, Boost Margins and Build Brands
How Blockchain Can Help Retailers Fight Fraud, Boost Margins and Build BrandsCognizant
 
EPCIS Event-Based Traceability in Pharmaceutical Supply Chains via Automated ...
EPCIS Event-Based Traceability in Pharmaceutical Supply Chains via Automated ...EPCIS Event-Based Traceability in Pharmaceutical Supply Chains via Automated ...
EPCIS Event-Based Traceability in Pharmaceutical Supply Chains via Automated ...Monika Solanki
 

Viewers also liked (20)

Infographic - Food and Beverage Barcode Labeling
Infographic - Food and Beverage Barcode LabelingInfographic - Food and Beverage Barcode Labeling
Infographic - Food and Beverage Barcode Labeling
 
The Role of Technology in Food Processing Compliance and Traceability
The Role of Technology in Food Processing Compliance and TraceabilityThe Role of Technology in Food Processing Compliance and Traceability
The Role of Technology in Food Processing Compliance and Traceability
 
Open Knowledge Repositories: Enablers of Data Integration across Business Col...
Open Knowledge Repositories: Enablers of Data Integration across Business Col...Open Knowledge Repositories: Enablers of Data Integration across Business Col...
Open Knowledge Repositories: Enablers of Data Integration across Business Col...
 
Realising the Potential of Algal Biomass Production through Semantic Web an...
Realising the Potential of Algal Biomass Production   through Semantic Web an...Realising the Potential of Algal Biomass Production   through Semantic Web an...
Realising the Potential of Algal Biomass Production through Semantic Web an...
 
Building Ontologies for Algal Biomass Operations 2012
Building Ontologies for Algal Biomass Operations 2012Building Ontologies for Algal Biomass Operations 2012
Building Ontologies for Algal Biomass Operations 2012
 
Querying Linked Data and Büchi automata
Querying Linked Data and Büchi automataQuerying Linked Data and Büchi automata
Querying Linked Data and Büchi automata
 
From Biomass to Energy via Semantic Web and Linked data
From Biomass to Energy via Semantic Web and Linked dataFrom Biomass to Energy via Semantic Web and Linked data
From Biomass to Energy via Semantic Web and Linked data
 
The potential role of open data in supply chain integration
The potential role of open data in supply chain integrationThe potential role of open data in supply chain integration
The potential role of open data in supply chain integration
 
The Internet of Lettuces: Legibility, Data and Alternative Food Networks
The Internet of Lettuces: Legibility, Data and Alternative Food NetworksThe Internet of Lettuces: Legibility, Data and Alternative Food Networks
The Internet of Lettuces: Legibility, Data and Alternative Food Networks
 
Representing Supply Chain Events on the Web of Data
Representing Supply Chain Events on the Web of DataRepresenting Supply Chain Events on the Web of Data
Representing Supply Chain Events on the Web of Data
 
The curious case of Blockchain Technology
The curious case of Blockchain TechnologyThe curious case of Blockchain Technology
The curious case of Blockchain Technology
 
Linked data driven EPCIS Event-based Traceability across Supply chain busine...
Linked data driven EPCIS Event-based Traceability across  Supply chain busine...Linked data driven EPCIS Event-based Traceability across  Supply chain busine...
Linked data driven EPCIS Event-based Traceability across Supply chain busine...
 
Detecting EPCIS exceptions in linked traceability streams across supply cha...
Detecting   EPCIS exceptions in linked traceability streams across supply cha...Detecting   EPCIS exceptions in linked traceability streams across supply cha...
Detecting EPCIS exceptions in linked traceability streams across supply cha...
 
Consuming Linked data in Supply Chains: Enabling data visibility via Linked P...
Consuming Linked data in Supply Chains: Enabling data visibility via Linked P...Consuming Linked data in Supply Chains: Enabling data visibility via Linked P...
Consuming Linked data in Supply Chains: Enabling data visibility via Linked P...
 
Global Supply Chain Innovation Summit, Shanghai
Global Supply Chain Innovation Summit, ShanghaiGlobal Supply Chain Innovation Summit, Shanghai
Global Supply Chain Innovation Summit, Shanghai
 
Semantic web and Linked Data
Semantic web and Linked DataSemantic web and Linked Data
Semantic web and Linked Data
 
Linked data driven EPCIS Event based Traceability across Supply chain busine...
Linked data driven EPCIS Event based Traceability across  Supply chain busine...Linked data driven EPCIS Event based Traceability across  Supply chain busine...
Linked data driven EPCIS Event based Traceability across Supply chain busine...
 
Linking transformations in EPCIS governing supply chain business processes
Linking transformations in EPCIS governing supply chain business processesLinking transformations in EPCIS governing supply chain business processes
Linking transformations in EPCIS governing supply chain business processes
 
How Blockchain Can Help Retailers Fight Fraud, Boost Margins and Build Brands
How Blockchain Can Help Retailers Fight Fraud, Boost Margins and Build BrandsHow Blockchain Can Help Retailers Fight Fraud, Boost Margins and Build Brands
How Blockchain Can Help Retailers Fight Fraud, Boost Margins and Build Brands
 
EPCIS Event-Based Traceability in Pharmaceutical Supply Chains via Automated ...
EPCIS Event-Based Traceability in Pharmaceutical Supply Chains via Automated ...EPCIS Event-Based Traceability in Pharmaceutical Supply Chains via Automated ...
EPCIS Event-Based Traceability in Pharmaceutical Supply Chains via Automated ...
 

Similar to Extending Tables with Data from over a Million Websites

Data Search and Search Joins (Universität Heidelberg 2015)
Data Search and Search Joins (Universität Heidelberg 2015)Data Search and Search Joins (Universität Heidelberg 2015)
Data Search and Search Joins (Universität Heidelberg 2015)Chris Bizer
 
Unevenly Distributed
Unevenly DistributedUnevenly Distributed
Unevenly DistributedC4Media
 
INTERFACE by apidays 2023 - API Green Score, Yannick Tremblais, Groupe Rocher
INTERFACE by apidays 2023 - API Green Score, Yannick Tremblais, Groupe RocherINTERFACE by apidays 2023 - API Green Score, Yannick Tremblais, Groupe Rocher
INTERFACE by apidays 2023 - API Green Score, Yannick Tremblais, Groupe Rocherapidays
 
Joey gonzalez, graph lab, m lconf 2013
Joey gonzalez, graph lab, m lconf 2013Joey gonzalez, graph lab, m lconf 2013
Joey gonzalez, graph lab, m lconf 2013MLconf
 
Mining a Large Web Corpus
Mining a Large Web CorpusMining a Large Web Corpus
Mining a Large Web CorpusRobert Meusel
 
Text Mining with Rapid Miner.
Text Mining with Rapid Miner.Text Mining with Rapid Miner.
Text Mining with Rapid Miner.Gurdal Ertek
 
Text Mining with RapidMiner
Text Mining with RapidMinerText Mining with RapidMiner
Text Mining with RapidMinerertekg
 
MWLUG 2014: Modern Domino (workshop)
MWLUG 2014: Modern Domino (workshop)MWLUG 2014: Modern Domino (workshop)
MWLUG 2014: Modern Domino (workshop)Peter Presnell
 
RuleML Challenge: Extracting Data from the Deep Web with Global-as-View Media...
RuleML Challenge: Extracting Data from the Deep Web with Global-as-View Media...RuleML Challenge: Extracting Data from the Deep Web with Global-as-View Media...
RuleML Challenge: Extracting Data from the Deep Web with Global-as-View Media...doenz
 
Scalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data StreamsScalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data StreamsAntonio Severien
 
Interactive Latency in Big Data Visualization
Interactive Latency in Big Data VisualizationInteractive Latency in Big Data Visualization
Interactive Latency in Big Data Visualizationbigdataviz_bay
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataAlbert Bifet
 
Math 141 Exam Final Exam Name____________________.docx
Math 141 Exam Final Exam             Name____________________.docxMath 141 Exam Final Exam             Name____________________.docx
Math 141 Exam Final Exam Name____________________.docxendawalling
 
Math 141 Exam Final Exam Name____________________.docx
Math 141 Exam Final Exam             Name____________________.docxMath 141 Exam Final Exam             Name____________________.docx
Math 141 Exam Final Exam Name____________________.docxwkyra78
 
Daniel Egan Msdn Tech Days Oc
Daniel Egan Msdn Tech Days OcDaniel Egan Msdn Tech Days Oc
Daniel Egan Msdn Tech Days OcDaniel Egan
 
ASSIGNMENT BOOKLET 2018 DIPLOMA IN IT 3 YEARS 1ST YEAR
ASSIGNMENT BOOKLET 2018 DIPLOMA IN IT 3 YEARS 1ST YEARASSIGNMENT BOOKLET 2018 DIPLOMA IN IT 3 YEARS 1ST YEAR
ASSIGNMENT BOOKLET 2018 DIPLOMA IN IT 3 YEARS 1ST YEARDon Dooley
 
What's new in spark 2.0?
What's new in spark 2.0?What's new in spark 2.0?
What's new in spark 2.0?Örjan Lundberg
 

Similar to Extending Tables with Data from over a Million Websites (20)

Data Search and Search Joins (Universität Heidelberg 2015)
Data Search and Search Joins (Universität Heidelberg 2015)Data Search and Search Joins (Universität Heidelberg 2015)
Data Search and Search Joins (Universität Heidelberg 2015)
 
Unevenly Distributed
Unevenly DistributedUnevenly Distributed
Unevenly Distributed
 
Asp Aug2008
Asp Aug2008Asp Aug2008
Asp Aug2008
 
Diadem 1.0
Diadem 1.0Diadem 1.0
Diadem 1.0
 
INTERFACE by apidays 2023 - API Green Score, Yannick Tremblais, Groupe Rocher
INTERFACE by apidays 2023 - API Green Score, Yannick Tremblais, Groupe RocherINTERFACE by apidays 2023 - API Green Score, Yannick Tremblais, Groupe Rocher
INTERFACE by apidays 2023 - API Green Score, Yannick Tremblais, Groupe Rocher
 
Joey gonzalez, graph lab, m lconf 2013
Joey gonzalez, graph lab, m lconf 2013Joey gonzalez, graph lab, m lconf 2013
Joey gonzalez, graph lab, m lconf 2013
 
Mining a Large Web Corpus
Mining a Large Web CorpusMining a Large Web Corpus
Mining a Large Web Corpus
 
Text Mining with Rapid Miner.
Text Mining with Rapid Miner.Text Mining with Rapid Miner.
Text Mining with Rapid Miner.
 
Text Mining with RapidMiner
Text Mining with RapidMinerText Mining with RapidMiner
Text Mining with RapidMiner
 
MWLUG 2014: Modern Domino (workshop)
MWLUG 2014: Modern Domino (workshop)MWLUG 2014: Modern Domino (workshop)
MWLUG 2014: Modern Domino (workshop)
 
RuleML Challenge: Extracting Data from the Deep Web with Global-as-View Media...
RuleML Challenge: Extracting Data from the Deep Web with Global-as-View Media...RuleML Challenge: Extracting Data from the Deep Web with Global-as-View Media...
RuleML Challenge: Extracting Data from the Deep Web with Global-as-View Media...
 
Scalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data StreamsScalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data Streams
 
Interactive Latency in Big Data Visualization
Interactive Latency in Big Data VisualizationInteractive Latency in Big Data Visualization
Interactive Latency in Big Data Visualization
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Math 141 Exam Final Exam Name____________________.docx
Math 141 Exam Final Exam             Name____________________.docxMath 141 Exam Final Exam             Name____________________.docx
Math 141 Exam Final Exam Name____________________.docx
 
Math 141 Exam Final Exam Name____________________.docx
Math 141 Exam Final Exam             Name____________________.docxMath 141 Exam Final Exam             Name____________________.docx
Math 141 Exam Final Exam Name____________________.docx
 
Daniel Egan Msdn Tech Days Oc
Daniel Egan Msdn Tech Days OcDaniel Egan Msdn Tech Days Oc
Daniel Egan Msdn Tech Days Oc
 
ASSIGNMENT BOOKLET 2018 DIPLOMA IN IT 3 YEARS 1ST YEAR
ASSIGNMENT BOOKLET 2018 DIPLOMA IN IT 3 YEARS 1ST YEARASSIGNMENT BOOKLET 2018 DIPLOMA IN IT 3 YEARS 1ST YEAR
ASSIGNMENT BOOKLET 2018 DIPLOMA IN IT 3 YEARS 1ST YEAR
 
What's new in spark 2.0?
What's new in spark 2.0?What's new in spark 2.0?
What's new in spark 2.0?
 
GraphQL Basics
GraphQL BasicsGraphQL Basics
GraphQL Basics
 

More from Chris Bizer

GPT4 versus BERT: Which Foundation Model is better for Web Data Integration?
GPT4 versus BERT: Which Foundation Model is better for Web Data Integration?GPT4 versus BERT: Which Foundation Model is better for Web Data Integration?
GPT4 versus BERT: Which Foundation Model is better for Web Data Integration?Chris Bizer
 
Integrating Product Data from the Semantic Web using Deep Learning Techniques
Integrating Product Data from the Semantic Web using Deep Learning TechniquesIntegrating Product Data from the Semantic Web using Deep Learning Techniques
Integrating Product Data from the Semantic Web using Deep Learning TechniquesChris Bizer
 
Using the Semantic Web as Training Data for Product Matching
Using the Semantic Web as Training Data for Product MatchingUsing the Semantic Web as Training Data for Product Matching
Using the Semantic Web as Training Data for Product MatchingChris Bizer
 
JIST2019 Keynote: Completing Knowledge Graphs using Data from the Open Web
JIST2019 Keynote: Completing Knowledge Graphs using Data from the Open WebJIST2019 Keynote: Completing Knowledge Graphs using Data from the Open Web
JIST2019 Keynote: Completing Knowledge Graphs using Data from the Open WebChris Bizer
 
Schema.org Annotations and Web Tables: Underexploited Semantic Nuggets on the...
Schema.org Annotations and Web Tables: Underexploited Semantic Nuggets on the...Schema.org Annotations and Web Tables: Underexploited Semantic Nuggets on the...
Schema.org Annotations and Web Tables: Underexploited Semantic Nuggets on the...Chris Bizer
 
Exploring the Application Potential of Relational Web Tables
Exploring the Application Potential of Relational Web TablesExploring the Application Potential of Relational Web Tables
Exploring the Application Potential of Relational Web TablesChris Bizer
 
Evolving the Web into a Global Database - Advances and Applications.
Evolving the Web into a Global Database - Advances and Applications. Evolving the Web into a Global Database - Advances and Applications.
Evolving the Web into a Global Database - Advances and Applications. Chris Bizer
 

More from Chris Bizer (7)

GPT4 versus BERT: Which Foundation Model is better for Web Data Integration?
GPT4 versus BERT: Which Foundation Model is better for Web Data Integration?GPT4 versus BERT: Which Foundation Model is better for Web Data Integration?
GPT4 versus BERT: Which Foundation Model is better for Web Data Integration?
 
Integrating Product Data from the Semantic Web using Deep Learning Techniques
Integrating Product Data from the Semantic Web using Deep Learning TechniquesIntegrating Product Data from the Semantic Web using Deep Learning Techniques
Integrating Product Data from the Semantic Web using Deep Learning Techniques
 
Using the Semantic Web as Training Data for Product Matching
Using the Semantic Web as Training Data for Product MatchingUsing the Semantic Web as Training Data for Product Matching
Using the Semantic Web as Training Data for Product Matching
 
JIST2019 Keynote: Completing Knowledge Graphs using Data from the Open Web
JIST2019 Keynote: Completing Knowledge Graphs using Data from the Open WebJIST2019 Keynote: Completing Knowledge Graphs using Data from the Open Web
JIST2019 Keynote: Completing Knowledge Graphs using Data from the Open Web
 
Schema.org Annotations and Web Tables: Underexploited Semantic Nuggets on the...
Schema.org Annotations and Web Tables: Underexploited Semantic Nuggets on the...Schema.org Annotations and Web Tables: Underexploited Semantic Nuggets on the...
Schema.org Annotations and Web Tables: Underexploited Semantic Nuggets on the...
 
Exploring the Application Potential of Relational Web Tables
Exploring the Application Potential of Relational Web TablesExploring the Application Potential of Relational Web Tables
Exploring the Application Potential of Relational Web Tables
 
Evolving the Web into a Global Database - Advances and Applications.
Evolving the Web into a Global Database - Advances and Applications. Evolving the Web into a Global Database - Advances and Applications.
Evolving the Web into a Global Database - Advances and Applications.
 

Recently uploaded

VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls KolkataVIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
Russian Call girls in Dubai +971563133746 Dubai Call girls
Russian  Call girls in Dubai +971563133746 Dubai  Call girlsRussian  Call girls in Dubai +971563133746 Dubai  Call girls
Russian Call girls in Dubai +971563133746 Dubai Call girlsstephieert
 
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebJames Anderson
 
Radiant Call girls in Dubai O56338O268 Dubai Call girls
Radiant Call girls in Dubai O56338O268 Dubai Call girlsRadiant Call girls in Dubai O56338O268 Dubai Call girls
Radiant Call girls in Dubai O56338O268 Dubai Call girlsstephieert
 
VIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
VIP Kolkata Call Girls Salt Lake 8250192130 Available With RoomVIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
VIP Kolkata Call Girls Salt Lake 8250192130 Available With Roomgirls4nights
 
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012rehmti665
 
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine ServiceHot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Servicesexy call girls service in goa
 
10.pdfMature Call girls in Dubai +971563133746 Dubai Call girls
10.pdfMature Call girls in Dubai +971563133746 Dubai Call girls10.pdfMature Call girls in Dubai +971563133746 Dubai Call girls
10.pdfMature Call girls in Dubai +971563133746 Dubai Call girlsstephieert
 
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call GirlVIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girladitipandeya
 
Gram Darshan PPT cyber rural in villages of india
Gram Darshan PPT cyber rural  in villages of indiaGram Darshan PPT cyber rural  in villages of india
Gram Darshan PPT cyber rural in villages of indiaimessage0108
 
Russian Call Girls in Kolkata Ishita 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Ishita 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Ishita 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Ishita 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
Moving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providersMoving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providersDamian Radcliffe
 
On Starlink, presented by Geoff Huston at NZNOG 2024
On Starlink, presented by Geoff Huston at NZNOG 2024On Starlink, presented by Geoff Huston at NZNOG 2024
On Starlink, presented by Geoff Huston at NZNOG 2024APNIC
 
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Networking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOGNetworking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOGAPNIC
 

Recently uploaded (20)

VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls KolkataVIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
Russian Call girls in Dubai +971563133746 Dubai Call girls
Russian  Call girls in Dubai +971563133746 Dubai  Call girlsRussian  Call girls in Dubai +971563133746 Dubai  Call girls
Russian Call girls in Dubai +971563133746 Dubai Call girls
 
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
 
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
 
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
 
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
 
Radiant Call girls in Dubai O56338O268 Dubai Call girls
Radiant Call girls in Dubai O56338O268 Dubai Call girlsRadiant Call girls in Dubai O56338O268 Dubai Call girls
Radiant Call girls in Dubai O56338O268 Dubai Call girls
 
VIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
VIP Kolkata Call Girls Salt Lake 8250192130 Available With RoomVIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
VIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
 
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
 
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine ServiceHot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
 
10.pdfMature Call girls in Dubai +971563133746 Dubai Call girls
10.pdfMature Call girls in Dubai +971563133746 Dubai Call girls10.pdfMature Call girls in Dubai +971563133746 Dubai Call girls
10.pdfMature Call girls in Dubai +971563133746 Dubai Call girls
 
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call GirlVIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
 
Gram Darshan PPT cyber rural in villages of india
Gram Darshan PPT cyber rural  in villages of indiaGram Darshan PPT cyber rural  in villages of india
Gram Darshan PPT cyber rural in villages of india
 
Russian Call Girls in Kolkata Ishita 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Ishita 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Ishita 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Ishita 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
Moving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providersMoving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providers
 
On Starlink, presented by Geoff Huston at NZNOG 2024
On Starlink, presented by Geoff Huston at NZNOG 2024On Starlink, presented by Geoff Huston at NZNOG 2024
On Starlink, presented by Geoff Huston at NZNOG 2024
 
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Networking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOGNetworking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOG
 
Call Girls In South Ex 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICE
Call Girls In South Ex 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICECall Girls In South Ex 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICE
Call Girls In South Ex 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICE
 

Extending Tables with Data from over a Million Websites

  • 1. Slide 1 International Semantic Web Conference Riva del Garda, Italy, 22.10.2014 Semantic Web Challenge – Big Data Track Extending Tables with Data from over a Million Websites Oliver Lehmberg, Dominique Ritze, Petar Ristoski, Kai Eckert, Heiko Paulheim, Christian Bizer
  • 2. Slide 2 Extend a local table with additional columns using different types of Web data. Region Un-employment Alsace 11 % Lorraine 12 % Guadeloupe 28 % Centre 10 % Martinique 25 % GDP per Capita Population Growth 45.914 € 0,16 % 51.233 € -0,05 % 19.810 € 1,34 % 59.502 € 1,76 % NULL 2,64 % + Goal
  • 3. Slide 3 Operation 1: Extend Local Table with Single Column Given a local table and keywords describing the extension column, add the extension column to the table and fill it with data from the Web. „GDP per Capita“ Region Unemployment Alsace 11 % Lorraine 12 % Guadeloupe 28 % Centre 10 % Martinique 25 % … … GDP per Capita 45.914 € 51.233 € 19.810 € 59.502 € 21,527 € … +
  • 4. Slide 4 Operation 2: Extend Local Table with Many Columns Given a local table, add all columns to the table that can be filled beyond a density threshold. Region Unemp. Rate Alsace 11 % Lorraine 12 % Guadeloupe 28 % Centre 10 % Martinique 25 % … … GDP per Capita Population Growth Overseas departments … 45.914 € 0,16 % No … 51.233 € -0,05 % No … 19.810 € 1,34 % Yes … 59.502 € NULL NULL … NULL 2,64 % Yes … … … … + density >= 0.8
  • 5. Slide 5 Types of Web Data Used Microdata (schema.org) Wiki Tables HTML Tables Linked Data
  • 6. Slide 6 Billion Triple Challenge Dataset 2014 4 billion triples crawled from 47,000 websites.
  • 7. Slide 7 Web Data Commons - Microdata Corpus 250 million triples from 463,000 websites.  Extracted from Common Crawl 2013 web corpus  2.2 billion HTML pages from 12.8 million websites  Mostly using the schema.org vocabulary  Main topics  Products  Reviews  Organisations / LocalBusiness  Events Download: http://webdatacommons.org/structureddata/
  • 8. Slide 8 Web Data Commons – Web Tables Corpus Around 1% of all HTML tables contain structured data.  we used 35 million English HTML tables.  extracted from the Common Crawl 2012 web corpus  selected out of 11.2 billion raw tables
  • 9. Slide 9 Web Data Commons – Web Tables Corpus  Column Statistics Column #Tables name 4,600,000 price 3,700,000 date 2,700,000 artist 2,100,000 location 1,200,000 year 1,000,000 manufacturer 375,000 counrty 340,000 isbn 99,000 area 95,000 population 86,000  Subject Column Values Value #Rows usa 135,000 germany 91,000 greece 42,000 new york 59,000 london 37,000 athens 11,000 david beckham 3,000 ronaldinho 1,200 oliver kahn 710 twist shout 2,000 yellow submarine 1,400 Download: http://webdatacommons.org/webtables/
  • 10. Slide 10 WikiTables 1.4 million tables from English Wikipedia.  extracted by Northwestern University  from the 2013 Wikipedia XML dump  only tables, no infoboxes Download: http://downey-n1.cs.northwestern.edu/public/
  • 11. Slide 11 Internal Data Model: Entity-Attributes-Tables  One entity per row  Subject Column = Name of the entity  HTML tables: Most unique string column, break ties by taking leftmost. Rank Film Studio Director Length 1. Star Wars –Episode 1 Lucasfilm George Lucas 121 min 2. Alien Brandwine Ridley Scott 117 min 3. Black Moon NEF Louis Malle 100 min  Table generation from Linked Data and Microdata  generate one table per class and website  subject column: rdfs:label, foaf:name, x:name  we exploit common vocabularies
  • 12. Slide 12 Indexed Tables  Selection Conditions: 1. Minimum size of 3 columns and 5 rows 2. Subject column detection successful  Total # of tables: 36.3 million  Total # of PLDs: ~ 1.5 million  Total # of triples: 3.0 billion
  • 13. Slide 13 The Mannheim Search Joins Engine (MSJE) Collection of tables Table Normalization Table Storage Table Index 1. Table Indexing Input query table Table Preprocessing Search 2. Table Search 3. Data Consolidation Data collection User Preferences Consolidation MultiJoin Top k Candidates
  • 14. Slide 14 The Search Operator The Search operator determines the set of relevant Web tables.  Table Ranking  subject column value overlap  extended Jaccard Similarity (FastJoin)  Select TopK Tables  1000 tables in the single column experiments Relevant
  • 15. Slide 15 Multi-Join Operator The MultiJoin operator performs a series of left-outer joins between the query table and all tables in the input set. No. Region 1 Alsace 2 Lorraine 3 Guadeloupe 4 Centre Unemploy 11 % 12 % 28 % 10 % Unemploy NULL NULL NULL 9.4 % GDP 45.914 € 51.233 € NULL NULL GDP per C 45.000 € NULL 19.000 € 59.500 €
  • 16. Slide 16 Consolidation Operator The consolidation operator merges corresponding columns and fuses values in order to return a concise result table.  Column Matching  Combination of label- and instance-based techniques  Conflict Resolution  Strings: majority vote  Numeric values: average, median, clustering and vote No Region Unemploy GDP 1 Alsace 11 % 45.914 € 2 Lorraine 12 % 51.233 € 3 Guadelo upe 28 % 19.000 € 4 Centre 10 % 59.500 €
  • 18. Slide 18 Result: Extend with Single Column
  • 21. Slide 21 Evaluation Results Author Head‐quarter Industry Area Capital Code Currency Popu‐lation Ingre‐dient Cast Director Genre Year Artist Team Book Company Country Drug Film Song Soccer Player 100% 80% 60% 40% 20% 0% coverage 93% 94% 94% 100% 100% 100% 94% 100% 87% 94% 97% 97% 96% 99% 88% precision 96% 96% 94% 95% 100% 94% 96% 64% 89% 85% 97% 86% 97% 95% 67% Coverage: Percentage of entities for which a value was found. Precision: Manually evaluated using Wikipedia, IMDB, Amazon.
  • 22. Slide 22 Result: Extend with Many Columns 505 columns are added and filled with data from 2071 tables.
  • 24. Slide 24 Provenance Details for “area (sq. km)”
  • 25. Search Joins bring together Web Search and DB Joins. Slide 25 Conclusion  The prototype shows that simple queries are feasible.  The Web is one application domain for search joins, corporate intranets are the other.  The overlooked Big Data Vs: Variety and Veracity