A Framework for Learning Web
Wrappers from the Crowd
Valter Crescenzi, Paolo Merialdo, Disheng Qiu
Dipartimento di Ingegneria
Università degli Studi Roma Tre
Via della Vasca Navale, 79, Rome
disheng@dia.uniroma3.it
Extracting data
2M pages from IMDB, and we want to extract titles, directors, etc.
[Figure: inference algorithm, wrapper, DB]
1/15
Supervised
Supervised approaches are hard to scale.
Unsupervised
Unsupervised approaches are easier to scale but not accurate.
Automatic Annotator
Automatic annotators cannot be applied in all cases. They rely on:
• Sample values
• Ontology
• Lexical patterns
Crowdsourcing
An opportunity to scale supervised approaches.
Scaling Wrapper Inference
Scaling the number of workers with crowdsourcing platforms opens new challenges.
Issue: Non-expert workers
Contributions:
• Simple interactions to reduce the worker error rate
• Membership Query (yes/no answer)
Issue: Costs
Contributions:
• Active Learning to carefully select queries
• Dynamic Expressiveness of the inference language
Issue: Quality
Contributions:
• Bayesian Model to evaluate the expected wrapper quality
• Sampling algorithms
2/15
ALFRED
ALFRED is a wrapper inference system supervised by workers from a crowdsourcing platform.
Input: annotated page (page0)
Candidate rules produced by the inference algorithm:
r1 = /html/table/tr[1]/td/text()
r2 = //*[contains(.,"Ratings:")]/../../tr[1]/td/text()
r3 = //*[contains(.,"Director:")]/../../tr[1]/td/text()
....
Values extracted by each rule:
      page0           page1         page2
r1    Spirited Away   City of God   Howl's Moving Castle
r2    Spirited Away   -             9.3
r3    Spirited Away   City of God   null
3/15
ALFRED
Query to the worker: "Is this title the correct one?"
Resulting wrapper: r1 = /html/table/tr[1]/td/text()
3/15
Membership Query
      page0           page1         page2
r1    Spirited Away   City of God   Howl's Moving Castle
r2    Spirited Away   -             9.3
r3    Spirited Away   City of God   null
Worker answer: Yes!
For each new answer:
• Rules compatible with the answer become more likely to be correct (Bayesian Model)
• If no rule is good enough, a new query is selected (Active Learning)
4/15
Bayesian Model
Training sequence:
L_k = {"Spirited Away", "-", "9.3"}, with worker answers Yes, No, No
Probability, given L_k, that:
• a rule r is correct: P(r | L_k)
• none of the candidate rules is correct: P(R | L_k)
The probabilities are revised with a Bayesian update after each answer.
5/15
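The Bayesian update for the rule probabilities can be sketched roughly as follows. This is a minimal illustration, not the model presented in the talk: the uniform prior, the worker error rate `eta`, and the likelihood (a rule that extracts the queried value predicts a "yes", any other rule predicts a "no") are all assumptions, and the "none of the candidate rules is correct" hypothesis is left out for brevity. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Values extracted by each candidate rule on page0, page1, page2 (from the slides).
EXTRACTED: Dict[str, List[str]] = {
    "r1": ["Spirited Away", "City of God", "Howl's Moving Castle"],
    "r2": ["Spirited Away", "-", "9.3"],
    "r3": ["Spirited Away", "City of God", "null"],
}


def bayesian_update(prior: Dict[str, float], page: int, value: str,
                    answer: bool, eta: float = 0.1) -> Dict[str, float]:
    """One Bayes step: the new P(r | L) is proportional to P(answer | r) * P(r | L)."""
    unnormalized = {}
    for rule, p in prior.items():
        # Assumption: a rule "predicts yes" when it extracts the queried value.
        predicts_yes = EXTRACTED[rule][page] == value
        # Assumption: the worker answers wrongly with probability eta.
        likelihood = (1.0 - eta) if predicts_yes == answer else eta
        unnormalized[rule] = p * likelihood
    total = sum(unnormalized.values())
    return {rule: p / total for rule, p in unnormalized.items()}


def posterior(training_sequence: List[Tuple[int, str, bool]],
              eta: float = 0.1) -> Dict[str, float]:
    """Start from a uniform prior and apply one update per worker answer."""
    probs = {rule: 1.0 / len(EXTRACTED) for rule in EXTRACTED}
    for page, value, answer in training_sequence:
        probs = bayesian_update(probs, page, value, answer, eta)
    return probs


# Training sequence from the slide: (page index, queried value, worker answer).
L_K = [(0, "Spirited Away", True), (1, "-", False), (2, "9.3", False)]

if __name__ == "__main__":
    for rule, p in posterior(L_K).items():
        print(f"P({rule} | L_k) = {p:.3f}")
```

With this training sequence r2 becomes very unlikely, while r1 and r3 stay tied: none of the three answers concerns a value on which they disagree, so a further query on page2 would be needed to separate them.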
Active Learning
ALFRED actively selects the queries; a good policy saves money.
      page0           page1         page2
r1    Spirited Away   City of God   Howl's Moving Castle
r2    Spirited Away   -             9.3
r3    Spirited Away   City of God   null
Query selection policies:
• Random (baseline): values are randomly selected.
• Entropy: values are selected by maximizing the entropy (the most uncertain value).
• Greedy: values are selected to minimize the number of queries needed to confirm the most likely rule.
• Lucky: hybrid approach; it starts with the Entropy algorithm and then switches to Greedy to confirm the best rule.
6/15
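As an illustration of the Entropy policy, the sketch below picks the (page, value) pair whose yes/no answer is most uncertain under the current rule probabilities. It is only one plausible reading of the policy: the probability of a "yes" is taken to be the total probability of the rules that extract that value (worker noise is ignored), and the names `binary_entropy` and `select_query` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

# Values extracted by each candidate rule on page0, page1, page2 (from the slides).
EXTRACTED: Dict[str, List[str]] = {
    "r1": ["Spirited Away", "City of God", "Howl's Moving Castle"],
    "r2": ["Spirited Away", "-", "9.3"],
    "r3": ["Spirited Away", "City of God", "null"],
}


def binary_entropy(p: float) -> float:
    """Entropy (in bits) of a yes/no answer whose probability of 'yes' is p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def select_query(probs: Dict[str, float]) -> Optional[Tuple[int, str]]:
    """Return the (page, value) pair with the most uncertain expected answer."""
    best: Optional[Tuple[int, str]] = None
    best_entropy = -1.0
    n_pages = len(next(iter(EXTRACTED.values())))
    for page in range(n_pages):
        # Candidate values are those extracted by at least one rule on this page.
        for value in sorted({values[page] for values in EXTRACTED.values()}):
            # Assumption: P(yes) is the mass of the rules extracting this value.
            p_yes = sum(p for rule, p in probs.items()
                        if EXTRACTED[rule][page] == value)
            h = binary_entropy(p_yes)
            if h > best_entropy:
                best, best_entropy = (page, value), h
    return best


if __name__ == "__main__":
    # r2 has been ruled out; r1 and r3 are equally likely.
    print(select_query({"r1": 0.5, "r2": 0.0, "r3": 0.5}))
```

When r1 and r3 are tied, the only informative values are on page2, where the two rules disagree; a query on page0 is never selected because every rule extracts the same value there.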
Expressiveness
The candidate rules are generated by observing the first annotated page.
Should we use all the XPath expressiveness or just a fragment?
Pool of candidate rules organized in fragments:
• Absolute Rules (complete path from root):
  /html/table/tr[1]/td/text()
• Relative Rules (path from a textual node):
  //*[contains(.,"Spirited Away")]/text()
  //*[contains(.,"Ratings:")]/../../tr[1]/td/text()
  //*[contains(.,"Director:")]/../../tr[1]/td/text()
• .... other XPaths
[Figure: number of candidate rules against expressiveness of the fragment]
7/15
Expressiveness
Correct (absolute) rule:
/html/table/tr[1]/td/text()
Case 1: the fragment is too expressive
Candidate rules:
/html/table/tr[1]/td/text()
//*[contains(.,"Spirited Away")]/text()
//*[contains(.,"Ratings:")]/../../tr[1]/td/text()
//*[contains(.,"Director:")]/../../tr[1]/td/text()
• The correct rule can be generated
• But many membership queries (MQ) are needed to find it
Case 2: the fragment is just expressive enough
Candidate rules:
/html/table/tr[1]/td/text()
• The correct rule can be generated
• Few queries are needed to find it
State-of-the-art approaches fall in the first case!
They statically define the expressiveness of the XPath fragment.
8/15
Expressiveness
Rules are organized in a Hierarchy of Fragments with increasing expressiveness:
R0 : Absolute Rules
R1 : R0 + Relative Rules
.....
Inspired by Structural Risk Minimization (SRM)*, a Machine Learning technique to address overfitting.
We defined simple XPath fragments.
Empirically observed: overly expressive fragments are not actually needed.
[Figure: chart with values 5%, 70%, 25%]
*Details: Shawe-Taylor et al., IEEE Transactions on Information Theory, 44(5):1926–1940, 1998
9/15
Dynamic Expressiveness
Start from R0 : Absolute Rules.
• No solution? If P(R | L_k) > λ_R, ALFRED expands the expressiveness: R0, then R1 : R0 + Relative Rules, .....
• Is r good enough? If P(r | L_k) > λ_r, ALFRED terminates.
10/15
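The control flow of Dynamic Expressiveness can be sketched as a small decision function. This is a rough illustration under stated assumptions: the threshold names `lambda_r` and `lambda_R`, their values, and the order of the two checks are hypothetical, and the probabilities are passed in directly instead of being computed by the Bayesian model.

```python
from typing import List, Tuple

# Hierarchy of fragments with increasing expressiveness (from the slides).
FRAGMENTS: List[str] = ["R0: Absolute Rules", "R1: R0 + Relative Rules"]


def next_action(p_best_rule: float, p_no_rule: float,
                lambda_r: float = 0.9, lambda_R: float = 0.9) -> str:
    """Decide what to do after a worker answer.

    p_best_rule: probability that the most likely rule r is correct.
    p_no_rule:   probability that none of the candidate rules is correct.
    """
    if p_no_rule > lambda_R:
        return "expand"      # no solution in this fragment: move to the next one
    if p_best_rule > lambda_r:
        return "terminate"   # r is good enough
    return "query"           # otherwise ask another membership query


def simulate(trace: List[Tuple[float, float]]) -> List[str]:
    """Replay a scripted sequence of (p_best_rule, p_no_rule) pairs."""
    level = 0
    log = []
    for p_best, p_none in trace:
        action = next_action(p_best, p_none)
        if action == "expand" and level + 1 < len(FRAGMENTS):
            level += 1
        log.append(f"{action} [{FRAGMENTS[level]}]")
        if action == "terminate":
            break
    return log


if __name__ == "__main__":
    for line in simulate([(0.40, 0.10), (0.02, 0.95), (0.60, 0.05), (0.97, 0.01)]):
        print(line)
```

Starting from the least expressive fragment keeps the pool of candidate rules small, so fewer queries are spent unless the answers show that no rule in the current fragment fits.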
Results
Dataset: 40 attributes
Site               Entity         |Pages|
www.imdb.com       Actor          500k
www.imdb.com       Movies         500k
www.allmusic.com   Band           500k
www.allmusic.com   Albums         500k
www.nasdaq.com     Stock Quotes   7k
Measures:
• Costs: #MQ
• Quality: Precision and Recall
11/15
Results: Dynamic Expressiveness
Strategy   #MQ (SRM off)   #MQ (SRM on)   % MQ saved   P (SRM on)   R (SRM on)
RANDOM     379             190            50%          0.998        0.977
GREEDY     398             169            58%          0.998        0.983
LUCKY      196             132            33%          0.996        0.995
ENTROPY    205             116            44%          0.998        0.99
Dynamic Expressiveness saves a lot of queries.
Small quality loss: in some cases the expressiveness is not expanded when it is needed.
12/15
Results: Dynamic Expressiveness
[Figure: two plots, Static Expressiveness and Dynamic Expressiveness; axis label: # candidate rules]
"Simple" attributes: complex algorithms are not needed.
"Complex" attributes: Entropy, Lucky and Dynamic Expressiveness save a lot of queries.
13/15
Future development
Noisy Crowds: worker mistakes vs. task redundancy*
How to evaluate the accuracy of the worker?
Another query or another worker?
Same learning framework, different problems: NLP, Crawling
14/15
*Demo
Title: ALFRED: Crowd Assisted Data Extraction
When: Tomorrow 17h
Where: Imperial Room
Thank you for your attention!
15/15
Redundancy
[Figure: three plots of P(r1), P(r2) and P(r3) (0 to 1) against # MQ (0 to 4); panel labels: Not Accurate Worker, Many Workers, Accurate Worker]
Sampling & Quality
2M pages from IMDB: we have to work with a sample set, but selecting the right sample set is crucial.
[Figure: inference algorithm, wrapper, DB]
Not all pages look like the pages about famous movies.
Sampling & Quality
Pages make apparent the differences among the rules:

      page0
r1    Spirited Away
r2    Spirited Away
r3    Spirited Away
r1 = r2 = r3

      page0           page1
r1    Spirited Away   City of God
r2    Spirited Away   -
r3    Spirited Away   City of God
r1 = r3 != r2

      page0           page1         page2
r1    Spirited Away   City of God   Howl's Moving Castle
r2    Spirited Away   -             9.3
r3    Spirited Away   City of God   null
r1 != r3 != r2

Find a small set that makes apparent the same differences observed in the whole set of pages*
Sampling & Quality
The problem.
Find the smallest set that makes apparent the differences among the rules
(e.g., 100 pages that make apparent the same differences that we would observe in 2M pages).
It is an NP-hard problem (reduction to the Set Cover problem):
find the smallest set of pages that covers all the groups of rules (group = equivalent rules).
The smallest set is not needed:
a greedy algorithm, O(|Pages|) in time and O(1) in space, works very well in practice.
Input: XPath rules
For every page p:
    if (p makes apparent new differences)
        representative pages += p
An offline algorithm that can be easily parallelized.
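The greedy pass over the pages could be implemented along the following lines. This is a sketch of one possible reading of "makes apparent new differences": rules are grouped by equivalence on the pages seen so far, and a page is kept when it splits at least one group. The function name, the representation of a rule as a Python callable, and the toy pages are assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, List

# Values extracted by each candidate rule on page0, page1, page2 (from the slides).
EXTRACTED: Dict[str, List[str]] = {
    "r1": ["Spirited Away", "City of God", "Howl's Moving Castle"],
    "r2": ["Spirited Away", "-", "9.3"],
    "r3": ["Spirited Away", "City of God", "null"],
}


def representative_pages(pages: Iterable[int],
                         rules: Dict[str, Callable[[int], str]]) -> List[int]:
    """Single pass: keep a page only if it separates rules that looked equivalent."""
    group_of = {name: 0 for name in rules}  # all rules start in one group
    n_groups = 1
    selected: List[int] = []
    for page in pages:
        # Two rules stay equivalent only if they were in the same group
        # and extract the same value on this page.
        keys = {name: (group_of[name], rule(page)) for name, rule in rules.items()}
        distinct = set(keys.values())
        if len(distinct) > n_groups:  # the page makes new differences apparent
            selected.append(page)
            ids = {key: i for i, key in enumerate(distinct)}
            group_of = {name: ids[key] for name, key in keys.items()}
            n_groups = len(distinct)
    return selected


# Toy rules: each rule looks up its value in the table above (pages are indices).
RULES: Dict[str, Callable[[int], str]] = {
    name: (lambda page, values=values: values[page])
    for name, values in EXTRACTED.items()
}

if __name__ == "__main__":
    print(representative_pages([0, 1, 2], RULES))
    print(representative_pages([0, 2, 1], RULES))
```

Apart from the output list, the only state kept between pages is the current grouping of the rules, so memory does not grow with the number of pages. The result depends on the order in which pages are visited: page2 alone already separates all three rules, so visiting it before page1 yields a smaller sample.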
Results: Sampling
Three sample sets:
• Biased
Pages collected by crawling the website
• Random
Pages randomly picked from the whole set of pages
• Representative
Pages collected by our sampling algorithm
Results: Sampling
Entity   Sampling         |Pages|   P      R
Movies   Biased           250       0.98   0.71
Movies   Random           250       0.99   0.99
Movies   Representative   42        1.00   1.00
Actors   Biased           250       1.00   1.00
Actors   Random           250       1.00   0.96
Actors   Representative   30        1.00   1.00
Stocks   Biased           86        1.00   0.98
Stocks   Random           86        1.00   0.99
Stocks   Representative   15        1.00   1.00
Albums   Biased           258       1.00   0.99
Albums   Random           258       1.00   1.00
Albums   Representative   59        1.00   1.00
Bands    Biased           289       1.00   0.68
Bands    Random           289       1.00   1.00
Bands    Representative   36        1.00   1.00
• Representative: perfect
• Biased: recall loss
• Random: better than biased
State of the Art
Supervised Wrapper Induction
• 2006 - Interactive wrapper generation with minimal user effort. U. Irmak et al. WWW
• 2006 - Active learning with multiple views. I. Muslea et al. JAIR
Unsupervised Wrapper Induction
• 2008 - Wrapper inference for ambiguous web pages. V. Crescenzi and P. Merialdo. JAAI
• 2005 - Web Data Extraction Based on Partial Tree Alignment. Yanhong Zhai. WWW
Automatic Annotators
• 2012 - D.I.A.D.E.M. T. Furche and G. Gottlob. WWW
• 2011 - Automatic wrappers for large scale web extraction. N.N. Dalvi et al. VLDB

More Related Content

What's hot

SPARQL Cheat Sheet
SPARQL Cheat SheetSPARQL Cheat Sheet
SPARQL Cheat Sheet
LeeFeigenbaum
 
Java 8 Streams and Rx Java Comparison
Java 8 Streams and Rx Java ComparisonJava 8 Streams and Rx Java Comparison
Java 8 Streams and Rx Java Comparison
José Paumard
 
Thumbtack Expertise Days # 5 - Javaz
Thumbtack Expertise Days # 5 - JavazThumbtack Expertise Days # 5 - Javaz
Thumbtack Expertise Days # 5 - Javaz
Alexey Remnev
 
Java and SPARQL
Java and SPARQLJava and SPARQL
Java and SPARQL
Raji Ghawi
 
Java SE 8 best practices
Java SE 8 best practicesJava SE 8 best practices
Java SE 8 best practices
Stephen Colebourne
 
070517 Jena
070517 Jena070517 Jena
070517 Jena
yuhana
 
Java 8 Lambda Expressions & Streams
Java 8 Lambda Expressions & StreamsJava 8 Lambda Expressions & Streams
Java 8 Lambda Expressions & Streams
NewCircle Training
 
Java8.part2
Java8.part2Java8.part2
Java8.part2
Ivan Ivanov
 
Semantic web meetup – sparql tutorial
Semantic web meetup – sparql tutorialSemantic web meetup – sparql tutorial
Semantic web meetup – sparql tutorial
AdonisDamian
 
Java 8 Lambda and Streams
Java 8 Lambda and StreamsJava 8 Lambda and Streams
Java 8 Lambda and Streams
Venkata Naga Ravi
 
Introduction to Java 8
Introduction to Java 8Introduction to Java 8
Introduction to Java 8
Knoldus Inc.
 
Amber and beyond: Java language changes
Amber and beyond: Java language changesAmber and beyond: Java language changes
Amber and beyond: Java language changes
Stephen Colebourne
 
Java 8 presentation
Java 8 presentationJava 8 presentation
Java 8 presentation
Van Huong
 
SPARQL 1.1 Status
SPARQL 1.1 StatusSPARQL 1.1 Status
SPARQL 1.1 Status
LeeFeigenbaum
 
Python made easy
Python made easy Python made easy
Python made easy
Abhishek kumar
 
Linked to ArrayList: the full story
Linked to ArrayList: the full storyLinked to ArrayList: the full story
Linked to ArrayList: the full story
José Paumard
 
Productive Programming in Java 8 - with Lambdas and Streams
Productive Programming in Java 8 - with Lambdas and Streams Productive Programming in Java 8 - with Lambdas and Streams
Productive Programming in Java 8 - with Lambdas and Streams
Ganesh Samarthyam
 
Jena – A Semantic Web Framework for Java
Jena – A Semantic Web Framework for JavaJena – A Semantic Web Framework for Java
Jena – A Semantic Web Framework for Java
Aleksander Pohl
 
Dependent types (and other ideas for guaranteeing correctness with types)
Dependent types (and other ideas for guaranteeing correctness with types)Dependent types (and other ideas for guaranteeing correctness with types)
Dependent types (and other ideas for guaranteeing correctness with types)
radexp
 
Functional Thinking - Programming with Lambdas in Java 8
Functional Thinking - Programming with Lambdas in Java 8Functional Thinking - Programming with Lambdas in Java 8
Functional Thinking - Programming with Lambdas in Java 8
Ganesh Samarthyam
 

What's hot (20)

SPARQL Cheat Sheet
SPARQL Cheat SheetSPARQL Cheat Sheet
SPARQL Cheat Sheet
 
Java 8 Streams and Rx Java Comparison
Java 8 Streams and Rx Java ComparisonJava 8 Streams and Rx Java Comparison
Java 8 Streams and Rx Java Comparison
 
Thumbtack Expertise Days # 5 - Javaz
Thumbtack Expertise Days # 5 - JavazThumbtack Expertise Days # 5 - Javaz
Thumbtack Expertise Days # 5 - Javaz
 
Java and SPARQL
Java and SPARQLJava and SPARQL
Java and SPARQL
 
Java SE 8 best practices
Java SE 8 best practicesJava SE 8 best practices
Java SE 8 best practices
 
070517 Jena
070517 Jena070517 Jena
070517 Jena
 
Java 8 Lambda Expressions & Streams
Java 8 Lambda Expressions & StreamsJava 8 Lambda Expressions & Streams
Java 8 Lambda Expressions & Streams
 
Java8.part2
Java8.part2Java8.part2
Java8.part2
 
Semantic web meetup – sparql tutorial
Semantic web meetup – sparql tutorialSemantic web meetup – sparql tutorial
Semantic web meetup – sparql tutorial
 
Java 8 Lambda and Streams
Java 8 Lambda and StreamsJava 8 Lambda and Streams
Java 8 Lambda and Streams
 
Introduction to Java 8
Introduction to Java 8Introduction to Java 8
Introduction to Java 8
 
Amber and beyond: Java language changes
Amber and beyond: Java language changesAmber and beyond: Java language changes
Amber and beyond: Java language changes
 
Java 8 presentation
Java 8 presentationJava 8 presentation
Java 8 presentation
 
SPARQL 1.1 Status
SPARQL 1.1 StatusSPARQL 1.1 Status
SPARQL 1.1 Status
 
Python made easy
Python made easy Python made easy
Python made easy
 
Linked to ArrayList: the full story
Linked to ArrayList: the full storyLinked to ArrayList: the full story
Linked to ArrayList: the full story
 
Productive Programming in Java 8 - with Lambdas and Streams
Productive Programming in Java 8 - with Lambdas and Streams Productive Programming in Java 8 - with Lambdas and Streams
Productive Programming in Java 8 - with Lambdas and Streams
 
Jena – A Semantic Web Framework for Java
Jena – A Semantic Web Framework for JavaJena – A Semantic Web Framework for Java
Jena – A Semantic Web Framework for Java
 
Dependent types (and other ideas for guaranteeing correctness with types)
Dependent types (and other ideas for guaranteeing correctness with types)Dependent types (and other ideas for guaranteeing correctness with types)
Dependent types (and other ideas for guaranteeing correctness with types)
 
Functional Thinking - Programming with Lambdas in Java 8
Functional Thinking - Programming with Lambdas in Java 8Functional Thinking - Programming with Lambdas in Java 8
Functional Thinking - Programming with Lambdas in Java 8
 

Viewers also liked

Wrapper Generation Supervised by a Noisy Crowd
Wrapper Generation Supervised by a Noisy CrowdWrapper Generation Supervised by a Noisy Crowd
Wrapper Generation Supervised by a Noisy Crowd
Disheng Qiu
 
7 Secrets of Permanent Fat Loss and Fitness
7 Secrets of Permanent Fat Loss and Fitness7 Secrets of Permanent Fat Loss and Fitness
7 Secrets of Permanent Fat Loss and Fitness
usmcpv
 
Business
BusinessBusiness
Business
usmcpv
 
La Vida de la Virgen María
La Vida de la Virgen MaríaLa Vida de la Virgen María
La Vida de la Virgen María
santamariaysanpedro
 
ALFRED demo - www2013
ALFRED demo - www2013ALFRED demo - www2013
ALFRED demo - www2013
Disheng Qiu
 
Magnificat
Magnificat  Magnificat
Magnificat
santamariaysanpedro
 
D2 ic-presentation
D2 ic-presentationD2 ic-presentation
D2 ic-presentation
usmcpv
 
Restaurant Operators
Restaurant OperatorsRestaurant Operators
Restaurant Operators
usmcpv
 
Brand You (new)
Brand You (new)Brand You (new)
Brand You (new)
tim4gina
 
Rogers royals basketball
Rogers royals basketballRogers royals basketball
Rogers royals basketball
jillm68
 
26 de junio del 2011
26 de junio del 201126 de junio del 2011
26 de junio del 2011
santamariaysanpedro
 
презентация по англ. яз.
презентация по англ. яз.презентация по англ. яз.
презентация по англ. яз.
topMaximus
 
PORTFOLIO HGWeb Consulting
PORTFOLIO HGWeb ConsultingPORTFOLIO HGWeb Consulting
PORTFOLIO HGWeb Consulting
HGWeb Consulting WSI agency
 
Slideshare Language courses SME in French
Slideshare Language courses SME in FrenchSlideshare Language courses SME in French
Slideshare Language courses SME in FrenchElaN Languages
 
L'ergonomie d'un site web par Fred Colantonio
L'ergonomie d'un site web par Fred ColantonioL'ergonomie d'un site web par Fred Colantonio
L'ergonomie d'un site web par Fred Colantonio
J'ai besoin de com
 
LinkedIn : Visibilité versus Viralité de vos publications
LinkedIn : Visibilité versus Viralité de vos publicationsLinkedIn : Visibilité versus Viralité de vos publications
LinkedIn : Visibilité versus Viralité de vos publications
Consonaute
 

Viewers also liked (17)

Wrapper Generation Supervised by a Noisy Crowd
Wrapper Generation Supervised by a Noisy CrowdWrapper Generation Supervised by a Noisy Crowd
Wrapper Generation Supervised by a Noisy Crowd
 
7 Secrets of Permanent Fat Loss and Fitness
7 Secrets of Permanent Fat Loss and Fitness7 Secrets of Permanent Fat Loss and Fitness
7 Secrets of Permanent Fat Loss and Fitness
 
Business
BusinessBusiness
Business
 
La Vida de la Virgen María
La Vida de la Virgen MaríaLa Vida de la Virgen María
La Vida de la Virgen María
 
ALFRED demo - www2013
ALFRED demo - www2013ALFRED demo - www2013
ALFRED demo - www2013
 
Magnificat
Magnificat  Magnificat
Magnificat
 
D2 ic-presentation
D2 ic-presentationD2 ic-presentation
D2 ic-presentation
 
Restaurant Operators
Restaurant OperatorsRestaurant Operators
Restaurant Operators
 
Gülşen arislan
Gülşen arislanGülşen arislan
Gülşen arislan
 
Brand You (new)
Brand You (new)Brand You (new)
Brand You (new)
 
Rogers royals basketball
Rogers royals basketballRogers royals basketball
Rogers royals basketball
 
26 de junio del 2011
26 de junio del 201126 de junio del 2011
26 de junio del 2011
 
презентация по англ. яз.
презентация по англ. яз.презентация по англ. яз.
презентация по англ. яз.
 
PORTFOLIO HGWeb Consulting
PORTFOLIO HGWeb ConsultingPORTFOLIO HGWeb Consulting
PORTFOLIO HGWeb Consulting
 
Slideshare Language courses SME in French
Slideshare Language courses SME in FrenchSlideshare Language courses SME in French
Slideshare Language courses SME in French
 
L'ergonomie d'un site web par Fred Colantonio
L'ergonomie d'un site web par Fred ColantonioL'ergonomie d'un site web par Fred Colantonio
L'ergonomie d'un site web par Fred Colantonio
 
LinkedIn : Visibilité versus Viralité de vos publications
LinkedIn : Visibilité versus Viralité de vos publicationsLinkedIn : Visibilité versus Viralité de vos publications
LinkedIn : Visibilité versus Viralité de vos publications
 

ALFRED - www2013

  • 1. A Framework for Learning Web Wrappers from the Crowd Valter Crescenzi, Paolo Merialdo, Disheng Qiu Dipartimento di Ingegneria Università degli Studi Roma Tre Via della Vasca Navale, 79, Rome disheng@dia.uniroma3.it
  • 4. Extracting data
      2M pages from IMDB, and we want to extract titles, directors, etc.
      (Diagram labels: Inference algorithm, Wrapper, DB)
  • 5. Supervised: hard to scale
  • 6. Unsupervised: easier to scale but not accurate
  • 7. Automatic Annotator: automatic annotators cannot be applied in all cases
      • Sample values
      • Ontology
      • Lexical patterns
  • 8. Crowdsourcing: an opportunity to scale supervised approaches
  • 12. Scaling Wrapper Inference
      Scaling the number of workers with crowdsourcing platforms opens new challenges.
      Issue: Non-expert workers. Contributions:
        • Simple interactions to reduce the worker error rate
        • Membership Query (yes/no answer)
      Issue: Costs. Contributions:
        • Active Learning to carefully select queries
        • Dynamic Expressiveness of the inference language
      Issue: Quality. Contributions:
        • Bayesian Model to evaluate the expected wrapper quality
        • Sampling algorithms
  • 16. ALFRED
      ALFRED is a wrapper inference system supervised by workers from a crowdsourcing platform.
      Input: an annotated page (page0). Candidate rules produced by the inference algorithm:
        r1 = /html/table/tr[1]/td/text()
        r2 = //*[contains(.,"Ratings:")]/../../tr[1]/td/text()
        r3 = //*[contains(.,"Director:")]/../../tr[1]/td/text()
        ....
      Values extracted by each rule:
              page0           page1         page2
        r1    Spirited Away   City of God   Howl’s Moving Castle
        r2    Spirited Away   -             9.3
        r3    Spirited Away   City of God   null
  • 18. ALFRED
      ALFRED is a wrapper inference system supervised by workers from a crowdsourcing platform.
      Input: an annotated page (page0). Candidate rules produced by the inference algorithm:
        r1 = /html/table/tr[1]/td/text()
        r2 = //*[contains(.,"Ratings:")]/../../tr[1]/td/text()
        r3 = //*[contains(.,"Director:")]/../../tr[1]/td/text()
        ....
      Question asked to the worker: "Is this title the correct one?"
      Output wrapper: r1 = /html/table/tr[1]/td/text()
  • 21. Membership Query
      Values extracted by each rule:
              page0           page1         page2
        r1    Spirited Away   City of God   Howl’s Moving Castle
        r2    Spirited Away   -             9.3
        r3    Spirited Away   City of God   null
      Worker answer: Yes!
      For each new answer:
        • Rules compatible with the answer become more likely to be correct (Bayesian Model)
        • If no rule is good enough, a new query is selected (Active Learning)
  • 25. Bayesian Model
      Training sequence Lk = {"Spirited Away": Yes, "-": No, "9.3": No}
      Probability that:
        • a rule r is correct: P(r|Lk)
        • none of the candidate rules is correct: P(R|Lk)
      Bayesian update: [equation not preserved in the transcript]
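The update formula itself did not survive in the transcript, so the following is only a minimal sketch of how such a Bayesian update could work, not the authors' model. It assumes a uniform prior, a fixed worker error rate (`error_rate`), and an uninformative likelihood of 0.5 for the "none of the candidate rules is correct" hypothesis; every function and variable name is hypothetical. The extracted values and the training sequence are the ones shown on the slides.

```python
from typing import Dict, List, Optional

# Values extracted by each candidate rule on page0, page1, page2 (from the slides).
EXTRACTED: Dict[str, List[Optional[str]]] = {
    "r1": ["Spirited Away", "City of God", "Howl's Moving Castle"],
    "r2": ["Spirited Away", "-", "9.3"],
    "r3": ["Spirited Away", "City of God", None],
}
NONE = "NONE"  # hypothesis: none of the candidate rules is correct


def bayes_update(posterior: Dict[str, float], page: int, value: str,
                 answer: bool, error_rate: float = 0.1) -> Dict[str, float]:
    """Return the posterior after one membership query answer (True = Yes)."""
    updated = {}
    for hypothesis, prob in posterior.items():
        if hypothesis == NONE:
            likelihood = 0.5  # assumption: the answer says nothing about NONE
        else:
            # If this rule were correct, a perfect worker would say Yes
            # exactly when the rule extracts the queried value on that page.
            expected = EXTRACTED[hypothesis][page] == value
            likelihood = 1 - error_rate if expected == answer else error_rate
        updated[hypothesis] = prob * likelihood
    total = sum(updated.values())
    return {h: p / total for h, p in updated.items()}


# Uniform prior over the three rules plus the NONE hypothesis.
hypotheses = list(EXTRACTED) + [NONE]
posterior = {h: 1 / len(hypotheses) for h in hypotheses}

# Training sequence from the slide: "Spirited Away" Yes, "-" No, "9.3" No.
training_sequence = [(0, "Spirited Away", True), (1, "-", False), (2, "9.3", False)]
for page, value, answer in training_sequence:
    posterior = bayes_update(posterior, page, value, answer)
```

Under these assumptions r2 becomes unlikely after the two No answers, while r1 and r3 stay tied because they agree on every queried value; a further query on page2 would separate them.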
  • 27. Active Learning
      ALFRED actively selects the queries; a good policy saves money.
        • Random (baseline): values are randomly selected
        • Entropy: values are selected by maximizing the entropy (most uncertain value)
        • Greedy: values are selected by minimizing the queries needed to confirm the most likely rule
        • Lucky: hybrid approach; it starts with the Entropy algorithm and then switches to Greedy to confirm the best rule
      Values extracted by each rule:
              page0           page1         page2
        r1    Spirited Away   City of God   Howl’s Moving Castle
        r2    Spirited Away   -             9.3
        r3    Spirited Away   City of God   null
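One plausible reading of the Entropy strategy can be sketched as follows. This is an illustration under stated assumptions, not the authors' definition: the probability of a Yes answer for a value is taken to be the total probability of the rules that extract it, and the query with the highest binary entropy is chosen. The names `binary_entropy` and `select_query` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

# Values extracted by each candidate rule on page0, page1, page2 (from the slides).
EXTRACTED: Dict[str, List[Optional[str]]] = {
    "r1": ["Spirited Away", "City of God", "Howl's Moving Castle"],
    "r2": ["Spirited Away", "-", "9.3"],
    "r3": ["Spirited Away", "City of God", None],
}


def binary_entropy(p: float) -> float:
    """Entropy in bits of a yes/no answer whose Yes probability is p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_query(posterior: Dict[str, float]) -> Optional[Tuple[int, str]]:
    """Pick the (page, value) pair whose answer is the most uncertain."""
    best_query, best_entropy = None, -1.0
    n_pages = len(next(iter(EXTRACTED.values())))
    for page in range(n_pages):
        # Null extractions cannot be shown to a worker, so they are skipped.
        values = {EXTRACTED[rule][page] for rule in posterior} - {None}
        for value in sorted(values):
            p_yes = sum(prob for rule, prob in posterior.items()
                        if EXTRACTED[rule][page] == value)
            entropy = binary_entropy(p_yes)
            if entropy > best_entropy:
                best_query, best_entropy = (page, value), entropy
    return best_query


posterior = {"r1": 1 / 3, "r2": 1 / 3, "r3": 1 / 3}
query = select_query(posterior)
```

With a uniform posterior, every rule extracts "Spirited Away" on page0, so asking about it carries no information; the selected query therefore comes from page1 or page2, where the rules disagree.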
  • 32. Expressiveness
      The candidate rules are generated observing the first annotated page.
      Should we use all the XPath expressiveness or just a fragment?
      (Plot axes: expressiveness of the fragment, number of candidate rules)
      Pool of candidate rules organized in fragments:
        • Absolute Rules (complete path from root):
            /html/table/tr[1]/td/text()
        • Relative Rules (path from a textual node):
            //*[contains(.,"Spirited Away")]/text()
            //*[contains(.,"Ratings:")]/../../tr[1]/td/text()
            //*[contains(.,"Director:")]/../../tr[1]/td/text()
        • .... other XPaths
  • 36. Expressiveness
      Correct (absolute) rule: /html/table/tr[1]/td/text()
      Case 1, the fragment is too expressive:
        /html/table/tr[1]/td/text()
        //*[contains(.,"Spirited Away")]/text()
        //*[contains(.,"Ratings:")]/../../tr[1]/td/text()
        //*[contains(.,"Director:")]/../../tr[1]/td/text()
        • The correct rule can be generated
        • But many MQ are needed to find it
      Case 2, the fragment is just expressive enough:
        /html/table/tr[1]/td/text()
        • The correct rule can be generated
        • Few queries are needed to find it
      State-of-the-art approaches fall in the first case! They statically define the expressiveness of the XPath fragment.
  • 39. Expressiveness
      We defined simple XPath fragments. Empirically observed: too expressive fragments are not actually needed.
      Rules are organized in a Hierarchy of Fragments with increasing expressiveness:
        R0 : Absolute Rules
        R1 : R0 + Relative Rules
        .....
      (Chart over expressiveness; values shown: 5%, 70%, 25%)
      Inspired by Structural Risk Minimization (SRM)*: a Machine Learning technique to address overfitting.
      *Details: Shawe-Taylor et al., IEEE Transactions on Information Theory, 44(5):1926–1940, 1998
  • 44. Dynamic Expressiveness
      (Flowchart) Start from R0 : Absolute Rules. Test "No solution?": P(R|Lk) is compared against a threshold. The branch labelled "No" leads to "Expands the expressiveness", moving to R1 : R0 + Relative Rules, then ..... and so on.
  • 46. Dynamic Expressiveness
      (Flowchart) Start from R0 : Absolute Rules. Test "Is r good enough?": P(r|Lk) is compared against a threshold. Yes: terminates. No: expands the expressiveness, moving to R1 : R0 + Relative Rules, then ..... and so on.
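The flowchart on the Dynamic Expressiveness slides can be sketched as a loop. This is a minimal sketch under stated assumptions rather than ALFRED's implementation: the thresholds `rule_threshold` and `none_threshold`, the 0.5 likelihood for the "no rule is correct" hypothesis, the naive query order, and the two example fragments are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Rules = Dict[str, List[Optional[str]]]  # rule name -> value extracted on each page
Answer = Tuple[int, str, bool]          # (page, value, worker said Yes)
NONE = "NONE"  # hypothesis: no rule in the current pool is correct


def posterior(rules: Rules, history: List[Answer], error_rate: float = 0.1) -> Dict[str, float]:
    """Posterior over the rules and NONE, recomputed from all answers so far."""
    scores = {name: 1.0 for name in rules}
    scores[NONE] = 1.0
    for page, value, answer in history:
        for name in rules:
            expected = rules[name][page] == value
            scores[name] *= 1 - error_rate if expected == answer else error_rate
        scores[NONE] *= 0.5  # assumption: answers are uninformative for NONE
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}


def infer(fragments: List[Rules], oracle: Callable[[int, str], bool],
          rule_threshold: float = 0.9, none_threshold: float = 0.6):
    """Return (chosen rule or None, number of membership queries asked)."""
    level, rules, history = 0, dict(fragments[0]), []
    n_pages = len(next(iter(rules.values())))
    while True:
        post = posterior(rules, history)
        best = max(rules, key=lambda name: post[name])
        if post[best] > rule_threshold:
            return best, len(history)  # a rule is good enough: terminate
        if post[NONE] > none_threshold and level + 1 < len(fragments):
            level += 1
            rules.update(fragments[level])  # expand the expressiveness
            continue
        asked = {(p, v) for p, v, _ in history}
        pending = [(p, vals[p]) for p in range(n_pages) for vals in rules.values()
                   if vals[p] is not None and (p, vals[p]) not in asked]
        if not pending:
            return None, len(history)  # nothing left to ask
        page, value = pending[0]  # naive order; an active learner would choose here
        history.append((page, value, oracle(page, value)))


truth = ["Spirited Away", "City of God", "Howl's Moving Castle"]
fragments = [
    {"abs": ["Spirited Away", "-", "9.3"]},  # R0: a wrong absolute rule
    {"rel": ["Spirited Away", "City of God", "Howl's Moving Castle"]},  # added by R1
]
result = infer(fragments, lambda page, value: truth[page] == value)
```

In this toy run the smaller pool is abandoned after two answers, and the rule added by the larger fragment is confirmed after four queries in total. Recomputing the posterior from the stored answers lets earlier answers carry over when the pool grows.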
  • 47. Results
      Dataset: 40 attributes
        Site               Entity         |Pages|
        www.imdb.com       Actor          500k
        www.imdb.com       Movies         500k
        www.allmusic.com   Band           500k
        www.allmusic.com   Albums         500k
        www.nasdaq.com     Stock Quotes   7k
      Measures:
        • Costs: #MQ
        • Quality: Precision and Recall
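The two quality measures can be illustrated with a short sketch. It uses the standard definitions of precision and recall applied per page; the exact evaluation protocol of the talk is not given in the transcript, so the function name and the treatment of null extractions are assumptions.

```python
from typing import List, Optional, Tuple


def precision_recall(extracted: List[Optional[str]],
                     expected: List[Optional[str]]) -> Tuple[float, float]:
    """Compare the value a rule extracts on each page with the expected value."""
    correct = sum(1 for e, t in zip(extracted, expected) if e is not None and e == t)
    n_extracted = sum(1 for e in extracted if e is not None)
    n_expected = sum(1 for t in expected if t is not None)
    # Convention (assumption): an empty denominator counts as a perfect score.
    precision = correct / n_extracted if n_extracted else 1.0
    recall = correct / n_expected if n_expected else 1.0
    return precision, recall


truth = ["Spirited Away", "City of God", "Howl's Moving Castle"]
# r3 from the slides extracts nothing on page2: precise, but it misses a value.
p3, r3 = precision_recall(["Spirited Away", "City of God", None], truth)
# r2 from the slides extracts wrong values on page1 and page2.
p2, r2 = precision_recall(["Spirited Away", "-", "9.3"], truth)
```

A rule that returns null on some pages keeps its precision but loses recall, which is the kind of loss the sampling results attribute to a biased sample.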
  • 50. Results: Dynamic Expressiveness
        Strategy   #MQ (SRM off)   #MQ (SRM on)   % MQ saved   P (SRM on)   R (SRM on)
        RANDOM     379             190            50%          0.998        0.977
        GREEDY     398             169            58%          0.998        0.983
        LUCKY      196             132            33%          0.996        0.995
        ENTROPY    205             116            44%          0.998        0.99
      Dynamic Expressiveness saves a lot of queries.
      Small quality loss: in some cases the expressiveness is not expanded when it is needed.
  • 53. Results: Dynamic Expressiveness
      (Two plots: Static Expressiveness and Dynamic Expressiveness, each with axis "# candidate rules")
      "Simple" attributes: complex algorithms are not needed.
      "Complex" attributes: Entropy, Lucky and Dynamic Expressiveness save a lot of queries.
  • 54. Future development
      Noisy Crowds: workers' mistakes vs task redundancy*
        • How to evaluate the accuracy of the worker?
        • Another query or another worker?
      Same learning framework, different problems: NLP, Crawling
      *Demo. Title: ALFRED: Crowd Assisted Data Extraction. When: Tomorrow 17h. Where: Imperial Room
  • 55. Thank you for your attention!
  • 56. Redundancy
      (Three plots of P(r1), P(r2), P(r3) against # MQ, titled: Accurate Worker, Not Accurate Worker, Many Workers)
  • 64. Sampling & Quality
      2M pages from IMDB: we have to work with a sample set, but selecting the right sample set is crucial.
      Not all pages look like the pages about famous movies.
      (Diagram labels: Inference algorithm, Wrapper, DB)
  • 68. Sampling & Quality
      With page0 only: r1 = r2 = r3
              page0
        r1    Spirited Away
        r2    Spirited Away
        r3    Spirited Away
      With page0 and page1: r1 = r3 != r2
              page0           page1
        r1    Spirited Away   City of God
        r2    Spirited Away   -
        r3    Spirited Away   City of God
      With page0, page1 and page2: r1 != r3 != r2
              page0           page1         page2
        r1    Spirited Away   City of God   Howl’s Moving Castle
        r2    Spirited Away   -             9.3
        r3    Spirited Away   City of God   null
      Pages make apparent the differences among the rules.
      Find a small set that makes apparent the same differences observed in the whole set of pages*
  • 69. Sampling & Quality
      The problem: find the smallest set of pages that makes apparent the differences among the rules (e.g., 100 pages that make apparent the same differences that we would observe in 2M pages).
      It is an NP-hard problem!
      Reduction to the SET-COVER problem: find the smallest set of pages that covers all the groups of rules (group = equivalent rules).
      The smallest set is not needed: a greedy algorithm, O(|Pages|) in time and O(1) in space, works very well in practice.
  • 70. Sampling & Quality
      Input: XPath rules
        For every page p:
          if (p makes apparent new differences)
            representative pages += p
      An offline algorithm that can be easily parallelized.
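The greedy pass described on the two slides above can be sketched as a partition refinement. This is a minimal sketch, not the authors' code: it assumes that "makes apparent new differences" means "splits at least one group of rules that agreed on every page kept so far", and the function name and the fourth example page are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Page = Tuple[str, Dict[str, Optional[str]]]  # (page id, value extracted by each rule)


def representative_pages(pages: Iterable[Page]) -> Tuple[List[str], List[List[str]]]:
    """Single pass over the pages: keep a page only if it separates some rules."""
    groups: Optional[List[List[str]]] = None  # rules still indistinguishable so far
    selected: List[str] = []
    for page_id, values in pages:
        if groups is None:
            groups = [sorted(values)]  # initially all rules look equivalent
        refined: List[List[str]] = []
        for group in groups:
            by_value: Dict[Optional[str], List[str]] = {}
            for rule in group:
                by_value.setdefault(values[rule], []).append(rule)
            refined.extend(by_value.values())
        if len(refined) > len(groups):  # the page makes a new difference apparent
            selected.append(page_id)
            groups = refined
    return selected, groups or []


pages = [
    ("page0", {"r1": "Spirited Away", "r2": "Spirited Away", "r3": "Spirited Away"}),
    ("page1", {"r1": "City of God", "r2": "-", "r3": "City of God"}),
    ("page2", {"r1": "Howl's Moving Castle", "r2": "9.3", "r3": None}),
    ("page3", {"r1": "Ran", "r2": "8.2", "r3": "Ran"}),  # hypothetical extra page
]
selected, groups = representative_pages(pages)
```

On the example from the slides, page0 separates nothing, page1 splits r2 from r1 and r3, and page2 splits r1 from r3; the extra page adds no new difference and is skipped. The only state kept between pages is the partition of the rules, whose size does not depend on the number of pages.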
  • 71. Results: Sampling
      Three sample sets:
        • Biased: pages collected by crawling the website
        • Random: pages randomly picked from the whole set of pages
        • Representative: pages collected by our sampling algorithm
  • 72. Results: Sampling Entity Sampling |Pages| P R Movies Biased 250 0.98 0.71 Movies Random 250 0.99 0.99Movies Representative 42 1.00 1.00 Actors Biased 250 1.00 1.00 Actors Random 250 1.00 0.96Actors Representative 30 1.00 1.00 Stocks Biased 86 1.00 0.98 Stocks Random 86 1.00 0.99Stocks Representative 15 1.00 1.00 Albums Biased 258 1.00 0.99 Albums Random 258 1.00 1.00Albums Representative 59 1.00 1.00 Bands Biased 289 1.00 0.68 Bands Random 289 1.00 1.00Bands Representative 36 1.00 1.00
  • 73. Results: Sampling Entity Sampling |Pages| P R Movies Biased 250 0.98 0.71 Movies Random 250 0.99 0.99Movies Representative 42 1.00 1.00 Actors Biased 250 1.00 1.00 Actors Random 250 1.00 0.96Actors Representative 30 1.00 1.00 Stocks Biased 86 1.00 0.98 Stocks Random 86 1.00 0.99Stocks Representative 15 1.00 1.00 Albums Biased 258 1.00 0.99 Albums Random 258 1.00 1.00Albums Representative 59 1.00 1.00 Bands Biased 289 1.00 0.68 Bands Random 289 1.00 1.00Bands Representative 36 1.00 1.00 Representative perfect
  • 75. Results: Sampling (same table as slide 72) Biased: recall loss (R = 0.71 for Movies, R = 0.68 for Bands).
  • 77. Results: Sampling (same table as slide 72) Random: better than biased overall (recall between 0.96 and 1.00, against 0.68 to 1.00 for biased).
  • 78. State of the Art: Supervised Wrapper Induction • 2006 - Interactive wrapper generation with minimal user effort. U. Irmak et al. WWW • 2006 - Active learning with multiple views. I. Muslea et al. JAIR
  • 79. State of the Art: Unsupervised Wrapper Induction • 2008 - Wrapper inference for ambiguous web pages. V. Crescenzi and P. Merialdo. JAAI • 2005 - Web Data Extraction Based on Partial Tree Alignment. Y. Zhai and B. Liu. WWW
  • 80. State of the Art: Automatic Annotators • 2012 - DIADEM. T. Furche and G. Gottlob. WWW • 2011 - Automatic wrappers for large scale web extraction. N. N. Dalvi et al. VLDB