Web Data Truthfulness Analysis
Weiyi Meng
Department of Computer Science
State University of New York at Binghamton
meng@cs.binghamton.edu
(RenDa Office: 235)
http://www.cs.binghamton.edu/~meng/meng.html
July 2015
Where in the World is SUNY Binghamton?
 Located in Binghamton
in New York state
 Birthplace of IBM
(Endicott, NY)
 Metro population:
249,000
 One of the safest
U.S. midsized cities
 Low cost of living
(12% below U.S.
average)
 Close to major cities
Binghamton at a Glance
The SUNY System
 64 campuses
(two-year, four-year)
 Four PhD-granting
“University Centers”
 Albany
 Binghamton
 Buffalo
 Stony Brook
Rankings
 #1 best public college in
New York and #18 in the
nation
 — AMERICAN CITY BUSINESS JOURNAL, 2015
 #4 best value among
nation’s public colleges for
out-of-state and
international students and
#15 overall
 — KIPLINGER’S PERSONAL FINANCE MAGAZINE, 2014
 Top 5 best values in the
U.S.A. among public
colleges
 — KIPLINGER’S PERSONAL FINANCE MAGAZINE AND
THE PRINCETON REVIEW, 2014
 38th of top U.S.A. public
universities
 — U.S. NEWS & WORLD REPORT, 2014
 #5 of 50 best-rated, most
affordable colleges for
international students
 — GREAT VALUE COLLEGES, 2014
 Premier Public University
in Northeast U.S.A.
 — FISKE GUIDE TO COLLEGES
The University Main Campus
Talk Outline
 Introduction
 Structured Web data quality analysis and truth finding
 Fact statement truthfulness analysis
 Additional Research Issues
Web Data Quality
 Quality of data can be measured in different ways:
 Correctness or truthfulness
 Freshness
 Completeness
 Objectivity
 Writing quality/style
 Appropriateness
 ……
 This talk will focus mostly on correctness/truthfulness.
Structured Web Data Quality Analysis
and Truth Finding
 This part of the talk is based on the following two works in
collaboration with AT&T Labs-Research:
 Xian Li, Xin Luna Dong, Kenneth Lyons, Weiyi Meng, Divesh
Srivastava. Truth Finding on the Deep Web: Is the Problem
Solved? VLDB 2013.
 Xian Li, Xin Luna Dong, Kenneth Lyons, Weiyi Meng, Divesh
Srivastava. Scaling Up Copy Detection. ICDE 2015.
 Why these two domains?
Belief that the data are fairly clean
Data quality can have a big impact on people’s lives
 Resolved heterogeneity at schema level and instance
level
Study on Two Domains
Domain | #Sources | Period | #Objects | #Local-attrs | #Global-attrs | Considered items
Stock | 55 | 7/2011 | 1000*20 | 333 | 153 | 16000*20
Flight | 38 | 12/2011 | 1200*31 | 43 | 15 | 7200*31
Study on Two Domains
 Stock
 Search “stock price quotes” and “AAPL quotes”
 Sources: 200 (search results) → 89 (deep web) → 76 (GET method) → 55 (non-JavaScript)
 1000 “Objects”: a stock with a particular symbol on a particular day
 30 from Dow Jones Index
 100 from NASDAQ100 (3 overlaps)
 873 from Russell 3000
 Attributes: 333 (local) → 153 (global) → 21 (provided by > 1/3 of sources) → 16 (no change after market close)
Domain | #Sources | Period | #Objects | #Local-attrs | #Global-attrs | Considered items
Stock | 55 | 7/2011 | 1000*20 | 333 | 153 | 16000*20
Flight | 38 | 12/2011 | 1200*31 | 43 | 15 | 7200*31
Study on Two Domains
 Flight
 Search “flight status”
 Sources: 38
 3 airline websites (AA, UA, Continental)
 8 airport websites (SFO, DEN, etc.)
 27 third-party websites (Orbitz, Travelocity, etc.)
 1200 “Objects”: a flight with a particular flight number on a
particular day from a particular departure city
 Departing or arriving at the hub airports of AA/UA/Continental
 Attributes: 43 (local) → 15 (global) → 6 (provided by > 1/3 of sources)
 scheduled dept/arr time, actual dept/arr time, dept/arr gate
Domain | #Sources | Period | #Objects | #Local-attrs | #Global-attrs | Considered items
Stock | 55 | 7/2011 | 1000*20 | 333 | 153 | 16000*20
Flight | 38 | 12/2011 | 1200*31 | 43 | 15 | 7200*31
Q1. Are There a Lot of Redundant Data on
the Deep Web?
Q2. Are the Data Consistent?
Inconsistency on 70% of the data items
Even with a tolerance of up to 1% difference
Why Such Inconsistency?
— I. Semantic Ambiguity
[Screenshots: for the same stock, Yahoo! Finance shows Day's Range 93.80-95.71 and 52wk Range 25.38-95.71, while Nasdaq shows 52 Wk 25.38-93.72]
Why Such Inconsistency?
— II. Instance Ambiguity
Why Such Inconsistency?
— III. Out-of-Date Data
[Screenshots: the same quote is timestamped 4:05 pm on one source and 3:57 pm on another]
Why Such Inconsistency?
— IV. Unit Difference
[Screenshots: the same quantity is reported as 76,821,000 on one source and as 76.82B on another]
Why Such Inconsistency?
— V. Pure Error
[Screenshots: FlightView, FlightAware and Orbitz report conflicting times for the same flight: departures of 6:15 PM, 6:15 PM and 6:22 PM; arrivals of 9:40 PM, 8:33 PM and 9:54 PM]
Why Such Inconsistency?
 Random sample of 20 data items and 5 items with the
largest #values in each domain
Q3. Is Each Source of High Accuracy?
 Not high on average: .86 for Stock and .8 for Flight
 Gold standard
 Stock: vote on data from Google Finance, Yahoo! Finance, MSN Money,
NASDAQ, Bloomberg
 Flight: from airline websites
Q3-2. Are Authoritative Sources of High
Accuracy?
 Reasonable but not so high accuracy
 Medium coverage
Q4. Is There Copying or Data Sharing
Between Deep-Web Sources?
Q4-2. Is Copying or Data Sharing Mainly
on Accurate Data?
How to Resolve Inconsistency and Find
the True Values?
The problem is known as Data Fusion.
Basic Solution: Voting
 Only 70% of the correct values are provided by over half of the sources
 Voting precision:
 .908 for Stock; i.e., wrong values for 1500 data items
 .864 for Flight; i.e., wrong values for 1000 data items
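As a rough illustration of this voting baseline, here is a minimal sketch (the data layout and values are made up for illustration; this is not the study's actual pipeline): each data item collects the values claimed by the sources, and the most frequently claimed value wins.

```python
from collections import Counter

# Toy claims: data item -> {source: claimed value}
claims = {
    ("AAPL", "2011-07-01", "volume"): {"S1": "76.82B", "S2": "76.82B", "S3": "76.80B"},
    ("UA917", "2011-12-08", "arrival"): {"S1": "9:40PM", "S2": "8:33PM", "S3": "8:33PM"},
}

def vote(claims):
    """For each data item, pick the value claimed by the largest number of sources."""
    return {item: Counter(by_source.values()).most_common(1)[0][0]
            for item, by_source in claims.items()}

print(vote(claims))
```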
Improvement I. Leveraging Source Accuracy
S1 S2 S3
Flight 1 7:02PM 6:40PM 7:02PM
Flight 2 5:43PM 5:43PM 5:50PM
Flight 3 9:20AM 9:20AM 9:20AM
Flight 4 9:40PM 9:52PM 8:33PM
Flight 5 6:15PM 6:15PM 6:22PM
Improvement I. Leveraging Source Accuracy
 Naïve voting obtains an accuracy of 80%
S1 S2 S3
Flight 1 7:02PM 6:40PM 7:02PM
Flight 2 5:43PM 5:43PM 5:50PM
Flight 3 9:20AM 9:20AM 9:20AM
Flight 4 9:40PM 9:52PM 8:33PM
Flight 5 6:15PM 6:15PM 6:22PM
Higher accuracy; more trustworthy
Improvement I. Leveraging Source Accuracy
 Considering accuracy obtains an accuracy of 100%
Challenges:
1. How to decide source accuracy?
2. How to leverage source accuracy in voting?
S1 S2 S3
Flight 1 7:02PM 6:40PM 7:02PM
Flight 2 5:43PM 5:43PM 5:50PM
Flight 3 9:20AM 9:20AM 9:20AM
Flight 4 9:40PM 9:52PM 8:33PM
Flight 5 6:15PM 6:15PM 6:22PM
Higher accuracy; more trustworthy
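A sketch of how source accuracy can be folded into voting, assuming the per-source accuracies are already known (in practice they are estimated iteratively from the fused results); the accuracies below are illustrative assumptions, not measured values.

```python
import math

def accuracy_weighted_vote(values_by_source, accuracy):
    """Score each candidate value by the summed log-odds of its providers' accuracies."""
    scores = {}
    for source, value in values_by_source.items():
        a = min(max(accuracy[source], 1e-6), 1 - 1e-6)   # keep log-odds finite
        scores[value] = scores.get(value, 0.0) + math.log(a / (1 - a))
    return max(scores, key=scores.get)

# Flight 4 from the table above: three distinct values, so naive voting cannot decide,
# but a vote weighted by (assumed) source accuracy prefers the most reliable source.
values = {"S1": "9:40PM", "S2": "9:52PM", "S3": "8:33PM"}
accuracy = {"S1": 0.95, "S2": 0.70, "S3": 0.60}
print(accuracy_weighted_vote(values, accuracy))   # -> 9:40PM
```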
Results on Stock Data (I)
 Among various methods, the Bayesian-based method (Accu) performs best
at the beginning, but in the end obtains a final precision (=recall) of .900,
worse than Vote (.908)
Results on Stock Data (II)
 AccuSim obtains a final precision of .929, higher than Vote and any other
method (around .908)
 This translates to 350 more correct values
Results on Stock Data (III)
Results on Flight Data
 Accu/AccuSim obtains a final precision of .831/.833, both lower than Vote (.857)
Copying or Data Sharing Can Happen on
Inaccurate Data
S1 S2 S3 S4 S5
Flight 1 7:02PM 6:40PM 7:02PM 7:02PM 8:02PM
Flight 2 5:43PM 5:43PM 5:50PM 5:50PM 5:50PM
Flight 3 9:20AM 9:20AM 9:20AM 9:20AM 9:20AM
Flight 4 9:40PM 9:52PM 8:33PM 8:33PM 8:33PM
Flight 5 6:15PM 6:15PM 6:22PM 6:22PM 6:22PM
 Naïve voting works only if data sources are independent.
 Considering source accuracy can be worse when there is
copying
Improvement II. Ignoring Copied Data
 It is important to detect copying and ignore copied values
in fusion
Challenges:
1. How to detect copying?
2. How to leverage copying in voting?
S1 S2 S3 S4 S5
Flight 1 7:02PM 6:40PM 7:02PM 7:02PM 8:02PM
Flight 2 5:43PM 5:43PM 5:50PM 5:50PM 5:50PM
Flight 3 9:20AM 9:20AM 9:20AM 9:20AM 9:20AM
Flight 4 9:40PM 9:52PM 8:33PM 8:33PM 8:33PM
Flight 5 6:15PM 6:15PM 6:22PM 6:22PM 6:22PM
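A toy sketch of why copied values have to be discounted (the values and the copier groups below are illustrative, not the exact example on the slide): if sources suspected of copying are collapsed into a single vote, a false value propagated by copiers no longer outweighs the independent sources.

```python
def vote_ignoring_copies(values_by_source, copy_group):
    """Count one vote per independent group of sources instead of one vote per source.
    copy_group maps a source to a representative of the group it copies from."""
    votes, groups_seen = {}, set()
    for source, value in values_by_source.items():
        group = copy_group.get(source, source)
        if group in groups_seen:
            continue                      # this group already voted; skip the copied claim
        groups_seen.add(group)
        votes[value] = votes.get(value, 0) + 1
    return max(votes, key=votes.get)

values = {"S1": "9:40PM", "S2": "9:40PM", "S3": "8:33PM", "S4": "8:33PM", "S5": "8:33PM"}
copiers = {"S4": "S3", "S5": "S3"}        # S4 and S5 are assumed to copy from S3
print(vote_ignoring_copies(values, copiers))  # naive voting picks 8:33PM; this picks 9:40PM
```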
Results on Flight Data (I)
 AccuCopy obtains a final precision of .943, much higher than Vote (.864)
 This translates to 570 more correct values
Results on Flight Data (II)
Take-Away Messages
 Web data is not fully trustable, Web sources have
different accuracy, and copying is common
 Leveraging source accuracy, copying relationships, and
value similarity can improve truth finding
Key Observations about Copy Detection
 Structured data copy detection is important and
challenging
 Useful for truth finding, protecting rights of data sources
 Copy detection can be expensive
 Prior work (Pairwise) has time complexity O(|S|² · |D| · L)
 Many opportunities for scaling up copy detection
 Avoid comparing source pairs that are likely to be independent
 Compare only a few values per source pair before making a decision
 Perform fewer comparisons in later iterations of copy detection
 Are sources S0 and S1 copying?
– Not necessarily
Copy Detection
Source China Japan S Korea N Korea Vietnam
S0 Beijing Tokyo Seoul Hanoi
S1 Beijing Tokyo Seoul Pyongyang Hanoi
S2 Xi’an Tokyo Gyeongju Kaesong Ho Chi Minh City
S3 Shanghai Tokyo Gyeongju Kaesong Ho Chi Minh City
S4 Xi’an Tokyo Gyeongju Pyongyang Ho Chi Minh City
 Are sources S2 and S3 copying?
– Very likely because sharing of many false values
without copying is very unlikely.
Copy Detection
Source China Japan S Korea N Korea Vietnam
S0 Beijing Tokyo Seoul Hanoi
S1 Beijing Tokyo Seoul Pyongyang Hanoi
S2 Xi’an Tokyo Gyeongju Kaesong Ho Chi Minh City
S3 Shanghai Tokyo Gyeongju Kaesong Ho Chi Minh City
S4 Xi’an Tokyo Gyeongju Pyongyang Ho Chi Minh City
Copy Detection: Bayesian Analysis
 Goal: compute Pr(S1~S2 | Φ) (copying) and Pr(S1⊥S2 | Φ) (independence), which sum to 1, where Φ denotes the observed data
 According to Bayes' Rule, we need Pr(Φ | S1~S2) and Pr(Φ | S1⊥S2)
 Key: compute Pr(Φ_D | S1~S2) and Pr(Φ_D | S1⊥S2) for each data item D
[Diagram: for each data item, the two sources either provide different values (Od) or the same value, which is further split into a shared TRUE value (Ot) and a shared FALSE value (Of)]
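A rough numerical sketch of the Bayesian argument (the error model, the value of n_false and the per-item likelihoods are simplifying assumptions, not the exact model from the papers): sharing a false value is far stronger evidence of copying than sharing a true value, because two independent sources rarely make the same mistake.

```python
import math

def copy_posterior(shared_values, acc1, acc2, n_false=10, prior=0.5):
    """Posterior probability of copying, from the values two sources share.
    shared_values: list of booleans, True if the shared value is correct."""
    log_odds = math.log(prior / (1 - prior))
    for value_is_true in shared_values:
        if value_is_true:
            p_independent = acc1 * acc2                        # both right independently: plausible
        else:
            p_independent = (1 - acc1) * (1 - acc2) / n_false  # same wrong value: unlikely
        p_copying = 1.0                                        # idealized: a copier always agrees
        log_odds += math.log(p_copying / max(p_independent, 1e-12))
    return 1 / (1 + math.exp(-log_odds))

print(copy_posterior([True, True], 0.9, 0.9))    # two shared true values: weak evidence
print(copy_posterior([False, False], 0.9, 0.9))  # two shared false values: near-certain copying
```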
Copy Detection: Iterative Process
[Diagram: an iterative loop that cycles through truth discovery, source accuracy computation, and copy detection (Steps 1-3) until the results stabilize]
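A minimal sketch of such an iteration, with copy detection omitted for brevity (the starting accuracy of 0.8 and the fixed number of rounds are arbitrary assumptions): truth discovery and accuracy computation feed each other.

```python
def iterative_fusion(claims, rounds=5):
    """Alternate accuracy-weighted truth discovery and source-accuracy re-estimation."""
    sources = {s for by_src in claims.values() for s in by_src}
    accuracy = {s: 0.8 for s in sources}                 # uniform starting accuracy
    truths = {}
    for _ in range(rounds):
        # Truth discovery: accuracy-weighted vote per data item.
        truths = {
            item: max(set(by_src.values()),
                      key=lambda v, b=by_src: sum(accuracy[s] for s, val in b.items() if val == v))
            for item, by_src in claims.items()
        }
        # Accuracy computation: fraction of a source's claims matching the current truths.
        for s in sources:
            judged = [v == truths[item]
                      for item, by_src in claims.items() for src, v in by_src.items() if src == s]
            accuracy[s] = sum(judged) / len(judged) if judged else 0.5
    return truths, accuracy
```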
Scaling Up Copy Detection: Opportunities
 Pairwise copy detection has time complexity O(|S|² · |D| · L)
 |S| is the number of sources
 |D| is the number of data items
 L is the number of iterations
 Many opportunities for scaling up copy detection
 Avoid comparing source pairs that are likely to be independent (e.g., pairs that share no values or only a few true values)
 Compare only a few values per source pair before making a decision
 Perform fewer comparisons in later iterations of copy detection
 Avoid comparing source pairs that are likely to be independent
– Out of 45 source pairs, 18 source pairs do not share any value
– 1 source pair shares only two true values
Scaling Up Copy Detection: Opportunity 1
Source Accu China Japan S Korea N Korea Vietnam
S0 0.99 Beijing Tokyo Seoul Hanoi
S1 0.99 Beijing Tokyo Seoul Pyongyang Hanoi
S2 0.2 Xi’an Tokyo Gyeongju Kaesong Ho Chi Minh City
S3 0.2 Shanghai Tokyo Gyeongju Kaesong Ho Chi Minh City
S4 0.4 Xi’an Tokyo Gyeongju Pyongyang Ho Chi Minh City
S5 0.6 Beijing Kyoto Busan Pyongyang Hanoi
S6 0.01 Nanjing Kyoto Wonsan Da Nang
S7 0.25 Nanjing Seoul Wonsan Da Nang
S8 0.2 Nanjing Nara Seoul Wonsan Da Nang
S9 0.99 Beijing Seoul Pyongyang
 Each inverted index entry corresponds to Data Item.Value
– Includes provider sources (at least 2)
– Includes probability of the value being true
– Includes maximum contribution score to decision of copying
Solution: Build and Use Inverted Index
Data Item.Value Pr Max Score Sources
Japan.Kyoto 0.02 4.59 S5, S6
S Korea.Gyeongju 0.01 4.12 S2, S3, S4
… … … …
Vietnam.Hanoi 0.94 0.43 S0, S1, S5
China.Beijing 0.96 0.43 S0, S1, S5, S9
 Algorithm Index processes entries in decreasing max score
order
– Accumulates scores for each source pair encountered in entry
– Adjusts scores for source pairs with different values for a data item
– Ignores source pairs that occur only in low max score entries
Solution: Build and Use Inverted Index
Data Item.Value Pr Max Score Sources
Japan.Kyoto 0.02 4.59 S5, S6
S Korea.Gyeongju 0.01 4.12 S2, S3, S4
… … … …
Vietnam.Hanoi 0.94 0.43 S0, S1, S5
China.Beijing 0.96 0.43 S0, S1, S5, S9
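A sketch of the index construction and pair-score accumulation under simplifying assumptions (the scoring function is a stand-in for the actual contribution scores, and value probabilities are taken as given rather than estimated):

```python
import math
from collections import defaultdict
from itertools import combinations

def build_index(claims, value_prob):
    """One entry per (data item, value) provided by at least two sources; entries are
    sorted by a max contribution score that is high when the shared value is likely false."""
    entries = []
    for item, by_src in claims.items():
        providers = defaultdict(list)
        for src, val in by_src.items():
            providers[val].append(src)
        for val, srcs in providers.items():
            if len(srcs) >= 2:
                p_true = value_prob.get((item, val), 0.5)
                score = -math.log(max(p_true, 1e-6))      # rarer (likely false) values score higher
                entries.append((score, item, val, sorted(srcs)))
    return sorted(entries, key=lambda e: e[0], reverse=True)

def accumulate_pair_scores(entries, threshold=1.0):
    """Scan entries in decreasing score order and accumulate evidence per source pair;
    pairs that never co-occur in any entry are never compared at all."""
    pair_scores = defaultdict(float)
    for score, _item, _val, srcs in entries:
        for pair in combinations(srcs, 2):
            pair_scores[pair] += score
    return {pair: s for pair, s in pair_scores.items() if s >= threshold}
```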
 Compare only a few values per source pair before making a decision
– Out of 4 shared values, 3 are false
– Copying can be inferred after observing only 2 false values
Scaling Up Copy Detection: Opportunity 2
Source Accu China Japan S Korea N Korea Vietnam
S0 0.99 Beijing Tokyo Seoul Hanoi
S1 0.99 Beijing Tokyo Seoul Pyongyang Hanoi
S2 0.2 Xi’an Tokyo Gyeongju Kaesong Ho Chi Minh City
S3 0.2 Shanghai Tokyo Gyeongju Kaesong Ho Chi Minh City
S4 0.4 Xi’an Tokyo Gyeongju Pyongyang Ho Chi Minh City
S5 0.6 Beijing Kyoto Busan Pyongyang Hanoi
S6 0.01 Nanjing Kyoto Wonsan Da Nang
S7 0.25 Nanjing Seoul Wonsan Da Nang
S8 0.2 Nanjing Nara Seoul Wonsan Da Nang
S9 0.99 Beijing Seoul Pyongyang
 Algorithm Bound processes entries in decreasing max
score order
– Many high-score entries with source pair → early copying
decision
– Many high-score entries with neither source & many entries
with only one of the sources → early no-copying decision
Solution: Make Early Decisions
Data Item.Value Pr Max Score Sources
Japan.Kyoto 0.02 4.59 S5, S6
S Korea.Gyeongju 0.01 4.12 S2, S3, S4
… … … …
Vietnam.Hanoi 0.94 0.43 S0, S1, S5
China.Beijing 0.96 0.43 S0, S1, S5, S9
 Perform few comparisons in later iterations of copy detection
– High likelihood for Japan.Tokyo in round 1, small change in round 2
– Terminate with no-copying decision early
Scaling Up Copy Detection: Opportunity 3
Source Accu China Japan S Korea N Korea Vietnam
S0 0.99 Beijing Tokyo Seoul Hanoi
S1 0.99 Beijing Tokyo Seoul Pyongyang Hanoi
S2 0.2 Xi’an Tokyo Gyeongju Kaesong Ho Chi Minh City
S3 0.2 Shanghai Tokyo Gyeongju Kaesong Ho Chi Minh City
S4 0.4 Xi’an Tokyo Gyeongju Pyongyang Ho Chi Minh City
S5 0.6 Beijing Kyoto Busan Pyongyang Hanoi
S6 0.01 Nanjing Kyoto Wonsan Da Nang
S7 0.25 Nanjing Seoul Wonsan Da Nang
S8 0.2 Nanjing Nara Seoul Wonsan Da Nang
S9 0.99 Beijing Seoul Pyongyang
 Algorithm Incremental processes index entries iteratively
– Copy detection depends on value probability and source accuracy
– Update scores on entries with big changes between iter i-2 and i-1
– Consider small-change entries only for source pairs whose score
changes might lead to opposite detection on copying
Solution: Incremental Copy Detection
Data Item.Value Pr Max Score Sources
Japan.Kyoto 0.02 4.59 S5, S6
S Korea.Gyeongju 0.01 4.12 S2, S3, S4
… … … …
Vietnam.Hanoi 0.94 0.43 S0, S1, S5
China.Beijing 0.96 0.43 S0, S1, S5, S9
Summary of Experimental Results
 Four data sets: Book-CS, Stock-1day, Book-full, Stock-
2wk
 Few vs many sources, data items, distinct values, index entries
 Validates efficiency of proposed algorithms using inverted
index
 Algorithm Index improves efficiency by 10-100x over Pairwise
 Early decisions + Incremental further improves efficiency by 10x
 Significant reduction in run times of later iterations of Incremental
 Validates effectiveness of proposed algorithms
 Early decisions + Incremental obtains very similar result to Pairwise
Fact Statement Truthfulness Analysis
 This part of the talk is based on the following works:
 Xian Li, Weiyi Meng, Clement Yu. T-verifier: Verifying
Truthfulness of Fact Statements. ICDE, 2011.
 Xian Li, Weiyi Meng, Clement Yu. Truthfulness Analysis of
Fact Statements Using the Web. IEEE Data Engineering
Bulletin, 34(3), September 2011.
 Xian Li, Weiyi Meng, Clement Yu, Haixun Wang. Verification
of Fact Statements with Multiple Truthful Alternatives.
Submitted.
 Liang Wang, Weiyi Meng, Wenzhu Tong, Zhiyong Peng.
Resolving Seemingly Conflicting Fact Statements Caused
by Missing Terms. Submitted.
Untruthful Information All Around the Web
Untruthful information spreads easily on the Web.
Conflicting information confuses people.
 It can mislead people, especially young students, with
incorrect knowledge.
 It can make people lose confidence in the quality of information on the Web.
 It can cause the stock market to be more volatile.
 It can cause a politician to lose an election.
 ……
Untruthful Information Is Bad
Need tools and technologies to help verify
information truthfulness!
 Fact statement: A statement that attempts to state a “fact”.
 Barack Obama is a Christian.
 China has 35 provinces.
 Opinionated statement: A statement that expresses an
opinion.
 Michelle Obama is beautiful.
 Stinky tofu is delicious.
Fact Statements vs. Opinionated Statements
We focus on fact statements in this talk.
 Given a fact statement,
 determine whether or not the statement is truthful, and
if it is not,
 identify a truthful statement most relevant to the given
statement,
using information on the Web.
Problem Statement
 Consider only fact statements (will be called doubtful
statements) with a single specified doubt unit (denoted in [
]) and a single truthful answer.
 Example: China has [35] provinces.
 doubtful statement = doubt unit + topic units
 Example: For “China has [35] provinces”,
doubt unit = “35”
topic units = {“China”, “provinces”}
Restricted Fact Statements
Overview of the T-verifier Method
[Flowchart: the doubtful statement and its doubt unit go through alternative statement generation (via Web search), producing alternative statements; these go through statement truthfulness verification (via further Web search and verification), producing the verified results]
 Generate a query from the input doubtful statement S (the “topic units” of S) and submit it to a search engine to collect information for analysis.
 Extract features from the retrieved results. Use the features to rank alternative units (alter-units) for the doubt unit.
 Form alternative statements based on highly ranked alter-units.
 Submit each alternative statement as a query to collect information for analysis.
 Extract features from the new search results and rank the alternative statements based on them.
 Basic requirements for each alternative statement AS:
 Same Topic. AS should cover the same topic as the doubtful
statement (DS).
 Different Value. AS should be different from the DS on the
doubt unit. We call the term(s) in place of the doubt unit
alternative unit or alter-unit.
 Term sense/type closeness. Each alter-unit should be close to the doubt unit in word sense, in both data type and semantic meaning.
Example: “Christian” is closer to “Muslim” than to “President”,
because both “Christian” and “Muslim” are religious people.
Alternative Statement Collection (1)
 We convert the problem of generating alternative
statements into the problem of looking for alternative units
(alter-units) and use them to generate alternative
statements.
 Two observations:
 Relevant alter-units frequently co-occur with the topic units.
 Relevant alter-units often co-occur with the doubt unit.
 when people have doubt about a fact, they often mention
other possible answers to the fact
 when people dispute a controversial point or a common misconception, they often mention their own points or the truthful fact
Alternative Statement Generation (2)
 Notations:
 Q: Query = topic units = doubtful statement – doubt units
 Let D = {r1, r2 …, rN} be the set of top-N SRRs retrieved by Q.
 Cont(r, T) returns 1 if r contains T and 0 if r does not.
 Each term/phrase T in D is a possible candidate alter-unit and
a ranking score is computed for each T.
 Seven Features are used to rank T.
Alternative Statement Generation (3)
 Feature 1 (Data type matching (DM)):
 Any relevant alter-unit should be of the same data type
as the doubt unit.
 Several data types are considered in T-verifier: date,
time, telephone number, email address, person name,
place name (e.g. name of state, city and attractions),
and number. All other values are treated as common strings.
Alternative Statement Generation (4)
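A crude illustration of this filter (the regular expressions below are simplistic assumptions, not T-verifier's actual type detectors):

```python
import re

def data_type(term):
    """Assign a coarse data type to a candidate term."""
    if re.fullmatch(r"\d{4}|\d{1,2}/\d{1,2}/\d{2,4}", term):
        return "date"
    if re.fullmatch(r"\d{1,2}:\d{2}\s?(am|pm)?", term, re.IGNORECASE):
        return "time"
    if re.fullmatch(r"[\w.+-]+@[\w-]+\.[\w.]+", term):
        return "email"
    if re.fullmatch(r"[\d,]+(\.\d+)?", term):
        return "number"
    return "string"

doubt_unit = "35"
candidates = ["34", "23", "provinces", "1997", "Beijing"]
print([c for c in candidates if data_type(c) == data_type(doubt_unit)])
# -> ['34', '23'] under these crude rules ("1997" is typed as a date, "Beijing" as a string)
```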
 Feature 2 (Sense closeness (SC)):
 Any relevant alter-unit should have a meaning (sense) related to that of the doubt unit (e.g., “Christian” is closer in meaning to “Muslim” than to “president”).
 WordNet is used to capture the sense closeness between
two different terms.
Alternative Statement Generation (5)
\[
SC(T, DU) =
\begin{cases}
\alpha, & \text{if } hyper(T, DU) = \text{true} \\
\beta, & \text{if } sibling(T, DU) = \text{true} \\
\gamma = \mathrm{wup\_similarity}(T, DU), & \text{otherwise}
\end{cases}
\]
where α and β are parameters and γ is the Wu-Palmer similarity between T and DU, computed from their distance in the WordNet hypernym tree.
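A rough sketch of the sense-closeness idea using WordNet through NLTK (the choice of first sense, the sibling test and the α/β values are simplifying assumptions; requires nltk and its WordNet corpus):

```python
from nltk.corpus import wordnet as wn   # pip install nltk; then nltk.download('wordnet')

def sense_closeness(term, doubt_unit, alpha=1.0, beta=0.9):
    """Alpha for hypernym/hyponym pairs, beta for siblings, otherwise Wu-Palmer similarity."""
    syns_t, syns_d = wn.synsets(term), wn.synsets(doubt_unit)
    if not syns_t or not syns_d:
        return 0.0
    t, d = syns_t[0], syns_d[0]              # simplification: use the first sense only
    if d in t.hypernyms() or t in d.hypernyms():
        return alpha
    if set(t.hypernyms()) & set(d.hypernyms()):
        return beta                           # share a direct hypernym, i.e., siblings
    return t.wup_similarity(d) or 0.0

print(sense_closeness("Muslim", "Christian"))
print(sense_closeness("president", "Christian"))
```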
 Feature 3 (Term local correlation (TLC)):
 The co-occurrence coefficient of T with the doubt unit –
terms that co-occur more frequently with the doubt unit
are more likely to be good alter-units.
Alternative Statement Generation (6)
 Four Textual Features:
 Feature 4 (Result coverage (RC)): The percentage of SRRs
in D that contain T – if T appears in a higher percentage of
the SRRs, it is more likely to be a good alter-unit.
Alternative Statement Generation (7)
\[
RC(T) = \frac{\sum_{i=1}^{N} Cont(r_i, T)}{N}
\]
 Textual Features (continued):
 Feature 5 (Result Query Relevance (RQR)): The relevance of
the SRRs (with respect to Q) that contain T – if T appears in
more relevant SRRs, T is more likely to be a good alter-unit.
 We consider an SRR that contains more topic units to have a
higher degree of relevance to the topic query Q.
Alternative Statement Generation (8)
\[
RQR(Q, T) = \frac{\sum_{i=1}^{N} Cont(r_i, T) \cdot |r_i \cap Q| \, / \, len(Q)}{\sum_{i=1}^{N} Cont(r_i, T)}
\]
where |r_i ∩ Q| is the number of topic-query terms that appear in r_i and len(Q) is the number of terms in Q.
 Textual Features (continued):
 Feature 6 (SRR ranking (Rrank)): The ranks of the SRRs that
contain T – if T appears in higher ranked SRRs, it is more
likely to be a good alter-unit.
Alternative Statement Generation (9)
\[
Rrank(T) = \frac{\sum_{i=1}^{N} Cont(r_i, T) \cdot (1 - pos(r_i)/N)}{\sum_{i=1}^{N} (1 - pos(r_i)/N)}
\]
where pos(r_i) is the position of r_i in the search engine’s ranking list and N is the number of SRRs considered.
 Feature 7 (Term distance (TD)): The size of the smallest window of consecutive words in each SRR (title or snippet) that covers all the topic units contained in the SRR as well as the term T – the smaller the window, the more likely T is a good alter-unit.
Alternative Statement Generation (10)
 Alter-unit generation algorithm
 Step 1: Filter the candidate terms by data type matching
(DM), i.e., only those terms that match the data type of
the doubt unit will be considered further.
 Step 2: Rank each remaining candidate term based on
the other six features. The ranking score of term T is
computed using the following formula:
aurs(T) = w1·SC + w2·TLC + w3·RC + w4·RQR + w5·Rank + w6·RTD
where wi, i = 1, ..., 6, are the weights for the features; the optimal weights are determined by a learning algorithm.
Alternative Statement Generation (11)
 Select the top-k ranked alter-units for some k to be
determined by experiments on training data.
 Replace the doubt-unit by each of the top-k alter-units to
generate k alternative statements.
Alternative Statement Generation (12)
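A toy sketch of how the textual features and a combined ranking score can be computed from a handful of SRRs (the SRR strings, the weights and the substring matching are illustrative assumptions; SC, TLC and TD are omitted):

```python
def textual_features(term, srrs, topic_units):
    """Compute RC, RQR and Rrank for one candidate term over a ranked list of SRR texts."""
    n = len(srrs)
    containing = [i for i, srr in enumerate(srrs) if term.lower() in srr.lower()]
    rc = len(containing) / n
    relevance = lambda srr: sum(t.lower() in srr.lower() for t in topic_units) / len(topic_units)
    rqr = sum(relevance(srrs[i]) for i in containing) / len(containing) if containing else 0.0
    rrank = sum(1 - i / n for i in containing) / sum(1 - i / n for i in range(n))
    return rc, rqr, rrank

def score(term, srrs, topic_units, weights=(0.4, 0.3, 0.3)):
    """Weighted combination of the three textual features (an illustrative subset of aurs)."""
    return sum(w * f for w, f in zip(weights, textual_features(term, srrs, topic_units)))

srrs = [
    "China is divided into 34 provincial-level divisions ...",
    "How many provinces does China have? 23 provinces plus other regions ...",
    "China has 34 province-level administrative units ...",
]
for candidate in ["34", "23", "35"]:
    print(candidate, round(score(candidate, srrs, ["China", "provinces"]), 3))
```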
Statement Truthfulness Verification (1)
[Diagram: search each alternative statement → use basic rankers to rank the alternative statements → merge the ranks from all basic rankers]
Step 1: Send each of the top-N ranked alternative
statements as a query to a search engine and collect
relevant SRRs.
Step 2: Employ a number of basic rankers and
generate a ranking list of the alternative statements
using each basic ranker based on newly collected
SRRs.
Step 3: Use a rank merging algorithm to merge the
rank lists into a combined final list. Select the top-
ranked statement as the truthful statement.
Determine which of the alternative statements is truthful.
Basic ranker 1: Alter-Unit Ranker (AUR)
 Rank the alternative statements in the same order as their corresponding alter-units, as produced by the alternative statement generation phase.
 This basic ranker tries to utilize the information used in the
alternative statement generation phase.
Basic ranker 2: Hits Ranker (HR)
 Submit each alternative statement to a search engine and
rank the alternative statements by the numbers of hits they
get from the search engine.
 This basic ranker implicitly assumes that each hit provides evidence that the corresponding alternative statement is correct.
Statement Truthfulness Verification (2)
Basic rankers 3-6: Text Feature Rankers (TFR)
 TFR(RC): Rank alternative statements in descending order of
the Result Coverage (RC) values of their alter-units.
 TFR(RQR): Rank alternative statements in descending order of
the Result Query Relevance (RQR) values of their alter-units.
 TFR(Rrank): Rank alternative statements in descending order
of the SRR ranking (Rrank) values of their alter-units.
 TFR(TD): Rank alternative statements in descending order of
the Term Distance (TD) values of their alter-units.
Statement Truthfulness Verification (3)
Note: A new set of SRRs is used for each alternative statement in
the above computations, different from that used in the alternative
statement generation phase.
Basic ranker 7: Domain Authority Ranker (DAR)
 Some researchers have observed that web pages
published by certain domains are more likely to be
truthful, such as “.gov”, “.edu”, etc.
 This basic ranker ranks alternative statements based on
the percentage of results retrieved from each domain and
the correctness weight of each domain using training
data.
Statement Truthfulness Verification (4)
Rank merging:
 Each basic ranker produces a ranked list of the same set
of alternative statements obtained for the same doubtful
statement.
 Rank merging is to combine the seven ranked lists into a
single ranked list of these alternative statements.
Statement Truthfulness Verification (5)
[Diagram: the ranked lists produced by basic rankers 1 through 7 are combined by rank merging into a single merged ranked list]
Seven rank merging algorithms are evaluated:
Probability Combination:
 Let Pi be the probability that the i-th basic ranker predicts truthfulness correctly. Then the overall probability that the truthfulness of a statement S is correctly determined (i.e., the probability that at least one of the basic rankers is correct in predicting the truthfulness of S) can be estimated by
\[
1 - \prod_{i=1}^{7} (1 - P_i)
\]
assuming the basic rankers are independent.
Statement Truthfulness Verification (6)
Basic Borda Count:
 Each alternative statement Si is treated as a candidate in
an election and each basic ranker BRj is treated as a voter
in the election.
 BRj assigns n – k + 1 points to Si if Si is ranked k-th by
BRj, where n is the number of alternative statements being
ranked.
 The total points assigned to Si is the sum of all points
assigned to Si by all basic rankers.
 Rank all alternative statements in descending order of the
total points they received.
Statement Truthfulness Verification (7)
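A small sketch of Borda-style merging (the ranked lists below are made up; passing weights of 1 gives the basic variant, other weights give the weighted variant described later):

```python
from collections import defaultdict

def borda_merge(ranked_lists, weights=None):
    """Merge ranked lists of the same candidates by (weighted) Borda count."""
    if weights is None:
        weights = [1.0] * len(ranked_lists)
    n = len(ranked_lists[0])
    points = defaultdict(float)
    for ranking, w in zip(ranked_lists, weights):
        for position, candidate in enumerate(ranking):    # position 0 = ranked first
            points[candidate] += w * (n - position)       # i.e., n - k + 1 points for rank k
    return sorted(points, key=points.get, reverse=True)

rankings = [["S1", "S2", "S3"], ["S1", "S3", "S2"], ["S2", "S1", "S3"]]
print(borda_merge(rankings))                           # basic Borda count
print(borda_merge(rankings, weights=[0.9, 0.6, 0.3]))  # weighted Borda count
```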
Basic Condorcet algorithm:
 Define a comparison function between any two alternative
statements Si and Sj: if Si is ranked higher than Sj by more
basic rankers, then define Si > Sj (i.e., Si is ranked higher than
Sj in the merged list); else if Sj is ranked higher than Si by more
basic rankers, then define Sj > Si; else define Si = Sj.
 Sort all alternative statements based on the above comparison
function and output this sorted list as the merged result list.
Example: Consider the following three local results: R1 = (S3,
S1, S2), R2 = (S1, S2, S3), and R3 = (S2, S3, S1). In this
case, we have S1 beats S2, S2 beats S3, but S3 beats S1.
When a cycle is formed, all statements in the cycle are
considered to be equivalent.
Statement Truthfulness Verification (8)
Weighted Borda Count:
 The same as the Basic Borda algorithm except that each
basic ranker BRi is assigned a weight wi, reflecting the
quality of the basic ranker. For example, wi could be
implemented as the precision of BRi based on a training
set.
 Now the points assigned by BRi are adjusted by
multiplying them with wi.
 The rest of the steps remain the same as in the Basic
Borda algorithm.
Statement Truthfulness Verification (9)
Weighted Condorcet algorithm:
 It is the same as the Basic Condorcet algorithm except
that the comparison function is modified as follows for any
two alternative statements Si and Sj:
 if the sum of the weights of the basic rankers that rank Si
higher than Sj is larger than the sum of the weights of the
basic rankers that rank Sj higher than Si, then define Si > Sj;
 else if the sum of the weights of the basic rankers that rank
Sj higher than Si is larger than the sum of the weights of the
basic rankers that rank Si higher than Sj, then define Sj > Si;
 else define Si = Sj.
Statement Truthfulness Verification (10)
Positional Borda Count:
 Position probability: Let P_j^i denote the probability that an alternative statement ranked at the j-th position by the i-th basic ranker is truthful. P_j^i can be obtained by training.
 We can interpret P_j^i as the ranking score given by BRi to the alternative statement ranked at the j-th position.
 Rank all alternative statements in descending order of the sum of the scores (the position probabilities) they received.
Statement Truthfulness Verification (11)
Weighted Positional Borda Count:
 Let wi be the weight of the i-th basic ranker BRi.
 Let P_j^i be the probability that an alternative statement ranked at the j-th position by BRi is truthful.
 The final ranking score given by BRi to the alternative statement ranked at the j-th position is computed as wi · P_j^i.
 Rank all alternative statements in descending order of the sum of the final ranking scores they received.
Statement Truthfulness Verification (12)
 Dataset: 50 doubtful statements are compiled from
factoid questions in Q&A track in TREC 8 and TREC 9.
See http://cs.binghamton.edu/~xianli/doubtful_statements.htm.
 For each doubtful statement, the correct answer is
manually verified.
 For each doubtful statement, form a topic query after the
doubt unit is removed. Submit it to the Yahoo! search
engine and obtain the top 200 SRRs.
 Retrieve 100 results for each alternative statement.
Experiment (1)
Doubtful statement | Doubt unit | Truth
Antarctic is the only continent without a desert. | Antarctic | Europe
George C. Scott won the Oscar for best actor in 1970. | George C. Scott | George C. Scott
 Alter-unit generation algorithm evaluation
 Test 1: Randomly select 25 doubtful statements as
training set and the rest as testing set.
Experiment (2)
Rank of truthful alter-unit | Training set (25 cases) | Testing set (25 cases)
Ranked 1st | 17 | 14
Ranked 2nd | 7 | 9
Ranked 3rd | 0 | 1
Ranked 4th | 1 | 1
Ranked 5th | 0 | 0
All truthful alter-units are ranked among the top 5 results.
 Impact of using different numbers of top SRRs
Experiment (3)
Rank of truthful alter-unit | Top 10 | Top 50 | Top 100 | Top 150 | Top 200
Ranked 1st | 12 | 21 | 29 | 31 | 31
Ranked 2nd | 6 | 12 | 10 | 12 | 16
Ranked 3rd | 5 | 3 | 3 | 3 | 1
Ranked 4th | 3 | 1 | 2 | 2 | 2
Ranked 5th | 2 | 0 | 1 | 1 | 0
Not among top 5 | 22 | 13 | 5 | 0 | 0
Precision of each basic ranker
 The precision of each basic ranker is defined by
precision = n / N
n: the number of truthful statements that are ranked at the top
N: the number of doubtful statements evaluated
 The precisions of the seven basic rankers:
Experiment (4)
Ranker | AUR | TFR(TD) | TFR(RC) | TFR(RQR) | HR | DAR | TFR(Rrank)
Precision | 0.62 | 0.32 | 0.66 | 0.60 | 0.20 | 0.20 | 0.62
Truthfulness verification evaluation
 The precisions of different merging algorithms (based on
10-fold cross validation):
Experiment (5)
[Bar chart: precision of the seven merging algorithms (BaseBorda, BaseCond, ProbComb, PosBorda, WBorda, WCond, WPosBorda) on a 0 to 1 scale]
Analysis of five failed cases
Cases 4 and 5 could be considered as correct.
Experiment (6)
Untruthful statement verified as truthful | Truthful answer
1. Tom Hanks was lead actress in the movie 'Sleepless in Seattle'. | Meg Ryan
2. Apollo is the first spacecraft on the moon. | Luna2
3. Sullivan is the fastest swimmer in the world. | Michael Phelps
4. Les Paul invented the electric guitar. | Rickenbacker
5. English is the primary language of the Philippines. | Filipino
 General fact statements and questions in QA systems
are different.
 Not all fact statements can be converted to equivalent
questions, e.g., fact statements whose doubt units are
verbs.
 The doubt units in fact statements convey more information
than the WH-word in a question.
 A fact statement may have multiple doubt units but a
question can imply only one doubt unit.
Relationship with QA
System | T-verifier | Answers.com | Yahoo! Answers
Total statements | 50 | 50 | 50
Wrong results | 5 | 4 | 12
Cannot find results | 0 | 6 | 18
Correct results | 45 | 40 | 20
How does T-verifier compare with two
popular Web QA Systems?
 Example Statement Answers.com can’t find result:
 [800] people died when the Estonia sank in 1994.
 Example Statement Answers.com gives wrong result:
 [20] hexagons are on a soccer ball
 Correct answer: 20, Answers.com gives: 32
 This type of statements will be called MTA statements.
 Examples of MTA statements:
Doubtful Statements with Multiple
Truthful Alternatives
Type | Doubtful statement | Truthful alternatives
CC | Barack Obama was born in [Kenya]. | Honolulu, Hawaii, United States
MVA | [Edwin Krebs] won the Nobel Prize in medicine in 1992. | Edwin Krebs, Edmond Fischer
SFE | [Peter Ustinov] has portrayed Hercule Poirot. | Peter Ustinov, David Suchet, Agatha Christie
DSC | [U.S.] won team title at the 2011 World Gymnastics Championships. | U.S., China
TS | [Bob Dole] served as the President of the United States. | Barack Obama since Jan 2009, George Bush from Jan 2001 to Jan 2009, …
 Objective: identify all truthful alternatives for each MTA
statement.
 Significantly more challenging than processing STA
statements (doubtful statements with a single truthful
alternative).
 The step of generating and ranking alter-units can
remain the same.
 The difference is how to determine all the truthful alter-
units from the ranked candidate alter-units.
MTA Statements Processing
 Assumption: Truthful alter-units are likely ranked above
untruthful ones.
 In other words, the ranking scores of the truthful alter-units
are likely to be higher than those of untruthful ones.
 Top-k approach: Determine an appropriate integer k so
that the top-k ranked alter-units are recognized as the
truthful ones.
 The problem becomes how to determine the right k for
each MTA statement.
 This approach is a straightforward extension of the
solution for processing STA statements, which is a top-1
solution.
Top-k Approach (1)
Three different ways to select the k:
 Largest Score Gap (LSG). Compare the gap between the
scores of each pair of consecutively ranked alternative
statements and choose the largest gap as the cut-off point.
 Largest Percentage Gap (LPG). Similar to LSG except that the percentage gap of scores is used.
 The percentage gap between the ranking scores S_i and S_{i+1} of two consecutively ranked alternative statements is (S_i − S_{i+1}) / S_i.
 First Significant Gap (FSG). This method uses the first significant score gap as the cut-off point. In this work, a score gap is considered significant if log_b(S_i / S_{i+1}) > 1, and k is set to the smallest (i.e., the first) i that satisfies this condition.
Top-k Approach (2)
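A sketch of the three cut-off heuristics applied to a list of ranking scores (the scores and the log base b are made-up assumptions):

```python
import math

def largest_score_gap(scores):
    """k at the largest absolute gap between consecutive scores."""
    gaps = [scores[i] - scores[i + 1] for i in range(len(scores) - 1)]
    return gaps.index(max(gaps)) + 1

def largest_percentage_gap(scores):
    """k at the largest relative gap (S_i - S_{i+1}) / S_i."""
    gaps = [(scores[i] - scores[i + 1]) / scores[i] for i in range(len(scores) - 1)]
    return gaps.index(max(gaps)) + 1

def first_significant_gap(scores, b=2.0):
    """Smallest i with log_b(S_i / S_{i+1}) > 1; if none, keep everything."""
    for i in range(len(scores) - 1):
        if math.log(scores[i] / scores[i + 1], b) > 1:
            return i + 1
    return len(scores)

scores = [0.92, 0.88, 0.41, 0.12, 0.05]   # hypothetical alter-unit ranking scores
print(largest_score_gap(scores), largest_percentage_gap(scores), first_significant_gap(scores))
```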
Compare the three methods for selecting the k
 50 MTA statements are used.
 A total of 143 truthful alternative statements for the 50 MTA
statements, 2.8 per MTA statement on average.
Top-k Approach (3)
Method | # total selected | # truthful alternatives | Precision | Recall | F-score
LSG | 140 | 107 | 0.76 | 0.74 | 0.75
LPG | 182 | 125 | 0.68 | 0.86 | 0.76
FSG | 178 | 128 | 0.71 | 0.88 | 0.79
Conclusions:
 Top-k approach is not sufficiently good.
 Need new approach: Divide MTA statements into different
types and develop a solution for each type.
Five types of MTA statements have been identified.
 Type 1: Compatible Concepts (CC). For each MTA statement of this type, its truthful alter-units are compatible with each other. Usually, these alter-units either are equivalent to each other (i.e., synonyms), or correspond to the same basic concept but with different specificity/generality (i.e., hyponyms/hypernyms) or with different granularity (i.e., one is a part of another).
Example 1: For “Barack Obama was born in [Honolulu]”, truthful
alternatives include “Honolulu”, “Hawaii”, “United States”, etc.
Example 2: For “Queen Elizabeth II resided in [United
Kingdom]”,
correct alter-units include “United Kingdom”, “England” and
“Great Britain”.
Different Types of MTA Statements (1)
 Type 2: Multi-Valued Attributes (MVA). For each MTA
statement of this type, the truthful alter-units correspond to
different values of a multi-valued attribute in a relational
database. A multi-valued attribute may have multiple values for
a given entity (record).
Example: For “[Edwin Krebs] won the Nobel Prize in medicine in 1992”, the two US biochemists “Edwin Krebs” and “Edmond Fischer” shared the 1992 Nobel Prize in medicine (they are values of the multi-valued attribute “Recipients” of a Nobel Prize record); therefore both of them are truthful alter-units.
Different Types of MTA Statements (2)
 Type 3: Shared-Feature Entities (SFE). Many entities share a
common feature. In this case, when trying to find entities that
have this feature, multiple entities may be detected.
Example: For “[Peter Ustinov] has portrayed Hercule Poirot”, all
actors who have portrayed Hercule Poirot in different movies
(including remakes) are truthful alter-units. The truthful alter-
units include Peter Ustinov, David Suchet and Agatha Christie.
 Difference between the SFE type and the MVA type: For the
former, multiple entities share a common feature, while for the
latter, the same entity has multiple values for an attribute.
 In reality, values of the same multi-valued attribute are much
more likely to co-occur compared to the entities that share the
same feature.
Different Types of MTA Statements (3)
 Type 4: Different Sub-Categories (DSC). Different sub-
categories related to some of the topic units of the doubtful
statement. A topic unit of a doubtful statement DS is a
term/phrase in DS different from the doubt unit. When a topic
unit is replaced by different sub-categories, the doubt unit may
have different truthful alter-units, making the doubtful statement
an MTA statement.
Example: For “[U.S.] won team title at the 2011 World Gymnastics Championships”, the topic unit “World” could be replaced by “US”, “Europe”, etc., and “team” could be replaced by “men’s team” or “women’s team”. Each replacement can lead to a different truthful alter-unit.
Different Types of MTA Statements (4)
 Type 5: Time-Sensitive (TS).
 A fact statement is time-sensitive if its truthfulness changes over time, either with a fairly regular time pattern or in a way that is expected even though there is no such regularity. Since there may be different truthful alter-units at different times for this type of statement, such statements are MTA statements.
Example: A time-sensitive statement with a fairly regular pattern concerns US presidents (who serve 4 or 8 years).
Example: A time-sensitive statement with less regularity is “A
new world 100-meter track record was established by [Usain
Bolt]” because it is generally not predictable who will establish a
new record in the future.
Different Types of MTA Statements (5)
 We recently completed a paper on processing doubtful
statements of CC and MVA types.
 For statements of the CC type, we primarily explored
semantic relationships (such as synonymy and hypernymy
relationships) among alter-units.
 For statements of the MVA type, we primarily explored co-
occurrence-based correlation relationships among alter-
units.
Processing CC & MVA Statements
 Sometimes, seemingly conflicting fact statements (SCFSs)
do not mean actual conflicts.
 Consider the following two statements:
S1: China won the team title at the 2011 World
Gymnastics Championships in Tokyo.
S2: USA won the team title at the 2011 World Gymnastics
Championships in Tokyo.
Resolving Seemingly Conflicting Fact
Statements Caused by Missing Terms
 The seeming conflict is caused by missing terms that make the statements imprecise.
 With the missing terms added, we have the following
precise statements:
S1: China won the men’s team title at the 2011 World
Gymnastics Championships in Tokyo.
S2: USA won the women’s team title at the 2011 World
Gymnastics Championships in Tokyo.
Resolving Seemingly Conflicting Fact
Statements Caused by Missing Terms
 General problem: Given a set of seemingly conflicting fact statements, determine whether the seeming conflict is caused by missing terms by finding those missing terms. We also determine the appropriate position for the missing term in each statement.
 The missing terms often create sub-categories of a
category.
Example: “men’s team title” and “women’s team title” are
sub-categories of “team title”.
 So the solution to this problem can be used to process
MTA-statements of the DSC type.
Resolving Seemingly Conflicting Fact
Statements Caused by Missing Terms
 Current data fusion techniques assume that there is a
single true value for each data item. How to handle
situations where there may be multiple true values for
some data items?
 Data inconsistency may have different causes. How to
automatically identify these causes and apply appropriate
methods to resolve them?
Additional Research Issues (1)
 How to automatically classify MTA statements into different
types?
 The same statement may be classified into multiple types.
 How to process the other two types of MTA statements,
namely SFE and TS?
 If a statement is classified into multiple types, it may be
processed using different techniques. How to combine the
results of these different techniques?
Additional Research Issues (2)
 How to handle doubtful statements that have no specified
doubt unit(s)? In other words, how to automatically
determine the most worthy doubt unit(s) from a given fact
statement?
Example: For the statement “More than 100,000 people were killed by the 2008 Sichuan earthquake”, candidate doubt units include “100,000”, “2008” and “Sichuan”.
Additional Research Issues (3)
 How to process all fact statements on the Web?
Challenges:
 How to identify controversial fact statements so we can
focus on these statements only?
 How to use fact statements that have already been
verified to verify or de-verify other fact statements?
Additional Research Issues (4)
 How to apply natural language processing (NLP) to
determine the truthfulness of fact statements?
Example: Consider the following sentences related to the fact statement “SUNY has 64 campuses”:
1. I believe SUNY has 64 campuses.
2. I strongly believe SUNY has 64 campuses.
3. I am sure SUNY has 64 campuses.
4. I am sure SUNY has 64 campuses because this is
mentioned at the SUNY website.
Additional Research Issues (5)
 How to identify and resolve time-sensitive data
inconsistency in fact statements?
 The problem is mostly caused by the use of relative time
references such as yesterday, last week, 2 years ago, …
 Need to establish the correct reference point for each
relative time in a data source.
Additional Research Issues (6)
Questions?
Weiyi meng web data truthfulness analysis

  • 1.
    Web Data TruthfulnessAnalysis Weiyi Meng Department of Computer Science State University of New York at Binghamton meng@cs.binghamton.edu (RenDa Office: 235) http://www.cs.binghamton.edu/~meng/meng.html July 2015 JUNE 16, 2016 1
  • 2.
    Where in theWorld is SUNY Binghamton?  Located in Binghamton in New York state  Birthplace of IBM (Endicott, NY)  Metro population: 249,000  One of the safest U.S. midsized cities  Low cost of living (12% below U.S. average)  Close to major cities JUNE 16, 2016 2
  • 3.
    Binghamton at aGlance JUNE 16, 2016 3
  • 4.
    The SUNY System 64 campuses (two-year, four-year)  Four PhD-granting “University Centers”  Albany  Binghamton  Buffalo  Stony Brook JUNE 16, 2016 4
  • 5.
    Rankings  #1 bestpublic college in New York and #18 in the nation  — AMERICAN CITY BUSINESS JOURNAL, 2015  #4 best value among nation’s public colleges for out-of-state and international students and #15 overall  — KIPLINGER’S PERSONAL FINANCE MAGAZINE, 2014  Top 5 best values in the U.S.A. among public colleges  — KIPLINGER’S PERSONAL FINANCE MAGAZINE AND THE PRINCETON REVIEW, 2014  38th of top U.S.A. public universities  — U.S. NEWS & WORLD REPORT, 2014  #5 of 50 best-rated, most affordable colleges for international students  — GREAT VALUE COLLEGES, 2014  Premier Public University in Northeast U.S.A.  — FISKE GUIDE TO COLLEGES JUNE 16, 2016 5
  • 6.
    6 The University MainCampus  J U L Y 2 0 1 5 JUNE 16, 2016
  • 7.
    Talk Outline  Introduction Structured Web data quality analysis and truth finding  Fact statement truthfulness analysis  Additional Research Issues JUNE 16, 2016 7
  • 8.
    Web Data Quality Quality of data can be measured in different ways:  Correctness or truthfulness  Freshness  Completeness  Objectiveness  Writing quality/style  Appropriateness  ……  This talk will focus mostly on correctness/truthfulness. JUNE 16, 2016 8
  • 9.
    Structured Web DataQuality Analysis and Truth Finding  This part of the talk is based on the following two works in collaboration with AT&T Labs-Research:  Xian Li, Xin Luna Dong, Kenneth Lyons, Weiyi Meng, Divesh Srivastava. Truth Finding on the Deep WEB: Is the Problem Solved? VLDB 2013.  Xian Li, Xin Luna Dong, Kenneth Lyons, Weiyi Meng, Divesh Srivastava. Scaling Up Copy Detection. ICDE 2015. JUNE 16, 2016 9
  • 10.
     Why thesetwo domains? Belief of fairly clean data Data quality can have big impact on people’s lives  Resolved heterogeneity at schema level and instance level Study on Two Domains #Sources Period #Objects #Local- attrs #Global- attrs Considered items Stock 55 7/2011 1000*20 333 153 16000*20 Flight 38 12/2011 1200*31 43 15 7200*31 JUNE 16, 2016 10
  • 11.
    Study on TwoDomains  Stock  Search “stock price quotes” and “AAPL quotes”  Sources: 200 (search results)89 (deep web)76 (GET method) 55 (none JavaScript)  1000 “Objects”: a stock with a particular symbol on a particular day  30 from Dow Jones Index  100 from NASDAQ100 (3 overlaps)  873 from Russell 3000  Attributes: 333 (local)  153 (global)  21 (provided by > 1/3 sources)  16 (no change after market close) #Sources Period #Objects #Local- attrs #Global- attrs Considered items Stock 55 7/2011 1000*20 333 153 16000*20 Flight 38 12/2011 1200*31 43 15 7200*31 JUNE 16, 2016 11
  • 12.
    Study on TwoDomains  Flight  Search “flight status”  Sources: 38  3 airline websites (AA, UA, Continental)  8 airport websites (SFO, DEN, etc.)  27 third-party websites (Orbitz, Travelocity, etc.)  1200 “Objects”: a flight with a particular flight number on a particular day from a particular departure city  Departing or arriving at the hub airports of AA/UA/Continental  Attributes: 43 (local)  15 (global)  6 (provided by > 1/3 sources)  scheduled dept/arr time, actual dept/arr time, dept/arr gate #Sources Period #Objects #Local- attrs #Global- attrs Considered items Stock 55 7/2011 1000*20 333 153 16000*20 Flight 38 12/2011 1200*31 43 15 7200*31 JUNE 16, 2016 12
  • 13.
    Q1. Are Therea Lot of Redundant Data on the Deep Web?  JUNE 16, 2016 13
  • 14.
    Q2. Are theData Consistent? Inconsistency on 70% data items Tolerance to 1% difference  JUNE 16, 2016 14
  • 15.
    Why Such Inconsistency? —I. Semantic Ambiguity Yahoo! Finance Nasdaq Day’s Range: 93.80-95.71 52wk Range: 25.38-95.71 52 Wk: 25.38-93.72 JUNE 16, 2016 18
  • 16.
    Why Such Inconsistency? —II. Instance Ambiguity JUNE 16, 2016 19
  • 17.
    Why Such Inconsistency? —III. Out-of-Date Data 4:05 pm 3:57 pm JUNE 16, 2016 20
  • 18.
    Why Such Inconsistency? —IV. Unit Difference 76,821,000 76.82B JUNE 16, 2016 21
  • 19.
    Why Such Inconsistency? —V. Pure Error FlightView FlightAware Orbitz 6:15 PM 6:15 PM 6:22 PM 9:40 PM 8:33 PM 9:54 PM JUNE 16, 2016 22
  • 20.
    Why Such Inconsistency? Random sample of 20 data items and 5 items with the largest #values in each domain JUNE 16, 2016 23
  • 21.
    Q3. Is EachSource of High Accuracy?  Not high on average: .86 for Stock and .8 for Flight  Gold standard  Stock: vote on data from Google Finance, Yahoo! Finance, MSN Money, NASDAQ, Bloomberg  Flight: from airline websites  JUNE 16, 2016 24
  • 22.
    Q3-2. Are AuthoritativeSources of High Accuracy?  Reasonable but not so high accuracy  Medium coverage  JUNE 16, 2016 25
  • 23.
    Q4. Is ThereCopying or Data Sharing Between Deep-Web Sources?  JUNE 16, 2016 26
  • 24.
    Q4-2. Is Copyingor Data Sharing Mainly on Accurate Data?  JUNE 16, 2016 27
  • 25.
    How to ResolveInconsistency and Find the True Values? The problem is known as Data Fusion. JUNE 16, 2016 28
  • 26.
    Basic Solution: Voting Only 70% correct values are provided by over half of the sources  Voting precision:  .908 for Stock; i.e., wrong values for 1500 data items  .864 for Flight; i.e., wrong values for 1000 data items JUNE 16, 2016 29
  • 27.
    Improvement I. LeveragingSource Accuracy S1 S2 S3 Flight 1 7:02PM 6:40PM 7:02PM Flight 2 5:43PM 5:43PM 5:50PM Flight 3 9:20AM 9:20AM 9:20AM Flight 4 9:40PM 9:52PM 8:33PM Flight 5 6:15PM 6:15PM 6:22PM JUNE 16, 2016 30
  • 28.
    Improvement I. LeveragingSource Accuracy  Naïve voting obtains an accuracy of 80% S1 S2 S3 Flight 1 7:02PM 6:40PM 7:02PM Flight 2 5:43PM 5:43PM 5:50PM Flight 3 9:20AM 9:20AM 9:20AM Flight 4 9:40PM 9:52PM 8:33PM Flight 5 6:15PM 6:15PM 6:22PM Higher accuracy; More trustable JUNE 16, 2016 31
  • 29.
    Improvement I. LeveragingSource Accuracy  Considering accuracy obtains an accuracy of 100% Challenges: 1. How to decide source accuracy? 2. How to leverage source accuracy in voting? S1 S2 S3 Flight 1 7:02PM 6:40PM 7:02PM Flight 2 5:43PM 5:43PM 5:50PM Flight 3 9:20AM 9:20AM 9:20AM Flight 4 9:40PM 9:52PM 8:33PM Flight 5 6:15PM 6:15PM 6:22PM Higher accuracy; More trustable JUNE 16, 2016 32
  • 30.
    Results on StockData (I)  Among various methods, the Bayesian-based method (Accu) performs best at the beginning, but in the end obtains a final precision (=recall) of .900, worse than Vote (.908) JUNE 16, 2016 33
  • 31.
    Results on StockData (II)  AccuSim obtains a final precision of .929, higher than Vote and any other method (around .908)  This translates to 350 more correct values JUNE 16, 2016 34
  • 32.
    Results on StockData (III) JUNE 16, 2016 35
  • 33.
    Results on FlightData  Accu/AccuSim obtains a final precision of .831/.833, both lower than Vote (.857) JUNE 16, 2016 36
  • 34.
    Copying or DataSharing Can Happen on Inaccurate Data JUNE 16, 2016 37
  • 35.
    S1 S2 S3S4 S5 Flight 1 7:02PM 6:40PM 7:02PM 7:02PM 8:02PM Flight 2 5:43PM 5:43PM 5:50PM 5:50PM 5:50PM Flight 3 9:20AM 9:20AM 9:20AM 9:20AM 9:20AM Flight 4 9:40PM 9:52PM 8:33PM 8:33PM 8:33PM Flight 5 6:15PM 6:15PM 6:22PM 6:22PM 6:22PM  Naïve voting works only if data sources are independent. JUNE 16, 2016 38  Considering source accuracy can be worse when there is copying
  • 36.
    Improvement II. IgnoringCopied Data  It is important to detect copying and ignore copied values in fusion Challenges: 1. How to detect copying? 2. How to leverage copying in voting? S1 S2 S3 S4 S5 Flight 1 7:02PM 6:40PM 7:02PM 7:02PM 8:02PM Flight 2 5:43PM 5:43PM 5:50PM 5:50PM 5:50PM Flight 3 9:20AM 9:20AM 9:20AM 9:20AM 9:20AM Flight 4 9:40PM 9:52PM 8:33PM 8:33PM 8:33PM Flight 5 6:15PM 6:15PM 6:22PM 6:22PM 6:22PM JUNE 16, 2016 39
  • 37.
    Results on FlightData (I)  AccuCopy obtains a final precision of .943, much higher than Vote (.864)  This translates to 570 more correct values JUNE 16, 2016 40
  • 38.
    Results on FlightData (II) JUNE 16, 2016 41
  • 39.
    Take-Away Messages  Webdata is not fully trustable, Web sources have different accuracy, and copying is common  Leveraging source accuracy, copying relationships, and value similarity can improve truth finding JUNE 16, 2016 42
  • 40.
    Key Observations aboutCopy Detection  Structured data copy detection is important and challenging  Useful for truth finding, protecting rights of data sources  Copy detection can be expensive  Prior work (Pairwise) has time complexity O(|S|2 |D| L)  Many opportunities for scaling up copy detection  Avoid comparing source pairs that are likely to be independent  Compare few values for source pairs before making decision  Perform few comparisons in latter iterations of copy detection 43 JUNE 16, 2016
  • 41.
     Are sourcesS0 and S1 copying? – Not necessarily Copy Detection 44 Source China Japan S Korea N Korea Vietnam S0 Beijing Tokyo Seoul Hanoi S1 Beijing Tokyo Seoul Pyongyang Hanoi S2 Xi’an Tokyo Gyeongju Kaesong Ho Chi Minh City S3 Shanghai Tokyo Gyeongju Kaesong Ho Chi Minh City S4 Xi’an Tokyo Gyeongju Pyongyang Ho Chi Minh City JUNE 16, 2016
  • 42.
     Are sourcesS2 and S3 copying? – Very likely because sharing of many false values without copying is very unlikely. Copy Detection 45 Source China Japan S Korea N Korea Vietnam S0 Beijing Tokyo Seoul Hanoi S1 Beijing Tokyo Seoul Pyongyang Hanoi S2 Xi’an Tokyo Gyeongju Kaesong Ho Chi Minh City S3 Shanghai Tokyo Gyeongju Kaesong Ho Chi Minh City S4 Xi’an Tokyo Gyeongju Pyongyang Ho Chi Minh City JUNE 16, 2016
  • 43.
    Copy Detection: BayesianAnalysis  Goal: Pr(S1S2| Ф), Pr(S1S2| Ф) (sum = 1)  According to Bayes Rule, we need Pr(Ф|S1S2), Pr(Ф|S1S2)  Key: compute Pr(ФD|S1S2), Pr(ФD|S1S2), for each D  S1  S2 46 Different Values Od TRUE Ot S1  S2 FALSE Of Same Values JUNE 16, 2016
  • 44.
    Copy Detection: IterativeProcess 47 Truth Discovery Accuracy Computation Copy Detection Step 1Step 3 Step 2 JUNE 16, 2016
  • 45.
    Scaling Up CopyDetection: Opportunities  Pairwise copy detection has time complexity O(|S|2.|D|.L)  |S| is the number of sources  |D| is the number of data items  L is the number of iterations  Many opportunities for scaling up copy detection  Avoid comparing source pairs that are likely to be independent (e.g., share no values or very few true values)  Compare few values for source pairs before making decision  Perform few comparisons in latter iterations of copy detection 48 JUNE 16, 2016
  • 46.
     Avoid comparingsource pairs that are likely to be independent – Out of 45 source pairs, 18 source pairs do not share any value – 1 source pair shares only two true values Scaling Up Copy Detection: Opportunity 1 49 Source Accu China Japan S Korea N Korea Vietnam S0 0.99 Beijing Tokyo Seoul Hanoi S1 0.99 Beijing Tokyo Seoul Pyongyang Hanoi S2 0.2 Xi’an Tokyo Gyeongju Kaesong Ho Chi Minh City S3 0.2 Shanghai Tokyo Gyeongju Kaesong Ho Chi Minh City S4 0.4 Xi’an Tokyo Gyeongju Pyongyang Ho Chi Minh City S5 0.6 Beijing Kyoto Busan Pyongyang Hanoi S6 0.01 Nanjing Kyoto Wonsan Da Nang S7 0.25 Nanjing Seoul Wonsan Da Nang S8 0.2 Nanjing Nara Seoul Wonsan Da Nang S9 0.99 Beijing Seoul Pyongyang JUNE 16, 2016
  • 47.
     Each invertedindex entry corresponds to Data Item.Value – Includes provider sources (at least 2) – Includes probability of the value being true – Includes maximum contribution score to decision of copying Solution: Build and Use Inverted Index 50 Data Item.Value Pr Max Score Sources Japan.Kyoto 0.02 4.59 S5, S6 S Korea.Gyeongju 0.01 4.12 S2, S3, S4 … … … … Vietnam.Hanoi 0.94 0.43 S0, S1, S5 China.Beijing 0.96 0.43 S0, S1, S5, S9 JUNE 16, 2016
  • 48.
     Algorithm Indexprocesses entries in decreasing max score order – Accumulates scores for each source pair encountered in entry – Adjusts scores for source pairs with different values for a data item – Ignores source pairs that occur only in low max score entries Solution: Build and Use Inverted Index 51 Data Item.Value Pr Max Score Sources Japan.Kyoto 0.02 4.59 S5, S6 S Korea.Gyeongju 0.01 4.12 S2, S3, S4 … … … … Vietnam.Hanoi 0.94 0.43 S0, S1, S5 China.Beijing 0.96 0.43 S0, S1, S5, S9 JUNE 16, 2016
  • 49.
Scaling Up Copy Detection: Opportunity 2
 Compare only a few values for a source pair before making a decision
 – Out of 4 shared values, 3 are false
 – Copying can be inferred after observing only 2 false values
 (Same 10-source table as in Opportunity 1.)
Solution: Make Early Decisions
 Algorithm Bound processes entries in decreasing max-score order
 – Many high-score entries containing the source pair → early copying decision
 – Many high-score entries containing neither source, plus many entries containing only one of the sources → early no-copying decision
 (Same inverted index table as above.)
Scaling Up Copy Detection: Opportunity 3
 Perform fewer comparisons in later iterations of copy detection
 – The likelihood for Japan.Tokyo is high in round 1 and changes little in round 2
 – Terminate with a no-copying decision early
 (Same 10-source table as in Opportunity 1.)
Solution: Incremental Copy Detection
 Algorithm Incremental processes index entries iteratively
 – Copy detection depends on value probabilities and source accuracies
 – Updates scores on entries with big changes between iterations i−2 and i−1
 – Considers small-change entries only for source pairs whose score changes might flip the copying decision
 (Same inverted index table as above.)
Summary of Experimental Results
 Four data sets: Book-CS, Stock-1day, Book-full, Stock-2wk
 – Ranging from few to many sources, data items, distinct values, and index entries
 Validates the efficiency of the proposed algorithms using the inverted index
 – Algorithm Index improves efficiency by 10–100x over Pairwise
 – Early decisions + Incremental further improve efficiency by 10x
 – Significant reduction in the run times of later iterations of Incremental
 Validates the effectiveness of the proposed algorithms
 – Early decisions + Incremental obtain results very similar to Pairwise
Fact Statement Truthfulness Analysis
 This part of the talk is based on the following works:
 – Xian Li, Weiyi Meng, Clement Yu. T-verifier: Verifying Truthfulness of Fact Statements. ICDE, 2011.
 – Xian Li, Weiyi Meng, Clement Yu. Truthfulness Analysis of Fact Statements Using the Web. IEEE Data Engineering Bulletin, 34(3), September 2011.
 – Xian Li, Weiyi Meng, Clement Yu, Haixun Wang. Verification of Fact Statements with Multiple Truthful Alternatives. Submitted.
 – Liang Wang, Weiyi Meng, Wenzhu Tong, Zhiyong Peng. Resolving Seemingly Conflicting Fact Statements Caused by Missing Terms. Submitted.
Untruthful Information All Around the Web
 Untruths spread easily on the Web.
 Conflicting information leaves people confused.
Untruthful Information Is Bad
 It can mislead people, especially young students, with incorrect knowledge.
 It can make people lose confidence in the quality of information on the Web.
 It can make the stock market more volatile.
 It can cause a politician to lose an election.
 ……
 We need tools and technologies to help verify information truthfulness!
Fact Statements vs. Opinionated Statements
 Fact statement: a statement that attempts to state a “fact”.
 – Barack Obama is a Christian.
 – China has 35 provinces.
 Opinionated statement: a statement that expresses an opinion.
 – Michelle Obama is beautiful.
 – Stinky tofu is delicious.
 We focus on fact statements in this talk.
Problem Statement
 Given a fact statement,
 – determine whether or not the statement is truthful, and if it is not,
 – identify a truthful statement most relevant to the given statement,
 using information on the Web.
Restricted Fact Statements
 Consider only fact statements (which will be called doubtful statements) with a single specified doubt unit (denoted in [ ]) and a single truthful answer.
 – Example: China has [35] provinces.
 doubtful statement = doubt unit + topic units
 – Example: For “China has [35] provinces”, doubt unit = “35”, topic units = {“China”, “provinces”}
Overview of the T-verifier Method
 Two phases: alternative statement generation, then statement truthfulness verification.
 Phase 1 – Alternative statement generation:
 – Generate a query from the input doubtful statement S (the topic units of S) and submit it to a search engine to collect information for analysis.
 – Extract features from the retrieved results and use them to rank alternative units (alter-units) for the doubt unit.
 – Form alternative statements based on the highly ranked alter-units.
 Phase 2 – Statement truthfulness verification:
 – Submit each alternative statement as a query to collect information for analysis.
 – Extract features from the search results and rank the alternative statements based on the extracted features.
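The two phases can be read as the following skeleton. The function names and parameters (web_search, rank_alter_units, the basic rankers, the result counts) are placeholders standing in for the components described on the surrounding slides, not an actual API.

```python
def t_verifier(doubtful_statement, doubt_unit, web_search, rank_alter_units,
               basic_rankers, merge_ranks, k=5, n_results=200):
    """Skeleton of the two-phase T-verifier pipeline (illustrative only)."""
    # Phase 1: alternative statement generation.
    topic_units = [t for t in doubtful_statement.split() if t != doubt_unit]
    srrs = web_search(" ".join(topic_units), n_results)        # search result records
    alter_units = rank_alter_units(srrs, doubt_unit, topic_units)[:k]
    alternatives = [doubtful_statement.replace(doubt_unit, au) for au in alter_units]

    # Phase 2: statement truthfulness verification.
    evidence = {a: web_search(a, 100) for a in alternatives}   # fresh SRRs per statement
    rank_lists = [ranker(alternatives, evidence) for ranker in basic_rankers]
    merged = merge_ranks(rank_lists)
    return merged[0]                                           # top-ranked statement = truthful
```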
Alternative Statement Generation (1)
 Basic requirements for each alternative statement AS:
 – Same topic: AS should cover the same topic as the doubtful statement (DS).
 – Different value: AS should differ from the DS on the doubt unit. The term(s) in place of the doubt unit are called the alternative unit, or alter-unit.
 – Term sense/type closeness: each alter-unit should be close to the doubt unit in both data type and word sense. Example: “Christian” is closer to “Muslim” than to “President”, because “Christian” and “Muslim” both denote religious believers.
Alternative Statement Generation (2)
 We convert the problem of generating alternative statements into the problem of finding alternative units (alter-units), which are then used to generate the alternative statements.
 Two observations:
 – Relevant alter-units frequently co-occur with the topic units.
 – Relevant alter-units often co-occur with the doubt unit:
    when people have doubts about a fact, they often mention other possible answers;
    when people dispute a controversial point or a common misconception, they often mention their own point or the truthful fact.
Alternative Statement Generation (3)
 Notations:
 – Q: query = topic units = doubtful statement − doubt unit
 – D = {r1, r2, …, rN}: the set of top-N SRRs (search result records) retrieved by Q
 – Cont(r, T) = 1 if r contains T, and 0 otherwise
 Each term/phrase T in D is a candidate alter-unit, and a ranking score is computed for each T.
 Seven features are used to rank T.
Alternative Statement Generation (4)
 Feature 1 (Data type matching, DM):
 – Any relevant alter-unit should be of the same data type as the doubt unit.
 – Several data types are considered in T-verifier: date, time, telephone number, email address, person name, place name (e.g., names of states, cities, and attractions), and number. All others are treated as common strings.
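A toy illustration of the data-type matching filter, covering only a few of the listed types with simple regular expressions; T-verifier's actual type detector also recognizes person names, place names, and other types.

```python
import re

TYPE_PATTERNS = {
    "number": re.compile(r"^\d+([.,]\d+)?$"),
    "date": re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}$|^\d{4}$"),
    "email": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$"),
    "phone": re.compile(r"^\+?[\d\s()-]{7,}$"),
}

def data_type(term):
    """Return the first matching type, or 'string' as the catch-all."""
    for name, pattern in TYPE_PATTERNS.items():
        if pattern.match(term):
            return name
    return "string"

def same_type(candidate, doubt_unit):
    """Feature DM: keep a candidate alter-unit only if its type matches the doubt unit's."""
    return data_type(candidate) == data_type(doubt_unit)

print(same_type("34", "35"))          # True: both numbers
print(same_type("Taiwan", "35"))      # False: string vs. number
```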
Alternative Statement Generation (5)
 Feature 2 (Sense closeness, SC):
 – Any relevant alter-unit should have a meaning (sense) related to that of the doubt unit (e.g., “Christian” is closer in meaning to “Muslim” than to “president”).
 – WordNet is used to capture the sense closeness between two different terms:

     SC(T, DU) = α,                      if hyper(T, DU) = true
                 β,                      if sibling(T, DU) = true
                 wup_similarity(T, DU),  otherwise

   where α and β are parameters and wup_similarity(T, DU) is the Wu-Palmer similarity between T and DU, based on their distance in the WordNet hypernym tree.
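A minimal sketch of the sense-closeness feature using NLTK's WordNet interface (assuming the WordNet corpus is installed). Taking only the first sense of each word and the particular α and β values are simplifying assumptions.

```python
from nltk.corpus import wordnet as wn

ALPHA, BETA = 1.0, 0.9   # assumed scores for hypernym/hyponym and sibling relations

def sense_closeness(term, doubt_unit):
    """Feature SC: closeness of word sense between a candidate alter-unit and the doubt unit."""
    t_syns, d_syns = wn.synsets(term), wn.synsets(doubt_unit)
    if not t_syns or not d_syns:
        return 0.0
    t, d = t_syns[0], d_syns[0]                     # naive: first sense only
    if d in t.hypernyms() or t in d.hypernyms():
        return ALPHA                                # direct hypernym/hyponym relation
    if set(t.hypernyms()) & set(d.hypernyms()):
        return BETA                                 # siblings: shared direct hypernym
    return t.wup_similarity(d) or 0.0               # Wu-Palmer similarity otherwise

print(sense_closeness("Christian", "Muslim"))       # expected to be higher ...
print(sense_closeness("president", "Muslim"))       # ... than this
```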
Alternative Statement Generation (6)
 Feature 3 (Term local correlation, TLC):
 – The co-occurrence coefficient of T with the doubt unit: terms that co-occur more frequently with the doubt unit are more likely to be good alter-units.
Alternative Statement Generation (7)
 Four textual features.
 Feature 4 (Result coverage, RC): the percentage of SRRs in D that contain T – if T appears in a higher percentage of the SRRs, it is more likely to be a good alter-unit.

     RC(T) = ( Σ_{i=1..N} Cont(r_i, T) ) / N
Alternative Statement Generation (8)
 Textual features (continued).
 Feature 5 (Result Query Relevance, RQR): the relevance (with respect to Q) of the SRRs that contain T – if T appears in more relevant SRRs, T is more likely to be a good alter-unit.
 – An SRR that contains more topic units is considered to have a higher degree of relevance to the topic query Q.

     RQR(Q, T) = ( Σ_{i=1..N} Cont(r_i, T) · |r_i ∩ Q| / len(Q) ) / ( Σ_{i=1..N} Cont(r_i, T) )
Alternative Statement Generation (9)
 Textual features (continued).
 Feature 6 (SRR ranking, Rrank): the ranks of the SRRs that contain T – if T appears in higher-ranked SRRs, it is more likely to be a good alter-unit.

     Rrank(T) = ( Σ_{i=1..N} Cont(r_i, T) · (1 − pos(r_i)/N) ) / ( Σ_{i=1..N} (1 − pos(r_i)/N) )

 where pos(r) is the position of r in the search engine’s ranking list and N is the number of SRRs considered.
Alternative Statement Generation (10)
 Feature 7 (Term distance, TD): the size of the smallest window of consecutive words in each SRR (title or snippet) that covers all the topic units contained in the SRR as well as the term T – the smaller the window, the more likely T is a good alter-unit.
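Features 4-7 can be computed directly from the retrieved SRRs. The sketch below follows the formulas above, with each SRR represented simply as a list of lowercase words and SRR positions taken 0-based; both are implementation assumptions.

```python
def contains(srr, term):
    """Cont(r, T): 1 if the SRR contains the term, 0 otherwise."""
    return 1 if term in srr else 0

def result_coverage(srrs, term):                       # Feature 4: RC
    return sum(contains(r, term) for r in srrs) / len(srrs)

def result_query_relevance(srrs, term, topic_units):   # Feature 5: RQR
    num = sum(contains(r, term) * len(set(r) & set(topic_units)) / len(topic_units)
              for r in srrs)
    den = sum(contains(r, term) for r in srrs)
    return num / den if den else 0.0

def srr_ranking(srrs, term):                           # Feature 6: Rrank
    n = len(srrs)                                      # pos(r) taken 0-based here
    num = sum(contains(r, term) * (1 - i / n) for i, r in enumerate(srrs))
    den = sum(1 - i / n for i in range(n))
    return num / den

def term_distance(srr, term, topic_units):             # Feature 7: TD (per SRR)
    """Smallest window in one SRR covering the term and all topic units present in it."""
    if term not in srr:
        return None
    targets = {t for t in topic_units if t in srr} | {term}
    best = len(srr)
    for start in range(len(srr)):                      # brute-force sliding window
        seen = set()
        for end in range(start, len(srr)):
            if srr[end] in targets:
                seen.add(srr[end])
            if seen == targets:
                best = min(best, end - start + 1)
                break
    return best
```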
Alternative Statement Generation (11)
 Alter-unit generation algorithm:
 – Step 1: Filter the candidate terms by data type matching (DM); only terms that match the data type of the doubt unit are considered further.
 – Step 2: Rank each remaining candidate term based on the other six features. The ranking score of term T is computed as

     aurs(T) = w1·SC + w2·TLC + w3·RC + w4·RQR + w5·Rrank + w6·TD

   where wi, i = 1, …, 6, are the feature weights. The optimal weights are determined by a learning algorithm.
Alternative Statement Generation (12)
 Select the top-k ranked alter-units, for some k determined by experiments on training data.
 Replace the doubt unit with each of the top-k alter-units to generate k alternative statements.
Statement Truthfulness Verification (1)
 Goal: determine which of the alternative statements is truthful.
 – Step 1: Send each of the top-ranked alternative statements as a query to a search engine and collect relevant SRRs.
 – Step 2: Employ a number of basic rankers; each basic ranker generates a ranking of the alternative statements based on the newly collected SRRs.
 – Step 3: Use a rank merging algorithm to merge the rank lists into a combined final list, and select the top-ranked statement as the truthful statement.
Statement Truthfulness Verification (2)
 Basic ranker 1: Alter-Unit Ranker (AUR)
 – Rank the alternative statements in the same order as their corresponding alter-units, as produced by the alternative statement generation phase.
 – This ranker reuses the information from the alternative statement generation phase.
 Basic ranker 2: Hits Ranker (HR)
 – Submit each alternative statement to a search engine and rank the alternative statements by the number of hits they receive.
 – This ranker implicitly assumes that each hit provides evidence that the corresponding alternative statement is correct.
Statement Truthfulness Verification (3)
 Basic rankers 3-6: Text Feature Rankers (TFR)
 – TFR(RC): rank alternative statements in descending order of the Result Coverage (RC) values of their alter-units.
 – TFR(RQR): rank alternative statements in descending order of the Result Query Relevance (RQR) values of their alter-units.
 – TFR(Rrank): rank alternative statements in descending order of the SRR ranking (Rrank) values of their alter-units.
 – TFR(TD): rank alternative statements in descending order of the Term Distance (TD) values of their alter-units.
 Note: a new set of SRRs, different from the set used in the alternative statement generation phase, is used for each alternative statement in the above computations.
Statement Truthfulness Verification (4)
 Basic ranker 7: Domain Authority Ranker (DAR)
 – Researchers have observed that web pages published under certain domains, such as “.gov” and “.edu”, are more likely to be truthful.
 – This ranker ranks alternative statements based on the percentage of results retrieved from each domain and on a correctness weight for each domain learned from training data.
Statement Truthfulness Verification (5)
 Rank merging:
 – Each basic ranker produces a ranked list of the same set of alternative statements obtained for the same doubtful statement.
 – Rank merging combines the seven ranked lists into a single ranked list of these alternative statements.
 (Diagram: ranked lists 1-7 from the basic rankers feed into rank merging, which produces the merged ranked list.)
Statement Truthfulness Verification (6)
 Seven rank merging algorithms are evaluated.
 Probability Combination:
 – Let Pi be the probability that the i-th basic ranker predicts truthfulness correctly. Then the overall probability that the truthfulness of a statement S is correctly determined (i.e., the probability that at least one of the basic rankers is correct in predicting the truthfulness of S) can be estimated by assuming the basic rankers are independent.
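Under the stated independence assumption, this estimate is one minus the product of the individual failure probabilities, as in the following one-line sketch:

```python
from functools import reduce

def prob_at_least_one_correct(precisions):
    """1 - prod(1 - Pi): chance that at least one independent basic ranker is right."""
    return 1 - reduce(lambda acc, p: acc * (1 - p), precisions, 1.0)

print(prob_at_least_one_correct([0.62, 0.66, 0.60]))   # about 0.95
```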
Statement Truthfulness Verification (7)
 Basic Borda Count:
 – Each alternative statement Si is treated as a candidate in an election, and each basic ranker BRj is treated as a voter.
 – BRj assigns n − k + 1 points to Si if Si is ranked k-th by BRj, where n is the number of alternative statements being ranked.
 – The total points of Si is the sum of the points assigned to Si by all basic rankers.
 – Rank all alternative statements in descending order of their total points.
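A small sketch of Basic Borda Count; each input list is assumed to rank the same set of alternative statements, best first.

```python
from collections import defaultdict

def basic_borda(rank_lists):
    """Merge ranked lists: a statement ranked k-th in a list of n statements gets n - k + 1 points."""
    points = defaultdict(int)
    for ranking in rank_lists:
        n = len(ranking)
        for k, statement in enumerate(ranking, start=1):
            points[statement] += n - k + 1
    return sorted(points, key=points.get, reverse=True)

# Three hypothetical basic rankers over three alternative statements:
print(basic_borda([["S1", "S2", "S3"], ["S1", "S3", "S2"], ["S2", "S1", "S3"]]))
# -> ['S1', 'S2', 'S3'] (S1: 8 points, S2: 6, S3: 4)
```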
Statement Truthfulness Verification (8)
 Basic Condorcet algorithm:
 – Define a comparison function between any two alternative statements Si and Sj: if Si is ranked higher than Sj by more basic rankers, then Si > Sj (i.e., Si is ranked higher than Sj in the merged list); else if Sj is ranked higher than Si by more basic rankers, then Sj > Si; otherwise Si = Sj.
 – Sort all alternative statements based on this comparison function and output the sorted list as the merged result.
 – Example: Consider three local results R1 = (S3, S1, S2), R2 = (S1, S2, S3), and R3 = (S2, S3, S1). Here S1 beats S2, S2 beats S3, but S3 beats S1. When such a cycle is formed, all statements in the cycle are considered equivalent.
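A sketch of the Condorcet-style merge. To keep the example short it orders statements by their number of pairwise victories, which treats the members of a cycle as roughly equivalent; this is a simplification of the comparison-function sort described above.

```python
from itertools import combinations
from collections import defaultdict

def basic_condorcet(rank_lists):
    """Order statements by pairwise victories across the basic rankers.
    All lists are assumed to rank the same statements, best first."""
    statements = rank_lists[0]
    wins = defaultdict(int)
    for a, b in combinations(statements, 2):
        a_better = sum(r.index(a) < r.index(b) for r in rank_lists)
        b_better = len(rank_lists) - a_better
        if a_better > b_better:
            wins[a] += 1
        elif b_better > a_better:
            wins[b] += 1
    return sorted(statements, key=lambda s: wins[s], reverse=True)

# The cyclic example from the slide: S1 beats S2, S2 beats S3, S3 beats S1, so all tie.
print(basic_condorcet([["S3", "S1", "S2"], ["S1", "S2", "S3"], ["S2", "S3", "S1"]]))
```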
Statement Truthfulness Verification (9)
 Weighted Borda Count:
 – The same as Basic Borda Count, except that each basic ranker BRi is assigned a weight wi reflecting its quality; for example, wi could be the precision of BRi on a training set.
 – The points assigned by BRi are multiplied by wi.
 – The remaining steps are the same as in Basic Borda Count.
Statement Truthfulness Verification (10)
 Weighted Condorcet algorithm:
 – The same as the Basic Condorcet algorithm, except that the comparison function for any two alternative statements Si and Sj is modified:
    if the total weight of the basic rankers that rank Si higher than Sj exceeds the total weight of those that rank Sj higher than Si, then Si > Sj;
    else if the total weight of the basic rankers that rank Sj higher than Si exceeds the total weight of those that rank Si higher than Sj, then Sj > Si;
    else Si = Sj.
Statement Truthfulness Verification (11)
 Positional Borda Count:
 – Position probability: let P(i, j) denote the probability that an alternative statement ranked at the j-th position by the i-th basic ranker BRi is truthful. P(i, j) can be obtained by training.
 – P(i, j) is interpreted as the ranking score given by BRi to the alternative statement it ranks at the j-th position.
 – Rank all alternative statements in descending order of the sum of the scores (position probabilities) they receive.
Statement Truthfulness Verification (12)
 Weighted Positional Borda Count:
 – Let wi be the weight of the i-th basic ranker BRi.
 – Let P(i, j) be the probability that an alternative statement ranked at the j-th position by BRi is truthful.
 – The final ranking score given by BRi to the alternative statement it ranks at the j-th position is wi · P(i, j).
 – Rank all alternative statements in descending order of the sum of their final ranking scores.
Experiment (1)
 Dataset: 50 doubtful statements compiled from factoid questions in the Q&A tracks of TREC-8 and TREC-9. See http://cs.binghamton.edu/~xianli/doubtful_statements.htm.
 For each doubtful statement, the correct answer is manually verified.
 For each doubtful statement, form a topic query by removing the doubt unit, submit it to the Yahoo! search engine, and obtain the top 200 SRRs.
 Retrieve 100 results for each alternative statement.

   Doubtful statement                                     | Doubt unit      | Truth
   Antarctic is the only continent without a desert.      | Antarctic       | Europe
   George C. Scott won the Oscar for best actor in 1970.  | George C. Scott | George C. Scott
Experiment (2)
 Alter-unit generation algorithm evaluation
 – Test 1: randomly select 25 doubtful statements as the training set and use the remaining 25 as the testing set.

   Truthful alter-unit ranked | Training (25) | Testing (25)
   1st                        | 17            | 14
   2nd                        | 7             | 9
   3rd                        | 0             | 1
   4th                        | 1             | 1
   5th                        | 0             | 0

 All truthful alter-units are ranked among the top 5 results.
Experiment (3)
 Impact of using different numbers of top SRRs

   Truthful alter-unit ranked | Top 10 | Top 50 | Top 100 | Top 150 | Top 200
   1st                        | 12     | 21     | 29      | 31      | 31
   2nd                        | 6      | 12     | 10      | 12      | 16
   3rd                        | 5      | 3      | 3       | 3       | 1
   4th                        | 3      | 1      | 2       | 2       | 2
   5th                        | 2      | 0      | 1       | 1       | 0
   Not among top 5            | 22     | 13     | 5       | 0       | 0
Experiment (4)
 Precision of each basic ranker
 – precision = n / N, where n is the number of truthful statements ranked at the top and N is the number of doubtful statements evaluated.
 The precisions of the seven basic rankers:

   Ranker    | AUR  | TFR(TD) | TFR(RC) | TFR(RQR) | TFR(Rrank) | HR   | DAR
   Precision | 0.62 | 0.32    | 0.66    | 0.60     | 0.62       | 0.20 | 0.20
Experiment (5)
 Truthfulness verification evaluation
 – The precisions of the different merging algorithms (based on 10-fold cross validation).
 (Bar chart comparing the precisions of BaseBorda, BaseCond, ProbComb, PosBorda, WBorda, WCond, and WPosBorda; y-axis from 0 to 1.)
Experiment (6)
 Analysis of the five failed cases
 – Cases 4 and 5 could be considered correct.

   # | Untruthful statement verified as truthful                        | Truthful
   1 | Tom Hanks was lead actress in the movie 'Sleepless in Seattle'.  | Meg Ryan
   2 | Apollo is the first spacecraft on the moon.                      | Luna 2
   3 | Sullivan is the fastest swimmer in the world.                    | Michael Phelps
   4 | Les Paul invented the electric guitar.                           | Rickenbacker
   5 | English is the primary language of the Philippines.              | Filipino
Relationship with QA
 General fact statements and questions in QA systems are different:
 – Not all fact statements can be converted to equivalent questions, e.g., fact statements whose doubt units are verbs.
 – The doubt unit in a fact statement conveys more information than the WH-word in a question.
 – A fact statement may have multiple doubt units, but a question can imply only one doubt unit.
How does T-verifier compare with two popular Web QA systems?

   Metric              | T-verifier | Answers.com | Yahoo! Answers
   Total statements    | 50         | 50          | 50
   Wrong results       | 5          | 4           | 12
   Cannot find results | 0          | 6           | 18
   Correct results     | 45         | 40          | 20

 Example statement for which Answers.com cannot find a result:
 – [800] people died when the Estonia sank in 1994.
 Example statement for which Answers.com gives a wrong result:
 – [20] hexagons are on a soccer ball. Correct answer: 20; Answers.com gives: 32.
Doubtful Statements with Multiple Truthful Alternatives
 This type of statement will be called an MTA statement.
 Examples of MTA statements:

   Type | Doubtful statement                                                | Truthful alternatives
   CC   | Barack Obama was born in [Kenya].                                 | Honolulu, Hawaii, United States
   MVA  | [Edwin Krebs] won the Nobel Prize in medicine in 1992.            | Edwin Krebs, Edmond Fischer
   SFE  | [Peter Ustinov] has portrayed Hercule Poirot.                     | Peter Ustinov, David Suchet, Agatha Christie
   DSC  | [U.S.] won team title at the 2011 World Gymnastics Championships. | U.S., China
   TS   | [Bob Dole] served as the President of the United States.         | Barack Obama since Jan 2009, George Bush from Jan 2001 to Jan 2009, …
MTA Statements Processing
 Objective: identify all truthful alternatives for each MTA statement.
 Significantly more challenging than processing STA statements (doubtful statements with a single truthful alternative).
 The step of generating and ranking alter-units can remain the same.
 The difference is how to determine all the truthful alter-units from the ranked candidate alter-units.
Top-k Approach (1)
 Assumption: truthful alter-units are likely to be ranked above untruthful ones.
 – In other words, the ranking scores of the truthful alter-units are likely to be higher than those of untruthful ones.
 Top-k approach: determine an appropriate integer k so that the top-k ranked alter-units are recognized as the truthful ones.
 – The problem becomes how to determine the right k for each MTA statement.
 – This approach is a straightforward extension of the solution for processing STA statements, which is a top-1 solution.
Top-k Approach (2)
 Three different ways to select k (see the sketch below):
 – Largest Score Gap (LSG): compare the gaps between the scores of each pair of consecutively ranked alternative statements and choose the largest gap as the cut-off point.
 – Largest Percentage Gap (LPG): similar to LSG, except that the percentage gap of scores is used. The percentage gap between the scores Si and Si+1 of two consecutively ranked alternative statements is (Si − Si+1) / Si.
 – First Significant Gap (FSG): uses the first significant score gap as the cut-off point. A score gap is considered significant if logb(Si / Si+1) > 1, and k is set to the smallest (i.e., the first) i satisfying this condition.
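A sketch of the three cut-off rules applied to a descending list of alter-unit ranking scores; the log base b for FSG and the example scores are assumptions for illustration.

```python
import math

def cutoff_lsg(scores):
    """Largest Score Gap: cut after the position with the largest absolute gap."""
    gaps = [scores[i] - scores[i + 1] for i in range(len(scores) - 1)]
    return gaps.index(max(gaps)) + 1

def cutoff_lpg(scores):
    """Largest Percentage Gap: gap measured as (Si - Si+1) / Si."""
    gaps = [(scores[i] - scores[i + 1]) / scores[i] for i in range(len(scores) - 1)]
    return gaps.index(max(gaps)) + 1

def cutoff_fsg(scores, b=2):
    """First Significant Gap: first i with log_b(Si / Si+1) > 1."""
    for i in range(len(scores) - 1):
        if math.log(scores[i] / scores[i + 1], b) > 1:
            return i + 1
    return len(scores)                     # no significant gap: keep all alter-units

scores = [0.92, 0.88, 0.35, 0.10, 0.02]    # hypothetical descending alter-unit scores
print(cutoff_lsg(scores), cutoff_lpg(scores), cutoff_fsg(scores))   # 2 4 2
```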
Top-k Approach (3)
 Compare the three methods for selecting k
 – 50 MTA statements are used.
 – There are 143 truthful alternative statements in total for the 50 MTA statements, about 2.8 per MTA statement on average.

   Method | # total selected | # truthful alternatives | Precision | Recall | F-score
   LSG    | 140              | 107                     | 0.76      | 0.74   | 0.75
   LPG    | 182              | 125                     | 0.68      | 0.86   | 0.76
   FSG    | 178              | 128                     | 0.71      | 0.88   | 0.79

 Conclusions:
 – The top-k approach is not sufficiently good.
 – A new approach is needed: divide MTA statements into different types and develop a solution for each type.
Different Types of MTA Statements (1)
 Five types of MTA statements have been identified.
 Type 1: Compatible Concepts (CC). For each MTA statement of this type, the truthful alter-units are compatible with one another. Usually, these alter-units are either equivalent to each other (i.e., synonyms), or correspond to the same basic concept with different specificity/generality (i.e., hyponyms/hypernyms), or have different granularity (i.e., one is a part of another).
 – Example 1: For “Barack Obama was born in [Honolulu]”, truthful alternatives include “Honolulu”, “Hawaii”, “United States”, etc.
 – Example 2: For “Queen Elizabeth II resided in [United Kingdom]”, correct alter-units include “United Kingdom”, “England”, and “Great Britain”.
Different Types of MTA Statements (2)
 Type 2: Multi-Valued Attributes (MVA). For each MTA statement of this type, the truthful alter-units correspond to different values of a multi-valued attribute in a relational database; a multi-valued attribute may have multiple values for a given entity (record).
 – Example: For “[Edwin Krebs] won the Nobel Prize in medicine in 1992”, the two US biochemists Edwin Krebs and Edmond Fischer shared the 1992 Nobel Prize in medicine (they are values of the multi-valued attribute “Recipients” of a Nobel Prize record); therefore both are truthful alter-units.
Different Types of MTA Statements (3)
 Type 3: Shared-Feature Entities (SFE). Many entities may share a common feature; when trying to find entities that have this feature, multiple entities may be detected.
 – Example: For “[Peter Ustinov] has portrayed Hercule Poirot”, all actors who have portrayed Hercule Poirot in different movies (including remakes) are truthful alter-units. The truthful alter-units include Peter Ustinov, David Suchet, and Agatha Christie.
 Difference between the SFE type and the MVA type: for the former, multiple entities share a common feature, while for the latter, the same entity has multiple values for an attribute.
 – In practice, values of the same multi-valued attribute are much more likely to co-occur than entities that merely share the same feature.
Different Types of MTA Statements (4)
 Type 4: Different Sub-Categories (DSC). The truthful alter-units correspond to different sub-categories related to some of the topic units of the doubtful statement. A topic unit of a doubtful statement DS is a term/phrase in DS other than the doubt unit. When a topic unit is replaced by different sub-categories, the doubt unit may have different truthful alter-units, making the doubtful statement an MTA statement.
 – Example: For “[U.S.] won team title at the 2011 World Gymnastics Championships”, the topic unit “World” could be replaced by “US”, “Europe”, etc., and “team” could be replaced by “men’s team” or “women’s team”. Each replacement can lead to a different truthful alter-unit.
Different Types of MTA Statements (5)
 Type 5: Time-Sensitive (TS). A fact statement is time-sensitive if its truthfulness changes over time, either with a fairly regular pattern or in a way that is expected even without such regularity. Since there may be different truthful alter-units at different times, such statements are MTA statements.
 – Example of a fairly regular pattern: US presidents (who serve 4 or 8 years).
 – Example with less regularity: “A new world 100-meter track record was established by [Usain Bolt]”, because it is generally not predictable who will set a new record in the future.
Processing CC & MVA Statements
 We recently completed a paper on processing doubtful statements of the CC and MVA types.
 – For statements of the CC type, we primarily explored semantic relationships (such as synonymy and hypernymy) among alter-units.
 – For statements of the MVA type, we primarily explored co-occurrence-based correlation relationships among alter-units.
Resolving Seemingly Conflicting Fact Statements Caused by Missing Terms
 Sometimes, seemingly conflicting fact statements (SCFSs) do not represent actual conflicts.
 Consider the following two statements:
 – S1: China won the team title at the 2011 World Gymnastics Championships in Tokyo.
 – S2: USA won the team title at the 2011 World Gymnastics Championships in Tokyo.
Resolving Seemingly Conflicting Fact Statements Caused by Missing Terms
 The seeming conflict is caused by missing terms that make the statements imprecise.
 With the missing terms added, we have the following precise statements:
 – S1: China won the men’s team title at the 2011 World Gymnastics Championships in Tokyo.
 – S2: USA won the women’s team title at the 2011 World Gymnastics Championships in Tokyo.
Resolving Seemingly Conflicting Fact Statements Caused by Missing Terms
 General problem: given a set of seemingly conflicting fact statements, determine whether the seeming conflict is caused by missing terms by finding those terms. We also determine the appropriate position for the missing term in each statement.
 The missing terms often create sub-categories of a category.
 – Example: “men’s team title” and “women’s team title” are sub-categories of “team title”.
 So the solution to this problem can also be used to process MTA statements of the DSC type.
Additional Research Issues (1)
 Current data fusion techniques assume that there is a single true value for each data item. How to handle situations where there may be multiple true values for some data items?
 Data inconsistency may have different causes. How to automatically identify these causes and apply appropriate methods to resolve them?
Additional Research Issues (2)
 How to automatically classify MTA statements into different types?
 – The same statement may be classified into multiple types.
 How to process the other two types of MTA statements, namely SFE and TS?
 If a statement is classified into multiple types, it may be processed using different techniques. How to combine the results of these different techniques?
Additional Research Issues (3)
 How to handle doubtful statements that have no specified doubt unit(s)? In other words, how to automatically determine the most worthy doubt unit(s) from a given fact statement?
 – Example: For the statement “More than 100,000 people were killed by the 2008 Sichuan earthquake”, which term(s) should be treated as the doubt unit(s)?
Additional Research Issues (4)
 How to process all fact statements on the Web? Challenges:
 – How to identify controversial fact statements so we can focus on these statements only?
 – How to use fact statements that have already been verified to verify or de-verify other fact statements?
Additional Research Issues (5)
 How to apply natural language processing (NLP) to determine the truthfulness of fact statements?
 Example: Consider the following sentences related to the fact statement “SUNY has 64 campuses”:
 1. I believe SUNY has 64 campuses.
 2. I strongly believe SUNY has 64 campuses.
 3. I am sure SUNY has 64 campuses.
 4. I am sure SUNY has 64 campuses because this is mentioned at the SUNY website.
Additional Research Issues (6)
 How to identify and resolve time-sensitive data inconsistency in fact statements?
 – The problem is mostly caused by the use of relative time references such as yesterday, last week, 2 years ago, …
 – Need to establish the correct reference point for each relative time reference in a data source.