Web Data Truthfulness Analysis
Weiyi Meng
Department of Computer Science
State University of New York at Binghamton
meng@cs.binghamton.edu
(RenDa Office: 235)
http://www.cs.binghamton.edu/~meng/meng.html
July 2015
Where in the World is SUNY Binghamton?
 Located in Binghamton
in New York state
 Birthplace of IBM
(Endicott, NY)
 Metro population:
249,000
 One of the safest
U.S. midsized cities
 Low cost of living
(12% below U.S.
average)
 Close to major cities
Binghamton at a Glance
The SUNY System
 64 campuses
(two-year, four-year)
 Four PhD-granting
“University Centers”
 Albany
 Binghamton
 Buffalo
 Stony Brook
Rankings
 #1 best public college in
New York and #18 in the
nation
 — AMERICAN CITY BUSINESS JOURNAL, 2015
 #4 best value among
nation’s public colleges for
out-of-state and
international students and
#15 overall
 — KIPLINGER’S PERSONAL FINANCE MAGAZINE, 2014
 Top 5 best values in the
U.S.A. among public
colleges
 — KIPLINGER’S PERSONAL FINANCE MAGAZINE AND
THE PRINCETON REVIEW, 2014
 38th of top U.S.A. public
universities
 — U.S. NEWS & WORLD REPORT, 2014
 #5 of 50 best-rated, most
affordable colleges for
international students
 — GREAT VALUE COLLEGES, 2014
 Premier Public University
in Northeast U.S.A.
 — FISKE GUIDE TO COLLEGES
The University Main Campus
Talk Outline
 Introduction
 Structured Web data quality analysis and truth finding
 Fact statement truthfulness analysis
 Additional Research Issues
Web Data Quality
 Quality of data can be measured in different ways:
 Correctness or truthfulness
 Freshness
 Completeness
 Objectivity
 Writing quality/style
 Appropriateness
 ……
 This talk will focus mostly on correctness/truthfulness.
Structured Web Data Quality Analysis
and Truth Finding
 This part of the talk is based on the following two works in
collaboration with AT&T Labs-Research:
 Xian Li, Xin Luna Dong, Kenneth Lyons, Weiyi Meng, Divesh
Srivastava. Truth Finding on the Deep Web: Is the Problem
Solved? VLDB 2013.
 Xian Li, Xin Luna Dong, Kenneth Lyons, Weiyi Meng, Divesh
Srivastava. Scaling Up Copy Detection. ICDE 2015.
 Why these two domains?
Belief that the data are fairly clean
Data quality can have a big impact on people’s lives
 Resolved heterogeneity at schema level and instance
level
Study on Two Domains
Domain | #Sources | Period | #Objects | #Local-attrs | #Global-attrs | Considered items
Stock | 55 | 7/2011 | 1000*20 | 333 | 153 | 16000*20
Flight | 38 | 12/2011 | 1200*31 | 43 | 15 | 7200*31
Study on Two Domains
 Stock
 Search “stock price quotes” and “AAPL quotes”
 Sources: 200 (search results) → 89 (deep web) → 76 (GET method) → 55 (non-JavaScript)
 1000 “Objects”: a stock with a particular symbol on a particular day
 30 from Dow Jones Index
 100 from NASDAQ100 (3 overlaps)
 873 from Russell 3000
 Attributes: 333 (local) → 153 (global) → 21 (provided by > 1/3 of sources) → 16 (no change after market close)
Domain | #Sources | Period | #Objects | #Local-attrs | #Global-attrs | Considered items
Stock | 55 | 7/2011 | 1000*20 | 333 | 153 | 16000*20
Flight | 38 | 12/2011 | 1200*31 | 43 | 15 | 7200*31
Study on Two Domains
 Flight
 Search “flight status”
 Sources: 38
 3 airline websites (AA, UA, Continental)
 8 airport websites (SFO, DEN, etc.)
 27 third-party websites (Orbitz, Travelocity, etc.)
 1200 “Objects”: a flight with a particular flight number on a
particular day from a particular departure city
 Departing or arriving at the hub airports of AA/UA/Continental
 Attributes: 43 (local) → 15 (global) → 6 (provided by > 1/3 of sources)
 scheduled dept/arr time, actual dept/arr time, dept/arr gate
Domain | #Sources | Period | #Objects | #Local-attrs | #Global-attrs | Considered items
Stock | 55 | 7/2011 | 1000*20 | 333 | 153 | 16000*20
Flight | 38 | 12/2011 | 1200*31 | 43 | 15 | 7200*31
Q1. Are There a Lot of Redundant Data on
the Deep Web?
Q2. Are the Data Consistent?
Inconsistency on 70% of the data items
Even with a tolerance of up to 1% difference
Why Such Inconsistency?
— I. Semantic Ambiguity
[Screenshots: for the same stock, Yahoo! Finance shows Day's Range 93.80-95.71 and 52wk Range 25.38-95.71, while Nasdaq shows 52 Wk 25.38-93.72]
Why Such Inconsistency?
— II. Instance Ambiguity
Why Such Inconsistency?
— III. Out-of-Date Data
[Screenshots: the same quote is timestamped 4:05 pm on one source and 3:57 pm on another]
Why Such Inconsistency?
— IV. Unit Difference
[Screenshots: the same quantity is reported as 76,821,000 on one source and as 76.82B on another]
Why Such Inconsistency?
— V. Pure Error
[Screenshots: FlightView, FlightAware and Orbitz report conflicting times for the same flight: departures of 6:15 PM, 6:15 PM and 6:22 PM; arrivals of 9:40 PM, 8:33 PM and 9:54 PM]
Why Such Inconsistency?
 Random sample of 20 data items and 5 items with the
largest #values in each domain
Q3. Is Each Source of High Accuracy?
 Not high on average: .86 for Stock and .8 for Flight
 Gold standard
 Stock: vote on data from Google Finance, Yahoo! Finance, MSN Money,
NASDAQ, Bloomberg
 Flight: from airline websites
Q3-2. Are Authoritative Sources of High
Accuracy?
 Reasonable but not so high accuracy
 Medium coverage
Q4. Is There Copying or Data Sharing
Between Deep-Web Sources?
Q4-2. Is Copying or Data Sharing Mainly
on Accurate Data?
How to Resolve Inconsistency and Find
the True Values?
The problem is known as Data Fusion.
Basic Solution: Voting
 Only 70% of the correct values are provided by over half of the sources
 Voting precision:
 .908 for Stock; i.e., wrong values for 1500 data items
 .864 for Flight; i.e., wrong values for 1000 data items
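As a rough illustration of this voting baseline, here is a minimal sketch (the data layout and values are made up for illustration; this is not the study's actual pipeline): each data item collects the values claimed by the sources, and the most frequently claimed value wins.

```python
from collections import Counter

# Toy claims: data item -> {source: claimed value}
claims = {
    ("AAPL", "2011-07-01", "volume"): {"S1": "76.82B", "S2": "76.82B", "S3": "76.80B"},
    ("UA917", "2011-12-08", "arrival"): {"S1": "9:40PM", "S2": "8:33PM", "S3": "8:33PM"},
}

def vote(claims):
    """For each data item, pick the value claimed by the largest number of sources."""
    return {item: Counter(by_source.values()).most_common(1)[0][0]
            for item, by_source in claims.items()}

print(vote(claims))
```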
Improvement I. Leveraging Source Accuracy
S1 S2 S3
Flight 1 7:02PM 6:40PM 7:02PM
Flight 2 5:43PM 5:43PM 5:50PM
Flight 3 9:20AM 9:20AM 9:20AM
Flight 4 9:40PM 9:52PM 8:33PM
Flight 5 6:15PM 6:15PM 6:22PM
Improvement I. Leveraging Source Accuracy
 Naïve voting obtains an accuracy of 80%
S1 S2 S3
Flight 1 7:02PM 6:40PM 7:02PM
Flight 2 5:43PM 5:43PM 5:50PM
Flight 3 9:20AM 9:20AM 9:20AM
Flight 4 9:40PM 9:52PM 8:33PM
Flight 5 6:15PM 6:15PM 6:22PM
Higher accuracy; more trustworthy
Improvement I. Leveraging Source Accuracy
 Considering accuracy obtains an accuracy of 100%
Challenges:
1. How to decide source accuracy?
2. How to leverage source accuracy in voting?
S1 S2 S3
Flight 1 7:02PM 6:40PM 7:02PM
Flight 2 5:43PM 5:43PM 5:50PM
Flight 3 9:20AM 9:20AM 9:20AM
Flight 4 9:40PM 9:52PM 8:33PM
Flight 5 6:15PM 6:15PM 6:22PM
Higher accuracy; more trustworthy
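A sketch of how source accuracy can be folded into voting, assuming the per-source accuracies are already known (in practice they are estimated iteratively from the fused results); the accuracies below are illustrative assumptions, not measured values.

```python
import math

def accuracy_weighted_vote(values_by_source, accuracy):
    """Score each candidate value by the summed log-odds of its providers' accuracies."""
    scores = {}
    for source, value in values_by_source.items():
        a = min(max(accuracy[source], 1e-6), 1 - 1e-6)   # keep log-odds finite
        scores[value] = scores.get(value, 0.0) + math.log(a / (1 - a))
    return max(scores, key=scores.get)

# Flight 4 from the table above: three distinct values, so naive voting cannot decide,
# but a vote weighted by (assumed) source accuracy prefers the most reliable source.
values = {"S1": "9:40PM", "S2": "9:52PM", "S3": "8:33PM"}
accuracy = {"S1": 0.95, "S2": 0.70, "S3": 0.60}
print(accuracy_weighted_vote(values, accuracy))   # -> 9:40PM
```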
Results on Stock Data (I)
 Among various methods, the Bayesian-based method (Accu) performs best
at the beginning, but in the end obtains a final precision (=recall) of .900,
worse than Vote (.908)
Results on Stock Data (II)
 AccuSim obtains a final precision of .929, higher than Vote and any other
method (around .908)
 This translates to 350 more correct values
Results on Stock Data (III)
Results on Flight Data
 Accu/AccuSim obtains a final precision of .831/.833, both lower than Vote (.857)
Copying or Data Sharing Can Happen on
Inaccurate Data
S1 S2 S3 S4 S5
Flight 1 7:02PM 6:40PM 7:02PM 7:02PM 8:02PM
Flight 2 5:43PM 5:43PM 5:50PM 5:50PM 5:50PM
Flight 3 9:20AM 9:20AM 9:20AM 9:20AM 9:20AM
Flight 4 9:40PM 9:52PM 8:33PM 8:33PM 8:33PM
Flight 5 6:15PM 6:15PM 6:22PM 6:22PM 6:22PM
 Naïve voting works only if data sources are independent.
 Considering source accuracy can be worse when there is
copying
Improvement II. Ignoring Copied Data
 It is important to detect copying and ignore copied values
in fusion
Challenges:
1. How to detect copying?
2. How to leverage copying in voting?
S1 S2 S3 S4 S5
Flight 1 7:02PM 6:40PM 7:02PM 7:02PM 8:02PM
Flight 2 5:43PM 5:43PM 5:50PM 5:50PM 5:50PM
Flight 3 9:20AM 9:20AM 9:20AM 9:20AM 9:20AM
Flight 4 9:40PM 9:52PM 8:33PM 8:33PM 8:33PM
Flight 5 6:15PM 6:15PM 6:22PM 6:22PM 6:22PM
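A toy sketch of why copied values have to be discounted (the values and the copier groups below are illustrative, not the exact example on the slide): if sources suspected of copying are collapsed into a single vote, a false value propagated by copiers no longer outweighs the independent sources.

```python
def vote_ignoring_copies(values_by_source, copy_group):
    """Count one vote per independent group of sources instead of one vote per source.
    copy_group maps a source to a representative of the group it copies from."""
    votes, groups_seen = {}, set()
    for source, value in values_by_source.items():
        group = copy_group.get(source, source)
        if group in groups_seen:
            continue                      # this group already voted; skip the copied claim
        groups_seen.add(group)
        votes[value] = votes.get(value, 0) + 1
    return max(votes, key=votes.get)

values = {"S1": "9:40PM", "S2": "9:40PM", "S3": "8:33PM", "S4": "8:33PM", "S5": "8:33PM"}
copiers = {"S4": "S3", "S5": "S3"}        # S4 and S5 are assumed to copy from S3
print(vote_ignoring_copies(values, copiers))  # naive voting picks 8:33PM; this picks 9:40PM
```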
Results on Flight Data (I)
 AccuCopy obtains a final precision of .943, much higher than Vote (.864)
 This translates to 570 more correct values
Results on Flight Data (II)
Take-Away Messages
 Web data is not fully trustable, Web sources have
different accuracy, and copying is common
 Leveraging source accuracy, copying relationships, and
value similarity can improve truth finding
Key Observations about Copy Detection
 Structured data copy detection is important and
challenging
 Useful for truth finding, protecting rights of data sources
 Copy detection can be expensive
 Prior work (Pairwise) has time complexity O(|S|² · |D| · L)
 Many opportunities for scaling up copy detection
 Avoid comparing source pairs that are likely to be independent
 Compare only a few values per source pair before making a decision
 Perform fewer comparisons in later iterations of copy detection
 Are sources S0 and S1 copying?
– Not necessarily
Copy Detection
Source China Japan S Korea N Korea Vietnam
S0 Beijing Tokyo Seoul Hanoi
S1 Beijing Tokyo Seoul Pyongyang Hanoi
S2 Xi’an Tokyo Gyeongju Kaesong Ho Chi Minh City
S3 Shanghai Tokyo Gyeongju Kaesong Ho Chi Minh City
S4 Xi’an Tokyo Gyeongju Pyongyang Ho Chi Minh City
 Are sources S2 and S3 copying?
– Very likely because sharing of many false values
without copying is very unlikely.
Copy Detection
Source China Japan S Korea N Korea Vietnam
S0 Beijing Tokyo Seoul Hanoi
S1 Beijing Tokyo Seoul Pyongyang Hanoi
S2 Xi’an Tokyo Gyeongju Kaesong Ho Chi Minh City
S3 Shanghai Tokyo Gyeongju Kaesong Ho Chi Minh City
S4 Xi’an Tokyo Gyeongju Pyongyang Ho Chi Minh City
Copy Detection: Bayesian Analysis
 Goal: compute Pr(S1~S2 | Φ) (copying) and Pr(S1⊥S2 | Φ) (independence), which sum to 1, where Φ denotes the observed data
 According to Bayes' Rule, we need Pr(Φ | S1~S2) and Pr(Φ | S1⊥S2)
 Key: compute Pr(Φ_D | S1~S2) and Pr(Φ_D | S1⊥S2) for each data item D
[Diagram: for each data item, the two sources either provide different values (Od) or the same value, which is further split into a shared TRUE value (Ot) and a shared FALSE value (Of)]
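A rough numerical sketch of the Bayesian argument (the error model, the value of n_false and the per-item likelihoods are simplifying assumptions, not the exact model from the papers): sharing a false value is far stronger evidence of copying than sharing a true value, because two independent sources rarely make the same mistake.

```python
import math

def copy_posterior(shared_values, acc1, acc2, n_false=10, prior=0.5):
    """Posterior probability of copying, from the values two sources share.
    shared_values: list of booleans, True if the shared value is correct."""
    log_odds = math.log(prior / (1 - prior))
    for value_is_true in shared_values:
        if value_is_true:
            p_independent = acc1 * acc2                        # both right independently: plausible
        else:
            p_independent = (1 - acc1) * (1 - acc2) / n_false  # same wrong value: unlikely
        p_copying = 1.0                                        # idealized: a copier always agrees
        log_odds += math.log(p_copying / max(p_independent, 1e-12))
    return 1 / (1 + math.exp(-log_odds))

print(copy_posterior([True, True], 0.9, 0.9))    # two shared true values: weak evidence
print(copy_posterior([False, False], 0.9, 0.9))  # two shared false values: near-certain copying
```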
Copy Detection: Iterative Process
[Diagram: an iterative loop that cycles through truth discovery, source accuracy computation, and copy detection (Steps 1-3) until the results stabilize]
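A minimal sketch of such an iteration, with copy detection omitted for brevity (the starting accuracy of 0.8 and the fixed number of rounds are arbitrary assumptions): truth discovery and accuracy computation feed each other.

```python
def iterative_fusion(claims, rounds=5):
    """Alternate accuracy-weighted truth discovery and source-accuracy re-estimation."""
    sources = {s for by_src in claims.values() for s in by_src}
    accuracy = {s: 0.8 for s in sources}                 # uniform starting accuracy
    truths = {}
    for _ in range(rounds):
        # Truth discovery: accuracy-weighted vote per data item.
        truths = {
            item: max(set(by_src.values()),
                      key=lambda v, b=by_src: sum(accuracy[s] for s, val in b.items() if val == v))
            for item, by_src in claims.items()
        }
        # Accuracy computation: fraction of a source's claims matching the current truths.
        for s in sources:
            judged = [v == truths[item]
                      for item, by_src in claims.items() for src, v in by_src.items() if src == s]
            accuracy[s] = sum(judged) / len(judged) if judged else 0.5
    return truths, accuracy
```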
Scaling Up Copy Detection: Opportunities
 Pairwise copy detection has time complexity O(|S|² · |D| · L)
 |S| is the number of sources
 |D| is the number of data items
 L is the number of iterations
 Many opportunities for scaling up copy detection
 Avoid comparing source pairs that are likely to be independent (e.g., pairs that share no values or only a few true values)
 Compare only a few values per source pair before making a decision
 Perform fewer comparisons in later iterations of copy detection
 Avoid comparing source pairs that are likely to be independent
– Out of 45 source pairs, 18 source pairs do not share any value
– 1 source pair shares only two true values
Scaling Up Copy Detection: Opportunity 1
Source Accu China Japan S Korea N Korea Vietnam
S0 0.99 Beijing Tokyo Seoul Hanoi
S1 0.99 Beijing Tokyo Seoul Pyongyang Hanoi
S2 0.2 Xi’an Tokyo Gyeongju Kaesong Ho Chi Minh City
S3 0.2 Shanghai Tokyo Gyeongju Kaesong Ho Chi Minh City
S4 0.4 Xi’an Tokyo Gyeongju Pyongyang Ho Chi Minh City
S5 0.6 Beijing Kyoto Busan Pyongyang Hanoi
S6 0.01 Nanjing Kyoto Wonsan Da Nang
S7 0.25 Nanjing Seoul Wonsan Da Nang
S8 0.2 Nanjing Nara Seoul Wonsan Da Nang
S9 0.99 Beijing Seoul Pyongyang
 Each inverted index entry corresponds to Data Item.Value
– Includes provider sources (at least 2)
– Includes probability of the value being true
– Includes maximum contribution score to decision of copying
Solution: Build and Use Inverted Index
Data Item.Value Pr Max Score Sources
Japan.Kyoto 0.02 4.59 S5, S6
S Korea.Gyeongju 0.01 4.12 S2, S3, S4
… … … …
Vietnam.Hanoi 0.94 0.43 S0, S1, S5
China.Beijing 0.96 0.43 S0, S1, S5, S9
 Algorithm Index processes entries in decreasing max score
order
– Accumulates scores for each source pair encountered in entry
– Adjusts scores for source pairs with different values for a data item
– Ignores source pairs that occur only in low max score entries
Solution: Build and Use Inverted Index
Data Item.Value Pr Max Score Sources
Japan.Kyoto 0.02 4.59 S5, S6
S Korea.Gyeongju 0.01 4.12 S2, S3, S4
… … … …
Vietnam.Hanoi 0.94 0.43 S0, S1, S5
China.Beijing 0.96 0.43 S0, S1, S5, S9
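A sketch of the index construction and pair-score accumulation under simplifying assumptions (the scoring function is a stand-in for the actual contribution scores, and value probabilities are taken as given rather than estimated):

```python
import math
from collections import defaultdict
from itertools import combinations

def build_index(claims, value_prob):
    """One entry per (data item, value) provided by at least two sources; entries are
    sorted by a max contribution score that is high when the shared value is likely false."""
    entries = []
    for item, by_src in claims.items():
        providers = defaultdict(list)
        for src, val in by_src.items():
            providers[val].append(src)
        for val, srcs in providers.items():
            if len(srcs) >= 2:
                p_true = value_prob.get((item, val), 0.5)
                score = -math.log(max(p_true, 1e-6))      # rarer (likely false) values score higher
                entries.append((score, item, val, sorted(srcs)))
    return sorted(entries, key=lambda e: e[0], reverse=True)

def accumulate_pair_scores(entries, threshold=1.0):
    """Scan entries in decreasing score order and accumulate evidence per source pair;
    pairs that never co-occur in any entry are never compared at all."""
    pair_scores = defaultdict(float)
    for score, _item, _val, srcs in entries:
        for pair in combinations(srcs, 2):
            pair_scores[pair] += score
    return {pair: s for pair, s in pair_scores.items() if s >= threshold}
```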
 Compare only a few values per source pair before making a decision
– Out of 4 shared values, 3 are false
– Copying can be inferred after observing only 2 false values
Scaling Up Copy Detection: Opportunity 2
Source Accu China Japan S Korea N Korea Vietnam
S0 0.99 Beijing Tokyo Seoul Hanoi
S1 0.99 Beijing Tokyo Seoul Pyongyang Hanoi
S2 0.2 Xi’an Tokyo Gyeongju Kaesong Ho Chi Minh City
S3 0.2 Shanghai Tokyo Gyeongju Kaesong Ho Chi Minh City
S4 0.4 Xi’an Tokyo Gyeongju Pyongyang Ho Chi Minh City
S5 0.6 Beijing Kyoto Busan Pyongyang Hanoi
S6 0.01 Nanjing Kyoto Wonsan Da Nang
S7 0.25 Nanjing Seoul Wonsan Da Nang
S8 0.2 Nanjing Nara Seoul Wonsan Da Nang
S9 0.99 Beijing Seoul Pyongyang
 Algorithm Bound processes entries in decreasing max
score order
– Many high-score entries with source pair → early copying
decision
– Many high-score entries with neither source & many entries
with only one of the sources → early no-copying decision
Solution: Make Early Decisions
Data Item.Value Pr Max Score Sources
Japan.Kyoto 0.02 4.59 S5, S6
S Korea.Gyeongju 0.01 4.12 S2, S3, S4
… … … …
Vietnam.Hanoi 0.94 0.43 S0, S1, S5
China.Beijing 0.96 0.43 S0, S1, S5, S9
 Perform few comparisons in later iterations of copy detection
– High likelihood for Japan.Tokyo in round 1, small change in round 2
– Terminate with no-copying decision early
Scaling Up Copy Detection: Opportunity 3
Source Accu China Japan S Korea N Korea Vietnam
S0 0.99 Beijing Tokyo Seoul Hanoi
S1 0.99 Beijing Tokyo Seoul Pyongyang Hanoi
S2 0.2 Xi’an Tokyo Gyeongju Kaesong Ho Chi Minh City
S3 0.2 Shanghai Tokyo Gyeongju Kaesong Ho Chi Minh City
S4 0.4 Xi’an Tokyo Gyeongju Pyongyang Ho Chi Minh City
S5 0.6 Beijing Kyoto Busan Pyongyang Hanoi
S6 0.01 Nanjing Kyoto Wonsan Da Nang
S7 0.25 Nanjing Seoul Wonsan Da Nang
S8 0.2 Nanjing Nara Seoul Wonsan Da Nang
S9 0.99 Beijing Seoul Pyongyang
 Algorithm Incremental processes index entries iteratively
– Copy detection depends on value probability and source accuracy
– Update scores on entries with big changes between iter i-2 and i-1
– Consider small-change entries only for source pairs whose score
changes might lead to opposite detection on copying
Solution: Incremental Copy Detection
Data Item.Value Pr Max Score Sources
Japan.Kyoto 0.02 4.59 S5, S6
S Korea.Gyeongju 0.01 4.12 S2, S3, S4
… … … …
Vietnam.Hanoi 0.94 0.43 S0, S1, S5
China.Beijing 0.96 0.43 S0, S1, S5, S9
Summary of Experimental Results
 Four data sets: Book-CS, Stock-1day, Book-full, Stock-
2wk
 Few vs many sources, data items, distinct values, index entries
 Validates efficiency of proposed algorithms using inverted
index
 Algorithm Index improves efficiency by 10-100x over Pairwise
 Early decisions + Incremental further improves efficiency by 10x
 Significant reduction in run times of later iterations of Incremental
 Validates effectiveness of proposed algorithms
 Early decisions + Incremental obtains very similar result to Pairwise
Fact Statement Truthfulness Analysis
 This part of the talk is based on the following works:
 Xian Li, Weiyi Meng, Clement Yu. T-verifier: Verifying
Truthfulness of Fact Statements. ICDE, 2011.
 Xian Li, Weiyi Meng, Clement Yu. Truthfulness Analysis of
Fact Statements Using the Web. IEEE Data Engineering
Bulletin, 34(3), September 2011.
 Xian Li, Weiyi Meng, Clement Yu, Haixun Wang. Verification
of Fact Statements with Multiple Truthful Alternatives.
Submitted.
 Liang Wang, Weiyi Meng, Wenzhu Tong, Zhiyong Peng.
Resolving Seemingly Conflicting Fact Statements Caused
by Missing Terms. Submitted.
Untruthful Information All Around the Web
Untruthful information spreads easily on the Web.
Conflicting information confuses people.
 It can mislead people, especially young students, with
incorrect knowledge.
 It can make people lose confidence in the quality of information on the Web.
 It can cause the stock market to be more volatile.
 It can cause a politician to lose an election.
 ……
Untruthful Information Is Bad
Need tools and technologies to help verify
information truthfulness!
 Fact statement: A statement that attempts to state a “fact”.
 Barack Obama is a Christian.
 China has 35 provinces.
 Opinionated statement: A statement that expresses an
opinion.
 Michelle Obama is beautiful.
 Stinky tofu is delicious.
Fact Statements vs. Opinionated Statements
We focus on fact statements in this talk.
 Given a fact statement,
 determine whether or not the statement is truthful, and
if it is not,
 identify a truthful statement most relevant to the given
statement,
using information on the Web.
Problem Statement
 Consider only fact statements (will be called doubtful
statements) with a single specified doubt unit (denoted in [
]) and a single truthful answer.
 Example: China has [35] provinces.
 doubtful statement = doubt unit + topic units
 Example: For “China has [35] provinces”,
doubt unit = “35”
topic units = {“China”, “provinces”}
Restricted Fact Statements
Overview of the T-verifier Method
[Flowchart: the doubtful statement and its doubt unit go through alternative statement generation (via Web search), producing alternative statements; these go through statement truthfulness verification (via further Web search and verification), producing the verified results]
 Generate a query from the input doubtful statement S (the “topic units” of S) and submit it to a search engine to collect information for analysis.
 Extract features from the retrieved results. Use the features to rank alternative units (alter-units) for the doubt unit.
 Form alternative statements based on highly ranked alter-units.
 Submit each alternative statement as a query to collect information for analysis.
 Extract features from the new search results and rank the alternative statements based on them.
 Basic requirements for each alternative statement AS:
 Same Topic. AS should cover the same topic as the doubtful
statement (DS).
 Different Value. AS should be different from the DS on the
doubt unit. We call the term(s) in place of the doubt unit
alternative unit or alter-unit.
 Term sense/type closeness. Each alter-unit should be close to the doubt unit in word sense, in both data type and semantic meaning.
Example: “Christian” is closer to “Muslim” than to “President”,
because both “Christian” and “Muslim” are religious people.
Alternative Statement Collection (1)
 We convert the problem of generating alternative
statements into the problem of looking for alternative units
(alter-units) and use them to generate alternative
statements.
 Two observations:
 Relevant alter-units frequently co-occur with the topic units.
 Relevant alter-units often co-occur with the doubt unit.
 when people have doubt about a fact, they often mention
other possible answers to the fact
 when people dispute a controversial point or a common misconception, they often mention their own points or the truthful fact
Alternative Statement Generation (2)
 Notations:
 Q: Query = topic units = doubtful statement – doubt units
 Let D = {r1, r2 …, rN} be the set of top-N SRRs retrieved by Q.
 Cont(r, T) returns 1 if r contains T and 0 if r does not.
 Each term/phrase T in D is a possible candidate alter-unit and
a ranking score is computed for each T.
 Seven Features are used to rank T.
Alternative Statement Generation (3)
 Feature 1 (Data type matching (DM)):
 Any relevant alter-unit should be of the same data type
as the doubt unit.
 Several data types are considered in T-verifier: date,
time, telephone number, email address, person name,
place name (e.g. name of state, city and attractions),
and number. All other values are treated as common strings.
Alternative Statement Generation (4)
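A crude illustration of this filter (the regular expressions below are simplistic assumptions, not T-verifier's actual type detectors):

```python
import re

def data_type(term):
    """Assign a coarse data type to a candidate term."""
    if re.fullmatch(r"\d{4}|\d{1,2}/\d{1,2}/\d{2,4}", term):
        return "date"
    if re.fullmatch(r"\d{1,2}:\d{2}\s?(am|pm)?", term, re.IGNORECASE):
        return "time"
    if re.fullmatch(r"[\w.+-]+@[\w-]+\.[\w.]+", term):
        return "email"
    if re.fullmatch(r"[\d,]+(\.\d+)?", term):
        return "number"
    return "string"

doubt_unit = "35"
candidates = ["34", "23", "provinces", "1997", "Beijing"]
print([c for c in candidates if data_type(c) == data_type(doubt_unit)])
# -> ['34', '23'] under these crude rules ("1997" is typed as a date, "Beijing" as a string)
```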
 Feature 2 (Sense closeness (SC)):
 Any relevant alter-unit should have a meaning (sense) related to that of the doubt unit (e.g., “Christian” is closer in meaning to “Muslim” than to “president”).
 WordNet is used to capture the sense closeness between
two different terms.
Alternative Statement Generation (5)
\[
SC(T, DU) =
\begin{cases}
\alpha, & \text{if } hyper(T, DU) = \text{true} \\
\beta, & \text{if } sibling(T, DU) = \text{true} \\
\gamma = \mathrm{wup\_similarity}(T, DU), & \text{otherwise}
\end{cases}
\]
where α and β are parameters and γ is the Wu-Palmer similarity between T and DU, computed from their distance in the WordNet hypernym tree.
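A rough sketch of the sense-closeness idea using WordNet through NLTK (the choice of first sense, the sibling test and the α/β values are simplifying assumptions; requires nltk and its WordNet corpus):

```python
from nltk.corpus import wordnet as wn   # pip install nltk; then nltk.download('wordnet')

def sense_closeness(term, doubt_unit, alpha=1.0, beta=0.9):
    """Alpha for hypernym/hyponym pairs, beta for siblings, otherwise Wu-Palmer similarity."""
    syns_t, syns_d = wn.synsets(term), wn.synsets(doubt_unit)
    if not syns_t or not syns_d:
        return 0.0
    t, d = syns_t[0], syns_d[0]              # simplification: use the first sense only
    if d in t.hypernyms() or t in d.hypernyms():
        return alpha
    if set(t.hypernyms()) & set(d.hypernyms()):
        return beta                           # share a direct hypernym, i.e., siblings
    return t.wup_similarity(d) or 0.0

print(sense_closeness("Muslim", "Christian"))
print(sense_closeness("president", "Christian"))
```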
 Feature 3 (Term local correlation (TLC)):
 The co-occurrence coefficient of T with the doubt unit –
terms that co-occur more frequently with the doubt unit
are more likely to be good alter-units.
Alternative Statement Generation (6)
 Four Textual Features:
 Feature 4 (Result coverage (RC)): The percentage of SRRs
in D that contain T – if T appears in a higher percentage of
the SRRs, it is more likely to be a good alter-unit.
Alternative Statement Generation (7)
\[
RC(T) = \frac{\sum_{i=1}^{N} Cont(r_i, T)}{N}
\]
 Textual Features (continued):
 Feature 5 (Result Query Relevance (RQR)): The relevance of
the SRRs (with respect to Q) that contain T – if T appears in
more relevant SRRs, T is more likely to be a good alter-unit.
 We consider an SRR that contains more topic units to have a
higher degree of relevance to the topic query Q.
Alternative Statement Generation (8)
\[
RQR(Q, T) = \frac{\sum_{i=1}^{N} Cont(r_i, T) \cdot |r_i \cap Q| \, / \, len(Q)}{\sum_{i=1}^{N} Cont(r_i, T)}
\]
where |r_i ∩ Q| is the number of topic-query terms that appear in r_i and len(Q) is the number of terms in Q.
 Textual Features (continued):
 Feature 6 (SRR ranking (Rrank)): The ranks of the SRRs that
contain T – if T appears in higher ranked SRRs, it is more
likely to be a good alter-unit.
Alternative Statement Generation (9)
\[
Rrank(T) = \frac{\sum_{i=1}^{N} Cont(r_i, T) \cdot (1 - pos(r_i)/N)}{\sum_{i=1}^{N} (1 - pos(r_i)/N)}
\]
where pos(r_i) is the position of r_i in the search engine’s ranking list and N is the number of SRRs considered.
 Feature 7 (Term distance (TD)): The size of the smallest window of consecutive words in each SRR (title or snippet) that covers all the topic units contained in the SRR as well as the term T – the smaller the window, the more likely T is a good alter-unit.
Alternative Statement Generation (10)
 Alter-unit generation algorithm
 Step 1: Filter the candidate terms by data type matching
(DM), i.e., only those terms that match the data type of
the doubt unit will be considered further.
 Step 2: Rank each remaining candidate term based on
the other six features. The ranking score of term T is
computed using the following formula:
aurs(T) = w1·SC + w2·TLC + w3·RC + w4·RQR + w5·Rank + w6·RTD
where wi, i = 1, ..., 6, are the weights for the features; the optimal weights are determined by a learning algorithm.
Alternative Statement Generation (11)
 Select the top-k ranked alter-units for some k to be
determined by experiments on training data.
 Replace the doubt-unit by each of the top-k alter-units to
generate k alternative statements.
Alternative Statement Generation (12)
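A toy sketch of how the textual features and a combined ranking score can be computed from a handful of SRRs (the SRR strings, the weights and the substring matching are illustrative assumptions; SC, TLC and TD are omitted):

```python
def textual_features(term, srrs, topic_units):
    """Compute RC, RQR and Rrank for one candidate term over a ranked list of SRR texts."""
    n = len(srrs)
    containing = [i for i, srr in enumerate(srrs) if term.lower() in srr.lower()]
    rc = len(containing) / n
    relevance = lambda srr: sum(t.lower() in srr.lower() for t in topic_units) / len(topic_units)
    rqr = sum(relevance(srrs[i]) for i in containing) / len(containing) if containing else 0.0
    rrank = sum(1 - i / n for i in containing) / sum(1 - i / n for i in range(n))
    return rc, rqr, rrank

def score(term, srrs, topic_units, weights=(0.4, 0.3, 0.3)):
    """Weighted combination of the three textual features (an illustrative subset of aurs)."""
    return sum(w * f for w, f in zip(weights, textual_features(term, srrs, topic_units)))

srrs = [
    "China is divided into 34 provincial-level divisions ...",
    "How many provinces does China have? 23 provinces plus other regions ...",
    "China has 34 province-level administrative units ...",
]
for candidate in ["34", "23", "35"]:
    print(candidate, round(score(candidate, srrs, ["China", "provinces"]), 3))
```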
Statement Truthfulness Verification (1)
[Diagram: search each alternative statement → use basic rankers to rank the alternative statements → merge the ranks from all basic rankers]
Step 1: Send each of the top-N ranked alternative
statements as a query to a search engine and collect
relevant SRRs.
Step 2: Employ a number of basic rankers and
generate a ranking list of the alternative statements
using each basic ranker based on newly collected
SRRs.
Step 3: Use a rank merging algorithm to merge the
rank lists into a combined final list. Select the top-
ranked statement as the truthful statement.
Determine which of the alternative statements is truthful.
Basic ranker 1: Alter-Unit Ranker (AUR)
 Rank the alternative statements in the same order as their corresponding alter-units, as produced by the alternative statement generation phase.
 This basic ranker tries to utilize the information used in the
alternative statement generation phase.
Basic ranker 2: Hits Ranker (HR)
 Submit each alternative statement to a search engine and
rank the alternative statements by the numbers of hits they
get from the search engine.
 This basic ranker implicitly assumes that each hit provides evidence that the corresponding alternative statement is correct.
Statement Truthfulness Verification (2)
Basic rankers 3-6: Text Feature Rankers (TFR)
 TFR(RC): Rank alternative statements in descending order of
the Result Coverage (RC) values of their alter-units.
 TFR(RQR): Rank alternative statements in descending order of
the Result Query Relevance (RQR) values of their alter-units.
 TFR(Rrank): Rank alternative statements in descending order
of the SRR ranking (Rrank) values of their alter-units.
 TFR(TD): Rank alternative statements in descending order of
the Term Distance (TD) values of their alter-units.
Statement Truthfulness Verification (3)
Note: A new set of SRRs is used for each alternative statement in
the above computations, different from that used in the alternative
statement generation phase.
Basic ranker 7: Domain Authority Ranker (DAR)
 Some researchers have observed that web pages
published by certain domains are more likely to be
truthful, such as “.gov”, “.edu”, etc.
 This basic ranker ranks alternative statements based on
the percentage of results retrieved from each domain and
the correctness weight of each domain using training
data.
Statement Truthfulness Verification (4)
Rank merging:
 Each basic ranker produces a ranked list of the same set
of alternative statements obtained for the same doubtful
statement.
 Rank merging is to combine the seven ranked lists into a
single ranked list of these alternative statements.
Statement Truthfulness Verification (5)
[Diagram: the ranked lists produced by basic rankers 1 through 7 are combined by rank merging into a single merged ranked list]
Seven rank merging algorithms are evaluated:
Probability Combination:
 Let Pi be the probability that the i-th basic ranker predicts truthfulness correctly. Then the overall probability that the truthfulness of a statement S is correctly determined (i.e., the probability that at least one of the basic rankers is correct in predicting the truthfulness of S) can be estimated by
\[
1 - \prod_{i=1}^{7} (1 - P_i)
\]
assuming the basic rankers are independent.
Statement Truthfulness Verification (6)
Basic Borda Count:
 Each alternative statement Si is treated as a candidate in
an election and each basic ranker BRj is treated as a voter
in the election.
 BRj assigns n – k + 1 points to Si if Si is ranked k-th by
BRj, where n is the number of alternative statements being
ranked.
 The total points assigned to Si is the sum of all points
assigned to Si by all basic rankers.
 Rank all alternative statements in descending order of the
total points they received.
Statement Truthfulness Verification (7)
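A small sketch of Borda-style merging (the ranked lists below are made up; passing weights of 1 gives the basic variant, other weights give the weighted variant described later):

```python
from collections import defaultdict

def borda_merge(ranked_lists, weights=None):
    """Merge ranked lists of the same candidates by (weighted) Borda count."""
    if weights is None:
        weights = [1.0] * len(ranked_lists)
    n = len(ranked_lists[0])
    points = defaultdict(float)
    for ranking, w in zip(ranked_lists, weights):
        for position, candidate in enumerate(ranking):    # position 0 = ranked first
            points[candidate] += w * (n - position)       # i.e., n - k + 1 points for rank k
    return sorted(points, key=points.get, reverse=True)

rankings = [["S1", "S2", "S3"], ["S1", "S3", "S2"], ["S2", "S1", "S3"]]
print(borda_merge(rankings))                           # basic Borda count
print(borda_merge(rankings, weights=[0.9, 0.6, 0.3]))  # weighted Borda count
```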
Basic Condorcet algorithm:
 Define a comparison function between any two alternative
statements Si and Sj: if Si is ranked higher than Sj by more
basic rankers, then define Si > Sj (i.e., Si is ranked higher than
Sj in the merged list); else if Sj is ranked higher than Si by more
basic rankers, then define Sj > Si; else define Si = Sj.
 Sort all alternative statements based on the above comparison
function and output this sorted list as the merged result list.
Example: Consider the following three local results: R1 = (S3,
S1, S2), R2 = (S1, S2, S3), and R3 = (S2, S3, S1). In this
case, we have S1 beats S2, S2 beats S3, but S3 beats S1.
When a cycle is formed, all statements in the cycle are
considered to be equivalent.
Statement Truthfulness Verification (8)
Weighted Borda Count:
 The same as the Basic Borda algorithm except that each
basic ranker BRi is assigned a weight wi, reflecting the
quality of the basic ranker. For example, wi could be
implemented as the precision of BRi based on a training
set.
 Now the points assigned by BRi are adjusted by
multiplying them with wi.
 The rest of the steps remain the same as in the Basic
Borda algorithm.
Statement Truthfulness Verification (9)
Weighted Condorcet algorithm:
 It is the same as the Basic Condorcet algorithm except
that the comparison function is modified as follows for any
two alternative statements Si and Sj:
 if the sum of the weights of the basic rankers that rank Si
higher than Sj is larger than the sum of the weights of the
basic rankers that rank Sj higher than Si, then define Si > Sj;
 else if the sum of the weights of the basic rankers that rank
Sj higher than Si is larger than the sum of the weights of the
basic rankers that rank Si higher than Sj, then define Sj > Si;
 else define Si = Sj.
Statement Truthfulness Verification (10)
Positional Borda Count:
 Position probability: Let P_j^i denote the probability that an alternative statement ranked at the j-th position by the i-th basic ranker is truthful. P_j^i can be obtained by training.
 We can interpret P_j^i as the ranking score given by BRi to the alternative statement ranked at the j-th position.
 Rank all alternative statements in descending order of the sum of the scores (the position probabilities) they received.
Statement Truthfulness Verification (11)
Weighted Positional Borda Count:
 Let wi be the weight of the i-th basic ranker BRi.
 Let P_j^i be the probability that an alternative statement ranked at the j-th position by BRi is truthful.
 The final ranking score given by BRi to the alternative statement ranked at the j-th position is computed as wi · P_j^i.
 Rank all alternative statements in descending order of the sum of the final ranking scores they received.
Statement Truthfulness Verification (12)
 Dataset: 50 doubtful statements are compiled from
factoid questions in Q&A track in TREC 8 and TREC 9.
See http://cs.binghamton.edu/~xianli/doubtful_statements.htm.
 For each doubtful statement, the correct answer is
manually verified.
 For each doubtful statement, form a topic query after the
doubt unit is removed. Submit it to the Yahoo! search
engine and obtain the top 200 SRRs.
 Retrieve 100 results for each alternative statement.
Experiment (1)
Doubtful statement | Doubt unit | Truth
Antarctic is the only continent without a desert. | Antarctic | Europe
George C. Scott won the Oscar for best actor in 1970. | George C. Scott | George C. Scott
 Alter-unit generation algorithm evaluation
 Test 1: Randomly select 25 doubtful statements as
training set and the rest as testing set.
Experiment (2)
Rank of truthful alter-unit | Training set (25 cases) | Testing set (25 cases)
Ranked 1st | 17 | 14
Ranked 2nd | 7 | 9
Ranked 3rd | 0 | 1
Ranked 4th | 1 | 1
Ranked 5th | 0 | 0
All truthful alter-units are ranked among the top 5 results.
 Impact of using different numbers of top SRRs
Experiment (3)
Rank of truthful alter-unit | Top 10 | Top 50 | Top 100 | Top 150 | Top 200
Ranked 1st | 12 | 21 | 29 | 31 | 31
Ranked 2nd | 6 | 12 | 10 | 12 | 16
Ranked 3rd | 5 | 3 | 3 | 3 | 1
Ranked 4th | 3 | 1 | 2 | 2 | 2
Ranked 5th | 2 | 0 | 1 | 1 | 0
Not among top 5 | 22 | 13 | 5 | 0 | 0
Precision of each basic ranker
 The precision of each basic ranker is defined by
precision = n / N
n: the number of truthful statements that are ranked at the top
N: the number of doubtful statements evaluated
 The precisions of the seven basic rankers:
Experiment (4)
Ranker | AUR | TFR(TD) | TFR(RC) | TFR(RQR) | HR | DAR | TFR(Rrank)
Precision | 0.62 | 0.32 | 0.66 | 0.60 | 0.20 | 0.20 | 0.62
Truthfulness verification evaluation
 The precisions of different merging algorithms (based on
10-fold cross validation):
Experiment (5)
[Bar chart: precision of the seven merging algorithms (BaseBorda, BaseCond, ProbComb, PosBorda, WBorda, WCond, WPosBorda) on a 0 to 1 scale]
Analysis of five failed cases
Cases 4 and 5 could be considered as correct.
Experiment (6)
Untruthful statement verified as truthful | Truthful answer
1. Tom Hanks was lead actress in the movie 'Sleepless in Seattle'. | Meg Ryan
2. Apollo is the first spacecraft on the moon. | Luna2
3. Sullivan is the fastest swimmer in the world. | Michael Phelps
4. Les Paul invented the electric guitar. | Rickenbacker
5. English is the primary language of the Philippines. | Filipino
 General fact statements and questions in QA systems
are different.
 Not all fact statements can be converted to equivalent
questions, e.g., fact statements whose doubt units are
verbs.
 The doubt units in fact statements convey more information
than the WH-word in a question.
 A fact statement may have multiple doubt units but a
question can imply only one doubt unit.
Relationship with QA
System | T-verifier | Answers.com | Yahoo! Answers
Total statements | 50 | 50 | 50
Wrong results | 5 | 4 | 12
Cannot find results | 0 | 6 | 18
Correct results | 45 | 40 | 20
How does T-verifier compare with two
popular Web QA Systems?
 Example Statement Answers.com can’t find result:
 [800] people died when the Estonia sank in 1994.
 Example Statement Answers.com gives wrong result:
 [20] hexagons are on a soccer ball
 Correct answer: 20, Answers.com gives: 32
 This type of statements will be called MTA statements.
 Examples of MTA statements:
Doubtful Statements with Multiple
Truthful Alternatives
Type | Doubtful statement | Truthful alternatives
CC | Barack Obama was born in [Kenya]. | Honolulu, Hawaii, United States
MVA | [Edwin Krebs] won the Nobel Prize in medicine in 1992. | Edwin Krebs, Edmond Fischer
SFE | [Peter Ustinov] has portrayed Hercule Poirot. | Peter Ustinov, David Suchet, Agatha Christie
DSC | [U.S.] won team title at the 2011 World Gymnastics Championships. | U.S., China
TS | [Bob Dole] served as the President of the United States. | Barack Obama since Jan 2009, George Bush from Jan 2001 to Jan 2009, …
 Objective: identify all truthful alternatives for each MTA
statement.
 Significantly more challenging than processing STA
statements (doubtful statements with a single truthful
alternative).
 The step of generating and ranking alter-units can
remain the same.
 The difference is how to determine all the truthful alter-
units from the ranked candidate alter-units.
MTA Statements Processing
 Assumption: Truthful alter-units are likely ranked above
untruthful ones.
 In other words, the ranking scores of the truthful alter-units
are likely to be higher than those of untruthful ones.
 Top-k approach: Determine an appropriate integer k so
that the top-k ranked alter-units are recognized as the
truthful ones.
 The problem becomes how to determine the right k for
each MTA statement.
 This approach is a straightforward extension of the
solution for processing STA statements, which is a top-1
solution.
Top-k Approach (1)
Three different ways to select the k:
 Largest Score Gap (LSG). Compare the gap between the
scores of each pair of consecutively ranked alternative
statements and choose the largest gap as the cut-off point.
 Largest Percentage Gap (LPG). Similar to LSG except that the percentage gap of scores is used.
 The percentage gap between the ranking scores S_i and S_{i+1} of two consecutively ranked alternative statements is (S_i − S_{i+1}) / S_i.
 First Significant Gap (FSG). This method uses the first significant score gap as the cut-off point. In this work, a score gap is considered significant if log_b(S_i / S_{i+1}) > 1, and k is set to the smallest (i.e., the first) i that satisfies this condition.
Top-k Approach (2)
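A sketch of the three cut-off heuristics applied to a list of ranking scores (the scores and the log base b are made-up assumptions):

```python
import math

def largest_score_gap(scores):
    """k at the largest absolute gap between consecutive scores."""
    gaps = [scores[i] - scores[i + 1] for i in range(len(scores) - 1)]
    return gaps.index(max(gaps)) + 1

def largest_percentage_gap(scores):
    """k at the largest relative gap (S_i - S_{i+1}) / S_i."""
    gaps = [(scores[i] - scores[i + 1]) / scores[i] for i in range(len(scores) - 1)]
    return gaps.index(max(gaps)) + 1

def first_significant_gap(scores, b=2.0):
    """Smallest i with log_b(S_i / S_{i+1}) > 1; if none, keep everything."""
    for i in range(len(scores) - 1):
        if math.log(scores[i] / scores[i + 1], b) > 1:
            return i + 1
    return len(scores)

scores = [0.92, 0.88, 0.41, 0.12, 0.05]   # hypothetical alter-unit ranking scores
print(largest_score_gap(scores), largest_percentage_gap(scores), first_significant_gap(scores))
```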
Compare the three methods for selecting the k
 50 MTA statements are used.
 A total of 143 truthful alternative statements for the 50 MTA
statements, 2.8 per MTA statement on average.
Top-k Approach (3)
Method | # total selected | # truthful alternatives | Precision | Recall | F-score
LSG | 140 | 107 | 0.76 | 0.74 | 0.75
LPG | 182 | 125 | 0.68 | 0.86 | 0.76
FSG | 178 | 128 | 0.71 | 0.88 | 0.79
Conclusions:
 Top-k approach is not sufficiently good.
 Need new approach: Divide MTA statements into different
types and develop a solution for each type.
Five types of MTA statements have been identified.
 Type 1: Compatible Concepts (CC). For each MTA statement of this type, its truthful alter-units are compatible with each other. Usually, these alter-units either are equivalent to each other (i.e., synonyms), or correspond to the same basic concept but with different specificity/generality (i.e., hyponyms/hypernyms) or with different granularity (i.e., one is a part of another).
Example 1: For “Barack Obama was born in [Honolulu]”, truthful
alternatives include “Honolulu”, “Hawaii”, “United States”, etc.
Example 2: For “Queen Elizabeth II resided in [United
Kingdom]”,
correct alter-units include “United Kingdom”, “England” and
“Great Britain”.
Different Types of MTA Statements (1)
 Type 2: Multi-Valued Attributes (MVA). For each MTA
statement of this type, the truthful alter-units correspond to
different values of a multi-valued attribute in a relational
database. A multi-valued attribute may have multiple values for
a given entity (record).
Example: For “[Edwin Krebs] won the Nobel Prize in medicine in 1992”, the two US biochemists “Edwin Krebs” and “Edmond Fischer” shared the 1992 Nobel Prize in medicine (they are values of the multi-valued attribute “Recipients” of a Nobel Prize record); therefore both of them are truthful alter-units.
Different Types of MTA Statements (2)
 Type 3: Shared-Feature Entities (SFE). Many entities share a
common feature. In this case, when trying to find entities that
have this feature, multiple entities may be detected.
Example: For “[Peter Ustinov] has portrayed Hercule Poirot”, all
actors who have portrayed Hercule Poirot in different movies
(including remakes) are truthful alter-units. The truthful alter-
units include Peter Ustinov, David Suchet and Agatha Christie.
 Difference between the SFE type and the MVA type: For the
former, multiple entities share a common feature, while for the
latter, the same entity has multiple values for an attribute.
 In reality, values of the same multi-valued attribute are much
more likely to co-occur compared to the entities that share the
same feature.
Different Types of MTA Statements (3)
 Type 4: Different Sub-Categories (DSC). Different sub-
categories related to some of the topic units of the doubtful
statement. A topic unit of a doubtful statement DS is a
term/phrase in DS different from the doubt unit. When a topic
unit is replaced by different sub-categories, the doubt unit may
have different truthful alter-units, making the doubtful statement
an MTA statement.
Example: For “[U.S.] won team title at the 2011 World Gymnastics Championships”, the topic unit “World” could be replaced by “US”, “Europe”, etc., and “team” could be replaced by “men’s team” or “women’s team”. Each replacement can lead to a different truthful alter-unit.
Different Types of MTA Statements (4)
 Type 5: Time-Sensitive (TS).
 A fact statement is time-sensitive if its truthfulness changes over time, either with a fairly regular time pattern or in a way that is expected even though there is no such regularity. Since there may be different truthful alter-units at different times for this type of statement, such statements are MTA statements.
Example: A time-sensitive statement with a fairly regular pattern concerns US presidents (who serve 4 or 8 years).
Example: A time-sensitive statement with less regularity is “A
new world 100-meter track record was established by [Usain
Bolt]” because it is generally not predictable who will establish a
new record in the future.
Different Types of MTA Statements (5)
 We recently completed a paper on processing doubtful
statements of CC and MVA types.
 For statements of the CC type, we primarily explored
semantic relationships (such as synonymy and hypernymy
relationships) among alter-units.
 For statements of the MVA type, we primarily explored co-
occurrence-based correlation relationships among alter-
units.
Processing CC & MVA Statements
 Sometimes, seemingly conflicting fact statements (SCFSs)
do not mean actual conflicts.
 Consider the following two statements:
S1: China won the team title at the 2011 World
Gymnastics Championships in Tokyo.
S2: USA won the team title at the 2011 World Gymnastics
Championships in Tokyo.
Resolving Seemingly Conflicting Fact
Statements Caused by Missing Terms
 The seeming conflict is caused by missing terms that make the statements imprecise.
 With the missing terms added, we have the following
precise statements:
S1: China won the men’s team title at the 2011 World
Gymnastics Championships in Tokyo.
S2: USA won the women’s team title at the 2011 World
Gymnastics Championships in Tokyo.
Resolving Seemingly Conflicting Fact
Statements Caused by Missing Terms
 General problem: Given a set of seemingly conflicting fact statements, determine whether the seeming conflict is caused by missing terms by finding those missing terms. We also determine the appropriate position for the missing term in each statement.
 The missing terms often create sub-categories of a
category.
Example: “men’s team title” and “women’s team title” are
sub-categories of “team title”.
 So the solution to this problem can be used to process
MTA-statements of the DSC type.
Resolving Seemingly Conflicting Fact
Statements Caused by Missing Terms
 Current data fusion techniques assume that there is a
single true value for each data item. How to handle
situations where there may be multiple true values for
some data items?
 Data inconsistency may have different causes. How to
automatically identify these causes and apply appropriate
methods to resolve them?
Additional Research Issues (1)
 How to automatically classify MTA statements into different
types?
 The same statement may be classified into multiple types.
 How to process the other two types of MTA statements,
namely SFE and TS?
 If a statement is classified into multiple types, it may be
processed using different techniques. How to combine the
results of these different techniques?
Additional Research Issues (2)
 How to handle doubtful statements that have no specified
doubt unit(s)? In other words, how to automatically
determine the most worthy doubt unit(s) from a given fact
statement?
Example: For the statement “More than 100,000 people were killed by the 2008 Sichuan earthquake”, candidate doubt units include “100,000”, “2008” and “Sichuan”.
Additional Research Issues (3)
 How to process all fact statements on the Web?
Challenges:
 How to identify controversial fact statements so we can
focus on these statements only?
 How to use fact statements that have already been
verified to verify or de-verify other fact statements?
Additional Research Issues (4)
 How to apply natural language processing (NLP) to
determine the truthfulness of fact statements?
Example: Consider the following sentences related to the fact statement “SUNY has 64 campuses”:
1. I believe SUNY has 64 campuses.
2. I strongly believe SUNY has 64 campuses.
3. I am sure SUNY has 64 campuses.
4. I am sure SUNY has 64 campuses because this is
mentioned at the SUNY website.
Additional Research Issues (5)
 How to identify and resolve time-sensitive data
inconsistency in fact statements?
 The problem is mostly caused by the use of relative time
references such as yesterday, last week, 2 years ago, …
 Need to establish the correct reference point for each
relative time in a data source.
Additional Research Issues (6)
Questions?
Weiyi meng web data truthfulness analysis

  • 1.
    Web Data TruthfulnessAnalysis Weiyi Meng Department of Computer Science State University of New York at Binghamton meng@cs.binghamton.edu (RenDa Office: 235) http://www.cs.binghamton.edu/~meng/meng.html July 2015 JUNE 16, 2016 1
  • 2.
    Where in theWorld is SUNY Binghamton?  Located in Binghamton in New York state  Birthplace of IBM (Endicott, NY)  Metro population: 249,000  One of the safest U.S. midsized cities  Low cost of living (12% below U.S. average)  Close to major cities JUNE 16, 2016 2
  • 3.
    Binghamton at aGlance JUNE 16, 2016 3
  • 4.
    The SUNY System 64 campuses (two-year, four-year)  Four PhD-granting “University Centers”  Albany  Binghamton  Buffalo  Stony Brook JUNE 16, 2016 4
  • 5.
    Rankings  #1 bestpublic college in New York and #18 in the nation  — AMERICAN CITY BUSINESS JOURNAL, 2015  #4 best value among nation’s public colleges for out-of-state and international students and #15 overall  — KIPLINGER’S PERSONAL FINANCE MAGAZINE, 2014  Top 5 best values in the U.S.A. among public colleges  — KIPLINGER’S PERSONAL FINANCE MAGAZINE AND THE PRINCETON REVIEW, 2014  38th of top U.S.A. public universities  — U.S. NEWS & WORLD REPORT, 2014  #5 of 50 best-rated, most affordable colleges for international students  — GREAT VALUE COLLEGES, 2014  Premier Public University in Northeast U.S.A.  — FISKE GUIDE TO COLLEGES JUNE 16, 2016 5
  • 6.
    6 The University MainCampus  J U L Y 2 0 1 5 JUNE 16, 2016
  • 7.
    Talk Outline  Introduction Structured Web data quality analysis and truth finding  Fact statement truthfulness analysis  Additional Research Issues JUNE 16, 2016 7
  • 8.
    Web Data Quality Quality of data can be measured in different ways:  Correctness or truthfulness  Freshness  Completeness  Objectiveness  Writing quality/style  Appropriateness  ……  This talk will focus mostly on correctness/truthfulness. JUNE 16, 2016 8
  • 9.
    Structured Web DataQuality Analysis and Truth Finding  This part of the talk is based on the following two works in collaboration with AT&T Labs-Research:  Xian Li, Xin Luna Dong, Kenneth Lyons, Weiyi Meng, Divesh Srivastava. Truth Finding on the Deep WEB: Is the Problem Solved? VLDB 2013.  Xian Li, Xin Luna Dong, Kenneth Lyons, Weiyi Meng, Divesh Srivastava. Scaling Up Copy Detection. ICDE 2015. JUNE 16, 2016 9
  • 10.
     Why thesetwo domains? Belief of fairly clean data Data quality can have big impact on people’s lives  Resolved heterogeneity at schema level and instance level Study on Two Domains #Sources Period #Objects #Local- attrs #Global- attrs Considered items Stock 55 7/2011 1000*20 333 153 16000*20 Flight 38 12/2011 1200*31 43 15 7200*31 JUNE 16, 2016 10
  • 11.
    Study on TwoDomains  Stock  Search “stock price quotes” and “AAPL quotes”  Sources: 200 (search results)89 (deep web)76 (GET method) 55 (none JavaScript)  1000 “Objects”: a stock with a particular symbol on a particular day  30 from Dow Jones Index  100 from NASDAQ100 (3 overlaps)  873 from Russell 3000  Attributes: 333 (local)  153 (global)  21 (provided by > 1/3 sources)  16 (no change after market close) #Sources Period #Objects #Local- attrs #Global- attrs Considered items Stock 55 7/2011 1000*20 333 153 16000*20 Flight 38 12/2011 1200*31 43 15 7200*31 JUNE 16, 2016 11
  • 12.
    Study on TwoDomains  Flight  Search “flight status”  Sources: 38  3 airline websites (AA, UA, Continental)  8 airport websites (SFO, DEN, etc.)  27 third-party websites (Orbitz, Travelocity, etc.)  1200 “Objects”: a flight with a particular flight number on a particular day from a particular departure city  Departing or arriving at the hub airports of AA/UA/Continental  Attributes: 43 (local)  15 (global)  6 (provided by > 1/3 sources)  scheduled dept/arr time, actual dept/arr time, dept/arr gate #Sources Period #Objects #Local- attrs #Global- attrs Considered items Stock 55 7/2011 1000*20 333 153 16000*20 Flight 38 12/2011 1200*31 43 15 7200*31 JUNE 16, 2016 12
  • 13.
    Q1. Are Therea Lot of Redundant Data on the Deep Web?  JUNE 16, 2016 13
  • 14.
    Q2. Are theData Consistent? Inconsistency on 70% data items Tolerance to 1% difference  JUNE 16, 2016 14
  • 15.
    Why Such Inconsistency? —I. Semantic Ambiguity Yahoo! Finance Nasdaq Day’s Range: 93.80-95.71 52wk Range: 25.38-95.71 52 Wk: 25.38-93.72 JUNE 16, 2016 18
  • 16.
    Why Such Inconsistency? —II. Instance Ambiguity JUNE 16, 2016 19
  • 17.
    Why Such Inconsistency? —III. Out-of-Date Data 4:05 pm 3:57 pm JUNE 16, 2016 20
  • 18.
    Why Such Inconsistency? —IV. Unit Difference 76,821,000 76.82B JUNE 16, 2016 21
  • 19.
    Why Such Inconsistency? —V. Pure Error FlightView FlightAware Orbitz 6:15 PM 6:15 PM 6:22 PM 9:40 PM 8:33 PM 9:54 PM JUNE 16, 2016 22
  • 20.
    Why Such Inconsistency? Random sample of 20 data items and 5 items with the largest #values in each domain JUNE 16, 2016 23
  • 21.
    Q3. Is EachSource of High Accuracy?  Not high on average: .86 for Stock and .8 for Flight  Gold standard  Stock: vote on data from Google Finance, Yahoo! Finance, MSN Money, NASDAQ, Bloomberg  Flight: from airline websites  JUNE 16, 2016 24
  • 22.
    Q3-2. Are AuthoritativeSources of High Accuracy?  Reasonable but not so high accuracy  Medium coverage  JUNE 16, 2016 25
  • 23.
    Q4. Is ThereCopying or Data Sharing Between Deep-Web Sources?  JUNE 16, 2016 26
  • 24.
    Q4-2. Is Copyingor Data Sharing Mainly on Accurate Data?  JUNE 16, 2016 27
  • 25.
    How to ResolveInconsistency and Find the True Values? The problem is known as Data Fusion. JUNE 16, 2016 28
  • 26.
    Basic Solution: Voting Only 70% correct values are provided by over half of the sources  Voting precision:  .908 for Stock; i.e., wrong values for 1500 data items  .864 for Flight; i.e., wrong values for 1000 data items JUNE 16, 2016 29
  • 27.
    Improvement I. LeveragingSource Accuracy S1 S2 S3 Flight 1 7:02PM 6:40PM 7:02PM Flight 2 5:43PM 5:43PM 5:50PM Flight 3 9:20AM 9:20AM 9:20AM Flight 4 9:40PM 9:52PM 8:33PM Flight 5 6:15PM 6:15PM 6:22PM JUNE 16, 2016 30
  • 28.
    Improvement I. LeveragingSource Accuracy  Naïve voting obtains an accuracy of 80% S1 S2 S3 Flight 1 7:02PM 6:40PM 7:02PM Flight 2 5:43PM 5:43PM 5:50PM Flight 3 9:20AM 9:20AM 9:20AM Flight 4 9:40PM 9:52PM 8:33PM Flight 5 6:15PM 6:15PM 6:22PM Higher accuracy; More trustable JUNE 16, 2016 31
  • 29.
    Improvement I. LeveragingSource Accuracy  Considering accuracy obtains an accuracy of 100% Challenges: 1. How to decide source accuracy? 2. How to leverage source accuracy in voting? S1 S2 S3 Flight 1 7:02PM 6:40PM 7:02PM Flight 2 5:43PM 5:43PM 5:50PM Flight 3 9:20AM 9:20AM 9:20AM Flight 4 9:40PM 9:52PM 8:33PM Flight 5 6:15PM 6:15PM 6:22PM Higher accuracy; More trustable JUNE 16, 2016 32
  • 30.
    Results on StockData (I)  Among various methods, the Bayesian-based method (Accu) performs best at the beginning, but in the end obtains a final precision (=recall) of .900, worse than Vote (.908) JUNE 16, 2016 33
  • 31.
    Results on StockData (II)  AccuSim obtains a final precision of .929, higher than Vote and any other method (around .908)  This translates to 350 more correct values JUNE 16, 2016 34
  • 32.
    Results on StockData (III) JUNE 16, 2016 35
  • 33.
    Results on FlightData  Accu/AccuSim obtains a final precision of .831/.833, both lower than Vote (.857) JUNE 16, 2016 36
  • 34.
    Copying or DataSharing Can Happen on Inaccurate Data JUNE 16, 2016 37
  • 35.
    S1 S2 S3S4 S5 Flight 1 7:02PM 6:40PM 7:02PM 7:02PM 8:02PM Flight 2 5:43PM 5:43PM 5:50PM 5:50PM 5:50PM Flight 3 9:20AM 9:20AM 9:20AM 9:20AM 9:20AM Flight 4 9:40PM 9:52PM 8:33PM 8:33PM 8:33PM Flight 5 6:15PM 6:15PM 6:22PM 6:22PM 6:22PM  Naïve voting works only if data sources are independent. JUNE 16, 2016 38  Considering source accuracy can be worse when there is copying
  • 36.
    Improvement II. IgnoringCopied Data  It is important to detect copying and ignore copied values in fusion Challenges: 1. How to detect copying? 2. How to leverage copying in voting? S1 S2 S3 S4 S5 Flight 1 7:02PM 6:40PM 7:02PM 7:02PM 8:02PM Flight 2 5:43PM 5:43PM 5:50PM 5:50PM 5:50PM Flight 3 9:20AM 9:20AM 9:20AM 9:20AM 9:20AM Flight 4 9:40PM 9:52PM 8:33PM 8:33PM 8:33PM Flight 5 6:15PM 6:15PM 6:22PM 6:22PM 6:22PM JUNE 16, 2016 39
  • 37.
    Results on FlightData (I)  AccuCopy obtains a final precision of .943, much higher than Vote (.864)  This translates to 570 more correct values JUNE 16, 2016 40
  • 38.
    Results on FlightData (II) JUNE 16, 2016 41
  • 39.
    Take-Away Messages  Webdata is not fully trustable, Web sources have different accuracy, and copying is common  Leveraging source accuracy, copying relationships, and value similarity can improve truth finding JUNE 16, 2016 42
  • 40.
    Key Observations aboutCopy Detection  Structured data copy detection is important and challenging  Useful for truth finding, protecting rights of data sources  Copy detection can be expensive  Prior work (Pairwise) has time complexity O(|S|2 |D| L)  Many opportunities for scaling up copy detection  Avoid comparing source pairs that are likely to be independent  Compare few values for source pairs before making decision  Perform few comparisons in latter iterations of copy detection 43 JUNE 16, 2016
  • 41.
     Are sourcesS0 and S1 copying? – Not necessarily Copy Detection 44 Source China Japan S Korea N Korea Vietnam S0 Beijing Tokyo Seoul Hanoi S1 Beijing Tokyo Seoul Pyongyang Hanoi S2 Xi’an Tokyo Gyeongju Kaesong Ho Chi Minh City S3 Shanghai Tokyo Gyeongju Kaesong Ho Chi Minh City S4 Xi’an Tokyo Gyeongju Pyongyang Ho Chi Minh City JUNE 16, 2016
  • 42.
     Are sourcesS2 and S3 copying? – Very likely because sharing of many false values without copying is very unlikely. Copy Detection 45 Source China Japan S Korea N Korea Vietnam S0 Beijing Tokyo Seoul Hanoi S1 Beijing Tokyo Seoul Pyongyang Hanoi S2 Xi’an Tokyo Gyeongju Kaesong Ho Chi Minh City S3 Shanghai Tokyo Gyeongju Kaesong Ho Chi Minh City S4 Xi’an Tokyo Gyeongju Pyongyang Ho Chi Minh City JUNE 16, 2016
  • 43.
    Copy Detection: BayesianAnalysis  Goal: Pr(S1S2| Ф), Pr(S1S2| Ф) (sum = 1)  According to Bayes Rule, we need Pr(Ф|S1S2), Pr(Ф|S1S2)  Key: compute Pr(ФD|S1S2), Pr(ФD|S1S2), for each D  S1  S2 46 Different Values Od TRUE Ot S1  S2 FALSE Of Same Values JUNE 16, 2016
  • 44.
    Copy Detection: IterativeProcess 47 Truth Discovery Accuracy Computation Copy Detection Step 1Step 3 Step 2 JUNE 16, 2016
  • 45.
    Scaling Up CopyDetection: Opportunities  Pairwise copy detection has time complexity O(|S|2.|D|.L)  |S| is the number of sources  |D| is the number of data items  L is the number of iterations  Many opportunities for scaling up copy detection  Avoid comparing source pairs that are likely to be independent (e.g., share no values or very few true values)  Compare few values for source pairs before making decision  Perform few comparisons in latter iterations of copy detection 48 JUNE 16, 2016
  • 46.
     Avoid comparingsource pairs that are likely to be independent – Out of 45 source pairs, 18 source pairs do not share any value – 1 source pair shares only two true values Scaling Up Copy Detection: Opportunity 1 49 Source Accu China Japan S Korea N Korea Vietnam S0 0.99 Beijing Tokyo Seoul Hanoi S1 0.99 Beijing Tokyo Seoul Pyongyang Hanoi S2 0.2 Xi’an Tokyo Gyeongju Kaesong Ho Chi Minh City S3 0.2 Shanghai Tokyo Gyeongju Kaesong Ho Chi Minh City S4 0.4 Xi’an Tokyo Gyeongju Pyongyang Ho Chi Minh City S5 0.6 Beijing Kyoto Busan Pyongyang Hanoi S6 0.01 Nanjing Kyoto Wonsan Da Nang S7 0.25 Nanjing Seoul Wonsan Da Nang S8 0.2 Nanjing Nara Seoul Wonsan Da Nang S9 0.99 Beijing Seoul Pyongyang JUNE 16, 2016
  • 47.
     Each invertedindex entry corresponds to Data Item.Value – Includes provider sources (at least 2) – Includes probability of the value being true – Includes maximum contribution score to decision of copying Solution: Build and Use Inverted Index 50 Data Item.Value Pr Max Score Sources Japan.Kyoto 0.02 4.59 S5, S6 S Korea.Gyeongju 0.01 4.12 S2, S3, S4 … … … … Vietnam.Hanoi 0.94 0.43 S0, S1, S5 China.Beijing 0.96 0.43 S0, S1, S5, S9 JUNE 16, 2016
  • 48.
     Algorithm Indexprocesses entries in decreasing max score order – Accumulates scores for each source pair encountered in entry – Adjusts scores for source pairs with different values for a data item – Ignores source pairs that occur only in low max score entries Solution: Build and Use Inverted Index 51 Data Item.Value Pr Max Score Sources Japan.Kyoto 0.02 4.59 S5, S6 S Korea.Gyeongju 0.01 4.12 S2, S3, S4 … … … … Vietnam.Hanoi 0.94 0.43 S0, S1, S5 China.Beijing 0.96 0.43 S0, S1, S5, S9 JUNE 16, 2016
  • 49.
Scaling Up Copy Detection: Opportunity 2
 Compare only a few values for a source pair before making a decision
 – Out of 4 shared values, 3 are false
 – Copying can be inferred after observing only 2 false values
 (Same 10-source table as in Opportunity 1.)
Solution: Make Early Decisions
 Algorithm Bound processes entries in decreasing max-score order
 – Many high-score entries containing the source pair → early copying decision
 – Many high-score entries containing neither source, plus many entries containing only one of the sources → early no-copying decision
 (Same inverted index table as above.)
Scaling Up Copy Detection: Opportunity 3
 Perform fewer comparisons in later iterations of copy detection
 – The likelihood for Japan.Tokyo is high in round 1 and changes little in round 2
 – Terminate with a no-copying decision early
 (Same 10-source table as in Opportunity 1.)
Solution: Incremental Copy Detection
 Algorithm Incremental processes index entries iteratively
 – Copy detection depends on value probabilities and source accuracies
 – Updates scores on entries with big changes between iterations i−2 and i−1
 – Considers small-change entries only for source pairs whose score changes might flip the copying decision
 (Same inverted index table as above.)
Summary of Experimental Results
 Four data sets: Book-CS, Stock-1day, Book-full, Stock-2wk
 – Ranging from few to many sources, data items, distinct values, and index entries
 Validates the efficiency of the proposed algorithms using the inverted index
 – Algorithm Index improves efficiency by 10–100x over Pairwise
 – Early decisions + Incremental further improve efficiency by 10x
 – Significant reduction in the run times of later iterations of Incremental
 Validates the effectiveness of the proposed algorithms
 – Early decisions + Incremental obtain results very similar to Pairwise
Fact Statement Truthfulness Analysis
 This part of the talk is based on the following works:
 – Xian Li, Weiyi Meng, Clement Yu. T-verifier: Verifying Truthfulness of Fact Statements. ICDE, 2011.
 – Xian Li, Weiyi Meng, Clement Yu. Truthfulness Analysis of Fact Statements Using the Web. IEEE Data Engineering Bulletin, 34(3), September 2011.
 – Xian Li, Weiyi Meng, Clement Yu, Haixun Wang. Verification of Fact Statements with Multiple Truthful Alternatives. Submitted.
 – Liang Wang, Weiyi Meng, Wenzhu Tong, Zhiyong Peng. Resolving Seemingly Conflicting Fact Statements Caused by Missing Terms. Submitted.
Untruthful Information All Around the Web
 Untruths spread easily on the Web.
 Conflicting information leaves people confused.
Untruthful Information Is Bad
 It can mislead people, especially young students, with incorrect knowledge.
 It can make people lose confidence in the quality of information on the Web.
 It can make the stock market more volatile.
 It can cause a politician to lose an election.
 ……
 We need tools and technologies to help verify information truthfulness!
Fact Statements vs. Opinionated Statements
 Fact statement: a statement that attempts to state a “fact”.
 – Barack Obama is a Christian.
 – China has 35 provinces.
 Opinionated statement: a statement that expresses an opinion.
 – Michelle Obama is beautiful.
 – Stinky tofu is delicious.
 We focus on fact statements in this talk.
Problem Statement
 Given a fact statement,
 – determine whether or not the statement is truthful, and if it is not,
 – identify a truthful statement most relevant to the given statement,
 using information on the Web.
Restricted Fact Statements
 Consider only fact statements (which will be called doubtful statements) with a single specified doubt unit (denoted in [ ]) and a single truthful answer.
 – Example: China has [35] provinces.
 doubtful statement = doubt unit + topic units
 – Example: For “China has [35] provinces”, doubt unit = “35”, topic units = {“China”, “provinces”}
Overview of the T-verifier Method
 Two phases: alternative statement generation, then statement truthfulness verification.
 Phase 1 – Alternative statement generation:
 – Generate a query from the input doubtful statement S (the topic units of S) and submit it to a search engine to collect information for analysis.
 – Extract features from the retrieved results and use them to rank alternative units (alter-units) for the doubt unit.
 – Form alternative statements based on the highly ranked alter-units.
 Phase 2 – Statement truthfulness verification:
 – Submit each alternative statement as a query to collect information for analysis.
 – Extract features from the search results and rank the alternative statements based on the extracted features.
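The two phases can be read as the following skeleton. The function names and parameters (web_search, rank_alter_units, the basic rankers, the result counts) are placeholders standing in for the components described on the surrounding slides, not an actual API.

```python
def t_verifier(doubtful_statement, doubt_unit, web_search, rank_alter_units,
               basic_rankers, merge_ranks, k=5, n_results=200):
    """Skeleton of the two-phase T-verifier pipeline (illustrative only)."""
    # Phase 1: alternative statement generation.
    topic_units = [t for t in doubtful_statement.split() if t != doubt_unit]
    srrs = web_search(" ".join(topic_units), n_results)        # search result records
    alter_units = rank_alter_units(srrs, doubt_unit, topic_units)[:k]
    alternatives = [doubtful_statement.replace(doubt_unit, au) for au in alter_units]

    # Phase 2: statement truthfulness verification.
    evidence = {a: web_search(a, 100) for a in alternatives}   # fresh SRRs per statement
    rank_lists = [ranker(alternatives, evidence) for ranker in basic_rankers]
    merged = merge_ranks(rank_lists)
    return merged[0]                                           # top-ranked statement = truthful
```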
Alternative Statement Generation (1)
 Basic requirements for each alternative statement AS:
 – Same topic: AS should cover the same topic as the doubtful statement (DS).
 – Different value: AS should differ from the DS on the doubt unit. The term(s) in place of the doubt unit are called the alternative unit, or alter-unit.
 – Term sense/type closeness: each alter-unit should be close to the doubt unit in both data type and word sense. Example: “Christian” is closer to “Muslim” than to “President”, because “Christian” and “Muslim” both denote religious believers.
Alternative Statement Generation (2)
 We convert the problem of generating alternative statements into the problem of finding alternative units (alter-units), which are then used to generate the alternative statements.
 Two observations:
 – Relevant alter-units frequently co-occur with the topic units.
 – Relevant alter-units often co-occur with the doubt unit:
    when people have doubts about a fact, they often mention other possible answers;
    when people dispute a controversial point or a common misconception, they often mention their own point or the truthful fact.
Alternative Statement Generation (3)
 Notations:
 – Q: query = topic units = doubtful statement − doubt unit
 – D = {r1, r2, …, rN}: the set of top-N SRRs (search result records) retrieved by Q
 – Cont(r, T) = 1 if r contains T, and 0 otherwise
 Each term/phrase T in D is a candidate alter-unit, and a ranking score is computed for each T.
 Seven features are used to rank T.
Alternative Statement Generation (4)
 Feature 1 (Data type matching, DM):
 – Any relevant alter-unit should be of the same data type as the doubt unit.
 – Several data types are considered in T-verifier: date, time, telephone number, email address, person name, place name (e.g., names of states, cities, and attractions), and number. All others are treated as common strings.
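A toy illustration of the data-type matching filter, covering only a few of the listed types with simple regular expressions; T-verifier's actual type detector also recognizes person names, place names, and other types.

```python
import re

TYPE_PATTERNS = {
    "number": re.compile(r"^\d+([.,]\d+)?$"),
    "date": re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}$|^\d{4}$"),
    "email": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$"),
    "phone": re.compile(r"^\+?[\d\s()-]{7,}$"),
}

def data_type(term):
    """Return the first matching type, or 'string' as the catch-all."""
    for name, pattern in TYPE_PATTERNS.items():
        if pattern.match(term):
            return name
    return "string"

def same_type(candidate, doubt_unit):
    """Feature DM: keep a candidate alter-unit only if its type matches the doubt unit's."""
    return data_type(candidate) == data_type(doubt_unit)

print(same_type("34", "35"))          # True: both numbers
print(same_type("Taiwan", "35"))      # False: string vs. number
```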
Alternative Statement Generation (5)
 Feature 2 (Sense closeness, SC):
 – Any relevant alter-unit should have a meaning (sense) related to that of the doubt unit (e.g., “Christian” is closer in meaning to “Muslim” than to “president”).
 – WordNet is used to capture the sense closeness between two different terms:

     SC(T, DU) = α,                      if hyper(T, DU) = true
                 β,                      if sibling(T, DU) = true
                 wup_similarity(T, DU),  otherwise

   where α and β are parameters and wup_similarity(T, DU) is the Wu-Palmer similarity between T and DU, based on their distance in the WordNet hypernym tree.
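A minimal sketch of the sense-closeness feature using NLTK's WordNet interface (assuming the WordNet corpus is installed). Taking only the first sense of each word and the particular α and β values are simplifying assumptions.

```python
from nltk.corpus import wordnet as wn

ALPHA, BETA = 1.0, 0.9   # assumed scores for hypernym/hyponym and sibling relations

def sense_closeness(term, doubt_unit):
    """Feature SC: closeness of word sense between a candidate alter-unit and the doubt unit."""
    t_syns, d_syns = wn.synsets(term), wn.synsets(doubt_unit)
    if not t_syns or not d_syns:
        return 0.0
    t, d = t_syns[0], d_syns[0]                     # naive: first sense only
    if d in t.hypernyms() or t in d.hypernyms():
        return ALPHA                                # direct hypernym/hyponym relation
    if set(t.hypernyms()) & set(d.hypernyms()):
        return BETA                                 # siblings: shared direct hypernym
    return t.wup_similarity(d) or 0.0               # Wu-Palmer similarity otherwise

print(sense_closeness("Christian", "Muslim"))       # expected to be higher ...
print(sense_closeness("president", "Muslim"))       # ... than this
```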
Alternative Statement Generation (6)
 Feature 3 (Term local correlation, TLC):
 – The co-occurrence coefficient of T with the doubt unit: terms that co-occur more frequently with the doubt unit are more likely to be good alter-units.
Alternative Statement Generation (7)
 Four textual features.
 Feature 4 (Result coverage, RC): the percentage of SRRs in D that contain T – if T appears in a higher percentage of the SRRs, it is more likely to be a good alter-unit.

     RC(T) = ( Σ_{i=1..N} Cont(r_i, T) ) / N
Alternative Statement Generation (8)
 Textual features (continued).
 Feature 5 (Result Query Relevance, RQR): the relevance (with respect to Q) of the SRRs that contain T – if T appears in more relevant SRRs, T is more likely to be a good alter-unit.
 – An SRR that contains more topic units is considered to have a higher degree of relevance to the topic query Q.

     RQR(Q, T) = ( Σ_{i=1..N} Cont(r_i, T) · |r_i ∩ Q| / len(Q) ) / ( Σ_{i=1..N} Cont(r_i, T) )
Alternative Statement Generation (9)
 Textual features (continued).
 Feature 6 (SRR ranking, Rrank): the ranks of the SRRs that contain T – if T appears in higher-ranked SRRs, it is more likely to be a good alter-unit.

     Rrank(T) = ( Σ_{i=1..N} Cont(r_i, T) · (1 − pos(r_i)/N) ) / ( Σ_{i=1..N} (1 − pos(r_i)/N) )

 where pos(r) is the position of r in the search engine’s ranking list and N is the number of SRRs considered.
Alternative Statement Generation (10)
 Feature 7 (Term distance, TD): the size of the smallest window of consecutive words in each SRR (title or snippet) that covers all the topic units contained in the SRR as well as the term T – the smaller the window, the more likely T is a good alter-unit.
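Features 4-7 can be computed directly from the retrieved SRRs. The sketch below follows the formulas above, with each SRR represented simply as a list of lowercase words and SRR positions taken 0-based; both are implementation assumptions.

```python
def contains(srr, term):
    """Cont(r, T): 1 if the SRR contains the term, 0 otherwise."""
    return 1 if term in srr else 0

def result_coverage(srrs, term):                       # Feature 4: RC
    return sum(contains(r, term) for r in srrs) / len(srrs)

def result_query_relevance(srrs, term, topic_units):   # Feature 5: RQR
    num = sum(contains(r, term) * len(set(r) & set(topic_units)) / len(topic_units)
              for r in srrs)
    den = sum(contains(r, term) for r in srrs)
    return num / den if den else 0.0

def srr_ranking(srrs, term):                           # Feature 6: Rrank
    n = len(srrs)                                      # pos(r) taken 0-based here
    num = sum(contains(r, term) * (1 - i / n) for i, r in enumerate(srrs))
    den = sum(1 - i / n for i in range(n))
    return num / den

def term_distance(srr, term, topic_units):             # Feature 7: TD (per SRR)
    """Smallest window in one SRR covering the term and all topic units present in it."""
    if term not in srr:
        return None
    targets = {t for t in topic_units if t in srr} | {term}
    best = len(srr)
    for start in range(len(srr)):                      # brute-force sliding window
        seen = set()
        for end in range(start, len(srr)):
            if srr[end] in targets:
                seen.add(srr[end])
            if seen == targets:
                best = min(best, end - start + 1)
                break
    return best
```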
Alternative Statement Generation (11)
 Alter-unit generation algorithm:
 – Step 1: Filter the candidate terms by data type matching (DM); only terms that match the data type of the doubt unit are considered further.
 – Step 2: Rank each remaining candidate term based on the other six features. The ranking score of term T is computed as

     aurs(T) = w1·SC + w2·TLC + w3·RC + w4·RQR + w5·Rrank + w6·TD

   where wi, i = 1, …, 6, are the feature weights. The optimal weights are determined by a learning algorithm.
Alternative Statement Generation (12)
 Select the top-k ranked alter-units, for some k determined by experiments on training data.
 Replace the doubt unit with each of the top-k alter-units to generate k alternative statements.
Statement Truthfulness Verification (1)
 Goal: determine which of the alternative statements is truthful.
 – Step 1: Send each of the top-ranked alternative statements as a query to a search engine and collect relevant SRRs.
 – Step 2: Employ a number of basic rankers; each basic ranker generates a ranking of the alternative statements based on the newly collected SRRs.
 – Step 3: Use a rank merging algorithm to merge the rank lists into a combined final list, and select the top-ranked statement as the truthful statement.
Statement Truthfulness Verification (2)
 Basic ranker 1: Alter-Unit Ranker (AUR)
 – Rank the alternative statements in the same order as their corresponding alter-units, as produced by the alternative statement generation phase.
 – This ranker reuses the information from the alternative statement generation phase.
 Basic ranker 2: Hits Ranker (HR)
 – Submit each alternative statement to a search engine and rank the alternative statements by the number of hits they receive.
 – This ranker implicitly assumes that each hit provides evidence that the corresponding alternative statement is correct.
Statement Truthfulness Verification (3)
 Basic rankers 3-6: Text Feature Rankers (TFR)
 – TFR(RC): rank alternative statements in descending order of the Result Coverage (RC) values of their alter-units.
 – TFR(RQR): rank alternative statements in descending order of the Result Query Relevance (RQR) values of their alter-units.
 – TFR(Rrank): rank alternative statements in descending order of the SRR ranking (Rrank) values of their alter-units.
 – TFR(TD): rank alternative statements in descending order of the Term Distance (TD) values of their alter-units.
 Note: a new set of SRRs, different from the set used in the alternative statement generation phase, is used for each alternative statement in the above computations.
Statement Truthfulness Verification (4)
 Basic ranker 7: Domain Authority Ranker (DAR)
 – Researchers have observed that web pages published under certain domains, such as “.gov” and “.edu”, are more likely to be truthful.
 – This ranker ranks alternative statements based on the percentage of results retrieved from each domain and on a correctness weight for each domain learned from training data.
Statement Truthfulness Verification (5)
 Rank merging:
 – Each basic ranker produces a ranked list of the same set of alternative statements obtained for the same doubtful statement.
 – Rank merging combines the seven ranked lists into a single ranked list of these alternative statements.
 (Diagram: ranked lists 1-7 from the basic rankers feed into rank merging, which produces the merged ranked list.)
Statement Truthfulness Verification (6)
 Seven rank merging algorithms are evaluated.
 Probability Combination:
 – Let Pi be the probability that the i-th basic ranker predicts truthfulness correctly. Then the overall probability that the truthfulness of a statement S is correctly determined (i.e., the probability that at least one of the basic rankers is correct in predicting the truthfulness of S) can be estimated by assuming the basic rankers are independent.
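Under the stated independence assumption, this estimate is one minus the product of the individual failure probabilities, as in the following one-line sketch:

```python
from functools import reduce

def prob_at_least_one_correct(precisions):
    """1 - prod(1 - Pi): chance that at least one independent basic ranker is right."""
    return 1 - reduce(lambda acc, p: acc * (1 - p), precisions, 1.0)

print(prob_at_least_one_correct([0.62, 0.66, 0.60]))   # about 0.95
```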
Statement Truthfulness Verification (7)
 Basic Borda Count:
 – Each alternative statement Si is treated as a candidate in an election, and each basic ranker BRj is treated as a voter.
 – BRj assigns n − k + 1 points to Si if Si is ranked k-th by BRj, where n is the number of alternative statements being ranked.
 – The total points of Si is the sum of the points assigned to Si by all basic rankers.
 – Rank all alternative statements in descending order of their total points.
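A small sketch of Basic Borda Count; each input list is assumed to rank the same set of alternative statements, best first.

```python
from collections import defaultdict

def basic_borda(rank_lists):
    """Merge ranked lists: a statement ranked k-th in a list of n statements gets n - k + 1 points."""
    points = defaultdict(int)
    for ranking in rank_lists:
        n = len(ranking)
        for k, statement in enumerate(ranking, start=1):
            points[statement] += n - k + 1
    return sorted(points, key=points.get, reverse=True)

# Three hypothetical basic rankers over three alternative statements:
print(basic_borda([["S1", "S2", "S3"], ["S1", "S3", "S2"], ["S2", "S1", "S3"]]))
# -> ['S1', 'S2', 'S3'] (S1: 8 points, S2: 6, S3: 4)
```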
Statement Truthfulness Verification (8)
 Basic Condorcet algorithm:
 – Define a comparison function between any two alternative statements Si and Sj: if Si is ranked higher than Sj by more basic rankers, then Si > Sj (i.e., Si is ranked higher than Sj in the merged list); else if Sj is ranked higher than Si by more basic rankers, then Sj > Si; otherwise Si = Sj.
 – Sort all alternative statements based on this comparison function and output the sorted list as the merged result.
 – Example: Consider three local results R1 = (S3, S1, S2), R2 = (S1, S2, S3), and R3 = (S2, S3, S1). Here S1 beats S2, S2 beats S3, but S3 beats S1. When such a cycle is formed, all statements in the cycle are considered equivalent.
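A sketch of the Condorcet-style merge. To keep the example short it orders statements by their number of pairwise victories, which treats the members of a cycle as roughly equivalent; this is a simplification of the comparison-function sort described above.

```python
from itertools import combinations
from collections import defaultdict

def basic_condorcet(rank_lists):
    """Order statements by pairwise victories across the basic rankers.
    All lists are assumed to rank the same statements, best first."""
    statements = rank_lists[0]
    wins = defaultdict(int)
    for a, b in combinations(statements, 2):
        a_better = sum(r.index(a) < r.index(b) for r in rank_lists)
        b_better = len(rank_lists) - a_better
        if a_better > b_better:
            wins[a] += 1
        elif b_better > a_better:
            wins[b] += 1
    return sorted(statements, key=lambda s: wins[s], reverse=True)

# The cyclic example from the slide: S1 beats S2, S2 beats S3, S3 beats S1, so all tie.
print(basic_condorcet([["S3", "S1", "S2"], ["S1", "S2", "S3"], ["S2", "S3", "S1"]]))
```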
Statement Truthfulness Verification (9)
 Weighted Borda Count:
 – The same as Basic Borda Count, except that each basic ranker BRi is assigned a weight wi reflecting its quality; for example, wi could be the precision of BRi on a training set.
 – The points assigned by BRi are multiplied by wi.
 – The remaining steps are the same as in Basic Borda Count.
Statement Truthfulness Verification (10)
 Weighted Condorcet algorithm:
 – The same as the Basic Condorcet algorithm, except that the comparison function for any two alternative statements Si and Sj is modified:
    if the total weight of the basic rankers that rank Si higher than Sj exceeds the total weight of those that rank Sj higher than Si, then Si > Sj;
    else if the total weight of the basic rankers that rank Sj higher than Si exceeds the total weight of those that rank Si higher than Sj, then Sj > Si;
    else Si = Sj.
Statement Truthfulness Verification (11)
 Positional Borda Count:
 – Position probability: let P(i, j) denote the probability that an alternative statement ranked at the j-th position by the i-th basic ranker BRi is truthful. P(i, j) can be obtained by training.
 – P(i, j) is interpreted as the ranking score given by BRi to the alternative statement it ranks at the j-th position.
 – Rank all alternative statements in descending order of the sum of the scores (position probabilities) they receive.
Statement Truthfulness Verification (12)
 Weighted Positional Borda Count:
 – Let wi be the weight of the i-th basic ranker BRi.
 – Let P(i, j) be the probability that an alternative statement ranked at the j-th position by BRi is truthful.
 – The final ranking score given by BRi to the alternative statement it ranks at the j-th position is wi · P(i, j).
 – Rank all alternative statements in descending order of the sum of their final ranking scores.
Experiment (1)
 Dataset: 50 doubtful statements compiled from factoid questions in the Q&A tracks of TREC-8 and TREC-9. See http://cs.binghamton.edu/~xianli/doubtful_statements.htm.
 For each doubtful statement, the correct answer is manually verified.
 For each doubtful statement, form a topic query by removing the doubt unit, submit it to the Yahoo! search engine, and obtain the top 200 SRRs.
 Retrieve 100 results for each alternative statement.

   Doubtful statement                                     | Doubt unit      | Truth
   Antarctic is the only continent without a desert.      | Antarctic       | Europe
   George C. Scott won the Oscar for best actor in 1970.  | George C. Scott | George C. Scott
Experiment (2)
 Alter-unit generation algorithm evaluation
 – Test 1: randomly select 25 doubtful statements as the training set and use the remaining 25 as the testing set.

   Truthful alter-unit ranked | Training (25) | Testing (25)
   1st                        | 17            | 14
   2nd                        | 7             | 9
   3rd                        | 0             | 1
   4th                        | 1             | 1
   5th                        | 0             | 0

 All truthful alter-units are ranked among the top 5 results.
Experiment (3)
 Impact of using different numbers of top SRRs

   Truthful alter-unit ranked | Top 10 | Top 50 | Top 100 | Top 150 | Top 200
   1st                        | 12     | 21     | 29      | 31      | 31
   2nd                        | 6      | 12     | 10      | 12      | 16
   3rd                        | 5      | 3      | 3       | 3       | 1
   4th                        | 3      | 1      | 2       | 2       | 2
   5th                        | 2      | 0      | 1       | 1       | 0
   Not among top 5            | 22     | 13     | 5       | 0       | 0
Experiment (4)
 Precision of each basic ranker
 – precision = n / N, where n is the number of truthful statements ranked at the top and N is the number of doubtful statements evaluated.
 The precisions of the seven basic rankers:

   Ranker    | AUR  | TFR(TD) | TFR(RC) | TFR(RQR) | TFR(Rrank) | HR   | DAR
   Precision | 0.62 | 0.32    | 0.66    | 0.60     | 0.62       | 0.20 | 0.20
Experiment (5)
 Truthfulness verification evaluation
 – The precisions of the different merging algorithms (based on 10-fold cross validation).
 (Bar chart comparing the precisions of BaseBorda, BaseCond, ProbComb, PosBorda, WBorda, WCond, and WPosBorda; y-axis from 0 to 1.)
Experiment (6)
 Analysis of the five failed cases
 – Cases 4 and 5 could be considered correct.

   # | Untruthful statement verified as truthful                        | Truthful
   1 | Tom Hanks was lead actress in the movie 'Sleepless in Seattle'.  | Meg Ryan
   2 | Apollo is the first spacecraft on the moon.                      | Luna 2
   3 | Sullivan is the fastest swimmer in the world.                    | Michael Phelps
   4 | Les Paul invented the electric guitar.                           | Rickenbacker
   5 | English is the primary language of the Philippines.              | Filipino
Relationship with QA
 General fact statements and questions in QA systems are different:
 – Not all fact statements can be converted to equivalent questions, e.g., fact statements whose doubt units are verbs.
 – The doubt unit in a fact statement conveys more information than the WH-word in a question.
 – A fact statement may have multiple doubt units, but a question can imply only one doubt unit.
How does T-verifier compare with two popular Web QA systems?

   Metric              | T-verifier | Answers.com | Yahoo! Answers
   Total statements    | 50         | 50          | 50
   Wrong results       | 5          | 4           | 12
   Cannot find results | 0          | 6           | 18
   Correct results     | 45         | 40          | 20

 Example statement for which Answers.com cannot find a result:
 – [800] people died when the Estonia sank in 1994.
 Example statement for which Answers.com gives a wrong result:
 – [20] hexagons are on a soccer ball. Correct answer: 20; Answers.com gives: 32.
Doubtful Statements with Multiple Truthful Alternatives
 This type of statement will be called an MTA statement.
 Examples of MTA statements:

   Type | Doubtful statement                                                | Truthful alternatives
   CC   | Barack Obama was born in [Kenya].                                 | Honolulu, Hawaii, United States
   MVA  | [Edwin Krebs] won the Nobel Prize in medicine in 1992.            | Edwin Krebs, Edmond Fischer
   SFE  | [Peter Ustinov] has portrayed Hercule Poirot.                     | Peter Ustinov, David Suchet, Agatha Christie
   DSC  | [U.S.] won team title at the 2011 World Gymnastics Championships. | U.S., China
   TS   | [Bob Dole] served as the President of the United States.         | Barack Obama since Jan 2009, George Bush from Jan 2001 to Jan 2009, …
MTA Statements Processing
 Objective: identify all truthful alternatives for each MTA statement.
 Significantly more challenging than processing STA statements (doubtful statements with a single truthful alternative).
 The step of generating and ranking alter-units can remain the same.
 The difference is how to determine all the truthful alter-units from the ranked candidate alter-units.
Top-k Approach (1)
 Assumption: truthful alter-units are likely to be ranked above untruthful ones.
 – In other words, the ranking scores of the truthful alter-units are likely to be higher than those of untruthful ones.
 Top-k approach: determine an appropriate integer k so that the top-k ranked alter-units are recognized as the truthful ones.
 – The problem becomes how to determine the right k for each MTA statement.
 – This approach is a straightforward extension of the solution for processing STA statements, which is a top-1 solution.
Top-k Approach (2)
 Three different ways to select k (see the sketch below):
 – Largest Score Gap (LSG): compare the gaps between the scores of each pair of consecutively ranked alternative statements and choose the largest gap as the cut-off point.
 – Largest Percentage Gap (LPG): similar to LSG, except that the percentage gap of scores is used. The percentage gap between the scores Si and Si+1 of two consecutively ranked alternative statements is (Si − Si+1) / Si.
 – First Significant Gap (FSG): uses the first significant score gap as the cut-off point. A score gap is considered significant if logb(Si / Si+1) > 1, and k is set to the smallest (i.e., the first) i satisfying this condition.
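A sketch of the three cut-off rules applied to a descending list of alter-unit ranking scores; the log base b for FSG and the example scores are assumptions for illustration.

```python
import math

def cutoff_lsg(scores):
    """Largest Score Gap: cut after the position with the largest absolute gap."""
    gaps = [scores[i] - scores[i + 1] for i in range(len(scores) - 1)]
    return gaps.index(max(gaps)) + 1

def cutoff_lpg(scores):
    """Largest Percentage Gap: gap measured as (Si - Si+1) / Si."""
    gaps = [(scores[i] - scores[i + 1]) / scores[i] for i in range(len(scores) - 1)]
    return gaps.index(max(gaps)) + 1

def cutoff_fsg(scores, b=2):
    """First Significant Gap: first i with log_b(Si / Si+1) > 1."""
    for i in range(len(scores) - 1):
        if math.log(scores[i] / scores[i + 1], b) > 1:
            return i + 1
    return len(scores)                     # no significant gap: keep all alter-units

scores = [0.92, 0.88, 0.35, 0.10, 0.02]    # hypothetical descending alter-unit scores
print(cutoff_lsg(scores), cutoff_lpg(scores), cutoff_fsg(scores))   # 2 4 2
```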
Top-k Approach (3)
 Compare the three methods for selecting k
 – 50 MTA statements are used.
 – There are 143 truthful alternative statements in total for the 50 MTA statements, about 2.8 per MTA statement on average.

   Method | # total selected | # truthful alternatives | Precision | Recall | F-score
   LSG    | 140              | 107                     | 0.76      | 0.74   | 0.75
   LPG    | 182              | 125                     | 0.68      | 0.86   | 0.76
   FSG    | 178              | 128                     | 0.71      | 0.88   | 0.79

 Conclusions:
 – The top-k approach is not sufficiently good.
 – A new approach is needed: divide MTA statements into different types and develop a solution for each type.
Different Types of MTA Statements (1)
 Five types of MTA statements have been identified.
 Type 1: Compatible Concepts (CC). For each MTA statement of this type, the truthful alter-units are compatible with one another. Usually, these alter-units are either equivalent to each other (i.e., synonyms), or correspond to the same basic concept with different specificity/generality (i.e., hyponyms/hypernyms), or have different granularity (i.e., one is a part of another).
 – Example 1: For “Barack Obama was born in [Honolulu]”, truthful alternatives include “Honolulu”, “Hawaii”, “United States”, etc.
 – Example 2: For “Queen Elizabeth II resided in [United Kingdom]”, correct alter-units include “United Kingdom”, “England”, and “Great Britain”.
Different Types of MTA Statements (2)
 Type 2: Multi-Valued Attributes (MVA). For each MTA statement of this type, the truthful alter-units correspond to different values of a multi-valued attribute in a relational database; a multi-valued attribute may have multiple values for a given entity (record).
 – Example: For “[Edwin Krebs] won the Nobel Prize in medicine in 1992”, the two US biochemists Edwin Krebs and Edmond Fischer shared the 1992 Nobel Prize in medicine (they are values of the multi-valued attribute “Recipients” of a Nobel Prize record); therefore both are truthful alter-units.
Different Types of MTA Statements (3)
 Type 3: Shared-Feature Entities (SFE). Many entities may share a common feature; when trying to find entities that have this feature, multiple entities may be detected.
 – Example: For “[Peter Ustinov] has portrayed Hercule Poirot”, all actors who have portrayed Hercule Poirot in different movies (including remakes) are truthful alter-units. The truthful alter-units include Peter Ustinov, David Suchet, and Agatha Christie.
 Difference between the SFE type and the MVA type: for the former, multiple entities share a common feature, while for the latter, the same entity has multiple values for an attribute.
 – In practice, values of the same multi-valued attribute are much more likely to co-occur than entities that merely share the same feature.
Different Types of MTA Statements (4)
 Type 4: Different Sub-Categories (DSC). The truthful alter-units correspond to different sub-categories related to some of the topic units of the doubtful statement. A topic unit of a doubtful statement DS is a term/phrase in DS other than the doubt unit. When a topic unit is replaced by different sub-categories, the doubt unit may have different truthful alter-units, making the doubtful statement an MTA statement.
 – Example: For “[U.S.] won team title at the 2011 World Gymnastics Championships”, the topic unit “World” could be replaced by “US”, “Europe”, etc., and “team” could be replaced by “men’s team” or “women’s team”. Each replacement can lead to a different truthful alter-unit.
Different Types of MTA Statements (5)
 Type 5: Time-Sensitive (TS). A fact statement is time-sensitive if its truthfulness changes over time, either with a fairly regular pattern or in a way that is expected even without such regularity. Since there may be different truthful alter-units at different times, such statements are MTA statements.
 – Example of a fairly regular pattern: US presidents (who serve 4 or 8 years).
 – Example with less regularity: “A new world 100-meter track record was established by [Usain Bolt]”, because it is generally not predictable who will set a new record in the future.
Processing CC & MVA Statements
 We recently completed a paper on processing doubtful statements of the CC and MVA types.
 – For statements of the CC type, we primarily explored semantic relationships (such as synonymy and hypernymy) among alter-units.
 – For statements of the MVA type, we primarily explored co-occurrence-based correlation relationships among alter-units.
Resolving Seemingly Conflicting Fact Statements Caused by Missing Terms
 Sometimes, seemingly conflicting fact statements (SCFSs) do not represent actual conflicts.
 Consider the following two statements:
 – S1: China won the team title at the 2011 World Gymnastics Championships in Tokyo.
 – S2: USA won the team title at the 2011 World Gymnastics Championships in Tokyo.
Resolving Seemingly Conflicting Fact Statements Caused by Missing Terms
 The seeming conflict is caused by missing terms that make the statements imprecise.
 With the missing terms added, we have the following precise statements:
 – S1: China won the men’s team title at the 2011 World Gymnastics Championships in Tokyo.
 – S2: USA won the women’s team title at the 2011 World Gymnastics Championships in Tokyo.
Resolving Seemingly Conflicting Fact Statements Caused by Missing Terms
 General problem: given a set of seemingly conflicting fact statements, determine whether the seeming conflict is caused by missing terms by finding those terms. We also determine the appropriate position for the missing term in each statement.
 The missing terms often create sub-categories of a category.
 – Example: “men’s team title” and “women’s team title” are sub-categories of “team title”.
 So the solution to this problem can also be used to process MTA statements of the DSC type.
Additional Research Issues (1)
 Current data fusion techniques assume that there is a single true value for each data item. How to handle situations where there may be multiple true values for some data items?
 Data inconsistency may have different causes. How to automatically identify these causes and apply appropriate methods to resolve them?
Additional Research Issues (2)
 How to automatically classify MTA statements into different types?
 – The same statement may be classified into multiple types.
 How to process the other two types of MTA statements, namely SFE and TS?
 If a statement is classified into multiple types, it may be processed using different techniques. How to combine the results of these different techniques?
Additional Research Issues (3)
 How to handle doubtful statements that have no specified doubt unit(s)? In other words, how to automatically determine the most worthy doubt unit(s) from a given fact statement?
 – Example: For the statement “More than 100,000 people were killed by the 2008 Sichuan earthquake”, which term(s) should be treated as the doubt unit(s)?
Additional Research Issues (4)
 How to process all fact statements on the Web? Challenges:
 – How to identify controversial fact statements so we can focus on these statements only?
 – How to use fact statements that have already been verified to verify or de-verify other fact statements?
Additional Research Issues (5)
 How to apply natural language processing (NLP) to determine the truthfulness of fact statements?
 Example: Consider the following sentences related to the fact statement “SUNY has 64 campuses”:
 1. I believe SUNY has 64 campuses.
 2. I strongly believe SUNY has 64 campuses.
 3. I am sure SUNY has 64 campuses.
 4. I am sure SUNY has 64 campuses because this is mentioned at the SUNY website.
Additional Research Issues (6)
 How to identify and resolve time-sensitive data inconsistency in fact statements?
 – The problem is mostly caused by the use of relative time references such as yesterday, last week, 2 years ago, …
 – Need to establish the correct reference point for each relative time reference in a data source.