1. Rule Mining and applications in Social Data
Luis Galárraga
Télécom ParisTech
Presented at:
International Workshop on Social Media and Culture 2014
Daejeon, Korea
April 4th, 2014
2. Natural Language vs Knowledge Bases (KBs)
[Figure: two columns, Natural Language and Knowledge Bases; the KB side shows facts about Shakira as a graph: Shakira is a Singer; Shakira performs "Hips Don't Lie"; Shakira born on Feb 2, 1977]
3. Natural Language vs Knowledge Bases (KBs)
Natural Language: suitable for humans, but difficult for computers.
Knowledge Bases: understandable for computer programs.
7. Social graphs are KBs
● Sources may differ, but social graphs and KBs share:
● Natural graph-like structure
[Figure: a social graph in which Luis Galárraga, Shamira, Shakira and "Hips Don't Lie" are linked by likes, friendOf and performs edges]
8. Social graphs are KBs
● Sources may differ, but social graphs and KBs share:
● Natural graph-like structure
● Incompleteness
[Figure: the social graph of slide 7]
9. Social graphs are KBs
● Sources may differ, but social graphs and KBs share:
● Natural graph-like structure
● Incompleteness
[Figure: the social graph of slide 7, with a callout: "Maybe nobody asked me? :("]
10. Social graphs are KBs
● Sources may differ, but social graphs and KBs share:
● Natural graph-like structure
● Incompleteness
● Opportunities for data description and prediction.
[Figure: the social graph of slide 7]
11. Social graphs are KBs
● Sources may differ, but social graphs and KBs share:
● Natural graph-like structure
● Incompleteness
● Opportunities for data description and prediction, e.g.:
– 90% of computer scientists like political party X.
– If you like Shakira, you are likely to buy her latest song.
[Figure: the social graph of slide 7]
12. Rule Mining and KBs
● Data mining is about finding interesting and non-obvious correlations in data.
● Correlations are rules that hold often, for example:
– You probably live in the same city as your spouse.
– If you like an artist, you like her songs.
● They can be formulated as logical rules:
isMarriedTo(x, y) ^ livesIn(x, city) => livesIn(y, city)
likes(x, artist) ^ performs(artist, song) => likes(x, song)
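To make "rules that hold often" concrete, here is a minimal Python sketch. It assumes a KB stored as a set of (subject, relation, object) triples, with variables written with a leading "?" (a convention of this sketch only). The hypothetical helper `count_rule` matches the rule body against the triples and reports how many of the facts it predicts are confirmed by the KB, out of how many it predicts; the toy KB is made up.

```python
from typing import Dict, Iterator, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object); atoms use the same layout


def is_var(term: str) -> bool:
    # Sketch convention: variables start with "?", everything else is a constant.
    return term.startswith("?")


def unify(term: str, value: str, binding: Dict[str, str]) -> bool:
    """Bind a variable to value, or check an existing binding or a constant."""
    if not is_var(term):
        return term == value
    return binding.setdefault(term, value) == value


def match(body: List[Triple], kb: Set[Triple], binding: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield every binding under which all body atoms are facts of the KB."""
    if not body:
        yield binding
        return
    s, r, o = body[0]
    for fs, fr, fo in kb:
        new = dict(binding)
        if fr == r and unify(s, fs, new) and unify(o, fo, new):
            yield from match(body[1:], kb, new)


def count_rule(body: List[Triple], head: Triple, kb: Set[Triple]) -> Tuple[int, int]:
    """Return (confirmed, predicted) for the distinct head facts the body predicts."""
    s, r, o = head
    predicted = {(b.get(s, s), r, b.get(o, o)) for b in match(body, kb, {})}
    return len(predicted & kb), len(predicted)


# Hypothetical toy KB
kb = {
    ("Ann", "isMarriedTo", "Bob"), ("Ann", "livesIn", "Paris"), ("Bob", "livesIn", "Paris"),
    ("Carl", "isMarriedTo", "Dana"), ("Carl", "livesIn", "Rome"), ("Dana", "livesIn", "Oslo"),
}
# isMarriedTo(x, y) ^ livesIn(x, city) => livesIn(y, city)
body = [("?x", "isMarriedTo", "?y"), ("?x", "livesIn", "?city")]
head = ("?y", "livesIn", "?city")
print(count_rule(body, head, kb))  # (1, 2)
```

Only the Ann/Bob couple confirms the rule; Carl and Dana live in different cities, so the rule holds for one of its two predictions. The first number corresponds to what AMIE calls the support, and the ratio of the two numbers is the standard confidence.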
13. Applications for social data
● Recommendations
[Figure: the social graph of slide 7]
likes(x, artist) ^ performs(artist, song) => likes(x, song)
15. Applications for social data
● Recommendations
[Figure: the social graph of slide 7, now with one additional likes edge]
likes(x, artist) ^ performs(artist, song) => likes(x, song)
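A recommendation is simply a prediction of the rule that is not yet in the KB. The Python sketch below applies the likes/performs rule with a hash join and keeps only new facts; the function name `recommend` and the edges are hypothetical, loosely modeled on the example graph.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def recommend(likes: Set[Tuple[str, str]], performs: Set[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Apply likes(x, artist) ^ performs(artist, song) => likes(x, song)
    and return, per person, the predicted songs that are not yet in the KB."""
    songs_by_artist: Dict[str, Set[str]] = defaultdict(set)
    for artist, song in performs:
        songs_by_artist[artist].add(song)

    recommendations: Dict[str, Set[str]] = defaultdict(set)
    for person, artist in likes:
        # .get avoids creating empty entries for liked things that are not artists
        for song in songs_by_artist.get(artist, set()):
            if (person, song) not in likes:  # only new facts are recommendations
                recommendations[person].add(song)
    return dict(recommendations)


# Hypothetical edges
likes = {("Luis Galárraga", "Shakira"), ("Shamira", "Shakira"), ("Shamira", "Hips Don't Lie")}
performs = {("Shakira", "Hips Don't Lie")}
print(recommend(likes, performs))  # {'Luis Galárraga': {"Hips Don't Lie"}}
```

Indexing `performs` by artist first turns the rule body into a lookup per liked artist, instead of a scan over all songs.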
16. Applications for social data
● Market basket analysis.
● People who buy laptops also buy laptop cases.
● Link and event prediction
● Two people who attended the same high school the same
year might know each other.
● If you registered for this workshop, then you are coming to
Daejeon (and need to book a flight and hotel).
● Dealing with incompleteness
● If you like German newspapers, fluency in German is
perhaps missing in your profile.
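For market basket analysis, the "laptops and laptop cases" correlation is usually scored by support and confidence over a list of transactions. The sketch below uses the classic definitions, with support as a fraction of all baskets (AMIE, by contrast, counts support as an absolute number of facts); `rule_stats` and the baskets are hypothetical.

```python
from typing import FrozenSet, List, Tuple


def rule_stats(transactions: List[FrozenSet[str]],
               lhs: FrozenSet[str], rhs: FrozenSet[str]) -> Tuple[float, float]:
    """Support and confidence of the association rule lhs => rhs.
    support    = fraction of transactions that contain lhs and rhs
    confidence = (transactions with lhs and rhs) / (transactions with lhs)"""
    both = sum(1 for t in transactions if lhs | rhs <= t)
    with_lhs = sum(1 for t in transactions if lhs <= t)
    support = both / len(transactions)
    confidence = both / with_lhs if with_lhs else 0.0
    return support, confidence


# Hypothetical shopping baskets
baskets = [
    frozenset({"laptop", "laptop case"}),
    frozenset({"laptop", "laptop case", "mouse"}),
    frozenset({"laptop", "mouse"}),
    frozenset({"phone"}),
]
print(rule_stats(baskets, frozenset({"laptop"}), frozenset({"laptop case"})))
# (0.5, 0.6666666666666666)
```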
18. Challenges of Rule Mining from KBs
● Scalability
● State-of-the-art approaches for rule mining cannot handle the size of current KBs.
– YAGO: 10M entities, 120M facts
– DBpedia 3.8: 24.9M entities, 1.98B facts
– Facebook Graph: 1.2B users
● Rule mining requires an exhaustive search of the data.
19. Challenges of Rule Mining from KBs
● Scalability
● State-of-the-art approaches cannot handle the size of current KBs.
– YAGO: 10M entities, 120M facts
– DBpedia 3.8: 24.9M entities, 1.98B facts
– Facebook Graph: 1.2B users
● Rule mining requires an exhaustive search of the data.
● Solution:
– Language bias (restricting the shape of the rules searched, e.g., to closed Horn rules).
– A set of pruning heuristics.
– Optimized storage implementation.
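One way to get fast lookups for rule mining is to index every fact under each ordering of subject, relation and object, so that any atom pattern starts with a dictionary lookup. The class below is a hypothetical sketch of that idea, not AMIE's actual storage layer; it trades roughly six times the memory for lookup speed.

```python
from collections import defaultdict
from itertools import permutations
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Toy in-memory store that indexes each fact under all six orderings
    of (s, r, o), so every query pattern starts with a dictionary lookup."""

    POSITIONS = ("s", "r", "o")

    def __init__(self) -> None:
        # e.g. the ("r", "o", "s") index maps relation -> object -> {subjects}
        self.indexes = {
            order: defaultdict(lambda: defaultdict(set))
            for order in permutations(self.POSITIONS)
        }

    def add(self, s: str, r: str, o: str) -> None:
        fact = {"s": s, "r": r, "o": o}
        for order, index in self.indexes.items():
            a, b, c = (fact[p] for p in order)
            index[a][b].add(c)

    def query(self, s: Optional[str] = None, r: Optional[str] = None,
              o: Optional[str] = None) -> Set[Triple]:
        """Return all facts matching the pattern; None marks an unbound position."""
        given = {"s": s, "r": r, "o": o}
        # Pick the index whose leading positions are exactly the bound ones.
        order = tuple(sorted(self.POSITIONS, key=lambda p: given[p] is None))
        index = self.indexes[order]
        first, second, third = (given[p] for p in order)
        results: Set[Triple] = set()
        for a in ([first] if first is not None else list(index)):
            level = index.get(a, {})
            for b in ([second] if second is not None else list(level)):
                for c in level.get(b, set()):
                    if third is None or c == third:
                        fact = dict(zip(order, (a, b, c)))
                        results.add((fact["s"], fact["r"], fact["o"]))
        return results


store = TripleStore()
store.add("Ann", "livesIn", "Paris")
store.add("Bob", "livesIn", "Paris")
store.add("Ann", "isMarriedTo", "Bob")
print(sorted(store.query(r="livesIn", o="Paris")))
# [('Ann', 'livesIn', 'Paris'), ('Bob', 'livesIn', 'Paris')]
```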
20. AMIE: Association Rule Mining Under Incomplete Evidence
● AMIE is a system that learns Horn rules such as:
livesIn(x, city) ^ isMarriedTo(x, y) => livesIn(y, city)
● Starting with all possible head relations r(x, y) and a minimum support threshold:
– The system explores the search space by means of carefully designed mining operators.
– The search space is restricted to closed Horn rules.
– Monotonicity of support helps prune unpromising paths.
– It relies on an optimized in-memory database.
– Confidence gain is used to prune the output.
L. Galárraga, C. Teflioudi, K. Hose, and F. M. Suchanek. AMIE: association rule mining under
incomplete evidence in ontological knowledge bases. In WWW, 2013.
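The search described above can be sketched in Python as a queue of rules that are refined one atom at a time. This is a simplified, hypothetical miner, not AMIE's code: it uses only dangling atoms (one existing and one fresh variable) and closing atoms (two existing variables), while AMIE also supports atoms with constants; it counts support by brute-force matching over a set of triples, reports every closed rule it reaches, and omits confidence and confidence-gain pruning. The names `mine` and `refine`, the "?" variable prefix and the toy KB are all assumptions of the sketch.

```python
from collections import Counter, deque
from typing import Dict, Iterator, List, Set, Tuple

Triple = Tuple[str, str, str]   # (subject, relation, object); atoms use the same layout
Rule = Tuple[Triple, ...]       # rule[0] is the head r(?x, ?y), rule[1:] is the body


def unify(term: str, value: str, b: Dict[str, str]) -> bool:
    if not term.startswith("?"):  # constant
        return term == value
    return b.setdefault(term, value) == value


def bindings(atoms: List[Triple], kb: Set[Triple], b: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield every variable binding under which all atoms are facts of the KB."""
    if not atoms:
        yield b
        return
    s, r, o = atoms[0]
    for fs, fr, fo in kb:
        nb = dict(b)
        if fr == r and unify(s, fs, nb) and unify(o, fo, nb):
            yield from bindings(atoms[1:], kb, nb)


def support(rule: Rule, kb: Set[Triple]) -> int:
    """Number of distinct (?x, ?y) pairs for which head and body both hold."""
    return len({(b["?x"], b["?y"]) for b in bindings(list(rule), kb, {})})


def var_counts(rule: Rule) -> Counter:
    return Counter(t for s, _, o in rule for t in (s, o) if t.startswith("?"))


def refine(rule: Rule, relations: List[str]) -> Iterator[Rule]:
    vs = sorted(var_counts(rule))
    fresh = f"?z{len(vs)}"
    for r in relations:
        for v in vs:
            yield rule + ((v, r, fresh),)  # dangling atom
            yield rule + ((fresh, r, v),)  # dangling atom
            for w in vs:
                if v != w and (v, r, w) not in rule:
                    yield rule + ((v, r, w),)  # closing atom


def mine(kb: Set[Triple], min_support: int, max_len: int = 3) -> List[Tuple[Rule, int]]:
    relations = sorted({r for _, r, _ in kb})
    queue = deque([(("?x", r, "?y"),) for r in relations])
    seen: Set[Tuple[Triple, frozenset]] = set()
    output: List[Tuple[Rule, int]] = []
    while queue:
        rule = queue.popleft()
        if len(rule) > 1 and all(c >= 2 for c in var_counts(rule).values()):
            output.append((rule, support(rule, kb)))  # closed rule: report it
        if len(rule) < max_len:
            for child in refine(rule, relations):
                key = (child[0], frozenset(child[1:]))
                if key in seen:
                    continue
                seen.add(key)
                # Adding atoms can only lower support, so a child below the
                # threshold is dropped together with all of its refinements.
                if support(child, kb) >= min_support:
                    queue.append(child)
    return output


def show(atom: Triple) -> str:
    return f"{atom[1]}({atom[0]}, {atom[2]})"


kb = {
    ("Ann", "isMarriedTo", "Bob"), ("Ann", "livesIn", "Paris"), ("Bob", "livesIn", "Paris"),
    ("Carl", "isMarriedTo", "Dana"), ("Carl", "livesIn", "Rome"), ("Dana", "livesIn", "Rome"),
}
for rule, supp in mine(kb, min_support=2):
    print(" ^ ".join(map(show, rule[1:])), "=>", show(rule[0]), f"[support={supp}]")
```

Deduplication here only catches rules that are identical up to atom order, not up to variable renaming; a real miner needs a canonical form for rules.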
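24. [Figure: Add dangling atom (OD). Starting from the head livesIn(y, city), OD adds an atom ?r(x, y) with a fresh variable x; candidate relations for ?r include isMarriedTo, livesIn, …. Choosing isMarriedTo gives isMarriedTo(x, y) => livesIn(y, city).]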
25. [Figure: the OD step of slide 24, followed by Add closing atom (OC): OC adds an atom ?r(x, city) between the existing variables x and city; candidate relations for ?r include livesIn, diedIn, ….]
26. [Figure: the OD and OC steps of slides 24 and 25, with livesIn chosen for the closing atom.]
27. [Figure: the OD and OC steps of slides 24 to 26.]
livesIn(x, city) ^ isMarriedTo(x, y) => livesIn(y, city)
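For the walkthrough example, the effect of each operator on support can be checked by hand. The snippet below uses a hypothetical toy KB and hard-coded joins; support is counted as the number of distinct (y, city) head pairs for which the body and the head both hold. It can only stay the same or drop as atoms are added, which is what makes pruning by a minimum support threshold safe.

```python
# Hypothetical toy KB, stored as (subject, object) pairs per relation
lives_in = {("Ann", "Paris"), ("Bob", "Paris"), ("Carl", "Rome"), ("Dana", "Oslo"), ("Eve", "Lima")}
married_to = {("Ann", "Bob"), ("Carl", "Dana"), ("Frank", "Eve")}

# Head only: livesIn(y, city)
s0 = len(lives_in)

# After OD: isMarriedTo(x, y) => livesIn(y, city); x is the new dangling variable
s1 = len({(y, city) for (y, city) in lives_in
          if any(spouse == y for (_, spouse) in married_to)})

# After OC: livesIn(x, city) ^ isMarriedTo(x, y) => livesIn(y, city)
s2 = len({(y, city) for (y, city) in lives_in
          if any(spouse == y and (x, city) in lives_in for (x, spouse) in married_to)})

print(s0, s1, s2)  # 5 3 1
```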
28. AMIE: Association Rule Mining Under Incomplete Evidence
[Figure: AMIE architecture. Inputs: an RDF KB and a minimum support threshold. Components: a concurrent mining implementation and a tailored in-memory DB.]
29. AMIE: Association Rule Mining Under Incomplete Evidence
Dataset             Facts   Runtime     Rules
YAGO2               1M      3.62 min    138
YAGO2 (const)       1M      17.76 min   18K
DBpedia (2 atoms)   6.7M    2.89 min    6.9K
AMIE finds rules in medium-size ontologies within minutes.
30. Challenges of Rule Mining on KBs
● Incompleteness
● Graph data often contains gaps.
● Open World Assumption (OWA)
● Absence of evidence is not evidence of absence.
● This makes it hard to estimate the confidence of a rule.
31. Challenges of Rule Mining on KBs
● Incompleteness
● Graph data often contains gaps.
● Open World Assumption (OWA)
● Absence of evidence is not evidence of absence.
● This makes it hard to estimate the confidence of a rule.
likes(x, Shakira) => isCitizenOf(x, Ecuador)
[Figure: Luis Galárraga and his friend Shamira both like Shakira; Luis Galárraga is a citizen of Ecuador, while Shamira has no known citizenship]
32. Challenges of Rule Mining on KBs
● Incompleteness
● Graph data often contains gaps.
● Open World Assumption (OWA)
● Absence of evidence is not evidence of absence.
● This makes it hard to estimate the confidence of a rule.
likes(x, Shakira) => isCitizenOf(x, Ecuador)
[Figure: the graph of slide 31]
Standard confidence makes the Closed World Assumption (CWA) and counts Shamira as a counterexample. Score = 0.5
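For reference, these are the support and standard confidence definitions from the AMIE paper cited on slide 20, where $\vec{B}$ is the rule body and $z_1, \dots, z_m$ are its variables other than $x$ and $y$. For this slide's rule the head has a constant object, so the counts range over $x$ only, and the denominator counts everyone who likes Shakira.

```latex
\begin{aligned}
\mathrm{supp}(\vec{B} \Rightarrow r(x,y)) &= \#(x,y) : \exists z_1,\dots,z_m : \vec{B} \wedge r(x,y) \\
\mathrm{conf}(\vec{B} \Rightarrow r(x,y)) &= \frac{\mathrm{supp}(\vec{B} \Rightarrow r(x,y))}{\#(x,y) : \exists z_1,\dots,z_m : \vec{B}} \\
\mathrm{conf}(\mathit{likes}(x,\mathrm{Shakira}) \Rightarrow \mathit{isCitizenOf}(x,\mathrm{Ecuador}))
  &= \frac{|\{\text{Luis Galárraga}\}|}{|\{\text{Luis Galárraga},\ \text{Shamira}\}|} = \frac{1}{2} = 0.5
\end{aligned}
```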
33. Challenges of Rule Mining on KBs
likes(x, Shakira) => isCitizenOf(x, Ecuador)
[Figure: the graph of slide 31]
AMIE uses the Partial Completeness Assumption (PCA) to estimate the confidence of rules under the OWA.
The PCA assumes that a KB knows either all or none of the nationalities of a person.
34. Challenges of Rule Mining on KBs
likes(x, Shakira) => isCitizenOf(x, Ecuador)
[Figure: the graph of slide 31]
AMIE uses the Partial Completeness Assumption (PCA) to estimate the confidence of rules under the OWA.
The PCA assumes that a KB knows either all or none of the nationalities of a person.
PCA confidence counts as counterexamples only those people whose nationality is known to be different from Ecuador. Score = 1.0
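Both scores can be reproduced with a short Python sketch for rules of the shape p(x, a) => q(x, b). Standard confidence divides by everyone who satisfies the body; PCA confidence divides only by those for whom the KB knows at least one value of the head relation, mirroring the PCA denominator #(x, y) : ∃ z, y' : B ∧ r(x, y') in the AMIE paper. The function `confidences` and the fact list are hypothetical, modeled on the example graph.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def confidences(kb: Set[Triple], body_rel: str, body_obj: str,
                head_rel: str, head_obj: str) -> Tuple[float, float]:
    """Standard and PCA confidence of body_rel(x, body_obj) => head_rel(x, head_obj)."""
    body = {s for s, r, o in kb if r == body_rel and o == body_obj}      # x satisfying the body
    positives = {x for x in body if (x, head_rel, head_obj) in kb}       # predictions the KB confirms
    # PCA: only x with at least one known head_rel value can be a counterexample
    known = {x for x in body if any(s == x and r == head_rel for s, r, _ in kb)}
    standard = len(positives) / len(body) if body else 0.0
    pca = len(positives) / len(known) if known else 0.0
    return standard, pca


kb = {
    ("Luis Galárraga", "likes", "Shakira"),
    ("Shamira", "likes", "Shakira"),
    ("Luis Galárraga", "isCitizenOf", "Ecuador"),
}
print(confidences(kb, "likes", "Shakira", "isCitizenOf", "Ecuador"))  # (0.5, 1.0)
```

If the KB also recorded, say, another citizenship for Shamira, she would count as a counterexample under both measures and the PCA confidence would drop to 0.5 as well.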
35. AMIE: Association Rule Mining under Incomplete Evidence
● PCA confidence has better predictive behavior than standard confidence.
36. AMIE: Association Rule Mining under Incomplete Evidence
● Some rules mined by AMIE on YAGO:
isMarriedTo(x, y) ^ livesIn(x, z) => livesIn(y, z)
isCitizenOf(x, y) => livesIn(x, y)
hasAdvisor(x, y) ^ graduatedFrom(x, z) => worksAt(y, z)
hasWonPrize(x, Gottfried Wilhelm Leibniz Prize) => livesIn(x, Germany)
37. Research outlook
● Mine other types of logical rules for more applications.
● Numerical correlations for data description and prediction.
– If you like Justin Bieber, then you are probably younger than 18.
● Rules involving temporal information for event prediction.
– If a person bought a laptop today, then she will buy a hard disk in approximately one month.
– If a person traveled to the same place for Christmas in each of the last two years, then she will probably do so again this year.