Rule Mining and Applications in Social Data
1. Rule Mining and Applications in Social Data
Luis Galárraga
Télécom ParisTech
Presented at: International Workshop on Social Media and Culture 2014
Daejeon, Korea
April 4th, 2014
2. Natural Language vs Knowledge Bases (KBs)
[Figure: natural language shown side by side with a knowledge-base graph. In the graph, Shakira "is a" Singer, was "born on" Feb 2, 1977, and "performs" Hips don't lie.]
3. Natural Language vs Knowledge Bases (KBs)
● Natural language: suitable for humans but difficult for computers.
● Knowledge bases: understandable for computer programs.
9. Social graphs are KBs
● Sources may be different but they both share:
  ● Natural graph-like structure
  ● Incompleteness
[Figure: social graph connecting Luis Galárraga, Shamira, Shakira and the song "Hips don't lie" through likes, friendOf and performs edges, with the annotation "Maybe nobody asked me? :(".]
11. Social graphs are KBs
● Sources may be different but they both share:
  ● Natural graph-like structure
  ● Incompleteness
  ● Opportunities for data description and prediction.
    – 90% of computer scientists like political party X.
    – If you like Shakira, you are likely to buy her latest song.
[Figure: social graph connecting Luis Galárraga, Shamira, Shakira and the song "Hips don't lie" through likes, friendOf and performs edges.]
12. Rule Mining and KBs
● Data mining is about finding interesting and non-obvious correlations in data.
● Correlations are rules that hold often.
  ● You probably live in the same city as your spouse.
  ● If you like an artist, you like her songs.
● They can be formulated as logical rules:
isMarriedTo(x, y) ^ livesIn(x, city) => livesIn(y, city)
likes(x, artist) ^ performs(artist, song) => likes(x, song)
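A rule of this form can be applied mechanically to a set of facts. The following is a minimal sketch, not taken from the talk: atoms are written as (relation, subject, object) tuples, variables are strings starting with "?", and the helper names (match, bindings, predict) and the three-fact KB are hypothetical choices made only for illustration.

```python
def match(atom, fact, binding):
    """Try to unify one atom with one fact under an existing binding.

    Returns the extended binding, or None if the atom and fact disagree.
    """
    new = dict(binding)
    for term, value in zip(atom, fact):
        if term.startswith("?"):            # variable: bind it or check consistency
            if new.setdefault(term, value) != value:
                return None
        elif term != value:                 # constant: must match exactly
            return None
    return new


def bindings(body, kb, binding=None):
    """Yield every variable binding that satisfies all body atoms in the KB."""
    binding = {} if binding is None else binding
    if not body:
        yield binding
        return
    first, rest = body[0], body[1:]
    for fact in kb:
        extended = match(first, fact, binding)
        if extended is not None:
            yield from bindings(rest, kb, extended)


def predict(body, head, kb):
    """Return head facts implied by the rule that are not yet in the KB."""
    new_facts = set()
    for b in bindings(body, kb):
        fact = tuple(b.get(term, term) for term in head)
        if fact not in kb:
            new_facts.add(fact)
    return new_facts


# Illustrative toy KB (assumed facts, chosen to mirror the slide's example).
kb = {
    ("likes", "Luis", "Shakira"),
    ("likes", "Shamira", "Shakira"),
    ("performs", "Shakira", "Hips don't lie"),
}

# likes(x, artist) ^ performs(artist, song) => likes(x, song)
body = [("likes", "?x", "?artist"), ("performs", "?artist", "?song")]
head = ("likes", "?x", "?song")

predictions = predict(body, head, kb)
```

On this toy KB the rule predicts that both people who like Shakira also like her song. The nested scan over all facts is deliberately naive; it only shows what "a rule holds for a binding" means.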
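15. Applications for social data
● Recommendations
likes(x, artist) ^ performs(artist, song) => likes(x, song)
[Figure: social graph connecting Luis Galárraga, Shamira, Shakira and the song "Hips don't lie" through likes, friendOf and performs edges; the rule is applied to this graph to recommend the song.]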
16. Applications for social data
● Market basket analysis
  ● People who buy laptops also buy laptop cases.
● Link and event prediction
  ● Two people who attended the same high school in the same year might know each other.
  ● If you registered for this workshop, then you are coming to Daejeon (and need to book a flight and hotel).
● Dealing with incompleteness
  ● If you like German newspapers, fluency in German is perhaps missing from your profile.
19. Challenges of Rule Mining from KBs
● Scalability
  ● State-of-the-art approaches for rule mining cannot handle the size of current KBs.
    – YAGO: 10M entities, 120M facts
    – DBpedia 3.8: 24.9M entities, 1.98B facts
    – Facebook Graph: 1.2B users
  ● Rule mining requires an exhaustive search of the data.
● Solution:
  ● Language bias.
  ● A set of pruning heuristics.
  ● Optimized storage implementation.
20. AMIE: Association Rule Mining Under Incomplete Evidence
● AMIE is a system that learns Horn rules such as:
livesIn(x, city) ^ isMarriedTo(x, y) => livesIn(y, city)
● Starting with all possible head relations r(x,y) and a minimum support threshold:
  – The system explores the search space by means of carefully designed mining operators.
  – The search space is restricted to closed Horn rules.
  – Monotonicity of support helps prune non-promising paths.
  – It relies on an optimized in-memory database.
  – Confidence gain is used to prune the output.
L. Galárraga, C. Teflioudi, K. Hose, and F. M. Suchanek. AMIE: Association rule mining under incomplete evidence in ontological knowledge bases. In WWW, 2013.
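The restriction to closed Horn rules can be checked syntactically. The sketch below assumes the usual definition from the AMIE paper, namely that a rule is closed when every variable appears at least twice in it; the tuple encoding of atoms and the function name is_closed are illustrative assumptions, not part of AMIE's code.

```python
from collections import Counter


def is_closed(atoms):
    """Return True if every variable occurs at least twice across all atoms.

    `atoms` lists the head and body atoms of a rule, each encoded as
    (relation, subject, object); variables are strings starting with "?".
    """
    counts = Counter(
        term
        for atom in atoms
        for term in atom[1:]          # skip the relation name
        if term.startswith("?")
    )
    return all(n >= 2 for n in counts.values())


# livesIn(x, city) ^ isMarriedTo(x, y) => livesIn(y, city)
closed_rule = [
    ("livesIn", "?y", "?city"),       # head
    ("livesIn", "?x", "?city"),
    ("isMarriedTo", "?x", "?y"),
]

# isMarriedTo(x, y) => livesIn(y, city): x and city occur only once
open_rule = [
    ("livesIn", "?y", "?city"),       # head
    ("isMarriedTo", "?x", "?y"),
]
```

Under this definition the first rule is closed and the second is not, which is why intermediate rules with a dangling variable are refined further rather than reported.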
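27. [Figure: AMIE mining operators refining a rule with head livesIn(y, city).]
● Add dangling atom (O_D): a new atom ?r(x, y) is attached to the head; candidate relations for ?r include isMarriedTo, livesIn, …. Choosing isMarriedTo gives isMarriedTo(x, y) => livesIn(y, city).
● Add closing atom (O_C): a new atom ?r(x, city) connects two existing variables; candidate relations for ?r include livesIn, diedIn, …. Choosing livesIn yields:
livesIn(x, city) ^ isMarriedTo(x, y) => livesIn(y, city)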
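The two operators can be sketched as generators of candidate rules. This is a simplified illustration under stated assumptions, not AMIE's implementation: a rule is a list of (relation, subject, object) atoms with the head first, the fresh variable is named "?v" plus the number of variables already present, and no support counting or pruning is performed.

```python
def variables(rule):
    """Return the distinct variables of a rule in order of first appearance."""
    seen = []
    for _, subj, obj in rule:
        for term in (subj, obj):
            if term.startswith("?") and term not in seen:
                seen.append(term)
    return seen


def add_dangling_atom(rule, relations):
    """Yield rules extended by an atom joining one existing and one fresh variable."""
    fresh = f"?v{len(variables(rule))}"   # hypothetical naming scheme
    for rel in relations:
        for var in variables(rule):
            yield rule + [(rel, fresh, var)]
            yield rule + [(rel, var, fresh)]


def add_closing_atom(rule, relations):
    """Yield rules extended by an atom joining two distinct existing variables."""
    vs = variables(rule)
    for rel in relations:
        for a in vs:
            for b in vs:
                if a != b and (rel, a, b) not in rule:
                    yield rule + [(rel, a, b)]


head = ("livesIn", "?y", "?city")

# Step 1: dangling atoms over a single candidate relation.
step1 = list(add_dangling_atom([head], ["isMarriedTo"]))

# Step 2: pick isMarriedTo(?v2, ?y) and close the rule with livesIn.
chosen = [head, ("isMarriedTo", "?v2", "?y")]
step2 = list(add_closing_atom(chosen, ["livesIn"]))
```

Among the candidates in step2 is the rule from the slide, with ?v2 playing the role of x. A real miner would evaluate the support of each candidate against the KB and discard those below the threshold.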
28. AMIE: Association Rule Mining Under Incomplete Evidence
[Figure: AMIE architecture. Inputs: an RDF KB and a minimum support threshold. Components: a concurrent mining implementation on top of a tailored in-memory DB.]
29. AMIE: Association Rule Mining Under Incomplete Evidence
Dataset              Facts   Runtime     Rules
YAGO2                1M      3.62 min    138
YAGO2 (const)        1M      17.76 min   18K
DBpedia (2 atoms)    6.7M    2.89 min    6.9K
AMIE finds rules in medium-size ontologies in a few minutes.
32. Challenges of Rule Mining on KBs
● Incompleteness
  ● Graph data often contains gaps.
● Open World Assumption (OWA)
  ● Absence of evidence is not evidence of absence.
  ● This makes it hard to estimate the confidence of a rule.
likes(x, Shakira) => isCitizenOf(x, Ecuador)
[Figure: Luis Galárraga and Shamira are connected by friendOf and both like Shakira; only Luis Galárraga has a citizenOf edge, pointing to Ecuador.]
Standard confidence uses a Closed World Assumption (CWA) and counts Shamira as a counterexample. Score = 0.5
34. Challenges of Rule Mining on KBs
likes(x, Shakira) => isCitizenOf(x, Ecuador)
[Figure: Luis Galárraga and Shamira are connected by friendOf and both like Shakira; only Luis Galárraga has a citizenOf edge, pointing to Ecuador.]
● AMIE uses the Partial Completeness Assumption (PCA) to estimate the confidence of rules under OWA.
● PCA: a KB knows all or none of the nationalities of a person.
● PCA confidence considers as counterexamples only those people whose nationality is known to be different from Ecuador. Score = 1.0
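The two scores can be reproduced on the slide's example. The sketch below hard-codes the single rule likes(x, Shakira) => isCitizenOf(x, Ecuador); the triple encoding, the function name confidences and the exact fact list are assumptions reconstructed from the figure, so treat it as an illustration of the two denominators rather than as AMIE's code.

```python
# Facts as (subject, relation, object); Shamira has no isCitizenOf fact at all.
kb = {
    ("Luis Galárraga", "likes", "Shakira"),
    ("Shamira", "likes", "Shakira"),
    ("Luis Galárraga", "friendOf", "Shamira"),
    ("Luis Galárraga", "isCitizenOf", "Ecuador"),
}


def confidences(kb):
    """Return (standard, PCA) confidence of likes(x, Shakira) => isCitizenOf(x, Ecuador)."""
    # Everyone who satisfies the body of the rule.
    body = {s for (s, r, o) in kb if r == "likes" and o == "Shakira"}
    # Body matches for which the head is also in the KB (the rule's support).
    support = {x for x in body if (x, "isCitizenOf", "Ecuador") in kb}
    # People for whom the KB knows at least one nationality.
    known = {s for (s, r, o) in kb if r == "isCitizenOf"}

    # Standard confidence (CWA): every body match without the head is a counterexample.
    standard = len(support) / len(body) if body else 0.0
    # PCA confidence: only body matches with some known nationality are counted.
    pca_body = body & known
    pca = len(support) / len(pca_body) if pca_body else 0.0
    return standard, pca


standard, pca = confidences(kb)
```

Shamira lowers the standard confidence to 0.5 because her missing nationality is treated as false, while the PCA denominator ignores her and the score stays at 1.0.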
35. AMIE: Association Rule Mining under Incomplete Evidence
● PCA confidence has better predictive behavior than the standard confidence.
36. AMIE: Association Rule Mining under Incomplete Evidence
● Some rules mined by AMIE on YAGO:
isMarriedTo(x, y) ^ livesIn(x, z) => livesIn(y, z)
isCitizenOf(x, y) => livesIn(x, y)
hasAdvisor(x, y) ^ graduatedFrom(x, z) => worksAt(y, z)
hasWonPrize(x, Gottfried Wilhelm Leibniz Prize) => livesIn(x, Germany)
37. Research outlook
● Mine other types of logical rules for more applications.
● Numerical correlations for data description and prediction.
  – If you like Justin Bieber, then you are probably younger than 18.
● Rules involving temporal information for event prediction.
  – If a person bought a laptop today, then she will buy a hard disk in approximately one month.
  – If a person traveled to the same place for Christmas in the last two years, then she will probably do so again this year.