This document discusses mining interesting meta-paths from complex heterogeneous information networks. It begins by introducing homogeneous and heterogeneous networks and some example meta-paths. It then discusses the limitations of current meta-path related research approaches. The document proposes a framework that can handle large-scale heterogeneous networks with complex meta-type hierarchies and automatically generate meta-paths. It describes generating paths between objects and ranking paths based on meta-data to extract interesting meta-paths.
4. Heterogeneous Network
[Figure: heterogeneous network. Node types: People, Association, Meeting, Education, Geography. Edge labels: "Belongs to", "Speaks at", "locates at", "locates at the capital of", "affiliate", "Professor at".]
7. How are things uniquely connected/separated?
Path: NORTHEASTERN - BARABÁSI - NOTRE DAME - SOUTH BEND (edges 1, 2, 3)
Meta-path: EDUCATION - PEOPLE - EDUCATION - GEOGRAPHY
An interesting meta-path is the meta-path that best describes how two objects are uniquely related in a complex heterogeneous information network (HIN).
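To make the distinction between a path and a meta-path concrete, here is a minimal sketch that replaces each node on the example path with a type label. The `node_type` dictionary and the function name `to_meta_path` are illustrative assumptions, and the type assignments are inferred from the figure labels rather than stated explicitly in the slides.

```python
# Hypothetical node-to-type mapping, inferred from the figure labels.
node_type = {
    "Northeastern": "Education",
    "Barabási": "People",
    "Notre Dame": "Education",
    "South Bend": "Geography",
}

def to_meta_path(path, node_type):
    """Replace every node on a path with its type label."""
    return [node_type[node] for node in path]

path = ["Northeastern", "Barabási", "Notre Dame", "South Bend"]
meta_path = to_meta_path(path, node_type)
print(" - ".join(meta_path))
```

Swapping in more specific labels for the same nodes (for example "Professor" instead of "People") yields a more specific meta-path for the same path.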
8. Path: NORTHEASTERN - BARABÁSI - NOTRE DAME - SOUTH BEND
Meta-path: Education - Professor - University - Geography
9. Path: NORTHEASTERN - BARABÁSI - NOTRE DAME - SOUTH BEND
Meta-path: Education - Professor - University - Geography
Meta-path: Education - Network Scientist - Catholic University - Geography
10. Path: NORTHEASTERN - BARABÁSI - NOTRE DAME - SOUTH BEND
Meta-path: Education - Professor - University - Geography
Meta-path: Education - Network Scientist - Catholic University - Geography
Meta-path: Education - Network Scientist who was born in Transylvania, 1967 - Catholic University at South Bend, IN - Geography
11. Limitations of State-of-the-Art Meta-Path Research
• Types of meta-labels are limited
• Meta-types do not have a complex hierarchy
• Meta-paths are pre-defined manually
• No large-scale experiments
[Figure: schema with the meta-types Term, Venue, Paper, Author]
12. Limitations of State-of-the-Art Meta-Path Research
• Types of meta-labels are limited → a framework that can handle millions of meta-types
• Meta-types do not have a complex hierarchy → meta-types with a complex hierarchy
• Meta-paths are pre-defined manually → meta-paths are generated automatically
• No large-scale experiments → experiments are done on Wikipedia (10 million nodes, 740 million edges)
13. How to find interesting paths?
• Generate paths
• Rank the top-k interesting paths using meta-data
• Extract the meta-path for searching
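14. Path Generation
• Generate path set for given points $a_u, a_v$
$\{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_k\} \in X$, $\vec{x} = \langle a_1, a_2, \ldots, a_{|\vec{x}|} \rangle_i$
$a_u = a_1$, $a_v = a_{|\vec{x}|}$, $1 \le i \le k$
• Generate sibling path set
$\mathrm{sib}(a_i, a_j) \iff a_i \in t \wedge a_j \in t$
$\forall a_{u'} \in A'_u,\ \mathrm{sib}(a_{u'}, a_u)$ and $\forall a_{v'} \in A'_v,\ \mathrm{sib}(a_{v'}, a_v)$
$\{\vec{y}_1, \vec{y}_2, \ldots\} \in Y$, $\vec{y} = \langle a_1, a_2, \ldots, a_{|\vec{y}|} \rangle_i$
$a_1 \in A'_u$, $a_{|\vec{y}|} \in A'_v$, $1 \le i \le k$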
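The two generation steps can be sketched as follows. This is only a sketch under stated assumptions: the graph is a plain adjacency dictionary, siblings are nodes that share at least one type, and paths are simple paths bounded by a maximum number of nodes. The function names, the `max_len` bound, and the toy graph are all hypothetical and not taken from the original framework.

```python
from itertools import product

def simple_paths(adj, source, target, max_len):
    """Yield simple paths from source to target with at most max_len nodes."""
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            yield path
            continue
        if len(path) >= max_len:
            continue  # do not extend paths that already hit the bound
        for nxt in adj.get(node, ()):
            if nxt not in path:
                stack.append((nxt, path + [nxt]))

def siblings(node, types_of):
    """Nodes sharing at least one type with `node` (the node itself excluded)."""
    shared = types_of[node]
    return {n for n, ts in types_of.items() if n != node and ts & shared}

def sibling_paths(adj, types_of, u, v, max_len):
    """Paths that start at a sibling of u and end at a sibling of v."""
    out = []
    for su, sv in product(siblings(u, types_of), siblings(v, types_of)):
        if su != sv:
            out.extend(simple_paths(adj, su, sv, max_len))
    return out

# Toy undirected graph with made-up node names.
adj = {
    "uni_a": ["prof_x"],
    "prof_x": ["uni_a", "uni_b"],
    "uni_b": ["prof_x", "prof_y"],
    "prof_y": ["uni_b", "uni_c"],
    "uni_c": ["prof_y"],
}
types_of = {
    "uni_a": {"University"}, "uni_b": {"University"}, "uni_c": {"University"},
    "prof_x": {"Professor"}, "prof_y": {"Professor"},
}

X = list(simple_paths(adj, "uni_a", "uni_b", max_len=3))      # path set
Y = sibling_paths(adj, types_of, "uni_a", "uni_b", max_len=3)  # sibling path set
```

Here `X` plays the role of the path set between the given points and `Y` the role of the sibling path set; on a network of Wikipedia's size an exhaustive enumeration like this would need tighter bounds or sampling.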
16. Example: Path generation
[Figure: generated paths over the nodes ROCKNE, NORTHEASTERN, NOTRE DAME, WAND, BARABÁSI, ROSE BOWL, HARVARD, CY YOUNG, CARL HUBBELL, CARNEGIE MELLON UNIVERSITY, TD GARDEN, LA COLISEUM]
Which is the most interesting path?
20. Result: Path Ranking
[Figure: plot with five data points; x-axis ticks from 0 to 1, y-axis ticks from 0.48 to 0.60]
The results show that users are more likely to pick the path with the lowest or the highest similarity.
People may pick the path with the highest score because they treat the best-scoring path as the correct one.
21. Example: Extract Meta-Path
Nodes: JIAWEI HAN, DATA MINING, SIGKDD, JOHANNES GEHRKE
Types (arranged in the figure from more specific to more general): DATA MINERS, STATISTICIANS, MATHEMATICIANS, PEOPLE, SCHOLARS AND ACADEMICS, DATA MINING, SCIENCE, ACM SIGS, COMPUTATIONAL STATISTICS, MATHEMATICAL SCIENCES, STATISTICS, SOCIETY, ACM, PROFESSIONAL ORGANIZATIONS, SCIENTIFIC SOCIETIES, DATABASE RESEARCHERS, COMPUTER SCIENTISTS, SCHOLARS, ORGANIZATIONS
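The idea of reading a meta-path off a type hierarchy at different levels of generality can be sketched as below. The `type_chain` dictionary, the `level` parameter, and the function name are illustrative assumptions, and the ordering of types within each chain is a guess for demonstration purposes; the hierarchy in the original figure may differ.

```python
# Hypothetical type chains, ordered from most specific to most general.
type_chain = {
    "Jiawei Han": ["Data miners", "Computer scientists",
                   "Scholars and academics", "People"],
    "SIGKDD": ["ACM SIGs", "Professional organizations", "Organizations"],
    "Johannes Gehrke": ["Database researchers", "Computer scientists",
                        "Scholars and academics", "People"],
}

def extract_meta_path(path, type_chain, level):
    """Pick, for each node, the type at the given generality level.

    Level 0 is the most specific type; levels beyond the end of a chain
    fall back to that node's most general type.
    """
    meta = []
    for node in path:
        chain = type_chain[node]
        meta.append(chain[min(level, len(chain) - 1)])
    return meta

path = ["Jiawei Han", "SIGKDD", "Johannes Gehrke"]
specific = extract_meta_path(path, type_chain, level=0)
general = extract_meta_path(path, type_chain, level=10)
```

A more specific meta-path constrains a later search more tightly, while a more general one admits more candidate nodes.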
22. Result: Meta-Path Constraint RWR
0 0.24 0.41 0.48
Edgar F. Codd 40.5 18.1 9.0
Johannes Gehrke 28.4 29.4 8.4 2.8
Raghu Ramakrishnan 31.1 6.0 3.6
Anita Borg 5.1 0.6 0.2
Shafi Goldwasser 4.9 0.6
Osmar R. Zaiane 4.8 3.6 1.6
Vint Cerf 4.1 2.4 0.2
Allen Newell 2.0 0.6
ACM 5.1
IEEE 4.9
Yahoo! Research 4.8
Microsoft Research 4.4
Database researchers
Computer Scientist
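As a rough illustration of how a type constraint changes a random walk with restart (RWR), the sketch below runs RWR by power iteration and optionally forbids steps onto nodes whose type is not allowed. This is one simplified reading of a "meta-path constraint", not the method used to produce the numbers above; the function `rwr`, its parameters, the toy graph, and the type labels are all assumptions.

```python
def rwr(adj, node_type, source, restart=0.15, allowed_types=None, iters=200):
    """Random walk with restart from `source`, computed by power iteration.

    If `allowed_types` is given, the walker may only step onto nodes whose
    type is in that set (an assumed, simplified meta-path constraint).
    """
    score = {n: 0.0 for n in adj}
    score[source] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in adj}
        nxt[source] += restart  # restart mass; total mass always sums to 1
        for n, mass in score.items():
            targets = [m for m in adj[n]
                       if allowed_types is None or node_type[m] in allowed_types]
            if not targets:
                # Dead end under the constraint: send the mass back to source.
                nxt[source] += (1.0 - restart) * mass
                continue
            share = (1.0 - restart) * mass / len(targets)
            for m in targets:
                nxt[m] += share
        score = nxt
    return score

# Toy graph and type labels, for illustration only.
adj = {
    "Jiawei Han": ["SIGKDD", "Data mining"],
    "SIGKDD": ["Jiawei Han", "Johannes Gehrke", "ACM"],
    "Data mining": ["Jiawei Han", "Johannes Gehrke"],
    "Johannes Gehrke": ["SIGKDD", "Data mining"],
    "ACM": ["SIGKDD"],
}
node_type = {
    "Jiawei Han": "People", "Johannes Gehrke": "People",
    "SIGKDD": "ACM SIGs", "Data mining": "Science", "ACM": "Organizations",
}

free = rwr(adj, node_type, "Jiawei Han")
constrained = rwr(adj, node_type, "Jiawei Han",
                  allowed_types={"People", "ACM SIGs", "Science"})
```

In the unconstrained run some probability mass reaches the organization node, whereas the constrained run never visits it, which mirrors how organizations drop out of the result list once a type constraint is applied.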