Implementing Link-Prediction for Social Networks in a Database System (DBSocial2013)

Implementing
Link-Prediction for Social Networks
in a Database System
Sara Cohen Netanel Cohen-Tzemach
The Hebrew University of Jerusalem

What backend to choose?
● Premise: <1M Nodes
● DIY vs. existing
● Data model
● Limitations/Features
● TPC-C won't help...

Previous work
Compared databases
Limitations and Features
Data model
Implementation
Experiments
Measurements
N. Ruflin, H. Burkhart, and S. Rizzotti. Social-data storage-systems.
In Databases and Social Networks, DBSocial '11, pages 7-12, New York, NY, USA, 2011. ACM.

Our work
● Implemented 7 Link-Prediction metrics
● Experimented on 10 social-networks
● Over 3 different backends
○ Relational (MySQL)
○ Key-Value (Redis)
○ Graph (Neo4J)
● What did we find?
○ Stay tuned :)

Link Prediction
● Why Link Prediction?
○ Well researched
○ Useful
○ Multiple scoring functions
D. Liben-Nowell and J. Kleinberg.
The link prediction problem for social networks. In CIKM, 2003.
"Given a snapshot of a social network at time t,
we seek to accurately predict the edges that will be added
to a specific node during the interval from time t
to a given future time t'."

Link Prediction examples
● Common Neighbors
○ Only neighbors
● Katz measure
○ Paths
● Rooted PageRank
○ Random walk
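
The path-based and random-walk measures listed above can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation used in the experiments: the toy edge list, the function names, the parameter values (beta, alpha, max_len, iterations) and the truncation of the Katz sum are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy graph (undirected); not one of the experimental datasets.
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5), (5, 6)]


def adjacency(edges):
    """Build adjacency sets for an undirected edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def katz(adj, x, beta=0.05, max_len=4):
    """Truncated Katz score: sum over l = 1..max_len of
    beta**l * (number of walks of length l from x to y)."""
    scores = defaultdict(float)
    walks = {x: 1}  # walks[v] = number of walks of the current length ending at v
    for length in range(1, max_len + 1):
        nxt = defaultdict(int)
        for node, count in walks.items():
            for nb in adj[node]:
                nxt[nb] += count
        for node, count in nxt.items():
            if node != x:
                scores[node] += beta ** length * count
        walks = nxt
    return dict(scores)


def rooted_pagerank(adj, x, alpha=0.15, iterations=50):
    """Approximate the stationary distribution of a random walk that
    returns to x with probability alpha at every step."""
    rank = {node: 0.0 for node in adj}
    rank[x] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in adj}
        for node, mass in rank.items():
            share = (1 - alpha) * mass / len(adj[node])
            for nb in adj[node]:
                nxt[nb] += share
        nxt[x] += alpha  # mass that jumped back to the root
        rank = nxt
    return rank


ADJ = adjacency(EDGES)
katz_scores = katz(ADJ, 1)
rpr_scores = rooted_pagerank(ADJ, 1)
```

Candidate links for a node are then the highest-scoring nodes that are not already its neighbors.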

Storage systems: MySQL
http://www.mysql.com/
InnoDB vs MyISAM: http://www.oracle.com/partners/en/knowledge-zone/mysql-5-5-innodb-myisam-522945.pdf
● Relational database
○ Edges table
○ Stored procedures, Indices, "helper" tables
[Figure: example graph with nodes 1-6]
ID1  ID2
1    2
1    3
2    1
2    3
2    4
2    5
3    1
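
As a rough illustration of the edges-table model, the sketch below builds such a table with Python's built-in sqlite3 module. SQLite is only a stand-in for MySQL here, and the edge list, the function name and the choice of a composite primary key as the index are assumptions for the example. Each undirected edge is stored in both directions, which matches the rows shown above.

```python
import sqlite3

# Hypothetical undirected edge list for the six-node example graph.
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5), (5, 6)]


def build_edges_table(edges):
    """Create an in-memory edges(id1, id2) table holding both directions."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE edges ("
        " id1 INTEGER NOT NULL,"
        " id2 INTEGER NOT NULL,"
        " PRIMARY KEY (id1, id2))"  # the primary key doubles as an index on id1
    )
    rows = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    conn.executemany("INSERT INTO edges (id1, id2) VALUES (?, ?)", rows)
    conn.commit()
    return conn


conn = build_edges_table(EDGES)
first_rows = conn.execute(
    "SELECT id1, id2 FROM edges ORDER BY id1, id2 LIMIT 7"
).fetchall()
row_count = conn.execute("SELECT COUNT(*) FROM edges").fetchone()[0]
```

Storing both directions doubles the row count but lets every neighbor lookup filter on id1 alone.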

Storage systems: Redis
http://redis.io/
● Key-Value store
○ Adjacency sets
○ Lua functions, "helper" database
[Figure: example graph with nodes 1-6]
1: (2, 3)
2: (1, 3, 4, 5)
3: (1, 2, 5)
4: (2)
5: (2, 3, 6)
6: (5)
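
The adjacency-set layout can be sketched as follows. The plain Python dict of sets is only a stand-in for Redis sets, and the edge list and function names are assumptions for the example. The second helper renders the SADD commands that would load the same sets, one key per node.

```python
# Hypothetical undirected edge list for the six-node example graph.
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5), (5, 6)]


def adjacency_sets(edges):
    """Map each node to the set of its neighbors (one set per key)."""
    sets = {}
    for u, v in edges:
        sets.setdefault(u, set()).add(v)
        sets.setdefault(v, set()).add(u)
    return sets


def sadd_commands(sets):
    """Render one 'SADD key member [member ...]' command per node."""
    return [
        "SADD {} {}".format(node, " ".join(str(n) for n in sorted(neighbors)))
        for node, neighbors in sorted(sets.items())
    ]


SETS = adjacency_sets(EDGES)
COMMANDS = sadd_commands(SETS)
```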

Storage systems: Neo4J
http://www.neo4j.org/
● Graph database
○ No modeling required
○ Cypher queries, Lucene "helper" index
[Figure: example graph with nodes 1-6]

Storage systems
● Why these systems?
○ Popular
○ Open Source
● Perfect implementation?
○ No. But,
■ Unbiased
■ Best practices
■ Same time-frame
Full implementation available on GitHub:
github.com/natict/gdbb

Implementation of
Common Neighbours
SQL (MySQL):
-- x is the id of the query node
select E2.id2 as y, count(E2.id1) as neighbor_count
from edges as E1 join edges as E2
  on E1.id2 = E2.id1
where E1.id1 = x
  and E1.id1 <> E2.id2
group by y
order by neighbor_count desc
limit 100;

Cypher (Neo4J):
START a=node({n})
MATCH (a)-[:COAUTH]->(b)<-[:COAUTH]-(c)
WHERE a <> c
RETURN a.nid,c.nid,count(b) as score
ORDER BY score DESC
LIMIT 100

Lua (Redis):
local tc = {};          -- tc[y] = number of neighbors shared by x and y
local x = KEYS[1];
for k1,n in pairs(redis.call('smembers', x)) do
  for k2,y in pairs(redis.call('smembers', n)) do
    if x ~= y then
      tc[y] = (tc[y] or 0) + 1;
    end;
  end;
end;
local ttop = {};        -- Extract top 100 results
local min = math.huge;  -- smallest score currently held in ttop
local mini = '';        -- index of that smallest entry
for k,v in pairs(tc) do
  if (#ttop < 100) then
    table.insert(ttop, {k,v});
    if v<min then min=v; mini=table.maxn(ttop); end;
  else
    if v>min then
      ttop[mini] = {k,v};  -- evict the current minimum
      min = math.huge;
      for i = 1,#ttop,1 do -- rescan to find the new minimum
        if ttop[i][2]<min then min=ttop[i][2]; mini=i; end;
      end;
    end;
  end;
end; -- Now we just need to sort, and format the output...
...
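
For comparison, the same counting logic can be written compactly in Python. This is a sketch under assumptions, not part of the benchmarked code: the adjacency dict stands in for the database, the function name is made up for the example, and heapq.nlargest replaces the hand-rolled top-100 selection of the Lua script.

```python
import heapq
from collections import Counter

# Hypothetical adjacency sets for the six-node example graph.
ADJ = {
    1: {2, 3},
    2: {1, 3, 4, 5},
    3: {1, 2, 5},
    4: {2},
    5: {2, 3, 6},
    6: {5},
}


def top_common_neighbors(adj, x, k=100):
    """Return up to k (node, shared_neighbor_count) pairs for node x,
    highest counts first."""
    counts = Counter()
    for n in adj.get(x, ()):        # every neighbor n of x
        for y in adj.get(n, ()):    # every neighbor y of n
            if y != x:
                counts[y] += 1      # n is a common neighbor of x and y
    return heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])


best = top_common_neighbors(ADJ, 1, k=1)
scores = dict(top_common_neighbors(ADJ, 1))
```

Like the three queries above, this sketch does not filter out nodes that are already neighbors of x, so a link predictor would still need to drop existing edges from the result.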

Cypher
START a=node({n})
MATCH (a)-[:COAUTH]->(b)<-[:COAUTH]-(c)
WHERE a <> c
RETURN a.nid,c.nid,count(b) as score
ORDER BY score DESC
LIMIT 100
[Figure: the query pattern with nodes a, b and c]

Datasets
● Undirected
● Medium sized
● Socially oriented
● Data sources
○ DBLP
○ SNAP
DBLP in XML format: http://dblp.uni-trier.de/xml/
SNAP Datasets: http://snap.stanford.edu/data/index.html
Name       # Nodes   # Edges
dblp-all   366,600   4,349,796
ca-HepPh   12,006    237,010
enron      36,692    367,662
facebook   4,039     170,174
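
Datasets of this kind are typically distributed as plain edge lists. The sketch below shows one way to read such a file as an undirected graph. It assumes one whitespace-separated node pair per line with optional '#' comment lines; the function name and the sample text are made up for the example.

```python
import io


def read_undirected_edges(lines):
    """Parse 'u v' lines into a set of undirected edges (u < v),
    dropping comments, blank lines, self-loops and duplicate directions."""
    edges = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        u, v = (int(token) for token in line.split()[:2])
        if u != v:
            edges.add((min(u, v), max(u, v)))
    return edges


# Hypothetical sample input; a real file would be opened with open(path).
SAMPLE = io.StringIO("# FromNodeId ToNodeId\n1\t2\n2\t1\n3 3\n2 3\n")
sample_edges = read_undirected_edges(SAMPLE)
```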

Experiments
Detailed specifications and results: www.cs.huji.ac.il/~sara/link-prediction.html

Experiments (2)

Experiments (3)

Conclusions
● MySQL is highly optimised
○ mainly for simple queries (with few joins)
● Redis is very flexible and fast
○ mainly with complex metrics
● Neo4J has implementation simplicity
○ with some limitations
○ still evolving at a fast pace
● Future work
○ More databases
○ More algorithms

Thank you
Nati (Netanel) Cohen-Tzemach
linkedin.com/in/natict
Acknowledgments:
● Israel Science Foundation (Grant 143/09)
● Ministry of Science and Technology (Grant 3-8710)
● DBSocial Travel Award
