Implementing
Link-Prediction for Social Networks
in a Database System
Sara Cohen, Netanel Cohen-Tzemach
The Hebrew University of Jerusalem
About me
What backend to choose?
● Premise: <1M Nodes
● DIY vs. existing
● Data model
● Limitations/Features
● TPC-C won't help...
Previous work
Compared databases
Limitations and Features
Data model
Implementation
Experiments
Measurements
N. Ruflin, H. Burkhart, and S. Rizzotti. Social-data storage-systems.
In Databases and Social Networks, DBSocial '11, pages 7-12, New York, NY, USA, 2011. ACM.
Our work
● Implemented 7 Link-Prediction metrics
● Experimented on 10 social-networks
● Over 3 different backends
○ Relational (MySQL)
○ Key-Value (Redis)
○ Graph (Neo4J)
● What did we find?
○ Stay tuned :)
Link Prediction
● Why Link Prediction?
○ Well researched
○ Useful
○ Multiple scoring functions
Link Prediction
D. Liben-Nowell and J. Kleinberg.
The link prediction problem for social networks. In CIKM, 2003.
"Given a snapshot of a social network at time t,
we seek to accurately predict the edges that will be added
to a specific node during the interval from time t
to a given future time t'."
Link Prediction examples
● Common Neighbors
○ Only neighbors
● Katz measure
○ Paths
● Rooted PageRank
○ Random walk
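The three example metrics can be sketched in a few lines of Python. This is an illustrative sketch only, not the implementation from the talk: the edge list EDGES is an assumption inferred from the six-node example graph used on the storage slides, the Katz sum is truncated at a hypothetical max_len and counts walks rather than simple paths, and the parameter values beta and alpha are arbitrary.

```python
from collections import defaultdict

# Assumed example graph (undirected), inferred from the six-node slide example.
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5), (5, 6)]

def build_adjacency(edges):
    """Return {node: set(neighbors)} for an undirected edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

def common_neighbors(adj, x, y):
    """Common Neighbors: number of neighbors shared by x and y."""
    return len(adj[x] & adj[y])

def katz(adj, x, y, beta=0.05, max_len=4):
    """Truncated Katz: sum over l <= max_len of beta**l * (#walks of length l from x to y)."""
    walks = {x: 1}  # walks[v] = number of walks of the current length from x to v
    score = 0.0
    for length in range(1, max_len + 1):
        nxt = defaultdict(int)
        for v, count in walks.items():
            for w in adj[v]:
                nxt[w] += count
        walks = nxt
        score += (beta ** length) * walks.get(y, 0)
    return score

def rooted_pagerank(adj, x, alpha=0.15, iters=50):
    """Random walk from x that jumps back to x with probability alpha at each step."""
    rank = {v: 0.0 for v in adj}
    rank[x] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        for v, mass in rank.items():
            share = (1 - alpha) * mass / len(adj[v])
            for w in adj[v]:
                nxt[w] += share
        nxt[x] += alpha  # the restart mass always returns to the root
        rank = nxt
    return rank

adj = build_adjacency(EDGES)
print(common_neighbors(adj, 1, 5), katz(adj, 1, 4), rooted_pagerank(adj, 1)[5])
```

Common Neighbors only looks one hop away, Katz needs repeated expansion along paths, and Rooted PageRank needs an iterative walk, which is why the three stress a storage backend in different ways.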
Storage systems: MySQL
http://www.mysql.com/
InnoDB vs MyISAM: http://www.oracle.com/partners/en/knowledge-zone/mysql-5-5-innodb-myisam-522945.pdf
● Relational database
○ Edges table
○ Stored procedures, Indices, "helper" tables
[Figure: example undirected graph with nodes 1-6]
ID1  ID2
1    2
1    3
2    1
2    3
2    4
2    5
3    1
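A minimal sketch of the edges-table model, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example runs without a server). The edge list is inferred from the six-node example graph, and the table and column names follow the query shown later in the deck. Each undirected edge is stored as two rows, one per direction, which matches the rows listed on this slide.

```python
import sqlite3

# Assumed example graph (undirected), inferred from the six-node slide example.
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5), (5, 6)]

def make_edges_table(edges):
    """Create an in-memory edges(id1, id2) table holding both directions of every edge."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE edges ("
        " id1 INTEGER NOT NULL,"
        " id2 INTEGER NOT NULL,"
        " PRIMARY KEY (id1, id2))"  # the key doubles as an index on id1
    )
    rows = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    conn.executemany("INSERT INTO edges (id1, id2) VALUES (?, ?)", rows)
    conn.commit()
    return conn

def neighbors(conn, x):
    """Return the sorted neighbor ids of node x."""
    cur = conn.execute("SELECT id2 FROM edges WHERE id1 = ? ORDER BY id2", (x,))
    return [row[0] for row in cur]

conn = make_edges_table(EDGES)
print(neighbors(conn, 2))
```

Storing both directions doubles the row count but lets every neighbor lookup filter on id1 alone, so a single index serves all queries.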
Storage systems: Redis
http://redis.io/
● Key-Value store
○ Adjacency sets
○ Lua functions, "helper" database
[Figure: example undirected graph with nodes 1-6]
1: (2, 3)
2: (1, 3, 4, 5)
3: (1, 2, 5)
4: (2)
5: (2, 3, 6)
6: (5)
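The adjacency-set layout can be sketched with a plain Python dict of sets standing in for Redis (an assumption, so that the example needs no server or client library). In Redis itself, adding a member to a set corresponds to SADD, reading a set to SMEMBERS, and intersecting two sets to SINTER. The edge list is inferred from the example graph; keys and members are strings because that is how Redis stores them.

```python
# Assumed example graph (undirected), inferred from the six-node slide example.
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5), (5, 6)]

def add_edge(store, u, v):
    """Record an undirected edge by adding each endpoint to the other's set (like two SADD calls)."""
    store.setdefault(str(u), set()).add(str(v))
    store.setdefault(str(v), set()).add(str(u))

def build_store(edges):
    """Build {node key: set of neighbor keys} from an undirected edge list."""
    store = {}
    for u, v in edges:
        add_edge(store, u, v)
    return store

def shared_neighbors(store, x, y):
    """Neighbors common to x and y (the counterpart of SINTER on the two keys)."""
    return store.get(str(x), set()) & store.get(str(y), set())

store = build_store(EDGES)
for key in sorted(store):
    print(key, sorted(store[key]))
```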
Storage systems: Neo4J
http://www.neo4j.org/
● Graph database
○ No modeling required
○ Cypher queries, Lucene "helper" index
[Figure: example graph with nodes 1-6, drawn twice]
Storage systems
● Why these systems?
○ Popular
○ Open Source
● Perfect implementation?
○ No. But,
■ Unbiased
■ Best practices
■ Same time-frame
Full implementation available on GitHub:
github.com/natict/gdbb
Implementation of
Common Neighbours
SQL
-- x: id of the source node (query parameter)
select E2.id2 as y, count(E2.id1) as neighbor_count
from edges as E1 join edges as E2
  on E1.id2 = E2.id1
where E1.id1 = x
  and E1.id1 <> E2.id2
group by y
order by neighbor_count desc
limit 100;
Cypher
START a=node({n})
MATCH (a)-[:COAUTH]->(b)<-[:COAUTH]-(c)
WHERE a <> c
RETURN a.nid,c.nid,count(b) as score
ORDER BY score DESC
LIMIT 100
Lua
local tc = {};  -- tc[y] = number of neighbors shared by x and y
local x = KEYS[1];
for k1,n in pairs(redis.call('smembers', x)) do
  for k2,y in pairs(redis.call('smembers', n)) do
    if x ~= y then
      tc[y] = (tc[y] or 0) + 1;
    end;
  end;
end;
local ttop = {}; -- Extract top 100 results
local min = math.huge;  -- smallest score currently in ttop
local mini = '';        -- index of that smallest entry
for k,v in pairs(tc) do
  if (#ttop < 100) then
    table.insert(ttop, {k,v});
    if v<min then min=v; mini=table.maxn(ttop); end;
  else
    if v>min then
      -- replace the current minimum, then rescan for the new minimum
      ttop[mini] = {k,v};
      min = math.huge;
      for i = 1,#ttop,1 do
        if ttop[i][2]<min then min=ttop[i][2]; mini=i; end;
      end;
    end;
  end;
end; -- Now we just need to sort, and format the output...
...
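For comparison, the same top-k Common Neighbours computation can be sketched in Python over in-memory adjacency sets. This is an illustrative reference only, not part of the published implementation; the edge list is assumed from the six-node example graph, and the function name and tie-breaking rule (lower node id first) are hypothetical. Like the three snippets above, it does not filter out candidates that are already neighbors of x.

```python
import heapq
from collections import Counter

# Assumed example graph (undirected), inferred from the six-node slide example.
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5), (5, 6)]

def build_adjacency(edges):
    """Return {node: set(neighbors)} for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def top_common_neighbors(adj, x, k=100):
    """Return up to k (node, score) pairs, highest common-neighbor count first."""
    counts = Counter()
    for n in adj.get(x, ()):      # every neighbor n of x
        for y in adj[n]:          # every neighbor y of n
            if y != x:
                counts[y] += 1    # n is a common neighbor of x and y
    # nsmallest on (-score, node) yields the best scores, ties broken by node id
    return heapq.nsmallest(k, counts.items(), key=lambda kv: (-kv[1], kv[0]))

adj = build_adjacency(EDGES)
print(top_common_neighbors(adj, 1, k=3))
```

The heap-based selection plays the role of ORDER BY ... LIMIT 100 in SQL and Cypher, and of the hand-written minimum tracking in the Lua script.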
Cypher
START a=node({n})
MATCH (a)-[:COAUTH]->(b)<-[:COAUTH]-(c)
WHERE a <> c
RETURN a.nid,c.nid,count(b) as score
ORDER BY score DESC
LIMIT 100
[Figure: graph pattern with nodes a, b and c]
Datasets
● Undirected
● Medium sized
● Socially oriented
● Data sources
○ DBLP
○ SNAP
DBLP in XML format: http://dblp.uni-trier.de/xml/
SNAP Datasets: http://snap.stanford.edu/data/index.html
Name       # Nodes    # Edges
dblp-all   366,600    4,349,796
ca-HepPh   12,006     237,010
enron      36,692     367,662
facebook   4,039      170,174
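A minimal sketch of loading an edge-list dataset into undirected adjacency sets before importing it into a backend. It assumes the plain-text layout used by SNAP downloads (comment lines starting with "#", then one whitespace-separated node pair per line). The function names are hypothetical, and dropping self-loops is an assumption of this sketch rather than a step described in the talk.

```python
def read_edge_list(lines):
    """Parse 'u v' pairs into {node: set(neighbors)}, treating every edge as undirected."""
    adj = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        u, v = line.split()[:2]
        if u == v:
            continue  # assumption: self-loops are ignored
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def graph_size(adj):
    """Return (number of nodes, number of undirected edges)."""
    return len(adj), sum(len(neigh) for neigh in adj.values()) // 2

sample = ["# Undirected graph", "# FromNodeId\tToNodeId", "1\t2", "2\t1", "2\t3", "4\t4"]
print(graph_size(read_edge_list(sample)))
```

Because sets absorb duplicates, a file that lists each edge in both directions yields the same adjacency as one that lists it once.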
Experiments
Detailed specifications and results: www.cs.huji.ac.il/~sara/link-prediction.html
Experiments (2)
Experiments (3)
Conclusions
● MySQL is highly optimised
○ mainly for simple queries (with few joins)
● Redis is very flexible and fast
○ mainly with complex metrics
● Neo4J has implementation simplicity
○ with some limitations
○ still evolving at a fast pace
● Future work
○ More databases
○ More algorithms
Thank you
Nati (Netanel) Cohen-Tzemach
linkedin.com/in/natict
Acknowledgments:
● Israel Science Foundation (Grant 143/09)
● Ministry of Science and Technology (Grant 3-8710)
● DBSocial Travel Award

Implementing Link-Prediction for Social Networks in a Database System (DBSocial2013)
