KONECT Cloud – Large Scale Network Mining in the Cloud

In the Winter 2011/2012 run at the Future SOC Lab, we used the KONECT
framework (Koblenz Network Collection) to compute ten
different network statistics on a large collection of downsampled
versions of a large network dataset, with the goal of determining
whether sampling of a large network can be used to reduce the
computational effort needed to compute a network statistic. Preliminary
results show that this is indeed the case.
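
The downsampling step described above can be sketched as follows. This is a minimal illustration, not the code used in the experiment (which ran in Matlab): the function name `sample_edges`, the toy edge list, and the fixed seed are assumptions made for the example. The grid of sample sizes (5%, 10%, …, 100%) follows the "Computation" slide of the deck.

```python
import random

def sample_edges(edges, fraction, seed=None):
    """Keep a uniformly random subset holding about `fraction` of the edges.

    `edges` is a list of (u, v) pairs; `fraction` must lie in (0, 1].
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    keep = round(fraction * len(edges))
    # Sampling without replacement: every edge appears at most once.
    return rng.sample(edges, keep)

# Hypothetical toy network with 10 edges (not a KONECT dataset).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0),
         (0, 2), (1, 3), (2, 4), (3, 0), (4, 1)]

# 20 sample sizes: 5%, 10%, ..., 100%.
fractions = [k / 20 for k in range(1, 21)]
samples = {f: sample_edges(edges, f, seed=0) for f in fractions}
half = sample_edges(edges, 0.5, seed=1)
```

A statistic would then be computed on each sampled edge list and compared with its value on the full network; repeating the sampling with different seeds gives an idea of the variance at each sample size.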

Published in: Technology, Business


  1. KONECT Cloud – Large Scale Network Mining in the Cloud. Jérôme Kunegis, Future SOC Lab Day, 18.04.2012
  2. Networks are Everywhere: Authorship, Friendship, Trust, Communication, Co-occurrence, Interaction
  3. Social Networks: friend
  4. Trust Networks: trust
  5. Friend/Enemy Network: friend, enemy
  6. Interaction Network: listen
  7. KONECT – Koblenz Network Collection: 148 network datasets. 26 are undirected, 38 are directed, 84 are bipartite. 59 have unweighted edges, 77 allow multiple edges, 4 have signed edges, 8 have ratings as edges. 78 have edge arrival times. konect.uni-koblenz.de
  8. Largest Network: Directed “who follows who” network, 41 652 230 users, 1 468 365 182 edges. konect.uni-koblenz.de/networks/twitter
  9. 148 Network Datasets: authorship, communication, co-occurrence, features, folksonomy, interaction, physical, ratings, reference, semantic, social, trust
  10. What We Computed: Connected components; Network diameter (← at Future SOC Lab); Clustering coefficients; Degree distributions; Spectral distribution; Eigenvector centrality; Graph drawing; Temporal analysis; Link prediction
  11. Network Diameter: 6
  12. 90 Percentile Effective Diameter: 5
  13. 90 Percentile Effective Diameter: 3
  14. 90 Percentile Effective Diameter: 3.75
  15. Computing the Effective Diameter: for each node i (|V| iterations) { count hops needed to reach 90% (|E| each) }. Total runtime: |E| × |V|
  16. Graph Sampling: Keep X% of edges
  17. Computation: × 1 000 vertices (sampled) × 120 840 391 edges × 20 sample sizes (5%, 10%, …, 100%) × 50 random samplings. Evaluation on single machine: 1 TiB memory, 64 cores, Matlab 64 bit
  18. Results
  19. Thank You! Dr. Jérôme Kunegis, konect.uni-koblenz.de, kunegis@uni-koblenz.de, west.uni-koblenz.de
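
The loop on the "Computing the Effective Diameter" slide can be sketched in Python as below. This is an illustration under stated assumptions, not the Matlab code from the talk: it pools breadth-first-search distances from every node and returns the smallest whole number of hops that covers at least 90% of the reachable ordered node pairs (pairs of a node with itself are excluded). The slides show a fractional value (3.75), which suggests some interpolation or averaging that is not reproduced here. The function names and the toy path graph are hypothetical. One BFS per node costs on the order of |E|, which gives the |E| × |V| total runtime stated on the slide.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distance from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def effective_diameter(adj, percentile=0.9):
    """Smallest hop count d such that at least `percentile` of all
    reachable ordered pairs (u, v) with u != v are within d hops."""
    hop_counts = {}  # distance -> number of ordered pairs at that distance
    for source in adj:  # |V| iterations, one BFS each
        for d in bfs_distances(adj, source).values():
            if d > 0:
                hop_counts[d] = hop_counts.get(d, 0) + 1
    total = sum(hop_counts.values())
    if total == 0:
        return 0  # no edges: nothing is reachable
    reached = 0
    for d in sorted(hop_counts):
        reached += hop_counts[d]
        if reached >= percentile * total:
            return d
    return max(hop_counts)

# Hypothetical toy graph: an undirected path 0 - 1 - 2 - 3 - 4 - 5.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
eff = effective_diameter(path)         # 90-percentile effective diameter
diam = effective_diameter(path, 1.0)   # percentile 1.0 gives the diameter
```

On this path graph, 28 of the 30 ordered pairs lie within 4 hops, so `eff` is 4 while the full diameter `diam` is 5. To reduce the cost on a large network, the outer loop can be restricted to a random subset of source nodes, in the spirit of the 1 000 sampled vertices mentioned on the "Computation" slide.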
