HW09 Social network analysis with Hadoop
 

  • Similar open source project on sourceforge.

    xrime.sourceforge.net.

    The goal and approach are similar, implemented in Java over Hadoop.
Presentation Transcript

  • Social network analysis with Hadoop. Jake Hofman, Yahoo! Research. October 2, 2009.
  • Social networks • Rapid increase in amount and variety of social network data • Valuable information for products (recommendations, advertising, etc.) and research (structure/dynamics, diffusion, etc.)
  • Social networks • Goal: to enable analysis of large-scale social network data with readily available software/hardware
  • 1970s: ∼ 10^1 nodes • Few direct observations; highly detailed info on nodes and edges • E.g. karate club (Zachary, 1977) • [Figure: page from Zachary's article in the Journal of Anthropological Research showing Figure 1, "Social Network Model of Relationships in the Karate Club", a graph of 34 individuals]
  • 1990s: ∼ 10^4 nodes • Larger, indirect samples; relatively few details on nodes and edges • E.g. APS co-authorship network (http://bit.ly/aps08jmh)
  • Present: ∼ 10^8+ nodes • Very large, dynamic samples; many details in node and edge metadata • E.g. Mail, Messenger, Facebook, Twitter, etc.
  • Scale • Example numbers: ∼ 10^7 nodes, ∼ 10^2 edges/node (degree) • No node/edge data • Static • ∼ 8 GB • [Diagram: edge list of (User 1, User 2) pairs] • Simple, static networks push the memory limit for commodity machines
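The ∼ 8 GB figure can be reproduced with simple arithmetic. The sketch below assumes each edge is stored as a pair of 4-byte integer node IDs; the slide gives only the totals, so the storage layout and the name `edge_list_bytes` are illustrative assumptions.

```python
# Back-of-the-envelope size of a static edge list with no metadata.
# ASSUMPTION: each edge is two 4-byte integer node IDs (8 bytes per edge);
# the slides do not state the actual storage layout.
NODES = 10**7           # ~10^7 nodes (from the slide)
AVG_DEGREE = 10**2      # ~10^2 edges per node (from the slide)
BYTES_PER_EDGE = 2 * 4  # hypothetical: (source_id, target_id) as 32-bit ints


def edge_list_bytes(nodes, avg_degree, bytes_per_edge):
    """Total bytes needed to hold every edge once."""
    return nodes * avg_degree * bytes_per_edge


total = edge_list_bytes(NODES, AVG_DEGREE, BYTES_PER_EDGE)
print(f"{total / 1e9:.0f} GB")  # prints "8 GB"
```

Under this assumption, 10^9 edges at 8 bytes each already fill the RAM of a typical 2009 commodity machine before any per-node data structures are built.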
  • Scale • Example numbers: ∼ 10^7 nodes, ∼ 10^2 edges/node (degree) • Node/edge metadata • Dynamic • ∼ 100 GB/day • [Diagram: each edge between User 1 and User 2 carries messages (header, content); each user has a profile and a history] • Dynamic, data-rich social networks exceed memory limits and require considerable storage
  • Distributed network analysis • MapReduce is convenient for parallelizing individual node/edge-level calculations
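Node degree is a typical node-level calculation that fits this pattern. The following is a minimal sketch, not the package's code: it assumes a hypothetical input format of one tab-separated `source<TAB>target` edge per line, and `run_local` stands in for Hadoop's shuffle/sort so the example can run on one machine.

```python
from itertools import groupby
from operator import itemgetter


def map_edges(edge_lines):
    """Map step: each edge contributes one out-edge and one in-edge."""
    for line in edge_lines:
        source, target = line.split()
        yield source, (1, 0)  # source gains one outgoing edge
        yield target, (0, 1)  # target gains one incoming edge


def reduce_degrees(sorted_pairs):
    """Reduce step: sum the (out, in) contributions for each node."""
    for node, group in groupby(sorted_pairs, key=itemgetter(0)):
        out_degree = in_degree = 0
        for _, (out_part, in_part) in group:
            out_degree += out_part
            in_degree += in_part
        yield node, out_degree, in_degree


def run_local(edge_lines):
    """Simulate the shuffle/sort between map and reduce with sorted()."""
    return list(reduce_degrees(sorted(map_edges(edge_lines))))


edges = ["a\tb", "a\tc", "b\tc"]
print(run_local(edges))  # [('a', 2, 0), ('b', 1, 1), ('c', 0, 2)]
```

Each edge is processed independently in the map step, which is why calculations of this kind parallelize without any node needing to see the whole network.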
  • Distributed network analysis • Higher-order calculations are more difficult when the network exceeds memory constraints, but can be adapted to the MapReduce framework
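One common way to adapt a global calculation is to repeat a simple map/reduce round until nothing changes. As an illustration only (the talk does not say which algorithm the package uses), connected components can be found by minimum-label propagation. The sketch assumes an undirected graph held as a dict in which every node appears as a key; the function names are hypothetical.

```python
from collections import defaultdict


def propagate_once(adjacency, labels):
    """One map/reduce round of minimum-label propagation."""
    messages = defaultdict(list)
    # Map: every node sends its current label to itself and its neighbours.
    for node, neighbours in adjacency.items():
        messages[node].append(labels[node])
        for neighbour in neighbours:
            messages[neighbour].append(labels[node])
    # Reduce: every node keeps the smallest label it received.
    return {node: min(received) for node, received in messages.items()}


def connected_components(adjacency):
    """Repeat rounds until labels stop changing; equal label = same component."""
    labels = {node: node for node in adjacency}
    while True:
        updated = propagate_once(adjacency, labels)
        if updated == labels:
            return labels
        labels = updated


graph = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
print(connected_components(graph))  # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```

On a cluster each round would be a separate MapReduce job, so the number of rounds (bounded by the diameter of the largest component) drives the running time.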
  • Package details
    Network creation/manipulation: logs → edges; edge list ↔ adjacency list; directed ↔ undirected; edge thresholds
    First-order descriptive statistics: number of nodes; number of edges; node degrees
    Higher-order node-level descriptive statistics: clustering coefficient; implicit degree; ...
    Global calculations: pairwise connectivity; connected components; minimum spanning tree; breadth-first search; Pagerank; community detection
    Currently implemented in Streaming with Python. Algorithms exist or have been developed for additional features.
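As an illustration of the "edge list ↔ adjacency list" conversion, here is a minimal sketch in the style of a Streaming job. The text formats (tab-separated edges, and adjacency lines of the form `source<TAB>target target ...`) and the function names are assumptions, not the package's actual interface; `sorted()` stands in for Hadoop's shuffle/sort.

```python
from itertools import groupby


def edges_to_adjacency(edge_lines):
    """Group 'source<TAB>target' edges into one adjacency line per source."""
    pairs = sorted(tuple(line.split()) for line in edge_lines)  # shuffle/sort stand-in
    for source, group in groupby(pairs, key=lambda pair: pair[0]):
        yield source + "\t" + " ".join(target for _, target in group)


def adjacency_to_edges(adjacency_lines):
    """Expand each adjacency line back into one edge per line."""
    for line in adjacency_lines:
        source, *targets = line.split()
        for target in targets:
            yield source + "\t" + target


edges = ["a\tb", "b\tc", "a\tc"]
adjacency = list(edges_to_adjacency(edges))
print(adjacency)                            # ['a\tb c', 'b\tc']
print(list(adjacency_to_edges(adjacency)))  # ['a\tb', 'a\tc', 'b\tc']
```

Note that in this sketch a node with no outgoing edges (here `c`) gets no adjacency line of its own.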
  • Application: Twitter • Distributed crawl of Twitter social network + public messages (crawler by Eytan Bakshy, http://bit.ly/eytanb) • ∼ 25 million nodes, ∼ 800 million edges
  • Twitter: Degree distribution • [Figure: log-log plot of count (10^0 to 10^8) versus degree (10^0 to 10^6), with one curve for out-degree (friends) and one for in-degree (followers)] • Aggregates users by number of friends/followers seen in crawl • Many people are not followed by anyone; few are followed by many • Most people follow at least a few others
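The aggregation behind such a plot is a count of users per degree value. A minimal in-memory sketch with made-up toy degrees follows; at Twitter scale the same counting would be a second MapReduce pass keyed on degree, and the function name is an assumption.

```python
from collections import Counter


def degree_distribution(degrees):
    """Map each degree value to the number of users that have it."""
    return Counter(degrees)


# Toy in-degrees (follower counts) for six hypothetical users.
in_degrees = [0, 0, 0, 1, 1, 5]
distribution = degree_distribution(in_degrees)
for degree, count in sorted(distribution.items()):
    print(degree, count)
```

Plotting `count` against `degree` on log-log axes gives a curve of the kind described on the slide.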
  • Twitter: Node-level clustering coefficient • Fraction of edges amongst a node's friends/followers (Watts & Strogatz, 1998) • [Figure: histogram of count (log scale, 10^0 to 10^8) versus clustering coefficient (0 to 0.8), with one curve for followers and one for friends] • Surprisingly high density at 0.5 (many isolated triangles)
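For reference, the standard undirected Watts-Strogatz definition can be sketched as below. This is an in-memory illustration, not the package's distributed implementation; the talk computes the quantity separately over friends and over followers in a directed graph, where normalization conventions may differ. The adjacency representation (a dict of neighbour sets) and the function name are assumptions.

```python
from itertools import combinations


def clustering_coefficient(adjacency, node):
    """Fraction of pairs of a node's neighbours that are themselves linked."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pairs to be linked
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adjacency[u])
    return 2.0 * links / (k * (k - 1))


# Toy undirected graph: triangle a-b-c plus a pendant node d attached to a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(clustering_coefficient(graph, "b"))  # 1.0
print(clustering_coefficient(graph, "d"))  # 0.0
```

Node `a` has three neighbours and only one linked pair among them, so its coefficient is 1/3.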
  • Future plans • Open-source release • “A Model of Computation for MapReduce”, Karloff, Suri, & Vassilvitskii, Symposium on Discrete Algorithms, 2010 (Accepted) • Twitter analysis publication (In progress) • Goal: to enable analysis of large-scale social network data with readily available software/hardware
  • Collaborators • Eytan Bakshy (University of Michigan; Yahoo! Research) • Sharad Goel (Yahoo! Research) • Winter Mason (Yahoo! Research) • Sid Suri (Yahoo! Research) • Sergei Vassilvitskii (Yahoo! Research) • Duncan Watts (Yahoo! Research) • (You?) • Yahoo! Research: http://research.yahoo.com
  • Thanks. Questions? Contact: hofman@yahoo-inc.com, jakehofman.com