Massivegraph telecom ppt


Data Visualization and Massive graph

  1. 1. Analyzing the Structure and Evolution of Massive Telecom Graphs <ul><li>Amit Nanavati, Rahul Singh, Anupam Joshi, Gautam Das </li></ul><ul><li>CIKM 2006 </li></ul>Presented by Harshavardhan Achrekar, University of Massachusetts Lowell
  2. 2. Introduction <ul><li>Mobile Telecom Focus / Challenges </li></ul><ul><ul><ul><li>Customer Acquisition </li></ul></ul></ul><ul><ul><ul><li>Customer Retention </li></ul></ul></ul><ul><li>Avoid churn </li></ul><ul><li>Strategy: offer the right incentives (loyalty programs), use suitable marketing strategies (family plans), and place network assets appropriately. </li></ul><ul><li>Goal / Critical Requirements </li></ul><ul><ul><li>Optimizing Marketing Expenditure </li></ul></ul><ul><ul><li>Improved Targeting </li></ul></ul>Churn: loss of subscribers who switch from one carrier / service provider to another.
  3. 3. Call Detail Record (CDR) / CALL GRAPH <ul><li>Graph theory helps explain user behaviour patterns </li></ul><ul><ul><ul><li>If the CALL GRAPH is disconnected </li></ul></ul></ul><ul><ul><ul><ul><li>prefer blanket advertising over word-of-mouth spreading </li></ul></ul></ul></ul><ul><ul><ul><li>Presence of cliques </li></ul></ul></ul><ul><ul><ul><ul><li>indicates the presence of communities, allowing effective group targeting and retention </li></ul></ul></ul></ul>CDR fields: Caller, Receiver, {Time stamp, Duration}. A clique in an undirected graph G = (V, E) is a subset of the vertex set C ⊆ V such that for every two vertices in C, there exists an edge connecting the two.
  4. 4. Authors' Contribution <ul><li>Structural analysis of the CDRs of one of the largest mobile telecom operators in the world. </li></ul><ul><li>Topological properties of massive call graphs such as shape, degree distribution, cliques and connected components; power laws in scale-free networks. </li></ul><ul><li>Model built on the edge distribution, as opposed to the node distribution, of the components of the graph. </li></ul><ul><li>Temporal analysis performed, in contrast to a static snapshot of the network. </li></ul><ul><li>Short Messaging Service (SMS) graph analyzed (skipped in this presentation). </li></ul>
  5. 5. Data Sources <ul><li>Study was done on a single mobile operator in India. </li></ul><ul><li>Analyzed calling patterns of four regions: two metropolitan cities for a week, and two states with a mixture of rural and urban population for one month. </li></ul><ul><li>CDRs stored at base station in data warehouse. </li></ul>
  6. 6. <ul><li>A CALL GRAPH G is a pair <V(G),E(G)>, where V(G) is a nonempty finite set of vertices and E(G) is a finite set of vertex pairs from V(G). If u and v are vertices of G, then the edge <u,v> means that u calls v. </li></ul><ul><li>Multiple calls between two users (nodes) are treated as a single edge. </li></ul><ul><li>Short-duration calls (less than 10 seconds) and long-distance or international calls are ignored. </li></ul>
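The construction rules on this slide can be sketched in a few lines of Python. This is only an illustration: the record layout `(caller, receiver, timestamp, duration_s)` and the function name `build_call_graph` are assumptions, and the long-distance/international filter is left out because the assumed records carry no field for it.

```python
def build_call_graph(cdrs, min_duration=10):
    """Build a directed call graph {caller: set(receivers)} from CDR tuples.

    Each record is assumed to be (caller, receiver, timestamp, duration_s);
    this layout is hypothetical, not taken from the paper.
    """
    graph = {}
    for caller, receiver, _timestamp, duration in cdrs:
        if duration < min_duration:
            continue  # short calls (under 10 seconds) are ignored
        # A set of receivers collapses repeated calls into a single edge.
        graph.setdefault(caller, set()).add(receiver)
        # Make sure the receiver is a vertex even if it never places a call.
        graph.setdefault(receiver, set())
    return graph


cdrs = [
    ("a", "b", 0, 30),
    ("a", "b", 5, 60),   # repeated call: same edge
    ("b", "c", 7, 5),    # too short: dropped
    ("c", "a", 9, 12),
]
print(build_call_graph(cdrs))
```

Every vertex appears as a key in the returned dictionary, which keeps later traversals simple.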
  7. 7. Structural properties of CALL GRAPHS <ul><li>Node Degree Distribution: gives the number of nodes n(d) of each degree d in the graph (P(d) = n(d)/n, where n is the total number of nodes). </li></ul><ul><li>For directed networks, the degree distribution P(d) splits into the in-degree distribution P(d_in) and the out-degree distribution P(d_out), which are measured separately as the probabilities of having d_in incoming links and d_out outgoing links, respectively. </li></ul><ul><li>See Fig 1 and 2. The in-degree distribution follows that of the WWW (exponent). </li></ul>
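As a minimal sketch of P(d) = n(d)/n for a directed graph, the function below computes the in-degree and out-degree distributions separately. The adjacency format `{node: set(successors)}`, with every node present as a key, and the name `degree_distributions` are assumptions made for the example.

```python
from collections import Counter


def degree_distributions(graph):
    """Return (P_in, P_out), each mapping a degree d to n(d) / n.

    graph is assumed to be {node: set(successors)} with every node as a key.
    """
    n = len(graph)
    in_degree = {u: 0 for u in graph}
    for successors in graph.values():
        for v in successors:
            in_degree[v] += 1
    in_counts = Counter(in_degree.values())                 # n(d_in)
    out_counts = Counter(len(s) for s in graph.values())    # n(d_out)
    p_in = {d: c / n for d, c in in_counts.items()}
    p_out = {d: c / n for d, c in out_counts.items()}
    return p_in, p_out


p_in, p_out = degree_distributions({"a": {"c"}, "b": {"c"}, "c": set()})
print(p_in, p_out)
```

Plotting either distribution on log-log axes is the usual way to check for the heavy tail discussed on the next slide.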
  8. 8. <ul><li>The heavy-tailed form fits power-law behavior. </li></ul><ul><li>A few nodes have very high in-degree or out-degree and may be suitable for individual targeting by the telecom service provider. </li></ul>
  9. 9. <ul><li>Neighbourhood Distribution / Hop Plot: N(h) for a graph is the number of pairs of nodes within distance h, for all distances h. </li></ul><ul><li>The individual neighbourhood function for u at h is the number of nodes at distance h or less from u. </li></ul><ul><li>H is the hop exponent: N(h) grows approximately as h^H for small h. {Use ANF tool.} </li></ul>
  10. 10. <ul><li>Two graphs with different hop exponents are structurally different. </li></ul><ul><li>The hop exponent was computed using a linear fit on the N(h) distribution and found to be close to 4 and 5 in the telecom network, like the dense WWW. </li></ul><ul><li>A grid has hop exponent 2. </li></ul><ul><li>Regions A and B have the same H. </li></ul>
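The hop plot and hop exponent can be illustrated with exact BFS on a small graph. This is a sketch under stated assumptions, not the ANF tool: ANF approximates N(h) on massive graphs, whereas the all-sources BFS here is only feasible for small ones. The function names are hypothetical, pairs are counted as ordered pairs including (u, u) at distance 0, and the fit is an ordinary least-squares line through log N(h) against log h for h ≥ 1.

```python
import math
from collections import deque


def neighbourhood_function(graph, max_h):
    """Return [N(0), ..., N(max_h)]: ordered pairs (u, v) with dist(u, v) <= h."""
    exactly = [0] * (max_h + 1)  # exactly[d] = pairs at distance exactly d
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == max_h:
                continue  # do not explore beyond max_h hops
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for d in dist.values():
            exactly[d] += 1
    totals, running = [], 0
    for count in exactly:
        running += count
        totals.append(running)
    return totals


def hop_exponent(totals):
    """Least-squares slope of log N(h) versus log h, using h = 1..len(totals)-1.

    Needs at least two points, i.e. len(totals) >= 3.
    """
    xs = [math.log(h) for h in range(1, len(totals))]
    ys = [math.log(totals[h]) for h in range(1, len(totals))]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


path = {0: {1}, 1: {2}, 2: set()}
print(neighbourhood_function(path, 2))
```

In practice the fit would be restricted to the range of h before N(h) saturates, since the power-law growth only holds for small h.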
  11. 11. <ul><li>Effective Diameter of the Network: for a call graph of N nodes with E edges, the effective diameter δ_eff is the number of hops within which any two nodes lie with high probability. </li></ul><ul><li>See Table II: the maximum value is 13. </li></ul><ul><li>The small-world phenomenon exists in mobile call graphs, since most pairs of nodes (phone numbers) are separated by a handful of edges (calls). </li></ul><ul><li>Identify social communities (cf. Milgram’s experiment). </li></ul>
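A commonly used closed form for the effective diameter in terms of the hop exponent H comes from Faloutsos et al.'s study of Internet topology. It is shown here on the assumption that this is the definition the slide intends:

```latex
\delta_{\mathrm{eff}} = \left( \frac{N^{2}}{N + 2E} \right)^{1/H}
```

Here N is the number of nodes, E the number of edges, and H the hop exponent obtained from the fit on N(h).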
  12. 12. <ul><li>Cliques: useful for defining closed user groups, where discounts are given for all calls made within the group (e.g., a family plan). </li></ul><ul><li>The number and sizes of such groups also give an idea of the right incentives to offer. </li></ul><ul><li>See Fig 4: many cliques of size 3-4; maximum size 17. </li></ul>
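One way to obtain a clique-size histogram like the one in Fig 4 is to enumerate maximal cliques with the basic Bron-Kerbosch algorithm, sketched below. The slides do not say which algorithm the authors used, so this is a stand-in. The graph is assumed to be undirected and given as `{node: set(neighbours)}`, and the basic variant without pivoting is only practical on small graphs.

```python
from collections import Counter


def maximal_cliques(adj):
    """Yield every maximal clique of an undirected graph as a frozenset.

    adj is assumed to be {node: set(neighbours)} with symmetric entries.
    """
    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-processed vertices
        if not p and not x:
            yield frozenset(r)
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
sizes = Counter(len(c) for c in maximal_cliques(adj))
print(sizes)  # histogram of maximal clique sizes
```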
  13. 13. <ul><li>PageRank p(i) of page i </li></ul><ul><li>A measure of the social importance of an individual: it grows with the number of people calling the individual and with the social importance of the callers. </li></ul><ul><li>See Fig 5: follows a power-law distribution. </li></ul>
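The idea that a node's importance grows with the number and importance of its callers can be sketched with a standard power-iteration PageRank. The damping factor 0.85, the fixed iteration count, and the uniform redistribution of rank from nodes with no outgoing calls are conventional choices assumed here, not values taken from the slides.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on {node: set(successors)}.

    Every node is assumed to appear as a key. Rank held by nodes with no
    outgoing edges is spread uniformly over all nodes.
    """
    n = len(graph)
    rank = {u: 1.0 / n for u in graph}
    for _ in range(iterations):
        dangling = sum(rank[u] for u, succ in graph.items() if not succ)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {u: base for u in graph}
        for u, succ in graph.items():
            if succ:
                share = damping * rank[u] / len(succ)
                for v in succ:
                    new_rank[v] += share
        rank = new_rank
    return rank


# "c" is called by both "a" and "b"; "b" is called by nobody.
ranks = pagerank({"a": {"c"}, "b": {"c"}, "c": {"a"}})
print(sorted(ranks, key=ranks.get, reverse=True))
```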
  14. 14. Strongly Connected Component (SCC) Scale-free networks exhibit the presence of an SCC. The largest SCC is significantly larger than the second-largest SCC. (Also observed in the WWW graph.)
  15. 15. Shape of CALL Graphs <ul><li>Build a generative model to study and predict usage growth in a new region. </li></ul><ul><li>Reach experiments are used to obtain the shape of the network. </li></ul><ul><li>Structure based on Node Distribution </li></ul><ul><li>Spot all the connected components, place them spatially with their interconnections, and identify the shape. </li></ul><ul><li>Use Random Start Breadth First Search. </li></ul><ul><li>The experiment collected a set of random sample nodes and computed the reach of each of these nodes. </li></ul>
  16. 16. <ul><li>While ‘reachability’ of a node v means whether v is reachable from another node u, we use ‘reach’ of v to mean the set of nodes (or its cardinality) reachable from v (nodal analysis). </li></ul><ul><li>See Table IV </li></ul><ul><ul><li>Reach (R of a node u) is the number of nodes reached in a BFS starting from u. </li></ul></ul><ul><ul><li>Percentage Reach (P = R/N) is the percentage of nodes reached, relative to the total number of nodes N in the graph. </li></ul></ul><ul><ul><li>Reach Probability (p_R) denotes the percentage probability that a given node has reach R. </li></ul></ul>
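The three quantities defined on this slide can be computed as in the sketch below. The function names are hypothetical, and one convention is assumed: the start node itself is not counted in its own reach (the slide does not say either way).

```python
from collections import Counter, deque


def reach(graph, start):
    """Number of nodes reachable from start by BFS, not counting start itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) - 1


def reach_statistics(graph, sample):
    """Return (R, P, p_R) for the sampled start nodes.

    R maps node -> reach, P maps node -> percentage reach (100 * R / N),
    p_R maps reach value -> percentage of sampled nodes with that reach.
    """
    n = len(graph)
    reaches = {u: reach(graph, u) for u in sample}
    percentage = {u: 100.0 * r / n for u, r in reaches.items()}
    counts = Counter(reaches.values())
    probability = {r: 100.0 * c / len(reaches) for r, c in counts.items()}
    return reaches, percentage, probability


graph = {"a": {"b"}, "b": {"c"}, "c": {"b"}, "d": set()}
print(reach_statistics(graph, sample=list(graph)))
```

For a random-start experiment on a large graph, `sample` could be drawn with `random.sample(list(graph), k)` instead of using every node.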
  17. 17. Bow-Tie WWW <ul><li>Reach falls either in 1-6 or in 1022575-1022586. </li></ul><ul><li>Massively connected component CC (nodes having reach exactly equal to 1022575) </li></ul><ul><ul><li>Entry component (nodes having reach more than 1022575) </li></ul></ul><ul><ul><li>Exit component (nodes having reach less than 6) </li></ul></ul><ul><ul><li>Disconnected components. </li></ul></ul>
  18. 18. <ul><li>Explain Table V. </li></ul><ul><li>The sizes of IN, SCC and OUT for the WWW are nearly of the same order (44 million, 56 million, and 44 million respectively). </li></ul><ul><li>For our graphs, the SCC is often an order of magnitude larger than IN, and OUT is often nearly twice that of IN (IN, SCC and OUT of 124801, 755592 and 266984 respectively). </li></ul><ul><li>The Bow-Tie model does not characterize our graphs. </li></ul>
  19. 19. Structure based on Edge Density <ul><li>Examine the number of vertices in the various regions (IN, OUT, etc.). </li></ul><ul><li>From the BFS experiment, we know that starting from a particular node, the reach is either huge (>1022575) or very low (< 6). </li></ul><ul><li>We collected the nodes whose reach is very high. These are the nodes of the SCC and IN regions. </li></ul><ul><li>Starting from nodes with high reach, we collected the nodes that are reachable. These nodes belong to the SCC and OUT regions. </li></ul><ul><li>We intersected these two sets to isolate SCC, IN and OUT, and extracted several edge-induced subgraphs, which are defined in Table VI. </li></ul>
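The set intersection described above can be sketched as follows. This is a variant rather than the authors' exact procedure: instead of thresholding on reach, it takes a seed node assumed to lie in the giant SCC, computes the forward-reachable set (SCC plus OUT) and the backward-reachable set (SCC plus IN), and intersects them. The function names and the `seed` parameter are hypothetical.

```python
from collections import deque


def reachable(graph, start):
    """Set of nodes reachable from start (including start) by BFS."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def reverse(graph):
    """Graph with every edge flipped; every node must be a key of graph."""
    rev = {u: set() for u in graph}
    for u, successors in graph.items():
        for v in successors:
            rev[v].add(u)
    return rev


def split_regions(graph, seed):
    """Split nodes into SCC, IN, OUT and OTHER relative to the seed's SCC."""
    forward = reachable(graph, seed)             # SCC and OUT
    backward = reachable(reverse(graph), seed)   # SCC and IN
    scc = forward & backward
    return {
        "SCC": scc,
        "IN": backward - scc,
        "OUT": forward - scc,
        "OTHER": set(graph) - forward - backward,
    }


graph = {"i": {"s1"}, "s1": {"s2"}, "s2": {"s1", "o"}, "o": set(), "x": set()}
print(split_regions(graph, seed="s1"))
```

With the node sets in hand, an edge-induced subgraph such as IN-SCC is simply the set of edges whose source lies in IN and whose target lies in SCC.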
  20. 21. The Left part and Right part columns capture the number of nodes in the two sets of each bipartite graph. The Edge ratio column reports the ratio of edges in a particular component to the edges of the IN-SCC region.
  21. 22. CALL Graph as a Treasure Hunt Model
  22. 23. Temporal Analysis <ul><li>How some of the structural properties of these call graphs vary with time. </li></ul><ul><li>For regions B and C, which had one week’s data, the researchers looked at the cumulative CDRs at each of the seven days. </li></ul><ul><li>For regions A and D, which had one month’s data, they looked at seven time points at intervals of four days each. </li></ul><ul><li>no noise...celebration... </li></ul>
  23. 24. Degree Distribution (degree increases with time)
  24. 25. Preferential Attachment: in the network, nodes with higher degree have a stronger ability to grab new links. <ul><li>The authors first found the in-degrees and out-degrees of nodes on the first day and then, for the same set of nodes, the average of their in-degrees and out-degrees on the seventh day. </li></ul>
  25. 26. Neighborhood Distribution <ul><li>Gives an indication of the effective diameter of a graph. </li></ul><ul><li>The plot gives insight into how the diameter of the call graphs changes with time. </li></ul><ul><li>The maximum distance between any two nodes in the graph is decreasing with time. </li></ul><ul><li>This decreasing-diameter phenomenon is also observed in the WWW. </li></ul>
  26. 27. Cliques <ul><li>On the first day itself, the largest clique has size 7. By the fifth day, a clique of size 12 has formed. </li></ul><ul><li>There is a linear increase in the number of cliques of smaller sizes each day. </li></ul><ul><li>No cliques of size greater than 12 are formed in the last two days. </li></ul>
  27. 28. Strongly Connected Component <ul><li>Fraction of nodes present in the largest SCC increases rapidly with time. </li></ul><ul><li>Fraction of nodes present in SCCs of the smallest sizes is decreasing with time. </li></ul><ul><li>CALL graphs show a tendency of greater accumulation into a single SCC over time by taking in nodes from the smaller components. </li></ul>
  28. 29. Treasure Hunt Model <ul><li>The number of edges in the maze increases very rapidly. This is expected, since the percentage of nodes in the SCC was observed to be increasing. </li></ul><ul><li>The sizes of the in-tunnel and out-tunnel are also increasing, but not as rapidly as the maze. </li></ul><ul><li>The sizes of the treasure and entry components are decreasing. Shortcuts remain constant with time. </li></ul>
  29. 30. <ul><ul><li>The maze is getting bulkier by sucking in edges from the side components i.e. the entry on the one end and the treasure on the other end. </li></ul></ul>
  30. 31. <ul><li>The ratios of edges in the various components with respect to the edges in the in-tunnel component were similar for the four regions. </li></ul><ul><li>The fraction of edges in the maze is increasing. </li></ul><ul><li>The fractions of edges in entry and treasure are decreasing. </li></ul><ul><li>The fractions of edges in shortcuts and out-tunnel remain almost constant. </li></ul>
  31. 32. Will components collapse into a single large maze? <ul><li>Densification -- No. </li></ul><ul><li>New people who join the network initially make or receive only a few calls and hence are part of the IN or the OUT region. </li></ul><ul><li>Over time they make and receive more calls, which pulls them into the SCC. </li></ul><ul><li>The constantly high influx of new nodes into the IN and OUT regions argues against the total vanishing of the treasure and entry regions. </li></ul>
  32. 33. Conclusion <ul><li>Systematic approach to analyzing network topologies </li></ul><ul><li>Shape of the CALL Graph follows the Treasure Hunt Model </li></ul><ul><li>Evolution of the CALL Graph over time </li></ul><ul><li>The SMS Graph is social (more reciprocative, larger clique sizes) (skipped in this presentation). </li></ul>Thank you
  33. 34. SMS Graph Analysis (Appendix)