Large Graph Analysis Using the GMine System


A small effort to illustrate the paper from CMU.

  1. Large Graph Analysis in the GMine System
     By Saurabh Jogalekar (TE C 51)
     Seminar guide: Prof. S. V. Jagtap
  2. Large Graph
     • A large graph is a graph with hundreds of thousands of nodes and on the order of a million edges.
     • Social networks are the best-known example: friend lists, recommendations, likes and comments all form large graphs.
     • Other examples include web graphs (web pages pointing to each other through hyperlinks), bipartite graphs, and computer communication graphs, in which IP addresses send packets to other IP addresses.
  3. Representing Graphs
     The three techniques traditionally used to represent graphs are:
       1. Adjacency matrix
       2. Adjacency list
       3. Binary decision diagrams
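A minimal Python sketch of the first two representations, assuming an undirected graph whose nodes are numbered 0 to n-1 (the function names are illustrative, not taken from the paper). The matrix needs n × n cells no matter how many edges exist: for n = 100,000 nodes that is already 10^10 cells, whereas the adjacency list grows with the number of edges.

```python
def adjacency_matrix(n, edges):
    """n x n 0/1 matrix; memory grows as n**2 even for sparse graphs."""
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[u][v] = 1
        matrix[v][u] = 1  # undirected: store both directions
    return matrix


def adjacency_list(n, edges):
    """Map each node to its neighbours; memory grows with n + number of edges."""
    adj = {node: [] for node in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


edges = [(0, 1), (1, 2)]
print(adjacency_matrix(3, edges))  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(adjacency_list(3, edges))    # {0: [1], 1: [0, 2], 2: [1]}
```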
  4. Representing Large Graphs
     • Representing large graphs is challenging because the huge number of nodes and edges reduces the overall visibility of the graph; as a result, the traditional representation methods fail.
     [Figure: example of a large graph]
  5. Large Graph Representation
     • Another problem with large graphs is that acquiring (mining) the required nodes and edges takes several complex calculations.
     • To overcome these hindrances, a graph summarization method called CEPS (CEntre-Piece Subgraph) is used.
  6. Graph-Tree
     • CEPS is applied on top of the graph-tree, a hierarchical representation of the graph made up of a SuperGraph, SuperNodes and SuperEdges.
     [Figure: formation of the graph-tree]
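The hierarchy can be sketched as a tree in which every node (a SuperNode) covers a set of graph vertices and its children split that set. In this Python sketch the vertices are simply cut into at most k contiguous chunks; that split is a placeholder assumption, since a real system would use a graph partitioning algorithm that keeps densely connected vertices together. `GraphTreeNode`, `build_graph_tree`, `k` and `leaf_size` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class GraphTreeNode:
    """One SuperNode of a hypothetical graph-tree."""
    vertices: list                                # coverage: graph nodes under this SuperNode
    children: list = field(default_factory=list)  # empty for leaves


def build_graph_tree(vertices, k=2, leaf_size=4):
    """Recursively split `vertices` into at most k chunks until each leaf has <= leaf_size nodes.

    Assumes k >= 2 and leaf_size >= 1 so that the recursion terminates.
    """
    node = GraphTreeNode(sorted(vertices))
    if len(node.vertices) > leaf_size:
        size = -(-len(node.vertices) // k)  # ceiling division
        for start in range(0, len(node.vertices), size):
            chunk = node.vertices[start:start + size]
            node.children.append(build_graph_tree(chunk, k, leaf_size))
    return node


def leaves(node):
    """Leaf SuperNodes in left-to-right order."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


tree = build_graph_tree(range(10), k=2, leaf_size=3)
print([leaf.vertices for leaf in leaves(tree)])  # [[0, 1, 2], [3, 4], [5, 6, 7], [8, 9]]
```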
  7. Filling a Graph-Tree
     Algorithm FillGraphTree(ptr)
       If ptr is a leaf then
         set ptr->filepath to the file of the corresponding subgraph
       Else
         for each child of ptr do:
           FillGraphTree(child)
         Instantiate a SuperEdge for each pair of children; find matches between the unresolved edges of each pair and store them in the SuperEdges
         Use the external edges to determine ptr's open nodes
         Propagate the unresolved external edges to the parent
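The FillGraphTree steps can be sketched in Python under several assumptions: leaves are created knowing which vertices they cover, the whole edge list fits in memory (a real implementation would read each leaf's edges from its file), the children of a node cover disjoint vertex sets, and an edge is "unresolved" for a node when exactly one endpoint lies inside it. "Open nodes" are taken here to be the vertices that still have such an edge. The class, field names and file-naming scheme are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Optional


@dataclass
class TreeNode:
    """Hypothetical graph-tree node; leaves are created with the vertices they cover."""
    name: str
    children: list = field(default_factory=list)
    vertices: set = field(default_factory=set)       # coverage (filled bottom-up for inner nodes)
    filepath: Optional[str] = None                   # set on leaves only
    unresolved: list = field(default_factory=list)   # edges with exactly one endpoint inside
    open_nodes: set = field(default_factory=set)     # inside endpoints of unresolved edges
    super_edges: dict = field(default_factory=dict)  # (child name, child name) -> matched edges


def crosses(edge, vertices):
    """True when exactly one endpoint of `edge` lies in `vertices`."""
    u, v = edge
    return (u in vertices) != (v in vertices)


def fill_graph_tree(ptr, edges):
    if not ptr.children:
        # Leaf: record the file holding this subgraph (naming scheme is hypothetical)
        # and collect the edges that leave it.
        ptr.filepath = f"{ptr.name}.txt"
        ptr.unresolved = [e for e in edges if crosses(e, ptr.vertices)]
    else:
        for child in ptr.children:
            fill_graph_tree(child, edges)
            ptr.vertices |= child.vertices
        # One SuperEdge per pair of children: the unresolved edges of one child
        # that land in the other child's coverage.
        for a, b in combinations(ptr.children, 2):
            ptr.super_edges[(a.name, b.name)] = [
                e for e in a.unresolved if e[0] in b.vertices or e[1] in b.vertices
            ]
        # Edges that still leave ptr's coverage are propagated to the parent.
        ptr.unresolved = [
            e for child in ptr.children for e in child.unresolved
            if crosses(e, ptr.vertices)
        ]
    # Open nodes: vertices of ptr that still have an external edge.
    ptr.open_nodes = {x for e in ptr.unresolved for x in e if x in ptr.vertices}


# Tree: root -> [X, C], X -> [A, B]
a = TreeNode("A", vertices={1, 2})
b = TreeNode("B", vertices={3, 4})
x = TreeNode("X", children=[a, b])
root = TreeNode("root", children=[x, TreeNode("C", vertices={5})])
fill_graph_tree(root, [(1, 2), (2, 3), (4, 5)])
print(x.super_edges)     # {('A', 'B'): [(2, 3)]}
print(x.open_nodes)      # {4}
print(root.super_edges)  # {('X', 'C'): [(4, 5)]}
```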
  8. SuperNode and GraphNode Connectivity
     • SuperNode connectivity between two SuperNodes is the set of edges whose source belongs to the coverage of the first SuperNode and whose target belongs to the coverage of the second.
     • GraphNode connectivity is the set of edges connecting a graph node to graph nodes outside the coverage of the SuperNode that contains it.
     • Both kinds of connectivity are used to reconstruct the graph from its hierarchical representation.
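Both definitions translate directly into set comprehensions. This Python sketch assumes the edges are given as (source, target) pairs and each SuperNode's coverage as a set of graph nodes; for GraphNode connectivity it counts an edge in either direction. The function names are hypothetical.

```python
def supernode_connectivity(edges, coverage_a, coverage_b):
    """Edges whose source is covered by SuperNode A and whose target is covered by SuperNode B."""
    return {(s, t) for s, t in edges if s in coverage_a and t in coverage_b}


def graphnode_connectivity(edges, node, coverage):
    """Edges linking `node` to graph nodes outside `coverage`, the SuperNode that contains it."""
    return {
        (s, t)
        for s, t in edges
        if (s == node and t not in coverage) or (t == node and s not in coverage)
    }


edges = [(1, 3), (2, 4), (3, 1), (1, 2), (2, 5)]
print(sorted(supernode_connectivity(edges, {1, 2}, {3, 4})))  # [(1, 3), (2, 4)]
print(sorted(graphnode_connectivity(edges, 2, {1, 2})))       # [(2, 4), (2, 5)]
```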
  9. Motivation behind CEPS
     • Using a graph-tree, a hierarchical representation of the SuperGraph, lessens the problem of inspecting large graphs.
     • However, reaching a subgraph sometimes retrieves much more information than is required. CEPS is used to overcome this shortcoming.
  10. CEPS
     • A centre-piece subgraph contains the collection of paths connecting a subset of graph nodes of interest (the query nodes).
     • CEPS helps interaction by significantly reducing the number of nodes and edges to be inspected.
     • CEPS uses the Random Walk with Restart method to find the "importance" score between two nodes.
  11. Goodness Score
     • The goodness score is calculated with the Random Walk with Restart (RWR) method. A matrix A(i, j) stores the steady-state probability of each node j with respect to query node i.
     [Figure: example 13-node graph with each node labelled by its RWR score]
     Individual score matrix (RWR score of each node with respect to query nodes Q1, Q2 and Q3):
       Node     Q1       Q2       Q3
       1        0.5767   0.0088   0.0088
       2        0.1235   0.0076   0.0076
       3        0.0283   0.0283   0.0283
       4        0.0076   0.1235   0.0076
       5        0.0088   0.5767   0.0088
       6        0.0076   0.0076   0.1235
       7        0.0088   0.0088   0.5767
       8        0.0333   0.0024   0.1260
       9        0.1260   0.0024   0.0333
       10       0.1260   0.0333   0.0024
       11       0.0333   0.1260   0.0024
       12       0.0024   0.1260   0.0333
       13       0.0024   0.0333   0.1260
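One row of A(i, j) can be approximated by power iteration. This pure-Python sketch assumes an unweighted, undirected graph given as an adjacency list and a walker that at each step moves to a random neighbour with probability c and jumps back to the query node with probability 1 - c. The value c = 0.9, the 100 iterations and the name `rwr_scores` are assumptions, not parameters taken from the paper.

```python
def rwr_scores(adj, query, c=0.9, iterations=100):
    """Approximate steady-state RWR probabilities of every node with respect to `query`.

    adj: dict mapping each node to a list of its neighbours (undirected, unweighted).
    c: probability of following an edge; with probability 1 - c the walk restarts at `query`.
    """
    scores = {node: 0.0 for node in adj}
    scores[query] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in adj}
        for u, neighbours in adj.items():
            if neighbours:
                share = c * scores[u] / len(neighbours)  # spread u's mass evenly
                for v in neighbours:
                    nxt[v] += share
        nxt[query] += 1 - c  # restart mass returns to the query node
        scores = nxt
    return scores


# Path graph 0 - 1 - 2, query node 0.
path = {0: [1], 1: [0, 2], 2: [1]}
scores = rwr_scores(path, query=0)
print({node: round(s, 3) for node, s in scores.items()})  # {0: 0.313, 1: 0.474, 2: 0.213}
```

Every iteration preserves total probability mass when no node is isolated, so the scores for one query always sum to 1 in that case.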
  12. EXTRACT Algorithm
     • The EXTRACT algorithm takes as input the weighted graph W, the importance scores of all nodes and the budget b, and produces as output a small, unweighted, undirected graph H.
     • It uses dynamic programming or a greedy method.
       1. Initialize the output graph H to be empty.
       2. Let len be the maximum allowable path length.
       3. While H is not big enough (has not reached the budget b):
          3.1. Pick a destination node pd.
          3.2. For each active source node qi with respect to pd:
               3.2.1. Discover a key path P(qi, pd).
               3.2.2. Add P(qi, pd) to H.
       4. Output the final H.
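A simplified Python sketch of the loop above. Assumptions: the graph is an unweighted adjacency list, `score` holds the importance score of each node, the budget counts nodes, every query node is treated as an active source, the destination is the highest-scoring node not yet in H, and the key path is the path of at most `max_len` edges that maximises the total score of nodes not already in H. The key path is found here by brute-force depth-first search in place of dynamic programming, which is only practical on small graphs. All names are hypothetical.

```python
def key_path(adj, score, src, dst, max_len, in_h):
    """Simple path src -> dst with at most max_len edges that maximises the summed
    score of nodes not already in H. Brute-force DFS: a stand-in, small graphs only."""
    best, best_gain = None, float("-inf")

    def dfs(node, path):
        nonlocal best, best_gain
        if node == dst:
            gain = sum(score[v] for v in path if v not in in_h)
            if gain > best_gain:
                best, best_gain = list(path), gain
            return
        if len(path) - 1 == max_len:  # path already uses max_len edges
            return
        for nb in adj[node]:
            if nb not in path:  # keep the path simple
                path.append(nb)
                dfs(nb, path)
                path.pop()

    dfs(src, [src])
    return best


def extract(adj, score, sources, budget, max_len):
    """Greedy EXTRACT sketch: grow H with key paths until it has `budget` nodes."""
    h_nodes, h_edges, tried = set(), set(), set()  # step 1: H starts empty
    while len(h_nodes) < budget:                     # step 3
        candidates = [v for v in adj if v not in h_nodes and v not in tried]
        if not candidates:
            break
        pd = max(candidates, key=lambda v: score[v])  # 3.1: destination node
        tried.add(pd)
        for q in sources:  # 3.2: every source treated as active (simplification)
            path = key_path(adj, score, q, pd, max_len, h_nodes)  # 3.2.1
            if path:  # 3.2.2: add P(q, pd) to H
                h_nodes.update(path)
                h_edges.update(frozenset(e) for e in zip(path, path[1:]))
    return h_nodes, h_edges  # step 4


adj = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
score = {0: 0.3, 1: 0.5, 2: 0.4, 3: 0.3, 4: 0.05}
nodes, edges = extract(adj, score, sources=[0, 3], budget=4, max_len=3)
print(sorted(nodes))  # [0, 1, 2, 3]
```

Because a whole path is added at once, H can slightly overshoot the budget; the low-scoring node 4 is left out in the example.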
  13. GMine System
     • GMine is a graph visualisation tool for handling large graphs.
     • The tool uses graph-trees to offer good, readable graph exploration.
     • As the user interacts with the visualisation, the system keeps track of the connectivity among communities of nodes at different levels of the partitioned graph.
     • When the user changes the focus position on the tree structure, the system computes and presents contextual information on demand.
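On-demand computation with reuse can be sketched with memoisation: the connectivity between two communities is computed only the first time the user's focus requires it and is then served from a cache. The partition, the edge list, treating edges as undirected, and the names `connectivity` and `focus_on` are all hypothetical; `functools.lru_cache` is the standard-library memoisation decorator.

```python
from functools import lru_cache

# Hypothetical communities (SuperNodes) at one level of the partitioned graph.
COVERAGE = {"A": frozenset({1, 2}), "B": frozenset({3, 4}), "C": frozenset({5})}
EDGES = [(1, 2), (1, 3), (2, 4), (4, 5)]


@lru_cache(maxsize=None)
def connectivity(a, b):
    """Undirected edges between communities a and b, computed on first request only."""
    ca, cb = COVERAGE[a], COVERAGE[b]
    return frozenset(
        (s, t) for s, t in EDGES
        if (s in ca and t in cb) or (s in cb and t in ca)
    )


def focus_on(community):
    """Contextual information for the focused community: its links to every other community."""
    return {
        other: connectivity(*sorted((community, other)))
        for other in COVERAGE
        if other != community
    }


focus_on("B")  # computes A-B and B-C
focus_on("A")  # reuses A-B from the cache, computes A-C
print(connectivity.cache_info())  # CacheInfo(hits=1, misses=3, maxsize=None, currsize=3)
```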
  15. References
     • Jose F. Rodrigues Jr., Hanghang Tong, Jia-Yu Pan, Agma J. M. Traina, Caetano Traina Jr. and Christos Faloutsos, "Large Graph Analysis in the GMine System", IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 1, January 2013.
     • Christos Faloutsos, Jose F. Rodrigues Jr., Hanghang Tong, Agma J. M. Traina, "GMine: A System for Scalable, Interactive Graph Visualization and Mining", IEEE/ACM International Conference, pages 1195–1198, Oconomowoc, Wisconsin, USA.
     • Hanghang Tong, Christos Faloutsos, "Center-Piece Subgraphs: Problem Definition and Fast Solutions", Carnegie Mellon University, Research Track Paper, pages 404–414 (Carnegie Mellon University site).
     • Jose F. Rodrigues Jr., Agma J. M. Traina, Caetano Traina Jr., Caio Cesar Moreli, "GMine: Interactive Browsing of Large Graphs", Workshop on Information Visualization and Analysis in Social Networks (WIVA), 2008.
  16. Questions / Queries?
  17. Thank You