Visualize Big Graph Data

  • You might also take a look at BioFabric (www.BioFabric.org) for visualizing big graphs. Depicting nodes as points, which is the traditional technique shown here, does not scale. BioFabric depicts nodes as lines, thus allowing edges to be placed in a rational and organized way. Quick demo of BioFabric is at: http://www.biofabric.org/gallery/pages/SuperQuickBioFabric.html
Transcript

  • 1. MATHIEU BASTIAN, DATA VISUALIZATION SUMMIT, SAN FRANCISCO, APRIL 11-12, 2013
  • 2. BIG GRAPH DATA
    • The story of big graph data is just starting
    • BIG GRAPH DATA
  • 3. BIG GRAPH DATA
    • The story of big graph data is just starting
    • BIG GRAPH DATA: BIG DATA, GRAPHS
  • 4. BIG GRAPH DATA
    • The story of big graph data is just starting
    • BIG GRAPH DATA: BIG DATA, GRAPHS
    DISTRIBUTED SYSTEMS COMPLEX STORAGE DATABASES INDEXATION LARGE DATASETS ALGORITHM CLOUD COMPUTING HADOOP ANALYTICS REAL-TIME VISUALIZATION
  • 6. BIG DATA
    • “The Petabyte age”
    • All industries and domains can leverage big data: Health, Government, Finance, Technology
    • Big Data => Big Problems
    • Focusing on building the technology to handle big data, and big graph data (e.g., graph databases)
    • Seeking efficient analysis of ever more complex systems
  • 7. GRAPHS
    • Graphs are everywhere, and it’s easy to collect graph data
    • The world is more complex and interconnected than we thought
    Source: Collective Dynamics of Small-World Networks, D Watts, S Strogatz, Nature 393, 440-442
  • 8. NETWORK SCIENCE
    • The study of graphs has been exploding in the last 15 years
    • Networks have properties and patterns one can study
    • Robustness – How resistant is a network to random attacks?
    • Contagion – How fast does a disease or gossip spread in a network?
    • Communities – How many communities exist in a network?
    • Centrality – Who is the most central individual in a network?
    • If you read one of these books, you understand Network Science
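The centrality question on this slide can be made concrete with a small sketch. The slide does not name a particular measure, so this example assumes degree centrality (the share of other nodes a node is directly connected to), and the toy edge list and names are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected edge list."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical toy network, not taken from the talk
edges = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat")]
centrality = degree_centrality(edges)
most_central = max(centrality, key=centrality.get)
```

Here `most_central` is the node with the most direct connections. Other measures, such as betweenness or closeness, can rank the same nodes differently.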
  • 9. GRAPHS HELP SOLVE PROBLEMS
    • Saddam Hussein Network (2003)
    C. Wilson. Searching for Saddam: a five-part series on how the US military used social networking to capture the Iraqi dictator. 2010. www.slate.com/id/2245228/
  • 10. GRAPHS HELP SOLVE PROBLEMS
    • Predicting and controlling infectious disease
    Naoki Masuda, Petter Holme - Predicting and controlling infectious disease epidemics using temporal networks. http://f1000.com/prime/reports/b/5/6/
    Haraldsdottir S, Gupta S, Anderson RM: Preliminary studies of sexual networks in a male homosexual community in Iceland. J Acquir Immune Defic Syndr. 1992, 5:374–81.
  • 11. GRAPHS HELP SOLVE PROBLEMS
    • Recommendation systems
    Credit: http://markorodriguez.com/2011/09/22/a-graph-based-movie-recommender-engine/
  • 12. GRAPHS HELP SOLVE PROBLEMS
    • Recipe recommendation using ingredient networks
    Credit: http://www.ladamic.com/wordpress/?p=294
  • 13. GRAPHS HELP SOLVE PROBLEMS
    • Power grid
    Credit: http://www.npr.org/templates/story/story.php?storyId=110997398
  • 14. SMALL GRAPHS
    • The famous “Zachary’s Karate Club” study (1977) involved only 34 nodes
    • It could be drawn by hand on paper
    Zachary’s Karate Club (1977): W. W. Zachary, An information flow model for conflict and fission in small groups, Journal of Anthropological Research 33, 452-473 (1977).
  • 15. MEDIUM GRAPHS
    • Your own Facebook or LinkedIn social network
    • The Harlem Shake: Anatomy of a Viral Meme
    Gilad Lotan. http://www.huffingtonpost.com/gilad-lotan/the-harlem-shake_b_2804799.html
  • 16. LARGE GRAPHS
    • The Internet Map (~350,000 domains)
    • DBpedia (~290M relationships)
    • Friendster Social Network dataset* (1.8B edges)
    Internet Map (http://internet-map.net)
    * http://snap.stanford.edu/data/index.html
  • 17. IMPLICIT GRAPHS
    • Graphs can be explicit or implicit
    • Explicit: The network exists in nature (social networks, food webs, airline networks)
    • Implicit: The network is derived from other data (word networks, co-authorship)
    • Example of an implicit graph:
    • Each document in a set has a set of tags
    • One can create a link when two tags are on the same document
    • Aggregate all links across all documents
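The tag example on this slide can be sketched in a few lines. This is a minimal illustration, assuming documents are given as a mapping from a document id to a list of tags; the function name `cooccurrence_edges` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(documents):
    """Count, for every unordered tag pair, how many documents contain both."""
    weights = Counter()
    for tags in documents.values():
        # sorted(set(...)) drops duplicate tags and gives each pair one canonical order
        for a, b in combinations(sorted(set(tags)), 2):
            weights[(a, b)] += 1
    return weights


# Hypothetical sample data
documents = {
    "doc1": ["graph", "visualization", "gephi"],
    "doc2": ["graph", "visualization"],
    "doc3": ["graph", "hadoop"],
}
edges = cooccurrence_edges(documents)
```

Each key of `edges` is a link between two tags, and the count is the link weight aggregated across all documents. On large collections the number of pairs per document grows quadratically with the number of tags, so some filtering of rare tags or low-weight links is usually needed before drawing the result.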
  • 18. SIMILARITY GRAPHS
    • Graphs of all the co-occurrences between LinkedIn Skills (2011)
  • 19. VISUALIZATION
    • Visualization and statistics are the two basic toolkits one can use on graphs
    • Complex questions are asked when studying graphs
    • Easy (Excel can do this!)
        • Min, max, average, quartiles
        • Exact queries, search
    • Harder (Visualization can do this!)
        • Patterns, trends, correlations
        • Changes over time, context
        • Anomalies, data errors
        • Geographical representation
  • 20. GRAPH VISUALIZATION
    • Due to the size of graphs and the complexity of questions, visualization is the natural tool to understand what’s going on
    “We are more easily persuaded by the reasons we ourselves discover than by those which are given to us by others.” Blaise Pascal
    • Let me play with the data! Direct manipulation
  • 21. DATA EXPLORATION AND INTERACTION
    • Use visualization and statistics to discover new hypotheses
    • Exploratory data analysis
    “The greatest value of a picture is when it forces us to notice what we never expected to see.” John Tukey
    • The user interface is centered around the human
    • Empowers the user to understand the structure and patterns in the data
    • The machine augments the human
    • How?
    • Overview and details, zoom and pan interface
    • Interactive, direct-manipulation
  • 22. MAP YOUR DATA
    • Iterative process to transform relational data into a map
    • Use color, size and position to highlight, group and set up a hierarchy
  • 23. FROM INFORMATION TO KNOWLEDGE
    • Exploring networks interactively & iterating often provide “Eureka” moments for domain experts
  • 24. BIG GRAPH DATA
    • Big graph data doesn’t necessarily mean you’re visualizing or analyzing a large graph
    • Small graphs can be extracted from large graphs and analyzed
    • Small graphs can be extracted from non-graph data as well
    • Graphs are just nodes and relationships after all
    • Example: Adverse Drug Event Analysis with Hadoop, R, and Gephi (Josh Wills, Cloudera, 2012)
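One common way to extract a small graph from a large one is to keep only the neighborhood of a node of interest. The slide does not prescribe a method, so the sketch below assumes an undirected edge list and a breadth-first search limited to `radius` hops; the function name `ego_network` and the data are hypothetical.

```python
from collections import defaultdict, deque


def ego_network(edges, center, radius=1):
    """Return the edges whose endpoints both lie within `radius` hops of `center`."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    # Breadth-first search that stops expanding once the radius is reached
    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)

    kept = set(distance)
    return [(u, v) for u, v in edges if u in kept and v in kept]


# Hypothetical toy graph
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e"), ("e", "b"), ("d", "f")]
small = ego_network(edges, "a", radius=1)
```

With `radius=1` the result contains only `a`, its direct neighbors, and the links among them. For graphs that do not fit in memory, the same idea would have to run inside the storage layer (a graph database or a batch job) rather than on an in-memory edge list.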
  • 25. GEPHI
    • Built to solve large graph visualization problems
    • Open source tool for Windows, Mac OS X and Linux
    • Large international community involved
    • The latest version has been downloaded more than 100,000 times
    • Extensible with plug-ins
    • Available at http://gephi.org
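To get a graph into a tool like Gephi, a simple route is an edge table in CSV form. The sketch below assumes the layout read by Gephi's spreadsheet importer (a header row with Source and Target columns and an optional Weight column); the exact import options depend on the Gephi version, and the function name and sample edges are hypothetical.

```python
import csv
import io


def edges_to_csv(weighted_edges):
    """Serialize {(source, target): weight} as a Source,Target,Weight edge table."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(weighted_edges.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()


# Hypothetical weighted edges
weighted_edges = {("graph", "visualization"): 2, ("gephi", "graph"): 1}
csv_text = edges_to_csv(weighted_edges)
```

Writing `csv_text` to a file gives an edge list that can be opened from the import dialog. Richer formats such as GEXF carry node attributes and dynamics as well, but a plain edge table is usually enough for a first look.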
  • 26. GEPHI
    DATA EDITION, VISUAL MAPPING, FILTER, VISUALIZATION, STATISTICS, LAYOUT, TIMELINE
  • 27. SIGMA.JS
    • Open-source lightweight JavaScript library to draw graphs
    • Uses HTML5 Canvas
    • Dynamically displays graphs that can be generated on the fly
    • Available at http://sigmajs.org
    Sigma.js v0.1
  • 28. SUMMARY
    • Big graph data = Relational Big Data
    • Graphs are everywhere!
    • Graphs have fascinating structure and patterns one can analyze
    • Visualization is a natural tool for such complex data and complex questions
    • On graphs, visualization done right allows interaction and iteration. Play.
    • The hard part is to extract a small or medium graph from big data
    • Open source tools like Gephi or Sigma.js are a good start
  • 29. Become a graph evangelist! QUESTIONS? Mathieu Bastian (@mathieubastian)
  • 30. REFERENCES & LINKS
    • Join the Social Network Analysis class by Lada Adamic on Coursera. https://www.coursera.org/course/sna
    • Sigma.js, Alexis Jacomy et al. http://sigmajs.org
    • Support the Gephi Consortium. http://consortium.gephi.org
    • Linked: How Everything Is Connected to Everything Else and What It Means, Albert-Laszlo Barabasi. http://www.amazon.com/gp/product/0452284392/
    • Computational Information Design, Ben Fry (2004). http://benfry.com/phd/
    • Six Degrees: The Science of a Connected Age, Duncan J. Watts. http://www.amazon.com/gp/product/0393325423/
    • The Atlas of Economic Complexity, Harvard's Center for International Development (CID) and the MIT Media Lab. http://atlas.media.mit.edu/
    • Nexus: Small Worlds and the Groundbreaking Science of Networks, Mark Buchanan. http://www.amazon.com/gp/product/0393324427
    • The Mesh of Civilizations and International Email Flows, Bogdan State, Patrick Park, Ingmar Weber, Yelena Mejova, Michael Macy. http://arxiv.org/abs/1303.0045
    • Connected: The Surprising Power of Our Social Networks and How They Shape Our Lives, Nicholas A. Christakis and James H. Fowler. http://www.amazon.com/dp/product/0316036137
    • The Human Disease Network, Goh K-I, Cusick ME, Valle D, Childs B, Vidal M, Barabási A-L (2007). http://www.pnas.org/content/104/21/8685.full
    • Atelier Iceberg – Gephi. http://www.slideshare.net/ateliericeberg/gephi-17680699
    • Adding Value through graph analysis using Titan and Faunus, Matthias Broecheler. http://www.slideshare.net/knowfrominfo/titan-talk-ebaymarch2013
    • What does your intranet look like? http://intranetdiary.blogspot.co.uk/2012/11/network-visualisation.html
    • Recipe recommendation using ingredient networks, Chun-Yuen Teng, Yu-Ru Lin, Lada A. Adamic. http://arxiv.org/abs/1111.3919
    • Network Maps Board on Pinterest, Mathieu Bastian. http://pinterest.com/mathieubastian/network-maps/
    • Network Science Book, Albert-László Barabási. http://barabasilab.neu.edu/networksciencebook
    • US Presidents Inaugural Speeches 1969-2013 Text Network Analysis. http://noduslabs.com/cases/presidents-inaugural-speeches-text-network-analysis/
    • Adverse Drug Event Analysis with Hadoop, R, and Gephi, Cloudera. https://github.com/cloudera/ades
    • 10 Reasons Why We Visualise Data. http://www.slideshare.net/Facegroup/10-reasons-why-we-visualise-data