MATHIEU BASTIAN

DATA VISUALIZATION SUMMIT,
SAN FRANCISCO, APRIL 11-12, 2013
BIG GRAPH DATA
    •  The story of big graph data is just starting
    •  BIG GRAPH DATA


      BIG DATA      GRAPHS


      [Word cloud: DISTRIBUTED SYSTEMS, COMPLEX, STORAGE, DATABASES, INDEXATION,
      LARGE DATASETS, ALGORITHM, CLOUD COMPUTING, HADOOP, ANALYTICS, REAL-TIME,
      VISUALIZATION]


BIG DATA
    •  “The Petabyte age”
    •  All industries and domains can leverage big data




           Health      Government      Finance       Technology

    •  Big Data => Big Problems
    •  Focus on building the technology to handle big data and big
       graph data (e.g., graph databases)
    •  Seeking efficient analysis of ever more complex systems



GRAPHS
    •  Graphs are everywhere, and it’s easy to collect graph data
    •  The world is more complex and interconnected than we thought




        Source: Collective Dynamics of Small-World Networks, D Watts, S Strogatz, Nature 393, 440-442


NETWORK SCIENCE
    •  The study of graphs has been exploding in the last 15 years
    •  Networks have properties and patterns one can study
      •  Robustness – How resistant is a network to random attacks?
      •  Contagion – How fast does a disease or gossip spread in a network?
      •  Communities – How many communities exist in a network?
      •  Centrality – Who is the most central individual in a network?
    •  If you read one of these books, you understand Network Science
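
The centrality question above can be made concrete with the simplest measure, degree centrality (the number of distinct neighbors of each node). What follows is a minimal sketch, assuming an undirected graph given as a list of edges. The function name `degree_centrality` and the toy data are illustrative and do not come from the talk.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {node: number of distinct neighbors} for an undirected edge list."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {node: len(adj) for node, adj in neighbors.items()}


# Hypothetical toy network, for illustration only
edges = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat")]
centrality = degree_centrality(edges)
most_central = max(centrality, key=centrality.get)
print(most_central, centrality[most_central])
```

Degree is only one of several centrality measures; others, such as betweenness or closeness, can rank the same nodes differently.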




GRAPHS HELP SOLVE PROBLEMS
    •  Saddam Hussein Network (2003)




       C. Wilson. Searching for Saddam: a five-part series on how the US military
       used social networking to capture the Iraqi dictator. 2010.
       www.slate.com/id/2245228/.



GRAPHS HELP SOLVE PROBLEMS
    •  Predicting and controlling infectious disease




       Naoki Masuda, Petter Holme - Predicting and controlling infectious disease
       epidemics using temporal networks.
       http://f1000.com/prime/reports/b/5/6/

       Haraldsdottir S, Gupta S, Anderson RM: Preliminary studies of sexual
       networks in a male homosexual community in Iceland. J Acquir Immune
       Defic Syndr. 1992, 5:374–81.




GRAPHS HELP SOLVE PROBLEMS
    •  Recommendation systems






     Credit: http://markorodriguez.com/2011/09/22/a-graph-based-movie-recommender-engine/


GRAPHS HELP SOLVE PROBLEMS
    •  Recipe recommendation using ingredient networks






     Credit: http://www.ladamic.com/wordpress/?p=294


GRAPHS HELP SOLVE PROBLEMS
    •  Power grid






     Credit: http://www.npr.org/templates/story/story.php?storyId=110997398


SMALL GRAPHS
    •  The famous “Zachary’s Karate Club” study (1977) involved only 34
       nodes.
    •  It could be drawn by hand on paper




       Zachary’s Karate Club (1977)

       W. W. Zachary, An information flow model for conflict and fission in small
       groups, Journal of Anthropological Research 33, 452-473 (1977).



MEDIUM GRAPHS
    •  Your own Facebook or LinkedIn social network
    •  The Harlem Shake: Anatomy of a Viral Meme





       Gilad Lotan. http://www.huffingtonpost.com/gilad-lotan/the-harlem-shake_b_2804799.html




LARGE GRAPHS
    •  The Internet Map (~350 000 domains)
    •  DBPedia (~290M relationships)
    •  Friendster Social Network dataset* (1.8B edges)




       Internet Map (http://internet-map.net)
       * http://snap.stanford.edu/data/index.html



IMPLICIT GRAPHS
    •  Graphs can be explicit or implicit
      •  Explicit: The network exists in nature (social networks, food webs,
         airline networks)
      •  Implicit: The network is derived from other data (word networks,
         co-authorship)


    •  Example of an implicit graph:
        •  Each document in a collection has a set of tags
        •  Create a link between two tags whenever they appear on the same document
        •  Aggregate all links across all documents
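
The tag example above can be sketched in a few lines. This is a minimal illustration, assuming each document is simply a list of tag strings; the function name `cooccurrence_edges` and the sample documents are hypothetical. The edge weight is the number of documents in which both tags appear.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(documents):
    """Count, for every pair of tags, the number of documents containing both."""
    weights = Counter()
    for tags in documents:
        # De-duplicate and sort so each pair has one canonical (a, b) order
        for a, b in combinations(sorted(set(tags)), 2):
            weights[(a, b)] += 1
    return weights


# Hypothetical documents, for illustration only
docs = [
    ["graph", "visualization", "gephi"],
    ["graph", "hadoop"],
    ["graph", "visualization"],
]
edges = cooccurrence_edges(docs)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

Keeping the weight, rather than just the presence of a link, makes it possible to filter out weak co-occurrences later.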




SIMILARITY GRAPHS
    •  Graph of all the co-occurrences between LinkedIn Skills (2011)




VISUALIZATION
    •  Visualization and statistics are the two basic toolkits one can use
       on graphs
    •  Complex questions are asked when studying graphs


    •  Easy (Excel can do this!)
      •  Min, max, average, quartiles
      •  Exact queries, search


    •  Harder (Visualization can do this!)
      •  Patterns, trends, correlations
      •  Changes over time, context
      •  Anomalies, data errors
      •  Geographical representation



GRAPH VISUALIZATION
    •  Due to the size of graphs and the complexity of questions,
       visualization is the natural tool to understand what’s going on

                “We are more easily persuaded by the reasons we
                ourselves discover than by those which are given to us by
                others.” Blaise Pascal
                       Let me play with the data!




 Direct manipulation



DATA EXPLORATION AND INTERACTION
    •  Use visualization and statistics to discover new hypotheses
      •  Exploratory data analysis
        “The greatest value of a picture is when it forces us
        to notice what we never expected to see.”

        John Tukey

    •  The user interface is centered around the human
    •  Empowers the user to understand the structure and patterns in
       the data
    •  The machine augments the human
    •  How?
      •  Overview and details, zoom and pan interface
      •  Interactive, direct-manipulation


MAP YOUR DATA
    •  Iterative process to transform relational data into a map




    •  Use color, size and position to highlight, group and set up a
       hierarchy
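
One way to read "use size to set up a hierarchy" is to scale each node's drawn size by a numeric attribute such as its degree. The sketch below is a plain linear mapping under that assumption; the function name `scale_sizes`, the size range, and the sample values are illustrative and are not taken from any particular tool.

```python
def scale_sizes(values, min_size=5.0, max_size=30.0):
    """Linearly map each node's value onto the range [min_size, max_size]."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # All values are equal: give every node the same mid-range size
        return {node: (min_size + max_size) / 2 for node in values}
    span = hi - lo
    return {
        node: min_size + (value - lo) / span * (max_size - min_size)
        for node, value in values.items()
    }


# Hypothetical node degrees, for illustration only
degrees = {"a": 1, "b": 2, "c": 5}
sizes = scale_sizes(degrees)
print(sizes)
```

A linear scale is the simplest choice; for heavily skewed attributes a logarithmic or square-root scale may keep small nodes visible.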




FROM INFORMATION TO KNOWLEDGE
    •  Exploring networks interactively & iterating often provide
       “Eureka” moments for domain experts




                                                           Eureka




BIG GRAPH DATA
    •  Big graph data doesn’t necessarily mean you’re visualizing or
       analyzing a large graph
    •  Small graphs can be extracted from large graphs and analyzed
    •  Small graphs can be extracted from non-graph data as well
    •  Graphs are just nodes and relationships after all


    •  Example: Adverse Drug Event Analysis with Hadoop, R, and Gephi
       (Josh Wills, Cloudera, 2012)
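
As one illustration of extracting a small graph from a large one, the sketch below keeps only the nodes within a given number of hops of a chosen node, together with the edges among them (often called an ego network). It assumes the large graph fits in memory as an undirected edge list; the function name `ego_network` and the sample data are hypothetical, and this is not the method used in the Cloudera example.

```python
def ego_network(edges, center, radius=1):
    """Return the edges among nodes within `radius` hops of `center`."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Breadth-first expansion, one hop per iteration
    kept = {center}
    frontier = {center}
    for _ in range(radius):
        frontier = {n for node in frontier for n in adjacency.get(node, ())} - kept
        kept |= frontier

    return [(a, b) for a, b in edges if a in kept and b in kept]


# Hypothetical "large" graph, for illustration only
big = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
print(ego_network(big, "a"))
print(ego_network(big, "a", radius=2))
```

For graphs too large for one machine, the same filtering step would have to run in a distributed system before the result is handed to a visualization tool.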




GEPHI
    •  Built to solve large graph visualization problems.
    •  Open source tool for Windows, Mac OS X and Linux
    •  Large international community involved
    •  The latest version has been downloaded > 100,000 times
    •  Extensible with plug-ins
    •  Available at http://gephi.org




GEPHI
    [Diagram labels: DATA EDITION, VISUAL MAPPING, FILTER, VISUALIZATION,
    STATISTICS, LAYOUT, TIMELINE]

SIGMA.JS
    •  Open-source lightweight JavaScript library to draw graphs
    •  Uses HTML5 Canvas
    •  Dynamically displays graphs that can be generated on the fly
    •  Available at http://sigmajs.org




                                                   Sigma.js v0.1


SUMMARY
    •  Big graph data = Relational Big Data
    •  Graphs are everywhere!
    •  Graphs have fascinating structure and patterns one can analyze
    •  Visualization is a natural tool for such complex data and complex
       questions
    •  On graphs, visualization done right allows interaction and
       iteration. Play.
    •  The hard part is to extract a small or medium graph from big data
    •  Open source tools like Gephi or Sigma.js are a good start




Become a graph evangelist!




                    QUESTIONS?

                   Mathieu Bastian (@mathieubastian)



REFERENCES & LINKS
    Join the Social Network Analysis class by Lada Adamic on Coursera
    https://www.coursera.org/course/sna

    Support the Gephi Consortium
    http://consortium.gephi.org

    Computational Information Design, Ben Fry (2004)
    http://benfry.com/phd/

    The Atlas of Economic Complexity, Harvard's Center for International
    Development (CID) and the MIT Media Lab
    http://atlas.media.mit.edu/

    The Mesh of Civilizations and International Email Flows, Bogdan State,
    Patrick Park, Ingmar Weber, Yelena Mejova, Michael Macy
    http://arxiv.org/abs/1303.0045

    The Human Disease Network, Goh K-I, Cusick ME, Valle D, Childs B,
    Vidal M, Barabási A-L (2007)
    http://www.pnas.org/content/104/21/8685.full

    What does your intranet look like?
    http://intranetdiary.blogspot.co.uk/2012/11/network-visualisation.html

    Recipe recommendation using ingredient networks, Chun-Yuen Teng,
    Yu-Ru Lin, Lada A. Adamic
    http://arxiv.org/abs/1111.3919

    US Presidents Inaugural Speeches 1969-2013 Text Network Analysis
    http://noduslabs.com/cases/presidents-inaugural-speeches-text-network-analysis/

    10 Reasons Why We Visualise Data
    http://www.slideshare.net/Facegroup/10-reasons-why-we-visualise-data

    Sigma.js, Alexis Jacomy et al.
    http://sigmajs.org

    Linked: How Everything Is Connected to Everything Else and What It
    Means, Albert-László Barabási
    http://www.amazon.com/gp/product/0452284392/

    Six Degrees: The Science of a Connected Age, Duncan J. Watts
    http://www.amazon.com/gp/product/0393325423/

    Nexus: Small Worlds and the Groundbreaking Science of Networks,
    Mark Buchanan
    http://www.amazon.com/gp/product/0393324427

    Connected: The Surprising Power of Our Social Networks and How They
    Shape Our Lives, Nicholas A. Christakis and James H. Fowler
    http://www.amazon.com/dp/product/0316036137

    Atelier Iceberg – Gephi
    http://www.slideshare.net/ateliericeberg/gephi-17680699

    Adding Value through graph analysis using Titan and Faunus, Matthias
    Broecheler
    http://www.slideshare.net/knowfrominfo/titan-talk-ebaymarch2013

    Network Maps Board on Pinterest, Mathieu Bastian
    http://pinterest.com/mathieubastian/network-maps/

    Network Science Book, Albert-László Barabási
    http://barabasilab.neu.edu/networksciencebook

    Adverse Drug Event Analysis with Hadoop, R, and Gephi, Cloudera
    https://github.com/cloudera/ades





Visualize Big Graph Data

  • 1. MATHIEU BASTIAN. DATA VISUALIZATION SUMMIT, SAN FRANCISCO, APRIL 11-12, 2013
  • 2. BIG GRAPH DATA • The story of big graph data is just starting • BIG GRAPH DATA
  • 3. BIG GRAPH DATA • The story of big graph data is just starting • BIG GRAPH DATA • BIG DATA, GRAPHS
  • 4. BIG GRAPH DATA • The story of big graph data is just starting • BIG GRAPH DATA • BIG DATA, GRAPHS • DISTRIBUTED SYSTEMS, COMPLEX, STORAGE, DATABASES, INDEXATION, LARGE DATASETS, ALGORITHM, CLOUD COMPUTING, HADOOP, ANALYTICS, REAL-TIME, VISUALIZATION
  • 5. BIG GRAPH DATA • The story of big graph data is just starting • BIG GRAPH DATA • BIG DATA, GRAPHS • DISTRIBUTED SYSTEMS, COMPLEX, STORAGE, DATABASES, INDEXATION, LARGE DATASETS, ALGORITHM, CLOUD COMPUTING, HADOOP, ANALYTICS, REAL-TIME, VISUALIZATION
  • 6. BIG DATA • “The Petabyte age” • All industries and domains can leverage big data: health, government, finance, technology • Big Data => Big Problems • The focus is on building the technology to handle big data and big graph data (e.g., graph databases) • Seeking efficient analysis of ever more complex systems
  • 7. GRAPHS • Graphs are everywhere, and it’s easy to collect graph data • The world is more complex and interconnected than we thought • Source: D. Watts, S. Strogatz, Collective Dynamics of Small-World Networks, Nature 393, 440-442
  • 8. NETWORK SCIENCE • The study of graphs has been exploding in the last 15 years • Networks have properties and patterns one can study • Robustness: how resistant is a network to random attacks? • Contagion: how fast does a disease or gossip spread in a network? • Communities: how many communities exist in a network? • Centrality: who is the most central individual in a network? • If you read one of these books, you will understand Network Science
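The centrality question on slide 8 can be made concrete with a small sketch. The slide does not say which centrality measure is meant; the example below assumes degree centrality (the fraction of other nodes a node is directly connected to), which is only the simplest of several common measures. The function name and the friendship data are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected edge list.

    Self-loops are ignored and repeated edges are counted once.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical friendship network, for illustration only.
friendships = [
    ("ann", "bob"),
    ("ann", "cat"),
    ("ann", "dan"),
    ("bob", "cat"),
    ("dan", "eve"),
]
scores = degree_centrality(friendships)
most_central = max(scores, key=scores.get)
print(most_central, scores[most_central])
```

Other measures (betweenness, closeness, eigenvector-based scores) can rank the same nodes differently, so the answer to "who is most central" depends on the measure chosen.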
  • 9. GRAPHS HELP SOLVE PROBLEMS • Saddam Hussein Network (2003) • Source: C. Wilson. Searching for Saddam: a five-part series on how the US military used social networking to capture the Iraqi dictator. 2010. www.slate.com/id/2245228/
  • 10. GRAPHS HELP SOLVE PROBLEMS • Predicting and controlling infectious disease • Sources: Naoki Masuda, Petter Holme: Predicting and controlling infectious disease epidemics using temporal networks. http://f1000.com/prime/reports/b/5/6/ ; Haraldsdottir S, Gupta S, Anderson RM: Preliminary studies of sexual networks in a male homosexual community in Iceland. J Acquir Immune Defic Syndr. 1992, 5:374–81.
  • 11. GRAPHS HELP SOLVE PROBLEMS • Recommendation systems • Credit: http://markorodriguez.com/2011/09/22/a-graph-based-movie-recommender-engine/
  • 12. GRAPHS HELP SOLVE PROBLEMS • Recipe recommendation using ingredient networks • Credit: http://www.ladamic.com/wordpress/?p=294
  • 13. GRAPHS HELP SOLVE PROBLEMS • Power grid • Credit: http://www.npr.org/templates/story/story.php?storyId=110997398
  • 14. SMALL GRAPHS • The famous “Zachary’s Karate Club” study (1977) involved only 34 nodes • It could be drawn by hand on paper • Source: W. W. Zachary, An information flow model for conflict and fission in small groups, Journal of Anthropological Research 33, 452-473 (1977).
  • 15. MEDIUM GRAPHS • Your own Facebook or LinkedIn social network • The Harlem Shake: Anatomy of a Viral Meme, Gilad Lotan. http://www.huffingtonpost.com/gilad-lotan/the-harlem-shake_b_2804799.html
  • 16. LARGE GRAPHS • The Internet Map (~350,000 domains), http://internet-map.net • DBpedia (~290M relationships) • Friendster Social Network dataset (1.8B edges), http://snap.stanford.edu/data/index.html
  • 17. IMPLICIT GRAPHS • Graphs can be explicit or implicit • Explicit: the network exists in nature (social networks, food webs, airline networks) • Implicit: the network is derived from other data (word networks, co-authorship) • Example of an implicit graph: • Each document in a set has a set of tags • Create a link between two tags when they appear on the same document • Aggregate all links across all documents
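The implicit-graph recipe on slide 17 (link two tags when they share a document, then aggregate across documents) can be sketched in a few lines. This is a minimal illustration, not the method used for any dataset in the talk; the function name and the sample documents are hypothetical, and the aggregation here assumes a simple count of shared documents as the edge weight.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(documents):
    """Map each unordered tag pair to the number of documents containing both."""
    edges = Counter()
    for tags in documents:
        # Deduplicate and sort so each unordered pair has one canonical key.
        for a, b in combinations(sorted(set(tags)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical input: each inner list is the tag set of one document.
docs = [
    ["graph", "hadoop", "visualization"],
    ["graph", "visualization"],
    ["hadoop", "storage"],
]
edges = cooccurrence_edges(docs)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

The resulting weighted edge list can be saved as CSV and loaded into a graph tool. Note that a document with k tags produces k(k-1)/2 links, so heavily tagged documents dominate unless the weights are filtered or normalized.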
  • 18. SIMILARITY GRAPHS • Graph of all the co-occurrences between LinkedIn Skills (2011)
  • 19. VISUALIZATION • Visualization and statistics are the two basic toolkits one can use on graphs • Complex questions are asked when studying graphs • Easy (Excel can do this!): min, max, average, quartiles; exact queries, search • Harder (visualization can do this!): patterns, trends, correlations; changes over time, context; anomalies, data errors; geographical representation
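For the "easy" column on slide 19, summary statistics take only a few lines. The sketch below assumes the quantity being summarized is node degree; the slide does not specify one, and the degree values are made up for illustration.

```python
import statistics

# Hypothetical degree sequence: number of connections per node.
degrees = [3, 2, 2, 2, 1]

summary = {
    "min": min(degrees),
    "max": max(degrees),
    "mean": statistics.mean(degrees),
    # quantiles(n=4) returns the three cut points between quartiles.
    "quartiles": statistics.quantiles(degrees, n=4),
}
print(summary)
```

Numbers like these say nothing about where the high-degree nodes sit or how they connect, which is the gap the "harder" column assigns to visualization.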
  • 20. GRAPH VISUALIZATION • Due to the size of graphs and the complexity of questions, visualization is the natural tool to understand what’s going on • “We are more easily persuaded by the reasons we ourselves discover than by those which are given to us by others.” Blaise Pascal • Let me play with the data! • Direct manipulation
  • 21. DATA EXPLORATION AND INTERACTION • Use visualization and statistics to discover new hypotheses • Exploratory data analysis: “The greatest value of a picture is when it forces us to notice what we never expected to see.” John Tukey • The user interface is centered around the human • Empowers the user to understand the structure and patterns in the data • The machine augments the human • How? • Overview and details, zoom and pan interface • Interactive, direct manipulation
  • 22. MAP YOUR DATA • Iterative process to transform relational data into a map • Use color, size and position to highlight, group and set up a hierarchy
  • 23. FROM INFORMATION TO KNOWLEDGE • Exploring networks interactively and iterating often provides “Eureka” moments for domain experts
  • 24. BIG GRAPH DATA • Big graph data doesn’t necessarily mean you’re visualizing or analyzing a large graph • Small graphs can be extracted from large graphs and analyzed • Small graphs can be extracted from non-graph data as well • Graphs are just nodes and relationships after all • Example: Adverse Drug Event Analysis with Hadoop, R, and Gephi (Josh Wills, Cloudera, 2012)
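Slide 24 says small graphs can be extracted from large ones but does not say how. One common approach, assumed here purely for illustration, is an ego network: keep a chosen node, everything within a fixed number of hops, and the edges among those nodes. The function name, parameters and sample edge list are hypothetical.

```python
from collections import deque


def ego_network(edges, center, radius=1):
    """Return (nodes, edges) of the subgraph within `radius` hops of `center`."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    if center not in adjacency:
        return set(), []

    # Breadth-first search that stops expanding at the radius.
    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)

    nodes = set(distance)
    kept = [(u, v) for u, v in edges if u in nodes and v in nodes]
    return nodes, kept


# Hypothetical "large" graph, kept tiny for the example.
big_graph = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("a", "c"), ("e", "f")]
nodes, sub_edges = ego_network(big_graph, "a", radius=1)
print(sorted(nodes), sub_edges)
```

This version builds the whole adjacency map in memory, so it only suits graphs that fit on one machine; for graphs at the scale of slide 16 the same filtering would have to run inside a database or a distributed job.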
  • 25. GEPHI • Built to solve large graph visualization problems • Open source tool for Windows, Mac OS X and Linux • Large international community involved • The latest version has been downloaded more than 100,000 times • Extensible with plug-ins • Available at http://gephi.org
  • 26. GEPHI • DATA EDITION, VISUAL MAPPING, FILTER, VISUALIZATION, STATISTICS, LAYOUT, TIMELINE
  • 27. SIGMA.JS • Open-source lightweight JavaScript library to draw graphs • Uses HTML5 Canvas • Dynamically displays graphs that can be generated on the fly • Available at http://sigmajs.org • Sigma.js v0.1
  • 28. SUMMARY • Big graph data = Relational Big Data • Graphs are everywhere! • Graphs have fascinating structure and patterns one can analyze • Visualization is a natural tool for such complex data and complex questions • On graphs, visualization done right allows interaction and iteration. Play. • The hard part is to extract a small or medium graph from big data • Open source tools like Gephi or Sigma.js are a good start
  • 29. Become a graph evangelist! • QUESTIONS? • Mathieu Bastian (@mathieubastian)
  • 30. REFERENCES & LINKS
      • Join the Social Network Analysis class by Lada Adamic on Coursera. https://www.coursera.org/course/sna
      • Support the Gephi Consortium. http://consortium.gephi.org
      • Computational Information Design, Ben Fry (2004). http://benfry.com/phd/
      • The Atlas of Economic Complexity, Harvard's Center for International Development (CID) and the MIT Media Lab. http://atlas.media.mit.edu/
      • The Mesh of Civilizations and International Email Flows, Bogdan State, Patrick Park, Ingmar Weber, Yelena Mejova, Michael Macy. http://arxiv.org/abs/1303.0045
      • The Human Disease Network, Goh K-I, Cusick ME, Valle D, Childs B, Vidal M, Barabási A-L (2007). http://www.pnas.org/content/104/21/8685.full
      • What does your intranet look like? http://intranetdiary.blogspot.co.uk/2012/11/network-visualisation.html
      • Recipe recommendation using ingredient networks, Chun-Yuen Teng, Yu-Ru Lin, Lada A. Adamic. http://arxiv.org/abs/1111.3919
      • US Presidents Inaugural Speeches 1969-2013 Text Network Analysis. http://noduslabs.com/cases/presidents-inaugural-speeches-text-network-analysis/
      • Sigma.js, Alexis Jacomy et al. http://sigmajs.org
      • Linked: How Everything Is Connected to Everything Else and What It Means, Albert-László Barabási. http://www.amazon.com/gp/product/0452284392/
      • Six Degrees: The Science of a Connected Age, Duncan J. Watts. http://www.amazon.com/gp/product/0393325423/
      • Nexus: Small Worlds and the Groundbreaking Science of Networks, Mark Buchanan. http://www.amazon.com/gp/product/0393324427
      • Connected: The Surprising Power of Our Social Networks and How They Shape Our Lives, Nicholas A. Christakis and James H. Fowler. http://www.amazon.com/dp/product/0316036137
      • Gephi, Atelier Iceberg. http://www.slideshare.net/ateliericeberg/gephi-17680699
      • Adding Value through graph analysis using Titan and Faunus, Matthias Broecheler. http://www.slideshare.net/knowfrominfo/titan-talk-ebaymarch2013
      • Network Maps Board on Pinterest, Mathieu Bastian. http://pinterest.com/mathieubastian/network-maps/
      • Network Science Book, Albert-László Barabási. http://barabasilab.neu.edu/networksciencebook
      • Adverse Drug Event Analysis with Hadoop, R, and Gephi, Cloudera. https://github.com/cloudera/ades
      • 10 Reasons Why We Visualise Data. http://www.slideshare.net/Facegroup/10-reasons-why-we-visualise-data