Untangling Influence and Desire: Visual Analysis of Massive Data

Turi, Inc.

Presenter: Rob Harper, Partner, Lead Product Architect, Uncharted

© 2015 Uncharted Software Inc.
http://pantera.io http://influent.io

Visualizing Graphs as Maps
Communities
Relationships
Spatial Model
Bollen J, Van de Sompel H, Hagberg A, Bettencourt L, Chute R, et al. (2009)
Clickstream Data Yields High-Resolution Maps of Science. PLoS ONE 4(3): e4803.
doi:10.1371/journal.pone.0004803

Challenges of Big Graphs
Computation
Layout
Interaction
Labeling

Chelsea FC Fan Communities
248,747,072 tweets
554,430 account nodes
100,700 mention links
Demo

Geo Communities
Fans Mention Graph

Geo Communities
Player Sentiment

Geo Communities
Trending Topics + Drilldown

Community Graph

Graph Tiling Pipeline
Hierarchical Community Clustering: GraphX, Distributed Louvain Modularity
Hierarchical Graph Layout: Hierarchical Force Directed
Tile Generation

Product Affinity
Stanford collection of anonymized Amazon product reviews and “customers who bought this also bought…” over 9 years
2,372,409 product and customer nodes
9,909,551 review and co-purchase links

Meeting the Challenges of Big Graphs
Computation: Distributed Computing
Layout: Hierarchical Communities
Interaction: Level-of-detail Rendering
Labeling: Layered Analytics

Other Applications
Bitcoin Network
Doctor Referrals
Neural Networks

Thanks!
Rob Harper
rharper@uncharted.software
@rdharper

Editor's Notes

Uncharted specializes in creating visual analytic solutions for a variety of industries. Today I’m drawing on elements from a few of the projects we’re working on, namely PanTera, for large multidimensional and geo data, and Influent, for big graph data. I’ll also be pulling from related research and open source initiatives being developed as part of a DARPA program for reasoning over big data.
There are many reasons to visualize node-link graph data. Often the need stems from assessing believability or trust in an ML-computed answer, or seeking context for it, looking for nuance or patterns that may have been missed. But beyond that, what does visual analysis entail? Frequently we’re looking for clusters and communities: groups of nodes that are closely connected and may suggest similarity or affinity. We also look for the relationships and structures between nodes and communities …
In this map of science, what is it that connects organic chemistry to international studies? Ultimately we’re creating a spatial representation or a 2d map of a complex dataset. This helps us, as spatial thinkers, build mental models of how the data is organized. We use our well-developed visual perception to see anomalies and patterns that may beg further questions or lead to new insights.
As the volume of data that we’re collecting and analyzing grows to millions of nodes or billions of edges, actually seeing our graphs becomes increasingly problematic. The computational cost of laying out and rendering the graph becomes prohibitive. Layouts often result in an unresolved hairball that reveals little. The size and cost limit our interactions. And ineffective labeling makes it hard to understand the nature of communities and relationships.
Look at a graph of Twitter users who tweeted about the Chelsea FC soccer team during the 2014 season. Demo:
Start by side-stepping two of the issues that plague big graphs: the computation cost of layout and hairball results. Do this by using the actual geo-locations of the nodes (the users) as our layout. This is visualizing a graph as a map in the most literal sense, and it focuses on spatial communities and patterns. The arcs we see are Twitter mentions between users and are drawn in a clockwise fashion to show link directionality. At this level we get an immediate sense of the high-level geo structure of the data: a large community in England but also high engagement in Spain and West Africa.
To address render time and interactivity costs, we use a heatmap aggregate view of all links and show the appropriate level of detail at each scale. Adopting a Google Maps paradigm, we serve only the data necessary for a given view, keeping render speed high in the browser. So at this level our focus is on link patterns, and many are immediately obvious, such as a high one-sided volume of traffic from Guatemala to Spain, and also from England to West Africa. And I can drill down from the global to the local level, revealing patterns at different scales.
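To make the "serve only the data necessary for a given view" idea concrete, here is a minimal Python sketch of how a client might work out which tiles to request. The function name, the unit-square coordinates, and the 2^level by 2^level grid are assumptions borrowed from common web-map conventions; the talk does not describe the actual addressing scheme.

```python
import math

def visible_tiles(level, x_min, y_min, x_max, y_max):
    """Return the (level, col, row) keys of tiles overlapping a viewport.

    Assumption: the whole map is the unit square [0, 1] x [0, 1], and
    zoom level L splits it into 2**L by 2**L equally sized tiles.
    """
    n = 2 ** level

    def span(lo, hi):
        # First and last tile index touched along one axis, clamped to the grid.
        first = min(n - 1, max(0, math.floor(lo * n)))
        last = min(n - 1, max(first, math.ceil(hi * n) - 1))
        return range(first, last + 1)

    return [(level, col, row)
            for row in span(y_min, y_max)
            for col in span(x_min, x_max)]

# A viewport covering the top-left quarter of the map at zoom level 2
# touches 4 of the 16 tiles that exist at that level.
print(visible_tiles(2, 0.0, 0.0, 0.5, 0.5))
```

The number of tiles requested depends on the viewport size relative to the tile size, not on the size of the underlying graph, which is what keeps the browser responsive as the data grows.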
But we also need to address labeling: ways of understanding the character of the communities we see (beyond geography). Here is a layer showing player mention sentiment by region.
Another with trending topics that may help characterize the communities we see. In this case we see a very specific hashtag “we need west cumberland hospital” common to the users in this region. Drilling down reveals a single-day campaign around a local issue.
Here we’ve plotted the same graph but laid out based on network structure. Many of the elements remain the same as the geo layout: directional arc links and approach to rendering. At this zoom level some of the most prominent communities are outlined and labeled. Just like zooming in a map reveals finer details of cities and streets, zooming here reveals additional communities and labels. Spatial proximity between clusters helps show us strength of relationship and the heatmap of links gives us a sense of structure.
Approach the labeling issue by layering in additional information to help understand the nature of our clusters. In this case we have a cluster centered around BBC Sports that also has strong mentions of the soap opera “Coronation Street”. Manchester United appears as a common theme in this region of the graph.
Technical Approach: We start with raw source data and run it through a series of steps to produce a set of web-map-style data tiles for consumption by the client.
We first detect communities in the graph. We apply a distributed Louvain modularity algorithm, which starts with the raw graph and detects increasingly higher-level community structures. We address the typically large computational cost of this by running it on the distributed computing platform Apache Spark, using the GraphX library.
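As a rough illustration of what Louvain optimizes and how each pass produces a higher-level graph, the single-machine Python sketch below computes the modularity of a partition and collapses communities into super-nodes. It is not the distributed GraphX implementation from the talk; the function names are hypothetical, and treating the links as undirected is a simplifying assumption.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected weighted graph.

    edges: iterable of (u, v, weight); community: dict node -> community id.
    """
    total = 0.0                    # m, the total edge weight
    internal = defaultdict(float)  # edge weight inside each community
    degree = defaultdict(float)    # summed weighted degree of each community
    for u, v, w in edges:
        total += w
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(internal[c] / total - (degree[c] / (2 * total)) ** 2
               for c in degree)

def aggregate(edges, community):
    """Collapse each community into one super-node, summing edge weights.

    Edges inside a community become a self-loop on its super-node.
    """
    merged = defaultdict(float)
    for u, v, w in edges:
        a, b = sorted((community[u], community[v]))
        merged[(a, b)] += w
    return [(a, b, w) for (a, b), w in merged.items()]

# Two triangles joined by a single edge, split into two communities.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 1.0)]
partition = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(edges, partition))
print(aggregate(edges, partition))
```

Louvain alternates between moving nodes to the neighboring community that most increases Q and aggregating as above; repeating the two phases on the aggregated graph is what yields the hierarchy of communities.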
We then take these communities and run a hierarchical graph layout algorithm, again using Spark. This starts by laying out the highest-level aggregate community graph with a distributed force-directed layout. Once the top-level communities are positioned, the next level down is laid out within the constraints of the parent communities, and so on down to the raw nodes. In this manner we are addressing two of the challenges of big graph visualization that we avoided in the geo case. First, computation cost, as hierarchical layout is easily data-parallelizable across a cluster of computers. Second, hairball results, as this community-based layout pulls communities apart and focuses on the relationships between them.
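The top-down containment idea can be sketched in a few lines of Python. Note the substitution: instead of a force-directed layout, this sketch simply spaces each community's children evenly on a ring inside the parent circle, which is enough to show how every level is constrained by the one above. The function name, the dictionary-based hierarchy, and the circle model are all assumptions for illustration.

```python
import math

def layout(tree, node, cx=0.0, cy=0.0, radius=1.0, out=None):
    """Recursively position `node` and all of its descendants.

    tree: dict mapping a community id to the list of its children
    (sub-communities or raw nodes). Returns {id: (x, y, radius)}.
    Ring placement here is a stand-in for the force-directed step.
    """
    if out is None:
        out = {}
    out[node] = (cx, cy, radius)
    children = tree.get(node, [])
    k = len(children)
    if k == 0:
        return out
    if k == 1:
        ring, child_r = 0.0, radius / 2
    else:
        # Largest radius at which k equal circles on a ring neither
        # overlap each other nor cross the parent's boundary.
        s = math.sin(math.pi / k)
        child_r = radius * s / (1 + s)
        ring = radius - child_r
    for i, child in enumerate(children):
        angle = 2 * math.pi * i / k
        layout(tree, child,
               cx + ring * math.cos(angle),
               cy + ring * math.sin(angle),
               child_r, out)
    return out

hierarchy = {"root": ["c1", "c2", "c3"],
             "c1": ["a", "b"], "c2": ["c"], "c3": ["d", "e", "f", "g"]}
for name, (x, y, r) in layout(hierarchy, "root").items():
    print(f"{name}: x={x:.3f} y={y:.3f} r={r:.3f}")
```

Because each child's position depends only on its parent's circle, sibling subtrees can be laid out independently of one another, which is the property that makes the hierarchical approach easy to spread across a cluster.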
In the final stage we again borrow concepts from web geographic maps and apply a “data tile” approach: the information from earlier stages is stored using a hierarchical division of space. Again, we’re addressing interaction and labeling here by efficiently delivering appropriate level-of-detail and domain-specific data representations to the client.
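One way to picture the "hierarchical division of space" is a tile pyramid in which every laid-out point is counted in one tile per zoom level. The Python sketch below builds such a pyramid in memory; the function name, the unit-square coordinates, and the simple per-tile counts are assumptions, since the talk does not specify what each tile stores.

```python
from collections import Counter

def build_tile_pyramid(points, max_level):
    """Count points per tile at every zoom level from 0 to max_level.

    points: iterable of (x, y) positions in the unit square.
    Returns a Counter keyed by (level, col, row).
    """
    tiles = Counter()
    for x, y in points:
        for level in range(max_level + 1):
            n = 2 ** level  # level L has n by n tiles
            col = min(n - 1, int(x * n))
            row = min(n - 1, int(y * n))
            tiles[(level, col, row)] += 1
    return tiles

pyramid = build_tile_pyramid([(0.1, 0.1), (0.2, 0.2), (0.9, 0.9)], max_level=2)
for key in sorted(pyramid):
    print(key, pyramid[key])
```

Coarse levels give the aggregate heatmap view and deeper levels resolve individual neighborhoods, so the same structure can carry different summaries (counts, labels, sentiment) at each scale.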
Final Demo: This application looks at a far larger graph: a 2M-node, 10M-link graph of Amazon products, purchases, and reviews.
We use the same community-based approach to big graph layout. Here Amazon products are our nodes and links are co-reviews and co-purchases. Communities tend to suggest product affinity. Despite an order of magnitude increase in data size, the speed and interactivity remain the same. But one issue I didn’t discuss with the Twitter graph was how we can navigate the graph at the individual node level.
By selecting a node or even a cluster of interest we can switch to a different view of the data – an interactive lens into the graph with the ability to trace a path through links of interest either individually or in aggregate. As we do so we can see the strength and temporal distribution of connections. This is all linked back to the global graph.
…where I can see them in context.
Zooming in ….
and again, we see the big cluster from earlier break up into a constellation of smaller communities.
Zooming further still, we see the position of these products in the context of other clusters, showing interesting co-purchase groups by proximity: The Lord of the Rings, not surprisingly, is near Harry Potter.
In summary, our approach seeks to meet some of the challenges of visualizing big graphs: computation cost, through distributed computing and data parallelism; hairball layouts, through a hierarchical community-based layout; interaction, through the application of web-map-style data tiles; and finally effective labeling, through the use of scale-sensitive domain-specific data stored with the tiles.
Finally, we’ve applied this technology to several other domains such as financial networks, doctor referral networks, and biological data.
