Social Network Analysis
An overview
Presentation by @dougneedham
Introduction
 @dougneedham
 Data Guy - Started as a DBA in the Marine Corps, evolved to Architect,
now Data Scientist.
 Oracle, SQL Server, Cassandra, Hadoop, MySQL, Spark.
 I have a strong relational/traditional background.
 Perpetual Student
 Learning new things challenges our assumptions. Forces us to take a
new perspective on “old” problems. Eventually maybe even shows us
that there is a better way to solve a problem.
Why study social networks?
 It is cool.
 The concepts around Social Network Analysis can be applied to many
interesting problems in a variety of business verticals.
 The foundation of Social Network Analysis is Graph theory.
 Solving Crime
 Some examples: Introduction to Graph_Theory
What is Social Network Analysis?
 “Social network analysis (SNA) is a strategy for investigating social
structures through the use of network and graph theories. It
characterizes networked structures in terms of nodes (individual actors,
people, or things within the network) and the ties or edges
(relationships or interactions) that connect them. Examples of social
structures commonly visualized through social network analysis include
social media networks, friendship and acquaintance networks, kinship,
disease transmission, and sexual relationships. These networks are often
visualized through sociograms in which nodes are represented as points
and ties are represented as lines.” – Wikipedia
 https://en.wikipedia.org/wiki/Social_network_analysis
Example From wiki:
"Kencf0618FacebookNetwork" by Kencf0618 - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons -
https://commons.wikimedia.org/wiki/File:Kencf0618FacebookNetwork.jpg#/media/File:Kencf0618FacebookNetwork.jpg
A little History
 The 7 Bridges of Königsberg
 Every tome on Graph theory or Network analysis devotes a small
portion of its time to the 7 Bridges of Königsberg.
 If I don’t cover this with you, the gods of mathematics will strike me
down, and never allow me to do analysis again in the future.
The Bridges
The Problem
 Folks enjoyed their Sunday afternoon strolls across the bridges, but
occasionally people would wonder whether a single walk could cross
every bridge exactly once.
 Eventually Leonhard Euler was brought into the debate.
 Euler used Vertices to represent the land masses and edges (or arcs, at
the time) to represent bridges. He realized that because every vertex had
an odd number of edges, the problem was unsolvable.
 Sarada Herke provides one of the best explanations of the solution:
Solution to Königsberg
 And here is the cool thing about mathematicians. If we tell you
something is impossible, we have to tell you why in a way you can
understand. In explaining why, Euler also founded the branch of
mathematics we now call Graph Theory.
 http://en.wikipedia.org/wiki/Leonhard_Euler
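Euler's odd-degree argument can be checked in a few lines. This is only a sketch: the land-mass labels A to D are my own (A stands for the central island), and the rule it applies is the standard one that a connected graph has a walk using every edge exactly once only when zero or two vertices have odd degree.

```python
from collections import Counter

# The seven bridges as an edge list between four land masses.
# The labels A-D are illustrative; A is the central island.
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"),
    ("B", "D"),
    ("C", "D"),
]

def odd_degree_vertices(edges):
    """Return the vertices touched by an odd number of edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(v for v, d in degree.items() if d % 2 == 1)

def has_euler_walk(edges):
    """True when a connected graph allows a walk over every edge exactly once.

    Assumes the graph is connected; only the degree condition is checked.
    """
    return len(odd_degree_vertices(edges)) in (0, 2)

print(odd_degree_vertices(BRIDGES))  # ['A', 'B', 'C', 'D']
print(has_euler_walk(BRIDGES))       # False
```

All four land masses have an odd number of bridges, so no such stroll exists.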
Why analyze Facebook data?
 Facebook is something that most people use.
 It is easy to see the relationships and the concepts of the
Graph/Network are intuitive to people who are looking at their “own”
network.
 The main idea is, if you can understand your own friend data, you can
learn the concepts quickly, then apply these same concepts to more
complicated problems.
 We will talk a little about some complicated topics at the end.
A few terms
 Stand back, we are going to talk about math!
 Basically we are talking about a bunch of dots joined together by lines
 Vertex – Dot on a graph
 Edge – Line connecting the two points
 Edge_Label – this is a term I coined originally related to Data Structure Graphs that
helps trace a path. If you label your edges, and you have multiple edges with the same
label in a Graph you can quite easily identify walks, paths, and cycles through your
graph.
 Triangle – 3 Vertices, 3 Edges
 Square – 4 Vertices, 4 edges
 Open Triangle - 3 Vertices, 2 edges
 A lot of things are networks if you look at them the right way.
 Mark Newman has given a number of excellent presentations about Network
analysis, available on YouTube.
 https://www.youtube.com/watch?v=lETt7IcDWLI
More terms
 Transitivity – The friend of my friend is my friend. Really?
 Homophily – how things are similar
 Directed Graphs – or Digraphs
 Contagion – How do things “spread” through a network?
 Let’s rearrange things, how does the layout affect understanding?
 Order of a graph – number of vertices
 Size of the graph – number of edges
 This is not just data visualization, it can also be used for prediction.
https://www.youtube.com/watch?v=rwA-y-XwjuU
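To make order, size, and transitivity concrete, here is a small sketch in plain Python. The friend names are made up, and transitivity is taken in its usual sense: the share of open-or-closed triples (two edges meeting at a vertex) that are closed into a triangle.

```python
from itertools import combinations

def build_adjacency(edges):
    """Turn an undirected edge list into {vertex: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def order(adj):
    """Order of the graph: number of vertices."""
    return len(adj)

def size(adj):
    """Size of the graph: number of edges (each edge is stored twice)."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2

def transitivity(adj):
    """Fraction of connected triples whose two end points are also linked."""
    closed = 0
    triples = 0
    for nbrs in adj.values():
        for a, b in combinations(sorted(nbrs), 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    return closed / triples if triples else 0.0

# Hypothetical friendships: one triangle plus one extra friend of cat.
FRIENDSHIPS = [("ann", "bob"), ("bob", "cat"), ("ann", "cat"), ("cat", "dan")]
graph = build_adjacency(FRIENDSHIPS)
print(order(graph), size(graph), transitivity(graph))  # 4 4 0.6
```

Here three of the five triples are closed, so the friend of my friend is my friend 60% of the time.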
Final terms
 Centrality – Hub and Authority
 This is almost a whole topic by itself, since there are different types of
Centrality:
 Degree Centrality – Simple, the Vertex with the highest degree (the most
edges) is the most central.
 Eigenvector Centrality – A Vertex is important if it is connected to
other important Vertices.
 PageRank – similar to Eigenvector Centrality, only scaled. If a given
vertex is linked from a very high PageRank vertex, it is itself given a
high PageRank.
 Serious nutshell definitions.
 Shortest path – How are two vertices connected?
 Longest Path – Tracing the flow of an interesting item through a large
collection of applications.
Why is a path important? More on this
later…
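Two of these measures are short enough to sketch by hand. The code below is an illustration, not what Gephi runs internally: degree centrality is normalised by the number of other vertices, and PageRank uses plain power iteration with the customary damping factor of 0.85. The vertex names are hypothetical, and every link target must also appear as a key.

```python
def degree_centrality(adj):
    """Share of the other vertices each vertex touches (needs 2+ vertices)."""
    others = len(adj) - 1
    return {v: len(nbrs) / others for v, nbrs in adj.items()}

def pagerank(out_links, damping=0.85, iterations=100):
    """Power-iteration PageRank on {vertex: list of vertices it links to}."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by vertices with no outgoing links is shared evenly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for target in out_links[v]:
                new_rank[target] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank

# Hypothetical undirected friend graph: a star around "hub".
FRIENDS = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}
# Hypothetical directed links: everything points at "c".
LINKS = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}

degrees = degree_centrality(FRIENDS)
ranks = pagerank(LINKS)
print(max(degrees, key=degrees.get))  # hub
print(max(ranks, key=ranks.get))      # c
```

Note how "a" ends up with a high PageRank too, purely because the top-ranked vertex links to it.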
The Original Joke This is me in different stores
The Math doesn’t change.
 One thing I like about Graphs –
 The Math does not change.
 The math behind Graph theory can be a little intense, but it does not
change regardless of the scale of the graph.
 Once you understand how to “do the math” on a small graph, those
same Maths apply to a Graph whether it is a graph of the people in this
room, or a graph of the people on this planet.
 Now, let me introduce you to a tool that does much of the
Mathematics for you…
But first, Netvizz…
 Netvizz is a tool that extracts data from different sections of the Facebook Platform.
 It provides an interface to the Facebook Graph API
 https://www.youtube.com/watch?v=3vkKPcN7V7Q
 For the version of data we will be looking at, I was able to extract friendship connections.
Facebook has since changed their permissions such that you can no longer extract this
information.
 However, there are some other interesting things you can do with Netvizz.
 If you manage a Facebook Group, this might be interesting.
 For this particular talk we are going to focus on Gephi interpretation. If we want to have a
more in-depth talk on Facebook and the Graph API that Facebook has opened, we can
discuss that at another time.
 To get this yourself, go into Facebook and search for: Netvizz. (You have to authorize it. You
can unauthorize it later.)
 You will have a number of options: group data, page data, page like network, search, and
link stats.
 Click “group data”
 Select a group. If you need a sample id, use: 39462256584
 It runs for a bit, then dumps to a zip file.
 Save the file, then extract it.
 Open Gephi, and use Gephi to import your GDF file.
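If you would rather peek inside the GDF file before handing it to Gephi, a reader can be sketched as below. It assumes the usual GDF layout: a "nodedef>" header line naming the node columns, the node rows, then an "edgedef>" header followed by the edge rows. The sample text is invented, and a real Netvizz export may carry more columns than shown.

```python
import csv

def read_gdf(lines):
    """Split GDF text lines into a list of node dicts and a list of edge dicts."""
    nodes, edges = [], []
    section, columns = None, []
    for row in csv.reader(lines):
        if not row:
            continue
        first = row[0]
        if first.startswith(("nodedef>", "edgedef>")):
            section = first[:7]          # "nodedef" or "edgedef"
            row[0] = first[8:]           # drop the "nodedef>" style prefix
            # Each header cell looks like "name VARCHAR"; keep only the name.
            columns = [cell.split()[0] for cell in row]
            continue
        if section is None:
            continue                     # ignore anything before the first header
        record = dict(zip(columns, row))
        (nodes if section == "nodedef" else edges).append(record)
    return nodes, edges

# Invented sample in the assumed layout.
SAMPLE = """nodedef>name VARCHAR,label VARCHAR
1,Ann
2,Bob
edgedef>node1 VARCHAR,node2 VARCHAR
1,2
"""
sample_nodes, sample_edges = read_gdf(SAMPLE.splitlines())
print(len(sample_nodes), len(sample_edges))  # 2 1
```

For a file on disk, passing the open file handle instead of SAMPLE.splitlines() should work the same way.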
Gephi
http://gephi.github.io/
From the website: “Gephi is an
interactive visualization and exploration
platform for all kinds of networks and
complex systems, dynamic and
hierarchical graphs.”
Java 1.7 required, you may have to set
this in Gephi.conf
Depending on the size of the network
you are studying you may need to
increase the memory available to Java
in Gephi.conf
Gephi Startup
Gephi – Open GML file
Gephi – After opening
Layout
Behavior Options
After running
Partitioning
Metrics
 Remember all those numbers we spoke about?
 Here are many of them.
Data Table
Configure Labels
Here is the layout with the labels as number of connections
Add Background
Visualization
File->Export-> SVG/PDF/PNG…
Export to Excel
How do we use this?
 Finding bottlenecks.
 You have to ignore the fact that everyone on this graph is connected
to you for a moment.
 How would someone get a message to another given person?
 They would have to pass it to someone either they both know, or pass
the message to someone who is more likely to be connected to the
target of the message.
 This was the heart of Milgram’s experiment that gave us the concept of
6 degrees of separation.
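The message-passing question is a shortest-path question. On a friend graph where every edge counts the same, breadth-first search is enough; here is a minimal sketch with made-up names.

```python
from collections import deque

def shortest_path(adj, start, goal):
    """Return one shortest chain of people from start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}          # who handed the message to whom
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for friend in adj.get(current, ()):
            if friend not in previous:
                previous[friend] = current
                if friend == goal:
                    # Walk the hand-offs backwards to rebuild the chain.
                    path = [goal]
                    while previous[path[-1]] is not None:
                        path.append(previous[path[-1]])
                    return path[::-1]
                queue.append(friend)
    return None

# Hypothetical friend lists; eve knows nobody in this graph.
FRIEND_LISTS = {
    "ann": ["bob"],
    "bob": ["ann", "cat"],
    "cat": ["bob", "dan"],
    "dan": ["cat"],
    "eve": [],
}
print(shortest_path(FRIEND_LISTS, "ann", "dan"))  # ['ann', 'bob', 'cat', 'dan']
print(shortest_path(FRIEND_LISTS, "ann", "eve"))  # None
```

The number of hops in the returned chain is the "degrees of separation" between the two people. Remove bob from the lists and ann is cut off: bob is the bottleneck.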
Other Analysis
 What else can be done with Social Network Analysis?
 How about risk exposure to banks?
 http://www.federalreserve.gov/newsevents/speech/yellen20130104a.htm
Application to Business Intelligence
 What if the Vertices are not people?
 What if the Edges are not mutual connections?
 Jonathan and others over the past few meetings have done a great
job at explaining the underpinnings of how a particular BI framework is
put together.
 Within a Data Architecture there are lots of moving pieces. ETL, FTP,
SFTP, Web-Services, External data feeds. Data moving into Data Marts,
and Data Warehouses. Data Moving between applications.
 Let’s imagine how to visualize this using the information we just gained.
Data Structure Graph
 A Data Structure Graph is a group of atomic entities that are related to
each other, stored in a repository, then moved from one persistence
layer to another, rendered as a Graph.
 A group of atomic entities.
 Related to each other.
 Stored in a repository.
 Moved from one persistence layer to another.
 Rendered as a Graph.
Introducing Data Structure Graphs
 Data Structure Graph Level 1 (DSG-L1) – This is roughly like an Entity
Relationship Diagram (ERD). Tables are Vertices, Foreign Keys are Edges.
 Data Structure Graph Level 2 (DSG-L2) – Each Vertex in this graph is an
application. Each Edge is a data transfer. Roughly equivalent to what we
used to call Data Flow diagrams.
 Data Structure Graph Dependency (DSG-D) – Each Vertex is a
job, script, program, or process that is dependent on something
happening in sequence before it can do its work.
 A DSG-L1 can show you which of your tables are going to have the most
interesting query performance.
 A DSG-L2 can show you where the most work is going on in
your Enterprise.
 A DSG-D can show you the sequence of events that need to take
place in order for something to be completed.
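The "sequence of events" a DSG-D gives you is a topological ordering of the dependency graph. As a sketch, Python's standard graphlib module (3.9 and later) can produce one; the job names below are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical DSG-D: each job maps to the set of jobs it depends on.
DEPENDS_ON = {
    "load_staging": {"extract_orders", "extract_customers"},
    "build_warehouse": {"load_staging"},
    "refresh_reports": {"build_warehouse"},
}

def run_order(depends_on):
    """Return the jobs in an order where every dependency comes first.

    graphlib raises CycleError if the jobs depend on each other in a loop.
    """
    return list(TopologicalSorter(depends_on).static_order())

schedule = run_order(DEPENDS_ON)
print(schedule[-1])  # refresh_reports
```

The two extract jobs may come out in either order, since neither depends on the other; everything after them is fixed by the dependencies.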
New Project, Data Table, Import data.
Load as “Edges Table” Source, Target (required)
Choose Create Missing Nodes
After a few calculations and layout runs
PageRank – Which application is most important?
A few more tweaks
Where is that Node with the highest PageRank?
Remember paths?
The Original Joke This is me in different stores
Dijkstra's algorithm
 Some of you may have heard of Dijkstra’s algorithm.
 It is a method for finding the shortest path between two nodes on a
Graph.
 This is a great optimization technique, but what if you need to find the
longest path?
 What “edge_label” has the most influence on my organization?
 Iterate through each Edge_Label, create a subgraph that consists of
only the nodes this Edge_Label touches, then calculate the diameter of
that Graph.
 The data point represented by the Edge_Label that has the longest
path has the most “value” to your organization.
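For reference, Dijkstra's algorithm itself fits in a short function. This sketch uses a binary heap from the standard library; the application names and the edge weights (think minutes per transfer) are invented for illustration, and the weights must be non-negative.

```python
import heapq

def dijkstra(graph, source):
    """Shortest distance from source to every reachable vertex.

    graph is {vertex: {neighbour: non-negative weight}}.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue                  # stale heap entry, a shorter route was found
        for nbr, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(nbr, float("inf")):
                dist[nbr] = candidate
                heapq.heappush(heap, (candidate, nbr))
    return dist

# Hypothetical data flows with made-up transfer times.
FLOWS = {
    "crm": {"staging": 4, "warehouse": 10},
    "staging": {"warehouse": 3},
    "warehouse": {"reports": 1},
}
print(dijkstra(FLOWS, "crm"))
# {'crm': 0, 'staging': 4, 'warehouse': 7, 'reports': 8}
```

The direct crm to warehouse transfer costs 10, so the route through staging (4 + 3) wins.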
https://dougneedham.shinyapps.io/DataStructureGraph
Hard to see, I know, but the top diagram is the “master graph”, the bottom image is a single Edge_Label. You
can see how an individual data entity flows through an organization.
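The per-Edge_Label calculation can be sketched as follows. I am assuming rows of (source, target, edge_label), treating the edges as undirected, and taking the diameter as the longest shortest path inside a connected piece; the Shiny app may well do it differently. The application and label names are made up.

```python
from collections import defaultdict, deque

def eccentricity(adj, start):
    """Longest shortest-path distance from start, found by breadth-first search."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in seen:
                seen[nbr] = seen[node] + 1
                queue.append(nbr)
    return max(seen.values())

def diameter_by_label(rows):
    """Build one subgraph per Edge_Label and return each subgraph's diameter."""
    by_label = defaultdict(lambda: defaultdict(set))
    for source, target, label in rows:
        by_label[label][source].add(target)
        by_label[label][target].add(source)
    return {
        label: max(eccentricity(adj, v) for v in adj)
        for label, adj in by_label.items()
    }

# Hypothetical master graph: "customer" data travels further than "invoice" data.
ROWS = [
    ("crm", "staging", "customer"),
    ("staging", "warehouse", "customer"),
    ("warehouse", "reports", "customer"),
    ("erp", "warehouse", "invoice"),
]
diameters = diameter_by_label(ROWS)
print(diameters)  # {'customer': 3, 'invoice': 1}
print(max(diameters, key=diameters.get))  # customer
```

In this toy organization the customer entity flows through the longest chain of applications.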
My book
Goes through a number of examples of doing a Graph analysis of a fictional organization.
Consider the following:
 If you need assistance, send a message to the group, or contact me
directly (I am easy to find @dougneedham)
 Network/Graph Analysis is cool.
 It can show you some interesting things about your data that you may
not have considered.
 Due thought should be put towards a network analysis project.
 Organizing the data requires a bit of thought. (From -> To vertices is just
a start).
 Directed graph, undirected, bigraph? Setup work needs to be done.
 Tools help with the detailed calculations, and show the paths, walks,
etc.
What did I leave out?
 Graphs that change over time – What happens when you remove a single
Edge or Vertex?
 Growth of a Network – Erdős–Rényi versus Barabási–Albert models (Random
versus Preferential Attachment)
 Scale Free networks – Graphs that conform to Power laws. (These are
intrinsically Social Networks, but I didn’t give much detail)
 Comparing two networks – If you have the same number of edges and
nodes, are two graphs the same? Is one graph an isomorphism of another?
 Contagion – Ceteris paribus, how will things (information, viruses,
data, disease…) spread through the network? (Since a DSG represents
different types of Edges based on Edge_Label, Contagion should not affect
this type of network entirely.)
 Large Graphs – GraphX, a part of Apache Spark, is best used for this
purpose.
 The strength of Weak Ties Paradox
 Social Capital
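For anyone curious about the two growth models named above, here is a rough sketch of each. It is a simplification, not a reference implementation: the random model links every pair with a fixed probability, and the preferential model lets each new vertex pick m existing vertices with probability proportional to their current degree. The parameter values are arbitrary.

```python
import random
from collections import Counter

def erdos_renyi(n, p, rng):
    """Random graph: each possible pair of vertices is linked with probability p."""
    return [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]

def barabasi_albert(n, m, rng):
    """Preferential attachment: new vertices favour already well-connected ones."""
    # Start from a small fully connected core of m + 1 vertices.
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # Every vertex appears once per incident edge, so a uniform draw from
    # this list picks a vertex with probability proportional to its degree.
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in targets:
            edges.append((new, target))
            endpoints.extend((new, target))
    return edges

def max_degree(edges):
    """Degree of the best-connected vertex."""
    degree = Counter(v for edge in edges for v in edge)
    return max(degree.values())

random_graph = erdos_renyi(200, 0.02, random.Random(1))
attached_graph = barabasi_albert(200, 2, random.Random(1))
print(len(attached_graph))  # 397
print(max_degree(random_graph), max_degree(attached_graph))
```

With graphs of similar size, the preferential model tends to grow a few heavily connected hubs, which is the signature to look for in your own data.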
Finally… Want to do Data Science?
 Challenge for members of the audience.
 1. Download Gephi.
 2. Put together a simple CSV: Source, Target, Edge_Label that describes
your own data environment.
 3. Load it in Gephi and have Gephi run the metrics, and perform the auto
layout.
 4. Answer this question: Did you get what you expected?
 5. Get a colleague to do the same thing, compare the images. How similar
are they?
 Here is my hypothesis: If you have more than 5 data applications, including
Hadoop, and Data Warehouse infrastructure, your Graph will follow the
rules of preferential attachment. (To<->From ETL tools don’t count in the
analysis)
 Tweet me @dougneedham #DataStructureGraph (anonymized, of course.)
 What does your Graph look like?
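If typing the CSV by hand sounds tedious, step 2 can be scripted. The sketch below writes the three columns and then reads them back to count how many applications have each degree, which gives a first, crude look at the hypothesis before you open Gephi. The application names and labels are placeholders for your own environment.

```python
import csv
import io
from collections import Counter

# Placeholder rows: (Source, Target, Edge_Label) for a made-up environment.
ROWS = [
    ("crm", "staging", "customer"),
    ("erp", "staging", "invoice"),
    ("staging", "warehouse", "customer"),
    ("warehouse", "reports", "customer"),
]

def write_edge_csv(rows, handle):
    """Write the edge list with the Source and Target headers Gephi asks for."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Edge_Label"])
    writer.writerows(rows)

def degree_histogram(handle):
    """Return {degree: number of applications with that degree}."""
    degree = Counter()
    for row in csv.DictReader(handle):
        degree[row["Source"]] += 1
        degree[row["Target"]] += 1
    return Counter(degree.values())

buffer = io.StringIO()          # stands in for a real file on disk
write_edge_csv(ROWS, buffer)
buffer.seek(0)
histogram = degree_histogram(buffer)
print(sorted(histogram.items()))  # [(1, 3), (2, 1), (3, 1)]
```

To produce a file for Gephi, swap the StringIO buffer for open("edges.csv", "w", newline="") and import the result as an Edges Table.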
Final Thoughts – Questions?
More Related Content

What's hot

Data mining in social network
Data mining in social networkData mining in social network
Data mining in social networkakash_mishra
 
Social Network Analysis (SNA) Made Easy
Social Network Analysis (SNA) Made EasySocial Network Analysis (SNA) Made Easy
Social Network Analysis (SNA) Made EasyJeff Mohr
 
Social Network Analysis
Social Network AnalysisSocial Network Analysis
Social Network AnalysisSujoy Bag
 
Social Network Analysis Workshop
Social Network Analysis WorkshopSocial Network Analysis Workshop
Social Network Analysis WorkshopData Works MD
 
Introduction to Social Network Analysis
Introduction to Social Network AnalysisIntroduction to Social Network Analysis
Introduction to Social Network AnalysisPremsankar Chakkingal
 
Community detection in social networks
Community detection in social networksCommunity detection in social networks
Community detection in social networksFrancisco Restivo
 
Social Network Visualization 101
Social Network Visualization 101Social Network Visualization 101
Social Network Visualization 101librarianrafia
 
Social network analysis (SNA) - Big data and social data - Telecommunications...
Social network analysis (SNA) - Big data and social data - Telecommunications...Social network analysis (SNA) - Big data and social data - Telecommunications...
Social network analysis (SNA) - Big data and social data - Telecommunications...Wael Elrifai
 
How Graph Algorithms Answer your Business Questions in Banking and Beyond
How Graph Algorithms Answer your Business Questions in Banking and BeyondHow Graph Algorithms Answer your Business Questions in Banking and Beyond
How Graph Algorithms Answer your Business Questions in Banking and BeyondNeo4j
 
Social Media Mining - Chapter 9 (Recommendation in Social Media)
Social Media Mining - Chapter 9 (Recommendation in Social Media)Social Media Mining - Chapter 9 (Recommendation in Social Media)
Social Media Mining - Chapter 9 (Recommendation in Social Media)SocialMediaMining
 
Community detection algorithms
Community detection algorithmsCommunity detection algorithms
Community detection algorithmsAlireza Andalib
 
Social Network Analysis
Social Network AnalysisSocial Network Analysis
Social Network AnalysisScott Gomer
 
Social Media Mining - Chapter 4 (Network Models)
Social Media Mining - Chapter 4 (Network Models)Social Media Mining - Chapter 4 (Network Models)
Social Media Mining - Chapter 4 (Network Models)SocialMediaMining
 
Social network analysis intro part I
Social network analysis intro part ISocial network analysis intro part I
Social network analysis intro part ITHomas Plotkowiak
 
Social Media Mining: An Introduction
Social Media Mining: An IntroductionSocial Media Mining: An Introduction
Social Media Mining: An IntroductionAli Abbasi
 

What's hot (20)

Data mining in social network
Data mining in social networkData mining in social network
Data mining in social network
 
Social Network Analysis (SNA) Made Easy
Social Network Analysis (SNA) Made EasySocial Network Analysis (SNA) Made Easy
Social Network Analysis (SNA) Made Easy
 
Social Network Analysis
Social Network AnalysisSocial Network Analysis
Social Network Analysis
 
Social Network Analysis (SNA)
Social Network Analysis (SNA)Social Network Analysis (SNA)
Social Network Analysis (SNA)
 
Social Network Analysis Workshop
Social Network Analysis WorkshopSocial Network Analysis Workshop
Social Network Analysis Workshop
 
Introduction to Social Network Analysis
Introduction to Social Network AnalysisIntroduction to Social Network Analysis
Introduction to Social Network Analysis
 
Link prediction
Link predictionLink prediction
Link prediction
 
Ppt
PptPpt
Ppt
 
06 Community Detection
06 Community Detection06 Community Detection
06 Community Detection
 
Social Network Analysis
Social Network AnalysisSocial Network Analysis
Social Network Analysis
 
Community detection in social networks
Community detection in social networksCommunity detection in social networks
Community detection in social networks
 
Social Network Visualization 101
Social Network Visualization 101Social Network Visualization 101
Social Network Visualization 101
 
Social network analysis (SNA) - Big data and social data - Telecommunications...
Social network analysis (SNA) - Big data and social data - Telecommunications...Social network analysis (SNA) - Big data and social data - Telecommunications...
Social network analysis (SNA) - Big data and social data - Telecommunications...
 
How Graph Algorithms Answer your Business Questions in Banking and Beyond
How Graph Algorithms Answer your Business Questions in Banking and BeyondHow Graph Algorithms Answer your Business Questions in Banking and Beyond
How Graph Algorithms Answer your Business Questions in Banking and Beyond
 
Social Media Mining - Chapter 9 (Recommendation in Social Media)
Social Media Mining - Chapter 9 (Recommendation in Social Media)Social Media Mining - Chapter 9 (Recommendation in Social Media)
Social Media Mining - Chapter 9 (Recommendation in Social Media)
 
Community detection algorithms
Community detection algorithmsCommunity detection algorithms
Community detection algorithms
 
Social Network Analysis
Social Network AnalysisSocial Network Analysis
Social Network Analysis
 
Social Media Mining - Chapter 4 (Network Models)
Social Media Mining - Chapter 4 (Network Models)Social Media Mining - Chapter 4 (Network Models)
Social Media Mining - Chapter 4 (Network Models)
 
Social network analysis intro part I
Social network analysis intro part ISocial network analysis intro part I
Social network analysis intro part I
 
Social Media Mining: An Introduction
Social Media Mining: An IntroductionSocial Media Mining: An Introduction
Social Media Mining: An Introduction
 

Viewers also liked

LinkedIn - A Professional Network built with Java Technologies and Agile Prac...
LinkedIn - A Professional Network built with Java Technologies and Agile Prac...LinkedIn - A Professional Network built with Java Technologies and Agile Prac...
LinkedIn - A Professional Network built with Java Technologies and Agile Prac...LinkedIn
 
Merry Christmas
Merry ChristmasMerry Christmas
Merry Christmassoniapr30
 
One indiabulls gurgaon 9999744778 sachiv indiabulls one gurgaon sector 104 in...
One indiabulls gurgaon 9999744778 sachiv indiabulls one gurgaon sector 104 in...One indiabulls gurgaon 9999744778 sachiv indiabulls one gurgaon sector 104 in...
One indiabulls gurgaon 9999744778 sachiv indiabulls one gurgaon sector 104 in...sachivchawla
 
陈兵教授《论附佛外道》
陈兵教授《论附佛外道》陈兵教授《论附佛外道》
陈兵教授《论附佛外道》walkmankim
 
Visuell kommunikation - E-business 2.0
Visuell kommunikation - E-business 2.0Visuell kommunikation - E-business 2.0
Visuell kommunikation - E-business 2.0Kajsa Snickars
 
Interpreting Lync Monitoring and Reporting
Interpreting Lync Monitoring and ReportingInterpreting Lync Monitoring and Reporting
Interpreting Lync Monitoring and ReportingBryan Marks
 
Baud rate is the number of change in signal
Baud rate is the number of change in signalBaud rate is the number of change in signal
Baud rate is the number of change in signalAbhishek Pathak
 
Impressionisme informàtica
Impressionisme informàticaImpressionisme informàtica
Impressionisme informàticatorragrau
 
αντιγονη
αντιγονηαντιγονη
αντιγονηekidrou
 
使用 zotero 做文獻管理及引用(1)
使用 zotero 做文獻管理及引用(1)使用 zotero 做文獻管理及引用(1)
使用 zotero 做文獻管理及引用(1)Chengtao Lin
 
Living in the moment
Living in the momentLiving in the moment
Living in the momentwalkmankim
 
圣严法师108语录
圣严法师108语录圣严法师108语录
圣严法师108语录walkmankim
 
ROBOTS POWER POINT
ROBOTS POWER POINTROBOTS POWER POINT
ROBOTS POWER POINTsoniapr30
 
郑水吉《楞严经新表解》
郑水吉《楞严经新表解》郑水吉《楞严经新表解》
郑水吉《楞严经新表解》walkmankim
 
原始佛教基本典籍 中阿含经
原始佛教基本典籍 中阿含经原始佛教基本典籍 中阿含经
原始佛教基本典籍 中阿含经walkmankim
 
Skriva för webben - E-business 2.0
Skriva för webben - E-business 2.0Skriva för webben - E-business 2.0
Skriva för webben - E-business 2.0Kajsa Snickars
 

Viewers also liked (20)

LinkedIn - A Professional Network built with Java Technologies and Agile Prac...
LinkedIn - A Professional Network built with Java Technologies and Agile Prac...LinkedIn - A Professional Network built with Java Technologies and Agile Prac...
LinkedIn - A Professional Network built with Java Technologies and Agile Prac...
 
Merry Christmas
Merry ChristmasMerry Christmas
Merry Christmas
 
One indiabulls gurgaon 9999744778 sachiv indiabulls one gurgaon sector 104 in...
One indiabulls gurgaon 9999744778 sachiv indiabulls one gurgaon sector 104 in...One indiabulls gurgaon 9999744778 sachiv indiabulls one gurgaon sector 104 in...
One indiabulls gurgaon 9999744778 sachiv indiabulls one gurgaon sector 104 in...
 
陈兵教授《论附佛外道》
陈兵教授《论附佛外道》陈兵教授《论附佛外道》
陈兵教授《论附佛外道》
 
Visuell kommunikation - E-business 2.0
Visuell kommunikation - E-business 2.0Visuell kommunikation - E-business 2.0
Visuell kommunikation - E-business 2.0
 
James_McLaughlin_Render
James_McLaughlin_RenderJames_McLaughlin_Render
James_McLaughlin_Render
 
Trailer production
Trailer production Trailer production
Trailer production
 
Interpreting Lync Monitoring and Reporting
Interpreting Lync Monitoring and ReportingInterpreting Lync Monitoring and Reporting
Interpreting Lync Monitoring and Reporting
 
Baud rate is the number of change in signal
Baud rate is the number of change in signalBaud rate is the number of change in signal
Baud rate is the number of change in signal
 
Impressionisme informàtica
Impressionisme informàticaImpressionisme informàtica
Impressionisme informàtica
 
αντιγονη
αντιγονηαντιγονη
αντιγονη
 
使用 zotero 做文獻管理及引用(1)
使用 zotero 做文獻管理及引用(1)使用 zotero 做文獻管理及引用(1)
使用 zotero 做文獻管理及引用(1)
 
Living in the moment
Living in the momentLiving in the moment
Living in the moment
 
圣严法师108语录
圣严法师108语录圣严法师108语录
圣严法师108语录
 
ROBOTS POWER POINT
ROBOTS POWER POINTROBOTS POWER POINT
ROBOTS POWER POINT
 
郑水吉《楞严经新表解》
郑水吉《楞严经新表解》郑水吉《楞严经新表解》
郑水吉《楞严经新表解》
 
原始佛教基本典籍 中阿含经
原始佛教基本典籍 中阿含经原始佛教基本典籍 中阿含经
原始佛教基本典籍 中阿含经
 
Skriva för webben - E-business 2.0
Skriva för webben - E-business 2.0Skriva för webben - E-business 2.0
Skriva för webben - E-business 2.0
 
11조
11조11조
11조
 
S'more fun
S'more funS'more fun
S'more fun
 

Similar to Social Network Analysis Introduction including Data Structure Graph overview.

Apache Spark GraphX highlights.
Apache Spark GraphX highlights. Apache Spark GraphX highlights.
Apache Spark GraphX highlights. Doug Needham
 
Data Structure Graph DMZ #DMZone
Data Structure Graph DMZ #DMZoneData Structure Graph DMZ #DMZone
Data Structure Graph DMZ #DMZoneDoug Needham
 
How Graph Databases used in Police Department?
How Graph Databases used in Police Department?How Graph Databases used in Police Department?
How Graph Databases used in Police Department?Samet KILICTAS
 
Distributed Link Prediction in Large Scale Graphs using Apache Spark
Distributed Link Prediction in Large Scale Graphs using Apache SparkDistributed Link Prediction in Large Scale Graphs using Apache Spark
Distributed Link Prediction in Large Scale Graphs using Apache SparkAnastasios Theodosiou
 
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...Benjamin Nussbaum
 
EgoSystem: Presentation to LITA, American Library Association, Nov 8 2014
EgoSystem: Presentation to LITA, American Library Association, Nov 8 2014EgoSystem: Presentation to LITA, American Library Association, Nov 8 2014
EgoSystem: Presentation to LITA, American Library Association, Nov 8 2014James Powell
 
Intro to Graph Theory w Neo4J
Intro to Graph Theory w Neo4JIntro to Graph Theory w Neo4J
Intro to Graph Theory w Neo4JRay Lukas
 
Gephi, Graphx, and Giraph
Gephi, Graphx, and GiraphGephi, Graphx, and Giraph
Gephi, Graphx, and GiraphDoug Needham
 
Riding The Semantic Wave
Riding The Semantic WaveRiding The Semantic Wave
Riding The Semantic WaveKaniska Mandal
 
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your DataCloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your DataCloudera, Inc.
 
Intro to Graph Theory
Intro to Graph TheoryIntro to Graph Theory
Intro to Graph TheoryRay Lukas
 
The Unreasonable Effectiveness of Metadata
The Unreasonable Effectiveness of MetadataThe Unreasonable Effectiveness of Metadata
The Unreasonable Effectiveness of MetadataJames Hendler
 
Document Based Data Modeling Technique
Document Based Data Modeling TechniqueDocument Based Data Modeling Technique
Document Based Data Modeling TechniqueCarmen Sanborn
 
LASTconf 2018 - System Mapping: Discover, Communicate and Explore the Real Co...
LASTconf 2018 - System Mapping: Discover, Communicate and Explore the Real Co...LASTconf 2018 - System Mapping: Discover, Communicate and Explore the Real Co...
LASTconf 2018 - System Mapping: Discover, Communicate and Explore the Real Co...Colin Panisset
 
BigData Visualization and Usecase@TDGA-Stelligence-11july2019-share
BigData Visualization and Usecase@TDGA-Stelligence-11july2019-shareBigData Visualization and Usecase@TDGA-Stelligence-11july2019-share
BigData Visualization and Usecase@TDGA-Stelligence-11july2019-sharestelligence
 
Big Data Conference
Big Data ConferenceBig Data Conference
Big Data ConferenceDataTactics
 
A Blended Approach to Analytics at Data Tactics Corporation
A Blended Approach to Analytics at Data Tactics CorporationA Blended Approach to Analytics at Data Tactics Corporation
A Blended Approach to Analytics at Data Tactics CorporationRich Heimann
 
Network Mapping & Data Storytelling for Beginners
Network Mapping & Data Storytelling for BeginnersNetwork Mapping & Data Storytelling for Beginners
Network Mapping & Data Storytelling for BeginnersRenaud Clément
 
Broad Data (India 2015)
Broad Data (India 2015)Broad Data (India 2015)
Broad Data (India 2015)James Hendler
 

Similar to Social Network Analysis Introduction including Data Structure Graph overview. (20)

Apache Spark GraphX highlights.
Apache Spark GraphX highlights. Apache Spark GraphX highlights.
Apache Spark GraphX highlights.
 
Data Structure Graph DMZ #DMZone
Data Structure Graph DMZ #DMZoneData Structure Graph DMZ #DMZone
Data Structure Graph DMZ #DMZone
 
How Graph Databases used in Police Department?
How Graph Databases used in Police Department?How Graph Databases used in Police Department?
How Graph Databases used in Police Department?
 
Distributed Link Prediction in Large Scale Graphs using Apache Spark
Distributed Link Prediction in Large Scale Graphs using Apache SparkDistributed Link Prediction in Large Scale Graphs using Apache Spark
Distributed Link Prediction in Large Scale Graphs using Apache Spark
 
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
 
EgoSystem: Presentation to LITA, American Library Association, Nov 8 2014
EgoSystem: Presentation to LITA, American Library Association, Nov 8 2014EgoSystem: Presentation to LITA, American Library Association, Nov 8 2014
EgoSystem: Presentation to LITA, American Library Association, Nov 8 2014
 
Intro to Graph Theory w Neo4J
Intro to Graph Theory w Neo4JIntro to Graph Theory w Neo4J
Intro to Graph Theory w Neo4J
 
Gephi, Graphx, and Giraph
Gephi, Graphx, and GiraphGephi, Graphx, and Giraph
Gephi, Graphx, and Giraph
 
Riding The Semantic Wave
Riding The Semantic WaveRiding The Semantic Wave
Riding The Semantic Wave
 
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your DataCloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
 
ML.pdf
ML.pdfML.pdf
ML.pdf
 
Intro to Graph Theory
Intro to Graph TheoryIntro to Graph Theory
Intro to Graph Theory
 
The Unreasonable Effectiveness of Metadata
The Unreasonable Effectiveness of MetadataThe Unreasonable Effectiveness of Metadata
The Unreasonable Effectiveness of Metadata
 
Document Based Data Modeling Technique
Document Based Data Modeling TechniqueDocument Based Data Modeling Technique
Document Based Data Modeling Technique
 
LASTconf 2018 - System Mapping: Discover, Communicate and Explore the Real Co...
LASTconf 2018 - System Mapping: Discover, Communicate and Explore the Real Co...LASTconf 2018 - System Mapping: Discover, Communicate and Explore the Real Co...
LASTconf 2018 - System Mapping: Discover, Communicate and Explore the Real Co...
 
BigData Visualization and Usecase@TDGA-Stelligence-11july2019-share
BigData Visualization and Usecase@TDGA-Stelligence-11july2019-shareBigData Visualization and Usecase@TDGA-Stelligence-11july2019-share
BigData Visualization and Usecase@TDGA-Stelligence-11july2019-share
 
Big Data Conference
Big Data ConferenceBig Data Conference
Big Data Conference
 
A Blended Approach to Analytics at Data Tactics Corporation
A Blended Approach to Analytics at Data Tactics CorporationA Blended Approach to Analytics at Data Tactics Corporation
A Blended Approach to Analytics at Data Tactics Corporation
 
Network Mapping & Data Storytelling for Beginners
Network Mapping & Data Storytelling for BeginnersNetwork Mapping & Data Storytelling for Beginners
Network Mapping & Data Storytelling for Beginners
 
Broad Data (India 2015)
Broad Data (India 2015)Broad Data (India 2015)
Broad Data (India 2015)
 


Social Network Analysis Introduction including Data Structure Graph overview.

  • 1. Social Network Analysis An overview Presentation by @dougneedham
  • 2. Introduction  @dougneedham  Data Guy - Started as a DBA in the Marine Corps, evolved to Architect, now Data Scientist.  Oracle, SQL Server, Cassandra, Hadoop, MySQL, Spark.  I have a strong relational/traditional background.  Perpetual Student  Learning new things challenges our assumptions. Forces us to take a new perspective on “old” problems. Eventually maybe even shows us that there is a better way to solve a problem.
  • 3. Why study social networks?  It is cool.  The concepts around Social Network Analysis can be applied to many interesting problems in a variety of business verticals.  The foundation of Social Network Analysis is Graph theory.  Solving Crime  Some examples: Introduction to Graph_Theory
  • 4. What is Social Network Analysis?  “Social network analysis (SNA) is a strategy for investigating social structures through the use of network and graph theories. It characterizes networked structures in terms of nodes (individual actors, people, or things within the network) and the ties or edges (relationships or interactions) that connect them. Examples of social structures commonly visualized through social network analysis include social media networks, friendship and acquaintance networks, kinship, disease transmission, and sexual relationships. These networks are often visualized through sociograms in which nodes are represented as points and ties are represented as lines.” – Wikipedia  https://en.wikipedia.org/wiki/Social_network_analysis
  • 5. Example From wiki: "Kencf0618FacebookNetwork" by Kencf0618 - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons - https://commons.wikimedia.org/wiki/File:Kencf0618FacebookNetwork.jpg#/media/File:Kencf0618FacebookNetwork.jpg
  • 6. A little History  The 7 Bridges of Königsberg  Every tome on Graph theory or Network analysis devotes a small portion of its time to the 7 Bridges of Königsberg.  If I don’t cover this with you, the gods of mathematics will strike me down, and never allow me to do analysis again in the future.
  • 8. The Problem  Folks enjoyed their Sunday afternoon strolls across the bridges, but occasionally people would wonder if one particular route was more efficient than another.  Eventually Leonhard Euler was brought into the debate about the efficiency problem.  Euler used Vertices to represent the land masses and edges (or arcs, at the time) to represent bridges. He realized that crossing every bridge exactly once is impossible, because every vertex has an odd number of edges.  Sarada Herke provides one of the best explanations of the solution: Solution to Königsberg  And here is the cool thing about mathematicians. If we tell you something is impossible, we have to tell you why in a way you can understand it. In doing so, Euler also invented the branch of mathematics we today call Graph Theory.  http://en.wikipedia.org/wiki/Leonhard_Euler
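Euler's parity argument is easy to check by counting. The following is a minimal sketch, assuming the usual textbook layout of the bridges; the labels A (the island), B and C (the two river banks), and D (the eastern land mass) are illustrative and do not come from the slides.

```python
from collections import Counter

# The seven bridges as the edge list of a multigraph.
# Vertex labels are assumptions made for this sketch.
BRIDGES = [
    ("A", "B"), ("A", "B"),  # two bridges from the island to one bank
    ("A", "C"), ("A", "C"),  # two bridges from the island to the other bank
    ("A", "D"),              # island to the eastern land mass
    ("B", "D"),
    ("C", "D"),
]


def degrees(edges):
    """Count how many bridge ends touch each land mass."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def has_euler_walk(edges):
    """For a connected multigraph, a walk that uses every edge exactly once
    exists only when zero or two vertices have odd degree."""
    odd = [v for v, d in degrees(edges).items() if d % 2 == 1]
    return len(odd) in (0, 2)


print(dict(degrees(BRIDGES)))
print(has_euler_walk(BRIDGES))
```

All four land masses have odd degree, so no such walk exists. The check assumes the graph is connected, which holds for this example.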
  • 9. Why analyze Facebook data?  Facebook is something that most people use.  It is easy to see the relationships and the concepts of the Graph/Network are intuitive to people who are looking at their “own” network.  The main idea is, if you can understand your own friend data, you can learn the concepts quickly, then apply these same concepts to more complicated problems.  We will talk a little about some complicated topics at the end.
  • 10. A few terms  Stand back, we are going to talk about math!  Basically we are talking about a bunch of dots joined together by lines  Vertex – Dot on a graph  Edge – Line connecting two vertices  Edge_Label – this is a term I coined originally related to Data Structure Graphs that helps trace a path. If you label your edges, and you have multiple edges with the same label in a Graph, you can quite easily identify walks, paths, and cycles through your graph.  Triangle – 3 Vertices, 3 Edges  Square – 4 Vertices, 4 Edges  Open Triangle – 3 Vertices, 2 Edges  A lot of things are networks if you look at them the right way.  Mark Newman has given a number of excellent presentations on Network analysis, available on YouTube.  https://www.youtube.com/watch?v=lETt7IcDWLI
  • 11. More terms  Transitivity – The friend of my friend is my friend. Really?  Homophily – the tendency of similar things to be connected  Directed Graphs – or Digraphs  Contagion – How do things “spread” through a network?  Let’s rearrange things, how does the layout affect understanding?  Order of a graph – number of vertices  Size of the graph – number of edges  This is not just data visualization, it can also be used for prediction. https://www.youtube.com/watch?v=rwA-y-XwjuU
  • 12. Final terms  Centrality – Hub and Authority  This is almost a whole topic by itself, since there are different types of Centrality:  Degree Centrality – Simple: the Vertex with the highest degree (the most edges) is the most central.  Eigenvector Centrality – How important a particular Vertex is to a given network, based on how important its neighbors are.  PageRank – similar to Eigenvector Centrality, only scaled; if a given vertex is closely connected to a vertex with a very high PageRank, it is itself given a high PageRank.  Serious nutshell definitions.  Shortest path – How are two vertices connected?  Longest Path – Tracing the flow of an interesting item through a large collection of applications.
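Order, size, and transitivity can all be computed from a plain edge list. The sketch below uses a made-up four-person friendship graph (the names are hypothetical) and measures transitivity as the fraction of "friend of my friend" pairs that are in fact friends themselves, which is one common definition.

```python
from itertools import combinations

# Hypothetical undirected friendship edges, for illustration only.
EDGES = [("ann", "bob"), ("bob", "cat"), ("ann", "cat"), ("cat", "dan")]


def adjacency(edges):
    """Build a neighbor set for every vertex of an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def transitivity(adj):
    """Closed triples divided by all connected triples.

    A triple is a vertex together with two of its neighbors; it is
    closed when those two neighbors are also connected (a Triangle)
    and open otherwise (an Open Triangle)."""
    closed = 0
    total = 0
    for neighbors in adj.values():
        for a, b in combinations(sorted(neighbors), 2):
            total += 1
            if b in adj[a]:
                closed += 1
    return closed / total if total else 0.0


adj = adjacency(EDGES)
order = len(adj)    # number of vertices
size = len(EDGES)   # number of edges
print(order, size, transitivity(adj))
```

Here three of the five triples are closed, so the friend of a friend is a friend only 60% of the time.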
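To make the difference between counting edges and PageRank concrete, here is a simplified power-iteration sketch. It is not Gephi's implementation; the damping factor of 0.85, the iteration count, and the four-vertex example graph are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical directed edges: three vertices point at "c", and "c" points at "a".
LINKS = [("a", "c"), ("b", "c"), ("d", "c"), ("c", "a")]


def pagerank(edges, damping=0.85, iterations=100):
    """Simplified PageRank by repeated redistribution of rank."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by vertices with no outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        rank = new_rank
    return rank


in_degree = Counter(dst for _, dst in LINKS)  # a simple degree-based measure
ranks = pagerank(LINKS)
print(in_degree.most_common())
print(sorted(ranks, key=ranks.get, reverse=True))
```

In this example "a" has only one incoming edge, yet it ranks second because that edge comes from the highest-ranked vertex, "c". That is the scaling effect described in the PageRank definition.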
  • 13. Why is a path important? More on this later… The Original Joke This is me in different stores
  • 14. The Math doesn’t change.  One thing I like about Graphs –  The Math does not change.  The math behind Graph theory can be a little intense, but it does not change regardless of the scale of the graph.  Once you understand how to “do the math” on a small graph, that same math applies to a Graph whether it is a graph of the people in this room, or a graph of the people on this planet.  Now, let me introduce you to a tool that does much of the Mathematics for you…
  • 15. But first, Netvizz…  Netvizz is a tool that extracts data from different sections of the Facebook Platform.  It provides an interface to the Facebook Graph API  https://www.youtube.com/watch?v=3vkKPcN7V7Q  For the version of data we will be looking at, I was able to extract friendship connections. Facebook has since changed its permissions such that you can no longer extract this information.  However, there are some other interesting things you can do with Netvizz.  If you manage a Facebook Group, this might be interesting.  For this particular talk we are going to focus on Gephi interpretation. If we want to have a more in-depth talk on Facebook and the Graph API that Facebook has opened, we can discuss that at another time.  To get this yourself, go into Facebook and search for: Netvizz. (You have to authorize it. You can de-authorize it later.)  You will have a number of options: group data, page data, page like network, search, and link stats.  Click “group data”  Select a group; if you need a sample id, use: 39462256584  It runs for a bit, then dumps to a zip file.  Save the file, then extract it.  Open Gephi, and use Gephi to import your GDF file.
  • 16. Gephi http://gephi.github.io/ From the website: “Gephi is an interactive visualization and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs.” Java 1.7 is required; you may have to set this in Gephi.conf. Depending on the size of the network you are studying, you may need to increase the memory available to Java in Gephi.conf.
  • 18. Gephi – Open GML file
  • 19. Gephi – After opening
  • 25. Metrics  Remember all those numbers we spoke about?  Here are many of them.
  • 28. Here is the layout with the labels as number of connections
  • 32. How do we use this?  Finding bottlenecks.  You have to ignore the fact that everyone on this graph is connected to you for a moment.  How would someone get a message to another given person?  They would have to pass it to someone either they both know, or pass the message to someone who is more likely to be connected to the target of the message.  This was the heart of Milgram’s experiment that gave us the concept of 6 degrees of separation.
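The message-passing question is a shortest-path search. Below is a minimal breadth-first sketch over a hypothetical friendship graph; the names and connections are invented for illustration. It returns the chain of people a message would pass through, or None when no chain exists.

```python
from collections import deque

# Hypothetical friendship graph: each person maps to the people they know.
FRIENDS = {
    "ann": {"bob"},
    "bob": {"ann", "cat"},
    "cat": {"bob", "dan"},
    "dan": {"cat"},
    "eve": set(),  # knows nobody in this graph
}


def shortest_path(adj, start, goal):
    """Breadth-first search for the fewest hand-offs between two people."""
    if start == goal:
        return [start]
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in sorted(adj.get(current, ())):
            if neighbor in previous:
                continue
            previous[neighbor] = current
            if neighbor == goal:
                # Walk back from the goal to the start to rebuild the chain.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


print(shortest_path(FRIENDS, "ann", "dan"))
print(shortest_path(FRIENDS, "ann", "eve"))
```

Anyone who appears in the middle of many such chains is a candidate bottleneck.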
  • 33. Other Analysis  What else can be done with Social Network Analysis?  How about risk exposure to banks?  http://www.federalreserve.gov/newsevents/speech/yellen20130104a.htm
  • 35. Application to Business Intelligence  What if the Vertices are not people?  What if the Edges are not mutual connections?  Jonathan and others over the past few meetings have done a great job at explaining the underpinnings of how a particular BI framework is put together.  Within a Data Architecture there are lots of moving pieces. ETL, FTP, SFTP, Web-Services, External data feeds. Data moving into Data Marts, and Data Warehouses. Data Moving between applications.  Let’s imagine how to visualize this using the information we just gained.
  • 36. Data Structure Graph  A Data Structure Graph is a group of atomic entities that are related to each other, stored in a repository, then moved from one persistence layer to another, rendered as a Graph.  A group of atomic entities.  Related to each other.  Stored in a repository.  Moved from one persistence layer to another.  Rendered as a Graph.
  • 37. Introducing Data Structure Graphs  Data Structure Graph Level 1 (DSG-L1) – This is roughly like an Entity Relationship Diagram (ERD): Tables are Vertices, Foreign Keys are Edges.  Data Structure Graph Level 2 (DSG-L2) – Each Vertex in this graph is an application. Each Edge is data transfer. Roughly equivalent to what we used to call Data Flow diagrams.  Data Structure Graph Dependency (DSG-D) – Each vertex is a job, script, program, or process that is dependent on something happening in sequence before it can do its work.  A DSG-L1 can show you where you are going to have the most interesting query performance of your tables.  A DSG-L2 can show you where the most work is going on in your Enterprise.  A DSG-D can show you the sequence of events that need to take place in order for something to be completed.
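The "sequence of events" that a DSG-D exposes is a topological ordering of the dependency graph. A minimal sketch using Python's standard-library graphlib module (Python 3.9 or later) follows; the job names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical DSG-D: each job maps to the jobs that must finish before it.
DEPENDS_ON = {
    "load_staging": {"extract_orders", "extract_customers"},
    "build_warehouse": {"load_staging"},
    "refresh_reports": {"build_warehouse"},
}

# static_order() yields every job after all of its prerequisites.
run_order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(run_order)
```

If the dependencies contain a cycle, static_order() raises graphlib.CycleError, which is itself a useful finding about a job schedule.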
  • 38. New Project, Data Table, Import data.
  • 39. Load as “Edges Table” Source, Target (required)
  • 41. After a few calculations and layout runs
  • 42. PageRank – Which application is most important?
  • 43. A few more tweaks
  • 44. Where is that Node with the highest PageRank?
  • 45. Remember paths? The Original Joke This is me in different stores
  • 46. Dijkstra's algorithm  Some of you may have heard of Dijkstra’s algorithm.  It is a method for finding the shortest path between two nodes on a Graph.  This is a great optimization technique, but what if you need to find the longest path?  What Edge_Label has the most influence on my organization?  Iterate through each Edge_Label, create a subgraph that consists of only the nodes this Edge_Label touches, then calculate the diameter of that Graph.  The data point represented by a given Edge_Label that has the longest path has the most “value” to your organization.
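The per-label procedure can be sketched in plain Python. This is one possible reading of the steps, not the code behind the talk. It assumes edges are treated as undirected when measuring distance, and it takes the diameter to be the longest shortest path found by breadth-first search. The application names and labels are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical DSG-L2 rows: (Source, Target, Edge_Label).
DATA_FLOWS = [
    ("crm", "staging", "customer"),
    ("staging", "warehouse", "customer"),
    ("warehouse", "reporting", "customer"),
    ("billing", "staging", "invoice"),
    ("staging", "warehouse", "invoice"),
]


def subgraphs_by_label(rows):
    """Group edges by Edge_Label into one undirected adjacency map per label."""
    graphs = defaultdict(lambda: defaultdict(set))
    for source, target, label in rows:
        graphs[label][source].add(target)
        graphs[label][target].add(source)
    return graphs


def eccentricity(adj, start):
    """Distance from start to the farthest vertex reachable from it."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adj[current]:
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    return max(distance.values())


def diameter(adj):
    """Longest shortest path between any two connected vertices."""
    return max(eccentricity(adj, vertex) for vertex in adj)


diameters = {
    label: diameter(adj) for label, adj in subgraphs_by_label(DATA_FLOWS).items()
}
widest_label = max(diameters, key=diameters.get)
print(diameters, widest_label)
```

With this sample data the "customer" entity travels through more applications than "invoice", so it is the label with the largest diameter.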
  • 47. https://dougneedham.shinyapps.io/DataStructureGraph Hard to see, I know, but the top diagram is the “master graph”, the bottom image is a single Edge_Label. You can see how an individual data entity flows through an organization.
  • 48. My book goes through a number of examples of doing a Graph analysis of a fictional organization.
  • 49. Consider the following:  If you need assistance, send a message to the group, or contact me directly (I am easy to find @dougneedham)  Network/Graph Analysis is cool.  It can show you some interesting things about your data that you may not have considered.  Due thought should be put towards a network analysis project.  Organizing the data requires a bit of thought. (From -> To vertices is just a start).  Directed graph, undirected, bigraph? Setup work needs to be done.  Tools help with the detailed calculations, and show the paths, walks, etc.
  • 50. What did I leave out?  Graphs that change over time – What happens when you remove a single Edge or Vertex?  Growth of a Network – Erdős–Rényi versus Barabási–Albert models (Random versus Preferential Attachment)  Scale Free networks – Graphs that conform to Power laws. (These are intrinsically Social Networks, but I didn’t give much detail)  Comparing two networks – If you have the same number of edges and nodes, are two graphs the same? Is one graph isomorphic to another?  Contagion – Ceteris paribus, how will things (information, viruses, data, disease…) spread through the network? (Since a DSG represents different types of Edges based on Edge_Label, Contagion should not affect this type of network entirely.)  Large Graphs – GraphX, a part of Apache Spark, is best used for this purpose.  The Strength of Weak Ties paradox  Social Capital
  • 51. Finally… Want to do Data Science?  Challenge for members of the audience.  1. Download Gephi.  2. Put together a simple CSV: Source, Target, Edge_Label that describes your own data environment.  3. Load it in Gephi and have Gephi run the metrics, and perform the auto layout.  4. Answer this question: Did you get what you expected?  5. Get a colleague to do the same thing, compare the images. How similar are they?  Here is my hypothesis: If you have more than 5 data applications, including Hadoop, and Data Warehouse infrastructure, your Graph will follow the rules of preferential attachment. (To<->From ETL tools don’t count in the analysis)  Tweet me @dougneedham #DataStructureGraph (anonymized, of course.)  What does your Graph look like?
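For step 2, the CSV can be produced with Python's standard csv module. The sketch below builds the text in memory. The rows are hypothetical stand-ins for your own data environment, and the header follows the Source, Target, Edge_Label columns named in the challenge.

```python
import csv
import io

# Hypothetical data flows; replace these with your own applications.
ROWS = [
    ("crm", "staging", "customer"),
    ("billing", "staging", "invoice"),
    ("staging", "warehouse", "customer"),
    ("staging", "warehouse", "invoice"),
    ("warehouse", "reporting", "customer"),
]


def to_edges_csv(rows):
    """Render rows as CSV text with a Source, Target, Edge_Label header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Edge_Label"])
    writer.writerows(rows)
    return buffer.getvalue()


edges_csv = to_edges_csv(ROWS)
print(edges_csv)
```

Save the resulting text to a file, for example edges.csv (an arbitrary name), and load it in Gephi as an edges table.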
  • 52. Final Thoughts – Questions?