Social Network Analysis with Spark
Social Network Analysis (SNA)
Ghulam Imaduddin
Definition
From the point of view of data mining, a social network is a heterogeneous and
multirelational data set represented by a graph. The graph is typically very
large, with nodes (or vertices) corresponding to objects and edges
corresponding to links representing relationships or interactions between
objects. Both nodes and links have attributes (Han & Kamber, 2006).
[Figure: two subscribers joined by an edge. Example links: call, SMS, IM, balance transfer, …; mention, follow, like, …]
Benefits of SNA
Identify the role of each subscriber in a community:
• Community leader
• Bridge
• Passive
• Follower
Identify high-value or high-prospect communities by looking at:
• Community size
• Closeness
• Members' profiles (device, usage, ARPU, location)
• On-net/off-net share in the community
Suspected same subscriber
Compare two social networks to identify a single subscriber identity behind them.
Further Utilization
• New product campaigns targeting community leaders, bridges, and high-value communities
• Retention program prioritization for community leaders, bridges, and high-value communities
• Product adoption campaigns for followers in communities that have already adopted the product
• Identifying rotational churners, either to exclude them from retention campaigns or to evaluate dealers
• SN variables can be used to enhance other predictive models. For example, social network
variables can increase the lift of a churn model for high-value customers (Imaduddin, 2014)
Social Network Graph Mining
By mining the social network graph, we can extract valuable information such
as:
• Degree (in-degree, out-degree, max-degree). Degree is the number of edges attached
to a vertex (node). A vertex with a high in-degree receives information from many
others; a vertex with a high out-degree sends information to many others.
• PageRank. PageRank measures the importance of each vertex in a graph. If a Twitter user
is followed by many others, the user will be ranked highly. For a CDR-based social network,
reverse the graph direction before running the PageRank function to identify the important vertices.
• Local clustering coefficient (LCC). LCC represents how tightly knit a customer's network is. The
higher the LCC, the more closely connected the network. LCC is derived from the triangle count of
each vertex.
$$LCC = \frac{\#\text{triangle}}{\binom{n}{2}}, \quad n = \#\text{neighbour}$$
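The LCC formula above can be checked by hand on a small graph. The following is a minimal plain-Python sketch, not the GraphX script from the slides: the function name `local_clustering_coefficient` and the toy edge list are hypothetical, and edges are treated as undirected.

```python
from itertools import combinations
from math import comb


def local_clustering_coefficient(edges):
    """Return {vertex: LCC}, treating every edge as undirected."""
    neighbours = {}
    for src, dst in edges:
        if src == dst:
            continue  # ignore self-loops
        neighbours.setdefault(src, set()).add(dst)
        neighbours.setdefault(dst, set()).add(src)

    lcc = {}
    for v, nbrs in neighbours.items():
        n = len(nbrs)
        if n < 2:
            lcc[v] = 0.0  # fewer than two neighbours: no triangle is possible
            continue
        # A triangle through v is a pair of v's neighbours that are linked to each other.
        triangles = sum(1 for a, b in combinations(nbrs, 2) if b in neighbours[a])
        lcc[v] = triangles / comb(n, 2)
    return lcc


# Hypothetical toy graph: one triangle (1, 2, 3) plus a pendant vertex 4.
edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
result = local_clustering_coefficient(edges)
for vertex in sorted(result):
    print(vertex, round(result[vertex], 3))
```

Vertex 3 has three neighbours, so three possible neighbour pairs, of which only one is linked; its LCC is therefore 1/3, while vertices 1 and 2 reach 1.0.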
How To Build
Let's get our hands dirty!
Graph Example
[Figure: graph representation shown next to its data representation]
Script Example – Degree Information
Degree Information Result
[Figure: graph representation]
Result format: (id, total-degree, in-degree, out-degree)
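The script for this slide did not survive extraction, so the following is only a plain-Python sketch of how rows in the (id, total-degree, in-degree, out-degree) format could be produced from a directed edge list. The function name `degree_table` and the sample edges are hypothetical and do not reproduce the graph from the slides.

```python
from collections import Counter


def degree_table(edges):
    """Return rows of (id, total-degree, in-degree, out-degree), sorted by id."""
    in_deg = Counter(dst for _, dst in edges)
    out_deg = Counter(src for src, _ in edges)
    vertices = sorted(set(in_deg) | set(out_deg))
    # Counter returns 0 for a vertex that never appears on that side of an edge.
    return [(v, in_deg[v] + out_deg[v], in_deg[v], out_deg[v]) for v in vertices]


# Hypothetical directed edges (src, dst), e.g. "src called dst".
edges = [(1, 2), (1, 3), (2, 3), (4, 3)]
table = degree_table(edges)
for row in table:
    print(row)

# The max-degree vertex is the row with the largest total degree.
max_degree_row = max(table, key=lambda row: row[1])
```

Here vertex 3 receives three edges and sends none, so it has the highest total degree.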
Script Example – PageRank
PageRank Result
[Figure: graph representation]
Result format: (id, PageRank) and (id, reverse PageRank)
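To illustrate the two result columns, here is a minimal sketch of a textbook, normalized PageRank computed by power iteration, run once on the original edges and once on the reversed edges. It is not the GraphX implementation (whose scaling differs), and the names `pagerank`, `reverse`, the damping factor of 0.85, the iteration count, and the sample call records are all assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration; ranks sum to 1."""
    vertices = sorted({v for edge in edges for v in edge})
    out_links = {v: [] for v in vertices}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by vertices without out-links is spread evenly over all vertices.
        dangling = sum(rank[v] for v in vertices if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in vertices}
        for src in vertices:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        rank = new_rank
    return rank


def reverse(edges):
    """Flip the direction of every edge."""
    return [(dst, src) for src, dst in edges]


# Hypothetical call records (caller, callee).
calls = [(1, 2), (1, 3), (1, 4), (5, 4)]
forward = pagerank(calls)
reversed_rank = pagerank(reverse(calls))

top_forward = max(forward, key=forward.get)
top_reversed = max(reversed_rank, key=reversed_rank.get)
print(top_forward, top_reversed)
```

On this toy data the forward ranking favours vertex 4, which receives the most calls, while the reversed ranking favours vertex 1, which makes the most calls. Which direction is appropriate depends on what "important" should mean for the problem at hand.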
Script Example – Triangle
Triangle Counting Result
[Figure: graph representation]
Result format: (id, #triangle)
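As a rough illustration of the (id, #triangle) output, the sketch below counts, for each vertex, the triangles that pass through it. It is plain Python using neighbour-set intersection rather than the script from the slides; the function name `triangle_counts` and the sample edges are hypothetical, and edge direction is ignored.

```python
def triangle_counts(edges):
    """Return sorted (id, #triangle) pairs, treating edges as undirected."""
    neighbours = {}
    for src, dst in edges:
        if src != dst:  # ignore self-loops
            neighbours.setdefault(src, set()).add(dst)
            neighbours.setdefault(dst, set()).add(src)

    counts = {}
    for v, nbrs in neighbours.items():
        # For each neighbour u, count the neighbours that v and u share.
        # Every triangle through v is found twice, once per other corner.
        shared = sum(len(nbrs & neighbours[u]) for u in nbrs)
        counts[v] = shared // 2
    return sorted(counts.items())


# Hypothetical graph: triangles (1, 2, 3) and (3, 4, 5) share vertex 3.
edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 3)]
counts = triangle_counts(edges)
for row in counts:
    print(row)
```

Dividing each count by the number of neighbour pairs of that vertex gives the local clustering coefficient described earlier.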
Solving Real World Problem
• Define the vertices. Are they subscribers, web pages, or Twitter accounts?
• Define the edges, i.e. how the vertices are connected. E.g. total call minutes in a month > 5
minutes, SMS > 10, etc.
• Identify the mega hubs. A mega hub is a vertex connected to a massive number of vertices
(something like a call center or a spammer). Mega hubs can be removed or processed separately,
depending on the problem.
• Identify the measures needed (PageRank, degree, LCC, triangle count, etc.)
• Build the data source (keep the vertex property data and the connection data separate, and
join them later), and store it distributed on Hadoop.
• Build the code, run it, and feed the results back to the data warehouse or Hadoop for further
utilization
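The edge-definition and mega-hub steps above can be sketched on a handful of records. This is a minimal plain-Python illustration under stated assumptions: the record layout (caller, callee, call minutes, SMS count), the choice to keep an edge when either threshold is exceeded, the `max_degree` cut-off, and the function names are all hypothetical. A production version would run distributed, as the slides suggest.

```python
from collections import Counter


def build_edges(usage, min_call_minutes=5, min_sms=10):
    """Keep a connection when either monthly threshold is exceeded (assumption)."""
    return [
        (caller, callee)
        for caller, callee, minutes, sms in usage
        if minutes > min_call_minutes or sms > min_sms
    ]


def split_mega_hubs(edges, max_degree):
    """Return (hubs, remaining edges) where hubs have total degree above max_degree."""
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    hubs = {v for v, d in degree.items() if d > max_degree}
    kept = [(src, dst) for src, dst in edges if src not in hubs and dst not in hubs]
    return hubs, kept


# Hypothetical monthly usage: (caller, callee, call_minutes, sms_count).
usage = [
    ("A", "B", 12, 0),
    ("A", "C", 1, 2),    # below both thresholds, so no edge
    ("B", "C", 0, 15),
    ("HUB", "A", 30, 0),
    ("HUB", "B", 9, 0),
    ("HUB", "C", 7, 0),
    ("HUB", "D", 6, 0),
]
edges = build_edges(usage)
hubs, kept = split_mega_hubs(edges, max_degree=3)
print(sorted(hubs), kept)
```

The removed hub edges can be stored separately if the hubs need their own analysis rather than being discarded.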
References & Resources
• Han, J., & Kamber, M. (2006). Data Mining Concepts and Techniques. San Francisco: Morgan Kaufmann.
• Imaduddin, G. (2014). Evaluation and Improvement of Churn Model Using Customer Value and Social
Network. Jakarta: Universitas Indonesia.
• Apache Spark Overview. https://spark.apache.org/docs/latest/
• Databricks Training Resources. https://databricks.com/spark-training-resources
• GraphX Programming Guide. https://spark.apache.org/docs/latest/graphx-programming-guide.html
• Social Network Analysis. http://en.wikipedia.org/wiki/Social_network_analysis
• Spark Scala API Doc. https://spark.apache.org/docs/1.1.0/api/scala/index.html#org.apache.spark.package
• The Scala Programming Language. http://www.scala-lang.org/
Appendix
List of Graph Operations in GraphX
