Data Analytics
Dr. Vala Ali Rohani
Vala@um.edu.my
VRohani@gmail.com
My Bio Data
• Postdoctoral scholar in Social Network Analysis
• PhD in Software Engineering (Recommender Systems)
• University Lecturer for more than 10 years
Professional Certificates:
• Social Network Analysis from University of Michigan
• Mining Massive Datasets from Stanford University
• Pattern Discovery in Data Mining from University of Illinois
• Process Mining from Eindhoven University of Technology
• Statistical Analysis using SPSS & SAS from University of Malaya
• MongoDB for DBAs from MongoDB
Data Analytics by Vala Ali Rohani
Presentation outline
Data Science & Big Data
Social Network Analysis
Process Mining
Market Basket Analysis
Data Analytics & Big Data
Domain Terminology
Data Science & Big Data
• Data Analysis, Data Mining, Machine Learning and Mathematical Modeling are
tools: means towards an end.
• Analytics, Business Intelligence, Econometrics and Artificial Intelligence are
application areas: domains that use the tools above (and others) to produce results
within their subject.
• Statistics is a branch of Mathematics providing theoretical and practical support to the
above tools.
• Data Science is a catch-all term for using all those tools to provide answers in
all those areas (and in others), especially when dealing with Big Data.
http://www.quora.com/What-is-the-difference-between-Data-Analytics-Data-Analysis-Data-Mining-Data-Science-Machine-Learning-and-Big-Data-1
Data is the New Oil!
In the last 10 minutes we generated more data than from prehistoric times until 2003!
A data scientist is able to collect, analyze, and interpret data
from a variety of sources (social interaction, business
processes, cyber-physical systems).
Turning data into value!
Four generic data science questions:
1. What happened?
2. Why did it happen?
3. What will happen?
4. What is the best that can happen?
Big data is a broad term for data sets so large or complex that traditional data processing
applications are inadequate.
http://en.wikipedia.org/wiki/Big_data
How Much Data?
1 PB = 1,000,000,000,000,000 B = 10^15 bytes = 1,000 terabytes.
• Google processes 20 PB a day (2008)
• Facebook has 2.5 PB of user data + 15 TB/day (4/2009)
• Each engine of a Boeing 747 generates 20 TB of information per hour
Some Big Data Theories and Techniques
Map-Reduce
Market Basket Analysis
Pattern Discovery
Social Network Analysis
Process Mining
Social Network Analysis (SNA)
Everything is connected
When you sell items …
When you receive customer calls …
When you make a contract …
When you ship orders …
What are Networks?
• Networks are sets of nodes connected by edges
“Network” ≡ “Graph”
node
edge
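As a minimal sketch of the node-and-edge definition, a small undirected network can be stored as an adjacency list. The four nodes and the helper name `build_adjacency` are made up for illustration and do not come from the slides.

```python
# Hypothetical edge list: each pair is one undirected edge between two nodes.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]


def build_adjacency(edge_list):
    """Return a dict mapping each node to the set of its neighbours."""
    adjacency = {}
    for u, v in edge_list:
        # An undirected edge is recorded in both directions.
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


graph = build_adjacency(edges)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

A dictionary of sets keeps neighbour lookups cheap and is enough for the small examples in this section; dedicated graph libraries offer the same idea with more features.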
What is SNA?
SNA (Social Network Analysis) is the mapping and measuring of relationships and
flows between people, groups, organizations, computers, URLs, and other
connected entities
SNA provides both a visual and a
mathematical analysis of human
relationships.
Why do we need Social Network Analysis?
• Are nodes connected through the network?
• How far apart are they?
• Are some nodes more important due to their position in the
network?
• What will the patterns of information diffusion look like?
• Is the network composed of communities?
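The first two questions above (are nodes connected, and how far apart are they) can be answered with a breadth-first search. The sketch below assumes an unweighted, undirected network stored as a dict of neighbour sets; the network and the function name `bfs_distances` are hypothetical.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Return the hop distance from source to every node reachable from it."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:  # first visit gives the shortest distance
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


# Hypothetical network: A - B - C form a chain, D is isolated.
network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": set()}
dist_from_a = bfs_distances(network, "A")

print("A and C connected:", "C" in dist_from_a)
print("A and D connected:", "D" in dist_from_a)
print("distance A to C:", dist_from_a["C"])
```

A node that is missing from the returned dictionary is not reachable from the source, which answers the connectivity question without a separate pass.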
Now,
let’s see some samples of SNA …
Internet
Structure of the Internet at the level of autonomous systems. Data source: Mark
Newman, http://www-personal.umich.edu/~mejn/netdata/.
Political Blogs
2004 United States Presidential Election Network
Liberals
Conservatives
Facebook Friendship Network
SNA in Organizations (or ONA)
SNA Metrics :
Degree
Betweenness
Closeness
SNA Main Centrality Metrics
Degree
The number of direct connections that a node has
$$d_i = \frac{\sum_j a_{ij}}{n - 1}$$
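A small sketch of the normalized degree formula, assuming the network is given as a symmetric 0/1 adjacency matrix `a` with `n` rows. The star-shaped example network and the function name are illustrative assumptions.

```python
def degree_centrality(adj_matrix):
    """Normalized degree: row sum of the adjacency matrix divided by n - 1."""
    n = len(adj_matrix)
    return [sum(row) / (n - 1) for row in adj_matrix]


# Hypothetical star network: node 0 is linked to nodes 1, 2 and 3.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
centrality = degree_centrality(star)
print(centrality)
```

The hub reaches the maximum value of 1 because it is connected to all other n - 1 nodes, while each leaf has a single connection out of three possible ones.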
Betweenness
Betweenness centrality identifies an entity's position within a network in terms of its
ability to make connections to other pairs or groups in a network.
$$C_B(i) = \sum_{j<k} \frac{g_{jk}(i)}{g_{jk}}$$
$$C'_B(i) = \frac{C_B(i)}{(n-1)(n-2)/2}$$
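The betweenness definition can be sketched directly: for every pair j < k, count the shortest paths g_jk and the share of them that pass through i. The code below is a straightforward version for small undirected, unweighted networks, not an optimized algorithm; the example network and all function names are assumptions made for illustration.

```python
from collections import deque
from itertools import combinations


def shortest_path_counts(adjacency, source):
    """BFS returning hop distances and number of shortest paths from source."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adjacency[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                count[nb] = 0
                queue.append(nb)
            if dist[nb] == dist[node] + 1:
                # Every shortest path to node extends to a shortest path to nb.
                count[nb] += count[node]
    return dist, count


def betweenness(adjacency, normalized=True):
    """Sum over pairs j < k of g_jk(i) / g_jk, optionally divided by (n-1)(n-2)/2."""
    nodes = list(adjacency)
    n = len(nodes)
    info = {v: shortest_path_counts(adjacency, v) for v in nodes}
    scores = {v: 0.0 for v in nodes}
    for j, k in combinations(nodes, 2):
        dist_j, count_j = info[j]
        if k not in dist_j:
            continue  # j and k are not connected
        for i in nodes:
            if i == j or i == k:
                continue
            dist_i, count_i = info[i]
            # i lies on a shortest j-k path when the distances add up exactly.
            if i in dist_j and k in dist_i and dist_j[i] + dist_i[k] == dist_j[k]:
                scores[i] += count_j[i] * count_i[k] / count_j[k]
    if normalized and n > 2:
        pairs = (n - 1) * (n - 2) / 2
        scores = {v: s / pairs for v, s in scores.items()}
    return scores


# Hypothetical network: two triangles (A, B, C) and (E, F, G) bridged by D.
bridge = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D", "F", "G"}, "F": {"E", "G"}, "G": {"E", "F"},
}
scores = betweenness(bridge)
for node in sorted(scores):
    print(node, round(scores[node], 3), "degree", len(bridge[node]))
```

In this made-up network the bridge node D has only two connections yet the highest betweenness, because every shortest path between the two triangles runs through it.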
Closeness
Closeness centrality measures how quickly an entity can access more entities in a
network.
$$C_C(i) = \left[\sum_{j=1}^{N} d(i,j)\right]^{-1}$$
$$C'_C(i) = C_C(i) \cdot (N-1)$$
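A minimal sketch of closeness centrality for an unweighted, connected network: sum the hop distances d(i, j) from node i to all other nodes and invert the total. The normalized value here multiplies the raw score by N - 1, the usual convention, so that a node adjacent to every other node scores 1. The star network and the function name are hypothetical.

```python
from collections import deque


def closeness(adjacency, node, normalized=True):
    """Inverse of the summed shortest-path distances from node to all others."""
    distances = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    total = sum(distances.values())
    if total == 0:
        return 0.0  # isolated node: no other node is reachable
    raw = 1.0 / total
    # Assumption: the network is connected, so N - 1 other nodes are reachable.
    return raw * (len(adjacency) - 1) if normalized else raw


# Hypothetical star network with one hub and three leaves.
star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
hub_score = closeness(star, "hub")
leaf_score = closeness(star, "a")
print(hub_score, leaf_score)
```

The hub is one step from everyone, so its distances sum to 3; a leaf needs one step to the hub and two steps to each other leaf, so its distances sum to 5 and its score is lower.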
SNA Tools:
NodeXL
Gephi
UCINET
Key nodes in Organization
(from ONA view)
Find a node that has high betweenness but
low degree
Find a node that has low betweenness but
high degree
Process Mining
https://www.coursera.org/course/procmin
What is Process Mining?
Process mining is the missing link between model-based process analysis and
data-oriented analysis techniques.
Process mining seeks the confrontation between event data (i.e., observed behavior)
and process models (hand-made or discovered automatically).
Some example applications include:
• Analyzing treatment processes in hospitals
• Improving customer service processes
• Understanding the browsing behavior of customers using a booking site
• Analyzing failures of a baggage handling system
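One simple way to get from event data to a process model is to count which activity directly follows which within each case. The sketch below builds such a directly-follows count from a tiny, made-up event log of (case id, activity, timestamp) tuples; the log layout, the activity names, and the function name are assumptions, and real process discovery algorithms go further than this.

```python
from collections import Counter, defaultdict


def directly_follows(event_log):
    """Count how often activity a is immediately followed by b in the same case."""
    traces = defaultdict(list)
    # Order events by case and then by timestamp to rebuild each trace.
    for case_id, activity, _timestamp in sorted(event_log, key=lambda e: (e[0], e[2])):
        traces[case_id].append(activity)
    counts = Counter()
    for activities in traces.values():
        counts.update(zip(activities, activities[1:]))
    return counts


# Hypothetical event log: (case id, activity, timestamp).
log = [
    ("c1", "register", 1), ("c1", "check", 2), ("c1", "pay", 3),
    ("c2", "register", 1), ("c2", "pay", 2),
]
dfg = directly_follows(log)
for (a, b), freq in sorted(dfg.items()):
    print(f"{a} -> {b}: {freq}")
```

The resulting pairs can be drawn as a graph whose edge weights show how often each step-to-step transition was actually observed.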
Process mining use cases
• What is the process that people really follow?
• Where are the bottlenecks in the studied process?
• Where do people (or machines) deviate from the expected or
idealized process?
• What are the "highways" in my process?
• What factors are influencing a bottleneck?
• Can we predict problems (delay, deviation, risk, etc.) for
running cases?
• Can we recommend improvements to the organization's main
process?
• How can we redesign the process / organization / machine?
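The bottleneck question can be approached by measuring how long cases wait between consecutive activities. The following sketch averages those waiting times over a small, invented event log and reports the slowest transition; the log, its time units, and the function name are assumptions for illustration only.

```python
from collections import defaultdict


def mean_transition_times(event_log):
    """Average elapsed time between consecutive activities, per activity pair."""
    traces = defaultdict(list)
    for case_id, activity, timestamp in event_log:
        traces[case_id].append((timestamp, activity))
    durations = defaultdict(list)
    for events in traces.values():
        events.sort()  # order each case by timestamp
        for (t1, a1), (t2, a2) in zip(events, events[1:]):
            durations[(a1, a2)].append(t2 - t1)
    return {pair: sum(v) / len(v) for pair, v in durations.items()}


# Hypothetical event log: (case id, activity, timestamp in hours).
log = [
    ("c1", "receive order", 0), ("c1", "approve", 5), ("c1", "ship", 7),
    ("c2", "receive order", 1), ("c2", "approve", 10), ("c2", "ship", 11),
]
means = mean_transition_times(log)
bottleneck = max(means, key=means.get)
print("slowest transition:", bottleneck, "mean wait:", means[bottleneck])
```

In this toy log the wait between receiving and approving an order dominates, so that transition would be the first place to look for a bottleneck.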
Some Examples of Real Discovered Processes
Market Basket Analysis
Introduction
Market Basket Analysis (MBA) is a data mining technique which is widely used in the
consumer packaged goods (CPG) industry to identify which items are purchased together
and, more importantly, how the purchase of one item affects the likelihood of another
item being purchased.
Bill Qualls, First Analytics, Raleigh, NC, Introduction to Market Basket Analysis
SALES TRANSACTIONS
Bill Qualls, First Analytics, Raleigh, NC, Introduction to Market Basket Analysis
Our imaginary store sells the following items: bananas, bologna, bread, buns, butter, cereal,
cheese, chips, eggs, hotdogs, mayo, milk, mustard, oranges, pickles, and soda. We have
recorded 20 sales transactions as follows:
MBA Theories:
Bill Qualls, First Analytics, Raleigh, NC, Introduction to Market Basket Analysis
Support for itemset I = the number of baskets containing all items in I.
Given a support threshold s, sets of items that appear in at least s baskets are called
frequent itemsets.
Association rules are if-then rules about the contents of baskets: {i1,…,ik} → j means
that a basket containing all of i1,…,ik is likely to contain j.
The confidence of this association rule is the probability of j given i1,…,ik.
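The definitions of support, frequent itemsets, and confidence can be sketched in a few lines. The four baskets below are invented for illustration and are not the 20 transactions of the imaginary store; the function names and the threshold value are likewise assumptions.

```python
from itertools import combinations


def support(baskets, itemset):
    """Number of baskets that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for basket in baskets if items <= basket)


def frequent_pairs(baskets, threshold):
    """All item pairs whose support is at least the threshold s."""
    all_items = sorted(set().union(*baskets))
    return {
        pair: support(baskets, pair)
        for pair in combinations(all_items, 2)
        if support(baskets, pair) >= threshold
    }


def confidence(baskets, antecedent, consequent):
    """Estimated probability of consequent given all items in antecedent."""
    base = support(baskets, antecedent)
    if base == 0:
        return 0.0
    return support(baskets, set(antecedent) | {consequent}) / base


# Hypothetical baskets, not taken from the slides.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
]
pairs = frequent_pairs(baskets, threshold=2)
rule_confidence = confidence(baskets, {"bread"}, "milk")
print(pairs)
print("confidence of {bread} -> milk:", round(rule_confidence, 3))
```

Here bread appears in three baskets and bread with milk in two, so the rule {bread} → milk has a confidence of two thirds. Enumerating every pair is fine for a handful of items; larger catalogues need a pruning strategy.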
Market Basket Example
Thank you
