1
August 22, 2014
Big Data & Analytics - Introduction
Faculty Development Program
@BIET, Davangere
Prasad Chitta
2
#FDPBigDataAnalytics
Discussion Topics
• Data & Processing – small and BIG
• Big Data, Data Science and Art
• Analytics and Optimization
3
Data – A historic perspective
4
The data processing lifecycle
Sensing
Acquiring, Validating
Storing
Transactional Update
Operational Reporting, Dashboards
ETL, Warehousing
OLAP reporting
Analytics
Archiving & Purging
5
Aspects of ‘Data’
Data
Metadata
Master Data
Reference Data
Integration, Migration
Quality
Visualization
6
Data Scenarios…
No Data
• New product design
• Simulation
• Knowledge representation
Structured Data
• From normalized OLTP systems
• Variables, mostly numbers
BIG data
• Unstructured
• Quickly varying
• Mostly alphanumeric
7
Processing of data
Serial, bring data to process, traditional
Parallel, take process to data, modern
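The contrast between the two styles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `PARTITIONS`, `count_words`, `serial_count` and `parallel_count` are hypothetical names, the data is made up, and threads on one machine stand in for the separate nodes that a framework such as Hadoop would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data, already split into partitions
# (think of each inner list as a block stored on a different node).
PARTITIONS = [
    ["big data", "small data"],
    ["data science", "big analytics"],
    ["data data"],
]


def count_words(lines):
    """Count words in one batch of lines; this is the 'process' that travels to the data."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def serial_count(partitions):
    """Traditional style: pull every record into one place, then process it."""
    all_lines = [line for part in partitions for line in part]
    return count_words(all_lines)


def parallel_count(partitions, workers=3):
    """Modern style: run the same function on each partition, then merge the small partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


serial_result = serial_count(PARTITIONS)
parallel_result = parallel_count(PARTITIONS)
```

Both functions return the same counts. The difference is what moves: the serial version moves all the records, while the parallel version moves only the function and the small per-partition summaries.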
8
The Data Explosion
http://pennystocks.la/internet-in-real-time/
9
Big Data – Ingestion to insights
Landing & Staging
Integration Store
Semantic (Logical & Physical)
In-Memory Databases
Visualization Tools & Framework
System of Records
ETL / ELT
10
The Big Data Landscape
11
Analytical Processing of Data
Operational Reporting / MI
OLAP / BI / ETL
Analytics
Content (Unstructured)
Structured
Analytics
Descriptive (Uni- or bivariate)
Diagnostic or Inquisitive
Discovery
Predictive
Predictive
Statistical Techniques
Machine Learning
12
Analytics Landscape Overview
SQL Analytics: Count, Mean, OLAP
Descriptive Analytics: Univariate Distribution, Central Tendency, Dispersion
Data Mining: Association Rules, Clustering, Feature Extraction
Predictive Analytics: Classification, Regression, Forecasting
Simulation: Monte Carlo, Agent-based modeling, Discrete event modeling
Optimization: Linear Optimization, Non-linear Optimization
Spatial
Machine Learning
Text Analytics
BI Advanced Analytics
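As a small illustration of the descriptive end of this landscape (count, central tendency, dispersion), here is a minimal sketch using only Python's standard `statistics` module. The `claims` values and the `describe` helper are hypothetical and not part of the original slides.

```python
import statistics

# Hypothetical claim amounts, for illustration only.
claims = [120.0, 80.0, 100.0, 100.0, 150.0]


def describe(values):
    """Return a few univariate descriptive measures for a list of numbers."""
    return {
        "count": len(values),                      # SQL-style count
        "mean": statistics.mean(values),           # central tendency
        "median": statistics.median(values),       # central tendency
        "stdev": statistics.stdev(values),         # dispersion (sample standard deviation)
        "range": max(values) - min(values),        # dispersion
    }


summary = describe(claims)
```

Predictive analytics, simulation and optimization build on summaries like these but need models, which are beyond this sketch.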
13
Business Value - Analytics Matrix
OLAP Reporting
Drill-thru
Drill-Across
Insights/Limited What-if
Actionable insights
Descriptive Modeling
Describe historical event
Predictive Modeling
Baseline Demand
Impact of Causal Factors
Business Value
Optimization
Linear/Non-linear programming & Simulations
Standard Reporting
Sales, Inventory, Business Performance
Data Management
Internal, Syndicated,
Decision Support
Decision Guidance
Advanced analytics
Why something happened?
What will happen?
What is the best that can happen?
What happened?
Analytics
RTBI
DSS
DSS – Decision Support Systems,
RTBI – Real Time Business Intelligence
Analytics Value Chain
14
Focus Areas for Insurance Analytics
Marketing Analysis
• Customer Lead Management
• Campaign Management
• Channel Profitability Analysis
• Social Media Analytics
Customer Management
• Customer Segmentation
• Customer Churn Analysis
• Lifetime Value Analytics
• Cross-sell & Up-sell Analytics
Claims Management
• Fraud Analytics & Models
• Subrogation Models
• Claims Analysis
Sample KPI and Business Drivers
• Lead conversion rate
• Channel ROI or Effectiveness
• Market share for each channel
• Customer Satisfaction Index
• Profiling of customers
• Customer Attrition/Retention Rate
• % of Repeat Business from customer
• Customer Net Worth and Lifetime Value
• Loss due to Fraudulent claims
• Loss ratios
• Claims Process Cycle ratios
• Claims reserves and Provisions
Underwriting / Risk Management
• Risk Assessment and Evaluation
• Automated Underwriting
• Reinsurance Retention Analysis
• Underwriting Margins / Profit Margins
• Capacity required for Underwriters
• Improve the retentions and profit margins
Insurance Business Analytics for effective decision making by analysing historical data
15
Traditional Analytics Process
Data Consolidation: Extracting and consolidating data from various sources and databases (DB1, DB2)
Final Modeling Universe
Sampling: Generating random samples to create Development & Validation samples (Dev 70%, Val 30%)
Diagnostics: Understanding the data & nature of the variables
• Distribution
• Relationships
• Differences
Data Prep: Cleansing & preparing the data for modeling
• Outlier, Missing Treatment
• Variable Transformation, Derivation
Model Building
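The sampling and data preparation steps above can be sketched roughly as follows. This is a minimal illustration, not the process used in the talk: the `raw` values, the helper names, the median imputation and the fixed capping bounds are all assumptions chosen for brevity.

```python
import random
import statistics


def dev_val_split(records, dev_fraction=0.7, seed=42):
    """Shuffle a copy of the records and split it into development and validation samples."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * dev_fraction)
    return shuffled[:cut], shuffled[cut:]


def impute_missing(values):
    """Replace None with the median of the observed values (one simple missing-value treatment)."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def cap_outliers(values, lower, upper):
    """Clip values to [lower, upper]; the bounds are an analyst's choice, assumed here."""
    return [min(max(v, lower), upper) for v in values]


# Hypothetical modeling universe: one numeric variable with gaps and an outlier.
raw = [10, None, 12, 11, 500, 9, None, 13, 10, 12]

dev, val = dev_val_split(raw)                                  # Sampling: Dev 70%, Val 30%
dev_clean = cap_outliers(impute_missing(dev), lower=0, upper=100)  # Data Prep on the development sample
```

In practice the same treatment rules learned on the development sample would then be applied to the validation sample before model building.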
16
Data Scientist - Skills needed
Business and Domain knowledge
Planning & Architecting Data Science Solutions
Statistical Modeling
Technology Stack – R, Hadoop
Text Mining, Social Network Analysis and Natural Language Processing
Methods and Algorithms in Machine Learning
Optimization and Decision Analysis
Storytelling and Visualization
Privacy, Security and Ethical Concerns
17
Thank You. You can find me on….
