Company Confidential - For Internal Use Only
Copyright © 2017, SAS Institute Inc. All rights reserved.
ANALYTICS TOOLS AND METHODS:
PRACTITIONER PERSPECTIVES
Guest Lecturer
Scott Allen Mongeau
Data Scientist
Cyber Analytics
Cell: + 31 (0)6 8370 3097
scott.mongeau@sas.com
BIG DATA AND BUSINESS ANALYTICS
Masters of Business and Information Management
2016 - 2017
dr Jan van Dalen
Education
• PhD (ABD)
• MBA
• MA Financial Mgmt
• Cert. Finance
• GD IT Mgmt
• MA Com Tech
Experience
• SAS Institute
Sr. Mgr. Business Solutions
• Deloitte
Manager Analytics
• Nyenrode University
Lecturer Analytics
• SARK7
Owner / Principal Consultant
• Genentech Inc. / Roche
Principal Analyst / Sr. Mgr.
• Atradius
Sr. R&D Engineer
• CFSI
CIO
Data Scientist
Cyber Analytics
scott.mongeau@sas.com
+31 (0)64 235 3427
Scott Allen Mongeau
Certified Analytics Professional (CAP)
YouTube
• Introduction to Advanced Analytics
• Introduction to Cognitive Analytics
• TedX RSM: Data Analytics
Blog: sctr7.com
Twitter: sark7
Web: sark7.com
IT solutions • Research methods • Finance • Data analytics • Consulting
• 40 years of BUSINESS ANALYTICS
• #1 DATA ANALYTICS MARKET LEADER
• 14,000 SAS employees worldwide
• 93 of the top 100 companies on the GLOBAL 500 LIST
• 80,000+ customer sites in 148 countries
• US $3.2 B: continuous revenue growth since 1976
• 23% annual reinvestment in R&D
• World’s LARGEST privately held software company
ANALYTICS SOLUTIONS
• FORECASTING: Leveraging historical data to drive better insight into proactive decision-making
• DATA MINING / MACHINE LEARNING: Mine transaction databases to create models of likely outcomes
• TEXT ANALYTICS: Finding treasures in unstructured data like social media or survey tools that could uncover insights about key business challenges
• OPTIMIZATION: Analyze massive amounts of data in order to accurately identify areas likely to produce the most profitable results
• STATISTICS
• Data Management (Integration, Quality & Governance)
MOORE’S LAW: EXPONENTIAL GROWTH OF COMPUTING POWER
[Figure: growth of computing power (25,000 x) across eras: Home computers; High-capacity servers; Smartphone explosion; Cloud, AI / Watson, IoT (2015)]
PEOPLE & ORGANIZATION:
DATA SCIENTIST ROLE
Calvin.Andrus (2012) http://en.wikipedia.org/wiki/File:DataScienceDisciplines.png
SEEKING THE
‘DATA SCIENTIST’
DATA SCIENCE
PROFESSIONAL
PERSPECTIVES
http://www.oreilly.com/data/free/2016-data-science-salary-survey.csp
DATA SCIENCE AS
PEOPLE/ROLES
DATA SCIENCE PEOPLE/ROLES
DATA ANALYTICS
• Data science
• Statistician
• Data miner / machine learning
• Text analytics / mining
BUSINESS ANALYTICS
• Business analyst
• BI solutions
• Visualization / interface design
• Functional domain specialty (e.g. marketing analytics)
DATA MANAGEMENT
• Information / data architecture
• Database management
• Data engineering
• Data quality / governance / MDM
OPERATIONS
• Analytics engineering / operations
• Security
• IT systems management
BUSINESS / ORGANIZATIONAL
• Decision management
• Change management
• Analytics project management
• Domain expert / functional specialty / business manager
CORE DATA SCIENCE SKILLSET
IT
• BI/reports/dashboards
• Programming
• Systems/software dev
• Algorithms
• Systems administration
• User interface design/visualization
Mathematics
• Econometrics
• Graph analysis
• Matrix mathematics
• Multivariate analysis
• Probability
• Survival analysis
• Statistics
• Spatial analysis
• Temporal analysis
Business Domain
• Finance
• Operations
• Sales/marketing
• HR
Data Engineering
• Big & fast data solutions
• Data manipulation/ETL
• Database design
• Data structures
• Graphical
• NoSQL
• Unstructured data
Data Science
• Machine Learning
• Optimization
• Predictive analytics
• Simulation
• Text/semantic analytics
Research
• Scientific method
• Experimental design
• Research methodologies
• Social science methods
• Survey research
TECHNOLOGY & TOOLS:
DATA ANALYTICS TOOLS & TECH
APPLIED TECHNIQUES & TECHNOLOGIES
• Algorithms (ex: computational complexity, CS theory)
• Back-end programming (ex: Java/Rails/Objective-C)
• Bayesian/Monte Carlo statistics (ex: MCMC, BUGS)
• Big and distributed data (ex: Hadoop, Map/Reduce)
• Business (ex: management, business development, budgeting)
• Classical statistics (ex: general linear model, ANOVA)
• Data manipulation (ex: regexes, R, SAS, web scraping)
• Front-end programming (ex: JavaScript, HTML, CSS)
• Graphical models (ex: social networks, Bayes networks)
• Machine learning (ex: decision trees, neural nets, SVM, clustering)
• Math (ex: linear algebra, real analysis, calculus)
• Optimization (ex: linear, integer, convex, global)
• Science (ex: experimental design, technical writing/publishing)
• Simulation (ex: discrete, agent-based, continuous)
• Solutions development (ex: design, project management)
• Spatial statistics (ex: geographic covariates, GIS)
• Structured data (ex: SQL, JSON, XML)
• Surveys and marketing (ex: multinomial modeling)
• Systems administration (ex: *nix, DBA, cloud tech.)
• Temporal statistics (ex: forecasting, time-series analysis)
• Unstructured data (ex: NoSQL, text mining)
• Visualization (ex: statistical graphics, mapping, web-based dataviz)
SOURCE: “Analyzing the Analyzers”
http://www.datasciencecentral.com/profiles/blogs/how-to-become-a-data-scientist?overrideMobileRedirect=1
DATA SCIENCE
TECHNOLOGIES
TOOLS:
MULTIPLE TOOLS
DATA SCIENCE
TECHNOLOGIES
TOOLS:
DBMS AND HADOOP...
DATA SCIENCE
TECHNOLOGIES
Enterprise Big Data
Browser Open Source
TOOLS
PROCESS & METHODS:
DATA ANALYTICS
[Figure: VALUE against SOPHISTICATION, rising from DESCRIPTIVE ("What happened?") through PREDICTIVE ("What are trends?") to PRESCRIPTIVE ("What to do?"). Associated disciplines: Business Intelligence (BI), Econometrics, Forecasting, Machine Learning, Operations Management]
[Figure: analytics maturity (Transactional to Strategic; Aspirational to Transformed) against business value, moving toward Advanced Analytics. Stages: DATA QUALITY; DESCRIPTIVE; DIAGNOSTICS; PREDICTIVE; PRESCRIPTIVE; SEMANTIC. Labels: Data visualization; Business Intelligence; Understanding Patterns; Identifying Factors & Causes; Forecasting & Probabilities; Optimizing Systems; Understanding Social Context & Meaning]
CRISP-DM
Provost; Fawcett. Data Science for Business
Chapter 2: Business Problems and Data Science Solutions
SAS ANALYTICS LIFECYCLE
• PROBLEM FRAMING
• DATA SELECTION & GATHERING
• DATA EXPLORATION
• TRANSFORM & SELECT
• MODEL BUILDING
• MODEL VALIDATION
• MODEL DEPLOYMENT
• EVALUATE & MONITOR RESULTS
Phases: FRAMING & DISCOVERY; EXPLANATION & PREDICTION
Fair use: illustrate publication and article of issue in question. The Economist.
http://en.wikipedia.org/wiki/Category:Fair_use_The_Economist_magazine_covers
Wikipedia commons http://en.m.wikipedia.org/wiki/File:Mond-vergleich.svg
Scientific test…
Public domain Agricultural Research Service
http://en.wikipedia.org/wiki/File:Orange_juice_1.jpg
GNU Free Documentation License: Ibanix Suzuki Shahid DL650 motorcycle
http://commons.wikimedia.org/wiki/File:Suzuki_vstrom_dl650_motorcycle.jpg
PREDICTIVE ANALYTICS:
SUPERVISED MACHINE LEARNING
Supervised learning - predictive
• k-nearest neighbors (k-NN)
• Decision Trees (DT)
(random forests, boosted trees)
• Naïve Bayes classifier
• Neural networks
• Support Vector Machine (SVM)
• Ensembles / Ensemble Learning
Decision Tree
Machine Learning
Support Vector Machines
MACHINE LEARNING PREDICTION (SUPERVISED)
[Figure: CAR Engine. A training set and a validation set; classes Non-criminal (NORMAL) versus Criminal (UNUSUAL)]
• Input variables: Device, Time of day, Source location, IP, Threat intelligence, Amount, Destination location
• Profiles: At risk profile, Secure profile
• Profile attributes: Known devices, Average amount, Known location, Known destination
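To make the training set / validation set idea concrete, here is a minimal sketch in plain Python of the simplest possible decision tree (a one-level "decision stump") that labels transactions NORMAL or UNUSUAL by amount. The toy data, the single `amount` feature, and the function names are illustrative assumptions; none of this is taken from the slides or from any SAS tool.

```python
# Hypothetical toy data: (amount, label), where 1 = UNUSUAL and 0 = NORMAL.
train = [(20, 0), (35, 0), (50, 0), (60, 0), (400, 1), (520, 1), (610, 1), (45, 0)]
valid = [(30, 0), (55, 0), (480, 1), (700, 1), (90, 0), (260, 0)]


def fit_stump(rows):
    """Pick the amount threshold that misclassifies the fewest training rows."""
    amounts = sorted({amount for amount, _ in rows})
    # Candidate thresholds: midpoints between consecutive distinct amounts.
    candidates = [(a + b) / 2 for a, b in zip(amounts, amounts[1:])]
    best_threshold, best_errors = None, len(rows) + 1
    for threshold in candidates:
        errors = sum((amount > threshold) != bool(label) for amount, label in rows)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def predict(threshold, amount):
    """Return 1 (UNUSUAL) when the amount exceeds the learned threshold."""
    return int(amount > threshold)


def accuracy(threshold, rows):
    """Fraction of rows whose predicted label matches the actual label."""
    hits = sum(predict(threshold, amount) == label for amount, label in rows)
    return hits / len(rows)


threshold = fit_stump(train)                 # learned on the training set only
train_accuracy = accuracy(threshold, train)
valid_accuracy = accuracy(threshold, valid)  # checked on unseen cases
print(threshold, train_accuracy, round(valid_accuracy, 3))
```

The stump fits the training set perfectly, yet one large but legitimate validation transaction is misclassified. That gap between training and validation accuracy is what the validation set is there to reveal.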
EXAMPLE MACHINE LEARNING TOOLS
Open source
• R
• Python
• Weka
Commercial
• Base SAS & JMP
• SAS Enterprise Miner
• IBM SPSS
• Oracle Data Mining
• RapidMiner
Ranjit Bose, (2009),"Advanced analytics: opportunities and challenges",
Industrial Management & Data Systems, Vol. 109 Iss 2 pp. 155 - 172
http://dx.doi.org/10.1108/02635570910930073
MACHINE LEARNING
ENGINES
WEKA SAS Enterprise Miner
DEMO: SAS ENTERPRISE MINER
[Screenshot: labelled areas of the IDE: Workflow, Configuration, Models / utilities, Data]
• Data preparation
• Model development
• Model management
• Model deployment
http://www.sas.com/en_gb/insights/articles/analytics/Industrialize-your-analytics-today.html
CONFUSION MATRIX
A confusion matrix separates out the decisions made by the classifier, making explicit how one class is being confused for another. In this way different sorts of errors may be dealt with separately.
Provost; Fawcett. Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking. Chapter 7: Decision Analytic Thinking
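As a small illustration of the definition above, the following sketch tallies a two-class confusion matrix in plain Python. The eight labels are made-up values chosen only for the example.

```python
from collections import Counter

# Hypothetical labels for eight cases: 1 = positive class, 0 = negative class.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

# Count each (actual, predicted) combination.
counts = Counter(zip(actual, predicted))
tp = counts[(1, 1)]  # true positives
fn = counts[(1, 0)]  # false negatives: positives confused for negatives
fp = counts[(0, 1)]  # false positives: negatives confused for positives
tn = counts[(0, 0)]  # true negatives

accuracy = (tp + tn) / len(actual)
true_positive_rate = tp / (tp + fn)
false_positive_rate = fp / (fp + tn)

print("            predicted 1  predicted 0")
print(f"actual 1    {tp:11d}  {fn:11d}")
print(f"actual 0    {fp:11d}  {tn:11d}")
print(accuracy, true_positive_rate, false_positive_rate)
```

Keeping the four cells separate, rather than reporting accuracy alone, is what allows false positives and false negatives to be given different costs.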
RECEIVER OPERATING CHARACTERISTICS (ROC) & AREA UNDER THE CURVE (AUC)
“A ROC graph is a two-dimensional plot of a classifier with false positive rate on the x axis against true positive rate on the y axis. ROC graph depicts relative trade-offs that a classifier makes between benefits (true positives) and costs (false positives).”
Provost; Fawcett. Data Science for Business. Chapter 8: Visualizing Model Performance
Area Under the Curve (AUC): area under a classifier’s curve expressed as a fraction of the unit square. Its value ranges from zero to one.
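The ROC and AUC definitions above can be sketched in a few lines of plain Python. The labels and scores are invented for illustration, and the sketch assumes all scores are distinct (tied scores would need to be grouped before plotting).

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points.

    The threshold is lowered one case at a time, from the highest score down.
    Assumes distinct scores and at least one case of each class.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical classifier output: 1 = positive case, score = model confidence.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(labels, scores)
area = auc(points)
print(points)
print(round(area, 4))
```

Here eight of the nine possible (positive, negative) pairs are ranked correctly, so the area is 8/9. A classifier that ranks at random would sit near the diagonal with an area close to 0.5.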
CUMULATIVE RESPONSE / LIFT CURVE
• Lift: how much the line representing the model performance is lifted up over the random performance diagonal
• Example: “our model gives a two times (or a 2X) lift” means that at the chosen threshold (often not mentioned), the lift curve shows that the model’s targeting is twice as good as random
Provost; Fawcett. Data Science for Business. Chapter 8: Visualizing Model Performance
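A minimal sketch of the "2X lift" statement, assuming lift is measured as the response rate among the top-scored fraction of cases divided by the overall response rate. The ten labels and scores are made up for the example, and `lift_at` is a hypothetical helper name.

```python
def lift_at(labels, scores, fraction):
    """Lift = response rate in the top-scored fraction / overall response rate."""
    ranked = sorted(zip(scores, labels), reverse=True)
    n_targeted = max(1, round(fraction * len(ranked)))
    targeted_rate = sum(label for _, label in ranked[:n_targeted]) / n_targeted
    base_rate = sum(labels) / len(labels)
    return targeted_rate / base_rate


# Hypothetical campaign data: 1 = responded, score = model's predicted propensity.
labels = [1, 1, 0, 1, 0, 1, 0, 1, 0, 0]
scores = [0.95, 0.90, 0.85, 0.80, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]

lift = lift_at(labels, scores, 0.2)       # target the top 20% of cases
full_lift = lift_at(labels, scores, 1.0)  # targeting everyone gives no lift
print(lift, full_lift)
```

The threshold matters: the same model shows a lift of 2.0 at the top 20% and 1.0 when everyone is targeted, which is why a quoted lift should always state the fraction it refers to.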
DESCRIPTIVE ANALYTICS:
UNSUPERVISED MACHINE LEARNING
Unsupervised learning
• Cluster analysis
• Factor analysis
• Self-Organizing Maps (SOMs)
k-means clustering
Machine Learning
MACHINE LEARNING: R / RStudio
[Screenshot: labelled areas of RStudio: Workflow, Configuration, Data, Results, Scripting environment, Graphical results, Models]
DESCRIPTIVE (UNSUPERVISED): CLUSTER ANALYSIS FOR PATTERN DETECTION
Cluster Analysis using SAS Enterprise Guide
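The slide demonstrates cluster analysis in SAS Enterprise Guide. As a tool-neutral stand-in, the sketch below runs plain k-means on six made-up 2-D points in standard-library Python. Taking the first k points as initial centroids is a simplifying assumption for reproducibility, not a recommended initialization.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on 2-D points; initial centroids are the first k points."""
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for members, old in zip(clusters, centroids):
            if members:
                xs = [x for x, _ in members]
                ys = [y for _, y in members]
                new_centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        centroids = new_centroids
    return centroids, clusters


# Hypothetical observations forming two visibly separate groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

No labels are supplied: the algorithm recovers the two groups from the distances alone, which is the sense in which the method is unsupervised.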
BIG DATA:
BACKGROUND AND EXAMPLE
ONLINE IN 60 SECONDS…
Qmee
http://blog.qmee.com/qmee-online-in-60-seconds/
DATA ANALYTICS DRIVERS: V4C
Social and mobile • Data analytics • Interactive platforms • Real-time systems
V4C
• VOLUME
• VELOCITY
• VARIETY
• VARIABILITY
• COMPLEXITY
MODEL ERRORS: INHERENT RANDOMNESS
• Cases where prediction is not “deterministic”
• Bayes rate: the theoretical maximum accuracy (equivalently, the minimum achievable error, the Bayes error rate) that any model can reach for a problem
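As a hedged aside (standard textbook background, not a formula from the slides): for a classification problem with classes $k$ and features $X$, the ceiling described on this slide can be written as follows, where $P(Y = k \mid X)$ is the true, usually unknown, class probability.

```latex
\text{maximum accuracy} = \mathbb{E}_{X}\Big[\max_{k} P(Y = k \mid X)\Big],
\qquad
\text{Bayes error rate} = 1 - \mathbb{E}_{X}\Big[\max_{k} P(Y = k \mid X)\Big]
```

For example, if the outcome is effectively a 90/10 coin flip even when every feature is known, no model can exceed 90% accuracy on those cases, however much data is collected.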
MODEL ERRORS: BIAS
• Bias: even with ‘Big Data’, a model never reaches the perfect accuracy of the true model
• Example
  • Linear regression model to predict response to an advertising campaign…
  • Model is an abstraction…
  • True model is always more complex
MODEL ERRORS: VARIANCE
• Variance: procedures with more variance tend to produce models with larger errors
• Accuracy tends to vary across training sets
• Given a finite sample set…
  • Different models emerge from different samples
  • Different models tend to have different accuracy
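The three error sources named on these slides (inherent randomness, bias, variance) correspond to the standard decomposition of expected squared error for a regression model. This is textbook background rather than a formula from the slides. It assumes $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, where $f$ is the true function and $\hat{f}$ is the model fitted on a random training set.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{inherent randomness}}
```

The expectations are taken over training sets and noise. More data or a better model can shrink the first two terms, but not the third.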
BALANCE: BIAS VERSUS VARIANCE
Big Data
• Complex model
• Many variables
• Low bias…
• but high variance
• Subject to overfitting
Strong models
• Tested abstraction
• Few, but significant variables
• Low variance…
• but high bias
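The trade-off can be observed in a small simulation. The sketch below is an illustration under stated assumptions, not the lecturer's example: the true relationship is taken to be y = x² plus Gaussian noise, the "strong but simple" model predicts the training mean, and the "complex" model is a 1-nearest-neighbour lookup. Both are refitted on many random training sets and compared at one test point.

```python
import random
import statistics

random.seed(0)  # fixed seed so the simulation is repeatable


def true_f(x):
    """Assumed true relationship (hypothetical): y = x^2."""
    return x * x


def sample_training_set(n=20, noise=0.3):
    """Draw n noisy observations with x uniform on [0, 1]."""
    xs = [random.random() for _ in range(n)]
    return [(x, true_f(x) + random.gauss(0, noise)) for x in xs]


def mean_model(train, x0):
    """Rigid model: always predict the average training response."""
    return statistics.fmean([y for _, y in train])


def nearest_neighbour_model(train, x0):
    """Flexible model: predict the response of the closest training point."""
    return min(train, key=lambda row: abs(row[0] - x0))[1]


def bias_and_variance(model, x0, runs=2000):
    """Refit on fresh training sets and summarise the predictions at x0."""
    predictions = [model(sample_training_set(), x0) for _ in range(runs)]
    bias_sq = (statistics.fmean(predictions) - true_f(x0)) ** 2
    variance = statistics.pvariance(predictions)
    return bias_sq, variance


x0 = 0.9
simple_bias_sq, simple_var = bias_and_variance(mean_model, x0)
flexible_bias_sq, flexible_var = bias_and_variance(nearest_neighbour_model, x0)
print("mean model:        bias^2 =", round(simple_bias_sq, 4), "variance =", round(simple_var, 4))
print("nearest neighbour: bias^2 =", round(flexible_bias_sq, 4), "variance =", round(flexible_var, 4))
```

The rigid model is stable across training sets but systematically wrong at x0, while the flexible model is right on average but swings with every new sample, mirroring the two columns on the slide.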
Jno. T-62 tank in Russian service. http://www.aviation.ru/jno/Kubinka02
http://commons.wikimedia.org/wiki/File:T-62_tank_in_Russian_service_(2).jpg
Statistical Learning with Big Data
http://web.stanford.edu/~hastie/TALKS/SLBD_new.pdf
EXPLANATION:
CAUSAL MODELING
EXPLANATORY ANALYTICS
• Explanatory performance is NOT EQUAL to predictive efficacy (and vice versa): this reflects the difference between inductive and deductive methods/thinking
• This is a (sometimes heated) methodological debate amongst practitioners/academics…
• Is it really a debate, or a religious (professional/Kuhnian) dispute? Econometrics + machine learning (H. Varian)
• Varian, Hal R. 2014. Machine Learning and Econometrics. Stanford lecture slides:
https://web.stanford.edu/class/ee380/Abstracts/140129-slides-Machine-Learning-and-Econometrics.pdf
• Varian, Hal R. 2013. Big Data: New Tricks for Econometrics. Paper:
http://people.ischool.berkeley.edu/~hal/Papers/2013/ml.pdf
MACHINE LEARNING
AND ECONOMETRICS
MODEL MANAGEMENT
• Ensemble learning…
  • Promising: averages over many predictive cases to reduce impact of variance
  • However, is CORRELATIVE, not CAUSAL
• CAUSAL data analysis requires
  • Investment in data acquisition
  • Similarity measurements
  • Expected value calculations
  • Correlation understanding
  • Identifying informative variables
  • Fitting equations to data
  • Significance testing
  • Domain knowledge
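The claim that ensembles average over many predictions to reduce the impact of variance can be checked with a small simulation. The sketch below uses bagging (averaging a model over bootstrap resamples) as one concrete ensemble technique. The data-generating process, the 1-nearest-neighbour base model, and all names are assumptions made for the illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the simulation is repeatable


def sample_training_set(n=20, noise=0.3):
    """Hypothetical data: y = x^2 plus Gaussian noise, x uniform on [0, 1]."""
    xs = [random.random() for _ in range(n)]
    return [(x, x * x + random.gauss(0, noise)) for x in xs]


def nearest_neighbour(train, x0):
    """High-variance base model: the response of the closest training point."""
    return min(train, key=lambda row: abs(row[0] - x0))[1]


def bagged_nearest_neighbour(train, x0, n_models=25):
    """Ensemble: average the base model over bootstrap resamples of the data."""
    predictions = []
    for _ in range(n_models):
        resample = random.choices(train, k=len(train))  # sample with replacement
        predictions.append(nearest_neighbour(resample, x0))
    return statistics.fmean(predictions)


x0, runs = 0.5, 1000
single, bagged = [], []
for _ in range(runs):
    train = sample_training_set()
    single.append(nearest_neighbour(train, x0))
    bagged.append(bagged_nearest_neighbour(train, x0))

single_variance = statistics.pvariance(single)
bagged_variance = statistics.pvariance(bagged)
print("single model variance:", round(single_variance, 4))
print("bagged model variance:", round(bagged_variance, 4))
```

The averaged prediction varies less from one training set to the next. Nothing in the procedure identifies why y depends on x, which is the slide's point that ensembles remain correlative rather than causal.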

 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 

Internal Analytics Tools and Methods Overview

  • 1. Company Confidential - For Internal Use Only Copyright © 2017, SAS Institute Inc. All rights reserved. ANALYTICS TOOLS AND METHODS: PRACTITIONER PERSPECTIVES Guest Lecturer Scott Allen Mongeau Data Scientist Cyber Analytics Cell: + 31 (0)6 8370 3097 scott.mongeau@sas.com BIG DATA AND BUSINESS ANALYTICS Masters of Business and Information Management 2016 - 2017 dr Jan van Dalen
  • 2. Scott Allen Mongeau, Certified Analytics Professional (CAP). Data Scientist, Cyber Analytics. scott.mongeau@sas.com, +31 (0)64 235 3427. Education • PhD (ABD) • MBA • MA Financial Mgmt • Cert. Finance • GD IT Mgmt • MA Com Tech. Experience • SAS Institute: Sr. Mgr. Business Solutions • Deloitte: Manager Analytics • Nyenrode University: Lecturer Analytics • SARK7: Owner / Principal Consultant • Genentech Inc. / Roche: Principal Analyst / Sr. Mgr. • Atradius: Sr. R&D Engineer • CFSI: CIO. YouTube • Introduction to Advanced Analytics • Introduction to Cognitive Analytics • TedX RSM: Data Analytics. Blog: sctr7.com. Twitter: sark7. Web: sark7.com. IT solutions • Research methods • Finance • Data analytics • Consulting
  • 3. 40 • #1 • 14,000 • 93 • 80,000+ • US $ 3.2 B • 23%. SAS employees worldwide • of the top 100 companies on the GLOBAL 500 LIST • Annual reinvestment in R&D • Continuous Revenue Growth since 1976 • Years of BUSINESS ANALYTICS • World’s LARGEST privately held software company • Customer sites in 148 countries • DATA ANALYTICS MARKET LEADER
  • 4. ANALYTICS SOLUTIONS: FORECASTING • DATA MINING / MACHINE LEARNING • TEXT ANALYTICS • OPTIMIZATION • STATISTICS. Finding treasures in unstructured data like social media or survey tools that could uncover insights about key business challenges. Mine transaction databases to create models of likely outcomes. Leveraging historical data to drive better insight into proactive decision-making. Analyze massive amounts of data in order to accurately identify areas likely to produce the most profitable results. Data Management (Integration, Quality & Governance)
  • 5. MOORE’S LAW: EXPONENTIAL GROWTH OF COMPUTING POWER. 25,000 x • Home computers • High-capacity servers • Smartphone explosion • Cloud, AI / Watson, IoT • 2015
  • 7. Copyright © 2015, SAS Institute Inc. All rights reserved. PEOPLE & ORGANIZATION: DATA SCIENTIST ROLE
  • 16. DATA SCIENCE PEOPLE/ROLES. DATA ANALYTICS • Data science • Statistician • Data miner / machine learning • Text analytics / mining. BUSINESS ANALYTICS • Business analyst • BI solutions • Visualization / interface design • Functional domain specialty (e.g. marketing analytics). DATA MANAGEMENT • Information / data architecture • Database management • Data engineering • Data quality / governance / MDM. OPERATIONS • Analytics engineering / operations • Security • IT systems management. BUSINESS / ORGANIZATIONAL • Decision management • Change management • Analytics project management • Domain expert / functional specialty / business manager
  • 17. CORE DATA SCIENCE SKILLSET. IT • BI/reports/dashboards • Programming • Systems/software dev • Algorithms • Systems administration • User interface design/visualization. Mathematics • Econometrics • Graph analysis • Matrix mathematics • Multivariate analysis • Probability • Survival analysis • Statistics • Spatial analysis • Temporal analysis. Business Domain • Finance • Operations • Sales/marketing • HR. Data Engineering • Big & fast data solutions • Data manipulation/ETL • Database design • Data structures • Graphical • NoSQL • Unstructured data. Data Science • Machine Learning • Optimization • Predictive analytics • Simulation • Text/semantic analytics. Research • Scientific method • Experimental design • Research methodologies • Social science methods • Survey research
  • 18. TECHNOLOGY & TOOLS: DATA ANALYTICS TOOLS & TECH
  • 20. APPLIED TECHNIQUES & TECHNOLOGIES • Algorithms (ex: computational complexity, CS theory) • Back-end programming (ex: Java/Rails/Objective C) • Bayesian/Monte-Carlo statistics (ex: MCMC, BUGS) • Big and distributed data (ex: Hadoop, Map/Reduce) • Business (ex: management, business development, budgeting) • Classical statistics (ex: general linear model, ANOVA) • Data manipulation (ex: regexes, R, SAS, web scraping) • Front-end programming (ex: JavaScript, HTML, CSS) • Graphical models (ex: social networks, Bayes networks) • Machine learning (ex: decision trees, neural nets, SVM, clustering) • Math (ex: linear algebra, real analysis, calculus) • Optimization (ex: linear, integer, convex, global) • Science (ex: experimental design, technical writing/publishing) • Simulation (ex: discrete, agent-based, continuous) • Solutions development (ex: design, project management) • Spatial statistics (ex: geographic covariates, GIS) • Structured data (ex: SQL, JSON, XML) • Surveys and marketing (ex: multinomial modeling) • Systems administration (ex: *nix, DBA, cloud tech.) • Temporal statistics (ex: forecasting, time-series analysis) • Unstructured data (ex: NoSQL, text mining) • Visualization (ex: statistical graphics, mapping, web-based dataviz). SOURCE: “Analyzing the Analyzers” http://www.datasciencecentral.com/profiles/blogs/how-to-become-a-data-scientist?overrideMobileRedirect=1
  • 29. TOOLS: Enterprise • Big Data • Browser • Open Source
  • 30. PROCESS & METHODS: DATA ANALYTICS
  • 35. [Figure: analytics maturity plotted against business value. Scale labels: Transactional • Strategic • Aspirational • Transformed. Stages: DATA QUALITY • DESCRIPTIVE • DIAGNOSTICS • PREDICTIVE • PRESCRIPTIVE • SEMANTIC. Annotations: Data visualization • Business Intelligence • Understanding Patterns • Identifying Factors & Causes • Forecasting & Probabilities • Optimizing Systems • Understanding Social Context & Meaning • Advanced Analytics]
  • 36. CRISP-DM. Provost; Fawcett. Data Science for Business. Chapter 2: Business Problems and Data Science Solutions
  • 37. SAS ANALYTICS LIFECYCLE: PROBLEM FRAMING • DATA SELECTION & GATHERING • DATA EXPLORATION • TRANSFORM & SELECT • MODEL BUILDING • MODEL VALIDATION • MODEL DEPLOYMENT • EVALUATE & MONITOR RESULTS. FRAMING & DISCOVERY • EXPLANATION & PREDICTION
  • 38. Fair use: illustrate publication and article of issue in question. The Economist. http://en.wikipedia.org/wiki/Category:Fair_use_The_Economist_magazine_covers
  • 41. Public domain: Agricultural Research Service http://en.wikipedia.org/wiki/File:Orange_juice_1.jpg GNU Free Documentation License: Ibanix, Suzuki Shahid DL650 motorcycle http://commons.wikimedia.org/wiki/File:Suzuki_vstrom_dl650_motorcycle.jpg
  • 42. PREDICTIVE ANALYTICS: SUPERVISED MACHINE LEARNING
  • 43. Supervised learning - predictive • k-nearest neighbors • Decision Trees (DT) (random forests, boosted trees) • Naïve Bayes classifier • Neural networks • Support Vector Machine (SVM) • Ensembles / Ensemble Learning. Diagram labels: Machine Learning • Decision Tree • Support Vector Machines
  • 44. MACHINE LEARNING PREDICTION (SUPERVISED). CAR • Engine • Training set • Validation set • Non-criminal • Criminal • NORMAL • UNUSUAL. Device • Time of day • Source location • IP • Threat intelligence • Amount • At risk profile • Destination location • Secure profile • Known devices • Average amount • Known location • Known destination
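The training-set / validation-set split named on slide 44 can be sketched in a few lines. This is only an illustration under stated assumptions: the data is synthetic, the single feature (`amount`) and the NORMAL/UNUSUAL labels are borrowed from the slide's labels, and the one-threshold "decision stump" is a stand-in chosen for brevity, not the model used in the lecture.

```python
import random

def fit_stump(rows):
    """Pick the amount threshold that minimises training errors.

    rows: list of (amount, label) pairs, label 1 = UNUSUAL, 0 = NORMAL.
    The stump predicts UNUSUAL whenever amount >= threshold.
    """
    best_threshold, best_errors = None, len(rows) + 1
    for threshold in sorted({amount for amount, _ in rows}):
        errors = sum((amount >= threshold) != bool(label) for amount, label in rows)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def accuracy(threshold, rows):
    """Share of rows the stump classifies correctly."""
    hits = sum((amount >= threshold) == bool(label) for amount, label in rows)
    return hits / len(rows)

random.seed(0)
# Synthetic (hypothetical) data: normal amounts near 50, unusual ones near 200.
data = [(random.gauss(50, 15), 0) for _ in range(200)]
data += [(random.gauss(200, 30), 1) for _ in range(200)]
random.shuffle(data)

# Hold out 30% of the rows; the model never sees them during fitting.
split = int(0.7 * len(data))
training_set, validation_set = data[:split], data[split:]

threshold = fit_stump(training_set)
print(f"threshold={threshold:.1f}")
print(f"training accuracy={accuracy(threshold, training_set):.3f}")
print(f"validation accuracy={accuracy(threshold, validation_set):.3f}")
```

The validation accuracy is the more honest number of the two, because the threshold was tuned on the training rows only.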
  • 45. EXAMPLE MACHINE LEARNING TOOLS. Open source • R • Python • Weka. Commercial • SAS BASE & JMP • SAS Enterprise Miner • IBM SPSS • Oracle Data Mining • RapidMiner. Ranjit Bose (2009), "Advanced analytics: opportunities and challenges", Industrial Management & Data Systems, Vol. 109 Iss 2, pp. 155-172, http://dx.doi.org/10.1108/02635570910930073
  • 47. DEMO: SAS ENTERPRISE MINER. Workflow • Configuration • Models / utilities • Data • IDE
  • 48. • Data preparation • Model development • Model management • Model deployment. http://www.sas.com/en_gb/insights/articles/analytics/Industrialize-your-analytics-today.html
  • 49. [Figure: analytics maturity plotted against business value. Scale labels: Transactional • Strategic • Aspirational • Transformed. Stages: DATA QUALITY • DESCRIPTIVE • DIAGNOSTICS • PREDICTIVE • PRESCRIPTIVE • SEMANTIC. Annotations: Data visualization • Business Intelligence • Understanding Patterns • Identifying Factors & Causes • Forecasting & Probabilities • Optimizing Systems • Understanding Social Context & Meaning • Advanced Analytics]
  • 50. CONFUSION MATRIX. A confusion matrix separates out the decisions made by the classifier, making explicit how one class is being confused for another. In this way different sorts of errors may be dealt with separately. Provost; Fawcett. Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking. Chapter 7: Decision Analytic Thinking
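A minimal sketch of the confusion matrix described on slide 50, for a two-class problem. The label vectors are made up for illustration, and the function name `confusion_matrix` is a hypothetical helper, not a library call.

```python
from collections import Counter

def confusion_matrix(actual, predicted, positive=1):
    """Count the four outcomes of a binary classifier.

    TP/FP/FN/TN = true positive, false positive, false negative, true negative.
    """
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return {cell: counts[cell] for cell in ("TP", "FP", "FN", "TN")}

# Hypothetical ground truth and classifier decisions for ten cases.
actual    = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
predicted = [1, 0, 1, 0, 1, 0, 0, 1, 0, 0]

cm = confusion_matrix(actual, predicted)
precision = cm["TP"] / (cm["TP"] + cm["FP"])  # how many flagged cases were real
recall = cm["TP"] / (cm["TP"] + cm["FN"])     # how many real cases were flagged
print(cm, f"precision={precision:.2f}", f"recall={recall:.2f}")
```

Keeping the false positives and false negatives in separate cells is what allows the two kinds of error to be costed differently.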
  • 51. RECEIVER OPERATING CHARACTERISTICS (ROC) & AREA UNDER THE CURVE (AUC). “A ROC graph is a two-dimensional plot of a classifier with false positive rate on the x axis against true positive rate on the y axis. ROC graph depicts relative trade-offs that a classifier makes between benefits (true positives) and costs (false positives).” Area Under the Curve (AUC): area under a classifier’s curve expressed as a fraction of the unit square. Its value ranges from zero to one. Provost; Fawcett. Data Science for Business. Chapter 8: Visualizing Model Performance
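The ROC and AUC definitions on slide 51 can be made concrete with a short sketch. It assumes a classifier that outputs a score per case; the scores and labels below are invented, and `roc_points` and `auc` are hypothetical helper names.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points.

    The threshold is swept from the highest score to the lowest.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), reverse=True)
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Emit a point only after the last case sharing this score (handles ties).
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC points by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical labels (1 = positive) and classifier scores.
labels = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]

curve = roc_points(labels, scores)
print(curve)
print(f"AUC={auc(curve):.4f}")
```

An AUC of 0.5 corresponds to the random diagonal, and 1.0 to a classifier that ranks every positive above every negative.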
  • 52. CUMULATIVE RESPONSE / LIFT CURVE • Lift: how far the line representing the model's performance is lifted above the random-performance diagonal • E.g. “our model gives a two times (or a 2X) lift”: at the chosen threshold (often not mentioned), the lift curve shows that the model’s targeting is twice as good as random. Provost; Fawcett. Data Science for Business. Chapter 8: Visualizing Model Performance
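The "2X lift" statement on slide 52 can be reproduced on toy data. In this sketch lift is taken as the response rate among the top-scored fraction divided by the overall response rate; the ten customers, their scores, and the helper name `lift_at` are assumptions made for illustration.

```python
def lift_at(labels, scores, fraction):
    """Lift at a targeting depth.

    fraction: share of the population targeted, highest scores first.
    Returns (response rate in the targeted group) / (overall response rate).
    """
    ranked = [label for _, label in sorted(zip(scores, labels), reverse=True)]
    top_n = max(1, round(fraction * len(ranked)))
    top_rate = sum(ranked[:top_n]) / top_n
    base_rate = sum(ranked) / len(ranked)
    return top_rate / base_rate

# Hypothetical campaign: 1 = customer responded, scores come from some model.
labels = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
scores = [0.95, 0.9, 0.85, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

print(f"lift at 50% depth: {lift_at(labels, scores, 0.5):.2f}")
print(f"lift at 100% depth: {lift_at(labels, scores, 1.0):.2f}")
```

Targeting everyone always gives a lift of 1, which is why a quoted lift is only meaningful together with the depth (threshold) it was measured at.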
  • 53. DESCRIPTIVE ANALYTICS: UNSUPERVISED MACHINE LEARNING
  • 54. Unsupervised learning • Cluster analysis (e.g. K-Means) • Factor analysis • Self-Organizing Maps (SOMs). Diagram label: Machine Learning
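As one concrete instance of the cluster analysis listed on slide 54, the sketch below runs a plain k-means loop on unlabeled 2-D points. The data is synthetic, and the farthest-point initialisation is an assumption chosen to keep the example deterministic; it is not claimed to be what any particular tool does.

```python
import math
import random

def kmeans(points, k, iterations=20):
    """Cluster points into k groups by alternating assignment and averaging."""
    # Farthest-point initialisation (an assumption for reproducibility):
    # start from the first point, then add the point farthest from those chosen.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

random.seed(1)
# Two synthetic blobs, around (0, 0) and (10, 10); no labels are given to the algorithm.
points = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)]
points += [(random.gauss(10, 1), random.gauss(10, 1)) for _ in range(50)]

centroids, clusters = kmeans(points, k=2)
print([tuple(round(v, 2) for v in c) for c in centroids])
print([len(c) for c in clusters])
```

Unlike the supervised examples, nothing here uses a target column: the grouping is recovered from the distances between points alone.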
  • 57. BIG DATA: BACKGROUND AND EXAMPLE
  • 59. DATA ANALYTICS DRIVERS: V4C. Social and mobile • Data analytics • Interactive platforms • Real-Time systems. V4C: VOLUME • VELOCITY • VARIETY • VARIABILITY • COMPLEXITY
  • 60. MODEL ERRORS: INHERENT RANDOMNESS • Cases where prediction is not “deterministic” • Bayes rate: theoretical maximum accuracy that can be achieved for a problem
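The ceiling described on slide 60 can be computed exactly when the joint distribution of feature and class is known. The probabilities below are invented for illustration, and `bayes_optimal_accuracy` is a hypothetical helper: for each feature value the best possible rule predicts the most probable class, so the achievable accuracy is the sum of those per-value maxima.

```python
def bayes_optimal_accuracy(joint):
    """Maximum achievable accuracy given joint probabilities P(feature, class).

    joint: dict mapping (feature_value, class_label) to a probability;
    the probabilities are assumed to sum to 1.
    """
    best = {}
    for (feature_value, _label), probability in joint.items():
        best[feature_value] = max(best.get(feature_value, 0.0), probability)
    return sum(best.values())

# Hypothetical joint distribution: the feature does not fully determine the class.
joint = {
    ("known_device", "normal"): 0.60,
    ("known_device", "unusual"): 0.05,
    ("new_device", "normal"): 0.15,
    ("new_device", "unusual"): 0.20,
}

ceiling = bayes_optimal_accuracy(joint)
print(f"maximum achievable accuracy: {ceiling:.2f}")
print(f"irreducible error: {1 - ceiling:.2f}")
```

No amount of extra data on the same feature raises this ceiling; only a more informative feature can.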
  • 61. MODEL ERRORS: BIAS • Bias: even with ‘Big Data’, a model will never reach the perfect accuracy of the true model • Example: linear regression model to predict response to an advertising campaign… • Model is an abstraction… • True model is always more complex
  • 62. MODEL ERRORS: VARIANCE • Variance: procedures with more variance tend to produce models with larger errors • Accuracy tends to vary across training sets • Given a finite sample set… • Different models emerge from different samples • Different models tend to have different accuracy
  • 63. BALANCE: BIAS VERSUS VARIANCE. Big Data • Complex model • Many variables • Low bias… • but high variance • Subject to overfitting. Strong models • Tested abstraction • Few, but significant variables • Low variance… • but high bias. Image: Jno. T-62 tank in Russian service. http://www.aviation.ru/jno/Kubinka02 http://commons.wikimedia.org/wiki/File:T-62_tank_in_Russian_service_(2).jpg
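The trade-off on slides 61 to 63 can be observed by refitting two models on many resampled training sets and comparing their predictions at one point. Everything here is an assumption made for illustration: the true function, the noise level, a constant-mean model standing in for the simple, high-bias case, and a 1-nearest-neighbour lookup standing in for the flexible, high-variance case.

```python
import random
import statistics

def true_f(x):
    """The (normally unknown) true relationship, assumed here for the simulation."""
    return x * x

def sample_training_set(n=20, noise=0.3):
    """Draw a fresh finite training sample with noisy responses."""
    xs = [random.random() for _ in range(n)]
    return [(x, true_f(x) + random.gauss(0, noise)) for x in xs]

def simple_model(train, x0):
    """Rigid model: ignores x0 and predicts the mean response."""
    return statistics.fmean(y for _, y in train)

def flexible_model(train, x0):
    """Flexible model: copies the response of the nearest training point."""
    return min(train, key=lambda row: abs(row[0] - x0))[1]

def bias_squared_and_variance(model, x0, runs=500):
    """Refit on many training sets and summarise the predictions at x0."""
    predictions = [model(sample_training_set(), x0) for _ in range(runs)]
    bias = statistics.fmean(predictions) - true_f(x0)
    return bias ** 2, statistics.pvariance(predictions)

random.seed(42)
simple = bias_squared_and_variance(simple_model, 0.9)
flexible = bias_squared_and_variance(flexible_model, 0.9)
print(f"simple model:   bias^2={simple[0]:.3f} variance={simple[1]:.3f}")
print(f"flexible model: bias^2={flexible[0]:.3f} variance={flexible[1]:.3f}")
```

The rigid model is consistently wrong in the same direction, while the flexible model is right on average but changes with every training sample, which is the overfitting risk the slide refers to.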
  • 64. Statistical Learning with Big Data http://web.stanford.edu/~hastie/TALKS/SLBD_new.pdf
  • 65. Statistical Learning with Big Data http://web.stanford.edu/~hastie/TALKS/SLBD_new.pdf
  • 66. EXPLANATION: CAUSAL MODELING
  • 67. EXPLANATORY ANALYTICS • Explanatory performance is NOT EQUAL to predictive efficacy (and vice versa); this reflects the difference between inductive and deductive methods/thinking • This is a (sometimes heated) methodological debate amongst practitioners/academics… • Is it really a debate, or a religious (professional/Kuhnian) dispute? • Econometrics + machine learning (H. Varian)
  • 68. MACHINE LEARNING AND ECONOMETRICS • Varian, Hal R. 2014. Machine Learning and Econometrics. Stanford lecture slides: https://web.stanford.edu/class/ee380/Abstracts/140129-slides-Machine-Learning-and-Econometrics.pdf • Varian, Hal R. 2013. Big Data: New Tricks for Econometrics. Paper: http://people.ischool.berkeley.edu/~hal/Papers/2013/ml.pdf
  • 69. MODEL MANAGEMENT • Ensemble learning… • Promising: averages over many predictive cases to reduce impact of variance • However, is CORRELATIVE, not CAUSAL • CAUSAL data analysis requires: • Investment in data acquisition • Similarity measurements • Expected value calculations • Correlation understanding • Identifying informative variables • Fitting equations to data • Significance testing • Domain knowledge
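The variance-reduction claim for ensembles on slide 69 can be checked with a small simulation. It rests on a strong simplifying assumption, stated loudly: each ensemble member is modelled as the true value plus independent noise. Real ensemble members trained on overlapping data are correlated, so the reduction in practice is smaller than shown here.

```python
import random
import statistics

def single_prediction(truth=10.0, noise=2.0):
    """One model's prediction: the true value plus independent noise (assumed)."""
    return truth + random.gauss(0, noise)

def ensemble_prediction(members=25):
    """Ensemble prediction: the plain average of the members' predictions."""
    return statistics.fmean(single_prediction() for _ in range(members))

random.seed(7)
single_var = statistics.pvariance([single_prediction() for _ in range(2000)])
ensemble_var = statistics.pvariance([ensemble_prediction() for _ in range(2000)])
print(f"variance of a single model:     {single_var:.3f}")
print(f"variance of a 25-model average: {ensemble_var:.3f}")
```

Averaging tightens the spread of predictions, but it says nothing about why the outcome occurs, which is the correlative-versus-causal caveat on the slide.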