1. A New Era for Predictive Analytics
with SPSS
2. © 2012 IBM Corporation
The Mining Metaphor
● Gold Mining, Diamond Mining, Data Mining
3. © 2012 IBM Corporation
What is Data Mining? An early definition
Finding patterns in your data
which you can use
to do your business better
–It’s about patterns
–It’s about something you can use – practical things
–It’s about business
A recent definition
▪ Business-oriented discovery of patterns across all forms of data
▪ Produces insight and a predictive capability
▪ Deployment of predictions throughout the enterprise
4. © 2012 IBM Corporation
What is Data Mining?
Information Retrieval + Information Extraction + Information Analysis
Discover new, previously unknown information
5. © 2012 IBM Corporation
IBM SPSS Supports the Predictive Enterprise
Delivering Profitable Revenue Growth & Operational
Efficiency
▪Capture a complete perspective
–Survey customers & constituents
–Leverage structured, semi-structured &
unstructured data
▪Predict behavior and preferences
–Statistics for deeper insight
–Data & text mining for predictive modeling
▪Act on results
–Deploy scoring models for dynamic
decisions
–Directly affect business process with event
integration
6. © 2012 IBM Corporation
IBM SPSS: Our core value proposition
SPSS's goal is to apply analytics to optimize decisions at every contact point, by
enabling pervasive, predictive, real-time decisions at the point of impact
7. © 2012 IBM Corporation
SPSS Predictive Analytics Platform
▪ SPSS Data Collection
– Collects additional attitudinal data for advanced
analytics, typically through surveys
▪ SPSS Statistics
– Expands analytics capabilities to the professional
business user / statistician
– Adds advanced statistical analysis to PM
▪ SPSS Modeler
– Provides predictive analytics using data mining & text
mining methods for key parts of the business
– Predicts future outcomes and helps understand what
influences them
▪ SPSS Deployment & Collaboration Services
– Analytical asset management across multiple
analysts
– Audit, security, refresh
– Provides a web service interface
▪ SPSS Analytic Server
– Provides Big Data connectivity to SPSS Modeler
– Translates SPSS Modeler Server requests into
Hadoop jobs
▪ SPSS Analytical Decision Manager
– Business scenario analysis
– Complex rules for operational decision management
8. © 2012 IBM Corporation
SPSS Modeler 16 Editions
• SPSS Modeler Gold
- Enables organizations to build predictive models to improve business processes and help people or systems
make the right decisions each time. It combines and integrates predictive analytics, rules, scoring, and
optimization techniques to deliver recommended actions at the point of impact.
SPSS Modeler Premium + C&DS + Analytical Decision Management
• SPSS Modeler Premium
- Offers a range of advanced algorithms and capabilities including text analytics, entity analytics, social network
analysis, and automated modeling and preparation techniques to address a multitude of business problems
and analytic requirements on almost any type of data.
SPSS Modeler Professional + Text Analytics Workbench
• SPSS Modeler Professional
- Includes a range of advanced algorithms, data manipulation, and automated modeling and preparation
techniques to build predictive models and uncover hidden patterns in structured data.
9. © 2012 IBM Corporation
R is gaining in popularity. Do not walk away from R
opportunities: it is not a competitor.
Are you ready?
▪ EMBRACE:
Integrate R algorithms (e.g. Random Forest)
Generate R charts
Use R functions for data preparation
Make R available to non-programmers
▪ EXTEND:
Scalability (e.g. database pushback)
Leverage R engines of other vendors like SAP HANA
Enterprise deployment
Big Data (Analytic Server)
11. © 2012 IBM Corporation
Modeler Interface
[Screenshot of the Modeler interface; labeled areas: Stream Canvas; Stream, Outputs & Model Manager; Palettes; Nodes]
12. © 2012 IBM Corporation
Visual Programming with Modeler
-Visual programming
-Based on icons ("nodes")
-Pick nodes from palette & place them on the bench
-Edit their attributes
-Connect to specify flow of data ("streams")
13. © 2012 IBM Corporation
Yellow Nugget or Yellow Diamond
Is the result of generating a predictive model
Can be exported to PMML to be reused outside of Modeler,
for example in Java applications, SAS, or IBM InfoSphere Streams using the DataMining
Toolkit, …
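To illustrate what reusing an exported model outside Modeler can look like, here is a minimal sketch that scores a tiny PMML regression model with Python's standard library. The PMML document is hand-written for illustration (the field names `age` and `spend` and the coefficients are assumptions), not an actual Modeler export; real exports contain more elements, and a production system would normally rely on a dedicated PMML scoring engine.

```python
import xml.etree.ElementTree as ET

# Hand-written, illustrative PMML (NOT a real Modeler export).
PMML_DOC = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="illustrative linear model"/>
  <DataDictionary numberOfFields="2">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="spend" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="spend" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="10.0">
      <NumericPredictor name="age" coefficient="2.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"p": "http://www.dmg.org/PMML-4_1"}


def score(pmml_text, record):
    """Apply the linear formula stored in the PMML RegressionTable to one record."""
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    result = float(table.get("intercept"))
    for predictor in table.findall("p:NumericPredictor", NS):
        result += float(predictor.get("coefficient")) * record[predictor.get("name")]
    return result


prediction = score(PMML_DOC, {"age": 40.0})
print(prediction)
```

The point of the sketch is that the model travels as a plain XML document, so any environment able to read XML and apply the formula can score new records.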
14. © 2012 IBM Corporation
CRoss-Industry Standard Process for Data Mining (CRISP-DM)
1. Business Understanding
Project objectives and requirements understanding, data mining problem definition
2. Data Understanding
Initial data collection and familiarization, data quality problems identification
3. Data Preparation
Table, record and attribute selection, data transformation and cleaning
4. Modeling
Modeling techniques selection and application, parameters calibration
5. Evaluation
Business objectives & issues achievement evaluation
6. Deployment
Result model deployment, repeatable data mining process implementation
15. © 2012 IBM Corporation
2. Data Understanding
Initial data collection and familiarization, data quality
problems identification
CRoss-Industry Standard Process for Data Mining (CRISP-DM)
16. © 2012 IBM Corporation
Reading Data
Modeler reads a variety of different file types, including data
stored in spreadsheets and databases, using the nodes within
the Sources palette.
17. © 2012 IBM Corporation
Getting to Know your Data
Data Audit Node
Distribution Node
Histogram Node
…
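As a rough idea of the kind of summary a data audit produces, the sketch below computes per-field counts of valid and missing values plus basic statistics in plain Python. It is not the Data Audit node's implementation; the function name `audit` and the sample customer records are assumptions made for illustration.

```python
import statistics


def audit(records):
    """Per-field summary: valid/missing counts, plus stats or distinct values."""
    fields = sorted({name for rec in records for name in rec})
    report = {}
    for name in fields:
        values = [rec.get(name) for rec in records]
        # Treat None and empty strings as missing.
        present = [v for v in values if v is not None and v != ""]
        summary = {"valid": len(present), "missing": len(values) - len(present)}
        numeric = [v for v in present if isinstance(v, (int, float))]
        if numeric and len(numeric) == len(present):
            summary.update(min=min(numeric), max=max(numeric),
                           mean=statistics.mean(numeric))
        else:
            summary["distinct"] = len(set(present))
        report[name] = summary
    return report


# Hypothetical sample data.
customers = [
    {"age": 34, "region": "north", "income": 52000},
    {"age": 51, "region": "south", "income": None},
    {"age": None, "region": "north", "income": 61000},
    {"age": 29, "region": "", "income": 48000},
]
report = audit(customers)
for field, summary in report.items():
    print(field, summary)
```

A summary like this is usually the first thing to check before modeling: fields with many missing values or a single distinct value are candidates for cleaning or removal.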
18. © 2012 IBM Corporation
3. Data Preparation
Table, record and attribute selection, data
transformation and cleaning
CRoss-Industry Standard Process for Data Mining (CRISP-DM)
19. © 2012 IBM Corporation
Data Manipulation in Modeler
To prepare the data before analysis:
• Eliminate missing values
• Remove unwanted fields from analysis
• Derive new fields
• Merge and match data
Intermediate nodes in Modeler
• Record operation nodes
• Field operation nodes
▪ The CLEM language is case-sensitive
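The four preparation steps listed on this slide can be sketched in plain Python as follows. This is only an analogy for what record and field operation nodes do, not CLEM or Modeler code; the sample records, the `fax` field, and the 60000 income threshold are assumptions.

```python
# Hypothetical input data.
customers = [
    {"id": 1, "age": 34, "income": 52000, "fax": "n/a"},
    {"id": 2, "age": None, "income": 61000, "fax": "n/a"},
    {"id": 3, "age": 45, "income": 80000, "fax": "n/a"},
]
orders = {1: 3, 3: 10}  # customer id -> number of orders

# 1. Eliminate records with missing values (a record operation).
complete = [r for r in customers if all(v is not None for v in r.values())]

# 2. Remove unwanted fields from the analysis (a field operation).
filtered = [{k: v for k, v in r.items() if k != "fax"} for r in complete]

# 3. Derive a new field from existing ones (a field operation).
derived = [
    dict(r, income_band="high" if r["income"] >= 60000 else "standard")
    for r in filtered
]

# 4. Merge with a second data source on the customer id (a record operation).
prepared = [dict(r, orders=orders[r["id"]]) for r in derived if r["id"] in orders]

for row in prepared:
    print(row)
```

Each step takes the output of the previous one, which mirrors how intermediate nodes are chained between a source and a model in a stream.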
20. © 2012 IBM Corporation
CLEM language: The Expression Builder
21. © 2012 IBM Corporation
4. Modeling
Modeling techniques selection and application,
parameters calibration
CRoss-Industry Standard Process for Data Mining (CRISP-DM)
22. © 2012 IBM Corporation
Sampling or Partitioning your Data
• You may not want to use all records
• Score your model with the remaining data
• You may wish to examine a subgroup separately
• Sampling may help in building a predictive model (oversampling)
• Keep in mind that the sampling method must fit the problem at hand
- If the customers are similar and you only want to reduce the size of the dataset for modelling,
simple sampling is enough.
- If you sample directly from a database containing customers of
different types, you may want to draw a complex sample.
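The difference between simple sampling, a stratified ("complex") sample, and a train/holdout partition can be sketched with Python's standard library. The function names, the 20% and 70% fractions, and the synthetic customer list are assumptions for illustration; this is not how Modeler's Sample or Partition nodes are implemented.

```python
import random
from collections import defaultdict


def simple_sample(records, fraction, seed=0):
    """Draw a simple random sample of the given fraction."""
    rng = random.Random(seed)
    return rng.sample(records, round(len(records) * fraction))


def stratified_sample(records, key, fraction, seed=0):
    """Sample the same fraction from every stratum so small groups survive."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[rec[key]].append(rec)
    sample = []
    for group in strata.values():
        sample.extend(rng.sample(group, max(1, round(len(group) * fraction))))
    return sample


def partition(records, train_fraction=0.7, seed=0):
    """Split records into a training set and a holdout set for scoring."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical data: 10 premium and 90 regular customers.
customers = [{"id": i, "type": "premium" if i % 10 == 0 else "regular"}
             for i in range(100)]

simple = simple_sample(customers, 0.2)
stratified = stratified_sample(customers, "type", 0.2)
train, holdout = partition(customers)

print(len(simple), len(stratified), len(train), len(holdout))
```

With simple sampling the number of premium customers in the sample varies from draw to draw, whereas the stratified version always keeps two of the ten, which matters when a subgroup is rare.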
23. © 2012 IBM Corporation
Matching Data to the Modeling Tool
• For example, if we want to use Rule Induction, we will need to
think about:
- How the algorithm handles missing data
- The output that is created (binary versus larger splits)
- What we are trying to predict (a numeric target or a binary one?)
- What format the input predictors must be in
24. © 2012 IBM Corporation
Modeling Techniques in Modeler
• Supervised Techniques (Predictive Models)
To model an output variable based on several input variables, in order to predict future cases
where the outcome is unknown
- Neural Networks, Rule Induction (C5.0, CHAID, QUEST & C&RT)
- Decision List, Binary Classifier, Discriminant
- Linear Regression and Logistic Regression
- Generalized Linear Models
• Unsupervised Techniques (Clustering)
No field to predict; used to group similar records within the data
- Kohonen Networks, K-Means, TwoStep, Anomaly
• Association Rules
To search for things that typically occur together
- APRIORI, CARMA, GRI and SLRM
• Data Reduction
- PCA/Factor Analysis, Feature Selection
• Sequence Detection Models
- Sequence
• Time Series
• Text Mining
26. © 2012 IBM Corporation
Association Models
– Association rules search for things (events, purchases, attributes)
that typically occur together in the data
– They find the patterns in data that you could manually find using
visualization techniques such as the Web node (yikes!) but can do
so much faster and can explore more complex patterns.
– Used to answer questions such as:
• Do customers who buy fruit usually buy cheese?
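The fruit-and-cheese question can be made concrete with the standard rule measures of support, confidence, and lift. The sketch below computes them for a single rule over a handful of made-up baskets; it shows the measures only, not the Apriori or CARMA search itself, and the basket contents are assumptions.

```python
# Hypothetical shopping baskets.
baskets = [
    {"fruit", "cheese", "bread"},
    {"fruit", "cheese"},
    {"fruit", "milk"},
    {"cheese", "wine"},
    {"fruit", "cheese", "wine"},
]


def rule_metrics(baskets, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent.

    Assumes both items occur in at least one basket.
    """
    n = len(baskets)
    with_a = sum(1 for b in baskets if antecedent in b)
    with_c = sum(1 for b in baskets if consequent in b)
    with_both = sum(1 for b in baskets if antecedent in b and consequent in b)
    support = with_both / n            # share of baskets containing both items
    confidence = with_both / with_a    # share of antecedent baskets with consequent
    lift = confidence / (with_c / n)   # confidence relative to the base rate
    return support, confidence, lift


support, confidence, lift = rule_metrics(baskets, "fruit", "cheese")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.4f}")
```

A lift below 1, as in this toy data, means fruit buyers are slightly less likely than average to buy cheese, so high confidence alone does not make a rule interesting.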
29. © 2012 IBM Corporation
Segmentation or Clustering Models
– Clustering techniques segment data into groups of cases/records/
customers that have similar patterns of input fields.
– Used in market segmentation studies whose aim is to find distinct
types of customers so they can be targeted more effectively
– Used to answer questions such as:
• How can I group my customers to target them with the right marketing campaign?
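As an illustration of how a clustering technique groups similar customers, here is a minimal K-Means sketch in plain Python. The customer points (age, annual spend in hundreds) and the choice of initial centroids are assumptions; Modeler's K-Means node adds field encoding, scaling, and stopping criteria that are not shown here.

```python
def kmeans(points, centroids, iterations=10):
    """Basic K-Means: assign each point to its nearest centroid, then recompute."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # New centroid = mean of the cluster; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical customers: (age, annual spend in hundreds).
customers = [(22, 15), (25, 18), (27, 20), (55, 80), (60, 85), (58, 90)]
centroids, clusters = kmeans(customers, centroids=[customers[0], customers[3]])

for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

The two resulting segments (young low spenders and older high spenders) could then be addressed by different campaigns.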
32. © 2012 IBM Corporation
Predictive or Classification Models
– Algorithms that are used to make predictions or forecasts based on
historical data
– Automatic classification lets the software
determine the best algorithm, or customers can choose a specific
algorithm such as Neural Networks, Logistic Regression, Time
Series, etc.
– Used to answer questions such as:
• What predicts whether a customer will leave?
• What predicts whether this employee will be a super-star?
• How many umbrellas will I sell in the next three months in Chicago?
34. © 2012 IBM Corporation
5. Evaluation
Business objectives & issues achievement
evaluation
CRoss-Industry Standard Process for Data Mining (CRISP-DM)
35. © 2012 IBM Corporation
6. Deployment
Result model deployment, repeatable data mining
process implementation
CRoss-Industry Standard Process for Data Mining (CRISP-DM)
36. © 2012 IBM Corporation
Deployment Family: Products
▪ IBM SPSS Collaboration and Deployment
Services
– A foundation for managing and
deploying analytics
▪ IBM SPSS Analytical Decision
Management
– Integrates analytics and business
knowledge to deliver optimal outcomes
37. © 2012 IBM Corporation
IBM SPSS Modeler Deployment Options
▪ Client (Desktop)
– Access local files
– Connect to operational databases
– Connect to Cognos BI
– Processing performed on local installation
▪ Client/Server
– Data operations/processing on server
– In-database data mining
– SQL pushback for PureData and Hadoop platforms
– Modeler Batch
– SuSE Linux Enterprise Server 10 (zLinux)
– Inclusion in Smart Analytics System for Power (AIX)
39. © 2012 IBM Corporation
Predictive Analytics for Big Data
Get more accurate models with a bigger volume and variety of data
- Read data from Hadoop
- Write back to Hadoop
- Export your models to Streams
- Prepare your data on Hadoop
- A few models can run on Hadoop
- R analytic capabilities in SPSS
40. © 2012 IBM Corporation
Bring Analytics on Big Data for Everyone
SPSS Analytics Catalyst
Automatic Summarization
• Top findings in data ranked by
“interestingness” and association strength
• Plain language synopsis
Automatic Exploration
• Guided presentation by selecting fields of
interest
• Dynamic Visual Insights
• Users can refine auto generated parameters
Automatic Modeling
• Auto selection of best models and detection
of strongest relationships: Decision Tree
(CHAID) and Key Driver Reports (based on
linear and logistic regression)
Sharing of Output
• Collaboration with peers
• Tablet optimization
CRISP-DM Methodology
41. © 2012 IBM Corporation
Monte Carlo Simulation
Generate simulated data
Fit distributions from existing data
Evaluate the simulation
Example Use Cases:
- A retailer wants to simulate alternative
sales scenarios to identify which
strategy will make them most likely to hit
their targets
- A parts manufacturer is interested in
modeling storage costs based on
simulating different scenarios for future part
orders against stock supplies and excess
order fees
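The retailer use case can be sketched as a small Monte Carlo simulation in Python: fit a distribution to existing data, generate simulated years, and evaluate how often each scenario hits the target. The historical sales figures, the normal-distribution assumption, the 8% uplift, and the target of 1300 are all made up for illustration and do not come from the slides.

```python
import random
import statistics

# Hypothetical monthly sales history (units).
historical_sales = [102, 98, 110, 95, 105, 99, 108, 101, 97, 104, 112, 100]

# Fit a distribution from existing data (assumed normal for this sketch).
mu = statistics.mean(historical_sales)
sigma = statistics.stdev(historical_sales)


def simulate_annual_sales(uplift, runs=10_000, seed=42):
    """Generate simulated annual totals under a given sales uplift scenario."""
    rng = random.Random(seed)
    return [sum(rng.gauss(mu * uplift, sigma) for _ in range(12))
            for _ in range(runs)]


def probability_of_hitting(target, totals):
    """Evaluate the simulation: share of simulated years that reach the target."""
    return sum(t >= target for t in totals) / len(totals)


target = 1300
baseline = probability_of_hitting(target, simulate_annual_sales(uplift=1.00))
promo = probability_of_hitting(target, simulate_annual_sales(uplift=1.08))

print(f"baseline: {baseline:.3f}  promotion scenario: {promo:.3f}")
```

Comparing the two probabilities is what lets the retailer pick the strategy most likely to hit the target, rather than comparing only the expected totals.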
42. © 2012 IBM Corporation
Geospatial Data Mining: Understanding Geohashes
▪ Space-time Boxes use geohashes and timestamps to locate where
and when entities exist
▪ A geohash is a unique identifier that uses latitude and longitude to
create an alphanumeric string
▪ Its precision depends on its length; longer geohash = better
precision
▪ For example, geohash dr5ru7 is midtown Manhattan...but how do we
know?
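One way to answer "how do we know?" is to decode the geohash by hand. The sketch below implements the standard public geohash scheme (base-32 characters whose bits alternately halve the longitude and latitude ranges); it is an illustration in plain Python, not the code Modeler uses, and the function names are assumptions.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_decode(geohash):
    """Return ((lat_lo, lat_hi), (lon_lo, lon_hi)) for the geohash cell."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # bits alternate, starting with longitude
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
            else:
                mid = (lat_lo + lat_hi) / 2
                lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
            is_lon = not is_lon
    return (lat_lo, lat_hi), (lon_lo, lon_hi)


def geohash_encode(lat, lon, length=6):
    """Encode a latitude/longitude pair as a geohash of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, value, bits, is_lon = [], 0, 0, True
    while len(chars) < length:
        if is_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = int(lon >= mid)
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = int(lat >= mid)
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        value = (value << 1) | bit
        bits += 1
        is_lon = not is_lon
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value, bits = 0, 0
    return "".join(chars)


(lat_lo, lat_hi), (lon_lo, lon_hi) = geohash_decode("dr5ru7")
center = ((lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2)
print(center)
```

Each additional character adds five bits, so the cell shrinks with every character, which is why a longer geohash gives better precision.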
43. © 2012 IBM Corporation
What Exactly is a Space-Time Box?
▪ Space-time Boxes extend geohashes to include a third
dimension: time
▪ Space-time Boxes ‘bin’ events in 3-D space and time
▪ Density (i.e. size) of the Space-time Box is a required
input
▪ Can help analysts understand proximity between
entities, verify relationships
Example: dr5ru7|2013-01-01 00:00:00|2013-01-01 00:15:00
(fields: geohash | start timestamp | end timestamp)
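The binning idea can be sketched by flooring an event's timestamp to a fixed-width time window and joining it with the event's geohash. The function name `space_time_box` and the 15-minute default are assumptions chosen to reproduce the example string on this slide; the way Modeler itself parameterizes density may differ.

```python
from datetime import datetime, timedelta


def space_time_box(geohash, timestamp, minutes=15):
    """Build a 'geohash|start|end' key for the time window containing timestamp."""
    day_start = timestamp.replace(hour=0, minute=0, second=0, microsecond=0)
    width = timedelta(minutes=minutes)
    # Floor the elapsed time since midnight to a whole number of windows.
    start = day_start + ((timestamp - day_start) // width) * width
    end = start + width
    fmt = "%Y-%m-%d %H:%M:%S"
    return f"{geohash}|{start.strftime(fmt)}|{end.strftime(fmt)}"


# Two hypothetical events in the same place, a few minutes apart.
box_a = space_time_box("dr5ru7", datetime(2013, 1, 1, 0, 7, 30))
box_b = space_time_box("dr5ru7", datetime(2013, 1, 1, 0, 12, 5))
box_c = space_time_box("dr5ru7", datetime(2013, 1, 1, 0, 16, 0))

print(box_a)
print(box_a == box_b, box_a == box_c)
```

Events that share a key were in the same cell during the same window, which is how proximity between entities can be checked with a simple equality join.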
44. © 2012 IBM Corporation
IBM SPSS Modeler Embraces R
1. SPSS Modeler allows the user
to build and score R models
within the Modeler interface
2. SPSS Modeler allows the use of
R functions for data preparation
and chart/output creation
3. The Custom Dialog Builder for
R allows the user to create
custom nodes that run R
algorithms, functions, or
outputs
4. These custom nodes can be
shared with other users and
they do not require the end
user to know any R code
45. © 2012 IBM Corporation
Use R to build a custom node
46. The world of analytics !
made easy for everyone
Bouchra Denis Antoine Danil
58. Potential growth
A lot of code already
available in packages
R is a widely used
language
[Bar chart: survey of use, horizontal axis from 0 % to 70 %, for R, IBM SPSS Statistics, Rapid Miner, SAS, Weka, Microsoft SQL Server, Matlab, IBM SPSS Modeler; bar values not recoverable]