How Big Data Insights Become Easily Accessible with Workflow Tools
Session Overview
➢ Introduction To Workflows and Big Data For Data Scientists/Data Citizens
➢ Examples Of Customers Benefiting From Using Workflows – Reduction In Cost, Speed To Deploy
➢ Pulling It All Together - Introduction To Deploying Models To External Systems
➢ Technical Overview Into Building Predictive Analytics Workflows For Big Data (TIBCO Statistica)
Introduction To Workflows and Big Data For Data Scientists/Data Citizens
From Insight to Action: Making sense of the data (Platform)
© Copyright 2000-2017 TIBCO Software Inc.
Figure: Learning Cycles (Insight). Stages: Access, Wrangle, Analyze, Model, Predict; driven by Visual Analytics and Predictive Analytics; outputs: Rules, Models.
Figure: Operational Cycles (Action). Stages: Monitor, Predict, Decide, Act; driven by Predictive Analytics (model) and Streaming Analytics; also shown: Rules, Models.
Personas
It’s all about empowering more people
Examples Of Customers Benefiting From Using Workflows – Reduction In Cost, Speed To Deploy
Big data analytics transforms the operating room
CASE STUDY
Company: University of Iowa Hospitals & Clinics (UIHC) | Industry: Healthcare | Country: USA | Web: www.uihealthcare.org
BUSINESS CHALLENGE
UIHC surgeons needed to know how susceptible patients were to infection in order to make critical treatment decisions in the operating room. Infection rates have major implications for overall patient health and for cost savings.
SOLUTION
UIHC used big data and analytics to transform outcomes at different points in patient care.
RESULTS
● Reduced surgical site infection occurrence by 58 percent
● Merged historical and live patient data to predict likelihood of infection
● Personalized care based on patients’ own characteristics
● Improved efficiency by enabling staff to run predictive models and access results with a mobile application or web browser
“Predictive analytics is allowing us to deal with the ever-increasing types of data that healthcare institutions need to deal with.”
Dr. John Cromwell, MD, Director of Gastrointestinal Surgery
Bank speeds time to market with advanced analytics
Business need
To deliver timely and accurate credit decisions and other customer
services in today’s 24/7 world, Danske Bank needed to be able to
quickly build and deploy advanced analytical models.
Benefits
● Slashed time to develop and deploy analytical models by 50 percent
● Improved decision-making with more advanced analytical models
● Delivered an easy-to-use, standardized toolbox that can quickly be customized to meet users’ needs
● Ensured fast ROI by deploying easily and integrating smoothly with existing systems
Solution
Enable customers to apply for products such as loans through Danske Bank’s portal, and generate scoring models that determine whether customers’ applications are accepted.
“We have reduced the time we spend on models up to 50 percent with Statistica. Our development process is much leaner and smoother compared to what it was before.”
Jens Christian Ipsen, First Vice President, Danske Bank
Pulling It All Together - Introduction To Deploying Models To External Systems
TIBCO Statistica: Deploy To External Applications
Machine learning in TIBCO Statistica; TIBCO StreamBase for real-time scoring and action
• Model built in TIBCO Statistica
• Model scored in TIBCO StreamBase on live data
• Action: equipment intervention
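The Monitor, Predict, Decide, Act loop behind this deployment can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not StreamBase or Statistica code: the logistic-regression coefficients, the sensor field names (temperature, vibration), the 0.8 threshold and the "intervene" action are all hypothetical stand-ins for a model trained offline and deployed to the streaming engine.

```python
import math
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical model parameters, standing in for a model trained offline and
# exported to the streaming engine (not actual Statistica output).
INTERCEPT = -6.0
COEFFICIENTS = {"temperature": 0.05, "vibration": 2.0}
THRESHOLD = 0.8  # hypothetical failure probability at which we intervene


def score(event: Dict[str, float]) -> float:
    """Predict: logistic-regression probability of equipment failure."""
    z = INTERCEPT + sum(coef * event[name] for name, coef in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def monitor(events: Iterable[Dict[str, float]]) -> Iterator[Tuple[str, float]]:
    """Monitor each live event, score it, decide, and emit an action."""
    for event in events:
        probability = score(event)
        # Decide/Act: the hypothetical "intervene" action could, for example,
        # schedule maintenance on the machine that produced the event.
        action = "intervene" if probability >= THRESHOLD else "ok"
        yield action, probability


if __name__ == "__main__":
    live_feed = [
        {"temperature": 60.0, "vibration": 0.5},  # z = -2.0
        {"temperature": 90.0, "vibration": 2.0},  # z = 2.5
    ]
    for action, probability in monitor(live_feed):
        print(action, round(probability, 3))
```

In a real deployment the list would be an unbounded event source and the coefficients would come from the exported model rather than constants.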
One-stop shop for actionable insights
• BI & Analytics: AI-driven visualization to find actionable insights
• Data Science: create analytics that predict the future based on history
• Streaming Analytics: provide analytics and take action on real-time streaming data
Technical Overview Into Building Predictive Analytics Workflows For Big Data (TIBCO Statistica)
Statistica - Data Science Workbench for Big Data Analytics
Workflows
• 1000s of statistical, machine learning and deep learning methods
• Supervised learning - models, ensembles
• Unsupervised learning - anomaly detection, clustering
• 100s of native, validated step nodes and workflows
Data Blending
• Traditional - SQL sources, flat files
• Big Data - HDFS in, out, data maps
Models and Rules Management
• Deployment code generators (C, C#, PMML, Java (POJO, MapReduce), Teradata, SAS)
• Audit, validation, user and version control
Collaboration
• Scripted nodes (use/manage): R, Python, Scala, C#
• Algorithmic marketplace plugins
Big Data
• In-database analytics
• H2O, Spark nodes
• Deep learning (CNTK)
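The code generators listed above include PMML, an XML standard for exchanging models between tools. As a hedged sketch of the consuming side, the Python below loads a PMML linear RegressionModel with the standard library and scores one record. The PMML document is hand-written for illustration (it is not Statistica-generated output), and the sketch assumes the PMML 4.3 namespace and handles only NumericPredictor terms; categorical predictors, normalization methods and other model types are out of scope.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

NS = {"p": "http://www.dmg.org/PMML-4_3"}

# Hand-written example model: y = 1.5 + 2.0 * x1 - 0.5 * x2 (illustrative only).
EXAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <Header/>
  <DataDictionary numberOfFields="3">
    <DataField name="x1" optype="continuous" dataType="double"/>
    <DataField name="x2" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x1"/>
      <MiningField name="x2"/>
      <MiningField name="y" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

Term = Tuple[str, float, float]  # (field name, coefficient, exponent)


def load_regression(pmml_text: str) -> Tuple[float, List[Term]]:
    """Read the intercept and numeric terms of the first RegressionTable."""
    root = ET.fromstring(pmml_text)
    table = root.find(".//p:RegressionModel/p:RegressionTable", NS)
    if table is None:
        raise ValueError("no RegressionModel/RegressionTable found")
    intercept = float(table.get("intercept", "0"))
    terms = [
        (pred.get("name"), float(pred.get("coefficient")), float(pred.get("exponent", "1")))
        for pred in table.findall("p:NumericPredictor", NS)
    ]
    return intercept, terms


def predict(model: Tuple[float, List[Term]], record: Dict[str, float]) -> float:
    """Linear predictor: intercept + sum(coefficient * value ** exponent)."""
    intercept, terms = model
    return intercept + sum(coef * record[name] ** exp for name, coef, exp in terms)


if __name__ == "__main__":
    model = load_regression(EXAMPLE_PMML)
    print(predict(model, {"x1": 3.0, "x2": 4.0}))  # 1.5 + 6.0 - 2.0 = 5.5
```

Because the model travels as a standard document, the same file could be scored by any PMML-aware runtime, which is the point of generating it.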
Simple workflow example - Predictive Modeling (Classic)
Statistica Enterprise Server
TIBCO Statistica Platform Architecture for Big Data
Figure: platform architecture diagram. Components shown: Statistica Big Data Analytics (Spark, H2O, In-DB and HDFS wrappers); Monitoring & Alerting Server; Live Score Server; Metadata Repository; Document Management System; Enterprise Tools. Capabilities shown: data aggregation & preparation, modelling, deployment, web and API access, real-time model scoring, model monitoring & process control, change management & compliance, governance, batch jobs, access roles, models/rules management.
Change control, access management and model versioning
TIBCO Statistica for Big Data Analytic Pipelines
Data Science Workbench
• Workflows - package and maintain the steps of your predictive model building process
Data Blending
• HDFS data in, out, feed
H2O - Sparkling Water
• Sparkling Water nodes and workflow templates
Spark ML Nodes (Scala code)
• Data
• Feature Selection
• Decision Trees
• Regression
• Classification
• PCA ... plus workflow templates
Deep Learning (CNTK)
• Regression
• Classification
• Deployment
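The idea of packaging model-building steps as a workflow can be illustrated with a small Python sketch: each node is a named table-to-table function, and the workflow runs the nodes in order and keeps a simple per-run log. The Node and Workflow classes and the two example steps are hypothetical illustrations of the concept, not Statistica's API; tables are plain lists of dictionaries to keep the example self-contained.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]
Table = List[Row]


@dataclass
class Node:
    """One workflow step: a name plus a function from table to table."""
    name: str
    run: Callable[[Table], Table]


@dataclass
class Workflow:
    """An ordered, re-runnable chain of nodes with a per-run log (hypothetical)."""
    nodes: List[Node] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def add(self, name: str, run: Callable[[Table], Table]) -> "Workflow":
        self.nodes.append(Node(name, run))
        return self  # allow chaining: wf.add(...).add(...)

    def execute(self, table: Table) -> Table:
        self.log.clear()
        for node in self.nodes:
            table = node.run(table)
            self.log.append(f"{node.name}: {len(table)} rows")
        return table


def drop_missing(table: Table) -> Table:
    """Data preparation step: keep only rows without missing values."""
    return [row for row in table if all(v is not None for v in row.values())]


def standardize(column: str) -> Callable[[Table], Table]:
    """Build a step that rescales `column` to mean 0 and (population) SD 1."""
    def step(table: Table) -> Table:
        if not table:
            return table
        values = [row[column] for row in table]
        mean = sum(values) / len(values)
        # A constant column has SD 0; fall back to 1.0 to avoid division by zero.
        sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5 or 1.0
        return [{**row, column: (row[column] - mean) / sd} for row in table]
    return step


if __name__ == "__main__":
    wf = Workflow().add("drop missing", drop_missing).add("standardize x", standardize("x"))
    print(wf.execute([{"x": 1.0}, {"x": None}, {"x": 3.0}]))  # [{'x': -1.0}, {'x': 1.0}]
    print(wf.log)  # ['drop missing: 2 rows', 'standardize x: 2 rows']
```

Because every step shares the same table-in, table-out contract, a custom scripted step can be slotted in without changing the rest of the chain.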
Big Data Analytics - In-Database Processing
Dedicated In-Database Steps (nodes)
• Descriptives, Correlations
• GLZ, Lasso Regression, ..
Supported SQL Databases (as of Statistica version 13.3)
• Microsoft SQL Server
• Oracle
• Apache Hive
• Teradata
• MySQL
Why
• Move compute to the data
• Reduce data movement and use the database's own resources
Figure: the In-Db Step sends SQL to the database; results and metadata come back to Statistica.
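The "move compute to the data" idea can be illustrated with Python's built-in sqlite3 module. SQLite is only a stand-in here (it is not one of the supported databases listed above), and the table and column names are hypothetical. The database computes the aggregate sums, so a single summary row travels back; the client then derives the means and the Pearson correlation from those sums. The final square root is taken in Python because SQLite's SQL math functions are not available in every build.

```python
import math
import sqlite3
from typing import Dict


def in_db_summary(conn: sqlite3.Connection, table: str, x: str, y: str) -> Dict[str, float]:
    """Descriptives and Pearson correlation computed from in-database aggregates."""
    # Identifiers cannot be bound as SQL parameters, so accept only plain names.
    for name in (table, x, y):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    n, sx, sy, sxx, syy, sxy = conn.execute(
        f"SELECT COUNT(*), SUM({x}), SUM({y}), SUM({x} * {x}), SUM({y} * {y}), "
        f"SUM({x} * {y}) FROM {table}"
    ).fetchone()
    # Only these six numbers left the database; finish the formulas client-side.
    # (A constant column would make the denominator zero.)
    r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return {"n": n, "mean_x": sx / n, "mean_y": sy / n, "corr": r}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sensor (pressure REAL, flow REAL)")
    conn.executemany("INSERT INTO sensor VALUES (?, ?)", [(1, 2), (2, 4), (3, 6), (4, 8)])
    print(in_db_summary(conn, "sensor", "pressure", "flow"))
    conn.close()
```

On a real warehouse the same pattern scales because the scan and the sums run where the rows live; only the aggregated result crosses the network.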
Statistica Collective Intelligence - App Market Connectors
Use models / code from marketplaces (if good/trusted)
Example: Azure ML Model Consumer
Parameters of the Statistica Azure ML step:
• Model API key
• Connection
• Web service in/out datasets
• Batch scoring in a storage container (optional)
Why
• (Re)use models for typical use cases: marketing campaigns, churn, predictive maintenance, etc.
• Bring existing IP into the platform and manage it
• Data prep (merge, clean, transform)
• Control and schedule jobs for execution; monitor performance
A similar approach works for other marketplaces and for existing code (R, Python, C#)
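To make the step's parameters concrete, here is a hedged sketch of what a model-consumer call can look like, assuming the request-response JSON layout of classic Azure ML Studio web services (Inputs keyed by input name, ColumnNames plus Values, an empty GlobalParameters object, and the API key sent as a bearer token). The endpoint URL, input name and column names are placeholders; the sketch only builds the HTTP request and does not send it, and it says nothing about how the Statistica step is implemented internally.

```python
import json
import urllib.request
from typing import List, Sequence


def build_score_request(
    endpoint_url: str,
    api_key: str,
    columns: Sequence[str],
    rows: List[Sequence[object]],
    input_name: str = "input1",  # assumed name of the web service input dataset
) -> urllib.request.Request:
    """Build (but do not send) a POST request that asks the service to score rows."""
    body = {
        "Inputs": {input_name: {"ColumnNames": list(columns), "Values": [list(r) for r in rows]}},
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": "Bearer " + api_key},
        method="POST",
    )


if __name__ == "__main__":
    req = build_score_request(
        "https://example.invalid/score",  # placeholder endpoint, not a real service
        "MODEL-API-KEY",                  # placeholder for the "Model API key" parameter
        ["tenure_months", "monthly_charges"],  # hypothetical churn-model inputs
        [[12, 70.5], [48, 20.0]],
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # Sending would be urllib.request.urlopen(req), then json.load() on the response.
```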
Open Source Options, Scripting
Any R package as a node
• Integrate any R package as a process step
• Augment/customize core capabilities
Python scripts
Scala code
Scripting in C#, VB
Native custom (open code) nodes
Shipped with many examples
Deep Learning
CNTK-based deep learning NNs
• Regression
• Classification
• Generic
• Deployment
H2O Deep Learning nodes
BTW - if you are not looking for deep learning (discovery of hidden intrinsic relationships in data with an unsupervised NN), we have the “classical” Statistica Automated NNs
Statistica H2O nodes
Easy-to-use H2O “wrappers”
• Example workspaces provided
• All nodes described in Statistica Help
• See SparklingWaterBooklet.pdf from h2o-release.s3.amazonaws.com/h2o/
Statistica Spark (Scala) nodes
Statistica Scala nodes - Architecture
Figure: Statistica Desktop (Analytics Workbench) and the Statistica Enterprise Repository connect to a Livy server (REST server), which submits code to a managed cluster running Spark 2.0.2 or later (e.g. Spark nodes + YARN).
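The architecture routes Scala code to the cluster through a Livy REST server. The sketch below shows the general shape of that exchange using Apache Livy's session and statement endpoints (create a session, submit a statement, poll the statement). It only builds the requests (nothing is sent), the host name and the session/statement IDs are hypothetical, and it is not a description of how Statistica itself calls Livy.

```python
import json
import urllib.request
from typing import Optional

LIVY_URL = "http://livy-host:8998"  # hypothetical host; 8998 is Livy's default port


def livy_request(path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a JSON request to the Livy REST server (POST if a payload is given)."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        LIVY_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="GET" if payload is None else "POST",
    )


def create_session() -> urllib.request.Request:
    # "kind": "spark" asks Livy for an interactive Scala Spark session.
    return livy_request("/sessions", {"kind": "spark"})


def submit_statement(session_id: int, scala_code: str) -> urllib.request.Request:
    # The Scala snippet runs inside the session on the cluster.
    return livy_request(f"/sessions/{session_id}/statements", {"code": scala_code})


def poll_statement(session_id: int, statement_id: int) -> urllib.request.Request:
    # Poll until the statement's "state" is "available", then read its "output".
    return livy_request(f"/sessions/{session_id}/statements/{statement_id}")


if __name__ == "__main__":
    # Session and statement IDs would normally come from the previous responses.
    for req in (create_session(),
                submit_statement(0, "spark.range(1000).count()"),
                poll_statement(0, 0)):
        print(req.get_method(), req.full_url, req.data)
```

Keeping the client on plain REST calls is what lets a desktop tool drive a remote Spark cluster without running Spark locally.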
Big Data - Practical Use Case: Yield, Root Causes and Quality Issue Detection in a Complex Manufacturing Process
Goals:
• Predict yield for a semiconductor manufacturing process
• Detect potential quality issues early
Problems:
• Ultra-wide data: thousands to millions of variables
• Handling billion(s) of cases
• Mix of categorical and continuous predictors
• Sparse data
Solution:
• Spark parallel processing in a big data platform
• Feature selection algorithm + Lasso regression
• Hadoop cluster (100s of cores)
• Statistica analytic workflows submit code to the Spark cluster
• Spotfire dashboards used for visualization
Performance:
• Analysis running time reduced to minutes
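To show why Lasso regression suits ultra-wide data, here is a small single-machine sketch of Lasso fitted by cyclic coordinate descent with soft-thresholding. It minimizes (1/(2n)) * sum_i (y_i - b0 - x_i . b)^2 + lambda * sum_j |b_j|; the L1 penalty drives the coefficients of uninformative predictors exactly to zero, which is what makes Lasso double as a feature-selection step. This is an illustrative pure-Python implementation, not the Spark code used in the project or Statistica's algorithm; the data and lambda values are made up.

```python
from typing import List, Tuple


def soft_threshold(a: float, lam: float) -> float:
    """Shrink a toward zero by lam; values inside [-lam, lam] become exactly 0."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0


def lasso(X: List[List[float]], y: List[float], lam: float,
          n_iter: int = 100) -> Tuple[float, List[float]]:
    """Fit Lasso by cyclic coordinate descent; returns (intercept, coefficients)."""
    n, p = len(X), len(X[0])
    # Center predictors and response so the intercept can be recovered afterwards.
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residuals y - Xc @ beta, with beta = 0
    z = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0.0:
                continue  # constant predictor: coefficient stays 0
            # Correlation of predictor j with the partial residual (excluding j).
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / z[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_means))
    return intercept, beta


if __name__ == "__main__":
    # y depends only on the first predictor; the second is pure "noise".
    X = [[1.0, 1.0], [2.0, -1.0], [3.0, 1.0], [4.0, -1.0], [5.0, 1.0]]
    y = [2.0 * row[0] + 1.0 for row in X]
    b0, b = lasso(X, y, lam=0.01)
    print(round(b0, 3), [round(v, 3) for v in b])  # 1.015 [1.995, 0.0]
```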
Summary
Validated Big Data Options
Questions

Big Data LDN 2017: How Big Data Insights Become Easily Accessible With Workflow Tools
