Lou Bajuk-Yorgan
July 2016
Applying R in Enterprise Applications
Big Data Analytic Challenges for Enterprises
• More and more data…and the expectation to do
something valuable with it
• How do you get Value from Big Data?
• Deeper insights into Big Data—Visualization
and Advanced Analytics/Machine Learning
• Smarter Decisions—Provide insights to wider
community beyond Data Scientists
• Faster Decisions—both human and
automated
• Agile response to evolving opportunities and
threats
• Answers (and the questions to ask) change
rapidly
Multiple paths from Data to Decision and Action
Advanced
Analytics/Machine
Learning help
deliver smarter,
faster decisions
•
• Build predictive models on data in Spark and Hadoop (~10
models)
• MLlib
• Build predictive models in Spark (~20 models)
• In-database algorithms
• Vertica, SQL Server, Teradata Aster, Oracle, etc. (varies, ~10-20
models)
• Python
• Larger set of analytic methods…not designed for Big Data
(A few of) Many Options for Machine Learning on Big Data
© Copyright 2000-2016 TIBCO Software Inc.
• Most popular and powerful
language for Data Science
• The lingua franca of statistical
computing
• Rapidly growing
• Almost 9000 user-contributed
packages available on CRAN, and
growing exponentially
• Community supported by the R
Consortium
• www.r-consortium.org
• TIBCO a founding member
• Agile
• Easy prototyping of new
models and analysis
• Deeper insights
• Huge array of analytic
methods available
• The “best” method to solve a
given problem is likely
available
• Performance
• Not designed for real time or Big Data
applications
• Broader usage
• Hard for non-Data Scientists to use
directly
• Challenging to integrate into enterprise
applications
• Performance, commercial support and
Intellectual Property concerns
• Compromises which impact Agility
• Recode in a new, less agile, more
analytically limited environment
• Rewrite, use specialized R packages to
solve one problem better
R can help address Big Data Analytic Challenges…
…but it has its own challenges
What would the ideal solution look like?
• A single environment that would allow you to prototype in R, and
deploy to production in R
• Without recoding, without delay, without compromises
• Enable agile response to changing opportunities and threats
Requires
• Analytic flexibility, power and breadth of R
• High performance, scalable, robust platform
• Easy to embed in Business Intelligence, Real time and custom applications
• Fully supported for mission critical applications
• Allows R users to continue to work in their preferred development environments (e.g.,
RStudio)
TIBCO Enterprise Runtime for R (TERR)
• Unique, enterprise-grade engine for the R language,
built from the ground up by TIBCO
• Based on TIBCO’s long history and expertise with S+
• Better performance and memory management than open source R
• Designed for R language compatibility
• Wide range of built-in analytic methods
• Compatible with thousands of CRAN packages (dplyr, data.table,
etc.)
• Designed for commercial embeddability
• TIBCO licensed & supported product
• Not GPL, not a repackaging of the Open source R engine
• TERR extends the reach of R in the enterprise
• Develop code in open source R
• Deploy on a commercially-supported and robust platform
• Without the delay and cost of rewriting your code
• Embed in Data Discovery, BI and real time applications
• Leverage existing R ecosystem in Hadoop, Spark, in-db analytics.
Embed in Business Intelligence
and Visualization tools
• Decision Support
Embed in real-time Data Streams
• Decision Automation
TIBCO’s approach: Easy deployment of the R language to production
Embed R in Business Intelligence and Visualization tools
• Spotfire: Data Discovery and Visualization platform for Business Users and Analysts
• Separate analytics platform, independent of TERR/R
• Easily enhance Spotfire analyses and applications with R language scripts
• Extend the impact of the Data Scientist/R by making their analytic insights available to a wider audience
Write R code directly in Spotfire;
TERR executes locally or on server
Manage TERR analytics locally or
in Server to reuse across
community
Deploy TERR-powered
applications to the web
Illustrating the power of embedded Advanced Analytics
Advanced Analytic Applications in Spotfire
Customer Churn:
• Retain your most profitable customers
• Increase upsell, decrease churn
Fraud Detection:
• Reduce losses due to fraudulent transactions
Supply Chain Optimization:
• Anticipate peaks and lulls
• Optimize distribution centers
HR Planning:
• Predict employee attrition and optimize retention
Spotfire, TERR & Hadoop
• Putting it all together
• Drive complex, advanced analytic workflows in
Hadoop from intuitive Spotfire applications
• Initiate, parameterize and visualize MapReduce
operations from within Spotfire templates via TERR
• Spotfire Customer application received
Cloudera Strata Data Impact Award for
Advanced Analytics
• Multinational manufacturing client
• Using data visualization and modeling tools
co-developed with TIBCO Spotfire, analysts for the
client are building mathematical models that can
predict when a manufacturing stoppage might occur.
• Real-time advanced analytics
• Apply predictive model in response to some triggering event
• Sensors on industrial equipment trending negative; customer walks into your store or purchases online,
etc.
• Trigger the right decision in response
• Extend a mobile offer to a customer; stop a fraudulent transaction in process; alert the equipment
operator or shut down the equipment
Embed R in real-time Data Streams
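The event-to-decision flow on this slide can be sketched in a few lines. This is a minimal illustration in Python, not TIBCO's implementation: the coefficients, field names, thresholds and action names are all hypothetical, and in the workflow the deck describes the model itself would be developed in R and run by TERR.

```python
import math

# Hypothetical coefficients for illustration only; a real model would be
# fitted offline and handed to the real-time system.
INTERCEPT = -6.0
WEIGHTS = {"temperature_c": 0.05, "vibration_mm_s": 0.4}


def failure_probability(event):
    """Score one sensor event with a logistic model."""
    z = INTERCEPT + sum(w * event[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def decide(probability, alert_at=0.5, shutdown_at=0.9):
    """Map a predicted probability to an action (thresholds are assumptions)."""
    if probability >= shutdown_at:
        return "shut_down_equipment"
    if probability >= alert_at:
        return "alert_operator"
    return "no_action"


def handle_event(event):
    """Triggering event -> model score -> decision."""
    return decide(failure_probability(event))
```

Keeping scoring and decision logic in separate functions means the thresholds can be tuned by the business without refitting the model.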
• Oil & Gas Extraction
• Maintenance Downtime and Equipment failures
are costly
• Engineers track sensor data to find leading
indicators
• Temperature, vibration, etc.
• Engineers usually use ad hoc rules on leading
indicators
• R/TERR used to develop predictive models for
preventative maintenance
• Deployed in real-time systems, alert when
maintenance recommended
Predictive Maintenance for Oil & Gas
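One simple way to turn a leading indicator into a maintenance alert is to fit a trend to recent sensor readings and project when it will reach a limit. The sketch below is an assumption about how such a check might look, written in Python for illustration; the function names, the limit and the horizon are hypothetical, and the deck does not say which models the engineers actually used.

```python
def linear_trend(readings):
    """Least-squares slope and intercept for equally spaced readings."""
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(readings) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(readings))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def steps_until_limit(readings, limit):
    """Projected steps before the fitted trend reaches `limit`.

    Returns 0.0 if the fitted value is already at or above the limit,
    and None if the trend is flat or falling.
    """
    slope, intercept = linear_trend(readings)
    current = intercept + slope * (len(readings) - 1)
    if current >= limit:
        return 0.0
    if slope <= 0:
        return None
    return (limit - current) / slope


def maintenance_recommended(readings, limit, horizon):
    """Alert when the limit is projected to be reached within `horizon` steps."""
    steps = steps_until_limit(readings, limit)
    return steps is not None and steps <= horizon
```

Compared with an ad hoc rule such as "alert when temperature exceeds the limit", the projection raises the alert before the limit is reached, which is the point of a leading indicator.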
• Port Congestion Detection
• Real time system triggers TERR
• Analyzes port congestion
• Recommends reduction of speed if
no berths available
• Maritime Abnormality Detection
• Based on Automatic Identification
System info, TERR calculates
likelihood of deviation from normal
sailing routes
• Alerts carrier & operator
Transportation and Logistics Optimization
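The port congestion recommendation comes down to simple arithmetic: if no berth is free, slow down so the vessel arrives when one is expected to open. The following Python sketch is illustrative only; the parameter names, the minimum speed and the "just in time" rule are assumptions, not details from the deployed system.

```python
def recommended_speed(distance_nm, current_speed_kn, berths_free,
                      hours_until_berth, min_speed_kn=8.0):
    """Suggest a speed in knots for a vessel approaching a port.

    distance_nm:       remaining distance to the port, in nautical miles
    berths_free:       number of berths currently available
    hours_until_berth: expected wait until the next berth frees up
    min_speed_kn:      hypothetical lower bound for safe steaming
    """
    if berths_free > 0 or hours_until_berth <= 0:
        return current_speed_kn  # no congestion: keep current speed
    just_in_time = distance_nm / hours_until_berth
    # Never speed up, and never drop below the minimum speed.
    return max(min_speed_kn, min(current_speed_kn, just_in_time))
```

For example, a vessel 240 nautical miles out at 20 knots, with no berth expected for 20 hours, would be advised to slow to 12 knots.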
Use TERR in your familiar tools
RStudio IDE
• Free, open source IDE widely used by the
R Community
• Fully compatible with TERR Developer
Edition
KNIME
• Free, open source workflow tool for data
management and analysis
• TERR fully compatible with KNIME
Interactive R Statistics Integration nodes
TERR is R for the Enterprise
• Develop code in open source R, deploy on commercially supported,
robust platforms
• Without recoding, without compromises
• Save time & money, quickly respond to new threats and opportunities
• Tightly & efficiently embed R language functionality
• Extend the power of R to a wider audience, more applications
• TERR Community at community.tibco.com
• Resources, Documentation, R compatibility, FAQs, Forums
• Predictive Analytics overview and resources
• Free TERR Developer Edition
• Full version of TERR engine for testing code prior to deployment
• Supported through TIBCO Community, download via tap.tibco.com
• Spotfire Free Trial: http://spotfire.tibco.com/trial
• Spotfire and Big Data http://spotfire.tibco.com/solutions/technology/big-data
• R Consortium Founding Member www.r-consortium.org
Learn more and Try it yourself

Big Data Day LA 2016 / Big Data Track - Apply R in Enterprise Applications, Lou Bajuk-Yorgan, Sr. Director, Product Management, TIBCO


Editor's Notes

  • #4 In any sort of analytics problem, people are attempting to get from the Data to a decision and then take some Action based on that decision. Depending on the tools used, the depth and complexity of the analysis, and the skill of the user, the Decision Maker will need to apply some judgment to make the decision. The deeper the analysis, and the more predictive or prescriptive it is, the less interpretation and mental heavy lifting the end user will need to do. The R language can be a great asset here, of course, but there are obstacles to applying it.
  • #6 Graphic from KDNuggets poll: http://www.kdnuggets.com/2015/05/r-vs-python-data-science.html. Chart from http://www.r-bloggers.com/a-segmented-model-of-cran-package-growth/
  • #9 … What is TERR?
  • #12 Since the demo wraps up with the idea of deploying the model to real time systems, it is a good segue
  • #13 Supply Chain Optimization: simulate production and shipping scenarios to anticipate peaks and lulls. HR Retention: predict employee attrition and optimize retention.
  • #14 Advanced Analytics: a TIBCO Spotfire advanced analytics solution purpose-built for a client. TIBCO built extensions of its Spotfire product for one client to support production line event analysis, improving product quality and supply chain efficiency. The client’s global manufacturing sites are peppered with sensors collecting programmable logic controller (PLC) data at sub-second intervals. This data is fed into Hadoop, and with data visualization and modeling tools co-developed with TIBCO Spotfire, analysts are building mathematical models that predict when a manufacturing line stoppage might occur. In evaluating the predictive accuracy of the models used in this environment, the client saw the potential to reduce defects by 50% through improved control settings and changes made to the engineering process, and its analysts can focus on problem solving rather than coding and software “gymnastics”. Providing this structured analysis method will ultimately enable data-driven change of the manufacturing process, yielding millions of dollars in cost savings.
  • #15 Example use case: real-time correlations for action. Automated manufacturing yield analysis: analyze manufacturing data in Spotfire; deploy the model; compare live data to models of good behavior; when actual manufacturing usage breaks the model, use Spotfire to understand why.