In this presentation for the LA R User's Group on 4/1/14, I discuss how TIBCO helps organizations increase the value of their investments in the R language, using TIBCO Enterprise Runtime for R.
2. Extending the Reach of R to the Enterprise
• TIBCO, S+, and embracing R in Spotfire
• Challenges of R for Enterprise applications
• TIBCO Enterprise Runtime for R (TERR)
• Benefits for organizations (and individuals) who use R
• Spotfire & TERR
• TERR integration & performance
• Learn more and try it yourself
- 2
3. Our Journey to TERR
• John Chambers developed the S language at Bell Labs
– Starting in the mid 70’s
• Insightful (Statsci) founded to commercial S as S+ in 1987
– The ―plus‖: statistical libraries, documentation, and support
– Later focus on commercial users, ease of use, server integration
• R: development begun by Ross Ihaka and Robert Gentleman at University of
Auckland in mid 90’s
• Insightful acquired by TIBCO in 2008
– Spotfire (for Data Discovery and Visualization) acquired in 2007
• Focus shifted to applying Predictive Analytics in Spotfire
– Step 1: Embrace R
- 3
4. Embracing R
• Spotfire Statistics Server
– Integration of R & S+ into Spotfire
applications
• Later added SAS® & MATLAB®
– Leverage the interactive visualizations
of Spotfire
• Contribute to the R community
• Well received—but our Enterprise
customers need more
– R provides tremendous benefits to
statisticians
– But large enterprises are often
challenged to leverage that value
- 4
8. Easily provide targeted, relevant predictive analytics to business users to
improve decision making
• Ensure compliance & proper usage
• Share best practices and consistent workflows
• Get the answer & do ―What If?‖ analyses when needed
• Leverage investments in R, S+, SAS, MATLAB, …
Powerful Predictive Analytics tools for Spotfire analysts
• Integrated into Spotfire workflows
• Easily create, evaluate, and share Predictive Models
• Add Forecasts with a single click
Benefits of Predictive Analytics to a spectrum of users
• Increase confidence & effectiveness in decision-making
– Reduce uncertainty
– Discover meaningful patterns, important data
– Maximize ROI
• Anticipate and react to emerging trends
• Reduce/manage risk
– Scenario planning, forecasts, fraud detection
• Forecast specific behavior, preemptively act on it
– Increase upsell, decrease churn
Predictive Analytics with Spotfire
9. Providing Value for individuals who use R
• Not seeking to displace R from statistician’s
desktops
– Enterprise platform for the deployment and
integration of your work—without having to rewrite
it!
• Contribute to the R community
– Sponsor useR conferences, contribute to R
Foundation
– Contribute bug reports and propose fixes to R core
• Contribute packages to CRAN
– As we port from S+ or develop for TERR
• Supports ―Develop in Open Source R, Deploy
on TERR‖
• E.g., splusTimeSeries, splusTimeDate, sjdbc
• TERR Developer Edition
– Full version of TERR engine for testing code prior to
deployment
• Compatible with RStudio & ESS Emacs
– Free for non-production use
– Supported through Community site
-
10. Example 1: TERR vs. R Raw Performance
One specific example
• Non-optimal, non-vectorized, real-world R script
• For loop with row by row processing
for (i in seq(1,length=nrow(df))) {
…process each customer record…
}
Results
• TERR is ~35x faster for 50K rows, 150x faster for 500K rows
• No code modification required
We are looking for more real-world performance tests!
• On average 2-10x faster than R in microtests
12. TERR vs. R Performance: GLM Model Fitting and Scoring
Model Fitting: 5 Million Rows Model Scoring: 20 Million Rows
13. • Forecast Tool
– Easily add Forecasts to
Visualizations by right click menu
– Advanced users can tune settings
– Uses embedded TERR engine
• Benefits
– Extend the power of Predictive
Analytics for ad hoc analysis to all
Spotfire users
– Easy entry point to Spotfire
Predictive Analytics
Example 2: Spotfire Forecast Tool
14. • Event-Driven analysis in TIBCO Spotfire
Event Analytics
– Process monitoring, analysis, and
optimization
• Apply predictive models in real-time
decision making
– Best marketing offer
– Customer churn
– Predictive Maintenance
– Yield optimization
• Rapidly develop and iterate models in
production
– Respond to changing opportunities
and threats
TERR integration with TIBCO StreamBase
15
15. TIBCO Cloud Compute Grid
• High performance computing on the cloud
– Available on TIBCO Cloud Marketplace
– TERR, Java and .NET computations
• Robust DataSynapse GridServer architecture
– Used by Wall Street to manage 10K’s nodes
– Java, .NET, and REST APIs (JSON)
• Perfect for pure computational work
– Vastly easier to use for applications like Monte Carlo
simulations than Map-Reduce
– Run complex statistical models multiple orders of
magnitude faster than open source R on a single
computer
– Unparalleled scalability without upfront capital
investment
• Easy to get started
– Uses your Amazon EC2 account
16. Demos
• TERR in Spotfire
– Forecast Tool
– Sales Forecasting (A simple application)
• Data Functions: using the R language in Spotfire
• Calling TERR from Spotfire Expression Language
– Creating Expression Functions
– Fraud Detection (A more complex application)
• TERR in RStudio
17. • TERR Community at TIBCOmmunity.com
– Resources, FAQs, Forums
– Details of R coverage
– Product documentation & download
– More info at spotfire.tibco.com/terr
• TERR Developer Edition
– Full version of TERR engine for testing code prior to deployment
– Supported through TIBCOmmunity, download via tap.tibco.com
• TIBCO Cloud Compute Grid
– https://marketplace.cloud.tibco.com
• Presentations: http://www.slideshare.net/loubajukyorgan/presentations
• We want your feedback and input!
– Real world performance tests
– Package & R coverage prioritization
– Via TERR Community, or contact me lbajuk@tibco.com or @loubajuk
18. TIBCO Spotfire Day by Perkin Elmer
• PerkinElmer San Jose Tech Center
• Tuesday May 6th, 8:30 AM to 3 PM
• Flyers available
Editor's Notes
“But we needed to do more. This wasn’t enough for Enterprise customers)
Meaningful benchmarks are hardWe can handle how you typically write code.
Benjamin Kreck, 3/;3:Hi all, I’ve done some more performance comparisons between R and TERR: This time for support vector machines coming from e1071. Results can be found in the attached .dxp (R/TERR results are already calculated -> Spotfire can be used offline).The take home message is again: The bigger the data set the higher the performance advantage of TERR compared to R!
NOTE: TIBCO® Cloud Compute Grid will be using TERR 1.5 at Nov. 2013 release. Date for migration to TERR 2.0 TBD. More info at http://confluence.tibco.com/display/spf/TERR%2C+Grid+Server+and+TIBCO+Cloud+Compute