Batter Up! Advanced Sports Analytics with R and Storm 
Meeting the Real-Time Analytics Opportunity Head-On 
Bill Jacobs 
VP Product Marketing 
Revolution Analytics 
@bill_jacobs 
December 11, 2014 
Allen Day 
Principal Data Scientist 
MapR Technologies 
@allenday 
Vineet Sharma 
Dir., Partner Marketing 
MapR Technologies
Who Am I? 
Bill Jacobs, VP Product Marketing 
Revolution Analytics 
@bill_jacobs
Polling Question #1: 
Who Are You? (choose one) 
–Statistician or modeler 
–Data Scientist 
–Hadoop Expert 
–Application builder 
–Data guru 
–Business user 
–Baseball fan
Sports Analytics as Analogy. 
Sports Teams Are Like Other Corporations. 
–Great Value Achievable With Data 
–Vast Range of Data Sources 
–Timely Analysis Amplifies Value 
And apologies if you came to learn whom to bet upon in next year’s season.
Game Changing Big Data Analytics Applications 
Marketing: Clickstream & Campaign Analyses 
Digital Media: Recommendation Engines 
Social Media: Sentiment Analysis 
Retail: Purchase Prediction 
Insurance: Fraud, Waste and Abuse 
Healthcare Delivery: Treatment Outcome Prediction 
Risk Analysis: Insurance Underwriting 
Manufacturing: Predictive Maintenance 
Operations: Supply Chain Optimization 
Econometrics: Market Prediction 
Marketing: Mix and Price Optimization 
Life Sciences: Pharmacogenetics 
Transportation: Asset Utilization
Polling Question #2: 
What Languages or Tools Are in Use for Analytics? (check all that apply) 
–R 
–SAS or SPSS 
–Python 
–Java 
–BI tools including: MSTR, Qlik, Tableau, Business Objects, Cognos 
–Salford Systems or MATLAB 
–H2O, RapidMiner, KNIME or similar 
–Other data mining tools 
–Other programming languages 
–None or Don’t know
WELCOME & INTRODUCTIONS 
R Open Source 
-Language, Community, Collaboration 
-Robert Gentleman & Ross Ihaka, 1993 
-Version 1.0 released 2000 
-2.5 Million Global Users 
-Over 4,800 add-on “Packages” 
-Why R? 
 -R in Universities = New Talent Emerging 
 -Modeling/Visualization 
 -Lower Cost Alternative 
 -Open Source = Flexible & Innovative 
 -Access to Free Packages
R is exploding in popularity & functionality 
R Usage Growth: Rexer Data Miner Survey, 2007-2013 
70% of data miners report using R 
R is the first choice of more data miners than any other software 
Source: www.rexeranalytics.com
Innovate with R 
Most widely used data analysis software 
•Used by 2M+ data scientists, statisticians and analysts 
Most powerful statistical programming language 
•Flexible, extensible and comprehensive for productivity 
Create beautiful and unique data visualizations 
•As seen in New York Times, Twitter and Flowing Data 
Thriving open-source community 
•Leading edge of analytics research 
Fills the talent gap 
•New graduates prefer R 
White Paper: R is Hot, bit.ly/r-is-hot
Polling Question #3: 
How are you using R today? (choose one) 
–Not using R 
–Studying R now 
–Initial R project(s) underway 
–R is widely used for exploration & modeling 
–R is deployed into production
Revolution Analytics In A Nutshell 
Our Vision: 
R is becoming the de facto standard for enterprise predictive analytics 
Our Mission: 
Drive enterprise adoption of R by providing enhanced R products tailored to meet enterprise challenges
Revolution Analytics Builds & Delivers: 
Software Products: 
Stable Distributions 
Broad Platform Support 
Big Data Analytics in R 
Application Integration 
Deployment Platforms 
Agile Development Tooling 
Future Platform Support 
Support & Services 
Commercial Support Programs 
Training Programs 
Professional Services 
Academic Support Programs 
IP Indemnification
Revolution R Advantages for Analytics Professionals: 
Broadly-used, scalable R language 
Large (2M+), collaborative, young R analytics community 
Largest repository of statistical & analytical algorithms 
Big data analytics capabilities 
–Scales from workstations to Hadoop 
–Transparent parallelism 
–Cross platform compatibility 
–Multi-platform architectures 
Broadens career opportunities
Revolution R Advantages for Business Executives 
Viable Alternative to Legacy Analytics Solutions 
–Predictable Time To Results 
–Simplified Licensing 
–All-Inclusive Environment 
Lower Staffing Costs 
Controllable Open Source Risks 
–Support 
–IP Infringement Protections
Revolution R Advantages for IT Organizations 
Consistency Across Platforms Avoids Sprawl 
Support for Workstations, Servers, Hadoop, EDWs and Grids 
Heterogeneous Architecture Capabilities 
Integrates With Major BI & Application Tools 
Streamline Model Deployment 
Run Complex Analytics in the “Data Lake” 
Be a “Good Citizen” in shared systems 
Commercial Support Reduces Project Risks 
Quick Start Programs Accelerate Results 
Platform Continuity Future-Proofs Architectures
Revolution R Enterprise: Predictive Analytics Across Huge Data in Hadoop 
[Diagram labels: Exploration, Visualization, Predictive Modeling; YARN; HDFS]
Polling Question #4: 
Stage of Hadoop Adoption? (choose one) 
–No Need 
–Studying 
–Setting-Up Hadoop 
–Experimenting with Hadoop 
–Deploying Hadoop Now 
–Hadoop in Production
© 2014 MapR Technologies 18 
Introducing: 
Vineet Sharma 
Director, Partner Marketing 
MapR Technologies
MapR + Revolution Leverages MapR As A Scalable Enterprise R Engine. 
• Plus: 
– Run RRE Analytics In MapR Hadoop Without Change 
– Eliminate Need To Design Parallel Software or “Think In MapReduce” 
– Leverage All Revolution R Enterprise Pre-Parallelized Algorithms 
– Enable Users To Build Custom Apps That Leverage Hadoop’s Parallelism 
– Slash Data Movement by Analyzing Data Inside the MapR Data Platform 
– Expand Deployment and Integration Options 
[Diagram labels: Rapid Adoption of R; MapR Enterprise Data Platform Capabilities; Broad Adoption of Hadoop for Big Data Analytics]
Desktop Users with Analytical Access to Huge Data in Hadoop 
[Diagram labels: Predictive Modeling Algorithms; MapR FS; Data]
MapR: Best Solution for Customer Success 
Top Ranked 
Exponential Growth 
500+ Customers 
Premier Investors 
>2x annual bookings 
80% of accounts expand 3X 
90% software licenses 
< 1% lifetime churn 
> $1B in incremental revenue generated by 1 customer
The Power of the Open Source Community 
[Diagram: the Apache Hadoop and OSS ecosystem (execution engines; data governance and operations) on the MapR Data Platform (MapR-FS, MapR-DB), with Management and Security layers. 
Category labels: Batch; Streaming; NoSQL & Search; ML, Graph; SQL; Data Integration & Access; Workflow & Data Governance; Provisioning & coordination. 
Components shown: YARN, MapReduce v1 & v2, Pig, Cascading, Spark, Spark Streaming, Storm*, HBase, Solr, Accumulo*, Mahout, MLLib, GraphX, Tez*, Hive, Impala, Shark, Drill, Sqoop, Flume, HttpFS, Hue, Sentry*, Knox*, Oozie, Falcon*, ZooKeeper, Whirr, Juju, Savannah*] 
* Certification/support planned
MapR Distribution for Hadoop 
[Diagram: the same Apache Hadoop and OSS ecosystem component stack as on the previous slide, on the MapR Data Platform (MapR-FS, MapR-DB)] 
Enterprise-grade 
• High availability 
• Data protection 
• Disaster recovery 
Interoperability 
• Standard file access 
• Standard database access 
• Pluggable services 
• Broad developer support 
Performance 
• 2X to 7X higher performance 
• Consistent, low latency 
Multi-tenancy 
• Ability to logically divide a cluster to support different use cases, job types, user groups, and administrators 
Security 
• Enterprise security authorization 
• Wire-level authentication 
• Data governance 
Operational 
• Ability to support predictive analytics, real-time database operations, and support high arrival rate data 
* Certification/support planned
Revolution R Enterprise and MapR Hadoop 
[Diagram labels: BI and other Apps; R on the Desktop; Via Browsers; Mobile; Edge Node; YARN; MapReduce; MapR Data Platform; Management. 
Apache Hadoop & OSS ecosystem components shown: ZooKeeper, Oozie, Hue, Pig, Hive, Impala, Shark, Flume, HttpFS, Cascading, Solr, Juju, Mahout, MLLib, Storm, Spark, Spark Streaming, Sqoop, Whirr, HBase, YARN, Drill, Tez, Knox, Sentry, Falcon]
Introducing: 
Allen Day 
Principal Data Scientist 
MapR Technologies 
@allenday
Talk Overview 
• Agile Real-time Stats 
• R + Storm 
github.com/allenday/R-Storm 
• DEMO 
• How to do it? 
• Q & A @allenday 
[Diagram labels: Agile Methods; Advanced Statistics; Continuous Real-time Delivery] 
github.com/allenday/hadoop-summit-r-storm-demo-public
Architecting R into the Storm Application Development Process
Quick intro 
• Allen Day, Principal Data Scientist [ @allenday ] 
7yr Hadoop dev, 12yr R dev/author 
PhD, Human Genetics, UCLA Medicine
What’s Storm? What’s R? 
• What’s Storm? 
– Processes a data stream. Akin to UNIX pipe + tee & merge commands 
– Runs on a cluster. Fault-tolerant and designed to scale out 
– Used for: real-time analytics & machine learning 
• What’s R? 
– Programming language with advanced statistics libraries 
– Does not scale out. Can scale up 
– Used for: prototyping, data modeling, visualization 
How to combine these?
R outside, Storm inside: not practical. Why? 
• Model-building and QA are done on data snapshots 
• However, R => Hadoop is realistic. Key difference: referenced data can be static 
– Use MapR snapshots for dev and QA 
– See also: RHIPE (Purdue) and RHadoop (Revolution Analytics) 
[Diagram labels: R; Storm; User]
Storm outside, R inside: a good fit 
• Enables separation of concerns 
– Independently manage modeling, ops timelines, and version control 
– Integrate as needed 
• Enables role specialization 
– R built-ins allow faster iteration and more concise stats-type code 
– Do DevOps with specific SW engineering tech, e.g. Java 
[Diagram labels: Storm; R; User]
Q: Who really likes statistics? 
A: Baseball fans 
A: Team Managers = Portfolio Managers
Famous Vintage Data 
Oakland Athletics, 2002 Season 
20 consecutive wins – the American League record at the time of this talk
Goal: Detect “Moneyball” 2002 Winning Streak
Methods: 
Change Point Detection 
Find natural breakpoints in a time-series set of data points 
R packages implement this: 
changepoint: more sensitive, but not streaming 
bcp: streaming, but less sensitive
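To make the idea of a breakpoint concrete, here is a minimal Java sketch of single change point detection by least squares: it tries every split of the series and keeps the one that minimizes the summed squared deviation from each segment's mean. This is a simplified stand-in for illustration only, not the algorithm used by the changepoint or bcp packages; the class name, method names, and sample data are hypothetical.

```java
// Hypothetical illustration: find the single best mean-shift split in a series.
final class ChangePointSketch {

    /**
     * Returns the index k (1 <= k < n) such that splitting the series into
     * [0, k) and [k, n) minimizes the total squared deviation from the two
     * segment means, or -1 if the series has fewer than two points.
     */
    static int bestSplit(double[] x) {
        int n = x.length;
        if (n < 2) return -1;

        // Prefix sums of values and squares give each segment cost in O(1).
        double[] sum = new double[n + 1];
        double[] sumSq = new double[n + 1];
        for (int i = 0; i < n; i++) {
            sum[i + 1] = sum[i] + x[i];
            sumSq[i + 1] = sumSq[i] + x[i] * x[i];
        }

        int best = -1;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int k = 1; k < n; k++) {
            double cost = segmentCost(sum, sumSq, 0, k) + segmentCost(sum, sumSq, k, n);
            if (cost < bestCost) {
                bestCost = cost;
                best = k;
            }
        }
        return best;
    }

    // Sum of squared deviations from the mean over x[from..to).
    private static double segmentCost(double[] sum, double[] sumSq, int from, int to) {
        double s = sum[to] - sum[from];
        double sq = sumSq[to] - sumSq[from];
        return sq - s * s / (to - from);
    }

    public static void main(String[] args) {
        // Made-up score differences: a losing stretch followed by a winning one.
        double[] diffs = {-2, -1, -3, -2, 4, 3, 5, 4};
        System.out.println("change point at index " + bestSplit(diffs));
    }
}
```

Note that this sketch always reports some split, even for a series with no real change; a usable detector would also test whether the best split is statistically significant.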
Methods: R+Storm Demo Architecture 
[Diagram: Oakland A’s Data (accelerated) -> Storm Bolt (R online change point detector) -> Storm Bolt (write to Jetty) -> Jetty Webserver -> Browser (D3.js) -> Us; GIFs to MapR Filesystem] 
github.com/allenday/hadoop-summit-r-storm-demo-public
Demo 
[Demo panels:] 
• Raw data (score difference between A’s and opponent) 
• 50-game sliding window/buffer to detect change points 
• Cumulative history with detected break points
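The sliding window and cumulative history shown in the demo can be pictured with a small buffer class. The Java sketch below is an assumption-laden illustration, not the demo's actual code: the class and method names are hypothetical, and only the window size of 50 games comes from the slide.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration of a fixed-size buffer over a stream of game results.
final class SlidingWindowSketch {
    private final int capacity;
    private final Deque<Double> window = new ArrayDeque<>();
    private double cumulative = 0.0; // running total over the whole history

    SlidingWindowSketch(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
    }

    /** Adds one score difference, evicting the oldest once the window is full. */
    void add(double scoreDiff) {
        if (window.size() == capacity) {
            window.removeFirst();
        }
        window.addLast(scoreDiff);
        cumulative += scoreDiff;
    }

    /** True once the buffer holds a full window, i.e. a detector could run on it. */
    boolean isFull() {
        return window.size() == capacity;
    }

    /** Snapshot of the window, oldest first, to hand to a change point detector. */
    double[] snapshot() {
        double[] out = new double[window.size()];
        int i = 0;
        for (double v : window) out[i++] = v;
        return out;
    }

    /** Sum of every score difference seen so far, including evicted ones. */
    double cumulative() {
        return cumulative;
    }

    public static void main(String[] args) {
        SlidingWindowSketch w = new SlidingWindowSketch(50); // 50 games, as in the demo
        for (int game = 1; game <= 60; game++) {
            w.add(game % 2 == 0 ? 1 : -1); // made-up alternating results
        }
        System.out.println("full=" + w.isFull() + ", cumulative=" + w.cumulative());
    }
}
```

Keeping the window bounded means each detector call sees a fixed amount of data, which is one way to keep per-tuple latency predictable in a streaming bolt.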
Methods Details: How it’s done 
• Uses R-Storm binding github.com/allenday/R-Storm 
– Storm package on CRAN cran.r-project.org/web/packages/Storm 
[Diagram: Producer: Storm (dev team) -> R (stats team) -> Consumer: Storm (dev team, pure Java)]
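Methods Details: Easy integration 
R: lambda function 
library(Storm) 
storm = Storm$new(); 
storm$lambda = function(s) { 
  t = s$tuple; 
  # Build a one-element output vector and send it downstream. 
  t$output = vector(length=1); t$output[1] = "tada!" 
  s$emit(t) 
} 
storm$run();  # start processing tuples sent by Storm 
Storm: extend ShellBolt 
// A complete bolt must also implement declareOutputFields() and 
// getComponentConfiguration() from IRichBolt. 
public static class MyRBolt extends ShellBolt implements IRichBolt { 
  public MyRBolt() {  // constructor name must match the class name 
    super("Rscript", "my.R");  // launch my.R as the bolt's subprocess 
  } 
}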
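ShellBolt talks to its child process (here Rscript) over standard input and output using Storm's multilang protocol, in which each JSON message is followed by a line containing only `end`. The Java sketch below shows just that framing step, to make the hand-off between the two runtimes easier to picture. It is an illustration under assumptions, not Storm's own implementation: the class name and sample payloads are hypothetical, and the JSON is treated as an opaque string.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "message, then a line reading end" framing.
final class MultilangFramingSketch {

    /** Frames one payload for sending: the payload, then a line containing "end". */
    static String frame(String json) {
        return json + "\nend\n";
    }

    /** Splits received text into complete messages, one per "end" line. */
    static List<String> parse(String wire) {
        List<String> messages = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new StringReader(wire))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.equals("end")) {
                    messages.add(current.toString());
                    current.setLength(0);
                } else {
                    if (current.length() > 0) current.append('\n');
                    current.append(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return messages; // a trailing partial message (no "end" yet) is dropped
    }

    public static void main(String[] args) {
        // Made-up payloads; real multilang messages carry more fields.
        String wire = frame("{\"id\":\"1\",\"tuple\":[\"3\"]}")
                    + frame("{\"id\":\"2\",\"tuple\":[\"-2\"]}");
        for (String message : parse(wire)) {
            System.out.println(message);
        }
    }
}
```

Because the contract between the Java bolt and the R script is only this text stream, the stats team can change the R side without recompiling the topology, which is the separation of concerns the deck argues for.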
Results 
• Change points are identified, but none for winning streak 
– Not using score difference, anyway 
• Time to integrate with the modeling team! 
– Send @kunpognr or @allenday a pull request on GitHub 
• Applicable to many other use cases, e.g. 
– Security (fraud detection, intrusion detection) 
– Marketing (intent to purchase / social media streams) 
– Customer Support (help desk voice calls) 
Discussion
Polling Question #5 
How important will Real-Time analytical apps become? (choose one) 
–Uncertain 
–Not important 
–Necessary 
–Critical
Real-Time and Internet of Things: Foundation of a Compelling Trend for 2015 
Big Data Analytics Meets The Internet of Things 
–Transactions + 
–Human Behavior + 
–Internet of Things: Sensors 
… and extracting value using 
–Traditional Statistics 
–Visualization 
–Machine Learning 
… plus adaptability 
–Real-Time 
–Agile Modeling & Fast Model Execution 
–Production Capable, Stable and Secure 
–Rapidly-Evolving Data Science
Where Does Real Time Impact The Analytical Lifecycle? 
Data Engineering 
–Collection and Ingest 
–“Blending” 
Modeling 
–Aggregation, Segmentation & Exploration 
–Model Development & Optimization 
–Testing & Validation 
Operationalization 
–Deployment & Scoring 
–Delivery 
–Monitoring & Evaluation
Typical Analytical Lifecycle 
Ingest 
Explore 
Model 
Deploy 
Score 
Act 
Measure 
Model 
Score
More Complex Event Driven Analytical Cycle 
Historic Ingest 
Explore 
Model 
Deploy 
Act 
Measure 
Data Analytics & Process Design 
Scoring 
Event Ingest 
Transform 
“Blend” 
Append 
Improve 
Enrich
Real-Time Analytics Best Practices 
Develop a Common Lexicon for Real-Time 
Discriminate Between Needs of Each Stage in Lifecycle 
–Data Ingest & Manipulation and Enrichment 
–Data Source / Repository Integration Needs 
–Processes that “Fill the Lake” 
–Processes that “Act on the Stream” 
–Vastly different computationally 
–Big differences in data ingest volume & latency 
Start with Tractable Goals 
–Anticipate Growing Requirements: Microbatched >> Interactive >> Autonomy 
Build for today, Architect for tomorrow
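As a rough picture of the "microbatched" starting point, the Java sketch below groups incoming events into small fixed-size batches before handing them to a downstream step such as scoring. It is a minimal sketch under assumptions: the class and method names are hypothetical, and batches are triggered by size only, whereas real systems usually also flush on a timer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration of size-triggered micro-batching.
final class MicroBatcherSketch<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private List<T> pending = new ArrayList<>();

    MicroBatcherSketch(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Buffers one event and emits a batch once enough have accumulated. */
    void accept(T event) {
        pending.add(event);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Emits whatever is buffered, e.g. at shutdown or when a timer fires. */
    void flush() {
        if (pending.isEmpty()) return;
        List<T> batch = pending;
        pending = new ArrayList<>(); // hand off the old list, start a fresh one
        sink.accept(batch);
    }

    public static void main(String[] args) {
        MicroBatcherSketch<Integer> batcher =
            new MicroBatcherSketch<>(3, batch -> System.out.println("scoring batch " + batch));
        for (int event = 0; event < 7; event++) {
            batcher.accept(event);
        }
        batcher.flush(); // emit the final partial batch
    }
}
```

Shrinking the batch size toward one moves the same code from microbatch toward per-event (interactive) handling, at the cost of more downstream calls.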
Real-Time Realities 
Plan for Diverse Needs 
–Real-Time Score Retrieval, Scoring, Modeling 
–Wide-Ranging Performance: Microbatch, Interactive, Autonomous 
Fragmentation 
–Data Delivery Systems Pre-Exist 
–Will Vary Widely by Vertical Market 
–Competing Proprietary Solutions 
Growing Demand 
–Numerous high-value targets 
–“The next step”: Put big data analytics to work
What’s Needed 
Real-Time Performance… plus… 
Agility 
–Deployment models 
–Organization 
–Infrastructure 
–Analytics 
Manageable Costs 
–Hadoop 
–Open Source R 
Production Platform(s) 
–Proven 
–Performant
Next Steps… 
www.revolutionanalytics.com 
Whitepaper: Revolution R for Hadoop: 
–http://www.revolutionanalytics.com/whitepaper/delivering-value-big-data-revolution-r-enterprise-and-hadoop 
–…or http://bit.ly/1ua43bu 
www.maprtech.com 
Resources: 
R foundation URL: www.r-project.org 
Download Revolution R: http://mran.revolutionanalytics.com/download/ 
Learn about Apache Storm: https://storm.apache.org/ 
R-Storm bindings: github.com/allenday/R-Storm 
Storm package on CRAN: cran.r-project.org/web/packages/Storm
Thank you. 
www.revolutionanalytics.com 
1.855.GET.REVO 
Twitter: @RevolutionR

More Related Content

Viewers also liked

The Network structure of R packages on CRAN & BioConductor
The Network structure of R packages on CRAN & BioConductorThe Network structure of R packages on CRAN & BioConductor
The Network structure of R packages on CRAN & BioConductorRevolution Analytics
 
The network structure of cran 2015 07-02 final
The network structure of cran 2015 07-02 finalThe network structure of cran 2015 07-02 final
The network structure of cran 2015 07-02 finalRevolution Analytics
 
MYagonism basketball analytics innovation -public-
MYagonism basketball analytics innovation -public-MYagonism basketball analytics innovation -public-
MYagonism basketball analytics innovation -public-Paolo Raineri
 
2016 Sloan Sports Analytics Conference (Sports Business angle)
2016 Sloan Sports Analytics Conference (Sports Business angle)2016 Sloan Sports Analytics Conference (Sports Business angle)
2016 Sloan Sports Analytics Conference (Sports Business angle)Neil Horowitz
 
Reproducibility with Checkpoint & RRO - NYC R Conference
Reproducibility with Checkpoint & RRO - NYC R ConferenceReproducibility with Checkpoint & RRO - NYC R Conference
Reproducibility with Checkpoint & RRO - NYC R ConferenceRevolution Analytics
 
Simple Reproducibility with the checkpoint package
Simple Reproducibilitywith the checkpoint packageSimple Reproducibilitywith the checkpoint package
Simple Reproducibility with the checkpoint packageRevolution Analytics
 
Reproducibility with Revolution R Open and the Checkpoint Package
Reproducibility with Revolution R Open and the Checkpoint PackageReproducibility with Revolution R Open and the Checkpoint Package
Reproducibility with Revolution R Open and the Checkpoint PackageRevolution Analytics
 
In-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and RevolutionIn-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and RevolutionRevolution Analytics
 
Warranty Predictive Analytics solution
Warranty Predictive Analytics solutionWarranty Predictive Analytics solution
Warranty Predictive Analytics solutionRevolution Analytics
 
Performance and Scale Options for R with Hadoop: A comparison of potential ar...
Performance and Scale Options for R with Hadoop: A comparison of potential ar...Performance and Scale Options for R with Hadoop: A comparison of potential ar...
Performance and Scale Options for R with Hadoop: A comparison of potential ar...Revolution Analytics
 
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San JoseR + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San JoseAllen Day, PhD
 
The Value of Open Source Communities
The Value of Open Source CommunitiesThe Value of Open Source Communities
The Value of Open Source CommunitiesRevolution Analytics
 
Data Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and RData Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and RRadek Maciaszek
 
Building a scalable data science platform with R
Building a scalable data science platform with RBuilding a scalable data science platform with R
Building a scalable data science platform with RRevolution Analytics
 
Predicting Football Using R
Predicting Football Using RPredicting Football Using R
Predicting Football Using RMartin Eastwood
 

Viewers also liked (20)

The Network structure of R packages on CRAN & BioConductor
The Network structure of R packages on CRAN & BioConductorThe Network structure of R packages on CRAN & BioConductor
The Network structure of R packages on CRAN & BioConductor
 
The network structure of cran 2015 07-02 final
The network structure of cran 2015 07-02 finalThe network structure of cran 2015 07-02 final
The network structure of cran 2015 07-02 final
 
MYagonism basketball analytics innovation -public-
MYagonism basketball analytics innovation -public-MYagonism basketball analytics innovation -public-
MYagonism basketball analytics innovation -public-
 
Sports Analytics 2015 Brochure
Sports Analytics 2015 BrochureSports Analytics 2015 Brochure
Sports Analytics 2015 Brochure
 
2016 Sloan Sports Analytics Conference (Sports Business angle)
2016 Sloan Sports Analytics Conference (Sports Business angle)2016 Sloan Sports Analytics Conference (Sports Business angle)
2016 Sloan Sports Analytics Conference (Sports Business angle)
 
Analytics - Sports Style, ESPN
Analytics - Sports Style, ESPNAnalytics - Sports Style, ESPN
Analytics - Sports Style, ESPN
 
Reproducibility with Checkpoint & RRO - NYC R Conference
Reproducibility with Checkpoint & RRO - NYC R ConferenceReproducibility with Checkpoint & RRO - NYC R Conference
Reproducibility with Checkpoint & RRO - NYC R Conference
 
Simple Reproducibility with the checkpoint package
Simple Reproducibilitywith the checkpoint packageSimple Reproducibilitywith the checkpoint package
Simple Reproducibility with the checkpoint package
 
Reproducibility with Revolution R Open and the Checkpoint Package
Reproducibility with Revolution R Open and the Checkpoint PackageReproducibility with Revolution R Open and the Checkpoint Package
Reproducibility with Revolution R Open and the Checkpoint Package
 
R at Microsoft (useR! 2016)
R at Microsoft (useR! 2016)R at Microsoft (useR! 2016)
R at Microsoft (useR! 2016)
 
In-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and RevolutionIn-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and Revolution
 
The R Ecosystem
The R EcosystemThe R Ecosystem
The R Ecosystem
 
Warranty Predictive Analytics solution
Warranty Predictive Analytics solutionWarranty Predictive Analytics solution
Warranty Predictive Analytics solution
 
Performance and Scale Options for R with Hadoop: A comparison of potential ar...
Performance and Scale Options for R with Hadoop: A comparison of potential ar...Performance and Scale Options for R with Hadoop: A comparison of potential ar...
Performance and Scale Options for R with Hadoop: A comparison of potential ar...
 
R at Microsoft
R at MicrosoftR at Microsoft
R at Microsoft
 
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San JoseR + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose
 
The Value of Open Source Communities
The Value of Open Source CommunitiesThe Value of Open Source Communities
The Value of Open Source Communities
 
Data Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and RData Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and R
 
Building a scalable data science platform with R
Building a scalable data science platform with RBuilding a scalable data science platform with R
Building a scalable data science platform with R
 
Predicting Football Using R
Predicting Football Using RPredicting Football Using R
Predicting Football Using R
 

Similar to Batter Up! Advanced Sports Analytics with R and Storm

R and Big Data using Revolution R Enterprise with Hadoop
R and Big Data using Revolution R Enterprise with HadoopR and Big Data using Revolution R Enterprise with Hadoop
R and Big Data using Revolution R Enterprise with HadoopRevolution Analytics
 
Big Data Analytics with R
Big Data Analytics with RBig Data Analytics with R
Big Data Analytics with RGreat Wide Open
 
Are You Ready for Big Data Big Analytics?
Are You Ready for Big Data Big Analytics? Are You Ready for Big Data Big Analytics?
Are You Ready for Big Data Big Analytics? Revolution Analytics
 
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...Revolution Analytics
 
18Mar14 Find the Hidden Signal in Market Data Noise Webinar
18Mar14 Find the Hidden Signal in Market Data Noise Webinar 18Mar14 Find the Hidden Signal in Market Data Noise Webinar
18Mar14 Find the Hidden Signal in Market Data Noise Webinar Revolution Analytics
 
Robert Luong: Analyse prédictive dans Excel
Robert Luong: Analyse prédictive dans ExcelRobert Luong: Analyse prédictive dans Excel
Robert Luong: Analyse prédictive dans ExcelMSDEVMTL
 
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big DataInfochimps, a CSC Big Data Business
 
MapR and Lucidworks Joint Webinar 2012
MapR and Lucidworks Joint Webinar 2012MapR and Lucidworks Joint Webinar 2012
MapR and Lucidworks Joint Webinar 2012MapR Technologies
 
Kristof Coussement - The Debate: the Future of (Big) Data Analytics Software
Kristof Coussement - The Debate: the Future of (Big) Data Analytics SoftwareKristof Coussement - The Debate: the Future of (Big) Data Analytics Software
Kristof Coussement - The Debate: the Future of (Big) Data Analytics SoftwareBAQMaR
 
Creating Value That Scales with Revolution Analytics & Alteryx
Creating Value That Scales with Revolution Analytics & AlteryxCreating Value That Scales with Revolution Analytics & Alteryx
Creating Value That Scales with Revolution Analytics & AlteryxRevolution Analytics
 
Hadoop Summit EU - Crowd Sourcing Reflected Intelligence
Hadoop Summit EU - Crowd Sourcing Reflected IntelligenceHadoop Summit EU - Crowd Sourcing Reflected Intelligence
Hadoop Summit EU - Crowd Sourcing Reflected IntelligenceMapR Technologies
 
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...Revolution Analytics
 
Dba to data scientist -Satyendra
Dba to data scientist -SatyendraDba to data scientist -Satyendra
Dba to data scientist -Satyendrapasalapudi123
 
Revolution R Enterprise - Portland R User Group, November 2013
Revolution R Enterprise - Portland R User Group, November 2013Revolution R Enterprise - Portland R User Group, November 2013
Revolution R Enterprise - Portland R User Group, November 2013Revolution Analytics
 
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...Hortonworks
 
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014Fast and Furious: From POC to an Enterprise Big Data Stack in 2014
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014MapR Technologies
 
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1Hortonworks
 
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata Hortonworks
 
Big Data in Action – Real-World Solution Showcase
 Big Data in Action – Real-World Solution Showcase Big Data in Action – Real-World Solution Showcase
Big Data in Action – Real-World Solution ShowcaseInside Analysis
 

Similar to Batter Up! Advanced Sports Analytics with R and Storm (20)

R and Big Data using Revolution R Enterprise with Hadoop
R and Big Data using Revolution R Enterprise with HadoopR and Big Data using Revolution R Enterprise with Hadoop
R and Big Data using Revolution R Enterprise with Hadoop
 
Big Data Analytics with R
Big Data Analytics with RBig Data Analytics with R
Big Data Analytics with R
 
Are You Ready for Big Data Big Analytics?
Are You Ready for Big Data Big Analytics? Are You Ready for Big Data Big Analytics?
Are You Ready for Big Data Big Analytics?
 
Revolution Analytics Podcast
Revolution Analytics PodcastRevolution Analytics Podcast
Revolution Analytics Podcast
 
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
 
18Mar14 Find the Hidden Signal in Market Data Noise Webinar
18Mar14 Find the Hidden Signal in Market Data Noise Webinar 18Mar14 Find the Hidden Signal in Market Data Noise Webinar
18Mar14 Find the Hidden Signal in Market Data Noise Webinar
 
Robert Luong: Analyse prédictive dans Excel
Robert Luong: Analyse prédictive dans ExcelRobert Luong: Analyse prédictive dans Excel
Robert Luong: Analyse prédictive dans Excel
 
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
 
MapR and Lucidworks Joint Webinar 2012
MapR and Lucidworks Joint Webinar 2012MapR and Lucidworks Joint Webinar 2012
MapR and Lucidworks Joint Webinar 2012
 
Kristof Coussement - The Debate: the Future of (Big) Data Analytics Software
Kristof Coussement - The Debate: the Future of (Big) Data Analytics SoftwareKristof Coussement - The Debate: the Future of (Big) Data Analytics Software
Kristof Coussement - The Debate: the Future of (Big) Data Analytics Software
 
Creating Value That Scales with Revolution Analytics & Alteryx
Creating Value That Scales with Revolution Analytics & AlteryxCreating Value That Scales with Revolution Analytics & Alteryx
Creating Value That Scales with Revolution Analytics & Alteryx
 
Batter Up! Advanced Sports Analytics with R and Storm

  • 1. Batter Up! Advanced Sports Analytics with R and Storm Meeting the Real-Time Analytics Opportunity Head-On Bill Jacobs VP Product Marketing Revolution Analytics @bill_jacobs December 11, 2014 Allen Day Principal Data Scientist MapR Technologies @allenday Vineet Sharma Dir., Partner Marketing MapR Technologies
  • 2. Who Am I? Bill Jacobs, VP Product Marketing Revolution Analytics @bill_jacobs
  • 3. Polling Question #1: Who Are You? (choose one) –Statistician or modeler –Data Scientist –Hadoop Expert –Application builder –Data guru –Business user –Baseball fan
  • 4. Sports Analytics as Analogy. Sports Teams Are Like Other Corporations. –Great Value Achievable With Data –Vast Range of Data Sources –Timely Analysis Amplifies Value And apologies if you came to learn whom to bet upon in next year’s season.
  • 5. Game Changing Big Data Analytics Applications Marketing: Clickstream & Campaign Analyses Digital Media: Recommendation Engines Social Media: Sentiment Analysis Retail: Purchase Prediction Insurance: Fraud Waste and Abuse Healthcare Delivery: Treatment Outcome Prediction Risk Analysis: Insurance Underwriting Manufacturing: Predictive Maintenance Operations: Supply Chain Optimization Econometrics: Market Prediction Marketing: Mix and Price Optimization Life Sciences: Pharmacogenetics Transportation: Asset Utilization
  • 6. Polling Question #2: What Languages or Tools Are In Use for Analytics? (check all that apply) –R –SAS or SPSS –Python –Java –BI tools including: MSTR, Qlik, Tableau, Business Objects, Cognos –Salford Systems or MATLAB –H2O, RapidMiner, KNIME or similar –Other data mining tools –Other programming languages –None or Don’t know
  • 7. WELCOME & INTRODUCTIONS R Open Source -Language, Community, Collaboration -Robert Gentleman & Ross Ihaka, 1993 -Version 1.0 released 2000 -2.5 Million Global Users -Over 4,800 add-on “Packages” -Why R? R in Universities = New Talent Emerging Modeling/Visualization Lower Cost Alternative Open Source = Flexible & Innovative Access to Free Packages
  • 8. R is exploding in popularity & functionality R Usage Growth Rexer Data Miner Survey, 2007-2013 70% of data miners report using R R is the first choice of more data miners than any other software Source: www.rexeranalytics.com
  • 9. Innovate with R Most widely used data analysis software •Used by 2M+ data scientists, statisticians and analysts Most powerful statistical programming language •Flexible, extensible and comprehensive for productivity Create beautiful and unique data visualizations •As seen in New York Times, Twitter and Flowing Data Thriving open-source community •Leading edge of analytics research Fills the talent gap •New graduates prefer R White Paper R is Hot bit.ly/r-is-hot
  • 10. Polling Question #3: How are you using R today? (choose one) –Not using R –Studying R now –Initial R project(s) underway –R is widely used for exploration & modeling –R is deployed into production
  • 11. Revolution Analytics In A Nutshell Our Vision: R is becoming the de facto standard for enterprise predictive analytics Our Mission: Drive enterprise adoption of R by providing enhanced R products tailored to meet enterprise challenges
  • 12. Revolution Analytics Builds & Delivers: Software Products: Stable Distributions Broad Platform Support Big Data Analytics in R Application Integration Deployment Platforms Agile Development Tooling Future Platform Support Support & Services Commercial Support Programs Training Programs Professional Services Academic Support Programs IP Indemnification
  • 13. Revolution R Advantages for Analytics Professionals: Broadly-used, scalable R language Large (2M+), collaborative, young R analytics community Largest repository of statistical & analytical algorithms Big data analytics capabilities –Scales from workstations to Hadoop –Transparent parallelism –Cross platform compatibility –Multi-platform architectures Broadens career opportunities
  • 14. Revolution R Advantages for Business Executives Viable Alternative to Legacy Analytics Solutions –Predictable Time To Results –Simplified Licensing –All-Inclusive Environment Lower Staffing Costs Controllable Open Source Risks –Support –IP Infringement Protections
  • 15. Revolution R Advantages for IT Organizations Consistency Across Platforms Avoids Sprawl Support for Workstations, Servers, Hadoop, EDWs and Grids Heterogeneous Architecture Capabilities Integrates With Major BI & Application Tools Streamline Model Deployment Run Complex Analytics in the “Data Lake” Be a “Good Citizen” in shared systems Commercial Support Reduces Project Risks Quick Start Programs Accelerate Results Platform Continuity Future-Proofs Architectures
  • 16. YARN Revolution R Enterprise: Predictive Analytics Across Huge Data in Hadoop Exploration Visualization Predictive Modeling HDFS
  • 17. Polling Question #4: Stage of Hadoop Adoption? (choose one) –No Need –Studying –Setting-Up Hadoop –Experimenting with Hadoop –Deploying Hadoop Now –Hadoop in Production
  • 18. © 2014 MapR Technologies 18 Introducing: Vineet Sharma Director, Partner Marketing MapR Technologies
  • 19. MapR + Revolution Leverages MapR As A Scalable Enterprise R Engine. • Plus: – Run RRE Analytics In MapR Hadoop Without Change – Eliminate Need To Design Parallel Software or “Think In MapReduce” – Leverage All Revolution R Enterprise Pre-Parallelized Algorithms – Enable Users To Build Custom Apps That Leverage Hadoop’s Parallelism – Slash Data Movement by Analyzing Data Inside the MapR Data Platform – Expand Deployment and Integration Options Rapid Adoption of R MapR Enterprise Data Platform Capabilities Broad Adoption of Hadoop for Big Data Analytics
  • 20. Predictive Modeling Algorithms MapR FS Data Desktop Users with Analytical Access to Huge Data in Hadoop
  • 21. MapR: Best Solution for Customer Success Top Ranked Exponential Growth 500+ Customers Premier Investors >2x annual bookings 80% of accounts expand 3X 90% software licenses < 1% lifetime churn > $1B in incremental revenue generated by 1 customer
  • 22. Management MapR Data Platform APACHE HADOOP AND OSS ECOSYSTEM Security YARN Pig Cascading Spark Batch Spark Streaming Storm* Streaming HBase Solr NoSQL & Search Juju Provisioning & coordination Savannah* Mahout MLLib ML, Graph GraphX MapReduce v1 & v2 EXECUTION ENGINES DATA GOVERNANCE AND OPERATIONS Workflow & Data Governance Tez* Accumulo* Hive Impala Shark Drill SQL Sqoop Sentry* Oozie ZooKeeper Flume Knox* Falcon* Whirr Data Integration & Access HttpFS Hue MapR-FS MapR-DB * Certification/support planned The Power of the Open Source Community
  • 23. MapR Distribution for Hadoop Management MapR Data Platform APACHE HADOOP AND OSS ECOSYSTEM Security YARN Pig Cascading Spark Batch Spark Streaming Storm* Streaming HBase Solr NoSQL & Search Juju Provisioning & coordination Savannah* Mahout MLLib ML, Graph GraphX MapReduce v1 & v2 EXECUTION ENGINES DATA GOVERNANCE AND OPERATIONS Workflow & Data Governance Tez* Accumulo* Hive Impala Shark Drill SQL Sqoop Sentry* Oozie ZooKeeper Flume Knox* Falcon* Whirr Data Integration & Access HttpFS Hue Enterprise-grade Interoperability Performance Multi-tenancy Security Operational MapR-FS MapR-DB • Standard file access • Standard database access • Pluggable services • Broad developer support • Enterprise security authorization • Wire-level authentication • Data governance • Ability to support predictive analytics, real-time database operations, and support high arrival rate data • Ability to logically divide a cluster to support different use cases, job types, user groups, and administrators • 2X to 7X higher performance • Consistent, low latency • High availability • Data protection • Disaster recovery * Certification/support planned
  • 24. Management APACHE HADOOP & OSS ECOSYSTEM ZooKeeper Oozie Hue Pig Hive Impala Shark Flume HttpFS Cascading Solr Juju Mahout MLLib Storm Spark Streaming Sqoop Whirr HBase YARN Drill Tez Knox Sentry Spark Falcon Revolution R Enterprise and MapR Hadoop Edge Node BI and other Apps R on the Desktop Via Browsers Mobile YARN MapReduce MapR Data Platform
  • 25. Introducing: Allen Day Principal Data Scientist MapR Technologies @allenday
  • 26. Talk Overview • Agile Real-time Stats • R + Storm github.com/allenday/R-Storm • DEMO • How to do it? • Q & A @allenday Agile Methods Advanced Statistics Continuous Real-time Delivery github.com/allenday/hadoop-summit-r-storm-demo-public
  • 27. Architecting R into the Storm Application Development Process
  • 28. Quick intro • Allen Day, Principal Data Scientist [ @allenday ] 7yr Hadoop dev, 12yr R dev/author PhD, Human Genetics, UCLA Medicine
  • 29. What’s Storm? What’s R? • What’s Storm? – Processes a data stream. Akin to UNIX pipe + tee & merge commands – Runs on a cluster. Fault-tolerant and designed to scale out – Used for: real-time analytics & machine learning • What’s R? – Programming language with advanced statistics libraries – Does not scale out. Can scale up – Used for: prototyping, data modeling, visualization How to combine these?
  • 30. R outside, Storm inside: not practical. Why? • Model-building and QA is done on data snapshots • However, R => Hadoop is realistic. Key difference: referenced data can be static – Use MapR snapshots for dev and QA – See also: RHIPE (Purdue) and RHadoop (RevolutionAnalytics) R Storm User
  • 31. Storm outside, R inside: a good fit • Enables separation of concerns – Independently manage modeling, ops timelines, and version control – Integrate as needed • Enables role specialization – R built-ins allow faster iteration and more concise stats-type code – Do DevOps with specific SW engineering tech, e.g. Java Storm R User
  • 32. Q: Who really likes statistics? A: Baseball fans A: Team Managers = Portfolio Managers
  • 33. Famous Vintage Data Oakland Athletics 2002 Season 20 consecutive wins, the American League record at the time
  • 34. Goal: Detect “Moneyball” 2002 Winning Streak
  • 35. Methods: Change Point Detection Find natural breakpoints in a time-series set of data points R packages implement this: changepoint: more sensitive, but not streaming bcp: streaming, but less sensitive
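
To make the idea of a "natural breakpoint" on slide 35 concrete, here is a minimal Java sketch. It is not the algorithm used by the changepoint or bcp packages; it is a simplified stand-in that finds a single shift in the mean by trying every split point and keeping the one with the smallest total squared deviation. The class name, method names, and sample values are hypothetical.

```java
public class MeanShiftSketch {
    /**
     * Returns the index k (1 <= k < n) that best splits the series into
     * x[0..k) and x[k..n), minimizing the summed squared deviation of each
     * segment from its own mean. Returns -1 for fewer than 2 points.
     * Assumption: at most one change point; real packages handle many.
     */
    public static int bestSplit(double[] x) {
        int n = x.length;
        if (n < 2) return -1;
        // Prefix sums of values and squared values give O(1) segment costs.
        double[] sum = new double[n + 1];
        double[] sumSq = new double[n + 1];
        for (int i = 0; i < n; i++) {
            sum[i + 1] = sum[i] + x[i];
            sumSq[i + 1] = sumSq[i] + x[i] * x[i];
        }
        int best = -1;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int k = 1; k < n; k++) {
            double cost = segmentCost(sum, sumSq, 0, k) + segmentCost(sum, sumSq, k, n);
            if (cost < bestCost) {
                bestCost = cost;
                best = k;
            }
        }
        return best;
    }

    // Sum of squared deviations from the segment mean over x[from..to).
    private static double segmentCost(double[] sum, double[] sumSq, int from, int to) {
        double s = sum[to] - sum[from];
        double sq = sumSq[to] - sumSq[from];
        return sq - s * s / (to - from);
    }

    public static void main(String[] args) {
        // Hypothetical score differences: a losing stretch, then a winning one.
        double[] diffs = {-2, -1, -3, -2, -1, 4, 3, 5, 4, 3};
        System.out.println("change point at index " + bestSplit(diffs)); // index 5
    }
}
```

A split like this needs the whole series up front, which is the "not streaming" trade-off the slide mentions; a streaming detector has to decide with only the data seen so far.
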
  • 36. GIFs to MapR Filesystem Methods: R+Storm Demo Architecture Storm Bolt R online change point detector Storm Bolt (write to Jetty) Oakland A’s Data (accelerated) Jetty Webserver Browser (D3.js) Us github.com/allenday/hadoop-summit-r-storm-demo-public
  • 37. 50-game sliding window/buffer to detect change points Cumulative history with detected break points Raw data (score difference between A’s and opponent) Demo
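
The sliding window/buffer on slide 37 can be sketched in Java as a bounded queue that drops the oldest game whenever a new one arrives. This is an illustration only, not code from the demo repository; the class and method names are hypothetical, and the window size is a constructor argument (the demo uses 50 games).

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class SlidingWindowSketch {
    private final int capacity;
    private final Deque<Double> window = new ArrayDeque<>();
    private double sum = 0.0;

    public SlidingWindowSketch(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
    }

    /** Adds a new observation, evicting the oldest once the window is full. */
    public void add(double value) {
        window.addLast(value);
        sum += value;
        if (window.size() > capacity) {
            sum -= window.removeFirst();
        }
    }

    public boolean isFull() {
        return window.size() == capacity;
    }

    /** Mean of the values currently in the window (0.0 when empty). */
    public double mean() {
        return window.isEmpty() ? 0.0 : sum / window.size();
    }

    /** Copies the window, oldest first, e.g. to hand to a change point detector. */
    public double[] snapshot() {
        double[] out = new double[window.size()];
        int i = 0;
        for (double v : window) {
            out[i++] = v;
        }
        return out;
    }

    public static void main(String[] args) {
        SlidingWindowSketch w = new SlidingWindowSketch(3); // small size for illustration
        for (double d : new double[] {1, -2, 4, 3}) {
            w.add(d);
        }
        System.out.println(Arrays.toString(w.snapshot())); // [-2.0, 4.0, 3.0]
    }
}
```

In a bolt, each incoming tuple would call `add`, and the detector would run on `snapshot()` once `isFull()` is true.
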
  • 38. Methods Details: How it’s done • Uses R-Storm binding github.com/allenday/R-Storm – Storm package on CRAN cran.r-project.org/web/packages/Storm Storm (dev team) R (stats team) Storm (dev team, pure Java) Producer Consumer
  • 39. Methods Details: Easy integration R: lambda function storm = Storm$new(); storm$lambda = function(s) { t = s$tuple; t$output = vector(length=1); t$output[1] = "tada!"; s$emit(t); } Storm: extend ShellBolt public static class MyRBolt extends ShellBolt implements IRichBolt { public MyRBolt() { super("Rscript", "my.R"); } }
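
For context on what the ShellBolt on slide 39 does underneath: Storm's multi-language protocol exchanges JSON messages with the child process over stdin/stdout, each message followed by a line containing only `end`. The Java sketch below builds the kind of emit message the R script would write back. It is a rough illustration assuming that documented framing; the class and method names are hypothetical, it handles string values only, and a real component would also send acks and use a proper JSON library.

```java
import java.util.Arrays;
import java.util.List;

public class MultilangEmitSketch {
    /** Wraps a value in JSON quotes, escaping backslashes and double quotes only. */
    static String jsonString(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds an emit message: one JSON object, then a line containing only "end". */
    public static String emitMessage(List<String> tupleValues) {
        StringBuilder sb = new StringBuilder("{\"command\":\"emit\",\"tuple\":[");
        for (int i = 0; i < tupleValues.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(jsonString(tupleValues.get(i)));
        }
        return sb.append("]}\nend\n").toString();
    }

    public static void main(String[] args) {
        // Mirrors the slide's R lambda, which emits a single "tada!" value.
        System.out.print(emitMessage(Arrays.asList("tada!")));
    }
}
```

The R-Storm binding hides this framing behind `s$emit(t)`, which is why the R side on the slide stays a few lines long.
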
  • 40. Results • Change points are identified, but none for winning streak – Not using score difference, anyway • Time to integrate with the modeling team! – Send @kunpognr or @allenday a pull request on GitHub • Applicable to many other use cases, e.g. – Security (fraud detection, intrusion detection) – Marketing (intent to purchase / social media streams) – Customer Support (help desk voice calls) Discussion
  • 41. Polling Question #5 How important will Real-Time analytical apps become? (choose one) –Uncertain –Not important –Necessary –Critical
  • 42. Real-Time and Internet of Things: Foundation of a Compelling Trend for 2015 Big Data Analytics Meets The Internet of Things –Transactions + –Human Behavior + –Internet of Things: Sensors … and extracting value using –Traditional Statistics –Visualization –Machine Learning … plus adaptability –Real-Time –Agile Modeling & Fast Model Execution –Production Capable, Stable and Secure –Rapidly-Evolving Data Science
  • 43. Where Does Real Time Impact The Analytical Lifecycle? Data Engineering –Collection and Ingest –“Blending” Modeling –Aggregation, Segmentation & Exploration –Model Development & Optimization –Testing & Validation Operationalization –Deployment & Scoring –Delivery –Monitoring & Evaluation
  • 44. Typical Analytical Lifecycle Ingest Explore Model Deploy Score Act Measure Model Score
  • 45. More Complex Event Driven Analytical Cycle Historic Ingest Explore Model Deploy Act Measure Data Analytics & Process Design Scoring Event Ingest Trans- Form “Blend” Append Improve Enrich
  • 46. Real-Time Analytics Best Practices Develop a Common Lexicon for Real-Time Discriminate Between Needs of Each Stage in Lifecycle –Data Ingest & Manipulation and Enrichment –Data Source / Repository Integration Needs –Processes that “Fill the Lake” –Process that “Act on the Stream” –Vastly different computationally –Big differences in data ingest volume & latency Start with Tractable Goals –Anticipate Growing Requirements: Microbatched >> Interactive >> Autonomy Build for today, Architect for tomorrow
  • 47. Real-Time Realities Plan for Diverse Needs –Real-Time Score Retrieval, Scoring, Modeling –Wide-Ranging Performance – Microbatch – Interactive - Autonomous Fragmentation –Data Delivery Systems Pre-Exist –Will Vary Widely by Vertical Market –Competing Proprietary Solutions Growing Demand –Numerous high-value targets –“The next step”: Put big data analytics to work
  • 48. What’s Needed Real-Time Performance… plus… Agility –Deployment models –Organization –Infrastructure –Analytics Manageable Costs –Hadoop –Open Source R Production Platform(s) –Proven –Performant
  • 49. Next Steps… www.revolutionanalytics.com Whitepaper: Revolution R for Hadoop: http://www.revolutionanalytics.com/whitepaper/delivering-value-big-data-revolution-r-enterprise-and-hadoop …or http://bit.ly/1ua43bu www.maprtech.com Resources: R foundation URL: www.r-project.org Download Revolution R: http://mran.revolutionanalytics.com/download/ Learn about Apache Storm: https://storm.apache.org/ R-Storm bindings: github.com/allenday/R-Storm Storm package on CRAN: cran.r-project.org/web/packages/Storm
  • 50. Thank you. www.revolutionanalytics.com 1.855.GET.REVO Twitter: @RevolutionR