
The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics

Hortonworks and Revolution Analytics have teamed up to bring the predictive analytics power of R to Hortonworks Data Platform.

Hadoop is a disruptive data processing framework that has had a large impact on today's data ecosystems. Letting business users carry their existing skills over to Hadoop encourages adoption and helps businesses get value from their Hadoop investment quickly. R, a widely used and rapidly growing data analysis language, now has a place in the Hadoop ecosystem.

This presentation covers:
- Trends and business drivers for Hadoop
- How Hortonworks and Revolution Analytics play a role in the modern data architecture
- How you can run R natively in Hortonworks Data Platform to simply move your R-powered analytics to Hadoop

Presentation replay at:
http://www.revolutionanalytics.com/news-events/free-webinars/2013/modern-data-architecture-revolution-hortonworks/

Published in: Technology
  • Remember that CRAN is a new term to IT professionals and to anyone who hasn’t learned much about R, so spend some time on it. The acronym stands for Comprehensive R Archive Network: a single repository of R algorithms, test data, evaluations. Used by nearly all R programmers.
  • Transcript of "The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics"

    1. © Hortonworks Inc. 2013. Modern Data Architecture for Predictive Analytics. David Smith, VP Marketing and Community, Revolution Analytics; John Kreisa, VP Strategic Marketing, Hortonworks.
    2. Your Presenters. David Smith (@revodavid): VP Marketing and Community at Revolution Analytics; data scientist, blogger and co-author of An Introduction to R. John Kreisa (@marked_man): VP Strategic Marketing, Hortonworks; over 20 years in data management as a developer and a marketer; avid camper.
    3. Today’s Topics: Introduction; Drivers for the Modern Data Architecture (MDA); Apache Hadoop in the MDA; R’s role in the MDA; Q&A.
    4. Poll #1: What stage are you at in looking into Hadoop? Research; Evaluation; Trial; Haven’t started research.
    5. Existing Data Architecture (diagram). Applications: Business Analytics, Custom Applications, Packaged Applications. Data system: Repositories (RDBMS, EDW, MPP). Sources: Existing Sources (CRM, ERP, Clickstream, Logs). Also shown: Operational Tools (manage & monitor); Dev & Data Tools (build & test).
    6. Existing Data Architecture (diagram). Applications: Business Analytics, Custom Applications, Packaged Applications. Data system: Repositories (RDBMS, EDW, MPP). Sources: Existing Sources (CRM, ERP, Clickstream, Logs). Source: IDC. 2.8 ZB in 2012; 85% from new data types; 15x machine data by 2020; 40 ZB by 2020.
    7. Modern Data Architecture Enabled (diagram). Applications: Business Analytics, Custom Applications, Packaged Applications. Data system: Repositories (RDBMS, EDW, MPP). Sources: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured). Also shown: Operational Tools (manage & monitor); Dev & Data Tools (build & test).
    8. Hadoop Powers Modern Data Architecture. Apache Hadoop is an open source project governed by the Apache Software Foundation (ASF) that allows you to gain insight from massive amounts of structured and unstructured data quickly and without significant investment. Diagram: a Hadoop cluster made up of compute & storage nodes. Hadoop clusters provide scale-out storage and distributed data processing on commodity hardware.
    9. Drivers for Hadoop Adoption. Driving Efficiency (Modern Data Architecture): Hadoop has a central role in next generation data architectures while integrating with existing data systems. Driving Opportunity (Business Applications): Use Hadoop to extract insights that enable new customer value and competitive edge. Big Data Sets: Existing (Traditional, Server log, Clickstream); Emerging (Sentiment/Social, Machine/Sensor, Geo-locations).
    10. Opportunity in types of data. (1) Sentiment: understand how your customers feel about your brand and products, right now. (2) Clickstream: capture and analyze website visitors’ data trails and optimize your website. (3) Sensor/Machine: discover patterns in data streaming automatically from remote sensors and machines. (4) Geographic: analyze location-based data to manage operations where they occur. (5) Server Logs: research logs to diagnose process failures and prevent security breaches. (6) Unstructured (text, video, pictures, etc.): understand patterns in files across millions of web pages, emails, and documents.
    11. Efficiency in the Modern Data Architecture (diagram). Applications: Business Analytics, Custom Applications, Packaged Applications. Data system: Repositories (RDBMS, EDW, MPP). Sources: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured). Points: drive efficiency via modern data architecture; store data once and access it in many ways; often referred to as a data lake or data repository; infrastructure platform driven; IT-oriented, TCO based.
    12. Engineered for Interoperability (diagram). Layers: Applications, Data System, Sources, Infrastructure. Elements shown: RDBMS, EDW, MPP; HANA; BusinessObjects BI; Operational Tools; Dev & Data Tools; Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured).
    13. Requirements for Hadoop’s Role in the Modern Data Architecture (Requirements for Hadoop Adoption). Integrated: interoperable with existing data center investments. Skills: leverage your existing skills in development, operations, and analytics. Key Services: platform, operational and data services essential for the enterprise.
    14. Revolution R Enterprise Architecture (diagram). Applications: Business Analytics, Custom Applications, Packaged Applications. Data system: Repositories (RDBMS, EDW, MPP). Sources: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured). Also shown: Operational Tools (manage & monitor); Dev & Data Tools (build & test). A legend marks where Revolution R Enterprise fits.
    15. Today’s Topics: Introduction; Drivers for the Modern Data Architecture (MDA); Apache Hadoop’s role in the MDA; R’s role in the MDA; Q&A.
    16. Poll #2: Which of the following best describes your use of R and Hadoop? We have R + Hadoop in production; We are testing R + Hadoop; We have started to investigate but nothing is implemented; No current plans.
    17. Revolution Confidential. What is the Open Source R Project? The R Language: object-oriented language for stats, math and data science; comprehensive data visualization and statistical modeling capabilities. The R Community: 2M+ users with the skill to tackle big data statistical and numerical analysis and machine learning projects; new graduates with data skills learn R. The R Ecosystem: 5000+ freely available algorithms in CRAN; specialized methods for finance, economics, genomics, linguistics, and every data-driven domain.
    18. R is open source and drives analytic innovation but has some limitations for enterprises. Bigger data sizes: memory bound vs. big data. Speed of analysis: single threaded vs. scale out, parallel processing, high speed. Production support: community support vs. commercial production support. Innovation and scale: innovative, 5000+ packages, exponential growth; combines with open source R packages where needed.
    19. Revolution R Enterprise is the only commercial big data analytics platform based on the open source R statistical computing language. Labels on the slide: Enterprise-Ready; Cross-Platform; Big Data Analytics; High Performance Analytics; Easier Build & Deploy.
    20. Modern Data Architecture: Extract and Analyze. Ad-hoc data distillation; exploratory data analysis / data visualization; model development. Diagram components: Ambari; MapReduce; YARN; HDFS; REST; Data Refinement (Hive, Pig, Custom); HTTP; Stream; Load (Sqoop, Flume, WebHDFS, NFS); Structure: HCatalog (metadata services); Query/Visualization/Reporting/Analytical Tools and Apps; Source Data (Sensor Logs, Clickstream, Flat Files, Unstructured, Sentiment, Customer, Inventory); Data Sources (DBs, JMS Queues, Files, CSV, Databases); Load (Sqoop/Hive, WebHDFS); Interactive: Hive Server2; Analytical Tools; Analytical: rHadoop.
    21. The Data Scientist’s Big Data Toolkit: Statistical Tests; Machine Learning; Simulation; Descriptive Statistics; Data Visualization; R Data Step; Predictive Models; Sampling.
    22. Parallel External-Memory Algorithms (diagram): three CPUs inside an SMP server.
    23. Parallel External-Memory Algorithms (diagram): three Hadoop nodes inside a Hadoop cluster.
    24. Modern Data Architecture with RRE7: In-Hadoop Predictive Analytics. Production data distillation (e.g. Semantic Analysis); production model processing / re-estimation; production model scoring. Diagram components: Ambari; MapReduce; YARN; HDFS; REST; Data Refinement (Hive, Pig, Custom); Distilled Data Files; HTTP; Stream; Load (Sqoop, Flume, WebHDFS, NFS); Structure: HCatalog (metadata services); Query/Visualization/Reporting/Analytical Tools and Apps; Source Data (Sensor Logs, Clickstream, Flat Files, Unstructured, Sentiment, Customer, Inventory); Data Sources (DBs, JMS Queues, Files, CSV, Databases); Load (Sqoop/Hive, WebHDFS); Interactive: Hive Server2; Analytical Tools; Analytical: Revolution R Enterprise.
    25. Hadoop As An R Engine. Use Revolution R Enterprise PEMAs in Hadoop; no need to change existing R code; simple R programming; no need to “think in MapReduce”; eliminate data movement to slash latencies; use Hadoop nodes as parallel R computation engines.
    26. Requirements for Hadoop’s Role in the Modern Data Architecture (Requirements for Hadoop Adoption). Integrated: interoperable with existing data center investments. Skills: leverage your existing skills in development, operations, and analytics. Key Services: platform, operational and data services essential for the enterprise.
    27. Poll #3: Which of the following would you most like to accomplish with R + Hadoop? Build a model to be put in production in Hadoop; Build a model to be put in production elsewhere; Create new data from Hadoop to supplement an existing analytics process; Something else.
    28. Next Steps. More about Revolution Analytics and Hadoop: http://www.revolutionanalytics.com/products/r-for-hadoop.php. Get started on Hadoop with Hortonworks Sandbox: http://hortonworks.com/sandbox. Follow us: @hortonworks @RevolutionR.
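
Slides 22, 23 and 25 name parallel external-memory algorithms (PEMAs) without showing one. As a rough illustration of the general idea only, and not of how Revolution R Enterprise implements it, the sketch below fits a straight line by reducing each chunk of data to a few running sums and then combining those sums, so the full data set never has to sit in memory at once. It is written in Python rather than R, and every name in it (`chunk_stats`, `combine`, `fit_line`, `iter_chunks`) is hypothetical.

```python
from functools import reduce


def chunk_stats(chunk):
    """Reduce one chunk of (x, y) pairs to five running sums."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in chunk:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)


def combine(a, b):
    """Merge the sums from two chunks; the order of merging does not matter."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(stats):
    """Solve for the intercept and slope of y = a + b*x from the combined sums."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


def iter_chunks(rows, size):
    """Yield successive chunks, standing in for blocks read from disk or HDFS."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]


# Toy data lying exactly on the line y = 2 + 3x (an assumption for illustration).
rows = [(float(x), 2.0 + 3.0 * x) for x in range(10)]

# Each chunk could be summarized on a separate CPU or Hadoop node;
# here the chunks are processed one after another for simplicity.
partials = map(chunk_stats, iter_chunks(rows, 4))
intercept, slope = fit_line(reduce(combine, partials))
```

Because `combine` is a plain element-wise sum, the per-chunk work can be spread across the CPUs of an SMP server or the nodes of a cluster and merged in any order, which is the property the two PEMA diagrams depict.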