Rethink Analytics: EDH for Advanced Analytics
Josh Wills, Director of Data Science
Sandy Lii, Senior Manager, Solutions Ma...
Rethink Analytics with an Enterprise Data Hub

Have you run into one or more of the following barriers or limitations with your existing data warehousing architecture:

> Increasingly high data storage and/or processing costs?
> Silos of data sources?
> Complexity of management and security?
> Lack of analytics agility?

Published in: Technology
  • Challenges and problems: Data discovery is 90% of the project. Long data discovery means you cannot iterate fast and cannot capture business value quickly. Data scientists are expensive! Shortening the analytics lifecycle means you can get more projects done in the same timeframe.
  • Transcript of "Rethink Analytics with an Enterprise Data Hub"

    1. Rethink Analytics: EDH for Advanced Analytics. Josh Wills, Director of Data Science; Sandy Lii, Senior Manager, Solutions Marketing
    2. Agenda • Market Background • Challenges and Limitations • EDH for Advanced Analytics • Case Studies • How to Get Started
    3. Market Background
    4. From BI to Advanced Analytics: What will happen? How can we do better? What happened? When? And where? How and why did it happen? [Figure labels: Time; Data Size; Facts; Interpretations]
    5. Advanced Analytics that Saves Us Money • Customer churn analysis model • Integrated customer support and services • Fraud detection
    6. Advanced Analytics that Makes Us Money • Product recommendation engines • Location-based real-time offers • Target-based pricing strategy
    7. Traditional Advanced Analytics Process: Problem ID; Project Definition; Data Access Request & Discovery; Data Transformation; Data Sampling; Data Preparation; Model Creation; Model Evaluation; Model Development; Deploy Model; Model Deployment [Figure label: Time-to-Insight]
    8. Challenges and Requirements
    9. Accessing the Right Data is Difficult [Figure labels: Multi-structured or External Data; Structured Internal Data; Data Warehouse]
    10. “Are we there yet?” 1. Find the data 2. Get access to data 3. Learn about the data 4. Move data to ADW and process data 5. Data Modeling 6. Model Deployment [Figure label: Data Discovery]
    11. Silo’d Platforms Challenge Collaboration & Mgmt: Opaque schemas accumulate over time [Figure labels: Non-Agile Models; Data Sources; Departmental Warehouse; Enterprise Apps; Departmental Warehouse; Reporting; Silo’d Analytics; Silo’d Analytics; Silo’d Analytics]
    12. Impact of Status Quo. Executives: “We don’t have the information we need to answer key business questions.” Data Scientists: “I’m sick of waiting for my data, I’m going to make my own copy.” DBA/DW Admins: “I need to make sure the DW is secure & compliant for the mission critical reports.”
    13. Cloudera’s Enterprise Data Hub
    14. Use All Your Data: Use more data, and more types of data, with existing tools • Reduce the need to limit or move large datasets • Centralize information security, metadata, management, and governance
    15. Shorten Analytics Lifecycle: Facilitate data discovery • Track data life-cycle in place • Define, test, deploy, and update models all within a single platform
    16. Do More with Data: Deliver multi-genre analytics in a single platform • Apply diverse concurrent analytics to full datasets in place • Protect existing technology and skillset investments [Figure labels: Search; EDH; Machine Learning; BI; SQL Query; In-memory analytics]
    17. Cloudera EDH for Analytics [Diagram labels: ANALYTIC SQL; SEARCH ENGINE; MACHINE LEARNING; STREAM PROCESSING; 3RD PARTY APPS; BATCH PROCESSING; WORKLOAD MANAGEMENT; STORAGE FOR ANY TYPE OF DATA; Filesystem; Online NoSQL; DATA MANAGEMENT; SYSTEM MANAGEMENT; UNIFIED, ELASTIC, RESILIENT, SECURE]
    18. Cloudera EDH for Analytics: Use all data with centralized mgmt & security [Diagram labels: ANALYTIC SQL; SEARCH ENGINE; MACHINE LEARNING; STREAM PROCESSING; 3RD PARTY APPS; BATCH PROCESSING; WORKLOAD MANAGEMENT; STORAGE FOR ANY TYPE OF DATA; Filesystem; Online NoSQL; DATA MANAGEMENT; SYSTEM MANAGEMENT; UNIFIED, ELASTIC, RESILIENT, SECURE. Product callouts: HADOOP; CLOUDERA MANAGER; MAPREDUCE]
    19. Cloudera EDH for Analytics: Faster data discovery [Diagram labels: ANALYTIC SQL; SEARCH ENGINE; MACHINE LEARNING; STREAM PROCESSING; 3RD PARTY APPS; BATCH PROCESSING; WORKLOAD MANAGEMENT; STORAGE FOR ANY TYPE OF DATA; Filesystem; Online NoSQL; DATA MANAGEMENT; SYSTEM MANAGEMENT; UNIFIED, ELASTIC, RESILIENT, SECURE. Product callouts: SEARCH; NAVIGATOR]
    20. Cloudera EDH for Analytics: Multiple tools on one platform [Diagram labels: ANALYTIC SQL; SEARCH ENGINE; MACHINE LEARNING; STREAM PROCESSING; 3RD PARTY APPS; BATCH PROCESSING; WORKLOAD MANAGEMENT; STORAGE FOR ANY TYPE OF DATA; Filesystem; Online NoSQL; DATA MANAGEMENT; SYSTEM MANAGEMENT; UNIFIED, ELASTIC, RESILIENT, SECURE. Product callouts: IMPALA; SPARK / ORYX / MAHOUT]
    21. Cloudera EDH for Analytics: Operationalize Models [Diagram labels: ANALYTIC SQL; SEARCH ENGINE; MACHINE LEARNING; STREAM PROCESSING; 3RD PARTY APPS; BATCH PROCESSING; WORKLOAD MANAGEMENT; STORAGE FOR ANY TYPE OF DATA; Filesystem; Online NoSQL; DATA MANAGEMENT; SYSTEM MANAGEMENT; UNIFIED, ELASTIC, RESILIENT, SECURE. Product callouts: SPARK STREAMING / FLUME]
    22. Cloudera Enterprise [Diagram labels: CLOUDERA ENTERPRISE; ANALYTIC SQL; SEARCH ENGINE; MACHINE LEARNING; STREAM PROCESSING; 3RD PARTY APPS; BATCH PROCESSING; WORKLOAD MANAGEMENT; STORAGE FOR ANY TYPE OF DATA; Filesystem; Online NoSQL; DATA MANAGEMENT; SYSTEM MANAGEMENT; UNIFIED, ELASTIC, RESILIENT, SECURE]
    23. Capabilities of Cloudera Enterprise [Figure label: APACHE HADOOP™]
    24. Capabilities of Cloudera Enterprise [Figure label: APACHE HADOOP™]
    25. Capabilities of Cloudera Enterprise [Figure label: APACHE HADOOP™]
    26. Capabilities of Cloudera Enterprise [Figure label: APACHE HADOOP™]
    27. Analytics Process with EDH: Problem ID; Project Definition; Data Access Request & Discovery; Data Transformation; Data Sampling; Data Preparation; Model Creation; Model Evaluation; Model Development; Deploy Model; Model Deployment [Figure label: Time-to-Insight]
    28. Analytics Process with EDH: Problem ID; Project Definition; Data Access Request & Discovery; Data Transformation; Data Sampling; Data Preparation; Model Creation; Model Evaluation; Model Development; Deploy Model; Model Deployment [Figure label: Time-to-Insight]
    29. Analytics Process with EDH: Problem ID; Project Definition; Data Access Request & Discovery; Data Transformation; Data Sampling; Data Preparation; Model Creation; Model Evaluation; Model Development; Deploy Model; Model Deployment [Figure label: Deliver Insights Sooner]
    30. Business Value Delivered (Data Scientists; Executives; DBA/DW Admins) • Acquire data necessary for projects • Acquire necessary information sooner to make critical business decisions • Support both reporting and analytics needs • Develop analysis/models with better lift faster • Share data sets to empower others • Save resources with shared security and management
    31. Case Studies
    32. Ask Bigger Questions: How can we prevent re-admittance? Kaiser Permanente helps providers recommend at-home action based on real-time data to prevent hospital visits.
    33. Kaiser Makes Medical Data Actionable. The Challenge: • Re-admittance is expensive, reflects sub-par provider-to-patient communications • IT infrastructures can’t accommodate 24x7 data streams from devices • Diverse medical ontologies present data challenge. Kaiser Permanente helps providers recommend at-home action based on real-time data to prevent hospital visits. The Solution: • Cloudera EDH provides a scalable, flexible platform for collection, ingestion & dissemination of healthcare information • Ingests real-time data streams of multistructured data
    34. Ask Bigger Questions: How do we feed the world? Monsanto can automate data-driven R&D decisions to reduce time to market from years to months.
    35. Monsanto feeds our growing, global population. The Challenge: • 1,000+ research scientists developing products in silos • Data processing bottleneck slows development • Time to market for new product is 5-10 years. Monsanto can automate data-driven R&D decisions to reduce time to market to months from years. The Solution: • Cloudera Enterprise + Search + Impala: PB-scale platform for single view of all R&D data • Integration: Exadata, spatial awareness & visualization • Scientists directly access CDH; Navigator offers auditing & access control
    36. ARE YOU READY TO START? Answer questions using ALL YOUR DATA
    37. QUESTIONS? Type in the “Chat” panel to ask a question • Try Cloudera today: cloudera.com/downloads • Learn more: http://tinyurl.com/membtaw • Tweet @cloudera • Follow Josh @josh_wills • Follow Sandy @sandyliiwozniak • Recording will be available on-demand at cloudera.com • Register now for Data Analysts Training: university.cloudera.com • Use discount code Analytics10 to save 10% on new enrollments in classes delivered by Cloudera until May 2014* • Use discount code 15off2 to save 15% on enrollments in two or more classes delivered by Cloudera until May 2014* (* Excludes classes sold or delivered by Cloudera Partners)
    38. Thank You! Josh Wills @josh_wills; Sandy Lii @sandyliiwozniak
