Rethink Analytics with an Enterprise Data Hub
 


Have you run into one or more of the following barriers or limitations with your existing data warehousing architecture:

> Increasingly high data storage and/or processing costs?
> Silos of data sources?
> Complexity of management and security?
> Lack of analytics agility?

Usage Rights

© All Rights Reserved

  • Challenges and Problems: Data discovery is 90% of the project. Long data discovery => cannot iterate fast, cannot capture business value quickly. Data scientists (DS) are expensive! Shortening the analytics lifecycle means you can get more projects done in the same timeframe.

Presentation Transcript

  • Rethink Analytics: EDH for Advanced Analytics. Josh Wills, Director of Data Science; Sandy Lii, Senior Manager, Solutions Marketing
  • Agenda • Market Background • Challenges and Limitations • EDH for Advanced Analytics • Case Studies • How to Get Started
  • Market Background
  • From BI to Advanced Analytics. Facts: What happened? When? And where? Interpretations: How and why did it happen? What will happen? How can we do better? (Chart axes: Time, Data Size.)
  • Advanced Analytics that Saves Us Money • Customer churn analysis model • Integrated customer support and services • Fraud detection
  • Advanced Analytics that Makes Us Money • Product recommendation engines • Location-based real-time offers • Target-based pricing strategy
  • Traditional Advanced Analytics Process. Project Definition: Problem ID. Data Preparation: Data Access Request & Discovery, Data Transformation, Data Sampling. Model Development: Model Creation, Model Evaluation. Model Deployment: Deploy Model. (Axis: Time-to-Insight.)
  • Challenges and Requirements
  • Accessing the Right Data is Difficult. Diagram labels: Multi-structured or External Data; Structured Internal Data; Data Warehouse
  • “Are we there yet?” 1. Find the data 2. Get access to data 3. Learn about the data 4. Move data to ADW and process data 5. Data Modeling 6. Model Deployment. Diagram label: Data Discovery
  • Silo’d Platforms Challenge Collaboration & Mgmt. Diagram labels: Data Sources; Enterprise Apps; Departmental Warehouse (two); Reporting; Silo’d Analytics (three); Non-Agile Models. Opaque schemas accumulate over time.
  • Impact of Status Quo. Executives: “We don’t have the information we need to answer key business questions.” Data Scientists: “I’m sick of waiting for my data, I’m going to make my own copy.” DBA/DW Admins: “I need to make sure the DW is secure & compliant for the mission critical reports.”
  • Cloudera’s Enterprise Data Hub
  • Use All Your Data • Use more data, and more types of data, with existing tools • Reduce the need to limit or move large datasets • Centralize information security, metadata, management, and governance
  • Shorten Analytics Lifecycle • Facilitate data discovery • Track data life-cycle in place • Define, test, deploy, and update models all within a single platform
  • Do More with Data • Deliver multi-genre analytics in a single platform • Apply diverse concurrent analytics to full datasets in place • Protect existing technology and skillset investments. Diagram labels: EDH; Search; Machine Learning; BI; SQL Query; In-memory analytics
  • Cloudera EDH for Analytics. Diagram labels: BATCH PROCESSING; ANALYTIC SQL; SEARCH ENGINE; MACHINE LEARNING; STREAM PROCESSING; 3RD PARTY APPS; WORKLOAD MANAGEMENT; STORAGE FOR ANY TYPE OF DATA (UNIFIED, ELASTIC, RESILIENT, SECURE): Filesystem, Online NoSQL; DATA MANAGEMENT; SYSTEM MANAGEMENT
  • Cloudera EDH for Analytics: Use all data with centralized mgmt & security. The platform diagram from the previous slide, highlighting HADOOP (storage for any type of data), MAPREDUCE (batch processing), and CLOUDERA MANAGER (system management)
  • Cloudera EDH for Analytics: Faster data discovery. The same diagram, highlighting SEARCH (search engine) and NAVIGATOR (data management)
  • Cloudera EDH for Analytics: Multiple tools on one platform. The same diagram, highlighting IMPALA (analytic SQL) and SPARK / ORYX / MAHOUT (machine learning)
  • Cloudera EDH for Analytics: Operationalize Models. The same diagram, highlighting SPARK STREAMING / FLUME (stream processing)
  • Cloudera Enterprise. The same diagram, labeled CLOUDERA ENTERPRISE
  • Capabilities of Cloudera Enterprise: APACHE HADOOP™
  • Analytics Process with EDH. Project Definition: Problem ID. Data Preparation: Data Access Request & Discovery, Data Transformation, Data Sampling. Model Development: Model Creation, Model Evaluation. Model Deployment: Deploy Model. (Axis: Time-to-Insight.)
  • Analytics Process with EDH: the same phases and steps, now labeled “Deliver Insights Sooner”
  • Business Value Delivered, for Data Scientists, Executives, and DBA/DW Admins • Acquire data necessary for projects • Acquire necessary information sooner to make critical business decisions • Support both reporting and analytics needs • Develop analysis/models with better lift faster • Share data sets to empower others • Save resources with shared security and management
  • Case Studies
  • Ask Bigger Questions: How can we prevent re-admittance? Kaiser Permanente helps providers recommend at-home action based on real-time data to prevent hospital visits.
  • Kaiser Makes Medical Data Actionable. The Challenge: • Re-admittance is expensive, reflects sub-par provider-to-patient communications • IT infrastructures can’t accommodate 24x7 data streams from devices • Diverse medical ontologies present data challenge. Kaiser Permanente helps providers recommend at-home action based on real-time data to prevent hospital visits. The Solution: • Cloudera EDH provides a scalable, flexible platform for collection, ingestion & dissemination of healthcare information • Ingests real-time data streams of multistructured data
  • Ask Bigger Questions: How do we feed the world? Monsanto can automate data-driven R&D decisions to reduce time to market from years to months.
  • Monsanto feeds our growing, global population. The Challenge: • 1,000+ research scientists developing products in silos • Data processing bottleneck slows development • Time to market for new products is 5-10 years. Monsanto can automate data-driven R&D decisions to reduce time to market from years to months. The Solution: • Cloudera Enterprise + Search + Impala: PB-scale platform for single view of all R&D data • Integration: Exadata, spatial awareness & visualization • Scientists directly access CDH; Navigator offers auditing & access control
  • ARE YOU READY TO START? Answer questions using ALL YOUR DATA
  • QUESTIONS? • Type in the “Chat” panel to ask a question • Recording will be available on-demand at cloudera.com • Try Cloudera today: cloudera.com/downloads • Learn more: http://tinyurl.com/membtaw • Tweet @cloudera • Follow Josh @josh_wills • Follow Sandy @sandyliiwozniak • Register now for Data Analysts Training: university.cloudera.com • Use discount code Analytics10 to save 10% on new enrollments in classes delivered by Cloudera until May 2014* • Use discount code 15off2 to save 15% on enrollments in two or more classes delivered by Cloudera until May 2014* (* Excludes classes sold or delivered by Cloudera Partners)
  • Thank You! Josh Wills @josh_wills • Sandy Lii @sandyliiwozniak