Enable Advanced Analytics with Hadoop and an Enterprise Data Hub
 


Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

  • The executives' requirement, to answer key business questions while still running operational reports, has strained the relationship between the data scientists and the DBA/DW admins. The DBA/DW admins are forced to choose between keeping the DW secure and compliant and meeting the data scientists’ requirements for accessing the data they need when they need it.
  • A common misconception is that data science work centers on model development. While model development is crucial, most of the time and effort is spent on data preparation. This is because traditional analytics involves a lot of data movement, which is both time-consuming and limiting for what data scientists can do.
  • Coming back to how this solution affects the three groups of people in an enterprise who are closest to the data, we quickly see the following.
    For the data scientist:
      • He can acquire the data necessary for the project very quickly, without needing to create rogue data marts.
      • Because he can now use all the data very quickly, he can develop models with much better lift.
      • Once he has the insights, he can share the data set to empower other users.
    For the DW administrator:
      • He can now support the mission-critical reports running in his DW while fulfilling the data scientists’ need for data.
      • He can save resources and time, now that all the data is in one centralized location with unified security and management.
    For the executive:
      • She can finally get the overall report she needs on a regular basis, while still gaining a competitive edge, whether by decreasing costs and risks or increasing revenue, from the insights gained by using all the data.

Enable Advanced Analytics with Hadoop and an Enterprise Data Hub: Presentation Transcript

  • Enable Advanced Analytics with Hadoop and an Enterprise Data Hub
  • 3 Agenda • Business Problem • Current Challenges • Agile Analytics • Case Studies
  • 4 Business Problem
  • 5 From BI to Advanced Analytics • What happened? When? And where? • What will happen? • How and why did it happen? • How can we do better? [Chart labels: Time, Data Size, Facts, Interpretations]
  • 6 Advanced Analytics that Saves Us Money • Customer churn analysis model • Integrated customer support and services • Fraud detection
  • 7 Advanced Analytics that Makes Us Money • Product recommendation engines • Location-based real-time offers • Target-based pricing strategy
  • 8 Analytic Opportunities [Diagram labels: Marketing, Operations, Sales, Customers, Total Market, Known Market; t, value $]
  • 9 Enterprise Pressures - Questions [Diagram labels: Marketing, Operations, Sales, Customers, Total Market, Known Market; t, value $] • “We want to know what our customers do online and in our stores. How can we combine data from separate analytics silos to understand & serve them better?” • “How can we reduce stock-outs & ensure products are in the right stores at the right time? Can we combine data from our carriers with in-store historical data from thousands of stores?” • “Theft, or ‘shrinkage’, in our stores is on the increase. Can we combine POS data with video surveillance to reduce it without impacting customer service negatively?”
  • 10 Enterprise Pressures - Questions [Diagram labels: Marketing, Operations, Sales, Customers, Total Market, Known Market; t, value $] • “We want to know what our customers do online and in our stores. How can we combine data from separate analytics silos to understand & serve them better?” • “How can we reduce stock-outs & ensure products are in the right stores at the right time? Can we combine data from our carriers with in-store historical data from thousands of stores?” • “Theft, or ‘shrinkage’, in our stores is on the increase. Can we combine POS data with video surveillance to reduce it without impacting customer service negatively?” • Data Products
  • 11 Current Challenges
  • 12 Data Product Value [Chart: VALUE vs. cost to implement (in time, budget, people, tools), with axis ticks at $500K and $1M; data products 1 to 8 plotted; labels: operational data, sensor data, Multi-source (Fuzzy Value)]
  • 13 Data Product + Risk [Chart: VALUE vs. cost to implement (in time, budget, people, tools), with axis ticks at $500K and $1M; data products 1 to 8 plotted; Risks: low, medium, high; labels: sensor data, Single-Source (Known Value), Multi-source (Fuzzy Value)]
  • 14 Impact of Status Quo • Data Scientists: “I’m sick of waiting for my data, I’m going to make my own copy.” • DBA/DW Admins: “I need to make sure the DW is secure & compliant for the mission critical reports.” • Executives: “We don’t have the information we need to answer key business questions.”
  • 15 What if? [Chart: VALUE vs. cost to implement (in time, budget, people, tools), with axis ticks at $500K and $1M; data products 1 to 8 plotted; Risks: low, medium, high]
  • 16 Agile Analytics
  • 17 Traditional Advanced Analytics Process [Diagram: Time-to-Insight across the phases Project Definition, Data Preparation, Exploratory Analytics, Operational Analytics; steps shown: Problem ID, Data Access Request & Discovery, Data Sampling, Data Transformation, Model Creation, Model Evaluation, Deploy Model]
  • 18 Analytics Process with EDH [Diagram: Time-to-Insight across the phases Project Definition, Data Preparation, Exploratory Analytics, Operational Analytics; steps shown: Problem ID, Data Access Request & Discovery, Data Sampling, Data Transformation, Model Creation, Model Evaluation, Deploy Model]
  • 19 Analytics Process with EDH [Diagram: Time-to-Insight across the phases Project Definition, Data Preparation, Exploratory Analytics, Operational Analytics; steps shown: Problem ID, Data Access Request & Discovery, Data Sampling, Data Transformation, Model Creation, Model Evaluation, Deploy Model]
  • 20 Analytics Process with EDH: Deliver Insights Sooner [Diagram: phases Project Definition, Data Preparation, Exploratory Analytics, Operational Analytics; steps shown: Problem ID, Data Access Request & Discovery, Data Sampling, Data Transformation, Model Creation, Model Evaluation, Deploy Model]
  • 21 Issues [Diagram labels: Operations, Sales, Marketing, Market Data, System Information; value $]
  • 22 Step 1: Collect All Data [Diagram labels: Marketing, Market Data, System Information; STORAGE FOR ANY TYPE OF DATA: UNIFIED, ELASTIC, RESILIENT, SECURE]
  • 23 Step 2: Create Derived Datasets [Diagram labels: Marketing; BATCH PROCESSING, 3RD PARTY APPS; Data Set 1, Data Set 2]
  • 24 Step 2: Create Derived Datasets [Diagram labels: Marketing; BATCH PROCESSING, 3RD PARTY APPS; Data Set 1, Data Set 2]
  • 25 Step 3: Data Analysts [Diagram labels: Marketing; Data Set 1, Data Set 2; ANALYTIC SQL, SEARCH ENGINE]
  • 26 Step 4: Analytics [Diagram labels: Marketing; Data Set 1, Data Set 2; MACHINE LEARNING, STREAM PROCESSING, 3RD PARTY APPS; Clustering, Recommender, Regression]
  • 27 Step 4 Cont.: Analytics + Data Together [Diagram labels: Data Set 1, Data Set 2; Old Way: SAS/R, JDBC-SELECT 10%; MACHINE LEARNING: SAS+/R+ (ORYX), ALGORITHM]
  • 28 Cloudera EDH for Analytics [Diagram labels: BATCH PROCESSING, ANALYTIC SQL, SEARCH ENGINE, MACHINE LEARNING, STREAM PROCESSING, 3RD PARTY APPS; WORKLOAD MANAGEMENT; STORAGE FOR ANY TYPE OF DATA: UNIFIED, ELASTIC, RESILIENT, SECURE; Filesystem, Online NoSQL; DATA MANAGEMENT, SYSTEM MANAGEMENT]
  • 29 Business Value Delivered • Executives: Acquire necessary information sooner to make critical business decisions • DBA/DW Admins: Support both reporting and analytics needs; save resources with shared security and management • Data Scientists: Acquire data necessary for projects; develop analysis/models with better lift faster; share data sets to empower others
  • 30 Case Studies
  • 31 Monsanto can automate data-driven R&D decisions to reduce time to market from years to months. Ask Bigger Questions: How do we feed the world?
  • 32 Monsanto feeds our growing, global population The Challenge: • 1,000+ research scientists developing products in silos • Data processing bottleneck slows development • Time to market for new products is 5-10 years The Solution: • Cloudera Enterprise + Search + Impala: PB-scale platform for single view of all R&D data • Integration: Exadata, spatial awareness & visualization • Scientists directly access CDH; Navigator offers auditing & access control Monsanto can automate data-driven R&D decisions to reduce time to market to months from years.
  • 33 Patterns and Predictions analyzes mobile data and social networking text for real-time identification of risk factors. Ask Bigger Questions: How can we prevent veteran suicide?
  • 34 Patterns and Predictions aids suicide prevention The Challenge: • Suicide rates among veterans are roughly double those of general US adults • Military efforts struggle to understand risk factors The Solution: • Suicide risk predictive solution built on Cloudera + Attivio • Analyzes veterans’ mobile & social data for real-time identification of risk factors • Integrating Cloudera Search + Impala to simplify environment The Durkheim Project predicts suicide risk with statistical significance (65%+ accuracy).
  • 35 Thank You!