1
Machine Learning Loves Hadoop
Enabling Machine Learning to Accelerate Data Returns
2
Agenda
©2014 Cloudera, Inc. All rights reserved.
Hadoop and Cloudera Overview
Machine Learning + The Enterprise Data Hub
Machine Learning in Practice
Q&A
Speakers
TJ Laher
Product Marketing
Sean Owen
Director of Data Science
Get Social
#ClouderaWebinars
3
Where Hadoop Began
2006: Web Indexing, Google Earth, Google Finance
2008: Web Indexing
2010: Storing User-Generated Data
4
How Cloudera Accelerated Adoption
2008 2009 2010 2011 2012 2013 2014
CDH, Cloudera Manager, Cloudera Enterprise 4, Enterprise Data Hub: Ask Bigger Questions
Cloudera Launched
Hadoop Creator, Doug Cutting, Joins
Launch CDH: 1st Commercial Hadoop Distro
Launch Cloudera Manager: 1st Hadoop Management Application
Cloudera U Expands to 140 Countries
100 Customers in Production
Release Cloudera Enterprise 4
300 Partners in Cloudera Connect
Introduce Cloudera Navigator, Impala, Search
Realized the Enterprise Data Hub
Tom Reilly Joins as CEO
5
The Enterprise Data Hub
EDH powered by Apache Hadoop™
Unified
Out-of-the-box, scale-out capabilities for storage, ingest, access, metadata, security, governance, and management
Compliance-Ready
End-to-end security and governance: authentication, authorization, encryption, audit, and lineage
Accessible
Use familiar tools and skills to get value from your data faster
Multiple frameworks, including batch and stream processing, in-memory analytic SQL, enterprise search, and machine learning
Open
100% open source: all components are Apache-licensed
Deploy in the cloud, on-premises, or with an appliance
Social
Financial
Transactions
Sensor
OR
6
What does an EDH look like?
Applications: Model Building, BI/Visualizations, Point Solutions, Custom Solutions
Processing: Online NoSQL DBMS, Analytic MPP DBMS, Search Engine, Batch Processing, Stream Processing, Machine Learning
Management & Storage: Unified Management & Distributed Storage (Management; Security & Governance; Metadata; Data)
Data Sources
7
Machine Learning + The Enterprise Data Hub
8
Why do we use machine learning?
Operational analytics: Transaction Classification, Recommendation Engine, Dynamic Pricing, …
Investigative analytics: Drug Discovery, Energy Exploration, Executive Reports, …
9
Machine Learning Breakdown
Supervised
Classification: Logistic Regression & Random Decision Forest (goal: Pattern Recognition)
Regression: Generalized Linear Models (goal: Predict Future Values)
Unsupervised
Clustering: K-means++ (goal: Segment Historic Data)
Collaborative Filtering: Alternating Least Squares (goal: Recommend Items)
10
Common Challenges with Machine Learning
Challenges with traditional systems: Feature Generation & Selection, Overfitting, Historic Testing, Dirty Data, Debugging Models
The cost: Time, False Positives and Negatives, Uncertainty of Model Quality, Inability to Explain and Improve Models, Bad Results
11
How an Enterprise Data Hub Helps
Challenge → Benefit of an Enterprise Data Hub
Feature Generation & Selection → Reduce Iteration Time
Overfitting → Eliminate Sampling
Historic Testing → Test on Archived Data
Dirty Data → Audit Data Trail
Debugging Models → Immediate Data Access
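The "Historic Testing → Test on Archived Data" row can be illustrated with a time-based holdout. The model is fit only on records before a cutoff date and scored on records after it, so the evaluation resembles deployment and can expose overfitting that scoring on the training period hides. The sketch below is a minimal, hypothetical example in plain Python. The record layout (date, transaction amount, fraud label), the cutoff date, and the toy threshold model are assumptions for illustration and do not come from the deck.

```python
from datetime import date


def time_based_split(records, cutoff):
    """Split (day, amount, label) records: train on the past, test on the future."""
    train = [r for r in records if r[0] < cutoff]
    test = [r for r in records if r[0] >= cutoff]
    return train, test


def fit_threshold(rows):
    """Hypothetical toy model: flag amounts above the midpoint between the
    largest non-fraud amount and the smallest fraud amount seen in training."""
    largest_ok = max(amount for _, amount, label in rows if label == 0)
    smallest_fraud = min(amount for _, amount, label in rows if label == 1)
    cut = (largest_ok + smallest_fraud) / 2
    return lambda amount: 1 if amount > cut else 0


def accuracy(predict, rows):
    """Fraction of rows whose label the model predicts correctly."""
    hits = sum(1 for _, amount, label in rows if predict(amount) == label)
    return hits / len(rows)


# Hypothetical archived transactions: (date, amount, 1 = fraud).
records = [
    (date(2013, 1, 5), 10.0, 0),
    (date(2013, 6, 1), 950.0, 1),
    (date(2013, 11, 20), 20.0, 0),
    (date(2014, 2, 3), 1200.0, 1),
    (date(2014, 3, 9), 15.0, 0),
    (date(2014, 4, 1), 700.0, 0),
]

train, test = time_based_split(records, date(2014, 1, 1))
model = fit_threshold(train)
print("train accuracy:", accuracy(model, train))
print("test accuracy:", accuracy(model, test))
```

The toy model is perfect on the period it was fit on but misclassifies a later transaction. Testing against archived, out-of-time data is meant to surface exactly that kind of gap.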
12
Machine Learning in Practice
13
Fraud Detection
Data: Credit Card Transactions
Algorithm: K-means++
Outcome: The machine learning model reduces false negatives, saving organizations millions of dollars in fraud losses.
Offline: Distributed Storage → Modeling
Online: Rules Engine
Platform: Management; Security & Governance; Metadata; Data; Cloudera Navigator
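The slide names K-means++ but does not say how clusters become fraud signals. One common approach, assumed here rather than taken from the deck, is to cluster historical transactions. The distance to the nearest cluster center then serves as an anomaly score that a rules engine could threshold. The sketch below implements k-means++ seeding plus Lloyd iterations in plain Python. The two-feature points, the `fit_kmeans` and `anomaly_score` helpers, and the seed are hypothetical.

```python
import math
import random


def dist2(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: first center uniform at random, each later center
    drawn with probability proportional to squared distance to the nearest
    center chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(dist2(p, c) for c in centers) for p in points]
        total = sum(d2)
        if total == 0:  # fewer distinct points than k; stop early
            break
        r = rng.uniform(0, total)
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if w > 0 and acc >= r:
                centers.append(p)
                break
    return centers


def fit_kmeans(points, k, seed=0, iters=20):
    """Seed with k-means++, then refine with Lloyd's assign/update steps."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: dist2(p, centers[j]))
            clusters[nearest].append(p)
        # Move each center to its cluster mean; an empty cluster keeps its center.
        centers = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers


def anomaly_score(point, centers):
    """Distance to the nearest cluster center; large values look unusual."""
    return math.sqrt(min(dist2(point, c) for c in centers))


# Hypothetical transaction features, e.g. scaled amount and scaled time of day.
normal = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.1), (8.1, 8.0)]
centers = fit_kmeans(normal, k=2, seed=7)
suspect = (15.0, 1.0)
print("centers:", centers)
print("suspect score:", round(anomaly_score(suspect, centers), 2))
```

k-means++ spreads the initial centers apart, which tends to make Lloyd's iterations converge to better clusterings than uniform random seeding. The threshold for flagging a score as fraud would have to be tuned against labeled history.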
14
Product Recommendations
Data: Customer Purchases, Social Data
Algorithm: Alternating Least Squares
Outcome: A product recommendation engine powered by a machine learning model increases purchase conversion rates.
Offline: Distributed Storage → Modeling
Online: Serve Value (Product #1, Product #2, Product #3)
Platform: Management; Security & Governance; Metadata; Data
15
Predictive Maintenance
Data: Machine Sensors
Algorithm: Logistic Regression
Outcome: The machine learning model alerts employees to early signs of machine failure, reducing onsite visits.
Offline: Sensor Data → Modeling
Online: Custom Application
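Logistic regression models the probability of failure as σ(w · x + b), where x is a vector of sensor readings and σ(z) = 1 / (1 + e^(-z)). The sketch below fits w and b by batch gradient descent on the average log-loss in plain Python. Several choices are hypothetical: the two normalized features (temperature and vibration), the training rows, the learning rate, the epoch count, and the 0.5 alert threshold. A production model would be trained on real sensor history.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the average log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of log-loss w.r.t. the logit is (predicted - actual).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def failure_probability(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical normalized readings (temperature, vibration); 1 = failed soon after.
X = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.25),
     (0.8, 0.9), (0.9, 0.7), (0.7, 0.8), (0.85, 0.95)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic(X, y)
for reading in [(0.15, 0.15), (0.85, 0.85)]:
    p = failure_probability(w, b, reading)
    print(reading, round(p, 3), "ALERT" if p > 0.5 else "ok")
```

The model outputs a probability rather than a hard label, so the custom application can choose an alert threshold that balances missed failures against unnecessary onsite visits.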
16
Q&A
17
What’s Next?
TJ Laher
tlaher@cloudera.com
Sean Owen
sowen@cloudera.com
Contact Us
@Cloudera
1-866-843-7207
Use discount code Analytics10 to save 10% on new enrollments in classes delivered by Cloudera until Sept ’14*
Use discount code 15off2 to save 15% on enrollments in two or more classes delivered by Cloudera until Sept ’14*
Register now for Data Analyst, Spark, or Data Science training at http://university.cloudera.com
