
Machine Learning Loves Hadoop


This presentation originated from a live webinar. View the webinar here: https://www.brighttalk.com/webcast/10565/114179

During this webinar you will hear from Cloudera’s Director of Data Science, Sean Owen, as he discusses how Cloudera’s enterprise data hub allows data scientists to use libraries of machine learning algorithms to analyze high-dimensional, high-volume data. Sean will also address common machine learning challenges and how Cloudera’s enterprise data hub can help eliminate them.

Topics covered in the presentation include:
What is machine learning?
Why should I use machine learning algorithms?
What are the common challenges of machine learning?
How does Cloudera’s enterprise data hub support machine learning?

Transcript

  • 1. Machine Learning Loves Hadoop: Enabling Machine Learning to Accelerate Data Returns
  • 2. Agenda
    Hadoop and Cloudera Overview
    Machine Learning + The Enterprise Data Hub
    Machine Learning in Practice
    Q&A
    Speakers: TJ Laher, Product Marketing; Sean Owen, Director of Data Science
    Get Social: #ClouderaWebinars
    ©2014 Cloudera, Inc. All rights reserved.
  • 3. Where Hadoop Began
    Use cases: Web Indexing, Google Earth, Google Finance; Web Indexing; Storing User Generated Data
    Timeline: 2006, 2008, 2010
  • 4. How Cloudera Accelerated Adoption
    Timeline: 2008, 2009, 2010, 2011, 2012, 2013, 2014
    Labels: CDH; Cloudera Manager; Cloudera Enterprise 4; Ask Bigger Questions; Enterprise Data Hub
    Milestones: Cloudera Launched; Hadoop Creator, Doug Cutting, Joins; Launch CDH: 1st Commercial Hadoop Distro; Launch Cloudera Manager: 1st Hadoop Management Application; Cloudera U Expands to 140 Countries; 100 Customers in Production; Release Cloudera Enterprise 4; 300 Partners in Cloudera Connect; Introduce Cloudera Navigator, Impala, Search; Realized the Enterprise Data Hub; Tom Reilly Joins as CEO
  • 5. The Enterprise Data Hub (EDH powered by Apache Hadoop™)
    Unified: Out-of-the-box capabilities for infinite scalability for storage, ingest, access, metadata, security, governance, and management
    Compliance-Ready: End-to-end security and governance: authentication, authorization, encryption, audit, and lineage
    Accessible: Utilize familiar tools and skills to get value from your data faster. Multiple frameworks, including batch and stream processing, in-memory analytic SQL, enterprise search, machine learning
    Open: 100% open source; all components are Apache licensed. Deploy in the cloud, on-premises, or with an appliance
    Data sources: Social, Financial Transactions, Sensor
  • 6. What does an EDH look like?
    Diagram labels: Model Building; BI/Visualizations; Point Solutions; Processing; Online NoSQL DBMS; Analytic MPP DBMS; Search Engine; Batch Processing; Stream Processing; Machine Learning; Unified Management & Distributed Storage; Management & Storage; Applications; Data Sources; Custom Solutions; Management; Security & Governance; Metadata; Data
  • 7. Machine Learning + The Enterprise Data Hub
  • 8. Why do we use machine learning?
    Operational Analytics: Transaction Classification, Recommendation Engine, Dynamic Pricing, …
    Investigative Analytics: Drug Discovery, Energy Exploration, Executive Reports, …
  • 9. Machine Learning Breakdown
    Category | Algorithm | Goal
    Classification (supervised) | Logistic Regression & Random Decision Forest | Pattern Recognition
    Regression (supervised) | Generalized Linear Models | Predict Future Values
    Clustering (unsupervised) | K-means++ | Segment Historic Data
    Collaborative Filtering (unsupervised) | Alternating Least Squares | Recommend Items
  • 10. Common Challenges with Machine Learning (Traditional Systems)
    Challenges: Feature Generation & Selection; Overfitting; Historic Testing; Dirty Data; Debugging Models
    The Cost: Time; False Positives and Negatives; Uncertainty of Model Quality; Unable to Explain and Improve Models; Bad Results
  • 11. How an Enterprise Data Hub Helps
    Challenges: Feature Generation & Selection; Overfitting; Historic Testing; Dirty Data; Debugging Models
    The Benefit: Reduce Iteration Time; Eliminate Sampling; Test on Archived Data; Audit Data Trail; Immediate Data Access
  • 12. Machine Learning in Practice
  • 13. Fraud Detection
    Data: Credit Card Transactions
    Algorithm: K-means++
    Outcome: The machine learning model reduces false negatives, saving organizations millions of dollars in fraud losses.
    Diagram labels: Management; Security & Governance; Metadata; Data; Offline; Online; Cloudera Navigator; Distributed Storage; Modeling; Rules Engine
  • 14. Product Recommendations
    Data: Customer Purchases, Social Data
    Algorithm: Alternating Least Squares
    Outcome: A product recommendation engine, powered by a machine learning model, increases purchase conversion rates.
    Diagram labels: Management; Security & Governance; Metadata; Data; Distributed Storage; Modeling; Serve; Value; Offline; Online; Product #1; Product #2; Product #3
  • 15. Predictive Maintenance
    Data: Machine Sensors
    Algorithm: Logistic Regression
    Outcome: The machine learning model alerts employees to early signs of machine failure, reducing onsite visits.
    Diagram labels: Offline; Online; Sensor Data; Modeling; Custom Application
  • 16. Q&A
  • 17. What’s Next?
    Contact Us: TJ Laher, tlaher@cloudera.com; Sean Owen, sowen@cloudera.com; @Cloudera; 1-866-843-7207
    Use discount code Analytics10 to save 10% on new enrollments in classes delivered by Cloudera until Sept ‘14*
    Use discount code 15off2 to save 15% on enrollments in two or more classes delivered by Cloudera until Sept ‘14*
    Register now for Data Analyst, Spark, or Data Science training at http://university.cloudera.com
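
The fraud detection slide pairs credit card transactions with K-means++ but does not say how the clusters are used. One common reading, offered here only as an assumption, is to cluster ordinary transactions and flag new ones that sit far from every cluster center. The sketch below shows that idea in plain Python on a single machine. The feature choice (amount, hour of day), the data, and every function name are hypothetical, and nothing here reflects Cloudera's actual implementation or any Hadoop library API.

```python
import math
import random


def kmeanspp_seeds(points, k, rng):
    """Choose k starting centroids using k-means++ (D^2-weighted) seeding."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Squared distance from every point to its nearest chosen centroid.
        d2 = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a centroid; fall back to a uniform pick.
            centroids.append(rng.choice(points))
        else:
            # Far-away points are proportionally more likely to be picked.
            centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids


def fit_kmeans(points, k, rng, iterations=10):
    """Refine the k-means++ seeds with a few Lloyd iterations."""
    centroids = kmeanspp_seeds(points, k, rng)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to its cluster mean; keep it if the cluster is empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids


def anomaly_score(point, centroids):
    """Distance to the nearest centroid; larger means less like past behavior."""
    return min(math.dist(point, c) for c in centroids)


# Hypothetical (amount, hour-of-day) features for ordinary transactions.
normal = [(20 + i % 5, 12 + i % 3) for i in range(30)]
normal += [(200 + i % 7, 20 + i % 2) for i in range(30)]

centroids = fit_kmeans(normal, k=2, rng=random.Random(0))
# Assumed rule: anything farther out than the worst ordinary transaction is flagged.
threshold = max(anomaly_score(p, centroids) for p in normal)

suspicious = (5000, 3)  # made-up transaction far from both groups
print(anomaly_score(suspicious, centroids) > threshold)
```

In practice the features would need scaling so that a large-valued field such as amount does not dominate the distance, and the threshold would be tuned against labeled fraud cases rather than taken from the training maximum.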