The document discusses Cloudera Fast Forward Labs and how it can help organizations accelerate their machine learning and data strategies. It provides research, advising, and application development services to help clients stay on top of emerging technologies, define optimal data strategies, and evaluate machine learning capabilities. Cloudera Fast Forward Labs aims to be organizations' partner for creating and executing excellent data strategies.
Welcome to Cloudera Sessions! My name is JJ Sakey and I have the honor of taking us through today’s jam-packed program. By way of introduction, {tell us about what you do JJ}.
We are all here today because we believe that data can make what is impossible today possible tomorrow. Many of us in this room have already created board-level impact for our businesses. Today you will hear from customers about their data journeys – Sentier, Amazon Web Services, and Altisource. There is no question that big data is improving insight into customers, connecting products and services via IoT, and protecting your business from cyberattacks and regulatory fines. But it is also quietly having a major social impact that affects every one of us in this room. For example, four of the five cancer research centers are using Cloudera to find a cure, and when we do, the odds are good it will have been done with our software and big data. Cloudera has partnered with many hospitals and has already saved hundreds of lives through early detection of sepsis. Lastly, a topic near to my heart: we have partnered with several non-profit groups to detect early signs of suicide, especially among veterans, for whom it is one of the leading causes of death.
So, thank you for spending the day with us talking about big data, which is having a profound effect on business and on our lives.
Let’s jump in and cover a couple important logistics items first…
What is artificial intelligence? What is machine learning? Popular media suggests that AI is all about recognizing pictures of cats and dogs, or machines beating humans at the game of Go. But when we peel away the hype, the machine learning of today is really a very smart pattern recognizer. How do we leverage this capability and turn it into a competitive advantage? The tech stack looks like this. First, there is data. Then we need the capability to build a basic understanding of that data – this is the analytics layer. At this layer, we are able to say things like, “the average age of my customer is 40.” Naturally, we would like our data to tell us more – and this is when we move into the data science layer. Both analytics and data science are built on top of the big data layer, but the data science layer’s data requirements are stricter than the analytics layer’s. Here we focus more on data cleaning and prepping, and we are able to answer questions like, “How much in sales do I expect to generate next year?” To answer more sophisticated questions, we move into the ML layer. The ML layer puts a lot of focus on algorithms and can only work if the organization has mastered the lower layers of the stack.
Analytics – descriptive stats, visualize data
Data Science – data cleaning, prep, analyze (forecast)
ML - algorithms
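The three layers above can be made concrete on the same toy dataset. This is a minimal sketch with made-up customer ages and sales figures (all numbers are hypothetical, and the nearest-neighbour rule is just a stand-in for a real ML algorithm):

```python
import numpy as np

# Hypothetical data: customer ages and yearly sales (in $k).
ages = np.array([25, 38, 40, 47, 50])
yearly_sales = np.array([100.0, 110.0, 125.0, 131.0, 140.0])

# Analytics layer: descriptive statistics ("the average age of my customer is 40").
avg_age = ages.mean()
print(f"average customer age: {avg_age:.0f}")

# Data science layer: a simple forecast ("how much in sales next year?").
years = np.arange(len(yearly_sales))
slope, intercept = np.polyfit(years, yearly_sales, deg=1)
next_year_forecast = slope * len(yearly_sales) + intercept
print(f"forecast for next year: {next_year_forecast:.1f}")

# ML layer: an algorithm that learns a pattern from examples, here a
# toy nearest-neighbour rule labelling a customer by the closest known case.
def nearest_label(age, known_ages, labels):
    return labels[int(np.argmin(np.abs(known_ages - age)))]

labels = np.array(["low", "low", "high", "high", "high"])
print(nearest_label(44, ages, labels))
```

Each layer asks more of the data than the one below it: a mean needs only the raw numbers, the forecast needs clean, ordered history, and the learned rule needs labelled examples.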
Machine learning will transform businesses. It is a huge opportunity for every company, but it is hard to execute on. Companies often do not know what questions to ask, and what problems to focus on. Even if they have converged on a problem, they soon realize that …
… and that no software can solve their problem.
Successful data products are often a clever combination of known components, machine learning tools and algorithms, applied to a well understood problem.
Building the right data product requires both strategy and technology to be properly aligned. You cannot build a data product independently because business strategy dictates data availability. In many cases data opportunities require optimizing over both business needs and technological capability and can also require organizational transformation.
And this is where we come in –
Combined with Cloudera’s Enterprise Data Hub and Data Science Workbench, our goal is to accelerate machine learning in the enterprise, from research to production.
We sit at the intersection of three entities.
What we try to do is build a bridge connecting academic research and the enterprise – in a way, we extract and present information from academic research so that businesses can make use of it. We also intersect with startups, because they are a helpful window into what businesses are looking for.
Our team lives at, and has experience across, the intersection of startup culture (agility, novelty, speed), academic research (where new algorithmic ideas come from), and the enterprise (the opportunity to execute at scale on unique data). We’ve been doing this for 5 years (cf. Amazon, MSFT, Google, etc.)
Academic research doesn’t focus on valuable business problems.
Startups generally don’t invent new technology.
Corporate R&D struggles to align with business priorities and effectively execute.
What makes us unique? We engage in 3 ways – research subscription, advising and application development.
We use research to help clients stay on top of emerging ML technologies. Every quarter we release a research report focusing on a new capability or breakthrough that we believe will become important in the next six months to two years.
The second way we engage is through advising, where we help define data strategy and evaluate ML capabilities. Our research subscription comes with four hours of advising per month. This time is tailored for each client, and every client uses it differently. As an example, clients have used the time to take a deeper dive into our reports, to identify data assets, to guide their ML product development, and to develop strategic and technical roadmaps.
Lastly, for clients who have very specific projects in mind but are unsure whether they have the resources to succeed, we help transform these science experiments into actual products by performing feasibility studies. The deliverables are proof-of-concept code and extensive documentation of what worked and what didn’t. In the end, clients get a piece of working code, tailored to their problem and data, that they own and can build on top of.
Here are all the reports we have done in the past.
In choosing a breakthrough topic, we use the answers to three questions as a guide: 1) Is it useful? 2) Can we build a prototype? 3) Is it timely? Purely algorithmic breakthroughs are not interesting to the business community unless they have specific applications; one way to ensure usefulness is to filter for breakthroughs on which a product prototype can be built. Finally, the breakthrough has to be timely: it has to be more possible now than it was one to two years ago, and we expect it to be even more possible one to two years from now. We predict timeliness using two gauges: i) economic constraints and ii) commoditization of tools. A sudden lifting of economic constraints can make previously nice ideas practical, while commoditization of tooling makes it quicker to build things that were possible but difficult and time-consuming to get right. Deep learning clearly illustrates both aspects: GPUs lifted its economic constraints, and Keras/TensorFlow commoditized its tooling.
In our latest report, on semantic recommendations, we look at the state of recommendation systems and their common pitfalls. Recommendation systems have been around for many years, and businesses rely on them to surface interesting items for end users. Unfortunately, classical recommendation systems do not understand what they are recommending: things are recommended to you because others similar to you have liked them. In our report, we look at ways to inject the content of items into the system. When we do this, we are building a recommendation system that understands user preferences as they relate to item content. It turns out this technique also solves the cold-start problem – a common problem in classical recommendation systems, where the system does not know how to generate recommendations for new items.
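To make the cold-start point concrete, here is a minimal content-based sketch (not the report’s prototype; the catalog and item descriptions are hypothetical). Because items are compared by their text content rather than by interaction history, a brand-new item with zero ratings can still be matched against the catalog:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical catalog of items with text descriptions.
catalog = {
    "intro_statistics": "an introduction to probability and statistics",
    "deep_learning":    "neural networks and deep learning for images",
    "cooking_basics":   "simple recipes and kitchen techniques",
}
# A brand-new item with no ratings at all (the cold-start case).
new_item = ("bayesian_methods", "probability, bayesian statistics and inference")

titles = list(catalog) + [new_item[0]]
texts = list(catalog.values()) + [new_item[1]]

# Represent every item by its content, then compare the new item to the rest.
vectors = TfidfVectorizer().fit_transform(texts)
sims = cosine_similarity(vectors[-1], vectors[:-1])[0]
best = titles[int(sims.argmax())]
print(f"most similar existing item to {new_item[0]}: {best}")
```

A classical collaborative filter would have nothing to say about the new item until users start rating it; the content representation gives it a sensible neighborhood immediately.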
In the interpretability report, we look at ways to understand and explain how a model makes decisions. Interpretability is important not just for regulatory reasons: being able to explain why and how a model works can help us improve models and build better products. Black-box techniques like deep learning deliver breakthrough capabilities at the cost of interpretability – in this report, we show how to make models interpretable without sacrificing their capability or accuracy.
If your model is accurate but you have no idea how it works, what are you missing? It turns out quite a lot! It is easier to improve an interpretable model. The ability to explain individual decisions to their subjects is intrinsically useful; people like to know why a model has treated them a certain way. And in many cases there is an ethical and/or legal duty to ensure models are safe and non-discriminatory, which can only be done if they are interpretable. A paper published in 2016 made this report possible by releasing an algorithm called LIME that probes the inner workings of a black-box model.
Text summarization. This report looks at a specific and very practical problem: summarizing documents. We show how to do that using the latest and greatest ideas from deep learning and topic modeling. But because text summarization is just a special case of a much broader set of problems — how can we help computers work with natural language — it’s a report with much wider implications, for any of us who work with text, either consuming or generating it.
Next, probabilistic programming. The conclusions you draw from imperfect or incomplete data are uncertain, and this report is all about how you work with that uncertainty. Academic statisticians have known how to deal with it for a long time, but it is only in the past few years that the algorithms have caught up with the scale of big data, and only very recently that tools have made these algorithms accessible.
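A toy illustration of reasoning under uncertainty, with made-up numbers: 7 conversions out of 40 trials. Full probabilistic programming systems generalize this idea to arbitrary models; here a conjugate Beta-Binomial model lets us sample the posterior over the true conversion rate directly with NumPy:

```python
import numpy as np

# Hypothetical observed data: 7 conversions out of 40 trials.
conversions, trials = 7, 40
rng = np.random.default_rng(0)

# With a uniform Beta(1, 1) prior, the posterior over the true rate is
# Beta(1 + conversions, 1 + failures); sample from it directly.
posterior = rng.beta(1 + conversions, 1 + trials - conversions, size=100_000)

low, high = np.percentile(posterior, [2.5, 97.5])
print(f"posterior mean rate: {posterior.mean():.3f}")
print(f"95% credible interval: ({low:.3f}, {high:.3f})")
```

Instead of a single point estimate ("the rate is 17.5%"), you get a whole distribution, so downstream decisions can account for how wide that credible interval still is.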
In our deep learning report, we look at how neural networks enable us to analyze images. We explain what neural networks are, and how we can apply deep learning today.
In all our reports, we begin with a gentle introduction to the capability. We then move on to a rigorous but conceptual discussion of the state-of-the-art algorithms. We also describe the prototype, and the process of building it.
For clients who are interested in implementing the new capability, we dedicate a chapter to the commercial and open-source landscape that will hopefully help with the build-or-buy decision.
Because the focus is on business applications, each report also has a chapter on ethics.
We close with a sci-fi short story – mostly to get readers to imagine, in a very unconstrained way, what the capability could do for their businesses.
With all that in mind, let’s take a closer look at a couple of the reports, starting with text summarization.
How do you take a long document and make it shorter?
More generally, how do you make language computable?
We describe single and multiple document summarization using:
topic models (a mature, accessible approach)
language embeddings and recurrent neural networks (a cutting-edge deep learning approach)
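To make the extractive idea concrete, here is a deliberately simple frequency-based summarizer. It is far cruder than the topic-model and neural approaches the report covers (and the stopword list and example document are my own), but it captures the same "select the most representative sentences" pattern:

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "is", "are", "in", "it"}

def summarize(text: str, n_sentences: int = 1) -> str:
    """Score each sentence by average word frequency and keep the top ones."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)

    def score(sentence: str) -> float:
        toks = [w for w in re.findall(r"[a-z']+", sentence.lower())
                if w not in STOPWORDS]
        return sum(freq[t] for t in toks) / max(len(toks), 1)

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Re-emit the chosen sentences in their original order.
    return " ".join(s for s in sentences if s in top)

doc = ("Machine learning systems learn patterns from data. "
       "Summarization systems compress long documents. "
       "Learning from data lets systems improve with experience.")
print(summarize(doc))
```

The deep learning approaches in the report replace the crude frequency score with learned representations of meaning, but the shape of the task is the same: rank pieces of the document, keep the most representative ones.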
Next, let’s look at our interpretability report.
Interpretable models are easier to improve
Regulators and society can better trust them to be safe and nondiscriminatory
They offer insights that can be used to change real-world outcomes for the better
We describe the Local Interpretable Model-Agnostic Explanation (LIME) algorithm
To illustrate the capability, we built a prototype where we model the likelihood of a customer churning. Without interpretability, all the model gives us is the probability that a customer will churn. As an example, we see here that customer ID 3676 has a 79% chance of churning.
When we add interpretability to the model using LIME, we can now see why a customer is assigned a particular churn probability. The factors are color-coded – the redder a factor, the higher the importance LIME has assigned to it.
Using LIME, we are able to say that the 79% churn probability is mostly driven by three factors: the customer has fiber, their contract is month-to-month, and they are a new customer.
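The LIME idea behind this kind of explanation can be sketched from scratch: perturb the customer’s features, query the black-box model, and fit a locally weighted linear model whose coefficients act as importances. This is a simplified sketch, not the report’s prototype; the feature names, synthetic data, and churn rule are all hypothetical:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
features = ["has_fiber", "month_to_month", "tenure_years"]

# Synthetic training data in which fiber + short contracts drive churn.
X = np.column_stack([
    rng.integers(0, 2, 500),      # has_fiber
    rng.integers(0, 2, 500),      # month_to_month
    rng.uniform(0, 10, 500),      # tenure_years
])
y = ((X[:, 0] + X[:, 1] - 0.2 * X[:, 2] + rng.normal(0, 0.3, 500)) > 1).astype(int)

# The black-box model whose decisions we want to explain.
black_box = RandomForestClassifier(random_state=0).fit(X, y)

customer = np.array([1.0, 1.0, 0.5])  # has fiber, month-to-month, new customer
print("churn probability:", black_box.predict_proba([customer])[0, 1])

# LIME-style local explanation: sample a neighborhood around the customer,
# weight samples by proximity, and fit an interpretable linear surrogate.
perturbed = customer + rng.normal(0, 0.5, size=(1000, 3))
preds = black_box.predict_proba(perturbed)[:, 1]
weights = np.exp(-np.linalg.norm(perturbed - customer, axis=1) ** 2)
local = Ridge().fit(perturbed, preds, sample_weight=weights)
for name, coef in zip(features, local.coef_):
    print(f"{name}: {coef:+.3f}")
```

The surrogate’s coefficients play the role of the color-coded factors on the slide: a large positive coefficient means the factor is pushing this particular customer’s churn probability up.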
Cloudera helps scale data science and ML:
Cloudera accelerates machine learning in the enterprise, from research to production.
We address uncertainty with Fast Forward Labs research and advising that cut through the hype
We address data-silo issues with our Enterprise Data Hub, which unifies collection, access, and deployment with shared security and governance
Lastly, our Data Science Workbench makes collaborative, secure data science at scale a reality for the enterprise.
SDX: shared data services (ALTUS)
Cloudera Altus lets you automate massive-scale data engineering and analytic database compute workloads in your public cloud, without the headache of managing the infrastructure yourself. At the core of Altus is Cloudera's Shared Data Experience (SDX) that eliminates data silos with persistent metadata, security, and governance.