© Cloudera, Inc. All rights reserved.
Josh YEH | Software Engineer | Cloudera
https://www.linkedin.com/in/joshyeh/
How to Build the Next-Gen
ML/AI Platform
“AI-Driven Companies will take $1.2 Trillion from
competitors by 2020”
~ Forrester Predictions 2017
AI-driven decisions will drive the insights revolution
“Facebook's machine learning systems handle more
than 200 trillion predictions and five billion
translations per day. Facebook's algorithms
automatically remove millions of fake accounts every
day.”
https://www.zdnet.com/
200 trillion / day ≈ 2.3 billion / second
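The per-second rate follows from dividing the daily figure by the 86,400 seconds in a day. A minimal sketch of that arithmetic, using only the two daily counts from the quote (the function name `per_second` is illustrative):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(per_day: float) -> float:
    """Convert a daily count into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


predictions = per_second(200e12)  # 200 trillion predictions per day
translations = per_second(5e9)    # five billion translations per day

print(f"predictions:  {predictions:,.0f} / second")
print(f"translations: {translations:,.0f} / second")
```

Averaged over a day, 200 trillion predictions works out to roughly 2.3 billion per second.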
Machine Learning Community
Machine Learning Workflow Practice
Data Science Pipeline
Model Engineering
Data Science Pipeline
Data Exploration Challenges
• Leverage existing packages: pandas, seaborn, scikit-learn,
Keras, TensorFlow…
• Version compatibility hell...
• Training and deployment environments must be set up repeatedly.
• Virtual environments vs Containerization
• Personal experience: NVIDIA GPU libraries, TensorFlow,
etc. change very aggressively.
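One way to make version drift visible, whether the environment is a virtualenv or a container image, is to snapshot the installed versions and compare them against the pins you expect. The sketch below is an illustration only: the helper names `snapshot_versions` and `find_mismatches` are hypothetical, and the package list is simply the one from the slide. It relies on the standard-library `importlib.metadata` module.

```python
from importlib.metadata import PackageNotFoundError, version

# Package names taken from the slide; any other names can be passed in.
PACKAGES = ["pandas", "seaborn", "scikit-learn", "keras", "tensorflow"]


def snapshot_versions(packages):
    """Return {package: installed version, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(expected, actual):
    """Return {package: (expected pin, installed version)} where they differ."""
    return {
        name: (pin, actual.get(name))
        for name, pin in expected.items()
        if actual.get(name) != pin
    }


if __name__ == "__main__":
    for name, ver in snapshot_versions(PACKAGES).items():
        print(f"{name}=={ver}" if ver else f"# {name} is not installed")
```

Running the same snapshot in the training and the serving environment and diffing the two results is a cheap first check before reaching for a full container rebuild.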
Model Experiments/Management
Trial and error to find the “gem”
Data Science Pipeline
Model Experiments Challenges
• Manage models
• Different hyperparameters
• Different models
• Tracking model performance
• Storage of training artifacts
• New training dataset
• Training efficiency
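The tracking challenges above can be sketched as a small append-only run log. This is a minimal illustration, not the platform's actual mechanism: the `Run` and `ExperimentLog` names, the field layout, and the JSON-lines file format are all assumptions made for the example.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Run:
    """One training attempt: which model, which settings, which data."""
    model: str
    hyperparameters: dict
    dataset: str
    metrics: dict = field(default_factory=dict)
    artifacts: list = field(default_factory=list)  # paths to saved files
    started_at: float = field(default_factory=time.time)


class ExperimentLog:
    """Append-only log of runs, stored as one JSON object per line."""

    def __init__(self, path):
        self.path = Path(path)

    def record(self, run):
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(asdict(run)) + "\n")

    def runs(self):
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def best(self, metric):
        """Return the run with the highest value of `metric`, or None."""
        scored = [r for r in self.runs() if metric in r["metrics"]]
        return max(scored, key=lambda r: r["metrics"][metric], default=None)
```

A run is recorded with `log.record(Run("gbm", {"depth": 6}, "churn-v2", metrics={"auc": 0.91}))`, and `log.best("auc")` then returns the strongest run so far. Keeping the dataset name in each record is what makes results comparable when a new training dataset arrives.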
Model Deployment
How to deploy easily at large scale, with monitoring
Data Science Pipeline
Model Deployment Challenges
• Canary testing.
• Shorten development → production cycle
• Reliability. Redundancy.
• Ease of scaling up and down.
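Canary testing, the first item in the list above, can be sketched as a router that sends a small fraction of requests to the new model and the rest to the stable one. The `CanaryRouter` class and its parameters are hypothetical, and the models are assumed to be plain callables; a production setup would usually do this at the load balancer or serving layer instead.

```python
import random


class CanaryRouter:
    """Send a fraction of prediction requests to a canary model."""

    def __init__(self, stable, canary, canary_fraction=0.05, rng=None):
        if not 0.0 <= canary_fraction <= 1.0:
            raise ValueError("canary_fraction must be between 0 and 1")
        self.stable = stable
        self.canary = canary
        self.canary_fraction = canary_fraction
        self.rng = rng or random.Random()
        # Per-variant request counts, a stand-in for real monitoring.
        self.counts = {"stable": 0, "canary": 0}

    def predict(self, features):
        """Return (variant name, prediction) for one request."""
        use_canary = self.rng.random() < self.canary_fraction
        name = "canary" if use_canary else "stable"
        self.counts[name] += 1
        model = self.canary if use_canary else self.stable
        return name, model(features)
```

Raising `canary_fraction` step by step while watching the per-variant counts and error rates shortens the path from development to production without exposing all traffic to an unproven model.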
Demonstration
Model experiments and deployment from 0 to 60
Demonstration
• Telco: Customer churn, Fraud detection, ...
• Manufacturing: Product quality, Predictive maintenance, ...
• Finance: Fraud detection, ...
• E-Commerce: Product recommendation, Fraud detection
• etc.
Data Science Trends
How can we make what we did better, faster, and more efficient?
The Revolution in Neural Network Depth
Machine Learning Trends
• Distributed Learning
• Transfer Learning
• Edge Devices with Machine Learning Capability
Thank you!
jjyeh@cloudera.com
