zekeLabs
Machine Learning at Scale
Development to Deployment
Skilling for the Future
www.zekeLabs.com
Visit : www.zekeLabs.com for more details
THANK YOU
Let us know how we can help your organization upskill its
employees to stay current in the ever-evolving IT industry.
Get in touch:
www.zekeLabs.com | +91-8095465880 | info@zekeLabs.com
Modules
1. Understanding Machine Learning Ecosystem
2. The Machine Learning Pipeline & Product stories
3. Data Challenges
4. Taking Machine Learning to Scale using Spark & Kafka
5. Knowing the Unknowns
Module 1
Understanding Machine Learning Ecosystem
● Black box Introduction to Machine Learning
● Types of Machine Learning
● Components of AI
● The AI Timeline
Black Box Introduction to ML
What is not Machine Learning?
● Rule-based approach
● Legacy systems
What is Machine Learning?
● Solves a prediction problem
● Logic is learned from examples, not written as rules
Diagram labels: Training Data, Learning Algorithm, Prediction Function (or Trained Model), Input Data
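The contrast between hand-written rules and logic learned from examples can be sketched in a few lines of plain Python. This is only an illustration: the keyword list, the word-scoring "learning algorithm" and the four training mails are all hypothetical and are not taken from the webinar.

```python
# Rule-based: a human writes the logic by hand.
def rule_based_is_spam(mail):
    banned = {"lottery", "prize"}  # hypothetical hand-picked keywords
    return any(word in banned for word in mail.lower().split())


# Learned: the logic (here, per-word scores) comes from labelled examples.
def train(examples):
    """Learning algorithm: count how often each word appears in spam vs ham."""
    scores = {}
    for mail, is_spam in examples:
        for word in mail.lower().split():
            scores[word] = scores.get(word, 0) + (1 if is_spam else -1)
    return scores


def predict(model, mail):
    """Prediction function: spam if the summed word scores are positive."""
    return sum(model.get(word, 0) for word in mail.lower().split()) > 0


# Hypothetical training data: (mail text, is_spam)
training_data = [
    ("win a free prize now", True),
    ("free lottery ticket inside", True),
    ("meeting notes attached", False),
    ("lunch at noon tomorrow", False),
]
model = train(training_data)

unseen = "claim your free ticket"
print(rule_based_is_spam(unseen))  # the fixed rules have no keyword for this mail
print(predict(model, unseen))      # the learned scores still flag it
```

The trained `model` plays the role of the "Prediction Function or Trained Model" box: it is produced by the learning algorithm from training data and then applied to new input data.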
Types of Machine Learning
Machine Learning
● Supervised - Task Driven
● Unsupervised - Data Driven
● Reinforcement - Environment Driven
Spam Mail Detection
● Input - Mail
● Output - Spam or Ham
● Supervised Machine Learning
● Binary Classification Problem
Predicting Lift Failure
● Input - Sensor Data
● Output - Failure time
● Supervised Machine Learning
● Regression Problem
Predicting Insurance Amount
● Input - Accident details
● Output - Insurance amount
● Supervised Machine Learning
● Regression Problem
Medical Diagnosis
● Input - Patient Synopsis (fever, temperature, BP, etc.)
● Output - Diagnosis
● Supervised Machine Learning
● Multi-class Classification Problem
Question - What is common between them ?
Market Segmentation
● Input - Customer Details
● Output - Clusters
● Unsupervised Machine Learning
● Clustering Problem
Robot playing Football
● Input - Player information,
Rewards
● Output - Action to score
● Reinforcement Learning
What does AI consist of ?
The A.I. Timeline
Module 2
Machine Learning Pipeline
● Understanding Machine Learning Pipeline
● User Story - Automating customer support
● Implementation
● User Story - Fast Query Chatbots
● Implementation
Machine Learning Pipeline
Machine Learning Pipeline - Business Understanding
● Business understanding means being clear about what you are trying to achieve.
● Machine learning does not work well with very small data sets.
● Consolidate the data pipeline to channel a continuous flow of data.
● Typical sources: web scraping, data lake access, REST APIs, etc.
Machine Learning Pipeline - Data Wrangling
● Production data is never clean.
● It takes major effort (around 70% of the total) to make it ready for the next stage.
● Wrangling means transforming & mapping data from its raw format into a format ready for the next stage.
Machine Learning Pipeline - Data Visualization
● Visualization makes it easy to grasp difficult concepts
● Find useful patterns in the data
● Interactively drill down into charts for deeper details
Machine Learning Pipeline - Data Preprocessing
Feature Extraction
Vectors - fixed-length arrays of numbers, extracted from:
● Text documents
● Image files
● CSV
● Audio
● Video
● Time Series data
● Many more ...
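For text, one common way to obtain fixed-length vectors is a bag-of-words count. The sketch below is a minimal plain-Python version under the assumption of whitespace tokenization; the function names and the three-document corpus are hypothetical.

```python
def build_vocabulary(documents):
    """Map every distinct token in the corpus to a fixed column index."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def vectorize(document, vocabulary):
    """Turn one document into a fixed-length list of token counts."""
    vector = [0] * len(vocabulary)
    for tok in document.lower().split():
        if tok in vocabulary:  # tokens not in the vocabulary are ignored
            vector[vocabulary[tok]] += 1
    return vector


# Hypothetical corpus
corpus = ["the lift is stuck", "the lift is fine", "stuck again"]
vocab = build_vocabulary(corpus)
vectors = [vectorize(doc, vocab) for doc in corpus]

for doc, vec in zip(corpus, vectors):
    print(vec, doc)
```

Every document, whatever its length, ends up as a vector with one entry per vocabulary token, which is the shape a learning algorithm expects.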
Machine Learning Pipeline - Model Training
Learning Algorithm (Regression / Trees / SVM / Naive Bayes / Neural Networks)
Prediction Function or Trained Model
Machine Learning Pipeline - Learning Algorithms
● Linear Regression
● Logistic Regression
● Naive Bayes
● Nearest Neighbors
● Decision Trees
● Ensemble Methods
● Clustering
● Support Vector Machines
● Neural Networks
● CNN
● RNN
● GAN
Diagram labels: Prediction Function (or Trained Model), Prediction
Machine Learning Pipeline - Model Validation
● Training with different learning methods gives you different trained models.
● Each model also has a huge number of possible configurations (hyper-parameters).
● Finding the best model, and the best configuration for it, among all the possibilities is done as part of Model Validation.
● If the results are not satisfactory, one has to go back up the chain & fix a few things.
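A minimal sketch of this search, assuming a hold-out validation set and a 1-D nearest-neighbors classifier as the model being tuned. The slides do not name a validation method; the data, the candidate values of `k` and all function names here are hypothetical.

```python
def knn_predict(train, x, k):
    """Classify x by majority vote among its k nearest training points (1-D)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def accuracy(train, validation, k):
    """Fraction of validation points that the model with this k gets right."""
    hits = sum(knn_predict(train, x, k) == y for x, y in validation)
    return hits / len(validation)


# Toy data as (feature, label). The point (4.5, 1) is a deliberately noisy label.
train_set = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 0), (4.5, 1),
             (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 1)]
validation_set = [(1.5, 0), (3.5, 0), (4.4, 0), (6.5, 1), (8.5, 1)]

# Try every candidate configuration and keep the one that validates best.
candidates = [1, 3, 5]
scores = {k: accuracy(train_set, validation_set, k) for k in candidates}
best_k = max(scores, key=scores.get)

print(scores)
print("best k:", best_k)
```

With `k=1` the noisy training point misleads one validation example, while larger values of `k` vote it down; the validation score is what exposes that difference.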
Machine Learning Pipeline - Deployment
Diagram labels: Consumers, RESTful Interface, Trained Model (or Interface Model)
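One way to picture a trained model behind a RESTful interface, using only the Python standard library. This is a sketch, not the deployment used in the webinar: the `/predict` route, the port, the JSON field names and the toy weights are all assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical "trained model": weights of a tiny linear scorer.
WEIGHTS = {"free": 2.0, "prize": 1.5, "meeting": -2.0}


def predict(text):
    """Prediction function wrapped by the REST interface."""
    score = sum(WEIGHTS.get(tok, 0.0) for tok in text.lower().split())
    return "spam" if score > 0 else "ham"


def handle_predict(body):
    """Turn a JSON request body into a JSON response body (both bytes)."""
    payload = json.loads(body)
    return json.dumps({"label": predict(payload["text"])}).encode("utf-8")


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":  # hypothetical route name
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        response = handle_predict(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def serve(port=8000):
    """Start the server; blocks until interrupted. The port is an arbitrary choice."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Calling `serve()` starts the endpoint; a consumer would then POST a body such as `{"text": "free prize"}` to `/predict` and read the label from the JSON response. Keeping `handle_predict` separate from the HTTP handler lets the model logic be tested without a running server.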
1. User Story : Customer Service Industry
1. Implementation : Customer Service Industry
● Reduce the manual effort of classifying reviews; channel data from the web server to the analytics engine.
● Get data ready for visualization; historical data shows past trends.
● Visualization of trends.
● Text needs to be tokenized & vectorized.
● Different models were trained: Naive Bayes, SGD Classifier.
● Choose the best model with the best hyper-parameters.
● Naive Bayes (MultinomialNB) was chosen & put into deployment.
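To make the chosen model concrete, here is a from-scratch sketch of multinomial Naive Bayes on tokenized review text. It is not the code used in the project (the name MultinomialNB suggests a library implementation was used there); the reviews, the category names and the smoothing value are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(documents, labels, alpha=1.0):
    """Fit class priors and per-class token counts from raw text."""
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    for doc, label in zip(documents, labels):
        token_counts[label].update(doc.lower().split())  # whitespace tokenization
    vocabulary = {tok for counts in token_counts.values() for tok in counts}
    return {
        "priors": {c: n / len(labels) for c, n in class_counts.items()},
        "token_counts": token_counts,
        "vocabulary": vocabulary,
        "alpha": alpha,
    }


def predict(model, document):
    """Return the class with the highest log-posterior for the document."""
    best_label, best_score = None, -math.inf
    vocab_size = len(model["vocabulary"])
    alpha = model["alpha"]
    for label, prior in model["priors"].items():
        counts = model["token_counts"][label]
        total = sum(counts.values())
        score = math.log(prior)
        for tok in document.lower().split():
            if tok not in model["vocabulary"]:
                continue  # ignore tokens never seen in training
            # Add-alpha (Laplace) smoothing avoids zero probabilities.
            score += math.log((counts[tok] + alpha) / (total + alpha * vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labelled reviews
reviews = [
    "refund not received yet",
    "refund delayed again",
    "app crashes on login",
    "login page crashes often",
]
categories = ["billing", "billing", "technical", "technical"]
model = train_multinomial_nb(reviews, categories)

print(predict(model, "my refund is delayed"))
print(predict(model, "login crashes"))
```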
2. User Story : Fast Query Chatbots
2. Implementation : Fast Query Chatbots
● Reduce the manual effort of understanding text queries; waiting for BI has a long turnaround time; we are trying to solve this with a chatbot.
● Get data ready for visualization; historical data shows past trends.
● Visualization of trends of text & SQL.
● Raw text cannot be used for ML; it needs to be tokenized & vectorized.
● Deep learning models with different layer configurations.
● Choose the best model with the best hyper-parameters.
● The model with the best configuration was chosen & put into deployment.
3. User Story : Preventing System Failure
Module 3
Data Challenges
● Optimal data size
● Identify data sources
● Identify what is useful in data
● Cleaning data to extract useful information
● Tools & Libraries to clean & extract useful information
Optimal Data size for AI product
● Expectation from a predictor - moderate bias & moderate variance.
● Predictor validation is important.
● More data makes the model better, but only up to a limit.
Identify Data Sources
● There is no fixed order between identifying the problem statement & identifying data sources.
● Innovation in this space can happen both ways - top-down & bottom-up.
● Data can be historical batch data stored in RDBMS & NoSQL DBs.
● Data can also be live-streamed using Kafka.
Identify what is useful in data
Cleaning data to extract useful information
Tools vs Libraries
● Data cleaning tools are available in the market.
● Why don’t they work in the long run?
● Data cleaning libraries are also available.
● Why are more and more enterprises embracing libraries?
Changes with change in volume of data
Spark vs Other technologies
● Big Data Compute Framework
● Do data cleaning at scale; capacity grows by adding nodes to the cluster
● Talk to different data sources
Module 4
Machine Learning Pipeline at Scale
● Machine Learning Pipeline using Spark
● Spark - A very social technology
● Spark for Big Data Cleaning & Wrangling
● Spark for building ML models at Scale
● Validation & monitoring of models
● Deployment through a REST interface using Apache Livy
Machine Learning Pipeline using Spark
Spark - A very social technology
Preprocessing Data at Scale
● Scaling
● CountVectorizer
● Binning
● … many things can be done at scale using Spark
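Spark ML ships these operations as distributed feature transformers. The sketch below only shows what scaling and binning compute, on a single machine in plain Python; it is not Spark code, and the function names, ages and bucket edges are hypothetical.

```python
def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def bin_values(values, edges):
    """Replace each value by the index of the bucket it falls into.

    Bucket i covers edges[i] <= value < edges[i + 1].
    """
    binned = []
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                binned.append(i)
                break
        else:
            raise ValueError(f"{v} is outside the bucket edges")
    return binned


# Hypothetical column of ages
ages = [18, 25, 40, 62]
scaled = min_max_scale(ages)
buckets = bin_values(ages, edges=[0, 20, 40, 60, 100])

print(scaled)
print(buckets)
```

At scale the same logic is applied partition by partition across the cluster, with the minimum and maximum (for scaling) gathered in a prior pass over the data.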
Training Models using Spark
● Distributed Model Training using Spark
● Regression
● Classification
● Clustering
● Recommendation Engine
Building Data Pipeline in Spark
● Spark provides built-in Transformers & Estimators.
● Pipeline can be built to connect transformers & estimators.
● Machine Learning Pipeline can be automated.
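The Transformer / Estimator / Pipeline idea can be sketched without Spark. In this plain-Python illustration a Transformer only has `transform()`, an Estimator has `fit()` and returns a fitted Transformer, and a Pipeline chains the two. The class names below are hypothetical stand-ins and do not reproduce the Spark API.

```python
class Tokenizer:
    """Transformer: stateless, only has transform()."""

    def transform(self, rows):
        return [row.lower().split() for row in rows]


class VocabularyModel:
    """Transformer produced by fitting: maps token lists to count vectors."""

    def __init__(self, vocabulary):
        self.vocabulary = vocabulary

    def transform(self, rows):
        vectors = []
        for tokens in rows:
            vector = [0] * len(self.vocabulary)
            for tok in tokens:
                if tok in self.vocabulary:
                    vector[self.vocabulary[tok]] += 1
            vectors.append(vector)
        return vectors


class VocabularyEstimator:
    """Estimator: fit() learns from data and returns a Transformer."""

    def fit(self, rows):
        tokens = sorted({tok for row in rows for tok in row})
        return VocabularyModel({tok: i for i, tok in enumerate(tokens)})


class PipelineModel:
    """Fitted pipeline: every stage is a Transformer."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline:
    """Chains stages; fitting turns every Estimator into its fitted model."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(rows)   # Estimator -> Transformer
            rows = stage.transform(rows)  # feed the output to the next stage
            fitted.append(stage)
        return PipelineModel(fitted)


pipeline = Pipeline([Tokenizer(), VocabularyEstimator()])
pipeline_model = pipeline.fit(["Spark is fast", "Kafka is fast"])
features = pipeline_model.transform(["spark spark kafka"])
print(features)
```

Because the fitted pipeline is itself a single object with `transform()`, the whole preprocessing chain can be re-run on new data in one call, which is what makes the pipeline easy to automate.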
REST Interface to Spark
Module 5
Knowing the Unknowns
● Implementing Transformers & Estimators on Spark
● Deep Learning using Spark
● Are models retrainable?
● The skilling journey
● Introducing Apache Beam
Transformers & Estimators on Spark
● Building Custom Transformers
● Building Custom Estimators
What is Deep Learning ?
● A specialized learning technique.
● Rather than us choosing the features to learn from, this technique derives the important features itself.
● The objective is to learn the best derived features for prediction.
● It is loosely inspired by the way our brain learns.
● Very useful for natural language, computer vision, audio, video, etc.
Do you always need Deep Learning ?
● More data is required for Deep Learning
● More Compute Power
● Models less interpretable
“Don’t kill a mosquito with a cannon ball”
Don’t use Deep Learning if you don’t need to
Deep Learning using Spark
● Which one to choose - Distributed TensorFlow or DL using Spark?
● Libraries such as spark-dl & elephas
Are models re-trainable ?
● Online learning models in scikit-learn - SGDClassifier, Multinomial Naive Bayes
● Spark ML models are not online learning models
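An online learning model is one that can be updated with new batches without retraining from scratch; scikit-learn exposes this through a `partial_fit` method on the two models named above. The sketch below shows the idea with a plain-Python perceptron rather than scikit-learn itself; the class name and the data are hypothetical.

```python
class OnlinePerceptron:
    """Linear classifier that can be updated batch by batch."""

    def __init__(self, n_features, learning_rate=1.0):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_one(self, x):
        activation = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1 if activation > 0 else 0

    def partial_fit(self, X, y):
        """Update the existing weights with a new batch; nothing is reset."""
        for x, target in zip(X, y):
            error = target - self.predict_one(x)  # -1, 0 or +1
            if error:
                self.bias += self.learning_rate * error
                self.weights = [w + self.learning_rate * error * xi
                                for w, xi in zip(self.weights, x)]
        return self


model = OnlinePerceptron(n_features=2)
# The first batch arrives.
model.partial_fit([[1.0, 0.0], [0.0, 1.0]], [1, 0])
# Later, more data arrives; the same model keeps learning from where it left off.
model.partial_fit([[2.0, 0.0], [0.0, 2.0]], [1, 0])

print(model.weights, model.bias)
```

A model without this kind of incremental update has to be refitted on the full data set whenever new data arrives.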
Skilling Journey
Apache Beam - Probably our next webinar
● Apache Beam is an evolution of the Dataflow model created by Google to process massive amounts of data.
● The name Beam (Batch + strEAM) comes from the idea of having a unified model for both batch and stream data processing.
● Programs written using Beam can be executed on different processing frameworks such as Spark and Flink (via runners), using a set of different IOs.
Q & A
Additional
Components of any AI product
Data, Compute, Talent
Where has AI got into business?
Imp : Advice to executives about AI
● Everybody should embrace the modern capabilities of AI; on the other hand, they should also think about their business-specific problems. Not every tool the AI community develops will suit them.
● The biggest challenge is people change, not technology change; the biggest gap now is people who can map technology to business problems.
● Insourcing vs outsourcing. Building a team vs using enterprise solutions.
● AI will change everything in the next few decades. Be a part of it.
Challenges - Data & Security
● Volume of data - Machine learning on small data is infeasible.
● Accessibility of data - Important data may not be accessible & may be in an encrypted format.
Compute, Storage & Network Power
● AI products need data gathered from sensors, servers, etc.
● Once gathered, data needs to be stored for further processing.
● Learning algorithms & data processing activities need a lot of compute power.
Infrastructure for development
● Finding the best model is an iterative process.
● More experiments lead to a better model.
● Hyper-parameter tuning
● Scaled infrastructure for developers is important.
Infrastructure for deployment
● Speedy Deployment.
● Easy deployment
● Fluctuating Demand.
● Need of Elastic infrastructure.
● Cost optimization.
Summary of challenges
Cost optimization:
● Use Open Source alternatives
● Infrastructure optimization
● Don’t reinvent the wheel
Module 3
Impact of AI
● Will AI benefit humans?
● AI in human computer interaction
● Impact of AI on business
● Impact on workplace
● Impact on society
AI benefits for humans - social, environmental
● Predicting diseases
● 60% of people would prefer AI assistance over humans as financial advisors or tax preparers
● 71% of people believe that AI will help humans solve complex problems and live more enriched lives
AI assistants
● Saves Time
● Calendar events reminder
● Helps get things done
Impact of AI on business
More
AI advisor & manager at workplace
Impact on Decision Makers
● Adoption of AI advisors
What can be outsourced to AI assistant
Impact of artificial intelligence on society
● People are averse to the idea of availing annual health check-ups at home with a robotic smart kit (77%) or having chatbot assistant teachers in universities/colleges that lower the cost of overall tuition (61%).
● Responsible AI ensures that its workings are aligned to ethical standards and social norms pertinent within its scope of operations.
● Explainable AI is responsible for building AI models with accountability and the ability to describe or depict why a certain decision was made by the algorithm.
Module 4
Identify right tools
● Programming Language
● Open source libraries
● Infrastructure Optimizations
● Other alternatives
Choose the right Programming Language
Why Python makes life easy ?
● Easy to learn for ETL developers
● Integrates very well with other technologies
● Full-stack development -
○ Dashboards using Bokeh,
○ Web applications using Django,
○ Machine learning models using scikit-learn,
○ Scaling using PySpark
Choose appropriate Libraries
- Statistical Modeling & Data Processing
Choose appropriate Libraries
- Visualization
Choose appropriate Libraries
- Machine Learning or Deep Learning
Infrastructure Optimization
Monolithic or Serverless
Monolithic Infrastructure - Preallocated Infra
Model Training
● Developers request access
whenever required
● Might incur delay in peak
working hours.
● Idle in non-working hours
Model Interfacing
● Idle in non-peak hours.
● May fall short in spikes.
● Pay even if infra is not used
Serverless Infrastructure - Elastic Allocation
Model Training
● No-preallocation
● Pay only for what you use
● Absolutely no idle time for infra
● No wait time for developers
Model Interfacing
● Allocate infra only when required
● Scales down during non-peak
hours
● Improved customer experience
even in peak hours
Serverless Infrastructure Solutions
● Open Function as a Service (OpenFaaS)
● AWS Lambda
● Google Cloud Functions
● Azure Functions
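As an illustration of model interfacing on serverless infrastructure, here is a sketch of a prediction function in the handler shape AWS Lambda uses for Python. The event layout (a JSON string under `"body"`, as sent by an HTTP front end), the feature names and the model parameters are assumptions made for this example.

```python
import json

# Hypothetical model parameters; a real function would load them from storage.
WEIGHTS = {"sensor_temp": 0.8, "vibration": 1.2}
BIAS = -50.0


def lambda_handler(event, context):
    """Entry point: invoked once per request, no server to keep running."""
    features = json.loads(event["body"])  # assumes an HTTP-style event with a JSON body
    score = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return {
        "statusCode": 200,
        "body": json.dumps({"failure_risk": score > 0}),
    }


# Local call with a hand-built event; the context argument is unused here.
sample_event = {"body": json.dumps({"sensor_temp": 70, "vibration": 10})}
print(lambda_handler(sample_event, None))
```

The platform allocates compute only while the handler runs, which is where the "pay only for what you use" property comes from.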
Distributed Machine Learning using Spark
● Apache Spark is a distributed data processing framework.
● Many machine learning algorithms are implemented in Spark.
● Most of the APIs resemble those of scikit-learn.
● Scaled ETL & machine learning can be done using Spark.
Other alternatives
Google Cloud AI
Module 5
Build AI Team
● Adoption of AI
● Skills
● Hiring or upskilling
● Upskilling workforce
Adoption Strategy
● Build Business Case
● Create Data-Driven Culture
● Scale Efficiently
Skills
Talent Acquisition
● Upskill your current team ?
Upskilling workforce
● It’s possible to make use of the people who have delivered for you in the
past.
Q & A
Repositories
● https://github.com/zekelabs/machine-learning-for-beginners
● https://github.com/zekelabs/tensorflow-tutorial/
● Dog breed prediction - https://www.edyoda.com/resources/watch/54AEA4CDC35394F1183A9DD17AA47/
● Python learning course - https://www.edyoda.com/resources/videolisting/98/

More Related Content

What's hot

Serverless Summit India 2017: Fission
Serverless Summit India 2017: FissionServerless Summit India 2017: Fission
Serverless Summit India 2017: Fission
Vishal Biyani
 
Creating a Kubernetes Operator in Java
Creating a Kubernetes Operator in JavaCreating a Kubernetes Operator in Java
Creating a Kubernetes Operator in Java
Rudy De Busscher
 
Performance improvements in etcd 3.5 release
Performance improvements in etcd 3.5 releasePerformance improvements in etcd 3.5 release
Performance improvements in etcd 3.5 release
LibbySchulze
 
[Lakmal] Automate Microservice to API
[Lakmal] Automate Microservice to API[Lakmal] Automate Microservice to API
[Lakmal] Automate Microservice to API
Lakmal Warusawithana
 
Container Patterns
Container PatternsContainer Patterns
Container Patterns
Matthias Luebken
 
Microservices with Spring
Microservices with SpringMicroservices with Spring
Microservices with Spring
Carlos Cavero Barca
 
Serverless Functions: Accelerating DevOps Adoption
Serverless Functions: Accelerating DevOps AdoptionServerless Functions: Accelerating DevOps Adoption
Serverless Functions: Accelerating DevOps Adoption
All Things Open
 
Make Java Microservices Resilient with Istio - Mangesh - IBM - CC18
Make Java Microservices Resilient with Istio - Mangesh - IBM - CC18Make Java Microservices Resilient with Istio - Mangesh - IBM - CC18
Make Java Microservices Resilient with Istio - Mangesh - IBM - CC18
CodeOps Technologies LLP
 
Containers and OpenStack - A Happy Marriage - Madhuri - Intel - CC18
Containers and OpenStack - A Happy Marriage - Madhuri - Intel - CC18Containers and OpenStack - A Happy Marriage - Madhuri - Intel - CC18
Containers and OpenStack - A Happy Marriage - Madhuri - Intel - CC18
CodeOps Technologies LLP
 
A Look into the Mirror: Patterns and Best Practices for MirrorMaker2 | Cliff ...
A Look into the Mirror: Patterns and Best Practices for MirrorMaker2 | Cliff ...A Look into the Mirror: Patterns and Best Practices for MirrorMaker2 | Cliff ...
A Look into the Mirror: Patterns and Best Practices for MirrorMaker2 | Cliff ...
HostedbyConfluent
 
How kubernetes operators can rescue dev secops in midst of a pandemic updated
How kubernetes operators can rescue dev secops in midst of a pandemic updatedHow kubernetes operators can rescue dev secops in midst of a pandemic updated
How kubernetes operators can rescue dev secops in midst of a pandemic updated
Shikha Srivastava
 
Advanced dev ops governance with terraform
Advanced dev ops governance with terraformAdvanced dev ops governance with terraform
Advanced dev ops governance with terraform
James Counts
 
Breaking the Monolith: Organizing Your Team to Embrace Microservices
Breaking the Monolith: Organizing Your Team to Embrace MicroservicesBreaking the Monolith: Organizing Your Team to Embrace Microservices
Breaking the Monolith: Organizing Your Team to Embrace Microservices
Paul Osman
 
Cloudsolutionday 2016: Getting Started with Severless Architecture
Cloudsolutionday 2016: Getting Started with Severless ArchitectureCloudsolutionday 2016: Getting Started with Severless Architecture
Cloudsolutionday 2016: Getting Started with Severless Architecture
AWS Vietnam Community
 
Containers and Kubernetes
Containers and KubernetesContainers and Kubernetes
Containers and Kubernetes
Altoros
 
Deeplearning and dev ops azure
Deeplearning and dev ops azureDeeplearning and dev ops azure
Deeplearning and dev ops azure
Vishwas N
 
Distributed architecture in a cloud native microservices ecosystem
Distributed architecture in a cloud native microservices ecosystemDistributed architecture in a cloud native microservices ecosystem
Distributed architecture in a cloud native microservices ecosystem
Zhenzhong Xu
 
Manage thousands of k8s applications with minimal efforts using kube carrier
Manage thousands of k8s applications with minimal efforts using kube carrierManage thousands of k8s applications with minimal efforts using kube carrier
Manage thousands of k8s applications with minimal efforts using kube carrier
LibbySchulze
 
Serverless java
Serverless   javaServerless   java
Serverless java
Vishwas N
 
Cost-effective Compute Clusters with Spot and Pre-emptible Instances - KubeCo...
Cost-effective Compute Clusters with Spot and Pre-emptible Instances - KubeCo...Cost-effective Compute Clusters with Spot and Pre-emptible Instances - KubeCo...
Cost-effective Compute Clusters with Spot and Pre-emptible Instances - KubeCo...
Platform9
 

What's hot (20)

Serverless Summit India 2017: Fission
Serverless Summit India 2017: FissionServerless Summit India 2017: Fission
Serverless Summit India 2017: Fission
 
Creating a Kubernetes Operator in Java
Creating a Kubernetes Operator in JavaCreating a Kubernetes Operator in Java
Creating a Kubernetes Operator in Java
 
Performance improvements in etcd 3.5 release
Performance improvements in etcd 3.5 releasePerformance improvements in etcd 3.5 release
Performance improvements in etcd 3.5 release
 
[Lakmal] Automate Microservice to API
[Lakmal] Automate Microservice to API[Lakmal] Automate Microservice to API
[Lakmal] Automate Microservice to API
 
Container Patterns
Container PatternsContainer Patterns
Container Patterns
 
Microservices with Spring
Microservices with SpringMicroservices with Spring
Microservices with Spring
 
Serverless Functions: Accelerating DevOps Adoption
Serverless Functions: Accelerating DevOps AdoptionServerless Functions: Accelerating DevOps Adoption
Serverless Functions: Accelerating DevOps Adoption
 
Make Java Microservices Resilient with Istio - Mangesh - IBM - CC18
Make Java Microservices Resilient with Istio - Mangesh - IBM - CC18Make Java Microservices Resilient with Istio - Mangesh - IBM - CC18
Make Java Microservices Resilient with Istio - Mangesh - IBM - CC18
 
Containers and OpenStack - A Happy Marriage - Madhuri - Intel - CC18
Containers and OpenStack - A Happy Marriage - Madhuri - Intel - CC18Containers and OpenStack - A Happy Marriage - Madhuri - Intel - CC18
Containers and OpenStack - A Happy Marriage - Madhuri - Intel - CC18
 
A Look into the Mirror: Patterns and Best Practices for MirrorMaker2 | Cliff ...
A Look into the Mirror: Patterns and Best Practices for MirrorMaker2 | Cliff ...A Look into the Mirror: Patterns and Best Practices for MirrorMaker2 | Cliff ...
A Look into the Mirror: Patterns and Best Practices for MirrorMaker2 | Cliff ...
 
How kubernetes operators can rescue dev secops in midst of a pandemic updated
How kubernetes operators can rescue dev secops in midst of a pandemic updatedHow kubernetes operators can rescue dev secops in midst of a pandemic updated
How kubernetes operators can rescue dev secops in midst of a pandemic updated
 
Advanced dev ops governance with terraform
Advanced dev ops governance with terraformAdvanced dev ops governance with terraform
Advanced dev ops governance with terraform
 
Breaking the Monolith: Organizing Your Team to Embrace Microservices
Breaking the Monolith: Organizing Your Team to Embrace MicroservicesBreaking the Monolith: Organizing Your Team to Embrace Microservices
Breaking the Monolith: Organizing Your Team to Embrace Microservices
 
Cloudsolutionday 2016: Getting Started with Severless Architecture
Cloudsolutionday 2016: Getting Started with Severless ArchitectureCloudsolutionday 2016: Getting Started with Severless Architecture
Cloudsolutionday 2016: Getting Started with Severless Architecture
 
Containers and Kubernetes
Containers and KubernetesContainers and Kubernetes
Containers and Kubernetes
 
Deeplearning and dev ops azure
Deeplearning and dev ops azureDeeplearning and dev ops azure
Deeplearning and dev ops azure
 
Distributed architecture in a cloud native microservices ecosystem
Distributed architecture in a cloud native microservices ecosystemDistributed architecture in a cloud native microservices ecosystem
Distributed architecture in a cloud native microservices ecosystem
 
Manage thousands of k8s applications with minimal efforts using kube carrier
Manage thousands of k8s applications with minimal efforts using kube carrierManage thousands of k8s applications with minimal efforts using kube carrier
Manage thousands of k8s applications with minimal efforts using kube carrier
 
Serverless java
Serverless   javaServerless   java
Serverless java
 
Cost-effective Compute Clusters with Spot and Pre-emptible Instances - KubeCo...
Cost-effective Compute Clusters with Spot and Pre-emptible Instances - KubeCo...Cost-effective Compute Clusters with Spot and Pre-emptible Instances - KubeCo...
Cost-effective Compute Clusters with Spot and Pre-emptible Instances - KubeCo...
 

Similar to Machine learning at scale - Webinar By zekeLabs

AI hype or reality
AI  hype or realityAI  hype or reality
AI hype or reality
Awantik Das
 
Moving from BI to AI : For decision makers
Moving from BI to AI : For decision makersMoving from BI to AI : For decision makers
Moving from BI to AI : For decision makers
zekeLabs Technologies
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learning
Rajesh Muppalla
 
Serverless Machine Learning
Serverless Machine LearningServerless Machine Learning
Serverless Machine Learning
Asavari Tayal
 
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
All Things Open
 
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
dtz001
 
World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018
Adam Gibson
 
Moving from BI to AI : For decision makers
Moving from BI to AI : For decision makersMoving from BI to AI : For decision makers
Moving from BI to AI : For decision makers
zekeLabs Technologies
 
Introduction to Machine Learning - WeCloudData
Introduction to Machine Learning - WeCloudDataIntroduction to Machine Learning - WeCloudData
Introduction to Machine Learning - WeCloudData
WeCloudData
 
Introduction to Machine Learning - WeCloudData
Introduction to Machine Learning - WeCloudDataIntroduction to Machine Learning - WeCloudData
Introduction to Machine Learning - WeCloudData
WeCloudData
 
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Anyscale
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine Learning
Provectus
 
What is Data as a Service by T-Mobile Principle Technical PM
What is Data as a Service by T-Mobile Principle Technical PMWhat is Data as a Service by T-Mobile Principle Technical PM
What is Data as a Service by T-Mobile Principle Technical PM
Product School
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
Ramiro Aduviri Velasco
 
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine LearningPaige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
Edunomica
 
BigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQLBigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQL
Márton Kodok
 
Machine learning
Machine learningMachine learning
Machine learning
Saravanan Subburayal
 
MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in Production
Provectus
 
The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?
Ivo Andreev
 
DevOps for DataScience
DevOps for DataScienceDevOps for DataScience
DevOps for DataScience
Stepan Pushkarev
 

Similar to Machine learning at scale - Webinar By zekeLabs (20)

AI hype or reality
AI  hype or realityAI  hype or reality
AI hype or reality
 
Moving from BI to AI : For decision makers
Moving from BI to AI : For decision makersMoving from BI to AI : For decision makers
Moving from BI to AI : For decision makers
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learning
 
Serverless Machine Learning
Serverless Machine LearningServerless Machine Learning
Serverless Machine Learning
 
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
 
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
 
World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018
 
Moving from BI to AI : For decision makers
Moving from BI to AI : For decision makersMoving from BI to AI : For decision makers
Moving from BI to AI : For decision makers
 
Introduction to Machine Learning - WeCloudData
Introduction to Machine Learning - WeCloudDataIntroduction to Machine Learning - WeCloudData
Introduction to Machine Learning - WeCloudData
 
Introduction to Machine Learning - WeCloudData
Introduction to Machine Learning - WeCloudDataIntroduction to Machine Learning - WeCloudData
Introduction to Machine Learning - WeCloudData
 
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine Learning
 
What is Data as a Service by T-Mobile Principle Technical PM
What is Data as a Service by T-Mobile Principle Technical PMWhat is Data as a Service by T-Mobile Principle Technical PM
What is Data as a Service by T-Mobile Principle Technical PM
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine LearningPaige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
 
BigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQLBigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQL
 
Machine learning
Machine learningMachine learning
Machine learning
 
MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in Production
 
The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?
 
DevOps for DataScience
DevOps for DataScienceDevOps for DataScience
DevOps for DataScience
 





Machine learning at scale - Webinar By zekeLabs

  • 1. zekeLabs: Machine Learning at Scale, Development to Deployment. Skilling for the Future. www.zekeLabs.com
  • 2. Visit www.zekeLabs.com for more details. THANK YOU. Let us know how we can help your organization upskill its employees to stay current in the ever-evolving IT industry. Get in touch: www.zekeLabs.com | +91-8095465880 | info@zekeLabs.com
  • 4. Modules 1. Understanding Machine Learning Ecosystem 2. The Machine Learning Pipeline & Product stories 3. Data Challenges 4. Taking Machine Learning to Scale using Spark & Kafka 5. Knowing the Unknowns
  • 5. Module 1 Understanding Machine Learning Ecosystem ● Black box Introduction to Machine Learning ● Types of Machine Learning ● Components of AI ● The AI Timeline
  • 7. What is not Machine Learning ? ● Rule Based Approach ● Legacy Systems
  • 8. What is Machine Learning ? ● Solves prediction problems ● Logic is learned from examples & not written as rules ● Diagram labels: Training Data, Learning Algorithm, Prediction Function or Trained Model, Input Data
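To make "logic is learned from examples & not by rules" concrete, here is a minimal sketch in plain Python. The data, the function names and the choice of least-squares line fitting are illustrative assumptions and do not come from the slides: a learning algorithm receives training data and returns a prediction function, and the rule `y = 2x + 1` is never written in the code.

```python
from statistics import mean


def train_linear_model(xs, ys):
    """Learning algorithm: fit y = slope * x + intercept by least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar

    def predict(x):
        # The "trained model": a prediction function built from examples.
        return slope * x + intercept

    return predict


# Hypothetical training data generated by y = 2x + 1.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

model = train_linear_model(train_x, train_y)
print(model(10.0))  # prediction for an input that was not in the training data
```

A rule-based system would hard-code the slope and intercept; here they are recovered from the examples alone.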
  • 9. Types of Machine Learning ● Supervised (Task Driven) ● Unsupervised (Data Driven) ● Reinforcement (Environment Driven)
  • 10. Spam Mail Detection ● Input - Mail ● Output - Spam or Ham ● Supervised Machine Learning ● Binary Classification Problem
  • 11. Predicting Lift Failure ● Input - Sensor Data ● Output - Failure time ● Supervised Machine Learning ● Regression Problem
  • 12. Predicting Insurance Amount ● Input - Accident details ● Output - Insurance amount ● Supervised Machine Learning ● Regression Problem
  • 13. Medical Diagnosis ● Input - Patient Synopsis (fever, temperature, BP, etc.) ● Output - Diagnosis ● Supervised Machine Learning ● Multi-class Classification Problem
  • 14. Question - What is common between them ?
  • 15. Market Segmentation ● Input - Customer Details ● Output - Clusters ● Unsupervised Machine Learning ● Clustering Problem
  • 16. Robot playing Football ● Input - Player information, Rewards ● Output - Action to score ● Reinforcement Learning
  • 17. What does AI consist of ?
  • 19. Module 2 Machine Learning Pipeline ● Understanding Machine Learning Pipeline ● User Story - Automating customer support ● Implementation ● User Story - Fast Query Chatbots ● Implementation
  • 21. Machine Learning Pipeline - Business Understanding ● Business understanding means being clear about what you are trying to achieve. ● Machine learning is rarely effective with very small data sets. ● Consolidate the data pipeline to channel a continuous flow of data. ● Typical sources: web scraping, data lake access, REST APIs, etc.
  • 22. Machine Learning Pipeline - Data Wrangling ● Production data is never clean. ● Making it ready for the next stage takes major effort (around 70% of the total effort). ● Wrangling transforms & maps data from its raw format into a format ready for the next stage.
  • 23. Machine Learning Pipeline - Data Visualization ● Visualization makes difficult concepts easier to grasp ● Find useful patterns in the data ● Interactively drill down into charts for deeper details
  • 24. Machine Learning Pipeline - Data Preprocessing ● Feature Extraction turns raw inputs into vectors (fixed-length arrays of numbers) ● Inputs: text documents, image files, CSV, audio, video, time series data, and many more
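As an illustration of turning text documents into fixed-length arrays of numbers, the following is a small bag-of-words sketch in plain Python. The tokenizer, the function names and the two sample documents are assumptions made for this example; libraries such as scikit-learn and Spark ship their own `CountVectorizer` implementations, which this does not reproduce.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents):
    """Map every distinct token to a fixed column index."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(text, vocabulary):
    """Return a count vector of length len(vocabulary); unknown tokens are ignored."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


# Hypothetical documents.
docs = ["Great product, great support", "Support was slow"]
vocab = build_vocabulary(docs)
vectors = [vectorize(d, vocab) for d in docs]
print(vocab)
print(vectors)
```

Every document, whatever its length, ends up as a vector with one slot per vocabulary word, which is the shape a learning algorithm expects.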
  • 25. Machine Learning Pipeline - Model Training ● Learning Algorithm: Regression / Trees / SVM / Naive Bayes / Neural Networks ● Output: Prediction Function or Trained Model
  • 26. Machine Learning Pipeline - Learning Algorithms ● Linear Regression ● Logistic Regression ● Naive Bayes ● Nearest Neighbors ● Decision Trees ● Ensemble Methods ● Clustering ● Support Vector Machines ● Neural Networks ● CNN ● RNN ● GAN
  • 28. Machine Learning Pipeline - Model Validation ● Training with different learning methods gives different trained models. ● Each model also has a large space of possible configurations (hyper-parameters). ● Model Validation finds the best model, and the best configuration for it, among all these possibilities. ● If the results are not satisfactory, go back up the chain & fix the earlier stages.
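One way to picture the search for the best configuration is sketched below. Everything in it is an assumption chosen for illustration: a toy one-dimensional data set, a from-scratch nearest-neighbour classifier, and leave-one-out validation as the scoring method. The slides do not prescribe any of these.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Classify x by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: abs(item[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, k):
    """Hold out each point in turn, train on the rest, and count correct predictions."""
    hits = 0
    for i, (x, label) in enumerate(data):
        held_in = data[:i] + data[i + 1:]
        if knn_predict(held_in, x, k) == label:
            hits += 1
    return hits / len(data)


# Hypothetical labelled data with one noisy point at 2.5.
data = [(1.0, "low"), (2.0, "low"), (2.5, "high"), (3.0, "low"),
        (7.0, "high"), (8.0, "high"), (9.0, "high")]

candidates = [1, 3, 5]  # hyper-parameter values to compare
scores = {k: leave_one_out_accuracy(data, k) for k in candidates}
best_k = max(candidates, key=scores.get)
print(scores, best_k)
```

The same loop generalizes to comparing different learning methods: score each candidate on data it was not trained on, then keep the best.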
  • 29. Machine Learning Pipeline - Deployment ● Diagram labels: Trained Model or Interface Model, RESTful Interface, Consumers
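A RESTful interface in front of a trained model could look roughly like the sketch below, which uses only Python's standard library. The `/predict` path, the JSON shape and the hard-coded weights standing in for a trained model are all hypothetical; a production deployment would normally use a proper web framework or serving layer.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Stand-in for a trained model (hypothetical weights)."""
    weights = [0.5, -0.25]
    score = sum(w * x for w, x in zip(weights, features))
    return {"score": score, "label": "positive" if score > 0 else "negative"}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404, "unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            body = json.dumps(predict(payload["features"])).encode("utf-8")
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "expected a JSON body with a features list")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Blocks forever; call serve() to start the endpoint locally."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

After calling `serve()`, a consumer would POST `{"features": [2.0, 1.0]}` to `http://127.0.0.1:8000/predict` and receive the score and label as JSON.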
  • 30. 1. User Story : Customer Service Industry
  • 31. 1. Implementation : Customer Service Industry ● Reduce the manual effort of classifying reviews; channel data from the web server to the analytics engine. ● Get data ready for visualization; historical data shows past trends. ● Visualize the trend. ● Text needs to be tokenized & vectorized. ● Different models were trained: Naive Bayes, SGD Classifier. ● Choose the best model with the best hyper-parameters. ● Naive Bayes (MultinomialNB) was chosen & put into deployment.
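The review-classification story can be sketched with a from-scratch multinomial Naive Bayes classifier. This is not the webinar's code and not scikit-learn's `MultinomialNB`; the class name, the add-one smoothing and the four sample reviews are assumptions made to keep the example self-contained.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """From-scratch multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(n_docs / total_docs)  # log prior
            for word in text.lower().split():
                if word in self.vocabulary:  # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical customer reviews and their categories.
reviews = ["refund not received", "app crashes on login",
           "refund delayed again", "login page crashes"]
labels = ["billing", "technical", "billing", "technical"]

clf = TinyMultinomialNB().fit(reviews, labels)
print(clf.predict("refund still not received"))
```

Tokenizing by whitespace and counting words is the simplest form of the "tokenize & vectorize" step; the classifier then compares smoothed word likelihoods per class.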
  • 32. 2. User Story : Fast Query Chatbots
  • 33. 2. Implementation : Fast Query Chatbots ● Reduce the manual effort of understanding the text query; waiting for BI has a long turnaround time; we are trying to do this using a chatbot. ● Get data ready for visualization; historical data shows past trends. ● Visualize the trend of text & SQL. ● Text cannot be used for ML directly; it needs to be tokenized & vectorized. ● Deep learning models with different layer configurations were trained. ● Choose the best model with the best hyper-parameters. ● The model with the best configuration was chosen & put into deployment.
  • 34. 3. User Story : Preventing System Failure
  • 35. Module 3 Data Challenges ● Optimal data size ● Identify data sources ● Identify what is useful in data ● Cleaning data to extract useful information ● Tools & Libraries to clean & extract useful information
  • 36. Optimal Data size for AI product ● Expectation from a predictor - Moderate Bias & Moderate Variance. ● Predictor validation is important. ● More data makes the model better, but only up to a limit.
  • 37. Identify Data Sources ● The problem statement & the data sources can be identified in either order. ● Innovation in this space can happen both ways - Top-Down & Bottom-Up. ● Data can be historical batch data stored in RDBMS & NoSQL DBs. ● Data can also be streamed live using Kafka.
  • 38. Identify what is useful in data
  • 39. Cleaning data to extract useful information
  • 40. Tools vs Libraries ● Data cleaning tools are available in the market. ● Why don’t they work in the long run? ● Data cleaning libraries are also available. ● Why are more and more enterprises embracing libraries?
  • 41. Changes with change in volume of data
  • 42. Spark vs Other technologies ● Big Data Compute Framework ● Do data cleaning at scale, with capacity that grows as nodes are added ● Talk to different data sources
  • 43. Module 4 Machine Learning Pipeline at Scale ● Machine Learning Pipeline using Spark ● Spark - A very social technology ● Spark for Big Data Cleaning & Wrangling ● Spark for building ML models at Scale ● Validation & monitoring of models ● Deployment through a REST interface using Apache Livy
  • 46. Spark - A very social technology
  • 47. Preprocessing Data at Scale ● Scaling ● CountVectorizer ● Binning ● … many things can be done at scale using Spark
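Scaling and binning are simple per-column operations, which is why they distribute well. The sketch below shows both on a single in-memory column in plain Python; the function names, the sample ages and the bucket edges are hypothetical. Spark's `pyspark.ml.feature` module offers `MinMaxScaler` and `Bucketizer` for the same ideas on distributed data, which this example does not call.

```python
from bisect import bisect_right


def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def bin_values(values, edges):
    """Return a bucket index per value; edges must be sorted ascending.

    Bucket 0 holds v < edges[0], bucket i holds edges[i-1] <= v < edges[i],
    and the last bucket holds v >= edges[-1].
    """
    return [bisect_right(edges, v) for v in values]


# Hypothetical column of customer ages.
ages = [18, 30, 45, 60]
scaled = min_max_scale(ages)
buckets = bin_values(ages, edges=[25, 50])
print(scaled, buckets)
```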
  • 48. Training Models using Spark ● Distributed Model Training using Spark ● Regression ● Classification ● Clustering ● Recommendation Engine
  • 49. Building Data Pipeline in Spark ● Spark provides in-built Transformers & Estimators. ● Pipeline can be built to connect transformers & estimators. ● Machine Learning Pipeline can be automated.
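The Transformer / Estimator / Pipeline contract can be illustrated without a cluster. The classes below are a plain-Python analogy written for this transcript, not Spark's API: a transformer only implements `transform()`, an estimator's `fit()` returns a fitted transformer, and a pipeline chains stages so that fitting and applying them is automated. All class names and sample rows are hypothetical.

```python
class Lowercase:
    """Transformer: stateless, only implements transform()."""

    def transform(self, rows):
        return [r.lower() for r in rows]


class VocabularyEstimator:
    """Estimator: fit() learns state from data and returns a fitted transformer."""

    def fit(self, rows):
        vocab = sorted({w for r in rows for w in r.split()})
        return VocabularyModel(vocab)


class VocabularyModel:
    """Fitted transformer produced by VocabularyEstimator."""

    def __init__(self, vocab):
        self.index = {w: i for i, w in enumerate(vocab)}

    def transform(self, rows):
        return [[self.index[w] for w in r.split() if w in self.index] for r in rows]


class Pipeline:
    """Chains stages; fit() fits each estimator on the output of the previous stage."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(rows)
            rows = stage.transform(rows)
            fitted.append(stage)
        return PipelineModel(fitted)


class PipelineModel:
    """Fitted pipeline: applies every fitted stage in order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


model = Pipeline([Lowercase(), VocabularyEstimator()]).fit(["Spark ML", "ML Pipeline"])
encoded = model.transform(["Pipeline Spark", "Kafka"])
print(encoded)
```

Because the fitted pipeline is itself a transformer, the same object can be reused on new data, which is what makes the pipeline automatable.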
  • 51. Module 5 Knowing the Unknowns ● Implementing Transformers & Estimators on Spark ● Deep Learning using Spark ● Are models retrainable? ● The skilling journey ● Introducing Apache Beam
  • 52. Transformers & Estimators on Spark ● Building Custom Transformers ● Building Custom Estimators
  • 53. What is Deep Learning ? ● A specialized learning technique. ● Rather than us choosing the features by hand, this technique derives the important features itself. ● The objective is to learn the best derived features for prediction. ● It is loosely inspired by the way our brain learns. ● Very useful for natural language, computer vision, audio, video etc.
  • 54. Do you always need Deep Learning ? ● Deep Learning requires more data ● It requires more compute power ● Its models are less interpretable ● “Don’t kill a mosquito with a cannon ball”: don’t use Deep Learning if you don’t need to
  • 55. Deep Learning using Spark ● Which one to choose: Distributed TensorFlow or DL using Spark? ● Libraries such as spark-dl & elephas
  • 56. Are models re-trainable ? ● Online learning models in scikit-learn: SGDClassifier, Multinomial Naive Bayes ● Spark ML models are not online learning models
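What "re-trainable" means in practice is that a model can absorb new batches without being rebuilt from scratch. scikit-learn exposes this through a `partial_fit` method on estimators such as `SGDClassifier` and `MultinomialNB`. The sketch below imitates that interface with a from-scratch perceptron; the class, the update rule and the mini-batches are assumptions for illustration and are not scikit-learn code.

```python
class OnlinePerceptron:
    """Binary classifier that is updated incrementally, one mini-batch at a time."""

    def __init__(self, n_features, learning_rate=1.0):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        activation = self.bias + sum(w * v for w, v in zip(self.weights, x))
        return 1 if activation > 0 else 0

    def partial_fit(self, batch_x, batch_y):
        """Update the existing weights using only the new batch."""
        for x, y in zip(batch_x, batch_y):
            error = y - self.predict(x)
            if error:
                self.weights = [
                    w + self.learning_rate * error * v
                    for w, v in zip(self.weights, x)
                ]
                self.bias += self.learning_rate * error
        return self


# Hypothetical stream of mini-batches arriving over time.
stream = [
    ([[2.0, 0.0], [0.0, 2.0]], [1, 0]),
    ([[3.0, 1.0], [1.0, 3.0]], [1, 0]),
]

model = OnlinePerceptron(n_features=2)
for batch_x, batch_y in stream:
    model.partial_fit(batch_x, batch_y)

print(model.weights, model.bias)
```

A batch-only model would need the full history of data to be retrained; here each batch can be discarded once it has been used.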
  • 58. Apache Beam - Probably our next webinar ● Apache Beam is an evolution of the Dataflow model created by Google to process massive amounts of data. ● The name Beam (Batch + strEAM) comes from the idea of having a unified model for both batch and stream data processing. ● Programs written using Beam can be executed on different processing frameworks such as Spark and Flink (via runners), using a set of different IOs.
  • 59. Q & A
  • 62. Components of any AI product ● Data ● Compute ● Talent
  • 63. Where has AI entered business?
  • 64. Imp : Advice to executives about AI ● Everybody should embrace the modern capabilities of AI; on the other hand, they should also think about business-specific problems. Not every tool the AI community develops will suit them. ● The biggest challenge is people change, not technology change; the biggest gap now is people who can map technology to business problems. ● Insourcing vs outsourcing: building a team vs using enterprise solutions. ● AI will change everything in the next few decades. Be a part of it.
  • 65. Challenges - Data & Security ● Volume of data - Machine learning on small data is often infeasible. ● Accessibility of data - Important data may not be accessible, or may be in encrypted format.
  • 66. Compute, Storage & Network Power ● AI products need data gathered from sensors, servers etc. ● Once gathered, data needs to be stored for further processing. ● Learning algorithms & data processing activities need a lot of compute power.
  • 67. Infrastructure for development ● Finding the best model is an iterative process. ● More experiments lead to a better model. ● Hyper-parameter Tuning ● Scaled infrastructure for developers is important.
  • 68. Infrastructure for deployment ● Speedy Deployment. ● Easy deployment ● Fluctuating Demand. ● Need of Elastic infrastructure. ● Cost optimization.
  • 70. Cost optimization: ● Use Open Source alternatives ● Infrastructure optimization ● Don’t reinvent the wheel
  • 71. Module 3 Impact of AI ● Will AI benefit humans ? ● AI in human computer interaction ● Impact of AI on business ● Impact on workplace ● Impact on society
  • 72. AI benefits for humans - social, environmental ● Predicting diseases ● 60% of people would prefer AI assistance over humans as financial advisors or tax preparers ● 71% of people believe that AI will help humans solve complex problems and live more enriched lives
  • 73. AI assistants ● Saves Time ● Calendar events reminder ● Helps get things done
  • 74. Impact of AI on business
  • 75. More
  • 76. AI advisor & manager at workplace
  • 77. Impact on Decision Makers ● Adoption of AI advisors
  • 78. What can be outsourced to AI assistant
  • 79. Impact of artificial intelligence on society ● People are averse to the idea of having annual health check-ups at home with a robotic smart kit (77%) or having chatbot assistant teachers in universities/colleges that lower the overall cost of tuition (61%). ● Responsible AI ensures that its workings are aligned with the ethical standards and social norms pertinent to its scope of operations. ● Explainable AI is about building AI models with accountability and the ability to describe why the algorithm made a certain decision.
  • 80. Module 4 Identify right tools ● Programming Language ● Open source libraries ● Infrastructure Optimizations ● Other alternatives
  • 81. Choose the right Programming Language
  • 82. Why Python makes life easy ? ● Easy to learn for ETL developers ● Integrates very well with other technologies ● Full-stack development: dashboards using Bokeh, web applications using Django, machine learning models using scikit-learn, scaling using PySpark
  • 83. Choose appropriate Libraries - Statistical Modeling & Data Processing
  • 84. Choose appropriate Libraries - Visualization
  • 85. Choose appropriate Libraries - Machine Learning or Deep Learning
  • 86. Infrastructure Optimization ● Monolithic or Serverless
  • 87. Monolithic Infrastructure - Preallocated Infra ● Model Training: developers request access whenever required; might incur delay in peak working hours; idle in non-working hours ● Model Interfacing: idle in non-peak hours; may fall short in spikes; pay even if infra is not used
  • 88. Serverless Infrastructure - Elastic Allocation ● Model Training: no preallocation; pay only for what you use; absolutely no idle time for infra; no wait time for developers ● Model Interfacing: allocate infra only when required; scales down during non-peak hours; improved customer experience even in peak hours
  • 89. Serverless Infrastructure Solutions ● OpenFaaS (Open Function as a Service) ● AWS Lambda ● Google Cloud Functions ● Azure Functions
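On a serverless platform the model is wrapped in a single function that the platform invokes per request, so no server is preallocated. The sketch below follows the `handler(event, context)` shape that AWS Lambda uses for Python functions and assumes an HTTP-proxy style event with a JSON string under `"body"`. The weights and field names are hypothetical, and the other platforms listed use different signatures.

```python
import json


def score(features):
    """Stand-in for a trained model (hypothetical weights)."""
    weights = [0.5, -0.25]
    return sum(w * x for w, x in zip(weights, features))


def lambda_handler(event, context):
    """Invoked once per request; no server process is kept running between calls."""
    try:
        payload = json.loads(event.get("body") or "{}")
        features = [float(v) for v in payload["features"]]
    except (ValueError, KeyError, TypeError):
        return {
            "statusCode": 400,
            "body": json.dumps({"error": "features list required"}),
        }
    return {"statusCode": 200, "body": json.dumps({"score": score(features)})}


# Local invocation with a hand-built event, for illustration only.
response = lambda_handler({"body": json.dumps({"features": [2, 1]})}, None)
print(response)
```

Because the handler holds no state between calls, the platform can run zero copies when idle and many copies during a spike.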
  • 90. Distributed Machine Learning using Spark ● Apache Spark is a distributed data processing framework. ● Many machine learning algorithms are implemented in Spark. ● Most of the APIs are similar to those of scikit-learn ● Scaled ETL & Machine Learning can be done using Spark
  • 91. Other alternatives ● Google Cloud AI
  • 92. Module 5 Build AI Team ● Adoption of AI ● Skills ● Hiring or upskilling ● Upskilling workforce
  • 93. Adoption Strategy ● Build Business Case ● Scale Efficiently ● Create Data Driven Culture
  • 95. Talent Acquisition ● Upskill your current team ?
  • 96. Upskilling workforce ● It’s possible to make use of the people who have delivered for you in the past.
  • 97. Q & A
  • 98. Repositories ● https://github.com/zekelabs/machine-learning-for-beginners ● https://github.com/zekelabs/tensorflow-tutorial/ ● Dog breed prediction - https://www.edyoda.com/resources/watch/54AEA4CDC35394F1183A9DD17AA47/ ● Python learning course - https://www.edyoda.com/resources/videolisting/98/