SlideShare a Scribd company logo
1 of 31
MACHINE LEARNING
INFORMATION ENGINEERING, KSU
JASON TSENG
Machine Learning Workflow
MACHINE LEARNING MODEL
• A formal definition of ML given by professor Mitchell − :
“A computer program is said to learn from experience E
with respect to some class of tasks T and performance
measure P, if its performance at tasks in T, as measured by P,
improves with experience E.”
• ML is a field of AI consisting of learning algorithms that
• Improve their performance
• At executing some task
• Over time with experience
TASK
• From the perspective of problem, we may define the task T as the real-world problem to be
solved.
• The problem can be anything like finding best house price in a specific location or to find
best marketing strategy etc. On the other hand, if we talk about machine learning, the
definition of task is different because it is difficult to solve ML based tasks by conventional
programming approach.
• A task T is said to be a ML based task when it is based on the process and the system must
follow for operating on data points. The examples of ML based tasks are Classification,
Regression, Structured annotation, Clustering, Transcription etc.
EXPERIENCE E
• As name suggests, it is the knowledge gained from data points provided to the algorithm or
model.
• Once provided with the dataset, the model will run iteratively and will learn some inherent
pattern. The learning thus acquired is called experience .
• Making an analogy with human learning, we can think of this situation as in which a human
being is learning or gaining some experience from various attributes like situation, relationships
etc.
• Supervised, unsupervised and reinforcement learning are some ways to learn or gain experience.
The experience gained by out ML model or algorithm will be used to solve the task T.
PERFORMANCE P
• An ML algorithm is supposed to perform task and gain experience with the passage
of time.
• The measure which tells whether ML algorithm is performing as per expectation or
not is its performance .
• P is basically a quantitative metric that tells how a model is performing the task, T,
using its experience, E.
• There are many metrics that help to understand the ML performance, such as
accuracy score, F1 score, confusion matrix, precision, recall, sensitivity etc.
CHALLENGES IN MACHINES LEARNING
• Quality of data − Having good-quality data for ML algorithms is one of the
biggest challenges. Use of low-quality data leads to the problems related to data
preprocessing and feature extraction.
• Time-Consuming task − Another challenge faced by ML models is the
consumption of time especially for data acquisition, feature extraction and
retrieval.
• Lack of specialist persons − As ML technology is still in its infancy stage,
availability of expert resources is a tough job.
CHALLENGES IN MACHINES LEARNING
• No clear objective for formulating business problems − Having no clear objective
and well-defined goal for business problems is another key challenge for ML because
this technology is not that mature yet.
• Issue of overfitting & underfitting − If the model is overfitting or underfitting, it
cannot be represented well for the problem.
• Curse of dimensionality − Another challenge ML model faces is too many features of
data points. This can be a real hindrance.
• Difficulty in deployment − Complexity of the ML model makes it quite difficult to be
deployed in real life.
APPLICATIONS OF MACHINES LEARNING
ML can be used to solve many real-world
complex problems which cannot be solved
with traditional approach.
• Emotion analysis
• Sentiment analysis
• Error detection and prevention
• Weather forecasting and prediction
• Stock market analysis and
forecasting
• Speech synthesis, speech
recognition
• Customer segmentation, object
recognition
• Fraud detection, fraud prevention
• Recommendation of products to
customer in online shopping
TYPES OF LEARNING Is this A or B?
How much?
How many?
How is this organized?
What should I do now?
DATA LOADING FOR ML PROJECTS
• The most common format of data for ML projects is CSV (comma-separated
values), a simple file format which is used to store tabular data (number and text)
such as a spreadsheet in plain text.
• File Head: the header contains the information for each field
• Two cases related to CSV file header which must be considered −
• Case-I: When Data file is having a file header − It will automatically
assign the names to each column of data if data file is having a file header.
• Case-II: When Data file is not having a file header − We need to assign the
names to each column of data manually if data file is not having a file header.
PANDAS OR CSV OR NUMPY
UNDERSTANDING DATA WITH STATISTICS
• Looking at Raw Data
• Checking Dimensions of Data
• Statistical Summary of Data
• Reviewing Class Distribution
• Reviewing Correlation between Attributes
• Reviewing Skew of Attribute Distribution
UNDERSTANDING DATA WITH VISUALIZATION
PREPARING DATA
• This makes data preparation the most important step in ML process. Data
preparation may be defined as the procedure that makes our dataset more
appropriate for ML process.
• Data Pre-processing Techniques:
• Scaling
• Normalization
• L1 normalization
• L2 normalization
• Binarization
• Standarization
• Data labeling
SCALING
• Data rescaling makes sure that attributes
are at same scale.
• Generally, attributes are rescaled into the
range of 0 and 1. ML algorithms like
gradient descent and k-Nearest
Neighbors requires scaled data.
• We can rescale the data with the help
of MinMaxScaler class of scikit-
learn Python library.
BINARIZATION
• We can use a binary threshold for making our data
binary. The values above that threshold value will
be converted to 1 and below that threshold will be
converted to 0.
• For example, if we choose threshold value = 0.5,
then the dataset value above it will become 1 and
below this will become 0. That is why we can call
it binarizing the data or thresholding the data.
STANDARIZATION
• The standarization transforms the data attributes with a Gaussian distribution.
• It differs the mean and SD (Standard Deviation) to a standard Gaussian
distribution with a mean of 0 and a SD of 1.
• This technique is useful in ML algorithms like linear regression, logistic regression
that assumes a Gaussian distribution in input dataset and produce better results
with rescaled data.
• We can standardize the data (mean = 0 and SD =1) with the help
of StandardScaler class of scikit-learn Python library.
DATA LABELING
• It is also very important to send the data to ML algorithms having proper labeling.
• For example, in case of classification problems, lot of labels in the form of words,
numbers etc. are there on the data.
DATA FEATURE SELECTION
• The performance of machine learning model is directly proportional to the
data features used to train it.
• On the other hand, use of relevant data features can increase the accuracy
of your ML model especially linear and logistic regression.
• The following are some of the benefits of automatic feature selection
before modeling the data −
• Performing feature selection before data modeling will reduce the overfitting.
• Performing feature selection before data modeling will increases the accuracy
of ML model.
• Performing feature selection before data modeling will reduce the training
time
FEATURE SELECTION TECHNIQUES
• The followings are automatic feature selection techniques that we can use to
model ML data in Python −
• Univariate Selection
• Recursive Feature Elimination
• Principal Component Analysis (PCA)
• Feature Importance
UNIVARIATE SELECTION
• This feature selection technique is very useful in selecting those features, with the
help of statistical testing, having strongest relationship with the prediction
variables.
• We can implement univariate feature selection technique with the help of
SelectKBest0class of scikit-learn Python library.
RECURSIVE FEATURE ELIMINATION
• As the name suggests, RFE (Recursive feature elimination) feature selection
technique removes the attributes recursively and builds the model with remaining
attributes.
• We can implement RFE feature selection technique with the help of RFE class
of scikit-learn Python library.
PRINCIPAL COMPONENT ANALYSIS (PCA)
• PCA, generally called data reduction technique, is very useful feature selection
technique as it uses linear algebra to transform the dataset into a compressed
form.
• We can implement PCA feature selection technique with the help of PCA class of
scikit-learn Python library. We can select number of principal components in the
output.
FEATURE IMPORTANCE
• As the name suggests, feature importance technique is used to choose the
importance features.
• It basically uses a trained supervised classifier to select features. We can
implement this feature selection technique with the help of ExtraTreeClassifier
class of scikit-learn Python library.
machine learning workflow with data input.pptx

More Related Content

Similar to machine learning workflow with data input.pptx

MachineLearningSparkML AI and expert Systems
MachineLearningSparkML AI and expert SystemsMachineLearningSparkML AI and expert Systems
MachineLearningSparkML AI and expert Systemsshreenathji26
 
MachineLearningSparkML.pptx
MachineLearningSparkML.pptxMachineLearningSparkML.pptx
MachineLearningSparkML.pptxharikaramisetty3
 
introduction to Statistical Theory.pptx
 introduction to Statistical Theory.pptx introduction to Statistical Theory.pptx
introduction to Statistical Theory.pptxDr.Shweta
 
Data Engineer vs Data Scientist vs Data Analyst.pptx
Data Engineer vs Data Scientist vs Data Analyst.pptxData Engineer vs Data Scientist vs Data Analyst.pptx
Data Engineer vs Data Scientist vs Data Analyst.pptxCarolineRebeccaD
 
Practical data science
Practical data sciencePractical data science
Practical data scienceDing Li
 
MachineLearning Seminar PPT.pptx
MachineLearning Seminar PPT.pptxMachineLearning Seminar PPT.pptx
MachineLearning Seminar PPT.pptxAmanDixit74
 
Module III MachineLearningSparkML.pptx
Module III MachineLearningSparkML.pptxModule III MachineLearningSparkML.pptx
Module III MachineLearningSparkML.pptxRahul Borate
 
VSSML17 Review. Summary Day 2 Sessions
VSSML17 Review. Summary Day 2 SessionsVSSML17 Review. Summary Day 2 Sessions
VSSML17 Review. Summary Day 2 SessionsBigML, Inc
 
(Faiz) MachineLearning(ppt).pptx
(Faiz) MachineLearning(ppt).pptx(Faiz) MachineLearning(ppt).pptx
(Faiz) MachineLearning(ppt).pptxFaiz430036
 
Feature Engineering in Machine Learning
Feature Engineering in Machine LearningFeature Engineering in Machine Learning
Feature Engineering in Machine LearningKnoldus Inc.
 
Apache Spark Machine Learning
Apache Spark Machine LearningApache Spark Machine Learning
Apache Spark Machine LearningPraveen Devarao
 
A Generic Neural Network Architecture to Infer Heterogeneous Model Transforma...
A Generic Neural Network Architecture to Infer Heterogeneous Model Transforma...A Generic Neural Network Architecture to Infer Heterogeneous Model Transforma...
A Generic Neural Network Architecture to Infer Heterogeneous Model Transforma...Lola Burgueño
 
Feature Engineering.pdf
Feature Engineering.pdfFeature Engineering.pdf
Feature Engineering.pdfRajoo Jha
 
MLCC Schedule #1
MLCC Schedule #1MLCC Schedule #1
MLCC Schedule #1Bruce Lee
 
Building a performing Machine Learning model from A to Z
Building a performing Machine Learning model from A to ZBuilding a performing Machine Learning model from A to Z
Building a performing Machine Learning model from A to ZCharles Vestur
 
Analytics Boot Camp - Slides
Analytics Boot Camp - SlidesAnalytics Boot Camp - Slides
Analytics Boot Camp - SlidesAditya Joshi
 
Machine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabsMachine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabszekeLabs Technologies
 
The Machine Learning Workflow with Azure
The Machine Learning Workflow with AzureThe Machine Learning Workflow with Azure
The Machine Learning Workflow with AzureIvo Andreev
 
Multi-modal sources for predictive modeling using deep learning
Multi-modal sources for predictive modeling using deep learningMulti-modal sources for predictive modeling using deep learning
Multi-modal sources for predictive modeling using deep learningSanghamitra Deb
 

Similar to machine learning workflow with data input.pptx (20)

MachineLearningSparkML AI and expert Systems
MachineLearningSparkML AI and expert SystemsMachineLearningSparkML AI and expert Systems
MachineLearningSparkML AI and expert Systems
 
MachineLearningSparkML.pptx
MachineLearningSparkML.pptxMachineLearningSparkML.pptx
MachineLearningSparkML.pptx
 
MachineLearningSparkML.pptx
MachineLearningSparkML.pptxMachineLearningSparkML.pptx
MachineLearningSparkML.pptx
 
introduction to Statistical Theory.pptx
 introduction to Statistical Theory.pptx introduction to Statistical Theory.pptx
introduction to Statistical Theory.pptx
 
Data Engineer vs Data Scientist vs Data Analyst.pptx
Data Engineer vs Data Scientist vs Data Analyst.pptxData Engineer vs Data Scientist vs Data Analyst.pptx
Data Engineer vs Data Scientist vs Data Analyst.pptx
 
Practical data science
Practical data sciencePractical data science
Practical data science
 
MachineLearning Seminar PPT.pptx
MachineLearning Seminar PPT.pptxMachineLearning Seminar PPT.pptx
MachineLearning Seminar PPT.pptx
 
Module III MachineLearningSparkML.pptx
Module III MachineLearningSparkML.pptxModule III MachineLearningSparkML.pptx
Module III MachineLearningSparkML.pptx
 
VSSML17 Review. Summary Day 2 Sessions
VSSML17 Review. Summary Day 2 SessionsVSSML17 Review. Summary Day 2 Sessions
VSSML17 Review. Summary Day 2 Sessions
 
(Faiz) MachineLearning(ppt).pptx
(Faiz) MachineLearning(ppt).pptx(Faiz) MachineLearning(ppt).pptx
(Faiz) MachineLearning(ppt).pptx
 
Feature Engineering in Machine Learning
Feature Engineering in Machine LearningFeature Engineering in Machine Learning
Feature Engineering in Machine Learning
 
Apache Spark Machine Learning
Apache Spark Machine LearningApache Spark Machine Learning
Apache Spark Machine Learning
 
A Generic Neural Network Architecture to Infer Heterogeneous Model Transforma...
A Generic Neural Network Architecture to Infer Heterogeneous Model Transforma...A Generic Neural Network Architecture to Infer Heterogeneous Model Transforma...
A Generic Neural Network Architecture to Infer Heterogeneous Model Transforma...
 
Feature Engineering.pdf
Feature Engineering.pdfFeature Engineering.pdf
Feature Engineering.pdf
 
MLCC Schedule #1
MLCC Schedule #1MLCC Schedule #1
MLCC Schedule #1
 
Building a performing Machine Learning model from A to Z
Building a performing Machine Learning model from A to ZBuilding a performing Machine Learning model from A to Z
Building a performing Machine Learning model from A to Z
 
Analytics Boot Camp - Slides
Analytics Boot Camp - SlidesAnalytics Boot Camp - Slides
Analytics Boot Camp - Slides
 
Machine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabsMachine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabs
 
The Machine Learning Workflow with Azure
The Machine Learning Workflow with AzureThe Machine Learning Workflow with Azure
The Machine Learning Workflow with Azure
 
Multi-modal sources for predictive modeling using deep learning
Multi-modal sources for predictive modeling using deep learningMulti-modal sources for predictive modeling using deep learning
Multi-modal sources for predictive modeling using deep learning
 

Recently uploaded

Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxolyaivanovalion
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 

Recently uploaded (20)

Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 

machine learning workflow with data input.pptx

  • 3. MACHINE LEARNING MODEL • A formal definition of ML given by professor Mitchell − : “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” • ML is a field of AI consisting of learning algorithms that • Improve their performance • At executing some task • Over time with experience
  • 4. TASK • From the perspective of problem, we may define the task T as the real-world problem to be solved. • The problem can be anything like finding best house price in a specific location or to find best marketing strategy etc. On the other hand, if we talk about machine learning, the definition of task is different because it is difficult to solve ML based tasks by conventional programming approach. • A task T is said to be a ML based task when it is based on the process and the system must follow for operating on data points. The examples of ML based tasks are Classification, Regression, Structured annotation, Clustering, Transcription etc.
  • 5. EXPERIENCE E • As name suggests, it is the knowledge gained from data points provided to the algorithm or model. • Once provided with the dataset, the model will run iteratively and will learn some inherent pattern. The learning thus acquired is called experience . • Making an analogy with human learning, we can think of this situation as in which a human being is learning or gaining some experience from various attributes like situation, relationships etc. • Supervised, unsupervised and reinforcement learning are some ways to learn or gain experience. The experience gained by out ML model or algorithm will be used to solve the task T.
  • 6. PERFORMANCE P • An ML algorithm is supposed to perform task and gain experience with the passage of time. • The measure which tells whether ML algorithm is performing as per expectation or not is its performance . • P is basically a quantitative metric that tells how a model is performing the task, T, using its experience, E. • There are many metrics that help to understand the ML performance, such as accuracy score, F1 score, confusion matrix, precision, recall, sensitivity etc.
  • 7. CHALLENGES IN MACHINES LEARNING • Quality of data − Having good-quality data for ML algorithms is one of the biggest challenges. Use of low-quality data leads to the problems related to data preprocessing and feature extraction. • Time-Consuming task − Another challenge faced by ML models is the consumption of time especially for data acquisition, feature extraction and retrieval. • Lack of specialist persons − As ML technology is still in its infancy stage, availability of expert resources is a tough job.
  • 8. CHALLENGES IN MACHINES LEARNING • No clear objective for formulating business problems − Having no clear objective and well-defined goal for business problems is another key challenge for ML because this technology is not that mature yet. • Issue of overfitting & underfitting − If the model is overfitting or underfitting, it cannot be represented well for the problem. • Curse of dimensionality − Another challenge ML model faces is too many features of data points. This can be a real hindrance. • Difficulty in deployment − Complexity of the ML model makes it quite difficult to be deployed in real life.
  • 9. APPLICATIONS OF MACHINES LEARNING ML can be used to solve many real-world complex problems which cannot be solved with traditional approach. • Emotion analysis • Sentiment analysis • Error detection and prevention • Weather forecasting and prediction • Stock market analysis and forecasting • Speech synthesis, speech recognition • Customer segmentation, object recognition • Fraud detection, fraud prevention • Recommendation of products to customer in online shopping
  • 10. TYPES OF LEARNING Is this A or B? How much? How many? How is this organized? What should I do now?
  • 11. DATA LOADING FOR ML PROJECTS • The most common format of data for ML projects is CSV (comma-separated values), a simple file format which is used to store tabular data (number and text) such as a spreadsheet in plain text. • File Head: the header contains the information for each field • Two cases related to CSV file header which must be considered − • Case-I: When Data file is having a file header − It will automatically assign the names to each column of data if data file is having a file header. • Case-II: When Data file is not having a file header − We need to assign the names to each column of data manually if data file is not having a file header.
  • 12. PANDAS OR CSV OR NUMPY
  • 13. UNDERSTANDING DATA WITH STATISTICS • Looking at Raw Data • Checking Dimensions of Data • Statistical Summary of Data • Reviewing Class Distribution • Reviewing Correlation between Attributes • Reviewing Skew of Attribute Distribution
  • 14. UNDERSTANDING DATA WITH VISUALIZATION
  • 15. PREPARING DATA • This makes data preparation the most important step in ML process. Data preparation may be defined as the procedure that makes our dataset more appropriate for ML process. • Data Pre-processing Techniques: • Scaling • Normalization • L1 normalization • L2 normalization • Binarization • Standarization • Data labeling
  • 16. SCALING • Data rescaling makes sure that attributes are at same scale. • Generally, attributes are rescaled into the range of 0 and 1. ML algorithms like gradient descent and k-Nearest Neighbors requires scaled data. • We can rescale the data with the help of MinMaxScaler class of scikit- learn Python library.
  • 17. BINARIZATION • We can use a binary threshold for making our data binary. The values above that threshold value will be converted to 1 and below that threshold will be converted to 0. • For example, if we choose threshold value = 0.5, then the dataset value above it will become 1 and below this will become 0. That is why we can call it binarizing the data or thresholding the data.
  • 18. STANDARIZATION • The standarization transforms the data attributes with a Gaussian distribution. • It differs the mean and SD (Standard Deviation) to a standard Gaussian distribution with a mean of 0 and a SD of 1. • This technique is useful in ML algorithms like linear regression, logistic regression that assumes a Gaussian distribution in input dataset and produce better results with rescaled data. • We can standardize the data (mean = 0 and SD =1) with the help of StandardScaler class of scikit-learn Python library.
  • 19.
  • 20. DATA LABELING • It is also very important to send the data to ML algorithms having proper labeling. • For example, in case of classification problems, lot of labels in the form of words, numbers etc. are there on the data.
  • 21.
  • 22. DATA FEATURE SELECTION • The performance of machine learning model is directly proportional to the data features used to train it. • On the other hand, use of relevant data features can increase the accuracy of your ML model especially linear and logistic regression. • The following are some of the benefits of automatic feature selection before modeling the data − • Performing feature selection before data modeling will reduce the overfitting. • Performing feature selection before data modeling will increases the accuracy of ML model. • Performing feature selection before data modeling will reduce the training time
  • 23. FEATURE SELECTION TECHNIQUES • The followings are automatic feature selection techniques that we can use to model ML data in Python − • Univariate Selection • Recursive Feature Elimination • Principal Component Analysis (PCA) • Feature Importance
  • 24. UNIVARIATE SELECTION • This feature selection technique is very useful in selecting those features, with the help of statistical testing, having strongest relationship with the prediction variables. • We can implement univariate feature selection technique with the help of SelectKBest0class of scikit-learn Python library.
  • 25.
  • 26. RECURSIVE FEATURE ELIMINATION • As the name suggests, RFE (Recursive feature elimination) feature selection technique removes the attributes recursively and builds the model with remaining attributes. • We can implement RFE feature selection technique with the help of RFE class of scikit-learn Python library.
  • 27.
  • 28. PRINCIPAL COMPONENT ANALYSIS (PCA) • PCA, generally called data reduction technique, is very useful feature selection technique as it uses linear algebra to transform the dataset into a compressed form. • We can implement PCA feature selection technique with the help of PCA class of scikit-learn Python library. We can select number of principal components in the output.
  • 29.
  • 30. FEATURE IMPORTANCE • As the name suggests, feature importance technique is used to choose the importance features. • It basically uses a trained supervised classifier to select features. We can implement this feature selection technique with the help of ExtraTreeClassifier class of scikit-learn Python library.