Machine learning introduction

Anas Jamil, Software Developer
Machine learning intro
The only limit to AI is human imagination. - Chris Duffey
By : Anas Jamil
Mar - 2019
Agenda
1- AI & ML & DL.
2- Machine learning (ML) introduction.
3- Types of Machine learning.
4- ML data types.
5- Working with missing data
6- Model performance (fitting)
AI & ML & DL introduction.
Artificial Intelligence (AI): the study of how to train computers so that they can do things
which at present humans can do. -www.geeksforgeeks.org-
Machine learning (ML): the scientific study of algorithms and statistical models that computer systems
use to perform a specific task without using explicit instructions. -wikipedia-
Deep learning (DL): an artificial intelligence function that imitates the workings of the human brain in
processing data and creating patterns for use in decision making.
Machine learning introduction
What is Machine Learning?
“Learning is any process by which a system improves performance from
experience.” - Herbert Simon
Some use cases: we use ML when:
• Human expertise does not exist (navigating on Mars)
• Humans can’t explain their expertise (speech recognition)
• Models are based on huge amounts of data (genomics).
1- Supervised (inductive/class driven) learning
Supervised learning is the machine learning task of learning a function that maps an input to an output
based on example input-output pairs.
In supervised learning you have input variables (X) and an output variable (Y), and you use an
algorithm to learn the mapping function from the input to the output:
Y = f(X)
The goal is to approximate the mapping function so well that, when you have new input data (X), you
can predict the output variable (Y) for that data.
It is called supervised learning because the process of an algorithm learning from the training dataset can
be thought of as a teacher supervising the learning process.
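As a rough illustration of learning Y = f(X) from example input-output pairs, the sketch below fits a straight line by ordinary least squares and then predicts the output for an unseen input. The function names (fit_line, predict) and the house-size data are made up for this example, and a straight line is only one possible form for f.

```python
def fit_line(xs, ys):
    """Learn slope and intercept of y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned mapping f to a new input x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical labeled data: house size (X) -> price in thousands (Y).
sizes = [50, 80, 100, 120]
prices = [150, 240, 300, 360]

model = fit_line(sizes, prices)      # training: learn f from the pairs
new_price = predict(model, 90)       # prediction for an unseen input
print(model, new_price)
```

The training step plays the role of the "teacher": the known prices correct the line until it matches the examples as closely as possible.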
Supervised learning terms (each line lists terms used interchangeably):
Class, Target, Label
Attribute, Feature
Labeled data, Dataset, Sample data
Sample, Example, Record, Row, Instance, Observation
Supervised learning algorithm types:
1- Regression: a supervised learning task where the output is a continuous value (numeric output).
Ex: How much is a home worth?
2- Classification: a supervised learning task where the output is one of a set of defined labels (discrete values).
A- Binary classification: Yes/No
Ex: Is this email spam?
B- Multi-class classification: one out of several outputs.
Ex: What is the weather?
Sample algorithms: Support Vector Machine (SVM), Random Forest, Linear Regression, Decision Trees
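To make the classification case concrete, here is a minimal sketch of a binary classifier that outputs a discrete label. It uses a 1-nearest-neighbour rule, which is not one of the algorithms listed above; it was chosen only because it fits in a few lines. The features and the tiny spam dataset are hypothetical.

```python
def nearest_neighbor_label(train, point):
    """Return the label of the training example closest to `point`."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    _, label = min(train, key=lambda example: squared_distance(example[0], point))
    return label


# Hypothetical labeled emails: (number of links, mentions of "free") -> label.
train = [
    ((0, 0), "not spam"),
    ((1, 0), "not spam"),
    ((7, 4), "spam"),
    ((9, 6), "spam"),
]

label_a = nearest_neighbor_label(train, (8, 5))
label_b = nearest_neighbor_label(train, (1, 1))
print(label_a, label_b)
```

A regression model would return a number for the same inputs; here the output can only be one of the labels seen during training.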
2- Unsupervised (data driven) learning
Training data does not include desired outputs.
Unsupervised learning is very much the opposite of supervised learning: it features no labels. Instead,
the algorithm is fed a lot of data and given the tools to understand the properties of the data. From
there, it can learn to group, cluster, and/or organize the data in such a way that a human (or another
intelligent algorithm) can come in and make sense of the newly organized data.
Unsupervised learning
Unsupervised learning problems can be grouped into two categories of algorithms:
Clustering: A clustering problem is where you want to discover the inherent groupings in the data,
such as grouping customers by purchasing behavior.
Association: An association rule learning problem is where you want to discover rules that describe
large portions of your data, such as people who buy X also tend to buy Y.
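The clustering idea can be sketched with a very small k-means loop on one-dimensional data. This is an illustrative assumption, not a method named in the slides: the customer spending values and the starting centroids are invented, and real data would normally have several features.

```python
def k_means_1d(values, centroids, iterations=10):
    """Group numbers around the nearest centroid, then move each centroid
    to the mean of its group; repeat a fixed number of times."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical monthly spending per customer; no labels are provided.
spending = [10, 12, 11, 95, 100, 105]
centroids, clusters = k_means_1d(spending, centroids=[0, 50])
print(centroids, clusters)
```

No label is ever supplied: the two groups (low and high spenders) emerge from the data alone.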
Unsupervised learning: Dimensionality Reduction (DR)
There are two components of dimensionality reduction:
Feature extraction: This maps the data from a high-dimensional space to a lower-dimensional space,
i.e. a space with fewer dimensions. For example, features a and b are combined into one feature ab:
a+b+c+d = e
ab+c+d = e
Feature selection: Here we try to find a subset of the original set of features, giving a smaller set
which can be used to model the problem. For example, a feature that contributes nothing can be dropped:
c = 0
Sample algorithms: fastText, BlazingText, Principal Component Analysis (PCA)
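The two ideas can be sketched on a toy table with columns a, b, c, d. Two assumptions are made here: "ab" is read as the product of a and b (the slide does not say how they are combined), and a column is considered useless when it never varies, as with c = 0. Both helper functions are hypothetical.

```python
def select_features(rows):
    """Feature selection: keep only the columns whose value varies."""
    names = list(rows[0])
    keep = [n for n in names if len({row[n] for row in rows}) > 1]
    return [{n: row[n] for n in keep} for row in rows]


def extract_product_feature(rows, first, second):
    """Feature extraction: replace two columns with one combined column."""
    result = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k not in (first, second)}
        new_row[first + second] = row[first] * row[second]
        result.append(new_row)
    return result


rows = [
    {"a": 2, "b": 3, "c": 0, "d": 1},
    {"a": 4, "b": 5, "c": 0, "d": 7},
]

selected = select_features(rows)                      # drops the constant column c
extracted = extract_product_feature(rows, "a", "b")   # a, b -> ab
print(selected)
print(extracted)
```

Selection keeps original columns unchanged, while extraction creates new columns; PCA is an example of the second kind.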
3- Reinforcement learning (RL)
Reinforcement learning is about taking suitable actions to maximize reward in a particular situation.
3- Reinforcement learning (RL)
Types of reinforcement: there are two types of reinforcement:
1- Positive: Positive reinforcement occurs when an event that happens because of a particular
behavior increases the strength and the frequency of that behavior. In other words, it has a positive
effect on the behavior.
2- Negative: Negative reinforcement is the strengthening of a behavior because a negative
condition is stopped or avoided.
Use cases of RL: real-time decisions, game AI, robot navigation, self-driving cars
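A minimal sketch of "taking suitable action to maximize reward": an agent keeps a running estimate of the reward for each action and always picks the action with the highest estimate. Exploration is handled by optimistic initial values, an assumption made here so that the run is deterministic. The actions and their fixed rewards are invented, and real environments usually return noisy rewards.

```python
def run_bandit(rewards, steps, optimistic_start=2.0):
    """Greedy agent with optimistic initial estimates.

    Every action starts with an estimate higher than any real reward,
    so each one gets tried once before the agent settles on the best."""
    estimates = {action: optimistic_start for action in rewards}
    counts = {action: 0 for action in rewards}
    total = 0.0
    for _ in range(steps):
        action = max(estimates, key=estimates.get)   # act greedily
        reward = rewards[action]                     # environment feedback
        counts[action] += 1
        # Incremental mean of the rewards observed for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total += reward
    return estimates, total


# Hypothetical environment: each action always pays the same reward.
rewards = {"left": 0.0, "stay": 0.5, "right": 1.0}
estimates, total = run_bandit(rewards, steps=10)
best_action = max(estimates, key=estimates.get)
print(best_action, total)
```

The reward for "right" strengthens that behavior, which matches the description of positive reinforcement above.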
Data types from ML perspectives
1- Numerical Data
2- Categorical Data
3- Time Series Data
4- Text
1- Numerical Data:
Numerical data is any data where data points are exact numbers. Statisticians may also call numerical
data quantitative data. This data has meaning as a measurement, such as house prices.
2- Categorical Data
Categorical data represents characteristics, such as a hockey player’s position.
Categorical data can take numerical values. For example, we might use 1 for the colour red and 2 for
blue. But these numbers don’t have a mathematical meaning.
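Because codes such as 1 and 2 carry no mathematical meaning, a common way to feed categories to a model is one-hot encoding, where each category gets its own 0/1 column. The slides do not name this technique; the sketch below is an assumption about how the point could be applied, and the one_hot helper is hypothetical.

```python
def one_hot(values):
    """Turn each category into a 0/1 vector with one position per category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


colours = ["red", "blue", "red"]
categories, encoded = one_hot(colours)
print(categories)
print(encoded)
```

With this encoding a model cannot mistakenly conclude that blue is "twice" red, which an integer code would suggest.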
3- Time Series Data
Time series data is a sequence of numbers collected at regular intervals over some period of time. It is
very important in fields like finance. Time series data has a temporal value attached
to it, such as a date or a timestamp, which lets you look for trends over time.
4- Text
Text data is basically just words. Often the first thing you do with text is turn it into
numbers using a function such as the bag-of-words representation.
Preprocessing steps such as stemming and lowercasing can also be applied.
4- Text
Sentence 1: This is working not disappointed
Sentence 2: This is not working. disappointed
Tokenization (both sentences produce the same tokens):
[ 'disappointed', 'is', 'not', 'working', 'this' ]
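The example above can be reproduced with a small bag-of-words sketch. The tokenizer here is a simple assumption (lowercase, keep runs of letters); real pipelines often add stemming or stop-word removal.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text):
    """Count how often each token occurs, ignoring word order."""
    return Counter(tokenize(text))


first = "This is working not disappointed"
second = "This is not working. disappointed"

vocabulary = sorted(bag_of_words(first))
same_bag = bag_of_words(first) == bag_of_words(second)
print(vocabulary)
print(same_bag)
```

The two sentences have opposite meanings but identical bags of words, because word order is discarded. Features that keep some ordering information address this weakness.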
4- Text
Orthogonal sparse bigram (OSB) :
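The slide content for OSB did not survive extraction. As a hedged sketch, the version below follows the commonly described scheme: slide a window of a given size over the tokens and pair the first word in the window with each later word, using one extra underscore per skipped word. The function name and the window size are assumptions.

```python
def osb(tokens, window_size):
    """Orthogonal sparse bigrams: pair each token with the tokens that
    follow it inside the window, marking skipped words with underscores."""
    pairs = []
    for start in range(len(tokens) - 1):
        first = tokens[start]
        window = tokens[start + 1 : start + window_size]
        for skipped, later in enumerate(window):
            # One underscore joins adjacent words; each skipped word adds one more.
            pairs.append(first + "_" * (skipped + 1) + later)
    return pairs


pairs = osb(["the", "quick", "brown", "fox"], window_size=4)
print(pairs)
```

Unlike a plain bag of words, these pairs record which words appear near each other and in what order.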
5- Working with missing data
For a row that has missing values you can:
1- Delete the row (if the data is not related).
2- Impute the missing data:
A- If the data is related to each other, you can use the mean of that column.
B- If the data is independent, you can pick a value from another row.
C- If the data is related to a timestamp:
1- Interpolation
2- Fill backward
3- Fill forward
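The imputation options above can be sketched in plain Python, with None standing for a missing value. The helper names are made up for this example; libraries such as pandas provide equivalent operations, which are not shown here.

```python
def fill_mean(values):
    """Replace missing values with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def fill_forward(values):
    """Carry the last known value forward (leading gaps stay None)."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def fill_backward(values):
    """Carry the next known value backward (trailing gaps stay None)."""
    return fill_forward(values[::-1])[::-1]


def interpolate(values):
    """Linear interpolation between known neighbours (edge gaps stay None)."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


# Hypothetical sensor readings taken at regular intervals.
readings = [10.0, None, None, 16.0, None]
print(fill_mean(readings))
print(fill_forward(readings))
print(fill_backward(readings))
print(interpolate(readings))
```

For time-stamped data the last three options respect the ordering of the rows, while the column mean ignores it.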
Some useful libraries for Python:
1- NumPy: mathematical functions optimized for large arrays of data
2- pandas: reading, analyzing, and modeling data
3- Matplotlib: plotting library for visualizing data
6- Model performance (fitting)
The relationship between input and output can be:
1- Linear
2- Non-linear
Knowing this relationship helps in choosing the algorithm and the attributes needed in the
predict function.
6- Model performance (fitting)
1- Underfitting:
When: Poor performance on the testing set, poor on the training set.
Why: The features are not enough to capture the relationship between input and output.
How: Add more rows or more features, and tune the hyperparameters.
2- Overfitting:
When: Poor performance on the testing set, good on the training set.
Why: The model memorizes the data it has seen and is unable to generalize to unseen data.
How: Remove complex features and tune the hyperparameters.
3- Balanced:
Good performance on the testing set, good on the training set.
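The three cases can be told apart by comparing scores on the training and testing sets. The sketch below treats a balanced model as one that does well on both sets. It assumes scores between 0 and 1 where higher is better, and the 0.8 cut-off for "good" is an arbitrary value chosen for illustration.

```python
def diagnose_fit(train_score, test_score, good=0.8):
    """Classify a model as underfitting, overfitting, or balanced
    from its training and testing scores (higher is better)."""
    if train_score < good:
        # The model cannot even describe the data it was trained on.
        return "underfitting"
    if test_score < good:
        # Good on seen data, poor on unseen data: it has memorized.
        return "overfitting"
    return "balanced"


print(diagnose_fit(0.55, 0.50))
print(diagnose_fit(0.99, 0.60))
print(diagnose_fit(0.90, 0.88))
```

In practice the gap between the two scores matters as much as the absolute values, so a fixed threshold is only a starting point.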
Regression model performance
Common Techniques for evaluating performance:
Visually observe using Plots
Residual Histograms (negative less than positive)
Evaluate with Metrics like Root Mean Square Error (RMSE)
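As a small worked example of these checks, the sketch below computes residuals (taken here as actual minus predicted, which is an assumption about the sign convention) and the RMSE. The numbers are invented.

```python
import math


def residuals(actual, predicted):
    """Residual = actual value minus predicted value."""
    return [a - p for a, p in zip(actual, predicted)]


def rmse(actual, predicted):
    """Root Mean Square Error: square root of the mean squared residual."""
    errors = residuals(actual, predicted)
    return math.sqrt(sum(e * e for e in errors) / len(errors))


actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 11.0]

errors = residuals(actual, predicted)
negative = sum(1 for e in errors if e < 0)   # model predicted too high
positive = sum(1 for e in errors if e > 0)   # model predicted too low
score = rmse(actual, predicted)
print(errors, negative, positive, score)
```

A histogram of the residuals that is roughly centered on zero suggests the errors are not systematically in one direction.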
Binary & multi-class model performance
Common Techniques for evaluating performance:
Visually observe using Plots
Confusion Matrix
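A confusion matrix counts how often each actual class was predicted as each class. The sketch below builds one in plain Python for a binary case and derives accuracy, precision, and recall from it. It does not use scikit-learn, and the labels and predictions are invented.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (actual label, predicted label) pairs."""
    return Counter(zip(actual, predicted))


actual = ["spam", "spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "ham", "ham", "spam", "spam", "ham"]

matrix = confusion_matrix(actual, predicted)
tp = matrix[("spam", "spam")]   # spam correctly flagged
fn = matrix[("spam", "ham")]    # spam that was missed
fp = matrix[("ham", "spam")]    # ham wrongly flagged
tn = matrix[("ham", "ham")]     # ham correctly passed

accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(tp, fn, fp, tn, accuracy, precision, recall)
```

The same counting works for multi-class problems: the matrix simply gains one row and one column per class.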
Binary model performance (SKLearn)
References:
https://docs.aws.amazon.com/machine-learning/index.html