In these slides I answer basic questions about machine learning, such as: What is machine learning? What are the types of machine learning? How should you deal with data? How do you test model performance?

- 1. Machine learning intro. "The only limit to AI is human imagination." (Chris Duffey) By: Anas Jamil, Mar 2019
- 2. Agenda: 1- AI & ML & DL. 2- Machine learning (ML) introduction. 3- Types of machine learning. 4- ML data types. 5- Working with missing data. 6- Model performance (fitting).
- 3. AI & ML & DL introduction. Artificial Intelligence (AI): the study of how to train computers so that they can do things that, at present, humans do. (www.geeksforgeeks.org) Machine learning (ML): the scientific study of algorithms and statistical models that computer systems use to perform a specific task without using explicit instructions. (Wikipedia) Deep learning (DL): a subset of machine learning, based on multi-layer artificial neural networks, that imitates the workings of the human brain in processing data and creating patterns for use in decision making.
- 5. What is Machine Learning? “Learning is any process by which a system improves performance from experience.” (Herbert Simon) Some use cases. We use ML when: • Human expertise does not exist (navigating on Mars). • Humans can’t explain their expertise (speech recognition). • Models are based on huge amounts of data (genomics).
- 8. 1- Supervised (inductive/class-driven) learning. Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. You have input variables (X) and an output variable (Y), and you use an algorithm to learn the mapping function from the input to the output: Y = f(X). The goal is to approximate the mapping function so well that, when you have new input data (X), you can predict the output (Y) for that data. It is called supervised learning because an algorithm learning from the training dataset can be thought of as a student whose learning process is supervised by a teacher.
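The idea of learning Y = f(X) from example pairs can be sketched in plain Python. This sketch assumes that f is a straight line, y = w*x + b, and fits it with ordinary least squares. The data and the helper names `fit_line` and `predict` are hypothetical and only illustrate the idea.

```python
# Minimal supervised-learning sketch: learn f in Y = f(X) from labelled pairs.
# Assumption (not from the slides): f is a straight line y = w*x + b,
# fitted by ordinary least squares.

def fit_line(xs, ys):
    """Return (w, b) that minimise the squared error of y = w*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


def predict(model, x):
    """Apply the learned mapping to a new, unseen input."""
    w, b = model
    return w * x + b


# Training data: example input-output pairs (the "teacher").
X_train = [1.0, 2.0, 3.0, 4.0, 5.0]
Y_train = [3.0, 5.0, 7.0, 9.0, 11.0]  # generated by y = 2x + 1

model = fit_line(X_train, Y_train)
print(model)                # (2.0, 1.0)
print(predict(model, 6.0))  # 13.0
```

Real data is noisy, so the learned line would only approximate f. The principle is the same: the labelled pairs drive the fit.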
- 9. Supervised learning terms (synonyms grouped together): • Class, target, label • Attribute, feature • Labeled data, dataset, sample data • Sample, example, record, row, instance, observation
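These terms map onto a table as follows: each row is a sample (instance/record), each column except the target is a feature (attribute), and the target column holds the label (class). The dataset below is hypothetical and exists only to show the split into a feature matrix `X` and a target vector `y`.

```python
# Hypothetical labelled dataset: each dict is one row (sample / instance / record).
# Every key except "label" is a feature (attribute); "label" is the target (class).
dataset = [
    {"size_m2": 80, "rooms": 3, "label": "cheap"},
    {"size_m2": 150, "rooms": 5, "label": "expensive"},
    {"size_m2": 60, "rooms": 2, "label": "cheap"},
]

feature_names = [k for k in dataset[0] if k != "label"]
X = [[row[f] for f in feature_names] for row in dataset]  # feature matrix
y = [row["label"] for row in dataset]                      # target vector

print(feature_names)  # ['size_m2', 'rooms']
print(X)              # [[80, 3], [150, 5], [60, 2]]
print(y)              # ['cheap', 'expensive', 'cheap']
```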
- 11. Supervised learning algorithm types: 1- Regression: a supervised learning task whose output is a continuous (numeric) value. Ex: how much is a home worth? 2- Classification: a supervised learning task whose output is one of a set of defined labels (discrete values). A- Binary classification: Yes/No. Ex: is this email spam? B- Multi-class classification: one out of several possible outputs. Ex: what is the weather? Sample algorithms: Support Vector Machine (SVM), Random Forest, Linear Regression, Decision Trees
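A classifier returns one of a fixed set of labels rather than a number. The sketch below uses k-nearest neighbours, which is swapped in for brevity and is not one of the algorithms listed on the slide. The weather data (temperature, humidity) and the name `knn_classify` are hypothetical.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Predict a discrete label by majority vote of the k closest training points.

    `train` is a list of (features, label) pairs; features are tuples of numbers.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical multi-class data: (temperature °C, humidity %) -> weather label.
train = [
    ((30, 20), "sunny"), ((28, 30), "sunny"), ((25, 25), "sunny"),
    ((15, 90), "rainy"), ((12, 85), "rainy"), ((14, 95), "rainy"),
    ((-2, 80), "snowy"), ((-5, 70), "snowy"), ((0, 85), "snowy"),
]

print(knn_classify(train, (27, 22)))  # sunny
print(knn_classify(train, (13, 88)))  # rainy
print(knn_classify(train, (-3, 75)))  # snowy
```

With only two labels (for example "spam" and "not spam") the same function performs binary classification.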
- 12. 2- Unsupervised (data-driven) learning. The training data does not include desired outputs. Unsupervised learning is essentially the opposite of supervised learning: it uses no labels. Instead, the algorithm is fed a lot of data and given the tools to understand its properties. From there, it can learn to group, cluster and/or organize the data so that a human (or another intelligent algorithm) can come in and make sense of the newly organized data.
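Grouping unlabelled data can be illustrated with k-means (Lloyd's algorithm). The slide does not name k-means; it is used here as one common clustering method. The customer points and the naive "first k points" initialisation are assumptions chosen to keep the sketch deterministic.

```python
import math


def kmeans(points, k, iterations=10):
    """Plain k-means on 2-D points. No labels are used, only the points themselves."""
    centroids = list(points[:k])  # naive initialisation: first k points (assumption)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return centroids, clusters


# Hypothetical customers: (visits per month, average spend).
points = [(1, 2), (9, 8), (1.5, 1.8), (8, 8.5), (1.2, 2.2), (9.5, 9)]
centroids, clusters = kmeans(points, 2)
print(clusters)
```

The algorithm finds a low-activity group and a high-activity group without ever being told that these groups exist.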
- 13. Unsupervised learning. Unsupervised learning algorithms fall into two categories: Clustering: discovering the inherent groupings in the data, such as grouping customers by purchasing behavior. Association: discovering rules that describe large portions of the data, such as "people who buy X also tend to buy Y".
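An association rule such as "people who buy X also buy Y" is usually scored by support and confidence. The baskets below are hypothetical. Confidence is computed from counts to estimate P(Y | X).

```python
# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= b for b in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Among baskets containing X, the fraction that also contain Y."""
    with_x = [b for b in baskets if antecedent <= b]
    return sum(consequent <= b for b in with_x) / len(with_x)


print(support({"bread"}, baskets))                 # 0.8
print(confidence({"bread"}, {"butter"}, baskets))  # 0.75
```

Here the rule "bread → butter" holds in 3 of the 4 bread baskets. Algorithms such as Apriori search for all rules whose support and confidence exceed chosen thresholds.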
- 14. Unsupervised learning: Dimensionality Reduction (DR). Dimensionality reduction has two components: Feature extraction: transforms data in a high-dimensional space into a space with fewer dimensions. For example, if features a, b, c and d predict e, then a and b might be combined into one new feature ab, so that ab, c and d predict e. Feature selection: finds a smaller subset of the original features that can still be used to model the problem, for example by dropping feature c (c = 0). Sample algorithms: FastText, BlazingText, Principal Component Analysis (PCA)
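One simple form of feature selection drops features that never vary, like the constant feature c in the slide's example. The sketch below uses a variance threshold, which is only one of many possible selection criteria. The rows are hypothetical. Feature extraction methods such as PCA instead build new combined features, which this sketch does not attempt.

```python
from statistics import pvariance


def select_features(rows, names, threshold=0.0):
    """Keep only features whose population variance exceeds `threshold`.

    A feature that never changes carries no information for the model.
    """
    columns = list(zip(*rows))
    keep = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    kept_names = [names[i] for i in keep]
    reduced = [[row[i] for i in keep] for row in rows]
    return kept_names, reduced


names = ["a", "b", "c", "d"]
rows = [
    [1.0, 4.0, 0.0, 7.0],
    [2.0, 3.0, 0.0, 5.0],
    [3.0, 8.0, 0.0, 6.0],
]
kept, reduced = select_features(rows, names)
print(kept)     # ['a', 'b', 'd']
print(reduced)  # [[1.0, 4.0, 7.0], [2.0, 3.0, 5.0], [3.0, 8.0, 6.0]]
```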
- 15. 3- Reinforcement learning (RL) It is about taking suitable action to maximize reward in a particular situation.
- 16. 3- Reinforcement learning (RL). Types of reinforcement: there are two types. 1- Positive: positive reinforcement occurs when an event that happens because of a particular behavior increases the strength and frequency of that behavior. In other words, it has a positive effect. 2- Negative: negative reinforcement is the strengthening of a behavior because a negative condition is stopped or avoided. Use cases of RL: real-time decisions, game AI, robot navigation, self-driving cars
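"Taking suitable action to maximize reward" can be made concrete with tabular Q-learning. The slides do not name an algorithm; Q-learning is one standard choice. The corridor environment, reward of 1.0 at the goal, and the hyperparameters (alpha, gamma, epsilon, episodes) are all assumptions. The agent starts in cell 0 and receives a reward only when it reaches cell 4.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal (hypothetical environment)
ACTIONS = (-1, +1)  # step left or step right


def step(state, action):
    """Move, stay inside the corridor, and give reward 1.0 on reaching the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                # Exploit the best known action, breaking ties at random.
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            nxt, reward, done = step(state, action)
            future = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            # Nudge Q(s, a) toward reward + discounted value of the best next action.
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = nxt
    return q


q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)  # learned action per cell (+1 means "move right")
```

After training, the greedy policy should move right in every cell, because that is the action that leads to the reward soonest.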
- 17. Data types from an ML perspective: 1- Numerical data 2- Categorical data 3- Time series data 4- Text
- 18. 1- Numerical Data: Numerical data is any data whose data points are numbers. Statisticians also call numerical data quantitative data. This data has meaning as a measurement, such as house prices.
- 19. 2- Categorical Data: Categorical data represents characteristics, such as a hockey player’s position. Categorical data can take numerical values; for example, we might use 1 for the colour red and 2 for blue. But these numbers have no mathematical meaning (blue is not "twice" red).
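Because codes like red = 1 and blue = 2 imply a false ordering, categories are commonly one-hot encoded: each category gets its own 0/1 column. The sketch is plain Python. The colour list and the helper name `one_hot` are hypothetical.

```python
def one_hot(values):
    """Encode each category as a 0/1 vector, so no fake numeric order is implied."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        vec = [0] * len(categories)
        vec[index[v]] = 1
        vectors.append(vec)
    return categories, vectors


colours = ["red", "blue", "red", "green"]
categories, encoded = one_hot(colours)
print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```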
- 20. 3- Time Series Data: Time series data is a sequence of numbers collected at regular intervals over some period of time. It is especially important in fields such as finance. Each value has a temporal value attached to it, such as a date or a timestamp, so you can look for trends over time.
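A simple way to look for a trend in a regularly spaced series is a moving average, which smooths out day-to-day noise. The daily prices below are hypothetical, and the window size of 3 is an arbitrary choice.

```python
def moving_average(series, window):
    """Average each run of `window` consecutive observations to smooth noise."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


# Hypothetical daily closing prices, one value per day (a regular interval).
prices = [10, 12, 11, 13, 15, 14, 16]
print(moving_average(prices, 3))  # [11.0, 12.0, 13.0, 14.0, 15.0]
```

The raw prices move up and down, but the smoothed values rise steadily, exposing the upward trend.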
- 21. 4- Text: Text data is basically just words. Usually the first thing you do with text is turn it into numbers, for example using the bag-of-words formulation. Preprocessing can include stemming, lowercasing, etc.
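- 22. 4- Text: Example sentences: "This is working, not disappointed." and "This is not working. Disappointed." Tokenization: [ ‘disappointed’, ’is’, ’not’, ’working’, ’this’ ]. Both sentences produce the same tokens, so a plain bag of words cannot tell them apart.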
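A bag of words counts tokens and ignores their order. The sketch below applies simple lowercasing and regex tokenization (the regex is an assumption) to the slide's two sentences and shows that they produce identical bags.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase, then keep runs of letters as tokens (punctuation is dropped)."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Token counts with word order thrown away."""
    return Counter(tokenize(text))


a = "This is working not disappointed"
b = "This is not working. disappointed"
print(tokenize(b))                          # ['this', 'is', 'not', 'working', 'disappointed']
print(bag_of_words(a) == bag_of_words(b))   # True
```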
- 24. 4- Text: Orthogonal sparse bigrams (OSB)
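Orthogonal sparse bigrams keep some word-order information. This sketch follows my understanding of the OSB transformation in the Amazon Machine Learning documentation: slide a window of size n over the text, pair the first word of each window with every later word in it, and add one extra underscore for each skipped word. Treat the details as an assumption, not an exact reproduction of that service.

```python
def osb(tokens, window=4):
    """Orthogonal sparse bigrams: pair each word with the next window-1 words.

    The number of underscores equals the distance between the two words.
    """
    pairs = []
    for i, first in enumerate(tokens):
        for gap in range(1, window):
            if i + gap < len(tokens):
                pairs.append(first + "_" * gap + tokens[i + gap])
    return pairs


a = "this is working not disappointed".split()
b = "this is not working disappointed".split()
print(osb(a, window=2))            # ['this_is', 'is_working', 'working_not', 'not_disappointed']
print(set(osb(a)) == set(osb(b)))  # False
```

Unlike the plain bag of words, the OSB features separate the two sentences: `working_not` appears only in the first and `not_working` only in the second.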
- 25. 5- Working with missing data: If a row has missing values, you can: 1- Delete the row (if that data is not relevant). 2- Impute the missing data: A- If the values are related to each other, replace the missing value with the mean of that column. B- If the rows are independent, take the value from another row. C- If the data is ordered by timestamp, use: 1- interpolation, 2- backward fill, 3- forward fill.
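The options above can be sketched in plain Python, with `None` marking a missing value. The readings are hypothetical, and the interpolation assumes equally spaced timestamps. Option 2B (copying from another row) is not shown.

```python
def drop_missing(rows):
    """Option 1: delete every row that contains a missing value."""
    return [r for r in rows if None not in r]


def fill_mean(column):
    """Option 2A: replace gaps with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def fill_forward(column):
    """Option 2C: carry the last seen value forward in time."""
    out, last = [], None
    for v in column:
        last = v if v is not None else last
        out.append(last)
    return out


def fill_backward(column):
    """Option 2C: pull the next seen value backward in time."""
    return fill_forward(column[::-1])[::-1]


def interpolate(column):
    """Option 2C: draw a straight line between the nearest known neighbours.

    Assumes equal spacing; leading or trailing gaps are left as None.
    """
    out = list(column)
    known = [i for i, v in enumerate(column) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            out[i] = column[left] + (column[right] - column[left]) * (i - left) / (right - left)
    return out


readings = [12.0, None, 18.0, None, None, 21.0]  # hypothetical hourly temperatures
print(fill_mean(readings))      # [12.0, 17.0, 18.0, 17.0, 17.0, 21.0]
print(fill_forward(readings))   # [12.0, 12.0, 18.0, 18.0, 18.0, 21.0]
print(fill_backward(readings))  # [12.0, 18.0, 18.0, 21.0, 21.0, 21.0]
print(interpolate(readings))    # [12.0, 15.0, 18.0, 19.0, 20.0, 21.0]
```

In pandas, the same ideas are available as `DataFrame.dropna()`, `fillna(df.mean())`, `ffill()`, `bfill()` and `interpolate()`.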
- 32. 5- Working with missing data: Some useful Python libraries: 1- NumPy: mathematical functions optimized for large arrays of data. 2- Pandas: reading, analyzing and manipulating data. 3- Matplotlib: a plotting library for visualizing data.
- 33. 6- Model performance (fitting): The relationship between input and output can be: 1- Linear 2- Non-linear. Knowing this relationship helps in choosing the algorithm and the attributes needed in the predict function.
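Why does the shape of the relationship matter? The Pearson correlation coefficient, which only measures linear association, can make this concrete. Here it is computed for a linear and a quadratic relationship on hypothetical points. For y = x², y is completely determined by x, yet r is 0, so a purely linear model or check would miss the pattern.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: how well a straight line describes (x, y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]  # y = 2x + 1
quadratic = [x * x for x in xs]   # y = x^2
print(round(pearson_r(xs, linear), 6))     # 1.0
print(round(pearson_r(xs, quadratic), 6))  # 0.0
```

In a case like this, you could add x² as a feature or choose a non-linear algorithm.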
- 34. 6- Model performance (fitting) 1- Underfitting: When: poor performance on the testing set and poor on the training set. Why: the features are not enough to capture the relationship between input and output. How: add more rows or more features, and optimize the hyperparameters. 2- Overfitting: When: poor performance on the testing set but good on the training set. Why: the model memorizes the data it has seen and is unable to generalize to unseen data. How: remove complex features and optimize the hyperparameters. 3- Balanced: good performance on both the testing set and the training set.
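Underfitting and overfitting can be seen by comparing training error with testing error. This sketch uses k-nearest-neighbour regression, an assumed model not named on the slides, on hypothetical noisy samples of y = 2x. With k = 1 the model memorizes the training points. With k equal to the whole training set it always predicts the overall mean. Ties in distance are broken by training order.

```python
def knn_regress(train, x, k):
    """Predict y at x as the mean y of the k nearest training points (1-D)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, points, k):
    """Mean squared error of the k-NN model on `points`."""
    return sum((knn_regress(train, x, k) - y) ** 2 for x, y in points) / len(points)


# Hypothetical data: training labels carry +/-0.5 noise, test points lie on y = 2x.
train = [(0, 0.5), (1, 1.5), (2, 4.5), (3, 5.5), (4, 8.5), (5, 9.5)]
test = [(0.5, 1.0), (2.5, 5.0), (4.5, 9.0)]

for k in (1, 2, len(train)):
    print(f"k={k}: train MSE={mse(train, train, k):.3f}, test MSE={mse(train, test, k):.3f}")
# k=1: train MSE=0.000, test MSE=0.250   (overfitting: perfect on training data)
# k=2: train MSE=0.917, test MSE=0.000   (balanced on this data)
# k=6: train MSE=10.917, test MSE=10.667 (underfitting: poor on both)
```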
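- 35. Regression model performance: Common techniques for evaluating performance: visual inspection using plots; residual histograms (residual = actual minus predicted, so a negative residual means the model over-predicted, and a histogram centred on zero suggests no systematic bias); metrics such as Root Mean Square Error (RMSE).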
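Residuals and RMSE take only a few lines to compute. The house prices and predictions below are hypothetical. Residuals are defined as actual minus predicted, matching the sign convention in the text.

```python
import math


def residuals(actual, predicted):
    """Residual = actual - predicted; a negative value means the model over-predicted."""
    return [a - p for a, p in zip(actual, predicted)]


def rmse(actual, predicted):
    """Root mean square error, in the same units as the target."""
    res = residuals(actual, predicted)
    return math.sqrt(sum(r * r for r in res) / len(res))


actual = [200, 250, 300, 350]     # hypothetical house prices (thousands)
predicted = [210, 240, 310, 340]
print(residuals(actual, predicted))  # [-10, 10, -10, 10]
print(rmse(actual, predicted))       # 10.0
```

Plotting these residuals as a histogram would show whether errors are centred on zero.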
- 36. Binary & multi-class model performance: Common techniques for evaluating performance: visual inspection using plots; confusion matrix.
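A confusion matrix counts how often each true label was predicted as each label. This sketch uses the same convention as `sklearn.metrics.confusion_matrix`: rows are true labels and columns are predicted labels. It also works for more than two classes. The spam/ham predictions are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """matrix[i][j] = number of samples with true label labels[i] predicted as labels[j]."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(t, p)] for p in labels] for t in labels]


actual = ["spam", "spam", "ham", "ham", "ham", "spam"]
predicted = ["spam", "ham", "ham", "spam", "ham", "spam"]
m = confusion_matrix(actual, predicted, labels=["spam", "ham"])
print(m)  # [[2, 1], [1, 2]]

# Binary metrics, treating "spam" as the positive class.
tp, fn = m[0]
fp, tn = m[1]
accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(accuracy, precision, recall)  # each is 2/3 here
```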
- 37. Binary model performance (SKLearn)