Introduction to Machine Learning


This document provides an introduction to machine learning, including definitions of key terminology such as features, training data, test data, validation data, vectors, similarity, supervised learning, unsupervised learning, dimensionality reduction, and overfitting. It then outlines the typical steps in a model building cycle, including data collection, cleaning, preprocessing, sampling, model building, deployment, and improvement. Several common supervised learning methods are listed, such as linear regression, logistic regression, decision trees, and ensemble methods. Unsupervised learning methods like k-means clustering and hierarchical clustering are also introduced. The document concludes by discussing similarity measures, dimensionality reduction techniques, recommendations, text mining, and the vector space model.

1
Introduction to Machine Learning
by Shiva Dasharathi

2
Terminology
Machine Learning: In simple terms, machine learning is a set of pattern-learning techniques.
- These techniques are based on statistical assumptions about the data.
- Conceptually, these techniques can be applied to various forms of data.
- Machine learning models are built on training data (a portion of the actual data) and are then used to predict patterns in unseen data.
Statistical Model: The outcome of a machine learning process is an entity, a model, which is often called a statistical model.
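A minimal Python sketch of the train-then-predict idea above, assuming NumPy is available. Here the "statistical model" is simply the coefficient vector of a multiple linear regression fitted by least squares; the data and the helper names fit_linear_model and predict are hypothetical, chosen only for illustration.

```python
import numpy as np

def fit_linear_model(X, y):
    """Learn an intercept plus one coefficient per feature by ordinary least squares."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend a column of 1s for the intercept
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply the learned model to new (unseen) rows."""
    return coef[0] + X @ coef[1:]

# Hypothetical training data generated from y = 1 + 2*x1 + 3*x2
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y_train = 1 + 2 * X_train[:, 0] + 3 * X_train[:, 1]

coef = fit_linear_model(X_train, y_train)   # the "statistical model"
X_unseen = np.array([[3.0, 2.0]])
print(predict(coef, X_unseen))              # approximately [13.]
```

Real data is noisy, so the learned coefficients would only approximate the underlying relationship; the noise-free data here just makes the result easy to check.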

3
Terminology (cont.)
Feature (or) Dimension: Feature, dimension, variable, attribute and property all refer to a characteristic of the data.
Ex: {age, height, gender} are features of a user.
Training Data (60-80%): Sampled data used for building the model.
Test Data (~20%): Sampled data used for testing the model.
Validation Data (~20%): Sampled (unseen) data used for validating the model.
The three shares together should add up to 100% of the data, e.g. a 60/20/20 split.
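One way to produce the three sets is a random shuffle followed by slicing. The sketch below assumes NumPy and a 60/20/20 split; train_val_test_split is a hypothetical helper, not a library function (scikit-learn users would typically call sklearn.model_selection.train_test_split twice instead).

```python
import numpy as np

def train_val_test_split(X, y, train=0.6, val=0.2, seed=0):
    """Randomly shuffle rows, then cut them into train/validation/test sets.
    Whatever remains after the train and validation shares becomes the test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))            # random order of row indices
    n_train = round(train * len(X))
    n_val = round(val * len(X))
    tr = idx[:n_train]
    va = idx[n_train:n_train + n_val]
    te = idx[n_train + n_val:]
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])

X = np.arange(100).reshape(50, 2)  # hypothetical data: 50 rows, 2 features
y = np.arange(50)
(train_X, train_y), (val_X, val_y), (test_X, test_y) = train_val_test_split(X, y)
print(len(train_X), len(val_X), len(test_X))  # 30 10 10
```

Fixing the seed makes the split reproducible, which helps when comparing models trained on the same partition.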

4
Terminology (cont.)
Vector: A vector is a multi-dimensional representation of a data point.
- Each row in a data matrix is a vector.
Similarity: A measure of how close two data points are in the vector space model.
Ex: Euclidean distance, cosine similarity, etc.
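A short NumPy sketch of the two example measures, treating each row of a small hypothetical data matrix as a vector. The function names are illustrative, not from the slides.

```python
import numpy as np

# Hypothetical data matrix: each row is one data point (a vector), each column a feature
X = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [3.0, 0.0, 1.0],
])

def euclidean_distance(a, b):
    # Straight-line distance between two vectors; 0 means identical
    return float(np.linalg.norm(a - b))

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors; 1 means same direction
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(euclidean_distance(X[0], X[1]))  # sqrt(14), about 3.742
print(cosine_similarity(X[0], X[1]))   # about 1.0
```

Rows 0 and 1 point in the same direction but differ in length, so they are "far apart" by Euclidean distance yet maximally similar by cosine; which measure fits depends on whether magnitude matters for the problem.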

5
Terminology (cont.)
Supervised Learning: Modeling techniques used when the data is labeled, i.e. the outcome to predict is known for the training examples.
Ex: Customer segmentation using classification
Unsupervised Learning: Modeling techniques used when the data is not labeled; they rely on structure that occurs naturally in the data.
Ex: Customer segmentation using clustering
Dimensionality Reduction: Techniques that reduce M dimensions to N dimensions (M > N)
- while still explaining most of the variation in the data,
- so that computation and interpretation become easier.

6
Terminology (cont.)
Overfitting: If a model is tuned too closely to the training data, it fits the noise in that data and will not predict unseen data accurately. This situation is called overfitting.
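A small NumPy demonstration of overfitting, assuming hypothetical noisy data drawn from a straight line. A degree-1 polynomial matches the true trend; a degree-9 polynomial has enough freedom to pass through all 10 training points, so its training error drops to about zero while its error on fresh test points is typically much larger. Exact numbers depend on the random seed.

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error between observed and predicted values
    return float(np.mean((y_true - y_pred) ** 2))

rng = np.random.default_rng(0)
# Hypothetical data: the line y = 2x plus random noise
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + rng.normal(scale=0.2, size=x_train.size)
x_test = rng.uniform(0, 1, size=50)
y_test = 2 * x_test + rng.normal(scale=0.2, size=x_test.size)

results = {}
for degree in (1, 9):
    # Least-squares polynomial fit of the given degree
    model = np.polynomial.Polynomial.fit(x_train, y_train, deg=degree)
    results[degree] = (mse(y_train, model(x_train)), mse(y_test, model(x_test)))
    print(f"degree {degree}: train MSE = {results[degree][0]:.4f}, "
          f"test MSE = {results[degree][1]:.4f}")
```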

7
Model building cycle
Typical steps of a model building cycle include, but are not limited to:
1. Data collection: collecting data from sources
2. Data cleaning: dealing with missing values, etc.
3. Pre-processing: handling outliers and applying transformations
4. Random sampling: splitting into train, test and validation sets
5. Model building: an iterative process
   1. Feature selection: selecting the subset of features that best explains the data
   2. Validation: finalizing model summaries
   3. Model selection: comparing models and choosing the final model
6. Model deployment: predicting unseen data
7. Feedback and model improvement
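Steps 1 to 5 of the cycle can be sketched end to end on synthetic data, assuming NumPy. The specific choices (mean imputation, clipping at 3 standard deviations, a 60/20/20 split, least-squares regression, choosing a feature subset by validation error) are illustrative assumptions, not prescriptions from the slides; deployment and feedback are outside the scope of the sketch.

```python
import numpy as np

rng = np.random.default_rng(42)

# 1. Data collection (hypothetical): 3 candidate features, only the first two matter
n = 200
X = rng.normal(size=(n, 3))
y = 4 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=n)
X[rng.choice(n, 10, replace=False), 2] = np.nan  # simulate missing values

# 2. Data cleaning: replace missing values with the column mean
col_means = np.nanmean(X, axis=0)
X = np.where(np.isnan(X), col_means, X)

# 3. Pre-processing: clip outliers to 3 standard deviations, then standardise.
#    For brevity the statistics use all rows; in practice use the training rows only.
mu, sd = X.mean(axis=0), X.std(axis=0)
X = (np.clip(X, mu - 3 * sd, mu + 3 * sd) - mu) / sd

# 4. Random sampling: 60/20/20 train/validation/test split
idx = rng.permutation(n)
tr, va, te = idx[:120], idx[120:160], idx[160:]

def fit(Xs, ys):
    # Least-squares linear regression with an intercept column
    A = np.column_stack([np.ones(len(Xs)), Xs])
    return np.linalg.lstsq(A, ys, rcond=None)[0]

def mse(coef, Xs, ys):
    pred = coef[0] + Xs @ coef[1:]
    return float(np.mean((ys - pred) ** 2))

# 5. Model building: compare feature subsets on the validation set, keep the best
scores = {}
for s in ([0], [0, 1], [0, 1, 2]):
    coef = fit(X[tr][:, s], y[tr])
    scores[tuple(s)] = mse(coef, X[va][:, s], y[va])
best = min(scores, key=scores.get)

# Final check on the untouched test set
final = fit(X[tr][:, list(best)], y[tr])
print("best subset:", best, "test MSE:", round(mse(final, X[te][:, list(best)], y[te]), 3))
```

Keeping the test rows out of every decision in step 5 is what lets the final test error serve as an honest estimate for unseen data.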

8
Supervised Learning
A few supervised learning methods to explore:
1. Multiple Linear Regression
2. Logistic Regression
3. Decision Trees
   1. CART
   2. CHAID
4. Ensemble Methods
   1. Bagging
   2. Boosting
5. KNN (k-nearest neighbors)
6. Naïve Bayes
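As a taste of one listed method, here is a minimal k-nearest-neighbors classifier in NumPy: each query point receives the majority label of its k closest training points by Euclidean distance. The data and the function name knn_predict are hypothetical; library implementations add efficient neighbor search and tie-breaking rules this sketch omits.

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, X_query, k=3):
    """Label each query point by majority vote among its k nearest training points."""
    predictions = []
    for q in X_query:
        distances = np.linalg.norm(X_train - q, axis=1)  # Euclidean distance to every training row
        nearest = np.argsort(distances)[:k]              # indices of the k closest rows
        votes = Counter(y_train[i] for i in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return predictions

# Hypothetical labeled data: two well-separated groups
X_train = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]], dtype=float)
y_train = ["A", "A", "A", "B", "B", "B"]
X_query = np.array([[1.5, 1.5], [8.5, 8.5]])
print(knn_predict(X_train, y_train, X_query))  # ['A', 'B']
```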

9
Unsupervised Learning
A few unsupervised learning methods to explore:
1. K-means clustering
2. Hierarchical clustering
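A compact k-means (Lloyd's algorithm) sketch in NumPy: alternately assign each point to its nearest centroid and move each centroid to the mean of its points until nothing changes. The deterministic farthest-point initialization is a simplifying assumption made here; k-means++ or several random restarts are more common in practice. No labels are used, matching the unsupervised setting.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Plain k-means on the rows of X; returns (labels, centroids)."""
    # Farthest-point initialisation: start from the first row, then repeatedly
    # add the row farthest from all centroids chosen so far.
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d)])
    centroids = np.array(centroids, dtype=float)

    for _ in range(n_iter):
        # Assignment step: each row goes to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Update step: move each centroid to the mean of its assigned rows
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical unlabeled data with two obvious groups
X = np.array([[1, 1], [1.5, 2], [2, 1], [8, 8], [8.5, 9], [9, 8]], dtype=float)
labels, centroids = kmeans(X, k=2)
print(labels)  # [0 0 0 1 1 1]
```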

10
Similarity Measures
A few similarity measures to explore:
1. Euclidean distance
2. Cosine similarity
3. Pearson correlation
4. Jaccard similarity
5. Tanimoto distance
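A sketch of three of the listed measures (Pearson correlation, Jaccard similarity, Tanimoto), assuming NumPy. The Tanimoto coefficient is computed as a·b / (|a|² + |b|² − a·b), which reduces to Jaccard for 0/1 vectors; defining Tanimoto distance as 1 minus that coefficient is one common convention and is an assumption here. Function names are illustrative.

```python
import numpy as np

def pearson_correlation(a, b):
    # Linear correlation in [-1, 1]; insensitive to differences in scale and offset
    return float(np.corrcoef(a, b)[0, 1])

def jaccard_similarity(s, t):
    # |intersection| / |union| of two sets (two empty sets are treated as identical)
    s, t = set(s), set(t)
    return len(s & t) / len(s | t) if s | t else 1.0

def tanimoto_coefficient(a, b):
    # a.b / (|a|^2 + |b|^2 - a.b); equals Jaccard when a and b are 0/1 vectors
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    dot = a @ b
    return float(dot / (a @ a + b @ b - dot))

def tanimoto_distance(a, b):
    # Assumed convention: distance = 1 - coefficient
    return 1.0 - tanimoto_coefficient(a, b)

print(pearson_correlation([1, 2, 3], [2, 4, 6]))              # about 1.0
print(jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"}))    # 0.5
print(tanimoto_coefficient([1, 0, 1, 1], [1, 1, 0, 1]))        # 0.5
```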

11
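Dimensionality Reduction
A few dimensionality reduction methods to explore:
- PCA (Principal Component Analysis)
- Factor Analysis
- SVD (Singular Value Decomposition)
- Dealing with sparsity:
   - Min Hashing (MinHash)
   - LSH (Locality-Sensitive Hashing)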
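PCA and SVD are closely linked: the principal directions of mean-centered data are the right singular vectors from its SVD. A minimal NumPy sketch under that approach, reducing hypothetical 2-D data (M = 2) to one dimension (N = 1) and reporting the share of variance the kept component explains.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto their top principal components using SVD."""
    X_centered = X - X.mean(axis=0)          # PCA works on mean-centred columns
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]           # rows of Vt are the principal directions
    variance = S ** 2 / (len(X) - 1)         # variance captured along each direction
    ratio = variance / variance.sum()        # explained-variance ratio
    return X_centered @ components.T, ratio[:n_components]

# Hypothetical 2-D data lying almost on the line y = 2x
t = np.linspace(-1, 1, 20)
X = np.column_stack([t, 2 * t + 0.01 * np.sin(7 * t)])
projected, ratio = pca(X, n_components=1)
print(projected.shape, ratio)  # (20, 1) and a ratio very close to 1
```

Because almost all of the variation lies along one direction, a single component explains nearly all of it, which is exactly the situation in which reducing M dimensions to N loses little information.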

12
Recommendations
- Collaborative filtering
   - Item-based
   - User-based
   - Slope One
- Challenges
   - Cold-start problem
   - Curse of dimensionality
   - Outliers
   - Frequent items / association rules
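Of the collaborative-filtering approaches listed, Slope One is the simplest to write down. The sketch below follows the weighted Slope One variant: for each item the user has rated, take the average rating difference between the target item and that item across other users, add it to the user's rating, and weight by how many users rated both. The ratings and the function name are hypothetical. Note that it returns None when no co-ratings exist, a small instance of the cold-start problem.

```python
from collections import defaultdict

def slope_one_predict(ratings, user, target):
    """Weighted Slope One prediction of `user`'s rating for item `target`.

    ratings: {user: {item: rating}}
    """
    diff_sum = defaultdict(float)  # sum over users of (rating of target - rating of item)
    count = defaultdict(int)       # number of users who rated both target and item
    for other in ratings.values():
        if target not in other:
            continue
        for item, r in other.items():
            if item != target:
                diff_sum[item] += other[target] - r
                count[item] += 1

    numerator, denominator = 0.0, 0
    for item, r in ratings[user].items():
        if count.get(item, 0) > 0:
            deviation = diff_sum[item] / count[item]    # average (target - item) difference
            numerator += (r + deviation) * count[item]  # weight by number of co-raters
            denominator += count[item]
    return numerator / denominator if denominator else None

# Hypothetical user-item ratings
ratings = {
    "John": {"A": 5, "B": 3, "C": 2},
    "Mark": {"A": 3, "B": 4},
    "Lucy": {"B": 2, "C": 5},
}
print(slope_one_predict(ratings, "Lucy", "A"))  # 13/3, about 4.333
```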

13
Text Mining
1. NLP approach: building language-dependent models
2. Machine learning approach: documents are converted into a vector space model, and machine learning techniques are applied to them to solve problems.
Vector space model
- Documents => data points
- Words in the documents => features
- Feature-document pairs are stored as <feature, document, TF*IDF> triples, where
   - TF = normalized term frequency
   - IDF = inverse document frequency
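A minimal sketch of building the <feature, document, TF*IDF> triples in plain Python. Several TF and IDF variants exist (raw counts, log-scaled TF, smoothed IDF, and others); this one assumes TF = term count divided by document length and IDF = log(N / document frequency), and uses a naive lowercase whitespace tokenizer. The corpus and function name are hypothetical.

```python
import math
from collections import Counter

def tf_idf_triples(documents):
    """Return (feature, document_id, TF*IDF) triples for a list of raw text documents.

    TF  = count of the term in the document / number of terms in the document
    IDF = log(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in documents]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term at most once per document

    triples = []
    for doc_id, tokens in enumerate(tokenized):
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            triples.append((term, doc_id, tf * idf))
    return triples

docs = ["the cat sat", "the dog sat", "the cat ran"]  # hypothetical corpus
for triple in tf_idf_triples(docs):
    print(triple)
```

A word such as "the" that occurs in every document gets IDF = log(1) = 0, so it carries no weight, while a word unique to one document ("dog") gets the highest weight.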

14
Thank you!
