Introduction to Machine Learning
Dr. Amit M. Joshi
Assistant Professor,
Dept. of Electronics & Communication
Malaviya National Institute of Technology
Jaipur (Rajasthan)
Contents
❖ Introduction to Data Mining
❖ Need of Data Mining
❖ Knowledge Discovery Process
❖ Machine Learning Introduction
❖ Types of Machine Learning
❖ Applications
Introduction
• Data mining is the process of finding patterns in large data sets (or Big Data) using
techniques from artificial intelligence, machine learning, statistics and
database systems.
• The overall goal of the data mining process is to extract useful information
from a data set and transform it into an understandable structure for further use.
• Data mining is the analysis step of the “Knowledge Discovery in Databases”
process, known as KDD.
• The objective is to discover patterns rather than the data itself.
We are data rich but information poor
Why is Data Mining required?
• The volume of information we must handle is increasing every day:
business transactions, scientific data, sensor data, pictures,
videos, etc. So we need a system capable of extracting the
essence of the available information and automatically generating
reports, views or summaries of the data for better decision-making.
• Automatic summarization of data.
• Extracting the essence of the information stored.
• Discovering patterns in raw data.
• Data mining, also known as Knowledge Discovery in Databases,
refers to the nontrivial extraction of implicit, previously unknown,
and potentially useful information from data stored in databases.
Data Mining?
• The actual data mining task is the automatic or semi-automatic analysis of
large quantities of data.
• The main purpose is to extract previously unknown, interesting patterns such as
groups of records (cluster analysis), unusual records (anomaly detection), and
dependencies (association rule mining).
• These methods can also be useful for creating new hypotheses to test
against a larger data population.
• It helps to infer new information from already collected data.
Data mining—searching for knowledge
in your data.
Why Data Mining
• The fast-growing, huge amount of data, collected and stored in many large
data repositories, has far exceeded our human ability to understand it without
powerful tools.
• As a result, data collected in large data repositories become “data tombs”: data
archives that are seldom visited.
• Data are collected and stored at enormous speeds (GB/hour)
➢ e.g. remote sensing on satellites
• Data mining may help scientists
➢ in classifying and segmenting the data
➢ in hypothesis formation
Basics of Data Mining
• Data analysis is more in line with standard statistical software (e.g. web statistics).
Such tools usually present information about subsets and relations within the
recorded dataset (search engine usage, average visit time).
• Data mining implies that the software applies more intelligence than simple
grouping and partitioning of the data, in order to infer new information.
• Data mining is the non-trivial process of identifying patterns that are
➢ valid
➢ novel
➢ implicit
➢ previously unknown
➢ potentially useful
➢ and ultimately understandable
Steps of KDD
1. Data Cleaning: Data cleaning is defined as the removal of noisy
and irrelevant data from the collection.
   1. Cleaning in case of missing values.
   2. Cleaning noisy data, where noise is a random error or variance.
   3. Cleaning with data discrepancy detection and data
   transformation tools.
2. Data Integration: Data integration is defined as combining heterogeneous data
from multiple sources into a common
source (data warehouse).
   1. Data integration using data migration tools.
   2. Data integration using data synchronization tools.
   3. Data integration using the ETL (Extract-Transform-Load)
   process.
Steps of KDD
3. Data Selection: Data selection is defined as the process where
data relevant to the analysis is decided and retrieved from the data
collection.
   1. Data selection using neural networks.
   2. Data selection using decision trees.
   3. Data selection using Naive Bayes.
   4. Data selection using clustering, regression, etc.
4. Data Transformation: Data transformation is defined as the
process of transforming data into the appropriate form required by
the mining procedure. Data transformation is a two-step process:
   1. Data mapping: Assigning elements from the source base to the
   destination to capture transformations.
   2. Code generation: Creation of the actual transformation
   program.
Steps of KDD
5. Data Mining: Data mining is defined as the application of clever techniques
to extract potentially useful patterns.
   1. Transforms task-relevant data into patterns.
   2. Decides the purpose of the model using classification or characterization.
6. Pattern Evaluation: Pattern evaluation is defined as identifying the truly
interesting patterns representing knowledge, based on given measures.
   1. Finds the interestingness score of each pattern.
   2. Uses summarization and visualization to make the data understandable
   to the user.
7. Knowledge Representation: Knowledge representation is defined as a
technique which utilizes visualization tools to represent data mining
results.
   1. Generate reports.
   2. Generate tables.
   3. Generate discriminant rules, classification rules, characterization
   rules, etc.
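The KDD steps above can be sketched as a chain of small functions. This is only an illustrative toy, not a method from the slides: the records, the field names (`id`, `age`, `spend`, `region`) and every function name are hypothetical, and the "mining" step is a trivial per-group average standing in for a real algorithm.

```python
# Hypothetical toy records from two sources; None marks a missing value.
sales = [
    {"id": 1, "age": 25, "spend": 120.0},
    {"id": 2, "age": None, "spend": 80.0},   # missing age
    {"id": 3, "age": 40, "spend": 300.0},
    {"id": 4, "age": 31, "spend": 150.0},
]
regions = {1: "north", 2: "south", 3: "north", 4: "south"}


def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def integrate(records, region_by_id):
    """Data integration: combine two sources on the shared id."""
    return [{**r, "region": region_by_id[r["id"]]} for r in records]


def select(records, fields):
    """Data selection: keep only the attributes relevant to the analysis."""
    return [{f: r[f] for f in fields} for r in records]


def transform(records, field):
    """Data transformation: min-max scale one numeric field to [0, 1].
    Assumes the field takes at least two distinct values."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    return [{**r, field: (r[field] - lo) / (hi - lo)} for r in records]


def mine(records):
    """Data mining (a trivial stand-in): average scaled spend per region."""
    groups = {}
    for r in records:
        groups.setdefault(r["region"], []).append(r["spend"])
    return {region: sum(v) / len(v) for region, v in groups.items()}


prepared = transform(
    select(integrate(clean(sales), regions), ["region", "spend"]), "spend"
)
patterns = mine(prepared)
print(patterns)  # pattern evaluation and presentation would follow here
```

Each function corresponds to one step in the list, so a step can be swapped for a more realistic implementation without touching the others.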
Primary Data Mining Tasks
In general, data mining tasks can be classified into two categories:
descriptive and predictive.
• Predictive methods use some variables to predict unknown or
future values of other variables.
Ex: Classification, Regression
• Descriptive methods characterize the general properties of the
data in the database.
Ex: Association Rule Discovery, Clustering
Moving Towards Machine Learning?
• Machine learning is programming computers to optimize a
performance criterion using example data or past experience.
• Learning is used when:
– Human expertise does not exist
– Humans are unable to explain their expertise
– Solution changes in time
– Solution needs to be adapted to particular cases
Why do we need Machine Learning?
• Some tasks cannot be defined well, except by examples (e.g.
recognition of faces or people).
• Large amounts of data may have hidden relationships and correlations.
Only automated approaches may be able to detect these.
• The amount of knowledge about a certain problem / task may be too
large for explicit encoding by humans (e.g. in medical diagnostics)
• Environments change over time, and new knowledge is constantly
being discovered. A continuous redesign of the systems “by hand” may
be difficult.
Machine Learning Concept
Traditional Programming: Data + Program → Computer → Output
Machine Learning: Data + Output → Computer → Program
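The contrast between the two approaches can be illustrated with a small sketch. In the first function a person writes the rule directly; in the second, a rule of the same form is estimated from example input-output pairs by ordinary least squares. The temperature-conversion task and the function names are chosen here purely for illustration and do not come from the slides.

```python
def fahrenheit_rule(celsius):
    """Traditional programming: a human writes the rule explicitly."""
    return 1.8 * celsius + 32.0


def fit_line(xs, ys):
    """Machine learning: estimate slope and intercept from examples
    by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Example (input, output) pairs; here they are generated from the rule itself.
celsius = [-10.0, 0.0, 15.0, 30.0, 100.0]
fahrenheit = [fahrenheit_rule(c) for c in celsius]

slope, intercept = fit_line(celsius, fahrenheit)
print(round(slope, 6), round(intercept, 6))
```

Because the examples are noise-free, the learned slope and intercept match the hand-written rule up to rounding; with real measurements the estimate would only approximate it.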
The Machine Learning Approach
Input data (e.g. gene expression profiles, …) → Machine learning (ML) classifier → Prediction: Yes / No
Machine Learning
• Learning Task:
– What do we want to learn or predict?
• Data and assumptions:
– What data do we have available?
– What is their quality?
– What can we assume about the given problem?
• Representation:
– What is a suitable representation of the examples to be classified?
• Method and Estimation:
– Are there possible hypotheses?
– Can we adjust our predictions based on the given results?
• Evaluation:
– How well does the method perform?
– Might another approach/model perform better?
Classification
• Classification is the process of finding a model (or function) that describes and
distinguishes data classes or concepts, for the purpose of being able to use the
model to predict the class of objects whose class label is unknown.
• The derived model is based on the analysis of a set of training data (i.e., data
objects whose class label is known).
Classification Example
Classification Algorithms
• LDA (Linear Discriminant Analysis)
• QDA (Quadratic Discriminant Analysis )
• K-NN (k- Nearest Neighbor)
• SVM (Support Vector Machine)
• Decision Tree
• Random Forest
• ... and many more
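As one concrete instance from the list above, here is a minimal sketch of k-nearest-neighbor classification: the label of a new object is decided by a majority vote among the k training objects closest to it. The training points, labels and the function name `knn_predict` are hypothetical.

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """Predict the label of `query` by majority vote among the k training
    points closest to it (Euclidean distance)."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: (features, class label).
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.3), "A"),
    ((4.0, 4.2), "B"), ((4.1, 3.9), "B"), ((3.8, 4.0), "B"),
]

print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (3.9, 4.1)))
```

k-NN builds no explicit model: the training set itself is the model, which makes it a convenient first example but slow on large data sets.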
Classification Accuracy
• Accuracy: percentage of correct classifications
$$\text{Accuracy} = \frac{\text{Total test instances classified correctly}}{\text{Total number of test instances}}$$
Evaluating a Classifier:
n-fold Cross Validation
• Suppose m labeled instances
– Divide into n subsets (“folds”) of equal size
• Run classifier n times, with each of the subsets as the test set
– The rest (n-1) for training
– Each run gives an accuracy result
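The n-fold procedure can be sketched in a few lines. The classifier used here is a deliberately simple stand-in (a threshold halfway between the two class means), and the data set and all function names are hypothetical; any classifier with a fit step and a predict step could be plugged in instead.

```python
def cross_validate(instances, n, fit, predict):
    """Split `instances` into n folds; use each fold once as the test set
    and the remaining n-1 folds for training. Returns one accuracy per run."""
    folds = [instances[i::n] for i in range(n)]
    accuracies = []
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = fit(train)
        correct = sum(1 for x, label in test if predict(model, x) == label)
        accuracies.append(correct / len(test))
    return accuracies


def fit_threshold(train):
    """Toy classifier: threshold halfway between the two class means."""
    neg = [x for x, y in train if y == 0]
    pos = [x for x, y in train if y == 1]
    return (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2


def predict_threshold(threshold, x):
    """Predict class 1 when the feature exceeds the learned threshold."""
    return 1 if x > threshold else 0


# Hypothetical m = 12 labeled instances: (feature, label).
data = [(1, 0), (8, 1), (2, 0), (9, 1), (3, 0), (7, 1),
        (2, 0), (8, 1), (1, 0), (9, 1), (3, 0), (7, 1)]

scores = cross_validate(data, n=3, fit=fit_threshold, predict=predict_threshold)
print(scores, sum(scores) / len(scores))
```

The per-fold accuracies are usually averaged into a single estimate. In practice the instances are shuffled (and often stratified by class) before being split into folds; this sketch skips that step.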
Evaluating a Classifier:
Confusion Matrix
                  Classified positive    Classified negative
Actual positive   True positive (TP)     False negative (FN)
Actual negative   False positive (FP)    True negative (TN)

TP: number of positive examples classified correctly
FN: number of positive examples classified incorrectly
FP: number of negative examples classified incorrectly
TN: number of negative examples classified correctly
Evaluating a Classifier:
Precision and Recall
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
Note that the focus is on the positive class.
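A short worked example may help tie the confusion matrix to accuracy, precision and recall. The label vectors below are made up for illustration, and `confusion_counts` is a hypothetical helper name.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, FN, TN for a binary problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # positive example classified correctly
        elif p == positive:
            fp += 1          # negative example classified as positive
        elif a == positive:
            fn += 1          # positive example classified as negative
        else:
            tn += 1          # negative example classified correctly
    return tp, fp, fn, tn


# Hypothetical test set: 4 actual positives, 6 actual negatives.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(tp, fp, fn, tn, accuracy, precision, recall)
```

Here TP = 3, FP = 1, FN = 1 and TN = 5, so accuracy is 8/10 while precision and recall are both 3/4. Neither precision nor recall uses TN, which is why they are said to focus on the positive class.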
Evaluating a Classifier:
What Affects the Performance
• Complexity of the task
– Large number of features (high dimensionality)
– Features that appear very few times (sparse data)
• Few instances for a complex classification task
• Missing feature values for instances
• Errors in attribute values for instances
• Errors in the labels of training instances
• Uneven availability of instances across classes
What is Regression?
• Wikipedia defines regression analysis as follows:
• In statistical modeling, regression analysis is a set of statistical
processes for estimating the relationships between a dependent
variable (often called the ‘outcome variable’) and one or
more independent variables (often called ‘predictors’, ‘covariates’,
or ‘features’).
Regression
Curve Fitting
Example: curve fitting
Lecture 1 8/25/11 31
CS 194-10 Fall 2011, Stuart Russell
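Since the curve-fitting figures did not survive, the idea can be sketched in code instead: fit polynomials of different degree to the same points by least squares and compare the errors. This is an illustrative sketch, not the example from the original slides; the data points and the function names `solve`, `polyfit` and `sse` are assumptions made here.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.
    Returns coefficients [w0, w1, ...] for w0 + w1*x + w2*x^2 + ..."""
    k = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(a, b)


def sse(coeffs, xs, ys):
    """Sum of squared errors of the fitted polynomial on (xs, ys)."""
    return sum((sum(w * x ** i for i, w in enumerate(coeffs)) - y) ** 2
               for x, y in zip(xs, ys))


# Hypothetical points lying exactly on y = 1 + 2x^2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1 + 2 * x * x for x in xs]

line = polyfit(xs, ys, 1)
quad = polyfit(xs, ys, 2)
print(sse(line, xs, ys), sse(quad, xs, ys))
```

The straight line cannot follow the curvature and leaves a large error, while the quadratic recovers the generating curve. Raising the degree further would keep the training error at zero without necessarily predicting new points better, which is the usual caution attached to curve fitting.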
Types of Regression
• Linear Regression
• Polynomial Regression
• Support Vector Regression
• Decision Tree Regression
• Random Forest Regression
• Ridge Regression
• Lasso Regression
• Logistic Regression
Clustering
• Clusters of objects are formed so that objects within a cluster are highly
similar to one another but very dissimilar to objects in other
clusters. Each cluster that is formed can be viewed as a class of objects, from
which rules can be derived.
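One common way to form such clusters is k-means, which the slides do not name; it is used here only as an illustration. The sketch alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The points and starting centroids are hypothetical.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate between assigning each point to its
    nearest centroid and moving each centroid to the mean of its points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster
            else centroid  # keep an empty cluster's centroid where it is
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical unlabeled points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

No labels are used anywhere: the grouping comes only from distances between points. The result depends on the number of clusters and the starting centroids, both of which are chosen by hand here.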
Reinforcement Learning
• What’s Reinforcement Learning?
Agent → {Actions} → Environment → {Observation, Reward} → Agent
• The agent interacts with an environment and learns by maximizing a scalar
reward signal.
• No labels or any other supervision signal.
• Previously limited by hand-crafted states or representations.
Reinforcement Learning
• Learning a policy: A sequence of outputs
• No supervised output but delayed reward
• Credit assignment problem
• Game playing
• Robot in a maze
• Multiple agents, partial observability, ...
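The points above (learning a policy, delayed reward, credit assignment) can be seen in a very small sketch using tabular Q-learning, an algorithm the slides do not name and which is chosen here as an assumption. The environment is a hypothetical five-cell corridor: the agent starts at the left end and receives a reward only when it reaches the right end.

```python
import random

N_STATES = 5          # states 0..4; reaching state 4 ends the episode
ACTIONS = (-1, +1)    # move left or move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # step size, discount, exploration rate


def step(state, action):
    """Environment: returns (next_state, reward). The reward is delayed:
    it is 1 only on reaching the goal state, otherwise 0."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def train(episodes=500, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)          # explore
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice(
                    [a for a in ACTIONS if q[(state, a)] == best])  # exploit
            next_state, reward = step(state, action)
            # Credit assignment: move the estimate toward the reward plus
            # the discounted value of the best next action.
            target = reward + GAMMA * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (target - q[(state, action)])
            state = next_state
    return q


q = train()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

The learned policy is a sequence of actions, one per state, and the value of moving right in early cells is learned even though those moves are never rewarded directly; the reward propagates backwards through the updates.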
Data Mining/Machine learning Techniques
• Regression and classification are supervised learning approaches that map
an input to an output based on example input-output pairs, while clustering
is an unsupervised learning approach.
• Regression: It predicts a continuous-valued output. Regression analysis
is a statistical model used to predict numeric data instead of
labels. It can also identify distribution trends based on the available
or historic data. Predicting a person’s income from their age and education is
an example of a regression task.
• Classification: It predicts one of a discrete number of values. In
classification the data is categorized under different labels according to
some parameters, and then the labels are predicted for new data. Classifying
emails as either spam or not spam is an example of a classification problem.
• Clustering: Clustering is the task of partitioning the dataset into groups,
called clusters. The goal is to split up the data in such a way that points
within a single cluster are very similar and points in different clusters are
different. It determines groupings among unlabeled data.
Classification vs Clustering
▪ In general, in classification you have a set of predefined classes and want to
know which class a new object belongs to.
▪ Clustering tries to group a set of objects and find whether there is some
relationship between the objects.
▪ In the context of machine learning, classification is supervised learning and
clustering is unsupervised learning.
Classification vs Regression
▪ Classification and regression are the two major prediction problems
usually dealt with in data mining.
▪ Predictive modeling is the technique of developing a model or function
from historic data in order to predict new data.
▪ The significant difference between classification and regression is that
classification maps the input data object to some discrete labels.
▪ Regression, on the other hand, maps the input data object to continuous
real values.
Clustering vs Association rule
▪ By definition, clustering is grouping a set of objects in such a manner that
objects in the same group are more similar to each other than to objects belonging to
other groups.
▪ Association rule mining, in contrast, is about finding associations among items within
large commercial databases.
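To make "finding associations among items" concrete, the sketch below scores one candidate rule on a tiny set of transactions. Support and confidence are the standard measures for association rules, but they are not named in the slides, and the basket contents are made up for illustration.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)


# Candidate rule: {diapers} -> {beer}
print(support({"diapers", "beer"}))
print(confidence({"diapers"}, {"beer"}))
```

Diapers appear in four of the five baskets and diapers together with beer in three, so the rule has support 3/5 and confidence 3/4. Unlike clustering, nothing is grouped: the output is a statement about items that tend to occur together.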
What is Deep Learning?
▪ ‘Deep Learning’ means using a neural network with several layers
of nodes between input and output.
▪ The series of layers between input and output performs feature
identification and processing in a series of stages, just as our brains
seem to.
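The idea of processing in stages can be shown with the smallest possible case: a network with a single hidden layer whose weights are set by hand rather than learned. The hidden nodes detect two intermediate features (OR and AND of the inputs), and the output node combines them into XOR, which no single layer of this kind can compute alone. All weights and names here are illustrative assumptions; a deep network stacks many such layers and learns the weights from data.

```python
def step(z):
    """Threshold activation: fires (1.0) when the weighted sum is positive."""
    return 1.0 if z > 0 else 0.0


def layer(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each node."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


# Hand-set weights (not learned). Hidden node 1 computes OR, node 2 computes AND.
HIDDEN_W, HIDDEN_B = [[1.0, 1.0], [1.0, 1.0]], [-0.5, -1.5]
# Output node fires when OR is on and AND is off, i.e. XOR.
OUTPUT_W, OUTPUT_B = [[1.0, -1.0]], [-0.5]


def forward(x):
    """Pass the input through the hidden layer, then the output layer."""
    hidden = layer(x, HIDDEN_W, HIDDEN_B, step)            # stage 1: features
    return layer(hidden, OUTPUT_W, OUTPUT_B, step)[0]      # stage 2: decision


for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, forward(x))
```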
Machine Learning vs Deep Learning
Deep Learning and ML
Applications
• Image Classification
• Speech Recognition
• Language translation
• Stock Exchange Prediction
• Biomedical and diagnostic systems
• Vehicular Communication
• Face detection
• Video Surveillance

More Related Content

Similar to Machinr Learning and artificial_Lect1.pdf

Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data scienceSpartan60
 
Choosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your needChoosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your needGibDevs
 
BI Chapter 04.pdf business business business business
BI Chapter 04.pdf business business business businessBI Chapter 04.pdf business business business business
BI Chapter 04.pdf business business business businessJawaherAlbaddawi
 
Data mining introduction
Data mining introductionData mining introduction
Data mining introductionBasma Gamal
 
Introduction to Data Analytics.pptx
Introduction to Data Analytics.pptxIntroduction to Data Analytics.pptx
Introduction to Data Analytics.pptxDikshantSharma63
 
Data mining an introduction
Data mining an introductionData mining an introduction
Data mining an introductionDr-Dipali Meher
 
Data Science Training in Chandigarh h
Data Science Training in Chandigarh    hData Science Training in Chandigarh    h
Data Science Training in Chandigarh hasmeerana605
 
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerStudy and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerIJERA Editor
 
Digital firm in the world with the best era
Digital firm in the world with the best eraDigital firm in the world with the best era
Digital firm in the world with the best erardeepan113
 

Similar to Machinr Learning and artificial_Lect1.pdf (20)

Machine learning
Machine learning Machine learning
Machine learning
 
data mining
data miningdata mining
data mining
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Choosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your needChoosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your need
 
Seminar Presentation
Seminar PresentationSeminar Presentation
Seminar Presentation
 
BI Chapter 04.pdf business business business business
BI Chapter 04.pdf business business business businessBI Chapter 04.pdf business business business business
BI Chapter 04.pdf business business business business
 
Weka bike rental
Weka bike rentalWeka bike rental
Weka bike rental
 
Data mining introduction
Data mining introductionData mining introduction
Data mining introduction
 
Introduction to Data Analytics.pptx
Introduction to Data Analytics.pptxIntroduction to Data Analytics.pptx
Introduction to Data Analytics.pptx
 
Data mining an introduction
Data mining an introductionData mining an introduction
Data mining an introduction
 
Data mining
Data miningData mining
Data mining
 
Data Science Training in Chandigarh h
Data Science Training in Chandigarh    hData Science Training in Chandigarh    h
Data Science Training in Chandigarh h
 
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerStudy and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
 
Data Science and Analysis.pptx
Data Science and Analysis.pptxData Science and Analysis.pptx
Data Science and Analysis.pptx
 
machine learning
machine learningmachine learning
machine learning
 
Chapter 1: Introduction to Data Mining
Chapter 1: Introduction to Data MiningChapter 1: Introduction to Data Mining
Chapter 1: Introduction to Data Mining
 
CLUSTER ANALYSIS.pptx
CLUSTER ANALYSIS.pptxCLUSTER ANALYSIS.pptx
CLUSTER ANALYSIS.pptx
 
Digital firm in the world with the best era
Digital firm in the world with the best eraDigital firm in the world with the best era
Digital firm in the world with the best era
 
Talk
TalkTalk
Talk
 
Dma unit 1
Dma unit   1Dma unit   1
Dma unit 1
 

Recently uploaded

Weeding your micro service landscape.pdf
Weeding your micro service landscape.pdfWeeding your micro service landscape.pdf
Weeding your micro service landscape.pdftimtebeek1
 
Your Ultimate Web Studio for Streaming Anywhere | Evmux
Your Ultimate Web Studio for Streaming Anywhere | EvmuxYour Ultimate Web Studio for Streaming Anywhere | Evmux
Your Ultimate Web Studio for Streaming Anywhere | Evmuxevmux96
 
BusinessGPT - Security and Governance for Generative AI
BusinessGPT  - Security and Governance for Generative AIBusinessGPT  - Security and Governance for Generative AI
BusinessGPT - Security and Governance for Generative AIAGATSoftware
 
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-CloudAlluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-CloudAlluxio, Inc.
 
Encryption Recap: A Refresher on Key Concepts
Encryption Recap: A Refresher on Key ConceptsEncryption Recap: A Refresher on Key Concepts
Encryption Recap: A Refresher on Key Conceptsthomashtkim
 
Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...
Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...
Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...Lisi Hocke
 
Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024Andreas Granig
 
From Knowledge Graphs via Lego Bricks to scientific conversations.pptx
From Knowledge Graphs via Lego Bricks to scientific conversations.pptxFrom Knowledge Graphs via Lego Bricks to scientific conversations.pptx
From Knowledge Graphs via Lego Bricks to scientific conversations.pptxNeo4j
 
CERVED e Neo4j su una nuvola, migrazione ed evoluzione di un grafo mission cr...
CERVED e Neo4j su una nuvola, migrazione ed evoluzione di un grafo mission cr...CERVED e Neo4j su una nuvola, migrazione ed evoluzione di un grafo mission cr...
CERVED e Neo4j su una nuvola, migrazione ed evoluzione di un grafo mission cr...Neo4j
 
Optimizing Operations by Aligning Resources with Strategic Objectives Using O...
Optimizing Operations by Aligning Resources with Strategic Objectives Using O...Optimizing Operations by Aligning Resources with Strategic Objectives Using O...
Optimizing Operations by Aligning Resources with Strategic Objectives Using O...OnePlan Solutions
 
Software Engineering - Introduction + Process Models + Requirements Engineering
Software Engineering - Introduction + Process Models + Requirements EngineeringSoftware Engineering - Introduction + Process Models + Requirements Engineering
Software Engineering - Introduction + Process Models + Requirements EngineeringPrakhyath Rai
 
Test Automation Design Patterns_ A Comprehensive Guide.pdf
Test Automation Design Patterns_ A Comprehensive Guide.pdfTest Automation Design Patterns_ A Comprehensive Guide.pdf
Test Automation Design Patterns_ A Comprehensive Guide.pdfkalichargn70th171
 
Novo Nordisk: When Knowledge Graphs meet LLMs
Novo Nordisk: When Knowledge Graphs meet LLMsNovo Nordisk: When Knowledge Graphs meet LLMs
Novo Nordisk: When Knowledge Graphs meet LLMsNeo4j
 
A Deep Dive into Secure Product Development Frameworks.pdf
A Deep Dive into Secure Product Development Frameworks.pdfA Deep Dive into Secure Product Development Frameworks.pdf
A Deep Dive into Secure Product Development Frameworks.pdfICS
 
GraphSummit Milan - Visione e roadmap del prodotto Neo4j
GraphSummit Milan - Visione e roadmap del prodotto Neo4jGraphSummit Milan - Visione e roadmap del prodotto Neo4j
GraphSummit Milan - Visione e roadmap del prodotto Neo4jNeo4j
 
Anypoint Code Builder - Munich MuleSoft Meetup - 16th May 2024
Anypoint Code Builder - Munich MuleSoft Meetup - 16th May 2024Anypoint Code Builder - Munich MuleSoft Meetup - 16th May 2024
Anypoint Code Builder - Munich MuleSoft Meetup - 16th May 2024MulesoftMunichMeetup
 
Transformer Neural Network Use Cases with Links
Transformer Neural Network Use Cases with LinksTransformer Neural Network Use Cases with Links
Transformer Neural Network Use Cases with LinksJinanKordab
 

Recently uploaded (20)

Weeding your micro service landscape.pdf
Weeding your micro service landscape.pdfWeeding your micro service landscape.pdf
Weeding your micro service landscape.pdf
 
Abortion Pill Prices Germiston ](+27832195400*)[ 🏥 Women's Abortion Clinic in...
Abortion Pill Prices Germiston ](+27832195400*)[ 🏥 Women's Abortion Clinic in...Abortion Pill Prices Germiston ](+27832195400*)[ 🏥 Women's Abortion Clinic in...
Abortion Pill Prices Germiston ](+27832195400*)[ 🏥 Women's Abortion Clinic in...
 
Your Ultimate Web Studio for Streaming Anywhere | Evmux
Your Ultimate Web Studio for Streaming Anywhere | EvmuxYour Ultimate Web Studio for Streaming Anywhere | Evmux
Your Ultimate Web Studio for Streaming Anywhere | Evmux
 
BusinessGPT - Security and Governance for Generative AI
BusinessGPT  - Security and Governance for Generative AIBusinessGPT  - Security and Governance for Generative AI
BusinessGPT - Security and Governance for Generative AI
 
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-CloudAlluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
 
Encryption Recap: A Refresher on Key Concepts
Encryption Recap: A Refresher on Key ConceptsEncryption Recap: A Refresher on Key Concepts
Encryption Recap: A Refresher on Key Concepts
 
Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...
Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...
Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...
 
Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024
 
From Knowledge Graphs via Lego Bricks to scientific conversations.pptx
From Knowledge Graphs via Lego Bricks to scientific conversations.pptxFrom Knowledge Graphs via Lego Bricks to scientific conversations.pptx
From Knowledge Graphs via Lego Bricks to scientific conversations.pptx
 
CERVED e Neo4j su una nuvola, migrazione ed evoluzione di un grafo mission cr...
CERVED e Neo4j su una nuvola, migrazione ed evoluzione di un grafo mission cr...CERVED e Neo4j su una nuvola, migrazione ed evoluzione di un grafo mission cr...
CERVED e Neo4j su una nuvola, migrazione ed evoluzione di un grafo mission cr...
 
Optimizing Operations by Aligning Resources with Strategic Objectives Using O...
Optimizing Operations by Aligning Resources with Strategic Objectives Using O...Optimizing Operations by Aligning Resources with Strategic Objectives Using O...
Optimizing Operations by Aligning Resources with Strategic Objectives Using O...
 
Abortion Pill Prices Turfloop ](+27832195400*)[ 🏥 Women's Abortion Clinic in ...
Abortion Pill Prices Turfloop ](+27832195400*)[ 🏥 Women's Abortion Clinic in ...Abortion Pill Prices Turfloop ](+27832195400*)[ 🏥 Women's Abortion Clinic in ...
Abortion Pill Prices Turfloop ](+27832195400*)[ 🏥 Women's Abortion Clinic in ...
 
Software Engineering - Introduction + Process Models + Requirements Engineering
Software Engineering - Introduction + Process Models + Requirements EngineeringSoftware Engineering - Introduction + Process Models + Requirements Engineering
Software Engineering - Introduction + Process Models + Requirements Engineering
 
Abortion Pill Prices Mthatha (@](+27832195400*)[ 🏥 Women's Abortion Clinic In...
Abortion Pill Prices Mthatha (@](+27832195400*)[ 🏥 Women's Abortion Clinic In...Abortion Pill Prices Mthatha (@](+27832195400*)[ 🏥 Women's Abortion Clinic In...
Abortion Pill Prices Mthatha (@](+27832195400*)[ 🏥 Women's Abortion Clinic In...
 
Test Automation Design Patterns_ A Comprehensive Guide.pdf
Test Automation Design Patterns_ A Comprehensive Guide.pdfTest Automation Design Patterns_ A Comprehensive Guide.pdf
Test Automation Design Patterns_ A Comprehensive Guide.pdf
 
Novo Nordisk: When Knowledge Graphs meet LLMs
Novo Nordisk: When Knowledge Graphs meet LLMsNovo Nordisk: When Knowledge Graphs meet LLMs
Novo Nordisk: When Knowledge Graphs meet LLMs
 
A Deep Dive into Secure Product Development Frameworks.pdf
A Deep Dive into Secure Product Development Frameworks.pdfA Deep Dive into Secure Product Development Frameworks.pdf
A Deep Dive into Secure Product Development Frameworks.pdf
 
GraphSummit Milan - Visione e roadmap del prodotto Neo4j
GraphSummit Milan - Visione e roadmap del prodotto Neo4jGraphSummit Milan - Visione e roadmap del prodotto Neo4j
GraphSummit Milan - Visione e roadmap del prodotto Neo4j
 
Anypoint Code Builder - Munich MuleSoft Meetup - 16th May 2024
Anypoint Code Builder - Munich MuleSoft Meetup - 16th May 2024Anypoint Code Builder - Munich MuleSoft Meetup - 16th May 2024
Anypoint Code Builder - Munich MuleSoft Meetup - 16th May 2024
 
Transformer Neural Network Use Cases with Links
Transformer Neural Network Use Cases with LinksTransformer Neural Network Use Cases with Links
Transformer Neural Network Use Cases with Links
 

Machinr Learning and artificial_Lect1.pdf

  • 1. Introduction to Machine Learning Dr. Amit M. Joshi Assistant Professor, Dept. of Electronics & Communication Malaviya National Institute of Technology Jaipur (Rajasthan) 1
  • 2. Contents ❖ Introduction to Data Mining ❖ Need of Data Mining ❖ Knowledge Discovery Process ❖ Machine Learning Introduction ❖ Types of Machine Learning ❖ Applications 2
  • 3. Introduction • Data Mining is process to find the pattern from large data ( or Big Data) using the techniques like Artificial Intelligence, Machine Learning, Statistics and database systems • The overall goal of the data mining process is to extract useful information from data set and transform it into an under stable structure of further use. • Data Mining is the analysis step of the “Knowledge Discovery to Database” process which is known as KDD • The objective is to discover pattern rather than data itself 3 We are data rich but information poor
  • 4. Why Data Mining is required? • Volume of information is increasing everyday that we can handle from business transactions, scientific data, sensor data, Pictures, videos, etc. So, we need a system that will be capable of extracting essence of information available and that can automatically generate report, views or summary of data for better decision-making. • Automatic summarization of data • Extracting essence of information stored. • Discovering patterns in raw data. • Data Mining also known as Knowledge Discovery in Databases, refers to the nontrivial extraction of implicit, previously unknown, and potentially useful information from data stored in databases. 4
  • 5. Data Mining? • The actual data mining task is the automatic or semi-automatic analysis of the large quantities of data. • The main purpose is to extract previously unknown, interesting patterns such as group of records (cluster analysis), unusual record (anomaly detection), and dependencies (association and mining) etc. • These methods can however be useful for the creation of new hypothesis to test the against the larger data population. • It helps to inferring the new information from already collected data 5 Data mining—searching for knowledge in your data.
  • 6. Why Data Mining • The fast-growing, great amount of data, collected and stored in large and many data repositories, has far exceeded our human ability for understanding without powerful tools. • As a result, data collected in large data repositories become “data tombs”—data archives that are seldom visited. • Data collected and stored at Enormous speeds (GB/Hour) ➢ remote sensing on satellite • Data Mining may help scientists for ➢ in classifying and segmenting the data ➢ in hypothesis formation
  • 7. Basics of Data Mining • Data analysis is more inline with standard statistical software (i.e. web stats). These usually present information about subsets and relations within the recorded dataset (search engine usage, average visit time) • Data Mining implies software uses some more intelligence over simple grouping and partitioning of the data to have new inferred information. • Data Mining is non-trivial process of identifying ➢ Valid ➢ Novel ➢ Implicit ➢ previously unknown ➢ and ultimately understandable patterns ➢ potentially useful 7
  • 9. Steps of KDD 1. Data Cleaning: Data cleaning is defined as the removal of noisy and irrelevant data from the collection. 1. Cleaning in case of Missing values. 2. Cleaning noisy data, where noise is a random or variance error. 3. Cleaning with Data discrepancy detection and Data transformation tools. 2. Data Integration: Data integration is defined as heterogeneous data from multiple sources combined in a common source(DataWarehouse). 1. Data integration using Data Migration tools. 2. Data integration using Data Synchronization tools. 3. Data integration using ETL(Extract-Load-Transformation) process. 9
  • 10. Steps of KDD 3. Data Selection: Data selection is defined as the process where data relevant to the analysis is decided and retrieved from the data collection. 1. Data selection using Neural network. 2. Data selection using Decision Trees. 3. Data selection using Naive bayes. 4. Data selection using Clustering, Regression, etc. 4. Data Transformation: Data Transformation is defined as the process of transforming data into appropriate form required by mining procedure. Data Transformation is a two step process: 1. Data Mapping: Assigning elements from source base to destination to capture transformations. 2. Code generation: Creation of the actual transformation program. 10
  • 11. Steps of KDD 5. Data Mining: Data mining is defined as the application of clever techniques to extract potentially useful patterns. 1. Transforms task-relevant data into patterns. 2. Decides the purpose of the model using classification or characterization. 6. Pattern Evaluation: Pattern evaluation is defined as identifying the truly interesting patterns representing knowledge, based on given measures. 1. Finds the interestingness score of each pattern. 2. Uses summarization and visualization to make the data understandable by the user. 7. Knowledge Representation: Knowledge representation is defined as a technique which uses visualization tools to represent data mining results. 1. Generate reports. 2. Generate tables. 3. Generate discriminant rules, classification rules, characterization rules, etc.
  • 12. Primary Data Mining Tasks In general, data mining tasks can be classified into two categories: descriptive and predictive. • Predictive methods use some variables to predict unknown or future values of other variables. Ex: Classification, Regression • Descriptive methods characterize the general properties of the data in the database. Ex: Association Rule Discovery, Clustering
  • 13. Moving Towards Machine Learning? • Machine learning is programming computers to optimize a performance criterion using example data or past experience. • Learning is used when: – Human expertise does not exist – Humans are unable to explain their expertise – Solution changes in time – Solution needs to be adapted to particular cases
  • 14. Why do we need Machine Learning? • Some tasks cannot be defined well, except by examples (e.g. recognition of faces or people). • Large amounts of data may have hidden relationships and correlations. Only automated approaches may be able to detect these. • The amount of knowledge about a certain problem / task may be too large for explicit encoding by humans (e.g. in medical diagnostics) • Environments change over time, and new knowledge is constantly being discovered. A continuous redesign of the systems “by hand” may be difficult.
  • 16. The Machine Learning Approach [Diagram: Input Data (e.g. Gene Expression Profiles, …) → Machine Learning classifier → Prediction: Yes / No]
  • 17. Machine Learning • Learning Task: – What do we want to learn or predict? • Data and assumptions: – What data do we have available? – What is their quality? – What can we assume about the given problem? • Representation: – What is a suitable representation of the examples to be classified? • Method and Estimation: – Are there possible hypotheses? – Can we adjust our predictions based on the given results? • Evaluation: – How well does the method perform? – Might another approach/model perform better?
  • 21. Classification • Classification is the process of finding a model (or function) that describes and distinguishes data classes or concepts, for the purpose of being able to use the model to predict the class of objects whose class label is unknown. • The derived model is based on the analysis of a set of training data (i.e., data objects whose class label is known).
  • 23. Classification Algorithms • LDA (Linear Discriminant Analysis) • QDA (Quadratic Discriminant Analysis) • k-NN (k-Nearest Neighbor) • SVM (Support Vector Machine) • Decision Tree • Random Forest • and many more
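As an illustration of one algorithm from this list, the following is a minimal sketch of a k-nearest-neighbor classifier. It assumes Euclidean distance and simple majority voting; the training points and labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict the label of `query` from its k nearest training points.

    train: list of (features, label) pairs, features being numeric tuples.
    """
    # Sort training points by Euclidean distance to the query, keep k.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote among the labels of the k nearest neighbours.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training data: two well-separated groups.
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]

label_near_a = knn_predict(train, (1.1, 1.0))
label_near_b = knn_predict(train, (5.1, 5.0))
```

The labeled pairs in `train` play the role of the training data whose class label is known; the query is an object whose class label is unknown.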
  • 24. Classification Accuracy • Accuracy: percentage of correct classifications • Accuracy = (total test instances classified correctly) / (total number of test instances)
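Accuracy, the fraction of test instances classified correctly, can be computed directly. This is a minimal sketch; the function name and the sample labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of test instances whose predicted label matches the true one."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical test set: three of the four predictions are correct.
acc = accuracy(["spam", "ham", "spam", "ham"],
               ["spam", "ham", "ham", "ham"])
```

Multiplying the result by 100 gives the percentage form used on the slide.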
  • 25. Evaluating a Classifier: n-fold Cross Validation • Suppose there are m labeled instances – Divide them into n subsets (“folds”) of equal size • Run the classifier n times, each time with a different subset as the test set – The remaining n-1 subsets are used for training – Each run gives an accuracy result
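The fold construction described above can be sketched as an index splitter. This is an illustrative sketch, not a reference implementation: the function name is hypothetical, no shuffling is done, and when m is not divisible by n the first folds receive one extra instance (an assumption beyond the equal-size case on the slide).

```python
def n_fold_splits(m, n):
    """Yield (train_indices, test_indices) pairs for n folds over m instances."""
    if not 1 <= n <= m:
        raise ValueError("need 1 <= n <= m")
    indices = list(range(m))
    base, extra = divmod(m, n)
    start = 0
    for fold in range(n):
        # The first `extra` folds take one additional instance.
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Six instances in three folds: each instance is tested exactly once.
splits = list(n_fold_splits(6, 3))
```

Each `(train, test)` pair would be passed to a classifier, giving one accuracy result per fold.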
  • 26. Evaluating a Classifier: Confusion Matrix
                        Classified positive     Classified negative
      Actual positive   True positive (TP)      False negative (FN)
      Actual negative   False positive (FP)     True negative (TN)
    TP: number of positive examples classified correctly
    FN: number of positive examples classified incorrectly
    FP: number of negative examples classified incorrectly
    TN: number of negative examples classified correctly
  • 27. Evaluating a Classifier: Precision and Recall TP: number of positive examples classified correctly FN: number of positive examples classified incorrectly FP: number of negative examples classified incorrectly TN: number of negative examples classified correctly • Precision = TP / (TP + FP) • Recall = TP / (TP + FN) Note that the focus is on the positive class
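The four confusion-matrix counts, and the precision (TP divided by TP plus FP) and recall (TP divided by TP plus FN) derived from them, can be computed as in the sketch below. The function names and sample labels are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1          # positive example classified correctly
        elif t == positive:
            fn += 1          # positive example classified incorrectly
        elif p == positive:
            fp += 1          # negative example classified incorrectly
        else:
            tn += 1          # negative example classified correctly
    return tp, fp, fn, tn


def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical labels: 1 is the positive class.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
```

Neither measure uses TN, which is why the focus is on the positive class.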
  • 28. Evaluating a Classifier: What Affects the Performance • Complexity of the task – Large number of features (high dimensionality) – Features that appear very few times (sparse data) • Few instances for a complex classification task • Missing feature values for instances • Errors in attribute values for instances • Errors in the labels of training instances • Uneven availability of instances in classes
  • 29. What is Regression? • Regression analysis is defined in Wikipedia as follows: in statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the ‘outcome variable’) and one or more independent variables (often called ‘predictors’, ‘covariates’, or ‘features’).
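The simplest case, one independent variable and a straight-line relationship, can be estimated with ordinary least squares. The sketch below is a minimal illustration; the function name and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one predictor."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
predicted = slope * 5 + intercept   # prediction for an unseen x
```

Here `xs` is the independent variable (predictor) and `ys` the dependent variable (outcome).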
  • 31. Example: curve fitting (figure from CS 194-10 Fall 2011, Stuart Russell, Lecture 1, 8/25/11)
  • 34. Types of Regression • Linear Regression • Polynomial Regression • Support Vector Regression • Decision Tree Regression • Random Forest Regression • Ridge Regression • Lasso Regression • Logistic Regression
  • 37. Clustering • Clusters of objects are formed so that objects within a cluster have high similarity in comparison to one another, but are very dissimilar to objects in other clusters. Each cluster that is formed can be viewed as a class of objects, from which rules can be derived.
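The slide does not name a specific clustering algorithm. As one common example, the sketch below uses k-means on one-dimensional data, with absolute difference as the dissimilarity measure. The function name, starting centroids and sample points are hypothetical.

```python
def k_means_1d(points, centroids, iterations=10):
    """Cluster 1-D points around the given starting centroids (k-means)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical unlabeled data with two obvious groups.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
```

No labels are supplied; the grouping emerges only from the similarity between points.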
  • 39. Reinforcement Learning • What’s Reinforcement Learning? [Diagram: the Environment sends {Observation, Reward} to the Agent; the Agent sends {Actions} to the Environment] • An agent interacts with an environment and learns by maximizing a scalar reward signal. • There are no labels or any other supervision signal. • Earlier approaches suffered from hand-crafted states or representations.
  • 40. Reinforcement Learning • Learning a policy: A sequence of outputs • No supervised output but delayed reward • Credit assignment problem • Game playing • Robot in a maze • Multiple agents, partial observability, ...
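A tiny version of the “robot in a maze” setting can illustrate learning a policy from delayed reward. The sketch below uses tabular Q-learning, which the slides do not name; it is chosen here only as a familiar example. The corridor environment, the constants and the function names are all hypothetical.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)    # index 0 moves left, index 1 moves right


def step(state, action):
    """Environment: return (next_state, reward, done) for one move."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0      # reward arrives only at the goal
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values from random exploration of the corridor."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):           # cap on episode length
            a = rng.randrange(2)       # purely random exploration
            next_state, reward, done = step(state, ACTIONS[a])
            # Credit assignment: bootstrap from the next state's best value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


q = q_learning()
# Greedy policy for the non-goal cells.
policy = ["left" if row[0] > row[1] else "right" for row in q[:-1]]
```

The agent is never told which action is correct; it only observes the reward at the goal, and the update rule propagates that reward back to earlier cells.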
  • 42. Data Mining/Machine Learning Techniques • Regression and classification are supervised learning approaches that map an input to an output based on example input-output pairs, while clustering is an unsupervised learning approach. • Regression: It predicts a continuous-valued output. Regression analysis is a statistical model used to predict numeric data instead of labels. It can also identify distribution trends based on the available or historic data. Predicting a person’s income from their age and education is an example of a regression task. • Classification: It predicts one of a discrete number of values. In classification, the data is categorized under different labels according to some parameters, and then the labels are predicted for new data. Classifying emails as either spam or not spam is an example of a classification problem. • Clustering: Clustering is the task of partitioning the dataset into groups, called clusters. The goal is to split up the data in such a way that points within a single cluster are very similar and points in different clusters are different. It determines groupings among unlabeled data.
  • 43. Classification vs Clustering ▪ In general, in classification you have a set of predefined classes and want to know which class a new object belongs to. ▪ Clustering tries to group a set of objects and find whether there is some relationship between the objects. ▪ In the context of machine learning, classification is supervised learning and clustering is unsupervised learning.
  • 44. Classification vs Regression ▪ Classification and regression are two major prediction problems that are usually dealt with in data mining. ▪ Predictive modeling is the technique of developing a model or function from historic data to predict new data. ▪ The significant difference between classification and regression is that classification maps the input data object to discrete labels. ▪ Regression, on the other hand, maps the input data object to continuous real values.
  • 45. Clustering vs Association Rules ▪ By definition, clustering is grouping a set of objects in such a manner that objects in the same group are more similar to each other than to objects belonging to other groups. ▪ Association rule mining, in contrast, is about finding associations amongst items within large commercial databases.
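To make the idea of an association between items concrete, the sketch below computes support and confidence, two standard measures for association rules that the slide itself does not name. The function names and the transactions are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs in the transactions")
    return both / base


# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

supp = support(transactions, {"bread", "milk"})       # 2 of 4 baskets
conf = confidence(transactions, {"bread"}, {"milk"})  # 2 of 3 bread baskets
```

Unlike clustering, which groups the transactions themselves, these measures describe relationships between the items that occur in them.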
  • 47. What is Deep Learning? ▪ ‘Deep Learning’ means using a neural network with several layers of nodes between input and output. ▪ The series of layers between input and output do feature identification and processing in a series of stages, just as our brains seem to.
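The idea of several layers of nodes between input and output can be sketched as a forward pass through fully connected layers. This is only an illustration: the sigmoid activation, the layer sizes and all weight values are assumptions, and no training is shown.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """One layer: each node takes a weighted sum of its inputs, then sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, layers):
    """Pass the inputs through each (weights, biases) layer in turn."""
    activation = inputs
    for weights, biases in layers:
        activation = dense(activation, weights, biases)
    return activation


# Hypothetical network: 2 inputs -> 3 hidden -> 2 hidden -> 1 output.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.5, 0.2], [0.3, 0.6, -0.4]], [0.05, -0.05]),
    ([[1.0, -1.0]], [0.0]),
]
output = forward([1.0, 2.0], layers)
```

Each layer consumes the output of the previous one, which is the stage-by-stage processing described on the slide.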
  • 48. Machine Learning vs Deep Learning
  • 50. Applications • Image Classification • Speech Recognition • Language Translation • Stock Exchange Prediction • Biomedical and Diagnosis Systems • Vehicular Communication • Face Detection • Video Surveillance