Machine Learning with Python
Compiled by : Dr. Kumud Kundu
Outline
● The general concepts of machine learning
● The three types of learning and basic terminology
● The building blocks for successfully designing machine learning systems
● Introduction to Pandas, Matplotlib and the sklearn framework
○ For basics of Python, refer to https://www.python.org/
○ For basics of NumPy, refer to http://www.numpy.org/
● Simple Program of Plotting Graphs with Matplotlib.pyplot
● Coding Template of Analyzing and Visualizing Dataframe with Pandas
● Simple Program for supervised learning (prediction modelling) with Linear Regression
● Simple Program for unsupervised learning (clustering) with K-Means
Machine Learning
Machine learning is the application and science of algorithms that make sense of data.
Or
Machine learning uses algorithms that take input data, learn from that data and make
informed decisions.
Or
Machine learning is the design and implementation of programs that improve with experience.
ML: Giving Computers the Ability to Learn from Data
Machine Learning is…
Automating automation
Getting computers to program themselves
Let the data do the work instead!
[Figure: training data (past) is used to build a model/predictor; the model/predictor is then applied to testing data (future).]
JOURNEY FROM DATA TO PREDICTIONS
“Machine learning is the next Internet”
Traditional Programming vs. Machine Learning Programming
[Figure: in traditional programming, the computer takes data and a program and produces output; in machine learning, the computer takes data and output and produces a program.]
Machine learning is inherently a multi-disciplinary field
It draws on results from :
Artificial intelligence,
Probability
Statistics
Computational complexity theory
Information theory
Philosophy
Psychology
Neurobiology
and other fields.
Most machine learning methods work well because of human-designed representations and input
features
ML becomes just optimizing weights to best make a final prediction
Machine Learning
How Do Machines Learn?
Learning is all about discovering the best parameter values (a, b, c, …) that map
input to output.
Or
The main goal of learning is to find out how the output values are calculated from the
input, i.e. the relationship between output and input.
Machine learning algorithms are described as learning a target function (f) that
best maps input variables (X) to an output variable (Y): Y = f(X)
The relationship can be linear or non-linear.
The learned parameter values enable the model to output results for new instances based on
previously learned ones.
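As a minimal sketch of "discovering the best parameter values", the snippet below assumes a linear target function f(x) = a*x + b and made-up training instances. NumPy's polyfit is used here only as one convenient least-squares fitter; the slides do not prescribe it.

```python
import numpy as np

# Hypothetical training instances generated by the linear rule y = 3x + 2.
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
Y = 3.0 * X + 2.0

# Learning: find the parameter values (a, b) of f(x) = a*x + b that best map X to Y.
a, b = np.polyfit(X, Y, deg=1)

# The learned parameters let the model produce an output for a new instance.
x_new = 10.0
y_new = a * x_new + b
print(a, b, y_new)
```

Because the data here is noise-free, the fitted a and b match the generating rule up to floating-point error.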
Learning a function from data is a difficult problem,
and this is the reason why the field of machine learning and machine
learning algorithms exist.
● Error creeps in when predicting output from real-life input data instances (X),
i.e. Y = f(X) + e
● This error may arise, for example, from not having enough attributes to sufficiently characterize the best
mapping from X to Y.
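The effect of a missing attribute can be sketched as follows. The data is invented for illustration: the output depends on two attributes, x1 and x2, and a least-squares linear model is fitted once with only x1 and once with both. The helper fit_and_residual is hypothetical, not part of any library.

```python
import numpy as np

# Hypothetical data: the true output depends on two attributes, x1 and x2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])
y = 2.0 * x1 + 5.0 * x2 + 1.0

def fit_and_residual(features, target):
    """Least-squares fit of target = features @ w + b; returns the largest absolute error e."""
    design = np.column_stack([features, np.ones(len(target))])
    weights, *_ = np.linalg.lstsq(design, target, rcond=None)
    return float(np.max(np.abs(target - design @ weights)))

# With only x1 the model cannot explain y, so an error term e remains.
error_one_attribute = fit_and_residual(x1.reshape(-1, 1), y)
# With both attributes the mapping is characterized fully and e vanishes.
error_two_attributes = fit_and_residual(np.column_stack([x1, x2]), y)
print(error_one_attribute, error_two_attributes)
```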
As an example, a face identification program may match Subject 1 to Subject 2 on the basis
of intensity profile, although the expected match is Subject 1 with a different pose.
[Figure: face images labelled Subject 1, Subject 2 and Subject 1 with pose.]
Ml programming with python
The following diagram shows a typical workflow for
using machine learning in predictive modeling:
ML Program
● A computer program is said to learn from experience E with respect to some class of tasks T
and performance measure P, if its performance at tasks in T, as measured by P, improves with
experience E.
Python for Machine Learning Program
Why Python?
Python is one of the most popular programming languages for data science. Thanks to its very active developer
and open source community, a large number of useful libraries for scientific computing and machine learning,
such as NumPy and SciPy, have been developed.
For machine learning programming tasks, the scikit-learn library, one of the most popular and accessible open
source machine learning libraries, will be used.
Python on Jupyter Notebook
The Jupyter Notebook is an open-source web application that allows you
to create and share documents that contain live code, equations,
visualizations and narrative text.
The core programming languages supported by Jupyter are Julia, Python
and R.
Use it on Google Colab (colab.research.google.com)
or use Jupyter Notebook with Anaconda:
● Using the Anaconda Python distribution and package manager
● The Anaconda installer can be downloaded at https://docs.anaconda.com/anaconda/install/, and an
Anaconda quick start guide is available at https://docs.anaconda.com/anaconda/user-guide/getting-started/.
Key Terms in a Machine Learning Program
● Training example: A row in a table representing the dataset; synonymous with an observation, record,
instance, or sample (in most contexts, sample refers to a collection of training examples).
● Training: Model fitting; for parametric models, similar to parameter estimation.
● Feature (x): A column in a data table or data (design) matrix. Synonymous with predictor, variable, input,
attribute, or covariate.
● Target (y): Synonymous with outcome, output, response variable, dependent variable, (class) label, and ground truth.
● Loss function / Cost function / Error function: A function that measures the deviation of the predicted output from
the expected output.
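These terms can be tied to concrete arrays. The sketch below uses invented numbers and assumes mean squared error as the loss function, which is only one common choice among many.

```python
import numpy as np

# Hypothetical dataset: each row is one training example, each column one feature.
X = np.array([[1.0, 2.0],
              [2.0, 0.0],
              [3.0, 1.0]])
# Target: the expected output (ground truth) for each training example.
y = np.array([5.0, 4.0, 8.0])

def mean_squared_error(y_true, y_pred):
    """Loss function: average squared deviation of predictions from expected outputs."""
    return float(np.mean((y_true - y_pred) ** 2))

# Predictions from some not yet well-trained model, made up for illustration.
y_pred = np.array([4.0, 4.0, 10.0])
loss = mean_squared_error(y, y_pred)
print(X.shape, loss)
```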
Import the Libraries into the Jupyter Notebook
● import numpy as np
● import pandas as pd
● import matplotlib.pyplot as plt
Matplotlib: A Plotting Library for Python
● it makes heavy use of NumPy
● Importing matplotlib :
● from matplotlib import pyplot as plt or
● import matplotlib.pyplot as plt
● Examples:
● # for plotting bar graph
● x=[1,23,4,5,6,7]
● y=[23,45,67,89,90,100]
● plt.bar(x,y)
● plt.title('bar graph')
● plt.xlabel('fff')
● plt.ylabel('Y')
● plt.show()
● plt.scatter(x,y)
● plt.title('Scatter Plot')
● plt.xlabel('fff')
● plt.ylabel('Y')
● plt.show()
For subplots (Simultaneous plotting)
● matplotlib.pyplot.subplot(nrows, ncols, index) selects one panel of a grid
● import numpy as np
● import matplotlib.pyplot as plt
● x=np.arange(0,10,0.01)
● plt.subplot(1,3,1)
● plt.plot(x,np.sin(x))
● plt.subplot(1,3,2)
● plt.plot(x,np.cos(x))
● plt.subplot(1,3,3)
● plt.plot(x,np.sin(2*x))
● plt.show()
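The same three panels can also be built with plt.subplots, which returns the figure and all axes at once. This is an alternative sketch rather than the approach used in the slides; the panel titles and figure size are illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 10, 0.01)
curves = {"sin(x)": np.sin(x), "cos(x)": np.cos(x), "sin(2x)": np.sin(2 * x)}

# One row, three columns; subplots() returns the figure and an array of Axes.
fig, axes = plt.subplots(1, 3, figsize=(12, 3), sharey=True)
for ax, (title, values) in zip(axes, curves.items()):
    ax.plot(x, values)      # draw one curve per panel
    ax.set_title(title)
    ax.set_xlabel("x")
fig.tight_layout()
```

In a notebook the figure is displayed at the end of the cell; in a script, call plt.show() afterwards.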
Pandas is a fast, powerful, flexible and easy to use open source data analysis and
manipulation tool.
Pandas in data analysis:
Importing Data
Writing to different formats
Pandas Data Structures
Data Exploration
Data Manipulation
Aggregating Data
Merging Data
DataFrame
● A DataFrame is a two-dimensional, labelled table whose columns can hold different data types.
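A small sketch of a DataFrame with heterogeneous columns follows. The records and column names are made up for illustration.

```python
import pandas as pd

# Hypothetical records: a text column, an integer column and a float column.
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "rank": [1, 2, 3],
    "rating": [911.5, 880.0, 865.25],
})
print(df.dtypes)   # one data type per column
print(df.shape)    # (number of rows, number of columns)
```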
Reading and Writing into DataFrames
● import pandas as pd
● Reading Data into a DataFrame using Pandas
○ df = pd.read_csv('File Name')  # from a comma-separated values (CSV) file
○ df = pd.read_csv('C:fdpbatsmen_ratings_all091217.csv')
○ df = pd.read_excel('File Name')
● Writing Data from DataFrames to Files on the System
○ df.to_csv('File Name')  # or the destination path including the file name
○ df.to_excel('File Name')  # or the destination path including the file name
To display all the records of the file: display(df)
To list the data type of each column:
● types = df.dtypes
● print(types)
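Reading and writing can be tried without any file on disk. The sketch below assumes a tiny made-up CSV held in memory (io.StringIO stands in for a file path) and round-trips it through to_csv and read_csv.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "name,nations,rank\nA,IND,1\nB,AUS,2\nC,IND,3\n"

# read_csv accepts a path or a file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Write the frame back out; index=False leaves out the row index column.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Reading the written text again gives back the same table.
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()))
print(round_trip)
```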
Getting preview of Dataframe
● To view top n records of dataframe
○ df.head(5)
● To view bottom n records of dataframe
○ df.tail(5)
● View column names
○ df.columns
● Getting a sub-DataFrame from a DataFrame
○ df['name'], df[['name', 'nations']]
SubDataFrame as per Query
To display the records of India with rank < 50:
display(df[(df['nations'] == "IND") & (df['rank'] < 50)])
Selecting data columns from the dataset by column name:
df[['col1', 'col2']]
With iloc (integer-location based indexing) for selection by position:
df.iloc[:, :-1]  # select all rows of every column except the last one
df.iloc[:, 3:6]  # select all rows of the fourth, fifth and sixth columns
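The selection styles above can be compared on a small frame. The column names c1 to c6 and the values are hypothetical.

```python
import pandas as pd

# Hypothetical frame with six columns so positional slices are easy to follow.
df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]],
    columns=["c1", "c2", "c3", "c4", "c5", "c6"],
)

by_name = df[["c1", "c2"]]           # selection by column label
all_but_last = df.iloc[:, :-1]       # every column except the last
fourth_to_sixth = df.iloc[:, 3:6]    # positions 3, 4, 5 (the slice end is exclusive)
filtered = df[(df["c1"] > 1) & (df["c6"] < 50)]  # rows satisfying both conditions
print(fourth_to_sixth)
```

Positions in iloc start at 0, which is why the fourth column is position 3.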
Drop Columns from a DataFrame using the drop() method
Remove a specific single column:
k.drop(['rate_date'], axis=1)  # axis=1 denotes dropping a column of the dataset
Remove specific multiple columns:
k.drop(['rate_date', 'rating'], axis=1)
Remove columns based on column index:
k.drop(k.columns[[0, 1]], axis=1, inplace=True)
Remove all columns between one specific column and another:
k.drop(k.columns[3:5], axis=1)  # drops the columns at positions 3 and 4
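A runnable sketch of these drop() variants is given below. The frame is invented; only its column names echo those used in the slides.

```python
import pandas as pd

# Hypothetical frame; values are made up for illustration.
k = pd.DataFrame({
    "name": ["A", "B"],
    "nations": ["IND", "AUS"],
    "rank": [1, 2],
    "rating": [900, 850],
    "rate_date": ["2017-12-09", "2017-12-09"],
})

one_dropped = k.drop(["rate_date"], axis=1)             # returns a new frame; k is unchanged
two_dropped = k.drop(columns=["rate_date", "rating"])   # columns= is equivalent to axis=1
by_position = k.drop(k.columns[[0, 1]], axis=1)         # drop the first two columns by index
print(list(by_position.columns))
```

Without inplace=True, drop() leaves the original frame untouched, so the result must be assigned to a name.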
Code for Data Reading, Data Manipulation using Pandas
# importing the data reading and data manipulation library of Python
import pandas as pd
# upload the files because they are not present on Google Colab
from google.colab import files
upload = files.upload()
# reading the dataset using the read_csv function
df = pd.read_csv('rating.csv')
# to display column headers in the dataset
df.columns
# to get the number of instances and associated features
df.shape
# to get insights into the data by grouping on one column
df.groupby('nations').size()
# to get a smaller dataset as per the query or subqueries
k = df[(df['nations'] == "IND") & (df['rank'] < 50)]
# to display the smaller subset of data
display(k)
# to drop the desired columns from the smaller set of data
k = k.drop(['name', 'rate_date', 'nations'], axis=1)
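The template above needs Google Colab and an uploaded rating.csv. To try the grouping and query steps anywhere, the sketch below swaps in a small invented frame as a stand-in for that file.

```python
import pandas as pd

# Hypothetical stand-in for rating.csv; values are invented for illustration.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "nations": ["IND", "AUS", "IND", "ENG"],
    "rank": [3, 10, 75, 42],
})

print(df.shape)                                        # (instances, features)
counts = df.groupby("nations").size()                  # number of rows per nation
k = df[(df["nations"] == "IND") & (df["rank"] < 50)]   # rows for IND with rank below 50
k = k.drop(["name", "nations"], axis=1)                # keep only the numeric column
print(counts)
print(k)
```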
Scikit-learn (sklearn): Free Machine Learning Library for Python
● It interoperates with Python numerical and scientific libraries like NumPy and SciPy.
● Model selection is the process of selecting one final machine learning model from among a collection of candidate
machine learning models for a training dataset. It can be applied across different
types of models (e.g. logistic regression, SVM, KNN, etc.).
● The sklearn.model_selection module provides tools for this, such as train_test_split.
● Model parameters are the parameters that arise as a result of the fit.
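One way to compare candidate models is cross-validation. The sketch below is an illustration under assumptions: it uses the iris dataset bundled with scikit-learn, 5-fold cross_val_score, and default settings for the three model types named above. None of these choices comes from the slides.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)   # small example dataset bundled with scikit-learn

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

# Mean 5-fold cross-validated accuracy for each candidate model.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best_name = max(scores, key=scores.get)

# Model parameters exist only after fitting, e.g. the learned coefficients.
fitted = LogisticRegression(max_iter=1000).fit(X, y)
print(scores, best_name, fitted.coef_.shape)
```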
Challenge of ML Program
The challenge of applied machine learning is in choosing
a model among a range of different models for your
problem.
Simple Predictive ML Program using Linear Regression Model
● SIMPLE_REGRESSION.ipynb on Google Colab
# importing the data reading and data manipulation library of Python
import pandas as pd
# upload the files because they are not present on Google Colab
from google.colab import files
upload = files.upload()
# reading the dataset using the read_csv function
df = pd.read_csv('rating.csv.csv')
# for plotting graphs
import matplotlib.pyplot as plt
# dividing the dataset into the feature set (X) and the target set (y)
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values
# from the machine learning library of Python (sklearn) import the train_test_split function
from sklearn.model_selection import train_test_split
# X is the feature set, y is the target set
# train_test_split divides X into X_train and X_test, and y into y_train and y_test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)
# number of test instances and features
X_test.shape
# import the linear regression model
from sklearn.linear_model import LinearRegression
# create an instance of the linear regression model
model = LinearRegression()
# find the relationship between input and output with the help of the fit function
model.fit(X_train, y_train)
# apply the trained model to the unseen test data, i.e. X_test
y_pred = model.predict(X_test)
Visualizing and Evaluation of results
# visualization of results
plt.scatter(X_train, y_train, color='red')
plt.plot(X_train, model.predict(X_train), color='blue')
plt.title('PCM Marks vs Placement_Package (Training set)')
plt.xlabel('PCM Marks')
plt.ylabel('Placement_Package')
plt.show()
# importing metrics from sklearn to evaluate the predicted result
from sklearn import metrics
# include the numerical calculation library NumPy
import numpy as np
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
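The regression program depends on an uploaded CSV file. The sketch below runs the same split, fit, predict and evaluate steps on synthetic data instead. The generating rule y = 0.5*x + 2 with small noise is an assumption made for illustration, not the dataset used in the slides.

```python
import numpy as np
from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption): y = 0.5*x + 2 plus small Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(40, 100, size=(60, 1))
y = 0.5 * X[:, 0] + 2.0 + rng.normal(0, 1.0, size=60)

# Hold out one third of the instances for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)

# Fit on the training part, predict on the held-out part.
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mae = metrics.mean_absolute_error(y_test, y_pred)
mse = metrics.mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print(model.coef_[0], model.intercept_, mae, mse, rmse)
```

The fitted slope should land close to the generating value of 0.5, and the error metrics reflect the noise that was added.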
CLUSTERING : Grouping things together
UNSUPERVISED LEARNING
Cluster Analysis : A method of Unsupervised Learning
● Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group are
more similar to each other than to those in other groups.
● Clustering analysis lets us gain valuable insights from our data by seeing what groups the data points fall into when
we apply a clustering algorithm.
● For example, to survey the academic performance of high school students, the entire population of a particular board can be divided into
different clusters (Excellent Learner, Good Learner, Average Learner and Slow Learner).
K-Means Clustering
● Aims to partition ‘n’ observations into k clusters in which each observation belongs to the
cluster with the nearest mean, serving as a prototype of the cluster.
● K-Means falls under the category of centroid-based clustering.
The running time of K-Means grows in proportion to n × k × t, where:
● n = number of instances
● k = number of clusters
● t = number of iterations
The K-Means clustering algorithm involves the following steps:
● Choose the number of clusters K.
● Randomly select K data points as initial cluster centers, preferably as far as possible from each
other.
○ Calculate the distance between each data point and each cluster center using the given distance function.
○ Assign each data point to the cluster whose center is nearest to it.
○ Re-compute the center of each newly formed cluster.
○ The center of a cluster is computed by taking the mean of all the data points contained in that cluster.
● Keep repeating the above four steps until any of the following stopping criteria is met:
○ No change in the centers of the newly formed clusters
○ No change in the data points assigned to each cluster
○ The maximum number of iterations is reached
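The steps above can be sketched in plain NumPy. This is a minimal illustration, not scikit-learn's implementation: it assumes Euclidean distance, purely random initial centers (no attempt to spread them out), and a hypothetical helper name k_means.

```python
import numpy as np

def k_means(points, k, max_iter=100, seed=0):
    """Plain K-Means: random initial centers, then alternate assignment and mean update."""
    rng = np.random.default_rng(seed)
    # Pick K distinct data points as the initial cluster centers.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Euclidean distance from every point to every center, shape (n, k).
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        # Assign each point to its nearest center.
        labels = distances.argmin(axis=1)
        # Re-compute each center as the mean of its assigned points
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Stop when the centers no longer change.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical 2-D data: two well-separated groups of three points each.
data = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                 [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centers = k_means(data, k=2)
print(labels, centers)
```

The loop also ends when max_iter is reached, which corresponds to the third stopping criterion.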
Metric to evaluate the quality of Clusters
● Inertia: the sum of squared distances of all the points within a cluster from the
centroid of that cluster, added up over all clusters.
● It tells us how far the points within a cluster are from their centroid.
● This distance should be as low as possible.
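To make the definition concrete, the sketch below computes inertia by hand on made-up points and compares it with the inertia_ attribute that scikit-learn's KMeans exposes after fitting. The data and the n_init value are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 2-D points forming two tight groups.
X = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
              [8.0, 8.0], [8.0, 9.0], [9.0, 8.0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)

# Look up, for every point, the centroid of the cluster it was assigned to.
own_centroids = kmeans.cluster_centers_[kmeans.labels_]
# Sum of squared Euclidean distances of the points to their own centroids.
manual_inertia = float(np.sum((X - own_centroids) ** 2))
print(manual_inertia, kmeans.inertia_)
```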
from sklearn.cluster import KMeans
● The K-Means++ algorithm improves the step where we pick the initial cluster
centroids: instead of choosing them purely at random, it spreads them out.
● kmeans = KMeans(n_clusters=i, init='k-means++', random_state=42)  # i is the number of clusters being tried
● Use the elbow method to find the optimal number of clusters
An Elbow Method Algorithm
● The basic idea of the elbow rule is to compute, for a series of K values, the sum of the squared distances between the sample points in
each cluster and the centroid of that cluster. This sum of squared
errors (SSE) is used as a performance indicator. Iterate over the K values and calculate the SSE for each.
● Smaller values indicate that each cluster is more compact. The SSE keeps falling as K grows, so the K at which the decrease levels off (the "elbow") is chosen.
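A sketch of the elbow loop follows, assuming synthetic data with three well-separated blobs so that the elbow is expected near K = 3. The blob positions, noise level and range of K are illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (an assumption): three well-separated blobs of 30 points each.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 10.0]])
X = np.vstack([c + rng.normal(0, 0.5, size=(30, 2)) for c in blob_centers])

sse = []
for i in range(1, 7):
    kmeans = KMeans(n_clusters=i, init="k-means++", n_init=10, random_state=42)
    kmeans.fit(X)
    sse.append(kmeans.inertia_)   # inertia_ is the SSE for this K

# Print K next to its SSE to see where the decrease levels off.
for i, value in zip(range(1, 7), sse):
    print(i, round(value, 1))
```

Plotting range(1, 7) against sse with plt.plot makes the elbow visible as the point where the curve flattens.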
Clustering Example with K-Means
Agglomerative Clustering
● An agglomerative algorithm is a type of hierarchical clustering algorithm in which
each individual element to be clustered starts in its own cluster. These clusters are merged
iteratively until all the elements belong to one cluster.
● Hierarchical clustering is a powerful technique that allows us to build tree structures from
data similarities.
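A minimal sketch with scikit-learn's AgglomerativeClustering is shown below. The points are made up, and the choice of two clusters with Ward linkage is an assumption for illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical 2-D points forming two well-separated groups.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])

# Merging stops once n_clusters clusters remain; "ward" merges the pair
# of clusters that least increases the within-cluster variance.
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(X)

# children_ records the merge tree: row i lists the two nodes merged at step i.
print(labels)
print(model.children_)
```

The children_ array is the tree structure mentioned above: with six points there are five merges before everything belongs to one cluster.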
Hierarchical Clustering Example
Applications of Clustering
● Search Engines.
● Spam Detection
● Customer Segmentation

CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
 
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRA
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRADUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRA
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRA
 
Easter in the USA presentation by Chloe.
Easter in the USA presentation by Chloe.Easter in the USA presentation by Chloe.
Easter in the USA presentation by Chloe.
 
Department of Health Compounder Question ‍Solution 2022.pdf
Department of Health Compounder Question ‍Solution 2022.pdfDepartment of Health Compounder Question ‍Solution 2022.pdf
Department of Health Compounder Question ‍Solution 2022.pdf
 
Personal Resilience in Project Management 2 - TV Edit 1a.pdf
Personal Resilience in Project Management 2 - TV Edit 1a.pdfPersonal Resilience in Project Management 2 - TV Edit 1a.pdf
Personal Resilience in Project Management 2 - TV Edit 1a.pdf
 
Quality Assurance_GOOD LABORATORY PRACTICE
Quality Assurance_GOOD LABORATORY PRACTICEQuality Assurance_GOOD LABORATORY PRACTICE
Quality Assurance_GOOD LABORATORY PRACTICE
 
10 Topics For MBA Project Report [HR].pdf
10 Topics For MBA Project Report [HR].pdf10 Topics For MBA Project Report [HR].pdf
10 Topics For MBA Project Report [HR].pdf
 
How to Send Emails From Odoo 17 Using Code
How to Send Emails From Odoo 17 Using CodeHow to Send Emails From Odoo 17 Using Code
How to Send Emails From Odoo 17 Using Code
 
The Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George WellsThe Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George Wells
 

Ml programming with python

  • 1. Machine Learning with Python Compiled by : Dr. Kumud Kundu
  • 2. Outline ● The general concepts of machine learning ● The three types of learning and basic terminology ● The building blocks for successfully designing machine learning systems ● Introduction to Pandas, Matplotlib and the sklearn framework ○ For basics of Python refer to (https://www.python.org/) and ○ For basics of NumPy refer to (http://www.numpy.org/). ● Simple Program of Plotting Graphs with Matplotlib.pyplot ● Coding Template of Analyzing and Visualizing Dataframe with Pandas ● Simple Program for supervised learning (prediction modelling) with Linear Regression ● Simple Program for unsupervised learning (clustering) with K-Means
  • 3. Machine Learning Machine learning is the application and science of algorithms that make sense of data. Or Machine learning uses algorithms that take input data, learn from the data and make informed decisions. Or To design and implement programs that improve with experience.
  • 4. ML: Giving Computers the Ability to Learn from Data
  • 5. Machine Learning is… Automating automation Getting computers to program themselves Let the data do the work instead! Training Data model/ predictor past model/ predictor future Testing Data
  • 6. JOURNEY FROM DATA TO PREDICTIONS “Machine learning is the next Internet”
  • 8. Machine learning is inherently a multi-disciplinary field It draws on results from : Artificial intelligence, Probability Statistics Computational complexity theory Information theory Philosophy Psychology Neurobiology and other fields.
  • 9. Most machine learning methods work well because of human-designed representations and input features. ML then becomes just optimizing weights to best make a final prediction. Machine Learning
  • 10. How Machines Learn? Learning is all about discovering the best parameter values (a, b, c …) that map input to output. Or The main goal of learning is to find out how the output values are calculated from the input (the relationship between output and input), i.e. machine learning algorithms are described as learning a target function (f) that best maps input variables (X) to an output variable (Y): Y = f(X). The relationship can be linear or non-linear. The learned parameter values enable the model to output results for new instances based on previously learned ones.
  • 11. Learning a function from data is a difficult problem, and this is the reason why the field of machine learning and machine learning algorithms exist. ● Error creeps in when predicting output from real-life input data instances (X), i.e. Y = f(X) + e ● This error may come, for example, from not having enough attributes to sufficiently characterize the best mapping from X to Y. As an example, a face identification program will recognize Subject 1 as similar to Subject 2 on the basis of intensity profile, though the expected output is Subject 1 with pose. [Figure: Subject 1, Subject 2, Subject 1 with pose]
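To make the idea of "discovering the best parameter values" concrete, here is a minimal sketch that assumes a linear target function Y = aX + b. The data values are made up for illustration and the least-squares fit via `np.polyfit` is only one possible way to estimate a and b; the slides do not prescribe a method.

```python
import numpy as np

# Hypothetical data generated from y = 2x + 1 (values chosen for illustration).
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
Y = 2.0 * X + 1.0

# Least-squares fit of a degree-1 polynomial: returns [slope, intercept].
a, b = np.polyfit(X, Y, deg=1)

def f(x):
    """Learned mapping from input to output, using the fitted parameters."""
    return a * x + b

# The learned parameters let the model produce an output for an unseen input.
prediction = f(5.0)
```

Here the learning step is the call that estimates a and b; everything after it only applies the learned function to new instances.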
  • 14. The following diagram shows a typical workflow for using machine learning in predictive modeling:
  • 15. ML Program ● A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
  • 16. Python for Machine Learning Program
  • 17. Why Python? Python is one of the most popular programming languages for data science. Thanks to its very active developer and open source community, a large number of useful libraries for scientific computing and machine learning, such as NumPy and SciPy, have been developed. For machine learning programming tasks, the scikit-learn library, one of the most popular and accessible open source machine learning libraries, will be used.
  • 18. Python on Jupyter Notebook The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. The core programming languages supported by Jupyter are Julia, Python and R. Use it on Google Colab colab.research.google.com or Use Jupyter notebook on Anaconda ● Using the Anaconda Python distribution and package manager ● The Anaconda installer can be downloaded at https://docs.anaconda.com/anaconda/install/, and an Anaconda quick start guide is available at https://docs.anaconda.com/anaconda/user-guide/getting-started/.
  • 19. Key Terms in Machine Learning Program ● Training example: A row in a table representing the dataset, synonymous with an observation, record, instance, or sample (in most contexts, sample refers to a collection of training examples). ● Training: Model fitting; for parametric models, similar to parameter estimation. ● Feature (x): A column in a data table or data (design) matrix. Synonymous with predictor, variable, input, attribute, or covariate. ● Target (y): Outcome, output, response variable, dependent variable, (class) label, and ground truth. ● Loss function / Cost function / Error function: A function that measures the deviation of the predicted output from the expected output.
  • 20. Import the Libraries into the Jupyter Notebook ● import numpy as np ● import pandas as pd ● import matplotlib.pyplot as plt
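As an illustration of the loss function term above, the following sketch computes two common error measures in plain Python. The function names and the sample values are assumptions chosen for the example; they are not taken from the slides.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation of predictions from expected outputs."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average squared deviation of predictions from expected outputs."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up expected outputs (targets) and model predictions.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]

mae = mean_absolute_error(y_true, y_pred)  # (1 + 0 + 2) / 3
mse = mean_squared_error(y_true, y_pred)   # (1 + 0 + 4) / 3
```

The squared version penalizes the larger deviation (2) more heavily than the absolute version does.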
  • 21. Matplotlib: A Plotting Library for Python ● it makes heavy use of NumPy ● Importing matplotlib : ● from matplotlib import pyplot as plt or ● import matplotlib.pyplot as plt ● Examples: ● # for plotting bar graph ● x=[1,23,4,5,6,7] ● y=[23,45,67,89,90,100] ● plt.bar(x,y) ● plt.title('bar graph') ● plt.xlabel('fff') ● plt.ylabel('Y') ● plt.show()
  • 22. ● plt.scatter(x,y) ● plt.title('Scatter Plot') ● plt.xlabel('fff') ● plt.ylabel('Y') ● plt.show()
  • 23. For subplots (simultaneous plotting) ● matplotlib.pyplot.subplot ● import numpy as np ● import matplotlib.pyplot as plt ● x=np.arange(0,10,0.01) ● plt.subplot(1,3,1) ● plt.plot(x,np.sin(x)) ● plt.subplot(1,3,2) ● plt.plot(x,np.cos(x)) ● plt.subplot(1,3,3) ● plt.plot(x,np.sin(2*x)) ● plt.show()
  • 24. Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool. Pandas in data analysis: Importing Data Writing to different formats Pandas Data Structures Data Exploration Data Manipulation Aggregating Data Merging Data
  • 25. DataFrame ● DataFrame is a two-dimensional array with heterogeneous data.
  • 26. Reading and Writing into DataFrames ● import pandas as pd ● Reading data into a DataFrame using Pandas ○ df = pd.read_csv('File Name')  # from a Comma Separated Values (CSV) file ○ df = pd.read_csv('C:fdpbatsmen_ratings_all091217.csv') ○ df = pd.read_excel('File Name') ● Writing data from DataFrames to files on the system ○ df.to_csv('File Name')  # a file name or a full destination path ○ df.to_excel('File Name')  # a file name or a full destination path ● To display all the records of the file: display(df) ● To display the data type of each column: types = df.dtypes ● print(types)
  • 27. Getting preview of Dataframe ● To view the top n records of the dataframe ○ df.head(5) ● To view the bottom n records of the dataframe ○ df.tail(5) ● View column names ○ df.columns ● Getting a sub-dataframe from the dataframe ○ df['name'], df[['name', 'nations']]
  • 28. SubDataFrame as per Query ● To display the records of India with ranking < 50: display(df[(df['nations'] == "IND") & (df['rank'] < 50)]) ● Selecting data columns from the dataset by column names: df[['col1', 'col2']] ● With iloc (integer-location based indexing) for selection by position: ○ df.iloc[:, :-1]  # select all columns except the last one ○ df.iloc[:, 3:6]  # select all rows of the fourth, fifth and sixth columns
  • 29. Drop Columns from a Dataframe using the drop() method ● Remove a specific single column: k.drop(['rate_date'], axis=1)  # axis=1 denotes dropping a column of the dataset ● Remove specific multiple columns: k.drop(['rate_date', 'rating'], axis=1) ● Remove columns based on column index: k.drop(k.columns[[0, 1]], axis=1, inplace=True) ● Remove all columns between one specific column and another: k.drop(k.columns[3:5], axis=1)
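The query and positional selection described in this slide can be tried on a small in-memory table. This is a sketch only: the column names follow the slides, but the rows are invented for the example and stand in for the ratings CSV.

```python
import pandas as pd

# Hypothetical data; column names follow the slides, values are made up.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "nations": ["IND", "AUS", "IND", "IND"],
    "rank": [10, 20, 75, 40],
    "rating": [880, 860, 700, 790],
})

# Boolean mask: each condition sits in parentheses and they are combined with &.
top_ind = df[(df["nations"] == "IND") & (df["rank"] < 50)]

# Selection by label: a list of column names returns a DataFrame.
subset = df[["name", "nations"]]

# Selection by position: all rows, every column except the last one.
features = df.iloc[:, :-1]

# Selection by position: columns at positions 1 and 2 (the slice end is excluded).
middle = df.iloc[:, 1:3]
```

Note that positions in `iloc` are zero-based, so the "fourth, fifth and sixth" columns of a wider table correspond to the slice `3:6`.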
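The variants of `drop()` listed in this slide can be compared side by side on a toy table. The data below is hypothetical; only the column names `rate_date`, `rating`, `name`, `nations` and `rank` come from the slides. Without `inplace=True`, `drop()` returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

# Hypothetical rows; only the column names are taken from the slides.
k = pd.DataFrame({
    "name": ["A", "B"],
    "rate_date": ["2017-12-09", "2017-12-09"],
    "rating": [880, 860],
    "nations": ["IND", "AUS"],
    "rank": [1, 2],
})

# Drop a single column by name (axis=1 means "columns").
one_dropped = k.drop(["rate_date"], axis=1)

# Drop several columns by name.
two_dropped = k.drop(["rate_date", "rating"], axis=1)

# Drop columns by position: look up their names through k.columns.
by_index = k.drop(k.columns[[0, 1]], axis=1)

# Drop a contiguous range of columns (positions 1 and 2).
by_range = k.drop(k.columns[1:3], axis=1)
```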
  • 30. Code for Data Reading, Data Manipulation using Pandas
      # import the data reading and data manipulation library of Python
      import pandas as pd
      # upload the file, because it is not present on Google Colab
      from google.colab import files
      upload = files.upload()
      # read the dataset using the read_csv function
      df = pd.read_csv('rating.csv')
      # display the column headers of the dataset
      df.columns
      # get the number of instances and associated features
      df.shape
      # get insights into the data by grouping on one column
      df.groupby('nations').size()
      # get a smaller dataset as per the query or subqueries
      k = df[(df['nations'] == "IND") & (df['rank'] < 50)]
      # display the smaller subset of data
      display(k)
      # drop the desired columns from the smaller set of data
      k = k.drop(['name', 'rate_date', 'nations'], axis=1)
  • 31. Scikit-learn (sklearn): Free Machine Learning Library for Python ● It is built on and interoperates with Python numerical and scientific libraries like NumPy and SciPy. ● Model selection is the process of selecting one final machine learning model from among a collection of candidate machine learning models for a training dataset. It can be applied across different types of models (e.g. logistic regression, SVM, KNN, etc.). ● The sklearn.model_selection module provides tools for this, such as the train_test_split function. ● Model parameters are parameters which arise as a result of the fit.
  • 32. Challenge of ML Program The challenge of applied machine learning is in choosing a model among a range of different models for your problem.
  • 33. Simple Predictive ML Program using Linear Regression Model ● SIMPLE_REGRESSION.ipynb on Google Colab
      # import the data reading and data manipulation library of Python
      import pandas as pd
      # upload the file, because it is not present on Google Colab
      from google.colab import files
      upload = files.upload()
      # read the dataset using the read_csv function
      df = pd.read_csv('rating.csv.csv')
      # for plotting graphs
      import matplotlib.pyplot as plt
      # divide the dataset into the feature set (X) and the target set (y)
      X = df.iloc[:, :-1].values
      y = df.iloc[:, -1].values
  • 34. # from the machine learning library of Python (sklearn), import the train_test_split function
      from sklearn.model_selection import train_test_split
      # X is the feature set, y is the target set
      # train_test_split divides X into train and test parts and y into train and test parts
      X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 1/3, random_state = 0)
      X_test.shape
      # import the Linear Regression model
      from sklearn.linear_model import LinearRegression
      # create an instance of the linear regression model
      model = LinearRegression()
      # find the relationship between input and output with the help of the fit function
      model.fit(X_train, y_train)
      # use the trained model on the unseen test data, i.e. X_test
      y_pred = model.predict(X_test)
  • 35. Visualizing and Evaluation of results
      # visualization of results
      plt.scatter(X_train, y_train, color = 'red')
      plt.plot(X_train, model.predict(X_train), color = 'blue')
      plt.title('PCM Marks vs Placement_Package (Training set)')
      plt.xlabel('PCM Marks')
      plt.ylabel('Placement_Package')
      plt.show()
      # import metrics from sklearn to evaluate the predicted result
      from sklearn import metrics
      # include the numerical calculation library numpy
      import numpy as np
      print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
      print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
      print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
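Because the CSV file used in slides 33 to 35 is not available here, the same split, fit, predict and evaluate workflow can be sketched on synthetic data. The relationship `package = 0.1 * marks + 2` and the noise level are assumptions invented for this example, not properties of the original dataset.

```python
import numpy as np
from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the CSV (assumption): package = 0.1 * marks + 2, plus small noise.
rng = np.random.default_rng(0)
marks = rng.uniform(40, 100, size=60)
package = 0.1 * marks + 2.0 + rng.normal(0.0, 0.1, size=60)

X = marks.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix
y = package

# Hold out one third of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0
)

# Fit on the training part only, then predict the held-out part.
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluate the predictions against the held-out targets.
mae = metrics.mean_absolute_error(y_test, y_pred)
mse = metrics.mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
```

After fitting, `model.coef_` and `model.intercept_` hold the learned slope and intercept, which should be close to the values used to generate the data.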
  • 36. CLUSTERING : Grouping things together UNSUPERVISED LEARNING
  • 37. Cluster Analysis : A method of Unsupervised Learning ● Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups. ● Clustering analysis is used to gain valuable insights from data by seeing what groups the data points fall into when a clustering algorithm is applied. ● For example, to survey the academic performance of high school students, the entire population of a particular board can be divided into different clusters (Excellent Learner, Good Learner, Average Learner and Slow Learner).
  • 38. K-Means Clustering ● Aims to partition ‘n’ observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. ● K-Means falls under the category of centroid-based clustering. •n = number of instances •k = number of clusters •t = number of iterations
  • 39. K-Means Clustering Algorithm involves the following steps- ● Choose the number of clusters K. ● Randomly select any K data points as cluster centers in such a way that they are as far as possible from each other. ○ Calculate the distance between each data point and each cluster center by using the given distance function. ○ Assign each data point to the cluster whose center is nearest to that data point. ○ Re-compute the center of the newly formed clusters. ○ The center of a cluster is computed by taking the mean of all the data points contained in that cluster. ● Keep repeating the above four steps until any of the following stopping criteria is met- ○ No change in the center of newly formed clusters ○ No change in the data points of the cluster ○ Maximum number of iterations is reached
  • 40. Metric to evaluate the quality of Clusters ● Inertia: the sum of squared distances of all the points within a cluster from the centroid of that cluster (this is how the inertia_ attribute of scikit-learn's KMeans is defined). ● It tells us how far the points within a cluster are from their centroid. ● This distance should be as low as possible.
  • 41. from sklearn.cluster import KMeans ● Using the K-Means++ algorithm, we optimize the step where we randomly pick the cluster centroid. ● kmeans = KMeans(n_clusters = i, init = 'k-means++', random_state = 42)  # i is the candidate number of clusters ● Using the elbow method to find the optimal number of clusters
  • 42. An Elbow Method Algorithm ● The basic idea of the elbow method is to run the clustering for a series of K values and, for each K, compute the sum of squared distances between the sample points in each cluster and the centroid of that cluster. This sum of squared errors (SSE) is used as a performance indicator. ● Smaller values indicate that each cluster is more compact. The "elbow", the value of K after which the SSE stops dropping sharply, is taken as the number of clusters.
  • 46. Agglomerative Clustering ● An agglomerative algorithm is a type of hierarchical clustering algorithm where each individual element to be clustered starts in its own cluster. These clusters are merged iteratively until all the elements belong to one cluster. ● Hierarchical clustering is a powerful technique that allows building tree structures from data similarities.
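The steps above can be sketched directly in NumPy. This is an illustrative implementation under stated assumptions, not the scikit-learn one: the function names are hypothetical, Euclidean distance is assumed as the distance function, and the "as far as possible" initialisation is read as a farthest-first pick (a random first center, then repeatedly the point farthest from the centers chosen so far).

```python
import numpy as np

def init_centers(points, k, rng):
    """Farthest-first initialisation: one reading of 'as far as possible'."""
    centers = [points[rng.integers(len(points))]]
    while len(centers) < k:
        chosen = np.array(centers)
        # Distance from each point to its nearest already-chosen center.
        d = np.linalg.norm(points[:, None, :] - chosen[None, :, :], axis=2).min(axis=1)
        centers.append(points[d.argmax()])
    return np.array(centers)

def kmeans(points, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = init_centers(points, k, rng)
    for _ in range(max_iter):
        # Distance between every data point and every cluster center.
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        # Assign each point to the cluster with the nearest center.
        labels = distances.argmin(axis=1)
        # Re-compute each center as the mean of its points (keep it if the cluster is empty).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Stopping criterion: no change in the centers.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Made-up 2-D data with two obvious groups.
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
                   [8.0, 8.0], [8.2, 7.9], [7.9, 8.3]])
centers, labels = kmeans(points, k=2)
```

The `max_iter` argument covers the third stopping criterion, so the loop ends even if the centers keep moving.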
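A minimal sketch of the elbow loop with scikit-learn follows. The three-blob dataset is synthetic and invented for this example, and the range of K values (1 to 6) is an arbitrary choice. `n_init` is set explicitly because its default differs between scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (assumption): three well-separated 2-D blobs of 50 points each.
rng = np.random.default_rng(42)
blob_centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(0.0, 0.5, size=(50, 2)) for c in blob_centers])

# Iterate over candidate K values and record the SSE (inertia) for each.
sse = []
for i in range(1, 7):
    kmeans = KMeans(n_clusters=i, init="k-means++", n_init=10, random_state=42)
    kmeans.fit(X)
    sse.append(kmeans.inertia_)
```

Plotting `sse` against K (for example with `plt.plot(range(1, 7), sse)`) would show a sharp bend at K = 3 for this data, since adding clusters beyond the three true groups reduces the SSE only slightly.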
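The merging process can be sketched in plain Python. The slide does not say how the distance between two clusters is measured, so this example assumes single linkage (the distance between the closest pair of members) and one-dimensional values; the function name and the data are hypothetical.

```python
def agglomerative(values, n_clusters=1):
    """Single-linkage agglomerative clustering of 1-D values (illustrative only)."""
    clusters = [[v] for v in values]  # every element starts in its own cluster
    merges = []                       # merge history, i.e. the tree structure
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        # Merge the two closest clusters into one.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merges

data = [1.0, 1.5, 5.0, 5.2, 9.0]

# Stop early at three clusters.
clusters, merges = agglomerative(data, n_clusters=3)

# Merge all the way down to a single cluster, as described in the slide.
root, full_history = agglomerative(data, n_clusters=1)
```

The recorded merge history is what a dendrogram draws: each entry lists the two clusters that were joined and the distance at which the join happened.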
  • 50. Applications of Clustering ● Search Engines. ● Spam Detection ● Customer Segmentation