INDUSTRIAL TRAINING REPORT
Machine Learning
Submitted
In Partial Fulfillment of the Requirements
For the Degree of
Bachelor of Technology
In
Computer Science and Engineering
By
HRJEET SINGH
Roll No. 1700410019
2017-2021
2021
Sponsored by
Internshala
Noida Gurgaon U.P
Table of Contents
Declaration
Certificate
Acknowledgement
Abstract
1.0 Introduction
2.0 Company Background & Structure
3.0 Weekly Job Summary
3.1 Daily Records
3.2 About the Training
3.3 Training Schedule and Location
4.0 Technical Contents
4.1 Description of Tasks
5.0 Learning Outcome and Work Experience
5.1 Application of Theory and Skills
6.0 Conclusion
References
Declaration
I hereby declare that I have completed my six weeks of summer
training at Internshala (one of the world's leading online
certification training providers) from 24th November 2020 to 5th
January 2021 under the guidance of Mr. Kunal Jain and Mr. Sunil Roy.
I declare that I worked with full dedication during these six weeks
of training and that my learning outcomes fulfil the requirements of
training for the award of the degree of Bachelor of Technology
(B.Tech.) in Computer Science and Engineering at
Raja Balwant Singh Engineering Technical Campus.
Name : Hrjeet Singh
Roll.No. 1700410019
Date:
CERTIFICATE
Prof.(Dr.) Brajesh Kumar Singh
H.O.D. CSE Deptt.
ACKNOWLEDGEMENT
I would like to acknowledge the contribution of the following people,
without whose help and guidance this report would not have been completed.
I acknowledge, with respect and gratitude, the counsel and support of our
training coordinator, Mr. Brajesh Kumar Singh, Head of the CSE Department,
whose expertise, guidance, support, encouragement, and enthusiasm have
made this report possible. His feedback vastly improved the quality of this
report and provided an enthralling experience. I am indeed proud and
fortunate to be supported by him.
Although it is not possible to name everyone individually, I shall ever remain
indebted to the faculty members of R.B.S. Engineering Technical Campus, Bichpuri,
Agra, for their persistent support and cooperation extended during this work.
This acknowledgement would remain incomplete if I failed to express my deep
sense of obligation to my parents and God for their consistent blessings and
encouragement.
Hrjeet Singh
1700410019
Abstract
Industrial training is an important phase of a student's life. A well
planned, properly executed and evaluated industrial training helps a
lot in developing a professional attitude. It develops an awareness of
the industrial approach to problem solving, based on a broad
understanding of the processes and mode of operation of an organization.
The aim and motivation of this industrial training is to gain
discipline, skills, teamwork and technical knowledge through a
proper training environment, which will help me, as a student in the
field of Information Technology, to develop an awareness of the
self-disciplinary nature of problems in information and
communication technology.
Company Background & structure
Company profile
Internshala was created with a mission to produce skilled software engineers
for our country and the world. It aims to bridge the gap between the quality of
skills demanded by industry and the quality of skills imparted by conventional
institutes. With assessments, learning paths and courses authored by industry
experts, Internshala helps businesses and individuals benchmark expertise
across roles, speed up release cycles and build reliable, secure products.
VISION
We are a technology company on a mission to equip students with relevant
skills and practical exposure through internships and online trainings. Imagine
a world full of freedom and possibilities. A world where you can discover your
passion and turn it into your career. A world where your practical skills
matter more than your university degree. A world where you do not have to
wait till 21 to taste your first work experience (and get a rude shock that it is
nothing like you had imagined it to be). A world where you graduate fully
assured, fully confident, and fully prepared to stake your claim on your place
in the world.
History
The platform, which was founded in 2010, started out as a WordPress blog
that aggregated internships across India and articles on education, technology
and the skill gap. Internshala launched its online trainings in 2014. As of 2018,
the platform had 3.5 million students and 80,000 companies.
Mission
Internshala's mission is to equip every student with practical skills and
exposure so that they can build their dream careers. Our e-learning
platform, Internshala Trainings (https://trainings.internshala.com), is central
to this mission. Internshala Trainings' goal is simple: to make learning easy.
Objectives
The main objectives of the training were to learn:
• How to determine and measure program complexity.
• Python programming.
• Machine learning libraries: scikit-learn, NumPy, Matplotlib, Pandas and
Seaborn.
• Statistical mathematics for the algorithms.
• Mathematical concepts and how to apply them in problem solving.
• Supervised and unsupervised learning.
• Classification and regression.
• Machine learning algorithms.
• Machine learning programming and use cases.
Weekly Summary
Week 1: Introduction to machine learning, introduction to data. Assignment 1, Assignment 2.
Week 2: Introduction to Python, data exploration and preprocessing. Assignment 3, Assignment 4.
Week 3: Linear regression and introduction to dimensionality reduction. Assignment 5, Assignment 6.
Week 4: Logistic regression and decision tree. Assignment 7, Assignment 8.
Week 5: Ensemble models. Assignment 9.
Week 6: Clustering and project. Assignment 10.
About the training
Training is the process of teaching, informing or educating people so
that they may become as well qualified as possible to do their job, and
so that they become qualified to perform in positions of greater difficulty
and responsibility.
Training is an organized and planned effort by a company to facilitate
employees' learning of job-related competencies.
• Industrial training at Internshala from 24th November 2020 to
05th January 2021.
• I completed my online industrial training from "Internshala",
located in Gurgaon, over a period of 42 days.
I completed my online training under the guidance of Mr. Kunal Jain
and Mr. Sunil Roy.
Introduction To Machine Learning
Machine learning enables a machine to automatically learn from
data, improve performance from experiences, and predict things
without being explicitly programmed.
In the real world, we are surrounded by humans who can learn
everything from their experiences with their learning capability, and
we have computers or machines which work on our instructions.
But can a machine also learn from experiences or past data like a
human does? So here comes the role of Machine Learning.
Types of machine learning
The types of machine learning algorithms differ in their approach, the
type of data they input and output, and the type of task or problem
that they are intended to solve. Broadly, machine learning can be
divided into two categories:
I. Supervised Learning
II. Unsupervised Learning
Supervised Learning
Supervised learning is a type of learning in which we are given a
data set and we already know what the correct output should look
like, with the idea that there is a relationship between the input
and the output. Basically, it is the task of learning a function that
maps an input to an output based on example input-output pairs.
Unsupervised Learning
Unsupervised learning is a type of learning that allows us to
approach problems with little or no idea of what our results should
look like. We can derive structure by clustering the data based on
relationships among the variables in the data. With unsupervised
learning there is no feedback based on the prediction results. Basically, it
is a type of self-organized learning that helps in finding previously
unknown patterns in a data set without pre-existing labels.
Data
Data is a collection of information about things, for example
notifications, activity over time, clock alarms, etc.
Two types of data are used in machine learning models:
1. Labeled data
2. Unlabeled data
Labeled data
Data that contains a target variable or an output variable that
answers a question of interest is called labeled data.
Unlabeled data
Unlabeled data is a designation for pieces of data that have not been
tagged with labels identifying characteristics, properties or
classifications.
Introduction to Python
Python is a widely used general-purpose, high level programming
language. It was initially designed by Guido van Rossum in 1991 and
developed by Python Software Foundation. It was mainly developed
for an emphasis on code readability, and its syntax allows
programmers to express concepts in fewer lines of code. Python is
dynamically typed and garbage-collected. It supports
multiple programming paradigms, including procedural, object-
oriented, and functional programming. Python is often described as
a "batteries included" language due to its comprehensive standard
library.
Basic Libraries in Python
Scikit-learn for handling basic ML algorithms such as clustering, linear and
logistic regression, classification, and others.
Pandas for high-level data structures and analysis. It allows merging
and filtering of data, as well as gathering it from other external sources
such as Excel.
Matplotlib for creating 2D plots, histograms, charts, and other forms of
visualization.
NumPy is a general-purpose array-processing package. It provides
a high-performance multidimensional array object, and tools for
working with these arrays.
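To illustrate how these libraries typically work together, here is a minimal Python sketch on purely synthetic data; the column names and numbers are made up for illustration and are not part of the training material.

# Minimal sketch of NumPy, Pandas, Matplotlib and scikit-learn together;
# the data and column names below are synthetic and purely illustrative.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)                      # NumPy: array generation
df = pd.DataFrame({                                  # Pandas: tabular structure
    "height_cm": rng.normal(170, 10, 200),
    "weight_kg": rng.normal(70, 12, 200),
})
print(df.describe())                                 # quick summary statistics

train, test = train_test_split(df, test_size=0.2,    # scikit-learn utility
                               random_state=0)
print(len(train), "training rows,", len(test), "test rows")

df.plot.scatter(x="height_cm", y="weight_kg")        # Matplotlib (via Pandas)
plt.title("Synthetic height vs. weight")
plt.show()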
Data Preprocessing
Machine learning on’t work so well with processing raw data. Before we can
feed such data to an ML algorithm, we must preprocess it. We must apply
some transformations on it. With data preprocessing, we convert raw data
into a clean data set. To perform data this, there are 6 techniques-
1. Rescaling Data -For data with attributes of varying scales, we can
rescale attributes to possess the same scale. We rescale attributes into
the range 0 to 1 and call it normalization. We use the Min Max Scaler
class from scikit-learn. This gives us values between 0 and 1
2. Normalizing Data -In this task, we rescale each observation to a length
of 1 (a unit norm). For this, we use the Normalizer class.
3. Mean Removal-We can remove the mean from each feature to center it
on zero.
4. Some labels can be words or numbers. Usually, training data is labelled
with words to make it readable. Label encoding converts word labels
into numbers to let algorithms work on them.
5. One Hot Encoding -When dealing with few and scattered numerical
values, we may not need to store these. Then, we
can perform OneHot Encoding. For k distinct values, we can transform t
he feature into a k-dimensionalvector with one value of 1 and 0 as the
rest values.
6. Standardizing Data -With standardizing, we can take attributes with a
Gaussian distribution and different means and standard deviations and
transform them into a standard Gaussian distribution with a mean of 0
and a standard deviation of 1.
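As a rough illustration of these six techniques, the sketch below applies the corresponding scikit-learn preprocessing classes to a tiny made-up array; the numbers and labels are arbitrary.

# Sketch of the six preprocessing techniques with scikit-learn on toy data.
import numpy as np
from sklearn.preprocessing import (MinMaxScaler, Normalizer, StandardScaler,
                                   LabelEncoder, OneHotEncoder)

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 600.0]])

# 1. Rescaling each attribute to the range [0, 1]
print(MinMaxScaler().fit_transform(X))

# 2. Normalizing each observation (row) to unit norm
print(Normalizer().fit_transform(X))

# 3. Mean removal and 6. standardizing to mean 0 and standard deviation 1
print(StandardScaler().fit_transform(X))

# 4. Label encoding word labels into numbers
labels = ["cat", "dog", "cat", "bird"]
print(LabelEncoder().fit_transform(labels))

# 5. One hot encoding k distinct values into a k-dimensional vector
colors = np.array([["red"], ["green"], ["red"]])
print(OneHotEncoder().fit_transform(colors).toarray())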
Exploratory Data Analysis (EDA)
It is the process of summarizing, visualizing and getting deeply
acquainted with the important traits of a data set. When you carry out
EDA, domain knowledge (e.g. about the business or social impact category)
can help a great deal in understanding the data and extracting insights from
it.
To achieve this level of certainty, here is what you can do with EDA (a
minimal Pandas sketch follows the list):
• Understand how the raw data was collected
• Get familiar with different characteristics of the data
• Learn about the individual features and their mutual relationships (or
lack thereof)
• Check and validate the data for anomalies, outliers, missing values,
human errors, etc.
• Extract insights that weren't so evident to business stakeholders but can
provide useful information about the business
• Discover hidden patterns in the data that allow for better
comprehension of the business problem
• Validate that the data has been generated in an expected manner
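A minimal EDA pass along these lines might look as follows in Pandas and Seaborn; the DataFrame and its columns are hypothetical.

# Minimal EDA sketch on a made-up DataFrame with hypothetical columns.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(18, 65, 100),
    "income": rng.normal(50_000, 15_000, 100),
    "churned": rng.integers(0, 2, 100),
})
df.loc[::10, "income"] = np.nan       # inject some missing values

df.info()                             # data types and non-null counts
print(df.describe())                  # summary statistics per column
print(df.isnull().sum())              # missing values per column
print(df.corr())                      # pairwise relationships between features

sns.histplot(df["income"].dropna())   # distribution and outlier check
plt.show()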
Linear regression
Linear regression may be defined as a statistical model that
analyzes the linear relationship between a dependent variable and a
given set of independent variables. A linear relationship between
variables means that when the value of one or more independent
variables changes (increases or decreases), the value of the dependent
variable also changes accordingly (increases or decreases).
Mathematically, the relationship can be represented with the help of the
following equation:
Y = mX + c
Here:
Y = dependent variable (target variable)
X = independent variable (predictor variable)
c = intercept of the line
m = linear regression coefficient (slope)
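A short scikit-learn sketch of fitting Y = mX + c on synthetic data; the true slope and intercept below are chosen arbitrarily for illustration.

# Fitting Y = mX + c with scikit-learn on synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(100, 1))              # independent variable
y = 2.5 * X[:, 0] + 4.0 + rng.normal(0, 1, 100)    # dependent variable + noise

model = LinearRegression().fit(X, y)
print("m (coefficient):", model.coef_[0])          # close to 2.5
print("c (intercept):", model.intercept_)          # close to 4.0
print("prediction for X = 5:", model.predict([[5.0]])[0])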
Cost function-
o Different values for the weights or coefficients of the line (m and c) give
different regression lines, and the cost function is used to estimate the
values of the coefficients for the best-fit line.
o The cost function optimizes the regression coefficients or weights. It
measures how well a linear regression model is performing.
o We can use the cost function to find the accuracy of the mapping
function, which maps the input variable to the output variable. This
mapping function is also known as the hypothesis function.
MAE (Mean Absolute Error) represents the difference between the original
and predicted values, computed by averaging the absolute differences over the
data set.
MSE (Mean Squared Error) represents the difference between the original and
predicted values, computed by averaging the squared differences over the data
set.
RMSE (Root Mean Squared Error) is the square root of MSE.
R-squared (coefficient of determination) represents how well the predicted
values fit compared to the original values. The value ranges from 0 to 1 and is
interpreted as a percentage. The higher the value, the better the model.
The above metrics can be expressed as:
MAE = (1/n) * Σ |y_i - ŷ_i|
MSE = (1/n) * Σ (y_i - ŷ_i)²
RMSE = √MSE
R² = 1 - (Σ (y_i - ŷ_i)² / Σ (y_i - ȳ)²)
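These metrics can be computed with scikit-learn; the true and predicted values below are toy numbers for illustration.

# Computing MAE, MSE, RMSE and R-squared with scikit-learn on toy values.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.5, 10.0])
y_pred = np.array([2.8, 5.4, 7.0, 10.3])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                      # RMSE is the square root of MSE
r2 = r2_score(y_true, y_pred)

print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}")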
Gradient Descent:
o Gradient descent is used to minimize the MSE by calculating
the gradient of the cost function.
o A regression model uses gradient descent to update the
coefficients of the line by reducing the cost function.
o It starts with a random selection of coefficient values and
then iteratively updates the values to reach the minimum of the cost
function.
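The update rule described above can be written in a few lines of NumPy; this is only a bare-bones sketch on made-up data, with a learning rate and iteration count chosen arbitrarily.

# Bare-bones gradient descent for y = m*x + c, minimizing MSE.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 2.0 * x + 1.0 + rng.normal(0, 1, 200)      # synthetic data, true m=2, c=1

m, c = 0.0, 0.0        # initial coefficient values
lr = 0.01              # learning rate
n = len(x)

for _ in range(2000):
    y_pred = m * x + c
    dm = (-2 / n) * np.sum(x * (y - y_pred))   # gradient of MSE w.r.t. m
    dc = (-2 / n) * np.sum(y - y_pred)         # gradient of MSE w.r.t. c
    m -= lr * dm                               # step against the gradient
    c -= lr * dc

print("m is approximately", round(m, 3), "and c is approximately", round(c, 3))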
Model Performance:
The goodness of fit determines how well the line of regression fits the set
of observations. The process of finding the best model out of various
models is called optimization. It can be assessed by the method below:
1. R-squared method:
o R-squared is a statistical measure that determines the goodness
of fit.
o It measures the strength of the relationship between the
dependent and independent variables on a scale of 0-100%.
o A high value of R-squared indicates a small difference
between the predicted values and the actual values and hence
represents a good model.
o It is also called the coefficient of determination, or the coefficient
of multiple determination for multiple regression.
o It can be calculated from the formula below:
R² = 1 - (Residual sum of squares / Total sum of squares)
Assumptions of Linear Regression
Below are some important assumptions of linear regression. These are
formal checks to make while building a linear regression model, which ensure
the best possible result from the given dataset.
Linear relationship between the features and target: Linear regression
assumes a linear relationship between the dependent and independent
variables.
Little or no multicollinearity between the features: Multicollinearity
means high correlation between the independent variables. Due to
multicollinearity, it may be difficult to find the true relationship between the
predictors and the target variable. In other words, it is difficult to determine
which predictor variable is affecting the target variable and which is not. So,
the model assumes either little or no multicollinearity between the features
or independent variables.
Homoscedasticity assumption: Homoscedasticity is a situation in which the
variance of the error term is the same for all values of the independent
variables. With homoscedasticity, there should be no clear pattern in the
distribution of data in the scatter plot.
Normal distribution of error terms: Linear regression assumes that the
error terms follow a normal distribution. If the error terms are
not normally distributed, then confidence intervals will become either too
wide or too narrow, which may cause difficulties in finding coefficients.
This can be checked using a Q-Q plot. If the plot shows a straight line without
any deviation, the errors are normally distributed.
No autocorrelation: The linear regression model assumes no
autocorrelation in the error terms. If there is any correlation in the error
terms, it will drastically reduce the accuracy of the model. Autocorrelation
usually occurs if there is a dependency between residual errors.
Introduction to Dimensionality Reduction.
The number of input features, variables, or columns present in a given dataset
is known as its dimensionality, and the process of reducing these features is
called dimensionality reduction.
In many cases a dataset contains a huge number of input features, which
makes the predictive modeling task more complicated. Because it is very
difficult to visualize or make predictions for a training dataset with a high
number of features, dimensionality reduction techniques are required in such
cases.
A dimensionality reduction technique can be defined as "a way of
converting a higher-dimensional dataset into a lower-dimensional dataset
while ensuring that it provides similar information." These techniques are
widely used in machine learning for obtaining a better-fitting predictive model
while solving classification and regression problems.
Missing Value Ratio: If a dataset has too many missing values, we drop
those variables, as they do not carry much useful information. To perform this,
we can set a threshold level, and if a variable has more missing values than
that threshold, we drop that variable. The higher the threshold value, the
more efficient the reduction.
Low Variance Filter: As with the missing value ratio technique, data columns
with little change in their values carry less information. Therefore, we
calculate the variance of each variable, and all data columns with variance
lower than a given threshold are dropped, because low-variance features will
not affect the target variable.
High Correlation Filter: High correlation refers to the case when two
variables carry approximately the same information. Due to this, the
performance of the model can be degraded. The correlation between
independent numerical variables is measured by the correlation
coefficient. If this value is higher than a threshold value, we can remove one
of the variables from the dataset. We can keep those variables or features
that show a high correlation with the target variable.
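The three filters above can be implemented directly with Pandas; the DataFrame and thresholds below are invented for illustration only.

# Illustrative missing value ratio, low variance and high correlation filters.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "a": rng.normal(size=100),
    "b": rng.normal(size=100),
    "mostly_missing": np.where(rng.random(100) < 0.7, np.nan, 1.0),
    "nearly_constant": np.full(100, 5.0) + rng.normal(0, 1e-4, 100),
})
df["a_copy"] = df["a"] * 1.01           # highly correlated with "a"

# 1. Missing value ratio: drop columns with more than 50% missing values
df = df.loc[:, df.isnull().mean() <= 0.5]

# 2. Low variance filter: drop columns whose variance is below a threshold
df = df.loc[:, df.var() > 1e-3]

# 3. High correlation filter: drop one of each pair with |corr| > 0.95
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
df = df.drop(columns=[c for c in upper.columns if (upper[c] > 0.95).any()])

print(df.columns.tolist())              # columns that survived the filters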
Backward Feature Elimination
The backward feature elimination technique is mainly used while developing
a linear regression or logistic regression model. The following steps are
performed in this technique to reduce the dimensionality or perform feature
selection:
o First, all n variables of the given dataset are taken to train the model.
o The performance of the model is checked.
o Now we remove one feature at a time and train the model on n-1
features, n times, computing the performance of the model each time.
o We find the variable whose removal has made the smallest (or no) change
in the performance of the model, and then we drop that variable;
after that, we are left with n-1 features.
o Repeat the complete process until no more features can be dropped.
In this technique, by selecting the optimum performance of the model and
the maximum tolerable error rate, we can define the optimal number of
features required for the machine learning algorithm (a short code sketch follows).
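scikit-learn's RFE (Recursive Feature Elimination) follows this same backward idea of repeatedly dropping the weakest feature. A minimal sketch, assuming a synthetic classification dataset and an arbitrary choice of keeping 3 features:

# Backward-style elimination using scikit-learn's RFE on synthetic data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFE

X, y = make_classification(n_samples=300, n_features=8,
                           n_informative=3, random_state=0)

selector = RFE(estimator=LogisticRegression(max_iter=1000),
               n_features_to_select=3)       # keep 3 features (arbitrary)
selector.fit(X, y)

print("kept feature indices:",
      [i for i, kept in enumerate(selector.support_) if kept])
print("ranking:", selector.ranking_)         # 1 = kept, larger = dropped earlier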
Forward Feature Selection
Forward feature selection follows the inverse of the backward
elimination process. In this technique, we don't eliminate
features; instead, we find the best features that produce the highest
increase in the performance of the model. The following steps are
performed in this technique (a short code sketch follows the list):
o We start with a single feature and progressively add one
feature at a time.
o We train the model on each feature separately.
o The feature with the best performance is selected.
o The process is repeated until there is no longer a significant increase in
the performance of the model.
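A comparable forward search is available in recent versions of scikit-learn as SequentialFeatureSelector with direction="forward"; the sketch below uses synthetic data and an arbitrary target of 3 features.

# Forward feature selection sketch with scikit-learn's SequentialFeatureSelector.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SequentialFeatureSelector

X, y = make_classification(n_samples=300, n_features=8,
                           n_informative=3, random_state=0)

sfs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                n_features_to_select=3,
                                direction="forward")   # add one feature at a time
sfs.fit(X, y)
print("selected feature indices:", sfs.get_support(indices=True))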
Logistic regression
Logistic regression is one of the most popular Machine Learning
algorithms, which comes under the Supervised Learning technique.
It is used for predicting the categorical dependent variable using a
given set of independent variables.
Logistic regression predicts the output of a categorical dependent
variable. Therefore, the outcome must be a categorical or discrete
value. It can be Yes or No, 0 or 1, True or False, etc., but instead
of giving an exact value of 0 or 1, it gives probabilistic
values which lie between 0 and 1.
Logistic regression is quite similar to linear regression, except in
how they are used. Linear regression is used for solving
regression problems, whereas logistic regression is used for
solving classification problems.
In logistic regression, instead of fitting a regression line, we fit an
"S"-shaped logistic function, which predicts two maximum values (0
or 1).
Logistic Function (Sigmoid Function):
o The sigmoid function is a mathematical function used to map
the predicted values to probabilities.
o It maps any real value to another value within the range 0
to 1.
z = mx + c
o The value of the logistic regression must be between 0 and 1,
and cannot go beyond this limit, so it forms a curve like the
"S" form. The S-form curve is called the sigmoid function or the
logistic function.
o In logistic regression, we use the concept of a threshold
value, which defines the probability of either 0 or 1: values
above the threshold tend towards 1, and values below
the threshold tend towards 0.
g(z) = 1 / (1 + e^(-z))
ŷ = g(z)
ŷ = 1 / (1 + e^(-(mx + c)))
If z is a very large positive value, e^(-z) ≈ 0, so ŷ ≈ 1.
If z is a very large negative value, e^(-z) becomes very large, so ŷ ≈ 0.
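The sigmoid mapping and the thresholding behaviour can be seen in a few lines; the "hours studied vs. passed" numbers below are invented purely for illustration.

# The sigmoid function and a scikit-learn LogisticRegression on toy data.
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z)), squashing any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-10.0, 0.0, 10.0])))   # approximately [0, 0.5, 1]

hours = np.array([[0.5], [1.0], [1.5], [2.0], [3.0], [4.0], [5.0], [6.0]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])    # made-up pass/fail labels

clf = LogisticRegression().fit(hours, passed)
print(clf.predict_proba([[2.5]])[0, 1])        # probability of the positive class
print(clf.predict([[2.5]]))                    # class after thresholding at 0.5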
Confusion Matrix in Machine Learning:
The confusion matrix is a matrix used to determine the
performance of classification models for a given set of test
data. It can only be determined if the true values for the test data are
known. The matrix itself is easy to understand, but the related
terminology may be confusing. Since it shows the errors in the
model's performance in the form of a matrix, it is also known as
an error matrix. The layout of the confusion matrix is given
below.
                      Predicted Positive      Predicted Negative
Actual Positive       True Positive (TP)      False Negative (FN)
Actual Negative       False Positive (FP)     True Negative (TN)
The above table has the following cases:
o True Negative: The model has predicted No, and the real or
actual value was also No.
o True Positive: The model has predicted Yes, and the actual
value was also Yes.
o False Negative: The model has predicted No, but the actual
value was Yes; it is also called a Type-II error.
o False Positive: The model has predicted Yes, but the actual
value was No; it is also called a Type-I error.
Accuracy: It is one of the most important parameters for evaluating
classification problems. It defines how often the
model predicts the correct output. It can be calculated as the ratio of
the number of correct predictions made by the classifier to the total
number of predictions made by the classifier. The formula is given
below:
Accuracy = correct predictions / total predictions
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision: It can be defined as, out of all the instances the model
predicted as positive, how many were actually
positive. It can be calculated using the formula below:
Precision = TP / (TP + FP)
Recall: It is defined as, out of all the actual positive classes,
how many our model predicted correctly. The recall should be as
high as possible.
Recall = TP / (TP + FN)
F-measure: If two models have low precision and high recall or vice
versa, it is difficult to compare them. For this purpose, we
can use the F-score. This score helps us to evaluate recall and
precision at the same time. The F-score is maximum when recall is
equal to precision. It can be calculated using the formula below:
F-measure = 2 * Recall * Precision / (Recall + Precision)
Misclassification rate: Also termed the error rate, it defines
how often the model gives wrong predictions. The error
rate can be calculated as the number of incorrect predictions divided by the
total number of predictions made by the classifier. The formula is
given below:
Misclassification rate = (FP + FN) / (TP + TN + FP + FN)
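All of these metrics are available in scikit-learn; the label vectors below are toy examples.

# Confusion matrix and the metrics above, computed with scikit-learn.
from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, f1_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP, FP, FN, TN =", tp, fp, fn, tn)

print("Accuracy  :", accuracy_score(y_true, y_pred))
print("Precision :", precision_score(y_true, y_pred))
print("Recall    :", recall_score(y_true, y_pred))
print("F-measure :", f1_score(y_true, y_pred))
print("Error rate:", 1 - accuracy_score(y_true, y_pred))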
Decision tree
Decision Tree is a Supervised learning technique that can be used
for both classification and Regression problems, but mostly it is
preferred for solving Classification problems. It is a tree-structured
classifier, where internal nodes represent the features of a dataset,
branches represent the decision rules and each leaf node represents
the outcome.
In a decision tree, there are two types of nodes: the decision
node and the leaf node. Decision nodes are used to make a decision
and have multiple branches, whereas leaf nodes are the outputs of
those decisions and do not contain any further branches.
The decisions or tests are performed on the basis of the features of
the given dataset.
It is a graphical representation for getting all the possible solutions to
a problem/decision based on given conditions.
A decision tree can handle categorical data (Yes/No) as well as
numerical data.
Root Node: Root node is from where the decision tree starts. It
represents the entire dataset, which further gets divided into two or
more homogeneous sets.
Leaf Node: Leaf nodes are the final output node, and the tree cannot
be segregated further after getting a leaf node.
Splitting: Splitting is the process of dividing the decision node/root
node into sub-nodes according to the given conditions.
Branch/Sub Tree: A tree formed by splitting the tree.
Pruning: Pruning is the process of removing the unwanted
branches from the tree.
Parent/Child node: The root node of the tree is called the parent
node, and other nodes are called the child nodes.
Attribute Selection Measures
While implementing a decision tree, the main issue is how
to select the best attribute for the root node and for the sub-nodes. To
solve such problems there is a technique called
the Attribute Selection Measure, or ASM. With this measure, we
can easily select the best attribute for the nodes of the tree. There
are two popular techniques for ASM:
o Information Gain
o Gini Index
1. Information Gain:
o Information gain is the measurement of changes in entropy
after the segmentation of a dataset based on an attribute.
o It calculates how much information a feature provides us about
a class.
o According to the value of information gain, we split the node
and build the decision tree.
o A decision tree algorithm always tries to maximize the value of
information gain, and a node/attribute having the highest
information gain is split first. It can be calculated using the
below formula:
Information Gain = Entropy(S) - [(Weighted Avg) * Entropy(each feature)]
Entropy: Entropy is a metric to measure the impurity in a given
attribute. It specifies randomness in data. Entropy can be calculated
as:
Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no)
where S = total number of samples, P(yes) = probability of yes,
P(no) = probability of no.
Gini Index:
o Gini index is a measure of impurity or purity used while
creating a decision tree in the CART(Classification and
Regression Tree) algorithm.
o An attribute with a low Gini index should be preferred over one
with a high Gini index.
o It only creates binary splits, and the CART algorithm uses the
Gini index to create binary splits.
o Gini index can be calculated using the below formula:
Gini Index = 1 - Σj (Pj)²
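Both measures can be computed by hand for a toy node and are also available as splitting criteria in scikit-learn's DecisionTreeClassifier; the dataset below is synthetic.

# Entropy and Gini for a toy node, plus decision trees using each criterion.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def entropy(p_yes):
    p_no = 1 - p_yes
    return -sum(p * np.log2(p) for p in (p_yes, p_no) if p > 0)

def gini(p_yes):
    return 1 - (p_yes ** 2 + (1 - p_yes) ** 2)

print("entropy(0.5) =", entropy(0.5), " gini(0.5) =", gini(0.5))  # impure node
print("entropy(1.0) =", entropy(1.0), " gini(1.0) =", gini(1.0))  # pure node

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
for criterion in ("entropy", "gini"):
    tree = DecisionTreeClassifier(criterion=criterion, max_depth=3,
                                  random_state=0).fit(X, y)
    print(criterion, "training accuracy:", tree.score(X, y))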
Ensemble methods
Ensemble methods are machine learning techniques that combine
several base models in order to produce one optimal predictive
model. To better understand this definition, let's take a step back to the
ultimate goal of machine learning and model building. This will
make more sense as we dive into specific examples of why
ensemble methods are used.
Types of Ensemble Methods
1. BAGGING, or bootstrap aggregating. Bagging gets its name
because it combines bootstrapping and aggregation to form one
ensemble model. Given a sample of data, multiple bootstrapped
subsamples are drawn. A decision tree is formed on each of the
bootstrapped subsamples. After each subsample's decision tree
has been formed, an algorithm is used to aggregate over the
decision trees to form the most effective predictor. A minimal
code sketch is given below.
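The steps above (bootstrapped subsamples, one decision tree per subsample, aggregated predictions) map onto scikit-learn's BaggingClassifier, whose default base estimator is a decision tree; the dataset and parameters below are illustrative.

# Bagging sketch: bootstrapped decision trees aggregated by majority vote.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Default base estimator is a decision tree; 50 bootstrapped trees are grown.
bag = BaggingClassifier(n_estimators=50, bootstrap=True, random_state=0)
bag.fit(X_train, y_train)

print("test accuracy:", bag.score(X_test, y_test))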
Random Forest
Random Forest is a popular machine learning algorithm that
belongs to the supervised learning technique. It can be used for both
Classification and Regression problems in ML. It is based on the
concept of ensemble learning, which is a process of combining
multiple classifiers to solve a complex problem and to improve the
performance of the model.
As the name suggests, "Random forest is a classifier that contains a
number of decision trees on various subsets of the given dataset and
takes the average to improve the predictive accuracy of that
dataset." Instead of relying on one decision tree, the random forest
takes the prediction from each tree and, based on the majority vote
of predictions, predicts the final output.
A greater number of trees in the forest leads to higher accuracy
and helps prevent the problem of overfitting.
The working process can be explained in the steps below, followed by a
short code sketch:
Step-1: Select random K data points from the training set.
Step-2: Build the decision trees associated with the selected data
points (Subsets).
Step-3: Choose the number N for decision trees that you want to
build.
Step-4: Repeat Step 1 & 2.
Step-5: For new data points, find the predictions of each decision
tree, and assign the new data points to the category that wins the
majority votes.
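These steps correspond to scikit-learn's RandomForestClassifier; the synthetic dataset and the choice of N = 100 trees below are illustrative only.

# Random forest sketch mirroring the steps above, on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100,   # N decision trees
                                random_state=0)
forest.fit(X_train, y_train)                        # trees built on random subsets

print("test accuracy:", forest.score(X_test, y_test))
print("majority-vote prediction for the first test row:",
      forest.predict(X_test[:1])[0])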
Clustering
Clustering or cluster analysis is a machine learning technique, which
groups the unlabeled dataset. It can be defined as "A way of
grouping the data points into different clusters, consisting of
similar data points. The objects with the possible similarities
remain in a group that has less or no similarities with another
group."
K-Means clustering algorithm
K-Means clustering is an unsupervised learning algorithm that is
used to solve clustering problems in machine learning or data
science. In this topic, we cover what the K-means clustering
algorithm is and how the algorithm works, along with a Python
sketch of k-means clustering.
It is an iterative algorithm that divides the unlabeled dataset into k
different clusters in such a way that each data point belongs to only one
group of points with similar properties.
The k-means clustering algorithm mainly performs two tasks:
o Determines the best value for K center points or centroids by
an iterative process.
o Assigns each data point to its closest k-center. Those data
points which are near to the particular k-center, create a
cluster.
The working of the K-Means algorithm is explained in the steps below,
followed by a short code sketch:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as centroids. (They may be points other
than those in the input dataset.)
Step-3: Assign each data point to its closest centroid, which will
form the predefined K clusters.
Step-4: Calculate the variance and place a new centroid for each
cluster.
Step-5: Repeat the third step, i.e. reassign each data
point to the new closest centroid of each cluster.
Step-6: If any reassignment occurred, go to Step-4; otherwise go to
FINISH.
Step-7: The model is ready.
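A short sketch of these steps with scikit-learn's KMeans on synthetic 2-D blobs; K = 3 is assumed only because the toy data is generated with three clusters.

# K-Means sketch: iterative centroid placement and point reassignment.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)   # K = 3
labels = kmeans.fit_predict(X)                             # assign/update loop

print("cluster centroids:")
print(kmeans.cluster_centers_)
print("first 10 cluster assignments:", labels[:10])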
Conclusion
The industrial training program should be taken seriously to ensure
that the student obtains maximum benefit from it and increases their
knowledge.
The industrial training component can add value to all degree
programs; specifically, it improves graduates' work skills and
prepares them to face the challenges of the working world.
Apart from learning from the faculty, learning from peers played
a major role during this period.
References
Analytics Vidhya - Learn machine learning, artificial intelligence, business analytics, data
science, big data, data visualization tools and techniques. | Analytics Vidhya
Machine Learning Algorithms - Javatpoint
Machine Learning Training | Learn Machine Learning Online | Internshala Trainings

More Related Content

What's hot

ManiBhiryani_987879_ExperienceLetter_XLIDL-2016-09-14-11-40-57-200
ManiBhiryani_987879_ExperienceLetter_XLIDL-2016-09-14-11-40-57-200ManiBhiryani_987879_ExperienceLetter_XLIDL-2016-09-14-11-40-57-200
ManiBhiryani_987879_ExperienceLetter_XLIDL-2016-09-14-11-40-57-200Mani Bhiryani
 
HCL Technologies parsentation
HCL Technologies parsentationHCL Technologies parsentation
HCL Technologies parsentation
CBSMS
 
Aroso Emmanuel A. - IT Technical Report.pdf
Aroso Emmanuel A. - IT Technical Report.pdfAroso Emmanuel A. - IT Technical Report.pdf
Aroso Emmanuel A. - IT Technical Report.pdf
Yolanda Ivey
 
Training Report WSO2 internship
Training Report  WSO2 internshipTraining Report  WSO2 internship
Training Report WSO2 internship
Keet Sugathadasa
 
Strategic analysis of it industry
Strategic analysis of  it industryStrategic analysis of  it industry
Strategic analysis of it industry
vyas vemuri
 
IOCL summer training report ,ECE
IOCL summer training report ,ECEIOCL summer training report ,ECE
IOCL summer training report ,ECE
DHURBAJYOTIBORUAH1
 
Cognizant details
Cognizant detailsCognizant details
Cognizant details
Arjun Ravindran
 
Computer science industrial training report carried out at web info net ltd ...
Computer science  industrial training report carried out at web info net ltd ...Computer science  industrial training report carried out at web info net ltd ...
Computer science industrial training report carried out at web info net ltd ...
rashid muganga
 
Itc limited ppt2
Itc limited ppt2Itc limited ppt2
Itc limited ppt2
balajimechjtj
 
WSO2 Internship Report
WSO2 Internship ReportWSO2 Internship Report
WSO2 Internship Report
Ujitha Iroshan
 
Wipro presentation
Wipro presentationWipro presentation
Wipro presentationmanishkr90
 
Summer Training report at TATA CMC
Summer Training report at TATA CMCSummer Training report at TATA CMC
Summer Training report at TATA CMCPallavi Srivastava
 
Equity trading in india
Equity trading in indiaEquity trading in india
Equity trading in india
smriti31dubei
 
HCL Technology PPT( overview)
HCL Technology PPT( overview)HCL Technology PPT( overview)
HCL Technology PPT( overview)
Krushang Thakor
 
TCS (Tata Consultacy Services)
TCS (Tata Consultacy Services)TCS (Tata Consultacy Services)
TCS (Tata Consultacy Services)
Nikhil Tiwari
 
Accenture
AccentureAccenture
Accenture
Pinky Kashyup
 
Presentation on HCL.....
Presentation on HCL.....Presentation on HCL.....
Presentation on HCL.....Mukesh Latwal
 
Siwes report on networking by abdullahi yahaya
Siwes report on networking by abdullahi yahayaSiwes report on networking by abdullahi yahaya
Siwes report on networking by abdullahi yahaya
Abdullahi Yahaya AESM, (CPM)
 
Model resume
Model resumeModel resume
Model resume
chakravarthy Gopi
 

What's hot (20)

L&T REPORT
L&T REPORTL&T REPORT
L&T REPORT
 
ManiBhiryani_987879_ExperienceLetter_XLIDL-2016-09-14-11-40-57-200
ManiBhiryani_987879_ExperienceLetter_XLIDL-2016-09-14-11-40-57-200ManiBhiryani_987879_ExperienceLetter_XLIDL-2016-09-14-11-40-57-200
ManiBhiryani_987879_ExperienceLetter_XLIDL-2016-09-14-11-40-57-200
 
HCL Technologies parsentation
HCL Technologies parsentationHCL Technologies parsentation
HCL Technologies parsentation
 
Aroso Emmanuel A. - IT Technical Report.pdf
Aroso Emmanuel A. - IT Technical Report.pdfAroso Emmanuel A. - IT Technical Report.pdf
Aroso Emmanuel A. - IT Technical Report.pdf
 
Training Report WSO2 internship
Training Report  WSO2 internshipTraining Report  WSO2 internship
Training Report WSO2 internship
 
Strategic analysis of it industry
Strategic analysis of  it industryStrategic analysis of  it industry
Strategic analysis of it industry
 
IOCL summer training report ,ECE
IOCL summer training report ,ECEIOCL summer training report ,ECE
IOCL summer training report ,ECE
 
Cognizant details
Cognizant detailsCognizant details
Cognizant details
 
Computer science industrial training report carried out at web info net ltd ...
Computer science  industrial training report carried out at web info net ltd ...Computer science  industrial training report carried out at web info net ltd ...
Computer science industrial training report carried out at web info net ltd ...
 
Itc limited ppt2
Itc limited ppt2Itc limited ppt2
Itc limited ppt2
 
WSO2 Internship Report
WSO2 Internship ReportWSO2 Internship Report
WSO2 Internship Report
 
Wipro presentation
Wipro presentationWipro presentation
Wipro presentation
 
Summer Training report at TATA CMC
Summer Training report at TATA CMCSummer Training report at TATA CMC
Summer Training report at TATA CMC
 
Equity trading in india
Equity trading in indiaEquity trading in india
Equity trading in india
 
HCL Technology PPT( overview)
HCL Technology PPT( overview)HCL Technology PPT( overview)
HCL Technology PPT( overview)
 
TCS (Tata Consultacy Services)
TCS (Tata Consultacy Services)TCS (Tata Consultacy Services)
TCS (Tata Consultacy Services)
 
Accenture
AccentureAccenture
Accenture
 
Presentation on HCL.....
Presentation on HCL.....Presentation on HCL.....
Presentation on HCL.....
 
Siwes report on networking by abdullahi yahaya
Siwes report on networking by abdullahi yahayaSiwes report on networking by abdullahi yahaya
Siwes report on networking by abdullahi yahaya
 
Model resume
Model resumeModel resume
Model resume
 

Similar to ITVV(Industrial training report)

Vinayak Srivastava.pptx
Vinayak Srivastava.pptxVinayak Srivastava.pptx
Vinayak Srivastava.pptx
Chandan Srivastava
 
Project Report Format for Final Year Engineering Students
Project Report Format for Final Year Engineering StudentsProject Report Format for Final Year Engineering Students
Project Report Format for Final Year Engineering Students
cutericha10
 
ppt.pptx
ppt.pptxppt.pptx
ppt.pptx
mukulbansal34
 
Pragna balaji-rao final resume
Pragna balaji-rao final resumePragna balaji-rao final resume
Pragna balaji-rao final resume
PragnaBRao
 
Mid defense presentation on machine learning.pptx
Mid defense presentation on machine learning.pptxMid defense presentation on machine learning.pptx
Mid defense presentation on machine learning.pptx
nobitad323
 
Pragna balaji-rao final resume 1999
Pragna balaji-rao final resume 1999Pragna balaji-rao final resume 1999
Pragna balaji-rao final resume 1999
PragnaBRao
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial Intelligence
Careervira
 
Master Machine Learning with Our Top-Rated Training Course in Noida.pptx
Master Machine Learning with Our Top-Rated Training Course in Noida.pptxMaster Machine Learning with Our Top-Rated Training Course in Noida.pptx
Master Machine Learning with Our Top-Rated Training Course in Noida.pptx
APTRON Solutions Noida
 
FINAL REVIEW for final semester internship.pptx
FINAL REVIEW for final semester internship.pptxFINAL REVIEW for final semester internship.pptx
FINAL REVIEW for final semester internship.pptx
royromeo560
 
What is Artificial Intelligence
 What is Artificial Intelligence What is Artificial Intelligence
What is Artificial Intelligence
emergingindia1
 
Machine learning overview
Machine learning overviewMachine learning overview
Machine learning overview
prih_yah
 
ac current .pdf
ac current .pdfac current .pdf
ac current .pdf
ssuser3664c5
 
International Management Development Program on Tableau BootCamp
International Management Development Program on Tableau BootCampInternational Management Development Program on Tableau BootCamp
International Management Development Program on Tableau BootCamp
ndim1
 
Data science course ppt
Data science course pptData science course ppt
Data science course ppt
prashantnet
 
Data Science.pptx
Data Science.pptxData Science.pptx
Data Science.pptx
Shivaprasad544423
 
Training_Report_on_Machine_Learning.docx
Training_Report_on_Machine_Learning.docxTraining_Report_on_Machine_Learning.docx
Training_Report_on_Machine_Learning.docx
ShubhamBishnoi14
 
Guide to Becoming a Data Analyst.pdf
Guide to Becoming a Data Analyst.pdfGuide to Becoming a Data Analyst.pdf
Guide to Becoming a Data Analyst.pdf
mohitreal1995
 
Intersnship presentation done on inventeron technology company
Intersnship presentation done on inventeron technology companyIntersnship presentation done on inventeron technology company
Intersnship presentation done on inventeron technology company
kushalk200220
 
A great PG program in Machine Learning that will help you land in your dream job
A great PG program in Machine Learning that will help you land in your dream jobA great PG program in Machine Learning that will help you land in your dream job
A great PG program in Machine Learning that will help you land in your dream job
MamathaSharma4
 
Internet of Things (IoT) Training
Internet of Things (IoT) TrainingInternet of Things (IoT) Training
Internet of Things (IoT) Training
Manish Shrivastava
 

Similar to ITVV(Industrial training report) (20)

Vinayak Srivastava.pptx
Vinayak Srivastava.pptxVinayak Srivastava.pptx
Vinayak Srivastava.pptx
 
Project Report Format for Final Year Engineering Students
Project Report Format for Final Year Engineering StudentsProject Report Format for Final Year Engineering Students
Project Report Format for Final Year Engineering Students
 
ppt.pptx
ppt.pptxppt.pptx
ppt.pptx
 
Pragna balaji-rao final resume
Pragna balaji-rao final resumePragna balaji-rao final resume
Pragna balaji-rao final resume
 
Mid defense presentation on machine learning.pptx
Mid defense presentation on machine learning.pptxMid defense presentation on machine learning.pptx
Mid defense presentation on machine learning.pptx
 
Pragna balaji-rao final resume 1999
Pragna balaji-rao final resume 1999Pragna balaji-rao final resume 1999
Pragna balaji-rao final resume 1999
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial Intelligence
 
Master Machine Learning with Our Top-Rated Training Course in Noida.pptx
Master Machine Learning with Our Top-Rated Training Course in Noida.pptxMaster Machine Learning with Our Top-Rated Training Course in Noida.pptx
Master Machine Learning with Our Top-Rated Training Course in Noida.pptx
 
FINAL REVIEW for final semester internship.pptx
FINAL REVIEW for final semester internship.pptxFINAL REVIEW for final semester internship.pptx
FINAL REVIEW for final semester internship.pptx
 
What is Artificial Intelligence
 What is Artificial Intelligence What is Artificial Intelligence
What is Artificial Intelligence
 
Machine learning overview
Machine learning overviewMachine learning overview
Machine learning overview
 
ac current .pdf
ac current .pdfac current .pdf
ac current .pdf
 
International Management Development Program on Tableau BootCamp
International Management Development Program on Tableau BootCampInternational Management Development Program on Tableau BootCamp
International Management Development Program on Tableau BootCamp
 
Data science course ppt
Data science course pptData science course ppt
Data science course ppt
 
Data Science.pptx
Data Science.pptxData Science.pptx
Data Science.pptx
 
Training_Report_on_Machine_Learning.docx
Training_Report_on_Machine_Learning.docxTraining_Report_on_Machine_Learning.docx
Training_Report_on_Machine_Learning.docx
 
Guide to Becoming a Data Analyst.pdf
Guide to Becoming a Data Analyst.pdfGuide to Becoming a Data Analyst.pdf
Guide to Becoming a Data Analyst.pdf
 
Intersnship presentation done on inventeron technology company
Intersnship presentation done on inventeron technology companyIntersnship presentation done on inventeron technology company
Intersnship presentation done on inventeron technology company
 
A great PG program in Machine Learning that will help you land in your dream job
A great PG program in Machine Learning that will help you land in your dream jobA great PG program in Machine Learning that will help you land in your dream job
A great PG program in Machine Learning that will help you land in your dream job
 
Internet of Things (IoT) Training
Internet of Things (IoT) TrainingInternet of Things (IoT) Training
Internet of Things (IoT) Training
 

More from HRJEETSINGH

Web Tracking in cyber security and network security
Web Tracking in cyber security and  network securityWeb Tracking in cyber security and  network security
Web Tracking in cyber security and network security
HRJEETSINGH
 
6LowPAN etc.pptx computer network and IOT devices in future technology
6LowPAN etc.pptx computer network and IOT devices in future technology6LowPAN etc.pptx computer network and IOT devices in future technology
6LowPAN etc.pptx computer network and IOT devices in future technology
HRJEETSINGH
 
Data compression MCQs AKTU Final Year Examination
Data compression MCQs AKTU Final Year ExaminationData compression MCQs AKTU Final Year Examination
Data compression MCQs AKTU Final Year Examination
HRJEETSINGH
 
image processing MCQ AKTU final year Exam all units
image processing MCQ AKTU final year Exam all unitsimage processing MCQ AKTU final year Exam all units
image processing MCQ AKTU final year Exam all units
HRJEETSINGH
 
Renewable energy resources mcqs quiz unit(1 5)
Renewable energy resources mcqs quiz unit(1 5)Renewable energy resources mcqs quiz unit(1 5)
Renewable energy resources mcqs quiz unit(1 5)
HRJEETSINGH
 
Industrial training ppt
Industrial training pptIndustrial training ppt
Industrial training ppt
HRJEETSINGH
 

More from HRJEETSINGH (6)

Web Tracking in cyber security and network security
Web Tracking in cyber security and  network securityWeb Tracking in cyber security and  network security
Web Tracking in cyber security and network security
 
6LowPAN etc.pptx computer network and IOT devices in future technology
6LowPAN etc.pptx computer network and IOT devices in future technology6LowPAN etc.pptx computer network and IOT devices in future technology
6LowPAN etc.pptx computer network and IOT devices in future technology
 
Data compression MCQs AKTU Final Year Examination
Data compression MCQs AKTU Final Year ExaminationData compression MCQs AKTU Final Year Examination
Data compression MCQs AKTU Final Year Examination
 
image processing MCQ AKTU final year Exam all units
image processing MCQ AKTU final year Exam all unitsimage processing MCQ AKTU final year Exam all units
image processing MCQ AKTU final year Exam all units
 
Renewable energy resources mcqs quiz unit(1 5)
Renewable energy resources mcqs quiz unit(1 5)Renewable energy resources mcqs quiz unit(1 5)
Renewable energy resources mcqs quiz unit(1 5)
 
Industrial training ppt
Industrial training pptIndustrial training ppt
Industrial training ppt
 

Recently uploaded

Runway Orientation Based on the Wind Rose Diagram.pptx
Runway Orientation Based on the Wind Rose Diagram.pptxRunway Orientation Based on the Wind Rose Diagram.pptx
Runway Orientation Based on the Wind Rose Diagram.pptx
SupreethSP4
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
Kamal Acharya
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Teleport Manpower Consultant
 
space technology lecture notes on satellite
space technology lecture notes on satellitespace technology lecture notes on satellite
space technology lecture notes on satellite
ongomchris
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
VENKATESHvenky89705
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
Pipe Restoration Solutions
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
MdTanvirMahtab2
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
AmarGB2
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
ViniHema
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
Amil Baba Dawood bangali
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
gerogepatton
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
karthi keyan
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
Jayaprasanna4
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
zwunae
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Sreedhar Chowdam
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
AhmedHussein950959
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation & Control
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
fxintegritypublishin
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
Divya Somashekar
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
bakpo1
 

Recently uploaded (20)

Runway Orientation Based on the Wind Rose Diagram.pptx
Runway Orientation Based on the Wind Rose Diagram.pptxRunway Orientation Based on the Wind Rose Diagram.pptx
Runway Orientation Based on the Wind Rose Diagram.pptx
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
 
space technology lecture notes on satellite
space technology lecture notes on satellitespace technology lecture notes on satellite
space technology lecture notes on satellite
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
 

ITVV(Industrial training report)

Company Background & Structure

Company Profile
Internshala was created with a mission to produce skilled software engineers for India and the world. It aims to bridge the gap between the quality of skills demanded by industry and the quality of skills imparted by conventional institutes. With assessments, learning paths and courses authored by industry experts, Internshala helps businesses and individuals benchmark expertise across roles, speed up release cycles and build reliable, secure products.

Vision
Internshala is a technology company on a mission to equip students with relevant skills and practical exposure through internships and online trainings. Imagine a world full of freedom and possibilities: a world where you can discover your passion and turn it into your career, where your practical skills matter more than your university degree, where you do not have to wait until 21 to get your first work experience (and the rude shock that it is nothing like you had imagined), and where you graduate fully assured, fully confident and fully prepared to stake a claim on your place in the world.

History
The platform, founded in 2010, started out as a WordPress blog that aggregated internships across India and published articles on education, technology and the skill gap. Internshala launched its online trainings in 2014. As of 2018, the platform had 3.5 million students and 80,000 companies.

Mission
Internshala's mission is to equip every student with practical skills and exposure so that they can build their dream careers. Its e-learning platform, Internshala Trainings (https://trainings.internshala.com), is central to this mission, and its goal is simple: to make learning easy.
Objectives
The main objectives of the training were to learn:
- How to determine and measure program complexity.
- Python programming.
- The machine learning libraries scikit-learn, NumPy, Matplotlib, Pandas and Seaborn.
- The statistical mathematics behind the algorithms.
- Solving problems using the underlying mathematical concepts.
- Supervised and unsupervised learning.
- Classification and regression.
- Machine learning algorithms.
- Machine learning programming and use cases.

Weekly Summary
Week 1: Introduction to machine learning, introduction to data; Assignments 1 and 2.
Week 2: Introduction to Python, data exploration and preprocessing; Assignments 3 and 4.
Week 3: Linear regression and introduction to dimensionality reduction; Assignments 5 and 6.
Week 4: Logistic regression and decision trees; Assignments 7 and 8.
Week 5: Ensemble models; Assignment 9.
Week 6: Clustering and the final project; Assignment 10.
About the Training
Training is the process of teaching, informing or educating people so that they become as well qualified as possible to do their job and are able to perform in positions of greater difficulty and responsibility. It is an organized and planned effort by a company to facilitate employees' learning of job-related competencies.
- The industrial training at Internshala ran from 24th November 2020 to 05th January 2021.
- I completed this online industrial training with Internshala, located in Gurgaon, over a period of 42 days, under the guidance of Mr. Kunal Jain and Mr. Sunil Roy.
Introduction to Machine Learning
Machine learning enables a machine to automatically learn from data, improve its performance from experience, and predict outcomes without being explicitly programmed. In the real world we are surrounded by humans, who learn from their experiences, and by computers or machines, which simply follow our instructions. Machine learning addresses the question: can a machine also learn from experience or past data the way a human does?

Types of Machine Learning
Machine learning algorithms differ in their approach, the type of data they take as input and produce as output, and the kind of task or problem they are intended to solve. Broadly, machine learning can be categorized into two types:
I. Supervised learning
II. Unsupervised learning

Supervised Learning
Supervised learning is a type of learning in which we are given a data set and already know what the correct output should look like, with the idea that there is a relationship between the input and the output. Essentially, it is the task of learning a function that maps an input to an output from example input-output pairs.
Unsupervised Learning
Unsupervised learning is a type of learning that allows us to approach problems with little or no idea of what the result should look like. We derive structure by clustering the data based on the relationships among the variables, and there is no feedback based on the prediction results. Essentially, it is a form of self-organized learning that helps find previously unknown patterns in a data set without pre-existing labels.

Data
Data is a collection of information about something, for example notifications, activity over time or clock alarms. Two types of data are used in machine learning models:
1. Labeled data
2. Unlabeled data

Labeled data
Data that contains a target variable (an output variable that answers the question of interest) is called labeled data.

Unlabeled data
Unlabeled data refers to pieces of data that have not been tagged with labels identifying their characteristics, properties or classifications.
Introduction to Python
Python is a widely used, general-purpose, high-level programming language. It was initially designed by Guido van Rossum in 1991 and is developed by the Python Software Foundation. It was designed with an emphasis on code readability, and its syntax allows programmers to express concepts in fewer lines of code. Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including procedural, object-oriented and functional programming, and is often described as a "batteries included" language because of its comprehensive standard library.

Basic Libraries in Python
- scikit-learn: handles the basic ML algorithms such as clustering, linear and logistic regression, and other regression and classification methods.
- Pandas: high-level data structures and analysis. It allows merging and filtering of data, as well as gathering it from external sources such as Excel.
- Matplotlib: 2D plots, histograms, charts and other forms of visualization.
- NumPy: a general-purpose array-processing package providing a high-performance multidimensional array object and tools for working with these arrays.
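To make the libraries above concrete, here is a minimal sketch (not part of the training assignments) that builds a small table with pandas and NumPy, fits a scikit-learn model and draws a plot with Matplotlib. The column names and values are made up purely for illustration.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LinearRegression

    # A tiny, made-up dataset: y is roughly a linear function of x
    data = pd.DataFrame({"x": np.arange(10), "y": 2 * np.arange(10) + 1})

    print(data.describe())                                   # pandas: quick summary statistics
    model = LinearRegression().fit(data[["x"]], data["y"])   # scikit-learn estimator
    print(model.coef_, model.intercept_)                     # learned slope and intercept
    data.plot(x="x", y="y", kind="scatter")                  # Matplotlib plot via pandas
    plt.show()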
Data Preprocessing
Machine learning doesn't work well with raw data. Before we can feed data to an ML algorithm, we must preprocess it by applying transformations, converting raw data into a clean data set. Six common techniques are used (a short scikit-learn sketch follows this list):
1. Rescaling data: for data with attributes of varying scales, we can rescale the attributes to the same scale. Rescaling attributes into the range 0 to 1 is called normalization; the MinMaxScaler class from scikit-learn gives values between 0 and 1.
2. Normalizing data: each observation is rescaled to a length of 1 (a unit norm) using the Normalizer class.
3. Mean removal: the mean can be removed from each feature to center it on zero.
4. Label encoding: some labels are words rather than numbers. Training data is usually labelled with words to keep it readable; label encoding converts word labels into numbers so that algorithms can work on them.
5. One-hot encoding: when dealing with a few scattered categorical values, we may not need to store them as-is. For k distinct values, we can transform the feature into a k-dimensional vector with a single value of 1 and 0 for the rest.
6. Standardizing data: attributes with a Gaussian distribution but different means and standard deviations are transformed into a standard Gaussian distribution with a mean of 0 and a standard deviation of 1.
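The sketch below shows the six techniques with the scikit-learn classes named above; the small arrays and label values are invented only to demonstrate the calls.

    import numpy as np
    from sklearn.preprocessing import (MinMaxScaler, StandardScaler,
                                       Normalizer, LabelEncoder, OneHotEncoder)

    X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 600.0]])   # made-up numeric features

    X_rescaled = MinMaxScaler().fit_transform(X)     # 1. rescaling each column into [0, 1]
    X_unit     = Normalizer().fit_transform(X)       # 2. each row rescaled to unit norm
    X_std      = StandardScaler().fit_transform(X)   # 3 & 6. mean removal / standardization

    labels = ["yes", "no", "yes"]
    y = LabelEncoder().fit_transform(labels)         # 4. word labels -> [1, 0, 1]

    colours = np.array([["red"], ["green"], ["red"]])
    onehot = OneHotEncoder().fit_transform(colours).toarray()  # 5. one-hot encoded k-dim vectors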
Exploratory Data Analysis (EDA)
EDA is the process of summarizing, visualizing and getting deeply acquainted with the important traits of a data set. When carrying out EDA, domain knowledge (for example about the business or social-impact category) can help a great deal in understanding the data and extracting insights from it. With EDA you can (a short pandas sketch follows this list):
- Understand how the raw data was collected.
- Get familiar with the different characteristics of the data.
- Learn about the individual features and their mutual relationships (or lack thereof).
- Check and validate the data for anomalies, outliers, missing values, human errors, etc.
- Extract insights that weren't evident to business stakeholders but provide useful information about the business.
- Discover hidden patterns in the data that allow for better comprehension of the business problem.
- Validate that the data has been generated in the expected manner.
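As a minimal sketch of a first EDA pass with pandas (not taken from the course material), assuming a hypothetical CSV file called "train.csv" standing in for any dataset:

    import pandas as pd

    df = pd.read_csv("train.csv")               # hypothetical file name

    print(df.shape)                             # number of rows and columns
    print(df.info())                            # column types and non-null counts
    print(df.describe())                        # summary statistics of numeric features
    print(df.isnull().sum())                    # missing values per column
    print(df.select_dtypes("number").corr())    # pairwise correlations between numeric features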
Linear Regression
Linear regression is a statistical model that analyzes the linear relationship between a dependent variable and a given set of independent variables. A linear relationship between variables means that when the value of one or more independent variables changes (increases or decreases), the value of the dependent variable changes accordingly. Mathematically the relationship can be represented by the equation

Y = mX + c

where:
Y = dependent variable (target variable)
X = independent variable (predictor variable)
c = intercept of the line
m = linear regression coefficient
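A minimal scikit-learn sketch of fitting Y = mX + c, using synthetic data generated around a known slope and intercept (the numbers are arbitrary and only for illustration):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Toy data roughly following Y = 3X + 4 with a little noise
    rng = np.random.RandomState(0)
    X = rng.rand(50, 1) * 10
    y = 3 * X.ravel() + 4 + rng.randn(50)

    reg = LinearRegression().fit(X, y)
    print("m (coefficient):", reg.coef_[0])    # estimated slope, close to 3
    print("c (intercept):  ", reg.intercept_)  # estimated intercept, close to 4
    print("prediction at X=5:", reg.predict([[5.0]])[0])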
Cost Function
- Different values of the weights or coefficients of the line (a0, a1) give different regression lines; the cost function is used to estimate the coefficients of the best-fit line.
- The cost function optimizes the regression coefficients or weights and measures how well a linear regression model is performing.
- We can use the cost function to find the accuracy of the mapping function, which maps the input variable to the output variable; this mapping function is also known as the hypothesis function.

Common error metrics, for n observations with actual values y_i and predicted values ŷ_i:
- MAE (Mean Absolute Error) is the average absolute difference between the original and predicted values: MAE = (1/n) Σ |y_i - ŷ_i|.
- MSE (Mean Squared Error) is the average squared difference between the original and predicted values: MSE = (1/n) Σ (y_i - ŷ_i)².
- RMSE (Root Mean Squared Error) is the square root of the MSE: RMSE = √MSE.
- R-squared (coefficient of determination) indicates how well the predicted values fit the original values; it ranges from 0 to 1 (often read as a percentage), and the higher the value, the better the model.
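These metrics are available in scikit-learn; the sketch below uses small made-up vectors of actual and predicted values just to show the calls.

    import numpy as np
    from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

    y_true = np.array([3.0, 5.0, 7.5, 10.0])   # made-up actual values
    y_pred = np.array([2.8, 5.4, 7.0, 10.5])   # made-up predictions

    mae  = mean_absolute_error(y_true, y_pred)
    mse  = mean_squared_error(y_true, y_pred)
    rmse = np.sqrt(mse)
    r2   = r2_score(y_true, y_pred)
    print(mae, mse, rmse, r2)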
Gradient Descent
- Gradient descent is used to minimize the MSE by calculating the gradient of the cost function.
- A regression model uses gradient descent to update the coefficients of the line so as to reduce the cost function.
- It starts from a random selection of coefficient values and then iteratively updates them to reach the minimum of the cost function.

Model Performance
The goodness of fit determines how well the regression line fits the set of observations. The process of finding the best model out of various candidate models is called optimization. It can be assessed with the R-squared method:
- R-squared is a statistical measure that determines the goodness of fit.
- It measures the strength of the relationship between the dependent and independent variables on a scale of 0-100%.
- A high value of R-squared indicates a small difference between the predicted and actual values, and hence a good model.
- It is also called the coefficient of determination (or the coefficient of multiple determination for multiple regression).
- It can be calculated as R² = 1 - (SS_res / SS_tot), where SS_res is the sum of squared residuals and SS_tot is the total sum of squares of the dependent variable about its mean.
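A minimal NumPy sketch of gradient descent for the line y = mx + c, minimizing the MSE; the learning rate, iteration count and synthetic data are arbitrary choices made for illustration only.

    import numpy as np

    rng = np.random.RandomState(1)
    x = rng.rand(100)
    y = 3 * x + 4 + 0.1 * rng.randn(100)   # data around the true line y = 3x + 4

    m, c = 0.0, 0.0     # start from arbitrary coefficients
    lr = 0.1            # learning rate
    for _ in range(2000):
        y_pred = m * x + c
        error = y_pred - y
        dm = 2 * np.mean(error * x)   # gradient of MSE with respect to m
        dc = 2 * np.mean(error)       # gradient of MSE with respect to c
        m -= lr * dm                  # iterative update toward the minimum
        c -= lr * dc
    print(m, c)   # should approach 3 and 4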
Assumptions of Linear Regression
Below are some important assumptions of linear regression. These are formal checks to perform while building a linear regression model, and they help ensure the best possible result from the given dataset.
- Linear relationship between the features and the target: linear regression assumes a linear relationship between the dependent and independent variables.
- Little or no multicollinearity between the features: multicollinearity means high correlation between the independent variables. Because of it, it may be difficult to find the true relationship between the predictors and the target, i.e. to determine which predictor variable is affecting the target and which is not. The model therefore assumes little or no multicollinearity between the features.
- Homoscedasticity: homoscedasticity means the error term is the same for all values of the independent variables. With homoscedasticity, there should be no clear pattern in the scatter plot of the residuals.
- Normal distribution of error terms: linear regression assumes that the error terms follow a normal distribution. If they do not, confidence intervals become either too wide or too narrow, which may cause difficulties in estimating the coefficients. This can be checked with a Q-Q plot; if the plot shows a straight line without large deviations, the errors are approximately normally distributed (see the sketch after this list).
- No autocorrelation: the model assumes no autocorrelation in the error terms. Any correlation in the error terms drastically reduces the accuracy of the model. Autocorrelation usually occurs when there is a dependency between residual errors.
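As a small sketch of the Q-Q check mentioned above (not from the report), using SciPy's probplot on the residuals of a fitted model over synthetic data:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats
    from sklearn.linear_model import LinearRegression

    rng = np.random.RandomState(2)
    X = rng.rand(200, 1) * 10
    y = 2.5 * X.ravel() + 1 + rng.randn(200)    # synthetic data with normal noise

    reg = LinearRegression().fit(X, y)
    residuals = y - reg.predict(X)

    stats.probplot(residuals, dist="norm", plot=plt)   # Q-Q plot of the residuals
    plt.title("Points close to the line suggest normally distributed errors")
    plt.show()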
Introduction to Dimensionality Reduction
The number of input features, variables or columns in a dataset is known as its dimensionality, and the process of reducing these features is called dimensionality reduction. In many cases a dataset contains a huge number of input features, which makes the predictive modelling task more complicated: it is very difficult to visualize or make predictions on a training set with a large number of features, so dimensionality reduction techniques are required. Dimensionality reduction can be defined as "a way of converting a higher-dimensional dataset into a lower-dimensional dataset while ensuring that it provides similar information." These techniques are widely used in machine learning to obtain a better-fitting predictive model when solving classification and regression problems. Three simple filter techniques are described below (a pandas sketch follows).

Missing Value Ratio: if a variable has too many missing values, we drop it, since it does not carry much useful information. We set a threshold level, and any variable whose missing-value ratio exceeds that threshold is dropped; the higher the threshold, the more aggressive the reduction.
Low Variance Filter: as with the missing value ratio technique, data columns with little variation carry little information. We therefore compute the variance of each variable and drop all columns whose variance is lower than a given threshold, because low-variance features will not affect the target variable.
High Correlation Filter: high correlation refers to the case where two variables carry approximately the same information, which can degrade model performance. The correlation between independent numerical variables is measured by the correlation coefficient; if this value is higher than a threshold, we can remove one of the two variables. We keep the variables that show a high correlation with the target variable.
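A pandas/NumPy sketch of the three filters above, assuming a hypothetical "train.csv" dataset; the thresholds (40% missing, variance 0.01, correlation 0.9) are arbitrary example values, not values used in the training.

    import numpy as np
    import pandas as pd

    df = pd.read_csv("train.csv")                  # hypothetical dataset

    # Missing value ratio filter: drop columns with more than 40% missing values
    missing_ratio = df.isnull().mean()
    df = df.drop(columns=missing_ratio[missing_ratio > 0.4].index)

    # Low variance filter: drop numeric columns with very small variance
    num = df.select_dtypes(include=np.number)
    df = df.drop(columns=num.var()[num.var() < 0.01].index)

    # High correlation filter: drop one column of each highly correlated pair
    corr = df.select_dtypes(include=np.number).corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
    df = df.drop(columns=to_drop)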
Backward Feature Elimination
The backward feature elimination technique is mainly used while developing a linear regression or logistic regression model. The following steps are performed to reduce dimensionality, i.e. for feature selection (a scikit-learn sketch follows the two lists):
1. First, all n variables of the dataset are used to train the model.
2. The performance of the model is checked.
3. We then remove one feature at a time, train the model on the remaining n-1 features (n times in total), and compute the performance of the model each time.
4. The variable whose removal causes the smallest (or no) change in performance is dropped, leaving n-1 features.
5. The process is repeated until no more features can be dropped.
By choosing the optimum model performance and the maximum tolerable error rate, this technique lets us define the optimal number of features required by the machine learning algorithm.

Forward Feature Selection
Forward feature selection is the inverse of the backward elimination process: instead of eliminating features, we look for the best features that produce the largest increase in the performance of the model. The steps are:
1. We start with a single feature and progressively add one feature at a time.
2. The model is trained on each candidate feature separately.
3. The feature giving the best performance is selected.
4. The process is repeated until adding features no longer gives a significant increase in performance.
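One way to run both procedures is scikit-learn's SequentialFeatureSelector; the sketch below uses the built-in diabetes dataset and a target of 5 features purely for illustration (neither is from the training assignments).

    from sklearn.datasets import load_diabetes
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.linear_model import LinearRegression

    X, y = load_diabetes(return_X_y=True)

    # Backward elimination: start from all features and drop one at a time
    backward = SequentialFeatureSelector(LinearRegression(),
                                         n_features_to_select=5,
                                         direction="backward").fit(X, y)

    # Forward selection: start from nothing and add the best feature each step
    forward = SequentialFeatureSelector(LinearRegression(),
                                        n_features_to_select=5,
                                        direction="forward").fit(X, y)

    print(backward.get_support())   # boolean mask of the kept features
    print(forward.get_support())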
Logistic Regression
Logistic regression is one of the most popular machine learning algorithms and comes under the supervised learning technique. It is used for predicting a categorical dependent variable from a given set of independent variables. Because logistic regression predicts a categorical dependent variable, the outcome must be a categorical or discrete value: Yes or No, 0 or 1, true or false, and so on. Instead of giving the exact values 0 and 1, however, it gives probabilistic values that lie between 0 and 1. Logistic regression is similar to linear regression except in how it is used: linear regression is used for solving regression problems, whereas logistic regression is used for solving classification problems. In logistic regression, instead of fitting a straight regression line, we fit an "S"-shaped logistic function, which predicts two maximum values (0 or 1).
Logistic (Sigmoid) Function
- The sigmoid function is a mathematical function used to map predicted values to probabilities; it maps any real value to a value in the range 0 to 1.
- The output of logistic regression must lie between 0 and 1 and cannot go beyond this limit, so it forms an "S"-shaped curve, called the sigmoid or logistic function.
- In logistic regression we use a threshold value: values above the threshold tend toward class 1, and values below it tend toward class 0.

With z = mx + c, the sigmoid function is
g(z) = 1 / (1 + e^(-z))
so the prediction is
ŷ = g(z) = 1 / (1 + e^(-(mx + c))).
If z is a very large positive value, e^(-z) approaches 0 and ŷ approaches 1; if z is a very large negative value, e^(-z) becomes very large and ŷ approaches 0.
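A tiny NumPy sketch of the sigmoid and the thresholding step; the input values are arbitrary examples.

    import numpy as np

    def sigmoid(z):
        # g(z) = 1 / (1 + e^(-z)): squashes any real value into (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    print(sigmoid(0))     # 0.5, right at the decision boundary
    print(sigmoid(10))    # close to 1.0 for a large positive z
    print(sigmoid(-10))   # close to 0.0 for a large negative z

    # Turning probabilities into class labels with a 0.5 threshold
    probs = sigmoid(np.array([-2.0, 0.3, 4.0]))
    labels = (probs >= 0.5).astype(int)   # -> [0, 1, 1]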
Confusion Matrix in Machine Learning
The confusion matrix is a matrix used to evaluate the performance of classification models on a given set of test data. It can only be determined if the true values for the test data are known. The matrix itself is easy to understand, but the related terminology can be confusing. Because it shows the errors in the model's performance in the form of a matrix, it is also known as an error matrix. For a binary problem it has the following layout:

                     Predicted Positive   Predicted Negative
Actual Positive      TP                   FN
Actual Negative      FP                   TN

- True Negative (TN): the model predicted No and the actual value was also No.
- True Positive (TP): the model predicted Yes and the actual value was also Yes.
- False Negative (FN): the model predicted No but the actual value was Yes; this is also called a Type-II error.
- False Positive (FP): the model predicted Yes but the actual value was No; this is also called a Type-I error.

Accuracy
Accuracy is one of the important parameters for evaluating classification problems. It defines how often the model predicts the correct output, and it is the ratio of the number of correct predictions made by the classifier to the total number of predictions:
Accuracy = correct predictions / total predictions = (TP + TN) / (TP + TN + FP + FN)

Precision
Precision is the number of correct positive outputs given by the model, i.e. of all the samples the model predicted as positive, how many were actually positive:
Precision = TP / (TP + FP)

Recall
Recall is the fraction of all actual positive samples that the model predicted correctly; recall should be as high as possible:
Recall = TP / (TP + FN)

F-measure
If one model has low precision and high recall (or vice versa), it is difficult to compare the models, so we use the F-score, which evaluates recall and precision at the same time. The F-score is maximal when recall equals precision:
F-measure = 2 * Recall * Precision / (Recall + Precision)

Misclassification rate
Also termed the error rate, it defines how often the model gives wrong predictions: the number of incorrect predictions divided by the total number of predictions made by the classifier:
Misclassification rate = (FP + FN) / (TP + TN + FP + FN)

A short scikit-learn sketch of these metrics follows.
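The sketch below computes the confusion matrix and the metrics above with scikit-learn on small made-up label vectors.

    from sklearn.metrics import (confusion_matrix, accuracy_score,
                                 precision_score, recall_score, f1_score)

    y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # made-up actual labels
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # made-up predictions

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print("TN, FP, FN, TP:", tn, fp, fn, tp)

    print("accuracy :", accuracy_score(y_true, y_pred))    # (TP+TN)/(TP+TN+FP+FN)
    print("precision:", precision_score(y_true, y_pred))   # TP/(TP+FP)
    print("recall   :", recall_score(y_true, y_pred))      # TP/(TP+FN)
    print("F1       :", f1_score(y_true, y_pred))          # harmonic mean of the two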
Decision Tree
A decision tree is a supervised learning technique that can be used for both classification and regression problems, though it is mostly preferred for classification. It is a tree-structured classifier in which internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents an outcome. A decision tree has two kinds of nodes: decision nodes, which are used to make decisions and have multiple branches, and leaf nodes, which are the outputs of those decisions and contain no further branches. The decisions or tests are performed on the basis of the features of the given dataset, so the tree is a graphical representation of all possible solutions to a problem or decision under the given conditions. A decision tree can handle categorical data (Yes/No) as well as numerical data.

Terminology
- Root node: the node from which the decision tree starts. It represents the entire dataset, which is then divided into two or more homogeneous sets.
- Leaf node: a final output node; the tree cannot be split further after a leaf node.
- Splitting: the process of dividing a decision node (or the root node) into sub-nodes according to the given conditions.
- Branch / sub-tree: a tree formed by splitting the main tree.
- Pruning: the process of removing unwanted branches from the tree.
- Parent/child nodes: the root node of the tree is called the parent node, and the other nodes are called child nodes.
Attribute Selection Measures
When implementing a decision tree, the main issue is how to select the best attribute for the root node and for the sub-nodes. The technique used to solve this is called an attribute selection measure (ASM); with it we can easily select the best attribute for each node of the tree. There are two popular ASM techniques:
- Information gain
- Gini index

1. Information Gain
- Information gain measures the change in entropy after a dataset is segmented on an attribute.
- It tells us how much information a feature provides about the class.
- We split a node and build the decision tree according to the information gain values.
- A decision tree algorithm always tries to maximize information gain, and the node/attribute with the highest information gain is split first. It can be calculated as:
Information Gain = Entropy(S) - (Weighted Avg) * Entropy(each feature)

Entropy is a metric that measures the impurity (randomness) in a given attribute. For a binary class it can be calculated as:
Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no)
where S is the total set of samples, P(yes) is the probability of "yes" and P(no) is the probability of "no". A small NumPy sketch of these quantities follows.
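The sketch below computes entropy and information gain for a hypothetical split (the 9/5 class counts and the two child nodes are invented for illustration).

    import numpy as np

    def entropy(labels):
        # Entropy(S) = -sum(p * log2(p)) over the class probabilities
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    # Parent node: 9 "yes" and 5 "no" samples (made-up counts)
    parent = ["yes"] * 9 + ["no"] * 5
    # A hypothetical split of the parent into two child nodes
    left  = ["yes"] * 6 + ["no"] * 2
    right = ["yes"] * 3 + ["no"] * 3

    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(parent)
    info_gain = entropy(parent) - weighted     # Entropy(S) - weighted average child entropy
    print(round(entropy(parent), 3), round(info_gain, 3))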
2. Gini Index
- The Gini index is a measure of impurity or purity used while creating a decision tree in the CART (Classification and Regression Tree) algorithm.
- An attribute with a low Gini index should be preferred over one with a high Gini index.
- CART creates only binary splits, and it uses the Gini index to create them.
- The Gini index can be calculated as:
Gini Index = 1 - Σ_j P_j²

Ensemble Methods
An ensemble method is a machine learning technique that combines several base models in order to produce one optimal predictive model. To understand this definition, it helps to step back to the ultimate goal of machine learning and model building; the specific examples below show why ensemble methods are used.

Types of Ensemble Methods
1. Bagging (Bootstrap Aggregating)
Bagging gets its name because it combines bootstrapping and aggregation to form one ensemble model. Given a sample of data, multiple bootstrapped subsamples are drawn and a decision tree is formed on each of them; an algorithm then aggregates over these decision trees to form the most efficient predictor. A scikit-learn sketch of bagging follows.
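A minimal bagging sketch with scikit-learn, using the built-in breast-cancer dataset and 50 bagged decision trees purely for illustration (not the training project data).

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Each tree is trained on a bootstrapped subsample; predictions are aggregated by voting
    bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
    bagging.fit(X_train, y_train)
    print("test accuracy:", bagging.score(X_test, y_test))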
Random Forest
Random forest is a popular machine learning algorithm that belongs to the supervised learning technique. It can be used for both classification and regression problems, and it is based on the concept of ensemble learning, i.e. combining multiple classifiers to solve a complex problem and improve the performance of the model. As the name suggests, a random forest is a classifier that contains a number of decision trees built on various subsets of the given dataset and averages them to improve the predictive accuracy. Instead of relying on one decision tree, the random forest takes the prediction from each tree and predicts the final output based on the majority vote of those predictions. A greater number of trees in the forest leads to higher accuracy and prevents overfitting.
The working process can be explained in the following steps (a scikit-learn sketch follows):
Step 1: Select random K data points from the training set.
Step 2: Build a decision tree on the selected data points (subset).
Step 3: Choose the number N of decision trees you want to build.
Step 4: Repeat steps 1 and 2 until N trees have been built.
Step 5: For a new data point, collect the prediction of each decision tree and assign the point to the category that wins the majority of the votes.
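A minimal random forest sketch with scikit-learn; the built-in iris dataset and N = 100 trees are illustrative choices, not the training project setup.

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # 100 trees, each built on a random subset of rows and features;
    # the forest predicts by majority vote over the trees
    forest = RandomForestClassifier(n_estimators=100, random_state=0)
    forest.fit(X_train, y_train)
    print("test accuracy:", forest.score(X_test, y_test))
    print("prediction for one new point:", forest.predict(X_test[:1]))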
Clustering
Clustering, or cluster analysis, is a machine learning technique that groups an unlabeled dataset. It can be defined as "a way of grouping the data points into different clusters consisting of similar data points; objects with possible similarities remain in a group that has few or no similarities with any other group."

K-Means Clustering Algorithm
K-means clustering is an unsupervised learning algorithm used to solve clustering problems in machine learning and data science. This section covers what the K-means algorithm is and how it works, along with a Python sketch of K-means clustering. It is an iterative algorithm that divides the unlabeled dataset into K different clusters in such a way that each data point belongs to only one group of similar properties. The K-means algorithm mainly performs two tasks:
- It determines the best values for the K center points (centroids) through an iterative process.
- It assigns each data point to its closest centroid; the data points near a particular centroid form a cluster.
The working of the K-means algorithm is explained in the following steps (a scikit-learn sketch follows):
Step 1: Select the number K to decide the number of clusters.
Step 2: Select K random points as the initial centroids (they need not come from the input dataset).
Step 3: Assign each data point to its closest centroid, which forms the K initial clusters.
Step 4: Compute the variance and place a new centroid for each cluster.
Step 5: Repeat step 3, i.e. reassign each data point to the new closest centroid of its cluster.
Step 6: If any reassignment occurred, go back to step 4; otherwise finish.
Step 7: The model is ready.
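A minimal K-means sketch with scikit-learn on synthetic 2-D points; the three blobs and K = 3 are chosen only to illustrate the fit/assign loop described above.

    import numpy as np
    from sklearn.cluster import KMeans

    # Three loose blobs of 2-D points (synthetic, for illustration only)
    rng = np.random.RandomState(0)
    X = np.vstack([rng.randn(50, 2) + [0, 0],
                   rng.randn(50, 2) + [5, 5],
                   rng.randn(50, 2) + [0, 5]])

    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)   # K = 3
    labels = kmeans.fit_predict(X)       # assigns every point to its closest centroid
    print(kmeans.cluster_centers_)       # final centroid positions
    print(labels[:10])                   # cluster index of the first few points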
Conclusion
The industrial training program should be taken seriously to ensure that the student obtains the maximum benefit from it and increases their knowledge. The industrial training component can add value to any degree program; specifically, it improves graduates' work skills and prepares them to face the challenges of the working world. Apart from learning from the faculty, learning from peers played a major role during this period.

References
Analytics Vidhya - Learn machine learning, artificial intelligence, business analytics, data science, big data, data visualization tools and techniques. | Analytics Vidhya
Machine Learning Algorithms - Javatpoint
Machine Learning Training | Learn Machine Learning Online | Internshala Trainings