1. INDUSTRIAL TRAINING REPORT
Machine Learning
Submitted
In Partial Fulfillment of the Requirements
For the Degree of
Bachelor of Technology
In
Computer Science and Engineering
By
HRJEET SINGH
Roll No. 1700410019
2017-2021
2021
Sponsored by
Internshala
Noida Gurgaon U.P
2. Table of Contents
Declaration
Certificate
Acknowledgement
Abstract
1.0 Introduction
2.0 Company Background & Structure
3.0 Weekly Job Summary
3.1 Daily Records
3.2 About the Training
3.3 Training Schedule and Location
4.0 Technical Contents
4.1 Description of Tasks
5.0 Learning Outcome and Work Experience
5.1 Application of Theory and Skills
6.0 Conclusion
References
3. Declaration
I hereby declare that I have completed my six weeks of summer
training at Internshala (one of the world’s leading online
certification training providers) from 24th Nov, 2020 to 05th Jan,
2021 under the guidance of Mr. Kunal Jain and Mr. Sunil Roy. I
declare that I worked with full dedication during these
six weeks of training and that my learning outcomes fulfill the
requirements of training for the award of the degree of Bachelor of
Technology (B.Tech.) in Computer Science and Engineering,
Raja Balwant Singh Engineering Technical Campus.
Name : Hrjeet Singh
Roll No. 1700410019
Date:
5. ACKNOWLEDGEMENT
I would like to acknowledge the contribution of the following people,
without whose help and guidance this report would not have been completed.
I acknowledge the counsel and support of our training coordinator, Mr.
Brajesh Kumar Singh, Head of CSE Department, with respect and gratitude.
His expertise, guidance, support, encouragement, and enthusiasm
made this report possible. His feedback vastly improved the quality of this
report and provided an enthralling experience. I am indeed proud and
fortunate to be supported by him.
Although it is not possible to name them individually, I shall ever remain indebted
to the faculty members of R.B.S. Engineering Technical Campus Bichpuri,
Agra for their persistent support and cooperation extended during this work.
This acknowledgement will remain incomplete if I fail to express my deep
sense of obligation to my parents and God for their consistent blessings and
encouragement.
Hrjeet Singh
1700410019
6. Abstract
Industrial training is an important phase of a student's life. A well
planned, properly executed and evaluated industrial training helps a
lot in developing a professional attitude. It develops an awareness of
the industrial approach to problem solving, based on a broad
understanding of the processes and mode of operation of an organization.
The aim and motivation of this industrial training is to gain
discipline, skills, teamwork and technical knowledge through a
proper training environment, which will help me, as a student in the
field of Information Technology, to develop an awareness of the
self-disciplinary nature of problems in information and
communication technology.
7. Company Background & Structure
Company profile
Internshala was created with a mission to create skilled software engineers
for our country and the world. It aims to bridge the gap between the quality of
skills demanded by industry and the quality of skills imparted by conventional
institutes. With assessments, learning paths and courses authored by industry
experts, Internshala helps businesses and individuals benchmark expertise
across roles, speed up release cycles and build reliable, secure products.
VISION
We are a technology company on a mission to equip students with relevant
skills & practical exposure through internships and online trainings. Imagine a
world full of freedom and possibilities. A world where you can discover your
passion and turn it into your career. A world where your practical skills
matter more than your university degree. A world where you do not have to
wait till 21 to taste your first work experience (and get a rude shock that it is
nothing like you had imagined it to be). A world where you graduate fully
assured, fully confident, and fully prepared to stake claim on your place in the
world.
History
The platform, which was founded in 2010, started out as a WordPress blog
that aggregated internships across India and articles on education, technology
and the skill gap. Internshala launched its online trainings in 2014. As of 2018,
the platform had 3.5 million students and 80,000 companies.
Mission
Internshala's mission is to equip every student with practical skills and
exposure so that they can build their dream careers. Our e-learning
platform, Internshala Trainings (https://trainings.internshala.com), is central
to this mission. Internshala Trainings' goal is simple: to make learning easy.
8. Objectives
The main objectives of the training were to learn:
How to determine and measure program complexity.
Python programming.
Machine learning libraries: Scikit-learn, NumPy, Matplotlib, Pandas,
Seaborn.
Statistical math for the algorithms.
Problem solving and mathematical concepts.
Supervised and unsupervised learning.
Classification and regression.
Machine learning algorithms.
Machine learning programming and use cases.
Weekly Summary
Week 1: Introduction to machine learning, Introduction to data. Assignment 1, Assignment 2.
Week 2: Introduction to Python and Data Exploration and Preprocessing. Assignment 3, Assignment 4.
Week 3: Linear Regression and Introduction to Dimensionality Reduction. Assignment 5, Assignment 6.
Week 4: Logistic Regression and Decision Tree. Assignment 7, Assignment 8.
Week 5: Ensemble model. Assignment 9.
Week 6: Clustering, project. Assignment 10.
9. About the training
Training is the process of teaching, informing or educating people so
that they become as well qualified as possible to do their job, and
become qualified to perform in positions of greater difficulty and
responsibility.
Training is an organized and planned effort by a company to
facilitate employees' learning of job-related competencies.
The industrial training at Internshala ran from 24th November 2020 to
05th January 2021.
I completed my online industrial training with “Internshala”,
located in Gurgaon, over a period of 42 days.
I completed my online training under the guidance of Mr. Kunal Jain
and Mr. Sunil Roy.
10. Introduction To Machine Learning
Machine learning enables a machine to automatically learn from
data, improve its performance with experience, and make predictions
without being explicitly programmed.
In the real world, we are surrounded by humans who can learn
from their experiences, and we have computers or machines
which work on our instructions.
But can a machine also learn from experience or past data like a
human does? This is where Machine Learning comes in.
Types of machine learning
Machine learning algorithms differ in their approach, the
type of data they take as input and produce as output, and the type
of task or problem that they are intended to solve. Broadly, machine
learning can be divided into two categories:
I. Supervised Learning
II. Unsupervised Learning
Supervised Learning
Supervised learning is a type of learning in which we are given a
data set and already know what the correct output should look
like, with the idea that there is a relationship between the input
and output. Basically, it is the task of learning a function that
maps an input to an output based on example input-output pairs.
Unsupervised Learning
Unsupervised learning is a type of learning that allows us to
approach problems with little or no idea of what our results should look
like. We can derive structure by clustering the data based on
relationships among the variables in the data. With unsupervised
learning there is no feedback based on the prediction results. Basically, it
is a type of self-organized learning that helps in finding previously
unknown patterns in a data set without pre-existing labels.
Data
Data is a collection of information about anything.
Ex. notifications, activity times, clock alarms, etc.
Two types of data are used in machine learning models:
1 Labeled data
2 Unlabeled data
Labeled data
Data which contains a target variable or an output variable that
answers a question of interest is called labeled data.
Unlabeled data
Unlabeled data is a designation for pieces of data that have not been
tagged with labels identifying characteristics, properties or
classifications.
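To make the distinction concrete, here is a minimal sketch (not taken from the training material) that contrasts the two settings with scikit-learn. The toy points, the labels, and the choice of a nearest-neighbour classifier and k-means clustering are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: two well-separated groups of 2-D points.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]])

# Labeled data: every row of X comes with a known target value.
y = np.array([0, 0, 0, 1, 1, 1])

# Supervised learning: learn a mapping from inputs to the known labels.
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
supervised_pred = clf.predict(np.array([[1.1, 1.0], [8.1, 8.0]]))

# Unsupervised learning: only X is used, so the algorithm groups similar
# rows into clusters without ever seeing the correct answers.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_ids = km.labels_

print("predicted labels:", supervised_pred)
print("cluster ids:", cluster_ids)
```

The cluster ids are arbitrary names for the groups that were found; unlike the supervised predictions, they carry no meaning beyond "these rows belong together".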
12. Introduction to Python
Python is a widely used general-purpose, high-level programming
language. It was created by Guido van Rossum, first released in 1991,
and is developed under the Python Software Foundation. It was designed
with an emphasis on code readability, and its syntax allows
programmers to express concepts in fewer lines of code. Python is
dynamically typed and garbage-collected. It supports
multiple programming paradigms, including procedural,
object-oriented, and functional programming. Python is often described as
a "batteries included" language due to its comprehensive standard
library.
Basic Libraries in Python
Scikit-learn for handling basic ML algorithms like clustering, linear and
logistic regression, classification, and others.
Pandas for high-level data structures and analysis. It allows merging
and filtering of data, as well as loading it from external sources
such as Excel.
Matplotlib for creating 2D plots, histograms, charts, and other forms of
visualization.
NumPy is a general-purpose array-processing package. It provides
a high-performance multidimensional array object and tools for
working with these arrays.
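As a small illustration of the library roles described above, the sketch below uses a NumPy array for column-wise arithmetic and Pandas for merging and filtering. The tables, column names and the pass mark of 60 are hypothetical.

```python
import numpy as np
import pandas as pd

# NumPy: a multidimensional array with vectorised arithmetic.
scores = np.array([[70, 80],
                   [90, 60]])
col_means = scores.mean(axis=0)  # mean of each column

# Pandas: tabular data that can be merged and filtered.
students = pd.DataFrame({"id": [1, 2, 3], "name": ["A", "B", "C"]})
marks = pd.DataFrame({"id": [1, 2, 3], "mark": [55, 82, 91]})

merged = students.merge(marks, on="id")   # join the two tables on "id"
passed = merged[merged["mark"] >= 60]     # keep rows at or above the pass mark

print(col_means)
print(passed)
```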
13. Data Preprocessing
Machine learning algorithms don’t work well with raw data. Before we can
feed such data to an ML algorithm, we must preprocess it by applying
some transformations. With data preprocessing, we convert raw data
into a clean data set. There are 6 techniques for doing this:
1. Rescaling Data - For data with attributes of varying scales, we can
rescale the attributes so that they share the same scale. Rescaling
attributes into the range 0 to 1 is called normalization. We use the
MinMaxScaler class from scikit-learn, which gives us values between 0 and 1.
2. Normalizing Data - In this task, we rescale each observation to a length
of 1 (a unit norm). For this, we use the Normalizer class.
3. Mean Removal - We can remove the mean from each feature to center it
on zero.
4. Label Encoding - Labels can be words or numbers. Usually, training data
is labelled with words to make it readable. Label encoding converts word
labels into numbers so that algorithms can work on them.
5. One Hot Encoding - When a feature takes only a few distinct values,
we may not want to store those values directly. Instead, we
can perform one-hot encoding: for k distinct values, we transform
the feature into a k-dimensional vector with a single 1 and 0 for the
remaining entries.
6. Standardizing Data - With standardizing, we can take attributes with a
Gaussian distribution and different means and standard deviations and
transform them into a standard Gaussian distribution with a mean of 0
and a standard deviation of 1.
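The six techniques above can be sketched with scikit-learn's preprocessing module. The feature matrix and the word labels are made-up values, and using StandardScaler with with_std=False for mean removal is one possible choice rather than something the training prescribes.

```python
import numpy as np
from sklearn.preprocessing import (LabelEncoder, MinMaxScaler, Normalizer,
                                   OneHotEncoder, StandardScaler)

# Hypothetical feature matrix: two attributes on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# 1. Rescaling: every column is mapped into the range 0 to 1.
rescaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)

# 2. Normalizing: every row (observation) is scaled to unit length.
normalized = Normalizer(norm="l2").fit_transform(X)

# 3. Mean removal: subtract the column mean, leave the spread unchanged.
centered = StandardScaler(with_std=False).fit_transform(X)

# 4. Label encoding: word labels become integers (classes sorted alphabetically).
labels = ["cat", "dog", "cat", "bird"]
encoded = LabelEncoder().fit_transform(labels)

# 5. One-hot encoding: k distinct values become k-dimensional 0/1 vectors.
one_hot = OneHotEncoder().fit_transform(encoded.reshape(-1, 1)).toarray()

# 6. Standardizing: every column gets mean 0 and standard deviation 1.
standardized = StandardScaler().fit_transform(X)

print(rescaled)
print(encoded)
print(one_hot)
```

In practice a scaler is fitted on the training split only and then reused to transform the test split, so that no information leaks from the test data.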
14. Exploratory Data Analysis (EDA)
It is the process of summarizing, visualizing and getting deeply
acquainted with the important traits of a data set. When you carry out
EDA, domain knowledge (e.g. about the business or social impact category)
can help a great deal in understanding the data and extracting insights from
it.
Here’s what you can do with EDA:
Understand how the raw data was collected
Get familiar with different characteristics of the data
Learn about the individual features and their mutual relationships (or
lack thereof)
Check and validate the data for anomalies, outliers, missing values,
human errors, etc.
Extract insights that weren’t so evident to business stakeholders but can
provide useful information about the business
Discover hidden patterns in the data that allow for better
comprehension of the business problem
Validate whether the data has been generated in the expected manner
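A few of these checks can be sketched with Pandas. The data frame below is invented for illustration, and the 1.5 x IQR rule used to flag outliers is a common convention assumed here, not a method taken from the training.

```python
import numpy as np
import pandas as pd

# Hypothetical data set with one missing value and one extreme value.
df = pd.DataFrame({
    "area": [50.0, 60.0, 80.0, np.nan, 120.0, 500.0],
    "price": [100.0, 120.0, 160.0, 150.0, 240.0, 1000.0],
})

summary = df.describe()       # count, mean, std, min, quartiles, max
missing = df.isnull().sum()   # number of missing values per column
corr = df.corr()              # pairwise relationships between features

# Flag outliers in "price" with the 1.5 * IQR rule (an assumed convention).
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["price"] < q1 - 1.5 * iqr) | (df["price"] > q3 + 1.5 * iqr)]

print(summary)
print(missing)
print(outliers)
```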
15. Linear regression
Linear regression may be defined as a statistical model that
analyzes the linear relationship between a dependent variable and a
given set of independent variables. A linear relationship between
variables means that when the value of one or more independent
variables changes (increases or decreases), the value of the dependent
variable also changes accordingly (increases or decreases).
Mathematically the relationship can be represented with the help of
the following equation:
Y = mX + c
Here:
Y = Dependent variable (target variable)
X = Independent variable (predictor variable)
c = Intercept of the line
m = Linear regression coefficient (slope of the line)
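As an illustration of the equation Y = mX + c, the sketch below estimates m and c from a handful of points with the ordinary least-squares formulas for a single predictor. The helper name fit_line and the sample points are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates of slope m and intercept c for Y = mX + c."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: covariance of X and Y divided by the variance of X.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    c = y_mean - m * x_mean
    return m, c


# Hypothetical points that lie exactly on the line Y = 2X + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
m, c = fit_line(xs, ys)
print(m, c)
```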
16. Cost function
o Different values for the weights or coefficients of the line (a0, a1), i.e.
the intercept and slope, give different regression lines, and the cost
function is used to estimate the coefficient values for the best fit line.
o The cost function is used to optimize the regression coefficients or
weights. It measures how well a linear regression model is performing.
o We can use the cost function to find the accuracy of the mapping
function, which maps the input variable to the output variable. This
mapping function is also known as the hypothesis function.
MAE (Mean Absolute Error) represents the difference between the original
and predicted values, obtained by averaging the absolute differences over the
data set.
MSE (Mean Squared Error) represents the difference between the original and
predicted values, obtained by averaging the squared differences over the data
set.
RMSE (Root Mean Squared Error) is the square root of MSE.
R-squared (Coefficient of determination) represents how well the predicted
values fit the original values. Its value typically ranges from 0 to 1 and can
be interpreted as a percentage. The higher the value is, the better the model is.
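The above metrics can be expressed as follows, where $y_i$ is the original
value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the
original values and $n$ is the number of observations:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$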
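The four metrics described above can be computed directly from their definitions. The sketch below assumes two equal-length lists of original and predicted values; the function name regression_metrics and the sample numbers are illustrative only.

```python
import math


def regression_metrics(actual, predicted):
    """Compute MAE, MSE, RMSE and R-squared for paired value lists."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n   # mean of absolute errors
    mse = sum(e * e for e in errors) / n    # mean of squared errors
    rmse = math.sqrt(mse)                   # square root of MSE
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)     # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total variation
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Illustrative values: two of the four predictions are off by one unit.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]
print(regression_metrics(actual, predicted))
```

Note that R-squared is undefined when all original values are identical, because the total variation in the denominator is then zero.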
17. Gradient Descent:
o Gradient descent is used to minimize the MSE by calculating
the gradient of the cost function.
o A regression model uses gradient descent to update the
coefficients of the line by reducing the cost function.
o It starts from randomly selected values of the coefficients and
then iteratively updates them to reach the minimum of the cost
function.
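The update loop described above can be sketched for the line Y = mX + c with MSE as the cost function. The learning rate, the number of iterations and the zero starting point are assumptions chosen for this toy data; the text starts from random values, and zeros are used here only to keep the run reproducible.

```python
def gradient_descent(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit Y = mX + c by repeatedly stepping against the gradient of the MSE."""
    m, c = 0.0, 0.0  # assumed starting point (the text uses random values)
    n = len(xs)
    for _ in range(epochs):
        residuals = [y - (m * x + c) for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum((y - (m*x + c))^2)
        grad_m = (-2.0 / n) * sum(x * r for x, r in zip(xs, residuals))
        grad_c = (-2.0 / n) * sum(residuals)
        # Move each coefficient a small step in the direction that lowers the cost.
        m -= learning_rate * grad_m
        c -= learning_rate * grad_c
    return m, c


# Toy data generated from Y = 2X + 1 (illustrative values only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
m, c = gradient_descent(xs, ys)
print("slope:", round(m, 4), "intercept:", round(c, 4))
```

If the learning rate is too large the updates overshoot and the cost grows instead of shrinking, so in practice this value has to be tuned for the data at hand.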
Model Performance:
The goodness of fit determines how well the line of regression fits the set
of observations. The process of finding the best model out of various
models is called optimization. It can be achieved by the method below:
1. R-squared method:
o R-squared is a statistical method that determines the goodness
of fit.
o It measures the strength of the relationship between the
dependent and independent variables on a scale of 0-100%.
o A high value of R-squared indicates a small difference
between the predicted values and actual values and hence
represents a good model.
o It is also called a coefficient of determination, or coefficient
of multiple determination for multiple regression.
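o It can be calculated from the below formula, where $y_i$ is the actual
value, $\hat{y}_i$ is the predicted value and $\bar{y}$ is the mean of the
actual values:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$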
18. Assumptions of Linear Regression
Below are some important assumptions of Linear Regression. These are
formal checks to make while building a Linear Regression model, which help to get
the best possible result from the given dataset.
Linear relationship between the features and target: Linear regression
assumes a linear relationship between the dependent and independent
variables.
Small or no multicollinearity between the features: Multicollinearity
means high correlation between the independent variables. Due to
multicollinearity, it may be difficult to find the true relationship between the
predictors and the target variable. In other words, it is difficult to determine
which predictor variable is affecting the target variable and which is not. So,
the model assumes either little or no multicollinearity between the features
or independent variables.
Homoscedasticity assumption: Homoscedasticity is the situation in which the
variance of the error term is the same for all values of the independent
variables. With homoscedasticity, there should be no clear pattern in the
scatter plot of the residuals.
Normal distribution of error terms: Linear regression assumes that the
error term follows a normal distribution. If the error terms are
not normally distributed, then confidence intervals will become either too
wide or too narrow, which makes conclusions about the coefficients unreliable.
This can be checked using a Q-Q plot. If the plot shows a straight line without
any deviation, the error is normally distributed.
No autocorrelation: The linear regression model assumes no
autocorrelation in the error terms. If there is any correlation in the error
term, it will drastically reduce the accuracy of the model. Autocorrelation
usually occurs if there is a dependency between residual errors.
19. Introduction to Dimensionality Reduction
The number of input features, variables, or columns present in a given dataset
is known as dimensionality, and the process of reducing these features is called
dimensionality reduction.
In many cases a dataset contains a huge number of input features, which
makes the predictive modeling task more complicated. Because it is very
difficult to visualize or make predictions for a training dataset with a high
number of features, dimensionality reduction techniques are
required in such cases.
A dimensionality reduction technique can be defined as, "It is a way of
converting the higher dimensions dataset into lesser dimensions dataset
ensuring that it provides similar information." These techniques are widely
used in machine learning to obtain a better-fitting predictive model while
solving classification and regression problems.
Missing Value Ratio: If a variable has too many missing values, we drop
it, as it does not carry much useful information. To perform this,
we set a threshold level, and if the share of missing values in a variable is
higher than that threshold, we drop that variable. The lower the threshold
value, the more variables are dropped and the stronger the reduction.
Low Variance Filter: As with the missing value ratio technique, data columns
whose values change very little carry little information. Therefore, we
calculate the variance of each variable, and all data columns with variance
lower than a given threshold are dropped, because low-variance features are
unlikely to affect the target variable.
High Correlation Filter: High correlation refers to the case when two
variables carry approximately the same information, which can degrade the
performance of the model. We calculate the correlation coefficient between
the independent numerical variables. If this value is higher than the
threshold value, we can remove one of the two variables from the dataset.
When deciding which one to keep, we can prefer the variable that shows a
higher correlation with the target variable.
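The three filters above can be sketched in plain Python on a small table stored as a dictionary of columns, with None marking a missing value. The column names, the threshold values and the helper names (missing_ratio, correlation, select_columns) are all assumptions made for illustration. For simplicity this sketch keeps the first column of a highly correlated pair instead of comparing both with the target.

```python
import statistics


def missing_ratio(column):
    """Share of entries in the column that are missing (None)."""
    return sum(value is None for value in column) / len(column)


def correlation(a, b):
    """Pearson correlation over the rows where both columns have a value."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / (var_x * var_y) ** 0.5


def select_columns(data, max_missing=0.4, min_variance=0.01, max_corr=0.95):
    """Return the column names that survive all three filters."""
    kept = []
    for name, column in data.items():
        # Missing value ratio: drop columns with too many missing entries.
        if missing_ratio(column) > max_missing:
            continue
        # Low variance filter: drop columns that barely change.
        present = [v for v in column if v is not None]
        if statistics.pvariance(present) < min_variance:
            continue
        # High correlation filter: drop a column that nearly duplicates a kept one.
        if any(abs(correlation(column, data[other])) > max_corr for other in kept):
            continue
        kept.append(name)
    return kept


# Hypothetical toy table (all values invented for illustration).
data = {
    "height_cm": [170, 165, 180, 175, 160],
    "height_in": [66.9, 65.0, 70.9, 68.9, 63.0],   # same information as height_cm
    "constant": [1, 1, 1, 1, 1],                   # zero variance
    "mostly_missing": [None, None, None, 4, None], # 80% missing
    "age": [25, 40, 31, 52, 38],
}
print(select_columns(data))
```

In this toy table only height_cm and age survive: the other three columns are removed by the correlation, variance and missing-value filters respectively.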
20. Backward Feature Elimination
The backward feature elimination technique is mainly used while developing
a Linear Regression or Logistic Regression model. The steps below are performed in
this technique to reduce the dimensionality or for feature selection:
o First, all n variables of the given dataset are used
to train the model.
o The performance of the model is checked.
o Next, we remove one feature at a time and train the model on the remaining
n-1 features, n times in total, computing the performance of the model each time.
o We identify the variable whose removal made the smallest or no change in
the performance of the model, and then we drop that variable;
after that, we are left with n-1 features.
o Repeat the complete process until no feature can be dropped.
In this technique, by selecting the optimum performance of the model and the
maximum tolerable error rate, we can define the optimal number of features
required for the machine learning algorithm.
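The elimination loop above can be sketched as follows. Training a real model is replaced here by a hypothetical score function (toy_score) that simply adds up invented usefulness values per feature; the tolerance parameter and the stopping rule are also assumptions, one possible reading of the maximum tolerable error rate mentioned above.

```python
def backward_elimination(features, score, tolerance=0.01):
    """Drop the least useful feature repeatedly while the score loss stays small."""
    selected = list(features)
    current = score(selected)  # performance with all n features
    while len(selected) > 1:
        # Train on every subset that leaves out exactly one feature.
        trials = []
        for candidate in selected:
            reduced = [f for f in selected if f != candidate]
            trials.append((score(reduced), candidate))
        # The removal that hurts least gives the highest remaining score.
        best_score, least_useful = max(trials)
        if current - best_score > tolerance:
            break  # every removal now costs too much performance
        selected.remove(least_useful)
        current = best_score
    return selected


# Hypothetical stand-in for "train the model and measure its performance".
USEFULNESS = {"area": 0.6, "rooms": 0.3, "door_colour": 0.0, "listing_id": 0.0}


def toy_score(subset):
    return sum(USEFULNESS[f] for f in subset)


print(backward_elimination(["area", "rooms", "door_colour", "listing_id"], toy_score))
```

With these invented values the two uninformative features are removed first, and the loop stops when removing either remaining feature would lower the score by more than the tolerance.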
Forward Feature Selection
Forward feature selection follows the inverse of the backward
elimination process. In this technique, we don't eliminate
features; instead, we find the features that produce the highest
increase in the performance of the model. The steps below are performed in this
technique:
o We start with a single feature only, and progressively add one
feature at a time.
o At each step, we train the model with each candidate feature separately.
o The feature with the best performance is selected.
o The process is repeated until adding a feature no longer gives a
significant increase in the performance of the model.
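The forward selection steps above can be sketched in the same style. As before, model training is replaced by a hypothetical score function (toy_score) built from invented usefulness values, and min_gain is an assumed threshold for what counts as a significant increase. The sketch also assumes a model with no features scores 0.

```python
def forward_selection(features, score, min_gain=0.01):
    """Add the most helpful feature repeatedly until the gain becomes too small."""
    selected = []
    remaining = list(features)
    current = 0.0  # assumed score of a model with no features
    while remaining:
        # Try adding each remaining feature to the current selection.
        trials = [(score(selected + [candidate]), candidate) for candidate in remaining]
        best_score, best_feature = max(trials)
        if best_score - current < min_gain:
            break  # no candidate gives a significant improvement
        selected.append(best_feature)
        remaining.remove(best_feature)
        current = best_score
    return selected


# Hypothetical stand-in for "train the model and measure its performance".
USEFULNESS = {"area": 0.6, "rooms": 0.3, "door_colour": 0.0, "listing_id": 0.0}


def toy_score(subset):
    return sum(USEFULNESS[f] for f in subset)


print(forward_selection(["listing_id", "rooms", "door_colour", "area"], toy_score))
```

The features are returned in the order in which they were added, so the most useful feature comes first.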
21. Logistic regression
Logistic regression is one of the most popular Machine Learning
algorithms, which comes under the Supervised Learning technique.
It is used for predicting the categorical dependent variable using a
given set of independent variables.
Logistic regression predicts the output of a categorical dependent
variable. Therefore the outcome must be a categorical or discrete
value. It can be either Yes or No, 0 or 1, True or False, etc., but instead
of giving the exact value as 0 or 1, it gives probabilistic
values which lie between 0 and 1.
Logistic Regression is very similar to Linear Regression except
in how they are used. Linear Regression is used for solving
regression problems, whereas Logistic Regression is used for
solving classification problems.
In Logistic Regression, instead of fitting a regression line, we fit an
"S"-shaped logistic function, whose output is bounded by the two limiting
values 0 and 1.
22. Logistic Function (Sigmoid Function):
o The sigmoid function is a mathematical function used to map
the predicted values to probabilities.
o It maps any real value z into another value within the range of 0
to 1, where z is the linear combination
z = mx + c
o The value of the logistic regression must be between 0 and 1
and cannot go beyond this limit, so it forms a curve like the
"S" form. The S-form curve is called the sigmoid function or the
logistic function.
o In logistic regression, we use the concept of a threshold
value, which decides whether a predicted probability is mapped to 0 or 1:
values above the threshold are assigned to 1, and values below
the threshold are assigned to 0.
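$$g(z) = \frac{1}{1 + e^{-z}}, \qquad z = mx + c$$
$$\hat{y} = g(z) = \frac{1}{1 + e^{-(mx + c)}}$$
If z is a very large positive value: $e^{-(mx + c)} \approx 0$, so $\hat{y} \approx 1$.
If z is a very large negative value: $e^{-(mx + c)}$ is a large positive number, so $\hat{y} \approx 0$.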
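The behaviour of the sigmoid function and the threshold rule can be checked with a few lines of code. The coefficient values m = 2 and c = -1 and the threshold of 0.5 are assumptions chosen only for illustration.

```python
import math


def sigmoid(z):
    """Map any real value z to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, m, c, threshold=0.5):
    """Return the predicted probability and the class label (0 or 1)."""
    probability = sigmoid(m * x + c)
    label = 1 if probability >= threshold else 0
    return probability, label


# Large positive z gives a value close to 1, large negative z a value close to 0.
for z in (-20.0, -2.0, 0.0, 2.0, 20.0):
    print(z, sigmoid(z))

# Assumed coefficients m = 2 and c = -1, used only for illustration.
print(predict(3.0, m=2.0, c=-1.0))
print(predict(-3.0, m=2.0, c=-1.0))
```

This direct form of the sigmoid can overflow for extremely negative z (below roughly -700), which is enough for a sketch but would need guarding in production code.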
Confusion Matrix in Machine Learning:
The confusion matrix is a matrix used to determine the
performance of classification models for a given set of test
data. It can only be determined if the true values for the test data are
known. The matrix itself can be easily understood, but the related
terminology may be confusing. Since it shows the errors in the
model performance in the form of a matrix, it is also known as
an error matrix. The layout of the confusion matrix is given
below.
                    Predicted Positive     Predicted Negative
Actual Positive     TP                     FN
Actual Negative     FP                     TN
The above table has the following cases:
o True Negative (TN): The model has predicted No, and the
actual value was also No.
o True Positive (TP): The model has predicted Yes, and the actual
value was also Yes.
o False Negative (FN): The model has predicted No, but the actual
value was Yes. It is also called a Type-II error.
o False Positive (FP): The model has predicted Yes, but the actual
value was No. It is also called a Type-I error.
Accuracy: It is one of the important parameters for evaluating
classification problems. It defines how often the
model predicts the correct output. It can be calculated as the ratio of
the number of correct predictions made by the classifier to the total
number of predictions made by the classifier. The formula is given
below:
Accuracy = correct predictions / total predictions
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision: Out of all the instances that the model predicted as
positive, precision is the fraction that were actually positive.
It can be calculated using the below formula:
Precision = TP / (TP + FP)
Recall: Out of all the actual positive instances, recall is the
fraction that the model predicted correctly. The recall should be as
high as possible.
Recall = TP / (TP + FN)
F-measure: If one model has low precision and high recall and another has
the opposite, it is difficult to compare these models. For this purpose, we
can use the F-score, which evaluates recall and
precision at the same time. For a fixed sum of precision and recall, the
F-score is highest when the two are equal. It can be calculated using the below formula:
F-measure = (2 * Recall * Precision) / (Recall + Precision)
Misclassification rate: It is also termed the error rate, and it defines
how often the model gives wrong predictions. The error
rate can be calculated as the ratio of the number of incorrect predictions to
the total number of predictions made by the classifier. The formula is
given below:
Misclassification rate = (FP + FN) / (TP + TN + FP + FN)
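The confusion matrix counts and the metrics derived from them can be computed as in the sketch below. It assumes binary labels where 1 is the positive class; the function names and the sample labels are illustrative only.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, TN and FN for a binary classification result."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1  # predicted Yes, actual No (Type-I error)
        elif a == positive:
            fn += 1  # predicted No, actual Yes (Type-II error)
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Derive the metrics described above from the four counts.

    Assumes at least one predicted positive and one actual positive,
    otherwise precision or recall would divide by zero.
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * recall * precision / (recall + precision),
        "misclassification_rate": (fp + fn) / total,
    }


# Illustrative labels: 4 actual positives and 6 actual negatives.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
counts = confusion_counts(actual, predicted)
print(counts)
print(classification_metrics(*counts))
```

Accuracy and misclassification rate always add up to 1, which is a quick sanity check on the counts.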
25. Decision tree
Decision Tree is a Supervised learning technique that can be used
for both classification and Regression problems, but mostly it is
preferred for solving Classification problems. It is a tree-structured
classifier, where internal nodes represent the features of a dataset,
branches represent the decision rules and each leaf node represents
the outcome.
In a Decision tree, there are two types of nodes: the Decision
Node and the Leaf Node. Decision nodes are used to make a decision
and have multiple branches, whereas Leaf nodes are the output of
those decisions and do not contain any further branches.
The decisions or tests are performed on the basis of the features of
the given dataset.
It is a graphical representation for getting all the possible solutions to
a problem/decision based on given conditions.
A decision tree can contain categorical data (Yes/No) as well as
numerical data.
Root Node: The root node is where the decision tree starts. It
represents the entire dataset, which further gets divided into two or
more homogeneous sets.
Leaf Node: Leaf nodes are the final output nodes, and the tree cannot
be split further after reaching a leaf node.
Splitting: Splitting is the process of dividing the decision node/root
node into sub-nodes according to the given conditions.
Branch/Sub Tree: A tree formed by splitting the tree.
Pruning: Pruning is the process of removing unwanted
branches from the tree.
Parent/Child node: A node that is divided into sub-nodes is called the
parent node of those sub-nodes, and the sub-nodes are called its child nodes.
26. Attribute Selection Measures
While implementing a Decision tree, the main issue is how
to select the best attribute for the root node and for the sub-nodes. To
solve this problem, there is a technique called the
Attribute Selection Measure, or ASM. With this measure, we
can select the best attribute for the nodes of the tree. There
are two popular techniques for ASM:
o Information Gain
o Gini Index
1. Information Gain:
o Information gain is the measurement of changes in entropy
after the segmentation of a dataset based on an attribute.
o It calculates how much information a feature provides us about
a class.
o According to the value of information gain, we split the node
and build the decision tree.
o A decision tree algorithm always tries to maximize the value of
information gain, and a node/attribute having the highest
information gain is split first. It can be calculated using the
below formula:
Information Gain = Entropy(S) - [(Weighted Avg) * Entropy(each feature)]
Entropy: Entropy is a metric to measure the impurity in a given
attribute. It specifies the randomness in the data. Entropy can be calculated
as:
Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no)
where S = the total set of samples, P(yes) = probability of yes, and
P(no) = probability of no.
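Entropy and information gain can be computed directly from class labels, as in the sketch below. The function names and the yes/no sample are illustrative assumptions; groups stands for the subsets of labels produced by splitting on one attribute.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(labels, groups):
    """Entropy before the split minus the weighted average entropy after it."""
    total = len(labels)
    weighted = sum(len(group) / total * entropy(group) for group in groups)
    return entropy(labels) - weighted


# Illustrative sample: 5 "yes" and 5 "no", so the entropy is 1 bit.
labels = ["yes"] * 5 + ["no"] * 5

# A hypothetical attribute that separates the classes only partly.
partial_split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
# A hypothetical attribute that separates the classes perfectly.
perfect_split = [["yes"] * 5, ["no"] * 5]

print("entropy:", entropy(labels))
print("gain (partial):", information_gain(labels, partial_split))
print("gain (perfect):", information_gain(labels, perfect_split))
```

The perfect split has the higher information gain, so a decision tree would choose that attribute first.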
27. Gini Index:
o Gini index is a measure of impurity or purity used while
creating a decision tree in the CART(Classification and
Regression Tree) algorithm.
o An attribute with a low Gini index should be preferred over
one with a high Gini index.
o The CART algorithm uses the Gini index to create splits, and it
creates only binary splits.
o Gini index can be calculated using the below formula:
$$\text{Gini Index} = 1 - \sum_{j} P_j^2$$
where $P_j$ is the proportion of samples that belong to class j.
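The Gini index of a node, and the weighted Gini index of a binary split, can be sketched as follows. The function names and the sample labels are illustrative assumptions.

```python
from collections import Counter


def gini_index(labels):
    """Gini index of a list of class labels: 1 minus the sum of squared class shares."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def weighted_gini(groups):
    """Gini index of a split: size-weighted average over the resulting groups."""
    total = sum(len(group) for group in groups)
    return sum(len(group) / total * gini_index(group) for group in groups)


# Illustrative labels: an evenly mixed node has the highest impurity for two classes.
mixed = ["yes"] * 5 + ["no"] * 5
print("mixed node:", gini_index(mixed))
print("pure node:", gini_index(["yes"] * 5))

# A hypothetical binary split of the mixed node.
split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
print("after split:", weighted_gini(split))
```

The split lowers the weighted Gini index compared with the unsplit node, which is why an attribute giving a lower value is preferred.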
Ensemble method
Ensemble methods are a machine learning technique that combines
several base models in order to produce one optimal predictive
model.
Types of Ensemble Methods
1. BAGGING, or Bootstrap aggregating. Bagging gets its name
because it combines Bootstrapping and Aggregation to form one
ensemble model. Given a sample of data, multiple bootstrapped
subsamples are pulled. A Decision Tree is formed on each of the
bootstrapped subsamples. After each subsample Decision Tree
has been formed, an algorithm is used to aggregate over the
Decision Trees to form the most efficient predictor.
Random Forest
Random Forest is a popular machine learning algorithm that
belongs to the supervised learning technique. It can be used for both
Classification and Regression problems in ML. It is based on the
concept of ensemble learning, which is a process of combining
multiple classifiers to solve a complex problem and to improve the
performance of the model.
As the name suggests, "Random Forest is a classifier that contains a
number of decision trees on various subsets of the given dataset and
takes the average to improve the predictive accuracy of that
dataset." Instead of relying on one decision tree, the random forest
takes the prediction from each tree and, based on the majority vote
of the predictions, predicts the final output.
A greater number of trees in the forest generally leads to higher accuracy
and reduces the problem of overfitting.
The working process can be explained in the steps below:
Step-1: Select random K data points from the training set.
Step-2: Build the decision trees associated with the selected data
points (Subsets).
Step-3: Choose the number N for decision trees that you want to
build.
Step-4: Repeat Step 1 & 2.
Step-5: For new data points, find the predictions of each decision
tree, and assign the new data points to the category that wins the
majority votes.
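The steps above can be sketched in plain Python. To keep the example short, each tree is reduced to a single split (a decision stump), and the random choice of features that a full random forest also performs is left out. These simplifications, the function names, the seed and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter


def train_stump(points):
    """Fit a one-split tree: choose the feature and threshold with the fewest errors."""
    best = None
    for feature in range(len(points[0][0])):
        for threshold in sorted({x[feature] for x, _ in points}):
            left = [label for x, label in points if x[feature] <= threshold]
            right = [label for x, label in points if x[feature] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, left_label, right_label)
    return best[1:]


def stump_predict(stump, x):
    feature, threshold, left_label, right_label = stump
    return left_label if x[feature] <= threshold else right_label


def train_forest(training_set, n_trees=15, k=None, seed=0):
    """Steps 1 to 4: build n_trees trees, each on k randomly drawn data points."""
    rng = random.Random(seed)
    k = k or len(training_set)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(training_set) for _ in range(k)]  # drawn with replacement
        forest.append(train_stump(sample))
    return forest


def forest_predict(forest, x):
    """Step 5: every tree votes and the majority category wins."""
    votes = Counter(stump_predict(stump, x) for stump in forest)
    return votes.most_common(1)[0][0]


# Hypothetical training set: ((feature_0, feature_1), category).
training_set = [
    ((1.0, 5.0), "A"), ((2.0, 1.0), "A"), ((3.0, 4.0), "A"), ((2.5, 2.0), "A"),
    ((7.0, 3.0), "B"), ((8.0, 5.0), "B"), ((9.0, 1.0), "B"), ((7.5, 2.0), "B"),
]
forest = train_forest(training_set)
print(forest_predict(forest, (1.0, 2.0)), forest_predict(forest, (9.0, 2.0)))
```

Because each tree sees a different random sample, individual trees may place their split at slightly different positions; the majority vote smooths out these differences.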
30. Clustering
Clustering or cluster analysis is a machine learning technique, which
groups the unlabeled dataset. It can be defined as "A way of
grouping the data points into different clusters, consisting of
similar data points. The objects with the possible similarities
remain in a group that has less or no similarities with another
group."
K-Means clustering algorithm
K-Means Clustering is an unsupervised learning algorithm that is
used to solve clustering problems in machine learning or data
science. This section describes what the K-means clustering
algorithm is and how it works.
It is an iterative algorithm that divides the unlabeled dataset into k
different clusters in such a way that each data point belongs to only one
group that has similar properties.
The k-means clustering algorithm mainly performs two tasks:
o Determines the best value for K center points or centroids by
an iterative process.
o Assigns each data point to its closest k-center. Those data
points which are near to the particular k-center, create a
cluster.
The working of the K-Means algorithm is explained in the below
steps:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as centroids. (They need not be taken from
the input dataset.)
Step-3: Assign each data point to its closest centroid, which will
form the predefined K clusters.
Step-4: Calculate the variance and place a new centroid in each
cluster.
Step-5: Repeat the third step, which means reassigning each data
point to the new closest centroid of each cluster.
Step-6: If any reassignment occurs, then go to Step-4; else go to
FINISH.
Step-7: The model is ready.
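The steps above can be sketched in plain Python. Two simplifications are assumed here: the first K data points are used as starting centroids instead of random ones, so that the run is reproducible, and Step-4 places each new centroid at the mean of its cluster. The function name k_means and the toy points are illustrative only.

```python
import math


def k_means(points, k, max_iterations=100):
    """Cluster points into k groups; returns the centroids and each point's cluster index."""
    centroids = list(points[:k])  # Step-2 (assumption: first k points, not random)
    assignments = None
    for _ in range(max_iterations):
        # Step-3 / Step-5: assign every point to its closest centroid.
        new_assignments = [
            min(range(k), key=lambda j: math.dist(point, centroids[j]))
            for point in points
        ]
        # Step-6: stop when no point changed its cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step-4: place each centroid at the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, assignments


# Toy data with two visibly separate groups (illustrative values only).
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]
centroids, assignments = k_means(points, k=2)
print("centroids:", centroids)
print("assignments:", assignments)
```

The result depends on the starting centroids, which is why practical implementations usually repeat the procedure with several random starts and keep the best clustering.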
32. Conclusion
The Industrial Training program should be taken seriously to ensure
that maximum benefit is obtained by the student in order to
increase their knowledge.
The Industrial Training component can add value to all degree
programs; specifically, it improves graduates' work skills and
prepares them to face the challenges of the working world.
Apart from learning from the faculty, learning from peers played
a major role during that period.
References
Analytics Vidhya - Learn Machine learning, artificial intelligence, business analytics, data
science, big data, data visualizations tools and techniques. | Analytics Vidhya.
Machine Learning Algorithms - Javatpoint
Machine Learning Training | Learn Machine Learning Online | Internshala Trainings