ML_Module_1.pdf

Machine Learning
MS (Data Science)

Machine Learning
• "Machine Learning is concerned with computer programs that
automatically improve their performance through experience."
• Machine learning is a subset of artificial intelligence (AI) that
focuses on developing algorithms and models that enable
computers to learn from and make predictions or decisions
based on data without being explicitly programmed.
• Machine learning is like teaching a computer to learn from
examples and make decisions on its own, without giving it
explicit instructions. It's a way for computers to become smarter
by analyzing data and recognizing patterns.

How Does Machine Learning Work?
• Imagine you want to teach a computer to identify whether a
fruit is an apple or an orange.
• Instead of telling the computer exactly what to look for (like
color, shape, or size), you show it many pictures of apples and
oranges and let it figure out the differences.

• Data Collection: First, you gather lots of pictures of apples and oranges.
These pictures are your "training data."
• Features: You might point out some basic features of the fruits, like color
or shape, to the computer. These are the characteristics it should pay
attention to.
• Training: You show the computer the pictures of apples and oranges, and
you tell it which is which. The computer starts to notice patterns, like
"Apples are usually round and red, while oranges are often orange and
spherical."
• Prediction: After enough training, you show the computer a new picture
of a fruit it hasn't seen before. Based on what it learned from the training
data, it makes a guess: "I think this is an apple."

• Feedback: If the computer's guess is correct, great! If not, you tell it the
correct answer, and it learns from its mistake.
• Improvement: Over time, the computer gets better at recognizing apples
and oranges because it keeps learning from more and more examples.
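
The steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the two numeric features (redness and roundness), their values, and the nearest-centroid rule are all made up for the example and are not a method prescribed by these slides.

```python
# Hypothetical training data: (redness, roundness) on a 0-1 scale, plus a label.
training_data = [
    ((0.90, 0.80), "apple"),
    ((0.80, 0.70), "apple"),
    ((0.85, 0.75), "apple"),
    ((0.30, 0.95), "orange"),
    ((0.20, 0.90), "orange"),
    ((0.25, 0.92), "orange"),
]

def train(examples):
    """Training: average the feature vectors of each label (one centroid per fruit)."""
    sums, counts = {}, {}
    for features, label in examples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}

def predict(centroids, features):
    """Prediction: return the label whose centroid is closest to the new example."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

centroids = train(training_data)
new_fruit = (0.88, 0.72)   # an example the model has not seen before
guess = predict(centroids, new_fruit)
print("I think this is an", guess)

# Feedback and improvement: add the example with its correct label and retrain.
training_data.append((new_fruit, "apple"))
centroids = train(training_data)
```

Retraining on the enlarged dataset is the simplest form of the feedback loop described above; real systems usually update the model less naively.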

How Do Machine Learning-based Systems Help?
• Machine learning can do much more than just identifying fruits. It's used
in things like:
• Recommendation Systems: Think of how Netflix suggests movies you
might like based on what you've watched before.
• Speech Recognition: Virtual assistants like Siri or Alexa understand and
respond to your voice commands.
• Medical Diagnosis: ML can help doctors analyze medical images like
X-rays or MRIs to detect diseases.
• Autonomous Vehicles: Self-driving cars use ML to recognize and react to
their surroundings.

Understanding the Fundamental
Concepts and Terminologies of
Machine Learning.

Fundamental Concepts and Terminologies
• Data: Data is the raw information used in machine learning. It can be in
the form of text, numbers, images, or any other type of information that is
processed by machine learning algorithms.
• Features: Features are specific attributes or characteristics of the data that
are used as input variables for machine learning models. For example, in a
spam email detection system, features could include the sender's address,
the email's subject, and the words used in the email.
• Labels: Labels are the desired output or target variable in supervised
learning. They represent the correct or expected answer that the machine
learning model should predict. In a spam email detection system, the label
would be "spam" or "not spam."
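
As a rough sketch of how raw data becomes features and labels, the spam example could look like this in Python. The email fields, the word list, and the three chosen features are hypothetical and serve only as illustration.

```python
# Hypothetical raw data: each email is a dict; "label" is the target for supervised learning.
emails = [
    {
        "sender": "promo@deals.example",
        "subject": "WIN a FREE prize",
        "body": "Click now to claim your free prize",
        "label": "spam",
    },
    {
        "sender": "alice@university.example",
        "subject": "Lecture notes",
        "body": "Notes for module 1 are attached",
        "label": "not spam",
    },
]

SUSPICIOUS_WORDS = {"free", "win", "prize", "claim"}   # assumed word list

def extract_features(email):
    """Turn one raw email into a numeric feature vector."""
    words = (email["subject"] + " " + email["body"]).lower().split()
    return [
        sum(word in SUSPICIOUS_WORDS for word in words),        # suspicious word count
        int(email["subject"].isupper()),                         # subject all in capitals?
        int(email["sender"].endswith("university.example")),     # sender from a known domain?
    ]

X = [extract_features(e) for e in emails]   # features (model inputs)
y = [e["label"] for e in emails]            # labels (desired outputs)
print(X, y)
```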

• Model: A model is a mathematical representation or algorithm used by a
machine learning system to make predictions or decisions based on input
data. Models can be as simple as linear regression or as complex as deep
neural networks.
• Training: Training is the process of teaching a machine learning model by
providing it with a labelled dataset. During training, the model learns to make
predictions by adjusting its internal parameters to minimize the difference
between its predictions and the actual labels in the training data.
• Testing/Evaluation: After training, machine learning models are evaluated on a
separate dataset that was not used for training, often called the test set (a
validation set is a similar held-out set used while the model is being developed).
This helps determine how well the model generalizes to new, unseen data.
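
The model, training, and testing ideas above can be illustrated with a very small linear regression trained by gradient descent. This is a sketch under assumptions: the data points, the learning rate, and the number of epochs are invented for the example.

```python
# Hypothetical data generated from y = 2x + 1 (no noise), split into training and test sets.
train_set = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
test_set = [(4.0, 9.0), (5.0, 11.0)]

def mean_squared_error(w, b, data):
    """Average squared difference between predictions w*x + b and the actual labels."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def fit(data, learning_rate=0.05, epochs=2000):
    """Training: adjust the parameters w and b by gradient descent to reduce the error."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

w, b = fit(train_set)
print("learned parameters:", w, b)
print("train MSE:", mean_squared_error(w, b, train_set))
print("test MSE:", mean_squared_error(w, b, test_set))   # evaluation on unseen data
```

Here `w` and `b` are the model's internal parameters, and the test error is computed only on points the model never saw during training.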

• Supervised Learning: In supervised learning, the model is trained using a
labelled dataset, where the input data is paired with corresponding output
labels. The model learns to map input data to output labels.
• Unsupervised Learning: Unsupervised learning involves training models
on unlabeled data to discover patterns, structures, or groupings within the
data. Clustering and dimensionality reduction are common tasks in
unsupervised learning.
• Semi-Supervised Learning: Semi-supervised learning combines elements
of both supervised and unsupervised learning. It uses a small amount of
labelled data and a larger amount of unlabeled data to improve model
performance.

• Reinforcement Learning: Reinforcement learning is a type of machine learning
where an agent interacts with an environment and learns to make a sequence of
decisions to maximize a cumulative reward. It is often used in robotics and
game playing.
• Overfitting: Overfitting occurs when a machine learning model performs well on
the training data but poorly on new, unseen data. It is a result of the model
fitting too closely to noise in the training data.
• Underfitting: Underfitting happens when a machine learning model is too
simple to capture the underlying patterns in the data, resulting in poor
performance on both the training and testing datasets.
• Hyperparameters: Hyperparameters are settings or configurations of a machine
learning model that are not learned from the data but are set before training.
Examples include learning rates and the number of hidden layers in a neural
network.
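
Overfitting and hyperparameters can be seen together in a small k-nearest-neighbours sketch. The data is hypothetical, with two deliberately wrong training labels standing in for noise; `k` is the hyperparameter.

```python
from collections import Counter

# Hypothetical 1-D data: class "A" lies near 1-4 and class "B" near 7-10.
# Two training labels are deliberately wrong (x=3 and x=8) to play the role of noise.
train_set = [(1, "A"), (2, "A"), (3, "B"), (4, "A"),
             (7, "B"), (8, "A"), (9, "B"), (10, "B")]
validation_set = [(1.4, "A"), (2.9, "A"), (3.2, "A"),
                  (7.8, "B"), (8.1, "B"), (9.6, "B")]

def knn_predict(x, k):
    """Majority label among the k training points closest to x."""
    nearest = sorted(train_set, key=lambda point: abs(point[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(data, k):
    """Fraction of examples in data whose label is predicted correctly."""
    return sum(knn_predict(x, k) == label for x, label in data) / len(data)

# k is a hyperparameter: it is set before training, not learned from the data.
for k in (1, 3):
    print(f"k={k}: train={accuracy(train_set, k):.2f}, "
          f"validation={accuracy(validation_set, k):.2f}")
```

With `k=1` the model reproduces every training label, including the noisy ones, so it scores perfectly on the training data but worse on the validation data. With `k=3` it ignores the noise and scores better on unseen points.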

• Feature Engineering: Feature engineering is the process of selecting,
transforming, or creating new features from the raw data to improve the
performance of a machine learning model.
• Cross-Validation: Cross-validation is a technique used to assess the
performance of a machine learning model by splitting the data into
multiple subsets and training/evaluating the model on different
combinations of these subsets.
• Bias and Variance: Bias refers to the error introduced by overly simplistic
assumptions in the learning algorithm, while variance refers to the error
introduced by the model's sensitivity to small fluctuations in the training
data.
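
One common form of cross-validation is k-fold cross-validation. The sketch below only produces the index splits; the function name `k_fold_indices` is an assumption for this example, and in practice the data is usually shuffled before splitting.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size

for fold, (training, validation) in enumerate(k_fold_indices(6, 3)):
    print(f"fold {fold}: train on {training}, validate on {validation}")
```

Each sample is used for validation exactly once, and the model's scores on the k folds are typically averaged.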

Machine Learning Life Cycle
The machine learning life cycle refers to the series of steps and processes involved in
developing, deploying, and maintaining a machine learning model. Here's an overview of
the typical machine learning life cycle:
1. Problem Definition: The first step is to clearly define the problem you want to solve
with machine learning. What are your goals, objectives, and success criteria? Identify the
specific task you want the model to perform, such as classification, regression, clustering,
or recommendation.
2. Data Collection: Gather relevant data that will be used to train and test the machine
learning model. Ensure that the data is representative of the problem and that it's of
sufficient quality and quantity.
3. Data Preprocessing: Prepare and clean the data for analysis. This step involves
handling missing values, dealing with outliers, normalizing or scaling features, and
encoding categorical variables. The goal is to make the data suitable for training a model.
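
A minimal sketch of three of the preprocessing operations named in step 3: filling missing values, scaling, and encoding categories. The sample values and helper names are hypothetical.

```python
# Hypothetical raw columns; None marks a missing value.
ages = [25, None, 35, 45]
colors = ["red", "green", "red", "blue"]

def impute_mean(values):
    """Replace missing entries with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1] (assumes they are not all equal)."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]

def one_hot(categories):
    """Encode each category as a 0/1 vector with one position per distinct category."""
    vocabulary = sorted(set(categories))
    return [[int(c == v) for v in vocabulary] for c in categories]

filled = impute_mean(ages)        # [25, 35.0, 35, 45]
scaled = min_max_scale(filled)    # [0.0, 0.5, 0.5, 1.0]
encoded = one_hot(colors)         # columns: blue, green, red
print(scaled, encoded)
```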

4. Feature Engineering: Create or select the most relevant features (input variables) that
the model will use for learning. Feature engineering may involve transforming, combining,
or selecting features to improve model performance.
5. Data Splitting: Split the dataset into two or more subsets: typically a training set, a
validation set, and a test set. The training set is used to train the model, the validation set
is used to tune hyperparameters and evaluate model performance during development,
and the test set is used to assess the final model's generalization.
6. Model Selection: Choose the appropriate machine learning algorithm or model
architecture for your problem. This choice depends on the nature of the data and the task
(e.g., decision trees, neural networks, support vector machines, etc.).
7. Model Training: Train the selected model using the training dataset. During training,
the model learns the underlying patterns in the data and adjusts its internal parameters to
make accurate predictions.
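
Step 5 (data splitting) can be sketched as follows. The 60/20/20 proportions and the function name `train_val_test_split` are assumptions for illustration, not values fixed by these slides.

```python
import random

def train_val_test_split(samples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and cut it into training, validation, and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # fixed seed so the split is reproducible
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(10))
print(len(train), len(val), len(test))
```

The three sets do not overlap, so the test set stays untouched until the final evaluation.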

8. Hyperparameter Tuning: Fine-tune the model's hyperparameters (settings that control
the learning process) using the validation dataset. Common hyperparameters include
learning rates, regularization strengths, and the number of hidden layers in neural
networks.
9. Model Evaluation: Assess the model's performance using the validation dataset.
Common evaluation metrics vary depending on the problem type and may include
accuracy, precision, recall, F1-score, mean squared error, etc.
10. Model Testing: Once satisfied with the model's performance on the validation set,
evaluate its performance on the separate, untouched test dataset to ensure it generalizes
well to new, unseen data.
11. Model Deployment: If the model meets the desired performance criteria, it can be
deployed into a production environment where it can make real-time predictions or
decisions. This may involve integrating the model into software applications or systems.
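
The classification metrics named in step 9 can be computed from predicted and true labels as in this sketch. The label lists are hypothetical, and "spam" is treated as the positive class.

```python
def classification_metrics(y_true, y_pred, positive="spam"):
    """Accuracy, precision, recall, and F1-score for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)   # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)   # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)   # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = ["spam", "spam", "spam", "not spam", "not spam", "not spam"]
y_pred = ["spam", "spam", "not spam", "spam", "not spam", "not spam"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision answers "of the emails flagged as spam, how many really were spam?", while recall answers "of the real spam emails, how many were flagged?".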

12. Monitoring and Maintenance: Continuously monitor the deployed model's
performance in the production environment. Update the model as needed to adapt to
changes in the data distribution or to improve performance over time.
13. Feedback Loop: Collect feedback from users and the production system to identify
issues, gather additional data, and refine the model further. This iterative process helps
maintain and improve the model's accuracy and relevance.
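
One simple way to watch for changes in the data distribution (step 12) is to compare a feature's recent average with its average in the training data. This check, the sample values, and the alert threshold are assumptions chosen for illustration; production monitoring typically uses more robust tests.

```python
from statistics import mean, stdev

def mean_shift(training_values, production_values):
    """How many training standard deviations the production mean has moved."""
    return abs(mean(production_values) - mean(training_values)) / stdev(training_values)

training_feature = [10.0, 11.0, 9.0, 10.5, 9.5]    # hypothetical values seen during training
recent_feature = [13.0, 12.5, 13.5, 12.0, 14.0]    # hypothetical values seen in production
THRESHOLD = 2.0                                     # assumed alert threshold

shift = mean_shift(training_feature, recent_feature)
if shift > THRESHOLD:
    print(f"possible data drift: mean moved by {shift:.1f} standard deviations")
```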

Differentiating between supervised,
unsupervised, and reinforcement
learning.

Supervised Learning
Supervised learning is like teaching a computer with a teacher. You provide the
computer with a dataset that includes both input data and the correct answers (labels).
Supervised learning therefore requires previously collected, labelled data in order
to train a model.
How it works: The computer learns to make predictions or decisions by finding
patterns in the data. It figures out how to map input data to the correct answers.
Examples: Predicting house prices based on features (e.g., size, location), classifying
emails as spam or not, recognizing handwritten digits (like in ZIP code recognition).
Key Point:
Supervised learning needs labelled data to learn and is great for tasks with clear
answers.
Supervised learning covers tasks such as classification and regression, using
algorithms such as naive Bayes, support vector machines (SVM), k-nearest
neighbours (KNN), and decision trees.
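
As a sketch of learning from labelled data, the code below fits a decision stump (a decision tree with a single split) to a tiny spam dataset. The feature (a count of suspicious words) and the data values are hypothetical.

```python
# Hypothetical labelled data: (number of suspicious words in an email, label).
labelled = [(0, "not spam"), (1, "not spam"), (2, "not spam"),
            (5, "spam"), (6, "spam"), (8, "spam")]

def fit_stump(data):
    """Find the threshold t for which the rule 'spam if x > t' makes the fewest training errors."""
    best_threshold, best_errors = None, len(data) + 1
    for threshold, _ in data:   # candidate thresholds: the observed feature values
        errors = sum((x > threshold) != (label == "spam") for x, label in data)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def predict(threshold, x):
    """Apply the learned rule to a new input."""
    return "spam" if x > threshold else "not spam"

threshold = fit_stump(labelled)
print(threshold, predict(threshold, 7), predict(threshold, 1))
```

The labels act as the "teacher": the threshold is chosen purely by comparing the rule's output with the known correct answers.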

Unsupervised Learning
Unsupervised learning is like asking the computer to find hidden patterns on its own
without a teacher. You provide the computer with data, but it doesn't have the correct
answers (labels). Unsupervised learning needs no labelled data as input.
How it works: The computer explores the data to discover similarities, differences, or
groupings within it. It's like finding hidden structures in a puzzle without knowing what the
final picture should look like.
Examples: Clustering similar customer profiles for targeted marketing, topic modelling in
text data, or reducing data dimensions for visualization.
Key Point:
• Unsupervised learning is used when you want the computer to uncover hidden insights
or structures in your data.
• Distance measures such as Euclidean distance and Manhattan distance are commonly
used to measure how similar two data points are.
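
The two distance measures named above can be written in a few lines. The final grouping step is a hedged illustration only: the points and the two cluster centres are assumed, whereas a real clustering algorithm such as k-means would also learn the centres from the data.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))

print(euclidean((0, 0), (3, 4)))   # 5.0
print(manhattan((0, 0), (3, 4)))   # 7

# Grouping unlabelled points by the nearest of two assumed cluster centres.
points = [(1, 1), (1, 2), (8, 8), (9, 8)]
centres = [(0, 0), (10, 10)]
groups = [min(range(len(centres)), key=lambda i: euclidean(p, centres[i])) for p in points]
print(groups)   # [0, 0, 1, 1]
```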

Reinforcement Learning
Reinforcement learning is like training a dog with rewards and punishments. The
computer interacts with an environment and learns to make a sequence of
decisions to maximize a reward. Reinforcement learning trains models to
learn how to make decisions, and it is one of the most actively researched
fields in ML.
How it works: The computer takes actions, receives feedback (reward or penalty),
and adjusts its future actions to maximize the total reward over time. It's about
learning through trial and error.
Examples: Training a self-driving car to navigate traffic, teaching a robot to play
chess or control a drone, and optimizing resource allocation in businesses.
Key Point: Reinforcement learning is used when there's a need to make a series of
decisions to achieve a long-term goal, and it's often used in dynamic and complex
environments.
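
The trial-and-error loop described above can be sketched with tabular Q-learning, one common reinforcement learning algorithm (these slides do not name a specific one). The corridor environment, the reward of 1 at the goal, and the constants `ALPHA`, `GAMMA`, and `EPSILON` are all assumptions made for this example.

```python
import random

N_STATES = 5        # corridor cells 0..4; reaching cell 4 ends the episode with reward 1
ACTIONS = (0, 1)    # 0 = step left, 1 = step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # assumed learning rate, discount, exploration rate

rng = random.Random(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q[state][action]: estimated long-term reward

def step(state, action):
    """Environment: move one cell, staying inside the corridor; reward 1 at the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def choose_action(state):
    """Explore at random with probability EPSILON (or on ties); otherwise exploit."""
    if rng.random() < EPSILON or Q[state][0] == Q[state][1]:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[state][a])

for _ in range(200):   # episodes of trial and error
    state = 0
    while state != N_STATES - 1:
        action = choose_action(state)
        next_state, reward = step(state, action)
        # Q-learning update: move the estimate toward reward + discounted best future value.
        target = reward + GAMMA * max(Q[next_state])
        Q[state][action] += ALPHA * (target - Q[state][action])
        state = next_state

policy = ["left" if q[0] > q[1] else "right" for q in Q[:-1]]
print(policy)
```

The agent is never told that moving right is correct. It only receives a reward at the goal, and the learned values propagate that reward back to earlier cells.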
