Chapter 5 Introduction to Machine Learning with Scikit-learn.pptx

阮山松 – NGUYEN SON TUNG - F112169103
Chapter 5 Introduction to Machine Learning with Scikit-learn

What is Machine Learning?
• Definition: Enabling computers to learn from data without being
explicitly programmed.
• Key idea: Machine learning algorithms adapt their behavior based on
data, improving over time with experience.
• Quote: “A computer program is said to learn from experience E with
respect to some class of tasks T and performance measure P, if its
performance at tasks in T, as measured by P, improves with
experience E.” (Tom Mitchell)
• Examples: Email spam detection, chess-playing algorithms.

Learning from Data: Two Approaches
• Objective: Machine learning focuses on discovering patterns or
predicting outcomes based on data.
• Supervised Learning: Uses labeled data where the outcome is known.
o Example: Spam detection with labeled emails.
• Unsupervised Learning: Identifies patterns in unlabeled data.
o Example: Market basket analysis (finding items that often appear
together).

Supervised Learning: Overview
• A machine learning approach that trains models on labeled datasets
so they can predict outcomes for new, unseen data.
• Types:
o Classification: Predicts discrete labels (e.g., spam or not spam).
o Regression: Predicts continuous values (e.g., house prices).
• Key Feature: Requires labeled data from domain experts or logs.

Classification in Supervised Learning
• Assigns each data point to a predefined category based on training
data.
• Process: The model is trained with labeled data to classify new data
into categories.
• Examples:
o Fraud detection: Classify transactions as fraudulent or legitimate.
o Sentiment analysis: Classify text as positive, negative, or neutral.
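
To make the sentiment analysis example concrete, here is a minimal sketch of a text classifier. The reviews and labels are hypothetical, made up for illustration, and the choice of a bag-of-words model with Naive Bayes is an assumption rather than something the slides prescribe.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical labeled reviews (invented for this sketch).
train_texts = [
    "I love this product",
    "great quality and fast delivery",
    "excellent, works perfectly",
    "terrible quality, very disappointed",
    "I hate this product",
    "broke after one day, awful",
]
train_labels = ["positive", "positive", "positive",
                "negative", "negative", "negative"]

# Turn text into word counts, then fit a Naive Bayes classifier on them.
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(train_texts, train_labels)

# Classify new, unseen reviews into the predefined categories.
new_texts = ["excellent product, love it", "awful, very disappointed"]
predictions = clf.predict(new_texts).tolist()
print(predictions)
```

With only six training sentences this is a toy; a real sentiment model would need far more labeled data.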

Regression in Supervised Learning
• Captures relationships between dependent and independent
variables.
• Goal: Predict a continuous output variable based on input features.
• Example: Predicting stock prices based on historical data.
• Key Differentiator: The output is a continuous value rather than a
discrete class.
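
As a rough illustration of predicting a continuous value, the sketch below fits a linear regression to house prices. The data is hypothetical and deliberately noise-free (price = 1500 * size + 20000), so the model can recover the relationship exactly; real data would not behave this neatly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical, noise-free data: price = 1500 * size + 20000.
sizes = np.array([[50.0], [80.0], [120.0], [150.0]])  # input feature
prices = 1500.0 * sizes.ravel() + 20000.0             # continuous target

reg = LinearRegression().fit(sizes, prices)

# Predict the price of a house of size 100 (not in the training data).
predicted = reg.predict(np.array([[100.0]]))[0]
print(round(predicted, 2))
print(reg.coef_[0], reg.intercept_)
```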

Unsupervised Learning: Overview
• A method of exploring data without labeled outcomes to find hidden
structures.
• Goal: Discover underlying patterns or groupings within the data.
• No labeled data: Learns from the structure and similarities in the
dataset itself.

Applications of Unsupervised Learning
• Clustering: Grouping similar data points together to identify natural
clusters within the data.
o Example: Customer segmentation in marketing.
• Association: Finds items that frequently occur together.
o Example: Market basket analysis in retail (e.g., milk and bread
bought together).
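
Both applications can be sketched in a few lines. The customer figures and shopping baskets are hypothetical. Clustering uses Scikit-learn's KMeans; for association, simple pair counting stands in for a full association-rule algorithm, which is a simplification chosen for this sketch.

```python
from collections import Counter
from itertools import combinations

import numpy as np
from sklearn.cluster import KMeans

# Clustering: hypothetical customers as (annual spend, visits per year).
customers = np.array([
    [200, 2], [220, 3], [250, 2],        # low-spend customers
    [1500, 20], [1600, 22], [1550, 18],  # high-spend customers
], dtype=float)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
segments = kmeans.fit_predict(customers)  # one cluster id per customer
print(segments)

# Association: count how often each pair of items is bought together.
baskets = [
    ["milk", "bread", "eggs"],
    ["milk", "bread"],
    ["bread", "butter"],
    ["milk", "bread", "butter"],
]
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)
```

No labels are supplied in either part: the groups and the frequent pair emerge from the data itself.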

Structure of a Machine Learning System
• Offline Process: Training stage where the model learns from historical
data.
o Objective: Learn patterns and relationships.
• Online Process: Prediction stage where the model makes predictions
on new, unseen data.
o Example: A spam classifier trained on old emails predicts if new
emails are spam or not.
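
One way to picture the offline/online split is to train and save a model in one step, then load it and predict in another. The emails are hypothetical, and using joblib with a temporary file is an assumption made for this sketch, not a deployment recommendation.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# --- Offline process: train on historical (old) emails and save the model.
old_emails = [
    "win a free prize now",
    "cheap loans, click here",
    "free money, claim your prize",
    "meeting moved to 3pm",
    "please review the attached report",
    "lunch tomorrow?",
]
old_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(old_emails, old_labels)

model_path = os.path.join(tempfile.mkdtemp(), "spam_model.joblib")
joblib.dump(model, model_path)

# --- Online process: load the trained model and predict on new emails.
loaded = joblib.load(model_path)
new_emails = ["claim your free prize", "please review the meeting notes"]
online_predictions = loaded.predict(new_emails).tolist()
print(online_predictions)
```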

The Machine Learning Process
• Six Key Stages:
1. Problem Understanding: Define and understand the problem.
2. Data Collection: Gather relevant data.
3. Data Annotation and Preparation: Clean, label, and prepare data.
4. Data Wrangling: Transform data into the required format.
5. Model Development, Training, and Evaluation: Train and assess
the model.
6. Model Deployment and Maintenance: Integrate into production
and monitor.

Step 1: Problem Understanding
• Clarify goals and set the scope for the machine learning project.
• Actions:
o Define the problem with stakeholders.
o Determine measurable outcomes and success criteria.
o Examples: Identify spam emails, classify product reviews.

Step 2: Data Collection
• Gather quality data relevant to the problem.
• Data sources: Transaction logs, user behavior logs, public datasets,
etc.
• Key Consideration: Quality and relevance of data directly affect model
performance.

Step 3: Data Annotation and Data Preparation
• Data Annotation: Label data for supervised learning.
o Example: Tagging images with objects for object detection.
• Data Preparation: Cleaning, reformatting, and normalizing raw data
for model compatibility.
• Objective: Ensure data quality and consistency.
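
A minimal sketch of the cleaning and normalizing step, assuming a small hypothetical table with missing values. Filling gaps with the column mean and then standardizing is one common choice among several, not the only way to prepare data.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical raw data: columns are (age, income); NaN marks missing values.
raw = np.array([
    [25.0, 50000.0],
    [32.0, np.nan],
    [47.0, 82000.0],
    [np.nan, 61000.0],
])

# Fill missing values with the column mean, then scale each column
# to zero mean and unit variance.
prep = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler())
clean = prep.fit_transform(raw)
print(clean)
```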

Step 4: Data Wrangling
• Transform data into a numeric format suitable for model training.
• Process: Convert data into feature vectors (numeric arrays) using
libraries like NumPy.
• Ensures compatibility with algorithms and standardizes input format.
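
The conversion to feature vectors might look like the sketch below. The records are hypothetical, and DictVectorizer is just one of several Scikit-learn tools that produce NumPy arrays from mixed data; it one-hot encodes string values and passes numbers through.

```python
import numpy as np
from sklearn.feature_extraction import DictVectorizer

# Hypothetical records mixing a categorical and a numeric field.
records = [
    {"country": "TW", "age": 25},
    {"country": "VN", "age": 31},
    {"country": "TW", "age": 40},
]

# sparse=False returns a dense NumPy array instead of a sparse matrix.
vec = DictVectorizer(sparse=False)
X = vec.fit_transform(records)

print(vec.feature_names_)  # column names, one per numeric feature
print(X)                   # one numeric row (feature vector) per record
```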

Step 5: Model Development, Training, and
Evaluation
• Model Development: Selecting and configuring algorithms (e.g.,
Scikit-learn’s SVM, decision trees).
• Training: Model learns from a large portion of the data.
• Evaluation: Test the model on unseen data to assess performance.
• Tune parameters based on results to improve accuracy.
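
These steps can be sketched on the Iris dataset bundled with Scikit-learn. The 75/25 split, the parameter grid, and the use of GridSearchCV for tuning are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Training uses the larger portion; the rest is held out as unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Model development and tuning: try several SVM settings with
# 5-fold cross-validation on the training portion only.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_train, y_train)

# Evaluation: measure accuracy on the held-out test data.
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print(search.best_params_)
print(test_accuracy)
```

Keeping the test set out of the tuning loop is what makes the final accuracy an honest estimate.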

Step 6: Model Deployment
• Integration: Incorporate the model into production systems.
• Inference: Model processes new data and generates predictions.
• Data Collection: Gather new data from real-world usage.
• Model Improvement: Use collected data to refine the model for
future iterations.

Scikit-Learn: The Python Library for Machine
Learning
• Scikit-learn is a highly popular library for machine learning that
provides tools for supervised and unsupervised learning.
• It is built on the SciPy stack, which includes NumPy, SciPy,
Matplotlib, Pandas, and others.

Installing Scikit-Learn
• Installation:
o Create a new cell in Jupyter Notebook.
o Run the install command, typically: pip install scikit-learn
o Confirm that the package imports without errors:
import sklearn
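
A quick way to confirm the installation is to print the installed version. This is a minimal sketch; the install line in the comment assumes a pip-based environment.

```python
# In a Jupyter cell, installation is commonly done with:
#   %pip install scikit-learn
# Once installed, the import below should succeed.
import sklearn

installed_version = sklearn.__version__
print("scikit-learn version:", installed_version)
```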

Understanding the API
• Features: Numerical variables representing data points.
• Estimators: Learn patterns from data (e.g., classification, regression).
• Predictors: Make predictions on new data.
• Transformers: Preprocess and transform data (e.g., scaling, feature
extraction).
• Chaining Estimators: Combine multiple estimators for complex tasks.
• Pipeline Objects: Simplify the process of chaining multiple estimators
into a single one.
o Example: pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])
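
The pipeline line from the slide needs imports and data to run. The sketch below supplies them, using the Iris dataset and a 75/25 split as assumptions, and shows a transformer used on its own before it is chained with an estimator.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Transformer on its own: fit() learns the scaling, transform() applies it.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_train)

# Pipeline: chain the transformer and the estimator into a single object.
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])
pipe.fit(X_train, y_train)         # scales, then trains the SVM

# Predictor: predict() and score() apply the same scaling automatically.
print(pipe.predict(X_test[:5]))
pipe_accuracy = pipe.score(X_test, y_test)
print(pipe_accuracy)
```

Because the scaler lives inside the pipeline, it is fitted only on training data, which avoids leaking test-set statistics into the model.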

Your First Scikit-learn Experiment
• Use a simple dataset: the Iris dataset.
• Example 1: View the components of the iris object as a dictionary.
• Result: (figure omitted)
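
The slide's original code for Example 1 is not available, so the following is a sketch of one likely way to do it, not a reproduction. The exact set of keys printed varies between Scikit-learn versions.

```python
from sklearn.datasets import load_iris

# load_iris() returns a dictionary-like object.
iris = load_iris()

# List its components (keys), such as 'data' and 'target'.
iris_keys = list(iris.keys())
print(iris_keys)
```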

• Example 2: Inspect the data, target, feature names, and target names.
• Result: (figure omitted)
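
The slide's code for Example 2 is likewise not available; this sketch shows one plausible way to inspect the four components it names.

```python
from sklearn.datasets import load_iris

iris = load_iris()

print(iris.data.shape)     # (150, 4): 150 flowers, 4 measurements each
print(iris.data[:5])       # first five feature vectors
print(iris.target)         # class index (0, 1, or 2) for each flower
print(iris.feature_names)  # names of the four measurement columns
print(iris.target_names)   # species names matching the class indices
```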

• Example 3: Support Vector Machine (SVM) classifier
• Result: (figure omitted)
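
Since the original code for Example 3 is not available, the following is a sketch under stated assumptions: a linear kernel with C=1.0, trained on the full Iris dataset. The slide may have used different settings.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()

# Assumed settings: linear kernel, C=1.0 (not confirmed by the slides).
svm_clf = SVC(kernel="linear", C=1.0)
svm_clf.fit(iris.data, iris.target)

# Classify one flower from its four measurements, in the order
# sepal length, sepal width, petal length, petal width (cm).
sample = [[5.1, 3.5, 1.4, 0.2]]
predicted_class = svm_clf.predict(sample)[0]
predicted_name = iris.target_names[predicted_class]
print(predicted_class, predicted_name)
```

Training and predicting on the same data only demonstrates the API; an honest performance estimate needs a held-out test set.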

Personal Reflections on this chapter
Chapter 5 gives a high-level overview of machine learning (ML) basics
through Scikit-learn, focusing on practical and accessible applications. Key
insights include understanding supervised vs. unsupervised learning and the
importance of data quality for accurate predictions. Real-world examples,
such as spam detection, highlight ML's practical value. The six-stage ML
process (problem understanding, data collection, annotation, wrangling,
model development, and deployment) emphasizes planning and iteration.
Scikit-learn's simplicity, demonstrated through exercises like the Iris
dataset, reinforces concepts like hyperparameter tuning and pipeline use.
The chapter also stresses ML's iterative nature and ethical considerations
for responsible model development.
