阮山松 – NGUYEN SON TUNG - F112169103
Chapter 5: Introduction to Machine Learning with Scikit-learn
What is Machine Learning?
• Definition: Enabling computers to learn from data without being
explicitly programmed.
• Key idea: Machine learning algorithms adapt their behavior based on
data, improving with experience.
• Quote: “A computer program is said to learn from experience E with
respect to some class of tasks T and performance measure P, if its
performance at tasks in T, as measured by P, improves with experience
E.” (Tom Mitchell)
• Examples: Email spam detection, chess-playing programs.
Learning from Data: Two Approaches
• Objective: Machine learning focuses on discovering patterns or
predicting outcomes based on data.
• Supervised Learning: Uses labeled data where the outcome is known.
  o Example: Spam detection with labeled emails.
• Unsupervised Learning: Identifies patterns in unlabeled data.
  o Example: Market basket analysis (finding items that often appear
  together).
Supervised Learning: Overview
• Definition: A machine learning approach that trains models on labeled
datasets so they can predict outcomes for new examples.
• Types:
  o Classification: Predicts discrete labels (e.g., spam or not spam).
  o Regression: Predicts continuous values (e.g., house prices).
• Key Feature: Requires labeled data from domain experts or logs.
Classification in Supervised Learning
• Definition: Assigning each data point to a predefined category based
on training data.
• Process: Model is trained with labeled data to classify new data into
categories.
• Examples:
  o Fraud detection: Classify transactions as fraudulent or legitimate.
  o Sentiment analysis: Classify text as positive, negative, or neutral.
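As a minimal sketch of the classification process described above, the snippet below trains a decision tree on a handful of labeled transactions and classifies two new ones. The feature layout (amount, hour of day) and all values are hypothetical and chosen only for illustration; they do not come from the slides.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: [transaction amount, hour of day]
X_train = [[20, 14], [35, 10], [15, 16], [900, 3], [1200, 2], [950, 4]]
y_train = [0, 0, 0, 1, 1, 1]  # 0 = legitimate, 1 = fraudulent

# Train the classifier on the labeled examples
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

# Classify new, unlabeled transactions into the predefined categories
new_transactions = [[25, 13], [1100, 3]]
predicted = clf.predict(new_transactions)
print(predicted)
```

The output is always one of the labels seen during training, which is what distinguishes classification from regression.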
Regression in Supervised Learning
• Definition: Captures relationships between dependent and independent
variables to predict continuous outcomes.
• Goal: Predict a continuous output variable based on input features.
• Example: Predicting stock prices based on historical data.
• Key Differentiator: Output is a continuous value rather than one of a
set of discrete classes.
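The following sketch illustrates regression with Scikit-learn's LinearRegression. The house sizes and prices are made-up numbers (price is exactly 3 times the size) so that the fitted relationship is easy to check by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: house size (square meters) -> price (thousands)
X = np.array([[50], [80], [100], [120], [150]])
y = np.array([150, 240, 300, 360, 450])  # price = 3 * size in this toy set

# Fit a line relating the input feature to the continuous target
reg = LinearRegression()
reg.fit(X, y)

# Predict a continuous value for an unseen house size
pred = reg.predict([[110]])
print(reg.coef_[0], pred[0])
```

Unlike a classifier, the model can return any value on a continuous scale, including values that never appeared in the training data.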
Unsupervised Learning: Overview
• Definition: A method of exploring data without labeled outcomes to
find hidden structures.
• Goal: Discover underlying patterns or groupings within the data.
• No labeled data: Learns from the structure and similarities in the
dataset itself.
Applications of Unsupervised Learning
• Clustering: Grouping similar data points together to identify natural
clusters within the data.
  o Example: Customer segmentation in marketing.
• Association: Finds items that frequently occur together.
  o Example: Market basket analysis in retail (e.g., milk and bread
  bought together).
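To make the clustering idea concrete, here is a small sketch using KMeans. The customer data (annual spend, number of orders) is hypothetical, and the choice of two clusters is an assumption made for this example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customers: [annual spend, number of orders]
X = np.array([
    [200, 2], [220, 3], [210, 2],        # low-spend customers
    [1500, 20], [1600, 22], [1550, 19],  # high-spend customers
])

# No labels are given; KMeans groups points by similarity alone
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(X)

labels = kmeans.labels_
print(labels)
```

The cluster numbers themselves are arbitrary; only the grouping matters, so the first three customers share one label and the last three share the other.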
Structure of a Machine Learning System
• Offline Process: Training stage where the model learns from historical
data.
  o Objective: Learn patterns and relationships.
• Online Process: Prediction stage where the model makes predictions
on new, unseen data.
  o Example: A spam classifier trained on old emails predicts whether
  new emails are spam.
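The offline/online split can be sketched as follows. The emails, labels, and the choice of a bag-of-words model with naive Bayes are assumptions made for illustration; serializing with pickle stands in for whatever storage a real system would use.

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Offline process: train on historical, labeled emails (hypothetical data)
old_emails = [
    "win a free prize now",
    "free money claim now",
    "meeting agenda for monday",
    "lunch with the project team",
]
old_labels = ["spam", "spam", "ham", "ham"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(old_emails, old_labels)
saved_model = pickle.dumps(model)  # store the trained model

# Online process: load the stored model and predict on new, unseen emails
loaded_model = pickle.loads(saved_model)
new_emails = ["claim your free prize", "agenda for the team meeting"]
predictions = loaded_model.predict(new_emails)
print(predictions)
```

Training happens once on old data, while prediction can run repeatedly on incoming data without retraining.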
The Machine Learning Process
• Six Key Stages:
1. Problem Understanding: Define and understand the problem.
2. Data Collection: Gather relevant data.
3. Data Annotation and Preparation: Clean, label, and prepare data.
4. Data Wrangling: Transform data into the required format.
5. Model Development, Training, and Evaluation: Train and assess
the model.
6. Model Deployment and Maintenance: Integrate into production
and monitor.
Step 1: Problem Understanding
• Goal: Clarify goals and set the scope for the machine learning project.
• Actions:
  o Define the problem with stakeholders.
  o Determine measurable outcomes and success criteria.
  o Examples: Identify spam emails, classify product reviews.
Step 2: Data Collection
• Gather quality data relevant to the problem.
• Data sources: Transaction logs, user behavior logs, public datasets,
etc.
• Key Consideration: Quality and relevance of data directly affect model
performance.
Step 3: Data Annotation and Data Preparation
• Data Annotation: Label data for supervised learning.
  o Example: Tagging images with objects for object detection.
• Data Preparation: Cleaning, reformatting, and normalizing raw data
for model compatibility.
• Objective: Ensure data quality and consistency.
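One possible form of the data preparation step is sketched below: a missing value is filled in with the column mean, and the columns are then normalized to zero mean and unit variance. The table of ages and incomes is hypothetical, and mean imputation is just one of several reasonable cleaning strategies.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Hypothetical raw data: [age, income], with one missing income
raw = np.array([
    [25.0, 50000.0],
    [32.0, np.nan],
    [47.0, 82000.0],
    [51.0, 61000.0],
])

# Cleaning: replace missing values with the mean of each column
imputed = SimpleImputer(strategy="mean").fit_transform(raw)

# Normalizing: rescale each column to zero mean and unit variance
scaled = StandardScaler().fit_transform(imputed)
print(scaled)
```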
Step 4: Data Wrangling
• Transform data into a numeric format suitable for model training.
• Process: Convert data into feature vectors (numeric arrays) using
libraries like NumPy.
• Ensures compatibility with algorithms and standardizes input format.
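As a rough illustration of turning raw records into feature vectors with NumPy, the sketch below encodes a numeric field directly and a categorical field as one-hot columns. The records and the column layout are hypothetical.

```python
import numpy as np

# Hypothetical raw records: (age, device type)
records = [(34, "mobile"), (52, "desktop"), (23, "tablet"), (41, "mobile")]

# Assign each category a fixed column position
categories = sorted({device for _, device in records})
column_of = {name: i for i, name in enumerate(categories)}

# One row per record: [age, one-hot columns for the device type]
features = np.zeros((len(records), 1 + len(categories)))
for row, (age, device) in enumerate(records):
    features[row, 0] = age
    features[row, 1 + column_of[device]] = 1.0

print(categories)
print(features)
```

Every record ends up as a numeric array of the same length, which is the input format Scikit-learn estimators expect.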
Step 5: Model Development, Training, and
Evaluation
• Model Development: Selecting and configuring algorithms (e.g.,
Scikit-learn’s SVM, decision trees).
• Training: Model learns from the training set, usually the larger
portion of the data.
• Evaluation: Test the model on held-out data it has not seen during
training to assess performance.
• Tuning: Adjust hyperparameters based on the results to improve
accuracy.
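The development, training, evaluation, and tuning steps might look like the sketch below. The dataset (Scikit-learn's built-in wine data), the decision tree model, the 80/20 split, and the max_depth grid are all assumptions chosen for this example, not choices taken from the slides.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

# Hold out 20% of the data so evaluation uses unseen samples
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Model development + tuning: try several tree depths with 5-fold CV
param_grid = {"max_depth": [2, 3, 4, 5]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X_train, y_train)  # training happens on the training set only

# Evaluation: score the best model on the held-out test set
y_pred = search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(search.best_params_, test_accuracy)
```

Tuning is done with cross-validation inside the training set, so the test set stays untouched until the final evaluation.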
Step 6: Model Deployment
• Integration: Incorporate the model into production systems.
• Inference: Model processes new data and generates predictions.
• Data Collection: Gather new data from real-world usage.
• Model Improvement: Use collected data to refine the model for
future iterations.
Scikit-Learn: The Python Library for Machine
Learning
• Scikit-learn is a widely used Python library for machine learning that
provides tools for supervised and unsupervised learning.
• It is built on the SciPy stack, which includes NumPy, SciPy,
Matplotlib, pandas, and related libraries.
Installing Scikit-Learn
• Installation:
  o Create a new cell in Jupyter Notebook.
  o Run the install command: !pip install scikit-learn
  o Verify the installation by importing the package: import sklearn
Understanding the API
• Features: Numeric variables that describe each data point (the
columns of the input array).
• Estimators: Objects that learn patterns from data through fit()
(e.g., classifiers, regressors).
• Predictors: Estimators that make predictions on new data through
predict().
• Transformers: Estimators that preprocess and transform data through
transform() (e.g., scaling, feature extraction).
• Chaining Estimators: Combine multiple estimators for complex tasks.
• Pipeline Objects: Simplify the process of chaining multiple estimators
into a single one.
• Example: pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])
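The Pipeline line above can be run end to end as in the sketch below. The imports, the Iris data, and the train/test split are additions made for this example; only the Pipeline definition itself comes from the slide.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A transformer (scaler) chained with a predictor (SVC) as one estimator
pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])

# fit() scales the training data, then trains the SVC on the scaled data
pipe.fit(X_train, y_train)

# score() applies the same scaling to the test data before predicting
score = pipe.score(X_test, y_test)
print(score)
```

Because the scaler is fitted inside the pipeline, its statistics come from the training data only and are reused unchanged at prediction time.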
Your First Scikit-learn Experiment
• Use a simple built-in dataset: the Iris dataset.
• Example 1: View the components of the iris object, which behaves
like a dictionary.
• Result:
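The original code for Example 1 is not preserved in this text, so the following is a reconstruction under that assumption: it loads the Iris dataset with load_iris() and lists the keys of the returned object. The exact set of keys can vary between Scikit-learn versions, so no output is shown.

```python
from sklearn.datasets import load_iris

# load_iris() returns a dictionary-like Bunch object
iris = load_iris()

# List the components stored in the object
keys = list(iris.keys())
print(keys)
```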
Your First Scikit-learn Experiment
• Example 2: See Data, target, feature names and target names
• Result:
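Example 2 can be approximated as below; the original code is not preserved here, so this is a sketch that prints the four components named in the slide. Limiting the printout to the first five rows is a choice made for brevity.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Data: one row per flower, one column per feature
print(iris.data.shape)
print(iris.data[:5])

# Target: integer class label for each flower
print(iris.target[:5])

# Names of the feature columns and of the classes
print(iris.feature_names)
print(iris.target_names)
```

The data array holds 150 samples with 4 features each, and each target value is an index into target_names.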
Your First Scikit-learn Experiment
• Example 3: Support Vector Machines (SVM) Classifier
• Result:
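A possible version of Example 3 is sketched below, since the original code is not preserved in this text. The linear kernel, the 70/30 split, and the sample measurement are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# Train a Support Vector Machine classifier on the training portion
clf = SVC(kernel="linear")
clf.fit(X_train, y_train)

# Accuracy on the held-out test portion
accuracy = clf.score(X_test, y_test)
print(accuracy)

# Classify one flower: sepal length, sepal width, petal length, petal width
sample = [[5.1, 3.5, 1.4, 0.2]]
predicted_species = iris.target_names[clf.predict(sample)[0]]
print(predicted_species)
```

The predicted integer label is mapped back to a species name through target_names, which makes the result easier to read.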
Personal Reflections on this chapter
Chapter 5 gives a high-level overview of machine learning (ML) basics
through Scikit-learn, focusing on practical and accessible applications. Key
insights include understanding supervised vs. unsupervised learning and the
importance of data quality for accurate predictions. Real-world examples,
such as spam detection, highlight ML's practical value. The six-stage ML
process (problem understanding, data collection, annotation, wrangling,
model development, and deployment) emphasizes planning and iteration.
Scikit-learn's simplicity, demonstrated through exercises such as the Iris
dataset, reinforces concepts like hyperparameter tuning and pipeline use.
The chapter also stresses ML's iterative nature and ethical considerations
for responsible model development.
