Introduction to Machine Learning
PREPARED BY: LUCY FELIX-SADIWA
Image retrieved from https://www.javatpoint.com/subsets-of-ai
What is Data Science
• Data science combines multiple fields, including statistics, scientific
methods, artificial intelligence (AI), and data analysis, to extract
value from data. Those who practice data science are called data
scientists, and they combine a range of skills to analyze data
collected from the web, smartphones, customers, sensors, and
other sources to derive actionable insights. (retrieved from
https://www.oracle.com/ph/data-science/what-is-data-science/)
[Figure: diagram relating Artificial Intelligence, Machine Learning, Neural Networks, Deep Learning, Big Data, Data Science, and Data Mining]
Analytics
• Analytics is the process of discovering, interpreting, and communicating significant patterns
in data. Quite simply, analytics helps us see insights and meaningful data that we might not
otherwise detect. Business analytics focuses on using insights derived from data to make
more informed decisions that will help organizations increase sales, reduce costs, and make
other business improvements.
https://www.oracle.com/ph/business-analytics/what-is-analytics/
• 4 Types
• Descriptive Analytics - what happened in the past
• Diagnostic Analytics - why something happened in the past
• Predictive Analytics - which predicts what’s most likely to happen in the future
• Prescriptive Analytics - which recommends actions you can take to affect those
likely outcomes
Descriptive Analytics
• Any activity or method that helps us to describe or summarize raw data into
something interpretable by humans
• Allows us to learn from past behaviors and understand how they might influence future
outcomes
• Examples:
• A company’s business intelligence reports, which cover different aspects of the organization and provide historical
insights into its production, operations, sales, revenue, financials, inventory, customers,
and market share.
• The sales team can learn which customer segments generated the highest peso amount in sales last year.
• The marketing team can uncover which social media platforms delivered the best return on advertising
investment last quarter.
• The finance team can track month-over-month and year-over-year revenue growth or decline.
• Operations can track demand for SKUs (Stock Keeping Units) across geographic locations throughout the
past year.
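The examples above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the `sales` records, the segment names, and the helper functions `total_by` and `month_over_month_growth` are all hypothetical and not part of the original material.

```python
from collections import defaultdict

# Hypothetical sales records: (month, customer_segment, amount_in_pesos).
sales = [
    ("2023-01", "retail", 120_000),
    ("2023-01", "wholesale", 300_000),
    ("2023-02", "retail", 150_000),
    ("2023-02", "wholesale", 270_000),
    ("2023-03", "retail", 180_000),
    ("2023-03", "wholesale", 330_000),
]


def total_by(records, key_index):
    """Sum the amount (last field) grouped by the field at key_index."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key_index]] += record[-1]
    return dict(totals)


def month_over_month_growth(monthly_totals):
    """Return {month: fractional change versus the previous month}."""
    months = sorted(monthly_totals)
    return {
        current: (monthly_totals[current] - monthly_totals[previous])
        / monthly_totals[previous]
        for previous, current in zip(months, months[1:])
    }


# Summaries of what already happened: no prediction is involved.
by_segment = total_by(sales, 1)
by_month = total_by(sales, 0)
top_segment = max(by_segment, key=by_segment.get)
growth = month_over_month_growth(by_month)

print("Sales by segment:", by_segment)
print("Top segment:", top_segment)
print("Month-over-month growth:", growth)
```

Every result here is a summary of past data, which is what distinguishes descriptive analytics from the other three types.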
Diagnostic Analytics
• Examines data or information to answer the question “Why did it happen?”
• Techniques: Drill-down, data discovery, data mining, correlations, causations
• Provides a very good understanding of a limited piece of the problem you want to
solve
• Labor intensive – human intervention is required to perform drill-down or data mining
to go deeper into the data to understand why something happened or its root cause.
It focuses on determining the factors and events that contributed to the outcome.
• Examples:
• When sales of a product line decline in some stores, the product manager may want to look back and
review past trends and patterns in that product line’s sales across different stores, based on its
placement (floor, corner, aisle) within each store. The manager may also look at external factors such as
demographics and season.
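A drill-down like the one in this example might look as follows. This is a rough sketch, not a prescribed method: the `stores` data is invented, and `average_by` and `pearson_correlation` are hypothetical helpers written for this illustration. Note that a correlation only hints at a possible cause; it does not prove one.

```python
import math
from collections import defaultdict

# Hypothetical per-store figures for one product line (all values are made up).
stores = [
    {"store": "A", "placement": "aisle", "foot_traffic": 900, "units_sold": 90},
    {"store": "B", "placement": "aisle", "foot_traffic": 800, "units_sold": 84},
    {"store": "C", "placement": "corner", "foot_traffic": 850, "units_sold": 40},
    {"store": "D", "placement": "corner", "foot_traffic": 950, "units_sold": 46},
    {"store": "E", "placement": "floor", "foot_traffic": 700, "units_sold": 70},
    {"store": "F", "placement": "floor", "foot_traffic": 1000, "units_sold": 76},
]


def average_by(records, group_key, value_key):
    """Drill down: average of value_key within each group_key."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[group_key]].append(record[value_key])
    return {group: sum(values) / len(values) for group, values in grouped.items()}


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Step 1: drill down by placement to see where sales are weakest.
units_by_placement = average_by(stores, "placement", "units_sold")
weakest_placement = min(units_by_placement, key=units_by_placement.get)

# Step 2: check whether an external factor (foot traffic) moves with sales.
traffic_vs_units = pearson_correlation(
    [s["foot_traffic"] for s in stores], [s["units_sold"] for s in stores]
)

print(units_by_placement, weakest_placement, round(traffic_vs_units, 3))
```

In this made-up data, placement shows a clear gap between groups while foot traffic is only weakly correlated with units sold, so the manager would investigate placement first.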
Predictive Analytics
• Ability to make predictions, or estimates of likelihood, about unknown future
events based on past (historical) patterns.
• Gives insight into “What might happen?”
• Uses techniques from data mining, statistics, modeling, machine learning, and
AI to analyze current data to make predictions about the future.
• Predictive analytics is founded on probabilities, and the quality
of predictions made by statistical algorithms depends heavily on the quality of the input
data, so 100% accuracy should not be expected.
• Examples: Weather Forecasting, e-mail spam identification, fraud detection,
probability of customer purchasing a product or renewal of insurance policy,
predicting the chances of a person with known illness, etc.
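To make the "estimations of likelihood" idea concrete, here is a minimal sketch that estimates the probability of an insurance policy renewal from historical frequencies. The `history` records, the group names, and both functions are hypothetical. A real project would normally use a statistical or machine learning model rather than raw frequencies.

```python
from collections import defaultdict

# Hypothetical history: (customer_group, renewed_policy).
history = [
    ("new", False), ("new", True), ("new", False), ("new", False),
    ("loyal", True), ("loyal", True), ("loyal", True), ("loyal", False),
]


def fit_renewal_rates(records):
    """Estimate P(renewal | group) as the observed renewal frequency per group."""
    renewed = defaultdict(int)
    seen = defaultdict(int)
    for group, did_renew in records:
        seen[group] += 1
        renewed[group] += int(did_renew)
    return {group: renewed[group] / seen[group] for group in seen}


def predict_renewal_probability(rates, group, default=0.5):
    """Return the estimated probability; fall back to a default for unseen groups."""
    return rates.get(group, default)


renewal_rates = fit_renewal_rates(history)
print(predict_renewal_probability(renewal_rates, "loyal"))
print(predict_renewal_probability(renewal_rates, "new"))
```

The output is a probability, not a certainty, and it is only as trustworthy as the historical records it was estimated from.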
Prescriptive Analytics
• Area of data or business analytics dedicated to finding the best course of action for a
given situation.
• Endeavors to measure the effect of future decisions so that decision makers can
foresee the possible outcomes before the decisions are actually made.
• Combines business rules, machine learning algorithms, and tools that can be
applied to historical and real-time data feeds.
• Key objective: not just to predict what will happen but also why it will happen, by
predicting multiple futures based on different scenarios so that companies can assess
possible outcomes based on their actions.
• Examples: simulations in design situations that help users identify system behaviors
under different configurations and ensure that all key performance metrics, such as
wait times and queue length, are met.
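The simulation example above could be sketched as a small queue model that tries several configurations and recommends one. Everything here is an assumption made for illustration: the function names, the exponential arrival and service times, the parameter values, and the first-come-first-served rule.

```python
import heapq
import random


def simulate_average_wait(num_servers, num_customers=500, seed=42,
                          mean_interarrival=1.0, mean_service=1.8):
    """Simulate a first-come-first-served queue and return the average wait."""
    rng = random.Random(seed)  # same seed: every configuration sees the same customers
    free_at = [0.0] * num_servers  # time at which each server becomes free
    heapq.heapify(free_at)
    clock = 0.0
    total_wait = 0.0
    for _ in range(num_customers):
        clock += rng.expovariate(1.0 / mean_interarrival)   # next arrival time
        service_time = rng.expovariate(1.0 / mean_service)
        earliest_free = heapq.heappop(free_at)               # first available server
        start = max(clock, earliest_free)
        total_wait += start - clock
        heapq.heappush(free_at, start + service_time)
    return total_wait / num_customers


def recommend_servers(target_wait, candidates=(1, 2, 3, 4, 5)):
    """Prescribe the smallest configuration that meets the wait-time target."""
    for num_servers in candidates:
        if simulate_average_wait(num_servers) <= target_wait:
            return num_servers
    return None  # no candidate meets the target


for servers in (1, 2, 3):
    print(servers, "server(s): average wait", round(simulate_average_wait(servers), 2))
print("Recommended number of servers:", recommend_servers(target_wait=1.0))
```

The prescriptive step is the recommendation: the simulated "futures" are compared against a performance target, and an action is suggested.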
What is Machine Learning?
• Arthur Samuel (1959)
• Machine Learning is a field of study that gives computers the ability to learn without being explicitly
programmed
• Tom Mitchell (1997)
• A computer program is said to learn from experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
• Machine learning is a field of computer science that involves using statistical methods to
create programs that either improve performance over time, or detect patterns in massive
amounts of data that humans would be unlikely to find.
• Machine learning explores the study and construction of algorithms that can learn from and
make predictions on data. Such algorithms operate by building a model from example inputs
in order to make data driven predictions or decisions rather than following strictly static
program instructions
A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance
at tasks in T, as measured by P, improves with experience E.
• Suppose your email program watches which emails you do or do not
mark as spam, and based on that learns how to better filter spam.
What is the task T in this setting?
A. Classifying emails as spam or not spam
B. Watching you label emails as spam or not spam
C. The number (or fraction) of emails correctly classified as spam/not spam.
D. None of the above – This is not a machine learning problem.
• How the driverless car sees the world (https://www.youtube.com/watch?v=tiwVMrTLUWg&t=754s)
• Video recommendations from YouTube
• Posts, ads, and videos on social media such as Facebook.
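One way to read the E/T/P definition in the spam-filter setting is sketched below. The class `WordCountSpamFilter`, its word-scoring rule, and the tiny email lists are hypothetical and deliberately simplistic. They only illustrate how a performance measure P can improve as experience E accumulates.

```python
from collections import Counter


class WordCountSpamFilter:
    """Toy learner: scores each word by (times seen in spam - times seen in non-spam)."""

    def __init__(self):
        self.word_scores = Counter()

    def learn(self, text, is_spam):
        # Experience E: one email the user did or did not mark as spam.
        for word in text.lower().split():
            self.word_scores[word] += 1 if is_spam else -1

    def predict(self, text):
        # Task T: classify an email as spam (True) or not spam (False).
        score = sum(self.word_scores[word] for word in text.lower().split())
        return score > 0


def accuracy(model, labeled_emails):
    # Performance measure P: fraction of emails classified correctly.
    correct = sum(model.predict(text) == is_spam for text, is_spam in labeled_emails)
    return correct / len(labeled_emails)


training_stream = [
    ("win money now", True),
    ("meeting at noon", False),
    ("free money offer", True),
    ("lunch meeting tomorrow", False),
]
test_emails = [
    ("free offer", True),
    ("noon lunch", False),
    ("win free money", True),
    ("meeting tomorrow", False),
]

spam_filter = WordCountSpamFilter()
accuracy_over_time = [accuracy(spam_filter, test_emails)]  # before any experience
for text, is_spam in training_stream:
    spam_filter.learn(text, is_spam)
    accuracy_over_time.append(accuracy(spam_filter, test_emails))

print(accuracy_over_time)  # [0.5, 0.75, 0.75, 1.0, 1.0]
```

On this toy data the accuracy never decreases as more labeled emails arrive. Real learners are not guaranteed to improve at every single step.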
Image retrieved from https://vitalflux.com/great-mind-maps-for-learning-machine-learning/
Categories of Machine Learning
Supervised Learning
• The machine learning algorithm is provided with a sufficiently large dataset of example
inputs and their respective outputs (event/class labels), usually prepared
in consultation with a subject matter expert in the respective domain.
• Goal: Learn patterns in the data and build general rules that map inputs
to the output, class, or event.
• 2 Types
• Regression - The output to be predicted is a continuous number related
to a given input.
• Classification – The output to be predicted is an event/class, or the probability
of one, chosen from a fixed number of classes (2 or more).
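The two types can be illustrated with a small standard-library sketch. The data and the helper names `fit_line` and `knn_classify` are invented for illustration. Ordinary least squares and k-nearest neighbors are used only as simple representatives of regression and classification.

```python
import math
from collections import Counter


def fit_line(xs, ys):
    """Regression: ordinary least squares fit of y = slope * x + intercept."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def knn_classify(training_points, query, k=3):
    """Classification: majority label among the k nearest labeled examples."""
    nearest = sorted(training_points, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Regression: labeled examples with a continuous output.
slope, intercept = fit_line([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
predicted_value = slope * 5 + intercept  # continuous prediction for a new input

# Classification: labeled examples with a class output.
labeled_points = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((3.0, 3.0), "B"), ((3.2, 2.9), "B"), ((2.8, 3.1), "B"),
]
predicted_class = knn_classify(labeled_points, (1.1, 1.0))

print(round(slope, 2), round(intercept, 2), round(predicted_value, 2), predicted_class)
```

In both cases the algorithm is given inputs together with the correct outputs. Only the kind of output (a number versus a class) differs.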
(Loukas, 2020) retrieved from https://towardsdatascience.com/what-is-machine-learning-a-short-note-on-supervised-unsupervised-semi-supervised-and-aed1573ae9bb
Example by Andrew Ng on Machine Learning Course
Unsupervised Learning
• Studies the patterns in the input dataset to understand it better and to identify similar
patterns that can be grouped into specific classes or events. It does not require any
intervention from subject matter experts beforehand.
• Examples of Unsupervised Learning
• Clustering - The goal is to divide the input dataset into logical groups of related items. Examples:
grouping news articles, grouping customers based on their profiles, etc.
• Dimensionality Reduction – The goal is to simplify a large input dataset by mapping it to a
lower-dimensional space. Example: when analyzing a high-dimensional dataset, you may want to
find the key variables that hold a significant percentage (say 95%) of the information and use only them
for analysis.
• Anomaly Detection – also known as outlier detection, this is the identification of items or observations that do
not conform to an expected pattern or behavior when compared with other items in a given dataset.
Examples: machine or system health monitoring, event detection, fraud/intrusion detection. It is a
major area of the Internet of Things, where it enables detection of abnormal behavior in a given context.
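As an illustration of clustering, here is a minimal k-means sketch on one-dimensional data. The function `kmeans_1d`, its evenly spaced starting centers, and the `annual_spend` values are assumptions made for this example. No labels are supplied, and the groups emerge from the data alone.

```python
def kmeans_1d(values, k=2, max_iterations=100):
    """Plain k-means on one-dimensional data (assumes k >= 2).

    Starting centers are spread evenly between the smallest and largest value,
    which keeps this sketch deterministic.
    """
    low, high = min(values), max(values)
    centers = [low + i * (high - low) / (k - 1) for i in range(k)]
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each value joins its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:  # nothing moved, so the clustering is stable
            break
        centers = new_centers
    return centers, labels


# Hypothetical annual spend per customer, in thousands of pesos.
annual_spend = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, labels = kmeans_1d(annual_spend, k=2)
print(centers, labels)
```

The resulting groups (low spenders and high spenders here) still have to be interpreted by a person, since the algorithm only discovers that the groups exist.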
(Loukas, 2020) retrieved from https://towardsdatascience.com/what-is-machine-learning-a-short-note-on-supervised-unsupervised-semi-supervised-and-aed1573ae9bb
Reinforcement Learning - Map situations to actions that yield the maximum final reward.
• Not only the immediate reward but also the next and all subsequent rewards.
• Errors as rewards or penalties
• If error is big, then the penalty is high and the reward low
• Reward feedback is required for the model to learn which action is best and this is
known as “the reinforcement signal”.
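The idea of weighing subsequent rewards alongside the immediate one is commonly expressed as a discounted sum of rewards. The sketch below assumes a discount factor of 0.9, which is not from the original material. The function names and reward sequences are likewise hypothetical.

```python
def discounted_return(rewards, discount=0.9):
    """Total reward of a sequence, with later rewards weighted by discount ** step."""
    total = 0.0
    for step, reward in enumerate(rewards):
        total += (discount ** step) * reward
    return total


def reward_from_error(error):
    """Treat an error as a penalty: the bigger the error, the lower the reward."""
    return -abs(error)


# Two hypothetical action sequences starting from the same situation.
greedy_path = [1, 0, 0]    # small reward immediately, nothing afterwards
patient_path = [0, 0, 5]   # no immediate reward, larger reward later

print(discounted_return(greedy_path))
print(discounted_return(patient_path))
print(reward_from_error(3.0), reward_from_error(1.0))
```

An agent that looked only at the immediate reward would pick the first path, while one that also counts subsequent rewards prefers the second.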
(Loukas, 2020) retrieved from https://towardsdatascience.com/what-is-machine-learning-a-short-note-on-supervised-unsupervised-semi-supervised-and-aed1573ae9bb
Workflow of Machine Learning Project
(Pant, 2019) retrieved from https://towardsdatascience.com/workflow-of-a-machine-learning-project-ec1dba419b94
Machine Learning Workflow
(Pant, 2019) retrieved from https://towardsdatascience.com/workflow-of-a-machine-learning-project-ec1dba419b94
• Gathering data
• Data pre-processing
• Researching the model that
will be best for the type of data
• Training and testing the model
• Evaluation
(Maheshwari, 2018) retrieved from https://medium.datadriveninvestor.com/machine-learning-project-workflow-8137a401ed81
• Gathering the data.
• Preparation of Data.
• EDA (Exploratory Data Analysis).
• Feature Engineering Selection.
• Choosing the best model.
• Training our model.
• Evaluating the model.
• Performing hyperparameter tuning on the model.
• Interpreting the model results.
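The core steps shared by both workflows (gather, prepare, train, evaluate) might be sketched as below. This is a toy pipeline under stated assumptions: the data is synthetic, the model is a simple nearest-centroid rule, and every function name is hypothetical. Exploratory analysis, feature engineering, and hyperparameter tuning are omitted for brevity.

```python
import random


def gather_data(samples_per_class=100, seed=0):
    """Gather data (here: synthetic, clearly separated measurements)."""
    rng = random.Random(seed)
    data = [(rng.uniform(0.0, 1.0), 0) for _ in range(samples_per_class)]
    data += [(rng.uniform(2.0, 3.0), 1) for _ in range(samples_per_class)]
    rng.shuffle(data)
    return data


def split(data, test_fraction=0.2):
    """Hold out part of the data so evaluation uses examples unseen in training."""
    cut = int(len(data) * (1 - test_fraction))
    return data[:cut], data[cut:]


def fit_scaler(train):
    """Pre-processing: min-max scaling learned from the training data only."""
    values = [x for x, _ in train]
    low, high = min(values), max(values)
    return lambda x: (x - low) / (high - low)


def train_model(train, scale):
    """Training: compute one centroid (mean scaled value) per class."""
    sums, counts = {}, {}
    for x, label in train:
        sums[label] = sums.get(label, 0.0) + scale(x)
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, scaled_x):
    """Predict the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(scaled_x - centroids[label]))


def evaluate(centroids, scale, test):
    """Evaluation: accuracy on the held-out test set."""
    correct = sum(predict(centroids, scale(x)) == label for x, label in test)
    return correct / len(test)


data = gather_data()
train_set, test_set = split(data)
scale = fit_scaler(train_set)
model = train_model(train_set, scale)
test_accuracy = evaluate(model, scale, test_set)
print("Test accuracy:", test_accuracy)
```

The scaler is fitted on the training split only, so nothing about the test set leaks into training. The perfect score here reflects how easy the synthetic data is, not what to expect on real problems.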
Knowledge Discovery in Databases (KDD)
Cross-Industry Standard Process for Data Mining (CRISP-DM)
Analytics Solution Unified Method for Data Mining/Predictive Analytics (ASUM-DM)
Python’s Data Analysis Packages
• NumPy - Core library for scientific computing. Its built-in mathematical
functions enable fast, vectorized computation and support
multidimensional arrays and large matrices. It is also used for linear algebra.
NumPy arrays are often preferred over lists because they use less memory
and are more convenient and efficient for numerical work.
• Scikit-Learn - One of the most widely used machine learning libraries in Python.
Built on NumPy, SciPy, and Matplotlib.
• Matplotlib - An extensive library for creating static, interactive, and
animated visualizations in Python.
• Pandas - Primarily used for data analysis, data manipulation, and data
cleaning.
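A brief sketch of the NumPy points above, assuming NumPy is installed. The variable names are illustrative, and the list memory figure is a rough CPython-specific estimate rather than an exact measurement.

```python
import sys

import numpy as np

# Vectorized arithmetic: one expression operates on every element, no Python loop.
values = np.arange(1_000, dtype=np.int64)
squares = values * values

# Memory: the array stores raw 8-byte integers in one contiguous block.
array_bytes = values.nbytes
# Rough CPython estimate for a list: the container plus each int object.
py_list = list(range(1_000))
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# Linear algebra: matrix-vector product and solving a linear system.
matrix = np.array([[2.0, 0.0], [0.0, 3.0]])
vector = np.array([1.0, 1.0])
product = matrix @ vector
solution = np.linalg.solve(matrix, product)  # recovers the original vector

print(array_bytes, list_bytes)
print(product, solution)
```

Pandas and scikit-learn build on these arrays, which is why NumPy usually comes first when learning the Python data stack.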
Commonly used Algorithms
CLASSIFICATION
• K-Nearest Neighbor
• Naive Bayes
• Decision Trees/Random Forest
• Support Vector Machine
• Logistic Regression
REGRESSION
• Linear Regression
• Support Vector Regression
• Decision Trees/Random Forest
• Gaussian Process Regression
• Ensemble Methods
(Pant, 2019) retrieved from https://towardsdatascience.com/workflow-of-a-machine-learning-project-ec1dba419b94



Editor's Notes

  • #2 Machine learning is a part of AI that gives machines the ability to learn automatically from experience without being explicitly programmed. It is primarily concerned with the design and development of algorithms that allow a system to learn from historical data. Machine learning is based on the idea that machines can learn from past data, identify patterns, and make decisions using algorithms. These algorithms are designed so that they can learn and improve their performance automatically. Machine learning also helps in discovering patterns in data.