Programming-Introduction-to-Machine-Learning.pptx

Introduction to Machine Learning
PREPARED BY: LUCY FELIX-SADIWA

Image retrieved from https://www.javatpoint.com/subsets-of-ai

What is Data Science?
• Data science combines multiple fields, including statistics, scientific
methods, artificial intelligence (AI), and data analysis, to extract
value from data. Those who practice data science are called data
scientists, and they combine a range of skills to analyze data
collected from the web, smartphones, customers, sensors, and
other sources to derive actionable insights. (retrieved from
https://www.oracle.com/ph/data-science/what-is-data-science/)

[Diagram: relationship among Artificial Intelligence, Machine Learning, Neural Networks, Deep Learning, Big Data, Data Science, and Data Mining]

Analytics
• Analytics is the process of discovering, interpreting, and communicating significant patterns
in data. Quite simply, analytics helps us see insights and meaningful data that we might not
otherwise detect. Business analytics focuses on using insights derived from data to make
more informed decisions that will help organizations increase sales, reduce costs, and make
other business improvements.
https://www.oracle.com/ph/business-analytics/what-is-analytics/
• Four types:
• Descriptive Analytics - what happened in the past
• Diagnostic Analytics - why something happened in the past
• Predictive Analytics - what is most likely to happen in the future
• Prescriptive Analytics - which actions you can take to affect those
likely outcomes

Descriptive Analytics
• Any activity or method that helps us to describe or summarize raw data into
something interpretable by humans
• Allows us to learn from past behaviors and understand how they might influence future
outcomes
• Examples:
• A company’s business intelligence reports that cover different aspects of the organization to provide historical
insights regarding the company’s production, operations, sales, revenue, financials, inventory, customers,
and market share
• The sales team can learn which customer segments generated the highest peso amount in sales last year.
• The marketing team can uncover which social media platforms delivered the best return on advertising
investment last quarter.
• The finance team can track month-over-month and year-over-year revenue growth or decline.
• Operations can track demand for SKUs (Stock Keeping Units) across geographic locations throughout the
past year.
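
The descriptive questions above (top customer segment, month-over-month growth) can be sketched in a few lines of plain Python. This is only an illustration: the sales records and the helper names `total_by` and `month_over_month_growth` are hypothetical, not taken from any real report.

```python
from collections import defaultdict

# Hypothetical sales records: (month, customer_segment, amount_in_pesos)
sales = [
    ("2023-01", "retail", 120_000),
    ("2023-01", "wholesale", 300_000),
    ("2023-02", "retail", 150_000),
    ("2023-02", "wholesale", 270_000),
    ("2023-03", "retail", 180_000),
    ("2023-03", "wholesale", 330_000),
]


def total_by(records, key_index):
    """Sum the amount column, grouped by the column at key_index."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key_index]] += record[2]
    return dict(totals)


def month_over_month_growth(monthly_totals):
    """Return {month: percent change from the previous month}."""
    months = sorted(monthly_totals)
    growth = {}
    for prev, curr in zip(months, months[1:]):
        change = monthly_totals[curr] - monthly_totals[prev]
        growth[curr] = 100 * change / monthly_totals[prev]
    return growth


by_segment = total_by(sales, 1)   # which segment sold the most?
by_month = total_by(sales, 0)     # revenue per month
top_segment = max(by_segment, key=by_segment.get)
growth = month_over_month_growth(by_month)

print(top_segment, by_segment[top_segment])
print(growth)
```

Nothing here predicts anything; the code only summarizes what already happened, which is the defining trait of descriptive analytics.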

Diagnostic Analytics
• Examines data or information to answer the question “Why did it happen?”
• Techniques: drill-down, data discovery, data mining, correlation and causation analysis
• Provides a very good understanding of a limited piece of the problem you want to
solve
• Labor intensive – human intervention is required to perform drill-down or data mining
to go deeper into the data to understand why something happened or its root cause.
It focuses on determining the factors and events that contributed to the outcome.
• Examples:
• When sales of a product line decline in some stores, the product manager may look back at
past trends and patterns in the product line’s sales across different stores based on its
placement (floor, corner, aisle) within the store. The manager may also look at external factors such as
demographics, season, and others.
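
As a rough sketch of the store example above, the code below drills down by shelf placement and checks a correlation with one external factor. The store data, the field names, and the helpers `drill_down` and `pearson` are assumptions made up for illustration. Note that a correlation only hints at a cause; it does not prove one.

```python
from statistics import mean

# Hypothetical weekly unit sales per store, with shelf placement and foot traffic.
stores = [
    {"store": "A", "placement": "aisle", "foot_traffic": 900, "units": 95},
    {"store": "B", "placement": "aisle", "foot_traffic": 850, "units": 90},
    {"store": "C", "placement": "corner", "foot_traffic": 880, "units": 40},
    {"store": "D", "placement": "corner", "foot_traffic": 910, "units": 45},
    {"store": "E", "placement": "floor", "foot_traffic": 870, "units": 70},
]


def drill_down(rows, dimension, measure):
    """Average a measure within each value of a dimension."""
    groups = {}
    for row in rows:
        groups.setdefault(row[dimension], []).append(row[measure])
    return {key: mean(values) for key, values in groups.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


avg_units_by_placement = drill_down(stores, "placement", "units")
traffic_vs_units = pearson(
    [s["foot_traffic"] for s in stores],
    [s["units"] for s in stores],
)

print(avg_units_by_placement)
print(round(traffic_vs_units, 2))
```

In this made-up data the drill-down shows corner placement selling far fewer units, while foot traffic is only weakly related to sales, so placement would be the first factor to investigate.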

Predictive Analytics
• Ability to make predictions or estimations of likelihood about unknown future
events based on the past or historic patterns.
• Give insights into “What might happen?”
• Uses techniques from data mining, statistics, modeling, machine learning, and
AI to analyze current data to make predictions about the future.
• The foundation of predictive analytics is probability, and the quality
of predictions by statistical algorithms depends heavily on the quality of the input
data. Because predictions are probabilistic, 100% accuracy cannot be guaranteed.
• Examples: Weather Forecasting, e-mail spam identification, fraud detection,
probability of customer purchasing a product or renewal of insurance policy,
predicting the chances of a person with known illness, etc.
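
A minimal sketch of the insurance renewal example, assuming a hypothetical history of past customers: the likelihood of renewal is estimated from past frequencies. The function name `renewal_probability`, the age bands, and the smoothing constant are illustrative choices, not a prescribed method.

```python
# Hypothetical history: (customer_age_band, renewed_policy)
history = [
    ("18-30", True), ("18-30", False), ("18-30", False), ("18-30", False),
    ("31-50", True), ("31-50", True), ("31-50", True), ("31-50", False),
    ("51+", True), ("51+", True),
]


def renewal_probability(records, age_band, smoothing=1):
    """Estimate P(renew | age_band) from past counts with Laplace smoothing.

    Smoothing keeps the estimate away from exactly 0 or 1, so a small
    sample never produces a claim of certainty.
    """
    renewed = sum(1 for band, did_renew in records if band == age_band and did_renew)
    total = sum(1 for band, _ in records if band == age_band)
    return (renewed + smoothing) / (total + 2 * smoothing)


for band in ("18-30", "31-50", "51+"):
    print(band, round(renewal_probability(history, band), 2))
```

Even the "51+" band, where every past customer renewed, gets an estimate below 1. The output is a likelihood, not a guarantee.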

Prescriptive Analytics
• Area of data or business analytics dedicated to finding the best course of action for a
given situation.
• Aims to measure the effect of future decisions so that decision makers can
foresee the possible outcomes before the actual decisions are made.
• Combines business rules, machine learning algorithms, and tools that can be
applied to historical and real-time data feeds.
• Key objective: not just to predict what will happen but also why it will happen, by
predicting multiple futures based on different scenarios, to allow companies to assess
possible outcomes based on their actions.
• Examples: simulations in design situations to help users identify system behaviors
under different configurations and ensure that all key performance metrics, such as
wait times and queue length, are met.
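
The simulation example above can be sketched as a small queue model: try several configurations (number of service counters), measure the average wait time under each, and recommend the smallest configuration that meets a target. The arrival pattern, the service time, the target, and the function names are all assumptions chosen for illustration.

```python
import heapq


def average_wait(num_counters, arrivals, service_minutes):
    """Simulate a first-come, first-served queue and return the mean wait in minutes."""
    free_at = [0.0] * num_counters  # time at which each counter becomes free
    heapq.heapify(free_at)
    total_wait = 0.0
    for arrival in arrivals:
        earliest = heapq.heappop(free_at)   # counter that frees up first
        start = max(arrival, earliest)      # wait only if no counter is free yet
        total_wait += start - arrival
        heapq.heappush(free_at, start + service_minutes)
    return total_wait / len(arrivals)


def recommend_counters(max_counters, arrivals, service_minutes, target_wait):
    """Return the smallest number of counters that meets the wait-time target."""
    for n in range(1, max_counters + 1):
        if average_wait(n, arrivals, service_minutes) <= target_wait:
            return n
    return None  # no tested configuration meets the target


# Hypothetical scenario: one customer every 2 minutes, 5 minutes of service each.
arrivals = [2.0 * i for i in range(30)]
waits = {n: average_wait(n, arrivals, 5.0) for n in range(1, 5)}
recommended = recommend_counters(4, arrivals, 5.0, target_wait=1.0)

print(waits)
print("recommended counters:", recommended)
```

The simulation compares several possible futures before any decision is made, and the output is a recommended action rather than only a forecast.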

What is Machine Learning?
• Arthur Samuel (1959)
• Machine Learning is a field of study that gives computers the ability to learn without being explicitly
programmed.
• Tom Mitchell (1997)
• A computer program is said to learn from experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
• Machine learning is a field of computer science that involves using statistical methods to
create programs that either improve performance over time, or detect patterns in massive
amounts of data that humans would be unlikely to find.
• Machine learning explores the study and construction of algorithms that can learn from and
make predictions on data. Such algorithms operate by building a model from example inputs
in order to make data driven predictions or decisions rather than following strictly static
program instructions
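
To make Mitchell's E, T, and P concrete, here is a small sketch using a hypothetical product review task (not from the slides): the program labels reviews as positive or negative (T), is scored by the fraction it gets right (P), and is given more labeled reviews over time (E). The word-counting model and all data are illustrative assumptions.

```python
from collections import Counter


def train(examples):
    """Experience E: count how often each word appears in positive vs. negative reviews."""
    positive_words, negative_words = Counter(), Counter()
    for text, is_positive in examples:
        (positive_words if is_positive else negative_words).update(text.split())
    return positive_words, negative_words


def classify(model, text):
    """Task T: label a review as positive (True) or negative (False)."""
    positive_words, negative_words = model
    score = sum(positive_words[w] - negative_words[w] for w in text.split())
    return score > 0


def accuracy(model, labeled_reviews):
    """Performance P: fraction of reviews classified correctly."""
    correct = sum(classify(model, text) == label for text, label in labeled_reviews)
    return correct / len(labeled_reviews)


# Hypothetical labeled reviews the program learns from, in the order received.
experience = [
    ("great phone love it", True),
    ("terrible battery broke fast", False),
    ("excellent camera works well", True),
    ("awful screen stopped working", False),
]

# Hypothetical held-out reviews used only to measure performance.
test_reviews = [
    ("love this great camera", True),
    ("excellent camera", True),
    ("battery broke", False),
    ("awful terrible screen", False),
]

# Performance after seeing 0, 2, and 4 labeled reviews.
learning_curve = [accuracy(train(experience[:n]), test_reviews) for n in (0, 2, 4)]
print(learning_curve)
```

The program's rules are never written by hand. Its performance P on task T rises only because it has seen more experience E, which is what the definition requires.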

A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance
at tasks in T, as measured by P, improves with experience E.
• Suppose your email program watches which emails you do or do not
mark as spam, and based on that learns how to better filter spam.
What is the task T in this setting?
A. Classifying emails as spam or not spam
B. Watching you label emails as spam or not spam
C. The number (or fraction) of emails correctly classified as spam/not spam.
D. None of the above – This is not a machine learning problem.
• How the driverless car sees the world (https://www.youtube.com/watch?v=tiwVMrTLUWg&t=754s)
• Video recommendations on YouTube
• Posts, ads, and videos on social media such as Facebook

Image retrieved from https://vitalflux.com/great-mind-maps-for-learning-machine-learning/
Categories of Machine Learning

Supervised Learning
• The machine learning algorithm is provided with a sufficiently large example
input dataset and the corresponding outputs or events/classes, usually prepared
in consultation with a subject matter expert of the respective domain.
• Goal: Learn patterns in the data and build general rules to map input
to the output, class, or event.
• Two types:
• Regression - The output to be predicted is a continuous number that depends
on the given input.
• Classification - The output to be predicted is the actual class or the probability of an
event/class, and there are two or more classes to predict.
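
The two supervised learning types above can be contrasted in a short sketch. Both learn from labeled examples; regression returns a continuous number and classification returns a class. The data (floor areas, prices, tumor sizes) is hypothetical, and least squares and 1-nearest-neighbor are just two simple choices among many possible algorithms.

```python
from statistics import mean


def fit_line(xs, ys):
    """Regression: ordinary least squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (
        sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        / sum((x - mx) ** 2 for x in xs)
    )
    return slope, my - slope * mx


def nearest_neighbor_label(labeled_examples, query):
    """Classification: return the label of the closest labeled example (1-NN)."""
    _, label = min(labeled_examples, key=lambda example: abs(example[0] - query))
    return label


# Hypothetical labeled data: floor area (sq m) -> price (thousand pesos).
areas = [30, 50, 70, 90]
prices = [1500, 2500, 3500, 4500]
slope, intercept = fit_line(areas, prices)
predicted_price = slope * 60 + intercept  # continuous output

# Hypothetical labeled data: tumor size (cm) -> class.
labeled_tumors = [(1.0, "benign"), (1.5, "benign"), (4.0, "malignant"), (4.5, "malignant")]
predicted_class = nearest_neighbor_label(labeled_tumors, 3.8)  # discrete output

print(predicted_price, predicted_class)
```

In both cases the labels in the training data play the role of the subject matter expert's answers.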

(Loukas, 2020) retrieved from https://towardsdatascience.com/what-is-machine-learning-a-short-note-on-supervised-unsupervised-semi-supervised-and-aed1573ae9bb

Example by Andrew Ng, from the Machine Learning course

Unsupervised Learning
• Studies the patterns in the input dataset to gain a better understanding and identify similar
patterns that can be grouped into specific classes or events. It does not require any
intervention from subject matter experts beforehand.
• Examples of Unsupervised Learning
• Clustering - The goal is to divide the input dataset into logical groups of related items. Examples:
grouping news articles, grouping customers based on their profiles, etc.
• Dimensionality Reduction - The goal is to simplify a large input dataset by mapping it to a
lower-dimensional space. Example: when analyzing a high-dimensional dataset, you may want to
find the key variables that hold a significant percentage (say 95%) of the information, and use only those
for analysis.
• Anomaly Detection - also known as outlier detection, this is the identification of items or observations that do
not conform to an expected pattern or behavior in comparison with other items in a given dataset.
Examples: machine or system health monitoring, event detection, fraud/intrusion detection. It is a
major area in the Internet of Things (IoT), where it enables detection of abnormal behavior in a given context.
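
As a minimal sketch of clustering, the code below groups customers by a single hypothetical feature (monthly spend) using a basic one-dimensional k-means. No labels are provided; the groups come from the data alone. The function name `k_means_1d`, the deterministic starting centroids, and the data are assumptions for illustration, and the function expects k to be at least 2.

```python
from statistics import mean


def k_means_1d(values, k, iterations=20):
    """Basic k-means on one-dimensional data (k >= 2)."""
    ordered = sorted(values)
    # Deterministic start: pick k evenly spaced values from the sorted data.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster with the nearest centroid.
        clusters = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            mean(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical monthly spend per customer (thousand pesos), with no labels.
monthly_spend = [12, 15, 14, 90, 95, 88, 13, 92]
centroids, clusters = k_means_1d(monthly_spend, k=2)

print(centroids)
print(clusters)
```

The algorithm is never told which customers are "low spenders" or "high spenders"; it discovers the two groups on its own.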

Reinforcement Learning - Map situations to actions that yield the maximum final reward.
• Not only the immediate reward but also the next and all subsequent rewards.
• Errors as rewards or penalties
• If the error is big, then the penalty is high and the reward low
• Reward feedback is required for the model to learn which action is best, and this is known as
“the reinforcement signal”.
(Loukas, 2020) retrieved from https://towardsdatascience.com/what-is-machine-learning-a-short-note-on-supervised-unsupervised-semi-supervised-and-aed1573ae9bb
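
One common way to learn from a reward signal is tabular Q-learning, sketched below on a hypothetical five-cell corridor where only reaching the last cell pays a reward. The environment, the parameter values, and the function name `q_learning` are assumptions for illustration; the slides do not name a specific algorithm.

```python
import random


def q_learning(num_states=5, episodes=300, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a corridor; reward 1 only on reaching the last state."""
    rng = random.Random(seed)
    moves = (-1, +1)  # action 0 = step left, action 1 = step right
    q = [[0.0, 0.0] for _ in range(num_states)]
    goal = num_states - 1
    for _ in range(episodes):
        state = rng.randrange(goal)  # start anywhere except the goal
        for _ in range(50):  # cap on episode length
            action = rng.randrange(2)  # explore at random; Q-learning is off-policy
            next_state = min(max(state + moves[action], 0), goal)
            reward = 1.0 if next_state == goal else 0.0  # the reinforcement signal
            # The target counts the immediate reward plus the discounted value
            # of the best action afterwards, i.e. all subsequent rewards.
            best_next = 0.0 if next_state == goal else max(q[next_state])
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
            if state == goal:
                break
    return q


q_table = q_learning()
# Greedy policy for every non-goal state.
policy = ["left" if left > right else "right" for left, right in q_table[:-1]]
print(policy)
```

Cells far from the goal never receive an immediate reward, yet they still learn to move right because the discounted value of later rewards flows back through the table.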

Workflow of Machine Learning Project
(Pant, 2019) retrieved from https://towardsdatascience.com/workflow-of-a-machine-learning-project-ec1dba419b94

Machine Learning Workflow
(Pant, 2019) retrieved from https://towardsdatascience.com/workflow-of-a-machine-learning-project-ec1dba419b94
• Gathering data
• Data pre-processing
• Researching the model that will be best for the type of data
• Training and testing the model
• Evaluation
(Maheshwari, 2018) retrieved from https://medium.datadriveninvestor.com/machine-learning-project-workflow-8137a401ed81
• Gathering the data.
• Preparation of Data.
• EDA (Exploratory Data Analysis).
• Feature Engineering and Selection.
• Choosing the best model.
• Training our model.
• Evaluating the model.
• Performing Hyper Parameter Tuning on the model.
• Interpreting the model results.
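
The main workflow steps listed above (gathering, preparation, training, evaluation) can be walked through end to end in a toy sketch. Everything here is an assumption for illustration: the study-hours data, the simple threshold model, and the fixed hold-out split. Real projects would normally shuffle the data and use a library such as scikit-learn.

```python
# 1. Gathering the data: hypothetical (hours_studied, passed_exam) records.
raw = [
    (1.0, False), (8.0, True), (None, True), (2.5, False),
    (6.0, True), (3.0, False), (7.5, True), (4.5, None),
    (4.0, False), (9.0, True), (5.5, True), (2.0, False),
]


def preprocess(rows):
    """2. Preparation: drop records with a missing feature or label."""
    return [(hours, passed) for hours, passed in rows
            if hours is not None and passed is not None]


def accuracy(threshold, rows):
    """Fraction of rows where 'hours >= threshold' matches the true label."""
    return sum((hours >= threshold) == passed for hours, passed in rows) / len(rows)


def train(rows):
    """3. Training: pick the pass/fail threshold with the best training accuracy."""
    hours = sorted(h for h, _ in rows)
    candidates = [(a + b) / 2 for a, b in zip(hours, hours[1:])]
    return max(candidates, key=lambda t: accuracy(t, rows))


clean = preprocess(raw)
# Hold out the last 3 records for testing (a fixed split, for reproducibility).
train_rows, test_rows = clean[:7], clean[7:]

threshold = train(train_rows)
# 4. Evaluation: measure accuracy on data the model has not seen.
test_accuracy = accuracy(threshold, test_rows)

print(threshold, test_accuracy)
```

Keeping the test rows out of training is the key design choice: the evaluation then estimates how the model behaves on new data rather than on data it has memorized.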

Cross-Industry Standard Process for Data Mining (CRISP-DM)
Analytics Solutions Unified Method for Data Mining/Predictive Analytics (ASUM-DM)

Python’s Data Analysis Packages
• NumPy - Core library for scientific computing. Its built-in mathematical
functions enable fast computation and support
multidimensional data and large matrices. It is also used for linear algebra.
NumPy arrays are often preferred over lists because they use less memory
and are more convenient and efficient for numerical work.
• scikit-learn - one of the most used machine learning libraries in Python.
Built on NumPy, SciPy, and Matplotlib.
• Matplotlib - an extensive library for creating static, interactive, and
animated visualizations in Python.
• Pandas - primarily used for data analysis, data manipulation, and data
cleaning.

Commonly used Algorithms
CLASSIFICATION
• K-Nearest Neighbor
• Naive Bayes
• Decision Trees/Random Forest
• Support Vector Machine
• Logistic Regression
REGRESSION
• Linear Regression
• Support Vector Regression
• Decision Trees/Random Forest
• Gaussian Process Regression
• Ensemble Methods
(Pant, 2019) retrieved from https://towardsdatascience.com/workflow-of-a-machine-learning-project-ec1dba419b94
