This presentation provides an overview of machine learning, including definitions, the types of machine learning (supervised, unsupervised, and reinforcement learning), and evaluation metrics for machine learning models. It discusses classification metrics such as accuracy, precision, recall, F1 score, and confusion matrices. For regression problems, it covers metrics such as mean absolute error, mean squared error, and the R² score. It also provides examples of calculating many of these common metrics in Python.
2. What is Machine Learning?
www.SunilOS.com 2
❑ “Learning is any process by which a system improves performance from experience.” (Herbert Simon)
3. Concepts in Machine Learning
❑ Definition by Tom Mitchell (1998): Machine Learning is the study of algorithms that
o improve their performance P
o at some task T
o with experience E.
❑ A well-defined learning task is given by <P, T, E>.
9. Why Now?
❑ Flood of available data (especially with the use of the Internet).
❑ Increasing computational power.
❑ Growing progress in available algorithms and theory developed by researchers.
❑ Increasing support from industry.
11. Types of Learning
❑ Supervised (inductive) learning:
o Given: training data + desired outputs (labels).
❑ Unsupervised learning:
o Given: training data (without desired outputs).
❑ Reinforcement learning:
o Given: rewards from a sequence of actions.
18. Framing a Learning Problem
❑ Choose the data.
❑ Choose exactly what is to be learned,
o i.e. the target function.
❑ Choose the model.
❑ Train the model.
❑ Evaluate the model.
21. Performance Measures for Models
❑ The metrics you choose to evaluate your machine learning algorithms are very important.
❑ The choice of metric influences how the performance of machine learning algorithms is measured and compared.
o Classification metrics
o Regression metrics
o Multilabel ranking metrics
o Clustering metrics
22. Classification Metrics
❑ Classification accuracy is the number of correct predictions made as a ratio of all predictions made:
o Correct Predictions / Total Predictions
❑ This is the most common evaluation metric for classification problems.
❑ It is really only suitable when
o there are an equal number of observations in each class (which is rarely the case), and
o all predictions and prediction errors are equally important (which is often not the case).
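The accuracy calculation above can be sketched in plain Python; the label lists below are hypothetical, chosen only for illustration:

```python
# Accuracy = correct predictions / total predictions.
# Hypothetical binary labels, for illustration only.
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)
print(accuracy)  # 6 correct out of 8 -> 0.75
```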
24. Cohen’s Kappa
❑ The kappa score measures agreement between predicted and true labels while correcting for chance agreement. It is a number between -1 and 1. Scores above 0.8 are generally considered good; zero or lower means no better than chance (practically random labels).
❑ Kappa scores can be computed for binary or multiclass problems, but not for multilabel problems.
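The chance correction can be sketched directly from the definition κ = (p₀ − pₑ) / (1 − pₑ); the labels below are made up for illustration and are not from the slides:

```python
# Cohen's kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
# agreement and p_e is the agreement expected from label frequencies alone.
# Hypothetical labels, for illustration only.
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]
n = len(y_true)

p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
labels = set(y_true) | set(y_pred)
p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)

kappa = (p_o - p_e) / (1 - p_e)
print(round(kappa, 4))  # 0.4667: some agreement beyond chance
```

Note that raw accuracy here is 0.75, but kappa is much lower because the imbalanced label frequencies make a fair amount of agreement expected by chance.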
26. Precision and Recall
❑ Precision and recall are both important for evaluating a model. They are defined here for binary class labels.
❑ Precision is also called the positive predictive value. It is the ratio tp / (tp + fp), where
o tp is the number of true positives and
o fp is the number of false positives.
❑ Precision is the ability of the classifier not to label as positive a sample that is negative.
❑ Recall is the ratio tp / (tp + fn), where
o tp is the number of true positives and
o fn is the number of false negatives.
o Recall is the ability of the classifier to find all the positive samples.
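Both ratios can be computed by hand from the tp/fp/fn counts; the labels below are hypothetical, for illustration only:

```python
# Precision = tp / (tp + fp); Recall = tp / (tp + fn).
# Hypothetical binary labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 1]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives

precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(precision, recall)  # 0.6 0.75
```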
29. Confusion Matrix
❑ The confusion matrix is a presentation of the accuracy of a model with two or more classes.
❑ The table presents predictions on one axis and actual outcomes on the other.
❑ The cells of the table contain the number of predictions made by a machine learning algorithm.
❑ For example, a machine learning algorithm can predict 0 or 1, and each prediction may actually have been a 0 or a 1.
o Predictions of 0 that were actually 0 appear in the cell for prediction = 0 and actual = 0,
o whereas predictions of 0 that were actually 1 appear in the cell for prediction = 0 and actual = 1, and so on.
30. Confusion Matrix (cont.)
❑ The normalize parameter allows reporting ratios instead of counts. The confusion matrix can be normalized in 3 different ways: 'pred', 'true', and 'all', which divide the counts by the sum of each column, each row, or the entire matrix, respectively.
❑ from sklearn.metrics import confusion_matrix
❑ y_true = [2, 0, 2, 2, 0, 1]
❑ y_pred = [0, 0, 2, 2, 0, 2]
❑ print(confusion_matrix(y_true, y_pred))
❑ # Output (rows = actual class 0..2, columns = predicted class 0..2)
❑ [[2 0 0]
❑  [0 0 1]
❑  [1 0 2]]
32. Get Counts of Values
❑ For binary problems, we can get the counts of true negatives, false positives, false negatives and true positives as follows:
o tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
33. Classification Report
❑ The classification_report function builds a text report showing the main classification metrics. Here is a small example with custom target_names and inferred labels:
o from sklearn.metrics import classification_report
o y_true = [0, 1, 2, 2, 0]
o y_pred = [0, 0, 2, 1, 0]
o target_names = ['class 0', 'class 1', 'class 2']
o print(classification_report(y_true, y_pred, target_names=target_names))
35. Hamming Loss
❑ The hamming_loss function computes the average Hamming loss, or Hamming distance, between two sets of samples.
❑ It can be used for multiclass and multilabel classification.
❑ The Hamming loss is the fraction of labels that are incorrectly predicted.
❑ For binary label vectors this amounts to an XOR between the actual and predicted labels, averaged across the dataset.
o from sklearn.metrics import hamming_loss
o y_pred = [1, 2, 3, 4, 6, 7, 8]
o y_true = [2, 2, 3, 4, 5, 6, 7]
o print(hamming_loss(y_true, y_pred))  # 4 of 7 labels differ -> 0.571...
36. F-measures
❑ The F-measures (the Fβ and F1 measures) can be interpreted as a weighted harmonic mean of precision and recall. An Fβ measure reaches its best value at 1 and its worst score at 0.
o from sklearn import metrics
o y_pred = [0, 1, 0, 0]
o y_true = [0, 1, 0, 1]
o metrics.f1_score(y_true, y_pred)
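The weighted harmonic mean behind Fβ can be written out directly. The helper below is an illustrative sketch, not part of scikit-learn:

```python
def f_beta(precision, recall, beta=1.0):
    # Weighted harmonic mean of precision and recall;
    # beta > 1 weights recall more heavily, beta < 1 weights precision more.
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# For the slide's example (y_true = [0, 1, 0, 1], y_pred = [0, 1, 0, 0]):
# tp = 1, fp = 0, fn = 1, so precision = 1.0 and recall = 0.5.
print(round(f_beta(1.0, 0.5), 4))           # F1 = 0.6667
print(round(f_beta(1.0, 0.5, beta=2), 4))   # F2 = 0.5556, leaning toward recall
```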
37. Receiver Operating Characteristic
❑ The function roc_curve computes the receiver operating characteristic curve, or ROC curve.
❑ The ROC curve is a graphical plot that illustrates the performance of a binary classifier system as its discrimination threshold is varied.
❑ It is created by plotting the fraction of true positives out of the positives (TPR = true positive rate) vs. the fraction of false positives out of the negatives (FPR = false positive rate) at various threshold settings.
❑ TPR is also known as sensitivity, and FPR is one minus the specificity (the true negative rate).
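A single point on the ROC curve can be computed by hand at one threshold. This is a pure-Python sketch of the definitions with hypothetical scores, not the roc_curve API:

```python
# TPR = tp / all positives; FPR = fp / all negatives, at a given threshold.
# Hypothetical true labels and classifier scores, for illustration only.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

def tpr_fpr(threshold):
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and t == 1 for p, t in zip(preds, y_true))
    fp = sum(p == 1 and t == 0 for p, t in zip(preds, y_true))
    positives = sum(y_true)
    negatives = len(y_true) - positives
    return tp / positives, fp / negatives

print(tpr_fpr(0.35))  # (1.0, 0.5): both positives found, one false alarm
print(tpr_fpr(0.5))   # (0.5, 0.0): stricter threshold, no false alarms
```

Sweeping the threshold from high to low and plotting each (FPR, TPR) pair traces out the ROC curve.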
38. Area Under the ROC Curve
❑ AUC ranges in value from 0 to 1. A model whose predictions are 100% wrong has an AUC of 0.0; one whose predictions are 100% correct has an AUC of 1.0.
❑ import numpy as np
❑ from sklearn.metrics import roc_auc_score
❑ y_true = np.array([0, 0, 1, 1])
❑ y_scores = np.array([0.1, 0.4, 0.35, 0.8])
❑ roc_auc_score(y_true, y_scores)  # 0.75
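AUC also equals the probability that a randomly chosen positive sample is scored above a randomly chosen negative one. Checking that against the slide's data in plain Python (a sketch of the definition, not the scikit-learn implementation):

```python
# AUC = fraction of (positive, negative) pairs ranked correctly; ties count 0.5.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

pos = [s for s, t in zip(scores, y_true) if t == 1]
neg = [s for s, t in zip(scores, y_true) if t == 0]

auc = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg) / (len(pos) * len(neg))
print(auc)  # 0.75, matching roc_auc_score on the same data
```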
40. Regression Metrics
❑ We will review 3 of the most common metrics for evaluating predictions on regression machine learning problems:
❑ Mean Absolute Error
o The Mean Absolute Error (or MAE) is the average of the absolute differences between predictions and actual values. It gives an idea of how wrong the predictions were.
❑ Mean Squared Error
o The Mean Squared Error (or MSE) is like the MAE, but it squares the differences before averaging, which penalizes large errors more heavily.
o Taking the square root of the mean squared error converts the units back to the original units of the output variable and can be meaningful for description and presentation. This is called the Root Mean Squared Error (or RMSE).
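All three can be computed from their definitions in a few lines; the target and prediction values below are hypothetical, for illustration only:

```python
import math

# MAE, MSE, and RMSE computed from their definitions.
# Hypothetical regression targets and predictions, for illustration only.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
n = len(y_true)

mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
rmse = math.sqrt(mse)  # back in the units of the target variable

print(mae, mse, round(rmse, 4))  # 0.5 0.375 0.6124
```

Note that the 1.0-unit error on the last sample contributes twice as much as a 0.5-unit error to MAE, but four times as much to MSE.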
41. Regression Metrics (cont.)
❑ R²
o The R² (or R Squared) metric provides an indication of the goodness of fit of a set of predictions to the actual values.
o In statistical literature, this measure is called the coefficient of determination.
o It is typically a value between 0 and 1, for no fit and perfect fit respectively (it can be negative for models that fit worse than a constant predictor).
❑ These functions have a multioutput keyword argument which specifies the way the scores or losses for each individual target should be averaged. Values for this keyword:
o uniform_average
o raw_values
48. Median Absolute Error
❑ The median_absolute_error is particularly interesting because it is robust to outliers.
❑ The loss is calculated by taking the median of all absolute differences between the target and the prediction.
❑ If ŷᵢ is the predicted value of the i-th sample and yᵢ is the corresponding true value, then the median absolute error (MedAE) estimated over n samples is defined as
o MedAE(y, ŷ) = median(|y₁ − ŷ₁|, …, |yₙ − ŷₙ|)
❑ It does not support multioutput.
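The definition, and its robustness to outliers, can be checked in plain Python with made-up values:

```python
import statistics

# MedAE = median of the absolute errors, so a few huge errors barely move it.
# Hypothetical values, for illustration only.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

abs_errors = [abs(t - p) for t, p in zip(y_true, y_pred)]  # [0.5, 0.5, 0.0, 1.0]
medae = statistics.median(abs_errors)
print(medae)  # 0.5

# One wild prediction inflates the mean absolute error but not the median:
y_pred_outlier = [2.5, 0.0, 2.0, 80.0]
outlier_medae = statistics.median(
    abs(t - p) for t, p in zip(y_true, y_pred_outlier)
)
print(outlier_medae)  # still 0.5
```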
50. R² Score, the Coefficient of Determination
❑ The r2_score function computes the coefficient of determination, usually denoted as R².
❑ It represents the proportion of variance (of y) that has been explained by the independent variables in the model.
❑ It provides an indication of goodness of fit, and therefore a measure of how well unseen samples are likely to be predicted by the model, through the proportion of explained variance.
51. ❑ As such variance is dataset dependent, R² may not be meaningfully comparable across different datasets.
❑ The best possible score is 1.0, and it can be negative (because the model can be arbitrarily worse).
❑ A constant model that always predicts the expected value of y, disregarding the input features, would get an R² score of 0.0.
❑ If ŷᵢ is the predicted value of the i-th sample and yᵢ is the corresponding true value for a total of n samples, the estimated R² is defined as:
o R² = 1 − Σᵢ(yᵢ − ŷᵢ)² / Σᵢ(yᵢ − ȳ)², where ȳ = (1/n) Σᵢ yᵢ
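The definition can be checked in plain Python; the values below are made up for illustration, and scikit-learn's r2_score computes the same quantity:

```python
# R^2 = 1 - sum((y_i - yhat_i)^2) / sum((y_i - mean(y))^2)
# Hypothetical values, for illustration only.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mean_y = sum(y_true) / len(y_true)
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares

r2 = 1 - ss_res / ss_tot
print(round(r2, 4))  # 0.9486

# A constant model predicting mean_y everywhere would give ss_res == ss_tot,
# hence R^2 = 0, matching the statement above.
```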
53. Clustering Metrics
❑Adjusted Rand index
❑Mutual Information based scores
❑Homogeneity, completeness and V-measure
❑Fowlkes-Mallows scores
❑Silhouette Coefficient
❑Contingency Matrix
54. Machine Learning In Brief
❑Tens of thousands of machine learning algorithms
❑Every ML algorithm has three components:
o Representation
o Optimization
o Evaluation
55. Tools for Machine Learning
❑ Machine Learning Libraries:
o Scikit-learn: built on NumPy, SciPy and matplotlib. Used for supervised and unsupervised learning.
o Keras: built on TensorFlow.
o TensorFlow: for neural networks and deep learning.
❑ Programming:
o Python
❑ Machine Learning Tools:
o Anaconda: free distribution of Python, packaged for scientific computing.
o Jupyter: web interface for Python programming.
❑ Data Visualization and Manipulation:
o Matplotlib
o NumPy
o Pandas
56. What we will cover in this Course
❑Supervised Learning
❑Unsupervised Learning
❑Reinforcement Learning
❑Performance Measure
❑Applications of Machine Learning
❑Data Preprocessing
❑Data visualization
57. Disclaimer
❑ This is an educational presentation to enhance the skills of computer science students.
❑ This presentation is available for free to computer science students.
❑ Some internet images from different URLs are used in this presentation to simplify technical examples and correlate examples with the real world.
❑ We are grateful to the owners of these URLs and pictures.