This document provides an introduction to machine learning. It explains what machine learning is through an example of credit default prediction with logistic regression. It attributes the current popularity of machine learning to three factors: the availability of large amounts of cheap data, numerous algorithm development companies, and cloud-based machine learning platforms. It also introduces machine learning concepts and terminology, such as feature vectors and supervised versus unsupervised learning.
2. Agenda
• What is machine learning?
• Why machine learning and why now?
• Machine learning terminology
• Overview of machine learning methods
• Machine learning to deep learning
• Summary and Q & A
iksinc@yahoo.com
4. What is Machine Learning?
• Machine learning deals with making computers learn to make predictions/decisions without explicitly programming them. Rather, the computer is shown a large number of examples of the underlying task and learns by optimizing a performance criterion.
5. An Example of Machine Learning: Credit Default Prediction
We have historical data about businesses and their delinquency. The data consists of 100 businesses. Each business is characterized via two attributes: business age in months and number of days delinquent in payment. We also know whether a business defaulted or not. Using machine learning, we can build a model to predict the probability that a given business will default.
[Figure: plot of the historical business data]
6. Logistic Regression
• The model that is used here is called the logistic regression model. Let's look at the following expression, where $x_1, x_2, \dots, x_k$ are the attributes:
$$p = \frac{e^{a_0 + a_1 x_1 + \cdots + a_k x_k}}{1 + e^{a_0 + a_1 x_1 + \cdots + a_k x_k}}$$
• In our example, the attributes are business age and number of days of delinquency.
• The quantity p will always lie in the range 0-1 and thus can be interpreted as the probability of the outcome being default rather than no default.
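The expression for p can be written as a small Python function. This is only a sketch: the function name and the coefficient values in the usage lines are hypothetical and chosen for illustration, not taken from the slides.

```python
import math

def logistic_probability(intercept, coefficients, attributes):
    """Return p = e^z / (1 + e^z), where z = a0 + a1*x1 + ... + ak*xk."""
    z = intercept + sum(a * x for a, x in zip(coefficients, attributes))
    if z >= 0:
        # Equivalent form 1 / (1 + e^-z); avoids overflow when z is large.
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical coefficients and attribute values, for illustration only.
example_p = logistic_probability(-1.0, [0.5, 0.25], [2.0, 4.0])
print(example_p)
```

Whatever the coefficients are, the returned value stays between 0 and 1, which is what allows it to be read as a probability.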
7. Logistic Regression
• By simple rewriting, we get:
$$\log\frac{p}{1-p} = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_k x_k$$
• The ratio p/(1-p) is the odds, and its logarithm is called the log odds
• The parameters of the logistic model, $a_0, a_1, \dots, a_k$, are learned via an optimization procedure
• The learned parameters can then be deployed in the field to make predictions
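The slides do not say which optimization procedure is used to learn the parameters. One common choice is gradient descent on the average log-loss, sketched below under that assumption. The training data, learning rate, step count, and function names are all hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(rows, labels, learning_rate=0.1, steps=5000):
    """Learn [a0, a1, ..., ak] by gradient descent on the average log-loss."""
    k = len(rows[0])
    n = len(rows)
    params = [0.0] * (k + 1)  # params[0] is the intercept a0
    for _ in range(steps):
        grads = [0.0] * (k + 1)
        for x, y in zip(rows, labels):
            p = sigmoid(params[0] + sum(a * xi for a, xi in zip(params[1:], x)))
            error = p - y  # gradient of the log-loss with respect to z
            grads[0] += error
            for j, xi in enumerate(x):
                grads[j + 1] += error * xi
        params = [a - learning_rate * g / n for a, g in zip(params, grads)]
    return params

# Hypothetical data: one attribute per example, label 1 means "default".
rows = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
params = fit_logistic(rows, labels)
print(params)
```

The learned list plays the role of a0, a1, ..., ak: the log odds for a new example are params[0] plus the weighted sum of its attributes.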
8. Model Details and Performance
[Figure: Plot of predicted default probability]
Only in rare cases do we get a 100% accurate model.
9. Using the Model
• What is the probability of a business defaulting, given that the business has been with the bank for 26 months and is delinquent for 58 days?
Plug in the model parameters to calculate p:
BUSAGE: 0.008; DAYSDELQ: 0.102; Intercept: -5.706
$$p = \frac{e^{0.008 \times 26 + 0.102 \times 58 - 5.706}}{1 + e^{0.008 \times 26 + 0.102 \times 58 - 5.706}} \approx 0.603$$
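The arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch: the function and constant names are illustrative rather than taken from the slides, and the intercept is the value -5.706 used in the expression for p.

```python
import math

# Coefficients as they appear in the expression for p on this slide.
BUSAGE_COEF = 0.008      # per month of business age
DAYSDELQ_COEF = 0.102    # per day of delinquency
INTERCEPT = -5.706

def default_probability(business_age_months, days_delinquent):
    """Evaluate p = e^z / (1 + e^z) for the fitted credit default model."""
    z = (INTERCEPT
         + BUSAGE_COEF * business_age_months
         + DAYSDELQ_COEF * days_delinquent)
    return math.exp(z) / (1.0 + math.exp(z))

p = default_probability(26, 58)
print(round(p, 3))  # 0.603
```

Here z = 0.208 + 5.916 - 5.706 = 0.418, so the business is somewhat more likely than not to default.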
12. Buzz about Machine Learning
"Every company is now a data company, capable of using machine learning in the cloud to deploy intelligent apps at scale, thanks to three machine learning trends: data flywheels, the algorithm economy, and cloud-hosted intelligence."
Three factors are making machine learning hot. These are cheap data, algorithmic economy, and cloud-based solutions.
13. Data is getting cheaper
For example, Tesla has 780 million miles of driving data, and adds another million every 10 hours
16. Cloud-Based Intelligence
Emerging machine intelligence platforms hosting pre-trained machine learning models-as-a-service are making it easy for companies to get started with ML, allowing them to rapidly take their applications from prototype to production.
Many open source machine learning and deep learning frameworks running in the cloud allow easy leveraging of pre-trained, hosted models to tag images, recommend products, and do general natural language processing tasks.
20. Feature Vectors in ML
• A machine learning system builds models using properties of objects being modeled. These properties are called features or attributes and the process of measuring/obtaining such properties is called feature extraction. It is common to represent the properties of objects as feature vectors.
$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}$, with components sepal width, sepal length, petal width, and petal length
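As a small illustration, a flower described by the four measurements named on the slide can be stored as an ordered list of numbers. The class name, the ordering of the components, and the measurement values below are assumptions made for this sketch.

```python
from typing import NamedTuple

class IrisFeatures(NamedTuple):
    """Four measured properties (features) of one flower."""
    sepal_width: float
    sepal_length: float
    petal_width: float
    petal_length: float

def to_feature_vector(flower: IrisFeatures) -> list:
    """Return the measurements as an ordered vector [x1, x2, x3, x4]."""
    return [flower.sepal_width, flower.sepal_length,
            flower.petal_width, flower.petal_length]

# Hypothetical measurements in centimetres.
flower = IrisFeatures(sepal_width=3.5, sepal_length=5.1,
                      petal_width=0.2, petal_length=1.4)
x = to_feature_vector(flower)
```

Every object in the data set is converted with the same ordering, so that position i always means the same property.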
21. Learning Styles
• Supervised Learning
– Training data comes with answers, called labels
– The goal is to produce labels for new data
22. Supervised Learning Models
• Classification models
– Predict whether a customer is likely to be lost to a competitor
– Tag objects in a given image
– Determine whether an incoming email is spam or not
23. Supervised Learning Models
• Regression models
– Predict credit card balance of customers
– Predict the number of 'likes' for a posting
– Predict peak load for a utility given weather information
24. Learning Styles
• Unsupervised Learning
– Training data comes without labels
– The goal is to group data into different categories based on similarities
[Figure: Grouped Data]
25. Unsupervised Learning Models
• Segment/cluster customers into different groups
• Organize a collection of documents based on their content
• Make recommendations for products
26. Learning Styles
• Reinforcement Learning
– Training data comes without labels
– The learning system receives feedback from its operating environment to know how well it is doing
– The goal is to perform better
28. Walk Through An Example: Flower Classification
• Build a classification model to differentiate between two classes of flower
29. How Do We Go About It?
• Collect a large number of both types of flowers with the help of an expert
• Measure some attributes that can help differentiate between the two types of flowers. Let those attributes be petal area and sepal area.
31. We can separate the flower types using the linear boundary shown above. The parameters of the line represent the learned classification model.
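To make "the parameters of the line" concrete, the sketch below classifies a flower by checking which side of a line it falls on. The weights, the bias, and the class names "type A" and "type B" are hypothetical; in practice they would come out of training on the collected flowers.

```python
def classify(petal_area, sepal_area, w_petal, w_sepal, bias):
    """Linear decision rule: the sign of w_petal*petal + w_sepal*sepal + bias."""
    score = w_petal * petal_area + w_sepal * sepal_area + bias
    return "type A" if score > 0 else "type B"

# Hypothetical learned parameters describing the line
# 1.0 * petal_area - 0.5 * sepal_area - 1.0 = 0.
model = {"w_petal": 1.0, "w_sepal": -0.5, "bias": -1.0}

label_1 = classify(6.0, 4.0, **model)  # score = 6.0 - 2.0 - 1.0 = 3.0
label_2 = classify(1.0, 5.0, **model)  # score = 1.0 - 2.5 - 1.0 = -2.5
```

The whole model is just three numbers, which is why a linear boundary is considered a simple model.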
32. Another possible boundary. This boundary cannot be expressed via a single equation. However, a tree structure can be used to express this boundary. Note that this boundary predicts the collected data better.
33. Yet another possible boundary. This boundary predicts the collected data without any error. Is this a better boundary?
34. Model Complexity
• There are tradeoffs between the complexity of models and their performance in the field. A good design (model choice) weighs these tradeoffs.
• A good design should avoid overfitting. How?
– Divide the entire data into three sets
  • Training set (about 70% of the total data). Use this set to build the model
  • Test set (about 20% of the total data). Use this set to estimate the model accuracy after deployment
  • Validation set (remaining 10% of the total data). Use this set to determine the appropriate settings for free parameters of the model. May not be required in some cases.
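A minimal sketch of such a split is shown below. The function name, the fixed random seed, and the use of the integers 0 to 99 as stand-ins for data records are assumptions for illustration.

```python
import random

def split_dataset(records, train_frac=0.7, test_frac=0.2, seed=0):
    """Shuffle the records, then cut them into training, test, and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # shuffle so each set is a random sample
    n = len(shuffled)
    n_train = round(n * train_frac)
    n_test = round(n * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # whatever remains (about 10%)
    return train, test, validation

train, test, validation = split_dataset(range(100))
print(len(train), len(test), len(validation))  # 70 20 10
```

Shuffling before cutting matters when the records are stored in some meaningful order, for example sorted by date or by class.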
35. Measuring Model Performance
• True Positive: Correctly identified as relevant
• True Negative: Correctly identified as not relevant
• False Positive: Incorrectly labeled as relevant
• False Negative: Incorrectly labeled as not relevant
[Figure: example images for a "Cat vs. No Cat" classifier showing a true positive, a true negative, a false negative, and a false positive.]
36. Precision, Recall, and Accuracy
• Precision
– Percentage of positive labels that are correct
– Precision = (# true positives) / (# true positives + # false positives)
• Recall
– Percentage of positive examples that are correctly labeled
– Recall = (# true positives) / (# true positives + # false negatives)
• Accuracy
– Percentage of correct labels
– Accuracy = (# true positives + # true negatives) / (# of samples)
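The three measures can be computed directly from the four counts. In this sketch the function names and the example labels are made up; 1 stands for "relevant" (positive) and 0 for "not relevant" (negative).

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, true negatives, false positives, false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def precision_recall_accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, accuracy

# Hypothetical true labels and model predictions for eight examples.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
precision, recall, accuracy = precision_recall_accuracy(y_true, y_pred)
print(precision, recall, accuracy)  # 0.6 0.75 0.625
```

Here there are 3 true positives, 2 true negatives, 2 false positives, and 1 false negative, so precision is 3/5, recall is 3/4, and accuracy is 5/8.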
37. Sum-of-Squares Error for Regression Models
For a regression model, the error is measured by taking the square of the difference between the predicted output value and the target value for each training (test) example and adding this number over all examples:
$E = \sum_{n=1}^{N} (y_n - t_n)^2$, where $y_n$ is the predicted output and $t_n$ the target value for example n
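The error described here takes only a line of code to compute. The function name and the example numbers are made up for this sketch.

```python
def sum_of_squares_error(predictions, targets):
    """Add up the squared difference between each prediction and its target."""
    return sum((y - t) ** 2 for y, t in zip(predictions, targets))

# Hypothetical predicted outputs and target values for four examples.
predictions = [2.5, 0.0, 2.0, 8.0]
targets = [3.0, -0.5, 2.0, 7.0]
error = sum_of_squares_error(predictions, targets)
print(error)  # 0.25 + 0.25 + 0.0 + 1.0 = 1.5
```

Because the differences are squared, one large miss contributes more to the error than several small ones.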
38. Bias and Variance
• Bias: expected difference between model's prediction and truth
• Variance: how much the model differs among training sets
• Model Scenarios
– High Bias: Model makes inaccurate predictions on training data
– High Variance: Model does not generalize to new datasets
– Low Bias: Model makes accurate predictions on training data
– Low Variance: Model generalizes to new datasets
40. Model Building Algorithms
• Supervised learning algorithms
– Linear methods
– k-NN classifiers
– Neural networks
– Support vector machines
– Decision trees
– Ensemble methods
41. Illustration of k-NN Model
[Figure: labeled training points with the test example marked.]
Predicted label of test example with 1-NN model: Versicolor
Predicted label of test example with 3-NN model: Virginica
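The way the predicted label can change with k is easy to reproduce in a small sketch. The function name and the training points below are hypothetical; they are chosen so that the single nearest neighbour is Versicolor while two of the three nearest are Virginica.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Label the query by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical (petal length, petal width) measurements.
points = [(4.9, 1.6), (5.2, 1.8), (5.1, 1.9), (4.0, 1.2), (6.0, 2.2)]
labels = ["Versicolor", "Virginica", "Virginica", "Versicolor", "Virginica"]
query = (5.0, 1.65)

label_1nn = knn_predict(points, labels, query, k=1)
label_3nn = knn_predict(points, labels, query, k=3)
```

With k = 1 the closest point decides alone; with k = 3 its vote is outnumbered by the next two neighbours.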
42. Illustration of Decision Tree Model
Petal width <= 0.8?
– Yes: Setosa
– No: Petal length <= 4.75?
  • Yes: Versicolor
  • No: Virginica
The decision tree is automatically generated by a machine learning algorithm.
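A learned decision tree is applied as a chain of simple threshold tests. The sketch below hand-codes the tree from this slide, assuming it reads as follows: petal width of at most 0.8 gives Setosa; otherwise petal length of at most 4.75 gives Versicolor, and anything longer gives Virginica. The function name and the test measurements are made up.

```python
def classify_iris(petal_width, petal_length):
    """Follow the tree from the root: one threshold test per level."""
    if petal_width <= 0.8:
        return "Setosa"
    if petal_length <= 4.75:
        return "Versicolor"
    return "Virginica"

# Hypothetical measurements in centimetres.
print(classify_iris(petal_width=0.2, petal_length=1.4))  # Setosa
print(classify_iris(petal_width=1.3, petal_length=4.0))  # Versicolor
print(classify_iris(petal_width=2.0, petal_length=5.5))  # Virginica
```

In a real project the thresholds are not typed in by hand; the learning algorithm picks them from the training data.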
43. Model Building Algorithms
• Unsupervised learning
– k-means clustering
– Agglomerative clustering
– Self-organizing feature maps
– Recommendation systems
44. K-means Clustering
Choose the number of clusters, k, and initial cluster centers
45. K-means Clustering
Assign data points to clusters based on distance to cluster centers
46. K-means Clustering
Update cluster centers and reassign data points.
K-means clustering problem (sum of squared distances from data points to cluster centers):
minimize $\sum_{n=1}^{N} \lVert x_n - \mathrm{center}_n \rVert^2$
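The three steps (choose initial centers, assign points, update centers and reassign) can be sketched as a short loop. The function names, the stopping rule (stop when assignments no longer change), and the toy points are assumptions for illustration.

```python
import math

def assign(points, centers):
    """Index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]

def update(points, assignment, centers):
    """Move each center to the mean of the points assigned to it."""
    new_centers = []
    for c, old in enumerate(centers):
        members = [p for p, a in zip(points, assignment) if a == c]
        if not members:
            new_centers.append(old)  # an empty cluster keeps its old center
            continue
        new_centers.append(tuple(sum(m[d] for m in members) / len(members)
                                 for d in range(len(old))))
    return new_centers

def kmeans(points, initial_centers, max_iters=100):
    centers = list(initial_centers)
    assignment = assign(points, centers)
    for _ in range(max_iters):
        centers = update(points, assignment, centers)
        new_assignment = assign(points, centers)
        if new_assignment == assignment:  # nothing moved: converged
            break
        assignment = new_assignment
    return centers, assignment

def objective(points, centers, assignment):
    """Sum of squared distances from each point to its cluster center."""
    return sum(math.dist(p, centers[a]) ** 2 for p, a in zip(points, assignment))

# Made-up 2-D points forming two obvious groups, with k = 2.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers, assignment = kmeans(points, initial_centers=[(0.0, 0.0), (10.0, 10.0)])
```

The result depends on the initial centers, so in practice the procedure is often run several times with different starting points and the run with the smallest objective is kept.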
49. Steps Towards a Machine Learning Project
• Collect data
• Explore data via scatter plots, histograms. Remove duplicates and data records with missing values
• Check for dimensionality reduction
• Build model (iterative process)
• Transport/Integrate with an application
51. Machine Learning Limitation
• Machine learning methods operate on manually designed features.
• The design of such features for tasks involving computer vision, speech understanding, and natural language processing is extremely difficult. This puts a limit on the performance of the system.
[Figure: Feature Extractor → Trainable Classifier]
52. Processing Sensory Data is Hard
How do we bridge this gap between the pixels and meaning via machine learning?
53. Sensory Data Processing is Challenging
So why not build integrated learning systems that perform end-to-end learning, i.e. learn the representation as well as classification from raw data without any engineered features?
[Figure: Feature Learner → Trainable Classifier]
An approach performing end-to-end learning, typically through a series of successive abstractions, is in a nutshell deep learning.
54. An Example of Deep Learning Capability
SegNet is a deep learning architecture for pixel-wise semantic segmentation from the University of Cambridge.
55. Summary
• We have just skimmed the surface of machine learning
• The Web is full of reading resources (free books, lecture notes, blogs, videos) to dig into machine learning
• Several open source software resources (R, RapidMiner, scikit-learn, etc.) are available to learn via experimentation
• Applications based on vision, speech, and natural language processing are excellent candidates for deep learning