Artificial Intelligence and
Machine Learning for Business.
A No-Nonsense Guide to Data Driven Technologies
June 2017
Introduction
• Artificial Intelligence (AI) and Machine Learning are now mainstream business
tools.
• They are being applied across many industries to increase profits, reduce costs,
save lives and improve customer experiences.
• Organisations which understand these tools and know how to use them are
benefiting at the expense of their rivals.
• The material in this presentation is taken from the book:
Artificial Intelligence and Machine Learning for Business.
A No-Nonsense Guide to Data Driven Technologies
by Steven Finlay (2017), published by Relativistic Books.
It is available at Amazon and all good book stores.
Objectives
• The objective of this presentation is to cut through the technical jargon
which is often associated with Artificial Intelligence and Machine Learning to
deliver a simple and concise introduction for managers and business
people.
• The focus is very much on practical application.
 How to work with technical specialists (data scientists) to maximise the benefits of these
technologies.
• It’s formula free.
 There is no complex maths and there are no equations.
 There is a bit of arithmetic.
What is Artificial Intelligence (AI)?
• A simple working definition of Artificial Intelligence (AI):
The replication of human analytical and/or decision-making capabilities.
• A good AI application can perform as well as or better than the
average person when faced with everyday tasks.
What is Artificial Intelligence (AI)?
• Examples of AI in practice:
 The ability to identify people from their Facebook photos.
 Assessing someone’s creditworthiness more accurately than an experienced
underwriter.
 The ability to beat the best Go and chess players.
 Digital personal assistants, such as Amazon’s Echo and Apple’s Siri.
 Being better at spotting the signs of cancer on a medical scan than an expert
radiologist.
 Travelling from A to B and avoiding hitting things on the way (self-driving cars
and autonomous robots).
• At one level, AI applications can seem almost magical. However, like
most things, when you get under the bonnet the mystique evaporates.
What is Artificial Intelligence (AI)?
• A mistake to avoid is thinking current AI applications are in any way
intelligent in a conscious, human-like way.
 It’s all just clever maths at the end of the day.
• Despite the hype appearing in the media, most experts agree that we
are years away from creating a machine with a true human-like sense
of self, which could pass as human day in, day out.
 That’s not to say there aren’t some very good chatbots out there!
• General AI research has many strands. There are lots of different
avenues being explored.
• From a business perspective, almost all practical applications of AI in
use today are based on machine learning.
 It’s not technically correct, but from a practical perspective, one can argue that
machine learning and artificial intelligence are more or less the same thing.
What is Machine Learning?
• Machine learning involves the use of mathematical procedures (algorithms) to analyze data.
• The goal is to discover useful patterns (relationships/correlations/associations) between different
items of data.
 People who buy chicken often buy white wine as well.
 Lack of exercise is correlated with heart disease.
 A poor credit history is associated with higher insurance claims.
 Washing machines are cube shaped, most umbrellas are long and thin (when closed).
• Once the relationships have been identified, these can be used to make inferences about the
behavior of new cases when they present themselves. Appropriate actions then can be taken.
 Send discount coupons for white wine to people who buy chicken.
 Offer non-active people free gym membership.
 Don’t offer the best insurance deals to people with a poor credit history.
 If an object is cube shaped, it’s more likely to be a washing machine than an umbrella.
• In essence, this is analogous to the way people learn.
 We observe what goes on around us and draw conclusions about how the world works.
 We then apply what we have learnt to help us deal with new situations that we find ourselves in.
 The more we experience and learn, the better our ability to make decisions becomes.
• The end product of the machine learning process is a predictive model (or just model going
forward).
 The model captures the key relationships that the machine learning process has identified.
What is Machine Learning?
An Example.
• Imagine that the government wants to screen everyone in the population for their likelihood of developing heart
disease in the next 5 years.
• Those at highest risk will be contacted to invite them for a check-up and lifestyle review, to try and reduce their
risk of developing heart disease in the future.
• Lifestyle reviews are expensive. The government can only afford to invite 5% of the population for a review.
• Machine learning can be applied to examine historic patient records to find the patient characteristics that
correlate with heart disease. The predictive model that is created from the machine learning process is then
used to identify those individuals most at risk.
• Machine learning needs data, and lots of it, to work effectively. Therefore, the government takes a large
development sample of, say, 500,000 historic patient records from 5 years ago. They then flag (label) each
case to indicate who got heart disease and who did not in the following 5 years:
 30,000 did get heart disease.
 470,000 did not get heart disease.
• A common mistake with data analysis is to record data only about the event of interest and to forget about the
“non-events.” Having information about people who didn’t get heart disease is just as important as
having information about those that did, because the machine learning process works by identifying the
patterns that differentiate the two event types.
• The data contains lots of information about each person. This includes things such as their gender, income,
age, alcohol intake, BMI, blood pressure, whether they smoke, and so on.
What is Machine Learning?
An Example.
Starting score (constant): 350

Age (years): <23: -57 | 23-32: -26 | 33-41: 0 | 42-48: 7 | 49-57: 15 | 58-64: 24 | 65-71: 31 | >71: 65
Gross annual income ($): <$22,000: 11 | $22,001-$38,000: 6 | $38,001-$60,000: 0 | $60,001-$94,000: -3 | $94,001-$144,000: -5 | >$144,000: -6
BMI (weight in kg / height in metres²): <19: 2 | 19-26: 0 | 27-29: 8 | 30-32: 14 | >32: 29
Smoker?: Yes: 37 | No: 0
Diabetic?: Yes: 21 | No: 0
Gender: Male: 2 | Female: -4
Cholesterol level (mg per decilitre of blood): Low (<160): -2 | Normal (160-200): 0 | High (201-240): 19 | Very high (>240): 32
Alcohol consumption (units/week): 0: 4 | 1-12: 0 | 13-24: 5 | 25-48: 10
Blood pressure: Low (below 90/60): 3 | Average (between 90/60 and 140/90): 0 | High (above 140/90): 36
For a 45-year-old female with the
following characteristics:
• A BMI of 28.
• Drinks an average of 6 units of alcohol a week.
• An honours degree graduate.
• An income of $50,000 a year.
• Smokes.
• Is not diabetic.
• Normal cholesterol levels.
• Low blood pressure.
The starting score is 350. Seven points are added
due to her age, which brings the score to 357.
Eight points are then added based on her BMI,
and so on. After adding/subtracting all of the
relevant points, the final score is 404.
Note that being a graduate does not feature in the
model. This indicates that the algorithm found no
correlation between being a graduate (or not) and
heart disease.
• In this example, the machine learning process is used to generate a scorecard type model.
• Scorecards are very simple and easy to understand, as in the example above.
• The score, calculated by adding up all the points that apply, represents the individual’s risk.
The training algorithm has used the data available to determine
what the points in the scorecard should be; i.e. the points represent
the patterns that the machine learning process has identified.
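To make the mechanics concrete, the scorecard can be sketched as a short function. The bands and points below are transcribed from the table on the slide; summing the transcribed points for the worked example gives 401 rather than the 404 quoted, so treat the exact figures as illustrative rather than definitive.

```python
# A sketch of the heart disease scorecard as code. Bands and points are
# transcribed from the scorecard table; the function name is mine.

def score_patient(age, bmi, gender, alcohol_units, income, smoker,
                  diabetic, cholesterol, blood_pressure):
    s = 350  # starting score (constant)

    # Age (years)
    if age < 23:    s += -57
    elif age <= 32: s += -26
    elif age <= 41: s += 0
    elif age <= 48: s += 7
    elif age <= 57: s += 15
    elif age <= 64: s += 24
    elif age <= 71: s += 31
    else:           s += 65

    # BMI
    if bmi < 19:    s += 2
    elif bmi <= 26: s += 0
    elif bmi <= 29: s += 8
    elif bmi <= 32: s += 14
    else:           s += 29

    # Gross annual income ($)
    if income <= 22_000:    s += 11
    elif income <= 38_000:  s += 6
    elif income <= 60_000:  s += 0
    elif income <= 94_000:  s += -3
    elif income <= 144_000: s += -5
    else:                   s += -6

    # Alcohol consumption (units/week); the table's top band is 25-48
    if alcohol_units == 0:    s += 4
    elif alcohol_units <= 12: s += 0
    elif alcohol_units <= 24: s += 5
    else:                     s += 10

    s += 2 if gender == "male" else -4
    s += 37 if smoker else 0
    s += 21 if diabetic else 0
    s += {"low": -2, "normal": 0, "high": 19, "very high": 32}[cholesterol]
    s += {"low": 3, "average": 0, "high": 36}[blood_pressure]
    return s

# The 45-year-old female smoker from the worked example:
print(score_patient(age=45, bmi=28, gender="female", alcohol_units=6,
                    income=50_000, smoker=True, diabetic=False,
                    cholesterol="normal", blood_pressure="low"))
```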
What is Machine Learning?
An Example.
• This is a score distribution table.
• It has been created by running all 500,000 cases
in the development sample against the
scorecard to generate a score for each case.
• The scores have been put into ranges for
convenience.
• To generate a prediction, one calculates a
person’s score, and then reads across to the
rightmost column.
• Someone scoring 303 falls into the second row
of the table. The proportion of people in the
301-320 score range who were observed to
develop heart disease over a five-year period
was 0.12%.
• Therefore, our prediction for someone scoring
303 today is that there is a 0.12% chance of
them developing heart disease in the next 5
years.
• For the case of the 45-year-old woman who scored 404, she falls in the 401-420 score range.
• Reading across, the proportion of people who scored between 401 and 420 who went on to develop heart
disease was 1.76%.
• i.e. the model predicts that she has a 1.76% (1 in 57) chance of developing heart disease in the next five
years.
Score range | Number of people | % of population | Number with heart disease after 5 yrs | % with heart disease after 5 yrs
0-300    |  55,950 |  11.19% |     40 |  0.07%
301-320  |  56,606 |  11.32% |     68 |  0.12%
321-340  |  59,700 |  11.94% |    129 |  0.22%
341-360  |  58,706 |  11.74% |    216 |  0.37%
361-380  |  64,429 |  12.89% |    403 |  0.63%
381-400  |  52,749 |  10.55% |    575 |  1.09%
401-420  |  34,089 |   6.82% |    600 |  1.76%
421-440  |  21,107 |   4.22% |    632 |  2.99%
441-460  |  17,269 |   3.45% |    878 |  5.09%
461-480  |  23,364 |   4.67% |  2,020 |  8.65%
481-500  |  17,477 |   3.50% |  2,553 | 14.61%
501-520  |  13,554 |   2.71% |  3,366 | 24.84%
521-540  |   7,103 |   1.42% |  3,463 | 48.76%
541-560  |   8,260 |   1.65% |  6,587 | 79.74%
561-999  |   9,637 |   1.93% |  8,469 | 87.88%
Total    | 500,000 | 100.00% | 30,000 |  6.00%
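Reading a prediction off the score distribution can be sketched as a simple lookup. The bands and percentages below are transcribed from the table above.

```python
# Score-to-risk lookup based on the score distribution table.
# Each band: (lowest score, highest score, observed % with heart disease in 5 yrs).
BANDS = [
    (0, 300, 0.07), (301, 320, 0.12), (321, 340, 0.22), (341, 360, 0.37),
    (361, 380, 0.63), (381, 400, 1.09), (401, 420, 1.76), (421, 440, 2.99),
    (441, 460, 5.09), (461, 480, 8.65), (481, 500, 14.61), (501, 520, 24.84),
    (521, 540, 48.76), (541, 560, 79.74), (561, 999, 87.88),
]

def predicted_risk(score):
    """Return the predicted % chance of heart disease in the next 5 years."""
    for low, high, pct in BANDS:
        if low <= score <= high:
            return pct
    raise ValueError(f"score {score} is outside the table")

print(predicted_risk(303))  # 0.12, from the 301-320 band
print(predicted_risk(404))  # 1.76, from the 401-420 band
```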
What is Machine Learning?
An Example.
• OK. So how do we use the prediction (score)?
• The government has indicated that it can afford
to invite 5% of the population for a check-up.
• We use the score distribution to identify the
“best” 5% to invite, using the % of population
column.
• The figures for the three highest score ranges
add up to 5% of the population.
• So our decision rule is to invite anyone scoring
521 or more for a check-up.
• By doing this, we will identify 18,519 out of the
30,000 people who will get heart disease; i.e.
62% of cases.
• That’s pretty good, isn’t it?
• When we apply this model across the entire population of many millions, we can expect to identify 62%
of cases within just 5% of the population.
• The model is far better than a random selection process, which would only identify 6% of cases.
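The cut-off arithmetic can be checked in a few lines. The figures are transcribed from the score distribution table; each row is (lowest score in band, number of people, number of heart disease cases).

```python
# Verifying the 521 cut-off from the score distribution table.
ROWS = [
    (0, 55_950, 40), (301, 56_606, 68), (321, 59_700, 129), (341, 58_706, 216),
    (361, 64_429, 403), (381, 52_749, 575), (401, 34_089, 600),
    (421, 21_107, 632), (441, 17_269, 878), (461, 23_364, 2_020),
    (481, 17_477, 2_553), (501, 13_554, 3_366), (521, 7_103, 3_463),
    (541, 8_260, 6_587), (561, 9_637, 8_469),
]
CUTOFF = 521

invited_people = sum(n for lo, n, _ in ROWS if lo >= CUTOFF)
cases_found = sum(c for lo, _, c in ROWS if lo >= CUTOFF)

print(invited_people)  # 25,000 people = 5% of the 500,000 sample
print(cases_found)     # 18,519 of the 30,000 cases, i.e. roughly 62%
```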
Example 2:
Object recognition
• The goal of an object recognition system is to identify what
object is present in an image; e.g. a photo.
• The underlying principles are the same as for the heart disease
problem:
 A large development sample is taken.
 The sample contains thousands of images of objects such as washing
machines, chairs, umbrellas and so on.
• Many different images of the same object, but taken from different angles.
• Images of different versions of that object. E.g. different types of chairs and umbrellas.
 Each image is labelled to say what type of object it is.
• Picture 1 is a chair
• Picture 2 is a washing machine
• Picture 3 is another washing machine
• Picture 4 is an umbrella
• Etc.
Example 2:
Object recognition
• The pixels that form each image are analogous to
the patient data for the heart disease problem:
• BMI
• Age,
• Etc.
• The data in this case are the colour and
brightness of individual pixels in the image:
• Pixel 1
• Pixel 2
• …
• Pixel N
• Lots of pixels! Even a low(ish) resolution image of
256 × 256 pixels contains 65,536 pixels in total.
• This is a big problem, requiring powerful
computers and lots of processing time!
 A lot more complex than the heart disease problem,
even though the underlying principles are very similar.
Example 2:
Object recognition
• A separate score is created to represent the probability of the item in
the image being a particular object.
 A complex system could therefore generate thousands of separate scores.
• When a new image is presented to the model, a score is calculated for
every possible object.
 If we used scorecards to do this, this would mean a separate scorecard to
calculate a score for each object.
• The object with the highest score (highest probability) is the one
selected.
 i.e. the object “predicted” by the model
Object Score
(Probability)
Washing machine 0.07
Umbrella 0.04
Cup 0.03
Chair 0.62
Bathtub 0.02
Table 0.16
Shoe 0.01
Etc. …
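Selecting the prediction from a score table like the one above is just a matter of taking the highest-scoring object:

```python
# Scores from the example table above; the model's prediction is the
# object with the highest score (probability).
scores = {"Washing machine": 0.07, "Umbrella": 0.04, "Cup": 0.03, "Chair": 0.62,
          "Bathtub": 0.02, "Table": 0.16, "Shoe": 0.01}

prediction = max(scores, key=scores.get)
print(prediction)  # Chair, the object with the highest probability (0.62)
```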
Different types of predictive models
• A scorecard is just one type of predictive model.
• There are lots of other types of model, some far more complex than scorecards.
 Some are much more predictive than scorecards in some cases, but not always.
 Some types of model are better suited to certain types of problem than others.
 For some types of problem, many different types of model perform equally well.
 There is no one type of predictive model that can be said to outperform all others in all situations.
• However, the way models are used is identical, regardless of the model type:
1. The models generate numbers (scores) based on the information presented to them.
2. The score(s) from the model represent the thing(s) one is trying to predict.
3. A decision is then taken about what action to take on the basis of the score(s).
• Let’s take a look at some other types of predictive model.
Different types of predictive models:
Decision trees
To calculate a score for someone using
the decision tree, one starts with the
top node of the tree (Entire population).
At each branching point the node is
chosen that fits that person’s individual
characteristics. The first branching is
based on age. If a person is less than 52
years old, the left hand side of the tree
is followed. If they are aged 52 years or
more, then the right hand branch is
followed.
The process is repeated at each decision
point until there are no further
decisions to be made. The final node
(called an end node or leaf node) into
which a person falls determines their
score.
There are 13 end nodes (shaded in grey
and numbered) and hence 13 scores
that someone can receive.
Those scoring 1 have the lowest chance
of developing heart disease, those
scoring 13 the highest.
• This decision tree has been created using the same 500,000 cases used to
develop the scorecard type model presented earlier.
• Note that the scorecard and decision tree contain slightly different information.
The scorecard includes annual income, but the decision tree does not. Likewise
“Graduate” features in the decision tree model but not the scorecard. This is a
feature of the different algorithms used to create each type of model.
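A decision tree is scored by walking from the root to a leaf. Only the first split (age < 52) comes from the slide; the deeper splits and the scores in this toy tree are invented purely to show the mechanics.

```python
# Toy decision-tree scoring. Only the root split (age < 52) is from the
# slide; everything below it is invented for illustration.
TREE = {
    "split": lambda p: p["age"] < 52,                    # root: the age split
    "yes": {"split": lambda p: p["smoker"],
            "yes": {"score": 5}, "no": {"score": 1}},
    "no": {"split": lambda p: p["blood_pressure"] == "high",
           "yes": {"score": 13}, "no": {"score": 8}},
}

def tree_score(patient):
    """Walk from the root, following the matching branch, until a leaf."""
    node = TREE
    while "score" not in node:
        node = node["yes"] if node["split"](patient) else node["no"]
    return node["score"]

print(tree_score({"age": 45, "smoker": True, "blood_pressure": "low"}))
```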
Different types of predictive models:
Decision trees
• The decision tree is used in the same way as the scorecard to make
decisions, using a score distribution table.
• The only major difference is that whereas the scorecard created scores in the range 0 – 999,
there are just 13 possible scores generated by the decision tree.
• Given that the decision tree and scorecard were developed using different algorithms, their
predictive accuracy will be different.
• In this example, the scorecard is slightly better than the decision tree (as measured on the
proportion of heart disease cases in the highest scoring 5%), but for another problem the
results could be different.
Different types of predictive models:
(Artificial) Neural Networks (NNs)
• Our brains contain billions upon billions of neurons, each interconnected to yield
trillions of connections. It's the combination of neurons and the connections between
them that drive our ability to think and act intelligently, to understand how the world
works and to make decisions (both consciously and unconsciously) about how we
interact with the world.
• Back in the 1950s, computer scientists first came up with the concept of the
“Artificial Neuron”, but it wasn’t until the mid-1980s that using multiple artificial
neurons linked together (a Neural Network) was proposed as a way to solve
complex problems. Since that time, the popularity of neural networks has continued
to grow, and advanced forms of neural networks are currently at the forefront of
research into artificial intelligence and machine learning.
• The first thing to say about neural networks is that while they are sometimes touted
as being immensely complex “brain like” things, they can be readily understood if
one is willing to spend a little time and effort to study them.
• It’s certainly true that a typical neural network is more complex than the scorecards
and decision trees that were introduced in previous slides, but the underlying
principles are not that much different at the end of the day – there is just more of it.
Different types of predictive models:
(Artificial) Neural Networks (NNs)
• A neuron in machine learning terms is not a living thing that’s been grown by a mad
scientist in a covert lab somewhere. It is a simplified representation of how
natural neurons behave, created using equations and implemented as computer code.
The way a neuron works is as follows:
1. The patient data, such as Age, BMI, etc., provides the
inputs to the neuron.
2. Each input is multiplied by a weight (which can be
positive or negative). For non-numeric data, such as
gender or smoker, 0/1 flags are used to represent each
condition. For example 1 if female, 0 if male.
3. The inputs, multiplied by their weights are added up to
get an initial score.
4. The initial score is transformed. Often, this is to force
the score to be in the range 0-1. This transformation (see note a) is
not absolutely essential, but is “good practice.” This is so
that when the neuron is combined with others to
produce a neural network, all the neurons produce
values in the same range; i.e. between 0 and 1.
5. The transformed version of the initial score is the
output score produced by the neuron.
a. I promised no complex formulas, but I’ll throw just one into the mix. There are many
transformations (activation functions) that can be applied. A very common one is the
“logistic function.” This is calculated as 1/(1 + e^(-Initial Score)). The value of e is approximately 2.718. e is a
bit like PI (3.142) in that it appears in all sorts of places in mathematics and statistics.
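The five steps above amount to only a few lines of code. The weights in this sketch are invented; the transformation is the logistic function from the footnote.

```python
import math

def neuron(inputs, weights):
    """Steps 1-5: weight the inputs, sum them, then squash with the logistic."""
    initial_score = sum(x * w for x, w in zip(inputs, weights))
    return 1 / (1 + math.exp(-initial_score))   # logistic: 1/(1 + e^-score)

# e.g. inputs [age, BMI, smoker flag] with invented weights:
print(neuron([45, 28, 1], [0.01, 0.02, 0.5]))
```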
Different types of predictive models:
(Artificial) Neural Networks (NNs)
• So a neuron in the context of AI / machine learning isn’t anything
mysterious or complex. It’s just a couple of simple formulas.
 The first formula multiplies each of the input variables by a weight and
adds up the total.
 The second formula then modifies (transforms) this value so that it lies
within a fixed range, usually between 0 and 1.
 It’s as simple as that!
• The clever bit is determining what the weights should be.
• A single neuron is not much different from a scorecard type
model. The only material difference is the transformation to force
the score to lie in a fixed range.
• It can be demonstrated that the outputs generated by a neuron and
a scorecard type model are in fact equivalent.
Different types of predictive models:
(Artificial) Neural Networks (NNs)
• A neural network is created by combining several neurons together.
The key features of this neural network are:
1.The inputs to each neuron in a given layer are the same (e.g. Age,
BMI etc. for the first layer neurons), but the weights within each
neuron are different. Therefore, each neuron produces a different
score.
2.The outputs (the scores) from the first set of neurons (the first
layer) provide the inputs to the second set of neurons (the second
layer).
3.There are four neurons in the first layer in this example. In practice,
the optimal number of neurons can’t be determined up front.
Instead, a number of competing models are built with different
numbers of neurons, and the one that performs the best is chosen.
4.There are a lot of weights. This is a very simple network, with just 8
input variables and 5 neurons, but there are 36 weights. A typical
business application with, say, 50 input variables and 75 neurons,
would have more than 3,000 weights.
The clever bit is how one determines what the weights should be.
This is where the clever maths comes into play, to find the weights
that provide the best overall model; i.e. the one that provides the
most accurate predictions (scores). This is not an easy task, and can
require a huge amount of computer processing time to undertake for
some problems.
This example has just one output score (final score). For complex
problems with several possible outcomes, there would be many
more neurons in the final layer, one for each possible outcome.
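The structure described above can be sketched as a tiny forward pass. The weights are invented; the point is the shape: each layer's outputs feed the next layer, and the weight count matches the slide's arithmetic (8 × 4 + 4 × 1 = 36, bias terms excluded).

```python
import math

def logistic(z):
    return 1 / (1 + math.exp(-z))

def layer(inputs, weight_matrix):
    """One layer: every neuron sees the same inputs but has its own weights."""
    return [logistic(sum(x * w for x, w in zip(inputs, ws)))
            for ws in weight_matrix]

# 8 inputs -> 4 first-layer neurons -> 1 output neuron. Weights are invented.
first_layer_w = [[0.10] * 8, [-0.20] * 8, [0.05] * 8, [0.30] * 8]
output_layer_w = [[0.5, -0.4, 0.2, 0.1]]

n_weights = sum(map(len, first_layer_w)) + sum(map(len, output_layer_w))
print(n_weights)  # 8*4 + 4*1 = 36, matching the slide's count

patient = [0.45, 0.28, 1, 0, 0.06, 0.50, 0, 1]   # made-up scaled inputs
final_score = layer(layer(patient, first_layer_w), output_layer_w)[0]
```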
Different types of predictive models:
(Artificial) Neural Networks (NNs)
• The reason why neural networks are so popular is their ability to detect
subtle patterns in data which other simpler methods, such as
scorecards and decision trees, may not be able to find.
• This means that they can potentially generate more accurate
predictions than scorecard or decision tree type models.
 This is achieved by having the two layers of neurons, with the scores from the
first layer providing the inputs to the second layer.
• The main drawback of neural networks is that the scores that they
generate are not intuitive.
• One can see what the various weights in the model are and understand
the overall score calculation, but if I were to ask you which input variable
contributes the most to the final score, this is much less obvious
than for the scorecard and decision tree models of heart disease.
Deep learning
• Deep learning represents the latest evolution of neural network type
models.
• The neural network example on previous slides effectively has two
layers of neurons and this structure is used very successfully in many
traditional neural network applications.
• However, there is no reason why there can’t be many more layers. So
there could be 3, 4, …, 100+ layers if desired, with different numbers of
neurons in each layer.
• What one tends to find is that as more layers are added, so the ability of
the network to identify complex and/or subtle patterns increases.
• The more layers, the “deeper” the network. Generally speaking,
anything with more than 2 or 3 layers could be classified as a “deep”
network – but there is no single accepted definition.
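In code terms, "deeper" just means repeating the layer operation more times. A minimal sketch, with invented weights, where each weight matrix is one layer:

```python
import math

def logistic(z):
    return 1 / (1 + math.exp(-z))

def forward(inputs, layers):
    """Push values through any number of layers; more layers = 'deeper'."""
    values = inputs
    for weight_matrix in layers:          # one weight matrix per layer
        values = [logistic(sum(x * w for x, w in zip(values, ws)))
                  for ws in weight_matrix]
    return values

# Three layers: 2 inputs -> 3 neurons -> 3 neurons -> 1 output (weights invented).
layers = [
    [[0.5, -0.5], [0.3, 0.3], [-0.2, 0.8]],
    [[0.1, 0.1, 0.1], [0.4, -0.4, 0.0], [0.2, 0.2, -0.2]],
    [[1.0, -1.0, 0.5]],
]
output = forward([1.0, 2.0], layers)
```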
Deep learning
• As well as expanding the number of neurons and layers in a network, other avenues
of research associated with deep learning consider how the neurons in the network
are connected.
• In a standard network (like the example used in this presentation), all of the neurons
in each layer are connected to all of the neurons in the next layer.
• However, other configurations are possible. For example, not connecting all inputs
to all of the neurons in the first layer (a convolutional neural network). This works
particularly well for certain types of problem such as image recognition.
• Another variant on standard neural networks is to create feedback loops. The output
of neurons in later layers act as inputs to earlier layers (a recurrent neural
network). Recurrent neural networks work well when the information used to train
the network is contained in a sequence of events. When using machine learning to
decipher handwriting one letter at a time, the fact that the previous letter in a word
was say, the letter “Z,” is an important predictor for the next letter, which is almost
certainly a vowel or the letter “S.” A traditional neural network for handwriting
recognition would not consider these sequential relationships. Google Translate is
one example of a commercial product that uses recurrent networks.
• The most complex neural network models in use today, such as the ones developed
by Google’s DeepMind subsidiary to beat the world’s best Go players, combine
these features and have millions of neurons connected across dozens of layers.
Deep learning
• Complex deep learning models, based on various types of neural networks, are
driving many sophisticated artificial intelligence applications, but for many types of
business problem this is overkill.
• A very complex model is only required for a very complex problem.
• It’s the old adage: “you don’t need a sledgehammer to crack a nut.”
• If you apply a very complex machine learning approach to a very simple problem,
then sometimes you will get a worse result than if a very simple model is used. This
may seem counter intuitive, but there will be a tendency for the model to find false
patterns that don’t really exist, whereas this is less likely with simple models.
 Many common business problems are “simple”; i.e. scorecards and decision trees will provide a
pretty good solution. A complex model, such as a neural network, might be better, but in many
cases not that much better (in terms of predictive accuracy) and in some cases no better at all.
 Axiom. If you can’t get a simple predictive model to add value to what you do, then a more
complex one probably won’t either.
• A good data scientist will explore different options and deliver the simplest solution
that they can, which meets your business objectives.
Creating an intelligent application
• A predictive model on its own is not a very user friendly thing:
 It generates scores when presented with data.
 It does not interact with people directly.
 It does not appear “intelligent.”
• Models need to be integrated within your organization’s business
systems, if they are going to add value.
 Model implementation is often a larger, more expensive and time consuming
task than the construction of the model – many data scientists forget this!
• For customer-orientated applications, a user interface is required:
 Take input from a user, e.g. voice input or data captured on an application form.
 Data is pre-processed to transform it into a suitable format to allow scores to be
calculated.
 Scores are generated by running the pre-processed data through the model(s).
 Decision rules are applied; i.e. what action to take given the prediction that has
been generated.
 Results are fed back to the user in a “human like” manner, e.g. a written or verbal
response.
Creating an intelligent application:
Core components
Data input. This includes:
• Sensory inputs from cameras (eyes), microphones (ears) or other sources.
• Geodemographic data such as people’s age and income.
• Location and usage details for smartphones, cars, televisions and other devices.
• Spending patterns captured from credit cards and supermarket tills.
Data (pre)processing.
• Data input is processed into a standard “computer friendly” representation.
• Computers like numbers. Images and text are transformed into a suitable numeric format.
• New data items are created; e.g. Age is often more useful than Date of Birth, so Age is
derived from Date of Birth.
Application of predictive models (score calculation).
• The models generated by the machine learning process are applied.
• Pre-processed data for new cases is fed into the models in order to generate fresh
predictions going forward (model scores).
Decision rules (rule sets).
• Decision rules are used in conjunction with data inputs and the scores generated by
predictive models to decide what to do.
• Sometimes the rules are derived automatically by the machine learning algorithm, but often
they will also include rules defined by human experts/business users.
Output.
• A response needs to be taken based upon the decision(s) that have been made.
• If the system decides that someone should be hired, then an offer letter and contract need
to be sent to the candidate.
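The components above can be wired together in a minimal sketch. All the function names, field names and the toy scoring formula are invented for illustration; only the 521 invite cut-off comes from the earlier heart disease example.

```python
# Sketch of the input -> pre-process -> score -> decide -> respond pipeline.
# The fields and toy model are invented; 521 is the cut-off from the earlier
# heart disease example. 2017 is hard-coded to match the deck's date.

def preprocess(raw):
    """Turn raw input into the numeric features the model expects."""
    return {
        "age": 2017 - raw["birth_year"],          # Age is more useful than DOB
        "smoker": 1 if raw["smokes"] == "yes" else 0,
    }

def model_score(features):
    """Stand-in for a real predictive model (e.g. the scorecard)."""
    return 350 + features["age"] + 37 * features["smoker"]

def decision_rule(score):
    return "invite for check-up" if score >= 521 else "no action"

def respond(action):
    """Feed the result back to the user in a human-friendly way."""
    return f"Decision: {action}"

raw = {"birth_year": 1972, "smokes": "yes"}
print(respond(decision_rule(model_score(preprocess(raw)))))
```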
Creating an intelligent application:
A. Selection tool for doctors
1. Patient records held in the health authority’s
central database provide the raw data.
2. Every patient’s data is automatically pre-
processed so that it is in the required format to
be scored by the model.
3. Each patient’s (pre-processed) data is presented
to the model and a score generated.
4. Those scoring 521 or more are added to a mailing
list.
5. A letter is sent to everyone on the mailing list,
asking them to make an appointment with their
Doctor.
B. A self-diagnosis tool for patients
1. A phone app is created, which prompts the user
to enter their health data (BMI, Gender, etc.) as
speech.
2. The patient’s verbal responses are translated and
pre-processed so that they are in the format required
by the predictive model.
3. Each patient’s (pre-processed) data is presented
to the model created from the machine learning
process and a score generated.
4. Those scoring 521 or more, are flagged, indicating
that they should be invited for a check-up.
5. A standard verbal response is played back to the
patient.
1. If they score 521 or more play the message: “Please arrange an
appointment with your doctor as soon as possible”
2. If they score below 521 play the message “You are not in a high
risk category. There is no need to contact your doctor.”
Let’s look at 2 ways in which the heart disease model could be deployed
The central tasks of calculating scores and applying decision rules are more or less the same.
It’s the interface to the models, and the way that the results are played back to the user which
differ.
Creating an intelligent application:
• Complex AI applications often deploy multiple, interacting
predictive models, combined with lots of decision rules.
• For the patient self-diagnosis tool on the previous slide, a second
predictive model would typically be incorporated into the app.
 This model would predict what was being spoken, and map this to the
data items required by the heart disease predictive model.
 In effect, the speech model pre-processes spoken information,
transforming it into a format which the second model (the heart disease
prediction model) understands.
• Speech recognition tools are now available “off the shelf.”
Amazon, for example, provides an interface to its speech
recognition software for free, which anyone can incorporate into
their own AI applications.
Business applications:
Assessing how good an AI application is
• Accuracy, Accuracy, Accuracy!
 That’s what most data scientists focus on when applying machine learning to
develop applications based on predictive models.
• As a business user, I have no interest in accuracy per se!
• I don’t care that a model is 84% accurate, has a “GINI2” of 76.2% or
whatever.
• What I’m interested in is my bottom line. Things like:
 How much money will the solution make me?
 How many fewer staff will I need to employ?
 What impact will the solution have on customer service?
 How many more lives will be saved?
• A data scientist needs to work with business people to understand how
their organisation works and how the solution is going to be deployed.
• For example, to evaluate the worth of a machine learning solution to
reduce the number of staff screening CVs in the HR department, they
need to know:
 How many CVs are currently screened per year.
 How long it takes to screen each CV.
 Staff costs.
 What proportion of CVs can be expected to be screened automatically using the
application (it may not be all of them, there will probably be some exception cases).
 How much the machine learning solution is going to cost. This includes up front
costs, annual license fees and any additional maintenance costs.
• Axiom. If a data scientist can’t express the benefits of their solution to you
in terms of your bottom line, then they have not done their job.
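The CV-screening example can be worked through with a quick calculation. All the figures below are hypothetical assumptions for illustration, not numbers from this deck:

```python
# Illustrative cost-benefit calculation for automating CV screening.
# Every figure here is a made-up assumption for the example.

cvs_per_year = 20_000              # CVs currently screened annually
minutes_per_cv = 10                # time to screen one CV manually
staff_cost_per_hour = 30.0         # fully loaded staff cost (currency units)
automation_rate = 0.80             # share of CVs screened automatically
solution_cost_per_year = 50_000.0  # up-front costs, licences and maintenance, annualised

hours_saved = cvs_per_year * automation_rate * minutes_per_cv / 60
gross_saving = hours_saved * staff_cost_per_hour
net_benefit = gross_saving - solution_cost_per_year

print(f"Hours saved per year: {hours_saved:,.0f}")   # 2,667
print(f"Net annual benefit:   {net_benefit:,.0f}")   # 30,000
```

This is exactly the "bottom line" framing the axiom demands: the data scientist's accuracy figure only matters insofar as it drives the automation rate and hence the net benefit.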
30
GINI = 76.2% ≠ $$$
Business applications:
Other considerations
• Is your organisation ready for AI & machine learning?
 If implementing a machine learning based process is going to result in
changes to what people do, redundancies or deskilling, then how is the
organisation going to deal with that?
• Ethical & Legal considerations
 Machine learning has no social awareness and no conscience.
 Patterns are patterns, whatever those patterns may be.
 If the patterns that are found during machine learning are undesirable,
outside social norms, or illegal to use, then this needs to be dealt with.
• For example, if the machine learning process generates
predictive models which result in less favourable treatment
based on gender or ethnic origin, how will you deal with that?
 The statistical evidence might support it, but society won’t tolerate it.
 Even if it's legal, there is a risk of reputational damage if the existence of
bias in your systems is made public.
31
Business applications
• It's very easy to get excited about the maths and the predictive
models generated by the machine learning process.
• However, one must not lose sight of the fact that one is
trying to solve a business problem.
• Having a very accurate predictive model is only useful if you
have a real problem to apply it to. The model has no intrinsic
value in itself!
• Implementing the fruits of machine learning is often a much more
expensive and time-consuming task than applying machine
learning to build a predictive model in the first place.
32
Business applications
Putting together a project team
• It's very tempting to hire a data scientist (or a
consultancy firm) and just let them "get on
with it," hoping that they will deliver
something useful at the end of the day.
• That’s a bit like getting a load of builders to
build a house for you, without first hiring an
architect to design it for you, and not having
a project manager to organise things on site
to make sure you get what you’ve asked for.
• Data scientists are clever people, but they
have a specific skill set, and tend to focus on
the mathematical aspects. They can’t do
everything, and they won’t know all the ins
and outs of your business.
• For any large scale AI/machine learning
project you will need to pull together a
project team, in which the data scientists are
just one contributor.
• A requirements specification, detailing what
the project aims to achieve and how success
will be measured, should be produced at the
outset. Project meetings should be held during
the project to assess progress and to ensure
that things remain focused and on track.
33
Project Team
Role What they do
Project sponsor Senior manager who provides spend
approval and champions the project at
senior forums.
Project manager Owns the end-to-end process for
design, build and implementation of
the solution. This should be someone
who represents the business, not a
supplier.
Subject matter
experts (SMEs)
People who know the relevant area of
the business inside out. They
understand how data is currently used
and the constraints and limitations of
the organisation.
Data scientists The techy people who do machine
learning and develop apps.
Audit / validation Providing independent oversight, to
ensure that the project delivers what it
says it will and meets any legal or
regulatory requirements.
Business applications:
A checklist for business success
• What follows on the next slide is a set of questions that you
should ask yourself, and any data scientists working for
you, to answer BEFORE you move beyond the feasibility
stage of any Machine Learning / AI project.
• Answers should be captured in an appropriate form within
the business requirements of the project.
34
Business applications:
A checklist for success
1. What is your business problem? Machine learning has no intrinsic value in itself. It has to be applied to a specific problem or objective if
it’s going to be of any value.
2. What metric do you wish to optimise? This should be something that can be measured in simple terms and which is important to you.
Time saved, increased profit, reduced cost, customer satisfaction and lives saved, for example. This is what the machine learning process
will be used to optimize.
3. What type of decision rules and actions will you apply? You have to make decisions based on what predictive models tell you, and act
upon those decisions. If you can’t think what you will do with the model, then why have it?
4. Where will the data to do machine learning come from? Machine learning requires at least several hundred, and ideally many
thousands of example cases to work with. The metric of interest (above) needs to be available for these cases.
5. How will you operationalize the machine learning process? What system or process will the predictive models and decision rules be
implemented within? This will require time and money. Who will do it and who will pay for it? The models and decision rules won't
implement themselves.
6. What are the ethical and legal risks associated with automation? If you are making important decisions about people then you may
want to test the system to ensure that it does not display unfair or illegal bias against certain groups.
7. Is all required data available operationally? Make the data scientist provide a checklist against every data item that features in their
models to ensure that it exists within the implementation platform. If it’s not, then you won’t be able to use the solution that’s been
developed.
8. Do you need controls to ensure decisions are acted upon? How will you stop staff overriding decisions made on the basis of the
machine learning process? If over-rides are permitted, how will you monitor them to ensure that staff are making the right decisions?
9. How will you assess the success of the project? The theoretical benefits presented at the end of the analytical phase of a machine
learning project are only indicative of what actually happens in real life. A monitoring process is needed to assess the system when it goes
live. This is so that you can measure the actual benefits that are realized against those that were promised.
10. What are the on-going costs of maintaining the system? Over time, new data, new regulations, changing business requirements, and
so on, will mean that the system will require modification and may need to be redeveloped. How often is this expected to occur and how
much is it likely to cost?
35
Appendix A. Common Terms used in AI and Machine
Learning
• Activation function. An equation (mathematical formula) used to transform the raw output of a neuron into its final output (score). Activation
functions are often used to force the output of neurons to lie in a fixed range, usually between 0 and 1, but some result in outputs with different
ranges. Common activation functions are the logistic function, the hyperbolic tangent function and the identity function. See
https://en.wikipedia.org/wiki/Activation_function for a more complete list of activation functions and their derivation.
• Algorithm. An algorithm is a set of instructions or procedures, executed in sequence to solve a particular problem. In machine learning various
algorithms are applied to discover patterns in data and to create predictive models.
• Big Data. A very large collection of varied, changeable data, which is difficult to process using a standard PC/Server; i.e. typically terabytes in size
or larger. Big Data technologies make use of advanced computer architectures and specialist software to facilitate the rapid processing of these data
sets. Machine learning is one of the primary tools used to extract value from Big Data.
• Causation. The reasoning behind why something happened; i.e. the cause of something. Eating too much causes one to become fat. Not to be
confused with correlation.
• Classification model. A predictive model which predicts if a given event will or will not occur. For example, the likelihood that two people will hit it off
on a date, the probability that someone will buy a certain type of car, or the chance that a person develops diabetes sometime in the next 10 years.
The score generated by a classification model is an estimate of the likelihood of the event occurring. Not to be confused with regression model.
• Cluster. Clusters are groups that contain people or things with similar traits. For example, people with similar ages and incomes might be in one
cluster, those with similar job roles and family sizes in another. Clustering algorithms are one form of unsupervised learning (see below).
• Convolutional (neural) network. A type of deep neural network where not all neurons in a layer are connected to all the neurons in the next layer.
With fewer weights, the (very considerable) time required to train a network is much reduced. A classic application is object recognition. Imagine you
have an image comprising 256 * 256 (65,536) pixels. A traditional neural network would need 65,536 weights for each neuron in the first layer, one
for each pixel! By segmenting the image into, say, 64 sub-regions, each with 32 × 32 pixels, and only connecting neurons within each sub-region, the
number of weights is reduced to just 1,024 per neuron. This works because the most important features in images are usually close together (i.e. in
the same or neighbouring sub-regions). Pixels at opposite sides of the image are much less important.
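The weight arithmetic in the example above can be verified directly:

```python
# Weight counts for the object recognition example above.
pixels = 256 * 256                  # fully connected: one weight per pixel
print(pixels)                       # 65536 weights per first-layer neuron

sub_regions = 64
region_pixels = 32 * 32             # each sub-region is 32 x 32 pixels
print(sub_regions * region_pixels)  # 65536: the same total image...
print(region_pixels)                # ...but only 1024 weights per neuron
                                    # connected to its own sub-region
```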
36
Appendix A. Common Terms used in AI and Machine
Learning
• Correlation. One variable is correlated with another if a change in that variable occurs in tandem with a change in the other. It is important to
appreciate that this does not necessarily mean that one thing is caused by another (causation), although it might be. Stork migration is correlated
with births (more children born in the spring when storks migrate), but stork migration does not cause births!
• Cut-off score, see decision rule.
• Data science. The name given to the skill/art of being able to combine mathematical (machine learning) knowledge with data and IT skills in a
pragmatic way, to deliver practical, value-adding machine learning based solutions.
• Data scientist. The name given to someone who does data science. Good data scientists focus on delivering useful solutions that work in real world
environments. They don’t get too hung up on theory. If it works it works!
• Decision rule. Predictive models generate scores. Decision rules are then used to decide how people are treated on the basis of the score. Those
who score above a given score (the cut-off score) receive one treatment, those scoring below the cut-off another. For example, when assessing
people for a medical condition, only those scoring above the cut-off; i.e. those with the highest risk of developing the condition, are offered treatment.
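A decision rule reduces to a simple comparison against the cut-off score. A minimal sketch of the medical example above (the cut-off value is an assumption for illustration):

```python
CUT_OFF = 0.7  # hypothetical cut-off score

def treatment(score):
    """Apply the decision rule: treat only those scoring above the cut-off."""
    return "offer treatment" if score >= CUT_OFF else "no action"

print(treatment(0.85))  # offer treatment
print(treatment(0.40))  # no action
```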
• Decision tree. A type of predictive model, created using an algorithm that recursively segments a population into smaller and smaller groups. Also
known as a Classification and Regression Tree or CART (because they can be used for both classification and regression!)
• Deep learning. Predictive models based on complex neural networks (or related architectures), containing very large numbers of neurons and many
layers and/or complex overlapping networks. These tools are proving successful at complex “AI” type problems such as object recognition and
language translation.
• Development sample. The set of data used by the machine learning process to construct a predictive model. Development samples need to contain
at least several hundred cases, but usually larger samples are used, containing thousands or millions of cases.
• Ensemble. A predictive model comprised of several subsidiary models (as few as 3 or sometimes many thousands). All of the subsidiary models
predict the same outcome, but have been constructed using different methods and/or using different data. This means that the models don’t always
generate the same predictions. The ensemble combines all of the individual predictions to arrive at a final prediction. Model ensembles often (but not
always) significantly outperform the best individual model.
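A minimal sketch of an ensemble, combining the scores of several subsidiary models by simple averaging (the three "models" here are trivial stand-ins, not real trained models):

```python
# Three stand-in subsidiary models that predict the same outcome
# from the same input, but in slightly different ways.
models = [
    lambda x: 0.9 * x,
    lambda x: 0.8 * x + 0.05,
    lambda x: x ** 2,
]

def ensemble_score(x):
    """Combine the individual predictions by simple averaging."""
    return sum(m(x) for m in models) / len(models)

# Individual predictions at x = 0.5 are 0.45, 0.45 and 0.25.
print(round(ensemble_score(0.5), 3))  # (0.45 + 0.45 + 0.25) / 3 = 0.383
```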
37
Appendix A. Common Terms used in AI and Machine
Learning
• Feed forward neural network. A neural network in which the connections are all one way, from one layer of neurons to the next. There are no
backward or inter-layer connections. Most neural networks are feed forward networks.
• Forecast horizon. Many (but not all) predictive models forecast future unknown events or quantities. The forecast horizon is the time frame over
which a model predicts. Marketing models typically predict response over a forecast horizon of hours or days. In medicine, forecast horizons of many
years are used when predicting survival probabilities for particular conditions.
• Gains chart. A commonly used visual tool used to demonstrate the benefits of a model. Often used in conjunction with a lift chart.
• Gini Coefficient. A popular measure for assessing the discrimination (accuracy) of a predictive model. The GINI is directly related to the Area
Under the Curve (AUC): GINI = 2 × AUC − 1.
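The standard relationship GINI = 2 × AUC − 1 can be expressed as a one-liner; note that an AUC of 0.881 corresponds to the GINI of 76.2% quoted earlier in this deck:

```python
def gini_from_auc(auc):
    """Convert an AUC value to the corresponding GINI coefficient."""
    return 2 * auc - 1

print(round(gini_from_auc(0.881), 3))  # 0.762, i.e. a GINI of 76.2%
print(round(gini_from_auc(0.5), 3))    # 0.0: a random model has no discrimination
```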
• Internet of Things (The). This describes intelligent everyday devices such as fridges, washing machines and even light bulbs which can be
connected to the internet or other systems. This enables data about their use to be gathered, and for users to be able to control these devices
remotely e.g. set the washing machine going when I’m on my way home via a smartphone app.
• Lift chart. A widely used graphical tool for demonstrating the practical benefits of a predictive model. Often used in conjunction with a Gains chart.
• Linear model. A popular type of predictive model that is easy to understand and use. A score is calculated by multiplying the value of each
characteristic by its relevant weight, and then summing up all the results. A popular way of representing linear models is in the form of a scorecard.
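The linear model calculation is just a weighted sum. A minimal sketch, where the characteristics, weights and applicant values are all hypothetical:

```python
# A linear model score: multiply each characteristic by its weight, then sum.
# All weights and applicant values are hypothetical.
weights = {"age": 0.02, "income": 0.00001, "years_at_address": 0.05}
applicant = {"age": 40, "income": 35_000, "years_at_address": 6}

score = sum(weights[k] * applicant[k] for k in weights)
print(round(score, 2))  # 0.8 + 0.35 + 0.3 = 1.45
```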
• Linear regression. One of the oldest and most popular methods for creating regression models. The development of linear regression dates back
more than 100 years.
• Logistic regression. A very simple and popular method used for creating classification models.
• Machine learning. Machine learning is the process of finding features (patterns in data). Machine learning algorithms derive from research into
artificial intelligence and pattern recognition. The algorithms used to train (deep) neural networks and support vector machines are two examples of
machine learning approaches.
• Model. A mathematical representation of a real world system or situation. The model is used to determine how the real world system would behave
under different conditions. See also predictive model.
38
Appendix A. Common Terms used in AI and Machine
Learning
• Monitoring. The performance of predictive models tends to deteriorate over time. It is therefore prudent to instigate a monitoring regime following model
implementation to measure how models are performing on an ongoing basis. Models are redeveloped when the monitoring indicates that a significant deterioration
in model performance has occurred.
• Neural network. A popular type of predictive model derived using machine learning. Neural networks are well suited to capturing complex interactions and non-linearities
in data in a way that is analogous to human learning. "Deep" neural networks (deep learning / deep belief networks) are very large and complex neural
networks (often containing thousands or millions of artificial neurons) which are used for "AI" tasks such as speech recognition and in self-driving cars.
• Neuron. The key component of a neural network, which is often discussed as being analogous to biological neurons in the human brain. In reality, a neuron is a
linear model whose score is then subject to a (non-linear) transformation. A neural network can therefore be considered as a set of interconnected linear models and
non-linear transformations.
• Odds. A way to represent the likelihood of an event occurring. The odds of an event is equal to (1/p) − 1, where p is the probability of the event. Likewise, the
probability is equal to 1/(odds + 1). So, odds of 1:1 is the same as a probability of 0.5, odds of 2:1 a probability of 0.33, 3:1 a probability of 0.25 and so on.
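These conversions can be checked with a couple of one-line functions:

```python
def odds_from_probability(p):
    """Odds of an event = (1 / p) - 1."""
    return 1 / p - 1

def probability_from_odds(odds):
    """Probability of an event = 1 / (odds + 1)."""
    return 1 / (odds + 1)

print(odds_from_probability(0.5))       # 1.0: odds of 1:1
print(probability_from_odds(3))         # 0.25: odds of 3:1
print(round(probability_from_odds(2), 2))  # 0.33: odds of 2:1
```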
• Over-fitting. The bane of a data scientist’s life. Over-fitting occurs when an algorithm goes too far in its search for correlations in the data used to develop a
predictive model. The net result is that the model looks to be very predictive when measured against the development sample, but performs very poorly when it’s
used to predict new outcomes using data that has not been presented to the model before.
• Over-ride rule. Sometimes, certain actions must be taken regardless of the prediction generated by a model. For example, a predictive model used to target people
with offers for beer might predict that some children are very likely to take up the offer. An override rule is therefore put in place to prevent offers being sent to
children, regardless of the score generated by the model.
• Predictive analytics (PA). The application of machine learning techniques to generate predictive models. Some would argue that for all practical purposes machine
learning and predictive analytics are more or less the same thing, given that they use the same types of data as inputs, apply the same type of algorithms and
generate similar outputs (scores).
• Predictive model. A predictive model is the key output produced by a machine learning algorithm. The model captures the relationships (correlations) that the
process has discovered. Once a predictive model has been created, it can then be applied to new situations to predict future, or otherwise unknown, events.
• Predictive modelling, see predictive analytics.
39
Appendix A. Common Terms used in AI and Machine
Learning
• Python. Along with R, Python is one of the most popular, freely available, software packages that is used for machine learning.
• R. The R language has become one of the most popular software platforms for undertaking machine learning. The R software is free and open
source, with the user community able to develop new functionality and share it with other users. For those wanting to become data scientists, then a
good grounding in R (or Python) is recommended. Some books on R and Python are discussed in Appendix B.
• Random forest. Random forests are an ensemble method, based on combining together the output of a large number of decision trees, where
each decision tree has been created under a slightly different set of conditions. Random forests are one of the most successful and widely applied
ensemble methods.
• Recurrent neural network. A type of neural network that uses the outputs from previous cases in the development sample as inputs for
subsequent training. These types of network are particularly useful where there are sequential patterns in data; i.e. the ordering of cases in the
development sample is important. One example is the analysis of video. The changes in state from one image to the next (i.e. as objects move) can
provide a lot of information about the nature of the current image.
• Regression model. A popular tool for predicting the magnitude of something. For example, how much someone will spend or how long they will live.
This is in contrast to a classification model which predicts the likelihood of an event occurring.
• Response (choice) modelling. This term refers to marketing models used to predict the likelihood that someone buys a product or service that they
have been targeted with. A response model is a type of classification model.
• Score. Most predictions generated by predictive models are represented in the form of a number (a score). For a classification model the score is a
representation of the probability of an event occurring e.g. how likely someone is to respond to a marketing communication or the probability of them
defaulting on a loan. For a regression model the score represents the magnitude of the predicted behaviour e.g. how much someone might spend, or
how long they might live.
• Scorecard. A scorecard is a way of presenting linear models which is easy for non-experts to understand. The main benefit of a scorecard is that it
is additive; i.e. a model score is calculated by simply adding up the points that apply. There is no multiplication, division or other more complex
arithmetic.
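The additive nature of a scorecard can be sketched as a simple points look-up; the characteristics, bands and point values below are hypothetical:

```python
# A scorecard version of a linear model: look up points and add them up.
# Point values are hypothetical; no multiplication is needed at run time.
scorecard = {
    "age": {"18-30": 10, "31-50": 25, "51+": 40},
    "residential_status": {"renter": 5, "owner": 20},
}

def scorecard_score(applicant):
    """Sum the points that apply to each of the applicant's characteristics."""
    return sum(scorecard[char][band] for char, band in applicant.items())

print(scorecard_score({"age": "31-50", "residential_status": "owner"}))  # 25 + 20 = 45
```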
40
Appendix A. Common Terms used in AI and Machine
Learning
• Score distribution. A table or graph showing how the scores from a predictive model are distributed across the population of interest. Lift and Gain
charts are both ways of presenting the score distribution in a graphical form.
• Sentiment analysis. This is a popular technique for extracting information about people's attitudes towards things. For example, if they had a positive
or negative experience when using a particular product or service. In machine learning, sentiment analysis is used to extract information from text or
speech that is then used to build predictive models.
• Supervised learning. The application of machine learning where each case in the development sample has an associated outcome which one
wants to predict. The cases are said to be “Labelled.” An example of supervised learning in target marketing is where each customer’s response to
marketing activity is known (they either responded or they didn't). The model generated by the algorithm is then optimized so as to predict if
customers will respond to marketing or not. In practice, most machine learning approaches are examples of supervised learning.
• Support vector machine. An advanced type of non-linear model. Support vector machines have some similarities with neural networks.
• Training Algorithm. The term used to describe iterative machine learning algorithms which are used to determine the structure of predictive
models. The term is most widely used in relation to neural network type models. The algorithm repeats a number of times, adjusting the weights in
the network so as to optimize the performance of the network. The training algorithm terminates after a fixed number of iterations or when no further
significant improvements in model performance are obtained.
• Unsupervised learning. The application of machine learning where cases in the development sample don’t have an associated outcome. Cases
are said to be “Unlabelled.” Unsupervised algorithms typically seek to group cases with similar characteristics (features) together. An example of
unsupervised learning applied to target marketing is where there is no information about previous customers' responses to marketing activity.
Clustering is applied to group similar types of customer together based on their age, income, gender etc. Marketing activity is then targeted at
specific clusters of individuals.
• Validation sample. An independent data set used to evaluate a predictive model after it has been constructed. The validation sample should be
completely separate from the development sample, and should not be used during model construction. Using one (or more) validation samples is
important, because machine learning sometimes over-fits a model to the development sample. This means that if you evaluate a predictive model
using the development sample, it can appear to be more predictive than it actually is.
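The danger described here can be demonstrated with a deliberately over-fitted "model" that simply memorises its development sample. The data is made up for illustration:

```python
# A "model" that memorises its development sample: a classic over-fit.
# (input, outcome) pairs; all values are made up for illustration.
dev = [(1, 1), (2, 0), (3, 1), (4, 1), (5, 0)]   # development sample
val = [(6, 1), (7, 0), (8, 1), (9, 0), (10, 1)]  # validation sample

memory = dict(dev)

def predict(x):
    return memory.get(x, 0)  # perfect recall on dev; guesses 0 otherwise

dev_acc = sum(predict(x) == y for x, y in dev) / len(dev)
val_acc = sum(predict(x) == y for x, y in val) / len(val)
print(dev_acc)  # 1.0 on the development sample: looks highly predictive
print(val_acc)  # 0.4 on the validation sample: it doesn't generalise
```

Evaluating only on the development sample would (wrongly) suggest a perfect model; the independent validation sample exposes the over-fitting.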
41
Appendix B. Notes
• 1. In a real world application, one would produce the score
distribution from a second, independent validation sample.
This is to ensure that the results have not been biased, due
to some problem with the algorithm.
• 2. The GINI (and the related AUC measure) is a very
common measure used by data scientists to assess the
discriminatory ability of predictive models.
42
Artificial Intelligence and
Machine Learning for Business.
A No-Nonsense Guide to Data Driven Technologies
June 2017
43
Comparison_between_Waterfall_and_Agile_m (1).pptxComparison_between_Waterfall_and_Agile_m (1).pptx
Comparison_between_Waterfall_and_Agile_m (1).pptx
 
Introduction to Data Science.pptx
Introduction to Data Science.pptxIntroduction to Data Science.pptx
Introduction to Data Science.pptx
 
FDS Module I 24.1.2022.ppt
FDS Module I 24.1.2022.pptFDS Module I 24.1.2022.ppt
FDS Module I 24.1.2022.ppt
 
FDS Module I 20.1.2022.ppt
FDS Module I 20.1.2022.pptFDS Module I 20.1.2022.ppt
FDS Module I 20.1.2022.ppt
 
AgileSoftwareDevelopment.ppt
AgileSoftwareDevelopment.pptAgileSoftwareDevelopment.ppt
AgileSoftwareDevelopment.ppt
 
Agile and its impact to Project Management 022218.pptx
Agile and its impact to Project Management 022218.pptxAgile and its impact to Project Management 022218.pptx
Agile and its impact to Project Management 022218.pptx
 
Data_exploration.ppt
Data_exploration.pptData_exploration.ppt
Data_exploration.ppt
 
state-spaces29Sep06.ppt
state-spaces29Sep06.pptstate-spaces29Sep06.ppt
state-spaces29Sep06.ppt
 

Recently uploaded

IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZTE
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLDeelipZope
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learningmisbanausheenparvam
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 

Recently uploaded (20)

IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCL
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 

ArtificialIntelligenceandMachineLearningforBusiness.pptx

What is Artificial Intelligence (AI)?
• Examples of AI in practice:
 The ability to identify people from their Facebook photos.
 Assessing someone’s creditworthiness more accurately than an experienced underwriter.
 The ability to beat the best go and chess players.
 Digital personal assistants, such as Amazon’s Echo and Apple’s Siri.
 Being better at spotting the signs of cancer on a medical scan than an expert radiologist.
 Travelling from A to B and avoiding hitting things on the way (self driving cars and autonomous robots).
• At one level, AI applications can seem almost magical. However, like most things, when you get under the bonnet the mystique evaporates.
What is Artificial Intelligence (AI)?
• A mistake to avoid is thinking that current AI applications are in any way intelligent in a human, conscious way.
 It’s all just clever maths at the end of the day.
• Despite the hype appearing in the media, most experts agree that we are years away from creating a machine with a true human-like sense of self, which could pass as human day in, day out.
 That’s not to say there aren’t some very good chatbots out there!
• General AI research has many strands, with lots of different avenues being explored.
• From a business perspective, almost all practical applications of AI in use today are based on machine learning.
 It’s not technically correct, but from a practical perspective, one can argue that machine learning and artificial intelligence are more or less the same thing.
What is Machine Learning?
• Machine learning involves the use of mathematical procedures (algorithms) to analyze data.
• The goal is to discover useful patterns (relationships/correlations/associations) between different items of data.
 People who buy chicken often buy white wine as well.
 Lack of exercise is correlated with heart disease.
 A poor credit history is associated with higher insurance claims.
 Washing machines are cube shaped; most umbrellas are long and thin (when closed).
• Once the relationships have been identified, they can be used to make inferences about the behavior of new cases when they present themselves. Appropriate actions can then be taken.
 Send discount coupons for white wine to people who buy chicken.
 Offer non-active people free gym membership.
 Don’t offer the best insurance deals to people with a poor credit history.
 If an object is cube shaped, it’s more likely to be a washing machine than an umbrella.
• In essence, this is analogous to the way people learn.
 We observe what goes on around us and draw conclusions about how the world works.
 We then apply what we have learnt to help us deal with new situations that we find ourselves in.
 The more we experience and learn, the better our ability to make decisions becomes.
• The end product of the machine learning process is a predictive model (or just “model” going forward).
 The model captures the key relationships that the machine learning process has identified.
What is Machine Learning? An Example.
• Imagine that the government wants to screen everyone in the population for their likelihood of developing heart disease in the next 5 years.
• Those at highest risk will be contacted and invited for a check-up and lifestyle review, to try and reduce their risk of developing heart disease in the future.
• Lifestyle reviews are expensive. The government can only afford to invite 5% of the population for a review.
• Machine learning can be applied to examine historic patient records to find the patient characteristics that correlate with heart disease. The predictive model created by the machine learning process is then used to identify those individuals most at risk.
• Machine learning needs data, and lots of it, to work effectively. Therefore, the government takes a large development sample of, say, 500,000 historic patient records from 5 years ago. Each case is then flagged (labelled) to indicate who got heart disease and who did not in the following 5 years.
 30,000 did get heart disease.
 470,000 did not get heart disease.
• A common mistake with data analysis is to record data only about the event of interest and to forget about the “non-events.” In this case, having information about people who didn’t get heart disease is just as important as having information about those that did, because the machine learning process works by identifying the patterns that differentiate the two event types.
• The data contains lots of information about each person, such as their gender, income, age, alcohol intake, BMI, blood pressure, whether they smoke, and so on.
What is Machine Learning? An Example.
• In this example, the machine learning process is used to generate a scorecard type model.
• Scorecards are very simple and easy to understand, as in the example below.
• The score, calculated by adding up all the points that apply, represents the individual’s risk. The training algorithm has used the data available to determine what the points in the scorecard should be; i.e. the points represent the patterns that the machine learning process has identified.

Starting score (constant): 350
• Age (years): <23: -57 | 23-32: -26 | 33-41: 0 | 42-48: +7 | 49-57: +15 | 58-64: +24 | 65-71: +31 | >71: +65
• Gross annual income ($): <$22,000: +11 | $22,001-$38,000: +6 | $38,001-$60,000: 0 | $60,001-$94,000: -3 | $94,001-$144,000: -5 | >$144,000: -6
• Smoker?: Yes: +37 | No: 0
• BMI (weight in kg / {height in metres}²): <19: +2 | 19-26: 0 | 27-29: +8 | 30-32: +14 | >32: +29
• Diabetic?: Yes: +21 | No: 0
• Gender: Male: +2 | Female: -4
• Cholesterol level (mg per decilitre of blood): Low (<160 mg): -2 | Normal (160-200 mg): 0 | High (201-240 mg): +19 | Very high (>240 mg): +32
• Alcohol consumption (units/week): 0: +4 | 1-12: 0 | 13-24: +5 | 25-48: +10
• Blood pressure: Low (below 90/60): +3 | Average (between 90/60 and 140/90): 0 | High (above 140/90): +36

For a 45 year old female with the following characteristics:
• A BMI of 28.
• Drinks an average of 6 units of alcohol a week.
• An honours degree graduate.
• An income of $50,000 a year.
• Smokes.
• Is not diabetic.
• Normal cholesterol levels.
• Low blood pressure.

The starting score is 350. Seven points are added due to her age, which brings the score to 357. Eight points are then added based on her BMI, and so on. After adding/subtracting all of the relevant points, the final score is 404.

Note that graduate status does not feature in the model. This indicates that the algorithm found no correlation between being a graduate (or not) and heart disease.
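Applying a scorecard is easy to sketch in code. The snippet below is a minimal illustration using a few of the point values from the table above; the function names are ours, and only a subset of the characteristics (age, smoker, diabetic, gender) is modelled.

```python
# A minimal sketch of applying the scorecard. Only four characteristics are
# modelled; the full scorecard also adds points for income, BMI, cholesterol,
# alcohol consumption and blood pressure.
STARTING_SCORE = 350

def age_points(age):
    """Points for the age bands shown in the scorecard."""
    if age < 23: return -57
    if age <= 32: return -26
    if age <= 41: return 0
    if age <= 48: return 7
    if age <= 57: return 15
    if age <= 64: return 24
    if age <= 71: return 31
    return 65

def score(age, smoker, diabetic, female):
    total = STARTING_SCORE
    total += age_points(age)
    total += 37 if smoker else 0
    total += 21 if diabetic else 0
    total += -4 if female else 2
    return total

# A 45 year old male non-smoker who is not diabetic: 350 + 7 + 2 = 359.
print(score(45, smoker=False, diabetic=False, female=False))  # 359
```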
What is Machine Learning? An Example.
• Below is a score distribution table.
• It has been created by running all 500,000 cases in the development sample against the scorecard to generate a score for each case.
• The scores have been put into ranges (for convenience).
• To generate a prediction, one calculates a person’s score, and then reads across to the right-most column.
• Someone scoring 303 falls into the second row of the table. The proportion of people in the 301-320 score range who were observed to develop heart disease over a five year period was 0.12%.
• Therefore, our prediction for someone scoring 303 today is that there is a 0.12% chance of them developing heart disease in the next 5 years.
• The 45 year old woman who scored 404 falls in the 401-420 score range.
• Reading across, the proportion of people who scored between 401 and 420 who went on to develop heart disease was 1.76%.
• i.e. the model predicts that she has a 1.76% (1 in 57) chance of developing heart disease in the next five years.

Score range | Number of people | % of population | Number with heart disease after 5 yrs | % with heart disease after 5 yrs
0-300   | 55,950  | 11.19%  | 40    | 0.07%
301-320 | 56,606  | 11.32%  | 68    | 0.12%
321-340 | 59,700  | 11.94%  | 129   | 0.22%
341-360 | 58,706  | 11.74%  | 216   | 0.37%
361-380 | 64,429  | 12.89%  | 403   | 0.63%
381-400 | 52,749  | 10.55%  | 575   | 1.09%
401-420 | 34,089  | 6.82%   | 600   | 1.76%
421-440 | 21,107  | 4.22%   | 632   | 2.99%
441-460 | 17,269  | 3.45%   | 878   | 5.09%
461-480 | 23,364  | 4.67%   | 2,020 | 8.65%
481-500 | 17,477  | 3.50%   | 2,553 | 14.61%
501-520 | 13,554  | 2.71%   | 3,366 | 24.84%
521-540 | 7,103   | 1.42%   | 3,463 | 48.76%
541-560 | 8,260   | 1.65%   | 6,587 | 79.74%
561-999 | 9,637   | 1.93%   | 8,469 | 87.88%
Total   | 500,000 | 100.00% | 30,000 | 6.00%
What is Machine Learning? An Example.
• OK. So how do we use the prediction (score)?
• The government has indicated that it can afford to invite 5% of the population for a check-up.
• We use the score distribution to identify the “best” 5% to invite, using the % of population column.
• The three highest score ranges (521-540, 541-560 and 561-999) together account for 1.42% + 1.65% + 1.93% = 5% of the population.
• So our decision rule is to invite anyone scoring 521 or more for a check-up.
• By doing this, we will identify 3,463 + 6,587 + 8,469 = 18,519 out of the 30,000 people who will get heart disease; i.e. 62% of cases.
• That’s pretty good, isn’t it?
• When the model is applied across the entire population of many millions, we can expect to identify 62% of cases within just 5% of the population.
• The model is far better than a random selection process, which would only identify 6% of cases.
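The prediction and decision steps amount to a table lookup plus a threshold. A sketch, with the band figures copied from the score distribution table (the function names are ours):

```python
# (low score, high score, % observed to develop heart disease within 5 years),
# copied from the score distribution table.
SCORE_BANDS = [
    (0, 300, 0.07), (301, 320, 0.12), (321, 340, 0.22), (341, 360, 0.37),
    (361, 380, 0.63), (381, 400, 1.09), (401, 420, 1.76), (421, 440, 2.99),
    (441, 460, 5.09), (461, 480, 8.65), (481, 500, 14.61), (501, 520, 24.84),
    (521, 540, 48.76), (541, 560, 79.74), (561, 999, 87.88),
]

INVITE_THRESHOLD = 521  # decision rule: invite the highest scoring ~5%

def predicted_risk(score):
    """Look up the predicted 5-year heart disease risk (%) for a score."""
    for low, high, pct in SCORE_BANDS:
        if low <= score <= high:
            return pct
    raise ValueError("score outside the 0-999 range")

def invite_for_checkup(score):
    return score >= INVITE_THRESHOLD

print(predicted_risk(404))       # 1.76
print(invite_for_checkup(404))   # False
```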
Example 2: Object recognition
• The goal of an object recognition system is to identify what object is present in an image; e.g. a photo.
• The underlying principles are the same as for the heart disease problem:
 A large development sample is taken.
 The sample contains thousands of images of objects such as washing machines, chairs, umbrellas and so on.
• Many different images of the same object, taken from different angles.
• Images of different versions of that object; e.g. different types of chairs and umbrellas.
 Each image is labelled to say what type of object it is.
• Picture 1 is a chair.
• Picture 2 is a washing machine.
• Picture 3 is another washing machine.
• Picture 4 is an umbrella.
• Etc.
Example 2: Object recognition
• The data in this case are the colour and brightness of the individual pixels in the image:
 Pixel 1
 Pixel 2
 …
 Pixel N
• The pixels that form each image are analogous to the patient data (BMI, age, etc.) for the heart disease problem.
• Lots of pixels! A low(ish) resolution image of 256 x 256 pixels has 65,536 pixels in total.
• This is a big problem, requiring powerful computers and lots of processing time!
 A lot more complex than the heart disease problem, even though the underlying principles are very similar.
Example 2: Object recognition
• A separate score is created to represent the probability of the item in the image being a particular object.
 A complex system could therefore generate thousands of separate scores.
• When a new image is presented to the model, a score is calculated for every possible object.
 If we used scorecards to do this, this would mean a separate scorecard to calculate a score for each object.
• The object with the highest score (highest probability) is the one selected.
 i.e. the object “predicted” by the model.

Object | Score (Probability)
Washing machine | 0.07
Umbrella | 0.04
Cup | 0.03
Chair | 0.62
Bathtub | 0.02
Table | 0.16
Shoe | 0.01
Etc. | …
Different types of predictive models
• A scorecard is just one type of predictive model.
• There are lots of other types of model, some far more complex than scorecards.
 Some are much more predictive than scorecards in some cases, but not always.
 Some types of model are better suited to certain types of problem than others.
 For some types of problem, many different types of model perform equally well.
 There is no one type of predictive model that can be said to outperform all others in all situations.
• However, the way models are used is identical, regardless of the model type:
1. The model generates numbers (scores) based on the information presented to it.
2. The score(s) from the model represent the thing(s) one is trying to predict.
3. A decision is then taken about what action to take on the basis of the score(s).
• Let’s take a look at some other types of predictive model.
Different types of predictive models: Decision trees
• This decision tree has been created using the same 500,000 cases used to develop the scorecard type model presented earlier.
• To calculate a score for someone using the decision tree, one starts with the top node of the tree (the entire population). At each branching point, the node is chosen that fits that person’s individual characteristics.
• The first branching is based on age. If a person is less than 52 years old, the left hand side of the tree is followed. If they are aged 52 years or more, then the right hand branch is followed.
• The process is repeated at each decision point until there are no further decisions to be made. The final node (called an end node or leaf node) into which a person falls determines their score.
• There are 13 end nodes, and hence 13 scores that someone can receive. Those scoring 1 have the lowest chance of developing heart disease; those scoring 13 the highest.
• Note that the scorecard and decision tree contain slightly different information. The scorecard includes annual income, but the decision tree does not. Likewise, “Graduate” features in the decision tree model but not the scorecard. This is a feature of the different algorithms used to create each type of model.
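Traversing a decision tree reads naturally as nested if-statements. The full 13-leaf tree appears as a diagram in the deck, so the sketch below is a hypothetical two-level miniature that only illustrates the traversal mechanics; the first split on age 52 is from the slide, but the smoker split and the leaf scores are invented for illustration.

```python
def tree_score(age, smoker):
    """Hypothetical miniature tree; the real model has 13 leaf nodes (scores 1-13)."""
    if age < 52:                        # the slide's first split is on age 52
        return 1 if not smoker else 4   # illustrative leaf scores only
    return 7 if not smoker else 13

print(tree_score(45, smoker=False))  # 1
```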
Different types of predictive models: Decision trees
• The decision tree is used in the same way as the scorecard to make decisions, using a score distribution table.
• The only major difference is that whereas the scorecard created scores in the range 0-999, there are just 13 possible scores generated by the decision tree.
• Given that the decision tree and scorecard were developed using different algorithms, their predictive accuracy will be different.
• In this example, the scorecard is slightly better than the decision tree (as measured by the proportion of heart disease cases in the highest scoring 5%), but for another problem the results could be different.
Different types of predictive models: (Artificial) Neural Networks (NNs)
• Our brains contain billions upon billions of neurons, interconnected to yield trillions of connections. It’s the combination of neurons and the connections between them that drives our ability to think and act intelligently, to understand how the world works, and to make decisions (both consciously and unconsciously) about how we interact with the world.
• Back in the 1950s, computer scientists first came up with the concept of the “artificial neuron,” but it wasn’t until the mid-1980s that using multiple artificial neurons linked together (a neural network) was proposed as a way to solve complex problems. Since that time, the popularity of neural networks has continued to grow, and advanced forms of neural networks are currently at the forefront of research into artificial intelligence and machine learning.
• The first thing to say about neural networks is that while they are sometimes touted as being immensely complex “brain like” things, they can be readily understood if one is willing to spend a little time and effort studying them.
• It’s certainly true that a typical neural network is more complex than the scorecards and decision trees introduced in previous slides, but the underlying principles are not that much different at the end of the day; there is just more of it.
Different types of predictive models: (Artificial) Neural Networks (NNs)
• A neuron in machine learning terms is not a living thing grown by a mad scientist in a covert lab somewhere, but a simplistic representation of how natural neurons behave, created using equations and implemented as computer code.
• The way a neuron works is as follows:
1. The patient data, such as age, BMI, etc., provides the inputs to the neuron.
2. Each input is multiplied by a weight (which can be positive or negative). For non-numeric data, such as gender or smoker, 0/1 flags are used to represent each condition; for example, 1 if female, 0 if male.
3. The weighted inputs are added up to get an initial score.
4. The initial score is transformed, often to force it to lie in the range 0-1. This transformation(a) is not absolutely essential, but is “good practice”: when the neuron is combined with others to produce a neural network, all the neurons then produce values in the same range; i.e. between 0 and 1.
5. The transformed version of the initial score is the output score produced by the neuron.

(a) I promised no complex formulas, but I’ll throw just one into the mix. There are many transformations (activation functions) that can be applied. A very common one is the “logistic function,” calculated as 1/(1 + e^(-Initial Score)). The value of e is approximately 2.718; e is a bit like pi (3.142) in that it appears in all sorts of places in mathematics and statistics.
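The five steps above fit in a couple of lines of code. This sketch uses invented weights purely for illustration; a real model’s weights come from training.

```python
import math

def neuron(inputs, weights):
    """One artificial neuron: weighted sum followed by the logistic function."""
    initial_score = sum(x * w for x, w in zip(inputs, weights))  # steps 2-3
    return 1 / (1 + math.exp(-initial_score))                    # step 4: logistic

# Three inputs (e.g. age, BMI, smoker flag) with illustrative weights.
output = neuron([45, 28, 1], [0.01, 0.02, 0.5])
print(0 < output < 1)  # True: the logistic function keeps the score in (0, 1)
```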
  • 20. Different types of predictive models: (Artificial) Neural Networks (NNs) • So a neuron in the context of AI / machine learning isn’t anything mysterious or complex. It’s just a couple of simple formulas.  The first formula multiplies each of the input variables by a weight and adds up the total.  The second formula then modifies (transforms) this value so that it lies within a fixed range, usually between 0 and 1.  It’s as simple as that! • The clever bit is determining what the weights should be. • A single neuron is not much different from a scorecard type model. The only material difference is the transformation to force the score to lie in a fixed range. • It can be demonstrated that the outputs generated by a neuron and a scorecard type model are in fact equivalent. 20
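The two formulas above can be sketched in a few lines of Python. The patient values and weights below are made up purely for illustration; a real model's weights would come from the training process described later.

```python
import math

def logistic(x):
    # A common activation function: squashes any value into the range 0-1.
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights):
    # First formula: multiply each input by its weight and sum to get the initial score.
    initial_score = sum(i * w for i, w in zip(inputs, weights))
    # Second formula: transform the initial score so the output lies between 0 and 1.
    return logistic(initial_score)

# Illustrative patient data: Age, BMI and a 0/1 flag for gender (1 = female).
patient = [45, 27.5, 1]
weights = [0.02, 0.05, -0.30]  # made-up weights, not a trained model

print(round(neuron(patient, weights), 3))  # a score between 0 and 1
```

Note that without the logistic transformation the neuron is just a scorecard-style weighted sum, which is the equivalence mentioned above.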
  • 21. Different types of predictive models: (Artificial) Neural Networks (NNs) 21 • A neural network is created by combining several neurons together. The key features of this neural network are: 1. The inputs to each neuron in a given layer are the same (e.g. Age, BMI etc. for the first layer neurons), but the weights within each neuron are different. Therefore, each neuron produces a different score. 2. The outputs (the scores) from the first set of neurons (the first layer) provide the inputs to the second set of neurons (the second layer). 3. There are four neurons in the first layer in this example. In practice, the optimal number of neurons can’t be determined up front. Instead, a number of competing models are built with different numbers of neurons, and the one that performs the best is chosen. 4. There are a lot of weights. This is a very simple network, with just 8 input variables and 5 neurons, but there are 36 weights. A typical business application with, say, 50 input variables and 75 neurons would have more than 3,000 weights. The clever bit is how one determines what the weights should be. This is where the clever maths comes into play, to find the weights that provide the best overall model; i.e. the one that provides the most accurate predictions (scores). This is not an easy task, and can require a huge amount of computer processing time for some problems. This example has just one output score (final score). For complex problems with several possible outcomes, there would be many more neurons in the final layer, one for each possible outcome.
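The structure described above can be sketched as follows. The weights here are random rather than trained, so the output is meaningless; the point is the shape of the calculation: 8 inputs, 4 first-layer neurons and 1 output neuron, giving 8 × 4 + 4 = 36 weights, as noted in the slide.

```python
import math
import random

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weight_matrix):
    # One row of weights per neuron: same inputs, different weights, different score.
    return [logistic(sum(i * w for i, w in zip(inputs, row)))
            for row in weight_matrix]

random.seed(1)
inputs = [random.random() for _ in range(8)]                                    # 8 input variables
layer1_weights = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(4)]  # 4 neurons x 8 weights
layer2_weights = [[random.uniform(-1, 1) for _ in range(4)]]                    # 1 neuron x 4 weights

hidden = layer(inputs, layer1_weights)          # the four first-layer scores...
final_score = layer(hidden, layer2_weights)[0]  # ...become the second layer's inputs
print(final_score)
```

Finding good values for those 36 weights, rather than random ones, is the job of the training algorithm.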
  • 22. Different types of predictive models: (Artificial) Neural Networks (NNs) • The reason why neural networks are so popular is their ability to detect subtle patterns in data which other, simpler methods, such as scorecards and decision trees, may not be able to find. • This means that they can potentially generate more accurate predictions than scorecard or decision tree type models.  This is achieved by having the two layers of neurons, with the scores from the first layer providing the inputs to the second layer. • The main drawback of neural networks is that the scores that they generate are not intuitive. • One can see what the various weights in the model are and understand the overall score calculation, but if I were to ask you which input variable contributes the most to the final score, this is much less obvious than for the scorecard and decision tree models of heart disease. 22
  • 23. Deep learning • Deep learning represents the latest evolution of neural network type models. • The neural network example on the previous slides effectively has two layers of neurons, and this structure is used very successfully in many traditional neural network applications. • However, there is no reason why there can’t be many more layers. So there could be 3, 4, …, 100+ layers if desired, with different numbers of neurons in each layer. • What one tends to find is that as more layers are added, so the ability of the network to identify complex and/or subtle patterns increases. • The more layers, the “deeper” the network. Generally speaking, anything with more than 2 or 3 layers could be classified as a “deep” network – but there is no single accepted definition. 23
  • 24. Deep learning • As well as expanding the number of neurons and layers in a network, other avenues of research associated with deep learning consider how the neurons in the network are connected. • In a standard network (like the example used in this presentation), all of the neurons in each layer are connected to all of the neurons in the next layer. • However, other configurations are possible. For example, not connecting all inputs to all of the neurons in the first layer (a convolutional neural network). This works particularly well for certain types of problem, such as image recognition. • Another variant on standard neural networks is to create feedback loops. The outputs of neurons in later layers act as inputs to earlier layers (a recurrent neural network). Recurrent neural networks work well when the information used to train the network is contained in a sequence of events. When using machine learning to decipher handwriting one letter at a time, the fact that the previous letter in a word was, say, the letter “Z” is an important predictor for the next letter, which is almost certainly a vowel or the letter “S.” A traditional neural network for handwriting recognition would not consider these sequential relationships. Google Translate is one example of a commercial product that uses recurrent networks. • The most complex neural network models in use today, such as the ones developed by Google’s DeepMind subsidiary to beat the world’s best Go players, combine these features and have millions of neurons connected across dozens of layers. 24
  • 25. Deep learning • Complex deep learning models, based on various types of neural networks, are driving many sophisticated artificial intelligence applications, but for many types of business problem this is overkill. • A very complex model is only required for a very complex problem. • It’s the old adage: “you don’t need a sledgehammer to crack a nut.” • If you apply a very complex machine learning approach to a very simple problem, then sometimes you will get a worse result than if a very simple model is used. This may seem counterintuitive, but there will be a tendency for the complex model to find false patterns that don’t really exist (over-fitting), whereas this is less likely with simple models.  Many common business problems are “simple”; i.e. scorecards and decision trees will provide a pretty good solution. A complex model, such as a neural network, might be better, but in many cases not that much better (in terms of predictive accuracy) and in some cases no better at all.  Axiom. If you can’t get a simple predictive model to add value to what you do, then a more complex one probably won’t either. • A good data scientist will explore different options and deliver the simplest solution that they can which meets your business objectives. 25
  • 26. Creating an intelligent application • A predictive model on its own is not a very user friendly thing:  It generates scores when presented with data.  It does not interact with people directly.  It does not appear “intelligent.” • Models need to be integrated within your organization’s business systems if they are going to add value.  Model implementation is often a larger, more expensive and time consuming task than the construction of the model – many data scientists forget this! • For customer orientated applications, a user interface is required:  Input is taken from the user, e.g. voice input or data captured on an application form.  Data is pre-processed to transform it into a suitable format to allow scores to be calculated.  Scores are generated by running the pre-processed data through the model(s).  Decision rules are applied; i.e. what action to take given the prediction that has been generated.  Results are fed back to the user in a “human like” manner, e.g. a written or verbal response. 26
  • 27. Creating an intelligent application: Core components 27 Data input. This can include: • Sensory inputs from cameras (eyes), microphones (ears) or other sources. • Geodemographic data such as people’s age and income. • Location and usage details for smartphones, cars, televisions and other devices. • Spending patterns captured from credit cards and supermarket tills. Data (pre)processing. • Data input is processed into a standard “computer friendly” representation. • Computers like numbers. Images and text are transformed into a suitable numeric format. • New data items are created, e.g. Age is often more useful than Date of Birth, so Age is derived from Date of Birth. Application of predictive models (score calculation). • The models generated by the machine learning process are applied. • Pre-processed data for new cases is fed into the models in order to generate fresh predictions going forward (model scores). Decision rules (rule sets). • Decision rules are used in conjunction with data inputs and the scores generated by predictive models to decide what to do. • Sometimes the rules are derived automatically by the machine learning algorithm, but often they will also include rules defined by human experts/business users. Output. • A response needs to be taken based upon the decision(s) that have been made. • If the system decides that someone should be hired, then an offer letter and contract need to be sent to the candidate.
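The pre-processing component above can be illustrated with a short sketch. The field names, the date format and the fixed "today" date are assumptions made for the example.

```python
from datetime import date

TODAY = date(2017, 6, 1)  # fixed so the example is reproducible

def preprocess(record):
    # Transform raw inputs into the numeric format a model expects.
    processed = {}
    # Derived data item: Age is usually more useful than Date of Birth.
    dob = date.fromisoformat(record["date_of_birth"])
    processed["age"] = TODAY.year - dob.year - ((TODAY.month, TODAY.day) < (dob.month, dob.day))
    # Non-numeric data becomes 0/1 flags.
    processed["female"] = 1 if record["gender"] == "F" else 0
    processed["smoker"] = 1 if record["smoker"] == "Y" else 0
    processed["bmi"] = float(record["bmi"])
    return processed

raw = {"date_of_birth": "1972-03-15", "gender": "F", "smoker": "N", "bmi": "27.5"}
processed = preprocess(raw)
print(processed)
```

The output of a step like this is what gets fed into the predictive model in the score-calculation component.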
  • 28. Creating an intelligent application: A. Selection tool for doctors 1. Patient records held in the health authority’s central database provide the raw data. 2. Every patient’s data is automatically pre-processed so that it is in the required format to be scored by the model. 3. Each patient’s (pre-processed) data is presented to the model and a score generated. 4. Those scoring 521 or more are added to a mailing list. 5. A letter is sent to everyone on the mailing list, asking them to make an appointment with their doctor. B. A self-diagnosis tool for patients 1. A phone app is created, which prompts the user to enter their health data (BMI, Gender, etc.) as speech. 2. The patient’s verbal responses are translated and pre-processed so that they are in the format required by the predictive model. 3. Each patient’s (pre-processed) data is presented to the model created from the machine learning process and a score generated. 4. Those scoring 521 or more are flagged, indicating that they should be invited for a check-up. 5. A standard verbal response is played back to the patient. 1. If they score 521 or more, play the message: “Please arrange an appointment with your doctor as soon as possible.” 2. If they score below 521, play the message: “You are not in a high risk category. There is no need to contact your doctor.” 28 Let’s look at two ways in which the heart disease model could be deployed. The central tasks of calculating scores and applying decision rules are more or less the same. It’s the interface to the models, and the way that the results are played back to the user, which differ.
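As the slide notes, the two deployments share the same scoring and decision logic; only the interfaces differ. A minimal sketch of that shared core (the patient names and scores are invented, and the model that produces the scores is assumed to exist elsewhere):

```python
CUT_OFF = 521  # the cut-off score used by both deployments

def at_risk(score):
    # The shared decision rule.
    return score >= CUT_OFF

def build_mailing_list(scored_patients):
    # Deployment A: batch tool. Returns the patients who should receive a letter.
    return [name for name, score in scored_patients if at_risk(score)]

def app_response(score):
    # Deployment B: phone app. Returns the message to play back to the patient.
    if at_risk(score):
        return "Please arrange an appointment with your doctor as soon as possible"
    return "You are not in a high risk category. There is no need to contact your doctor."

to_contact = build_mailing_list([("Patient A", 530), ("Patient B", 410), ("Patient C", 521)])
print(to_contact)         # Patients A and C score 521 or more
print(app_response(410))
```

Keeping the decision rule in one place means the cut-off can be changed without touching either interface.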
  • 29. Creating an intelligent application: • Complex AI applications often deploy multiple, interacting predictive models, combined with lots of decision rules. • For the patient self-diagnosis tool on the previous slide, a second predictive model would typically be incorporated into the app.  This model would predict what was being spoken, and map this to the data items required by the heart disease predictive model.  In effect, this speech model pre-processes spoken information, transforming it into a format which the heart disease prediction model understands. • Speech recognition tools are now available “off the shelf.” Amazon, for example, provides an interface to its speech recognition software for free, which anyone can incorporate into their own AI applications. 29
  • 30. Business applications: Assessing how good an AI application is • Accuracy, Accuracy, Accuracy!  That’s what most data scientists focus on when applying machine learning to develop applications based on predictive models. • As a business user, I have no interest in accuracy per se! • I don’t care that a model is 84% accurate, has a “Gini” of 76.2% or whatever. • What I’m interested in is my bottom line. Things like:  How much money will the solution make me?  How many fewer staff will I need to employ?  What impact will the solution have on customer service?  How many more lives will be saved? • A data scientist needs to work with business people to understand how their organisation works and how the solution is going to be deployed. • For example, to evaluate the worth of a machine learning solution to reduce the number of staff screening CVs in the HR department, they need to know:  How many CVs are currently screened per year.  How long it takes to screen each CV.  Staff costs.  What proportion of CVs can be expected to be screened automatically using the application (it may not be all of them; there will probably be some exception cases).  How much the machine learning solution is going to cost. This includes up front costs, annual license fees and any additional maintenance costs. • Axiom. If a data scientist can’t express the benefits of their solution to you in terms of your bottom line, then they have not done their job. 30
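The CV-screening evaluation above boils down to simple arithmetic. All the figures below are invented for illustration; the point is the shape of the calculation, not the numbers.

```python
# Invented inputs for the CV-screening example
cvs_per_year = 50_000
minutes_per_cv = 10
staff_cost_per_hour = 25.0
automation_rate = 0.80          # proportion screened automatically (not all: exception cases)
solution_cost_year1 = 150_000   # up-front costs plus first-year licence and maintenance

hours_saved = cvs_per_year * automation_rate * minutes_per_cv / 60
gross_saving = hours_saved * staff_cost_per_hour
net_benefit_year1 = gross_saving - solution_cost_year1

print(f"Hours saved per year:    {hours_saved:,.0f}")
print(f"Gross staff-cost saving: ${gross_saving:,.0f}")
print(f"Net benefit in year one: ${net_benefit_year1:,.0f}")
```

The net benefit figure is the kind of bottom-line number a business user cares about, rather than the Gini coefficient.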
  • 31. Business applications: Other considerations • Is your organisation ready for AI & machine learning?  If implementing a machine learning based process is going to result in changes to what people do, redundancies or deskilling, then how is the organisation going to deal with that? • Ethical & legal considerations  Machine learning has no social awareness and no conscience.  Patterns are patterns, whatever those patterns may be.  If the patterns that are found during machine learning are undesirable, outside social norms, or illegal to use, then this needs to be dealt with. • For example, if the machine learning process generates predictive models which result in less favourable treatment based on gender or ethnic origin, can you deal with that?  The statistical evidence might support it, but society won’t tolerate it.  Even if it’s legal, there is a risk of reputational damage if the existence of bias in your systems is made public. 31
  • 32. Business applications • It’s very easy to get excited about the maths and the predictive models generated by the machine learning process. • However, one must not lose sight of the fact that one is trying to solve a business problem. • Having a very accurate predictive model is only useful if you have a real problem to apply it to. The model has no intrinsic value in itself! • Implementing the fruits of machine learning is often a much more expensive and time consuming task than the application of machine learning to build a predictive model. 32
  • 33. Business applications Putting together a project team • It’s very tempting to hire a data scientist (or a consultancy firm) and just let them “get on with it,” hoping that they will deliver something useful at the end of the day. • That’s a bit like getting a load of builders to build a house for you, without first hiring an architect to design it, and without a project manager to organise things on site to make sure you get what you’ve asked for. • Data scientists are clever people, but they have a specific skill set and tend to focus on the mathematical aspects. They can’t do everything, and they won’t know all the ins and outs of your business. • For any large scale AI/Machine learning based project you will need to pull together a project team, of which the data scientists are just one contributor. • A requirements specification, detailing what the project aims to achieve and how success will be measured, should be produced at the outset. Project meetings should be held during the project to assess progress and to ensure that things remain focused and on track. 33 Project Team – Role / What they do: • Project sponsor – Senior manager who provides spend approval and champions the project at senior forums. • Project manager – Owns the end-to-end process for design, build and implementation of the solution. This should be someone who represents the business, not a supplier. • Subject matter experts (SMEs) – People who know the relevant area of the business inside out. They understand how data is currently used and the constraints and limitations of the organisation. • Data scientists – The techy people who do machine learning and develop apps. • Audit / validation – Provides independent oversight, to ensure that the project delivers what it says it will and meets any legal or regulatory requirements.
  • 34. Business applications: A checklist for business success • What follows on the next slide is a set of questions that you, and any data scientists working for you, should answer BEFORE you move beyond the feasibility stage of any Machine Learning / AI project. • Answers should be captured in an appropriate form within the business requirements of the project. 34
  • 35. Business applications: A checklist for success 1. What is your business problem? Machine learning has no intrinsic value in itself. It has to be applied to a specific problem or objective if it’s going to be of any value. 2. What metric do you wish to optimise? This should be something that can be measured in simple terms and which is important to you. Time saved, increased profit, reduced cost, customer satisfaction and lives saved, for example. This is what the machine learning process will be used to optimise. 3. What type of decision rules and actions will you apply? You have to make decisions based on what predictive models tell you, and act upon those decisions. If you can’t think what you will do with the model, then why have it? 4. Where will the data to do machine learning come from? Machine learning requires at least several hundred, and ideally many thousands of, example cases to work with. The metric of interest (above) needs to be available for these cases. 5. How will you operationalise the machine learning process? What system or process will the predictive models and decision rules be implemented within? This will require time and money. Who will do it and who will pay for it? The models and decision rules won’t implement themselves. 6. What are the ethical and legal risks associated with automation? If you are making important decisions about people then you may want to test the system to ensure that it does not display unfair or illegal bias against certain groups. 7. Is all required data available operationally? Make the data scientist provide a checklist against every data item that features in their models to ensure that it exists within the implementation platform. If any are missing, you won’t be able to use the solution that’s been developed. 8. Do you need controls to ensure decisions are acted upon? How will you stop staff overriding decisions made on the basis of the machine learning process? 
If over-rides are permitted, how will you monitor them to ensure that staff are making the right decisions? 9. How will you assess the success of the project? The theoretical benefits presented at the end of the analytical phase of a machine learning project are only indicative of what will actually happen in real life. A monitoring process is needed to assess the system when it goes live, so that you can measure the actual benefits that are realised against those that were promised. 10. What are the on-going costs of maintaining the system? Over time, new data, new regulations, changing business requirements, and so on, will mean that the system will require modification and may need to be redeveloped. How often is this expected to occur and how much is it likely to cost? 35
  • 36. Appendix A. Common Terms used in AI and Machine Learning • Activation function. An equation (mathematical formula) used to transform the raw output of a neuron into its final output (score). Activation functions are often used to force the output of neurons to lie in a fixed range, usually between 0 and 1, but some result in outputs with different ranges. Common activation functions are the logistic function, the hyperbolic tangent function and the identity function. See https://en.wikipedia.org/wiki/Activation_function for a more complete list of activation functions and their derivations. • Algorithm. An algorithm is a set of instructions or procedures, executed in sequence to solve a particular problem. In machine learning, various algorithms are applied to discover patterns in data and to create predictive models. • Big Data. A very large collection of varied, changeable data, which is difficult to process using a standard PC/server; i.e. typically terabytes in size or larger. Big Data technologies make use of advanced computer architectures and specialist software to facilitate the rapid processing of these data sets. Machine learning is one of the primary tools used to extract value from Big Data. • Causation. The reasoning behind why something happened; i.e. the cause of something. Eating too much causes one to become fat. Not to be confused with correlation. • Classification model. A predictive model which predicts if a given event will or will not occur. For example, the likelihood that two people will hit it off on a date, the probability that someone will buy a certain type of car, or the chance that a person develops diabetes sometime in the next 10 years. The score generated by a classification model is an estimate of the likelihood of the event occurring. Not to be confused with regression model. • Cluster. Clusters are groups that contain people or things with similar traits. 
For example, people with similar ages and incomes might be in one cluster, those with similar job roles and family sizes in another. Clustering algorithms are one form of unsupervised learning (see below). • Convolutional (neural) network. A type of deep neural network where not all neurons in a layer are connected to all the neurons in the next layer. With fewer weights, the (very considerable) time required to train a network is much reduced. A classic application is object recognition. Imagine you have an image comprising 256 * 256 (65,536) pixels. A traditional neural network would need 65,536 weights for each neuron in the first layer, one for each pixel! By segmenting the image into say, 64 sub-regions each with 32 * 32 pixels, and only connecting neurons within each sub-region, the number of weights is reduced to just 1,024 per neuron. This works because the most important features in images are usually close together (i.e. in the same or neighbouring sub-regions). Pixels at opposite sides of the image are much less important. 36
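The three activation functions named in the Activation function entry above can be compared side by side. A small illustrative sketch:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))  # output in (0, 1)

def tanh(x):
    return math.tanh(x)                # output in (-1, 1)

def identity(x):
    return x                           # output unrestricted

for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f}  logistic={logistic(x):.3f}  tanh={tanh(x):+.3f}  identity={identity(x):+.1f}")
```

All three take the same initial score as input; they differ only in the range into which they squash it.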
  • 37. Appendix A. Common Terms used in AI and Machine Learning • Correlation. One variable is correlated with another if a change in that variable occurs in tandem with a change in the other. It is important to appreciate that this does not necessarily mean that one thing is caused by another (causation), although it might be. Stork migration is correlated with births (more children are born in the spring, when storks migrate), but stork migration does not cause births! • Cut-off score, see decision rule. • Data science. The name given to the skill/art of being able to combine mathematical (machine learning) knowledge with data and IT skills in a pragmatic way, to deliver practical, value-adding machine learning solutions. • Data scientist. The name given to someone who does data science. Good data scientists focus on delivering useful solutions that work in real world environments. They don’t get too hung up on theory. If it works, it works! • Decision rule. Predictive models generate scores. Decision rules are then used to decide how people are treated on the basis of the score. Those who score above a given score (the cut-off score) receive one treatment, those scoring below the cut-off another. For example, when assessing people for a medical condition, only those scoring above the cut-off, i.e. those with the highest risk of developing the condition, are offered treatment. • Decision tree. A type of predictive model, created using an algorithm that recursively segments a population into smaller and smaller groups. Also known as a Classification and Regression Tree or CART (because they can be used for both classification and regression!) • Deep learning. Predictive models based on complex neural networks (or related architectures), containing very large numbers of neurons and many layers and/or complex overlapping networks. These tools are proving successful at complex “AI” type problems such as object recognition and language translation. • Development sample. 
The set of data used by the machine learning process to construct a predictive model. Development samples need to contain at least several hundred cases, but usually larger samples are used, containing thousands or millions of cases. • Ensemble. A predictive model comprised of several subsidiary models (as few as 3 or sometimes many thousands). All of the subsidiary models predict the same outcome, but have been constructed using different methods and/or using different data. This means that the models don’t always generate the same predictions. The ensemble combines all of the individual predictions to arrive at a final prediction. Model ensembles often (but not always) significantly outperform the best individual model. 37
  • 38. Appendix A. Common Terms used in AI and Machine Learning • Feed forward neural network. A neural network in which the connections are all one way, from one layer of neurons to the next. There are no backward or inter-layer connections. Most neural networks are feed forward networks. • Forecast horizon. Many (but not all) predictive models forecast future unknown events or quantities. The forecast horizon is the time frame over which a model predicts. Marketing models typically predict response over a forecast horizon of hours or days. In medicine, forecast horizons of many years are used when predicting survival probabilities for particular conditions. • Gains chart. A commonly used visual tool for demonstrating the benefits of a model. Often used in conjunction with a lift chart. • Gini coefficient. A popular measure for assessing the discrimination (accuracy) of a predictive model. The Gini coefficient is directly related to the Area Under the Curve (AUC): Gini = 2 × AUC − 1. • Internet of Things (The). This describes intelligent everyday devices such as fridges, washing machines and even light bulbs which can be connected to the internet or other systems. This enables data about their use to be gathered, and for users to control these devices remotely, e.g. set the washing machine going via a smartphone app when I’m on my way home. • Lift chart. A widely used graphical tool for demonstrating the practical benefits of a predictive model. Often used in conjunction with a gains chart. • Linear model. A popular type of predictive model that is easy to understand and use. A score is calculated by multiplying the value of each characteristic by its relevant weight, and then summing up all the results. A popular way of representing linear models is in the form of a scorecard. • Linear regression. One of the oldest and most popular methods for creating regression models. The development of linear regression dates back more than 100 years. • Logistic regression. 
A very simple and popular method used for creating classification models. • Machine learning. Machine learning is the process of finding features (patterns in data). Machine learning algorithms derive from research into artificial intelligence and pattern recognition. The algorithms used to train (deep) neural networks and support vector machines are two examples of machine learning approaches. • Model. A mathematical representation of a real world system or situation. The model is used to determine how the real world system would behave under different conditions. See also predictive model. 38
  • 39. Appendix A. Common Terms used in AI and Machine Learning • Monitoring. The performance of predictive models tends to deteriorate over time. It is therefore prudent to instigate a monitoring regime following model implementation to measure how models are performing on an ongoing basis. Models are redeveloped when the monitoring indicates that a significant deterioration in model performance has occurred. • Neural network. A popular type of predictive model derived using machine learning. Neural networks are well suited to capturing complex interactions and non-linearities in data in a way that is analogous to human learning. “Deep” neural networks (deep learning/deep belief networks) are very large and complex neural networks (often containing thousands or millions of artificial neurons) which are used for “AI” tasks such as speech recognition and in self-driving cars. • Neuron. The key component of a neural network, which is often discussed as being analogous to biological neurons in the human brain. In reality, a neuron is a linear model whose score is then subject to a (non-linear) transformation. A neural network can therefore be considered as a set of interconnected linear models and non-linear transformations. • Odds. A way to represent the likelihood of an event occurring. The odds of an event are equal to (1/p) – 1, where p is the probability of the event. Likewise, the probability is equal to 1/(Odds + 1). So, odds of 1:1 are the same as a probability of 0.5, odds of 2:1 a probability of 0.33, 3:1 a probability of 0.25 and so on. • Over-fitting. The bane of a data scientist’s life. Over-fitting occurs when an algorithm goes too far in its search for correlations in the data used to develop a predictive model. The net result is that the model looks to be very predictive when measured against the development sample, but performs very poorly when it’s used to predict new outcomes using data that has not been presented to the model before. • Over-ride rule. 
Sometimes, certain actions must be taken regardless of the prediction generated by a model. For example, a predictive model used to target people with offers for beer might predict that some children are very likely to take up the offer. An override rule is therefore put in place to prevent offers being sent to children, regardless of the score generated by the model. • Predictive analytics (PA). The application of machine learning techniques to generate predictive models. Some would argue that for all practical purposes machine learning and predictive analytics are more or less the same thing, given that they use the same types of data as inputs, apply the same type of algorithms and generate similar outputs (scores). • Predictive model. A predictive model is the key output produced by a machine learning algorithm. The model captures the relationships (correlations) that the process has discovered. Once a predictive model has been created, it can then be applied to new situations to predict future, or otherwise unknown, events. • Predictive modelling, see predictive analytics. 39
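The odds formulas in the Odds glossary entry above can be checked with a couple of lines of arithmetic:

```python
def probability_to_odds(p):
    # Odds of an event with probability p: (1/p) - 1
    return (1.0 / p) - 1.0

def odds_to_probability(odds):
    # The reverse conversion: 1/(odds + 1)
    return 1.0 / (odds + 1.0)

# Reproducing the examples from the glossary entry:
print(probability_to_odds(0.5))          # odds of 1, i.e. 1:1
print(round(odds_to_probability(2), 2))  # odds of 2:1 -> probability 0.33
print(round(odds_to_probability(3), 2))  # odds of 3:1 -> probability 0.25
```

The two functions are exact inverses of each other, so converting a probability to odds and back recovers the original value.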
  • 40. Appendix A. Common Terms used in AI and Machine Learning • Python. Along with R, Python is one of the most popular, freely available, software packages used for machine learning. • R. The R language has become one of the most popular software platforms for undertaking machine learning. The R software is free and open source, with the user community able to develop new functionality and share it with other users. For those wanting to become data scientists, a good grounding in R (or Python) is recommended. Some books on R and Python are discussed in Appendix B. • Random forest. Random forests are an ensemble method, based on combining the outputs of a large number of decision trees, where each decision tree has been created under a slightly different set of conditions. Random forests are one of the most successful and widely applied ensemble methods. • Recurrent neural network. A type of neural network that uses the outputs from previous cases in the development sample as inputs for subsequent training. These types of network are particularly useful where there are sequential patterns in data; i.e. the ordering of cases in the development sample is important. One example is the analysis of video. The changes in state from one image to the next (i.e. as objects move) can provide a lot of information about the nature of the current image. • Regression model. A popular tool for predicting the magnitude of something. For example, how much someone will spend or how long they will live. This is in contrast to a classification model, which predicts the likelihood of an event occurring. • Response (choice) modelling. This term refers to marketing models used to predict the likelihood that someone buys a product or service that they have been targeted with. A response model is a type of classification model. • Score. Most predictions generated by predictive models are represented in the form of a number (a score). 
For a classification model the score is a representation of the probability of an event occurring, e.g. how likely someone is to respond to a marketing communication, or the probability of them defaulting on a loan. For a regression model the score represents the magnitude of the predicted behaviour, e.g. how much someone might spend, or how long they might live.
• Scorecard. A scorecard is a way of presenting linear models which is easy for non-experts to understand. The main benefit of a scorecard is that it is additive; i.e. a model score is calculated by simply adding up the points that apply. There is no multiplication, division or other more complex arithmetic.
40
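The additive nature of a scorecard can be shown with a small sketch. The characteristics, points and base score below are invented for illustration; real scorecards are derived from data during model development.

```python
# A toy scorecard: each characteristic contributes a fixed number of
# points, and the final score is just the sum of the points that apply.
SCORECARD = {
    "age_over_30":       25,
    "homeowner":         40,
    "years_at_bank_3+":  15,
}
BASE_POINTS = 500  # starting score before any characteristics apply

def scorecard_score(applicant):
    """Add up the points for every characteristic the applicant has."""
    score = BASE_POINTS
    for characteristic, points in SCORECARD.items():
        if applicant.get(characteristic, False):
            score += points
    return score

applicant = {"age_over_30": True, "homeowner": False, "years_at_bank_3+": True}
print(scorecard_score(applicant))  # 500 + 25 + 15 = 540
```

Because the calculation is nothing more than addition, a non-expert can work out an applicant's score with pen and paper, which is exactly why scorecards remain popular in regulated areas such as credit granting.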
• 41. Appendix A. Common Terms used in AI and Machine Learning
• Score distribution. A table or graph showing how the scores from a predictive model are distributed across the population of interest. Lift and Gain charts are both ways of presenting the score distribution in a graphical form.
• Sentiment analysis. A popular technique for extracting information about people's attitudes towards things; for example, whether they had a positive or negative experience when using a particular product or service. In machine learning, sentiment analysis is used to extract information from text or speech that is then used to build predictive models.
• Supervised learning. The application of machine learning where each case in the development sample has an associated outcome which one wants to predict. The cases are said to be "labelled." An example of supervised learning in target marketing is where each customer's response to marketing activity is known (they either responded or they didn't). The model generated by the algorithm is then optimized so as to predict whether customers will respond to marketing or not. In practice, most machine learning approaches are examples of supervised learning.
• Support vector machine. An advanced type of non-linear model. Support vector machines have some similarities with neural networks.
• Training algorithm. The term used to describe iterative machine learning algorithms which are used to determine the structure of predictive models. The term is most widely used in relation to neural network type models. The algorithm repeats a number of times, adjusting the weights in the network so as to optimize the performance of the network. The training algorithm terminates after a fixed number of iterations, or when no further significant improvements in model performance are obtained.
• Unsupervised learning. The application of machine learning where cases in the development sample don't have an associated outcome.
Cases are said to be "unlabelled." Unsupervised algorithms typically seek to group cases with similar characteristics (features) together. An example of unsupervised learning applied to target marketing is where there is no information about previous customers' responses to marketing activity. Clustering is applied to group similar types of customer together based on their age, income, gender, etc. Marketing activity is then targeted at specific clusters of individuals.
• Validation sample. An independent data set used to evaluate a predictive model after it has been constructed. The validation sample should be completely separate from the development sample, and should not be used during model construction. Using one (or more) validation samples is important because machine learning sometimes over-fits a model to the development sample. This means that if you evaluate a predictive model using the development sample, it can appear to be more predictive than it actually is.
41
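The over-fitting point can be demonstrated with a deliberately over-fitted model. The sketch below (invented data and a toy 1-nearest-neighbour "model", not anything from the book) memorises the development sample, so it scores 100% when evaluated on that sample, but an independent validation sample reveals its true, lower accuracy.

```python
def nearest_neighbour_predict(dev_sample, x):
    """Predict the label of the closest case in the development sample."""
    closest = min(dev_sample, key=lambda case: abs(case[0] - x))
    return closest[1]

def accuracy(dev_sample, eval_sample):
    """Fraction of cases in eval_sample predicted correctly."""
    hits = sum(1 for x, y in eval_sample
               if nearest_neighbour_predict(dev_sample, x) == y)
    return hits / len(eval_sample)

# Development sample: (feature, label) pairs, including some noisy labels.
development = [(1, 0), (2, 0), (3, 1), (4, 0), (5, 1), (6, 1)]
# Independent validation sample, never used to build the model.
validation  = [(1.1, 0), (3.9, 1), (4.1, 1), (5.9, 1)]

print(accuracy(development, development))  # 1.0 -- looks perfect
print(accuracy(development, validation))   # 0.5 -- the honest estimate
```

Evaluated on its own development sample the model appears flawless, because it has simply memorised every case, noise included. The validation sample exposes the gap between apparent and real performance, which is precisely why it must be kept separate.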
• 42. Appendix B. Notes
• 1. In a real-world application, one would produce the score distribution from a second, independent validation sample. This is to ensure that the results have not been biased due to some problem with the algorithm.
• 2. The GINI coefficient (and the related AUC measure) is a very common measure used by data scientists to assess the discriminatory ability of predictive models.
42
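For readers who want to see how the GINI and AUC measures mentioned in note 2 relate, here is a minimal sketch. It uses the standard rank-based definition of AUC (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counting half) and the standard relationship GINI = 2 × AUC − 1. The example labels and scores are invented.

```python
def auc(labels, scores):
    """AUC: chance a random positive case outscores a random negative case."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def gini(labels, scores):
    """GINI coefficient, related to AUC by GINI = 2 * AUC - 1."""
    return 2 * auc(labels, scores) - 1

labels = [1, 1, 0, 1, 0, 0]          # 1 = event occurred, 0 = it didn't
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]  # model scores for each case

print(round(auc(labels, scores), 3))   # 0.889
print(round(gini(labels, scores), 3))  # 0.778
```

A GINI of 0 (AUC of 0.5) means the model discriminates no better than chance; a GINI of 1 (AUC of 1.0) means it ranks every positive case above every negative one.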
  • 43. Artificial Intelligence and Machine Learning for Business. A No-Nonsense Guide to Data Driven Technologies June 2017 43