Lecture 14: Artificial intelligence and machine learning
Dr. Martin Chapman
Principles of Health Informatics (7MPE1000). https://martinchapman.co.uk/teaching
Preamble: Setting expectations…
Studying artificial intelligence is
less of this…
Ex Machina, 2015
Preamble: Setting expectations…
And more of this…
Preamble: Simplifications
Artificial Intelligence and machine learning could easily take up an
entire course (or 10), and are very hard to describe properly within a
single lecture.
As such, I will make some fairly significant simplifications,
particularly when discussing machine learning.
Consequently, while the explanations here are enough to understand
concepts at a high level, for a full and proper understanding further
resources will be required.
Preamble: You’ll find a million (literally) similar
explanations online…
We’re not trying to compete with these resources, and they may even be
useful in respect of the simplifications discussed previously.
What is (medical) AI?
There are lots of different ways to define AI, particularly in the
healthcare domain. Perhaps one of the first was:
Medical artificial intelligence is primarily concerned with the
construction of AI programs that perform diagnosis and make therapy
recommendations… medical AI programs are based on
symbolic models of disease entities and their relationship to patient
factors and clinical manifestations.
William J. Clancey and Edward H. Shortliffe.
Readings in medical artificial intelligence: the first decade.
1984.
We’ll see how this definition might
not apply exactly anymore.
What is (medical) AI?
We can/will argue that, like humans, a (medical) AI is able to
represent knowledge in the world and use that knowledge to reason,
and, before all of this, is able to gain that knowledge from
somewhere.
Hence the focus of this AI (and machine learning) lecture is…
Lecture structure
1. Computational representation and reasoning
2. Model building
Learning outcomes
1. Understand and critique different approaches to storing and
applying knowledge.
2. Understand the connection between modelling and these
approaches.
3. Be able to list different computational discovery techniques and
how they support the collection of knowledge.
Relationship with CDSS
Lecture 13 introduced a fairly compelling example of an AI – a
clinical decision support system (CDSS).
Therefore, the representation, reasoning and knowledge acquisition
techniques discussed here very much explain the ‘how’ of Lecture 13:
how do CDSSs deliver the benefits we saw there?
We’ll provide specific examples of this where possible.
Computational representation and
reasoning
How can we represent knowledge, and what approaches can we use
to automatically take that knowledge and arrive at a conclusion?
Recall: Knowledge structure
Knowledge, from our model: If a plane
does not have pontoons, then it will
sink.
We refer to this as an inference
procedure.
It is supported by three other (sub-)
entities:
(1) A language
(2) A knowledge base
(3) An ontology
When determining whether a plane
sinks, we are applying knowledge
that actually consists of four distinct
entities…
Computational representation and reasoning
In the first half of this lecture, we’ll be asking how we build this
knowledge base (representation) and define the associated inference
procedures so that reasoning can occur.
We’ll look at three representation and reasoning tools in the
following slides: (1) Rules, (2) Networks and (3) Systems dynamics.
For each, we should be able to answer the following:
1. How can we structure our knowledge?
2. How can we apply that structured knowledge to new data to gain
information?
Rules
Logical and statistical
The first way we can structure knowledge is as a set of rules. We
saw a basic version of a rule in Lecture 2. We can formalise this as:
Importantly, rules like this are a subset of formal logic:
RuleP1
If plane not pontoons
then conclude sink
Logic Rules
No pontoons ⇒ plane sinks
This symbol means
implies; if A is true we
are free to conclude that
B is true.
We’re free to use some of
the Boolean logic we saw
in Lecture 3.
Reasoning with logic rules
How do we now draw conclusions from these rules? We actually saw
the answer to this back in Lecture 4…
Recall: Deduction, abduction and induction
Reasoning with logic rules
How do we now draw conclusions from these rules? We actually saw
the answer to this back in Lecture 3…
Using deduction, for example, in combination with this rule, we can
conclude that if we encounter a plane without pontoons it will sink.
RuleP1
If plane not pontoons
then conclude sink
This would support the kind of ‘symbolic
AI’ mentioned in our original definition.
Reasoning with logic rules
How do we now draw conclusions from these rules? We actually saw
the answer to this back in Lecture 3…
From the perspective of logic (the core of our rules), we are saying if
a plane doesn’t have pontoons, and not having pontoons implies that
a plane sinks, then the plane must sink.
This is an inference procedure known as
modus ponens and adds weight to our
conclusions. It is a formalisation of our
original inference procedure from Lecture 2.
Reasoning with logic rules
How do we now draw conclusions from these rules? We actually saw
the answer to this back in Lecture 3…
We could also work backwards and use a rule of logic called modus
tollens to tell us that if a plane isn’t sinking then it must have
pontoons.
We need to be careful with modus tollens though, as it creates a very
strict association. If we want to allow multiple potential reasons for
an outcome, then abduction alone might be a better way to draw
conclusions.
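These two inference rules can be sketched in a few lines of Python. This is an illustration rather than part of the lecture, and the fact names (e.g. ‘no_pontoons’) are our own:

```python
# RuleP1 as an implication: no pontoons => plane sinks.
RULE = ("no_pontoons", "sinks")  # (antecedent, consequent)

def modus_ponens(rule, facts):
    """If the antecedent is known to be true, conclude the consequent."""
    antecedent, consequent = rule
    return consequent if antecedent in facts else None

def modus_tollens(rule, facts):
    """If the consequent is known to be false, conclude the antecedent is false."""
    antecedent, consequent = rule
    return "not " + antecedent if ("not " + consequent) in facts else None

# A plane without pontoons: deduce that it sinks.
print(modus_ponens(RULE, {"no_pontoons"}))  # sinks
# A plane that is not sinking: deduce it does not lack pontoons.
print(modus_tollens(RULE, {"not sinks"}))   # not no_pontoons
```

Note that modus ponens returns nothing when the antecedent is absent: the rule alone cannot tell us anything about planes with pontoons.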
Statistical rules
Logic rules allow us to express that one event definitely implies
another, but as we’ve seen throughout the course it’s often the case
that certain outcomes have an element of probability attached to
them.
When rules account for probability, we call them statistical rules.
Many statistical rules are based on something we’ve already seen…
Recall: A question
You are told that a person named Alex has the following personality
traits:
‘Quiet’, ‘Reserved’, ‘Good at maths’
Do you think it’s more likely that Alex:
(A)Works in retail (e.g. a shop assistant)
or
(B) Is a computer programmer?
This is our new data
Recall: Bayes’ theorem: Formula
We can write this process out as a formula:
(Programmer likelihood × Programmer prior)
÷ ((Programmer likelihood × Programmer prior) + (Retail likelihood × Retail prior))
or
(75% × 25%) ÷ ((75% × 25%) + (25% × 75%)) = 50%
Statistical rules – Bayes’ theorem
But we can represent, and thus reason with, even more complex
scenarios using the principles of Bayes’ theorem…
RuleB1
If person is ‘Quiet’, ‘Reserved’, ‘Good at maths’
then conclude programmer with probability (0.5)
As a statistical rule, this is
what the output from our
previous Bayes’ calculation
would look like.
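A minimal Python sketch of the calculation behind RuleB1, using the numbers from the lecture:

```python
# Bayes' theorem with the lecture's numbers: prior P(programmer) = 25%,
# likelihood P(traits | programmer) = 75%, and the retail equivalents.
def posterior(likelihood, prior, alt_likelihood, alt_prior):
    evidence = likelihood * prior + alt_likelihood * alt_prior
    return likelihood * prior / evidence

print(posterior(0.75, 0.25, 0.25, 0.75))  # 0.5, the probability in RuleB1
```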
Networks
Bayesian Belief Networks, Markov chains and Neural Networks
Preamble: Beyond statistical rules
Relying on sets of rules alone – which are relatively simple in form –
and associated inference procedures, to represent and reason with
knowledge may be somewhat limiting.
Therefore, we can look instead to creating more complex structures
to store and link our knowledge, and facilitate reasoning: networks.
We saw a basic example of a network containing linked knowledge
back in Lecture 4:
Recall: ‘Decision trees’
We want to know the chance that Alex could fill a Python programmer role:
Suitable for the job: (50% × 80%) + (50% × 1%) = 40.5%
Not suitable for the job: (50% × 20%) + (50% × 99%) = 59.5%
Alex turns up
Alex is a
programmer
Alex is not a
programmer
Alex knows
Python
Alex does not
know Python
Alex knows
Python
Alex does not
know Python
Suitable for
the job
Not suitable
for the job
Suitable for
the job
Not suitable
for the job
50%
50%
80%
20%
99%
1%
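The decision tree arithmetic can be checked directly (a sketch using the branch probabilities above):

```python
# Branch probabilities from the tree: P(programmer) = 50%,
# P(knows Python | programmer) = 80%, P(knows Python | not) = 1%.
p_suitable = 0.5 * 0.8 + 0.5 * 0.01       # roughly 0.405
p_not_suitable = 0.5 * 0.2 + 0.5 * 0.99   # roughly 0.595
print(p_suitable, p_not_suitable)
```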
Bayesian belief networks
We can take things a step further by explicitly combining decision
trees with concepts from Bayes’ theorem to introduce a more
nuanced representation of probabilistic knowledge – a Bayesian belief
network – that allows us to reason with more complex scenarios.
First, we can do this simply…
Bayesian belief networks
Alex is a
programmer
Alex is quiet
Likelihood = 75%
The probability that a programmer
has Alex’s traits
The probability that anyone in
the population is a programmer
Prior = 25%
We lay out (part) of the
information from our
instantiation of Bayes’ theorem
in a network structure.
Bayesian belief networks
We can take things a step further by explicitly combining decision
trees with concepts from Bayes’ theorem to introduce a more
nuanced representation of probabilistic knowledge – a Bayesian belief
network – that allows us to reason with more complex scenarios.
First, we can do this simply… reformatting our existing Bayes’
equation graphically, capturing that if a parent node is true (initially
based on a prior probability) then a child node is also true, with a
given conditional probability (likelihood).
We can then make this more complex…
Bayesian belief networks
…
…
Prior = …
Likelihood = … Likelihood = …
Prior = …
Likelihood = 75% Likelihood = …
Alex is a
programmer
Alex is quiet
Prior = 25%
Prior = …
Bayesian belief networks
We can take things a step further by explicitly combining decision
trees with concepts from Bayes’ theorem to introduce a more
nuanced representation of probabilistic knowledge – a Bayesian belief
network – that allows us to reason with more complex scenarios.
First, we can do this simply… reformatting our existing Bayes’
equation graphically, showing that if a parent node is true (initially
based on a prior probability) then a child node is also true, with a
given conditional probability (likelihood).
We can then make this more complex… adding additional
dependencies. Complex questions can then be asked of this network,
and a form of Bayes’ rule applied iteratively to infer the answer.
Networks
Bayesian Belief Networks, Markov chains and Neural Networks
Markov chains
A Bayesian belief network stores probabilistic knowledge and allows
us to reason with it.
Another type of network, a Markov Chain, does something very
similar. Here, probabilities represent the chance of transitioning from
a current state to another state when those states are connected.
An example of a current state might be the state of the weather
today, and another, connected state might be the state of the
weather tomorrow:
Markov chains
Here we are representing, for example, that if today is sunny, the
chance of it being sunny tomorrow is 20%.
We’ll talk about where these numbers are likely to come from later.
Sunny Rainy
20%
80%
75%
25%
Markov chains - inference
Is predicting the weather tomorrow (inference) therefore as simple as
reading these probabilities, or should we look at past patterns (i.e.
combine probabilities from previous states)?
Something called the Markov property, which defines Markov chains,
actually says no: to balance simplicity with correctness, we should not
do this.
We should instead just use the current state, interpreting the
probabilities on the chain directly.
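A small Python sketch of this kind of inference. The sunny-row probabilities are stated on the previous slide; the rainy-row split (75% to sunny, 25% to rainy) is our reading of the diagram:

```python
# Transition probabilities read from the chain diagram.
TRANSITIONS = {
    "sunny": {"sunny": 0.20, "rainy": 0.80},
    "rainy": {"sunny": 0.75, "rainy": 0.25},  # our reading of the figure
}

def tomorrow(today):
    """Markov property: tomorrow depends only on today's state."""
    return TRANSITIONS[today]

def day_after(today):
    """Two-step prediction: sum over tomorrow's possible states."""
    dist = {"sunny": 0.0, "rainy": 0.0}
    for mid, p_mid in TRANSITIONS[today].items():
        for nxt, p_nxt in TRANSITIONS[mid].items():
            dist[nxt] += p_mid * p_nxt
    return dist

print(tomorrow("sunny"))   # {'sunny': 0.2, 'rainy': 0.8}
print(day_after("sunny"))  # sunny: 0.2*0.2 + 0.8*0.75 = 0.64
```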
Hidden Markov chains
Let’s imagine we know our weather state transition probabilities
(seen previously), but for some reason we can’t see what the
weather is like (e.g. we are in a room with no windows).
We can, however, see evidence of the weather (e.g. people coming
into the room with or without an umbrella).
This would introduce a second set of probabilities (emission
probabilities), between the hidden state (the weather) and our
observations (the presence of an umbrella):
Hidden Markov chains
Umbrella No umbrella
Hidden
states
Observations
Emission
probabilities
Transition
probabilities
90% 10%
90%
10%
We have much more
expressive power in a
Hidden Markov chain
Sunny Rainy
20%
80%
75%
25%
Recall: Part-of-speech (POS) tagging
Our bag of words approach doesn’t allow us to appreciate that
words appear in sentences, and each have a different grammatical
role (e.g. verbs and nouns).
The process of determining the role each word plays in a sentence is
known as part-of-speech (POS) tagging.
It is important to understand whether a word is a noun or a verb, for
example, so we can correctly label entities from our terminology.
We’ll look at Markov models, which support automatic POS tagging, later in the course.
POS tagging and Hidden Markov chains
In POS tagging, we encounter a similar set of observations and
hidden states as we do in our (slightly contrived) weather example.
Our observations are our words (e.g. ‘land’), and our hidden states –
which these observations might be evidence of – are the different
grammatical roles each word might have (e.g. ‘verb’ or ‘noun’).
Transition probabilities are the chance of, for example, a verb
following a noun, and our emission probabilities are the chance of a
word, in general, holding a certain grammatical role:
POS tagging and Hidden Markov chains
How inference would be applied to a network like this for POS tagging is
outside the scope of this course, but we will come back to where our
probabilities come from.
The overall chance
of land being a
verb
How often one type
of word is followed by
another
Transition
probabilities
Emission
probabilities
Hidden
states
Observations
Land Seaplane
?% ?%
?% ?%
Noun Verb
?%
?%
?%
?%
Networks
Bayesian Belief Networks, Markov chains and Neural Networks
Background: Human thinking
The human brain consists of a set of neurons, connected by a set of
synapses. If the inputs (e.g. observations in the world) to a given
neuron, via a set of synapses, are sufficient (i.e. we see enough of
them) that neuron will fire. This is the basis of thought.
Neuron
Synapses
If I see this wing, these feet and
this beak, the neuron fires and I
can conclude this is a penguin.
Background: Human thinking
But of course some input features are more or less correlated with a
given output than others. To represent this, we implicitly weight
certain inputs in our mind.
These weights thus reflect our knowledge.
Neuron
Synapses
Because other animals might have wings like
this, I give this input a lower weight as it
alone is less indicative of this being a penguin.
0.25
0.5
0.5
Neural networks
A very similar structure is used to store knowledge – in the form of
weights – in a third type of network, a neural network.
Reasoning then occurs by seeing if the inputs are, given the weights,
sufficient for a neuron to fire.
Input 1
Input 2
Input 3
Weight A
Weight B
Weight C
Output
Neural networks
The network represents whether inputs are sufficient for a neuron to fire
using something called an activation function.
The simplest activation function (f), a binary step function, has a
straightforward threshold for firing the neuron: whether, once weighted
and summed, there is any (positive) input at all.
Input 1
Input 2
Input 3
Weight A
Weight B
Weight C
Output
x = Input 1 × Weight A
+ Input 2 × Weight B
+ Input 3 × Weight C
If x > 0: f(x) = 1
If x ≤ 0: f(x) = 0
Summation
Activation
Neural networks
x = 0.5 × 1
+ 0.1 × 1
+ 0 × 1
0.5
0.1
0
= 0.6
f(0.6) = 1
1
1 could indicate
‘true’ (e.g. is a
penguin)
1
1
1
Output
Summation
Activation
If x > 0: f(x) = 1
If x ≤ 0: f(x) = 0
The network represents whether inputs are sufficient for a neuron to fire
using something called an activation function.
The simplest activation function (f), a binary step function, has a
straightforward threshold for firing the neuron: whether, once weighted
and summed, there is any (positive) input at all.
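A minimal Python sketch of this neuron, assuming the binary step activation described above:

```python
def binary_step(x):
    """The simplest activation function: fire only on positive input."""
    return 1 if x > 0 else 0

def neuron(inputs, weights):
    """Weighted summation followed by activation."""
    return binary_step(sum(i * w for i, w in zip(inputs, weights)))

# The slide's worked example: inputs 0.5, 0.1 and 0, all weights 1.
print(neuron([0.5, 0.1, 0], [1, 1, 1]))  # 1, e.g. 'is a penguin'
```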
Systems dynamics
Preamble: Rule and network limitations
We will talk more about how we acquire the knowledge we’ve seen
thus far (e.g. the percentages for Bayes’ rule) shortly, but informally
we know it is based on past experiences.
We can estimate, for example, the portion of programmers in a
population fitting a description using our experiences with
programmers.
But what if we’ve never met any programmers and seen their traits
before? In other words, what if we’ve never seen a particular pattern
before?
Background: Foundational knowledge
In these situations, all we can do is rely on foundational knowledge (things
we already know to be true of the world) and reason from that.
Blueprints are a good example of
foundational knowledge. They
represent, at a very low level, exactly
what we know to be true of a device,
so that when new problems are
encountered they can be reasoned
upon to generate new information in
respect of that problem.
Systems dynamics
An entity that can capture a greater range of foundational
knowledge than a blueprint is something called systems dynamics.
This representation uses a variety of (diagrammatic) formalisms –
stocks, flows, converters and connectors – to represent complex
systems.
Systems dynamics – an example
Let’s say we want to represent foundational knowledge about the
impact of birth rate on the size of a population.
This isn’t as straightforward as connecting birth rate and population,
as, for example, a decrease in birth rate doesn’t directly decrease the
population. Instead, it causes it to increase at a slower rate.
Similarly, there is a reverse connection between population size and
birth rate (a larger population means, in turn, more births).
We can capture all this using systems dynamics:
Systems dynamics
Population
Births
Birth rate
An example of a
stock: a value that
can be increased or
decreased.
An example of a
(in)flow: an entity that
increases the value of a
stock
An example of a
converter: an entity
whose value is not
determined directly by
the system.
An example of a connector:
stocks can also, in turn,
impact flows.
An example of a
connector: converters
can impact flows.
Only flows can impact
stocks.
By separating out the
concepts of birth rate,
births and population we
can capture this nuance of
the world.
Systems dynamics
An entity that can capture a greater range of foundational
knowledge than a blueprint is something called systems dynamics.
This representation uses a variety of (diagrammatic) formalisms –
stocks, flows, converters and connectors – to represent complex
systems.
Inference is then done by digitalising these formalisms, providing
initial values and then simulating how flows are likely to impact these
stocks over time.
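A toy simulation of the population model sketched above; the starting values (1,000 people, a 2% birth rate) are invented for illustration:

```python
# Stocks-and-flows simulation: population is a stock, births an inflow,
# and birth rate a converter. Starting values are made up.
population = 1000.0   # stock
birth_rate = 0.02     # converter: births per person per year

history = []
for year in range(5):
    births = population * birth_rate   # flow, influenced by the stock
    population += births               # only flows change stocks
    history.append(round(population, 1))

print(history)  # the stock grows faster each year: the stock feeds the flow
```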
Summary: Computational representation and reasoning
How can we structure our
knowledge?
How can we apply that structured knowledge to
new data to gain information?
Logical rules See which rules match the data. What these rules
tell us is definitely true then becomes our new
information.
Statistical rules See which rules match the data. What these rules
tell us might be true then becomes our new
information.
Summary: Computational representation and reasoning
How can we structure our
knowledge?
How can we apply that structured knowledge to
new data to gain information?
Bayesian networks Our data tells us what is true in the network, and
then by applying a form of Bayes’ rule we gain new
information about what else may also be true.
Markov chains Our data tells us the current state, and then we use
the chain to gain information about which state(s)
may be transitioned to.
Neural networks Our data is used as input to an (artificial) neuron,
the activation (or not) of which is information.
Summary: Computational representation and reasoning
How can we structure our
knowledge?
How can we apply that structured knowledge to
new data to gain information?
Systems dynamics Simulations are run on computational versions of
systems dynamics, and the state of the system at
the end of the simulation is our information.
These reasoning processes can be done by a machine (an AI),
hence the term ‘computational’.
Model building
Where does the knowledge that we represent and then reason with
come from?
Model building
So far, we’ve been representing knowledge and applying it back to
the world (reasoning). This might be familiar to you…
Recall: Knowledge acquisition and application
In this way, we see how models
capture knowledge that has been
gained from the world, and in
turn apply that knowledge to the
world to gain new information.
Abstraction
+
Knowledge
Acquisition
Instantiation
+
Knowledge
Application
Model building
So far, we’ve been representing knowledge and applying it back to
the world. This might be familiar to you…
We have thus, in the first half of this lecture,
ticked off the right-hand side of this diagram,
which we saw back in Lecture 2.
This diagram also tells us that – whether we
are constructing rules, networks or systems
dynamics – what we are doing is building
models.
Model building
The left-hand side is now of
interest: how do we get the
knowledge needed to build these
models?
We will use a generalised form of
this diagram to visualise the
techniques we look at in the
remainder of the lecture, each of
which tells us how we acquire
knowledge.
Model building
? Reasoning
Representation
?
as rules,
networks or
systems dynamics
Model
e.g. for clinical
decision support
Running example
We know that there is knowledge represented within an instantiation
of Bayes’ theorem, which can then be used to reason about the world
(e.g. whether an individual is a programmer based on their traits). It
is a model.
But where does this knowledge (the likelihoods and priors) come from?
We’ll answer this question when exploring various model-building
methods, but these methods could be used to construct other models
too (other rules, networks and systems dynamics).
Model building - asking humans
Model building from humans
Probably the most straightforward way to obtain the knowledge for
a machine to represent and reason with is to ask humans for it. This
is often called knowledge engineering.
We could, for example, ask humans to estimate the probabilities for
our Bayes’ theorem (similar to what we did in Lecture 4 for our
programmer vs. retail likelihoods).
Model building from humans
? Reasoning
Representation
as rules,
networks or
systems dynamics
Model
Model building from humans
Probably the most straightforward way to obtain the knowledge for
a machine to represent and reason with is to ask humans for it. This
is often called knowledge engineering.
We could, for example, ask humans to estimate the probabilities for
our Bayes’ rule (similar to what we did in Lecture 4 for our initial
programmer vs. retail likelihoods).
Knowledge like this can be obtained from humans using several
different mechanisms:
Model building from humans - mechanisms
‘Think-aloud’
Interviews Observation
Ask users to verbalise their
thoughts while they complete
tasks.
Model building from humans - mechanisms
Systematic Review
Repertory grid Crowdsourcing
Ask users to discuss the
similarities and differences
between different concepts to
capture relationships.
As we saw in Lecture 4, search is
one way we can systematically look
through data to obtain knowledge,
such as when examining literature.
Summary: Model building from humans
Interviews Reasoning
Representation
as rules,
networks or
systems dynamics
Model
Model building – computational discovery
Data Mining, Clustering (Unsupervised machine learning) and Naïve
Bayes Classifier (Supervised machine learning)
Model building from data (computational discovery)
Our final example of model building from humans, a systematic
literature review, suggests that if human knowledge is embedded in
data, we may not need to consult humans at all.
In other words, we might be able to build our models (acquire
knowledge) automatically from data.
Model building from data (computational discovery)
? Reasoning
Representation
as rules,
networks or
systems dynamics
Model
Model building from data (computational discovery)
Our final example of model building from humans, a systematic
literature review, suggests that if human knowledge is embedded in
data, we may not need to consult humans at all.
In other words, we might be able to build our models (acquire
knowledge) automatically.
Automatic model building is known as computational discovery. We
will look at three computational discovery mechanisms: (1) Data
mining, (2) Unsupervised learning (briefly) and (3) Supervised
learning.
Data Mining
A broad approach to computational discovery is something called
data mining. If the data being examined is unstructured (e.g. free
text) this may also be called text mining.
Here, we look for specific patterns within data. The more often we
find the same pattern, the surer we can be that it is knowledge.
These patterns may be rules (e.g. quiet ⇒ programmer), or
sequences of events (known as process mining).
Model building – computational discovery
Data Mining, Clustering (Unsupervised machine learning) and Naïve
Bayes Classifier (Supervised machine learning)
Unsupervised machine learning
One way to conceptualise data mining is as a system (or machine)
learning new knowledge (e.g. rules) from patterns in the data.
Therefore, if data mining is conducted without any human support,
as is often the case, we can describe what is being done as
unsupervised machine learning.
Clustering
A common form of unsupervised machine learning is known as
clustering.
Rather than identifying, for example, rules from the data, the goal of
clustering is broader: to identify groups (classes) in the data. The
features of those groups and the value of those features then become
new knowledge that can be used to better understand the data or
reason with new data.
We could apply clustering to our programmer and retail worker
population to try and learn the values needed for Bayes’ theorem…
Recall: Our programmer and retail worker population
One (existing) cluster
or class containing
programmers
Another (existing) cluster
or class containing retail
workers
Clustering
A common form of unsupervised machine learning is known as
clustering.
Rather than identifying, for example, rules from the data, the goal of
clustering is broader: to identify groups (classes) in the data. The
features of those groups and the value of those features then become
new knowledge that can be used to better understand the data or
reason with new data.
We could apply clustering to our programmer and retail worker
population to try and learn the values needed for Bayes’ theorem…
but this doesn’t tell us much more than we already know.
We already know the classes that
would have been discovered
through the clustering process.
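As an illustration of clustering in general (not of any specific method from the lecture), here is a minimal one-dimensional k-means, run on the age values that appear in the population table later in the lecture; choosing k = 2 and initialising centroids at the minimum and maximum are our own assumptions:

```python
# A minimal 1-D k-means with k = 2: discover two groups in unlabelled
# data. Initial centroids (min and max) are an arbitrary choice.
ages = [42, 38, 76, 51, 19, 53, 59, 80, 77, 61, 40, 35, 32, 46, 56, 25]
centroids = [float(min(ages)), float(max(ages))]

for _ in range(10):  # alternate assignment and centroid update
    clusters = [[], []]
    for a in ages:
        nearest = min((0, 1), key=lambda i: abs(a - centroids[i]))
        clusters[nearest].append(a)
    centroids = [sum(c) / len(c) for c in clusters]

print(centroids)  # the centres of the two discovered groups
```

Note that the discovered groups (younger vs. older) need not correspond to the classes we care about (programmer vs. retail), which is exactly the limitation the slide describes.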
Model building – computational discovery
Data Mining, Clustering (Unsupervised machine learning) and Naïve
Bayes Classifier (Supervised machine learning)
Supervised Machine Learning
We’ve been talking a lot about learning without talking about a
related concept: teaching.
Rather than leaving a machine unsupervised to learn, we can
supervise (teach) it by providing it with examples that it can learn
relationships from.
To understand this better, we can again ask how we might learn the
probabilities in Bayes’ theorem:
Recall: Our programmer and retail worker population
Population features
[Figure: a population of 16 people, each labelled with the trait ‘Quiet’ or ‘Loud’]
Let’s imagine we now
have information on
the traits of people
within our population.
Prelude: Populations as data
Let’s translate this graphic to a set of data (and add age as well to help our example).
How would we now calculate our Bayes’ values?
ID Trait Age Occupation
1 Quiet 42 Programmer
2 Quiet 38 Programmer
3 Quiet 76 Programmer
4 Loud 51 Programmer
5 Quiet 19 Retail
6 Loud 53 Retail
7 Loud 59 Retail
8 Loud 80 Retail
9 Loud 77 Retail
10 Loud 61 Retail
11 Quiet 40 Retail
12 Loud 35 Retail
13 Loud 32 Retail
14 Loud 46 Retail
15 Quiet 56 Retail
16 Loud 25 Retail
Learning Bayes’: The prior
25%
Calculating the prior(s) is much the
same as we did before, however…
ID Trait Age Occupation
1 Quiet 42 Programmer
2 Quiet 38 Programmer
3 Quiet 76 Programmer
4 Loud 51 Programmer
5 Quiet 19 Retail
6 Loud 53 Retail
7 Loud 59 Retail
8 Loud 80 Retail
9 Loud 77 Retail
10 Loud 61 Retail
11 Quiet 40 Retail
12 Loud 35 Retail
13 Loud 32 Retail
14 Loud 46 Retail
15 Quiet 56 Retail
16 Loud 25 Retail
Learning Bayes’: The likelihood
75%
…we can now calculate the likelihood directly, as
we have actual data we can use.
Previously, we had estimated these values, based
upon what portion of a group of programmers
(the number of which was based on our prior)
we thought would have the traits of interest.
ID Trait Age Occupation
1 Quiet 42 Programmer
2 Quiet 38 Programmer
3 Quiet 76 Programmer
4 Loud 51 Programmer
5 Quiet 19 Retail
6 Loud 53 Retail
7 Loud 59 Retail
8 Loud 80 Retail
9 Loud 77 Retail
10 Loud 61 Retail
11 Quiet 40 Retail
12 Loud 35 Retail
13 Loud 32 Retail
14 Loud 46 Retail
15 Quiet 56 Retail
16 Loud 25 Retail
Learning Bayes’: The likelihood
We can do a similar thing for retail
workers.
25%
The process we have been through here is the process a machine goes
through automatically when undertaking what is known as supervised
learning. Let’s formalise slightly some of the things we did…
Supervised learning
ID Trait Age Occupation
1 Quiet 42 Programmer
2 Quiet 38 Programmer
3 Quiet 76 Programmer
4 Loud 51 Programmer
5 Quiet 19 Retail
6 Loud 53 Retail
7 Loud 59 Retail
8 Loud 80 Retail
9 Loud 77 Retail
10 Loud 61 Retail
11 Quiet 40 Retail
12 Loud 35 Retail
13 Loud 32 Retail
14 Loud 46 Retail
15 Quiet 56 Retail
16 Loud 25 Retail
Training data, features and classes
Features Classes (or labels)
Predictor
We call this training data
What is the relationship between the predictor and each label?
Supervised learning
The process we have been through here is the process a machine goes
through automatically when undertaking what is known as supervised
learning. Let’s formalise slightly some of the things we did…
Our training data supplied us with features, from which we selected
trait as a predictor and added labels to form positive examples for
different classes (e.g. programmer).
From these examples, we learned the relationship between, for
example, someone’s traits and them being a programmer (e.g. 75%
chance of being quiet if a programmer).
Much like our initial guess about the portions of programmers and
retail workers, we hope our training data generalises (more later).
This contrasts unsupervised learning,
where we didn’t have any labelled
examples to learn from.
Learning Bayes’: Final model
We now have all the numbers we need to instantiate Bayes’ theorem.
Specifically, the simple model built from this type of supervised learning
is called a Naïve Bayes classifier. It can now make future predictions
when new data is encountered.
This classifier is naïve because it makes several assumptions, such as
there being no connection between different features. In reality, this
might not be the case (e.g. might someone’s traits be impacted by
age?).
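The whole process can be sketched as a tiny Naïve Bayes classifier built from the training data above, with trait as the single predictor:

```python
# Training data: (trait, occupation) pairs from the lecture's table.
data = [
    ("Quiet", "Programmer"), ("Quiet", "Programmer"),
    ("Quiet", "Programmer"), ("Loud", "Programmer"),
    ("Quiet", "Retail"), ("Loud", "Retail"), ("Loud", "Retail"),
    ("Loud", "Retail"), ("Loud", "Retail"), ("Loud", "Retail"),
    ("Quiet", "Retail"), ("Loud", "Retail"), ("Loud", "Retail"),
    ("Loud", "Retail"), ("Quiet", "Retail"), ("Loud", "Retail"),
]

def prior(label):
    """Learned prior: the portion of the population with this label."""
    return sum(1 for _, l in data if l == label) / len(data)

def likelihood(trait, label):
    """Learned likelihood: how common the trait is within the class."""
    in_class = [t for t, l in data if l == label]
    return in_class.count(trait) / len(in_class)

def classify(trait):
    """Posterior probability that a person with this trait is a programmer."""
    num = likelihood(trait, "Programmer") * prior("Programmer")
    den = num + likelihood(trait, "Retail") * prior("Retail")
    return num / den

print(prior("Programmer"))                # 0.25, the learned prior
print(likelihood("Quiet", "Programmer"))  # 0.75, the learned likelihood
print(classify("Quiet"))                  # 0.5, as in our earlier calculation
```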
Other supervised learning-based models
The kind of simplification within a Naïve Bayes classifier is the same
kind of simplification we have identified in models throughout the
course.
Therefore, there are several other examples of supervised machine
learning-based models, some of which you may have heard of, such
as logistic regression (models).
These models, of course, also rely on the presence of labelled data. If
we don’t have this data, the best we can do is rely on unsupervised
learning.
Recall: Markov chains
Knowing that machine learning uses
data to acquire knowledge, we can
now answer where the transition
probabilities in our Markov chain
might come from:
Day:     1     2     3     4     5     6     7     8     9     10
Weather: Sunny Rainy Sunny Rainy Sunny Rainy Sunny Sunny Rainy Rainy

Transition probability from state i to state j =
count of transitions from state i to state j /
total count of transitions from state i

From this data: P(Sunny → Rainy) = 80%, P(Sunny → Sunny) = 20%,
P(Rainy → Sunny) = 75% and P(Rainy → Rainy) = 25%.
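A sketch of how these transition probabilities could be estimated automatically from the observed weather sequence (illustrative Python, directly applying the counting formula above):

```python
from collections import Counter

def transition_probabilities(states):
    """Estimate P(j | i) as count(i -> j) / count(transitions out of i)."""
    pair_counts = Counter(zip(states, states[1:]))  # consecutive pairs
    out_counts = Counter(states[:-1])               # transitions leaving each state
    return {(i, j): n / out_counts[i] for (i, j), n in pair_counts.items()}

weather = ["Sunny", "Rainy", "Sunny", "Rainy", "Sunny",
           "Rainy", "Sunny", "Sunny", "Rainy", "Rainy"]
probs = transition_probabilities(weather)
print(probs[("Sunny", "Rainy")])  # 0.8
print(probs[("Rainy", "Sunny")])  # 0.75
```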
Recall: Neural Networks
We can also answer where the weights (knowledge) in our neural
network might come from: given a set of inputs (features) and
outputs (labels), we experiment with different weights until the
output of the activation function matches the output in the data:
This is obviously a very simple training process. When we have more complex neural networks – those with different
activation functions or potentially those where we have neurons organised into different groups or layers – then our
training approaches also become more complex, often referred to as ‘deep learning’.
Input A Input B Input C Output
0.5 0.1 0 1
… … … …
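This trial-and-error weight adjustment can be sketched as a perceptron learning rule. This is an illustration under assumptions: a step activation function, an invented learning rate, and made-up data rows shaped like the table above; real deep learning training is far more sophisticated:

```python
def step(x):
    """Threshold activation: fire (1) if the weighted sum reaches 0."""
    return 1 if x >= 0 else 0

def train_perceptron(rows, epochs=20, lr=0.1):
    """Adjust weights until the activation output matches the labelled output.
    rows: list of (inputs, output) pairs."""
    n = len(rows[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for inputs, target in rows:
            prediction = step(sum(w * x for w, x in zip(weights, inputs)) + bias)
            error = target - prediction
            # Nudge each weight in the direction that reduces the error.
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

# Invented rows in the shape of the table: inputs A, B, C and an output label.
rows = [([0.5, 0.1, 0.0], 1), ([0.1, 0.9, 0.2], 0),
        ([0.7, 0.2, 0.1], 1), ([0.0, 0.8, 0.3], 0)]
weights, bias = train_perceptron(rows)
predictions = [step(sum(w * x for w, x in zip(weights, inputs)) + bias)
               for inputs, _ in rows]
print(predictions)  # [1, 0, 1, 0] -- matches the labelled outputs
```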
Aside: What about Generative AI?
We can’t really talk about AI and machine learning without talking
about ChatGPT.
The more formal way to refer to ChatGPT is as a Large Language
Model-backed (LLM-backed) chatbot; the chatbot uses a trained
LLM (Generative Pretrained Transformer (GPT)) to support its
operation.
LLMs are trained on large amounts of existing text using a mixture
of unsupervised and supervised processes.
Aside: What about Generative AI?
LLMs are a form of generative AI as they generate new text content:
based on a prompt, they generate the next most likely sequence of
words based on the information about the nature of text gained
during training.
LLMs could, therefore, conduct the Named Entity Recognition (NER)
process we saw in Lecture 11, taking the string to be made
computable as a prompt and generating the labels. They are thus
also part of the field of Natural Language Processing (NLP).
Because Markov Chains also support POS tagging, part of the NER
process, they can be viewed as a simple form of language model.
LLMs in the context of
what we’ve seen so far…
Aside: It’s all statistics in the end…
In many situations, the relationship
supervised machine learning tries to
identify can just be viewed as a line (or
an equation that describes that line) that
separates the examples given.
With just two classes and two features this is simple and only
requires two dimensions, but with more classes or features this
becomes more complex and requires more sophisticated methods
such as support vector machines.
(Figure: labelled examples plotted in two dimensions, separated into
‘Retail’ and ‘Programmer’ classes by a line.)
Summary: Model building from data (computational discovery)
(Diagram: machine learning builds a model, represented as rules,
networks or systems dynamics, which is then used for reasoning.)
Summary
Artificial intelligence effectively equates to automatically gaining
knowledge, representing that knowledge, and then automatically
applying it to new data to obtain information.
It is thus akin to the modelling processes we have seen.
Knowledge can be gained from humans, or from existing data through
data mining, unsupervised or supervised learning. The applicability of
each depends upon the data available for learning.
Knowledge can be represented and applied using rules (logical or
statistical), networks (Bayes’, Markov or Neural) or Systems
Dynamics.
Recall: It all comes back to public health
interventions…
If we can automate the application of knowledge to health data,
then we can automate (the introduction of) interventions.
If we can’t fully use information systems to automate this
application, then they can assist clinicians in the delivery of
interventions.
Diabetes example: a computer, as an information system, could
automatically determine whether a patient has diabetes and act
accordingly.
Epilogue: Evaluating models
Model suitability
The success of the (simple) supervised learning process we’ve seen
depends upon how representative our data is of the true situation in
the world.
If it is representative, then the relationships we capture in our model
are likely to be correct. If it is not, then the relationships are
unlikely to be accurate.
The smaller a dataset is the less likely it is to be representative.
The accuracy of the relationships captured can also depend on the
features selected, the accuracy of the labels and even the learning
algorithm itself.
Model evaluation
For these reasons, it’s important to check how well our model
performs before we use it to make future predictions.
To do this, we run our model against test data and evaluate the
quality of the predictions it makes.
We can base this evaluation on techniques we’ve already seen…
Recall: Questioning: Evaluation (Query Performance
Measures – Sensitivity)
Q: ‘Health Informatics’, run against four items: ‘KCL Principles of
Health Inf’, ‘UCL Institute of Health Inf’, ‘KCL History’ and ‘KCL
Physics’.
The items that we want that are
included in our search are called True
Positives (TP).
Any items that are missed by our
search (i.e. we wanted them but they
were not retrieved) are called False
Negatives (FN).
We can calculate a true-positive rate
(or sensitivity) as the proportion of
true positives from amongst those
results that we wanted, or
TP/(TP+FN).
Here, one of the two items we wanted is retrieved: 1/2 = 50%.
Recall: Questioning: Evaluation (Query Performance
Measures – Specificity)
Q: ‘Health Informatics’, run against the same four items: ‘KCL
Principles of Health Inf’, ‘UCL Institute of Health Inf’, ‘KCL History’
and ‘KCL Physics’.
The items that we don’t want that are
not included in our search are called
True Negatives (TN).
Any unwanted items that are included
by our search (i.e. we didn’t want
them but they were retrieved anyway)
are called False Positives (FP).
We can calculate a true-negative rate
(or specificity) as the proportion of
true negatives from amongst those
results that we did not want, or
TN/(TN+FP).
Here, one of the two unwanted items (e.g. ‘KCL Physics’) is correctly
excluded: 1/2 = 50%.
Model evaluation
For these reasons, it’s important to check how well our model
performs before we use it to make future predictions.
To do this, we run our model against test data and evaluate the
quality of the predictions it makes.
We can base this evaluation on techniques we’ve already seen…
calculating sensitivity and specificity, for example.
Specifically, we show the testing data to our model (e.g. our Naïve
Bayes classifier) hiding the true labels when doing so. The labels the
classifier comes up with can thus be used to determine its
performance:
Model evaluation
1. True labels (hidden):

ID  Trait  Age  Occupation
1   Quiet  42   Programmer
2   Quiet  38   Programmer
3   Quiet  76   Programmer
4   Loud   51   Programmer

2. Classifier labels: Programmer, Programmer, Programmer, Retail

3. Evaluate the classifier labels in the way we’ve seen: three of the
four hidden ‘Programmer’ labels are correctly recovered.

4. Calculate sensitivity: 3/4 = 75%
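A sketch of this evaluation in Python, treating ‘Programmer’ as the positive class (the labels mirror the example above):

```python
def evaluate(true_labels, predicted_labels, positive):
    """Compute sensitivity and specificity for one class treated as positive."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == positive:
            if pred == positive: tp += 1   # wanted and found
            else: fn += 1                  # wanted but missed
        else:
            if pred == positive: fp += 1   # unwanted but included
            else: tn += 1                  # unwanted and excluded
    sensitivity = tp / (tp + fn) if tp + fn else None
    specificity = tn / (tn + fp) if tn + fp else None
    return sensitivity, specificity

# The test set: all four people really are programmers,
# but the classifier labels one of them as retail.
truth = ["Programmer"] * 4
predicted = ["Programmer", "Programmer", "Programmer", "Retail"]
print(evaluate(truth, predicted, positive="Programmer"))  # (0.75, None)
```

Note that this tiny test set contains no true negative examples, so specificity is undefined (returned as None), which is why only sensitivity is calculated in the example.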
References and Images
Enrico Coiera. Guide to Health Informatics (3rd ed.). CRC Press, 2015.
Bella Martin. Universal Methods of Design: 100 Ways to Research Complex Problems, Develop Innovative Ideas, and Design
Effective Solutions. Rockport Publishers, 2012.
https://www.gettyimages.co.uk/
https://etn-sas.eu/2020/09/23/part-of-speech-tagging-using-hidden-markov-models/
https://towardsdatascience.com/deep-learning-with-python-neural-networks-complete-tutorial-6b53c0b06af0
https://www.pinterest.co.uk/pin/2040762311279389/
https://thesystemsthinker.com/step-by-step-stocks-and-flows-improving-the-rigor-of-your-thinking
Principles of Health Informatics: Artificial intelligence and machine learning

  • 9. Learning outcomes 1. Understand and critique different approaches to storing and applying knowledge. 2. Understand the connection between modelling and these approaches. 3. Be able to list different computational discovery techniques and how they support the collection of knowledge.
  • 10. Relationship with CDSS Lecture 13 introduced a fairly compelling example of an AI – a clinical decision support system (CDSS). Therefore, the representation, reasoning and knowledge acquisition techniques discussed here very much explain the ‘how’ of Lecture 13: how do CDSSs deliver the benefits we saw there? We’ll provide specific examples of this where possible.
  • 11. Computational representation and reasoning How can we represent knowledge, and what approaches can we use to automatically take that knowledge and arrive at a conclusion?
  • 12. Recall: Knowledge structure Knowledge, from our model: If a plane does not have pontoons, then it will sink. We refer to this as an inference procedure. It is supported by three other (sub-) entities: (1) A language (2) A knowledge base (3) An ontology When determining whether a plane sinks, we are applying knowledge that actually consists of four distinct entities…
  • 13. Computational representation and reasoning In the first half of this lecture, we’ll be asking how we build this knowledge base (representation) and define the associated inference procedures so that reasoning can occur. We’ll look at three representation and reasoning tools in the following slides: (1) Rules, (2) Networks and (3) Systems dynamics. For each, we should be able to answer the following: 1. How can we structure our knowledge? 2. How can we apply that structured knowledge to new data to gain information?
  • 15. The first way we can structure knowledge is as a set of rules. We saw a basic version of a rule in Lecture 2. We can formalise this as: Importantly, rules like this are a subset of formal logic: RuleP1 If plane not pontoons then conclude sink Logic Rules No pontoons ⇒ plane sinks This symbol means implies; if A is true we are free to conclude that B is true. We’re free to use some of the Boolean logic we saw in Lecture 3.
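As a minimal sketch (not code from the lecture itself), a rule like RuleP1 and the deductive step it supports can be written out in a few lines of Python; the rule representation and names are our own:

```python
# Each rule is a (condition, conclusion) pair: RuleP1 as "no pontoons => sinks".
rules = [
    ("no_pontoons", "sinks"),
]

def infer(facts, rules):
    """Apply modus ponens repeatedly until no new facts can be derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for condition, conclusion in rules:
            if condition in facts and conclusion not in facts:
                facts.add(conclusion)  # A is true and A => B, so conclude B
                changed = True
    return facts

# A plane observed without pontoons is concluded to sink.
print(infer({"no_pontoons"}, rules))  # the derived facts include 'sinks'
```

This is the essence of a forward-chaining rule engine: match rules against the data, and what the matched rules tell us becomes our new information.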
  • 16. Reasoning with logic rules How do we now draw conclusions from these rules? We actually saw the answer to this back in Lecture 4…
  • 18. Reasoning with logic rules How do we now draw conclusions from these rules? We actually saw the answer to this back in Lecture 3… Using deduction, for example, in combination with this rule, we can conclude that if we encounter a plane without pontoons it will sink. RuleP1 If plane not pontoons then conclude sink This would support the kind of ‘symbolic AI’ mentioned in our original definition.
  • 19. Reasoning with logic rules How do we now draw conclusions from these rules? We actually saw the answer to this back in Lecture 3… From the perspective of logic (the core of our rules), we are saying that if a plane doesn’t have pontoons, and not having pontoons implies that a plane sinks, then the plane must sink. This is an inference procedure known as modus ponens and adds weight to our conclusions. It is a formalisation of our original inference procedure from Lecture 2.
  • 20. Reasoning with logic rules How do we now draw conclusions from these rules? We actually saw the answer to this back in Lecture 3… We could also work backwards and use a rule of logic called modus tollens to tell us that if a plane isn’t sinking then it must have pontoons. We need to be careful with modus tollens though, as it creates a very strict association. If we want to have multiple potential reasons for an outcome, then abduction alone might be a better way to draw conclusions.
  • 21. Statistical rules Logic rules allow us to express that one event definitely implies another, but as we’ve seen throughout the course it’s often the case that certain outcomes have an element of probability attached to them. When rules account for probability, we call them statistical rules. Many statistical rules are based on something we’ve already seen…
  • 22. Recall: A question You are told that a person named Alex has the following personality traits: ‘Quiet’, ‘Reserved’, ‘Good at maths’ Do you think it’s more likely that Alex: (A)Works in retail (e.g. a shop assistant) or (B) Is a computer programmer? This is our new data
  • 23. Recall: Bayes’ theorem: Formula We can write this process out as a formula: (Programmer likelihood × Programmer prior) / ((Programmer likelihood × Programmer prior) + (Retail likelihood × Retail prior)), or (75% × 25%) / ((75% × 25%) + (25% × 75%)) = 50%
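The formula on this slide can be checked with a few lines of Python; this is our own sketch, with variable names chosen for readability:

```python
# The slide's Bayes' calculation: posterior = likelihood * prior,
# normalised over both candidate classes (programmer vs. retail).
prior_programmer, prior_retail = 0.25, 0.75
likelihood_programmer, likelihood_retail = 0.75, 0.25  # P(traits | class)

numerator = likelihood_programmer * prior_programmer
denominator = numerator + likelihood_retail * prior_retail
posterior_programmer = numerator / denominator

print(posterior_programmer)  # 0.5, matching the slide's 50%
```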
  • 24. Statistical rules – Bayes’ theorem But we can represent, and thus reason with, even more complex scenarios using the principles of Bayes’ theorem… RuleB1 If person is ‘Quiet’, ‘Reserved’, ‘Good at maths’ then conclude programmer with probability (0.5) As a statistical rule, this is what the output from our previous Bayes’ calculation would look like.
  • 25. Networks Bayesian Belief Networks, Markov chains and Neural Networks
  • 26. Preamble: Beyond statistical rules Relying on sets of rules alone – which are relatively simple in form – and associated inference procedures, to represent and reason with knowledge may be somewhat limiting. Therefore, we can look instead to creating more complex structures to store and link our knowledge, and facilitate reasoning: networks. We saw a basic example of a network containing linked knowledge back in Lecture 4:
  • 27. Recall: ‘Decision trees’ We want to know the chance that Alex could fill a Python programmer role: Suitable for the job: (50% × 80%) + (50% × 1%) = 40.5% Not suitable for the job: (50% × 20%) + (50% × 99%) = 59.5% Alex turns up Alex is a programmer Alex is not a programmer Alex knows Python Alex does not know Python Alex knows Python Alex does not know Python Suitable for the job Not suitable for the job Suitable for the job Not suitable for the job 50% 50% 80% 20% 99% 1%
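The expected-value arithmetic on this slide can be reproduced directly; a sketch under the slide's stated probabilities:

```python
# Probabilities from the decision tree on the slide.
p_programmer = 0.5
p_python_given_programmer = 0.80       # a programmer knows Python
p_python_given_not_programmer = 0.01   # a non-programmer knows Python

# Expected chance Alex is suitable: sum over both branches of the tree.
suitable = (p_programmer * p_python_given_programmer
            + (1 - p_programmer) * p_python_given_not_programmer)

print(suitable)      # 40.5% suitable for the job
print(1 - suitable)  # 59.5% not suitable
```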
  • 28. Bayesian belief networks We can take things a step further by explicitly combining decision trees with concepts from Bayes’ theorem to introduce a more nuanced representation of probabilistic knowledge – a Bayesian belief network – that allows us to reason with more complex scenarios. First, we can do this simply…
  • 29. Bayesian belief networks Alex is a programmer Alex is quiet Likelihood = 75% The probability that someone with Alex’s traits is a programmer The probability that anyone in the population is a programmer Prior = 25% We lay out (part) of the information from our instantiation of Bayes’ theorem in a network structure.
  • 30. Bayesian belief networks We can take things a step further by explicitly combining decision trees with concepts from Bayes’ theorem to introduce a more nuanced representation of probabilistic knowledge – a Bayesian belief network – that allows us to reason with more complex scenarios. First, we can do this simply… reformatting our existing Bayes’ equation graphically, capturing that if a parent node is true (initially based on a prior probability) then a child node is also true, with a given conditional probability (likelihood). We can then make this more complex…
  • 31. Bayesian belief networks [Diagram: the ‘Alex is a programmer’ (prior = 25%) → ‘Alex is quiet’ (likelihood = 75%) network, extended with additional parent and child nodes, each carrying its own prior or likelihood.]
  • 32. Bayesian belief networks We can take things a step further by explicitly combining decision trees with concepts from Bayes’ theorem to introduce a more nuanced representation of probabilistic knowledge – a Bayesian belief network – that allows us to reason with more complex scenarios. First, we can do this simply… reformatting our existing Bayes’ equation graphically, showing that if a parent node is true (initially based on a prior probability) then a child node is also true, with a given conditional probability (likelihood). We can then make this more complex… adding additional dependencies. Complex questions can then be asked of this network, and a form of Bayes’ rule applied iteratively to infer the answer.
  • 33. Networks Bayesian Belief Networks, Markov chains and Neural Networks
  • 34. Markov chains A Bayesian belief network stores probabilistic knowledge and allows us to reason with it. Another type of network, a Markov Chain, does something very similar. Here, probabilities represent the chance of transitioning from a current state to another state when those states are connected. An example of a current state might be the state of the weather today, and another, connected state might be the state of the weather tomorrow:
  • 35. Markov chains Here we are representing, for example, that if today is sunny, the chance of it being sunny tomorrow is 20%. We’ll talk about where these numbers are likely to come from later. Sunny Rainy 20% 80% 75% 25%
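As a sketch of how this chain might be stored computationally (our own representation, not from the slides), the transition probabilities fit naturally into a nested dictionary:

```python
# The slide's weather Markov chain: P(next state | current state).
transitions = {
    "Sunny": {"Sunny": 0.20, "Rainy": 0.80},
    "Rainy": {"Sunny": 0.75, "Rainy": 0.25},
}

def predict_tomorrow(today):
    """Inference under the Markov property: only the current state matters."""
    return max(transitions[today], key=transitions[today].get)

print(predict_tomorrow("Sunny"))  # Rainy is the most likely next state
```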
  • 36. Markov chains - inference Is predicting the weather tomorrow (inference) therefore as simple as reading these probabilities, or should we look at past patterns (i.e. combine probabilities from previous states)? Something called the Markov property, connected to Markov chains, actually says no: in order to balance simplicity with correctness, we should not do this. We should instead just use the current state, interpreting the probabilities on the chain directly.
  • 37. Hidden Markov chains Let’s imagine we know our weather state transition probabilities (seen previously), but for some reason we can’t see what the weather is like (e.g. we are in a room with no windows). We can, however, see evidence of the weather (e.g. people coming into the room with or without an umbrella). This would introduce a second set of probabilities (emission probabilities), between the hidden state (the weather) and our observations (the presence of an umbrella):
  • 38. Hidden Markov chains Umbrella No umbrella Hidden states Observations Emission probabilities Transition probabilities 90% 10% 90% 10% We have much more expressive power in a Hidden Markov chain Sunny Rainy 20% 80% 75% 25%
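A sketch of the same model in code follows. Note one assumption on our part: the slide lists the 90%/10% emission figures without labels, so the pairing of umbrella with rain (and no umbrella with sun) below is our reading of the diagram:

```python
# Hidden states, transition probabilities and emission probabilities
# from the slide's weather example.
transition = {"Sunny": {"Sunny": 0.20, "Rainy": 0.80},
              "Rainy": {"Sunny": 0.75, "Rainy": 0.25}}
emission = {"Sunny": {"umbrella": 0.10, "no_umbrella": 0.90},   # assumed pairing
            "Rainy": {"umbrella": 0.90, "no_umbrella": 0.10}}

def p_observation(obs, state_distribution):
    """Probability of an observation, marginalising over the hidden state."""
    return sum(p * emission[state][obs]
               for state, p in state_distribution.items())

# If we believe today is 50/50 sunny vs. rainy, how likely is an umbrella?
print(p_observation("umbrella", {"Sunny": 0.5, "Rainy": 0.5}))
```

Full inference over sequences (e.g. the Viterbi algorithm used for POS tagging) builds on exactly this marginalisation step.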
  • 39. Recall: Part-of-speech (POS) tagging Our bag of words approach doesn’t allow us to appreciate that words appear in sentences, and each have a different grammatical role (e.g. verbs and nouns). The process of determining the role each word plays in a sentence is known as part-of-speech (POS) tagging. It is important to understand whether a word is a noun or a verb, for example, so we can correctly label entities from our terminology. We’ll look at Markov models, which support automatic POS tagging, later in the course.
  • 40. POS tagging and Hidden Markov chains In POS tagging, we encounter a similar set of observations and hidden states as we do in our (slightly contrived) weather example. Our observations are our words (e.g. ‘land’), and our hidden states – which these observations might be evidence of – are the different grammatical roles each word might have (e.g. ‘verb’ or ‘noun’). Transition probabilities are the chance of, for example, a verb following a noun, and our emission probabilities are the chance of a word, in general, holding a certain grammatical role:
  • 41. POS tagging and Hidden Markov chains How inference would be applied to a network like this for POS tagging is outside the scope of this course, but we will come back to where our probabilities come from. The overall chance of land being a verb How often one type of word is followed by another Transition probabilities Emission probabilities Hidden states Observations Land Seaplane ?% ?% ?% ?% Noun Verb ?% ?% ?% ?%
  • 42. Networks Bayesian Belief Networks, Markov chains and Neural Networks
  • 43. Background: Human thinking The human brain consists of a set of neurons, connected by a set of synapses. If the inputs (e.g. observations in the world) to a given neuron, via a set of synapses, are sufficient (i.e. we see enough of them), that neuron will fire. This is the basis of thought. Neuron Synapses If I see this wing, these feet and this beak, the neuron fires and I can conclude this is a penguin.
  • 44. Background: Human thinking But of course some input features are more or less correlated with a given output than others. To represent this, we implicitly weight certain inputs in our mind. These weights thus reflect our knowledge. Neuron Synapses Because other animals might have wings like this, I give this input a lower weight as it alone is less indicative of this being a penguin. 0.25 0.5 0.5
  • 45. Neural networks A very similar structure is used to store knowledge – in the form of weights – in a third type of network, a neural network. Reasoning then occurs by seeing if the inputs are, given the weights, sufficient for a neuron to fire. Input 1 Input 2 Input 3 Weight A Weight B Weight C Output
  • 46. Neural networks The network represents whether inputs are sufficient for a neuron to fire using something called an activation function. The simplest activation function (f), a binary step function, has a straightforward threshold for firing the neuron: whether, once weighted and summed, there is any (positive) input at all. Input 1 Input 2 Input 3 Weight A Weight B Weight C Output x = Input 1 × Weight A + Input 2 × Weight B + Input 3 × Weight C If x > 0: f(x) = 1; otherwise f(x) = 0 Summation Activation
  • 47. Neural networks x = 0.5 × 1 + 0.1 × 1 + 0 × 1 0.5 0.1 0 = 0.6 f(0.6) = 1 1 1 could indicate ‘true’ (e.g. is a penguin) 1 1 1 Output Summation Activation If x > 0: f(x) = 1 If x < 0: f(x) = 0 The network represents whether inputs are sufficient for a neuron to fire using something called an activation function. The simplest activation function (f), a binary step function, has a straightforward threshold for firing the neuron: whether, once weighted and summed, there is any (positive) input at all.
  • 49. Preamble: Rule and network limitations We will talk more about how we acquire the knowledge we’ve seen thus far (e.g. the percentages for Bayes’ rule) shortly, but informally we know it is based on past experiences. We can estimate, for example, the portion of programmers in a population fitting a description using our experiences with programmers. But what if we’ve never met any programmers and seen their traits before? In other words, what if we’ve never seen a particular pattern before?
  • 50. Background: Foundational knowledge In these situations, all we can do is rely on foundational knowledge (things we already know to be true of the world) and reason from that. Blueprints are a good example of foundational knowledge. They represent, at a very low level, exactly what we know to be true of a device, so that when new problems are encountered they can be reasoned upon to generate new information in respect of that problem.
  • 51. Systems dynamics An entity that can capture a greater range of foundational knowledge than a blueprint is something called systems dynamics. This representation uses a variety of (diagrammatic) formalisms – stocks, flows, converters and connectors – to represent complex systems.
  • 52. Systems dynamics – an example Let’s say we want to represent foundational knowledge about the impact of birth rate on the size of a population. This isn’t as straightforward as connecting birth rate and population, as, for example, a decrease in birth rate doesn’t directly decrease the population. Instead, it causes it to increase at a slower rate. Similarly, there is a reverse connection between population size and birth rate (a larger population means, in turn, more births). We can capture all this using systems dynamics:
  • 53. Systems dynamics Population Births Birth rate An example of a stock: a value that can be increased or decreased. An example of a (in)flow: an entity that increases the value of a stock An example of a converter: an entity whose value is not determined directly by the system. An example of a connector: stocks can also, in turn, impact flows. An example of a connector: convertors can impact flows. Only flows can impact stocks. By separating out the concepts of birth rate, births and population we can capture this nuance of the world.
  • 54. Systems dynamics An entity that can capture a greater range of foundational knowledge than a blueprint is something called systems dynamics. This representation uses a variety of (diagrammatic) formalisms – stocks, flows, converters and connectors – to represent complex systems. Inference is then done by digitalising these formalisms, providing initial values and then simulating how flows are likely to impact these stocks over time.
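The digitalise-and-simulate step can be sketched for the birth-rate example; this is our own minimal digitalisation, with an assumed birth rate and initial population:

```python
# Stock-and-flow simulation: the births flow increases the population stock,
# and the stock feeds back into the flow (the connector from the diagram).
birth_rate = 0.02      # converter: its value is set outside the system
population = 1000.0    # stock: an initial value we must provide

for year in range(10):
    births = population * birth_rate  # flow, driven by the stock + converter
    population += births              # only flows change stocks

print(round(population))  # the simulated population after 10 years
```

Running the simulation shows the nuance the slides describe: a lower birth rate does not shrink the stock, it just slows its growth.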
  • 55. Summary: Computational representation and reasoning How can we structure our knowledge? How can we apply that structured knowledge to new data to gain information? Logical rules See which rules match the data. What these rules tell us is definitely true then becomes our new information. Statistical rules See which rules match the data. What these rules tell us might be true then becomes our new information.
  • 56. Summary: Computational representation and reasoning How can we structure our knowledge? How can we apply that structured knowledge to new data to gain information? Bayesian networks Our data tells us what is true in the network, and then we apply a form of Bayes’ rule we gain new information about what else may also be true. Markov chains Our data tells us the current state, and then we use the chain to gain information about which state(s) may be transitioned to. Neural networks Our data is used as input to an (artificial) neuron, the activation (or not) of which is information.
  • 57. Summary: Computational representation and reasoning How can we structure our knowledge? How can we apply that structured knowledge to new data to gain information? Systems dynamics Simulations are run on computational versions of systems dynamics, and the state of the system at the end of the simulation is our information. These reasoning processes can be done by a machine (an AI), hence the term ‘computational’.
  • 58. Model building Where does the knowledge that we represent and then reason with come from?
  • 59. Model building So far, we’ve been representing knowledge and applying it back to the world (reasoning). This might be familiar to you…
  • 60. Recall: Knowledge acquisition and application In this way, we see how models capture knowledge that has been gained from the world, and in turn apply that knowledge to the world to gain new information. Abstraction + Knowledge Acquisition Instantiation + Knowledge Application
  • 61. Model building So far, we’ve been representing knowledge and applying it back to the world. This might be familiar to you… We have thus, in the first half of this lecture, ticked off the right-hand side of this diagram, which we saw back in Lecture 2. This diagram also tells us that – whether we are constructing rules, networks or systems dynamics – what we are doing is building models.
  • 62. Model building The left-hand side is now of interest: how do we get the knowledge needed to build these models? We will use a generalised form of this diagram to visualise the techniques we look at in the remainder of the lecture, each of which tells us how we acquire knowledge.
  • 63. Model building ? Reasoning Representation ? as rules, networks or systems dynamics Model e.g. for clinical decision support
  • 64. Running example We know that there is knowledge represented within an instantiation of Bayes’ theorem, which can then be used to reason about the world (e.g. whether an individual is a programmer based on their traits). It is a model. But where does this knowledge (the likelihoods and priors) come from? We’ll answer this question when exploring various model-building methods, but these methods could be used to construct other models too (other rules, networks and systems dynamics).
  • 65. Model building - asking humans
  • 66. Model building from humans Probably the most straightforward way to obtain the knowledge for a machine to represent and reason with is to ask humans for it. This is often called knowledge engineering. We could, for example, ask humans to estimate the probabilities for our Bayes’ theorem (similar to what we did in Lecture 4 for our programmer vs. retail likelihoods).
  • 67. Model building from humans ? Reasoning Representation as rules, networks or systems dynamics Model
  • 68. Model building from humans Probably the most straightforward way to obtain the knowledge for a machine to represent and reason with is to ask humans for it. This is often called knowledge engineering. We could, for example, ask humans to estimate the probabilities for our Bayes’ rule (similar to what we did in Lecture 4 for our initial programmer vs. retail likelihoods). Knowledge like this can be obtained from humans using several different mechanisms:
  • 69. Model building from humans - mechanisms ‘Think-aloud’ Interviews Observation Ask users to verbalise their thoughts while they complete tasks.
  • 70. Model building from humans - mechanisms Systematic Review Repertory grid Crowdsourcing Ask users to discuss the similarities and differences between different concepts to capture relationships. As we saw in Lecture 4, search is one way we can systematically look through data to obtain knowledge, such as when examining literature.
  • 71. Summary: Model building from humans Interviews Reasoning Representation as rules, networks or systems dynamics Model
  • 72. Model building – computational discovery Data Mining, Clustering (Unsupervised machine learning) and Naïve Bayes Classifier (Supervised machine learning)
  • 73. Model building from data (computational discovery) Our final example of model building from humans, a systematic literature review, suggests that if human knowledge is embedded in data, we may not need to consult humans at all. In other words, we might be able to build our models (acquire knowledge) automatically from data.
  • 74. Model building from data (computational discovery) ? Reasoning Representation as rules, networks or systems dynamics Model
  • 75. Model building from data (computational discovery) Our final example of model building from humans, a systematic literature review, suggests that if human knowledge is embedded in data, we may not need to consult humans at all. In other words, we might be able to build our models (acquire knowledge) automatically. Automatic model building is known as computational discovery. We will look at three computational discovery mechanisms: (1) Data mining, (2) Unsupervised learning (briefly) and (3) Supervised learning.
  • 76. Data Mining A broad approach to computational discovery is something called data mining. If the data being examined is unstructured (e.g. free text) this may also be called text mining. Here, we look for specific patterns within data. The more often we find the same pattern, the surer we can be that it is knowledge. These patterns may be rules (e.g. quiet ⇒ programmer), or sequences of events (known as process mining).
  • 77. Model building – computational discovery Data Mining, Clustering (Unsupervised machine learning) and Naïve Bayes Classifier (Supervised machine learning)
  • 78. Unsupervised machine learning One way to conceptualise data mining is as a system (or machine) learning new knowledge (e.g. rules) from patterns in the data. Therefore, if data mining is conducted without any human support, as is often the case, we can describe what is being done as unsupervised machine learning.
  • 79. Clustering A common form of unsupervised machine learning is known as clustering. Rather than identifying, for example, rules from the data, the goal of clustering is broader: to identify groups (classes) in the data. The features of those groups and the value of those features then become new knowledge that can be used to better understand the data or reason with new data. We could apply clustering to our programmer and retail worker population to try and learn the values needed for Bayes’ theorem…
  • 80. Recall: Our programmer and retail worker population One (existing) cluster or class containing programmers Another (existing) cluster or class containing retail workers
  • 81. Clustering A common form of unsupervised machine learning is known as clustering. Rather than identifying, for example, rules from the data, the goal of clustering is broader: to identify groups (classes) in the data. The features of those groups and the value of those features then become new knowledge that can be used to better understand the data or reason with new data. We could apply clustering to our programmer and retail worker population to try and learn the values needed for Bayes’ theorem… but this doesn’t tell us much more than we already know. We already know the classes that would have been discovered through the clustering process.
  • 82. Model building – computational discovery Data Mining, Clustering (Unsupervised machine learning) and Naïve Bayes Classifier (Supervised machine learning)
  • 83. Supervised Machine Learning We’ve been talking a lot about learning without talking about a related concept: teaching. Rather than leaving a machine unsupervised to learn, we can supervise (teach) it by providing it with examples that it can learn relationships from. To understand this better, we can again ask how we might learn the probabilities in Bayes’ theorem:
  • 84. Recall: Our programmer and retail worker population
  • 86. ID Trait Age Occupation 1 Quiet 42 Programmer 2 Quiet 38 Programmer 3 Quiet 76 Programmer 4 Loud 51 Programmer 5 Quiet 19 Retail 6 Loud 53 Retail 7 Loud 59 Retail 8 Loud 80 Retail Prelude: Populations as data Let’s translate this graphic to a set of data (and add age as well to help our example). How would we now calculate our Bayes’ values? ID Trait Age Occupation 9 Loud 77 Retail 10 Loud 61 Retail 11 Quiet 40 Retail 12 Loud 35 Retail 13 Loud 32 Retail 14 Loud 46 Retail 15 Quiet 56 Retail 16 Loud 25 Retail
  • 87. ID Trait Age Occupation 1 Quiet 42 Programmer 2 Quiet 38 Programmer 3 Quiet 76 Programmer 4 Loud 51 Programmer 5 Quiet 19 Retail 6 Loud 53 Retail 7 Loud 59 Retail 8 Loud 80 Retail Learning Bayes’: The prior 25% Calculating the prior(s) is much the same as we did before, however… ID Trait Age Occupation 9 Loud 77 Retail 10 Loud 61 Retail 11 Quiet 40 Retail 12 Loud 35 Retail 13 Loud 32 Retail 14 Loud 46 Retail 15 Quiet 56 Retail 16 Loud 25 Retail
  • 88. ID Trait Age Occupation 1 Quiet 42 Programmer 2 Quiet 38 Programmer 3 Quiet 76 Programmer 4 Loud 51 Programmer 5 Quiet 19 Retail 6 Loud 53 Retail 7 Loud 59 Retail 8 Loud 80 Retail Learning Bayes’: The likelihood ID Trait Age Occupation 9 Loud 77 Retail 10 Loud 61 Retail 11 Quiet 40 Retail 12 Loud 35 Retail 13 Loud 32 Retail 14 Loud 46 Retail 15 Quiet 56 Retail 16 Loud 25 Retail 75% …we can now calculate the likelihood directly, as we have actual data we can use. Previously, we had estimated these values, based upon what portion of a group of programmers (the number of which was based on our prior) we thought would have the traits of interest.
  • 89. ID Trait Age Occupation 1 Quiet 42 Programmer 2 Quiet 38 Programmer 3 Quiet 76 Programmer 4 Loud 51 Programmer 5 Quiet 19 Retail 6 Loud 53 Retail 7 Loud 59 Retail 8 Loud 80 Retail ID Trait Age Occupation 9 Loud 77 Retail 10 Loud 61 Retail 11 Quiet 40 Retail 12 Loud 35 Retail 13 Loud 32 Retail 14 Loud 46 Retail 15 Quiet 56 Retail 16 Loud 25 Retail Learning Bayes’: The likelihood We can do a similar thing for retail workers. 25%
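The counting we have just done by hand over the table can be written as a short script; a sketch using the 16 training rows from the slides (trait and occupation only):

```python
# Training data from the slides: (trait, occupation) for IDs 1-16.
data = [
    ("Quiet", "Programmer"), ("Quiet", "Programmer"), ("Quiet", "Programmer"),
    ("Loud", "Programmer"),  ("Quiet", "Retail"),     ("Loud", "Retail"),
    ("Loud", "Retail"),      ("Loud", "Retail"),      ("Loud", "Retail"),
    ("Loud", "Retail"),      ("Quiet", "Retail"),     ("Loud", "Retail"),
    ("Loud", "Retail"),      ("Loud", "Retail"),      ("Quiet", "Retail"),
    ("Loud", "Retail"),
]

programmers = [trait for trait, occ in data if occ == "Programmer"]
retail = [trait for trait, occ in data if occ == "Retail"]

prior_programmer = len(programmers) / len(data)                              # 4/16
likelihood_quiet_programmer = programmers.count("Quiet") / len(programmers)  # 3/4
likelihood_quiet_retail = retail.count("Quiet") / len(retail)                # 3/12

print(prior_programmer, likelihood_quiet_programmer, likelihood_quiet_retail)
```

These are exactly the prior (25%) and likelihoods (75% and 25%) from the slides, now learned from data rather than estimated.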
  • 90. The process we have been through here is the process a machine goes through automatically when undertaking what is known as supervised learning. Let’s formalise slightly some of the things we did… Supervised learning
  • 91. ID Trait Age Occupation 1 Quiet 42 Programmer 2 Quiet 38 Programmer 3 Quiet 76 Programmer 4 Loud 51 Programmer 5 Quiet 19 Retail 6 Loud 53 Retail 7 Loud 59 Retail 8 Loud 80 Retail ID Trait Age Occupation 9 Loud 77 Retail 10 Loud 61 Retail 11 Quiet 40 Retail 12 Loud 35 Retail 13 Loud 32 Retail 14 Loud 46 Retail 15 Quiet 56 Retail 16 Loud 25 Retail Training data, features and classes Features Classes (or labels) Predictor We call this training data What is the relationship between the predictor and each label?
  • 92. Supervised learning The process we have been through here is the process a machine goes through automatically when undertaking what is known as supervised learning. Let’s formalise slightly some of the things we did… Our training data supplied us with features, from which we selected trait as a predictor and added labels to form positive examples for different classes (e.g. programmer). From these examples, we learned the relationship between, for example, someone’s traits and them being a programmer (e.g. 75% chance of being quiet if a programmer). Much like our initial guess about the portions of programmers and retail workers, we hope our training data generalises (more later). This contrasts unsupervised learning, where we didn’t have any labelled examples to learn from.
  • 93. Learning Bayes’: Final model We now have all the numbers we need to instantiate Bayes’ theorem. Specifically, the simple model built from this type of supervised learning is called a Naïve Bayes’ classifier. It can now make future predictions when new data is encountered. This classifier is naïve because it makes several assumptions, such as there being no connection between different features. In reality, this might not be the case (e.g. might someone’s traits be impacted by age?).
  • 94. Other supervised learning-based models The kind of simplification within a Naïve Bayes classifier is the same kind of simplification we have identified in models throughout the course. Therefore, there are several other examples of supervised machine learning-based models, some of which you may have heard of, such as logistic regression (models). These models, of course, also rely on the presence of labelled data. If we don’t have this data, the best we can do is rely on unsupervised learning.
  • 95. Recall: Markov chains Knowing that machine learning uses data to acquire knowledge, we can now answer where the transition probabilities in our Markov chain might come from: Day Weather 1 Sunny 2 Rainy 3 Sunny 4 Rainy 5 Sunny 6 Rainy 7 Sunny 8 Sunny 9 Rainy 10 Rainy Sunny Rainy 20% 80% 75% 25% Count of transitions from state i to state j Total count of transitions from state i
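The counting formula on this slide can be applied to the 10-day record directly; a sketch, with the day-by-day sequence copied from the table:

```python
from collections import Counter

# The 10-day weather record from the slide.
weather = ["Sunny", "Rainy", "Sunny", "Rainy", "Sunny",
           "Rainy", "Sunny", "Sunny", "Rainy", "Rainy"]

# Count transitions from state i to state j...
counts = Counter(zip(weather, weather[1:]))
# ...and divide by the total count of transitions from state i.
from_state = Counter(weather[:-1])
p = {pair: n / from_state[pair[0]] for pair, n in counts.items()}

print(p[("Sunny", "Rainy")])  # 0.8, the slide's 80%
print(p[("Rainy", "Sunny")])  # 0.75, the slide's 75%
```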
  • 96. Recall: Neural Networks We can also answer where the weights (knowledge) in our neural network might come from: given a set of inputs (features) and outputs (labels), we experiment with different weights until the output of the activation function matches the output in the data: This is obviously a very simple training process. When we have more complex neural networks – those with different activation functions or potentially those where we have neurons organised into different groups or layers – then our training approaches also become more complex, often referred to as ‘deep learning’. Input A Input B Input C Output 0.5 0.1 0 1 … … … …
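The 'experiment with different weights until the output matches' idea can be made concrete with the classic perceptron update rule; this is a standard technique rather than necessarily the exact process the slide has in mind, and the training examples below are our own:

```python
# Train a single step-activation neuron by nudging weights towards
# whatever reduces the error on each labelled example.
def train(examples, n_inputs, rate=0.1, epochs=20):
    weights = [0.0] * n_inputs
    for _ in range(epochs):
        for inputs, target in examples:
            output = 1 if sum(i * w for i, w in zip(inputs, weights)) > 0 else 0
            error = target - output  # 0 when the prediction is already right
            weights = [w + rate * error * i for w, i in zip(weights, inputs)]
    return weights

# Learn a neuron that fires only when the first feature is present.
examples = [([1, 0], 1), ([0, 1], 0), ([1, 1], 1), ([0, 0], 0)]
weights = train(examples, n_inputs=2)
print(weights)
```

Deep learning scales this same idea up to many layers of neurons, with gradient-based versions of the weight update.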
  • 97. Aside: What about Generative AI? We can’t really talk about AI and machine learning without talking about ChatGPT. The more formal way to refer to ChatGPT is as a Large Language Model-backed (LLM-backed) chatbot; the chatbot uses a trained LLM (Generative Pretrained Transformer (GPT)) to support its operation. LLMs are trained on large amounts of existing text using a mixture of unsupervised and supervised processes.
  • 98. Aside: What about Generative AI? LLMs are a form of generative AI as they generate new text content: based on a prompt, they generate the next most likely sequence of words based on the information about the nature of text gained during training. LLMs could, therefore, conduct the Named Entity Recognition (NER) process we saw in Lecture 11, taking the string to ‘make computable’ as a prompt and generating the labels. They are thus also part of the field of Natural Language Processing (NLP). Because Markov Chains also support POS tagging, part of the NER process, they can be viewed as a simple form of language model. LLMs in the context of what we’ve seen so far…
  • 99. Aside: It’s all statistics in the end… In many situations, the relationship supervised machine learning tries to identify can just be viewed as a line (or an equation that describes that line) that separates the examples given – for instance, a line separating the ‘Programmer’ examples from the ‘Retail’ examples. With just two classes this is simple and only requires two dimensions, but with more classes this becomes more complex and requires more sophisticated methods, such as support vector machines.
  • 100. Summary: Model building from data (computational discovery) (Diagram: machine learning builds a model; the model provides representation – as rules, networks or systems dynamics – and supports reasoning.)
  • 101. Summary Artificial intelligence effectively equates to automatically gaining knowledge, representing that knowledge, and then automatically applying it to new data to obtain information. It is thus akin to the modelling processes we have seen. Knowledge can be gained from humans, or from existing data through data mining, unsupervised or supervised learning. The applicability of each depends upon the data available for learning. Knowledge can be represented and applied using rules (logical or statistical), networks (Bayes’, Markov or Neural) or Systems Dynamics.
  • 102. Recall: It all comes back to public health interventions… If we can automate the application of knowledge to health data, then we can automate (the introduction of) interventions. If we can’t fully use information systems to automate this application, then they can assist clinicians in the delivery of interventions. For example, a computer, as an information system, could automatically determine whether a patient has diabetes and act accordingly.
  • 104. Model suitability The success of the (simple) supervised learning process we’ve seen depends upon how representative our data is of the true situation in the world. If it is representative, then the relationships we capture in our model are likely to be correct; if it is not, then those relationships are unlikely to be accurate. The smaller a dataset is, the less likely it is to be representative. The accuracy of the relationships captured can also depend on the features selected, the accuracy of the labels and even the learning algorithm itself.
  • 105. Model evaluation For these reasons, it’s important to check how well our model performs before we use it to make future predictions. To do this, we run our model against test data and evaluate the quality of the predictions it makes. We can base this evaluation on techniques we’ve already seen…
  • 106. Recall: Questioning: Evaluation (Query Performance Measures – Sensitivity) Q: ‘Health Informatics’, run over ‘KCL Principles of Health Inf’, ‘KCL History’, ‘KCL Physics’ and ‘UCL Institute of Health Inf’. The items that we want that are included in our search are called True Positives (TP). Any items that are missed by our search (i.e. we wanted them but they were not retrieved) are called False Negatives (FN). We can calculate a true-positive rate (or sensitivity) as the proportion of true positives from amongst those results that we wanted, or TP/(TP+FN). Here, one of the two wanted items is retrieved: 1/2 = 50%.
  • 107. Recall: Questioning: Evaluation (Query Performance Measures – Specificity) Q: ‘Health Informatics’, run over ‘KCL Principles of Health Inf’, ‘KCL History’, ‘KCL Physics’ and ‘UCL Institute of Health Inf’. The items that we don’t want that are not included in our search are called True Negatives (TN). Any unwanted items that are included by our search (i.e. we didn’t want them but they were retrieved anyway) are called False Positives (FP). We can calculate a true-negative rate (or specificity) as the proportion of true negatives from amongst those results that we did not want, or TN/(TN+FP). Here, one of the two unwanted items is correctly excluded: 1/2 = 50%.
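Both measures reduce to one-line calculations, shown here with the counts from the search example (one of two wanted items retrieved, one of two unwanted items excluded):

```python
def sensitivity(tp, fn):
    """True-positive rate: proportion of wanted items actually retrieved."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True-negative rate: proportion of unwanted items correctly excluded."""
    return tn / (tn + fp)

# Counts from the slides' search example.
print(sensitivity(tp=1, fn=1))  # 0.5
print(specificity(tn=1, fp=1))  # 0.5
```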
  • 108. Model evaluation For these reasons, it’s important to check how well our model performs before we use it to make future predictions. To do this, we run our model against test data and evaluate the quality of the predictions it makes. We can base this evaluation on techniques we’ve already seen… calculating sensitivity and specificity, for example. Specifically, we show the testing data to our model (e.g. our Naïve Bayes classifier) hiding the true labels when doing so. The labels the classifier comes up with can thus be used to determine its performance:
  • 109. Model evaluation Test data: ID 1 (Quiet, 42), ID 2 (Quiet, 38), ID 3 (Quiet, 76), ID 4 (Loud, 51). 1. True labels (hidden): Programmer, Programmer, Programmer, Programmer. 2. Classifier labels: Programmer, Programmer, Programmer, Retail. 3. Evaluate the classifier labels in the way we’ve seen: three true positives, one false negative. 4. Calculate sensitivity: 3/4 = 75%.
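This evaluation step is easily automated: compare the hidden true labels with the labels the classifier produced, count the outcomes, and apply the sensitivity formula. The label lists below follow the slide's example, with 'Programmer' treated as the positive class.

```python
# True (hidden) labels versus the labels the classifier assigned.
true_labels = ["Programmer", "Programmer", "Programmer", "Programmer"]
classifier_labels = ["Programmer", "Programmer", "Programmer", "Retail"]

# Count true positives and false negatives for the "Programmer" class.
tp = sum(t == "Programmer" and c == "Programmer"
         for t, c in zip(true_labels, classifier_labels))
fn = sum(t == "Programmer" and c != "Programmer"
         for t, c in zip(true_labels, classifier_labels))

# Sensitivity = TP / (TP + FN).
print(tp / (tp + fn))  # 0.75
```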
  • 110. References and Images Enrico Coiera. Guide to Health Informatics (3rd ed.). CRC Press, 2015. Bella Martin. Universal Methods of Design: 100 Ways to Research Complex Problems, Develop Innovative Ideas, and Design Effective Solutions. Rockport Publishers, 2012. Images: https://www.gettyimages.co.uk/; https://etn-sas.eu/2020/09/23/part-of-speech-tagging-using-hidden-markov-models/; https://towardsdatascience.com/deep-learning-with-python-neural-networks-complete-tutorial-6b53c0b06af0; https://www.pinterest.co.uk/pin/2040762311279389/; https://thesystemsthinker.com/step-by-step-stocks-and-flows-improving-the-rigor-of-your-thinking