ISSUES IN AI, POLICING AND
Dr Janet Bastiman
What is AI?
“Any system that makes a decision that appears to be intelligent
from specific inputs” – John McCarthy (1955)
AI -> Machine Learning -> Deep Learning -> (G)AI
[Figure from Zeiler and Fergus (2013), “Visualizing and Understanding Convolutional Networks”]
A network is deep if it has more than one stage
of non-linear feature transformation
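Why does the definition insist on *non-linear* stages? The following is a minimal sketch using hypothetical 2×2 weight matrices. Two purely linear stages always collapse into a single linear stage, so they add no depth. Putting a non-linearity between them (a ReLU is assumed here) breaks that equivalence.

```python
# Hypothetical weights; the point is the structure, not the values.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def apply(w, x):
    """Apply weight matrix w to vector x (a plain list)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]

def relu(v):
    """Element-wise rectified linear unit: max(0, x)."""
    return [max(0, x) for x in v]

W1 = [[1, -1], [2, 1]]   # hypothetical first-stage weights
W2 = [[1, 1], [-1, 2]]   # hypothetical second-stage weights
x = [1, 2]

# Two linear stages give exactly the same result as one stage with weights W2 @ W1
two_linear = apply(W2, apply(W1, x))
one_linear = apply(matmul(W2, W1), x)

# A non-linear stage in between means no single matrix reproduces the mapping
with_relu = apply(W2, relu(apply(W1, x)))

print(two_linear, one_linear, with_relu)
```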
What is Bias?
"an unwarranted correlation between input variables and output classification"
"erroneous assumptions in the learning algorithm resulting
in missing relevant relationships" - bias (underfitting)
"noise is modelled rather than the valid outputs" - variance (overfitting)
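The two failure modes can be seen on a toy regression problem. This is a sketch on hypothetical data, where the true relationship is assumed to be y = 2x and the training targets carry alternating ±1 noise. A constant predictor stands in for an underfitting (high-bias) model. A nearest-neighbour lookup stands in for an overfitting (high-variance) model that memorises the noise. Both were chosen for illustration and are not methods from the talk.

```python
def mse(model, xs, ys):
    """Mean squared error of a model over paired inputs and targets."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x = list(range(10))
train_y = [2 * x + (1 if x % 2 == 0 else -1) for x in train_x]  # noisy targets
test_x = [t + 0.25 for t in range(9)]
test_y = [2 * x for x in test_x]  # the noise-free underlying relationship

# High bias (underfitting): ignore x and always predict the training mean
mean_y = sum(train_y) / len(train_y)
def constant_model(x):
    return mean_y

# High variance (overfitting): return the target of the nearest training
# point, which reproduces the training noise exactly
def nearest_neighbour_model(x):
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

# A reasonable fit: ordinary least-squares straight line
n = len(train_x)
mean_x = sum(train_x) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
         / sum((x - mean_x) ** 2 for x in train_x))
intercept = mean_y - slope * mean_x
def linear_model(x):
    return slope * x + intercept

for name, model in [("constant", constant_model),
                    ("nearest neighbour", nearest_neighbour_model),
                    ("linear", linear_model)]:
    print(f"{name:18} train MSE {mse(model, train_x, train_y):6.3f}  "
          f"test MSE {mse(model, test_x, test_y):6.3f}")
```

The nearest-neighbour model achieves zero training error but performs worse than the straight line on unseen points. The constant model performs poorly on both.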
Nothing is (externally) peer reviewed
The IP lies in the training data, network architecture and test set, all of which are kept hidden
Results are cherry-picked or exaggerated
This might look harmless, but it would be embarrassing and potentially
contract-ending for a brand. How disastrous could the same error be in a
decision about a person’s life?
• Deliberately overcomplicating AI to promote its abilities
• This removes the desire to question it, as it sounds too difficult
• Pretending something is AI when it’s not
We cannot have transparency when the general public are deliberately misled
with statistics in the interest of sensationalism.
Bias in data gathering
• Ignorance of the problem
– building a LEGO kit without the picture or instructions
• Asking biased questions to lead the model
– presupposing the answer
• Limiting the data to a set that supports the hypothesis
Good data should be:
• representative of real-world data
• able to define the predictions well
There will always be exceptions; a good model will handle them sensibly.
Ignorance of the problem space will lead to poor models
Data choices ...
Biased data will result in biased models
E.g. Oxbridge entry is inherently biased:
• state school candidates are:
– less likely to apply
– more likely to apply to oversubscribed courses
Hence there is an observed bias towards privately educated students
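The arithmetic below shows how course choice alone can produce that observed bias. It is a sketch with hypothetical numbers. Both groups face identical offer rates on each course, but state school applicants are assumed to cluster on the oversubscribed course. The aggregate offer rates therefore diverge, and a model trained on overall outcomes would learn a correlation with school type that comes entirely from the mix of applications.

```python
from fractions import Fraction

# Hypothetical per-course offer rates, identical for both groups
offer_rate = {"oversubscribed": Fraction(1, 10), "less competitive": Fraction(2, 5)}

# Hypothetical application counts per group and course
applications = {
    "state":   {"oversubscribed": 80, "less competitive": 20},
    "private": {"oversubscribed": 20, "less competitive": 80},
}

def overall_offer_rate(group):
    """Offers received divided by applications made, across all courses."""
    apps = applications[group]
    offers = sum(n * offer_rate[course] for course, n in apps.items())
    return offers / sum(apps.values())

for group in applications:
    print(group, float(overall_offer_rate(group)))
```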
Racial discrepancy is exacerbated without understanding the inherent
COMPAS: Is there bias? If so, where?
• Only looked at individuals arrested for crimes
• Exact algorithms and training data unknown
• A 137-point questionnaire feeds the risk score
• Questions appear unsuitable
• Testing showed a significant correlation between risk scores and
• Black individuals were given higher risk scores than white
individuals for the same crimes
Was it biased?
• 5575 individuals
• 50.2% Black, 42.0% White, 7.8% Other
• 86.3% male
• Had been assessed by COMPAS for risk of reoffence and were
then monitored over 12 months.
• Over half the individuals scored low (1-3)
• Switches between percentages, percentage changes and
per-category percentages to show results in the best light
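The same figures can be framed in several ways. The sketch below uses a hypothetical cohort, not the COMPAS data, to show how one count of people can be reported as three quite different percentages. It also shows how a single change in a rate can be reported either as percentage points or as a relative percentage change.

```python
from fractions import Fraction

# Hypothetical cohort (not COMPAS figures)
total = 1000
high_risk = 200
high_risk_reoffended = 60
all_reoffended = 150

def pct(part, whole):
    """Exact percentage of part relative to whole."""
    return Fraction(part, whole) * 100

# One count of people, three different denominators
framings = {
    "of high-risk individuals reoffended": pct(high_risk_reoffended, high_risk),
    "of all reoffenders were rated high risk": pct(high_risk_reoffended, all_reoffended),
    "of the whole cohort were high-risk reoffenders": pct(high_risk_reoffended, total),
}
for text, value in framings.items():
    print(f"{float(value):.0f}% {text}")

# One change in a rate, two ways of reporting it
before, after = Fraction(20, 100), Fraction(30, 100)
points_change = (after - before) * 100              # percentage points
relative_change = (after - before) / before * 100   # relative percentage change
print(f"+{float(points_change):.0f} percentage points, or +{float(relative_change):.0f}%")
```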
COMPAS: ProPublica Study
• Same individuals
• Concluded that there was fundamental bias, as the false positive rate
(categorised as high risk but not reoffending) for Black defendants was twice that for white defendants
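The false positive rate behind that comparison can be computed per group from individual records. The following is a sketch in which the records are generated from hypothetical confusion-matrix counts chosen only to illustrate the calculation. The `make_records` helper is also hypothetical. The false positive rate (FPR) is the share of people who did *not* reoffend but were labelled high risk.

```python
from collections import Counter
from fractions import Fraction

def make_records(group, tp, fn, fp, tn):
    """Hypothetical helper: expand counts into (group, predicted_high, reoffended) records."""
    return ([(group, True, True)] * tp + [(group, False, True)] * fn +
            [(group, True, False)] * fp + [(group, False, False)] * tn)

# Hypothetical counts, not ProPublica's data
records = (make_records("black", tp=6, fn=4, fp=4, tn=6) +
           make_records("white", tp=5, fn=5, fp=2, tn=8))

def false_positive_rate(records, group):
    """Share of non-reoffenders in `group` who were labelled high risk."""
    outcomes = Counter(pred for g, pred, actual in records if g == group and not actual)
    return Fraction(outcomes[True], outcomes[True] + outcomes[False])

fpr_b = false_positive_rate(records, "black")
fpr_w = false_positive_rate(records, "white")
print(fpr_b, fpr_w, fpr_b / fpr_w)
```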
Two conflicting statistical studies…
Bias-free but skewed population?
• Chouldechova (2016) reviewed both
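• For every risk score s, the test is fair if the probability of reoffending is the same
irrespective of group membership
• P(Y=1 | S=s, R=b) = P(Y=1 | S=s, R=w), where Y=1 means reoffending, S is the risk score and R is the group (b = Black, w = white)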
• COMPAS adheres well to this fairness condition
• However, FNR and FPR are related by:
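• $\mathrm{FPR} = \dfrac{p}{1-p} \cdot \dfrac{1-\mathrm{PPV}}{\mathrm{PPV}} \cdot (1-\mathrm{FNR})$, where $p$ is the group's recidivism rate and PPV is the positive predictive value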
• If recidivism rates differ between the two groups, a fair test
score cannot have equal FPR and FNR across them
• Either the predictions are unbiased or the errors are unbiased, but not both
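The relation FPR = p/(1−p) · (1−PPV)/PPV · (1−FNR) can be checked numerically. In the sketch below, all counts and rates are hypothetical. It first verifies the identity on one confusion matrix. It then holds PPV and FNR equal for two groups and varies only the prevalence p. The resulting false positive rates must then differ, which is the trade-off described above.

```python
from fractions import Fraction

def fpr_from_identity(p, ppv, fnr):
    """Chouldechova's relation: FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR)."""
    return p / (1 - p) * (1 - ppv) / ppv * (1 - fnr)

def rates_from_counts(tp, fp, fn, tn):
    """Prevalence, PPV, FNR and FPR from a confusion matrix."""
    n = tp + fp + fn + tn
    p = Fraction(tp + fn, n)       # prevalence (recidivism rate)
    ppv = Fraction(tp, tp + fp)    # positive predictive value
    fnr = Fraction(fn, tp + fn)    # false negative rate
    fpr = Fraction(fp, fp + tn)    # false positive rate
    return p, ppv, fnr, fpr

# Sanity check of the identity on a hypothetical confusion matrix
p, ppv, fnr, fpr = rates_from_counts(tp=30, fp=20, fn=10, tn=40)
assert fpr_from_identity(p, ppv, fnr) == fpr

# Two hypothetical groups with the SAME PPV and FNR but different prevalence
ppv, fnr = Fraction(3, 5), Fraction(1, 4)
for name, prevalence in [("group A", Fraction(1, 2)), ("group B", Fraction(1, 4))]:
    print(name, "FPR =", fpr_from_identity(prevalence, ppv, fnr))
```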
Inherent trade-offs
• Kleinberg et al (2016)
• Is statistical parity possible, or should we strive for balance within
classes, so that the chance of making a mistake does not depend
on a person’s group?
• Independently showed that unbiased predictions and unbiased errors
cannot be achieved together, except when base rates are equal or prediction is perfect
• They define more complex feature vectors σ and avoid the case
where σ is undefined or incomplete for some individuals
• Rigorous proof that you cannot balance all sides
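The conditions Kleinberg et al. compare can be checked on small score tables. This is a sketch using hypothetical groups with two score bins each. Both groups are calibrated: a score of s means exactly a fraction s reoffend. However, their base rates differ. As a result, "balance for the positive class" (equal mean score among people who reoffend) and "balance for the negative class" (equal mean score among people who do not) both fail.

```python
from fractions import Fraction

# Hypothetical groups: each entry is (risk score, people in bin, people who reoffended)
groups = {
    "A": [(Fraction(4, 5), 10, 8), (Fraction(1, 5), 10, 2)],
    "B": [(Fraction(4, 5), 5, 4), (Fraction(1, 5), 15, 3)],
}

def is_calibrated(bins):
    """True if, in every bin, the observed reoffence rate equals the score."""
    return all(Fraction(positives, n) == score for score, n, positives in bins)

def base_rate(bins):
    """Overall share of the group who reoffended."""
    return Fraction(sum(pos for _, _, pos in bins), sum(n for _, n, _ in bins))

def mean_score_of_positives(bins):
    """Average score among reoffenders ('balance for the positive class')."""
    total = sum(score * pos for score, _, pos in bins)
    return total / sum(pos for _, _, pos in bins)

def mean_score_of_negatives(bins):
    """Average score among non-reoffenders ('balance for the negative class')."""
    total = sum(score * (n - pos) for score, n, pos in bins)
    return total / sum(n - pos for _, n, pos in bins)

for name, bins in groups.items():
    print(name, is_calibrated(bins), base_rate(bins),
          mean_score_of_positives(bins), mean_score_of_negatives(bins))
```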
Making data fair?
• Hardt et al (2016)
• Ignore all protected attributes? Ineffective, because protected attributes are
redundantly encoded in other, connected attributes
• Demographic parity? If the base rate varies between groups, you
cannot have parity of positive predictions and parity of errors at the same time
• Propose a new parity (equalised odds), aiming to equalise both true positive and false positive rates across groups
• “fairer” is still subjective
• Cannot both maximise predictive success and equalise errors across groups
• Several approaches exist; which one is appropriate depends on the goal of the algorithm
• Input data needs to be unbiased
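Demographic parity and equalised odds can be measured from labels and predictions. The sketch below uses hypothetical data for two groups with different base rates. The predictions satisfy equalised odds, with equal true positive and false positive rates across the groups. Because the base rates differ, the selection rates still differ, so demographic parity is not met. This code only measures the two criteria. It does not implement Hardt et al.'s method for constructing an equalised-odds predictor.

```python
from fractions import Fraction

def group_rates(y_true, y_pred):
    """Return (selection rate, TPR, FPR) for one group's binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    preds_for_positives = [p for t, p in pairs if t == 1]
    preds_for_negatives = [p for t, p in pairs if t == 0]
    selection = Fraction(sum(y_pred), len(y_pred))
    tpr = Fraction(sum(preds_for_positives), len(preds_for_positives))
    fpr = Fraction(sum(preds_for_negatives), len(preds_for_negatives))
    return selection, tpr, fpr

# Hypothetical labels (1 = reoffended) and predictions (1 = flagged high risk)
group_a = {"y_true": [1, 1, 1, 1, 0, 0, 0, 0],
           "y_pred": [1, 1, 1, 0, 1, 0, 0, 0]}
group_b = {"y_true": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
           "y_pred": [1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0]}

sel_a, tpr_a, fpr_a = group_rates(**group_a)
sel_b, tpr_b, fpr_b = group_rates(**group_b)

print("demographic parity gap:", sel_a - sel_b)
print("equalised odds gaps (TPR, FPR):", tpr_a - tpr_b, fpr_a - fpr_b)
```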
“All models are wrong, but some are useful – how wrong do they have to be
to not be useful?” – George E. P. Box
Human “gut feel” decisions are accepted, and when they are challenged it is
enough that some of the contributing factors are explained.
Are we holding AI predictive models to an unachievably high standard that
we do not apply to humans?