My research project investigated the potential of applying machine learning and artificial intelligence techniques to distinguish criminal tendencies in people. I know that pursuing this kind of classification is highly controversial, but machine learning and deep learning are increasingly being applied to data of all types. While my project inherently ran classifiers on image datasets of criminal and non-criminal individuals, its main focus was to investigate the presence and effects of biases in the image sets. A recent paper by Wu and Zhang (2016) claimed very high performance in discriminating between a criminal and a non-criminal dataset, and it received widespread criticism. I therefore endeavoured to investigate, using my own assembled datasets, whether performance decreased with the removal of biases such as emotion imbalance across sets and background colouring and texture. The work mainly involved web scraping, image
analysis using facial recognition functionality, emotion detection, and a deep learning neural net classifier.
2. Clockwise from top left:
Michael Murray: Serial killer, lives in Dublin 4
Miranda Barbour: Teen serial killer, 22 murders
Mesut Özil: Football player
Jessie Sleator: Teen girl from Dublin
Just as a human would find it hard to spot
a criminal, a machine learning algorithm
faces the same challenge.
3. Motivation for research
▪ Initially, the motivation came from work such as that of Wu & Zhang (2016) who
claimed to have high accuracy in classifying criminality from facial images.
▪ There were strong reactions to their work with accusations of biases within their
dataset.
▪ The algorithm may not pick up on underlying physical structures associated with
criminality, but rather may discriminate based on context-specific cues from the
situations in which the photographs were taken.
▪ Machine learning algorithms are only as good as their training data.
▪ Bias example: criminal mug-shots may be more likely to show negative emotion.
4. Related Work
▪ Automated Inference on criminality (2016)
- Wu, X. and Zhang, X. (2016). Automated inference on criminality using face images.
▪ A.I. Gaydar (2017)
- Wang, Y. and Kosinski, M. (2017). Deep neural networks are more accurate than
humans at detecting sexual orientation from facial images.
▪ Instagram photos predict depression (2016)
- Reece, A. G. and Danforth, C. M. (2016). Instagram photos reveal predictive markers of
depression.
5. Research Question/Objectives
▪ The aim of this paper is to investigate the presence and effects of biases in
training datasets, focusing on pattern identification over facial-recognition
features as applied to criminal classification.
▪ There are many types of bias, e.g. emotion, gender, race, features such as
tattoos and hairstyles, and image background.
▪ E.g. given the context, criminal mug-shots tend to exhibit negative emotional
states such as fear, contempt and anger.
▪ Many datasets are open datasets that do not require informed consent for usage;
in comparison, some datasets are prepared by researchers who endeavour to
create unbiased image sets.
▪ This makes awareness of biases even more important.
7. PCA
1102 images of criminals and non-criminals.
40,000 features (200 x 200 pixels).
PCA is applied to reduce dimensions while
maintaining explained variance.
A graph of no. of components vs. explained
variance was used to optimise the component count (see the sketch below).
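A minimal sketch of that optimisation step, assuming the 1102 aligned 200 x 200 grayscale images have already been flattened into a matrix X of shape (1102, 40000); the 99% threshold matches the explained variance reported later, but the variable names are illustrative:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# X: (1102, 40000) matrix of flattened 200x200 grayscale face images (assumed precomputed)
pca = PCA(n_components=750)            # upper bound on the components to inspect
pca.fit(X)

# Cumulative explained variance as a function of component count
cum_var = np.cumsum(pca.explained_variance_ratio_)
plt.plot(range(1, len(cum_var) + 1), cum_var)
plt.xlabel("No. of principal components")
plt.ylabel("Cumulative explained variance")
plt.show()

# Smallest component count retaining ~99% of the variance
n_components = int(np.searchsorted(cum_var, 0.99)) + 1
print(n_components, "components explain 99% of the variance")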
8. Implementation – Criminal Classifier
Main steps involved in the criminal classifier model design (a code sketch follows the list):
1. Read in images, convert to grayscale, align eyes and crop using OpenCV functionality.
2. Applied PCA to reduce dimensions from 40,000 to 300-750.
3. Used a supervised learning algorithm (Keras Sequential NN) for training and validating the
model, with stratified K-fold cross-validation.
4. Neural net optimisation using various architectures and hyper-parameter tuning
(epochs, batch size, Dropout).
5. Obtained performance metrics, i.e. accuracy, confusion matrix and learning curves
(Python, scikit-learn).
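A minimal end-to-end sketch of these steps, assuming paths and labels (0 = non-criminal, 1 = criminal) are already assembled; a Haar-cascade face crop stands in for the eye-alignment step, and the layer sizes, epochs and batch size are illustrative rather than the tuned values:

import cv2
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import StratifiedKFold
from tensorflow.keras.layers import Dense, Dropout, Input
from tensorflow.keras.models import Sequential

# Step 1: read, grayscale, crop the detected face (eye alignment omitted for brevity)
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def preprocess(path, size=200):
    """Load an image, convert to grayscale, crop the first detected face, resize."""
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    face = cv2.resize(gray[y:y + h, x:x + w], (size, size))
    return face.flatten() / 255.0          # 40,000-dim feature vector

# Assemble the feature matrix, skipping images where no face was found
X, y = [], []
for path, label in zip(paths, labels):     # labels: 0 = non-criminal, 1 = criminal
    vec = preprocess(path)
    if vec is not None:
        X.append(vec)
        y.append(label)
X, y = np.array(X), np.array(y)

# Step 2: PCA down to a few hundred components
X_pca = PCA(n_components=300).fit_transform(X)

# Steps 3-5: Keras Sequential NN, stratified 10-fold cross-validation, metrics
def build_model(input_dim):
    model = Sequential([
        Input(shape=(input_dim,)),
        Dense(128, activation="relu"),
        Dropout(0.5),                      # Dropout was one of the tuned hyper-parameters
        Dense(1, activation="sigmoid"),    # binary criminal / non-criminal output
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=10, shuffle=True).split(X_pca, y):
    model = build_model(X_pca.shape[1])
    model.fit(X_pca[train_idx], y[train_idx], epochs=50, batch_size=32, verbose=0)
    preds = (model.predict(X_pca[test_idx]) > 0.5).astype(int).ravel()
    scores.append(accuracy_score(y[test_idx], preds))
    print(confusion_matrix(y[test_idx], preds))   # per-fold confusion matrix

print("Mean stratified 10-fold CV accuracy:", np.mean(scores))

Note that fitting PCA on the full set before splitting mirrors the listed step order; fitting it inside each fold instead would avoid any leakage between the train and test folds.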
10. Emotion Classification of Image Sets
[Charts: emotion profile of the criminal set vs. emotion profile of the non-criminal set]
The emotion profiling above shows an imbalance across the datasets, which may well be a source of bias
that could over-estimate the classifier's efficacy. The sets of overlapping emotions are small and would
require a larger dataset to incorporate into the classifier.
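A minimal sketch of how the two emotion profiles and their overlap could be compared, assuming per-image emotion labels already produced by the emotion-detection step; the label lists here are illustrative placeholders, not the project's data:

from collections import Counter

def emotion_profile(labels):
    """Fraction of images per detected emotion in one dataset."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {emotion: n / total for emotion, n in counts.items()}

# Illustrative labels only; the real ones come from the emotion-detection step.
criminal_labels = ["anger", "contempt", "fear", "anger", "neutral"]
non_criminal_labels = ["happiness", "neutral", "happiness", "surprise", "neutral"]

crim = emotion_profile(criminal_labels)
non_crim = emotion_profile(non_criminal_labels)

# Emotions present in both sets - candidates for an emotion-balanced subset
overlap = set(crim) & set(non_crim)
print("Criminal profile:", crim)
print("Non-criminal profile:", non_crim)
print("Overlapping emotions:", overlap)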
11. Evaluation Scenarios
A number of biases were investigated:
Scenario 1: Classifier was run on all images (481 non-criminals & 621 criminals).
Scenario 2: Classifier was run on 240 criminal men and 252 non-criminal women.
Scenario 3: To compare with Scenario 2, the classifier was run on 240 criminal men and
224 non-criminal men. Given that Scenario 2 has a gender bias, we might expect
Scenario 3 to perform worse.
Scenario 4: 77 criminal women vs. 78 non-criminal men. This scenario attempted to
investigate whether accuracy improves due to gender bias (the small dataset is a
concern for predictive power).
12. Evaluation
Results: Scenario 1 – All Images
(Mixed gender, race and emotion)
Criminal images: 621
Non-criminal images: 481
No. of Principal Components: 750
Explained Variance: 99.1%
Stratified 10-Fold Cross Validation Accuracy: 60%
Multiple potential biases – gender, emotion,
tattoos, hair
The confusion matrix is from a single cross-validation fold, i.e.
111 of the 1102 images (90:10 train-test split).
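A minimal sketch of how one fold's confusion matrix can be unpacked, assuming y_true and y_pred hold the labels and predictions for the 111 held-out images of that fold (the names are illustrative):

from sklearn.metrics import accuracy_score, confusion_matrix

# y_true, y_pred: labels and predictions for one held-out fold (111 of 1102 images),
# with 0 = non-criminal and 1 = criminal
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TN =", tn, " FP =", fp, " FN =", fn, " TP =", tp)
print("Fold accuracy:", accuracy_score(y_true, y_pred))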
13. Evaluation
Results: Scenario 2 – Criminal Men vs.
Non-criminal Women
(Mixed race and emotion)
Criminal images (Men): 240
Non-criminal images (Women): 252
No. of Principal Components: 300
Explained Variance: 97.8%
Stratified 10-Fold Cross Validation Accuracy: 59.2%
Accuracy in Scenario 2 is similar to Scenario 1, even though
Scenario 2 has only 45% of the image count of
Scenario 1 (492 vs. 1102 images).
14. Evaluation
Results: Scenario 3 – Criminal Men vs.
Non-criminal Men
(Mixed race and emotion)
Criminal images (Men): 240
Non-criminal images (Men): 224
No. of Principal Components: 300
Explained Variance: 98%
Stratified 10-Fold Cross Validation Accuracy: 51.3%
Scenarios 2 and 3 were trained and validated on similar
image set sizes.
The stratified 10-fold cross-validated accuracy is about
8 percentage points higher for Scenario 2, which has
datasets of opposing genders – perhaps gender aided
classification.
15. Evaluation
Results: Scenario 4 – Criminal Women
vs. Non-criminal Men
(Mixed race and emotion)
Criminal images (Women): 77
Non-criminal images (Men): 78
No. of Principal Components: 120
Explained Variance: 98%
Stratified 10-Fold Cross Validation Accuracy: 59%
Given that there were only 77 images of criminal
women, the classifier may be limited in its predictive
power.
16. Conclusions/Future Work
▪ Many biases can exist within images. This research attempted to show that both gender and
emotion biases could affect the performance of a classifier.
▪ The analysis showed a strong emotion imbalance across the criminal and non-criminal datasets, as well
as performance differences when gender bias was introduced.
▪ In the case of labelling people by categories such as criminality, sexual orientation or IQ, there are
serious considerations to be addressed if machine learning algorithms are to be utilised and
trusted as accurate.
▪ Future Work:
▪ Larger training dataset.
▪ With more images, the classifier could be run on emotion-balanced sets.
▪ Use of VGG Face, a DNN pretrained on 2.6 million images, to extract facial features (see the sketch below).
▪ Investigation of Kernel PCA and Convolutional Neural Networks.
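As a sketch of the VGG Face idea, assuming the third-party keras_vggface package; the model variant, pooling choice and input handling are assumptions rather than a chosen design:

import numpy as np
from keras_vggface.vggface import VGGFace
from keras_vggface.utils import preprocess_input

# Pretrained VGG Face network as a fixed feature extractor (classifier head removed)
extractor = VGGFace(model="vgg16", include_top=False,
                    input_shape=(224, 224, 3), pooling="avg")

def face_descriptor(face_rgb):
    """Map an aligned 224x224 RGB face crop to a 512-d VGG Face descriptor."""
    x = preprocess_input(face_rgb.astype(np.float64)[np.newaxis], version=1)
    return extractor.predict(x)[0]

These descriptors could then replace the PCA features as input to the classifier.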