My research project investigated the potential of applying machine learning and artificial intelligence techniques to distinguish criminal tendencies in people. I know that pursuing this kind of classification is highly controversial, but machine learning and deep learning are increasingly being applied to data of all types. While my project inherently ran classifiers on image datasets of criminal and non-criminal individuals, its main focus was to investigate the presence and effects of biases in the image sets. A recent paper by Wu and Zhang (2016) claimed very high performance in discriminating between a criminal and a non-criminal dataset, and it received widespread criticism. I therefore endeavoured to investigate, using my own assembled datasets, whether performance decreased with the removal of biases such as emotion imbalance across sets and background colouring and texture. The work mainly involved web scraping, image
analysis using facial recognition functionality, emotion detection, and a deep learning neural net classifier.
2. Clockwise from top left:
Michael Murray: Serial killer, lives in Dublin 4
Miranda Barbour: Teen serial killer, 22 murders
Mesut Özil: Football player
Jessie Sleator: Teen girl from Dublin
Just as a human would find it hard to spot
a criminal, a machine learning algorithm
faces the same challenge.
3. Motivation for research
▪ Initially, the motivation came from work such as that of Wu & Zhang (2016) who
claimed to have high accuracy in classifying criminality from facial images.
▪ There were strong reactions to their work with accusations of biases within their
dataset.
▪ The algorithm may not pick up on underlying physical structures associated with
criminality, but rather may discriminate based on context-specific cues from the
situations in which the photographs were taken.
▪ Machine learning algorithms are only as good as their training data.
▪ Bias example: criminal mug-shots may be more likely to show negative emotion.
4. Related Work
▪ Automated Inference on criminality (2016)
- Wu, X. and Zhang, X. (2016). Automated inference on criminality using face images.
▪ A.I. Gaydar (2017)
- Wang, Y. and Kosinski, M. (2017). Deep neural networks are more accurate than
humans at detecting sexual orientation from facial images.
▪ Instagram photos predict depression (2016)
- Reece, A. G. and Danforth, C. M. (2016). Instagram photos reveal predictive markers of
depression.
5. Research Question/Objectives
▪ The aim of this paper is to investigate the presence and effects of biases in
training datasets, focusing on pattern identification over facial-recognition
features as applied to criminal classification.
▪ There are many types of bias, e.g. emotion, gender, race, features such as
tattoos and hairstyles, and image background.
▪ E.g. given the context, criminal mug-shots tend to exhibit negative emotional
states such as fear, contempt and anger.
▪ Many datasets are open datasets that do not require informed consent for usage;
in comparison, some datasets are prepared by researchers who endeavour to
create unbiased image sets.
▪ This makes awareness of biases even more important.
7. PCA
1102 images of criminals and non-criminals.
40,000 features (200 x 200 pixels).
PCA is applied to reduce dimensions while
maintaining explained variance.
A graph of no. of components vs. explained
variance was used to optimise the component count (see the sketch below).
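A minimal sketch of that optimisation step, assuming the 1102 aligned 200 x 200 grayscale images have already been flattened into a matrix X of shape (1102, 40000); the 99% threshold matches the explained variance reported later, but the variable names are illustrative:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# X: (1102, 40000) matrix of flattened 200x200 grayscale face images (assumed precomputed)
pca = PCA(n_components=750)            # upper bound on the components to inspect
pca.fit(X)

# Cumulative explained variance as a function of component count
cum_var = np.cumsum(pca.explained_variance_ratio_)
plt.plot(range(1, len(cum_var) + 1), cum_var)
plt.xlabel("No. of principal components")
plt.ylabel("Cumulative explained variance")
plt.show()

# Smallest component count retaining ~99% of the variance
n_components = int(np.searchsorted(cum_var, 0.99)) + 1
print(n_components, "components explain 99% of the variance")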
8. Implementation – Criminal Classifier
Main steps involved in the criminal classifier model design (a code sketch follows the list):
1. Read in images, convert to grayscale, align eyes and crop using OpenCV functionality.
2. Applied PCA to reduce dimensions from 40,000 to 300-750.
3. Used a supervised learning algorithm (Keras Sequential NN) for training and validating the
model, with stratified K-fold cross-validation.
4. Neural net optimisation using various architectures and hyper-parameter tuning
(epochs, batch size, Dropout).
5. Obtained performance metrics, i.e. accuracy, confusion matrix and learning curves
(Python, scikit-learn).
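A minimal end-to-end sketch of these steps, assuming paths and labels (0 = non-criminal, 1 = criminal) are already assembled; a Haar-cascade face crop stands in for the eye-alignment step, and the layer sizes, epochs and batch size are illustrative rather than the tuned values:

import cv2
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import StratifiedKFold
from tensorflow.keras.layers import Dense, Dropout, Input
from tensorflow.keras.models import Sequential

# Step 1: read, grayscale, crop the detected face (eye alignment omitted for brevity)
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def preprocess(path, size=200):
    """Load an image, convert to grayscale, crop the first detected face, resize."""
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    face = cv2.resize(gray[y:y + h, x:x + w], (size, size))
    return face.flatten() / 255.0          # 40,000-dim feature vector

# Assemble the feature matrix, skipping images where no face was found
X, y = [], []
for path, label in zip(paths, labels):     # labels: 0 = non-criminal, 1 = criminal
    vec = preprocess(path)
    if vec is not None:
        X.append(vec)
        y.append(label)
X, y = np.array(X), np.array(y)

# Step 2: PCA down to a few hundred components
X_pca = PCA(n_components=300).fit_transform(X)

# Steps 3-5: Keras Sequential NN, stratified 10-fold cross-validation, metrics
def build_model(input_dim):
    model = Sequential([
        Input(shape=(input_dim,)),
        Dense(128, activation="relu"),
        Dropout(0.5),                      # Dropout was one of the tuned hyper-parameters
        Dense(1, activation="sigmoid"),    # binary criminal / non-criminal output
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=10, shuffle=True).split(X_pca, y):
    model = build_model(X_pca.shape[1])
    model.fit(X_pca[train_idx], y[train_idx], epochs=50, batch_size=32, verbose=0)
    preds = (model.predict(X_pca[test_idx]) > 0.5).astype(int).ravel()
    scores.append(accuracy_score(y[test_idx], preds))
    print(confusion_matrix(y[test_idx], preds))   # per-fold confusion matrix

print("Mean stratified 10-fold CV accuracy:", np.mean(scores))

Note that fitting PCA on the full set before splitting mirrors the listed step order; fitting it inside each fold instead would avoid any leakage between the train and test folds.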
10. Emotion Classification of Image Sets
[Charts: emotion profile of the criminal set vs. emotion profile of the non-criminal set]
The emotion profiling above shows an imbalance across the datasets, which may well be a source of bias
that could over-estimate the classifier's efficacy. The sets of overlapping emotions are small and would
require a larger dataset to incorporate into the classifier.
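A minimal sketch of how the two emotion profiles and their overlap could be compared, assuming per-image emotion labels already produced by the emotion-detection step; the label lists here are illustrative placeholders, not the project's data:

from collections import Counter

def emotion_profile(labels):
    """Fraction of images per detected emotion in one dataset."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {emotion: n / total for emotion, n in counts.items()}

# Illustrative labels only; the real ones come from the emotion-detection step.
criminal_labels = ["anger", "contempt", "fear", "anger", "neutral"]
non_criminal_labels = ["happiness", "neutral", "happiness", "surprise", "neutral"]

crim = emotion_profile(criminal_labels)
non_crim = emotion_profile(non_criminal_labels)

# Emotions present in both sets - candidates for an emotion-balanced subset
overlap = set(crim) & set(non_crim)
print("Criminal profile:", crim)
print("Non-criminal profile:", non_crim)
print("Overlapping emotions:", overlap)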
11. Evaluation Scenarios
A number of biases were investigated:
Scenario 1: Classifier was run on all images (481 non-criminals & 621 criminals).
Scenario 2: Classifier was run on 240 criminal men and 252 non-criminal women.
Scenario 3: To compare with Scenario 2, the classifier was run on 240 criminal men and
224 non-criminal men. Given that Scenario 2 has a gender bias, we might expect
Scenario 3 to perform worse.
Scenario 4: 77 criminal women vs. 78 non-criminal men. This scenario attempted to
investigate whether accuracy improves due to gender bias (the small dataset is a
concern for predictive power).
12. Evaluation
Results: Scenario 1 – All Images
(Mixed gender, race and emotion)
Criminal images: 621
Non-criminal images: 481
No. of Principal Components: 750
Explained Variance: 99.1%
Stratified 10-Fold Cross Validation Accuracy: 60%
Multiple potential biases – gender, emotion,
tattoos, hair
The confusion matrix is from a single cross-validation fold, i.e.
111 of the 1102 images (90:10 train-test split).
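A minimal sketch of how one fold's confusion matrix can be unpacked, assuming y_true and y_pred hold the labels and predictions for the 111 held-out images of that fold (the names are illustrative):

from sklearn.metrics import accuracy_score, confusion_matrix

# y_true, y_pred: labels and predictions for one held-out fold (111 of 1102 images),
# with 0 = non-criminal and 1 = criminal
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TN =", tn, " FP =", fp, " FN =", fn, " TP =", tp)
print("Fold accuracy:", accuracy_score(y_true, y_pred))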
13. Evaluation
Results: Scenario 2 – Criminal Men vs.
Non-criminal Women
(Mixed race and emotion)
Criminal images (Men): 240
Non-criminal images (Women): 252
No. of Principal Components: 300
Explained Variance: 97.8%
Stratified 10-Fold Cross Validation Accuracy: 59.2%
Accuracy in Scenario 2 is similar to Scenario 1, even though
Scenario 2 has only 45% of the image count of
Scenario 1 (492 vs. 1102 images).
14. Evaluation
Results: Scenario 3 – Criminal Men vs.
Non-criminal Men
(Mixed race and emotion)
Criminal images (Men): 240
Non-criminal images (Men): 224
No. of Principal Components: 300
Explained Variance: 98%
Stratified 10-Fold Cross Validation Accuracy: 51.3%
Scenarios 2 and 3 were trained and validated on similar
image set sizes.
The stratified 10-fold cross-validated accuracy is about
8 percentage points higher for Scenario 2, which has
datasets of opposing genders – perhaps gender aided
classification.
15. Evaluation
Results: Scenario 4 – Criminal Women
vs. Non-criminal Men
(Mixed race and emotion)
Criminal images (Women): 77
Non-criminal images (Men): 78
No. of Principal Components: 120
Explained Variance: 98%
Stratified 10-Fold Cross Validation Accuracy: 59%
Given that there were only 77 images of criminal
women, the classifier may be limited in its predictive
power.
16. Conclusions/Future Work
▪ Many biases can exist within images. This research attempted to show that both gender and
emotion biases could affect the performance of a classifier.
▪ The analysis showed a strong emotion imbalance across the criminal and non-criminal datasets, as well
as performance differences when gender bias was introduced.
▪ In the case of labelling people by categories such as criminality, sexual orientation or IQ, there are
serious considerations to be addressed if machine learning algorithms are to be utilised and
trusted as accurate.
▪ Future Work:
▪ Larger training dataset.
▪ With more images, the classifier could be run on emotion-balanced sets.
▪ Use of VGG Face, a DNN pretrained on 2.6 million images, to extract facial features (see the sketch below).
▪ Investigation of Kernel PCA and Convolutional Neural Networks.
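As a sketch of the VGG Face idea, assuming the third-party keras_vggface package; the model variant, pooling choice and input handling are assumptions rather than a chosen design:

import numpy as np
from keras_vggface.vggface import VGGFace
from keras_vggface.utils import preprocess_input

# Pretrained VGG Face network as a fixed feature extractor (classifier head removed)
extractor = VGGFace(model="vgg16", include_top=False,
                    input_shape=(224, 224, 3), pooling="avg")

def face_descriptor(face_rgb):
    """Map an aligned 224x224 RGB face crop to a 512-d VGG Face descriptor."""
    x = preprocess_input(face_rgb.astype(np.float64)[np.newaxis], version=1)
    return extractor.predict(x)[0]

These descriptors could then replace the PCA features as input to the classifier.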