Alexander Gammerman - Machine Learning for Big Data
Computer Learning Research Centre
Royal Holloway, University of London
Trends in Big Data
STFC/RUSI: Big Data for Security and Resilience
March 7th, 2014
1 Debunking the myth
2 Machine Learning (Data Analytics)
3 Trends in Machine Learning for Big Data
AI, Cybernetics, Neural Networks, Expert Systems,
Big Data, small data, any data – what we need is Data Analysis or
Data Analytics or Machine Learning
Machine Learning: what is it?
ML is the intersection of Statistics and Computer Science.
Statistics deals with inferences to obtain valid conclusions from data under
various models and assumptions.
Computer Science considers what is computable, develops efficient
algorithms and is concerned with data storage and manipulation.
ML takes past data, "learns", and tries to find rules and regularities in
the data in order to make predictions for future examples. Efficient
algorithms have to be developed to make valid predictions.
Computer Learning Research Centre (CLRC) at Royal
Holloway, University of London
Established in 1998 to develop machine learning theory, including the design of
efficient algorithms for data analysis.
CLRC Fellows include several prominent researchers, such as Vapnik and
Chervonenkis (the two founders of statistical learning theory), Shafer
(co-founder of the Dempster-Shafer theory), Rissanen (inventor of the
Minimum Description Length principle), Levin (one of the 3 founders of
the theory of NP-completeness, made fundamental contributions to
Recent years: explosion of interest in machine-learning methods, in
particular statistical learning theory. Statistical learning theory: similar
goals to statistical science, but
it is nonparametric and
concerned with the problem of prediction.
Problems and Current Techniques
Classical techniques: small scale, low-dimensional data. But conceptual
and computational difficulties for high-dimensional data. Validity of
predictions. Confidence measures. Online prediction.
Current techniques for dimensionality problem: Support Vector Machine
(Vapnik, 1995, 1998; Vapnik and Chervonenkis, 1974); Kernel Methods.
New technique for validity problem: Conformal Predictors.
Projects
Compact Descriptors for Automatic Target Identification (with
Statistical profiling of offenders (with the Home Office).
Material identification with atmospheric corrections (with Waterfall
Unmixing spectra (with Qinetiq).
Anomaly detection (vehicles) (with Thales).
Fault Diagnosis (with Marconi Instruments).
Projects – cont’d
Abdominal Pain (with Western General Hospital, Edinburgh).
Ovarian Cancer (with Institute for Women’s Health, UCL).
Depression (with Institute of Psychiatry, King’s College).
Child Leukemia (with Royal London Hospital).
Heart Diseases (with Institute for Women’s Health, UCL).
Analysis of microarrays (with Veterinary Laboratory Agency –
Protein-Protein Interaction (EU project)
How much data do we need to answer our questions?
Big Data: V^3
Volume: Gigabyte (10^9); Terabyte (10^12); Petabyte (10^15); Exabyte
(10^18); Zettabyte (10^21).
Variety: structured, semi-structured, unstructured; text, image, audio,
Velocity: dynamic; time-varying, etc.
But if the answer is a zettabyte, what is the question?
The global data supply reached 2.8 zettabytes (ZB) in 2012 - or 2.8
trillion GB - but just 0.5% of this is used for analysis, according to the
Digital Universe Study. Volumes of data are projected to reach 40 ZB by
2020, or 5,247 GB per person.
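The figures quoted above can be cross-checked with a few lines of arithmetic. This is only a sketch: the variable names are illustrative, decimal (SI) units are assumed, and the implied world population is derived here rather than stated in the quoted study.

```python
# Decimal (SI) unit sizes in bytes, as listed under "Volume".
GB = 10**9
ZB = 10**21

# 2012: 2.8 ZB expressed in gigabytes.
supply_2012_zb = 2.8
supply_2012_gb = supply_2012_zb * ZB / GB

# Only 0.5% of that supply is said to be used for analysis.
analysed_zb = supply_2012_zb * 0.005

# 2020 projection: 40 ZB in total, 5,247 GB per person.
projected_2020_zb = 40
per_person_gb = 5247
implied_population = projected_2020_zb * ZB / (per_person_gb * GB)

print(f"2012 supply: {supply_2012_gb:.3g} GB")
print(f"analysed in 2012: {analysed_zb:.3g} ZB")
print(f"population implied by the 2020 figures: {implied_population:.3g}")
```

The two 2020 figures are consistent with each other only if the study assumed a world population of a little over 7.6 billion.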
We don’t need big data per se - we need to have a problem first and
then decide how much data we need to solve it.
If a child wants to learn the concept of a car, he/she doesn’t need to see 1
million or 1 billion cars to learn the concept - 10 or 100 are enough.
If we want to predict digits, we can learn on the first 100 or 1000 digits
and then identify the next one confidently and with high accuracy.
Figure: Conformal Predictors on USPS data: online cumulative multiple
predictions at different confidence levels ("Hedging predictions in Machine
Learning" by A. Gammerman and V. Vovk, The Computer Journal (2007) 50 (2):
In fact, there is a well-known concept in machine learning. In the past
people thought that the larger the training set, the more
accurate the results that can be obtained. But the founders of statistical learning
theory, V. Vapnik and A. Chervonenkis, showed that it is not just the size
of the training set that matters: another characteristic, called
"capacity", is more important.
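To make the role of capacity concrete, here is a minimal sketch of the capacity term in one commonly quoted form of Vapnik's generalization bound (not given on the slide): with probability at least 1 - eta, true risk <= empirical risk + sqrt((h (ln(2n/h) + 1) - ln(eta/4)) / n), where n is the training set size and h is the VC dimension (capacity). The function name and the default eta are assumptions made for illustration.

```python
import math

def vc_confidence_term(n, h, eta=0.05):
    """Capacity term sqrt((h * (ln(2n/h) + 1) - ln(eta/4)) / n).

    n   -- number of training examples
    h   -- VC dimension (capacity) of the class of functions, 0 < h <= n
    eta -- the bound holds with probability at least 1 - eta
    """
    if not 0 < h <= n:
        raise ValueError("need 0 < h <= n")
    if not 0 < eta < 1:
        raise ValueError("need 0 < eta < 1")
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)

# Same training set size, different capacity: the term grows with h.
for h in (10, 100, 1000):
    print(h, round(vc_confidence_term(10_000, h), 3))
```

The term depends on n and h together, roughly through the ratio h/n, so adding data to a high-capacity model can help less than reducing capacity at a fixed sample size.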
Trends in Machine Learning for Big Data
How do we make machine learning algorithms scale to large datasets?
There are two main approaches: (1) developing parallelizable ML
algorithms and integrating them with large parallel systems and (2)
developing more efficient algorithms.
Data growth is driving the need for parallel and online algorithms and
models that can handle this "Big Data".
Need to explore the computational foundations associated with performing
these analyses in the context of parallel and cloud architectures.
Large-scale modeling techniques and algorithms include
transductive and inductive models,
online compression models (extension of conformal predictors),
deep learning and semi-supervised learning algorithms,
parallel learning algorithms.
The computational techniques provide a basic foundation in large-scale
programming, ranging from the basic "parfor" to parallel abstractions,
such as MapReduce (Hadoop) and GraphLab.
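As an illustration of the MapReduce pattern mentioned above, the sketch below fits a straight line y = a*x + b by least squares: each chunk of data is mapped to a small tuple of sums, and the tuples are reduced by addition. It uses only the Python standard library, not Hadoop or GraphLab; all function names are hypothetical, and the thread pool merely stands in for a cluster (CPython threads do not speed up CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def map_chunk(chunk):
    """Map step: sufficient statistics (n, sum x, sum y, sum x^2, sum x*y)."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return (n, sx, sy, sxx, sxy)

def combine(a, b):
    """Reduce step: statistics from two chunks simply add up."""
    return tuple(u + v for u, v in zip(a, b))

def fit_line(data, n_chunks=4):
    """Least-squares slope and intercept computed from per-chunk statistics."""
    size = max(1, (len(data) + n_chunks - 1) // n_chunks)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(map_chunk, chunks))
    n, sx, sy, sxx, sxy = reduce(combine, partials)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

# Synthetic, noise-free data on the line y = 2x + 1.
data = [(float(x), 2.0 * x + 1.0) for x in range(1000)]
slope, intercept = fit_line(data)
print(slope, intercept)
```

The design point is that the map output has a fixed size regardless of chunk length, so the reduce step stays cheap however large the dataset grows.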
Figure: Induction and Transduction [V. Vapnik, 1995]
Why use conformal predictions?
Why, after 100 years of research in statistics, do we need yet another
method of prediction?
It is simple and rigorous.
Given any of a wide range of learning/statistical prediction methods,
conformal prediction can be used as a wrapper to provide a measure
It is valid under weak assumptions.
It limits the fraction of prediction mistakes from the start. (Crudely, a
predictor can either make a prediction, or else say "don’t know", possibly
in a graded way, such as giving a wide prediction interval.)
It works in practice.
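The wrapper idea can be sketched in a few lines. The example below is not the online conformal predictor from the talk but a simpler split (inductive) variant: a nearest-class-mean rule is the underlying method, its distance to the class mean is the nonconformity score, and a held-out calibration set turns scores into p-values. The synthetic data, function names and the choice epsilon = 0.1 are all assumptions made for illustration.

```python
import random

def class_means(train):
    """Underlying method: mean feature value of each class."""
    sums, counts = {}, {}
    for x, y in train:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def score(x, y, means):
    """Nonconformity score: distance from x to the mean of class y."""
    return abs(x - means[y])

def p_value(x, y, means, cal_scores):
    """Fraction of calibration scores at least as large as the candidate's."""
    s = score(x, y, means)
    return (sum(1 for c in cal_scores if c >= s) + 1) / (len(cal_scores) + 1)

def predict_set(x, means, cal_scores, epsilon):
    """All labels that cannot be rejected at significance level epsilon."""
    return {y for y in means if p_value(x, y, means, cal_scores) > epsilon}

rng = random.Random(0)

def sample(n):
    """Synthetic two-class data: class 0 around 0.0, class 1 around 3.0."""
    data = []
    for _ in range(n):
        y = rng.choice([0, 1])
        data.append((rng.gauss(3.0 * y, 1.0), y))
    return data

train, cal, test = sample(200), sample(200), sample(1000)
means = class_means(train)
cal_scores = [score(x, y, means) for x, y in cal]

epsilon = 0.1  # target error rate, i.e. 90% confidence
errors = sum(1 for x, y in test if y not in predict_set(x, means, cal_scores, epsilon))
error_rate = errors / len(test)
print(f"error rate at epsilon={epsilon}: {error_rate:.3f}")
```

A prediction set containing both labels plays the role of "don't know", and an empty set flags an example unlike anything seen before. Provided the data are exchangeable, the error rate should stay close to or below epsilon whichever underlying method is plugged in.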
"It took Deep Thought 7.5 million years to answer the ultimate question.
As nobody knew what the ultimate question to Life, The Universe and
Everything actually was, nobody knows what to make of the answer (42)".
Nowadays, as John Poppelaars noted, many people think that Big
Data will help to find the ultimate question.
But I already know that it is not Big Data, and the answer is not 42, but
Machine Learning.