Alexander Gammerman - Machine Learning for Big Data

A presentation delivered by Alexander Gammerman at the STFC Futures / RUSI Conference Series: Data for Security and Resilience 2014.


  • 1. Machine Learning for Big Data. Alexander Gammerman, Computer Learning Research Centre, Royal Holloway, University of London. Trends in Big Data, STFC/RUSI: Big Data for Security and Resilience, March 7th, 2014.
  • 2. Layout: (1) Debunking the myth; (2) Machine Learning (Data Analytics); (3) Trends in Machine Learning for Big Data; (4) Conclusions.
  • 3. A "fashionable" pursuit: AI, Cybernetics, Neural Networks, Expert Systems, Big Data? Big Data, small data, any data: what we need is Data Analysis, or Data Analytics, or Machine Learning.
  • 4. Machine Learning: what is it? ML is the intersection of Statistics and Computer Science. Statistics deals with inference: obtaining valid conclusions from data under various models and assumptions. Computer Science considers what is computable, develops efficient algorithms, and is concerned with data storage and manipulation. ML takes past data, "learns" from it by finding rules and regularities, and uses them to make predictions for future examples. Efficient algorithms have to be developed to make valid predictions.
  • 5. Computer Learning Research Centre (CLRC) at Royal Holloway, University of London. Established in 1998 to develop machine learning theory, including the design of efficient algorithms for data analysis. CLRC Fellows include several prominent researchers: Vapnik and Chervonenkis (the two founders of statistical learning theory), Shafer (co-founder of the Dempster-Shafer theory), Rissanen (inventor of the Minimum Description Length principle), and Levin (one of the three founders of the theory of NP-completeness, who also made fundamental contributions to Kolmogorov complexity).
  • 6. Recent years: an explosion of interest in machine-learning methods, in particular statistical learning theory. Statistical learning theory has similar goals to statistical science, but it is nonparametric and concerned with the problem of prediction.
  • 7. Problems and Current Techniques. Classical techniques: small-scale, low-dimensional data. High-dimensional data, however, raise conceptual and computational difficulties. Validity of predictions. Confidence measures. Online prediction. Current techniques for the dimensionality problem: Support Vector Machines (Vapnik, 1995, 1998; Vapnik and Chervonenkis, 1974); kernel methods. New technique for the validity problem: Conformal Predictors.
  • 8. Projects. Compact Descriptors for Automatic Target Identification (with QinetiQ). Statistical profiling of offenders (with the Home Office). Material identification with atmospheric corrections (with Waterfall Solutions). Unmixing spectra (with QinetiQ). Anomaly detection (vehicles) (with Thales). Fault Diagnosis (with Marconi Instruments).
  • 9. Projects (cont'd). Abdominal Pain (with Western General Hospital, Edinburgh). Ovarian Cancer (with Institute for Women's Health, UCL). Depression (with Institute of Psychiatry, King's College). Child Leukemia (with Royal London Hospital). Heart Diseases (with Institute for Women's Health, UCL). Analysis of microarrays (with Veterinary Laboratories Agency, DEFRA). Protein-Protein Interaction (EU project).
  • 10. How much data do we need to answer our questions? Big Data: the three Vs (V^3). Volume: Gigabyte (10^9 bytes); Terabyte (10^12); Petabyte (10^15); Exabyte (10^18); Zettabyte (10^21). Variety: structured, semi-structured, unstructured; text, image, audio, video. Velocity: dynamic, time-varying, etc. Plus: high dimensionality. But: if the answer is a Zettabyte, what is the question? The global data supply reached 2.8 zettabytes (ZB) in 2012, or 2.8 trillion GB, but just 0.5% of this is used for analysis, according to the Digital Universe Study. Volumes of data are projected to reach 40 ZB by 2020, or 5,247 GB per person.
  • 11. We don't need big data per se: we need to have a problem first, and then decide how much data we need to solve it. If a child wants to learn the concept of a car, he or she doesn't need to see a million or a billion cars; 10 or 100 are enough. If we want to predict digits, we can learn on the first 100 or 1000 digits and then identify the next one confidently and with high accuracy.
  • 12. Figure: USPS data.
  • 13. Figure: Conformal Predictors on USPS data. Online cumulative multiple predictions at different confidence levels ("Hedging Predictions in Machine Learning", A. Gammerman and V. Vovk, The Computer Journal (2007) 50 (2): 151-163).
  • 14. In fact, there is a well-known concept in machine learning. In the past, people thought that the larger the training set, the more accurate the results. But the founders of statistical learning theory, V. Vapnik and A. Chervonenkis, showed that it is not just the size of the training set that matters: another characteristic, called "capacity", is more important.
  • 15. Trends in Machine Learning for Big Data. How do we make machine learning algorithms scale to large datasets? There are two main approaches: (1) developing parallelizable ML algorithms and integrating them with large parallel systems, and (2) developing more efficient algorithms. Data growth is driving the need for parallel and online algorithms and models that can handle this "Big Data". We need to explore the computational foundations of performing these analyses on parallel and cloud architectures.
  • 16. Large-scale modeling techniques and algorithms include transductive and inductive models, online compression models (an extension of conformal predictors), graphical models, deep learning and semi-supervised learning algorithms, clustering algorithms, and parallel learning algorithms. The computational techniques provide a basic foundation in large-scale programming, ranging from the basic "parfor" to parallel abstractions such as MapReduce (Hadoop) and GraphLab.
  • 17. Transduction. Figure: Induction and Transduction [V. Vapnik, 1995]. Diagram labels: data, particular (past examples); data, particular (future examples); general knowledge. Arrows: induction (learning), deduction, transduction.
  • 18. Why use conformal predictions? Why, after 100 years of research in statistics, do we need yet another method of prediction? It is simple and rigorous. Given any of a wide range of learning or statistical prediction methods, conformal prediction can be used as a wrapper to provide a measure of confidence. It is valid under weak assumptions. It limits the fraction of prediction mistakes from the start. (Crudely, a predictor can either make a prediction or else say "don't know", possibly in a graded way, such as giving a wide prediction interval.) It works in practice.
  • 19. Conclusions. "It took Deep Thought 7.5 million years to answer the ultimate question. As nobody knew what the ultimate question to Life, The Universe and Everything actually was, nobody knows what to make of the answer (42)". Nowadays, as John Poppelaars noticed, many people think that Big Data will help to find the ultimate question. But I already know that it is not Big Data, and the answer is not 42, but Machine Learning.
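
The volume figures quoted on slide 10 can be checked with a few lines of arithmetic. The sketch below uses decimal (power-of-ten) units as on the slide; the variable names are illustrative, and the "implied population" is only what the two quoted 2020 figures jointly imply, not a number taken from the slides.

```python
# Decimal byte units, as listed on slide 10.
GB = 10**9
EB = 10**18
ZB = 10**21

# 2.8 ZB (2012) expressed in gigabytes; integer arithmetic avoids rounding.
supply_2012_bytes = 28 * ZB // 10
supply_2012_gb = supply_2012_bytes // GB

# 0.5% of the 2012 supply is said to be analysed.
analysed_bytes = supply_2012_bytes * 5 // 1000
analysed_eb = analysed_bytes // EB

# 40 ZB projected for 2020 at 5,247 GB per person implies this many people.
implied_population = 40 * ZB / (5247 * GB)

print(f"2012 supply: {supply_2012_gb:,} GB")
print(f"analysed: {analysed_eb} EB")
print(f"implied 2020 population: {implied_population:.3e}")
```

So "2.8 trillion GB" is consistent with 2.8 ZB, and the analysed fraction corresponds to 14 exabytes.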
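
The "capacity" mentioned on slide 14 is usually formalised as the VC dimension. As an illustration only (the symbols below are not in the slides), one commonly quoted form of the Vapnik-Chervonenkis bound for binary classification says that, with probability at least $1-\eta$ over a training set of size $l$, a classifier chosen from a class of VC dimension $h$ satisfies:

```latex
% R      : expected (true) error rate of the chosen classifier
% R_emp  : its error rate on the l training examples
% h      : VC dimension ("capacity") of the class of classifiers
% eta    : allowed failure probability of the bound
R \;\le\; R_{\mathrm{emp}}
  + \sqrt{\frac{h\left(\ln\frac{2l}{h} + 1\right) - \ln\frac{\eta}{4}}{l}}
```

The second term depends on the ratio of $h$ to $l$, so a larger training set tightens the bound only relative to the capacity of the model class, which is the point made on the slide.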
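
The first approach on slide 15 (parallelizable algorithms) and the MapReduce abstraction named on slide 16 can be sketched on a deliberately small problem: fitting a straight line by least squares. This is a minimal sketch, not Hadoop or GraphLab code; the function names and the toy data are hypothetical. Each shard is mapped to its sufficient statistics, and the partial results are reduced by addition. Python threads are used only to show the structure and are not expected to give a real speed-up here.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_shard(shard):
    """Map step: summarise one shard of (x, y) pairs by its sufficient statistics."""
    n = len(shard)
    sx = sum(x for x, _ in shard)
    sy = sum(y for _, y in shard)
    sxx = sum(x * x for x, _ in shard)
    sxy = sum(x * y for x, y in shard)
    return (n, sx, sy, sxx, sxy)


def combine(a, b):
    """Reduce step: sufficient statistics of two shards simply add up."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(shards, workers=4):
    """Fit y = slope * x + intercept from independently summarised shards."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_shard, shards))  # a "parfor" over shards
    n, sx, sy, sxx, sxy = reduce(combine, partials, (0, 0.0, 0.0, 0.0, 0.0))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Hypothetical toy data lying exactly on y = 2x + 1, split into shards of 3.
data = [(float(x), 2.0 * x + 1.0) for x in range(10)]
shards = [data[i:i + 3] for i in range(0, len(data), 3)]
slope, intercept = fit_line(shards)
print(slope, intercept)
```

The design point is that no shard ever needs to see another shard's raw data: only five numbers per shard are communicated, so the same map and reduce functions would work unchanged if the shards lived on different machines.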
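
The "wrapper" idea on slide 18 can be made concrete with a small transductive conformal classifier. This is a minimal sketch under stated assumptions, not the code behind the USPS figure: the nonconformity measure (a 1-nearest-neighbour distance ratio, a common choice), the function names and the toy data are all illustrative. For each candidate label, the new example is added to the training bag, every example is scored for "strangeness", and the p-value is the fraction of examples at least as strange as the new one. Labels whose p-value exceeds the significance level epsilon form the prediction set.

```python
import math


def nonconformity(i, xs, ys):
    """1-NN score: nearest same-label distance over nearest other-label distance."""
    same = [math.dist(xs[i], xs[j]) for j in range(len(xs))
            if j != i and ys[j] == ys[i]]
    other = [math.dist(xs[i], xs[j]) for j in range(len(xs))
             if ys[j] != ys[i]]
    if not same:
        return math.inf   # no example shares this label: maximally strange
    if not other:
        return 0.0        # only one label present: nothing contradicts it
    if min(other) == 0:
        return math.inf   # identical point carries a different label
    return min(same) / min(other)


def p_values(train_x, train_y, new_x, labels):
    """p-value of each candidate label for new_x."""
    result = {}
    for label in labels:
        xs = train_x + [new_x]
        ys = train_y + [label]
        scores = [nonconformity(i, xs, ys) for i in range(len(xs))]
        new_score = scores[-1]
        result[label] = sum(s >= new_score for s in scores) / len(scores)
    return result


def prediction_set(p, epsilon):
    """Labels that cannot be rejected at significance level epsilon."""
    return {label for label, value in p.items() if value > epsilon}


# Hypothetical toy data: two well-separated clusters in the plane.
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.0, 0.3),
           (3.0, 3.0), (3.1, 2.9), (2.9, 3.2), (3.2, 3.1)]
train_y = ["a"] * 4 + ["b"] * 4

p = p_values(train_x, train_y, (0.1, 0.1), ["a", "b"])
print(p)
print(prediction_set(p, 0.2))   # less demanding confidence: a single label
print(prediction_set(p, 0.05))  # higher confidence: a larger, hedged set
```

With only nine examples in the bag, the smallest attainable p-value is 1/9, so asking for 95% confidence forces the predictor to hedge by returning both labels, while 80% confidence gives a single label. Under the usual exchangeability assumption, the long-run fraction of prediction sets that miss the true label is at most epsilon.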