Big Data &
Machine Learning
Angelo MARIANO
@angelinux74
angelo.mariano@gmail.com
Summary
What I am trying to present...
● Everything is data
● Automation is everywhere
● Learning how to learn
Big data is like teenage sex:
everyone talks about it,
nobody really knows how to do it,
everyone thinks everyone else is doing it,
so everyone claims they are doing it...
Dan Ariely, Duke University
What is Big Data?
In very few words, we talk of big data when
we let a machine (or a cluster of machines)
analyse huge amounts of data
Data explosion
Data Sources
Who collects them?
● GPS
● Internet
● Smartphones
● Wearable devices
● Sensors
By 2020, there will be 150 billion recording devices, 20x the Earth's population
“Today, we are so dependent on oil, and oil is so
embedded in our daily doings, that we hardly
stop to comprehend its pervasive significance. It
is oil that makes possible where we live, how we
live, how we commute to work, how we travel –
even where we conduct our courtships. It is the
lifeblood of suburban communities.”
Daniel Yergin (1991) The Prize
Big Data Axes
2013
2016
MapReduce & HDFS
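The MapReduce model behind Hadoop can be sketched in plain Python: a map step emits key-value pairs, a shuffle groups them by key, and a reduce step aggregates each group. Below is the canonical word-count example from Google's paper, as a single-process illustration only; a real job distributes these phases across a cluster, with HDFS storing the data underneath.

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the document
    for word in document.split():
        yield word.lower(), 1

def shuffle(pairs):
    # Shuffle: group all emitted values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts collected for each word
    return {word: sum(counts) for word, counts in groups.items()}

documents = ["big data is big", "data never sleeps"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
# counts == {'big': 2, 'data': 2, 'is': 1, 'never': 1, 'sleeps': 1}
```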
Automation
How to process big data?
● Everything will become intelligent; soon we will
not only have smart phones, but also smart
homes, smart factories and smart cities.
● There is an automation of data analysis. Artificial
intelligence is no longer programmed line by line,
but is now capable of learning, thereby
continuously developing itself.
● Algorithms can now recognize handwritten
language and patterns almost as well as humans
and even complete some tasks better than them.
● Today, 70% of all financial transactions are
performed by algorithms.
● Today, it is easier and cheaper to generate data
than ever before, and the tools to turn these
data into insights are growing exponentially in
both quality and quantity. So much so that any
organization dealing with data that does not
apply machine learning in some fashion will be
left behind.
AI is the new electricity. Just as 100 years
ago electricity transformed industry after
industry, AI will now do the same.
Andrew Ng, Oct. 2016, Fortune Magazine
Machine learning
● Machine learning enables computers to learn without being explicitly programmed to
perform a task, thanks to the huge amounts of data provided to them as study
material
● Just as humans learn from experience, these algorithms learn from data. A child learns to
recognize a cat after seeing five or six specimens; a computer needs thousands upon
thousands of examples, but it is striking how much it can do once it has been able to
profit from its learning
● Once trained, machines can make predictions. This is the most important skill of the AI
revolution: algorithms suggest books we might like on Amazon and pick what we see in
our Facebook feed, but they can also help a doctor spot, at an early stage, a disease a
patient is developing
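The "store examples, then predict" idea above can be sketched with one of the simplest learners, a 1-nearest-neighbour classifier: "training" is just memorising labelled examples, and prediction reuses the label of the closest stored one. The features and labels below are invented for illustration.

```python
import math

def nearest_neighbour(train, query):
    # "Training" is storing labelled examples; prediction finds the
    # stored example closest to the query and reuses its label.
    features, label = min(train, key=lambda ex: math.dist(ex[0], query))
    return label

# Toy labelled data: (weight in kg, ear length in cm) -> species
train = [((4.0, 7.0), "cat"), ((3.5, 6.5), "cat"),
         ((30.0, 12.0), "dog"), ((25.0, 11.0), "dog")]

print(nearest_neighbour(train, (5.0, 7.5)))   # closest to the cat examples
```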
An example: Feed Forward Neural Net
Frank Rosenblatt (1957), The Perceptron - a perceiving and recognizing automaton
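Rosenblatt's learning rule fits in a few lines: nudge the weights towards the target whenever the unit misclassifies an input. This sketch trains a single perceptron on logical AND, a linearly separable problem where the perceptron convergence theorem guarantees it will find a solution.

```python
def train_perceptron(samples, epochs=10, lr=0.1):
    # Rosenblatt's rule: on each error, move weights and bias
    # in the direction that corrects the output.
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            out = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - out
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

# Logical AND: output 1 only when both inputs are 1
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(data)
# [predict(w, b, x) for x, _ in data] == [0, 0, 0, 1]
```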
Universal Approximation Theorem
George Cybenko (1989), Kurt Hornik (1991)
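Informally, Cybenko's result says that finite sums of sigmoidal units, i.e. one-hidden-layer networks, can approximate any continuous function on the unit cube to any accuracy:

```latex
% Cybenko (1989): for every continuous f on [0,1]^n and every eps > 0,
% there exist N, alpha_i, w_i, b_i such that
\[
F(x) \;=\; \sum_{i=1}^{N} \alpha_i \,\sigma\!\left(w_i^{\top} x + b_i\right),
\qquad
\sup_{x \in [0,1]^n} \bigl| F(x) - f(x) \bigr| \;<\; \varepsilon ,
\]
% where sigma is any continuous sigmoidal function, i.e.
% sigma(t) -> 1 as t -> +infty and sigma(t) -> 0 as t -> -infty.
```

The theorem says nothing about how large N must be or how to find the weights; that is exactly where training, and later deep networks, come in.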
Something new happens...
Deep
Neural
Network
Deep
Visualization
Toolbox
AlphaGo
Deep Learning for a Japanese farmer
So, follow the data. Choose a representation that can use
unsupervised learning on unlabeled data, which is so much
more plentiful than labeled data. Represent all the data
with a nonparametric model rather than trying to summarize
it with a parametric model, because with very large data
sources, the data holds a lot of detail.
Halevy, Norvig, Pereira (2009), The Unreasonable Effectiveness of Data
@angelinux74
angelo.mariano@gmail.com
What we
understand
is only what lies
above the surface...
References
● Dan Ariely on big data
● DevOps Borat on big data
● Data explosion
● Daniel Yergin, The prize
● Data is the new oil
● IBM infographic on big data
● Data never sleeps
● Data never sleeps 4.0
● MapReduce Google paper
● MapReduce Tutorial
● Google Filesystem
● HDFS Architecture
● Big Data & Analytics
● Big Data ecosystem
● Automation and democracy
● Andrew Ng, Fortune
● Frank Rosenblatt, the Perceptron
● Introduction to ANN
● Universal approximation theorem and its
illustration
● Deep neural networks
● Deep learning tutorial
● Multi GPU deep learning from Nvidia
● Deep Visualization Toolbox
● How Google's AlphaGo beat a Go world
champion
● A Japanese farmer and deep learning
● Eugene Wigner, The Unreasonable
Effectiveness of Mathematics in the Natural
Sciences
● The Unreasonable Effectiveness of Data
● Jeremy Howard, The wonderful and terrifying
implications of computers that can learn