How to Approach Data Science Problems from Start to End, by Polong Lin (林伯龍), at the Taiwan Data Science Conference (台灣資料科學年會)
Polong Lin is a Data Scientist at IBM. He is a regular speaker on data science and develops content for free data education on bigdatauniversity.com using open data tools on datascientistworkbench.com. Polong earned his M.Sc. at the Univ. of Tsukuba.
Machine Learning in the Age of Big Data: New Approaches and Business Applicat..., by Armando Vieira
Presentation at University of Lisbon on Machine Learning and big data.
Deep learning algorithms and applications to credit risk analysis, churn detection and recommendation algorithms
This document provides an introduction to data science, noting that 90% of the world's data was generated in the last two years. It discusses the fields of computer science, business, statistics, and data science. It describes two types of data scientists: statisticians who specialize in analysis and developers who specialize in building tools. It also lists some popular programming languages and visualization tools used in data science like Python, R, and Tableau. Finally, it provides some tips for those interested in data science such as learning design, public speaking, coding, and finding value.
This document provides an introduction to machine learning. It discusses that machine learning focuses on learning about processes in the world rather than just memorizing data. It also covers the main types of machine learning: supervised learning which learns mappings between examples and labels; unsupervised learning which learns structure from unlabeled examples; and reinforcement learning which learns to take actions to maximize rewards. The document explains that machine learning requires representing data as feature vectors and using models with optimization techniques to find parameters that generalize to new data rather than overfitting the training data.
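The supervised-learning loop that summary describes — feature vectors, a model, an optimization step, and a check that parameters generalize rather than overfit — can be sketched with a toy perceptron. All data below is invented purely for illustration:

```python
# Toy supervised learning: learn a linear separator from labeled feature
# vectors, then check that it generalizes to held-out examples.

def train_perceptron(examples, labels, epochs=20, lr=0.1):
    """Find weights w such that sign(w . x) matches the labels."""
    w = [0.0] * len(examples[0])
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
            if pred != y:  # optimization step: nudge w toward the mistake
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1

# Feature vectors: [bias, feature1, feature2]; label +1 iff feature1 > feature2.
train_x = [[1, 2, 0], [1, 3, 1], [1, 0, 2], [1, 1, 3]]
train_y = [1, 1, -1, -1]
w = train_perceptron(train_x, train_y)

# Generalization: evaluate on points the model never saw during training.
test_x = [[1, 5, 1], [1, 1, 5]]
print([predict(w, x) for x in test_x])  # → [1, -1]
```

The point of the held-out test set is exactly the generalization concern the summary raises: a model that merely memorized the four training vectors would tell us nothing about these two new ones.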
This document provides an overview of machine learning tools and languages. It discusses Python, R, and MATLAB as the most commonly used tools. For each tool, it lists advantages and disadvantages. Python is highlighted as the number one language for machine learning due to its many libraries and large user community. R is best for time series analysis and causal inference. MATLAB is still a leading tool for signal processing but lacks machine learning libraries. The document also provides resources for learning machine learning foundations and examples.
Image Recognition using Machine Learning (Makine Öğrenmesi ile Görüntü Tanıma), by Ali Alkan
The document provides an introduction to image processing and recognition using machine learning. It discusses how deep learning uses hierarchical neural networks inspired by the human brain to learn representations of image data without requiring manual feature engineering. Deep learning has been applied successfully to problems like computer vision through convolutional neural networks. The document also describes how KNIME can be used as an open-source platform to visually build and run deep learning models for image processing tasks and integrate with other tools. It highlights several image processing and deep learning nodes available in KNIME.
2017: The Many Faces of Artificial Intelligence: From AI to Big Data - A Hist..., by Leandro de Castro
(1) Artificial intelligence has evolved significantly since its origins in the 1930s and 1940s, with pioneering work by Turing, McCulloch and Pitts, and others. (2) The field experienced periods of optimism and funding in the 1960s followed by a "winter" in the 1970s due to lack of progress. (3) New approaches in the 1980s-1990s like neural networks, expert systems, and increased computing power led to a rebirth of the field. (4) Today, areas like machine learning, deep learning, big data and natural language processing are driving advances, powered by technologies from companies pursuing both general and specialized AI.
BSidesLV 2013 - Using Machine Learning to Support Information Security, by Alex Pinto
Big Data, Data Science, Machine Learning and Analytics are a few of the new buzzwords that have invaded our industry of late. Again we are being sold a unicorn-laden, silver-bullet panacea by heavy-handed marketing folks, evoking an expected pushback from the most enlightened members of our community. However, as was the case before, there might just be enough technical meat in there to help out with our security challenges and the overwhelming odds we face every day. And if so, what do we as a community have to know about these technologies in order to be better professionals? Can we really use the data we have been collecting to help automate our security decision-making? Is a robot going to steal my job?
If you are interested in what is behind this marketing buzz and are not scared of a little math, this talk offers some insights into applying Machine Learning techniques to data any of us have easy access to, and tries to bring home the point that if all of this technology can be used to show us “better” ads on social media and track our behavior online (and a bit more than that), it can also be used to defend our networks.
This document discusses whether big data analysis is more of a "systems" task or "human" task. It presents research showing that software defect prediction, even when conducted by top experts using the same datasets and algorithms over many years, shows little improvement and high variability. This suggests that human factors like biases are important. The document proposes using data mining on source code and social media to classify developers by expertise and identify groups who could share knowledge to reduce defects. It outlines an initial approach using parsers, classifiers like Naive Bayes to distinguish novices from experts, and seeking larger datasets from partners. The goal is to strengthen the "human" aspects of big data analysis.
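The Naive Bayes step proposed above — distinguishing novices from experts from mined text — can be sketched with a small multinomial Naive Bayes over token counts. The tokens and labels here are invented for illustration and are not taken from the study's data:

```python
# Sketch of a novice-vs-expert text classifier: multinomial Naive Bayes
# with Laplace smoothing over per-class token counts.
import math
from collections import Counter

def train_nb(docs, labels):
    """Return per-class (log-prior, smoothed log-likelihood table)."""
    classes = set(labels)
    vocab = {t for d in docs for t in d}
    model = {}
    for c in classes:
        class_docs = [d for d, l in zip(docs, labels) if l == c]
        counts = Counter(t for d in class_docs for t in d)
        total = sum(counts.values())
        model[c] = (
            math.log(len(class_docs) / len(docs)),
            {t: math.log((counts[t] + 1) / (total + len(vocab))) for t in vocab},
        )
    return model

def classify(model, doc):
    def score(c):
        prior, lik = model[c]
        return prior + sum(lik.get(t, 0.0) for t in doc)
    return max(model, key=score)

# Hypothetical tokenized messages mined from developers.
docs = [["fix", "typo"], ["refactor", "api"], ["fix", "bug", "tests"],
        ["help", "stuck"], ["how", "compile"], ["help", "error"]]
labels = ["expert", "expert", "expert", "novice", "novice", "novice"]
model = train_nb(docs, labels)
print(classify(model, ["fix", "tests"]))     # leans "expert"
print(classify(model, ["help", "compile"]))  # leans "novice"
```

The same shape scales to real source-code and social-media features; only the tokenization and the labeled training set change.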
IT Cluster Skolkovo Presentation at FRUCT.org Conference, by Albert Yefimov
Skolkovo Foundation aims to foster innovation in Russia by funding over 900 startups through grants, venture funds, and partnerships with global corporations and universities; some successful startups include Datadvance, Rock Flow Dynamics, and Synesis which have received grants and grown their staff, revenue, intellectual property, and partnerships. The foundation focuses on key areas like IT, energy efficiency, life sciences, and nuclear technologies to drive economic and technological development in Russia.
Three Laws of Trusted Data Sharing: Building a Better Business Case for Dat..., by CS, NcState
The document discusses three laws of trusted data sharing based on research in software engineering quality prediction. The first law is to only share the essential "corners" of the data rather than all data. The second law is to anonymize the data in the corners before sharing. The third law is never to mutate the data across important "decision boundaries". The research found that building models from a small percentage of shared and privatized data in this way produced better results than using all the original raw data. The author plans to apply these laws of data sharing to other domains like smart cities and healthcare to investigate the costs and benefits of data sharing.
The Art and Science of Analyzing Software Data, by CS, NcState
This document summarizes an ICSE'14 tutorial on analyzing software data. The tutorial covers various topics:
- Organizational issues, like talking to users to understand goals, knowing the software domain to avoid misinterpretations, questioning the data, and seeing data science as cyclic.
- Qualitative methods like discovering information needs through surveys and interviews.
- Quantitative methods like data reduction techniques and privacy-preserving sharing.
- Open issues like data instabilities, model comparisons, and ensemble techniques.
The document emphasizes understanding the user's perspective and software domain knowledge to properly analyze data and avoid incorrect conclusions. Case studies show how missing this domain knowledge led analyses down wrong paths.
Agile Research in Information Systems Field: Analysis from Knowledge Transfor..., by Ilia Bider
Presentation at the 8th IADIS International Conference on Information systems. Pre-proceedings available at: http://bit.ly/1QPEZS5
Due to the relative success of agile methods in software development, the idea of having agile processes has begun to be tested in other areas, for example, agile business process development. This trend has already reached the research community, and some materials have appeared that suggest using agility in research projects. Analysis of these suggestions, however, shows that they do not go beyond finding a superficial analogy between the concepts of software development and research projects. The paper presents a deeper analysis of the concept of agile research in Information Systems (IS) based on examining research projects from the knowledge transformation perspective. As a basis for analysis, the SECI model of Nonaka is used. Based on this analysis, several suggestions are made on how to conduct agile research in IS, e.g. prioritize relevance over rigor, test early for a practical purpose, use one's own experience and reflections, etc. It is also shown that some research types, like action research and design science, are more suitable for conducting agile research than others. The paper also analyzes the risks of non-agile research and presents an example in which they are revealed.
Presentation at Data ScienceTech Institute campuses, Paris and Nice, May 2016, including Intro, Data Science History and Terms; 10 Real-World Data Science Lessons; Data Science Now: Polls & Trends; Data Science Roles; Data Science Job Trends; and Data Science Future
Visually Exploring Patent Collections for Events and Patterns, by Xiaoyu Wang
My talk on Patent Visualization at the 3rd IEEE Workshop on Interactive Visual Text Analytics. The primary focus is to introduce the Scalable Visual Analytics research that my team is working on. The workshop paper can be found at: http://vialab.science.uoit.ca/textvis2013/papers/Ankam-TextVis2013.pdf
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...), by Alex Pinto
The document discusses machine learning-based security monitoring. It begins with an introduction of the speaker, Alex Pinto, and an agenda that will include a discussion of anomaly detection versus classification techniques. It then covers some history of anomaly detection research dating back to the 1980s. It also discusses challenges with anomaly detection, such as the curse of dimensionality with high-dimensional data and lack of ground truth labels. The document emphasizes communicating these machine learning concepts clearly.
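The curse of dimensionality mentioned above has a quick empirical illustration: as dimensions grow, the nearest and farthest neighbors of a query point become nearly equidistant, which undermines distance-based anomaly scores. A minimal sketch with random data:

```python
# Relative contrast (max dist - min dist) / min dist collapses as the
# number of dimensions grows, even though the data stays uniformly random.
import numpy as np

rng = np.random.default_rng(0)
contrasts = {}
for dim in (2, 10, 100, 1000):
    points = rng.random((500, dim))   # uniform cloud in the unit hypercube
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    contrasts[dim] = (dists.max() - dists.min()) / dists.min()
    print(f"dim={dim:5d}  relative contrast={contrasts[dim]:.2f}")
```

In low dimensions the nearest point is far closer than the farthest; in high dimensions the ratio shrinks toward zero, so "unusually far away" stops being a meaningful anomaly signal without dimensionality reduction or better features.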
GALE: Geometric Active Learning for Search-Based Software Engineering, by CS, NcState
Multi-objective evolutionary algorithms (MOEAs) help software engineers find novel solutions to complex problems. When automatic tools explore too many options, they are slow to use and hard to comprehend. GALE is a near-linear time MOEA that builds a piecewise approximation to the surface of best solutions along the Pareto frontier. For each piece, GALE mutates solutions towards the better end. In numerous case studies, GALE finds comparable solutions to standard methods (NSGA-II, SPEA2) using far fewer evaluations (e.g. 20 evaluations, not 1,000). GALE is recommended when a model is expensive to evaluate, or when some audience needs to browse and understand how an MOEA has made its conclusions.
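The Pareto frontier that GALE approximates can be made concrete with a plain dominance filter: a candidate survives if no other candidate is at least as good on every objective and strictly better on at least one (minimizing both objectives). This is a didactic sketch of the concept, not GALE itself, and the candidate values are hypothetical:

```python
# Extract the Pareto frontier of (cost, defects) pairs, both minimized.
def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_frontier(candidates):
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other != c)]

# Hypothetical (cost, defects) pairs for candidate project configurations.
candidates = [(1, 9), (2, 7), (3, 8), (4, 4), (5, 5), (6, 2), (9, 1)]
print(pareto_frontier(candidates))  # → [(1, 9), (2, 7), (4, 4), (6, 2), (9, 1)]
```

This brute-force filter is quadratic in the number of candidates; GALE's contribution is reaching a good approximation of this frontier with far fewer model evaluations.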
This document provides an introduction to machine learning. It discusses how machine learning gives computers the ability to learn without being explicitly programmed. It also discusses how machine learning is used widely by major companies and has become integral to many businesses. Finally, it covers different machine learning techniques including supervised learning methods like classification, regression, and artificial neural networks as well as unsupervised learning methods like clustering.
A Pragmatic Perspective on Software Visualization, by Arie van Deursen
Slides of the keynote presentation at the 5th International IEEE/ACM Symposium on Software Visualization, SoftVis 2010. Salt Lake City, USA, October 2010.
1. Knowledge discovery in production requires automation due to the growth of information, devices, and knowledge workers.
2. A core dataflow model engine is needed to preprocess data and compose networked intelligence solutions for emerging applications.
3. Product solutions include hybrid SaaS factory subscriptions and applications via an open marketplace to deliver business value such as increased productivity and test time reduction for electronics manufacturing customers.
DeepLearning4J and Spark: Successes and Challenges - François Garillot, by sparktc
Deeplearning4J is an open-source, distributed deep learning library written for Java and Scala. It provides tools for training neural networks on distributed systems. While large companies can distribute training across many servers, Deeplearning4J allows other organizations to do distributed training as well. It includes libraries for vectorization, linear algebra, data preprocessing, model definition and training. The library aims to make deep learning more accessible to enterprises by allowing them to train models on their own large datasets.
DeepLearning4J and Spark: Successes and Challenges - François Garillot, by sparktc
At the recent sold-out Spark & Machine Learning Meetup in Brussels, François Garillot of Skymind delivered a lightning talk called DeepLearning4J and Spark: Successes and Challenges.
Specifically, François offered a tour of the DeepLearning4J architecture intermingled with applications. He went over the main blocks of this deep learning solution for the JVM, which includes GPU acceleration, a custom n-dimensional array library, a parallelized Swiss-army data-loading tool, and deep learning and reinforcement learning libraries — all with an easy-access interface.
Along the way, he pointed out the strategic points of parallelization of computation across machines and gave insight on where Spark helps — and where it doesn't.
This document provides an introduction to deep learning, including definitions of artificial intelligence, machine learning, and deep learning. It discusses examples of inputs and outputs in deep learning systems, potential applications, common Python libraries like Keras, and conclusions. The key takeaways are that deep learning uses neural networks to learn patterns at different levels of abstraction, it involves training models on data and using the models to make inferences on new data, and libraries like Keras and TensorFlow are commonly used.
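The train-then-infer workflow that summary describes can be shown at its smallest scale: fit a single sigmoid neuron by gradient descent, then run inference with the learned parameters. This is a didactic sketch of the underlying idea, not Keras or TensorFlow code:

```python
# Smallest possible "training then inference": one sigmoid neuron
# learning the OR function by gradient descent on cross-entropy loss.
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Training data: (x1, x2) -> label for logical OR.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w1 = w2 = b = 0.0
lr = 0.5
for _ in range(2000):  # training: adjust parameters to reduce error
    for (x1, x2), y in data:
        p = sigmoid(w1 * x1 + w2 * x2 + b)
        grad = p - y  # d(cross-entropy)/d(pre-activation) for a sigmoid unit
        w1 -= lr * grad * x1
        w2 -= lr * grad * x2
        b -= lr * grad

# Inference: apply the trained model to inputs.
preds = [round(sigmoid(w1 * x1 + w2 * x2 + b)) for (x1, x2), _ in data]
print(preds)  # → [0, 1, 1, 1]
```

Deep learning stacks many such units into layers so that successive layers learn patterns at different levels of abstraction, exactly the hierarchy the summary mentions; libraries like Keras automate the gradient step shown here by hand.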
Algorithm Marketplace and the New "Algorithm Economy", by Diego Oppenheimer
Diego Oppenheimer discusses the rise of algorithm marketplaces and the new "algorithm economy". Key points include:
- Advances in machine learning, computer vision, speech recognition and natural language processing are enabling algorithms to interpret unstructured data at scale.
- Algorithm marketplaces allow algorithms to be hosted, discovered, monetized and composed modularly to address a wide range of use cases across many industries.
- The algorithm economy will lower barriers to applying machine intelligence and foster innovation as algorithms become reusable assets that creators and users can both benefit from.
Using Algorithmia to leverage AI and Machine Learning APIsRakuten Group, Inc.
We are entering a new era of software development. Companies are realizing that AI and machine learning are critical to success in business, both to save cost on repetitive tasks, and to enable to new features and products that would be impossible without machine intelligence. Algorithmia makes these tools available through web APIs that makes tools like computer vision and natural language processing available to companies everywhere. Kenny will talk about how sharing of intelligent APIs can improve your applications.
https://rakutentechnologyconference2016.sched.org/event/8aS5/using-algorithmia-to-leverage-ai-and-machine-learning-apis
Rakuten Technology Conference 2016
http://tech.rakuten.co.jp/
This document provides an introduction to deep learning with Microsoft's Cognitive Toolkit (CNTK). It discusses key deep learning concepts and how they are implemented in CNTK, including neural networks, backpropagation, loss functions, and common network architectures like convolutional neural networks. It also outlines several of Microsoft's products that use deep learning like Cortana, Bing, and Skype Translator. Examples of training deep learning models with CNTK on datasets like MNIST using logistic regression, multi-layer perceptrons, and CNNs are also presented.
Vertex has invested in companies across geographies addressing different industry applications leveraging AI to transform their service offerings. Read more on the trends and waves of AI developments observed.
The document summarizes the evolution of artificial intelligence (AI) from the 1950s to the present. It discusses three waves of AI development: handcrafted knowledge in the early period, statistical learning from the 1960s to 1980s, and contextual adaptation from the 1990s onward. Recent advances are driven by increased computing power, data availability, and new algorithms. Deep learning is increasingly important and applications include voice control, natural language processing, and computer vision. While AI has great potential, a lack of talent and data is creating a bifurcated ecosystem with large tech firms at the top.
Infusing Social Data Analytics into Future Internet applications for Manufact...Michael Petychakis
This document discusses using social data analytics to enhance future internet applications for manufacturing. It describes developing a cloud-based solution called FITMAN-Analyzer that collects unstructured data from social networks and websites. FITMAN-Analyzer then performs natural language processing, sentiment analysis, and trend analysis to extract useful knowledge for manufacturers. The solution is designed to be domain-independent, require no coding skills, and provide real-time streaming and visualization of results.
More information, visit: http://www.godatadriven.com/accelerator.html
Data scientists aren’t a nice-to-have anymore, they are a must-have. Businesses of all sizes are scooping up this new breed of engineering professional. But how do you find the right one for your business?
The Data Science Accelerator Program is a one year program, delivered in Amsterdam by world-class industry practitioners. It provides your aspiring data scientists with intensive on- and off-site instruction, access to an extensive network of speakers and mentors and coaching.
The Data Science Accelerator Program helps you assess and radically develop the skills of your data science staff or recruits.
Our goal is to deliver you excellent data scientists that help you become a data driven enterprise.
The right tools
We teach your organisation the proven data science tools.
The right hands
We are trusted by many industry leading partners.
The right experience
We've done big data and data science at many clients, we know what the real world is like.
The right experts
We have a world class selection of lecturers that you will be working with.
Vincent D. Warmerdam
Jonathan Samoocha
Ivo Everts
Rogier van der Geer
Ron van Weverwijk
Giovanni Lanzani
The right curriculum
We meet twice a month. Once for a lecture, once for a hackathon.
Lectures
The RStudio stack.
The art of simulation.
The iPython stack.
Linear modelling.
Operations research.
Nonlinear modelling.
Clustering & ensemble methods.
Natural language processing.
Time series.
Visualisation.
Scaling to big data.
Advanced topics.
Hackathons
Scrape and mine the internet.
Solving multiarmed bandit problems.
Webdev with flask and pandas as a backend.
Build an automation script for linear models.
Build a heuristic tsp solver.
Code review your automation for nonlinear models.
Build a method that outperforms random forests.
Build a markov chain to generate song lyrics.
Predict an optimal portfolio for the stock market.
Create an interactive d3 app with backend.
Start up a spark cluster with large s3 data.
You pick!
Interested?
Ping us here. signal@godatadriven.com
High time to add machine learning to your information security stackMinhaz A V
Machine learning and deep learning techniques are increasingly being used for cybersecurity applications like malware detection, spam filtering, and anomaly detection. As attacks become more sophisticated, machine learning can help security teams focus on important threats by analyzing large amounts of data. While machine learning is a powerful tool, security experts still need to provide guidance on what problems to solve and how to structure machine learning pipelines and evaluate results. Individuals and organizations should embrace machine learning by participating in online courses and challenges to gain hands-on experience applying these techniques.
This document provides a summary of image classification using deep learning. It begins with an introduction to the speaker and their background. It then discusses key concepts in image classification like image types (e.g. raster, vector), feature extraction using convolutional and pooling layers, classification using dense layers and activation functions, and model training. It provides examples of datasets like cats vs dogs and how to balance classes. Finally, it discusses model saving, transformers, and provides homework on modifying the image classification code.
This document discusses an introduction to deep learning and provides information about frameworks, courses, and recent advances. It includes the following:
1. An overview of deep learning techniques and frameworks like TensorFlow, MXNet, and Torch. Popular online courses from Stanford, Oxford, and Coursera are also listed.
2. Questions from attendees about potential deep learning applications in areas like social networks, NLP, robotics, fraud detection, medicine, and more.
3. Suggestions for potential deep learning solutions like QA systems, chatbots, fraud detection, image recognition, and topic extraction from text. Attendees are encouraged to apply deep learning to solve problems in their respective domains.
This presentation will discuss leveraging analytics and machine learning techniques like deep learning, long short term memory networks, and gradient boosted machines for security applications like threat assessment. The presenter will compare current machine learning technologies and discuss best practices for applying predictive modeling to security problems, including data acquisition, feature selection, and model validation. The talk is part of a security roundtable event and will be followed by a lab exercise on developing predictive models.
Big data and artificial intelligence have developed through an iterative process where increased data leads to improved infrastructure which then enables the collection of even more data. This virtuous cycle began with the rise of the internet and web data in the 1990s. Modern frameworks like Hadoop and algorithms like MapReduce established the infrastructure needed to analyze large, distributed datasets and fuel machine learning applications. Deep learning techniques are now widely used for tasks involving images, text, video and other complex data types, with many companies seeking to gain advantages by leveraging proprietary datasets.
Building High Available and Scalable Machine Learning ApplicationsYalçın Yenigün
This document discusses building high available and scalable machine learning products. It begins with an introduction to data-driven products and machine learning concepts like supervised and unsupervised learning. It then discusses six key challenges in building machine learning products at iyzico: 1) models need testing on real data before production, 2) response times must be under 0.1 seconds, 3) data is dynamic, 4) high availability and fail fast is required, 5) continuous delivery of machine learning models, and 6) simulating aggregated features from batch data. It provides examples of techniques used at iyzico to address these challenges like Spark for predictions, schemaless databases, circuit breakers, devops for machine learning, and Redis for
DeepLearning is not just a hype - it outperforms state-of-the-art ML algorithms. One by one. In this talk we will show how DeepLearning can be used for detecting anomalies on IoT sensor data streams at high speed using DeepLearning4J on top of different BigData engines like ApacheSpark and ApacheFlink. Key in this talk is the absence of any large training corpus since we are using unsupervised machine learning - a domain current DL research threats step-motherly. As we can see in this demo LSTM networks can learn very complex system behavior - in this case data coming from a physical model simulating bearing vibration data. Once draw back of DeepLearning is that normally a very large labaled training data set is required. This is particularly interesting since we can show how unsupervised machine learning can be used in conjunction with DeepLearning - no labeled data set is necessary. We are able to detect anomalies and predict braking bearings with 10 fold confidence. All examples and all code will be made publicly available and open sources. Only open source components are used.
This document provides an overview of getting started with data science using Python. It discusses what data science is, why it is in high demand, and the typical skills and backgrounds of data scientists. It then covers popular Python libraries for data science like NumPy, Pandas, Scikit-Learn, TensorFlow, and Keras. Common data science steps are outlined including data gathering, preparation, exploration, model building, validation, and deployment. Example applications and case studies are discussed along with resources for learning including podcasts, websites, communities, books, and TV shows.
2. Machine learning & AI: why now?
• Because: Big Data
• We have (a lot) more (digital) data
• from the Web
• from sensors (including cameras, mobile devices)
• from transactional business systems
• We have faster computers
• parallel ‘cluster’ computing is mainstream
• CPUs, GPUs, FPGAs, ASICs
• We have lots of useful open source software
• for data management, data pipelining & analytics
• driven by social media giants
Advanced Technologies for Industry 4.0
3. Big Data 2017: per year…
(bar chart: NASDAQ, the US Census, the US Library of Congress, the NOAA archive and YouTube, with per-year volumes ranging from 1PB to 15PB)
5. Big Data 2017: per year…
CERN archive: 73PB · searches on Google: 98PB · uploads to Facebook: 180PB
6. Big Data 2017: per year…
CERN archive: 73PB · searches on Google: 98PB · uploads to Facebook: 180PB
…2025, per year:
Square Kilometre Array Telescope, Phase 1: 300PB · High Luminosity Large Hadron Collider: 1,000PB · Square Kilometre Array Telescope, Phase 2: 1,000PB
7. Different kinds of “big”
• Big Data are typically measured in three ways
• volume – from gigabytes to terabytes to petabytes
• velocity – data streams at you or changes rapidly
• variety – no longer are data in nice, neat tables
• some folk add others
• veracity, verifiability, validity, value…
• Big Data come in many flavours
• very large transaction databases
• very large social graphs
• very large image collections
• very large numbers of sensor feeds
• etc.
8. Data [ science | engineering | management ]
• ~20% Data science: analytics · statistics · machine learning
• ~40% Data engineering: data movement · data pipelines · data tech deployment (“data dev ops”) · database design · data preparation & cleaning
• ~40% Data management: data storage · data formats · metadata management · data preservation & backup · data preparation & cleaning
9. Machine learning
“Machine learning is the science of getting computers
to act without being explicitly programmed.”
– Andrew Ng, Stanford University
• Two main kinds of machine learning
• unsupervised learning finds patterns in data without being told exactly what to look for
• e.g. for clustering, fitting
• supervised learning uses labelled training data to build a model, which is then used to make predictions
• e.g. for classification
10. Unsupervised learning in action: k-means clustering
(animation: the cluster means move with each iteration)
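The clustering loop behind the animation can be sketched in a few lines of Python. This is an illustrative toy, not the slide's actual demo: the data points, k = 2 and the first-k initialisation are all invented here.

```python
import math

def kmeans(points, k, iters=20):
    """Lloyd's algorithm: assign points to the nearest mean, then move the means."""
    means = points[:k]  # naive init: first k points (real code would randomise)
    for _ in range(iters):
        # Assignment step: each point joins the cluster with the closest mean.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: math.dist(p, means[j]))
            clusters[i].append(p)
        # Update step: each mean moves to the centroid of its cluster.
        means = [
            tuple(sum(c) / len(pts) for c in zip(*pts)) if pts else m
            for pts, m in zip(clusters, means)
        ]
    return means, clusters

# Two obvious groups of 2-D points; k-means should find their centres.
data = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
means, clusters = kmeans(data, k=2)
```

With real data you would randomise the initial means and rerun a few times, since k-means only finds a local optimum.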
11. Unsupervised learning: limitations of k-means
• Clusters are assumed to be the same size
• Clusters defined by density are not handled well
12. Unsupervised learning as art
(figure: clustering results for minPts = 5 with ε = 0.7, ε = 0.8 and ε = 0.9)
• Plenty of other unsupervised learning algorithms
• distribution-based clustering
• density-based clustering… etc.
• More complex ones have more free parameters
• tweaking is as much art as science
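The minPts/ε captions above are the knobs of density-based clustering in the DBSCAN style (the slide does not name the algorithm; DBSCAN is assumed here as the standard example). A pure-Python sketch shows what the two parameters do: ε sets how close neighbours must be, and minPts sets how many neighbours make a point "core".

```python
import math

def dbscan(points, eps, min_pts):
    """Density-based clustering: label each point with a cluster id, -1 = noise."""
    n = len(points)
    # Precompute each point's eps-neighbourhood (including the point itself).
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1              # too sparse: provisionally noise
            continue
        cluster += 1                    # i is a core point: grow a new cluster
        labels[i] = cluster
        seeds = list(neighbours[i])
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster     # noise reclaimed as a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                seeds.extend(neighbours[j])   # j is also core: keep expanding
    return labels

# Two tight blobs plus one outlier; with eps = 0.5 the outlier stays noise.
pts = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
       (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1),
       (10, 0)]
labels = dbscan(pts, eps=0.5, min_pts=4)
```

Shrinking ε below the blob spacing turns everything into noise; growing it past the gap merges the blobs, which is exactly the sensitivity the three panels illustrate.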
13. Supervised learning: classifying irises
(scatter plot of iris measurements; three classes: setosa, versicolor, virginica; unlabelled points marked “?”)
Versicolor iris image courtesy of David Berger under a CC-BY licence
• Crunch data on flower size and shape to identify its type (class label)
• label = F (petal, sepal)
14. Supervised learning: step 1 – training
• Need labelled (i.e. already classified) data
• want to train a model to recognise the classes from the data (i.e. find F())
• class label is the dependent variable
• rest of the data are independent variables, or predictors
• Split your big data set into training & test sets
• 70/30 or 60/40 or so
• Feed training data into model-learning software
• e.g. neural net, decision tree…
• Result: a classifier model F: label = F (petal, sepal)

Training data (example):
petal sepal label
1.5 5.2 setosa
1.2 4.6 setosa
4.1 6.0 versicolor
5.2 6.0 virginica
6.0 7.2 virginica
… … …

(diagram: training data → modelling software → classifier)
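Step 1 can be sketched end-to-end in Python. The numbers extend the slide's toy table with a few invented rows, and a 1-nearest-neighbour rule stands in for the unspecified modelling software; the 70/30 split is as suggested above.

```python
import math

# (petal, sepal) measurements with class labels: the slide's toy table plus a
# few invented rows so every class appears on both sides of the split.
data = [
    ((1.5, 5.2), "setosa"), ((4.1, 6.0), "versicolor"), ((5.2, 6.0), "virginica"),
    ((1.2, 4.6), "setosa"), ((4.5, 6.2), "versicolor"),
    ((6.0, 7.2), "virginica"), ((1.4, 5.1), "setosa"), ((5.3, 6.5), "virginica"),
]

# Split into training and test sets, roughly 70/30. (A real pipeline would
# shuffle first; these rows are already interleaved by class.)
split = int(0.7 * len(data))
train, test = data[:split], data[split:]

def fit(train_rows):
    """'Learn' F from labelled data: here, a 1-nearest-neighbour classifier."""
    def F(petal, sepal):
        nearest = min(train_rows, key=lambda row: math.dist(row[0], (petal, sepal)))
        return nearest[1]
    return F

F = fit(train)   # label = F(petal, sepal)
```

The held-back test rows never touch the fit step, which is what makes the evaluation on the next slide honest.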
15. Supervised learning: step 2 – evaluation
• Feed test data into classifier model F
• Count hits, misses vs your known labels
• true positives, false positives…
• Good enough? Good to go!
• Not good enough? Go back, tweak your modelling software, try again

Test data (example):
petal sepal label
1.4 5.1 setosa
5.3 6.5 virginica
4.5 6.2 virginica
… … …

Classifier output:
petal sepal label model says…
1.4 5.1 setosa setosa
5.3 6.5 virginica virginica
4.5 6.2 virginica versicolor
… … … …
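Step 2 is just bookkeeping: compare the model's column against the known labels. A sketch using the three test rows shown above:

```python
# (petal, sepal, true label, model's prediction) from the slide's test table.
results = [
    (1.4, 5.1, "setosa", "setosa"),
    (5.3, 6.5, "virginica", "virginica"),
    (4.5, 6.2, "virginica", "versicolor"),
]

# Overall accuracy: fraction of test rows where the model agrees with the label.
hits = sum(1 for *_, truth, pred in results if truth == pred)
accuracy = hits / len(results)

# Per-class true/false positive counts, the raw numbers behind precision.
counts = {}
for *_, truth, pred in results:
    tally = counts.setdefault(pred, {"tp": 0, "fp": 0})
    tally["tp" if truth == pred else "fp"] += 1

print(f"accuracy = {accuracy:.2f}")   # 2 of 3 test rows correct
print(counts)
```

The third row is a false positive for versicolor (and a miss for virginica), which is exactly the kind of count that sends you back to tweak the modelling software.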
16. Advanced supervised learning: deep learning
• Deep learning: “learn multiple levels of representations that correspond to different levels of abstraction” – Wikipedia
• An old-fashioned neural net is 1 layer deep
• Deep learning neural nets are… deeper!
• multi-layer NNs, deep NNs, recurrent NNs, convolutional NNs
• e.g. deep learning for image recognition
• look at flat pixel data… (1 layer)
• …and edge detection in the image data… (another layer)
• …and different scales of the image data… (another layer)
• all in the same modelling framework
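“Levels of representation” is function composition: each layer transforms the previous layer's output, and training adjusts the weights at every level at once. A toy forward pass through two fully connected layers, with weights invented purely for illustration:

```python
def relu(v):
    """Standard non-linearity: negative activations are clipped to zero."""
    return [max(0.0, x) for x in v]

def dense(weights, biases, inputs):
    """One fully connected layer: output_i = sum_j w[i][j] * x[j] + b[i]."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

# Layer 1 might learn low-level features (edges); layer 2 combines them.
w1, b1 = [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]
w2, b2 = [[1.0, 1.0]], [-0.5]

x = [2.0, 1.0]
h = relu(dense(w1, b1, x))   # hidden representation (first level)
y = dense(w2, b2, h)         # final output (second level)
```

A “deep” net simply stacks more of these layers, so each level can build abstractions out of the level below.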
18. Deep learning: spotting solar panels
• Accuracy: 99.60%!
• Careful! A classifier that always says “background” is 98.75% accurate
• precision is a better measure!
• Precision: 84.54%
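The accuracy trap above is easy to reproduce. Assuming 10,000 pixels of which 1.25% truly belong to solar panels (numbers chosen to match the slide's 98.75% baseline), a classifier that always answers "background" scores high accuracy but zero precision:

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + fp + tn + fn)

def precision(tp, fp):
    """Of the pixels we flagged as 'solar panel', how many really are?"""
    return tp / (tp + fp) if (tp + fp) else 0.0

# 10,000 pixels, 125 of them (1.25%) truly solar panel.
# The lazy classifier predicts 'background' everywhere:
tp, fp, tn, fn = 0, 0, 9875, 125
lazy_accuracy = accuracy(tp, fp, tn, fn)    # looks impressive
lazy_precision = precision(tp, fp)          # it never finds a panel
```

With heavily imbalanced classes, always report precision (and recall) alongside accuracy.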
19. Advanced supervised learning: reinforcement learning
• Reinforcement learning allows software “agents” to “explore”
• don’t need labelled data
• just set up an environment & go
• An agent:
• takes actions in an environment
• which is interpreted into a reward…
• and a representation of the state…
• which are fed back into the agent
• Good example is DeepMind’s AlphaGo Zero
• two versions of the agent play Go against each other
• learn winning strategies by beating the other guy
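The action/reward/state loop can be made concrete with tabular Q-learning on a deliberately tiny invented environment: two states, two actions, one rewarding transition. It shares only the feedback structure with something like AlphaGo Zero, none of the scale.

```python
import random

# Toy environment: action 1 in state 0 leads to state 1; action 1 in state 1
# earns the only reward and resets to state 0. Action 0 goes nowhere.
def step(state, action):
    if state == 0:
        return (1, 0.0) if action == 1 else (0, 0.0)
    return (0, 1.0) if action == 1 else (1, 0.0)

random.seed(1)
q = [[0.0, 0.0], [0.0, 0.0]]      # q[state][action]: estimated long-run value
alpha, gamma, eps = 0.5, 0.9, 0.2  # learning rate, discount, exploration rate

state = 0
for _ in range(2000):
    # epsilon-greedy: mostly exploit the best known action, sometimes explore.
    if random.random() < eps:
        action = random.randrange(2)
    else:
        action = max((0, 1), key=lambda a: q[state][a])
    next_state, reward = step(state, action)
    # Q-learning update: nudge q towards reward + discounted best future value.
    q[state][action] += alpha * (reward + gamma * max(q[next_state]) - q[state][action])
    state = next_state

# The greedy policy the agent has learned from reward feedback alone.
policy = [max((0, 1), key=lambda a: q[s][a]) for s in (0, 1)]
```

No labels were ever supplied; the agent discovers that action 1 is best in both states purely from the reward signal fed back through the loop.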
20. Machine learning and artificial intelligence
• Today’s ML is principally pattern recognition
• IF data.looksLike(pedestrian) THEN report(‘Pedestrian’);
• This can be a powerful tool for decision support
• Think of AI as taking the next step, to decision making:
• IF data.looksLike(pedestrian) THEN brakes.On(now);
• Generally, we want to use empirical data to take the next-best-action
• whether a human is in, on or out of the loop
21. The future of AI
• State-of-the-art in AI-driven robotics:
• a team at Nanyang Technological University, Singapore got two industrial robots to assemble (most of) an IKEA STEFAN chair in c. 20 mins (The Economist, April 2018)
• Current research topics are transfer learning…
• can a machine learn the rules of Go (yes) then figure out how to apply them to the game of Chess (not yet)
• …and curiosity-based learning
• continuing the reinforcement-learning trend
• Hardware is becoming specialised
• GPUs (graphics processing units) and more
• Excellent source: https://www.stateof.ai/
• Nathan Benaich, Ian Hogarth (UK AI VCs), June 2018
22. Be problem-driven, not data-driven
• Big Data / AI / ML is not a silver bullet
• Don’t start with the tech – start with the problem
• Don’t look at “your” data and ask what can I do with them?
• Look at your business and ask, what can I do better?
• improve operational efficiency (data management)
• understand my customers better (data science/ML)
• measure or monitor things with sensors (data engineering)
• simulate things digitally (data engineering/management)
• automate processes/decisions (ML/AI)
Editor's Notes
We have a lot of data, but we need techniques, tools and machines to understand and interpret the data and make use of it. This is where machine learning and AI come into play: the data, powerful machines and open-source software are all available.
Although we call this ‘unsupervised’, we have actually told the computer to divide the dataset into three groups; we could have said 2 or 4. This is the ‘k’ value. The ‘means’ part signifies that a data point is assigned to the cluster that has the closest mean (average) value. The algorithm tries to place the points so that the sum of the distances from each point to the mean of its cluster is the minimum for the whole set. Run the animation and watch the crosses (the mean of each cluster) move as the algorithm progresses towards a better solution.
This method does not work for all datasets. These are standard datasets that are used to show that the method can break down.
Setting the ringed parameter to differing values produces different results. These diagrams show unsupervised learning where the number of clusters is not given. Instead there are parameters defining the minimum number of points and how far they are allowed to be from the centre of a cluster while still being counted as part of it. Varying the distance parameter changes the results significantly. Which value is correct? There is no correct answer to that!
Iris is a type of flower with three categories. The picture shows versicolor.
Sepal is the part of the flower that supports the petals (usually green); it is not shown that well in this diagram.
How would you classify the question marks?
First one is quite easy
Second one is quite easy
Third one is trickier
---
Plot comes from:
library(lattice)  # xyplot and trellis.par.get come from the lattice package
my.plot <- xyplot(Sepal.Length ~ Petal.Length, data = iris, groups = Species,
  panel = panel.superpose,
  col.line = trellis.par.get("strip.background")$col,
  col.symbol = trellis.par.get("strip.shingle")$col,
  key = list(title = "Iris Data", x = .15, y = .85, corner = c(0, 1),
    border = TRUE,
    points = list(col = trellis.par.get("strip.shingle")$col[1:3],
      pch = trellis.par.get("superpose.symbol")$pch[1:3],
      cex = trellis.par.get("superpose.symbol")$cex[1:3]),
    text = list(levels(iris$Species))))
print(my.plot)  # lattice plots must be printed to render
The algorithm (set of steps) used to train a model.
It’s quite an important point here that it’s not a good idea to use your training data as test data. This is why you hold some back (as on the previous slide). It’s possible to end up with a very misleading accuracy by ‘overtraining’ the algorithm so that it performs with very high accuracy on the training set, but is no good for data it has not ‘seen’ before.
Neural networks usually have several layers. Deep learning comes from “deep neural networks”, i.e. a deep learning model contains a lot of layers.
For further information, see:
https://www.mathworks.com/discovery/deep-learning.html
https://devblogs.nvidia.com/deep-learning-nutshell-core-concepts/
CNNs are commonly used for image processing. A CNN consists of an input and an output layer, as well as multiple hidden layers. The hidden layers of a CNN typically consist of convolutional layers, pooling layers, fully connected layers and normalization layers. Convolutional layers apply a convolution operation to the input, passing the result to the next layer.
See: https://en.wikipedia.org/wiki/Convolutional_neural_network#Convolutional
i.e. PPV: 84.54% of the predicted solar panel pixels are solar panel pixels
i.e. TPR: 83.78% of the pixels that belong to a solar panel are correctly predicted as solar panel pixels
To learn more, see the Deepmind page:
https://deepmind.com/blog/alphago-zero-learning-scratch/