DEMYSTIFYING MACHINE LEARNING & AI
Advanced Technologies for Industry 4.0
Dr Rob Baxter
r.baxter@epcc.ed.ac.uk
Machine learning & AI: why now?
• Because: Big Data
• We have (a lot) more (digital) data
• from the Web
• from sensors (including cameras, mobile devices)
• from transactional business systems
• We have faster computers
• parallel ‘cluster’ computing is mainstream
• CPUs, GPUs, FPGAs, ASICs
• We have lots of useful open source software
• for data management, data pipelining & analytics
• driven by social media giants
Big Data 2017: per year…
[Chart of annual data volumes; a standalone 1PB marker precedes the labelled entries]
• NASDAQ: 3PB
• US Census: 4PB
• US Library of Congress: 5PB
• NOAA archive: 6PB
• YouTube: 15PB
• CERN archive: 73PB
• searches on Google: 98PB
• uploads to Facebook: 180PB
…2025 per year
• Square Kilometre Array Telescope, Phase 1: 300PB
• High Luminosity Large Hadron Collider: 1,000PB
• Square Kilometre Array Telescope, Phase 2: 1,000PB
Different kinds of “big”
• Big Data are typically measured three ways
• volume – from gigabytes to terabytes to petabytes
• velocity – data streams at you or changes rapidly
• variety – no longer are data in nice, neat tables
• some folk add others
• veracity, verifiability, validity, value…
• Big Data come in many flavours
• very large transaction databases
• very large social graphs
• very large image collections
• very large numbers of sensor feeds
• etc.
Data [ science | engineering | management ]
~20%
Data science
• analytics
• statistics
• machine learning
~40%
Data engineering
• data movement
• data pipelines
• data tech deployment (“data dev ops”)
• database design
• data preparation & cleaning
~40%
Data management
• data storage
• data formats
• metadata management
• data preservation & backup
• data preparation & cleaning
Machine learning
“Machine learning is the science of getting computers
to act without being explicitly programmed.”
– Andrew Ng, Stanford University
• Two main kinds of machine learning
• unsupervised learning finds patterns in data without being
told exactly what to look for
• e.g. for clustering, fitting
• supervised learning uses labelled training data to build a
model, which is then used to make predictions
• e.g. for classification
Unsupervised learning in action: k-means clustering
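The k-means iteration can be sketched in a few lines of plain Python. This is a minimal illustration of the standard loop (assign each point to its nearest centroid, then move each centroid to the mean of its points), not code from the talk. The six sample points and the choice to seed the centroids with the first k points are assumptions made for the example.

```python
from math import dist


def kmeans(points, k, max_iter=100):
    """Plain k-means; centroids start at the first k points (an assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


# Hypothetical data: two well-separated blobs of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Both starting centroids sit in the same blob here, so the first pass mislabels one point; the second pass corrects it, which is the iteration the slide animates.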
Unsupervised learning: limitations of k-means
• Clusters are assumed to be the same size
• Clusters defined by density are handled poorly
Unsupervised learning as art
[Figure: three panels, minPts = 5 with ε = 0.7, ε = 0.8 and ε = 0.9]
• Plenty of other unsupervised learning algorithms
• distribution-based clustering
• density-based clustering… etc
• More complex ones have more free parameters
• tweaking is as much art as science
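To make the point about free parameters concrete, here is a small density-based clustering sketch in the style of DBSCAN, written from scratch in plain Python. It assumes that minPts counts the point itself and that the three ε values are the ones from the figure panels. The eleven sample points are hypothetical and were chosen so that a small change in ε changes the answer.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster index, or NOISE."""
    labels = [None] * len(points)

    def neighbours(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point of this cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


def count_clusters(labels):
    return len(set(labels) - {NOISE})


# Hypothetical data: two groups on a line, 0.85 apart, plus one outlier.
xs = [0.0, 0.25, 0.5, 0.75, 1.0, 1.85, 2.1, 2.35, 2.6, 2.85, 5.0]
points = [(x, 0.0) for x in xs]
counts = {eps: count_clusters(dbscan(points, eps, min_pts=5)) for eps in (0.7, 0.8, 0.9)}
print(counts)
```

With these points the two groups stay separate at ε = 0.7 and ε = 0.8 but merge into one cluster at ε = 0.9, while the outlier is noise throughout.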
Supervised learning: classifying irises
• Crunch data on flower size, shape to identify its type (class label)
• label = F (petal, sepal)
• Classes: setosa, versicolor, virginica
Versicolor iris image courtesy of David Berger under a CC-BY licence
Supervised learning: step 1 – training
• Need labelled (i.e. already classified) data
• want to train a model to recognise the classes from the data (i.e. find F() )
• class label is dependent variable
• rest of data are independent variables or predictors
• Split your big data set into training & test sets
• 70/30 or 60/40 or so
• Feed training data into model-learning software
• e.g. neural net, decision tree…
• Result: a classifier model F :
• label = F (petal, sepal)
petal sepal label
1.5 5.2 setosa
1.2 4.6 setosa
4.1 6.0 versicolor
5.2 6.0 virginica
6.0 7.2 virginica
… … …
[Diagram: training data → modelling software → classifier]
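As a rough sketch of the training step, the snippet below splits a data set 70/30 and learns a classifier F from the five training rows shown in the table. The slide names neural nets and decision trees as modelling software; a nearest-centroid rule is swapped in here purely because it fits in a few lines. The function names `split`, `train` and `make_classifier` are hypothetical.

```python
import random
from math import dist


def split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def train(rows):
    """Learn one centroid (mean petal, mean sepal) per class label."""
    grouped = {}
    for petal, sepal, label in rows:
        grouped.setdefault(label, []).append((petal, sepal))
    return {
        label: tuple(sum(axis) / len(pts) for axis in zip(*pts))
        for label, pts in grouped.items()
    }


def make_classifier(centroids):
    """Return F, mapping (petal, sepal) to the label of the nearest centroid."""
    def F(petal, sepal):
        return min(centroids, key=lambda lab: dist((petal, sepal), centroids[lab]))
    return F


# A 70/30 split, shown on ten placeholder row ids.
train_ids, test_ids = split(list(range(10)))

# The training rows from the table on this slide.
training_rows = [
    (1.5, 5.2, "setosa"),
    (1.2, 4.6, "setosa"),
    (4.1, 6.0, "versicolor"),
    (5.2, 6.0, "virginica"),
    (6.0, 7.2, "virginica"),
]
F = make_classifier(train(training_rows))
print(F(1.4, 5.1))
```

Five rows are far too few to train anything useful; the point is only the shape of the workflow: labelled rows go in, a function F comes out.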
Supervised learning: step 2 – evaluation
• Feed test data into classifier model F
• Count hits, misses vs your known labels
• true positives, false positives…
• Good enough?
• good to go!
• Not good enough?
• go back
• tweak your modelling software
• try again
petal sepal label
1.4 5.1 setosa
5.3 6.5 virginica
4.5 6.2 virginica
… … …
[Diagram: test data fed through the classifier]
petal sepal label model says…
1.4 5.1 setosa setosa
5.3 6.5 virginica virginica
4.5 6.2 virginica versicolor
… … … …
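Counting hits and misses for the three test rows above might look like the following sketch. The (known label, model output) pairs are copied from the table; the helper `per_class_counts` is a hypothetical name, and it treats one class at a time as "positive" to get true/false positives and negatives.

```python
from collections import Counter

# (known label, model output) pairs from the table on this slide.
results = [
    ("setosa", "setosa"),
    ("virginica", "virginica"),
    ("virginica", "versicolor"),
]

hits = sum(1 for actual, predicted in results if actual == predicted)
misses = len(results) - hits


def per_class_counts(results, positive):
    """Count tp / fp / fn / tn, treating `positive` as the class of interest."""
    counts = Counter()
    for actual, predicted in results:
        if predicted == positive:
            counts["tp" if actual == positive else "fp"] += 1
        elif actual == positive:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


print(hits, misses)
print(dict(per_class_counts(results, "virginica")))
print(dict(per_class_counts(results, "versicolor")))
```

The single miss shows up twice: as a false negative for virginica and as a false positive for versicolor.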
Advanced supervised learning: deep learning
• Deep learning: “learn multiple levels of representations that correspond to different levels of abstraction”
– Wikipedia
• An old-fashioned neural net is 1 layer deep
• Deep learning neural nets are… deeper!
• multi-layer NNs, deep NNs, recurrent NNs, convolutional NNs
• e.g. deep learning for image recognition
• look at flat pixel data… (1 layer)
• …and edge-detection in the image data… (another layer)
• …and different scales of the image data… (another layer)
• all in the same modelling framework
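The "pixels, then edges" idea can be illustrated with one hand-built layer. The sketch below slides a 3×3 Sobel filter over a tiny image, which is the sliding-window operation a convolutional layer performs. In a real deep network the filter weights are learned from data rather than fixed by hand, so the kernel and the 4×6 image here are assumptions for illustration only.

```python
def correlate2d(image, kernel):
    """Slide the kernel over the image ('valid' positions only) and sum products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hand-set Sobel filter that responds to left-to-right brightness changes.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]

# Layer 1: flat pixel data (dark left half, bright right half).
image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

# Layer 2: edge map computed from the pixels.
edges = correlate2d(image, SOBEL_X)
print(edges)
```

The edge map is non-zero only where the window straddles the dark-to-bright boundary, so the second layer encodes "there is a vertical edge here" rather than raw brightness.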
Deep learning: spotting solar panels
• Accuracy:
• 99.60%!
• Careful!
• a classifier that always says “background” is 98.75% accurate
• precision is a better measure!
• Precision:
• 84.54%
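The arithmetic behind the warning is easy to check. The counts below are hypothetical, not the project's data: they assume 10,000 test tiles of which 1.25% contain a solar panel, which is the class balance implied by the 98.75% figure. The second set of counts was picked so the results land close to the accuracy and precision quoted above.

```python
def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """Share of predicted positives that are correct (None if none predicted)."""
    return tp / (tp + fp) if (tp + fp) else None


# Hypothetical test set: 10,000 tiles, 125 of them with a solar panel.
always_background = {"tp": 0, "fp": 0, "fn": 125, "tn": 9875}
illustrative_model = {"tp": 104, "fp": 19, "fn": 21, "tn": 9856}

for name, m in [("always background", always_background),
                ("illustrative model", illustrative_model)]:
    print(name, accuracy(**m), precision(m["tp"], m["fp"]))
```

The do-nothing classifier scores 98.75% accuracy while never finding a panel, so its precision is undefined. The illustrative model's accuracy is under one percentage point higher, but its precision shows that roughly one in six of its "panel" calls is wrong.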
Advanced supervised learning: reinforcement learning
• Reinforcement learning allows software
“agents” to “explore”
• don’t need labelled data
• just set up an environment & go
• An agent:
• takes actions in an environment
• which is interpreted into a reward…
• and a representation of the state…
• which are fed back into the agent
• Good example is DeepMind’s AlphaGo Zero
• two versions of the agent play Go against each other
• learn winning strategies by beating the other guy
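The agent/environment loop described above (action, then reward and new state fed back to the agent) can be sketched with tabular Q-learning on a toy problem. This is far simpler than, and not the method used by, AlphaGo Zero. The five-cell corridor, the reward of 1 for reaching the last cell and all parameter values are assumptions chosen for illustration.

```python
import random

N_STATES = 5                   # corridor cells 0..4; reaching cell 4 ends an episode
ACTIONS = ("left", "right")    # action index 0 moves left, 1 moves right


def step(state, action):
    """Environment: apply the action, return (next_state, reward)."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)


def choose(q_row, rng, epsilon):
    """Explore at random with probability epsilon (or on ties), else act greedily."""
    if rng.random() < epsilon or q_row[0] == q_row[1]:
        return rng.randrange(2)
    return 0 if q_row[0] > q_row[1] else 1


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # value estimate per (state, action)
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = choose(q[state], rng, epsilon)
            nxt, reward = step(state, action)      # environment responds
            target = reward + gamma * max(q[nxt])  # reward plus discounted future value
            q[state][action] += alpha * (target - q[state][action])
            state = nxt                            # new state fed back to the agent
    return q


q = train()
policy = [ACTIONS[0 if row[0] > row[1] else 1] for row in q[:-1]]
print(policy)
```

No labelled examples are supplied: the agent only ever sees states and rewards, yet it ends up preferring "right" in every cell.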
Machine learning and artificial intelligence
• Today’s ML is principally pattern recognition
• IF data.looksLike(pedestrian) THEN report(‘Pedestrian’);
• This can be a powerful tool for decision support
• Think of AI as taking next step to decision making:
• IF data.looksLike(pedestrian) THEN brakes.On(now);
• Generally, we want to use empirical data to take next-best-action
• whether a human is in, on or out of the loop
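The two IF/THEN lines above can be written out as a runnable contrast between decision support and decision making. Everything here is hypothetical: the confidence score, the 0.9 threshold and the `Brakes` class simply stand in for a real perception model and actuator.

```python
def looks_like_pedestrian(score, threshold=0.9):
    """Hypothetical pattern-recognition result: a confidence score in [0, 1]."""
    return score >= threshold


def decision_support(score):
    """ML as pattern recognition: report the finding and leave the decision to a human."""
    return "Pedestrian" if looks_like_pedestrian(score) else None


class Brakes:
    """Stand-in for an actuator."""

    def __init__(self):
        self.engaged = False

    def on(self):
        self.engaged = True


def decision_making(score, brakes):
    """AI as the next step: act on the recognised pattern directly."""
    if looks_like_pedestrian(score):
        brakes.on()
    return brakes.engaged


print(decision_support(0.95))
print(decision_making(0.95, Brakes()))
```

The pattern recogniser is identical in both functions; the only difference is whether its output is reported or acted upon.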
The future of AI
• State of the art in AI-driven robotics:
• a team at Nanyang Technological University, Singapore got two industrial robots to assemble (most of) an IKEA STEFAN chair in c. 20 mins
• The Economist, April 2018
• Current research topics are transfer learning…
• can a machine learn the rules of Go (yes) then figure out how to apply them to the game of Chess (not yet)
• …and curiosity-based learning
• continuing the reinforcement-learning trend
• Hardware is becoming specialised
• GPUs (graphics processing units) and more
• Excellent source: https://www.stateof.ai/
• Nathan Benaich, Ian Hogarth (UK AI VCs), June 2018
Be problem-driven, not data-driven
• Big Data / AI / ML is not a silver bullet
• Don’t start with the tech – start with the problem
• Don’t look at “your” data and ask what can I do with them?
• Look at your business and ask, what can I do better?
• improve operational efficiency (data management)
• understand my customers better (data science/ML)
• measure or monitor things with sensors (data engineering)
• simulate things digitally (data engineering/management)
• automate processes/decisions (ML/AI)
Advanced Technologies for Industry 4.0

More Related Content

What's hot

Begin with Data Scientist
Begin with Data ScientistBegin with Data Scientist
Begin with Data Scientist
Narong Intiruk
 
BSidesLV 2013 - Using Machine Learning to Support Information Security
BSidesLV 2013 - Using Machine Learning to Support Information SecurityBSidesLV 2013 - Using Machine Learning to Support Information Security
BSidesLV 2013 - Using Machine Learning to Support Information Security
Alex Pinto
 
Big Data: the weakest link
Big Data: the weakest linkBig Data: the weakest link
Big Data: the weakest link
CS, NcState
 
IT Cluster Skolkovo Presentation at FRUCT.org conference
IT Cluster Skolkovo Presentation at FRUCT.org conferenceIT Cluster Skolkovo Presentation at FRUCT.org conference
IT Cluster Skolkovo Presentation at FRUCT.org conference
Albert Yefimov
 
Three Laws of Trusted Data Sharing: (Building a Better Business Case for Dat...
Three Laws of Trusted Data Sharing:(Building a Better Business Case for Dat...Three Laws of Trusted Data Sharing:(Building a Better Business Case for Dat...
Three Laws of Trusted Data Sharing: (Building a Better Business Case for Dat...
CS, NcState
 
The Art and Science of Analyzing Software Data
The Art and Science of Analyzing Software DataThe Art and Science of Analyzing Software Data
The Art and Science of Analyzing Software Data
CS, NcState
 
Agile Research in Information Systems Field: Analysis from Knowledge Transfor...
Agile Research in Information Systems Field: Analysis from Knowledge Transfor...Agile Research in Information Systems Field: Analysis from Knowledge Transfor...
Agile Research in Information Systems Field: Analysis from Knowledge Transfor...
Ilia Bider
 
Altron presentation on Emerging Technologies: Data Science and Artificial Int...
Altron presentation on Emerging Technologies: Data Science and Artificial Int...Altron presentation on Emerging Technologies: Data Science and Artificial Int...
Altron presentation on Emerging Technologies: Data Science and Artificial Int...
Robert Williams
 
Data Science: Past, Present, and Future
Data Science: Past, Present, and FutureData Science: Past, Present, and Future
Data Science: Past, Present, and Future
Gregory Piatetsky-Shapiro
 
Visually Exploring Patent Collections for Events and Patterns
Visually Exploring Patent Collections for Events and PatternsVisually Exploring Patent Collections for Events and Patterns
Visually Exploring Patent Collections for Events and Patterns
Xiaoyu Wang
 
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Alex Pinto
 
GALE: Geometric active learning for Search-Based Software Engineering
GALE: Geometric active learning for Search-Based Software EngineeringGALE: Geometric active learning for Search-Based Software Engineering
GALE: Geometric active learning for Search-Based Software Engineering
CS, NcState
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
Raveen Perera
 
Planning and Executing Practice-Impactful Research
Planning and Executing Practice-Impactful ResearchPlanning and Executing Practice-Impactful Research
Planning and Executing Practice-Impactful Research
Tao Xie
 
A Pragmatic Perspective on Software Visualization
A Pragmatic Perspective on Software VisualizationA Pragmatic Perspective on Software Visualization
A Pragmatic Perspective on Software Visualization
Arie van Deursen
 
Knowledge Discovery in Production
Knowledge Discovery in ProductionKnowledge Discovery in Production
Knowledge Discovery in Production
André Karpištšenko
 

What's hot (16)

Begin with Data Scientist
Begin with Data ScientistBegin with Data Scientist
Begin with Data Scientist
 
BSidesLV 2013 - Using Machine Learning to Support Information Security
BSidesLV 2013 - Using Machine Learning to Support Information SecurityBSidesLV 2013 - Using Machine Learning to Support Information Security
BSidesLV 2013 - Using Machine Learning to Support Information Security
 
Big Data: the weakest link
Big Data: the weakest linkBig Data: the weakest link
Big Data: the weakest link
 
IT Cluster Skolkovo Presentation at FRUCT.org conference
IT Cluster Skolkovo Presentation at FRUCT.org conferenceIT Cluster Skolkovo Presentation at FRUCT.org conference
IT Cluster Skolkovo Presentation at FRUCT.org conference
 
Three Laws of Trusted Data Sharing: (Building a Better Business Case for Dat...
Three Laws of Trusted Data Sharing:(Building a Better Business Case for Dat...Three Laws of Trusted Data Sharing:(Building a Better Business Case for Dat...
Three Laws of Trusted Data Sharing: (Building a Better Business Case for Dat...
 
The Art and Science of Analyzing Software Data
The Art and Science of Analyzing Software DataThe Art and Science of Analyzing Software Data
The Art and Science of Analyzing Software Data
 
Agile Research in Information Systems Field: Analysis from Knowledge Transfor...
Agile Research in Information Systems Field: Analysis from Knowledge Transfor...Agile Research in Information Systems Field: Analysis from Knowledge Transfor...
Agile Research in Information Systems Field: Analysis from Knowledge Transfor...
 
Altron presentation on Emerging Technologies: Data Science and Artificial Int...
Altron presentation on Emerging Technologies: Data Science and Artificial Int...Altron presentation on Emerging Technologies: Data Science and Artificial Int...
Altron presentation on Emerging Technologies: Data Science and Artificial Int...
 
Data Science: Past, Present, and Future
Data Science: Past, Present, and FutureData Science: Past, Present, and Future
Data Science: Past, Present, and Future
 
Visually Exploring Patent Collections for Events and Patterns
Visually Exploring Patent Collections for Events and PatternsVisually Exploring Patent Collections for Events and Patterns
Visually Exploring Patent Collections for Events and Patterns
 
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
 
GALE: Geometric active learning for Search-Based Software Engineering
GALE: Geometric active learning for Search-Based Software EngineeringGALE: Geometric active learning for Search-Based Software Engineering
GALE: Geometric active learning for Search-Based Software Engineering
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Planning and Executing Practice-Impactful Research
Planning and Executing Practice-Impactful ResearchPlanning and Executing Practice-Impactful Research
Planning and Executing Practice-Impactful Research
 
A Pragmatic Perspective on Software Visualization
A Pragmatic Perspective on Software VisualizationA Pragmatic Perspective on Software Visualization
A Pragmatic Perspective on Software Visualization
 
Knowledge Discovery in Production
Knowledge Discovery in ProductionKnowledge Discovery in Production
Knowledge Discovery in Production
 

Similar to Demystifying Machine Learning and Artificial Intelligence

Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science
Venkata Reddy Konasani
 
DeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François GarillotDeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François Garillot
sparktc
 
DeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François GarillotDeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François Garillot
sparktc
 
DeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François GarillotDeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François Garillot
Steve Moore
 
Deep learning for dummies dec 23 2017
Deep learning for dummies   dec 23 2017Deep learning for dummies   dec 23 2017
Deep learning for dummies dec 23 2017
Ashok Govindarajan
 
Algorithm Marketplace and the new "Algorithm Economy"
Algorithm Marketplace and the new "Algorithm Economy"Algorithm Marketplace and the new "Algorithm Economy"
Algorithm Marketplace and the new "Algorithm Economy"
Diego Oppenheimer
 
Using Algorithmia to leverage AI and Machine Learning APIs
Using Algorithmia to leverage AI and Machine Learning APIsUsing Algorithmia to leverage AI and Machine Learning APIs
Using Algorithmia to leverage AI and Machine Learning APIs
Rakuten Group, Inc.
 
Deep Learning with CNTK
Deep Learning with CNTKDeep Learning with CNTK
Deep Learning with CNTK
Ashish Jaiman
 
Vertex Perspectives | Artificial Intelligence
Vertex Perspectives | Artificial IntelligenceVertex Perspectives | Artificial Intelligence
Vertex Perspectives | Artificial Intelligence
Vertex Holdings
 
Vertex perspectives artificial intelligence
Vertex perspectives   artificial intelligenceVertex perspectives   artificial intelligence
Vertex perspectives artificial intelligence
Yanai Oron
 
Infusing Social Data Analytics into Future Internet applications for Manufact...
Infusing Social Data Analytics into Future Internet applications for Manufact...Infusing Social Data Analytics into Future Internet applications for Manufact...
Infusing Social Data Analytics into Future Internet applications for Manufact...
Michael Petychakis
 
Data Science Accelerator Program
Data Science Accelerator ProgramData Science Accelerator Program
Data Science Accelerator Program
GoDataDriven
 
High time to add machine learning to your information security stack
High time to add machine learning to your information security stackHigh time to add machine learning to your information security stack
High time to add machine learning to your information security stack
Minhaz A V
 
2_Image Classification.pdf
2_Image Classification.pdf2_Image Classification.pdf
2_Image Classification.pdf
FEG
 
Shanghai Deep Learning Meetup #1
Shanghai Deep Learning Meetup #1Shanghai Deep Learning Meetup #1
Shanghai Deep Learning Meetup #1
Xiaohu ZHU
 
rsec2a-2016-jheaton-morning
rsec2a-2016-jheaton-morningrsec2a-2016-jheaton-morning
rsec2a-2016-jheaton-morning
Jeff Heaton
 
Big Data & Artificial Intelligence
Big Data & Artificial IntelligenceBig Data & Artificial Intelligence
Big Data & Artificial Intelligence
Zavain Dar
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
Yalçın Yenigün
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
DataWorks Summit/Hadoop Summit
 
Data science presentation
Data science presentationData science presentation
Data science presentation
MSDEVMTL
 

Similar to Demystifying Machine Learning and Artificial Intelligence (20)

Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science
 
DeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François GarillotDeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François Garillot
 
DeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François GarillotDeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François Garillot
 
DeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François GarillotDeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François Garillot
 
Deep learning for dummies dec 23 2017
Deep learning for dummies   dec 23 2017Deep learning for dummies   dec 23 2017
Deep learning for dummies dec 23 2017
 
Algorithm Marketplace and the new "Algorithm Economy"
Algorithm Marketplace and the new "Algorithm Economy"Algorithm Marketplace and the new "Algorithm Economy"
Algorithm Marketplace and the new "Algorithm Economy"
 
Using Algorithmia to leverage AI and Machine Learning APIs
Using Algorithmia to leverage AI and Machine Learning APIsUsing Algorithmia to leverage AI and Machine Learning APIs
Using Algorithmia to leverage AI and Machine Learning APIs
 
Deep Learning with CNTK
Deep Learning with CNTKDeep Learning with CNTK
Deep Learning with CNTK
 
Vertex Perspectives | Artificial Intelligence
Vertex Perspectives | Artificial IntelligenceVertex Perspectives | Artificial Intelligence
Vertex Perspectives | Artificial Intelligence
 
Vertex perspectives artificial intelligence
Vertex perspectives   artificial intelligenceVertex perspectives   artificial intelligence
Vertex perspectives artificial intelligence
 
Infusing Social Data Analytics into Future Internet applications for Manufact...
Infusing Social Data Analytics into Future Internet applications for Manufact...Infusing Social Data Analytics into Future Internet applications for Manufact...
Infusing Social Data Analytics into Future Internet applications for Manufact...
 
Data Science Accelerator Program
Data Science Accelerator ProgramData Science Accelerator Program
Data Science Accelerator Program
 
High time to add machine learning to your information security stack
High time to add machine learning to your information security stackHigh time to add machine learning to your information security stack
High time to add machine learning to your information security stack
 
2_Image Classification.pdf
2_Image Classification.pdf2_Image Classification.pdf
2_Image Classification.pdf
 
Shanghai Deep Learning Meetup #1
Shanghai Deep Learning Meetup #1Shanghai Deep Learning Meetup #1
Shanghai Deep Learning Meetup #1
 
rsec2a-2016-jheaton-morning
rsec2a-2016-jheaton-morningrsec2a-2016-jheaton-morning
rsec2a-2016-jheaton-morning
 
Big Data & Artificial Intelligence
Big Data & Artificial IntelligenceBig Data & Artificial Intelligence
Big Data & Artificial Intelligence
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Data science presentation
Data science presentationData science presentation
Data science presentation
 

More from EPCC, University of Edinburgh

EPCC MSc industry projects
EPCC MSc industry projectsEPCC MSc industry projects
EPCC MSc industry projects
EPCC, University of Edinburgh
 
EPCC MSc industry projects
EPCC MSc industry projectsEPCC MSc industry projects
EPCC MSc industry projects
EPCC, University of Edinburgh
 
EPCC MSc industry projects
EPCC MSc industry projectsEPCC MSc industry projects
EPCC MSc industry projects
EPCC, University of Edinburgh
 
'How are organisations exploiting the Internet of Things?'
'How are organisations exploiting the Internet of Things?''How are organisations exploiting the Internet of Things?'
'How are organisations exploiting the Internet of Things?'
EPCC, University of Edinburgh
 
How companies are exploiting data science
How companies are exploiting data scienceHow companies are exploiting data science
How companies are exploiting data science
EPCC, University of Edinburgh
 
EPCC: at the heart of data-driven innovation for business
EPCC: at the heart of data-driven innovation for businessEPCC: at the heart of data-driven innovation for business
EPCC: at the heart of data-driven innovation for business
EPCC, University of Edinburgh
 
What is a data safe haven?
What is a data safe haven?What is a data safe haven?
What is a data safe haven?
EPCC, University of Edinburgh
 
HPC and Machine Learning collaboration: an industry view
HPC and Machine Learning collaboration: an industry viewHPC and Machine Learning collaboration: an industry view
HPC and Machine Learning collaboration: an industry view
EPCC, University of Edinburgh
 
Current and future data resources in Scotland
Current and future data resources in ScotlandCurrent and future data resources in Scotland
Current and future data resources in Scotland
EPCC, University of Edinburgh
 
Collaboration with industry: success stories
Collaboration with industry: success storiesCollaboration with industry: success stories
Collaboration with industry: success stories
EPCC, University of Edinburgh
 

More from EPCC, University of Edinburgh (10)

EPCC MSc industry projects
EPCC MSc industry projectsEPCC MSc industry projects
EPCC MSc industry projects
 
EPCC MSc industry projects
EPCC MSc industry projectsEPCC MSc industry projects
EPCC MSc industry projects
 
EPCC MSc industry projects
EPCC MSc industry projectsEPCC MSc industry projects
EPCC MSc industry projects
 
'How are organisations exploiting the Internet of Things?'
'How are organisations exploiting the Internet of Things?''How are organisations exploiting the Internet of Things?'
'How are organisations exploiting the Internet of Things?'
 
How companies are exploiting data science
How companies are exploiting data scienceHow companies are exploiting data science
How companies are exploiting data science
 
EPCC: at the heart of data-driven innovation for business
EPCC: at the heart of data-driven innovation for businessEPCC: at the heart of data-driven innovation for business
EPCC: at the heart of data-driven innovation for business
 
What is a data safe haven?
What is a data safe haven?What is a data safe haven?
What is a data safe haven?
 
HPC and Machine Learning collaboration: an industry view
HPC and Machine Learning collaboration: an industry viewHPC and Machine Learning collaboration: an industry view
HPC and Machine Learning collaboration: an industry view
 
Current and future data resources in Scotland
Current and future data resources in ScotlandCurrent and future data resources in Scotland
Current and future data resources in Scotland
 
Collaboration with industry: success stories
Collaboration with industry: success storiesCollaboration with industry: success stories
Collaboration with industry: success stories
 

Recently uploaded

4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
kuntobimo2016
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Enterprise Wired
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
g4dpvqap0
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 

Demystifying Machine Learning and Artificial Intelligence

  • 1. DEMYSTIFYING MACHINE LEARNING & AI Advanced Technologies for Industry 4.0 Dr Rob Baxter r.baxter@epcc.ed.ac.uk
  • 2. Machine learning & AI: why now? • Because: Big Data • We have (a lot) more (digital) data • from the Web • from sensors (including cameras, mobile devices) • from transactional business systems • We have faster computers • parallel ‘cluster’ computing is mainstream • CPUs, GPUs, FPGAs, ASICs • We have lots of useful open source software • for data management, data pipelining & analytics • driven by social media giants
  • 3. Big Data 2017: per year… 1PB NASDAQ 3PB US Census 4PB US Library of Congress 5PB NOAA archive 6PB YouTube 15PB
  • 4. Big Data 2017: per year…
  • 5. Big Data 2017: per year… CERN archive 73PB searches on Google 98PB uploads to Facebook 180PB
  • 6. Big Data 2017: per year… CERN archive 73PB …2025 per year Square Kilometre Array Telescope, Phase 1 300PB searches on Google 98PB uploads to Facebook 180PB High Luminosity Large Hadron Collider 1,000PB Square Kilometre Array Telescope, Phase 2 1,000PB
  • 7. Different kinds of “big” • Big Data are typically measured three ways • volume – from gigabytes to terabytes to petabytes • velocity – data streams at you or changes rapidly • variety – no longer are data in nice, neat tables • some folk add others • veracity, verifiability, validity, value… • Big Data come in many flavours • very large transaction databases • very large social graphs • very large image collections • very large numbers of sensor feeds • etc.
  • 8. Data [ science | engineering | management ] ~20% Data science • analytics • statistics • machine learning ~40% Data engineering • data movement • data pipelines • data tech deployment (“data dev ops”) • database design • data preparation & cleaning ~40% Data management • data storage • data formats • metadata management • data preservation & backup • data preparation & cleaning
  • 9. Machine learning “Machine learning is the science of getting computers to act without being explicitly programmed.” – Andrew Ng, Stanford University • Two main kinds of machine learning • unsupervised learning finds patterns in data without being told exactly what to look for • e.g. for clustering, fitting • supervised learning uses labelled training data to build a model, which is then used to make predictions • e.g. for classification
  • 10. Unsupervised learning in action: k-means clustering [animation stepping through the iterations]
  • 11. Unsupervised learning: limitations of k-means • Clusters assumed to be the same size • Not so good at finding clusters defined by density
  • 12. Unsupervised learning as art • Plenty of other unsupervised learning algorithms • distribution-based clustering • density-based clustering… etc • More complex ones have more free parameters • tweaking is as much art as science [Figure panels: minPts = 5, ε = 0.7; minPts = 5, ε = 0.8; minPts = 5, ε = 0.9]
  • 13. Supervised learning: classifying irises • Crunch data on flower size, shape to identify its type (class label) • label = F (petal, sepal) [Figure: scatter plot of the three classes (setosa, versicolor, virginica) with three unlabelled “?” points. Versicolor iris image courtesy of David Berger under a CC-BY licence]
  • 14. Supervised learning: step 1 – training • Need labelled (i.e. already classified) data • want to train a model to recognise the classes from the data (i.e. find F() ) • class label is dependent variable • rest of data are independent variables or predictors • Split your big data set into training & test sets • 70/30 or 60/40 or so • Feed training data into model-learning software • e.g. neural net, decision tree… • Result: a classifier model F : • label = F (petal, sepal)
      Training data → Modelling software → Classifier
      petal  sepal  label
      1.5    5.2    setosa
      1.2    4.6    setosa
      4.1    6.0    versicolor
      5.2    6.0    virginica
      6.0    7.2    virginica
      …      …      …
  • 15. Supervised learning: step 2 – evaluation • Feed test data into classifier model F • Count hits, misses vs your known labels • true positives, false positives… • Good enough? • good to go! • Not good enough? • go back • tweak your modelling software • try again
      Test data → Classifier
      petal  sepal  label
      1.4    5.1    setosa
      5.3    6.5    virginica
      4.5    6.2    virginica
      …      …      …
      Classifier output
      petal  sepal  label      model says…
      1.4    5.1    setosa     setosa
      5.3    6.5    virginica  virginica
      4.5    6.2    virginica  versicolor
      …      …      …          …
  • 16. Advanced supervised learning: deep learning • Deep learning: “learn multiple levels of representations that correspond to different levels of abstraction” • Wikipedia • An old-fashioned neural net is 1 layer deep • Deep learning neural nets are… deeper! • multi-layer NNs, deep NNs, recurrent NNs, convolutional NNs • e.g. deep learning for image recognition • look at flat pixel data… (1 layer) • …and edge-detection in the image data… (another layer) • …and different scales of the image data… (another layer) • all in the same modelling framework
  • 17. Advanced supervised learning: deep learning
  • 18. Deep learning: spotting solar panels • Accuracy: • 99.60% ! • Careful! • a classifier that always says “background” is 98.75% accurate • precision is a better measure! • Precision: • 84.54%
  • 19. Advanced supervised learning: reinforcement learning • Reinforcement learning allows software “agents” to “explore” • don’t need labelled data • just set up an environment & go • An agent: • takes actions in an environment • which is interpreted into a reward… • and a representation of the state… • which are fed back into the agent • Good example is DeepMind’s AlphaGo Zero • two versions of the agent play Go against each other • learn winning strategies by beating the other guy
  • 20. Machine learning and artificial intelligence • Today’s ML is principally pattern recognition • IF data.looksLike(pedestrian) THEN report(‘Pedestrian’); • This can be a powerful tool for decision support • Think of AI as taking next step to decision making: • IF data.looksLike(pedestrian) THEN brakes.On(now); • Generally, we want to use empirical data to take next-best-action • whether a human is in, on or out of the loop
  • 21. The future of AI • State-of-the-art in AI-driven robotics: • a team at Nanyang Technological University, Singapore got two industrial robots to assemble (most of) an IKEA STEFAN chair in c. 20 mins • The Economist, April 2018 • Current research topics are transfer learning… • can a machine learn the rules of Go (yes) then figure out how to apply them to the game of Chess (not yet) • …and curiosity-based learning • continuing the reinforcement-learning trend • Hardware is becoming specialised • GPUs (graphics processing units) and more • Excellent source: https://www.stateof.ai/ • Nathan Benaich, Ian Hogarth (UK AI VCs), June 2018
  • 22. Be problem-driven, not data-driven • Big Data / AI / ML is not a silver bullet • Don’t start with the tech – start with the problem • Don’t look at “your” data and ask what can I do with them? • Look at your business and ask, what can I do better? • improve operational efficiency (data management) • understand my customers better (data science/ML) • measure or monitor things with sensors (data engineering) • simulate things digitally (data engineering/management) • automate processes/decisions (ML/AI)
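
The training and evaluation steps on slides 14 and 15 can be sketched in a few lines of Python. This is only an illustration: the slides suggest a neural net or decision tree as the modelling software, whereas the sketch below swaps in a much simpler nearest-centroid classifier (an assumption made here to keep the code short). The rows are the handful shown on the slides, and the function names `train`, `classify` and `evaluate` are hypothetical.

```python
from collections import defaultdict

# Training rows as shown on slide 14: (petal, sepal, label).
TRAIN = [
    (1.5, 5.2, "setosa"),
    (1.2, 4.6, "setosa"),
    (4.1, 6.0, "versicolor"),
    (5.2, 6.0, "virginica"),
    (6.0, 7.2, "virginica"),
]

# Held-back test rows as shown on slide 15.
TEST = [
    (1.4, 5.1, "setosa"),
    (5.3, 6.5, "virginica"),
    (4.5, 6.2, "virginica"),
]


def train(rows):
    """Step 1: learn one centroid (mean petal, mean sepal) per class label."""
    grouped = defaultdict(list)
    for petal, sepal, label in rows:
        grouped[label].append((petal, sepal))
    return {
        label: (
            sum(p for p, _ in points) / len(points),
            sum(s for _, s in points) / len(points),
        )
        for label, points in grouped.items()
    }


def classify(model, petal, sepal):
    """The learned F(petal, sepal): return the label of the nearest centroid."""
    return min(
        model,
        key=lambda label: (petal - model[label][0]) ** 2
        + (sepal - model[label][1]) ** 2,
    )


def evaluate(model, rows):
    """Step 2: count hits against the known labels and return the accuracy."""
    hits = sum(classify(model, p, s) == label for p, s, label in rows)
    return hits / len(rows)


model = train(TRAIN)
predictions = [classify(model, p, s) for p, s, _ in TEST]
accuracy = evaluate(model, TEST)
print(predictions, accuracy)
```

With these few rows the sketch misclassifies the third test flower as versicolor, the same miss shown in the “model says…” column on slide 15. A real data set would be far larger and split 70/30 or so, as slide 14 suggests.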

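The warning on slide 18 about accuracy can be checked with a little arithmetic. The sketch below computes accuracy, precision and recall from confusion-matrix counts. The pixel counts are hypothetical, chosen only so that the results land near the percentages quoted on the slide; they are not the real figures behind the solar panel study, and the function name `metrics` is an assumption.

```python
def metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts.

    Precision or recall is None when its denominator is zero.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return accuracy, precision, recall


# Hypothetical image set of 1,000,000 pixels, 1.25% of which are solar panel.
PANEL, BACKGROUND = 12_500, 987_500

# A "classifier" that always answers "background" never predicts a panel pixel.
always_background = metrics(tp=0, fp=0, fn=PANEL, tn=BACKGROUND)

# Hypothetical counts for a trained model (illustrative, not measured).
trained = metrics(tp=10_473, fp=1_915, fn=2_027, tn=985_585)

for name, (acc, prec, rec) in [
    ("always background", always_background),
    ("trained model", trained),
]:
    print(name, acc, prec, rec)
```

The always-background classifier scores 98.75% accuracy while finding no panels at all (its recall is zero and its precision is undefined), which is why accuracy alone is misleading when one class is rare.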
Editor's Notes

  1. We have a lot of data but we need techniques/tools/machines to understand/interpret the data and make use of it. Here is where machine learning and AI come into play. Data, powerful machines, and open-source software are available.
  2. Although we call this ‘unsupervised’, we have actually told the computer to divide the dataset into three groups. We could have said 2 or 4. This is the ‘k’ value. The ‘means’ part signifies that a data point is assigned to the cluster that has the closest mean (average) value. The algorithm tries to arrange the points so that the sum of the squared distances from each point to the mean of its cluster is as small as possible over the whole set. Run the animation and watch the crosses (the mean of each cluster) move as the algorithm progresses towards a better solution.
  3. This method does not work for all datasets. These are standard datasets that are used to show that the method can break down.
  4. Setting the ringed parameter to differing values produces different results. These diagrams show unsupervised learning where the number of clusters is not given. Instead there are parameters that set the minimum number of points needed to form a dense region (minPts) and how close together points must be to count as neighbours in the same cluster (ε). Varying the distance parameter changes the results significantly. Which value is correct? There is no correct answer to that!
  5. Iris is a type of flower; the data set covers three species (setosa, versicolor and virginica). The picture shows versicolor. The sepal is the (usually green) part of the flower that supports the petals; it is not shown very well in this picture. How would you classify the question marks? The first one is quite easy. The second one is quite easy. The third one is trickier.
     The plot comes from the following R code (xyplot is provided by the lattice package):
       library(lattice)  # provides xyplot and trellis.par.get
       my.plot <- xyplot(Sepal.Length ~ Petal.Length, data = iris, groups = Species,
         panel = panel.superpose,
         col.line = trellis.par.get("strip.background")$col,
         col.symbol = trellis.par.get("strip.shingle")$col,
         key = list(title = "Iris Data", x = .15, y = .85, corner = c(0,1), border = TRUE,
           points = list(col = trellis.par.get("strip.shingle")$col[1:3],
             pch = trellis.par.get("superpose.symbol")$pch[1:3],
             cex = trellis.par.get("superpose.symbol")$cex[1:3]),
           text = list(levels(iris$Species))))
  6. “Modelling software” here means the algorithm (set of steps) used to train a model.
  7. It’s quite an important point here that it’s not a good idea to use your training data as test data. This is why you hold some back (as on the previous slide). It’s possible to end up with a very misleading accuracy by ‘overtraining’ the algorithm so that it performs with very high accuracy on the training set, but is no good for data it has not ‘seen’ before.
  8. Neural networks usually have several layers. Deep learning comes from “deep neural networks”, i.e. a deep learning model contains a lot of layers. For further information, see: https://www.mathworks.com/discovery/deep-learning.html and https://devblogs.nvidia.com/deep-learning-nutshell-core-concepts/
  9. CNNs (convolutional neural networks) are commonly used for image processing. A CNN consists of an input and an output layer, as well as multiple hidden layers. The hidden layers of a CNN typically consist of convolutional layers, pooling layers, fully connected layers and normalization layers. Convolutional layers apply a convolution operation to the input, passing the result to the next layer. See: https://en.wikipedia.org/wiki/Convolutional_neural_network#Convolutional
  10. Precision (positive predictive value, PPV): 84.54% of the pixels predicted to be solar panel really are solar panel pixels. Recall (true positive rate, TPR): 83.78% of the pixels that belong to a solar panel are correctly predicted as solar panel pixels.
  11. To learn more, see the DeepMind page: https://deepmind.com/blog/alphago-zero-learning-scratch/
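
The k-means procedure described in note 2 can be sketched as follows: assign each point to the cluster with the nearest mean, move each mean to the average of its points, and repeat until nothing changes. This is a minimal sketch under stated assumptions. The nine 2-D points are made up for illustration, and the initial means are simply the first k points, whereas real implementations usually pick starting means at random or with a smarter seeding scheme.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def centroid(cluster):
    """Mean (average) position of the points in a non-empty cluster."""
    n = len(cluster)
    return (sum(x for x, _ in cluster) / n, sum(y for _, y in cluster) / n)


def kmeans(points, k, max_iterations=100):
    """Plain k-means: returns (means, clusters) once the means stop moving."""
    means = list(points[:k])  # assumption: seed with the first k points
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster with the closest mean.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: squared_distance(point, means[i]))
            clusters[nearest].append(point)
        # Update step: move each mean to the centre of its cluster
        # (an empty cluster keeps its previous mean).
        new_means = [centroid(c) if c else means[i] for i, c in enumerate(clusters)]
        if new_means == means:
            break
        means = new_means
    return means, clusters


# Made-up data: three loose groups of three points each.
POINTS = [
    (1.0, 1.0), (5.0, 5.0), (9.0, 1.0),
    (1.2, 0.8), (5.2, 4.9), (8.8, 1.1),
    (0.9, 1.3), (4.8, 5.3), (9.1, 0.7),
]

means, clusters = kmeans(POINTS, k=3)
for mean, cluster in zip(means, clusters):
    print(mean, cluster)
```

Changing `k` to 2 or 4 gives a different grouping of the same points, which is the sense in which the method is not entirely unsupervised: the number of clusters is still chosen by a person.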