Machine Learning: Past, Present, and Future
Professor Thomas G. Dietterich
Oregon State University
Chief Scientist, BigML
11 May 2017
Machine Learning: Past

Question: How to build smart software?
Answer (in 1980): Interview an expert, encode the knowledge in software.

This worked well for expert systems:
▪ Medical diagnosis of blood diseases (MYCIN)
▪ Interpreting mass spectrograms (DENDRAL)
▪ Configuring computer hardware systems (XCON)
Machine Learning: Past

Question: How to build smart software?
Answer (in 1980): Interview an expert, encode the knowledge in software.

But it did NOT work well for other tasks:
▪ Optical character recognition
▪ Robot control
▪ Natural language processing
Machine Learning: Past

[Figure: example digit images being mapped to their labels, e.g. ➔ "3", ➔ "6", ➔ "2"]
1980-2000: Many New Algorithms Developed

▪ Decision trees
▪ Probabilistic graphical models, including Naïve Bayes
▪ Support vector machines
▪ Ensemble methods: bagging and boosting
From Function Learning to Knowledge Discovery

Learning algorithms succeed because they find patterns in the data, and some algorithms reveal those patterns in easy-to-understand ways. Data mining and knowledge discovery focus on discovering and visualizing these patterns (a concrete example follows the list below):
▪ Decision trees
▪ Probabilistic graphical models, including Naïve Bayes
▪ Association rules (frequent item sets)
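As a small illustration, here is a minimal sketch (assuming scikit-learn and its bundled iris dataset, neither of which is from the talk) of how a decision tree exposes the patterns it has found as readable rules:

```python
# Train a tiny decision tree and print its learned splits as nested
# if/then rules, the "easy-to-understand" pattern view described above.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

print(export_text(tree, feature_names=list(iris.feature_names)))
```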
Machine Learning: Present

1. Automated Decision Making
2. Perceptual Tasks
3. Anomaly Detection
(1) Automated Decision Making

Examples:
▪ Recommendations (Netflix; Amazon)
▪ Advertisement placement (Google; Facebook)
▪ Robot control (self-driving cars)

Challenge: You only learn the outcome of the chosen action; you are not told what the best action would have been ("bandit feedback").

Example: In advertisement placement, I might have 10,000 possible advertisements that I could present to the user. I must choose 5. The user might click on one of these.
Automated Decision Making Requires Experimentation

1. The learner must systematically try all actions in many situations, but eventually focus on the most promising actions.
2. When analyzing historical data (e.g., which advertisements a user clicked on), you must keep track of which advertisements were displayed.
3. The goal is to uncover the causal relationship between the selected action and the observed result.

A minimal sketch of these ideas follows below.
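This sketch is in plain Python; the action names, epsilon value, and reward handling are illustrative assumptions, not from the talk. Note the log: recording which action was displayed, and with what probability, is what makes later causal analysis of the historical data possible.

```python
import random

actions = ["ad_A", "ad_B", "ad_C"]      # illustrative action set
counts = {a: 0 for a in actions}        # times each action has been tried
value = {a: 0.0 for a in actions}       # running mean reward per action
log = []                                # (action, propensity, reward) records
epsilon = 0.1                           # exploration rate (assumed)

def choose():
    greedy = max(actions, key=value.get)
    if random.random() < epsilon:
        a = random.choice(actions)      # explore: try any action
    else:
        a = greedy                      # exploit: most promising action so far
    # Propensity: the probability this action had of being displayed
    p = (1 - epsilon if a == greedy else 0.0) + epsilon / len(actions)
    return a, p

def update(a, p, reward):
    counts[a] += 1
    value[a] += (reward - value[a]) / counts[a]   # incremental mean
    log.append((a, p, reward))          # keep track of what was displayed
```

Off-policy estimators such as inverse propensity scoring can then reweight this log to evaluate a new policy without re-running the experiment.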
Algorithms: Contextual Bandits

[Figure omitted. Credit: Cohen, et al., PTRS B 2008]
Reinforcement Learning
Reinforcement Learning Examples

1. Self-driving car
   ▪ State of world: location, speed, and acceleration of the car
   ▪ Actions: steering, acceleration, braking
   ▪ Rewards: reaching the destination quickly; not colliding with people, obstacles, or other cars; conserving fuel
2. Playing the game of Go
   ▪ State of world: the state of the Go board
   ▪ Actions: placing a stone
   ▪ Reward: winning the game
3. Operations (logistics, inventory management)

The sketch below illustrates the learning loop these examples share.

[Image: Tesla AutoSteer. Credit: Tesla Motors]
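A minimal tabular Q-learning sketch of the state/action/reward loop; the `env_step` callback is an assumed placeholder for whatever environment (car simulator, Go engine, inventory model) supplies transitions and rewards:

```python
import random
from collections import defaultdict

alpha, gamma, epsilon = 0.1, 0.99, 0.1   # learning rate, discount, exploration
Q = defaultdict(float)                    # (state, action) -> value estimate

def q_step(state, actions, env_step):
    # Epsilon-greedy choice: mostly exploit the best-known action
    if random.random() < epsilon:
        action = random.choice(actions)
    else:
        action = max(actions, key=lambda a: Q[(state, a)])
    next_state, reward = env_step(state, action)   # environment responds
    # Move the estimate toward reward + discounted best future value
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    return next_state
```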
(2) Deep Learning for Perception

1. Standard machine learning algorithms require that the data be converted into meaningful features.
2. This is easy for typical database information.
3. But it is very difficult for signal-level data (images, speech).
4. Deep learning methods are able to automatically discover meaningful intermediate features.
Progress in Object Recognition

[Chart: top-5 classification error (%) by year, 2010-2014. ImageNet, 1000 object categories: is the right answer in the top 5 predictions?]
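For concreteness, a small sketch (assuming NumPy) of the top-5 error metric used in this benchmark:

```python
import numpy as np

def top5_error(scores, labels):
    """scores: (n, n_classes) model scores; labels: (n,) true class ids."""
    top5 = np.argsort(scores, axis=1)[:, -5:]        # five highest-scored classes
    hits = [labels[i] in top5[i] for i in range(len(labels))]
    return 1.0 - np.mean(hits)                       # fraction of examples missed
```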
Progress in Speech Recognition

[Chart omitted. Credit: Fernando Pereira and Matthew Firestone (Google)]
Deep Learning = Differentiable Programming

1. To develop a deep learning application, the programmer designs a task-specific deep network.
2. This network must be differentiable so that the network parameters can be adjusted via gradient search.
3. Modern deep-net tools compute the derivatives automatically (see the sketch after this list).
4. DL programmers are busy exploring the space of differentiable programs to understand which patterns and idioms work well:
   ▪ Convolutional blocks
   ▪ LSTM gates
   ▪ Various forms of associative memory
   ▪ Auto-encoders and generative adversarial networks
5. It is still very hard to get a DL application to work.
(3) Anomaly Detection

In many applications, we have a large amount of data describing "normal" behavior, and our goal is to detect "anomalous" behavior:
1. Fraud detection
   ▪ Normal behavior: good customers
   ▪ Abnormal behavior: fraudulent customers
2. Cyber attacks
   ▪ Normal behavior: normal network flows
   ▪ Abnormal behavior: network flows caused by cyber attacks
3. Machine diagnosis
   ▪ Normal behavior: normal sensor readings
   ▪ Abnormal behavior: unusual sensor readings

It is usually not safe to assume that abnormal behavior follows a fixed probability distribution.
Experimental Study of Anomaly Detection Methods

1. 25,685 benchmark datasets
2. Eight algorithms
3. Systematic control of relevant parameters (e.g., anomaly frequency, irrelevant features, problem difficulty)

We found that the Isolation Forest algorithm was the overall best; it is included in BigML's services. A usage sketch follows below.

[Chart: change in metric (logit(AUC), log(LIFT)) with respect to the control dataset, by algorithm: iforest, lof, abod, ocsvm]

[Emmott, Das, Dietterich, Fern, Wong, 2013; KDD ODD-2013]
[Emmott, Das, Dietterich, Fern, Wong, 2016; arXiv 1503.01158v2]
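A minimal usage sketch, with scikit-learn's implementation standing in for the Isolation Forest algorithm and synthetic data standing in for the benchmarks:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
normal = rng.normal(0, 1, size=(500, 2))    # stand-in for "normal" behavior
odd = rng.uniform(-6, 6, size=(10, 2))      # a few anomalous points

model = IsolationForest(random_state=0).fit(normal)
# score_samples: lower means easier to isolate, i.e., more anomalous
print(model.score_samples(odd).mean(), model.score_samples(normal).mean())
```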
Machine Learning: Future

1. Detecting and Correcting for Bias
2. Risk-Sensitive Optimization
3. Explanation of Black Box Systems
4. Verification and Validation
5. Integrating ML components into larger software systems
(1) Detecting and Correcting for Bias

Race, sex, and age are legally protected categories for certain decisions:
▪ Granting loans
▪ Renting or buying housing
▪ Employment

What constraints should this place on machine learning algorithms?

Constraint 1: The algorithms should not use these categories (or any features from which these categories can be predicted) to make these decisions.
Constraint 2: The learned predictive model should exhibit the same false positive/false negative tradeoffs (the same ROC curve) for each protected subgroup. A sketch of auditing this constraint follows below.
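A minimal sketch of auditing Constraint 2: compare false positive and false negative rates across protected subgroups. The arrays here are assumed placeholders, and predictions are taken to be binary 0/1.

```python
import numpy as np

def group_error_rates(y_true, y_pred, group):
    """Per-subgroup FPR/FNR for binary labels and binary 0/1 predictions."""
    rates = {}
    for g in np.unique(group):
        m = group == g
        fpr = np.mean(y_pred[m][y_true[m] == 0])       # positives among true negatives
        fnr = np.mean(1 - y_pred[m][y_true[m] == 1])   # negatives among true positives
        rates[g] = (fpr, fnr)
    return rates   # large gaps between subgroups signal a Constraint 2 violation
```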
Sampling Bias

[Figure omitted]
22
1.
(2) Risk-Sensitive Optimization
0.0
0.1
0.2
0.3
0 2 4 6 8
V
P(V)
0.0
0.1
0.2
0.3
0 2 4 6 8
V
P(V)
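The slide contrasts two outcome distributions. One common risk-sensitive criterion (not named on the slide, so this choice is an assumption) is conditional value-at-risk, the mean of the worst outcomes. A minimal sketch with illustrative numbers:

```python
import numpy as np

def cvar(samples, alpha=0.1):
    """Mean of the worst alpha-fraction of outcomes (the lower tail)."""
    cutoff = np.quantile(samples, alpha)
    return samples[samples <= cutoff].mean()

a = np.random.default_rng(0).normal(4.0, 0.5, 10_000)  # narrow outcome distribution
b = np.random.default_rng(1).normal(4.5, 2.0, 10_000)  # higher mean, heavier tails
print(a.mean(), b.mean())    # risk-neutral view: b looks better
print(cvar(a), cvar(b))      # risk-sensitive view: a's worst cases are far milder
```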
(3) Explanations for Black Box ML Systems

Some machine learning methods, particularly ensembles and deep neural networks, do not provide simple visualizations or explanations of their predictions. We need explanations:
1. To debug datasets and ML systems
2. To satisfy regulatory requirements (e.g., the EU "right to explanation")
3. To understand what knowledge the system has discovered
4. To decide whether to trust the system

The general strategy is to compute interpretable approximations of complex ML systems; a small sketch of this strategy follows below.
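One way to realize this strategy: train a shallow, readable tree to mimic a black-box model's predictions. The random forest here is an assumed stand-in for the black box, not a method named in the talk.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True)
black_box = RandomForestClassifier(random_state=0).fit(X, y)   # opaque model

# Fit a small, readable surrogate to the black box's own predictions
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))
print(export_text(surrogate))   # human-readable approximation of the black box
```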
(4) Verification and Validation

[Figure omitted]
(5) Integrating ML Components

ML provides new ways of constructing software. We need software engineering processes that ensure successful end-to-end system performance.

Recommended reading: Martin Zinkevich, "Rules of Machine Learning: Best Practices for ML Engineering," based on his experience at Google: http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf
Summary

Machine Learning Past:
1. Machine learning started as a way to construct software from training examples.
   ▪ This is still a major goal.
2. ML methods were extended to support data mining and knowledge discovery.

Machine Learning Present:
1. Automated Decision Making: contextual bandit and reinforcement learning methods
   ▪ Requires the capability to perform experiments
2. Perceptual Tasks: deep learning for computer vision, speech recognition, etc.
   ▪ Can learn its own features from raw signals
3. Anomaly Detection: fraud detection, cyber security, machine diagnosis

Machine Learning Future:
1. Detecting and Correcting for Bias
2. Risk-Sensitive Optimization
3. Explanations of Black Box Systems
4. Verification and Validation
5. Integrating ML Components into Larger Software Systems
27
Jeff Bezos, CEO, Amazon
"Machine learning and A.I. is a horizontal enabling layer. It will empower
and improve every business, every government organization, every
philanthropy. Basically, there's no institution in the world that cannot be
improved with machine learning”
--Inc. Magazine May 9, 2017
A Final Quote