Ralf Herbrich gave a presentation on machine learning at Amazon. Some key points included:
- He discussed his background and experience in machine learning from 1992 to the present.
- The presentation provided an overview of machine learning including definitions from computer science and statistics perspectives, the history of machine learning, and examples of machine learning applications at Amazon like forecasting, machine translation, and visual systems.
6. Machine Learning: The Science
Science
• Computer Science
• Statistics
• Neuroscience
• Operations Research
Artificial Intelligence
• Rule extraction from data
• Inspired by human learning
• Adaptive algorithms
Engineering
• Training: Data → Models
• Prediction: Models → Forecast
• Decision: Forecast → Actions
11/8/16 6
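The three engineering stages on the slide (training, prediction, decision) can be sketched as three small functions. This is only an illustrative toy, not the pipeline used at Amazon: the function names, the mean-demand "model", the order rule, and the demand numbers are all assumptions made up for the example.

```python
from statistics import mean

def train(history):
    """Training: data -> model. The toy 'model' is just the mean past demand."""
    return {"mean_demand": mean(history)}

def predict(model):
    """Prediction: model -> forecast of next-period demand."""
    return model["mean_demand"]

def decide(forecast, on_hand):
    """Decision: forecast -> action, here the number of units to order."""
    return max(0, round(forecast) - on_hand)

history = [12, 15, 9, 14, 10]          # made-up weekly demand
model = train(history)                  # Data -> Models
forecast = predict(model)               # Models -> Forecast
order_quantity = decide(forecast, on_hand=7)  # Forecast -> Actions
print(forecast, order_quantity)
```

Keeping the stages separate means each one can be replaced independently, for example a richer model in `train` without touching the ordering rule in `decide`.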
12. History of Machine Learning
1980 (“Neuro”)
• Neural Networks
• Artificial Intelligence
• Learning = Adaptation of Neurons based on External Stimuli
1990 (“Symbolic”)
• Expert Systems
• Decision-Tree Learning (C4.5)
• Learning = Methods to automatically build Expert Systems
2000 (“Kernel Machines”)
• Statistical Learning Theory
• Scoring Systems
• Learning = Optimization of Convex Functions
2005 (“Graphical Models”)
• Wide application in products
• Statistical Modeling of Data
• Learning = Parameter Estimation or Inference
2010 (“Service”)
• Distributed computing and storage
• Adaptive systems
• Learning = Scalable, Adaptive Computation for Various Big Data
2015 (“AI”)
• Deep Neural Networks
• Fast hardware (GPUs)
• Distributed computing and storage
• Learning = Adaptation of Weights in a brain-like layered architecture
23. Demand Forecasting
Training Range: Non-fashion items have longer training ranges that we can leverage. We need to share information across new and old products.
Seasonality: This item has Christmas seasonality with higher growth over time. This is where we need growth features in addition to date features.
Missing Features or Input: Unexplained spikes in demand are likely caused by missing features or incomplete input data.
Example Softlines product to illustrate the challenges of forecasting.
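To make the distinction between date features and growth features concrete, here is a minimal sketch. The talk does not specify the model, so the log-linear form, the feature names (`holiday`, `age`, `holiday_x_age`), and the weights are all assumptions chosen for illustration. The interaction term is one way to let the Christmas peak grow as the product ages.

```python
import math
from datetime import date

def features(day, launch):
    """Build hypothetical features for one day of one product."""
    holiday = 1.0 if day.month == 12 else 0.0   # date feature
    age = (day - launch).days / 365.25           # growth feature, in years
    return {"holiday": holiday, "age": age, "holiday_x_age": holiday * age}

# Made-up weights; in practice these would be learned from historical demand.
WEIGHTS = {"holiday": 1.0, "age": 0.1, "holiday_x_age": 0.3}
BIAS = 2.0

def forecast(day, launch):
    """Log-linear demand forecast: exp(bias + sum of weight * feature)."""
    x = features(day, launch)
    return math.exp(BIAS + sum(WEIGHTS[name] * value for name, value in x.items()))

launch = date(2013, 1, 1)
lift_2014 = forecast(date(2014, 12, 15), launch) / forecast(date(2014, 6, 15), launch)
lift_2015 = forecast(date(2015, 12, 15), launch) / forecast(date(2015, 6, 15), launch)
print(lift_2014, lift_2015)  # the December lift is larger in the later year
```

With date features alone, the December lift would be the same every year; the growth-related terms are what allow the seasonal peak to rise over time.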
24. New Products
Learning across groups of products with varying ages to improve accuracy for new products.
New Product Without Sharing: The product is less than 1 year old and hasn’t seen all dates before. Features learned per product are not very strong.
New Product With Sharing: Once we share data across groups of products, we start to see the appropriate lift for new holidays.
Legend: Red = Actual Demand, Black = Forecast.
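One common way to share information across a group of products is to shrink a product's own estimate toward the group average, trusting the product's data more as it accumulates. The talk does not say which method is used, so the sketch below is an assumption: the function name `shared_lift`, the `prior_strength` parameter, and the numbers are hypothetical.

```python
def shared_lift(product_lifts, group_lift, prior_strength=2.0):
    """Blend a product's observed holiday lifts with its group's average lift.

    product_lifts: holiday lifts observed for this product (may be empty).
    group_lift: average lift across older products in the same group.
    prior_strength: how many observations the group average is 'worth'.
    """
    n = len(product_lifts)
    if n == 0:
        # A brand-new product has never seen this holiday: use the group.
        return group_lift
    product_mean = sum(product_lifts) / n
    weight = n / (n + prior_strength)   # grows toward 1 as data accumulates
    return weight * product_mean + (1 - weight) * group_lift

print(shared_lift([], group_lift=3.0))          # new product, no holiday seen yet
print(shared_lift([2.0, 2.0], group_lift=3.0))  # two holidays observed
```

A product with no history inherits the group's lift outright, which corresponds to the "appropriate lift for new holidays" that per-product features alone cannot provide.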
28. Machine Translation: Deep Dive
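\[
p(\text{English} \mid \text{Chinese})
= \frac{p(\text{English}) \times p(\text{Chinese} \mid \text{English})}{p(\text{Chinese})}
\propto \underbrace{p(\text{English})}_{\text{Language Model}} \times \underbrace{p(\text{Chinese} \mid \text{English})}_{\text{Translation Model}}
\]
• Language Model: What are fluent English sentences?
• Translation Model: What English sentences account well for a given Chinese sentence?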
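The factorization above suggests a simple decoding rule: among candidate English sentences, pick the one that maximizes the product of the language-model and translation-model probabilities. The sketch below illustrates only that rule for one fixed Chinese input sentence. The candidate sentences and all probabilities are made up, and real systems search a far larger candidate space.

```python
import math

# Made-up log-probabilities, for illustration only.
# Language model: log p(English), i.e. how fluent the sentence is.
LANGUAGE_MODEL = {
    "the cat sleeps": math.log(0.02),
    "cat the sleeps": math.log(0.0001),
    "the dog sleeps": math.log(0.02),
}
# Translation model: log p(Chinese | English) for one fixed Chinese sentence.
TRANSLATION_MODEL = {
    "the cat sleeps": math.log(0.3),
    "cat the sleeps": math.log(0.3),
    "the dog sleeps": math.log(0.001),
}

def score(english):
    """Log of p(English) * p(Chinese | English); p(Chinese) is a constant."""
    return LANGUAGE_MODEL[english] + TRANSLATION_MODEL[english]

def best_translation(candidates):
    """Return the candidate with the highest combined score."""
    return max(candidates, key=score)

print(best_translation(list(LANGUAGE_MODEL)))
```

Here the disfluent candidate loses on the language model and the wrong-meaning candidate loses on the translation model, so only a sentence that is both fluent and faithful scores well. Working in log space avoids underflow when many small probabilities are multiplied.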