The Importance of Machine Learning for e-Commerce: The Example of Amazon

Machine Learning @ Amazon
Ralf Herbrich
11/8/16 1

Background
1992 – 1997 (Berlin, Diploma)
1997 – 2000 (Berlin, PhD)
2000 – 2009 (Microsoft Research)
2009 – 2011 (Microsoft)
2011 – 2012 (Facebook)
2012 – Present (Amazon)

Overview
• What is Machine Learning?
• A Computer Science and Statistics Perspective
• History of Machine Learning
• Machine Learning @ Amazon
• Forecasting
• Machine Translation
• Amazon Machine Learning
• Visual Systems

Machine Learning: The Science
Science
• Computer Science
• Statistics
• Neuroscience
• Operations Research
Artificial Intelligence
• Rule extraction from data
• Inspired by human learning
• Adaptive algorithms
Engineering
• Training: Data → Models
• Prediction: Models → Forecast
• Decision: Forecast → Actions
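The three engineering steps can be sketched end to end in a few lines. This is a minimal illustration under stated assumptions, not Amazon's method: the "model" is a least-squares trend line, the weekly sales numbers are made up, and the ordering rule in `decide` (forecast plus a fixed, hypothetical safety stock minus units on hand) stands in for a real decision policy.

```python
import math


def train(xs, ys):
    """Training: data -> model. Fits a least-squares line y = a + b * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return mean_y - slope * mean_x, slope


def predict(model, x):
    """Prediction: model -> forecast for input x."""
    intercept, slope = model
    return intercept + slope * x


def decide(forecast, on_hand, safety_stock=5):
    """Decision: forecast -> action. Hypothetical rule: order enough units
    to cover the forecast plus a fixed safety stock."""
    return max(0, math.ceil(forecast + safety_stock - on_hand))


# Made-up weekly sales for weeks 1-4.
model = train([1, 2, 3, 4], [10, 12, 14, 16])
forecast = predict(model, 5)          # 18.0 units expected in week 5
order = decide(forecast, on_hand=10)  # 13 units to order
```

Keeping the three functions separate mirrors the slide: the same forecast can feed different decision rules without retraining the model.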

Machine Learning: A Programmer Perspective
Traditional Programming: Data + Program → Computer → Output
Machine Learning: Data + Output → Computer → Program
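A toy contrast between the two diagrams, assuming a hypothetical task of labeling package weights as "light" or "heavy". In the first function a person writes the rule; in the second, the rule (a single threshold, i.e. a decision stump) is derived from example data and the desired outputs. The weights, labels and the 30 kg threshold are invented for illustration.

```python
def hand_written_program(weight_kg):
    """Traditional programming: a person writes the rule (threshold 30 kg)."""
    return "heavy" if weight_kg > 30 else "light"


def learn_program(data, outputs):
    """Machine learning: data + outputs -> program.
    Tries every observed value as a threshold and keeps the one that
    reproduces the most of the given outputs."""
    best_threshold, best_correct = None, -1
    for t in sorted(set(data)):
        correct = sum((x > t) == (y == "heavy") for x, y in zip(data, outputs))
        if correct > best_correct:
            best_threshold, best_correct = t, correct

    def learned_program(weight_kg):
        return "heavy" if weight_kg > best_threshold else "light"

    return learned_program


program = learn_program([10, 20, 25, 40, 50],
                        ["light", "light", "light", "heavy", "heavy"])
```

Here the learned threshold ends up at 25 kg, so `program(30)` returns "heavy" while the hand-written rule would say "light": the program now follows the examples rather than the author's guess.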

ML Examples: Named Entity Extraction
Data: High atop the steps of the Pyramid of Giza a young woman laughed and called down to him. "Robert, hurry up! I knew I should have married a younger man!" Her smile was magic. ….
Author Annotator
Program:
if (word is capitalized) and
(word before is ‘in’) then
PLACE
else if (word = ‘her’) or (word = ‘his’)
or (word = ‘he’) or (word = ‘she’) then
PERSON
...
Data + Program → Output (Annotation)
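The hand-written rules above can be made runnable as follows. This is a sketch, not the talk's code: it assumes words are runs of ASCII letters, that the pronoun test is case-insensitive, and it drops the elided "..." rules rather than guessing them.

```python
import re

PRONOUNS = {"her", "his", "he", "she"}


def annotate(text):
    """Apply the hand-written rules word by word; return (word, label) pairs."""
    words = re.findall(r"[A-Za-z]+", text)
    labels = []
    for i, word in enumerate(words):
        previous = words[i - 1].lower() if i > 0 else ""
        if word[0].isupper() and previous == "in":
            labels.append((word, "PLACE"))
        elif word.lower() in PRONOUNS:
            labels.append((word, "PERSON"))
    return labels
```

Run on the example passage, these rules tag only "Her": "Giza" follows "of" rather than "in", and "Robert" matches no rule. That brittleness is what motivates learning the rules from annotated data instead.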

ML Examples: Named Entity Extraction
Author Annotator
Annotated examples: … young woman laughed … "Robert, … man!" Her smile was magic. ….
Data + Output (Annotation) → Machine Learning Service → Program
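To show what "learning the program from annotated examples" can mean in the simplest case, here is a most-frequent-label tagger: each word receives the label it carried most often in the annotations. This baseline is a swapped-in illustration, not the Machine Learning Service from the slide; the example annotations and the "O" (outside) label are assumptions.

```python
from collections import Counter, defaultdict


def train(annotated):
    """Learn a word -> label table from (words, labels) examples:
    each word gets the label it carried most often."""
    counts = defaultdict(Counter)
    for words, labels in annotated:
        for word, label in zip(words, labels):
            counts[word.lower()][label] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag(model, words, default="O"):
    """Label new words; unseen words get the 'outside' label O."""
    return [model.get(word.lower(), default) for word in words]


# Hypothetical annotations, as an annotator might produce them.
examples = [
    (["Robert", "hurry", "up"], ["PERSON", "O", "O"]),
    (["Her", "smile", "was", "magic"], ["PERSON", "O", "O", "O"]),
    (["of", "Giza"], ["O", "PLACE"]),
]
model = train(examples)
```

Unlike the hand-written rules, this tagger picks up "Robert" and "Giza" purely from the annotations; real systems add context features so that they generalize to unseen words.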

Amazon’s Virtuous Cycles
• Customer Experience → Traffic → Sellers → Selection & Convenience → Customer Experience
• Growth → Lower Cost Structure → Lower Prices → Customer Experience
1. Saving costs through better planning (e.g., forecasting)
2. Saving costs by automating human decision making (e.g., pricing)
3. Increasing revenue through a low-friction customer experience (e.g., recommendation)

History of Machine Learning
1980 ("Neuro")
• Neural Networks
• Artificial Intelligence
• Learning = Adaptation of Neurons based on External Stimuli
1990 ("Symbolic")
• Expert Systems
• Decision-Tree Learning (C4.5)
• Learning = Methods to automatically build Expert Systems
2000 ("Kernel Machines")
• Statistical Learning Theory
• Scoring Systems
• Learning = Optimization of Convex Functions
2005 ("Graphical Models")
• Wide application in products
• Statistical Modeling of Data
• Learning = Parameter Estimation or Inference
2010 ("Service")
• Distributed computing and storage
• Adaptive systems
• Learning = Scalable, Adaptive Computation for Various Big Data
2015 ("AI")
• Deep Neural Networks
• Fast hardware (GPUs)
• Distributed computing and storage
• Learning = Adaptation of Weights in a brain-like layered architecture

Machine Learning Opportunities @ Amazon
Retail
• Demand Forecasting
• Vendor Lead Time Prediction
• Pricing
• Packaging
• Substitute Prediction
Customers
• Product Recommendation
• Product Search
• Visual Search
• Product Ads
• Shopping Advice
• Customer Problem Detection
Seller
• Fraud Detection
• Predictive Help
• Seller Search & Crawling
Catalog
• Browse-Node Classification
• Meta-data Validation
• Review Analysis
• Hazmat Prediction
Digital
• Named-Entity Extraction
• X-Ray
• Plagiarism Detection
• Echo Speech Recognition
• Knowledge Acquisition

Locations
• ML Seattle
• ML Bangalore
• S9
• A9
• A2Z
• Ivona
• ML Berlin
• Evi

Forecasting
Setting
• Given past sales of a product in every region, predict regional demand up to one year into the future
Challenges
• New Products: No past demand!
• Regionalized: 100+ fulfillment centers worldwide
• Sparsity: Huge skew – many products sell very few items
• Seasonal: Huge variation due to external, seasonal events
• Distributions: Future is uncertain → predictions must be distributions
• Scale: 20M+ products fulfilled by Amazon alone!
• Orders: Customers demand bundles of products
• Censored: Past sales ≠ past demand (sales are capped by available inventory)
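One common way to make "predictions must be distributions" concrete is to report quantiles (e.g. P50, P90) of the forecast distribution and score each one with the quantile (pinball) loss. The sketch below assumes the forecast is available as a list of demand samples; the sample values are invented and the nearest-rank quantile is one of several conventions.

```python
import math


def quantile(samples, q):
    """Empirical q-quantile of demand samples (nearest-rank method)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def pinball_loss(actual, forecast, q):
    """Quantile (pinball) loss: under-forecasting costs q per unit,
    over-forecasting costs (1 - q) per unit."""
    diff = actual - forecast
    return max(q * diff, (q - 1) * diff)


# Hypothetical sampled weekly demand for one product in one region
# (sparse and skewed: mostly small values, a long right tail).
samples = [0, 0, 1, 1, 2, 3, 5, 8, 13, 21]
p50 = quantile(samples, 0.5)  # 2
p90 = quantile(samples, 0.9)  # 13
```

With an actual demand of 10, the P50 forecast is penalized 4.0 and the P90 forecast only about 0.3: a high quantile is meant to over-forecast slightly, which is what a stocking decision under uncertainty usually wants.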

Demand Forecasting
Example Softlines product to illustrate the challenges of forecasting:
• Training Range: Non-fashion items have longer training ranges that we can leverage. We need to share information across new and old products.
• Seasonality: This item has Christmas seasonality with higher growth over time. This is where we need growth features in addition to date features.
• Missing Features or Input: Unexplained spikes in demand are likely caused by missing features or incomplete input data.
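The difference between date features and growth features can be sketched as a feature function. The specific features here are assumptions for illustration: calendar features capture *when* demand peaks, the product's age captures the trend, and the hypothetical `christmas_x_age` interaction lets a model express a Christmas peak that grows over time.

```python
from datetime import date


def features(day, launch):
    """Hypothetical features for one product on one day."""
    age_years = (day - launch).days / 365.0
    christmas = 1.0 if (day.month == 12 and day.day <= 24) else 0.0
    return {
        # Date features: position in the calendar.
        "week_of_year": day.isocalendar()[1],
        "month": day.month,
        "christmas_season": christmas,
        # Growth feature: how long the product has been selling.
        "age_years": age_years,
        # Interaction: a seasonal peak whose size grows with age.
        "christmas_x_age": christmas * age_years,
    }


f = features(date(2015, 12, 10), launch=date(2013, 12, 10))
```

A linear model with only `christmas_season` would predict the same holiday lift every year; adding the interaction term gives it a way to fit the rising peaks described above.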

New Products
Learning across groups of products with varying ages to improve accuracy for new products (Red = Actual Demand, Black = Forecast)
• New Product Without Sharing: The product is less than 1 year old and has not yet seen every date of the year. Features learned per product are not very strong.
• New Product With Sharing: Once we share data across groups of products, we start to see the appropriate lift for new holidays.
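One simple way to share data across a group of products is partial pooling: a product's own estimate is shrunk toward its group's estimate, with less shrinkage as the product accumulates history. The sketch below uses this empirical-Bayes-style blend as a stand-in technique, not necessarily the model behind the slide; "lift" (holiday demand divided by baseline demand) and `prior_strength` are illustrative assumptions.

```python
def pooled_lift(product_lifts, group_lifts, prior_strength=3.0):
    """Blend a product's observed holiday lifts with its group's average.
    With no observations the group average is used unchanged; each
    observation moves the estimate toward the product's own data."""
    group_mean = sum(group_lifts) / len(group_lifts)
    n = len(product_lifts)
    product_mean = sum(product_lifts) / n if n else group_mean
    weight = n / (n + prior_strength)
    return weight * product_mean + (1 - weight) * group_mean
```

A brand-new product with no holiday history inherits the group's lift (3.0 for group lifts of 2.0, 3.0 and 4.0); after one observed lift of 1.0 the estimate moves only a quarter of the way, to 2.5.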

ASIN Machine Translation
[Figure: contribution profit across ASINs, covered by Human Translation and Machine Translation, with the remaining Selection Gap]

Machine Translation Pipeline
Input Request → Detection & Escaping of Non-translatables
• Input Normalization, Tokenization
• Sentence Segmentation, Lowercasing
• Translation/Decoding, Recasing
• Post-processing, De-Tokenization
Re-insertion of (converted) Non-translatables → Translated Request
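The pipeline stages can be sketched as one function. The stage order below is one plausible reading of the diagram, and every stage is deliberately simplistic: "non-translatables" are assumed to be any token containing a digit, recasing only capitalizes the first word, and a hypothetical word table (`TOY_TABLE`) replaces a real translation model, so there is no reordering or grammar.

```python
import re
import unicodedata

# Hypothetical English -> French word table standing in for a real decoder.
TOY_TABLE = {"shirt": "chemise", "in": "en", "red": "rouge", "size": "taille"}


def translate(request):
    # Input normalization: Unicode NFKC and collapsed whitespace.
    text = " ".join(unicodedata.normalize("NFKC", request).split())

    # Detection & escaping of non-translatables (assumed: tokens with digits).
    escaped = []

    def escape(match):
        escaped.append(match.group(0))
        return f"__nt{len(escaped) - 1}__"

    text = re.sub(r"\b\w*\d\w*\b", escape, text)

    sentences = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):           # segmentation
        tokens = re.findall(r"__nt\d+__|\w+|[^\w\s]", sentence)  # tokenization
        tokens = [t.lower() for t in tokens]                     # lowercasing
        tokens = [TOY_TABLE.get(t, t) for t in tokens]            # decoding
        if tokens:                                                # recasing
            tokens[0] = tokens[0][:1].upper() + tokens[0][1:]
        joined = " ".join(tokens)                                 # de-tokenization
        sentences.append(re.sub(r"\s+([.,!?])", r"\1", joined))

    # Post-processing: join sentences, then re-insert the non-translatables.
    result = " ".join(sentences)
    return re.sub(r"__nt(\d+)__", lambda m: escaped[int(m.group(1))], result)
```

For example, `translate("Shirt in red, size 42.")` yields `"Chemise en rouge, taille 42."`: the size "42" bypasses the decoder entirely and is restored at the end, which is the point of escaping non-translatables.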

Machine Translation: Deep Dive
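$$
p(\text{English} \mid \text{Chinese}) = \frac{p(\text{English}) \times p(\text{Chinese} \mid \text{English})}{p(\text{Chinese})} \propto \underbrace{p(\text{English})}_{\text{Language Model}} \times \underbrace{p(\text{Chinese} \mid \text{English})}_{\text{Translation Model}}
$$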
• Language Model: What are fluent English sentences?
• Translation Model: What English sentences account
well for a given Chinese sentence?
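Because p(Chinese) is the same for every English candidate, decoding only needs to maximize the product of the two models, which is usually done in log space. The sketch below scores a handful of candidates with invented probability tables (`LANGUAGE_MODEL` and `TRANSLATION_MODEL` are hypothetical); a real decoder searches a vast candidate space instead of enumerating it.

```python
import math

# Hypothetical toy models.
LANGUAGE_MODEL = {"the house": 0.02, "house the": 0.0001, "the home": 0.01}
TRANSLATION_MODEL = {
    ("房子", "the house"): 0.3,
    ("房子", "house the"): 0.3,
    ("房子", "the home"): 0.2,
}


def decode(chinese, candidates):
    """Pick argmax_e  log p(e) + log p(chinese | e)."""
    def score(english):
        return (math.log(LANGUAGE_MODEL[english])
                + math.log(TRANSLATION_MODEL[(chinese, english)]))
    return max(candidates, key=score)


best = decode("房子", list(LANGUAGE_MODEL))
```

The translation model rates "the house" and "house the" equally; the language model breaks the tie in favor of the fluent sentence, so `best` is "the house".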
Scalable Algorithms & Services
Setting
• No limitations on model size and data size!
Challenges
• Distributed: Parameters need to be distributed
• Fault Tolerance: Data and model chunks might fail
• Simplicity: Zero-parameter algorithms for engineers
• Any-Time: Any-time convergence of algorithms
• Resource Constraints: Learning algorithms that optimize under resource and budget constraints
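The "Distributed" and "Any-Time" challenges can be illustrated with a toy parameter store: each parameter lives on exactly one shard chosen by a stable hash of its key, and stochastic gradient descent updates only the parameters an example touches. Everything here is an in-process stand-in (no networking or fault tolerance), and the class and function names are hypothetical.

```python
import zlib


class ShardedParameters:
    """Toy parameter server: each key lives on exactly one shard."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def shard_of(self, key):
        # crc32 is stable across processes, unlike Python's hash() for str.
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def get(self, key, default=0.0):
        return self.shards[self.shard_of(key)].get(key, default)

    def add(self, key, delta):
        shard = self.shards[self.shard_of(key)]
        shard[key] = shard.get(key, 0.0) + delta


def sgd_step(params, features, target, lr=0.1):
    """One squared-error SGD step for a sparse linear model."""
    prediction = sum(params.get(f) * v for f, v in features.items())
    error = prediction - target
    for f, v in features.items():
        params.add(f, -lr * error * v)
    return error


params = ShardedParameters(num_shards=4)
for _ in range(200):
    sgd_step(params, {"bias": 1.0, "x": 2.0}, target=5.0)
```

The training loop can be stopped after any step and the current weights are still a usable model, which is the sense of "any-time" here; more steps only reduce the error further.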

Three types of data-driven development
• Retrospective analysis and reporting
• Here-and-now real-time processing and dashboards
• Predictions to enable smart applications
AWS services shown: Amazon Kinesis, Amazon EC2, AWS Lambda, Amazon Redshift, Amazon RDS, Amazon S3, Amazon EMR, Amazon Machine Learning

Introducing Amazon ML
• Easy to use, managed machine learning service
built for developers
• Robust, powerful machine learning technology
based on Amazon’s internal systems
• Create models using your data already stored in
the AWS cloud
• Deploy models to production in seconds

Automated Produce Inspection
[Figure: Current Inspection vs. New Automated Inspection using Computer Vision]

Conclusions
• Machine Learning is an emerging and scientifically young discipline!
• Machine Learning “translates” data from the past into accurate
predictions about the future!
• Amazon has a broad range of applications for Machine Learning – it’s
central to Amazon’s business!