How Lazada ranks products to improve customer experience and conversion
Strata Hadoop Singapore 2016
Leading e-commerce platform in South-East Asia
Lazada Data Science (built from the bottom up)
Data Engineers collect, store, maintain
Data Scientists explore, prepare, model
Data App Devs expose, integrate, platform-ize
Ranking affects what appears on top
Ranking is different from recommendation
“How can I rank well on an e-commerce platform?”
Ranking products for catalog and search
Introducing new products
Emphasizing product quality
Data pipeline overview
Sources: Web Tracker (JavaScript), Mobile Tracker (Adjust), 3rd-party tools (e.g., ZenDesk, SurveyGizmo), plus Product, Seller, and Transaction data
Ingestion: Kafka queues for event streams; bulk loaders (Spark) for batch data
Storage: Hadoop
Data exploration, data preparation, feature engineering, and modelling (Spark)
Manual boosting (Django) by Category Managers
Local validation, then A/B testing on user devices: split traffic and measure outcomes
Output: product rankings
Overall results
Better ranking improved conversion and revenue per session
Introducing new products improved new product engagement
Emphasizing product quality had neutral to positive outcomes
Ranking products for
catalog and search
Intent
Provide shoppers quick access to best products
in catalog/search results, making shopping easy
Problem
Lazada has millions of products—not easy to navigate
How to identify products that interest users in the future?
How do we measure interest?
Methodology
Measure shoppers’ interest through product
engagement as a proxy
Clicks, add-to-cart, checkouts, etc.
Predict future interest
Collecting
behavioral data
Track and collect events on web (JavaScript)
and app (Adjust)
Stream and process via Kafka
Store in Hive tables
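The deck only names the components (JavaScript/Adjust trackers, Kafka, Hive). Purely as an illustration of that flow, here is a minimal PySpark Structured Streaming sketch that reads tracker events from a Kafka topic and lands them as files backing a Hive table; the topic name, brokers, paths, and table layout are assumptions, not Lazada's actual setup, and the original 2016 pipeline may well have used the older Spark Streaming API instead.

from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("tracker_events_to_hive")          # hypothetical job name
         .enableHiveSupport()
         .getOrCreate())

# Subscribe to the tracker event topic (topic name and brokers are assumptions)
raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "kafka:9092")
       .option("subscribe", "tracker-events")
       .load())

# Kafka delivers bytes; keep the JSON payload and the event timestamp
events = raw.select(F.col("value").cast("string").alias("payload"),
                    F.col("timestamp").alias("event_time"))

# Land the stream as parquet files backing an external Hive table
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///warehouse/behavior/events")
         .option("checkpointLocation", "hdfs:///checkpoints/behavior_events")
         .start())
query.awaitTermination()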
Data preparation
Filter and categorize online behavioral events
(e.g., impressions, clicks, etc.)
Merge various views of product data (e.g. price, stock, etc.)
Exclude outliers and potentially fraudulent events
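A hedged PySpark sketch of this preparation step, assuming hypothetical Hive tables behavior.events and catalog.products and an is_bot flag for fraud filtering; the event types mirror the ones listed above.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Hypothetical Hive tables for landed events and product catalog views
events = spark.table("behavior.events")
products = spark.table("catalog.products")          # price, stock, category, etc.

# Keep only the behavioral event types of interest and drop suspected bot/fraud traffic
clean = (events
         .filter(F.col("event_type").isin("impression", "click", "add_to_cart", "checkout"))
         .filter(~F.col("is_bot")))

# Merge behavioral events with product attributes such as price and stock
prepared = clean.join(products, on="product_id", how="left")
prepared.write.mode("overwrite").saveAsTable("behavior.events_prepared")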
Feature engineering
Calculate product engagement metrics
(e.g., average clicks, conversion rate, etc.)
Derive product attributes (e.g., age, discount, etc.)
Exclude outliers (e.g., conversion rate > 1.00)
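To make the feature-engineering step concrete, a minimal PySpark sketch that aggregates per-product engagement, derives age and discount attributes, and drops impossible conversion rates; all table and column names (live_date, list_price, sale_price, etc.) are assumptions.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
prepared = spark.table("behavior.events_prepared")   # hypothetical output of the preparation step

# Per-product engagement metrics
engagement = (prepared
    .groupBy("product_id")
    .agg(F.sum(F.when(F.col("event_type") == "click", 1).otherwise(0)).alias("clicks"),
         F.sum(F.when(F.col("event_type") == "checkout", 1).otherwise(0)).alias("checkouts")))

# Hypothetical catalog columns: live_date, list_price, sale_price
attributes = (spark.table("catalog.products")
    .select("product_id",
            F.datediff(F.current_date(), F.col("live_date")).alias("product_age_days"),
            ((F.col("list_price") - F.col("sale_price")) / F.col("list_price")).alias("discount")))

features = (engagement
    .withColumn("conversion_rate", F.col("checkouts") / F.col("clicks"))  # checkouts per click
    .join(attributes, on="product_id")
    .filter(F.col("conversion_rate") <= 1.0))         # exclude impossible values

features.write.mode("overwrite").saveAsTable("features.product_daily")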
Modelling
(i.e., machine learning)
Predict future (tomorrow’s) product clicks/checkouts
Examine results against a benchmark model
Pandas + XGBoost is faster and more effective than
Spark + MLlib; assessing XGBoost4J-Spark
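The talk reports that Pandas + XGBoost worked better for this task than Spark + MLlib. As an illustration of that route only, a minimal sketch that trains an XGBoost regressor to predict tomorrow's clicks and compares it against a naive "tomorrow equals today" benchmark; the file name, column names, split date, and hyper-parameters are assumptions.

import pandas as pd
import xgboost as xgb
from sklearn.metrics import mean_absolute_error

# Hypothetical export of the daily product feature table
df = pd.read_parquet("product_features.parquet")
feature_cols = ["clicks", "checkouts", "conversion_rate", "product_age_days", "discount"]

# Train on older days, validate on the most recent ones
train, valid = df[df["date"] < "2016-11-01"], df[df["date"] >= "2016-11-01"]
dtrain = xgb.DMatrix(train[feature_cols], label=train["clicks_tomorrow"])
dvalid = xgb.DMatrix(valid[feature_cols], label=valid["clicks_tomorrow"])

model = xgb.train({"objective": "reg:linear", "max_depth": 6, "eta": 0.1},
                  dtrain, num_boost_round=200,
                  evals=[(dvalid, "valid")], early_stopping_rounds=20)

# Benchmark: naively assume tomorrow's clicks equal today's clicks
pred = model.predict(dvalid)
print("model MAE    :", mean_absolute_error(valid["clicks_tomorrow"], pred))
print("baseline MAE :", mean_absolute_error(valid["clicks_tomorrow"], valid["clicks"]))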
Boosting products
(manually)
Manually increase rank of certain products
(e.g., highly anticipated products, campaign tie-ups)
User-friendly interface to drag-and-drop products
Limits on how many products can be boosted
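The actual boosting tool is a Django app with a drag-and-drop interface. Purely to illustrate the capped-boost idea, a toy function that moves a limited number of manually boosted products to the front of a ranked list; the cap value and function name are hypothetical.

MAX_BOOSTED = 20   # illustrative cap; the real limit is a business rule

def apply_manual_boosts(ranked_product_ids, boosted_product_ids):
    """Move manually boosted products to the front of a ranked list, capped at MAX_BOOSTED."""
    in_catalog = set(ranked_product_ids)
    boosts = [p for p in boosted_product_ids if p in in_catalog][:MAX_BOOSTED]
    boost_set = set(boosts)
    rest = [p for p in ranked_product_ids if p not in boost_set]
    return boosts + rest

# Example: two campaign products are pinned ahead of the model's ranking
print(apply_manual_boosts([101, 102, 103, 104], [103, 999, 101]))   # [103, 101, 102, 104]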
Validation and
A/B testing
Local validation is easy, but it is difficult to ensure
similar results hold up in A/B testing
A/B test all updates before production
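The deck does not describe the A/B platform internals. A common pattern, shown here only as an assumption-laden sketch, is to split traffic deterministically by hashing the user id and to compare conversion rates between groups with a two-proportion z-test; the counts below are placeholders, not Lazada numbers.

import hashlib
from statsmodels.stats.proportion import proportions_ztest

def assign_bucket(user_id, n_buckets=2):
    """Deterministically assign a user to control/treatment by hashing their id."""
    h = int(hashlib.md5(str(user_id).encode()).hexdigest(), 16)
    return "treatment" if h % n_buckets == 0 else "control"

# Compare conversion rates between groups (placeholder counts)
conversions = [1850, 1720]      # converted sessions: treatment, control
sessions = [52000, 51800]       # total sessions:     treatment, control
z_stat, p_value = proportions_ztest(conversions, sessions)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")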
Results
Increased conversion rate by 3 – 8%
Increased revenue per session by 5 – 20%
Introducing
new products
Intent
Provide potentially good new products with exposure
Provide shoppers with new products they like
Keep catalog fresh
Problem
Products with strong engagement stay on top
Products without engagement don’t get traffic
How can we identify new products that are likely
to interest users?
Methodology (demand)
Find what people need
Measure needs through internal/external data
Rank new products in terms of demand
Methodology (supply)
Find products similar to top products
Measure similarity with top products
Rank new products based on similarity and top
product volume
Data preparation and
feature engineering
Parse (log) data to identify shoppers’ needs
Measure potential product demand
Model product similarity (Spark GraphX / ElasticSearch)
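The slide names Spark GraphX / ElasticSearch for product similarity. As one possible sketch of the ElasticSearch route, a more_like_this query that finds products resembling a known top product; the cluster address, index name, fields, and document id are all assumptions.

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])        # hypothetical cluster address

# Find products whose title/brand/category resemble a known top product
query = {
    "query": {
        "more_like_this": {
            "fields": ["title", "brand", "category"],
            "like": [{"_index": "products", "_id": "TOP_PRODUCT_ID"}],
            "min_term_freq": 1,
            "min_doc_freq": 1,
        }
    }
}
similar = es.search(index="products", body=query)
for hit in similar["hits"]["hits"]:
    print(hit["_id"], hit["_score"])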
Validation and
A/B testing
Limited capability on existing A/B testing platforms
to track specific products
Measure performance of new products across
experimental groups using in-house tracker
Results
Increased new product click-thru rate by 30 – 80%
Increased new product add-to-cart by 20 – 90%
Overall conversion was expected to decrease, but it
increased instead (though not statistically significant)
Emphasizing
product quality
Intent
Improve customer experience throughout purchase journey
From online browsing to receiving of product
Product quality identified as key driver
Problem
How do we measure product “quality”?
Methodology (online)
Content (e.g., title quality, richness of content)
Reviews (e.g., average rating, negative reviews)
Performance (e.g., click-thru rate, browsing time)
Methodology (offline)
Perfect order rate (i.e., not cancelled, not returned, etc.)
Negative feedback (e.g., counterfeit, complaints, etc.)
Seller metrics (e.g., timely shipped-rate, return rate, etc.)
Data preparation and
feature engineering
Derive product features (e.g., title quality, image quality, etc.)
Measure content richness (e.g., attributes available,
grouping, etc.)
Measure delivery performance and customer feedback
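A small pandas sketch of the kind of quality features described above: the perfect-order rate from offline order outcomes, crude title-quality heuristics, and a content-richness score. All file, column, and attribute names are assumptions.

import pandas as pd

# Hypothetical exports: order outcomes (boolean flags) and the product catalog
orders = pd.read_csv("orders.csv")       # columns: product_id, cancelled, returned
products = pd.read_csv("products.csv")   # columns: product_id, title, brand, color, size, material

# Offline signal: share of orders that were neither cancelled nor returned
perfect = ((~orders["cancelled"]) & (~orders["returned"])).groupby(orders["product_id"]).mean()
perfect = perfect.rename("perfect_order_rate").reset_index()
products = products.merge(perfect, on="product_id", how="left")

# Online signal: crude title-quality heuristics
products["title_length"] = products["title"].str.len()
products["title_all_caps"] = products["title"].str.isupper()

# Content richness: fraction of expected attributes that are filled in (illustrative set)
attribute_cols = ["brand", "color", "size", "material"]
products["attributes_filled"] = products[attribute_cols].notna().mean(axis=1)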
Results
Improved quality of products displayed
Increased conversion by 3 – 5% for some countries
Small conversion change in other countries (non-significant)
Key takeaways
Data science is (i) a team sport, (ii) partly R&D, (iii) iterative
How you use data to solve problems (methodology), data
preparation, and feature engineering matter more than the machine learning algorithm
Thank you!
eugene.yan@lazada.com
