How Lazada ranks products to improve customer experience and conversion
Strata Hadoop Singapore 2016
Leading e-commerce platform in South-East Asia
Lazada Data Science (built from the bottom up)
Data Engineers collect, store, maintain
Data Scientists explore, prepare, model
Data App Devs expose, integrate, platform-ize
Ranking affects what appears on top
Ranking is different from recommendation
“How can I rank well on an e-commerce platform?”
Ranking products for catalog and search
Introducing new products
Emphasizing product quality
Data pipeline overview
Sources: Web Tracker (JavaScript), Mobile Tracker (Adjust), 3rd-party tools (e.g., ZenDesk, SurveyGizmo), plus Product, Seller, and Transaction data
Ingestion: Kafka queues for event streams; bulk loaders (Spark) for batch data
Storage: Hadoop
Data exploration, data preparation, feature engineering, and modelling (Spark)
Manual boosting (Django) by Category Managers
Local validation, then A/B testing on user devices: split traffic and measure outcomes
Output: product rankings
Overall results
Better ranking improved conversion and revenue per session
Introducing new products improved new product engagement
Emphasizing product quality had neutral to positive outcomes
Ranking products for
catalog and search
Intent
Provide shoppers quick access to best products
in catalog/search results, making shopping easy
Problem
Lazada has millions of products—not easy to navigate
How to identify products that interest users in the future?
How do we measure interest?
Methodology
Measure shoppers’ interest through product
engagement as a proxy
Clicks, add-to-cart, checkouts, etc.
Predict future interest
Collecting
behavioral data
Track and collect events on web (JavaScript)
and app (Adjust)
Stream and process via Kafka
Store in Hive tables
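The deck only names the components (JavaScript/Adjust trackers, Kafka, Hive). Purely as an illustration of that flow, here is a minimal PySpark Structured Streaming sketch that reads tracker events from a Kafka topic and lands them as files backing a Hive table; the topic name, brokers, paths, and table layout are assumptions, not Lazada's actual setup, and the original 2016 pipeline may well have used the older Spark Streaming API instead.

from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("tracker_events_to_hive")          # hypothetical job name
         .enableHiveSupport()
         .getOrCreate())

# Subscribe to the tracker event topic (topic name and brokers are assumptions)
raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "kafka:9092")
       .option("subscribe", "tracker-events")
       .load())

# Kafka delivers bytes; keep the JSON payload and the event timestamp
events = raw.select(F.col("value").cast("string").alias("payload"),
                    F.col("timestamp").alias("event_time"))

# Land the stream as parquet files backing an external Hive table
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///warehouse/behavior/events")
         .option("checkpointLocation", "hdfs:///checkpoints/behavior_events")
         .start())
query.awaitTermination()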
Data preparation
Filter and categorize online behavioral events
(e.g., impressions, clicks, etc.)
Merge various views of product data (e.g. price, stock, etc.)
Exclude outliers and potentially fraudulent events
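A hedged PySpark sketch of this preparation step, assuming hypothetical Hive tables behavior.events and catalog.products and an is_bot flag for fraud filtering; the event types mirror the ones listed above.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Hypothetical Hive tables for landed events and product catalog views
events = spark.table("behavior.events")
products = spark.table("catalog.products")          # price, stock, category, etc.

# Keep only the behavioral event types of interest and drop suspected bot/fraud traffic
clean = (events
         .filter(F.col("event_type").isin("impression", "click", "add_to_cart", "checkout"))
         .filter(~F.col("is_bot")))

# Merge behavioral events with product attributes such as price and stock
prepared = clean.join(products, on="product_id", how="left")
prepared.write.mode("overwrite").saveAsTable("behavior.events_prepared")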
Feature engineering
Calculate product engagement metrics
(e.g., average clicks, conversion rate, etc.)
Derive product attributes (e.g., age, discount, etc.)
Exclude outliers (e.g., conversion rate > 1.00)
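To make the feature-engineering step concrete, a minimal PySpark sketch that aggregates per-product engagement, derives age and discount attributes, and drops impossible conversion rates; all table and column names (live_date, list_price, sale_price, etc.) are assumptions.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
prepared = spark.table("behavior.events_prepared")   # hypothetical output of the preparation step

# Per-product engagement metrics
engagement = (prepared
    .groupBy("product_id")
    .agg(F.sum(F.when(F.col("event_type") == "click", 1).otherwise(0)).alias("clicks"),
         F.sum(F.when(F.col("event_type") == "checkout", 1).otherwise(0)).alias("checkouts")))

# Hypothetical catalog columns: live_date, list_price, sale_price
attributes = (spark.table("catalog.products")
    .select("product_id",
            F.datediff(F.current_date(), F.col("live_date")).alias("product_age_days"),
            ((F.col("list_price") - F.col("sale_price")) / F.col("list_price")).alias("discount")))

features = (engagement
    .withColumn("conversion_rate", F.col("checkouts") / F.col("clicks"))  # checkouts per click
    .join(attributes, on="product_id")
    .filter(F.col("conversion_rate") <= 1.0))         # exclude impossible values

features.write.mode("overwrite").saveAsTable("features.product_daily")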
Modelling
(i.e., machine learning)
Predict future (tomorrow’s) product clicks/checkouts
Examine results against a benchmark model
Pandas + XGBoost is faster and more effective than
Spark + MLlib; assessing XGBoost4J-Spark
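The talk reports that Pandas + XGBoost worked better for this task than Spark + MLlib. As an illustration of that route only, a minimal sketch that trains an XGBoost regressor to predict tomorrow's clicks and compares it against a naive "tomorrow equals today" benchmark; the file name, column names, split date, and hyper-parameters are assumptions.

import pandas as pd
import xgboost as xgb
from sklearn.metrics import mean_absolute_error

# Hypothetical export of the daily product feature table
df = pd.read_parquet("product_features.parquet")
feature_cols = ["clicks", "checkouts", "conversion_rate", "product_age_days", "discount"]

# Train on older days, validate on the most recent ones
train, valid = df[df["date"] < "2016-11-01"], df[df["date"] >= "2016-11-01"]
dtrain = xgb.DMatrix(train[feature_cols], label=train["clicks_tomorrow"])
dvalid = xgb.DMatrix(valid[feature_cols], label=valid["clicks_tomorrow"])

model = xgb.train({"objective": "reg:linear", "max_depth": 6, "eta": 0.1},
                  dtrain, num_boost_round=200,
                  evals=[(dvalid, "valid")], early_stopping_rounds=20)

# Benchmark: naively assume tomorrow's clicks equal today's clicks
pred = model.predict(dvalid)
print("model MAE    :", mean_absolute_error(valid["clicks_tomorrow"], pred))
print("baseline MAE :", mean_absolute_error(valid["clicks_tomorrow"], valid["clicks"]))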
Boosting products
(manually)
Manually increase rank of certain products
(e.g., highly anticipated products, campaign tie-ups)
User-friendly interface to drag-and-drop products
Limits on how many products can be boosted
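The actual boosting tool is a Django app with a drag-and-drop interface. Purely to illustrate the capped-boost idea, a toy function that moves a limited number of manually boosted products to the front of a ranked list; the cap value and function name are hypothetical.

MAX_BOOSTED = 20   # illustrative cap; the real limit is a business rule

def apply_manual_boosts(ranked_product_ids, boosted_product_ids):
    """Move manually boosted products to the front of a ranked list, capped at MAX_BOOSTED."""
    in_catalog = set(ranked_product_ids)
    boosts = [p for p in boosted_product_ids if p in in_catalog][:MAX_BOOSTED]
    boost_set = set(boosts)
    rest = [p for p in ranked_product_ids if p not in boost_set]
    return boosts + rest

# Example: two campaign products are pinned ahead of the model's ranking
print(apply_manual_boosts([101, 102, 103, 104], [103, 999, 101]))   # [103, 101, 102, 104]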
Validation and
A/B testing
Local validation is easy, but it is difficult to ensure
similar results hold up in A/B testing
A/B test all updates before production
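The deck does not describe the A/B platform internals. A common pattern, shown here only as an assumption-laden sketch, is to split traffic deterministically by hashing the user id and to compare conversion rates between groups with a two-proportion z-test; the counts below are placeholders, not Lazada numbers.

import hashlib
from statsmodels.stats.proportion import proportions_ztest

def assign_bucket(user_id, n_buckets=2):
    """Deterministically assign a user to control/treatment by hashing their id."""
    h = int(hashlib.md5(str(user_id).encode()).hexdigest(), 16)
    return "treatment" if h % n_buckets == 0 else "control"

# Compare conversion rates between groups (placeholder counts)
conversions = [1850, 1720]      # converted sessions: treatment, control
sessions = [52000, 51800]       # total sessions:     treatment, control
z_stat, p_value = proportions_ztest(conversions, sessions)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")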
Results
Increased conversion rate by 3 – 8%
Increased revenue per session by 5 – 20%
Introducing
new products
Intent
Provide potentially good new products with exposure
Provide shoppers with new products they like
Keep catalog fresh
Problem
Products with strong engagement stay on top
Products without engagement don’t get traffic
How can we identify new products that are likely
to interest users?
Methodology (demand)
Find what people need
Measure needs through internal/external data
Rank new products in terms of demand
Methodology (supply)
Find products similar to top products
Measure similarity with top products
Rank new products based on similarity and top
product volume
Data preparation and
feature engineering
Parse (log) data to identify shoppers’ needs
Measure potential product demand
Model product similarity (Spark GraphX / ElasticSearch)
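The slide names Spark GraphX / ElasticSearch for product similarity. As one possible sketch of the ElasticSearch route, a more_like_this query that finds products resembling a known top product; the cluster address, index name, fields, and document id are all assumptions.

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])        # hypothetical cluster address

# Find products whose title/brand/category resemble a known top product
query = {
    "query": {
        "more_like_this": {
            "fields": ["title", "brand", "category"],
            "like": [{"_index": "products", "_id": "TOP_PRODUCT_ID"}],
            "min_term_freq": 1,
            "min_doc_freq": 1,
        }
    }
}
similar = es.search(index="products", body=query)
for hit in similar["hits"]["hits"]:
    print(hit["_id"], hit["_score"])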
Validation and
A/B testing
Limited capability on existing A/B testing platforms
to track specific products
Measure performance of new products across
experimental groups using in-house tracker
Results
Increased new product click-thru rate by 30 – 80%
Increased new product add-to-cart by 20 – 90%
Overall conversion was expected to decrease, but it
increased instead (though not statistically significant)
Emphasizing
product quality
Intent
Improve customer experience throughout purchase journey
From online browsing to receiving of product
Product quality identified as key driver
Problem
How do we measure product “quality”?
Methodology (online)
Content (e.g., title quality, richness of content)
Reviews (e.g., average rating, negative reviews)
Performance (e.g., click-thru rate, browsing time)
Methodology (offline)
Perfect order rate (i.e., not cancelled, not returned, etc.)
Negative feedback (e.g., counterfeit, complaints, etc.)
Seller metrics (e.g., timely shipped-rate, return rate, etc.)
Data preparation and
feature engineering
Derive product features (e.g., title quality, image quality, etc.)
Measure content richness (e.g., attributes available,
grouping, etc.)
Measure delivery performance and customer feedback
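A small pandas sketch of the kind of quality features described above: the perfect-order rate from offline order outcomes, crude title-quality heuristics, and a content-richness score. All file, column, and attribute names are assumptions.

import pandas as pd

# Hypothetical exports: order outcomes (boolean flags) and the product catalog
orders = pd.read_csv("orders.csv")       # columns: product_id, cancelled, returned
products = pd.read_csv("products.csv")   # columns: product_id, title, brand, color, size, material

# Offline signal: share of orders that were neither cancelled nor returned
perfect = ((~orders["cancelled"]) & (~orders["returned"])).groupby(orders["product_id"]).mean()
perfect = perfect.rename("perfect_order_rate").reset_index()
products = products.merge(perfect, on="product_id", how="left")

# Online signal: crude title-quality heuristics
products["title_length"] = products["title"].str.len()
products["title_all_caps"] = products["title"].str.isupper()

# Content richness: fraction of expected attributes that are filled in (illustrative set)
attribute_cols = ["brand", "color", "size", "material"]
products["attributes_filled"] = products[attribute_cols].notna().mean(axis=1)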
Results
Improved quality of products displayed
Increased conversion by 3 – 5% for some countries
Small conversion change in other countries (non-significant)
Key takeaways
Data science is (i) a team sport, (ii) partly R&D, (iii) iterative
How you use data to solve problems (methodology), data
preparation, and feature engineering matter more than the machine learning algorithm
Thank you!
eugene.yan@lazada.com
