SlideShare a Scribd company logo
Data Science Challenges
and Impact @Lazada
Big Data & Analytics Innovation Summit Singapore 2018
#1 Shopping
Site in SEA
145,000 sellers
3,000 brands
Lazada Data
Data App Devs expose, integrate, platform-ize
Data Scientists explore, prepare, model
Data Engineers collect, store, maintain
Start from bottom up
Considerations and
Challenges
How much business
input/overriding?
Trade-off: Manual human input vs. automated algorithms
Necessary to some extent, but harmful if overdone
Technically, manual input and rules are difficult to maintain
How much business
input/overriding?
Example: Manual override of product ranking on the site
Allows category managers to incorporate their domain
knowledge (e.g., new product releases, trending, etc.)
Nonetheless, too much manual overriding reduced metrics
Conducted AB tests to find optimal level of manual overriding
How fast is “too fast”?
Trade-off: Development speed vs. production stability
You can move faster without building tooling/abstractions, code
reviews, automated testing, repaying technical debt, documentation
But in the long run, they save time and effort
FB: “Move fast and break things” -> “Move fast with stable infra”
How fast is “too fast”?
Moar features!
Quick POC
Automation,
testing, tooling,
clear tech debt
Environment in place
Project size
Effort
Production
Dev SpeedStability
Less effort and faster =)
More effort and slower =(
Dev, dev, dev
Development effort over the long run
How fast is “too fast”?
Example: 8 man team, 10 problems—mostly focused on delivery
In the first two years, the team achieved a lot and proved our worth
Nonetheless, as we matured and had to maintain more production
code, investing in iteration speed and code quality had high ROI
How to set priorities with
business?
Trade-off: Short-term vs. long term
Business understands best what is needed, though can be overly
focused on day-to-day ops and near term goals
Data science is aware of the latest research and can innovate, but
risks being detached from business needs
How to set priorities with
business?
Example: Timebox-ed skunkworks resulting in POCs
Data leadership sponsored some POCs that were hacked together
in 2 – 4 weeks—some eventually made it into production
Nonetheless, the focus is on research and innovation that can be
applied to improve the online shopping experience
Development and
Impact
Automated Review QC
Product
Review
API
Spam
Classification
General
Classification
Model-based
Data sources
Rule-based
Keywords
Spam
Characteristics
Review
API
Manual QC
Input and post-processing
Audit
Overall results
Significant manpower cost savings (5-figures monthly)
Existing workforce can be diverted to difficult-to-automate tasks
Reduced lead-time before reviews are live on site
Product Ranking
Ranking
affects what
appears
on top
Ranking is
different
from recom-
mendation
Web Tracker
(JavaScript)
Mobile Tracker
(Adjust)
3rd Party
(e.g. ,ZenDesk,
SurveyGizmo)
Kafka Queues
Bulk Loaders
(Spark)
Hadoop
Hadoop
Data
Exploration
+
Data
Preparation
+
Feature
Engineering
+
Modelling
(Spark)
Manual
Boosting
(Django)
Local
Validation
A/B
Testing
Product
Seller
Transaction
Product rankings
Split traffic and measure outcomes
(Category Managers)
(User devices)
Overall results
Better ranking improved conversion (3 – 8%) and revenue per
session (5 – 20%)
Introducing new products improved new product engagement
(CTR increased 30 – 80%; add-to-cart increased 20 – 90%)
Emphasizing product quality had neutral to positive outcomes
(reduced return rate; increased product net promoter score)
Key takeaways
There is no single best answer to the challenges raised—it
depends on the maturity stage of the team and organization
Data science > Coding + Machine Learning—many other
activities contribute greatly to the final impact
Thank you!
eugene.yan@lazada.com
Our culture: http://bit.ly/datascienceculture
How we rank products: http://bit.ly/how-lazada-ranks-products

More Related Content

What's hot

Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming Jobs
Databricks
 
Apache spark
Apache sparkApache spark
Apache spark
TEJPAL GAUTAM
 
A timeline view of Evolution of Analytics
A timeline view of Evolution of AnalyticsA timeline view of Evolution of Analytics
A timeline view of Evolution of Analytics
Saurabh Banerjee
 
WAND Top-k Retrieval
WAND Top-k RetrievalWAND Top-k Retrieval
WAND Top-k Retrieval
Andrew Zhang
 
APACHE SPARK.pptx
APACHE SPARK.pptxAPACHE SPARK.pptx
APACHE SPARK.pptx
DeepaThirumurugan
 
Democratizing Data at Airbnb
Democratizing Data at AirbnbDemocratizing Data at Airbnb
Democratizing Data at Airbnb
Neo4j
 
Introduction to data pre-processing and cleaning
Introduction to data pre-processing and cleaning Introduction to data pre-processing and cleaning
Introduction to data pre-processing and cleaning
Matteo Manca
 
Personalized search
Personalized searchPersonalized search
Personalized search
Toine Bogers
 
Learn to Rank search results
Learn to Rank search resultsLearn to Rank search results
Learn to Rank search results
Ganesh Venkataraman
 
7 steps to Predictive Analytics
7 steps to Predictive Analytics 7 steps to Predictive Analytics
7 steps to Predictive Analytics
Coforge (Erstwhile WHISHWORKS)
 
LuceneRDD for (Geospatial) Search and Entity Linkage
LuceneRDD for (Geospatial) Search and Entity LinkageLuceneRDD for (Geospatial) Search and Entity Linkage
LuceneRDD for (Geospatial) Search and Entity Linkage
zouzias
 
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino BusaReal-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Spark Summit
 
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Edureka!
 
Benchmark MinHash+LSH algorithm on Spark
Benchmark MinHash+LSH algorithm on SparkBenchmark MinHash+LSH algorithm on Spark
Benchmark MinHash+LSH algorithm on Spark
Xiaoqian Liu
 
MtpMolti p-value nella stessa analisi: necessità e metodi di correzione (Livi...
MtpMolti p-value nella stessa analisi: necessità e metodi di correzione (Livi...MtpMolti p-value nella stessa analisi: necessità e metodi di correzione (Livi...
MtpMolti p-value nella stessa analisi: necessità e metodi di correzione (Livi...
Francesco Cabiddu
 
bigmartsalespridictionproject-220813050638-8e9c4c31 (1).pptx
bigmartsalespridictionproject-220813050638-8e9c4c31 (1).pptxbigmartsalespridictionproject-220813050638-8e9c4c31 (1).pptx
bigmartsalespridictionproject-220813050638-8e9c4c31 (1).pptx
Harshavardhan851231
 
Recommender Systems - A Review and Recent Research Trends
Recommender Systems  -  A Review and Recent Research TrendsRecommender Systems  -  A Review and Recent Research Trends
Recommender Systems - A Review and Recent Research Trends
Sujoy Bag
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to Rank
Bhaskar Mitra
 
Real time big data stream processing
Real time big data stream processing Real time big data stream processing
Real time big data stream processing
Luay AL-Assadi
 
The How and Why of Feature Engineering
The How and Why of Feature EngineeringThe How and Why of Feature Engineering
The How and Why of Feature Engineering
Alice Zheng
 

What's hot (20)

Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming Jobs
 
Apache spark
Apache sparkApache spark
Apache spark
 
A timeline view of Evolution of Analytics
A timeline view of Evolution of AnalyticsA timeline view of Evolution of Analytics
A timeline view of Evolution of Analytics
 
WAND Top-k Retrieval
WAND Top-k RetrievalWAND Top-k Retrieval
WAND Top-k Retrieval
 
APACHE SPARK.pptx
APACHE SPARK.pptxAPACHE SPARK.pptx
APACHE SPARK.pptx
 
Democratizing Data at Airbnb
Democratizing Data at AirbnbDemocratizing Data at Airbnb
Democratizing Data at Airbnb
 
Introduction to data pre-processing and cleaning
Introduction to data pre-processing and cleaning Introduction to data pre-processing and cleaning
Introduction to data pre-processing and cleaning
 
Personalized search
Personalized searchPersonalized search
Personalized search
 
Learn to Rank search results
Learn to Rank search resultsLearn to Rank search results
Learn to Rank search results
 
7 steps to Predictive Analytics
7 steps to Predictive Analytics 7 steps to Predictive Analytics
7 steps to Predictive Analytics
 
LuceneRDD for (Geospatial) Search and Entity Linkage
LuceneRDD for (Geospatial) Search and Entity LinkageLuceneRDD for (Geospatial) Search and Entity Linkage
LuceneRDD for (Geospatial) Search and Entity Linkage
 
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino BusaReal-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
 
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
 
Benchmark MinHash+LSH algorithm on Spark
Benchmark MinHash+LSH algorithm on SparkBenchmark MinHash+LSH algorithm on Spark
Benchmark MinHash+LSH algorithm on Spark
 
MtpMolti p-value nella stessa analisi: necessità e metodi di correzione (Livi...
MtpMolti p-value nella stessa analisi: necessità e metodi di correzione (Livi...MtpMolti p-value nella stessa analisi: necessità e metodi di correzione (Livi...
MtpMolti p-value nella stessa analisi: necessità e metodi di correzione (Livi...
 
bigmartsalespridictionproject-220813050638-8e9c4c31 (1).pptx
bigmartsalespridictionproject-220813050638-8e9c4c31 (1).pptxbigmartsalespridictionproject-220813050638-8e9c4c31 (1).pptx
bigmartsalespridictionproject-220813050638-8e9c4c31 (1).pptx
 
Recommender Systems - A Review and Recent Research Trends
Recommender Systems  -  A Review and Recent Research TrendsRecommender Systems  -  A Review and Recent Research Trends
Recommender Systems - A Review and Recent Research Trends
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to Rank
 
Real time big data stream processing
Real time big data stream processing Real time big data stream processing
Real time big data stream processing
 
The How and Why of Feature Engineering
The How and Why of Feature EngineeringThe How and Why of Feature Engineering
The How and Why of Feature Engineering
 

Similar to Data Science Challenges and Impact at Lazada (Big Data and Analytics Innovation Summit Singapore 2018)

ANI | Business Agility Day @Gurugram | Are you a responsible Business | Dilje...
ANI | Business Agility Day @Gurugram | Are you a responsible Business | Dilje...ANI | Business Agility Day @Gurugram | Are you a responsible Business | Dilje...
ANI | Business Agility Day @Gurugram | Are you a responsible Business | Dilje...
AgileNetwork
 
Business and IT alignment through effective Project & Program Portfolio Manag...
Business and IT alignment through effective Project & Program Portfolio Manag...Business and IT alignment through effective Project & Program Portfolio Manag...
Business and IT alignment through effective Project & Program Portfolio Manag...
Alan Kan
 
Business and IT alignment through effective Project & Program Portfolio Manag...
Business and IT alignment through effective Project & Program Portfolio Manag...Business and IT alignment through effective Project & Program Portfolio Manag...
Business and IT alignment through effective Project & Program Portfolio Manag...
Alan Kan
 
The Four Pillars of Analytics Technology Whitepaper
The Four Pillars of Analytics Technology WhitepaperThe Four Pillars of Analytics Technology Whitepaper
The Four Pillars of Analytics Technology Whitepaper
Edgar Alejandro Villegas
 
Hypothesis-Driven Development & How to Fail-Fast Hacking Growth
Hypothesis-Driven Development & How to Fail-Fast Hacking GrowthHypothesis-Driven Development & How to Fail-Fast Hacking Growth
Hypothesis-Driven Development & How to Fail-Fast Hacking Growth
Prabhat Gupta
 
The Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value ThereafterThe Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value Thereafter
Inside Analysis
 
Startup Product Development
Startup Product DevelopmentStartup Product Development
Startup Product Development
Aaron Stannard
 
Future directives in erp, erp and internet, critical success and failure factors
Future directives in erp, erp and internet, critical success and failure factorsFuture directives in erp, erp and internet, critical success and failure factors
Future directives in erp, erp and internet, critical success and failure factors
Varun Luthra
 
Npi with bpm webinar
Npi with bpm webinarNpi with bpm webinar
Npi with bpm webinarAisurya Puhan
 
RAD Lab Overview v04
RAD Lab Overview v04RAD Lab Overview v04
RAD Lab Overview v04Daniel Grbac
 
Building Simple Continuous Reviews in ACL
Building Simple Continuous Reviews in ACLBuilding Simple Continuous Reviews in ACL
Building Simple Continuous Reviews in ACL
Jim Kaplan CIA CFE
 
Lean product management for web2.0 by Sujoy Bhatacharjee, April
Lean product management for web2.0 by Sujoy Bhatacharjee, April Lean product management for web2.0 by Sujoy Bhatacharjee, April
Lean product management for web2.0 by Sujoy Bhatacharjee, April
Triggr In
 
Best Practices and Lessons Learned on Our IBM Rational Insight Deployment
Best Practices and Lessons Learned on Our IBM Rational Insight DeploymentBest Practices and Lessons Learned on Our IBM Rational Insight Deployment
Best Practices and Lessons Learned on Our IBM Rational Insight DeploymentMarc Nehme
 
Designing a to be process
Designing a to be processDesigning a to be process
Designing a to be process
Ihor Malytskyi
 
Gov Day Sacramento 2015 - Keynote/Overview
Gov Day Sacramento 2015 - Keynote/OverviewGov Day Sacramento 2015 - Keynote/Overview
Gov Day Sacramento 2015 - Keynote/Overview
Splunk
 
Improving Speed to Market in E-commerce
Improving Speed to Market in E-commerceImproving Speed to Market in E-commerce
Improving Speed to Market in E-commerce
Cognizant
 
What is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ NewyorksysWhat is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
NEWYORKSYS-IT SOLUTIONS
 
IBM Innovate - Uderstanding DevOps
IBM Innovate - Uderstanding DevOpsIBM Innovate - Uderstanding DevOps
IBM Innovate - Uderstanding DevOps
Sanjeev Sharma
 
[DSC Europe 22] The Making of a Data Organization - Denys Holovatyi
[DSC Europe 22] The Making of a Data Organization - Denys Holovatyi[DSC Europe 22] The Making of a Data Organization - Denys Holovatyi
[DSC Europe 22] The Making of a Data Organization - Denys Holovatyi
DataScienceConferenc1
 

Similar to Data Science Challenges and Impact at Lazada (Big Data and Analytics Innovation Summit Singapore 2018) (20)

ANI | Business Agility Day @Gurugram | Are you a responsible Business | Dilje...
ANI | Business Agility Day @Gurugram | Are you a responsible Business | Dilje...ANI | Business Agility Day @Gurugram | Are you a responsible Business | Dilje...
ANI | Business Agility Day @Gurugram | Are you a responsible Business | Dilje...
 
Business and IT alignment through effective Project & Program Portfolio Manag...
Business and IT alignment through effective Project & Program Portfolio Manag...Business and IT alignment through effective Project & Program Portfolio Manag...
Business and IT alignment through effective Project & Program Portfolio Manag...
 
Business and IT alignment through effective Project & Program Portfolio Manag...
Business and IT alignment through effective Project & Program Portfolio Manag...Business and IT alignment through effective Project & Program Portfolio Manag...
Business and IT alignment through effective Project & Program Portfolio Manag...
 
The Four Pillars of Analytics Technology Whitepaper
The Four Pillars of Analytics Technology WhitepaperThe Four Pillars of Analytics Technology Whitepaper
The Four Pillars of Analytics Technology Whitepaper
 
Hypothesis-Driven Development & How to Fail-Fast Hacking Growth
Hypothesis-Driven Development & How to Fail-Fast Hacking GrowthHypothesis-Driven Development & How to Fail-Fast Hacking Growth
Hypothesis-Driven Development & How to Fail-Fast Hacking Growth
 
The Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value ThereafterThe Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value Thereafter
 
Startup Product Development
Startup Product DevelopmentStartup Product Development
Startup Product Development
 
Future directives in erp, erp and internet, critical success and failure factors
Future directives in erp, erp and internet, critical success and failure factorsFuture directives in erp, erp and internet, critical success and failure factors
Future directives in erp, erp and internet, critical success and failure factors
 
Npi with bpm webinar
Npi with bpm webinarNpi with bpm webinar
Npi with bpm webinar
 
RAD Lab Overview v04
RAD Lab Overview v04RAD Lab Overview v04
RAD Lab Overview v04
 
Building Simple Continuous Reviews in ACL
Building Simple Continuous Reviews in ACLBuilding Simple Continuous Reviews in ACL
Building Simple Continuous Reviews in ACL
 
Lean product management for web2.0 by Sujoy Bhatacharjee, April
Lean product management for web2.0 by Sujoy Bhatacharjee, April Lean product management for web2.0 by Sujoy Bhatacharjee, April
Lean product management for web2.0 by Sujoy Bhatacharjee, April
 
Best Practices and Lessons Learned on Our IBM Rational Insight Deployment
Best Practices and Lessons Learned on Our IBM Rational Insight DeploymentBest Practices and Lessons Learned on Our IBM Rational Insight Deployment
Best Practices and Lessons Learned on Our IBM Rational Insight Deployment
 
CIS 499 Final
CIS 499 FinalCIS 499 Final
CIS 499 Final
 
Designing a to be process
Designing a to be processDesigning a to be process
Designing a to be process
 
Gov Day Sacramento 2015 - Keynote/Overview
Gov Day Sacramento 2015 - Keynote/OverviewGov Day Sacramento 2015 - Keynote/Overview
Gov Day Sacramento 2015 - Keynote/Overview
 
Improving Speed to Market in E-commerce
Improving Speed to Market in E-commerceImproving Speed to Market in E-commerce
Improving Speed to Market in E-commerce
 
What is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ NewyorksysWhat is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
 
IBM Innovate - Uderstanding DevOps
IBM Innovate - Uderstanding DevOpsIBM Innovate - Uderstanding DevOps
IBM Innovate - Uderstanding DevOps
 
[DSC Europe 22] The Making of a Data Organization - Denys Holovatyi
[DSC Europe 22] The Making of a Data Organization - Denys Holovatyi[DSC Europe 22] The Making of a Data Organization - Denys Holovatyi
[DSC Europe 22] The Making of a Data Organization - Denys Holovatyi
 

More from Eugene Yan Ziyou

System design for recommendations and search
System design for recommendations and searchSystem design for recommendations and search
System design for recommendations and search
Eugene Yan Ziyou
 
Recommender Systems: Beyond the user-item matrix
Recommender Systems: Beyond the user-item matrixRecommender Systems: Beyond the user-item matrix
Recommender Systems: Beyond the user-item matrix
Eugene Yan Ziyou
 
Predicting Hospital Bills at Pre-admission
Predicting Hospital Bills at Pre-admissionPredicting Hospital Bills at Pre-admission
Predicting Hospital Bills at Pre-admission
Eugene Yan Ziyou
 
OLX Group Prod Tech 2019 Keynote: Asia's Tech Giants
OLX Group Prod Tech 2019 Keynote: Asia's Tech GiantsOLX Group Prod Tech 2019 Keynote: Asia's Tech Giants
OLX Group Prod Tech 2019 Keynote: Asia's Tech Giants
Eugene Yan Ziyou
 
SMU BIA Sharing on Data Science
SMU BIA Sharing on Data ScienceSMU BIA Sharing on Data Science
SMU BIA Sharing on Data Science
Eugene Yan Ziyou
 
Culture at Lazada Data Science
Culture at Lazada Data ScienceCulture at Lazada Data Science
Culture at Lazada Data Science
Eugene Yan Ziyou
 
Competition Improves Performance: Only when Competition Form matches Goal Ori...
Competition Improves Performance: Only when Competition Form matches Goal Ori...Competition Improves Performance: Only when Competition Form matches Goal Ori...
Competition Improves Performance: Only when Competition Form matches Goal Ori...
Eugene Yan Ziyou
 
Sharing about my data science journey and what I do at Lazada
Sharing about my data science journey and what I do at LazadaSharing about my data science journey and what I do at Lazada
Sharing about my data science journey and what I do at Lazada
Eugene Yan Ziyou
 
AXA x DSSG Meetup Sharing (Feb 2016)
AXA x DSSG Meetup Sharing (Feb 2016)AXA x DSSG Meetup Sharing (Feb 2016)
AXA x DSSG Meetup Sharing (Feb 2016)
Eugene Yan Ziyou
 
Garuda Robotics x DataScience SG Meetup (Sep 2015)
Garuda Robotics x DataScience SG Meetup (Sep 2015)Garuda Robotics x DataScience SG Meetup (Sep 2015)
Garuda Robotics x DataScience SG Meetup (Sep 2015)
Eugene Yan Ziyou
 
DataKind SG sharing of our first DataDive
DataKind SG sharing of our first DataDiveDataKind SG sharing of our first DataDive
DataKind SG sharing of our first DataDive
Eugene Yan Ziyou
 
Social network analysis and growth recommendations for DataScience SG community
Social network analysis and growth recommendations for DataScience SG communitySocial network analysis and growth recommendations for DataScience SG community
Social network analysis and growth recommendations for DataScience SG community
Eugene Yan Ziyou
 
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learntKaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
Eugene Yan Ziyou
 
Nielsen x DataScience SG Meetup (Apr 2015)
Nielsen x DataScience SG Meetup (Apr 2015)Nielsen x DataScience SG Meetup (Apr 2015)
Nielsen x DataScience SG Meetup (Apr 2015)
Eugene Yan Ziyou
 
Statistical inference: Statistical Power, ANOVA, and Post Hoc tests
Statistical inference: Statistical Power, ANOVA, and Post Hoc testsStatistical inference: Statistical Power, ANOVA, and Post Hoc tests
Statistical inference: Statistical Power, ANOVA, and Post Hoc tests
Eugene Yan Ziyou
 
Statistical inference: Hypothesis Testing and t-tests
Statistical inference: Hypothesis Testing and t-testsStatistical inference: Hypothesis Testing and t-tests
Statistical inference: Hypothesis Testing and t-tests
Eugene Yan Ziyou
 
Statistical inference: Probability and Distribution
Statistical inference: Probability and DistributionStatistical inference: Probability and Distribution
Statistical inference: Probability and Distribution
Eugene Yan Ziyou
 
A Study on the Relationship between Education and Income in the US
A Study on the Relationship between Education and Income in the USA Study on the Relationship between Education and Income in the US
A Study on the Relationship between Education and Income in the US
Eugene Yan Ziyou
 
Diving into Twitter data on consumer electronic brands
Diving into Twitter data on consumer electronic brandsDiving into Twitter data on consumer electronic brands
Diving into Twitter data on consumer electronic brands
Eugene Yan Ziyou
 

More from Eugene Yan Ziyou (19)

System design for recommendations and search
System design for recommendations and searchSystem design for recommendations and search
System design for recommendations and search
 
Recommender Systems: Beyond the user-item matrix
Recommender Systems: Beyond the user-item matrixRecommender Systems: Beyond the user-item matrix
Recommender Systems: Beyond the user-item matrix
 
Predicting Hospital Bills at Pre-admission
Predicting Hospital Bills at Pre-admissionPredicting Hospital Bills at Pre-admission
Predicting Hospital Bills at Pre-admission
 
OLX Group Prod Tech 2019 Keynote: Asia's Tech Giants
OLX Group Prod Tech 2019 Keynote: Asia's Tech GiantsOLX Group Prod Tech 2019 Keynote: Asia's Tech Giants
OLX Group Prod Tech 2019 Keynote: Asia's Tech Giants
 
SMU BIA Sharing on Data Science
SMU BIA Sharing on Data ScienceSMU BIA Sharing on Data Science
SMU BIA Sharing on Data Science
 
Culture at Lazada Data Science
Culture at Lazada Data ScienceCulture at Lazada Data Science
Culture at Lazada Data Science
 
Competition Improves Performance: Only when Competition Form matches Goal Ori...
Competition Improves Performance: Only when Competition Form matches Goal Ori...Competition Improves Performance: Only when Competition Form matches Goal Ori...
Competition Improves Performance: Only when Competition Form matches Goal Ori...
 
Sharing about my data science journey and what I do at Lazada
Sharing about my data science journey and what I do at LazadaSharing about my data science journey and what I do at Lazada
Sharing about my data science journey and what I do at Lazada
 
AXA x DSSG Meetup Sharing (Feb 2016)
AXA x DSSG Meetup Sharing (Feb 2016)AXA x DSSG Meetup Sharing (Feb 2016)
AXA x DSSG Meetup Sharing (Feb 2016)
 
Garuda Robotics x DataScience SG Meetup (Sep 2015)
Garuda Robotics x DataScience SG Meetup (Sep 2015)Garuda Robotics x DataScience SG Meetup (Sep 2015)
Garuda Robotics x DataScience SG Meetup (Sep 2015)
 
DataKind SG sharing of our first DataDive
DataKind SG sharing of our first DataDiveDataKind SG sharing of our first DataDive
DataKind SG sharing of our first DataDive
 
Social network analysis and growth recommendations for DataScience SG community
Social network analysis and growth recommendations for DataScience SG communitySocial network analysis and growth recommendations for DataScience SG community
Social network analysis and growth recommendations for DataScience SG community
 
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learntKaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
 
Nielsen x DataScience SG Meetup (Apr 2015)
Nielsen x DataScience SG Meetup (Apr 2015)Nielsen x DataScience SG Meetup (Apr 2015)
Nielsen x DataScience SG Meetup (Apr 2015)
 
Statistical inference: Statistical Power, ANOVA, and Post Hoc tests
Statistical inference: Statistical Power, ANOVA, and Post Hoc testsStatistical inference: Statistical Power, ANOVA, and Post Hoc tests
Statistical inference: Statistical Power, ANOVA, and Post Hoc tests
 
Statistical inference: Hypothesis Testing and t-tests
Statistical inference: Hypothesis Testing and t-testsStatistical inference: Hypothesis Testing and t-tests
Statistical inference: Hypothesis Testing and t-tests
 
Statistical inference: Probability and Distribution
Statistical inference: Probability and DistributionStatistical inference: Probability and Distribution
Statistical inference: Probability and Distribution
 
A Study on the Relationship between Education and Income in the US
A Study on the Relationship between Education and Income in the USA Study on the Relationship between Education and Income in the US
A Study on the Relationship between Education and Income in the US
 
Diving into Twitter data on consumer electronic brands
Diving into Twitter data on consumer electronic brandsDiving into Twitter data on consumer electronic brands
Diving into Twitter data on consumer electronic brands
 

Recently uploaded

Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Enterprise Wired
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 

Recently uploaded (20)

Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 

Data Science Challenges and Impact at Lazada (Big Data and Analytics Innovation Summit Singapore 2018)

  • 1. Data Science Challenges and Impact @Lazada Big Data & Analytics Innovation Summit Singapore 2018
  • 2. #1 Shopping Site in SEA 145,000 sellers 3,000 brands
  • 3. Lazada Data Data App Devs expose, integrate, platform-ize Data Scientists explore, prepare, model Data Engineers collect, store, maintain Start from bottom up
  • 5. How much business input/overriding? Trade-off: Manual human input vs. automated algorithms Necessary to some extent, but harmful if overdone Technically, manual input and rules are difficult to maintain
  • 6. How much business input/overriding? Example: Manual override of product ranking on the site Allows category managers to incorporate their domain knowledge (e.g., new product releases, trending, etc.) Nonetheless, too much manual overriding reduced metrics Conducted AB tests to find optimal level of manual overriding
  • 7. How fast is “too fast”? Trade-off: Development speed vs. production stability You can move faster without building tooling/abstractions, code reviews, automated testing, repaying technical debt, documentation But in the long run, they save time and effort FB: “Move fast and break things” -> “Move fast with stable infra”
  • 8. How fast is “too fast”? Moar features! Quick POC Automation, testing, tooling, clear tech debt Environment in place Project size Effort Production Dev SpeedStability Less effort and faster =) More effort and slower =( Dev, dev, dev Development effort over the long run
  • 9. How fast is “too fast”? Example: 8 man team, 10 problems—mostly focused on delivery In the first two years, the team achieved a lot and proved our worth Nonetheless, as we matured and had to maintain more production code, investing in iteration speed and code quality had high ROI
  • 10. How to set priorities with business? Trade-off: Short-term vs. long term Business understands best what is needed, though can be overly focused on day-to-day ops and near term goals Data science is aware of the latest research and can innovate, but risks being detached from business needs
  • 11. How to set priorities with business? Example: Timebox-ed skunkworks resulting in POCs Data leadership sponsored some POCs that were hacked together in 2 – 4 weeks—some eventually made it into production Nonetheless, the focus is on research and innovation that can be applied to improve the online shopping experience
  • 15. Overall results Significant manpower cost savings (5-figures monthly) Existing workforce can be diverted to difficult-to-automate tasks Reduced lead-time before reviews are live on site
  • 19. Web Tracker (JavaScript) Mobile Tracker (Adjust) 3rd Party (e.g. ,ZenDesk, SurveyGizmo) Kafka Queues Bulk Loaders (Spark) Hadoop Hadoop Data Exploration + Data Preparation + Feature Engineering + Modelling (Spark) Manual Boosting (Django) Local Validation A/B Testing Product Seller Transaction Product rankings Split traffic and measure outcomes (Category Managers) (User devices)
  • 20. Overall results Better ranking improved conversion (3 – 8%) and revenue per session (5 – 20%) Introducing new products improved new product engagement (CTR increased 30 – 80%; add-to-cart increased 20 – 90%) Emphasizing product quality had neutral to positive outcomes (reduced return rate; increased product net promoter score)
  • 21. Key takeaways There is no single best answer to the challenges raised—it depends on the maturity stage of the team and organization Data science > Coding + Machine Learning—many other activities contribute greatly to the final impact
  • 22. Thank you! eugene.yan@lazada.com Our culture: http://bit.ly/datascienceculture How we rank products: http://bit.ly/how-lazada-ranks-products