Hardcore Data Science in Practice
Dr. Mikio L. Braun, Delivery Lead for Recommendation and Search
StrataConf 2016, London
mikio.braun@zalando.de
@mikiobraun
tech.zalando.com
Mikio Braun, Hardcore Data Science in Practice, Strata+Hadoop World 2016, London
• 15 countries, 3 warehouses, 16+ million customers, 3bn€ revenue in 2015, …
• Heavily using data science for recommendation
Recommendations
Data Driven Recommendations
• Collaborative filtering
• Content based recommendation
• Personalised recommendations
• …
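To make the collaborative filtering bullet concrete, here is a minimal sketch of item-based collaborative filtering with cosine similarity. It is an illustration only, not the method used in production at Zalando; the toy data and the names `interactions`, `item_similarities` and `recommend` are assumptions made up for this example.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy data: user -> set of article ids the user interacted with.
interactions = {
    "u1": {"shoe", "sock"},
    "u2": {"shoe", "sock", "hat"},
    "u3": {"hat", "scarf"},
}


def item_similarities(interactions):
    """Cosine similarity between items, based on the users they share."""
    users_per_item = defaultdict(set)
    for user, items in interactions.items():
        for item in items:
            users_per_item[item].add(user)

    sims = defaultdict(dict)
    for a, users_a in users_per_item.items():
        for b, users_b in users_per_item.items():
            if a == b:
                continue
            overlap = len(users_a & users_b)
            if overlap:
                sims[a][b] = overlap / sqrt(len(users_a) * len(users_b))
    return sims


def recommend(user, interactions, sims, k=3):
    """Rank unseen items by their summed similarity to the user's items."""
    seen = interactions[user]
    scores = defaultdict(float)
    for item in seen:
        for other, sim in sims[item].items():
            if other not in seen:
                scores[other] += sim
    # Sort by descending score, then by name so that ties are deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


sims = item_similarities(interactions)
print(recommend("u1", interactions, sims))
```

The similarity table is computed once from the full data set, which matches the "one shot computation" style of data analysis discussed later in the deck.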
For Example, One-pass Ranking Models
(Freno, Jenatton, Saveski, Archambeau, “One-Pass Ranking Models for Low-Latency Product Recommendations”, KDD 2015)
Hardcore Data Science to Production
• Usually a one-shot computation
• Sometimes done in Python
• Getting raw data is hard initially
Production System
• Real-time system
• Usually Java/JVM based
• Events and article data continually updated
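The contrast with a one-shot computation can be sketched as a model that is updated event by event and can be queried at any time. The example below keeps "seen together" counts up to date incrementally. It is written in Python for brevity even though the slide names Java/JVM for production, and the class `IncrementalCooccurrence` and its methods are hypothetical names chosen for this illustration.

```python
from collections import Counter, defaultdict


class IncrementalCooccurrence:
    """Keeps item co-occurrence counts up to date as events arrive."""

    def __init__(self):
        self.items_by_user = defaultdict(set)
        self.cooc = defaultdict(Counter)

    def update(self, user, item):
        """Process one (user, item) event; repeated events are ignored."""
        seen = self.items_by_user[user]
        if item in seen:
            return
        for other in seen:
            self.cooc[item][other] += 1
            self.cooc[other][item] += 1
        seen.add(item)

    def related(self, item, k=3):
        """Return up to k items most often seen together with `item`."""
        ranked = sorted(self.cooc[item].items(), key=lambda kv: (-kv[1], kv[0]))
        return [other for other, _ in ranked[:k]]


model = IncrementalCooccurrence()
events = [("u1", "shoe"), ("u1", "sock"), ("u2", "shoe"),
          ("u2", "sock"), ("u2", "hat")]
for user, item in events:
    model.update(user, item)
print(model.related("shoe"))
```

Each update touches only the items of one user, so the answer to `related` can change as soon as a new event arrives, without recomputing anything from the raw data.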
Data Science vs. Production
• A/B test, offline evaluation
• Iterate on data science part
• Iterate on the whole system!
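One common way to read out an A/B test is a two-proportion z-test on conversion counts. The sketch below is a generic textbook version, not a description of how the tests in this talk were evaluated; the function name and the example numbers are assumptions.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)


# Hypothetical numbers: variant B converts 150 of 1000 users, A 100 of 1000.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

Offline evaluation can be run on logged data at any time, while a test like this needs live traffic on the whole system, which is one reason iteration on the whole system matters.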
Data Scientists and Developers
DS&D: Coding
Very different approaches to coding…
← developers
data scientists →
DS&D: Collaboration
• What is the most productive way?
• Ideally, interface on code, not just documentation
• Production logs often become data analysis input!
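Interfacing on code and reusing production logs can go together: if the log record format is defined once in shared code, the production side writes it and the analysis side reads it with the same definition. The sketch below assumes a JSON-lines log and a hypothetical `RecommendationEvent` record; none of these names or fields come from the talk.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RecommendationEvent:
    """Hypothetical shared record, imported by both production and analysis."""
    user_id: str
    article_id: str
    clicked: bool


def to_log_line(event):
    """Production side: serialise one event as a JSON line."""
    return json.dumps(asdict(event), sort_keys=True)


def from_log_line(line):
    """Analysis side: parse a JSON line back into the shared record."""
    return RecommendationEvent(**json.loads(line))


def click_through_rate(lines):
    """Share of logged recommendations that were clicked."""
    events = [from_log_line(line) for line in lines if line.strip()]
    if not events:
        return 0.0
    return sum(e.clicked for e in events) / len(events)


log = [
    to_log_line(RecommendationEvent("u1", "shoe", True)),
    to_log_line(RecommendationEvent("u1", "hat", False)),
    to_log_line(RecommendationEvent("u2", "sock", False)),
]
print(click_through_rate(log))
```

A change to the record then shows up as a code change that both sides see, rather than as a documentation update that one side may miss.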
Summary
• “Static” data analysis vs. production: production is real-time, frequently updated, and monitored.
• Facilitate fast iteration of both data analysis and the production system.
• Data scientists and developers: different approaches, find common ground.
• Organizations: cross-functional teams, microservices.