Hardcore Data Science in Practice
Dr. Mikio L. Braun, Delivery Lead for Recommendation and Search

StrataConf 2016, London



mikio.braun@zalando.de
@mikiobraun
tech.zalando.com
Mikio Braun, Hardcore Data Science in Practice, Strata+Hadoop World 2016, London
• 15 countries, 3 warehouses, 16+ million customers, 3bn€ revenue in 2015, …
• Heavily using data science for recommendation
Recommendations
Data Driven Recommendations
• Collaborative filtering
• Content-based recommendation
• Personalised recommendations
• …
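To make the first bullet concrete, here is a minimal sketch of item-based collaborative filtering on implicit feedback: two articles count as similar when the same customers interacted with both (cosine similarity), and a customer's unseen articles are scored by their similarity to what the customer already has. This is a generic textbook variant, not a description of Zalando's system; the data and the function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(interactions):
    """Cosine similarity between articles based on shared customers.

    interactions: dict mapping customer id -> set of article ids (implicit feedback).
    """
    # Invert the data: article -> set of customers who interacted with it
    users_by_item = defaultdict(set)
    for customer, items in interactions.items():
        for item in items:
            users_by_item[item].add(customer)

    sims = defaultdict(dict)
    items = sorted(users_by_item)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            overlap = len(users_by_item[a] & users_by_item[b])
            if overlap:
                score = overlap / sqrt(len(users_by_item[a]) * len(users_by_item[b]))
                sims[a][b] = score
                sims[b][a] = score
    return sims


def recommend(customer, interactions, sims, k=3):
    """Score unseen articles by summing their similarity to the customer's articles."""
    seen = interactions.get(customer, set())
    scores = defaultdict(float)
    for item in seen:
        for other, score in sims.get(item, {}).items():
            if other not in seen:
                scores[other] += score
    # Highest score first; ties broken by article id so the order is deterministic
    return sorted(scores, key=lambda a: (-scores[a], a))[:k]


# Hypothetical toy data
interactions = {
    "alice": {"shoes", "socks"},
    "bob": {"shoes", "socks", "hat"},
    "carol": {"shoes", "hat"},
    "dave": {"scarf"},
}
sims = item_similarities(interactions)
print(recommend("alice", interactions, sims))
```

The pairwise loop is quadratic in the number of articles, which is fine for illustration; at catalogue scale one would only compare articles that actually co-occur.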
For Example, One-pass Ranking Models
(Freno, Jenatton, Saveski, Archambeau, “One-Pass Ranking Models for Low-Latency Product Recommendations”, KDD 2015)
Hardcore Data Science to Production
• Usually a one-shot computation
• Sometimes done in Python
• Getting raw data is hard initially
Production System
• Real-time system
• Usually Java/JVM-based
• Events and article data are continually updated
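A production recommender has to absorb a continuous stream of events rather than recompute everything in one shot. A minimal sketch of that idea, assuming a simple time-decayed popularity score per article: each event costs O(1) work, and older events fade with a configurable half-life. The class, the half-life and the event weights are hypothetical; a real service in this setting would more likely be written on the JVM, Python is used here only for illustration.

```python
import math


class DecayedPopularity:
    """Article popularity that is updated per event and fades over time.

    Each score is stored with the time of its last update and decayed lazily
    with half-life `half_life_s`, so handling an event is O(1).
    """

    def __init__(self, half_life_s=3600.0):
        self.rate = math.log(2) / half_life_s
        self.scores = {}  # article id -> (score, last update time in seconds)

    def _decayed(self, article, now):
        score, t = self.scores.get(article, (0.0, now))
        return score * math.exp(-self.rate * (now - t))

    def on_event(self, article, now, weight=1.0):
        # Hypothetical weighting: 1 for a view, larger values for stronger signals
        self.scores[article] = (self._decayed(article, now) + weight, now)

    def top(self, now, k=3):
        ranked = sorted(self.scores, key=lambda a: self._decayed(a, now), reverse=True)
        return ranked[:k]


pop = DecayedPopularity(half_life_s=60.0)
pop.on_event("sneaker", now=0.0, weight=2.0)
pop.on_event("jacket", now=120.0)
# After two half-lives the sneaker's score has decayed from 2.0 to 0.5
print(pop.top(now=120.0))
```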
Data Science vs. Production
• A/B testing and offline evaluation
• Iterate on the data science part
• Iterate on the whole system!
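The slide pairs offline evaluation with A/B tests. A minimal sketch of one common choice for each, assuming precision@k on held-out interactions for the offline side and a pooled two-proportion z-test on conversion rates for the online side; the deck does not say which metrics or tests were actually used, and the numbers below are made up.

```python
import math


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that appear in the held-out set."""
    top = recommended[:k]
    return sum(1 for item in top if item in relevant) / k


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic and two-sided p-value for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate under "no difference"
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Offline: one of the top 3 recommendations was actually bought later
print(precision_at_k(["a", "b", "c", "d"], {"b", "d", "x"}, k=3))

# Online: control converts 200 of 10,000 sessions, variant 250 of 10,000
z, p = two_proportion_z(200, 10_000, 250, 10_000)
print(round(z, 2), round(p, 3))
```

Offline metrics are cheap and allow fast iteration on the data science part; only the A/B test measures the effect of the whole system on real customers.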
Data Scientists and Developers
DS&D: Coding
Very different approaches to coding…
← developers | data scientists →
DS&D: Collaboration
• What is the most productive way?
• Ideally, interface through code, not just documentation
• Production logs often become data analysis input!
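One way to interface through code is to share a single event schema between the production service and the analysis code, so production logs can be read back directly as training data. A minimal sketch, assuming JSON-lines logs with `customer`, `article` and `action` fields; the format, the field names and the action values are hypothetical, not Zalando's actual log schema.

```python
import json
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Shared event schema, defined once and used by both service and analysis code."""
    customer: str
    article: str
    action: str  # hypothetical values such as "view", "cart", "order"

    @classmethod
    def from_json(cls, line):
        raw = json.loads(line)
        return cls(raw["customer"], raw["article"], raw["action"])


def interactions_from_log(lines, actions=("cart", "order")):
    """Turn raw production log lines into customer -> set of articles for training."""
    interactions = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:  # skip empty lines, e.g. a trailing newline in the log file
            continue
        event = Event.from_json(line)
        if event.action in actions:
            interactions[event.customer].add(event.article)
    return dict(interactions)


log = [
    '{"customer": "c1", "article": "a1", "action": "view"}',
    '{"customer": "c1", "article": "a2", "action": "order"}',
    '{"customer": "c2", "article": "a2", "action": "cart"}',
    "",
]
print(interactions_from_log(log))
```

Because the schema lives in code, a field renamed on the production side breaks the analysis loudly instead of silently producing wrong training data.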
Organization
• Cross-functional teams
• Communication!
• Microservices; at Zalando: STUPS (Docker on AWS)
Summary
• “Static” data analysis vs. production: production is real-time and must be updated and monitored frequently.
• Facilitate fast iteration on both the data analysis and the production system.
• Data Scientists and Developers: different approaches; find common ground.
• Organization: cross-functional teams, microservices.