Hardcore Data Science in Practice
Dr. Mikio L. Braun, Delivery Lead for Recommendation and Search

StrataConf 2016, London



mikio.braun@zalando.de
@mikiobraun
tech.zalando.com
Mikio Braun, Hardcore Data Science in Practice, Strata+Hadoop World 2016, London
• 15 countries, 3 warehouses, 16+ million customers, 3bn€ revenue in 2015, …
• Heavily using data science for recommendation
Recommendations
Data Driven Recommendations
• Collaborative filtering
• Content-based recommendation
• Personalised recommendations
• …
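To make the first bullet concrete, here is a minimal sketch of item-to-item collaborative filtering. It is not the system described in the talk: the toy `interactions` data, the function names, and the choice of cosine similarity over co-purchasing users are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(interactions):
    """Cosine similarity between items, given a {user: set_of_items} mapping."""
    # Invert the mapping: which users interacted with each item?
    item_users = defaultdict(set)
    for user, items in interactions.items():
        for item in items:
            item_users[item].add(user)

    sims = defaultdict(dict)
    for a in item_users:
        for b in item_users:
            if a == b:
                continue
            overlap = len(item_users[a] & item_users[b])
            if overlap:
                sims[a][b] = overlap / sqrt(len(item_users[a]) * len(item_users[b]))
    return sims


def recommend(user, interactions, sims, k=3):
    """Score unseen items by summed similarity to the items the user already has."""
    seen = interactions[user]
    scores = defaultdict(float)
    for item in seen:
        for other, similarity in sims.get(item, {}).items():
            if other not in seen:
                scores[other] += similarity
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]


# Hypothetical toy data, not taken from the talk.
interactions = {
    "alice": {"sneakers", "jeans"},
    "bob": {"sneakers", "jeans", "jacket"},
    "carol": {"jeans", "jacket", "scarf"},
}
sims = item_similarities(interactions)
top = recommend("alice", interactions, sims)
```

The quadratic loop over item pairs is fine for a toy example; a catalogue of realistic size would need a sparse or approximate approach.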
For Example, One-pass Ranking Models
(Freno, Jenatton, Saveski, Archambeau, “One-Pass Ranking Models for Low-Latency Product Recommendations”, KDD 2015)
Hardcore Data Science to Production
• Usually one-shot computation
• Sometimes done in Python
• Getting raw data hard initially
Production System
• Realtime system
• Usually done in Java/JVM based
• Events and article data continually upgraded
Data Science vs. Production
• A/B test, offline evaluation
• Iterate on data science part
• Iterate on the whole system!
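The two evaluation modes named above can be sketched in a few lines. This is an illustrative assumption, not the evaluation used in the talk: `precision_at_k` stands in for an offline metric on held-out data, and `ab_test_z` is a standard two-proportion z-test for comparing conversion rates of two variants in an online A/B test. All names and numbers are hypothetical.

```python
from math import erf, sqrt


def precision_at_k(recommended, relevant, k):
    """Offline metric: share of the top-k recommendations that are relevant."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(top_k)


def ab_test_z(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test; returns (z, two-sided p-value).

    Assumes both groups are non-empty and the pooled rate is strictly
    between 0 and 1, otherwise the standard error is zero.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)


# Offline: compare a ranked list against held-out interactions.
offline_score = precision_at_k(["jacket", "scarf", "hat"], {"jacket", "hat"}, k=2)

# Online: hypothetical conversions out of 1000 sessions per variant.
z, p_value = ab_test_z(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
```

The offline number can be recomputed after every model change, while the A/B test needs live traffic, which is one reason to iterate on the whole system rather than only on the data science part.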
Data Scientists and Developers
DS&D: Coding
Very different approaches to coding…
← developers
data scientists →
DS&D: Collaboration
• What is the most productive way?
• Ideally, interface on code, not just documentation
• Production logs often become data analysis input!
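As one way to picture the last bullet, the sketch below turns production log lines into an analysis input. The log format (one JSON object per line with `event` and `variant` fields) is an assumption made for this example, not the format used at Zalando.

```python
import json
from collections import Counter

# Hypothetical log lines; the last one is deliberately malformed.
LOG_LINES = [
    '{"event": "impression", "variant": "A", "article": "sneakers"}',
    '{"event": "click", "variant": "A", "article": "sneakers"}',
    '{"event": "impression", "variant": "A", "article": "jeans"}',
    '{"event": "impression", "variant": "B", "article": "jacket"}',
    "2016-06-01 12:00:00 WARN connection reset",
]


def click_through_rates(lines):
    """Clicks per impression for each variant, skipping unparseable lines."""
    impressions, clicks = Counter(), Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # production logs are rarely perfectly clean
        if not isinstance(record, dict):
            continue
        variant = record.get("variant")
        if record.get("event") == "impression":
            impressions[variant] += 1
        elif record.get("event") == "click":
            clicks[variant] += 1
    return {variant: clicks[variant] / count for variant, count in impressions.items()}


rates = click_through_rates(LOG_LINES)
```

If developers and data scientists agree on such a log schema in code, the analysis side does not depend on documentation staying in sync with the production system.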
Organization
• Cross-functional teams
• Communication!
• Microservices, at Zalando: STUPS (Docker on AWS)
Summary
• “Static” Data Analysis vs. Production: Real-time, frequently update & monitor.
• Facilitate fast iteration of data analysis & production system.
• Data Scientists and Developers: Different approaches, find a common ground.
• Organizations: Cross-functional teams, microservices.
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
1C_PNS.pdf Philippines National standard
1C_PNS.pdf Philippines National standard1C_PNS.pdf Philippines National standard
1C_PNS.pdf Philippines National standard
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 

Hardcore Data Science - in Practice

  • 1. Hardcore Data Science - in Practice. Dr. Mikio L. Braun, Delivery Lead for Recommendation and Search. StrataConf 2016, London. mikio.braun@zalando.de, @mikiobraun, tech.zalando.com
  • 2. Mikio Braun, Hardcore Data Science in Practice, Strata+Hadoop World 2016, London • 15 countries, 3 warehouses, 16+ million customers, 3bn€ revenue in 2015, … • Heavily using data science for recommendation
  • 3. Recommendations
  • 4. Data Driven Recommendations • Collaborative filtering • Content based recommendation • Personalised recommendations • …
  • 5. For Example, One-pass Ranking Models (Freno, Jenatton, Saveski, Archambeau, “One-Pass Ranking Models for Low-Latency Product Recommendations”, KDD 2015)
  • 6. Hardcore Data Science to Production • Usually one shot computation • Sometimes done in Python • Getting raw data hard initially
  • 7. Production System • Realtime system • Usually done in Java/JVM based • Events and article data continually upgraded
  • 8. Data Science vs. Production • A/B Test offline evaluation • Iterate on data science part • Iterate on the whole system!
  • 9. Data Scientists and Developers
  • 10. DS&D: Coding Very different approaches to coding… ← developers data scientists →
  • 11. DS&D: Collaboration • What is the most productive way? • Ideally, interface on code, not just documentation • Production logs often become data analysis input!
  • 12. Organization • Cross-functional teams • Communication! • Microservices, at Zalando: STUPS (Docker on AWS)
  • 13. Summary • “Static” Data Analysis vs. Production: Real-time, frequently update & monitor. • Facilitate fast iteration of data analysis & production system. • Data Scientists and Developers: Different approaches, find a common ground • Organizations: Cross-functional teams, micro services