Daniel Takabayashi
Lucas Bonatto
From exploratory models to production
BIG DATA WEEK
SÃO PAULO 2017
Daniel Takabayashi
daniel.takabayashi@b2wdigital.com
linkedin.com/in/danieltakabayashi
github.com/takabayashi
IT Manager @ B2W Digital
Lucas Bonatto
lucas.bonatto@b2wdigital.com
linkedin.com/in/lucasbonatto
twitter.com/lucasbonatto
github.com/lucasbm88
IT Specialist @ B2W Digital
B2W Digital: e-commerce leader in LatAm
Source: 2016 Results from ri.b2w.digital
Total GMV: R$ 12,458 MM
Market share: 26.2%
The Digital Platform that connects People, Businesses, Products and Services.
Outline
• Context
• Data-driven culture
• Artificial Intelligence
• Domains of knowledge
• Problem Statement
• Marvin
• Main components
• Architecture
• DASFE pattern
• General features
• Case
• Roadmap
Context: data-driven culture
Single source of truth
Data dictionary
Broad data access
Data literacy
Decision making
Why is it important to be data-driven?
Context: data-driven culture
Optimize decision-making
Context: Artificial Intelligence
Machine Learning NLP Computer Vision
• Buy Box optimization
• Forecast demand
• Fraud detection
• Adspend optimization
• Feature extraction from product description
• Product category classification
• Image matching to find associated products
Context: domains of knowledge
Context: model lifecycle
Models Management
Support & Feedback
Model Serving
Exploration & Development
Problem statement
How can we abstract away the complexity of building an AI application?
Building AI projects is not a simple task: it requires advanced knowledge in several different domains.
GitHub.com/marvin-ai
Artificial Intelligence Platform
Marvin Artificial Intelligence Platform
Empowers data science teams to deliver AI applications by simplifying the process of exploration and modeling.
Marvin: main components
ENGINE EXECUTOR
ENGINE
Data acquisitor
Prediction preparator
Predictor
Training preparator
Feedback
Trainer
TOOLBOX
Evaluator
Marvin: context diagram
Marvin: architecture
Marvin: DASFE pattern
• Batch Data Acquisition & Cleaning
• Training Preparation
• Model Training
• Model Evaluation
• Online Prediction Preparation
• Model Prediction
• Online Prediction Feedback
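As a rough illustration of how the DASFE stages hand data to one another, here is a minimal Python sketch. It is not Marvin's API: every function name, the toy order records, and the threshold "model" are assumptions made only for illustration.

```python
from statistics import mean

# Hypothetical stage functions; names mirror the slide, not Marvin's real API.

def acquire_and_clean(raw_orders):
    """Batch data acquisition & cleaning: drop records with missing fields."""
    return [o for o in raw_orders
            if o.get("amount") is not None and o.get("fraud") is not None]

def prepare_training(orders):
    """Training preparation: hold out every 4th record for evaluation."""
    train = [o for i, o in enumerate(orders) if i % 4 != 3]
    holdout = [o for i, o in enumerate(orders) if i % 4 == 3]
    return train, holdout

def train_model(train):
    """Model training: a toy model, a threshold halfway between class means."""
    legit = [o["amount"] for o in train if not o["fraud"]]
    fraud = [o["amount"] for o in train if o["fraud"]]
    return {"threshold": (mean(legit) + mean(fraud)) / 2}

def prepare_prediction(request):
    """Online prediction preparation: turn a request into model input."""
    return float(request["amount"])

def predict(model, amount):
    """Model prediction: flag amounts above the learned threshold."""
    return amount > model["threshold"]

def evaluate(model, holdout):
    """Model evaluation: accuracy on the held-out records."""
    hits = sum(predict(model, prepare_prediction(o)) == o["fraud"] for o in holdout)
    return hits / len(holdout)

def record_feedback(store, request, predicted, actual):
    """Online prediction feedback: keep labeled outcomes for later retraining."""
    store.append({**request, "predicted": predicted, "fraud": actual})

# Toy data, invented for this sketch.
raw = [
    {"amount": 10.0, "fraud": False},
    {"amount": 20.0, "fraud": False},
    {"amount": None, "fraud": False},   # removed by cleaning
    {"amount": 900.0, "fraud": True},
    {"amount": 30.0, "fraud": False},
    {"amount": 1100.0, "fraud": True},
    {"amount": 15.0, "fraud": False},
    {"amount": 1000.0, "fraud": True},
    {"amount": 950.0, "fraud": True},
]

# Batch side: acquire, prepare, train, evaluate.
clean = acquire_and_clean(raw)
train, holdout = prepare_training(clean)
model = train_model(train)
accuracy = evaluate(model, holdout)

# Online side: prepare, predict, then record feedback.
request = {"amount": 700}
is_fraud = predict(model, prepare_prediction(request))
feedback_store = []
record_feedback(feedback_store, request, is_fraud, actual=True)
```

Keeping each stage a separate function means the batch side and the online side can share the trained model while being run and tested independently, which is the separation the pattern suggests.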
Marvin: walkthrough
Marvin: general features
• Training pipeline REST interface
• Experiment and artifacts versioning
• Engine project scaffold generator
• Data sampling and import CLI
• Engine test framework (unit, functional, dry run)
• Toolbox: Python support
• Artifacts persistence layer: HDFS support
• Remote provisioning and deployment
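To make "experiment and artifacts versioning" concrete, the sketch below stores each artifact under a version derived from its content hash. It is only one possible approach and not how Marvin implements it: the function names, the directory layout, and the use of a local temporary directory (as a stand-in for a persistence layer such as HDFS) are all assumptions.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def save_artifact(root: Path, name: str, obj) -> str:
    """Serialize obj as JSON and store it under a content-derived version."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    version = hashlib.sha256(payload).hexdigest()[:12]
    target = root / name / version
    target.mkdir(parents=True, exist_ok=True)
    (target / "artifact.json").write_bytes(payload)
    return version

def load_artifact(root: Path, name: str, version: str):
    """Read back a specific version of a named artifact."""
    path = root / name / version / "artifact.json"
    return json.loads(path.read_text(encoding="utf-8"))

# Hypothetical usage: a local temp directory stands in for the real store.
root = Path(tempfile.mkdtemp())
v1 = save_artifact(root, "model", {"threshold": 507.5})
v2 = save_artifact(root, "model", {"threshold": 480.0})
v1_again = save_artifact(root, "model", {"threshold": 507.5})
restored = load_artifact(root, "model", v1)
```

Because the version is derived from the content, saving an identical artifact twice yields the same version, while any change produces a new one that can be loaded side by side with the old.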
Case: risk analysis model
• XGBoost in Python
• Dataset: 1.0 M orders
• Training pipeline: 15 min
• REST HTTP predictions: 15 ms
• Load test: 100 requests/s with 15 ms mean response time
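The figures above can be turned into two back-of-the-envelope numbers. The sketch reads "mrt" as mean response time and applies Little's law (average requests in flight = throughput × mean response time), which assumes a steady-state load. The training throughput is a plain division and assumes the whole 15 minutes is spent on the 1.0 M orders.

```python
# Figures taken from the slide.
throughput_rps = 100          # load test: requests per second
mean_response_time_ms = 15    # mean response time in milliseconds
dataset_orders = 1_000_000    # 1.0 M orders
training_minutes = 15         # training pipeline duration

# Little's law: average number of requests being served at any moment.
avg_in_flight = throughput_rps * mean_response_time_ms / 1000

# Average rate at which the training pipeline consumes orders.
orders_per_second = dataset_orders / (training_minutes * 60)

print(f"average requests in flight: {avg_in_flight}")
print(f"training throughput: {orders_per_second:.0f} orders/s")
```

At this load the service handles about 1.5 concurrent requests on average, so the test says more about latency than about behavior under heavy concurrency.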
Case: how did Marvin help the team?
“... Jupyter notebook integration with Spark through Marvin’s toolbox lib was very helpful during the prototyping phase...”
“... the data import utility speeds up data collection and sampling...”
“... it was easy to do feature engineering, feature selection and model choice using the DASFE model...”
“... we automated the training and deployment phase without having a DevOps person on our team...”
Marvin: roadmap
• Admin module
• Toolbox: Java and Scala support
• Feedback server
• Artifacts persistence layer: S3 and local FS support
• Remote provisioning and deployment: Azure, AWS and GCP
• Customized notebook kernel
• Automated feature engineering
• Hyperparameter support
• ML for non-data scientists
• …
Fork me on GitHub.com/marvin-ai
and feel free to contribute!
Thank you!
GitHub.com/marvin-ai
twitter.com/_marvin_ai
marvin-ai@googlegroups.com

Marvin Platform: Empowering Machine Learning Teams