Lessons Learned
More Data vs. Better Models
Really?
Anand Rajaraman: Former Stanford Prof. & Senior VP at Walmart
Sometimes, it’s not about more data
Norvig: “Google does not have better Algorithms, only more Data”
Many features / low-bias models
You might not need all your “Big Data”
Sometimes you do need a Complex Model
It pays off to be smart about Hyperparameters
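One common way to be smarter than an exhaustive grid is random search: sample configurations, score each on held-out data, and keep the best. The slide does not say which method to use, so this is only a minimal sketch; `toy_objective` is a hypothetical stand-in for "train the model and score it on validation data", and the parameter names `learning_rate` and `factors` are illustrative.

```python
import random

def random_search(objective, space, n_trials=20, seed=0):
    """Random search: sample configurations from `space` and keep the best one.

    `space` maps each hyperparameter name to a list of candidate values;
    `objective` returns a validation score where higher is better.
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        # Draw one value per hyperparameter independently.
        params = {name: rng.choice(values) for name, values in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical objective: stands in for "train the model, score it on validation data".
def toy_objective(params):
    return -(params["learning_rate"] - 0.01) ** 2 - 0.001 * abs(params["factors"] - 50)

space = {"learning_rate": [0.001, 0.003, 0.01, 0.03, 0.1], "factors": [10, 25, 50, 100]}
best_params, best_score = random_search(toy_objective, space, n_trials=50)
print(best_params, best_score)
```

Random search is often competitive with a full grid when only a few hyperparameters really matter; when each trial is expensive, model-based approaches such as Bayesian optimization are another option.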
Supervised plus (not vs.) Unsupervised Learning
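A common way to combine the two is to run an unsupervised step first and feed its output to a supervised model as extra features. The sketch below assumes that pattern: plain k-means clusters the raw vectors, and the hypothetical helper `add_cluster_features` appends a one-hot "nearest cluster" indicator. The toy points are made up for illustration.

```python
import math
import random

def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means on tuples of numbers; returns the k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct data points
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        centroids = [
            tuple(sum(coords) / len(members) for coords in zip(*members)) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
    return centroids

def add_cluster_features(x, centroids):
    """Append a one-hot 'nearest cluster' indicator to the raw feature vector x."""
    nearest = min(range(len(centroids)), key=lambda c: math.dist(x, centroids[c]))
    return list(x) + [1.0 if c == nearest else 0.0 for c in range(len(centroids))]

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids = sorted(kmeans(points, 2))
print(add_cluster_features((0.2, 0.1), centroids))  # [0.2, 0.1, 1.0, 0.0]
```

The augmented vectors would then go to whatever supervised learner is in use; the cluster count `k` becomes one more hyperparameter to tune.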
Everything is an ensemble
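The simplest ensemble is a linear blend of several models' predictions, with weights fitted on held-out data. This minimal sketch assumes a grid search over convex weights (non-negative, summing to 1) and uses made-up predictions; stacking with a learned meta-model is the more general version of the same idea.

```python
from itertools import product

def blend(model_scores, weights):
    """Weighted sum of several models' predictions, item by item."""
    return [sum(w * preds[j] for w, preds in zip(weights, model_scores))
            for j in range(len(model_scores[0]))]

def mse(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def fit_blend_weights(model_scores, targets, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Pick convex blending weights (summing to 1) that minimize held-out MSE."""
    best_w, best_err = None, float("inf")
    for w in product(grid, repeat=len(model_scores)):
        if abs(sum(w) - 1.0) > 1e-9:
            continue  # only consider convex combinations
        err = mse(blend(model_scores, w), targets)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err

# Hypothetical held-out predictions from two base models and the true targets.
model_a = [1.0, 2.0, 3.0, 4.0]
model_b = [3.0, 2.0, 1.0, 0.0]
targets = [2.0, 2.0, 2.0, 2.0]
weights, err = fit_blend_weights([model_a, model_b], targets)
print(weights, err)  # (0.5, 0.5) 0.0
```

Fitting the weights on data the base models were not trained on matters; otherwise the blend simply favors whichever model overfits most.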
The output of your model will be the input of another one
(and other system design problems)
The pains & gains of Feature Engineering
Implicit signals beat explicit ones (almost always)
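One well-known way to train on implicit signals (plays, clicks, views) instead of explicit ratings is the preference/confidence formulation of Hu, Koren & Volinsky (2008): any interaction means "likes it" (p = 1), and more interactions mean more confidence (c = 1 + alpha * count). The slide does not name a method, so treat this as an assumed illustration; the helper names and the play counts are hypothetical, and `alpha` is a tuning knob.

```python
def implicit_to_preference(event_counts, alpha=40.0):
    """Map raw implicit counts r_ui to (preference, confidence) pairs.

    p_ui = 1 if r_ui > 0 else 0, and c_ui = 1 + alpha * r_ui.
    """
    return {
        (user, item): (1.0 if count > 0 else 0.0, 1.0 + alpha * count)
        for (user, item), count in event_counts.items()
    }

def weighted_squared_loss(prefs, predict):
    """Confidence-weighted squared error of a scoring function predict(user, item)."""
    return sum(c * (p - predict(u, i)) ** 2 for (u, i), (p, c) in prefs.items())

# Hypothetical play counts: u1 played i1 three times and never played i2.
prefs = implicit_to_preference({("u1", "i1"): 3, ("u1", "i2"): 0}, alpha=2.0)
print(prefs)  # {('u1', 'i1'): (1.0, 7.0), ('u1', 'i2'): (0.0, 1.0)}
print(weighted_squared_loss(prefs, lambda u, i: 0.5))  # 2.0
```

Unobserved pairs still contribute with confidence 1, so the model learns that "not played" is weak negative evidence rather than a missing value.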
Be thoughtful about your Training Data
Your Model will learn what you teach it to learn
Learn to deal with Presentation Bias
[Figure: positions on the page range from “More likely to see” to “Less likely”]
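One standard correction for presentation (position) bias is inverse propensity scoring: assume a click requires the user to examine the position first, and up-weight each click by 1 / P(examine position). The sketch below assumes those examination probabilities are already known (in practice they have to be estimated, for example from randomized experiments); the numbers are hypothetical.

```python
def naive_ctr(impressions):
    """Fraction of impressions that were clicked, ignoring position."""
    return sum(clicked for _, clicked in impressions) / len(impressions)

def ips_ctr(impressions, examine_prob):
    """Inverse-propensity-scored click rate.

    impressions: list of (position, clicked) pairs for one item.
    examine_prob[pos]: assumed probability that a user looks at that position.
    Clicks at rarely examined positions count for more.
    """
    return sum(1.0 / examine_prob[pos] for pos, clicked in impressions if clicked) / len(impressions)

examine_prob = {1: 1.0, 3: 0.5}  # hypothetical: position 3 is seen half as often
item_a = [(1, True), (1, False)]                          # shown at the top
item_b = [(3, True), (3, False), (3, False), (3, False)]  # shown lower down
print(naive_ctr(item_a), naive_ctr(item_b))                          # 0.5 0.25
print(ips_ctr(item_a, examine_prob), ips_ctr(item_b, examine_prob))  # 0.5 0.5
```

The naive rate makes item_b look half as good only because it was shown lower; after correction the two items look equally appealing. IPS estimates can be noisy, and with few impressions they can even exceed 1.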
Data and Models are great. You know what’s even better?
The right evaluation approach!
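For ranking problems, an offline metric that rewards putting relevant items near the top is a typical starting point. This sketch of NDCG@k assumes graded relevance labels and the linear-gain variant (a 2^rel - 1 gain is also common); it is one possible offline metric, ideally checked against online A/B test results, not the deck's prescribed approach.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the top-k items, in ranked order."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (sorted) ranking, in [0, 1]."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Graded relevance of the items a model ranked 1st, 2nd, 3rd (hypothetical).
print(ndcg_at_k([3, 2, 1], k=3))  # 1.0
print(ndcg_at_k([0, 0, 1], k=3))  # 0.5
```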
You don’t need to distribute your ML algorithm
But, if you do, you should understand at what level to do it
The three levels of Distribution/Parallelization
● For each subset of the population (e.g. region)
● For each combination of the hyperparameters
● For each subset of the training data
Each level has different requirements
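The first two levels (per population subset and per hyperparameter combination) produce fully independent training jobs, so they parallelize without changing the learning algorithm. The sketch below assumes that setup: `train_and_score` is a hypothetical stand-in for training one model per region, and the regions and grid are made up. The third level, splitting the training data of a single model, needs a distributed algorithm and is not shown.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def train_and_score(region, params):
    """Hypothetical stand-in for 'train a model for one region, return its validation score'."""
    best_reg = 0.1 if region == "us" else 0.01  # made-up optimum per region
    return -abs(params["reg"] - best_reg) - 0.001 * params["factors"]

def run_grid(regions, grid, max_workers=4):
    """Fan out one job per (region, hyperparameter combination) and keep the best per region."""
    names = list(grid)
    jobs = [(region, dict(zip(names, values)))
            for region in regions
            for values in product(*(grid[n] for n in names))]
    # Jobs are independent, so they can run in parallel. Threads keep the sketch
    # simple; CPU-bound training would use processes or separate cluster jobs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(lambda job: train_and_score(*job), jobs))
    best = {}
    for (region, params), score in zip(jobs, scores):
        if region not in best or score > best[region][1]:
            best[region] = (params, score)
    return best

best = run_grid(["us", "eu"], {"reg": [0.01, 0.1], "factors": [10, 50]})
print(best["us"][0], best["eu"][0])  # {'reg': 0.1, 'factors': 10} {'reg': 0.01, 'factors': 10}
```

Because nothing is shared between jobs, the same fan-out works unchanged whether the executor is a thread pool, a process pool, or a cluster scheduler.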
ANN training over distributed GPUs
Some things are better done Online and others Offline… and there is Nearline for everything in between
System Overview
● Blueprint for multiple personalization algorithm services
● Ranking
● Row selection
● Ratings
● …
● Recommendation involving multi-layered Machine Learning
Matrix Factorization Example
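Only the slide title survives here, so as a reference point this is a minimal sketch of basic matrix factorization trained with stochastic gradient descent: each user and item gets a k-dimensional vector, and a rating is predicted by their dot product. The toy ratings and the hyperparameters `k`, `lr`, `reg`, and `epochs` are assumptions, and production variants usually add user and item bias terms.

```python
import random

def predict(P, Q, u, i):
    """Predicted rating: dot product of user and item factor vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def train_mf(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=1000, seed=0):
    """Factor a rating matrix R ~ P Q^T with SGD on the observed entries only.

    ratings: list of (user_index, item_index, rating) triples.
    Each step follows the gradient of the L2-regularized squared error
    (constant factors are folded into lr).
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]  # use the old values for both updates
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def rmse(P, Q, ratings):
    return (sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5

# Hypothetical ratings from 3 users on 2 items.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 1, 1), (2, 0, 1), (2, 1, 5)]
P, Q = train_mf(ratings, n_users=3, n_items=2)
print(round(rmse(P, Q, ratings), 3))
```

On real data the RMSE that matters is the one on held-out ratings, not the training error printed here.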
The two faces of your ML infrastructure
Why you should care about answering questions (about your model)
The untold story of Data Science and/vs. ML Engineering
Barcelona ML Meetup - Lessons Learned