The Model and the Train Wreck - A Training Data How-To -- @mrogati's talk at Strata 2012
by Monica Rogati
Getting training data for a recommender system is easy: if users clicked it, it’s a positive – if they didn’t, it’s a negative.
… Or is it? Users can only click what your current system chooses to show them, so you’ve probably just learned an algorithm that runs on top of your existing algorithm, and you’ll do so again every time you retrain. And what do you do when the data product you’re building doesn’t have any users yet? Do you really launch with random results, hand-label 50K examples, or ask a Turker to pretend they’re User #1337?
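A minimal sketch of the labeling question above, assuming two hypothetical log shapes: `impressions` maps each user to the items they were actually shown, and `clicks_by_user` maps each user to the set of items they clicked. The naive scheme marks every unclicked catalog item as a negative; the second scheme only labels items the user could have seen.

```python
def label_naive(catalog, clicks_by_user):
    """Naive scheme: clicked items are positives, every other catalog item is a negative."""
    examples = []
    for user, clicked in clicks_by_user.items():
        for item in catalog:
            examples.append((user, item, 1 if item in clicked else 0))
    return examples


def label_from_impressions(impressions, clicks_by_user):
    """Only items the user was shown get a label; unshown items stay unlabeled
    instead of being counted as negatives."""
    examples = []
    for user, shown in impressions.items():
        clicked = clicks_by_user.get(user, set())
        for item in shown:
            examples.append((user, item, 1 if item in clicked else 0))
    return examples


catalog = ["a", "b", "c", "d"]
impressions = {"u1": ["a", "b"]}
clicks = {"u1": {"a"}}
print(label_naive(catalog, clicks))
print(label_from_impressions(impressions, clicks))
```

In the naive version, items "c" and "d" become negatives even though the user never saw them. Restricting labels to impressions removes that error, but the shown set was still picked by the existing system, so the labels still carry its presentation bias.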
Unlike having a better algorithm, having better training data can improve your results by orders of magnitude. Yet training data generation is often an afterthought—a footnote in a formula-filled publication.
In this talk, we use examples from production recommender systems to bring training data to the forefront: from overcoming presentation bias to the art of crowdsourcing subjective judgments to creative data exhaust exploitation and feature creation.
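The abstract does not say which techniques the talk uses against presentation bias. As one illustration, here is a sketch of the "click > skip above" heuristic from the learning-to-rank literature. It is a swapped-in example, not necessarily the talk's method. Instead of absolute labels, it emits pairwise preferences: a clicked item is preferred over each unclicked item ranked above it, which the user presumably looked at and passed over. The function name `skip_above_pairs` is hypothetical.

```python
def skip_above_pairs(ranking, clicked):
    """Return (preferred, less_preferred) pairs: each clicked item beats every
    unclicked item that appeared above it in the ranking shown to the user."""
    pairs = []
    for pos, item in enumerate(ranking):
        if item in clicked:
            for above in ranking[:pos]:
                if above not in clicked:
                    pairs.append((item, above))
    return pairs


print(skip_above_pairs(["a", "b", "c", "d"], {"c"}))
```

Items below the lowest click (here "d") produce no pairs, because the user may never have scrolled that far. This avoids treating them as negatives.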
Usage Rights
© All Rights Reserved