
The Model and the Train Wreck - A Training Data How-To -- @mrogati's talk at Strata 2012

by Monica Rogati, Data Scientist at LinkedIn, on Mar 12, 2012

Getting training data for a recommender system is easy: if users clicked it, it’s a positive – if they didn’t, it’s a negative.

… Or is it? You’ve probably learned an algorithm to run on top of your existing algorithm, now and every time you re-train. And what do you do when the data product you’re building doesn’t have any users yet? Do you really launch with random results, hand-label 50K examples, or ask a Turker to pretend they’re User #1337?

Unlike having a better algorithm, having better training data can improve your results by orders of magnitude. Yet training data generation is often an afterthought—a footnote in a formula-filled publication.

In this talk, we use examples from production recommender systems to bring training data to the forefront: from overcoming presentation bias to the art of crowdsourcing subjective judgments to creative data exhaust exploitation and feature creation.
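To make the click-labeling question concrete, here is a minimal Python sketch. It is not taken from the talk: the `Impression` record, the function names, and the "only label items at or above the lowest click" rule are illustrative assumptions. The rule is one common heuristic for reducing presentation bias, on the reasoning that items ranked below the last click may never have been seen, so treating them as negatives mostly teaches the model the old ranking.

```python
from typing import List, NamedTuple, Tuple


class Impression(NamedTuple):
    """One item shown to a user in one session (hypothetical log schema)."""
    item_id: str
    position: int  # 1 = top of the displayed list
    clicked: bool


def naive_labels(impressions: List[Impression]) -> List[Tuple[str, int]]:
    """Clicked -> positive (1), everything else shown -> negative (0)."""
    return [(imp.item_id, 1 if imp.clicked else 0) for imp in impressions]


def examined_only_labels(impressions: List[Impression]) -> List[Tuple[str, int]]:
    """Label only items at or above the lowest clicked position.

    Assumption: the user scanned the list top-down at least as far as the
    lowest click, so unclicked items above it were plausibly seen and
    rejected. Sessions without any click yield no examples at all.
    """
    clicked_positions = [imp.position for imp in impressions if imp.clicked]
    if not clicked_positions:
        return []
    lowest_click = max(clicked_positions)
    return [
        (imp.item_id, 1 if imp.clicked else 0)
        for imp in impressions
        if imp.position <= lowest_click
    ]
```

For a session where only the second of four displayed items is clicked, the naive rule produces three negatives, while the examined-only rule keeps just the first item as a negative and discards the two items below the click.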

Comments

  • mrogati Monica Rogati, Data Scientist at LinkedIn Video: http://www.youtube.com/watch?v=F7iopLnhDik 1 year ago
  • mrogati Monica Rogati, Data Scientist at LinkedIn My slides are very image-driven (and there's some animation), so if you'd like to understand the talk you can follow along w/ the speaker notes (see tab above). 2 years ago

The Model and the Train Wreck - A Training Data How-To -- @mrogati's talk at Strata 2012: Presentation Transcript