Fighting Spam at Flickr

Slides for my "Fighting Spam at Flickr" session from Web2Expo 2010 in San Francisco

  • I am living proof that you can work at a photo site, own a really nice camera and still take really crappy photos. Simon couldn’t make it because he got married this past weekend and is off on his honeymoon.

  • Fighting spam can be very depressing

  • Whenever you “optimize” an email, you’re optimizing it for the spammers as well
    Mom example: she’s not great with computers and only uses Flickr when I send something along, so she’s likely to assume that any mail from Flickr is from me.
  • Other sites’ spam detection: it all looks like “flickr.com” to them!

  • Good, you’re popular. But that also means more spam.

  • You know all that work you did to make sure your emails get delivered? The spammers thank you.
  • JUST the Storm botnet
  • Story about Spamhaus

  • An important point: once the spam leaves your site, it damages your site’s reputation with other sites trying to combat spam, namely email providers.
  • The amount of work involved in dealing with the aftermath of a large-scale spam attack when operating this way is insane. Engineers, support staff, ops - everyone is just doing manual, tedious work: deleting accounts, going through user reports, etc.

  • Thank Simon for fighting the good fight

  • (I started out as a tools guy.)
    Your tools should be very clear and easy to use, and allow for easy batch operations.

  • How long can you delay sending a message? In most cases, quite a bit of time. Things like comments have to show up immediately, but you can delay the email notification.

  • Fighting Spam at Flickr

    1. Finding True Love on the Internet With Matthew Rothenberg and Stewart Butterfield
    2. Just kidding. This is much less exciting.
    3. Fighting Spam at Flickr • Your Spam. Our Balls. • Mikhail Panchenko and Simon Batistoni
    4. Spammers • Numerous • Diverse • Inventive • Ubiquitous - if there’s a textbox with an implied recipient, they will spam it.
    5. A simpler time • Sending spam is an incredibly complicated scheme these days • Highly distributed botnets of unsuspecting, heterogeneous machines • The result of a long, long arms race • That means that combatting it is complicated as well
    6. Skynet is here • Bots/scripts are able to sign up for accounts (including solving the captcha), log into Flickr, upload photos, set their buddy icon, and start sending spam • You can also buy these accounts in bulk...
    7. The Harsh Truth • Someone whose time is really cheap is constantly working to send spam through your site
    8. Image from http://icanhascheezburger.com/2008/01/22/funny-pictures-sisyphus-cat-tries-again/
    9. "The struggle itself...is enough to fill a man's heart. One must imagine Sisyphus happy." Albert Camus, The Myth of Sisyphus
    10. ... but there’s hope; we’ll get to that later
    16. Social Sites as Gateways • User-generated content • “User” is a broad category that includes “spammer asshole” • Email notifications for said content • Relationship-based, trust-inducing • Mom gets excited any time she gets an email from Flickr
    17. What Trust Means • Something familiar that a user is used to opening • Increases the likelihood that a user will open the email and perform whatever it is that you want them to • Piggybacking on the research and work done by the site itself!
    18. More on Trust • Very easy to lose - other services will blackhole mail coming from your domains • Users stop coming • Very hard to regain - the burden of proof ends up entirely on you
    20. The Answer is Simple • Don’t let users generate content!
    21. The Economics of Spam (an excuse to pretend to use my degree)
    23. The Demand: sites want exposure, sometimes at any cost • The Supply: trusted message gateways • The broken part: someone else is selling your gateway
    24. Econ 101
    25. Econ, Continued • The better known your site gets, the higher the demand for your message delivery mechanism, because a recipient is more likely to actually open the message
    26. ANOTHER GRAPH!
    27. In a Perfect World
    28. Some Numbers • "Spamalytics: An Empirical Analysis of Spam Marketing Conversion." C. Kanich, C. Kreibich, K. Levchenko, B. Enright, G. Voelker, V. Paxson, and S. Savage. 15th ACM Conference on Computer and Communications Security (CCS), 27-31 October 2008, Alexandria, VA. http://www.icsi.berkeley.edu/pubs/networking/2008-ccs-spamalytics.pdf
    29. Some Numbers • 0.0000081% overall conversion rate • 28 conversions for every 347,590,389 emails attempted
    30. Where we fit • Only ~25% of the attempted emails were actually accepted by the mail server (the first step in the funnel) • Using a social site as a gateway almost guarantees a higher number • A whole lot of effort goes into making sure notifications get delivered
    31. Put some $$ on it • $3.5 million of revenue in a year • 5% increase in delivery rate = $175,000/yr
    32. They figured this out
    33. Back to Trust • This can’t be ignored • Remember, once you lose that trust, it’s a long way back up • As you lose your trustworthiness as a message gateway, the spammers go away • ... but so do the users
    34. Fighting Back
    38. Traditional Prevention • Captchas • Mass signup detection using IPs • Rate limiting
    39. These are mostly good things, and it certainly doesn’t hurt to have them ... however ...
    43. A Confession • I almost always have to type a captcha code twice • Bots consistently evolve to solve increasingly complex variations • Draw your conclusions
    44. Photo from http://www.flickr.com/photos/azkid2dc
    48. The Tension • Want to be able to allow users to send messages and generally enjoy themselves • Don’t want to make it too easy to send spam • Traditional prevention techniques like captchas result in epic degradation of UX and ultimately end up ineffective
    49. Traditional Response • User reports • Manual account removal • Manual message cleanup • ... except you can’t clean up the email once it’s sent • Manually adding patterns to a list of things to filter • Engineers running mass deletion/cleanup scripts
    50. Photo from http://www.flickr.com/photos/mekin/
    51. What a Waste • Responding to incidents this way is a huge drain on resources and morale • That’s time your team could be spending on projects, features, being happy...
    53. The Alternative • A holistic, comprehensive approach (aka “take this shit seriously”)
    57. Make Time • Product teams might be reluctant to put spam on the roadmap and dedicate resources to it • ... until you miss a bunch of deadlines because you’re too busy cleaning up spam • ... and your notifications aren’t being delivered because you’re blacklisted
    58. Develop a Strategy • A spam attack is no different than a typical DoS or outage - you need a plan • Figure out what data you need and whether or not you already have it • Figure out ways to consolidate and automate the work
    59. Build your Tools • Make things reusable • a user should look the same in all tools • tools that show lists of users should reuse the same logic for batch ops • Leave a consistent trail
    60. Look at the Big Picture • Your tools should be very well integrated • your user report tools should pop suspected accounts into review tools • deleting accounts and messages should automatically close user report cases
    61. The goal is to be able to have one person look at a single tool, make decisions, and go back to sleep
    62. Photo from http://www.flickr.com/photos/dreamcicle
    63. ... but we can get close!
    65. Work Smart • Spam is limited to going from one user on your site to another user on your site • That forces certain behavior patterns - know what those are for your site
    66. Work Smart, continued • If you have some obstacles at signup time (captcha, mass signup detection), you can pretty much expect two things: • a slow trickle of signups (to get around signup-time mass signup checks) • a sudden surge of messages • A constant “under the radar” trickle of messages doesn’t make sense - if you delete the accounts after a few user reports, they don’t get their payload sent
    67. Work Smart, continued • You know a LOT about your users by default • The signup - when, where • Relationships are key • You can see what’s happening globally • patterns are important • The message contents are less helpful, and really, less important
    68. Examine What You Send • Separate the act of sending a message from the actual delivery • Obviously doesn’t work with all content • Queue up messages at some reasonable interval instead of sending them instantly • Examine what’s in the queue before sending it out
    69. Clustering is your friend • Cluster the messages in the queue using as many characteristics as possible • Doing this will make most spam look really obvious • Fairly straightforward to implement (don’t need a massive cluster or Hadoop, at least initially)
    70. Clustering Scores • (I’m sure there’s a more scientific term for this) • The size of the cluster a particular message belongs to as a percentage of the total number of messages • Example: if you have 200 messages and a message falls into a cluster of 10, that message’s cluster score for that particular characteristic is 5 (10/200 = 0.05 = 5%)
    71. Example (cluster scores, in %)
        Message     Signup Date Score   Signup IP Score
        1           5                   3
        2           4                   8
        3           6                   12
        4           7                   4
        5           6                   10
        6           20                  19
        7           20                  20
        8           20                  19
        9           20                  19
        10          20                  20
    73. JACKPOT • Photo from http://www.flickr.com/photos/aresauburnphotos/
    74. The Tough Questions • What do you do with this information? • Just how much can you automate? • We’re still looking for that balance
    75. Further Reading • http://www.icsi.berkeley.edu/pubs/networking/2008-ccs-spamalytics.pdf • http://www.slideshare.net/hadoopusergroup/mail-antispam
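The figures on the “Some Numbers” and “Put some $$ on it” slides can be checked with a few lines of arithmetic. This is a minimal Python sketch: the only inputs are the numbers quoted from the Spamalytics paper, and the revenue function assumes (as the slide implicitly does) that revenue scales linearly with delivered mail, so a 5% relative increase in delivery is worth 5% of $3.5 million.

```python
# Figures quoted on the "Some Numbers" and "Put some $$ on it" slides,
# from the Spamalytics paper (Kanich et al., CCS 2008).
CONVERSIONS = 28
EMAILS_ATTEMPTED = 347_590_389
ANNUAL_REVENUE_USD = 3_500_000


def conversion_rate_percent(conversions: int, attempted: int) -> float:
    """Overall conversion rate as a percentage of attempted emails."""
    return 100 * conversions / attempted


def extra_revenue(annual_revenue: float, relative_increase: float) -> float:
    """Extra yearly revenue from delivering `relative_increase` more mail.

    Assumes revenue grows linearly with delivered mail, which is the
    simplification behind the slide's $175,000 figure.
    """
    return annual_revenue * relative_increase


rate = conversion_rate_percent(CONVERSIONS, EMAILS_ATTEMPTED)
gain = extra_revenue(ANNUAL_REVENUE_USD, 0.05)
print(f"overall conversion rate: {rate:.7f}%")
print(f"5% more delivered mail: ${gain:,.0f}/yr")
```

The 5% here is a relative increase. A five-percentage-point gain in acceptance (say from ~25% to ~30%) is a 20% relative increase, i.e. `extra_revenue(3_500_000, 0.2)`, or about $700,000/yr under the same linear assumption.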
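The “Build your Tools” and “Look at the Big Picture” slides describe tooling rather than code, but the integration they ask for is easy to sketch. The following is a hypothetical, in-memory Python example (the `ModerationStore` class and its fields are invented for illustration and are not Flickr's actual tools): one shared batch-delete routine that every list-of-users tool can call, which leaves an audit trail and automatically closes any user report case once every account it names has been deleted.

```python
from collections.abc import Iterable
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ModerationStore:
    """Hypothetical in-memory stand-in for account, report and audit data."""
    active_users: set[int]
    open_reports: dict[int, list[int]]  # report case id -> reported user ids
    # (timestamp, actor, user id, reason) for every action taken
    audit_log: list[tuple[str, str, int, str]] = field(default_factory=list)

    def delete_users(self, user_ids: Iterable[int], actor: str, reason: str) -> list[int]:
        """The one batch-delete routine that every list-of-users tool reuses."""
        deleted = []
        for uid in user_ids:
            if uid not in self.active_users:
                continue  # already gone or unknown: nothing to do
            self.active_users.remove(uid)
            deleted.append(uid)
            # Leave a consistent trail: when, who, to whom, and why.
            self.audit_log.append(
                (datetime.now(timezone.utc).isoformat(), actor, uid, reason)
            )
        self._close_resolved_reports()
        return deleted

    def _close_resolved_reports(self) -> None:
        # A report case closes automatically once every account it names is deleted.
        for case_id in list(self.open_reports):
            if not any(uid in self.active_users for uid in self.open_reports[case_id]):
                del self.open_reports[case_id]
```

Because every tool goes through `delete_users`, the audit trail and report cleanup happen the same way whether one moderator deletes a single account or a whole batch.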
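The “Examine What You Send” slide, together with the speaker note about delaying the email notification, describes separating the act of sending from delivery. Here is a minimal Python sketch of that idea, assuming a hypothetical `NotificationQueue` that some scheduler flushes at a fixed interval. The `review` callback stands in for whatever inspects the batch and decides what to hold back, and `deliver` stands in for the real mail pipeline; both names are illustrative.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class Notification:
    message_id: int
    sender_id: int
    recipient_id: int
    body: str


class NotificationQueue:
    """Holds outgoing email notifications so each batch can be examined first.

    The on-site content (e.g. a comment) is assumed to be saved and shown
    immediately elsewhere; only the email notification waits in this queue.
    """

    def __init__(
        self,
        review: Callable[[list[Notification]], set[int]],
        deliver: Callable[[Notification], None],
    ) -> None:
        self._pending: list[Notification] = []
        self._review = review    # returns the ids of messages to hold back
        self._deliver = deliver  # hands a message to the real mail pipeline

    def enqueue(self, notification: Notification) -> None:
        """Called when a user 'sends' something; nothing is mailed yet."""
        self._pending.append(notification)

    def flush(self) -> list[Notification]:
        """Called by a scheduler at some interval; returns held messages."""
        batch, self._pending = self._pending, []
        held_ids = self._review(batch)
        held = []
        for notification in batch:
            if notification.message_id in held_ids:
                held.append(notification)  # e.g. route to a human review tool
            else:
                self._deliver(notification)
        return held
```

The interval is the trade-off: the longer the queue holds messages, the more of a spam surge lands in a single batch where it is easy to spot, at the cost of slower notifications for everyone else.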
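The cluster score from the “Clustering Scores” slide is the size of a message's cluster as a percentage of everything in the queue. The sketch below computes it per characteristic in Python. It assumes exact-value grouping on two hypothetical fields (`signup_date`, `signup_ip`) as the simplest possible stand-in for clustering; the talk doesn't say which clustering method Flickr actually used, and the flagging threshold passed to `suspicious` is likewise an assumption.

```python
from collections import Counter
from collections.abc import Callable, Hashable
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class QueuedMessage:
    message_id: int
    signup_date: date  # when the sending account signed up
    signup_ip: str     # IP address the sending account signed up from


# Characteristics to cluster on. Grouping on the exact value is the simplest
# possible clustering; a real system might bucket IPs by subnet, dates by hour, etc.
CHARACTERISTICS: dict[str, Callable[[QueuedMessage], Hashable]] = {
    "signup_date": lambda m: m.signup_date,
    "signup_ip": lambda m: m.signup_ip,
}


def cluster_scores(messages: list[QueuedMessage]) -> dict[int, dict[str, float]]:
    """Per message and characteristic: its cluster's size as a % of the queue."""
    total = len(messages)
    scores: dict[int, dict[str, float]] = {m.message_id: {} for m in messages}
    for name, key in CHARACTERISTICS.items():
        cluster_sizes = Counter(key(m) for m in messages)
        for m in messages:
            scores[m.message_id][name] = 100 * cluster_sizes[key(m)] / total
    return scores


def suspicious(scores: dict[int, dict[str, float]], threshold: float) -> list[int]:
    """Ids of messages scoring at least `threshold` % on every characteristic."""
    return [
        message_id
        for message_id, per_characteristic in scores.items()
        if all(score >= threshold for score in per_characteristic.values())
    ]
```

With 200 queued messages, ten of which share a signup date and IP, each of those ten scores 5 on both characteristics, matching the slide's 10/200 example. Applied to the talk's example table, a threshold of 15% on both characteristics (an arbitrary choice) flags Messages 6 through 10 and nothing else. Requiring every characteristic to score high, rather than any one, is a deliberately conservative choice: a popular signup day on its own is not suspicious.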