As the world’s largest professional network, LinkedIn is subject to a barrage of fraudulent and abusive activity aimed at its member-facing products. LinkedIn’s Security Data Science team is tasked with detecting bad activity and building proactive solutions to keep it from happening in the first place. In this talk we explore various types of abuse we see at LinkedIn and discuss some of the solutions we’ve built to defend against them. We focus on the two main ways bad actors enter the site: fake accounts and account takeover. Some common themes include:
- Precision/recall tradeoffs: No model is 100% accurate, so we must always make a call on where to draw the line when flagging accounts or activity as abusive. What’s the cost of labeling a good member as bad vs. labeling a bad member as good?
- Online/offline tradeoffs: Online models can stop fraudulent activity before it has a chance to gain traction; offline models can use more data and cast a wider net, while also requiring less engineering effort to build. For any given abuse pattern, we must consider whether we can detect and stop the activity in real time, and whether doing so is worth the effort.
- Machine learning vs. heuristic rules: Machine-learned models can be very powerful, but they also require sufficient well-labeled training data and are more difficult to maintain. Heuristic (though still data-driven!) rules can often achieve 90% of the goal with 10% of the effort — but how do you tell when this is the case?
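The precision/recall tradeoff above can be made concrete with a small sketch: sweeping the flagging threshold on a scored population and watching precision and recall move against each other. The scores, labels, and thresholds here are toy values for illustration, not anything from LinkedIn's actual systems.

```python
def precision_recall(scores, labels, threshold):
    """Precision/recall when flagging every account with score >= threshold.
    labels: 1 = abusive account, 0 = good member."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy model scores and true labels (1 = bad actor).
scores = [0.95, 0.90, 0.80, 0.60, 0.55, 0.30, 0.20, 0.10]
labels = [1,    1,    0,    1,    0,    0,    1,    0]

for t in (0.5, 0.7, 0.9):
    p, r = precision_recall(scores, labels, t)
    print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")
```

Raising the threshold buys precision (fewer good members mislabeled) at the cost of recall (more bad actors slip through); where to draw that line depends on the relative cost of each kind of mistake.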
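To illustrate the heuristic-rule end of the spectrum, here is the kind of data-driven rule the last bullet has in mind: a few tuned cutoffs rather than a trained model. The signal (invitation volume and accept rate for new accounts) and all thresholds are invented for this sketch and are not LinkedIn's actual detection logic.

```python
from dataclasses import dataclass

@dataclass
class AccountActivity:
    account_age_days: int
    invites_sent: int
    invites_accepted: int

def looks_like_fake_account(a: AccountActivity) -> bool:
    """Flag brand-new accounts that blast out invitations few people accept.
    The cutoffs (7 days, 100 invites, 10% accept rate) are illustrative;
    in practice they would be tuned from labeled data."""
    if a.account_age_days > 7:
        return False
    if a.invites_sent < 100:
        return False
    accept_rate = a.invites_accepted / a.invites_sent
    return accept_rate < 0.10

print(looks_like_fake_account(AccountActivity(2, 500, 10)))    # → True
print(looks_like_fake_account(AccountActivity(400, 500, 200)))  # → False
```

A rule like this is cheap to write, easy to explain, and easy to monitor; the open question the talk raises is recognizing when this is enough and when the abuse pattern is subtle enough to justify a full machine-learned model.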