5 Reputation Missteps (And how to avoid them)


Published on

Web2.0 Expo presentation from F Randall Farmer and Bryce Glass, authors of the O'Reilly / Yahoo! Press book "Building Web Reputation Systems."

This talk addresses five common fallacies in designing your site or community's reputation system.

Published in: Technology
No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide
  • [Greetings – I’m Randy Farmer and my co-author/presenter is Bryce Glass] Randy has been building online reputation systems – everything from infrastructure level platforms up to user-incentive systems - for at least 20 years and Bryce has been focused on the UX side of reputation systems for the last five. We first worked together at Yahoo! Almost all of that distilled knowledge is in our new book from O'Reill'y and Yahoo! Press: Building Web Reputation Systems. We wrote this book on-line in wiki form and there is a companion blog at http://buildingreputation.com During our time working with dozens of product mangers, engineering teams, and we've built and deployed dozens of reputation models and surveyed at least a hundred more. Our key observation is this – reputation systems play a critical role in the operation of all web businesses, sites, and even the internet itself.
  • We’ve encountered many recurring misconceptions, mis-perceptions, and logical fallacies. Today we're going to speak about our top-five – the things we've encountered repeatedly – and that have the potential to do a large amount of damage to any reptuation project if these errors were to go uncorrected. “ Significant negative repercussions to your project going forward.”
  • The most common confusion when discussion this topic is the very definition of the word reputation . When you ask “What is reputation?”, the quick response most people give is that it's all about the people . “Who's a good guy,?” “Who's a bad guy?” “Can I trust him or her ?”
  • Though we often say reputation is only about people, a story like the one depicted here usually helps people realize that the definition is much more subtle than that. For the sake of those who are too far back, we'd like to share the text of this slide – we promise that it is the only slide we'll actually read to you. :-) Meet Robert and Bill. Robert has taken Bill to a business dinner at a quaint little Italian place in Boston that'd he'd heard about from a coworker. After ordering, Robert recalls from a previous discussion that Bill has a college age son, and shares some thrilling news:
  • Robert: “ Our Wendy will be going to Harvard next year! ”
  • Bill: “ Really! I’m curious — why Harvard? ”
  • Robert: “ Why, it has the best reputation—especially for law. ”
  • Bill: “ Did she consider Yale? ”
  • Robert:” Yes, but we like Harvard’s proximity. ”
  • Bill:” Won’t that be expensive? ”
  • Robert:“ We’ll make tradeoffs if we have to — it’s worth it for my little girl! ” Although there was no direct statement that any person was good or bad - this conversation is packed with reputation! In fact, the entire context of the conversation is, itself, layered with reputation as well. This simple exchange demonstrates the power of reputation in our everyday lives. Reputation is pervasive and inescapable. It's a critical tool that enables us to make decisions, both large (like Harvard versus Yale ) and small ( What restaurant would impress my client for dinner tonight? ) Robert and Bill's conversation also yields other insights into the nature of reputation. From this example, you can probably see that people have reputation, but so do things : Categories of food, specific restaurants, and colleges were only the more obvious non-people objects in our play. This observation suggests a definition:
  • Reputation is information used to make a value judgment about an object or a person.
  • Reputation is information used to make a value judgment about an object or a person. We go a bit further and formalize this definition into a simple formulation which we call the Reputation Statement: [Build] A Source makes a Claim about a Target. Here’s an example from our play: [Build] Bill (the source) says that Harvard (the Target) is expensive (the Claim.)
  • Here's are some of the many reputation statements represented by the story we just shared. Honestly, there are probably dozens more second-order and third-order reputation statements lingering here, but you see our point: “ Reputation is Everywhere” - Even before we enter the world of the web, we can't live without it! Imagine how little you could get done if you had to investigate every single option yourself to make a decision. Nothing would ever get done. On the sensory- and relationship-impoverished web, this is even more the case. We need reputation for everything – every page, every user, every movie, every link, even other reputations themselves! People are constantly asking themselves questions like: “Can I trust the ratings on YouTube?” So – if reputation is any claim by a source about a target – what about people-reputation – it seems that is a special category, right?
  • Don't worry! I promised we wouldn't read any more slides! Although we've demonstrated that most reputation is not about people directly, clearly there is people-centric reputation. We give this a separate designation, borrowing from the Hindi word Karma. Why a special name? Because we've discovered a number of best-practices and rules-of-thumb that apply to Karma alone. We don't have time to go into all of them here, but I'll share the list from a recent blog post of ours. Search for Top-Line Karma and you'll find it. Three of these rules will get consideration in this presentation today since many of the largest misconceptions about reputation are actually about the nature and uses of on-line karma. For now, know that adding karma to a website should be considered a perilous endeavor, not undertaken lightly or unarmed.
  • So, instead of Reputation being “All about people” it should be that “Reputation is Everywhere!” Reputation is the primary method for sorting through the good, bad, and ugly on the web. And when we talk about people centric reputation: We'll call that Karma. “ Reputation is hard, and Karma is harder!” – and as promised we now begin to show why…
  • Do you think that someone’s Ebay Seller Feedback Score should follow them onto Facebook? Should Slashdot karma matter over on Reddit? There have been more than a few startups built on the belief that one grand-unified karma is enough to accurately convey “the measure of a man” for all his or her dealings on the web.
  • The idea of a all-web (or global) reputation is pervasive – at Yahoo we consistently encountered executives that wanted to create one: a proxy for “real person” or “good citizen” reputation system for use across the entire web.. “ It'll be like a Credit Score for the internet!” Hmm. Lets see – how about we unpack that a bit...
  • A credit score is the global reputation that has the single greatest impact on the economic transactions in your life. Several credit scoring systems and agencies exist in the United States, but the prevalent reputation tool in the world of creditworthiness is the FICO credit score devised by the company Fair Isaac. The lessons we learn from the FICO score apply nearly verbatim to reputation systems on the Web. This pie-chart shows the factors that go into a FICO score which was always intended to be a reasonable representation of something we can call creditworthiness . For most of its history of more than 50 years, the nature of this calculation was secret, and your score was difficult to access. It took a major event, such as the purchase of a home to even learn your FICO score. At the time, this obscurity was considered a benefit to vendors and lenders. But an error in your score could go undetected for months-or even years-with potentially deleterious effects on your cash flow: increased interest rates, decreased credit limits, and higher lending fees. As access to credit reports has increased, the credit bureaus have kept pace with the trend and have steadily marketed the reports for a growing number of purposes. More and more transaction-based businesses have started using them (primarily the FICO score) for less and less relevant evaluations. In addition to their original purpose-establishing the terms of a credit account-credit reports are now used by landlords for the less common but somewhat relevant purpose of risk mitigation when renting a house or apartment, and by some businesses to run background checks on prospective employees-a legal but unreasonably invasive requirement. In the United States, it is routinely used for employee credit screening – in effect, preventing someone who has bad credit from getting a job that could help them pay down their credit and increase their FICO. FICO has migrated from a narrow-context, one of credit worthiness, to a global karma – influencing other critical decisions that don’t have any direct feedback into the model!
  • Global Web reputation scores are so powerful and easily accessible that the temptation to apply them outside of their original context is almost irresistible. As with the FICO score, it is a bad idea to co-opt a reputation system for another purpose, and it dilutes the actual meaning of the score in its original context. The eBay Feedback score reflects only the transaction worthiness of a specific account, and it does so only for particular products bought or sold on eBay. The user behind that identity may in fact steal candy from babies, cheat at online poker, and fail to pay his credit card bills. Even eBay displays multiple types of reputation ratings within its singular limited context. There should be no “web FICO” (global karma) because there is no kind of reputation statement that can be legitimately applied to all contexts. And that's the key insight...
  • Reputation should always be in context – not just Karma, which we now hope is self-evident, but all object reputation as well. As homework, might I suggest you compare the average movie ratings on Netflix and Yahoo! Movies? There are significant differences – can you puzzle out how the context is different between the sites? Speaking of star ratings, I think my partner Bryce has a few thoughts to share about those as well…
  • Ratings input mechanisms fall into, and out of, vogue. Once it was 5-Stars, then it was Digg-style upvoting, now Facebook’s ‘Like’ holds the crown. If you’re tempted to start with a ratings scheme in mind, and then back-design a system to justify it, proceed with caution. (Includes a bonus fallacy: “Of course I need a down-vote!”)
  • Interface first design Started as MB site – diamonds in the rough, Great Dishes in not-so-great restaurants Bought twice – now CBS Added stars! [Small volume of Reviews] Wrong context (should be food items?)
  • So let’s look at some ineffective ratings data. These graphs represent the ratings distributions across a number of popular Yahoo! properties. It’s the distributions we care about — ratings volume is irrelevant for this discussion. Several factors may contribute to this: Positive Acquiescence Bias. (People tend to rate things favorably anyway.) The average Star-rating on the internet is 4.3 Stars (WSJ, October 2009 — http://online.wsj.com/article/SB125470172872063071.html )… But it is likely the context that contributes as well. With the notable exception of Autos Custom (Look at that nice ‘W’ distribution!) all of the other properties use 5 stars to rate canned, feed content that users probably don’t have much of a personal relationship with. They are non-committal. In fact, this lesson has been learned most recently by…
  • Until recently, this was the ratings mechanism for YouTube videos: a 5-star input mechanism. This video’s received 331 ratings, and is averaging 5 stars , when rounded. This is not an exception, in fact, it was the norm on YouTube…
  • It’s our old friend, the J-Curve distribution, “the overwhelming majority of videos on YouTube have a stellar five-star rating” Beats the industry average with 4.85 stars
  • “ the ratings system is primarily being used as a seal of approval, not as an editorial indicator of what the community thinks about a video.”
  • So YouTube has recently gone to a simpler “Like” mechanism. Notice, too, that the YouTube designers have included a downvote…. CLICK Re-indexed back to the 5-point scale, the average for this video figures out to 4.75.
  • Downplay the Downvote •  The temptation is to include a Downvote simply because there’s an UP. This is symmetrical thinking, but not BALANCED. As we’ll discuss later, negative karma (a downvote) carries significantly heavier weight than positive (an upvote.) Always, always consider leaving the downvote off — provide BALANCE instead by providing other means to dispute a claim.
  • Feeling inspired by game-like elements? Levels, points and leaderboards? It’s all the rage.
  • “ Members without any pre-existing friends on the site had little chance to earn points unless they literally campaigned for them in the comments, encouraging point whoring. Members with lots of friends on the site sat in unimpeachable positions on the scoreboards, encouraging elitism. People became stressed out that they were not earning enough points, and became frustrated because they had no direct control over their scores.” From “I Love my Chicken Wire Mommy” – a blog post by Ben Brown Effective at Incentivizing part of your community. (But which part? And who will be turned off instead?) Effective on a psychological level at influencing individual behavior, but social software must take group behaviors into account. Applying the trappings of competition to domains where they don’t make sense.
  • This overly-simplified diagram is a useful tool when thinking about how to map your user’s motivations to reputation systems. In our book we detail three basic classes of incentives: Altruistic, which roughly maps to the left end of this scale, Egocentric, which is more typically represented on the right end, And Commercial, which tends torward the center – the positioning isn’t so important, what is important is realizing that often designs that optimize for one group of users, dis-incentivzes another.
  • Karma problems escalate to competition, and eventually to abuse…
  • What’s the best way to identify the bad actors in your community? We call this ‘negative karma’ - Again the initial impulse is to simply label them as such for the entire community to see – use the same tools that highlight the good users to flush out the bad ones too… (as long as we keep it all “in context” right?)
  • Because an underlying karma score is a number, product managers often misunderstand the interaction between numerical values and online identity. The thinking goes something like this: * In our application context, the users' value will be represented by a single karma, which is a numerical value. * There are good, trustworthy users and bad, untrustworthy users, and everyone would like to know which is which, so we will display their karma. * We should represent good actions as positive numbers and bad actions as negative, and we'll add them up to make karma. * Good users will have high positive scores (and other users will interact with them), and bad users will have low negative scores (and other users will avoid them).
  • Let me share a cautionary tale about displaying negative karma – it’s about the Sim’s Mafia. The Sims Online was a multiplayer version of the popular Sims games by Electronic Arts and Maxis in which the user controlled an animated character in a virtual world with houses, furniture, games, virtual currency (called Simoleans), rental property, and social activities. You could call it playing dollhouse online. This blue-ringed face represents a user on the Sim’s Online. One of the features that supported user socialization in the game was the ability to declare that another user was a trusted friend. The feature involved a graphical display that showed the faces of users who had declared you trustworthy outlined in green, attached in a hub-and-spoke pattern to your face in the center.
  • [We’re presenting a cleaned up and stylized version of the graphics here, if you’d really like to see what it looked like, you can find screen captures on the web.] People checked each other's hubs for help in deciding whether to take certain in-game actions, such as becoming roommates in a house. Decisions like these are costly for a new user – the ramifications of the decision stick with a newbie for a long time, and backing outof a bad decision is not an easy thing to do. The hub was a useful decision-making device for these purposes. That feature was fine as far as it went, but unlike other social networks, The Sims Online allowed users to declare other users un trustworthy too. The face of an untrustworthy user appeared circled in bright red among all the trustworthy faces in a user's hub. It didn't take long for a group calling itself the Sims Mafia to figure out how to use this mechanic to shake down new users when they arrived in the game. The dialog would go something like this: "Hi! I see from your hub that you're new to the area. Give me all your Simoleans or my friends and I will make it impossible to rent a house.” "What are you talking about?" "I'm a member of the Sims Mafia, and we will all mark you as untrustworthy, turning your hub solid red (with no more room for green), and no one will play with you. You have five minutes to comply. If you think I'm kidding, look at your friends-hub” Often this entailed the mobster to actually teach the user how to display their hub, which they would do, and then the hapless new user might see something like this:
  • “ Don't worry, we'll turn it green when you pay…" If you think this is a fun game, think again-a typical response to this shakedown was for the user to decide that the game wasn't worth $10 a month. Playing dollhouse doesn't usually involve gangsters.
  • * There can be no negative public karma-at least for establishing the trustworthiness of active users. A bad enough public score will simply lead to that user's abandoning the account and starting a new one, a process we call karma bankruptcy. This setup defeats the primary goal of karma-to publicly identify bad actors. Assuming that a karma starts at zero for a brand-new user that an application has no information about, it can never go below zero, since karma bankruptcy resets it. Just look at the record of eBay sellers with more than three red stars-you'll see that most haven't sold anything in months or years, either because the sellers quit or they're now doing business under different account names. * It's not a good idea to combine positive and negative inputs in a single public karma score. Say you encounter a user with 75 karma points and another with 69 karma points. Who is more trustworthy? You can't tell: maybe the first user used to have hundreds of good points but recently accumulated a lot of negative ones, while the second user has never received a negative point at all. If you must have public negative reputation, handle it as a separate score (as in the eBay seller feedback pattern). Even eBay, with the most well-known example of public negative karma, doesn't represent how untrustworthy an actual seller might be-it only gives buyers reasons to take specific actions to protect themselves. In general, avoid negative public karma. If you really want to know who the bad guys are, keep the score separate and restrict it to internal use by moderation staff.
  • Just because we say you shouldn't publicly display negative karma, we don't mean you can't use reputation systems to catch the bad guys. Chapter 10 of Building Web Reputation Systems is a detailed case-study of the complete history of developing a corporate (never displayed) karma-based reputation model to shut down trolls and spammers on Yahoo! Answers. The general idea was to allow any user to report content as bad, and keep track of how accurate they were in a secret Abuse Reporter Karma. When enough reports from enough people with good Reporter Karma came in, we’d automatically believe them and remove the content from the site. We built the system summarized in the flow diagram here, and hoped to decrease the 18+ hour turnaround time to about an hour, and then we wouldn’t have to pay quite so many customer support staff to look through all those reports. What we found once we released it amazed us -
  • We reduced the turnaround time to 30 seconds! The bad content was off of the site almost immediately after it was added! A 2000x improvement! We also found that the users were more significant accruate than staff but we now needed less than 1% of customer care staff. We were saving $1million per year. Clearly, karma can effect your bottom line.
  • So – what did we learn about negative karma? “ Karma is complex – made of indirect inputs...” not direct people evaluations. Beware reputation bankruptcy... “Public Karma is almost always positive.” Positive karma can be used to incentivize desirable behavior and, in some contexts, even increase content quality. But, what about the our original common misconception, that Karma Will Out the Bad Guys?
  • In fact – Negative karma *can* out the bad guys, but only if you keep it completely secret . A Yahoo! Security expert once told us that they consider this the very best use of reptuation systems – secretly determining who is being bad, and taking subtle countermeasures.
  • So, in summary - reputation is always in context and hard, and karma is even harder!
  • Read the book or ebook, either online at the website, online etailer, or local bookstore. We’ve covered less than 1% of the content therein.  Thanks for having us here today. Any questions?
  • 5 Reputation Missteps (And how to avoid them)

    1. 5 Reputation Missteps (and how to avoid them) <ul><li>F Randall Farmer </li></ul><ul><li>Bryce Glass </li></ul>
    2. Reputation Systems Are Everywhere
    3. Accepted Reputation Wisdom <ul><li>“ It’s the People, Dummy” </li></ul><ul><li>“ One Reputation to Rule Them All” </li></ul><ul><li>“ All I Need is Five Stars” </li></ul><ul><li>“ Competition is Always Good” </li></ul><ul><li>“ Negative Karma Will Out the Bad Guys” </li></ul>
    4. It’s the People, Dummy!
    5. Reputation in Action Hi! Bill! How’s the family, Robert?
    6. Our Wendy will be going to Harvard next year!
    7. Really! I’m curious — why Harvard?
    8. Why, it has the best reputation—especially for law.
    9. Did she consider Yale?
    10. Yes, but we like Harvard’s proximity.
    11. Won’t that be expensive?
    12. We’ll make tradeoffs if we have to — it’s worth it for my little girl!
    13. Reputation is information used to make a value judgment about an object or a person.
    15. Bill & Robert’s Entire Conversation (and Meal!) Are Made of Reputation Statements
    16. People reputation is Karma , and it is special <ul><li>Karma is user reputation within a context </li></ul><ul><li>Karma is useful for building trust between users, and between a user and the site </li></ul><ul><li>Karma can be an incentive for participation and contributions </li></ul><ul><li>Karma is contextual and has limited utility globally. </li></ul><ul><li>Karma comes in several flavors - Participation, Quality and Robust </li></ul><ul><li>Karma is complex (via indirect evaluations), and formulation is often opaque </li></ul><ul><li>Personal karma is displayed only to the owner, good for measuring progress </li></ul><ul><li>Corporate karma is used by the site to find the very best and very worst users </li></ul><ul><li>Public karma is displayed to other users, which makes it the hardest to get right </li></ul><ul><ul><li>It should be used sparingly </li></ul></ul><ul><ul><li>It is hard to understand, isn't expected, and easily confused with content ratings. </li></ul></ul><ul><ul><li>It shouldn't be have a socially negative value. </li></ul></ul><ul><ul><li>It encourages competition in some users, and may discourage others. </li></ul></ul>
    17. Reputation is… … information used to make a value judgment about an object or a person. … everywhere! … hard, but karma is harder!
    18. One Karma To Rule Them All!
    19. Global Karma? We need it, Right?
    20. Case Study: FICO as Global Karma
    21. Lots of Little Karmas vs. One Big Score
    22. So… Reputation is always in context!
    23. All I Need Is Five Stars … Or Thumbs Up … Or ‘Digg-style’ Voting … Or a ‘Like’ Button
    24. Interface-First Design
    25. The Classic “J” Curve
    26. Consider
    27. Look Familiar? http://youtube-global.blogspot.com/2009/09/five-stars-dominate-ratings.html
    28. “ Great videos prompt action; anything less prompts indifference. Thus, the ratings system is primarily being used as a seal of approval, not as an editorial indicator of what the community thinks about a video.” — Shiva Rajaraman, Product Manager
    29. I’m In Like With You
    30. Do You Need a Downvote? From How To Build Dioramas by Sheperd Paine
    31. So…
    32. Instead… Let the Context Determine the Inputs Be Sparing — Ask for Only What You Need Downplay the Downvote
    33. Competition is Always Good!
    34. “ We built a point system into Consumating because we thought giving direct feedback to people about their conduct on the site would encourage them to be nice to one another—you get a thumbs up when you are nice (treat!), and a thumbs down when you are a douche (electric shock!)” — Ben Brown, Internet Rockstar
    35. The Competitive Spectrum
    36. So… Competition is fine if the context calls for it Don’t assume competition where there is none. Competition for Karma will generally escalate.
    37. Negative Karma Will Out the Bad Guys
    38. The Good Guys and The Bad Guys
    39. Cautionary Tale: A Virtual Mafia Shakedown
    40. Cautionary Tale: A Virtual Mafia Shakedown
    41. Cautionary Tale: A Virtual Mafia Shakedown
    42. Karma Meaninglessness and Bankruptcy
    43. Secret Karma: Yahoo! Answers Case Study Table_12-1: Yahoo! Answers community content moderation system results Metric Baseline Goal Result Improvement Mean time: from Report to Removal 18 hours 1 hour - - Report evaluation error rate 10% 10% - - Customer care costs 100% $1 million per year 10% $100,000 per year - -
    44. Secret Karma: Yahoo! Answers Case Study Table_12-1: Yahoo! Answers community content moderation system results Metric Baseline Goal Result Improvement Mean time: from Report to Removal 18 hours 1 hour 30 seconds 120x goal 2000x baseline Report evaluation error rate 10% 10% <0.1% 100x goal/baseline Customer care costs 100% $1 million per year 10% $100,000 per year <0.1% <$10,000 per year 10x goal 100x baseline Saved > $990,000/yr
    45. So… Karma is Complex (built of indirect inputs) Public Karma is Positive Karma
    46. Karma is Complex (built of indirect inputs) Public Karma is Positive Karma Secret Karma can Out the Bad Guys
    47. Summary
    48. Any Questions?