Challenges in designing, implementing, and operating reputation systems.
[Graphics by Bryce] I’ve been in what we now call Social Media for 35+ years. BBSes, Forums, Virtual Worlds, MMOs, Online Markets, Social Networks, Mobile Real-time – I’ve been on the cutting edge of it all… You know the old joke about recognizing a pioneer by counting the arrows in his back? I’m thinking about opening an archery range with my collection. Despite my best efforts, I keep collecting new ones… Designed Identity, Social Graph, Abuse mitigation, and Reputation platforms and features. Redesigns for Yahoo! Groups and many other products.
We wrote a book about web reputation systems for product managers and designers, the direct result of decades of practical experience with online reputation systems, culminating in our work together at Yahoo!, where Randy was Director of Community Platform Strategy and Bryce was Sr. User Interface Designer. There is not yet a consensus on terminology, but for today’s talk we’re going to stick with the definitions we used in our book… YES – eBay Feedback is Karma. CONTEXT, CONTEXT, CONTEXT, CONTEXT.
So many rep systems because there is too much stuff coming on… Data coming faster than we can filter it.
There are bad reputation systems. We’ve made some very bad ones. This talk will hit the very top of several challenges and fallacies related to practitioners' use of Web Reputation Systems. All of these are drawn from our own direct experiences building and operating these systems or participating in their operation, or from their creators' own words (via interviews or public writings). When deploying this stuff out in the field, time-to-market trumps all. And Product Managers don’t read research.
There are different contexts that bring pressure to bear on reputation system design.
People doing stuff for selfish reasons (which is good…)
This is the pattern most visible today: an accumulator. What does it mean? What does it actually do? [Example - Politics] Accumulator, Bookmarking, Friend-recommend. ‘Likes’ are wildly successful, though they suffer from the Sentiment problem. Or: what exactly IS a ‘Like’? Does it mean: I like this; I agree with this; I am feeling sympathy for you; I am taking pleasure in your discomfort? “Digital reputation scores, by construction, are limited in the subtlety of their inputs: only numbers go in, and those numbers usually come from simple user actions, such as ‘Marked an Item as Favorite’. One might think that this input would always be interpreted as a positive effect on reputation, but in practice this assumption is incorrect. Usually marking an item as favorite also provides bookmarking semantics, allowing the user to easily return to the item later. It turns out that users who report content as violating a site's Terms of Service often mark the item as a favorite in order to make it easy to find out when the item is removed or changed! Thus, digital reputation scores often don't accurately represent the sentiment that was intended.” We would like to point out that ‘building a better input’ may not be the sole solution here. While inputs may be impoverished, the trend, if anything, has been to make them increasingly SIMPLER. (See the evolution of inputs: Full Rating & Review → 5 Star Rating → Thumbs Up / Down → Digg → Like.) User experience is a gating factor.
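To make the sentiment problem concrete, here is a minimal sketch of a "favorites" accumulator. The event log, user names, and the `intent` field are hypothetical and exist only for illustration; the point is that the accumulator never sees intent, so a bookmark by someone who reported the item counts exactly like an endorsement.

```python
from collections import Counter

# Hypothetical event log: (user, item, intent).
# The intent column is an assumption for illustration only; a real
# accumulator receives just the "marked as favorite" action.
events = [
    ("alice", "post-1", "likes it"),
    ("bob", "post-1", "bookmarking to read later"),
    ("carol", "post-1", "reported it, watching for removal"),
]


def favorite_counts(log):
    """Simple accumulator: add one per favorite action, ignoring intent."""
    counts = Counter()
    for _user, item, _intent in log:
        counts[item] += 1
    return counts


scores = favorite_counts(events)
print(scores["post-1"])  # all three actions count the same
```

Only one of the three actions in this toy log is an endorsement, yet the score treats all of them as positive.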
In our definitions slide we said: “Reputation about a person (physical or legal) has special features and pitfalls” – here’s a list of some of the karma-related challenges we detail in our book. Unique challenges to Karma (vs. Object Reputation).
Company executives get all hot and bothered about the idea of a unified reputation score. Roll everything up into a number and call it “Person Trust Rank” or something, and make a fortune, just like Google! Let me explain why this is impossible using FICO as an example. As access to credit reports has increased, the credit bureaus have kept pace with the trend and have steadily marketed the reports for a growing number of purposes. More and more transaction-based businesses have started using them (primarily the FICO score) for less and less relevant evaluations. In addition to their original purpose, establishing the terms of a credit account, credit reports are now used by landlords for the less common but somewhat relevant purpose of risk mitigation when renting a house or apartment, and by some businesses to run background checks on prospective employees, a legal but unreasonably invasive requirement. FICO has migrated from a narrow context, that of creditworthiness, to a global karma, influencing other critical decisions that don’t have any direct feedback into the model!
Public Karma is Identity, an extension of the self. Manifestation of the person. People are sensitive about this. Low scores are a challenge that leads to karmic bankruptcy. BUT… people are funny. They tend to identify almost as strongly with the things they produce as they do with their own identity. Personalization – we take it personally.
Dating site. Wanted engagement. Copied other sites' thumbs up/down. Product managers world-round all do this… From “I Love my Chicken Wire Mommy” – a blog post by Ben Brown.
“Members without any pre-existing friends on the site had little chance to earn points unless they literally campaigned for them in the comments, encouraging point whoring. Members with lots of friends on the site sat in unimpeachable positions on the scoreboards, encouraging elitism. People became stressed out that they were not earning enough points, and became frustrated because they had no direct control over their scores.” Disaster. Site died a horrible death. Changed the meaning of the site. The evaluations became meaningless.
Individual-behavior-centric things like Ratings/Likes/Badges/Points (aka gamification) may initially seem effective on a psychological level at influencing individual behavior, but social software must take group behaviors into account. After all, for sites that are based on user-generated content, the point of that content is for others to use, often in groups attempting to accomplish some goal or interact with some community, even if just to schedule a holiday party.
Individual Motivations mapping to… [build] “The Competitive Spectrum” of Community Temperaments - Caring: Cancer, Collaborative: eBay, Cordial: Bloggers, Competitive: Zynga, Combative: Xbox Live. Consumating was a mismatch: ego-centric inputs/motivations vs. collaborative temperament. This is a very hard problem to design around: how do you acknowledge the spirit or ‘temperament’ of a community, but still allow for individual motivations to hold sway? How do you design a reputation system that makes VALUE JUDGEMENTS while still acknowledging that other people in the community may hold different values than you?
While at Yahoo!, we spent a lot of time espousing this idea: that reputation, skillfully used, could lead to a Virtuous Circle of user-generated content and participation. [Talk while building]
We worry MORE, however, about this scenario: The Vicious Spiral. Points for arbitrary actions. * Harriet Klausner @amazon, 7 books/day. * Consumating point whoring. Factions form as things aren’t matching individual goals. Babby is popular!
Loss of control of the community. This is not abuse as far as the community is concerned, but it is not useful for Yahoo!. They were hoping for Wikipedia and got puerile humor instead. Keep your hand on the wheel! The needs of the community sometimes conflict with the needs of the corporation.
Web reputation is always in a “corporate context” – someone is rolling up the scores and using them to make decisions about something. With Web reputation, this is usually a Site but not always a commercial company.
This is a simplification of eBay’s main virtuous circle: Sellers and Buyers transacting through listings and evaluations. Successful virtuous circles generate excess “heat” that can be extracted as revenue. Most of eBay’s revenue is from listing and transaction-related services in this circle.
But things get complicated when there are more interested parties in the transaction… Three parties leads to: sell premium packages, advertising via listings to consumers (taking it two ways). Tricky to balance: Yelp lawsuits, cries of ‘extortion.’ Wait – “You are allowing people to rate me, then calling me to fix it!” Accused of extracting direct value from the store owners, trying to pull money out of BOTH SIDES OF THE CHAIN. The third-party problem: when the reputation credentials are issued by a party with a vested interest, then who do you trust?
CONTEXT! Another challenge is that technology that works well in one context usually isn’t good for another (unless discovered through use!). Doctors are NOT restaurant owners. Their businesses are not the same. Yelp will need to adapt. This happened to StackOverflow/StackExchange as well.
Composite feedback score is a simple accumulator and hides the value by treating positive and negative as computationally equal. Clearly not how people do this in their heads. 100 positives – 5 negatives is not the same as 95 positives. So this number is basically a transaction counter. So what is a single negative feedback rating worth, in US dollars? [Dropshipper case]
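The arithmetic behind that point can be sketched in a few lines. This is an illustrative toy, not eBay's actual formula: the `FeedbackTally` class and the `percent_positive` view are assumptions introduced here to show what a positives-minus-negatives accumulator hides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedbackTally:
    """Hypothetical tally of a seller's feedback (illustration only)."""
    positives: int
    negatives: int

    def composite_score(self) -> int:
        # Simple accumulator: each negative cancels exactly one positive.
        return self.positives - self.negatives

    def percent_positive(self) -> float:
        # One alternative view (an assumption, not the talk's proposal)
        # that keeps negatives visible instead of netting them out.
        total = self.positives + self.negatives
        return 100.0 * self.positives / total if total else 0.0


mixed = FeedbackTally(positives=100, negatives=5)
clean = FeedbackTally(positives=95, negatives=0)

print(mixed.composite_score(), clean.composite_score())
print(round(mixed.percent_positive(), 1), clean.percent_positive())
```

Both sellers get the same composite score, even though one of them has five unhappy buyers; only the second view tells them apart.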
Google’s Virtuous Circle. Very successful at extracting value from this circle. Publishers create content (in the form of web pages). Users link to the content (evaluations, i.e. likes). Google provides keyword search to consumers and PageRank to the publishers. Google extracts value by selling information about both to advertisers. Down-chain marking the ads and leaking value. A story about Answers.com. Content farms. SEO: a multi-billion dollar industry.
There is a last context “outside” the individual companies… There’s stuff between them…
First instinct on the web: Map the Real World. Amazon examples. Mad Kindle owners. There’s something else going on! The pre-web history of stars for hotels has different meanings. When we design reputation systems, generally we're talking about numerical scores and small snippets of text. But, compared to real world reputation, these computational models are almost too simple to even merit using the same word. Real life reputation is subtle, personally defined, and internally evaluated - most of the time it is inherently non-numerical! Digital reputation's sense of value is concrete (numerical), globally defined for all, brutally simple to calculate, and usually publicly shared - pretty clumsy in comparison.
Foreword in The Reputation Society – Craig Newmark's intro called out Honestly.com. Compared to LinkedIn – anonymous reviewers – model the idea of an old-school reference check. Added user-targets without permission (importing LinkedIn and Facebook graphs). Negative reviews – lawsuits. Finally, the only reviews were positive – investors and bloggers who solicit positive reviews. [Click for Death]
The idea that public karma for users may actually displace other traditional political influence factors. Specifically, a hope that karma will displace money as the measure of power. Reputation is user actions – software can simulate that – subject to abuse by construction – even if you stop bots, you can pay people to click stuff. Crowdsourcing as SEO. Specialized reputations are great, but general ones are a mess and shouldn’t be used for any real influence. Stomp Whuffie! Reputation doesn’t replace money, it replaces money? Down and Out in the Magic Kingdom, Cory Doctorow. Spendable karma. It already exists: Money. The future looks a lot like the past, but not the same – and copying the past doesn’t work…
PAYPAL Plugin (on-the-fly disposable credit cards) STORY HERE! Memento. This man (Leonard Shelby) is suffering from short-term memory loss, and uses notes and tattoos to hunt for the man he thinks killed his wife. Doomed to repeat the same mistakes he made every day for the rest of his life, he’s just like product development on the internet: We forget our mistakes. Because we just move on and don’t bother telling anyone, most of the time. In the marketplace, failures go away. Therefore we’re doomed to repeat them. How do we get these lessons into the art around this work? It’s important.
Some of these challenges are tractable ones, but industry is not well-situated to address them. Some may be intractable, but there is value in finally admitting that, in order to pursue new challenges. The Fuzzy and the Focused - The thing to stress here is that we are NOT academic researchers. We’ve learned many of these lessons by being wrong about how things work and bumping our heads. We’re probably still wrong in subtle ways about many of these things. When we started our book in 2009 we were shocked how little layman-accessible text there was on the subject of reputation systems – that’s the main reason we wrote it, warts and all.
Reputation Systems Talk at eBay
Social (Reputation) System Design:
Opportunities and Challenges
A Practitioners’ View
F Randall Farmer
(Coauthor: Bryce Glass)
@eBay, November 14th, 2013
Why Web Reputation Systems?
Searching, organizing, filtering, advertising (all
built on top of web reputation systems) is
driving an economy worth $ trillions.
Challenges and Fallacies
But, if web reputation systems are so important, why do
so many of them suck?
– “Any content reputation system is better than none.”
– “Content quality is a rich-man’s problem.”
– “The other guys have thumbs-up/down, we should too.”
– “What cool tools do we have on the shelf?”
Time-to-market trumps correctness, testing, and even
research. Reputation is not, primarily, a technical
Contexts for Reputation
Personal Meaning & Utility of Ratings
What does this mean?
What does it actually do?
Karma is Hard
Karma is useful for building trust between users, and between a user and the site
Karma can be an incentive for participation and contributions
Karma is contextual and has limited utility globally.
Karma comes in several flavors - Participation, Quality and Robust
Karma is complex (via indirect evaluations), and formulation is often opaque
Personal karma is displayed only to the owner, good for measuring progress
Corporate karma is used by the site to find the very best and very worst users
Public karma is displayed to other users, which makes it the hardest to get right
– It should be used sparingly
– It is hard to understand, isn't expected, and easily confused with content
– It shouldn't have a socially negative value.
– It encourages competition in some users, and may discourage others
The FICO Fallacy: Universal Karma
Credit Score Factors
Length of Credit
Types of Credit Used
Public Karma is Identity (and Personal)
Scott McCloud, Understanding Comics, p. 38.
Words are Weapons: Public Karma
“We built a point system into Consumating
because we thought giving direct feedback to
people about their conduct on the site would
encourage them to be nice to one another—
you get a thumbs up when you are nice
(treat!), and a thumbs down when you are a
douche (electric shock!)”
Ben Brown, Internet Rockstar
eBay as Multi-Sided Market:
Drop-shippers and Feedback
When is this true? 100 – 5 < 0
What is a Negative Feedback worth?
(in U.S. Dollars)
Google’s Virtuous Circle
CREATE WEB PAGES FOR
PROVIDES TRAFFIC BACK TO