Hi, I’m Sachin, I’m with Dawdle, and I’ll be talking about Actually Useful Trust Metrics for Online Marketplaces and Communities. Let me give you a quick overview of what Dawdle is and how it works so you can understand how we’ve approached handling trust on Dawdle.
Here’s the homepage – you can see the left 2/3 is focused on buying and right 1/3 is focused on selling. Always a work in progress. This image is what I see – it’s slightly different than production – hopefully, will be live to everyone by the end of the weekend.
Concept pitch to guy on the street: StubHub for gaming, specifically StubHub for games, systems, and accessories. We are built for gaming, just like StubHub is built for tickets, so we make it easier to find what you want and list items quickly using our database. Just like StubHub focuses on ticket brokers – but anyone can list – Dawdle focuses on the 2500 “mom and pop” stores who compete with GameStop, but anyone can list on Dawdle as well. So, Dawdle’s an online marketplace – where people buy and sell stuff with other people they’ve never met in real life. Obviously, that requires quite a bit of trust. Now, we back up every transaction with our 100% money back guarantee, but that’s a backstop. People won’t buy on our site in the first place if they don’t believe that the other users in the marketplace are trustworthy. Let me get into a little bit of how our marketplace works…
We work a little differently – we have our patent-pending Intelligent Matching Engine. Some people who haven’t met me ask if the picture of the guy in the middle is me, which blew my mind the first few times I heard the question. So for people who will only see this on SlideShare, I wanted to add this slide in there…
Clearly, the guy on the left is not me. So, the line to the guy off the street is StubHub for video games, systems, and accessories, but what gets me excited is how the technology works – how the Intelligent Matching Engine works – which is…
Kristen Nicole of Mashable wrote us up as eHarmony for auctions, which I pretty much just stole. So, thanks, K-money. We can take what we’re doing in gaming and expand it into other categories, either ourselves or by licensing out the technology. No plans to right now – our market niche is between $3 billion and $4 billion (with a B) just here in the US, which is plenty for us for now. So, in a marketplace – or any community – you need to have some level of trust with the other participants in order to participate yourself or take some action based on what the community tells you.
Trust is more than just reputation – it is the establishment of comfort to engage in an action in the community. That’s posting or sharing on a messageboard or forum. You’ll have ratings left by users to give you trust in products. In a marketplace, it’s what gets you to buy or bid. On eBay, you have the additional trust issue of selling to someone – will they pay me? Will they pay me quickly? There is no need for trust for a lurker, except to get them comfortable to make an action. In that way, trust is especially important for newcomers to a community. If you’re at a review site, you’ll go to a new place or purchase a new product based in large part – or entirely – on the ratings left. Trust metrics sneakily establish hierarchy – keep that idea in mind. So let’s look at some different trust metrics…
Here’s the simplest kind of trust metric: a score. Scores are explicit, and they’re just simple aggregates - +1 for up, -1 for down, and sometimes a zero for a neutral. The Amazon one gets you to an effective score of 402 if you want to assume the 198 who did not find it helpful are minus 1’s. But that one requires a bit of work. Math! Eek! That required a little bit of work, but it’s nothing compared to…
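If you want to see that math written down, here is a minimal Python sketch of a simple aggregate score. The function name `net_score` is made up for illustration, and the 600 "helpful" votes are inferred from the two numbers mentioned above (198 unhelpful votes and an effective score of 402), not read off any site.

```python
def net_score(up: int, down: int, neutral: int = 0) -> int:
    """Simple aggregate: +1 per up vote, -1 per down vote, 0 per neutral."""
    return up * 1 + down * -1 + neutral * 0


# Assumption: 198 "not helpful" votes and an effective score of 402
# imply 600 "helpful" votes (402 + 198).
helpful_votes = 600
not_helpful_votes = 198

amazon_effective_score = net_score(helpful_votes, not_helpful_votes)
print(amazon_effective_score)
```

Neutral votes change the count of ratings left but not the score, which is one reason a raw score alone can hide how much feedback is behind it.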
This. This is the feedback dashboard for a large video game seller on eBay. On the left side, you can see all manner of plus/minuses – done on a 3 month, 6 month, and 12 month basis. On the right side, you have what eBay calls Detailed Seller Ratings, which only show up for higher volume sellers. You can see that he’s above four stars; you can’t see the exact values, but you can see how many ratings were left. Given that he’s got almost 50K ratings on each DSR, you should have a high level of confidence that these are accurate for the things being asked. The DSRs are an obscured score – they’re averages, they’re not the raw data that you get from the simple feedback score on the previous page. This is mitigated by showing you the total number of ratings left.
Here’s how Amazon does it. There’s an overall star rating, where they give you an average based on the last 12 months. There are no simple feedback ratings on the plus one/minus one scale, just percentages. Amazon does a 1/3/12 month breakdown and, again, gives you the total number of ratings left. What I find interesting is that even though buyers leave star ratings, the percentages go to a simple positive/negative/neutral. So it’s pretty much the reverse of eBay. You can see that this is very clearly done to be “simpler” than eBay. So we’ve seen how ratings and scores can get completely out of hand. Let’s look at something that *is* actually simpler than the deceptively complex scores…
Badges! Badges are generally a binary signal – either you have it or you don’t. But note how, especially with reviews, the hierarchy thing happens again, pretty explicitly. Yelp Elite – you have to be chosen. You can’t buy your way in. One of the things they require for Elite is that you use your real name. I’ll come back to that point in a couple minutes. Flickr pro as a badge is really interesting – the photo speaks for itself, but somehow that badge has some sort of signaling value. I wish I was able to grab an image to represent forum moderators, but after seeing all sorts of moronic avatars, I just gave up. I now want to talk about one of my favorite visual trust metrics – and one that you probably haven’t thought about yet…
Persistent identities. For those who are on multiple sites – and I imagine everyone in this room is – you’ve noticed that there are a lot of people who use the same username on multiple sites. Honestly, this makes FriendFeed a little redundant, aside from the fact that it aggregates. Yeah, I’m sachinag on FriendFeed as well. While there’s a convenience factor to the user – it’s also a signal. You can see a username and have a high degree of trust that it’s the same person as someone with that username on another service. They don’t have to be formally linked; it just happens naturally.
And you can see that it works just the same for companies. My favorite example of this is Dan Honigman’s fake ColonelTribune character. If you look on Digg, there’s a very high correlation between what dan360man and ColonelTribune digg. But they’re different users entirely, and that’s because Dan’s established a persona and a presence for the Colonel on Twitter, Facebook, MySpace, Flickr, and god knows how many other sites. How many other sites, Dan? The best part is that Dan’s been able to establish an account for the Colonel on Facebook. Back in the day, Facebook didn’t let people register under fake names. They deleted those accounts swiftly. Interestingly, Friendster did the same thing and the outrage was huge. There were a number of people who suggested that when Friendster started doing that, it took the fun out of Friendster and it jumped the shark/went corporate/became lame/whatever. But on Facebook, it wasn’t a problem. Why? Because you had to register with a .edu address – you couldn’t waste the one .edu you had on a fake name. Since you couldn’t register with a Hotmail account, you had a high degree of trust that someone was who they said they were. Even now, the number of fake celebs on Facebook is insanely low.
Which brings us to a higher level of trust – using real names. Yelp, again, doesn’t require real names to review, but it does require them for Elite status, which shows you how much stock they put into it. When I comment on other blogs, I always use my real name. I don’t have to, but it shows that I, personally, stand behind what I say. And even if someone doesn’t know who I am, the fact that I do so engenders a higher level of credibility in what I have to say. Not everyone uses persistent identities – so you have anonymous users transacting with each other. And what’s the problem with anonymity?...
If the other person is anonymous and online, you have every reason to assume that he’s a total fuckwad.
On Dawdle, just like Facebook and LinkedIn, we force people to use their real names. Pro sellers – our store partners – can have their store’s name displayed, but they must register under a real name. If you’re a new marketplace, and you want to show that people can trust others, it’s absolutely the right thing to do.
Now, we only impose showing your name on sellers. Buyers are totally anonymous to everyone. Only the seller knows who the buyer is, because the seller needs to know where the item is being shipped to. Again, however, everyone has to *register* using their real name.
The second part of how we show trust metrics on Dawdle is by having a five star scale. People intuitively get that the more stars you have, the more trustworthy you are. I want to be really clear here: this isn’t a Feedback Rating like on eBay where you can get credit for both buying and selling. Since we handle all payments, there’s no nonpayment risk. Since sellers only get an item sold e-mail after we’ve charged a buyer’s credit card, we’ve eliminated the need to get feedback on buyers. That’s why our stars – our ratings – are Seller Ratings. Unlike other sites, you have to earn your stars on Dawdle. Let me walk you through how you can “earn your stars” on Dawdle.
This is Rihanna. Rihanna loves games. See how she’s super into it, with her Xbox 360 headset on, leaning forward, a look of concentration on her face? This girl is a hardcore gamer. Rihanna comes to Dawdle, registers her name, and she starts off with her default three stars.
Like a lot of hardcore gamers – Rihanna sells her old games to get money that she spends on new games. She’s part of the 16 million “New Game Gluttons” who constitute about half a billion dollars in new game purchases annually. So she sells some games on Dawdle, and her buyers leave her positive feedback or no feedback at all. Since Dawdle doesn’t push buyers to leave feedback, it’s up to Rihanna to ask them to leave feedback to increase her seller rating. But she’s sold some stuff, people haven’t left her bad feedback, so we know that she’s a pretty good seller.
So with some sales, Rihanna’s now a four star seller. Again, this is a seller rating. You can buy all day long, but if you never sell, you’ll stay at three stars forever. But Rihanna isn’t all about selling games – she’s not a store – she’s not trying to make a living by selling games. She wants to play! So how can she get a higher seller rating?
She can give us more information about herself! In the MyDawdle section of the site, we have a section called Personal Info. The more information you give us about yourself, the more we can trust you. As we get into a place where we have Facebook Connect and Google Friend Connect and MySpace Data Availability, we can verify this information. For now, we assume that if you take the time to fill this in with valid information, what you give us is true and is actually you. This is also how we let stores get up to five stars. The stores actually get a default bump up all the way to four stars after they’ve sold their very first item. That’s because we know who they are – we have phone numbers and physical addresses for them. We’ve verified that they make a living buying and selling games. If it’s their livelihood, it’s not in their interest to try to screw people. They have to keep selling to keep making money.
So now that we know that Rihanna’s a good seller from her feedback (and lack thereof) and we know some stuff about her, we can say to our buyers that Rihanna is one of our very best sellers! So now she has a five star Seller Rating. But wait! How do I know that Rihanna is one of our very best sellers? Well, remember what I said early on about how trust metrics can be sneaky? Specifically, I said…
Trust metrics sneakily establish hierarchy. But if everyone can get to five stars by giving us information and selling a few things, then there’s no hierarchy.
To have a hierarchy, you have to rank things. But no one’s ranking anything in these feedbacks. They’re rating. You can’t possibly ask someone to rank their top 10 commenters on Metafilter, 100 best restaurants on Yelp, or their 736 online purchases on Dawdle. Heck, even the US News rankings don’t ask other colleges to rank other colleges – they ask them to rate them. *Then* they do something with those ratings. Let’s talk about what it is they do with those ratings.
This is a pretty common review distribution graph on Amazon. You see this on Amazon and on other sites, though the academic research focuses on Amazon: there tends to be a bimodal distribution. This isn’t just fat tails – this is just nutty. And you’ll see it again and again and again. Everything on Amazon can’t be either awesome or terrible. But Amazon’s community has settled on this bimodal distribution where they only leave feedback if it’s awesome or terrible. Note how the average customer review isn’t reflected in the distribution graph. The mean is very different from the median, which is very different from the mode…
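To make the mean/median/mode point concrete, here is a small Python sketch. The star counts are invented to look bimodal like the graph described above; they are not real Amazon data.

```python
from statistics import mean, median, mode

# Hypothetical bimodal review counts: lots of 1s and 5s, very little in between.
star_counts = {1: 40, 2: 5, 3: 5, 4: 10, 5: 60}

# Expand the counts into one entry per review, e.g. forty 1s, five 2s, ...
ratings = [stars for stars, count in star_counts.items() for _ in range(count)]

average_rating = mean(ratings)
middle_rating = median(ratings)
most_common_rating = mode(ratings)

# Share of reviewers who actually left a rating near the average (3 stars).
share_near_average = star_counts[3] / len(ratings)

print(average_rating, middle_rating, most_common_rating, share_near_average)
```

With these made-up counts the average lands between three and four stars, even though only about 4% of reviewers left a three, which is exactly why the average says so little about the shape of the distribution.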
On top, you see the same Amazon distribution for a product that we had on the previous slide. On the bottom, you see two different Yelp reviewers’ distributions of the things they’ve rated. What you’ll see consistently on Yelp is that users are very generous. Everything is either excellent or very good. There’s not a lot of terribles. Everyone on Yelp is a happy person who loves the world and they only go to the greatest places because they’re just that cool? Probably not, but that’s what their review distributions look like. That’s kind of silly. So you have a lot of bunching up. It makes it hard to differentiate between different products or users or places or whatever. If you can’t differentiate, how can you tell who’s better than whom?...
Normalize! You can normalize any set of data. When we are able to create a distribution that looks like this, we can say that Rihanna is far enough in our right tail that she’s one of our very best sellers and we can give her our highest five star Seller Rating endorsement. I think that this is a very powerful tool that allows us to quickly and easily differentiate between users, or sellers in our case. On eBay, how much different is someone with 1,000 feedback versus someone with 10,000? Do you really think the 10K seller is that much more trustworthy? That’s the issue that this solves. When you get into large numbers, every incremental data point is pretty meaningless. However, when you look for patterns and you identify the anomalies, you can find great data there. One small but important point…
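Here is one way normalizing could look in Python, using plain z-scores. This is only a sketch of the general idea, not a description of the Dawdle rating mechanism: the seller names, the raw scores, and the `RIGHT_TAIL_Z` threshold are all made up for illustration.

```python
from statistics import mean, pstdev

# Hypothetical raw seller scores (invented numbers, not Dawdle data).
raw_scores = {
    "rihanna": 9.5,
    "seller_a": 5.0,
    "seller_b": 5.5,
    "seller_c": 4.5,
    "seller_d": 5.0,
    "seller_e": 6.0,
    "seller_f": 4.0,
    "seller_g": 5.5,
}


def z_scores(scores: dict) -> dict:
    """Normalize raw scores to mean 0 and standard deviation 1."""
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    return {name: (value - mu) / sigma for name, value in scores.items()}


# Assumed cutoff for "far enough in the right tail"; the real cutoff is unknown.
RIGHT_TAIL_Z = 1.5

normalized = z_scores(raw_scores)
top_sellers = [name for name, z in normalized.items() if z >= RIGHT_TAIL_Z]
print(top_sellers)
```

The design choice here is that a seller is judged relative to everyone else, so one more data point on top of thousands does not move the result, but an anomaly in either tail stands out right away.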
I really wanted to avoid bullet points, but I just didn’t know how else to do this slide. We can normalize all we want, but we really want to guide users into helping us make the stratification a little more natural. So here you can see how we guide people to three. If we don’t get people to naturally create a bell curve distribution themselves, then we have to do it. But when we do it, we’re exaggerating very small differences between data points. It’s what happens when you have a grading curve and an easy class. If only a third of the students can get A’s, but the average score is a 95, you’re unfairly punishing people for minute differences between a 100 and a 97. On the other hand, if the test is hard, the distribution is going to be much wider. Then, if a third of the class gets A’s, the cutoff may very well be a 75. That’s what we’re going for here. We guide people as much as possible to leave 3s for standard transactions. The hope is that this will allow us to have a more natural bell curve that we don’t have to massage as much. We want to make sure that there are meaningful differences between a three-star and a four-star and a five-star seller. We want to make sure that there are meaningful differences between a *four-and-a-half* star seller and a five-star seller. So, let’s wrap this up – one slide about Dawdle, and one about takeaways and recommendations for all y’all.
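The grading curve analogy can be sketched in a few lines of Python. Both classes below are invented; the only things taken from the talk are the rule that a third of the class gets A's, the 95 average for the easy class, and the 75 cutoff for the hard one.

```python
def a_cutoff(scores, one_in=3):
    """Lowest score that still earns an A when the top 1/one_in of the class gets A's."""
    ranked = sorted(scores, reverse=True)
    number_of_as = max(1, len(ranked) // one_in)
    return ranked[number_of_as - 1]


def gap_below_cutoff(scores, one_in=3):
    """Points separating the last A from the best score that missed out."""
    ranked = sorted(scores, reverse=True)
    number_of_as = max(1, len(ranked) // one_in)
    return ranked[number_of_as - 1] - ranked[number_of_as]


# Hypothetical easy class: everyone bunched near the top, average of 95.
easy_class = [100, 99, 98, 97, 96, 95, 94, 93, 83]
# Hypothetical hard class: scores spread out much more widely.
hard_class = [95, 85, 75, 70, 65, 60, 55, 50, 40]

print(a_cutoff(easy_class), gap_below_cutoff(easy_class))
print(a_cutoff(hard_class), gap_below_cutoff(hard_class))
```

In the easy class a 97 misses an A by a single point, while in the hard class the cutoff falls at 75 with a five point gap behind it, so the same rule separates people on a much more meaningful difference.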
For Dawdle, it’s very clear what this does. Having meaningful trust metrics aligns buyers, sellers, and the marketplace in a way that no other marketplace does. Because we allow buyers to screen on Seller Rating – something no one else does – we incent sellers to do everything they can to get their rating as high as they can. How? Because lower rated sellers can be screened away, they have to compete on price with higher-rated sellers. Higher rated sellers can easily parlay their higher seller ratings to charge a *premium*. Note that our three feedback questions aren’t about price – they’re about the experience delivered. Because shipping costs are always included in the purchase price, it looks like free shipping to the buyer, and all cost uncertainty is taken off the table. Seller ratings are about the experience, period. As higher rated sellers can charge higher prices, that means that Dawdle gets more money. Our commissions will be higher as their prices are higher. But we are still fair to all sellers – we don’t give higher seller ratings to those who sell the most – we give five stars to the sellers who have proven themselves to be exceptional in the past and who indicate that they will be exceptional in *future* transactions. The only way this works is if there are meaningful differences between sellers that the buyers can easily identify and make determinations accordingly. That’s why we spent a lot of time working on our seller rating mechanism, and why our servers scream at us at night. It’s a ton of data that they’re handling. But you don’t have to do it that way; there are simple recommendations that I want to make for you…
Recommendations and Takeaways: Use outside information – persistent identities should be encouraged, link to other accounts if necessary, and consider checking user data against information in public databases, if that is appropriate to your situation. Using real names is something worth considering. Encourage people to do the stratification for you; no one thinks it’s fair when a curve in a class hurts your grade, but a hard test that separates the wheat from the chaff does the job. You don’t want to amplify minute differences.
Some academic papers and academic writing that may be helpful for you. I didn’t want to give anyone a works cited, but just some jumping off points if you find this stuff interesting.
Yes, that’s Mario with a butterfly knife and a glock. Peach is smoking. Not sure if it’s a cigarette or a joint, but whatever.
Actually Useful Trust Metrics
Actually Useful Trust Metrics for Online Marketplaces and Communities
Sachin Agarwal, Co-Founder and CEO, Dawdle.com
@sachinag, @dawdledotcom
CONFIDENTIAL – The Cono Project, Inc.
Guiding To Three
• What was the condition of the product you received?
  – 3. Exactly as expected
• How quickly did you receive your item?
  – 3. Exactly as expected
• How was the Seller’s communication?
  – 3. Good
  – 2. No communication required
For Background And More Information
• http://dooooooom.blogspot.com/2008/05/trust-in
• http://sloan.ucr.edu/category/working-papers/pro
• http://faculty.cs.tamu.edu/caverlee/pubs/caverlee
Questions and Discussion