The document discusses common myths about data products and how to build "product sense". It outlines five myths: 1) that data products are just about displaying data, 2) that users will behave as expected, 3) that products should optimize for clicks, 4) that complex algorithms beat simple ones, and 5) that users want to configure things themselves. It argues that building product sense requires intuition, learning by doing, talking to users, using your own product, knowing competitors, and being data driven. The key is taking a user-first approach over a machine learning-first one.
8. [Slide figure: mock driving dashboard. Gauge panels showing the share of drivers choosing Left Turn, Straight Ahead, or Right Turn (50% shown) for Live Traffic, Weekend Traffic, Weekday Traffic, Sunny, and Rain, plus a Monthly Breakdown chart (Jan to Dec, scale 0 to 60) for the same three options.]
12. Data Product
Use data to provide highly personalized
experiences and tangible value to users.
New Big Data incubation team at Salesforce.
Find ways to leverage tons and tons of data to build intelligence into products across the sales, services, marketing clouds.
Share insights about building data products learnt along the way.
Structure talk in two parts.
Canonical example of such a data product -- Google Analytics. It is the most widely used web analytics service on the internet, showing you usage analytics for your website … who your visitors are, where they come from, etc. Upwards of 50% of the most popular sites on the internet use Google Analytics today. To many product managers and data scientists today, this represents the canonical data product. They view data products as an opportunity to present data to users.
Take this kind of thinking to an example that we use every day … Google Maps. With all the live traffic information streaming in through Android phones, this is really a great opportunity to give data back to the users. So how about this? As a user is driving and navigating a particular intersection, what if we show how other people are navigating it? 50% go straight …
Even better, we could show it in a gauge like this.
But why stop at live traffic? We have historical data …
And the user can have this entire user dashboard at his fingertips to help him make real time decisions on his daily commute!
Not that dashboards are bad. People think that that’s what they want. But actually, they want …
Thankfully, this is not what Google did. Instead, they built a navigation tool that lets you specify a destination and then gives you turn-by-turn directions until you get there.
And they used live traffic data to detect congestion on highways and suggest alternative routes. So much more useful. Now, I’d like to believe that the dashboard example was so absurd that such a thing would never happen in real life. But if you talk to many product managers and data scientists today, the dashboard metaphor is often their framework for thinking about data products. We should be willing to think outside of that box in designing a product that provides the most value to users.
It should be a product that uses data to offer a highly personalized experience that provides actual value to the user, such as helping him make a decision, or telling him what his next actions should be.
If you look on the right hand side, there is this section called “people also viewed”. It shows that the people who viewed Ahmet’s profile, also viewed the profile of these other people. The intuition behind this data product is that people tend to co-view the profiles of professionally similar people. And so this widget should surface profiles of people who are similar to Ahmet. Sure enough, if you look at these folks, they’re all data scientists, and they work at companies like LinkedIn, Salesforce, Facebook and so forth. This is an extremely useful product. Recruiters used to use it heavily to find candidates.
What you end up seeing is a territory account manager at EMC, a photography intern at the Green Bay Packers, and so on. What just happened here? People tend to click on pictures of pretty women. In fact, throughout this product, you will see that there is a general bias towards women. What it means is that our expectation that users would co-view the profiles of the most professionally similar people was not entirely true, and our model should have accounted for that and taken other features into account. Users will not be good and well behaved and act just the way you expect them to.
It turns out that co-views are based not just on professional similarity, but also on other features, for example physical appearance. So the people who viewed Ashley’s profile as a group are statistically more likely to view the profiles of similar looking attractive women, and that turns out to be the dominant signal in her co-view statistics.
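To make the mechanism concrete, here is a minimal sketch of how a "people also viewed" list could be derived from raw co-view counts. The function names and the toy data are hypothetical, and this is not LinkedIn's actual implementation. Note that the counts encode whatever drives viewing behavior, not professional similarity as such, which is exactly how an unexpected signal can come to dominate.

```python
from collections import Counter, defaultdict
from itertools import permutations


def co_view_counts(views_by_viewer):
    """For each profile, count how often other profiles were viewed by the same viewers."""
    counts = defaultdict(Counter)
    for profiles in views_by_viewer.values():
        # Each ordered pair (a, b) of distinct profiles seen by one viewer is a co-view.
        for a, b in permutations(set(profiles), 2):
            counts[a][b] += 1
    return counts


def people_also_viewed(counts, profile, k=3):
    """Return the k most co-viewed profiles, breaking ties alphabetically."""
    ranked = sorted(counts[profile].items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


# Hypothetical toy data: viewer id -> profiles that viewer looked at.
views = {
    "v1": ["ahmet", "dana", "lee"],
    "v2": ["ahmet", "dana"],
    "v3": ["ahmet", "lee", "sam"],
    "v4": ["dana", "sam"],
}
counts = co_view_counts(views)
top_for_ahmet = people_also_viewed(counts, "ahmet", k=2)
```

A model that should surface professionally similar people would need additional features (title, industry, skills) on top of these counts, since the counts alone cannot distinguish why two profiles were viewed together.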
Search engines, recommendation engines, ads … everyone optimizes for clicks.
Or this one.
These are examples of clickbait. Facebook was brimming with these kinds of articles. They are called clickbait because they try to reel you in with a mysterious headline, and then when you click on them, you’re disappointed. If you optimize solely for clicks, you create an ecosystem for this kind of clickbait. What Facebook then did, to counter the problem, was take into account the amount of time people actually spent on a page to get a much stronger signal for usefulness. And this kind of clickbait was drastically reduced. In general, that’s the kind of thing you need to do … you need to be smarter and more nuanced about how you optimize. Because you may be increasing clicks for now, but in the long run you might be hurting other metrics that you really care about much more, such as retention and engagement.
You could build mechanisms into your system that give you stronger signals of usefulness, such as shares or upvotes. You should correctly model and account for hidden costs, such as the cost of annoying users with information that looks interesting but actually is not.
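As a rough illustration of a stronger usefulness signal, the sketch below discounts click-through rate by how long readers stay on the page. The function name, the linear discount, and the 30-second threshold are all assumptions made for the example, not the formula Facebook used.

```python
def usefulness_score(clicks, impressions, avg_dwell_seconds, min_dwell_seconds=30.0):
    """Click-through rate, discounted when readers leave quickly.

    min_dwell_seconds is a hypothetical threshold chosen for illustration.
    """
    if impressions == 0:
        return 0.0
    ctr = clicks / impressions
    # Scale linearly up to the threshold, then cap at 1 so long reads are not over-rewarded.
    dwell_factor = min(avg_dwell_seconds / min_dwell_seconds, 1.0)
    return ctr * dwell_factor


# A headline that gets many clicks but loses readers within seconds ...
clickbait = usefulness_score(clicks=500, impressions=1000, avg_dwell_seconds=3.0)
# ... versus one with fewer clicks that people actually read.
solid = usefulness_score(clicks=200, impressions=1000, avg_dwell_seconds=60.0)
```

Under these assumptions the article with fewer clicks ranks higher, because the hidden cost of disappointing readers is now part of the score.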
And sometimes, even more important than the algorithms you’re using, very simple things, such as choosing your data sources more wisely or allowing users to curate their own feeds the way Twitter does, can be much more impactful.
As the saying goes …
If you optimize for clicks, without regard for any nuances, you encourage the creation and promotion of this kind of content. Because this is the content that people tend to click on!
Now, if you are BuzzFeed, this might be alright. But if you’re not, this might actually hurt other metrics that you care about … engagement, retention, repeat, active users.
Closely related to the previous one. There are all these amazing products built on top of complex machine learning techniques. Deep learning. What is data science if not machine learning and statistics and algorithms?
The Service Cloud is essentially a customer service application that empowers companies to manage all their customer information and service conversations in the cloud. So it allows service agents to create cases, search a knowledge base for solutions, reach out to customers across different channels, and consolidate all these multi-channel efforts within Salesforce.
One of the apps in the Service Cloud is Social Studio. It allows service and marketing teams to monitor conversations around their brand across all the social channels, such as Facebook and Twitter. It gives you a breakdown by topics and trends, sentiment, etc. It allows you to directly engage with the customers on those channels and so forth. So if there is a customer service issue that is tweeted about on Twitter, an agent can monitor the conversation, reach out to the person, and directly tie it to a case in the Service Cloud.
Now a super enthusiastic data scientist would be tempted to say, let’s automatically detect who this person is and what his booking code was, and surface this information to the service agent so that he can resolve this case faster. Seems fair, right?
The right thing to do in this case is to ask the customer directly. The problem with the automated approach is that the entity resolution could be inaccurate; for example, the person’s Twitter handle may be different from his name. If you get it wrong, you just waste more time and, worse, you make the company look even more incompetent. Asking is less elegant, perhaps, but simpler and, more importantly, more accurate.
Machine learning is a stepping stone. Use it as a tool, not to make the final decision.
Hybrid: human + machine learning.
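One way to picture the hybrid approach is a confidence gate: the model's match is only offered as a suggestion when it is very confident, and otherwise the system falls back to simply asking. This is a minimal sketch; the `Match` type, the `next_step` function, the action strings, and the 0.95 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Match:
    customer_id: Optional[str]  # None when entity resolution found nobody
    confidence: float           # model's estimate in [0, 1]


def next_step(match, threshold=0.95):
    """Decide whether to suggest the matched customer or ask the user directly."""
    if match.customer_id is not None and match.confidence >= threshold:
        # Surface as a suggestion for the agent to confirm, not as a final answer.
        return f"suggest:{match.customer_id}"
    # A wrong guess costs more than a question, so fall back to asking.
    return "ask_user_for_booking_code"


confident = next_step(Match("cust-123", 0.99))
uncertain = next_step(Match("cust-456", 0.60))
no_match = next_step(Match(None, 0.0))
```

The threshold encodes the cost asymmetry described above: a wrong automated match wastes time and embarrasses the company, while asking is cheap.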
More of a conclusion that one might be tempted to draw from the previous two myths. That users want to configure things for themselves. So give them more choice. Allow them to curate their own newsfeeds. Ask them for information about themselves and so forth.
Google knows so many things about me, that it has inferred from my past actions. It has basically become a mind reader. And yet at no point does it ever push any of the behind the scenes complexity to the end user. No options for filtering or sorting. No requests for explicit feedback on the search results.
For 17 years, it’s just been a simple search box.
Want to emphasize, none of these myths are actually fallacies. Dashboards do serve their function. Optimizing for clicks is a good proxy in many situations. The point of all the examples was more to show that building data products is something of an art.
And a very crucial component of that art is “product sense”.
What do I mean by product sense? Basically, knowing what to build and why. What tangible value does the product provide to users? What does the competitive landscape look like? How does this new feature affect company KPIs? You can follow a data-driven approach to answer these questions, where you formulate hypotheses and look at your data to see if these hypotheses are valid. But you can only really begin to answer these questions when you know your user. Know what you can expect of them. Anticipate their behaviors. So talk to your users. Eat your own dog food so you can really build empathy for your users! And finally, having product sense means knowing that a user-first approach trumps a machine-learning-first approach. As data scientists, we’re often tempted to create all kinds of magic for our users. But a simple solution and a quick path to execution, followed by rapid iteration, is invariably the right thing to do.
Product sense isn’t an all-or-nothing thing. You could have great intuition for building consumer web products, but not so much for enterprise products.
But more importantly, experience helps build intuition. The best way to learn is by actually doing. So if you are a data scientist, don’t be content to come up with cool algorithms and toss them across the fence. Actively engage with product managers, designers and users.
Talk to your users to get a sense of the different personas that use the product and what the problem space actually is.
Build empathy for your users. At LinkedIn, some designers and data scientists would really go out of their way to do this. Everyone used to get a pro account for free after joining LinkedIn, for instance, but these people would purposely refrain from upgrading so they could have the experience of non-pro users, which was distinctly different due to differences in visibility into the LinkedIn graph.
Of course, you want to be building new products that offer something different from your competitors. But knowing what works for them and what doesn’t can be very useful input. And you don’t want to be Nokia building yet another Nokia 7360 when everyone else is building smartphones.
Do what data scientists do best – be data driven. Formulate hypotheses, validate with data, before embarking on projects.
At Microsoft, every planning session would begin with a triaging session, where we would triage failed queries to figure out what the biggest pain points were and where we should focus our efforts next to improve the search experience.
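A triage like this can be as simple as labelling a sample of failed queries and ranking the failure categories by frequency. The sketch below assumes that shape; the `triage` function, the sample queries, and the category labels are all made up for illustration and do not describe the actual process at Microsoft.

```python
from collections import Counter


def triage(failed_queries):
    """Rank failure categories by frequency so the biggest pain points come first."""
    counts = Counter(category for _, category in failed_queries)
    # Sort by descending count, then by name so the order is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical labelled sample: (query, failure category assigned during triage).
sample = [
    ("restarant near me", "spelling"),
    ("jaguar speed", "ambiguous intent"),
    ("wether tomorrow", "spelling"),
    ("flights sfo nyc friday", "missing vertical"),
    ("resturant open now", "spelling"),
    ("apple support", "ambiguous intent"),
]
ranking = triage(sample)
```

The top of the ranking is the hypothesis to validate before committing to a project: here, that spelling correction would address the largest share of failures in the sample.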
We all kind of know all these things already, but it’s so easy for us to forget about them when we’re caught up in our day-to-day jobs. So don’t just pay lip service. Stop to think about the value proposition to your users, and whether this really is the simplest thing you could be building first to test the waters, as you go about building awesome new data products.