Consumer Science and Product Development at Netflix - OSCON 2012

At the core of product innovation at Netflix is consumer science; Netflix constantly tests and iterates on new ideas for every aspect of the streaming service. This experiment-oriented culture uses data from actual customer usage of the product to quickly understand which ideas are great.

At the 2012 O'Reilly Open Source Convention (OSCON), Rochelle King and Matt Marenghi described the consumer science process in detail. They demonstrated how it can apply to everything from a new UI concept to a new back-end algorithm, discussed how the company culture and the product development approach inter-relate, and shared how this customer focus drives decision-making at every level of the company. They also covered some of the limitations of consumer science, and how to make decisions where science can’t easily shine a light.

  • ROCHELLE: introduction. MATT: introduction.
  • At Netflix, we strive to delight our customers by making it as easy as possible to find and watch movies and TV shows.
  • 26M global streaming members
  • In the past two years we’ve developed relationships to get built into over 800 partner products. We are on major game consoles (Wii, PS3, Xbox) and mobile devices (iPad, Android, iPhone), as well as DVD players, smart TVs, and home theaters.
  • Streaming has allowed us to change the way people use our service: it’s much more mobile, flexible, and instant. Making it as easy as possible to get to is important, which is why a key part of the strategy has been to get on as many devices as possible. Internet-connected TVs, gaming consoles
  • Mobile: tablets, phones
  • PCs & Macs
  • Started in the United States
  • Expanded to Canada in 2010
  • Latin America in Sept 2011
  • UK in Jan 2012; next territory in Q4 of 2012
  • At Netflix, we make most of our product decisions using “consumer science”.
  • When building a product, you need to be clear on what your goal is. Netflix is a consumer product, so customer satisfaction is the primary goal that drives most product innovation. We’re also a subscription business, and we believe that satisfied customers will be more likely to renew their subscriptions and retain better.
  • We’re a data-driven organization, so it’s important for us to understand and measure whether or not the new product features we’re rolling out are making a positive impact on our customers. Understanding how we measure success needs to be shared across the entire organization.
  • We use the term “Consumer Science” to capture how we measure success. We want to gather as much information as possible, directly from our customers, to understand what is or isn’t working for them. Consumer Science can be made up of many components:
    - customer surveys
    - hard data (demographics, % hours watched, etc.)
    - qualitative feedback directly from consumers via focus groups and usability testing
    - A/B testing (split testing), where you give your customers a few experiences that are slightly different from each other and see which one performs best
  • The entire team needs to be on the same page about what “performs best” means.
  • Important to choose the right metrics. For Netflix, as a subscription business, RETENTION is the core metric that we want to measure on our tests. Anything that we test in our product should be with the intent of improving retention.
  • However, retention can be hard to measure or take a long time to measure. Therefore, it’s important to develop leading indicators or proxy metrics. Hours watched is one of our proxy metrics. A customer who watches 4 hours a week of Netflix will be more likely to stick around as a customer than someone who is watching only 1 hour a month. Generally speaking, if they’re watching more Netflix, then they’re getting more value from our service and are more likely to retain.
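The idea of a proxy metric can be sketched in a few lines: bucket members by weekly viewing hours and compare each bucket's retention rate. This is an illustrative sketch only; the data, bucket thresholds, and numbers below are fabricated for the example, not Netflix figures.

```python
# Toy illustration of hours-watched as a retention proxy: bucket members
# by weekly viewing and compare the retention rate of each bucket.
from collections import defaultdict

# (weekly_hours, renewed) -- fabricated example records, NOT real data
members = [
    (0.2, False), (0.5, False), (0.8, True), (1.0, False),
    (3.5, True), (4.0, True), (4.5, True), (5.0, True),
    (0.3, False), (4.2, True),
]

def bucket(hours):
    """Coarse engagement buckets (thresholds chosen for illustration)."""
    if hours < 1:
        return "light"
    if hours >= 4:
        return "heavy"
    return "medium"

counts = defaultdict(lambda: [0, 0])  # bucket -> [renewed, total]
for hours, renewed in members:
    c = counts[bucket(hours)]
    c[0] += int(renewed)
    c[1] += 1

for name, (renewed, total) in sorted(counts.items()):
    print(f"{name}: {renewed}/{total} retained ({renewed / total:.0%})")
```

With real member data, a consistent gap between the heavy and light buckets is what justifies treating hours watched as a leading indicator for retention.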
  • Every test starts with a hypothesis. Why do we think what we’re going to do is actually going to make a difference for the business? Some ideas might sound like they’ll make a difference to the core metric, but you need to ensure that they will actually help the overall business (e.g. $1 per play: not good for business).
  • We’ll use this last hypothesis as an example to walk through how A/B testing works.
  • What variables will you test to determine if your hypothesis is sound or not? At Netflix, our movies and TV shows are displayed in rows. Each row represents a different genre or category.
  • If the hypothesis is about “showing more movies & TV shows” up front, then you can either:
    1) add more titles per row (provide more depth within each genre/category), or
    2) add more rows and categories (provide more breadth in the catalog for customers to browse).
    Think about dependent & independent variables, control, significance…
  • Every test starts with a control - usually the experience that is already out there. Then we make several different experiences (test cells) which hopefully give us a better understanding of how much impact the variables that we’re testing have on our customers. It’s a test/experiment - keep in mind that the ideal execution can be 2X as effective as a prototype, but not 10X. Once the test is run, analyze your data. Users will answer the question for you of which variables make an impact. We can learn a lot by understanding why our test succeeded or failed. In this test, adding breadth lifted viewing.
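One mechanical piece of a control-plus-cells test is assigning each member to exactly one experience, stably, so they see the same cell on every visit. The talk doesn't describe Netflix's allocator; below is a common, hypothetical approach: hash the (test id, member id) pair. The cell names are made up for the browse-breadth example.

```python
# Hypothetical sketch of deterministic test-cell allocation: hashing
# (test_id, member_id) gives a stable, roughly uniform assignment.
import hashlib

# Assumed cell names for the breadth-vs-depth example; cell 0 is the control.
CELLS = ["control", "more_titles_per_row", "more_rows"]

def assign_cell(member_id: int, test_id: str) -> str:
    """Return the same cell for a given member and test on every call."""
    digest = hashlib.sha256(f"{test_id}:{member_id}".encode()).hexdigest()
    return CELLS[int(digest, 16) % len(CELLS)]

# The same member always gets the same experience for this test:
assert assign_cell(42, "browse_breadth_test") == assign_cell(42, "browse_breadth_test")
```

Hashing per test id (rather than per member alone) lets a member land in different cells across unrelated tests, which keeps concurrent experiments roughly independent.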
  • We like A/B testing because:
    1) It levels the playing field - many ideas (from anyone) can be tested, which helps democratize product development and eliminates the problem of only building the idea from the person who yells the loudest or the “highest paid opinion”.
    2) It gives us data from real customers - the best and most direct way to understand what will work with our customers.
    3) It aligns to core metrics - it keeps the entire team (design, development, product management) on the same page about what we’re measuring and thinking about how to move the business forward.
  • A/B testing can be used to test radically different ideas as well as smaller iterative ones.
  • This was the original TV UI for Playstation 3, before we had the ability to do A/B testing and dynamically update just the UI with server-delivered UI code. It was the launch of our PS3 downloadable application in late 2009, which replaced the original disc-based version, that introduced our use of the open source browser WebKit. Using WebKit in our application as the UI engine meant we could start doing true A/B testing of the UI in the same way that we had been doing for years on our netflix.com website on PC/Mac.
  • This experience served as our control. It was already available on a small number of Smart TVs. The main elements of it included: a) a menu structure for selecting different categories - the menu allowed for introducing navigation hierarchy with sub-lists, allowing for deeper drill-down into niches of the catalog, for example Romantic Comedies; b) this hierarchical browse exposed more of the catalog via browsing; c) more boxshots were available on screen at any one time.
  • This experience strived for simplicity. No hierarchy, no menus. Simple navigation within a grid of titles. Horizontal rows for individual categories/lists, with a title always receiving focus and a panel along the right side providing a rich amount of metadata to inform one’s decision.
  • This experience strived to separate navigation from the content: a guided experience through a set of menus. Once a category or sub-category was selected, the navigation got out of the way and the focus was on titles, surfacing titles similar to whichever title had focus.
  • This experience focused on the power of playing video to help inform one’s decision, the hypothesis being that perhaps customers can more easily choose what to watch through the act of watching. A title is always playing at fullscreen, with a browse experience as an overlay over the video. Selecting a different title would result in that title playing, while allowing the customer to continue browsing for other potential titles - similar to a channel-surfing experience.
  • Which one do you think performed the best?
  • The Simple Grid UI won compared to our control. It’s also worth noting that cells 3 & 4 were negative compared to the control. Internally, a lot of people were excited about the video cell and were confident that it would perform well. If we had simply rolled out that experience without first testing it, we would have done a disservice to our customers.
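Calling a cell "positive" or "negative" relative to the control is ultimately a statistical comparison. As a sketch of the kind of check involved, here is a standard two-proportion z-test built from the standard library; the retention counts are fabricated for illustration and are not figures from the talk.

```python
# Two-proportion z-test: is a test cell's retention rate meaningfully
# different from the control's? (Fabricated example counts.)
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """z statistic for H0: the two underlying rates are equal."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical counts: members retained / members allocated per cell.
z = two_proportion_z(success_a=1150, n_a=2000,   # simple-grid cell
                     success_b=1050, n_b=2000)   # control
print(f"z = {z:.2f}")  # |z| > 1.96 ~ significant at the 5% level
```

In practice a platform would also handle multiple cells, multiple metrics, and pre-registered test durations, but the core question per cell is this same one.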
  • Once you have a general direction, you can take it and test iterations on it so that you can improve its performance even more.
  • Example from our website experience to illustrate another benefit of A/B testing.
  • The hypothesis is that removing a lot of the clutter and affordances from the UI will make it easier for customers to discover great movies and TV shows to watch.
  • The ‘control’ was the default UI at the time, and cell 1 is a cleaned-up version of the UI:
    - Box shots made larger to showcase the content - so large that we could remove the titles from above all the box shots
    - A number of items put into the hover state (play buttons, stars, etc.)
  • The “clean” design won and increased streaming hours.
  • Naturally, we were excited to roll it out. However, the reaction we got from customers posting on our blog was VERY negative. When you get such emotional feedback (even if it’s from a vocal minority), it’s hard not to question the decision you made and second-guess yourself. But the data can give you confidence that there was something in your design that was working better for customers... Some folks seemed to be asking for a full rollback to the original site. However, some (still negative) gave us useful insight about the specifics of what they didn’t like (scrolling and the missing sortable lists).
  • Because we controlled for different variables, our data combined with the customer feedback allowed us to discern what changes we should make to the features that we rolled out, while maintaining a lot of the positive benefits we saw as well.
  • The site today retains much of what we originally rolled out - but without A/B testing, there’s a chance that we would have rolled back all our changes and not been able to move the product forward in a meaningful way.
  • Positive result: roll it out, but keep in mind that for existing customers there can be a “change effect” which might have a negative impact.
    Negative result: kill the test; resist the urge to revisit it and polish it, thinking it will turn a negative result into a positive one - too costly and unlikely to work.
  • We often see tests with a “flat” result. It's important to have the discipline to insist that any product change that doesn't move metrics in a positive direction should be reverted. Even if the change is "only neutral" and you really, really, really like it better, force yourself (and your team) to go back to the drawing board and try again.
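One way to operationalize the "flat means revert" discipline is a confidence interval on the lift: if the interval for (cell rate minus control rate) straddles zero, the result is flat. This is a hedged sketch with made-up numbers, not the decision rule from the talk.

```python
# 95% confidence interval on the retention lift of a cell over the control.
# If the interval contains zero, treat the test as flat and revert.
import math

def lift_ci(success_t, n_t, success_c, n_c, z=1.96):
    """CI for (cell rate - control rate); z=1.96 gives ~95% coverage."""
    p_t, p_c = success_t / n_t, success_c / n_c
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    diff = p_t - p_c
    return diff - z * se, diff + z * se

# Hypothetical flat-ish result: a small lift, but not distinguishable from 0.
low, high = lift_ci(1020, 2000, 1000, 2000)
verdict = "flat -> revert" if low < 0 < high else "significant"
print(f"lift CI = [{low:.3f}, {high:.3f}] -> {verdict}")
```

The point of the pre-agreed rule is exactly the one in the note above: it removes the temptation to ship a neutral change just because the team likes it.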
  • We caution ourselves not to let A/B testing become a crutch for making decisions. Not every idea is worth the cost and effort to test. Focus testing on the ideas that will likely move the business forward and be measurable through core metrics. This helps eliminate mediocre ideas, weak hypotheses, and tests that won’t move the needle.
    Clear signal - We tend to focus testing on new users because they don’t come with preconceived notions of how to use the product, and the largest audience for us is the one that we have yet to get.
    Variations - It’s costly and time-intensive to test every single variation in isolation. Test cell design should be thoughtful, and each cell or variation should have a hypothesis behind it.
    Local maximum - Free yourself to focus on the big-bet/wild/unpopular ideas AND the smaller, incremental, sound-hypothesis ideas. Know when to “pivot” because you’ve maxed out a specific angle.
    Early victory - Have the discipline to let a test run its expected course before getting too excited by very early results. Likewise, if a test has run its planned course (e.g. 2 months) and there is no positive signal, or it’s negative, don’t be stubborn and let it run indefinitely hoping it will magically turn positive. It’s unlikely, and having the maturity to know when to call it a day and move on to testing other great ideas is important.
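Deciding whether a result is "flat" or a real signal comes down to a significance test on the core metric. As a minimal sketch (not Netflix's actual analysis pipeline; function names and the 0.05 threshold are illustrative assumptions), a two-proportion z-test on a conversion-style metric might look like:

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test comparing control (a) vs. test cell (b).

    Returns (z, p_value); z > 0 means the test cell converted better.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))        # two-sided normal tail
    return z, p_value

def verdict(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Label a finished test: 'flat' results go back to the drawing board."""
    z, p = two_proportion_z(conv_a, n_a, conv_b, n_b)
    if p >= alpha:
        return "flat"       # no significant signal: revert to control
    return "positive" if z > 0 else "negative"
```

For example, 10.05% vs. 10.0% conversion on 10,000 members per cell comes back "flat", while 12% vs. 10% is a clear "positive". The "early victory" caution above is exactly why this check is run only after the planned duration, not on day two.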
  • Consumer Science is successful at Netflix because it's part of our DNA, something we've evolved over many, many years. Everyone who works at Netflix (design, engineering, product management, BUT ALSO legal, HR, recruiting, finance) understands what A/B testing is and how it's leveraged.
  • Two parts make this successful:
1) the day-to-day practice of A/B testing
2) the people that we hire (again, in ALL parts of the company)
  • FOSTERING THE CULTURE:
Universally Embraced - From the executive team to all the individuals working on execution.

Vocabulary - Words like "hypothesis" and "core metric" are commonly used to explain what we do. A shared vocabulary keeps everyone on the same page and makes product discussions and brainstorms more effective and less driven by opinion with no reference to the data needed to back it up (e.g., "I'd hypothesize...", "I believe the data might show...", "how are we going to measure X?" vs. "I think users want X").

Discipline - With lots of exciting ideas, it's easy to want to make exceptions and just roll things out (but if we had done that with the video-based TV UI, we would have done the wrong thing). A/B testing is not a selectively used tool, and not viewed as optional. It's the default approach to making decisions when a hypothesis is testable. Decisions in the absence of A/B test results are the exception, because without them you just don't know whether that decision positively or negatively affected your business. If it CAN be tested, it SHOULD be tested.

Share Results - Habitual, company-wide sharing, context-setting, and broad communication of test results. It reinforces that many decisions are influenced by test results, and it helps everyone learn from what worked and what didn't, allowing all of us to hone our consumer instincts.

PEOPLE:
Humble - An empirical focus keeps us humble. Most of the time you don't know exactly what your customer wants (even if you're an expert in your field). Quick feedback from testing sets us straight and forces us to optimize for the customer. You WILL be wrong at some point in your career, and you need to be able to accept that (no egos).

Focused - Given the experimental nature of testing, you need to know what to focus on and when: how much effort to put into something to make it work well enough to get a good signal, and what polish can wait until it goes into production (if it goes into production).

Data-driven - EVERYONE needs an appreciation for the data (design, engineering, PM) and a basic understanding of how our data analysis works (statistical significance, etc.).

Curiosity about the business - Business acumen and business savvy are important because all of our tests are designed to make an impact on the business. Understanding the fundamental business strategy gives you a more holistic view of how your day-to-day work affects the company at large, and lets you participate better in testing by helping you craft better ideas for what to test.
  • FOSTERING THE CULTURE:\nUniversally Embraced - from the executive team to all the individuals that are working on execution\n\nVocabulary - words like: “hypothesis” and “core metric” are commonly used to explain what we do. keeps everyone on the same page. makes product discussions, brainstorms more effective and less opinion driven with no reference to needed data to back it up (e.g. “I’d hypothesize...”, “I believe the data might show..” vs. “I think users want X”, ‘how are we going to measure X?’)\n\nDiscipline - with lots of exciting ideas, it’s easy to want to make exceptions and just roll things out (but if we had done that the video-based TV UI, we would have done the wrong thing). A/B testing is not a selectively used tool, and not viewed as optional. It’s the default approach to making decisions when a hypothesis is testable. Decisions in absence of A/B test results are the exception, because you just don’t know if that decision positively/negatively affected your business. If it CAN be tested, it SHOULD be tested.\nShare Results - Habitual sharing/context setting/broad communication, company-wide, of test results. It helps reinforce that many decisions are influenced by test results. Also, it helps everyone learn from what worked and what didn’t. Allows all of us to hone our consumer instincts.\n\nPEOPLE:\nHumble - Empirical focus keeps us humble - most of the time you don’t know exactly what your customer wants (even if you’re an expert in your field). Quick feedback from testing set us straight, forces us to optimize for the customer. You WILL be wrong at some point in your career and you need to be able to accept that (no egos).\nFocused - With the experimental nature of testing, you need to be able to know what to focus on and when. 
Know how much effort to put into something to make it work well enough to get a good signal, and know what polish you can put off until it goes into production (if it goes into production)\nData-driven - EVERYONE needs to have an appreciation for the data (design, engineering, PM) and a basic understanding of how our data analysis works (statistical significance, etc.)\nCuriosity about business - Business acumen and business savvy are important because all of our tests are designed to make an impact on the business. Understanding the fundamental business strategy allows you to have a more holistic understanding of how your day to day work is impacting the company at large. (And will allow you to participate better in testing by helping you craft better ideas for what to test).\n\n\n\n\n\n
  • FOSTERING THE CULTURE:\nUniversally Embraced - from the executive team to all the individuals that are working on execution\n\nVocabulary - words like: “hypothesis” and “core metric” are commonly used to explain what we do. keeps everyone on the same page. makes product discussions, brainstorms more effective and less opinion driven with no reference to needed data to back it up (e.g. “I’d hypothesize...”, “I believe the data might show..” vs. “I think users want X”, ‘how are we going to measure X?’)\n\nDiscipline - with lots of exciting ideas, it’s easy to want to make exceptions and just roll things out (but if we had done that the video-based TV UI, we would have done the wrong thing). A/B testing is not a selectively used tool, and not viewed as optional. It’s the default approach to making decisions when a hypothesis is testable. Decisions in absence of A/B test results are the exception, because you just don’t know if that decision positively/negatively affected your business. If it CAN be tested, it SHOULD be tested.\nShare Results - Habitual sharing/context setting/broad communication, company-wide, of test results. It helps reinforce that many decisions are influenced by test results. Also, it helps everyone learn from what worked and what didn’t. Allows all of us to hone our consumer instincts.\n\nPEOPLE:\nHumble - Empirical focus keeps us humble - most of the time you don’t know exactly what your customer wants (even if you’re an expert in your field). Quick feedback from testing set us straight, forces us to optimize for the customer. You WILL be wrong at some point in your career and you need to be able to accept that (no egos).\nFocused - With the experimental nature of testing, you need to be able to know what to focus on and when. 
Know how much effort to put into something to make it work well enough to get a good signal, and know what polish you can put off until it goes into production (if it goes into production)\nData-driven - EVERYONE needs to have an appreciation for the data (design, engineering, PM) and a basic understanding of how our data analysis works (statistical significance, etc.)\nCuriosity about business - Business acumen and business savvy are important because all of our tests are designed to make an impact on the business. Understanding the fundamental business strategy allows you to have a more holistic understanding of how your day to day work is impacting the company at large. (And will allow you to participate better in testing by helping you craft better ideas for what to test).\n\n\n\n\n\n
  • FOSTERING THE CULTURE:\nUniversally Embraced - from the executive team to all the individuals that are working on execution\n\nVocabulary - words like: “hypothesis” and “core metric” are commonly used to explain what we do. keeps everyone on the same page. makes product discussions, brainstorms more effective and less opinion driven with no reference to needed data to back it up (e.g. “I’d hypothesize...”, “I believe the data might show..” vs. “I think users want X”, ‘how are we going to measure X?’)\n\nDiscipline - with lots of exciting ideas, it’s easy to want to make exceptions and just roll things out (but if we had done that the video-based TV UI, we would have done the wrong thing). A/B testing is not a selectively used tool, and not viewed as optional. It’s the default approach to making decisions when a hypothesis is testable. Decisions in absence of A/B test results are the exception, because you just don’t know if that decision positively/negatively affected your business. If it CAN be tested, it SHOULD be tested.\nShare Results - Habitual sharing/context setting/broad communication, company-wide, of test results. It helps reinforce that many decisions are influenced by test results. Also, it helps everyone learn from what worked and what didn’t. Allows all of us to hone our consumer instincts.\n\nPEOPLE:\nHumble - Empirical focus keeps us humble - most of the time you don’t know exactly what your customer wants (even if you’re an expert in your field). Quick feedback from testing set us straight, forces us to optimize for the customer. You WILL be wrong at some point in your career and you need to be able to accept that (no egos).\nFocused - With the experimental nature of testing, you need to be able to know what to focus on and when. 
Know how much effort to put into something to make it work well enough to get a good signal, and know what polish you can put off until it goes into production (if it goes into production)\nData-driven - EVERYONE needs to have an appreciation for the data (design, engineering, PM) and a basic understanding of how our data analysis works (statistical significance, etc.)\nCuriosity about business - Business acumen and business savvy are important because all of our tests are designed to make an impact on the business. Understanding the fundamental business strategy allows you to have a more holistic understanding of how your day to day work is impacting the company at large. (And will allow you to participate better in testing by helping you craft better ideas for what to test).\n\n\n\n\n\n
  • FOSTERING THE CULTURE:\nUniversally Embraced - from the executive team to all the individuals that are working on execution\n\nVocabulary - words like: “hypothesis” and “core metric” are commonly used to explain what we do. keeps everyone on the same page. makes product discussions, brainstorms more effective and less opinion driven with no reference to needed data to back it up (e.g. “I’d hypothesize...”, “I believe the data might show..” vs. “I think users want X”, ‘how are we going to measure X?’)\n\nDiscipline - with lots of exciting ideas, it’s easy to want to make exceptions and just roll things out (but if we had done that the video-based TV UI, we would have done the wrong thing). A/B testing is not a selectively used tool, and not viewed as optional. It’s the default approach to making decisions when a hypothesis is testable. Decisions in absence of A/B test results are the exception, because you just don’t know if that decision positively/negatively affected your business. If it CAN be tested, it SHOULD be tested.\nShare Results - Habitual sharing/context setting/broad communication, company-wide, of test results. It helps reinforce that many decisions are influenced by test results. Also, it helps everyone learn from what worked and what didn’t. Allows all of us to hone our consumer instincts.\n\nPEOPLE:\nHumble - Empirical focus keeps us humble - most of the time you don’t know exactly what your customer wants (even if you’re an expert in your field). Quick feedback from testing set us straight, forces us to optimize for the customer. You WILL be wrong at some point in your career and you need to be able to accept that (no egos).\nFocused - With the experimental nature of testing, you need to be able to know what to focus on and when. 
Know how much effort to put into something to make it work well enough to get a good signal, and know what polish you can put off until it goes into production (if it goes into production)\nData-driven - EVERYONE needs to have an appreciation for the data (design, engineering, PM) and a basic understanding of how our data analysis works (statistical significance, etc.)\nCuriosity about business - Business acumen and business savvy are important because all of our tests are designed to make an impact on the business. Understanding the fundamental business strategy allows you to have a more holistic understanding of how your day to day work is impacting the company at large. (And will allow you to participate better in testing by helping you craft better ideas for what to test).\n\n\n\n\n\n
  • FOSTERING THE CULTURE:\nUniversally Embraced - from the executive team to all the individuals that are working on execution\n\nVocabulary - words like: “hypothesis” and “core metric” are commonly used to explain what we do. keeps everyone on the same page. makes product discussions, brainstorms more effective and less opinion driven with no reference to needed data to back it up (e.g. “I’d hypothesize...”, “I believe the data might show..” vs. “I think users want X”, ‘how are we going to measure X?’)\n\nDiscipline - with lots of exciting ideas, it’s easy to want to make exceptions and just roll things out (but if we had done that the video-based TV UI, we would have done the wrong thing). A/B testing is not a selectively used tool, and not viewed as optional. It’s the default approach to making decisions when a hypothesis is testable. Decisions in absence of A/B test results are the exception, because you just don’t know if that decision positively/negatively affected your business. If it CAN be tested, it SHOULD be tested.\nShare Results - Habitual sharing/context setting/broad communication, company-wide, of test results. It helps reinforce that many decisions are influenced by test results. Also, it helps everyone learn from what worked and what didn’t. Allows all of us to hone our consumer instincts.\n\nPEOPLE:\nHumble - Empirical focus keeps us humble - most of the time you don’t know exactly what your customer wants (even if you’re an expert in your field). Quick feedback from testing set us straight, forces us to optimize for the customer. You WILL be wrong at some point in your career and you need to be able to accept that (no egos).\nFocused - With the experimental nature of testing, you need to be able to know what to focus on and when. 
Know how much effort to put into something to make it work well enough to get a good signal, and know what polish you can put off until it goes into production (if it goes into production)\nData-driven - EVERYONE needs to have an appreciation for the data (design, engineering, PM) and a basic understanding of how our data analysis works (statistical significance, etc.)\nCuriosity about business - Business acumen and business savvy are important because all of our tests are designed to make an impact on the business. Understanding the fundamental business strategy allows you to have a more holistic understanding of how your day to day work is impacting the company at large. (And will allow you to participate better in testing by helping you craft better ideas for what to test).\n\n\n\n\n\n
  • FOSTERING THE CULTURE:\nUniversally Embraced - from the executive team to all the individuals that are working on execution\n\nVocabulary - words like: “hypothesis” and “core metric” are commonly used to explain what we do. keeps everyone on the same page. makes product discussions, brainstorms more effective and less opinion driven with no reference to needed data to back it up (e.g. “I’d hypothesize...”, “I believe the data might show..” vs. “I think users want X”, ‘how are we going to measure X?’)\n\nDiscipline - with lots of exciting ideas, it’s easy to want to make exceptions and just roll things out (but if we had done that the video-based TV UI, we would have done the wrong thing). A/B testing is not a selectively used tool, and not viewed as optional. It’s the default approach to making decisions when a hypothesis is testable. Decisions in absence of A/B test results are the exception, because you just don’t know if that decision positively/negatively affected your business. If it CAN be tested, it SHOULD be tested.\nShare Results - Habitual sharing/context setting/broad communication, company-wide, of test results. It helps reinforce that many decisions are influenced by test results. Also, it helps everyone learn from what worked and what didn’t. Allows all of us to hone our consumer instincts.\n\nPEOPLE:\nHumble - Empirical focus keeps us humble - most of the time you don’t know exactly what your customer wants (even if you’re an expert in your field). Quick feedback from testing set us straight, forces us to optimize for the customer. You WILL be wrong at some point in your career and you need to be able to accept that (no egos).\nFocused - With the experimental nature of testing, you need to be able to know what to focus on and when. 
Know how much effort to put into something to make it work well enough to get a good signal, and know what polish you can put off until it goes into production (if it goes into production)\nData-driven - EVERYONE needs to have an appreciation for the data (design, engineering, PM) and a basic understanding of how our data analysis works (statistical significance, etc.)\nCuriosity about business - Business acumen and business savvy are important because all of our tests are designed to make an impact on the business. Understanding the fundamental business strategy allows you to have a more holistic understanding of how your day to day work is impacting the company at large. (And will allow you to participate better in testing by helping you craft better ideas for what to test).\n\n\n\n\n\n
  • FOSTERING THE CULTURE:\nUniversally Embraced - from the executive team to all the individuals that are working on execution\n\nVocabulary - words like: “hypothesis” and “core metric” are commonly used to explain what we do. keeps everyone on the same page. makes product discussions, brainstorms more effective and less opinion driven with no reference to needed data to back it up (e.g. “I’d hypothesize...”, “I believe the data might show..” vs. “I think users want X”, ‘how are we going to measure X?’)\n\nDiscipline - with lots of exciting ideas, it’s easy to want to make exceptions and just roll things out (but if we had done that the video-based TV UI, we would have done the wrong thing). A/B testing is not a selectively used tool, and not viewed as optional. It’s the default approach to making decisions when a hypothesis is testable. Decisions in absence of A/B test results are the exception, because you just don’t know if that decision positively/negatively affected your business. If it CAN be tested, it SHOULD be tested.\nShare Results - Habitual sharing/context setting/broad communication, company-wide, of test results. It helps reinforce that many decisions are influenced by test results. Also, it helps everyone learn from what worked and what didn’t. Allows all of us to hone our consumer instincts.\n\nPEOPLE:\nHumble - Empirical focus keeps us humble - most of the time you don’t know exactly what your customer wants (even if you’re an expert in your field). Quick feedback from testing set us straight, forces us to optimize for the customer. You WILL be wrong at some point in your career and you need to be able to accept that (no egos).\nFocused - With the experimental nature of testing, you need to be able to know what to focus on and when. 
Know how much effort to put into something to make it work well enough to get a good signal, and know what polish you can put off until it goes into production (if it goes into production)\nData-driven - EVERYONE needs to have an appreciation for the data (design, engineering, PM) and a basic understanding of how our data analysis works (statistical significance, etc.)\nCuriosity about business - Business acumen and business savvy are important because all of our tests are designed to make an impact on the business. Understanding the fundamental business strategy allows you to have a more holistic understanding of how your day to day work is impacting the company at large. (And will allow you to participate better in testing by helping you craft better ideas for what to test).\n\n\n\n\n\n
  • FOSTERING THE CULTURE:\nUniversally Embraced - from the executive team to all the individuals that are working on execution\n\nVocabulary - words like: “hypothesis” and “core metric” are commonly used to explain what we do. keeps everyone on the same page. makes product discussions, brainstorms more effective and less opinion driven with no reference to needed data to back it up (e.g. “I’d hypothesize...”, “I believe the data might show..” vs. “I think users want X”, ‘how are we going to measure X?’)\n\nDiscipline - with lots of exciting ideas, it’s easy to want to make exceptions and just roll things out (but if we had done that the video-based TV UI, we would have done the wrong thing). A/B testing is not a selectively used tool, and not viewed as optional. It’s the default approach to making decisions when a hypothesis is testable. Decisions in absence of A/B test results are the exception, because you just don’t know if that decision positively/negatively affected your business. If it CAN be tested, it SHOULD be tested.\nShare Results - Habitual sharing/context setting/broad communication, company-wide, of test results. It helps reinforce that many decisions are influenced by test results. Also, it helps everyone learn from what worked and what didn’t. Allows all of us to hone our consumer instincts.\n\nPEOPLE:\nHumble - Empirical focus keeps us humble - most of the time you don’t know exactly what your customer wants (even if you’re an expert in your field). Quick feedback from testing set us straight, forces us to optimize for the customer. You WILL be wrong at some point in your career and you need to be able to accept that (no egos).\nFocused - With the experimental nature of testing, you need to be able to know what to focus on and when. 
Know how much effort to put into something to make it work well enough to get a good signal, and know what polish you can put off until it goes into production (if it goes into production)\nData-driven - EVERYONE needs to have an appreciation for the data (design, engineering, PM) and a basic understanding of how our data analysis works (statistical significance, etc.)\nCuriosity about business - Business acumen and business savvy are important because all of our tests are designed to make an impact on the business. Understanding the fundamental business strategy allows you to have a more holistic understanding of how your day to day work is impacting the company at large. (And will allow you to participate better in testing by helping you craft better ideas for what to test).\n\n\n\n\n\n
  • FOSTERING THE CULTURE:\nUniversally Embraced - from the executive team to all the individuals that are working on execution\n\nVocabulary - words like: “hypothesis” and “core metric” are commonly used to explain what we do. keeps everyone on the same page. makes product discussions, brainstorms more effective and less opinion driven with no reference to needed data to back it up (e.g. “I’d hypothesize...”, “I believe the data might show..” vs. “I think users want X”, ‘how are we going to measure X?’)\n\nDiscipline - with lots of exciting ideas, it’s easy to want to make exceptions and just roll things out (but if we had done that the video-based TV UI, we would have done the wrong thing). A/B testing is not a selectively used tool, and not viewed as optional. It’s the default approach to making decisions when a hypothesis is testable. Decisions in absence of A/B test results are the exception, because you just don’t know if that decision positively/negatively affected your business. If it CAN be tested, it SHOULD be tested.\nShare Results - Habitual sharing/context setting/broad communication, company-wide, of test results. It helps reinforce that many decisions are influenced by test results. Also, it helps everyone learn from what worked and what didn’t. Allows all of us to hone our consumer instincts.\n\nPEOPLE:\nHumble - Empirical focus keeps us humble - most of the time you don’t know exactly what your customer wants (even if you’re an expert in your field). Quick feedback from testing set us straight, forces us to optimize for the customer. You WILL be wrong at some point in your career and you need to be able to accept that (no egos).\nFocused - With the experimental nature of testing, you need to be able to know what to focus on and when. 
  • Consumer Science and Product Development at Netflix - OSCON 2012

    1. Consumer Science & Product Development
    2. Rochelle King, VP, User Experience & Product Services; Matt Marenghi, VP, User Interface Engineering
    3. Netflix Overview
    4. TV & Movie Enjoyment Made Easy
    5. ~26 Million Members
    6. ~26 Million Members
    7. Over 800 Partner Products
    8.–13. (image-only slides: Netflix on partner devices)
    14. ?
    15. Netflix & Consumer Science: “If you want to increase your success rate, double your failure rate.” – Thomas Watson, Sr., founder of IBM
    16. Goal: Customer Satisfaction
    17. Measuring Success
    18.–22. Consumer Science (build slides)
    23. Consumer Science: A/B testing
    24. What “performs best”?
    25. Choosing the Right Metrics
    26.–27. Choosing the Right Metrics. Core metric: Retention
    28. Choosing the Right Metrics. Core metric: Retention; proxy metric: Hours Watched
    29. Start with a Hypothesis...
    30. Start with a Hypothesis... If we make a huge “play” button, people will watch more.
    31. + If we give people $1 every time they press “play”, retention will improve.
    32.–33. + Showing more movies & TV shows will lead to more streaming and improved retention.
    34.–35. Determine the Variables: Showing more movies & TV shows will lead to more streaming and improved retention.
    36.–38. Determine the Variables: more titles per row vs. more rows (Depth vs. Breadth)
    39.–40. Design the Test. Control: 25 rows x 75 titles
    41. + Cell 1: 25 rows x 150 titles
    42. + Cell 2: 31 rows x 75 titles
    43.–44. + Cell 3: 31 rows x 150 titles
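The test design above splits members across a control and three cells. A minimal sketch of how such an allocation could work — hypothetical code, not Netflix's actual system — is to hash the member id together with the test name, so each member lands in exactly one cell and always sees the same experience:

```python
import hashlib

# The four experiences from the test design slides (labels illustrative)
CELLS = {
    0: "Control: 25 rows x 75 titles",
    1: "Cell 1: 25 rows x 150 titles",   # more titles per row (depth)
    2: "Cell 2: 31 rows x 75 titles",    # more rows (breadth)
    3: "Cell 3: 31 rows x 150 titles",   # both
}

def assign_cell(member_id: str, test_name: str, n_cells: int = 4) -> int:
    """Deterministically map a member to a cell: the same member always
    gets the same cell for the lifetime of this test, and different
    tests hash independently of one another."""
    digest = hashlib.sha256(f"{test_name}:{member_id}".encode()).hexdigest()
    return int(digest, 16) % n_cells

cell = assign_cell("member-12345", "rows-vs-titles-2012")
print(CELLS[cell])
```

Salting the hash with the test name is the key design choice: it keeps cell membership stable within a test while preventing the same members from always being grouped together across different tests.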
    45.–46. Level the playing field
    47. Level the playing field: data from real customers
    48. Level the playing field: data from real customers; align to core metrics
    49. Large-scale concept testing can provide general direction
    50. Original PlayStation 3 UI
    51. Original PlayStation 3 UI: How can we get our customers to watch more?
    52. Cell 1: Browsing more titles using a flexible menu system and hierarchy will lead to more viewing
    53. Cell 2: A simple, flat interface which focuses on content will lead to more viewing
    54. Cell 3: Separating navigation from content will guide our members to the content and lead to more viewing
    55. Cell 4: A video-rich browsing experience will lead to more viewing
    56. VOTE! Cell 1: Hierarchy; Cell 2: Grid; Cell 3: Separation; Cell 4: Video
    57. And the winner is... Cell 1: Control; Cell 2: Grid; Cell 3: Separation; Cell 4: Video
    58.–65. Iterate... (successive UI iteration screenshots)
    66. Data can give you confidence in your decisions
    67. Hypothesis: A cleaner UI which showcases the content will lead to more viewing.
    68. Cell 0: Control; Cell 1: Clean
    69. larger boxes
    70. no titles
    71. on hover
    72.–73. Results. Cell 0: Control; Cell 1: Clean
    74. Roll Out!
    75. Roll Out! but...
    76. + “NO GOOD...it SucKs BIG TIME...plz change back”
    77. + “I am hoping that at least one person at Netflix with authority will put down the crack pipe...and go back to the old interface”
    78. + “I don’t like it, where is the sortable list? and I can’t stand the scroll it’s just wierd and stupid...”
    79. Respond...
    80. Today
    81. Making Decisions
    82. Dealing With Results
    83. Dealing With Results: Roll it out
    84. Dealing With Results: Roll it out (with consideration to the change effect on users)
    85. + Move On
    86. + Move On (polish won’t make it turn positive)
    87. When the World is Flat
    88. When the World is Flat: Unsure of value? Retest?
    89. + If there’s a specific concern, address it and consider retesting
    90. + Value-add feature: roll out?
    91. + but... ongoing tax; likely to constrain future innovation
    92.–98. Pitfalls (progressive build):
    • A/B testing becomes a crutch for decision making
    • Not getting a clear signal
    • Too many variations
    • Local maximum problem
    • Declaring victory too soon
    • Not knowing when to end a test
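Two of the pitfalls above — declaring victory too soon and not knowing when to end a test — are commonly addressed by fixing the sample size before the test starts and only reading the result once it is reached, rather than peeking until significance appears. A rough sketch under standard textbook assumptions (two-sided α = 0.05, 80% power; the retention numbers are illustrative, not from the talk):

```python
import math

def required_sample_size(baseline, mde, z_alpha=1.96, z_beta=0.84):
    """Approximate per-cell sample size for a two-proportion test that
    can detect a lift of `mde` over `baseline`. Defaults correspond
    roughly to two-sided alpha = 0.05 and power = 0.80."""
    p1, p2 = baseline, baseline + mde
    p_bar = (p1 + p2) / 2
    n = ((z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / mde ** 2
    return math.ceil(n)

# e.g. members per cell needed to detect a 1-point lift on an
# 88% retention baseline
print(required_sample_size(0.88, 0.01))
```

The point of the calculation is the pitfall it prevents: a test ended the first time the dashboard shows significance will systematically overstate wins, while a pre-committed sample size gives the test a defined end.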
    99. Culture of Consumer Science
    100.–102. Approach + People
    103.–113. Fostering the Culture (progressive build):
    • Universally embraced
    • Common vocabulary
    • Be disciplined
    • Share results broadly
    People Matter:
    • Humble
    • Focused
    • Data-driven
    • Curious about business
    114. “We are what we repeatedly do. Excellence, then, is not an act but a habit.” – Aristotle
    115. Questions? Rochelle King - roking@netflix.com; Matt Marenghi - mmarenghi@netflix.com. PS - Interested in learning more first hand? We’re hiring designers and engineers!
    116. END
