The document discusses the importance of building data products in the right order. It recommends first focusing on data infrastructure, then doing offline modeling, launching an initial online data product, and gathering user feedback. This allows issues to be identified and addressed without wasting resources on unnecessary optimizations. Two key questions are proposed: 1) how a change will impact the core user metric, and 2) how users will spend their limited time with the product. Premature optimization should be avoided, and focus should be on the critical 3% of code that matters most for users.
7. The challenge
Exception: tracking code missing/overloaded!
Debug: Power user computation takes forever!
def __init__(self):
    data infrastructure
for x in range(3):
    offline modeling
    online data product
    user feedback
8. The challenge
Data viz --> ID'ed new data potential --> Yet another data product
Sparse data --> Crappy model --> Need to nudge users for *more* data
Non-standardized data --> Crappy model --> Need to standardize
def __init__(self):
    data infrastructure
for x in range(3):
    offline modeling
    online data product
    user feedback
10. • Four diseases have broken out in the world and it
is up to a team of specialists in various fields to
find cures for these diseases before mankind is
wiped out ... the diseases are breaking out fast
and time is running out: the team must try to stem
the tide of infection in diseased areas while also
working towards cures. A truly cooperative game where
you all win or you all lose.
• How do you win?
• Optimally deploy minimal resources in the right
order
11. • What is optimal?
• Do you fix that tracking issue first?
• Do you optimize your power user computation?
• Do you double down on standardization?
• Relevant classifications
• P0 vs P1
• big company vs small company
14. What is the one metric that
your data product will move?
• Retention. Growth. Engagement. Money. Etc.
• Find it, and focus
15. If your users spend one minute per day on your
product, how would you spend that minute?
• Data scientists love data. The more
the merrier.
• More data solves your data
scientist's problem. It does not
solve your user's problem.
16. Do you fix that tracking issue first?
• Q1: Is it in the critical path of measuring that
metric?
• Q2: Are you throwing away users' time?
17. Do you optimize your power user
computation?
• Q1: Are power users your key user metric to lift?
• Q2: What fraction of total user time is affected
by this?
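Q2 comes down to a small piece of arithmetic. Below is a minimal sketch, assuming usage can be split into segments with known user counts and minutes per day. The `Segment` class, the segment names, and all numbers are hypothetical and only there to illustrate the calculation.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A hypothetical user segment; all numbers used below are made up."""
    name: str
    users: int
    minutes_per_user_per_day: float
    affected: bool  # does the slow computation touch this segment?


def affected_time_fraction(segments):
    """Return the share of total daily user-minutes spent by affected segments."""
    total = sum(s.users * s.minutes_per_user_per_day for s in segments)
    if total == 0:
        return 0.0
    hit = sum(
        s.users * s.minutes_per_user_per_day for s in segments if s.affected
    )
    return hit / total


# Illustrative data: a small group of heavy users, a large group of light users.
segments = [
    Segment("power users", users=100, minutes_per_user_per_day=10.0, affected=True),
    Segment("casual users", users=9_900, minutes_per_user_per_day=1.0, affected=False),
]

fraction = affected_time_fraction(segments)
print(f"{fraction:.1%} of daily user time is affected")
```

With these made-up numbers, power users account for roughly 9% of total user time. Whether that justifies the optimization still depends on Q1, that is, on whether power users drive the metric you are trying to lift.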
18. Do you double down on
standardization?
• Q1: Peel the onion. How will an x%
increase in the standardization
rate affect your current and
projected metric?
• Q2: Does it add friction to the
funnel?
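The two standardization questions pull in opposite directions, and a back-of-the-envelope model can make the trade-off visible. The sketch below assumes a simple two-stage funnel: users complete a flow at some rate, and each completed record contributes more to the metric if it is standardized. The function `projected_metric`, its parameters, and every number are assumptions for illustration, not figures from the talk.

```python
def projected_metric(entrants, completion_rate, standardized_share,
                     value_standardized, value_unstandardized):
    """Expected metric total under an assumed two-stage funnel model.

    entrants:             users entering the funnel
    completion_rate:      fraction who finish (drops if friction is added)
    standardized_share:   fraction of finished records that are standardized
    value_*:              metric contribution per finished record
    """
    completed = entrants * completion_rate
    value_per_record = (
        standardized_share * value_standardized
        + (1 - standardized_share) * value_unstandardized
    )
    return completed * value_per_record


# Baseline: 40% of records standardized, 50% of users finish the flow.
baseline = projected_metric(1000, 0.50, 0.40, 1.0, 0.5)

# Pushing standardization to 60% adds friction: completion drops to 45%.
with_push = projected_metric(1000, 0.45, 0.60, 1.0, 0.5)

lift = (with_push - baseline) / baseline
print(f"baseline={baseline:.1f} with_push={with_push:.1f} lift={lift:.1%}")
```

In this made-up scenario a 20-point gain in standardization rate moves the metric from about 350 to about 360, because the added friction eats most of the benefit. Changing the assumed completion drop is a quick way to see how sensitive the answer is to Q2.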
20. • Right order:
• talent first
• assimilation
• the 3%; fail fast
21. “Programmers waste enormous amounts of time thinking about, or
worrying about, the speed of noncritical parts of their programs, and
these attempts at efficiency actually have a strong negative impact when
debugging and maintenance are considered. We should forget about
small efficiencies, say about 97% of the time: premature optimization is
the root of all evil. Yet we should not pass up our opportunities in that
critical 3%. A good programmer will not be lulled into complacency by
such reasoning, he will be wise to look carefully at the critical code; but
only after that code has been identified. It is often a mistake to make a
priori judgments about what parts of a program are really critical, since
the universal experience of programmers who have been using
measurement tools has been that their intuitive guesses fail.”
–Donald Knuth
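Knuth's point about measurement tools can be tried directly with Python's standard-library profiler. The sketch below is only an illustration: `slow_power_user_score`, `cheap_formatting`, and `pipeline` are hypothetical stand-ins, not code from the talk. It profiles a small pipeline with `cProfile` and prints the functions with the highest cumulative time, so the critical code is identified by measurement instead of by guesswork.

```python
import cProfile
import io
import pstats


def slow_power_user_score(n):
    """Deliberately quadratic stand-in for an expensive computation."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += i ^ j
    return total


def cheap_formatting(value):
    """A cheap step that intuition might wrongly suspect."""
    return f"score={value}"


def pipeline(n=300):
    """Hypothetical end-to-end step: compute, then format."""
    return cheap_formatting(slow_power_user_score(n))


def profile_pipeline(n=300, top=5):
    """Run the pipeline under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = pipeline(n)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    result, report = profile_pipeline()
    print(result)
    print(report)
```

The report lists time per function, which shows where the critical 3% actually lives before any optimization effort is spent.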