Building Data Right Order Things

The document discusses the importance of building data products in the right order. It recommends first focusing on data infrastructure, then doing offline modeling, launching an initial online data product, and gathering user feedback. This allows issues to be identified and addressed without wasting resources on unnecessary optimizations. Two key questions are proposed: 1) how a change will impact the core user metric, and 2) how users will spend their limited time with the product. Premature optimization should be avoided, and focus should be on the critical 3% of code that matters most for users.

Building Data Products:
The Right Order of Things
Gloria Lau
VP of Data, Timeful
Keynote @ Big Data Tech Con

http://www.linkedin.com/in/gloriatlau/
@gloriatlau

Right order of things
def __init__(self):
    data infrastructure
    for x in range(3):
        offline modeling
        online data product
        user feedback
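
Read literally, the slide's pseudocode describes one-time infrastructure setup followed by repeated rounds of modeling, shipping, and listening to users. A minimal runnable sketch of that reading follows. The class name, the log strings, and the iterations parameter are hypothetical stand-ins, and the default of three rounds simply mirrors range(3) on the slide.

```python
class DataProduct:
    """Hypothetical stand-in for the slide's pseudocode; each step only logs."""

    def __init__(self, iterations=3):
        self.log = []
        # One-time setup comes first.
        self.log.append("data infrastructure")
        # Then iterate: model offline, ship online, listen to users.
        for x in range(iterations):
            self.log.append(f"iteration {x}: offline modeling")
            self.log.append(f"iteration {x}: online data product")
            self.log.append(f"iteration {x}: user feedback")


product = DataProduct()
print("\n".join(product.log))
```

The point of the ordering is that user feedback from one round feeds the next round of offline modeling, rather than being collected once at the end.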

The challenge
Exception: tracking code missing/overloaded!
Debug: Power user computation takes forever!

The challenge
Data viz --> ID'ed new data potential --> Yet another data product
Sparse data --> Crappy model --> Need to nudge users for *more* data
Non-standardized data --> Crappy model --> Need to standardize

• Four diseases have broken out in the world and it is up to a team of specialists in various fields to find cures for these diseases before mankind is wiped out ... the diseases are breaking out fast and time is running out: the team must try to stem the tide of infection in diseased areas while also working towards cures. A truly cooperative game where you all win or you all lose.
• How do you win?
• Optimally deploy minimal resources in the right order

• What is optimal?
  • Do you fix that tracking issue first?
  • Do you optimize your power user computation?
  • Do you double down on standardization?
• Relevant classifications
  • P0 vs P1
  • big company vs small company

2 Questions to ask
1 Quote answers them all

“Premature optimization is the root of all evil.”
–Donald Knuth

What is the one metric that
your data product will move?
• Retention. Growth. Engagement. Money. Etc.
• Find it, and focus

If each user spends one minute a day with your product, how would you spend that minute?
• Data scientists love data. The more the merrier.
• More data solves your data scientist's problem. It does not solve your user's problem.

Do you fix that tracking issue first?
• Q1: Is it in the critical path of measuring that metric?
• Q2: Are you throwing away users' time?

Do you optimize your power user computation?
• Q1: Are power users your key user metric to lift?
• Q2: What fraction of total users' time is affected by this?
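
Q2 comes down to arithmetic: compare the time spent by the affected users against all the time users spend in the product. The sketch below uses made-up numbers purely for illustration (none come from the talk), and the function and segment names are hypothetical. It treats all time spent by the affected segment as affected, so the result is an upper bound.

```python
def fraction_of_time_affected(segments, affected):
    """Share of total daily user time spent by the affected segment.

    segments maps a segment name to (user_count, minutes_per_user_per_day).
    """
    total = sum(users * minutes for users, minutes in segments.values())
    users, minutes = segments[affected]
    return (users * minutes) / total


# Illustrative numbers only; they are not from the talk.
segments = {
    "power": (100, 10.0),    # few users, heavy usage
    "casual": (9900, 1.0),   # many users, about a minute a day
}
share = fraction_of_time_affected(segments, "power")
print(f"{share:.1%} of total user time belongs to power users")
```

If the share is small and power users are not the metric you are trying to lift, the slow computation is a candidate to defer.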

Do you double down on standardization?
• Q1: Peel the onion. How will an x% increase in standardization rate affect your current and projected metric?
• Q2: Does it add friction to the funnel?
• Right order:
• talent first
• assimilation
• the 3%; fail fast

“Programmers waste enormous amounts of time thinking about, or
worrying about, the speed of noncritical parts of their programs, and
these attempts at efficiency actually have a strong negative impact when
debugging and maintenance are considered. We should forget about
small efficiencies, say about 97% of the time: premature optimization is
the root of all evil. Yet we should not pass up our opportunities in that
critical 3%. A good programmer will not be lulled into complacency by
such reasoning, he will be wise to look carefully at the critical code; but
only after that code has been identified. It is often a mistake to make a
priori judgments about what parts of a program are really critical, since
the universal experience of programmers who have been using
measurement tools has been that their intuitive guesses fail.”
–Donald Knuth
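
The last sentence of the quote argues for measuring before optimizing. A minimal sketch of doing that with Python's standard cProfile and pstats modules follows. The pipeline and its two steps are hypothetical stand-ins for whatever your product actually runs.

```python
import cProfile
import io
import pstats


def build_features(n):
    # Hypothetical step: cheap arithmetic over n items.
    return [i * 2 for i in range(n)]


def rank_items(features):
    # Hypothetical step: re-sorts the whole list once per pass.
    for _ in range(50):
        features = sorted(features, reverse=True)
    return features


def pipeline(n=20000):
    return rank_items(build_features(n))


profiler = cProfile.Profile()
profiler.enable()
result = pipeline()
profiler.disable()

# Report the functions with the largest cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The report, not intuition, then decides which step falls in the critical 3%.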

3. What do they have in common?

5. Model Product

6. Model Product

22. It's an art.
