Thinking with Data
Max Shron
@mshron
Big picture
• Data is too much fun, too easy to rabbit-hole
• Specialized knowledge is hard to communicate
• Not all of statistics is well-adapted to the real world
• We need techniques to handle that
Big picture
• Design — UX, consulting, etc.
• Humanities — philosophy, law, etc.
• Social science — sociology, psychology, etc.
Scoping
• First set of techniques: scoping.
• The world gives us vague requests.
• We should make things clear before we start, or
we end up with uninteresting questions.
• Write things down or say them out loud.
Scoping
• Imagine we are working with a company with a
subscription business. The CEO asks us for a
churn model.
• Bad scope: “We will use R to create a logistic
regression to predict who will quit using the
product.”
• Not actionable, and full of irrelevant detail.
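Part of why this scope is "bad" is that the modeling step it names is the easy part: the logistic regression it describes fits in a few lines. A rough sketch on synthetic data (all feature names and values here are made up for illustration):

```python
import math
import random

random.seed(0)

# Synthetic stand-in for behavioral data (hypothetical features):
# feature 0 = pages viewed in the last two weeks, feature 1 = account age.
X = [[random.gauss(20, 8), random.gauss(12, 6)] for _ in range(200)]
y = [1 if x[0] < 15 and random.random() < 0.8 else 0 for x in X]  # 1 = churned

def sigmoid(z):
    if z < -60:  # guard against math.exp overflow on extreme inputs
        return 0.0
    return 1.0 / (1.0 + math.exp(-z))

# Logistic regression fit by stochastic gradient descent on the log-loss.
w, b, lr = [0.0, 0.0], 0.0, 0.01
for _ in range(200):
    for xi, yi in zip(X, y):
        err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
        w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
        b -= lr * err

# Predicted churn probability for a hypothetical light user.
churn_prob = sigmoid(sum(wj * xj for wj, xj in zip(w, [8.0, 6.0])) + b)
```

The hard questions — which behaviors to log, what to do with a prediction, how to tell if the intervention worked — are exactly what this snippet does not answer.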
Scoping
• CoNVO:
• Context
• Need
• Vision
• Outcome
• Iterative process — start simple, refine, refine, refine.
Scoping
• Context
• Who are we working with? What are the big
picture, long term goals?
• “The company has a subscription model. The CEO’s
goal is to improve profitability.”
Scoping
• Need
• What is the particular knowledge we are
missing?
• “We want to understand who drops off early
enough so that we can intervene.”
Scoping
• Vision
• What would it look like to solve the problem?
• “We will build a predictive model using
behavioral data to predict who will drop off —
early enough to be useful.”
• Sources of data: important. Kinds of offers:
important. Kind of experimentation: important.
Kind of model: unimportant.
Scoping
• Outcome
• Who will be responsible for next steps? How will we
know if we are correct?
• “The tech team will implement the model in a batch
process that runs daily, automatically sending out email
offers. We will calculate success metrics (precision
and recall) on held-out users, and send a weekly
email of stats to stay on top of performance.”
• We need a control group!
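The success metrics named in the outcome can be computed directly from the held-out labels and the model's flags. A minimal sketch (the example labels below are illustrative, not real data):

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = churned)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Held-out labels vs. the model's flags (illustrative values only).
p, r = precision_recall([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
# both come out to 2/3 here
```

Note that precision and recall alone cannot tell you whether the email offers changed anyone's behavior — that is what the control group is for.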
Scoping
• How do we develop a CoNVO?
• interviews
• kitchen sink interrogation
• roleplaying
• story-telling
• mockups
• Clearer vision with mockups
John Smith is 36 years old;
he has seen 40 different
pages over the past two
weeks, and he has a 20%
chance to convert.
Scoping
• Context We are hired to work in a hospital system with
250K patients over 20 years. Report to CEO, who is
interested in building a tool for reducing medical issues.
• Need After talking to some doctors: a belief that
antibiotics are overused, but overuse is hard to detect.
• Vision A pilot investigation. If we find signal, repeatable
flagging tool.
• Outcome CMO will decide if pilot is valuable based on
report. Automated tool would be run by CMO on demand.
Arguments
• Data is not a ray gun!
• People need to be convinced, including you.
• The world does not run on deductive logic; we need
a theory that accounts for people having minds.
• Trusting a tool, making a point with a graph,
coming to terms on a definition, convincing
someone to act differently, etc etc.
Arguments
• The general model is semi-deductive. We move from
what is known and agreed upon towards what is
not yet known.
• Patterns of reasoning help us make stronger cases
in less time and effort. Take advantage of two
thousand years of research.
Arguments
• Example: Predicting Δ poverty from satellite data
• It takes 5–10 years to get small-scale poverty
estimates in poor countries.
• The vision: predict whether the poverty estimates
will go up or down ahead of time, using cheap
satellite data.
• The outcome: use to informally guide policy
decisions, keeping track of interventions.
Arguments
• Claim - Your audience does not believe it yet but
you think you can make a case for it.
• “Poverty can be modeled effectively with satellite
data.”
• Prior knowledge - Things your audience already
believes before the case is started.
Arguments
• Evidence - Where data enters an argument. We
transform data into evidence. Counts, models, graphs,
etc. make up the evidence.
• Justification - The reasoning why the evidence should
cause us to believe the claim.
• “These graphs indicate that the residuals for our
model are as we had anticipated.”
• Rebuttal - Any of the reasons why the justification might
not hold in this particular case. Usually smart to know.
Arguments
• Patterns!
• Causal analysis
• Convincing takes more than math
• Categories of dispute
Arguments
• Disputes of fact — getting the details straight
• “The F1 for this model is 0.7”
• Two stock issues:
• What is a reasonable truth condition?
• Is it satisfied?
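One reason the truth condition needs spelling out: F1 is the harmonic mean of precision and recall, so the same score can summarize very different trade-offs. A quick sketch (the two precision/recall pairs are invented for illustration):

```python
def f1(precision, recall):
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Two very different models, one "fact": both have an F1 of 0.7.
balanced = f1(0.7, 0.7)
skewed = f1(0.6, 0.84)
```

"The F1 is 0.7" is only settled once we agree on which data, which threshold, and which trade-off the number is supposed to describe.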
Arguments
• Disputes of definition — relating words to math
• “Poverty is defined as FGT, α = 2”
• Three stock issues:
• Does this definition make a useful distinction?
• How consistent is this definition with prior ideas?
• What, if any, are the reasonable alternatives?
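The FGT (Foster–Greer–Thorbecke) family cited above makes the stakes of a definition concrete: α = 0 simply counts the poor, while α = 2 weights the poorest most heavily, so the two definitions can rank the same populations differently. A sketch with hypothetical incomes and a hypothetical poverty line:

```python
def fgt(incomes, z, alpha=2):
    """Foster-Greer-Thorbecke poverty index P_alpha with poverty line z.
    alpha=0 gives the headcount ratio, alpha=1 the poverty gap, and
    alpha=2 (the definition cited above) the squared poverty gap."""
    n = len(incomes)
    return sum(((z - y) / z) ** alpha for y in incomes if y < z) / n

incomes = [0.5, 0.8, 1.2, 2.0, 3.1]  # hypothetical daily incomes
line = 1.0                           # hypothetical poverty line
headcount = fgt(incomes, line, alpha=0)   # 2 of 5 below the line: 0.4
squared_gap = fgt(incomes, line, alpha=2)  # (0.25 + 0.04) / 5 ≈ 0.058
```

Transfers among the poor leave α = 0 unchanged but move α = 2, which is exactly the "useful distinction" stock issue in action.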
Arguments
• Disputes of value — making the right trade-offs
• “Our model is simple enough.”
• Two stock issues:
• How do our goals determine which values are
most important?
• Have the values been properly applied here?
Arguments
• Disputes of policy — the right course of action
• “We should use this model to informally guide our decisions
between official estimates.”
• Four stock issues:
• Is there a problem? (ill)
• Where is credit or blame due? (blame)
• Will the proposal solve it? (cure)
• Will it be better on balance? (cost)
Summary
• Take half the math and tools and twice the listening
to what people actually need.
• This is the tip of the iceberg. In general, we have a
lot to learn from other fields.
• Let’s talk! @mshron

Max Shron, Thinking with Data at the NYC Data Science Meetup
