Making Sense of Data

Published in: Technology
  • Feedback loops.
  • Over the next set of slides, I’m going to discuss some lessons about how data moves through a start-up’s organization...
  • So this is how we frame our technology stack at my start-up, Metamarkets. It’s a four-tiered stack. I believe that many start-ups have similar stacks when they think about how data moves through them. But there’s something important missing here: your technology stack doesn’t exist in a vacuum.
  • To be successful, we’ve got to incorporate feedback, both from customers and from the larger world. Feedback is critical. Steve Blank and Eric Ries have talked about not iterating in a vacuum. The feedback you can achieve by managing your data can be incredibly important.
  • Which begins at ingestion, and ends at the top with products.
  • ETL often gets a bad rap. Nothing could be more important to your company than moving data between systems. That is what ETL does. It should be a first-class piece of your architecture, and you should put one of your top engineers at this layer of the stack. (At Metamarkets, we have a former VP of BlackRock working on ETL, and he’s been outstanding.) When our ETL breaks down, the data stops flowing, and our business stops moving.
  • Don’t invest in real-time data if you’re making weekly decisions; moving away from batch systems is hard work. On the other hand, some systems, such as those required for monitoring, may need sub-millisecond response times. But as a general rule, reducing latency in systems creates value in unexpected ways.
  • Don’t get bogged down in discussions of the perfect data format for your company. There is no such thing. “All models are wrong, some models are useful.”
  • You will likely end up using a variety of data stores in your organization. So don’t agonize over your data store choices.
  • As you scale and grow, you will have to change storage layers. We went through three different versions, first Postgres, then Greenplum, then HBase, before developing our own.
  • Embrace standards. Use simple, flat formats wherever possible (XML is the clamshell packaging of data). We recently onboarded a client who gave us JSON data. It’s a beautiful thing. Everyone knows SQL: Cloudera found that Hadoop cluster use went up 10x when Hive was installed.
  • But Hive isn’t going to cut it for getting quick insights into your data. No one wants to wait 15 minutes for answers. Put in ETL flows that summarize data, and keep a core set of key business metrics in a “hot” database, one that can be queried in real time.
  • Requirements for systems should be driven by their business needs.
  • but remember...
  • Foursquare’s Explore, LinkedIn’s PYMK, Kaggle winners: all written by individuals who were engineers first, statisticians second. When hiring folks to do your analytics, you want those who can roll up their sleeves and actually code the models themselves.
  • Don’t make your analytics team compete for resources with, or jeopardize, production systems; they will only get burned and then cut out. Set up systems where analytics folks can play with data safely. Analytics often falls into the class of problems that are important but not urgent. Don’t let this happen to your organization.
  • Data represents the totality of a start-up’s sensory experiences. Absent a well-developed digital nervous system to respond to these inputs, you are blind to your deficiencies, deaf to your customers, and dumb to your opportunities.
  • Either externally, as Klout, Flightcaster, and BillGuard have done, or internally: 4SQ’s Explore and LinkedIn’s PYMK have both improved the user experience. Having strong analytical talent in your organization is critical to success here.
  • Making Sense of Data
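
The notes above recommend ETL flows that summarize raw data and keep a core set of business metrics in a “hot” database that answers in seconds. A minimal sketch of that idea follows. It is an illustration only, not Metamarkets’ pipeline: the event fields (`ts`, `customer`, `revenue`), the function names, the `daily_metrics` table, and the use of in-memory SQLite as the hot store are all assumptions made for the example.

```python
import sqlite3
from collections import defaultdict
from datetime import datetime, timezone


def rollup_daily(events):
    """Summarize raw events into one row per (day, customer)."""
    totals = defaultdict(lambda: [0, 0.0])  # key -> [event count, revenue]
    for event in events:
        # Bucket each event by its UTC calendar day.
        day = datetime.fromtimestamp(event["ts"], tz=timezone.utc).date().isoformat()
        bucket = totals[(day, event["customer"])]
        bucket[0] += 1
        bucket[1] += event["revenue"]
    return [(day, cust, n, rev) for (day, cust), (n, rev) in sorted(totals.items())]


def load_hot_store(conn, rows):
    """Write the summary rows into a small table that is cheap to query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_metrics ("
        "day TEXT, customer TEXT, events INTEGER, revenue REAL, "
        "PRIMARY KEY (day, customer))"
    )
    # INSERT OR REPLACE keeps the load idempotent if a day is re-processed.
    conn.executemany("INSERT OR REPLACE INTO daily_metrics VALUES (?, ?, ?, ?)", rows)
    conn.commit()


# Hypothetical raw events; in practice these would come from the batch layer.
raw_events = [
    {"ts": 0, "customer": "acme", "revenue": 1.5},
    {"ts": 60, "customer": "acme", "revenue": 2.5},
    {"ts": 86400, "customer": "acme", "revenue": 4.0},
    {"ts": 120, "customer": "globex", "revenue": 3.0},
]

hot = sqlite3.connect(":memory:")  # stand-in for the "hot" database
load_hot_store(hot, rollup_daily(raw_events))

# Dashboards query the small summary table, never the raw events.
daily_revenue = hot.execute(
    "SELECT day, SUM(revenue) FROM daily_metrics GROUP BY day ORDER BY day"
).fetchall()
print(daily_revenue)  # [('1970-01-01', 7.0), ('1970-01-02', 4.0)]
```

The slow, batch-oriented work happens once in `rollup_daily`; the queries people actually wait on touch only the small summary table.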

    1. Making sense of data: Lessons for start-ups. Mike Driscoll, Co-Founder + CTO, Metamarkets (@medriscoll)
    2. Data is sensory input: If it is unmanaged, you will be blind to weaknesses, deaf to new opportunities, and dumb to your customers.
    3. Your technology stack is your nervous system: Data is the sensory input that moves through it.
    4. (no text)
    5. Create feedback loops: Collecting customer data is a way to “get out of the building.”
    6. Customers
    7. (no text)
    8. Make ETL a priority: Complexity lies at the boundaries between systems.
    9. Sync data latencies with decision loops: Real-Time, Daily, Weekly
    10. Don’t agonize over data schemas: All data models are wrong. Some data models are useful.
    11. Hadoop isn’t enough: Hadoop is a processing layer. You also need a query layer.
    12. (no text)
    13. There is no ‘One True Database’: Embrace a polyglot architecture of formats and data stores.
    14. Separate query & storage layers: A RESTful query layer will reduce the pain of migration.
    15. Make data easy: Reduce the barriers to accessing data across systems.
    16. Make data fast: “Human-time” means that queries return in seconds.
    17. Fully instrument your customers: Human activity is small in size.
    18. Fully instrument your customers: Human activity is small in size.
    19. Selectively instrument your machines: Machine-generated data can quickly overwhelm.
    20. Selectively instrument your machines: Machine-generated data can quickly overwhelm.
    21. Architect around business questions: Work backwards from business questions. Don’t let data architecture drive business needs.
    22. (no text)
    23. Hire a data scientist: Someone who can munge, model, & visualize data.
    24. Working code beats theoretical models: Engineers with a thin grasp of statistics beat statisticians with a thin grasp of engineering.
    25. Create an analytics sandbox: Isolated from production systems. Analytics are a different constituency with different needs.
    26. (no text)
    27. Obsess about dashboard design: Both internal & external.
    28. (no text)
    29. Extract value from your data assets: Either by directly monetizing them or by enhancing the customer experience.
    30. Your technology stack is your nervous system. Your data is your sensory input.
    31. Making sense of data: Lessons for start-ups. Mike Driscoll, Co-Founder + CTO, Metamarkets (@medriscoll). Questions?
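
Slide 14 argues for separating the query layer from the storage layer so that a storage migration does not ripple out to every consumer. A rough sketch of that separation is below, under stated assumptions: the `MetricStore` interface, the two backends, the `/metrics/<name>/<day>` path shape, and the sample values are hypothetical and chosen only to show the pattern. A real deployment would put an HTTP server in front of `QueryLayer.get`.

```python
import sqlite3
from typing import Protocol


class MetricStore(Protocol):
    """What the query layer needs from any storage backend (assumed interface)."""

    def fetch(self, metric: str, day: str) -> float: ...


class DictStore:
    """Stand-in for the 'old' storage layer."""

    def __init__(self, data):
        self._data = data

    def fetch(self, metric, day):
        return self._data[(metric, day)]


class SqliteStore:
    """Stand-in for the 'new' storage layer after a migration."""

    def __init__(self, conn):
        self._conn = conn

    def fetch(self, metric, day):
        row = self._conn.execute(
            "SELECT value FROM metrics WHERE metric = ? AND day = ?", (metric, day)
        ).fetchone()
        if row is None:
            raise KeyError((metric, day))
        return row[0]


class QueryLayer:
    """Resource-style front door; callers never see which store is behind it."""

    def __init__(self, store: MetricStore):
        self._store = store

    def get(self, path: str) -> dict:
        # Expected shape: /metrics/<metric>/<day>
        parts = path.split("/")
        if len(parts) != 4 or parts[0] != "" or parts[1] != "metrics":
            raise ValueError(f"unsupported path: {path}")
        _, _, metric, day = parts
        return {"metric": metric, "day": day, "value": self._store.fetch(metric, day)}


# The same request served before and after a storage migration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (metric TEXT, day TEXT, value REAL)")
conn.execute("INSERT INTO metrics VALUES ('revenue', '2011-06-01', 42.0)")

before = QueryLayer(DictStore({("revenue", "2011-06-01"): 42.0}))
after = QueryLayer(SqliteStore(conn))

old_response = before.get("/metrics/revenue/2011-06-01")
new_response = after.get("/metrics/revenue/2011-06-01")
print(old_response == new_response)  # True
```

Because consumers depend only on the path and the response shape, swapping `DictStore` for `SqliteStore` changes nothing on their side.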
