Building Competitive Moats With Data 
Pete Skomoroch 
@peteskomoroch 
DataLead 
Oct 1, 2014 - Berkeley
About Me 
• Ex Principal Data Scientist @ LinkedIn 
• Entrepreneur, Advisor at Data Collective
Competitive Moats
Data as Competitive Moat
Why the current obsession with Big Data?
The rise of Hadoop
What is Big Data?
Big Data: Myths
Big Data: Reality 
• Science, theory, and reason are not being replaced
• Big Data is different: for some problems, big data produces better results than we find with smaller samples
• Data storage and logging are increasingly cheap, so err on the side of collecting data to process later if you think it may be valuable
• Large, differentiated data assets are the foundation for defensible products and better decisions
If software is eating the world…
… it is replacing it with data
Startups are moving offline life to online data 
• Restaurants => Yelp 
• Resume + Rolodex => LinkedIn 
• PowerPoint => SlideShare
• Yearbook + Photos => Facebook
• Real Estate => Redfin
• Interior Design => Houzz
The Data Factory Revolution 
Source: http://www.linkedin.com/channels/disrupt (2013 Steve Jennings/Getty Images Entertainment)
Early Data Factory: del.icio.us
User Generated Data Moats
User entered data has Gravity
Behavioral history is a moat: life is easier when apps remember you
Reputation based Data Moats
Network Based Data Moats
Don’t build on top of someone else’s moat
Real scientists make their own data
Build distinct, defensible datasets
This sounds great, how do I build a data moat? 
http://xkcd.com/802/
A new occupation: data scientist
What do data scientists actually do? 
Source: data from http://www.linkedin.com/skills
Two species of data scientist* 
Type I: Traditional BI
• Question-driven
• Interactive
• Ad-hoc, post-hoc
• Fixed data
• Focus on speed and flexibility
• Output is embedded into a report, dashboard, or in-database scoring engine
Type II: Data Products
• Metric-driven
• Automated
• Systematic
• Fluid data
• Focus on transparency and reliability
• Output is a production system that makes customer-facing decisions
*Slide adapted from Josh Wills “From the Lab to the Factory”
Data Products: automated systems that make customer-facing decisions and collect data
Data Product pre-history: Data Aggregators 
• 1972: Vinod Gupta forms American Business Information, Inc., a database initially built via manual data entry of Yellow Pages information
• 1973: LEXIS full-text legal search launches publicly
• 1986: Bloomberg reaches 5,000 terminal subscribers
• 1994: Jerry Yang & David Filo compile and maintain a hand-curated set of categorized links on the World Wide Web known as the Yahoo! Directory
The Rise of Algorithmic Data Products 
• Google: Web Search, PageRank, AdWords 
• Netflix: Movie Recommendations 
• Pandora: Music Recommendations 
• eBay: Product Search, Fraud Detection, Advertising 
• Amazon: Similar Items, Book Recommendations 
• LinkedIn: People You May Know, Who Viewed My Profile
LinkedIn Skills: a moat built by data products
Data Product investment and ROI 
• Skill Extraction and Standardization Pipeline 
• Skill Pages 
• Skills Section on member profiles 
• Suggested Skills Algorithm and email > 20M members 
• Skill Endorsements > 60M members, 3B+ Edges 
• Big product wins: engagement, recall, relevance 
• SkillRank & Reputation Algorithm R&D 
• LinkedIn is now the definitive source for information on skills & expertise
*Statistics as of 2013
How leaders can drive data growth 
• Accountability: Who defines the data vision & roadmap in your organization? Who is accountable for building and expanding your moat?
• Invest in data infrastructure, training, logging, & tools for rapid iteration. Build a data lake.
• Invest in exploration and innovation, including user-facing data product and algorithm development
• Define a framework for trading off data quality and quantity metrics
• Ask “How does this increase our data moat?” when evaluating any new project, and incentivize it
Twitter: @peteskomoroch 
LinkedIn: linkedin.com/in/peterskomoroch

Editor's Notes

  • #22 Scientists make measurements (http://seanjtaylor.com/post/41463778912/real-scientists-make-their-own-data): creating new information, observations, alpha. Some data scientists go to great lengths to avoid collecting data or touching the user interface, when a small change can eliminate tons of wasted time. Requires authority or support from leadership to make product changes. Works best if data scientists are involved in design decisions from the start. CERN supercollider: collect something nobody else has collected.
  • #23 Vision/Roadmap: what data doesn’t exist that would make your product better, aligned with company mission? Google Street View photos => self-driving car.
  • #25 Facebook / LinkedIn story – emergence of new role
  • #27 "Built to Last": be a clock builder (an architect), not a time teller. Another analogy: are you a sports reporter, repeating the details of the game in a dashboard, or are you crunching that data to select the best new talent?
  • #29 http://www.referenceforbusiness.com/history2/62/American-Business-Information-Inc.html http://www.theverge.com/2014/9/27/6854139/yahoo-directory-once-the-center-of-a-web-empire-will-shut-down
  • #30 Consumer Internet: productization of data + algorithms (eBay, Google, Amazon, Netflix, Pandora). Google index size is a barrier now.