Roger Ehrenberg, Founder & Managing Partner, IA Ventures
Creating Competitive Advantage Through Data (IA Ventures)
Published in: Technology
  • HOW DID WE GET HERE? WHAT IS BIG DATA? WHAT REALLY CREATES TRUE COMPETITIVE BARRIERS IN DATA-DRIVEN BUSINESSES?
  • As I wrote in a post recently, DATA IS THE NEW DOT COM. Funds are announcing a new focus on “Big Data.” IT’S HOT. WHY NOW?
  • Big Data is pervasive, permeating every industry: Advertising, Government, Financial Services, Commerce, Pharma, Biotech & Healthcare. The good news: data is becoming MORE ACTIONABLE. The bad news: it is INCREASINGLY DIFFICULT TO EXTRACT VALUE given the VOLUME, VELOCITY AND MULTIPLE DATA TYPES.
  • MASSIVE ADVANCES IN INFRASTRUCTURE HAVE SEEDED THE BIG DATA REVOLUTION OVER THE PAST 50 YEARS.
  • THESE TRENDS HAVE A DIRECT IMPACT UPON BUSINESS AND THE BOTTOM LINE, e.g., RECOMMENDATION ENGINES THAT LEVERAGE HISTORICAL DATA and PREDICTIVE ANALYTICS to generate ACTIONABLE REAL-TIME INSIGHT for customers.
  • CAN WE AGREE ON A SET OF DEFINITIONS GIVEN THE AMBIGUITY OF THE TERM?
  • Sizes that were unimaginable a few years ago are now commonplace. Just storing and accessing the data can be difficult.
      SIZE :: MANAGED WITH :: STORED
      Small :: Excel, R :: fits in memory on one machine
      Medium :: indexed files, monolithic DB :: fits on disk on one machine
      Large :: Hadoop, distributed DB :: stored across many machines
      Example in the IA Ventures portfolio: METAMARKETS (LARGE + REAL-TIME).
      PROBLEM: WHEN YOU MOVE TO DISTRIBUTED DATABASES, even the simplest mathematical tasks, which are trivial for small and medium-size systems, become challenging.
  • Data that is DIFFICULT FOR COMPUTERS TO UNDERSTAND. The principal example is NATURAL LANGUAGE. TEXT, IMAGES, VIDEO. VALUABLE INFORMATION IS TRAPPED INSIDE THIS DATA, e.g., Twitter, earnings releases. Example in the IA Ventures portfolio: RECORDED FUTURE (LARGE + UNSTRUCTURED).
  • More data is coming in faster. Decision windows are getting shorter. Data goes from valuable to worthless in a matter of minutes (seconds ... no, milliseconds) :: RAPID VALUE DECAY. EVERYTHING IS BEGINNING TO LOOK LIKE TRADING, e.g., trading, ad serving. STREAMS ARE WHERE REAL-TIME INSIGHT COMES FROM :: Stream processing: insight is extracted as soon as the data shows up. Example in the IA Ventures portfolio: DATASIFT (LARGE + UNSTRUCTURED + REAL-TIME).
  • BIG DATA = COMPLEX DATA. Extracting value from Big Data is FREAKING HARD. Big Data companies are mash-ups of these different attributes :: we like that at IA Ventures. WE BELIEVE THIS CREATES BARRIERS. STORAGE AND ANALYTICS generally go hand in hand :: LOTS OF DEPENDENCIES.
  • THE IA VENTURES DEFINITION
  • At IA Ventures we call this the DATA TAXONOMY: INPUTS on the y-axis, OUTPUTS on the x-axis.
  • SINGLE SOURCE DATA PLATFORMS: TWITTER. Data is generated on its platform and consumed as a discrete data stream. People come to Twitter for the stream. Higher-order enrichment is delivered by others.
  • THIRD PARTY DATA PLATFORMS: DATASIFT. Ingests a variety of streams from a range of platforms (Twitter, WordPress, LinkedIn, etc.). ENRICHES THOSE STREAMS with analytics and other forms of data like SENTIMENT AND REPUTATION. Customers can either consume a pure data product (the Twitter firehose) or OVERLAY ADDED VALUE TO EXTRACT INSIGHT.
  • MORE SOPHISTICATED PRODUCTIZATION AROUND THE DATA ASSET: PLACEIQ. MULTI-SOURCE: GEO DATA, WEATHER DATA, TRAFFIC DATA, ETC. COMPLEX ALGORITHMS, e.g., looking at the relationship among brand, weather forecast and time of day to optimize ad placement and offers. Creates and maintains competitive advantage through FRESHNESS: TIMELY and ACTIONABLE information.
  • SINGLE SOURCE PLATFORMS WITH RICH PRODUCT OFFERINGS. They represent a phase change: Big Data companies that don’t sell data BUT USE DATA TO OPTIMIZE PRODUCT. AMAZON: a rich trove of user data that is leveraged to optimize both user experience and economic outcomes. REAL-TIME PERSONALIZATION, HYPER-CONTEXTUAL.
  • MULTI-SOURCE, HIGHLY REFINED PRODUCT: FUSING INTERNAL AND EXTERNAL DATA FOR MAXIMUM COMPETITIVE ADVANTAGE.
      WAL-MART: intersection of historical user behavior, inventory levels and weather data to optimize a promotion, shipping patterns, buying policy, etc.
      RENAISSANCE TECHNOLOGIES: buys massive amounts of external data; creates its own metadata; indexes and archives petabytes of data for historical analysis, model creation and calibration. The firm’s success (massive absolute and relative returns) is the ultimate example of A HIGHER-ORDER DATA-DRIVEN PRODUCT.
  • THE TREND: SIMPLE DATA IS BECOMING COMMODITIZED, and ACTIONABLE INSIGHTS ARE WHAT CUSTOMERS REALLY WANT AND ARE WILLING TO PAY FOR.
  • EXECUTION is TABLE STAKES TO PLAY THE GAME
  • SO IF IT’S NOT ABOUT TECHNOLOGY AND ALGORITHMS, WHAT IS IT ABOUT?
  • The rise of the CONTRIBUTORY DATABASE: DATA EXHIBITING TRADITIONAL NETWORK EFFECTS. These companies TRANSCEND SMART ALGORITHMS. In the SHORT RUN, SMARTER ALGOS provide a needed edge to gain early adoption (OUT-EXECUTE everyone else). In the LONG RUN, at scale, USER-CONTRIBUTED DATA IS WHAT CREATES THE COMPETITIVE MOAT. Example: BILLGUARD.
  • The rise of DATA ECONOMIES OF SCALE. Day 1: not much data, not much value. As the data asset builds, insights are gleaned and fed back into the product; users interact with the product and create more valuable usage data. Examples: BANKSIMPLE, PLACEIQ.
  • CORE COMPETENCIES FOR A BIG DATA COMPANY
  • Machine learning: great skills, mathematically grounded, but an inability to bring deep industry knowledge to problem-solving. Research: strong industry knowledge and mathematical grounding, but an inability to operate at scale. Danger zone: strong dev skills plus industry knowledge, but without analytical rigor. DATA SCIENTISTS ARE TRUE UNICORNS.
  • NOT ONLY ABOUT DATA SCIENTISTS AND TECHNOLOGISTS, but DATA CENTRIC LEADERSHIP
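The stream-processing note above (insight is extracted as soon as the data shows up) can be illustrated with a small Python sketch. This is not from the deck: the names `RunningStats` and `process_stream` are hypothetical, and Welford's incremental update is just one convenient way to keep a statistic current without storing or re-reading the stream.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Count, mean and variance maintained one event at a time (Welford's method)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        # Fold a single new observation into the summary; no history is kept.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; defined as 0.0 until at least two events have arrived.
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


def process_stream(events):
    """Yield an up-to-date (count, mean) pair after every incoming event."""
    stats = RunningStats()
    for value in events:
        stats.update(value)
        yield stats.count, stats.mean


if __name__ == "__main__":
    # Hypothetical event values, e.g. bid prices arriving one by one.
    for count, mean in process_stream([2.0, 4.0, 6.0]):
        print(count, mean)
```

Each event updates the summary in constant time and memory, so an answer is available the moment the data arrives rather than after a batch job over stored data.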
    1. Roger Ehrenberg, Founder & Managing Partner, IA Ventures
    2. http://www.flickr.com/photos/wallyg/3777954520/ http://www.flickr.com/photos/chanc/310847464/ http://www.flickr.com/photos/northeastindiana/2313044640/ http://www.flickr.com/photos/ynse/542370154/
    3. [Figure: four cost and scale trend charts]
       Storage cost ($ per TB, 1970 to today): 1980, Apple, $14M per TB; 2010, Barracuda, $70 per TB.
       Network access (# of hosts, 1969 to today): ARPAnet Node 1 at UCLA; 1B hosts today.
       CPU cost ($ per GFLOPS, 1960 to today): 1961, IBM 1620, $1,100,000,000; 2009, AMD Radeon, $0.59.
       Bandwidth cost ($ per Mbps, 1998 to today): $1200 per Mbps; $5 per Mbps.
       Source: Mike Driscoll, CTO Metamarkets: The Three Sexy Skills of Data Scientists (& Data Driven Startups)
    4. Small :: Thousands of sales figures (10 GB) :: Stored in memory
       Medium :: Millions of web pages :: Stored on disk
       Large :: Billions of web clicks (1TB+) :: Distributed storage
    5.
    6-11. [Figure: data taxonomy grid, shown on each of slides 6 through 11]
       Source of Data (y-axis): Others' Data (Data Only From Others' Platforms); Hybrid; Your Data (Data Only From Your Platform).
       x-axis: Data Product (Sell Data Directly); Data-driven Product (Sell Insight); Final Product (Sell Product).
    12. [Figure: the same data taxonomy grid, with the added label "Companies focused on delivering increasing insight"]
    13. http://www.flickr.com/photos/tps58/6158683716
    14. Complex Data Architectures; Proprietary Algorithms; Rich Analytics
    15. Complex Data Architectures; Proprietary Algorithms; Rich Analytics
    16. 010001011; Contributory Database; Platform
    17. User engagement; Improvements; PRODUCT; Data; Insight
    18. http://www.billfrymire.com/blog/wp-content/uploads/2008/04/dna-strand-code.jpg
    19. Hacking; Statistics; Domain Expertise. Drew Conway, The Data Science Venn Diagram
    20. Hacking; Statistics; Domain Expertise; Machine Learning; Data Scientist. Drew Conway, The Data Science Venn Diagram
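The size tiers on slide 4, and the accompanying note that even simple mathematical tasks get harder once data is spread across many machines, can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not anything shown in the deck: the partitions are plain in-memory lists standing in for machines, and the function names (`partial_aggregate`, `combine`, `distributed_mean`, `naive_mean_of_means`) are hypothetical.

```python
from functools import reduce


def partial_aggregate(partition):
    """Work done locally on one machine: reduce its rows to (sum, count)."""
    return (sum(partition), len(partition))


def combine(left, right):
    """Merge two partial results; associative, so the merge order does not matter."""
    return (left[0] + right[0], left[1] + right[1])


def distributed_mean(partitions):
    """Exact global mean computed only from per-partition (sum, count) pairs."""
    total, count = reduce(combine, map(partial_aggregate, partitions), (0.0, 0))
    if count == 0:
        raise ValueError("no data in any partition")
    return total / count


def naive_mean_of_means(partitions):
    """Tempting shortcut that is wrong when partitions differ in size."""
    means = [sum(p) / len(p) for p in partitions if p]
    return sum(means) / len(means)


if __name__ == "__main__":
    # Hypothetical data split unevenly across two machines.
    partitions = [[1, 2, 3], [10]]
    print(distributed_mean(partitions))     # 16 / 4 = 4.0
    print(naive_mean_of_means(partitions))  # (2.0 + 10.0) / 2 = 6.0
```

On one machine the mean is a one-liner. Across partitions the computation has to be restated as small mergeable summaries, and an operation such as an exact median has no summary this simple, which is one way to read the "PROBLEM" raised in the notes.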