The Future of Data
In 10 minutes
Google 2000
Google Inc. today announced it has released the largest search engine on the Internet.
Google's new index, comprising more than 1 billion URLs
Google 2008
Our indexing system for processing links indicates that we now count 1 trillion unique URLs (and the number of individual web pages out there is growing by several billion pages per day).
Data Growth (chart, 2000 to 2008)
Year:   2000  2001  2002  2003  2004  2005  2006  2007  2008
Value:     1     4    10    24    55   120   250   500  1,000
What are we doing with all this data?
A brief history of data (incomplete and mostly inaccurate)
Relational databases became a crutch
Memcache, a scooter
Limited by our tools…
Shiny Raw Data
Around 2007
The crutches began to snap
  – Social graph emergence
  – Substantial data growth
  – Flexible & rich data structures needed
  – Open source tools emerged
  – No longer just Google & Amazon
MongoDB, Hadoop, etc.
An empowering exoskeleton
Products built with these new data tools largely resembled earlier products
The crutch was gone, but we acted like it was still there
The data tools today are vastly more powerful
Tools built to make sense of our data
Your data is telling you things
Are you listening?
We need to do more than report
We need products that inform, educate, and reveal
Google Analytics
What if it provided actual intelligence, like…

Steve,
Your website currently has higher-than-usual traffic.

Most of the visitors are coming from a blog post on Lifehacker that links to your Vim page. There's quite a conversation happening there, so you may want to participate. Additionally, the conversation seems to have spilled over onto Twitter.
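A message like the one above could come from something as simple as comparing the current period's visit count with a historical baseline and naming the dominant referrer. The following is a minimal Python sketch, not how Google Analytics works. The function name `traffic_alert`, the three-standard-deviation threshold, and the input shapes (past visit counts, plus (referrer, landing page) pairs for the current period) are all hypothetical assumptions for illustration.

```python
from collections import Counter
from statistics import mean, stdev


def traffic_alert(baseline_counts, current_visits, threshold_sigmas=3.0):
    """Return an alert string if current traffic is unusually high, else None.

    Hypothetical inputs:
      baseline_counts: visit counts for comparable past periods (at least two).
      current_visits: list of (referrer, landing_page) tuples for this period.
    """
    avg = mean(baseline_counts)
    spread = stdev(baseline_counts)
    # "Higher than usual" here means more than N standard deviations above the mean.
    if len(current_visits) <= avg + threshold_sigmas * spread:
        return None
    # Attribute the spike to the most common (referrer, page) pair.
    (referrer, page), hits = Counter(current_visits).most_common(1)[0]
    share = hits / len(current_visits)
    return (
        "Your website currently has higher-than-usual traffic. "
        f"{share:.0%} of visitors are coming from {referrer}, "
        f"which links to your {page} page."
    )
```

For example, with a baseline of `[100, 110, 95, 105, 90]` and 150 visits from `("lifehacker.com", "Vim")` plus 50 from `("google.com", "Home")`, the function reports that 75% of visitors came from lifehacker.com. A real product would need seasonality-aware baselines; the fixed threshold is only the simplest choice.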
Google Analytics

Of the people who visit MongoDB.org:

 4% visited DataStax
20% visited Amazon Web Services
10% searched for cloud hosting
…
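Statistics of this kind are co-visitation shares: among visitors of one site, the fraction who also visited another. Here is a minimal Python sketch, assuming hypothetical per-visitor browsing data (a mapping from visitor ID to the set of sites visited). The function name `co_visit_shares` and the sample data are illustrative only and do not reproduce the figures above.

```python
def co_visit_shares(sessions, site, others):
    """For visitors of `site`, return the share who also visited each site in `others`.

    sessions: hypothetical mapping of visitor id -> set of visited sites.
    """
    # Keep only the visitors who went to the site we care about.
    visitors = [sites for sites in sessions.values() if site in sites]
    if not visitors:
        return {other: 0.0 for other in others}
    # For each other site, count how many of those visitors also went there.
    return {
        other: sum(other in sites for sites in visitors) / len(visitors)
        for other in others
    }


sessions = {
    "a": {"mongodb.org", "aws.amazon.com"},
    "b": {"mongodb.org"},
    "c": {"mongodb.org", "datastax.com", "aws.amazon.com"},
    "d": {"mongodb.org"},
    "e": {"datastax.com"},  # never visited mongodb.org, so excluded
}
print(co_visit_shares(sessions, "mongodb.org", ["datastax.com", "aws.amazon.com"]))
# {'datastax.com': 0.25, 'aws.amazon.com': 0.5}
```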
It's about making connections
… so your users don't have to
City of Chicago
Goals
• Bring all the data into one place
• Make it available in real time
• Find relationships in the data
Modern tools like Redis, Hadoop, MongoDB, Storm / Spark, and R / Ruby / Python / Go empower you
What are you going to do with your data?
Questions
@spf13
spf13.com
MongoDB.org

Editor's Notes

  • #11 In the past (and still today), all we did was present data in shiny, digestible formats (Google Analytics).
  • #16 As tools matured, they allowed us to do this faster (Chartbeat), but the result is the same.
  • #25 How can we use realtime data to make better decisions? (http://www.smartchicagocollaborative.org/projects/windy-grid/)
  • #27 911, 311, asset locations (sidewalks, parks, parking, bike storage, etc.), building information, tweets, and more.