Making Big Data Small:
Big Data for the Rest of Us

November 2012
…2000

• Google announced it had released the largest
  search engine on the Internet
• Google’s new index comprised more than 1 billion
  URLs

• BIG!!!
…2008
• Google: our indexing system for processing links indicates
  that we now count 1 trillion unique URLs
  (and the number of individual web pages out there
  is growing by several billion pages per day)

• BIGGER!!!!
An unprecedented amount of data is being
created and is accessible
• Applies to more than just CPUs
• Summary version? Things double at regular
  intervals
• It’s exponential growth…and applies to Big Data




BBC: “Your current PC is more powerful than the
computer they had on board the first flight to the
moon.”
Chart: value per year, 2000 to 2011 (unit not stated on the slide)

Year    Value
2000        1
2001        4
2002       10
2003       24
2004       55
2005      120
2006      250
2007      500
2008    1,000
2009    2,150
2010    4,400
2011    9,000
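The "things double at regular intervals" point can be checked roughly against the twelve values labeled on the chart. The sketch below is only an illustration: the slide does not state the unit, the function names are made up for this example, and it assumes smooth compound growth between the first and last data points.

```python
import math

# Values as labeled on the chart for 2000 through 2011.
# The unit is not stated on the slide, so only the ratios matter here.
years = list(range(2000, 2012))
values = [1, 4, 10, 24, 55, 120, 250, 500, 1000, 2150, 4400, 9000]


def annual_growth_factor(first: float, last: float, periods: int) -> float:
    """Average per-period multiplier, assuming compound growth."""
    return (last / first) ** (1 / periods)


def doubling_time(growth_factor: float) -> float:
    """Number of periods needed to double at the given per-period multiplier."""
    return math.log(2) / math.log(growth_factor)


# Eleven yearly steps separate the twelve data points.
factor = annual_growth_factor(values[0], values[-1], len(values) - 1)

print(f"average annual growth factor: {factor:.2f}")
print(f"implied doubling time: {doubling_time(factor):.2f} years")
```

Under these assumptions the series grows by a factor of more than 2 per year on average, which corresponds to a doubling time of somewhat less than one year.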

Source: Silicon Angle, 2012
Storage/Operations

Processing/Analytics
• In 1998, Google won the search race through
  custom software and infrastructure
• In 2002, Amazon likewise wrote custom,
  proprietary software to handle its Big Data
  needs
• In 2006, Facebook started with off-the-shelf
  software, but quickly turned to developing its
  own custom-built solutions

What do these have in common? Big Data was
critical to making them win
• MS Office -> OpenOffice
• Oracle DB -> PostgreSQL
• Unix -> Linux
• WebLogic -> JBoss
• Documentum -> Alfresco
• Cognos -> Pentaho/Jasper
• Salesforce -> SugarCRM
• Informatica -> Talend
• iOS -> Android (?)
• Etc.
Web Innovation
OSS companies
Vendor-sponsored
Individual developers
“The best minds of my generation are thinking
about how to make people click ads.”
(Jeff Hammerbacher)
Where Do We Go from Here?




Agile Development
• Iterative & continuous
• New and emerging apps

Volume and Type of Data
• Trillions of records
• Tens of millions of queries per second
• Volume of data
• Semi-structured and unstructured data

New Architectures
• Systems scaling horizontally, not vertically
• Commodity servers
• Cloud Computing



Storm

Apache Drill
World’s Most Popular Big Data Sources, 2012




Source: JasperSoft, 2012
The future is huMONGOus
5,900 companies evaluated. 10gen is #1 in Software and #9 overall
“Relational databases…[don’t] necessarily match the way we see our
data. mongoDB gave us the flexibility to store data in the way that
we understand it as opposed to somebody’s theoretical view.”




“It’s friendly. By friendly, I mean that coming from a relational
background, specifically a MySQL background, a lot of the concepts
carry over.... It makes it very easy to get started.”




“Selecting MongoDB as our database platform was a no brainer as the
technology offered us the flexibility and scalability that we knew
we’d need for Priority Moments.”



@mjasay





Morning with MongoDB Paris 2012 - Making Big Data Small
