Not Just Hadoop:
NoSQL in the Enterprise

Talking about:
  What is BIG Data
  BIG Data & you
  Real world examples
  The future of Big Data

@spf13

AKA Steve Francia
16+ years building the internet
Father, husband, skateboarder
Chief Evangelist @ 10gen, responsible for drivers, integrations, web & writing
What is BIG data?
2000
Google Inc. today announced it has released the largest search engine on the Internet. Google's new index, comprising more than 1 billion URLs.
2008
Our indexing system for processing links indicates that we now count 1 trillion unique URLs (and the number of individual web pages out there is growing by several billion pages per day).
An unprecedented amount of data is being created and is accessible.
[Chart: Data Growth, millions of URLs indexed — 2000: 1, 2001: 4, 2002: 10, 2003: 24, 2004: 55, 2005: 120, 2006: 250, 2007: 500, 2008: 1,000]
Truly Exponential Growth
Is hard for people to grasp.

A BBC reporter recently: "Your current PC is more powerful than the computer they had on board the first flight to the moon."
Moore's Law
Applies to more than just CPUs.

Boiled down, it means that things double at regular intervals.

It's exponential growth, and it applies to big data.
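As a rough back-of-the-envelope sketch (using only the URL counts from the chart above, not figures quoted in the talk), growth from about 1 million indexed URLs in 2000 to about 1 billion in 2008 implies a doubling time of well under a year:

    import math

    # ~1 million URLs in 2000 vs. ~1,000 million in 2008 (from the chart above)
    growth_factor = 1000 / 1                  # ~1,000x growth over 8 years
    doublings = math.log2(growth_factor)      # ~9.97 doublings
    doubling_time_months = 8 * 12 / doublings
    print(f"Doubling roughly every {doubling_time_months:.1f} months")  # ~9.6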
How BIG is it?

[Visualization: relative data sizes for 2001 through 2008 — each year dwarfs the one before, with 2008 by far the largest]
We've had BIG Data needs for a long time.

In 1998 Google won the search race through custom software & infrastructure.
We've had BIG Data needs for a long time.

In 2002 Amazon again wrote custom & proprietary software to handle their BIG Data needs.
We've had BIG Data needs for a long time.

In 2006 Facebook started with off-the-shelf software, but quickly turned to developing their own custom-built solutions.
The ability to handle big data is one of the largest factors in determining winners vs. losers.
For over a decade, BIG Data = custom software.
Why all this talk about BIG Data now?
In the past few years, open source software has emerged that enables 'us' to handle BIG Data.
The Big Data story is actually two stories.
Doers & Tellers are talking about different things.
http://www.slideshare.net/siliconangle/trendconnect-big-data-report-september
Tellers
Doers
Doers talk a lot more about actual solutions.

They know it's a two-sided story:

  Storage

  Processing
Takeaways
MongoDB and Hadoop:
MongoDB for storage & operations
Hadoop for processing & analytics
How MongoDB enables big data
• Flexible schema
• Horizontal scale built in & free
• Operates at near-memory speed
• Optimized for modern apps
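A minimal sketch of the "flexible schema" point, using pymongo (database, collection, and field names here are hypothetical, chosen only for illustration — this is not code from the talk). Two differently shaped documents can land in the same collection with no schema declared up front:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    events = client.analytics.events   # database & collection are created lazily

    # No schema to declare up front: documents in one collection can differ.
    events.insert_one({"type": "search", "terms": ["hotel", "chicago"], "ms": 42})
    events.insert_one({"type": "booking", "hotel_id": 1234,
                       "rate": {"amount": 159.00, "currency": "USD"}})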
MongoDB @ Orbitz
    Rob Lancaster
    October 23 | 2012
Use Cases

 • Hotel Data Collection
 • Hotel Rate Feed:
  • Supply hotel rates to Google for their Hotel Finder
  • Uses MongoDB:
   - Maintain state of data sent to Google
   - Identify changes in rates as they occur
  • Makes use of complex querying, secondary indexing
 • EasyLoader:
  • Feature allowing suppliers to easily load inventory to Orbitz
  • Uses MongoDB to persist all changes for auditing purposes
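A hedged sketch of how the rate-feed state tracking described above might look in pymongo. The collection and field names (hotel_id, last_modified, sent_to_google) are hypothetical and serve only to illustrate the secondary-index and change-detection pattern the slide mentions:

    from datetime import datetime, timedelta
    from pymongo import MongoClient, ASCENDING

    client = MongoClient("mongodb://localhost:27017")
    rates = client.ratefeed.rates   # hypothetical database / collection

    # Secondary (compound) index supporting the "identify changes" query pattern.
    rates.create_index([("hotel_id", ASCENDING), ("last_modified", ASCENDING)])

    # Find rates that changed since the last push to Google Hotel Finder.
    last_push_time = datetime.utcnow() - timedelta(hours=1)
    changed = rates.find({
        "last_modified": {"$gt": last_push_time},
        "sent_to_google": True,
    })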



Hotel Data Collection

 • Goals:
  • Better understand performance of
    our hotel path
  • Optimize our hotel rate cache
 • Methods:
  • Capture every action performed
    by our hotel search engine.
  • Persist this data for long periods.
 • Challenges:
  • Need high performance capture.
  • Scalable, inexpensive storage.



Requirements

Collection:
• High write throughput
  • 500 servers
  • > 100 million documents/day
• Flexibility
  • Complex, extendable documents
  • No forced schema
• Scalability
• Simplicity

Storage & Processing:
• High data volume
  • ~500 GB/day
  • 7 TB/month compressed
• Scalable
• Inexpensive
• Proximity with other data
The Solution

 • Utilize MongoDB as a collector:
  • ~ 500 clients
  • Utilize unsafe writes for high
    throughput
  • Heterogeneous documents
  • New collection for each hour
 • HDFS for storage & processing:
  • Data moved via M/R job:
   - One job per collection
   - One mapper per MongoDB instance
  • Additional processing and analysis
    by other jobs
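A hedged sketch of the collector side in pymongo (the host name, database, and document fields are hypothetical): an hourly collection is selected by name and written with w=0, which is the modern spelling of the "unsafe writes" mentioned above — the client skips the server acknowledgement, trading durability guarantees for write throughput.

    from datetime import datetime
    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://collector-host:27017")  # hypothetical host
    db = client.collector

    # New collection each hour; fire-and-forget writes via w=0.
    hour_name = datetime.utcnow().strftime("events_%Y%m%d%H")
    events = db.get_collection(hour_name, write_concern=WriteConcern(w=0))

    events.insert_one({"server": "web-042", "action": "hotel_search",
                       "ts": datetime.utcnow()})

Each hourly collection then maps cleanly onto the "one job per collection" M/R export to HDFS described above.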




Challenges & Conclusions

• Challenges? None, really.
• Achieved a robust and simple solution
• MongoDB has been entirely worry-free
  • Very high write throughput
  • Reads (well, full collection dumps across the wire) are slower




The Future of BIG Data
What is BIG?
BIG today is normal tomorrow.
[Chart: Data Growth, millions of URLs indexed — 2000: 1, 2001: 4, 2002: 10, 2003: 24, 2004: 55, 2005: 120, 2006: 250, 2007: 500, 2008: 1,000, 2009: 2,150, 2010: 4,400, 2011: 9,000]
How BIG is it?

[Visualization: relative data sizes for 2005 through 2012 — each year dwarfs the one before, with 2012 by far the largest]
2012: generating over 250 million tweets per day.
MongoDB enables us to scale with the redefinition of BIG.

Tools like Hadoop are enabling us to process the new BIG.
MongoDB is committed to working with the best data tools, including Hadoop, Storm, Disco, Spark & more.
http://spf13.com
http://github.com/s
@spf13

Questions?

Download at mongodb.org
We're hiring!! Contact us at jobs@10gen.com