Big data for the rest of us
While Hadoop is the most well-known technology in big data, it’s not always the most approachable or appropriate solution for data storage and processing. In this session you’ll learn about enterprise NoSQL architectures, with examples drawn from real-world deployments, as well as how to apply big data regardless of the size of your own enterprise.

Published in: Technology
Speaker note: One site is generating nearly as many URLs as the entire internet 6 years ago.
    1. Not Just Hadoop: NoSQL in the Enterprise
    2. Talking about: What is BIG Data; BIG Data & you; Real-world examples; The future of Big Data
    3. @spf13, AKA Steve Francia. 16+ years building the internet. Father, husband, skateboarder. Chief Evangelist @ 10gen, responsible for drivers, integrations, web & writing
    4. What is BIG data?
    5. 2000: "Google Inc. today announced it has released the largest search engine on the Internet. Google's new index, comprising more than 1 billion URLs"
    6. 2008: "Our indexing system for processing links indicates that we now count 1 trillion unique URLs (and the number of individual web pages out there is growing by several billion pages per day)."
    7. An unprecedented amount of data is being created and is accessible
    8. Data Growth (chart, millions of URLs per year: 2000: 1; 2001: 4; 2002: 10; 2003: 24; 2004: 55; 2005: 120; 2006: 250; 2007: 500; 2008: 1,000)
    9. Truly exponential growth is hard for people to grasp. A BBC reporter recently: "Your current PC is more powerful than the computer they had on board the first flight to the moon."
    10. Moore's Law applies to more than just CPUs. Boiled down, it says that things double at regular intervals. It's exponential growth... and it applies to big data
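The doubling claim on slide 10 can be made concrete with a toy calculation (a sketch of exponential growth; the function name is mine, not the speaker's):

```python
def after_doublings(start, periods):
    """Quantity that doubles once per interval -- exponential growth.

    E.g. something that doubles yearly grows roughly 1000x in a decade.
    """
    return start * 2 ** periods

# Ten doublings turn 1 into 1024, which is why linear intuition fails.
print(after_doublings(1, 10))
```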
    11. How BIG is it?
    12. How BIG is it? 2008
    13. How BIG is it? 2008 (with 2001–2007 for comparison)
    14. We've had BIG Data needs for a long time. In 1998 Google won the search race through custom software & infrastructure
    15. We've had BIG Data needs for a long time. In 2002 Amazon again wrote custom & proprietary software to handle their BIG Data needs
    16. We've had BIG Data needs for a long time. In 2006 Facebook started with off-the-shelf software, but quickly turned to developing their own custom-built solutions
    17. The ability to handle big data is one of the largest factors in determining winners vs. losers.
    18. For over a decade, BIG Data = custom software
    19. Why all this talk about BIG Data now?
    20. In the past few years, open source software emerged enabling "us" to handle BIG Data
    21. The Big Data Story
    22. ...is actually two stories
    23. Doers & Tellers talking about different things (http://www.slideshare.net/siliconangle/trendconnect-big-data-report-september)
    24. Tellers
    25. Doers
    26. Doers talk a lot more about actual solutions
    27. They know it's a two-sided story: Storage & Processing
    28. Take-aways: MongoDB and Hadoop. MongoDB for storage & operations; Hadoop for processing & analytics
    29. How MongoDB enables big data: flexible schema; horizontal scale built in & free; operates at near the speed of memory; optimized for modern apps
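A minimal sketch of what "flexible schema" means in practice: heterogeneous documents can share one MongoDB collection with no migrations. The documents and field names here are hypothetical, not from the talk; the commented lines show the PyMongo calls you would use against a live server.

```python
# Two event documents with different shapes -- MongoDB imposes no
# fixed schema, so both can be inserted into the same collection.
search_event = {
    "type": "hotel_search",
    "city": "Chicago",
    "nights": 2,
}
rate_event = {
    "type": "rate_update",
    "hotel_id": 12345,
    "rates": [{"room": "king", "usd": 189.0}],
}

def top_level_fields(doc):
    """Documents in one collection may expose different fields."""
    return sorted(doc)

# With a running mongod this would simply be:
#   from pymongo import MongoClient
#   MongoClient().capture.events.insert_many([search_event, rate_event])
```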
    30. MongoDB @ Orbitz. Rob Lancaster, October 23, 2012
    31. Use Cases. Hotel Data Collection. Hotel Rate Feed: supplies hotel rates to Google for their Hotel Finder; uses MongoDB to maintain the state of data sent to Google and to identify changes in rates as they occur; makes use of complex querying and secondary indexing. EasyLoader: a feature allowing suppliers to easily load inventory to Orbitz; uses MongoDB to persist all changes for auditing purposes
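The rate-change detection described on slide 31 can be sketched as a timestamp filter; against MongoDB it would be a query backed by a secondary index. The document shape and field names are my assumptions, not Orbitz's actual schema.

```python
from datetime import datetime

def changed_since(rate_docs, last_sync):
    """Identify rate documents updated after the last sync to Google."""
    return [d for d in rate_docs if d["updated"] > last_sync]

rates = [
    {"hotel_id": 1, "usd": 129.0, "updated": datetime(2012, 10, 22)},
    {"hotel_id": 2, "usd": 189.0, "updated": datetime(2012, 10, 23)},
]
changed = changed_since(rates, datetime(2012, 10, 22, 12))

# Server-side equivalent, using a secondary index on "updated":
#   coll.create_index("updated")
#   coll.find({"updated": {"$gt": last_sync}})
```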
    32. Hotel Data Collection. Goals: better understand the performance of our hotel path; optimize our hotel rate cache. Methods: capture every action performed by our hotel search engine; persist this data for long periods. Challenges: need high-performance capture; scalable, inexpensive storage.
    33. Requirements. Collection: high write throughput; 500 servers; > 100 million documents/day; flexibility (complex, extendable documents; no forced schema); scalability; simplicity. Storage & Processing: high data volume (~500 GB/day; 7 TB/month compressed); scalable; inexpensive; proximity with other data.
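The volume figures on slide 33 imply roughly a 2:1 compression ratio, assuming a 30-day month and decimal units (my arithmetic, not stated on the slide):

```python
gb_per_day = 500                             # ~500 GB/day raw, per the slide
raw_tb_per_month = gb_per_day * 30 / 1000    # 15.0 TB raw per month
compressed_tb_per_month = 7                  # 7 TB/month compressed, per the slide
ratio = raw_tb_per_month / compressed_tb_per_month  # ~2.14:1 implied compression
```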
    34. The Solution. MongoDB as a collector: ~500 clients; unsafe writes for high throughput; heterogeneous documents; a new collection for each hour. HDFS for storage & processing: data moved via an M/R job (one job per collection, one mapper per MongoDB instance); additional processing and analysis by other jobs.
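The "new collection for each hour" scheme on slide 34 can be sketched as a naming convention (the exact name format is my assumption). The commented lines show unacknowledged writes in PyMongo, i.e. the "unsafe writes for high throughput" of the slide:

```python
from datetime import datetime

def hourly_collection(ts):
    """One collection per hour keeps batches small and lets the M/R job
    ship whole collections to HDFS, one job per collection."""
    return ts.strftime("events_%Y%m%d_%H")

name = hourly_collection(datetime(2012, 10, 23, 14, 5))

# Hypothetical write path with unacknowledged writes for throughput:
#   from pymongo import MongoClient
#   client = MongoClient(w=0)              # w=0: "unsafe" writes, no ack
#   client.capture[name].insert_one(event)
```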
    35. Challenges & Conclusions. Challenges? None really. Achieved a robust and simple solution. MongoDB has been entirely worry-free: very high write throughput; reads (well, full collection dumps across the wire) are slower.
    36. The Future of BIG data
    37. What is BIG? BIG today is normal tomorrow
    38–39. Data Growth (chart, millions of URLs per year: 2000: 1; 2001: 4; 2002: 10; 2003: 24; 2004: 55; 2005: 120; 2006: 250; 2007: 500; 2008: 1,000; 2009: 2,150; 2010: 4,400; 2011: 9,000)
    40. How BIG is it?
    41. How BIG is it? 2012
    42. How BIG is it? 2012 (with 2005–2011 for comparison)
    43. 2012: generating over 250 million tweets per day
    44. MongoDB enables us to scale with the redefinition of BIG. Tools like Hadoop are enabling us to process the new BIG.
    45. MongoDB is committed to working with the best data tools, including Hadoop, Storm, Disco, Spark & more
    46. http://spf13.com | http://github.com/s | @spf13. Questions? Download at mongodb.org. We're hiring!! Contact us at jobs@10gen.com