Big data for the rest of us
While Hadoop is the most well-known technology in big data, it’s not always the most approachable or appropriate solution for data storage and processing. In this session you’ll learn about enterprise NoSQL architectures, with examples drawn from real-world deployments, as well as how to apply big data regardless of the size of your own enterprise.

  • One site is generating nearly as many URLs as the entire internet 6 years ago.

Big data for the rest of us Presentation Transcript

  • 1. Not Just Hadoop: NoSQL in the Enterprise
  • 2. Talking about: What is BIG Data; BIG Data & you; Real world examples; The future of Big Data
  • 3. @spf13, AKA Steve Francia. 16+ years building the internet. Father, husband, skateboarder. Chief Evangelist @; responsible for drivers, integrations, web & writing
  • 4. What is BIG data?
  • 5. 2000: Google Inc. today announced it has released the largest search engine on the Internet. Google’s new index, comprising more than 1 billion URLs
  • 6. 2008: Our indexing system for processing links indicates that we now count 1 trillion unique URLs (and the number of individual web pages out there is growing by several billion pages per day).
  • 7. An unprecedented amount of data is being created and is accessible
  • 8. Data Growth (chart, millions of URLs per year): 2000: 1; 2001: 4; 2002: 10; 2003: 24; 2004: 55; 2005: 120; 2006: 250; 2007: 500; 2008: 1,000
  • 9. Truly Exponential Growth is hard for people to grasp. A BBC reporter recently: "Your current PC is more powerful than the computer they had on board the first flight to the moon".
  • 10. Moore’s Law applies to more than just CPUs. Boiled down, it is that things double at regular intervals. It’s exponential growth... and applies to big data
  • 11. How BIG is it?
  • 12. How BIG is it? 2008
  • 13. How BIG is it? 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008
  • 14. We’ve had BIG Data needs for a long time. In 1998 Google won the search race through custom software & infrastructure
  • 15. We’ve had BIG Data needs for a long time. In 2002 Amazon again wrote custom & proprietary software to handle their BIG Data needs
  • 16. We’ve had BIG Data needs for a long time. In 2006 Facebook started with off-the-shelf software, but quickly turned to developing their own custom-built solutions
  • 17. Ability to handle big data is one of the largest factors in determining winners vs losers.
  • 18. For over a decade BIG Data = custom software
  • 19. Why all this talk about BIG Data now?
  • 20. In the past few years open source software emerged enabling ‘us’ to handle BIG Data
  • 21. The Big Data Story
  • 22. Is actually two stories
  • 23. Doers & Tellers talking about different things (source: http://www.slideshare.net/siliconangle/trendconnect-big-data-report-september)
  • 24. Tellers
  • 25. Doers
  • 26. Doers talk a lot more about actual solutions
  • 27. They know it’s a two-sided story: Storage and Processing
  • 28. Takeaways: MongoDB and Hadoop. MongoDB for storage & operations; Hadoop for processing & analytics
  • 29. How MongoDB enables big data
      • Flexible schema
      • Horizontal scale built in & free
      • Operates at near speed of memory
      • Optimized for modern apps
  • 30. MongoDB @ Orbitz. Rob Lancaster, October 23, 2012
  • 31. Use Cases
      • Hotel Data Collection
      • Hotel Rate Feed:
          • Supply hotel rates to Google for their Hotel Finder
          • Uses MongoDB:
              - Maintain state of data sent to Google
              - Identify changes in rates as they occur
          • Makes use of complex querying, secondary indexing
      • EasyLoader:
          • Feature allowing suppliers to easily load inventory to Orbitz
          • Uses MongoDB to persist all changes for auditing purposes
  • 32. Hotel Data Collection
      • Goals:
          • Better understand performance of our hotel path
          • Optimize our hotel rate cache
      • Methods:
          • Capture every action performed by our hotel search engine.
          • Persist this data for long periods.
      • Challenges:
          • Need high performance capture.
          • Scalable, inexpensive storage.
  • 33. Requirements
      • Collection
          • High write throughput
              - 500 servers
              - > 100 million documents/day
          • Flexibility
              - Complex extendable documents
              - No forced schema
          • Scalability
          • Simplicity
      • Storage & Processing
          • High data volume
              - ~500 GB/day
              - 7 TB/month compressed
          • Scalable
          • Inexpensive
          • Proximity with other data
  • 34. The Solution
      • Utilize MongoDB as a collector:
          • ~500 clients
          • Utilize unsafe writes for high throughput
          • Heterogeneous documents
          • New collection for each hour
      • HDFS for storage & processing:
          • Data moved via M/R job:
              - One job per collection
              - One mapper per MongoDB instance
          • Additional processing and analysis by other jobs
  • 35. Challenges & Conclusions
      • Challenges? None really.
      • Achieved a robust and simple solution
      • MongoDB has been entirely worry free
      • Very high write throughput
      • Reads (well, full collection dumps across the wire) are slower
  • 36. The Future of BIG data
  • 37. What is BIG? BIG today is normal tomorrow
  • 38. Data Growth (chart, millions of URLs per year): 2000: 1; 2001: 4; 2002: 10; 2003: 24; 2004: 55; 2005: 120; 2006: 250; 2007: 500; 2008: 1,000; 2009: 2,150; 2010: 4,400; 2011: 9,000
  • 39. Data Growth (same chart as slide 38, repeated)
  • 40. How BIG is it?
  • 41. How BIG is it? 2012
  • 42. How BIG is it? 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012
  • 43. 2012: Generating over 250 million tweets per day
  • 44. MongoDB enables us to scale with the redefinition of BIG. Tools like Hadoop are enabling us to process the new BIG.
  • 45. MongoDB is committed to working with the best data tools, including Hadoop, Storm, Disco, Spark & more
  • 46. http://spf13.com http://github.com/s @spf13. Questions? Download at mongodb.org. We’re hiring!! Contact us at jobs@10gen.com
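
The collector pattern on slide 34 (heterogeneous documents, one new collection per hour, one downstream job per finished collection) can be sketched roughly as below. This is an illustration only, not Orbitz's implementation: the class `HourlyCollector`, the function `hourly_collection_name`, the `hotel_search` prefix, and the event fields are all hypothetical, and an in-memory dict stands in for MongoDB so the sketch runs without a server.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hourly_collection_name(ts: datetime, prefix: str = "hotel_search") -> str:
    """Name of the collection holding events for the UTC hour of `ts`.

    The prefix and naming scheme are assumptions for illustration.
    """
    return f"{prefix}_{ts.astimezone(timezone.utc):%Y%m%d_%H}"


class HourlyCollector:
    """In-memory stand-in for a MongoDB collector (hypothetical).

    With a real deployment, each list here would be a MongoDB collection,
    and "unsafe writes" would correspond to an unacknowledged write
    concern (w=0), trading durability guarantees for throughput.
    """

    def __init__(self):
        self.collections = defaultdict(list)

    def capture(self, event: dict) -> str:
        # Documents are heterogeneous: no schema is enforced on `event`
        # beyond the timestamp used for routing.
        name = hourly_collection_name(event["ts"])
        self.collections[name].append(event)
        return name

    def closed_collections(self, now: datetime) -> list:
        # Collections for hours before the current one are complete and
        # can each be handed to one batch job (e.g. an export to HDFS).
        # Names sort chronologically because of the YYYYMMDD_HH suffix.
        current = hourly_collection_name(now)
        return sorted(n for n in self.collections if n < current)


collector = HourlyCollector()
t1 = datetime(2012, 10, 23, 14, 5, tzinfo=timezone.utc)
t2 = datetime(2012, 10, 23, 15, 1, tzinfo=timezone.utc)
collector.capture({"ts": t1, "action": "search", "city": "Chicago"})
collector.capture({"ts": t2, "action": "rate_lookup", "hotel_id": 42})
print(collector.closed_collections(now=t2))
```

Rolling to a new collection each hour means a finished hour is never written to again, so a batch job can read it in full without coordinating with the writers.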