NoSQL databases and managing big data

An unprecedented amount of data is being created and is accessible. This presentation shows how to use the new NoSQL technologies to make sense of all this data.

Published in: Technology
  • * memcache, redis, membase
    * mongodb, couch
    * cassandra, riak
    * neo4j, flockdb
  • By reducing the transactional semantics the db provides, one can still solve an interesting set of problems where performance is very important, and horizontal scaling then becomes easier.
  • One site is generating nearly as many URLs as the entire internet 6 years ago.

    1. NoSQL Databases & Managing Big Data
    2. Talking about
        What is BIG Data
        NoSQL
        MongoDB
        Future of BIG Data
    3. @spf13, AKA Steve Francia
        15+ years building the internet
        Father, husband, skateboarder
        Chief Solutions Architect @ responsible for drivers, integrations, web & docs
    4. Company behind MongoDB
        Offices in NYC, Palo Alto, London & Dublin
        100+ employees
        Support, consulting, training
        Mgt: Google/DoubleClick, Oracle, Apple, NetApp, Mark Logic
        Well funded: Sequoia, Union Square, Flybridge
    5. What is BIG data?
    6. 2000: Google Inc. today announced it has released the largest search engine on the Internet. Google’s new index, comprising more than 1 billion URLs
    7. 2008: Our indexing system for processing links indicates that we now count 1 trillion unique URLs (and the number of individual web pages out there is growing by several billion pages per day).
    8. Data Growth (millions of URLs): 2000: 1; 2001: 4; 2002: 10; 2003: 24; 2004: 55; 2005: 120; 2006: 250; 2007: 500; 2008: 1,000
    9. An unprecedented amount of data is being created and is accessible
    10. What good is it if we can’t utilize this data?
    11. What is NoSQL?
    12. What is NoSQL? Key / Value, Column, Graph, Document
    13. Key-Value Stores
        A mapping from a key to a value
        The store doesn't know anything about the key or value
        The store doesn't know anything about the insides of the value
        Operations:
        • Set, get, or delete a key-value pair
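The three operations on the slide can be sketched in a few lines of plain JavaScript. This is only an illustration: `KeyValueStore` and the `session:42` key are hypothetical names, not the API of any real product. The value is stored as an opaque string, so only the caller knows how to interpret it.

```javascript
// Hypothetical in-memory key-value store (illustration only).
class KeyValueStore {
  constructor() {
    this.entries = new Map();
  }

  // Store a value under a key; the store never inspects the value.
  set(key, value) {
    this.entries.set(key, value);
  }

  // Return the stored value, or undefined when the key is absent.
  get(key) {
    return this.entries.get(key);
  }

  // Remove the pair; returns true if the key existed.
  delete(key) {
    return this.entries.delete(key);
  }
}

const store = new KeyValueStore();
// The caller serializes the value; to the store it is just a string.
store.set("session:42", JSON.stringify({ user: "steve" }));
const session = JSON.parse(store.get("session:42"));
```

Because the store cannot look inside the value, any query other than "fetch by key" has to be done by the application.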
    14. Column-Oriented Stores
        Like a relational store, but flipped around: all data for a column is kept together
        An index provides a means to get a column value for a record
        Operations:
        • Get, insert, delete records; updating fields
        Streaming column data in and out of Hadoop
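A rough way to picture "flipped around" is to turn an array of records into one array per column. The sketch below assumes every record has the same fields, and the sample data and the names `toColumns` and `valueFor` are made up for illustration. Here the position in each column array plays the role of the index back to a record.

```javascript
// Row-oriented layout: one object per record (sample data is hypothetical).
const rows = [
  { id: 1, city: "New York", visits: 10 },
  { id: 2, city: "London", visits: 7 },
  { id: 3, city: "Dublin", visits: 5 },
];

// Column-oriented layout: one array per field.
// Assumes all records share the same set of fields.
function toColumns(records) {
  const columns = {};
  for (const record of records) {
    for (const [field, value] of Object.entries(record)) {
      if (!columns[field]) columns[field] = [];
      columns[field].push(value);
    }
  }
  return columns;
}

// Look up one column value for the record at a given position.
function valueFor(columns, position, field) {
  return columns[field][position];
}

const columns = toColumns(rows);
// Aggregating a single column only touches that column's data.
const totalVisits = columns.visits.reduce((sum, v) => sum + v, 0);
```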
    15. Graph Databases
        Stores vertex-to-vertex edges
        Operations:
        • Getting and setting edges
        • Sometimes possible to annotate vertices or edges
        Query languages support finding paths between vertices, subject to various constraints
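As a minimal sketch of these ideas, the following JavaScript stores directed edges and finds a shortest path with a breadth-first search. `GraphStore` and the vertex names are hypothetical; real graph databases expose path finding through their own query languages, which this does not attempt to imitate.

```javascript
// Hypothetical in-memory store of directed vertex-to-vertex edges.
class GraphStore {
  constructor() {
    this.edges = new Map(); // vertex -> Set of neighbouring vertices
  }

  // Set an edge from one vertex to another.
  addEdge(from, to) {
    if (!this.edges.has(from)) this.edges.set(from, new Set());
    this.edges.get(from).add(to);
  }

  // Get the vertices reachable over one outgoing edge.
  neighbors(vertex) {
    return [...(this.edges.get(vertex) || [])];
  }

  // Breadth-first search; returns a shortest path as an array, or null.
  findPath(start, goal) {
    const previous = new Map([[start, null]]);
    const queue = [start];
    while (queue.length > 0) {
      const current = queue.shift();
      if (current === goal) {
        const path = [];
        for (let v = goal; v !== null; v = previous.get(v)) path.unshift(v);
        return path;
      }
      for (const next of this.neighbors(current)) {
        if (!previous.has(next)) {
          previous.set(next, current);
          queue.push(next);
        }
      }
    }
    return null;
  }
}

const graph = new GraphStore();
graph.addEdge("alice", "bob");
graph.addEdge("bob", "carol");
graph.addEdge("alice", "dave");
const path = graph.findPath("alice", "carol");
```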
    16. Document Stores
        The store is a container for documents
        Documents are made up of named fields (think object/array/dict/hash...)
        Can query on any document field(s)
        Operations:
        • Insert and delete documents
        • Update fields within documents
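The operations listed on this slide can be imitated with a small in-memory class. This is a sketch under simplifying assumptions: `DocumentStore` is a hypothetical name, queries support exact equality on top-level fields only, and the sample documents are invented. It is not how MongoDB is implemented.

```javascript
// Hypothetical in-memory document container (illustration only).
class DocumentStore {
  constructor() {
    this.documents = [];
  }

  // Insert a shallow copy so later changes to the caller's object do not leak in.
  insert(doc) {
    this.documents.push({ ...doc });
  }

  // Query on any field(s): every field in criteria must match exactly.
  find(criteria) {
    return this.documents.filter((doc) =>
      Object.entries(criteria).every(([field, value]) => doc[field] === value)
    );
  }

  // Update fields within every matching document; returns the match count.
  update(criteria, changes) {
    const matched = this.find(criteria);
    for (const doc of matched) Object.assign(doc, changes);
    return matched.length;
  }

  // Delete every matching document; returns how many were removed.
  remove(criteria) {
    const doomed = new Set(this.find(criteria));
    this.documents = this.documents.filter((doc) => !doomed.has(doc));
    return doomed.size;
  }
}

const places = new DocumentStore();
// Documents in the same container do not need identical fields.
places.insert({ name: "10gen HQ", city: "New York" });
places.insert({ name: "London office", city: "London", floors: 2 });
const inLondon = places.find({ city: "London" });
places.update({ name: "10gen HQ" }, { zip: "10011" });
```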
    17. Comparison of five data stores (one labeled MySQL)
        Data Model: Columns, Key:Value, Columns, Documents, Relational
        Consistency: Eventual / Quorum, Eventual / Quorum, Strong, Strong, Strong
        Availability: Multi-Master, Multi-Master, Single Master, Single Master, Single Master
        Partitioning: Range or Hash, Hash, Range, N/A, Hash
        Query: SQL; Thrift, CQL; Native Drivers (6); Rest, Thrift; Native Drivers (12)
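The "Hash" and "Range" entries in the partitioning row name two ways of deciding which server holds a record. The sketch below is a toy illustration: the function names, the string hash and the boundary values are all assumptions, and real systems use their own hash functions and split points.

```javascript
// Hash partitioning: a toy string hash spreads keys evenly across shards,
// but neighbouring keys usually land on different shards.
function hashPartition(key, shardCount) {
  let hash = 0;
  for (const ch of String(key)) {
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash % shardCount;
}

// Range partitioning: each shard owns a contiguous key range.
// upperBounds[i] is the exclusive upper bound of shard i; the last shard
// takes everything at or above the final bound.
function rangePartition(key, upperBounds) {
  const index = upperBounds.findIndex((bound) => key < bound);
  return index === -1 ? upperBounds.length : index;
}

const bounds = ["h", "p"]; // hypothetical split points: [..h), [h..p), [p..]
const shardForApple = rangePartition("apple", bounds);
const shardForKiwi = rangePartition("kiwi", bounds);
const shardForZebra = rangePartition("zebra", bounds);
const hashedShard = hashPartition("10gen HQ", 4);
```

Range partitioning keeps sorted keys together, which helps range scans; hash partitioning trades that away for a more even spread.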
    18. Introduction to MongoDB
    19. What do we want in an ideal world?
    20. What do we want in an ideal world?
        • Horizontal scaling
        • cloud compatible
        • works with standard servers
        • Fast
        • Development is easy
        • Features
        • The Right Data Model
        • Schema Agility
    21. MongoDB philosophy
        Keep functionality when we can (key/value stores are great, but we need more)
        Non-relational (no joins) makes scaling horizontally practical
        Document data models are good
        Database technology should run anywhere: virtualized, cloud, metal, etc.
    22. Under the hood
        Written in C++
        Runs nearly everywhere
        Data serialized to BSON
        Extensive use of memory-mapped files, i.e. read-through write-through memory caching.
    23. Database Landscape: Memcached, MongoDB and RDBMS plotted by Scalability & Performance against Depth of Functionality
    24. “MongoDB has the best features of key/value stores, document databases and relational databases in one.” John Nunemaker
    25. Relational made normalized data look like this
        Category: Name, Url
        Article: Name, Slug, Publish date, Text
        User: Name, Email Address
        Tag: Name, Url
        Comment: Comment, Date, Author
    26. Document databases make normalized data look like this
        Article: Name, Slug, Publish date, Text, Author
            Comment[]: Comment, Date, Author
            Tag[]: Value
            Category[]: Value
        User: Name, Email Address
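Written out as a JavaScript object, the document model on the slide might look like the sketch below. The field names follow the diagram; every value is invented for illustration.

```javascript
// Hypothetical values; only the field names come from the slide.
const user = { name: "Steve", emailAddress: "steve@example.com" };

const article = {
  name: "NoSQL and Big Data",
  slug: "nosql-and-big-data",
  publishDate: new Date("2012-01-15"),
  text: "An unprecedented amount of data...",
  author: user.name,
  // Comments, tags and categories are embedded arrays, not separate tables.
  comments: [
    { comment: "Superb!", date: new Date("2012-01-16"), author: "Fred" },
  ],
  tags: ["tech", "database"],
  categories: ["Technology"],
};

// Everything needed to render the article arrives in one document.
const commentAuthors = article.comments.map((c) => c.author);
```

In the relational layout the same page would need rows from the Article, Comment, Tag and Category tables.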
    27. MongoDB
    28. Start with an object (or array, hash, dict, etc.)
        place1 = {
          name : "10gen HQ",
          address : "578 Broadway 7th Floor",
          city : "New York",
          zip : "10011",
          tags : [ "business", "awesome" ]
        }
    29. Inserting the record (initial data load)
        > db.places.insert(place1)
    30. Querying
        {
          name : "10gen HQ",
          address : "134 5th Avenue 3rd Floor",
          city : "New York",
          zip : "10011",
          tags : [ "business", "awesome" ]
        }
        > db.places.findOne({ zip: "10011", tags: "awesome" })
        > db.places.find({ tags: "business" })
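Both queries on this slide match a single string against the `tags` array. In MongoDB a plain value in a query matches an array field when the array contains that value. The function below is a simplified sketch of that rule in plain JavaScript; `matches` is a hypothetical helper, and it handles only exact equality on top-level fields, none of MongoDB's query operators.

```javascript
// Simplified matcher: a plain query value matches a scalar field by equality
// and an array field by membership.
function matches(doc, query) {
  return Object.entries(query).every(([field, expected]) => {
    const actual = doc[field];
    return Array.isArray(actual) ? actual.includes(expected) : actual === expected;
  });
}

const place = {
  name: "10gen HQ",
  city: "New York",
  zip: "10011",
  tags: ["business", "awesome"],
};

const byZipAndTag = matches(place, { zip: "10011", tags: "awesome" });
const byMissingTag = matches(place, { tags: "coffee" });
```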
    31. Nested Documents
        {
          _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
          name : "10gen HQ",
          address : "578 Broadway 7th Floor",
          city : "New York",
          zip : "10011",
          tags : [ "business", "awesome" ],
          tips : [{
            author : "Fred",
            date : "Sat Apr 25 2010 20:51:03",
            text : "Best Place Ever!"
          }]
        }
    32. Updating
        > db.places.update(
            { name : "10gen HQ" },
            { $push : { tips : {
                author : "nosh",
                date : new Date("2011-06-26"),
                text : "Office hours are great!"
            } } }
          )
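The `$push` operator appends a value to an array field, creates the array when the field does not exist yet, and fails when the field holds something other than an array. The sketch below mimics that behavior on a plain JavaScript object; `applyPush` is a hypothetical helper written for illustration, not part of any MongoDB driver.

```javascript
// Mimic $push on an in-memory object: append to an array field,
// creating the array first if the field is missing.
function applyPush(doc, pushSpec) {
  for (const [field, value] of Object.entries(pushSpec)) {
    if (doc[field] === undefined) doc[field] = [];
    if (!Array.isArray(doc[field])) {
      throw new Error(`Cannot push to non-array field "${field}"`);
    }
    doc[field].push(value);
  }
  return doc;
}

const place = { name: "10gen HQ", tags: ["business", "awesome"] };

// The tips field does not exist yet, so it is created with one element.
applyPush(place, {
  tips: { author: "nosh", date: new Date("2011-06-26"), text: "Office hours are great!" },
});

// Pushing to an existing array appends to it.
applyPush(place, { tags: "nyc" });
```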
    33. MongoDB Use Cases
    34. CMS / Blog
        Needs:
        • Business needed modern data store for rapid development and scale
        Solution:
        • Use PHP & MongoDB
        Results:
        • Real time statistics
        • All data, images, etc. stored together: easy access, easy deployment, easy high availability
        • No need for complex migrations
        • Enabled very rapid development and growth
    35. Photo Meta-Data
        Problem:
        • Business needed more flexibility than Oracle could deliver
        Solution:
        • Use MongoDB instead of Oracle
        Results:
        • Developed application in one sprint cycle
        • 500% cost reduction compared to Oracle
        • 900% performance improvement compared to Oracle
    36. Customer Analytics
        Problem:
        • Deal with massive data volume across all customer sites
        Solution:
        • Use MongoDB to replace Google Analytics / Omniture options
        Results:
        • Less than one week to build prototype and prove business case
        • Rapid deployment of new features
    37. Archiving
        Why MongoDB:
        • Existing application built on MySQL
        • Lots of friction with RDBMS based archive storage
        • Needed more scalable archive storage backend
        Solution:
        • Keep MySQL for active data (100 million)
        • MongoDB for archive (2+ billion)
        Results:
        • No more alter table statements taking over 2 months to run
        • Sharding enabled horizontal scale
        • Very happily looking at other places to use MongoDB
    38. Online Dictionary
        Problem:
        • MySQL could not scale to handle their 5B+ documents
        Solution:
        • Switched from MySQL to MongoDB
        Results:
        • Massive simplification of code base
        • Eliminated need for external caching system
        • 20x performance improvement over MySQL
    39. E-commerce
        Problem:
        • Multi-vertical E-commerce impossible to model (efficiently) in RDBMS
        Solution:
        • Switched from MySQL to MongoDB
        Results:
        • Massive simplification of code base
        • Rapidly build, halving time to market (and cost)
        • Eliminated need for external caching system
        • 50x+ performance improvement over MySQL
    40. Tons more: MongoDB casts a wide net; people keep coming up with new and brilliant ways to use it
    41. In Good Company, and 1000s more
    42. The Future of BIG data
    43. What is BIG? BIG today is normal tomorrow
    44. Data Growth (millions of URLs): 2000: 1; 2001: 4; 2002: 10; 2003: 24; 2004: 55; 2005: 120; 2006: 250; 2007: 500; 2008: 1,000; 2009: 2,150; 2010: 4,400; 2011: 9,000
    45. Data Growth (millions of URLs): 2000: 1; 2001: 4; 2002: 10; 2003: 24; 2004: 55; 2005: 120; 2006: 250; 2007: 500; 2008: 1,000; 2009: 2,150; 2010: 4,400; 2011: 9,000
    46. 2012: Generating over 250 million tweets per day
    47. MongoDB enables us to scale with the redefinition of BIG.
    48. MongoDB: High Performance, Easy Development, Horizontally Scalable
        { author : "steve", date : new Date(), text : "About MongoDB...", tags : ["tech", "database"] }
    49. http://spf13.com, http://github.com/s, @spf13
        Questions?
        Download at mongodb.org
        We’re hiring!! Contact us at jobs@10gen.com
