Do Big Data and NoSQL Fit Your Needs?

Many of us consider

Published in: Social Media, Business, Technology

  1. The VP R&D Open Seminar: Big Data Workshop. moshe.kaplan@brightaqua.com http://blogs.microsoft.co.il/blogs/vprnd http://top-performance.blogspot.com
  2. Presentation Objectives http://www.webperformancetoday.com/2010/06/15/everything-you-wanted-to-know-about-web-performance/
  3. Why Do I Care? From 0 to 100 (US mass adoption): Phone: 100 yrs; Radio: 40 yrs; TV: 30 yrs; Mobile: 20 yrs; Internet: 10 yrs; Facebook: 2 yrs
  4. The Internet Industry
  5. The Prime Suspect
  6. Assumptions…
  7. Where did it Fail? Get an Answer, Fast and Cheap
  8. Where did it Fail? I Just Want “Class Persistency Storage” and Changing Schema on Demand
  9. Where did it Fail? Be Always Available, Even w/ an Old Answer
  10. Where did it Fail? Get Me Fast and Good Enough Answer
  11. Where did it Fail? Data is Too Big, and Storage is $$$ But CPU and Network are Even More http://www.powerbyte.com/Isilon.html
  12. It is all great, but… I Need to Meet Compliance http://www.vision7.com/app_system/lib/image/content/PCI_compliance.jpg
  13. It is all great, but… I Need a Vendor
  14. It is all great, but… I Need Reporting http://www.novell.com/communities/node/5851/get-ready-sentinel-61
  15. It is all great, but… I Need Transactions http://www.novell.com/communities/node/5851/get-ready-sentinel-61
  16. It is all great, but… We Need Training for the Data Analysts
        db.article.aggregate(
          { $group : {
            _id : "$author",                          // GROUP BY author
            docsPerAuthor : { $sum : 1 },             // SUM(1) = N
            viewsPerAuthor : { $sum : "$pageViews" }  // SUM(pageViews)
          }}
        );
  17. General Architecture: Client, Server, Database, Apps
  18. The VP R&D Open Seminar: CLIENT SIDE
  19. It’s a World Made of Pixels
  20. The VP R&D Open Seminar: SERVER SIDE
  21. General Strategies: Online In Memory Databases and Q; Log files processing
  22. In Memory Databases
  23. Amazon AWS Standard Large Instance: In Memory Engine: 3000 Inserts/Sec; InnoDB Engine: 700 Inserts/Sec
  24. The VP R&D Open Seminar: General Strategies, DATA SIDE
  25. Strategy A - Sharding
  26. Strategy B – MapReduce
  27. Strategy C - NoSQL. <Key, Value> operations: insert, get, multiget, remove, truncate http://wiki.apache.org/cassandra/API
  28. The VP R&D Open Seminar: MongoDB, DOCUMENT DATABASES
  29. When Should I Choose NoSQL? • Eventually Consistent • Document Store • Key Value http://guyharrison.squarespace.com/blog/tag/nosql
  30. Same Terminology: Database → Database; Table → Collection; Row → Document
  31. Same Terminology: Database → Database; Table → Collection; Row → Document
  32. A Blog Case Study in RDBMS http://www.slideshare.net/nateabele/building-apps-with-mongodb
  33. And as a SW Engineer would like it to be… http://www.slideshare.net/nateabele/building-apps-with-mongodb
  34. Classic RDBMS Replication
  35. Auto Selection Using Quorum. Selection Methods: • Low Priority • Hidden • (Weighted) Voting
  36. MongoDB and Sharding http://www.10gen.com/products/mongodb
  37. The VP R&D Open Seminar: Cassandra, EVENTUALLY CONSISTENT
  38. Product Architecture http://horicky.blogspot.co.il/2010/10/bigtable-model-with-cassandra-and-hbase.html
  39. Key Concepts: Fast Answer → Use the memory; Not Always Right → Multiple instances; Can Lose Data → Multiple instances; Autosync → Client timestamp; Bottom Line: → Integrated Memcached + MySQL
  40. Azure Table Storage: Key Concepts: Very Large Tables → Partitioning; Get by Key → Partitioning Key; Sort → Single Sort Key; Simple Rows → Basic Types; No Joins, No Grouping, No Multiple Sorting; Bottom Line: Simple Very Large Tables → LDAP
  41. MongoDB and Sharding http://www.10gen.com/products/mongodb
  42. The VP R&D Open Seminar: Hadoop, MAP REDUCE
  43. Count Pageviews by Date. Map: The Challenge (Count on every node). Reduce: The Answers (Get a Single Answer).
  44. Word Count
        function map(String name, String document):
          // name: document name
          // document: document contents
          for each word w in document:
            emit (w, 1)

        function reduce(String word, Iterator partialCounts):
          // word: a word
          // partialCounts: list of aggregated counts
          sum = 0
          for each pc in partialCounts:
            sum += ParseInt(pc)
          emit (word, sum)
  45. Hadoop Architecture
  46. Hadoop as a Service http://www.windowsazure.com/en-us/manage/services/hdinsight/get-started-hdinsight/
  47. Excel Integration http://www.windowsazure.com/en-us/manage/services/hdinsight/get-started-hdinsight/
  48. The VP R&D Open Seminar: COLUMN ORIENTED DATABASES
  49. Column Oriented Databases: + INSERT; + GROUP BY, SUM …; + Compression; - Join; - DELETE, UPDATE
  50. Cloud Services http://www.theregister.co.uk/2012/11/28/amazon_aws_redshift_data_warehousing/
  51. Google Big Query
  52. The VP R&D Open Seminar: FEEDBACK SYSTEMS
  53. Customer Feedback (Kampyle)
  54. Heatmaps (Clicktale)
  55. User Interaction (Totango)
  56. The VP R&D Open Seminar: MongoDB, BUSINESS MONITORING
  57. Funnel Monitoring http://blog.clicktale.com/2011/01/18/new-clicktale-product-launches-for-2011/
  58. Monitoring is not your CPU utilization
  59. The VP R&D Open Seminar: SHARDING IN DEPTH
  60. Sharding Again
  61. Vertical Sharding
  62. Horizontal Sharding: Static Hashing. Simple; complex growth. Mod 10 = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
  63. Horizontal Sharding: Key locations are defined in a directory. Simple growth. Directory is SPOF. The Directory Can be Very Large.
  64. Horizontal Sharding: Static Hashing with Directory Mapping. Simple Growth. The Small Directory Can be Cached on Each App Server. Mod 1000 = 4
  65. Horizontal Sharding: Each key is signed by the DB# on creation. Simple growth. The Key Store Can be Cached on Each App Server.
  66. The Bottom Line: Grow ∞. Thank you! and Keep Performing! Moshe Kaplan
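
The "Static Hashing with Directory Mapping" idea from slide 64 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the presenter's implementation: the function names `build_directory`, `shard_for`, and `move_bucket`, the shard names, and the round-robin initial layout are all hypothetical. Only the bucket count ("Mod 1000") and the example bucket 4 come from the slide.

```python
from typing import Dict, List

NUM_BUCKETS = 1000  # from the slide: keys are hashed with "Mod 1000"


def build_directory(shards: List[str], num_buckets: int = NUM_BUCKETS) -> Dict[int, str]:
    """Assumed initial layout: spread the buckets round-robin over the shards."""
    return {bucket: shards[bucket % len(shards)] for bucket in range(num_buckets)}


def shard_for(key: int, directory: Dict[int, str], num_buckets: int = NUM_BUCKETS) -> str:
    """Static hashing picks the bucket; the small directory picks the database."""
    return directory[key % num_buckets]


def move_bucket(directory: Dict[int, str], bucket: int, new_shard: str) -> None:
    """Growth step: reassign one bucket to a new shard. Keys are never rehashed.

    Copying the bucket's rows to the new shard is out of scope for this sketch.
    """
    directory[bucket] = new_shard


if __name__ == "__main__":
    directory = build_directory(["db0", "db1"])  # hypothetical shard names
    print(shard_for(1004, directory))            # key 1004 falls in bucket 4
    move_bucket(directory, 4, "db2")             # grow by moving bucket 4
    print(shard_for(1004, directory))            # the same key now resolves to db2
```

Because a key always hashes to the same bucket and only the bucket-to-shard directory changes, growth means reassigning buckets rather than rehashing every key, and a directory of 1000 entries is small enough to cache on each app server.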

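The "Count Pageviews by Date" flow from slide 43 (count on every node, then merge into a single answer) can be sketched in plain Python without Hadoop. This is an illustrative sketch only: the names `map_node` and `reduce_counts` are hypothetical, and the log format (date as the first whitespace-separated field of each line) is an assumption, not something the slides specify.

```python
from collections import Counter
from typing import Iterable


def map_node(log_lines: Iterable[str]) -> Counter:
    """Map step: count page views per date on one node's own log lines."""
    counts: Counter = Counter()
    for line in log_lines:
        fields = line.split()
        if not fields:          # skip blank lines
            continue
        counts[fields[0]] += 1  # assumed format: "<date> <url> ..."
    return counts


def reduce_counts(partials: Iterable[Counter]) -> Counter:
    """Reduce step: merge the per-node partial counts into a single answer."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)   # Counter.update adds counts key by key
    return total


if __name__ == "__main__":
    node_a = ["2013-01-01 /home", "2013-01-01 /about", "2013-01-02 /home"]
    node_b = ["2013-01-02 /home", ""]
    result = reduce_counts([map_node(node_a), map_node(node_b)])
    print(dict(result))
```

Each node only ships its small per-date totals to the reducer, not its raw logs, which is what keeps the network cost low compared with centralizing the data first.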