Do Big Data and NoSQL Fit Your Needs?

Transcript

  • 1. The VP R&D Open Seminar Big Data Workshop moshe.kaplan@brightaqua.com http://blogs.microsoft.co.il/blogs/vprnd http://top-performance.blogspot.com
  • 2. Presentation Objectives http://www.webperformancetoday.com/2010/06/15/everything-you-wanted-to-know-about-web-performance/
  • 3. Why Do I Care? From 0 to 100 (US mass adoption):
    Phone: 100 yrs
    Radio: 40 yrs
    TV: 30 yrs
    Mobile: 20 yrs
    Internet: 10 yrs
    Facebook: 2 yrs
  • 4. The Internet Industry
  • 5. The Prime Suspect
  • 6. Assumptions…
  • 7. Where did it Fail? Get an Answer, Fast and Cheap
  • 8. Where did it Fail? I Just Want “Class Persistency Storage” and Changing Schema on Demand
  • 9. Where did it Fail? Be Always Available, Even w/ an Old Answer
  • 10. Where did it Fail? Get Me Fast and Good Enough Answer
  • 11. Where did it Fail? Data is Too Big, and Storage is $$$ But CPU and Network are Even More http://www.powerbyte.com/Isilon.html
  • 12. It is all great, but… I Need to Meet Compliance http://www.vision7.com/app_system/lib/image/content/PCI_compliance.jpg
  • 13. It is all great, but… I Need a Vendor
  • 14. It is all great, but… I Need Reporting http://www.novell.com/communities/node/5851/get-ready-sentinel-61
  • 15. It is all great, but… I Need Transactions http://www.novell.com/communities/node/5851/get-ready-sentinel-61
  • 16. It is all great, but… We Need Training for the Data Analysts
    db.article.aggregate([
      { $group : {
          _id : "$author",                          // GROUP BY author
          docsPerAuthor : { $sum : 1 },             // SUM(1) = N
          viewsPerAuthor : { $sum : "$pageViews" }  // SUM(pageViews)
      }}
    ]);
  • 17. General Architecture Client Server Database Apps
  • 18. CLIENT SIDE
  • 19. It’s a World Made of Pixels
  • 20. SERVER SIDE
  • 21. General Strategies Online In Memory Databases and Q Log files processing
  • 22. In Memory Databases
  • 23. Amazon AWS Standard Large Instance: InnoDB Engine 700 Inserts/Sec, In Memory Engine 3000 Inserts/Sec
  • 24. General Strategies DATA SIDE
  • 25. Strategy A - Sharding
  • 26. Strategy B – MapReduce
  • 27. Strategy C - NoSQL <Key, Value>: insert, get, multiget, remove, truncate http://wiki.apache.org/cassandra/API
  • 28. MongoDB DOCUMENT DATABASES
  • 29. When Should I Choose NoSQL? • Eventually Consistent • Document Store • Key Value http://guyharrison.squarespace.com/blog/tag/nosql
  • 30. Same Terminology: Database → Database, Table → Collection, Row → Document
  • 31. Same Terminology: Database → Database, Table → Collection, Row → Document
  • 32. A Blog Case Study in RDBMS http://www.slideshare.net/nateabele/building-apps-with-mongodb
  • 33. And as a SW Engineer would like it to be… http://www.slideshare.net/nateabele/building-apps-with-mongodb
  • 34. Classic RDBMS Replication
  • 35. Auto Selection Using Quorum Selection Methods: • Low Priority • Hidden • (Weighted) Voting
  • 36. MongoDB and Sharding http://www.10gen.com/products/mongodb
  • 37. Cassandra EVENTUALLY CONSISTENT
  • 38. Product Architecture http://horicky.blogspot.co.il/2010/10/bigtable-model-with-cassandra-and-hbase.html
  • 39. Key Concepts
    Fast Answer → Use the memory
    Not Always Right → Multiple instances
    Can Lose Data → Multiple instances
    Autosync → Client timestamp
    Bottom Line: → Integrated Memcached + MySQL
  • 40. Azure Table Storage: Key Concepts
    Very Large Tables → Partitioning
    Get by Key → Partition Key
    Sort → Single Sort Key
    Simple Rows → Basic Types
    No Joins, No Grouping, No Multiple Sorting
    Bottom Line: Simple Very Large Tables → LDAP
  • 41. MongoDB and Sharding http://www.10gen.com/products/mongodb
  • 42. Hadoop MAP REDUCE
  • 43. Count Pageviews by Date
    Map: The Challenge (Count on every node)
    Reduce: The Answers (Get a Single Answer)
  • 44. Word Count
    function map(String name, String document):
      // name: document name
      // document: document contents
      for each word w in document:
        emit (w, 1)
    function reduce(String word, Iterator partialCounts):
      // word: a word
      // partialCounts: list of aggregated counts
      sum = 0
      for each pc in partialCounts:
        sum += ParseInt(pc)
      emit (word, sum)
  • 45. Hadoop Architecture
  • 46. Hadoop as a Service http://www.windowsazure.com/en-us/manage/services/hdinsight/get-started-hdinsight/
  • 47. Excel Integration http://www.windowsazure.com/en-us/manage/services/hdinsight/get-started-hdinsight/
  • 48. COLUMN ORIENTED DATABASES
  • 49. Column Oriented Databases
    + INSERT
    + GROUP BY, SUM …
    + Compression
    - Join
    - DELETE, UPDATE
  • 50. Cloud Services http://www.theregister.co.uk/2012/11/28/amazon_aws_redshift_data_warehousing/
  • 51. Google Big Query
  • 52. FEEDBACK SYSTEMS
  • 53. Customer Feedback (Kampyle)
  • 54. Heatmaps (Clicktale)
  • 55. User Interaction (Totango)
  • 56. MongoDB BUSINESS MONITORING
  • 57. Funnel Monitoring http://blog.clicktale.com/2011/01/18/new-clicktale-product-launches-for-2011/
  • 58. Monitoring is not your CPU utilization
  • 59. SHARDING IN DEPTH
  • 60. Sharding Again
  • 61. Vertical Sharding
  • 62. Horizontal Sharding: Static Hashing
    Simple
    Complex growth
    Mod 10 = 0, Mod 10 = 1, Mod 10 = 2, Mod 10 = 3, Mod 10 = 4, Mod 10 = 5, Mod 10 = 6, Mod 10 = 7, Mod 10 = 8, Mod 10 = 9
  • 63. Horizontal Sharding: Key locations are defined in a directory
    Simple growth
    Directory is SPOF
    The Directory Can be Very Large
  • 64. Horizontal Sharding: Static Hashing with Directory Mapping
    Simple Growth
    The Small Directory Can be Cached on Each App Server
    Mod 1000 = 4
  • 65. Horizontal Sharding: Each key is signed by the DB# on creation
    Simple growth
    The Key Store Can be Cached on Each App Server
  • 66. The Bottom Line: Grow ∞ Thank you! and Keep Performing! Moshe Kaplan
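
Note on slide 27: the slide lists five <Key, Value> operations (insert, get, multiget, remove, truncate). To make their meaning concrete, here is a minimal in-memory sketch in Python. It is not the Cassandra API that the slide links to; the class name `KeyValueStore`, the string-only values, and the sample keys are assumptions made purely for illustration.

```python
from typing import Dict, Iterable, Optional


class KeyValueStore:
    """In-memory stand-in for the <Key, Value> operations named on slide 27."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def insert(self, key: str, value: str) -> None:
        # Insert or overwrite: there is no schema and no uniqueness error.
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        # Returns None when the key is missing.
        return self._data.get(key)

    def multiget(self, keys: Iterable[str]) -> Dict[str, str]:
        # Fetch several keys in one call; missing keys are simply skipped.
        return {k: self._data[k] for k in keys if k in self._data}

    def remove(self, key: str) -> None:
        # Removing a missing key is a no-op.
        self._data.pop(key, None)

    def truncate(self) -> None:
        # Drop every key at once.
        self._data.clear()


# Hypothetical sample data.
store = KeyValueStore()
store.insert("user:1", "Alice")
store.insert("user:2", "Bob")
found = store.multiget(["user:1", "user:2", "user:3"])
store.remove("user:1")
```

Every operation addresses data by key only, which is why this model has no joins, grouping, or ad hoc queries.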
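
Note on slides 43 and 44: the slides describe MapReduce as a map step that counts on every node and a reduce step that merges the partial results into a single answer. The following single-process Python sketch applies that flow to the pageviews-by-date count of slide 43. The log format (`<date> <path>` per line), the sample data, and the names `map_pageviews`, `reduce_counts`, and `run_job` are assumptions for illustration; a real Hadoop job would spread the chunks over many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_pageviews(log_lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit (date, 1) for every page view in one node's log chunk."""
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        date = line.split()[0]  # assumed line format: "<date> <path>"
        yield date, 1


def reduce_counts(date: str, partial_counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: sum all partial counts emitted for one date."""
    return date, sum(partial_counts)


def run_job(chunks: List[List[str]]) -> Dict[str, int]:
    """Map every chunk, group the emitted pairs by key, then reduce each group."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for chunk in chunks:  # each chunk stands in for one node's share of the log
        for date, count in map_pageviews(chunk):
            grouped[date].append(count)
    return dict(reduce_counts(date, counts) for date, counts in grouped.items())


# Hypothetical sample data: two "nodes", each holding part of the access log.
node_a = ["2013-01-01 /home", "2013-01-01 /about", "2013-01-02 /home"]
node_b = ["2013-01-02 /pricing", "2013-01-02 /home"]
pageviews_by_date = run_job([node_a, node_b])
print(pageviews_by_date)
```

The grouping loop in `run_job` plays the role of the shuffle phase: it is the only place where data from different nodes meets, which is what lets the map work run in parallel.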
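
Note on slides 62 and 64: slide 62 routes each key with a fixed modulo (simple, but growth is complex), while slide 64 hashes into many small buckets and keeps a small directory from bucket to shard. The Python sketch below contrasts the two. The function names, the integer user IDs, and the initial bucket layout are assumptions for illustration, not the author's implementation.

```python
from typing import Dict

NUM_SHARDS = 10      # slide 62: Mod 10 = 0 ... Mod 10 = 9
NUM_BUCKETS = 1000   # slide 64: Mod 1000 selects a bucket


def static_shard(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Slide 62: static hashing, the shard is simply key mod shard count."""
    return user_id % num_shards


def count_moved_keys(keys: range, old_shards: int, new_shards: int) -> int:
    """How many keys land on a different shard after changing the modulus."""
    return sum(
        1 for k in keys
        if static_shard(k, old_shards) != static_shard(k, new_shards)
    )


# Slide 64: a small directory maps each bucket to a shard. It has only
# NUM_BUCKETS entries, so every app server can cache it.
bucket_directory: Dict[int, int] = {b: b % NUM_SHARDS for b in range(NUM_BUCKETS)}


def directory_shard(user_id: int) -> int:
    """Hash the key to a bucket, then look the bucket up in the directory."""
    return bucket_directory[user_id % NUM_BUCKETS]


def move_bucket(bucket: int, new_shard: int) -> None:
    """Growth: repoint one bucket at another shard; other buckets stay put."""
    bucket_directory[bucket] = new_shard


# Going from 10 to 11 shards with plain static hashing relocates most keys.
moved = count_moved_keys(range(1000), 10, 11)

# With the directory, only the keys of the reassigned bucket change shards.
before = directory_shard(12004)   # bucket 4
move_bucket(4, 10)                # hypothetical new shard number 10
after = directory_shard(12004)
neighbour = directory_shard(12014)  # bucket 14, untouched
```

In this sketch the data of the moved bucket would still have to be copied to the new shard before the directory entry is switched; the directory only makes that move a per-bucket operation instead of a full reshuffle.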
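
Note on slide 65: here each key is signed with its DB# when it is created, so routing needs only a small key store from DB# to database location. One possible reading of that idea is sketched below in Python. The `<DB#>-<sequence>` key format, the connection strings, and the function names are all hypothetical and chosen only to illustrate the routing.

```python
import itertools
from typing import Dict, Iterator

# Hypothetical key store: DB# -> connection string. It has one entry per
# database, so it is small enough to cache on each app server.
KEY_STORE: Dict[int, str] = {
    1: "mysql://db1.example.internal/app",
    2: "mysql://db2.example.internal/app",
}

# One ID sequence per database (an assumption; any per-DB unique ID would do).
_sequences: Dict[int, Iterator[int]] = {db: itertools.count(1) for db in KEY_STORE}


def new_key(db_number: int) -> str:
    """Create a key that carries its DB# as a prefix, in the form "<DB#>-<seq>"."""
    if db_number not in KEY_STORE:
        raise KeyError(f"unknown DB#: {db_number}")
    return f"{db_number}-{next(_sequences[db_number])}"


def locate(key: str) -> str:
    """Read the DB# back out of the key and look up where that database lives."""
    db_number = int(key.split("-", 1)[0])
    return KEY_STORE[db_number]


def add_database(db_number: int, dsn: str) -> None:
    """Growth: register a new database; existing keys keep their old DB#."""
    KEY_STORE[db_number] = dsn
    _sequences[db_number] = itertools.count(1)


first_key = new_key(2)
add_database(3, "mysql://db3.example.internal/app")
newer_key = new_key(3)
```

Because the DB# travels inside the key, adding a database never relocates existing rows, which matches the "simple growth" claim on the slide.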