A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for High Performance Data Access

  1. A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for High Performance Data Access! Ben Stopford : RBS
  2. How fast is a HashMap lookup? ~20 ns
  3. That’s how long it takes light to travel a room
  4. How fast is a database lookup? ~20 ms
  5. That’s how long it takes light to go to Australia and back
  6. 3 times
  7. Computers really are very fast!
  8. The problem is we’re quite good at writing software that slows them down
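The ~20ns figure from slide 2 can be sanity-checked with a crude timing loop. This is a rough sketch, not a proper benchmark harness: JIT warm-up, GC pauses and timer overhead all distort nanosecond-scale measurements, so treat the printed number as an order-of-magnitude hint only.

```java
import java.util.HashMap;
import java.util.Map;

// Crude illustration of HashMap lookup cost. Not a rigorous benchmark:
// use a harness such as JMH for real measurements.
public class LookupTiming {

    static long avgLookupNanos(Map<Integer, Integer> map, int lookups) {
        int size = map.size();
        long sink = 0;                       // prevent dead-code elimination
        long start = System.nanoTime();
        for (int i = 0; i < lookups; i++) {
            sink += map.get(i % size);
        }
        long elapsed = System.nanoTime() - start;
        if (sink == Long.MIN_VALUE) System.out.println(sink);
        return elapsed / lookups;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> map = new HashMap<>();
        for (int i = 0; i < 1_000_000; i++) map.put(i, i);

        avgLookupNanos(map, 1_000_000);      // warm-up pass for the JIT
        long ns = avgLookupNanos(map, 5_000_000);
        System.out.println("~" + ns + " ns per lookup");
    }
}
```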
  9. Desktop Virtualization
  10. We love abstraction
  11. There are many reasons why abstraction is a good idea… …performance just isn’t one of them
  12. Question: is it fair to compare a Database with a HashMap?
  13. Not really…
  14. Key Point: on one end of the scale sits the database… on the other sits the HashMap… but it’s a very, very long scale that sits between them.
  15. Times are changing
  16. Database Architecture is Aging
  17. The Traditional Architecture
  18. Traditional → Shared Disk → Shared Nothing → In Memory → Distributed In Memory → Simpler Contract
  19. Simplifying the Contract
  20. How big is the internet? 5 exabytes (which is 5,000 petabytes or 5,000,000 terabytes)
  21. How big is an average enterprise database? 80% < 1TB (in 2009)
  22. Simplifying the Contract
  23. Databases have huge operational overheads. Taken from “OLTP Through the Looking Glass, and What We Found There”, Harizopoulos et al.
  24. Avoid that overhead by simplifying the contract and avoiding IO
  25. Improving Database Performance (1)! Shared Disk Architecture
  26. Improving Database Performance (2)! Shared Nothing Architecture
  27. Each machine is responsible for a subset of the records (1, 2, 3… on one node; 97, 98, 99… on another). Each record exists on only one machine.
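The routing on the slide above can be sketched in a few lines. This is an illustrative hash-partitioner, not any particular product's implementation; real shared-nothing stores typically use consistent hashing so a node can join without remapping almost every key.

```java
// Illustrative shared-nothing routing: each key hashes to exactly one node,
// so every node owns a disjoint subset of the records.
public class Partitioner {

    // Math.floorMod keeps the result in [0, nodeCount) even when
    // hashCode() is negative, unlike the % operator.
    static int nodeFor(Object key, int nodeCount) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    public static void main(String[] args) {
        int nodes = 6;
        for (int record : new int[] {1, 97, 169, 244, 333, 765}) {
            System.out.println("record " + record + " -> node " + nodeFor(record, nodes));
        }
    }
}
```

The client can compute `nodeFor` locally, so a read goes straight to the owning machine with no central coordinator in the request path.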
  28. Improving Database Performance (3)! In Memory Databases (single address-space)
  29. Databases must cache subsets of the data in memory
  30. Not knowing what you don’t know: 90% of the data in cache, the rest on disk
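The partial-cache problem above can be sketched with a tiny LRU cache built on `java.util.LinkedHashMap` (a common idiom, not the talk's actual implementation). It shows the ambiguity: a miss cannot distinguish "evicted to disk" from "never existed", so the database must touch disk either way.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A minimal LRU cache using LinkedHashMap's access-order mode. It models a
// database holding only a subset of its data in memory.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true);   // accessOrder = true gives LRU ordering
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least-recently-used entry
    }

    public static void main(String[] args) {
        LruCache<Integer, String> cache = new LruCache<>(2);
        cache.put(1, "a");
        cache.put(2, "b");
        cache.put(3, "c");                // capacity exceeded: key 1 is evicted
        System.out.println(cache.get(1)); // prints null: evicted, or never stored?
    }
}
```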
  31. If you can fit it ALL in memory you know everything!!
  32. The architecture of an in memory database
  33. Memory is at least 100x faster than disk. (Latency ladder, from picoseconds to milliseconds: L1 cache ref, L2 cache ref, main memory ref, reading 1MB from main memory, round trip across a network, reading 1MB from disk/network, cross-continental round trip.) * An L1 ref is about 2 clock cycles or 0.7ns. This is the time it takes light to travel 20cm.
  34. Memory allows random access. Disk only works well for sequential reads
  35. This makes them very fast!!
  36. The proof is in the stats: TPC-H Benchmarks on a 1TB data set
  37. So why haven’t in memory databases taken off?
  38. Address-Spaces are relatively small and of a finite, fixed size
  39. Durability
  40. One solution is distribution
  41. Distributed In Memory (Shared Nothing)
  42. Again we spread our data (1, 2, 3… on one node; 97, 98, 99… on another) but this time only using RAM.
  43. Distribution solves our two problems
  44. We get massive amounts of parallel processing
  45. But at the cost of losing the single address space
  46. Traditional → Shared Disk → Shared Nothing → In Memory → Distributed In Memory → Simpler Contract
  47. There are three key themes here. Simplify the contract: improve scalability by picking appropriate ACID properties. Distribution: gain scalability through a distributed architecture. No disk: all data is held in RAM.
  48. ODC
  49. ODC: a Distributed, Shared Nothing, In Memory, Semi-Normalised Graph DB. 450 processes, 2TB of RAM, with topic-based messaging as the system of record (persistence).
  50. ODC represents a balance between throughput and latency
  51. What is Latency?
  52. What is Throughput?
  53. Which is best for latency? Traditional In-Memory Database or Shared Nothing (Distributed) Database?
  54. Which is best for throughput? Traditional In-Memory Database or Shared Nothing (Distributed) Database?
  55. So why do we use distributed in memory? Plentiful hardware, in-memory latency, throughput.
  56. This is the technology of the now. So what is the technology of the future?
  57. Terabyte Memory Architectures
  58. Fast Persistent Storage
  59. New Innovations on the Horizon
  60. These factors are remolding the hardware landscape into one where memory is both vast and durable
  61. This is changing the way we write software
  62. Huge servers in the commodity space are driving us towards single process architectures that utilise many cores and large address spaces
  63. We can attain hundreds of thousands of executions per second from a single process if it is well optimised.
  64. “All computers wait at the same speed”
  65. We need to optimise for our CPU architecture (recall the latency ladder from slide 33: an L1 ref is about 2 clock cycles or 0.7ns, the time it takes light to travel 20cm)
  66. Tools like VTune allow us to optimise software to truly leverage our hardware
  67. So what does this all mean?
  68. Further Reading
