A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for High Performance Data Access
This lecture was presented at UCL on the Financial Computing course in October 2011.

Transcript:
  1. A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for High Performance Data Access. Ben Stopford, RBS
  2. How fast is a HashMap lookup? ~20 ns
  3. That's how long it takes light to travel across a room
  4. How fast is a database lookup? ~20 ms
  5. That's how long it takes light to go to Australia and back
  6. …three times
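As a rough sanity check on the ~20 ns figure, here is a minimal Java micro-benchmark sketch of a hot HashMap lookup. It is not from the talk, and absolute numbers depend on JVM, hardware and warm-up, but tens of nanoseconds per hot lookup is typical:

```java
import java.util.HashMap;
import java.util.Map;

// A rough micro-benchmark of a hot HashMap lookup.
public class LookupTiming {
    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        for (int i = 0; i < 1_000_000; i++) {
            map.put(i, "value-" + i);
        }

        // Warm up so the JIT has compiled the hot path before we measure.
        long sink = 0;
        for (int i = 0; i < 5_000_000; i++) {
            sink += map.get(i % 1_000_000).length();
        }

        int lookups = 10_000_000;
        long start = System.nanoTime();
        for (int i = 0; i < lookups; i++) {
            sink += map.get(i % 1_000_000).length();
        }
        long elapsed = System.nanoTime() - start;

        // The sink is printed so the JIT cannot eliminate the lookups.
        System.out.printf("~%d ns per lookup (sink=%d)%n", elapsed / lookups, sink);
    }
}
```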
  7. Computers really are very fast!
  8. The problem is we're quite good at writing software that slows them down
  9. Desktop Virtualization
  10. We love abstraction
  11. There are many reasons why abstraction is a good idea… …performance just isn't one of them
  12. Question: is it fair to compare a Database with a HashMap?
  13. Not really…
  14. Key point: on one end of the scale sits the HashMap… on the other sits the database… but it's a very, very long scale that sits between them.
  15. Times are changing
  16. Database Architecture is Aging
  17. The Traditional Architecture
  18. [Diagram: the spectrum of architectures: Traditional, Shared Disk, Shared Nothing, In Memory, Distributed In Memory, Simpler Contract]
  19. Simplifying the Contract
  20. How big is the internet? 5 exabytes (which is 5,000 petabytes, or 5,000,000 terabytes)
  21. How big is an average enterprise database? 80% were < 1TB (in 2009)
  22. Simplifying the Contract
  23. Databases have huge operational overheads (taken from "OLTP Through the Looking Glass, and What We Found There", Harizopoulos et al.)
  24. Avoid that overhead with a simpler contract and by avoiding IO
  25. Improving Database Performance (1): Shared Disk Architecture. [Diagram: shared disk]
  26. Improving Database Performance (2): Shared Nothing Architecture
  27. Each machine is responsible for a subset of the records. Each record exists on only one machine. [Diagram: key ranges 1, 2, 3… / 97, 98, 99… / 765, 769… / 169, 170… / 333, 334… / 244, 245… spread across nodes, with a client routing requests; a sketch of this routing follows]
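The partitioning idea on slide 27 can be sketched in a few lines of Java. This illustrates the general shared-nothing principle, not ODC's or any product's actual implementation; the Router class and node names are hypothetical:

```java
import java.util.List;

// A minimal sketch of shared-nothing routing: a deterministic key-to-node
// mapping means every record has exactly one owner, and any client can
// compute that owner locally, with no coordinator in the request path.
public class Router {
    private final List<String> nodes;

    public Router(List<String> nodes) {
        this.nodes = nodes;
    }

    // floorMod avoids negative buckets when hashCode() is negative.
    public String ownerOf(Object key) {
        int bucket = Math.floorMod(key.hashCode(), nodes.size());
        return nodes.get(bucket);
    }

    public static void main(String[] args) {
        Router router = new Router(List.of("node-a", "node-b", "node-c"));
        // Record 42 always routes to the same node, whichever client asks.
        System.out.println(router.ownerOf(42));
    }
}
```

A naive modulo mapping like this reshuffles most keys whenever the node count changes, which is why production stores tend to use fixed partition counts or consistent hashing instead.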
  28. Improving Database Performance (3): In-Memory Databases (single address-space)
  29. Databases must cache subsets of the data in memory. [Diagram: cache]
  30. Not knowing what you don't know. [Diagram: 90% of data in cache, the rest on disk]
  31. If you can fit it ALL in memory, you know everything!
  32. The architecture of an in-memory database
  33. Memory is at least 100x faster than disk. [Chart of access times on a scale from ms down to ps: 1MB disk/network read, cross-continental round trip, cross-network round trip, 1MB main-memory read, main-memory ref, L2 cache ref, L1 cache ref. An L1 ref is about 2 clock cycles, or 0.7 ns; the time it takes light to travel 20 cm.]
  34. Memory allows random access. Disk only works well for sequential reads. (A sketch contrasting the two patterns follows.)
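A hedged sketch of the access-pattern difference, assuming a local scratch file (the name and sizes are illustrative). On a spinning disk the random pattern is far slower; the OS page cache can mask this for a freshly written file, so treat the numbers as indicative only:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

// Reads the same file block-by-block in order, then in a random order.
public class ReadPatterns {
    public static void main(String[] args) throws IOException {
        Path path = Path.of("scratch.bin");            // illustrative file name
        Files.write(path, new byte[64 * 1024 * 1024]); // 64 MB of zeroes

        byte[] block = new byte[4096];
        try (RandomAccessFile file = new RandomAccessFile(path.toFile(), "r")) {
            int blocks = (int) (file.length() / block.length);

            long t0 = System.nanoTime();
            for (int i = 0; i < blocks; i++) {
                file.seek((long) i * block.length);    // next block each time
                file.readFully(block);
            }
            long sequentialNs = System.nanoTime() - t0;

            Random random = new Random(42);
            long t1 = System.nanoTime();
            for (int i = 0; i < blocks; i++) {
                file.seek((long) random.nextInt(blocks) * block.length); // arbitrary block
                file.readFully(block);
            }
            long randomNs = System.nanoTime() - t1;

            System.out.printf("sequential %d ms, random %d ms%n",
                    sequentialNs / 1_000_000, randomNs / 1_000_000);
        }
        Files.delete(path);
    }
}
```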
  35. This makes them very fast!
  36. The proof is in the stats: TPC-H benchmarks on a 1TB data set
  37. So why haven't in-memory databases taken off?
  38. Address spaces are relatively small and of a finite, fixed size
  39. Durability
  40. One solution is distribution
  41. Distributed In-Memory (Shared Nothing)
  42. Again we spread our data, but this time using only RAM. [Diagram: key ranges 1, 2, 3… / 97, 98, 99… / 765, 769… / 169, 170… / 333, 334… / 244, 245… held in RAM across nodes, with a client routing requests]
  43. Distribution solves our two problems
  44. We get massive amounts of parallel processing
  45. But at the cost of losing the single address space
  46. [Diagram: the spectrum of architectures again: Traditional, Shared Disk, Shared Nothing, In Memory, Distributed In Memory, Simpler Contract]
  47. There are three key themes here: Distribution (improve scalability through a distributed architecture), No Disk (all data is held in RAM), and Simplify the Contract (gain scalability by picking appropriate ACID properties)
  48. ODC
  49. ODC: a distributed, shared-nothing, in-memory, semi-normalised graph DB. 450 processes, 2TB of RAM, topic-based messaging as the system of record (persistence)
  50. ODC represents a balance between throughput and latency
  51. What is Latency?
  52. What is Throughput? (The sketch after slide 55 illustrates the distinction.)
  53. Which is best for latency? [Diagram comparing a Traditional Database, a Shared Nothing (Distributed) store and an In-Memory Database]
  54. Which is best for throughput? [Same comparison]
  55. So why do we use distributed in-memory? [Diagram: plentiful hardware, in-memory latency, and throughput]
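To make the latency/throughput distinction of slides 51 to 55 concrete, here is a small Java sketch (the names and workload are mine) that measures both for a trivial local operation. In a distributed in-memory store, every request additionally pays a network hop, which hurts latency, while aggregate throughput can grow with the number of nodes:

```java
// Latency is the time one operation takes; throughput is how many operations
// complete per unit time. The two can move independently: batching, for
// example, often raises throughput while worsening per-operation latency.
public class LatencyVsThroughput {
    public static void main(String[] args) {
        int ops = 10_000_000;
        long sink = 0;

        long start = System.nanoTime();
        for (int i = 0; i < ops; i++) {
            sink += doWork(i);
        }
        long elapsedNs = System.nanoTime() - start;

        double avgLatencyNs = (double) elapsedNs / ops; // time per operation
        double opsPerSecond = ops * 1e9 / elapsedNs;    // operations per second
        System.out.printf("avg latency %.1f ns, throughput %.0f ops/s (sink=%d)%n",
                avgLatencyNs, opsPerSecond, sink);
    }

    private static long doWork(int i) {
        return i * 31L; // stand-in for a real unit of work
    }
}
```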
  56. This is the technology of the now. So what is the technology of the future?
  57. Terabyte Memory Architectures
  58. Fast Persistent Storage
  59. New Innovations on the Horizon
  60. These factors are remolding the hardware landscape to one where memory is both vast and durable
  61. This is changing the way we write software
  62. Huge servers in the commodity space are driving us towards single-process architectures that utilise many cores and large address spaces
  63. We can attain hundreds of thousands of executions per second from a single process if it is well optimised
  64. "All computers wait at the same speed"
  65. We need to optimise for our CPU architecture. [Chart: the same access-time scale as slide 33, from ms down to ps; a cache-locality sketch follows]
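One concrete way to see why CPU architecture matters: the order in which we walk memory changes how often we hit the caches. This is a minimal sketch of the effect, not a rigorous benchmark (for that, a profiler like VTune or a harness like JMH is the right tool):

```java
// The same 16M additions run at very different speeds depending on whether
// the access pattern suits the CPU cache. Row-major traversal walks memory
// in order; column-major traversal strides across rows and misses far more.
public class CacheTraversal {
    public static void main(String[] args) {
        int n = 4096;
        int[][] grid = new int[n][n];

        long t0 = System.nanoTime();
        long rowSum = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                rowSum += grid[i][j];   // consecutive addresses: cache-friendly
            }
        }
        long rowNs = System.nanoTime() - t0;

        long t1 = System.nanoTime();
        long colSum = 0;
        for (int j = 0; j < n; j++) {
            for (int i = 0; i < n; i++) {
                colSum += grid[i][j];   // jumps a whole row each step: cache-hostile
            }
        }
        long colNs = System.nanoTime() - t1;

        System.out.printf("row-major %d ms, column-major %d ms (sums %d, %d)%n",
                rowNs / 1_000_000, colNs / 1_000_000, rowSum, colSum);
    }
}
```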
  66. Tools like VTune allow us to optimise software to truly leverage our hardware
  67. So what does this all mean?
  68. Further Reading
