0
Petabytes and Nanoseconds 
Distributed Data Storage andthe CAP Theorem 
FIN talk 
Robert Greiner 
Nathan Murray 
August 21...
CHAPTER 
The Problems 
Your phone can add two numbers in the same time it takes light to travel one foot 
All high frequen...
A Common Scenario 
Web 
Application 
RDBMS 
+ =
The Solution: Scale All the Things!!1
Why shouldwe scale? 
Throughput 
Latency 
Storage 
Reliability
The Solution? 
Add a load balancer 
Add more web servers 
Tune the DB. Indexes,SPs, etc.
There’sa new bottleneck 
Generally an RDBMS can becomea bottleneck around 10K transactions per second
Next Step… Distribute Your Data 
Each web server can talk to any data storage node 
Nodes distribute queries and replicate...
Cluster = Additional Complexity
Enter the CAP Theorem! 
This guy created the CAP Theorem 
This guy’s 
VP Invented the internet
CAP Theorem: Defined 
Within a distributed system, you can only make two of the following three guarantees across a write/...
Guarantee 1: Consistency 
If a value is written, and then fetched, I will alwaysget back the new value 
Note: not the same...
Guarantee 2: Availability 
If a value is written, a success message should always be returned. If a subsequent read return...
Guarantee 3: Partition Tolerance 
The system will continue to function when network partitions occur –OOP != NP. 
_ 
Note:...
CAP Triangle 
The CAP Theorem is explained as a triangle 
C, A or P: Pick two 
This is true in practice, except…
When choosing a distributed system… 
vs.
… You Can’t Sacrifice Partition Tolerance! 
NOTDistributed 
(a.k.a. NOTPartition Tolerant) 
Available 
AND 
Consistent 
Di...
CPvs. AP 
Synchronous. 
Waits until partition heals or times out. 
Asynchronous. 
Returns a reasonable response always.
CPvs. AP 
Synchronous. 
Waits until partition heals or times out. 
Asynchronous. 
Returns a reasonable response always. 
A...
CHAPTER 
Whendo companies care?
Companies care about internetscale
Distributed Storage Past 
2004 
Google’s Map Reduce paper published 
2006 
Google’s Big Table paper published 
2007 
Amazo...
Looking forward 
•Open source implementations of more sophisticated storage systems 
•Managed services with more advanced ...
CHAPTER 
How does this affect me
Even our most “legacy” clients are already starting to care about internet scale: 
_
Scenario 
Client = Energy Retailer (Independent Sales Force) 
Sales Agent captures info about potential customer 
Price...
Current State
Solution Strategy 
Assess 
•Analyze business performance needs 
•Select non-performing work streams 
•Filter –(Could/Shoul...
Optimize Code 
Scale Up 
Scale Out 
Managed Service
Optimize CodeLevel 1 
Least organizational impact 
No architecture changes required 
Use existing development processes...
Scale UpLevel 2 
Easiest solution 
Utilize existing infrastructure 
Little/no architecture changes 
Low probability of...
Scale OutLevel 3 
Highest throughput 
Improved system up-time 
No single point of failure 
Linear performance increase...
Managed ServiceLevel 4 
Low barrier to entry 
No additional hardware investment required 
Treat as extension of existin...
Optimize Code(Level 1) 
•Least organizational impact 
•No architecture changes required 
•Use existing development process...
First Attempt
Good Enough?
Taking It to the Next Level
The Best Solution?
What Would YOUDo?
Fin’ 
robert.greiner@parivedasolutions.com 
nathan.murray@parivedasolutions.com
Petabytes and Nanoseconds
Upcoming SlideShare
Loading in...5
×

Petabytes and Nanoseconds

1,716

Published on

Today's technical landscape features workloads that can no longer be accomplished on a single server using technology from years past. As a result, we must find new ways to accommodate the increasing demands on our compute performance. Some of these new strategies introduce trade-offs and additional complexity into a system.

In this presentation, we give an overview of scaling and how to address performance concerns that business are facing, today.

Published in: Technology
0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,716
On Slideshare
0
From Embeds
0
Number of Embeds
11
Actions
Shares
0
Downloads
11
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide

Transcript of "Petabytes and Nanoseconds"

  1. 1. Petabytes and Nanoseconds Distributed Data Storage andthe CAP Theorem FIN talk Robert Greiner Nathan Murray August 21,2014
  2. 2. CHAPTER The Problems Your phone can add two numbers in the same time it takes light to travel one foot All high frequency trading servers are connected to the NASDAQ network with the same length of cable, so that no party has a speed advantage
  3. 3. A Common Scenario Web Application RDBMS + =
  4. 4. The Solution: Scale All the Things!!1
  5. 5. Why shouldwe scale? Throughput Latency Storage Reliability
  6. 6. The Solution? Add a load balancer Add more web servers Tune the DB. Indexes,SPs, etc.
  7. 7. There’sa new bottleneck Generally an RDBMS can becomea bottleneck around 10K transactions per second
  8. 8. Next Step… Distribute Your Data Each web server can talk to any data storage node Nodes distribute queries and replicate data – lots more complexity!
  9. 9. Cluster = Additional Complexity
  10. 10. Enter the CAP Theorem! This guy created the CAP Theorem This guy’s VP Invented the internet
  11. 11. CAP Theorem: Defined Within a distributed system, you can only make two of the following three guarantees across a write/read pair
  12. 12. Guarantee 1: Consistency If a value is written, and then fetched, I will alwaysget back the new value Note: not the same as the C in ACID! _
  13. 13. Guarantee 2: Availability If a value is written, a success message should always be returned. If a subsequent read returns a stale value, or something reasonable, it’s OK. _ Note: not the same as the A in HA!
  14. 14. Guarantee 3: Partition Tolerance The system will continue to function when network partitions occur –OOP != NP. _ Note: nothing to do with BAC!
  15. 15. CAP Triangle The CAP Theorem is explained as a triangle C, A or P: Pick two This is true in practice, except…
  16. 16. When choosing a distributed system… vs.
  17. 17. … You Can’t Sacrifice Partition Tolerance! NOTDistributed (a.k.a. NOTPartition Tolerant) Available AND Consistent Distributed (a.k.a. Partition Tolerant) Available OR Consistent _ _
  18. 18. CPvs. AP Synchronous. Waits until partition heals or times out. Asynchronous. Returns a reasonable response always.
  19. 19. CPvs. AP Synchronous. Waits until partition heals or times out. Asynchronous. Returns a reasonable response always. At a bank, you get a deposit receipt afterthe work is complete At a coffee shop, you get a receipt beforethe work is complete
  20. 20. CHAPTER Whendo companies care?
  21. 21. Companies care about internetscale
  22. 22. Distributed Storage Past 2004 Google’s Map Reduce paper published 2006 Google’s Big Table paper published 2007 Amazon’s Dynamo paper published 2008 Yahoo runs search on Hadoop 2008 Facebook open sources Cassandra 2008 Bitcoin paper published 2009 Yahoo open sources Hadoop 2010 Azure Table Storage released 2012 Google’s Spanner and F1 papers 2013 Amazon releases DynamoDB inside AWS 2014 Google’s Mesa paper published 2015 ????
  23. 23. Looking forward •Open source implementations of more sophisticated storage systems •Managed services with more advanced capabilities •Google Cloud versions of F1, Spanner, or Mesa? •NoSQL + SQL •Distributed data storage in untrusted environments
  24. 24. CHAPTER How does this affect me
  25. 25. Even our most “legacy” clients are already starting to care about internet scale: _
  26. 26. Scenario Client = Energy Retailer (Independent Sales Force) Sales Agent captures info about potential customer Price generated on-demand based on daily rate curve Quote no longer valid at midnight Each night, rates are updated based on new rate-curve Used to take 4hours Now takes > 24hours (Due to increased demand)
  27. 27. Current State
  28. 28. Solution Strategy Assess •Analyze business performance needs •Select non-performing work streams •Filter –(Could/Should) •Prioritize •Performance Baseline / Load Test Strategize •Identify Bottlenecks (CPU/RAM/Network) •Optimization strategy •Technology Selection Implement •POC •Load Test •Optimize •Build
  29. 29. Optimize Code Scale Up Scale Out Managed Service
  30. 30. Optimize CodeLevel 1 Least organizational impact No architecture changes required Use existing development processes Risky –Code may be fine Expensive –Dev Resources Time Consuming –Dev + Deploy
  31. 31. Scale UpLevel 2 Easiest solution Utilize existing infrastructure Little/no architecture changes Low probability of network partitions May not solve the problem long-term Hardware limitations Non-linear improvement (2x RAM != 2x Performance) C/A
  32. 32. Scale OutLevel 3 Highest throughput Improved system up-time No single point of failure Linear performance increases Use commodity hardware –Hard to scale-up CPU Increased infrastructure / system complexity Increased probability of network partitions Automation complexity A/C
  33. 33. Managed ServiceLevel 4 Low barrier to entry No additional hardware investment required Treat as extension of existing data center Appliance configuration Globally redundant (cloud) Most organizational change Less control and customization Built-in redundancy and innovation C/A A/C
  34. 34. Optimize Code(Level 1) •Least organizational impact •No architecture changes required •Use existing development processes •Risky –Code may be fine •Expensive –Dev Resources •Time Consuming –Dev + Deploy Scale Up(Level 2) •Easiest solution •Utilize existing infrastructure •Little/no architecture changes •Reduce probability of network partitions •May not solve the problem long-term •Hardware limitations •Non-linear improvement Scale Out(Level 3) •Highest throughput •Improved system up-time •No single point of failure •Linear performance inc. •Use commodity hardware •Increased infrastructure / system complexity •Increased probability of network partitions •Automation complexity Managed Service(Level 4) •Low barrier to entry •No additional hardware investment required •Treat as extension of existing data center •Appliance configuration •Globally redundant (cloud) •Most organizational change •Less control and customization •High innovation Pick One (Or More!)
  35. 35. First Attempt
  36. 36. Good Enough?
  37. 37. Taking It to the Next Level
  38. 38. The Best Solution?
  39. 39. What Would YOUDo?
  40. 40. Fin’ robert.greiner@parivedasolutions.com nathan.murray@parivedasolutions.com
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×