Building Resource Efficient
Distributed Systems At Scale
Michael Pellon (@p3ll0n)
Operations Engineer
In the ideal world . . .
. . . we want to be here
[Figure: chart with axes labeled cost and work]
But in the “real” world . . .
. . . we usually find ourselves here
[Figure: chart with axes labeled cost and work]
Big “jumps” are possible in a relatively short timeframe!
[Figure: requests per second and joules, ~2009 - 2012 compared with ~2013 - ???]
RPS/dollar: 4.1x
RPS/joule: 4.3x
RPS/rack: 10.4x
Avoid “density without value”!
“Respect the problem.”
- Theo Schlossnagle, OmniTI
There is no free lunch.
Tradeoffs cannot be solved by marketing.
How to play with the “big boys” when
you are not as “big” as them ...
Lesson #1
Understand deeply the relationship
between latency, bandwidth and capacity
across all levels of your infrastructure.
< disk seeks = higher performance
> caching = higher performance
We end up with an ever-increasing amount
of our cheap DRAM being used to hide the
terrible latency of our cheap storage.
This growing split between the bandwidth and latency of
our storage systems only becomes apparent at large scale.
            CPU    DRAM   LAN    Disk
Bandwidth   1.50   1.27   1.39   1.28
Latency     1.17   1.07   1.12   1.09
Annual Bandwidth and Latency Improvements (Patterson, 2004)
* Extracted from leading commodity components over the last 25 years and what is reported is the multiplicative performance increase per year.
➔ CPU is the fastest to change and DRAM is the slowest.
➔ Latency is driven by physical limits whereas bandwidth can be
addressed through parallelism.
➔ Bountiful bandwidth with lagging latency!
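To make "bountiful bandwidth with lagging latency" concrete, the annual multipliers from the table can be compounded over several years. The sketch below is only an illustration: the ten-year horizon and the function names are assumptions, not something from the talk.

```python
# Annual multiplicative improvements quoted on the slide (Patterson, 2004).
BANDWIDTH = {"CPU": 1.50, "DRAM": 1.27, "LAN": 1.39, "Disk": 1.28}
LATENCY = {"CPU": 1.17, "DRAM": 1.07, "LAN": 1.12, "Disk": 1.09}


def compounded(rate: float, years: int) -> float:
    """Total improvement after compounding an annual rate for `years` years."""
    return rate ** years


def bandwidth_latency_gap(component: str, years: int) -> float:
    """How many times more bandwidth has improved than latency after `years` years."""
    return compounded(BANDWIDTH[component], years) / compounded(LATENCY[component], years)


if __name__ == "__main__":
    years = 10  # assumption: an illustrative horizon, not a figure from the talk
    for name in BANDWIDTH:
        print(
            f"{name:>4}: bandwidth x{compounded(BANDWIDTH[name], years):6.1f}, "
            f"latency x{compounded(LATENCY[name], years):4.1f}, "
            f"gap x{bandwidth_latency_gap(name, years):4.1f}"
        )
```

For every component the gap is greater than 1 and grows with the horizon, which is the widening split the slide describes.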
            CPU    DRAM   LAN    Disk
Bandwidth   1.50   1.27   1.39   1.28
Capacity    --     1.52   --     1.48
Annual Bandwidth and Capacity Improvements (Patterson, 2004)
* Extracted from leading commodity components over the last 25 years and what is reported is the multiplicative performance increase per year.
➔ Widening gap between bandwidth and capacity.
➔ Time to read a complete disk with random IO is increasing 22x /
decade or 36% / year.
➔ Now our applications cannot afford to have a cache miss!
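One way to arrive at figures close to "36% / year" and "22x / decade" is to divide the disk capacity growth (1.48) by the disk latency improvement (1.09) from the Patterson tables. This assumes the random-IO rate improves only as fast as latency does, which is my reading and is not stated on the slide.

```python
CAPACITY_GROWTH = 1.48   # disk capacity multiplier per year (from the table)
RANDOM_IO_GROWTH = 1.09  # assumption: random-IO rate tracks the disk latency improvement


def full_read_time_growth(years: int = 1) -> float:
    """Factor by which the time to read a whole disk with random IO grows over `years`."""
    return (CAPACITY_GROWTH / RANDOM_IO_GROWTH) ** years


if __name__ == "__main__":
    print(f"per year:   x{full_read_time_growth(1):.2f}")
    print(f"per decade: x{full_read_time_growth(10):.1f}")
```

Under that assumption the yearly factor is roughly 1.36 and the ten-year factor is a little over 21, in line with the rounded numbers on the slide.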
Solutions
Caching, prediction and replication.
Tape is dead.
Disk is tape.
Flash is disk.
RAM locality is king.
- Jim Gray, Microsoft (2006)
Requires very careful attention to durability.
Expend bandwidth to reduce apparent latency.
Expend capacity to reduce apparent latency.
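Why a cache miss is so expensive can be sketched with the usual average-access-time formula for a two-level hierarchy. The latencies below (about 100 ns for DRAM, about 10 ms for a random disk seek) are order-of-magnitude assumptions for illustration, not measurements from the talk.

```python
def effective_latency(hit_ratio: float, cache_s: float, backing_s: float) -> float:
    """Average access time for a cache in front of a slower backing store."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_s + (1.0 - hit_ratio) * backing_s


DRAM_S = 100e-9  # assumption: roughly 100 ns per DRAM access
DISK_S = 10e-3   # assumption: roughly 10 ms per random disk seek

if __name__ == "__main__":
    for hit_ratio in (0.90, 0.99, 0.999, 1.0):
        avg = effective_latency(hit_ratio, DRAM_S, DISK_S)
        print(
            f"hit ratio {hit_ratio:.3f}: {avg * 1e6:9.2f} us average, "
            f"{avg / DRAM_S:9.1f}x DRAM latency"
        )
```

With these assumed numbers, even a 99% hit ratio leaves the average access about a thousand times slower than DRAM, because the rare misses dominate.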
Avoid the problem entirely by using more servers with
cheaper, lower powered processors that more closely
match the capabilities of the memory subsystem.
➔ Leverages the massive volume economics of the smart device (e.g., cell phones
and tablets) market.
➔ Most workloads are not pushing CPU limits but are IO (disk, network or
memory) bound so spending more on a faster CPU will not deliver results.
➔ Price/performance in the device market is far better than for current
generation server CPUs. Because there is far less competition in server
processors, prices tend to be higher and price/performance relatively low.
➔ Server CPU = ~$300 - ~$1000
➔ ARM CPU = ~$15 / Intel Atom S1200 = ~$65
➔ ~25% the processing rate @ ~10% the cost!
➔ Volume of the device ecosystem fuels innovation so the performance gap
shrinks each generation!
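The "~25% the processing rate @ ~10% the cost" claim above can be turned into a quick price/performance calculation. This is a rough sketch that looks at CPU price only; it ignores the extra boards, DRAM and network ports that more nodes would need, and the function names are illustrative.

```python
import math


def perf_per_dollar_gain(relative_rate: float, relative_cost: float) -> float:
    """Performance per dollar of the cheap CPU relative to the server CPU."""
    return relative_rate / relative_cost


def chips_to_match(relative_rate: float) -> int:
    """Number of cheap CPUs needed to match one server CPU's processing rate."""
    return math.ceil(1.0 / relative_rate)


if __name__ == "__main__":
    rate, cost = 0.25, 0.10  # figures quoted on the slide
    n = chips_to_match(rate)
    print(f"performance per dollar: x{perf_per_dollar_gain(rate, cost):.1f}")
    print(f"{n} cheap CPUs match one server CPU at {n * cost:.0%} of its price")
```

On the slide's own numbers that is about 2.5x the processing rate per dollar, provided the workload spreads across several smaller processors.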
➔ These machines also help with one of the biggest and most certainly the
fastest growing cost of any data center -- power!
➔ Your typical 8-core server uses ~200W idle and above 600W TDP (full tilt
boogie)!
➔ Bringing 30A @ 208V to each rack gives you a 6.2 kW rack (and I know of folks
provisioning 12 - 14 kW racks just to fill them up 50%!)
➔ If you can save a lot on op-ex by spending a little more on cap-ex it’s a great
bargain! (ask your CFO!)
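The rack power arithmetic above can be checked in a few lines. This sketch simply multiplies amps by volts and ignores any derating or power-factor considerations, so treat the server counts as an upper bound under those assumptions.

```python
def rack_power_watts(amps: float, volts: float) -> float:
    """Nominal power available to a rack from a single feed."""
    return amps * volts


def servers_per_rack(rack_watts: float, server_watts: float) -> int:
    """Whole servers that fit within the rack's power budget."""
    return int(rack_watts // server_watts)


if __name__ == "__main__":
    budget = rack_power_watts(30, 208)  # feed quoted on the slide
    print(f"rack budget: {budget / 1000:.1f} kW")
    print(f"servers at 600 W (full load): {servers_per_rack(budget, 600)}")
    print(f"servers at 200 W (idle):      {servers_per_rack(budget, 200)}")
```

At the quoted 600 W per server a 30A @ 208V feed powers only about ten machines, which is why lower-powered processors change how densely a rack can be filled.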
➔ People costs dominate the enterprise players’ data centers but it is very easy
and cheap to not let them dominate your costs.
➔ The barrier to entry into automation tools (Puppet, Chef, etc) has never been
lower and their penetration into existing systems (networking devices, etc)
has never been higher.
Lesson #2
Understand that distributed systems are
fundamentally about dealing with
distance and having more than one thing.
Currently, writing distributed applications is usually clearly
distinguishable from writing non-distributed applications.
Despite the non-zero probability of failure within
nearly every aspect of modern computers,
developers of non-distributed applications do not
routinely maintain a concept of failing hardware.
[Figure: diagram labeled complexity, instructions, behaviors, programming language, hardware limitations]
The difference between an entire data center and a single
computer should only be quantitative not qualitative.
Since software development is an entirely
quantitative pursuit we should be able to conceal the
entire complexity of the Internet within software.
A clear trajectory in the same direction …
➔ Erlang OTP (Ericsson) and GoCircuit (Tumblr).
➔ General-purpose distributed file systems (and protocols)
spanning multiple globally distributed data centers.
➔ Datacenter-scale job schedulers also abound (Google’s
Borg/Omega, Apache Mesos, Airbnb’s Chronos, etc.)
➔ nanomsg scalability protocols (M. Sustrik).
➔ Not only possible but the clear “silent” choice of the
majority!
So how to play “big” when you’re “small”?
➔ You need to understand your technical substrate both
broadly and deeply so you know where to focus all your
resources most effectively.
➔ That understanding will allow you to operate at
economies of scale that free up your most important
resource -- people.
➔ But remember that the focus of our resources is not
necessarily where your resources should be focused, nor is
anyone else’s.
So how to play “big” when you’re “small”?
➔ Look for areas where a qualitative difference could easily
become merely a quantitative difference.
➔ Quantitative problems are easy to solve through
technology; qualitative problems, however, are largely
intractable through technology alone.

Acug datafiniti pellon_sept2013

  • 1. Building Resource Efficient Distributed Systems At Scale Michael Pellon (@p3ll0n) Operations Engineer
  • 2. In the ideal world . . . . . . we want to be here cost work
  • 3. But in the “real” world . . . . . . we usually find ourselves here cost work
  • 4. Big “jumps” are possible in a relatively short timeframe! requestspersecond ~ 2009 - 2012 joules ~ 2013 - ??? RPS/dollar: 4.1x RPS/joule: 4.3x RPS/rack: 10.4x
  • 6. “Respect the problem.” - Theo Schlossnagle, OmniTI
  • 7. There is no free lunch.
  • 8. Tradeoffs cannot be solved by marketing.
  • 9. How to play with the “big boys” when you are not as “big” as them ...
  • 10. Lesson #1 Understand deeply the relationship between latency, bandwidth and capacity across all levels of your infrastructure.
  • 11. < disk seeks = higher performance
  • 12. > caching = higher performance
  • 13. We end up with an ever increasing amount of our cheap DRAM is used to hide the terrible latency of our cheap storage.
  • 14. This growing split between the bandwidth and latency of our storage systems only becomes apparent at large scale.
  • 15. CPU DRAM LAN Disk Bandwidth 1.50 1.27 1.39 1.28 Latency 1.17 1.07 1.12 1.09 Annual Bandwidth and Latency Improvements (Patterson, 2004) * Extracted from leading commodity components over the last 25 years and what is reported is the multiplicative performance increase per year. ➔ CPU fastest to change and DRAM is the slowest.
  • 16. CPU DRAM LAN Disk Bandwidth 1.50 1.27 1.39 1.28 Latency 1.17 1.07 1.12 1.09 Annual Bandwidth and Latency Improvements (Patterson, 2004) * Extracted from leading commodity components over the last 25 years and what is reported is the multiplicative performance increase per year. ➔ CPU fastest to change and DRAM is the slowest. ➔ Latency is driven by physical limits whereas bandwidth can be addressed through parallelism.
  • 17. CPU DRAM LAN Disk Bandwidth 1.50 1.27 1.39 1.28 Latency 1.17 1.07 1.12 1.09 Annual Bandwidth and Latency Improvements (Patterson, 2004) * Extracted from leading commodity components over the last 25 years and what is reported is the multiplicative performance increase per year. ➔ CPU fastest to change and DRAM is the slowest. ➔ Latency is driven by physical limits whereas bandwidth can be addressed through parallelism. ➔ Bountiful bandwidth with lagging latency!
  • 20. Annual Bandwidth and Capacity Improvements (Patterson, 2004)
                   CPU    DRAM   LAN    Disk
        Bandwidth  1.50   1.27   1.39   1.28
        Capacity   --     1.52   --     1.48
      * Extracted from leading commodity components over the last 25 years; the figures reported are the multiplicative performance increase per year.
      ➔ Widening gap between bandwidth and capacity.
      ➔ Time to read a complete disk with random IO is increasing 22x / decade or 36% / year.
      ➔ Now our applications cannot afford to have a cache miss!
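One plausible way to arrive at a figure like "36% / year" is to divide the annual growth in disk capacity by the annual improvement in how fast the disk can serve IO. The sketch below does that under a stated assumption: random IO rate is taken to improve at the disk latency rate (1.09) from the Patterson latency table. The slides do not spell out their derivation, so treat this as an approximation that lands in the same neighborhood rather than as the author's method.

```python
CAPACITY_GAIN = 1.48    # disk capacity growth per year (from the table)
BANDWIDTH_GAIN = 1.28   # disk bandwidth growth per year (from the table)
# ASSUMPTION: random IO throughput improves at the disk latency rate.
RANDOM_IO_GAIN = 1.09


def scan_time_growth(io_gain: float, years: int = 1) -> float:
    """Factor by which the time to read a whole disk grows over `years`."""
    return (CAPACITY_GAIN / io_gain) ** years


if __name__ == "__main__":
    print(f"random IO, per year:   x{scan_time_growth(RANDOM_IO_GAIN):.2f}")
    print(f"random IO, per decade: x{scan_time_growth(RANDOM_IO_GAIN, 10):.1f}")
    print(f"sequential, per year:  x{scan_time_growth(BANDWIDTH_GAIN):.2f}")
```

Even a purely sequential scan gets slower every year under these rates, because capacity (1.48) outgrows bandwidth (1.28); random access simply makes the gap far worse.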
  • 23. Tape is dead. Disk is tape. Flash is disk. RAM locality is king. - Jim Gray, Microsoft (2006)
  • 24. Requires very careful attention to durability.
  • 26. Expend bandwidth to reduce apparent latency.
  • 28. Expend capacity to reduce apparent latency.
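The two ideas above can be illustrated together with a small block cache. This is one interpretation, not necessarily what the original slides pictured: read-ahead spends bandwidth (each miss fetches extra neighbouring blocks in the same round trip) and the cache spends capacity (memory holds blocks that may be needed again). `ReadAheadCache` and the `fetch_range(start, count)` callback are hypothetical names standing in for a slow backing store.

```python
from collections import OrderedDict
from typing import Callable, List


class ReadAheadCache:
    """LRU block cache that fetches extra neighbouring blocks on every miss."""

    def __init__(self, fetch_range: Callable[[int, int], List[bytes]],
                 capacity: int = 1024, read_ahead: int = 8) -> None:
        if capacity < 1 or read_ahead < 0:
            raise ValueError("capacity must be >= 1 and read_ahead >= 0")
        self._fetch_range = fetch_range  # hypothetical slow backing store
        self._capacity = capacity        # capacity spent to hide latency
        self._read_ahead = read_ahead    # bandwidth spent to hide latency
        self._cache: "OrderedDict[int, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _store(self, block: int, data: bytes) -> None:
        self._cache[block] = data
        self._cache.move_to_end(block)
        while len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict least recently used

    def read(self, block: int) -> bytes:
        if block in self._cache:
            self.hits += 1
            self._cache.move_to_end(block)
            return self._cache[block]
        self.misses += 1
        # One slow round trip returns the requested block plus its neighbours.
        blocks = self._fetch_range(block, self._read_ahead + 1)
        for offset, data in enumerate(blocks):
            self._store(block + offset, data)
        return blocks[0]
```

With `read_ahead=7`, a sequential read of 16 blocks costs two slow round trips instead of 16; the price is transferring blocks that may never be used and the memory to hold them.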
  • 29. Avoid the problem entirely by using more servers with cheaper, lower powered processors that more closely match the capabilities of the memory subsystem.
  • 36. ➔ Leverages the massive volume economics of the smart device (e.g., cell phones and tablets) market.
      ➔ Most workloads are not pushing CPU limits but are IO (disk, network or memory) bound, so spending more on a faster CPU will not deliver results.
      ➔ Price/performance in the device market is far better than for current-generation server CPUs. Because there is far less competition in server processors, prices tend to be higher and price/performance relatively low.
      ➔ Server CPU = ~$300 - ~$1000
      ➔ ARM CPU = ~$15 / Intel Atom S1200 = ~$65
      ➔ ~25% the processing rate @ ~10% the cost!
      ➔ Volume of the device ecosystem fuels innovation, so the performance gap shrinks each generation!
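The "~25% the processing rate @ ~10% the cost" claim can be turned into a price/performance ratio. The sketch below only uses those two relative figures from the slide; it deliberately ignores everything else a server needs (boards, memory, networking, rack space), which is a simplifying assumption, and the function names are illustrative.

```python
import math


def perf_per_dollar_advantage(relative_perf: float, relative_cost: float) -> float:
    """Price/performance of the small CPU relative to the server CPU."""
    return relative_perf / relative_cost


def cost_to_match(relative_perf: float, relative_cost: float) -> float:
    """Cost, as a fraction of one server CPU, of enough small CPUs to
    match its processing rate (CPU cost only, perfect scaling assumed)."""
    chips_needed = math.ceil(1.0 / relative_perf)
    return chips_needed * relative_cost


if __name__ == "__main__":
    print(perf_per_dollar_advantage(0.25, 0.10))  # work per dollar, relative
    print(cost_to_match(0.25, 0.10))              # fraction of server CPU cost
```

On CPU cost alone, four small processors match one server processor for about 40% of the price, provided the workload parallelizes across them.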
  • 40. ➔ These machines also help with one of the biggest, and most certainly the fastest growing, costs of any data center -- power!
      ➔ Your typical 8-core server uses ~200W idle and above 600W TDP (full tilt boogie)!
      ➔ Bringing 30A @ 208V to each rack gives you a 6.2 kW rack (and I know of folks provisioning 12 - 14 kW racks just to fill them up 50%!)
      ➔ If you can save a lot on op-ex by spending a little more on cap-ex, it’s a great bargain! (ask your CFO!)
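The rack power figure is simple arithmetic, sketched below with the numbers from the slide. The calculation treats the circuit as fully usable and counts only servers; real deployments typically hold back headroom and also power switches and other gear, so read the server counts as upper bounds under that assumption.

```python
def circuit_watts(amps: float, volts: float) -> float:
    """Power available from a circuit, ignoring any derating."""
    return amps * volts


def servers_per_rack(rack_watts: float, watts_per_server: float) -> int:
    """Whole servers that fit within the rack's power budget."""
    return int(rack_watts // watts_per_server)


if __name__ == "__main__":
    rack = circuit_watts(30, 208)
    print(f"rack budget: {rack / 1000:.1f} kW")
    print("servers at 600 W (full load):", servers_per_rack(rack, 600))
    print("servers at 200 W (idle):", servers_per_rack(rack, 200))
```

Power, not physical space, is what caps a rack of 600 W servers at about ten machines on this circuit, which is why lower-powered processors raise the useful density.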
  • 42. ➔ People costs dominate the enterprise players’ data centers, but it is very easy and cheap to not let them dominate your costs.
      ➔ The barrier to entry into automation tools (Puppet, Chef, etc.) has never been lower and their penetration into existing systems (networking devices, etc.) has never been higher.
  • 43. Lesson #2 Understand that distributed systems are fundamentally about dealing with distance and having more than one thing.
  • 44. Currently, writing a distributed application is usually a distinctly different exercise from writing a non-distributed one.
  • 45. Despite the non-zero probability of failure within nearly every aspect of modern computers, developers of non-distributed applications do not routinely maintain a concept of failing hardware.
  • 49. The difference between an entire data center and a single computer should only be quantitative, not qualitative.
  • 50. Since software development is an entirely quantitative pursuit, we should be able to conceal the entire complexity of the Internet within software.
  • 55. A clear trajectory in the same direction …
      ➔ Erlang OTP (Ericsson) and GoCircuit (Tumblr).
      ➔ General-purpose distributed file systems (and protocols) spanning multiple globally distributed data centers.
      ➔ Datacenter-scale job schedulers also abound (Google’s Borg/Omega, Apache Mesos, Airbnb’s Chronos, etc.)
      ➔ nanomsg scalability protocols (M. Sustrik).
      ➔ Not only possible but the clear “silent” choice of the majority!
  • 58. So how to play “big” when you’re “small”?
      ➔ You need to understand your technical substrate both broadly and deeply so you know where to focus all your resources most effectively.
      ➔ That understanding will allow you to operate at economies of scale that free up your most important resource -- people.
      ➔ But remember that the focus of our resources is not necessarily where your resources should be focused, nor is anyone else’s.
  • 60. So how to play “big” when you’re “small”?
      ➔ Look for areas where a qualitative difference could easily become merely a quantitative difference.
      ➔ Quantitative problems are easy to solve through technology; qualitative problems, however, are largely intractable through technology alone.