Performance Optimization of Cloud-Based Applications
Dr. Peter Smith, Principal Engineer, ACL
Overview
• Why Optimize?
• … business reasons you should care…
• Technical Stuff
• … contact me later if you want more detail…
• Recommendations
• … to take back to your company tomorrow…
About ACL
• Founded in 1987 – Vancouver headquartered.
• Audit for Fraud Detection
• Data-Driven Governance, Risk Management, Compliance (GRC)
• On-Premise Software
• Windows .NET and Java
• SaaS
• Entirely AWS-based
• Ruby-on-Rails, Node.js,
Golang, Scala.
Typical SaaS Architecture
• End Users
• Edge Locations
• Load Balancer
• Application/Web Servers
• Batch Servers
• Database
Why Optimize?
Why Should You Care about Optimization?
Performance: How fast does your system react, even to one user?
Versus
Scalability: How many users, or how much data, can you handle?
(Alternatively, do you have the ability to scale?)
Why Should You Care about Optimization?
• Improve end-user experience – New Users
• Focus on features, less so on performance and scalability.
• Improve end-user experience – Existing Users
• Lots of “load”, might abandon your product if too slow.
• Reduce cloud operating costs
• Handling the same workload using less infrastructure leads to
lower cost.
Technical Stuff
(three common areas)
Problem Area 1 - Beware of Latency
Latency: The time required for data to travel from Point A to Point B
• Vancouver to Virginia: 100 ms
• Vancouver to Singapore: 300 ms
• Between availability zones: 1 ms
• From web server to database: 100 μs +
• From main memory into CPU: 100 ns
Key Problem – “Chatty Protocols” (repeat 1000x)
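To make the "repeat 1000x" point concrete, here is a minimal sketch that multiplies each latency figure from the slide by 1000 sequential messages. The function name `chatty_cost` and the dictionary layout are illustrative assumptions, and the model is simplified: every message waits for the previous one to finish.

```python
# Latency figures taken from the slide, expressed in seconds per message.
LATENCY_SECONDS = {
    "Vancouver to Virginia": 100e-3,
    "Vancouver to Singapore": 300e-3,
    "Between availability zones": 1e-3,
    "Web server to database": 100e-6,
    "Main memory into CPU": 100e-9,
}


def chatty_cost(latency_seconds: float, messages: int = 1000) -> float:
    """Total waiting time when `messages` exchanges happen one after another."""
    return latency_seconds * messages


for hop, latency in LATENCY_SECONDS.items():
    print(f"{hop}: {chatty_cost(latency):.4f} s for 1000 sequential messages")
```

Under this model, 1000 messages between Vancouver and Virginia cost about 100 seconds of pure waiting, while the same 1000 messages between a web server and its database cost about 0.1 seconds.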
Problem Area 1 - Beware of Latency
Example 1: SSL/TLS Negotiation
• TLS is a very “chatty” protocol, compared with non-SSL.
500ms versus 180ms to connect!
• New SSL connections require FIVE round trip messages.
• Solution: Use nearby CloudFront as SSL endpoint.
Example 2: N + 1 SQL Database queries
• Badly written queries => too many queries.
• With large data sets, latency adds up!
• Solution: Rewrite using a single optimized query.
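The N + 1 pattern can be sketched with Python's built-in `sqlite3` module. The `customers` and `orders` tables and both function names are hypothetical, chosen only for illustration. The first function issues one query for the list plus one query per row; the second gets the same answer from a single JOIN.

```python
import sqlite3

# Hypothetical schema and data, held in memory for the demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
""")


def totals_n_plus_one(conn):
    """One query for the customers, then one more query per customer."""
    queries = 0
    result = {}
    customers = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
    queries += 1
    for customer_id, name in customers:
        row = conn.execute(
            "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer_id = ?",
            (customer_id,),
        ).fetchone()
        queries += 1
        result[name] = row[0]
    return result, queries


def totals_single_query(conn):
    """The same totals from a single JOIN with GROUP BY."""
    rows = conn.execute("""
        SELECT c.name, COALESCE(SUM(o.total), 0)
        FROM customers c
        LEFT JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.id
    """).fetchall()
    return dict(rows), 1


slow_result, slow_queries = totals_n_plus_one(conn)
fast_result, fast_queries = totals_single_query(conn)
print(slow_queries, fast_queries)
```

With three customers the difference is four queries versus one. With thousands of rows, each extra query pays the web-server-to-database latency again.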
Problem Area 2 - Efficiency of Your Code
Modern computers can do:
• Billions of cycles/second
• Millions of RAM accesses/second
• Thousands of disk accesses/second
You will waste them, but you need to know where:
• Understand what’s being stored in RAM.
• Understand what your CPU is doing.
• Understand what your disk or database is busy loading.
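One way to see what the CPU is doing is to run a profiler. The sketch below uses Python's standard `cProfile` and `pstats` modules; the workload `slow_sum_of_squares` and the helper `profile_call` are made-up names for illustration, not tools mentioned in the talk.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n: int) -> int:
    """Deliberately naive loop that gives the profiler something to measure."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


result, report = profile_call(slow_sum_of_squares, 100_000)
print(report)
```

The report lists where the time went by function, which is usually a better starting point than guessing.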
Problem Area 2 - Efficiency of Your Code
Example: Fetch list of Facebook friends
• From SQL database: 10ms
• From cached copy in memory: 1μs
• 10,000 times faster!
Problems:
• Knowing when to “invalidate” caches is hard!
• Implementing caches is hard!
• Storage is more expensive.
Caching: Storing hard-to-compute results for later reuse
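A minimal sketch of the caching idea, including the invalidation problem, might look like the following. The class `FriendsCache`, the `load_from_db` callback, and the 60-second time-to-live are all assumptions made for illustration; a production cache would also need size limits and thread safety.

```python
import time


class FriendsCache:
    """In-memory cache in front of a slow lookup (hypothetical load_from_db)."""

    def __init__(self, load_from_db, ttl_seconds: float = 60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # user_id -> (expires_at, friends)

    def get(self, user_id):
        """Return the cached list if it is still fresh, else reload and store it."""
        entry = self._entries.get(user_id)
        now = self._clock()
        if entry is not None and entry[0] > now:
            return entry[1]
        friends = self._load(user_id)
        self._entries[user_id] = (now + self._ttl, friends)
        return friends

    def invalidate(self, user_id):
        """Drop the entry, for example after the user adds or removes a friend."""
        self._entries.pop(user_id, None)


def fake_database_lookup(user_id):
    # Stand-in for the 10 ms SQL query on the slide.
    return [f"friend-of-{user_id}-1", f"friend-of-{user_id}-2"]


cache = FriendsCache(fake_database_lookup)
print(cache.get(42))  # first call loads from the "database"
print(cache.get(42))  # second call is served from memory
```

The hard part named on the slide is deciding when to call `invalidate`. The time-to-live only bounds how stale an entry can get if an invalidation is missed.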
Problem Area 2 - Efficiency of Your Code
Language/Framework – Choose carefully, based on needs.
Example: Ruby on Rails
• Awesome for development of interactive sites.
• Easy to learn, develop, and debug.
Example: Scala with Play Framework
• Awesome for high performance and scalable
systems, including analytics.
• Harder to learn, slower to implement code.
Problem Area 3 - Architecting for “Scale Out”
Always architect your software to be scalable.
• Although not necessarily scaled.
Some common principles:
1. Think distributed instead of monolithic (scale-out not scale-up)
2. Design software components to be stateless (easier scale-out)
3. Assume multiple databases:
• You might start with one database, but eventually it’ll become a
bottleneck – plan for having many.
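Planning for many databases can be as simple as routing every lookup through one function from day one. The sketch below is one possible approach, not the speaker's design: the host names in `DATABASES` are placeholders, and hash-modulo routing is an assumption. It uses a stable hash, so every stateless server picks the same database for the same customer.

```python
import hashlib

# Placeholder host names; a single-database deployment would list one entry.
DATABASES = [
    "db-0.example.internal",
    "db-1.example.internal",
    "db-2.example.internal",
]


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a hash that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def database_for(customer_id: str) -> str:
    """Return the database host that holds this customer's data."""
    return DATABASES[shard_for(customer_id, len(DATABASES))]


print(database_for("customer-42"))
```

Python's built-in `hash()` is avoided here because its result for strings varies between processes. Note that plain modulo routing moves most keys when the number of databases changes, so a real system would need a migration plan or a different scheme.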
Recommendations
How to Prioritize?
1. A potential customer is negotiating a deal, but wants a feature
added before they’ll pay.
2. An existing customer is pushing your software to new limits
(more users, more data), but is noticing problems.
3. Your QA team stress-tests your product, making it fail.
Competing priorities: TTM (time to market), Quality, Performance, Scalability
Strategy 1 - Don’t Optimize Prematurely
• Keep performance and scalability in mind,
but don’t over-engineer your software.
• Keep ahead of the problems, but not too far ahead.
• When building complex software systems, the actual bottlenecks
might surprise you.
Strategy 2 - Define “Good Enough”
• Have an organization-wide consensus on
performance and scalability expectations.
• Don’t leave it up to personal judgment whether
something is good enough – pass/fail must be obvious.
• For example:
• 95% of requests complete within 2 seconds.
• 99.9% of requests complete within 5 seconds.
• The remaining 0.1% might take longer.
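The example targets above can be turned into an automatic pass/fail check. This is a minimal sketch: the function names and the `TARGETS` list are illustrative, and the request durations would come from whatever measurement tool is in use.

```python
# (limit in seconds, required fraction of requests) pairs from the example targets.
TARGETS = [(2.0, 0.95), (5.0, 0.999)]


def fraction_within(durations_seconds, limit_seconds):
    """Fraction of requests that completed within the limit."""
    if not durations_seconds:
        raise ValueError("no requests recorded")
    within = sum(1 for d in durations_seconds if d <= limit_seconds)
    return within / len(durations_seconds)


def meets_good_enough(durations_seconds, targets=TARGETS):
    """True only if every target is met, so pass/fail needs no judgment call."""
    return all(
        fraction_within(durations_seconds, limit) >= required
        for limit, required in targets
    )


sample = [0.5] * 960 + [3.0] * 40  # 96% under 2 s, 100% under 5 s
print(meets_good_enough(sample))
```

A check like this can run against each day's collected data or in a load-test pipeline, so the definition of "good enough" lives in one agreed place.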
Strategy 3 – Use Measurement Tools
• Don’t diagnose problems based on “personal feelings”.
• Collect performance data for all users (use New Relic / Datadog).
• Raise alerts when data goes beyond “good enough” – be proactive.
Strategy 4 – Get Upper Management on Board
• There’s nothing worse than conflicting messages:
“You need to focus on performance of the product, but that
customer feature is needed by Friday”
• Software Developers are often conflicted,
needing firm and consistent leadership.
• Don’t pull them in multiple directions!
Strategy 5 – Identify Technical Champions
It’s easy to say “write performant and scalable code”, but HOW???
• Identify knowledgeable and passionate individual
contributors.
• Make these the “go to” people for advising on
and reviewing performance-critical code.
• Pay special attention to these people when
they’re concerned about issues.
Strategy 6 - Always Test with Realistic Data
Developers often test using small amounts of data, typically on laptops.
• Won’t find performance or scalability issues!
• For example:
• Customers had 1000 network devices, but we
tested with 6. Found an O(n^3) algorithm!
Instead, periodically spin up servers for scalability testing:
• Use production-sized servers.
• Use production-sized data and workload.
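The O(n^3) anecdote is easy to quantify. The sketch below counts the basic steps a cubic algorithm performs at the tested size (6 devices) and at the customer's size (1000 devices). The function names are illustrative, and constant factors are ignored.

```python
def cubic_steps(n: int) -> int:
    """Number of basic steps an O(n^3) algorithm performs, ignoring constants."""
    return n ** 3


def growth_ratio(test_size: int, production_size: int) -> int:
    """How many times more work production-sized data needs than test data."""
    return cubic_steps(production_size) // cubic_steps(test_size)


print(cubic_steps(6))         # 216
print(cubic_steps(1000))      # 1000000000
print(growth_ratio(6, 1000))  # 4629629
```

Work that finishes instantly with 6 devices becomes roughly 4.6 million times more expensive with 1000, which is why the problem stayed invisible on a laptop.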
Summary
Take Away Message
Tomorrow, I hope you think differently about performance and
scalability.
1) Do you believe that performance and scalability are
important to think about?
2) What are your company’s quantifiable expectations
on performance and scalability?
3) Are you doing a good enough job to measure and
identify problems?
4) Are there any cultural issues in your organization
preventing you from reaching your goals?
Thanks…
peter_smith@acl.com
