LCA13: Web Server and Caching Technologies
Resource: LCA13
Name: Web Server and Caching Technologies
Date: 05-03-2013
Speaker: Ard Biesheuvel

Presentation Transcript

  • ASIA 2013 (LCA13) Web server and caching technologies Ard Biesheuvel (ardb) <ard.biesheuvel@linaro.org>
  • Definitions: The topic is internally referred to as the web frontend vertical. Web frontend → the part of a hyperscale web service deployment that faces the external network; this excludes single-server LAMP boxes and specific optimizations for distributed data storage. Vertical → take all layers of the software stack into account: networking performance at the kernel level, I/O performance in reverse proxies/caches, and CPU performance in dynamic content generation.
  • Mission: We put ourselves in the shoes of the provider of a medium-to-large web service who is looking to replace some parts of their deployment with ARM servers, and aim to: determine which parts of such a stack are most suitable for replacement with ARM server parts, and find and ameliorate performance bottlenecks before someone tries to do this for real.
  • Mission: So what is a suitable replacement? The cost of the whole deployment should be less, and the performance of the whole deployment should not be less. What constitutes performance? Throughput → the ability to handle a number of requests per unit time. Latency → the ability to handle an individual request within time t.
  • Performance and scalability: Scale up → a faster node; more requests handled per unit time; throughput improves and latency improves. Scale out → multiple slower nodes in parallel; more requests handled per unit time; throughput improves, but latency stays the same! When trying to maintain the same level of performance using a larger number of less powerful nodes, latency is likely a bigger concern than throughput.
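The scale-up versus scale-out distinction above can be made concrete with some arithmetic. A minimal sketch with made-up numbers (the service times below are illustrative assumptions, not measurements from the talk):

```python
# Toy comparison: one fast node vs. four nodes that are each 4x slower.
# Numbers are illustrative assumptions, not measurements.

fast_service_time = 0.025   # seconds per request on the fast node
slow_service_time = 0.100   # seconds per request on each slow node

# Scale up: a single fast node.
throughput_up = 1 / fast_service_time        # requests per second
latency_up = fast_service_time               # seconds per request

# Scale out: four slow nodes in parallel.
throughput_out = 4 * (1 / slow_service_time)
latency_out = slow_service_time              # each request still runs on one node

# Aggregate throughput matches, but every individual request takes 4x longer.
```

This is why the slide flags latency, not throughput, as the bigger concern when replacing a few fast nodes with many slower ones.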
  • Example: SSL handling: Facebook, Google, etc. now use https:// by default. The CONNECT phase of SSL consists of RSA authentication (which involves costly signing at the server end) and Diffie-Hellman key exchange; there is no parallelism here. Conventional chaining modes for symmetric payload encryption can only execute sequentially. The web server workload is highly parallel in nature, but not below the connection level.
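The point about chaining modes can be sketched with a toy example: in CBC mode each ciphertext block feeds into the next, so encryption is inherently sequential, while CTR mode encrypts independent counter values. This is an illustration only; `toy_cipher` is a stand-in for a real block cipher, not a secure primitive:

```python
# CBC chains each block on the previous ciphertext, so encryption cannot
# be parallelized; CTR encrypts independent counter values, so each block
# can be computed in parallel. toy_cipher is a placeholder, NOT secure.

def toy_cipher(block, key):
    return (block + key) % 256          # stand-in for a real block cipher

def cbc_encrypt(blocks, key, iv):
    out, prev = [], iv
    for b in blocks:
        c = toy_cipher(b ^ prev, key)   # depends on the previous ciphertext
        out.append(c)
        prev = c
    return out

def ctr_encrypt(blocks, key, nonce):
    # each block depends only on its own index -> embarrassingly parallel
    return [b ^ toy_cipher((nonce + i) % 256, key)
            for i, b in enumerate(blocks)]
```

The data dependency in `cbc_encrypt` (each iteration reads `prev`) is exactly what prevents spreading one connection's payload encryption across cores.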
  • Example: Wordpress: An unoptimized boilerplate Wordpress install on an ARM system running Ubuntu 12.04, with ApacheBench running on an identical node; 1000 requests at concurrency levels 1, 2, 3, 4, 8, 16.

    Connection Times (ms)
                  min  mean[+/-sd]  median   max
    Connect:      200   202    1.8     201   231
    Processing:   142   145    1.2     144   154
    Waiting:      142   144    1.2     144   154
    Total:        343   347    1.8     347   373

    Obvious optimizations stand out: connection caching for SSL, and an opcode cache for PHP. Note that ApacheBench is single-threaded!
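Since ApacheBench drives all connections from a single thread, the load generator itself can become the bottleneck at higher concurrency. A minimal multi-threaded alternative might look like the sketch below (function names and the returned statistics are our own; the embedded local server exists only to keep the example self-contained):

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from http.server import HTTPServer, SimpleHTTPRequestHandler
from urllib.request import urlopen

# Sketch of a multi-threaded load generator, unlike ab, which drives
# all connections from one thread.

def start_test_server():
    # Serve the current directory on an ephemeral port, in a daemon thread.
    srv = HTTPServer(("127.0.0.1", 0), SimpleHTTPRequestHandler)
    threading.Thread(target=srv.serve_forever, daemon=True).start()
    return srv

def bench(url, requests=100, concurrency=4):
    latencies = []                      # list.append is atomic in CPython

    def one_request():
        t0 = time.perf_counter()
        urlopen(url).read()
        latencies.append(time.perf_counter() - t0)

    with ThreadPoolExecutor(max_workers=concurrency) as ex:
        for _ in range(requests):
            ex.submit(one_request)
    # pool shutdown waits for all requests to complete
    return min(latencies), sum(latencies) / len(latencies), max(latencies)
```

A real harness would also record percentiles and error counts, but even this shape avoids the single-client-thread ceiling the slide warns about.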
  • LAVA support: Currently, LAVA has no support at all for running test cases involving more than a single node. Any web frontend test case will involve at least an origin server and 2+ request generators. Jobs cannot be executed in parallel arbitrarily if the nodes have shared resources (such as the Calxeda fabric).
  • Power efficiency as a first-class metric: Power efficiency is what sets ARM apart from the competition; however, latency is often proportional to raw CPU performance. We need a benchmark that takes both into account, allows us to track our progress by watching the numbers improve over time, and lets us compare ourselves with others in a meaningful way.
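The slide calls for a benchmark that combines performance and power but does not prescribe a formula. One plausible way to fold them together (an assumption on our part) is to normalize completed work by energy consumed, e.g. requests per joule:

```python
# Hypothetical combined metric: requests completed per joule consumed.
# This rewards both raw throughput and low power draw. The talk asks for
# such a metric but does not define one, so this is only a sketch.

def requests_per_joule(requests_completed, avg_power_watts, duration_s):
    energy_joules = avg_power_watts * duration_s
    return requests_completed / energy_joules

# Example with made-up numbers: a node with half the throughput can still
# win decisively on this metric if it draws far less power.
big_node   = requests_per_joule(40000, avg_power_watts=200, duration_s=60)
small_node = requests_per_joule(20000, avg_power_watts=30, duration_s=60)
```

Such a metric alone would hide latency regressions, so it would need to be reported alongside a latency bound rather than replace it.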
  • Approach: Benchmark a 'typical' deployment running on server-grade hardware; Calxeda Highbank has plenty of nodes and a fast interconnect. 'Typical' means state of the art in terms of applied optimizations; anything less means we are reinventing the wheel. Measure performance: latency, throughput, power consumption? Systems not under test should be contention-free, as I/O-induced CPU contention is more costly on ARM. Look out for bottlenecks in the networking stack, focusing on latency, not throughput. Use faster code so CPU-bound tasks complete in less time → assembler dependencies (Steve McIntyre, Fri 9 am) → Hadoop optimization (Steve Capper, Tue 12 pm). More pipelining of CPU-bound and I/O-bound tasks.
  • More about Linaro Connect: www.linaro.org/connect/ More about Linaro: www.linaro.org/about/ More about Linaro engineering: www.linaro.org/engineering/