1. ASIA 2013 (LCA13)
Web server and caching technologies
Ard Biesheuvel (ardb) <ard.biesheuvel@linaro.org>
2. ASIA 2013 (LCA13)
www.linaro.org
Definitions
Topic internally referred to as web frontend vertical
Web frontend → the part of a hyperscale web service
deployment that faces the external network
Excludes single server LAMP boxes
Excludes specific optimizations for distributed data storage etc
Vertical → take all layers of the software stack into account
Networking performance at the kernel level
I/O performance in reverse proxies/caches
CPU performance in dynamic content generation
3. ASIA 2013 (LCA13)
Mission
We put ourselves in the shoes of the provider of a
medium-to-large web service who is looking to replace some
parts of their deployment with ARM servers, and aim to:
determine which parts of such a stack are most suitable for
replacement with ARM server parts, and
find and ameliorate performance bottlenecks before
someone tries to do this for real.
4. ASIA 2013 (LCA13)
Mission
So what is a suitable replacement?
Cost of whole deployment should be less
Performance of whole deployment should not be less
What constitutes performance?
Throughput → number of requests handled per unit time
Latency → ability to handle an individual request within time t
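Both metrics can be derived from a log of per-request timings. The following is a minimal sketch, assuming each request is recorded as a (start, end) pair in seconds; the function names and the nearest-rank percentile are illustrative choices, not something the slides prescribe.

```python
import math


def throughput(requests):
    """Completed requests per second over the observed time window."""
    first_start = min(start for start, _ in requests)
    last_end = max(end for _, end in requests)
    return len(requests) / (last_end - first_start)


def latency_percentile(requests, pct):
    """Latency in seconds below which pct percent of requests completed
    (nearest-rank method)."""
    latencies = sorted(end - start for start, end in requests)
    rank = max(1, math.ceil(pct / 100 * len(latencies)))
    return latencies[rank - 1]


# Four requests, two at a time, observed over a two second window.
log = [(0.0, 0.5), (0.0, 1.0), (1.0, 1.5), (1.0, 2.0)]
print(throughput(log))               # requests per second
print(latency_percentile(log, 100))  # worst-case latency in seconds
```

Reporting a high percentile rather than the mean makes the "within time t" part of the latency definition directly checkable.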
5. ASIA 2013 (LCA13)
Performance and scalability
Scale up
Faster node
More requests handled per unit time
Throughput improves
Latency improves
Scale out
Multiple slower nodes in parallel
More requests handled per unit time
Throughput improves
Latency stays the same!
When trying to maintain the same level
of performance using a larger number of
less powerful nodes, latency is likely a
bigger concern than throughput.
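The asymmetry can be made concrete with an idealised model. This sketch assumes each node handles one request at a time, that requests spread perfectly across nodes, and that there is no queueing or network overhead; the service times are made-up numbers for illustration.

```python
def deployment(service_time_s, nodes):
    """Idealised model of a pool of identical, independent nodes."""
    throughput = nodes / service_time_s  # requests per second across the pool
    latency = service_time_s             # a single request still runs on one node
    return throughput, latency


# One fast node versus four nodes that are each four times slower.
print(deployment(service_time_s=0.125, nodes=1))
print(deployment(service_time_s=0.5, nodes=4))
```

Under these assumptions both deployments reach the same throughput, but every request on the scaled-out deployment takes four times as long.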
6. ASIA 2013 (LCA13)
Example: SSL handling
Facebook, Google etc now use https:// by default
The connection setup (handshake) phase of SSL consists of RSA
authentication (which involves costly signing at the server
end) and Diffie-Hellman key exchange
No parallelism here
Conventional chaining modes for symmetric payload
encryption can only execute sequentially
Web server workload is highly parallel in nature, but not
below the connection level.
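To see why a chaining mode serialises the work within one connection, consider CBC, assumed here to be the kind of conventional mode meant above. The sketch below uses a toy block function (XOR with the key, then a byte rotation) as a stand-in for a real cipher; it is not secure and only illustrates the data dependency between blocks.

```python
BLOCK = 16  # block size in bytes


def toy_block_encrypt(block, key):
    """Stand-in for a real block cipher (NOT secure): XOR with key, rotate by one byte."""
    mixed = bytes(b ^ k for b, k in zip(block, key))
    return mixed[1:] + mixed[:1]


def cbc_encrypt(plaintext, key, iv):
    """CBC chaining: each block is XORed with the previous ciphertext block
    before encryption, so block i cannot start until block i-1 is finished."""
    assert len(plaintext) % BLOCK == 0
    previous = iv
    out = []
    for i in range(0, len(plaintext), BLOCK):
        block = plaintext[i:i + BLOCK]
        chained = bytes(p ^ c for p, c in zip(block, previous))
        previous = toy_block_encrypt(chained, key)
        out.append(previous)
    return b"".join(out)


key = bytes(range(BLOCK))
iv = bytes(BLOCK)
a = cbc_encrypt(b"A" * 48, key, iv)
b = cbc_encrypt(b"B" + b"A" * 47, key, iv)
# Changing the first plaintext byte changes every ciphertext block.
print([a[i:i + BLOCK] != b[i:i + BLOCK] for i in range(0, 48, BLOCK)])
```

Because each iteration consumes the output of the previous one, extra cores cannot speed up a single stream; they only help by serving other connections.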
7. ASIA 2013 (LCA13)
Example: WordPress
Unoptimized boilerplate WordPress install on an ARM
system running Ubuntu 12.04
ApacheBench running on an identical node
1000 requests at concurrency levels 1, 2, 3, 4, 8, 16
Connection Times (ms)
              min  mean[+/-sd]  median   max
Connect:      200   202    1.8     201   231
Processing:   142   145    1.2     144   154
Waiting:      142   144    1.2     144   154
Total:        343   347    1.8     347   373
Obvious optimizations stand out
Connection caching for SSL
Opcode cache for PHP
ApacheBench is single-threaded!
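The mean values in the table already show where the time goes. The following back-of-the-envelope calculation uses only those figures and assumes they describe requests issued one at a time on a single connection; the slide does not say which concurrency level the table was taken from.

```python
connect_ms = 202  # mean "Connect" time from the table
total_ms = 347    # mean "Total" time from the table

connect_share = connect_ms / total_ms   # fraction of each request spent connecting
requests_per_second = 1000 / total_ms   # one request at a time on one connection

print(f"connect share: {connect_share:.0%}")
print(f"requests/s per connection: {requests_per_second:.2f}")
```

Roughly 58% of each request is spent in the connect phase, which is presumably why connection caching for SSL stands out, and a single sequential client sees fewer than 3 requests per second.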
8. ASIA 2013 (LCA13)
LAVA support
Currently, LAVA has no support at all for running test cases
involving more than a single node
Any web frontend test case will involve at least an origin
server and 2+ request generators
Jobs cannot be executed in parallel arbitrarily if the nodes
have shared resources (such as the Calxeda fabric)
9. ASIA 2013 (LCA13)
Power efficiency as a first-class metric
Power efficiency is what sets ARM apart from the
competition
However, latency is often proportional to raw CPU
performance
We need a benchmark that takes both into account, and that
allows us to track our progress by watching the numbers improve
over time
lets us compare ourselves with others in a meaningful way
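One possible shape for such a metric, offered purely as an illustration and not as something the slides define: requests served per joule, counted only when a latency target is met. The function name, the choice of the 95th percentile, and all parameters are hypothetical.

```python
def requests_per_joule(requests, seconds, avg_watts, p95_latency_s, latency_target_s):
    """Hypothetical score: requests per joule, zeroed if the latency target is missed."""
    if p95_latency_s > latency_target_s:
        return 0.0  # efficient but too slow does not count
    energy_joules = avg_watts * seconds
    return requests / energy_joules


# 1000 requests in 10 s at an average of 5 W, with a 0.5 s latency target.
print(requests_per_joule(1000, 10, 5, p95_latency_s=0.2, latency_target_s=0.5))
print(requests_per_joule(1000, 10, 5, p95_latency_s=0.6, latency_target_s=0.5))
```

Gating on latency keeps a low-power node from scoring well simply by answering slowly, which is the tension between the two bullet points above.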
10. ASIA 2013 (LCA13)
Approach
Benchmark 'typical' deployment running on server grade
hardware
Calxeda Highbank has plenty of nodes and fast interconnect
'Typical' means state of the art in terms of applied optimizations
Anything less means we are reinventing the wheel
Measure performance
Latency, throughput, power consumption?
Systems not under test should be contention free
I/O induced CPU contention is more costly on ARM
Look out for bottlenecks in networking stack
Focus on latency not throughput
Use faster code so CPU bound tasks complete in less time
→ Assembler dependencies (Steve McIntyre Fri 9 am)
→ Hadoop optimization (Steve Capper Tue 12 pm)
More pipelining of CPU and I/O bound tasks
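Pipelining CPU and I/O bound tasks means starting the next I/O operation while the CPU is still busy with the current item. The sketch below shows the idea with a one-worker thread pool; `fetch` and `render` are hypothetical stand-ins for an I/O bound step and a CPU bound step, and the sleep only simulates I/O wait.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(item):
    """Stand-in for an I/O bound step (e.g. reading from an origin server)."""
    time.sleep(0.01)
    return item


def render(data):
    """Stand-in for a CPU bound step (e.g. generating a page)."""
    return data * 2


def pipelined(items):
    """Fetch item i+1 in the background while item i is being rendered."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = None
        for item in items:
            upcoming = pool.submit(fetch, item)  # start the next I/O early
            if pending is not None:
                results.append(render(pending.result()))
            pending = upcoming
        if pending is not None:
            results.append(render(pending.result()))
    return results


print(pipelined([1, 2, 3]))
```

The results come out in input order, and the time spent waiting on I/O for one item overlaps with the CPU work on the previous one instead of adding to it.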
11. More about Linaro Connect: www.linaro.org/connect/
More about Linaro: www.linaro.org/about/
More about Linaro engineering: www.linaro.org/engineering/