Distro Recipes 2013 : My ${favorite_linux_distro} is slow!

https://distro-recipes.org

Published in: Technology
Transcript of "Distro Recipes 2013 : My ${favorite_linux_distro} is slow!"

  1. My ${favorite} Linux Distribution is slow!
     Credit: fras1977@flickr
     Distro Recipes, 4th April 2013 @ Paris
  2. Performance does matter
     ● Users expect more performance
     ● They do have perfect hardware
     ● They installed the latest OS release
     ● So it should be faster than ever! Shouldn't it?
     ● But we still get those imprecise reports...
       « Hey! My Linux distro is slow! »
       « The latest OS reduces the performance! »
  3. About this talk
     ● What to expect?
       – Tricks to prove the distro is not always the bad guy
       – A compilation of real debugging sessions
     ● What not to expect?
       – One magic answer about performance
     ● Who are you?
  4. Tracking the beast
     ● Slowdowns come from various sources
       – CPU
       – Storage
       – Interrupts
       – Memory
       – Network (not included in this presentation)
       – Applications (not included)
  5. CPU load
     ● Estimating the load of the CPU is pretty easy
     ● Use « top », sorted by CPU usage
       – Don't mix it up with loadavg!
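For a scripted version of what « top » shows, CPU usage can be derived from two samples of the aggregate `cpu` line in /proc/stat. The sketch below is one way to do that; the function names are illustrative, and counting iowait as idle time is an assumption of this sketch.

```python
def parse_cpu_line(line):
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = [int(v) for v in line.split()[1:]]
    # Field order: user nice system idle iowait irq softirq steal [guest guest_nice]
    idle = fields[3] + (fields[4] if len(fields) > 4 else 0)
    total = sum(fields[:8])  # guest time is already included in user/nice
    return total - idle, total


def cpu_busy_percent(before, after):
    """CPU busy percentage between two samples of the 'cpu' line."""
    busy0, total0 = parse_cpu_line(before)
    busy1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    return 100.0 * (busy1 - busy0) / elapsed if elapsed else 0.0


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (the aggregate over all CPUs)."""
    with open(path) as f:
        return f.readline()
```

Typical use: call `read_cpu_line()`, sleep one second, call it again, and pass both lines to `cpu_busy_percent()`. Unlike loadavg, the result is a plain percentage of CPU time.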
  6. Weird CPU issues
     ● Temperature
       – Internal throttling to avoid overheating
       – ~110/120 °C on Intel CPUs
       – Monitoring via coretemp & acpi
         « CPU1: Core temperature above threshold, cpu clock throttled (total events = 12841) »
       – Generates Machine Check Exceptions (MCE)
       – As a result, CPU performance is reduced
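Throttling messages like the one quoted above can be counted from kernel log text. This is a minimal sketch: the regular expression is built from that single quoted message, so matching any scope word other than "Core" is an assumption, and `throttle_events` is an illustrative name.

```python
import re

# Pattern derived from the message quoted on the slide; the scope word
# ("Core" in that message) is matched loosely as an assumption.
THROTTLE_RE = re.compile(
    r"CPU(?P<cpu>\d+): \w+ temperature above threshold, "
    r"cpu clock throttled \(total events = (?P<events>\d+)\)"
)


def throttle_events(log_text):
    """Map CPU number -> highest 'total events' counter seen in the log text."""
    counts = {}
    for match in THROTTLE_RE.finditer(log_text):
        cpu = int(match.group("cpu"))
        counts[cpu] = max(counts.get(cpu, 0), int(match.group("events")))
    return counts
```

Feeding it the output of `dmesg` gives a quick per-CPU view of how often throttling kicked in.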
  7. Storage Load
     ● Massive IOs can seriously slow down a system
       – Depending on the storage device (HDD vs SSD)
       – Depending on the IO profile (sequential vs random)
       – « vmstat » is useful to track this behavior
     ● vmstat columns: bi = blocks in, bo = blocks out, wa = waiting for IO, si = swap in, so = swap out
     ● Someone reads a lot!
  8. Storage Load
     ● vmstat columns: bi = blocks in, bo = blocks out, wa = waiting for IO, si = swap in, so = swap out
     ● Someone is trying to read a lot! (3 threads doing 4K random reads)
     ● The CPU waits for the storage device (~30% wa)
     ● HDD + 3 threads @ 4K random generates a massive device load
     ● During this load, my system was unusable
     ● A desktop search, rsync, tar, ... can generate such a load
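The vmstat columns described above can also be watched from a script. The sketch below parses captured « vmstat » output by column name and keeps the samples where `wa` is high; the 20% threshold and the function names are assumptions chosen for illustration.

```python
def parse_vmstat(text):
    """Parse `vmstat <interval>` output into a list of {column: int} dicts."""
    rows, header = [], None
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith("procs"):
            continue  # blank line or the group banner above the column names
        if parts[0] == "r":
            header = parts  # column names: r b swpd free ... wa st
        elif header and len(parts) == len(header):
            rows.append(dict(zip(header, map(int, parts))))
    return rows


def io_bound_samples(rows, wa_threshold=20):
    """Keep the samples where the CPU spent at least wa_threshold % waiting for IO."""
    return [row for row in rows if row["wa"] >= wa_threshold]
```

Parsing by header name rather than by position keeps the sketch working when a vmstat version adds or reorders columns.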
  9. Storage Load
     ● A broken/slow storage device can load the system
     ● HDD: bad-sector reallocation is invisible but causes lag
       ● SATA disks try several times to recover sectors
       ● No other IOs are accepted during this process
       ● Kills RAID arrays
       ● Enterprise-class SATA disks reallocate immediately
       ● Use SMART to count {broken|pending|reallocated} sectors
       ● %wa in top or vmstat should be high in such a case
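Counting those sectors can be automated by parsing the SMART attribute table. The sketch below assumes the usual ATA attribute names printed by smartmontools (Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable); some vendors name or report them differently, so treat the list as a starting point.

```python
# Attribute names as commonly printed by smartctl for ATA disks (assumption:
# the disk under test uses these names).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def sector_counters(smartctl_output):
    """Extract raw values of the sector-health attributes from smartctl output."""
    found = {}
    for line in smartctl_output.splitlines():
        parts = line.split()
        # Rows: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(parts) >= 10 and parts[1] in WATCHED and parts[9].isdigit():
            found[parts[1]] = int(parts[9])
    return found
```

The input can come from `smartctl -A /dev/sda` (root is usually required). Any non-zero counter that keeps growing is a good reason to suspect the disk before the distro.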
  10. Storage Load
     ● Output of « smartctl -a /dev/sda » on a dying HDD
  11. Storage Load
     ● SSDs: far from a perfect device
       ● Performance varies between firmware implementations
       ● SLC front cache before reaching the MLC storage
         – Out-of-cache effect
         – 200+ MB/s on SLC
         – 5 MB/s on MLC in the worst case
         – After a while, global SSD performance is limited to 5 MB/s
         – Behavior not visible for {simple|short} workloads
         – %wa in top or vmstat should increase in such a case
         – Can be reproduced with fio: http://git.kernel.dk/?p=fio.git
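The out-of-cache effect is easy to underestimate, so here is a small back-of-the-envelope model using the 200 MB/s and 5 MB/s figures above. The cache size is a purely hypothetical parameter (the slides do not give one), and the model ignores the cache being drained in the background.

```python
def sustained_write_seconds(total_mb, cache_mb, fast_mb_s=200.0, slow_mb_s=5.0):
    """Seconds needed to write total_mb when only cache_mb fits in the fast cache."""
    in_cache = min(total_mb, cache_mb)
    beyond = total_mb - in_cache  # data written at the slow, out-of-cache rate
    return in_cache / fast_mb_s + beyond / slow_mb_s


def average_throughput(total_mb, cache_mb, **rates):
    """Average MB/s over the whole write."""
    return total_mb / sustained_write_seconds(total_mb, cache_mb, **rates)
```

With a hypothetical 1000 MB cache, a 500 MB write averages 200 MB/s, while a 2000 MB write takes 205 seconds and averages under 10 MB/s. That is why short benchmarks never show the problem.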
  12. SSD IO Path
     [Diagram; labels recovered from the slide: SATA 6Gb/sec, IOs, Controller, SLC Cache, MLC Cells, 960 Mb/sec, 40 Mb/sec]
  13. Weird Storage Issues
     ● Temperature
       – On HDDs, thermal recalibration can occur too often for the disk to maintain a given level of service
       – Media-class disks are less subject to this effect
     ● Vibrations
       – RAID arrays contain several HDDs spinning constantly
       – All these individual vibrations prevent the heads from being properly aligned, leading to head recalibrations
       – That can totally prevent a RAID array from delivering IOs
  14. IRQ Storms
     ● In an array of 1200+ identical computers
     ● Some boot very, very slowly and trigger software watchdogs
     ● /proc/interrupts reports an IRQ storm (66000 per sec) on interrupt 19
     ● The CPU is permanently interrupted by IRQs
     ● The AHCI controller floods because the HDD doesn't answer ATA_IDENTIFY requests (confirmed by removing the HDD)
     ● The AHCI driver fails at probing, so interrupt 19 only lists the USB device
     ● Some hardware failures can lead to load issues
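An interrupt rate like the one above can be measured by diffing two snapshots of /proc/interrupts. The sketch below sums the per-CPU counters for each IRQ and reports the ones above a threshold; the 50000 per second threshold and the function names are assumptions for illustration.

```python
def irq_totals(text):
    """Sum the per-CPU counters of each line of /proc/interrupts."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        name, _, rest = line.partition(":")
        counts = [int(v) for v in rest.split()[:ncpus] if v.isdigit()]
        if counts:
            totals[name.strip()] = sum(counts)
    return totals


def irq_rates(before, after, seconds):
    """Interrupts per second for each IRQ between two snapshots."""
    first, second = irq_totals(before), irq_totals(after)
    return {irq: (second[irq] - first.get(irq, 0)) / seconds for irq in second}


def storms(rates, threshold=50000):
    """IRQs firing at or above `threshold` interrupts per second."""
    return {irq: rate for irq, rate in rates.items() if rate >= threshold}
```

Read /proc/interrupts, wait a known number of seconds, read it again, and pass both texts to `irq_rates()`. The device names on the offending line then point at the hardware to inspect.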
  15. IRQ Storms
  16. Memory Issues
     ● 2 identical servers that don't perform the same
       – One is really slower than the other
     ● Same server brand / model
     ● Same vendor
     ● Same hardware setup
     ● But they really perform differently...
     ● What the hell is my {application|os} doing wrong here?
  17. Memory Issues
     ● Memory banks were not populated with the same hardware
     ● Some were DDR3 with CAS latency = 9
     ● Some were DDR3 with CAS latency = 11
     ● As a result, memory accesses were slower on one server
     ● This was detected at runtime under Linux with the DDR3 timing tool from Cyring (http://code.cyring.fr/FTS/?PATH=Source/C/DDR3_Timings/0.2/timings.c)
     ● Hardware setups were supposed to be the same!
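To get a feel for what CL9 versus CL11 means, CAS latency can be converted from clock cycles to nanoseconds. The slides do not give the memory speed, so the DDR3-1600 data rate used in the example is an assumption.

```python
def cas_latency_ns(cas_cycles, data_rate_mt_s):
    """CAS latency in nanoseconds for a given CL and data rate in MT/s."""
    clock_mhz = data_rate_mt_s / 2  # DDR transfers data twice per clock cycle
    return cas_cycles * 1000.0 / clock_mhz
```

Assuming both sets of modules run at 1600 MT/s, `cas_latency_ns(9, 1600)` gives 11.25 ns and `cas_latency_ns(11, 1600)` gives 13.75 ns, roughly 22% more latency on every column access.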
  18. Dear Loadavg,
     ● You are complicated to understand
     ● You don't help track down the source of the load
     ● You can be a liar if some kernel code doesn't update you
     ● But you provide an indicator of the global load
       – 1.0 means 100% of the resources of one CPU
     ● I'll keep you as a raw indicator to start my investigations
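As a raw indicator, loadavg is easier to read once divided by the number of CPUs. A minimal sketch, assuming the usual /proc/loadavg layout where the first three fields are the 1, 5 and 15 minute averages; the function names are illustrative.

```python
import os


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg content."""
    one, five, fifteen = (float(v) for v in text.split()[:3])
    return one, five, fifteen


def load_per_cpu(text, cpus=None):
    """Load averages divided by the CPU count (defaults to this machine's count)."""
    cpus = cpus or os.cpu_count() or 1
    return tuple(value / cpus for value in parse_loadavg(text))
```

On Linux the figure also counts tasks in uninterruptible sleep, typically waiting on IO, so a high value does not by itself mean the CPU is the bottleneck.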
  19. Thanks!
     ● Email: erwanliasr1@gmail.com
     ● IRC: erwan_taf @ {freenode | oftc}