Distro Recipes 2013 : My ${favorite_linux_distro} is slow!

https://distro-recipes.org

Transcript

  • 1. My ${favorite} Linux Distribution is slow!
    Credit: fras1977@flickr
    Distro Recipes, 4th April 2013 @Paris
  • 2. Performance does matter
    ● Users expect more performance
    ● They do have perfect hardware
    ● They installed the latest OS release
    ● So it shall be faster than ever! Isn't it?
    ● But we still get those imprecise reports...
      « Hey! My Linux Distro is Slow! »
      « The latest OS reduces the performance! »
  • 3. About this talk
    ● What to expect?
      – Tricks to prove the distro is not always the bad guy
      – A compilation of real debugging sessions
    ● What not to expect?
      – One magic answer about performance
    ● Who are you?
  • 4. Tracking the beast
    ● Slowdowns come from various sources
      – CPU
      – Storage
      – Interrupts
      – Memory
      – Network (not included in this presentation)
      – Applications (not included)
  • 5. CPU load
    ● Estimating the load of the CPU is pretty easy
    ● Use « top » sorted by « cpu load »
      – Don't confuse it with loadavg!
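The CPU percentage that « top » shows can be approximated by sampling /proc/stat twice and comparing the counters. The sketch below is only an illustration of that idea, not how top itself is implemented; the function names are made up for this example, and iowait is counted as non-busy time by assumption.

```python
import time


def parse_cpu_line(stat_text):
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:]]
            # Field order: user nice system idle iowait irq softirq steal ...
            # Assumption: time spent in iowait is treated as idle, not busy.
            idle = values[3] + (values[4] if len(values) > 4 else 0)
            # guest/guest_nice (fields 9 and 10) are already part of user/nice.
            total = sum(values[:8])
            return total - idle, total
    raise ValueError("no aggregate 'cpu' line found")


def cpu_usage_percent(before_text, after_text):
    """CPU busy percentage between two snapshots of /proc/stat."""
    busy0, total0 = parse_cpu_line(before_text)
    busy1, total1 = parse_cpu_line(after_text)
    elapsed = total1 - total0
    return 100.0 * (busy1 - busy0) / elapsed if elapsed else 0.0


def sample_cpu_usage(interval=1.0, path="/proc/stat"):
    """Take two snapshots 'interval' seconds apart (Linux only)."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval)
    with open(path) as f:
        after = f.read()
    return cpu_usage_percent(before, after)
```

Calling sample_cpu_usage() on a Linux machine returns an instantaneous busy percentage, which is a different quantity from the load average discussed at the end of the talk.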
  • 6. Weird CPU issues
    ● Temperature
      – Internal throttling to avoid overheating
      – ~110/120°C on Intel CPUs
      – Monitoring via coretemp & acpi
        « CPU1: Core temperature above threshold, cpu clock throttled (total events = 12841) »
      – Generates Machine Check Exceptions (MCE)
      – As a result, CPU performance is reduced
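Throttling messages like the one quoted above can be counted from a saved kernel log. This is a minimal sketch that assumes the log lines follow exactly the wording shown on the slide; the function name is hypothetical, and other kernel versions may word the message differently.

```python
import re

# Pattern taken from the message quoted on the slide; other wordings are not handled.
THROTTLE_RE = re.compile(
    r"CPU(\d+): Core temperature above threshold, "
    r"cpu clock throttled \(total events = (\d+)\)"
)


def throttle_events(log_text):
    """Return {cpu_number: highest 'total events' value seen in log_text}."""
    counts = {}
    for match in THROTTLE_RE.finditer(log_text):
        cpu = int(match.group(1))
        events = int(match.group(2))
        counts[cpu] = max(counts.get(cpu, 0), events)
    return counts
```

Feeding it the text of a kernel log (for example the saved output of dmesg) gives a per-CPU view of how often throttling kicked in.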
  • 7. Storage Load
    ● Massive IOs can slow down a system seriously
      – Depending on the storage device (HDD vs SSD)
      – Depending on the IO profile (sequential vs random)
      – « vmstat » is useful to track this behavior
    ● vmstat columns: bi = blocks in, bo = blocks out, wa = waiting IO, si = swap in, so = swap out
    ● (vmstat output) Someone reads a lot!
  • 8. Storage Load
    ● vmstat columns: bi = blocks in, bo = blocks out, wa = waiting IO, si = swap in, so = swap out
    ● (vmstat output) Someone tries to read a lot! (3 threads reading 4K random)
    ● The CPU waits for the storage device (~30% wa)
    ● HDD + 3 threads @ 4K random generates a massive device load
    ● During this load, my system was unusable
    ● A desktop search, rsync, tar, ... can generate such a load
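The vmstat columns described above can be picked out of captured vmstat output and checked against the ~30% wa level mentioned on the slide. The following is a rough sketch, assuming the usual two-line vmstat header; the function names and the default threshold are choices made for this example.

```python
def parse_vmstat(text):
    """Turn captured vmstat output into a list of {column: value} dicts."""
    rows, columns = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "wa" in fields and "bi" in fields:
            # Column-name header line (r b swpd ... us sy id wa st).
            columns = fields
        elif (columns and len(fields) == len(columns)
              and all(f.isdigit() for f in fields)):
            rows.append(dict(zip(columns, map(int, fields))))
    return rows


def io_bound_samples(rows, wa_threshold=30):
    """Samples where the CPU spent at least wa_threshold percent waiting on IO."""
    return [row for row in rows if row["wa"] >= wa_threshold]
```

Passing the text produced by a run such as « vmstat 1 » to parse_vmstat and then to io_bound_samples highlights the seconds during which the CPU was mostly waiting on storage, along with the bi/bo values for those samples.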
  • 9. Storage Load
    ● A broken/slow storage device can load the system
    ● HDD: reallocation of broken sectors is invisible but causes lag
      ● SATA disks try several times to recover sectors
      ● No other IOs will be accepted during this process
      ● Kills RAID arrays
      ● Enterprise-class SATA disks reallocate immediately
      ● Use SMART to count {broken|pending|reallocated} sectors
      ● %wa in top or vmstat shall be high in such a case
  • 10. Storage Load
    ● « smartctl -a /dev/sda » of a dying HDD
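The sector counters mentioned on the previous slide can be pulled out of the attribute table that « smartctl -a » prints. The sketch below assumes the common ten-column attribute layout and the attribute names that smartmontools usually reports; both vary between vendors, so treat the watched names as assumptions. The helper names are made up for this example.

```python
import subprocess

# Attribute names as commonly printed by smartmontools; vendors may differ.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def sector_counters(smart_text):
    """Extract raw values of the watched attributes from smartctl output."""
    counters = {}
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            if fields[9].isdigit():
                counters[fields[1]] = int(fields[9])
    return counters


def looks_suspect(counters):
    """True if any watched counter is non-zero."""
    return any(value > 0 for value in counters.values())


def read_smart(device="/dev/sda"):
    """Run smartctl (needs root and smartmontools installed) and parse it."""
    result = subprocess.run(
        ["smartctl", "-a", device], capture_output=True, text=True
    )
    # smartctl uses its exit status as a bitmask, so a non-zero code is not fatal.
    return sector_counters(result.stdout)
```

A non-zero pending or reallocated count does not prove the disk is the cause of a slowdown, but it is a cheap first check before blaming the distribution.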
  • 11. Storage Load
    ● SSDs: far from a perfect device
      ● Performance varies across firmware implementations
      ● SLC front cache before reaching the MLC storage
        – Out-of-cache effect
        – 200+MB/s on SLC
        – 5MB/s on MLC in the worst case
        – After a while, global SSD performance is limited to 5MB/sec
        – Behavior not visible for {simple|short} workloads
        – %wa in top or vmstat shall increase in such a case
        – Can be reproduced by using fio http://git.kernel.dk/?p=fio.git
  • 12. SSD IO Path
    [Diagram: SATA IOs (6Gb/sec) → IO Controller → SLC Cache → MLC Cells; rates labeled 960 Mb/sec and 40 Mb/sec]
  • 13. Weird Storage Issues
    ● Temperature
      – On HDDs, thermal recalibration can occur too often for the disk to maintain a certain level of service
      – Media-class disks are less subject to this effect
    ● Vibrations
      – RAID arrays contain several HDDs spinning constantly
      – All these individual vibrations prevent heads from being properly aligned, leading to head recalibrations
      – That could totally prevent a RAID array from delivering IOs
  • 14. IRQ Storms
    ● Inside an array of 1200+ identical computers
    ● Some boot very, very slowly and trigger software watchdogs
    ● /proc/interrupts reports an IRQ storm (66000 per sec) on interrupt 19
    ● The CPU is permanently interrupted by IRQs
    ● The AHCI controller floods as the HDD doesn't answer ATA_IDENTIFY requests (seen by extracting the HDD)
    ● The AHCI driver fails at probing, so int19 only reports the usb dev.
    ● Some hardware failures can lead to load issues
  • 15. IRQ Storms
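An interrupt rate like the 66000 per second reported above can be estimated by diffing two snapshots of /proc/interrupts. This is a sketch only: the function names are invented for the example, and the 50000 per second storm threshold is an arbitrary assumption, not a value from the talk.

```python
def parse_interrupts(text):
    """Return {irq_label: total count across CPUs} from /proc/interrupts text."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header line: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        counts = []
        for field in rest.split()[:ncpu]:
            if not field.isdigit():
                break  # some rows (e.g. ERR) carry fewer counters than CPUs
            counts.append(int(field))
        totals[label.strip()] = sum(counts)
    return totals


def irq_rates(before_text, after_text, seconds):
    """Interrupts per second for each IRQ between two snapshots."""
    before = parse_interrupts(before_text)
    after = parse_interrupts(after_text)
    return {irq: (after[irq] - before.get(irq, 0)) / seconds for irq in after}


def storming(rates, threshold=50000):
    """IRQs whose rate reaches the (assumed) storm threshold."""
    return {irq: rate for irq, rate in rates.items() if rate >= threshold}
```

Reading /proc/interrupts twice, one second apart, and passing both texts to irq_rates gives the per-IRQ rate; the device names at the end of each row then tell which drivers share the noisy line.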
  • 16. Memory Issues
    ● 2 identical servers that don't perform the same
      – One is really slower than the other
    ● Same server brand / model
    ● Same vendor
    ● Same hardware setup
    ● But they really perform differently...
    ● What the hell is my {application|os} doing wrong here?
  • 17. Memory Issues
    ● Memory banks were not populated with the same HW
    ● Some were DDR3 with a CAS Latency = 9
    ● Some were DDR3 with a CAS Latency = 11
    ● As a result, memory accesses were slower on one
    ● This got detected at runtime under Linux with the DDR3 timing tool from Cyring (http://code.cyring.fr/FTS/?PATH=Source/C/DDR3_Timings/0.2/timings.c)
    ● Hardware setups were supposed to be the same!
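To get a feel for what CAS Latency 9 versus 11 means in time, the cycle count can be converted to nanoseconds. The slides do not state the memory speed, so the DDR3-1333 data rate used below is purely an assumption for illustration, as is the idea that both sets of modules ran at the same rate.

```python
def cas_latency_ns(cas_cycles, data_rate_mts):
    """Convert a CAS latency in clock cycles to nanoseconds.

    data_rate_mts is the DDR data rate in MT/s (e.g. 1333 for DDR3-1333).
    DDR memory transfers twice per clock, so the clock is half the data rate.
    """
    clock_mhz = data_rate_mts / 2.0
    return cas_cycles / clock_mhz * 1000.0


# Assumed data rate: DDR3-1333 for both module types (not given in the slides).
ASSUMED_RATE = 1333
cl9 = cas_latency_ns(9, ASSUMED_RATE)
cl11 = cas_latency_ns(11, ASSUMED_RATE)
print(f"CL9: {cl9:.2f} ns, CL11: {cl11:.2f} ns, ratio: {cl11 / cl9:.2f}")
```

At an identical data rate, the ratio between the two is simply 11/9, so the first-word latency is roughly 22% higher on the CL11 modules.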
  • 18. Dear Loadavg,
    ● You are complicated to understand
    ● You don't help track the source of the load
    ● You can be a liar if some kernel code doesn't update you
    ● But you provide an indicator of the global load
      – 1.0 means 100% of the resources
    ● I'll keep you as a raw indicator to start my investigations
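One common way to use loadavg as a raw indicator is to divide it by the number of CPUs, so that 1.0 reads as "all CPUs' worth of runnable work". That normalization is an assumption added here, not something the slide spells out, and the function names are made up for this sketch.

```python
import os


def load_per_cpu(load, cpu_count):
    """Load average divided by CPU count (assumed normalization)."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load / cpu_count


def current_load_per_cpu():
    """1, 5 and 15 minute load averages, each divided by the CPU count (Unix only)."""
    cpus = os.cpu_count() or 1
    return tuple(load_per_cpu(value, cpus) for value in os.getloadavg())
```

A value well above 1.0 from current_load_per_cpu() only says that something is queued up; on Linux that includes tasks blocked on IO, so the CPU, storage and interrupt checks are still needed to find the actual source.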
  • 19. Thanks!
    ● Email: erwanliasr1@gmail.com
    ● IRC: erwan_taf @ {freenode | oftc}