Distro Recipes 2013 : My ${favorite_linux_distro} is slow!

Usage Rights: CC Attribution-ShareAlike License

Presentation Transcript

  • 1. My ${favorite} Linux Distribution is slow!
    Photo credit: fras1977@flickr
    Distro Recipes, 4th April 2013 @ Paris
  • 2. Performance does matter
    ● Users expect more performance
    ● They have perfect hardware
    ● They installed the latest OS release
    ● So it should be faster than ever! Shouldn't it?
    ● But we still get those imprecise reports...
      « Hey! My Linux Distro is Slow! »
      « The latest OS reduces the performance! »
  • 3. About this talk
    ● What to expect?
      – Tricks to prove the distro is not always the bad guy
      – A compilation of real debugging sessions
    ● What not to expect?
      – One magic answer about performance
    ● Who are you?
  • 4. Tracking the beast
    ● Slowdowns come from various sources
      – CPU
      – Storage
      – Interrupts
      – Memory
      – Network (not covered in this presentation)
      – Applications (not covered)
  • 5. CPU load
    ● Estimating the CPU load is pretty easy
    ● Use « top » sorted by CPU usage
      – Don't confuse it with loadavg!
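Slide 5 relies on « top » for the CPU load. As a rough sketch of where such a figure comes from, the code below derives overall CPU busy time from two samples of the aggregate `cpu` line in Linux's /proc/stat. The helper names (`parse_cpu_line`, `busy_percent`, `read_cpu_line`) are hypothetical, and counting iowait as idle is one accounting choice among several; top's own numbers may differ slightly.

```python
import os
import time


def parse_cpu_line(line: str) -> list[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into jiffy counters."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Keep user, nice, system, idle, iowait, irq, softirq, steal.
    # guest/guest_nice are already included in user/nice, so they are skipped.
    return [int(x) for x in fields[1:9]]


def busy_percent(before: list[int], after: list[int]) -> float:
    """Share of non-idle time between two samples (hypothetical helper)."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    if total == 0:
        return 0.0
    # Index 3 is idle, index 4 is iowait: both are treated as "not busy" here.
    idle = deltas[3] + (deltas[4] if len(deltas) > 4 else 0)
    return 100.0 * (total - idle) / total


def read_cpu_line(path: str = "/proc/stat") -> str:
    with open(path) as f:
        return f.readline()  # the first line is the aggregate "cpu" line


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = parse_cpu_line(read_cpu_line())
    time.sleep(1)
    second = parse_cpu_line(read_cpu_line())
    print(f"CPU busy over the last second: {busy_percent(first, second):.1f}%")
```

Sampling twice matters: the counters in /proc/stat are cumulative since boot, so only the difference between two readings describes the current load.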
  • 6. Weird CPU issues
    ● Temperature
      – Internal throttling to avoid overheating
      – ~110/120 °C on Intel CPUs
      – Monitored via coretemp & ACPI
        « CPU1: Core temperature above threshold, cpu clock throttled (total events = 12841) »
      – Generates Machine Check Exceptions (MCE)
      – As a result, CPU performance is reduced
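On x86 systems where the kernel's thermal throttling support is active, the events behind the slide 6 message are also counted per CPU in sysfs. A minimal sketch, assuming the `thermal_throttle/core_throttle_count` files are present (they are not on every CPU or kernel); `throttle_counts` is a hypothetical helper name.

```python
import glob
import os


def throttle_counts(sysfs_root: str = "/sys/devices/system/cpu") -> dict[str, int]:
    """Map 'cpuN' -> number of core thermal throttling events (hypothetical helper)."""
    counts: dict[str, int] = {}
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "thermal_throttle", "core_throttle_count")
    for path in sorted(glob.glob(pattern)):
        cpu = path.split(os.sep)[-3]  # .../cpu1/thermal_throttle/core_throttle_count -> "cpu1"
        with open(path) as f:
            counts[cpu] = int(f.read().strip())
    return counts


if __name__ == "__main__":
    counts = throttle_counts()
    if not counts:
        print("no thermal_throttle counters found (non-Intel CPU or feature not available?)")
    for cpu, events in counts.items():
        print(f"{cpu}: {events} throttling events")
```

A counter that keeps growing while the machine "feels slow" points at cooling rather than at the distribution.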
  • 7. Storage Load
    ● Massive IOs can seriously slow down a system
      – Depending on the storage device (HDD vs SSD)
      – Depending on the IO profile (sequential vs random)
      – « vmstat » is useful to track this behavior
        bi = blocks in, bo = blocks out, wa = waiting for IO, si = swap in, so = swap out
        Someone reads a lot!
  • 8. Storage Load
    ● Someone is trying to read a lot! (3 threads doing 4K random reads)
    ● The CPU waits for the storage device (~30% wa)
    ● HDD + 3 threads @ 4K random reads generate a massive device load
    ● During this load, my system was unusable
    ● A desktop search, rsync, tar, ... can generate such a load
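The vmstat columns used on slides 7 and 8 can also be checked from a script. The sketch below parses `vmstat` output by header name and keeps the samples whose `wa` column reaches a threshold; the 20% cut-off and the function names are assumptions for illustration, not values from the talk.

```python
import shutil
import subprocess


def parse_vmstat(text: str) -> list[dict[str, int]]:
    """Turn vmstat output into one dict per sample, keyed by column name."""
    lines = [line for line in text.splitlines() if line.strip()]
    # lines[0] is the group banner ("procs ---memory--- ...");
    # lines[1] holds the column names (r b swpd free buff cache si so bi bo ... wa st).
    header = lines[1].split()
    return [dict(zip(header, map(int, line.split()))) for line in lines[2:]]


def io_bound_samples(rows: list[dict[str, int]], wa_threshold: int = 20) -> list[dict[str, int]]:
    """Samples where the CPU spent at least wa_threshold % of its time waiting for IO."""
    return [row for row in rows if row.get("wa", 0) >= wa_threshold]


if __name__ == "__main__" and shutil.which("vmstat"):
    # 3 samples, 1 second apart; the first row is an average since boot.
    out = subprocess.run(["vmstat", "1", "3"], capture_output=True, text=True, check=True).stdout
    for row in io_bound_samples(parse_vmstat(out)):
        print(f"wa={row['wa']}% bi={row['bi']} bo={row['bo']} si={row['si']} so={row['so']}")
```

Keying by header name rather than by position keeps the parser working when a newer vmstat adds a column.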
  • 9. Storage Load
    ● A broken/slow storage device can load the system
    ● HDD: reallocation of broken sectors is invisible but causes lags
      ● SATA disks try several times to recover a sector
      ● No other IOs are accepted during this process
      ● This can kill RAID arrays
      ● Enterprise-class SATA disks reallocate immediately
      ● Use SMART to count {broken|pending|reallocated} sectors
      ● %wa in top or vmstat should be high in such a case
  • 10. Storage Load
    ● « smartctl -a /dev/sda » output of a dying HDD
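Slides 9 and 10 point at SMART's sector counters. A minimal sketch that pulls them out of the attribute table printed by `smartctl -A`; watching attributes 5, 197 and 198 is a common convention rather than something the talk specifies, attribute meaning can vary by vendor, and `/dev/sda` is only an example device.

```python
import shutil
import subprocess

# Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable
SECTOR_ATTR_IDS = {5, 197, 198}


def sector_counters(smartctl_output: str) -> dict[str, int]:
    """Map attribute name -> raw value for the sector-health attributes."""
    counters: dict[str, int] = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        if int(fields[0]) in SECTOR_ATTR_IDS:
            try:
                counters[fields[1]] = int(fields[9])  # first token of RAW_VALUE
            except ValueError:
                pass  # some vendors encode raw values differently; skip those
    return counters


if __name__ == "__main__" and shutil.which("smartctl"):
    # Usually needs root; /dev/sda is just an example device.
    result = subprocess.run(["smartctl", "-A", "/dev/sda"], capture_output=True, text=True)
    for name, raw in sector_counters(result.stdout).items():
        flag = "  <-- non-zero" if raw else ""
        print(f"{name}: {raw}{flag}")
```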
  • 11. Storage Load
    ● SSDs: far from perfect devices
      ● Performance varies with the firmware implementation
      ● An SLC front cache sits before the MLC storage
        – Out-of-cache effect
        – 200+ MB/s on SLC
        – 5 MB/s on MLC in the worst case
        – After a while, overall SSD performance is limited to 5 MB/s
        – Behavior not visible with {simple|short} workloads
        – %wa in top or vmstat should increase in such a case
        – Can be reproduced using fio: http://git.kernel.dk/?p=fio.git
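The out-of-cache effect on slide 11 can be put into numbers with a deliberately crude model. The sketch assumes the SLC cache starts empty and does not drain to MLC during the write; the 1000 MB cache size is hypothetical, while 200 MB/s and 5 MB/s are the rates quoted on the slide. Real drives behave in more complex ways, which is why measuring with fio remains the reference.

```python
def write_time_s(total_mb: float, cache_mb: float, slc_mbps: float, mlc_mbps: float) -> float:
    """Seconds to write total_mb when the first cache_mb land in the SLC cache."""
    fast = min(total_mb, cache_mb)        # absorbed by the SLC cache
    slow = max(total_mb - cache_mb, 0.0)  # goes straight to MLC once the cache is full
    return fast / slc_mbps + slow / mlc_mbps


def avg_throughput_mbps(total_mb: float, cache_mb: float, slc_mbps: float, mlc_mbps: float) -> float:
    return total_mb / write_time_s(total_mb, cache_mb, slc_mbps, mlc_mbps)


if __name__ == "__main__":
    CACHE_MB = 1000  # hypothetical SLC cache size
    for size_mb in (500, 1000, 2000, 10000):
        rate = avg_throughput_mbps(size_mb, CACHE_MB, slc_mbps=200, mlc_mbps=5)
        print(f"{size_mb:>6} MB written -> {rate:6.1f} MB/s on average")
```

Under these assumptions a 500 MB burst averages 200 MB/s, while a 10 GB write averages about 5.5 MB/s: short workloads never leave the cache, so they hide the effect.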
  • 12. SSD IO Path
    (Diagram: SATA 6Gb/sec → IO controller → SLC cache → MLC cells; labeled rates: 960 Mb/sec and 40 Mb/sec)
  • 13. Weird Storage Issues
    ● Temperature
      – On HDDs, thermal recalibration can occur too often to maintain a given level of service
      – Media-class disks are less subject to this effect
    ● Vibrations
      – RAID arrays contain several HDDs spinning constantly
      – These combined vibrations prevent the heads from being properly aligned, leading to head recalibrations
      – This can completely prevent a RAID array from delivering IOs
  • 14. IRQ Storms
    ● In a fleet of 1200+ identical computers
    ● Some boot very, very slowly and trigger software watchdogs
    ● /proc/interrupts reports an IRQ storm (66000 per sec) on interrupt 19
    ● The CPU is permanently interrupted by IRQs
    ● The AHCI controller floods because the HDD doesn't answer ATA_IDENTIFY requests (confirmed by removing the HDD)
    ● The AHCI driver fails at probing, so interrupt 19 only lists the USB device
    ● Some hardware failures can lead to load issues
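The storm on slide 14 shows up as a fast-growing counter in /proc/interrupts. A minimal sketch that samples the file twice and reports the lines firing above a threshold; the 10 000 interrupts/s cut-off and the helper names are assumptions, and the parser only handles the common layout (one numeric column per CPU followed by a description).

```python
import os
import time


def parse_interrupts(text: str) -> dict[str, int]:
    """Total interrupt count per IRQ line (summed over CPUs) from /proc/interrupts."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header: "CPU0 CPU1 ..."
    totals: dict[str, int] = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields or not fields[0].endswith(":"):
            continue
        counts = []
        for token in fields[1:1 + ncpu]:
            if not token.isdigit():
                break  # reached the controller / device description
            counts.append(int(token))
        totals[fields[0][:-1]] = sum(counts)
    return totals


def irq_rates(before: dict[str, int], after: dict[str, int], interval_s: float) -> dict[str, float]:
    return {irq: (count - before.get(irq, 0)) / interval_s for irq, count in after.items()}


def storms(rates: dict[str, float], threshold: float = 10_000) -> list[tuple[str, float]]:
    """IRQ lines firing at least `threshold` times per second, busiest first."""
    return sorted(((irq, r) for irq, r in rates.items() if r >= threshold), key=lambda item: -item[1])


if __name__ == "__main__" and os.path.exists("/proc/interrupts"):
    def snapshot() -> dict[str, int]:
        with open("/proc/interrupts") as f:
            return parse_interrupts(f.read())

    first = snapshot()
    time.sleep(1)
    for irq, rate in storms(irq_rates(first, snapshot(), 1.0)):
        print(f"IRQ {irq}: {rate:.0f} interrupts/s")
```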
  • 15. IRQ Storms
  • 16. Memory Issues
    ● Two identical servers that don't perform the same
      – One is really slower than the other
    ● Same server brand / model
    ● Same vendor
    ● Same hardware setup
    ● But they perform really differently...
    ● What the hell is my {application|os} doing wrong here?
  • 17. Memory Issues
    ● Memory banks were not populated with the same hardware
    ● Some modules were DDR3 with CAS Latency = 9
    ● Some were DDR3 with CAS Latency = 11
    ● As a result, memory accesses were slower on one server
    ● This was detected at runtime under Linux with the DDR3 timing tool from Cyring (http://code.cyring.fr/FTS/?PATH=Source/C/DDR3_Timings/0.2/timings.c)
    ● The hardware setups were supposed to be the same!
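To see why the CL9/CL11 mix on slide 17 matters, CAS latency can be converted from clock cycles to nanoseconds. The slide does not give the modules' speed grade, so the sketch assumes DDR3-1600 for both sets; CAS latency is only one component of a memory access, so the difference does not translate one-to-one into application slowdown.

```python
def cas_latency_ns(cl: int, data_rate_mts: int) -> float:
    """CAS latency in nanoseconds: CL cycles times the clock period.

    DDR transfers twice per clock, so the clock period is 2000 / data_rate ns.
    """
    return cl * 2000 / data_rate_mts


if __name__ == "__main__":
    RATE = 1600  # hypothetical: assume both sets of modules run as DDR3-1600
    fast, slow = cas_latency_ns(9, RATE), cas_latency_ns(11, RATE)
    print(f"CL9: {fast:.2f} ns, CL11: {slow:.2f} ns ({slow / fast - 1:.0%} longer)")
```

With that assumption the script prints `CL9: 11.25 ns, CL11: 13.75 ns (22% longer)`.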
  • 18. Dear Loadavg,
    ● You are complicated to understand
    ● You don't help track down the source of the load
    ● You can be a liar if some kernel code doesn't update you
    ● But you provide an indicator of the global load
      – 1.0 means 100% of the resources of a single CPU; on N CPUs, full load is around N
    ● I'll keep you as a raw indicator to start my investigations
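Slide 18's raw indicator is easier to read once normalized by the number of CPUs. On Linux the load average counts runnable tasks plus tasks in uninterruptible sleep (typically waiting on IO), so a high value with mostly idle CPUs can point back at storage. A minimal sketch reading /proc/loadavg; the helper names are hypothetical.

```python
import os


def parse_loadavg(text: str) -> tuple[float, float, float]:
    """1-, 5- and 15-minute load averages from the contents of /proc/loadavg."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return one, five, fifteen


def per_cpu(load: float, ncpu: int) -> float:
    """Load normalized by CPU count: ~1.0 means roughly one runnable (or D-state) task per CPU."""
    return load / ncpu


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        loads = parse_loadavg(f.read())
    ncpu = os.cpu_count() or 1
    for label, value in zip(("1 min", "5 min", "15 min"), loads):
        print(f"{label}: {value:.2f} ({per_cpu(value, ncpu):.2f} per CPU)")
```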
  • 19. Thanks!
    ● Email: erwanliasr1@gmail.com
    ● IRC: erwan_taf @ {freenode | oftc}