The Art of Writing Efficient Software

For many applications, efficiency -- either in terms of execution time or memory consumption -- is of utmost importance.
If it is not possible, either due to economic or technical constraints, to use more powerful hardware, the efficiency of the software running on these limited platforms becomes a critical success factor.
This presentation shows important principles and techniques that are needed to implement efficient software. It is mainly targeted at systems and embedded systems developers. A good command of the C programming language is assumed.

Published in: Technology, Business

The Art of Writing Efficient Software

  1. The Art of Writing Efficient Software
     Principles and Techniques
     Version 1.0.1
     ralf.holly@approxion.com
  2. The Art of Writing Efficient Software, Copyright © 2013 Ralf Holly
     “Having lost sight of our goals, we redouble our efforts.”
     -- Mark Twain
  3. Efficiency defined
     Effectiveness:
       ► Means doing the right things
       ► Fulfill functional requirements
       ► Example: sorting an array; the algorithm is effective if the array is sorted afterwards
     Efficiency:
       ► Means doing the right things as well as possible
       ► "Well" means using as few resources as possible
       ► Example: using Quicksort (instead of Bubblesort)
       ► Space efficiency ("footprint"):
           ► Use as little memory as possible
       ► Run-time efficiency ("performance"):
           ► Use as little execution time as possible
  4. Why is efficiency important?
     Embedded systems
       ► Fixed requirements in terms of memory consumption/execution time/cost
       ► Mass production: car control units, mobile phones, smart cards
       ► Efficient software yields efficiency in terms of energy consumption (battery lifetime)
     User experience
       ► Slow software sucks
       ► Especially: games, user interfaces
     ⇨ Efficiency is an important sales factor!
  5. An example
       ► Task: implement a fast memfill routine
             void MemFill(uint8_t* p, uint16_t len, uint8_t fill) {...}
       ► Optimize iteratively
       ► Development platform: Renesas H8/300H (16/32 bit)
       ► Toolchain: HEW C/C++ 5.03, "optimized for speed"
  6. Version 1 -- Simple for loop
         void MemFill1(uint8_t* p, uint16_t len, uint8_t fill) {
             uint16_t i;
             for (i = 0; i < len; i++) {
                 *p++ = fill;
             }
         }
  7. Version 1 -- Simple for loop
     Results show consumed CPU cycles on H8/300H (V1 = version 1):
     len        1      5     10     20     30     50    100    200    500   1000  10000
     V1        86    136    208    346    486    766   1466   2866   7066  14066 140068
  8. Version 2 -- Simple while loop
         void MemFill2(uint8_t* p, uint16_t len, uint8_t fill) {
             while (len-- > 0) {
                 *p++ = fill;
             }
         }
  9. Version 2 -- Simple while loop
     len        1      5     10     20     30     50    100    200    500   1000  10000
     V1        86    136    208    346    486    766   1466   2866   7066  14066 140068
     V2       104    170    248    410    570    890   1690   3290   8090  16090 160088
     Lesson: "Don't assume anything!"
  10. Version 3 -- Word access
         void MemFill3(uint8_t* p, uint16_t len, uint8_t fill) {
             uint16_t fill2 = (uint16_t)(fill << 8 | fill);   /* fill byte in both halves */
             uint16_t len2;
             if (len == 0) return;
             /* Align to a word boundary; uintptr_t (stdint.h) is the portable
                integer type for inspecting a pointer value. */
             if ((uintptr_t)p & 1) { *p++ = fill; --len; }
             len2 = len >> 1;                                 /* number of whole words */
             while (len2-- > 0) {
                 *(uint16_t*)p = fill2;
                 p += 2;
             }
             if (len & 1)                                     /* trailing odd byte */
                 *p = fill;
         }
     Downside: need to take care of special cases
  11. Version 3 -- Word access
     len        1      5     10     20     30     50    100    200    500   1000  10000
     V1        86    136    208    346    486    766   1466   2866   7066  14066 140068
     V2       104    170    248    410    570    890   1690   3290   8090  16090 160088
     V3       132    166    200    282    362    522    922   1722   4122   8122  80120
  12. Loop unrolling
     Goal: reduce loop overhead
     Before:
         for (i = 0; i < n; ++i) {
             /* do something */
         }
     After (unrolled by 8, followed by a loop for the remainder):
         for (i = 0; i < n / 8; ++i) {
             /* do something */
             /* do something */
             /* do something */
             /* do something */
             /* do something */
             /* do something */
             /* do something */
             /* do something */
         }
         for (i = 0; i < n % 8; ++i) {
             /* do something */
         }
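To make the unrolling pattern above concrete, here is a minimal sketch that sums a byte array, once with a plain loop and once unrolled by four. The function names `SumSimple` and `SumUnrolled4` are made up for this illustration and do not appear in the slides; whether the unrolled variant is actually faster depends on compiler and target and has to be measured.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain version: one loop test and one counter update per element. */
uint32_t SumSimple(const uint8_t* data, size_t n) {
    uint32_t sum = 0;
    size_t i;
    for (i = 0; i < n; ++i) {
        sum += data[i];
    }
    return sum;
}

/* Unrolled by 4: one loop test per four elements,
   followed by a remainder loop for the last n % 4 elements. */
uint32_t SumUnrolled4(const uint8_t* data, size_t n) {
    uint32_t sum = 0;
    size_t i;
    for (i = 0; i < n / 4; ++i) {
        sum += data[0];
        sum += data[1];
        sum += data[2];
        sum += data[3];
        data += 4;
    }
    for (i = 0; i < n % 4; ++i) {
        sum += *data++;
    }
    return sum;
}
```

Both functions must return the same result for every length, including lengths smaller than the unroll factor.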
  13. Loop unrolling (Duff's Device)
         /* assumes n > 0; for n == 0 the body would run 8 times */
         int i = (n + 7) / 8;
         switch (n % 8) {
         case 0: do { /* do something */;
         case 7:      /* do something */;
         case 6:      /* do something */;
         case 5:      /* do something */;
         case 4:      /* do something */;
         case 3:      /* do something */;
         case 2:      /* do something */;
         case 1:      /* do something */;
                 } while (--i > 0);
         }
       ► Tom Duff (1983, Lucasfilm)
       ► http://www.lysator.liu.se/c/duffs-device.html
       ► Is valid ANSI C
       ► "Reusable unrolling"
  14. Loop unrolling (Duff's Device)
     Let's put Duff's Device in a macro:
         #define DUFF_DEVICE(duffTimes, duffAction)  \
             do {                                    \
                 U16 n = (duffTimes);                \
                 U16 i = (n + 7) >> 3;               \
                 switch (n & 7) {                    \
                 case 0: do { duffAction;            \
                 case 7:      duffAction;            \
                 case 6:      duffAction;            \
                 case 5:      duffAction;            \
                 case 4:      duffAction;            \
                 case 3:      duffAction;            \
                 case 2:      duffAction;            \
                 case 1:      duffAction;            \
                         } while (--i > 0);          \
                 }                                   \
             } while (0)
  15. Version 4 -- Word access + Duff's Device
         void MemFill4(uint8_t* p, uint16_t len, uint8_t fill) {
             uint16_t len2;
             if (len == 0) return;
             /* uintptr_t (stdint.h) is the portable type for inspecting a pointer value */
             if ((uintptr_t)p & 1) { *p++ = fill; --len; }
             len2 = len >> 1;
             if (len2 != 0) {   /* DUFF_DEVICE must not be entered with a count of 0 */
                 uint16_t fill2 = (uint16_t)(fill << 8 | fill);
                 DUFF_DEVICE(len2, *(uint16_t*)p = fill2; p += 2;);
             }
             if (len & 1)
                 *p = fill;
         }
  16. Version 4 -- Word access + Duff's Device
     len        1      5     10     20     30     50    100    200    500   1000  10000
     V1        86    136    208    346    486    766   1466   2866   7066  14066 140068
     V2       104    170    248    410    570    890   1690   3290   8090  16090 160088
     V3       132    166    200    282    362    522    922   1722   4122   8122  80120
     V4       150    224    248    282    312    378    552    888   1902   3588  33962
  17. Version 5 -- Word access + Duff's Device + small length optimization
     Special treatment (similar to unrolling) for len <= 10:
         void MemFill5(uint8_t* p, uint16_t len, uint8_t fill) {
             switch (len) {            /* cases fall through on purpose */
             case 10: *p++ = fill;
             case 9:  *p++ = fill;
             case 8:  *p++ = fill;
             case 7:  *p++ = fill;
             case 6:  *p++ = fill;
             case 5:  *p++ = fill;
             case 4:  *p++ = fill;
             case 3:  *p++ = fill;
             case 2:  *p++ = fill;
             case 1:  *p = fill;
             case 0:  return;
             default: ;
             };
             ... rest as version 4 ...
  18. Version 5 -- Word access + Duff's Device + small length optimization
     len        1      5     10     20     30     50    100    200    500   1000  10000
     V1        86    136    208    346    486    766   1466   2866   7066  14066 140068
     V2       104    170    248    410    570    890   1690   3290   8090  16090 160088
     V3       132    166    200    282    362    522    922   1722   4122   8122  80120
     V4       150    224    248    282    312    378    552    888   1902   3588  33962
     V5       182    208    236    312    342    412    602    962   2052   3862  36480
     This was a bad idea: now our code is so complicated that the optimizer gives up!
  19. Version 6 -- Assembler
       ► Based on assembly output of version 5
       ► Optimized use of CPU registers
       ► Removed redundant store/load of some registers (push/pop)
       ► Removed redundant library call
       ► Removed redundant instruction in Duff's Device
  20. Version 6 -- Assembler
     len        1      5     10     20     30     50    100    200    500   1000  10000
     V1        86    136    208    346    486    766   1466   2866   7066  14066 140068
     V2       104    170    248    410    570    890   1690   3290   8090  16090 160088
     V3       132    166    200    282    362    522    922   1722   4122   8122  80120
     V4       150    224    248    282    312    378    552    888   1902   3588  33962
     V5       182    208    236    312    342    412    602    962   2052   3862  36480
     V6        94    108    138    186    216    282    456    792   1806   3492  33864
     V1/V6    0.9    1.3    1.5    1.9    2.3    2.7    3.2    3.6    3.9    4.0    4.1
  21. Summary
     Run-time improvements (speed-up factor)
       ► 1.5 (len = 10)
       ► 3.2 (len = 100)
       ► 4.0 (len = 1000)
     Biggest contribution: word access and loop unrolling
       ► Just re-implementing slow code in assembly language doesn't cut it!
     Complexity increases
       ► SLOC
           before: 2 lines of straightforward C code
           afterwards: ~100 lines of assembly code
       ► Cyclomatic complexity (McCabe factor)
           before: 2
           after: 20
  22. Principles of Efficient Software
  23. Principles vs. Techniques
     Principles
       ► Universal applicability
       ► Easy to grasp
       ► Unspecific, hard to apply
       ► Principles often sound like truisms
     Techniques
       ► More or less easy to grasp
       ► Specific, more or less easy to apply
       ► Limited applicability (dependent on context)
  24. Principle 1: Don't optimize (yet)
       ► D. Knuth: "Premature optimization is the root of all evil"
       ► What he meant: focus on correctness, maintainability
       ► Code tuning is time-consuming, risky, and expensive
       ► Maintainability and testability are reduced
       ► But: efficiency must be a requirements issue
           ► Difficult to add later
           ► Choice of hardware/programming language/architecture
           ► Measure/track efficiency already in early stages of development
  25. Principle 2: Understand the system
     Thorough understanding of the system is a prerequisite for efficiency
       ► How does the system work?
       ► How does it interact with other systems?
       ► What are the major use-cases?
       ► Are there any real-time constraints?
       ► What is the system doing most of the time?
  26. Principle 3: Measure, measure, measure...
       ► Don't assume inefficiency, prove it
       ► Be wary of old wives' tales
           ► Switch-case is faster than if-else
           ► Bitfields are more efficient than explicit ANDing/ORing
           ► Today, compiled code is better/worse than hand-written assembly code
       ► Inspect assembly output and count cycles
           ► Risky: prefetching, pipelining, caching are context-dependent
       ► Use a profiler
           ► Part of many toolchains (e.g. gprof, Lauterbach Debugger)
           ► Finds hotspots/critical paths
       ► Every step of optimization must be measured
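Where no profiler or cycle counter is at hand, a crude timing harness is often enough to check an assumption. The sketch below uses only the standard `clock()` function; `MeasureSeconds` and `Workload` are hypothetical names chosen for this illustration, and on an embedded target a hardware timer or cycle counter would normally take the place of `clock()`.

```c
#include <time.h>

/* Hypothetical workload; replace it with the code under test.
   The volatile sink keeps the compiler from removing the loop. */
volatile unsigned long sink;

void Workload(void) {
    unsigned long i;
    for (i = 0; i < 1000UL; ++i) {
        sink += i;
    }
}

/* Returns the average CPU time in seconds for one call of fn,
   measured over `repeats` calls to smooth out timer granularity. */
double MeasureSeconds(void (*fn)(void), unsigned long repeats) {
    clock_t start, end;
    unsigned long i;
    if (repeats == 0) {
        return 0.0;
    }
    start = clock();
    for (i = 0; i < repeats; ++i) {
        fn();
    }
    end = clock();
    return ((double)(end - start) / CLOCKS_PER_SEC) / (double)repeats;
}
```

Calling `MeasureSeconds(Workload, 100000UL)` before and after each change gives the numbers that every optimization step should be judged by.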
  27. Intermezzo: SmartTime
     %       Hit    Func     Func     Func     Func     FChild   FChild   FChild   FChild   Func  Func
     Runtime Count  Sum      Min      Max      Mean     Sum      Min      Max      Mean     ID    Name
     -------------------------------------------------------------------------------------------------------------------
     21.35   722    76.567   0.090    0.117    0.099    113.022  0.152    0.161    0.152    1202  ReadHeaderShort
     10.02   26     35.926   0.197    4.177    1.380    148.249  0.807    17.282   5.701    1218  GetChildByFID
     9.85    1060   35.308   0.027    0.108    0.027    35.792   0.027    0.125    0.027    588   pobjGetObjectHeader…
     5.60    6      20.096   3.039    3.505    3.343    20.096   3.039    3.505    3.343    149   vHALWriteBlock
     4.53    136    16.242   0.108    0.170    0.117    20.500   0.143    0.206    0.143    529   MmFileRef2Ptr
     4.35    771    15.615   0.018    0.027    0.018    15.615   0.018    0.027    0.018    669   GetDataShort
     3.47    1      12.450   12.450   12.450   12.450   358.599  358.599  358.599  358.599  10    ROOT
     3.21    129    11.518   0.081    0.099    0.081    17.094   0.125    0.134    0.125    1200  ReadHeaderByte
     2.15    50     7.709    0.143    0.179    0.152    28.845   0.574    0.583    0.574    1197  GetFileSize
     2.01    32     7.216    0.090    0.332    0.224    286.424  0.090    128.216  8.946    925   InvokeBasic
     1.92    34     6.875    0.179    0.215    0.197    48.350   1.416    1.443    1.416    1196  peeGetFileBodyOffset
     1.83    73     6.570    0.081    0.099    0.090    18.187   0.242    0.278    0.242    1199  ReadHeaderByteUse…
     1.19    29     4.258    0.134    0.152    0.143    6.266    0.197    0.260    0.215    917   Return
     1.06    25     3.810    0.125    0.188    0.152    283.484  0.305    128.395  11.339   932   InvokeExec
     0.98    1      3.532    3.532    3.532    3.532    3.532    3.532    3.532    3.532    139   UART_SendByteAndR…
     0.90    1      3.227    3.227    3.227    3.227    4.912    4.912    4.912    4.912    1213  SearchADF
     0.75    40     2.698    0.018    0.583    0.063    2.698    0.018    0.583    0.063    690   MemFill2RAM
     0.75    52     2.698    0.045    0.054    0.045    3.254    0.054    0.063    0.054    858   GetCPEntryShort
     0.74    21     2.644    0.117    0.134    0.125    15.561   0.170    12.092   0.735    508   MmTaCommitAllTransa…
     0.73    231    2.617    0.009    0.018    0.009    2.617    0.009    0.018    0.009    666   GetDataByte
     0.68    17     2.429    0.134    0.152    0.134    5.764    0.179    0.565    0.332    896   GetstaticExec
     0.61    2      2.187    1.094    1.094    1.094    2.187    1.094    1.094    1.094    648   MmSegCalcSatEdcHelper
     0.60    1      2.151    2.151    2.151    2.151    2.366    2.366    2.366    2.366    806   TM_SendDataSWget…
  28. Principle 4: Exploit concurrency
     How not to cook spaghetti with tomato sauce:
     Step                                 Time   Total time
     Peel and chop onions                    1            1
     Peel and chop garlic                    1            2
     Heat olive oil in pan                   5            7
     Steam onions/garlic                     3           10
     Add canned tomatoes                     1           11
     Season with salt/pepper and taste       1           12
     Simmer tomato sauce                    15           27
     Grind parmesan                          2           29
     Bring salted water to boil             10           39
     Boil spaghetti                         10           49
     Prepare                                 2           51
  29. Principle 4: Exploit concurrency
     Prepare (2)
       Boil spaghetti (10)
         Bring salted water to boil (10)
       Simmer tomato sauce (15)
         Season with salt/pepper and taste (1)
           Add canned tomatoes (1)
             Steam onions/garlic (3)
               Heat olive oil in pan (5)
               Peel/chop onions (1)
               Peel/chop garlic (1)
       Grind parmesan (2)
     This tree shows the dependencies: each step requires the steps indented beneath it.
  30. Principle 4: Exploit concurrency
     [Gantt chart of the steps: Prepare, Boil spaghetti, Boil water, Simmer sauce, Heat oil, Chop onions/garlic, Steam onions/garlic, Grind parmesan, Canned tomatoes, Seasoning/tasting; the time axis ends at 27]
     Chef's load: 26%
     The critical path starts with heating oil. Doable in 27 mins.
     Legend: blue: chef waits; orange: chef works
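The 51 minutes of the sequential recipe and the 27 minutes of the critical path can be reproduced with a small longest-path computation. This is a sketch under assumptions: the dependency table is read off the flattened tree on the previous slide (so the exact edges are an interpretation), every step is treated as if it could run in parallel with any other, and all names (`Task`, `TASKS`, `SequentialMinutes`, `CriticalPathMinutes`) are invented for this illustration.

```c
#define MAX_DEPS 3

typedef struct {
    const char* name;
    int minutes;
    int deps[MAX_DEPS];   /* indices of prerequisite tasks, -1 = unused */
} Task;

/* Listed in topological order: every task appears after its prerequisites. */
static const Task TASKS[] = {
    /*  0 */ {"Peel/chop onions",           1, {-1, -1, -1}},
    /*  1 */ {"Peel/chop garlic",           1, {-1, -1, -1}},
    /*  2 */ {"Heat olive oil in pan",      5, {-1, -1, -1}},
    /*  3 */ {"Steam onions/garlic",        3, { 0,  1,  2}},
    /*  4 */ {"Add canned tomatoes",        1, { 3, -1, -1}},
    /*  5 */ {"Season and taste",           1, { 4, -1, -1}},
    /*  6 */ {"Simmer tomato sauce",       15, { 5, -1, -1}},
    /*  7 */ {"Grind parmesan",             2, {-1, -1, -1}},
    /*  8 */ {"Bring salted water to boil", 10, {-1, -1, -1}},
    /*  9 */ {"Boil spaghetti",            10, { 8, -1, -1}},
    /* 10 */ {"Prepare",                    2, { 6,  7,  9}}
};

enum { NUM_TASKS = sizeof TASKS / sizeof TASKS[0] };

/* Total time if all steps are executed one after the other. */
int SequentialMinutes(void) {
    int total = 0;
    int i;
    for (i = 0; i < NUM_TASKS; ++i) {
        total += TASKS[i].minutes;
    }
    return total;
}

/* Length of the critical path: a task can start as soon as its
   slowest prerequisite has finished. */
int CriticalPathMinutes(void) {
    int finish[NUM_TASKS];
    int longest = 0;
    int i, d;
    for (i = 0; i < NUM_TASKS; ++i) {
        int start = 0;
        for (d = 0; d < MAX_DEPS; ++d) {
            int dep = TASKS[i].deps[d];
            if (dep >= 0 && finish[dep] > start) {
                start = finish[dep];
            }
        }
        finish[i] = start + TASKS[i].minutes;
        if (finish[i] > longest) {
            longest = finish[i];
        }
    }
    return longest;
}
```

Shortening any step that is not on the critical path (for example, boiling the water faster) leaves the result of `CriticalPathMinutes` unchanged, which is the point of optimizing only along that path.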
  31. Principle 4: Exploit concurrency
     Divide processes into independent steps
     Execute independent steps in parallel
       ► Assign tasks to worker threads
           ► Example: C# BackgroundWorker class
       ► Increases "liveliness" of the system
           ► User input
           ► Network and file I/O
           ► Calculations
       ► Performance gain with multi-core systems
     Only optimize along the performance-critical path!
  32. Principle 5: Look for alternative algorithms/designs
       ► Optimization is a top-down process
       ► Code-level optimization often leads to "fast slow code"
       ► Best optimizations stem from key insights
       ► Example: quicksort vs. bubblesort
       ► Example: computation of greatest common divisor
  33. Principle 5: Look for alternative algorithms/designs
         int gcdSimple(int a, int b) {
             int i;
             if (a < b) { // Ensure a >= b
                 i = b;
                 b = a;
                 a = i;
             }
             for (i = b; i > 0; --i) {
                 if (   a % i == 0
                     && b % i == 0 ) {
                     return i;
                 }
             }
             return 0;
         }
     Brute-force approach
       ► Straightforward and obvious
       ► Up to b loop iterations
       ► Up to 2 x b integer operations
     Euclid's key insight (~300 BC)
       ► gcd(a, b) == gcd(b, a % b)
  34. Principle 5: Look for alternative algorithms/designs
         int gcdEuclid(int a, int b) {   // assumes a > 0 and b > 0
             int i;
             if (a < b) { // Ensure a >= b
                 i = b;
                 b = a;
                 a = i;
             }
             for (;;) {
                 i = a % b;
                 if (i == 0)
                     return b;
                 a = b;
                 b = i;
             }
         }
     Number of integer operations:
     a            64   314,159   23,456,472
     b            54   271,828    2,324,328
     gcdSimple    58   271,829    2,324,323
     gcdEuclid     4         9           12
  35. Principle 6: Differentiate between normal and worst case
     Different requirements
       ► Normal case: efficient
       ► Worst case: "only" correct; efficiency unimportant
     Be careful with generic code and abstractions
       ► Increase maintainability
       ► Frequently decrease efficiency
     Special case treatment
       ► Often ugly
       ► Often very efficient
     Examples
       ► Speculative execution and branch prediction
       ► 2G SIM SELECT command
  36. Intermezzo: 2G SELECT
     SELECT command
       ► Selects a file/folder on a SIM card (smart card)
       ► Returns SIM card status information (access rights, PIN status, free memory)
     GET RESPONSE command
       ► Transport layer command of T=0 protocol
       ► Generic command to fetch response data from a previous command to the SIM card
     "Measure, measure, measure" principle
       ► Time to select a file: ~10 ms
       ► Time to build status information: ~10 - 90 ms
       ► SELECT command is issued a lot by mobile handsets
       ► In 90% of all cases, the handset doesn't want status info and hence sends no GET RESPONSE!
  37. Intermezzo: 2G SELECT
     "Differentiate between normal and worst case" principle
     Modified SELECT command
       ► Selects file/folder as usual
       ► Doesn't build status information
       ► But remembers that SELECT was issued as last command
     GET RESPONSE
       ► If last command was SELECT, build status information "just-in-time"
       ► Return status information to handset
     Result:
       ► Worst case: GET RESPONSE 10 - 90 ms slower (but GET RESPONSE is not used a lot)
       ► Normal case: SELECT is 2 - 10 x faster
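The deferred ("just-in-time") status build described above can be sketched in a few lines. Everything here is hypothetical: the `Card` structure, the command handlers, the 16-byte status layout and the build counter are stand-ins for illustration and do not reflect a real SIM operating system or the actual SELECT/GET RESPONSE encoding.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define STATUS_LEN 16   /* assumed size of the status block */

typedef struct {
    uint16_t selectedFile;
    bool     statusPending;   /* true if the last command was SELECT */
    unsigned statusBuilds;    /* counts expensive builds, for measuring */
} Card;

/* Placeholder for the expensive part (~10 - 90 ms in the slides). */
static void BuildStatus(Card* card, uint8_t* out) {
    memset(out, 0, STATUS_LEN);
    out[0] = (uint8_t)(card->selectedFile >> 8);
    out[1] = (uint8_t)(card->selectedFile & 0xFF);
    card->statusBuilds++;
}

/* Normal case: select the file, but only remember that status is owed. */
void CmdSelect(Card* card, uint16_t fileId) {
    card->selectedFile = fileId;
    card->statusPending = true;
}

/* Worst case: build the status only when it is actually requested.
   Returns false if there is nothing to fetch. */
bool CmdGetResponse(Card* card, uint8_t* out) {
    if (!card->statusPending) {
        return false;
    }
    BuildStatus(card, out);
    card->statusPending = false;
    return true;
}

/* Any other command invalidates the pending status. */
void CmdOther(Card* card) {
    card->statusPending = false;
}
```

The cost of `BuildStatus` moves from every SELECT to the rare GET RESPONSE that follows one, which is the trade the slide describes.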
  38. Principle 7: Fight high multiplicity
     Execution time
       ► Code that is executed frequently
       ► Examples for techniques
           ► Loop unrolling
           ► Inlining
     Footprint
       ► Redundancy in code and data
       ► Examples for techniques
           ► Factor out common code (base classes, subroutines)
           ► Data compression
     How to detect
       ► Profiler, run-time measurements
       ► Analyze map file
       ► ZIP data, measure compression rate
  39. Principle 8: Caching
     Keep data that is frequently accessed in memory
     Data that is difficult to access
       ► Read cache
           ► Example: RAM, browser cache
       ► Write cache
           ► Example: Non-volatile memory
     Data that is difficult to compute
       ► sin(), log(), data scaling and conversion
     Verify efficacy by measuring cache hit-rate!
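A cache for data that is difficult to compute, together with the hit-rate measurement the slide asks for, might look like the following sketch. The direct-mapped layout, the slot count and all names (`SlowIsqrt`, `CachedIsqrt`, `CacheHitRate`) are assumptions made for illustration; `SlowIsqrt` merely stands in for any expensive function.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_SLOTS 16u   /* assumed size; tune by measuring the hit rate */

typedef struct {
    bool     valid;
    uint32_t key;
    uint32_t value;
} CacheEntry;

CacheEntry cache[CACHE_SLOTS];
unsigned long cacheHits;
unsigned long cacheMisses;

/* Stand-in for an expensive computation:
   integer square root by linear search. */
uint32_t SlowIsqrt(uint32_t x) {
    uint32_t r = 0;
    while ((uint64_t)(r + 1) * (r + 1) <= x) {
        ++r;
    }
    return r;
}

/* Direct-mapped cache: each input maps to exactly one slot,
   a colliding input simply overwrites the previous entry. */
uint32_t CachedIsqrt(uint32_t x) {
    CacheEntry* e = &cache[x % CACHE_SLOTS];
    if (e->valid && e->key == x) {
        ++cacheHits;
        return e->value;
    }
    ++cacheMisses;
    e->valid = true;
    e->key = x;
    e->value = SlowIsqrt(x);
    return e->value;
}

/* Fraction of lookups served from the cache (0.0 if none yet). */
double CacheHitRate(void) {
    unsigned long total = cacheHits + cacheMisses;
    return total ? (double)cacheHits / (double)total : 0.0;
}
```

If `CacheHitRate()` stays low under a realistic workload, the cache only adds overhead and memory, so the measurement decides whether it stays in.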
  40. Principle 9: Precompute results
     Perform computations long before results are needed
       ► At system start-up
       ► At compile-time
     Use look-up tables
       ► Example: CRC16 checksum algorithm
  41. Principle 9: Precompute results
         uint16_t SimpleCRC16(uint8_t value, uint16_t crcin) {
             uint16_t k = (((crcin >> 8) ^ value) & 255) << 8;
             uint16_t crc = 0;
             uint16_t bits = 8;
             while (bits--) {
                 if ((crc ^ k) & 0x8000)
                     crc = (crc << 1) ^ 0x1021;
                 else
                     crc <<= 1;
                 k <<= 1;
             }
             return ((crcin << 8) ^ crc);
         }
  42. Principle 9: Precompute results
         const uint16_t CRC16_TABLE[] = {
             0x0000, 0x1021, 0x2042, 0x3063, 0x4084, 0x50A5, 0x60C6, 0x70E7,
             0x8108, 0x9129, 0xA14A, 0xB16B, 0xC18C, 0xD1AD, 0xE1CE, 0xF1EF,
             0x1231, 0x0210, 0x3273, 0x2252, 0x52B5, 0x4294, 0x72F7, 0x62D6,
               :       :       :       :       :       :       :       :       (32 x)
             0x7C26, 0x6C07, 0x5C64, 0x4C45, 0x3CA2, 0x2C83, 0x1CE0, 0x0CC1,
             0xEF1F, 0xFF3E, 0xCF5D, 0xDF7C, 0xAF9B, 0xBFBA, 0x8FD9, 0x9FF8,
             0x6E17, 0x7E36, 0x4E55, 0x5E74, 0x2E93, 0x3EB2, 0x0ED1, 0x1EF0
         };
         U16 LookupCRC16(U08 value, U16 crcin)
         {
             return (U16)((crcin << 8) ^
                          CRC16_TABLE[(U08)((crcin >> 8) ^ (value))]);
         }
     Improvement > factor 6
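Instead of shipping the table as a constant, it can also be computed once at system start-up, the other option named under "Precompute results". The sketch below derives the 256 entries from the same polynomial (0x1021) that the bitwise routine uses; the names `InitCRC16Table`, `TableCRC16` and `CRC16Buffer` are made up for this illustration, and the table lives in RAM here rather than in ROM.

```c
#include <stddef.h>
#include <stdint.h>

uint16_t crcTable[256];   /* filled once by InitCRC16Table() */

/* Precompute the effect of 8 shift/XOR steps for every possible byte. */
void InitCRC16Table(void) {
    unsigned i, bit;
    for (i = 0; i < 256; ++i) {
        uint16_t crc = (uint16_t)(i << 8);
        for (bit = 0; bit < 8; ++bit) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
        crcTable[i] = crc;
    }
}

/* One table look-up replaces the 8-iteration loop per input byte. */
uint16_t TableCRC16(uint8_t value, uint16_t crcin) {
    uint8_t index = (uint8_t)((crcin >> 8) ^ value);
    return (uint16_t)(((uint32_t)crcin << 8) ^ crcTable[index]);
}

/* Convenience wrapper: CRC over a whole buffer, starting from `crc`. */
uint16_t CRC16Buffer(const uint8_t* data, size_t len, uint16_t crc) {
    size_t i;
    for (i = 0; i < len; ++i) {
        crc = TableCRC16(data[i], crc);
    }
    return crc;
}
```

After `InitCRC16Table()`, the generated entries can be compared against the values printed on the slide, for example `crcTable[1] == 0x1021` and `crcTable[255] == 0x1EF0`. The trade-off is 512 bytes of RAM and a short start-up delay in exchange for not storing the table in the binary.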
  43. Principle 10: Exploit the processor's architecture
     Word-wise data processing
       ► Refer to MemFill example
     Natural byte-ordering ("endianness")
       ► Frequently, protocols use a byte-ordering that is different from the byte-ordering of the target architecture
       ► No problem, as long as data is only stored
       ► If data is also used internally (for computations), introduce an endianness conversion layer:
           ► When data enters the system: convert external endianness → internal endianness
           ► Perform computations with internal endianness
           ► When data leaves the system: convert internal endianness → external endianness
     Use portable integer types
       ► C99 stdint.h
       ► Exact-width types to store data (e.g. uint8_t)
       ► Fastest minimum-width types for computations (e.g. uint_fast8_t)
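An endianness conversion layer can be as small as a pair of load/store helpers at the system boundary. The sketch below assumes a protocol that transmits 16-bit values big-endian; `LoadBE16`, `StoreBE16` and `IncrementLengthField` are names invented for this illustration. It also shows the split between exact-width types for stored data and a fast type for the computation in between.

```c
#include <stdint.h>

/* Data enters the system: big-endian bytes -> native integer. */
uint16_t LoadBE16(const uint8_t* p) {
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

/* Data leaves the system: native integer -> big-endian bytes. */
void StoreBE16(uint8_t* p, uint16_t v) {
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xFF);
}

/* Example computation on a big-endian length field:
   convert once, compute in native byte order, convert back. */
void IncrementLengthField(uint8_t* field, uint16_t delta) {
    uint_fast16_t len = LoadBE16(field);   /* fast type for arithmetic */
    len = (len + delta) & 0xFFFFu;         /* wrap like a 16-bit field */
    StoreBE16(field, (uint16_t)len);       /* exact width for storage */
}
```

Because the helpers work byte by byte, they behave the same on little-endian and big-endian targets and have no alignment requirements.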
  44. Principle 11: Recode in assembly language
     After all other possibilities have been tried:
       ► No compiler/optimizer is perfect
       ► Jumps to arbitrary locations possible
       ► Stack manipulations
       ► Self-modifying code
       ► Some instructions have no high-level language equivalent:
             ADC           ; add with carry
             ROL, ROR      ; rotate left, right
             JC, JNE, JPL  ; branch based on flags
             DIV           ; div and mod at the same time
  45. Principle 11: Recode in assembly language
     Important: Don't think like a compiler!
     Instruction-level optimizations are good
       ► Analyze generated assembly code
       ► Remove redundant instructions
       ► Replace inefficient instructions with more efficient instructions
     But so-called "local optimization" is much better
       ► View instructions as tools/building blocks
       ► Combine these building blocks in an efficient (creative!) way
  46. Intermezzo: Local optimization
     Michael Abrash's programming challenge (ca. 1991)
       ► Write a function that finds the smallest/biggest value in an array
       ► Use less than 24 bytes of memory
       ► x86 assembly language (at the time: 16-bit)
             unsigned int FindHigh(int len, unsigned int* buffer);
             unsigned int FindLow(int len, unsigned int* buffer);
  47. Intermezzo: Local optimization
         _FindLow:  pop  ax          ; get return address
                    pop  dx          ; get len
                    pop  bx          ; get data pointer
                    push bx
                    push dx
                    push ax
         save:      mov  ax, [bx]    ; store min
         top:       cmp  ax, [bx]    ; compare current val to min
                    ja   save        ; if smaller, save new min
                    inc  bx          ; advance to next val, 1st byte
                    inc  bx          ; advance to next val, 2nd byte
                    dec  dx          ; decrement loop counter
                    jnz  top         ; next iteration
                    ret
     Nice, for sure...
  48. Intermezzo: Local optimization
     But he wanted both functions in 24 bytes!
         _FindHigh: db   0b9h        ; B9 = opcode of "mov cx, imm16"; it swallows the
                                     ; next two bytes (31 C9) as its operand, so cx != 0
         _FindLow:  xor  cx, cx      ; 31 C9; entered here, cx == 0
                    pop  ax          ; get return address
                    pop  dx          ; get len
                    pop  bx          ; get data pointer
                    push bx
                    push dx
                    push ax
         save:      mov  ax, [bx]
         top:       cmp  ax, [bx]
                    jcxz around      ; depending on compare mode
                    cmc              ; invert compare result
         around:    ja   save
                    inc  bx
                    inc  bx
                    dec  dx
                    jnz  top
                    ret
     Study this carefully! There is a lot to be learned!
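For comparison with the assembly listings, here is a plain C reference for the two prototypes of the challenge. It is only a behavioral sketch written for this comparison, not a contender for the 24-byte limit, and it assumes len >= 1 (the assembly versions also read the first element unconditionally).

```c
/* Returns the smallest value in buffer[0 .. len-1]; requires len >= 1. */
unsigned int FindLow(int len, unsigned int* buffer) {
    unsigned int low = buffer[0];
    int i;
    for (i = 1; i < len; ++i) {
        if (buffer[i] < low) {
            low = buffer[i];
        }
    }
    return low;
}

/* Returns the biggest value in buffer[0 .. len-1]; requires len >= 1. */
unsigned int FindHigh(int len, unsigned int* buffer) {
    unsigned int high = buffer[0];
    int i;
    for (i = 1; i < len; ++i) {
        if (buffer[i] > high) {
            high = buffer[i];
        }
    }
    return high;
}
```

The two C functions differ only in the direction of one comparison, which is exactly the difference the assembly version folds into a single shared body by toggling the carry flag.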
  49. Summary
     There will always be a need for efficient software
     Efficiency must be a requirements issue
     Correctness and maintainability usually have higher priority
     Prove that an optimization is necessary
     Prove that optimization works
     Prerequisites: knowledge and a systematic approach
     For outstanding efficiency: creativity and passion required
     Michael Abrash: "The best optimizer is between your ears"
  50. http://www.approxion.com
