BECKER et al.: DETECTING SOFTWARE THEFT IN EMBEDDED SYSTEMS 1145

Fig. 1. Pipelining concept of the ATmega8. While one instruction is executed, another one is prefetched from the program memory.

mark. The side-channel software watermark consists of only a few instructions that are inserted at the assembly level. These instructions leak information through the power consumption of the tested design and can be detected by a trusted verifier with methods very similar to classic side-channel attacks. These watermarks provide higher robustness against code-transformation attacks while introducing only a very small overhead in terms of performance and code size.

The two plagiarism-detection methods described above can be used by a verifier to detect whether or not his code is present in an embedded system. However, these methods cannot be used to prove to a third party that the code or the watermark belongs to the verifier. The original side-channel software watermark transmits only one bit of information, namely whether the watermark is present or not, and does not contain any information about the owner of the watermark. We, therefore, extended the side-channel watermark so that it can transmit a digital signature. The verifier can use this digital signature to prove to a third party, e.g., a judge, that he is the owner of the watermark. In most cases, this will imply that he is also the owner of the part of the software code that contains this watermark.

The rest of the paper is organized as follows. Section II explains in detail how the detection based on Hamming weights works. In Section III, the software side-channel watermark is introduced. Section IV shows how this watermark idea can be extended to provide proof-of-ownership. We then discuss the robustness of the proposed side-channel software watermark in Section V and conclude the paper with a short summary of the achieved results in the last section.

II. PLAGIARISM DETECTION WITH STRING MATCHING ALGORITHMS

Our first approach is intended to provide an indication of software plagiarism in cases where inserting copy-detection mechanisms into the code is not possible, e.g., for already existing products.

Let us begin with a definition of the Hamming weight of a number, as it is central to this approach. Assume that $x$ is given by its $\ell$-bit representation $x = \sum_{i=0}^{\ell-1} x_i 2^i$ with $x_i \in \{0, 1\}$. The Hamming weight $HW(x)$ is then defined as the number of 1's, i.e.,

$$HW(x) = \sum_{i=0}^{\ell-1} x_i.$$

We now propose an efficient method that makes use of the fact that the power consumption of the microcontroller is related to the Hamming weight of the prefetched opcode. This dependency allows us to map the power consumption of every clock cycle, which may consist of thousands of sampling points, to a single Hamming weight. The result is a strong dimensionality reduction of the power traces. For subsequent analyses, string matching algorithms are applied to compare original and suspicious software executions.

Our target device is the 8-bit Atmel AVR ATmega8 microcontroller. The ATmega8 is a very widely used microcontroller based on the RISC design strategy. Like most RISC microcontrollers, the ATmega8 uses a Harvard architecture, i.e., program and data memory are physically separated and do not share one bus. The main advantage of this architecture is a pipelining concept that allows an opcode to be prefetched while another instruction is executed (see Fig. 1).

The opcodes of the ATmega8 have a 16-bit format and consist of the bit representation of the instruction and the corresponding operands or literals.¹ The instruction AND Rd, Rr, e.g., performs a logical AND between the two registers Rd and Rr, $0 \le d \le 31$, $0 \le r \le 31$, and is given by the opcode

$$0010\;00r_4d_4\;d_3d_2d_1d_0\;r_3r_2r_1r_0,$$

where $d_4 \ldots d_0$ and $r_4 \ldots r_0$ are the binary representations of $d$ and $r$, respectively.

A closer look at the power consumption reveals that the prefetching mechanism leaks the Hamming weight of the opcode.² Fig.
2 shows the power consumption of random instructions with different Hamming weights of the prefetched opcode, recorded at a clock rate of 1 MHz. Right before the rise to the second peak, the Hamming weights of the prefetched opcodes are clearly distinguishable from each other.

¹ The only exceptions are four (of 130) instructions that access the program or data memory. These instructions can be described by either 16- or 32-bit opcodes. Thirty-two-bit opcodes are fetched in two clock cycles.

² We focused on the ATmega8 here. However, we observed the same leakage behavior for other microcontrollers of the AVR family by Atmel and the PIC family by Microchip. Both families use the Harvard architecture with separate buses for data and instruction memory.

Our prediction was verified by analyzing an AES implementation on the ATmega8. The histogram in Fig. 3 depicts the distribution of the power consumption at the point of leakage for about 6 million instructions of multiple AES encryptions. The intersections between the peaks were taken as threshold values when mapping the voltages to Hamming weights. In this
way, the power traces are converted to strings of Hamming weights, which means that every clock cycle is represented by only one value. In our tests, the probability of a correct mapping to the Hamming weight was close to 100%, and we were able to reduce the size of one power trace from approximately 3 750 000 measurement points to a string of only 3000 Hamming weights. Due to this reduction, the computational effort for analyzing the strings stays low, i.e., the string matching algorithms discussed in the following typically take no longer than a few seconds on a modern PC.

There is one instruction that does not comply with our leakage model. The instruction LPM Rd, Z, which loads one byte of the program memory into the register Rd, does not leak the Hamming weight of the fetched opcode. Depending on the implemented algorithm, this should be considered during the detection process.

1146 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 7, NO. 4, AUGUST 2012

Fig. 2. Power consumption of instructions with different Hamming weights. There is no 16-bit instruction with a Hamming weight of 16.

Fig. 3. Histogram of the power consumption at the moment of leakage in every clock cycle. For this purpose, 6 000 000 instructions of multiple AES executions were recorded.

A. Detection

The Hamming weight strings can now be used to indicate software plagiarism. The idea can be summarized in three steps. First, the execution flow of the original software is mapped to a string of Hamming weights, denoted as $P$. This is realized either by measuring and evaluating the power consumption of the execution as stated above or by simulating the execution flow and calculating the Hamming weights from the opcodes. Note that, e.g., branch instructions need more than one clock cycle and hence, during their execution, more than one opcode is prefetched from the program memory.

The second step is to record a power trace of the suspicious device to obtain a second string of Hamming weights, denoted as $T$.
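The two ways of obtaining a Hamming weight string described above (simulating the opcodes for the original software, and mapping measured voltages through thresholds for the suspicious device) can be sketched in Python. This is a minimal illustration, not the authors' tooling: all function names are hypothetical, the AND encoding follows the opcode format given earlier, and the voltage mapping assumes that the sampled value grows monotonically with the Hamming weight and that the sorted thresholds (e.g., the histogram intersections of Fig. 3) are already known.

```python
from bisect import bisect_right


def hamming_weight(x: int) -> int:
    """Number of 1 bits in the binary representation of a non-negative integer."""
    return bin(x).count("1")


def avr_and_opcode(d: int, r: int) -> int:
    """16-bit opcode of AND Rd, Rr: 0010 00rd dddd rrrr (0 <= d, r <= 31)."""
    if not (0 <= d <= 31 and 0 <= r <= 31):
        raise ValueError("register index out of range")
    # fixed bits | bit 4 of r moved to bit 9 | d in bits 8..4 | low nibble of r
    return 0x2000 | ((r & 0x10) << 5) | (d << 4) | (r & 0x0F)


def hw_string_from_opcodes(opcodes):
    """Simulation route: one Hamming weight per prefetched 16-bit opcode."""
    return [hamming_weight(op & 0xFFFF) for op in opcodes]


def hw_string_from_voltages(samples, thresholds):
    """Measurement route: map the sample taken at the leakage point of each clock
    cycle to a Hamming weight. `thresholds` must be sorted ascending; a sample below
    the first threshold maps to 0, one between the first and second maps to 1, etc."""
    return [bisect_right(thresholds, v) for v in samples]
```

For example, `hw_string_from_opcodes([avr_and_opcode(16, 17)])` gives `[4]`: the fixed opcode bits contribute 1, and the register indices 16 and 17 contribute 1 and 2. In general the weight of this opcode is $1 + HW(d) + HW(r)$, so substituting a register shifts the weight of every instruction that uses it.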
In a third step, a one-to-one copy can now be detected by comparing the Hamming weight string $P$ of the original software with the string $T$ of the suspicious device.

In fact, a one-to-one copy is an idealized assumption of software plagiarism and is, therefore, easy to detect. In the following, we discuss how the verifier can also detect minor modifications.

1) Register Substitution: The easiest modification that influences the Hamming weights is to substitute the working registers. There are 32 general-purpose working registers on the ATmega8, which means that one can choose between six different Hamming weights (register R0 to register R31). However, a substitution of a register affects the Hamming weight of every instruction that processes the data of the changed register. For instance, the short assembler code results in the string (cf. ). If the attacker substitutes, e.g., with , this affects lines 1 and 3 and leads to . To account for register substitutions, the verification process can be extended by a generalized string that captures the dependencies between the registers in different instructions. For instance, the string of our short example could be changed to , where 6, 7, and 2 are the minimum Hamming weights of the given instructions excluding the registers, and $x$ and $y$ are variables for the Hamming weights of the registers. The verifier now has to check whether there exist $x, y \in \{0, \ldots, 5\}$ such that the generalized string and $T$ match. Of course, a long string with many dependencies is more meaningful than a short string such as the one in our example.

2) Partial Copy: Instead of scanning the whole source code, it might make sense to focus only on the important parts of the program to detect partial copies of the original code. In this case, approximate string matching algorithms like the generalized Boyer–Moore–Horspool algorithm are well suited to find all positions where these patterns occur in the string $T$. The difference from an exact string matching algorithm is that the generalized Boyer–Moore–Horspool algorithm tolerates a predefined number of mismatches.
This is necessary because of infrequent erroneous mappings of power consumptions to Hamming weights, or to account for the usage of LPM instructions. The generalized Boyer–Moore–Horspool algorithm solves the $k$ mismatches problem in $O\big(kn\,(k/|\Sigma| + 1/(m-k))\big)$ time on average, with $n$ being the length of the string, $m$ the length of the pattern, and $\Sigma$ the alphabet. Additionally, a preprocessing step is executed once for every pattern.

3) Insert, Delete, or Substitute Instructions: A metric that is often used for approximate string matching is the Levenshtein distance of two strings, also known as the edit distance. It was
introduced by Levenshtein in 1965 and is defined as the minimum number of edits necessary to transform one string into another, in our case the string $P$ of the original software into the string $T$ of the suspicious device. Possible edits are insertion (add one character of $T$ to the string $P$), deletion (remove one character of the string $P$), and substitution (replace a character of $P$ by a character of $T$).

Let us denote the length of $P$ as $m$ and the length of $T$ as $n$. Then the Levenshtein distance $d(P, T)$ is determined by creating an $(m+1) \times (n+1)$ matrix $D$ with the following steps:

i) Initialize the first row and column: $D_{i,0} = i$ for $0 \le i \le m$ and $D_{0,j} = j$ for $0 \le j \le n$.

ii) For all $1 \le i \le m$ and $1 \le j \le n$, set $D_{i,j} = D_{i-1,j-1}$ if no edit is necessary, i.e., if $p_i = t_j$, with $p_i$ and $t_j$ denoting the $i$th character of $P$ and the $j$th character of $T$, respectively, and $D_{i,j} = 1 + \min(D_{i-1,j},\, D_{i,j-1},\, D_{i-1,j-1})$ otherwise.

The overall distance between $P$ and $T$ is the value stored in $D_{m,n}$. An example is given in Fig. 4. Since every cell of the matrix has to be computed, the computational complexity is bounded by $O(mn)$. With this method, the verifier is able to measure the degree of modification or simply use the distance as an indication of plagiarism.

Fig. 4. Computation of the Levenshtein distance between two strings. The gray boxes illustrate the trace that is followed to get to the highlighted overall distance 2.

4) Handling Unpredictable Branches or Interrupts: In some cases, it is impossible to acquire measurements of the suspicious implementation with the same input data as the reference implementation. Another scenario is that the underlying algorithm is nondeterministic regarding the order of executed subroutines due to random parameters. As a result, a different path of subroutines might be executed for every measurement. While scanning the whole string for a partial match, e.g., a subroutine, could still be successful, comparing the whole strings is more complicated.

One technique that can help here is the dot plot, which comes from the field of bioinformatics, where it is mostly applied to DNA sequences to illustrate similarities. Yankov et al. adopted this technique to analyze two time series after converting them into a symbolic representation. The idea is very simple: to compare two strings $P$ with length $m$ and $T$ with length $n$, an $m \times n$ matrix is generated in which position $(i, j)$ is marked if $p_i$ and $t_j$ match. A small example is given in Fig. 5.

Fig. 5. Dot plot of two example strings. Every character match is highlighted by a gray box.

Two interesting observations can be made from this plot. First, equal substrings are represented as diagonal lines. In our example, the substring 1, 5, 2 appears in both strings. Second, a mirrored substring is marked as a diagonal line that is perpendicular to the main diagonal. This is the case for 2, 8, 6 and 6, 8, 2.

We applied this technique to two Hamming weight strings of different software implementations, both containing two equal subroutines. A small detail is depicted in Fig. 6(a). As there is much noise in this plot, we removed every match that is shorter than four characters. The result is given in Fig. 6(b), where at least two diagonal lines are suspicious: the first starts at about character 60 in HW string 2 and the second at about character 240. Another interesting point is that the second line is interrupted. This is due to the insertion of some dummy instructions, intended to simulate an interrupt service routine, into the subroutine of the second implementation.

Fig. 6. Comparing two HW strings with dot plots (a) before and (b) after removing noise. The diagonal lines illustrate the matches between the strings.

The methods proposed in this section provide only an indication of plagiarism. To make a more precise statement, the software code has to be modified in advance, as we discuss in Sections III–V.

III. SIDE-CHANNEL SOFTWARE WATERMARK

The main idea behind the side-channel software watermark, which we first introduced in earlier work, is to use the power consumption as a hidden communication channel to transmit a watermark.
This idea is related to previously proposed side-channel-based hardware watermarks. The watermark is hidden in the power consumption of the tested system and can only be revealed by a verifier who possesses the watermark secret. The watermark consists of two components, 1) a combination function and 2) a leakage generator, and is realized by adding instructions at the assembly level to the targeted code. The combination function uses some known internal state of the program and a watermark constant to compute a one-bit output. This output bit is then leaked out (transmitted) via the power consumption using a leakage generator. The leakage generator is realized by one or several instructions whose power dissipation depends on the value of the combination function. This results in a power consumption that depends on the known internal state, the combination function, and the watermark constant. To detect the watermark, the verifier can use his knowledge of the watermark design to perform a correlation power analysis (CPA), similar to a classic side-channel attack. If this power analysis is successful, i.e., the watermark signal is detected in the power traces, the verifier can be sure that his watermark is embedded in the device. In many applications, this will imply that a copy of the original software is present.

In the following, we first describe the details of our proof-of-concept implementation before we explain how this watermark can be detected by means of a side-channel analysis.

A. Implementation

To show the feasibility of our approach, we implemented a side-channel watermark on the ATmega8 microcontroller. As mentioned, the watermark consists of a combination function and a leakage generator. In our case, the input to the combination function is a 16-bit internal state and a 16-bit watermark constant.
The combination function computes a one-bit output, which is leaked out using a leakage generator. There are many ways to implement a combination function, and the combination function introduced in this paper should only be seen as an example. In our implementation, we chose a very small and compact combination function that consists of only four assembler instructions. The 16-bit input to the combination function is separated into two byte values $a$ and $b$. In the first step, the one-byte watermark constants $c_1$ and $c_2$ are subtracted from $a$ and $b$, respectively. In the next step, the two one-byte results of these subtractions are multiplied with each other. The resulting two-byte value is again separated into two one-byte values $d$ (high byte) and $e$ (low byte), which are then multiplied with each other again:

$$256\,d + e = \big((a - c_1) \bmod 256\big) \cdot \big((b - c_2) \bmod 256\big), \qquad f = d \cdot e.$$

The output of the combination function is the eighth least significant bit $f_7$ of the result $f$ of this multiplication. The corresponding assembler code can be found as follows:

The two registers holding $a$ and $b$ are used as the internal state, and the integers 35 and 202 are the two watermark constants $c_1$ and $c_2$. The SUBI instruction subtracts the constant 35 from the register holding $a$ and stores the result back in that register. In the ATmega8 instruction set, the two-byte result of the multiply instruction is always stored in R1 (high byte) and R0 (low byte). The result of the combination function is therefore the most significant bit of R0.

A leakage generator is used to leak out the output bit of the combination function. In our implementation, we used a conditional jump as our leakage generator. If the output bit is 1, we compute the two's complement of a register; otherwise, no operation is executed. We furthermore store the value of this register in the memory. The following is the corresponding assembler code:

Recall that the output of the combination function is the most significant bit of R0. The instruction SBRC R0, 7 checks the most significant bit of R0 and skips the next instruction if this bit is 0. Otherwise, the NEG instruction is executed, which computes the two's complement of the register.
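To make the bit-level behavior concrete, the combination function and the skip decision of the leakage generator can be simulated in Python. This is a hedged sketch, not the authors' code: it assumes standard AVR semantics (SUBI wraps modulo 256, MUL writes the unsigned 16-bit product to R1:R0, and the skip over NEG happens when bit 7 of R0 is 0), uses the constants 35 and 202 from the text, and `value` is a hypothetical stand-in for the register that NEG operates on.

```python
C1, C2 = 35, 202  # watermark constants c1 and c2 from the text


def combination_function(a: int, b: int, c1: int = C1, c2: int = C2) -> int:
    """Watermark bit for the internal-state bytes a and b (each 0..255)."""
    x = (a - c1) & 0xFF           # SUBI: 8-bit subtraction wraps modulo 256
    y = (b - c2) & 0xFF
    product = x * y               # first MUL: unsigned 16-bit result in R1:R0
    r1, r0 = product >> 8, product & 0xFF
    second = r0 * r1              # second MUL: multiply the two product bytes
    return (second >> 7) & 1      # bit 7, i.e. the most significant bit of the new R0


def leakage_generator(bit: int, value: int) -> int:
    """Register value after the skip/NEG step: unchanged if bit == 0, else negated."""
    if bit == 0:
        return value              # the conditional skip jumps over NEG
    return (-value) & 0xFF        # NEG: 8-bit two's complement
```

For instance, `combination_function(51, 226)` returns 1: the subtractions give 16 and 24, their product 384 splits into the bytes 1 and 128, and 1 · 128 = 128 has bit 7 set.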
In the last step of the leakage generator, the value of the (possibly negated) register is stored in the memory.

The leakage generator helps the verifier to detect the watermark. The difference in power consumption between the case in which the NEG instruction is executed and the case in which it is skipped is very large. This makes detecting the watermark using side-channel analysis straightforward.

We were also able to successfully detect the watermark without any leakage generator. This is because the power consumption of the last multiply instruction is higher if the output bit is 1 than if it is 0. However, the leakage generator makes detection much easier and can also protect against reverse-engineering and code-transformation attacks (see Section V). In Section III-B, we give more details on how the watermark detection works.

B. Watermark Verification

To detect the watermark, we use a correlation power analysis (CPA). The main idea of a CPA is to exploit the fact that the power consumption of a device depends on the executed algorithm as well as on the processed data. However, this data-dependent power consumption might be too small to be observed with a simple power analysis such as the one used in Section II. Therefore, many traces are measured in a CPA, and statistical methods are then used to extract the wanted information. In a classical CPA setting, the goal is to retrieve a secret key. In the watermark case, the verifier does not want to retrieve a secret key but wants to verify whether or
not his watermark is present. To do this, the verifier first collects power traces with different inputs of the system under test. For each trace, the verifier computes the known internal state that is used as the input to the combination function (in our implementation, the values of the two state registers). The internal state used for the watermark should be a varying state that is predictable for the verifier, e.g., a state depending on the input values. The verifier uses this internal state and the watermark constants to compute the output of the combination function for each input value and stores these values as the correct hypothesis. He repeats this procedure $N-1$ times using different watermark constants or combination functions. At the end, the verifier has $N$ hypotheses with different watermark constants or combination functions, one of which contains the correct watermark constant and combination function.

In the last step, the verifier correlates the hypotheses with the measured power traces. If the watermark is embedded in the tested device, a correlation peak should be visible for the hypothesis with the correct watermark constant, because the correct hypothesis is the best prediction of the power consumption during the execution of the leakage circuit. A more detailed introduction to this side-channel analysis method can be found in the CPA literature. The result of the CPA on our example implementation can be found in Figs. 7 and 8.

Fig. 7. Result of the side-channel analysis plotted (a) against time and (b) with respect to different hypotheses, where hypothesis number 100 is the correct one.

Fig. 8. Results of the side-channel analysis with respect to the number of measurements. Even with fewer than 100 measurements, the correct hypothesis can be clearly detected.

In Fig.
8, we can see that the watermark can be detected with fewer than 100 measurements.

Other microcontrollers will have a different power behavior and, therefore, the number of traces needed to detect the watermark might vary from CPU to CPU. It should be noted that a few hundred traces are easily obtained from most practical embedded systems, and it is reasonable to assume that a verifier can use many more measurements if needed. Hence, even if the signal-to-noise ratio decreases for other microcontrollers, it is safe to assume that detection of this kind of watermark will be possible in most cases. It should also be noted that the length of the code being watermarked does not affect the signal-to-noise ratio of the detection: the number of instructions executed before or after the watermark makes no difference in this type of side-channel analysis.

C. Triggering

One important aspect of software watermark detection is the triggering and alignment of the power measurements. To take a power measurement, we need a trigger signal that tells the oscilloscope to start the measurement. In practice, a communication signal or the power-up signal is usually used as the trigger signal. Other possible trigger points might be an unusually low or high power consumption, e.g., when data is written to a nonvolatile memory position, when the microcontroller wakes up from a sleep mode, or when a coprocessor is activated. Modern oscilloscopes have advanced triggering mechanisms in which several trigger conditions can be combined, e.g., a specific signal on the bus followed by an unusually high power consumption. Because these trigger points might not be close to the inserted watermark, the verifier might need to locate and align the watermark in a large power trace. With some knowledge of the underlying design, it is usually possible to guess a time window in which the watermarked code is executed.
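Why alignment matters, and how a simple pattern-matching alignment works, can be seen in a toy Python simulation. This is a sketch under stated assumptions, not the authors' procedure: each simulated trace is Gaussian noise plus a hypothetical known waveform (`PATTERN`, shaped like a Barker-7 code) standing in for deterministic instructions executed just before the leakage generator, followed by one sample that leaks the watermark bit; the offset between trigger and watermark jitters randomly by up to `max_delay` samples. Correlating the bit with one fixed sample largely fails, while aligning every trace on the maximum dot product with `PATTERN` restores the correlation.

```python
import random

PATTERN = [2.0, 2.0, 2.0, -2.0, -2.0, 2.0, -2.0]  # hypothetical known waveform (Barker-7 shape)
POS, LENGTH, NOISE = 20, 60, 0.5                 # nominal start, trace length, noise std. dev.


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5


def make_trace(bit, offset, rng):
    """Gaussian noise + PATTERN starting at POS + offset, then one sample leaking `bit`."""
    trace = [rng.gauss(0.0, NOISE) for _ in range(LENGTH)]
    start = POS + offset
    for k, p in enumerate(PATTERN):
        trace[start + k] += p
    trace[start + len(PATTERN)] += bit
    return trace


def jittered_traces(n, max_delay=15, seed=0):
    """n traces whose offset is drawn uniformly from 0..max_delay."""
    rng = random.Random(seed)
    bits, traces = [], []
    for _ in range(n):
        bit = rng.randrange(2)
        bits.append(bit)
        traces.append(make_trace(bit, rng.randint(0, max_delay), rng))
    return bits, traces


def best_offset(trace, max_delay=15):
    """Pattern matching: start index in [POS, POS + max_delay] maximizing the dot product."""
    def score(s):
        return sum(p * trace[s + k] for k, p in enumerate(PATTERN))
    return max(range(POS, POS + max_delay + 1), key=score)


def corr_fixed(bits, traces, index):
    """Correlation between the bits and the same sample index in every trace."""
    return pearson(bits, [t[index] for t in traces])


def corr_aligned(bits, traces, max_delay=15):
    """Correlation after aligning every trace on PATTERN."""
    leak = [t[best_offset(t, max_delay) + len(PATTERN)] for t in traces]
    return pearson(bits, leak)
```

In this toy setting the fixed-sample correlation stays close to zero, whereas the aligned correlation approaches the value of roughly 0.7 permitted by the assumed noise level; real measurements generally call for more robust alignment methods.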
Looking for power patterns that are caused, e.g., by a large number of memory lookups or the activation of a coprocessor can also help to identify the time window in which the watermark is executed. Once this time window is located, alignment mechanisms such as simple pattern matching algorithms or more advanced alignment methods are used to align the power traces with each other.

Basically, the problem of triggering and aligning the power traces for a side-channel watermark is similar to the problem of triggering and aligning the power traces
in a real-world side-channel attack. Often, in a real-world side-channel attack, the attacker actually has less knowledge of the attacked system than the verifier has for detecting the watermark. The verifier knows the code and flow of his watermarked program, while in a real-world attack the attacker can often only guess how it is implemented. We would, therefore, like to refer to the area of real-world side-channel attacks for more details on the feasibility of triggering a measurement in practice. The alignment of power traces will be addressed in more detail in Section V-B when we describe how to overcome the insertion of random delays as one of the possible attacks on our watermark.

IV. PROOF-OF-OWNERSHIP

The watermark discussed in the previous section transmits only one bit of information: either the watermark is present or not. This is very helpful to detect whether or not your code was used in an embedded system. However, it is not possible for the verifier to prove to a third party that he is the legitimate owner of the watermark, because the watermark itself does not contain any information about the party who inserted it. Therefore, anyone could claim to be the owner of the watermark once he detects it in a system. We call the ability to prove to a third party that one is the legitimate owner of the watermark proof-of-ownership. In this section, we show how the watermark idea from Section III can be expanded to also provide proof-of-ownership.

The idea to establish proof-of-ownership is to modify the side-channel watermark in such a way that it transmits a digital signature that can uniquely identify the owner of the watermark. One nice property of the side-channel watermark is that it is hidden in the noise of the power consumption of the system.
Without knowledge of the watermark secret, the presence of the watermark cannot be detected. We have thus already established a hidden communication channel. In Fig. 7(a), we can observe a positive correlation peak while the leakage circuit is being executed. Recall that the leakage circuit is designed in such a way that the power consumption is higher when the output of the combination function is “1.” If we change the leakage generator so that the power consumption is higher when the output bit is “0” instead of “1,” the correlation will be inverted, i.e., we would see a negative instead of a positive correlation peak. We can use this property to transmit information. If the bit we want to transmit is “0,” we invert the output bit of the combination function; if it is “1,” we do not invert the output bit. By doing so, we know that a “1” is being transmitted when we see a positive correlation peak and a “0” when we see a negative correlation peak. We can use this method to transmit data one bit at a time.

We tested this kind of watermark by using the same combination function as discussed in Section III but exchanged the leakage generator. We stored an 80-bit signature that we want to transmit with our watermark in the program memory. We then load the signature, one byte at a time, from the program memory and subsequently add the output bit of the combination function to the bit we want to transmit. The resulting bit is then leaked out using a conditional jump, just as in Section III-A.

Fig. 9. Side-channel watermark that transmits an ID. Positive correlation peaks indicate that a “1” is being transmitted; negative correlation peaks indicate a “0.” In this figure, we can see how the hexadecimal string “E926CFFD” is being transmitted.

The same detection method as explained in Section III-B is used to detect the watermark and to read out the transmitted signature. Fig. 9 shows the result of this correlation-based power analysis.
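The detection and read-out procedure can be illustrated with a small simulation. The following Python sketch is a deliberately simplified toy, not the authors' measurement setup: it assumes one leakage sample per transmitted bit in each trace, models that sample as the (possibly inverted) watermark bit plus Gaussian noise, draws the internal-state bytes at random but treats them as known to the verifier, and re-implements the example combination function from Section III-A as `comb`. The verifier correlates a hypothesis with every time slot; the magnitude of the correlation separates correct from wrong constants, and its sign recovers the transmitted bit.

```python
import random


def comb(a, b, c1, c2):
    """Example combination function: bit 7 of (low byte * high byte) of (a-c1)*(b-c2)."""
    p = ((a - c1) & 0xFF) * ((b - c2) & 0xFF)
    return (((p & 0xFF) * (p >> 8)) >> 7) & 1


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5


def simulate(signature_bits, n_traces, c1, c2, noise=1.0, seed=0):
    """Toy traces: one leakage sample per signature bit, plus the known states."""
    rng = random.Random(seed)
    states, traces = [], []
    for _ in range(n_traces):
        st = [(rng.randrange(256), rng.randrange(256)) for _ in signature_bits]
        trace = []
        for (a, b), s in zip(st, signature_bits):
            out = comb(a, b, c1, c2)
            leaked = out if s == 1 else 1 - out   # invert the output to send a "0"
            trace.append(leaked + rng.gauss(0.0, noise))
        states.append(st)
        traces.append(trace)
    return states, traces


def correlate(states, traces, c1, c2):
    """Correlation of the hypothesis (c1, c2) with every time slot."""
    result = []
    for j in range(len(traces[0])):
        hypothesis = [comb(*st[j], c1, c2) for st in states]
        samples = [trace[j] for trace in traces]
        result.append(pearson(hypothesis, samples))
    return result


def read_signature(correlations):
    """Positive correlation peak -> 1, negative correlation peak -> 0."""
    return [1 if c > 0 else 0 for c in correlations]
```

With the constants 35 and 202 and, say, the bits of 0xE9 (the first byte of the string in Fig. 9) and a few hundred simulated traces, the correct hypothesis should show correlations of large magnitude whose signs reproduce the bits, while a wrong hypothesis such as (36, 202) should stay near zero.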
The positive and negative correlation peaks over time represent the transmitted signature. The resulting watermark is still quite small. In our example implementation, the watermark consists of only 15 assembler instructions for the leakage generator and only 4 instructions for the combination function. We also used 80 bits of the program memory to store the digital signature. If storing the signature in the program memory is too suspicious, it is also possible to implement the leakage generator without a load instruction by using constants. This might increase the code size slightly, but it is still possible to program the leakage generator with around 30 instructions for an 80-bit signature on an 8-bit microcontroller. On a 16-bit or 32-bit architecture, smaller code sizes can be achieved.

V. ROBUSTNESS AND SECURITY ANALYSIS OF THE SOFTWARE WATERMARK

In Sections III and IV, we introduced our side-channel watermarks and showed that we are able to reliably detect them. However, so far we have not discussed the security of the watermark. Traditionally, the security of watermarks against different attacks is called robustness. To the best of our knowledge, there is no completely robust software watermark that can withstand all known attacks. For software watermarks, and especially side-channel watermarks, it is very difficult to quantify robustness. We do not claim that our watermark is “completely robust” or secure: given sufficient effort, the side-channel software watermark can be removed. In the following, we introduce our security model and describe some possible attacks against the system. We provide arguments for why these attacks can be nontrivial in practice.
Hence, we show that the watermark, although not impossible to remove, still represents a significant obstacle for attackers.

In the security model of the software watermark, three parties are involved: the owner of the watermark, who inserted it; the verifier, who locates the watermark in a suspected device; and an attacker, who tries to remove the watermark from
a software code. The attacker only has access to the assembler code of the watermarked program. The attacker knows neither the design of the combination function nor which part of the assembler code implements it, nor which internal states or constants it uses. This knowledge is considered the watermark secret. The verifier needs to be a trusted third party who shares the watermark secret with the owner of the watermark. A successful attack is defined as follows:

A transformation of the watermarked software code that 1) makes it impossible for the verifier to locate the watermark by means of side-channel analysis and 2) does not change the functionality of the software program.

Hence, an attack is unsuccessful if either the verifier is still able to detect the software watermark or the resulting software code no longer fulfills the intended purpose of the program. We will discuss three different attack approaches to remove the watermark from the assembler code.

Reverse-engineering attack: In a reverse-engineering attack, the attacker uses reverse-engineering techniques to locate the assembler instructions that implement the watermark, so that he can remove or alter these instructions.

Code-transformation attacks: In a code-transformation attack, the attacker uses automated code transformations to change the original assembler code in such a way that the resulting code still functions correctly but watermark detection is impossible.

Side-channel attacks: In a side-channel attack, the attacker uses side-channel techniques to locate the side-channel signal in the power consumption. This tells the attacker where some of the watermark instructions (e.g., the leakage generator) are located.

In the following, we discuss each of the three attacks in more detail.

A. Reverse-Engineering Attack

If the attacker can reverse-engineer the entire code and identify the purpose of each instruction and function, the attacker also knows which instructions are not directly needed by the program and are, therefore, possible watermark instructions. However, complete reverse-engineering of the assembler code can be very difficult and time consuming, especially for larger programs. Furthermore, complete reverse-engineering might be more expensive than implementing the program from scratch, which makes product piracy not cost effective if reverse-engineering is needed. An attacker can also try to locate the watermark without reverse-engineering the entire code. For example, the attacker could use techniques such as data-flow diagrams to detect suspicious code segments, which he can then investigate further. The complexity of such attacks depends on the attacker's reverse-engineering skills as well as on the way the watermark is embedded in the code.

We believe that, due to the small size of the watermarks, locating them with reverse-engineering methods can be very expensive for the attacker, especially in larger designs, which are usually the more attractive targets for software theft. Another attractive property of the side-channel watermarks is that they are hidden in the power consumption of the system. This means that an attacker cannot tell whether or not a watermark is present in the code. So even if he locates and removes one or several side-channel watermarks from a code, he cannot be sure that no further watermarks remain. Since some watermarks consist of only 5–10 assembly instructions, adding multiple watermarks is still very economical. This may discourage attackers from stealing the code, as reverse-engineering the entire code is necessary to ensure that all watermarks have been removed.

B. Code-Transformation Attacks

In an automated code-transformation attack, software is used to change the program code without changing the semantically correct execution of the program. Examples of code transformations are recompiling the code, reordering instructions, and obfuscation techniques such as replacing one instruction with one or more instructions that have the same result. Code transformations can be a very powerful tool for disabling software watermarks, as has been shown in , where all tested static software watermarks for Java bytecode were successfully removed with standard obfuscation tools.

Let us first consider the impact of instruction reordering and the insertion of dummy instructions on our side-channel watermark. If the attacker uses these methods, the leakage generator may be executed in a different clock cycle than in the original code. For detection, this means that the correlation peak will appear at a different clock cycle. However, the correlation peak will be just as visible as without the reordering, since inserting a static delay does not decrease the signal-to-noise ratio. Therefore, simple reordering and the insertion of dummy instructions cannot prevent the verifier from detecting the watermark.

However, if the attacker adds a random rather than a static delay, this will have a negative impact on watermark detection. Random delays have the effect that the measurement traces are no longer aligned with each other, i.e., the clock cycle in which the leakage generator is executed varies from measurement to measurement. Unaligned traces hamper side-channel analysis, but detection can still be successful if enough traces are aligned with each other . It is not always easy to insert efficient random delays into the code: a source of randomness is needed, for example, and simply measuring the execution time of a program might give an indication of the random delay introduced.
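The difference between static and random delays, and why the alignment methods described next can undo the latter, can be illustrated with a toy simulation in plain Python. This is a sketch under stated assumptions, not the authors' setup: each trace holds one sample per clock cycle, the leakage generator leaks the Hamming weight of a hypothetical combination function (an input byte XORed with the constant 0x5A) plus Gaussian noise, and a fixed, noise-free three-sample pattern (the hypothetical `PATTERN`) precedes the leakage cycle so that a least-squares pattern search can realign the traces. That search is only a toy stand-in for the pattern-based alignment used in the experiments below; the names `simulate`, `cpa_peak`, `realign`, and `watermark_output` are illustrative.

```python
import random

# Hypothetical, noise-free power pattern of the instructions right before the leakage cycle.
PATTERN = [12.0, 0.0, 12.0]


def hamming_weight(x: int) -> int:
    return bin(x).count("1")


def watermark_output(x: int, secret: int = 0x5A) -> int:
    # Hypothetical combination function: input byte XOR a secret constant.
    return x ^ secret


def pearson(xs, ys) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column (e.g., the fixed pattern) carries no correlation
    return sxy / (sxx * syy) ** 0.5


def simulate(inputs, delays, n_cycles, leak_cycle, noise_rng):
    """Return one trace per input; each trace holds one power sample per clock cycle."""
    traces = []
    for x, d in zip(inputs, delays):
        trace = [noise_rng.gauss(4.0, 1.0) for _ in range(n_cycles)]
        start = leak_cycle + d - len(PATTERN)
        trace[start:start + len(PATTERN)] = PATTERN
        # Leakage generator: Hamming weight of the combination-function output plus noise.
        trace[leak_cycle + d] = hamming_weight(watermark_output(x)) + noise_rng.gauss(0.0, 1.0)
        traces.append(trace)
    return traces


def cpa_peak(traces, inputs):
    """CPA with a Hamming-weight model; returns (clock cycle, rho) of the largest |rho|."""
    hyp = [hamming_weight(watermark_output(x)) for x in inputs]
    corr = [pearson(hyp, [t[c] for t in traces]) for c in range(len(traces[0]))]
    peak = max(range(len(corr)), key=lambda c: abs(corr[c]))
    return peak, corr[peak]


def realign(trace, ref_start):
    """Toy pattern-based alignment: locate PATTERN by least squares, shift it to ref_start."""
    k = len(PATTERN)
    ssd = [sum((trace[p + i] - PATTERN[i]) ** 2 for i in range(k))
           for p in range(len(trace) - k + 1)]
    shift = ssd.index(min(ssd)) - ref_start
    return trace[shift:] + [4.0] * shift  # assumes delays are non-negative


def main(n_traces=500, n_cycles=40, leak_cycle=10, max_delay=16):
    data_rng = random.Random(1)
    inputs = [data_rng.randrange(256) for _ in range(n_traces)]
    delay_rng = random.Random(3)
    scenarios = {
        "aligned": [0] * n_traces,
        "static": [7] * n_traces,  # same extra delay in every execution
        "random": [delay_rng.randrange(max_delay) for _ in range(n_traces)],
    }
    results = {}
    for name, delays in scenarios.items():
        # Re-seeding gives every scenario identical noise, so only the delays differ.
        traces = simulate(inputs, delays, n_cycles, leak_cycle, random.Random(2))
        results[name] = cpa_peak(traces, inputs)
        if name == "random":
            fixed = [realign(t, leak_cycle - len(PATTERN)) for t in traces]
            results["realigned"] = cpa_peak(fixed, inputs)
    return results


if __name__ == "__main__":
    for name, (cycle, rho) in main().items():
        print(f"{name:>9}: peak at cycle {cycle:2d}, rho = {rho:+.3f}")
```

With a static delay of seven cycles, the peak simply moves from cycle 10 to cycle 17 with an identical correlation coefficient, whereas random delays of 0 to 15 cycles shrink it sharply; realigning on the pattern restores the original peak. Real traces are far less forgiving, since the reference pattern is itself noisy and delays can occur anywhere in the program.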
Furthermore, the verifier can use alignment methods to detect such misalignment and remove the random delays. With these alignment methods, the verifier has a good chance of counteracting the random delays, especially if the delays are inserted several clock cycles before the leakage generator. Since the attacker does not know the location of the leakage generator, this is very likely; otherwise, the attacker would need to insert a large number of random delays, which would hurt performance.

To show the power of alignment techniques against random delays, we changed our experiment by inserting random delays. In our first approach, we added our own random delays by using a timer interrupt that was triggered pseudorandomly every 1–128 clock cycles. We used an S-Box to generate our pseudorandom numbers and an externally generated 8-bit random number as its initialization for each measurement. These random interrupts did not pose much of an obstacle: with a simple pattern-matching algorithm we could detect the watermark. The result of the CPA is shown in Fig. 10(a). To make our experiment more credible, we also applied the random-delay-based side-channel countermeasure presented in  to our watermarked AES. This countermeasure, called improved floating mean, inserts random delays at fixed positions but with varying lengths. The initial state of the PRNG used is an externally generated 64-bit number.³ We again used a pattern-matching alignment algorithm and peak extraction before performing our CPA. The result of this analysis is shown in Fig. 10(b). Compared to the original watermarked AES implementation, the correlation coefficient decreased from around 0.85 to 0.5, but this correlation value is still very large. These results show that it is not simple to defeat watermark detection by inserting random delays. With improved alignment methods (e.g., , ), better results could probably be achieved. Furthermore, in , it was demonstrated that random delays can be removed without a big decrease in the correlation coefficient by methods similar to the ones described in Section II.

1152 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 7, NO. 4, AUGUST 2012

Fig. 10. CPA with 1000 measurements to detect the watermark in two implementations with random-delay countermeasures. In both figures, peak extraction and a pattern-based alignment method were used. In (a), random delays are introduced by pseudorandomly triggering a timer interrupt every 1–128 clock cycles. In (b), the side-channel countermeasure improved floating mean was added to the watermarked AES. In both cases, the watermark is clearly detectable.

By replacing instructions, an attacker might change the power profile of a code.
For example, instead of using the decrement instruction to decrease a register value, the subtract-with-constant instruction could be used. These instructions have different power profiles. However, even if the attacker can change the power profile of the code significantly, this does not impact the CPA. The power consumption of the clock cycles before or after the watermark has no impact on the CPA correlation, which depends only on the data-dependent variation in the cycles where the watermark is executed. As long as the power consumption varies with the output of the combination function, this variation can be used to detect the watermark using a CPA. For example, merely transmitting the output of the combination function over the internal bus usually leaks enough data-dependent power consumption to be used in a side-channel analysis, regardless of the actual instruction that is being executed.⁴

³ We used the parameters provided in  for our implementation of improved floating mean, with three dummy rounds before the encryption, and inserted the watermark in the main AES encryption function. Twenty-nine random delays, each varying between 24 and 536 clock cycles in steps of two, are executed before the first execution of the watermark.

A code transformation that removes the output of the combination function, on the other hand, would be successful. But every code-transformation algorithm needs to make sure that the resulting code does not change the semantically correct execution of the program. Therefore, the watermark designer needs to ensure that the compiler or code-transformation algorithm considers the watermark value to be needed. This can be done by storing the output or using it in some other way. In this case, any code-transformation attack will be unsuccessful, as removing the watermark value would destroy the semantically correct execution of the program from the point of view of a compiler.

Adding additional side-channel watermarks to a program can be seen as a code transformation as well.
A side-channel watermark should not change the state of the program being watermarked, to ensure that the watermark does not cause software failures. Hence, additional watermarks will not change the combination function of a previously inserted watermark; at most, they introduce static or data-dependent delays. For this reason, it is possible to add multiple watermarks to a design without interference problems.

C. Side-Channel Attacks

If the attacker can successfully detect the watermark using a side-channel analysis, the attacker also learns the exact clock cycles in which watermark instructions (e.g., the leakage generators) are executed. In this case, the attacker only needs to remove or alter these instructions to make watermark detection impossible. Therefore, the watermark should only be detectable by the legitimate verifier who possesses the watermark secret.

⁴ This assumes that the power consumption of the internal bus is correlated with the Hamming weight of the transmitted bits, which is usually the case for microcontrollers .

The attacker can try to discover the watermark secret by performing a brute-force side-channel analysis in which he tries every possible watermark secret. But the attacker faces two problems with this approach: the large search space of possible watermarks and false positives. The size of the search space of possible watermark secrets depends strongly on the application, the size of the watermark, and the architecture of the microcontroller. The application that is being watermarked determines how many internal states can be used as inputs to the combination function, and the size of the watermark determines how many operations the combination function performs. Finally, the number of available instructions and functions that can be used for the watermark also influences the search space.

In the following, we give a rough estimate of a possible search space for our example application of an AES-128 encryption on the 8-bit ATmega8 microcontroller. The AES encryption program has two 16-byte inputs, the plaintext and the key, and one 16-byte output, the ciphertext. Let us assume that the designer of the watermark can use the 16-byte input as the internal state of the watermark. For simplicity, we assume that the combination function consists of 10 basic instructions using the internal states and two 8-bit watermark constants as inputs. Furthermore, we assume that only the six ATmega8 instructions addition, subtraction, AND, OR, exclusive-OR, and multiplication can be used. Using these parameters, the lower bound of possible different combination functions is roughly . Besides the large search space, an attacker also has to face the problem of false positives.
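Both kinds of false positives can be previewed with a toy brute-force CPA in plain Python. It is a sketch under strong simplifying assumptions and far smaller than the search space estimated above: every hypothesis is a single one of the six listed operations between one input byte and one 8-bit constant (6 × 256 = 1536 hypotheses instead of ten-instruction combination functions), the hypothetical watermark leaks HW(x XOR 0x5A) with Gaussian noise, and the names `OPS`, `SECRET`, `measure`, and `brute_force` are illustrative only.

```python
import random


def hamming_weight(x: int) -> int:
    return bin(x).count("1")


def pearson(xs, ys) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # constant hypothesis (e.g., x AND 0): nothing to correlate
    return sxy / (sxx * syy) ** 0.5


# Hypothetical brute-force space: one operation on an input byte and an 8-bit constant.
OPS = {
    "add": lambda a, b: (a + b) & 0xFF,
    "sub": lambda a, b: (a - b) & 0xFF,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    "mul": lambda a, b: (a * b) & 0xFF,
}
SECRET = ("xor", 0x5A)  # hypothetical watermark secret


def measure(inputs, rng, with_watermark=True):
    """One power sample per trace at the cycle under test, with Gaussian noise."""
    op, const = SECRET
    return [
        (hamming_weight(OPS[op](x, const)) if with_watermark else 4.0) + rng.gauss(0.0, 1.0)
        for x in inputs
    ]


def brute_force(inputs, samples):
    """Correlate every (operation, constant) hypothesis with the measured samples."""
    return {
        (name, c): pearson([hamming_weight(f(x, c)) for x in inputs], samples)
        for name, f in OPS.items()
        for c in range(256)
    }


def main(n_traces=400, seed=7):
    rng = random.Random(seed)
    inputs = [rng.randrange(256) for _ in range(n_traces)]
    with_watermark = brute_force(inputs, measure(inputs, rng))
    noise_only = brute_force(inputs, measure(inputs, rng, with_watermark=False))
    return with_watermark, noise_only


if __name__ == "__main__":
    with_watermark, noise_only = main()
    ranked = sorted(with_watermark.items(), key=lambda kv: abs(kv[1]), reverse=True)
    for hypothesis, rho in ranked[:10]:
        print(hypothesis, f"{rho:+.3f}")
    print("largest ghost peak without a watermark:", f"{max(map(abs, noise_only.values())):.3f}")
```

Even in this tiny space, the correct hypothesis is surrounded by strong wrong candidates: the complement constant 0xA5 yields exactly the negated correlation, because HW(x XOR 0xA5) = 8 − HW(x XOR 0x5A), and constants one bit away from 0x5A correlate about three quarters as strongly. These wrong hypotheses are linearly related to the leaking value, which is the same mechanism by which parts of an unwatermarked program can mimic a watermark. Without any watermark at all, the largest of the 1536 correlations is still clearly nonzero, i.e., a ghost peak.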
If an attacker tries different watermark secrets, he is likely to see some correlation peaks that are not due to the actual watermark. One reason for such a correlation peak might simply be noise, as statistically some hypotheses will generate greater correlation peaks than others. Such peaks are usually called ghost peaks in the literature. If enough traces are used, these correlation peaks should be smaller than the actual correlation peak caused by the watermark. However, an attacker who has not yet discovered the watermark does not know how high the correlation peak of the watermark is supposed to be and might, therefore, falsely suspect wrong parts of the design to be the watermark. The second reason why a false positive might appear is that some part of the actual program being watermarked might be linearly related to a possible watermark. In a brute-force approach, all possible operations on the internal states are tested. Therefore, it is very likely that one or several of the tested combination functions are identical or linearly related to parts of the actual program. In this case, correlation peaks appear that indicate a possible watermark at a location where no watermark is embedded.

To summarize, detecting the watermark using side-channel analysis can be quite complicated for the attacker. Using small (and possibly multiple) watermarks will increase the problem of false positives, while the search space becomes too large in practice if larger watermark secrets (more operations and/or more possible internal states) are used. In our opinion, a reverse-engineering or code-transformation approach seems more promising for removing the watermark in practice.

V. CONCLUSION

In this work, we introduced three very efficient and cost-effective ways to detect software plagiarism and piracy in embedded systems.
The biggest advantage of these methods is that they do not require access to the suspicious program code. This property is very useful for embedded systems, as access to the program code is usually restricted by program memory protection mechanisms. Hence, our methods enable a verifier to efficiently test many embedded systems for software plagiarism by simply measuring the power consumption of the devices under test. In our first approach, we achieve this by deriving the Hamming weight of the executed opcodes from the power measurements. This can be done without modifying the program code, i.e., no watermark needs to be inserted. The Hamming weight method can detect one-to-one copies of software code very easily. We also showed that this detection mechanism can still be successful if the attacker makes some changes to the program code. For this case, we showed how string-matching algorithms can be applied to the Hamming weight method to determine how similar the tested code is to the reference code. If the code is very similar to the reference code, this is a very good indicator of software plagiarism.

For increased robustness and easier detection, we have introduced side-channel software watermarks. These watermarks can be inserted at the assembly level and introduce only a very small overhead in terms of code size and runtime. Like the Hamming weight method, these watermarks can be detected by simply measuring the power consumption of the device under test. Practical experiments showed that as few as a hundred traces were needed to clearly detect the watermark. We have also discussed why these watermarks are very robust to code transformations and other attacks. Furthermore, the side-channel software watermark can be extended to transmit a digital signature. Such a signature can be used in court to prove legitimate ownership of the watermark.
Hence, this watermark can not only detect software plagiarism but can also be used to provide proof of ownership.

REFERENCES

8-bit AVR Instruction Set, ATMEL, Revision 0856I-AVR-07/10.
ATmega8 Datasheet: 8-bit AVR With 8 k bytes In-System Programmable Flash, ATMEL, Revision 2486Z-AVR-02/11.
G. T. Becker, W. Burleson, and C. Paar, “Side-channel watermarks for embedded software,” in Proc. IEEE 9th Int. New Circuits and Systems Conf. (NEWCAS), Jun. 2011, pp. 478–481.
G. T. Becker, M. Kasper, A. Moradi, and C. Paar, “Side-channel based watermarks for integrated circuits,” in Proc. IEEE Int. Symp. Hardware-Oriented Security and Trust (HOST), Jun. 2010, pp. 30–35.
E. Brier, C. Clavier, and F. Olivier, “Correlation power analysis with a leakage model,” in Proc. Cryptographic Hardware and Embedded Systems (CHES), 2004, pp. 16–29.
Eighth Annual BSA Global Software Piracy Study. Washington, D.C.: Business Software Alliance, May 2011.
C. S. Collberg, A. Huntwork, E. Carter, G. Townsend, and M. Stepp, “More on graph theoretic software watermarks: Implementation, analysis, and attacks,” Inf. Softw. Technol., vol. 51, no. 1, pp. 56–67, Jan. 2009.
J.-S. Coron and I. Kizhvatov, “Analysis and improvement of the random delay countermeasure of CHES 2009,” in Proc. Cryptographic Hardware and Embedded Systems (CHES), 2010, pp. 95–109.
T. Eisenbarth, T. Kasper, A. Moradi, C. Paar, M. Salmasizadeh, and M. T. M. Shalmani, “On the power of power analysis in the real world: A complete break of the Keeloq code hopping scheme,” CRYPTO, pp. 203–220, 2008.
A. Gibbs and G. McIntyre, “The diagram, a method for comparing sequences. Its use with amino acid and nucleotide sequences,” Eur. J. Biochem., vol. 16, no. 1, pp. 1–11, 1970.
J. Hamilton and S. Danicic, “An evaluation of static Java bytecode watermarking,” in Lecture Notes in Engineering and Computer Science: Proc. World Congress on Engineering and Computer Science 2010, San Francisco, CA, Oct. 2010, pp. 1–8.
V. Levenshtein, “Binary codes capable of correcting deletions, insertions, and reversals,” Soviet Physics Doklady, vol. 10, no. 8, pp. 707–710, 1966.
S. Mangard, E. Oswald, and T. Popp, Power Analysis Attacks: Revealing the Secrets of Smart Cards. New York: Springer, 2007.
A. Moradi, A. Barenghi, T. Kasper, and C. Paar, “On the vulnerability of FPGA bitstream encryption against power analysis attacks: Extracting keys from Xilinx Virtex-II FPGAs,” in Proc. ACM Conf. Computer and Communications Security, 2011, pp. 111–124.
A. Moradi, M. Kasper, and C. Paar, “On the portability of side-channel attacks: An analysis of the Xilinx Virtex 4 and Virtex 5 bitstream encryption mechanism,” IACR Cryptology ePrint Archive, vol. 2011, pp. 391–391, 2011.
R. Muijrers, J. G. J. van Woudenberg, and L. Batina, “RAM: Rapid alignment method,” in CARDIS, Lecture Notes in Computer Science. New York: Springer, 2011.
D. Oswald and C. Paar, “Breaking Mifare DESFire MF3ICD40: Power analysis and templates in the real world,” in Proc. Cryptographic Hardware and Embedded Systems (CHES), 2011, pp. 207–222.
D. Strobel and C. Paar, “An efficient method for eliminating random delays in power traces of embedded software,” in Proc. Int. Conf. Information Security and Cryptology (ICISC), Seoul, Korea, 2011.
J. Tarhio and E. Ukkonen, “Approximate Boyer-Moore string matching,” SIAM J. Comput., vol. 22, pp. 243–260, Apr. 1993.
J. G. J. van Woudenberg, M. F. Witteman, and B. Bakker, “Improving differential power analysis by elastic alignment,” CT-RSA, pp. 104–119, 2011.
D. Yankov, E. J. Keogh, S. Lonardi, and A. W.-C. Fu, “Dot plots for time series analysis,” in Proc. Int. Conf. Tools With Artificial Intelligence (ICTAI), 2005, pp. 159–168, IEEE Computer Society.
W. Zhu and C. Thomborson, “Algorithms to watermark software through register allocation,” in Digital Rights Management. Technologies, Issues, Challenges and Systems. New York: Springer, 2006, vol. 3919, Lecture Notes in Computer Science, pp. 180–191.
W. Zhu, C. Thomborson, and F.-Y. Wang, “A survey of software watermarking,” in Intelligence and Security Informatics. New York: Springer, 2005, vol. 3495, Lecture Notes in Computer Science, pp. 454–458.

Georg T. Becker received the B.S. degree in applied computer science and the M.S. degree in IT-security from the Ruhr-University of Bochum, Germany, in 2007 and 2009, respectively. He is working toward the Ph.D. degree in electrical and computer engineering at the University of Massachusetts, Amherst.
His primary research interest is hardware security with a special focus on side-channel analysis, IP protection, and hardware Trojans.

Daehyun Strobel received the B.S. degree in applied computer science and the M.S. degree in IT-security from the University of Bochum, Germany, in 2006 and 2009, respectively. He is currently working toward the Ph.D. degree at the Chair for Embedded Security at the University of Bochum.
His research interests include practical side-channel attacks on embedded systems and side-channel reverse-engineering of embedded software.

Christof Paar (S’92–M’95–SM’05–F’11) holds the Chair for Embedded Security at the University of Bochum, Germany, and is an affiliated professor at the University of Massachusetts, Amherst. He cofounded, with Cetin Koc, the CHES workshop series. His research interests include highly efficient software and hardware realizations of cryptography, physical security, penetration of real-world systems, and cryptanalytical hardware. He has over 150 peer-reviewed publications and is coauthor of the textbook Understanding Cryptography. He is cofounder of escrypt–Embedded Security, a leading consultancy in applied security.

Wayne Burleson (S’87–M’89–SM’08–F’11) has degrees from MIT and the University of Colorado. He is Professor of Electrical and Computer Engineering at the University of Massachusetts, Amherst, where he has been on the faculty since 1990. He has substantial industry experience in Silicon Valley. He currently directs and conducts research in security engineering. Although his primary focus is hardware security, building on 20 years in the microelectronics field, he also studies higher level issues of system security and security economics, with applications in payment systems, RFID, and medical devices. He teaches courses in microelectronics, embedded systems, and security engineering.
Dr. Burleson is a Fellow of the IEEE for contributions in integrated circuit design and signal processing.