Watermarking in Source Code: Applications and Security Challenges

Software Watermarking
Shyamsundar Das
Product Security Engineer, Xperi
Scholar, M.Tech Software Engineering, BITS Pilani

Agenda
Software Protection Overview
Software Watermarking Overview
Static Software Watermarking Algorithms
Attacks on Software Watermarks
Dynamic Software Watermarking
The SANDMARK tool
Conclusion

Software Watermarks & Fingerprints
Embed a unique identifier in a program to trace
software pirates.
Watermarking
1. discourages theft,
2. allows us to prove theft.
Fingerprinting
3. allows us to trace violators.

Malicious Reverse Engineering
[Figure: Bob buys one copy of Alice’s program, reuses one of its modules (M) in his own program, and sells the result.]
Alice and Bob are competing software developers.
Bob reverse engineers Alice’s program and includes parts of it in his own code.
Easier with Java bytecode, .NET, ANDF, ...
⇒ Alice obfuscates her code.

Tampering
[Figure: a cryptolope (container with encrypted media, partial keys, signatures, business rules) and a software player (partial keys, codecs); attack steps labeled Extract media, Modify, Resell, and FREE PLAY!]
Alice is a media publisher. She packages her media into a cryptolope.
Bob tampers with the software player to extract the decrypted media.
InterTrust, Intel, IBM, Xerox, Microsoft, ...
⇒ Alice obfuscates, watermarks, tamper-proofs the player.

Software Piracy
[Figure: Bob buys one copy of program P, makes illegal copies, and resells them.]
Alice is a software developer.
Bob buys one copy of Alice’s application and sells copies to third parties.
⇒ Alice watermarks/fingerprints her program.

Software Watermarking
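[Figure: Embed takes program P, a key, and the watermark 42 and produces P’; P’ may then be attacked; Extract takes the program and the key and recovers 42.]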

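[Figure: Distortive attack. Semantics-preserving transformations turn P’ (mark 42) into P’’; can Extract still recover 42?]
[Figure: Collusive attack. The attacker compares P1 (mark 42) and P2 (mark 17) to produce P’’; what does Extract recover?]
[Figure: Additive attack. The attacker adds further marks to P’ (mark 42), so that P’’ carries 11, 23, 42, and 19; which mark does Extract recover?]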

Watermarking Transformations
Naive approaches typically use reordering (of statements, basic blocks, ...) or renaming (of registers, methods, ...).
[Figure: REORDER and RENAME examples.]
More powerful approaches extend program semantics or alter program statistics.
[Figure: ALTER STATS and EXTEND SEMANTICS examples.]
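As a rough illustration of the naive reordering approach, the sketch below hides a small integer in the order of a list of method names and recovers it again. It assumes distinct names and uses the factorial number system to map an integer to a permutation; the class name ReorderWatermark and its methods are hypothetical and do not come from any particular tool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ReorderWatermark {
    // factorials(n)[k] == k! for 0 <= k <= n (n must stay small enough for a long).
    private static long[] factorials(int n) {
        long[] fact = new long[n + 1];
        fact[0] = 1;
        for (int k = 1; k <= n; k++) fact[k] = fact[k - 1] * k;
        return fact;
    }

    // Embed: emit the mark-th permutation (lexicographic order) of the sorted names.
    static List<String> embed(List<String> names, long mark) {
        List<String> pool = new ArrayList<>(names);
        Collections.sort(pool);
        int n = pool.size();
        long[] fact = factorials(n);
        if (mark < 0 || mark >= fact[n]) {
            throw new IllegalArgumentException("mark does not fit in " + n + "! orderings");
        }
        List<String> out = new ArrayList<>();
        for (int k = n; k >= 1; k--) {
            int idx = (int) (mark / fact[k - 1]);
            mark %= fact[k - 1];
            out.add(pool.remove(idx));
        }
        return out;
    }

    // Extract: read the permutation index back from the observed order.
    static long extract(List<String> ordered) {
        List<String> pool = new ArrayList<>(ordered);
        Collections.sort(pool);
        int n = pool.size();
        long[] fact = factorials(n);
        long mark = 0;
        for (int p = 0; p < n; p++) {
            int idx = pool.indexOf(ordered.get(p));
            mark += idx * fact[n - 1 - p];
            pool.remove(idx);
        }
        return mark;
    }

    public static void main(String[] args) {
        List<String> marked = embed(Arrays.asList("a", "b", "c", "d"), 17);
        System.out.println(marked + " -> " + extract(marked));
    }
}
```

Sorting the names alphabetically, which changes nothing about program behavior, resets the extracted value to 0. That fragility is why the deck calls this family of approaches naive.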

Semantics-Preserving Attacks
[Figure: Distortive attack. Semantics-preserving transformations turn P’ (mark 42) into P’’; can Extract still recover 42?]
Semantics-preserving transformations include code optimization, decompile-recompile, translation, code obfuscation, ...
Our SANDMARK tool relies on combining sequences of simple obfuscating and optimizing transformations.

Static watermarking
•Static watermarking embeds ownership information directly into the
software code.
•The mark is part of the distributed program and is intended to be hard to remove, although semantics-preserving transformations can still distort it.
•It is used to track ownership, identify pirated copies, and prevent
unauthorized distribution.

Static watermarking algorithms
•The Bogus Initializer Static Watermarking Algorithm works by inserting
seemingly meaningless code into the software.
•This code, however, contains the watermark information.
•The code is designed to not affect the functionality of the software.
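The idea behind a bogus initializer can be sketched at the source level. The toy below inserts an unused local variable whose initial value is the watermark, then recovers it by searching for that declaration. It is only an illustration under simplifying assumptions: a real tool would typically operate on bytecode, and the class name BogusInitializerSketch, the variable name used as the key, and both method names are hypothetical.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BogusInitializerSketch {
    // Embed: add "int <varName> = <mark>;" right after the opening brace of the method body.
    static String embed(String methodSource, String varName, int mark) {
        int brace = methodSource.indexOf('{');
        if (brace < 0) throw new IllegalArgumentException("no method body found");
        String decl = " int " + varName + " = " + mark + ";";
        return methodSource.substring(0, brace + 1) + decl + methodSource.substring(brace + 1);
    }

    // Recognize: look for the bogus declaration; the variable name acts as the secret key.
    static OptionalInt recognize(String methodSource, String varName) {
        Pattern p = Pattern.compile(
                "\\bint\\s+" + Pattern.quote(varName) + "\\s*=\\s*(-?\\d+)\\s*;");
        Matcher m = p.matcher(methodSource);
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        String original = "static int twice(int x) { return 2 * x; }";
        String marked = embed(original, "k9", 42);
        System.out.println(marked);
        System.out.println(recognize(marked, "k9"));
    }
}
```

Because the inserted variable is never read, the program's behavior is unchanged. The same property makes the mark easy to attack: any optimizer that removes unused locals also removes the watermark.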

EXTEND SEMANTICS: Moskowitz & Cooperman
A watermarked media object is embedded in the program’s static data segment.
“Essential” parts of the program are steganographically encoded into the media.
If the watermarked image is attacked, the embedded code will crash.
    class Main {
        const Picture C = ···
        Code R = Decode(C);
        Execute(R);
    }
US Patent 5,745,569, Jan 1996.

Original Code
    public class C {
        static int gcd(int x, int y) {
            int t;
            while (true) {
                boolean b = x % y == 0;
                if (b) return y;
                t = x % y; x = y; y = t;
            }
        }
        public static void main(String[] a) {
            System.out.print("Answer: ");
            System.out.println(gcd(100, 10));
        }
    }

Boolean Splitting Obfuscation
    public class C {
        static int gcd(int i, int j) {
            int t8, t7, k;
            for (;;) {
                // The boolean b is split into two ints; b is true iff (t7 ^ t8) != 0.
                if (i % j == 0) { t8 = 1; t7 = 0; }
                else { t8 = 0; t7 = 0; }
                if ((t7 ^ t8) != 0)
                    return j;
                else {
                    k = i % j; i = j; j = k;
                }
            }
        }
        public static void main(String[] Z1) {
            System.out.print("Answer: ");
            System.out.println(gcd(100, 10));
        }
    }

Bogus Branch Obfuscation
    public class C {
        static int gcd(int i, int j) {
            int t9, t8, q7, q6, q4, q3;
            q7 = 9;
            for (;;) {
                if (i % j == 0) { t9 = 1; t8 = 0; }
                else { t9 = 0; t8 = 0; }
                q4 = t8; q6 = t9;
                if ((q4 ^ q6) != 0)
                    return j;
                else {
                    // Bogus branch: q7 + q7*q7 is always even, so this return is never taken.
                    if ((((q7 + q7 * q7) % 2 != 0) ? 0 : 1) != 1) return 0;
                    q3 = i % j; i = j; j = q3;
                }
            }
        }
        public static void main(String[] Z1) {
            System.out.print("Answer: ");
            System.out.println(gcd(100, 10));
        }
    }
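The bogus branch relies on an opaque predicate: q + q*q equals q*(q+1), a product of two consecutive integers, so it is always even and the extra return can never fire. The small check below, with the hypothetical names OpaquePredicateCheck and bogusBranchTaken, evaluates the same expression for a range of values, including the extremes where Java int arithmetic wraps around.

```java
final class OpaquePredicateCheck {
    // Same expression as the bogus branch, with q in place of q7.
    static boolean bogusBranchTaken(int q) {
        return (((q + q * q) % 2 != 0) ? 0 : 1) != 1;
    }

    // Returns true if the bogus branch would be taken for any q in [from, to].
    static boolean everTaken(int from, int to) {
        for (long q = from; q <= to; q++) {   // long counter avoids overflow at Integer.MAX_VALUE
            if (bogusBranchTaken((int) q)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(everTaken(-100_000, 100_000));
        System.out.println(bogusBranchTaken(Integer.MAX_VALUE));
        System.out.println(bogusBranchTaken(Integer.MIN_VALUE));
    }
}
```

Wrap-around does not change the result, because the lowest bit of a sum or product depends only on the lowest bits of its operands. An attacker who cannot prove this statically has to treat the bogus return as live code.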

String Encoding Obfuscation
    public class C {
        static int gcd(int i, int j) {
            // As before
        }
        public static void main(String[] a) {
            System.out.print(
                Obfuscator.DecodeString( // Rename this!
                    "\u00AB\u00CD\u00AB\u00CD" +
                    "\uFF84\u2A16\u5D68\u2AA0" +
                    "\u388E\u91CF\u5326\u5604"));
        }
    }

Collusion Protection by Obfuscation
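Obfuscation can also be used to protect against collusive attacks.
[Figure: P1 (mark 42) is obfuscated with Key1 to give P1’, and P2 (mark 17) is obfuscated with Key2 to give P2’; can a collusive attack that compares P1’ and P2’ still locate the marks?]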

Collusion Protection by Obfuscation
    public class C {
        static Object get0(Object[] I) {
            Integer K, J, M, N; int r, q, j; K = new Integer(9);
            j = 2; j = 60 - (j + 1); ++j; j = 60 - j;
            for (;;) {
                if (((Integer) I[0]).intValue() % ((Integer) I[1]).intValue() == 0)
                    { r = 1; q = 0; } else { r = 0; q = 0; }
                M = new Integer(q); J = new Integer(r);
                if ((M.intValue() ^ J.intValue()) != 0)
                    return new Integer(((Integer) I[1]).intValue());
                else {
                    if ((((K.intValue() + K.intValue() * K.intValue()) % 2 != 0) ? 0 : 1) != 1)
                        return new Integer(0);
                    N = new Integer(((Integer) I[0]).intValue() %
                                    ((Integer) I[1]).intValue());
                    I[0] = new Integer(((Integer) I[1]).intValue());
                    I[1] = new Integer(N.intValue());
                }
            }
        }
        public static void main(String[] Z1) {
            int j = 2; int i = 2; i = 80 - (i + 1); j = 80 - (j + 1);
            System.out.print((String) Obfuscator.get0(new Object[] {
                (String) new Object[] { "String as before" }[0] }));
            ++i; i = 80 - i; ++j; j = 80 - j;
            System.out.println(((Integer) get0(
                new Object[] { (Integer) new Object[] {
                    new Integer(100), new Integer(10) }[0],
                  (Integer) new Object[] {
                    new Integer(100), new Integer(10) }[1]
                })).intValue());
        }
    }

Dynamic watermarking
•Dynamic watermarks are embedded during program
execution.
•Specific events or conditions trigger the watermarking
process.
•The watermark information can be encoded in various
aspects of the execution state, such as:
• Variable values
• Data structure organization
• Control flow paths

Dynamic watermarking algorithms
Examples include:
• Execution path watermarking algorithm
• Arboit watermarking algorithm
• Collberg-Thomborson watermarking algorithm
• Easter egg watermarking algorithm

Static vs. Dynamic Watermarking
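Static algorithms are vulnerable to semantics-preserving code transformations.
[Figure: Static Embed takes program P, watermark w, and a key and produces P’; Static Extract takes P’ and the key and recovers w.]
Dynamic algorithms extract the mark from the state of the program when run on a secret key input sequence.
[Figure: Dynamic Embed takes P, w, and the input sequence I1, ..., Ik and produces P’; Dynamic Extract runs P’ on I1, ..., Ik and recovers w.]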
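A minimal sketch of the dynamic idea follows, assuming a toy command handler. The mark appears in the program state only after a secret sequence of inputs has been processed, and the recognizer recovers it by replaying that sequence and inspecting the state. The class DynamicStateWatermark, the key sequence, and the mark value are all hypothetical; a realistic embedding would not store the key and mark as plain constants like this.

```java
import java.util.Locale;

final class DynamicStateWatermark {
    private static final String[] KEY = {"open", "save", "open"}; // hypothetical secret input sequence
    private static final int MARK = 42;                            // hypothetical watermark value

    private int state = 0;     // looks like ordinary bookkeeping; holds the mark after the key is seen
    private int progress = 0;  // number of key inputs matched so far

    // Normal behavior: upper-case the command. Hidden behavior: track the key sequence.
    String handle(String cmd) {
        if (progress < KEY.length && cmd.equals(KEY[progress])) {
            progress++;
        } else {
            progress = cmd.equals(KEY[0]) ? 1 : 0;
        }
        if (progress == KEY.length) state = MARK;
        return cmd.toUpperCase(Locale.ROOT);
    }

    // Recognizer: run a fresh instance on the given inputs and read the state afterwards.
    static int recognize(String... inputs) {
        DynamicStateWatermark program = new DynamicStateWatermark();
        for (String in : inputs) program.handle(in);
        return program.state;
    }

    public static void main(String[] args) {
        System.out.println(recognize("open", "save", "open")); // key sequence
        System.out.println(recognize("open", "open", "save")); // ordinary inputs
    }
}
```

On ordinary inputs the state never takes the mark value, so static inspection of a run without the key reveals nothing about the extracted value. The visible behavior of handle is the same with or without the hidden bookkeeping.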

Collberg-Thomborson Watermarking Algorithm
•The Collberg-Thomborson technique leverages dynamic data structures for watermarking.
•It constructs a hidden data structure within the
program's memory during execution.
•This data structure encodes the watermark information.

CT algorithm implementation techniques
•Watermark Embedding: During program execution,
specific events trigger the creation of the hidden data
structure.
•Watermark Encoding: The watermark information
(ownership, license details, etc.) is encoded within the
data structure.
•Data Structure Manipulation: The data structure is
manipulated subtly to embed the watermark without
affecting program functionality.
•Watermark Verification: A separate program
(watermark decoder) can extract the watermark
information from the hidden data structure to verify
ownership or identify tampering attempts.
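The four steps above can be sketched with a simplified graph encoding. Here the watermark is written in base n across a ring of n heap nodes: each node's second pointer targets the node d steps further along the ring, where d is one digit. This is an illustrative variant loosely modeled on radix-style encodings, not the exact scheme from the Collberg-Thomborson paper; the class GraphWatermarkSketch, its methods, the trigger input, and the mark value are hypothetical.

```java
final class GraphWatermarkSketch {
    static final class Node {
        Node next;   // ring edge: links the n nodes into a cycle
        Node digit;  // data edge: its distance along the ring encodes one base-n digit
    }

    static Node root; // stays null unless the secret input is seen

    // Embedding and encoding: build a ring of n nodes holding mark in base n.
    static Node build(long mark, int n) {
        if (n < 2 || mark < 0) throw new IllegalArgumentException("need n >= 2 and mark >= 0");
        Node[] nodes = new Node[n];
        for (int i = 0; i < n; i++) nodes[i] = new Node();
        for (int i = 0; i < n; i++) {
            nodes[i].next = nodes[(i + 1) % n];
            int d = (int) (mark % n);
            mark /= n;
            nodes[i].digit = nodes[(i + d) % n];
        }
        if (mark != 0) throw new IllegalArgumentException("mark does not fit in n digits");
        return nodes[0];
    }

    // Verification: a separate decoder walks the heap structure and reads the digits back.
    static long decode(Node head) {
        int n = 1;
        for (Node p = head.next; p != head; p = p.next) n++;
        long mark = 0, weight = 1;
        Node cur = head;
        for (int i = 0; i < n; i++) {
            int d = 0;
            for (Node p = cur; p != cur.digit; p = p.next) d++;
            mark += d * weight;
            weight *= n;
            cur = cur.next;
        }
        return mark;
    }

    // The host program: the graph is built only when the secret input arrives.
    static void greet(String input) {
        if (input.equals("World")) root = build(42, 4);
        System.out.println("Hello " + input);
    }

    public static void main(String[] args) {
        greet("World");
        System.out.println(decode(root));
    }
}
```

The mark lives only in pointer topology on the heap, so renaming or reordering code does not affect it. An attacker has to reason about which heap shapes the program can build, which is the shape-analysis problem the deck refers to.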

EXTEND SEMANTICS: Collberg-Thomborson
The watermark is embedded in the topology of a dynamic graph structure, built at runtime but only for the special input sequence I1, ..., Ik.
Why? Shape analysis is hard.
[Figure: on the input sequence I1, ..., Ik, the control flow executes "Build G1" and "Build G2", which construct the watermark graph on the heap.]
ACM Principles of Programming Languages, POPL’99

CT: Example
    public class Simple {
        static void P(String i) {
            System.out.println("Hello " + i);
        }
        public static void main(String args[]) {
            P(args[0]);
        }
    }
⇓
    class Watermark extends java.lang.Object {
        public Watermark edge1, edge2;
    }
⇓

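CT: Example (continued)
    public class Simple_W {
        static void P(String i, Watermark n2) {
            // The graph is extended only for the secret input "World".
            if (i.equals("World")) {
                Watermark n1 = new Watermark();
                Watermark n4 = new Watermark(); // declaration missing from the slide text; assumed
                n4.edge1 = n1; n1.edge1 = n2;
                Watermark n3 = (n2 != null) ? n2.edge1 : new Watermark();
                n3.edge1 = n1;
            }
            System.out.println("Hello " + i);
        }
        public static void main(String args[]) {
            Watermark n3 = new Watermark(); // declaration missing from the slide text; assumed
            Watermark n2 = new Watermark(); // declaration missing from the slide text; assumed
            n2.edge1 = n3; n2.edge2 = n3;
            P(args[0], n2);
        }
    }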

SANDMARK — A Software Protection Tool

A Session with SANDMARK
Embed Watermark
[Figure: SANDMARK screens for Select Algorithm, Configure, Embed Watermark, and Obfuscate; ORIG.jar and the watermark "WILDCATS" go in, NEW.jar comes out.]
We obfuscate to protect against reverse engineering and collusive de-watermarking attacks.

Recognize Watermark
NEW.jar ⇒ "WILDCATS"
We extract the watermark to prove ownership.

[Figure: NEW.jar ⇒ Compare Bytecodes, Compute Static Statistics, View/Sort Bytecodes ⇒ Watermark located?]
To simulate a manual attack we examine the obfuscated/watermarked program using various static analysis tools.

[Figure: NEW.jar ⇒ Select Algorithm ⇒ Configure ⇒ Obfuscate ⇒ ATTACKED.jar ⇒ Recognize Watermark ⇒ Watermark destroyed?]
To simulate an automatic attack we use SANDMARK’s obfuscators (“SoftStir”) to attack the watermark.

Some other tools
Language | Potential tools | Notes
C/C++ | StegFS, Watermarking for C/C++ | StegFS might be a general steganography tool, potentially adaptable for watermarking. It is unclear whether "Watermarking for C/C++" is a specific tool or a generic description.
Java | LunaJava- JWatermark, Java Watermarking Tool, JDWP, Allatori, Sandmark | JWatermark and Java Watermarking Tool could be specific tools or generic descriptions. JDWP is the Java Debug Wire Protocol, a debugging interface rather than a watermarking tool. Allatori and Sandmark are existing tools.
Python | PyWatermark | Seems to be a specific tool for Python watermarking.
JavaScript | JSSP, JSMin, UglifyJS | The purpose of JSSP is unclear. JSMin and UglifyJS are minifiers, not watermarking tools.
MATLAB | MatWater | Seems to be a specific tool for MATLAB watermarking.
PHP | PHPWatermark, PHPWatermarkingTool | As with Java, these might be specific tools or generic descriptions.
Ruby | RubyWater | Seems to be a specific tool for Ruby watermarking.
Swift | SwiftyWatermark | Seems to be a specific tool for Swift watermarking.

Conclusion
Many interesting problems left to work on!
Formal models of attack and stealth.
Combining error correction and tamper-proofing.
Watermarking other languages.
Download from sandmark.cs.arizona.edu.
