PhD Thesis Proposal

With computers having GHz of processing speed, information / data either stored or in
transmission has become more and more vulnerable to hostile eavesdropping, theft,
wiretapping etc. This urges us to devise new data hiding techniques to protect and secure data
of vital significance. Steganography is a method of securing data by obscuring the contents in
another medium (called the cover) in which it is saved / transmitted. This doctoral thesis proposal will
present a new steganographic technique for hiding data in (ASCII) text files together with its
software implementation, a research area in steganography which is considered the
toughest of all to address.

  • 1. Table of Contents
    Abstract (2)
    Motivation (3)
    History (3)
    Principles (4)
    Techniques (5)
    Related Work (7)
    Limitations of Existing Techniques (18)
    Proposed Solution (18)
    Application (18)
    References (19)
  • 2. Abstract
Mark Owens [9]
With computers having GHz of processing speed, information / data either stored or in transmission has become more and more vulnerable to hostile eavesdropping, theft, wiretapping etc. This urges us to devise new data hiding techniques to protect and secure data of vital significance. Steganography is a method of securing data by obscuring the contents in another medium (called the cover) in which it is saved / transmitted. This doctoral thesis proposal will present a new steganographic technique for hiding data in (ASCII) text files together with its software implementation, a research area in steganography which is considered the toughest of all to address.
  • 3. Motivation
While surfing the Net, I came across an online article in USA Today titled “Terror groups hide behind Web encryption”, which claimed (though no evidence has yet been made public) that terrorists may be using steganography to communicate with each other when planning attacks. This sparked my interest in developing a new concealment technique. It is conjectured that images carrying hidden messages find ideal cover on bulletin boards, which serve as dead drops where other terrorists can pick them up and decode them.

History
Steganography dates back to ancient Greece, where common practices included etching messages or images into wooden tablets and covering them with wax, and tattooing a messenger's shaved head, letting the hair grow back, and then shaving the head again to read the message. Early in WWII, steganographic technology consisted almost exclusively of invisible inks. Sources for invisible inks include milk, vinegar, fruit juices and urine, all of which darken when heated. The following message was sent by a German spy during WWII:

Apparently neutral's protest is thoroughly discounted and ignored. Isman hard hit. Blockade issue affects pretext for embargo on by-products, ejecting suets and vegetable oils.

Taking the second letter in each word, the following message emerges: Pershing sails from NY June 1.

When improved technology made invisible inks easy to detect, null ciphers were used. Null ciphers are unencrypted messages hidden inside innocent-sounding messages. An example of such a message is:
  • 4. Fishing freshwater bends and saltwater coasts rewards anyone feeling stressed. Resourceful anglers usually find masterful leapers fun and admit swordfish rank overwhelming anyday.

Taking the third letter in each word, the following message emerges: Send Lawyers, Guns, and Money.

The Germans developed microdot technology during WWII. Microdots are text or photographic images shrunk down to the size and shape of a period or the dot of an i or j. Microdots were usually sent by writing a letter containing periods, i's, or j's, and the intended recipient could read the messages using a microscope. Because microdots are extremely small, the messages typically went unnoticed by inspectors.

A steganographic message generally appears to be something else, such as an article, a picture, or some other "cover" message. Drawings have often been used to conceal information, since it is easy to encode a message by varying lines, colors or other elements in pictures. This tutorial will focus on image files to hide text messages.

Principles:
Steganography can be split into two types, Fragile and Robust. The following section defines these two types.

Fragile
Fragile steganography involves embedding information into a file in such a way that the information is destroyed if the file is modified. This method is unsuitable for recording the copyright holder of the file, since the mark can be removed so easily, but it is useful where it is important to prove that the file has not been tampered with, such as when a file is used as evidence in a court of law, since any tampering would have removed the watermark. Fragile steganography techniques tend to be easier to implement than robust methods.
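The two null-cipher examples from the History section can be checked mechanically. The following is a minimal sketch, not part of the original proposal: the function name `null_cipher_decode` is hypothetical, words are assumed to be separated by whitespace, and "by-products" is assumed to count as a single word.

```python
def null_cipher_decode(text: str, position: int) -> str:
    """Collect the letter at the given 1-based position of every word."""
    letters = []
    for word in text.split():
        # Skip words that are too short to have a letter at this position.
        if len(word) >= position:
            letters.append(word[position - 1].lower())
    return "".join(letters)


# WWII message: the hidden text sits in the second letter of each word.
SPY_MESSAGE = (
    "Apparently neutral's protest is thoroughly discounted and ignored. "
    "Isman hard hit. Blockade issue affects pretext for embargo on "
    "by-products, ejecting suets and vegetable oils."
)

# Null-cipher example: the hidden text sits in the third letter of each word.
FISHING_MESSAGE = (
    "Fishing freshwater bends and saltwater coasts rewards anyone feeling "
    "stressed. Resourceful anglers usually find masterful leapers fun and "
    "admit swordfish rank overwhelming anyday."
)

print(null_cipher_decode(SPY_MESSAGE, 2))      # pershingsailsfromnyjunei
print(null_cipher_decode(FISHING_MESSAGE, 3))  # sendlawyersgunsandmoney
```

The decoder returns the letters without spaces or capitalization; the trailing "i" in the first result is read as the numeral 1.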
  • 5. Robust
Robust marking aims to embed information into a file in such a way that it cannot easily be destroyed. Although no mark is truly indestructible, a system can be considered robust if the number of changes required to remove the mark would render the file useless. Therefore the mark should be hidden in a part of the file where its removal would be easily perceived. There are two main types of robust marking.

Fingerprinting involves hiding a unique identifier for the customer who originally acquired the file and is therefore allowed to use it. Should the file be found in the possession of somebody else, the copyright owner can use the fingerprint to identify which customer violated the license agreement by distributing a copy of the file.

Unlike fingerprints, watermarks identify the copyright owner of the file, not the customer. Whereas fingerprints are used to identify people who violate the license agreement, watermarks help with prosecuting those who hold an illegal copy. Ideally fingerprinting should be used, but for mass production of CDs, DVDs, etc. it is not feasible to give each disc a separate fingerprint.

Watermarks are typically hidden to prevent their detection and removal; such marks are called imperceptible watermarks. However, this need not always be the case. Visible watermarks can be used and often take the form of a visual pattern overlaid on an image. The use of visible watermarks is similar to the use of watermarks in non-digital formats (such as the watermark on British money).

Techniques:
Information hiding techniques are receiving much attention today. This is motivated largely by the fear that encryption services will be outlawed, and by copyright owners who
  • 6. want to protect confidential material and intellectual property against unauthorized access and use in digital materials such as music, films, books and software through the use of digital watermarks.

A Steganographic System:
    f_E: steganographic function "embedding"
    f_E^-1: steganographic function "extracting"
    cover: cover data in which emb will be hidden
    emb: message to be hidden
    key: parameter of f_E
    stego: cover data with the hidden message

A Graphical Version of the Steganographic System:
[Figure: graphical version of the steganographic system]

Steganographic messages may first be encrypted, and then a cover message is modified to contain the encrypted message, resulting in stego text. Only those who know the technique used can recover the message and, if required, decrypt it.
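To make the roles of cover, emb, key and stego concrete, here is a minimal sketch of one possible pair f_E / f_E^-1. It is an illustration only, not the technique proposed in this thesis: the function names `embed` and `extract`, the choice of least-significant-bit substitution, the use of the key to seed a pseudo-random choice of byte positions, and the need to pass the message length to the extractor are all assumptions made for the example.

```python
import random


def _positions(cover_len: int, n_bits: int, key: int) -> list:
    """Derive a key-dependent, repeatable list of distinct byte positions."""
    rng = random.Random(key)
    return rng.sample(range(cover_len), n_bits)


def embed(cover: bytes, emb: bytes, key: int) -> bytes:
    """f_E: hide emb in the least significant bits of key-selected cover bytes."""
    bits = [(byte >> shift) & 1 for byte in emb for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for this message")
    stego = bytearray(cover)
    for pos, bit in zip(_positions(len(cover), len(bits), key), bits):
        stego[pos] = (stego[pos] & 0xFE) | bit  # overwrite only the lowest bit
    return bytes(stego)


def extract(stego: bytes, n_bytes: int, key: int) -> bytes:
    """f_E^-1: read the hidden bits back from the same key-selected positions."""
    positions = _positions(len(stego), n_bytes * 8, key)
    out = bytearray()
    for i in range(0, len(positions), 8):
        value = 0
        for pos in positions[i:i + 8]:
            value = (value << 1) | (stego[pos] & 1)
        out.append(value)
    return bytes(out)


cover = bytes(range(256)) * 2           # stand-in for real cover data
stego = embed(cover, b"hi", key=42)
print(extract(stego, 2, key=42))        # b'hi'
print(len(stego) == len(cover))         # True: the cover length is unchanged
```

Without the key, an observer does not know which bytes carry the message; with a wrong key, `extract` reads unrelated bits.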
  • 7. The message may be a few thousand bits (often at 7 or 8 bits per text character) embedded in millions of other bits. Probably the most typical cover is a digital image. Digital images are commonly stored in either 24-bit or 8-bit files. If an image is viewed as a grid, the cells of the grid are called pixels. In an 8-bit image, each pixel consists of an 8-bit binary number (a single byte) that refers to an entry in the color palette (a set of colors defined within the image). All color variations for the pixels are derived from three primary colors: red, green, and blue. In a 24-bit image, each primary color is represented by 1 byte (= 8 bits).

Digital watermarking technology is viewed as "an enabling agent allowing more widespread sharing and use of that content while decreasing worry over piracy". Today steganography is often used for digital watermarking, to hide copyright or ownership information in an image, movie, or audio file. A copyright holder can pull the hidden copyright or ownership information out of a suspect file to prove it is stolen. Digital watermarking is not used for authenticating documents. (Digital signatures perform this task.) A digital watermark refers to the ability to unobtrusively include information in a file, and is commonly achieved through a variety of information-hiding techniques, collectively known as steganography.

Algorithms and transformations: Another steganography technique is to hide data in the mathematical functions used by compression algorithms. The idea is to hide the data bits in the least significant coefficients. Other techniques of steganography include spread spectrum steganography, statistical steganography, distortion, and cover generation steganography.

Related Work (Text Techniques)
It is easy to tell when a book has been photocopied in breach of copyright, since the quality differs widely from the original, but it is more difficult when it comes to
  • 8. electronic versions of text. Copies are identical and it is impossible to tell whether a file is an original or a copy. To embed information inside a document we can simply alter some of its characteristics. These can be either the text formatting or characteristics of the characters. One might expect such alterations to be visible and obvious to third parties or attackers. The key is to alter the document in a way that is not visible to the human eye yet can still be decoded by computer.

[Figure: general principle of embedding hidden information inside a document]

The figure above shows the general principle of embedding hidden information inside a document. There is an encoder and, to decode the document, a decoder. The codebook is a set of rules that tells the encoder which parts of the document it needs to change. It is also worth pointing out that the marked documents can be either identical or different. By different, we mean that the same watermark is applied to each document but different characteristics of each document are changed.

Line Shift Coding Protocol
In line shift coding, we simply shift various lines inside the document up or down by a small fraction (such as 1/300th of an inch) according to the codebook. The shifted lines are undetectable by humans because the shift is so small, but they are detectable when the computer measures the distances between the lines. Differential encoding techniques are normally used in this protocol, meaning that if you shift a line, the adjacent lines are not moved.
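The line shift protocol with differential encoding can be sketched as follows. This is an illustrative sketch under stated assumptions rather than a reference implementation: positions are measured in units of 1/300 inch, the baseline line spacing of 50 units is arbitrary, a downward shift is taken to mean 1 and an upward shift 0 (the text does not fix this mapping), and the function names are hypothetical. Every second line is left unmoved so that it can act as a reference for its shifted neighbour.

```python
def encode_line_shift(n_lines: int, bits: list, spacing: int = 50,
                      shift: int = 1) -> list:
    """Return the vertical position of each line, in units of 1/300 inch."""
    capacity = (n_lines - 1) // 2  # each shifted line needs a neighbour on both sides
    if len(bits) > capacity:
        raise ValueError("not enough lines to carry this many bits")
    ys = [i * spacing for i in range(n_lines)]
    for k, bit in enumerate(bits):
        i = 2 * k + 1  # odd-numbered lines carry data, even ones stay fixed
        ys[i] += shift if bit else -shift
    return ys


def decode_line_shift(ys: list, n_bits: int) -> list:
    """Recover bits by comparing the gap above and below each data line."""
    bits = []
    for k in range(n_bits):
        i = 2 * k + 1
        gap_above = ys[i] - ys[i - 1]
        gap_below = ys[i + 1] - ys[i]
        bits.append(1 if gap_above > gap_below else 0)
    return bits


message_bits = [0, 1, 0, 0, 0, 0, 0, 1]   # 01000001, the ASCII code for "A"
positions = encode_line_shift(17, message_bits)
print(positions[:4])                        # [0, 49, 100, 151]
print(decode_line_shift(positions, 8))      # [0, 1, 0, 0, 0, 0, 0, 1]
```

Because the decoder compares each data line only with its two unmoved neighbours, it does not need the original document, which is the practical benefit of differential encoding.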
  • 9. These unmoved lines act as controls against which the computer can measure distances. By finding out whether a line has been shifted up or down we can represent a single bit, 0 or 1. Across a whole document we can embed a number of bits and therefore hide a substantial amount of information.

Word Shift Coding Protocol
The word shift coding protocol is based on the same principle as the line shift coding protocol. The main difference is that instead of shifting lines up or down, we shift words left or right. This is also known as the justification of the document. The codebook tells the encoder which words are to be shifted and whether each shift is to the left or to the right. Again, decoding consists of measuring the spaces between words; a left shift could represent a 0 bit and a right shift a 1 bit.

[Figure: the sentence "The quick brown fox jumps over the lazy dog." set twice, first with normal spacing and then with shifted words]

In this example the first line uses normal spacing, while the second has had each word shifted left or right by 0.5 points in order to encode the sequence 01000001, that is 65, the ASCII character code for A. Without the original for comparison the change is unlikely to be noticed, and the shift could be made even smaller to make it less noticeable.

Feature Coding Protocol
Feature coding differs slightly from the protocols above: the document is passed through a parser, which examines the document and automatically builds a codebook specific to it. The parser picks out all the features that it judges usable for hiding information, and each of these is marked in the document. This can use a number of different characteristics such as the height of certain characters, the dots above i and
  • 10. j and the horizontal line length of letters such as f and t. Line shifting and word shifting techniques can also be used to increase the amount of data that can be hidden.

White Space Manipulation
One way of hiding data in text is to use white space. If done correctly, white space can be manipulated so that bits can be stored. This is done by adding a certain amount of white space to the end of lines. The amount of white space corresponds to a certain bit value. Because practically all text editors do not visibly display extra white space at the end of lines, it won’t be noticed by the casual viewer. In a large piece of text, this can provide enough room to hide a few lines of text or some secret codes. A freely available program which uses this technique is named “SNOW”.

Text Content
Another way of hiding information is to conceal it in what seems to be inconspicuous text. The grammar within the text can be used to store information. It is possible to change sentences to store information while keeping the original meaning. TextHide is a program which incorporates this technique to hide secret messages. A simple example is:

[Example sentence and its changed form are not recoverable from the source]

Another way of using text itself is to use random words as a means of encoding information. Different words can be given different values. Of course this would be easy to spot, but there are clever implementations, such as SpamMimic, which creates a spam email that contains a secret message. As spam usually has poor grammar, it is far easier for it to escape notice. The following extract from a spam email encodes a secret phrase:
  • 11. Dear Friend , Especially for you - this red-hot intelligence . We will comply with all removal requests . This mail is being sent in compliance with Senate bill 2116 , Title 9 ; Section 303 ! THIS IS NOT A GET RICH SCHEME . Why work for somebody else when you can become rich inside 57 weeks . Have you ever noticed most everyone has a cellphone & people love convenience. Well, now is your chance to capitalize on this . WE will help YOU SELL MORE and sell more! You are guaranteed to succeed because we take all the risk ! But don't believe us . Ms Simpson of Washington tried us and says "My only problem now is where to park all my cars" . This offer is 100% legal. You will blame yourself forever if you don't order now ! Sign up a friend and you'll get a discount of 50%. Thank-you for your serious consideration of our offer . Dear Decision maker; Thank-you for your interest in our briefing . If you are not interested in our publications and wish to be removed from our lists, simply do NOT respond and ignore this mail ! This mail is being sent in compliance with Senate bill 1623 ; Title 6 ; Section 304 ! THIS IS NOT A GET RICH SCHEME ! Why work for somebody else when you can …

A very basic form of steganography makes use of a cipher. A cipher is basically a key which can be used to decode some data to retrieve a secret hidden message. Sir Francis Bacon created one in the 16th century using messages with two different typefaces, one bolder than the other. By looking at the positions of the bold characters in relation to the rest of the text, a secret message could be decoded. There are many other ciphers which could be used to the same effect.

XML
XML is becoming a widely used standard for data exchange. The format also provides plenty of opportunities for data hiding. This is important for verifying whether documents have been altered and also for copyright reasons. You can, for example, embed a code which can be traced back to the source. A method for hiding information in XML comes courtesy of the University of Tokyo. Many different files can exist when XML is used. There is the XML file itself, but there can also be transformation files (.xsl), validation files (.dtd) and style files (.css). All of these files can be
  • 12. used to hide data but the main XML file is usually the best due to its larger size. This technique concentrates on just the XML file; more elaborate techniques could use a combination of all four files to increase robustness. One way of hiding data in XML is to use the different tag forms allowed by the W3C. For example, both of these image tags are valid and could be used to indicate different bit settings.

Stego key:
<img></img> -> 0
<img/> -> 1

In this way a piece of XML like the following could be used to encode a simple bit string.

Stego data:
<img src="foo1.jpg"></img>
<img src="foo2.jpg"/>
<img src="foo3.jpg"/>
<img src="foo4.jpg"/>
<img src="foo5.jpg"></img>

Under the stego key above, the XML data in this case stores the bit string 01110.

Other ways of storing data include using the order in which attributes or elements appear. For example, the combination of element A followed by element B could be assigned the bit value 1, while A followed by some element C would be assigned the value 0. Hiding data using the schemes outlined above would be straightforward. In the case of white space, a simple text manipulation program could be used to add the spaces, and a reader could then be created to parse the XML and retrieve the hidden data. The same is true for the use of different tag forms. Using the structure of elements would be a little more difficult, as changing elements could have an adverse impact on the way the XML is displayed, but a clever design could overcome this.

In this example the containment of elements is used:
<favorite><fruit>SOMETHING</fruit></favorite> -> 0
  • 13. <fruit><favorite>SOMETHING</favorite></fruit> -> 1

In this example the order of the elements is used:
<user><name>NAME</name><id>ID</id></user> -> 0
<user><id>ID</id><name>NAME</name></user> -> 1
[2]

Microsoft Office Suite
A great deal of research has been accomplished in the area of hiding data in text, image, or audio files. There does not seem to be much research in the area of hiding data inside unused space. The only related work found is by Eric Cole in his book “Hiding Data in Plain Sight”, where he gives several examples of how to hide data in various file structures, including the properties section of Word documents.

In the world of spy vs. spy, covert communication, or steganography, is not a new concept. This ancient art has been used in many ways and in many media, and it has not been ignored in this century of bits and bytes. Many methods have been found for hiding covert messages and data in computer files. One only has to search the Internet for steganography, or stego for short, to find multiple freeware utilities that allow even a novice computer user to create files with hidden communications. However, where there is a desire to hide communication, there is also a desire to detect that communication. For this reason, there are also tools available online to detect covert data in image files. How dangerous is a hiding place that everyone knows about? What if someone sending covert data used file types less commonly used for steganography, such as MS Word documents? Would that communication escape notice? Can these files even carry a covert message?

With the large amount of traffic that traverses networks daily, it is impossible for any single administrator or investigator to examine all data. When examining network traffic, a system administrator is limited to the traffic they consider suspicious or dangerous. A system administrator must know the normal traffic across their network and investigate when something
  • 14. odd occurs. There are a large number of programs today that will hide data in image or audio files. Therefore, data could be stored inside one of these and sent across the network with reduced suspicion. However, what if, instead of pictures, someone sends a Word document, then a PowerPoint presentation, followed by any number of common office documents? This variation of file types would create less suspicion by appearing to be normal traffic. Can these files carry covert information? Yes, they contain meta-data and unused bits that can be replaced without obvious effect.

The programs mentioned above that hide data in images perform steganography. There are numerous, well-published ways to use steganography to hide information in image and audio files. However, a less considered area is the simple hiding of information inside common office files. These spaces are not well known or well documented. They can be used relatively easily to hide data, and using them decreases suspicion as stated above. Also, using these spaces with bit substitution keeps the original file size, which reduces the chance of automated detection or analysis. For these reasons and more, investigators should be made aware of these spaces.

Unused Space and Meta-data Defined
Some files contain readily available spaces inside their file structures. One example is meta-data, data about data. Meta-data is ingrained in file structures but is not visible to the user without special tools. Some files also have unused space: they contain bits that can be overwritten without any adverse or obvious effect on the file. These spaces are not visible to the average user because they are ignored when the files are opened. They can be seen when the file is examined at the byte level, something few users would do. These spaces create an opportunity to hide covert data. This paper shows the results of examining several common office files to see if they have these spaces and whether or not
  • 15. they could be used to hide data. It is not our intent to suggest their use, but rather to document their existence as a vulnerability and possible data leakage point.

The Experiments and General Observations
The first sets of tests were run on the Microsoft Office documents: Word, Excel, and PowerPoint. Next, html and email files were examined. Finally, compressed files were tested. Each file type was put through the same set of tests. The presence or absence of meta-data and unused space was immediately obvious in all file types. It was most prevalent in Microsoft Word. This file type not only kept meta-data but also contained history information about the document, such as who created it and where it was printed. Along with these meta-data sections, large groups of the repeated hex value FF or 00 were noticed in some file types. These spaces were ideal for hiding data. For each file type, several files of different sizes were examined to determine whether these spaces were constant. The spaces seem to depend more on the version used to create the file than on the file contents.

Overwriting these spaces with our data succeeded, but data could not be inserted (as opposed to substituted) in this area without noticeable side effects. Inserting data changes the length of the file and the layout of the file structure, so once the file is saved it cannot be opened without error messages. Sometimes it could not be opened at all. Inserting data is therefore easy to do, but it corrupts the file in the process. This held true for all the file types that did not consist of plain text like web pages. Data inserted at the end of the file did not cause this effect but did affect the file size, which could help identify the file as containing hidden data. Each file type was tested to see if data could be hidden at the end of the file, after the end-of-file pointer. All proved susceptible to this technique except html and email files. Data in either place proved to be volatile: once anyone opens and saves the document, the hidden data is destroyed. The details for each file type are discussed next.
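The difference between overwriting unused space and inserting data can be sketched in Python. This is a minimal illustration on a synthetic byte buffer, not on a real Office file: the function names, the sample buffer, and the minimum run length of 16 bytes are assumptions made for the example.

```python
import re


def find_filler_runs(data: bytes, min_len: int = 16) -> list:
    """Return (offset, length) for each run of repeated 0x00 or 0xFF bytes."""
    pattern = re.compile(rb"\x00{%d,}|\xff{%d,}" % (min_len, min_len))
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(data)]


def hide_in_run(data: bytes, payload: bytes, min_len: int = 16) -> bytes:
    """Overwrite the start of the first large-enough filler run with payload."""
    for offset, length in find_filler_runs(data, min_len):
        if length >= len(payload):
            # Substitution: bytes are replaced one for one, so the file
            # length and every later offset stay exactly the same.
            return data[:offset] + payload + data[offset + len(payload):]
    raise ValueError("no filler run is large enough for the payload")


# Synthetic stand-in for a file with unused space (not a real document).
sample = b"HEADER" + b"\xff" * 32 + b"body text" + b"\x00" * 20 + b"END"

print(find_filler_runs(sample))          # [(6, 32), (47, 20)]
stego = hide_in_run(sample, b"secret")
print(len(stego) == len(sample))         # True: substitution keeps the size
print(stego[6:12])                       # b'secret'
```

Inserting the payload instead (for example `sample[:6] + b"secret" + sample[6:]`) would lengthen the file and move everything after the insertion point, which is consistent with the corruption reported above for structured file formats.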
  • 16. Results by File Type
Word documents were the first to be tested. 780 bytes of repeated values were discovered and utilized to hide data. Excel files were examined next. The findings were similar to those for Word; however, Excel had fewer spaces in which to hide data. The largest continuous block was approximately 420 bytes, found just below the header. Finally, PowerPoint files were examined. The results were the same as for the Excel files, except that they seemed to have more of the smaller hiding places. In Word the plain text was obvious. In Excel the numbers could be seen. PowerPoint content was not so obvious, which made searching for hiding places harder.

In summary, Microsoft Office files provided many opportunities for hiding data. Inserting data caused the file to become corrupt, but the files had plenty of unused space that could be written over. The corruption could also be avoided by inserting data at the end of the file. Another peculiarity was the need to avoid the area where Microsoft stores its file property information. This area had to be avoided to prevent others from easily viewing the hidden data. This was discussed in [citation missing in source], which provided source code for a program that could be used to hide data in this spot. Other than this limitation, the inserted data was not apparent and was stable as long as the file was not altered or saved.

Web files were tested next. Html and email files are actually no more than text files that are interpreted by another program. Text files have no headers and no unused space. There are ways to hide data in text, but there are no data hiding vulnerabilities in the file structure of a simple text file that we are aware of. However, web pages contain areas that are ignored when the page is rendered. There is no real unused space to hide data in, but these ignored areas create meta-data hiding opportunities. Web browsers also ignore commands they see as errors, so data can be hidden by placing it inside the symbols “<>”. These methods have a drawback. Web browsers normally contain the option to “view source”. This is not an often used tool, but it allows any user to view the hidden text with ease.
  • 17. The data could be encrypted or made to look like meta-data using a grammar-based substitution technique, but its presence could still be easily detected.

Email files proved to be similar to html files. They are also plain text files that are interpreted by other programs. Emails contain information about each server that the email traveled through. Data can easily be hidden here by mimicking this server information: simply insert the data following the word “Received:”. Most email programs today do not display this information by default. Just as with html/htm documents, one has only to view the source or open the file in a text editor to see the hidden data. In summary, web files could be used to hide data easily, but the ease of use is balanced by the ease of discovery.

When dealing with electronic transfer where space must be conserved, it would not be uncommon to see compressed files, such as those produced by WinZip. Therefore compressed files were studied next. Due to the nature of these files, they are not as vulnerable to hiding. One function of a compression algorithm is to look for long strings of redundant bytes and transform them into smaller strings that represent them. Therefore, the long strings of repeated values used to hide data in the other file types would have been reduced or eliminated. However, because these files are so common, tests were run to confirm this. Data was successfully added after the end-of-file marker, but there were no unused spaces inside the files to use for hiding data. It was also noted that compressing a file with hidden data and then uncompressing it did not affect the hidden data. In addition, while the file was compressed the hidden data was not readable with the hex editor. The compressed files containing hidden data were larger than the compressed versions of the unmodified files, because substituting hidden data reduced the redundant bits available for compression. This could be a red flag for hidden data if the compression ratios of files were used to check file sizes. [1]
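The compression observations above can be reproduced on synthetic data. The sketch below is an illustration only: it uses Python's `zlib` (DEFLATE) as a stand-in for a ZIP utility, a made-up file layout with a 780-byte filler block, and a pseudo-random 200-byte payload standing in for encrypted hidden data.

```python
import random
import zlib

FILLER_LEN = 780  # size of the repeated-value block, chosen to echo the Word result


def build_file(filler: bytes) -> bytes:
    """Assemble a made-up file: header, filler block, then visible content."""
    return b"HEADER" + filler + b"Visible document text." * 10


# Pseudo-random bytes stand in for an encrypted payload (fixed seed for repeatability).
rng = random.Random(0)
payload = bytes(rng.randrange(256) for _ in range(200))

clean = build_file(b"\xff" * FILLER_LEN)
stego = build_file(payload + b"\xff" * (FILLER_LEN - len(payload)))

clean_z = zlib.compress(clean)
stego_z = zlib.compress(stego)

# Substitution leaves the uncompressed size unchanged.
print(len(clean), len(stego))
# The compressed size grows, because the payload removed redundancy.
print(len(clean_z), len(stego_z))
# The hidden data survives a compress/decompress round trip.
print(zlib.decompress(stego_z)[6:206] == payload)
```

Comparing the compression ratio of a suspect file against that of known-clean files of the same type is one way the "red flag" mentioned above could be checked automatically.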
  • 18. Limitations of Existing Text-based Steganographic Techniques
Following are the major drawbacks of the techniques cited above:
    Data hidden in .doc files is lost when the file is saved in PDF, ASCII text or similar formats.
    An increase or decrease in line or word spacing is eye-catching, and so is the separation of words or lines with extra spaces.
    Extra spaces placed at the end of a sentence can go unnoticed, except when one selects a page or an entire document for copying, where the extra spaces become prominent.
    Adding spaces past the end-of-file mark can raise suspicion because of the increased file length.

Proposed Solution
To date, no known text-based data hiding technique exists that can hide information without increasing or decreasing the document length and / or altering the appearance of the text. The proposed thesis aims to develop a coding technique that hides data within the actual contents of the text file used as cover, addressing all of the existing drawbacks of text-based steganographic systems, and duly supported by a complete software solution. This will eliminate the possibility of losing hidden data when the text is compressed or converted to the “pdf” file format. In addition, anyone in possession of the actual cover will find no change in the contents or layout of the stego-text document on comparison.

Application:
This technique can best be applied to web pages for unnoticed global interaction, where scrutiny is primarily focused on images and text spacing. A real-time demonstration of this will also be given.
  • 19. References
[Most entries of the reference list are not recoverable from the source; the one legible entry follows.]
2. Jonathan Cummins, Patrick Diskin, Samuel Lau and Robert Parlett. Steganography and Digital Watermarking. School of Computer Science, The University of Birmingham, 2004.