A zero text watermarking algorithm based on the probabilistic patterns


Published on

  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

A zero text watermarking algorithm based on the probabilistic patterns

  1. 1. INTERNATIONALComputer VolumeOF COMPUTER ENGINEERING International Journal of Engineering and Technology (IJCET), ISSN 0976- JOURNAL 4, Issue 1, January- February (2013), © IAEME 6367(Print), ISSN 0976 – 6375(Online) & TECHNOLOGY (IJCET)ISSN 0976 – 6367(Print)ISSN 0976 – 6375(Online)Volume 4, Issue 1, January- February (2013), pp. 284-300 IJCET© IAEME: www.iaeme.com/ijcet.aspJournal Impact Factor (2012): 3.9580 (Calculated by GISI) ©IAEMEwww.jifactor.com A ZERO TEXT WATERMARKING ALGORITHM BASED ON THE PROBABILISTIC PATTERNS FOR CONTENT AUTHENTICATION OF TEXT DOCUMENTS 1 2 3 Fahd N. Al-Wesabi , Adnan Z. Alsakaf , Kulkarni U. Vasantrao 1 PhD Candidate, Faculty of Engineering, SRTM University, Nanded, INDIA, 2 Professor, Department of IS, Faculty of Computing and IT, UST, Sana’a, Yemen, 3 Professor, Department of Comp. Sci. and Engg., SGGS Institute of Engg.and Tech., Maharashtra, INDIA. ABSTRACT In the study of content authentication and tamper detection of digital text documents, there are very limited techniques available for content authentication of text documents using digital watermarking techniques. A novel intelligent text zero-watermarking approach based on probabilistic patterns is proposed in this paper for content authentication and tamper de- tection of text documents. Based on the Markov model for English text analysis algorithmsfor the watermark generation and detection was designed in this paper. In the proposed approach, Markov model of order one and letter-based was constructed for content authentication and tamper detection of English text documents. Theprobabilistic pattern features of text contents, were utilized theseto generate the watermark. However, we can extract this watermark later using extraction and detection algorithm to identify the status of text document such as au- thentic, or tampered. The proposed approachis implemented using PHP programming lan- guage. Furthermore, the effectiveness and feasibility of the proposed approachis proved with experiments using six datasets of varying lengths. The accuracytampering detection is com- pared with other recent approaches under random insertion, deletion and reorder attacks in multiple random locations of experimental datasets. Results show that the proposed ap- proachis more secure as it always detects tampering attacks occurred randomly on text even when the tampering volume is low or high. Keywords: Digital watermarking, Markov Model, order one, letter-Level, probabilistic pat- terns, information hiding, content authentication, tamper detection. 284
  2. 2. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 1, January- February (2013), © IAEMEI. INTRODUCTION With the increasing use of internet, e-commerce, and other efficient communicationtechnologies, the copyright protection and authentication of digital contents, have gainedgreat importance. Most of these digital contents are in text form such as email, websites,chats, e-commerce, eBooks, news, and short messaging systems/services (SMS) [1]. These text documents may be tempered by malicious attackers, and the modified datacan lead to fatal wrong decision and transaction disputes [2]. Content authentication and tamper detection of digital image, audio, and video has beenof great interest to the researchers. Recently, copyright protection, content authentication, andtamper detection of text document attracted the interest of researchers. Moreover, during thelast decade, the research on text watermarking schemes is mainly focused on issues of copy-right protection, but gave less attention on content authentication, integrity verification, andtamper detection [4]. Various techniques have been proposed for copyright protection, authentication, andtamper detection for digital text documents. Digital Watermarking (DWM) techniques are con-sidered as the most powerful solutions to most of these problems. Digital watermarking is atechnology in which various information such as image, a plain text, an audio, a video or acombination of all can be embedded as a watermark in digital content for several applicationssuch as copyright protection, owner identification, content authentication, tamper detection,access control, and many other applications [2]. Traditional text watermarking techniques such as format-based, content-based, andimage-based require the use of some transformations or modifications on contents of textdocument to embed watermark information within text. A new technique has been proposednamed as a zero-watermarking for text documents. The main idea of zero-watermarkingtechniques is that it does not change the contents of original text document, but utilizes thecontents of the text itself to generate the watermark information [13]. In this paper, the authors present a new zero-watermarking technique for digital textdocuments. This technique utilizes the probabilistic nature of the natural languages, mainlythe first order Markov model. The paper is organized as follows. Section 2 provides an overview of the previous work doneon text watermarking. The proposed generation and detection algorithms are described in detailin section 3. Section 4 presents the experimental results for the various tampering attacks suchas insertion, deletion and reordering. Performance of the proposed approach is evaluated bymultiple text datasets. The last section concludes the paper along with directions for futurework.II. PREVIOUS WORK Text watermarking techniques have been proposed and classified by many literaturesbased on several features and embedding modes of text watermarking. We have examinedbriefly some traditional classifications of digital watermarking as in literatures. These tech-niques involve text images, content based, format based, features based, synonym substitu-tion based, and syntactic structure based, acronym based, noun-verb based, and many othersof text watermarking algorithms that depend on various viewpoints [1][3][4]. 285
  3. 3. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 1, January- February (2013), © IAEMEA. Format-based Techniques Text watermarking techniques based on format are layout dependent. In [5], proposedthree different embedding methods for text documents which are, line shift coding, word shiftcoding, and feature coding. In line-shift coding technique, each even line is shifted up or downdepending on the bit value in the watermark bits. Mostly, the line is shifted up if the bit is one,otherwise, the line is shifted down. The odd lines are considered as control lines and used atdecoding. Similarly, in word-shift coding technique, words are shifted and modify the inter-word spaces to embed the watermark bits. Finally, in the feature coding technique, certain textfeatures such as the pixel of characters, the length of the end lines in characters are altered in aspecific way to encode the zeros and ones of watermark bits. Watermark detection process isperformed by comparing the original and watermarked document.B. Content-based Techniques Text watermarking techniques based on content are structure-based natural languagedependent [4]. In [6][14], a syntactic approach has been proposed which use syntactic struc-ture of cover text for embedding watermark bits by performed syntactic transformations tosyntactic tree diagram taking into account conserving of natural properties of text during wa-termark embedding process. In [18], a synonym substitution has been proposed to embed wa-termark by replacing certain words with their synonyms without changing the sense and con-text of text.C. Binary Image-based Techniques Text Watermarking techniques of binary image documents depends on traditional im-age watermarking techniques that based on space domain and transform domain, such as Dis-crete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), and Least SignificantBit (LSB) [5]. Several formal text watermarking methods have been proposed based on em-bedding watermark in text image by shifting the words and sentences right or left, or shiftingthe lines up or down to embed watermark bits as it is mentioned above in section format-based watermarking [5][7].D. Zero-based Techniques Text watermarking techniques based on Zero-based watermarking are content featuresdependent. There are several approaches that designed for text documents have been pro-posed in the literatures which are reviewed in this paper [1][19] [20] and [21]. The first algorithm has been proposed by [19] for tamper detection in plain text documentsbased on length of words and using digital watermarking and certifying authority techniques.The second algorithm has been proposed by [20] for improvement of text authenticity in whichutilizes the contents of text to generate a watermark and this watermark is later extracted toprove the authenticity of text document. The third algorithm has been proposed by [1] for copy-right protection of text contents based on occurrence frequency of non-vowel ASCII charactersand words. The last algorithm has been proposed by [21] to protect all open textual digital con-tents from counterfeit in which is insert the watermark image logically in text and extracted itlater to prove ownership. In [22], Chinese text zero-watermark approach has been proposedbased on space model by using the two-dimensional modelcoordinate of wordlevel and the sen-tence weights of sentencelevel. 286
  4. 4. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 1, January- February (2013), © IAEMEE. Combined-based Techniques One can say the text is dissimilar image. Thus, language has a distinct and syntacticalnature that makes such techniques more difficult to apply. Thus, text should be treated as textinstead of an image, and the watermarking process should be performed differently. In [23] Acombined method has been proposed for copyright protection that combines the best of bothimage based text watermarking and language based watermarking techniques. The above mentioned text watermarking approaches are not appropriate to all types oftext documents under document size, types and random tampering attacks, and its mecha-nisms are very essential to embed and extract the watermark in which maybe discovered eas-ily by attackers . On the other hands,theseapproaches are not designed specifically to solveproblem of authentication and tamper detection of text documents, and are based on makingsome modifications on original text document to embed added external information in textdocument and this information can be used later for various purposes such as content authen-tication, integrity verification, tamper detection, or copyright protection. This paper proposesa novel intelligent approach for content authentication and tamper detectionof English textdocuments in which the watermark embedding and extraction process are performed logicallybased on text analysis and extract the features of contents by using hidden Markov model inwhich the original text document is not altered to embed watermark.III. THE PROPOSED APPROACH This paper proposes a novel intelligent approach based on zero-watermarking meth-odology in which the original text document is not altered to embed watermark, that meansthe watermark embedding process is performed logically. The proposed approach uses theMarkov model of the natural languages that is Markov chains are used to analyse the EnglishText and extract the probabilistic features of the contents which are utilized to generate a wa-termark key that is stored in a watermark database. This watermark key can be used later andmatched with watermark generated from attacked document for identifying any tamperingthat may happen to the document and authenticating its content. This process illustrated infigure 1. Fig. 1.Watermark generation and detection processes 287
  5. 5. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 1, January- February (2013), © IAEME Before we explain the watermark generation and extraction processes, in the next sub-section we present a preliminary mathematical description of Markov models for natural lan-guage text analysis.A. Markov Models for Text Analysis In this subsection, we explain how to model text using a Markov chain, which is de-fined as a stochastic (random) model for describing the way that processes move from state toa state. For example, suppose that we want to analyse the sentence: “Ahmed was beginning to get very tired of sitting by his brother on the bank, and of having nothing to do” When we use a Markov model of order one, each character is a state by itself, and theMarkov process transitions from state to state as the text is read. For instance, as the abovesample text is processed, the system makes the following transitions:"A" -> "h" -> "m" -> "e" -> "d" -> " " -> "w" -> "a" -> "s" -> " " -> "b" -> "e" -> "g" -> "i" -> "n" -> "n" -> "i" -> "n" -> "g" … As a result of first order Markov model for analysing the given sentence we obtain thefigure 2 which gives the present state and the all possible transitions. Fig. 2.Sample text transitions. Now if we consider state "a", the next state transitions are "h", "n",”n”, "s", and "v".We observe that state “n” occurs twice. Next we present a simple method to build the states and the Markov transition matrixMሾ݅, ݆ሿwhich is the most basic part of text analysis using Markov model. In the proposed approach, the text considered is not limited to alphabetic characters,but includes spaces, numbers, and special characters such as [, . ; : - ? ! ], and the total num-ber of states is 61,these are [English letters = 26, space letter= 1, Integer numbers from 0 to 288
  6. 6. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 1, January- February (2013), © IAEME9= 10, specific symbols such as . " , ; : ? ! / @ $ & % * + - = >< ( ) [ ] = 24].The entryMሾ݅, ݆ሿ will be used to keep track of the number of times that the i୲୦ character of the text isfollowed by the j୲୦ character of the text.For i ൌ 1 to L െ 1, where Lis the length of the textdocument - 1", let x be the ith character in the text and y be the (i+1)st character in the text.Then increment M[x,y].Now the matrix M contains the counts of all transitions. Next weturn these counts into probabilities as follows,for each ifrom 1 to 61, sum the entries onthe ith row, i.e., let counter[i] = M[i,1] + M[i,2] + M[i,3] + ... + M[i,61] . Now define P[i,j] = M[i,j] / counter[i] for all pairs i,j. This just gives a matrix of prob-abilities. In other words, now P[i,j] is the probability of making a transition from letter ito letter j. Hence a matrix of probabilities that describes a Markov model of order one forthe given text is obtained.B. Watermark Generation and EmbeddingAlgorithm The watermark generationand embedding algorithm requires the original textdocument as input, then as a pre-processing step it is required to perform conversion ofcapital letters to small letters and to remove all spaces within the text document. A wa-termark pattern is generated as the output of this algorithm. This watermark is then storedin watermark database along with the original text document, document identity, authorname, current date and time. This stage includes two main processes which are watermark generation and watermarkembedding. Watermark generation from the original text document and embed it logicallywithin the original watermark will be done by the embedding algorithm. In this proposed watermark generation algorithm, the original text document (T) is tobe provided by the author. Then text analysis process should be done using Markovmodel to compute the number of occurrences of the next state transitions for every pre-sent state, in this approach we use Markov model of order one. A Matrix of transitionprobabilities that represents the number of occurrences of transition from a state to an-other is constructed according to the procedure explained in previous section A and canbe computed by equation (1). MarkovMatrix[ps][ns] = P[i][j], for i,.j=1,2, .,n …….(1) Where, o n: is the total number of states o i: refers to PS "the present state". o j: refers to NS "the next state". o P[i,j]: is the probability of making a transition from character i to character j. After performing the text analysis and extracting the probability features, the water-mark is obtained by identifying all the nonzero values in the above matrix. These nonzerovalues are sequentially concatenated to generate a watermark pattern, denoted byWMPOas given by equation (2) and presented in figure 3. 289
  7. 7. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 1, January- February (2013), © IAEME Fig. 3. Watermark generation processes WMPO&= MarkovMatrix [ps] [ns], for i,. j= nonzero values in the Markov matrix……..(2) This watermark is then stored in a watermark database along with the original text docu-ment, document identity, author name, current date and time.After watermark generation assequential patterns, an MD5 message digest is generated for obtaining a secure and compactform of the watermark,notationalyas given by equation (3) and presented in figure 4. Fig. 4.Watermark before and after MD5 digesting DWM = MD5(WMPO) ……………….. (3) The proposed watermark generation and embedding algorithm, using First order Markovmodel is presented formally in figure 5. 290
  8. 8. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 1, January- February (2013), © IAEME Fig. 5: Watermark generation and embedding algorithmC. Watermark Extraction and Detection Algorithm The watermark detection algorithm is on the base of zero-watermark, so before de-tection for attacked text document TDA, the proposed algorithm still need to generate theattacked watermark patterns′. When received the watermark patterns′, the matching rateof patterns′ and watermark distortion are calculated in order to determine tampering de-tection and content authentication. This stage includes two main processes which are watermark extraction and detec-tion. Extracting the watermark from the received attacked text document and matching itwith the original watermark will be done by the detection algorithm. The proposed watermark extraction algorithmtakes the attacked text document,and perform the same water mark generation algorithm to obtain the watermark patternfor the attacked text document. After extracting the attacked watermark pattern, the watermark detection is per-formed in three steps, • Primary matching is performed on the whole watermark pattern of the original document WMPO, and the attacked document WMPA. If these two patterns are found the same, then the text document will be called authentic text without tampering. If the primary matching is unsuccessful, the text document will be called not authentic and tampering occurred, then we proceed to the next step. • Secondary matching is performed by comparing the components associated with each state of the overall pattern.which compares the extracted watermark pattern for each state with equivalent transition of original watermark pattern. This process can be described by the following mathematical equations (4),and (5). 291
  9. 9. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 1, January- February (2013), © IAEME ௐெ௉ೀ ሾ௜ሿሾ௝ሿି ሺௐெ௉ೀ ሾ௜ሿሾ௝ሿି ௐெ௉ಲ ሾ௜ሿሾ௝ሿሻ ܲ‫ ்ܴܯ‬ሺ݅, ݆ሻ ൌ ቚ ቚ ݂‫..………݆ .݅ ݈݈ܽ ݎ݋‬ (4) ௐெ௉ೀ ሾ௜ሿሾ௝ሿ ∑೙ ൫௉ெோ೅ ሺ௜,௝ሻ൯ ܲ‫ܴܯ‬ௌ ሺ݅ ሻ ൌ ฬ ೕసభ ฬ ݂‫..…….………………݅ ݈݈ܽ ݎ݋‬ (5) ்௢௧௔௟ ௌ௧௔௧௘௉௔௧௧௘௥௡஼௢௨௡௧ሺ௜ሻ This process is illustrated in figure 6. Fig. 6: Watermark extraction process Finally, the PMR is calculated by equation (6), which represent the pattern match-ing rate between the original and attacked text document. ∑౤ ୔୑ୖୗሺ୧ሻ PMR ൌ ቚ ౟సభ ቚ……………….…….. (6) ୒Where, • N: is the number of non-zero elements in the Markov matrix The watermark distortion rate refers to tampering amount occurred by attacks on con-tents of attacked text document, this value represent in WDR which we can get for it byequation (7): WDR ൌ 1 െ PMR ……………………. (7) The detection algorithm is illustrated in figure 7. 292
  10. 10. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 1, January- February (2013), © IAEME DWM Detection Algorithm (A Zero Text DWM based on Probabilistic patterns) - Input: Original Text Document, Attacked Text Document - Output: WMPO, WMA, WMPA. PMR, WDR, Attacked States and Transitions Matrix [61][61]. 1. Read WMo or Original Text(TDO) and Attacked Text(TDA) documents and performs Pre-processing for them. 1. Loop ps = 1 to 61, // Build the states of MarkovMatrix - Loop ns = 1 to 61, // Build the transitions for each state of MarkovMatrix - aMarkovMatrix[ps][ns] = Total Number of Transition[ps][ns] // compute the total frequencies of tran- sitions for every state 2. Loop i = 1 to 61, // Extract the embedded watermark Loop j = 1 to 61, - IF aMarkovMatrix [i][j] != 0 // states that have transitions - WMPA &= aMarkovMatrix[i] [j] 3. Output WMPO, WMPA 4. IF WMPA = WMPO Print “Document is authentic and no tampering occurred” - PMR = 1 Else - Print “Document is not authentic and tampering occurred” - For i = 1 to 61 // Extract transition patterns and match each of them with original transition patterns - For j = 1 to 61 o IF WMPO[i][j] != 0 - patternCount +=1 ‫ ۽۾ۻ܅‬ሾ୧ሿሾ୨ሿି ሺ‫ ۽۾ۻ܅‬ሾ୧ሿሾ୨ሿି ‫ ۯ۾ۻ܅‬ሾ୧ሿሾ୨ሿሻ ‫ ܂ ܀ۻ۾‬ሾiሿሾjሿ ൌ ቚ ‫ ۽۾ۻ܅‬ሾ୧ሿሾ୨ሿ ቚ - transPMRTotal += PMR ୘ o Else - IF WMPA[i][j] != 0 o patternCount += WMP୅ሾiሿሾjሿ ୲୰ୟ୬ୱ୔୑ୖ୘୭୲ୟ୪ሾ୧ሿሾ୨ሿ - statePMR[i] = ୮ୟ୲୲ୣ୰୬େ୭୳୬୲ሾ୧ሿ - PMR ୗ += statePMR[i] Totalpattern += patternCount ୔୑ୖ౏ 5. PMR ൌ ୘୭୲ୟ୪୮ୟ୲୲ୣ୰୬ୱ 6. WDR ൌ 1 – PMR WMPO: Original watermark, WMPA: Attacked watermark, aMarkovMatrix: Attacked states and transitions matrix, TDA: Attacked text document array, ps: the present state, ns: the next state, PMRT: Transition patterns matching rate, PMRS: State patterns matching rate, PMR: Watermark patterns matching rate, WDR: Watermark distortion rate. Fig. 7: watermark extraction and detection algorithmIV. EXPERIMENTAL SETUP, RESULTS AND DISCUSSIONA. Experimental Setup In order to test the proposed approach and compare with other the approach, we con-ducted a series of simulation experiments. The experimental environment is listed as below:CPU: Intel Core™i5 M480/2.67 GHz, RAM: 8.0GB, Windows 7; Programming languagePHP NetBeans IDE 7.0. With regard to the data sets used, six samples from the data sets de-signed in [24]. These samples were categorized into three classes according to their size,namely Small Size Text (SST), Medium Size Text (MST), and Large Size Text (LST). 293
  11. 11. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 1, January- February (2013), © IAEME Next, we define the types of attacks and their percentage as follows, Insertion attack,deletion attack and reorder attack performed randomly on multiple locations of these datasets. The details of our datasets volume and attacks percentage used is shown in table I, which isconsidered are similar to those performed in [19] for comparison purpose, and it should bementioned that we perform the reorder attack on the datasets which is not contained in thesame paper. Table I Original and attacked text samples with insertion and deletion percentage Original Attacked Attacks Percentage Sample Text Text Text No Word Inser- Dele- Reor- Word Count Count tion tion der [SST2] 421 26% 25% 16% 425 [SST4] 179 44% 54% 5% 161 [MST2] 559 49% 25% 6% 696 [MST4] 2018 14% 12% 2% 2048 [MST5] 469 57% 53% 10% 491 [LST1] 7993 9% 6% 1% 8259 To measure the performance of our approach and compare it with others, the Tamper-ing Accuracy which is a measure of the watermark robustness will be used. The PMR valuewill give the Tampering Accuracy of the given text document. The watermark distortion rateWDR is also measured and compared with other approaches. The values of both PMR andWDR range between 0 and 1 value. The larger PMR value, and obviously the lowest WDRvalue mean more robustness, while the lowest PMR value and largest WDR value means lessrobustness. Desirable value of PMR with close to 0, and close to 1 with WDR. We categorize tam-per detection states into three classes based on PMR threshold values which are: (High whenPMR values greater than 0.70, Mid when PMR values between 0.40 and 0.70, and Low whenPMR values less than 0.40). To evaluate the accuracy of the proposed approach, a series of experiments were con-ducted with all the well known attacks such as random insertion, deletion and reorder of wordsand sentences on each sample of the datasets. These various kinds of attacks were applied at mul-tiple locations in the datasets. The experiments were conducted, firstly with individual attacks,then with all attacks at the same time and conducted comparative results of the proposed ap-proach with recently similar approach.B. Experiments with allAttacks In order to compare the performance of the proposed approach with recently pub-lished approach for text watermarking which titled word length Zero-watermarking algorithm(WLZW) proposed by Z Jalil et al. [19], named here as WLZW, in this part we limited ourcharacter set to letters from ‘a’ to ‘z’ and space letter as in [19]. In this experiments, randommultiple insertions and deletion attacks were performed at the same time on each sample ofthe datasets with attacks rates as shown in table I.Ratios of successfully detected watermarkof the proposed algorithms as compared with WLZW are shown in table II and graphicallyrepresented in figure 8. 294
  12. 12. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 0976-6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 1, January- February (2013), © IAEME January Fig. 8: Comparative performance accuracy of the proposed algorithm with WLZW alg algo- rithm TableII Comparison of the proposed algorithm with WLZW Ratio of watermark accu- Attacks Rates Sample racy to tamper detection Text No IA DA The proposed WLZW rate rate Algorithm 1 : [SST2] 26% 25% 0.5671 0.7022 2 : [SST4] 44% 54% 0.8634 0.6607 3: [MST2] 49% 25% 0.7941 0.651 4: [MST4] 14% 12% 0.8335 0.8486 5: [MST5] 57% 53% 0.6332 0.5767 6: [LST1] 9% 6% 0.8548 0.903 Table II shows the comparative results of the ratio of watermark accuracy to tamperdetection for both the proposed approach and WLZW approach. It can be seen from the re- rsults that the proposed approach performs better for the data sets under small rate of the a at-tacks, while at higher rate of the attacks the WLZA approach is better in performance. In the next section we consider an enlarged character set for the text document, and i im-prove significantly the performance of our approach. cantlyC. Experiments with the proposed approach In this section, we evaluate the performance of the proposed approach. The character .set is extended to cover all English letters, space, numbers, and special symbols. The experi-ments were conducted with the various kinds of attacks individually with the rates of theseattacks selected randomly between 1 and 40%. The performance results of this approach u un-der all the mentioned attacks are presented intabular form in table IV, and presented graph , graphi-cally in figures 9, 10, and 11 for Insertion attack, Deletion attack, and Reorder attack respe respec-tively. These results are discussed below. iscussed 295
  13. 13. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 0976-6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 1, January- February (2013), © IAEME January Fig. 9: Watermark accuracy under various rates of insertion attacks Fig. 10: Watermark accuracy under various rates of deletion attacks Fig. 11: Watermark accuracy under various rates of reorder attacks 296
  14. 14. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 1, January- February (2013), © IAEME Table IV IMPROVED PATTERNS matching of extracted watermark with individual attacks Insertion Attack Deletion Attack Re-order Attack Sample (IA) (DA) (RA) Text No DA RA IA rate PMR PMR PMR rate rate [SST2] 17% 0.8077 12% 0.879 16% 0.9797 [SST4] 40% 0.6206 26% 0.7554 5% 0.9891 [MST2] 11% 0.9081 10% 0.8842 6% 0.9951 [MST4] 3% 0.9546 5% 0.9442 2% 0.9996 [MST5] 16% 0.7805 12% 0.8835 10% 0.9916 [LST1] 1% 0.9856 6% 0.9296 1% 0.9999 For each dataset in table IV, and figure9, 10, and 11 with all kind of attacks, tamperingof the text is always detected based on PMR threshold values, whether it is low, medium orhigh. This proves that the text is sensitive to any modification made by various attackers andthe accuracy of watermark gets affected even when the tampering volume is low. As observed from the graphs for all the cases the PMR is always above 70% under alltypes of the attacks, except for the dataset [SST4], under Insertion attack with rate 40% it is be-low 70% but still above 60%.This shows that the proposed approach is very effective in detect-ing insertion ,deletion, and reorder tampering. Further, the performance of the proposed approach was evaluated with all the attacksapplied at the same time. Table V show the experimental results under insertion and deletion,and reorder attacks occurred simultaneously. Table V IMPROVED PATTERNS MATCHING RATE UNDER INSERTION AND DELETION AT- TACKS Attacks Rates DA RA PMR WDR IA rate rate rate [SST2] 26% 25% 11% 0.6599 0.3401 [SST4] 44% 54% 18% 0.5288 0.4712 [MST2] 49% 25% 11% 0.5861 0.4139 [MST4] 14% 12% 4% 0.8307 0.1693 [MST5] 57% 53% 12% 0.5087 0.4913 [LST1] 9% 6% 1.5% 0.8586 0.1414 As can be seen from Table V, and the graphical representation in figure 12, that the pro-posed approach is still efficient, when all attacks were applied simultaneously, with small rateof attacks [MST4] , [LST1] and [SST2]. On the other hand when the rate of the attacks isslightly higher, the proposed approach performs in a moderate manner, and still effective.The results are also presented graphically as compared with the WLZW approach, and theproposed approach with small character set. As can be seen that the proposed approach per-forms comparably in a good manner even though the character set is very much larger thanother approaches. 297
  15. 15. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 0976-6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 1, January- February (2013), © IAEME January Fig. 12: Comparative performance accuracy of the proposed algorithm before and after improved with WLZW algorithmV. CONCLUSION Based on Markov model of order oneandletter-based, the authors have designed a text zero oneandletter based, zero-watermark approach which is based on text analysis. The algorithm uses the text features as probabil- analysis turesistic patterns of states and transitions in order to generate and detect the watermark. Theproposeda . Theproposedap-proach is implemented using PHP programming language. The experiment shows that the proposed e perimentapproach is more secure and has good accuracy of tampering detection. Compared with the traditional ingwatermark approachnamed WLZW under random insertion, deletion and reorder attacks in localized ra domand dispersed form on 6 variable size text datasets, the result shows that the tampering detection acc accu-racy is improved in the proposed approach. In addition, the proposed approach always detects inse posed inser-tion, deletion and reorder tampering attacks occurred randomly on different size of text documentseven when the tampering volume is very low, medium, or high. Also, results show that the proposed e ,approach is not limited to alphabetic characters, but includes spaces, numbers, and special cha proach characters. This work can further be extended to include the high order level of Markov model for naturallanguage processing and analysis the contents of text document and utilize of its features to generate a guage fe turesbetter watermark.REFERENCES 1. Z. JaliI, A. Hamza, S. Shahidm M. Arif, A. Mirza, (2010), “A Zero Text Watermar ro Watermarking Algo- rithm based on Non-Vowel ASCII Characters”, International Conference on Educational and In- Vowel Characters I formation Technology (ICET 2010), IEEE. 2. Suhail M. A., (2008), “Digital Watermarking for Protection of Intellectual Property A Book Digital Property”, Published by University of Bradford, UK. d UK 3. L. Robert, C. Science, C. Government Arts, (2009), “A Study on Digital Watermar A Watermarking Tech- niques”. International Journal of Recent Trends in Engineering, Vol. 1, No. 2, pp. 223 . Engineering, 223-225. 4. X. Zhou, S. Wang, S. Xiong, (2009), “Security Theory and Attack Analysis for Text Wate ty Water- marking”. International Conference on E . E-Business and Information System Secur Security, IEEE, pp. 1-6. 5. T. Brassil, S Low, and N. F. Maxemchuk, (1999), “Copyright Protection for the Electronic Copyright Ele Distribution of Text Documents . Proceedings of the IEEE, vol. 87, no. 7, pp. 1181 ents”. 1181-1196. 6. M. Atallah, V. Raskin, M. C. Crogan, C. F. Hempelmann, F. Kerschbaum, D. M Mohamed, and S.Naik, (2001),“Natural language watermarking: Design, analysis, andimplementation Pro- Natural andi plementation”. ceedings of the a Fourth Hiding Workshop, vol. LNCS 2137, 25-27. ourth W 298
  16. 16. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 1, January- February (2013), © IAEME 7. N. F. Maxemchuk and S Low, (1997), Marking Text Documents. Proceedings of the IEEE In- ternational Conference on Image Processing, Washington, DC, pp. 13- 16. 8. D. Huang, H. Yan, (2001), “Interword distance changes represented by sine waves for watermark- ing text images”. IEEE Trans. Circuits and Systems for Video Technology, Vol.11, No.12, pp. 1237 1245. 9. N. Maxemchuk, S. Low, (2000), “Performance Comparison of Two Text Marking Methods”. IEEE Journal of Selected Areas in Communications (JSAC), vol. 16 no. 4, pp. 561-572, 1998. 10. S. Low, N. Maxemchuk, Capacity of Text Marking Channel. IEEE Signal Processing Letters, vol. 7, no. 12 , pp. 345 -347. 11. M. Kim, “Text Watermarking by Syntactic Analysis”. 12th WSEAS International Conference on Computers, Heraklion, Greece, 2008. 12. H. Meral, B. Sankur, A. Sumru, T. Güngör, E. Sevinç , (2009), Natural language watermarking via morphosyntactic alterations. Computer Speech and Language, 23, pp. 107-125. 13. Z. Jalil, A. Mirza, (2009), “A Review of Digital Watermarking Techniques for Text Documents”. International Conference on Information and Multimedia Technology, pp. 230-234, IEEE. 14. M. AtaIIah, C. McDonough, S. Nirenburg, V. Raskin, (2000), “Natural Language Processing for Information Assurance and Security: An Overview and Implementations”. Proceedings 9th ACM/SIGSAC New Security Paradigms Workshop, pp. 5 1-65. 15. H. Meral, E. Sevinc, E. Unkar, B. Sankur, A. Ozsoy, T. Gungor, (2007), “Syntactic tools for text watermarking”. In Proc. of the SPIE International Conference on Security, Steganography, and Wa- termarking of Multimedia Contents, pp. 65050X-65050X-12. 16. O. Vybornova, B. Macq., (2007), “Natural Language Watermarking and Robust Hashing Based on Presuppositional Analysis”. IEEE International Conference on Information Reuse and Integration, IEEE. 17. M. tallah, V. Raskin, C. Hempelmann, (2002), “language watermarking and tamperproofing”. Proc. of al.. Natural 5th International Information Hiding Workshop, Noordwijkerhout, Netherlands, pp.196-212. 18. U. Topkara, M. Topkara, M. J. Atallah, (2006), “The Hiding Virtues of Ambiguity: Quantifiably Resilient Watermarking of Natural Language Text through Synonym Substitutions”. In Proceedings of ACM Multimedia and Security Conference, Geneva. 19. Z Jalil, A. Mirza, H. Jabeen, (2010), “Word Length Based Zero-Watermarking Algorithm for Tamper Detection in Text Documents”. 2nd International Conference on Computer Engineering and Technology, pp. 378-382, IEEE. 20. Z Jalil, A. Mirza, M. Sabir, (2010), “Content based Zero-Watermarking Algorithm for Authentica- tion of Text Documents”. (IJCSIS) International Journal of Computer Science and Information Secu- rity, Vol. 7, No. 2. 21. Z. Jalil , A. Mirza, T. Iqbal, (2010), “A Zero-Watermarking Algorithm for Text Documents based on Structural Components”. pp. 1-5 , IEEE. 22. M.Yingjie, G. Liming, W.Xianlong, G Tao, (2011), “Chinese Text Zero-Watermark Based on Space Model”.In Proceedings of I3rd International Workshop on Intelligent Systems and Applica- tions,pp. 1-5 , IEEE. 23. S. Ranganathan, A. Johnsha, K. Kathirvel, M. Kumar, (2010), “Combined Text Watermarking. In- ternational Journal of Computer Science and Information Technologies”, Vol. 1 (5), pp. 414-416. 24. Fahd N. Al-Wesabi, Adnan Alshakaf, Kulkarni U. Vasantrao, (2012), “A Zero Text Watermarking Algorithm based on the Probabilistic weights for Content Authentication of Text Documents”, in Proc. On International Journal of Computer Applications (IJCA), U.S.A, pp. 388 - 393. 25. M.Vasim Babu and Dr.A.V Ramprasad, “Energy Aware Adaptive Monte Carlo Localization Al- gorithm For WSN Based On Antithetic Markov Chain (AMCAM)” International journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 1, 2012, pp. 180 - 190, Published by IAEME. 26. Karimella Vikram, Dr. V. Murali Krishna, Dr. Shaik Abdul Muzeer and Mr.K. Nara- simha, “Invisible Water Marking Within Media Files Using State-Of-The-Art Technology” Interna- tional journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 3, 2012, pp. 1 - 8, Published by IAEME. 299
  17. 17. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 1, January- February (2013), © IAEME Fahd N.Al-Wesabi was born in Thammar, Yemen on 05 April, 1980. He received his B.Sc. degree in Computer Science from University of Science and Technology, Sanaa, Yemen, in 2006. He later received M.Sc. degree in Computer Information Systems in 2009 from The Arab Academy for bank- ing and financial sciences, Jordon, Yemen branch. He is Assistant teacher, Department of IT, Faculty of Computing and IT, University of Science and Technology, Sana’a, Yemen. Currently he is pursuing his Ph.D research at Department of Computer Science, Engineering College, SRTM University, Nanded, India. His research interest includes text watermarking, information security, content authentication, and soft computing tools.Adnan Z. Alsakaf Currently he is pursuing his Ph.D research at Department of Com-puter Science, IIT School, Delhi, India. His research interest includes information security,cryptography, watermarking, and soft computing tools. He is Professor, Department ofIS,Faculty of Computing and IT, University of Science and Technology, Sana’a, Yemen. Kulkarni U. Vasantrao received his B.Sc degree in Electronics, MSc degree in Systems Software; and Ph.D. degree in Electronics and Computer Science Engineering, Fuzzy Neural Networks and their applications in Pat- tern Recognition. He is Professor & Head, Dept. of Computer Science, S.G.G.S. Engineering College, SRTM University, Nanded. His area of spe- cialization includes Artificial Neural Networks, Distributed Systems and Microprocessors. 300