Your SlideShare is downloading. ×
SUBMITTED TO IEEE TRANS. IMAGE PROCESSING DEC. 2010 1
Scanned Document Compression Using a
Block-based Hybrid Video Codec
...
SUBMITTED TO IEEE TRANS. IMAGE PROCESSING DEC. 2010 2
formidable compressor for still images [32]. The intraframe
macroblo...
SUBMITTED TO IEEE TRANS. IMAGE PROCESSING DEC. 2010 3
(a) (b)
(c) (d)
(e) (f)
(g)
2
(h)
Fig. 4. Illustration of approximat...
SUBMITTED TO IEEE TRANS. IMAGE PROCESSING DEC. 2010 4
Fig. 5. Configuration parameters that have greater influence on the en...
SUBMITTED TO IEEE TRANS. IMAGE PROCESSING DEC. 2010 5
TABLE I
AVERAGE OBJECTIVE (PSNR IN DB) IMPROVEMENT OVER EXISTING
STA...
SUBMITTED TO IEEE TRANS. IMAGE PROCESSING DEC. 2010 6
(a) (b) (c) (d)
Fig. 7. Examples of documents used in our experiment...
SUBMITTED TO IEEE TRANS. IMAGE PROCESSING DEC. 2010 7
0.2 0.3 0.4 0.5 0.6 0.7 0.8
25
30
35
40
45
Bitrate(bpp)
PSNR(dB) Com...
SUBMITTED TO IEEE TRANS. IMAGE PROCESSING DEC. 2010 8
0.2 0.4 0.6 0.8
26
28
30
32
34
36
38
40
42
Bitrate(bpp)
AveragePSNR(...
SUBMITTED TO IEEE TRANS. IMAGE PROCESSING DEC. 2010 9
(a) (b)
(c) (d)
Fig. 10. Subjective comparison among coders: (a) zoo...
Upcoming SlideShare
Loading in...5
×

Scanned document compression using block based hybrid video codec

762

Published on


Sybian Technologies is a leading IT services provider & custom software development company. We offer full cycle custom software development services, from product idea, offshore software development to outsourcing support & enhancement. Sybian employs a knowledgeable group of software developers coming from different backgrounds. We are able to balance product development efforts & project duration to your business needs.

Sybian Technologies invests extensively in R&D to invent new solutions for ever changing needs of your businesses, to make it future-proof, sustainable and consistent. We work in close collaboration with academic institutions and research labs across the world to design, implement and support latest IT based solutions that are futuristic, progressive and affordable. Our services continue to earn trust and loyalty from its clients through its commitment to the following parameters

Final Year Projects & Real Time live Projects

JAVA(All Domains)
DOTNET(All Domains)
ANDROID
EMBEDDED
VLSI
MATLAB


Project Support

Abstract, Diagrams, Review Details, Relevant Materials, Presentation,
Supporting Documents, Software E-Books,
Software Development Standards & Procedure
E-Book, Theory Classes, Lab Working Programs, Project Design & Implementation
24/7 lab session

Final Year Projects For BE,ME,B.Sc,M.Sc,B.Tech,BCA,MCA

PROJECT DOMAIN:
Cloud Computing
Networking
Network Security
PARALLEL AND DISTRIBUTED SYSTEM
Data Mining
Mobile Computing
Service Computing
Software Engineering
Image Processing
Bio Medical / Medical Imaging

Contact Details:
Sybian Technologies Pvt Ltd,
No,33/10 Meenakshi Sundaram Building,
Sivaji Street,
(Near T.nagar Bus Terminus)
T.Nagar,
Chennai-600 017
Ph:044 42070551

Mobile No:9790877889,9003254624,7708845605

Mail Id:sybianprojects@gmail.com,sunbeamvijay@yahoo.com


0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
762
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
5
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Transcript of "Scanned document compression using block based hybrid video codec"

  1. 1. SUBMITTED TO IEEE TRANS. IMAGE PROCESSING DEC. 2010 1 Scanned Document Compression Using a Block-based Hybrid Video Codec Alexandre Zaghetto, Member, IEEE, and Ricardo L. de Queiroz, Senior Member, IEEE Abstract—This paper proposes a hybrid pattern matching/transform-based compression method for scanned documents. The idea is to use regular video interframe prediction as a pattern matching algorithm that can be applied to document coding. We show that this interpretation may generate residual data that can be efficiently compressed by a transform-based encoder. The efficiency of this approach is demonstrated using H.264/AVC as a high quality single- and multi-page document compressor. The proposed method, called Advanced Document Coding (ADC), uses segments of the originally independent scanned pages of a document to create a video sequence, which is then encoded through regular H.264/AVC. The encoding performance is unrivaled. Results show that ADC outperforms AVC-I (H.264/AVC operating in pure intra mode) and JPEG2000 by up to 2.7 dB and 6.2 dB, respectively. Superior subjective quality is also achieved. Index Terms—Scanned document compression, advanced doc- ument coding, pattern matching, H.264/AVC. I. INTRODUCTION COMPRESSION of scanned documents can be tricky. The scanned document is either compressed as a continuous- tone picture, or it is binarized before compression. The binary document can then be compressed using any available two- level lossless compression algorithm (such as JBIG [1] and JBIG2 [2]), or it may undergo character recognition [3]. Binarization may cause strong degradation to object contours and textures, such that, whenever possible, continuous-tone compression is preferred [4]. In single/multi-page document compression, each page may be individually encoded by some continuous-tone image compression algorithm, such as JPEG [5] or JPEG2000 [6], [7]. Multi-layer approaches such as the mixed raster content (MRC) imaging model [8]–[12] are also challenged by soft edges in scanned documents, often requiring pre- and post-processing [13]. Natural text along a document typically presents repetitive symbols such that dictionary-based compression methods be- come very efficient. For continuous-tone imagery, the recur- rence of similar patterns is illustrated in Fig. 1. Nevertheless, an efficient dictionary-based encoder relying on continuous- tone pattern matching is not that trivial. We propose an encoder that explores such a recurrence through the use of pattern- matching predictors and efficient transform encoding of the residual data. Copyright (c) 2013 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org. The authors are with the Department of Computer Science, Universidade de Brasilia, Brazil, e-mail: alexandre@cic.unb.br, queiroz@ieee.org. Fig. 1. Digitized books usually present recurrent patterns across different pages and across regions of the same page. It is important to place our proposal within the proper scenario. Three premises are assumed. Firstly, we want to avoid complex multi-coder schemes such as MRC. Secondly, the decoder should be as standard as possible. Since we are dealing with scanned compound documents (mixed pictures and text), natural image encoders, such as JPEG2000, are the most adequate. Non-standard encoders, based on fractals [14]– [16], texture prediction [17], [18], template matching [19] or multiscale pattern recurrence [20], [21], are good options out of the scope of what is being proposed. Thirdly, one should provide high quality reconstructed versions of scanned documents. This is especially important if rare books of historical value must be digitally stored, thus discarding optical character recognition (OCR) and token-based methods [2], [10]. In summary, we want a standard single coder approach that operates on natural images and delivers high-quality reconstructed compound documents. The proposed coder makes heavy use of the H.264/AVC standard video coder [22]. H.264/AVC has been well explained in the literature [23]–[27]. H.264/AVC leads to substantial performance improvement when compared to other existing standards [25], [28], such as MPEG-2 [29] and H.263 [30]. Among such improvements we can mention [22], [31]: in- terframe variable block size prediction; arbitrary reference frames; quarter-pel motion estimation; intraframe macroblock prediction; context-adaptive binary arithmetic coding; and in- loop deblocking filter. Results point to at least a factor of two improvement over previous standards. The many cod- ing advances brought into H.264/AVC not only set a new benchmark for video compression, but they also make it a This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TIP.2013.2251641 Copyright (c) 2013 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing pubs-permissions@ieee.org.
  2. 2. SUBMITTED TO IEEE TRANS. IMAGE PROCESSING DEC. 2010 2 formidable compressor for still images [32]. The intraframe macroblock prediction, combined with the context-adaptive binary arithmetic coding (CABAC) [33] turns the H.264/AVC into a powerful still image compression method (i.e. working on a video sequence composed of one single frame). We refer to this coder as AVC-I. Gains of the AVC-I over JPEG2000 are typically in the order of 0.25 dB to 0.5 dB in PSNR for pictorial images [31], [32], [34]. For compound images (mixture of text and picture) [4], the PSNR gains are more substantial, even surpassing the mark of 3 dB improvement over JPEG2000, in some cases [31]. The hypothesis presented is this paper is that a scanned document encoder that employs state-of-the-art video coding techniques and generates an H.264/AVC-decodable bit-stream yields the best rate-distortion performance compared to other continuous-tone still image compressors. 1 . II. THE PROPOSED METHOD AND ITS IMPLEMENTATION USING AVC The proposed document coder has a generic concept and an implementation based on a stock H.264/AVC video coder. We now describe the desired features and how one can implement them using AVC. The generic description may help the reader to adapt other video coders for that purpose or to develop non-standard-based (proprietary) variations. A. Block-based pattern matching The encoder is based on pattern matching. The document image is segmented into blocks of pixels. Each block is matched to an existing pattern in a dictionary which is popu- lated by the previous contents of the same document. In order to do that, we partition the scanned document, which may be made of one or more scanned pages of H ×W pixels, into Np (H/ Np × W/ Np pixels) sub-pages or frames. Hence, a scanned book may be decomposed into many frames. Figure 2 illustrates the page pre-processing (partition) algorithm, while Fig. 3 shows an example of a frame sequence built from a 3-pages set. Blocks have for example 16×16 pixels and each one is matched to an existing pattern in a previous frame. In this way, the previous frames make a dynamic dictionary of patterns to look when encoding the present frame, which is continuously being updated as more frames are encoded. Once a match is found, the matching pattern is used to predict the block and the prediction error (residue) is encoded along with the frame number and position where the match was found (reference vector). A block can be partitioned into smaller blocks to ease prediction at the cost of spending more bits to encode reference vectors. Figure 4 illustrates the effect of using the pattern matching prediction algorithm. Figures 4 (a) and (b) show examples of a reference and a current text area, respectively. Figures 4 (c), (e) and (g) represent the predictions of the current text using 16×16, 8×8 and 4×4-pixel block partitions. Figures 4 (d), (f) and (h) are the corresponding residual data. 1Preliminary results of the proposed method over multi-page text-only documents have been presented at a conference [35]. 0 1 2 3 4 5 6 7 8 9 10 11 Fig. 2. A document page is partitioned into segments (labeled in sequence). Each one is considered a frame and can be sequentially encoded. Fig. 3. Example of a frame sequence, built from a 3 pages set, Np = 4 frames/page. Frames 1 to 4, 5 to 8, and 9 to 12 are built from pages 1, 2 and 3, respectively. Notice that the 4×4-pixel prediction generates a lower-energy residual, when compared to the 16 × 16 and 8 × 8 prediction, however, they require encoding more reference vectors. In this context, video coders often use motion estimation techniques which are essentially the same as pattern matching. The H.264/AVC is capable of partitioning macroblocks of 16× 16 pixels into any valid combination of blocks of 16 × 8, 8 × 16, 8 × 8, 4 × 8, 8 × 4, and 4 × 4 pixels. Our algorithm is then to feed the document frames as video frames into AVC since its motion estimation algorithm will take care of the pattern matching search for us. However, motion estimation algorithms always take advantage of the fact that video content at the same frame position in neighbor frames are typically very correlated. Since this is not our case, it is advisable to make the search window to cover as much as possible of the reference frames, or the whole frame, in order to enrich the dictionary and to remove the spatial dependency. This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TIP.2013.2251641 Copyright (c) 2013 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing pubs-permissions@ieee.org.
  3. 3. SUBMITTED TO IEEE TRANS. IMAGE PROCESSING DEC. 2010 3 (a) (b) (c) (d) (e) (f) (g) 2 (h) Fig. 4. Illustration of approximate pattern matching using interframe prediction: (a) reference text; (b) current text; (c), (e) and (g) predicted text (block size: 16×16, 8×8 and 4×4 pixels, respectively); and (d), (f) and (h) prediction residue (block size: 16 × 16, 8 × 8 and 4 × 4 pixels, respectively). Each zoomed image patch has 178× 178 pixels. B. Inter- and intra-frame prediction AVC also allows for intra-frame prediction, in which a block (partitioned or not) can be predicted from neighboring blocks by means of directional extrapolation of the border pixels. The decision to use or not intra-frame prediction is typically based on rate-distortion optimization (RDO) and we use RDO in all our simulations. However, AVC does not allow for in-frame motion vectors (IFMV), but many variations using such a feature and other sophisticated methods of intra-frame prediction do exist [19]. Apparently, HEVC will also support IFMV [36]. Breaking up the pages into frames allows for some intra-document prediction similar to IFMV, yet using a stock video coder. Furthermore, the information derived from IFMV is typically very small compared to all compressed data, such that the advantage should not be much relevant. Because of that, we do not use IFMV. Another issue is the random access to different book pages. In order to get to a book page, we are forced to decompress all the frames it uses as reference. So, if random access is an issue, we suggest to periodically use no-reference frames, i.e. frames in which inter-frame prediction is not allowed, relying on pure intra-frame prediction/extrapolation. In our encoder, using the AVC structure, each block- partitioning combination and prediction mode is tested and the best one is picked through RDO. With RDO within AVC, in the k-th configuration test in a macroblock, AVC computes the rate Rk (bits spent to encode the block) and distortion Dk (sum of absolute differences - SAD) achieved by reconstructing the block. One picks the block partition method that minimizes Jk = Rk + λDk. The process is then repeated for every macroblock. As usual, λ controls compression ratios and is varied to find the RD curves in our simulations. C. Residual coding The residual macroblock, i.e. the prediction error, is trans- formed using 4 8×8-pixel discrete cosine transform (DCT) or an integer approximation of it. The transformed blocks are quantized and encoded using arithmetic coding. H.264/AVC uses an integer transform with similar properties as the DCT and the resulting transformed coefficients are quantized and entropy encoded using CABAC. D. Compound documents and region classification Compound document compression usually segments the image into regions and classifies each one as containing text and graphics or images (or halftones, for instance). Once a region is classified, it can be encoded using a proper algorithm. This approach is driven by objects such as text characters so that regions of the image are labeled based on our estimate of its contents. Our method, however, is driven by the compression itself. Rather than only testing pattern- matching-based prediction for every block partition, we also test prediction by extrapolating neighboring blocks, as in ”intra-prediction´´ in H.264/AVC. The RD-optimized selection of the best prediction assures that the best option is picked. Text and graphics shall contain recurrent patterns and will be often encoded using patterns from previous regions, while pictorial regions may resort to intra-prediction. In this sense, segmentation is embedded into the encoding process. In fact, the block prediction and RDO may have the same effect of This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TIP.2013.2251641 Copyright (c) 2013 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing pubs-permissions@ieee.org.
  4. 4. SUBMITTED TO IEEE TRANS. IMAGE PROCESSING DEC. 2010 4 Fig. 5. Configuration parameters that have greater influence on the encoder performance: Rf (number of reference frames) and Sr (search range). a segmentation map, even though benefiting the compression process, and not the true identification of image contents. E. Encoder and decoder summary In our concept, the frames are fed into AVC in sequence just like in a regular video coder. Because of that relation to AVC, we refer to the proposed method as advanced document coding, or, simply, ADC. In a nutshell, ADC operation can be summarized as (i) break the book pages into frames; (ii) feed all frames to H.264/AVC resulting in an AVC-compatible stream; (iii) decode the bit stream; and (iv) assemble the decoded frames into the final document book pages. In order to work well, H.246/AVC should operate in ”High” profile, following an IPPP... framework. The encoder should periodically use no-reference I-frames in the case random access is desired. RDO should be turned on. Motion estimation should be set to full search over a window that is as large as possible. Note that other video coders such as HEVC and MPEG-2 will also work, even though achieving different performance levels due to their different sophistication levels. III. EXPERIMENTAL RESULTS In our tests, different page sets are compressed using JPEG2000, AVC-I (H.264/AVC operating in pure intra mode) and the proposed ADC. The reason we chose JPEG2000 and AVC-I for comparison is that these are the most suitable standards that would meet the three premises presented in Section I. Distortion metrics based on visual models such as Structural Similarity (SSIM) [37] and Video Quality Metric (VQM) [38] have been extensively tested for pictorial content. However, they are unproven for text and graphics, which rely more on resolution than on number of gray levels. Readability is very important and some alternative metrics such as OCR efficiency are considered. A good objective metric to reflect subjective perception of text has not been well explored yet. Hence, we opted to stick to the traditional PSNR as a distortion metric. In JPEG2000 and AVC-I compression, the pages are sepa- rately encoded. As for ADC, the first frame of the sequence is encoded as an I-frame (only intraframe prediction modes are used) and all the remaining frames are encoded as P- frames (in addition to intraframe prediction, only past frames 0.2 0.3 0.4 0.5 0.6 0.7 0.8 20 25 30 35 40 45 Bitrate (bpp) PSNR(dB) Sequence "guita" (Np = 4, Sr = 32 and Rf = 1, 3, 5) ADC (S r = 32, R f = 5) ADC (S r = 32, R f = 3) ADC (Sr = 32, Rf = 1) AVC−I JPEG2000 (a) 0.2 0.3 0.4 0.5 0.6 0.7 0.8 20 25 30 35 40 45 Bitrate (bpp) PSNR(dB) Sequence "guita" (Np = 4, Sr = 08, 16, 32 and Rf = 5) ADC (S r = 32, R f = 5) ADC (S r = 16, R f = 5) ADC (Sr = 08, Rf = 5) AVC−I JPEG2000 (b) Fig. 6. Comparison between JPEG2000, AVC-I and the proposed ADC, for different combinations of search ranges (Sr) and number of reference frames (Rf ) for test sequence “guita”. The number of frames/page (Np) is 4. are used as reference by the interframe prediction). We also considered that each page may be segmented into Np = 4 frames, Np = 16 frames, or not segmented at all (Np = 1, for multi-page documents only). Two configuration parameters have greater influence on the encoder performance. One is the number of reference frames, Rf , the other is the search range, Sr, as illustrated in Fig. 5. Initially, we evaluated the effect of choosing different values for Sr and Rf . Figure 6 shows PSNR plots comparing JPEG2000, AVC-I and ADC (Np = 4 frames/page), for different combinations of Sr and Rf . The PSNR was calculated using the global mean square error (MSE). The higher the Sr and Rf values, the better the rate-distortion performance. In particular, for Sr = 32 pixels and Rf = 5 frames, ADC outperforms AVC-I by more than 2 dB and JPEG2000 by more than 5 dB, at 0.5 bit/pixel (bpp). Our test set is composed by 18 documents divided into the This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TIP.2013.2251641 Copyright (c) 2013 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing pubs-permissions@ieee.org.
  5. 5. SUBMITTED TO IEEE TRANS. IMAGE PROCESSING DEC. 2010 5 TABLE I AVERAGE OBJECTIVE (PSNR IN DB) IMPROVEMENT OVER EXISTING STANDARDS FOR THE 4 DOCUMENT TEST SETS. Document Set 0 1 2 3 JPEG2000 6.26 5.61 4.47 2.41 AVC-I 2.75 2.58 1.56 0.89 following 4 classes2 : • Class 0: 6 multi-page text-only documents; • Class 1: 6 single-page text-only documents; • Class 2: 3 multi-page compound documents; and • Class 3: 3 single-page compound documents. Figure 7 illustrates one example of each class. Examples of PSNR plots are shown in Figs. 8. Figure 9 shows average PSNR improvement of ADC over JPEG2000 and AVC-I for each of the document class test sets. In all cases, JPEG2000 and AVC-I are objectively outperformed, considering Sr = 32, Rf = 5 and Np = 16. Figure 10 (a) shows a zoomed part of the original “cerrado” sequence. Its encoded and reconstructed versions using AVC-I, JPEG2000 and ADC, at approximately 0.25 bits/pixel, are shown in Figs. 10 (b), (c) and (d), respectively. ADC also yields superior subjective quality. As a reference, in Table I we present the gains of the proposed method over AVC-I and JPEG2000 using Bjontegaard’s method [39] applied to the curves in Fig. 9. As one can see from the table and from the graphics, the gains are very substantial. Our software-based tests using the popular and efficient x246 implementation of the H.264/AVC standard and the Kakadu implementation of JPEG 2000, indicate that ADC (AVC running with 5 reference frames, slowest search, 32×32 window, in IPPPP... mode) is near 10× slower than JPEG 2000. x264 in an Intel Core i7 platform has been shown to encode RGB video at a rate of 3 to 30 million pixels per second (Mps). The variation is due to the various x264 settings that affect speed and quality. Of course, in order to do that, the system was be dedicated to the task, as an appliance. A scanned letter-sized (8.5× 11 in) page at 600 pixels per inch (ppi) yields about 33 million pixels. Hence, we can expect a page compression speed roughly in the order of 5 to 50 pages per minute (ppm). This page rate may be acceptable for many on-the-fly applications and is definitely reasonable for off-line compression of books and such. A rigorous complexity study of the encoding algorithms presented here is beyond the scope of this paper. IV. CONCLUSIONS In this paper, we presented a pattern matching/transform- based encoder for scanned documents named ADC. The reason why we decided to use H.264/AVC tools to implement the proposed method is because its interframe prediction scheme allied with RDO yield an efficient pattern matching algorithm. 2The entire test set is available at http://image.unb.br/queiroz/testset In addition, the intraframe prediction, the DCT-based trans- form and the CABAC contribute to improve the encoding efficiency. In essence, our work can be summarized as splitting the document into many pages, forming frames, and feeding the frames to AVC. Despite the simplicity of the idea, the perfor- mance for scanned documents is unrivaled, to our knowledge. Results show that ADC objectively outperforms AVC-I and JPEG2000 by up to 2.7 dB and 6.2 dB, respectively, with more significant gains observed for multi-page text-only documents. Furthermore, the encoder outputs documents with superior subjective quality. Replacing H.264/AVC by HEVC in ADC would yield even larger gains. REFERENCES [1] JBIG, “Information Technology - Coded Representation of Picture and Audio Information - Progressive Bi-level Image Compression. ITU-T Recommendation T.82,” Mar. 1993. [2] JBIG2, “Information Technology - Coded Representation of Picture and Audio Information - Lossy/Lossless Coding of Bi-level Images. ITU-T Recommendation T.88,” Mar. 2000. [3] S. Mori, C.Y. Suen, and K.; Yamamoto, “Historical review of OCR research and development,” Proceedings of the IEEE, vol. 80, no. 7, pp. 1029–1058, Jul. 1992. [4] R. L. de Queiroz, Compressing Compound Documents, in the Document and Image Compression Handbook, by M. Barni, Marcel-Dekker, EUA, 2005. [5] W. B. Pennebaker and J. L. Mitchell, JPEG Still Image Data Compres- sion Standard, Chapman and Hall, 1993. [6] JPEG, “Information Technology - JPEG2000 Image Coding System - Part 1: Core Coding System. ISO/IEC 15444-1,” 2000. [7] D. S. Taubman and M. W. Marcellin, JPEG 2000: Image Compression Fundamentals, Standards and Practice, Kluwer Academic, EUA, 2002. [8] MRC, “Mixed Raster Content (MRC). ITU-T Recommendation T.44,” 1999. [9] R. L. de Queiroz, R. Buckley, and M. Xu, “Mixed Raster Content (MRC) model for compound image compression,” Proc. of SPIE Visual Communications and Image Processing, vol. 3653, pp. 1106–1117, Jan. 1999. [10] P. Haffner, P. G. Howard, P. Simard, Y. Bengio, and Y. Lecun, “High quality document image compression with DjVu,” Journal of Electronic Imaging, vol. 7, pp. 410–425, 1998. [11] G. Feng and C. A. Bouman, “High-quality MRC document coding,” IEEE Trans. on Image Processing, vol. 15, no. 10, pp. 3152–3169, Oct. 2006. [12] A. Zaghetto, R. L de Queiroz, and D. Mukherjee, “MRC compression of compound documents using threshold segmentation, iterative data- filling and H.264/AVC-INTRA,” Proc. Indian Conference on Computer Vision, Graphics and Image Processing, Dec. 2008. [13] A. Zaghetto and R. L de Queiroz, “Improved layer processing for MRC compression of scanned documents,” Proc. of IEEE Intl. Conference on Image Processing, pp. 1993 – 1996, Nov. 2009. [14] E. Walach and E. Karnin, “A fractal-based approach to image com- pression,” in Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing, vol. 11, pp. 529 – 532, Apr. 1986. [15] S.M. Kocsis, “Fractal-based image compression,” in Proc. 23rd. Asilomar Conf. on Signals, Systems and Computers, vol. 1, pp. 177 –181, 1989. [16] A. Wakatani, “Improvement of adaptive fractal image coding on GPUs,” in Proc. IEEE Intl. Conf. on Consumer Electronics, pp. 255 –256, Jan. 2012. [17] D.C. Garcia and R.L. de Queiroz, “Least-squares directional intra prediction in h.264/avc,” IEEE Signal Processing Letters, vol. 17, no. 10, pp. 831–834, Oct. 2010. [18] Y. Liu L. Liu and E. Delp, “Enhanced intra prediction using contex- tadaptive linear prediction,” in Proc. of PCS 2007 - Picture Coding Symp., Nov. 2007. [19] C. Lan, J. Xu, F. Wu, and G. Shi, “Intra frame coding with template matching prediction and adaptive transform,” in Proc. IEEE Intl. Conf. Image Processing,, pp. 1221 –1224, Sep., 2010. This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TIP.2013.2251641 Copyright (c) 2013 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing pubs-permissions@ieee.org.
  6. 6. SUBMITTED TO IEEE TRANS. IMAGE PROCESSING DEC. 2010 6 (a) (b) (c) (d) Fig. 7. Examples of documents used in our experiments: (a) class 0: multi-page text-only documents; (b) class 1: single-page text-only documents; (c) class 2: multi-page compound documents; and (d) class 3: single-page compound documents. [20] E. B. Lima Filho, E. A. B. da Silva, M. B. de Carvalho, and F. S. Pinag´e, “Universal image compression using multiscale recurrent patterns with adaptive probability model,” IEEE Trans. on Image Processing, vol. 17, no. 4, pp. 512–527, Apr. 2008. [21] N. Francisco, N. Rodrigues, E. da Silva, M. de Carvalho, S. de Faria, and V. da Silva, “Scanned compound document encoding using multiscale recurrent patterns,” IEEE Trans. on Image Processing, vol. 19, no. 10, pp. 2712–2724, Apr. 2010. [22] JVT, “Advanced Video Coding for Generic Audiovisual Services. ITU-T Recommendation H.264,” Nov. 2007. [23] I. E. G. Richardson, H.264 and MPEG-4 video compression, Wiley, EUA, 2003. [24] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560–576, July 2003. [25] T. Wiegand, H. Schwarz, A. Joch, F. Kossentini, and G. J. Sullivan, “Rate-constrained coder control and comparison of video coding stan- dards,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 688–703, July 2003. [26] J. Ostermann, J. Bormans, P. List, D. Marpe, M. Narroschke, F. Pereira, T. Stockhammer, and T. Wedi, “Video Coding with H.264/AVC: Tools, Performance, and Complexity,” IEEE Circuits and Systems Magazine, vol. 4, no. 1, pp. 7–28, Mar. 2004. [27] G. J. Sullivan, P. Topiwala, and A. Luthra, “The H.264/AVC Advanced Video Coding Standard: Overview and Introduction to the Fidelity Range Extensions,” Proc. of SPIE Conference on Applications of Digital Image Processing XXVII, Special Session on Advances in the New Emerging Standard: H.264/AVC, vol. 5558, pp. 53–74, Aug. 2004. [28] N. Kamaci and Y. Altunbasak, “Performance comparison of the emerging H.264 video coding standard with the existing standards,” Proc. Intl. Conf. on Multimedia and Expo, vol. 1, pp. 345–348, July 2003. [29] B. G. Haskell, A. Puri, and A. N. Netravalli, Digital Video: An Introduction to MPEG-2, Chapman and Hall, EUA, 1997. [30] ITU-T, “Video Coding for Low Bit Rate Communication. ITU-T Recommendation H.263,” Version 1: Nov. 1995, Version 2: Jan. 1998, Version 3: Nov. 2000. [31] R. L. de Queiroz, R. S. Ortis, A. Zaghetto, and T. A. Fonseca, “Fringe benefits of the H.264/AVC,” Proc. of International Telecommunications Symposium, pp. 208–212, Sep. 2006. [32] D. Marpe, V. George, and T. Wiegand, “Performance comparison of intra-only H.264/AVC and JPEG2000 for a set of monochrome ISO/IEC test images,” Contribution JVT ISO/IEC MPEG and ITU-T VCEG, Doc. JVT M-014, Oct. 2004. [33] D. Marpe, H. Schwarz, and T. Wiegand, “Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 620 – 636, July 2003. [34] A. Al, B. P. Rao, S. S. Kudva, S. Babu, D. Sumam, and A. V. Rao, “Quality and complexity comparison of H.264 intra mode with JPEG2000 and JPEG,” Proc. of IEEE International Conference on Image Processing, vol. 1, pp. 24–27, Oct. 2004. [35] A. Zaghetto and R. L de Queiroz, “High-quality scanned book compres- sion using pattern matching,” Proc. of IEEE International Conference on Image Processing, pp. 26 – 29, Sep. 2010. [36] K. Ugur et. al. , “High performance, low complexity video coding and the emerging HEVC standard,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 20, no. 12, pp. 1688–1697, Dec. 2010. [37] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Transactios on Image Processing, vol. 13, no. 4, pp. 600–612, Apr. 2004. [38] M.H. Pinson, and S. Wolf, “A new standardized method for objectively measuring video quality,” IEEE Transactions on Broadcasting, vol.50, no.3, pp. 312- 322, Sept. 2004. [39] G. Bjontegaard, “Calculation of average PSNR differences between RD-curves,” presented at the 13th VCEG-M33 Meeting, Austin, TX, Apr. 2001. Alexandre Zaghetto received the Engineer degree in 2002, from the Federal University of Rio de Janeiro, Rio de Janeiro, Brazil, the M.Sc. degree in 2004, from the University of Brasilia, Brasilia, Brazil, and and the Ph.D. degree in 2009 also from the University of Brasilia, all in Electrical Engi- neering. In 2009, he became Associate Professor at the Computer Science Department at University of Brasilia. His main research interests are in image and video processing, compound document coding, bio- metrics, fuzzy logic and artificial neural networks. This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TIP.2013.2251641 Copyright (c) 2013 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing pubs-permissions@ieee.org.
  7. 7. SUBMITTED TO IEEE TRANS. IMAGE PROCESSING DEC. 2010 7 0.2 0.3 0.4 0.5 0.6 0.7 0.8 25 30 35 40 45 Bitrate(bpp) PSNR(dB) Comparison between coders: "guita" ADC − 16 frames/page ADC − 4 frames/page ADC − 1 frame/page AVC−I JPEG2000 (a) 0.2 0.4 0.6 0.8 25 30 35 40 Bitrate(bpp) PSNR(dB) Comparison between coders: page 2 of "guita" ADC − 16 frames/page ADC − 4 frames/page AVC−I JPEG2000 (b) 0.3 0.4 0.5 0.6 0.7 0.8 30 35 40 Bitrate(bpp) PSNR(dB) Comparison between coders: "paper" ADC − 16 frames/page ADC − 4 frames/page AVC−I JPEG2000 (c) 0.4 0.6 0.8 1 28 30 32 34 36 38 40 42 Bitrate(bpp) PSNR(dB) Comparison between coders: "carta" ADC − 16 frames/page ADC − 4 frames/page AVC−I JPEG2000 (d) Fig. 8. Examples of PSNR plots for: (a) class 0 (multi-page, text-only), document “guita” (number of pages: 2, size: 1568 × 1024 pixels); (b) class 1 (single-page, text-only), page 2 of document “guita”(1568 × 1024 pixels); (c) class 2 (multi-page, compound), document “paper” (number of pages: 4, size: 2304 × 1632 pixels); and (d) class 3 (single-page, compound), document “carta” (2152 × 1632 pixels). Search range and number of reference frames are Sr = 32 and Rf = 5, respectively. Ricardo L. de Queiroz received the Engineer de- gree from Universidade de Brasilia , Brazil, in 1987, the M.Sc. degree from Universidade Estadual de Campinas, Brazil, in 1990, and the Ph.D. degree from The University of Texas at Arlington , in 1994, all in Electrical Engineering. In 1990-1991, he was with the DSP research group at Universidade de Brasilia, as a research associate. He joined Xerox Corp. in 1994, where he was a member of the research staff until 2002. In 2000-2001 he was also an Adjunct Faculty at the Rochester Institute of Technology. He joined the Electrical Engineering Department at Universidade de Brasilia in 2003. In 2010, he became a Full Professor at the Computer Science Department at Universidade de Brasilia. Dr. de Queiroz has published over 150 articles in Journals and conferences and contributed chapters to books as well. He also holds 46 issued patents. He is an elected member of the IEEE Signal Processing Society’s Multimedia Signal Processing (MMSP) Technical Committee and a former member of the Image, Video and Multidimensional Signal Processing (IVMSP) Technical Committee. He is a past editor for the EURASIP Journal on Image and Video Processing, IEEE Signal Processing Letters, IEEE Transactions on Image Processing, and IEEE Transactions on Circuits and Systems for Video Technology. He has been appointed an IEEE Signal Processing Society Distinguished Lecturer for the 2011-2012 term. Dr. de Queiroz has been actively involved with the Rochester chapter of the IEEE Signal Processing Society, where he served as Chair and organized the Western New York Image Processing Workshop since its inception until 2001. He is now helping organizing IEEE SPS Chapters in Brazil and just founded the Brasilia IEEE SPS Chapter. He was the General Chair of ISCAS’2011, and MMSP’2009, and is the General Chair of SBrT’2012. He was also part of the organizing committee of ICIP’2002. His research interests include image and video compression, multirate signal processing, and color imaging. Dr. de Queiroz is a Senior Member of IEEE, a member of the Brazilian Telecommunications Society and of the Brazilian Society of Television Engineers. This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TIP.2013.2251641 Copyright (c) 2013 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing pubs-permissions@ieee.org.
  8. 8. SUBMITTED TO IEEE TRANS. IMAGE PROCESSING DEC. 2010 8 0.2 0.4 0.6 0.8 26 28 30 32 34 36 38 40 42 Bitrate(bpp) AveragePSNR(dB) Average over class 0 ADC − 16 frames/page AVC−I JPEG2000 (a) 0.4 0.6 0.8 1 26 28 30 32 34 36 38 40 42 Bitrate(bpp) AveragePSNR(dB) Average over class 1 ADC − 16 frames/page AVC−I JPEG2000 (b) 0.4 0.6 0.8 1 1.2 26 28 30 32 34 36 38 40 Bitrate(bpp) AveragePSNR(dB) Average over class 2 ADC − 16 frames/page AVC−I JPEG2000 (c) 0.4 0.6 0.8 1 1.2 1.4 1.6 28 30 32 34 36 38 40 Bitrate(bpp) AveragePSNR(dB) Average over class 3 ADC − 16 frames/page AVC−I JPEG2000 (c) Fig. 9. Comparison of ADC against JPEG2000 and AVC-I in terms of PSNR averaged for documents in: (a) class 0; (b) class 1; (c) class 2; and (d) class 3. This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TIP.2013.2251641 Copyright (c) 2013 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing pubs-permissions@ieee.org.
  9. 9. SUBMITTED TO IEEE TRANS. IMAGE PROCESSING DEC. 2010 9 (a) (b) (c) (d) Fig. 10. Subjective comparison among coders: (a) zoomed part of “cerrado” sequence; reconstructed versions using (b) AVC-I, (c) JPEG2000 and (d) ADC, at approximately 0.25 bits/pixels. ADC yields superior subjective quality. This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TIP.2013.2251641 Copyright (c) 2013 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing pubs-permissions@ieee.org.

×