
The PAQ Data Compression Programs

Matt Mahoney

PAQ is a series of open source data compression archivers that have evolved through collaborative development to top rankings on several benchmarks measuring compression ratio (although at the expense of speed and memory usage). This page traces their development. All versions may be downloaded here (GPL source, Windows and Linux executables). Latest well supported versions.

Contents

Large Text Compression Benchmark
Benchmarks on Calgary corpus
  PAQ benchmarks (solid archive)
  WRT dictionary benchmarks
  Calgary Corpus Challenge
Contributors (each listed oldest to newest)
  Matt Mahoney, Serge Osnach
    Neural Network Compression (includes AAAI paper)
    PAQ1 (includes an unpublished paper)
    PAQ6 (and technical report)
    PAQ7 archiver
    PAQ8A, PAQ8F, PAQ8L, PAQ8M, PAQ8N
  Berto Destasio
  Johan de Bock
  David A Scott
  Fabio Buffoni
  Jason Schmidt
  Alexander Ratushnyak (PAQAR, PAQ8H, PAQ8HP1-12)
  Przemyslaw Skibinski (WRT, PAsQDa, PAQ8B,C,D,E,G)
  Rudi Cilibrasi (raq8g)
  Pavel Holoborodko (PAQ8I)
  Bill Pettis (PAQ8JD, PAQ8K)
  Serge Osnach (PAQ8JB)
  Jan Ondrus (PAQ8FTHIS2)

How it works

The most recent paper describes PAQ6 and its derivatives PAsQDa and PAQAR as of 2005. The compressors use context mixing: a large number of models estimate the probability that the next bit of data will be a 0 or 1. These predictions are combined and arithmetic coded (similar to PPM). In PAQ6, predictions are combined by weighted averaging, then adjusting the weights to favor the most accurate models.

M. Mahoney, Adaptive Weighing of Context Models for Lossless Data Compression, Florida Tech. Technical Report CS-2005-16, 2005.

PAQ7 and later differ mainly in that model predictions are combined using a neural network rather than by weighted averaging. This is described in more detail in the paq8f.cpp comments. See also the Wikipedia article on PAQ.
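The two ways of combining predictions described above can be illustrated with a small sketch. This is not code from any PAQ version: the class names, learning rates, and the floating-point arithmetic are assumptions chosen for readability (the real programs mix bit counts or use fixed-point tables). `AveragingMixer` takes a weighted average of probabilities and shifts weight toward models that predicted the observed bit; `LogisticMixer` is a single-layer network that mixes in the logistic domain, in the spirit of PAQ7 and later. Input probabilities are assumed to lie strictly between 0 and 1.

```python
import math

def stretch(p):
    """Logit: ln(p / (1 - p)). Assumes 0 < p < 1."""
    return math.log(p / (1.0 - p))

def squash(x):
    """Logistic function, the inverse of stretch."""
    return 1.0 / (1.0 + math.exp(-x))

class AveragingMixer:
    """Weighted average of model probabilities with adaptive weights (illustrative)."""
    def __init__(self, n, rate=0.02):
        self.w = [1.0] * n
        self.rate = rate
        self.probs = [0.5] * n

    def mix(self, probs):
        self.probs = list(probs)
        return sum(w * p for w, p in zip(self.w, self.probs)) / sum(self.w)

    def update(self, bit):
        # Raise the weight of models that gave the observed bit more than
        # probability 0.5, lower the others; keep every weight positive.
        for i, p in enumerate(self.probs):
            p_actual = p if bit else 1.0 - p
            self.w[i] = max(1e-3, self.w[i] + self.rate * (p_actual - 0.5))

class LogisticMixer:
    """Single-layer network: p = squash(sum_i w_i * stretch(p_i)) (illustrative)."""
    def __init__(self, n, rate=0.05):
        self.w = [1.0 / n] * n
        self.rate = rate
        self.x = [0.0] * n
        self.p = 0.5

    def mix(self, probs):
        self.x = [stretch(p) for p in probs]
        self.p = squash(sum(w * x for w, x in zip(self.w, self.x)))
        return self.p

    def update(self, bit):
        # err is the negative gradient of the coding cost -ln P(bit)
        # with respect to the weighted sum fed into squash().
        err = bit - self.p
        for i, x in enumerate(self.x):
            self.w[i] += self.rate * err * x

# Two hypothetical models: one confident and right, one uninformative.
predictions = [0.9, 0.5]
for mixer in (AveragingMixer(2), LogisticMixer(2)):
    before = mixer.mix(predictions)
    for _ in range(100):
        mixer.mix(predictions)
        mixer.update(1)          # the actual bit is always 1 in this toy stream
    after = mixer.mix(predictions)
    print(type(mixer).__name__, round(before, 3), "->", round(after, 3))
```

In both cases the mixed probability of a 1 rises as the better model gains weight. The logistic mixer can move past the best single input, which a plain average cannot do.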
Benchmarks The Calgary corpus benchmarks have not been maintained since about 2005 except for PAQ versions. Timing tests were done on a now dead computer. Recent benchmarks. Calgary Corpus http://cs.fit.edu/~mmahoney/compression/paq.html[10/4/2013 8:51:10 AM]
Test results are shown on the Calgary corpus (14 individual files or concatenated into a single file of 3,141,622 bytes) on a 750 MHz Duron under Windows Me and 256 MB memory. All options set for maximum compression (generally slower) within 64 MB memory (which limits compression on many of the better programs) unless indicated otherwise. Programs are ordered by increasing compression on the concatenated corpus. For sources to many programs, see ftp://ftp.elf.stuba.sk/pub/pc/pack/.

Program and options               D   14 files   Seconds  Concatenated
--------------------------------  -  ---------  --------  ------------
compress                             1,272,772       1.5     1,318,269
pkzip 2.04e                          1,032,290       1.5     1,033,217
gzip 1.2.4 -9                        1,017,624         2     1,021,863
bzip2 1.0.0 -9                         828,347         5       859,448
winhki v1.3e free (hki1 max)           830,315         6       852,745
7zip 3.11 a -mx=9                      822,059        20       821,872
sbc 0.910 c -m3                        740,161       4.1       819,016
GRZipII 0.2.2 e                        768,609       4.5       794,045
GRZipII 0.2.4 e                        773,008       3.9       793,866
sbc 0.970r2 -b8 -m3                    738,253       5.5       784,749
ppms e                                 765,587         4       774,072
acb u                                  766,322       110       769,363
boa 0.58b -m15                         751,413        44       769,196
winhki 1.3e reg (hki2 max)             752,927        14       768,108
winrar 3.20 b3 best, solid             754,270         7       760,953
ppmd H e -m64 -o16                     744,057         5       759,674
rk 1.04 -mx3 -M64 -ts                  712,188        36       755,872
ppmd J e -o16 -m64                     756,763    5.5***       753,848
rk 1.02 b5 -mx3 -M64 -ts               707,160        44       750,744
ppmn 1.00b N1 e -O9 -M:50 -MT1         716,297        23       748,588
enc v0.15 a                            724,540       251       739,052
ppmonstr H e -m64 -o1                  719,922        13       736,899
rkc a -M80m                            685,226        87       710,125  (80 MB)
ppmonstr Ipre e -m64 -o128             696,647        35       703,320
epm r7 c -m64                          693,538        49       702,612
durilca v.03a e -m64              D    696,789        29       696,845
  (as in READ_ME.TXT)             D    647,028        35
rkc a -M80m -td+                  D    661,102        91       695,900  (80 MB)
ash cn-04 sse-9A9 /s64                 709,837       109       694,527  (387 MB)
epm r9 c                               668,115        54       693,636  (? MB)
slim 16 a -d16                         662,991       139       686,796  (? MB)
slim 17 a                              661,333       141       681,714  (? MB)
slim 18-19 a                           659,358       153       678,898  (? MB)
slim 20 a                              659,213       159       678,880  (? MB)
slim 21 a                              658,494       156       678,652  (? MB)
durilca v.0.2a e -t2(7) -m64      D    658,943        30       678,372
  (as in READ_ME.TXT and -m64)    D    652,599        32
durilca v.0.1 e -t2(7) -m64       D    659,670        31       677,989
  (as in READ_ME.TXT)             D    652,840        33
compressia 1.0 beta (180 MB)      D    650,398        66       674,830
  Block size 5 (60 MB),English    D    709,614         7       674,994
ppmonstr J e -o128                     673,744     46***       667,050  146 MB
WinRK 1.00b2 64M ppmz16 no dict        668,692       102       683,462
WinRK 1.00b2 64M ppmz16 dict      D    639,545       102       655,955
WinRK 2.0.1 PWCM, no dictionary        617,240      1275       619,205  192 MB
WinRK 2.0.1 PWCM, dictionary      D    593,348      1107       597,939  192 MB
WinRK 3.0.2b PWCM, dict.          D    586,148   1326***       591,342  700 MB
  no dictionary, 700 MB                603,916   1505***       608,915  700 MB
  no dict., 256 MB                     606,018   1301***       611,188  256 MB

Notes:

slim does not have options to limit memory usage. slim caused disk thrashing on my 256 MB PC, which was eliminated by using -d16, with no loss of compression.

rkc (with -td+ option), durilca, compressia, and WinRK use English dictionaries (marked with "D").

For programs that are not archivers (compress, gzip, epm, durilca, rkc, ash), the 14 file test size is the total size of 14 compressed files rather than the size of the archive (so grouping similar files in a tar file first might improve compression).

ash /m64 (64 MB memory) compresses poorly on the concatenated corpus (about 1.2 MB) so I posted the result for unlimited memory. I didn't try all the options to see which got the best compression.

Increasing WinRK 1.0 memory to 224 MB or PPM order from 16 to 32 does not improve compression.
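Compressed sizes like those in the table above are often restated as bits per character (bpc) or as a percentage of the original size, which makes results on different inputs easier to compare. The sketch below is only a convenience calculation: the function names are made up for this example, and the three sizes are copied from the "Concatenated" column above, measured against the 3,141,622 byte corpus.

```python
CORPUS_BYTES = 3_141_622  # size of the concatenated Calgary corpus, from the text above

def bits_per_character(compressed_bytes, original_bytes=CORPUS_BYTES):
    """Average number of output bits spent per input byte."""
    return 8.0 * compressed_bytes / original_bytes

def percent_of_original(compressed_bytes, original_bytes=CORPUS_BYTES):
    """Compressed size as a percentage of the uncompressed size."""
    return 100.0 * compressed_bytes / original_bytes

# Concatenated-corpus sizes taken from the benchmark table.
samples = [
    ("gzip 1.2.4 -9", 1_021_863),
    ("bzip2 1.0.0 -9", 859_448),
    ("slim 21 a", 678_652),
]
for name, size in samples:
    print(f"{name:16s} {bits_per_character(size):.3f} bpc  "
          f"{percent_of_original(size):.1f}% of original")
```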
    • The PAQ Data Compression Programs PAQ compressors found here The following are available below. Compressed size for the concatenated corpus is always about 150-200 bytes smaller (due solely to the archive header), and compression time is about the same. Decompression time below is about the same as compression time, although for some programs above (like gzip), decompression may be faster. Compressor Solid archive size --------------------------P5 992,902 P6 841,717 P12 831,341 P12a 831,341 PAQ1 716,704 PAQ2 702,382 PAQ3 696,616 PAQ3a PAQ3b PAQ3c PAQ3N 684,580 PAQ3Na PAQ3N_ic8_ml_ipo (fastest) PAQ3N_vc71 (smallest .exe) PAQ4 672,134 PAQ4a PAQ4b PAQ4v2a WRT11 + PAQ4v2a 649,201 PAQ5a 661,811 PAQ5b WRT11 + PAQ5a 638,635 PAQ5-EMILCONT-DEUTERIUM 661,604 PAQ6a -0 858,954 PAQ6a -1 780,031 PAQ6a -2 725,798 PAQ6a -3 709,806 PAQ6b -3 PAQ6 -3 PAQ6a -4 655,694 PAQ6a -5 648,951 PAQ6a -6 648,892 PAQ6b -6 PAQ6 -6 PAQ6b -7 647,767 PAQ6b -8 647,646 PAQ6v2ds -6 648,572 PAQ6fb -6 648,257 PAQ6fdj -6 647,923 PAQ6fdj -7 646,932 PAQ6fdj -8 646,943 PAQ6fdj2 -6 647,898 PAQ32 -6 647,898 PAQ601 -6 647,369 PAQ602 -6 646,931 PAQ604 -6 646,875 PAQ603 -6 644,978 PAQ605fb -6 642,178 -7 641,357 -8 640,978 PAQ605fbj -6 640,730 -7 639,924 -8 639,468 PAQ605fbj8 -5 640,629 -6 640,133 PAQ605fbj9 -5 640,768 -6 640,242 PAQ606fb -6 640,464 PAQ6-emilcont-febas -5 639,770 -6 639,371 -7 638,404 -8 638,046 PAQ6-emilcont-anny -5 638,740 -6 638,279 -7 637,289 -8 636,867 PAQ607fb -6 634,892 PAQ6-emilcont-anny-607fb -5 634,471 -6 633,943 Seconds ------31.8 38.4 39.2 36.6 68.1 93.1 76.7 70.0 70.6 69.6 156.2 147.2 142.0 162.0 222.4 186.0 166.5 183.2 139.0 366.3 298.3 261.3 494.6 51.8 65.6 76.1 97.4 79.2 73.5 354.1 625.2 635.8 549.2 516.7 592.6* 607.0* 505.1 428.3 444.7 455.8* 472.1* 430.0 428.5 445.9 430.6 435.0 419.9 400.2 412.0* 423.8* 623.2 644.6 670.5 750.7 716.3 423.3 625.8 626* 636* 648* 817.9 820* 833* 861* 556.4 805.8 http://cs.fit.edu/~mmahoney/compression/paq.html[10/4/2013 8:51:10 AM] Memory used 
----------256 KB 16 MB 16 MB 48 MB 48 MB 48 MB 80 MB 84 MB 186 MB 186 MB 168 MB 2 MB 3 MB 6 MB 18 MB 64 MB 154 MB 202 MB 404 MB 808 MB 202 MB 202 MB 202 MB 404 MB 808 MB 202 MB 202 MB 202 MB 202 MB 202 MB 202 MB 202 MB 404 MB 808 MB 252 MB 504 MB 1008 MB <256 MB >256 MB <256 MB >256 MB 202 MB <256 MB >256 MB >512 MB >1024 MB <256 MB >256 MB >512 MB >1024 MB 206 MB (g++ compile) <256 MB
    • The PAQ Data Compression Programs -7 -8 PAQ6-emilcont-blaster -5 -6 -7 -8 PAQ6-emilcont-destroyer -5 PAQ6-emilcont-annyhilator -5 PAQ6-emilcont-harlock -5 PAQ6ed-schmidtvara -5 PAQ6ed-schmidtvarb -5 PAQ6-emilcont-italia -4 PAQAR 1.0 -6 (get614) PAQAR 1.0 -6 (get614) PAQAR 1.0 -7 (get614) PAQAR 1.0 -8 (get614) PAQAR 1.1 -6 PAQAR 1.1 -7 PAQAR 1.1 -8 PAQAR 1.2 -6 PAQAR 1.2 -6 PAQAR 1.2 -7 PAQAR 1.3 -6 PAQAR 1.3 -7 PAQAR 2.0 -5 PAQAR 2.0 -6 PAQAR 2.0 -7 PAQAR 3.0 -5 PAQAR 3.0 -6 PAQAR 3.0 -7 PAQAR 4.0 -5 PAQAR 4.0 -6 PAQAR 4.0 -7 PAQAR 4.0 -8 emilcontv02 -4 (MARS build) (Intel 8 build) emilcontv02 -5 (Intel 8 build) emilcontv03 alpha -3 PAsQDa10 -5 PAsQDa20 -5 -6 -7 -8 PAsQDa21 -4 -5 -6 -7 -7e PAsQDa30 -5 -6 -7 PAsQDa40 -5 -6 -7 -7e PAsQDa39 -5 -6 -7 -7e -8 -8e PAsQDA41 -5 -6 -7 -7e PAsQDaCC41 -5 -6 -7 -7e PAsQDa 4.2 -5 PAsQDacc 4.2 -5 PAsQDa 4.3 -5 PAsQDacc 4.3 -5 PAsQDa 4.3c -5 -6 -7 -7e -8 PAsQDacc 4.3c -5 -6 -7 D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D 633,133 632,865 633,551 891.5 633,084 632,242 631,834 633,373 831.3 633,788 828.7 633,582 967.3 632,659 709.8 632,119 851.6 640,727 610,647 12733.7t 610,647 1580* 610,468 1598* 610,649 9800*t 610,270 1675* 610,036 1696* 610,247 8453*t 610,244 7541.0t 1681* 610,062 1701* 608,656 1668* 608,438 1687* 607,541 1792* 606,117 1779* 606,131 1780* 607,417 2021* 605,187 2024* 604,872 2015* 606,641 2129* 604,254 2127* 604,037 2116* 604,232 7311*t 654,118 334 228 635,336 669t 651,932 789 614,614 444.4 577,404 1564 576,890 1563* 577,063 1559* 577,178 2370* 578,750 1462 576,471 1555* 575,911 1552* 575,870 1548* 576,835 1574* 573,644 1585* 572,968 1576* 572,938 1580* 569,250 1570* 568,318 1558* 568,229 1563* 569,245 1584* 571,478 1609* 570,833 1601* 570,773 1601* 571,750 1623* 570,874 2890* 571,827 2801* 571,127 1586* 570,451 1579* 570,429 1600* 570,704 3186* 568,511 1627* 568,152 1616* 568,043 1634* 569,099 1634* 571,268 1488 568,876 1432 571,080 1442 568,580 1643 571,080 1494* 
570,385 1490 570,351 1483* 571,717 1508* 570,502 2955* 568,234 1490* 567,833 1490* 567,668 1512* http://cs.fit.edu/~mmahoney/compression/paq.html[10/4/2013 8:51:10 AM] <256 MB <256 MB (g++ compile) <256 MB (g++ compile) <256 MB (MARS compile) <256 MB <256 MB <256 MB 240 MB 240 MB 480 MB 960 MB 230 MB 460 MB 920 MB 230 MB 460 MB 230 MB 460 MB 120 MB 230 MB 460 MB 120 MB 230 MB 460 MB 120 MB 230 MB 460 MB 920 MB <256 MB ~256 MB <192 MB 164 MB 130 MB 240 MB 470 MB 930 MB 100 MB 180 MB 330 MB 630 MB 630 MB 191 MB 354 MB 690 MB 191 MB 354 MB 690 MB 690 MB 128 MB 240 MB 470 MB 470 MB 930 MB 930 MB 128 MB 240 MB 470 MB 470 MB 191 MB 354 MB 690 MB 690 MB 112 MB 175 MB 128 MB 191 MB 191 MB 128 MB 240 MB 470 MB 930 MB 191 MB 322 MB 626 MB
    • The PAQ Data Compression Programs -8 PAQ7 -1 -2 -3 -4 -5 PAsQDa 4.4 -5 -7 PAsQDaCC 4.4 -5 -7 PAQ7PLUS v1.11 -0 -1 -2 -3 -4 PAQ7PLUS v1.19 -0 -1 -2 -3 -4 PAQ8A -4 PAQ8A2 -4 -6 PAQ8B -4 -6 PAQ8C -4 -6 PAQAR 4.5 -5 -7 PAQARCC 4.5 -5 -7 PAQ8D -4 -6 PAQ8E -4 -6 PAQ8F -4 -6 -7 PAQ8Fsse -7 PAQ8G -4 -6 PAQ8H -4 -6 RAQ8G -6 PAQ8I -7 PAQ8J -7 PAQ8JA -7 PAQ8JB -7 PAQ8JC -7 PAQ8JD -7 PAQ8JDsse PAQ8K -7 PAQ8L -6 -7 D 569,139 625,924 618,301 614,209 612,338 611,684 D 571,803 D 571,011 D 567,548 D 567,245 D 586,198 D 582,337 D 579,799 D 578,388 D 577,691 D 585,071 D 581,602 D 579,357 D 578,057 D 575,538 610,624 D 592,976 D 592,847 D 592,976 D 592,847 D 572,763 D 572,265 D 570,374 D 569,956 D 566,495 D 565,495 D 572,089 D 571,717 D 572,461 D 572,115 606,605 605,650 605,792 D D D D 575,351 575,521 572,018 572,077 603,312 D 572,277 598,081 597,106 596,824 596,883 596,179 595,537 595,586 594,857 650 645 710 740*** 740*** 1538 1475*** 1630 1480*** 461 468 501 503*** 507*** 478 480 500 512*** 514*** 792*** 577*** 577*** 515*** 516*** 497*** 501*** 1557*** 1540*** 1552*** 1847*** 495*** 500*** 500*** 503*** 828*** 840*** 881*** 816*** 561*** 572*** 694*** 702*** 1150*** 832*** 1810*** 1997*** 2030*** 2052*** 1997*** 1886*** 5984*** 1918*** 1872*** 56 87 150 275 525 128 470 191 626 53 84 146 272 522 53 84 146 272 522 115 116 418 116 418 116 418 191 626 191 626 116 418 116 418 120 435 854 120 435 120 450 552 730 959 992 1004 1017 1030 MB (times are for g++ compile) MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB MB 767 MB 435 MB 837 MB *Timed on an AMD 2800+ with 1 GB memory by Werner Bergmans. Times are approximated for 750 MHz by multiplying by 3.6, the approximate ratio of run times on both machines. Times marked with "t" denote some disk thrashing. **Tested on a PIII 500 MHz by Leonardo (run times not adjusted). ***Tested on a 2.2 GHz AMD-64 (in 32 bit XP), adjusted times 4.17. 
D = Uses external English dictionary.

WRT (dictionary) benchmarks

WRT11 is a word replacing transform preprocessor written by Przemyslaw Skibinski. It replaces words with 1-3 byte symbols using an external dictionary. Run times include the 3 seconds to run WRT. WRT20 was released Dec. 29, 2003. WRT30 (generic dictionary) + d2 dictionary (tuned to Calgary corpus as with WRT11-20) was released Jan. 29, 2004. Results below:
    • The PAQ Data Compression Programs WRT11 + WRT20 + WRT20 + WRT20 + WRT30 WRT30d2 WRT30 WRT30d2 WRT30d2 WRT30d2 WRT30 WRT30d2 WRT30 WRT30d2 WRT30 WRT30d2 PAQ6a PAQ6a PAQ6b PAQ6b -p -b -p -b -p -b -p -b -p -b -p -b -p -b -p -b -p -b -p -b -p -b -p -b -6 -6 -7 -8 + PAQ6v2 -6 + PAQ6v2 -6 + PAQ603 -6 + PAQ603 -6 + PAQ606fb -6 + PAQ607fb -6 + PAQAR 1.2 -6 + PAQAR 1.2 -6 + PAQAR 1.2 -6 + PAQAR 1.2 -6 + PAQAR 4.0 -5 + PAQAR 4.0 -5 626,395 617,734 617,376 618,005 624,067 615,325 621,350 613,684 609,877 605,601 599,638 592,156 589,111 581,945 594,364 587,029 446.9 439.2 415.5* 423.2* 384.7 384.1 317.1 327.3 312.0 2091** 1934** ** ** 1633 1612 202 202 404 808 202 202 202 202 202 206 240 240 240 240 120 120 MB MB MB MB MB MB MB MB MB MB MB MB MB (binaries separate) MB (binaries separate) MB MB Some improvement is possible by compressing the four binary files separately and the text files as a solid archive. For example, PAQ6 -6 and WRT20 + PAQ6 -6 each compress about 5K smaller. Savings are similar for other PAQ and WRT versions. paq6 -6 archive1 news bib book1 book2 paper1 paper2 progc progl progp trans -> 508514 476557 paq6 -6 archive2 geo -> 45263 45274 paq6 -6 archive3 pic -> 29274 29254 paq6 -6 archive4 obj1 -> 8189 8068 paq6 -6 archive5 obj2 -> 52554 52965 ----------Total 643794 612118 with WRT20 in 5 archives paq6 -6 archive * 648892 617734 with WRT20 in one archive File sizes for PAQAR 1.2 -5 and -6 (reported by Leonardo on May 27-28, 2004). Text file order is bib, book1, book2, news, paper1, paper2, progc, progl, progp, trans, compressed together. PAQAR 2.0 results reported June 27, 2004. 
PAQAR 1.2 -5 -----------text + WRT30 467172 text + WRT30d2 459937 geo 44498 obj1 7778 obj2 46489 pic 23996 -------Total WRT30 589933 Total WRT30d2 582698 -6 -----466536 459370 44481 7776 46331 23987 -----589111 581945 2.0 -6 -----457638 44338 7653 45649 23883 -----579161 PAsQDa 2.0 integrates PAQAR 4.0 with WRT and file reordering to compress to 577,404 bytes, improved with later versions. Calgary Challenge paqc.cpp produced a winning entry to the Calgary Challenge with a RAR archive of 645,667 bytes containing a decompression program and 5 compressed files on Jan. 10, 2004. PAQC is derived from PAQ6 as explained in the source code. To restore the Calgary corpus: unrar e calgary.rar gxx -O d.cpp -o d.exe d v d w d x d y d z (depending on your compiler) The 5 compressed files (total size 639,567 bytes) were produced as follows: paqc paqc paqc paqc -1 -2 -3 -3 v w x y news bib book1 book2 paper1 paper2 progc progl progp trans pic geo obj1 http://cs.fit.edu/~mmahoney/compression/paq.html[10/4/2013 8:51:10 AM]
    • The PAQ Data Compression Programs paqc -3 z obj2 The source for d.cpp is released under the GNU General Public License. (It doesn't say so because there are no comments). It is a stripped down version of PAQC that does decompression only. PAQC can also be used as a general purpose archiver, although the compression is usually not quite as good as PAQ6. (PAQC differs mainly in an improved model for pic.) Use the compression option -1 (default) for text, -2 for CCITT images, or -3 for other binary files. The program uses 190 MB memory. On Apr. 2, 2004, Alexander Ratushnyak submitted an entry of 637,116 bytes using a modified version of PAQ (paqar 6). He improved this to 619,922 bytes on Apr. 25, 2004, to 614,738 bytes on May 19, 2004, to 610,920 bytes on June 24, 2004, and to 609,650 bytes on July 12, 2004. The table below compares the compression with paq6-emilcontblaster (paq6eb -5), which was the best available version of PAQ at the time of get637 (paq6eb -6 should compress about 500 bytes smaller but thrashes the disk on my 256 MB PC). File ---geo obj1 obj2 pic others paqc -----45346 8154 52569 26072 507426 -----Archive 639567 +code paq6eb -----44955 8105 49667 27552 500377 -----630636 get637 get619 get614 get610 ------ ------ ------ -----45173 44409 44491 44338 8216 7836 7781 50196 47516 46542 45649 25840 24252 23989 23883 499380 489644 485804 490713 ------ ------ ------ -----628805 613657 608607 604583 637116 619992 614738 610920 get609 pc.ha cc 596 cc 593 cc 589 ------ ------ ------ ------ -----44323 23872 535178 592486 588183 586071 582325 ------ ------ ------ ------ -----603373 592486 588183 586071 582325 609650 603416 596314 593620 589862 The corresponding compressor (source and executable) for get614 is PAQAR 1.0 (use -6 option). The corresponding compressor for get610 is PAQAR 2.0 -6. Przemyslaw Skibinski submitted a challenge entry pc.ha of 603,416 bytes on Apr. 4, 2005. 
It appears to be a variant of PAsQDa with a tiny dictionary built in, and a single archive of 592,486 bytes. This was improved by Alexander Ratushnyak to 596,314 bytes (cc 596) on Oct. 25, 2005, to 593,620 bytes on Dec. 3, 2005, and to 589,862 bytes on June 5, 2006. The actual 589,862 byte entry is the two files prog.pmd and c.dat in cc589.zip, not the zip archive. The size is calculated by adding the length of the data file (c.dat), plus 1 byte for the terminator and 3 bytes for the size. prog.pmd is a PPMd var. I archive containing the decompressor C++ source code and two include files.

Contributors

Versions by Matt Mahoney and Serge Osnach

These programs trace the historical development of the PAQ series of archivers. I don't maintain this code, so if it doesn't work on your compiler you will have to fix it yourself. These programs all work like PAQ6 except that there are no options in the older programs. PAQ1SSE/PAQ2 and PAQ3N are by Serge Osnach. Other versions are by Matt Mahoney. Additional contributors after the release of PAQ6 are listed separately.

Neural Network Data Compression

P5, P6, and P12 are the only known data compression programs based on neural networks that are fast enough for practical use. You may download, use, copy, modify, and distribute these programs under the terms of the GNU General Public License. I recommend P12 unless you're short on memory. Files compressed with one program cannot be decompressed with another.
Windows Executables

P5 - "Small" model discussed in the paper below. Gives the worst compression, but uses the least memory (256K).
P6 - "Large" model, gives better compression. Requires 16 MB of memory to compress or decompress, plus memory for Windows and other programs.
P12 - Improved version of P6. Gives 3-8% better compression on text files than P6 but is about 10% slower; still uses 16 MB. Compiled with Borland 5.5 on 5/13/00.
P12a - Compatible with P12 but 9% faster than P12 with a smaller executable (33,792 bytes). Compiled by Jason Schmidt (schmidtj at tampabay dot rr dot com) 9/6/03 using VS .net 2003 (7.1) with the /O2 (optimize for maximum speed) and /G7 (Pentium 4, AMD Athlon, or higher) options, then packed with UPX --brute --force.

To use these archivers, run them from the command line in an MS-DOS box:

    p12                         Print this help message
    p12 archive file file...    Create new archive
    dir/b | p12 archive         Create new archive of whole directory
    p12 archive                 Extract or compare files from existing archive
    more < archive              View contents of archive

Files are never clobbered. The command:

    p12 archive file

has the following meanings:

If the archive exists, but not the file, then extract the file.
If the file exists, but not the archive, then compress the file.
If both exist, then compare the file with what's in the archive.

You can't update or extract individual files in an archive. You can only create or extract/compare the whole archive at once. Timestamps, permissions, etc. are not preserved. If you enter a path when compressing, then the filename will be stored that way and extracted to that path, for example:

    p12 archive file1 sub\file2 \tmp\file3
    p12 archive

then file1 will be extracted to the current directory, file2 will be extracted to the subdirectory sub of the current directory (which must exist or the file will be skipped during extraction), and file3 will be extracted to tmp from the root directory (which also must exist). Substitute / for \ in UNIX. If you want your files to be portable across Windows and UNIX, don't use a path, and enter filenames in lower case.

All of the compressors on this page work the same way.

Source Code

p5.cpp
p6.cpp
p12.cpp
std.h

All three programs use std.h, a replacement for Borland 5.0's poor implementation of vector and string (later fixed in version 5.5). I am including them for reference, as the papers below are based on them, but you may have to port the code.

I later ported P12 to g++ 2.95.2 (DJGPP for Windows) as p12a.cpp, which does not require std.h. This is the one I recommend you use. Archives created with p12a and p12 are compatible; other combinations are not. To compile (ignore warnings):

    gxx -O p12a.cpp -o p12.exe
These papers describe how the programs work.

Fast Text Compression with Neural Networks, HTML, PostScript, or PDF (5 pages), © 2000 AAAI, presented at FLAIRS, May 23, 2000, Orlando, Florida. Describes P5 (small model) and P6 (large model). Slides (10 pages): PostScript or PDF.
Improving Neural Network Text Compression with Word-Oriented Contexts, HTML (unpublished). Describes P12.
The FAQ helps explain the code.

PAQ1 Archiver

PAQ1 uses a combination of models, the most important of which is a nonstationary context-sensitive bit predictor (but no neural network). It gives better compression than stationary models such as PPM or Burrows-Wheeler on data where the statistics change over time (such as concatenated files of different types).

paq1.exe Windows executable, requires 64 MB memory. Originally posted Jan. 6, 2002. Last updated Jan. 21, 2002 to use a Borland executable (rather than DJGPP), since it's smaller, and to fix some bugs. Run time is the same and the archives are compatible.
Paper: The PAQ1 Data Compression Program (draft), PDF. Jan. 20, 2002, revised Feb. 28, Mar. 2, Mar. 6, and Mar. 19.
paq1.cpp source code and documentation. Updated Jan. 21, 2002 to fix bugs and port to Borland (does not affect archive compatibility). To compile:

    g++ -O paq1.cpp

or:

    bcc32 -O paq1.cpp

If you want to modify the code, you might need stategen.cpp, which generated some of the source code (the state tables for type Counter). Updated Jan. 20, 2002.

PAQ2 Archiver

This is an improved version of PAQ1 with SSE added by Serge Osnach (ench at netcity.ru). It compresses the Calgary corpus to 702,242 bytes (updated May 11, 2003).

paq2.cpp source code.
paq2.exe executable for Windows.

The source of PAQ2 is PAQ1SSE, which can be found at compression.graphicon.ru/so/ (in Russian). The only changes are to rename the program and to give credit in the banner.
Unfortunately this makes the archives incompatible because the 4th byte of every archive is changed from "1" to "2". (I changed it because PAQ1 and PAQ2 archives are genuinely incompatible and I wanted both programs to give a sensible error message.)

PAQ3 Archiver

PAQ3 introduces improvements to SSE in PAQ2: linear interpolation between buckets, a more compact SSE representation (two 1-byte counters), initialization to SSE(p) = p, and some minor improvements (updated Sept. 3, 2003). Thanks to Serge Osnach for introducing me to SSE.

paq3.cpp source code.
paq3.exe executable for Windows, compiled with g++ -O (DJGPP 2.95.2) and packed with UPX on 9/2/03.
paq3a.exe for Pentium 4, AMD Athlon, or higher. Compiled with VS .net 7.1 and packed with UPX. Runs 10% faster than paq3.exe. (Compiled by Jason Schmidt, 9/6/03.)
paq3b.exe compiled with Intel 7.1 using the "release" and "whole program optimization" options, and packed with UPX. It is about 10% faster than paq3a.exe on his 1600 MHz Athlon XP, but about the same speed as paq3a on my 750 MHz Duron. (Compiled by Jason Schmidt, 9/18/03.)
paq3c.exe compiled with Intel 8.0 (beta). The smallest (37,376 byte executable) and fastest. (Compiled by Eugene D. Shelwien, 9/20/03.)

All executables are archive compatible. I recommend paq3c.exe.

PAQ3N Archiver

PAQ3N contains modifications to PAQ3 by Serge Osnach, released Oct. 9, 2003. It includes improvements to the SSE context (including the last two characters) and a new submodel (SparseModel), three order-2 models which each skip over one byte. It is not archive compatible with PAQ3. It uses about 80 MB memory. Available from his website at www.thepipe.kiev.ua/download/paq3n.zip or mirrored here:

paq3n.cpp

All of the following Windows executables are archive compatible:

paq3n.exe (compiled by Serge Osnach, 10/9/03)
paq3na.exe (compiled by Jason Schmidt using VS .net 2003, 10/9/03)
ru.datacompression.info/paq3nb.rar contains several faster and smaller variants compiled by Eugene D. Shelwien (10/9/03)
paq3n_ic8_ml_ipo.exe (fastest)
paq3n_vc71.exe (smallest, 10,752 bytes)

PAQ4 Archiver

PAQ4 mixes models using adaptive rather than fixed weights, and also includes an improved model for data with fixed length records. This is all explained in the source code.

paq4v2.cpp Source code (ver. 2, Nov. 15, 2003)
paq4v2.exe Windows executable (g++ -O, UPX, 88,148 bytes)
paq4v2a.exe (39,424 bytes, 16% faster, compiled by Jason Schmidt in VS .net 7.1 /O2 /G7, UPX --brute --force, Nov. 22, 2003)

Version 2 fixes a bug in which some files were not decompressed correctly in the last few bytes. It will correctly decompress files compressed with either PAQ4 or PAQ4V2. Version 1 is given below for reference only. (Thanks to Alexander Ratushnyak for finding the bug.)

paq4.cpp Source code (Oct. 16, 2003)
paq4.exe Windows executable (compiled with g++ -O and packed with UPX, 88,136 bytes)
paq4a.exe, smaller (39,424 bytes) and 16% faster, compiled by Jason Schmidt using VS .net 7.1 /O2 /G7 and packed with UPX 1.90w --brute --force (Oct. 17, 2003)
paq4b.exe, even smaller (31,744 bytes) and another 10% faster, compiled by Eugene Shelwien using Intel 8 (Oct. 21, 2003). Other versions (some as small as 9728 bytes) are here.

PAQ5

PAQ5 has some minor improvements over PAQ4, including word models for text, models for audio and images, an improved hash table, dual mixers, and modeling of run lengths within contexts. It uses about 186 MB of memory. Updated Dec. 18, 2003.
paq5.cpp source code, includes a more detailed description.
paq5a.exe Windows executable, compiled with g++ -O and UPX. I'm waiting for a faster version to call paq5.exe.
paq5b.exe compiled by Jason Schmidt, Dec. 19, 2003, VS .net 7.1 /O2 /G7, UPX --brute --force

The main improvement in PAQ6 over PAQ5 is in the context counter states. When counting 0 and 1 bits in a context, it more aggressively decreases the opposite bit count, and gives greater weight to counts when there is a large difference between them. It also includes models for .exe/.dll files and CCITT images. See the source code comments for details.

PAQ6

PAQ6 is an archiving data compression program for most operating systems including Windows, UNIX, and Linux. It ranks among the top archivers for data compression, at the expense of speed and memory. (A derived version has won the Calgary Challenge.) PAQ6 should be considered experimental, as I expect future improvements. The purpose of the program is to foster the development of better data models and algorithms. These programs were developed with the help of many people. They are open source and are free under terms of the GNU General Public License.

To create a new archive, you specify the name of the archive on the command line, and the files you want to compress, either after the archive name or from standard input. Wildcards are not expanded in Windows, so you can use dir/b to get the same effect. For example, to compress all .txt files into archive.pq6:

    paq6 archive.pq6 file1.txt file2.txt    (in any operating system)
    paq6 archive.pq6 *.txt                  (in UNIX)
    dir/b *.txt | paq6 archive.pq6          (in Windows)

To decompress:

    paq6 archive.pq6

PAQ6 assumes you want to extract rather than compress files if the archive already exists. If the files to be extracted also exist, then PAQ6 will simply compare them and report whether they are identical. PAQ6 never clobbers any files.
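The compress, extract, or compare rule described above can be summarized as a small decision function. This is a reading of the prose on this page, not code taken from paq6.cpp; the function name and the string labels are invented for the illustration.

```python
def paq6_action(archive_exists, file_exists):
    """Return what the archiver would do for one file, per the rule in the text.

    - archive missing            -> "compress" (create a new archive)
    - archive present, file not  -> "extract"
    - archive and file present   -> "compare" (never overwrite the file)
    """
    if not archive_exists:
        return "compress"
    return "compare" if file_exists else "extract"

# Enumerate the three cases the text describes.
for archive_exists, file_exists in [(False, True), (True, False), (True, True)]:
    print(archive_exists, file_exists, "->", paq6_action(archive_exists, file_exists))
```

Because an existing file is only ever compared, never rewritten, no combination of inputs leads to data being overwritten, which is what "never clobbers any files" amounts to.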
To view the contents of an archive:

    more < archive.pq6

File names and their lengths are stored in a human-readable header ending with a Windows EOF character and a formfeed to hide the binary compressed data. The first line starts with "PAQ6" so you know which version you need to extract the files. Different versions (PAQ1, PAQ2, etc.) produce incompatible archives.

PAQ6 (but not earlier versions) includes an option to trade off compression vs. memory and speed. To compress:

    paq6 -3 archive.pq6 files...

The -3 is optional, and gives a reasonable tradeoff. The possible values are:

    Compression option   Memory needed to compress/decompress
    ------------------   ------------------------------------
    -0                   2 MB (fastest)
    -1                   3 MB
    -2                   6 MB
    -3                   18 MB (default)
    -4                   64 MB
    -5                   154 MB
    -6                   202 MB
    -7                   404 MB
    -8                   808 MB
    -9                   1616 MB (best compression, slowest)

There are no decompression options. Instead, the compression option stored in the archive is used, which means that the decompressor needs the same amount of memory as was used to compress the files.

There are no options to add, update, or extract individual files. You have to create or extract the entire archive all at once. File names are stored and extracted as they are entered. Thus, if you enter the file names without a directory path (which I recommend), then they will be extracted to the current directory. The archive does not store timestamps, permissions, etc., as these can't be done portably.

paq6v2.cpp source code (Jan. 8, 2004)
paq6v2.exe Windows executable (Intel 8, UPX, by Jason Schmidt)

If you want to modify the state tables in the source code, you will need stgen6.cpp.

PAQ6V2 is a replacement for PAQ6, which incorrectly decompresses some small files (those that compress smaller than 4 bytes). PAQ6V2 will correctly decompress files made by either version. Compression produces identical archives so the benchmarks below for PAQ6 are valid.

See the bottom of this page for variants that improve on PAQ6 slightly. Note that all versions are archive incompatible with each other unless noted.

PAQ6 v1

This version has a bug in that small files (those that compress to less than 4 bytes) will not decompress correctly. PAQ6V2 will correctly decompress all files compressed with PAQ6. Thanks to Alexander Ratushnyak for finding the bug.

paq6.cpp Source code, fully documented, Dec. 30, 2003
stgen6.cpp, program to generate the state table in paq6.cpp. (You don't need this unless you want to modify it.)
paq6.exe, Windows executable, compiled using Intel 8 + UPX, the fastest version in my tests, compiled by Jason Schmidt, Dec. 31, 2003.

Non-Windows users can compile as follows:

    g++ -O paq6.cpp

The Windows executables below are slower but are archive compatible. These are included for benchmarking purposes only.

paq6a.exe, DJGPP g++ 2.95.2 + UPX
paq6b.exe, compiled by Jason Schmidt using VS .net 7.1 /O2 /G7 + UPX (Dec. 30, 2003)
paq6_versions.rar, 8 other compiles by Jason Schmidt for older or multithreaded processors (RAR archive). See the readme file.
The fastest of these (by about 3%) on my PC is PAQ6_P4_Athlon_AXP.exe, which is just paq6b.exe above.

Other executables by Eugene Shelwien, including the smallest (12,288 bytes), and one which displays compression progress (paq6_verb). Source (Jan. 5, 2004).

PAQ7

PAQ7 is a complete rewrite of PAQ6 and variants (PAQAR, PAsQDa). Compression ratio is similar to PAQAR but it is 3 times faster. However it lacks an x86 model and a dictionary, so it does not compress Windows executables and English text files as well as PAsQDa. It does include models for color .bmp, .tiff, and .jpeg files, so it compresses these files better. The primary difference from PAQ6 is that it uses a neural network to combine models rather than a gradient descent mixer.

  paq7.exe Windows executable, g++ compile (76,288 bytes, Dec. 24, 2005)
  paq-7.exe Intel compile by Johan De Bock, 15% faster but doesn't accept wildcards (use dir/b) (47,616 bytes, Dec. 25, 2005)
  paq7pp.exe g++ compile for Pentium Pro and higher (PCs since 1997), 4% slower than paq-7 but accepts wildcards (30,208 bytes, Jan. 2, 2006).
  paq7 32-bit Linux 2.6.9 binary (elf, shared libraries, compiled like paq7pp, 66,908 bytes), Jan. 5, 2006
  paq7static 32-bit Linux binary, static libraries (517,472 bytes), Jan. 5, 2006
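As a rough illustration of what "using a neural network to combine models" can look like, here is a minimal sketch of logistic mixing for bit predictions. It is not taken from paq7.cpp: the class name Mixer, the helper names stretch and squash, the learning rate, and the use of floating point are all assumptions made for readability (the real program works in fixed-point integers and selects weight sets by context).

```python
import math

def stretch(p):
    """Map a probability to the logistic domain: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def squash(x):
    """Inverse of stretch: map a real number back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

class Mixer:
    """Single-layer network that combines per-model probabilities that the next bit is 1."""

    def __init__(self, n_models, learning_rate=0.02):  # learning rate is an illustrative choice
        self.w = [0.0] * n_models
        self.lr = learning_rate
        self.x = [0.0] * n_models

    def predict(self, probs):
        # Clamp inputs so stretch() never sees exactly 0 or 1.
        self.x = [stretch(min(max(p, 1e-6), 1.0 - 1e-6)) for p in probs]
        return squash(sum(w * x for w, x in zip(self.w, self.x)))

    def update(self, p_mixed, bit):
        # Move each weight in the direction that reduces coding cost for the observed bit.
        err = bit - p_mixed
        for i, x in enumerate(self.x):
            self.w[i] += self.lr * err * x

# Hypothetical demo: model 0 is usually right about a stream of 1 bits, model 1 is usually wrong.
mixer = Mixer(2)
for _ in range(200):
    p = mixer.predict([0.9, 0.1])
    mixer.update(p, 1)
print(mixer.w, mixer.predict([0.9, 0.1]))
```

In this toy run the weight of the accurate model grows and the weight of the inaccurate one goes negative, so the mixed prediction ends up more confident than either input.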
To use:

  To compress:                      paq7 -3 archive files...
  or (in Windows):                  dir/b | paq7 -3 archive  (reads filenames from standard input)
  To extract/compare:               paq7 archive
  To extract with different names:  paq7 archive files...
  To view contents:                 more < archive

Compression option is -1 to -5 to control memory usage. Speed is about the same for all options (slow):

  -1 = 62 MB
  -2 = 96 MB
  -3 = 163 MB (default)
  -4 = 296 MB
  -5 = 525 MB

Memory usage is 10% less if no .jpeg images are detected.

Tested under 32-bit Windows (g++, Borland, Mars under Me and XP), 64-bit Linux, and Solaris (Sparc). For non-Windows systems, see source code comments to compile. In Windows only the g++ version accepts wildcards in file names.

Note: when reading file names by piping DIR/B be sure the archive is not in the directory you are compressing or else PAQ7 might try to compress (part of) itself. Either put the archive in another directory or give the archive a different extension than the files you are compressing like this:

  dir/b *.txt | paq7 temptextfiles.paq7

Source code: paq7.cpp and paq7asm.asm (assembles with NASM, or compile with -DNOASM (1/3 slower))

paq7pp.exe is compiled with NASM 0.98.38, MinGW C++ 3.4.2, and UPX 1.24w as follows. Executable size is 30,208 bytes.

  nasm -f win32 paq7asm.asm --prefix _
  g++ paq7.cpp paq7asm.obj -O2 -Os -s -o paq7pp.exe -march=pentiumpro -fomit-frame-pointer
  upx paq7pp.exe

PAQ8A

PAQ8A is an experimental pre-release of PAQ8. It has an improved context map (2 byte hash) and state table, bug fixes in the jpeg model, a new x86 model, and minor improvements. It does not include an English dictionary like paq7plus or pasqda, and does not have a .wav model.

The x86 model uses a preprocessor which is tested for correct decompression during compression. If this fails, then the preprocessor is bypassed and compression is still correct.

Options are -0 (18 MB memory) to -9 (4 GB). -0 is faster than other options, and is the default. -4 uses 115 MB.
Each increment doubles memory usage.

  paq8a.exe Windows executable (Pentium Pro or newer), Jan 27, 2006.
  paq8a.cpp Source code (compiled as with paq7pp and linked with paq7asm.obj)

PAQ8F

PAQ8F has 3 improvements over PAQ8A: a more memory efficient context model, a new indirect context model to improve compression, and a new user interface to support drag and drop in Windows. It does not use an English dictionary like PAQ8B/C/D/E.
To install in Windows, put paq8f.exe or a shortcut on the desktop. To compress a file or folder, drop it on the icon. An archive with a .paq8f extension is put in the same folder as the source. To extract, drop the compressed file on the icon.

From the command line use as follows:

  paq8f [-level] archive files...        (compresses to archive.paq8f)
  paq8f [-d] dir1/archive.paq8f [dir2]   (extracts to dir2 if given, else dir1)

-level ranges from -0 (store without compression) to -9 (smallest, slowest, uses most memory). Default is -5 (needs 256MB memory). You can also compress directories the same way as files. The directory hierarchy is restored upon extraction, creating directories as needed. However file attributes like timestamps and permissions are not preserved.

To support drag and drop, paq8f will pause if run with only one argument and no options until you press ENTER. To prevent this, use an option like -5 or -d even if not required. paq8f does not read file names from standard input like earlier versions. Wildcards are allowed (requires g++ compile).

paq8f has a more robust detector for x86 preprocessing. Rather than depend on the file name extension (.exe, .dll...) or "MZ" in the header, it tries the E8E9 transform and tests if it helps compression. This allows it to detect Linux executables and reject 16-bit Windows executables. It divides the input file into blocks and will not use the transform on non-executable data within the file. Like earlier versions, the transform is tested at compression time for correct decompression, and abandoned if it fails. No user intervention is required.

paq8f uses a new indirect context model that improves compression on most files, text and binary. For example, given a string "AB...AC...AB...AC...AB...A?" it guesses "C" based on the previous observation that "C" followed "BCB" after the first 3 occurrences of "A". This is an example of an order (1,3) indirect context.
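The "AB...AC...AB...AC...AB...A?" example can be traced with a small sketch. This is only an illustration of the idea at the byte level, assuming plain dictionaries and exact symbol counts; the function name predict_indirect and its parameters are hypothetical, and paq8f itself uses hashed contexts and bit-level predictions.

```python
from collections import Counter, defaultdict

def predict_indirect(data, ctx_order=1, hist_len=3):
    """Guess the next symbol from an order (ctx_order, hist_len) indirect context.

    For each context (the previous ctx_order symbols) we remember the last
    hist_len symbols that followed it. That history, together with the context,
    is then used as the key for predicting the next follower.
    """
    history = defaultdict(str)    # context -> recent followers of that context
    table = defaultdict(Counter)  # (context, history) -> counts of what came next
    for i in range(ctx_order, len(data)):
        ctx = data[i - ctx_order:i]
        sym = data[i]
        hist = history[ctx]
        if len(hist) == hist_len:
            table[(ctx, hist)][sym] += 1
        history[ctx] = (hist + sym)[-hist_len:]
    ctx = data[-ctx_order:]
    counts = table.get((ctx, history[ctx]))
    return counts.most_common(1)[0][0] if counts else None

# The dots stand in for unrelated data between the occurrences of "A".
print(predict_indirect("AB...AC...AB...AC...AB...A"))  # prints C
```

After the first three occurrences of "A" the follower history is "BCB", and the fourth "A" is followed by "C". When the history for "A" is "BCB" again at the end of the string, the sketch looks up that pair and returns "C".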
paq8f also models orders (1,1), (1,2), (2,1) and (2,2).

  paq8f.exe Windows executable, g++ compile (Pentium Pro or higher), Feb. 28, 2006
  paq-8f.exe Intel compile by Johan de Bock (10% faster, but does not accept wildcards)
  paq8f.cpp, see source for compile instructions, link with paq7asm.asm from paq7

Update Nov. 21, 2006. Updated the wording of the copyright notice (GPL). There is no change to the code or the license. It is recommended that all future versions should use this wording.

Update Nov. 22, 2006. paq-8f.zip and paq-8f.tar.gz (Nov. 23, 2006) UNIX/Linux source distribution prepared by Jari Aalto.

Update Dec. 15, 2006. paq-x86_64.tgz x86_64 Linux port of paq8f by Matthew Fite. Also as a patch. The updated assembler code paq7asm-x86_64.asm in paq-x86_64.tgz assembled with YASM should work with any version of PAQ that uses paq7asm.asm, which includes all versions of paq7, paq8, and paq8hp* under Linux on X86_64 processors. It replaces MMX code with 64 bit SSE2 code.

Update Jan. 19, 2007. Updated the above assembler code (which does not work). paq8f.zip and paq8jd.zip use new assembler code, which can be linked to any paq7/8 version with no changes to the C++ code. The 64 bit Linux versions are archive compatible with the Win32 versions but about 7% faster on an Athlon 64.

Update Jan. 30, 2007. Added 32-bit SSE2 assembler code by wowtiger for Pentium 4.

Update Feb. 2, 2007. Added 32-bit Linux executables (by Giorgio Tani) to paq8f.zip and paq8jd.zip. The archives contain source and executables for Win32 for Pentium-MMX or higher, Win32 for Pentium 4 or higher, and 32 and 64 bit Linux executables, and all source code. (updated readme.txt on Feb. 12, 2007).

PAQ8L

paq8l, Mar. 7, 2007, improves on paq8jd by adding a DMC model and removing some redundant models in SparseModel, plus minor tuneups and documentation fixes.
PAQ8M

paq8m, Aug. 4, 2007, is paq8l with the improved JPEG model from paq8fthis by Jan Ondrus. The JPEG model includes a bug fix (it crashed on some malformed JPEG files), and some speed optimization of the DCT/IDCT code. However, JPEG compression is still slower than paq8l. The program will now report errors in case of malformed JPEGs, but they are harmless.

Note: paq8m still crashes on one of the JPEG images in the private MFC compression test from maximumcompression.com. paq8l does not have this problem.

PAQ8N

paq8n, Aug 18, 2007, is paq8l with the further improved JPEG model from paq8fthis2 by Jan Ondrus. It no longer reports harmless errors for malformed JPEGs.

Benchmarks with -6 option (files from maximumcompression.com) on a 2.2 GHz Athlon-64, 2 GB, Win32:

                                      Compression time (seconds)
    842,468  a10.jpg
    698,214  a10.jpg.paq8f            19
    667,190  a10.jpg.paq8fthis        47
    667,722  a10.jpg.paq8l            22
    674,995  a10.jpg.paq8m            36
    660,740  a10.jpg.paq8fthis2       23
    661,321  a10.jpg.paq8n            27

  4,168,192  ohs.doc (contains a large embedded JPEG file).
    553,493  ohs.doc.paq8f           105
    524,926  ohs.doc.paq8fthis       217
    547,082  ohs.doc.paq8l           171
    518,694  ohs.doc.paq8m           228
    519,163  ohs.doc.paq8fthis2      120
    513,045  ohs.doc.paq8n           188

Compression is identical to paq8l and paq8m for non JPEG data.

Versions by Berto Destasio

These large-memory variations by Berto Destasio improve on PAQ4 and PAQ5.

paq4-emilcont-duritium.exe is a large memory version (about 364 MB) of PAQ4v2 by Berto Destasio which takes first place on his benchmark as of Nov. 22, 2003. It's not compatible with any other version. I did not test this on the Calgary corpus because my PC has only 256 MB memory. Also, from examining the source code at paq4v2-emilcont-duritium.cpp, I believe there is a bug in the random number generator that could cause decompression errors. The program uses modified counter state transition tables, generated with stategen-emilcont.cpp

paq5-emilcont-deuterium.cpp (needs 168 MB), Dec.
26, 2003, tuned from PAQ5. The bug in the random number generator is fixed.

  paq5-emilcont-deuterium.exe, compiled with Digital MARS
  paq5ed.exe, about 23% faster, compiled by Jason Schmidt using VS .net 7.1, Dec. 27, 2003 (not archive compatible).

Additional improvements of pre-release versions of PAQ6 which I sent him. PAQ6 improves on these, however.

  paq6-emilcont-jackdamarioum.cpp (needs 344 MB), Dec. 29, 2003
  paq6d-emilcont-jackdamarioum.cpp (needs 396 MB), Dec. 29, 2003

Adds a new sparse model (SparseModel2) to paq606fb.

  paq6-emilcont-febas.cpp, Mar. 28, 2004
  paq6-emilcont-febas.exe

No source code yet.

  paq6-emilcont-anny.exe, Mar. 30, 2004 (has a bug).
  paq6-emilcont-anny-607fb.exe, Apr. 1, 2004
  paq6-emilcont-blaster.cpp Apr. 7, 2004
  paq6-emilcont-blaster.exe
  paq6eba.exe Intel 8, UPX compile by Jason Schmidt, Apr. 8, 2004.

Versions derived from paq6ebb.cpp. Compiled by Jason Schmidt, Apr. 18, 2004. (Add "using namespace std;" to .cpp file to compile)

  paq6-emilcont-destroyer.cpp, Apr. 12, 2004
  paq6-emilcont-destroyer.exe, Intel 8, UPX
  paq6-emilcont-annyhilator.cpp, Apr. 12, 2004
  paq6-emilcont-annyhilator.exe, Intel 8, UPX
  paq6-emilcont-harlock.cpp, Apr. 15, 2004
  paq6-emilcont-harlock.exe, Intel 8, UPX
  paq6-emilcont-italia, May 2, 2004

The newest versions of Emilcont can be found at http://www.freewebs.com/emilcont/index.htm

Intel builds by Johan De Bock can be found at http://studwww.ugent.be/~jdebock/win32_compressor_builds.htm

Versions by Johan De Bock

PAQ6eb compiled by Johan De Bock contains 2 minor changes to paq6-emilcont-blaster to compile with the Intel 8 compiler (added "using namespace std;" and corrected the line "CounterMap t0, t1, t2, t3, t4, t5, t6,;"). It is otherwise identical to paq6-emilcont-blaster but about 40% faster.

  paq6eb.cpp, Apr. 8, 2004
  paq6eb.exe

PAQ6ebb is PAQ6eb that reports compression progress as it runs. This replaces a version posted Apr. 9 which had a bug and was removed.

  paq6ebb.cpp, Apr. 10, 2004
  paq6ebb.exe, Intel 8, UPX (Jason Schmidt, Apr. 11, 2004)

Versions by David A. Scott

PAQ6v2ds is a variant of PAQ6v2 by David A. Scott that uses 64 bit arithmetic encoding. It improves compression by about 0.05% over PAQ6v2, but is about 3% slower. The compiler must support the unsigned long long type (e.g. g++ and some others). All of the PAQ6 variants from here on accept the same compression options as PAQ6.

  paq6v2ds.cpp, Jan.
17, 2004
  paq6v2ds.exe, Windows executable, compiled by Jason Schmidt
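To make the role of the arithmetic coder's register width more concrete, here is a minimal sketch of a carryless binary arithmetic coder with 32-bit range registers and 12-bit probabilities. It is not the code from paq6v2ds.cpp or any other PAQ version: the function names encode and decode, the 12-bit probability scale, and the 4-byte flush are assumptions chosen to keep the example short. The shift in the xmid computation is where rounding loss occurs, and wider registers are one way to reduce that kind of loss.

```python
MASK = 0xFFFFFFFF  # 32-bit range registers (an assumption of this sketch)

def encode(bits, probs):
    """Code each bit given P(bit == 1) as a 12-bit integer in 1..4095."""
    x1, x2 = 0, MASK
    out = bytearray()
    for bit, p in zip(bits, probs):
        xmid = x1 + ((x2 - x1) >> 12) * p  # split the range in proportion to p
        if bit:
            x2 = xmid
        else:
            x1 = xmid + 1
        while ((x1 ^ x2) & 0xFF000000) == 0:  # leading bytes agree: emit one
            out.append(x2 >> 24)
            x1 = (x1 << 8) & MASK
            x2 = ((x2 << 8) & MASK) | 0xFF
    out += x1.to_bytes(4, "big")  # flush a value inside the final range
    return bytes(out)

def decode(data, probs):
    """Recover the bits, given the same probability sequence used to encode."""
    x1, x2 = 0, MASK
    x = int.from_bytes(data[:4], "big")
    pos = 4
    bits = []
    for p in probs:
        xmid = x1 + ((x2 - x1) >> 12) * p
        if x <= xmid:
            bits.append(1)
            x2 = xmid
        else:
            bits.append(0)
            x1 = xmid + 1
        while ((x1 ^ x2) & 0xFF000000) == 0:
            x1 = (x1 << 8) & MASK
            x2 = ((x2 << 8) & MASK) | 0xFF
            nxt = data[pos] if pos < len(data) else 0
            pos += 1
            x = ((x << 8) & MASK) | nxt
    return bits

# Hypothetical demo: 1000 one-bits, each predicted with probability 4000/4096.
bits = [1] * 1000
probs = [4000] * 1000
packed = encode(bits, probs)
print(len(packed), decode(packed, probs) == bits)
```

The decoder must be driven by exactly the same probabilities as the encoder, which is why the models in a compressor of this kind have to be reproduced bit for bit at decompression time.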
PAQ6fdj2 is a variant of PAQ6fdj that has about the same performance but includes an integrity check during decompression. It uses a CACM arithmetic coder which compresses very close to the Shannon limit. (See Moffat, A., Neal, R. M., Witten, I. H. (1998), Arithmetic Coding Revisited, ACM Trans. Information Systems, 16(3) 256-294).

  paq6fdj2.cpp bit_byts.cpp bit_byts.h Source: Jan. 20, 2004
  paq6fdj2.exe, Intel 8, UPX (compiled by Jason Schmidt)

PAQ32 is a variant of PAQ6fdj2 that returns the encoder to 32 bits for a bit more speed. Compression is nearly identical to PAQ6fdj2 (since there is no point in using higher precision with a CACM coder).

  paq32.cpp bit_bytm.cpp bit_bytm.h Source: Jan. 24, 2004
  paq32.exe Intel 8, UPX (compiled by Jason Schmidt)

Versions by Fabio Buffoni

PAQ6fb is a variant of PAQ6 by Fabio Buffoni that is a bit faster and gives better compression than PAQ6. It should compile in g++, Borland, Mars and VC6 (old or new for-loop scoping rules).

  paq6fb.cpp, Jan. 19, 2004
  paq6fb.exe, Intel 8, UPX compiled by Jason Schmidt

PAQ601 includes a new mixer, some word model changes and some SSE context changes. It uses the original PAQ6 arithmetic coder.

  paq601.cpp, Jan. 24, 2004.
  paq601.exe Intel 8, UPX (compiled by Jason Schmidt)

PAQ603 is a version that uses David Scott's 32 bit CACM coder.

  paq603.cpp bit_bytm.cpp bit_bytm.h Jan. 25, 2004
  paq603.exe, Intel 8, UPX (compiled by Jason Schmidt)

PAQ605fb: new recordmodel, changes to state table, minor changes and fine tuning. Includes CACM coder all in one file.

  paq605fb.cpp, Jan. 30, 2004.
  paq605fb.exe, Intel 8, UPX (compiled by Jason Schmidt)

PAQ606fb contains minor changes.

  paq606fb.cpp, Mar. 15, 2004.
  paq606fb.exe, Intel 8, UPX (compiled by Jason Schmidt)

PAQ607fb: several tuning changes (state table, SSE, charmodel, sparsemodel), new recordmodel, extended mixer, modified sparsemodel2. It is 5% slower than paq606fb. Memory usage: -6 = 206 MB, -7 = 412 MB, -8 = 824 MB. (Has a bug).
  paq607fb.cpp, Mar. 30, 2004
  paq607fb.exe, DJGPP g++ 2.95.2, UPX
  paq607fba.exe, Intel 8, UPX by Jason Schmidt, Apr. 8, 2004

Versions by Jason Schmidt

This variant by Jason Schmidt combines the modifications from both PAQ6v2ds and PAQ6fb. (fdj = Fabio, David, Jason).
  paq6fdj.cpp, Jan. 19, 2004
  paq6fdj.exe, Intel 8, UPX

This variant of PAQ601 includes David Scott's 64 bit coder from PAQ6fdj2.

  paq602.cpp, Jan. 25, 2004.
  paq602.exe, Intel 8, UPX

This uses his 32 bit CACM coder.

  paq604.cpp bit_bytm.cpp bit_bytm.h Jan. 25, 2004
  paq604.exe, Intel 8, UPX

PAQ605fbj adds sparse record and word models to PAQ605fb. Memory usage is 20% higher than stated in the help message.

  paq605fbj.cpp, Jan. 30, 2004
  paq605fbj.exe, Intel 8, UPX

These variants add even more models for a slight improvement at the cost of speed and memory. The -5 option works with 256M memory but -6 does not.

  paq6fbj8.cpp Feb. 20, 2004
  paq6fbj8.exe Intel 8, UPX
  paq6fbj9.cpp Feb. 20, 2004
  paq6fbj9.exe Intel 8, UPX

Versions derived from paq6-emilcont-destroyer with changes to the counter state tables, one extra CharModel order, and a minor change to RecordModel2. VarB also adds sparse word modeling out to 12 words, and is somewhat slower and takes more memory than VarA, but gives better compression.

  paq6ed-schmidtvara.cpp, Apr. 19, 2004
  paq6ed-schmidtvara.exe, Intel 8, UPX
  paq6ed-schmidtvarb.cpp, Apr. 19, 2004
  paq6ed-schmidtvarb.exe, Intel 8, UPX

Versions by Alexander Ratushnyak

PAQAR 1.0a is the compressor producing the files for get614.ha, the top entry to the Calgary Challenge (614,738 bytes including the decompressor) as of May 19, 2004. It also works as a general purpose compressor and is the first PAQ version to take the #1 spot in the Maximum Compression benchmark. It uses 240 MB but will run very slowly on a 256 MB machine due to disk thrashing (3.5 hours). With more memory it should take about 20 minutes (750 MHz). To compile in g++ I had to add "#include <cstdio>" and fix 2 old style for-loop scoping problems. (I did not change the posted version, however).

  Source and .exe (RAR archive) paqar1_0.rar, mirror, May 20, 2004

PAQAR 1.1 improves compression and uses slightly less memory.
  paqar1_1.rar, May 22, 2004
PAQAR 1.2 accepts the option -Ne (e.g. -6e) to improve execution on x86 code (.exe, .dll files).

  Source and .exe (RAR archive) paqar1_2.rar, mirror, May 22, 2004

PAQAR 1.3

  Source and .exe (RAR archive) paqar1_3.rar, mirror, June 9, 2004

PAQAR 2.0

  Source and .exe (RAR archive) paqar2.rar, mirror, June 24, 2004

PAQAR 3.0 Compresses the Calgary corpus to 603,375 bytes as follows:

  paqar -6 v book1 news paper2 paper1 book2 bib trans progc progp progl obj1 obj2
  paqar -6 w pic
  paqar -6 x geo

  Source and .exe (RAR archive) paqar3.rar, mirror, July 11, 2004

PAQAR 4.0 Compresses the Calgary corpus to 602,556 bytes as follows (GET609 order):

  paqar -6 a book1 news paper2 paper1 book2 bib trans progc progl progp obj1 obj2
  paqar -6 p pic
  paqar -6 g geo

  Source and .exe (RAR archive) paqar4.rar, mirror, July 25, 2004, updated July 27 to fix a bug in the decompressor (does not change statistics).

PAQAR 4.1 has a bug fix in the x86 preprocessor that caused some 16-bit executables to decompress incorrectly when used with the -e option in earlier versions. This bug also occurred in PAsQDa versions prior to 4.3b. Calgary corpus results are the same as 4.0.

  Source and .exe (RAR archive, Dec. 12, 2005) paqar41.rar, mirror, posted Jan. 3, 2006.

PAQAR differs from PAQ6 as follows (see whatsnew.txt in distribution):

  PAQAR uses a weighted combination of a large number of mixers (not just 2) to compute the next bit probability. Each mixer has its own SSE stage. The final combination passes through another 5 SSE stages which are combined using a weighted average depending on file type.
  PAQAR uses collision detection in both types of countermaps. Collision detection is improved by increasing the checksum from 8 to 16 bits.
  PAQAR uses a more complicated state table with more entries for low bit counts.
  There is a preprocessor for x86 files, instead of an EXE model (from Entropy coder).
I believe it converts relative call and jump addresses to absolute addresses.

PAQ7PLUS 1.11 combines the models from PAQ7 (includes .bmp, .tif, .jpg, mixed with neural network) with the state table, arithmetic coder, English dictionary and TE8E9 x86 preprocessor from PAsQDa. Use with options -0 through -4 (low to high memory) or -0e to -4e to compress .exe or .dll files. Speed is about the same for all options (like PAQ7).
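The idea of converting relative call and jump addresses to absolute ones can be sketched as follows. On x86, opcode E8 is CALL rel32 and E9 is JMP rel32, and the 4-byte operand is relative to the address of the next instruction. The functions below are a simplified illustration under stated assumptions, not the TE8E9 preprocessor or the paq8 filter: the names e8e9_forward and e8e9_inverse are hypothetical, every E8/E9 byte is treated as an instruction, and addresses are taken as offsets from the start of the buffer. A real filter would add checks to avoid transforming data that is not code.

```python
def _transform(data, direction):
    """Shared scan: rewrite the 4-byte operand after each E8/E9 opcode byte."""
    out = bytearray(data)
    i = 0
    while i + 5 <= len(out):
        if out[i] in (0xE8, 0xE9):  # CALL rel32 or JMP rel32
            value = int.from_bytes(out[i + 1:i + 5], "little")
            # The operand is relative to the end of this 5-byte instruction.
            value = (value + direction * (i + 5)) & 0xFFFFFFFF
            out[i + 1:i + 5] = value.to_bytes(4, "little")
            i += 5  # skip the operand so both directions visit the same opcodes
        else:
            i += 1
    return bytes(out)

def e8e9_forward(data):
    """Relative -> absolute, applied before compression."""
    return _transform(data, +1)

def e8e9_inverse(data):
    """Absolute -> relative, applied after decompression."""
    return _transform(data, -1)

# Hypothetical code fragment: two CALLs to the same target (offset 0x100)
# from different positions, separated by three NOPs.
code = bytes.fromhex("e8fb000000" "909090" "e8f3000000")
converted = e8e9_forward(code)
print(converted.hex())
print(e8e9_inverse(converted) == code)
```

In the original bytes the two calls have different operands (fb 00 00 00 and f3 00 00 00) even though they target the same function. After the transform both operands are 00 01 00 00, which gives a context model repeated strings to predict.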
  paq7plus.rar Mirror (Jan 11, 2006).

PAQ7PLUS v1.19 - small improvements over v1.11, posted Jan. 23, 2006.

PAQAR 4.5 and PAQARCC 4.5 will probably be the last versions based on the PAQ6 core, with nothing from PAQ7 or PAQ8.

  paqar45.rar Feb. 13, 2006
  paqar45.rar (mirror)
  g++/NASM port (for Linux) by Luchezar Georgiev, Aug. 30, 2006, updated Sept. 4, 2006

PAQ8H is based on PAQ8G with some improvements to the model. Released Mar. 22, 2006, updated Mar. 24, 2006.

  paq8h.rar source code.
  paq8h.zip includes source, Windows .exe and dictionaries.

Note: there are two executables: paq8h.exe (VC++) and paq-8h.exe (Intel by Johan de Bock). The Intel compile is about 2% faster, and 9% faster than the original g++ compile posted Mar. 22, which has been removed. All executables produce identical archives. The benchmark timings are based on the Intel compile.

PAQ8HP1 through PAQ8HP6 are specialized for the Hutter prize (text), and lack models for binary data. They are not benchmarked here. See the large text benchmark.

Versions by Przemyslaw Skibinski

PAsQDa 1.0 combines dictionary coding (WRT) with PAQ6v2. The command

  pasqda -5 calgary.paqd book1 book2 paper1 paper2 bib news progc progl progp pic trans obj1 obj2 geo

gives a file of 614170 bytes (225.81 sec. on Celeron 2.4Ghz).

  pasqda10.zip (source, Windows .exe and dictionary)
  Mirror, Jan. 18, 2005

PAsQDa 2.0 combines WRT with PAQAR 4.0 and also reorders the input files to improve compression.

  pasqda20.zip (source, Windows .exe and dictionary), Jan. 24, 2005
  Mirror, posted Jan. 26, 2005

PAsQDa 2.1 - on non text files, does not use dictionary and automatically restarts PAQ model. -Ne (-1e to -9e) on .exe/.dll files works like in PAQAR.

  pasqda21.zip (source, Windows .exe and dictionary), Jan. 31, 2005
  Mirror, posted Feb. 1, 2005

PAsQDa 3.0 - word model is optimized for the preprocessor.
During compression of Calgary corpus, book2 becomes a predictor for textual files (which increases the memory requirement).

  pasqda30.zip (source, Windows .exe and dictionary), Feb. 7, 2005
  Mirror, posted Feb. 7, 2005

PAsQDa 4.0 - new dictionary and other improvements.

  pasqda40.zip, Apr. 4, 2005.
  Mirror, posted Apr. 5, 2005

PAsQDa 3.9 - uses less memory than 4.0

  pasqda39.zip, Apr. 7, 2005.
  Mirror, posted Apr. 7, 2005

PAsQDa 4.1 - includes a version optimized for the Calgary corpus - PAsQDaCC.

  pasqda41.zip, July 1, 2005
  Mirror, posted July 15, 2005.

PAsQDa 4.1b - is a bug fix for 4.1. Version 4.1 fails to correctly decompress the word "bulandsness". Thanks to Alexander Ratushnyak for finding the bug.

  pasqda41b.zip, Oct. 13, 2005
  Mirror, posted Oct. 13, 2005.

PAsQDa 4.2 has 2 bug fixes. First, it fixes a bug in PAsQDa 4.1b that incorrectly decompressed text files ending with a space character (no trailing newline). Second, it fixes a bug in the x86 exe preprocessor TE8E9 that incorrectly decompressed some 16-bit executables. (Thanks to Alexander Ratushnyak for finding both bugs and fixing the x86 bug). Additional features:

  -w option, which changes balance between prediction and SSE (from -w0 to -w32, default -w16 (-w28 with -e)).
  Lower memory requirements (32 MB less for -6).
  Other small improvements.

  pasqda42.zip, Dec. 8, 2005
  Mirror, Dec. 8, 2005

These replace the post of Dec. 5, 2005 with faster executables (Intel compile courtesy of Johan de Bock). No source code changes.

PAsQDa 4.3 adds 2 more options. Intel compiles by Johan de Bock.

  -i (-i0 to -i10, default -i0 (-i8 with -e)) update weight of even mixers.
  -j (-j0 to -j20, default -j2 (-j20 with -e)) update weight for odd mixers.

  pasqda43.zip, Dec. 7, 2005
  Mirror, posted Dec. 8, 2005

PAsQDa 4.3b fixes another bug in executables compressed with -e in version 4.3. No changes in benchmarks.

  pasqda43b.zip, Dec. 14, 2005
  Mirror, posted Dec. 14, 2005

PAsQDa 4.3c fixes a bug in 4.3b that caused files ending in a punctuation character such as , or ! to decompress incorrectly.

  pasqda43c.zip, Dec. 21, 2005
  Mirror, posted Dec. 21, 2005

PAsQDa 4.4 has improved file type detection and improved compression on foreign language text.

  pasqda44.zip, Jan. 4, 2006
  Mirror, posted Jan. 4, 2006

PAQ8A2 adds WRT dictionaries to PAQ8A (Feb 7, 2006).
To install:

  Unzip paq8a2.zip and wrt44.zip (mirror).
  Rename paq8a.exe to paq8a2.exe
  Put paq8a2.exe somewhere in your PATH.
  Make a subdirectory named TextFilter in the directory where you put paq8a2.exe
  Put all 7 wrt-*.dic files in TextFilter.

PAQ8B replaces PAQ8A2 (which was a pre-release I wasn't supposed to post). It is faster (Intel 8 compile by Johan De Bock), has improved file detection, and fixes a bug in PAQ8A and PAQ8A2 where it was leaving temporary files behind. To install, put paq8b.exe in your PATH and put the 7 wrt*.dic files in a subdirectory TextFilter under the directory where you put paq8b.exe.

  paq8b.zip Feb. 8, 2006
  Mirror, Feb. 8, 2006

PAQ8C

  includes TextFilter 2.0 for PAQ by P.Skibinski
  new dictionaries
  major improvements in file type detection routines
  file list sorting depends on type of file
  Intel9 compile by Johan de Bock.

  paq8c.zip Feb. 12, 2006
  (mirror) Feb. 13, 2006

PAQ8D

  improved speed of file type detection (especially when creating multiple file archive)
  ExeFilter looks for a "MZ" header in a file instead of an extension
  .tar files won't be preprocessed by a TextFilter
  fixed bug from PAQ8C with small files

  paq8d.zip Feb. 15, 2006
  (mirror) Feb. 15, 2006

PAQ8E

  improved detection of XML and HTML files
  experimental support for ISO-8859-1 encoding in UTF-8
  added WRT-short-Larg.dic and removed WRT-short-MaxC.dic dictionary
  default memory changed from -0 to -4
  fixed bug from PAQ8D in Windows 95/98/Me (program crashed on some text files)
  Intel9 compile by Johan de Bock.

  paq8e.zip Feb. 23, 2006
  (mirror) Feb. 23, 2006

PAQ8G is PAQ8F with dictionaries added. However it uses the same user interface as older PAQ versions (no drag and drop).
Additional improvements:

  TextFilter redesigned not to decrease compression performance on non-textual files
  PS, HLP, H, C, CPP, INI, and INF files won't be preprocessed by TextFilter
  sources can be compiled on Linux/Unix, but when using PAQ8G, dictionaries ("TextFilter" directory) must be present in the current working directory
  new dictionaries: WRT-short-CompSc.dic, WRT-short-Math.dic, WRT-short-Music.dic
Additional dictionaries in 6 other languages are available at http://www.ii.uni.wroc.pl/~inikep/research/dicts/.

  paq8g.zip (source, Windows and Linux executables, Mar. 3, 2006).
  Mirror.

Versions by Rudi Cilibrasi

raq8g is a modification of paq8f with optimizations for the Hutter prize, released Aug. 16, 2006. The improvements come mainly from modeling the nesting of parentheses and brackets in text, and from increased memory usage.

  raq8g.exe (Windows executable, compiled with g++ and linked with paq7asm.asm (NASM)).

Commands work like paq8f. It does not use dictionaries. The website has a Linux executable and raq8g.cpp.

Versions by Pavel L. Holoborodko

paq8i by Pavel L. Holoborodko, Aug. 18, 2006, is a modification to paq8h to add a PGM (grayscale image) model. Some results are included as a spreadsheet in the distribution. BMP compression is also improved (small bug fix). It works like paq8h and uses the same dictionaries for text compression (which must be present and identical for decompression, in a TextFilter subdirectory under paq8i.exe).

Update: Aug. 22, 2006. I added paq8ib.exe to the archive. This is a Borland 5.5 compile of the same code to fix a bug (also in paq8g and paq8h) that causes the program to crash on some text files when compiled with MINGW 3.4.2 g++ -O. The bug does not occur when compiled with Borland, VC++, or Intel C++, or with g++ without optimization. However, paq8ib.exe is about 20% slower than paq8i.exe. No source code was changed but a file "vector" was added. They were compiled:

  nasm -f obj --prefix _ paq7asm.asm
  bcc32 -O -DWIN32 -w-8027 paq8i.cpp paq7asm.obj
  rename paq8i.exe paq8ib.exe
  upx paq8ib.exe

  nasm -f win32 --prefix _ paq7asm.asm
  g++ -Wall %1.cpp -O2 -Os -march=pentiumpro -fomit-frame-pointer -s -o paq8i.exe
  upx paq8i.exe

Update: Sept. 4, 2006. paq8ib.exe crashes on most files, so I removed it. I added paq8idmc.exe, compiled with Digital Mars 8.38n, which appears to work.
The original g++ compile is named paq8igcc.exe. I changed one line of paq8i.cpp from #include "vector" to #include <vector>. The Mars compile is 12-14% slower than the gcc compile. To compile in Mars:

  nasm -f obj --prefix _ paq7asm.asm
  dmc -O -Ae -DWIN32 -I\dm\stlport\stlport paq8i.cpp paq7asm.obj

Update: Sept. 13, 2006. paq8i_cleaned.zip is a "cleaned up" version of the source code with a Mars 8.49 compile, by Michael Adams. It splits up the source code, strips out inline targets, and fixes some warnings. It is archive-compatible with other paq8i versions.

Versions by Bill Pettis

paq8j (Nov. 13, 2006) is based on paq8f with model improvements from paq8hp5, but without dictionaries. It uses the paq8f drag and drop interface.

paq8jd (Dec. 30, 2006) (linked above) is based on paq8jc with additional APM (SSE) stages.

Update (Jan 19, 2007). Ported paq8f and paq8jd to AMD64 Linux. The zip files contain source code (C++, 32 and 64 bit NASM/YASM assembler) and Win32 and Linux-x86_64 executables. The new paq7asm-x86_64.asm (using 64 bit SSE2 code in YASM) can be linked to any paq7/8 version with no changes to the .cpp file.
Update (Jan 30, 2007). Added SSE2 assembler source code by wowtiger for 32-bit Pentium 4 or higher to the paq8f.zip and paq8jd.zip downloads. The code should work with any paq7/8 version. Speed is improved by about 1%. A Win32 paq8jdsse.exe is included in paq8jd.

paq8k, Feb. 13, 2007.

Versions by Serge Osnach

(See also PAQ1SSE and PAQ3N).

paq8ja (Nov. 16, 2006) improves the sparse model of paq8j for better compression of binary and some text files. The model groups bytes in 6 categories (letters, punctuation, etc) and uses up to order-11 contexts. paq8ja uses the drag and drop interface of paq8j.

paq8jb (Nov. 21, 2006) adds a distance model, using context of distance back to an anchor character (x00, space, newline, xff) combined with previous characters. Win32 compiled with VS2003.

Update, Nov. 23, 2006. paq8jbb.zip by Andrew Paterson fixes some minor bugs (memory leaks) identified by Borland CodeGuard. It maintains compatibility with paq8jb. It also includes a Borland .exe, although it is slower than the VS compile.

paq8jc (Nov. 28, 2006) includes paq8jbb bug fixes, improvements to the record model and minor tuneups.

Versions by Jan Ondrus

paq8fthis (July 27, 2007) is paq8f with improved JPEG compression.

paq8fthis2 (Aug. 12, 2007) further improves JPEG compression, is faster, and fixes a bug that caused paq8fthis to crash on some malformed JPEG data (e.g. JPEG fragments in some Thumbs.db files).

Matt Mahoney, mmahoney@cs.fit.edu