Research
| BS99 | F07 | D05 | R08 | |
| bib | 1.91 | 1.926 | 1.887 | 1.795 |
| book1 | 2.27 | 2.356 | 2.264 | 2.147 |
| book2 | 1.96 | 2.012 | 1.953 | 1.840 |
| geo | 4.16 | 4.268 | 4.129 | 3.967 |
| news | 2.42 | 2.464 | 2.397 | 2.268 |
| obj1 | 3.73 | 3.765 | 3.692 | 3.584 |
| obj2 | 2.45 | 2.433 | 2.411 | 2.226 |
| paper1 | 2.41 | 2.439 | 2.390 | 2.274 |
| paper2 | 2.36 | 2.387 | 2.329 | 2.230 |
| pic | 0.72 | 0.753 | 0.714 | 0.688 |
| progc | 2.45 | 2.476 | 2.422 | 2.307 |
| progl | 1.68 | 1.697 | 1.660 | 1.576 |
| progp | 1.68 | 1.702 | 1.666 | 1.579 |
| trans | 1.46 | 1.488 | 1.451 | 1.354 |
| average bpb | 2.26 | 2.298 | 2.240 | 2.131 |
NanoZip is based on original lossless compression technology. Engineering efficient compression algorithms lead to research in various areas such as Burrows Wheeler Transform (BWT). The Calgary Corpus results here show how some of this work compares to the latest known research.
BS99 The best results of Balkenhol. [1]
F07 Fenwick's best results. [2]
D05 The best Deorowicz results. [3] Fenwick (2007) describes this as "the best Burrows Wheeler result to date."
R08 My experimental results. (The compressor is not a part of NanoZip.)
NanoZip uses a novel sorting algorithm requiring 4n space for computing BWT. The table beside shows approximated Manzini Corpus results based on Yuta Mori's timings [4] for the latest suffix array construction algorithms. With the exception of MSufSort3 all algorithms work with similar space requirements. The R08 timings are adjusted by the ratio of MSufSort3 timings done with the same hardware as R08.
Binaries of the sort and compression implementations will be made available here in the future.
| Archon4r0 | Deep-Shallow | MSufSort3 | divsufsort2 | R08 | |
| chr22.dna | 6.030 | 7.514 | 7.132 | 5.362 | 5.985 |
| etext99 | 22.160 | 34.264 | 24.106 | 18.064 | 13.823 |
| gcc-3.0.tar | 13.856 | 35.822 | 14.952 | 10.084 | 14.533 |
| howto | 5.806 | 8.288 | 5.672 | 5.320 | 4.034 |
| jdk13c | 18.106 | 32.182 | 11.314 | 9.010 | 8.268 |
| linux-2.4.5.tar | 18.174 | 25.912 | 19.890 | 14.290 | 18.121 |
| rctail96 | 32.490 | 62.502 | 21.060 | 17.914 | 15.225 |
| rfc | 20.736 | 29.666 | 17.936 | 15.658 | 16.728 |
| sprot34.dat | 22.832 | 32.096 | 23.352 | 17.404 | 15.735 |
| w3c2 | 27.264 | 54.682 | 17.090 | 13.486 | 12.750 |
| total seconds | 187.454 | 322.928 | 162.504 | 126.592 | 125.202 |
[1] B. Balkenhol, Y. M. Shtarkov, "One attempt of a compression algorithm using the BWT", Falculty of Mathematics, University of Bielefeld, 1999
[2] P. Fenwick, "Burrows Wheeler Compression: Principles and Reections." Theoretical Computer Science Vol 387 (2007) No. 3 pp 200-219
[3] S. Deorowicz "Context exhumation after the Burrows Wheeler transform", Information Processing Letters, Vol 95, No 1, pp 313-320, 2005
[4] Y. Mori http://code.google.com/p/libdivsufsort/wiki/SACA_Benchmarks