
Facebook Open-Sources New Compression Algorithm Outperforming Zlib


The new Zstandard 1.0 compression algorithm, recently open sourced by Facebook, is one of the few compression algorithms that is both faster and more efficient than zlib, the current “reigning standard”, write Facebook engineers Yann Collet and Chip Turner.

According to Facebook benchmarks, Zstandard outperforms zlib for any combination of compression ratio and bandwidth.



In particular, Zstandard showed outstanding performance against zlib on the Silesia corpus, a standard benchmark for lossless compression:

it was ~3-5x faster at the same compression ratio

it produced 10-15% smaller files at the same compression speed

it decompressed 2x faster regardless of compression ratio

it scaled to a much higher compression ratio (~4x vs. ~3.15x)
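
To relate the last figure to file sizes, here is a short worked calculation. It assumes that "compression ratio" means original size divided by compressed size, which is the usual convention but is not spelled out in the article; the symbols are introduced here for illustration only.

```latex
\[
\frac{S_{\text{zstd}}}{S_{\text{zlib}}}
  = \frac{S_{\text{orig}} / 4}{S_{\text{orig}} / 3.15}
  = \frac{3.15}{4}
  \approx 0.79
\]
```

Under that assumption, at the highest ratios each tool reached on this corpus, Zstandard's output would be roughly 21% smaller than zlib's.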

Zstandard uses Finite State Entropy, based on Jarek Duda’s work on Asymmetric Numeral Systems (ANS), for entropy coding. ANS aims to “end the trade-off between speed and rate” and can be used both for precise coding and for very fast encoding, with support for data encryption. But at the root of Zstandard’s better performance are a number of other design and implementation choices:

while zlib is limited to a 32KB window, Zstandard leverages the much greater availability of memory in modern environments, including mobile and embedded environments, and does not impose any inherent limit on window size

a new Huffman decoder, Huff0, reduces the data dependencies between arithmetic operations so that symbols can be decoded in parallel across multiple ALUs

Zstandard attempts to be as branchless as possible, thus minimizing the highly expensive pipeline flushes due to incorrect branch predictions. For example, this is how a while loop can be rewritten without using branches:

    /* classic version */
    while (nbBitsUsed >= 8) { /* each while test is a branch */
        accumulator <<= 8;
        accumulator += *byte++;
        nbBitsUsed -= 8;
    }

    /* branch-less version */
    nbBytesUsed = nbBitsUsed >> 3;
    nbBitsUsed &= 7;
    ptr += nbBytesUsed;
    accumulator = read64(ptr);

repcode modeling highly improves the compression of sequences that only differ by a few bytes
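
The two refill variants in the list above can be checked against each other with a small self-contained sketch. It assumes a forward, big-endian bit stream in which `byte` from the classic version points 8 bytes past `ptr`. The `BitReader` struct, `read64be`, and the two `refill_*` functions are illustrative names introduced here, not Zstandard's actual code, whose bit reader may be organized differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader state; names are illustrative only. */
typedef struct {
    const uint8_t *ptr;    /* first of the 8 bytes held in accumulator */
    uint64_t accumulator;  /* those 8 bytes, most significant byte first */
    unsigned nbBitsUsed;   /* bits already consumed from the top (0..64) */
} BitReader;

/* Load 8 bytes as a big-endian 64-bit value. The fixed-count loop is
 * typically compiled to a single load plus a byte swap. */
static uint64_t read64be(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Classic refill: one data-dependent loop test per consumed byte. */
static void refill_classic(BitReader *r)
{
    const uint8_t *byte = r->ptr + 8;  /* next byte not yet loaded */
    assert(r->nbBitsUsed <= 64);
    while (r->nbBitsUsed >= 8) {
        r->accumulator <<= 8;
        r->accumulator += *byte++;
        r->nbBitsUsed -= 8;
    }
    r->ptr = byte - 8;
}

/* Branch-less refill: whole consumed bytes are skipped with arithmetic,
 * then the accumulator is reloaded in one go. */
static void refill_branchless(BitReader *r)
{
    unsigned nbBytesUsed;
    assert(r->nbBitsUsed <= 64);
    nbBytesUsed = r->nbBitsUsed >> 3;  /* whole bytes consumed */
    r->nbBitsUsed &= 7;                /* leftover bits in the top byte */
    r->ptr += nbBytesUsed;
    r->accumulator = read64be(r->ptr);
}
```

Both functions leave the reader in the same state for any `nbBitsUsed` from 0 to 64. The caller has to guarantee that the bytes past `ptr` that a refill may load are readable, so a real decoder would need padding or a slower path near the end of its input.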

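As a rough illustration of the ANS idea mentioned earlier, the sketch below encodes a short message into a single integer state and decodes it again. It uses the range variant of ANS (rANS) with a made-up three-symbol frequency table and no renormalization, so it only works for short messages; Finite State Entropy is a table-driven variant and is considerably more elaborate. All names (`ans_encode`, `ans_decode`, `FREQ`, `CUM`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up model: symbol 0 is twice as likely as symbols 1 and 2. */
enum { NUM_SYMBOLS = 3, TOTAL = 4 };                 /* TOTAL = sum of FREQ */
static const uint32_t FREQ[NUM_SYMBOLS] = {2, 1, 1};
static const uint32_t CUM[NUM_SYMBOLS]  = {0, 2, 3}; /* cumulative FREQ */

/* Fold n symbols into one integer state, starting from state TOTAL.
 * The state can grow by a factor of about 4 per symbol, so keep n small
 * (up to about two dozen) to stay within 64 bits. */
static uint64_t ans_encode(const uint8_t *symbols, size_t n)
{
    uint64_t x = TOTAL;
    for (size_t i = 0; i < n; i++) {
        uint8_t s = symbols[i];
        assert(s < NUM_SYMBOLS);
        x = (x / FREQ[s]) * TOTAL + CUM[s] + (x % FREQ[s]);
    }
    return x;
}

/* Undo ans_encode. Symbols come out last-first, so the output buffer is
 * filled from the end. Returns the state left after n symbols. */
static uint64_t ans_decode(uint64_t x, uint8_t *out, size_t n)
{
    for (size_t i = n; i-- > 0; ) {
        uint32_t slot = (uint32_t)(x % TOTAL);
        uint8_t s = 0;
        while (slot >= CUM[s] + FREQ[s])   /* find the symbol owning slot */
            s++;
        out[i] = s;
        x = (uint64_t)FREQ[s] * (x / TOTAL) + slot - CUM[s];
    }
    return x;
}
```

Each decoding step is the exact inverse of an encoding step, so decoding as many symbols as were encoded returns the state to its starting value. More frequent symbols make the state grow more slowly, which is where the compression comes from.
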
Zstandard is both a command-line tool and a library, both written in C. It provides more than 20 compression levels, which allow fine-tuning for the available hardware, the data to compress, and the bottleneck to optimize. Facebook recommends starting with the default level 3, which is suitable for most cases, then trying higher levels up to 9 for a reasonable trade-off of speed versus space, or going higher still for better compression ratios, and reserving levels 20+ for cases where compression speed does not matter.

Collet and Turner also hinted at what future versions of Zstandard will bring, including support for multi-threading and new compression levels allowing for faster compression as well as higher ratios.

Zstandard follows Apple’s LZFSE and Google’s Brotli, both open source and each optimized for a specific use case: Brotli seems to be tuned for high compression ratios on Web assets and Android APKs, while LZFSE aims to be faster than zlib at the same compression ratio, with lower power consumption.

