👍 Dynamic CPU detection and JIT scalar/sse/avx2 switching
100% C (C++ headers), usage as simple as memcpy
Byte Transpose
Fastest byte transpose
🆕 (2019.11) 2D,3D,4D transpose
Nibble Transpose
nearly as fast as byte transpose
More efficient and up to 10 times faster than Bitshuffle
🆕 Better compression (with LZ77) and 10 times faster than SPDP, one of the best floating-point compressors
Can compress/decompress (with LZ77) better and faster than other domain-specific floating-point compressors
Scalar and SIMD Transform
Delta encoding for sorted lists
Zigzag encoding for unsorted lists
Xor encoding
🆕 lossy floating point compression with user-defined error
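The delta, zigzag and xor transforms listed above can be illustrated with a minimal scalar sketch. The function names below are hypothetical and are not the library's API; the code only shows what each transform does on 32-bit values, not how TurboTranspose implements it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration, not the library's code. */

/* Delta: store each value as its difference from the previous one.
   On a sorted list the differences are small and non-negative. */
static void delta_encode32(const uint32_t *in, size_t n, uint32_t *out) {
    assert(n == 0 || (in != NULL && out != NULL));
    uint32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t cur = in[i];
        out[i] = cur - prev;   /* unsigned wrap-around keeps this reversible */
        prev = cur;
    }
}

static void delta_decode32(const uint32_t *in, size_t n, uint32_t *out) {
    uint32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        prev += in[i];         /* running sum restores the original values */
        out[i] = prev;
    }
}

/* Zigzag: map signed values to unsigned so that small magnitudes get
   small codes (0, -1, 1, -2 -> 0, 1, 2, 3). Useful for signed differences. */
static uint32_t zigzag_encode32(int32_t v) {
    uint32_t u = (uint32_t)v;
    return (u << 1) ^ (0u - (u >> 31));
}

static int32_t zigzag_decode32(uint32_t u) {
    /* The final cast assumes the usual two's-complement conversion. */
    return (int32_t)((u >> 1) ^ (0u - (u & 1u)));
}

/* Xor: store each value xor-ed with the previous one. Values that share
   their high bits produce results with many leading zero bits. */
static void xor_encode32(const uint32_t *in, size_t n, uint32_t *out) {
    uint32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t cur = in[i];
        out[i] = cur ^ prev;
        prev = cur;
    }
}

static void xor_decode32(const uint32_t *in, size_t n, uint32_t *out) {
    uint32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        prev ^= in[i];
        out[i] = prev;
    }
}
```

All three transforms are lossless: applying the matching decode function to the encoded buffer returns the original values.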
Transpose Benchmark:
Benchmark Intel CPU: Skylake i7-6700 3.4GHz gcc 9.2 single thread
Benchmark ARM: ARMv8 A73-ODROID-N2 1.8GHz
- Speed test
Benchmark w/ 16k buffer
BOLD = pareto frontier.
E:Encode, D:Decode
./tpbench -s# file -B16K (# = 8,4,2)
| E cycles/byte | D cycles/byte | Transpose 64 bits AVX2 |
|---|---|---|
| .199 | .134 | TurboTranspose Byte |
| .326 | .201 | Blosc byteshuffle |
| .394 | .260 | TurboTranspose Nibble |
| .848 | .478 | Bitshuffle 8 |

| E cycles/byte | D cycles/byte | Transpose 32 bits AVX2 |
|---|---|---|
| .121 | .102 | TurboTranspose Byte |
| .451 | .139 | Blosc byteshuffle |
| .345 | .229 | TurboTranspose Nibble |
| .773 | .476 | Bitshuffle |

| E cycles/byte | D cycles/byte | Transpose 16 bits AVX2 |
|---|---|---|
| .095 | .071 | TurboTranspose Byte |
| .640 | .108 | Blosc byteshuffle |
| .329 | .198 | TurboTranspose Nibble |
| .758 | 1.177 | Bitshuffle 2 |
| .067 | .067 | memcpy |
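
The cycles/byte figures can be related to the MB/s figures used in the other tables with a simple conversion. The helper below is a hypothetical sketch, not part of tpbench. It assumes the measured cycles tick at the nominal 3.4 GHz clock listed for the i7-6700, which turbo or frequency scaling would change.

```c
#include <assert.h>

/* Hypothetical helper: convert cycles/byte to MB/s, with 1 MB = 1,000,000
   bytes. clock_hz is an assumption supplied by the caller. */
static double cycles_per_byte_to_mbps(double cycles_per_byte, double clock_hz) {
    assert(cycles_per_byte > 0.0 && clock_hz > 0.0);
    double bytes_per_second = clock_hz / cycles_per_byte;
    return bytes_per_second / 1e6;
}
```

Under that assumption, 0.199 cycles/byte at 3.4 GHz corresponds to roughly 17,000 MB/s. Numbers measured with a 16k buffer are not directly comparable to the large-file results, which also depend on memory bandwidth.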

| E MB/s | D MB/s | 16 bits ARM 2019.11 |
|---|---|---|
| 8192 | 16384 | TurboTranspose Byte |
| 8192 | 8192 | blosc byteshuffle |
| 1638 | 2341 | TurboTranspose Nibble |
| 356 | 287 | blosc bitshuffle |
| 16384 | 16384 | memcpy |

| E MB/s | D MB/s | 32 bits ARM 2019.11 |
|---|---|---|
| 8192 | 8192 | TurboTranspose Byte |
| 8192 | 8192 | blosc byteshuffle |
| 1820 | 2341 | TurboTranspose Nibble |
| 372 | 252 | blosc bitshuffle |

| E MB/s | D MB/s | 64 bits ARM 2019.11 |
|---|---|---|
| 4096 | 8192 | TurboTranspose Byte |
| 5461 | 5461 | blosc byteshuffle |
| 1490 | 1490 | TurboTranspose Nibble |
| 372 | 260 | blosc bitshuffle |
Transpose/Shuffle benchmark w/ large files (100MB).
MB/s: 1,000,000 bytes/second
./tpbench -s# file (# = 8,4,2)
| E MB/s | D MB/s | Transpose 16 bits AVX2 2019.11 |
|---|---|---|
| 9208 | 9795 | TurboTranspose Byte |
| 8382 | 7689 | Blosc byteshuffle |
| 9377 | 9584 | TurboTranspose Nibble |
| 2750 | 2530 | Blosc bitshuffle |
| 13725 | 13900 | memcpy |

| E MB/s | D MB/s | Transpose 32 bits AVX2 2019.11 |
|---|---|---|
| 9718 | 9713 | TurboTranspose Byte |
| 9181 | 9030 | Blosc byteshuffle |
| 8750 | 9472 | TurboTranspose Nibble |
| 2767 | 2942 | Blosc bitshuffle 4 |

| E MB/s | D MB/s | Transpose 64 bits AVX2 2019.11 |
|---|---|---|
| 8998 | 9573 | TurboTranspose Byte |
| 8721 | 8586 | Blosc byteshuffle 2 |
| 8252 | 9222 | TurboTranspose Nibble |
| 2711 | 2053 | Blosc bitshuffle 2 |

| E MB/s | D MB/s | 16 bits ARM 2019.11 |
|---|---|---|
| 872 | 3998 | TurboTranspose Byte |
| 678 | 3852 | blosc byteshuffle |
| 1365 | 2195 | TurboTranspose Nibble |
| 357 | 280 | blosc bitshuffle |
| 3921 | 3913 | memcpy |

| E MB/s | D MB/s | 32 bits ARM 2019.11 |
|---|---|---|
| 1828 | 3768 | TurboTranspose Byte |
| 1769 | 3713 | blosc byteshuffle |
| 1456 | 2299 | TurboTranspose Nibble |
| 374 | 243 | blosc bitshuffle |

| E MB/s | D MB/s | 64 bits ARM 2019.11 |
|---|---|---|
| 1793 | 3572 | TurboTranspose Byte |
| 1784 | 3544 | blosc byteshuffle |
| 1176 | 1267 | TurboTranspose Nibble |
| 331 | 203 | blosc bitshuffle |

- Compression test (transpose/shuffle+lz4)
🆕 Download IcApp, a new benchmark for TurboPFor+TurboTranspose
for testing almost all integer and floating point file types.
Note: Lossy compression benchmark with icapp only.
eTp4Lzt = lossy compression with allowed error = 0.0001
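
The source does not describe how the allowed error is enforced. As a generic illustration only, one simple way to honor an error bound is uniform quantization: each value is snapped to the nearest multiple of twice the bound, so many nearby inputs collapse onto the same output and a following transpose + LZ stage has more repetition to exploit. The function name is hypothetical, and treating the bound as an absolute error is an assumption; icapp may use a different scheme (for example a relative error).

```c
#include <assert.h>

/* Hypothetical sketch, not icapp's or the library's method.
   Snap x to the nearest multiple of 2*max_err so that the result differs
   from x by at most max_err (up to floating-point rounding).
   Assumes an ABSOLUTE error bound and that x / (2*max_err) fits in a
   long long. */
static double quantize_abs(double x, double max_err) {
    assert(max_err > 0.0);
    double step = 2.0 * max_err;
    double scaled = x / step;
    /* Round half away from zero without needing libm. */
    long long k = (long long)(scaled + (scaled >= 0.0 ? 0.5 : -0.5));
    return (double)k * step;
}
```

With `max_err = 0.0001`, every input is mapped onto a grid with spacing 0.0002, so the reconstruction error stays within the allowed 0.0001.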
Compile:
git clone https://github.com/powturbo/TurboTranspose.git
cd TurboTranspose
Linux + Windows MinGW
make
or
make AVX2=1
Windows Visual C++
nmake /f makefile.vs
or
nmake AVX2=1 /f makefile.vs
benchmark with other libraries
download or clone bitshuffle or blosc and type
make AVX2=1 BLOSC=1
or
make AVX2=1 BITSHUFFLE=1
Testing:
benchmark "transpose" functions
./tpbench [-s#] [-z] file
-s# = element size, # = 2,4,8,16,... (default 4)
-z = only lz77 compression benchmark (bitshuffle package mandatory)
Function usage:
Byte transpose:
void tpenc( unsigned char *in, unsigned n, unsigned char *out, unsigned esize);
void tpdec( unsigned char *in, unsigned n, unsigned char *out, unsigned esize);
in : input buffer
n : number of bytes
out : output buffer
esize : element size in bytes (2,4,8,...)
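
To make the effect of a byte transpose concrete, here is a minimal scalar reference sketch. The names `ref_byte_transpose` and `ref_byte_untranspose` are hypothetical and this is not the library's implementation (which uses SIMD). How trailing bytes that do not fill a whole element are handled is an assumption, and the output of `tpenc` is not guaranteed to match this layout byte for byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference, not the library's code.
   Byte j of element i moves to out[j * count + i], so all first bytes are
   stored together, then all second bytes, and so on. Trailing bytes that
   do not form a whole element are copied unchanged (an assumption). */
static void ref_byte_transpose(const unsigned char *in, size_t n,
                               unsigned char *out, size_t esize) {
    assert(esize > 0);
    size_t count = n / esize;                 /* number of whole elements */
    for (size_t i = 0; i < count; i++)
        for (size_t j = 0; j < esize; j++)
            out[j * count + i] = in[i * esize + j];
    for (size_t k = count * esize; k < n; k++)
        out[k] = in[k];
}

/* Inverse of ref_byte_transpose. */
static void ref_byte_untranspose(const unsigned char *in, size_t n,
                                 unsigned char *out, size_t esize) {
    assert(esize > 0);
    size_t count = n / esize;
    for (size_t i = 0; i < count; i++)
        for (size_t j = 0; j < esize; j++)
            out[i * esize + j] = in[j * count + i];
    for (size_t k = count * esize; k < n; k++)
        out[k] = in[k];
}
```

For two 4-byte elements `11 22 33 44` and `A1 A2 A3 A4`, the sketch produces `11 A1 22 A2 33 A3 44 A4`. Per the prototypes above, the corresponding library call would take the same arguments in the same order: `tpenc(in, n, out, 4)`.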
Nibble transpose:
void tp4enc( unsigned char *in, unsigned n, unsigned char *out, unsigned esize);
void tp4dec( unsigned char *in, unsigned n, unsigned char *out, unsigned esize);
in : input buffer
n : number of bytes
out : output buffer
esize : element size in bytes (2,4,8,...)
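
A nibble transpose applies the same idea at 4-bit granularity. The sketch below is a hypothetical illustration of one possible layout (low and high nibbles of each byte position gathered into separate planes, two nibbles packed per output byte). The function names and the layout are assumptions and the actual output format of `tp4enc` may differ. For simplicity the sketch requires an even number of whole elements.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration, not the library's code or format.
   Plane 2*j holds the low nibbles of byte j of every element, plane 2*j+1
   the high nibbles. Each plane packs two nibbles per byte: element i goes
   to the low half of a byte if i is even, to the high half if i is odd. */
static void ref_nibble_transpose(const unsigned char *in, size_t n,
                                 unsigned char *out, size_t esize) {
    assert(esize > 0 && n % (2 * esize) == 0);
    size_t count = n / esize;          /* number of elements (even) */
    size_t plane_bytes = count / 2;    /* bytes per nibble plane */
    memset(out, 0, n);
    for (size_t i = 0; i < count; i++) {
        for (size_t j = 0; j < esize; j++) {
            unsigned char b = in[i * esize + j];
            unsigned shift = 4u * (unsigned)(i & 1);
            out[(2 * j) * plane_bytes + i / 2] |=
                (unsigned char)((b & 0x0Fu) << shift);
            out[(2 * j + 1) * plane_bytes + i / 2] |=
                (unsigned char)((b >> 4) << shift);
        }
    }
}

/* Inverse of ref_nibble_transpose. */
static void ref_nibble_untranspose(const unsigned char *in, size_t n,
                                   unsigned char *out, size_t esize) {
    assert(esize > 0 && n % (2 * esize) == 0);
    size_t count = n / esize;
    size_t plane_bytes = count / 2;
    for (size_t i = 0; i < count; i++) {
        for (size_t j = 0; j < esize; j++) {
            unsigned shift = 4u * (unsigned)(i & 1);
            unsigned lo = (in[(2 * j) * plane_bytes + i / 2] >> shift) & 0x0Fu;
            unsigned hi = (in[(2 * j + 1) * plane_bytes + i / 2] >> shift) & 0x0Fu;
            out[i * esize + j] = (unsigned char)(lo | (hi << 4));
        }
    }
}
```

Splitting bytes into nibble planes groups the slowly changing high-order nibbles together, which is what lets a following LZ stage find longer matches than with a plain byte transpose.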
Environment:
OS/Compiler (64 bits):
Linux: GNU GCC (>=4.6)
Linux: Clang (>=3.2)
Windows: MinGW-w64 makefile
Windows: Visual C++ (>=VS2008) - makefile.vs (for nmake)
Windows: Visual Studio project file - vs/vs2017 - Thanks to PavelP