Integer + Floating Point Compression Filter

Fastest transpose/shuffle
- 🆕 (2019.11) All TurboTranspose functions are now available on 64-bit ARMv8, including NEON SIMD.
- Byte/Nibble transpose/shuffle for improving compression of binary data (e.g. floating point data)
- ✨ Scalar/SIMD Transpose/Shuffle 8,16,32,64,... bits
- 👍 Dynamic CPU detection and JIT scalar/sse/avx2 switching
- 100% C (C++ headers), usage as simple as memcpy
Byte Transpose
- Fastest byte transpose
- 🆕 (2019.11) 2D,3D,4D transpose
Nibble Transpose
- nearly as fast as byte transpose
- more efficient, up to 10 times faster than Bitshuffle
- 🆕 better compression (w/ lz77) and 10 times faster than SPDP, one of the best floating-point compressors
- can compress/decompress (w/ lz77) better and faster than other domain-specific floating point compressors
Scalar and SIMD Transform
- Delta encoding for sorted lists
- Zigzag encoding for unsorted lists
- Xor encoding
- 🆕 lossy floating point compression with user-defined error
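
The delta, zigzag and xor transforms listed above can be sketched in scalar C as follows. This is only an illustration of the general techniques: the function names are hypothetical and are not part of the TurboTranspose API, and the library's own scalar/SIMD versions may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper names; none of these are TurboTranspose functions. */

/* Zigzag: map signed to unsigned so small magnitudes get small codes
   (0->0, -1->1, 1->2, -2->3, ...). */
uint32_t zigzag_enc32(int32_t v) {
    return ((uint32_t)v << 1) ^ (0u - ((uint32_t)v >> 31));
}

int32_t zigzag_dec32(uint32_t u) {
    return (int32_t)((u >> 1) ^ (0u - (u & 1u)));
}

/* Delta for sorted lists: store the gap to the previous value. */
void delta_enc32(const uint32_t *in, size_t n, uint32_t *out) {
    assert(n == 0 || (in && out));
    uint32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t cur = in[i];
        out[i] = cur - prev;
        prev = cur;
    }
}

/* Inverse of delta_enc32: running sum of the gaps. */
void delta_dec32(const uint32_t *in, size_t n, uint32_t *out) {
    assert(n == 0 || (in && out));
    uint32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        prev += in[i];
        out[i] = prev;
    }
}

/* Zigzag delta for unsorted lists: gaps may be negative, so zigzag them. */
void zigzag_delta_enc32(const uint32_t *in, size_t n, uint32_t *out) {
    assert(n == 0 || (in && out));
    uint32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t cur = in[i];
        out[i] = zigzag_enc32((int32_t)(cur - prev));
        prev = cur;
    }
}

/* Xor with the previous value: similar neighbours leave mostly zero bits. */
void xor_enc32(const uint32_t *in, size_t n, uint32_t *out) {
    assert(n == 0 || (in && out));
    uint32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t cur = in[i];
        out[i] = cur ^ prev;
        prev = cur;
    }
}
```

All three transforms aim at producing small values or long runs of zero bits, which a following transpose and lz77 stage can compress more easily.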

Transpose Benchmark:

Benchmark Intel CPU: Skylake i7-6700 3.4GHz gcc 9.2 single thread
Benchmark ARM: ARMv8 A73-ODROID-N2 1.8GHz

- Speed test

Benchmark w/ 16k buffer

BOLD = pareto frontier.
E:Encode, D:Decode

    ./tpbench -s# file -B16K   (# = 8,4,2)

E cycles/byte	D cycles/byte	Transpose 64 bits AVX2
.199	.134	TurboTranspose Byte
.326	.201	Blosc byteshuffle
.394	.260	TurboTranspose Nibble
.848	.478	Bitshuffle 8

E cycles/byte	D cycles/byte	Transpose 32 bits AVX2
.121	.102	TurboTranspose Byte
.451	.139	Blosc byteshuffle
.345	.229	TurboTranspose Nibble
.773	.476	Bitshuffle

E cycles/byte	D cycles/byte	Transpose 16 bits AVX2
.095	.071	TurboTranspose Byte
.640	.108	Blosc byteshuffle
.329	.198	TurboTranspose Nibble
.758	1.177	Bitshuffle 2
.067	.067	memcpy
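
To compare the cycles/byte figures above with MB/s figures, a rough conversion can be sketched as below. It assumes the CPU runs at its nominal 3.4GHz clock for the whole benchmark (turbo and frequency scaling are ignored) and that 1 MB = 1,000,000 bytes; the function name is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: convert cycles/byte to MB/s (1 MB = 1,000,000 bytes)
   for a CPU assumed to run at a fixed clock of clock_hz. */
double cycles_per_byte_to_mbs(double cycles_per_byte, double clock_hz) {
    assert(cycles_per_byte > 0.0 && clock_hz > 0.0);
    return clock_hz / cycles_per_byte / 1e6;
}
```

Under these assumptions, .199 cycles/byte at 3.4GHz corresponds to roughly 17,000 MB/s.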

E MB/s	D MB/s	16 bits ARM 2019.11
8192	16384	TurboTranspose Byte
8192	8192	blosc byteshuffle
1638	2341	TurboTranspose Nibble
356	287	blosc bitshuffle
16384	16384	memcpy

E MB/s	D MB/s	32 bits ARM 2019.11
8192	8192	TurboTranspose Byte
8192	8192	blosc byteshuffle
1820	2341	TurboTranspose Nibble
372	252	blosc bitshuffle

E MB/s	D MB/s	64 bits ARM 2019.11
4096	8192	TurboTranspose Byte
5461	5461	blosc byteshuffle
1490	1490	TurboTranspose Nibble
372	260	blosc bitshuffle

Transpose/Shuffle benchmark w/ large files (100MB).

MB/s: 1,000,000 bytes/second

    ./tpbench -s# file  (# = 8,4,2)

E MB/s	D MB/s	Transpose 16 bits AVX2 2019.11
9208	9795	TurboTranspose Byte
8382	7689	Blosc byteshuffle
9377	9584	TurboTranspose Nibble
2750	2530	Blosc bitshuffle
13725	13900	memcpy

E MB/s	D MB/s	Transpose 32 bits AVX2 2019.11
9718	9713	TurboTranspose Byte
9181	9030	Blosc byteshuffle
8750	9472	TurboTranspose Nibble
2767	2942	Blosc bitshuffle 4

E MB/s	D MB/s	Transpose 64 bits AVX2 2019.11
8998	9573	TurboTranspose Byte
8721	8586	Blosc byteshuffle 2
8252	9222	TurboTranspose Nibble
2711	2053	Blosc bitshuffle 2

E MB/s	D MB/s	16 bits ARM 2019.11
872	3998	TurboTranspose Byte
678	3852	blosc byteshuffle
1365	2195	TurboTranspose Nibble
357	280	blosc bitshuffle
3921	3913	memcpy

E MB/s	D MB/s	32 bits ARM 2019.11
1828	3768	TurboTranspose Byte
1769	3713	blosc byteshuffle
1456	2299	TurboTranspose Nibble
374	243	blosc bitshuffle

E MB/s	D MB/s	64 bits ARM 2019.11
1793	3572	TurboTranspose Byte
1784	3544	blosc byteshuffle
1176	1267	TurboTranspose Nibble
331	203	blosc bitshuffle

- Compression test (transpose/shuffle+lz4)

🆕 Download IcApp, a new benchmark for TurboPFor+TurboTranspose
for testing almost all integer and floating point file types.
Note: the lossy compression benchmark is available with icapp only.

Scientific IEEE 754 32-Bit Single-Precision Floating-Point Datasets

- Speed test (file msg_sweep3d)

C size	ratio %	C MB/s	D MB/s	Name AVX2
11,348,554	18.1	2276	4425	TurboTranspose Nibble+lz
22,489,691	35.8	1670	3881	TurboTranspose Byte+lz
43,471,376	69.2	348	402	SPDP
44,626,407	71.0	1065	2101	bitshuffle+lz
62,865,612	100.0	13300	13300	memcpy

    ./tpbench -s4 -z *.sp

File	File size	lz %	Tp8lz	Tp4lz	BSlz	spdp1	spdp9	Tp4lzt	eTp4lzt
msg_bt	133194716	94.3	70.4	66.4	73.9	70.0	67.4	54.7	32.4
msg_lu	97059484	100.4	77.1	70.4	75.4	76.8	74.0	61.0	42.2
msg_sppm	139497932	11.7	11.6	12.6	15.4	14.4	13.7	9.0	5.6
msg_sp	145052928	100.3	68.8	63.7	68.1	67.9	65.3	52.6	24.9
msg_sweep3d	62865612	98.7	35.8	18.1	71.0	69.6	13.7	9.8	3.8
num_brain	70920000	100.4	76.5	71.1	77.4	79.1	73.9	63.4	32.6
num_comet	53673984	92.4	79.0	77.6	82.1	84.5	84.6	70.1	41.7
num_control	79752372	99.4	89.5	90.7	88.1	98.3	98.5	81.4	51.2
num_plasma	17544800	100.4	0.7	0.7	75.5	30.7	2.9	0.3	0.2
obs_error	31080408	89.2	73.1	70.0	76.9	78.3	49.4	20.5	12.2
obs_info	9465264	93.6	70.2	61.9	72.9	62.4	43.8	27.3	15.1
obs_spitzer	99090432	98.3	90.4	95.6	93.6	100.1	100.7	80.2	52.3
obs_temp	19967136	100.4	89.5	92.4	91.0	99.4	100.1	84.0	55.8

Tp8 = Byte transpose, Tp4 = Nibble transpose, BS = Bitshuffle, lz = lz4, lzt = lzturbo,39
eTp4lzt = lossy compression with lzturbo and allowed error = 0.0001 (1e-4)
spdp9 and lzt (lzturbo,39) are slow but give the best compression.
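
This README does not describe how the lossy mode enforces the allowed error. One common approach, shown here purely as an assumption, is to clear as many low mantissa bits as the error bound permits, so that the transposed data contains more zero bits for the lz77 stage. The function name and the use of an absolute (rather than relative) error are hypothetical choices for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not a TurboTranspose function: clear as many low
   mantissa bits of x as possible while keeping |x - result| <= max_abs_err. */
float lossy_trunc32(float x, float max_abs_err) {
    uint32_t u;
    assert(max_abs_err >= 0.0f);
    memcpy(&u, &x, sizeof u);                    /* read the IEEE 754 bit pattern */
    if (((u >> 23) & 0xFFu) == 0xFFu) return x;  /* NaN or infinity: leave as-is */
    float best = x;
    for (int k = 1; k <= 23; k++) {
        uint32_t t = u & ~((1u << k) - 1u);      /* clear the low k mantissa bits */
        float y;
        memcpy(&y, &t, sizeof y);
        float d = x - y;
        if (d < 0.0f) d = -d;
        if (d > max_abs_err) break;              /* error only grows with k */
        best = y;
    }
    return best;
}
```

Applying such a function to every element before the nibble transpose trades a bounded loss of precision for a better compression ratio.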

Scientific IEEE 754 64-Bit Double-Precision Floating-Point Datasets
```
  ./tpbench -s8 -z *.trace
```

File	File size	lz %	Tp8lz	Tp4lz	BSlz	spdp1	spdp9	Tp4lzt	eTp4lzt
msg_bt	266389432	94.5	77.2	76.5	81.6	77.9	75.4	69.9	16.0
msg_lu	194118968	100.4	82.7	81.0	83.7	83.3	79.6	75.5	21.0
msg_sppm	278995864	18.9	14.5	14.9	19.5	21.5	19.8	11.2	2.8
msg_sp	290105856	100.4	79.2	77.5	80.2	78.8	77.1	71.3	12.4
msg_sweep3d	125731224	98.7	50.7	36.7	80.4	76.2	33.2	27.3	1.9
num_brain	141840000	100.4	82.6	81.1	84.5	87.8	83.3	77.0	16.3
num_comet	107347968	92.8	83.3	78.8	76.3	86.5	86.0	69.8	21.2
num_control	159504744	99.6	92.2	90.9	89.4	97.6	98.9	85.5	25.8
num_plasma	35089600	75.2	0.7	0.7	84.5	77.3	3.0	0.3	0.1
obs_error	62160816	78.7	81.0	77.5	84.4	87.9	62.3	23.4	6.3
obs_info	18930528	92.3	75.4	70.6	82.4	81.7	51.2	33.1	7.7
obs_spitzer	198180864	95.4	93.2	93.7	86.4	100.1	102.4	78.0	26.9
obs_temp	39934272	100.4	93.1	93.8	91.7	98.0	97.4	88.2	28.8

eTp4lzt = lossy compression with allowed error = 0.0001

Compile:

    git clone https://github.com/powturbo/Turbo-Transpose.git
    cd Turbo-Transpose

Linux + Windows MinGW

	make
    or
	make AVX2=1

Windows Visual C++

	nmake /f makefile.vs
    or
	nmake AVX2=1 /f makefile.vs

Benchmark with other libraries:
download or clone bitshuffle or blosc, then type
```
  make AVX2=1 BLOSC=1
  or
  make AVX2=1 BITSHUFFLE=1
```

Testing:

Benchmark the "transpose" functions:

    ./tpbench [-s#] [-z] file

-s# = element size in bytes, # = 2,4,8,16,... (default 4)
-z = only lz77 compression benchmark (bitshuffle package mandatory)

Function usage:

Byte transpose:

    void tpenc( unsigned char *in, unsigned n, unsigned char *out, unsigned esize);
    void tpdec( unsigned char *in, unsigned n, unsigned char *out, unsigned esize);

in : input buffer
n : number of bytes
out : output buffer
esize : element size in bytes (2,4,8,...)
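
What a byte transpose does can be sketched with a plain scalar reference. This is not the library's implementation: the names `ref_byte_transpose_enc`/`ref_byte_transpose_dec` are hypothetical, the output layout and the handling of trailing bytes are assumptions, and the real `tpenc`/`tpdec` use SIMD where available.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scalar reference, not the library's code.
   Byte j of element i moves to out[j * count + i], so that all first bytes
   are stored together, then all second bytes, and so on. */
void ref_byte_transpose_enc(const unsigned char *in, size_t n,
                            unsigned char *out, size_t esize) {
    assert(esize > 0);
    size_t count = n / esize;                 /* number of whole elements */
    for (size_t i = 0; i < count; i++)
        for (size_t j = 0; j < esize; j++)
            out[j * count + i] = in[i * esize + j];
    /* Assumption: trailing bytes that do not form a whole element
       are copied unchanged. */
    memcpy(out + count * esize, in + count * esize, n - count * esize);
}

/* Inverse of ref_byte_transpose_enc. */
void ref_byte_transpose_dec(const unsigned char *in, size_t n,
                            unsigned char *out, size_t esize) {
    assert(esize > 0);
    size_t count = n / esize;
    for (size_t i = 0; i < count; i++)
        for (size_t j = 0; j < esize; j++)
            out[i * esize + j] = in[j * count + i];
    memcpy(out + count * esize, in + count * esize, n - count * esize);
}
```

For floating point data, grouping the sign/exponent bytes of neighbouring values together tends to produce long runs of similar bytes, which is what makes the following lz77 stage more effective.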

Nibble transpose:

    void tp4enc( unsigned char *in, unsigned n, unsigned char *out, unsigned esize);
    void tp4dec( unsigned char *in, unsigned n, unsigned char *out, unsigned esize);

in : input buffer
n : number of bytes
out : output buffer
esize : element size in bytes (2,4,8,...)
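
A nibble transpose goes one step further than a byte transpose and groups 4-bit halves. The scalar sketch below shows one possible layout, assumed for illustration only: each byte position yields a plane of low nibbles and a plane of high nibbles, with two nibbles packed per output byte. The function names are hypothetical, the sketch requires an even number of whole elements, and the layout actually produced by `tp4enc` may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference, not the library's code.
   For byte position j, plane 2*j holds the low nibbles and plane 2*j+1 the
   high nibbles; each plane packs the nibbles of two elements per byte. */
void ref_nibble_transpose_enc(const unsigned char *in, size_t n,
                              unsigned char *out, size_t esize) {
    assert(esize > 0 && n % esize == 0);
    size_t count = n / esize;                 /* number of elements */
    assert(count % 2 == 0);                   /* sketch limitation */
    size_t half = count / 2;                  /* bytes per nibble plane */
    for (size_t j = 0; j < esize; j++) {
        unsigned char *lo = out + (2 * j) * half;
        unsigned char *hi = out + (2 * j + 1) * half;
        for (size_t k = 0; k < half; k++) {
            unsigned char a = in[(2 * k) * esize + j];
            unsigned char b = in[(2 * k + 1) * esize + j];
            lo[k] = (unsigned char)((a & 0x0F) | (b << 4));
            hi[k] = (unsigned char)((a >> 4) | (b & 0xF0));
        }
    }
}

/* Inverse of ref_nibble_transpose_enc. */
void ref_nibble_transpose_dec(const unsigned char *in, size_t n,
                              unsigned char *out, size_t esize) {
    assert(esize > 0 && n % esize == 0);
    size_t count = n / esize;
    assert(count % 2 == 0);
    size_t half = count / 2;
    for (size_t j = 0; j < esize; j++) {
        const unsigned char *lo = in + (2 * j) * half;
        const unsigned char *hi = in + (2 * j + 1) * half;
        for (size_t k = 0; k < half; k++) {
            unsigned char l = lo[k], h = hi[k];
            out[(2 * k) * esize + j] = (unsigned char)((l & 0x0F) | (h << 4));
            out[(2 * k + 1) * esize + j] = (unsigned char)((l >> 4) | (h & 0xF0));
        }
    }
}
```

Because the output has the same size as the input, such a transform can be used as a drop-in filter in front of any byte-oriented compressor.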

Environment:

OS/Compiler (64 bits):

Linux: GNU GCC (>=4.6)
Linux: Clang (>=3.2)
Windows: MinGW-w64 makefile
Windows: Visual C++ (>=VS2008) - makefile.vs (for nmake)
Windows: Visual Studio project file - vs/vs2017 - Thanks to PavelP
Linux ARM: 64 bits aarch64 ARMv8: gcc (>=6.3)
Linux ARM: 64 bits aarch64 ARMv8: clang

Multithreading:

All TurboTranspose functions are thread safe.

Last update: 25 Oct 2019
