
Adding MAFISC filter #110

Open
t20100 opened this issue Mar 1, 2021 · 2 comments

Comments

@t20100
Member

t20100 commented Mar 1, 2021

Once JPEG is added (#85), adding BZIP2 and MAFISC would make hdf5plugin embed all the "commonly used Registered Filter Plugins [..] packaged into the HDF5 Plugin software" (see HDF5 Filter Plugins: BZIP2, JPEG, LZF, BLOSC, MAFISC, LZ4, Bitshuffle, and ZFP), except LZF, which is already provided by h5py.
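The coverage argument above is a set difference. The sketch below makes it explicit. The "already embedded" set is inferred from the comment: it is the listed plugins minus JPEG, BZIP2, MAFISC, and LZF. The numeric IDs are the filter IDs registered with The HDF Group and are included for reference only. This is not code from hdf5plugin.

```python
# Commonly used plugins from the HDF Group list quoted above, with their
# registered HDF5 filter IDs (for reference; not used in the computation).
COMMON_PLUGINS = {
    "BZIP2": 307,
    "JPEG": 32019,
    "LZF": 32000,
    "BLOSC": 32001,
    "MAFISC": 32002,
    "LZ4": 32004,
    "Bitshuffle": 32008,
    "ZFP": 32013,
}

# Assumption: filters hdf5plugin already embedded when the issue was opened,
# inferred as "the list minus JPEG, BZIP2, MAFISC and LZF".
embedded = {"BLOSC", "LZ4", "Bitshuffle", "ZFP"}

# LZF ships with h5py itself, so hdf5plugin does not need to bundle it.
provided_by_h5py = {"LZF"}


def missing(embedded_filters):
    """Return the common plugins available through neither hdf5plugin nor h5py."""
    return set(COMMON_PLUGINS) - embedded_filters - provided_by_h5py


before = missing(embedded)
after = missing(embedded | {"JPEG", "BZIP2", "MAFISC"})
print(sorted(before))  # ['BZIP2', 'JPEG', 'MAFISC']
print(sorted(after))   # []
```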

t20100 added a commit to t20100/hdf5plugin that referenced this issue Oct 14, 2021
REVERT: 2b63814b Tag open source release 1.1.9.
REVERT: 9c1be179 'size' remains unused if none of ZLIB, LZO and LZ4 are available.
REVERT: 78650d12 Add project goals to CONTRIBUTING.md.
REVERT: 5e7c14bd Add stubs for abseil flags.
REVERT: 80a2a10c Remove unused run_microbenchmarks flag.
REVERT: 453942b3 Add absl::GetFlag and absl::SetFlag to uses of flags.
REVERT: ea368c2f Add AppVeyor status badge.
REVERT: d1d1f486 Remove unused include in snappy_benchmark.cc.
REVERT: 4ebd8b2f Split benchmarks and test tools into separate targets.
REVERT: 0793e2ae Merge pull request silx-kit#117 from cmumford:disable-osx-fuzzer
REVERT: ac55f842 Test stub improvements.
REVERT: 6e9ae724 Disable fuzzing on OSX.
REVERT: 402d8881 Fixup for adding the third_party/{benchmark, googletest} submodules. (silx-kit#115)
REVERT: 6badb0a2 Merge pull request silx-kit#114 from cmumford:werror-only-clang
REVERT: bc53daa7 Fixed endif clause.
REVERT: e9a6a084 Matching clang.
REVERT: 955a5dd1 Building with `-Werror` only with clang.
REVERT: 42d1dd7e Fix CHECK_EQ to call ok() instead of CheckSuccess().
REVERT: eaaa0ed0 Fixup for adding the third_party/{benchmark, googletest} submodules. (silx-kit#111)
REVERT: e1e91ee4 Rework file:: stubs.
REVERT: 6aa79cb4 Wrap snappy_unittest in an anonymous namespace and remove static from functions.
REVERT: bae9f9be Fixup for adding the third_party/{benchmark, googletest} submodules. (silx-kit#110)
REVERT: 5f913be0 Fix unused local variable warnings.
REVERT: 549685a5 Remove custom testing and benchmarking code.
REVERT: 11f9a77a Add Travis-CI build status badge to README.md.
REVERT: 49540965 Update Travis CI config.
REVERT: 8995ffab Replace #pragma nounroll with equivalent used elsewhere.
REVERT: d1daa830 Remove inline qualifier from static variables.
REVERT: 3b571656 1) Improve the lookup table data to require less instructions to extract the necessary data. We now store len - offset in a signed int16, this happens to remove masking offset in the calculations and the calculations that need to be done precisely give the flags that we need for testing correctness. 2) Replace offset extraction with a lookup mask. This is less uops and is needed because we need to special case type 3 to always return 0 as to properly trigger the fallback. 3) Unroll the loop twice, this removes some loop-condition checks AND it improves the generated assembly. The loop variables tend to end up in a different register requiring mov's having two consecutive copies allows the elision of the mov's.
REVERT: a9730ed5 Optimize zippy decompression by making IncrementalCopy faster.
REVERT: 56c2c247 Internal change
REVERT: a94be58e Optimize zippy decompression by making IncrementalCopy faster.
REVERT: 01a566f8 Fix opensource version
REVERT: 616b8229 Add LZ4 as a benchmark option. Snappy is starting to look really good compared to LZ4. LZ4 is considered the fastest solution by many on internet. We now see that Snappy is actually becoming very competitive with compression a little faster and decompression slower but certainly not terribly slower.
REVERT: e4a6e97b Extend validate benchmarks over all types and also add a medley for validation.
REVERT: 719bed0a Bug fix. Error on 0 offset copies.
REVERT: 289c8a3c Make zippy decompression branchless
REVERT: 3bfa265a Revert zippy optimization that causes heap buffer overflows.
REVERT: 4d2dc9dc Optimize zippy unzipping by upto >10% by making IncrementalCopy faster.
REVERT: 11e5165b Add a benchmark that decreased the branch prediction memorization by increasing the amount of independent branches executed per benchmark iteration.
REVERT: 6835abd9 Change hash function for Compress.
REVERT: 368b01c8 Merge pull request silx-kit#107 from jsteemann:bug-fix/fix-compile-warning
REVERT: 1ce58af2 Fix the use of op + len when op is nullptr and len is non-zero. See https://reviews.llvm.org/D67122 for some discussion of why this can matter. I don't think this should have any noticeable effect on performance.
REVERT: 0b990db2 Run clang-format
REVERT: cb2b3c7e fix compile warnings due to missing override specifiers
REVERT: 7ffaf77c Replace ARCH_K8 with __x86_64__.
REVERT: 4dd277fe Replace the division with a constant table in IncrementalCopy
REVERT: f16eda34 Correct uninitialized variable.
REVERT: 837f38b3 Revise stubs for ARCH_{K8,PPC,ARM}.
REVERT: e1353b9f Remove ARCH_* guards around Bits::FindLSBSetNonZero64().
REVERT: c98344f6 Fix Clang/GCC compilation warnings.
REVERT: 113cd97a Tighten types on a few for loops.
REVERT: abde3abb Fix Travis CI build.
REVERT: e6506681 Fix accidental double std:: qualifiers.
REVERT: 63620c06 Add some std:: qualifiers to types and functions.
REVERT: 5417da69 Switch from C headers to C++ headers.
REVERT: 251d935d Remove #include <string> from snappy-stubs-public.h.
REVERT: 4f195aee Remove mismatched #endif.
REVERT: 041c6080 Remove platform-dependent code for unaligned loads/stores.
REVERT: 27ff130f Remove platform-dependent code for little-endian loads and stores.
REVERT: a4cdb5d1 Introduce SNAPPY_ATTRIBUTE_ALWAYS_INLINE.
REVERT: 231b8be0 Migrate to standard integral types.
REVERT: 14bef662 Modernize memcpy() and memmove() usage.
REVERT: d674348a Improve zippy with 5-10%.
REVERT: 4dfcad9f assertion failure on darwin_x86_64, have to investigage
REVERT: e1917874 assertion failure on darwin_x86_64, have to investigage
REVERT: 0faf5637 This cl does two things 1) It shaves of a few cycles from the data dependency chain. By using "shrd" instead of a load. 2) The important loop is finding small copies (4-12) which are either "copy 1", or "copy 2" depending if the offset fits <2048. It turns out that this is a branch that is mispredicted often. Due to the long dependency chain the CPU is running with IPC~1 anyway so we can freely add instructions to instead emit copies branchfree. This reduces the branch misspredicts from 15% to 11% (for BM_ZFlat/6 txt1) and from 5.6% to 4% (for BM_ZFlat/10 or pb).
REVERT: 0c7ed08a The result on protobuf benchmark is around 19%. Results vary by their propensity for compression. As the frequency of finding matches influences the amount of branch misspredicts and the amount of hashing.
REVERT: 3c77e014 1) Make the output pointer a local variable such it doesn't need a load add store on it's loop carried dependency chain. 2) Reduce the input pointer loop carried dependency chain from 7 cycles to 4 cycles by using pre-loading. This is a very subtle point. 3) Just brutally copy 64 bytes which removes a difficult to predict branch from the inner most loop. There is enough bandwidth to do so in the intrinsic cycles of the loop. 4) Implement limit pointers that include the slop region. This removes unnecessary instructions from the hot path. 5) It seems the removal of the difficult to predict branch has removed the code sensitivity to alignment, so remove the asm nop's.
REVERT: 9eabb7ba Cut a load from the critical dependency chain of the input pointer by speculating the uncommon case of COPY_4 is not happening.
REVERT: cddd9c08 Improve comments in IncrementalCopy, add an assert.

git-subtree-dir: src/snappy
git-subtree-split: 537f4ad6240e586970fe554614542e9717df7902
@vasole
Member

vasole commented Oct 6, 2022

For JPEG, I would only offer the option to build against system-provided libraries. It would be up to users to install the library if they want the filter.

I do not think we are going to compile the JPEG library or ship it with the wheels. Of course, I would reconsider if there is strong demand.
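The "build only against a system library" policy could take the form of an opt-in check at build time. The sketch below is hypothetical: `should_build_jpeg_filter` is an illustrative name, and hdf5plugin's actual build logic may work quite differently. It only looks for a runtime `libjpeg` with `ctypes.util.find_library`. A real build would also need the JPEG headers, so treat this as a rough availability check rather than a complete one.

```python
import ctypes.util


def should_build_jpeg_filter(find_library=ctypes.util.find_library):
    """Hypothetical helper: enable the JPEG filter only if a system libjpeg exists.

    The JPEG library itself is neither compiled nor bundled; users install it.
    `find_library` is injectable so the decision can be tested without libjpeg.
    """
    return find_library("jpeg") is not None


if should_build_jpeg_filter():
    print("System libjpeg found: building the JPEG filter against it")
else:
    print("No system libjpeg: skipping the JPEG filter (install libjpeg to enable it)")
```

Injecting the lookup function keeps the policy ("use it if present, never bundle it") separate from how the library is located.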

t20100 added a commit to t20100/hdf5plugin that referenced this issue Oct 21, 2022
2b63814b Tag open source release 1.1.9.
9c1be179 'size' remains unused if none of ZLIB, LZO and LZ4 are available.
78650d12 Add project goals to CONTRIBUTING.md.
5e7c14bd Add stubs for abseil flags.
80a2a10c Remove unused run_microbenchmarks flag.
453942b3 Add absl::GetFlag and absl::SetFlag to uses of flags.
ea368c2f Add AppVeyor status badge.
d1d1f486 Remove unused include in snappy_benchmark.cc.
4ebd8b2f Split benchmarks and test tools into separate targets.
0793e2ae Merge pull request silx-kit#117 from cmumford:disable-osx-fuzzer
ac55f842 Test stub improvements.
6e9ae724 Disable fuzzing on OSX.
402d8881 Fixup for adding the third_party/{benchmark, googletest} submodules. (silx-kit#115)
6badb0a2 Merge pull request silx-kit#114 from cmumford:werror-only-clang
bc53daa7 Fixed endif clause.
e9a6a084 Matching clang.
955a5dd1 Building with `-Werror` only with clang.
42d1dd7e Fix CHECK_EQ to call ok() instead of CheckSuccess().
eaaa0ed0 Fixup for adding the third_party/{benchmark, googletest} submodules. (silx-kit#111)
e1e91ee4 Rework file:: stubs.
6aa79cb4 Wrap snappy_unittest in an anonymous namespace and remove static from functions.
bae9f9be Fixup for adding the third_party/{benchmark, googletest} submodules. (silx-kit#110)
5f913be0 Fix unused local variable warnings.
549685a5 Remove custom testing and benchmarking code.
11f9a77a Add Travis-CI build status badge to README.md.
49540965 Update Travis CI config.
8995ffab Replace #pragma nounroll with equivalent used elsewhere.
d1daa830 Remove inline qualifier from static variables.
3b571656 1) Improve the lookup table data to require less instructions to extract the necessary data. We now store len - offset in a signed int16, this happens to remove masking offset in the calculations and the calculations that need to be done precisely give the flags that we need for testing correctness. 2) Replace offset extraction with a lookup mask. This is less uops and is needed because we need to special case type 3 to always return 0 as to properly trigger the fallback. 3) Unroll the loop twice, this removes some loop-condition checks AND it improves the generated assembly. The loop variables tend to end up in a different register requiring mov's having two consecutive copies allows the elision of the mov's.
a9730ed5 Optimize zippy decompression by making IncrementalCopy faster.
56c2c247 Internal change
a94be58e Optimize zippy decompression by making IncrementalCopy faster.
01a566f8 Fix opensource version
616b8229 Add LZ4 as a benchmark option. Snappy is starting to look really good compared to LZ4. LZ4 is considered the fastest solution by many on internet. We now see that Snappy is actually becoming very competitive with compression a little faster and decompression slower but certainly not terribly slower.
e4a6e97b Extend validate benchmarks over all types and also add a medley for validation.
719bed0a Bug fix. Error on 0 offset copies.
289c8a3c Make zippy decompression branchless
3bfa265a Revert zippy optimization that causes heap buffer overflows.
4d2dc9dc Optimize zippy unzipping by upto >10% by making IncrementalCopy faster.
11e5165b Add a benchmark that decreased the branch prediction memorization by increasing the amount of independent branches executed per benchmark iteration.
6835abd9 Change hash function for Compress.
368b01c8 Merge pull request silx-kit#107 from jsteemann:bug-fix/fix-compile-warning
1ce58af2 Fix the use of op + len when op is nullptr and len is non-zero. See https://reviews.llvm.org/D67122 for some discussion of why this can matter. I don't think this should have any noticeable effect on performance.
0b990db2 Run clang-format
cb2b3c7e fix compile warnings due to missing override specifiers
7ffaf77c Replace ARCH_K8 with __x86_64__.
4dd277fe Replace the division with a constant table in IncrementalCopy
f16eda34 Correct uninitialized variable.
837f38b3 Revise stubs for ARCH_{K8,PPC,ARM}.
e1353b9f Remove ARCH_* guards around Bits::FindLSBSetNonZero64().
c98344f6 Fix Clang/GCC compilation warnings.
113cd97a Tighten types on a few for loops.
abde3abb Fix Travis CI build.
e6506681 Fix accidental double std:: qualifiers.
63620c06 Add some std:: qualifiers to types and functions.
5417da69 Switch from C headers to C++ headers.
251d935d Remove #include <string> from snappy-stubs-public.h.
4f195aee Remove mismatched #endif.
041c6080 Remove platform-dependent code for unaligned loads/stores.
27ff130f Remove platform-dependent code for little-endian loads and stores.
a4cdb5d1 Introduce SNAPPY_ATTRIBUTE_ALWAYS_INLINE.
231b8be0 Migrate to standard integral types.
14bef662 Modernize memcpy() and memmove() usage.
d674348a Improve zippy with 5-10%.
4dfcad9f assertion failure on darwin_x86_64, have to investigage
e1917874 assertion failure on darwin_x86_64, have to investigage
0faf5637 This cl does two things 1) It shaves of a few cycles from the data dependency chain. By using "shrd" instead of a load. 2) The important loop is finding small copies (4-12) which are either "copy 1", or "copy 2" depending if the offset fits <2048. It turns out that this is a branch that is mispredicted often. Due to the long dependency chain the CPU is running with IPC~1 anyway so we can freely add instructions to instead emit copies branchfree. This reduces the branch misspredicts from 15% to 11% (for BM_ZFlat/6 txt1) and from 5.6% to 4% (for BM_ZFlat/10 or pb).
0c7ed08a The result on protobuf benchmark is around 19%. Results vary by their propensity for compression. As the frequency of finding matches influences the amount of branch misspredicts and the amount of hashing.
3c77e014 1) Make the output pointer a local variable such it doesn't need a load add store on it's loop carried dependency chain. 2) Reduce the input pointer loop carried dependency chain from 7 cycles to 4 cycles by using pre-loading. This is a very subtle point. 3) Just brutally copy 64 bytes which removes a difficult to predict branch from the inner most loop. There is enough bandwidth to do so in the intrinsic cycles of the loop. 4) Implement limit pointers that include the slop region. This removes unnecessary instructions from the hot path. 5) It seems the removal of the difficult to predict branch has removed the code sensitivity to alignment, so remove the asm nop's.
9eabb7ba Cut a load from the critical dependency chain of the input pointer by speculating the uncommon case of COPY_4 is not happening.
cddd9c08 Improve comments in IncrementalCopy, add an assert.

git-subtree-dir: src/snappy
git-subtree-split: 2b63814b15a2aaae54b7943f0cd935892fae628f
vasole changed the title from "Adding BZIP2 and MAFISC filters" to "Adding MAFISC filter" on Dec 16, 2022
@vasole
Member

vasole commented Dec 16, 2022

BZip2 is already in.


2 participants