-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding MAFISC filter #110
Labels
Comments
t20100
added a commit
to t20100/hdf5plugin
that referenced
this issue
Oct 14, 2021
REVERT: 2b63814b Tag open source release 1.1.9. REVERT: 9c1be179 'size' remains unused if none of ZLIB, LZO and LZ4 are available. REVERT: 78650d12 Add project goals to CONTRIBUTING.md. REVERT: 5e7c14bd Add stubs for abseil flags. REVERT: 80a2a10c Remove unused run_microbenchmarks flag. REVERT: 453942b3 Add absl::GetFlag and absl::SetFlag to uses of flags. REVERT: ea368c2f Add AppVeyor status badge. REVERT: d1d1f486 Remove unused include in snappy_benchmark.cc. REVERT: 4ebd8b2f Split benchmarks and test tools into separate targets. REVERT: 0793e2ae Merge pull request silx-kit#117 from cmumford:disable-osx-fuzzer REVERT: ac55f842 Test stub improvements. REVERT: 6e9ae724 Disable fuzzing on OSX. REVERT: 402d8881 Fixup for adding the third_party/{benchmark, googletest} submodules. (silx-kit#115) REVERT: 6badb0a2 Merge pull request silx-kit#114 from cmumford:werror-only-clang REVERT: bc53daa7 Fixed endif clause. REVERT: e9a6a084 Matching clang. REVERT: 955a5dd1 Building with `-Werror` only with clang. REVERT: 42d1dd7e Fix CHECK_EQ to call ok() instead of CheckSuccess(). REVERT: eaaa0ed0 Fixup for adding the third_party/{benchmark, googletest} submodules. (silx-kit#111) REVERT: e1e91ee4 Rework file:: stubs. REVERT: 6aa79cb4 Wrap snappy_unittest in an anonymous namespace and remove static from functions. REVERT: bae9f9be Fixup for adding the third_party/{benchmark, googletest} submodules. (silx-kit#110) REVERT: 5f913be0 Fix unused local variable warnings. REVERT: 549685a5 Remove custom testing and benchmarking code. REVERT: 11f9a77a Add Travis-CI build status badge to README.md. REVERT: 49540965 Update Travis CI config. REVERT: 8995ffab Replace #pragma nounroll with equivalent used elsewhere. REVERT: d1daa830 Remove inline qualifier from static variables. REVERT: 3b571656 1) Improve the lookup table data to require less instructions to extract the necessary data. We now store len - offset in a signed int16, this happens to remove masking offset in the calculations and the calculations that need to be done precisely give the flags that we need for testing correctness. 2) Replace offset extraction with a lookup mask. This is less uops and is needed because we need to special case type 3 to always return 0 as to properly trigger the fallback. 3) Unroll the loop twice, this removes some loop-condition checks AND it improves the generated assembly. The loop variables tend to end up in a different register requiring mov's having two consecutive copies allows the elision of the mov's. REVERT: a9730ed5 Optimize zippy decompression by making IncrementalCopy faster. REVERT: 56c2c247 Internal change REVERT: a94be58e Optimize zippy decompression by making IncrementalCopy faster. REVERT: 01a566f8 Fix opensource version REVERT: 616b8229 Add LZ4 as a benchmark option. Snappy is starting to look really good compared to LZ4. LZ4 is considered the fastest solution by many on internet. We now see that Snappy is actually becoming very competitive with compression a little faster and decompression slower but certainly not terribly slower. REVERT: e4a6e97b Extend validate benchmarks over all types and also add a medley for validation. REVERT: 719bed0a Bug fix. Error on 0 offset copies. REVERT: 289c8a3c Make zippy decompression branchless REVERT: 3bfa265a Revert zippy optimization that causes heap buffer overflows. REVERT: 4d2dc9dc Optimize zippy unzipping by upto >10% by making IncrementalCopy faster. REVERT: 11e5165b Add a benchmark that decreased the branch prediction memorization by increasing the amount of independent branches executed per benchmark iteration. REVERT: 6835abd9 Change hash function for Compress. REVERT: 368b01c8 Merge pull request silx-kit#107 from jsteemann:bug-fix/fix-compile-warning REVERT: 1ce58af2 Fix the use of op + len when op is nullptr and len is non-zero. See https://reviews.llvm.org/D67122 for some discussion of why this can matter. I don't think this should have any noticeable effect on performance. REVERT: 0b990db2 Run clang-format REVERT: cb2b3c7e fix compile warnings due to missing override specifiers REVERT: 7ffaf77c Replace ARCH_K8 with __x86_64__. REVERT: 4dd277fe Replace the division with a constant table in IncrementalCopy REVERT: f16eda34 Correct uninitialized variable. REVERT: 837f38b3 Revise stubs for ARCH_{K8,PPC,ARM}. REVERT: e1353b9f Remove ARCH_* guards around Bits::FindLSBSetNonZero64(). REVERT: c98344f6 Fix Clang/GCC compilation warnings. REVERT: 113cd97a Tighten types on a few for loops. REVERT: abde3abb Fix Travis CI build. REVERT: e6506681 Fix accidental double std:: qualifiers. REVERT: 63620c06 Add some std:: qualifiers to types and functions. REVERT: 5417da69 Switch from C headers to C++ headers. REVERT: 251d935d Remove #include <string> from snappy-stubs-public.h. REVERT: 4f195aee Remove mismatched #endif. REVERT: 041c6080 Remove platform-dependent code for unaligned loads/stores. REVERT: 27ff130f Remove platform-dependent code for little-endian loads and stores. REVERT: a4cdb5d1 Introduce SNAPPY_ATTRIBUTE_ALWAYS_INLINE. REVERT: 231b8be0 Migrate to standard integral types. REVERT: 14bef662 Modernize memcpy() and memmove() usage. REVERT: d674348a Improve zippy with 5-10%. REVERT: 4dfcad9f assertion failure on darwin_x86_64, have to investigage REVERT: e1917874 assertion failure on darwin_x86_64, have to investigage REVERT: 0faf5637 This cl does two things 1) It shaves of a few cycles from the data dependency chain. By using "shrd" instead of a load. 2) The important loop is finding small copies (4-12) which are either "copy 1", or "copy 2" depending if the offset fits <2048. It turns out that this is a branch that is mispredicted often. Due to the long dependency chain the CPU is running with IPC~1 anyway so we can freely add instructions to instead emit copies branchfree. This reduces the branch misspredicts from 15% to 11% (for BM_ZFlat/6 txt1) and from 5.6% to 4% (for BM_ZFlat/10 or pb). REVERT: 0c7ed08a The result on protobuf benchmark is around 19%. Results vary by their propensity for compression. As the frequency of finding matches influences the amount of branch misspredicts and the amount of hashing. REVERT: 3c77e014 1) Make the output pointer a local variable such it doesn't need a load add store on it's loop carried dependency chain. 2) Reduce the input pointer loop carried dependency chain from 7 cycles to 4 cycles by using pre-loading. This is a very subtle point. 3) Just brutally copy 64 bytes which removes a difficult to predict branch from the inner most loop. There is enough bandwidth to do so in the intrinsic cycles of the loop. 4) Implement limit pointers that include the slop region. This removes unnecessary instructions from the hot path. 5) It seems the removal of the difficult to predict branch has removed the code sensitivity to alignment, so remove the asm nop's. REVERT: 9eabb7ba Cut a load from the critical dependency chain of the input pointer by speculating the uncommon case of COPY_4 is not happening. REVERT: cddd9c08 Improve comments in IncrementalCopy, add an assert. git-subtree-dir: src/snappy git-subtree-split: 537f4ad6240e586970fe554614542e9717df7902
For JPEG I would only give the option to build with provided system libraries. I guess it would be something for the users to install if they want. I do not think we are going to compile the JPEG library nor to supply it with the wheels. Of course I would reconsider if there is a strong demand. |
t20100
added a commit
to t20100/hdf5plugin
that referenced
this issue
Oct 21, 2022
2b63814b Tag open source release 1.1.9. 9c1be179 'size' remains unused if none of ZLIB, LZO and LZ4 are available. 78650d12 Add project goals to CONTRIBUTING.md. 5e7c14bd Add stubs for abseil flags. 80a2a10c Remove unused run_microbenchmarks flag. 453942b3 Add absl::GetFlag and absl::SetFlag to uses of flags. ea368c2f Add AppVeyor status badge. d1d1f486 Remove unused include in snappy_benchmark.cc. 4ebd8b2f Split benchmarks and test tools into separate targets. 0793e2ae Merge pull request silx-kit#117 from cmumford:disable-osx-fuzzer ac55f842 Test stub improvements. 6e9ae724 Disable fuzzing on OSX. 402d8881 Fixup for adding the third_party/{benchmark, googletest} submodules. (silx-kit#115) 6badb0a2 Merge pull request silx-kit#114 from cmumford:werror-only-clang bc53daa7 Fixed endif clause. e9a6a084 Matching clang. 955a5dd1 Building with `-Werror` only with clang. 42d1dd7e Fix CHECK_EQ to call ok() instead of CheckSuccess(). eaaa0ed0 Fixup for adding the third_party/{benchmark, googletest} submodules. (silx-kit#111) e1e91ee4 Rework file:: stubs. 6aa79cb4 Wrap snappy_unittest in an anonymous namespace and remove static from functions. bae9f9be Fixup for adding the third_party/{benchmark, googletest} submodules. (silx-kit#110) 5f913be0 Fix unused local variable warnings. 549685a5 Remove custom testing and benchmarking code. 11f9a77a Add Travis-CI build status badge to README.md. 49540965 Update Travis CI config. 8995ffab Replace #pragma nounroll with equivalent used elsewhere. d1daa830 Remove inline qualifier from static variables. 3b571656 1) Improve the lookup table data to require less instructions to extract the necessary data. We now store len - offset in a signed int16, this happens to remove masking offset in the calculations and the calculations that need to be done precisely give the flags that we need for testing correctness. 2) Replace offset extraction with a lookup mask. This is less uops and is needed because we need to special case type 3 to always return 0 as to properly trigger the fallback. 3) Unroll the loop twice, this removes some loop-condition checks AND it improves the generated assembly. The loop variables tend to end up in a different register requiring mov's having two consecutive copies allows the elision of the mov's. a9730ed5 Optimize zippy decompression by making IncrementalCopy faster. 56c2c247 Internal change a94be58e Optimize zippy decompression by making IncrementalCopy faster. 01a566f8 Fix opensource version 616b8229 Add LZ4 as a benchmark option. Snappy is starting to look really good compared to LZ4. LZ4 is considered the fastest solution by many on internet. We now see that Snappy is actually becoming very competitive with compression a little faster and decompression slower but certainly not terribly slower. e4a6e97b Extend validate benchmarks over all types and also add a medley for validation. 719bed0a Bug fix. Error on 0 offset copies. 289c8a3c Make zippy decompression branchless 3bfa265a Revert zippy optimization that causes heap buffer overflows. 4d2dc9dc Optimize zippy unzipping by upto >10% by making IncrementalCopy faster. 11e5165b Add a benchmark that decreased the branch prediction memorization by increasing the amount of independent branches executed per benchmark iteration. 6835abd9 Change hash function for Compress. 368b01c8 Merge pull request silx-kit#107 from jsteemann:bug-fix/fix-compile-warning 1ce58af2 Fix the use of op + len when op is nullptr and len is non-zero. See https://reviews.llvm.org/D67122 for some discussion of why this can matter. I don't think this should have any noticeable effect on performance. 0b990db2 Run clang-format cb2b3c7e fix compile warnings due to missing override specifiers 7ffaf77c Replace ARCH_K8 with __x86_64__. 4dd277fe Replace the division with a constant table in IncrementalCopy f16eda34 Correct uninitialized variable. 837f38b3 Revise stubs for ARCH_{K8,PPC,ARM}. e1353b9f Remove ARCH_* guards around Bits::FindLSBSetNonZero64(). c98344f6 Fix Clang/GCC compilation warnings. 113cd97a Tighten types on a few for loops. abde3abb Fix Travis CI build. e6506681 Fix accidental double std:: qualifiers. 63620c06 Add some std:: qualifiers to types and functions. 5417da69 Switch from C headers to C++ headers. 251d935d Remove #include <string> from snappy-stubs-public.h. 4f195aee Remove mismatched #endif. 041c6080 Remove platform-dependent code for unaligned loads/stores. 27ff130f Remove platform-dependent code for little-endian loads and stores. a4cdb5d1 Introduce SNAPPY_ATTRIBUTE_ALWAYS_INLINE. 231b8be0 Migrate to standard integral types. 14bef662 Modernize memcpy() and memmove() usage. d674348a Improve zippy with 5-10%. 4dfcad9f assertion failure on darwin_x86_64, have to investigage e1917874 assertion failure on darwin_x86_64, have to investigage 0faf5637 This cl does two things 1) It shaves of a few cycles from the data dependency chain. By using "shrd" instead of a load. 2) The important loop is finding small copies (4-12) which are either "copy 1", or "copy 2" depending if the offset fits <2048. It turns out that this is a branch that is mispredicted often. Due to the long dependency chain the CPU is running with IPC~1 anyway so we can freely add instructions to instead emit copies branchfree. This reduces the branch misspredicts from 15% to 11% (for BM_ZFlat/6 txt1) and from 5.6% to 4% (for BM_ZFlat/10 or pb). 0c7ed08a The result on protobuf benchmark is around 19%. Results vary by their propensity for compression. As the frequency of finding matches influences the amount of branch misspredicts and the amount of hashing. 3c77e014 1) Make the output pointer a local variable such it doesn't need a load add store on it's loop carried dependency chain. 2) Reduce the input pointer loop carried dependency chain from 7 cycles to 4 cycles by using pre-loading. This is a very subtle point. 3) Just brutally copy 64 bytes which removes a difficult to predict branch from the inner most loop. There is enough bandwidth to do so in the intrinsic cycles of the loop. 4) Implement limit pointers that include the slop region. This removes unnecessary instructions from the hot path. 5) It seems the removal of the difficult to predict branch has removed the code sensitivity to alignment, so remove the asm nop's. 9eabb7ba Cut a load from the critical dependency chain of the input pointer by speculating the uncommon case of COPY_4 is not happening. cddd9c08 Improve comments in IncrementalCopy, add an assert. git-subtree-dir: src/snappy git-subtree-split: 2b63814b15a2aaae54b7943f0cd935892fae628f
BZip2 already in. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Once JPEG is added (#85), adding BZIP2 and MAFISC would make
hdf5plugin
embedding all "commonly used Registered Filter Plugins [..] packaged into the HDF5 Plugin software" (see HDF5 Filter Plugins: BZIP2, JPEG, LZF, BLOSC, MAFISC, LZ4, Bitshuffle, and ZFP) except LZF which is already provided byh5py
.The text was updated successfully, but these errors were encountered: