Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZSTD_compressSequencesAndLiterals #4217

Open
wants to merge 67 commits into
base: dev
Choose a base branch
from

Conversation

Cyan4973
Copy link
Contributor

@Cyan4973 Cyan4973 commented Dec 17, 2024

This PR introduces a variant of ZSTD_compressSequences(),
named ZSTD_compressSequencesAndLiterals(),
which accepts a buffer of literals instead of the source buffer.

This is meant to match specific limitations present in some transcoder operations,
in which it's easy to present Sequences and Literals coming from an external generator,
but it would be more complex or costly to restore the corresponding source.
As an example, one such use case happens within the Linux kernel,
where the source comes as scattered pages which do not form a continuous buffer.

This variant is a bit faster on its own than regular ZSTD_compressSequences(), by almost ~+20%,
as measured below using fullbench on a i7-9700k:

source parser compressSequences compressSequencesAndLiterals difference
calgary.tar level 19 805 MB/s 960 MB/s +19%
enwik6 level 4 761 MBs 917 MB/s +20 %

The difference is pretty consistent, driven primarily by a reduction in overhead during the sequence transcoding operation.

In addition, the goal is to cumulate the speed benefits of this variant with potential other benefits on the caller side, by no longer requiring presentation of the original data.

That being said, ZSTD_compressSequencesAndLiterals() also features some limitations:

  • Only supports Explicit Block Delimiter mode. While this could be extended later on, it's unclear if it would be useful.
  • Does not support Frame Checksum, since it doesn't see the original data. Requesting this checksum will result in an error.
  • Does not write the content size within the frame header
  • Cannot generate uncompressed blocks, since it doesn't have the original data. While this could be partially mitigated in some favorable cases, at the cost of complexity, there are many more complex scenarios for which it cannot be completely eliminated. Thus, if the compression process tries to generate an uncompressed block, it results in a compression error.

For this reason, it's best to keep this method for compression of "small" data (typically <= 128 KB), where data is typically stored uncompressed when it doesn't compress well enough. This setup corresponds to typical kernel use cases like zram, zswap or BtrFS.

@Cyan4973 Cyan4973 self-assigned this Dec 17, 2024
@Cyan4973 Cyan4973 force-pushed the ZSTD_compressSequencesAndLiterals branch from ebc6485 to 5895903 Compare December 17, 2024 06:00
@Cyan4973
Copy link
Contributor Author

Updated implementation, with slightly different prototype (does no longer request srcSize) and improved compression speed.

@Cyan4973 Cyan4973 force-pushed the ZSTD_compressSequencesAndLiterals branch from 877796c to 6b28993 Compare December 18, 2024 06:12
Copy link
Contributor

@terrelln terrelln left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed everything except the actual implementation in zstd_compress.c and zstd_compress_internal.h.

lib/zstd.h Outdated
* - Does not write the content size in frame header
* - If any block is incompressible, will fail and return an error
* - @litSize must be == sum of all @.litLength fields in @inSeqs. Any discrepancy will generate an error.
* - the buffer @literals must be larger than @litSize by at least 8 bytes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we take another parameter litCapacity here? This seems like an easy constraint to break, and it would be good to force users to pass in a value here and check it directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parameter litCapacity added

@@ -232,7 +232,7 @@ static size_t roundTripTest(void* result, size_t resultCapacity,
const void* src, size_t srcSize,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should test ZSTD_compressSequencesAndLiterals() here as well. We could just test both ZSTD_compressSequences() and ZSTD_compressSequencesAndLiterals()

compressedSeqsSize = ZSTD_entropyCompressSeqStore(&cctx->seqStore,
&cctx->blockState.prevCBlock->entropy, &cctx->blockState.nextCBlock->entropy,
&cctx->appliedParams,
compressedSeqsSize = ZSTD_entropyCompressSeqStore_wExtLitBuffer(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is the same as ZSTD_entropyCompressSeqStore(), can we call that instead? Or am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ZSTD_entropyCompressSeqStore() is invoking ZSTD_entropyCompressSeqStore_wExtLitBuffer(),
so they both converge to the same implementation.

The difference is that ZSTD_entropyCompressSeqStore() forces the literals buffer to be the one reserved into cctx, while _wExtLitBuffer() let the caller select where the buffer of literals is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Cyan4973 I understand that, but in this case the literals are in the seqStore, right?

There are so many parameters that it is hard to read, but it looks like this could just invoke ZSTD_entropyCompressSeqStore() instead, because the literals are cctx->seqStore.litStart, (size_t)(cctx->seqStore.lit - cctx->seqStore.litStart).

Copy link
Contributor Author

@Cyan4973 Cyan4973 Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you meant this specific instance.
And yes, you are right, it could be replaced by ZSTD_entropyCompressSeqStore().

This modification was made early during development, when I was trying to have a common pipeline for the 2 variants. This is a left over from this period. Since I gave up this approach, we now have 2 separate pipelines, so it's possible to restore the call to ZSTD_entropyCompressSeqStore(), especially if it helps readability.

Comment on lines 7095 to 7098
/*
* Note: Sequence validation functionality has been disabled (removed).
* This is helpful to find back simplicity, leading to performance.
* It may be re-inserted later.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we error if sequence validation was requested?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, at this point, it's just ignored.
It could be a good idea to add such check and error out clearly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check added

Comment on lines 7203 to 7204
RETURN_ERROR_IF(dstCapacity<4, dstSize_tooSmall, "No room for empty frame block header");
MEM_writeLE32(op, cBlockHeader24);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have MEM_writeLE24 right? Can we use it here to support dstCapacity == 3?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


RETURN_ERROR_IF(dstCapacity < ZSTD_blockHeaderSize, dstSize_tooSmall, "not enough dstCapacity to write a new compressed block");

compressedSeqsSize = ZSTD_entropyCompressSeqStore_internal(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to call ZSTD_entropyCompressSeqStore_internal() or can we call ZSTD_entropyCompressSeqStore_wExtLitBuffer()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initial code was invoking ZSTD_entropyCompressSeqStore_wExtLitBuffer().
But because ZSTD_entropyCompressSeqStore_wExtLitBuffer() is essentially ZSTD_entropyCompressSeqStore(), just with a selectable litBuffer,
it adds a few constraints, which favor the creation of uncompressed blocks.
In the normal case, it's fine (an uncompressed block is preferable to a compressed one with only 1 byte of saving),
but in the case of ZSTD_compressSequencesAndLiterals(), it is catastrophic, because this variant is unable to generate an uncompressed block (since it doesn't see the original data). This would result in an error.
So we prefer skipping these additional checks, by invoking the next stage directly, ZSTD_entropyCompressSeqStore_internal().

Copy link
Contributor

@terrelln terrelln left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good once we add the fuzzer

compressedSeqsSize = ZSTD_entropyCompressSeqStore(&cctx->seqStore,
&cctx->blockState.prevCBlock->entropy, &cctx->blockState.nextCBlock->entropy,
&cctx->appliedParams,
compressedSeqsSize = ZSTD_entropyCompressSeqStore_wExtLitBuffer(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Cyan4973 I understand that, but in this case the literals are in the seqStore, right?

There are so many parameters that it is hard to read, but it looks like this could just invoke ZSTD_entropyCompressSeqStore() instead, because the literals are cctx->seqStore.litStart, (size_t)(cctx->seqStore.lit - cctx->seqStore.litStart).

makes it possible to register a sequence without copying its literals.
they should not be in common/zstd_internal.h,
since these definitions are not shared beyond lib/compress/.
SeqDef is a type name, so it should start with a Capital letter.
It's an internal symbol, no impact on public API.
same idea, SeqStore_t is a type name, it should start with a Capital letter.
can receive externally defined buffer of literals
since it's a type name.

Note: in contrast with previous names, this one is on the Public API side.
So there is a #define, so that existing programs using ZSTD_sequenceFormat_e still work.
no need to publish them outside of this unit.
Cyan4973 and others added 29 commits December 20, 2024 10:36
does not need to track and update internal `litPtr`.
note: does not measurably impact performance.
…s is enabled

note: very minor saving, no performance impact
…ng code

results in +4% compression speed,
thanks to removal of branches in the hot loop.
and updated its documentation.
Note: older name ZSTD_c_searchForExternalRepcodes remains supported via #define
check that ZSTD_compressAndLiterals() also controls that the `srcSize` field is exact.
this makes it possible to adjust windowSize to its tightest.
checks that srcSize is present in the frame header
and bounds the window size.
ZSTD_compressSequencesAndLiterals() cannot produce an uncompressed block
since ZSTD_compressSequencesAndLiterals() doesn't support it.
to ZSTD_compressSequencesAndLiterals()
to enforce the litCapacity >= litSize+8 condition.
so that an empty frame needs only 3 bytes of dstCapacity.
in the ZSTD_compressSequences() pipeline
@Cyan4973 Cyan4973 force-pushed the ZSTD_compressSequencesAndLiterals branch from 9bcee5f to 47cbfc8 Compare December 20, 2024 18:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants