Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace trailing 1-element arrays with C99 flex-arrays #3785

Closed
wants to merge 1 commit into from

Conversation

ebiggers
Copy link
Contributor

@ebiggers ebiggers commented Oct 7, 2023

Using a trailing 1-element array as a variable-length array is undefined behavior. Compilers have traditionally allowed this, as this is a traditional, pre-C99 convention. But it's being increasingly phased out in favor of C99 flexible arrays, which avoid ambiguity about whether the array is intended to be variable-length or not. gcc 13+ and clang 16+ support the -fstrict-flex-arrays=3 option, which Linux v6.5+ enables globally, resulting in UBSAN errors when these arrays are accessed. (See https://lore.kernel.org/r/00000000000049964e06041f2cbf@google.com)

Fix this by making zstd use C99 flexible arrays instead of 1-element arrays. This only affects three places (that I could find): one in fse_decompress.c and two in zstdmt_compress.c.

Note, this breaks the c89build CI job and thus might not be suitable for merging yet. I recommend that zstd drop support for C89/C90.

Using a trailing 1-element array as a variable-length array is undefined
behavior.  Compilers have traditionally allowed this, as this is a
traditional, pre-C99 convention.  But it's being increasingly phased out
in favor of C99 flexible arrays, which avoid ambiguity about whether the
array is intended to be variable-length or not.  gcc 13+ and clang 16+
support the -fstrict-flex-arrays=3 option, which Linux v6.5+ enables
globally, resulting in UBSAN errors when these arrays are accessed.
(See https://lore.kernel.org/r/00000000000049964e06041f2cbf@google.com)

Fix this by making zstd use C99 flexible arrays instead of 1-element
arrays.  This only affects three places (that I could find): one in
fse_decompress.c and two in zstdmt_compress.c.

Note, this breaks the c89build CI job and thus might not be suitable for
merging yet.  I recommend that zstd drop support for C89/C90.
@facebook-github-bot
Copy link

Hi @ebiggers!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@Cyan4973
Copy link
Contributor

Cyan4973 commented Oct 7, 2023

We still value C89/C90 support at this time.

I can see why this programming pattern, which used to be tolerated in the past, would be a problem for modern unforgiving static analyzers.

Maybe a more future-proof solution could be to move away from this pattern, though I also understand it's a much larger refactor.

@ebiggers
Copy link
Contributor Author

ebiggers commented Oct 7, 2023

It's not just about static analyzers. This issue prevents enabling the UBSAN array bounds sanitizer, which can be desirable as a security feature (not just debugging), in code that is being built with -fstrict-flex-arrays=3 (such as the Linux kernel).

Other options could include #ifdef to use different styles of variable-length array on different C versions, or forcing -fstrict-flex-arrays=0 for the affected files

@Cyan4973
Copy link
Contributor

Cyan4973 commented Oct 7, 2023

Yeah, this is a concern.

Shortest solution might be to use selective #ifdef to cater to both C90 and C99 conventions.

Longer term, I still believe it will be better to entirely move away from this pattern.
But this action will require a longer time span.

Cyan4973 added a commit that referenced this pull request Oct 9, 2023
the flexArray in structure FSE_DecompressWksp
is just a way to derive a pointer easily,
without risk/complexity of calculating it manually.

Not sure if this change is good enough to avoid ubsan warnings though.
@Cyan4973
Copy link
Contributor

This issue prevents enabling the UBSAN array bounds sanitizer, which can be desirable as a security feature (not just debugging), in code that is being built with -fstrict-flex-arrays=3 (such as the Linux kernel).

Btw, I just tried this setting -fstrict-flex-arrays=3 for a local test, and the default local compiler (Apple clang version 15.0.0 (clang-1500.0.40.1)) did not like it :

error: invalid value '3' in '-fstrict-flex-arrays=3'

I presume it must be a recent setting, with is not widely supported yet.

@terrelln
Copy link
Contributor

Thanks @ebiggers for bringing this to our attention & submitting a PR!

I will look into fixing this upstream and merging the relevant fix into my tree.

Unfortunately I missed those messages in the LKML. It looks like my email has to stop syncing my folder containing LKML messages that CC me. So I have to fix that and work through my backlog.

@terrelln
Copy link
Contributor

@Cyan4973 What do you think about adding a macro to our portability.h?

#if __STDC_VERSION__ >= 199901L
#  define ZSTD_FLEXIBLE_ARRAY_DIM /* C99 */
#else
#  define ZSTD_FLEXIBLE_ARRAY_DIM 1
#endif

Then using that for all of our flexible arrays?

@Cyan4973
Copy link
Contributor

This clever use of macro would answer the specific C90 flexArray[1] vs C99 flexArray[] notation issue.
Unfortunately, then, there are a few subtle issues around correctly sizing the structure, that would require additional modifications.

It also assumes that the new ubsan flag would not be used simultaneously with the C90 code convention,
which is probably okay for @ebiggers scenario, but not completely guaranteed.

Finally, I also believe that it's generally good practice to move away from this convention, whenever practical.
On this ground, the proposed modifications of ZSTDMT_ look good to me,
I think I would prefer to work with the new code, more readable, more reliable.

For the fse side though, the situation is a bit more complex, and I'm more on the fence, so the discussion remains opened.

Cyan4973 added a commit that referenced this pull request Oct 19, 2023
the flexArray in structure FSE_DecompressWksp
is just a way to derive a pointer easily,
without risk/complexity of calculating it manually.

Not sure if this change is good enough to avoid ubsan warnings though.
Cyan4973 added a commit that referenced this pull request Oct 20, 2023
@Cyan4973
Copy link
Contributor

Cyan4973 commented Nov 1, 2023

This issue has been fixed in #3785 + #3789 .

@Cyan4973 Cyan4973 closed this Nov 1, 2023
@Cyan4973 Cyan4973 self-assigned this Nov 1, 2023
gcflymoto pushed a commit to gcflymoto/zstd that referenced this pull request Nov 2, 2023
the flexArray in structure FSE_DecompressWksp
is just a way to derive a pointer easily,
without risk/complexity of calculating it manually.

Not sure if this change is good enough to avoid ubsan warnings though.
hswong3i pushed a commit to alvistack/facebook-zstd that referenced this pull request Mar 27, 2024
the flexArray in structure FSE_DecompressWksp
is just a way to derive a pointer easily,
without risk/complexity of calculating it manually.

Not sure if this change is good enough to avoid ubsan warnings though.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants