
fix LayerNorm f16 CPU implementation #22479

Merged 5 commits into main on Oct 18, 2024

Conversation

@fs-eire (Contributor) commented Oct 17, 2024

Description

The recent PR #22223 introduced two bugs in the implementation of the CPU LayerNorm f16 kernel:

  • possible nullptr dereference for bias
    `const TensorShape& bias_shape = bias->Shape();` crashes when `bias` does not exist. (Surprisingly, this path is not covered by any test case.)
    • fix: guard with a pointer check
  • a race condition inside ComputeJob
    `ComputeJob()` is dispatched to a thread pool and internally modifies `LayerNormImpl::scale_fp32_` and `LayerNormImpl::bias_fp32_`, which are `std::unique_ptr`s and are not thread-safe.
    • fix: move the modification of `LayerNormImpl::scale_fp32_` and `LayerNormImpl::bias_fp32_` out of `ComputeJob()` and into `LayerNormImpl::ComputeWithoutContext()`. A race may still be possible because `ConcurrentRunSupported` is set to `true` for the CPU EP, so an OrtMutex was added.

This should also fix the recent flaky tests.
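The nullptr fix described above follows a standard guard pattern. A minimal sketch, using hypothetical stand-in types (`Tensor`/`TensorShape` here are simplified illustrations, not ORT's actual classes):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical minimal stand-ins for ORT's tensor types, for illustration only.
struct TensorShape {
  std::vector<int64_t> dims;
  int64_t Size() const {
    int64_t s = 1;
    for (auto d : dims) s *= d;
    return s;
  }
};

struct Tensor {
  TensorShape shape;
  const TensorShape& Shape() const { return shape; }
};

// Buggy pattern from the PR description: dereferences `bias` unconditionally,
// crashing when the optional bias input is absent:
//   const TensorShape& bias_shape = bias->Shape();
//
// Fixed pattern: check the pointer before touching the shape.
int64_t BiasSize(const Tensor* bias) {
  return bias != nullptr ? bias->Shape().Size() : 0;
}
```

A guard like this is the whole fix for the first bug: optional kernel inputs arrive as possibly-null pointers, so every access must be conditional.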

amarin16
amarin16 previously approved these changes Oct 17, 2024
@tianleiwu (Contributor) left a comment


Mutex is not needed. See other comments.

@fs-eire (Contributor, Author) commented Oct 17, 2024

Updated to resolve the comments:

  • mutex removed.
  • the `unique_ptr<float[]>` members are now assigned only in `Prepack()`:
    • if prepack is used, always read the stored `prepacked_scale_fp32_data_` and `prepacked_bias_fp32_data_`.
    • if prepack is not used, scale and bias are not initializers, so always convert them from f16 to f32 at compute time.

@fs-eire fs-eire requested a review from tianleiwu October 17, 2024 22:46
tianleiwu
tianleiwu previously approved these changes Oct 17, 2024
@fs-eire fs-eire merged commit b4cb937 into main Oct 18, 2024
91 checks passed
@fs-eire fs-eire deleted the fs-eire/fix-layernorm-cpu-f16 branch October 18, 2024 01:49
guschmue pushed a commit that referenced this pull request Oct 18, 2024
tianleiwu pushed a commit that referenced this pull request Oct 18, 2024
apsonawane pushed a commit that referenced this pull request Oct 22, 2024
@sophies927 sophies927 added the cherry-picked Cherry-picked for a cherrypicks branch label Oct 22, 2024
ishwar-raut1 pushed a commit to ishwar-raut1/onnxruntime that referenced this pull request Nov 19, 2024
Labels
cherry-picked Cherry-picked for a cherrypicks branch release:1.20.0