
fix LayerNorm f16 CPU implementation #22479

Merged 5 commits into main on Oct 18, 2024

Conversation

@fs-eire (Contributor) commented Oct 17, 2024

Description

The recent PR #22223 introduced two bugs in the implementation of the CPU LayerNorm f16 kernel:

  • possible nullptr dereference for bias
    `const TensorShape& bias_shape = bias->Shape();` crashes when `bias` does not exist. (Surprisingly, this path is not covered by any test case.)
    • fix: guard with a pointer check
  • a race condition inside ComputeJob
    `ComputeJob()` is dispatched to a thread pool and internally modifies `LayerNormImpl::scale_fp32_` and `LayerNormImpl::bias_fp32_`, which are `std::unique_ptr`s and are not thread-safe.
    • fix: move the modification of `LayerNormImpl::scale_fp32_` and `LayerNormImpl::bias_fp32_` out of `ComputeJob()` and into `LayerNormImpl::ComputeWithoutContext()`. A race may still be possible because `ConcurrentRunSupported` is set to `true` for the CPU EP, so an OrtMutex was added.

This should also fix the recent flaky tests.
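The nullptr fix described above follows a standard guard pattern. A minimal sketch, using hypothetical stand-in types (`Tensor`/`TensorShape` here are simplified illustrations, not ORT's actual classes):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical minimal stand-ins for ORT's tensor types, for illustration only.
struct TensorShape {
  std::vector<int64_t> dims;
  int64_t Size() const {
    int64_t s = 1;
    for (auto d : dims) s *= d;
    return s;
  }
};

struct Tensor {
  TensorShape shape;
  const TensorShape& Shape() const { return shape; }
};

// Buggy pattern from the PR description: dereferences `bias` unconditionally,
// crashing when the optional bias input is absent:
//   const TensorShape& bias_shape = bias->Shape();
//
// Fixed pattern: check the pointer before touching the shape.
int64_t BiasSize(const Tensor* bias) {
  return bias != nullptr ? bias->Shape().Size() : 0;
}
```

A guard like this is the whole fix for the first bug: optional kernel inputs arrive as possibly-null pointers, so every access must be conditional.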

amarin16
amarin16 previously approved these changes Oct 17, 2024
@tianleiwu (Contributor) left a comment


Mutex is not needed. See other comments.

@fs-eire (Contributor, Author) commented Oct 17, 2024

Updated to resolve the comments:

  • mutex removed.
  • the `unique_ptr<float[]>` members are now assigned only in `Prepack()`:
    • if prepack is used, always read the stored `prepacked_scale_fp32_data_` and `prepacked_bias_fp32_data_`.
    • if prepack is not used, scale and bias are not initializers, so always convert them from f16 to f32 at compute time.

@fs-eire fs-eire requested a review from tianleiwu October 17, 2024 22:46
tianleiwu
tianleiwu previously approved these changes Oct 17, 2024
@fs-eire fs-eire merged commit b4cb937 into main Oct 18, 2024
91 checks passed
@fs-eire fs-eire deleted the fs-eire/fix-layernorm-cpu-f16 branch October 18, 2024 01:49
guschmue pushed a commit that referenced this pull request Oct 18, 2024
tianleiwu pushed a commit that referenced this pull request Oct 18, 2024
apsonawane pushed a commit that referenced this pull request Oct 22, 2024
@sophies927 sophies927 added the cherry-picked Cherry-picked for a cherrypicks branch label Oct 22, 2024
ishwar-raut1 pushed a commit to ishwar-raut1/onnxruntime that referenced this pull request Nov 19, 2024
Labels
cherry-picked Cherry-picked for a cherrypicks branch release:1.20.0