Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

riscv: use g_running_task store current regs #13585

Merged
merged 1 commit into from
Sep 27, 2024

Conversation

hujun260
Copy link
Contributor

Summary

This commit fixes the regression from #13561

In order to determine whether a context switch has occurred, we can use g_running_task to store the current regs. This allows us to compare the current register state with the previously stored state to identify if a context switch has taken place.

Impact

riscv

Testing

rv-virt:knsh
rv-virt:knsh64
rv-virt:smp

@github-actions github-actions bot added Area: Tooling Arch: risc-v Issues related to the RISC-V (32-bit or 64-bit) architecture Size: L The size of the change in this PR is large labels Sep 24, 2024
@github-actions github-actions bot added Size: S The size of the change in this PR is small and removed Area: Tooling Size: L The size of the change in this PR is large labels Sep 24, 2024
@nuttxpr
Copy link

nuttxpr commented Sep 24, 2024

[Experimental Bot, please feedback here]

The PR summary is missing some information, and the Testing section is incomplete.

Here's a breakdown:

Summary - Needs Improvement

  • What functional part of the code is being changed? Be specific. Mention the files, functions, or modules affected.
  • How does the change exactly work (what will change and how)? Provide a technical explanation of your solution. How does using g_running_task to store registers help determine a context switch?

Impact - Needs Improvement

  • Is new feature added? Is existing feature changed? - This seems like a bug fix, so clearly state that.
  • Impact on user (will user need to adapt to change)? - Even if the answer is NO, you should still briefly explain why.
  • The other "Impact" sections may not all be relevant. It's okay to remove sections if there's genuinely no impact.

Testing - Incomplete

  • Build Host(s): Specify your build host details.
  • Testing logs before change / Testing logs after change: You need to provide actual logs demonstrating the problem before your fix and the improvement afterward.

To improve this PR:

  1. Expand the Summary: Clearly describe what part of the code is affected and the technical details of your fix.
  2. Complete the Impact section: Address all points, even if it's to say "NO" and provide a brief justification.
  3. Provide Actual Testing Logs: Show the issue before your change and the successful result after.

By providing this additional information, you'll make it easier for reviewers to understand and approve your PR.

Copy link
Member

@lupyuen lupyuen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested OK with OSTest on rv-virt:knsh, rv-virt:knsh64 and milkv_duos:nsh. Thanks!

@lupyuen
Copy link
Member

lupyuen commented Sep 24, 2024

Hi @tmedicci would you need to test this on ESP32 RISC-V? Thanks!

@lupyuen
Copy link
Member

lupyuen commented Sep 24, 2024

@hujun260 I rebooted and retested OSTest, 10 times on rv-virt:knsh64. Out of the 10 tests, 1 test failed at vfork. Would you know what's causing this? Thanks! https://gist.github.com/lupyuen/cebee9e69f23047476a7a5305f595fa6

user_main: vfork() test
[   71.520000] _assert: Current Version: NuttX  0.0.0 7b5ca0ec41 Sep 24 2024 12:41:49 risc-v
[   71.520000] _assert: Assertion failed : at file: common/riscv_fork.c:133 task: ostest process: ostest 0xc000001a
[   71.520000] up_dump_register: EPC: 00000000802133e4
[   71.520000] up_dump_register: A0: 00000000804072b0 A1: 0000000000000085 A2: 0000000200042020 A3: 0000000080408560
[   71.520000] up_dump_register: A4: 000000000000000a A5: 0000000000000004 A6: 0000000000000000 A7: 0000000000000000
[   71.520000] up_dump_register: T0: 0000000080206898 T1: 0000000000000007 T2: 0000000000000000 T3: 00000000c0206070
[   71.520000] up_dump_register: T4: 00000000c0206068 T5: 00000000c000b8f9 T6: 000000000000003d
[   71.520000] up_dump_register: S0: 0000000000000000 S1: 0000000080409180 S2: 0000000080409180 S3: 0000000000000000
[   71.520000] up_dump_register: S4: 0000000200042020 S5: 0000000000000000 S6: 0000000080219328 S7: 0000000000000000
[   71.520000] up_dump_register: S8: 0000000000000006 S9: 00000000804074d0 S10: 0000000000000085 S11: 0000000080407550
[   71.520000] up_dump_register: SP: 000000008040c850 FP: 0000000000000000 TP: 0000000080409180 RA: 00000000802133e4
[   71.520000] dump_stack: Kernel Stack:
[   71.520000] dump_stack:   base: 0x8040c030
[   71.520000] dump_stack:   size: 00003072
[   71.520000] dump_stack:     sp: 0x8040c850
[   71.520000] stack_dump: 0x8040c810: 0000000000000009 00000000804092d8 000000008021be18 0000000000000000 00000000c0202070 0000000080409180 0000000000000000 0000000080213666
[   71.520000] stack_dump: 0x8040c850: 00000000c000001a fffffffffffffffc 0000000000006010 00000000c0200000 0000000080409180 00000000804072b0 0000000000000000 0000000080219328
[   71.520000] stack_dump: 0x8040c890: 0000000000000085 ffff00587474754e 0000000000000c10 0000000080408180 0000000080408180 0000000080206af0 00000000c0206000 302e300080408180
[   71.520000] stack_dump: 0x8040c8d0: 000000008000302e 00000000802086ca 3061633562370000 7065532031346365 3432303220343220 343a31343a323120 0000000080400039 0000000080203352
[   71.520000] stack_dump: 0x8040c910: 7369720000000000 0000000000762d63 0000000000000000 000000008040cc40 000000008040cf40 0000000000000000 0000000000000000 0000000000000000
[   71.520000] stack_dump: 0x8040c950: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000005 0000000200042022 0000000080409180 00000000c000919a
[   71.520000] stack_dump: 0x8040c990: ffffffffffffffda 0000000080206912 ffffffffffffffda 000000008020a32c ffffffffffffffff 0000000000000000 0000000000000005 0000000000000020
[   71.520000] stack_dump: 0x8040c9d0: 0000000000000008 00000000c000919a ffffffffffffffda 00000000802068a0 0000000000000008 0000000080200940 0000000000000000 0000000080200972
[   71.520000] stack_dump: 0x8040ca10: 0000000200042020 00000000802001a2 00000000c000919a 00000000c0008af8 00000000c0203ef0 0000000080409180 0000000000000000 0000000000000000
[   71.520000] stack_dump: 0x8040ca50: 00000000c01017b0 0000000000000000 00000000c0202040 0000000000000000 0000000000000022 00000000c02003d8 00000000c0200448 0000000000000018
[   71.520000] stack_dump: 0x8040ca90: 000000000000000a 00000000c0101f4e 0000000000000009 0000000000000000 0000000000000005 00000000c0010a48 0000000000000005 0000000000000000
[   71.520000] stack_dump: 0x8040cad0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 00000000c0008146 00000000c000b6f8
[   71.520000] stack_dump: 0x8040cb10: 00000000c000b8f9 000000000000003d 0000000200042020 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   71.520000] stack_dump: 0x8040cb50: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   71.520000] stack_dump: 0x8040cb90: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   71.520000] stack_dump: 0x8040cbd0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 00000000802034b2
[   71.520000] stack_dump: 0x8040cc10: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
ostest_main: Exiting with status 256

Update: Out of 20 reboots and retests, 2 failed with the vfork error. So the failure rate is 10%.

@hujun260
Copy link
Contributor Author

@hujun260 I rebooted and retested OSTest, 10 times on rv-virt:knsh64. Out of the 10 tests, 1 test failed at vfork. Would you know what's causing this? Thanks! https://gist.github.com/lupyuen/cebee9e69f23047476a7a5305f595fa6

user_main: vfork() test
[   71.520000] _assert: Current Version: NuttX  0.0.0 7b5ca0ec41 Sep 24 2024 12:41:49 risc-v
[   71.520000] _assert: Assertion failed : at file: common/riscv_fork.c:133 task: ostest process: ostest 0xc000001a
[   71.520000] up_dump_register: EPC: 00000000802133e4
[   71.520000] up_dump_register: A0: 00000000804072b0 A1: 0000000000000085 A2: 0000000200042020 A3: 0000000080408560
[   71.520000] up_dump_register: A4: 000000000000000a A5: 0000000000000004 A6: 0000000000000000 A7: 0000000000000000
[   71.520000] up_dump_register: T0: 0000000080206898 T1: 0000000000000007 T2: 0000000000000000 T3: 00000000c0206070
[   71.520000] up_dump_register: T4: 00000000c0206068 T5: 00000000c000b8f9 T6: 000000000000003d
[   71.520000] up_dump_register: S0: 0000000000000000 S1: 0000000080409180 S2: 0000000080409180 S3: 0000000000000000
[   71.520000] up_dump_register: S4: 0000000200042020 S5: 0000000000000000 S6: 0000000080219328 S7: 0000000000000000
[   71.520000] up_dump_register: S8: 0000000000000006 S9: 00000000804074d0 S10: 0000000000000085 S11: 0000000080407550
[   71.520000] up_dump_register: SP: 000000008040c850 FP: 0000000000000000 TP: 0000000080409180 RA: 00000000802133e4
[   71.520000] dump_stack: Kernel Stack:
[   71.520000] dump_stack:   base: 0x8040c030
[   71.520000] dump_stack:   size: 00003072
[   71.520000] dump_stack:     sp: 0x8040c850
[   71.520000] stack_dump: 0x8040c810: 0000000000000009 00000000804092d8 000000008021be18 0000000000000000 00000000c0202070 0000000080409180 0000000000000000 0000000080213666
[   71.520000] stack_dump: 0x8040c850: 00000000c000001a fffffffffffffffc 0000000000006010 00000000c0200000 0000000080409180 00000000804072b0 0000000000000000 0000000080219328
[   71.520000] stack_dump: 0x8040c890: 0000000000000085 ffff00587474754e 0000000000000c10 0000000080408180 0000000080408180 0000000080206af0 00000000c0206000 302e300080408180
[   71.520000] stack_dump: 0x8040c8d0: 000000008000302e 00000000802086ca 3061633562370000 7065532031346365 3432303220343220 343a31343a323120 0000000080400039 0000000080203352
[   71.520000] stack_dump: 0x8040c910: 7369720000000000 0000000000762d63 0000000000000000 000000008040cc40 000000008040cf40 0000000000000000 0000000000000000 0000000000000000
[   71.520000] stack_dump: 0x8040c950: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000005 0000000200042022 0000000080409180 00000000c000919a
[   71.520000] stack_dump: 0x8040c990: ffffffffffffffda 0000000080206912 ffffffffffffffda 000000008020a32c ffffffffffffffff 0000000000000000 0000000000000005 0000000000000020
[   71.520000] stack_dump: 0x8040c9d0: 0000000000000008 00000000c000919a ffffffffffffffda 00000000802068a0 0000000000000008 0000000080200940 0000000000000000 0000000080200972
[   71.520000] stack_dump: 0x8040ca10: 0000000200042020 00000000802001a2 00000000c000919a 00000000c0008af8 00000000c0203ef0 0000000080409180 0000000000000000 0000000000000000
[   71.520000] stack_dump: 0x8040ca50: 00000000c01017b0 0000000000000000 00000000c0202040 0000000000000000 0000000000000022 00000000c02003d8 00000000c0200448 0000000000000018
[   71.520000] stack_dump: 0x8040ca90: 000000000000000a 00000000c0101f4e 0000000000000009 0000000000000000 0000000000000005 00000000c0010a48 0000000000000005 0000000000000000
[   71.520000] stack_dump: 0x8040cad0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 00000000c0008146 00000000c000b6f8
[   71.520000] stack_dump: 0x8040cb10: 00000000c000b8f9 000000000000003d 0000000200042020 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   71.520000] stack_dump: 0x8040cb50: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   71.520000] stack_dump: 0x8040cb90: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   71.520000] stack_dump: 0x8040cbd0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 00000000802034b2
[   71.520000] stack_dump: 0x8040cc10: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
ostest_main: Exiting with status 256

ok i'll try to reproduce

@lupyuen
Copy link
Member

lupyuen commented Sep 24, 2024

@hujun260 I did some more Stress Testing:

  1. For the code in this PR: Out of 20 reboots and retests, 2 failed with the vfork error. So the failure rate is 10%.
  2. For the code dated 12 Sep, 2 weeks ago (9bee10f): Out of 200 reboots and retests, all 200 succeeded without vfork error.

So this vfork problem might be introduced in the past 2 weeks?

Copy link
Member

@lupyuen lupyuen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do more analysis on the vfork issue, to ensure it's not caused by this PR. Thanks!

@hujun260
Copy link
Contributor Author

Let's do more analysis on the vfork issue, to ensure it's not caused by this PR. Thanks!
ok

@hujun260
Copy link
Contributor Author

hujun260 commented Sep 24, 2024

@lupyuen Could you send me your stress testing script? I can occasionally reproduce the issue on my side, but the efficiency is relatively low.

@lupyuen
Copy link
Member

lupyuen commented Sep 24, 2024

arch/risc-v/src/common/riscv_doirq.c Outdated Show resolved Hide resolved
arch/risc-v/src/common/supervisor/riscv_perform_syscall.c Outdated Show resolved Hide resolved
arch/risc-v/src/common/riscv_swint.c Outdated Show resolved Hide resolved
@github-actions github-actions bot added Size: M The size of the change in this PR is medium and removed Size: S The size of the change in this PR is small labels Sep 25, 2024
Copy link
Member

@lupyuen lupyuen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested OK with OSTest on rv-virt:knsh, rv-virt:knsh64 and milkv_duos:nsh.

Stressed Tested 200 times (reboot + ostest) on rv-virt:knsh64. All 200 times succeeded 🎉 Thanks :-)

arch/risc-v/src/common/riscv_fork.c Outdated Show resolved Hide resolved
@github-actions github-actions bot added Size: S The size of the change in this PR is small and removed Size: M The size of the change in this PR is medium labels Sep 26, 2024
Copy link
Member

@lupyuen lupyuen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hujun260! Let's wait for #13633 to be merged. Then we can rebase and retest your PR. Thank you so much :-)

@lupyuen
Copy link
Member

lupyuen commented Sep 27, 2024

Hi @hujun260: #13633 has been merged. Could you please rebase to the master branch and retest your PR? Thanks!

@hujun260
Copy link
Contributor Author

Thanks @hujun260! Let's wait for #13633 to be merged. Then we can rebase and retest your PR. Thank you so much :-)

ok

@github-actions github-actions bot added the Arch: arm Issues related to ARM (32-bit) architecture label Sep 27, 2024
Copy link
Member

@lupyuen lupyuen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stress Test of rv-virt:knsh64 (reboot qemu + restart ostest) succeeded for all 200 iterations. Also tested OK with OSTest on milkv_duos:nsh. Thank you so much!

@github-actions github-actions bot removed the Arch: arm Issues related to ARM (32-bit) architecture label Sep 27, 2024
Copy link
Member

@lupyuen lupyuen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retested: Stress Test of rv-virt:knsh64 (reboot qemu + restart ostest) succeeded for all 200 iterations. Also tested OK with OSTest on milkv_duos:nsh. Thank you so much!

This commit fixes the regression from apache#13561

In order to determine whether a context switch has occurred,
we can use g_running_task to store the current regs.
This allows us to compare the current register state with the previously
stored state to identify if a context switch has taken place.

Signed-off-by: hujun5 <hujun5@xiaomi.com>
@xiaoxiang781216 xiaoxiang781216 merged commit 3e459c0 into apache:master Sep 27, 2024
29 checks passed
@lupyuen
Copy link
Member

lupyuen commented Sep 28, 2024

FYI All the Daily Tests are successful now 🎉

@hujun260 hujun260 deleted the apache_11 branch September 29, 2024 03:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Arch: risc-v Issues related to the RISC-V (32-bit or 64-bit) architecture Size: S The size of the change in this PR is small
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants