Bring up Linux kernel #508

Draft
wants to merge 24 commits into
base: master
from

Conversation

ChinYikMing
Collaborator

@ChinYikMing ChinYikMing commented Oct 28, 2024

Clone the branch:

$ git clone https://github.com/ChinYikMing/rv32emu.git -b feat/bring-up-linux --depth 1

Change into the repository directory:

$ cd rv32emu

Fetch prebuilt Linux image and run:

$ make system ENABLE_SYSTEM=1 -j8

To exit the VM:

CTRL + a + x

Contributor

@jserv jserv left a comment

Benchmarks

| Benchmark suite | Current: 36d664e | Previous: ab8b756 | Ratio |
|---|---|---|---|
| Dhrystone | 1530 Average DMIPS over 10 runs | 1533 Average DMIPS over 10 runs | 1.00 |
| Coremark | 1417.099 Average iterations/sec over 10 runs | 1420.683 Average iterations/sec over 10 runs | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@jserv jserv requested a review from vacantron October 28, 2024 19:25
src/common.h (resolved)
src/decode.c (resolved)
@@ -90,7 +90,7 @@ enum op_field {
 ) \
 /* RV32 Zicsr Standard Extension */ \
 IIF(RV32_HAS(Zicsr))( \
-_(csrrw, 0, 4, 0, ENC(rs1, rd)) \
+_(csrrw, 1, 4, 0, ENC(rs1, rd)) \
Contributor

If SYSTEM configuration is set, the Zicsr should be set accordingly.

Collaborator Author

If SYSTEM configuration is set, the Zicsr should be set accordingly.

Zicsr is enabled by default if no configuration file or ENABLE_Zicsr parameter is used. So, would noting this in the README be good enough?

Contributor

Zicsr is enabled by default if no configuration file or ENABLE_Zicsr parameter is used. So, would noting this in the README be good enough?

It would be good to document the two configuration options and clarify the dependency in the build system.

Collaborator Author

@ChinYikMing ChinYikMing Nov 4, 2024

Zicsr is enabled by default if no configuration file or ENABLE_Zicsr parameter is used. So, would noting this in the README be good enough?

It would be good to document the two configuration options and clarify the dependency in the build system.

Mentioned in the README, which now also notes that ENABLE_Zifencei, ENABLE_EXT_M, and ENABLE_EXT_A are mandatory. See 0915b75.
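
The dependency discussed in this thread (SYSTEM needs Zicsr, Zifencei, and the M and A extensions) could be checked in code as well as documented. The following is a minimal sketch, not code from this PR: the FEATURE_* bit flags and the missing_for_system() helper are hypothetical names invented for illustration, while the real build uses the ENABLE_* make variables.

```c
#include <stdint.h>

/* Hypothetical feature bits, for illustration only (not rv32emu names). */
enum {
    FEATURE_SYSTEM = 1u << 0,
    FEATURE_ZICSR = 1u << 1,
    FEATURE_ZIFENCEI = 1u << 2,
    FEATURE_EXT_M = 1u << 3,
    FEATURE_EXT_A = 1u << 4,
};

/* Features that must accompany SYSTEM, per the README note in this PR. */
#define SYSTEM_REQUIRED \
    (FEATURE_ZICSR | FEATURE_ZIFENCEI | FEATURE_EXT_M | FEATURE_EXT_A)

/* Return the required features that are missing from `enabled`.
 * The result is 0 when SYSTEM is off or every dependency is satisfied. */
static uint32_t missing_for_system(uint32_t enabled)
{
    if (!(enabled & FEATURE_SYSTEM))
        return 0;
    return SYSTEM_REQUIRED & ~enabled;
}
```

A caller could print the missing bits and abort the configuration step when the result is non-zero.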

@jserv
Contributor

jserv commented Oct 28, 2024

Can you exploit the prebuilt image files used by semu?

@ChinYikMing
Collaborator Author

Can you exploit the prebuilt image files used by semu?

Yes, that is the plan. Ultimately, the Image in the current build directory will be removed.

Contributor

@jserv jserv left a comment

Move the hardware model files such as UART and PLIC to the directory src/devices for maintenance purposes.

@jserv jserv added this to the release-2024.2 milestone Oct 28, 2024
@jserv

This comment was marked as outdated.

src/common.h (outdated, resolved)
@jserv

This comment was marked as resolved.

src/emulate.c (outdated, resolved)
@@ -1018,6 +1107,8 @@ static void __trap_handler(riscv_t *rv)
assert(insn);

rv_decode(ir, insn);
reloc_enable_mmu_jalr_addr = rv->PC;
Contributor

Is reloc_enable_mmu_jalr_addr generally available? It should be SYSTEM specific.

Collaborator Author

Yes, it is. Thanks for pointing that out.

Collaborator Author

ICYMI, is the __trap_handler already conditionally built by SYSTEM?
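
One way to keep such a variable SYSTEM-specific is to declare and use it only under the feature guard, so that non-SYSTEM builds cannot reference it by accident. The sketch below is an assumption-laden illustration, not the PR's code: FEATURE_SYSTEM, cpu_t, and note_decode_pc() are hypothetical stand-ins, and only the variable name reloc_enable_mmu_jalr_addr comes from the diff.

```c
#include <stdint.h>

/* Hypothetical stand-in for the project's feature switch. */
#ifndef FEATURE_SYSTEM
#define FEATURE_SYSTEM 1
#endif

/* Hypothetical, minimal core state. */
typedef struct {
    uint32_t PC;
} cpu_t;

#if FEATURE_SYSTEM
/* Declared only in SYSTEM builds, so other builds cannot touch it. */
static uint32_t reloc_enable_mmu_jalr_addr;
#endif

/* Record the current PC; compiles to a no-op without SYSTEM. */
static void note_decode_pc(const cpu_t *cpu)
{
#if FEATURE_SYSTEM
    reloc_enable_mmu_jalr_addr = cpu->PC;
#else
    (void) cpu; /* silence the unused-parameter warning */
#endif
}
```

If the enclosing function is already compiled only for SYSTEM builds, the inner guard is redundant and the declaration guard alone is enough.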

@ChinYikMing ChinYikMing force-pushed the feat/bring-up-linux branch 9 times, most recently from 261b2be to dd0b1c5 Compare November 2, 2024 15:18
- Modify the Makefile to detect the devices directory
- Decouple PLIC and UART into separate files
- Bind the RISC-V core to plic_t so that the PLIC can send interrupts
  to the core
@jserv
Contributor

jserv commented Nov 10, 2024

@jserv How about adding two files in build or test directory to store the versions of buildroot and the Linux kernel? The contents of the files would be as follows:

You can create a file containing the necessary version settings in the .ci/ directory.

The reason for separating the CI file detection rules is that building Buildroot and the Linux kernel takes time (more than 1 hour on a GitHub runner). Therefore, updates to small ELF executables should not trigger a rebuild of Buildroot and the Linux kernel.

Agree. Can you specify the explicit rules to trigger the builds for Linux kernel and/or rootfs?

@ChinYikMing
Collaborator Author

@jserv How about adding two files in build or test directory to store the versions of buildroot and the Linux kernel? The contents of the files would be as follows:

You can create a file containing the necessary version settings in the .ci/ directory.

Got it.

The reason for separating the CI file detection rules is that building Buildroot and the Linux kernel takes time (more than 1 hour on a GitHub runner). Therefore, updates to small ELF executables should not trigger a rebuild of Buildroot and the Linux kernel.

Agree. Can you specify the explicit rules to trigger the builds for Linux kernel and/or rootfs?

Yes, I will include the CI trigger rules in this PR.

With the Linux image and ELF executable already separated, it is
also necessary to decouple the tags. Add the suffix 'Linux-Image'
for the Linux image release artifact and 'ELF' for the test bench
ELF executable.

Additionally, include the Buildroot and Linux kernel version files
in .ci/. Updating either of these files will trigger a rebuild of
the Linux image artifact.
Contributor

@jserv jserv left a comment

Use eval to generalize variables.
See https://github.com/sysprog21/mado/blob/main/mk/common.mk

mk/external.mk (outdated, resolved)
mk/external.mk (outdated, resolved)
@ChinYikMing ChinYikMing force-pushed the feat/bring-up-linux branch 3 times, most recently from 64abe47 to f60e662 Compare November 11, 2024 13:28
@ChinYikMing
Collaborator Author

ChinYikMing commented Nov 11, 2024

Can you exploit the prebuilt image files used by semu?

Yes, that is the plan. Ultimately, the Image in the current build directory will be removed.

Use the released Linux image once it becomes available in rv32emu-prebuilt.

@ChinYikMing
Collaborator Author

ChinYikMing commented Nov 11, 2024

Action items:

  • Send pull request to semu for bumping to Linux v6.6.y, which is the latest longterm kernel. You have to make sure SMP configurations work as well. If not, report on semu. Once semu integrates Linux v6.6.y, rework the above build script here.

Let's stick with Linux v6.1.y in this PR and bump to v6.6.y in a follow-up PR.

@ChinYikMing ChinYikMing force-pushed the feat/bring-up-linux branch 2 times, most recently from 13062be to 6d8f3a3 Compare November 11, 2024 15:14
@ChinYikMing
Collaborator Author

ChinYikMing commented Nov 11, 2024

Clone the branch:

$ git clone https://github.com/ChinYikMing/rv32emu.git -b feat/bring-up-linux --depth 1

Change into the repository directory:

$ cd rv32emu

Fetch prebuilt Linux image and run:

$ make system ENABLE_SYSTEM=1 -j8

To exit the VM:

CTRL + a + x

Prebuilt Linux images are available now. Please give them a try. make check and the other CI jobs are currently broken because the prebuilt ELF tag does not yet carry the "-ELF" suffix; this needs to be confirmed with @vacantron.

Contributor

@jserv jserv left a comment

Change the Buildroot configuration to enable dhrystone and coremark for benchmarking purposes.

@jserv
Contributor

jserv commented Nov 11, 2024

Prebuilt Linux images are available now. Please give them a try.

I saw the following message repeated:

[    0.076716] remote fence extension is not available in SBI v0.3

Can you clarify this?

By the way, I attempted to run vi (an applet provided by Busybox), and the emulator crashed.

[    0.318814] Oops [#1]
[    0.318816] Modules linked in:
[    0.318818] CPU: 0 PID: 64 Comm: vi Not tainted 6.1.116 #1
[    0.318822] Hardware name: rv32emu (DT)
[    0.318825] epc : strncpy_from_user+0x6c/0x190
[    0.318829]  ra : getname_flags+0x74/0x194
[    0.318833] epc : c01fb6a8 ra : c00e95a8 sp : c0b07ea0
[    0.318836]  gp : c04da828 tp : c0abb600 t0 : 00000ff0
[    0.318840]  t1 : fefefeff t2 : 6917b420 s0 : c0b07eb0
[    0.318843]  s1 : c0851000 a0 : 00000000 a1 : 00000000
[    0.318847]  a2 : 00000ff0 a3 : 00000000 a4 : 00000000
[    0.318850]  a5 : 00000ff0 a6 : 00000022 a7 : c0851010
[    0.318854]  s2 : c0b07f38 s3 : 00000000 s4 : 00000000
[    0.318857]  s5 : c04db698 s6 : 00000000 s7 : 00000000
[    0.318860]  s8 : 00001000 s9 : 00000002 s10: 00000014
[    0.318864]  s11: ffffffff t3 : 80808080 t4 : 00040000
[    0.318867]  t5 : 00000005 t6 : 00000ff0
[    0.318870] status: 00040120 badaddr: 00000000 cause: 0000000d
[    0.318874] [<c01fb6a8>] strncpy_from_user+0x6c/0x190
[    0.318878] [<c00e95a8>] getname_flags+0x74/0x194
[    0.318883] [<c00e9718>] getname+0x1c/0x2c
[    0.318887] [<c00d7f18>] do_sys_openat2+0x4c/0xf0
[    0.318891] [<c00d80b8>] do_sys_open+0x40/0x58
[    0.318895] [<c00d8130>] sys_openat+0x24/0x34
[    0.318899] [<c0002464>] ret_from_syscall+0x0/0x4
[    0.318903] ---[ end trace 0000000000000000 ]---
[    0.318935] sh[62]: unhandled signal 11 code 0x1 at 0x00000040 in busybox[69016000+b6000]
[    0.318942] CPU: 0 PID: 62 Comm: sh Tainted: G      D            6.1.116 #1
[    0.318947] Hardware name: rv32emu (DT)
[    0.318949] epc : 00000040 ra : 00000040 sp : 9d4df530
[    0.318953]  gp : 690cdd14 tp : 9575d2c0 t0 : 0000000a
[    0.318956]  t1 : 6901d28c t2 : 00000001 s0 : 00000002
[    0.318960]  s1 : ffffffff a0 : fffffff2 a1 : 9d4df520
[    0.318963]  a2 : 9d4df5a0 a3 : 00000006 a4 : 9d4df8e8
[    0.318967]  a5 : 00000011 a6 : 00040000 a7 : 0000005f
[    0.318970]  s2 : 9d4df9dc s3 : 00000000 s4 : 690ce1a0
[    0.318974]  s5 : 00000001 s6 : 690ce1a0 s7 : 690cda60
[    0.318977]  s8 : 0000007f s9 : 00000001 s10: 9d4df9dc
[    0.318980]  s11: 00000004 t3 : 9568afc8 t4 : 00000080
[    0.318984]  t5 : 00000009 t6 : 690b1de8
[    0.318987] status: 00000020 badaddr: 00000040 cause: 0000000c

@ChinYikMing
Collaborator Author

ChinYikMing commented Nov 11, 2024

Prebuilt Linux images are available now. Please give them a try.

I saw the following message repeated:

[    0.076716] remote fence extension is not available in SBI v0.3

Can you clarify this?

I built the Linux kernel with an SMP-enabled configuration, so the kernel probes for the remote fence SBI extension (used to flush caches on other cores), but there is currently no corresponding SBI implementation. There are two ways to suppress the message:

  1. Implement a dummy remote fence SBI
  2. Disable the SMP configuration (this one is easier)

Nonetheless, the remote fence SBI is an essential future feature for accurately simulating SMP behavior. Also, note that the repeated message appears in semu as well.
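
Option 1 could look roughly like the sketch below. It is an illustration under stated assumptions rather than rv32emu's SBI code: sbi_ecall() and sbi_ret_t are hypothetical names, and the handler assumes a single hart, so a remote fence has no other hart to signal. The extension and function IDs and the error codes follow the RISC-V SBI specification (EID in a7, FID in a6, error returned in a0 and value in a1).

```c
#include <stdint.h>

#define SBI_EID_BASE 0x10
#define SBI_EID_RFENCE 0x52464E43 /* "RFNC" */
#define SBI_BASE_PROBE_EXTENSION 3

#define SBI_SUCCESS 0
#define SBI_ERR_NOT_SUPPORTED (-2)

/* Hypothetical return pair: error goes to a0, value to a1. */
typedef struct {
    int32_t error;
    int32_t value;
} sbi_ret_t;

/* Hypothetical dispatcher: eid is read from a7, fid from a6, arg0 from a0. */
static sbi_ret_t sbi_ecall(uint32_t eid, uint32_t fid, uint32_t arg0)
{
    sbi_ret_t ret = {SBI_ERR_NOT_SUPPORTED, 0};

    if (eid == SBI_EID_BASE && fid == SBI_BASE_PROBE_EXTENSION) {
        /* Report RFENCE as present so the kernel selects that path. */
        ret.error = SBI_SUCCESS;
        ret.value = (arg0 == SBI_EID_BASE || arg0 == SBI_EID_RFENCE) ? 1 : 0;
    } else if (eid == SBI_EID_RFENCE && fid <= 2) {
        /* remote_fence_i (0), remote_sfence_vma (1) and
         * remote_sfence_vma_asid (2): with one hart there is no remote
         * hart to notify, so the dummy simply reports success. */
        ret.error = SBI_SUCCESS;
    }
    return ret;
}
```

A real implementation would also have to apply the fence locally when the hart mask includes the calling hart, for example by invalidating any cached translations in the emulator.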

By the way, I attempted to run vi (an applet provided by Busybox), and the emulator crashed.

[    0.318814] Oops [#1]
[    0.318816] Modules linked in:
[    0.318818] CPU: 0 PID: 64 Comm: vi Not tainted 6.1.116 #1
[    0.318822] Hardware name: rv32emu (DT)
[    0.318825] epc : strncpy_from_user+0x6c/0x190
[    0.318829]  ra : getname_flags+0x74/0x194
[    0.318833] epc : c01fb6a8 ra : c00e95a8 sp : c0b07ea0
[    0.318836]  gp : c04da828 tp : c0abb600 t0 : 00000ff0
[    0.318840]  t1 : fefefeff t2 : 6917b420 s0 : c0b07eb0
[    0.318843]  s1 : c0851000 a0 : 00000000 a1 : 00000000
[    0.318847]  a2 : 00000ff0 a3 : 00000000 a4 : 00000000
[    0.318850]  a5 : 00000ff0 a6 : 00000022 a7 : c0851010
[    0.318854]  s2 : c0b07f38 s3 : 00000000 s4 : 00000000
[    0.318857]  s5 : c04db698 s6 : 00000000 s7 : 00000000
[    0.318860]  s8 : 00001000 s9 : 00000002 s10: 00000014
[    0.318864]  s11: ffffffff t3 : 80808080 t4 : 00040000
[    0.318867]  t5 : 00000005 t6 : 00000ff0
[    0.318870] status: 00040120 badaddr: 00000000 cause: 0000000d
[    0.318874] [<c01fb6a8>] strncpy_from_user+0x6c/0x190
[    0.318878] [<c00e95a8>] getname_flags+0x74/0x194
[    0.318883] [<c00e9718>] getname+0x1c/0x2c
[    0.318887] [<c00d7f18>] do_sys_openat2+0x4c/0xf0
[    0.318891] [<c00d80b8>] do_sys_open+0x40/0x58
[    0.318895] [<c00d8130>] sys_openat+0x24/0x34
[    0.318899] [<c0002464>] ret_from_syscall+0x0/0x4
[    0.318903] ---[ end trace 0000000000000000 ]---
[    0.318935] sh[62]: unhandled signal 11 code 0x1 at 0x00000040 in busybox[69016000+b6000]
[    0.318942] CPU: 0 PID: 62 Comm: sh Tainted: G      D            6.1.116 #1
[    0.318947] Hardware name: rv32emu (DT)
[    0.318949] epc : 00000040 ra : 00000040 sp : 9d4df530
[    0.318953]  gp : 690cdd14 tp : 9575d2c0 t0 : 0000000a
[    0.318956]  t1 : 6901d28c t2 : 00000001 s0 : 00000002
[    0.318960]  s1 : ffffffff a0 : fffffff2 a1 : 9d4df520
[    0.318963]  a2 : 9d4df5a0 a3 : 00000006 a4 : 9d4df8e8
[    0.318967]  a5 : 00000011 a6 : 00040000 a7 : 0000005f
[    0.318970]  s2 : 9d4df9dc s3 : 00000000 s4 : 690ce1a0
[    0.318974]  s5 : 00000001 s6 : 690ce1a0 s7 : 690cda60
[    0.318977]  s8 : 0000007f s9 : 00000001 s10: 9d4df9dc
[    0.318980]  s11: 00000004 t3 : 9568afc8 t4 : 00000080
[    0.318984]  t5 : 00000009 t6 : 690b1de8
[    0.318987] status: 00000020 badaddr: 00000040 cause: 0000000c

I have encountered the same issue. However, running vi xxx (where xxx is an arbitrary filename) works normally. I am trying to figure out the root cause.
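
For reading the register dumps above, the cause field is the RISC-V exception code: 0000000d in the vi Oops is a load page fault (with badaddr 00000000), and 0000000c in the sh dump is an instruction page fault (at 00000040). A small decoding helper, sketched here with the hypothetical name scause_name() and the codes from the RISC-V privileged specification, makes such logs easier to triage:

```c
#include <stdint.h>

/* Map a RISC-V exception code (interrupt bit clear) to its name.
 * scause_name is a hypothetical helper, not part of rv32emu. */
static const char *scause_name(uint32_t cause)
{
    switch (cause) {
    case 0x0: return "instruction address misaligned";
    case 0x1: return "instruction access fault";
    case 0x2: return "illegal instruction";
    case 0x3: return "breakpoint";
    case 0x4: return "load address misaligned";
    case 0x5: return "load access fault";
    case 0x6: return "store/AMO address misaligned";
    case 0x7: return "store/AMO access fault";
    case 0x8: return "environment call from U-mode";
    case 0x9: return "environment call from S-mode";
    case 0xc: return "instruction page fault";
    case 0xd: return "load page fault";
    case 0xf: return "store/AMO page fault";
    default:  return "reserved or unknown";
    }
}
```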

- Reuse mk/external.mk to verify the integrity of the sources.
- Allow the build-system-image Makefile target to be reused in the
  build-artifact CI.
- Adjust the download, extract, and verify functions in mk/external.mk,
  since git commands are now used to fetch Buildroot and Linux. Note
  that '*/.git/*' in a cloned repository has to be filtered out during
  SHA verification because its metadata differs on every clone.
- Store all Linux images in the build/linux-image directory.
Note that ENABLE_SYSTEM=1 shall be specified
@ChinYikMing
Collaborator Author

ChinYikMing commented Nov 12, 2024

The latest rv32emu-prebuilt release tag now has the -ELF suffix, so all CI tests pass.

After merging this PR, the new release of test benches will automatically have the suffix -ELF added.

@jserv
Contributor

jserv commented Nov 12, 2024

The latest rv32emu-prebuilt release tag now has the -ELF suffix, so all CI tests pass.

Why uppercase -ELF suffix?

@ChinYikMing
Collaborator Author

The latest rv32emu-prebuilt release tag now has the -ELF suffix, so all CI tests pass.

Why uppercase -ELF suffix?

I think it is just the usual convention when referring to the ELF format, but please let me know if you prefer something different.

A Linux image rebuild is triggered if and only if the Buildroot or
Linux kernel version is changed in mk/external.mk.
@Mes0903

Mes0903 commented Nov 12, 2024

Hi! I have observed random segmentation faults, kernel panics, and crashes. Roughly one out of every five or six runs hits one of these issues. The tests were conducted on commit ab8b756.

The command I used is:

make system ENABLE_SYSTEM=1 -j8

For the subsequent test runs, I used:

build/rv32emu -k build/linux-image/Image -i build/linux-image/rootfs.cpio -b build/minimal.dtb

Below is one of the kernel panic cases:

[    0.014183] Unable to handle kernel NULL pointer dereference at virtual address 00000008
[    0.014197] Oops [#1]
[    0.014203] Modules linked in:
[    0.014210] CPU: 0 PID: 1 Comm: swapper Not tainted 6.1.116 #2
[    0.014223] Hardware name: rv32emu (DT)
[    0.014230] epc : __rb_rotate_set_parents+0x0/0x58
[    0.014242]  ra : rb_insert_color+0xc4/0x154
[    0.014254] epc : c0313b54 ra : c031401c sp : c0861cb0
[    0.014265]  gp : c0476320 tp : c0844000 t0 : c09c9f20
[    0.014277]  t1 : 00000000 t2 : d7a9a567 s0 : c0861cc0
[    0.014287]  s1 : c09c9ec8 a0 : c09c9dd0 a1 : c09c9ed8
[    0.014298]  a2 : c09c9d94 a3 : c09c9ed8 a4 : 00000003
[    0.014309]  a5 : 00000000 a6 : 00000016 a7 : c035b560
[    0.014320]  s2 : 00000000 s3 : c0828034 s4 : c09c9d68
[    0.014330]  s5 : c047600c s6 : 00000000 s7 : 00000000
[    0.014341]  s8 : 00000008 s9 : 00000000 s10: 00000000
[    0.014352]  s11: 00000000 t3 : 00000004 t4 : 00000014
[    0.014361]  t5 : ed55a009 t6 : c09b57e6
[    0.014369] status: 00000120 badaddr: 00000008 cause: 0000000d
[    0.014381] [<c0313b54>] __rb_rotate_set_parents+0x0/0x58
[    0.014394] [<c031401c>] rb_insert_color+0xc4/0x154
[    0.014408] [<c010a224>] kernfs_link_sibling+0x54/0xf4
[    0.014421] [<c010b46c>] kernfs_add_one+0x88/0x14c
[    0.014434] [<c010d110>] __kernfs_create_file+0xb4/0xec
[    0.014448] [<c010df08>] sysfs_add_file_mode_ns+0xd4/0x124
[    0.014462] [<c010dfd8>] sysfs_create_file_ns+0x80/0x84
[    0.014475] [<c01f9a3c>] device_create_file+0x8c/0xac
[    0.014490] [<c01fd0dc>] device_add+0x41c/0x67c
[    0.014501] [<c01fd360>] device_register+0x24/0x38
[    0.014514] [<c01d1174>] tty_register_device_attr+0x174/0x210
[    0.014528] [<c01d122c>] tty_register_device+0x1c/0x2c
[    0.014542] [<c01d13a8>] tty_register_driver+0x16c/0x1d0
[    0.014555] [<c033cf64>] pty_init+0x164/0x3d0
[    0.014567] [<c000110c>] do_one_initcall+0x6c/0x260
[    0.014579] [<c032c0ac>] kernel_init_freeable+0x20c/0x210
[    0.014592] [<c0325a6c>] kernel_init+0x24/0x118
[    0.014605] [<c00023d0>] ret_from_exception+0x0/0x1c
[    0.014618] ---[ end trace 0000000000000000 ]---
[    0.014627] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    0.014640] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

Below is another crash example:

[    0.026278] Oops - Oops - load address misaligned [#1]
[    0.026289] Modules linked in:
[    0.026296] CPU: 0 PID: 6 Comm: kworker/u2:0 Not tainted 6.1.116 #2
[    0.026310] Hardware name: rv32emu (DT)
[    0.026318] Workqueue: events_unbound async_run_entry_fn
[    0.026333] epc : jbd2_journal_dirty_metadata+0x28/0x290
[    0.026346]  ra : __ext4_handle_dirty_metadata+0x90/0x204
[    0.026359] epc : c0161438 ra : c0114284 sp : c086dc70
[    0.026370]  gp : c0476320 tp : c0845b80 t0 : c0a1b048
[    0.026382]  t1 : 00000003 t2 : 8147ac9e s0 : c086dca0
[    0.026393]  s1 : c086dd68 a0 : 339dc50d a1 : c086dd68
[    0.026405]  a2 : 339dc50d a3 : 00000000 a4 : 61a20000
[    0.026416]  a5 : c082f0d1 a6 : 7c11977b a7 : 3be9185e
[    0.026427]  s2 : 00000000 s3 : 339dc50d s4 : 00000000
[    0.026438]  s5 : c042be78 s6 : c035c3f8 s7 : 000003a0
[    0.026449]  s8 : 00000001 s9 : 0000000b s10: c089d05f
[    0.026460]  s11: 00000000 t3 : c0880014 t4 : c0c18e84
[    0.026471]  t5 : 2771c19e t6 : c088001c
[    0.026480] status: 00000120 badaddr: 339dc50d cause: 00000004
[    0.026492] [<c0161438>] jbd2_journal_dirty_metadata+0x28/0x290
[    0.026506] [<c0114284>] __ext4_handle_dirty_metadata+0x90/0x204
[    0.026521] [<c012cda4>] ext4_getblk+0x290/0x2a4
[    0.026534] [<c00b9f94>] path_lookupat+0x60/0x154
[    0.026547] [<c00bab08>] filename_lookup+0xa0/0xf8
[    0.026560] [<c00baba0>] kern_path+0x40/0x68
[    0.026572] [<c03392b4>] init_chown+0x3c/0xa8
[    0.026585] [<c032cf10>] do_symlink+0x74/0xac
[    0.026598] [<c032cf88>] write_buffer+0x40/0x64
[    0.026611] [<c032d85c>] unpack_to_rootfs+0x298/0x2e4
[    0.026625] [<c032df54>] do_populate_rootfs+0x6c/0xd4
[    0.026639] [<c002630c>] async_run_entry_fn+0x3c/0xc4
[    0.026654] [<c001d500>] process_one_work+0x188/0x20c
[    0.026667] [<c001da04>] worker_thread+0x20c/0x268
[    0.026680] [<c002341c>] kthread+0xc0/0xc4
[    0.026693] [<c00023d0>] ret_from_exception+0x0/0x1c
[    0.026706] ---[ end trace 0000000000000000 ]---
[    0.033158] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    0.033467] printk: console [ttyS0] disabled
[    0.033483] f4000000.serial: ttyS0 at MMIO 0xf4000000 (irq = 1, base_baud = 312500) is a 16550
[    0.033504] printk: console [ttyS0] enabled
[    0.033504] printk: console [ttyS0] enabled
[    0.033521] printk: bootconsole [ns16550] disabled
[    0.033521] printk: bootconsole [ns16550] disabled
[    0.033808] clk: Disabling unused clock

Below is another segmentation fault example:

[    0.065393] Freeing unused kernel image (initmem) memory: 160K
[    0.065408] Kernel memory protection not selected by kernel config.
[    0.065422] Run /init as init process
[    0.074177] ln[29]: unhandled signal 11 code 0x1 at 0x9b779c64 in ld-linux-riscv32-ilp32.so.1[95729000+27000]
[    0.074207] CPU: 0 PID: 29 Comm: ln Not tainted 6.1.116 #2
[    0.074222] Hardware name: rv32emu (DT)
[    0.074232] epc : 9573ec38 ra : 9573e030 sp : 9d230dd0
[    0.074246]  gp : 6915cd14 tp : 957782c0 t0 : 0000000a
[    0.074259]  t1 : 9d230df0 t2 : 00000000 s0 : 9d230e50
[    0.074274]  s1 : 95729ab8 a0 : 00009ed6 a1 : 0000009e
[    0.074287]  a2 : 95729ad0 a3 : 0000000a a4 : 9b779c63
[    0.074301]  a5 : 9ed66737 a6 : 2f2f2f2f a7 : 00000001
[    0.074315]  s2 : 9d25bbe0 s3 : 00000001 s4 : 95752008
[    0.074328]  s5 : 95729000 s6 : 9d25bc7c s7 : 95729000
[    0.074342]  s8 : 95751008 s9 : 95729ac4 s10: 957293ac
[    0.074356]  s11: 0000fff1 t3 : 009ed667 t4 : fffffffc
[    0.074370]  t5 : 00000035 t6 : 0000000b
[    0.074381] status: 00000020 badaddr: 9b779c64 cause: 0000000f
Segmentation fault (core dumped)
make: *** [mk/system.mk:27: system] Error 139

@Mes0903

Mes0903 commented Nov 16, 2024

I have identified several issues here.

A segmentation fault occurs in the mmu_write_b function, specifically in get_ppn_and_offset, where the value of pte can be 0x0. Since pte is dereferenced in get_ppn_and_offset, this causes a segmentation fault, which is also the reason for the "Unable to handle kernel NULL pointer dereference at virtual address 00000040" message.
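
A defensive pattern for this class of crash is to let the page walk return NULL for an absent mapping and have the caller raise a page fault instead of dereferencing the result. The sketch below does not reproduce rv32emu's get_ppn_and_offset or mmu_write_b: guest_mem, sv32_walk(), and sv32_translate() are hypothetical, memory is a small flat array, and Sv32's 34-bit physical addresses are truncated to 32 bits for brevity.

```c
#include <stddef.h>
#include <stdint.h>

#define PTE_V 0x1u
#define PTE_R 0x2u
#define PTE_W 0x4u
#define PTE_X 0x8u
#define PAGE_SHIFT 12

/* Hypothetical flat guest memory: 16 KiB, indexed in 32-bit words. */
#define MEM_WORDS (4u * 1024u)
static uint32_t guest_mem[MEM_WORDS];

/* Walk an Sv32 table rooted at physical page `root_ppn`. Returns the leaf
 * PTE and sets *level (1 = 4 MiB superpage, 0 = 4 KiB page), or NULL when
 * the mapping is absent or malformed. */
static uint32_t *sv32_walk(uint32_t root_ppn, uint32_t vaddr, int *level)
{
    uint32_t table = root_ppn << PAGE_SHIFT;
    for (int i = 1; i >= 0; i--) {
        uint32_t vpn = (vaddr >> (PAGE_SHIFT + 10 * i)) & 0x3ffu;
        uint32_t addr = table + vpn * 4;
        if (addr / 4 >= MEM_WORDS)
            return NULL; /* table lies outside guest memory */
        uint32_t *pte = &guest_mem[addr / 4];
        if (!(*pte & PTE_V) || (!(*pte & PTE_R) && (*pte & PTE_W)))
            return NULL; /* invalid or reserved encoding */
        if (*pte & (PTE_R | PTE_X)) {
            if (i == 1 && ((*pte >> 10) & 0x3ffu))
                return NULL; /* misaligned superpage */
            *level = i;
            return pte; /* leaf PTE found */
        }
        table = (*pte >> 10) << PAGE_SHIFT; /* descend to the next level */
    }
    return NULL; /* pointer PTE at the last level */
}

/* Translate `vaddr`. Returns 0 and fills *paddr on success, or -1 so the
 * caller can raise a page fault rather than dereference a NULL PTE. */
static int sv32_translate(uint32_t root_ppn, uint32_t vaddr, uint32_t *paddr)
{
    int level;
    uint32_t *pte = sv32_walk(root_ppn, vaddr, &level);
    if (!pte)
        return -1;
    uint32_t offset_mask = level ? 0x3fffffu : 0xfffu;
    uint32_t base = (*pte >> 10) << PAGE_SHIFT;
    *paddr = (base & ~offset_mask) | (vaddr & offset_mask);
    return 0;
}
```

With this shape, a store helper would check the return value of sv32_translate() and deliver a store/AMO page fault (cause 0xf) to the guest when it is -1.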

Also, the assert(insn) in the block_translate function fails sporadically, resulting in the message "Unable to handle kernel access to user memory without uaccess routines at virtual address."

Additionally, the program randomly becomes unresponsive. In such cases, it is stuck in the following code and appears to be spinning in an infinite loop:

/* BNE: Branch if Not Equal */
RVOP(
    bne,
    { BRANCH_FUNC(uint32_t, ==); },
    GEN({
        rald2, rs1, rs2;
        cmp, VR1, VR0;
        break;
        setjmpoff;
        jcc, 0x85;
        cond, branch_untaken;
        jmp, pc, 4;
        end;
        ldimm, TMP, pc, 4;
        st, S32, TMP, PC;
        exit;
        jmpoff;
        cond, branch_taken;
        jmp, pc, imm;
        end;
        ldimm, TMP, pc, imm;
        st, S32, TMP, PC;
        exit;
    }))

When stuck in this code, the local variables PC and cycle increase in a regular pattern.

Below is the log at the time of the segmentation fault:

[    0.249663] printk: bootconsole [ns16550] disabled
[    0.250717] clk: Disabling unused clocks
[    0.251025] Freeing unused kernel image (initmem) memory: 160K
[    0.251072] Kernel memory protection not selected by kernel config.
[    0.251111] Run /init as init process
[    0.263080] mount[22]: unhandled signal 11 code 0x1 at 0x9b7f6c64 in ld-linux-riscv32-ilp32.so.1[957a6000+27000]
[    0.263175] CPU: 0 PID: 22 Comm: mount Not tainted 6.1.116 #2
[    0.263227] Hardware name: rv32emu (DT)
[    0.263259] epc : 957bbc38 ra : 957bb030 sp : 9d4b3de0
[    0.263306]  gp : 690f1d14 tp : 957282c0 t0 : 0000000a
[    0.263351]  t1 : 9d4b3e00 t2 : 00000000 s0 : 9d4b3e60
[    0.263397]  s1 : 957a6ab8 a0 : 00009ede a1 : 0000009e
[    0.263442]  a2 : 957a6ad0 a3 : 0000000a a4 : 9b7f6c63
[    0.263487]  a5 : 9ede3737 a6 : 2f2f2f2f a7 : 00000001
[    0.263533]  s2 : 9d41bbf0 s3 : 00000001 s4 : 957cf008
[    0.263578]  s5 : 957a6000 s6 : 9d41bc7c s7 : 957a6000
[    0.263624]  s8 : 957ce008 s9 : 957a6ac4 s10: 957a63ac
[    0.263671]  s11: 0000fff1 t3 : 009ede37 t4 : fffffffc
[    0.263717]  t5 : 00000035 t6 : 0000000b
[    0.263752] status: 00000020 badaddr: 9b7f6c64 cause: 0000000f
AddressSanitizer:DEADLYSIGNAL
=================================================================
==1213562==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x555555598401 bp 0x7ffff38af500 sp 0x7fffffffd780 T0)
==1213562==The signal is caused by a READ memory access.
==1213562==Hint: address points to the zero page.
    #0 0x555555598401 in mmu_write_b src/system.c:392
    #1 0x5555555750fe in do_sb src/rv32_template.c:639
    #2 0x5555555628f9 in rv_step src/emulate.c:1075
    #3 0x5555555628f9 in rv_run src/riscv.c:500
    #4 0x5555555628f9 in main src/main.c:279
    #5 0x7ffff722a1c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #6 0x7ffff722a28a in __libc_start_main_impl ../csu/libc-start.c:360
    #7 0x5555555663a4 in _start (/home/mes/MesRepo/Mes-rv32emu/rv32emu/build/rv32emu+0x123a4) (BuildId: e0992c4435c27bffa4166ed19d915866b583f6fc)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV src/system.c:392 in mmu_write_b
==1213562==ABORTING

@ChinYikMing
Collaborator Author

@Mes0903 Hi, thanks for the thorough testing, much appreciated! The get_ppn_and_offset function should work correctly, assuming the PTE is valid at the time it is used (I might add assertions to ensure the PTE's validity). However, this assumption does not hold in your test case.

Upon investigation, some page faults are successfully detected and handled by the do_page_fault function in the kernel. Ideally, this trap handler remaps the PTE if it is absent or performs other VMA-related checks. If something goes wrong, a user-space process might, for example, receive SIGSEGV and terminate, while a kernel thread could end up in a dead state (refer to die_kernel_fault). Tracing do_page_fault in greater detail could help.
