Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bl2 hang on mt7981 with BOOT_DEVICE=emmc (clang vs gcc) #8

Open
arachsys opened this issue Jun 6, 2024 · 3 comments
Open

bl2 hang on mt7981 with BOOT_DEVICE=emmc (clang vs gcc) #8

arachsys opened this issue Jun 6, 2024 · 3 comments

Comments

@arachsys
Copy link

arachsys commented Jun 6, 2024

I spotted something interesting while compiling a bootloader and trusted-firmware for an mt7981 device, but haven't been able to debug it yet. I'll have a go at digging into it over the weekend if I get a chance.

Like ARM upstream, mtk-openwrt atf's Makefile includes support for building with clang as well as gcc.

bl31 + u-boot work fine when built with clang, as does bl2 with BOOT_DEVICE=ram RAM_BOOT_UART_DL=1. However, built with clang and BOOT_DEVICE=emmc, bl2 hangs just before locating the fip partition:

NOTICE:  BL2: v2.10.0  (release):f9e4748a-dirty
NOTICE:  BL2: Built : 10:00:11, Jun  6 2024
INFO:    BL2: Doing platform setup
NOTICE:  WDT: Cold boot
NOTICE:  WDT: disabled
NOTICE:  EMI: Using DDR4 settings
NOTICE:  EMI: Detected DRAM size: 512MB
NOTICE:  EMI: complex R/W mem test passed
NOTICE:  CPU: MT7981 (1300MHz)
INFO:    MediaTek MMC/SD Card Controller ver 20201125, eco 0
INFO:    Patching MBR num of sectors: 0xffffffff -> 0xe8ffff
[hang]

Built with gcc instead, it continues normally:

NOTICE:  BL2: v2.10.0  (release):f9e4748a-dirty
NOTICE:  BL2: Built : 09:06:56, Jun  6 2024
INFO:    BL2: Doing platform setup
NOTICE:  WDT: [40000000] Software reset (reboot)
NOTICE:  EMI: Using DDR4 settings
NOTICE:  EMI: Detected DRAM size: 512MB
NOTICE:  EMI: complex R/W mem test passed
NOTICE:  CPU: MT7981 (1300MHz)
INFO:    MediaTek MMC/SD Card Controller ver 20201125, eco 0
INFO:    Patching MBR num of sectors: 0xffffffff -> 0xe8ffff
INFO:    Located partition 'fip' at 0x680000, size 0x200000
INFO:    BL2: Loading image id 3
INFO:    Loading image id=3 at address 0x40800000
INFO:    Image id=3 loaded: 0x40800000 - 0x40808148
INFO:    BL2: Loading image id 5
INFO:    Loading image id=5 at address 0x40800000
INFO:    Image id=5 loaded: 0x40800000 - 0x40867068
NOTICE:  BL2: Booting BL31
INFO:    Entry point address = 0x43001000
INFO:    SPSR = 0x3cd
NOTICE:  BL31: v2.10.0  (release):f9e4748a-dirty
[...]

My exact build command for reproduction purposes is

make CC=clang/gcc E=0 LD=ld.lld/ld STATIC=1 \
  PLAT=mt7981 BOARD_BGA=1 DRAM_USE_DDR4=1 BOOT_DEVICE=emmc \
  USE_MKIMAGE=1 MKIMAGE=../u-boot/tools/mkimage bl2

where CC is either clang 18.1.5 or gcc aarch64 14.1.0, and LD is set to ld.lld or ld to match.

@arachsys
Copy link
Author

arachsys commented Jun 7, 2024

A bit of INFO() debugging implies it locks up in load_mbr_header() in drivers/partition/partition.c. The MBR boot signature check succeeds:

	plat_patch_mbr_header(mbr_sector);

	/* Check MBR boot signature. */
	if ((mbr_sector[LEGACY_PARTITION_BLOCK_SIZE - 2] != MBR_SIGNATURE_FIRST) ||
	    (mbr_sector[LEGACY_PARTITION_BLOCK_SIZE - 1] != MBR_SIGNATURE_SECOND)) {
		VERBOSE("MBR boot signature failure\n");
		return -ENOENT;
	}

and then

	tmp = (mbr_entry_t *)(&mbr_sector[MBR_PRIMARY_ENTRY_OFFSET]);

gives tmp = 0x241cae (on my board), but the attempt to dereference tmp->first_lba in the next line locks up bl2. (An INFO("Dereference %u\n", tmp->first_lba) is enough to lockup.)

Built with gcc, tmp = 0x23d5de (bl2 is slightly smaller), dereferencing tmp->first_lba succeeds, and boot continues normally.

For completeness, the hang happens when bl2 is compiled with clang and linked with binutils ld, but doesn't happen when bl2 is compiled with gcc and linked with lld, so the choice of compiler is significant and the choice of linker is not.

@arachsys
Copy link
Author

arachsys commented Jun 7, 2024

Ah, I'm an idiot - it's staring me in the face. tmp is 0x241cae so tmp->first_lba will be an unaligned access. Sure enough, the obvious

diff --git a/drivers/partition/partition.c b/drivers/partition/partition.c
index 1436ddd3..6913d8bd 100644
--- a/drivers/partition/partition.c
+++ b/drivers/partition/partition.c
@@ -56,7 +56,7 @@ static int load_mbr_header(uintptr_t image_handle, mbr_entry_t *mbr_entry)
 {
        size_t bytes_read;
        int result;
-       mbr_entry_t *tmp;
+       mbr_entry_t tmp;

        assert(mbr_entry != NULL);
        /* MBR partition table is in LBA0. */
@@ -81,19 +81,19 @@ static int load_mbr_header(uintptr_t image_handle, mbr_entry_t *mbr_entry)
                return -ENOENT;
        }

-       tmp = (mbr_entry_t *)(&mbr_sector[MBR_PRIMARY_ENTRY_OFFSET]);
+       memcpy(&tmp, mbr_sector + MBR_PRIMARY_ENTRY_OFFSET, sizeof tmp);

-       if (tmp->first_lba != 1) {
+       if (tmp.first_lba != 1) {
                VERBOSE("MBR header may have an invalid first LBA\n");
                return -EINVAL;
        }

-       if ((tmp->sector_nums == 0) || (tmp->sector_nums == UINT32_MAX)) {
+       if ((tmp.sector_nums == 0) || (tmp.sector_nums == UINT32_MAX)) {
                VERBOSE("MBR header entry has an invalid number of sectors\n");
                return -EINVAL;
        }

-       memcpy(mbr_entry, tmp, sizeof(mbr_entry_t));
+       memcpy(mbr_entry, &tmp, sizeof(mbr_entry_t));
        return 0;
 }

fixes it and it boots normally. Presumably gcc's -mstrict-align papers over this UB whereas clang's doesn't.

@arachsys
Copy link
Author

arachsys commented Jul 5, 2024

The bug fix is now in upstream ARM trusted-firmware-a:

ARM-software@21a77e0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant