This repository has been archived by the owner on Feb 27, 2023. It is now read-only.

Sdcard stuck trying to write the image #487

Open
carlonluca opened this issue Apr 6, 2018 · 23 comments

Comments

@carlonluca

Hello! I'm using NOOBS for a project. Sometimes the image write procedure suddenly blocks. When it is stuck, the system is still up and running: I can log in via ssh (I added it to buildroot), and the recovery application works and responds properly. Only the writing thread of the recovery app (the one that wgets and untars) is stuck.
In that situation I logged in via ssh and found that any attempt to write data to the sdcard deadlocks the writing process. I tried to simply dd 1MB onto the sdcard and dd never finished. The only solution is to reboot. After a reboot everything is back to normal and the operation typically completes; the system works properly from then on. This happens from time to time with many devices and sdcards. dmesg doesn't show any error from the kernel.
Any idea what may be causing this? Has anyone seen this behaviour before? Thanks!
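
For anyone who wants to repeat the dd check without losing the ssh session to a blocked command, here is a minimal sketch. It assumes a dd that supports `conv=fsync` and a scratch file on a mounted SD card partition; the function name `probe_write` is made up for illustration and is not part of NOOBS.

```shell
# probe_write TARGET [TIMEOUT_SECS]
# Writes 1 MiB of zeros to TARGET in a background subshell and polls for
# completion, so the calling shell stays usable even if the write blocks.
# TARGET should be a scratch file on the card, NOT the raw block device.
# Returns dd's exit status if it finished in time, or 2 if it is still running.
probe_write() {
    target=$1
    limit=${2:-10}
    # The marker lives in /tmp, so checking it does not touch the SD card.
    marker=$(mktemp) || return 1
    (
        dd if=/dev/zero of="$target" bs=1M count=1 conv=fsync 2>/dev/null
        echo $? > "$marker"
    ) &
    waited=0
    # The marker stays empty until dd has returned.
    while [ ! -s "$marker" ]; do
        if [ "$waited" -ge "$limit" ]; then
            echo "write to $target still blocked after ${limit}s"
            return 2
        fi
        sleep 1
        waited=$((waited + 1))
    done
    status=$(cat "$marker")
    rm -f "$marker"
    return "$status"
}
```

Calling `probe_write /mnt/probe.bin 10` (with `/mnt` standing in for wherever the partition is mounted) returns 2 when the write is still pending after ten seconds, which matches the stuck state described above.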

@procount
Contributor

procount commented Apr 6, 2018

I have also seen this occasional behaviour from my PINN variant, but as there are no error messages, it is not easy to see what has gone wrong.
What models of RPi have you seen this happen on?
What version of NOOBS did it happen on?
How were you connected to the internet - Ethernet or wifi / built-in or external (which)?

@carlonluca
Author

I'm using noobs only on Pi3.
I've been using 9a4547c, but I also tried latest master, where I see kernel and firmware files were updated: I can reproduce the same behaviour. After the image is written, the image itself works perfectly fine. Never had problems writing to the sdcard.
I download from LAN using the regular ethernet interface. It does not seem to be a network/server issue as everything seems to be working via ssh, except writing to the sdcard.

@procount
Contributor

procount commented Apr 6, 2018

Hmm, I've not seen it happen on v2.4 (or PINN equivalent), or earlier versions.
I hope @XECDesign can come up with some ideas on how to debug this to identify where the failure comes from - Ethernet, wget, xz, bsdtar, SDcard driver?
Which OS caused it to stick? (Just wondering if the type of download/tar/compression affects it)

@carlonluca
Author

The image I write is a custom image based on Raspbian. I use xz compression. I tried adding the following to the kernel config:

CONFIG_STACKTRACE_SUPPORT=y
CONFIG_STACKTRACE=y
CONFIG_USER_STACKTRACE_SUPPORT=y

but still I cannot get any log.

@procount
Contributor

procount commented Apr 6, 2018

Probably because it has not actually crashed, but just got stuck somewhere... 🤷‍♂️

@carlonluca
Author

I also tried with CONFIG_DETECT_HUNG_TASK. As far as I remember, the kernel should also be able to print a stack trace when a task hangs, but I'm not sure whether that is optional or whether these options enable it properly...

I don't remember ever seeing anything similar in raspbian, so I guess it is probably useless to ask in https://github.com/raspberrypi/linux right?
Thanks for your help.
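
On the hung-task question above: a task blocked on I/O normally sits in uninterruptible sleep (state `D`), and that can be inspected from the ssh session even when dmesg is silent. The sketch below is only an illustration. The function name `list_blocked_tasks` is made up, reading `/proc/<pid>/stack` needs root and a kernel built with CONFIG_STACKTRACE, and `wchan` may be empty on some kernels.

```shell
# list_blocked_tasks [PROC_ROOT]
# Prints every task in uninterruptible sleep (state D), plus its wait channel
# and, where the kernel exposes it, its kernel stack.
list_blocked_tasks() {
    proc=${1:-/proc}
    for dir in "$proc"/[0-9]*; do
        [ -r "$dir/status" ] || continue
        state=$(awk '/^State:/ { print $2; exit }' "$dir/status" 2>/dev/null)
        [ "$state" = "D" ] || continue
        name=$(awk '/^Name:/ { print $2; exit }' "$dir/status" 2>/dev/null)
        echo "pid=${dir##*/} name=$name wchan=$(cat "$dir/wchan" 2>/dev/null)"
        # Kernel stack of the blocked task; silently skipped if unavailable.
        cat "$dir/stack" 2>/dev/null
    done
    return 0
}
```

If the kernel was built with CONFIG_DETECT_HUNG_TASK, the detection timeout can be lowered with `echo 30 > /proc/sys/kernel/hung_task_timeout_secs`. With CONFIG_MAGIC_SYSRQ enabled, `echo w > /proc/sysrq-trigger` dumps all blocked tasks to the kernel log on demand.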

@lurch
Collaborator

lurch commented Apr 6, 2018

Might be worth investigating if it only happens with images that are wget-ed and extracted (i.e. the way that NOOBS Lite installs Raspbian), or also happens with images extracted directly from the SD card (i.e. the way that full NOOBS installs Raspbian)?

@procount
Contributor

procount commented Apr 6, 2018

IIRC, I've only seen it when downloading, but didn't take note whether it was ethernet or wifi.

@procount
Contributor

procount commented Apr 6, 2018

Ah. It happened tonight in PINN v2.5.4 when installing Retropie on a 3B+ from a USB stick.
Normally ctrl-alt-f2 followed by ctrl-alt-del would reboot it, but not in this stuck state

@carlonluca
Author

I tried "reboot -f" once and it didn't work. The watchdog, however, did manage to reboot the board the other day.
Does anyone know what could be enabled in the kernel to get log messages for debugging?

Were you using xz compression by any chance? I typically use xz (-9) and it happens frequently. xz -9 requires a lot of memory, so I then tried with gz. It took me some time, but in the end I could reproduce it with gz as well. Difficult to say whether it makes any difference or not.

@procount
Contributor

procount commented Apr 7, 2018

My retropie image was compressed with xz but with standard compression not -9 cos it uses too much memory.

@carlonluca
Author

This is what memory usage looks like during the procedure with gz:

# free -m
             total         used         free       shared      buffers
Mem:           231          224            6            0           13
-/+ buffers:                210           20
Swap:            0            0            0
# cat /proc/meminfo 
MemTotal:         236676 kB
MemFree:           16740 kB
MemAvailable:     178028 kB
Buffers:           21788 kB
Cached:           128740 kB
SwapCached:            0 kB
Active:            45064 kB
Inactive:         112384 kB
Active(anon):       7032 kB
Inactive(anon):      220 kB
Active(file):      38032 kB
Inactive(file):   112164 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:              9164 kB
Writeback:          5116 kB
AnonPages:          6912 kB
Mapped:            15676 kB
Shmem:               336 kB
Slab:              25380 kB
SReclaimable:      18196 kB
SUnreclaim:         7184 kB
KernelStack:         872 kB
PageTables:          352 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:      118336 kB
Committed_AS:      44396 kB
VmallocTotal:    1835008 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
CmaTotal:           8192 kB
CmaFree:            3912 kB

I see that only 231MB of memory is available. Is this written somewhere? This is a Pi 3, so 231MB does not seem to be the correct value, does it? Is the limit set somewhere in the sources?
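
The Dirty and Writeback counters in the dump above are the ones to watch while the writer appears stuck. A small sketch for sampling them follows. The helper names `meminfo_kb` and `pending_writeback_kb` are made up, and reading the result as "writeback is not draining" is an assumption, not a confirmed diagnosis.

```shell
# meminfo_kb KEY [FILE]
# Prints the value in kB of one /proc/meminfo field, e.g. "Dirty".
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2; exit }' "${2:-/proc/meminfo}"
}

# pending_writeback_kb [FILE]
# Prints Dirty + Writeback in kB: page cache data still waiting to reach storage.
pending_writeback_kb() {
    file=${1:-/proc/meminfo}
    echo $(( $(meminfo_kb Dirty "$file") + $(meminfo_kb Writeback "$file") ))
}
```

Sampling with `while sleep 5; do pending_writeback_kb; done` during a hang shows whether the total stays flat and non-zero. For the dump above it is 9164 + 5116 = 14280 kB.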

@maxnet
Collaborator

maxnet commented Apr 9, 2018

I see that only 231MB of memory is available. Is this written somewhere?

Put start.elf/fixup.dat on the SD card instead of recovery.elf if you need access to more.

@carlonluca
Author

Thank you for your answer. I read in the wiki:

Running recovery.elf then switches the firmware into "NOOBS mode" - it uses recovery.img instead of kernel.img, recovery.cmdline instead of cmdline.txt, and it sets the root filesystem to recovery.rfs.

So does it mean I cannot use start.elf/fixup.dat with noobs?

@lurch
Collaborator

lurch commented Apr 9, 2018

I can't remember the details now, but IIRC you can tweak some settings in config.txt to get it to read the NOOBS-named files.

@carlonluca
Author

carlonluca commented Apr 9, 2018

Ah thanks, so I should:

  1. remove recovery.elf;
  2. put start.elf and fixup.dat in boot;
  3. in config.txt set cmdline=recovery.cmdline and kernel=recovery.kernel.

Is this correct? What I'm missing according to the wiki is how to set rootfs to recovery.rfs. Also can I extract start.elf and fixup.dat from any Raspbian image?
I guess increasing ram won't change anything, but I'm not sure what else I could try.

@lurch
Collaborator

lurch commented Apr 9, 2018

IIRC recovery.rfs is an initramfs, if that helps.
You can also get the files you need from https://github.com/raspberrypi/firmware/tree/master/boot

@carlonluca
Author

I tried with this in config.txt but I'm getting a kernel panic (cannot mount root fs):

cmdline=recovery.cmdline
kernel=recovery7.img
initramfs=recovery.rfs
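
One thing that may be worth checking in the config above: in the Raspberry Pi firmware's config.txt, `initramfs` is the one option that takes space-separated arguments rather than `=`. With `initramfs=recovery.rfs` the ramdisk may never be loaded, which would fit a "cannot mount root fs" panic. The sketch below writes the variant with the documented syntax. It has not been tested with NOOBS, the function name and the `BOOT_DIR` argument are made up for illustration, and it overwrites any existing config.txt.

```shell
# write_recovery_config BOOT_DIR
# Writes a config.txt that boots the NOOBS recovery kernel via start.elf.
# Note that the initramfs line uses a space, not "="; "followkernel" asks the
# firmware to load the ramdisk directly after the kernel image.
write_recovery_config() {
    cat > "$1/config.txt" <<'EOF'
cmdline=recovery.cmdline
kernel=recovery7.img
initramfs recovery.rfs followkernel
EOF
}
```

For example, `write_recovery_config /media/boot` with the card's boot partition mounted at `/media/boot` on another machine (a hypothetical path).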

@lurch
Collaborator

lurch commented Apr 9, 2018

It's a long long time since I played with any of this, but I think @procount might have more recent experience?

@XECDesign
Contributor

Potentially useful things to try to get more info:

You may learn something by enabling the driver's logging feature, which will record activity in the kernel message log. Add dtparam=sd_debug=on to config.txt and reboot. You can also eliminate a DMA problem as being the cause (at the cost of some performance) by adding dtparam=sd_force_pio=on.

raspberrypi/linux#2500 (comment)

@procount
Contributor

procount commented May 24, 2018

I turned on the logging in PINN but kept DMA.
I was writing Raspbian from USB to the SD card on a Pi3B using Linux recovery 4.14.37-rescue-v7, but with oldish firmware (31st March). (EDIT: I suppose it was 22de0bb68d34fd210ba9d086c6a1fc5e90f0bfbb)

tail -f /tmp/debug

Executing: "/sbin/mkfs.fat -n prjboot -F 32 /dev/mmcblk0p6" 
Executing: "sh -o pipefail -c "xz -dc /tmp/media/sda1/os/Raspbian/boot.tar.xz | bsdtar -xf - -C /mnt2  --no-same-owner "" 
finished writing filesystem in 1.643 seconds 
Executing: "/usr/sbin/mkfs.ext4 -L prjroot -O ^huge_file /dev/mmcblk0p7" 
Executing: "sh -o pipefail -c "xz -dc /tmp/media/sda1/os/Raspbian/root.tar.xz | bsdtar -xf - -C /mnt2 "" 

tail -f /tmp/messages

Jan  1 00:01:22 recovery kern.info kernel: [   82.882741] mmc0: cmd 13 0xaaaa0000 (flags 0x195)
Jan  1 00:01:22 recovery kern.info kernel: [   82.882790] mmc0: cmd 25 0xc3ca50 (flags 0xb5) - write 760*512
Jan  1 00:01:22 recovery kern.info kernel: [   82.917723] mmc0: cmd 13 0xaaaa0000 (flags 0x195)
Jan  1 00:01:22 recovery kern.info kernel: [   82.917771] mmc0: cmd 25 0xc3cd48 (flags 0xb5) - write 648*512
Jan  1 00:01:22 recovery kern.info kernel: [   82.948890] mmc0: cmd 13 0xaaaa0000 (flags 0x195)
Jan  1 00:01:22 recovery kern.info kernel: [   82.948926] mmc0: cmd 25 0xc3cfd0 (flags 0xb5) - write 816*512
Jan  1 00:01:23 recovery kern.info kernel: [   82.999782] mmc0: cmd 13 0xaaaa0000 (flags 0x195)
Jan  1 00:01:23 recovery kern.info kernel: [   82.999834] mmc0: cmd 25 0xc3d300 (flags 0xb5) - write 1024*512
Jan  1 00:01:23 recovery kern.info kernel: [   83.043922] mmc0: cmd 13 0xaaaa0000 (flags 0x195)
Jan  1 00:01:23 recovery kern.info kernel: [   83.043959] mmc0: cmd 25 0xc3d700 (flags 0xb5) - write 1024*512

Nothing unusual in the logs :(
The slides were still changing, the language and keyboard dialog was still responsive, as was ssh. Just the Imagewritethread seemed to have stopped.
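
With sd_debug on, the log grows quickly, so it can help to reduce it to a few numbers and compare the last write timestamp against the moment the install stalled. A minimal sketch follows, assuming the line format shown in the excerpt above (cmd 25 is the SD multi-block write command). The function name `summarize_mmc_writes` is made up for illustration.

```shell
# summarize_mmc_writes LOGFILE
# Counts the multi-block write commands (cmd 25) logged by the SD driver,
# sums the bytes they requested, and reports the kernel timestamp of the last one.
summarize_mmc_writes() {
    awk '
        /mmc0: cmd 25 .* write [0-9]+\*[0-9]+/ {
            # The last field looks like "760*512": block count * block size.
            split($NF, dims, "*")
            bytes += dims[1] * dims[2]
            count++
            # Pull the kernel timestamp out of "[   82.882790]".
            ts = $0
            sub(/^[^[]*\[ */, "", ts)
            sub(/\].*$/, "", ts)
            last = ts
        }
        END { printf "writes=%d bytes=%.0f last_ts=%s\n", count, bytes, last }
    ' "$1"
}
```

Run as `summarize_mmc_writes /tmp/messages`. On the ten lines posted above it reports 5 writes totalling 2187264 bytes, the last at 83.043959. If that timestamp stops advancing while the installer still shows progress pending, no further write requests are reaching the driver, which would point above the SD driver rather than at the card itself.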

@carlonluca
Author

carlonluca commented May 31, 2018

I experienced exactly the same behavior. Nothing unusual in the logs related to the sdcard. I kept DMA as well.

@carlonluca
Author

Also on Pi 3 B+ the same is happening. Everything seems to be working properly but it seems the thread writing to the sd card is stuck.
