Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows: complete rewrite of drive restoration and image writing #740

Merged
merged 1 commit into from
Oct 29, 2024

Conversation

grulja
Copy link
Collaborator

@grulja grulja commented Oct 7, 2024

This is a complete rewrite of the Windows support. It introduces a new library 'libwindisk' that combines use of WMI and WinAPI to manipulate with devices. We use Windows Storage Management for getting information about devices and some actions, especially since pure WinAPI does not seem to have support for things like formatting or partition removal. This library is used in both the app and the helper process used for writing and formatting. We now also don't rely on diskpart since it didn't work properly in some cases.

Fixes #626 | Fixes #575 | Fixes #574 |Fixes #555 | Fixes #96

@grulja grulja marked this pull request as draft October 7, 2024 12:55
@grulja grulja force-pushed the windows branch 15 times, most recently from 432ad64 to 9f814da Compare October 11, 2024 12:50
@grulja grulja force-pushed the windows branch 5 times, most recently from b738491 to c0045f7 Compare October 18, 2024 10:58
@grulja grulja changed the title WIP: Windows support Windows: complete rewrite of drive restoration and image writing Oct 18, 2024
@grulja grulja requested a review from gastoner October 18, 2024 11:00
@grulja grulja marked this pull request as ready for review October 18, 2024 11:00
@grulja grulja force-pushed the windows branch 4 times, most recently from 223f9c3 to c5252fd Compare October 18, 2024 11:46
@grulja
Copy link
Collaborator Author

grulja commented Oct 19, 2024

Hi @kparal, do you think you can do some testing? You reported some of the mentioned issues in this PR and your testing was always super helpful.

You can download Windows installer with this change from the latest CI run here.

@grulja
Copy link
Collaborator Author

grulja commented Oct 24, 2024

Well, this is interesting. I tested about 6 writes, both to a thumb drive and an external hdd, and all of them fail with the 4.8% bug. The write phase proceeds fine, but the verification phase is always almost immediately skipped (I see only the first segment of the progress bar filled out, and then it immediately it skips to Finished!), and then it always fails verification (checkisomd5).

So whatever you did, it somehow made the check phase be skipped (possibly it checks 4.8% and then skips the rest), and the 4.8% bug now happens every time.

Alright, the checkisomd5 is fixed. I used Qt before, then changed it to use directly WinAPI to write and for some reason it fails later with the verifacation. Going back to Qt fixes it. Still, @gastoner reported before that the writing is still broken for him. I was not able to install Windows 11 on my another laptop, but Windows 11 in VM and Windows 10 on an old laptop do work reliably. I'm running out of ideas what could possibly be wrong.

@grulja grulja force-pushed the windows branch 7 times, most recently from 5893e45 to 14f1be9 Compare October 25, 2024 10:34
@grulja
Copy link
Collaborator Author

grulja commented Oct 25, 2024

@kparal @gastoner can you please do one more round of testing?

I have installed Windows 11 and could reproduce the error. I have changed few things and it made things more reliable. Still I got verification error afterwards, but not all the time. This was only with faster USB drive so I suspect it was caused by some data still not being fully written. I have added a flush() call after a successful write to ensure the last data is written in hope it fixes the problem for good. I did one try with that and it worked.

Thank you.

@kparal
Copy link

kparal commented Oct 25, 2024

I tried it about 6 times. I haven't seen the 4.8% bug at all. I haven't seen the double-write bug. I still saw the first-write-after-boot bug (screenshot log1 log2) , but only with my thumb drive, not usb hdd. Also, waiting a few minutes after rebooting seemed to avoid this bug.

@gastoner
Copy link
Collaborator

Here are mine observations:

  1. When I try to format drive with Rufus (nonbootable - GPT or MBR) - FMW will write the iso successfuly ✅
  2. When I try to write the iso aggain (without restoring using FMW) - it failes FedoraMediaWriter-helper.log
  3. When I try to write the iso (using Rufus) then restoring using FMW and writing iso using FMW - was successful ✅
  4. After restoring (using FMW) next writing was not successful (logs are the same) ❌

With using Rufus to write the iso, from log I see some similar patterns as in FMW-helper.log rufus.log
Like Could not get Disk Extents: [0x00000001] Incorrect function. or Write error [0x000001B1] A device which does not exist was specified.

@grulja grulja force-pushed the windows branch 7 times, most recently from de83c43 to 37db5b3 Compare October 29, 2024 15:09
@grulja
Copy link
Collaborator Author

grulja commented Oct 29, 2024

Hi @kparal, can you please do one more (last) round of testing? @gastoner says this version finally works for him too. I will then merge this and make a new release of FMW.

This is a complete rewrite of the Windows support. It introduces a new
library 'libwindisk' that combines use of WMI and WinAPI to manipulate
with devices. We use Windows Storage Management for getting information
about devices and some actions, especially since pure WinAPI does not
seem to have support for things like formatting or partition removal.
This library is used in both the app and the helper process used for
writing and formatting. We now also don't rely on diskpart since it
didn't work properly in some cases.

Fixes FedoraQt#626 | Fixes FedoraQt#575 | Fixes FedoraQt#574 |Fixes FedoraQt#555 | Fixes FedoraQt#96
@grulja
Copy link
Collaborator Author

grulja commented Oct 29, 2024

I'm going to merge this now. I will make something like a pre-release so this gets more tested and we can catch the remaining corner cases before making a "stable" release.

@grulja grulja merged commit 77ec947 into FedoraQt:main Oct 29, 2024
6 checks passed
@kparal
Copy link

kparal commented Nov 13, 2024

I've spent several hours (sigh) testing 5.1.90 pre-release with two different usb sticks, and it mostly works fine. I only filed #759 as a regression.

However, I still see some problems, randomly and rarely. I saw the 4.8% problem once, and I saw the Restore functionality only re-creating a partition but not formatting it several times. But I'm starting to attribute it to some problem in Windows itself. I noticed that every time I have a Fedora image on the stick and plug it in, the "Safely remove" systray icon only contains "..." when I display its menu, instead of "Eject $thumbdrive". It's like some operation is pending and never finishes. It's even worse in the Disk Management tool, if you try to eject the drive from there, the app completely freezes. When I play with the system like this, often I then saw the inability to restore the drive, and I had to reboot the PC to make it work again. Some operation was stuck/pending/etc in the background, probably, blocking access. But not always. It's exhausting, because when I think I finally have a reproducer, I find out that it works differently with a different stick. It's probably some race in Windows itself.

So overall 5.1.90 seems to work OK, and the issues I randomly see might just be Windows bugs, which are not going to disappear with any FMW rewrite...

@grulja
Copy link
Collaborator Author

grulja commented Nov 13, 2024

I've spent several hours (sigh) testing 5.1.90 pre-release with two different usb sticks, and it mostly works fine. I only filed #759 as a regression.

However, I still see some problems, randomly and rarely. I saw the 4.8% problem once, and I saw the Restore functionality only re-creating a partition but not formatting it several times. But I'm starting to attribute it to some problem in Windows itself. I noticed that every time I have a Fedora image on the stick and plug it in, the "Safely remove" systray icon only contains "..." when I display its menu, instead of "Eject $thumbdrive". It's like some operation is pending and never finishes. It's even worse in the Disk Management tool, if you try to eject the drive from there, the app completely freezes. When I play with the system like this, often I then saw the inability to restore the drive, and I had to reboot the PC to make it work again. Some operation was stuck/pending/etc in the background, probably, blocking access. But not always. It's exhausting, because when I think I finally have a reproducer, I find out that it works differently with a different stick. It's probably some race in Windows itself.

So overall 5.1.90 seems to work OK, and the issues I randomly see might just be Windows bugs, which are not going to disappear with any FMW rewrite...

Thank you so much for a thorough testing. I really appreciate it. Do you think in a normal use, where the user just writes the image, boot it and restores the drive aftewards this will work reliably? Basically asking whether the issues you mentioned are only reproducible when doing multiple operations one after the other? I will look into the issue you reported, I already have some ideas what could be wrong.

@grulja
Copy link
Collaborator Author

grulja commented Nov 13, 2024

@kparal would you be able to get a log (MediaWriter-helper.txt) from when the restore functionality fails to format the partition?

@kparal
Copy link

kparal commented Nov 13, 2024

Here we go:
FedoraMediaWriter-helper.log

But I couldn't reproduce it with just the systray interaction this time, I had to open Disk Management, try to eject the drive from there, kill the app, reinsert the thumb drive, and then restore in FMW.

On an unrelated note, I noticed that when the drive is affected by th 4.8% bug, it shows up in explorer (D:, "unformatted") and in systray correctly (drive name). When it's not affected by 4.8%, it doesn't show up in explorer and systray only contains "..." instead of drive name. But I can't be sure, this is hard to reproduce several times in a row.

@kparal
Copy link

kparal commented Nov 13, 2024

Do you think in a normal use, where the user just writes the image, boot it and restores the drive aftewards this will work reliably? Basically asking whether the issues you mentioned are only reproducible when doing multiple operations one after the other?

The fewer operations, the higher likelihood of things working, I assume. But it's tricky. I might have a Windows computer where I create the thumb drive, then use it to install Fedora on a different computer, and then put the thumb drive back to the Windows computer to restore it. That's already two actions. And it might be different if you closed FMW in the meantime or kept it running. And I'm unable to pinpoint a simple reproducer to my issues, so I can't say how much reliable a simple use case is. But overall I think this is definitely not worse than previous FMW versions, if that helps :-) In other words, I think you can release it this way.

@kparal
Copy link

kparal commented Nov 13, 2024

This could be a big source of my troubles. Maybe if we fix that, all of these issues might go away.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants