Upload API instability (fileparse and extract_thumbnail) #1127

psilabs-dev · 2024-12-11T09:14:32Z

os: ubuntu/linux, docker.

When uploading an archive to the server via the API, there is roughly a 10-20% chance that an unhandled exception is raised in the extract_thumbnail -> get_filelist -> is_pdf call chain. Wrapping is_pdf in a try/catch and printing the ID shows the following:

[Archive] [error] Failed to parse file: "" - error: fileparse(): need a valid pathname at /home/koyomi/lanraragi/script/../lib/LANraragi/Utils/Archive.pm line 43.

For some reason, the ID of this archive is empty. However, the archive has been uploaded into the server: if I try to upload manually, I get a duplicate archive error, and docker exec-ing the container shows that the archive has been uploaded in its entirety.

Checking Redis, there is no trace of this archive. So the archive is uploaded but not registered in the server/database. Running server-side maintenance tasks (rescan archive, clean search, etc.) doesn't seem to fix the issue. I will probably look into the database logic for now.

Perhaps more importantly, when such an archive is physically present in the LRR contents directory, rescanning for new archives does not pick it up. This results in "dead" archives occupying the contents directory that LRR can't read, while blocking manual or API uploads of the same archive on duplicate grounds.
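For anyone wanting to check whether they have such "dead" archives, here is a minimal sketch of the comparison, not LRR code. It assumes you already have the set of filenames the server knows about (how you obtain it is up to you); `find_unregistered`, `registered_names`, and the suffix list are hypothetical names chosen for illustration.

```python
from pathlib import Path

# Illustrative suffix list (an assumption, not LRR's actual supported set).
ARCHIVE_SUFFIXES = {".zip", ".rar", ".7z", ".cbz", ".cbr", ".pdf"}


def find_unregistered(content_dir, registered_names):
    """Return archive filenames present on disk but missing from registered_names.

    registered_names is a hypothetical input: the filenames the server has
    registered, gathered by whatever means is available to you.
    """
    on_disk = {
        path.name
        for path in Path(content_dir).rglob("*")
        if path.is_file() and path.suffix.lower() in ARCHIVE_SUFFIXES
    }
    return sorted(on_disk - set(registered_names))
```

Any name returned is a file that sits in the contents directory without a matching registration, which is the situation described above.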

For the record, this hasn't happened in the past 200k archives I uploaded via the API. I recently turned on vm.overcommit, though, so I'm wondering if that could be a cause.


Difegue · 2024-12-11T11:02:33Z

This is likely caused by the filemap in the Shinobu file watcher, although I find it odd that it wouldn't just re-register the archive if the ID is empty?
IDs are at the core of every file detection, so unless this file specifically makes an empty ID every time, it'd eventually be picked up as soon as it gets a different one.

psilabs-dev · 2024-12-12T08:11:55Z

I moved the affected instance's archives to an SSD; the number of files in LRR now matches the number of files in the directory (±1). I think I'll add some logging before running my next upload job; this only seems to happen in a prod environment, but fortunately there aren't many places where something could go wrong here.

psilabs-dev · 2024-12-18T03:57:54Z

This had nothing to do with Redis or vm.overcommit, or with the LRR ID computation differing between Python versions.

If you upload an archive X, and then upload an archive Y with the exact same filename as X, AND X and Y do not have the same LRR ID, this bug triggers. Fix coming (hopefully) soon.
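To check an upload batch for this condition ahead of time, something like the following sketch could work. It is not the fix from the PR, only a client-side check. The ID function is a stand-in (SHA-1 over a leading chunk of the file, with an illustrative prefix length); the real LRR ID computation may differ, and `sketch_id` and `find_conflicts` are hypothetical names.

```python
import hashlib
from collections import defaultdict

# Assumption: the prefix length is illustrative, not taken from LRR.
PREFIX_BYTES = 512_000


def sketch_id(data: bytes) -> str:
    """Stand-in archive ID: SHA-1 hex digest of the leading bytes."""
    return hashlib.sha1(data[:PREFIX_BYTES]).hexdigest()


def find_conflicts(uploads):
    """Find filenames that appear with more than one distinct ID.

    uploads is an iterable of (filename, content_bytes) in upload order.
    Returns {filename: [ids...]} for every filename seen with differing IDs.
    """
    ids_by_name = defaultdict(list)
    for name, data in uploads:
        archive_id = sketch_id(data)
        if archive_id not in ids_by_name[name]:
            ids_by_name[name].append(archive_id)
    return {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}
```

A filename that shows up in the result corresponds to the X/Y case described above: same name, different ID. Re-uploading identical content under the same name is not flagged.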

psilabs-dev mentioned this issue Dec 19, 2024

Fix upload API order bug #1128

Merged

Difegue linked a pull request Dec 24, 2024 that will close this issue

Fix upload API order bug #1128

Merged

Difegue closed this as completed in #1128 Dec 24, 2024
