Upload API instability (fileparse and extract_thumbnail) #1127
Comments
This is likely caused by the filemap in the Shinobu file watcher, although I find it odd that it doesn't simply re-register the archive when the ID is empty.
I moved the affected instance's archives to an SSD; the number of archives in LRR now roughly matches the number of files in the directory (±1). I'll add some logging before running my next upload job. This only seems to happen in a production environment, but fortunately there aren't many places where something could go wrong here.
This had nothing to do with Redis, vm.overcommit, or differences in how the LRR ID is computed across Python versions. The bug triggers when you upload an archive X and then upload an archive Y with exactly the same filename as X, where X and Y have different LRR IDs. A fix is coming (hopefully) soon.
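Until the fix lands, the trigger condition described above (same filename, different LRR ID) can be checked on the client before each upload. The sketch below is hypothetical: UploadGuard and lrr_style_id are illustrative names, not part of LRR, and it assumes LRR's ID is the SHA-1 hex digest of the first 512,000 bytes of the file. Verify that against the LRR version you run.

```python
import hashlib
from pathlib import Path

# ASSUMPTION: LRR IDs are the SHA-1 hex digest of the first 512,000 bytes
# of the archive file. Check this against your LRR version.
ID_PREFIX_BYTES = 512_000


def lrr_style_id(path: Path) -> str:
    """Compute an LRR-style archive ID from the start of the file."""
    with open(path, "rb") as fh:
        return hashlib.sha1(fh.read(ID_PREFIX_BYTES)).hexdigest()


class UploadGuard:
    """Hypothetical client-side helper that remembers filename -> ID for
    past uploads and flags the reported trigger: a reused filename whose
    content hashes to a different ID."""

    def __init__(self) -> None:
        self._seen: dict = {}

    def is_risky(self, path: Path) -> bool:
        previous = self._seen.get(path.name)
        return previous is not None and previous != lrr_style_id(path)

    def record(self, path: Path) -> None:
        # Call this only after the upload has succeeded.
        self._seen[path.name] = lrr_style_id(path)
```

An uploader would call is_risky() before each upload and rename or hold back flagged files. Keying on path.name rather than the full path is deliberate, since the report says the collision depends only on the filename.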
OS: Ubuntu Linux, Docker.
When uploading an archive to the server via the API, there seems to be a ~10-20% chance that an unhandled exception occurs at extract_thumbnail -> get_filelist -> is_pdf. Adding a try/catch to is_pdf and printing the ID shows the following:
For some reason, the ID of this archive is empty. However, the archive has been uploaded to the server: if I try to upload it manually, I get a duplicate archive error, and running docker exec into the container shows that the archive was uploaded in its entirety.
Checking Redis, there is no trace of this archive. So the archive is uploaded but never registered in the server's database. Server-side maintenance tasks (rescan archives, clean search, etc.) don't seem to fix the issue. I'll be checking the database logic for now.
Perhaps more importantly, if an archive is physically present in the LRR content directory, rescanning for new archives does not pick it up. This results in "dead" archives that occupy the content directory but that LRR can't read, while also blocking manual or API uploads of the same archive as duplicates.
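To list such dead archives, you can compare the IDs of the files on disk against the IDs LRR actually knows about. This is a minimal sketch under stated assumptions. It uses the same assumed ID scheme as LRR (SHA-1 of the first 512,000 bytes), and it leaves obtaining registered_ids to the caller (for example, exported from Redis or the archive list). find_orphans and ARCHIVE_SUFFIXES are hypothetical names.

```python
from __future__ import annotations

import hashlib
from pathlib import Path
from typing import Iterable

# ASSUMPTION: same ID scheme as LRR (SHA-1 of the first 512,000 bytes).
ID_PREFIX_BYTES = 512_000
# Hypothetical extension filter; adjust to the formats in your library.
ARCHIVE_SUFFIXES = {".zip", ".cbz", ".rar", ".cbr", ".7z", ".pdf"}


def lrr_style_id(path: Path) -> str:
    """Compute an LRR-style archive ID from the start of the file."""
    with open(path, "rb") as fh:
        return hashlib.sha1(fh.read(ID_PREFIX_BYTES)).hexdigest()


def find_orphans(content_dir: Path, registered_ids: Iterable[str]) -> list[Path]:
    """Return archive files under content_dir whose computed ID is not
    among registered_ids, i.e. candidates for "dead" archives."""
    known = set(registered_ids)
    orphans = []
    for path in sorted(content_dir.rglob("*")):
        if path.is_file() and path.suffix.lower() in ARCHIVE_SUFFIXES:
            if lrr_style_id(path) not in known:
                orphans.append(path)
    return orphans
```

Moving the reported files out of the content directory should, in principle, clear the duplicate error for a re-upload, though that depends on how LRR's duplicate check actually works.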
For the record, this hasn't happened to me across the past 200k archives I've uploaded via the API, though I recently turned on vm.overcommit, so I'm wondering whether that could be the cause.