-
-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement upload API #1085
Implement upload API #1085
Conversation
implementation of upload API logic
api exception handling validation and integrity checks
reflect mojo upload object
Thumbnail generation for upload API
Open for review and testing. I've tested this on ~50k cbz archives with metadata (without obvious problems). I've also tested this in combination with/wo automatic metadata plugins, checksum validation, optional metadata (title, tags, summaries), mandatory fields (filename, file data), localhost/reverse proxy. I'll add that the checksum validation is for when a file is corrupted during transit, not validating a file corrupted at the origin, if you upload a corrupted file it will still be corrupted and not checked. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good at a glance to me - I'll have to play with it for a bit.
Would it be possible to get the documentation updated?
This would land in https://github.com/Difegue/LANraragi/blob/dev/tools/Documentation/api-documentation/archive-api.md - Please duplicate an existing {swagger} element and edit it to match. Even if it's not perfect I'll help with suggestions afterwards 👍
I guess one additional thought: You mention only one upload can be done at a time with this API, but I'm not seeing any obvious limitations that would prevent running it sequentially. I wouldn't put it behind client implementations to manhandle this endpoint to death, so we should take precautions if needed 🤔 |
Ahh you're right, LRR does support concurrent API calls now that I tested it again. I don't know why last time for some reason my second worker would always hang and lose connections... After testing concurrent uploads, I'm seeing two (probably fixable) issues with concurrency. Type 1: when the same archive is uploaded concurrently by two clients, on rare occasions the server will throw a "The file couldn't be renamed in your content folder: No such file or directory" error to the later client. This can happen when the two uploads differ by as much as 30ms.
<!DOCTYPE html>
<head>
<title>LANraragi Server Error</title>
<meta name="viewport" content="width=device-width" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="apple-mobile-web-status-bar-style" content="black" />
<meta name="mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-capable" content="yes" />
<link type="image/png" rel="icon" href="favicon.ico" />
<link rel="manifest" href="app.webappmanifest" />
</head>
<body style='background: none repeat scroll 0% 0% brown; color: white; font-family: sans-serif; text-align: center'>
<img src='/img/flubbed.gif' />
<br />
<h2>I flubbed it while rendering your page!</h2>
It's likely there's a bug with the server at this point.<br />
<p>
at /home/koyomi/lanraragi/script/../lib/LANraragi/Utils/Archive.pm line 336.
</p>
<h3>Some more info below if available:</h3>
<pre>{
"action" => "create_archive",
"controller" => "api-archive"
}
</pre>
</body>
</html> I've noticed that when one worker encounters this error, the other worker will encounter a concurrency error of the first type. This is more rare than the first type of error |
Sure! I'll take a look at this |
Co-authored-by: Difegue <8237712+Difegue@users.noreply.github.com>
Ok, don't think I'll have anything else to add, I'll be doing final builds and stress tests and opening it up for review later. |
Sounds good, feel free to undraft once you think it's ready and I'll give it one last passover. |
One thing that came to mind before I forget, there are no file size upload limits. I tested whether this handles large files well and I was able to upload a 18gb archive without issue. Although I can't read it due to temp folder limitations. That said, do we want a 500mb filesize limit? |
No, if there's no limitations on the server I wouldn't implement an artificial one in the API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last round of comments from me after reviewing the whole thing anew.
my $summary = $self->req->param('summary'); | ||
|
||
# return error if archive is not supported. | ||
if ( !is_archive($filename) ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't really need this bit if it's already done in handle_incoming_file
, right?
(I'm aware Controller->Upload does the same, but I think I'll eventually deprecate that code path in favor of just using the API in the webclient)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_archive
ensures that the filename has a valid extension. I don't know how removing this will affect the tempdir logic, I'll need to test it out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the archive check block is removed, an unhandled exception will be thrown when a filename with an invalid extension or no extension is passed:
... fileparse(): need a valid pathname at /home/koyomi/lanraragi/script/../lib/LANraragi/Utils/Archive.pm line 38.
which is the is_pdf subroutine... which is really weird
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh, I guess is_pdf wasn't written to handle invalid filenames at all and the issue just never popped up - good to know.
my $tempdir = tempdir(); | ||
|
||
my ( $fn, $path, $ext ) = fileparse( $filename, qr/\.[^.]*/ ); | ||
my $byte_limit = LANraragi::Model::Config->enable_cryptofs ? 143 : 255; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this shouldn't be enforced at the handle_incoming_file
level rather than both in here, in Model->Upload->Download_url, and the old Controller upload.
Might be one for later tho
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the initial purpose of the byte limit? I only kept it there to mirror, I don't know how this will affect temporary file creation if a filename exceeds this limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it's mostly there for synology hardware which has that 255 byte cap, and the notably insane who use cryptoFS on top of that, which limits it down to a further 143 bytes.
I believe this would break even temporary filenames if exceeded.
Co-authored-by: Difegue <8237712+Difegue@users.noreply.github.com>
Co-authored-by: Difegue <8237712+Difegue@users.noreply.github.com>
Co-authored-by: Difegue <8237712+Difegue@users.noreply.github.com>
Congratulations @psilabs-dev, the maintainer of this repository has issued you a holobyte! Here it is: https://holopin.io/holobyte/cm3ocndk629080cmk40k0mke9 This badge can only be claimed by you, so make sure that your GitHub account is linked to your Holopin account. You can manage those preferences here: https://holopin.io/account. |
This is good for me - the leftover comments aren't essential and I might get to them at some point during a refactor. Thanks again for the work! |
Implementation of an archive upload API, that uploads the content of an archive and all metadata, if any, to the LRR server. (#1005)
While it is possible to write a LRR plugin for a website W, it is often the case to use dedicated, feature-rich, and actively maintained downloaders (e.g. PixivUtil2 for Pixiv) that specifically pulls metadata/archives from W. However, moving downloaded content from these downloaders to LRR is still a human process. In the case of PixivUtil2, metadata is stored in a sqlite db, and the storage/language/architecture is different for each downloader.
This API implementation (using multipart/form-data encoding) allows users to perform language-agnostic archive migrations or mirrors from their existing archive servers and downloaders to the LRR server via the API, or write comparatively thin HTTP client wrappers around their downloaders that negotiate with and push to LRR during that application's runtime. The user may wish to attach the SHA1 hash of the file to check for file integrity and avoid corruptions during transit.
Return values (see documentation)
Include in body: file content (binary), filename, title (optional), summary (optional), tags (optional), category ID (optional), plugin calls (?)
Example of a
PUT /api/archives/upload
in Python with the "example.zip" archive:This is a synchronous API implementation. Theoretically, we can also run this API as a Minion job but this is not a long-running task and doesn't justify creating a job, also error handling is more streamlined and the API should not take too many requests to begin with (TBD).