feat: enhance API filetype detection #445
Merged
Use the library for filetype detection
Mimetype detection in the API has always been very naive: we rely on the file extension. If the user doesn't include a filename, we return the error "Filetype None is not supported". The library has a detect_filetype that actually inspects the file bytes, so let's reuse this.

Add a content_type param to override filetype detection
Add an optional content_type param that allows the user to override the filetype detection. We'll use this value if it's set, or take the file.content_type, which is based on the multipart Content-Type header. This provides an alternative when clients are unable to modify the header.

Testing
The important thing is that test_happy_path_all_types passes in the docker smoke test, since it covers every filetype that we want the API to support.

To test manually, you can try sending files to the server with and without the filename/content_type defined.
Check out this branch and run make run-web-app.

Example: sending a file with no extension in the filename. This correctly processes a PDF.
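
One way to run this check from Python is sketched below. It is only an illustration: the endpoint URL, the "files" field name, and the helper names build_upload_request and post_and_parse are assumptions rather than anything taken from this PR, and the multipart body is assembled by hand so that only the standard library is needed. The file part carries a filename with no extension and a generic application/octet-stream content type.

```python
import json
import urllib.request
import uuid

# Assumed values, not taken from the PR: adjust them to your local server.
API_URL = "http://localhost:8000/general/v0/general"  # hypothetical endpoint
FILE_FIELD = "files"  # hypothetical multipart field name


def build_upload_request(data: bytes, filename: str, url: str = API_URL) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying a single file part.

    The part uses a generic content type and the filename is passed through
    unchanged, so a name without an extension gives the server no hint.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{FILE_FIELD}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        url,
        data=head + data + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def post_and_parse(request: urllib.request.Request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server from make run-web-app running, calling post_and_parse(build_upload_request(pdf_bytes, "sample-document")) should return parsed elements rather than the "Filetype None is not supported" error, provided the assumed endpoint and field name match your setup.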
For the new param, you can try modifying the content type for a text-based file. Verify that you can change the metadata.filetype of the response using the new param.
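
A rough Python sketch of that check follows. Only the content_type param name and metadata.filetype come from this PR. The endpoint URL, the "files" field name, sending content_type as an ordinary form field, the response being a list of elements, and the helper names are all assumptions made for illustration.

```python
import urllib.request
import uuid

# Assumed values, not taken from the PR: adjust them to your local server.
API_URL = "http://localhost:8000/general/v0/general"  # hypothetical endpoint
FILE_FIELD = "files"  # hypothetical multipart field name


def build_override_request(
    data: bytes,
    filename: str,
    content_type: str,
    part_content_type: str = "text/plain",
    url: str = API_URL,
) -> urllib.request.Request:
    """Build a multipart POST with a file part plus a content_type form field."""
    boundary = uuid.uuid4().hex
    # Plain form field carrying the override value (assumed to be a form field).
    override_field = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="content_type"\r\n'
        "\r\n"
        f"{content_type}\r\n"
    ).encode()
    # File part, still labelled with its ordinary multipart Content-Type header.
    file_head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{FILE_FIELD}"; filename="{filename}"\r\n'
        f"Content-Type: {part_content_type}\r\n"
        "\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        url,
        data=override_field + file_head + data + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def reported_filetypes(elements) -> set:
    """Collect the metadata.filetype values from a list of response elements.

    Assumes each element is a dict with an optional "metadata" dict;
    elements without a filetype contribute None.
    """
    return {element.get("metadata", {}).get("filetype") for element in elements}
```

Sending the same text file once without the override and once with, for example, content_type set to text/markdown, then comparing reported_filetypes on the two decoded responses, is one way to see whether the param took effect.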