
Implement chunked uploading with lazy ops header #36

Closed
butonic opened this issue Mar 7, 2019 · 24 comments
@butonic (Member) commented Mar 7, 2019

We need to bring this to reva, as clients want to drop support for the old chunking algorithm.

@butonic (Member Author) commented Sep 30, 2019

phoenix uses chunked uploading and the sync client does as well, so I am going to add a PoC for the owncloud storage driver, based on redis. Upstream issue is cs3org/reva#266. This issue serves to keep track of the redis-based PoC.

@butonic (Member Author) commented Sep 30, 2019

  • ocdav handler
    • where do we store the chunks?
    • how does ocdav wait for the assembly to finish? wait for the worker ...
  • if lazy ops is not set, wait for worker to finish
  • if lazy ops is set, use worker id to poll worker status ... if that is possible?

We could use https://github.com/thoas/bokchoy, a library that implements queues on top of redis. But where do we put things in the queue? In ocdavsvc? Or does the storage provider have to deal with that? CS3 has no notion of chunked uploads. It leaves the upload to the http data svc using the https://cs3org.github.io/cs3apis/#cs3.storageproviderv0alpha.InitiateFileUploadResponse

We could implement the ocdav svc to create a temporary folder inside the target directory, but we agreed that having a quarantine area makes sense. The quarantine area could be a distinct storage, to keep data from crossing storage boundaries before passing e.g. a virus scan. So an upload would be assembled in storage A, scanned, and, if successful, moved to storage B (the final destination).

urgh ... we need to keep track of these events anyway, so the user can list the storage activities ... which include renames, creates and deletes, and should aggregate events for the same file ... which brings us back to https://github.com/owncloud/nexus/issues/69

There is a difference between the filesystem events and the activities the user can see ... if a file is updated at 1000Hz it does not make sense to trigger the workflow for the file that many times. At least for high ingress storages. For more secure storages it makes sense to allow uploads at 1000Hz, but to only allow downloads once the file has passed postprocessing. If you are editing a file, then the etag will change and you can overwrite the file because you know the current etag. That would allow updating the file sequentially, in effect locking the updates to a single producer. The producer will have to wait for a response, though. The client can choose not to send the If-Match header and just overwrite whatever is there. It actually does not matter if there is a workflow: the file content is not readable until postprocessing unlocks the file. What if a file upload is in progress ... would clients get the old version of the content or would we fail? And this is how write locks came into being ... with all the horrible lock starvation that comes with it.

@butonic (Member Author) commented Sep 30, 2019

sidenote on filesystem journals as sources for the activity:

  • ext4magic: a tool to list the journal of ext4: http://ext4magic.sourceforge.net/journal_en.html the ext4 journal is a ring, but can contain gaps, because on remounts the journal is overwritten from the beginning, so recent entries may be lost ... not an option to rely on the ext4 fs as the source of activities ... or only with the described tradeoff. Also: it works based on inodes
  • xfs_logprint: list the xfs log ... works on inodes ... no idea if it is a ring ...

so ... neither of those make sense as a source for the activity log ... hm ... well ... maybe in addition to inotify?

to truly scale we need to push the events into the storage layer, where we can assume enough free space, as storage can be scaled better than RAM. In conclusion I would argue it makes sense to keep the events as individual files on the storage. That allows geo-distributing them alongside the file data and metadata, which is one of the core design principles for ocis.

@butonic (Member Author) commented Sep 30, 2019

hm, is the activity log more than a recursive list of files, folders, trash items and versions ordered by mtime? If so it is a cache and can be reconstructed from the existing metadata...

for now, cs3 has neither an activity api nor an event bus or a queue. the implementation needs to happen in the ocs api anyway. the activity service can be implemented in many ways:

  • a dedicated service that we feed with events
  • store activity entries in the filesystem? under a dedicated namespace in the cs3 api? set permissions on events if they are stored as files? or use a list per storage? with every user home and every share being a separate storage? slowly move events to the root? hm no ... what happens if we use acls that take away access to a subtree of a shared storage (negative acls ... or an enforced !d in eos)

@butonic (Member Author) commented Sep 30, 2019

how do we translate the chunked uploads of the dav api into cs3?

    1. MKCOL https://demo.owncloud.com/remote.php/dav/uploads/demo/web-file-upload-c8639c42235c9ec26749a804aba61396-1569849691529 creates the upload collection
    2. PUT https://demo.owncloud.com/remote.php/dav/uploads/demo/web-file-upload-c8639c42235c9ec26749a804aba61396-1569849691529/0, PUT https://demo.owncloud.com/remote.php/dav/uploads/demo/web-file-upload-c8639c42235c9ec26749a804aba61396-1569849691529/10485760 the PUT requests also contain an x-oc-mtime: 1568643784.11 header ... and a content-disposition: attachment; filename="Armbian_5.95_Odroidxu4_Debian_buster_default_4.14.141_minimal.7z" header. each chunk is assigned its own fileid, as can be seen in the response headers oc-fileid: 00000285oc6mnsqnwqft, oc-fileid: 00000286oc6mnsqnwqft ... the last PUT may be smaller than the chunk size. Where do we get the chunk size from? does the client determine the chunk size? I think so. It allows the clients to adjust the size based on the bandwidth. TODO add response header to allow server to recommend new chunk size?
    3. MOVE https://demo.owncloud.com/remote.php/dav/uploads/demo/web-file-upload-c8639c42235c9ec26749a804aba61396-1569849691529/.file with destination: https://demo.owncloud.com/remote.php/dav/files/demo/tast/Armbian_5.95_Odroidxu4_Debian_buster_default_4.14.141_minimal.7z header. also has the optional oc-lazyops: 1 header. oc-total-length: 115824915 can be used to check the size ... but meh ... it should have been sent in the beginning, so the server can preallocate the file. also has the x-oc-mtime: 1568643784.11 set again. This time the response carries an x-oc-mtime: accepted. We also get oc-etag: "2a134fcb07dd624137bd9cd9b9c241d7" and etag: "2a134fcb07dd624137bd9cd9b9c241d7". finally, oc-fileid: 00000289oc6mnsqnwqft is the fileid for the destination file. With the lazyops header the response can be a status: 201 (ok) or a status: 202 (accepted) with an OC-JobStatus-Location: https://demo.owncloud.com/remote.php/dav/job-status/demo/5a7c5838-56cb-47b8-ac0e-4fb1e6d21316 and a connection: close header
    4. GET https://demo.owncloud.com/remote.php/dav/job-status/demo/5a7c5838-56cb-47b8-ac0e-4fb1e6d21316 which might get {"status":"started"}, {"status":"finished","fileId":"oc1234","ETag":"\"abcdef\""} or {"status":"error","errorCode":404,"errorMessage":""} in case of an error

@butonic (Member Author) commented Sep 30, 2019

given that the lazy ops header implementation already introduces a job queue, I will implement a queue outside of the cs3 api. we can add one to cs3 if it becomes necessary. however, I assume that will become an issue when trying to implement chunked parallel uploads using the cs3 api on its own.

@butonic (Member Author) commented Sep 30, 2019

  1. The MKCOL can be translated to a ._reva_atomic_upload_web-file-upload-c8639c42235c9ec26749a804aba61396-1569849691529 temp dir in the destination folder. This allows it to work with any storage. We could remove the access permissions to this dir. For now I would hide the file in listings or replace it with a placeholder reference. ... Hah .. no .. not possible, because we don't HAVE the destination dir yet ... 💥 so ... where do we put the files? a dedicated /uploads namespace? in the users home? in a .uploads folder in a users home?

  2. PUT requests can store files in that directory. No problem here

  3. MOVE contains source and destination, so no problems regarding those. but lazy ops is interesting. we could start the implementation without supporting lazy ops. at least the sync for large files would work.

@butonic (Member Author) commented Sep 30, 2019

on the one hand, we want to avoid having to copy the file to the target destination when both reside on the same storage. A move would be faster in this case. On the other hand, we want to prevent access while the file has not gone through postprocessing.

The idea of the quarantine area was born out of the pain of having to copy data again after it has been uploaded. For small files this may not be an issue, but it becomes painful for large files.

Thinking of the quarantine area as a high ingress storage with restricted access might make more sense. First, data transfer between client and server should be fast ... preferably parallel, and allow other clients to download the chunks while they are being uploaded.

But other clients should only be able to access the file when it has been processed. How can they download chunks if the file has not yet been processed? clients could generate a random symmetric encryption key and send it to the server. all chunks are encrypted symmetrically. The clients can start downloading chunks, even if the file has not been processed. if processing finishes, the server releases the key to the clients and they can decrypt the chunks instead of waiting for the complete file. This would decrease latency ... but it might cause a redownload if postprocessing changes the file ...

In any case ... I need to think about the upload and quarantine area as a high ingress storage ... the question is if cs3 can detect if the underlying storage is the same and issue a move instead of a copy...

@butonic (Member Author) commented Sep 30, 2019

the best I can come up with for now is to put chunked uploads into the users home storage, under a hidden .uploads folder. that way, the final move is fast when it is done on the same storage. we can detect if source and target storage are the same using the storage id. move if they are the same, copy if they differ ... hm, but we need to copy the chunks for assembly anyway ... 🤔 the final move ALWAYS is a copy ... linking the original discussion on this: owncloud/core#4997

@butonic (Member Author) commented Sep 30, 2019

  • AFAICT we need the full file size on the first MKCOL in order to precreate the file with the desired length. then we can seek to the offset given by the PUT requests, which would save us the assembly. This can be an optional enhancement.

@butonic (Member Author) commented Sep 30, 2019

  • If we get the destination on the MKCOL we can store the file on the correct storage. If it supports seeking we can even omit chunk assembly. However access needs to be restricted until the file has been processed. maybe a get temp dir / file CS3 api call makes sense?

@butonic (Member Author) commented Sep 30, 2019

hm, the cs3 api does not deal with file upload or download. maybe this is more a question for the datasvc ...

@butonic (Member Author) commented Sep 30, 2019

aaand @evert had his say on tus.io as well: tus/tus-resumable-upload-protocol#14 (comment)

Looked into it as a replacement for the datasvc ... it still does not handle multiple small files well ... any pointers in that regard are welcome.

@butonic (Member Author) commented Sep 30, 2019

Obviously, we can teach datasvc the http://sabre.io/dav/http-patch/ tricks ... but it would require keeping track of upload progress ... the HEAD request of tus.io is nice to resume uploads.

@butonic (Member Author) commented Sep 30, 2019

link: https://blog.daftcode.pl/how-to-make-uploading-10x-faster-f5b3f9cfcd52 uses compression in the browser to reduce upload size. I wonder if normal PUT requests natively support compression. They should, shouldn't they?

Quoting the article: "Moreover, unlike servers, browsers have no native way to compress data before uploading. So, here we are, stuck with slower network speeds and bigger files, or… are we?"

maybe we can use https://github.com/nodeca/pako or https://github.com/photopea/UZIP.js to compress multiple small files into a single bytestream and upload that.

all of this in an incremental way ... oh well ... maybe as an extension to tus.io?

@evert commented Sep 30, 2019

HTTP requests should just be able to use Content-Transfer-Encoding to enable compression. If you want to get real fancy, you could use brotli encoding for more speed-up.

@butonic (Member Author) commented Sep 30, 2019

@evert I only know Content-Transfer-Encoding from SMTP ... I assume you mean Content-Encoding, which can be set to br for brotli. Ohhh, brotli neatly already has 11.3. Creating Self-Contained Parts within the Compressed Data. Oh, and Content-Transfer-Encoding should not be used ... see https://tools.ietf.org/html/rfc7231#appendix-A.5 Sure enough, setting Content-Encoding on PUT / POST requests has not been invented here: https://medium.com/axiomzenteam/put-your-http-requests-on-a-diet-3e1e52333014

@evert commented Sep 30, 2019

Yes that's exactly what I meant =)

@butonic (Member Author) commented Sep 30, 2019

regarding capabilities we currently have:

  • dav.chunking=1.0
  • dav.async=1.0
  • dav.trashbin=1.0

we could indicate the new capabilities with new properties or increase the version number:

  • dav.chunking=1.1 to indicate the server supports a destination and an oc-total-length on the initial MKCOL request. We can increase the minor version because this is backwards compatible.
  • dav.resume=1.0 could indicate support for fetching the progress of a chunk using HEAD? similar to tus.io: https://tus.io/protocols/resumable-upload.html#example
  • dav.compression=br,gzip although I prefer being able to indicate to the client which compression algorithms can be used as part of the initial MKCOL response. Maybe with an Accept-Encoding ... but it usually is sent by the client to the server. Oh and there have been a few security problems with http compression in the past ...
  • dav.bulk=1.0 for sending a compressed stream of small files? they would be added in batch and only a single event would be propagated?

@butonic butonic self-assigned this Oct 1, 2019
@butonic butonic transferred this issue from owncloud-archive/nexus Oct 1, 2019
@butonic (Member Author) commented Oct 1, 2019

In theory it would be possible to model the upload/quarantine area as a dedicated storage whose last workflow step is the MOVE or, if necessary, COPY operation to the target storage. Unfortunately, this would need special handling of the fileid, because a cross storage move changes the file id. So when a client initiates an upload it will not get the correct file id until the file has reached the target storage...

what would this look like?

  1. The client MKCOLs an upload dir. If a Destination header is present we can look up the responsible storage, if not we fall back to the users home storage. This translates to an InitiateUploadRequest, not a CreateCollectionRequest, because CS3 delegates upload to an out of band process. Since we do not yet know the name we need to create a temporary file. If the client sent a destination, we may have a filename and can initiate a direct upload to the target file.

  2. The storage provider returns a location where we PUT the file. Currently this is handled by the datasvc. It does not, however, need to use the same storage driver. It could use a new workflow driver that implements the execution of steps, the last step being: MOVE or COPY to the destination when the client signals it has finished uploading. To decide on MOVE vs COPY the workflow storage can be configured with two storages: an ingress storage driver and a target storage driver. Both with a ... hmmm ...

Intermezzo ...

The storage drivers only have an Upload function ... it currently does not allow append or range requests. This is a problem if we want to implement resumable uploads for CS3 and use HTTP/2 because we cannot resume uploads. Furthermore, the API does not allow implementing an assembly approach. As it stands, we need to have the file complete before we can call Upload.

If we assume the datasvc sits next to the storageprovidersvc they should have access to the storage. In the case of the owncloud and eos storage driver we could in theory bypass the Upload and directly write to the storage, including seeking to offsets etc. But for s3 this is not possible. S3 however does have multipart uploads. Does that map to our current chunked uploading? well, again, if we get the target in the initial MKCOL we can start the S3 multipart upload.

So, this is more a question of how to implement a datasvc that is storage driver specific.
AFAICT https://github.com/tus/tusd/ can be embedded programmatically. We can implement different DataStore implementations to put the files where we need them. Hm, tusd already has a hook system: https://github.com/tus/tusd/blob/master/docs/hooks.md

... end intermezzo

I recommend we replace the datasvc with a tusd based implementation. But since the current chunking does not send the target in advance, and the CS3 API as well as the storage driver API only support uploading as a single operation, we need to defer this and describe a proper solution before going further in that direction.

The current eos driver uses a temp file as well, so the owncloud driver, or any other for that matter, has to suffer the penalty of doing a full copy of every file upload ... urgh, this is 🐮 💩

@butonic (Member Author) commented Oct 1, 2019

Ok, so in the other half of the implementation we have an asynchronous chunk assembly. Normally I would use a proper task queue like https://github.com/RichardKnop/machinery/ to make the queue persistent and prevent multiple processes from starting the assembly (locking, or rather task deduplication).

However, let us remember the idea to push metadata as close to the storage as possible. S3 e.g. has its own multipart upload. I don't know if it allows uploads to the same upload id from different regions ... I don't see a usecase for it, because a client is not likely to switch the region between chunks. So, do we need to transport chunks over geo boundaries? If we want to allow asymmetric download of chunks while the file is not fully uploaded, that might be a good idea. But for this the chunks would need to be stored on the storage, not in the ocdav svc that currently implements chunking. So this needs to be postponed as well, until we discuss chunking v3?

Which leaves the question of how to prevent multiple ocdavsvc processes from initiating an assembly at the same time. In an HA scenario we have at least two instances. Do they share a queue, or do we lock the upload to prevent an additional assembly? Actually, what if a client (or a proxy in between) repeats the MOVE? We need to lock the chunked upload dir, not only to the process but to the goroutine, and deal with timeouts and errors ... tomorrow...

tusd has an in memory and a file based locking provider. redis is mentioned, but the only other implementation I found uses etcd. Another pointer towards tusd ... they thought about this ...

Anyway, we can just create a lockfile. open(pathname, O_CREAT | O_EXCL, 0644) is reasonably safe, even on today's nfs. So this is an atomic way to have the first request assemble the chunks and let all others fail. First come, first served.

whoever gets the lock creates a uuid for the dav/job-status endpoint and updates the status there ... we can even write the json files there, so the jobs can be picked up after a restart?

@butonic (Member Author) commented Oct 8, 2019

@butonic butonic added this to the Feature Complete milestone Oct 15, 2019
@butonic butonic removed their assignment Nov 11, 2019
refs pushed a commit that referenced this issue Sep 18, 2020
allow handler to return an error
refs added a commit that referenced this issue Sep 18, 2020
Report trace on a single service basis
refs pushed a commit that referenced this issue Sep 18, 2020
…-started

In case the http server cannot be started the error is logged
@refs (Member) commented Jan 12, 2021

blocked. see #36 (comment)

@refs refs changed the title implement chunked uploading with lazy ops header Implement chunked uploading with lazy ops header Jan 13, 2021
@butonic (Member Author) commented Mar 8, 2021

lazy ops got removed from the clients: owncloud/client#8398
tracking new approach in #214

@butonic butonic closed this as completed Mar 8, 2021