
Implement chunked uploading with lazy ops header #36

Closed
butonic opened this issue Mar 7, 2019 · 24 comments
@butonic (Member) commented Mar 7, 2019

We need to bring this to reva, as clients want to drop support for the old chunking algorithm.

@butonic (Member Author) commented Sep 30, 2019

phoenix uses chunked uploading and the sync client does as well, so I am going to add a PoC for the owncloud storage driver, based on redis. Upstream issue is cs3org/reva#266. This issue serves to keep track of the redis-based PoC.

@butonic (Member Author) commented Sep 30, 2019

  • ocdav handler
    • where do we store the chunks?
    • how does ocdav wait for the assembly to finish? wait for the worker ...
  • if lazy ops is not set, wait for worker to finish
  • if lazy ops is set, use worker id to poll worker status ... if that is possible?

We could use https://github.com/thoas/bokchoy, a library that implements queues on top of redis. But where do we put things in the queue? In ocdavsvc? Or does the storage provider have to deal with that? CS3 has no notion of chunked uploads. It leaves the upload to the http data svc using the https://cs3org.github.io/cs3apis/#cs3.storageproviderv0alpha.InitiateFileUploadResponse

We could implement the ocdav svc to create a temporary folder inside the target directory, but we agreed that having a quarantine area makes sense. The quarantine area could be a distinct storage, to keep data from crossing storage boundaries before passing e.g. a virus scan. So an upload would be assembled in storage A, scanned, and, if successful, moved to storage B (the final destination).

urgh ... we need to keep track of these events anyway, so the user can list the storage activities ... which include renames, creates and deletes, and should aggregate events for the same file ... which brings us back to https://github.com/owncloud/nexus/issues/69

There is a difference between the filesystem events and the activities the user can see ... if a file is updated at 1000Hz it does not make sense to trigger the workflow for the file that many times. At least for high ingress storages. For more secure storages it makes sense to allow uploads at 1000Hz, but to only allow downloads once the file has passed postprocessing. If you are editing a file, then the etag will change and you can overwrite the file because you know the current etag. That would allow updating the file sequentially, in effect locking the updates to a single producer. The producer will have to wait for a response, though. The client can choose not to send the If-Match header and just overwrite whatever is there. It actually does not matter if there is a workflow: the file content is not readable until postprocessing unlocks the file. What if a file upload is in progress ... would clients get the old version of the content or would we fail? And this is how write locks came into being ... with all the horrible lock starvation that comes with it.

@butonic (Member Author) commented Sep 30, 2019

sidenote on filesystem journals as sources for the activity:

  • ext4magic: a tool to list the journal of ext4: http://ext4magic.sourceforge.net/journal_en.html the ext4 journal is a ring, but can contain gaps, because on remounts the journal is overwritten from the beginning, so recent entries may be lost ... not an option to rely on the ext4 fs as the source of activities ... or only with the described tradeoff. Also: it works based on inodes
  • xfs_logprint: list the xfs log ... works on inodes ... no idea if it is a ring ...

so ... neither of those make sense as a source for the activity log ... hm ... well ... maybe in addition to inotify?

to truly scale we need to push the events into the storage layer, where we can assume enough free space, as storage can be scaled better than RAM. In conclusion I would argue it makes sense to keep the events as individual files on the storage. That allows geo-distributing them alongside the file data and metadata, which is one of the core design principles for ocis.

@butonic (Member Author) commented Sep 30, 2019

hm, is the activity log more than a recursive list of files, folders, trash items and versions ordered by mtime? If so it is a cache and can be reconstructed from the existing metadata...

for now, cs3 has neither an activity api nor an event bus or a queue. the implementation needs to happen in the ocs api anyway. the activity service can be implemented in many ways:

  • a dedicated service that we feed with events
  • store activity entries in the filesystem? under a dedicated namespace in the cs3 api? set permissions on events if they are stored as files? or use a list per storage? with every user home and every share being a separate storage? slowly move events to the root? hm no ... what happens if we use acls that take away access to a subtree of a shared storage (negative acls ... or an enforced !d in eos)

@butonic (Member Author) commented Sep 30, 2019

how do we translate the chunked uploads of the dav api into cs3?

    1. MKCOL https://demo.owncloud.com/remote.php/dav/uploads/demo/web-file-upload-c8639c42235c9ec26749a804aba61396-1569849691529 creates the upload collection
    2. PUT https://demo.owncloud.com/remote.php/dav/uploads/demo/web-file-upload-c8639c42235c9ec26749a804aba61396-1569849691529/0, PUT https://demo.owncloud.com/remote.php/dav/uploads/demo/web-file-upload-c8639c42235c9ec26749a804aba61396-1569849691529/10485760 the PUT requests also contain an x-oc-mtime: 1568643784.11 header ... and a content-disposition: attachment; filename="Armbian_5.95_Odroidxu4_Debian_buster_default_4.14.141_minimal.7z" header. each chunk is assigned its own fileid, as can be seen in the response headers oc-fileid: 00000285oc6mnsqnwqft, oc-fileid: 00000286oc6mnsqnwqft ... the last PUT may be smaller than the chunk size. Where do we get the chunk size from? does the client determine the chunk size? I think so. It allows the clients to adjust the size based on the bandwidth. TODO add response header to allow server to recommend new chunk size?
    3. MOVE https://demo.owncloud.com/remote.php/dav/uploads/demo/web-file-upload-c8639c42235c9ec26749a804aba61396-1569849691529/.file with destination: https://demo.owncloud.com/remote.php/dav/files/demo/tast/Armbian_5.95_Odroidxu4_Debian_buster_default_4.14.141_minimal.7z header. also has the optional oc-lazyops: 1 header. oc-total-length: 115824915 can be used to check the size ... but meh ... it should have been sent in the beginning, so the server can preallocate the file. also has the x-oc-mtime: 1568643784.11 set again. This time the response carries an x-oc-mtime: accepted. We also get oc-etag: "2a134fcb07dd624137bd9cd9b9c241d7" and etag: "2a134fcb07dd624137bd9cd9b9c241d7". finally, oc-fileid: 00000289oc6mnsqnwqft is the fileid for the destination file. With the lazyops header the response can be a status: 201 (ok) or a status: 202 (accepted) with an OC-JobStatus-Location: https://demo.owncloud.com/remote.php/dav/job-status/demo/5a7c5838-56cb-47b8-ac0e-4fb1e6d21316 and a connection: close header
    4. GET https://demo.owncloud.com/remote.php/dav/job-status/demo/5a7c5838-56cb-47b8-ac0e-4fb1e6d21316 which might get {"status":"started"}, {"status":"finished","fileId":"oc1234","ETag":"\"abcdef\""} or {"status":"error","errorCode":404,"errorMessage":""} in case of an error

@butonic (Member Author) commented Sep 30, 2019

given that the lazy ops header implementation already introduces a job queue, I will implement a queue outside of the cs3 api. we can add one to cs3 if it becomes necessary. however, I assume that will become an issue when trying to implement chunked parallel uploads using the cs3 api on its own.

@butonic (Member Author) commented Sep 30, 2019

  1. The MKCOL can be translated to a ._reva_atomic_upload_web-file-upload-c8639c42235c9ec26749a804aba61396-1569849691529 temp dir in the destination folder. This allows it to work with any storage. We could remove the access permissions to this dir. For now I would hide the file in listings or replace it with a placeholder reference. ... Hah .. no .. not possible, because we don't HAVE the destination dir yet ... 💥 so ... where do we put the files? a dedicated /uploads namespace? in the users home? in a .uploads folder in a users home?

  2. PUT requests can store files in that directory. No problem here

  3. MOVE contains source and destination, so no problems regarding those. but lazy ops is interesting. we could start the implementation without supporting lazy ops. at least the sync for large files would work.

@butonic (Member Author) commented Sep 30, 2019

on the one hand, we want to avoid having to copy the file to the target destination when both reside on the same storage. A move would be faster in this case. On the other hand, we want to prevent access while the file has not gone through postprocessing.

The idea of the quarantine area was born out of the pain of having to copy data again after it has been uploaded. For small files this may not be an issue, but it becomes painful for large files.

Thinking of the quarantine area as a high ingress storage with restricted access might make more sense. First, data transfer between client and server should be fast ... preferably parallel, and allow other clients to download the chunks while they are being uploaded.

But other clients should only be able to access the file when it has been processed. How can they download chunks if the file has not yet been processed? clients could generate a random symmetric encryption key and send it to the server. all chunks are encrypted symmetrically. The clients can start downloading chunks, even if the file has not been processed. if processing finishes, the server releases the key to the clients and they can decrypt the chunks instead of waiting for the complete file. This would decrease latency ... but it might cause a redownload if postprocessing changes the file ...

In any case ... I need to think about the upload and quarantine area as a high ingress storage ... the question is if cs3 can detect if the underlying storage is the same and issue a move instead of a copy...

@butonic (Member Author) commented Sep 30, 2019

the best I can come up with for now is to put chunked uploads into the users home storage, under a hidden .uploads folder. that way, the final move is fast when it is done on the same storage. we can detect if source and target storage are the same using the storage id. move if they are the same, copy if they differ ... hm, but we need to copy the chunks for assembly anyway ... 🤔 the final move ALWAYS is a copy ... linking the original discussion on this: owncloud/core#4997

@butonic (Member Author) commented Sep 30, 2019

  • AFAICT we need the full file size on the first MKCOL in order to precreate the file with the desired length. then we can seek to the offset given by the PUT requests, which would save us the assembly. This can be an optional enhancement.

@butonic (Member Author) commented Sep 30, 2019

  • If we get the destination on the MKCOL we can store the file on the correct storage. If it supports seeking we can even omit chunk assembly. However access needs to be restricted until the file has been processed. maybe a get temp dir / file CS3 api call makes sense?

@butonic (Member Author) commented Sep 30, 2019

hm, the cs3 api does not deal with file upload or download. maybe this is more a question for the datasvc ...

@butonic (Member Author) commented Sep 30, 2019

aaand @evert had his say on tus.io as well: tus/tus-resumable-upload-protocol#14 (comment)

Looked into it as a replacement for the datasvc ... it still does not handle multiple small files well ... any pointers in that regard are welcome.

@butonic (Member Author) commented Sep 30, 2019

Obviously, we can teach datasvc the http://sabre.io/dav/http-patch/ tricks ... but it would require keeping track of upload progress ... the HEAD request of tus.io is nice to resume uploads.

@butonic (Member Author) commented Sep 30, 2019

link: https://blog.daftcode.pl/how-to-make-uploading-10x-faster-f5b3f9cfcd52 uses compression in the browser to reduce upload size. I wonder if normal PUT requests natively support compression. They should, shouldn't they?

Quoting the article: "Moreover, unlike servers, browsers have no native way to compress data before uploading. So, here we are, stuck with slower network speeds and bigger files, or… are we?"

maybe we can use https://github.com/nodeca/pako or https://github.com/photopea/UZIP.js to compress multiple small files into a single bytestream and upload that.

all of this in an incremental way ... oh well ... maybe as an extension to tus.io?

@evert commented Sep 30, 2019

HTTP requests should just be able to use Content-Transfer-Encoding to enable compression. If you want to get real fancy, you could use brotli encoding for more speed-up.

@butonic (Member Author) commented Sep 30, 2019

@evert I only know Content-Transfer-Encoding from SMTP ... I assume you mean Content-Encoding, which can be set to br for brotli. Ohhh, brotli neatly already has 11.3. Creating Self-Contained Parts within the Compressed Data. Oh, and Content-Transfer-Encoding should not be used ... see https://tools.ietf.org/html/rfc7231#appendix-A.5 Sure enough, setting Content-Encoding on PUT / POST requests has not been invented here: https://medium.com/axiomzenteam/put-your-http-requests-on-a-diet-3e1e52333014

@evert commented Sep 30, 2019

Yes that's exactly what I meant =)

@butonic (Member Author) commented Sep 30, 2019

regarding capabilities we currently have:

  • dav.chunking=1.0
  • dav.async=1.0
  • dav.trashbin=1.0

we could indicate the new capabilities with new properties or increase the version number:

  • dav.chunking=1.1 to indicate the server supports a destination and an oc-total-length on the initial MKCOL request. We can increase the minor version because this is backwards compatible.
  • dav.resume=1.0 could indicate support for fetching the progress of a chunk using HEAD? similar to tus.io: https://tus.io/protocols/resumable-upload.html#example
  • dav.compression=br,gzip although I prefer being able to indicate to the client which compression algorithms can be used as part of the initial MKCOL response. Maybe with an Accept-Encoding ... but it usually is sent by the client to the server. Oh and there have been a few security problems with http compression in the past ...
  • dav.bulk=1.0 for sending a compressed stream of small files? they would be added in batch and only a single event would be propagated?

@butonic butonic self-assigned this Oct 1, 2019
@butonic butonic transferred this issue from owncloud-archive/nexus Oct 1, 2019
@butonic (Member Author) commented Oct 1, 2019

In theory it would be possible to model the upload/quarantine area as a dedicated storage whose last workflow step is the MOVE or, if necessary, COPY operation to the target storage. Unfortunately, this would need special handling of the fileid, because a cross storage move changes the file id. So when a client initiates an upload it will not get the correct file id until the file has reached the target storage...

what would this look like?

  1. The client MKCOLs an upload dir. If a Destination header is present we can look up the responsible storage, if not we fall back to the users home storage. This translates to an InitiateUploadRequest, not a CreateCollectionRequest, because CS3 delegates upload to an out of band process. Since we do not yet know the name we need to create a temporary file. If the client sent a destination, we may have a filename and can initiate a direct upload to the target file.

  2. The storage provider returns a location where we PUT the file. Currently this is handled by the datasvc. It does not, however, need to use the same storage driver. It could use a new workflow driver that implements the execution of steps, the last step being: MOVE or COPY to the destination when the client signals it has finished uploading. To decide on MOVE vs COPY the workflow storage can be configured with two storages: an ingress storage driver and a target storage driver. Both with a ... hmmm ...

Intermezzo ...

The storage drivers only have an Upload function ... it currently does not allow append or range requests. This is a problem if we want to implement resumable uploads for CS3 and use HTTP/2 because we cannot resume uploads. Furthermore, the API does not allow implementing an assembly approach. As it stands, we need to have the file complete before we can call Upload.

If we assume the datasvc sits next to the storageprovidersvc they should have access to the storage. In the case of the owncloud and eos storage driver we could in theory bypass the Upload and directly write to the storage, including seeking to offsets etc. But for s3 this is not possible. S3 however does have multipart uploads. Does that map to our current chunked uploading? well, again, if we get the target in the initial MKCOL we can start the S3 multipart upload.

So, this is more a question of how to implement a datasvc that is storage driver specific.
AFAICT https://github.com/tus/tusd/ can be embedded programmatically. We can implement different DataStore implementations to put the files where we need them. Hm, tusd already has a hook system: https://github.com/tus/tusd/blob/master/docs/hooks.md

... end intermezzo

I recommend we replace the datasvc with a tusd based implementation. But since the current chunking does not send the target in advance, and the CS3 API as well as the storage driver API only support uploading as a single operation, we need to defer this and describe a proper solution before going further in that direction.

The current eos driver uses a temp file as well, so the owncloud driver, or any other for that matter, has to suffer the penalty of doing a full copy of every file upload ... urgh, this is 🐮 💩

@butonic (Member Author) commented Oct 1, 2019

Ok, so in the other half of the implementation we have an asynchronous chunk assembly. Normally I would use a proper task queue like https://github.com/RichardKnop/machinery/ to make the queue persistent and prevent multiple processes from starting the assembly (locking, or rather task deduplication).

However, let us remember the idea to push metadata as close to the storage as possible. S3 e.g. has its own multipart upload. I don't know if it allows uploads to the same upload id from different regions ... I don't see a usecase for it, because a client is not likely to switch the region between chunks. So, do we need to transport chunks over geo boundaries? If we want to allow asymmetric download of chunks while the file is not fully uploaded, that might be a good idea. But for this the chunks would need to be stored on the storage, not in the ocdav svc that currently implements chunking. So this needs to be postponed as well, until we discuss chunking v3?

Which leaves the question of how to prevent multiple ocdavsvc processes from initiating an assembly at the same time. In an HA scenario we have at least two instances. Do they share a queue, or do we lock the upload to prevent an additional assembly? Actually, what if a client (or a proxy in between) repeats the MOVE? We need to lock the chunked upload dir, not only to the process but to the goroutine, and deal with timeouts and errors ... tomorrow...

tusd has an in memory and a file based locking provider. redis is mentioned, but the only other implementation I found uses etcd. Another pointer towards tusd ... they thought about this ...

Anyway, we can just create a lockfile. open(pathname, O_CREAT | O_EXCL, 0644) is reasonably safe, even on today's nfs. So this is an atomic way to have the first request assemble the chunks and let all others fail. First come, first served.

whoever gets the lock creates a uuid for the dav/job-status endpoint and updates the status there ... we can even write the json files there, so the jobs can be picked up after a restart?

@butonic (Member Author) commented Oct 8, 2019

@butonic butonic added this to the Feature Complete milestone Oct 15, 2019
@butonic butonic removed their assignment Nov 11, 2019
refs pushed a commit that referenced this issue Sep 18, 2020
allow handler to return an error
refs added a commit that referenced this issue Sep 18, 2020
Report trace on a single service basis
refs pushed a commit that referenced this issue Sep 18, 2020
…-started

In case the http server cannot be started the error is logged
@refs (Member) commented Jan 12, 2021

blocked. see #36 (comment)

@refs refs changed the title implement chunked uploading with lazy ops header Implement chunked uploading with lazy ops header Jan 13, 2021
@butonic (Member Author) commented Mar 8, 2021

lazy ops got removed from the clients: owncloud/client#8398
tracking new approach in #214

@butonic butonic closed this as completed Mar 8, 2021