Merge pull request #1377 from trapexit/movecopy-readme

Add FAQ entry on 'move' and 'copy'
trapexit · Nov 30, 2024 · 7d64a47
2 parents b0bb7ef + 5fba42f
Showing 1 changed file with 49 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -2306,6 +2306,55 @@ mergerfs pool that includes all the paths you need if you want links
 to work.
 
 
+#### How does mergerfs handle moving and copying of files?
+
+This question rests on a *very* common mistaken assumption about how
+filesystems work: there is no such thing as a "move" or "copy"
+operation. These concepts are high-level behaviors made up of numerous
+independent steps, *not* individual filesystem functions.
+
+A "move" can include a "copy", so let's describe "copy" first.
+
+When an application copies a file from a source to a destination, it
+can do so in a number of ways, but the basic steps are the following.
+
+1. `open` the source file.
+2. `create` the destination file.
+3. `read` a chunk of data from the source and `write` it to the
+   destination. Repeat until there is no more data to copy.
+4. Copy the file's metadata (read via `stat`), such as ownership
+   (`chown`), permissions (`chmod`), timestamps (`utimes`), extended
+   attributes (`getxattr`, `setxattr`), etc.
+5. `close` the source and destination files.
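+
+The steps above can be sketched in Python, whose `os` module offers
+thin wrappers around the corresponding system calls. This is a
+simplified illustration under stated assumptions, not how `cp`,
+mergerfs, or any particular tool implements copying: extended
+attributes are only copied on Linux (where Python exposes the `xattr`
+calls), sparse files and `copy_file_range` are ignored, and ownership
+or attributes the caller may not set are silently skipped.
+
+```python
+import errno
+import os
+
+
+def _copy_xattrs(src_fd: int, dst_fd: int) -> None:
+    # Python only exposes the xattr system calls on Linux
+    if not hasattr(os, "listxattr"):
+        return
+    try:
+        names = os.listxattr(src_fd)
+    except OSError as e:
+        if e.errno == errno.ENOTSUP:
+            return
+        raise
+    for name in names:
+        try:
+            os.setxattr(dst_fd, name, os.getxattr(src_fd, name))
+        except OSError as e:
+            # unsupported or not permitted (e.g. security.* as non-root)
+            if e.errno not in (errno.ENOTSUP, errno.EPERM, errno.EACCES):
+                raise
+
+
+def copy_file(src: str, dst: str, chunk_size: int = 64 * 1024) -> None:
+    # 1. open the source file
+    src_fd = os.open(src, os.O_RDONLY)
+    try:
+        st = os.fstat(src_fd)
+        # 2. create the destination: only a name, flags and mode are passed
+        dst_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
+        try:
+            # 3. read chunks from the source and write them to the destination
+            while chunk := os.read(src_fd, chunk_size):
+                view = memoryview(chunk)
+                while view:  # os.write may write fewer bytes than requested
+                    view = view[os.write(dst_fd, view):]
+            # 4. copy metadata: ownership, xattrs, permissions, timestamps
+            try:
+                os.fchown(dst_fd, st.st_uid, st.st_gid)
+            except PermissionError:
+                pass  # unprivileged users usually cannot change ownership
+            _copy_xattrs(src_fd, dst_fd)
+            os.fchmod(dst_fd, st.st_mode & 0o7777)
+            os.utime(dst_fd, ns=(st.st_atime_ns, st.st_mtime_ns))
+        finally:
+            os.close(dst_fd)  # 5. close the destination...
+    finally:
+        os.close(src_fd)  # ...and the source
+```
+
+Metadata is applied in a deliberate order: extended attributes before
+permissions (for unprivileged callers a read-only mode could otherwise
+block setting `user.*` attributes) and timestamps last. Note that at
+step 2 the filesystem sees only a name, open flags, and a mode; nothing
+reveals that a copy is under way or how large the file will become.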
+
+"move" is typically a `rename(src,dst)` and if that errors with
+`EXDEV` (meaning the source and destination are on different
+filesystems) the application will "copy" the file as described above
+and then it removes (`unlink`) the source.
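+
+A minimal Python sketch of that "move" logic follows. For brevity it
+delegates the copy step to `shutil.copy2`, which copies data,
+permissions, timestamps and (on Linux) extended attributes, but not
+ownership. The function name `move_file` is illustrative; Python's own
+`shutil.move` follows a similar rename-then-fall-back pattern.
+
+```python
+import errno
+import os
+import shutil
+
+
+def move_file(src: str, dst: str) -> None:
+    try:
+        # within one filesystem a move is a single rename()
+        os.rename(src, dst)
+    except OSError as e:
+        if e.errno != errno.EXDEV:
+            raise
+        # EXDEV: src and dst are on different filesystems, so fall back to
+        # a copy (create + read/write + metadata) followed by unlink
+        shutil.copy2(src, dst)
+        os.unlink(src)
+```
+
+From the filesystem's point of view the fallback path is just a
+`rename` that failed, followed by an unrelated-looking `create`, a
+series of writes, and an `unlink`.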
+
+The `rename(src,dst)`, `open(src)`, `create(dst)`, data copying,
+metadata copying, `unlink(src)`, etc. are entirely distinct and
+separate events. There is no practical way for mergerfs to know that a
+file is ultimately being "copied" or which file is the source. And
+since the source is not known, there is no way to know how large a
+newly created file will become. This is why it is impossible for
+mergerfs to choose the branch for a `create` based on file size. The
+only context provided when a file is created, besides its name, is its
+permissions, whether it is to be read and/or written, and some
+low-level flags for the operating system.
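+
+To make this concrete, the hypothetical function below sketches what a
+create-time branch selector has to work with. Its arguments mirror
+libfuse's high-level `create` callback, which receives a path, a mode,
+and a `fuse_file_info` whose `flags` field holds the open flags. The
+name `choose_branch` and its most-free-space rule (similar in spirit
+to mergerfs's `mfs` policy) are illustrative assumptions, not mergerfs
+code.
+
+```python
+import os
+from typing import Sequence
+
+
+def choose_branch(branches: Sequence[str], name: str, mode: int, flags: int) -> str:
+    """Hypothetical create-time policy: pick the branch with most free space.
+
+    `name`, `mode` and `flags` are everything a create request carries.
+    None of them says how large the file will become, so the decision can
+    only use properties of the branches themselves.
+    """
+    def free_bytes(branch: str) -> int:
+        st = os.statvfs(branch)
+        return st.f_bavail * st.f_frsize
+
+    return max(branches, key=free_bytes)
+```
+
+For example, `choose_branch(["/mnt/disk1", "/mnt/disk2"], "movie.mkv",
+0o644, os.O_WRONLY | os.O_CREAT)` (hypothetical paths) can compare the
+branches, but has no way to ask how big `movie.mkv` will be.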
+
+All of this means that mergerfs cannot make decisions when a file is
+created based on its eventual size or the source of its data. That
+information is simply not available. At best, mergerfs could react to
+a file reaching a certain size while data is being written, or when
+the file is closed.
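+
+As a hypothetical illustration of that "at best" option, the sketch
+below tracks how far each open file has been written and fires a
+callback the first time it grows past a size threshold, plus once more
+at close with the final size. The `SizeWatcher` class, its methods,
+and the threshold are invented for this example and are not part of
+mergerfs; truncation is ignored for simplicity.
+
+```python
+from typing import Callable, Dict
+
+
+class SizeWatcher:
+    """Hypothetical: observe writes and react once a file grows past a threshold."""
+
+    def __init__(self, threshold: int,
+                 on_large: Callable[[str, int], None],
+                 on_close: Callable[[str, int], None]) -> None:
+        self.threshold = threshold
+        self.on_large = on_large
+        self.on_close = on_close
+        self.sizes: Dict[str, int] = {}
+
+    def write(self, path: str, offset: int, length: int) -> None:
+        old = self.sizes.get(path, 0)
+        # a write extends the size only if it goes past the current end
+        new = max(old, offset + length)
+        self.sizes[path] = new
+        if old <= self.threshold < new:
+            # first write that pushes the file past the threshold
+            self.on_large(path, new)
+
+    def close(self, path: str) -> None:
+        # only now is the final size actually known
+        self.on_close(path, self.sizes.pop(path, 0))
+```
+
+By the time either callback fires, the file already exists on some
+branch, so any reaction (such as relocating it) happens after the
+`create` decision, not as part of it.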
+
+Related: if a user wishes to have mergerfs act based on the name of a
+file, note that it is common, and even best practice, for a program to
+write to a temporary file first and then rename it to its final
+destination. That temporary file name is typically random and gives
+no indication of the type of file being written.
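+
+The write-then-rename pattern described above typically looks like the
+Python sketch below; the helper name `atomic_write` is illustrative,
+and permission handling is omitted (`tempfile.mkstemp` creates the
+file with mode `0600`). At `create` time only the random name chosen
+by `mkstemp` is visible; the meaningful final name appears only later,
+in the `rename`.
+
+```python
+import os
+import tempfile
+
+
+def atomic_write(path: str, data: bytes) -> None:
+    directory = os.path.dirname(os.path.abspath(path))
+    # create a randomly named file in the same directory; this random name
+    # is all the filesystem (and mergerfs) sees when the file is created
+    fd, tmp_path = tempfile.mkstemp(dir=directory)
+    try:
+        with os.fdopen(fd, "wb") as f:
+            f.write(data)
+            f.flush()
+            os.fsync(f.fileno())  # make sure the data is on disk first
+        # only now does the real name appear, via rename()
+        os.replace(tmp_path, path)
+    except BaseException:
+        os.unlink(tmp_path)
+        raise
+```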
+
+
 #### Does FICLONE or FICLONERANGE work?
 
 Unfortunately not. FUSE, the technology mergerfs is based on, does not