diff --git a/README.md b/README.md index 30f995bf6..fac0db21b 100644 --- a/README.md +++ b/README.md @@ -49,7 +49,7 @@ The srcmounts (source mounts) argument is a colon (':') delimited list of paths To make it easier to include multiple source mounts mergerfs supports [globbing](http://linux.die.net/man/7/glob). **The globbing tokens MUST be escaped when using via the shell else the shell itself will expand it.** ``` -$ mergerfs -o defaults,allow_other /mnt/disk\*:/mnt/cdrom /media/drives +$ mergerfs -o defaults,allow_other,use_ino /mnt/disk\*:/mnt/cdrom /media/drives ``` The above line will use all mount points in /mnt prefixed with **disk** and the **cdrom**. @@ -57,8 +57,8 @@ The above line will use all mount points in /mnt prefixed with **disk** and the To have the pool mounted at boot or otherwise accessable from related tools use **/etc/fstab**. ``` -# -/mnt/disk*:/mnt/cdrom /media/drives fuse.mergerfs defaults,allow_other 0 0 +# +/mnt/disk*:/mnt/cdrom /media/drives fuse.mergerfs defaults,allow_other,use_ino 0 0 ``` **NOTE:** the globbing is done at mount or xattr update time (see below). If a new directory is added matching the glob after the fact it will not be automatically included. @@ -97,7 +97,7 @@ Due to FUSE limitations **ioctl** behaves differently if its acting on a directo | ff (first found) | Given the order of the drives, as defined at mount time or when configured via xattr interface, act on the first one found. For **create** category it will exclude readonly drives and those with free space less than **minfreespace** (unless there is no other option). | | lfs (least free space) | Pick the drive with the least available free space. For **create** category it will exclude readonly drives and those with free space less than **minfreespace**. Falls back to **mfs**. | | lus (least used space) | Pick the drive with the least used space. For **create** category it will exclude readonly drives and those with free space less than **minfreespace**. Falls back to **mfs**. | -| mfs (most free space) | Pick the drive with the most available free space. For **create** category it will exclude readonly drives and those with free space less than **minfreespace**. Falls back to **ff**. | +| mfs (most free space) | Pick the drive with the most available free space. For **create** category it will exclude readonly drives. Falls back to **ff**. | | newest (newest file) | Pick the file / directory with the largest mtime. For **create** category it will exclude readonly drives and those with free space less than **minfreespace** (unless there is no other option). | | rand (random) | Calls **all** and then randomizes. | @@ -310,7 +310,7 @@ A B C * The recommended options are **defaults,allow_other,direct_io,use_ino**. * Run mergerfs as `root` unless you're merging paths which are owned by the same user otherwise strange permission issues may arise. * https://github.com/trapexit/backup-and-recovery-howtos : A set of guides / howtos on creating a data storage system, backing it up, maintaining it, and recovering from failure. -* If you don't see some directories / files you expect in a merged point be sure the user has permission to all the underlying directories. If `/drive0/a` has is owned by `root:root` with ACLs set to `0700` and `/drive1/a` is `root:root` and `0755` you'll see only `/drive1/a`. Use `mergerfs.fsck` to audit the drive for out of sync permissions. +* If you don't see some directories / files you expect in a merged point be sure the user has permission to all the underlying directories. Use `mergerfs.fsck` to audit the drive for out of sync permissions. * Do *not* use `direct_io` if you expect applications (such as rtorrent) to [mmap](http://linux.die.net/man/2/mmap) files. It is not currently supported in FUSE w/ `direct_io` enabled. * Since POSIX gives you only error or success on calls its difficult to determine the proper behavior when applying the behavior to multiple targets. **mergerfs** will return an error only if all attempts of an action fail. Any success will lead to a success returned. This means however that some odd situations may arise. * Remember that some policies mixed with some functions may result in strange behaviors. Not that some of these behaviors and race conditions couldn't happen outside **mergerfs** but that they are far more likely to occur on account of attempt to merge together multiple sources of data which could be out of sync due to the different policies. @@ -325,6 +325,12 @@ Use the `direct_io` option as described above. Due to what mergerfs is doing the Since enabling `direct_io` disables `mmap` this is not an ideal situation however write speeds should be increased and there are some tweaks being developed which may help in minimizing the extra caching. +#### NFS clients don't work + +Some NFS clients appear to fail when a mergerfs mount is exported. Kodi in particular seems to have issues. + +Try enabling the `use_ino` option. Some have reported that it fixes the issue. + #### rtorrent fails with ENODEV (No such device) Be sure to turn off `direct_io`. rtorrent and some other applications use [mmap](http://linux.die.net/man/2/mmap) to read and write to files and offer no failback to traditional methods. FUSE does not currently support mmap while using `direct_io`. There will be a performance penalty on writes with `direct_io` off as well as the problem of double caching but it's the only way to get such applications to work. If the performance loss is too high for other apps you can mount mergerfs twice. Once with `direct_io` enabled and one without it. @@ -431,33 +437,34 @@ Yes. It will be represented immediately in the pool as the policies would descri Please reread the sections above about policies, path preserving, and the **moveonenospc** option. If the policy is path preserving and a drive is almost full and the drive the policy would pick then the writing of the file may fill the drive and receive ENOSPC errors. That is expected with those settings. If you don't want that: enable **moveonenospc** and don't use a path preserving policy. -#### How are inodes calculated? +#### Can mergerfs mounts be exported over NFS? -mergerfs-inode = (original-inode | (device-id << 32)) +Yes. Some clients (Kodi) have issues but users have found that enabling the `use_ino` option often address the problem. -While `ino_t` is 64 bits only a few filesystems use more than 32. Similarly, while `dev_t` is also 64 bits it was traditionally 16 bits. Bitwise or'ing them together should work most of the time. While totally unique inodes are preferred the overhead which would be needed does not seem to outweighted by the benefits. +#### Can mergerfs mounts be exported over Samba / SMB? -#### It's mentioned that there are some security issues with mhddfs. What are they? How does mergerfs address them? +Yes. -[mhddfs](https://github.com/trapexit/mhddfs) tries to handle being run as **root** by calling [getuid()](https://github.com/trapexit/mhddfs/blob/cae96e6251dd91e2bdc24800b4a18a74044f6672/src/main.c#L319) and if it returns **0** then it will [chown](http://linux.die.net/man/1/chown) the file. Not only is that a race condition but it doesn't handle many other situations. Rather than attempting to simulate POSIX ACL behaviors the proper behavior is to use [seteuid](http://linux.die.net/man/2/seteuid) and [setegid](http://linux.die.net/man/2/setegid), become the user making the original call and perform the action as them. This is how [mergerfs](https://github.com/trapexit/mergerfs) handles things. +#### How are inodes calculated? -If you are familiar with POSIX standards you'll know that this behavior poses a problem. **seteuid** and **setegid** affect the whole process and **libfuse** is multithreaded by default. We'd need to lock access to **seteuid** and **setegid** with a mutex so that the several threads aren't stepping on one another and files end up with weird permissions and ownership. However, with lots of calls the contention on that mutex would be extremely high. Thankfully on Linux and macOS there is a better solution. +mergerfs-inode = (original-inode | (device-id << 32)) -macOS has a [non-portable pthread extension](https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/pthread_setugid_np.2.html) for per-thread user and group impersonation. +While `ino_t` is 64 bits only a few filesystems use more than 32. Similarly, while `dev_t` is also 64 bits it was traditionally 16 bits. Bitwise or'ing them together should work most of the time. While totally unique inodes are preferred the overhead which would be needed does not seem to outweighted by the benefits. -Linux does not support [pthread_setugid_np](https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/pthread_setugid_np.2.html) but user and group IDs are a per-thread attribute though documentation on that fact or how to manipulate them is not well distributed. From the **4.00** release of the Linux man-pages project for [setuid](http://man7.org/linux/man-pages/man2/setuid.2.html). +#### It's mentioned that there are some security issues with mhddfs. What are they? How does mergerfs address them? -> At the kernel level, user IDs and group IDs are a per-thread attribute. However, POSIX requires that all threads in a process share the same credentials. The NPTL threading implementation handles the POSIX requirements by providing wrapper functions for the various system calls that change process UIDs and GIDs. These wrapper functions (including the one for setuid()) employ a signal-based technique to ensure that when one thread changes credentials, all of the other threads in the process also change their credentials. For details, see nptl(7). +[mhddfs](https://github.com/trapexit/mhddfs) manages running as **root** by calling [getuid()](https://github.com/trapexit/mhddfs/blob/cae96e6251dd91e2bdc24800b4a18a74044f6672/src/main.c#L319) and if it returns **0** then it will [chown](http://linux.die.net/man/1/chown) the file. Not only is that a race condition but it doesn't handle many other situations. Rather than attempting to simulate POSIX ACL behavior the proper way to manage this is to use [seteuid](http://linux.die.net/man/2/seteuid) and [setegid](http://linux.die.net/man/2/setegid), in effect becoming the user making the original call, and perform the action as them. This is what mergerfs does. -As it turns out the setreuid syscalls apply only to the thread. GLIBC hides this away using RT signals to inform all threads to change credentials. Taking after **Samba**, mergerfs uses **syscall(SYS_setreuid,...)** to set the callers credentials for that thread only. Jumping back to **root** as necessary should escalated privileges be needed (for instance: to clone paths between drives). +In Linux setreuid syscalls apply only to the thread. GLIBC hides this away by using realtime signals to inform all threads to change credentials. Taking after **Samba**, mergerfs uses **syscall(SYS_setreuid,...)** to set the callers credentials for that thread only. Jumping back to **root** as necessary should escalated privileges be needed (for instance: to clone paths between drives). -For non-Linux systems mergerfs uses a read-write lock and changes credentials only when necessary. If multiple threads are to be user X then only the first one will need to change the processes credentials. So long as the other threads need to be user X they will take a readlock allow multiple threads to share the credentials. Once a request comes in to run as user Y that thread will attempt a write lock and change to Y's credentials when it can. If the ability to give writers priority is supported then that flag will be used so threads trying to change credentials don't starve. This isn't the best solution but should work reasonably well. As new platforms are supported if they offer per thread credentials those APIs will be adopted. +For non-Linux systems mergerfs uses a read-write lock and changes credentials only when necessary. If multiple threads are to be user X then only the first one will need to change the processes credentials. So long as the other threads need to be user X they will take a readlock allowing multiple threads to share the credentials. Once a request comes in to run as user Y that thread will attempt a write lock and change to Y's credentials when it can. If the ability to give writers priority is supported then that flag will be used so threads trying to change credentials don't starve. This isn't the best solution but should work reasonably well assuming there are few users. # SUPPORT #### Issues with the software * github.com: https://github.com/trapexit/mergerfs/issues * email: trapexit@spawn.link +* twitter: https://twitter.com/_trapexit #### Support development * Gratipay: https://gratipay.com/~trapexit diff --git a/man/mergerfs.1 b/man/mergerfs.1 index d2ba76201..78fc0bc0b 100644 --- a/man/mergerfs.1 +++ b/man/mergerfs.1 @@ -104,7 +104,7 @@ the shell itself will expand it.\f[] .IP .nf \f[C] -$\ mergerfs\ \-o\ defaults,allow_other\ /mnt/disk\\*:/mnt/cdrom\ /media/drives +$\ mergerfs\ \-o\ defaults,allow_other,use_ino\ /mnt/disk\\*:/mnt/cdrom\ /media/drives \f[] .fi .PP @@ -116,8 +116,8 @@ tools use \f[B]/etc/fstab\f[]. .IP .nf \f[C] -#\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ -/mnt/disk*:/mnt/cdrom\ \ /media/drives\ \ fuse.mergerfs\ \ defaults,allow_other\ \ 0\ \ \ \ \ \ \ 0 +#\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ +/mnt/disk*:/mnt/cdrom\ \ /media/drives\ \ fuse.mergerfs\ \ defaults,allow_other,use_ino\ \ 0\ \ \ \ \ \ \ 0 \f[] .fi .PP @@ -302,8 +302,7 @@ T{ mfs (most free space) T}@T{ Pick the drive with the most available free space. -For \f[B]create\f[] category it will exclude readonly drives and those -with free space less than \f[B]minfreespace\f[]. +For \f[B]create\f[] category it will exclude readonly drives. Falls back to \f[B]ff\f[]. T} T{ @@ -705,9 +704,6 @@ maintaining it, and recovering from failure. .IP \[bu] 2 If you don\[aq]t see some directories / files you expect in a merged point be sure the user has permission to all the underlying directories. -If \f[C]/drive0/a\f[] has is owned by \f[C]root:root\f[] with ACLs set -to \f[C]0700\f[] and \f[C]/drive1/a\f[] is \f[C]root:root\f[] and -\f[C]0755\f[] you\[aq]ll see only \f[C]/drive1/a\f[]. Use \f[C]mergerfs.fsck\f[] to audit the drive for out of sync permissions. .IP \[bu] 2 @@ -766,6 +762,13 @@ Since enabling \f[C]direct_io\f[] disables \f[C]mmap\f[] this is not an ideal situation however write speeds should be increased and there are some tweaks being developed which may help in minimizing the extra caching. +.SS NFS clients don\[aq]t work +.PP +Some NFS clients appear to fail when a mergerfs mount is exported. +Kodi in particular seems to have issues. +.PP +Try enabling the \f[C]use_ino\f[] option. +Some have reported that it fixes the issue. .SS rtorrent fails with ENODEV (No such device) .PP Be sure to turn off \f[C]direct_io\f[]. @@ -984,6 +987,14 @@ drive and receive ENOSPC errors. That is expected with those settings. If you don\[aq]t want that: enable \f[B]moveonenospc\f[] and don\[aq]t use a path preserving policy. +.SS Can mergerfs mounts be exported over NFS? +.PP +Yes. +Some clients (Kodi) have issues but users have found that enabling the +\f[C]use_ino\f[] option often address the problem. +.SS Can mergerfs mounts be exported over Samba / SMB? +.PP +Yes. .SS How are inodes calculated? .PP mergerfs\-inode = (original\-inode | (device\-id << 32)) @@ -997,59 +1008,22 @@ needed does not seem to outweighted by the benefits. .SS It\[aq]s mentioned that there are some security issues with mhddfs. What are they? How does mergerfs address them? .PP -mhddfs (https://github.com/trapexit/mhddfs) tries to handle being run as +mhddfs (https://github.com/trapexit/mhddfs) manages running as \f[B]root\f[] by calling getuid() (https://github.com/trapexit/mhddfs/blob/cae96e6251dd91e2bdc24800b4a18a74044f6672/src/main.c#L319) and if it returns \f[B]0\f[] then it will chown (http://linux.die.net/man/1/chown) the file. Not only is that a race condition but it doesn\[aq]t handle many other situations. -Rather than attempting to simulate POSIX ACL behaviors the proper -behavior is to use seteuid (http://linux.die.net/man/2/seteuid) and -setegid (http://linux.die.net/man/2/setegid), become the user making the -original call and perform the action as them. -This is how mergerfs (https://github.com/trapexit/mergerfs) handles -things. -.PP -If you are familiar with POSIX standards you\[aq]ll know that this -behavior poses a problem. -\f[B]seteuid\f[] and \f[B]setegid\f[] affect the whole process and -\f[B]libfuse\f[] is multithreaded by default. -We\[aq]d need to lock access to \f[B]seteuid\f[] and \f[B]setegid\f[] -with a mutex so that the several threads aren\[aq]t stepping on one -another and files end up with weird permissions and ownership. -However, with lots of calls the contention on that mutex would be -extremely high. -Thankfully on Linux and macOS there is a better solution. -.PP -macOS has a non\-portable pthread -extension (https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/pthread_setugid_np.2.html) -for per\-thread user and group impersonation. -.PP -Linux does not support -pthread_setugid_np (https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/pthread_setugid_np.2.html) -but user and group IDs are a per\-thread attribute though documentation -on that fact or how to manipulate them is not well distributed. -From the \f[B]4.00\f[] release of the Linux man\-pages project for -setuid (http://man7.org/linux/man-pages/man2/setuid.2.html). -.RS -.PP -At the kernel level, user IDs and group IDs are a per\-thread attribute. -However, POSIX requires that all threads in a process share the same -credentials. -The NPTL threading implementation handles the POSIX requirements by -providing wrapper functions for the various system calls that change -process UIDs and GIDs. -These wrapper functions (including the one for setuid()) employ a -signal\-based technique to ensure that when one thread changes -credentials, all of the other threads in the process also change their -credentials. -For details, see nptl(7). -.RE -.PP -As it turns out the setreuid syscalls apply only to the thread. -GLIBC hides this away using RT signals to inform all threads to change -credentials. +Rather than attempting to simulate POSIX ACL behavior the proper way to +manage this is to use seteuid (http://linux.die.net/man/2/seteuid) and +setegid (http://linux.die.net/man/2/setegid), in effect becoming the +user making the original call, and perform the action as them. +This is what mergerfs does. +.PP +In Linux setreuid syscalls apply only to the thread. +GLIBC hides this away by using realtime signals to inform all threads to +change credentials. Taking after \f[B]Samba\f[], mergerfs uses \f[B]syscall(SYS_setreuid,...)\f[] to set the callers credentials for that thread only. @@ -1061,20 +1035,21 @@ credentials only when necessary. If multiple threads are to be user X then only the first one will need to change the processes credentials. So long as the other threads need to be user X they will take a readlock -allow multiple threads to share the credentials. +allowing multiple threads to share the credentials. Once a request comes in to run as user Y that thread will attempt a write lock and change to Y\[aq]s credentials when it can. If the ability to give writers priority is supported then that flag will be used so threads trying to change credentials don\[aq]t starve. -This isn\[aq]t the best solution but should work reasonably well. -As new platforms are supported if they offer per thread credentials -those APIs will be adopted. +This isn\[aq]t the best solution but should work reasonably well +assuming there are few users. .SH SUPPORT .SS Issues with the software .IP \[bu] 2 github.com: https://github.com/trapexit/mergerfs/issues .IP \[bu] 2 email: trapexit\@spawn.link +.IP \[bu] 2 +twitter: https://twitter.com/_trapexit .SS Support development .IP \[bu] 2 Gratipay: https://gratipay.com/~trapexit diff --git a/src/option_parser.cpp b/src/option_parser.cpp index 485cb6fa1..0ad72880e 100644 --- a/src/option_parser.cpp +++ b/src/option_parser.cpp @@ -252,6 +252,8 @@ usage(void) " splice_write, splice_move\n" " -o direct_io bypass additional caching, increases write\n" " speeds at the cost of reads\n" + " -o use_ino mergerfs will generate inode values rather than\n" + " autogenerated by libfuse\n" " -o minfreespace= minimum free space needed for certain policies:\n" " default 4G\n" " -o moveonenospc= try to move file to another drive when ENOSPC\n"