Too many open files on multi-user installation on Ubuntu #6007

Closed
jakubgs opened this issue Jan 28, 2022 · 46 comments · Fixed by #6553
@jakubgs

jakubgs commented Jan 28, 2022

Describe the bug

I'm installing Nix 2.5.1 (I also tested 2.6.0) on Ubuntu, and I'm getting some variation of errors like these when building a big project:

error: opening directory '/nix/store/...': Too many open files
error: creating pipe: Too many open files

Steps To Reproduce

  1. Use Ubuntu 21.10 or 20.04.3 LTS
  2. Download Nix 2.5.1 or 2.6.0
  3. Perform a multi-user installation
  4. Try running a big build
  5. See Too many open files

Expected behavior

I expected it to work out of the box, but I also expected that something like ulimit -Sn 4096 or editing /etc/security/limits.conf would fix the problem, and it doesn't. Nor does setting fs.file-max to something high in /etc/sysctl.conf.

What DOES fix the problem is adding:

LimitNOFILE=65536
LimitNOFILESoft=4096

In /etc/systemd/system/nix-daemon.service, but that's not a good solution, since it would require me to ask developers to perform these additional actions after installation. Wouldn't it make sense to include an increased open files limit in the service definition by default?

Based on this post, systemd services appear to ignore system-wide limits, both hard and soft. The limits have to be set in the service definition; otherwise the soft limit defaults to 1024:

jakubgs@ubuntuvm:~ % grep 'open files' /proc/$(pgrep nix-daemon)/limits
Max open files            1024                 524288               files  
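
For anyone who wants to check these limits from a script rather than by eye, here is a minimal Python sketch. It assumes the row layout shown in the output above ("Max open files", then soft, then hard, then the unit); the function name `parse_open_files_limit` is made up for illustration and is not part of Nix or systemd.

```python
def parse_open_files_limit(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    A value reported as 'unlimited' is returned as None.
    """
    def to_int(field):
        return None if field == "unlimited" else int(field)

    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Row layout assumed: Max open files <soft> <hard> files
            fields = line.split()
            return to_int(fields[3]), to_int(fields[4])
    raise ValueError("no 'Max open files' row found")


# Sample row copied from the output in this comment.
sample = "Max open files            1024                 524288               files"
soft, hard = parse_open_files_limit(sample)
print(f"soft={soft} hard={hard}")
```

To inspect a live daemon, read the text from /proc/<pid>/limits and pass it to the function.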

Additional context

I'm attempting to upgrade our mobile application build setup to 2.5.1 in status-im/status-mobile#12980.

@SuperSandro2000
Member

Try adding fs.file-max=65536 to /etc/sysctl.d/90-custom.conf and reload sysctl with sudo sysctl -p. You may need to restart the daemon.

@jakubgs
Author

jakubgs commented Jan 28, 2022

I already said I tried changing fs.file-max in /etc/sysctl.conf, but here you go, no effect even after reboot:

jakubgs@ubuntuvm:~ % cat /etc/sysctl.d/90-custom.conf            
fs.file-max=65536
jakubgs@ubuntuvm:~ % cat /etc/security/limits.d/nix.conf 
*    soft    nofile    65536
jakubgs@ubuntuvm:~ % grep 'open files' /proc/$(pgrep nix-daemon)/limits
Max open files            1024                 524288               files 

Systemd services appear to ignore limits other than those set by LimitNOFILE and LimitNOFILESoft.

@jakubgs
Author

jakubgs commented Jan 28, 2022

Systemd does not support global limits, the file is intentionally ignored.

LimitNOFILE= in the service file can be set to specify the number of open
file descriptors for a specific service.

@jakubgs
Author

jakubgs commented Jan 28, 2022

@edolstra
Member

Adding LimitNOFILE to our systemd unit sounds good. However it would also be useful to know where the file descriptors are going. Any chance you can get the contents of /proc/<pid>/fd of the daemon just before it fails?

@jakubgs
Author

jakubgs commented Jan 28, 2022

I'm just running the build of our mobile application, which pulls a SHITLOAD of dependencies, especially for Gradle:
https://github.com/status-im/status-react/blob/develop/nix/deps/gradle/deps.list

So I'd guess that's why it fails for our builds, but I can try checking.

@jakubgs
Author

jakubgs commented Jan 28, 2022

There you go, 1024 file descriptors open:
https://gist.github.com/jakubgs/354f0c76a9c82fa4648459923c9bd114

Most of it is a combination of Node.js dependencies pulled in by yarn2nix and Gradle dependencies pulled in by our own setup:

 > grep -E '(pom|jar).lock$' nix_open_files.ls | wc -l   
823
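
The same tally can be done without saving an ls listing first. The sketch below reads the symlinks in a /proc/<pid>/fd-style directory and counts targets by suffix. The helper names `fd_targets` and `count_by_suffix` are hypothetical, and the suffixes simply mirror the grep pattern above.

```python
import os
from collections import Counter


def fd_targets(fd_dir):
    """Return the symlink targets of every entry in a /proc/<pid>/fd-style directory."""
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            # The descriptor was closed in the meantime, or the entry is not a symlink.
            continue
    return targets


def count_by_suffix(targets, suffixes=("pom.lock", "jar.lock")):
    """Count targets per matching suffix; everything else is counted as 'other'."""
    counts = Counter()
    for target in targets:
        for suffix in suffixes:
            if target.endswith(suffix):
                counts[suffix] += 1
                break
        else:
            counts["other"] += 1
    return counts
```

Reading another user's /proc/<pid>/fd usually requires root, so for the daemon this would be called as something like `count_by_suffix(fd_targets("/proc/<pid>/fd"))` under sudo.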

@jakubgs
Author

jakubgs commented Jan 28, 2022

One thing worth noting is that Systemd services seem to ignore the LimitNOFILESoft setting if there's no LimitNOFILE present.

@jakubgs
Author

jakubgs commented Jan 28, 2022

I think LimitNOFILESoft might be deprecated, since I can't find it in the docs:
https://www.freedesktop.org/software/systemd/man/systemd.exec.html

This format appears to work fine:

jakubgs@ubuntuvm:~ % grep -i limit /etc/systemd/system/nix-daemon.service
LimitNOFILE=4096:32768
jakubgs@ubuntuvm:~ % grep 'open files' /proc/$(pgrep nix-daemon)/limits
Max open files            4096                 32768                files  

Which appears to match the format of the defaults:

jakubgs@ubuntuvm:~ % grep DefaultLimitNOFILE /etc/systemd/system.conf 
#DefaultLimitNOFILE=1024:524288

@jakubgs
Author

jakubgs commented Jan 28, 2022

It appears systemd has been setting DefaultLimitNOFILE to 1024:262144 since this change: systemd/systemd@c02b6ee

Which was effective since v240.

@jakubgs
Author

jakubgs commented Jan 28, 2022

Also, since the soft limit is what causes the error, Nix could raise its own soft limit using the setrlimit syscall:
https://man7.org/linux/man-pages/man2/setrlimit.2.html

setrlimit(RLIMIT_NOFILE, 4096)

I do see some references to use of setrlimit in the codebase:
https://github.com/NixOS/nix/search?q=setrlimit

But using it would probably involve some additional logic, like checking the limit against currently opened files and raising the limit only when necessary in steps or something like that. Just adding LimitNOFILE is definitely simpler.
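
To illustrate the idea only (Nix itself is not written in Python, and this is not what Nix does), here is a minimal sketch using Python's standard `resource` module, which wraps getrlimit/setrlimit. The real syscall takes a soft/hard pair, not a single number. The function name `raise_nofile_soft_limit` and the 4096 default are assumptions made for the example.

```python
import resource


def raise_nofile_soft_limit(target=4096):
    """Raise the soft RLIMIT_NOFILE towards `target`, never above the hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        # Already unlimited, nothing to raise.
        return soft, hard
    # An unprivileged process may only raise the soft limit up to the hard limit.
    wanted = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if wanted > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print(raise_nofile_soft_limit())
```

The change applies only to the calling process and the children it starts afterwards, which is why the daemon would have to make the call itself for this approach to help.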

@edolstra
Member

https://gist.github.com/jakubgs/354f0c76a9c82fa4648459923c9bd114

Are these building/downloading in parallel (i.e. --max-jobs with a very high value)?

@jakubgs
Author

jakubgs commented Jan 28, 2022

@edolstra
Member

Hm, we acquire a path lock before obtaining a build slot, so if a derivation has a few thousand direct dependencies, Nix will acquire a few thousand locks, regardless of how many build slots there are...

@jakubgs
Author

jakubgs commented Jan 28, 2022

How many CPU cores does that machine have?

Depends, my VM? 4 cores. Our CI hosts? 12 cores. The same thing happens regardless.

jakubgs added a commit to status-im/status-mobile that referenced this issue Jan 31, 2022
NixOS/nix#6007

Signed-off-by: Jakub Sokołowski <jakub@status.im>
@jakubgs
Author

jakubgs commented Jan 31, 2022

I have tried setting max-jobs to concrete values - like 1000, 800, and 600 - but they do not work.
Which is probably because of the issue @edolstra identified with acquiring path locks before obtaining a build slot.

jakubgs added a commit to status-im/status-mobile that referenced this issue Feb 1, 2022
Due to changes in how Nix handles Git refs we need to specify
`refs/tags/` prefix in `package.json` to avoid the following error:
```
fatal: couldn't find remote ref refs/heads/v2.0.3-status-v6
error: program 'git' failed with exit code 128
```

I also had to rewrite some logic in `nix/scripts/source.sh` in order to
take account of single-user and multi-user installations.
We default to multi-user for Darwin, but not for any other OS due to
discovered issues with `nix-daemon` socket on Arch and open file limits.

Resolves: #12832
Depends on: status-im/status-jenkins-lib#37
Issues:
* NixOS/nix#5291
* NixOS/nix#6007

Signed-off-by: Jakub Sokołowski <jakub@status.im>
@jakubgs
Author

jakubgs commented Feb 1, 2022

Because of this issue I will not default to multi-user installation on Linux for our developers, but will use it on our CI host, since there I can easily apply the systemd service definition fix. Maybe it can be fixed for a future release.

@tobiasBora

tobiasBora commented May 2, 2022

I also hit this bug when I try to install latex with doc (reported here NixOS/nixpkgs#171218):

  environment.systemPackages = with pkgs; [
    (texlive.combine {
      inherit (texlive) scheme-full;
      pkgFilter = pkg: lib.elem pkg.tlType [ "run" "bin" "doc" ];
    })
   # ...
  ];

copying path '/nix/store/ac8g2qfm88gychisrkjiqn0v53nvisxa-texlive.tlpdb.xz' from 'https://cache.nixos.org'...
error: opening directory '/nix/store/y2p34715pgw4f0ydd5glkjp85fqim5fl-dirtytalk.doc.r20520.tar.xz': Too many open files

@SuperSandro2000
Member

Because of this issue I will not default to multi-user installation on Linux for our developers, but will use it on our CI host, since there I can easily apply the systemd service definition fix. Maybe it can be fixed for a future release.

Keep in mind that build failures/issues on linux that are caused by having the sandbox disabled are mostly ignored.

@jakubgs
Author

jakubgs commented May 2, 2022

Keep in mind that build failures/issues on linux that are caused by having the sandbox disabled are mostly ignored.

What does that mean?

@SuperSandro2000
Member

If a package fails to build and the reason behind it is that the sandbox is disabled, it is most likely not going to be fixed. You need multi-user mode for the sandbox.

@jakubgs
Author

jakubgs commented May 2, 2022

How is the sandbox related to the default 1024:262144 value of DefaultLimitNOFILE in systemd and Nix daemon hitting that limit?

I don't really get why you are referring to the sandbox in this issue, as it doesn't appear to be related. At least to me.

@SuperSandro2000
Member

Because of this issue I will not default to multi-user installation on Linux for our developers

@jakubgs
Author

jakubgs commented May 3, 2022

That in no way explains how it's related to the sandbox.

@tobiasBora

Correct me if I'm wrong, but I think that SuperSandro2000 just meant that if you choose the single-user installation instead of the multi-user installation, then your developers may not be able to report bugs, since the sandbox is disabled in that mode. It may therefore be worth using the multi-user installation (and fixing this bug?).

To come back to this issue, on NixOS I was able to solve my problem using this, but I'm not sure if something similar could apply here, as this issue does not even target NixOS.

  security.pam.loginLimits = [{
    domain = "*";
    type = "soft";
    item = "nofile";
    value = "262144";
  }];

@jakubgs
Author

jakubgs commented May 3, 2022

Ooooh, okay, that actually makes sense. Thanks for explaining @tobiasBora. And yes, I agree, I'd prefer to stick to multi-user everywhere to avoid issues caused by inconsistent setups. But at the same time I'd rather keep the setup process as default as possible.

And yes, the issue in question has been observed on Ubuntu and Arch; I've not experienced it on NixOS.

@Artturin
Member

#6553

@bennofs
Contributor

bennofs commented Nov 2, 2022

This is not a complete fix: when running nix as root, the daemon is not used, so the low default fileno ulimit (1024) makes nix builds fail often. I think nix should use setrlimit itself to raise the soft limit.

@tobiasBora

Is this supposed to be solved in NixOs as well? I just installed a fresh NixOs 23.11, and I’m running into that error again. I tried to check the value of ulimit -n, and it gives as both a normal user and a root user.

@tobiasBora

Ok, @bennofs is indeed right. I created a new issue here with a reproducible example #10158

@ldesousa

I installed Nix yesterday on Ubuntu 24.04. I just bumped into this issue trying to install TeX Live:
$ nix-shell -p texliveSmall

I really don't understand the discussion above about sandboxes, I am just starting to use Nix. Please let me know if there is an obvious fix.

@andrevmatos
Member

Got this one on NixOS unstable-small today; ran nixos-rebuild twice, same error, then systemctl restart nix-daemon did fix it (on this instance) and allowed build to complete.
Maybe nix 2.24 is leaking fds?

@kjeremy
Contributor

kjeremy commented Aug 19, 2024

I just hit this on nixos-unstable

$ nix --version
nix (Nix) 2.24.1

@llakala
Contributor

llakala commented Aug 24, 2024

I just hit this on nixos-unstable

$ nix --version
nix (Nix) 2.24.1

Just got it too. I'm on stable, with only a few packages using unstable, and nix 2.24.3. Rebuilding again fixed it.

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/how-to-the-address-the-too-many-open-files-issue/51646/1

@cole-h
Member

cole-h commented Sep 4, 2024

Likely fixed by #11408 (present in Nix 2.24.5+).

According to that PR, it was present in 2.24.0+, hence the reoccurrence.

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/update-fails-with-permission-denied/51663/1

@ldesousa

ldesousa commented Sep 5, 2024

@cole-h This issue persists in release 2.25.

$ nix --version
nix (Nix) 2.25.0pre20240807_cfe66dbe

$ nix-shell -p pandoc haskellPackages.pandoc-crossref pandoc-include librsvg texliveFull
[...]
error: creating pipe: Too many open files

@cole-h
Member

cole-h commented Sep 5, 2024

Note that that revision of 2.25pre is from before the bug fix was merged to master. Once my PR to fix the systemd unit is merged (#11413) you'll be able to update Nix to a revision that includes the fix. Or you can fetch the patch yourself.

@ldesousa

ldesousa commented Sep 5, 2024

Or you can fetch the patch yourself.

@cole-h Could you explain how to do that?

@cole-h
Member

cole-h commented Sep 5, 2024

You have two options:

  1. Update to a more recent Nix master and pull the patch for the systemd unit (fixup: use the real bindir for systemd unit's bindir #11413)
  2. Keep using your current Nix and pull the patch for the actual FD fix (Respect max-substitution-jobs again #11402)

Whichever you choose, the process will be the same: create an overlay for the nix package like so:

final: prev: {
  nix = prev.nix.overrideAttrs ({ patches ? [ ], ... }: {
    patches = patches ++ [
      (final.fetchpatch {
        url = "the url to whichever PR's patch you chose";
        hash = "sha256-.........";
      })
    ];
  });
}

and import it after the Nix master branch overlay.

@ldesousa

ldesousa commented Sep 6, 2024

@cole-h So far I have failed to follow your instructions. I guess my knowledge of Nix is not yet up to it. Since this is a side discussion, could you just point me to where I can learn what a package overlay is and how to update to Nix master? (According to the forum I should now have the very latest installed.)

@edolstra
Member

edolstra commented Sep 6, 2024

@ldesousa Note that Nix 2.24.5 contains #11408 so you may want to try that.

@ldesousa

ldesousa commented Sep 6, 2024

@edolstra Thank you for chiming in. The problem is that I don't really know how to do that. For the moment I can get the latest stable and the latest unstable; I haven't yet figured out how to install a specific release.

@ldesousa

ldesousa commented Sep 9, 2024

At the advice of a friend I removed the Nix version I had installed from the official repositories and installed from zero-to-nix instead. That got me version 2.24.5, that indeed does not suffer from this bug.

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/how-to-the-address-the-too-many-open-files-issue/51646/2
