We would like to share models across users. To this end, we configured HF_HUB_CACHE which worked great for a while! However, we started to run into PermissionError related to files in .locks.
The problem seems to be due to mixed group permissions on the files in .locks. I'm attaching the artifacts list of this model below, but we see the problem for other models, too. The output of umask is 0002 for all users of the system.
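To make the mismatch easier to spot, here is a minimal sketch that lists lock files the group cannot write to. The helper name `find_locks_without_group_write` is made up for illustration, and the sketch assumes the lock files live under `<HF_HUB_CACHE>/.locks` with a `.lock` suffix, as in the path from the error below.

```python
import os
import stat
from pathlib import Path


def find_locks_without_group_write(cache_dir):
    """Return (path, owner uid, octal mode) for lock files the group cannot write."""
    locks_dir = Path(cache_dir) / ".locks"
    findings = []
    for path in sorted(locks_dir.rglob("*.lock")):
        info = path.stat()
        # S_IWGRP is the group-write bit; without it, other group members
        # cannot open the lock file for writing.
        if not info.st_mode & stat.S_IWGRP:
            findings.append((str(path), info.st_uid, oct(stat.S_IMODE(info.st_mode))))
    return findings


if __name__ == "__main__":
    cache = os.environ.get("HF_HUB_CACHE")
    if cache:
        for path, uid, mode in find_locks_without_group_write(cache):
            print(mode, uid, path)
```

Running this with HF_HUB_CACHE set prints one line per affected lock file, which makes it easier to see which users created them.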
Questions:
Is setting HF_HUB_CACHE sufficient for sharing hub cache across users?
If I understand correctly, the lock files should be released after use. However, they are not actually deleted by FileLock, which may explain the problem we are facing. The relevant logic seems to be in huggingface_hub/utils/_fixes.py.
A workaround would be to delete the .locks files, but not all users have permission to do that, and asking each individual user to delete their files is tedious. So I'm curious to hear your thoughts on this scenario. Thanks!
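As an alternative to deleting, each user could widen the permissions on the lock files they own. This is only a sketch of that idea, not something huggingface_hub provides: `open_up_own_locks` is a hypothetical helper, and it assumes the same `<HF_HUB_CACHE>/.locks` layout as in the error path. A user can only chmod files they own, so every user would still have to run it once (or from a login script).

```python
import os
import stat
from pathlib import Path


def open_up_own_locks(cache_dir):
    """Add group read/write to lock files owned by the current user; return changed paths."""
    changed = []
    for path in sorted((Path(cache_dir) / ".locks").rglob("*.lock")):
        info = path.stat()
        wanted = info.st_mode | stat.S_IRGRP | stat.S_IWGRP
        # Skip files owned by someone else (chmod would fail) and files
        # that already have the group bits set.
        if info.st_uid == os.getuid() and wanted != info.st_mode:
            os.chmod(path, stat.S_IMODE(wanted))
            changed.append(str(path))
    return changed
```

This does not stop newly created lock files from getting the narrower permissions again, so it treats the symptom rather than the cause.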
Reproduction
No response
Logs
Here is a full stack trace and a list of the artifacts with permission mismatch.
$ python -c "import transformers; transformers.AutoTokenizer.from_pretrained('meta-llama/Meta-Llama-3.1-8B-Instruct')"
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/trienes/.conda/envs/test/lib/python3.12/site-packages/transformers/models/auto/tokenization_auto.py", line 844, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/trienes/.conda/envs/test/lib/python3.12/site-packages/transformers/models/auto/tokenization_auto.py", line 676, in get_tokenizer_config
    resolved_config_file = cached_file(
                           ^^^^^^^^^^^^
  File "/home/trienes/.conda/envs/test/lib/python3.12/site-packages/transformers/utils/hub.py", line 403, in cached_file
    resolved_file = hf_hub_download(
                    ^^^^^^^^^^^^^^^^
  File "/home/trienes/.conda/envs/test/lib/python3.12/site-packages/huggingface_hub/utils/_deprecation.py", line 101, in inner_f
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/trienes/.conda/envs/test/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/trienes/.conda/envs/test/lib/python3.12/site-packages/huggingface_hub/file_download.py", line 1232, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/trienes/.conda/envs/test/lib/python3.12/site-packages/huggingface_hub/file_download.py", line 1380, in _hf_hub_download_to_cache_dir
    with WeakFileLock(lock_path):
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/trienes/.conda/envs/test/lib/python3.12/contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "/home/trienes/.conda/envs/test/lib/python3.12/site-packages/huggingface_hub/utils/_fixes.py", line 98, in WeakFileLock
    lock.acquire()
  File "/home/trienes/.local/lib/python3.12/site-packages/filelock/_api.py", line 295, in acquire
    self._acquire()
  File "/home/trienes/.local/lib/python3.12/site-packages/filelock/_unix.py", line 42, in _acquire
    fd = os.open(self.lock_file, open_flags, self._context.mode)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
PermissionError: [Errno 13] Permission denied: '/scratch_shared/ag_seifertg/.cache/huggingface/hub/.locks/models--meta-llama--Meta-Llama-3.1-8B-Instruct/db88166e2bc4c799fd5d1ae643b75e84d03ee70e.lock'
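The last frame shows the lock file being opened with `os.open(..., self._context.mode)`, so the requested mode, not only the umask, decides which permission bits the file gets. Here is a small standalone sketch of that interaction. It does not use filelock itself; the flags and the 0o644 value are assumptions (I believe 0o644 is filelock's default mode, but I have not verified that for our installed version).

```python
import os
import stat
import tempfile


def mode_after_create(requested_mode, umask=0o002):
    """Create a file via os.open and return the permission bits it ends up with."""
    previous = os.umask(umask)
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "example.lock")
            # The kernel applies requested_mode & ~umask when creating the file.
            fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, requested_mode)
            os.close(fd)
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        # Restore the caller's umask no matter what happened above.
        os.umask(previous)


print(oct(mode_after_create(0o644)))  # 0o644: group can read but not write
print(oct(mode_after_create(0o666)))  # 0o664: group can read and write
```

If the assumption about the requested mode holds, a umask of 0002 alone cannot give the group write access to a lock file, which would match the PermissionError a second user hits when opening it for writing.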
Relevant logic referenced in the questions above: huggingface_hub/src/huggingface_hub/utils/_fixes.py, lines 115 to 121 in 476fa0b
"blob" files get group read-write:
While ".locks" files don't get the same set of permissions.
System info