[BugFix] Model State Reload with Quantized Stubs in SparseAutoModelForCausalLM #2226

rahul-tuli · 2024-04-05T18:45:00Z

Description

Fixes a bug on the main branch where the model state fails to reload when SparseAutoModelForCausalLM.from_pretrained(...) is given a quantized model stub. The reload_model_state method expects the weight files to be in a local directory and does not account for model directories that are hosted remotely.
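A minimal sketch of the failure mode described above, not the actual reload_model_state code. The helper name find_local_weight_files and the WEIGHT_SUFFIXES list are hypothetical; the point is only that a lookup which assumes a local directory finds nothing when it is handed a remote stub string.

```python
import os
from pathlib import Path
from typing import List

# Assumption: illustrative weight-file suffixes, not the list SparseML uses.
WEIGHT_SUFFIXES = (".bin", ".safetensors", ".pt")


def find_local_weight_files(model_path: str) -> List[str]:
    """Hypothetical helper: list weight files if model_path is a local directory."""
    if not os.path.isdir(model_path):
        # A hub stub such as "org/model-name" is not a directory on disk,
        # so a purely local lookup comes back empty.
        return []
    return sorted(
        str(p) for p in Path(model_path).iterdir() if p.suffix in WEIGHT_SUFFIXES
    )
```

Under this assumption, passing the stub from the test script returns an empty list, which corresponds to the "could not find model weights" warning in the logs.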

Solution

Download the model directory before invoking reload_model_state, so that the weight files are available locally when the model state is reloaded.
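The idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: resolve_local_model_dir and its download_fn parameter are hypothetical names, and the default downloader is huggingface_hub.snapshot_download, which returns the path of the local snapshot folder.

```python
import os
from typing import Callable, Optional


def resolve_local_model_dir(
    model_path: str,
    download_fn: Optional[Callable[[str], str]] = None,
) -> str:
    """Hypothetical helper: return a local directory for model_path."""
    if os.path.isdir(model_path):
        # Already a local directory, nothing to download.
        return model_path
    if download_fn is None:
        # Imported lazily so the local-path case has no hub dependency.
        from huggingface_hub import snapshot_download

        download_fn = snapshot_download
    # Treat anything else as a hub repo id and fetch it into the local cache.
    return download_fn(model_path)
```

The returned directory would then be passed to reload_model_state in place of the original stub. The injectable download_fn is only there to make the sketch easy to exercise without network access.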

Testing

Tested with the following script, confirming the fix resolves the issue:

from sparseml.transformers import SparseAutoModelForCausalLM

model_path = "mgoin/llama2.c-stories15M-quant-pt"
m1 = SparseAutoModelForCausalLM.from_pretrained(model_path)

Observations

Before the Fix:

Model state fails to reload due to missing local weight files, as shown in warnings and errors in the logs.

..
..
2024-04-05 17:33:33 sparseml.core.recipe.recipe INFO     Loading recipe from file /home/rahul/.cache/huggingface/hub/models--mgoin--llama2.c-stories15M-quant-pt/snapshots/aa70fc9dc46615b68f935fb5405ae7875b88b716/recipe.yaml
manager stage: Model structure initialized
2024-04-05 17:33:34 sparseml.pytorch.model_load.helpers INFO     Applied an unstaged recipe to the model at mgoin/llama2.c-stories15M-quant-pt
2024-04-05 17:33:34 sparseml.pytorch.model_load.helpers WARNING  Model state was not reloaded for SparseML: could not find model weights for mgoin/llama2.c-stories15M-quant-pt

After the Fix:

..
..
..
2024-04-05 21:17:16 sparseml.pytorch.model_load.helpers INFO     Reloaded model state after SparseML recipe structure modifications from /nm/drive0/rahul/.cache/huggingface/hub/models--mgoin--llama2.c-stories15M-quant-pt/snapshots/aa70fc9dc46615b68f935fb5405ae7875b88b716

Successfully reloaded model state with the fix.

To see the specific tasks where the Asana app for GitHub is being used, see below:
- https://app.asana.com/0/0/1204866054040952

mgoin

The snapshot_download call can pull down a lot of extra files, since it downloads the whole folder. This might interact weirdly with the resolve_recipe call, which specifically tries to download the recipe. This works, but I would like to be a bit more selective with the download.

bfineran

LGTM pending @mgoin's comment

rahul-tuli · 2024-04-05T21:21:26Z

> The snapshot_download call can pull down a lot of extra files, since it downloads the whole folder. This might interact weirdly with the resolve_recipe call, which specifically tries to download the recipe. This works, but I would like to be a bit more selective with the download.

Addressed in latest commit @mgoin
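For readers following along, a selective download could look roughly like the sketch below. It is not the code from the commit: the helper names and the RELEVANT_SUFFIXES list are assumptions for illustration (the later commit message only confirms that py files count as relevant). The allow_patterns argument of huggingface_hub.snapshot_download restricts the download to files matching the given glob patterns.

```python
from fnmatch import fnmatch
from typing import Iterable, List

# Assumption: an illustrative suffix list, not the one used in the PR.
RELEVANT_SUFFIXES = [".json", ".md", ".bin", ".safetensors", ".yaml", ".yml", ".py"]


def build_allow_patterns(suffixes: Iterable[str]) -> List[str]:
    """Turn file suffixes into glob patterns, e.g. ".json" -> "*.json"."""
    return [f"*{suffix}" for suffix in suffixes]


def select_files(filenames: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Offline preview of which repo files the glob patterns would keep."""
    patterns = list(patterns)
    return [f for f in filenames if any(fnmatch(f, p) for p in patterns)]


def download_relevant_files(repo_id: str) -> str:
    """Hypothetical helper: download only files with relevant suffixes."""
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id, allow_patterns=build_allow_patterns(RELEVANT_SUFFIXES)
    )
```

With this list, a file such as model.onnx would be skipped while recipe.yaml and the weight files are still fetched.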

mgoin

Thanks a lot! It's additional complexity but I think good to have

src/sparseml/transformers/utils/helpers.py

rahul-tuli · 2024-04-05T22:01:38Z

> Thanks a lot! It's additional complexity but I think good to have

It was a great callout. Really appreciate it.

Fix bug for loading models from hf hub

0e16e99

rahul-tuli force-pushed the hf-stub-bugfix branch from 8c1504b to 0e16e99 April 5, 2024 18:45

rahul-tuli requested review from Satrat, dbogunowicz, mgoin and bfineran April 5, 2024 18:46

rahul-tuli self-assigned this Apr 5, 2024

rahul-tuli added the bug label Apr 5, 2024

rahul-tuli requested review from dsikka and horheynm April 5, 2024 18:46

mgoin reviewed Apr 5, 2024

bfineran previously approved these changes Apr 5, 2024

rahul-tuli dismissed bfineran’s stale review via d162980 April 5, 2024 21:18

Update to download only relevant files and not the whole model repo

b85b069

rahul-tuli force-pushed the hf-stub-bugfix branch from d162980 to b85b069 April 5, 2024 21:22

mgoin previously approved these changes Apr 5, 2024

mgoin reviewed Apr 5, 2024

src/sparseml/transformers/utils/helpers.py (outdated, resolved)

Add py files to relevant suffixes

de18de3

rahul-tuli dismissed mgoin’s stale review via de18de3 April 5, 2024 21:54

mgoin approved these changes Apr 5, 2024

mgoin merged commit 88196d5 into main Apr 5, 2024
13 of 15 checks passed

mgoin deleted the hf-stub-bugfix branch April 5, 2024 22:34
