Monkeypatch for Llama 3.2-Vision #282

Merged: 30 commits into linkedin:main on Oct 10, 2024

Conversation

@tyler-romero (Contributor) commented Sep 29, 2024:

Summary

Add monkeypatch to support Llama 3.2-Vision models.

Details

Llama 3.2-Vision is a multimodal model, and it is only available in transformers>=4.45.0.

Torchvision is required to run the multimodal tests for Llama 3.2-Vision (the image processor requires it).
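
For reference, a minimal usage sketch of the new patch (assuming it is exposed as `apply_liger_kernel_to_mllama`, mirroring the existing `apply_liger_kernel_to_llama` entry points; the exact API may differ):

```python
import transformers
from liger_kernel.transformers import apply_liger_kernel_to_mllama

# Swap in Liger kernels for the HF Mllama modeling code before building the model.
apply_liger_kernel_to_mllama()

model = transformers.MllamaForConditionalGeneration.from_pretrained(
    "meta-llama/Llama-3.2-11B-Vision-Instruct"
)
```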

Testing Done

  • Hardware Type: RTX 4090
  • run `make test` to ensure correctness
  • run `make checkstyle` to ensure code style
  • run `make test-convergence` to ensure convergence

@tyler-romero (Contributor, Author):

I wanted to add support for LayerNorm in the vision tower as well, but it is hard to patch the way it is used in modeling_mllama.py - a patch would end up changing torch.nn globally.
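
To illustrate the problem (a hypothetical sketch; `LigerLayerNorm` is a stand-in name, not part of this PR): modeling_mllama.py instantiates `nn.LayerNorm` directly, so the only class-level patch point is the shared `torch.nn` namespace.

```python
import torch.nn as nn

class LigerLayerNorm(nn.LayerNorm):  # hypothetical Liger replacement, not part of this PR
    ...

# modeling_mllama.py calls nn.LayerNorm(...) directly, so the only class-level
# hook is reassigning the name on torch.nn itself...
nn.LayerNorm = LigerLayerNorm
# ...which silently swaps LayerNorm for every model in the process, not just Mllama.
```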

@tyler-romero marked this pull request as ready for review on September 30, 2024 17:23
@ByronHsu mentioned this pull request on Sep 30, 2024
shimizust and others added 9 commits October 1, 2024 11:26
## Summary
- Previously, the pre-trained weights were not being loaded if patching
model post-initialization
- Instead of loading weights, just patch the model instance module's
forward method (see linkedin#279; a sketch follows these notes)

## Testing Done
- In convergence tests, check that pre-init patching and post-init
patching match results from original model

- Hardware Type: A100
- [x] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [ ] run `make test-convergence` to ensure convergence --> most tests
working, waiting for other fixes for all tests to pass
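
A sketch of that instance-level patching idea (`LigerRMSNorm` does exist in `liger_kernel.transformers.rms_norm`, but the helper and loop here are illustrative; the real patch also sets any extra attributes the Liger forward expects, such as the offset and casting mode):

```python
from types import MethodType

from liger_kernel.transformers.rms_norm import LigerRMSNorm

def patch_rms_norm_module(module):
    # Rebind forward on the live module instance instead of re-instantiating it,
    # so the pre-trained weights already loaded into `module` stay untouched.
    module.forward = MethodType(LigerRMSNorm.forward, module)

# `model` is assumed to be an already-initialized (post-init) HF model.
for module in model.modules():
    if module.__class__.__name__ == "MllamaTextRMSNorm":
        patch_rms_norm_module(module)
```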
- Make the `transformers` dependency optional so we only have `torch`
and `triton` as required deps, which is helpful if you're not using
`transformers` for modeling code. This was also causing installation
issues for people using slightly older transformers versions.
- If transformers is needed, make it compatible with any 4.x version.
The specific model being used should dictate the transformers version
compatibility (see the dependency sketch below).

`pip install -e .[transformers]`
`pip install -e .[dev]`
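
A sketch of the resulting dependency layout (shown in `setup.py` form; the field values are illustrative, and the repo may declare this in `pyproject.toml` instead):

```python
from setuptools import setup

setup(
    name="liger_kernel",  # illustrative; real metadata lives in the repo's build config
    install_requires=["torch", "triton"],  # the only hard requirements
    extras_require={
        # transformers is optional, and any 4.x release is accepted: the
        # specific model being used dictates the real version constraint.
        "transformers": ["transformers~=4.0"],
        "dev": ["transformers~=4.0", "pytest", "torchvision"],  # illustrative dev deps
    },
)
```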

- Hardware Type: A100-80G-PCIe
- [x] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [x] run `make test-convergence` to ensure convergence
@tyler-romero (Contributor, Author):

Just need someone to set an HF token for an account that has accepted the Llama license in the CI/CD env:
https://discord.com/channels/1189498204333543425/1275130785933951039/1290378722653896744

>           raise EnvironmentError(
                "You are trying to access a gated repo.\nMake sure to have access to it at "
                f"[https://huggingface.co/{path_or_repo_id}.\n{str(e)}](https://huggingface.co/%7Bpath_or_repo_id%7D./n%7Bstr(e)%7D)"
            ) from e
E           OSError: You are trying to access a gated repo.
E           Make sure to have access to it at https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct.
E           401 Client Error. (Request ID: Root=1-66fc5343-253e99011960556f10a26970;1bfe5fb8-1fa8-4ff0-a21e-28041d5b6752)
E           
E           Cannot access gated repo for url https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/resolve/main/config.json.
E           Access to model meta-llama/Llama-3.2-11B-Vision-Instruct is restricted. You must have access to it and be authenticated to access it. Please log in.
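
For what it's worth, once the secret exists, the test session can authenticate with something like the following (assuming the workflow exposes the secret as an `HF_TOKEN` environment variable):

```python
import os

from huggingface_hub import login

# The token must belong to an account that has accepted the Llama license.
login(token=os.environ["HF_TOKEN"])
```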

@shivam15s (Collaborator) commented Oct 1, 2024:

Added an HF Hub read secret to the repo and got access to the gated Llama 3.2 models.

@tyler-romero (Contributor, Author):

Thanks - for some reason I'm still seeing the same issue. Could you try triggering a CI/CD run on this branch?

@shivam15s (Collaborator):

Let me get back to you. Still figuring out how to configure the GPU runner to use the access token.

@shivam15s (Collaborator):

Seems like using gated models for tests isn't a good idea. I'm looking into using fake/mirrored models and processors for the tests.
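
One version of that idea, sketched here with a text-only `LlamaConfig` for brevity (the actual tests would need an analogous tiny multimodal config): instantiate a randomly initialized mini-model directly from a config, so nothing gated ever has to be downloaded.

```python
from transformers import LlamaConfig, LlamaForCausalLM

# A tiny, randomly initialized model: no checkpoint (gated or otherwise) is fetched.
config = LlamaConfig(
    vocab_size=256,
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=2,
)
model = LlamaForCausalLM(config)
```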

@tyler-romero (Contributor, Author):

Any update, @shivam15s?

@shivam15s (Collaborator):

Hey @tyler-romero, I created a PR on your branch. Let me know if the proposed solution makes sense.

@tyler-romero (Contributor, Author):

Really nice solution, thank you!

@shivam15s (Collaborator):

Seems like a convergence test is failing because of a precision issue.
Could you please look into it? Thanks!

@tyler-romero (Contributor, Author):

Now that you've made custom tokenizers for the multimodal tests, I was able to reduce the vocab sizes of the mini-models (bringing them in line with the vocab sizes of the other mini-models), and the convergence tests are now passing.

@shivam15s (Collaborator) left a review:

Great work, lgtm

@tyler-romero (Contributor, Author):

Sounds good, feel free to merge (only collaborators/linkedin employees can merge) - and please add yourself as a contributor to this change if GitHub doesn't do that automatically.

@ByronHsu (Collaborator):

Thanks to you both for driving this!!

@ByronHsu merged commit 9b10f48 into linkedin:main on Oct 10, 2024
2 checks passed