Releases: mosaicml/llm-foundry
v0.3.0
🚀 LLM Foundry v0.3.0
LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT model series. This release includes lots of bug fixes, stability improvements, and improved error messages, in addition to all the new features listed below!
Features
Llama-2 (#485, #520, #533)
Adds support for training Llama-2 models with optimized flash attention. To enable flash attention, set the `attention_patch_type` in your yaml like so:

```yaml
model:
  ...
  attention_patch_type: triton
  ...
```
See the example yaml for a full example of how to finetune Llama-2 on the MosaicML platform.
8-bit Lion (#514)
We have implemented an 8-bit version of the Lion optimizer. This reduces the memory needed per parameter from 12 bytes to 9 bytes. To switch from Lion to 8-bit Lion, simply change the optimizer name from `decoupled_lionw` to `decoupled_lionw_8b`!
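As a rough sketch of that switch in a yaml (the learning rate and other hyperparameters below are illustrative placeholders, not recommendations):

```yaml
optimizer:
  name: decoupled_lionw_8b  # previously: decoupled_lionw
  lr: 1.0e-4                # illustrative value
  betas: [0.9, 0.95]        # illustrative values
  weight_decay: 0.0
```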
Checkpoint conversion (#526, #519, #594)
We've greatly improved our utilities for checkpoint conversion, including generalizing the Composer to Hugging Face conversion script to support all causal LMs, adding a callback to perform the conversion to Hugging Face format during the training job, and support for FasterTransformer conversion from a Composer MPT checkpoint.
To enable the new callback, add the `hf_checkpointer` callback to your yaml like so:

```yaml
callbacks:
  ...
  hf_checkpointer:
    # Save a Hugging Face formatted checkpoint at the end of each epoch
    save_interval: 1ep
    # The Hugging Face formatted checkpoints will be saved inside a subfolder called huggingface,
    # so this folder will likely be the same as your overall save_folder
    save_folder: ./{run_name}/checkpoints
    # Set the precision you want the checkpoint saved in
    precision: bfloat16
```
Code evaluation (#587)
We have added support for running HumanEval (code evaluation) using LLM Foundry! See the evaluation readme for a more detailed description and the tasks yaml for an ICL yaml that can be used to run the HumanEval evaluation task.
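For orientation only, an ICL task entry for code evaluation might look roughly like the sketch below; the dataset path and few-shot settings are illustrative, and the linked tasks yaml is the authoritative reference:

```yaml
icl_tasks:
- label: human_eval
  # Illustrative path; see the tasks yaml in the repo for the actual entry
  dataset_uri: eval/local_data/programming/human_eval.jsonl
  num_fewshot: [0]
  icl_task_type: code_evaluation
```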
Transformer Engine support (#432)
Adds support for using NVIDIA's Transformer Engine to enable FP8 training. To enable, set `fc_type='te'` and/or `ffn_config['ffn_type']='te_ln_mlp'` and `precision='amp_fp8'`.
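Put together in a yaml, a minimal sketch of those settings (the nesting follows typical llm-foundry MPT configs) might look like:

```yaml
precision: amp_fp8
model:
  ...
  fc_type: te
  ffn_config:
    ffn_type: te_ln_mlp
```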
MLFlow (#475)
Adds support for using MLFlow as an experiment tracker. To enable, simply add `mlflow` to the `loggers` section of your yaml. See the Composer docs for more configuration options for MLFlow. Stay tuned for automatic model logging to MLFlow for easy deployment.
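For example, a minimal `loggers` section might look like the following sketch (the experiment name is a placeholder; see the Composer docs for the full set of options):

```yaml
loggers:
  mlflow:
    experiment_name: my-llm-foundry-run  # placeholder name
```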
Updated streaming version/defaults (#503, #573, #580, #602)
Updates LLM Foundry to the latest release of MosaicML Streaming and sets better defaults for improved shuffling quality and training throughput. Check out the Streaming release notes for the full details of all the new options!
Grouped Query Attention (#492)
Implements Grouped Query Attention, which can strike a good balance between the quality of Multi-Head Attention and the speed of Multi-Query Attention. To enable, set `attn_config['attn_type']='grouped_query_attention'` and `attn_config['kv_n_heads']` to the desired number of kv heads.
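A minimal sketch of the relevant attention config (the head counts are illustrative):

```yaml
model:
  ...
  attn_config:
    attn_type: grouped_query_attention
    kv_n_heads: 8  # e.g. with n_heads: 32, every 4 query heads share one kv head
```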
MPT quality of life improvements (#559, #599)
Thanks to @tdoublep and @lorabit110 for making MPT a bit easier to use with other parts of the NLP ecosystem!
Eval gauntlet during training, inference API eval wrapper (#501, #494)
Improvements to our evaluation setup, including the ability to run the eval gauntlet during training, and a wrapper to allow using inference APIs with our eval gauntlet. The ICL tasks and gauntlet can be specified as shown [here](https://github.com/mosaicml/llm-foundry/blob/fd36398dad5ac9fde085af679514189ce9439be4/scripts/eval/yamls/hf_eval.yaml#L46-L47).
tiktoken support (#610)
We have enabled training with tiktoken tokenizers via a thin wrapper around the tiktoken library for compatibility with all the tooling built around Hugging Face tokenizers. You can enable this with a simple change to the tokenizer section of your yaml:

```yaml
tokenizer:
  name: tiktoken
  kwargs:
    model_name: gpt-4
```
LoRA eval (#515)
Allows the use of our evaluation script with a model trained using LoRA. See this yaml for an example of evaluating a model trained using LoRA. Stay tuned for full LoRA support with FSDP!
Finetuning API
Lastly, we are building a finetuning API on top of LLM Foundry, Composer, and Streaming. Please reach out if you might be interested in using this API as a customer!
What's Changed
- Release/v0.2.0 by @vchiley in #410
- Update README.md by @abhi-mosaic in #429
- Remove try catch in eval.py; make model_gauntlet optional in eval.py by @bmosaicml in #434
- Use torch.repeat instead of expand on key & value in Triton MQA to prevent NaNs with certain h_dims by @sashaDoubov in #442
- Update mcli-hf-generate.yaml by @vchiley in #456
- Add trust remote code for tokenizer in inference conversion script by @margaretqian in #446
- setup.py: replace composer with mosaicml by @guoyejun in #458
- Add linear layer and ffn config to enable TransformerEngine layers (with FP8) by @vchiley in #432
- use mono checkpoints by @samhavens in #448
- Update inference benchmark with recent HF changes by @sashaDoubov in #461
- Adding different device intialization to eval by @bcui19 in #466
- Fix missing import by @bcui19 in #470
- Autoresume default on by @mvpatel2000 in #467
- Support eval loader when finetuning from JSONL files in object stores by @samhavens in #469
- Fix ambiguous `throughput` in README by @abhi-mosaic in #476
- adds early stopping call back by @codestar12 in #488
- Update accelerate to 0.20.3 for LLaMa-2 support by @rishab-partha in #485
- If Alibi is on, we should turn learned_pos_emb to False by @bcui19 in #489
- Fix Local World Size by @rishab-partha in #495
- Increase lint CI timeout by @dakinggg in #498
- fix boolean for reentrant setting by @dakinggg in #500
- Adding pyright to pre-commit by @bcui19 in #477
- fix no bias assignment by @vchiley in #502
- Updated StreamingTextDataset to pass take in shuffle_block_size by @snarayan21 in #503
- Add MLFlow as a logger option by @aspfohl in #475
- Remove old optimizer logs by @mvpatel2000 in #509
- Updates GPU test timeout to use mcloud flag by @mvpatel2000 in #510
- Grouped Query Attention + Refactor Attn by @sashaDoubov in #492
- Fix training integration test by @j316chuck in #517
- Update max duration due to mcli api change by @mvpatel2000 in #523
- Fix typo in GQA comments by @sashaDoubov in https://github.com/mosaic...
v0.2.0
🚀 LLM Foundry v0.2.0
LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs), and serves as the training codebase for the MPT-7B and MPT-30B models. Our emphasis is on efficiency, scalability, and ease-of-use, to enable fast iteration and prototyping.
We are excited to share the release of v0.2.0, packed with support for new hardware, features, and tutorials.
📖 Tutorials
We have released new tutorial content and helper scripts for dataset preparation, pre-training, fine-tuning, and inference!
To start off, a basic walkthrough and answers to FAQs can be found in our Basic Tutorial.
Next, detailed guides for different workflows are linked below:
Training
In addition, for a more advanced and self-contained example of finetuning the MPT-7B model, see Finetune Example.
Inference
The inference tutorials cover several new features we've added that improve integration with HuggingFace and FasterTransformer libraries:
- Converting a Composer checkpoint to an HF checkpoint folder
- Interactive Generation with HF models
- Interactive Chat with HF models
- Converting an HF model to ONNX
- Converting an HF MPT to FasterTransformer
- Running MPT with FasterTransformer
Major Features
LLM Foundry now uses Composer v0.15.0 and Streaming v0.5.1 as minimum requirements. For full details on the improvements, see the release notes for Composer and Streaming.
- 🆕 Torch 2.0 support

  LLM Foundry is now Torch 2.0 compatible!

  Note: we have not tested `torch.compile`, but do not expect significant performance improvements.
- ⚡ H100 Support

  We now support NVIDIA H100 systems! See our blog post on Benchmarking LLMs on H100 GPUs for initial performance and convergence details.

  To run LLM Foundry with NVIDIA H100 systems, be sure to use a Docker image that has CUDA 11.8+ and PyTorch 2.0+ versions. For example, `mosaicml/pytorch:2.0.1_cu118-python3.10-ubuntu20.04` from our Docker Hub has been tested with NVIDIA H100 systems. No code changes should be required.
- 📈 AMD MI250 GPU Support

  With the release of PyTorch 2.0 and ROCm 5.4+, we are excited to share that LLM training now works out of the box on AMD Datacenter GPUs! Read our blog post on Training LLMs with AMD MI250 GPUs for more details.

  Running with our stack was straightforward: use the ROCm 5.4 Docker image `rocm/dev-ubuntu-20.04:5.4.3-complete`, then install PyTorch for ROCm 5.4 and install Flash Attention. Modify your configuration settings (see the sketch after this list):

  - `attn_impl: flash` instead of the default `triton`
    - Note: ALiBi is currently not supported with `attn_impl: flash`.
  - `loss_fn: torch_crossentropy` instead of the default `fused_crossentropy`.
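  A minimal sketch of those overrides in yaml form (the nesting follows typical llm-foundry MPT configs and is meant as an illustration, not an exact diff):

  ```yaml
  model:
    ...
    attn_config:
      attn_impl: flash          # default is triton; ALiBi is not supported with flash
    loss_fn: torch_crossentropy # default is fused_crossentropy
  ```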
- 🚧 LoRA finetuning (Preview)

  We have included a preview release of Low Rank Adaptation (LoRA) support for memory-efficient finetuning of LLMs (Hu et al., 2021).

  To use LoRA, follow the instructions found here.

  Note: This is a preview feature, so please send us any feedback! The API and support are subject to change.
- 🔎 Evaluation Refactor (#308)

  Our evaluation suite has been significantly refactored into our Model Gauntlet approach. This includes a number of breaking API changes to support multiple models:

  - Instead of `model`, use the `models` keyword and provide a list of models (see the sketch at the end of this item).
  - `tokenizer` is now model-specific.

  For example, to run the gauntlet of various eval tasks with `mosaicml/mpt-7b`:

  ```
  cd llm-foundry/scripts
  composer eval/eval.py eval/yamls/hf_eval.yaml model_name_or_path=mosaicml/mpt-7b
  ```

  This release also makes evaluation deterministic even on different numbers of GPUs.

  For more details on all these changes, see #308.
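  For reference, a rough sketch of a corresponding `models` entry (the structure mirrors our hf_eval yaml; treat the values as illustrative):

  ```yaml
  models:
  - model_name: mosaicml/mpt-7b
    model:
      name: hf_causal_lm
      pretrained_model_name_or_path: mosaicml/mpt-7b
      init_device: cpu
    tokenizer:
      name: mosaicml/mpt-7b
  ```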
- ⏱️ Benchmarking Inference

  To better support the deployment of LLMs, we have included an inference benchmarking suite and results across different hardware setups and other LLM models.
PR List
- hf dict cfg overrides by @vchiley in #90
- Add slack and license buttons to readme by @growlix in #98
- Add minimum `mosaicml-streaming` version by @hanlint in #110
- Update dataloader.py by @nelsontkq in #102
- Add features to hf_generate by @alextrott16 in #116
- Make mpt7b finetuning more obvious by @samhavens in #101
- Fix(finetune yaml): fix parameters in mpt-7b_dolly_sft.yaml by @alanxmay in #131
- Fix HF conversion script to upload to S3 after editing the files to be HF compatible by @dakinggg in #136
- Set pad_token_id to tokenizer.pad_token_id if not set on command line by @patrickhwood in #118
- Changed the keep_zip default to False to comply with StreamingDataset by @karan6181 in #150
- Add cloud upload to checkpoint conversion script by @dakinggg in #151
- Adds precision to eval by @mvpatel2000 in #148
- Update StreamingDataset defaults by @abhi-mosaic in #157
- Explain `composer` command by @hanlint in #164
- Remove `pynvml` by @hanlint in #165
- Adds a concrete finetuning example from a custom dataset by @alextrott16 in #156
- Remove health checker by @mvpatel2000 in #167
- Rename datasets to avoid hf conflict by @hanlint in #175
- Torch2 (#177) by @vchiley in #178
- Revert "Torch2 (#177) (#178)" by @dakinggg in #181
- clean up dataset conversion readme by @codestar12 in #168
- Convert MPT checkpoints to FT format by @dskhudia in #169
- Update README.md by @jacobfulano in #198
- Removed unused `tokenizer_name` config field by @dakinggg in #206
- Add community links to README by @hanlint in #182
- Add Tensorboard logger to yaml config by @hanlint in #166
- Update inference README by @abhi-mosaic in #204
- torch2 updt with hf fixes by @vchiley in #193
- Removing deprecated vocabulary size parameter from composer CE metrics by @sashaDoubov in #222
- Add `composer[...
v0.1.1
What's New
LLM Foundry is now on PyPI!
What's Changed
- Update README.md by @ejyuen in #72
- Update version by @dakinggg in #73
- Remove todo in workflow by @mvpatel2000 in #74
- Bump composer version by @vchiley in #84
- Fix pypi by @mvpatel2000 in #80
- Remove xentropy from pypi by @mvpatel2000 in #86
- Fix sed command for xentropy by @mvpatel2000 in #87
- Updates to prefixlm and t5 by @alextrott16 in #85
- Disable image for pypi by @mvpatel2000 in #97
New Contributors
Full Changelog: v0.1.0...v0.1.1
Announcing LLM Foundry and the MPT foundation series
🚀 LLM Foundry v0.1.0
This is the first release of MosaicML's LLM Foundry!
Our efficient code for training, evaluating, and deploying LLMs outgrew our examples repository, so we've migrated to a brand new repository dedicated to everything LLMs. Keep watching this space and see the top-level README and our blog post for more details on this announcement!
Model releases
In addition to all the open-source code released here, we're releasing four open-source models that we hope will be useful to the community. All models were trained on the MosaicML platform, using Composer and Streaming. If you're interested in training your own models, or using these models with our optimized inference stack, please reach out!
- `mpt-7b`: This is our base 7-billion-parameter model, trained for 1 trillion tokens. This model is released with an Apache-2.0 (commercial use permitted) license.
- `mpt-7b-storywriter`: All of the models use ALiBi to allow them to extrapolate to longer sequence lengths than they saw during training, but storywriter is our long-context model, further pretrained on 65k-token excerpts of a fiction subset of the books3 corpus. This model is released with an Apache-2.0 (commercial use permitted) license.
- `mpt-7b-instruct`: This model is instruction finetuned on a dataset we also release, derived from Databricks' Dolly-15k and Anthropic’s Helpful and Harmless datasets. This model is released with a CC-By-SA-3.0 (commercial use permitted) license.
- `mpt-7b-chat`: This model is trained to be able to chat by further training on the ShareGPT-Vicuna, HC3, Alpaca, Helpful and Harmless, and Evol-Instruct datasets. This model is released with a CC-By-NC-SA-4.0 (non-commercial use only) license.
Features
Training
We release fully featured code for efficiently training any HuggingFace LLM (including our optimized MPT) using FSDP, Composer, and Streaming. Seamlessly scale to multi-GPU and multi-node training, stream your data from one cloud, train on a different cloud, write checkpoints to a third cloud, send your training logs to Weights & Biases, and much more. See the README for more detailed instructions on getting started with pretraining and finetuning!
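As a purely hypothetical sketch of what that can look like in a yaml (bucket names and paths are placeholders, not real defaults):

```yaml
train_loader:
  name: text
  dataset:
    remote: s3://my-data-bucket/my-dataset/  # placeholder: stream data from one cloud
    local: /tmp/mds-cache/                   # local cache for streamed shards
    split: train
save_folder: gs://my-checkpoint-bucket/{run_name}/checkpoints  # placeholder: checkpoints to another cloud
loggers:
  wandb: {}  # send training logs to Weights & Biases
```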
Our MPT model is equipped with the latest advancements in training large transformers (e.g. ALiBi, the LION optimizer, FlashAttention), and is designed to be easily hackable, configurable, and extendable!
Evaluation
Our evaluation framework makes it easy to fully re-evaluate any HuggingFace model. We also include copies of the processed data for many popular benchmarks, to make it easy to replicate our evals and perform your own! We welcome the addition of new benchmarks to our suite. In our benchmarks, our setup is 8x faster than other eval frameworks on a single GPU and seamlessly achieves linear scaling with multiple GPUs. Built-in support for FSDP makes it possible to evaluate large models and use larger batch sizes for further acceleration.
Inference
MPT is designed to be fast, easy, and cheap to deploy for inference. To begin with, all MPT models are subclassed from the HuggingFace PretrainedModel base class, which means that they are fully compatible with the HuggingFace ecosystem. You can upload MPT models to the HuggingFace Hub, generate outputs with standard pipelines like `model.generate(...)`, build HuggingFace Spaces (see some of ours here!), and more.
What about performance? With MPT’s optimized layers (including FlashAttention and low precision layernorm), the out-of-the-box performance of MPT-7B on GPUs when using `model.generate(...)` is 1.5x-2x faster than other 7B models like LLaMa-7B. This makes it easy to build fast and flexible inference pipelines with just HuggingFace and PyTorch.
Finally, for the best hosting experience, deploy your MPT models directly on MosaicML’s Inference service. Start with our managed endpoints for models like MPT-7B-Instruct, and/or deploy your own custom model endpoints for optimal cost and data privacy. Check out the Inference blog post for more details!