v0.3.0
🚀 LLM Foundry v0.3.0
LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT model series. This release includes lots of bug fixes, stability improvements, and improved error messages, in addition to all the new features listed below!
Features
Llama-2 (#485, #520, #533)
Adds support for training Llama-2 models with optimized flash attention. To enable flash attention, set the `attention_patch_type` in your yaml like so:

```yaml
model:
  ...
  attention_patch_type: triton
  ...
```
See the example yaml for a full example of how to finetune Llama-2 on the MosaicML platform.
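For context, here is a sketch of what the model section might look like for a Hugging Face Llama-2 checkpoint with the patch enabled. The model name and auth settings below are assumptions, so defer to the example yaml for the authoritative config:

```yaml
model:
  name: hf_causal_lm
  pretrained: true
  pretrained_model_name_or_path: meta-llama/Llama-2-7b-hf  # assumed model; requires accepting the Llama-2 license
  use_auth_token: true                                     # assumed; needed for gated Hugging Face repos
  attention_patch_type: triton
```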
8-bit Lion (#514)
We have implemented an 8-bit version of the Lion optimizer. This reduces the memory needed per parameter from 12 bytes to 9 bytes. To switch from Lion to 8-bit Lion, simply change the optimizer name from `decoupled_lionw` to `decoupled_lionw_8b`!
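As a minimal sketch, the optimizer section of your yaml would change like this; the hyperparameter values shown are placeholders, not recommendations:

```yaml
optimizer:
  name: decoupled_lionw_8b   # was: decoupled_lionw
  lr: 1.0e-4                 # placeholder value
  betas:
  - 0.9
  - 0.95
  weight_decay: 1.0e-5       # placeholder value
```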
Checkpoint conversion (#526, #519, #594)
We've greatly improved our utilities for checkpoint conversion, including generalizing the Composer to Hugging Face conversion script to support all causal LMs, adding a callback to perform the conversion to Hugging Face format during the training job, and adding support for FasterTransformer conversion from a Composer MPT checkpoint.
To enable the new callback, add the `hf_checkpointer` callback to your yaml like so:

```yaml
callbacks:
  ...
  hf_checkpointer:
    # Save a Hugging Face formatted checkpoint at the end of each epoch
    save_interval: 1ep
    # The Hugging Face formatted checkpoints will be saved inside a subfolder called huggingface,
    # so this folder will likely be the same as your overall save_folder
    save_folder: ./{run_name}/checkpoints
    # Set the precision you want the checkpoint saved in
    precision: bfloat16
```
Code evaluation (#587)
We have added support for running HumanEval (code evaluation) using LLM Foundry! See the evaluation readme for a more detailed description and the tasks yaml for an ICL yaml that can be used to run the HumanEval evaluation task.
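As a rough sketch, a HumanEval entry in an ICL tasks yaml looks roughly like the following; the field names and dataset path are assumptions, and additional generation parameters may be required, so consult the tasks yaml in the repo for the authoritative format:

```yaml
# Illustrative only: field names and the dataset path are assumptions.
icl_tasks:
- label: human_eval
  dataset_uri: eval/local_data/programming/human_eval.jsonl  # assumed path
  num_fewshot: [0]
  icl_task_type: code_evaluation
```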
Transformer Engine support (#432)
Adds support for using NVIDIA's Transformer Engine to enable FP8 training. To enable, set `fc_type='te'` and/or `ffn_config['ffn_type']='te_ln_mlp'` and `precision='amp_fp8'`.
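In yaml form, a minimal sketch might look like the following, assuming a GPU and environment with Transformer Engine installed; the surrounding model fields are elided:

```yaml
model:
  ...
  fc_type: te              # use Transformer Engine linear layers
  ffn_config:
    ffn_type: te_ln_mlp    # use Transformer Engine LayerNormMLP
precision: amp_fp8         # train in FP8
```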
MLFlow (#475)
Adds support for using MLFlow as an experiment tracker. To enable, simply add `mlflow` to the `loggers` section of your yaml. See the Composer docs for more configuration options for MLFlow. Stay tuned for automatic model logging to MLFlow for easy deployment.
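A minimal sketch of the loggers section; the experiment name and tracking URI are placeholders, and the full set of options is documented with Composer's MLFlowLogger:

```yaml
loggers:
  mlflow:
    experiment_name: my-llm-foundry-experiment  # placeholder
    tracking_uri: databricks                    # placeholder; or a self-hosted MLflow tracking server
```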
Updated streaming version/defaults (#503, #573, #580, #602)
Updates to the latest release of MosaicML Streaming and sets better defaults for improved shuffling quality and training throughput. Check out the Streaming release notes for the full details of all the new options!
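If you want to tune the shuffling behavior yourself, the relevant knobs live on the dataset section of the train_loader; the values below are illustrative placeholders, not the new defaults:

```yaml
train_loader:
  name: text
  dataset:
    remote: s3://my-bucket/my-dataset/   # placeholder
    local: /tmp/my-dataset               # placeholder
    split: train
    shuffle: true
    shuffle_algo: py1b                   # placeholder; see the Streaming docs for available algorithms
    shuffle_block_size: 1000000          # placeholder
  drop_last: true
  num_workers: 8
```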
Grouped Query Attention (#492)
Implements Grouped Query Attention, which can strike a good balance between the quality of Multi Head Attention and the speed of Multi Query Attention. To enable, set `attn_config['attn_type']='grouped_query_attention'` and `attn_config['kv_n_heads']` to the desired number of kv heads.
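In a model yaml this looks roughly as follows; the head counts are placeholders:

```yaml
model:
  ...
  n_heads: 32                # placeholder
  attn_config:
    attn_type: grouped_query_attention
    kv_n_heads: 8            # placeholder; should evenly divide n_heads
```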
MPT quality of life improvements (#559, #599)
Thanks to @tdoublep and @lorabit110 for making MPT a bit easier to use with other parts of the NLP ecosystem!
Eval gauntlet during training, inference API eval wrapper (#501, #494)
Improvements to our evaluation setup, including the ability to run the eval gauntlet during training, and a wrapper to allow using inference APIs with our eval gauntlet. The ICL tasks and gauntlet can be specified as shown [here](https://github.com/mosaicml/llm-foundry/blob/fd36398dad5ac9fde085af679514189ce9439be4/scripts/eval/yamls/hf_eval.yaml#L46-L47).
tiktoken support (#610)
We have enabled training with tiktoken tokenizers via a thin wrapper around the tiktoken library, for compatibility with all the tooling built around Hugging Face tokenizers. You can enable this with a simple change to the tokenizer section of your yaml:

```yaml
tokenizer:
  name: tiktoken
  kwargs:
    model_name: gpt-4
```
LoRA eval (#515)
Allows the use of our evaluation script with a model trained using LoRA. See this yaml for an example of evaluating a model trained using LoRA. Stay tuned for full LoRA support with FSDP!
Finetuning API
Lastly, we are building a finetuning API on top of LLM Foundry, Composer, and Streaming. Please reach out if you might be interested in using this API as a customer!
What's Changed
- Release/v0.2.0 by @vchiley in #410
- Update README.md by @abhi-mosaic in #429
- Remove try catch in eval.py; make model_gauntlet optional in eval.py by @bmosaicml in #434
- Use torch.repeat instead of expand on key & value in Triton MQA to prevent NaNs with certain h_dims by @sashaDoubov in #442
- Update mcli-hf-generate.yaml by @vchiley in #456
- Add trust remote code for tokenizer in inference conversion script by @margaretqian in #446
- setup.py: replace composer with mosaicml by @guoyejun in #458
- Add linear layer and ffn config to enable TransformerEngine layers (with FP8) by @vchiley in #432
- use mono checkpoints by @samhavens in #448
- Update inference benchmark with recent HF changes by @sashaDoubov in #461
- Adding different device initialization to eval by @bcui19 in #466
- Fix missing import by @bcui19 in #470
- Autoresume default on by @mvpatel2000 in #467
- Support eval loader when finetuning from JSONL files in object stores by @samhavens in #469
- Fix ambiguous `throughput` in README by @abhi-mosaic in #476
- adds early stopping call back by @codestar12 in #488
- Update accelerate to 0.20.3 for LLaMa-2 support by @rishab-partha in #485
- If Alibi is on, we should turn learned_pos_emb to False by @bcui19 in #489
- Fix Local World Size by @rishab-partha in #495
- Increase lint CI timeout by @dakinggg in #498
- fix boolean for reentrant setting by @dakinggg in #500
- Adding pyright to pre-commit by @bcui19 in #477
- fix no bias assignment by @vchiley in #502
- Updated StreamingTextDataset to pass take in shuffle_block_size by @snarayan21 in #503
- Add MLFlow as a logger option by @aspfohl in #475
- Remove old optimizer logs by @mvpatel2000 in #509
- Updates GPU test timeout to use mcloud flag by @mvpatel2000 in #510
- Grouped Query Attention + Refactor Attn by @sashaDoubov in #492
- Fix training integration test by @j316chuck in #517
- Update max duration due to mcli api change by @mvpatel2000 in #523
- Fix typo in GQA comments by @sashaDoubov in #522
- Pr eval lora by @danbider in #515
- Monkeypatch flash attention in for llama by @dakinggg in #520
- Add runtime error in train.py if yaml config is improperly formatted with extraneous or missing values by @j316chuck in #506
- Add runtime error in eval.py if yaml config is improperly formatted with extraneous or missing values by @j316chuck in #521
- Add import for pop_config from config_utils by @j316chuck in #531
- Add an mcli yaml for running llama2 models by @dakinggg in #533
- Adapt composer -> HF conversion script to support all causal lms by @dakinggg in #526
- Refactor build optimizer and peft models to use kwargs syntax by @j316chuck in #525
- Fix max duration scheduling by @mvpatel2000 in #537
- Add missing mixed init to llama example by @dakinggg in #539
- Add python log level in llm foundry eval script by @j316chuck in #546
- Add Composer MPT to FasterTransformer Conversion Script by @nik-mosaic in #519
- Fix init device in conversion script by @dakinggg in #556
- Add Programming to Foundry by @rishab-partha in #441
- Revert "Add Programming to Foundry" by @rishab-partha in #557
- 8-bit LION, take 2 by @dblalock in #514
- Move the model creation to the last step before trainer creation by @dakinggg in #547
- Fix propagation of `drop_last` and add error message when it would produce no batches by @dakinggg in #549
- Enable eval script for HuggingFace 8bit models by @es94129 in #516
- change docstring by @bmosaicml in #563
- MPT: Change order of operands to enable PT2 compile for inference by @tdoublep in #559
- Enable gauntlet training by @bmosaicml in #501
- make 8bit flag optional by @bmosaicml in #567
- Fix conversion script to work with checkpoints from composer dev by @dakinggg in #568
- Adjust print statements in conversion script by @dakinggg in #569
- Refactor build_tokenizer to use kwargs syntax and specify name by @j316chuck in #532
- Add ValueError in evaluation script if load_path checkpoint is not specified in config for mpt_causal_lm's by @j316chuck in #535
- added sampling_method to StreamingTextDataset by @snarayan21 in #573
- Pin version of custom kernels more precisely by @dblalock in #577
- Add handling for various types of malformed finetuning data by @dakinggg in #576
- Add regression tests by @dakinggg in #574
- StreamingTextDataset takes correct device batch size by @snarayan21 in #580
- Update extension file not found error to be less confusing by @irenedea in #579
- typecast shuffle_block_size because of issue by @codestar12 in #581
- Upgrade composer version by @dakinggg in #560
- Raise torch pin by @mvpatel2000 in #583
- Updates the commit in the example llama2 yaml by @dakinggg in #584
- Update to transformers 4.32 by @dakinggg in #561
- change output format for better copy-pasting into excel by @dskhudia in #459
- Fix huggingface custom split path issue by @dakinggg in #588
- Add git-repo and git-branch params to regressions script by @irenedea in #591
- Fix ComposerHFCausalLM instantiation with PeftModel by @irenedea in #593
- Fix some type ignores by @hanlint in #589
- Refactor logging by @hanlint in #234
- add eval readme by @bmosaicml in #566
- Update datasets version to latest by @irenedea in #585
- Add lots of return types by @dakinggg in #595
- Update transformers version by @dakinggg in #596
- Add a callback to write huggingface checkpoints during the training run by @dakinggg in #594
- Fix optimizer logging by @mvpatel2000 in #597
- Skip flaky lion8b test by @dblalock in #598
- Add script for MDS conversion of bucket of text files by @irenedea in #570
- Updated streaming args for StreamingDataset subclasses by @snarayan21 in #602
- Fixes a typo default arg by @dakinggg in #604
- Add inference api eval wrapper by @bmosaicml in #494
- Add comment indicating tokenizer API wrapper is experimental by @bmosaicml in #606
- Remove regression tests by @irenedea in #607
- Add default processes in text to mds conversion by @irenedea in #608
- add cloud stores to foundry deps by @dakinggg in #612
- Fix eval yamls by @irenedea in #609
- Upgrade huggingface-hub dependency by @jerrychen109 in #613
- Run CPU tests on a new dep group `all-cpu` by @dakinggg in #616
- Allow MPT models to return attention weights by @lorabit110 in #599
- Attempt to speed up codeql by @dakinggg in #617
- Support for using tiktoken tokenizers by @dakinggg in #610
- Fixes a bad merge in the tiktoken PR by @dakinggg in #619
- Update setup.py to bump flash attn by @mvpatel2000 in #615
- Replace dashes with underscores in split name by @irenedea in #626
- Propagate bias through model by @mvpatel2000 in #627
- Change `repeat` to `expand` in GQA by @sashaDoubov in #628
- Add node rank to signal paths by @mvpatel2000 in #629
- Bump composer version by @dakinggg in #630
- Add code eval by @samhavens in #587
New Contributors
- @margaretqian made their first contribution in #446
- @guoyejun made their first contribution in #458
- @rishab-partha made their first contribution in #485
- @j316chuck made their first contribution in #517
- @dblalock made their first contribution in #514
- @es94129 made their first contribution in #516
- @tdoublep made their first contribution in #559
- @irenedea made their first contribution in #579
- @jerrychen109 made their first contribution in #613
- @lorabit110 made their first contribution in #599
Full Changelog: v0.2.0...v0.3.0