v0.3.0
🚀 LLM Foundry v0.3.0
LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT model series. This release includes lots of bug fixes, stability improvements, and improved error messages, in addition to all the new features listed below!
Features
Llama-2 (#485, #520, #533)
Adds support for training Llama-2 models with optimized flash attention. To enable flash attention, set the `attention_patch_type` in your yaml like so:

```yaml
model:
  ...
  attention_patch_type: triton
  ...
```
See the example yaml for a full example of how to finetune Llama-2 on the MosaicML platform.
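For context, here is a sketch of what the model section might look like for a Hugging Face Llama-2 checkpoint with the patch enabled. The model name and auth settings below are assumptions, so defer to the example yaml for the authoritative config:

```yaml
model:
  name: hf_causal_lm
  pretrained: true
  pretrained_model_name_or_path: meta-llama/Llama-2-7b-hf  # assumed model; requires accepting the Llama-2 license
  use_auth_token: true                                     # assumed; needed for gated Hugging Face repos
  attention_patch_type: triton
```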
8-bit Lion (#514)
We have implemented an 8-bit version of the Lion optimizer. This reduces the memory needed per parameter from 12 bytes to 9 bytes. To switch from Lion to 8-bit Lion, simply change the optimizer name from `decoupled_lionw` to `decoupled_lionw_8b`!
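As a minimal sketch, the optimizer section of your yaml would change like this; the hyperparameter values shown are placeholders, not recommendations:

```yaml
optimizer:
  name: decoupled_lionw_8b   # was: decoupled_lionw
  lr: 1.0e-4                 # placeholder value
  betas:
  - 0.9
  - 0.95
  weight_decay: 1.0e-5       # placeholder value
```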
Checkpoint conversion (#526, #519, #594)
We've greatly improved our utilities for checkpoint conversion, including generalizing the Composer to Hugging Face conversion script to support all causal LMs, adding a callback to perform the conversion to Hugging Face format during the training job, and adding support for FasterTransformer conversion from a Composer MPT checkpoint.
To enable the new callback, add the `hf_checkpointer` callback to your yaml like so:

```yaml
callbacks:
  ...
  hf_checkpointer:
    # Save a Hugging Face formatted checkpoint at the end of each epoch
    save_interval: 1ep
    # The Hugging Face formatted checkpoints will be saved inside a subfolder called huggingface,
    # so this folder will likely be the same as your overall save_folder
    save_folder: ./{run_name}/checkpoints
    # Set the precision you want the checkpoint saved in
    precision: bfloat16
```
Code evaluation (#587)
We have added support for running HumanEval (code evaluation) using LLM Foundry! See the evaluation readme for a more detailed description and the tasks yaml for an ICL yaml that can be used to run the HumanEval evaluation task.
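As a rough sketch, a HumanEval entry in an ICL tasks yaml looks roughly like the following; the field names and dataset path are assumptions, and additional generation parameters may be required, so consult the tasks yaml in the repo for the authoritative format:

```yaml
# Illustrative only: field names and the dataset path are assumptions.
icl_tasks:
- label: human_eval
  dataset_uri: eval/local_data/programming/human_eval.jsonl  # assumed path
  num_fewshot: [0]
  icl_task_type: code_evaluation
```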
Transformer Engine support (#432)
Adds support for using NVIDIA's Transformer Engine to enable FP8 training. To enable, set `fc_type='te'` and/or `ffn_config['ffn_type']='te_ln_mlp'` and `precision='amp_fp8'`.
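In yaml form, a minimal sketch might look like the following, assuming a GPU and environment with Transformer Engine installed; the surrounding model fields are elided:

```yaml
model:
  ...
  fc_type: te              # use Transformer Engine linear layers
  ffn_config:
    ffn_type: te_ln_mlp    # use Transformer Engine LayerNormMLP
precision: amp_fp8         # train in FP8
```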
MLFlow (#475)
Adds support for using MLFlow as an experiment tracker. To enable, simply add `mlflow` to the `loggers` section of your yaml. See the Composer docs for more configuration options for MLFlow. Stay tuned for automatic model logging to MLFlow for easy deployment.
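A minimal sketch of the loggers section; the experiment name and tracking URI are placeholders, and the full set of options is documented with Composer's MLFlowLogger:

```yaml
loggers:
  mlflow:
    experiment_name: my-llm-foundry-experiment  # placeholder
    tracking_uri: databricks                    # placeholder; or a self-hosted MLflow tracking server
```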
Updated streaming version/defaults (#503, #573, #580, #602)
Updates to the latest release of MosaicML Streaming and sets better defaults for improved shuffling quality and training throughput. Check out the Streaming release notes for the full details of all the new options!
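If you want to tune the shuffling behavior yourself, the relevant knobs live on the dataset section of the train_loader; the values below are illustrative placeholders, not the new defaults:

```yaml
train_loader:
  name: text
  dataset:
    remote: s3://my-bucket/my-dataset/   # placeholder
    local: /tmp/my-dataset               # placeholder
    split: train
    shuffle: true
    shuffle_algo: py1b                   # placeholder; see the Streaming docs for available algorithms
    shuffle_block_size: 1000000          # placeholder
  drop_last: true
  num_workers: 8
```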
Grouped Query Attention (#492)
Implements Grouped Query Attention, which can strike a good balance between the quality of Multi Head Attention and the speed of Multi Query Attention. To enable, set `attn_config['attn_type']='grouped_query_attention'` and `attn_config['kv_n_heads']` to the desired number of kv heads.
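In a model yaml this looks roughly as follows; the head counts are placeholders:

```yaml
model:
  ...
  n_heads: 32                # placeholder
  attn_config:
    attn_type: grouped_query_attention
    kv_n_heads: 8            # placeholder; should evenly divide n_heads
```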
MPT quality of life improvements (#559, #599)
Thanks to @tdoublep and @lorabit110 for making MPT a bit easier to use with other parts of the NLP ecosystem!
Eval gauntlet during training, inference API eval wrapper (#501, #494)
Improvements to our evaluation setup, including the ability to run the eval gauntlet during training, and a wrapper to allow using inference APIs with our eval gauntlet. The ICL tasks and gauntlet can be specified as shown [here](https://github.com/mosaicml/llm-foundry/blob/fd36398dad5ac9fde085af679514189ce9439be4/scripts/eval/yamls/hf_eval.yaml#L46-L47).
tiktoken support (#610)
We have enabled training with tiktoken tokenizers via a thin wrapper around the tiktoken library, for compatibility with all the tooling built around Hugging Face tokenizers. You can enable this with a simple change to the tokenizer section of your yaml:

```yaml
tokenizer:
  name: tiktoken
  kwargs:
    model_name: gpt-4
```
LoRA eval (#515)
Allows the use of our evaluation script with a model trained using LoRA. See this yaml for an example of evaluating a model trained using LoRA. Stay tuned for full LoRA support with FSDP!
Finetuning API
Lastly, we are building a finetuning API on top of LLM Foundry, Composer, and Streaming. Please reach out if you might be interested in using this API as a customer!
What's Changed
- Release/v0.2.0 by @vchiley in #410
- Update README.md by @abhi-mosaic in #429
- Remove try catch in eval.py; make model_gauntlet optional in eval.py by @bmosaicml in #434
- Use torch.repeat instead of expand on key & value in Triton MQA to prevent NaNs with certain h_dims by @sashaDoubov in #442
- Update mcli-hf-generate.yaml by @vchiley in #456
- Add trust remote code for tokenizer in inference conversion script by @margaretqian in #446
- setup.py: replace composer with mosaicml by @guoyejun in #458
- Add linear layer and ffn config to enable TransformerEngine layers (with FP8) by @vchiley in #432
- use mono checkpoints by @samhavens in #448
- Update inference benchmark with recent HF changes by @sashaDoubov in #461
- Adding different device initialization to eval by @bcui19 in #466
- Fix missing import by @bcui19 in #470
- Autoresume default on by @mvpatel2000 in #467
- Support eval loader when finetuning from JSONL files in object stores by @samhavens in #469
- Fix ambiguous `throughput` in README by @abhi-mosaic in #476
- adds early stopping call back by @codestar12 in #488
- Update accelerate to 0.20.3 for LLaMa-2 support by @rishab-partha in #485
- If Alibi is on, we should turn learned_pos_emb to False by @bcui19 in #489
- Fix Local World Size by @rishab-partha in #495
- Increase lint CI timeout by @dakinggg in #498
- fix boolean for reentrant setting by @dakinggg in #500
- Adding pyright to pre-commit by @bcui19 in #477
- fix no bias assignment by @vchiley in #502
- Updated StreamingTextDataset to pass take in shuffle_block_size by @snarayan21 in #503
- Add MLFlow as a logger option by @aspfohl in #475
- Remove old optimizer logs by @mvpatel2000 in #509
- Updates GPU test timeout to use mcloud flag by @mvpatel2000 in #510
- Grouped Query Attention + Refactor Attn by @sashaDoubov in #492
- Fix training integration test by @j316chuck in #517
- Update max duration due to mcli api change by @mvpatel2000 in #523
- Fix typo in GQA comments by @sashaDoubov in #522
- Pr eval lora by @danbider in #515
- Monkeypatch flash attention in for llama by @dakinggg in #520
- Add runtime error in train.py if yaml config is improperly formatted with extraneous or missing values by @j316chuck in #506
- Add runtime error in eval.py if yaml config is improperly formatted with extraneous or missing values by @j316chuck in #521
- Add import for pop_config from config_utils by @j316chuck in #531
- Add an mcli yaml for running llama2 models by @dakinggg in #533
- Adapt composer -> HF conversion script to support all causal lms by @dakinggg in #526
- Refactor build optimizer and peft models to use kwargs syntax by @j316chuck in #525
- Fix max duration scheduling by @mvpatel2000 in #537
- Add missing mixed init to llama example by @dakinggg in #539
- Add python log level in llm foundry eval script by @j316chuck in #546
- Add Composer MPT to FasterTransformer Conversion Script by @nik-mosaic in #519
- Fix init device in conversion script by @dakinggg in #556
- Add Programming to Foundry by @rishab-partha in #441
- Revert "Add Programming to Foundry" by @rishab-partha in #557
- 8-bit LION, take 2 by @dblalock in #514
- Move the model creation to the last step before trainer creation by @dakinggg in #547
- Fix propagation of `drop_last` and add error message when it would produce no batches by @dakinggg in #549
- Enable eval script for HuggingFace 8bit models by @es94129 in #516
- change docstring by @bmosaicml in #563
- MPT: Change order of operands to enable PT2 compile for inference by @tdoublep in #559
- Enable gauntlet training by @bmosaicml in #501
- make 8bit flag optional by @bmosaicml in #567
- Fix conversion script to work with checkpoints from composer dev by @dakinggg in #568
- Adjust print statements in conversion script by @dakinggg in #569
- Refactor build_tokenizer to use kwargs syntax and specify name by @j316chuck in #532
- Add ValueError in evaluation script if load_path checkpoint is not specified in config for mpt_causal_lm's by @j316chuck in #535
- added sampling_method to StreamingTextDataset by @snarayan21 in #573
- Pin version of custom kernels more precisely by @dblalock in #577
- Add handling for various types of malformed finetuning data by @dakinggg in #576
- Add regression tests by @dakinggg in #574
- StreamingTextDataset takes correct device batch size by @snarayan21 in #580
- Update extension file not found error to be less confusing by @irenedea in #579
- typecast shuffle_block_size because of issue by @codestar12 in #581
- Upgrade composer version by @dakinggg in #560
- Raise torch pin by @mvpatel2000 in #583
- Updates the commit in the example llama2 yaml by @dakinggg in #584
- Update to transformers 4.32 by @dakinggg in #561
- change output format for better copy-pasting into excel by @dskhudia in #459
- Fix huggingface custom split path issue by @dakinggg in #588
- Add git-repo and git-branch params to regressions script by @irenedea in #591
- Fix ComposerHFCausalLM instantiation with PeftModel by @irenedea in #593
- Fix some type ignores by @hanlint in #589
- Refactor logging by @hanlint in #234
- add eval readme by @bmosaicml in #566
- Update datasets version to latest by @irenedea in #585
- Add lots of return types by @dakinggg in #595
- Update transformers version by @dakinggg in #596
- Add a callback to write huggingface checkpoints during the training run by @dakinggg in #594
- Fix optimizer logging by @mvpatel2000 in #597
- Skip flaky lion8b test by @dblalock in #598
- Add script for MDS conversion of bucket of text files by @irenedea in #570
- Updated streaming args for StreamingDataset subclasses by @snarayan21 in #602
- Fixes a typo default arg by @dakinggg in #604
- Add inference api eval wrapper by @bmosaicml in #494
- Add comment indicating tokenizer API wrapper is experimental by @bmosaicml in #606
- Remove regression tests by @irenedea in #607
- Add default processes in text to mds conversion by @irenedea in #608
- add cloud stores to foundry deps by @dakinggg in #612
- Fix eval yamls by @irenedea in #609
- Upgrade huggingface-hub dependency by @jerrychen109 in #613
- Run CPU tests on a new dep group `all-cpu` by @dakinggg in #616
- Allow MPT models to return attention weights by @lorabit110 in #599
- Attempt to speed up codeql by @dakinggg in #617
- Support for using tiktoken tokenizers by @dakinggg in #610
- Fixes a bad merge in the tiktoken PR by @dakinggg in #619
- Update setup.py to bump flash attn by @mvpatel2000 in #615
- Replace dashes with underscores in split name by @irenedea in #626
- Propagate bias through model by @mvpatel2000 in #627
- Change `repeat` to `expand` in GQA by @sashaDoubov in #628
- Add node rank to signal paths by @mvpatel2000 in #629
- Bump composer version by @dakinggg in #630
- Add code eval by @samhavens in #587
New Contributors
- @margaretqian made their first contribution in #446
- @guoyejun made their first contribution in #458
- @rishab-partha made their first contribution in #485
- @j316chuck made their first contribution in #517
- @dblalock made their first contribution in #514
- @es94129 made their first contribution in #516
- @tdoublep made their first contribution in #559
- @irenedea made their first contribution in #579
- @jerrychen109 made their first contribution in #613
- @lorabit110 made their first contribution in #599
Full Changelog: v0.2.0...v0.3.0