# 🚀 LLM Foundry v0.12.0
## New Features

### PyTorch 2.4 (#1505)

This release updates LLM Foundry to the PyTorch 2.4 release, bringing with it support for the new features and optimizations in PyTorch 2.4.
### Extensibility improvements (#1450, #1449, #1468, #1467, #1478, #1493, #1495, #1511, #1512, #1527)

Numerous improvements to the extensibility of the modeling and data loading code, making it easier to subclass and extend LLM Foundry. Please see the linked PRs for more details on each change.
### Improved error messages (#1457, #1459, #1519, #1518, #1522, #1534, #1548, #1551)

Various error messages have been improved to make debugging user errors clearer.
### Sliding window in torch attention (#1455)

We've added support for sliding window attention to the reference attention implementation, allowing easier testing and comparison against more optimized attention variants.
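For readers unfamiliar with the technique, the core idea can be sketched with a plain-Python mask builder. This is an illustrative sketch only, not LLM Foundry's implementation or API: each query position may attend to key positions that are causal (not in the future) and within the last `window_size` tokens.

```python
def sliding_window_causal_mask(seq_len: int, window_size: int) -> list[list[bool]]:
    """Build a boolean attention mask for sliding window causal attention.

    mask[i][j] is True when query position i may attend to key position j:
    j must not be in the future (j <= i), and must fall inside the window
    (i - j < window_size).
    """
    return [
        [0 <= i - j < window_size for j in range(seq_len)]
        for i in range(seq_len)
    ]

# With window_size=2, position 4 sees only positions 3 and 4.
mask = sliding_window_causal_mask(seq_len=5, window_size=2)
```

In the reference attention implementation, such a mask would be applied to the attention scores before the softmax, zeroing out contributions from tokens outside the window.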
## Bug fixes

### Extra BOS token for Llama 3.1 with completion data (#1476)

A bug resulted in an extra BOS token being added between the prompt and response during finetuning. This is fixed so that the prompt and response supplied by the user are concatenated without any extra tokens between them.
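The corrected behavior can be illustrated with a minimal sketch (the helper and token ids below are made up for illustration, not LLM Foundry code): the tokenized prompt and response are joined directly, so the BOS token appears only once, at the start of the prompt.

```python
BOS = 1  # illustrative BOS token id
EOS = 2  # illustrative EOS token id

def concat_prompt_response(prompt_ids: list[int], response_ids: list[int]) -> list[int]:
    # Fixed behavior: plain concatenation, with no extra BOS (or any other
    # token) inserted at the prompt/response boundary.
    return prompt_ids + response_ids

prompt = [BOS, 15, 27]   # tokenizer already prepended BOS to the prompt
response = [88, 99, EOS]  # response ends with EOS
sequence = concat_prompt_response(prompt, response)
assert sequence.count(BOS) == 1  # no duplicate BOS at the boundary
```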
## What's Changed
- Add test for logged_config transforms by @b-chu in #1441
- Bump version to 0.12.0.dev0. by @irenedea in #1447
- Update pytest-codeblocks requirement from <0.17,>=0.16.1 to >=0.16.1,<0.18 by @dependabot in #1445
- Bump coverage[toml] from 7.4.4 to 7.6.1 by @dependabot in #1442
- Enabled generalizing build_inner_model in ComposerHFCausalLM by @gupta-abhay in #1450
- Update llm foundry version in mcli yamls by @irenedea in #1451
- merge to main by @XiaohanZhangCMU in #865
- allow embedding resizing passed through by @jdchang1 in #1449
- Update packaging requirement from <23,>=21 to >=21,<25 by @dependabot in #1444
- Update pytest requirement from <8,>=7.2.1 to >=7.2.1,<9 by @dependabot in #1443
- Implement ruff rules enforcing PEP 585 by @snarayan21 in #1453
- Adding sliding window attn to scaled_multihead_dot_product_attention by @ShashankMosaicML in #1455
- Add user error for UnicodeDecodeError in convert text to mds by @irenedea in #1457
- Fix log_config by @josejg in #1432
- Add EnvironmentLogger Callback by @josejg in #1350
- Update mosaicml/ci-testing to 0.1.2 by @irenedea in #1458
- Correct error message for inference wrapper by @josejg in #1459
- Update CI tests to v0.1.2 by @KuuCi in #1466
- Bump onnxruntime from 1.18.1 to 1.19.0 by @dependabot in #1461
- Update tenacity requirement from <9,>=8.2.3 to >=8.2.3,<10 by @dependabot in #1460
- Simple change to enable mapping functions for ft constructor by @gupta-abhay in #1468
- use default eval interval from composer by @milocress in #1369
- Consistent Naming EnvironmentLoggingCallback by @josejg in #1470
- Register NaN Monitor Callback by @josejg in #1471
- Add train subset num batches by @mvpatel2000 in #1472
- Parent class hf models by @jdchang1 in #1467
- Remove extra bos for prompt/response data with llama3.1 by @dakinggg in #1476
- Add prepare fsdp back by @dakinggg in #1477
- Add date_string when applying tokenizer chat template by @snarayan21 in #1474
- Make sample tokenization extensible by @gupta-abhay in #1478
- Use Streaming version 0.8.1 by @snarayan21 in #1479
- Bump hf-transfer from 0.1.3 to 0.1.8 by @dependabot in #1480
- fix hf checkpointer by @milocress in #1489
- Fix device mismatch when running hf.generate by @ShashankMosaicML in #1486
- Bump composer to 0.24.1 + FSDP config device_mesh deprecation by @snarayan21 in #1487
- master_weights_dtype not supported by ComposerHFCausalLM.init() by @eldarkurtic in #1485
- Detect loss spikes and high losses during training by @joyce-chen-uni in #1473
- Enable passing in external position ids by @gupta-abhay in #1493
- Align logged attributes for errors and run metadata in kill_loss_spike_callback.py by @joyce-chen-uni in #1494
- tokenizer is never built when converting finetuning dataset by @eldarkurtic in #1496
- Removing error message for reusing kv cache with torch attn by @ShashankMosaicML in #1497
- Fix formatting of loss spike & high loss error messages by @joyce-chen-uni in #1498
- Enable cross attention layers by @gupta-abhay in #1495
- Update to ci-testing 0.2.0 by @dakinggg in #1500
- [WIP] Torch 2.4 in docker images by @snarayan21 in #1491
- [WIP] Only torch 2.4.0 compatible by @snarayan21 in #1505
- Update mlflow requirement from <2.16,>=2.14.1 to >=2.14.1,<2.17 by @dependabot in #1506
- Update ci-testing to 0.2.2 by @dakinggg in #1503
- Allow passing key_value_states for x-attn through MPT Block by @gupta-abhay in #1511
- Fix cross attention for blocks by @gupta-abhay in #1512
- Put 2.3 image back in release examples by @dakinggg in #1513
- Sort callbacks so that CheckpointSaver goes before HuggingFaceCheckpointer by @irenedea in #1515
- Raise MisconfiguredDatasetError from original error by @irenedea in #1519
- Peft fsdp by @dakinggg in #1520
- Raise DatasetTooSmall exception if canonical nodes is less than num samples by @irenedea in #1518
- Add permissions check for delta table reading by @irenedea in #1522
- Add HuggingFaceCheckpointer option for only registering final checkpoint by @irenedea in #1516
- Replace FSDP args by @KuuCi in #1517
- enable correct padding_idx for embedding layers by @gupta-abhay in #1527
- Revert "Replace FSDP args" by @KuuCi in #1533
- Delete unneeded inner base model in PEFT HF Checkpointer by @snarayan21 in #1532
- Add deprecation warning to fsdp_config by @KuuCi in #1530
- Fix reuse kv cache for torch attention by @ShashankMosaicML in #1539
- Error on text dataset file not found by @milocress in #1534
- Make ICL tasks not required for eval by @snarayan21 in #1540
- Bumping flash attention version to 2.6.3 and adding option for softcap in attention and lm_head logits. by @ShashankMosaicML in #1374
- Register mosaic logger by @dakinggg in #1542
- Hfcheckpointer optional generation config by @KuuCi in #1543
- Bump composer version to 0.25.0 by @dakinggg in #1546
- Bump streaming version to 0.9.0 by @dakinggg in #1550
- Bump version to 0.13.0.dev0 by @dakinggg in #1549
- Add proper user error for accessing schema by @KuuCi in #1548
- Validate Cluster Access Mode by @KuuCi in #1551
## New Contributors
- @jdchang1 made their first contribution in #1449
- @joyce-chen-uni made their first contribution in #1473
Full Changelog: v0.11.0...v0.12.0