# 🚀 LLM Foundry v0.12.0
## New Features

### PyTorch 2.4 (#1505)

This release updates LLM Foundry to the PyTorch 2.4 release, bringing with it support for the new features and optimizations in PyTorch 2.4.
### Extensibility improvements (#1450, #1449, #1468, #1467, #1478, #1493, #1495, #1511, #1512, #1527)

Numerous improvements to the extensibility of the modeling and data loading code, making it easier to subclass and extend LLM Foundry. Please see the linked PRs for more details on each change.
### Improved error messages (#1457, #1459, #1519, #1518, #1522, #1534, #1548, #1551)

Various error messages have been improved to make debugging user errors clearer.
### Sliding window in torch attention (#1455)

We've added support for sliding window attention to the reference attention implementation, allowing easier testing and comparison against more optimized attention variants.
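For readers unfamiliar with the technique, the core idea can be sketched with a plain-Python mask builder. This is an illustrative sketch only, not LLM Foundry's implementation or API: each query position may attend to key positions that are causal (not in the future) and within the last `window_size` tokens.

```python
def sliding_window_causal_mask(seq_len: int, window_size: int) -> list[list[bool]]:
    """Build a boolean attention mask for sliding window causal attention.

    mask[i][j] is True when query position i may attend to key position j:
    j must not be in the future (j <= i), and must fall inside the window
    (i - j < window_size).
    """
    return [
        [0 <= i - j < window_size for j in range(seq_len)]
        for i in range(seq_len)
    ]

# With window_size=2, position 4 sees only positions 3 and 4.
mask = sliding_window_causal_mask(seq_len=5, window_size=2)
```

In the reference attention implementation, such a mask would be applied to the attention scores before the softmax, zeroing out contributions from tokens outside the window.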
## Bug fixes

### Extra BOS token for Llama 3.1 with completion data (#1476)

A bug resulted in an extra BOS token being added between the prompt and response during finetuning. This is fixed so that the prompt and response supplied by the user are concatenated without any extra tokens between them.
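The corrected behavior can be illustrated with a minimal sketch (the helper and token ids below are made up for illustration, not LLM Foundry code): the tokenized prompt and response are joined directly, so the BOS token appears only once, at the start of the prompt.

```python
BOS = 1  # illustrative BOS token id
EOS = 2  # illustrative EOS token id

def concat_prompt_response(prompt_ids: list[int], response_ids: list[int]) -> list[int]:
    # Fixed behavior: plain concatenation, with no extra BOS (or any other
    # token) inserted at the prompt/response boundary.
    return prompt_ids + response_ids

prompt = [BOS, 15, 27]   # tokenizer already prepended BOS to the prompt
response = [88, 99, EOS]  # response ends with EOS
sequence = concat_prompt_response(prompt, response)
assert sequence.count(BOS) == 1  # no duplicate BOS at the boundary
```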
## What's Changed
- Add test for logged_config transforms by @b-chu in #1441
- Bump version to 0.12.0.dev0. by @irenedea in #1447
- Update pytest-codeblocks requirement from <0.17,>=0.16.1 to >=0.16.1,<0.18 by @dependabot in #1445
- Bump coverage[toml] from 7.4.4 to 7.6.1 by @dependabot in #1442
- Enabled generalizing build_inner_model in ComposerHFCausalLM by @gupta-abhay in #1450
- Update llm foundry version in mcli yamls by @irenedea in #1451
- merge to main by @XiaohanZhangCMU in #865
- allow embedding resizing passed through by @jdchang1 in #1449
- Update packaging requirement from <23,>=21 to >=21,<25 by @dependabot in #1444
- Update pytest requirement from <8,>=7.2.1 to >=7.2.1,<9 by @dependabot in #1443
- Implement ruff rules enforcing PEP 585 by @snarayan21 in #1453
- Adding sliding window attn to scaled_multihead_dot_product_attention by @ShashankMosaicML in #1455
- Add user error for UnicodeDecodeError in convert text to mds by @irenedea in #1457
- Fix log_config by @josejg in #1432
- Add EnvironmentLogger Callback by @josejg in #1350
- Update mosaicml/ci-testing to 0.1.2 by @irenedea in #1458
- Correct error message for inference wrapper by @josejg in #1459
- Update CI tests to v0.1.2 by @KuuCi in #1466
- Bump onnxruntime from 1.18.1 to 1.19.0 by @dependabot in #1461
- Update tenacity requirement from <9,>=8.2.3 to >=8.2.3,<10 by @dependabot in #1460
- Simple change to enable mapping functions for ft constructor by @gupta-abhay in #1468
- use default eval interval from composer by @milocress in #1369
- Consistent Naming EnvironmentLoggingCallback by @josejg in #1470
- Register NaN Monitor Callback by @josejg in #1471
- Add train subset num batches by @mvpatel2000 in #1472
- Parent class hf models by @jdchang1 in #1467
- Remove extra bos for prompt/response data with llama3.1 by @dakinggg in #1476
- Add prepare fsdp back by @dakinggg in #1477
- Add date_string when applying tokenizer chat template by @snarayan21 in #1474
- Make sample tokenization extensible by @gupta-abhay in #1478
- Use Streaming version 0.8.1 by @snarayan21 in #1479
- Bump hf-transfer from 0.1.3 to 0.1.8 by @dependabot in #1480
- fix hf checkpointer by @milocress in #1489
- Fix device mismatch when running hf.generate by @ShashankMosaicML in #1486
- Bump composer to 0.24.1 + FSDP config device_mesh deprecation by @snarayan21 in #1487
- master_weights_dtype not supported by ComposerHFCausalLM.init() by @eldarkurtic in #1485
- Detect loss spikes and high losses during training by @joyce-chen-uni in #1473
- Enable passing in external position ids by @gupta-abhay in #1493
- Align logged attributes for errors and run metadata in kill_loss_spike_callback.py by @joyce-chen-uni in #1494
- tokenizer is never built when converting finetuning dataset by @eldarkurtic in #1496
- Removing error message for reusing kv cache with torch attn by @ShashankMosaicML in #1497
- Fix formatting of loss spike & high loss error messages by @joyce-chen-uni in #1498
- Enable cross attention layers by @gupta-abhay in #1495
- Update to ci-testing 0.2.0 by @dakinggg in #1500
- [WIP] Torch 2.4 in docker images by @snarayan21 in #1491
- [WIP] Only torch 2.4.0 compatible by @snarayan21 in #1505
- Update mlflow requirement from <2.16,>=2.14.1 to >=2.14.1,<2.17 by @dependabot in #1506
- Update ci-testing to 0.2.2 by @dakinggg in #1503
- Allow passing key_value_states for x-attn through MPT Block by @gupta-abhay in #1511
- Fix cross attention for blocks by @gupta-abhay in #1512
- Put 2.3 image back in release examples by @dakinggg in #1513
- Sort callbacks so that CheckpointSaver goes before HuggingFaceCheckpointer by @irenedea in #1515
- Raise MisconfiguredDatasetError from original error by @irenedea in #1519
- Peft fsdp by @dakinggg in #1520
- Raise DatasetTooSmall exception if canonical nodes is less than num samples by @irenedea in #1518
- Add permissions check for delta table reading by @irenedea in #1522
- Add HuggingFaceCheckpointer option for only registering final checkpoint by @irenedea in #1516
- Replace FSDP args by @KuuCi in #1517
- enable correct padding_idx for embedding layers by @gupta-abhay in #1527
- Revert "Replace FSDP args" by @KuuCi in #1533
- Delete unneeded inner base model in PEFT HF Checkpointer by @snarayan21 in #1532
- Add deprecation warning to fsdp_config by @KuuCi in #1530
- Fix reuse kv cache for torch attention by @ShashankMosaicML in #1539
- Error on text dataset file not found by @milocress in #1534
- Make ICL tasks not required for eval by @snarayan21 in #1540
- Bumping flash attention version to 2.6.3 and adding option for softcap in attention and lm_head logits. by @ShashankMosaicML in #1374
- Register mosaic logger by @dakinggg in #1542
- Hfcheckpointer optional generation config by @KuuCi in #1543
- Bump composer version to 0.25.0 by @dakinggg in #1546
- Bump streaming version to 0.9.0 by @dakinggg in #1550
- Bump version to 0.13.0.dev0 by @dakinggg in #1549
- Add proper user error for accessing schema by @KuuCi in #1548
- Validate Cluster Access Mode by @KuuCi in #1551
## New Contributors
- @jdchang1 made their first contribution in #1449
- @joyce-chen-uni made their first contribution in #1473
Full Changelog: v0.11.0...v0.12.0