# 🚀 LLM Foundry v0.11.0

## New Features
### LLM Foundry CLI Commands (#1337, #1345, #1348, #1354)

We've added CLI commands for our commonly used scripts. For example, instead of calling `composer llm-foundry/scripts/train.py parameters.yaml`, you can now run `composer -c llm-foundry train parameters.yaml`.
### Docker Images Contain All Optional Dependencies (#1431)

LLM Foundry Docker images now include all optional dependencies.
### Support for Llama3 Rope Scaling (#1391)

To use it, add the following to your model config:

```yaml
model:
  name: mpt_causal_lm
  attn_config:
    rope: true
    ...
    rope_impl: hf
    rope_theta: 500000
    rope_hf_config:
      type: llama3
      ...
```
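For intuition, llama3-style rope scaling rescales the rotary frequencies based on their wavelength: short-wavelength (high-frequency) components are kept, long-wavelength ones are divided by a scale factor, and the band in between is smoothly interpolated. A minimal sketch of that rule, with illustrative function name and Llama 3.1-style default constants (not the Foundry implementation):

```python
import math

def llama3_scale_freqs(freqs, factor=8.0, low_freq_factor=1.0,
                       high_freq_factor=4.0, old_context_len=8192):
    """Rescale rotary frequencies llama3-style (illustrative sketch)."""
    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor
    scaled = []
    for freq in freqs:
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            # Short wavelength: keep the frequency as-is.
            scaled.append(freq)
        elif wavelen > low_freq_wavelen:
            # Long wavelength: fully scale down by the factor.
            scaled.append(freq / factor)
        else:
            # In-between band: interpolate smoothly between the two regimes.
            smooth = (old_context_len / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor)
            scaled.append((1 - smooth) * freq / factor + smooth * freq)
    return scaled
```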
### Tokenizer Registry (#1386)
We now have a tokenizer registry so you can easily add custom tokenizers.
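The registries boil down to a name-to-class mapping that configs can reference by name. A self-contained sketch of the pattern (the real entrypoints live under `llmfoundry.registry`; the `Registry` class and names below are illustrative):

```python
class Registry:
    """Toy name -> class registry, illustrating the pattern."""
    def __init__(self):
        self._items = {}

    def register(self, name):
        # Used as a decorator: @tokenizers.register('my_tokenizer')
        def decorator(cls):
            self._items[name] = cls
            return cls
        return decorator

    def get(self, name):
        return self._items[name]

tokenizers = Registry()

@tokenizers.register('my_tokenizer')
class MyTokenizer:
    def __call__(self, text):
        return text.split()
```

A config could then reference the tokenizer by its registered name, and the framework would look it up and construct it from the registry.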
### LoadPlanner and SavePlanner Registries (#1358)
We now have LoadPlanner and SavePlanner registries so you can easily add custom checkpoint loading and saving logic.
### Faster Auto-packing (#1435)

Auto-packing startup is now much faster. To use auto-packing with finetuning datasets, add `packing_ratio: auto` to your config like so:

```yaml
train_loader:
  name: finetuning
  dataset:
    ...
    packing_ratio: auto
```
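For intuition, packing concatenates several short examples into a single sequence of at most `max_seq_len` tokens, and the packing ratio is roughly the number of raw examples per packed sequence. A toy first-fit-decreasing sketch of the idea (not the Foundry algorithm, which estimates the ratio from the data):

```python
def pack_greedy(seq_lens, max_seq_len):
    """Pack sequence lengths into bins of at most max_seq_len tokens
    using first-fit decreasing; returns the list of bins."""
    bins = []
    for n in sorted(seq_lens, reverse=True):
        for b in bins:
            if sum(b) + n <= max_seq_len:
                b.append(n)
                break
        else:
            bins.append([n])
    return bins

lengths = [5, 3, 2, 7, 1]
bins = pack_greedy(lengths, max_seq_len=8)
packing_ratio = len(lengths) / len(bins)  # raw examples per packed sequence
```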
## What's Changed
- Extra serverless by @XiaohanZhangCMU in #1320
- Fixing sequence_id =-1 bug, adding tests by @ShashankMosaicML in #1324
- Registry docs update by @dakinggg in #1323
- Add dependabot by @dakinggg in #1322
- `HUGGING_FACE_HUB_TOKEN` -> `HF_TOKEN` by @dakinggg in #1321
- Bump version by @b-chu in #1326
- Relax hf hub pin by @dakinggg in #1314
- Error if metadata matches existing keys by @dakinggg in #1313
- Update transformers requirement from <4.41,>=4.40 to >=4.42.3,<4.43 by @dependabot in #1327
- Bump einops from 0.7.0 to 0.8.0 by @dependabot in #1328
- Bump onnxruntime from 1.15.1 to 1.18.1 by @dependabot in #1329
- Bump onnx from 1.14.0 to 1.16.1 by @dependabot in #1331
- Currently multi-gpu generate does not work with hf.generate for hf checkpoints. This PR fixes that. by @ShashankMosaicML in #1332
- Fix registry for callbacks with configs by @mvpatel2000 in #1333
- Adding a child class of hf's rotary embedding to make hf generate work on multiple gpus. by @ShashankMosaicML in #1334
- Add a config arg to just save an hf checkpoint by @dakinggg in #1335
- Deepcopy config in callbacks_with_config by @mvpatel2000 in #1336
- Avoid HF race condition by @dakinggg in #1338
- Nicer error message for undefined symbol by @dakinggg in #1339
- Bump sentencepiece from 0.1.97 to 0.2.0 by @dependabot in #1342
- Removing logging exception through update run metadata by @jjanezhang in #1292
- [MCLOUD-4910] Escape UC names during data prep by @naren-loganathan in #1343
- Add CLI for train.py by @KuuCi in #1337
- Add fp32 to the set of valid inputs to attention layer by @j316chuck in #1347
- Log all extraneous_keys in one go for ease of development by @josejg in #1344
- Fix MLFlow Save Model for TE by @j316chuck in #1353
- Add flag for saving only composer checkpoint by @irenedea in #1356
- Expose flag for should_save_peft_only by @irenedea in #1357
- Command utils + train by @KuuCi in #1361
- Readd Clear Resolver by @KuuCi in #1365
- Add Eval to Foundry CLI by @KuuCi in #1345
- Enhanced Logging for convert_delta_to_json and convert_text_to_mds by @vanshcsingh in #1366
- Add convert_dataset_hf to CLI by @KuuCi in #1348
- Add missing init by @KuuCi in #1368
- Make ICL dataloaders build lazily by @josejg in #1359
- Add option to unfuse Wqkv by @snarayan21 in #1367
- Add convert_dataset_json to CLI by @KuuCi in #1349
- Add convert_text_to_mds to CLI by @KuuCi in #1352
- Fix hf dataset hang on small dataset by @dakinggg in #1370
- Add LoadPlanner and SavePlanner registries by @irenedea in #1358
- Load config on rank 0 first by @dakinggg in #1371
- Add convert_finetuning_dataset to CLI by @KuuCi in #1354
- Allow for transforms on the model before MLFlow registration by @snarayan21 in #1372
- Allow flash attention up to 3 by @dakinggg in #1377
- Update accelerate requirement from <0.26,>=0.25 to >=0.32.1,<0.33 by @dependabot in #1341
- update runners by @KevDevSha in #1360
- Allow for multiple workers when autopacking by @b-chu in #1375
- Allow train.py-like config for eval.py by @josejg in #1351
- Fix load and save planner config logic by @irenedea in #1385
- Do dtype conversion in torch hook to save memory by @irenedea in #1384
- Get a shared file system safe signal file name by @dakinggg in #1381
- Add transformation method to hf_causal_lm by @irenedea in #1383
- [kushalkodnad/tokenizer-registry] Introduce new registry for tokenizers by @kushalkodn-db in #1386
- Bump transformers version to 4.43.1 by @dakinggg in #1388
- Add convert_delta_to_json to CLI by @KuuCi in #1355
- Revert "Use utils to get shared fs safe signal file name (#1381)" by @dakinggg in #1389
- Avoid race condition in convert text to mds script by @dakinggg in #1390
- Refactor loss function for ComposerMPTCausalLM by @irenedea in #1387
- Revert "Allow for multiple workers when autopacking (#1375)" by @dakinggg in #1392
- Bump transformers to 4.43.2 by @dakinggg in #1393
- Support rope scaling by @milocress in #1391
- Removing the extra LlamaRotaryEmbedding import by @ShashankMosaicML in #1394
- Dtensor oom by @dakinggg in #1395
- Condition the meta initialization for hf_causal_lm on pretrain by @irenedea in #1397
- Fix license link in readme by @dakinggg in #1398
- Enable passing epsilon when building norm layers by @gupta-abhay in #1399
- Add pre register method for mlflow by @dakinggg in #1396
- add it by @dakinggg in #1400
- Remove orig params default by @dakinggg in #1401
- Add spin_dataloaders flag by @dakinggg in #1405
- Remove curriculum learning error when duration less than saved timestamp by @b-chu in #1406
- Set pretrained model name correctly, if provided, in HF Checkpointer by @snarayan21 in #1407
- Enable QuickGelu Function for CLIP models by @gupta-abhay in #1408
- Bump streaming version to v0.8.0 by @mvpatel2000 in #1411
- Kevin/ghcr build by @KevDevSha in #1413
- Update accelerate requirement from <0.33,>=0.25 to >=0.25,<0.34 by @dependabot in #1403
- Update huggingface-hub requirement from <0.24,>=0.19.0 to >=0.19.0,<0.25 by @dependabot in #1379
- Make Pytest log in color in Github Action by @eitanturok in #1412
- Read Package Version Better by @eitanturok in #1415
- Log original config by @josejg in #1410
- Replace pydocstyle with Ruff by @eitanturok in #1417
- test cpu by @KevDevSha in #1416
- Update pr-gpu.yaml by @KevDevSha in #1420
- Additional registry entrypoint documentation by @dakinggg in #1414
- Remove type ignore by @dakinggg in #1421
- Update pytest-cov requirement from <5,>=4 to >=4,<6 by @dependabot in #1423
- Bump onnx from 1.16.1 to 1.16.2 by @dependabot in #1425
- Add transforms to logged config by @b-chu in #1428
- Install all optional dependencies in the docker images by @dakinggg in #1431
- Raise error when not enough data when converting text to MDS by @KuuCi in #1430
- Bump yaml versions by @dakinggg in #1433
- Automatically get the portion of the dataset config that is constructor args by @dakinggg in #1434
- Remove flash patching for HF by @dakinggg in #1436
- Fix the context size in long context gauntlet for wikiqa by @bfontain in #1439
- Update mlflow requirement from <2.15,>=2.14.1 to >=2.14.1,<2.16 by @dependabot in #1424
- Add special errors for bad chat/ift types by @milocress in #1437
- Make autopacking faster by @b-chu in #1435
- Use the pretrained generation config if it exists for HF models by @irenedea in #1440
## New Contributors
- @dependabot made their first contribution in #1327
- @naren-loganathan made their first contribution in #1343
- @vanshcsingh made their first contribution in #1366
- @KevDevSha made their first contribution in #1360
- @kushalkodn-db made their first contribution in #1386
- @gupta-abhay made their first contribution in #1399
- @bfontain made their first contribution in #1439
**Full Changelog**: v0.10.0...v0.11.0