🚀 LLM Foundry v0.9.0
New Features
More Token Encoding Types (#1254)
We've expanded the supported token ID encodings to include uint32 and uint16 formats, which saves significant space for datasets with smaller vocab sizes. We also extended ndarray type support for MDS dataset columns to the generic text dataset and updated the conversion scripts accordingly.
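As a rough illustration of why smaller integer dtypes save space (a minimal sketch, not the Foundry implementation; the vocab size and token IDs below are made up), one can pick the narrowest unsigned dtype that fits the vocabulary:

```python
import numpy as np

# Hypothetical example: a GPT-2-sized vocab (50257) fits in uint16 (max 65535).
vocab_size = 50257
token_ids = [15496, 11, 995, 0]  # made-up token IDs for illustration

# Choose the smallest unsigned dtype that can represent every ID in the vocab.
if vocab_size <= np.iinfo(np.uint16).max + 1:
    dtype = np.uint16
elif vocab_size <= np.iinfo(np.uint32).max + 1:
    dtype = np.uint32
else:
    dtype = np.uint64

encoded = np.asarray(token_ids, dtype=dtype)
# uint16 stores 2 bytes per token instead of 8 for int64: a 4x space saving.
```

For a 2-billion-token dataset, that difference is roughly 12 GB of storage.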
Enforced Stricter Configs (#1254, #1225, #1202)
We've implemented stricter enforcement on our Train and Eval configs to further protect users from attempting to train with invalid configs. Together with numerous other PRs, this adds stronger error handling to make LLM Foundry smoother to use.
Previously, this was allowed:
```yaml
parameters:
  train_dataloader:
    ...
    seed: ${global_seed}
    random_other_key_that's_not_in_the_dataloader_constructor # this is not allowed
  ...
  global_seed: 17 # this is also not allowed
```
But we've added a `variables` section. Please do this instead:

```yaml
parameters:
  variables:
    global_seed: 42
  ...
  train_dataloader:
    seed: ${variables.global_seed}
```
Chunked text to MDS conversion (#1240)
We've updated our text-to-MDS conversion script to convert files to MDS in chunks. This avoids loading entire large files at once (which could cause OOMs) and drastically speeds up converting long sequences.
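The core idea can be sketched as a generator that reads a file in fixed-size pieces rather than all at once (a simplified illustration only; the actual script also tokenizes each chunk and writes it out with an MDS writer, and the chunk size here is arbitrary):

```python
import tempfile

def read_in_chunks(path, chunk_size=1 << 20):
    """Yield successive fixed-size chunks of a text file.

    Reading incrementally keeps memory usage bounded by chunk_size,
    no matter how large the input file is.
    """
    with open(path, "r", encoding="utf-8") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

# Usage sketch: reassembling the chunks recovers the original text.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tf:
    tf.write("hello world " * 500)
    path = tf.name

chunks = list(read_in_chunks(path, chunk_size=256))
text = "".join(chunks)
```

In the real pipeline each chunk would be tokenized and streamed to the output shard as soon as it is read, so peak memory stays flat even for multi-gigabyte files.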
Breaking Changes and Deprecations
What's Changed
- Bump version v0.9.0.dev0 by @milocress in #1181
- structuredconfig for train.py and eval.py by @milocress in #1051
- update version names by @milocress in #1185
- Refactoring attention by @ShashankMosaicML in #1182
- Checking if attention mask is present for ignoring pad tokens in ffn. by @ShashankMosaicML in #1188
- Bump python 3.11 version in setup.py by @j316chuck in #1189
- Docstring fix for curriculum learning callback by @snarayan21 in #1186
- Set ft dataloader name explicitly by @milocress in #1187
- Remove to_container by @dakinggg in #1190
- fix eval by @milocress in #1193
- Log exception on inactivity callback by @jjanezhang in #1194
- Pass FC type along for all FFN types by @dakinggg in #1196
- Streaming version bump to 0.7.6 by @snarayan21 in #1195
- Clearer error message for unknown example type by @milocress in #1202
- Added torch_dmoe defaults, bug fixes for 2D inputs by @snarayan21 in #1210
- log eval dataset misconfiguration by @milocress in #1179
- Using self.shift_labels instead of self.model.transformer.shift_label in the loss function. by @ShashankMosaicML in #1211
- Add fc to HF export by @dakinggg in #1209
- TransformerEngine Image Build by @mvpatel2000 in #1204
- Removed debugging code in tests by @dakinggg in #1213
- Make `fc_type` a dict to pass fc kwargs through by @snarayan21 in #1201
- Fix dmoe tests GPU OOM by @snarayan21 in #1216
- Update readme to clarify flash-attn and TE installs by @snarayan21 in #1219
- Modularize components of megablocks layer builder by @dakinggg in #1224
- Add user error superclass by @milocress in #1225
- Make config/class properties on ComposerMPTForCausalLM by @dakinggg in #1227
- Quick patch to check that Dataset Keys contain non-None Values by @KuuCi in #1228
- Modularize backbone class and block creation by @dakinggg in #1229
- Loss v len callback by @ShashankMosaicML in #1226
- Fixing the state.timestamp.batch.value issue in loss v len callback by @ShashankMosaicML in #1232
- Fix attr error for attention_classes when using act ckpt by @cli99 in #1230
- Fix tuple typing by @dakinggg in #1235
- Add example eval scripts for dbrx PT sizes by @aspfohl in #1218
- Configurable submesh by @dakinggg in #1236
- Add retries to downloads in convert_text_to_mds.py by @irenedea in #1238
- Move MLFlow dataset outside of log_config by @KuuCi in #1234
- add error when chat template fails by @milocress in #1222
- Make the exceptions serializable by @dakinggg in #1239
- Removing rich install by @jjanezhang in #1198
- Chunk file reads and tokenization for text to mds conversion by @irenedea in #1240
- Make HF conversion automatically add missing imports by @dakinggg in #1241
- Add logging to convert_text_to_mds.py script by @irenedea in #1243
- Update CODEOWNERS by @dakinggg in #1248
- Replacing icl_task_type question_answering with generation_task_with_answers in long context eval yamls. by @ShashankMosaicML in #1250
- Change TE docker image to enable te_shard_weight by @j316chuck in #1251
- Fix MPT HF conversion by @dakinggg in #1257
- Remove spurious warning by @dakinggg in #1258
- Adding more token encoding types by @snarayan21 in #1254
- Bump Composer to 0.23.0 by @KuuCi in #1259
- Fix typo in setup.py by @XiaohanZhangCMU in #1263
- Bump composer to 0.23.2 by @dakinggg in #1269
Full Changelog: v0.8.0...v0.9.0