🚀 LLM Foundry v0.9.0
New Features
More Token Encoding Types (#1254)
We've expanded the supported token ID encodings to include uint32 and uint16 formats, which saves significant space for datasets with smaller vocab sizes. We also extended ndarray type support for MDS dataset columns to the generic text dataset and updated the conversion scripts accordingly.
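As a rough illustration of why smaller integer dtypes save space (a minimal sketch, not the Foundry implementation; the vocab size and token IDs below are made up), one can pick the narrowest unsigned dtype that fits the vocabulary:

```python
import numpy as np

# Hypothetical example: a GPT-2-sized vocab (50257) fits in uint16 (max 65535).
vocab_size = 50257
token_ids = [15496, 11, 995, 0]  # made-up token IDs for illustration

# Choose the smallest unsigned dtype that can represent every ID in the vocab.
if vocab_size <= np.iinfo(np.uint16).max + 1:
    dtype = np.uint16
elif vocab_size <= np.iinfo(np.uint32).max + 1:
    dtype = np.uint32
else:
    dtype = np.uint64

encoded = np.asarray(token_ids, dtype=dtype)
# uint16 stores 2 bytes per token instead of 8 for int64: a 4x space saving.
```

For a 2-billion-token dataset, that difference is roughly 12 GB of storage.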
Enforced Stricter Configs (#1254, #1225, #1202)
We've implemented stricter enforcement on our Train and Eval configs to further protect users from attempting to train with invalid configs. Together with numerous other PRs, this adds stronger error handling to make LLM Foundry smoother to use.
Previously, this was allowed:
```yaml
parameters:
  train_dataloader:
    ...
    seed: ${global_seed}
    random_other_key_that's_not_in_the_dataloader_constructor # this is not allowed
  ...
  global_seed: 17 # this is also not allowed
```
But we've added a `variables` section. Please do this instead:

```yaml
parameters:
  variables:
    global_seed: 42
  ...
  train_dataloader:
    seed: ${variables.global_seed}
```
Chunked text to MDS conversion (#1240)
We've updated our text-to-MDS conversion script to convert files to MDS in chunks. This avoids loading entire large files at once (which could cause OOMs) and drastically speeds up converting long sequences.
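The core idea can be sketched as a generator that reads a file in fixed-size pieces rather than all at once (a simplified illustration only; the actual script also tokenizes each chunk and writes it out with an MDS writer, and the chunk size here is arbitrary):

```python
import tempfile

def read_in_chunks(path, chunk_size=1 << 20):
    """Yield successive fixed-size chunks of a text file.

    Reading incrementally keeps memory usage bounded by chunk_size,
    no matter how large the input file is.
    """
    with open(path, "r", encoding="utf-8") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

# Usage sketch: reassembling the chunks recovers the original text.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tf:
    tf.write("hello world " * 500)
    path = tf.name

chunks = list(read_in_chunks(path, chunk_size=256))
text = "".join(chunks)
```

In the real pipeline each chunk would be tokenized and streamed to the output shard as soon as it is read, so peak memory stays flat even for multi-gigabyte files.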
Breaking Changes and Deprecations
What's Changed
- Bump version v0.9.0.dev0 by @milocress in #1181
- structuredconfig for train.py and eval.py by @milocress in #1051
- update version names by @milocress in #1185
- Refactoring attention by @ShashankMosaicML in #1182
- Checking if attention mask is present for ignoring pad tokens in ffn. by @ShashankMosaicML in #1188
- Bump python 3.11 version in setup.py by @j316chuck in #1189
- Docstring fix for curriculum learning callback by @snarayan21 in #1186
- Set ft dataloader name explicitly by @milocress in #1187
- Remove to_container by @dakinggg in #1190
- fix eval by @milocress in #1193
- Log exception on inactivity callback by @jjanezhang in #1194
- Pass FC type along for all FFN types by @dakinggg in #1196
- Streaming version bump to 0.7.6 by @snarayan21 in #1195
- Clearer error message for unknown example type by @milocress in #1202
- Added torch_dmoe defaults, bug fixes for 2D inputs by @snarayan21 in #1210
- log eval dataset misconfiguration by @milocress in #1179
- Using self.shift_labels instead of self.model.transformer.shift_label in the loss function. by @ShashankMosaicML in #1211
- Add fc to HF export by @dakinggg in #1209
- TransformerEngine Image Build by @mvpatel2000 in #1204
- Removed debugging code in tests by @dakinggg in #1213
- Make `fc_type` a dict to pass fc kwargs through by @snarayan21 in #1201
- Fix dmoe tests GPU OOM by @snarayan21 in #1216
- Update readme to clarify flash-attn and TE installs by @snarayan21 in #1219
- Modularize components of megablocks layer builder by @dakinggg in #1224
- Add user error superclass by @milocress in #1225
- Make config/class properties on ComposerMPTForCausalLM by @dakinggg in #1227
- Quick patch to check that Dataset Keys contain non-None Values by @KuuCi in #1228
- Modularize backbone class and block creation by @dakinggg in #1229
- Loss v len callback by @ShashankMosaicML in #1226
- Fixing the state.timestamp.batch.value issue in loss v len callback by @ShashankMosaicML in #1232
- Fix attr error for attention_classes when using act ckpt by @cli99 in #1230
- Fix tuple typing by @dakinggg in #1235
- Add example eval scripts for dbrx PT sizes by @aspfohl in #1218
- Configurable submesh by @dakinggg in #1236
- Add retries to downloads in convert_text_to_mds.py by @irenedea in #1238
- Move MLFlow dataset outside of log_config by @KuuCi in #1234
- add error when chat template fails by @milocress in #1222
- Make the exceptions serializable by @dakinggg in #1239
- Removing rich install by @jjanezhang in #1198
- Chunk file reads and tokenization for text to mds conversion by @irenedea in #1240
- Make HF conversion automatically add missing imports by @dakinggg in #1241
- Add logging to convert_text_to_mds.py script by @irenedea in #1243
- Update CODEOWNERS by @dakinggg in #1248
- Replacing icl_task_type question_answering with generation_task_with_answers in long context eval yamls. by @ShashankMosaicML in #1250
- Change TE docker image to enable te_shard_weight by @j316chuck in #1251
- Fix MPT HF conversion by @dakinggg in #1257
- Remove spurious warning by @dakinggg in #1258
- Adding more token encoding types by @snarayan21 in #1254
- Bump Composer to 0.23.0 by @KuuCi in #1259
- Fix typo in setup.py by @XiaohanZhangCMU in #1263
- Bump composer to 0.23.2 by @dakinggg in #1269
Full Changelog: v0.8.0...v0.9.0