
MPT-7B-8k models

MPT-7B-8k is a family of 7B-parameter open-source LLMs with an 8k context length, trained with the MosaicML platform. It contains two models, both of which can be used commercially:

  • MPT-7B-8k: A decoder-style transformer pretrained starting from MPT-7B, but updating the sequence length to 8k and training for an additional 500B tokens, resulting in a total of 1.5T tokens of text and code. License: CC-BY-SA-3.0
  • MPT-7B-8k-Instruct: A model for long-form instruction following (especially summarization and question answering). Built by finetuning MPT-7B-8k on several carefully curated datasets. License: CC-BY-SA-3.0
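
To try either checkpoint, the sketch below shows one way to load it with Hugging Face `transformers`. The Hub names (`mosaicml/mpt-7b-8k`, `mosaicml/mpt-7b-8k-instruct`) and the `trust_remote_code=True` flag are assumptions based on how other MPT releases are packaged, not something stated in this README.

```python
# Minimal sketch: loading MPT-7B-8k with Hugging Face transformers.
# Assumes the checkpoints are published on the Hub as mosaicml/mpt-7b-8k
# and mosaicml/mpt-7b-8k-instruct; trust_remote_code=True is needed
# because MPT uses a custom model class.
import torch
import transformers

name = "mosaicml/mpt-7b-8k"  # or "mosaicml/mpt-7b-8k-instruct"

tokenizer = transformers.AutoTokenizer.from_pretrained(name)
model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.bfloat16,  # half precision to reduce memory use
    trust_remote_code=True,
)
model.eval()
```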

MPT-7B-8k FAQ

When would I choose…

  • MPT-7B-8k over MPT-7B? Use MPT-7B-8k in most cases, except when coding or reasoning ability is the only criterion, in which case you should evaluate both models.
  • MPT-7B-8k-Instruct over MPT-7B-Instruct? MPT-7B-8k-Instruct excels at long-form instruction following; use it when your inputs are longer than 2048 tokens or for summarization and question answering (see the sketch below).
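
As an illustration of the long-input use case, the sketch below summarizes a long document with MPT-7B-8k-Instruct, reusing the model and tokenizer loaded above. The file name is hypothetical, and the Alpaca-style prompt template is an assumption carried over from MPT-7B-Instruct; check the model card for the exact format.

```python
# Minimal sketch: long-form summarization with MPT-7B-8k-Instruct,
# using the model/tokenizer loaded in the snippet above (with
# name = "mosaicml/mpt-7b-8k-instruct").
import torch

# Hypothetical input file, e.g. several thousand tokens long.
document = open("long_report.txt").read()

# Assumed Alpaca-style prompt format; verify against the model card.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\nSummarize the following document:\n\n"
    f"{document}\n\n### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens after the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```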