# databricks-ml-examples

`databricks/databricks-ml-examples` is a repository that showcases machine learning examples on the Databricks platform.

Currently this repository contains:

- `llm-models/`: Example notebooks for using different state-of-the-art (SOTA) models on Databricks.
- `llm-fine-tuning/`: Scripts and notebooks for fine-tuning SOTA models on Databricks.

## SOTA LLM examples

Databricks works with thousands of customers to build generative AI applications. While you can use Databricks to work with any generative AI model, including commercial and research models, the table below lists our current model recommendations for popular use cases. Note: the table only lists open source models that are free for commercial use. Minimal usage sketches for a generation model and an embedding model follow the table.

| Use case | Quality-optimized | Balanced | Speed-optimized |
| -------- | ----------------- | -------- | --------------- |
| Text generation following instructions | Mixtral-8x7B-Instruct-v0.1 <br> Llama-2-70b-chat-hf | mistral-7b <br> MPT-7B-Instruct <br> MPT-7B-8k-Instruct <br> Llama-2-7b-chat-hf <br> Llama-2-13b-chat-hf | phi-2 |
| Text embeddings (English only) | e5-mistral-7b-instruct (7B) | bge-large-en-v1.5 (0.3B) <br> e5-large-v2 (0.3B) | bge-base-en-v1.5 (0.1B) <br> e5-base-v2 (0.1B) |
| Transcription (speech to text) | | whisper-large-v2 (1.6B) | whisper-medium (0.8B) |
| Image generation | | stable-diffusion-xl | |
| Code generation | CodeLlama-70b-hf <br> CodeLlama-70b-Instruct-hf <br> CodeLlama-70b-Python-hf (Python optimized) <br> CodeLlama-34b-hf <br> CodeLlama-34b-Instruct-hf <br> CodeLlama-34b-Python-hf (Python optimized) | CodeLlama-13b-hf <br> CodeLlama-13b-Instruct-hf <br> CodeLlama-13b-Python-hf (Python optimized) | CodeLlama-7b-hf <br> CodeLlama-7b-Instruct-hf <br> CodeLlama-7b-Python-hf (Python optimized) |
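The notebooks under `llm-models/` walk through loading, prompting, and serving these models on Databricks. As a rough illustration of the pattern (not a copy of any specific notebook), the sketch below loads one of the "Balanced" instruction-following models with Hugging Face `transformers`; the model ID, dtype, and generation settings are illustrative assumptions.

```python
# Minimal sketch: text generation with a "Balanced" instruction-following model.
# Assumptions: a GPU cluster, access to the model weights on the Hugging Face Hub,
# and illustrative generation settings (not taken from this repo's notebooks).
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "[INST] Summarize what Apache Spark is in one sentence. [/INST]"
output = generator(prompt, max_new_tokens=128, do_sample=False)
print(output[0]["generated_text"])
```

The embedding models in the table follow a similar pattern. A minimal sketch, assuming the `sentence-transformers` package (the repo's embedding notebooks may use a different API), for one of the "Balanced" embedding models:

```python
# Minimal sketch: sentence embeddings with bge-large-en-v1.5 (assumed setup).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5")
docs = [
    "Databricks integrates with MLflow for model tracking.",
    "Apache Spark is a distributed data processing engine.",
]
embeddings = model.encode(docs, normalize_embeddings=True)
print(embeddings.shape)  # (2, 1024) -- bge-large-en-v1.5 produces 1024-dim vectors
```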

## Model Evaluation Leaderboard

### Text generation models

The model evaluation results presented below are measured by the Mosaic Eval Gauntlet framework. This framework comprises a series of tasks specifically designed to assess the performance of language models, including widely adopted benchmarks such as MMLU, BIG-bench, HellaSwag, and more.

| Model Name | Core Average | World Knowledge | Commonsense Reasoning | Language Understanding | Symbolic Problem Solving | Reading Comprehension |
| --- | --- | --- | --- | --- | --- | --- |
| Mistral-7B-v0.1 | 0.522 | 0.558 | 0.513 | 0.555 | 0.342 | 0.641 |
| falcon-40b | 0.501 | 0.556 | 0.55 | 0.535 | 0.269 | 0.597 |
| falcon-40b-instruct | 0.5 | 0.542 | 0.571 | 0.544 | 0.264 | 0.58 |
| Llama-2-13b-hf | 0.479 | 0.515 | 0.482 | 0.52 | 0.279 | 0.597 |
| Llama-2-13b-chat-hf | 0.476 | 0.522 | 0.512 | 0.514 | 0.271 | 0.559 |
| Mistral-7B-Instruct-v0.1 | 0.469 | 0.48 | 0.502 | 0.492 | 0.266 | 0.604 |
| mpt-30b-instruct | 0.465 | 0.48 | 0.513 | 0.494 | 0.238 | 0.599 |
| mpt-30b | 0.431 | 0.494 | 0.47 | 0.477 | 0.234 | 0.481 |
| Llama-2-7b-chat-hf | 0.42 | 0.476 | 0.447 | 0.478 | 0.221 | 0.478 |
| Llama-2-7b-hf | 0.401 | 0.457 | 0.41 | 0.454 | 0.217 | 0.465 |
| mpt-7b-8k-instruct | 0.36 | 0.363 | 0.41 | 0.405 | 0.165 | 0.458 |
| mpt-7b-instruct | 0.354 | 0.399 | 0.415 | 0.372 | 0.171 | 0.415 |
| mpt-7b-8k | 0.354 | 0.427 | 0.368 | 0.426 | 0.171 | 0.378 |
| falcon-7b | 0.335 | 0.371 | 0.421 | 0.37 | 0.159 | 0.355 |
| mpt-7b | 0.324 | 0.356 | 0.384 | 0.38 | 0.163 | 0.336 |
| falcon-7b-instruct | 0.307 | 0.34 | 0.372 | 0.333 | 0.108 | 0.38 |
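The Core Average column appears to be the unweighted mean of the five category scores; a quick check against the Mistral-7B-v0.1 row, assuming that definition:

```python
# Sanity check (assumption: Core Average = unweighted mean of the five categories).
scores = {
    "World Knowledge": 0.558,
    "Commonsense Reasoning": 0.513,
    "Language Understanding": 0.555,
    "Symbolic Problem Solving": 0.342,
    "Reading Comprehension": 0.641,
}  # Mistral-7B-v0.1 row from the table above

core_average = sum(scores.values()) / len(scores)
print(round(core_average, 3))  # 0.522, matching the reported Core Average
```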

Other examples: