
Can llama.cpp support batched prompts? #10299

Closed · Answered by ExtReMLapin
chosen-ox asked this question in Q&A

Yes. The easiest way to use it is with llama-server, where you create multiple slots.

Edit:

Long story short, you set the number of parallel requests (and parallel HTTP threads) to the batch size.

total context size = batch_size * individual context (4096, 8192, etc.)
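
For concreteness, here is a minimal sketch (not from the thread) of putting this to use: it assumes a server launched with 4 slots and a total context of 16384 (so 4096 per slot), and then fans prompts out to it concurrently. The model path, sizes, and prompts are placeholders.

```python
# Assumed launch (flag names from llama-server --help; model path and sizes are placeholders):
#
#   llama-server -m model.gguf -c 16384 -np 4 --threads-http 4
#
# i.e. 4 parallel slots x 4096 tokens each = 16384 total context.

import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

SERVER = "http://localhost:8080"  # llama-server's default address


def complete(prompt: str) -> str:
    """Send one prompt to the /completion endpoint and return the generated text."""
    payload = json.dumps({"prompt": prompt, "n_predict": 64}).encode("utf-8")
    req = urllib.request.Request(
        f"{SERVER}/completion",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]


prompts = [
    "Write a haiku about batching.",
    "Explain the KV cache in one sentence.",
    "Name three uses of quantization.",
    "What is a context window?",
]

# Fire the prompts concurrently; the server assigns them to free slots
# and decodes them together in the same batch.
with ThreadPoolExecutor(max_workers=4) as pool:
    for prompt, answer in zip(prompts, pool.map(complete, prompts)):
        print(f"{prompt!r} -> {answer!r}")
```

The client side needs nothing special: as long as requests arrive concurrently (threads here, but any async HTTP client works), the server batches them across its slots.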

Answer selected by chosen-ox