Corrected performance data when batch size is greater than 1 #1100

Draft · wants to merge 2 commits into base: master

Conversation

@wgzintel (Contributor) commented Oct 29, 2024

openvino.genai/src/cpp/src/perf_metrics.cpp, line 115 in fa324cf:

    raw_metrics.m_durations[i] /= batch_sizes[i];

m_durations[i] should be the duration of one inference, which may generate one or more tokens (batch_size > 1). When batch_size > 1, it means batch_size tokens are generated together in m_durations[i] time; it does not mean that one token is generated in m_durations[i] / batch_size time.
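A hypothetical numeric sketch (not the library's code) of the distinction this issue draws: one inference step taking 10 ms that produces a batch of 5 tokens.

```python
# Assumed illustrative values: one inference step takes 10 ms and
# produces a batch of 5 tokens together.
duration_ms = 10.0
batch_size = 5

# Current behavior: the duration is divided by the batch size,
# implying each token took 2 ms on its own.
per_token_ms = duration_ms / batch_size  # 2.0 ms, which this issue calls misleading

# What the issue argues m_durations[i] should mean: the 5 tokens were
# generated *together* in 10 ms, so the per-inference duration is 10 ms.
per_inference_ms = duration_ms  # 10.0 ms for the whole batch

print(per_token_ms, per_inference_ms)
```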

@github-actions github-actions bot added the category: sampling Sampling / Decoding algorithms label Oct 29, 2024
@as-suvorov (Contributor) commented:

m_durations is used to calculate TPOT (time per output token, in ms) below: https://github.com/openvinotoolkit/openvino.genai/pull/1100/files#diff-99518905570503f302fd3af3fdede4154abef49e74385eb130d958bce9b22756R121

The note "// If in 10 ms a batch of 5 new tokens is generated then TPOT is 10 / 5 = 2 tok/ms." seems valid to me (except the unit should be 2 ms / tok, not 2 tok / ms).
Should the TPOT calculation be modified as well?
cc: @pavel-esir
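A hedged sketch (with assumed values) of the TPOT question raised above: if durations are kept per inference rather than pre-divided, TPOT can still be computed as total time over total tokens.

```python
# Assumed example data: one entry per inference step.
durations_ms = [10.0, 12.0, 8.0]  # duration of each inference step, in ms
batch_sizes = [5, 5, 5]           # tokens produced by each step

# TPOT as total generation time divided by total tokens generated,
# yielding ms per output token without dividing individual durations.
tpot_ms_per_token = sum(durations_ms) / sum(batch_sizes)
print(tpot_ms_per_token)  # 30.0 ms / 15 tokens = 2.0 ms per token
```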

@peterchen-intel (Collaborator) commented Oct 31, 2024

@pavel-esir Looks like we can't simply remove this line, can you create a PR? The request is to give the duration time for a batch.

Related PRs #940, #1100

@pavel-esir (Contributor) commented:

> @pavel-esir Looks like we can't simply remove this line, can you create a PR? The request is to give the duration time for a batch.
>
> Related PRs #940, #1100

Thanks for opening this discussion, I will take a look.

@pavel-esir (Contributor) commented Oct 31, 2024

> @pavel-esir Looks like we can't simply remove this line, can you create a PR? The request is to give the duration time for a batch.
>
> Related PRs #940, #1100

@peterchen-intel if you need time points/duration of each batch you can get them from raw_metrics.m_new_token_times

e.g.:

```python
result = pipe.generate(["The Sun is yellow because"], max_new_tokens=20)
new_token_times = result.perf_metrics.raw_metrics.m_new_token_times
```

All fields of openvino_genai.RawPerfMetrics can be found here
https://github.com/openvinotoolkit/openvino.genai/blob/master/src/cpp/include/openvino/genai/perf_metrics.hpp#L18-L44
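A hedged sketch of how per-step durations could be derived from such raw timestamps by differencing consecutive entries. The timestamp and start-time values here are assumed for illustration, not taken from a real run.

```python
# Assumed illustrative data: one timestamp (in ms) per generation step,
# plus an assumed request start time. In practice these would come from
# raw performance metrics such as m_new_token_times.
new_token_times = [105.0, 115.0, 127.0]  # assumed ms timestamps, one per step
start_time = 95.0                        # assumed request start, in ms

# The duration of each step is the gap between consecutive timestamps,
# with the first step measured from the start time.
prev = start_time
step_durations = []
for t in new_token_times:
    step_durations.append(t - prev)
    prev = t

print(step_durations)  # [10.0, 10.0, 12.0]
```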

Do we still need a new PR?

@pavel-esir (Contributor) commented Oct 31, 2024

@peterchen-intel i have added example of getting times of generation of each token/batch of tokens from raw performance metrics here #1118

@peterchen-intel (Collaborator) commented:

@pavel-esir Can we expose m_durations[i] as the time for the token(s) produced by one inference (which generates batch_sizes[i] tokens)? The current m_durations[i] is misleading: when batch_sizes[i] > 1, one token is not generated in m_durations[i] time; rather, batch_sizes[i] tokens are generated by a single inference in m_durations[i] time.

Labels
category: sampling Sampling / Decoding algorithms do_not_merge