Corrected performance data when batch size is greater than 1 #1100

wgzintel · 2024-10-29T12:50:12Z

openvino.genai/src/cpp/src/perf_metrics.cpp

Line 115 in fa324cf

raw_metrics.m_durations[i] /= batch_sizes[i];
@m_durations[i] should be the duration for one inference which may generate one or more (batch_size > 1) tokens. In the case batch_size>1, it means @batch_size tokens are generated together in @m_durations[i] time, not one token is generated in @m_durations[i]/@batch_size time

as-suvorov · 2024-10-29T17:38:08Z

m_durations is used to calculate ttop (Time (in ms) per output token (TPOT)) below: https://github.com/openvinotoolkit/openvino.genai/pull/1100/files#diff-99518905570503f302fd3af3fdede4154abef49e74385eb130d958bce9b22756R121

A note "// If in 10 ms a batch of 5 new tokens is generated then TPOT is 10 / 5 = 2 tok/ms." seems valid to me (except it should be 2 ms / tok).
Should ttop calculation be modified as well?
cc: @pavel-esir

peterchen-intel · 2024-10-31T01:25:55Z

@pavel-esir Looks like we can't simply remove this line, can you create a PR? The request is to give the duration time for a batch.

Related PRs #940, #1100

pavel-esir · 2024-10-31T08:34:29Z

@pavel-esir Looks like we can't simply remove this line, can you create a PR? The request is to give the duration time for a batch.

Related PRs #940, #1100

thanks for opening this discussion, i will take a look

pavel-esir · 2024-10-31T08:47:40Z

@pavel-esir Looks like we can't simply remove this line, can you create a PR? The request is to give the duration time for a batch.

Related PRs #940, #1100

@peterchen-intel if you need time points/duration of each batch you can get them from raw_metrics.m_new_token_times

eg:

result = pipe.generate(["The Sun is yellow because"], max_new_tokens=20)
new_token_times = result.perf_metrics.raw_metrics.m_new_token_times

All fields of openvino_genai.RawPerfMetrics can be found here
https://github.com/openvinotoolkit/openvino.genai/blob/master/src/cpp/include/openvino/genai/perf_metrics.hpp#L18-L44

Do we still a new PR?

pavel-esir · 2024-10-31T14:00:02Z

@peterchen-intel i have added example of getting times of generation of each token/batch of tokens from raw performance metrics here #1118

peterchen-intel · 2024-11-05T03:40:54Z

@pavel-esir Can we expose m_durations[i] with the time for the token(s) from one inference (generates batch_sizes[i] tokens)? Current m_durations[i] is not so convinced, since it can't generate one token in m_durations[i] when batch_sizes[i] >1, but generates batch_sizes[i] tokens with one inference in m_durations[i].

Corrected performance data when batch size is greater than 1

ca87aaa

github-actions bot added the category: sampling Sampling / Decoding algorithms label Oct 29, 2024

wgzintel requested review from peterchen-intel and eaidova October 29, 2024 12:52

Merge branch 'master' into guozhong/correct_perf_data_for_genai

616d828

peterchen-intel added the do_not_merge label Oct 30, 2024

peterchen-intel marked this pull request as draft October 31, 2024 01:26

peterchen-intel mentioned this pull request Oct 31, 2024

fix the caculation of performance metric #940

Open

ilya-lavrenov assigned pavel-esir and Wovchena and unassigned pavel-esir Oct 31, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Corrected performance data when batch size is greater than 1 #1100

Corrected performance data when batch size is greater than 1 #1100

wgzintel commented Oct 29, 2024 •

edited

Loading

as-suvorov commented Oct 29, 2024

peterchen-intel commented Oct 31, 2024 •

edited

Loading

pavel-esir commented Oct 31, 2024

pavel-esir commented Oct 31, 2024 •

edited

Loading

pavel-esir commented Oct 31, 2024 •

edited

Loading

peterchen-intel commented Nov 5, 2024

Corrected performance data when batch size is greater than 1 #1100

Are you sure you want to change the base?

Corrected performance data when batch size is greater than 1 #1100

Conversation

wgzintel commented Oct 29, 2024 • edited Loading

as-suvorov commented Oct 29, 2024

peterchen-intel commented Oct 31, 2024 • edited Loading

pavel-esir commented Oct 31, 2024

pavel-esir commented Oct 31, 2024 • edited Loading

pavel-esir commented Oct 31, 2024 • edited Loading

peterchen-intel commented Nov 5, 2024

wgzintel commented Oct 29, 2024 •

edited

Loading

peterchen-intel commented Oct 31, 2024 •

edited

Loading

pavel-esir commented Oct 31, 2024 •

edited

Loading

pavel-esir commented Oct 31, 2024 •

edited

Loading