Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCT/GDRCOPY: rcache lookup overhead config variable #10311

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ivankochin
Copy link
Contributor

What?

Create config variable for gdrcopy lookup overhead

Why?

Now this overhead is mixed together with latency variable

@ivankochin ivankochin self-assigned this Nov 20, 2024
@@ -46,6 +46,10 @@ static ucs_config_field_t uct_gdr_copy_iface_config_table[] = {
ucs_offsetof(uct_gdr_copy_iface_config_t, put_latency)},
{NULL})},

{"RCACHE_OVERHEAD", "250ns",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be in uct_gdr_copy_md_config_table right after UCX_GDR_COPY_RCACHE.
And, perhaps md->reg_cost should be initialized using the overhead, if rcache is enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with you. We discussed that with @brminich and came to a conclusion that it should be different md_attr field similar to reg_cost but lookup_cost. Also it seems like ucp_proto_init_add_memreg_time doesn't differentiate cacheable and non-cacheable MDs and adds UCP rcache overhead even for GDRcopy which is wrong. Also we need to somehow consider that overhead separately for memtype_ep operation if they used to do D2H and H2D copies which extends the work that need to be done significantly.

# Real latency is around 30ns, rest is gdrcopy rcache overhead
# TODO: Add gdrcopy rcache overhead as separate performance graph node
UCX_GDR_COPY_LAT=200ns
UCX_GDR_COPY_BW=0MBs,get_dedicated:30GBs,put_dedicated:30GBsn
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo

Suggested change
UCX_GDR_COPY_BW=0MBs,get_dedicated:30GBs,put_dedicated:30GBsn
UCX_GDR_COPY_BW=0MBs,get_dedicated:30GBs,put_dedicated:30GBs

@@ -20,6 +20,7 @@ typedef struct uct_gdr_copy_iface {
uct_ppn_bandwidth_t put_bw;
ucs_linear_func_t get_latency;
ucs_linear_func_t put_latency;
double rcache_ovh;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is no need to shorten the word.

@ivankochin ivankochin marked this pull request as draft November 22, 2024 08:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants