-
Notifications
You must be signed in to change notification settings - Fork 428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UCT/GDRCOPY: rcache lookup overhead config variable #10311
base: master
Are you sure you want to change the base?
UCT/GDRCOPY: rcache lookup overhead config variable #10311
Conversation
@@ -46,6 +46,10 @@ static ucs_config_field_t uct_gdr_copy_iface_config_table[] = { | |||
ucs_offsetof(uct_gdr_copy_iface_config_t, put_latency)}, | |||
{NULL})}, | |||
|
|||
{"RCACHE_OVERHEAD", "250ns", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be in uct_gdr_copy_md_config_table
right after UCX_GDR_COPY_RCACHE
.
And, perhaps md->reg_cost
should be initialized using the overhead, if rcache is enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with you. We discussed that with @brminich and came to a conclusion that it should be different md_attr
field similar to reg_cost
but lookup_cost
. Also it seems like ucp_proto_init_add_memreg_time
doesn't differentiate cacheable and non-cacheable MDs and adds UCP rcache overhead even for GDRcopy which is wrong. Also we need to somehow consider that overhead separately for memtype_ep
operation if they used to do D2H and H2D copies which extends the work that need to be done significantly.
# Real latency is around 30ns, rest is gdrcopy rcache overhead | ||
# TODO: Add gdrcopy rcache overhead as separate performance graph node | ||
UCX_GDR_COPY_LAT=200ns | ||
UCX_GDR_COPY_BW=0MBs,get_dedicated:30GBs,put_dedicated:30GBsn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo
UCX_GDR_COPY_BW=0MBs,get_dedicated:30GBs,put_dedicated:30GBsn | |
UCX_GDR_COPY_BW=0MBs,get_dedicated:30GBs,put_dedicated:30GBs |
@@ -20,6 +20,7 @@ typedef struct uct_gdr_copy_iface { | |||
uct_ppn_bandwidth_t put_bw; | |||
ucs_linear_func_t get_latency; | |||
ucs_linear_func_t put_latency; | |||
double rcache_ovh; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is no need to shorten the word.
What?
Create config variable for gdrcopy lookup overhead
Why?
Now this overhead is mixed together with latency variable