remove refined recompute deep copy #9617
base: develop
Conversation
Thanks for your contribution!
@@ -484,13 +484,13 @@ def forward(self, hidden_states):

 class QWenBlock(nn.Layer):
-    def __init__(self, config):
+    def __init__(self, config, layer_idx: int = 0):
The external callers of QWenBlock are missing the layer_idx argument; please check.
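A minimal plain-Python sketch (hypothetical names, no Paddle dependency) of the pattern this comment asks for: the layer loop must enumerate its blocks so that each one receives its own layer_idx.

```python
# Hypothetical sketch: every decoder block receives its position so it can
# later make per-layer refined-recompute decisions. Names are illustrative,
# not PaddleNLP's actual classes.
class QWenBlock:
    def __init__(self, config, layer_idx: int = 0):
        self.config = config
        self.layer_idx = layer_idx

def build_layers(config, num_layers):
    # enumerate the layers so each block knows which one it is
    return [QWenBlock(config, layer_idx=i) for i in range(num_layers)]

layers = build_layers({}, 4)
print([b.layer_idx for b in layers])  # [0, 1, 2, 3]
```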
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##           develop    #9617      +/-   ##
===========================================
- Coverage    53.18%   52.79%   -0.40%
===========================================
  Files          718      718
  Lines       113340   112256    -1084
===========================================
- Hits         60282    59267    -1015
+ Misses       53058    52989      -69

☔ View full report in Codecov by Sentry.
    "",
    "refined_recompute, Choose from ['mlp_row_ln', 'mlp_column_ln', 'attention_row_ln', 'attention_column_ln', 'flash_attn']",
),
("skip_recompute_ops", Optional[Dict[str, int]], None, "skip_recompute_ops"),
skip_recompute_ops is no longer here; where is it added now?
@@ -138,9 +138,9 @@ def get_triangle_upper_mask(x, mask=None):

 class QWenAttention(nn.Layer):
-    def __init__(self, config):
+    def __init__(self, config, layer_idx: int = 0):
Is adding layer_idx here strictly necessary?
Yes, it must be added; otherwise there is no way to tell whether refined recompute (rr) should be enabled for a given layer.
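To make the reply concrete, here is a hypothetical sketch of the per-layer decision that needs layer_idx. It follows the `skip_num` semantics documented later in this PR (-1 or a value above num_layers means all layers; an omitted op defaults to 0); the assumption that the *first* skip_num layers are the ones affected is mine, not stated in the PR.

```python
# Hypothetical helper (not PaddleNLP's actual code): decide whether recompute
# is skipped for `op` in the layer at `layer_idx`, given a refined-recompute
# spec such as {"flash_attn": -1, "mlp_row_ln": 2}.
def skip_recompute_for(op, layer_idx, spec, num_layers):
    skip_num = spec.get(op, 0)            # omitted ops default to 0 (never skip)
    if skip_num == -1 or skip_num > num_layers:
        return True                       # -1 (or oversized) means all layers
    return layer_idx < skip_num           # assumption: first skip_num layers

spec = {"flash_attn": -1, "mlp_row_ln": 2}
print(skip_recompute_for("flash_attn", 5, spec, 8))  # True
print(skip_recompute_for("mlp_row_ln", 3, spec, 8))  # False
```

Without layer_idx the third argument above is unavailable, which is exactly why the constructor change is required.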
self.refined_recompute = kwargs.pop("refined_recompute", {})
self.skip_recompute_ops = kwargs.pop("skip_recompute_ops", {})
self.register_unsavable_keys(["refined_recompute", "skip_recompute_ops"])
They are placed here now, as attributes on the config base class; both default to empty dicts.
Please add them to the unsavable config keys; otherwise they may affect config loading in downstream tasks such as inference.
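A minimal sketch of the behavior this review asks for, assuming (this is not PaddleNLP's actual implementation) that register_unsavable_keys marks attributes to be kept in memory but excluded when the config is serialized:

```python
# Hypothetical illustration of unsavable config keys: training-only options
# stay usable at runtime but never reach the saved config, so downstream
# inference loads are unaffected.
class Config:
    def __init__(self, **kwargs):
        self._unsavable_keys = set()
        self.refined_recompute = kwargs.pop("refined_recompute", {})
        self.skip_recompute_ops = kwargs.pop("skip_recompute_ops", {})
        self.register_unsavable_keys(["refined_recompute", "skip_recompute_ops"])

    def register_unsavable_keys(self, keys):
        self._unsavable_keys.update(keys)

    def to_dict(self):
        # drop private state and every key registered as unsavable
        return {k: v for k, v in self.__dict__.items()
                if not k.startswith("_") and k not in self._unsavable_keys}

cfg = Config(refined_recompute={"flash_attn": -1})
print(cfg.to_dict())  # {} -- training-only keys are not serialized
```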
    "You can also set `skip_num` to a value within the range [1, ..., num_layers]. If `skip_num` exceeds `num_layers`, it will behave as if set to `-1`.\n"
    "If a parameter is omitted, it defaults to `xxx:0`."
},
)
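A hypothetical parser for the option string the help text above describes. The `op:skip_num` format with comma separators is an assumption inferred from the `xxx:0` default mentioned in the help text, and the op names are taken from the list quoted earlier in this PR.

```python
# Hypothetical sketch: parse a refined_recompute spec string such as
# "mlp_row_ln:-1,flash_attn:2" into {op: skip_num}. Format is assumed,
# not confirmed by the PR.
VALID_OPS = {"mlp_row_ln", "mlp_column_ln", "attention_row_ln",
             "attention_column_ln", "flash_attn"}

def parse_refined_recompute(spec: str) -> dict:
    result = {}
    for item in filter(None, (s.strip() for s in spec.split(","))):
        op, _, num = item.partition(":")
        if op not in VALID_OPS:
            raise ValueError(f"unknown refined_recompute op: {op!r}")
        result[op] = int(num) if num else 0  # omitted skip_num defaults to 0
    return result

print(parse_refined_recompute("mlp_row_ln:-1,flash_attn:2"))
# {'mlp_row_ln': -1, 'flash_attn': 2}
```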
Could this part be moved into the trainer or some other central place? Otherwise it is easy to forget to write it.
self.refined_recompute = kwargs.pop("refined_recompute", {})
self.skip_recompute_ops = kwargs.pop("skip_recompute_ops", {})
self.register_unsavable_keys(["refined_recompute", "skip_recompute_ops"])
Please add them to the unsavable config keys; otherwise they may affect config loading in downstream tasks such as inference.
PR types
Function optimization
PR changes
APIs
Description
Remove the deep copy of the config performed by refined recompute.
Add the corresponding documentation.