DETECT: Task: 'off' flag corruption for pid #329
@solardiz
Unfortunately I can't find anything for:
What is interesting is that it does not always fail for
|
Can you please try uncommenting |
Here is the one with debug enabled, this time with different container:
dmesg:
|
Thanks @80kk, could you also set log_level to 4 under |
How can I do this? The only log_level occurrence I found in this file is in:
|
@80kk You don't need to patch anything to adjust |
Here is the call trace with
and Docker container log:
|
I don't know if it matters, but as you have probably already noticed, this is a VM running on Proxmox. The underlying hardware is a Dell PowerEdge R320. |
Sorry for the late reply. I tried to reproduce your issue under VMware:
but under the kernel and everything works fine. Is there anything specific needed to reproduce it? |
Well, as I wrote in my first post:
24.04 had not been released at that time, and the hypervisor is Proxmox, but I don't think that is a factor. I will probably upgrade to 24.04 this weekend and update the ticket. |
Unfortunately, there is no official way to upgrade to 24.04 until 24.04.1 is released. If you think this issue is resolved in the 6.8 kernel, feel free to close this ticket. |
Even if the issue is resolved or otherwise avoided in 6.8, we may still care to fix it for older kernels. LKRG supports a wide range of kernel versions. |
I spent some time running the same test on:
Under the kernel
In the kernel logs I can see that LKRG runs fine:
No other logs related to LKRG. However, I have a question @80kk , do you see in the logs something similar to those messages?
|
So LKRG appears to be killing some GNOME and other apps:
|
The task name discrepancy here is interesting. In our code, it's A mismatch in triggering of I wonder if the below little hack would make a difference with respect to this issue:
+++ b/src/modules/exploit_detection/syscalls/exec/p_security_bprm_committed_creds/p_security_bprm_committed_creds.c
@@ -31,8 +31,8 @@ char p_security_bprm_committed_creds_kretprobe_state = 0;
static struct kretprobe p_security_bprm_committed_creds_kretprobe = {
.kp.symbol_name = "security_bprm_committed_creds",
- .handler = p_security_bprm_committed_creds_ret,
- .entry_handler = NULL,
+ .handler = NULL,
+ .entry_handler = p_security_bprm_committed_creds_ret,
.data_size = sizeof(struct p_security_bprm_committed_creds_data),
};
@Strykar and others in here who are able to reproduce the issue, I'd appreciate you trying the above. Thank you! |
That's probably not exactly it - wouldn't explain some other stuff also seen in @Strykar's logs - but I'd appreciate testing anyhow. @Strykar @80kk What CPUs did you see this issue on? |
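For readers unfamiliar with kretprobes: the hack above moves the same callback from `.handler` (run when the probed function returns) to `.entry_handler` (run when it is entered). The following is only a userspace toy model of that ordering, not kernel code; `toy_retprobe`, `probed_function`, `callback`, and `trace_of` are hypothetical names invented for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical trace buffer recording the order of events. */
struct trace {
    char buf[8];
    size_t n;
};

static void trace_add(struct trace *t, char c) {
    if (t->n + 1 < sizeof(t->buf)) {
        t->buf[t->n++] = c;
        t->buf[t->n] = '\0';
    }
}

typedef void (*probe_cb)(struct trace *t);

/* Toy stand-in for the two callback slots of a kretprobe. */
struct toy_retprobe {
    probe_cb entry_handler; /* runs before the probed function body */
    probe_cb handler;       /* runs after the probed function returns */
};

/* 'F' marks the body of the probed function. */
static void probed_function(struct trace *t) { trace_add(t, 'F'); }

/* 'C' marks the callback, wherever it is hooked. */
static void callback(struct trace *t) { trace_add(t, 'C'); }

/* Run the probed function with the probe attached and copy the trace out. */
static void trace_of(const struct toy_retprobe *p, char out[8]) {
    struct trace t = { .buf = "", .n = 0 };
    if (p->entry_handler)
        p->entry_handler(&t);
    probed_function(&t);
    if (p->handler)
        p->handler(&t);
    memcpy(out, t.buf, sizeof(t.buf));
}
```

With `.handler = callback` the trace is `FC` (callback after the function); with `.entry_handler = callback` it is `CF`. In the toy model that ordering is all the swap changes. The model does not show whether the ordering matters for this issue.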
Please try the below patch/hack to see if it makes a difference:
+++ b/src/modules/exploit_detection/p_exploit_detection.c
@@ -985,6 +985,7 @@ static inline void p_validate_off_flag(struct p_ed_process *p_source, long p_val
#if P_OVL_OVERRIDE_SYNC_MODE
notrace int p_verify_ovl_override_sync(struct p_ed_process *p_source) {
+ smp_rmb();
register unsigned long p_off = p_source->p_ed_task.p_off ^ p_global_off_cookie; // Decode
p_validate_off_flag(p_source,p_off,NULL); // Validate
@@ -998,18 +999,20 @@ notrace int p_verify_ovl_override_sync(struct p_ed_process *p_source) {
notrace void p_ed_is_off_off_wrap(struct p_ed_process *p_source) {
+ smp_rmb();
register unsigned long p_off = p_source->p_ed_task.p_off ^ p_global_off_cookie; // Decode
p_ed_is_off_off(p_source,p_off,NULL);
}
notrace void p_ed_validate_off_flag_wrap(struct p_ed_process *p_source) {
-
+ smp_rmb();
register unsigned long p_off = p_source->p_ed_task.p_off ^ p_global_off_cookie; // Decode
p_validate_off_flag(p_source,p_off,NULL); // Validate
}
notrace void p_set_ed_process_on(struct p_ed_process *p_source) {
+ smp_rmb();
register unsigned long p_off = p_source->p_ed_task.p_off ^ p_global_off_cookie; // Decode
#if defined(CONFIG_SECCOMP)
@@ -1029,6 +1032,7 @@ notrace void p_set_ed_process_on(struct p_ed_process *p_source) {
notrace void p_set_ed_process_off(struct p_ed_process *p_source) {
+ smp_rmb();
register unsigned long p_off = p_source->p_ed_task.p_off ^ p_global_off_cookie; // Decode
#if defined(CONFIG_SECCOMP)
@@ -1047,6 +1051,7 @@ notrace void p_set_ed_process_off(struct p_ed_process *p_source) {
notrace void p_set_ed_process_override_on(struct p_ed_process *p_source) {
+ smp_rmb();
register unsigned long p_off = p_source->p_ed_task.p_off ^ p_global_off_cookie; // Decode
p_validate_off_flag(p_source,p_off,NULL); // Validate
@@ -1059,6 +1064,7 @@ notrace void p_set_ed_process_override_on(struct p_ed_process *p_source) {
notrace void p_set_ed_process_override_off(struct p_ed_process *p_source) {
+ smp_rmb();
register unsigned long p_off = p_source->p_ed_task.p_off ^ p_global_off_cookie; // Decode
p_validate_off_flag(p_source,p_off,NULL); // Validate
@@ -1071,7 +1077,7 @@ notrace void p_reset_ed_flags(struct p_ed_process *p_source) {
p_source->p_ed_task.p_off = p_global_cnt_cookie ^ p_global_off_cookie;
p_source->p_ed_task.p_off_count = 0;
-
+ smp_wmb();
}
int p_dump_task_f(void *p_arg) {
@@ -1265,6 +1271,8 @@ static int p_cmp_creds(struct p_cred *p_orig, const struct cred *p_current_cred,
static int p_cmp_tasks(struct p_ed_process *p_orig, struct task_struct *p_current, char p_kill) {
+ smp_rmb();
+
const char p_opt = 1; /* for uses of the P_CMP_PTR() macro */
int p_ret = 0, p_killed = 0;
register long p_off = p_orig->p_ed_task.p_off ^ p_global_off_cookie;
This may produce "warning: ISO C90 forbids mixed declarations and code" - we'll address this properly if merging these changes for real. I think the write barrier here should be unneeded because calls to |
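As background for the "Decode" lines in the patch: the per-task `p_off` field is stored XORed with a global cookie, so every reader decodes it before validating. Below is a minimal userspace sketch of that pattern. The cookie values, the validity rule (decoded value must be one or two times the count cookie), and the helper names `decode_off`, `mark_off`, and `off_flag_valid` are assumptions made for illustration, not LKRG's actual logic.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for p_global_off_cookie and p_global_cnt_cookie.
 * Fixed constants here; assumed to be random per-boot values in practice. */
static const unsigned long off_cookie = 0x5a5a1234UL;
static const unsigned long cnt_cookie = 0x00010000UL;

struct ed_task {
    unsigned long p_off;      /* stored encoded: value ^ off_cookie */
    unsigned int p_off_count;
};

/* Same shape as the reset in the patch: store the initial value, encoded. */
static void reset_flags(struct ed_task *t) {
    t->p_off = cnt_cookie ^ off_cookie;
    t->p_off_count = 0;
}

/* Decode step, as in "p_off ^ p_global_off_cookie" above. */
static unsigned long decode_off(const struct ed_task *t) {
    return t->p_off ^ off_cookie;
}

/* Assumed rule for this sketch only: the decoded value is valid
 * if it equals cnt_cookie or 2 * cnt_cookie. */
static bool off_flag_valid(const struct ed_task *t) {
    unsigned long v = decode_off(t);
    return v == cnt_cookie || v == 2 * cnt_cookie;
}

/* Hypothetical state change: decode, step by cnt_cookie, re-encode. */
static void mark_off(struct ed_task *t) {
    unsigned long v = decode_off(t) + cnt_cookie;
    t->p_off = v ^ off_cookie;
    t->p_off_count++;
}
```

In this model, a single flipped bit in the stored word, or a decode that runs against a stale or partially updated word, yields a value that fails validation. That is the kind of condition a "flag corruption" report describes.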
Refreshing my memory on x86 memory ordering (mostly guaranteed as-is) and what the |
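To make the barrier pairing concrete, here is a userspace analog using C11 fences and pthreads. It is a sketch under assumptions, not kernel code: `atomic_thread_fence(memory_order_release)` stands in for `smp_wmb()` and `atomic_thread_fence(memory_order_acquire)` for `smp_rmb()` (the C11 fences are somewhat stronger than the kernel primitives), and `struct shared`, `writer`, `reader`, and `run_once` are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>

struct shared {
    unsigned long payload; /* plain data published by the writer */
    atomic_int ready;      /* flag the reader polls */
};

static void *writer(void *arg) {
    struct shared *s = arg;
    s->payload = 42;                           /* 1. write the data */
    atomic_thread_fence(memory_order_release); /* 2. analog of smp_wmb() */
    atomic_store_explicit(&s->ready, 1, memory_order_relaxed); /* 3. publish */
    return NULL;
}

static unsigned long reader(struct shared *s) {
    /* Spin until the writer's flag store becomes visible. */
    while (!atomic_load_explicit(&s->ready, memory_order_relaxed))
        ;
    atomic_thread_fence(memory_order_acquire); /* analog of smp_rmb() */
    return s->payload; /* ordered after the flag read by the fence */
}

/* Start one writer thread, read in the calling thread, return what was seen.
 * Returns 0 if the thread could not be created. */
static unsigned long run_once(void) {
    struct shared s = { .payload = 0 };
    pthread_t th;
    unsigned long seen;

    atomic_init(&s.ready, 0);
    if (pthread_create(&th, NULL, writer, &s) != 0)
        return 0;
    seen = reader(&s);
    pthread_join(th, NULL);
    return seen;
}
```

The point of the pairing is that a read barrier only helps when the writer orders its stores too. Once the reader has observed `ready == 1`, the two fences together guarantee it also observes the payload written before the flag.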
It wasn't tested much across systems, but an average solution for me was
Still waiting to hear from @80kk on the CPU.
@m1lua Can you show the corresponding patch, please? I doubt this is exactly what changes we'll want to make, but it could give us a hint as to what the actual problem may be. |
|
I just started with LKRG by building it for Ubuntu 22.04 with the 5.15.0-101-generic kernel. So far it seems to be working fine; however, I am getting this every day:
The host is running Docker containers for Mailcow. None of the containers were restarted or killed, so it looks more like it prevents a new container from starting?