
ERROR: Error: Initialization issues during scap_init #3323

Open
OneideLuizSchneider opened this issue Sep 12, 2024 · 33 comments

@OneideLuizSchneider

OneideLuizSchneider commented Sep 12, 2024

Describe the bug

Falco only started working after the pod had restarted 8 times. Before that, it failed with:
ERROR: Error: Initialization issues during scap_init

To reproduce: just install it (details below).

Expected behaviour
It should work without needing several restarts.

Screenshots
[Screenshot 2024-09-12 at 20:11:59]

Environment

  • Falco version:
    Falco version: 0.38.2 (x86_64)
  • System info:
    Linux version 5.10.223-212.873.amzn2.x86_64 (mockbuild@ip-10-0-60-177) (gcc10-gcc (GCC) 10.5.0 20230707 (Red Hat 10.5.0-1), GNU ld version 2.35.2-9.amzn2.0.1) Digwatch compiler #1 SMP Wed Aug 7 16:53:32 UTC 2024
  • Cloud provider or hardware configuration:
  • OS:
    Amazon Linux 2
  • Kernel:
    5.10
  • Installation method:

EKS 1.29

helm upgrade --install falco falcosecurity/falco \
    -f values.yml \
    --create-namespace \
    --namespace falco

values.yaml:

tty: true

driver:
  enabled: true
  kind: modern_ebpf

falco:

  rules_files:
    - /etc/falco/falco_rules.yaml
    - /etc/falco/falco-incubating_rules.yaml
    - /etc/falco/falco-sandbox_rules.yaml
    - /etc/falco/rules.d
  rules:
    - disable:
        tag: T1552.005
    - disable:
        tag: T1565

  json_output: true

extra:
  env:
    - name: FALCO_HOSTNAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName

falcoctl:

  artifact:
    install:
      enabled: true
    follow:
      enabled: true
  config:
    artifact:
      allowedTypes:
        - rulesfile
      install:
        resolveDeps: false
        refs: [falco-rules:latest, falco-incubating-rules:latest, falco-sandbox-rules:latest]
      follow:
        refs: [falco-rules:latest, falco-incubating-rules:latest, falco-sandbox-rules:latest]

falcosidekick:
  enabled: false

Additional context

I've seen many other people report this, but it's not clear why it happens or whether there is a fix.

@FedeDP
Contributor

FedeDP commented Sep 25, 2024

Hi! Thanks for reporting this issue. I don't have an answer; this seems really weird, since Falco uses the same driver (i.e. the modern eBPF one in this case) on every restart. Perhaps it is a timing issue with something else on the system?
cc @Andreagit97, who may have more ideas; I don't really know what to look for in this specific case.

/milestone 0.40.0

@poiana poiana added this to the 0.40.0 milestone Sep 25, 2024
@OneideLuizSchneider
Author

OneideLuizSchneider commented Oct 1, 2024

@FedeDP FYI, I removed the incubating and sandbox rules files and still had the same issue:

 - /etc/falco/falco-incubating_rules.yaml
 - /etc/falco/falco-sandbox_rules.yaml

@Andreagit97
Member

IMO we should enable more verbose logging; "Error: Initialization issues during scap_init" is too generic to understand what is going on.

@kirylbelavus

kirylbelavus commented Oct 17, 2024

I encountered the same issue in a similar environment, and switching from modern_ebpf to eBPF mode was the only thing that helped. I tried enabling debug logs, but they didn't provide any insight. It's also worth noting that in an EKS cluster with 4 nodes, only 1 node failed to start Falco in modern_ebpf mode, although the kernel version is the same on all nodes.

@roobre

roobre commented Oct 19, 2024

I'm seeing very similar behavior here:

2024-10-19T09:58:46.595592088Z Sat Oct 19 09:58:46 2024: The --cri option is deprecated and will be removed in Falco 0.40.0. Use -o container_engines.cri.sockets[]=<socket_path> instead.
2024-10-19T09:58:46.598439995Z Sat Oct 19 09:58:46 2024: Falco version: 0.39.1 (x86_64)
2024-10-19T09:58:46.598439995Z Sat Oct 19 09:58:46 2024: Falco initialized with configuration files:
2024-10-19T09:58:46.598451935Z Sat Oct 19 09:58:46 2024:    /etc/falco/config.d/engine-kind-falcoctl.yaml | schema validation: ok
2024-10-19T09:58:46.598451935Z Sat Oct 19 09:58:46 2024:    /etc/falco/falco.yaml | schema validation: ok
2024-10-19T09:58:46.598496263Z Sat Oct 19 09:58:46 2024: System info: Linux version 6.6.57-1-lts (linux-lts@archlinux) (gcc (GCC) 14.2.1 20240910, GNU ld (GNU Binutils) 2.43.0) #1 SMP PREEMPT_DYNAMIC Thu, 17 Oct 2024 13:57:25 +0000
2024-10-19T09:58:46.598824145Z Sat Oct 19 09:58:46 2024: Loading rules from:
2024-10-19T09:58:46.630720133Z Sat Oct 19 09:58:46 2024:    /etc/falco/falco_rules.yaml | schema validation: ok
2024-10-19T09:58:46.651177935Z Sat Oct 19 09:58:46 2024:    /etc/falco/rules.d/rules-override.yaml | schema validation: ok
2024-10-19T09:58:46.651177935Z Sat Oct 19 09:58:46 2024: /etc/falco/rules.d/rules-override.yaml: Ok, with warnings
2024-10-19T09:58:46.651177935Z 1 Warnings:
2024-10-19T09:58:46.651177935Z In rules content: (/etc/falco/falco_rules.yaml:0:0)
2024-10-19T09:58:46.651177935Z     list 'read_sensitive_file_images': (/etc/falco/falco_rules.yaml:382:2)
2024-10-19T09:58:46.651177935Z ------
2024-10-19T09:58:46.651177935Z - list: read_sensitive_file_images
2024-10-19T09:58:46.651177935Z   ^
2024-10-19T09:58:46.651177935Z ------
2024-10-19T09:58:46.651177935Z LOAD_UNUSED_LIST (Unused list): List not referred to by any other rule/macro
2024-10-19T09:58:46.651239866Z Sat Oct 19 09:58:46 2024: Hostname value has been overridden via environment variable to: moniserver
2024-10-19T09:58:46.651761569Z Sat Oct 19 09:58:46 2024: The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)
2024-10-19T09:58:46.651776556Z Sat Oct 19 09:58:46 2024: Starting health webserver with threadiness 4, listening on 0.0.0.0:8765
2024-10-19T09:58:46.652049599Z Sat Oct 19 09:58:46 2024: Loaded event sources: syscall
2024-10-19T09:58:46.652049599Z Sat Oct 19 09:58:46 2024: Enabled event sources: syscall
2024-10-19T09:58:46.652049599Z Sat Oct 19 09:58:46 2024: Opening 'syscall' source with modern BPF probe.
2024-10-19T09:58:46.652049599Z Sat Oct 19 09:58:46 2024: One ring buffer every '2' CPUs.
2024-10-19T09:58:47.613393945Z Sat Oct 19 09:58:47 2024: An error occurred in an event source, forcing termination...
2024-10-19T09:58:47.766747214Z Error: Initialization issues during scap_init
2024-10-19T09:58:47.767029775Z Events detected: 0
2024-10-19T09:58:47.767029775Z Rule counts by severity:
2024-10-19T09:58:47.767029775Z Triggered rules by rule name:
2024-10-19T09:58:53.682056483Z Stream closed EOF for falco/falco-nz8g7 (falco)

This is a very vanilla helm installation with the following values:

  customRules:
    rules-override.yaml: |-
      - macro: user_known_contact_k8s_api_server_activities
        condition: |-
          container.image.repository = registry.k8s.io/node-problem-detector/node-problem-detector
          or
          proc.name startswith node-problem-de
          or
          container.image.repository = ghcr.io/roobre/ktemplate
          or
          container.image.repository = ghcr.io/k8up-io/k8up
          or
          container.name startswith k8up
        override:
          condition: replace
      - macro: user_known_stand_streams_redirect_activities
        condition: |-
          container.image.repository = ghcr.io/fluxcd/kustomize-controller
          or
          (container.name startswith crocochrome and proc.name = chromium)
        override:
          condition: replace
      - macro: known_drop_and_execute_activities
        condition: |-
          (container.image.repository = ghcr.io/flaresolverr/flaresolverr and proc.name = chromedriver)
        override:
          condition: replace
      - macro: user_read_sensitive_file_containers
        condition: |-
          container.id = host
        override:
          condition: replace
      - list: user_known_packet_socket_binaries
        items:
          - speaker # metallb
          - bfdd # also metallb
        override:
          items: append
  resources:
    requests:
      cpu: 50m
      memory: 128Mi
    limits:
      cpu: null
      memory: 512Mi

  falcosidekick:
    enabled: true
    replicaCount: 1
    resources:
      requests:
        cpu: 10m
        memory: 64Mi
      limits:
        memory: 64Mi
    config:
      existingSecret: creds

I'm using the default image shipped with the chart.

dependencies:
  - name: falco
    repository: https://falcosecurity.github.io/charts
    version: 4.11.1
Linux moniserver 6.6.57-1-lts #1 SMP PREEMPT_DYNAMIC Thu, 17 Oct 2024 13:57:25 +0000 x86_64 GNU/Linux

Also attaching /proc/config.gz in case it helps: config.gz

@PierreBart

PierreBart commented Oct 31, 2024

Hello,

The Falco pods in my GKE cluster fail with the same error, "Initialization issues during scap_init", but unlike in the original report, my pods keep restarting forever.

Falco output:

Thu Oct 31 08:50:11 2024: The --cri option is deprecated and will be removed in Falco 0.40.0. Use -o container_engines.cri.sockets[]=<socket_path> instead.
Thu Oct 31 08:50:11 2024: Falco version: 0.39.1 (x86_64)
Thu Oct 31 08:50:11 2024: Falco initialized with configuration files:
Thu Oct 31 08:50:11 2024:    /etc/falco/falco.yaml | schema validation: ok
Thu Oct 31 08:50:11 2024: System info: Linux version 6.6.44+ (builder@5b283881ec70) (Chromium OS 17.0_pre498229-r33 clang version 17.0.0 (/var/cache/chromeos-cache/distfiles/egit-src/external/github.com/llvm/llvm-project 14f0776550b5a49e1c42f49a00213f7f3fa047bf), LLD 17.0.0) #1 SMP PREEMPT_DYNAMIC Sat Sep 28 09:09:42 UTC 2024
Thu Oct 31 08:50:11 2024: Loading rules from:
Thu Oct 31 08:50:11 2024:    /etc/falco/falco_rules.yaml | schema validation: ok
Thu Oct 31 08:50:11 2024: Hostname value has been overridden via environment variable to: gke-***
Thu Oct 31 08:50:11 2024: The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)
Thu Oct 31 08:50:11 2024: Starting health webserver with threadiness 4, listening on 0.0.0.0:8765
Thu Oct 31 08:50:11 2024: Loaded event sources: syscall
Thu Oct 31 08:50:11 2024: Enabled event sources: syscall
Thu Oct 31 08:50:11 2024: Opening 'syscall' source with modern BPF probe.
Thu Oct 31 08:50:11 2024: One ring buffer every '2' CPUs.
Thu Oct 31 08:50:11 2024: An error occurred in an event source, forcing termination...
Error: Initialization issues during scap_init
Events detected: 0
Rule counts by severity:
Triggered rules by rule name:

Environment

  • Falco version: 0.39.1
  • System info: Linux version 6.6.44+ (builder@5b283881ec70) (Chromium OS 17.0_pre498229-r33 clang version 17.0.0 (/var/cache/chromeos-cache/distfiles/egit-src/external/github.com/llvm/llvm-project 14f0776550b5a49e1c42f49a00213f7f3fa047bf), LLD 17.0.0) #1 SMP PREEMPT_DYNAMIC Sat Sep 28 09:09:42 UTC 2024
  • Cloud provider or hardware configuration: GCP
  • OS: Container-Optimized OS cos-beta-117-18613-0-66
  • Kernel: COS-6.6.44
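The logs above report "The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)" and "One ring buffer every '2' CPUs". As a back-of-the-envelope check of how much ring-buffer memory the modern probe asks for, the sketch below combines those two numbers. It assumes, without checking the libpman source, that one buffer is allocated per group of 2 CPUs with the count rounded up, and that each buffer has the logged size; `ring_buffer_memory` is a hypothetical helper, not part of Falco.

```python
import math
from typing import Tuple


def ring_buffer_memory(num_cpus: int, cpus_per_buffer: int = 2,
                       buffer_bytes: int = 8_388_608) -> Tuple[int, int]:
    """Return (number_of_buffers, total_bytes) under the stated assumptions.

    Hypothetical helper: assumes one ring buffer per group of
    `cpus_per_buffer` CPUs (rounded up), each `buffer_bytes` in size.
    """
    if num_cpus <= 0 or cpus_per_buffer <= 0:
        raise ValueError("num_cpus and cpus_per_buffer must be positive")
    n_buffers = math.ceil(num_cpus / cpus_per_buffer)
    return n_buffers, n_buffers * buffer_bytes


if __name__ == "__main__":
    # Example: a 4-CPU node using the buffer size and ratio from the log above.
    n, total = ring_buffer_memory(4)
    print(f"{n} buffers, {total // (1024 * 1024)} MiB total")  # 2 buffers, 16 MiB total
```

Rounding up matters on nodes with an odd CPU count: under this assumption a 5-CPU node would get 3 buffers rather than 2.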

@xvzf

xvzf commented Oct 31, 2024

+1, same behaviour as @PierreBart

@shane-lawrence
Contributor

Which version of k8s are you seeing this behavior on? I'm just starting to troubleshoot the same problem, so I haven't had a chance to isolate it yet, but so far I'm only seeing it on k8s v1.31.

@OneideLuizSchneider
Author

@shane-lawrence 1.29

@shane-lawrence
Contributor

Thanks @OneideLuizSchneider, sounds like the k8s version was a red herring and it must be some other difference that's triggering this issue. I will provide more context if I find something.

@PierreBart

PierreBart commented Oct 31, 2024

@shane-lawrence I am running 1.31.1, same as you. I also have clusters on 1.30.5, where it runs just fine.

@OneideLuizSchneider
Author

OneideLuizSchneider commented Oct 31, 2024

I don't think it has anything to do with the k8s version, because it happens randomly on some nodes, not on every node.
The full version I'm running now is 1.29.8. I've added and removed many nodes since I posted here, and I no longer see this behavior. I'm starting my tests on 1.31 today; I'll add Falco there as well and report back.

@tiny-pangolin

I'm experiencing this issue on Fedora 40 and Fedora 41 hosts without Kubernetes. With the same config on different hosts, sometimes Falco works and other times it crash-loops. Could it have something to do with too few resources being available to Falco at startup, or with it conflicting with other processes like auditd?

@xvzf

xvzf commented Nov 4, 2024

@shane-lawrence we're observing it on GKE 1.31.1 right now

@Andreagit97
Member

Andreagit97 commented Nov 4, 2024

Hi folks! Could you try enabling the libs_logger to get more info on the failure? This is very likely a verifier issue. You can enable the logger by passing the following command-line arguments to the falco binary: -o libs_logger.enabled=true -o libs_logger.severity=debug. For example:

sudo ./usr/bin/falco -c ./etc/falco/falco.yaml -r ./etc/falco/falco_rules.yaml -o libs_logger.enabled=true -o libs_logger.severity=debug
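When the same Falco invocation has to be repeated on several hosts, it can help to build the argument list programmatically so each `-o key=value` override is added consistently. The sketch below reproduces the command above (minus `sudo`), using only the `-c`, `-r` and `-o` options and relative paths shown there; `falco_debug_argv` and `DEFAULT_LOGGER_OVERRIDES` are hypothetical names for illustration, not part of Falco.

```python
import shlex
from typing import Dict, List, Optional

# Overrides taken from the command above; hypothetical constant name.
DEFAULT_LOGGER_OVERRIDES = {
    "libs_logger.enabled": "true",
    "libs_logger.severity": "debug",
}


def falco_debug_argv(binary: str = "./usr/bin/falco",
                     config: str = "./etc/falco/falco.yaml",
                     rules: str = "./etc/falco/falco_rules.yaml",
                     overrides: Optional[Dict[str, str]] = None) -> List[str]:
    """Build a Falco argv with one `-o key=value` pair per override."""
    argv = [binary, "-c", config, "-r", rules]
    for key, value in (overrides or DEFAULT_LOGGER_OVERRIDES).items():
        argv += ["-o", f"{key}={value}"]
    return argv


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it with sudo.
    print(shlex.join(falco_debug_argv()))
    # ./usr/bin/falco -c ./etc/falco/falco.yaml -r ./etc/falco/falco_rules.yaml -o libs_logger.enabled=true -o libs_logger.severity=debug
```

Returning a list rather than a single string keeps each argument intact if it is later passed to something like `subprocess.run`.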

@PierreBart

Here it is:

Defaulted container "falco" out of: falco, falcoctl-artifact-follow, falcoctl-artifact-install (init)
Mon Nov  4 09:58:02 2024: The --cri option is deprecated and will be removed in Falco 0.40.0. Use -o container_engines.cri.sockets[]=<socket_path> instead.
Mon Nov  4 09:58:02 2024: Falco version: 0.39.1 (x86_64)
Mon Nov  4 09:58:02 2024: Falco initialized with configuration files:
Mon Nov  4 09:58:02 2024:    /etc/falco/falco.yaml | schema validation: ok
Mon Nov  4 09:58:02 2024: System info: Linux version 6.6.44+ (builder@5b283881ec70) (Chromium OS 17.0_pre498229-r33 clang version 17.0.0 (/var/cache/chromeos-cache/distfiles/egit-src/external/github.com/llvm/llvm-project 14f0776550b5a49e1c42f49a00213f7f3fa047bf), LLD 17.0.0) #1 SMP PREEMPT_DYNAMIC Sat Sep 28 09:09:42 UTC 2024
Mon Nov  4 09:58:02 2024: Loading rules from:
Mon Nov  4 09:58:02 2024:    /etc/falco/falco_rules.yaml | schema validation: ok
Mon Nov  4 09:58:02 2024: Hostname value has been overridden via environment variable to: gke-***
Mon Nov  4 09:58:02 2024: The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)
Mon Nov  4 09:58:02 2024: Starting health webserver with threadiness 4, listening on 0.0.0.0:8765
Mon Nov  4 09:58:02 2024: Loaded event sources: syscall
Mon Nov  4 09:58:02 2024: Enabled event sources: syscall
Mon Nov  4 09:58:02 2024: Opening 'syscall' source with modern BPF probe.
Mon Nov  4 09:58:02 2024: One ring buffer every '2' CPUs.
Mon Nov  4 09:58:02 2024: [libs]: libbpf: prog 'capset_x': BPF program load failed: Permission denied
Mon Nov  4 09:58:02 2024: [libs]: libbpf: prog 'capset_x': -- BEGIN PROG LOAD LOG --
reg type unsupported for arg#0 function capset_x#984
0: R1=ctx(off=0,imm=0) R10=fp0
; int BPF_PROG(capset_x, struct pt_regs *regs, long ret) {
0: (bf) r7 = r1                       ; R1=ctx(off=0,imm=0) R7_w=ctx(off=0,imm=0)
; int BPF_PROG(capset_x, struct pt_regs *regs, long ret) {
1: (79) r9 = *(u64 *)(r7 +8)          ; R7_w=ctx(off=0,imm=0) R9_w=scalar()
; uint32_t cpu_id = (uint32_t)bpf_get_smp_processor_id();
2: (85) call bpf_get_smp_processor_id#8       ; R0_w=scalar(umax=3,var_off=(0x0; 0x3))
; uint32_t cpu_id = (uint32_t)bpf_get_smp_processor_id();
3: (63) *(u32 *)(r10 -8) = r0         ; R0_w=scalar(umax=3,var_off=(0x0; 0x3)) R10=fp0 fp-8=
4: (bf) r2 = r10                      ; R2_w=fp0 R10=fp0
;
5: (07) r2 += -8                      ; R2_w=fp-8
; return (struct ringbuf_map *)bpf_map_lookup_elem(&ringbuf_maps, &cpu_id);
6: (18) r1 = 0xffff8880490d1c00       ; R1_w=map_ptr(off=0,ks=4,vs=4,imm=0)
8: (85) call bpf_map_lookup_elem#1    ; R0=map_value_or_null(id=1,off=0,ks=4,vs=4,imm=0)
9: (bf) r6 = r0                       ; R0=map_value_or_null(id=1,off=0,ks=4,vs=4,imm=0) R6_w=map_value_or_null(id=1,off=0,ks=4,vs=4,imm=0)
; if(!rb) {
10: (55) if r6 != 0x0 goto pc+6 17: R0=map_ptr(off=0,ks=0,vs=0,imm=0) R6=map_ptr(off=0,ks=0,vs=0,imm=0) R7=ctx(off=0,imm=0) R9=scalar() R10=fp0 fp-8=????mmmm
; uint32_t cpu_id = (uint32_t)bpf_get_smp_processor_id();
17: (85) call bpf_get_smp_processor_id#8      ; R0_w=scalar(umax=3,var_off=(0x0; 0x3))
; uint32_t cpu_id = (uint32_t)bpf_get_smp_processor_id();
18: (63) *(u32 *)(r10 -8) = r0        ; R0_w=scalar(umax=3,var_off=(0x0; 0x3)) R10=fp0 fp-8=
19: (bf) r2 = r10                     ; R2_w=fp0 R10=fp0
;
20: (07) r2 += -8                     ; R2_w=fp-8
; return (struct counter_map *)bpf_map_lookup_elem(&counter_maps, &cpu_id);
21: (18) r1 = 0xffff888097e63c00      ; R1_w=map_ptr(off=0,ks=4,vs=136,imm=0)
23: (85) call bpf_map_lookup_elem#1   ; R0_w=map_value_or_null(id=2,off=0,ks=4,vs=136,imm=0)
24: (bf) r7 = r0                      ; R0_w=map_value_or_null(id=2,off=0,ks=4,vs=136,imm=0) R7_w=map_value_or_null(id=2,off=0,ks=4,vs=136,imm=0)
; if(!counter) {
25: (15) if r7 == 0x0 goto pc+372     ; R7_w=map_value(off=0,ks=4,vs=136,imm=0)
; counter->n_evts++;
26: (79) r1 = *(u64 *)(r7 +0)         ; R1_w=scalar() R7_w=map_value(off=0,ks=4,vs=136,imm=0)
27: (07) r1 += 1                      ; R1_w=scalar()
28: (7b) *(u64 *)(r7 +0) = r1         ; R1_w=scalar() R7_w=map_value(off=0,ks=4,vs=136,imm=0)
; uint8_t *space = bpf_ringbuf_reserve(rb, event_size, 0);
29: (bf) r1 = r6                      ; R1_w=map_ptr(off=0,ks=0,vs=0,imm=0) R6=map_ptr(off=0,ks=0,vs=0,imm=0)
30: (b7) r2 = 66                      ; R2_w=66
31: (b7) r3 = 0                       ; R3_w=0
32: (85) call bpf_ringbuf_reserve#131         ; R0=ringbuf_mem_or_null(id=4,ref_obj_id=4,off=0,imm=0) refs=4
33: (bf) r6 = r0                      ; R0=ringbuf_mem_or_null(id=4,ref_obj_id=4,off=0,imm=0) R6_w=ringbuf_mem_or_null(id=4,ref_obj_id=4,off=0,imm=0) refs=4
; if(!space) {
34: (55) if r6 != 0x0 goto pc+7 42: R0=ringbuf_mem(ref_obj_id=4,off=0,imm=0) R6_w=ringbuf_mem(ref_obj_id=4,off=0,imm=0) R7=map_value(off=0,ks=4,vs=136,imm=0) R9=scalar() R10=fp0 fp-8=????mmmm refs=4
; return g_event_params_table[event_id];
42: (18) r1 = 0xffffc900015ba010      ; R1_w=map_value(off=16,ks=4,vs=248766,imm=0) refs=4
44: (71) r2 = *(u8 *)(r1 +353)        ; R1_w=map_value(off=16,ks=4,vs=248766,imm=0) R2_w=4 refs=4
; ringbuf->payload_pos = sizeof(struct ppm_evt_hdr) + nparams * sizeof(uint16_t);
45: (bf) r7 = r2                      ; R2_w=4 R7_w=4 refs=4
46: (67) r7 <<= 1                     ; R7_w=8 refs=4
47: (7b) *(u64 *)(r10 -32) = r7       ; R7_w=8 R10=fp0 fp-32_w=8 refs=4
; ringbuf->payload_pos = sizeof(struct ppm_evt_hdr) + nparams * sizeof(uint16_t);
48: (07) r7 += 26                     ; R7_w=34 refs=4
49: (b7) r1 = 20                      ; R1_w=20 refs=4
50: (7b) *(u64 *)(r10 -24) = r2       ; R2_w=4 R10=fp0 fp-24_w=4 refs=4
; PUSH_FIXED_SIZE_TO_RINGBUF(ringbuf, param, sizeof(int64_t));
51: (2d) if r1 > r2 goto pc+1         ; R1_w=20 R2_w=4 refs=4
; return g_settings.boot_time;
53: (18) r1 = 0xffffc90001c8adb8      ; R1_w=map_value(off=3512,ks=4,vs=600281,imm=0) refs=4
55: (79) r8 = *(u64 *)(r1 +0)         ; R1_w=map_value(off=3512,ks=4,vs=600281,imm=0) R8_w=scalar() refs=4
; hdr->ts = maps__get_boot_time() + bpf_ktime_get_boot_ns();
56: (85) call bpf_ktime_get_boot_ns#125       ; R0_w=scalar() refs=4
; hdr->ts = maps__get_boot_time() + bpf_ktime_get_boot_ns();
57: (0f) r0 += r8                     ; R0_w=scalar() R8_w=scalar() refs=4
; hdr->ts = maps__get_boot_time() + bpf_ktime_get_boot_ns();
58: (bf) r1 = r0                      ; R0_w=scalar(id=5) R1_w=scalar(id=5) refs=4
59: (77) r1 >>= 56                    ; R1_w=scalar(umax=255,var_off=(0x0; 0xff)) refs=4
60: (73) *(u8 *)(r6 +7) = r1          ; R1_w=scalar(umax=255,var_off=(0x0; 0xff)) R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
61: (bf) r1 = r0                      ; R0_w=scalar(id=5) R1_w=scalar(id=5) refs=4
62: (77) r1 >>= 48                    ; R1_w=scalar(umax=65535,var_off=(0x0; 0xffff)) refs=4
63: (73) *(u8 *)(r6 +6) = r1          ; R1_w=scalar(umax=65535,var_off=(0x0; 0xffff)) R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
64: (bf) r1 = r0                      ; R0_w=scalar(id=5) R1_w=scalar(id=5) refs=4
65: (77) r1 >>= 40                    ; R1_w=scalar(umax=16777215,var_off=(0x0; 0xffffff)) refs=4
66: (73) *(u8 *)(r6 +5) = r1          ; R1_w=scalar(umax=16777215,var_off=(0x0; 0xffffff)) R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
67: (bf) r1 = r0                      ; R0_w=scalar(id=5) R1_w=scalar(id=5) refs=4
68: (77) r1 >>= 32                    ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) refs=4
69: (73) *(u8 *)(r6 +4) = r1          ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
70: (bf) r1 = r0                      ; R0_w=scalar(id=5) R1_w=scalar(id=5) refs=4
71: (77) r1 >>= 24                    ; R1_w=scalar(umax=1099511627775,var_off=(0x0; 0xffffffffff)) refs=4
72: (73) *(u8 *)(r6 +3) = r1          ; R1_w=scalar(umax=1099511627775,var_off=(0x0; 0xffffffffff)) R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
73: (bf) r1 = r0                      ; R0_w=scalar(id=5) R1_w=scalar(id=5) refs=4
74: (77) r1 >>= 16                    ; R1_w=scalar(umax=281474976710655,var_off=(0x0; 0xffffffffffff)) refs=4
75: (73) *(u8 *)(r6 +2) = r1          ; R1_w=scalar(umax=281474976710655,var_off=(0x0; 0xffffffffffff)) R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
76: (73) *(u8 *)(r6 +0) = r0          ; R0_w=scalar(id=5) R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
77: (77) r0 >>= 8                     ; R0_w=scalar(umax=72057594037927935,var_off=(0x0; 0xffffffffffffff)) refs=4
78: (73) *(u8 *)(r6 +1) = r0          ; R0_w=scalar(umax=72057594037927935,var_off=(0x0; 0xffffffffffffff)) R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
; hdr->tid = bpf_get_current_pid_tgid() & 0xffffffff;
79: (85) call bpf_get_current_pid_tgid#14     ; R0=scalar() refs=4
80: (b7) r1 = 1                       ; R1_w=1 refs=4
; hdr->type = ringbuf->event_type;
81: (73) *(u8 *)(r6 +21) = r1         ; R1_w=1 R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
82: (b7) r1 = 97                      ; R1_w=97 refs=4
83: (73) *(u8 *)(r6 +20) = r1         ; R1_w=97 R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
84: (b7) r1 = 0                       ; R1_w=0 refs=4
; hdr->nparams = nparams;
85: (73) *(u8 *)(r6 +25) = r1         ; R1_w=0 R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
86: (73) *(u8 *)(r6 +24) = r1         ; R1_w=0 R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
87: (73) *(u8 *)(r6 +23) = r1         ; R1_w=0 R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
; hdr->tid = bpf_get_current_pid_tgid() & 0xffffffff;
88: (73) *(u8 *)(r6 +15) = r1         ; R1_w=0 R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
89: (73) *(u8 *)(r6 +14) = r1         ; R1_w=0 R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
90: (73) *(u8 *)(r6 +13) = r1         ; R1_w=0 R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
91: (73) *(u8 *)(r6 +12) = r1         ; R1_w=0 R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
; hdr->len = ringbuf->reserved_event_size;
92: (73) *(u8 *)(r6 +19) = r1         ; R1_w=0 R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
93: (73) *(u8 *)(r6 +18) = r1         ; R1_w=0 R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
94: (73) *(u8 *)(r6 +17) = r1         ; R1_w=0 R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
95: (b7) r1 = 66                      ; R1_w=66 refs=4
96: (73) *(u8 *)(r6 +16) = r1         ; R1_w=66 R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
; hdr->tid = bpf_get_current_pid_tgid() & 0xffffffff;
97: (bf) r1 = r0                      ; R0=scalar(id=6) R1_w=scalar(id=6) refs=4
98: (77) r1 >>= 24                    ; R1_w=scalar(umax=1099511627775,var_off=(0x0; 0xffffffffff)) refs=4
99: (73) *(u8 *)(r6 +11) = r1         ; R1_w=scalar(umax=1099511627775,var_off=(0x0; 0xffffffffff)) R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
100: (bf) r1 = r0                     ; R0=scalar(id=6) R1_w=scalar(id=6) refs=4
101: (77) r1 >>= 16                   ; R1_w=scalar(umax=281474976710655,var_off=(0x0; 0xffffffffffff)) refs=4
102: (73) *(u8 *)(r6 +10) = r1        ; R1_w=scalar(umax=281474976710655,var_off=(0x0; 0xffffffffffff)) R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
103: (73) *(u8 *)(r6 +8) = r0         ; R0=scalar(id=6) R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
104: (77) r0 >>= 8                    ; R0_w=scalar(umax=72057594037927935,var_off=(0x0; 0xffffffffffffff)) refs=4
105: (73) *(u8 *)(r6 +9) = r0         ; R0_w=scalar(umax=72057594037927935,var_off=(0x0; 0xffffffffffffff)) R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
; hdr->nparams = nparams;
106: (79) r1 = *(u64 *)(r10 -24)      ; R1_w=4 R10=fp0 fp-24=4 refs=4
107: (73) *(u8 *)(r6 +22) = r1        ; R1_w=4 R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
; PUSH_FIXED_SIZE_TO_RINGBUF(ringbuf, param, sizeof(int64_t));
108: (bf) r1 = r6                     ; R1_w=ringbuf_mem(ref_obj_id=4,off=0,imm=0) R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
109: (0f) r1 += r7                    ; R1_w=ringbuf_mem(ref_obj_id=4,off=34,imm=0) R7=34 refs=4
110: (bf) r2 = r9                     ; R2_w=scalar(id=7) R9=scalar(id=7) refs=4
111: (77) r2 >>= 48                   ; R2_w=scalar(umax=65535,var_off=(0x0; 0xffff)) refs=4
112: (73) *(u8 *)(r1 +6) = r2         ; R1_w=ringbuf_mem(ref_obj_id=4,off=34,imm=0) R2_w=scalar(umax=65535,var_off=(0x0; 0xffff)) refs=4
113: (bf) r2 = r9                     ; R2_w=scalar(id=7) R9=scalar(id=7) refs=4
114: (77) r2 >>= 56                   ; R2_w=scalar(umax=255,var_off=(0x0; 0xff)) refs=4
115: (73) *(u8 *)(r1 +7) = r2         ; R1_w=ringbuf_mem(ref_obj_id=4,off=34,imm=0) R2_w=scalar(umax=255,var_off=(0x0; 0xff)) refs=4
116: (bf) r2 = r9                     ; R2_w=scalar(id=7) R9=scalar(id=7) refs=4
117: (77) r2 >>= 32                   ; R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) refs=4
118: (73) *(u8 *)(r1 +4) = r2         ; R1_w=ringbuf_mem(ref_obj_id=4,off=34,imm=0) R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) refs=4
119: (bf) r2 = r9                     ; R2_w=scalar(id=7) R9=scalar(id=7) refs=4
120: (77) r2 >>= 40                   ; R2_w=scalar(umax=16777215,var_off=(0x0; 0xffffff)) refs=4
121: (73) *(u8 *)(r1 +5) = r2         ; R1_w=ringbuf_mem(ref_obj_id=4,off=34,imm=0) R2_w=scalar(umax=16777215,var_off=(0x0; 0xffffff)) refs=4
122: (bf) r2 = r9                     ; R2_w=scalar(id=7) R9=scalar(id=7) refs=4
123: (77) r2 >>= 16                   ; R2_w=scalar(umax=281474976710655,var_off=(0x0; 0xffffffffffff)) refs=4
124: (73) *(u8 *)(r1 +2) = r2         ; R1_w=ringbuf_mem(ref_obj_id=4,off=34,imm=0) R2_w=scalar(umax=281474976710655,var_off=(0x0; 0xffffffffffff)) refs=4
125: (bf) r2 = r9                     ; R2_w=scalar(id=7) R9=scalar(id=7) refs=4
126: (77) r2 >>= 24                   ; R2_w=scalar(umax=1099511627775,var_off=(0x0; 0xffffffffff)) refs=4
127: (73) *(u8 *)(r1 +3) = r2         ; R1_w=ringbuf_mem(ref_obj_id=4,off=34,imm=0) R2_w=scalar(umax=1099511627775,var_off=(0x0; 0xffffffffff)) refs=4
128: (73) *(u8 *)(r1 +0) = r9         ; R1_w=ringbuf_mem(ref_obj_id=4,off=34,imm=0) R9=scalar(id=7) refs=4
129: (77) r9 >>= 8                    ; R9_w=scalar(umax=72057594037927935,var_off=(0x0; 0xffffffffffffff)) refs=4
130: (73) *(u8 *)(r1 +1) = r9         ; R1_w=ringbuf_mem(ref_obj_id=4,off=34,imm=0) R9_w=scalar(umax=72057594037927935,var_off=(0x0; 0xffffffffffffff)) refs=4
131: (b7) r1 = 8                      ; R1_w=8 refs=4
132: (6b) *(u16 *)(r6 +26) = r1       ; R1_w=8 R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
133: (18) r1 = 0x1                    ; R1_w=1 refs=4
; if(bpf_core_enum_value_exists(enum bpf_func_id, BPF_FUNC_get_current_task_btf) &&
135: (15) if r1 == 0x0 goto pc+5      ; R1_w=1 refs=4
136: (18) r1 = 0x9e                   ; R1_w=158 refs=4
; if(bpf_core_enum_value_exists(enum bpf_func_id, BPF_FUNC_get_current_task_btf) &&
138: (55) if r1 != 0x9e goto pc+2     ; R1_w=158 refs=4
; return (struct task_struct *)bpf_get_current_task_btf();
139: (85) call bpf_get_current_task_btf#158   ; R0=trusted_ptr_task_struct(off=0,imm=0) refs=4
140: (05) goto pc+1
;
142: (bf) r7 = r0                     ; R0=trusted_ptr_task_struct(off=0,imm=0) R7_w=trusted_ptr_task_struct(off=0,imm=0) refs=4
143: (18) r1 = 0x1                    ; R1_w=1 refs=4
145: (79) r8 = *(u64 *)(r10 -32)      ; R8_w=8 R10=fp0 fp-32=8 refs=4
146: (79) r9 = *(u64 *)(r10 -24)      ; R9_w=4 R10=fp0 fp-24=4 refs=4
; READ_TASK_FIELD_INTO(&cap_struct, task, cred, cap_inheritable);
147: (15) if r1 == 0x0 goto pc+7      ; R1_w=1 refs=4
148: (18) r1 = 0x9e                   ; R1_w=158 refs=4
; READ_TASK_FIELD_INTO(&cap_struct, task, cred, cap_inheritable);
150: (55) if r1 != 0x9e goto pc+4     ; R1_w=158 refs=4
; READ_TASK_FIELD_INTO(&cap_struct, task, cred, cap_inheritable);
151: (79) r1 = *(u64 *)(r7 +1984)     ; R1_w=rcu_ptr_or_null_cred(id=8,off=0,imm=0) R7_w=trusted_ptr_task_struct(off=0,imm=0) refs=4
152: (79) r1 = *(u64 *)(r1 +48)
R1 invalid mem access 'rcu_ptr_or_null_'
processed 146 insns (limit 1000000) max_states_per_insn 0 total_states 7 peak_states 7 mark_read 5
-- END PROG LOAD LOG --
Mon Nov  4 09:58:02 2024: [libs]: libbpf: prog 'capset_x': failed to load: -13
Mon Nov  4 09:58:02 2024: [libs]: libbpf: failed to load object 'bpf_probe'
Mon Nov  4 09:58:02 2024: [libs]: libbpf: failed to load BPF skeleton 'bpf_probe': -13
Mon Nov  4 09:58:02 2024: [libs]: libpman: failed to load BPF object (errno: 13 | message: Permission denied)
Mon Nov  4 09:58:02 2024: An error occurred in an event source, forcing termination...
Error: Initialization issues during scap_init
Events detected: 0
Rule counts by severity:
Triggered rules by rule name:
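In the log above, libbpf reports the failure as a negative errno ("failed to load: -13"), and libpman repeats it as "errno: 13 | message: Permission denied". A small sketch for pulling the failing program name out of such a log and translating the code with Python's standard `errno` and `os` modules is shown below. The regular expression is an assumption fitted to the line format quoted in this thread, and `failed_programs` is a hypothetical helper.

```python
import errno
import os
import re
from typing import List, Tuple

# Matches lines like: "libbpf: prog 'capset_x': failed to load: -13"
PROG_FAIL_RE = re.compile(r"libbpf: prog '(?P<prog>[^']+)': failed to load: (?P<code>-?\d+)")


def failed_programs(log: str) -> List[Tuple[str, int, str, str]]:
    """Return (program, errno, errno name, message) for each failed BPF program load."""
    results = []
    for m in PROG_FAIL_RE.finditer(log):
        code = abs(int(m.group("code")))  # libbpf reports -errno
        name = errno.errorcode.get(code, "UNKNOWN")
        results.append((m.group("prog"), code, name, os.strerror(code)))
    return results


if __name__ == "__main__":
    sample = "Mon Nov  4 09:58:02 2024: [libs]: libbpf: prog 'capset_x': failed to load: -13"
    for prog, code, name, msg in failed_programs(sample):
        # On Linux this prints: capset_x: errno 13 (EACCES): Permission denied
        print(f"{prog}: errno {code} ({name}): {msg}")
```

Here EACCES comes from the verifier rejecting the program ("R1 invalid mem access 'rcu_ptr_or_null_'"), not from missing privileges, which is why the generic "Permission denied" message alone is misleading.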

@Andreagit97
Member

Andreagit97 commented Nov 4, 2024

@PierreBart thank you very much! This is an issue we have already fixed in dev (falcosecurity/libs#2118). Let me check with the other maintainers what we can do. @falcosecurity/falco-maintainers

@FedeDP
Contributor

FedeDP commented Nov 4, 2024

Since lots of people are hitting this issue, my 2c is that we should definitely make a patch release of libs (0.18.2) and then a patch release of Falco 0.39 (0.39.2).

@Andreagit97
Member

I agree. I'm still investigating the Fedora issue also reported here by @tiny-pangolin (#3323 (comment)); it would be great to have both fixes in the patch.

@jordyb6

jordyb6 commented Nov 6, 2024

Same issue on AlmaLinux 8.7 hosts, without Kubernetes. Some hosts fail to restart after a config change or whenever I tweak one of the rules files. After a while they do start up successfully again.
I'm using the modern BPF driver.

@Andreagit97
Copy link
Member

Hmm, interesting. I would try to split this issue into separate ones:

  1. issues with the latest GKE versions -> this is a verifier error already solved in dev ([BUG] Verifier failure on cos-beta-117-18613-0-66, libs#2118)
  2. issues with the latest Fedora versions, or more generally with kernel versions >= 6.11.4 -> we know what the issue is and are working on it.
  3. several restarts before the successful one -> this is still under investigation, and we need more logs. As suggested here (ERROR: Error: Initialization issues during scap_init #3323 (comment)), please enable the libs logger.
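Category 2 is keyed on the kernel version. A minimal sketch for checking whether a kernel release string (as printed by `uname -r` or in Falco's "System info" line) is at or above 6.11.4 follows. It assumes that only the leading numeric `major.minor[.patch]` part matters, so distribution suffixes such as "-1-lts", ".amzn2.x86_64" or "+" are ignored; `kernel_tuple` and `kernel_at_least` are hypothetical helpers.

```python
import re
from typing import Tuple


def kernel_tuple(release: str) -> Tuple[int, int, int]:
    """Parse the leading 'major.minor[.patch]' of a kernel release string."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def kernel_at_least(release: str, threshold: Tuple[int, int, int] = (6, 11, 4)) -> bool:
    """True if the release is >= threshold (tuples compare element by element)."""
    return kernel_tuple(release) >= threshold


if __name__ == "__main__":
    # Kernel releases reported earlier in this thread; all are below 6.11.4.
    for rel in ("5.10.223-212.873.amzn2.x86_64", "6.6.57-1-lts", "6.6.44+"):
        print(rel, kernel_at_least(rel))
```

Comparing integer tuples avoids the classic string-comparison bug where "6.6" would sort after "6.11".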

More generally, to understand which category you fall into, please enable the libs logger:

  • in the falco config:
libs_logger:
  enabled: true
  severity: debug
  • or directly from the command line:
sudo ./usr/bin/falco -c ./etc/falco/falco.yaml -r ./etc/falco/falco_rules.yaml -o libs_logger.enabled=true -o libs_logger.severity=debug
  1. The GKE error should be the following:
-- END PROG LOAD LOG --
Mon Nov  4 09:58:02 2024: [libs]: libbpf: prog 'capset_x': failed to load: -13
Mon Nov  4 09:58:02 2024: [libs]: libbpf: failed to load object 'bpf_probe'
Mon Nov  4 09:58:02 2024: [libs]: libbpf: failed to load BPF skeleton 'bpf_probe': -13
Mon Nov  4 09:58:02 2024: [libs]: libpman: failed to load BPF object (errno: 13 | message: Permission denied)
Mon Nov  4 09:58:02 2024: An error occurred in an event source, forcing termination...
  2. The Fedora error should be this one:
libbpf: prog 'pf_user': BPF program load failed: Invalid argument
libbpf: prog 'pf_user': -- BEGIN PROG LOAD LOG --
processed 282 insns (limit 1000000) max_states_per_insn 0 total_states 17 peak_states 17 mark_read 8
-- END PROG LOAD LOG --
libbpf: prog 'pf_user': failed to load: -22
libbpf: failed to load object 'bpf_probe'
libbpf: failed to load BPF skeleton 'bpf_probe': -22
libpman: failed to load BPF object (errno: 22 | message: Invalid argument)
  3. Still under investigation; nobody has provided full logs for this case.
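The two error signatures above can be matched against a saved log with a small script. This is a minimal sketch, assuming the libs-logger output has already been captured to a file; `classify_falco_log` is a hypothetical helper, not part of Falco, and the mapping from errno to category is a heuristic based only on the two logs quoted here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Falco): match the libpman error lines
# quoted above against a saved libs-logger dump. Heuristic only.
classify_falco_log() {
  local log="$1"
  # -F: treat the pattern as a fixed string, -q: only report match/no match
  if grep -qF 'failed to load BPF object (errno: 13' "$log"; then
    echo "1: verifier error (Permission denied), as seen on recent GKE"
  elif grep -qF 'failed to load BPF object (errno: 22' "$log"; then
    echo "2: Invalid argument, as seen on Fedora / kernel >= 6.11.4"
  else
    echo "3 or unknown: please attach the full log to the issue"
  fi
}
```

For example, capture the output of the command line above with `2>&1 | tee falco.log`, then run `classify_falco_log falco.log`. Anything that falls into the last branch is exactly the case the maintainers are asking full logs for.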

@tordenist

@Andreagit97 a possible patch release was mentioned. Any news on that patch becoming available anytime soon?

@FedeDP
Contributor

FedeDP commented Nov 11, 2024

We are still investigating the "several restarts before the successful one" issue. I'd say that we can expect a Falco patch release in a couple of weeks; sorry for the delay!
Also, please note that Falco is having CI issues right now, which could slow down the process too.

@FedeDP
Contributor

FedeDP commented Nov 11, 2024

/milestone 0.39.2

@poiana poiana modified the milestones: 0.40.0, 0.39.2 Nov 11, 2024
@Andreagit97
Member

Andreagit97 commented Nov 11, 2024

Hey @tordenist, issues 1 and 2 reported here (#3323 (comment)) are solved in dev. It would be great to also understand the third one before cutting a patch release. If you are experiencing the third issue, could you please provide the logs requested above? (@OneideLuizSchneider, you are the original reporter of the restart issue; could you please provide additional logs as suggested above?)

If we cannot reproduce the third issue, we may release just the fixes for the first two. Maybe the third one is simply a side effect of the first two, but it would be great to understand it.

@FedeDP
Contributor

FedeDP commented Nov 11, 2024

@OneideLuizSchneider as Andrea said, can you enable libs logging and send us some logs?
I think it might be an issue related to, e.g., SELinux; let's see if the logs confirm this.

@OneideLuizSchneider
Author

OneideLuizSchneider commented Nov 12, 2024

@FedeDP @Andreagit97
Sorry for not getting back to you sooner. I was testing it and was not able to reproduce it anymore (as I also said here: #3323 (comment)).
I can send the logs if you still want to see them.

I'm using the image=public.ecr.aws/falcosecurity/falco-no-driver:latest

I tested it on:

  • EKS 1.29.8
  • EKS 1.30.4
  • EKS 1.31.0

@tiny-pangolin

Is there somewhere I can post debug logs? The full run between setting is about 31,000 lines.

@Andreagit97
Copy link
Member

Hey @tiny-pangolin, you can upload a .txt file here on the issue or create a gist, as you prefer.

@FedeDP
Contributor

FedeDP commented Nov 21, 2024

Falco 0.39.2 is out, feel free to test it!
I will move this issue to 0.40.0 to track the only remaining problem :)
/milestone 0.40.0

@poiana poiana modified the milestones: 0.39.2, 0.40.0 Nov 21, 2024
@jordyb6

jordyb6 commented Nov 21, 2024

Same issue on almalinux 8.7 hosts, no kubernetes. Some hosts fail to restart after making a config change or whenever I make a tweak in one of the rules files. After a while they do succeed to startup again. Using modern bpf driver.

Regarding my issue, it turns out that the VMs that failed to restart Falco didn't have enough free memory to start it.
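When low memory is the suspect, a quick check before restarting can rule it in or out. This is a minimal sketch, assuming a Linux host that exposes `MemAvailable` in `/proc/meminfo`; `enough_memory_for_falco` and the threshold passed to it are hypothetical, since this thread does not state how much memory Falco actually needs.

```shell
#!/usr/bin/env bash
# Hypothetical pre-restart check (not part of Falco): succeed only if
# MemAvailable (reported in kB) is at least the given number of MB.
# The threshold is an assumption; Falco's real footprint depends on
# rules and workload.
enough_memory_for_falco() {
  local min_mb="$1"
  local meminfo="${2:-/proc/meminfo}"   # overridable for testing
  local avail_kb
  avail_kb=$(awk '/^MemAvailable:/ {print $2}' "$meminfo")
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge $((min_mb * 1024)) ]
}
```

For example, `enough_memory_for_falco 512 || echo "low memory: Falco may fail to start"`, where 512 MB is an arbitrary example value rather than a documented requirement.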

@PierreBart

0.39.2 fixes the issue for me, thanks @FedeDP and @Andreagit97 for your help!

@OneideLuizSchneider
Author

@FedeDP @Andreagit97
I moved all the EKS worker nodes to Amazon Linux 2023, and there hasn't been a single restart since.
I tested it with many instance sizes, from t3.medium to m7a.4xlarge; all good...
