Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic deadpool poll time for faster exit detection on linux #148

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

avrecko
Copy link
Contributor

@avrecko avrecko commented Feb 11, 2024

Noticed on Linux that sometimes it takes the full DEADPOOL_POLL_INTERVAL for the exit to be detected.

I think it might make sense to check for exit a lot sooner than DEADPOOL_POLL_INTERVAL when detecting a close of a stream.

…e rapidly.

Maybe should update the readme as well?

Also removed field cleanlyExitedBeforeProcess that is not set by anyone. Looks like it is a left over from previous versions.
@avrecko
Copy link
Contributor Author

avrecko commented Feb 12, 2024

Let's wait with this a bit. Need to investigate some more as seeing weird delays in detecting the exit status on the virtual machine but not real. Hm...

@avrecko
Copy link
Contributor Author

avrecko commented Feb 14, 2024

I've looked why sometimes echo foo took a few ms to complete and sometimes the full poll interval.

If I taskset the test program to the same CPU it always takes full poll interval. Here is perf sched timehist

    6257.725092 [0003]         echo[24612]                                                      awakened: java[24263/24262]
    6257.725101 [0003]     s   echo[24612]                         0.011      0.000      0.072                                    __cond_resched <- dput <- __fput <- task_work_run <- exit_to_user_mode_loop
    6257.725156 [0003]         java[24263/24262]                                                awakened: kworker/u8:3-ev[21178]
    6257.725287 [0003]     s   java[24263/24262]                   0.072      0.008      0.186                                    futex_wait_queue <- futex_wait <- do_futex <- __x64_sys_futex
    6257.727160 [0003]         echo[24612]                                                      awakened: ProcessQueue0[24280/24262]
    6257.727175 [0003]     s   echo[24612]                         0.186      0.000      1.888                                    exit_to_user_mode_loop <- exit_to_user_mode_prepare <- syscall_exit_to_user_mode <- do_syscall_64
    6257.727253 [0003]         ProcessQueue0[24280/24262]                                       awakened: kworker/u8:3-ev[21178]
    6257.727497 [0003]     s   ProcessQueue0[24280/24262]          3.275      0.014      0.322                                    schedule_hrtimeout_range_clock <- ep_poll <- do_epoll_wait <- __x64_sys_epoll_wait
    6257.727589 [0003]         echo[24612]                                                      awakened: ProcessQueue0[24280/24262]
    6257.727627 [0003]     s   echo[24612]                         0.322      0.000      0.130                                    __cond_resched <- dput <- __fput <- task_work_run <- exit_to_user_mode_loop
    6257.727686 [0003]         ProcessQueue0[24280/24262]                                       awakened: kworker/u8:3-ev[21178]
    6257.727847 [0003]         ProcessQueue0[24280/24262]                                       awakened: kworker/u8:3-ev[21178]
    6257.728105 [0003]         ProcessQueue0[24280/24262]                                       awakened: kworker/u8:3-ev[21178]
    6257.728147 [0003]     s   ProcessQueue0[24280/24262]          0.130      0.038      0.519                                    schedule_hrtimeout_range_clock <- ep_poll <- do_epoll_wait <- __x64_sys_epoll_wait
    6257.728156 [0003]         echo[24612]                                                      awakened: ProcessQueue0[24280/24262]
    6257.728172 [0003]     s   echo[24612]                         0.519      0.000      0.024                                    __cond_resched <- dput <- __fput <- task_work_run <- exit_to_user_mode_loop
    6257.728223 [0003]         ProcessQueue0[24280/24262]                                       awakened: kworker/u8:3-ev[21178]
    6257.728317 [0003]         ProcessQueue0[24280/24262]                                       awakened: kworker/u8:3-ev[21178]
    6257.728382 [0003]     s   ProcessQueue0[24280/24262]          0.024      0.016      0.210                                    schedule_hrtimeout_range_clock <- ep_poll <- do_epoll_wait <- __x64_sys_epoll_wait
    6257.728519 [0003]     s   echo[24612]                         0.210      0.000      0.136                                    do_task_dead <- do_exit <- do_group_exit <- __x64_sys_exit_group <- do_syscall_64
    .
    .
    .
    the 250 ms+ before ProcessQueue0 shows up again and detected the exit and starts echo foo again
    6257.978137 [0003]         ProcessQueue0[24280/24262]                                       awakened: kworker/u8:3-ev[21178]
    6257.978330 [0003]         ProcessQueue0[24280/24262]                                       awakened: kworker/u8:3-ev[21178]
    6257.978451 [0003]         ProcessQueue0[24280/24262]                                       awakened: migration/3[36]
    6257.978524 [0003]         ProcessQueue0[24280/24262]                                       awakened: rcu_preempt[17]
    6257.978547 [0003]     s   ProcessQueue0[24280/24262]        249.670      0.050      0.493                                    exit_to_user_mode_loop <- exit_to_user_mode_prepare <- syscall_exit_to_user_mode <- do_syscall_64
    6257.978563 [0003]     s   migration/3[36]                     0.000      0.095      0.016                                    smpboot_thread_fn <- kthread <- ret_from_fork
    6257.978570 [0003]     s   rcu_preempt[17]                   171.376      0.039      0.007                                    schedule_timeout <- rcu_gp_fqs_loop <- rcu_gp_kthread <- kthread
    6257.978618 [0003]         ProcessQueue0[24280/24262]                                       awakened: java[24263/24262]
    6257.978654 [0003]     s   ProcessQueue0[24280/24262]          0.023      0.000      0.083                                    schedule_hrtimeout_range_clock <- ep_poll <- do_epoll_wait <- __x64_sys_epoll_wait
    6257.978691 [0003]         java[24263/24262]                                                awakened: C2 CompilerThre[24271/24262]
    6257.978699 [0003]         java[24263/24262]                                                awakened: C1 CompilerThre[24272/24262]
    6257.978775 [0003]         java[24263/24262]                                                awakened: kworker/u8:3-ev[21178]
    6257.979224 [0003]         java[24263/24262]                                                awakened: java[24613]
    6257.979233 [0003]     s   java[24263/24262]                 253.366      0.035      0.579                                    schedule_timeout <- __wait_for_common <- wait_for_completion_killable <- kernel_clone
    6257.979250 [0003]     s   C2 CompilerThre[24271/24262]      254.722      0.541      0.017                                    futex_wait_queue <- futex_wait <- do_futex <- __x64_sys_futex
    6257.979619 [0003]     s   C1 CompilerThre[24272/24262]      254.455      0.551      0.368                                    futex_wait_queue <- futex_wait <- do_futex <- __x64_sys_futex
    6257.979920 [0003]         java[24613]                                                      awakened: java[24263/24262]
    6257.979935 [0003]     s   java[24613]                         0.000      0.394      0.316                                    __cond_resched <- down_write_killable <- exec_mmap <- begin_new_exec <- load_elf_binary
    6257.979950 [0003]     s   java[24263/24262]                   0.702      0.014      0.015                                    pipe_read <- new_sync_read <- vfs_read <- ksys_read
    6257.980061 [0003]         echo[24613]                                                      awakened: java[24263/24262]
    6257.980072 [0003]     s   echo[24613]                         0.015      0.000      0.121                                    __cond_resched <- dput <- __fput <- task_work_run <- exit_to_user_mode_loop    

the problem is this

  6257.728172 [0003]     s   echo[24612]                         0.519      0.000      0.024                                    __cond_resched <- dput <- __fput <- task_work_run <- exit_to_user_mode_loop
    6257.728223 [0003]         ProcessQueue0[24280/24262]                                       awakened: kworker/u8:3-ev[21178]
    6257.728317 [0003]         ProcessQueue0[24280/24262]                                       awakened: kworker/u8:3-ev[21178]
    6257.728382 [0003]     s   ProcessQueue0[24280/24262]          0.024      0.016      0.210                                    schedule_hrtimeout_range_clock <- ep_poll <- do_epoll_wait <- __x64_sys_epoll_wait
    6257.728519 [0003]     s   echo[24612]                         0.210      0.000      0.136                                    do_task_dead <- do_exit <- do_group_exit <- __x64_sys_exit_group <- do_syscall_64

it looks like the following is happening:

echo foo closed the streams but gets preempted before exits,
epoll wakes up sees the closed streams checks for exit status, sees echo still alive, does the wait on 250ms gets preempted
echo foo is scheduled, it finally exits
.... wait 250ms
epoll wakes up sees echo as exited.

aside: this is why I want to have a different name for the threads, ProcessQueue0 is very generic in the report. NuProcessPoller would be more obvious.

@avrecko
Copy link
Contributor Author

avrecko commented Feb 14, 2024

Did a test: Using NuProcess in a loop to do 1000 calls to echo foo, using 1 executor, pinned to 1 cpu core. The case I looked with perf sched.

without any code changes the non-blocking LibC.waitpid(linuxProcess.getPid(), ret, LibC.WNOHANG) with 250ms poll time took 26 seconds for 100 calls to complete, expecting 260 seconds to complete 1000 calls. Didn't wait.

blocking Java Standard Library ProcessBuilder took ~3.6 seconds to complete 1000 calls.

changing the code for #cleanupProcess to do a blocking wait LibC.waitpid(linuxProcess.getPid(), ret, 0) took ~2.8 seconds to complete 1000 calls. This is even better than Java ProcessBuilder.

non-blocking with dynamic poll time took around 5.5 seconds to complete. While not as fast as the blocking code, but not too bad.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant