NRPE master high load issue #244

younity-ENG · 2020-10-19T06:21:42Z

Hi all,

I'm using Nagios Core 4.4.3 with nagios-nrpe-plugin 3.2.1 installed on Ubuntu 18.04.
It is installed on an AWS EC2 t2.medium instance (2 vCPUs, 4 GB RAM). My server is configured with 3 check_workers because of my 2 CPUs.
It serves as an "on-site" server with direct host/service checks via VPN and as an NRPE master server.
The external commands are mostly ping plus around 4 HTTP/DNS checks.
There are around 100 direct services and 350 NRPE services (one host).
When adding more NRPE agents (400 services each), the master load rises and I'm getting "localhost load" alerts:
localhost/Current Load is CRITICAL: CRITICAL - load average: 1.31, 1.48, 4.01
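
For context on why this alert can fire even though the 1- and 5-minute values are low: check_load compares each of the three averages against its own threshold and reports the worst result. Below is a minimal Python sketch of that comparison. It assumes the thresholds from the stock localhost.cfg sample (warning 5.0,4.0,3.0 and critical 10.0,6.0,4.0); the thresholds actually configured on this server are not shown in the thread, and the helper name `classify_load` is hypothetical.

```python
import re

# Assumed thresholds (1, 5, 15 minute) from the stock localhost.cfg sample;
# the values configured on the server in this thread are not known.
WARN = (5.0, 4.0, 3.0)
CRIT = (10.0, 6.0, 4.0)


def classify_load(plugin_output, warn=WARN, crit=CRIT):
    """Return (state, averages) for a check_load style output line."""
    match = re.search(
        r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", plugin_output
    )
    if match is None:
        raise ValueError("no load averages found in plugin output")
    averages = tuple(float(v) for v in match.groups())
    state = "OK"
    for value, w, c in zip(averages, warn, crit):
        if value > c:
            # Any single average above its critical threshold is enough.
            return "CRITICAL", averages
        if value > w:
            state = "WARNING"
    return state, averages


state, averages = classify_load("CRITICAL - load average: 1.31, 1.48, 4.01")
print(state, averages)
```

Under these assumed thresholds, only the 15-minute average (4.01 > 4.0) trips the check, which would point to sustained load earlier in the window rather than a spike at the moment the alert was raised.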

While monitoring the server with htop, I see that CPU usage repeatedly reaches 100%.
I've looked online and found some recommendations, which didn't really help:

using check_fping instead of check_ping plugin
external_command_buffer_slots=512
use_large_installation_tweaks=1

Using htop, I see that the CPU spikes occur when external commands are executed.

Does anyone have any idea why my CPU usage is so high?
Shouldn't Nagios be able to handle thousands of services (with the right configuration)?
I'd appreciate any tips and recommendations.

thanks

younity-ENG · 2020-10-21T06:04:11Z

Hi,
Any ideas regarding this issue?

thanks

sawolf · 2020-11-06T21:56:56Z

Hi, thanks for reporting this. Can you elaborate on your current system architecture?

It sounds to me like you're saying you have

One Nagios Core server
100 direct services not related to NRPE
350 services using check_nrpe, all put on the same host, and possibly interacting with the same remote server.
and that you're adding additional remote servers, each of which results in you adding ~400 check_nrpe checks to your nagios config.

I guess my question is - how many of these agents are you adding before you see the CPU load increase?

Also, I recommend increasing the number of check_workers, since those will block on network requests. It may not affect anything, but if any of these plugins take a long time to execute, the worker will just be sleeping for that whole time.
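
As a rough way to reason about worker count, the number of checks in flight at any moment is approximately the check rate multiplied by the average check duration (Little's law). The sketch below is a back-of-the-envelope estimate in Python, not a description of how Nagios schedules work internally. The 5-minute check interval and the 2-second average plugin runtime are assumptions, not figures from this thread, and `estimate_in_flight` is a hypothetical helper.

```python
def estimate_in_flight(services, check_interval_s, avg_check_duration_s):
    """Estimate concurrently running checks via Little's law: L = rate * duration."""
    if check_interval_s <= 0:
        raise ValueError("check_interval_s must be positive")
    checks_per_second = services / check_interval_s
    return checks_per_second * avg_check_duration_s


# Service counts from the thread: 100 direct + 350 NRPE, then +400 per extra agent.
# Interval (300 s) and duration (2 s) are assumed values for illustration only.
for extra_agents in (0, 1, 2):
    services = 100 + 350 + 400 * extra_agents
    in_flight = estimate_in_flight(
        services, check_interval_s=300, avg_check_duration_s=2.0
    )
    print(f"{services} services -> about {in_flight:.1f} checks in flight")
```

If checks run slower than assumed (for example, check_nrpe calls waiting on a VPN link), the in-flight count grows proportionally, which is the situation where extra workers would be expected to matter.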

If you're adding a lot of these agents (so that you have 5000+ services), you might want to look into something like mod_gearman to distribute the work being done.

younity-ENG · 2020-11-08T09:38:25Z

Hi Sebastian,

Thank you for responding.
Basically, the average load has been getting high since adding the second agent.
I did increase my hardware resources and the number of workers (4 cores and 6 workers), but it didn't really help.
I understand you recommend trying mod_gearman at this scale.
I'm not familiar with this module.
Does it mean that the remote agent will be mod_gearman and not NRPE?

thanks

younity-ENG · 2020-11-08T10:52:51Z

Are you familiar with NRDP?

sawolf added the Need Information label Nov 6, 2020
