Make the Smart Ping smarter
From time to time, the Smart Ping reports a host as down for several minutes although the server itself is running and the agent can be queried successfully.
The Smart Ping should consider all network packets that indicate an active host. Alternatively, if a ping response is missing, it should perform a fallback test, for example a TCP check on port 22.
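To illustrate the kind of fallback being requested, here is a minimal sketch. It is not how Checkmk implements host checks; `tcp_port_open` and `host_is_up` are hypothetical names, and the default port 22 and the 3 second timeout are assumptions chosen for the example.

```python
import socket


def tcp_port_open(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    A refused or timed-out connection counts as a failure here,
    as it would for a plain port check.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def host_is_up(ping_ok: bool, host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Trust a successful ping; only run the TCP check when the ping failed."""
    if ping_ok:
        return True
    return tcp_port_open(host, port, timeout)
```

The TCP connection is only attempted when the ping result is negative, so hosts that answer pings cause no extra traffic.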
Comments: 8
-
11 May, '22
Robert Sander
The icmpreceiver process already treats every packet that comes from the host's IP address as an indication that the host is UP.
-
16 May, '22
Mike1098
The behavior is configurable with the rule "Settings for host checks via Smart PING".
I would recommend opening a topic in the forum to discuss this. For us it is working as Robert described.
-
19 May, '22
[t29] Anastasios
Hi Lars,
Thanks to Robert and Mike. I agree with the answers and can point you to the docs. By default, the Smart Ping runs every 6 seconds and expects at least one packet from the host within 2.5 check intervals (15 seconds). With the rule "Settings for host checks via Smart PING" you can adjust that behavior.
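As a rough illustration of the default timing mentioned above, the following sketch models the arithmetic only. It is not the core's actual implementation; `HostState`, `timeout_seconds`, and `is_up` are hypothetical names, and the two constants are simply the defaults quoted in this comment.

```python
from dataclasses import dataclass

CHECK_INTERVAL_S = 6.0  # default check interval quoted above
TIMEOUT_FACTOR = 2.5    # default number of intervals to wait for a packet


@dataclass
class HostState:
    last_packet_at: float  # timestamp (seconds) of the last packet seen from the host's IP


def timeout_seconds(interval: float = CHECK_INTERVAL_S, factor: float = TIMEOUT_FACTOR) -> float:
    """Longest allowed silence before the host stops counting as up."""
    return interval * factor


def is_up(state: HostState, now: float) -> bool:
    """The host counts as up while its last packet is within the timeout window."""
    return (now - state.last_packet_at) <= timeout_seconds()
```

With the defaults, the window is 6 s × 2.5 = 15 s, so a host that was last seen 10 seconds ago still counts as up, while one last seen 16 seconds ago does not.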
If your devices really are being reported as down, I would recommend discussing that in a forum thread or a support case, or taking a look here: https://kb.checkmk.com/display/KB/Debugging+of+smart+ping
Best Regards
Anastasios
T29
-
23 May, '22
Lars Sörensen
Thank you very much for your feedback.
Unfortunately, we had several cases where the Smart Ping reported a host as unreachable for several minutes because no ICMP ping response was received, while the agent was successfully queried via SSH (using multiplexing) several times during that period. The SSH traffic should also have counted as evidence that the host is up, but it did not (@Anastasios -> SUP-5973).
However, the idea here is to react more intelligently to such situations (if the customer so wishes) by using an alternative check such as check_tcp or check_udp to verify that the host is really unresponsive.
This does not add much overhead, as the more complex check would only need to be performed when a host appears unreachable. It would make the detection of unreachable hosts more robust and reduce false host-down alarms and wrong figures in the service availability report.
-
23 Jun, '22
Marsellus Wallace
Hi Lars,
I did not have a look at your ticket, but if the TCP packets really did not make the core set the host to UP, I'd consider this a bug.
What should be checked:
* Is the affected host configured with a specific/explicit IP address?
* Is the SSH connection made to the host's IP (which should be the one CMK also uses to ping the host), or
* Is the SSH connection made to the host's name (which DNS could resolve to a different IP than the one CMK currently knows for this host and therefore uses for ping)?
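The checks above can be sketched as a small script that compares the addresses a host name resolves to with the IP configured for the host in the monitoring. This is only a diagnostic sketch, not a Checkmk tool; `resolve_ipv4` and `name_matches_configured_ip` are hypothetical names, and it assumes IPv4 only.

```python
from __future__ import annotations

import socket
from typing import Callable


def resolve_ipv4(name: str) -> set[str]:
    """Return all IPv4 addresses the system resolver reports for name."""
    try:
        infos = socket.getaddrinfo(name, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return {info[4][0] for info in infos}


def name_matches_configured_ip(
    name: str,
    configured_ip: str,
    resolver: Callable[[str], set[str]] = resolve_ipv4,
) -> bool:
    """True if the name used for SSH resolves to the IP the monitoring pings."""
    return configured_ip in resolver(name)
```

If this returns False for the name used in the SSH connection, the SSH traffic goes to an address other than the one being pinged.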
If the SSH connection goes to the host's name and that name resolves to a different IP, this would explain why the reply packets of the SSH connection do not reach the Smart Ping detection: the IP of the packets cannot be mapped to the host. The same applies if the reply packets are sent from a different source IP than the one contacted (for whatever reason, maybe routing, if that is possible).
-
25 Sep, '22
Thomas Lippert Admin
The requested feature is already implemented (see discussion). If it does not perform as expected, please open a support ticket for evaluation. I will close this feature request.
-
26 Sep, '22
Mike1098
Hello Thomas, it would be helpful to have the Werk here as a reference.
Also, the status is still "Not Planned" and the request has only 8 votes :-O
-
26 Sep, '22
Thomas Lippert Admin
Hello Michael, from the product side, we strongly believe that this request is about a bug. The feature as requested exists, so it makes no sense to keep the request open. If our investigation of the support case reveals a feature gap in our implementation, we can reopen the request.