Integrated and automatic agent failover from one site to another (backup) site

6 votes

Hello,
In a distributed setup, if a site fails, all agents attached to it currently just disappear from the master GUI. There is no built-in way to migrate those agents automatically to another site. You can write your own script to do this, but such a script can run for hours if a site has, for example, 1000 hosts, because each agent has to be discovered again. I would like to see an integrated agent failover solution in which an agent is moved automatically to another site, ideally without the need for rediscovery. The backup site should be definable, possibly via a dropdown field in the host configuration. If the failed site comes back online later, it should be possible to switch back either manually or automatically (selectable).
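
To make the requested behavior more concrete, here is a rough sketch of the site selection logic in Python. It is not Checkmk code and does not use any Checkmk API; every name in it (`HostConfig`, `backup_site`, `Failback`, `resolve_site`) is hypothetical and only models the idea of a per-host backup site with a selectable manual or automatic switch back.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Failback(Enum):
    """Hypothetical setting: how a host returns to its primary site."""
    MANUAL = "manual"
    AUTOMATIC = "automatic"


@dataclass
class HostConfig:
    """Hypothetical host settings; not the real Checkmk host config."""
    name: str
    primary_site: str
    backup_site: Optional[str] = None  # e.g. chosen via a dropdown
    failback: Failback = Failback.MANUAL
    active_site: str = ""

    def __post_init__(self) -> None:
        # A host starts out on its primary site.
        if not self.active_site:
            self.active_site = self.primary_site


def resolve_site(host: HostConfig, site_up: Dict[str, bool]) -> str:
    """Update and return the site that should monitor the host."""
    primary_up = site_up.get(host.primary_site, False)
    on_backup = (
        host.backup_site is not None
        and host.active_site == host.backup_site
    )

    if not on_backup:
        # Fail over only if a backup site is configured and reachable.
        backup_up = (
            host.backup_site is not None
            and site_up.get(host.backup_site, False)
        )
        if not primary_up and backup_up:
            host.active_site = host.backup_site
    elif primary_up and host.failback is Failback.AUTOMATIC:
        # Primary is back: return automatically only if so configured.
        host.active_site = host.primary_site

    return host.active_site
```

With `failback` set to `MANUAL`, a host that was moved to its backup site stays there after the primary site recovers, until someone sets `active_site` back by hand. The sketch deliberately leaves out the hard part of the request, namely carrying over the discovered services so that no rediscovery is needed.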

As an enterprise customer, I would expect such a feature in the enterprise version of Checkmk, since I know it from other monitoring tools (e.g. IBM Tivoli Monitoring).

Under consideration | Site management | Suggested by: Christian Friedrich | Upvoted: 08 May, '23 | Comments: 3
