Support redundant Checkmk installations on slave sites
Currently only OS redundancy within the appliance is supported.
IMHO, products such as Pacemaker and DRBD are components of a typical datacenter. Nowadays, with cloud installations, the focus should switch from this OS-level approach towards a more application-based approach.
To start, I suggest the following for the slave sites in a distributed monitoring setup:
* Replication ("Push configuration to this site") should be possible to at least two sites
* If the primary slave site fails, the main site should automatically switch to the secondary slave site
* After the outage the secondary slave should stay alive and only switch back to the primary if needed
Of course, additional things such as support for main sites are welcome.
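The switching behaviour requested in the bullets above could be sketched roughly as follows. This is only an illustration of the idea, not anything Checkmk provides: `FailoverState`, `update`, and the site names are hypothetical, and the health results are assumed to come from some separate check.

```python
from dataclasses import dataclass


@dataclass
class FailoverState:
    """Tracks which of two remote sites the main site currently uses (hypothetical)."""

    primary: str
    secondary: str
    active: str = ""

    def __post_init__(self) -> None:
        # Start on the primary unless an active site was given explicitly.
        if not self.active:
            self.active = self.primary

    def standby(self) -> str:
        """Return the site that is currently not active."""
        return self.secondary if self.active == self.primary else self.primary

    def update(self, healthy: dict[str, bool]) -> str:
        """Return the site to use, given the latest health results.

        Switch only when the active site is down and the standby is up.
        A recovered former primary does not trigger a switch back.
        """
        other = self.standby()
        if not healthy.get(self.active, False) and healthy.get(other, False):
            self.active = other
        return self.active
```

With this logic the secondary stays active after the primary recovers, and the main site only returns to the primary if the secondary itself fails.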
Comments: 4
21 Aug, '22
Andy: "In case of failure"
That is the biggest problem today: Checkmk does not know when there is a failure. The appliance, for example, only pings its partner or another node to determine whether there is an error. I can stop Apache on the main server, for example, and no failover will occur.
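To illustrate the difference between a ping and an application-level check, here is a minimal sketch in which a site only counts as healthy if every probe passes. The names `tcp_port_open` and `site_is_healthy` are made up for this example and are not part of Checkmk or the appliance. A real check would probably also need to query the monitoring core itself.

```python
import socket
from typing import Callable

# A probe is any zero-argument callable that returns True on success.
Probe = Callable[[], bool]


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> Probe:
    """Build a probe that succeeds if a TCP connection can be opened."""

    def probe() -> bool:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            # Covers refused connections, timeouts, and name resolution errors.
            return False

    return probe


def site_is_healthy(probes: dict[str, Probe]) -> tuple[bool, list[str]]:
    """Run every probe and return overall health plus the names of failed probes."""
    failed = [name for name, probe in probes.items() if not probe()]
    return (not failed, failed)
```

A site where the host still answers pings but Apache is stopped would then be reported as unhealthy, for example by adding `tcp_port_open(host, 443)` to the probes (host and port are placeholders).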
22 Aug, '22
Robert Sander: I.e., you have a central site for configuration that stores no data but controls two identical remote sites. Both could be running in parallel, but you would get double the alarms.
The central site would monitor the primary site and switch on the secondary if the primary is not available.
You would just lose the history (recorded metrics and events) with every switch-over. This is why the appliance uses DRBD: to replicate this data.
20 Feb, '23
Thomas Lippert (Admin): From what I see, all three topics you listed are actually implemented:
* In a distributed setup, the configuration is sent to all remote sites
* If the remote site is implemented via two appliances in HA mode, the configuration is automatically shared between the two
* After an outage, the second site will start. There is no automatic switch back to the former primary site
What is not covered currently is an implementation of HA at the application level rather than the OS level.
Am I missing anything?
18 Oct, '23
Niklas Pulina (Admin): Hello,
Thank you for your idea. On this portal, we carefully evaluate ideas to ensure that they will benefit a wide range of users. Thus, we close ideas that fall under either of the following criteria:
- Suggestions with low user interest: created more than 1 year ago with 5 votes or less
- Suggestions with no momentum: no votes in the last 6 months
Unfortunately, this suggestion falls under these criteria, so we’re closing it (based on the data available until 2023-10-17). We appreciate your contribution and encourage you to continue to share your ideas. Your input plays a vital role in helping us improve our product for everyone.
Thank you for your understanding and continued support!
Warm regards,
Your Checkmk Team