While we were scaling up the OKD cluster with new worker nodes, a typo slipped into a DHCP configuration. It did not cause any issues when the configuration was deployed, only once the leases were renewed. The leases started to renew this morning, which made some services running on the cluster unavailable.
Most services were quickly restored, but some nodes kept behaving erratically. We’ve now re-synced the network configuration of all the nodes, and all services are up and running.
We will investigate how to prevent an issue with the DHCP servers from taking down services as it did this morning.
Thanks for your patience while we worked this out.