Several Jenkins instances are down
Incident Report for Eclipse Foundation Services
Resolved
This incident has been resolved.
Posted Feb 04, 2022 - 15:56 EST
Update
Everything seems stable, but we will continue to monitor things.
Posted Feb 04, 2022 - 09:12 EST
Monitoring
The update is complete. We are now monitoring.
Posted Feb 03, 2022 - 17:04 EST
Update
The interim update has completed successfully. We are now performing the next update.
Posted Feb 03, 2022 - 14:53 EST
Update
This seems to be related to the most recent cluster update. We are going to do an emergency update to the latest stable release to attempt to address this issue.
Posted Feb 03, 2022 - 12:12 EST
Identified
Another machine just went down. We're on it.
Posted Feb 03, 2022 - 04:15 EST
Monitoring
All Jenkins instances have been back online for a while now.

We are monitoring the situation.
Posted Feb 03, 2022 - 04:14 EST
Identified
More and more Jenkins instances are coming back online as we move them to different nodes. The root cause is not yet identified, but our current procedure mitigates the impact on end users. Once all Jenkins instances are back online, we will open a ticket to track down the root cause.

Note that we're still affected by https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/682, so if your jobs stay in the queue for several minutes while no other job is being executed, please open a ticket at https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues so that we can restart your instance and bypass the startup race condition. Thanks!
Posted Feb 03, 2022 - 03:32 EST
Update
The build cluster has been very unstable since the last upgrade on Sunday. Machines are becoming "not ready" for no apparent reason.

In this case, Jenkins instances are not automatically re-scheduled until re-scheduling is manually forced (to ensure that no two instances write to the same Jenkins home).

We are in the process of draining the failed nodes so that instances get re-scheduled on working ones. We do that slowly to avoid storming the remaining machines.

We are investigating the root cause.
Posted Feb 03, 2022 - 02:56 EST
Investigating
We are currently investigating this issue.
Posted Feb 03, 2022 - 02:47 EST
This incident affected: CBI (ci.eclipse.org).