Infrastructure issue
Incident Report for Eclipse Foundation Services

At approximately 20:50 (EDT) on July 21, 2021, a virtual-server host stopped responding, for reasons unknown. This server hosts, among other services, one of our internal DNS servers.

Although we run two DNS servers for redundancy, some services were configured to use only one of them (this is being rectified). It is unclear to us why certain hosted services did not query the backup server even when configured to do so. Regardless, while the host was down, the affected services were unable to resolve important hostnames, such as those used for user authentication.

The unresponsive host server and its guest VMs were brought back to service approximately 6 hours later, shortly after staff in the CET timezone became aware of the issue.

We will continue to test the specific conditions under which name resolution failed, and implement fixes to ensure that a single DNS server outage does not cause these problems again.
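
As an illustration of the configuration side of that follow-up, here is a minimal sketch in Python of a check that flags a host whose resolver configuration lists fewer than two distinct DNS servers. It assumes a Linux-style resolv.conf format; the function names and the example addresses are hypothetical and are not taken from our infrastructure. It checks configuration only and does not explain why some services failed to fall back to the second server.

```python
def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses found in resolv.conf-style text, in order."""
    servers = []
    for raw_line in resolv_conf_text.splitlines():
        # Ignore anything after '#' or ';' (treated here as comment markers).
        line = raw_line.split("#", 1)[0].split(";", 1)[0].strip()
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def has_redundant_dns(resolv_conf_text, minimum=2):
    """Return True when at least `minimum` distinct nameservers are configured."""
    return len(set(parse_nameservers(resolv_conf_text))) >= minimum


if __name__ == "__main__":
    # Hypothetical contents; 192.0.2.0/24 is reserved for documentation.
    single = "nameserver 192.0.2.10\n"
    redundant = "nameserver 192.0.2.10\nnameserver 192.0.2.11\n"
    print("single resolver is redundant:", has_redundant_dns(single))
    print("two resolvers are redundant:", has_redundant_dns(redundant))
```

A check like this could be run against each service's resolver configuration to find hosts that depend on a single DNS server.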

Posted Jul 22, 2021 - 11:04 EDT

This incident has been resolved.
Posted Jul 22, 2021 - 04:17 EDT
A fix has been implemented and we are monitoring the results.
Posted Jul 22, 2021 - 03:55 EDT
The fix is starting to deploy. Some services are returning to normal.
Posted Jul 22, 2021 - 03:50 EDT
We are continuing to work on a fix for this issue.
Posted Jul 22, 2021 - 03:40 EDT
The issue has been identified and a fix is being implemented.
Posted Jul 22, 2021 - 03:20 EDT
It seems that some people are also having issues with Gerrit and are unable to push or rebase patches.
Posted Jul 22, 2021 - 03:03 EDT
We have received reports of login issues.
Posted Jul 22, 2021 - 02:54 EDT
We are getting reports that some CI instances are unable to schedule agents. In addition, is not functional nor is the ECA checker. We're investigating.
Posted Jul 22, 2021 - 02:51 EDT
This incident affected: API, CBI, Core Services, and Working Groups Websites.