Buildbot/OutageReports
We started collecting Outage Reports for Tinderbox last year as a way to determine which intermittent failures we were hitting on each platform. This let us track failure patterns over time and helped us figure out where the highest-value fixes were.
Many of the errors are difficult or perhaps even impossible to fix (e.g. toolchain hangs on Windows), but a history of outage reports with sufficient diagnostic information allows others (e.g. IT) to restart a hung system without our intervention.
Sep 2007
Outage Template
On 2007-08-30 at 17:37, qm-winxp01 and qm-win2k3 experienced a service outage for 20 minutes.
What was affected:
unittests on qm-winxp01 and qm-win2k3
What was the cause of the outage:
The checkout step was unable to roll the cvsco.log file:
mv: cannot move `/cygdrive/c/slave/trunk/cvsco.log' to `/cygdrive/c/slave/trunk/cvsco.log.old': Permission denied
make: *** [checkout] Error 1
Has this type of outage happened before?
No
What will be done to prevent this in the future:
Monitor for recurrence
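The cause in this report is a failed log roll: before checkout, the existing cvsco.log is renamed to cvsco.log.old, and on these slaves the `mv` in the Makefile's `checkout` target was refused with "Permission denied", which aborted the build. One possible reason on Windows is that another process still had the file open, though the report does not confirm this. The following is a minimal sketch of the same roll step in Python, assuming a hypothetical `roll_log` helper (it is not the code the slaves actually ran; they used `mv`). It returns the error as data instead of failing, so a wrapper could log it or retry rather than aborting the checkout.

```python
import os


def roll_log(log_path: str) -> tuple[bool, str]:
    """Hypothetical helper: rename log_path to log_path + '.old'.

    Any previous '.old' file is replaced. Returns (True, "") on success,
    or (False, message) if the rename is refused with a permission error.
    """
    old_path = log_path + ".old"
    if not os.path.exists(log_path):
        # Nothing to roll, e.g. on a freshly set-up slave directory.
        return True, ""
    try:
        # os.replace overwrites an existing old_path on both POSIX and Windows,
        # unlike os.rename, which fails on Windows if the target exists.
        os.replace(log_path, old_path)
    except PermissionError as exc:
        # Report the failure instead of raising, mirroring the mv error text.
        return False, f"cannot move {log_path!r} to {old_path!r}: {exc.strerror}"
    return True, ""
```

Returning a status rather than raising keeps the decision about what to do next (retry, skip the roll, or page someone) with the caller; whether any of those is appropriate here is left open, since the report's chosen action is only to monitor.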