Tree Closures
Revision as of 03:42, 7 March 2008
Overview
Whenever the main tinderbox tree has to be closed, please record the date, the close start time, a rough time when the problem first started (if different from the close start time), and eventually, a tree open time. We need this information in order to track infrastructure problems, and try to resolve them in the future.
Please keep all times in Mozilla Standard Time (US Pacific, same time as on tinderbox). Put more recent closures on top of old ones. Please include links to any relevant bugs.
Closures
- Thursday March 6, 2008
- 7:00pm - 7:40pm
- Closed due to fx-win32-tbox bustage. Cleared a stuck process and reopened on the assumption that it was the cause, rather than waiting another hour for the PGO-lengthened build to finish
- Wednesday March 5, 2008
- 2:40pm - 4:00pm
- Closed for orange on win2k3-01 that looks like memory corruption
- Caused by bug 418703, backed out
- 8am - 11:30am
- bugzilla/CVS down, closed tree for talos bustage because of no-CVS
- Wednesday, February 27, 2008
- 9:30pm - 12:10am
- Overlooked 2 tests that had also broken during the day (bug 420028 due to removal of DOMi, bug 384370 due to incomplete backout) and were not focus issues
- Box has some other test/build failures that went away by themselves after being kicked
- 7:40pm - 9:30pm
- Windows box was orange with focus problems (bug 420010)
- Missed from earlier in day due to expected issues while PGO was landing
- 5:30pm - 7:40pm
- Closed to do Linux kernel upgrades (bug 407796)
- Quiet tree due to B4 freeze, and holding approval1.9b4 flags for enabling PGO on Windows, so closure has minimal impact
- Took a bit longer than expected due to a reboot problem (bug 420007)
- Tuesday, February 26, 2008
- 22:38 - 00:49
- problem started at 22:33 when qm-win2k3-01 turned red again
- same problem as earlier in the day: "Failure: buildbot.slave.commands.TimeoutError: SIGKILL failed to kill process" during tests
- myk closed the tree around 22:38 to wait out the bustage, since the machine is our only Windows unit test box
- got stuck with an open file such that each build failed really quickly
- bug 419799 filed to get sysadmin help to fix unit test box
- mrz, who was on call that evening, jumped into IRC and kicked buildbot: he first closed a dialog about a crashed process, then restarted buildbot, but it didn't start building; he then killed both buildbot and its cmd process and restarted it, after which it started building and completed successfully
- dbaron reopened the tree to metered checkins of b4 blockers
- Tuesday, February 26, 2008
- 18:44 - 19:40
- problems started with linux txul perf regression
- continued with fxdbug-win32-tb reporting: ###!!! ASSERTION: invalid active window: 'Error', file e:/builds/tinderbox/Fx-Trunk-Memtest/WINNT_5.2_Depend/mozilla/embedding/components/windowwatcher/src/nsWindowWatcher.cpp, line 1086
- continued with five cross-platform unit test failures, three reftests and two mochitests
- reed backed out bug 419452, one of two candidates for the perf regression
- dbaron fixed the three reftests and one mochitest, which were from his checkin for bug 363248
- myk backed out sicking's fix for bug 416534, which had caused the last mochitest failure
- various folks speculated that the fxdbug-win32-tb assertion was random (it didn't show up on Mac or Linux)
- myk reopened the tree, feeling that things were under control
- reed backed out second perf regression candidate (bug 395609) when initial backout didn't resolve it
- sicking fixed test failure in bug 416534 and relanded
- others started landing again
- unit test tinderboxes cycled green
- reed's second backout fixed perf regression
- reed's first or second backout also fixed the fxdbug-win32-tb assertion
- Tuesday, February 26, 2008
- 5:10pm - 5:46pm
- problem started at 5:06pm when qm-win2k3-01 turned red
- reed said that machine frequently hits this random bustage and then recovers
- reed previously noted at the top of the page that "qm-win2k3-01 is the only Windows unit test machine, so if it is orange or red, you should NOT check in."
- myk, the sheriff for the day, closed the tree to wait out the apparently random failure and then reopened it when the next build came up green
- reed thought there might be an old bug on the problem but wasn't sure, so dbaron filed bug 419761 to make sure it's tracked and not forgotten
- wolf also filed bug 419759 on fixing or replacing winxp01 so we aren't entirely reliant on win2k3-01 for Windows unit tests
- Sunday, February 24, 2008
- 11:30am - 4:45pm (with orange lasting longer, with tree open)
- problem started 9:23am, amid some other bustage
- closed by dbaron a little after 11:30am
- filed bug 419328: Windows unit test box stopped cycling
- aravind hard-rebooted the box; it came back with a bunch of popup tests orange
- tests still orange after another cycle (forced by dbaron)
- dbaron reopened tree 4:45pm despite the unfixed machine-related orange
- joduinn rebooted the box again around 4:30pm
- this time it came back with the color depth wrong, so the PNG reftests failed (but the mochitests worked)
- color depth issue fixed 6:45am Monday
- Thursday, February 21, 2008
- 10pm - 12am
- Dietrich
- Closed to facilitate l10n string freeze
- Tuesday, February 19, 2008
- 9:10 PM - 10:45 PM
- No sheriff (night)
- Closed due to at least three people landing on orange (seemingly random TUnit failure on Windows). The tree was too wide for anyone to notice the orange. A mochitest timeout on Linux, also seemingly random, occurred almost immediately after.
- 1:00 AM - 3:00 AM (guess)
- No sheriff (night) (guess)
- Closed for experimental landing of bug 399852. The checkin stuck.
- Wednesday, February 13, 2008
- 1:00 PM - 1:40 PM (problem first noticed around 9:40 AM)
- No sheriff
- bug 417313 -- graphs.mozilla.org can't keep up with data being submitted
- reopened after machines started going green; db load lessened, but underlying issue has not been fixed.