Tree Closures
Tree closure dates are no longer recorded here; please see the logs and the live status for your tree at https://treestatus.mozilla.org/.

The historical closures below record the date, the closure start time, a rough time when the problem first started (if different from the closure start time), and eventually a tree reopening time; this information was used to track infrastructure problems and try to resolve them. All times are in Mozilla Standard Time (US Pacific, the same time zone used by tinderbox), more recent closures appear above older ones, and links to relevant bugs are included where available.
 
== 2012 ==
* July 26: Unable to access build summaries/logs, {{bug|777634}}
* July 22: {{bug|776142}}
* April 25: closed due to hg.mozilla.org flakiness, {{bug|748939}}
 
== 2011 ==
* Sept 28: Closed to land PRBool -> bool switch. {{bug|675553}}
* Aug 5: Closed again due to tinderbox messages queue explosion
* Aug 3: Closed due to tinderbox messages queue explosion {{bug|676219}}
* Jul 29: Closed due to Stage server bustage {{bug|675170}}
* Jul 28: Closed due to Android ndk deployed with wrong build config - {{bug|674855}}
* Jul 1: Closed because Android had a permaorange and we don't have enough Tegra builders
* June 1: Closed due to issues with surf - {{bug|661386}}
* May 24: Closed due to DNS issues - {{bug|659238}}
* May 23: Closed because people wanted to land lots of patches before the Aurora merge
* May 6: Closed, {{bug|655197}}.
* Apr 18 - 7:30-9:50: Closed due to Linux PGO landings that had some bumpiness - {{bug|559964}}
* Apr 18 - 5:14-7:30: Closed due to {{bug|653405}} and then trying to land the fix
* Mar 1: Closed tree due to tbpl not reporting results. {{bug|637594}}
* Feb 16: Closed to land {{bug|626602}} and {{bug|629799}}.
 
== 2010 ==
* Dec 23: Closed tree due to broken w32 builds, see {{bug|621183}}
* Sep 26:  Enabled universal 10.6 OS X builds and updates
* Sep 19:  650 Castro network upgrades
* Aug 10:  Builds not showing up on tinderbox {{bug|586179}}
* Aug 5: Closed tree due to tinderbox timeouts & large build backlog {{bug|584365}}
* June 6 - June 8: Closed tree due to [http://groups.google.com/group/mozilla.dev.tree-management/browse_thread/thread/fe821769f03d09d6# hg file rename] fallout.
* May 25 11:30 am - 3:20 pm: Closed due to outages from this morning's MPT connectivity issues. {{bug|568005}}
* May 11 2:30 pm: Closed because builds weren't building and a backlog of unbuilt changes was building up (bsmedberg)
* May 9 3am - : Closed because Windows slaves can only stay connected for 3 minutes {{bug|555794}}
* May 8 6pm - May 9 3am: Closed because talos slaves that weren't ready to be put in production had been put in production and were failing from not having hg {{bug|564658}}
* May 8 10am - 6pm : Closed because nobody bothered to tell us that the talos slaves had been switched back to the old connection
* May 7 11pm - May 8 10am : Closed because the talos slaves weren't switched back to the old connection, and were still failing
* May 6 7pm - May 7 11pm : Closed for "replacing the Netscreen SSG in San Jose with a Juniper SRX240 to better handle the VPN traffic from the build cluster in Castro." {{bug|563968}}. (And then reverting the change.)
* Apr 29 12pm : Closed to land the add-ons manager rewrite
* Apr 17 2pm : Netapp upgrade, taking hg, svn, others, offline {{bug|502151}}
* Apr 15 11am : Mountain View lost connectivity, taking talos offline {{bug|559617}}
* Apr 9 7pm - 8pm : Mac debug tests missing {{bug|558501}}
* Apr 8 10:30pm - Apr 9 8am : Mac opt tests missing {{bug|558258}}
* Apr 2 8am-9amPDT : scheduled talos downtime {{bug|555327}}
* Mar 29 2pm - 7pm : Windows builders timing out uploading builds {{bug|555794}}
* Mar 18 9pm - Mar 19 1:35 PDT: Linux debug tests missing {{bug|553769}}
* Mar 18 5:30-7pm PDT - Graph server needed to have its ears boxed {{bug|553750}}
* Mar 15 2pm - Mar 16 4pm, PDT - Connectivity issues between MV and Mpt caused perma-red on many Talos and build boxes {{bug|552506}}
* Mar 12 - Graph server bustage again. (IT replaced graph server box to fix, I think)  (Closed from approx 9am - 5pm PST)
* Mar 10 0800 - graph server bustage again {{bug|548371}}
* Mar 8 - graph server issues required closing of all trees {{bug|548371}}
* Mar 4 - buildbot master problems on an exceptionally busy day resulted in lost builds, need to figure out which changes are causing the orange
* Feb 12: [https://bugzilla.mozilla.org/show_bug.cgi?id=543034 Windows compiler crashes during PGO] are happening so frequently that we haven't had an opt build to test in nearly 20 hours.
 
== 2009 ==
* Tue Dec 1 4:00pm PST - 7:50pm - buildbot master was overloaded & misreporting failures ({{bug|532228}}), followed by a colo HVAC issue
* Tue Nov 24 6:05pm PST - 8:15pm PST - buildbot master restart needed for Firefox 3.6b4 builds
* Fri Nov 20 6:05pm PST - 1.9.2+mobile for Mountain View power outage {{bug|524047}}
* Thu Nov 19 4:10pm PST - 5:45 PST - ftp.m.o & stage.m.o broken {{bug|529961}}
* Mon Nov 2 1:59pm PST - 2:32pm PST - Closed to let HG recover from a DoS
* Thu Oct 22 5:48am PDT - 12:30pm PDT - Closed for scheduled testing of split mochitest setup
* Sun Oct 11 9:38pm PDT - Mon Oct 12 2:40am PDT - Only one Windows build slave still working ({{bug|521722}}), trying to cover all the builds for trunk and 1.9.2 and falling further and further behind
* Fri Sep 25 3:41am PDT - 9:47am PDT - Lots of orange. Caused by fallout from {{bug|473506}} (backed out), with some contribution from {{bug|518274}} (test disabled).
* Thu Sep 24 5:14am PDT - Scheduled RelEng maintenance
* Sun Aug 09 4:15pm - midnight PDT - Air conditioning failure at colo, {{bug|509351}} for unfixed fallout
* Wed July 22 6am - 11am PDT - Adding electrolysis branch {{bug|500755}}, Maemo TraceMonkey builds {{bug|505219}}, enabling TraceMonkey leaktest builds {{bug|504435}}
* Wed July 21 10:15pm - 11:30pm - {{bug|505669}}, buildbot master died
* Sat Apr 18 11am - ? Talos downtime, roll-out of {{bug|480413}} (design test to monitor browser shut down time)
* Wed Mar 25: 3:30pm - Thu 4:45am, too much randomness while storage array recovering {{bug|485123}}
* Tue Mar 17: 11:50 am - 2pm build timeouts due to swapping? {{bug|472463}} is related
* Tue Mar 3: 11:30am - ??
 
== 2008 ==
** Closed by platform meeting to let Beta 3 blockers land cleanly
* Wed Jan 9: 10:30am (approx) - ?
** Closed because of orange in the tree.
* Wed Jan 6: 14:30 (approx) - 15:07
** Closed because of orange and red trees.
* Wed Dec 31: 9:30am (approx) - 17:25 PST
** Closed because of DHCP problem in build network. See details in {{bug|471679}}.
* Tuesday Dec 23, 19:30 - 20:20 PST
** Closed for Joe Drew to work on {{bug|455508}} - 20% Tp regression on linux, September 5
* Tuesday Dec 23, 14:00-17:30 PST
** Talos maintenance {{bug|463020}}
*** https://bugzilla.mozilla.org/show_bug.cgi?id=470113 (move talos jss machines from firefox3.0 to moz-central)
*** https://bugzilla.mozilla.org/show_bug.cgi?id=379233 (pageload tests should flush out layout before timing load end)
**** backed out, caused browser freeze ups
*** https://bugzilla.mozilla.org/show_bug.cgi?id=444174 (more readable/compact talos output to waterfall)
* Friday Dec 12, 06:00-10:57
** planned release team maintenance downtime (06:00-08:00), see [http://groups.google.com/group/mozilla.dev.planning/browse_thread/thread/a2add6959ff0bda8# the dev.planning post] for details
** unit test boxes did not cycle green until 10:57
* Tuesday Dec 9, 13:50-Wed 02:25
** Emergency ESX maintenance (noted by Aravind in dev.planning) took out boxes like the graph server and caused rampant network-related bustage.
** Could have reopened much earlier than this; there was just no one around to do it
* Monday, Dec 8, 10:25 - 13:11
** reftests had been broken for 10.5 hours due to an error in a manifest file that wasn't actually causing orange ({{bug|468476}})
** closed tree until fix cycled to prevent more from piling on
** hg.mozilla.org also misbehaving (pushlog db locked); hard to push fixes or load pushlog
* Friday, Dec 5, 18:20 - 19:05
** Waiting on Windows talos machines to start a run that includes the perf-sensitive [http://hg.mozilla.org/mozilla-central/rev/a0c0ed9f461f changeset a0c0ed9f461f]. (Talos had ignored the last 6 completed builds)
* Thursday, Dec 4, 15:00 - 20:10
** {{bug|468014}} Investigating mozilla-central Vista TS, TP3, and TSVG increases. This was likely due to rebooting the Vista talos servers ({{bug|463020}}), especially since it brought these numbers up to around the same range as 1.9.1, and as XP on both 1.9.1 and 1.9.2
** Rebooted 1.9.2 Vista talos boxes and waiting on the results.
** After further investigation it appears that rebooting the talos systems caused this. See {{bug|468014}} for more details.
* Thursday, Dec 4, 12:10 - 13:10
** Talos server reboots performed by Chris AtLee - catlee {{bug|463020}}, {{bug|467797}}, and {{bug|467796}}
* Tuesday, Nov 12, 15:35
** null pointer dereference causing crashes on linux and OSX leak test build boxes; {{bug|464571}}
* Tuesday, Nov 12, 08:00
** Tinderbox and various other infrastructure down
* Friday, Nov 7, 0200 - 05:00
** Backing out changesets to find the cause of the 10% Ts regression on OSX
* Friday, Oct 24, 09:00 - 12:30
** Talos Maintenance - {{bug|443979}}, {{bug|459598}}, and {{bug|457885}}
* Sunday, Sep 28, 17:54 -
** performance regression tracked in {{bug|457607}}
* Friday, Sep 26, 13:54 - 15:32
** reftests orange due to botched reftest.list change by sgautherie
** windows still leaking from sdwilsh's landing and backout
*** required multiple corrections to patch to {{bug|455940}}
* Friday, Sep 26, 10:45 - 13:52
** sdwilsh is trying to land the places fsync work again and wants the tree closed for stable perf numbers
** sdwilsh backed out
** tree remained closed for tracking down performance regression from day before, {{bug|457288}}
** sdwilsh backed out more
* Wednesday, Sep 24, 12:00 - 7:00
** New Windows boxes moz2-win32-slave07 / 08 are orange due to leaks
** Old qm-win2k3-moz2-01 box had been leaking too
** Tracked down to {{bug|454781}} from 9/20, which had unfortunately landed in the middle of a period with other bustage and leaks. Fun!
* Tuesday, Sep 23, 9:00
** Closed because of MPT power outage.
* Monday, Sep 22, 3:30
** Closed for {{bug|456463}}
* Thursday, Sep 18, 6:30 - Friday 8:00
** Places fsync work will be landing once the tree gets greener - possible perf regressions
** Windows unit test boxes appeared to be hanging, so places fsync was backed out
** Places fsync work backout caused leaks; clobber requested {{bug|455934}}
** Closed for {{bug|455791}}, resolved by backing out {{bug|454735}}.
** Talos bustage may or may not be fixed, see discussion in {{bug|455791}}... but it's all green currently, so reopening the tree
* Tuesday, Sep 16, 12:50 - Wednesday, Sep 17, 3:20
** sdwilsh's third landing of the new SQLite {{bug|449443}} is still causing a huge Ts spike on Linux, despite no hit on tryserver.
** Rather than immediately backing it out, we are trying to gather a little bit of data to help him understand what's going on, since he can't reproduce this offline.
** First step is clobbering the linux machines that feed talos, since tryserver is always a clobber build, but the tinderbox machines aren't, and that's the only real difference (identical images, same hardware).
* Friday, Sep 12, 10:00 - 11:45
** Fix up windows talos boxes {{bug|430832}}, {{bug|419936}}
* Thursday, Sep 11, 13:00 - 17:30
** memory usage regression (working set/rss) {{bug|454865}} - Started on 09-09-2008 ~ 18:40.
* Tuesday, Sep 9, 15:30 -
** Talos maintenance {{bug|419935}}
* Thursday, Sep 4, 16:00 - 19:45
** Scheduled downtime to make talos mac machines boot cleanly {{bug|419935}} {{bug|419933}}
* Tuesday, Sep 2, 16:15 -
** Closing tree to get it green in order to land tracemonkey updates, and update tinderbox.
* Tuesday, Aug 26, 05:40 - 10:00
** Tree closed to track the perf impact of landing {{bug|432131}}
* Wednesday, Aug 13, ~8am - 10:20pm
** Tree closed due to perf regression ({{bug|450401}}). Unable to find the cause; reopened the tree.
* Tuesday, Aug 12, 8:00 -
** Scheduled unit-test master migration/downtime
* Saturday, July 26, 09:22 - 11:53, 12:30 - Sun 04:30
** Mac OS X builder out of disk space, {{bug|448115}}
** problem seems to have gone away on its own, although disk space probably still low
* Friday, July 25, 09:45 - Saturday, July 26, 09:10
** talos machines all broke due to problems with stage.mozilla.org
*** no ETA given
** turned green around 19:00-20:00
** turned red again around 21:30
*** dbaron filed {{bug|448079}}
** {{bug|448019}} had already been filed earlier in the day, but not linked from here or tinderbox
** hardware on stage was replaced; talos went green again
* Friday, July 18, 20:30-23:00
** brendan checked in a patch ({{bug|445893}}) that made xpcshell hang or crash on Windows
** {{bug|446143}} filed to get tinderboxes fixed
*** since this requires manual maintenance, see {{bug|445578}}
* Wednesday, July 16, 10am-5pm
** tree effectively closed most of the day due to multiple sources of orange
*** no active fixing until around noon, when dbaron backed out {{bug|431842}}
** Windows tinderboxes needed manual maintenance ({{bug|445571}}) after xpcshell test hang
*** filed {{bug|445578}} on making this case not require manual maintenance
** filed {{bug|445610}} on making it more likely that multiple simultaneous failures will all be caught
* Friday, July 11, 4:10pm-6:10pm
** multiple failures on linux and windows unit tests prompted closure. Backed out a test change that broke other tests that relied on the changed one.
* Friday, July 11, 7:30am - 11:30am PDT
** both linux and both windows test boxes were orange, so the tree was closed.
** WINNT 5.2 mozilla-central qm-win2k3-moz2-01 dep unit test went green all by itself
** Linux mozilla-central qm-centos5-moz2-01 dep unit test went green all by itself
** The browser window was not focused for WINNT 5.2 mozilla-central qm-win2k3-03 dep unit test (45 reftests were failing).  Box was stopped, focus restored, and a new test run was kicked off by bhearsum (no bug filed).
** Linux mozilla-central qm-centos5-03 dep unit test went green all by itself.
** Linux mozilla-central qm-centos5-moz2-01 dep unit test went orange
*** leaked 124036 bytes during test execution
** WINNT 5.2 mozilla-central qm-win2k3-03 dep unit test went orange
*** 28 reftest failures
** Linux mozilla-central qm-centos5-03 dep unit test went orange again failing [http://mxr.mozilla.org/mozilla-central/source/toolkit/components/downloads/test/unit/test_bug_406857.js test_bug_406857.js]
** Linux mozilla-central qm-centos5-moz2-01 dep unit test still orange
*** leaked 124036 bytes during test execution
*** failed an xpcshell test case ([http://mxr.mozilla.org/mozilla-central/source/toolkit/components/downloads/test/unit/test_sleep_wake.js test_sleep_wake.js]) with lots of Gdk-CRITICAL assertions
*** failed one chrome test ({{bug|443763}})
** WINNT 5.2 mozilla-central qm-win2k3-03 dep unit test still orange.
*** No more test failures.
*** leaked 292389 bytes during test execution
** Linux mozilla-central qm-centos5-moz2-01 dep unit test still orange.
*** failed one chrome test ({{bug|443763}})
*** leaked 124036 bytes during test execution
** Linux mozilla-central qm-centos5-03 dep unit test went green.
** Linux mozilla-central qm-centos5-moz2-01 dep unit test.
*** Tree re-opening since we have coverage on at least one windows machine for metered checkins (11:30am)
** WINNT 5.2 mozilla-central qm-win2k3-03 dep unit test went green (11:36am)
 
* Thursday, July 10, 12:49pm - 10:44pm PDT
** qm-moz2mini01 went orange, reporting 300k of leaks, and enough other tinderboxes were orange or red (although the issues were understood and being addressed) to warrant the precaution of closing the tree while investigating qm-moz2mini01.
** qm-moz2mini01 subsequently went green in the following cycle without explanation.
** After that, the sheriff (Myk) held the tree closed until at least one of the two windows unit test boxes (which had both been clobbered to resolve a residual problem from an earlier checkin that had been backed out) finished building successfully.
** But those machines both went orange with 45 MochiTest failures, so the sheriff had the four patches since the previous build backed out.
** After those patches were backed out, the next cycle, which included all the backouts, showed the same problem.
** The 45 failures all looked popup-related, so maybe the wrong thing was focused on the test machines.
** The sheriff requested another clobber from IT.
** IT performed another clobber, which didn't work.  IT also confirmed that the machines were in the appropriate state (the cmd window open and minimized, no other windows open) after both clobbers (although later there was discussion that perhaps the window was still focused after minimization, and perhaps it was necessary to click on the desktop to unfocus the window).
** The sheriff escalated to build, cc:ing robcee and bhearsum on the latest bug about the clobber ( {{bug|444674}}), per [[Unittest:Win2k3:Moz2:ITSupport]].
** lsblakk did a source clobber, and qm-win2k3-moz2-01 cycled green after that (with qm-win2k3-03 expected to do so as well once out-of-space issues were resolved by coop), but qm-pxp-fast03 and mozilla-central turned red in the meantime, so the sheriff left the tree closed and went to look at those.
** The problem looked related to various IT maintenance that evening (kernel upgrades on hg, ftp, and other servers as well as some DNS changes), so the sheriff waited.
** qm-pxp-fast03 and mozilla-central turned green on their next cycle, so the sheriff reopened the tree.
* Tuesday, July 8, 9:30am PDT-1:15pm
** unit-test failures caused by a typo in code, plus a tinderbox that hung due to a code error and didn't come back up correctly (no display)
* Friday, July 4, 5am PDT - 10am PDT
** Closed for The Great Tinderbox Move {{bug|441945}}
* Tuesday, July 1, 02:04 - 10:57
** {{bug|442875}} - graph server caused most things to go orange
** {{bug|442887}} - 1.9-only: qm-xserve01 needs repair
** {{bug|442843}} - trunk-only: qm-moz2-unittest01 is out of space
* Wednesday June 25, 08:20 - 12:20
** Planned downtime for VM host maintenance
* Sunday, June 8, 11:55 - Tuesday, June 10, 15:57
** Linux build tinderbox and unit test machine went read-only around 5am
** all talos machines stopped reporting around 3:30am
** filed {{bug|437877}}
** filed {{bug|437893}}
** netapp migration started a day early (Sunday) due to failures
** Tuesday morning status:
*** unit test machines intermittently failing leak check on mochitests on their return
*** talos machines occasionally appearing but still not functional
* Saturday, June 7, 9:00 - 14:27
** windows builders (main and debug) both went red due to open files
** filed {{bug|437785}}
* Friday, June 6, 15:00 - 23:15
** brendan broke the tree in two different ways
*** windows crashing
*** failing JSON test
** DNS outage slowed down the fixing
* Friday, June 6, 5:15 - 9:00
** Scheduled closure to clobber and land NSPR/NSS
*** bsmedberg goofed and the client.py step had to be removed from the master.cfg of builders and unit-testers, which took longer than expected
* Tuesday, June 3, 10:00 - 15:20
** Windows build red for some reason
*** clobbered by bhearsum, didn't help for some reason
* Sun, May 25 15:51 - Monday, May 26 01:25
** We are currently experiencing intermittent VMware/netapp problems, which cause entire sets of machines to start failing with cvs conflicts, corrupted .o files, etc, even when no checkins have occurred. Tree reopened to load-test the fix. See {{bug|435134}} for details.
* Fri, May 23
** 12:45 - 18:25
*** Buildbot master for Talos and Unittest went down (all talos boxes went red)
*** fx-linux-tbox also had cvs conflicts in browser/ and accessible/, tree clobbered
 
* Wed, May 7
** 8:00 AM - 12:50 PM
*** Waiting on backout of ({{bug|432492}}) to cycle through.
 
* Tuesday, May 6
** 8:20 - 9:00 PM
*** Talos machines were failing due to cvs-mirror issues ({{bug|432570}}).
 
* Thursday, May 1
** Start 12:00 PM
*** TUnit does not complete; qm-centos5-02 orange since yesterday. jst and mkaply in range, jst and sicking investigating.
 
* Friday, April 25
** 11:40 AM - 2:00 PM
** {{bug|430820}}
 
* Thursday, April 24
** 8pm - 11pm
*** expected outage for graph server and buildbot maintenance
 
* Wednesday April 16
** 3PM - 1AM
*** Unexpected outage started ~3pm, {{bug|429406}}
*** Tree closed at 8:10 due to qm-xserve-01 still not working
 
* Tuesday April 8
** 2:24AM PDT - 4:31AM
*** Tree closed due to {{bug|427723}} and {{bug|427728}}.
*** Windows nightly box restarted and completed, talos boxes started testing
 
* Monday April 7
** 7 PM PDT
*** Tree has been orange for too long (unit test failures) and then someone checked in theme changes ({{bug|427555}}) that caused red.
*** The orange was fixed after {{bug|426501}} was backed out.
 
* Saturday April 5
** 00:48 PDT - 20:45
*** unit test failures across 3 platforms
*** filed {{bug|427248}}, remaining issues spun off to:
**** {{bug|425987}} worked around reftest failures with a larger timeout
**** {{bug|426997}} for the new PGO test box - still burning, will ignore.
 
* Friday April 4
** 03:24 PDT - 09:05PDT
*** Announced closure to get clean perf numbers from {{bug|425941}}
*** See dev.planning/d.a.f thread: http://groups.google.com/group/mozilla.dev.planning/browse_thread/thread/d02e523b8483c914#
 
* Friday, Mar 28, 2008
** 16:23 PDT - 22:40 PDT
*** Bonsai DB replication issues
 
* Tuesday, Mar 25, 2008
** 14:30 PDT - 15:05 PDT
*** Leak test machines orange
**** Kai's patch for {{bug|420187}} caused a leak; he fixed it
** 14:30 PDT - 16:10 PDT
*** Windows unit test failures
**** the Windows unit test machine (qm-win2k3-01) had failed for a few cycles for various reasons, wanted to get a green cycle in before accepting more checkins.
**** {{bug|425081}} filed about machine trouble
 
* Saturday, Mar 22, 2008
** 13:48 PDT - 18:10 PDT
*** Windows talos machines are all red
**** fallout was from enabling strict file: URI security policy
**** alice checked in a config change to talos to disable strict URI policy on talos, filed {{bug|424594}} to get talos in line with this strict policy
**** still closed waiting on unit test orange to resolve.
 
* Tuesday, March 18, 2008
** 12:03 Wednesday, Mar 19, 2008
*** johnath re-opened tree after test failures cleared and rapid cycle test boxes were reporting numbers in the pre-closure range
** 20:22
*** Major network issues at the MPT colo which hosts... everything. Closed until services are back online (including IRC).
*** {{bug|423882}} for details on the missing talos boxes.
** 17:31 - 17:39
*** Network issues at moco, closing to avoid a mess if tinderboxes and/or bonsai is taken down by it.
 
* Friday March 14, 2008
** 14:25 - 16:03
*** mac and windows unit test boxes stopped cycling sometime between 2am and 7:45am
*** dbaron noticed when firebot announced the tinderbox falling off the 12hr waterfall page at 14:23; Waldo had noticed and commented in #developers at 12:16 (and again at 12:42) but was preoccupied and only had time to follow up and ask for tree closure at 14:23
*** tree closed, {{bug|423015}} filed
** 6:00am - 07:40am
*** stage migration, closed to make sure Talos reconfig works OK and builds keep flowing
* Thursday March 6, 2008
** 7:00pm - 7:40pm
*** closed due to fx-win32-tbox bustage. Cleared a stuck process and reopened, assuming that was the issue, rather than waiting another hour for the PGO-lengthened build to finish
* Wednesday March 5, 2008
** 2:40pm - 4:00pm
*** Closed for orange on win2k3-01 that looks like memory corruption
*** Caused by {{bug|418703}}, backed out
** 8am - 11:30am
*** bugzilla/CVS down; closed the tree for talos bustage caused by CVS being unavailable
* Wednesday, February 27, 2008
** 9:30pm - 12:10am
*** Overlooked 2 tests that had also broken during the day ({{bug|420028}} due to removal of DOMi, {{bug|384370}} due to incomplete backout) and were not focus issues
*** Box has some other test/build failures that went away by themselves after being kicked
** 7:40pm - 9:30pm
*** Windows box was orange with focus problems ([https://bugzilla.mozilla.org/show_bug.cgi?id=420010 bug 420010])
*** Missed from earlier in the day due to expected issues while PGO was landing
** 5:30pm - 7:40pm
*** Closed to do Linux kernel upgrades ([https://bugzilla.mozilla.org/show_bug.cgi?id=407796 bug 407796])
*** Quiet tree due to B4 freeze, and holding approval1.9b4 flags for enabling PGO on Windows, so closure has minimal impact
*** Took a bit longer than expected due to a reboot problem ([https://bugzilla.mozilla.org/show_bug.cgi?id=420007 bug 420007])
 
* Tuesday, February 26, 2008
** 22:38 - 00:49
*** problem started at 22:33 when qm-win2k3-01 turned red again
*** same problem as earlier in the day: "Failure: buildbot.slave.commands.TimeoutError: SIGKILL failed to kill process" during tests
*** myk closed the tree around 22:38 to wait out the bustage, since the machine is our only Windows unit test box
*** got stuck with an open file such that each build failed really quickly
*** {{bug|419799}} filed to get sysadmin help to fix unit test box
*** mrz, who was on-call that evening, jumped into IRC and then went and kicked buildbot; first he closed a dialog about some process having crashed; then he restarted buildbot, but it didn't start building; then he killed both buildbot and its cmd process and restarted it, after which it started building and completed successfully
*** dbaron reopened the tree to metered checkins of b4 blockers
 
* Tuesday, February 26, 2008
** 18:44 - 19:40
*** problems started with linux txul perf regression
*** continued with fxdbug-win32-tb reporting: ###!!! ASSERTION: invalid active window: 'Error', file e:/builds/tinderbox/Fx-Trunk-Memtest/WINNT_5.2_Depend/mozilla/embedding/components/windowwatcher/src/nsWindowWatcher.cpp, line 1086
*** continued with five cross-platform unit test failures, three reftests and two mochitests
*** reed backed out {{bug|419452}}, one of two candidates for the perf regression
*** dbaron fixed the three reftests and one mochitest, which were from his checkin for {{bug|363248}}
*** myk backed out sicking's fix for {{bug|416534}}, which had caused the last mochitest failure
*** various folks speculated that the fxdbug-win32-tb assertion was random (it didn't show up on Mac or Linux)
*** myk reopened the tree, feeling that things were under control
*** reed backed out second perf regression candidate ({{bug|395609}}) when initial backout didn't resolve it
*** sicking fixed test failure in {{bug|416534}} and relanded
*** others started landing again
*** unit test tinderboxes cycled green
*** reed's second backout fixed perf regression
*** reed's first or second backout also fixed the fxdbug-win32-tb assertion
 
* Tuesday, February 26, 2008
** 5:10pm - 5:46pm
*** problem started at 5:06pm when qm-win2k3-01 turned red
*** reed said that machine frequently hits this random bustage and then recovers
*** reed previously noted at the top of the page that "qm-win2k3-01 is the only Windows unit test machine, so if it is orange or red, you should NOT check in."
*** myk, the sheriff for the day, closed the tree to wait out the apparently random failure and then reopened it when the next build came up green
*** reed thought there might be an old bug on the problem but wasn't sure, so dbaron filed bug {{bug|419761}} on the problem to make sure it's tracked and not forgotten
*** wolf also filed {{bug|419759}} on fixing or replacing winxp01 so we aren't entirely reliant on win2k3-01 for windows unittests
 
* Sunday, February 24, 2008
** 11:30am - 4:45pm (with orange lasting longer, with tree open)
*** problem started 9:23am, amid some other bustage
*** closed by dbaron a little after 11:30am
*** filed {{bug|419328}}: Windows unit test box stopped cycling
*** aravind hard-rebooted the box; it came back with a bunch of popup tests orange
*** test still orange after another cycle (forced by dbaron)
*** dbaron reopened tree 4:45pm despite the unfixed machine-related orange
*** joduinn rebooted the box again around 4:30pm
*** this time it came back with the color depth wrong, so the PNG reftests failed (but the mochitests worked)
*** color depth issue fixed 6:45am '''Monday'''
 
* Thursday, February 21, 2008
** 10pm - 12am
*** Dietrich
*** Closed to facilitate l10n string freeze
 
* Tuesday, February 19, 2008
** 9:10 PM - 10:45 PM
*** No sheriff (night)
*** Closed due to at least three people landing on orange ([http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1203480120.1203482373.3631.gz seemingly random TUnit failure on Windows]). The tree was too wide for anyone to notice the orange. [http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1203480120.1203484232.8866.gz A mochitest timeout on Linux], also seemingly random, occurred almost immediately after.
** 1:00 AM - 3:00 AM (guess)
*** No sheriff (night) (guess)
*** Closed for experimental landing of {{bug|399852}}.  The checkin stuck.
 
* Wednesday, February 13, 2008
** 1:00 PM - 1:40 PM (problem first noticed around 9:40 AM)
*** No sheriff
*** {{bug|417313}} -- graphs.mozilla.org can't keep up with data being submitted
*** reopened after machines started going green; db load lessened, but the underlying issue had not been fixed.
