QA/Thunderbird3/TestResults/Alpha1/Discussion

2008-05-02, 10am - 11am PDT, #qa
Attendees: davida, dmose, nth10sd, wsmwk, marcia; and Tomcat, tracy and coop later.
Follow-up bugs filed: bug 431883, bug 431884 and bug 431885.
davida: nth10sd: are we doing a phone call as well, or just IRC?
nth10sd: IRC, i was thinking
davida: wfm.
davida: So my big question about the test plan that you have is things like:
nth10sd: so we have me davida and marcia
davida: 1) how long does each kind of test take? (smoke, BFT, spot check)?
davida: 2) how many people are on hand to do it, or is just you?
davida: 3) where does QA for things like web copy get described & done?
davida: but before we get to those, a bit of level-setting.
nth10sd: i'll go first
davida: I think this is an alpha, and in my opinion the expectation of quality is quite low. 
nth10sd: 1) smoketests are loading and starting up the app in each platform, and some basic click-arounds
nth10sd: 2) BFTs are the ones in litmus and because we don't have any tests for TB trunk, I'll be adapting some for trunk from TB 2, and selectively do them since I can't possibly complete everything on my own
nth10sd: spot checks are ad-hoc tests i believe, but marcia can correct me
marcia: nth10sd: yes, spot checks are just running through quick operations to make sure the main things work
nth10sd: the real 2) I am on hand to do all those, because there's nobody in MoMo's QA dept
davida: nth10sd: note that I was asking a specific q about elapsed time.
davida: nth10sd: I don't think QA needs to be done only by people working for MoMo.
nth10sd: marcia: you have an idea on the time taken
davida: In fact I will assert that QA can't just be done by MoMo staff/contractors.
marcia: nth10sd: it depends on who is doing it. Some folks are faster if they are familiar with all the operations
nth10sd: davida: we can't force volunteers to do these tests
davida: nth10sd: no one said anything about forcing.
nth10sd: they can do it if they like
nth10sd: but we must ensure that they are done
davida: for a final or mass-market release, i agree.
davida: for an early alpha, i'm not so sure.
nth10sd: but it
wsmwk_away is now known as wsmwk.
marcia: davida: the alpha will probably get some press coverage
nth10sd: but it's bad PR if for e.g. something screws up for a platform we didn't test
marcia: since there hasn't been a tbird release in a while
nth10sd: (and this is the first alpha by MoMo)
davida: marcia: i know, but I'm not convinced that one person working 1 day or 5 days will make any difference to the PR impact.
nth10sd: this can easily throw up obvious blockers
davida: I suspect that the way to deal with the PR risk is to message it carefully.  As in "this release will explode".
nth10sd: (through smoketests)
marcia: I agree with gary that I would not want to ship an alpha with major functionality broken on a platform
marcia: we have never done that in the past and we want to keep our streak going
davida: marcia: i agree w/ that "major functionality" point.  I don't mean to say we shouldn't do _any_ testing.
davida: Which is why i was asking about time spent for a BFT for example.  If it's a couple of hours per platform, that seems well worth it.
nth10sd: i'm of the opinion that we should be comprehensive on all our platforms
davida: if it's a week per platform, it's not.
marcia: davida: the BFT would likely take longer than just a few hours. It involves a lot of setting up accounts, etc
marcia: sometimes when i ran the Tbird FFT it could take 2 days
nth10sd: I believe XP has the largest number of users and that's why we test on it
davida: but FFTs are longer, right?
marcia: davida: Yes, FFT are quite a bit longer
wsmwk: FFT?
nth10sd: Full Functionality Tests
Tomcat: full functional test
marcia: nth10sd: I agree that Vista and XP need to work correctly since they will be most visible
nth10sd: (on Litmus)
davida: And the reason I'm being pushy about this is that the current stated tree closure policy is "until we release".  If testing takes a week, then I'm going to push w/ dmose that we branch, so we can unfreeze the tree.
nth10sd: (dmose is coming)
nth10sd: and I can only do so much during the couple of days post-build and pre-release
wsmwk: can we ascertain anything about stability from crash-stats, eg MTBF?
nth10sd: MTBF? (my turn...  )
davida: mean time between failures
wsmwk: mean time between failure
nth10sd: ah ok
nth10sd: wsmwk: look at the graph shape?
davida: wsmwk: good question.  Especially because from what I read somewhere we have a few thousand people using nightlies.
wsmwk: that's probably the only quantitative measure available to us.
nth10sd: except the rate of trunk bugs filed in Bugzilla as well
dmose joined the chat room.
nth10sd: dmose: so we were on the topic of the testplan: http://wiki.mozilla.org/QA/Thunderbird3/TestResults/Alpha1
dmose: ok
davida: marcia, given that you know probably the most about how long it takes to run these things -- how long do you think it would take nth10sd to go through that list?
marcia: davida: Let me look at the list
nth10sd: dmose: and about smoketests on all platforms, with BFTs on XP, and spot checks on the rest
nth10sd: marcia: i adapted it from TB 2
nth10sd: (Fx 3 test plans didn't really seem to be relevant)
davida: My inclination right now is that if nth10sd wants to run a fairly long test plan (say a week), I actually don't mind, as long as we unblock the rest of the dev team, which means branching, which means more work for dmose in particular.
dmose: it's not a ton of work
wsmwk: unsure why we would expect a long test.
marcia: davida: I think the smoketests could probably be completed in 1/2 day. If he is going to run BFTs then I would give a buffer of another day or two
dmose: though yesterday we talked about a theory of keeping this short and sweet
dmose: so i'm confused about what's changed
wsmwk: is anything more needed than 2-5 ppl running litmus / platform
nth10sd: i haven't done all the tests myself though i'd expect around 2 - 3 days to promise I can complete them
davida: Maybe nothing has changed, but I don't know how to interpret the time impact of gary's test plan.  (hence my question to marcia about timing)
nth10sd: wsmwk: Litmus is good for the community but here we are guaranteeing that they have been done
davida: I've learned that I suck at estimating QA test times 
wsmwk: but litmus is only helpful to the extent that you introduce variability - so > a few is needed.
nth10sd: wsmwk: we know we can rely on you, but we don't know how many of our community will turn up
nth10sd: hence the basic stuff that i must get completed
marcia: Is there a plan to have a test day to bang on the candidate build?
nth10sd: marcia: yes, and someone to operate them as well
marcia: nth10sd: I edited your test plan to cover Tiger and Leopard since they often have different bugs
nth10sd: (if wsmwk doesn't mind, I can let him do it while I focus on the testplan)
nth10sd: marcia: thanks!
wsmwk: I think if you put it out to the community there will be some volunteers
nth10sd: (so that's one platform more)
marcia: at one time we had a mailing list for people interested in testing thunderbird
wsmwk: doesn't need much, but need more than 1-2/per platform.  you could also look to bz QA for a model, not just FF qa
nth10sd: wsmwk: ppl are going to test on the platforms they like but that doesn't mean that we should *not* test a platform our side at all
wsmwk: where is our litmus stuff?
nth10sd: wsmwk: litmus trunk tests don't exist except accessibility ones and one other stuff
nth10sd: i adapt from TB 2, because trunk ones haven't been written
davida: nth10sd: but wouldn't tb2 tests apply 99%?
nth10sd: davida: we have new features like tabbed messaging
davida: in fact figuring out where the tb2 litmus runs fail on trunk would be really useful knowledge.
nth10sd: (that aren't covered in trunk)
wsmwk: mb I'm misunderstanding - bugzilla has a test checklist (forgets if it's litmus)
davida: nth10sd: I wouldn't bother testing tabbed messaging, because IMO it's broken 
marcia: nth10sd: There seems to be a trunk version of Tbird it is just not in the recommended list
nth10sd: as well as things like Gecko 1.9 widgets in Thunderbird
dmose: nth10sd: since this is an alpha (and most especially our first alpha) we have an extremely low bar here
marcia: Need to make sure the crash reporter fires on all platforms since you will want that data in an alpha
dmose: nth10sd: the bar should be "is it basically usable"
nth10sd: so - no loss in major functionality - no major / obvious crashes
dmose: if it crashes in something that's not likely to be used a lot in a day-to-day use, i'm happy to release note and move on
dmose: and tabs is so hard to discover that i probably wouldn't even relnote
wsmwk: yeah, tabs is not ready for QA
nth10sd: the testplan in the wiki fits my objectives: comprehensive, no loss in major functionality, and stable
wsmwk: like dmose's definition
nth10sd: (and i'm not referring to tabs here, point taken about tabs)
dmose: sounds reasonable
nth10sd: so my question now becomes 1) the test plan for our side, 2) the suitability of litmus testcases for the trunk and 3) the running of the testday
dmose: marcia: so right now, we don't have symbols on mac, and we're (at the moment) blocking on it
dmose: marcia: however, i wouldn't want to block on that for too long
marcia: dmose: That is good. Mac is most likely to crash
dmose: marcia:  ie if we don't have any better handle on it by early- to mid- next week, i'd be inclined to ship anyway
davida: +1
wsmwk: given that it's alpha, i don't know that we'd block on anything other than what is already flagged + plus what someone mentioned qa functions working like breakpad
dmose: wsmwk: so the symbols thing is that
dmose: wsmwk: right now, no breakpad on mac
dmose: wsmwk: and that's the only currently-plussed blocker
dmose: and i'd rather ship next week than wait even longer to get breakpad on mac
dmose: non-ideal though that may be
nth10sd: https://bugzilla.mozilla.org/show_bug.cgi?id=411171
firebot: nth10sd: Bug 411171 nor, --, ---, rick.tessner@gmail.com, NEW, Thunderbird Mac tinderbox crashing in dump_syms
wsmwk: is that only tinderbox and not crash-stats?
dmose: breakpad == crash-stats == symbols, in this conversation
marcia: nth10sd: https://litmus.mozilla.org/show_test.cgi?id=5118 can be used to test Breakpad functionality for Tbird
dmose: meaning that crash-stats would have info for windows & mac
dmose: er
nth10sd: so with the testplan (and the addition of Tiger) I can safely assure that I can complete the testing by 2 days, if not 3.
dmose: windows & linux
dmose: not mac
nth10sd: (which is pretty weird since Gecko 1.9 Cocoa widgets should have more data)
nth10sd: marcia: true, but that's Firefox, though it can be adapted to TB
marcia_leopard: nth10sd: I just tested it, it installs fine into Tbird and crashes it on mac
nth10sd: (someone should write TB trunk testcases in Litmus if we are going to use Litmus for the future)
nth10sd: marcia_leopard: so it works properly?
Aleksej: Are crash reports processed today?
dmose: no
dmose: crash reports on mac are currently busted
marcia_leopard: nth10sd: yes, it does on mac at least
marcia_leopard: Aleksej: The discussion above is about Thunderbird crash reports, not Firefox in case you are wondering
Aleksej: marcia_leopard: I haven't looked at it 
Aleksej: My Firefox crash reports from today are not processed yet 
marcia: Aleksej: 
nth10sd: so, back to my question:
nth10sd: 1) the test plan for our side, 2) the usage of TB 2 litmus testcases for the trunk and 3) the running of the testday
wsmwk: if symbols busted that long then not a blocker + we can move on  + back to gary
nth10sd: is this a decent plan?
nth10sd: (back to gary.  )
marcia: nth10sd: I think your test plan is fine
marcia: the litmus test cases that are there can be used as a framework
nth10sd: so now we set the QA plans as the foundation for the future alphas, at least
wsmwk: if we want to improve litmus for future, gary what would you want to see happen?
nth10sd: wsmwk: someone must write Thunderbird trunk Litmus testcases
nth10sd: make that Thunderbird Trunk-specific testcases that guide new testers along
wsmwk: + identify cases needed?
dmose: nth10sd: this sounds like something that could be done on test-writing days
nth10sd: dmose: idea for the future
nth10sd: wsmwk: hang on
dmose: nth10sd: additionally, as we move towards a more test-driven development model, we should start encouraging devs to write tests (ideally in a suite, but litmus is a good fallback) for bugs as they fix them
dmose: and not just devs, really, anyone who's interested in doing that work
dmose: getting triagers involved at that level, for those who have the skillset, would be great
davida: yeah, it would be interesting to see what % of litmus tests could be converted to automated tests.
nth10sd: wsmwk: if you log in to litmus and click view/search tests, you select Thunderbird then Trunk
wsmwk: i'm just thinking that, if litmus is to be a key QA item for future, if we get trunk users to buy into using it, we can use their input as to what needs improvement, as well as another way to get them involved.
nth10sd: you'd see that Testgroups only consist of accessibility and l10n
wsmwk: but as david says we don't need it as a basis for releasing a1
nth10sd: I am of the opinion that Litmus is a key QA item for the future, but I won't count on getting users to test a1 now
dmose: agreed; litmus is a great way to test stuff that's not yet automatically testable
nth10sd: s/to test a1 now/to test a1 using Litmus now, if they are new testers
nth10sd: but we are laying the groundwork for the future alphas
davida: i'm not sure we have resolution on the timing issue that I'm most concerned about.
davida: speaking egotistically 
dmose: the timing issue being what, exactly?
nth10sd: davida: i mentioned with the testplan (with Tiger thrown in) I can get it completed by 2 days, if not 3
davida: well starting with t=0 being when a build is available, what is a reasonable estimate for QA signoff in the optimistic case that no blockers are found?
nth10sd: (what I can safely guarantee to get completed myself is 2-3 days)
davida: nth10sd: but isn't there a test day also in the loop, and can that be done in parallel w/ little advance notice?
nth10sd: but with that, i
nth10sd: i'm not sure if i have much bandwidth left for testday
davida: I also don't like the idea that you, nth10sd, feel that you're personally responsible for holding up a release.  that seems wrong.
nth10sd: i'm sure wsmwk can help with testdays
nth10sd: i'm just trying to ensure that everything is as comprehensive as possible
dmose: yeah, but pinning that all on you doesn't scale
nth10sd: "holding up a release" seems an inappropriate phrase
wsmwk: testday OK. but more to the point is, what would a testday reveal that we would block on? if there is nothing, then do we need a testday?
wsmwk: prior to release
dmose: well, if the testday's goal was running through the testplan, i'd hope so!
nth10sd: testdays are for ensuring that any potential blockers are discovered pre-release
dmose: or, at least, making as much progress in the test plan as possible
nth10sd: wsmwk: i could assign you a platform, whichever you're comfortable with
wsmwk: if to validate a testplan then that's a good thing.
wsmwk: but if we are on timed release, then the goal is not primarily to find blockers.
dmose: i guess i might be using "testday" in a way that's slightly different than the traditional usage
nth10sd: generally, historically, basic smoketests are done by Mozilla (whatever company)
dmose: meaning "some official day that we attempt to engage the testing community to help with whatever we need done for release"
davida: nth10sd: forget about that history 
nth10sd: hmmmm
dmose: yeah, the employer of record of the folks who do the smoketesting is not interesting
davida: nth10sd: i mean it.  i don't have a budget for a full QA department.  so we have to find ways to use volunteers, and adjust everything as appropriate.
davida: even if it means changing our critiria for release.
dmose: the interesting bit is that we feel confident that whoever is doing it is doing a reasonable (but not perfect!) job
• wsmwk wonders where the Mac-fans are, given that symbols have not been working for such a long time?
nth10sd: so we move away from the way we used to do it, the way that has ensured our QA for the past releases / reputation?
davida: criteria
nth10sd: hmmmm ok
dmose: wsmwk: it's only been two weeks
dmose: nth10sd: this is an alpha, our reputation is not staked on alpha releases
wsmwk: but missing symbols seems to happen with some unplanned frequency
davida: nth10sd: I personally believe that the reputation for quality is the result of much more than things like running through litmus, but that's another conversation.
dmose: wsmwk: yeah, it seems to be a bug in dump_syms; hopefully this will make that stup
nth10sd: davida: yes those are 2 things
davida: wsmwk: it looks like a heisenbug
dmose: davida: agreed; just having nightly test builds is where tons of our quality comes from
wsmwk: agree
wsmwk: we are building quality, not guaranteeing
nth10sd: so we still have yet to agree on the testplan for the alpha
nth10sd: 1) the test plan for our side, 2) the usage of TB 2 litmus testcases for the trunk and 3) the running of the testday
wsmwk: i'd say have a testplan, go through the motions of a testday, but have an extremely high bar not to release
nth10sd: and get volunteers to run TB 2 litmus tests on trunk?
wsmwk: if that's part of the testplan, yes
dmose: that sounds right to me
davida: but if no one shows up, oh well.
dmose: marcia: so does the wiki page seem like it's a reasonable test plan?
davida: in the short term oh well.  in the long term we need to fix that.
• nth10sd notes that marcia has approved as well
marcia: dmose: Yes, I confirmed that earlier
nth10sd: (backscroll?)
nth10sd: yup
tracy: y'all can get coop to do a straight copy of the TB2 test cases into a TB3 suite. then work on cleaning those up as they apply to trunk builds.
davida: tracy: good idea.
dmose: marcia: ok, great
dmose: nth10sd: so it looks like we've got an agreed upon test plan, then, no?
tracy: davida: let me know if you need any help facilitating that.
nth10sd: yes
davida: tracy: i have no idea what's involved, so i probably will
nth10sd: the testplan has only been modified to add Tiger at http://wiki.mozilla.org/QA/Thunderbird3/TestResults/Alpha1
nth10sd: and the consensus that we will be using Litmus for volunteers at the a1 testday
dmose: nth10sd: sounds good
davida: also that the a1 test day will be scheduled as soon as the builds are ready?
nth10sd: davida: yes
nth10sd: we are holding on when Rick gets the builds out
dmose: we can just schedule a test day by saying "2 days from now", right?
nth10sd: yes
dmose: i wouldn't want to have to wait until some specific thursday
nth10sd: once he gets the builds out wsmwk and i can easily quickly advertise a testday
wsmwk: yup
nth10sd: since we had been doing this for weeks
dmose: ok, great
nth10sd: dmose: ideally on thursday but we can always change
dmose: nth10sd: so in some basic way, i think we've covered all three of your most recent questions, though not in a lot of depth.  is there anything we need to nail down further?
nth10sd: wsmwk: you think you can guide volunteers to litmus on testday?
wsmwk: yes
nth10sd: (i will still be on for session 2)
nth10sd: ok
nth10sd: great
nth10sd: then i think i'd reiterate:
wsmwk: the only question from me is, how do we gather info about what needs to be added/changed in litmus - is failed litmus test sufficient?
nth10sd: 1) the test plan for our side,(OK) 2) the usage of TB 2 litmus testcases for the trunk (YES) and 3) the running of the testday (USE LITMUS)
tracy: historically, short notice for testdays hasn't worked out so well.  which is why we've stuck to a schedule and really tried to have a list of future testday topics available well ahead of time.
nth10sd: tracy: i wouldn't hold on that
nth10sd: marcia: how do we deal with failed litmus testcases?
dmose: tracy: hmmm, interesting
dmose: we could simply schedule a test day for thursday
nth10sd: dmose: and use a nightly?
marcia: nth10sd: periodically we just review the failed ones and make sure they are really fails. Sometimes people fail just on verbiage
tracy: dmose, that's what I'd suggest.
dmose: if we don't have a blocker, yeah
wsmwk: agree with gary, if the bigger goal is to test the process, not the product.
marcia: nth10sd: there is an automated way to get failed results from Litmus on a daily basis
nth10sd: ah
marcia: coop does it for Firefox, so I am sure it can be adapted to Tbird
dmose: s/blocker/build
tracy: nth10sd: testday reports can also be setup specifically for your testday.
nth10sd: wow so many things still to be done for Litmus
nth10sd: Litmus issues: 1) morph TB2 to TB3 testcases, 2) automated way to get failed results from Litmus on a daily basis and 3) set up testday reports for Litmus
nth10sd: can coop do all those above?
nth10sd: s/do/help us with
coop: if you've setup your testdays, the reports are automatic
coop: and please file bugs for the first two
• nth10sd wonders how to setup testdays
tracy: admins can setup the testday report..
nth10sd: ok, i'll file the bugs
nth10sd: otherwise i don't have anything else for the discussion
tracy: I think the daily report covers all, yes?  coop?
Tomcat: i can setup the testday report
nth10sd: dmose davida marcia wsmwk: anything else?
nth10sd: thanks Tomcat
marcia: nth10sd: not that i can think of
coop: tracy: it might right now
davida: nope, gotta be on a call anyway.
Tomcat: nth10sd: the create testday report bug can you assign to me
davida is now known as davida_phone.
dmose: sounds good to me; thanks everyone for helping us sort through this stuff
Tomcat:
wsmwk: sounds good.
nth10sd: thank you everyone!
wsmwk: quick OT question
coop: but if nth10sd is going to handle the thunderbird results, i could split them up
nth10sd: coop: handle results?
wsmwk: do we have symbols on server for thunderbird, eg
wsmwk: http://symbols.mozilla.org/thunderbird
wsmwk: http://symbols.mozilla.org/seamonkey
dmose: i got the impression from ted that we might
dmose: but it'd be good to double-check at some point
nth10sd: wsmwk: sigh~ file a bug?
wsmwk: we need that for some future qa work. 
wsmwk: to QA hangs, loops, etc -
wsmwk: without debug builds
wsmwk: i'll check with ted
nth10sd: wsmwk: i'd say file a bug and probably flag it
dmose: wsmwk: or just try it and see if it works
nth10sd: yeah
wsmwk: i'll farm it out 
nth10sd: coop: re: filing the bugs, which components should they be in? and should i cc you?
davida_phone left the chat room. (Ping timeout)
coop: webtools/litmus, and no need to cc me directly, i'll see them