QA/Thunderbird3/TestResults/Alpha1/Discussion
2008-05-02, 10am - 11am PDT, #qa
Attendees: davida, dmose, nth10sd, wsmwk, marcia; and Tomcat, tracy and coop later.
Follow-up bugs filed: bug 431883, bug 431884 and bug 431885.
davida: nth10sd: are we doing a phone call as well, or just IRC?
nth10sd: IRC, i was thinking
davida: wfm.
davida: So my big question about the test plan that you have is things like:
nth10sd: so we have me davida and marcia
davida: 1) how long does each kind of test take? (smoke, BFT, spot check)?
davida: 2) how many people are on hand to do it, or is just you?
davida: 3) where does QA for things like web copy get described & done?
davida: but before we get to those, a bit of level-setting.
nth10sd: i'll go first
davida: I think this is an alpha, and in my opinion the expectation of quality is quite low.
nth10sd: 1) smoketests are loading and starting up the app in each platform, and some basic click-arounds
nth10sd: 2) BFTs are the ones in litmus and because we don't have any tests for TB trunk, I'll be adapting some for trunk from TB 2, and selectively do them since I can't possibly complete everything on my own
nth10sd: spot checks i believe are ad-hoc tests i believe, but marcia can correct me
marcia: nth10sd: yes, spot checks are just running through a quick operations to make sure main things work
nth10sd: the real 2) I am on hand to do all those, because there's nobody in MoMo's QA dept
davida: nth10sd: note that I was asking a specific q about elapsed time.
davida: nth10sd: I don't think QA needs to be done only by people working for MoMo.
nth10sd: marcia: you have an idea on the time taken
davida: In fact I will assert that QA can't just be done by MoMo staff/contractors.
marcia: nth10sd: it depends on who is doing it. Some folks are faster if they are familiar with all the operations
nth10sd: davida: we can't force volunteers to do this tests
davida: nth10sd: no one said anything about forcing.
nth10sd: they can do it if they like
nth10sd: but we must ensure that they are done
davida: for a final or mass-market release, i agree.
davida: for an early alpha, i'm not so sure.
nth10sd: but it
wsmwk_away is now known as wsmwk.
marcia: davida: the alpha will probably get some press coverage
nth10sd: but it's bad PR if for e.g. something screws up for a platform we didn't test
marcia: since there hasn't bee a tbird release in a while
nth10sd: (and this is the first alpha by MoMo)
davida: marcia: i know, but I'm not convinced that one person working 1 day or 5 days will make any difference to the PR impact.
nth10sd: this can easily throw up obvious blockers
davida: I suspect that the way to deal with the PR risk is to message it carefully. As in "this release will explode".
nth10sd: (through smoketests)
marcia: I agree with gary that I would not want to ship an alpha with major functionality broken on a platform
marcia: we have never done that in the past and we want to keep our streak going
davida: marcia: i agree w/ that "major functionality" point. I don't mean to say we shouldn't do _any_ testing.
davida: Which is why i was asking about time spent for a BFT for example. If it's a couple of hours per platform, that seems well worth it.
nth10sd: i'm of the opinion that we should be comprehensive on all our platforms
davida: if it's a week per platform, it's not.
marcia: davida: the BFT would likely take longer than just a few hours. It involves a lot of setting up accounts, etc
marcia: sometimes when i ran the Tbird FFT it could take 2 days
nth10sd: I believe XP has the largest number of users and that's why we test on it
davida: but FFTs are longer, right?
marcia: davida: Yes, FFT are quite a bit longer
wsmwk: FFT?
nth10sd: Full Functionality Tests
Tomcat: full functional test
marcia: nth10sd: I agree that Vista and XP need to work correctly since they will be most visible
nth10sd: (on Litmus)
davida: And the reason I'm being pushy about this is that the current stated tree closure policy is "until we release". If testing takes a week, then I'm going to push w/ dmose that we branch, so we can unfreeze the tree.
nth10sd: (dmose is coming)
nth10sd: and I can only do so much during the couple of days post-build and pre-release
wsmwk: can we ascertain anything about stability from crash-stats, eg MTBF?
nth10sd: MTBF? (my turn... )
davida: mean time between failures
wsmwk: mean time between failure
nth10sd: ah ok
nth10sd: wsmwk: look at the graph shape?
davida: wsmwk: good question. Especially because from what I read somewhere we have a few thousand people using nightlies.
wsmwk: that's probably the only quantitative measure available to us.
nth10sd: except the rate of trunk bugs filed in Bugzilla as well
dmose joined the chat room.
nth10sd: dmose: so we were on the topic of the testplan: http://wiki.mozilla.org/QA/Thunderbird3/TestResults/Alpha1
dmose: ok
davida: marcia, given that you know probably the most about how long it takes to run these things -- how long do you think it would take nth10sd to go through that list?
marcia: davida: Let me look at the list
nth10sd: dmose: and about smoketests on all platforms, with BFTs on XP, and spot checks on the rest
nth10sd: marcia: i adapted it from TB 2
nth10sd: (Fx 3 test plans didn't really seem to be relevant)
davida: My inclination right now is that if nth10sd wants to run a fairly long test plan (say a week), I actually don't mind, as long as we unblock the rest of the dev team, which means branching, which means more work for dmose in particular.
dmose: it's not a ton of work
wsmwk: unsure why we would expect a long test.
marcia: davida: I think the smoketests could probably be completed in 1/2 day. If he going to run BFTs then I would give a buffer of another day or two
dmose: though yesterday we talked about a theory of keeping this short and sweet
dmose: so i'm confused about what's changed
wsmwk: is anytjing more needed than 2-5 ppl running litmus / platform
nth10sd: i haven't done all the tests myself though i'd expect around 2 - 3 days to promise I can complete them
davida: Maybe nothing has changed, but I don't know how to interpret the time impact of gary's test plan. (hence my question to marcia about timing)
nth10sd: wsmwk: Litmus is good for the community but here we are guaranteeing that they have been done
davida: I've learned that I suck at estimating QA test times
wsmwk: but litmus is only helpful to the extent that you introduce variability - so > a few is needed.
nth10sd: wsmwk: we know we can rely on you, but we don't know how many of our community will turn up
nth10sd: hence the basic stuff that i must get completed
marcia: Is there a plan to have a test day to bang on the candidate build?
nth10sd: marcia: yes, and someone to operate them as well
marcia: nth10sd: I edited your test plan to cover Tiger and Leopard since they often have different bugs
nth10sd: (if wsmwk doesn't mind, I can let him do it while I focus on the testplan)
nth10sd: marcia: thanks!
wsmwk: I think if you put it out to the community there will be some volunteers
nth10sd: (so that's one platform more)
marcia: at one time we had a mailing list for people interested in testing thunderbird
wsmwk: doesn't need much, but need more than 1-2/per platform. you could also look to bz QA for a model, not just FF qa
nth10sd: wsmwk: ppl are going to test on the platforms they like but that doesn't mean that we should *not* test a platform our side at all
wsmwk: where is our litmus stuff?
nth10sd: wsmwk: litmus trunk tests don't exist except accessibility ones and one other stuff
nth10sd: i adapt from TB 2, because trunk ones haven't been written
davida: nth10sd: but wouldn't tb2 tests apply 99%?
nth10sd: davida: we have new features like tabbed messaging
davida: in fact figuring out where the tb2 litmus runs fail on trunk would be really useful knowledge.
nth10sd: (that aren't covered in trunk)
wsmwk: mb I'm misunderstanding - bugzilla has a test checklist (forgets if it's litmus)
davida: nth10sd: I wouldn't bother testing tabbed messaging, because IMO it's broken
marcia: nth10sd: There seems to be a trunk version of Tbird it is just not in the recommended list
nth10sd: as well as things like Gecko 1.9 widgets in Thunderbird
dmose: nth10sd: since this is an alpha (and most especially our first alpha) we have an extremely low bar here
marcia: Need to make sure the crash reporter fires on all platforms since you will want that data in an alpha
dmose: nth10sd: the bar should be "is it basically usable"
nth10sd: so - no loss in major functionality - no major / obvious crashes
dmose: if it crashes in something that's not likely to be used a lot in a day-to-day use, i'm happy to release note and move on
dmose: and tabs is so hard to discover that i probably wouldn't even relnote
wsmwk: yeah, tabs is not ready for QA
nth10sd: the testplan of which is in the wiki that fits in my objective of comprehensive, no loss in major functionality and stable
wsmwk: like dmose's definition
nth10sd: (and i'm not referring to tabs here, point taken about tabs)
dmose: sounds reasonable
nth10sd: so my question now becomes 1) the test plan for our side, 2) the suitability of litmus testcases for the trunk and 3) the running of the testday
dmose: marcia: so right now, we don't have symbols on mac, and we're (at the moment) blocking on it
dmose: marcia: however, i wouldn't want to block on that for too long
marcia: dmose: That is good. Mac is most likely to crash
dmose: marcia: ie if we don't have any better handle on it by early- to mid- next week, i'd inclined to ship anyway
davida: +1
wsmwk: given that it's alpha, i don't know that we'd block on anything other than what is already flagged + plus what someone mentioned qa functions working like breakpad
dmose: wsmwk: so the symbols thing is that
dmose: wsmwk: right now, no breakpad on mac
dmose: wsmwk: and that's the only currently-plussed blocker
dmose: and i'd rather ship next week than wait even longer to get breakpad on mac
dmose: non-ideal though that may be
nth10sd: https://bugzilla.mozilla.org/show_bug.cgi?id=411171
firebot: nth10sd: Bug 411171 nor, --, ---, rick.tessner@gmail.com, NEW, Thunderbird Mac tinderbox crashing in dump_syms
wsmwk: is that only tinderbox and not crash-stats?
dmose: breakpad == crash-stats == symbols, in this conversation
marcia: nth10sd: https://litmus.mozilla.org/show_test.cgi?id=5118 can be used to test Breakpad functionality for Tbird
dmose: meaning that crash-stats would have info for windows & mac
dmose: er
nth10sd: so with the testplan (and the addition of Tiger) I can safely assure that I can complete the testing by 2 days, if not 3.
dmose: windows & linux
dmose: not mac
nth10sd: (which is pretty weird since Gecko 1.9 Cocoa widgets should have more data)
nth10sd: marcia: true, but that's Firefox, though it can be adapted to TB
marcia_leopard: nth10sd: I just tested it, it installs fine into Tbird and crashes it on mac
nth10sd: (someone should write TB trunk testcases in Litmus if we are going to use Litmus for the future)
nth10sd: marcia_leopard: so it works properly?
Aleksej: Are crash reports processed today?
dmose: no
dmose: crash reports on mac are currently busted
marcia_leopard: nth10sd: yes, it does on mac at least
marcia_leopard: Aleksej: The discussion above is about Thunderbird crash reports, not Firefox in case you are wondering
Aleksej: marcia_leopard: I haven't looked at it
Aleksej: My todays Firefox crash reports are not processed yet
marcia: Aleksej:
nth10sd: so, back to my question:
nth10sd: 1) the test plan for our side, 2) the usage of TB 2 litmus testcases for the trunk and 3) the running of the testday
wsmwk: if symbols busted that long then not a blocker + we can move on + back to gary
nth10sd: is this a decent plan?
nth10sd: (back to gary. )
marcia: nth10sd: I think your test plan is fine
marcia: the litmus test cases that are can be used as a framework
nth10sd: so now we set the QA plans as the foundation for the future alphas, at least
wsmwk: if we want to improve litmus for future, gary what would you want to see hqppen?
nth10sd: wsmwk: someone must write Thunderbird trunk Litmus testcases
nth10sd: make that Thunderbird Trunk-specific testcases that guides new testers along
wsmwk: + identify cases needed?
dmose: nth10sd: this sounds like something that could be done on test-writing days
nth10sd: dmose: idea for the future
nth10sd: wsmwk: hang on
dmose: nth10sd: additionally, as we move towards a more test-driven development model, we should start encouraging devs to write tests (ideally in a suite, but litmus is a good fallback) for bugs as they fix them
dmose: and not just devs, really, anyone who's interested in doing that work
dmose: getting triagers involved at the level, for those have the skillset, would be great
davida: yeah, it would be interesting to see what % of litmus tests could be converted to automated tests.
nth10sd: wsmwk: if you log in to litmus and click view/search tests, you select Thunderbird then Trunk
wsmwk: i'm just thinking that, if litmus is to be a key QA item for future, if we get trunk users to buy into using it, we can use there input as to what needs improvement, as well as another way to get them involved.
nth10sd: you'd see that Testgroups only consist of accessibility and l10n
wsmwk: but as david says we don't need it as a basis for releasing a1
nth10sd: I am of the opinion that Litmus is a key QA item for the future, but I won't count on getting users to test a1 now
dmose: agreed; litmus is a great way to test stuff that's not yet automatically testable
nth10sd: s/to test a1 now/to test a1 using Litmus now, if they are new testers
nth10sd: but we are laying the groundwork for the future alphas
davida: i'm not sure we have resolution on the timing issue that I'm most concerned about.
davida: speaking egotistically
dmose: the timing issue being what, exactly?
nth10sd: davida: i mentioned with the testplan (with Tiger thrown in) I can get it completed by 2 days, if not 3
davida: well starting with t=0 being when a build is available, what is a reasonable estimate for QA signoff in the optimistic case that no blockers are found?
nth10sd: (I can guarantee safely saying that I can get it completed myself, is 2-3 days)
davida: nth10sd: but isn't there a test day also in the loop, and can that be done in parallel w/ little advance notice?
nth10sd: but with that, i
nth10sd: i'm not sure if i have much bandwidth left for testday
davida: I also don't like the idea that you, nth10sd, feel that you're personally responsible for holding up a release. that seems wrong.
nth10sd: i'm sure wsmwk can help with testdays
nth10sd: i'm just trying to ensure that everything is as comprehensive as possible
dmose: yeah, but pinning that all on you doesn't scale
nth10sd: "holding up a release" seems an inappropriate phrase
wsmwk: testday OK. but more to the point is, what would a testday reveal that we would block on? if there is nothing, then do we need a testday?
wsmwk: prior to release
dmose: well, if the testday's goal was running through the testplan, i'd hope so!
nth10sd: testdays are for ensuring that any potential blockers are discovered pre-release
dmose: or, at least, making as much progress in the test plan as possible
nth10sd: wsmwk: i could assign a you a platform on whatever you're comfortable with
wsmwk: if to validate a testplan then that's a good thing.
wsmwk: but if we are on timed release, then the goal is not primarily to find blockers.
dmose: i guess i might be using "testday" in a way that's slightly different than the traditional ussage
nth10sd: generally, historically, basic smoketests are done by Mozilla (whatever company)
dmose: meaning "some official day that we attempt to engage the testing community to help with whatever we need done for release"
davida: nth10sd: forget about that history
nth10sd: hmmmm
dmose: yeah, the employer of record of the folks who do the smoketesting is not interesting
davida: nth10sd: i mean it. i don't have a budget for a full QA department. so we have to find ways to use volunteers, and adjust everything as approprite.
davida: even if it means changing our critiria for release.
dmose: the interesting bit is that we feel confident that whoever is doing it is doing a reasonable (but not perfect!) job
• wsmwk wonders where are the Mac-fans are that symbols are not working for such a long time?
nth10sd: so we move away from the way we used to do it, the way that has ensured our QA for the past releases / reputation?
davida: criteria
nth10sd: hmmmm ok
dmose: wsmwk: it's only been two weeks
dmose: nth10sd: this is an alpha, our reputation is not staked on alpha releases
wsmwk: but missing symbols seems to happen with some unplanned frequency
davida: nth10sd: I personally believe that the reputation for quality is the result of much more than things like running through litmus, but that's another conversation.
dmose: wsmwk: yeah, it seems to be a bug in dump_syms; hopefully this will make that stup
nth10sd: davida: yes those are 2 things
davida: wsmwk: it looks like a heisenbug
dmose: davida: agreed; just having nightly test builds is where tons of our quality comes from
wsmwk: agree
wsmwk: we are building quality, not guaranteeing
nth10sd: so we still have yet to agree on the testplan for the alpha
nth10sd: 1) the test plan for our side, 2) the usage of TB 2 litmus testcases for the trunk and 3) the running of the testday
wsmwk: i'd say have a testplan, go through the motions of a testday, but have an extremely high bar not t release
nth10sd: and get volunteers to run TB 2 litmus tests on trunk?
wsmwk: if that's part of the testplan, yes
dmose: that sounds right to me
davida: but if no one shows up, oh well.
dmose: marcia: so does the wiki page seem like it's a reasonable test plan?
davida: in the short term oh well. in the long term we need to fix that.
• nth10sd notes that marcia has approved as well
marcia: dmose: Yes, I confirmed that earlier
nth10sd: (backscroll?)
nth10sd: yup
tracy: y'all can get coop to do a staright copy of the TB2 test cases into a TB3 suite. then work on cleaning those up as they apply to trunk builds.
davida: tracy: good idea.
dmose: marcia: ok, great
dmose: nth10sd: so it looks like we've got an agreed upon test plan, then, no?
tracy: davida: let me know it you need any help facilitating that.
nth10sd: yes
davida: tracy: i have no idea what's involved, so i probably will
nth10sd: the testplan has only been modified to add Tiger at http://wiki.mozilla.org/QA/Thunderbird3/TestResults/Alpha1
nth10sd: and the consensus that we will be using Litmus for volunteers at the a1 testday
dmose: nth10sd: sounds good
davida: also that the a1 test day will be scheduled as soon as the builds are ready?
nth10sd: davida: yes
nth10sd: we are holding on when Rick gets the builds out
dmose: we can just schedule a test day by saying "2 days from now", right?
nth10sd: yes
dmose: i wouldn't want to have to wait until some specific thursday
nth10sd: once he gets the builds out wsmwk and i can easily quickly advertise a testday
wsmwk: yup
nth10sd: since we had been doing this for weeks
dmose: ok, great
nth10sd: dmose: ideally on thursday but we can always change
dmose: nth10sd: so in some basic way, i think we've covered all three of your most recent questions, though not in a lot of depth. is there anything we need to nail down further?
nth10sd: wsmwk: you think you can guide volunteers to litmus on testday?
wsmwk: yes
nth10sd: (i will still be on for session 2)
nth10sd: ok
nth10sd: great
nth10sd: then i think i'd reiterate:
wsmwk: the only question from me is, how do we gather info about what needs to be added/changed in litmus - is failed litmus test sufficient?
nth10sd: 1) the test plan for our side,(OK) 2) the usage of TB 2 litmus testcases for the trunk (YES) and 3) the running of the testday (USE LITMUS)
tracy: historically, short notice for testdays hasn't worked out so well. which is why we've stuck to a schedule and really tried to have a list of future testday topics available well ahead of time.
nth10sd: tracy: i wouldn't hold on that
nth10sd: marcia: how do we deal with failed litmus testcases?
dmose: tracy: hmmm, interesting
dmose: we could simply schedule a test day for thursday
nth10sd: dmose: and use a nightly?
marcia: nth10sd: periodically we just review the failed ones and make sure they are really fails. SOmetimes people fail just on verbiage
tracy: dmose, that's what I'd suggest.
dmose: if we don't have a blocker, yeah
wsmwk: agree with gary, if the bigger goal is to test the process, not the product.
marcia: nth10sd: there is automated way to get failed results from Litmus on a daily basis
nth10sd: ah
marcia: coop does it for Firefox, so I am sure it can be adapted to Tbird
dmose: s/blocker/build
tracy: nth10sd: testday reports can also be setup specifically for your testday.
nth10sd: wow so many things still to be done for Litmus
nth10sd: Litmus issues: 1) morph TB2 to TB3 testcases, 2) automated way to get failed results from Litmus on a daily basis and 3) set up testday reports for Litmus
nth10sd: can coop do all those above?
nth10sd: s/do/help us with
coop: if you've setup your testdays, the reports are automatic
coop: and please file bugs for the first two
• nth10sd wonders how to setup testdays
tracy: admins can setup the testday report..
nth10sd: ok, i'll file the bugs
nth10sd: otherwise i don't have anything else for the discussion
tracy: I think the daily report covers all, yes? coop?
Tomcat: i can setup the testday report
nth10sd: dmose davida marcia wsmwk: anything else?
nth10sd: thanks Tomcat
marcia: nth10sd: not that i can think of
coop: tracy: it might right now
davida: nope, gotta be on a call anyway.
Tomcat: nth10sd: the create testday report bug can you assign to me
davida is now known as davida_phone.
dmose: sounds good to me; thanks everyone for helping us sort through this stuff
Tomcat: wsmwk: sounds good.
nth10sd: thank you everyone!
wsmwk: quick OT question
coop: but if nth10sd is going to handle the thunderbird results, i could split them up
nth10sd: coop: handle results?
wsmwk: do we have symbols on server for thunderbird, eg
wsmwk: http://symbols.mozilla.org/thunderbird
wsmwk: http://symbols.mozilla.org/seamonkey
dmose: i got the impression from ted that we might
dmose: but it'd be good to double-check at some point
nth10sd: wsmwk: sigh~ file a bug?
wsmwk: we need that for some future qa work.
wsmwk: to QA hangs, loops, etc -
wsmwk: wihtout debug builds
wsmwk: i'll check with ted
nth10sd: wsmwk: i'd say file a bug and probably flag it
dmose: wsmwk: or just try it and see if it works
nth10sd: yeah
wsmwk: i'll farm it out
nth10sd: coop: re: filing the bugs, which components should they be in? and should i cc you?
davida_phone left the chat room. (Ping timeout)
coop: webtools/litmus, and no need to cc me directly, i'll see them