Labs/Ubiquity/Usability/Usability Testing/How To
This is rough "as hell". It is based on ideas for error-proofing usability testing (especially for beginners) that are being tried out in another usability study now. It will be updated as that study proceeds and will go through another rewrite after that study is complete. Until then it will have major grammatical errors (tense errors, inconsistent terms, etc. ;) because the content has been copied, pasted, and generally mangled. In the meantime, please share your links and resources, comment, and contribute. This is a wiki, not a book!
Designing the Test
Traditional psychological testing has a pilot study and a final study. Because usability testing differs in size, recruitment, and so on, we designed the format by breaking the study into multiple chunks, analyzing the whole process after completing each chunk, and then deciding what's worth changing.
Instead of a single pilot study, the testing is broken into multiple rounds that grow exponentially and become more refined. Each round goes from test to completed deliverable, after which the facilitator (that's you) evaluates the test, the workflow, and the goals of the test and decides what is worth changing and what is not. This prevents unexpected problems from infecting other test sessions, allows for improvements to the workflow, and caps testing at the minimum number of participants.
The study is designed to have three phases: Alpha, Beta, and Gamma. Although presented linearly, jumping around and using Alpha tests during a Gamma phase is a sign of higher intelligence : ) We will walk through an actual study and explain each type of test as we go.
Example study: Video Vs. Written Documentation
Alpha Phase
(1 test with 1-2 participants total)
Alpha tests evaluate:
- the usability of a test
- the technical feasibility of testing the product
- and the worthiness of potential test data (by generating supporting evidence).
Spend a bare minimum of time and effort making and producing alpha tests; however, test on actual likely participants. Instead of producing a single basic video and an HTML page and testing, a considerable amount of time was spent on the following example materials: researching this topic, full-blown mockups, 3 videos, team meetings, getting permission, etc.
<video type="vimeo" id="2901416" width="437" height="315" desc="1x Video Orignal Videos" frame="true" position="center"/>
There is a problem in our first test above, what is it?
The devilish part is that the more familiar you are with the software involved, the harder the problem is to see. Two other programmers, four usability researchers, the project mailing list, and blog readers all failed to see that a video object steals keyboard focus from the browser, intercepts the Ubiquity hotkey (option and spacebar), and interprets it as pressing spacebar to pause the movie.
This problem nuked the study for about 6 months. Beyond saving time, imagine having your manager, professor, or client fail this test during your proposal meeting. An alpha study with likely participants would have avoided all of this.
Beta Phase
(Multiple tests with 2-3 participants each)
Beta tests are pilot studies of 2-3 participants that ensure your measurements actually measure what you are testing, and that work out how to administer the test efficiently.
If you have not done a usability study before, fully compile your data and publish your results, whether that be by video posts or by outlining the final report and filling in what data you have. When done, evaluate how to improve the test itself (consent form, video capture software, format, error spreadsheet, etc.) and your post-production workflow (video encoders, video hosting, blogging, etc.).
Publishing the tests now will also encourage buy-in from team members and management, solicit valuable feedback, and generally help your usability cause. If possible, have a stakeholder watch the test and perform the sticky-note exercise. If you are working remotely and are using video tagging, assign a video to each stakeholder.
Coworkers may want to fix something; a boss may want to add additional goals. These are all good signs of usability buy-in. While respecting time constraints, try to implement their suggestions and any improvements to the test or post-production workflow.
Gamma Phase
(1+ test with 3-6 participants)
The gamma phase is really just a name for the point when you stop assessing your workflow and testing suite and just rip through test sessions and post-production work like a ninja.
Unless you are testing a complex interface (MS Office or an e-commerce website) or you have a demographic quota to fill, you probably don't need to perform any gamma tests. While statistical significance is important, the general rule of thumb is to stop testing when you begin to see patterns emerge. When sessions become predictable, when you stop finding new errors, etc., you have probably run into all the walls users can run into. Stop, fix the product, and start a new test.
Test Types
Exploratory
This is the most commonly used format for usability testing. It is more accurately described as pseudo-scientific: interesting results are gained, but there is no controlled variable that changes.
Generally these are in an "interview" format, where there is an overall goal and possibly sub-tasks, but the user drives the interaction. These are great for early alpha-stage products. They allow developers to see fundamental problems with the product, see what users like and understand, and generally guide the overall development of the product.
Just because exploratory studies are more qualitative doesn't mean that all rules go out the window. Choosing probable end users, keeping the setting as informal as possible, participation from the development team, error statistics, etc. are all still very important for getting valuable information.
Strictly Scientific
This is when two groups of users are given the same task but two different interfaces. For websites this can happen sequentially or concurrently, and it makes up the majority of the usability work that major websites perform.
In a usability lab this type of test isn't much different from exploratory tests other than its scope. Users are more restricted, given only one goal instead of multiple goals. The number of participants doubles, and often grows much more if you are trying to ferret out subtler results.
These tests rely more heavily on error numbers than qualitative tests do, so make sure that your alpha and beta phases really polish that process.
Prototype Types
Paper
Paper prototyping is the sliced bread of usability. Sadly, very few practitioners use it. It's best to think of paper prototyping as the step in between sketching a design and creating a mockup on the computer. It has all the advantages of sketching (cheap, fast to make, easy to change) and helps users open up to criticizing the design.
Part of the reason paper prototyping isn't used is that people think there must be some magic to it, some step they haven't thought about. The whole thing seems too silly: draw an interface and ask people to "use it." But it really is that simple.
<video type="youtube" id="GrV2SZuRPv0" width="437" height="315" desc="1x Video Original Video" frame="true" position="center"/>
The only time that paper prototyping isn't very useful is when the interaction is unrestricted or response times are important. If you are trying to understand how users interact with a command line (unrestricted input) and that interaction depends on very fast responses (like auto-suggest), it may be hard to get good information.
And unfortunately, if you have to share your designs with a remote team, most people do not give early design prototypes the respect they deserve. And unless you have a touch-screen computer, sending the sketches to remote teams can be inconvenient.
- UIE has some great tips on paper prototyping and information on why it's so powerful. [1]
- A great write-up and walkthrough of paper prototyping the hipster PDA. [2]
- And, of course, Wikipedia's article.[3]
Wireframes
Wireframes are mockups of varying degrees of functionality that also look good. These are generally used for web posts and pitching ideas.
In order of least PITA to most PITA:
- OmniGraffle/Visio
- Yahoo Wireframes
- OmniGraffle UI
- Raster graphics editors (Photoshop, GIMP, etc.)
- Fireworks, Flash, and other design products that have animation can be used for somewhat interactive prototypes.
- Videos?
- Pencil, other GUI toolkits
Full-blown
There are too many GUI IDEs to list here. Of course we like Mozilla's Prism and XUL, but we STRONGLY caution that you use these as a last resort for early-stage experimentation. You can get more information from multiple quick paper prototypes than from all the effort spent on a single throw-away GUI prototype.
Recruiting & Interacting with participants
Recruiting participants should be fun and entertaining.
Choose a place.
Labs skew people's behavior, so get the most informal setting you can. This is known as in-field testing, and it makes people feel more comfortable. Coffee shops, corporate lunchrooms, or university cafeterias are all great places with plenty of people.
Choose a person.
The best sessions are ones in which no money or goods are offered as compensation. If you are able to just start up a conversation with someone, try recruiting by doing just that. If working with a group, elect the most social person on your team to be the test facilitator.
Most won't be comfortable doing cold-call recruiting. That's okay; it's more expensive, but still okay.
However you recruit participants follow these guidelines and you will get much better interactions:
- Make eye contact and smile.
- Use the person's name when talking to them, "Heather, could you do X for me" or "Oh this is so great Heather, it will really help the team."
- Don't call the user a participant or subject in the study, with your co-workers, or in your reports. The term "participant" brings baggage, and the user feels that s/he is being tested. They get nervous and act much differently than they would if they were using the product in their everyday lives. Instead, researchers are known as facilitators, and participants are users or testers. Because, after all, they are the ones testing the product!
Recruit your army.
For Beta tests<link>, immediate friends and family are off limits, although their friends are not. Your history and interaction with them skews the test too far. Your friends and family in particular don't want to hurt your feelings.
If you do have a budget, post on Craigslist. A busy cafe or company lunchroom offers the largest pool of potential testers. If that is not an option, work whatever social networks you have, be that Facebook, email, or just asking around.
Donuts, company merchandise, etc. are cheap ways of paying your participants. Not paying them makes for an even better test, but it does require more social effort. <link>
Briefing/Consent
If you are going to collect personal information (be that age, contact info, or a video capture) you must get some sort of recordable evidence that they know what is going to happen with their information. If you are recording the session, the easiest thing is for the participant to record their consent on tape. That way the user can remain anonymous (their name is not required and can be separated from the video) but the consent is still verifiable (if there is a dispute after the fact, you play the recording).
More important than legal consent is moral consent. If you don't think they understand what they are signing up for, don't let them participate. Even if your school or business has a boilerplate legal agreement for them to sign, make sure they read a one- or two-sentence description of what will happen to the data.
In the same vein, copyright can't protect against embarrassing your users. Fair use means that anyone can remix your video and make a parody of your poor test participant. If you do include video, make the video as small as possible, reduce the opacity to 50%, and overlay a product logo for good measure.
Recording the Session
Basically you need screen and web-cam capture software. What you use doesn't really matter, but a quick list of stuff we have tried is here.
If you are sharing the data publicly (especially personally identifiable information), some sort of consent is a very good idea. While a written consent form works, an audio or visual consent form is easier to manage and can't be lost, as it is embedded in the recording itself. The standard IANAL applies, and the following has not been vetted by a real lawyer, but you can see examples of our consent process from previous testing. This consent process is consistent with other usability professionals' and with the psychological research field in general.
Taking notes is good, but you will be going over the video again anyway, so it's best to just pay attention. Silverback has an excellent feature that allows use of the Apple remote to add bookmarks.
Debriefing
Thank the person and give them information on where to find out more about the project you are working on.
Analysis and Sharing of Data
Spreadsheets are the easiest, fastest, and most flexible way of tracking your data. After a recording session, immediately watch the video and record each error. Doing this immediately cuts down on the tedious work when writing your final report and lets you compare errors across sessions as you go. In the next session you will be able to assign a mistake to a specific category on the spot, instead of trying to unify all the different sessions at the end.
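If you'd rather script the log than maintain it by hand, a plain CSV file works just as well as a spreadsheet and imports into one cleanly. Here is a minimal sketch; the column names and error categories are hypothetical placeholders, so adapt them to whatever your study actually tracks:

```python
import csv

# Hypothetical columns; rename to match what your study records.
FIELDS = ["session", "timestamp", "task", "category", "description"]

def log_error(path, **row):
    """Append one observed error to a shared CSV file."""
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if f.tell() == 0:  # brand-new file: write the header row first
            writer.writeheader()
        writer.writerow(row)

# Log an error right after reviewing the session recording.
log_error("errors.csv", session=1, timestamp="02:13",
          task="invoke hotkey", category="focus",
          description="video object stole keyboard focus")
```

Because every session appends to the same file with the same categories, comparing sessions at report time is just a sort and a pivot table away.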
Local Development Team
The general rule is that all stakeholders (from programmers, to managers, to Quality Assurance people, and even marketers) are required to sit through at least one live session. During that session, have them write down their thoughts and the different errors on separate sticky notes. Once the testing is complete, buy some pizza, try grouping the stickies together on a whiteboard, and talk over what everyone saw.
Remote Development Team
Contact is especially important if you are working with a remote development team. Blogging, video podcasts, and streaming data analysis are all very important but very hard to balance. Take the path of least resistance to each one.
An incredible WordPress skin would be great, but for now just download a free one and move on! Don't provide too much analysis, especially this early on. Be a reporter of what is happening and ask the team for analysis. Good analysis takes a lot of time and also discourages others from analyzing your data and becoming engaged. It's a blog, not a scientific journal.
Make your data accessible by hosting your spreadsheets on Google Docs, for example, but don't get caught up in having an interface that generates graphs. Your data may look ugly; as long as the delivery mechanism looks okay and you put up a sign that says "This data is raw, ugly, and unfinished. I just threw it up here so you can look at it if you want," people won't judge your ability to do your job well.
If you have video, begin by speeding it up by 1.65x (link to study on library data). This is about the upper limit for making everything go faster while keeping it legible. Producing video is something we will eventually touch on (when we have a unified post-production workflow), but Miro has an awesome multi-platform guide to producing and editing video that we couldn't hope to match.
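One way to do the 1.65x speed-up is with ffmpeg's `setpts` (video) and `atempo` (audio) filters. This is a sketch under stated assumptions, not a vetted pipeline: it assumes ffmpeg is installed, the file names are placeholders, and `atempo` only accepts factors between 0.5 and 2.0 (which 1.65 satisfies).

```python
import subprocess

def speedup_cmd(src, dst, factor=1.65):
    """Build an ffmpeg command that speeds playback up by `factor`."""
    return [
        "ffmpeg", "-i", src,
        "-filter:v", f"setpts=PTS/{factor}",  # shrink frame timestamps
        "-filter:a", f"atempo={factor}",      # speed up audio, keep pitch
        dst,
    ]

cmd = speedup_cmd("session1.mov", "session1_fast.mp4")
# subprocess.run(cmd, check=True)  # uncomment to actually transcode
```

Keeping the factor in one place also means you can drop it back to 1.0 for clip reels where the tester's speech matters.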
Again, disseminate the video as quickly as you can. If you plan on tagging the video, post it immediately after the session without the tag information and repost it after it's been tagged.
Options
- Upload to Viddler and assign a programmer to tag at least one session.
- Podcasts
- One major problem is that unless a session is being watched very closely (i.e. not playing in the background while a programmer is trying to get her/his programming done) it's hard to catch the most important small issues. With podcasts, programmers can take sessions with them and (hopefully) watch them when there is less stimulation, like on bus commutes. Mozilla has no official channels for this; however, Ourmedia provides free hosting.
- Tag and make clip reels
- This can be very time intensive (easily 1 minute for every second of video) and the data is not portable. There is an active effort to improve both of these negative aspects by incorporating the tagging into the early data collection.
Triangulation
As you log your error data in a spreadsheet, log bugs in the development bug tracker and link to them in your spreadsheets, videos, and final reports.
In your bug reports, links to clip reels, or links that jump to a point in time in a video, promote engagement from the development team. It is probably best to create a script for this, although you could hack the ones we have. Link construction hack
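As a starting point for such a script, here is a hedged sketch of a helper that builds a jump-to-time link. The `#t=<seconds>s` fragment syntax is an assumption borrowed from common video hosts; check your host's actual timestamp syntax before relying on it.

```python
def video_link(base_url, minutes, seconds):
    """Build a link that jumps to a point in time in a session video."""
    total = minutes * 60 + seconds
    return f"{base_url}#t={total}s"  # hypothetical fragment format

# Paste the result into the bug report next to the matching error entry.
link = video_link("https://example.com/session1", 2, 13)
```

If you already log errors with a `timestamp` column, a few more lines could turn the whole spreadsheet into clickable evidence.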
Cross referencing further with user complaints provides additional support.
If you have the manpower, money, and political will, a Metavid server nicely takes care of tagging data, clip reels, disseminating information, and collaborative data analysis all in one swoop.
Final Report
You may or may not be required to make a final report. It is suggested that you do: it is the single summary of the usability study that can be linked to and digested. It reflects well on you; it is the one thing you can link to that you did, as opposed to a collection of data sources. And you will likely find yourself building pieces of it while doing your data analysis anyway.
Make sure to do this within "the fold" of how your company operates: an internal wiki, a filed word-processed report, a presentation, etc.