3.14159, 42, and 7±2:
Some numbers are famous. For example, pi is known throughout the world. Other numbers are famous within particular subcultures. For fans of humorous novelist Douglas Adams, 42 is famous as the answer to life, the universe, and everything. Likewise, within the subculture of user interface designers, the most famous number is Miller's (1956) magical number 7±2. More than forty years after its initial publication, Miller's figure is cited in academic literature (e.g., Helander, Landauer, and Prabhu's Handbook of Human-Computer Interaction, 1997), at usability-related conferences such as Human Factors and the Web (e.g., June, 2000, Austin, Texas), and more privately, in countless interface design meetings.
The fame of Miller's number would be a wonderful thing if not for a couple of problems. First, at least in private settings, the magical number is often invoked inappropriately. For example, an individual may claim that a web page should have no more than 7±2 links on it. As will be discussed in more detail, nothing Miller said lends support to such a statement. Second, even when it is cited correctly, Miller's work is discussed as if the scientific understanding of short-term memory had not advanced at all in the last half century. In fact, an analysis of Miller's original paper and of subsequent scientific research suggests that 7±2 is no more relevant to user interface design than is Douglas Adams' facetious 42.
Miller's original paper spent most of its space discussing how many items an individual can identify when the items in a set differ only on a single dimension (e.g., tones differing only in frequency or saline solutions differing only in their degree of saltiness); nevertheless, when most writers and speakers refer to the 7±2 heuristic, they seem to be referring to Miller's discussion of memory span (cf. Baddeley, 1994).
In a typical memory span experiment, an individual hears a list of items (e.g., digits) and tries to recall them in their presentation order. Miller's paper discussed the fact that most people can correctly recall about 7±2 items. Memory span is often interpreted as a measure of an individual's short-term memory (Watkins, 1977)--that is, his or her capacity to remember information over very short periods of time (i.e., seconds, not minutes). If we assume that short-term memory is important to interface design and that memory span is a measure of short-term memory, then surely Miller's 7±2 rule should inform the design of user interfaces. Although this syllogism is alluring, the 7±2 heuristic is a poor one to apply to the design of user interfaces for a number of reasons.
First, the data cited by Miller have long since been superseded (see Baddeley, 1986; Cowan, 1995; Greene, 1992; Neath, 1998 for reviews). More contemporary experiments show that an individual's capacity for short-term remembering depends heavily on the nature of what is being remembered. For instance, memory span for non-verbal stimuli such as images or environmental sounds (e.g., a wailing siren, a honking horn) is typically much more limited than the famous 7 ± 2 (e.g., LeCompte & Watkins, 1993; Schiano & Watkins, 1981).
Even with verbal stimuli such as those used by Miller, there are serious qualifications to his findings. For instance, Miller's data refer to short items such as letters, digits, and monosyllabic words. Baddeley (1986) and others (see Neath, 1998) have documented clearly that as word length increases, short-term memory span decreases; thus, a list such as dirt, star, inn, beast, wife, bronze, lump, golf is much easier to remember than a list such as property, officer, musician, amplifier, orchestra, mosquito, gallery, alcohol. One well-established implication of this word-length effect is that the 7 ± 2 rule does not apply equally well across languages: Relative to English, memory span for the digits 1-9 is shorter in Welsh (Ellis & Hennelly, 1980) and longer in Cantonese (Stigler, Lee & Stevenson, 1986).
Not only does the length of a word complicate the issue of memory span, but so does the meaning of the word. Holding word length constant, words that are familiar and meaningful (e.g., river, oven, sugar) are remembered better than words that are unfamiliar (décor, pewter, zygote) or meaningless (rimal, plimsy, narple; Hulme, Maughan, & Brown, 1991).
Any simple application of the 7 ± 2 heuristic overlooks the fact that memory span depends on how the to-be-remembered stimuli are presented. Thus, if the stimuli are presented visually instead of auditorily, memory span is reduced by about one item (Crowder, 1967). The number is lower still if the list is presented in the presence of other auditory stimuli (e.g., people talking in the background), even when those sounds are irrelevant to the task of remembering the visually-presented stimuli (e.g., LeCompte, 1996). These findings are important to user interface design because they show how the basic 7 ± 2 heuristic can seriously overestimate the capacity of short-term memory.
Yet another problem with the 7 ± 2 heuristic is that memory span experiments require individuals to recall a set of unrelated items in serial order. Thus, for the list 381649 to be counted as correct, the 3 must be reported first, the 8 second, and so on, with the 9 reported last. Serial order recall is much more difficult than free recall (see Neath, 1998), especially since the lists are constructed so that the order in which the items appear is random. This peculiar set of conditions contrasts sharply with most real world tasks. Ordinarily, people need to remember things that are meaningfully related, and the precise order in which they occurred is less important (phone numbers being an obvious exception). Recall for meaningfully-related stimuli where the order of recall is unimportant is typically much greater than 7 ± 2 (see Greene, 1992; Tulving & Patkau, 1962; Watkins, 1974). Thus, although the capacity of short-term memory can be overestimated by Miller's heuristic, it can also be underestimated.
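The difference between the two scoring rules can be made concrete with a minimal sketch. The function names and the sample response below are hypothetical, and free recall is simplified here to an all-or-nothing check, whereas real experiments usually count the number of items recalled.

```python
from collections import Counter


def serial_recall_correct(presented, recalled):
    """Memory-span scoring: every item must appear in its original position."""
    return list(presented) == list(recalled)


def free_recall_correct(presented, recalled):
    """Simplified free-recall scoring: the same items, in any order."""
    return Counter(presented) == Counter(recalled)


presented = "381649"   # the list used as an example in the text
recalled = "386149"    # hypothetical response with two digits transposed

serial_ok = serial_recall_correct(presented, recalled)
free_ok = free_recall_correct(presented, recalled)

print(serial_ok)  # False: the 1 and the 6 are out of order
print(free_ok)    # True: all six digits were reported
```

A single transposition is enough to fail the serial-order criterion even though every digit was remembered, which is one reason span figures can understate what people retain in everyday tasks.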
At best, Miller's 7 ± 2 figure applies to immediate serial recall for a sequence of familiar, easy-to-pronounce, unrelated, verbal stimuli presented auditorily with no distracting sounds within earshot. Thus, the narrow range of generality implied by the research findings cannot support the wide variety of situations to which people try to apply this heuristic. Based on the relevant data, user interface designers should probably forgo application of the 7 ± 2 heuristic altogether.
One response to the above argument is that it is merely an example of the sort of nitpicking that keeps academic researchers in business and has little or nothing to do with the straightforward rules of thumb by which practitioners earn their keep. What if practitioners simply grant that the 7 ± 2 heuristic can be off by a digit or two or even more at times? Should we discard it completely because it is even less precise than we thought? Perhaps not. Regardless, there is a better reason to discard it.
Even if Miller's numbers held up in all the conditions under which they actually fail, there would still be a problem. The problem is that memory span is a statistical concept. When a psychologist says that people can remember 7 ± 2 items, he or she has a very technical definition in mind: A memory span of 7.0 means that there is a 50% chance that an individual will perfectly recall an entire list in serial order. Some techniques measure this 50% point more accurately than do others, but all measures of span are probabilistic. Consequently, it is perfectly meaningful to state that a particular individual has been shown to have a span of 6.25 items.
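To illustrate how a fractional span such as 6.25 can arise, here is a minimal sketch of one possible estimator: linear interpolation between the two list lengths that bracket the 50% point. The function name and the data are invented for illustration and are not taken from any study; published procedures differ in their details.

```python
def estimate_span(p_correct_by_length):
    """Estimate the list length at which perfect serial recall falls to 50%.

    p_correct_by_length maps list length -> proportion of lists of that
    length recalled perfectly. Linear interpolation is an assumption made
    for this sketch, not a standard prescribed by the literature.
    """
    lengths = sorted(p_correct_by_length)
    for shorter, longer in zip(lengths, lengths[1:]):
        p_short = p_correct_by_length[shorter]
        p_long = p_correct_by_length[longer]
        if p_short >= 0.5 > p_long:
            fraction = (p_short - 0.5) / (p_short - p_long)
            return shorter + fraction * (longer - shorter)
    return None  # the data never cross the 50% point


# Hypothetical results for one individual, eight lists per length.
observed = {4: 1.0, 5: 0.875, 6: 0.625, 7: 0.125, 8: 0.0}
span = estimate_span(observed)
print(span)  # approximately 6.25
```

With these made-up proportions, performance crosses 50% a quarter of the way between six and seven items, so the estimated span is about 6.25 even though no list contains a fractional number of items.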
What are the implications of this fact for user interface design? Memory span is an attempt to measure the average maximum capacity of short-term memory. Should applied psychologists design systems with the typical individual's maximum capacity in mind? Probably not. Consider the following analogy: A certain manual laborer has about a 50/50 chance of successfully lifting a 100 kg weight over his head in one clean sweep. Knowing this fact tells us something about how strong he is, but it does not imply that we should design his workday so that he has to repeatedly lift 100 kg. Moreover, it does not follow that, because he can lift 100 kg half of the time, we can just knock off 20 or 30 kg and expect him to lift that over and over. Rather, we would need to find a weight that he can comfortably lift again and again. Knowing his maximum capacity reveals little about this more relevant figure. Taking this metaphor back to memory span: Knowing that most people can successfully remember between 5 and 9 items about half of the time gives us virtually no useful guideline for design of user interfaces.
An ardent defender of the 7 ± 2 heuristic might respond to the argument that short-term memory is largely irrelevant to user interface design by pointing to the fact that Miller's original paper actually devoted most of its space to discussing perceptual judgments. Miller reviewed research showing that when an individual tries to identify each member of a set of items that differ only on a single dimension, the number of separately identifiable items falls into the 7 ± 2 range. Thus, for example, most people can reliably identify about 5 or 6 different levels of loudness; more than that and they begin to get confused. So, although memory span may be irrelevant, perhaps the fact that people can make absolute judgments about only 5-9 unidimensionally variable items should influence web design. The weakness of this argument is that web pages comprise stimuli such as words and images that vary along many dimensions at once. Miller (1956) himself noted that multi-dimensional stimuli are different from unidimensional stimuli: "Everyday experience teaches us that we can identify accurately any of several hundred faces, any one of several thousand words, any one of several thousand objects, etc." Thus, in the arena of interface design, the capacity of human perceptual discrimination is not captured by the 7 ± 2 rule.
If we reject 7 ± 2 as a rule of thumb, with what do we replace it? If there is to be a magic number, it should be three, not seven. If we assume that memory spans are normally distributed with a mean of 7 and a standard deviation of one digit, then more than 90% of the population should have a memory span between 5 and 9, and fewer than 2 out of every 10,000 should have a span as low as 3. In fact, a span of 3 items is typical of individuals who have suffered certain kinds of serious brain damage (e.g., Martin & Lesch, 1996). The point is that almost everyone has a memory span of more than 3 items.
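The arithmetic behind these proportions can be checked with a short sketch. It assumes a mean span of 7 (the center of 7 ± 2) in addition to the standard deviation of one item stated above; the helper name normal_cdf is introduced here for illustration.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


MEAN_SPAN = 7.0  # assumed mean, the center of 7 +/- 2
SD_SPAN = 1.0    # standard deviation of one item, as assumed in the text

# Share of the population within two standard deviations of the mean.
share_5_to_9 = normal_cdf(9, MEAN_SPAN, SD_SPAN) - normal_cdf(5, MEAN_SPAN, SD_SPAN)

# Share of the population four or more standard deviations below the mean.
share_3_or_less = normal_cdf(3, MEAN_SPAN, SD_SPAN)

print(f"Span between 5 and 9: {share_5_to_9:.1%}")
print(f"Span of 3 or less: {share_3_or_less * 10_000:.2f} per 10,000")
```

Under these assumptions, roughly 95% of people fall between 5 and 9, and well under 1 person in 10,000 has a span of 3 or less, so both figures in the text are conservative bounds.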
Of course, as mentioned previously, memory span is a peculiar measure that taps an individual's ability to remember random items in serial order. Nevertheless, the number three can be derived from other sources. For instance, a common method in memory research, known as the Brown-Peterson procedure (see Greene, 1992 for a review), involves trying to remember three items in any order while being distracted by another irrelevant task. Performance on this task degrades rapidly over the course of just a few seconds. In contrast, on a 3-item test of short-term memory without distraction, performance is essentially perfect for almost everyone. Therefore, three seems like a relatively safe number of items to expect people to remember.
One last point to keep in mind: Even in his original paper, Miller acknowledged that the concept of an item is a fuzzy one. A designer might look at some set of stimuli as three items, such as the numbers 2, 1, and 6, but some user seeing this trio perceives them as February 16, his birthday. Thus, three items become one item. Miller (1956) called this process chunking. For designers, the more serious problem is that experts often see chunks or patterns among stimuli that novices do not see (Chase & Simon, 1973). Consequently, an interface might appear to have a light memory load to an expert because the terminology is effortlessly chunked, but to the novice user, every element in the display must be remembered separately, making the burden on memory quite severe.
Of course, one way to avoid the consideration of all the intricacies of short-term memory is to acknowledge that this aspect of human cognition is both limited and unreliable. Any system designed with a dependence on the short-term memory of its human users is likely to generate occasional failures and frequent frustration. In other words, short-term memory is like my lazy Uncle Walter--it'll work, but not well. In an ideal system, no one would have to rely on short-term memory (or Uncle Walter) for anything whatsoever.
For more specific design guidance, practitioners should rely on principles, heuristics, and empirical data relevant to the problem at hand. If the question is one of menu design or page layout, there is ample academic research (Shneiderman, 1998), practical data collection (Spool et al., 1997), and expert advice (e.g., Galitz, 1997) to guide initial design. In any case, whether or not one starts with some magic number as a guide, user-centered design dictates that the initial design should be tested iteratively with real users attempting real tasks and should then be modified accordingly.
In conclusion, Miller's 7 ± 2 heuristic is wrong more often than it is right. And even when it is right, it is a poor heuristic for the design of user interfaces.
© Internet Technical Group
Last update: April 30, 2000
hosted by Sandia National Labs