Internetworking 2.3 Header

ARTICLE

Cluster Analysis for Web Site Organization
Using cluster analysis to help meet users' expectations in site structure

Shirley Martin, smartin2@us.ibm.com
IBM, Austin, USA

Abstract
The organizational structure of a Web site is an essential element of its ease of use. Site visitors must be able to navigate freely and confidently through a site in order to find, enjoy, and make use of its contents. Unfortunately, many corporate Web sites inherit their structures from the internal arrangement of the various divisions of their companies. Such designs are often inscrutable to the users of these sites.

This paper outlines the premises of and describes a method for using card-sorting and cluster analysis to involve users in the organizational design of Web sites. Members of a site's target audience sort cards representing key pages of a proposed site into groups. Cluster analysis is then performed across all participants' card groupings to produce site diagrams. By revealing the perceived relatedness of the key pages, these diagrams can help guide the navigational design of the site to meet users' expectations, resulting in a more usable site.

Web design issue: Web site organization
The organizational structure of a Web site can have a profound effect on its ease of use. An ideal structure would allow users to navigate freely and confidently through the site, while a less-than-ideal structure can throw obstacles between the users and their goals. Many corporate Internet sites inherit their structures from the internal structures of their companies, grouping the pages of the site according to the divisions that produce them. Unfortunately, most visitors to these sites are unfamiliar with the inner workings of the companies, and are unlikely to find this kind of site easy to navigate.

A more user-oriented approach to site structure design requires evaluating users' expectations for organizing a Web site. One method of collecting data on users' organizational expectations is card sorting. In a card-sorting exercise, participants are presented with randomly ordered cards representing pages of a Web site, and group the cards as they see fit.

Learning how users group pages is useful, but how can site designers reconcile the various groupings that different users choose? Some Web site designers have "eyeballed" card groupings created by a few test participants (e.g., Nielsen and Sano, 1994), and somehow divined a central tendency from the competing sorting structures. This method, if ever it were manageable, becomes unwieldy very quickly with the inclusion of more than a handful of topics or users.

Cluster analysis of card-sorting data is a promising method for making sense of multiple participants' input to the organization of Web site pages. Cluster analysis quantifies card-sorting data by calculating the strength of the perceived relationships between pairs of cards, based on how often the members of each possible pair appear in a common group. The measure of the relationship between any two cards is that pair's similarity score. Cluster analysis programs can display output in the form of tree diagrams, in which the relationship between each pair of cards is represented graphically by the distance between the origin and the branching of the lines leading to the two cards (see Interpreting the diagrams below for an explanation of a sample card-sorting tree diagram).

This paper outlines the premises of and describes a method for using card-sorting and cluster analysis to involve users in the organizational design of Web sites.

Card-sorting Exercise

Overview
Card sorting is a data collection method that can be particularly useful for understanding users' perceptions of relationships between items. In the example described here, participants sort cards that display contents of a Web site's most important pages. The strengths of the page relationships are calculated by assigning similarity points to each pair of cards each time they appear in a common group. The points are totaled across all participants and converted into a distance score for each possible pair of cards. Then the distance scores are compared using a cluster analysis program that arranges pages into a tree structure.
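The scoring steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not give the point values, so the weights below (a full point for a shared first-pass group, half a point for a pair joined only in the second pass) are assumed, as are the card titles and the function names.

```python
from itertools import combinations

# Assumed weights (not given in the article): a pair sharing a first-pass
# group earns a full point; a pair joined only in the second pass earns half.
FIRST_PASS_POINTS = 1.0
SECOND_PASS_POINTS = 0.5


def pair_points(sort):
    """Return {frozenset({a, b}): points} for one participant.

    `sort` is a list of large groups; each large group is a list of
    first-pass groups; each first-pass group is a list of card titles.
    """
    points = {}
    for large_group in sort:
        cards = [card for group in large_group for card in group]
        # Every pair inside a large group starts with the second-pass score.
        for a, b in combinations(cards, 2):
            points[frozenset((a, b))] = SECOND_PASS_POINTS
        # Pairs that also share a first-pass group get the full score.
        for group in large_group:
            for a, b in combinations(group, 2):
                points[frozenset((a, b))] = FIRST_PASS_POINTS
    return points


def distance_scores(sorts):
    """Total the points across participants and convert them to distances."""
    cards = sorted({c for s in sorts for lg in s for g in lg for c in g})
    totals = {frozenset(p): 0.0 for p in combinations(cards, 2)}
    for sort in sorts:
        for pair, pts in pair_points(sort).items():
            totals[pair] += pts
    max_total = FIRST_PASS_POINTS * len(sorts)
    # 0.0: always together in the first pass; 1.0: never grouped together.
    return {pair: 1.0 - total / max_total for pair, total in totals.items()}


# Two hypothetical participants sorting four hypothetical cards. The second
# participant made no second-pass groupings, so each first-pass group stands
# alone as its own large group.
sorts = [
    [[["Icons", "Fonts"], ["Pricing"]], [["Support"]]],
    [[["Icons", "Fonts"]], [["Pricing", "Support"]]],
]
scores = distance_scores(sorts)
```

With these weights, a pair that every participant placed together in the first pass receives a distance of 0.00 and a pair that no participant grouped receives 1.00, which matches the scale used in the tree diagrams.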

Procedure
Target audience definition. As in any user involvement activity, the first step of a card-sorting exercise is to identify the target audience for the site. This step is essential and deserves special attention because different groups of users will expect different arrangements of site content. An audience description should include all the qualities that pertain to their interest in the site; for example, a target audience could be "information technology professionals whose job responsibilities include hardware or software purchasing decisions." If the site is intended to serve more than one audience, the card-sorting exercise should include representatives of each user group.

When the audience descriptions are complete, participants who match those descriptions should be recruited. It is essential that the participants have no more familiarity with the company or organization the site represents than do the target audience members.

Cards. Create several sets of paper or poster-board cards representing information to be included in your Web site (see Figure 1 for an example of a "page" card). The information on each card should include a title and a one-sentence summary of the contents of that page.


Toolbar icons
Downloadable graphical controls for use in designing toolbars

Figure 1. Sample page card for card-sorting task

Shuffle the cards thoroughly to ensure a random arrangement within each set. If users perceive any logical ordering in the cards as initially presented, that ordering may influence their groupings.

Procedure

  • Test each participant in an individual session to ensure independence of grouping strategies.

    Although it may seem economical to have several test participants arrange card sets in a single session, the results of multiple-user sessions may be less reliable than those of individual sessions. In a multiple-participant situation, participants may influence one another's number of card groups or sorting criteria. Participants also may be reluctant to take as much time as they need for careful sorting if they see that others have completed the task. Because these influences can be subliminal, their effects cannot be avoided through instructions to disregard other participants.

  • Ask each participant to arrange the cards into logical groups.

    Explain that the groups should contain topics that seem to that participant to be related. An example instruction reads:

    "Please arrange the cards into groups in a way that makes sense to you. There is no right or wrong arrangement; we are interested in what you perceive to be the most logical arrangement of the cards."

  • When the participant is satisfied with the groupings, bind each group of cards together.

    The cards should be bound in such a way as to discourage the participant from moving cards from one group to another, e.g., by stapling. Cluster analysis assumes that participants are making the groupings independently, without planning further levels of categorization.

  • Ask the participant to arrange the original groups into larger groups if any further logical groupings are apparent.

  • When the participant is satisfied with the second grouping, or has stated that no further grouping is logical beyond the first pass, bind each set of groups with a clip or rubber band.

  • (Optional) Solicit suggestions for names for the larger groups.

    If suggestions for group names are needed, supply self-adhesive notepaper and ask participants to label the groups they created (see Nielsen and Sano, 1994). As explained below, participants should not be forewarned that they will be providing names for the bundles. They should feel free to group the cards as their "gut" requires, without concern for how to articulate or explain the basis of the groupings.

    The card-sorting procedure should be explained only incrementally over the course of the exercise (see Appendix A, Participant Instruction). The entire procedure should not be explained to the participants at the beginning of the exercise; an explanation that they will be arranging topic cards into groups will suffice. The cluster analysis below is designed to work on the assumption that participants completed the first sorting without planning any subsequent bundling of the groups, and performed each sorting level without concern for naming the groups.

Cluster Analysis

Overview
Cluster analysis is rarely applied to card-sorting data, probably due to the tedious procedures required for getting the user data into, and interpreting the output of, currently available statistical packages such as SAS™ or Statistica™. Both of these popular packages require converting the raw user data (card groups) into matrices of either distance scores or similarity scores. This conversion can take several hours per participant if performed by hand. The packages' output is also difficult to manage. The packages produce tree diagrams that represent the relationships users perceived between the cards, but provide no assistance in visualizing the consequences of choosing various criteria for grouping the pages.

The User Involvement Team has created a program that performs cluster analysis while simplifying both the input procedure and the interpretation of the resulting trees. The program is currently available as a beta release from http://www.ibm.com/ibm/easy/downloads/index_ucdtools.html. When the cluster analysis program is offered, detailed instructions will be included; the current paper will discuss only the interpretation of the resulting tree diagrams.

Interpretation of results
Cluster analysis is not by its nature a definitive test to determine which items belong together. It extracts from card-sorting data the relative strength of perceived relationships between pairs of items, allowing site designers to consider these perceptions when organizing the site.
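To make the step from pairwise distance scores to a tree concrete, the sketch below builds a tree by repeatedly merging the two closest clusters. The article does not state which clustering rule the program uses, so average linkage is an assumption here, as are the function names, the card titles, and the distance values.

```python
def build_tree(cards, dist):
    """Average-linkage agglomerative clustering (an assumed linkage rule).

    `dist(a, b)` returns the distance score for two cards. Leaves are card
    titles; each merge is a tuple (height, left_subtree, right_subtree).
    """
    # Each cluster is a pair: (subtree, list of member cards).
    clusters = [(card, [card]) for card in cards]

    def linkage(m1, m2):
        # Mean distance over all cross-cluster pairs.
        return sum(dist(a, b) for a in m1 for b in m2) / (len(m1) * len(m2))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i][1], clusters[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        (t1, m1), (t2, m2) = clusters[i], clusters[j]
        merged = ((d, t1, t2), m1 + m2)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Hypothetical distance scores for four hypothetical cards.
table = {
    frozenset(("Fonts", "Icons")): 0.0,
    frozenset(("Fonts", "Pricing")): 0.75,
    frozenset(("Icons", "Pricing")): 0.75,
    frozenset(("Pricing", "Support")): 0.5,
    frozenset(("Fonts", "Support")): 1.0,
    frozenset(("Icons", "Support")): 1.0,
}


def dist(a, b):
    return table[frozenset((a, b))]


tree = build_tree(["Fonts", "Icons", "Pricing", "Support"], dist)
```

Other linkage rules (single or complete linkage, for example) would place the merges at different heights, so the choice affects how the resulting diagram reads.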

Reading the diagrams. The diagrams indicate the strength of the perceived relationships between pairs of pages by the relative distance from the origin (0.00) of the nearest vertical line that connects the pages' horizontal lines. To find the strength of the perceived relationship between any two pages, trace a path from one of the pages to the other, following the branches of the tree diagram and taking the shortest possible path. The distance from the origin (0.00) to the outermost vertical line required by this path represents the perceived degree of relatedness between the two pages. The maximum distance (1.00) would indicate that no participant grouped the two cards together, while the minimum distance (0.00) would mean that every participant grouped the two cards together in the initial stage of the sorting procedure.

Figure 2 illustrates the relationships between pairs of pages. The first sample pair is composed of the pages labeled Aptiva and Aptiva's ease of use. This pair is connected by a vertical line at approximately 0.22, indicating that participants perceived these pages as being relatively closely related. The other highlighted pair in Figure 2 is Kona Desktop and UI Fundamentals. The outermost vertical line required by the path between these two pages falls at 1.00, indicating that participants never placed them in a common group.
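The path-tracing rule can also be expressed as a small routine. The sketch below assumes a tree stored as nested tuples of the form (distance, left, right) with card titles as leaves. The tree is a hypothetical fragment: only the 0.22 and 1.00 values come from the Figure 2 discussion, while the 0.80 join and the overall shape are invented for illustration.

```python
def leaves(tree):
    """Return the set of card titles under a subtree."""
    if isinstance(tree, str):
        return {tree}
    _, left, right = tree
    return leaves(left) | leaves(right)


def merge_distance(tree, a, b):
    """Distance at which two different cards are first joined in the tree."""
    height, left, right = tree
    for branch in (left, right):
        members = leaves(branch)
        if a in members and b in members:
            # Both cards sit in the same branch, so they join further in.
            return merge_distance(branch, a, b)
    if {a, b} <= leaves(tree):
        # The cards sit in different branches: this node connects them.
        return height
    raise KeyError("both cards must appear in the tree")


# Hypothetical fragment; only 0.22 and 1.00 are taken from the text.
tree = (
    1.00,
    (0.80, (0.22, "Aptiva", "Aptiva's ease of use"), "Kona Desktop"),
    "UI Fundamentals",
)

close_pair = merge_distance(tree, "Aptiva", "Aptiva's ease of use")
far_pair = merge_distance(tree, "Kona Desktop", "UI Fundamentals")
```

Here `close_pair` is 0.22 and `far_pair` is 1.00, mirroring the two sample pairs described above.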

In Figure 3, the criterion for higher-level grouping is set at 0.92. This threshold means that any pair of cards connected at distance values below 0.92 will appear together in one of the resulting four large groups. This threshold value was set after observing the effects of several alternative values, and choosing one that produced a desirable number of high-level groups. The large groups are distinguished by differing background hues.

Figure 4 depicts a second criterion line at 0.70. This threshold subdivides the four large groups into smaller groups that include cards participants found to be more closely related. The minor divisions are distinguished by variations in saturation and brightness within the hues that distinguish the larger groups. Again, a distance criterion was established by adjusting the placement of the threshold line until a suitable number of groups resulted.
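Applying a criterion line amounts to cutting the tree wherever a join lies at or beyond the threshold. The following sketch assumes a nested-tuple tree of the form (distance, left, right) whose join distances never decrease toward the root. The six single-letter cards and their join distances are hypothetical; only the 0.92 and 0.70 thresholds come from the figures.

```python
def leaves(tree):
    """Return the card titles under a subtree, in diagram order."""
    if isinstance(tree, str):
        return [tree]
    _, left, right = tree
    return leaves(left) + leaves(right)


def cut(tree, threshold):
    """Split the tree into groups of cards joined below `threshold`."""
    if isinstance(tree, str):
        return [[tree]]
    height, left, right = tree
    if height < threshold:
        # Everything under this join stays together as one group.
        return [leaves(tree)]
    return cut(left, threshold) + cut(right, threshold)


# Hypothetical tree with six hypothetical cards.
tree = (
    1.00,
    (0.85, (0.40, "A", "B"), (0.60, "C", "D")),
    (0.75, "E", "F"),
)

major = cut(tree, 0.92)  # [['A', 'B', 'C', 'D'], ['E', 'F']]
minor = cut(tree, 0.70)  # [['A', 'B'], ['C', 'D'], ['E'], ['F']]
```

Every minor group falls entirely inside one major group, which is what allows the minor divisions to be drawn as shades within each major group's hue.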

Figure 2. Sample page relationships

Figure 3. Major divisions

Figure 4. Major and minor divisions

Applying the results. The tree diagrams can be used to help plan the hierarchical structure of a Web site. The data from each of the audiences for the site can be analyzed separately, and the resulting trees can be compared. If the members of the various audiences group the contents very differently, the audiences might be better served by alternative views of the site, or by separate sites. But if the diagrams are similar across audiences, a common site structure can serve all those audiences.
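The article compares audiences by inspecting their trees. As a supplementary aid that is not part of the method described here, the distance scores of two audiences could also be compared directly. The sketch below reports the mean absolute difference and ranks the pairs on which the audiences disagree most; the function name, card titles, and scores are all hypothetical.

```python
def compare_audiences(dist_a, dist_b):
    """Return (mean absolute difference, pairs ranked by disagreement).

    Both arguments map a (card, card) tuple to that audience's distance
    score; only pairs present in both audiences are compared.
    """
    diffs = {
        pair: abs(dist_a[pair] - dist_b[pair])
        for pair in dist_a
        if pair in dist_b
    }
    mean_diff = sum(diffs.values()) / len(diffs)
    ranked = sorted(diffs, key=diffs.get, reverse=True)
    return mean_diff, ranked


# Hypothetical distance scores from two hypothetical audiences.
audience_a = {
    ("Fonts", "Icons"): 0.0,
    ("Fonts", "Pricing"): 0.75,
    ("Icons", "Pricing"): 0.75,
}
audience_b = {
    ("Fonts", "Icons"): 0.25,
    ("Fonts", "Pricing"): 0.75,
    ("Icons", "Pricing"): 0.25,
}

mean_diff, ranked = compare_audiences(audience_a, audience_b)
```

A small mean difference suggests the audiences perceive the content similarly. The top-ranked pairs show where their expectations diverge and are worth checking against the two diagrams.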

To determine an appropriate set of top-level site sections, the upper threshold line can be manipulated until a desirable number of divisions results. For example, the upper threshold in Figure 3 is set at 0.92, dividing the site into four main sections. After these top-level divisions have been determined, other criteria can be chosen to obtain appropriate numbers of lower-level divisions, as illustrated in Figure 4.
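Moving the threshold line until a suitable number of divisions appears can be imitated by scanning candidate thresholds. The sketch below assumes a nested-tuple tree of the form (distance, left, right) whose join distances never decrease toward the root. The tree, the 0.01 step size, and the function names are hypothetical.

```python
def merge_heights(tree):
    """Return the join distance of every branching in the tree."""
    if isinstance(tree, str):
        return []
    height, left, right = tree
    return [height] + merge_heights(left) + merge_heights(right)


def count_groups(tree, threshold):
    """Number of groups left when the tree is cut at `threshold`."""
    heights = merge_heights(tree)
    # A tree with n cards has n - 1 joins; each join below the threshold
    # fuses two groups into one.
    return len(heights) + 1 - sum(1 for h in heights if h < threshold)


def find_threshold(tree, wanted, steps=100):
    """Scan from 1.00 downward; return the first threshold that works."""
    for i in range(steps, -1, -1):
        threshold = i / steps
        if count_groups(tree, threshold) == wanted:
            return threshold
    return None  # no threshold on this grid yields the wanted count


# Hypothetical tree with six hypothetical cards.
tree = (
    1.00,
    (0.85, (0.40, "A", "B"), (0.60, "C", "D")),
    (0.75, "E", "F"),
)

three_sections = find_threshold(tree, 3)
four_sections = find_threshold(tree, 4)
```

Because the number of groups changes only at join distances, many thresholds give the same result. The scan simply reports the highest one on its grid, and a designer would still judge whether the resulting sections make sense.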

In a large site in which various sections have different audiences, it may be appropriate to have the card-sorting exercises performed on each section by members of the audience most likely to use it.

Summary
Card-sorting exercises and cluster analysis can help site designers understand their target audience's expectations for site content organization. These procedures provide a method for quantifying the relationships users perceive between the content pages of a site. They allow users' expectations to influence a site's navigational structure. Site designers can use the results to help determine a structure that their audiences will understand.

References
  • Aldenderfer, M. S., and Blashfield, R. K. (1984). Cluster Analysis (Sage University Paper series on Quantitative Applications in the Social Sciences, No. 07-044). Beverly Hills, CA: Sage.
  • Nielsen, J., and Sano, D. (1994). SunWeb: User interface design for Sun Microsystem's internal web. Proceedings of the 2nd World Wide Web Conference '94: Mosaic and the Web (Chicago, IL, October 17-20), 547-557. (also available in hypertext form on the World Wide Web as http://www.ncsa.uiuc.edu/SDG/IT94/Proceedings/HCI/nielsen/sunweb.html).

© Internet Technical Group
Last update: December 12, 1999
URL: http://www.sandia.gov/itg/newsletter/oct99/qolite.html
hosted by Sandia National Labs