Panel Proposal for Measuring Information Architecture Quality: Prove It (or Not)!

This is the proposal that we submitted to the CHI 2001 Panel reviewers in September 2000. Feedback on this proposal led to the Measuring IA panel at CHI 2001.

Louis Rosenfeld
Argus Associates, Inc.

Keith Instone
Argus Associates, Inc.

Abstract

This panel debates a topic that has been popping up recently as a consequence of different disciplines rubbing up against each other in a new field: can the quality of an information architecture be measured quantitatively? And if so, how can this analysis be verified?

Information architects and HCI professionals are already discussing this issue regularly and, at times, heatedly. The need for guidance is especially pressing because information architecture is an emerging field. As in other areas of HCI, information architects are regularly confronted by clients and employers alike with the need to justify the cost of their efforts in quantitative ways.

Information architects come from a variety of disciplines including HCI, library and information science, visual design, technical communications, and computer science. These fields have widely varying opinions on the validity of and techniques for quantifying information system performance.

While some dispute the validity of quantification, others tend to believe that it is not only possible but the only valid means for assessing information architecture. Members of both camps may resort to traditional means of assessing information systems performance, while others feel that the new medium of the Web requires new tools, techniques, and approaches for such assessment.

Keywords

Information architecture, metrics, evaluation, information systems performance, automated evaluation, qualitative methods.

Issues to be Covered

Panelists will be asked to state their position on these questions for five minutes apiece, followed by open debate:

  • Can an information architecture be measured as a whole, or can only its components be measured?
  • Prove your case (that it can or cannot be measured). Follow-up questions could include:
  • How does evaluating the information architecture differ from evaluating the user interface?
  • Are different types of information systems (e.g., entertainment, news, reference) more measurable than others?
  • Are different architectural components (e.g., table of contents, site hierarchy, search system, contextual and global navigation) more measurable than others?
  • What levels of evaluation (such as qualitative, "soft" quantitative and "hard" quantitative) are appropriate at which times?
  • What are the strengths and weaknesses of these qualitative and quantitative approaches?
  • What are specific examples (case studies) from large sites where measurement attempts have been made?

Controversial Aspects

  • How do you measure things in this "brave new web world"? Do the standard usability methods apply? Do marketing metrics apply? Do information retrieval metrics apply?
  • Should we even be measuring quantitatively? Do we even know enough in order to be evaluating information architectures?
  • What actually is quality? Should we be considering other measurements, like success, utility, relevance?

Intended Audience

Information architects and any HCI professional interested in how to determine the value of a web site or information systems in general. Any CHI attendee who is applying their knowledge to large Internet and intranet sites should be interested. Also, the term "information architecture" is "hot" but has not yet been covered specifically at the CHI conference, so that term alone may interest attendees.

Justification

Information architecture is a practice rooted in research and professional discipline from many areas of HCI and a variety of other fields. Because of this heritage, IA is emerging as the bridge from research to professional implementation, not unlike the way HCI and the CHI conference have served as a bridge in the past.

That heritage, however, has bestowed on IA a confounding concern: measurement. Measurement in implementation, an unresolved issue in HCI research, has emerged as a controversial issue within IA. The root challenge from all corners is to justify your design decisions and your implementation.

Rather than questioning individual aspects of the interface, these concerns question the overall, integrated system. Perhaps it will come as a surprise (and be instructive) to some in HCI to learn that among IA practitioners and their clients, especially clients for web implementations, quantitative measurement is not universally held to be the talisman of quality.

Panel Format

First, one of the moderators will briefly define the term "information architecture". Then, to avoid long presentations, he will allow each of the panelists only five minutes to make his or her point. While necessarily more discomforting for panelists, this will result in an edgier and more to-the-point experience for the audience. After these brief presentations, the panel will be opened up for audience questions and challenges. We will also come prepared with additional relevant and challenging questions to pose to panelists in case audience participation is flat. Panelists will be encouraged to prepare answers to these questions in advance, although we may not cover them during the discussion.

Panelists

For Quantification

  • Marti Hearst, research and tools
  • Shiraz Cupala, business and user experience

Hearst and Cupala will argue for quantification (thus, against relying on qualitative measures of information architecture). Hearst will discuss research and tools to support quantification. Cupala will argue for business and user experience measures of information architecture.

Against Quantification

  • Nick Ragouzis, quantification follows innovation
  • Jesse James Garrett, analysis tools lack contextual knowledge

Ragouzis and Garrett will argue against quantification (and thus for qualitative methods). Ragouzis will debate on more theoretical grounds while Garrett will present his case as a practicing information architect.

Multifaceted

  • Gary Marchionini, both qualitative and quantitative

Marchionini will take the middle ground and argue that the best solution is the integration of both qualitative and quantitative means.

Moderation

Lou will moderate the panel by asking questions of the panelists, enforcing time limits on responses, and taking questions from the audience.

Lou Rosenfeld is the President of Argus Associates, Inc., a leading information architecture consulting firm. He is co-author of "Information Architecture for the World Wide Web". He organized the ASIS Summit on Information Architecture event and will be doing the same in 2001. He writes and presents often about information architecture.

Organizing

Keith will focus on organizing the panel by drafting questions for panelists to prepare for, compiling materials to support the panelists' positions, and facilitating "practice" discussions before the event.

Keith Instone is the Usability Specialist at Argus, where he works with many Fortune 500 companies to apply usability principles on information architecture projects. He is active within SIGCHI and with CHI conference organizing activities. He built and maintains Usable Web.

Position Statements

Marti Hearst

Assistant Professor, School of Information Management & Systems (SIMS), University of California, Berkeley

Given an estimated growth of 196 million new sites within the next five years, an estimated 90% of web sites providing inadequate usability, and a severe shortage of user interface professionals to ensure usable sites [Nielsen 99], tools and methodologies are needed to accelerate and improve the web site design process.

One of the underexploited opportunities in web site usability evaluation resides in the area of automation of evaluation [Ivory & Hearst 00]. We have surveyed the state of the art in automation of user interface usability evaluation. This revealed significant gaps in the automation design space.

Of the 58 usability methods surveyed, only 15% can be conducted without formal or informal user or evaluator testing. To address this gap, we are developing systems that aid in automating usability assessment.

Quantification is necessary for automation, although the converse is not necessarily the case. I believe most aspects of information architecture quality can be quantified, and that new tools are needed to augment existing ones in order to make this happen.

These tools do not yet exist; my research group is actively investigating how to develop them. It is imperative that such tools be verified with user studies and that their results be seen as complementing those of other methods. It is well-established that different usability measurement methods uncover different results [Molich et al. 98 & 99, Jeffries et al. 91]. We see automated tools as being used in tandem with other more traditional usability assessment techniques, such as formal and informal user studies.

How are we doing this? First, we need to define what we mean by information architecture. Web site design is a combination of information design, navigation design, and graphic design [Newman & Landay 00]. The information architecture of a web site is best described as the intersection of the information structure (or content) with the navigation structure.

Quantitative metrics can be computed over the intersection of graphic design and navigation design. Our earlier results using these types of metrics [Ivory, Sinha, & Hearst 00] found statistically significant differences between sites that have been judged by human experts as being well-designed versus those with no such ratings. Our results so far only compute at the page level; we are in the process of extending them to measure a site as a whole.

We are also now in the process of computing corresponding metrics for the information architectural aspects of web site design. We are making use of metrics and analysis methods from the fields of information retrieval and computational linguistics in these efforts. There is recent related work in this area, e.g. [Zhu & Gauch 00, Amento et al. 00]. We are also using existing models of scent [Furnas 97, Pirolli 97] and will develop new ones if needed.

  • Amento, Terveen, and Hill, Does "Authority" Mean Quality? Predicting Expert Quality Ratings of Web Documents, SIGIR 2000.
  • George W. Furnas, Effective View Navigation, Proceedings of ACM CHI 97 Conference on Human Factors in Computing Systems, 1997
  • M. Ivory, R. Sinha, and M. Hearst, Preliminary Findings on Using Quantitative Measures to Compare Favorably Ranked and Unranked Information-centric Web Pages, Proceedings of the 6th Conference on Human Factors and the Web, Austin, TX, June 19, 2000.
  • M. Ivory and M. Hearst, State of the Art in Automated Usability Evaluation of User Interfaces, UCB Computer Science Technical Report UCB//CSD-00-1105, 2000. (Both available at: http://www.cs.berkeley.edu/~ivory/publish/index.html)
  • Robin Jeffries, James R. Miller, Cathleen Wharton, and Kathy M. Uyeda, User Interface Evaluation in the Real World: A Comparison of Four Techniques, Proceedings of ACM CHI'91 Conference on Human Factors in Computing Systems, Practical Design Methods, 119--124, 1991
  • R. Molich and N. Bevan and S. Butler and I. Curson and E. Kindlund and J. Kirakowski and D. Miller, Comparative evaluation of usability tests, Proceedings of UPA 98, Washington, DC, 189--200, June, 1998
  • R. Molich and A. D. Thomsen and B. Karyukina and L. Schmidt and M. Ede and W. van Oel and M. Arcuri, Comparative evaluation of usability tests, Proceedings of ACM CHI'99 Conference on Human Factors in Computing Systems, Pittsburgh, PA, 83--86, May, 1999
  • Jakob Nielsen, User interface directions for the Web, Communications of the ACM, 42, 1, 65--72, Jan, 1999
  • Mark W. Newman and James A. Landay, Sitemaps, Storyboards, and Specifications: A Sketch of Web Site Design Practice, Proceedings of Designing Interactive Systems, DIS 2000, New York City, 2000.
  • Peter Pirolli, Computational Models of Information Scent-Following in a Very Large Browsable Text Collection, Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems, May, 1997
  • Zhu and Gauch, Incorporating Quality Metrics in Centralized/Distributed Information Retrieval on the WWW. SIGIR 2000.

Shiraz Cupala

Lead Program Manager, Microsoft Corporation

If work on Information Architecture can improve usability and experience, this improvement must necessarily be measurable. If not, how would we know it happened?

When you ask someone to improve an IA or to evaluate two IAs to see which is better, they do those things with an eye to some set of criteria that they define as important to that work. There are many different potential criteria for this. What we must do in attempting to quantify an IA is determine the relevant user experience and business requirements and then measure success against them. This measurement will cover traditional HCI measures (like time on task and successful completion), traditional perceptual measures (like satisfaction and usefulness), and measures new to user experience (such as customer loyalty, specific conversion metrics, or revenue).

I contend that all these measures can be quantified down to the level of the factors/features driving them. Further, they all must be measured to get the complete picture of the performance of an IA. Note that the business benefits are the hardest to measure and will require means of measurement that may be newer to the interaction design discipline.

Nick Ragouzis

Principal, Interfacility

Quantification seems an important goal. After all, the progress of science depends on quantification. Or does it?

An ethnographer observing the development of these two cultures, the HCI research community and the application-focused design community, might conclude that the overall system is driven merely by evolution dynamics: loosely-executed mimicry of popular practices and chance mutations arbitrated by the caprice and indifference (and the plasticity and tolerance) of their customers.

One familiar with HCI research could readily conclude that those in applied practices rarely know of even established research results (let alone more recent results). They rarely have a method for evaluating that research (especially in disambiguating popularized applied-domain "findings" from well-executed research) and integrating it into practice. They are equally slack in pursuing research in other domains and reconciling it with HCI research. They almost never credibly analyze their implementations.

Even more disconcerting is this. While HCI reinforces its hallowed stanchion, the human-as-computer operating in the task domain, design practices build ever-more-heavily over the weakest pier of HCI's foundation. That weakness was encapsulated by Card, Moran, and Newell, in The Psychology of Human-Computer Interaction (TPoHCI), in their then-unanswered questions regarding extensions into the realm of non-instruction-following tasks (Q16) and the role of an applied psychology (Q18). This capsule remains mostly buried, with those in the application domain unaware of their imminent peril.

However, with the rise of interaction design and information architecture, and the overt intention of delighting end users even while making their lives easier, the design community has continued its push into the experience domain. Over decades, without a credible basis for defining or measuring the whole of human experience, it has garnered an astounding quantity of successes.

One could conclude that success in this domain requires only the ability to innovate or to follow strategically, and the ability to deliver user-perceptible value.

Which is another view of science: that quantification merely follows, but that science (especially the social sciences) proceeds through innovation and serendipity in theory and application, and by the delivery of ultimate value.

Decided: Abandon quantification; and may the fittest win.

Jesse James Garrett

Information Architect, Metrius

Imagine yourself a visitor in an unfamiliar city. You have to drive your rental car from your hotel to an important meeting. The clerk in the hotel lobby advises you of two possible routes to your destination. The first route involves negotiating a series of unmarked one-way roads. The second is a straight shot down a major thoroughfare -- but takes 20 minutes longer. Which do you choose?

With information architecture, as with city streets, the shortest path is not always the best. Qualitative considerations for which automated analysis methods are poorly suited invariably come into play. A software tool might be able to count the clicks to a destination, but how can it evaluate the contextual information that steers users along the right path? It might be able to count the words in navigational labels, but how can it know if they are the right words?
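To make the click-counting example concrete, here is a minimal sketch of what such a tool might compute. It assumes a hypothetical link graph (`SITE_LINKS`) and hypothetical helper names (`clicks_to`, `label_word_count`); none of these come from an actual analysis tool.

```python
from collections import deque

# Hypothetical site structure (an assumption for illustration):
# each page maps to the list of pages it links to.
SITE_LINKS = {
    "home": ["products", "support", "about"],
    "products": ["widgets", "gadgets"],
    "support": ["faq", "contact"],
    "about": ["contact"],
    "widgets": ["widget-specs"],
    "gadgets": [],
    "faq": [],
    "contact": [],
    "widget-specs": [],
}


def clicks_to(links, start, target):
    """Return the minimum number of clicks from start to target.

    Uses a breadth-first search over the link graph and returns
    None when the target cannot be reached from the start page.
    """
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        for next_page in links.get(page, []):
            if next_page == target:
                return depth + 1
            if next_page not in seen:
                seen.add(next_page)
                queue.append((next_page, depth + 1))
    return None


def label_word_count(label):
    """Count the whitespace-separated words in a navigation label."""
    return len(label.split())


if __name__ == "__main__":
    print(clicks_to(SITE_LINKS, "home", "widget-specs"))
    print(label_word_count("Contact us"))
```

Both functions return numbers readily, yet neither says anything about whether the shortest path is the one users would actually find, or whether "Contact us" is the right label for the page behind it.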

In the four years I have been dealing with information architecture issues in Web development, I have encountered this fallacious thinking time and again: that problems arising in a technological context must therefore have a technological solution. In my experience, only the application of our own human intelligence can enable us to avoid casting users into a well-intentioned but ill-conceived array of "shortcuts" to their goals.

Gary Marchionini

Professor, School of Information and Library Science, University of North Carolina at Chapel Hill

My position is strongly influenced by viewing evaluation from a research perspective rather than a product testing perspective. Except in trivial cases where really bad web sites are compared to typical web sites, it seems uninteresting to say that one design is superior to another design without qualifying the context for such a comparison. For me, evaluation is a research process that aims to understand the meaning of some phenomenon situated in a context and the changes that take place as the phenomenon and the context interact. I do believe we can measure the usability and effectiveness of a design for very fine-grained characteristics such as number of clicks to task, mouse-travel distance from object to object, and average response time for an N-term query. We may even be able to measure satisfaction with layout, or attractiveness, or organizational clarity within user-task contexts, not to mention surrogates for usefulness such as popularity based on transaction log data, but I am skeptical that we can have a standardized overall measure of IA effectiveness. Consider the "effectiveness" of a building design. That the roof does not leak with the first rain is one measure (let alone whether it leaks during a 100-year rain), but whether it raises your spirit every time you walk past it is quite a different assessment. Clearly, there are some characteristics that go into judging design effectiveness that we can measure precisely, and others that we can neither measure precisely nor weigh in their contribution to the whole design.

My approach to the measurement of digital libraries and interfaces has thus been multifaceted and longitudinal (or at least iterative when time is short). I believe these same approaches apply to IA evaluation. Multifaceted approaches use several techniques and metrics to gain a fuller view of the design. The analogy is to a medical CAT scan: we examine many slices of data and aggregate an interpretation for the whole. The analogy is only suggestive, since in IA evaluation the slices are not replications at slight variations with the same instrument, but different measures on different criteria collected systematically and then interpreted to make sense of the whole. This triangulation across both quantitative data (e.g., results of transaction log analysis) and qualitative data (e.g., user self-reports via interview or questionnaire) can lead to plausible conclusions about design effectiveness. Applying the same techniques a year later can, however, yield different conclusions because user populations change: they learn and develop expectations and behavior patterns that may be out of step with technology and/or design practice. Thus, I advocate both multifaceted and longitudinal evaluation. In an extreme case, I have taken this approach to the Perseus Digital Library evaluation over a 12-year period (see first 3 papers available at the URLs below). Clearly, a longitudinal approach is not viable in a rapidly changing marketplace environment. We can, however, take a multifaceted approach as long as we specify tasks and context, use specific evaluation techniques (e.g., Hert et al. recently used index usage techniques to evaluate web sites for highly specified tasks), and do not over-generalize results. See the fourth URL below for a study that used transaction logs, content analysis of email requests, individual and focus group interviews, and document analysis.