This is the proposal that we submitted to the CHI 2001 Panel reviewers
in September 2000. Feedback on this proposal led to the Measuring IA panel at CHI 2001.
Panel Proposal for Measuring Information Architecture Quality: Prove It (or
Not)!
Louis Rosenfeld
Argus Associates, Inc.
Keith Instone
Argus Associates, Inc.
Abstract
This panel debates a topic that has been popping up recently as a
consequence of different disciplines rubbing up against each other in a new
field: can the quality of an information architecture be measured
quantitatively? And if so, how can this analysis be verified?
Information architects and HCI professionals already are discussing this
issue regularly and at times heatedly. The need for guidance is especially
pressing because information architecture is an emerging field. As in
other areas of HCI, information architects are regularly confronted by
clients and employers alike with the need to justify the cost of their
efforts in quantitative ways.
Information architects come from a variety of disciplines including HCI,
library and information science, visual design, technical communications,
and computer science. These fields have widely varying opinions on the
validity of and techniques for quantifying information system performance.
While some dispute the validity of quantification, others tend to believe
that it is not only possible but the only valid means for assessing
information architecture. Members of both camps may resort to traditional
means of assessing information systems performance, while others feel that
the new medium of the Web requires new tools, techniques, and approaches for
such assessment.
Keywords
Information architecture, metrics, evaluation, information systems
performance, automated evaluation, qualitative methods.
Issues to be Covered
Panelists will be asked to state their position on these questions for five
minutes apiece, followed by open debate:
- Can an information architecture be measured as a whole, or can only its
components be measured?
- Prove your case (that it can or cannot be measured).
Follow-up questions could include:
- How does evaluating the information architecture differ from evaluating
the user interface?
- Are different types of information systems (e.g., entertainment, news,
reference) more measurable than others?
- Are different architectural components (e.g., table of contents, site
hierarchy, search system, contextual and global navigation) more measurable
than others?
- What levels of evaluation (such as qualitative, "soft" quantitative and
"hard" quantitative) are appropriate at which times?
- What are the strengths and weaknesses of these qualitative and
quantitative approaches?
- What are specific examples (case studies) from large sites where
measurement attempts have been made?
Controversial Aspects
- How do you measure things in this "brave new web world"? Do the standard
usability methods apply? Do marketing metrics apply? Do information
retrieval metrics apply?
- Should we even be measuring quantitatively? Do we even know enough in
order to be evaluating information architectures?
- What actually is quality? Should we be considering other measurements,
like success, utility, relevance?
Intended Audience
Information architects and any HCI professional interested in how to
determine the value of a web site or information systems in general. Any
CHI attendee who is applying their knowledge to large Internet and intranet
sites should be interested. Also, the term "information architecture" is
"hot" but has not yet been covered specifically at the CHI conference, so
that term alone may interest attendees.
Justification
Information architecture is a practice rooted in research and professional
discipline from many areas of HCI and a variety of other fields. Because of
this heritage, IA is emerging as the bridge from research to professional
implementation, not unlike the way HCI and the CHI conference have served as
a bridge in the past.
That heritage, however, has bestowed on IA a confounding concern:
measurement. Measurement in implementation, an unresolved issue in HCI
research, has emerged as a controversial issue within IA. The root challenge
from all corners is to justify your design decisions and your
implementation.
Rather than questioning individual aspects of the interface, these concerns
question the overall, integrated system. Perhaps it will come as a surprise
(and be instructive) to some in HCI to learn that among IA practitioners and
their clients, especially clients for web implementations, quantitative
measurement is not universally held to be the talisman of quality.
Panel Format
First, the moderator will briefly define the term "information
architecture". Then, to avoid long presentations, he will allow each of
the panelists only five minutes to make his/her point. While necessarily
more discomforting for panelists, this will result in an edgy and more
to-the-point experience for the audience. After these brief presentations,
the panel will be opened up for audience questions and challenges. We will
also come prepared with additional relevant and challenging questions to
pose to panelists in case audience participation is flat. Panelists will be
encouraged to prepare answers to these questions in advance, although we may
not cover them during the discussion.
Panelists
For Quantification
- Marti Hearst, research and tools
- Shiraz Cupala, business and user experience
Hearst and Cupala will argue for quantification (thus, against relying on
qualitative measures of information architecture). Hearst will discuss
research and tools to support quantification. Cupala will argue for business
and user experience measures of information architecture.
Against Quantification
- Nick Ragouzis, quantification follows innovation
- Jesse James Garrett, analysis tools lack contextual knowledge
Ragouzis and Garrett will argue against quantification (and thus for
qualitative methods). Ragouzis will debate on more theoretical grounds while
Garrett will present his case as a practicing information architect.
Multifaceted
- Gary Marchionini, both qualitative and quantitative
Marchionini will take the middle ground and argue that the best solution is
the integration of both qualitative and quantitative means.
Moderation
Lou will moderate the panel by asking questions of the panelists, enforcing
time limits on responses and taking questions from the audience.
Lou Rosenfeld is the President of Argus Associates, Inc., a leading
information architecture consulting firm. He is co-author of "Information
Architecture for the World Wide Web". He organized the ASIS Summit on
Information Architecture event and will be doing the same in 2001. He writes
and presents often about information architecture.
Organizing
Keith will focus on organizing the panel by drafting questions for
panelists to prepare for, compiling materials to support the panelists'
positions, and facilitating "practice" discussions before the event.
Keith Instone is the Usability Specialist at Argus, where he works with many
Fortune 500 companies to apply usability principles on information
architecture projects. He is active within SIGCHI and with CHI conference
organizing activities. He built and maintains Usable Web.
Position Statements
Marti Hearst
Assistant Professor, School of Information Management & Systems (SIMS),
University of California, Berkeley
Given an estimated growth of 196 million new sites within the next five
years, an estimated 90% of web sites providing inadequate usability, and a
severe shortage of user interface professionals to ensure usable sites
[Nielsen 99], tools and methodologies are needed to accelerate and improve
the web site design process.
One of the underexploited opportunities in web site usability evaluation
resides in the area of automation of evaluation [Ivory & Hearst 00]. We
have surveyed the state of the art in automation of user interface usability
evaluation. This revealed significant gaps in the automation design space.
Of the 58 usability methods surveyed, only 15% can be conducted without
formal or informal user or evaluator testing. To address this gap, we are
developing systems that aid in automating usability assessment.
Quantification is necessary for automation, although the converse is not
necessarily the case. I believe most aspects of information architecture
quality can be quantified, and that new tools are needed to augment existing
ones in order to make this happen.
These tools do not yet exist; my research group is actively investigating
how to develop them. It is imperative that such tools be verified with user
studies and that their results be seen as complementing those of other
methods. It is well-established that different usability measurement
methods uncover different results [Molich et al. 98 & 99, Jeffries et al.
91]. We see automated tools as being used in tandem with other more
traditional usability assessment techniques, such as formal and informal
user studies.
How are we doing this? First, we need to define what we mean by information
architecture. Web site design is a combination of information design,
navigation design, and graphic design [Newman & Landay 00]. The information
architecture of a web site is best described as the intersection of the
information structure (or content) with the navigation structure.
Quantitative metrics can be computed over the intersection of graphic design
and navigation design. Our earlier results using these types of metrics
[Ivory, Sinha, & Hearst 00] found statistically significant differences
between sites that have been judged by human experts as being well-designed
versus those with no such ratings. Our results so far are computed only at
the page level; we are in the process of extending them to measure a site
as a whole.
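As a rough illustration of what a page-level quantitative measure can look
like, the sketch below counts words, links, and images in a single HTML page
using Python's standard html.parser module. The three counts and the names
PageMetrics and page_metrics are assumptions chosen for this example; they are
not the measures or tools used in the study cited above.

```python
from html.parser import HTMLParser


class PageMetrics(HTMLParser):
    """Collects a few simple page-level counts (hypothetical example metrics)."""

    def __init__(self):
        super().__init__()
        self.word_count = 0
        self.link_count = 0
        self.image_count = 0
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a" and any(name == "href" for name, _ in attrs):
            self.link_count += 1
        elif tag == "img":
            self.image_count += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Count only text that is not script or stylesheet source.
        if self._skip_depth == 0:
            self.word_count += len(data.split())


def page_metrics(html):
    """Return word, link, and image counts for one HTML page given as a string."""
    parser = PageMetrics()
    parser.feed(html)
    parser.close()
    return {
        "words": parser.word_count,
        "links": parser.link_count,
        "images": parser.image_count,
    }
```

Counts like these describe a single page only; comparing them across pages or
sites would still need the kind of validation against human judgments that
the text calls for.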
We are also now in the process of computing corresponding metrics for the
information architectural aspects of web site design. We are making use of
metrics and analysis methods from the fields of information retrieval and
computational linguistics in these efforts. There is recent related work in
this area, e.g. [Zhu & Gauch 00, Amento et al. 00]. We are also using
existing models of scent [Furnas 97, Pirolli 97] and will develop new ones
if needed.
- Amento, Terveen, and Hill. Does Authority Mean Quality? Predicting Expert
Quality Ratings of Web Documents. SIGIR 2000.
- George W. Furnas, Effective View Navigation, Proceedings of ACM CHI 97
Conference on Human Factors in Computing Systems, 1997
- M. Ivory, R. Sinha, and M. Hearst, Preliminary Findings on Using Quantitative
Measures to Compare Favorably Ranked and Unranked Information-centric Web
Pages, Proceedings of the 6th Conference on Human Factors and the Web,
Austin, TX, June 19, 2000.
- M. Ivory and M. Hearst, State of the Art in Automated Usability Evaluation
of User Interfaces, UCB Computer Science Technical Report UCB//CSD-00-1105,
2000.
(Both available at:
http://www.cs.berkeley.edu/~ivory/publish/index.html)
- Robin Jeffries and James R. Miller and Cathleen Wharton and Kathy M. Uyeda,
User Interface Evaluation in the Real World: A Comparison of Four
Techniques, Proceedings of ACM CHI'91 Conference on Human Factors in
Computing Systems, Practical Design Methods, 119--124, 1991
- R. Molich and N. Bevan and S. Butler and I. Curson and E. Kindlund and J.
Kirakowski and D. Miller, Comparative evaluation of usability tests,
Proceedings of UPA 98, Washington, DC, 189--200, June, 1998
- R. Molich and A. D. Thomsen and B. Karyukina and L. Schmidt and M. Ede and
W. van Oel and M. Arcuri, Comparative evaluation of usability tests,
Proceedings of ACM CHI'99 Conference on Human Factors in Computing Systems,
Pittsburgh, PA, 83--86, May, 1999
- Jakob Nielsen, User interface directions for the Web, Communications of the
ACM, 42, 1, 65--72, Jan, 1999
- Mark W. Newman and James A. Landay. Sitemaps, Storyboards, and
Specifications: A Sketch of Web Site Design Practice. In the Proceedings of
Designing Interactive Systems, DIS 2000, New York City, 2000.
- Peter Pirolli, Computational Models of Information Scent-Following in a Very
Large Browsable Text Collection, Proceedings of the ACM SIGCHI Conference on
Human Factors in Computing Systems, May, 1997
- Zhu and Gauch, Incorporating Quality Metrics in Centralized/Distributed
Information Retrieval on the WWW. SIGIR 2000.
Shiraz Cupala
Lead Program Manager, Microsoft Corporation
If work on Information Architecture can improve usability and experience,
this improvement necessarily must be measurable. If not, how would we know it
happened?
When you ask someone to improve an IA, or to evaluate two IAs to see which is
better, they do so with an eye to some set of criteria that they define as
important to that work. There are many potential criteria. What we must do
in attempting to quantify an IA is determine which user experience and
business requirements are relevant, and then measure success against them.
This measurement will cover traditional HCI measures (like time on task and
successful completion), traditional perceptual measures (like satisfaction
and usefulness), and measures new to user experience (such as customer
loyalty, specific conversion metrics, or revenue).
I contend that all these measures can be quantified down to the level of the
factors and features driving them. Further, they must all be measured to get
the complete picture of the performance of an IA. Note that the business
benefits are the hardest to measure and will require means of measurement
that may be newer to the interaction design discipline.
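To make the kinds of measures named above concrete, here is a minimal sketch
that summarizes task completion, time on task, and conversion from a list of
test sessions. The record fields ("completed", "seconds", "converted") and the
function name summarize_sessions are assumptions made for illustration; real
studies would define their own criteria and data format.

```python
from statistics import mean


def summarize_sessions(sessions):
    """Summarize hypothetical session records.

    Each record is a dict with:
      "completed": bool, whether the user finished the task
      "seconds":   float, time spent on the task
      "converted": bool, whether the session ended in the business goal
    """
    if not sessions:
        raise ValueError("at least one session is required")

    completed = [s for s in sessions if s["completed"]]
    converted = [s for s in sessions if s["converted"]]

    return {
        # Share of sessions in which the task was finished.
        "completion_rate": len(completed) / len(sessions),
        # Mean time on task, computed over completed sessions only.
        "mean_seconds_completed": (
            mean(s["seconds"] for s in completed) if completed else None
        ),
        # Share of sessions that reached the business goal.
        "conversion_rate": len(converted) / len(sessions),
    }
```

Perceptual measures such as satisfaction would come from questionnaires rather
than logs and are not covered by this sketch.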
Nick Ragouzis
Principal, Interfacility
Quantification seems an important goal. After all, the progress of science
depends on quantification. Or does it?
An ethnographer observing the development of these two cultures, the HCI
research community and the application-focused design community, might
conclude that the overall system is driven merely by evolution dynamics:
loosely-executed mimicry of popular practices and chance mutations
arbitrated by the caprice and indifference (and the plasticity and
tolerance) of their customers.
One familiar with HCI research could readily conclude that those in applied
practices rarely know of even established research results (let alone more
recent results). They rarely have a method for evaluating that research
(especially in disambiguating popularized applied-domain "findings" from
well-executed research) and integrating it into practice. They are equally
slack in pursuing research in other domains and reconciling it with HCI
research. They almost never credibly analyze their implementations.
Even more disconcerting is this. While HCI reinforces its hallowed
stanchion, the human-as-computer operating in the task domain, design
practices build ever-more-heavily over the weakest pier of HCI's foundation.
That weakness was encapsulated by Card, Moran, and Newell, in The Psychology
of Human-Computer Interaction (TPoHCI), in their then-unanswered questions
regarding extensions into the realm of non-instruction-following tasks (Q16)
and the role of an applied psychology (Q18). This capsule remains mostly
buried, with those in the application domain unaware of their imminent peril.
However, with the rise of interaction design and information architecture,
and the overt intention of delighting end users even while making their
lives easier, the design community has continued its push into the
experience domain. Over decades, without a credible basis for defining or
measuring the whole of human experience, they have garnered an astounding
quantity of successes.
One could conclude that success in this domain requires only the ability to
innovate or to follow strategically, and the ability to deliver
user-perceptible value.
Which is another view of science: that quantification merely follows, but
that science (especially the social sciences) proceeds through innovation
and serendipity in theory and application, and by the delivery of ultimate
value.
Decided: Abandon quantification; and may the fittest win.
Jesse James Garrett
Information Architect, Metrius
Imagine yourself a visitor in an unfamiliar city. You have to drive your
rental car from your hotel to an important meeting. The clerk in the hotel
lobby advises you of two possible routes to your destination. The first
route involves negotiating a series of unmarked one-way roads. The second is
a straight shot down a major thoroughfare -- but takes 20 minutes longer.
Which do you choose?
With information architecture, as with city streets, the shortest path is
not always the best. Qualitative considerations for which automated
analysis methods are poorly suited invariably come into play. A software
tool might be able to count the clicks to a destination, but how can it
evaluate the contextual information that steers users along the right path?
It might be able to count the words in navigational labels, but how can it
know if they are the right words?
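For readers unfamiliar with what "counting the clicks" involves, the sketch
below computes the minimum number of clicks from a start page to every
reachable page with a breadth-first search. The link graph, page names, and
the function name click_depths are hypothetical. Note that the result says
nothing about whether the labels along the way are the right words, which is
exactly the limitation described above.

```python
from collections import deque


def click_depths(links, start):
    """Return the minimum click count from `start` to each reachable page.

    `links` maps a page name to the list of pages it links to
    (a hypothetical, hand-built site graph).
    """
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in depths:  # first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical example site.
site = {
    "home": ["products", "about"],
    "products": ["widget"],
    "about": ["contact", "widget"],
    "widget": [],
}
depths = click_depths(site, "home")
```

Pages that cannot be reached from the start page are simply absent from the
result.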
In the four years I have been dealing with information architecture issues
in Web development, I have encountered this fallacious thinking time and
again: that problems arising in a technological context must therefore have
a technological solution. In my experience, only the application of our own
human intelligence can enable us to avoid casting users into a
well-intentioned but ill-conceived array of "shortcuts" to their goals.
Gary Marchionini
Professor, School of Information and Library Science, University of North
Carolina at Chapel Hill
My position is strongly influenced by viewing evaluation from a research
perspective rather than a product testing perspective. Except in trivial
cases where really bad web sites are compared to typical web sites, it seems
uninteresting to say that one design is superior to another design without
qualifying the context for such a comparison. For me, evaluation is a
research process that aims to understand the meaning of some phenomenon
situated in a context and the changes that take place as the phenomenon and
the context interact. I do believe we can measure the usability and
effectiveness of a design for very fine-grained characteristics such as
number of clicks to task, mouse-travel distance from object to object, and
average response time for an N-term query. We may even be able to measure
satisfaction with layout, or attractiveness, or organizational clarity
within user-task contexts, not to mention surrogates for usefulness such as
popularity based on transaction log data, but I am skeptical that we can
have a standardized overall measure of IA effectiveness. Consider the
"effectiveness" of a building design. That the roof does not leak with the
first rain is one measure (let alone if it leaks during a 100-year rain),
but whether it raises your spirit every time you walk past it is quite a
different assessment. Clearly, there are some things that go into judging
design effectiveness that we can measure precisely, and other characteristics
that we can neither measure precisely nor weigh for their contribution to the
whole design.
My approach to the measurement of digital libraries and interfaces has thus
been multifaceted and longitudinal (or at least iterative when time is
short). I believe these same approaches apply to IA evaluation.
Multifaceted approaches use several techniques and metrics to gain a fuller
view of the design. The analogy is to a medical CAT scan---we examine many
slices of data and aggregate an interpretation for the whole. The analogy
is only suggestive, since in IA evaluation the slices are not replications
at slight variations with the same instrument, but different measures on
different criteria collected systematically and then interpreted to make
sense of the whole. This triangulation across quantitative (e.g., results
of transaction log analysis) and qualitative (e.g., user self-reports via
interview or questionnaire) measures can lead to plausible conclusions about
design effectiveness. Applying the same techniques a year later can, however,
yield different conclusions because user populations change: they learn and
develop expectations and behavior patterns that may be out of step with
technology and/or design practice. Thus, I advocate both
multifaceted and longitudinal evaluation. In an extreme case, I have taken
this approach to the Perseus Digital Library evaluation over a 12-year
period (see first 3 papers available at the URLs below). Clearly, a
longitudinal approach is not viable in a rapidly changing marketplace
environment. We can take a multifaceted approach as long as we specify
tasks and context and use specific evaluation techniques (e.g., Hert et al.
recently used index usage techniques to evaluate web sites for highly
specified tasks) and do not over-generalize results. See the fourth URL
below for a study that used transaction logs, content analysis of email
requests, individual and focus group interviews, and document analysis.