Appears in A. Ram & K. Moorman (editors), Understanding Language
Understanding: Computational Models of Reading, MIT Press.
Towards a theory of reading and understanding
Ashwin Ram
Kenneth Moorman
College of Computing
Georgia Institute of Technology
Motivations
The human ability to understand and use language remains one of the unsolved
mysteries of modern science. Language is one of the crucial aspects of
human intelligence; in fact, some have argued that it is the central aspect
(e.g., Fodor, 1975; Johnson, 1987; Lakoff & Johnson, 1980; Whorf, 1956;
Wittgenstein, 1968). Although the human language processing system has been studied
extensively by researchers from a number of perspectives, including technical,
social, and psychological perspectives, it is still unclear how humans
process language and even what a scientific theory or explanation of this
ability might look like.
In this volume, we focus on one of the tasks that the human language
processing system is responsible for--reading. By reading we mean
the task that takes as its input a body of text in a natural language [1]
and produces as its output an understanding of that text. An obvious
question to be addressed is the nature of this understanding: what
it is, how it is represented, what it is used for and how, and how it might
be measured. Another important question is the nature of the task itself:
how it is carried out, what its constituent tasks are, and how we (as researchers)
might describe this task and how it works. Implicit in this approach is
the assumption that a theory of reading must account not only for what
reading produces as its result (an understanding of the given text) but
also for how exactly reading works such that it can produce that result
from the given text. In other words, we seek an explanatory theory or model
of the reading process and not simply a descriptive account.
Our goal is to address the problem of reading comprehension--processing
and understanding a natural language text, narrative or story. This constrains
our endeavor in two ways. First, an account of reading must explain how
the reader can understand text, that is, understand the situations described
in the text, explain who did what to whom, and how, and why, and construct
a coherent interpretation of the text that "makes sense." A theory which
focuses, for example, only on syntactic parsing of sentences is, by this
metric, not a theory of reading comprehension or text understanding, although
it might certainly be an important piece of a complete theory. The second
constraint is that an account of reading must explain how the reader can
understand "real" natural language texts—narratives, stories, newspaper
articles, dialogs, advertisements, and so on. This rules out models which
focus only on the processing of single sentences taken out of context or
of small researcher-constructed "stories." Although such models are certainly
important in that they provide crucial stepping stones towards the "big
picture" and may even be a piece of the complete theory of reading, they
do not by themselves constitute a satisfactory account of the human reading
capability. Methodologically, of course, researchers must often concentrate
on narrower subtasks of the reading process (such as syntactic parsing,
or explanation construction, or belief modeling) and/or on a narrower range
of textual inputs (such as individual sentences, or short newspaper articles,
or simple question-and-answer scenarios); the point is that the eventual
goal of the endeavor that has come to be known as natural language processing
(NLP) is to produce a theory of reading comprehension "in the large."
Assumptions
What might a theory of reading look like? We make two assumptions in this
volume. First, a scientific understanding of how agents read is best expressed
in terms of a functional-computational-representational model of
the reading process. [2]
By functional we mean that the process will be defined in terms of
its inputs and outputs and that it may be decomposed into one or more interactive
subtasks and further sub-subtasks which, in turn, will be defined in terms
of their inputs and outputs as well as their interactions with each other.
Once defined, the theory will also explain how exactly each subtask works
such that it can perform its function of transforming its inputs into its
outputs. By computational we mean that this transformation will
be described using an information-processing or computational model—an
explanatory, step-by-step account of how exactly the reading system (human
or machine) can derive the required outputs from the given inputs. By convention,
this account will be written down using the language of computer algorithms
and implemented using a computer program that can be executed to provide
evidence that the model does, in fact, do what is claimed of it. This requirement
forces the theory to be described precisely and provides a means for experimentation;
these and other benefits of the "computational psychology" or "cognitive
modeling" approach will be discussed below. Finally, by representational
we mean that the reading process is expected to make use of extensive background
knowledge in order to understand a text and produce as its output some
description of the information conveyed by the text; both the background
knowledge as well as the output description will be represented in some
manner inside the reading system. The form, content, and organization of
these representations is as much a research issue as is the process that
utilizes and produces them.
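The functional decomposition just described can be made concrete in code. The sketch below is purely illustrative, assuming a toy pipeline in which reading is decomposed into hypothetical subtasks (here called parse, infer, and integrate), each defined by its inputs and outputs; the function names, the background-knowledge dictionary, and the representations are all invented for the example and are not any particular model from this volume.

```python
# An illustrative functional-computational-representational sketch of reading:
# each subtask is a function defined by its inputs and outputs, and the
# resulting "understanding" is an explicit representation.

def parse(sentence):
    """Subtask 1: map a sentence onto a shallow meaning representation."""
    return {"text": sentence, "tokens": sentence.rstrip(".").split()}

def infer(meaning, background_knowledge):
    """Subtask 2: enrich the sentence meaning with background knowledge,
    triggered by cues present in the sentence."""
    inferences = [fact for cue, fact in background_knowledge.items()
                  if cue in meaning["tokens"]]
    return {**meaning, "inferences": inferences}

def integrate(understanding, enriched_meaning):
    """Subtask 3: fold the enriched sentence meaning into the running
    interpretation of the text as a whole."""
    return understanding + [enriched_meaning]

def read(text, background_knowledge):
    """The top-level task: text in, explicit understanding out."""
    understanding = []
    for sentence in text.split(". "):
        meaning = parse(sentence)
        enriched = infer(meaning, background_knowledge)
        understanding = integrate(understanding, enriched)
    return understanding

kb = {"Christmas": "gift-giving is expected", "pennies": "money was scarce"}
result = read("Della counted the pennies. The next day was Christmas.", kb)
```

Each subtask is individually inspectable and replaceable, which is exactly the property that makes a functional decomposition useful as a scientific description: the theory can say what each piece consumes, what it produces, and how the pieces interact.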
The second assumption underlying this volume is that inasmuch as a theory
of reading is concerned with accounting for the human ability to read,
it is important that the functions, processes, and representations postulated
by the theory, and the behaviors exhibited by the model, be cognitively
plausible and justified to the extent possible through psychological experimentation.
Where it is not possible to obtain detailed psychological data to verify
or refute fine-grained assumptions of a theory, these assumptions may be
justified in teleological terms (for example, computational, functional,
ecological, evolutionary, or philosophical arguments for why a subsystem
works the way it does) or at least via a sufficiency argument that demonstrates
that the proposed model is able to produce the behaviors that are being
accounted for (see, for example, Ram & Jones, 1995). This demonstration
is facilitated by the presence
of an executable computer model.
The modeling approach
Before we visit the reading task in more detail, let us discuss the computational
modeling approach that we will take to address this task. Many of the models
described in this volume will probably appear too limited to be actually
"reading" the texts which they are given, in the full sense of the word.
What purpose, then, do such models serve?
In the computational modeling approach, the model itself is not the end
of the research cycle; instead, the model is used as a tool by the researcher
in order to refine the overarching theory behind it. As Margaret Boden
expressed it (Boden, 1986):
...artificial intelligence is the use of programs as tools
in the study of intelligent processes, tools that help in the discovery
of the thinking-procedures and epistemological structures employed by intelligent
creatures.
As a tool, then, what power does the computational model give to the intelligence
researcher? Boden suggests a set of what she calls Lovelace questions
which explore the "usefulness" of computer modeling with respect to the
study of creativity (see Boden, 1991). These questions are easily adapted
to be applicable to the study
of reading as well.
First, can a computational model ever perform in a way such that it
appears to read and understand? The answer to this is "yes," as many of
the models depicted in this volume will show, albeit perhaps in a manner
or domain that is narrower than the full human reading capacity can handle.
However, this is an uninteresting question. After all, ELIZA (Weizenbaum, 1966)
appeared to comprehend quite a bit using nothing more
than simple pattern matching, substitution, and a human willingness to
believe. Of course, the ways in which we tend to measure the appearance
of cognitive ability are now more strict, but even then most of the models
here will at least appear to be performing some aspect of reading.
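ELIZA's appearance of comprehension rested on nothing deeper than surface pattern matching and substitution, a technique that can be sketched in a few lines. This is not Weizenbaum's original program; the patterns and responses below are invented for the illustration, and the point is precisely that no representation of meaning is involved.

```python
import re

# A minimal ELIZA-style responder: surface pattern matching plus template
# substitution, with no representation of meaning at all.
RULES = [
    (re.compile(r"\bI am (.*)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bI feel (.*)", re.IGNORECASE), "How long have you felt {0}?"),
    (re.compile(r"\bmy (\w+)", re.IGNORECASE), "Tell me more about your {0}."),
]

def respond(utterance):
    """Return the first matching template response, or a stock reply."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(*match.groups())
    return "Please go on."  # default when no pattern matches

print(respond("I am worried about money"))
# Why do you say you are worried about money?
```

A reader willing to believe supplies all the coherence; the program tracks no context, draws no inferences, and "understands" nothing, which is why the mere appearance of comprehension is such a weak criterion.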
If the first question is not that interesting, a reasonable followup
might be: Can a computational model ever really be able to read
and understand material? Unfortunately, it is not clear exactly how one
can distinguish "true" comprehension from the mere appearance of comprehension;
thus, this question is best left to computational philosophers.
If the appearance of comprehension is uninteresting and the reality
of comprehension is beyond the scope of this volume, where does that leave
us? The issue is not whether an implemented computer program can actually
read and understand text but whether building such programs is a reasonable
way to approach the problem of producing an explanatory theory of reading
and understanding. The third Lovelace question, therefore, is the one we
will concentrate on: Can computational models help us understand how human
reading is possible? We believe the answer to be "yes" for a number of
reasons:
-
The computational model can act as a sufficiency argument for the theory.
In other words, if the model is an accurate instantiation of the theory
and if the model appears to perform some aspect of reading, then the model
shows that the claims of the theory are sufficient for explaining that
aspect of reading.
-
The computational model requires the researcher to be precise in the specification
of the theory, not only in what the tasks are and what they do but also
in how exactly they work. It is often easy to "believe" an assumption
to be true; implementing that assumption reveals whether it actually
holds. As Hintzman states (Hintzman, 1991):
...an assertion can be so intuitively compelling that it is
accepted without close examination. In these cases, it may take a formal
model to convince researchers that the assertion is wrong, and even then
the belief may be hard to kill.
This is not meant to imply that researchers deliberately cling to mistaken
assumptions they hold dear; Hintzman goes on to point out that models are useful
in illuminating theories because researchers are subject to a number of
reasoning flaws, such as being unable to track a large number of variables
simultaneously or being biased toward accepting often-heard statements as true.
-
The computational model gives the researcher a solid basis on which to
perform empirical evaluation. Through rigorous experimentation with the
model, the researcher can evaluate the power of the theory in question. This
evaluation of the model can also lead to refinement of the theory. As Cohen
states (Cohen, 1995):
Studying [computer] systems is not very different from studying
moderately intelligent animals such as rats. One obliges the agent (rat
or program) to perform a task according to an experimental protocol, observing
and analyzing the macro- and micro-structure of its behavior. Afterward,
if the subject is a rat, its head is opened up or chopped off; and if it
is a program, its innards are fiddled with.
If the model does something unexpected, then the theory can be modified
and the model re-evaluated. The unexpected behavior might represent an
inadequacy in the theory or sometimes even an unusual success.
-
The researcher interested in psychological theory derives an additional
benefit: the behaviors produced by the model can be compared with psychological
data. This allows the cognitive basis or plausibility of the theory to
be evaluated. Often, experimentation with the model provides predictions
that can be evaluated through additional psychological experiments.
-
The model allows the researcher to test the assumptions of the theory—which
ones are warranted, which ones are ad hoc, which ones are simply
wrong. The model forces the researcher to critically examine why the theory
works at an empirical level.
-
Finally, the model can allow the researcher to generalize the theory. Cohen
uses coherent explanations as an example (Cohen, 1995). If one builds
a model which produces coherent explanations,
one can then examine that model to determine precisely what aspects of
it are responsible for its behavior. Once this is done, one is able to
manipulate the model in ways which can test the predictions of the underlying
theory and generalize the causal mechanisms involved. For example, one
might be able to reuse a portion of the model for a task other than reading
which requires the construction of coherent explanations.
Reading is a large, complicated, and ill-defined cognitive behavior, and
one that is extremely difficult to capture theoretically. However, for
the above reasons, computational modeling is a promising approach towards
this problem. Even if implemented models are still primitive with respect
to human performance, the endeavor of theorizing about, building, evaluating,
and revising these models can add significantly to our knowledge of the
human reading capacity.
The tasks of reading
A theory of reading, as we have defined it, must deal with a wide range
of issues and account for a wide range of behaviors and capabilities. Consider
the following example (Henry, 1986), which is the first paragraph of a
longer story:
One dollar and eighty-seven cents. That was all. And sixty
cents of it was in pennies. Pennies saved one and two at a time by bulldozing
the grocer and the vegetable man and the butcher until one's cheeks burned
with the silent imputation of parsimony that such close dealing implied.
Three times Della counted it. One dollar and eighty-seven cents. And the
next day would be Christmas.
Some of the pieces of this puzzle include:
-
Processing words and sentences: The starting point for reading is the input
of the words in a sentence, word-by-word, sentence-by-sentence. Before
anything can be understood about the story, for example, the English text
has to be processed at this low-level. Much research in natural language
processing is concerned with how word meanings are looked up (what does
parsimony mean?), how ambiguous words are disambiguated (which meaning
of close should be applied?), how the meanings of the words in a
sentence are combined into a meaning for the sentence as a whole, how anaphora
are resolved (in the second sentence, what does that refer to?),
what the role of various punctuation is, what the tense of the sentence
is, when and how a reader might go back and re-read some text, and so on.
This area of the field is often called sentence processing, though
in real-world texts there is also the need to deal with sentence fragments,
such as One dollar and eighty-seven cents.
-
Drawing inferences: Natural language texts leave much as an exercise to
the reader. One of the most important tasks the reader must carry out is
to determine hidden meanings and make explicit what was left implicit in
the text. In order to do this, the reader must draw on the context provided
by the text that has been read so far, by the external situation that the
reader is in, and by the overarching task that the reader is carrying out.
The reader must also draw on background knowledge about the world in general
and the reader's past experiences—for example, why is the amount of money
Della has in the story and the fact that the next day is Christmas important
pieces of related information? Much of the research in this area is concerned
with knowledge representation—how contextual and background knowledge
is encoded; with memory—how this knowledge is organized such that
it can be retrieved at the appropriate moment using the available cues
(many of you reading the example probably recognized it as Gifts of
the Magi and retrieved the gist of the remainder of the story); and
with abduction—how background knowledge and current context can
be brought together to enable the reader to draw plausible inferences from
the material in the text.
-
Dealing with novelty: It is almost a definitional characteristic of natural
languages that they possess a great deal of novelty by virtue of their
flexibility and constant redefinition through cultural and social agreement.
This novelty can range from the introduction of novel words or the metaphorical
reuse of words in new contexts to the description of unfamiliar or novel
concepts through the use of language. Consider the example story. Many
readers will be unfamiliar with the word imputation but will not
have difficulty arriving at a reasonable meaning for it, based on the context.
On the other hand, consider the use of the term bulldoze. Even readers
unfamiliar with the usage given in the story can arrive at a reasonable
interpretation based on what they know about literal bulldozing
and given the rest of the paragraph. Thus, reading research has also
been concerned with issues of learning, metaphor, analogy,
and creativity.
-
Controlling the process: People do not read in a vacuum; they read for
a purpose, be it entertainment, information seeking, or communication.
During the reading process, they are also concerned with other goals, activities,
and occurrences in the world around them which demand attention. It follows
that reading is an extremely flexible process; one can quickly skim a newspaper
article on the train while commuting to work in the morning, or read a
mystery novel and allocate much attention to details of the plot while
skipping over lengthy descriptions of the setting, or read in great detail
a carefully-constructed argument in an editorial that one has been asked
to write a response to for a term paper. For example, probably no one
read the example paragraph strictly word-by-word; the typical way to read
such a passage is to read almost every word while skimming the rest, a
tendency that would be even more evident with a longer piece.
There is less research into this aspect of reading, but some research has
been concerned with situated reading—how the reading task interacts
with, and is affected by, the larger context in which it is carried out;
focus of attention—how a reader pays different amounts of attention
to different aspects of the text, switching dynamically between skimming
and in-depth processing; and meta-reasoning—reasoning about the
reading process itself.
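One of the tasks listed above, retrieving a stored episode from memory using the cues available in the text (as when readers recognize the paragraph as Gifts of the Magi), can be sketched as a toy indexed memory. The index structure, the cue sets, and the stored gists below are invented for illustration; this is not any particular memory model from the volume.

```python
# A toy sketch of cue-based episodic memory retrieval: episodes are indexed
# by cue words, and the episode sharing the most cues with the current text
# is retrieved.
MEMORY = {
    "gift-of-the-magi": {"cues": {"pennies", "christmas", "della"},
                         "gist": "Della sells her hair to buy Jim a watch chain."},
    "christmas-carol":  {"cues": {"christmas", "scrooge", "ghost"},
                         "gist": "Scrooge is reformed by three spirits."},
}

def retrieve(text):
    """Return the gist of the best-matching episode, or None if no cue matches."""
    cues = set(text.lower().replace(".", "").split())
    best, best_score = None, 0
    for episode in MEMORY.values():
        score = len(episode["cues"] & cues)  # count shared cues
        if score > best_score:
            best, best_score = episode, score
    return best["gist"] if best else None

gist = retrieve("Three times Della counted it. And the next day would be Christmas.")
```

Even this crude overlap count illustrates the research questions involved: which features of a text should serve as indices, how competing memories are ranked, and how the retrieved gist then feeds back into inference about the text being read.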
The chapters in this volume span this range of tasks that reading research
has been concerned with. We begin with Rapaport and Shapiro's discussion
(Chapter 2) of cognitive models of reading, and the relationship between
cognition and fiction. They explore the epistemological questions of how
a cognitive agent could represent fictional entities and their properties,
and reason about such entities, and their relationship with non-fictional
entities, during the course of reading a story. Following this, Mahesh,
Eiselt, and Holbrook (Chapter 3) discuss psycholinguistic issues in sentence
processing, focusing in particular on how multiple types of information,
such as syntactic and semantic information, can be integrated while understanding
a sentence. They present a computational model that can resolve ambiguous
interpretations of a sentence and recover from conclusions that turn
out to be erroneous. Next, Domeshek, Jones, and Ram (Chapter 4) discuss
issues of form, content, and organization in knowledge representation.
They discuss how a reader can represent the meaning of a text as well as
the inferential knowledge that is required to understand the text. Wharton
and Lange (Chapter 5) discuss how a reader's episodic memory might be organized
and deployed to provide support for the reader's inferential processes.
They argue that the process by which some text is understood should be
integrated with the process by which it is used to recall relevant information
from memory, and present a computational model of the combined process.
Langston, Trabasso, and Magliano (Chapter 6) further the discussion of
inference, presenting a model of text comprehension along with psychological
data supporting their model. They explore the differences between on-line
processing during text comprehension and off-line processing after the
text has been read.
Following these chapters, we turn our attention to issues of contextualization
of the reading processes in the structure of the text as well as the overarching
tasks that the reader is engaged in. Meyer (Chapter 7) discusses how the
reader can use the structure of the text to support the comprehension of
that text. Different genres of text are read in different ways because
the individual characteristics of the readers interact with the individual
characteristics of the texts and of the authors of those texts. Ram (Chapter
8) discusses the influence of the reader's learning goals on the manner
and depth to which the text is processed. He presents a model of reading
as an active process in which the reader subjectively processes the text
while seeking information, creating hypotheses, asking questions, and pursuing
interesting ideas.
We then move on to discuss issues of learning and creativity. Peterson
and Billman (Chapter 9) present a model that explains how a reader handles
linguistic novelty. They present a computational model that can read and
interpret sentences containing novel verbs using underlying semantic information
about the language. Moorman and Ram (Chapter 10) discuss a model of creative
understanding which enables a reader to comprehend texts that contain novel
concepts. They show how a reader can creatively understand novel concepts
in a science fiction story using analogical reasoning and problem reformulation
supported by a principled representation of knowledge. Cox and Ram (Chapter
11) discuss parallels between reading and learning, arguing that there
are many similarities between these two tasks: identification of interesting
input, elaboration of input concepts, determination of the agent's goals,
and determination and execution of the strategies to be used to process
the input in pursuit of those goals.
While this volume is primarily concerned with functional-computational-representational
models of reading, be they symbolic or distributed (e.g., connectionist)
models, Riloff (Chapter 12) presents a number of alternative recent approaches
which, while they share much with the previous models, deviate from many
of the assumptions underlying these models. She argues that information
extraction approaches, concerned with identifying and extracting specific
types of information from text rather than in-depth knowledge-intensive
analysis of text, can provide significant leverage in story understanding.
Gerrig (Chapter 13) discusses what human reading is really like, and
provides several directions which future research on reading will need
to pursue. He describes the reader's experience of being transported into
the narrative world of a text and mentally participating in that narrative
world during the reading process. Finally, Fletcher (Chapter 14) concludes
with his perspective on the endeavor of building computational models of
reading, such as those presented in this volume, arguing that it is productive
to invest resources and intellectual energy in this enterprise.
References
-
Boden, 1991
-
M.A. Boden. The Creative Mind: Myths and Mechanisms. Basic Books,
Inc., New York, 1991.
-
Boden, 1986
-
M.A. Boden. Artificial Intelligence and Natural Man. Basic Books,
Inc., New York, second edition, 1986.
-
Cohen, 1995
-
P.R. Cohen. Empirical Methods for Artificial Intelligence. MIT
Press, Cambridge, MA, 1995.
-
Fodor, 1975
-
J.A. Fodor. The Language of Thought. Thomas Y. Crowell, New York,
1975.
-
Henry, 1986
-
O. Henry. Gifts of the Magi. In Paul J. Horowitz, editor, Collected
Stories of O. Henry. Avenel Books, New York, 1986.
-
Hintzman, 1991
-
D.L. Hintzman. Why are formal models useful in psychology? In William E.
Hockley and Stephen Lewandowsky, editors, Relating Theory and Data:
Essays on Human Memory in Honor of Bennet B. Murdock. Lawrence Erlbaum
Associates, Publishers, Hillsdale, NJ, 1991.
-
Johnson, 1987
-
M. Johnson. The Body in the Mind: The Bodily Basis of Meaning, Imagination,
and Reason. University of Chicago Press, Chicago, 1987.
-
Lakoff & Johnson, 1980
-
G. Lakoff and M. Johnson. Metaphors We Live By. University of Chicago
Press, Chicago, IL, 1980.
-
Ram & Jones, 1995
-
A. Ram and E. Jones. Foundations of foundations of artificial intelligence.
Philosophical Psychology, 8(2):193-199, 1995.
-
Weizenbaum, 1966
-
J. Weizenbaum. ELIZA—A computer program for the study of natural language
communication between man and machine. Communications of the ACM,
9:36-45, 1966.
-
Whorf, 1956
-
B. L. Whorf. Science and linguistics. In J. B. Carroll, editor, Language,
Thought, and Reality. MIT Press, Cambridge, MA, 1956.
-
Wittgenstein, 1968
-
L. Wittgenstein. Philosophical Investigations. Macmillan, New York,
1968. Translated by G. E. M. Anscombe.
Footnotes
- [1]
...text in a natural language
-
A natural language is a language that has evolved through use in
a social system (for example, English, Spanish, French, or Hindi) as opposed
to one that has been designed by people for a specific purpose (for example,
Fortran or Java). Languages which are engineered but evolve through social
action (for instance, Esperanto, American Sign Language, and Klingon) are
also examples of natural languages.
- [2]
...functional-computational-representational model of the reading process
-
This does not imply that all research into reading or natural language
processing must necessarily involve computational modeling; on the contrary,
a range of psychological, social, and computational research is needed
to work towards the common goal of producing a detailed functional-computational-representational
model of reading.