Appears in A. Ram & K. Moorman (editors), Understanding Language
Understanding: Computational Models of Reading, MIT Press.
Towards a theory of reading and understanding
Ashwin Ram
Kenneth Moorman
College of Computing
Georgia Institute of Technology
Motivations
The human ability to understand and use language remains one of the unsolved
mysteries of modern science. Language is one of the crucial aspects of
human intelligence; in fact, some have argued that it is the central aspect
(e.g., Fodor, 1975; Johnson, 1987; Lakoff & Johnson, 1980; Whorf, 1956;
Wittgenstein, 1968). Although the human language processing system has been studied
extensively by researchers from a number of perspectives, including technical,
social, and psychological perspectives, it is still unclear how humans
process language and even what a scientific theory or explanation of this
ability might look like.
In this volume, we focus on one of the tasks that the human language
processing system is responsible for--reading. By reading we mean
the task that takes as its input a body of text in a natural language [1]
and produces as its output an understanding of that text. An obvious
question to be addressed is the nature of this understanding: what
it is, how it is represented, what it is used for and how, and how it might
be measured. Another important question is the nature of the task itself:
how it is carried out, what its constituent tasks are, and how we (as researchers)
might describe this task and how it works. Implicit in this approach is
the assumption that a theory of reading must account not only for what
reading produces as its result (an understanding of the given text) but
also for how exactly reading works such that it can produce that result
from the given text. In other words, we seek an explanatory theory or model
of the reading process and not simply a descriptive account.
Our goal is to address the problem of reading comprehension--processing
and understanding a natural language text, narrative or story. This constrains
our endeavor in two ways. First, an account of reading must explain how
the reader can understand text, that is, understand the situations described
in the text, explain who did what to whom, and how, and why, and construct
a coherent interpretation of the text that "makes sense." A theory which
focuses, for example, only on syntactic parsing of sentences is, by this
metric, not a theory of reading comprehension or text understanding, although
it might certainly be an important piece of a complete theory. The second
constraint is that an account of reading must explain how the reader can
understand "real" natural language texts—narratives, stories, newspaper
articles, dialogs, advertisements, and so on. This rules out models which
focus only on the processing of single sentences taken out of context or
of small researcher-constructed "stories." Although such models are certainly
important in that they provide crucial stepping stones towards the "big
picture" and may even be a piece of the complete theory of reading, they
do not by themselves constitute a satisfactory account of the human reading
capability. Methodologically, of course, researchers must often concentrate
on narrower subtasks of the reading process (such as syntactic parsing,
or explanation construction, or belief modeling) and/or on a narrower range
of textual inputs (such as individual sentences, or short newspaper articles,
or simple question-and-answer scenarios); the point is that the eventual
goal of the endeavor that has come to be known as natural language processing
(NLP) is to produce a theory of reading comprehension "in the large."
Assumptions
What might a theory of reading look like? We make two assumptions in this
volume. First, a scientific understanding of how agents read is best expressed
in terms of a functional-computational-representational model of
the reading process. [2]
By functional we mean that the process will be defined in terms of
its inputs and outputs and that it may be decomposed into one or more interactive
subtasks and further sub-subtasks which, in turn, will be defined in terms
of their inputs and outputs as well as their interactions with each other.
Once defined, the theory will also explain how exactly each subtask works
such that it can perform its function of transforming its inputs into its
outputs. By computational we mean that this transformation will
be described using an information-processing or computational model—an
explanatory, step-by-step account of how exactly the reading system (human
or machine) can derive the required outputs from the given inputs. By convention,
this account will be written down using the language of computer algorithms
and implemented using a computer program that can be executed to provide
evidence that the model does, in fact, do what is claimed of it. This requirement
forces the theory to be described precisely and provides a means for experimentation;
these and other benefits of the "computational psychology" or "cognitive
modeling" approach will be discussed below. Finally, by representational
we mean that the reading process is expected to make use of extensive background
knowledge in order to understand a text and produce as its output some
description of the information conveyed by the text; both the background
knowledge as well as the output description will be represented in some
manner inside the reading system. The form, content, and organization of
these representations is as much a research issue as is the process that
utilizes and produces them.
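The functional decomposition just described can be made concrete in code. The sketch below is purely illustrative, assuming a toy pipeline in which reading is decomposed into hypothetical subtasks (here called parse, infer, and integrate), each defined by its inputs and outputs; the function names, the background-knowledge dictionary, and the representations are all invented for the example and are not any particular model from this volume.

```python
# An illustrative functional-computational-representational sketch of reading:
# each subtask is a function defined by its inputs and outputs, and the
# resulting "understanding" is an explicit representation.

def parse(sentence):
    """Subtask 1: map a sentence onto a shallow meaning representation."""
    return {"text": sentence, "tokens": sentence.rstrip(".").split()}

def infer(meaning, background_knowledge):
    """Subtask 2: enrich the sentence meaning with background knowledge,
    triggered by cues present in the sentence."""
    inferences = [fact for cue, fact in background_knowledge.items()
                  if cue in meaning["tokens"]]
    return {**meaning, "inferences": inferences}

def integrate(understanding, enriched_meaning):
    """Subtask 3: fold the enriched sentence meaning into the running
    interpretation of the text as a whole."""
    return understanding + [enriched_meaning]

def read(text, background_knowledge):
    """The top-level task: text in, explicit understanding out."""
    understanding = []
    for sentence in text.split(". "):
        meaning = parse(sentence)
        enriched = infer(meaning, background_knowledge)
        understanding = integrate(understanding, enriched)
    return understanding

kb = {"Christmas": "gift-giving is expected", "pennies": "money was scarce"}
result = read("Della counted the pennies. The next day was Christmas.", kb)
```

Each subtask is individually inspectable and replaceable, which is exactly the property that makes a functional decomposition useful as a scientific description: the theory can say what each piece consumes, what it produces, and how the pieces interact.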
The second assumption underlying this volume is that inasmuch as a theory
of reading is concerned with accounting for the human ability to read,
it is important that the functions, processes, and representations postulated
by the theory, and the behaviors exhibited by the model, be cognitively
plausible and justified to the extent possible through psychological experimentation.
Where it is not possible to obtain detailed psychological data to verify
or refute fine-grained assumptions of a theory, these assumptions may be
justified in teleological terms (for example, computational, functional,
ecological, evolutionary, or philosophical arguments for why a subsystem
works the way it does) or at least via a sufficiency argument that demonstrates
that the proposed model is able to produce the behaviors that are being
accounted for (see, for example, Ram & Jones, 1995). This demonstration
is facilitated by the presence
of an executable computer model.
The modeling approach
Before we visit the reading task in more detail, let us discuss the computational
modeling approach that we will take to address this task. Many of the models
described in this volume will probably appear too limited to be actually
"reading" the texts which they are given, in the full sense of the word.
What purpose, then, do such models serve?
In the computational modeling approach, the model itself is not the end
of the research cycle; instead, the model is used as a tool by the researcher
in order to refine the overarching theory behind it. As Margaret Boden
expressed it (Boden, 1986):
...artificial intelligence is the use of programs as tools
in the study of intelligent processes, tools that help in the discovery
of the thinking-procedures and epistemological structures employed by intelligent
creatures.
As a tool, then, what power does the computational model give to the intelligence
researcher? Boden suggests a set of what she calls Lovelace questions
which explore the "usefulness" of computer modeling with respect to the
study of creativity (see Boden, 1991). These questions are easily adapted
to be applicable to the study
of reading as well.
First, can a computational model ever perform in a way such that it
appears to read and understand? The answer to this is "yes," as many of
the models depicted in this volume will show, albeit perhaps in a manner
or domain that is narrower than the full human reading capacity can handle.
However, this is an uninteresting question. After all, ELIZA (Weizenbaum, 1966)
appeared to comprehend quite a bit using nothing more
than simple pattern matching, substitution, and a human willingness to
believe. Of course, the ways in which we tend to measure the appearance
of cognitive ability are now more strict, but even then most of the models
here will at least appear to be performing some aspect of reading.
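ELIZA's appearance of comprehension rested on nothing deeper than surface pattern matching and substitution, a technique that can be sketched in a few lines. This is not Weizenbaum's original program; the patterns and responses below are invented for the illustration, and the point is precisely that no representation of meaning is involved.

```python
import re

# A minimal ELIZA-style responder: surface pattern matching plus template
# substitution, with no representation of meaning at all.
RULES = [
    (re.compile(r"\bI am (.*)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bI feel (.*)", re.IGNORECASE), "How long have you felt {0}?"),
    (re.compile(r"\bmy (\w+)", re.IGNORECASE), "Tell me more about your {0}."),
]

def respond(utterance):
    """Return the first matching template response, or a stock reply."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(*match.groups())
    return "Please go on."  # default when no pattern matches

print(respond("I am worried about money"))
# Why do you say you are worried about money?
```

A reader willing to believe supplies all the coherence; the program tracks no context, draws no inferences, and "understands" nothing, which is why the mere appearance of comprehension is such a weak criterion.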
If the first question is not that interesting, a reasonable followup
might be: Can a computational model ever really be able to read
and understand material? Unfortunately, it is not clear exactly how one
can distinguish "true" comprehension from the mere appearance of comprehension;
thus, this question is best left to computational philosophers.
If the appearance of comprehension is uninteresting and the reality
of comprehension is beyond the scope of this volume, where does that leave
us? The issue is not whether an implemented computer program can actually
read and understand text but whether building such programs is a reasonable
way to approach the problem of producing an explanatory theory of reading
and understanding. The third Lovelace question, therefore, is the one we
will concentrate on: Can computational models help us understand how human
reading is possible? We believe the answer to be "yes" for a number of
reasons:
-
The computational model can act as a sufficiency argument for the theory.
In other words, if the model is an accurate instantiation of the theory
and if the model appears to perform some aspect of reading, then the model
shows that the claims of the theory are sufficient for explaining that
aspect of reading.
-
The computational model requires the researcher to be precise in the specification
of the theory, not only in what the tasks are and what they do but also
in how exactly they work. It is often easy to "believe" an assumption
to be true; implementing that assumption reveals whether it actually
holds. As Hintzman states (Hintzman, 1991):
...an assertion can be so intuitively compelling that it is
accepted without close examination. In these cases, it may take a formal
model to convince researchers that the assertion is wrong, and even then
the belief may be hard to kill.
This is not meant to imply that researchers deliberately cling to mistaken
assumptions they hold dear; Hintzman goes on to point out that models are useful
in illuminating theories because researchers are subject to a number of
reasoning flaws, such as being unable to track a large number of variables
simultaneously or being biased toward accepting often-heard statements as true.
-
The computational model gives the researcher a solid basis on which to
perform empirical evaluation. Through rigorous experimentation with the
model, the researcher can evaluate the power of the theory in question. This
evaluation of the model can also lead to refinement of the theory. As Cohen
states (Cohen, 1995):
Studying [computer] systems is not very different from studying
moderately intelligent animals such as rats. One obliges the agent (rat
or program) to perform a task according to an experimental protocol, observing
and analyzing the macro- and micro-structure of its behavior. Afterward,
if the subject is a rat, its head is opened up or chopped off; and if it
is a program, its innards are fiddled with.
If the model does something unexpected, then the theory can be modified
and the model re-evaluated. The unexpected behavior might represent an
inadequacy in the theory or sometimes even an unusual success.
-
The researcher interested in psychological theory derives an additional
benefit: the behaviors produced by the model can be compared with psychological
data. This allows the cognitive basis or plausibility of the theory to
be evaluated. Often, experimentation with the model provides predictions
that can be evaluated through additional psychological experiments.
-
The model allows the researcher to test the assumptions of the theory—which
ones are warranted, which ones are ad hoc, which ones are simply
wrong. The model forces the researcher to critically examine why the theory
works at an empirical level.
-
Finally, the model can allow the researcher to generalize the theory. Cohen
uses coherent explanations as an example (Cohen, 1995). If one builds
a model which produces coherent explanations,
one can then examine that model to determine precisely what aspects of
it are responsible for its behavior. Once this is done, one is able to
manipulate the model in ways which can test the predictions of the underlying
theory and generalize the causal mechanisms involved. For example, one
might be able to reuse a portion of the model for a task other than reading
which requires the construction of coherent explanations.
Reading is a large, complicated, and ill-defined cognitive behavior, and
one that is extremely difficult to capture theoretically. However, for
the above reasons, computational modeling is a promising approach towards
this problem. Even if implemented models are still primitive with respect
to human performance, the endeavor of theorizing about, building, evaluating,
and revising these models can add significantly to our knowledge of the
human reading capacity.
The tasks of reading
A theory of reading, as we have defined it, must deal with a wide range
of issues and account for a wide range of behaviors and capabilities. Consider
the following example (Henry, 1986), which is the first paragraph of a
longer story:
One dollar and eighty-seven cents. That was all. And sixty
cents of it was in pennies. Pennies saved one and two at a time by bulldozing
the grocer and the vegetable man and the butcher until one's cheeks burned
with the silent imputation of parsimony that such close dealing implied.
Three times Della counted it. One dollar and eighty-seven cents. And the
next day would be Christmas.
Some of the pieces of this puzzle include:
-
Processing words and sentences: The starting point for reading is the input
of the words in a sentence, word-by-word, sentence-by-sentence. Before
anything can be understood about the story, for example, the English text
has to be processed at this low-level. Much research in natural language
processing is concerned with how word meanings are looked up (what does
parsimony mean?), how ambiguous words are disambiguated (which meaning
of close should be applied?), how the meanings of the words in a
sentence are combined into a meaning for the sentence as a whole, how anaphora
are resolved (in the second sentence, what does that refer to?),
what the role of various punctuation is, what the tense of the sentence
is, when and how a reader might go back and re-read some text, and so on.
This area of the field is often called sentence processing, though
in real-world texts there is also the need to deal with sentence fragments,
such as One dollar and eighty-seven cents.
-
Drawing inferences: Natural language texts leave much as an exercise to
the reader. One of the most important tasks the reader must carry out is
to determine hidden meanings and make explicit what was left implicit in
the text. In order to do this, the reader must draw on the context provided
by the text that has been read so far, by the external situation that the
reader is in, and by the overarching task that the reader is carrying out.
The reader must also draw on background knowledge about the world in general
and the reader's past experiences—for example, why is the amount of money
Della has in the story and the fact that the next day is Christmas important
pieces of related information? Much of the research in this area is concerned
with knowledge representation—how contextual and background knowledge
is encoded; with memory—how this knowledge is organized such that
it can be retrieved at the appropriate moment using the available cues
(many of you reading the example probably recognized it as Gifts of
the Magi and retrieved the gist of the remainder of the story); and
with abduction—how background knowledge and current context can
be brought together to enable the reader to draw plausible inferences from
the material in the text.
-
Dealing with novelty: It is almost a definitional characteristic of natural
languages that they possess a great deal of novelty by virtue of their
flexibility and constant redefinition through cultural and social agreement.
This novelty can range from the introduction of novel words or the metaphorical
reuse of words in new contexts to the description of unfamiliar or novel
concepts through the use of language. Consider the example story. Many
readers will be unfamiliar with the word imputation but will not
have difficulty arriving at a reasonable meaning for it, based on the context.
On the other hand, consider the use of the term bulldoze. Even readers
unfamiliar with the usage given in the story can arrive at a reasonable
interpretation based on what they know about literal bulldozing
and given the rest of the paragraph. Thus, reading research has also
been concerned with issues of learning, metaphor, analogy,
and creativity.
-
Controlling the process: People do not read in a vacuum; they read for
a purpose, be it entertainment, information seeking, or communication.
During the reading process, they are also concerned with other goals, activities,
and occurrences in the world around them which demand attention. It follows
that reading is an extremely flexible process; one can quickly skim a newspaper
article on the train while commuting to work in the morning, or read a
mystery novel and allocate much attention to details of the plot while
skipping over lengthy descriptions of the setting, or read in great detail
a carefully-constructed argument in an editorial that one has been asked
to write a response to for a term paper. For example, probably no one
read the example paragraph strictly word-by-word; the typical way to read
such a passage is to read almost every word while skimming the rest, a
tendency that would be even more evident with a longer piece.
There is less research into this aspect of reading, but some research has
been concerned with situated reading—how the reading task interacts
with, and is affected by, the larger context in which it is carried out;
focus of attention—how a reader pays different amounts of attention
to different aspects of the text, switching dynamically between skimming
and in-depth processing; and meta-reasoning—reasoning about the
reading process itself.
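One of the tasks listed above, retrieving a stored episode from memory using the cues available in the text (as when readers recognize the paragraph as Gifts of the Magi), can be sketched as a toy indexed memory. The index structure, the cue sets, and the stored gists below are invented for illustration; this is not any particular memory model from the volume.

```python
# A toy sketch of cue-based episodic memory retrieval: episodes are indexed
# by cue words, and the episode sharing the most cues with the current text
# is retrieved.
MEMORY = {
    "gift-of-the-magi": {"cues": {"pennies", "christmas", "della"},
                         "gist": "Della sells her hair to buy Jim a watch chain."},
    "christmas-carol":  {"cues": {"christmas", "scrooge", "ghost"},
                         "gist": "Scrooge is reformed by three spirits."},
}

def retrieve(text):
    """Return the gist of the best-matching episode, or None if no cue matches."""
    cues = set(text.lower().replace(".", "").split())
    best, best_score = None, 0
    for episode in MEMORY.values():
        score = len(episode["cues"] & cues)  # count shared cues
        if score > best_score:
            best, best_score = episode, score
    return best["gist"] if best else None

gist = retrieve("Three times Della counted it. And the next day would be Christmas.")
```

Even this crude overlap count illustrates the research questions involved: which features of a text should serve as indices, how competing memories are ranked, and how the retrieved gist then feeds back into inference about the text being read.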
The chapters in this volume span this range of tasks that reading research
has been concerned with. We begin with Rapaport and Shapiro's discussion
(Chapter 2) of cognitive models of reading, and the relationship between
cognition and fiction. They explore the epistemological questions of how
a cognitive agent could represent fictional entities and their properties,
and reason about such entities, and their relationship with non-fictional
entities, during the course of reading a story. Following this, Mahesh,
Eiselt, and Holbrook (Chapter 3) discuss psycholinguistic issues in sentence
processing, focusing in particular on how multiple types of information,
such as syntactic and semantic information, can be integrated while understanding
a sentence. They present a computational model that can resolve ambiguous
interpretations of a sentence and recover from conclusions that turn
out to be erroneous. Next, Domeshek, Jones, and Ram (Chapter 4) discuss
issues of form, content, and organization in knowledge representation.
They discuss how a reader can represent the meaning of a text as well as
the inferential knowledge that is required to understand the text. Wharton
and Lange (Chapter 5) discuss how a reader's episodic memory might be organized
and deployed to provide support for the reader's inferential processes.
They argue that the process by which some text is understood should be
integrated with the process by which it is used to recall relevant information
from memory, and present a computational model of the combined process.
Langston, Trabasso, and Magliano (Chapter 6) further the discussion of
inference, presenting a model of text comprehension along with psychological
data supporting their model. They explore the differences between on-line
processing during text comprehension and off-line processing after the
text has been read.
Following these chapters, we turn our attention to issues of contextualization
of the reading processes in the structure of the text as well as the overarching
tasks that the reader is engaged in. Meyer (Chapter 7) discusses how the
reader can use the structure of the text to support the comprehension of
that text. Different genres of text are read in different ways because
the individual characteristics of the readers interact with the individual
characteristics of the texts and of the authors of those texts. Ram (Chapter
8) discusses the influence of the reader's learning goals on the manner
and depth to which the text is processed. He presents a model of reading
as an active process in which the reader subjectively processes the text
while seeking information, creating hypotheses, asking questions, and pursuing
interesting ideas.
We then move on to discuss issues of learning and creativity. Peterson
and Billman (Chapter 9) present a model that explains how a reader handles
linguistic novelty. They present a computational model that can read and
interpret sentences containing novel verbs using underlying semantic information
about the language. Moorman and Ram (Chapter 10) discuss a model of creative
understanding which enables a reader to comprehend texts that contain novel
concepts. They show how a reader can creatively understand novel concepts
in a science fiction story using analogical reasoning and problem reformulation
supported by a principled representation of knowledge. Cox and Ram (Chapter
11) discuss parallels between reading and learning, arguing that there
are many similarities between these two tasks: identification of interesting
input, elaboration of input concepts, determination of the agent's goals,
and determination and execution of the strategies to be used to process
the input in pursuit of those goals.
While this volume is primarily concerned with functional-computational-representational
models of reading, be they symbolic or distributed (e.g., connectionist)
models, Riloff (Chapter 12) presents a number of alternative recent approaches
which, while they share much with the previous models, deviate from many
of the assumptions underlying these models. She argues that information
extraction approaches, concerned with identifying and extracting specific
types of information from text rather than in-depth knowledge-intensive
analysis of text, can provide significant leverage in story understanding.
Gerrig (Chapter 13) discusses what human reading is really like, and
provides several directions which future research on reading will need
to pursue. He describes the reader's experience of being transported into
the narrative world of a text and mentally participating in that narrative
world during the reading process. Finally, Fletcher (Chapter 14) concludes
with his perspective on the endeavor of building computational models of
reading, such as those presented in this volume, arguing that it is productive
to invest resources and intellectual energy in this enterprise.
References
-
Boden, 1991
-
M.A. Boden. The Creative Mind: Myths and Mechanisms. Basic Books,
Inc., New York, 1991.
-
Boden, 1986
-
M.A. Boden. Artificial Intelligence and Natural Man. Basic Books,
Inc., New York, second edition, 1986.
-
Cohen, 1995
-
P.R. Cohen. Empirical Methods for Artificial Intelligence. MIT
Press, Cambridge, MA, 1995.
-
Fodor, 1975
-
J.A. Fodor. The Language of Thought. Thomas Y. Crowell, New York,
1975.
-
Henry, 1986
-
O. Henry. Gifts of the Magi. In Paul J. Horowitz, editor, Collected
Stories of O. Henry. Avenel Books, New York, 1986.
-
Hintzman, 1991
-
D.L. Hintzman. Why are formal models useful in psychology? In William E.
Hockley and Stephen Lewandowsky, editors, Relating Theory and Data:
Essays on Human Memory in Honor of Bennet B. Murdock. Lawrence Erlbaum
Associates, Publishers, Hillsdale, NJ, 1991.
-
Johnson, 1987
-
M. Johnson. The Body in the Mind: The Bodily Basis of Meaning, Imagination,
and Reason. University of Chicago Press, Chicago, 1987.
-
Lakoff & Johnson, 1980
-
G. Lakoff and M. Johnson. Metaphors We Live By. University of Chicago
Press, Chicago, IL, 1980.
-
Ram & Jones, 1995
-
A. Ram and E. Jones. Foundations of foundations of artificial intelligence.
Philosophical Psychology, 8(2):193-199, 1995.
-
Weizenbaum, 1966
-
J. Weizenbaum. ELIZA—A computer program for the study of natural language
communication between man and machine. Communications of the ACM,
9:36-45, 1966.
-
Whorf, 1956
-
B. L. Whorf. Science and linguistics. In J. B. Carroll, editor, Language,
Thought, and Reality. MIT Press, Cambridge, MA, 1956.
-
Wittgenstein, 1968
-
L. Wittgenstein. Philosophical Investigations. Macmillan, New York,
1968. Translated by G. E. M. Anscombe.
Footnotes
- [1]
...text in a natural language
-
A natural language is a language that has evolved through use in
a social system (for example, English, Spanish, French, or Hindi) as opposed
to one that has been designed by people for a specific purpose (for example,
Fortran or Java). Languages which are engineered but evolve through social
action (for instance, Esperanto, American Sign Language, and Klingon) are
also examples of natural languages.
- [2]
...functional-computational-representational model of the reading process
-
This does not imply that all research into reading or natural language
processing must necessarily involve computational modeling; on the contrary,
a range of psychological, social, and computational research is needed
to work towards the common goal of producing a detailed functional-computational-representational
model of reading.