Deterministic parsing--the use of heuristics

As mentioned previously, the parsing mechanisms we've been looking at so far are simulations of nondeterministic processes. Somewhere else in this curriculum you'll learn more about nondeterminism, but for now it should suffice to say that a nondeterministic process always returns the right answer, and the theoretical basis for that right answer is something called an "oracle," which always knows which choice is the right choice. However, you probably don't have an oracle handy. So we end up having to simulate nondeterminism, which typically means we have to do an exhaustive search. The transition net approach to parsing that we've seen up to now implements nothing more than a depth-first search of all the possible ways one might get from the start state (indicated by the S node) to a terminal string that matches the input sentence. That depth-first search is an exhaustive search of what might be a big state space, and as you know from your earlier AI classes, exhaustive search is very expensive. So from a purely practical point of view, we'd like to move away from reliance upon simulations of nondeterminism to find the right parse tree for some sentence. And I'm sure you'll also recall what you learned about how to cut down on search---you employ heuristics.

Is this lack of nondeterminism going to be a problem for us? Probably not, because humans seem to do a credible job of parsing, and it looks like humans are deterministic, not nondeterministic, parsers. Human understanders appear to use heuristics to guide their decisions about syntactic structure, and their decisions aren't always correct. They're not doing this consciously, of course, but their behavior can be explained to some extent by a model of processing that uses heuristics to make decisions about syntactic ambiguities.
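
To make the cost concrete, here's a minimal sketch of what "simulating nondeterminism" looks like in code. This is my own illustration in Python, not anything from this course: a depth-first, backtracking search that tries every expansion of every non-terminal until some derivation matches the input. The tiny grammar and lexicon are invented for the example.

    # Depth-first simulation of a nondeterministic parser: wherever an
    # oracle would pick the right expansion, we try them all in order.
    GRAMMAR = {
        "S":  [["NP", "VP"]],
        "NP": [["PRON"]],
        "VP": [["VERB"]],
    }
    LEXICON = {"PRON": {"I"}, "VERB": {"slept"}}

    def parse(symbols, words):
        """Yield every derivation of `words` from the list of `symbols`."""
        if not symbols:
            if not words:
                yield []                  # success: everything consumed
            return
        first, rest = symbols[0], symbols[1:]
        if first in LEXICON:              # terminal category: match a word
            if words and words[0] in LEXICON[first]:
                for tail in parse(rest, words[1:]):
                    yield [(first, words[0])] + tail
        else:                             # non-terminal: try every expansion
            for expansion in GRAMMAR[first]:
                for tail in parse(expansion + rest, words):
                    yield [first] + tail  # flattened derivation, for brevity

    print(list(parse(["S"], ["I", "slept"])))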

Minimal attachment

One heuristic that has been proposed to explain how people make parsing decisions is called "minimal attachment." For example, given the sentence "I saw the man with the telescope," how could a parser using the following grammar determine where the PP "with the telescope" attaches?

     S <- NP VP
    VP <- VERB NP PP
    VP <- VERB NP
    NP <- ART NOUN
    NP <- PRON
    NP <- NP PP
    PP <- PREP NP

Here's one possible parse, where the prepositional phrase "with the telescope" attaches to the verb phrase and modifies the verb "saw":

               S
              / \
            /     \
          /         \
        /             \
       NP             VP
       |             / | \
       |          /    |    \
       |       /       |       \
     PRON    VERB      NP       PP
       |      |        /\        |\
       |      |       /  \       |   \
       |      |      /    \      |      \
                    ART  NOUN  PREP     NP
       I     saw     |    |      |      /\
                     |    |      |     /  \
                     |    |      |    /    \
                                     ART  NOUN
                    the  man   with   |    |
                                      |    |
                                      |    |
     
                                    the   telescope

Here's another acceptable parse tree for the same sentence, but this time the prepositional phrase attaches lower in the tree to the noun phrase and modifies "the man":

               S
              / \
            /     \
          /         \
        /             \
       NP             VP
       |             /  \ 
       |          /        \
       |       /              \
     PRON    VERB             NP
       |      |              /  \
       |      |            /      \
       |      |          /          \
                       NP            PP
       I     saw       /\           /  \
                      /  \         /    \ 
                     /    \       /      \ 
                    ART  NOUN   PREP     NP  
                     |    |      |       /\  
                     |    |      |      /  \ 
                     |    |      |     /    \ 
                                      ART  NOUN
                    the  man    with   |    | 
                                       |    |
                                       |    |
     
                                      the  telescope

The minimal attachment principle says that the preferred parse is the simplest parse, and simplicity here is measured in terms of the number of non-terminals in the parse tree, where fewer non-terminals is considered to be simpler. Thus, the first parse tree above, which contains 13 non-terminals, is preferred by the minimal attachment principle over the second parse tree, which contains 14 non-terminals.
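
If you had the complete trees in hand, applying the principle would be easy to mechanize. Here's a sketch of my own in Python (the nested-tuple encoding of trees is just an assumption for illustration) that counts non-terminals in the two parses above and keeps the smaller one:

    def count_nonterminals(tree):
        """A leaf is a plain string (a word); everything else is a node."""
        if isinstance(tree, str):
            return 0
        return 1 + sum(count_nonterminals(child) for child in tree[1:])

    # PP attached to the VP: 13 non-terminals.
    vp_attach = ("S",
        ("NP", ("PRON", "I")),
        ("VP", ("VERB", "saw"),
               ("NP", ("ART", "the"), ("NOUN", "man")),
               ("PP", ("PREP", "with"),
                      ("NP", ("ART", "the"), ("NOUN", "telescope")))))

    # PP attached to the NP "the man": 14 non-terminals.
    np_attach = ("S",
        ("NP", ("PRON", "I")),
        ("VP", ("VERB", "saw"),
               ("NP", ("NP", ("ART", "the"), ("NOUN", "man")),
                      ("PP", ("PREP", "with"),
                             ("NP", ("ART", "the"), ("NOUN", "telescope"))))))

    print(count_nonterminals(vp_attach), count_nonterminals(np_attach))  # 13 14
    preferred = min([vp_attach, np_attach], key=count_nonterminals)
    print(preferred is vp_attach)  # True: minimal attachment picks the VP parse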

This principle won't always work, however. If you used minimal attachment in parsing "We painted all the walls with cracks," the heuristic would select the wrong parse. (Add the following rule to the grammar above:

    NP <- QUANT ART NOUN  

and note that "all" is a QUANT.) Oh, the reason we know that it's the wrong parse is that we know what the sentence is supposed to mean. So, as you can see, it's going to be very hard to keep up this sham about a clean separation between syntax and meaning.

               S
              / \
            /     \
          /         \
        /             \
      /                 \
     NP                 VP
     |                /  |  \
     |            /      |      \
     |        /          |          \
   PRON    VERB          NP           PP
     |       |          /|\           |\
     |       |        /  |  \         |   \
     |       |      /    |    \       |      \
                  QUANT ART NOUN    PREP     NP
    We    painted  |     |    |       |       |
                   |     |    |       |       | 
                   |     |    |       |       |
                                             NOUN
                  all   the walls   with      |
                                              |
                                              |
     
                                            cracks

The parse tree above has only 13 non-terminals, while the alternate below has 14 non-terminals:

               S
              / \
            /     \
          /         \
        /             \
      /                 \
     NP                 VP
     |                /     \
     |            /             \
     |        /                     \
   PRON    VERB                     NP
     |       |                     / \                 
     |       |                   /     \
     |       |                 /         \
                             NP           PP
    We    painted           /|\           |\
                          /  |  \         |   \
                        /    |    \       |      \
                      QUANT ART NOUN    PREP     NP
                       |     |    |       |       |
                       |     |    |       |       | 
                       |     |    |       |       |
                                                 NOUN
                      all   the walls   with      |
                                                  |
                                                  |
    
                                                cracks

Despite the fact that the latter parse tree makes more sense (I mean, you didn't actually use cracks to paint the walls, did you?), the minimal attachment heuristic prefers the former parse tree. Consequently, it's safe to conclude that the minimal attachment heuristic isn't perfect.

Right association

Another proposed heuristic, which is used when minimal attachment can't make a decision, is called "right association." The right association principle says that if all other things are equal (e.g., same number of non-terminals in each parse tree), any new constituent or phrasal component should be attached to the constituent currently under construction (i.e., lower in the parse tree) instead of a constituent higher up in the parse tree. For example, the following two parse trees for "I heard that your computer died yesterday" contain 13 non-terminals each, so minimal attachment is no help, but right association prefers the first tree, in which "yesterday" (treated as a PP by this grammar) attaches lower, modifying "died" rather than "heard":

          S
        /   \
      /       \
    /           \
   NP           VP
   |           /  \
   |         /      \
   |       /          \
 PRON    VERB      COMPLEMENT
   |      |            / \
   |      |          /     \
   |      |        /         \
                              S
   I    heard    that        / \
                           /     \
                         /         \
                       NP           VP
                      / \          / \
                     /   \        /   \
                    /     \      /     \
                 POSS    NOUN  VERB    PP
                  |        |    |       |
                  |        |    |       |
                  |        |    |       |

                 your  computer died  yesterday

              S
           /   \
        /         \
     /               \
   NP                 VP
   |                 / | \
   |              /    |    \
   |           /       |       \
 PRON      VERB    COMPLEMENT    PP
   |        |         / \         \
   |        |       /     \        \
   |        |     /         \       \
                              S
   I      heard  that        / \   yesterday
                            /   \
                           /     \
                         NP       VP
                        / \       |
                       /   \      |
                      /     \     |    
                   POSS    NOUN  VERB  
                    |        |    |      
                    |        |    |      
                    |        |    |      

                  your  computer  died 
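
Right association can be mechanized too. Here's a sketch of my own in Python (again assuming the nested-tuple tree encoding): measure the depth at which the constituent in question attaches in each candidate tree, and prefer the deeper (lower) attachment.

    def attachment_depth(tree, label, depth=0):
        """Depth of the first constituent with this label, or -1 if absent."""
        if isinstance(tree, str):
            return -1
        if tree[0] == label:
            return depth
        for child in tree[1:]:
            d = attachment_depth(child, label, depth + 1)
            if d >= 0:
                return d
        return -1

    # "yesterday" attached low, to the embedded VP ("died yesterday")...
    low = ("S", ("NP", ("PRON", "I")),
           ("VP", ("VERB", "heard"),
                  ("COMPLEMENT", "that",
                   ("S", ("NP", ("POSS", "your"), ("NOUN", "computer")),
                         ("VP", ("VERB", "died"), ("PP", "yesterday"))))))

    # ...versus attached high, to the main VP ("heard ... yesterday").
    high = ("S", ("NP", ("PRON", "I")),
            ("VP", ("VERB", "heard"),
                   ("COMPLEMENT", "that",
                    ("S", ("NP", ("POSS", "your"), ("NOUN", "computer")),
                          ("VP", ("VERB", "died")))),
                   ("PP", "yesterday")))

    print(attachment_depth(low, "PP"), attachment_depth(high, "PP"))  # 5 2
    preferred = max([low, high], key=lambda t: attachment_depth(t, "PP"))
    print(preferred is low)  # True: right association picks the low attachment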

Lexical preferences

Right association doesn't always work either. Sometimes it looks like the main verb influences the attachment decision. For example, where does "in the house" attach in the following sentences? Can the attachment decisions change? (What decisions would minimal attachment make? How about right association? Is there a conflict?)

I wanted the dog in the house.

I kept the dog in the house.

I put the dog in the house.

In his seminal textbook, James Allen indicates that in the first sentence above, "in the house" should attach to the noun phrase "the dog". The author believes that this is intuitively better than attaching the PP to the verb "wanted" -- that is, "wanted" doesn't really need a location for the wanting to happen -- but our informal experiments in past offerings of the natural language class suggest that the author's intuition doesn't match everyone else's, though the results were never entirely decisive.

As for the second sentence, however, you unanimously favored attaching the PP to "kept" instead of to "the dog", while "put" in the third sentence absolutely, positively demands that "in the house" attach to the verb, not to the NP "the dog".

So what's going on here? Well, it may be that another heuristic, one that takes verb influence into account, is at work. The "lexical preference" principle says that, for each verb, the parser must know which prepositions indicate that the PP must attach to the VP (e.g., "put"), and which indicate only a preference for attaching the PP to the VP (e.g., "kept"). If the verb-preposition combination in question doesn't fall into one of these two categories, the principle defaults to the use of right association.
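
One way to encode that knowledge is a simple table keyed on verb-preposition pairs. The sketch below is my own hypothetical encoding in Python; the category names and the table entries are assumptions for illustration, not Allen's notation:

    # Per-verb knowledge for the lexical preference principle: some
    # verb-preposition pairs demand VP attachment, some merely prefer it,
    # and everything else falls back to right association.
    LEXICAL_PREFS = {
        ("put", "in"):  "must-attach-vp",
        ("kept", "in"): "prefer-attach-vp",
    }

    def pp_attachment(verb, preposition):
        pref = LEXICAL_PREFS.get((verb, preposition))
        if pref == "must-attach-vp":
            return "VP"                # "put": the verb demands the PP
        if pref == "prefer-attach-vp":
            return "VP"                # "kept": a soft preference
        return "right association"     # "wanted": default to the heuristic

    for verb in ("wanted", "kept", "put"):
        print(verb, "->", pp_attachment(verb, "in"))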

How to implement these principles

First, you don't want to generate all the possible parse trees for any given sentence so you can count up all the non-terminals or whatever. That's what we're trying to avoid. Remember? It's that ugly search problem again. So you need to implement these principles in a parser in such a way that you can make the necessary decisions as the parser proceeds. But, as your textbook explains, any commitment to a specific implementation of any or all of these principles has always had problems. So while these principles may explain some of what's going on in human parsing, they clearly don't explain everything. If you're going to try to embody these principles in some parser someday, you'll just have to be ready to accept the fact that they're going to make wrong decisions some of the time (unless some breakthrough occurs between now and then).
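
One commonly suggested way to get that decide-as-you-go behavior, sketched here under my own assumptions rather than as the textbook's implementation, is to bake a heuristic into the order in which the parser tries its alternatives. A depth-first parser that returns its first complete parse then behaves like minimal attachment without ever counting nodes:

    # The grammar from earlier, with alternatives ordered so that the
    # expansion yielding the flatter (fewer-node) tree is tried first.
    # A backtracking parser that stops at its first success then prefers
    # minimal attachment for free. (Caveat: NP -> NP PP is left-recursive,
    # so a naive recursive-descent parser would need the usual
    # left-recursion workaround before using this grammar directly.)
    GRAMMAR = {
        "S":  [["NP", "VP"]],
        "VP": [["VERB", "NP", "PP"],   # tried first: the PP joins the VP
               ["VERB", "NP"]],
        "NP": [["ART", "NOUN"],
               ["PRON"],
               ["QUANT", "ART", "NOUN"],
               ["NP", "PP"]],          # tried last: the costlier attachment
        "PP": [["PREP", "NP"]],
    }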

Yet another heuristic approach

The problem, in a nutshell, is this: when you (assuming you are a parser) are parsing some sentence, and you're faced with some syntactic ambiguity, you need to make a choice, and you'd like to make the correct choice every time. If you were nondeterministic, that is, if you had access to an "oracle," you could always make the right choice. But you don't have an oracle; you don't know everything that's going to happen in the future. What do you do? You cheat. You can't look infinitely into the future, but you can look ahead a few words or constituents, can't you? Sure, and that's the principle of "lookahead parsing."

The principle of lookahead parsing simply says that, at each decision point in your parser, you add a set of tests that looks ahead at the input. Based on what's coming up, you can make an informed guess as to which choice is the best one to make now. In an ATN parser, you add these tests to the nodes or states of your ATN--these are the decision points (if the node has more than one arc leading away from it). These lookahead tests are sort of like feature tests, but they're different because (1) they're performed at the node before traversing the arc because they are heuristics for guiding arc choice, and (2) the lookahead tests are not required to be true for a correct parse, but feature tests are required to be true.
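
For example, a one-word lookahead test at the state following the object NP might peek at the next word to decide whether to take a PP arc. Everything in this sketch, from the arc names to the preposition list, is invented for illustration:

    # A lookahead test at an ATN decision point: after parsing the object
    # NP, peek at the next input word to choose between the "parse a PP
    # into this VP" arc and the "pop" (finish the VP) arc.
    VP_PREPS = {"with", "in", "on"}    # assumed preposition list

    def choose_arc(remaining_words):
        """Heuristically pick an arc by peeking one word ahead."""
        if remaining_words and remaining_words[0] in VP_PREPS:
            return "PP-arc"            # guess: a PP that modifies the verb
        return "pop-arc"               # otherwise close off the VP

    print(choose_arc(["with", "the", "telescope"]))  # PP-arc
    print(choose_arc(["yesterday"]))                 # pop-arc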

Simple lookahead parsers peek a fixed number of words ahead. More sophisticated parsers peek a fixed number of constituents ahead. But, as is the case with the heuristics described above, lookahead parsers don't always get the parse right either. If you don't look ahead enough, you get sub-human performance. If you look ahead too much, you get super-human performance, but still not perfect. Plus, your parser starts to look like each node has another parser attached to it. If you look ahead far too much, you might as well just try the nondeterministic simulations. Ugh. What is the magic number of constituents or words needed to get exactly human performance? Nobody's found it yet, which suggests that lookahead parsing may not be the answer either.

So what do you do?

Punt. Nobody really knows what's inside the "syntax box." We can propose grammars for English, but they're always incomplete. We can propose parsing mechanisms that work in many or most cases, but they won't work in all cases. We can propose heuristics to help out, but they don't completely solve the problem either. And so it goes. We don't have all the answers. Still, we know enough that we can build reasonably good natural language understanding systems in limited domains of expertise, assuming we're willing to accept less-than-perfect, or even less-than-human, performance some of the time. In the meantime we're trying to figure out how this whole language thing really works. You'll see more of that in the weeks to come. So don't give up yet.

Copyright (c) 2004 by Kurt Eiselt. All rights reserved, except as previously noted.

Last revised: February 17, 2004