From transition nets to recursive transition nets
While simplifying the TN in class last time, we introduced the ability
to push a destination on a stack, jump to a subnetwork, and return to the
pushed destination. That gave us the ability to implement recursion
in these networks. Such a network, of course, is called a recursive
transition network (RTN). RTNs give us a mechanism for representing
CFGs that include recursive rules:
S <- NP VP
VP <- VERB
NP <- ART NOUN
NP <- ART NOUN PP
PP <- PREP NP
ART <- the
NOUN <- boy | raft | island | ocean
VERB <- drowned
PREP <- with | near | in
Why do we want recursive rules in our grammar? As mentioned in a
previous lecture, natural languages allow us to express an infinite
range of ideas using a finite set of rules and symbols. We need
the notion of recursion to do this. Not only are there an infinite
number of sentences, but a single sentence can be infinitely long:
The boy drowned.
The boy with the raft drowned.
The boy with the raft near the island drowned.
The boy with the raft near the island in the ocean drowned.
The boy ....
Recursion enables us to represent this capability of language as well.
(Note that while we can generate infinitely long sentences in English,
such sentences will not necessarily be understandable given real
constraints on human processing and memory, human lifespan, the age
of the universe, and stuff like that. Even very short sentences
that are syntactically correct according to the "rules" of English
may not be understandable. Syntactic correctness and semantic usefulness
do not necessarily go hand in hand.)
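To make the recursion concrete, here is a rough sketch (in Python, not from
the lecture itself) that encodes the grammar above as a table and builds
ever-longer sentences by repeatedly choosing the recursive NP rule. To keep
the sketch short, the word choices are fixed to the first entry in each
category, so the output repeats "boy" and "with" rather than cycling through
rafts and islands.

  # Rough sketch (illustrative only): the grammar above as a table, plus a
  # generator that keeps taking the recursive NP rule to grow the sentence.
  GRAMMAR = {
      "S":    [["NP", "VP"]],
      "VP":   [["VERB"]],
      "NP":   [["ART", "NOUN"], ["ART", "NOUN", "PP"]],
      "PP":   [["PREP", "NP"]],
      "ART":  [["the"]],
      "NOUN": [["boy"], ["raft"], ["island"], ["ocean"]],
      "VERB": [["drowned"]],
      "PREP": [["with"], ["near"], ["in"]],
  }

  def expand(symbol, depth):
      """Expand one symbol; 'depth' is how many more times to take the recursive NP rule."""
      if symbol not in GRAMMAR:
          return [symbol]                              # a terminal word
      if symbol == "NP" and depth > 0:
          rule, depth = GRAMMAR["NP"][1], depth - 1    # NP <- ART NOUN PP (the recursive rule)
      else:
          rule = GRAMMAR[symbol][0]                    # first listed rule for this symbol
      return [word for s in rule for word in expand(s, depth)]

  for depth in range(3):
      print(" ".join(expand("S", depth)))
  # the boy drowned
  # the boy with the boy drowned
  # the boy with the boy with the boy drowned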
RTNs vs. CFGs
The RTN formalism makes it easier for you and me to follow what's going
on in parsing a sentence. RTNs describe a process; CFGs don't.
Since a computer program is a process description, RTNs are valuable
when writing a parsing program. It is possible to build a parser that's
driven by grammar rules, and many have been built. The big difference
between the two is that at run time, the RTN parser "knows" at each step
of the parsing process exactly which choices are available, or what
knowledge will be relevant. A CFG-based parser will have to search
through the list of rules each step of the way to find the applicable ones.
(There are variations that help to compensate for this problem.)
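To see that difference concretely, here is a tiny illustrative contrast
(Python; the tables and names are hypothetical, not from the lecture):

  # An RTN parser stores the network indexed by state, so the choices
  # available at each step are a direct lookup:
  RTN = {"S1": [("NOUN", "S2"), ("ADJ", "S1")]}
  choices = RTN["S1"]                        # the arcs leaving the current node

  # A naive CFG-driven parser has only a flat list of rules and must scan it
  # at each step to find the ones that could apply:
  RULES = [("S", ["NP", "VP"]), ("NP", ["ART", "NOUN"]), ("VP", ["VERB"])]
  applicable = [rhs for (lhs, rhs) in RULES if lhs == "NP"]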
The parser that you'll be building in this course derives from the RTN
formalism, so we'll concentrate on that one.
Backtracking
Above I said that the RTN parser knows which choices are available
at each step. Note that "choices" is plural. If there's more than
one choice, how does the parser know which one is the right one?
It doesn't, so we have to introduce the notion of backtracking.
Consider the sentence:
The little orange ducks swallow flies.
If you parse this sentence, you have to make decisions about word
categories. Is "orange" a noun or an adjective? If you decide
it's a noun, then "ducks" must be a verb (assuming, for the sake of
argument, that we're not allowing noun modifiers, as in the crude
transition net below).
                                                           pop
                                                          /
       ART       1  NOUN        VERB         NOUN       /
  S ---------> S1 ---------> S2 ---------> S3 ---------> S4
               /\
              /  \  2
              \  /
               \/
              ADJ
Similarly, "swallow" must be a noun, but then what do you do with
"flies"?
You need to backtrack to the last decision point. In order to
backtrack, your parser must record at every decision point:
  - What word in the sentence it was looking at when it made the decision
  - What node (or state) in the network it was at
  - What other arcs (or paths) could have been chosen but weren't
When the parser bumps into a dead end like that described above,
it must reset the parsing process so that it's looking at the
previously stored word, it's restarting from the previously stored
node, and it's traversing the first of the previously stored arcs.
(If more than one arc remains, just push the information again, without
the first arc, and proceed as usual.)
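Here is a minimal sketch of that bookkeeping (Python; the network, lexicon,
and function names are illustrative, not the parser you'll build). Each
decision point records the word position, the node, and the untried arcs,
and hitting a dead end pops the most recent record:

  # Minimal backtracking sketch over the crude transition net drawn above.
  NET = {
      "S":  [("ART", "S1")],
      "S1": [("NOUN", "S2"), ("ADJ", "S1")],   # arc 1 (NOUN) is tried before arc 2 (ADJ)
      "S2": [("VERB", "S3")],
      "S3": [("NOUN", "S4")],
      "S4": [("pop", None)],
  }

  LEXICON = {
      "the": {"ART"}, "little": {"ADJ"}, "orange": {"NOUN", "ADJ"},
      "ducks": {"NOUN", "VERB"}, "swallow": {"NOUN", "VERB"}, "flies": {"NOUN", "VERB"},
  }

  def parse(words):
      choice_points = []            # each entry: (word position, node, arcs not yet tried)
      pos, node, arcs = 0, "S", NET["S"]
      while True:
          while arcs:
              (label, dest), rest = arcs[0], arcs[1:]
              if label == "pop":
                  if pos == len(words):
                      return True   # consumed every word and reached a pop arc
                  arcs = rest
              elif pos < len(words) and label in LEXICON.get(words[pos], set()):
                  if rest:          # remember what we could have done instead
                      choice_points.append((pos, node, rest))
                  pos, node = pos + 1, dest
                  arcs = NET[node]
              else:
                  arcs = rest
          if not choice_points:     # dead end with nothing left to undo
              return False
          pos, node, arcs = choice_points.pop()   # backtrack to the last decision point

  print(parse("the little orange ducks swallow flies".split()))   # True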
RTNs are not enough
Recursive transition networks are interesting from a theoretical
standpoint but are not of much use by themselves. From a computational
perspective, a black box that accepts English input and just says "yes"
or "no" doesn't buy us much.
What we need is a black box that records the structure of the input
as well as providing an evaluation of syntactic correctness. We do
this by adding procedures to the arcs of the RTN. These procedures
are then performed when the corresponding arcs are traversed. The
resulting network is called an "augmented transition network" (ATN).
(And sometimes, adding procedures to the arcs of a transition network
is called "procedural attachment.")
ATN details--register assignment
One of the things we can do with these added procedures or augmentations
is to store information in registers when arcs are traversed. To record
the structure of the input, we add an action to each arc which stores the
word that was processed while traversing that arc in an appropriate
register.
                                   pop
                                  /
         ART       1  NOUN      /
  NP ---------> NP1 ---------> NP2
  :             /\              :
  :            /  \  2          :
  :            \  /             :
  :             \/  ADJ         :
  :             :               :
  :             :               :
  ART <- *      :               NOUN <- *
                :
                ADJS <- ADJS + *
(* = whatever is returned by processing the arc. If the arc label
doesn't refer to a subnetwork, then just return the word. Otherwise,
return the structure that was built by going through the subnetwork.)
If we now process the noun phrase "the vicious dog" with the ATN
above, when we reach NP2 we'll have made the following register assignments:
ART = the
ADJS = vicious
NOUN = dog
As we take the "pop" arc from this network, we can then accumulate the
register contents into a larger structure that we call NP:
  (NP (ART the)
      (ADJS vicious)
      (NOUN dog))
If the ATN above were called by some larger network, for example:
                                              pop
                                             /
        NP            VERB          NP     /
  S ---------> S1 ---------> S2 ---------> S3
  :            :             :
  :            :             :
  :            :             :
  SUBJ <- *    VERB <- *     OBJ <- *
and we had the appropriate lexicon:
  (lexicon
    (the (CAT ART))
    (vicious (CAT ADJ))
    (dog (CAT NOUN))
    (ate (CAT VERB))
    (wimpy (CAT ADJ))
    (frog (CAT NOUN)))
we could parse "the vicious dog ate the wimpy frog" and the resulting
structure would be:
  (S (SUBJ (NP (ART the)
               (ADJS vicious)
               (NOUN dog)))
     (VERB ate)
     (OBJ (NP (ART the)
              (ADJS wimpy)
              (NOUN frog))))
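A rough sketch of this register machinery for the NP subnetwork drawn above
(Python; the function and names are hypothetical, and the S-level network
would accumulate SUBJ, VERB, and OBJ in the same way):

  # Sketch of register assignment in the NP subnetwork.  '*' in the lecture's
  # notation is whatever traversing an arc returns; here it is simply the
  # current word.
  def parse_np(words, lexicon, i=0):
      """Traverse ART (ADJ)* NOUN starting at position i; return (structure, next i)."""
      registers = {"ADJS": []}
      if i >= len(words) or lexicon[words[i]]["CAT"] != "ART":
          return None
      registers["ART"] = words[i]; i += 1                   # ART <- *
      while i < len(words) and lexicon[words[i]]["CAT"] == "ADJ":
          registers["ADJS"].append(words[i]); i += 1        # ADJS <- ADJS + *
      if i >= len(words) or lexicon[words[i]]["CAT"] != "NOUN":
          return None
      registers["NOUN"] = words[i]; i += 1                  # NOUN <- *
      # pop arc: accumulate the registers into the structure handed back to the caller
      structure = ["NP", ["ART", registers["ART"]],
                         ["ADJS"] + registers["ADJS"],
                         ["NOUN", registers["NOUN"]]]
      return structure, i

  LEX = {"the": {"CAT": "ART"}, "vicious": {"CAT": "ADJ"}, "dog": {"CAT": "NOUN"}}
  print(parse_np("the vicious dog".split(), LEX))
  # (['NP', ['ART', 'the'], ['ADJS', 'vicious'], ['NOUN', 'dog']], 3)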
ATN details--feature tests
Another of the things that we want to do is to recognize when the
subject and the verb agree in person and number. For example:
"I are smart."
is syntactically incorrect, but
"I am smart."
is syntactically correct.
The problem in the first sentence is that "I" is a 1st person singular
pronoun but the verb "are" works only in 2nd person singular and
1st, 2nd, or 3rd person plural constructions.
To perform agreement checks in the ATN framework, we attach "feature
tests" to the arcs:
                                              pop
                                             /
        NP            VERB          NP     /
  S ---------> S1 ---------> S2 ---------> S3
  :            :             :
  :            :             OBJ <- *
  :            :
  SUBJ <- *    VERB <- *
  NUM  <- NUM* NUM  <- NUM  intersect NUM*
     subj         subj    subj

                                   pop
                                  /
         ART       1  NOUN      /
  NP ---------> NP1 ---------> NP2
  :             /\              :
  :            /  \  2          :
  :            \  /             :
  :             \/  ADJ         :
  :             :               :
  ART <- *      :               NOUN <- *
  NUM <- NUM*   :               NUM <- NUM intersect NUM*
                :
                :
                ADJS <- ADJS + *
and modify the lexicon appropriately:
  (lexicon
    (the (CAT ART)
         (NUM 3s 3p))
    (vicious (CAT ADJ))
    (dog (CAT NOUN)
         (NUM 3s))
    (ate (CAT VERB)
         (NUM 1s 2s 3s 1p 2p 3p))
    (eat (CAT VERB)
         (NUM 1s 2s 1p 2p 3p))
    (eats (CAT VERB)
          (NUM 3s))
    (wimpy (CAT ADJ))
    (frog (CAT NOUN)
          (NUM 3s)))
Now, if we try to parse a sentence like "the dog eat the frog" our
ATN parser will catch the agreement error when it performs the
intersection on the number of "dog" and the number of "eat"
because the result is the empty set. If the intersection between
the set of all possible values for the number of the subject and
the set of all possible values for the number of the verb is the
empty set, then there is no agreement between the subject and the
verb. The sentence must be syntactically incorrect.
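A small sketch of that intersection test (Python; the NUM values are
represented as sets, and the helper functions are hypothetical):

  # Hypothetical sketch of subject-verb agreement via set intersection.
  LEX = {
      "the":  {"CAT": "ART",  "NUM": {"3s", "3p"}},
      "dog":  {"CAT": "NOUN", "NUM": {"3s"}},
      "eat":  {"CAT": "VERB", "NUM": {"1s", "2s", "1p", "2p", "3p"}},
      "eats": {"CAT": "VERB", "NUM": {"3s"}},
  }

  def np_num(art, noun):
      # NP network: NUM <- NUM*, then NUM <- NUM intersect NUM*
      return LEX[art]["NUM"] & LEX[noun]["NUM"]

  def subject_verb_agree(art, noun, verb):
      # S network: NUM-subj <- NUM of the subject NP, then
      #            NUM-subj <- NUM-subj intersect NUM of the verb
      return bool(np_num(art, noun) & LEX[verb]["NUM"])

  print(subject_verb_agree("the", "dog", "eats"))   # True
  print(subject_verb_agree("the", "dog", "eat"))    # False: the intersection is empty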
Feature tests can also be used to check for agreement between
auxiliary verbs and the main verb in a verb group. They can
also be used to check for the correct verb complement structure.
In other words, feature tests can be used to see if the right number
of noun phrases follow the verb. Remember that intransitive verbs
expect no noun phrases, transitive verbs expect one noun phrase, and
bitransitive verbs expect two noun phrases. Also note that any
given verb might fall into more than one of these categories.
Information about the complement structure is called the
"subcategorization" of the verb.
ATNs and the real world
The great majority of natural language processing (NLP) systems in actual
use in the world today are based on the ATN formalism. They are often
used as front-ends or interfaces to database systems for question answering.
They are often designed to perform both the syntactic and semantic
analysis at the same time (i.e., there are no separate semantic or
world-knowledge "black boxes"). This approach can work when
the NLP system works in a very limited domain of knowledge or
expertise, with a limited vocabulary and limited syntax; this helps to
eliminate problems caused by ambiguity. Furthermore, the NLP system
must be able to "assume" that the input it sees is really a command
to perform some action(s). This is called "procedural semantics".
In other words, the assumption is that input sentences correspond to
programs that perform some desired action on the database, and that words
in the input correspond to program steps. (The NLP system that you'll be
building for this course will not assume procedural semantics;
it will instead assume what is called "compositional semantics,"
but we'll save that for later.)
If these assumptions hold, and they often do when you're working with
front ends for database systems, you can then attach additional
procedures to the arcs of the ATN that will translate the English input
into commands to be executed by the database system. This is possible
because the input language is so constrained that there is now roughly
a one-to-one mapping between syntactic structure and semantic interpretation.
One often cited example of a system that followed this approach
successfully is a program called LUNAR, written by William Woods in the
late 1960s and early 1970s, that functioned as the natural language
front end between geologists and a database containing detailed
information about the composition of moon rock samples collected
during the Apollo missions. Another famous system from that era
was called SHRDLU. SHRDLU, written by Terry Winograd, accepted English
commands from a user and manipulated simulated blocks on a simulated table
using a simulated robot arm.
Parsing as Search
A grammar describes a way of generating every legal sentence in the
language defined by that grammar. The tree of all partially parsed and
completely parsed sentences allowed by that grammar, starting with
the S node at the root and ending with all possible sentences at the
leaves, is a state space (or search space or a search tree). A parser
can then be viewed as a search procedure that searches the tree to
see if there's a path from the S node to a desired sentence (top-down)
or from a desired sentence to the S node (bottom-up).
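Here is a rough sketch of the top-down version of that search (Python;
depth-first, using the toy grammar from earlier, purely for illustration):

  # Top-down parsing viewed as depth-first search over partial expansions.
  GRAMMAR = {
      "S":  [["NP", "VP"]], "VP": [["VERB"]],
      "NP": [["ART", "NOUN"], ["ART", "NOUN", "PP"]],
      "PP": [["PREP", "NP"]],
      "ART": [["the"]], "NOUN": [["boy"], ["raft"]],
      "VERB": [["drowned"]], "PREP": [["with"]],
  }

  def derives(symbols, words):
      """Can this sequence of symbols expand into exactly this list of words?"""
      if not symbols:
          return not words                 # success only if the words are used up too
      first, rest = symbols[0], symbols[1:]
      if first not in GRAMMAR:             # a terminal: it must match the next word
          return bool(words) and words[0] == first and derives(rest, words[1:])
      # Nondeterministic rule choice, simulated by trying each expansion in turn.
      return any(derives(expansion + rest, words) for expansion in GRAMMAR[first])

  print(derives(["S"], "the boy with the raft drowned".split()))   # True
  print(derives(["S"], "the boy the raft drowned".split()))        # False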
In parsing a natural language like English, that tree is going to be
very very big (i.e., infinite). Getting the correct parse every time
is going to be very very difficult (i.e., impossible), unless you
have some truly nondeterministic mechanism. We can simulate
nondeterminism (e.g., breadth-first search, depth-first search),
but these simulations have major weaknesses too. What will we do?
Copyright (c) 2004 by Kurt Eiselt. All rights reserved,
except as previously noted.
Last revised: February 17, 2004