
Wrapper Generalization Hierarchy

To provide a sound structure for wrapper design and generation, we classify the many types of repositories and information systems into three major categories: structured (RDBMS or OODBMS), semi-structured (HTML, Usenet, BibTeX files) and unstructured (MS-Word or LaTeX files). This subtyping-based classification allows much of the wrapper code from previous implementations to be reused.
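One way to picture how subtyping enables this reuse is a small class hierarchy. The following is only a minimal sketch in Python: the class names, the `query`/`fetch` split and the result dictionary are hypothetical and are not taken from the wrapper implementation described here.

```python
from abc import ABC, abstractmethod


class Wrapper(ABC):
    """Hypothetical root of the hierarchy; holds the code every subtype reuses."""

    source_kind = "unknown"

    def query(self, attribute, value):
        # Shared step: run the source-specific fetch, then package the
        # matches in one common (assumed) result format.
        rows = self.fetch(attribute, value)
        return {"source": self.source_kind, "count": len(rows), "rows": rows}

    @abstractmethod
    def fetch(self, attribute, value):
        """Return the records whose attribute equals value."""


class StructuredWrapper(Wrapper):
    source_kind = "structured"

    def __init__(self, execute):
        # execute is a callable standing in for the DBMS gateway.
        self.execute = execute

    def fetch(self, attribute, value):
        # The repository has its own query facilities, so delegate to it.
        return self.execute(attribute, value)


class SemiStructuredWrapper(Wrapper):
    source_kind = "semi-structured"

    def __init__(self, records):
        # records is a list of dicts already parsed from the source.
        self.records = records

    def fetch(self, attribute, value):
        # No local processing at the source: filter inside the wrapper.
        return [r for r in self.records if r.get(attribute) == value]


class UnstructuredWrapper(Wrapper):
    source_kind = "unstructured"

    def __init__(self, documents):
        # documents is a list of plain-text strings.
        self.documents = documents

    def fetch(self, attribute, value):
        # No attributes to match on, so fall back to a plain text search.
        return [d for d in self.documents if value in d]
```

Under these assumptions only `fetch` differs between subtypes, while the result packaging in `query` is written once and inherited.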

Our first relational DBMS wrapper is implemented for the Oracle7 RDBMS using the Oraperl [4] gateway. The wrapper's data manipulation commands are translated directly into Oracle commands using the ora_do Oraperl function call.
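The translation step can be illustrated with a short sketch. It is written in Python and uses the standard sqlite3 module as a stand-in for Oracle7 and Oraperl, so it does not show the ora_do call itself. The dictionary-based command format, the function names and the sample table are assumptions made purely for illustration.

```python
import sqlite3


def translate(command):
    """Turn a hypothetical wrapper command (a dict) into SQL text and bind values."""
    # Table and column names are interpolated directly, so they must come
    # from trusted wrapper metadata; only the values are passed as bindings.
    columns = ", ".join(command["select"])
    sql = f"SELECT {columns} FROM {command['from']}"
    params = []
    if command.get("where"):
        conditions = []
        for column, value in command["where"].items():
            conditions.append(f"{column} = ?")
            params.append(value)
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


def run(connection, command):
    # Hand the translated statement to the DBMS, which does the processing.
    sql, params = translate(command)
    return connection.execute(sql, params).fetchall()


# Made-up sample data in an in-memory database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE papers (title TEXT, year INTEGER)")
connection.executemany(
    "INSERT INTO papers VALUES (?, ?)",
    [("Paper A", 1996), ("Paper B", 1995)],
)

command = {"select": ["title"], "from": "papers", "where": {"year": 1996}}
print(run(connection, command))  # prints [('Paper A',)]
```

The point of the sketch is that the wrapper only rewrites the command; selection and projection are left to the database engine.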

Unlike structured repositories, semi-structured information repositories typically do not have any local data processing facilities, so the wrapper must perform the data processing itself. The third subtype consists of wrappers for unstructured files such as LaTeX or MS-Word documents. There may be specialized ways to extract information (from figures, tables or lists) from such documents, but producing wrappers for these types of data will require the most effort.
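To make the contrast concrete, the sketch below shows a wrapper doing its own data processing over a BibTeX-like file. It is a simplified Python illustration, not the implementation described here: the regular expressions handle only flat `name = {value}` fields without nested braces, and the function names and sample entries are invented.

```python
import re

# Matches "@type{key, ... }" where the closing brace starts its own line.
ENTRY = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*?)\n\}", re.DOTALL)
# Matches flat "name = {value}" fields; nested braces are not handled.
FIELD = re.compile(r"(\w+)\s*=\s*\{([^{}]*)\}")

# Made-up sample input.
SAMPLE = """\
@article{key1,
  author = {A. Author},
  year = {1996},
}
@book{key2,
  author = {B. Writer},
  year = {1995},
}
"""


def parse_entries(text):
    """Parse the raw text into a list of dict records."""
    entries = []
    for kind, key, body in ENTRY.findall(text):
        record = {"type": kind.lower(), "key": key}
        for name, value in FIELD.findall(body):
            record[name.lower()] = value.strip()
        entries.append(record)
    return entries


def select(text, **conditions):
    """Keep the entries whose fields equal all the given conditions."""
    # The file cannot answer queries, so the filtering happens here.
    return [
        entry
        for entry in parse_entries(text)
        if all(entry.get(name) == value for name, value in conditions.items())
    ]


print([entry["key"] for entry in select(SAMPLE, year="1996")])  # prints ['key1']
```

Both parsing and selection run inside the wrapper, which is the extra work a structured repository would otherwise do on the wrapper's behalf.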



Ling Liu
Thu Aug 15 17:49:43 MDT 1996