

3.2 Query Routing Using Source Capability and User Query Profiles

Suppose we annotate the example query Q in its user query profile (see Section 4.1) to record that the suppliers of interest include only book stores and publishers, and that the description attribute of books may correspond to the title, abstract, or subject of a book. Suppose also that we have captured the content and query capability of the data sources listed in Figure 1 in a rule-based representation language, as shown in Figure 3.
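The two annotations on Q can be pictured as a small data structure. The sketch below is a minimal illustration under our own assumptions, not the profile language of Section 4.1: the names `QueryProfile`, `expand_condition`, and `supplier_ok` are hypothetical, and rendering the rewritten condition as an SQL-like string is just one convenient choice.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class QueryProfile:
    """Hypothetical user query profile: per-parameter annotations for one query."""
    # Domain restriction: parameter -> values the user is interested in.
    allowed_values: dict[str, set[str]] = field(default_factory=dict)
    # Correspondence: query parameter -> source attributes it may map to.
    corresponds_to: dict[str, list[str]] = field(default_factory=dict)


def expand_condition(profile: QueryProfile, param: str, keyword: str) -> str:
    """Rewrite 'param is about keyword' as a disjunction over the annotated attributes."""
    targets = profile.corresponds_to.get(param, [param])
    return " OR ".join(f"{t} LIKE '%{keyword}%'" for t in targets)


def supplier_ok(profile: QueryProfile, supplier_type: str) -> bool:
    """True if a source's supplier type falls within the annotated interest."""
    allowed = profile.allowed_values.get("supplier")
    return allowed is None or supplier_type in allowed


profile_q = QueryProfile(
    allowed_values={"supplier": {"book store", "publisher"}},
    corresponds_to={"description": ["title", "abstract", "subject"]},
)

print(expand_condition(profile_q, "description", "cancer"))
# title LIKE '%cancer%' OR abstract LIKE '%cancer%' OR subject LIKE '%cancer%'
print(supplier_ok(profile_q, "book club"))  # False
```

The supplier annotation acts as a filter on sources, while the description annotation widens a single condition into several source attributes.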

Now we can immediately determine that Source 7 is not relevant to answering this query, because it has no information about books. We can also conclude that Sources 1, 3, and 5 cannot contribute to the answer of Q. The reasoning here is more subtle: we are interested only in books published in 1996 and supplied by either book stores or publishers, whereas Source 1 has review information only on books published between 1970 and 1980, Source 3 has only books supplied by book clubs, and Source 5 neither takes a particular year as an input argument nor provides year information on its books as output. We are left with Sources 2, 4, 6, and 8, and three independent subquery execution plans:
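The pruning argument can be sketched as a filter over source profiles. The profile attributes below are hypothetical and reconstructed only from the prose of this section, not from Figure 3; in particular, we assume Sources 2 and 4 accept year as an input, Source 8 returns year and supplier as outputs, and Source 7 gets the placeholder domain "other". The names `SourceProfile` and `prune_reason` are illustrative.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class SourceProfile:
    """Hypothetical content and capability profile of one data source."""
    name: str
    domain: str                                # what the source is about
    role: str = "catalog"                      # "catalog" returns books, "review" returns reviews
    supplier_types: set[str] | None = None     # None means no declared restriction
    year_range: tuple[int, int] | None = None  # access constraint on publication year
    inputs: set[str] = field(default_factory=set)
    outputs: set[str] = field(default_factory=set)


def prune_reason(src: SourceProfile, year: int, suppliers: set[str]) -> str | None:
    """Return why `src` cannot contribute to Q, or None if it survives."""
    if src.domain != "books":
        return "no book information"
    if src.year_range and not (src.year_range[0] <= year <= src.year_range[1]):
        return f"access constraint conflicts with year = {year}"
    if src.role == "catalog":
        if src.supplier_types is not None and not (src.supplier_types & suppliers):
            return "supplier type outside the query interest"
        # The condition year = 1996 can be enforced only if year is an input or output.
        if "year" not in src.inputs | src.outputs:
            return "year is neither an input nor an output"
    return None


SOURCES = [
    SourceProfile("Source 1", "books", role="review", year_range=(1970, 1980),
                  inputs={"title"}, outputs={"review"}),
    SourceProfile("Source 2", "books", supplier_types={"book store"},
                  inputs={"subject", "year"}, outputs={"title", "authors", "price"}),
    SourceProfile("Source 3", "books", supplier_types={"book club"},
                  inputs={"subject", "year"}, outputs={"title", "authors", "price"}),
    SourceProfile("Source 4", "books", supplier_types={"publisher"},
                  inputs={"subject", "year"}, outputs={"title", "authors", "price"}),
    SourceProfile("Source 5", "books", inputs={"subject"},
                  outputs={"title", "authors", "price"}),
    SourceProfile("Source 6", "books", role="review",
                  inputs={"title", "authors"}, outputs={"review"}),
    SourceProfile("Source 7", "other"),
    SourceProfile("Source 8", "books", supplier_types={"book store", "publisher"},
                  inputs={"subject"},
                  outputs={"title", "authors", "price", "year", "supplier"}),
]

survivors = [s.name for s in SOURCES
             if prune_reason(s, 1996, {"book store", "publisher"}) is None]
print(survivors)  # ['Source 2', 'Source 4', 'Source 6', 'Source 8']
```

Returning a reason string rather than a boolean keeps the three kinds of pruning (scope, argument constraints, access-constraint conflicts) distinguishable.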

Ask Source 2 for the title, authors, and price of books about cancer published in 1996. Assign the source name Barnes & Noble to the supplier field of qualifying books. For each book, obtain a review from Source 6.
Ask Source 4 for the title, authors, and price of books about cancer published in 1996. Assign the source name Morgan Kaufmann Publishers, Inc. to the supplier field of qualifying books. For each book, obtain a review from Source 6.
Ask Source 8 for the title, authors, price, year, and supplier of books about cancer. From the result tuples of type (title, authors, price, year, supplier), select only those books published in 1996. For each selected book, obtain a review from Source 6.

The query result is the union of the results returned by executing these three subqueries.
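The three plans and their union can be sketched with in-memory stand-ins for the source wrappers. Everything here is hypothetical: the functions `ask_source2`, `ask_source4`, `ask_source8`, and `ask_source6` only mimic the query capabilities described above, and the book rows and reviews are placeholder values, not data from the actual sources. A set union is used so that a book returned by more than one plan appears once.

```python
from __future__ import annotations

# Hypothetical placeholder contents standing in for the remote sources.
# Rows: (title, authors, price, subject, year[, supplier]).
S2_ROWS = [("Book A", "Author A", 30.0, "cancer", 1996)]
S4_ROWS = [("Book B", "Author B", 45.0, "cancer", 1996),
           ("Book C", "Author C", 50.0, "cancer", 1995)]
S8_ROWS = [("Book A", "Author A", 30.0, "cancer", 1996, "Barnes & Noble"),
           ("Book D", "Author D", 20.0, "cancer", 1994, "Store X")]
REVIEWS = {"Book A": "review of A", "Book B": "review of B"}


def ask_source2(subject: str, year: int):
    """Source 2 accepts subject and year; returns (title, authors, price)."""
    return [(t, a, p) for t, a, p, s, y in S2_ROWS if s == subject and y == year]


def ask_source4(subject: str, year: int):
    """Source 4 accepts subject and year; returns (title, authors, price)."""
    return [(t, a, p) for t, a, p, s, y in S4_ROWS if s == subject and y == year]


def ask_source8(subject: str):
    """Source 8 accepts only subject; returns (title, authors, price, year, supplier)."""
    return [(t, a, p, y, sup) for t, a, p, s, y, sup in S8_ROWS if s == subject]


def ask_source6(title: str | None = None, authors: str | None = None):
    """Source 6 needs title or authors; this toy lookup uses the title only."""
    if title is None and authors is None:
        raise ValueError("Source 6 requires title or authors")
    return REVIEWS.get(title)


def plan_1():
    return {(t, a, p, "Barnes & Noble", ask_source6(title=t))
            for t, a, p in ask_source2("cancer", 1996)}


def plan_2():
    return {(t, a, p, "Morgan Kaufmann Publishers, Inc.", ask_source6(title=t))
            for t, a, p in ask_source4("cancer", 1996)}


def plan_3():
    # Source 8 cannot filter on year, so the selection year = 1996 is applied locally.
    return {(t, a, p, sup, ask_source6(title=t))
            for t, a, p, y, sup in ask_source8("cancer") if y == 1996}


answer = plan_1() | plan_2() | plan_3()
for row in sorted(answer):
    print(row)
```

With these placeholder rows, "Book A" is returned by both the first and third plans but appears only once in the union.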

Note that Sources 2, 4, and 8 qualify for answering Q because (1) their output capabilities meet the output requirements of Q, and (2) their output capabilities are sufficient to cover the mandatory input requirements of Source 6 (i.e., either the title or the authors of the book). Had we added ISBN to the list of outputs of Q, we could not use Source 2 or Source 8 to obtain the answers, because ISBN is not among their output capabilities. Had the input constraints of Source 6 required more specific information about the book (e.g., publisher) to return a review, we could not obtain reviews for the books returned by Source 2, because the output parameter list of Source 2 does not include publisher. Consequently, unless the output parameter review of Q is optional (i.e., may be absent), Source 2 could not contribute to Q.
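The qualification test and its two counterfactuals can be sketched as set-containment checks. This is a minimal sketch under stated assumptions: Q's outputs are taken to be title, authors, price, supplier, and review; a supplier value assigned as a constant (as for Sources 2 and 4) counts as available; and Source 6's mandatory inputs are modeled as a list of alternative attribute sets. The names `can_feed` and `qualifies` are hypothetical.

```python
from __future__ import annotations

# Assumed outputs of Q; supplier may come from a constant, review from Source 6.
Q_OUTPUTS = {"title", "authors", "price", "supplier", "review"}
# Source 6 needs title OR authors (each alternative is a set of mandatory inputs).
SOURCE6_INPUTS = [{"title"}, {"authors"}]

S2_OUTPUTS = {"title", "authors", "price"}  # supplier is assigned as a constant
S8_OUTPUTS = {"title", "authors", "price", "year", "supplier"}


def can_feed(available: set[str], alternatives: list[set[str]]) -> bool:
    """True if `available` covers at least one alternative set of mandatory inputs."""
    return any(alt <= available for alt in alternatives)


def qualifies(outputs, constants, query_outputs, optional, review_inputs) -> bool:
    """Hypothetical test of whether a source can contribute to the query."""
    available = set(outputs) | set(constants)
    if can_feed(available, review_inputs):
        available.add("review")  # reviews obtainable by passing outputs to Source 6
    missing = set(query_outputs) - available - set(optional)
    return not missing


print(qualifies(S2_OUTPUTS, {"supplier"}, Q_OUTPUTS, set(), SOURCE6_INPUTS))  # True
print(qualifies(S8_OUTPUTS, set(), Q_OUTPUTS, set(), SOURCE6_INPUTS))         # True

# Counterfactual 1: Q also asks for ISBN, which neither source outputs.
print(qualifies(S2_OUTPUTS, {"supplier"}, Q_OUTPUTS | {"isbn"}, set(), SOURCE6_INPUTS))  # False

# Counterfactual 2: Source 6 additionally requires publisher.
STRICT = [{"title", "publisher"}, {"authors", "publisher"}]
print(qualifies(S2_OUTPUTS, {"supplier"}, Q_OUTPUTS, set(), STRICT))       # False
print(qualifies(S2_OUTPUTS, {"supplier"}, Q_OUTPUTS, {"review"}, STRICT))  # True
```

Treating review as optional turns the second counterfactual from a disqualification into a partial answer without reviews.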

From this example, we observe that it is beneficial to create user query profiles that allow users, including non-native users who nonetheless know what they want, to annotate and record the query-specific semantics of the query parameters. For instance, by capturing and utilizing the annotation on the supplier parameter of Q, i.e., that the suppliers of the requested books are either book stores or publishers, we can immediately decide that Source 3 and Source 7 are not relevant to the answer of Q. It is also important to capture the content and query capability information of the data sources, so that we can further prune the data sources that are incapable of contributing to the query answer, whether because of a restriction on the scope of query interest (e.g., Source 7 and Source 3 in our example), constraints on the list of mandatory input or output arguments of the sources (e.g., Source 5 has year as neither an input nor an output argument), or a conflict between the query interest ("year = 1996") and the access constraints associated with the sources (e.g., Source 1 has reviews only for books published between 1970 and 1980).

In what follows, we first introduce the metadata description model that is used for semantic specification of user query profiles and source content and capability profiles. Then we discuss the methods that use these metadata profiles to identify and locate the data sources that contain semantically correct answers. We call this functionality query routing in DIOM, a mediator-wrapper based system [6] for querying heterogeneous information sources.

Query routing is the first optimization step that constrains the search space for a query in open environments, in preparation for query execution planning. One of the main goals of query routing is to identify relevant data sources for a query as early as possible, thus reducing the overhead of contacting the data sources that do not contribute to the answer of the query.


Ling Liu
Tue Jun 17 15:26:27 PDT 1997