Showing the Context of Nodes in the World-Wide Web

Sougata Mukherjea, James D. Foley

Graphics, Visualization & Usability Center
College of Computing
Georgia Institute of Technology
E-mail: sougata@cc.gatech.edu, foley@cc.gatech.edu

ABSTRACT:: This paper talks about a method to show the context of nodes in the World-Wide Web. World-Wide Web presents a lot of information to the user. Consequently, it suffers from the famous lost in hyperspace problem. One way to solve the problem is to show the user where they are in the context of the overall information space. Since the overall information space is large, we need to show the node's context with respect to only the important nodes. In this paper we discuss our method of showing the context and show some examples of our implementation.
KEYWORDS:: Hypermedia, Visualization, Structural analysis, World-Wide Web.

INTRODUCTION

One of the major problems with current hypermedia systems is being lost in hyperspace. For example in Mosaic[1], the popular interface to the World-Wide Web, the most widely used hypermedia system today, the process of jumping from one location to another can easily confuse the user. One of the main reason for this is that the user does not know the context of the node with respect to the overall information space. Similarly when the user uses the Open URL command to jump to a particular node, some information about the node's context would be very useful.

One common strategy to solve this problem is to use an overview diagram showing the overall graph structure. However, the problem with these are that for any large information space like the WWW, these diagrams are too confusing for the user. Therefore, instead of showing the whole space, we need to show how the node can be reached from important nodes (known as landmarks in the hypermedia literature). This is similar to the common geographical navigation strategy of finding where we are in the context of important landmarks.

This paper discusses an useful but simple method of showing the context of nodes of the World-Wide Web with respect to landmark nodes. We have implemented our method in the Navigational View Builder [3], a tool for forming effective visualizations of hypermedia systems. Examples are shown of how our method found out the context of some of the WWW pages about the research activities at the Graphics Visualization & Usability Center (GVU) at Georgia Tech (URL: http://www.gatech.edu/gvu/gvutop.html). Note that the node and link structure of the WWW were extracted by parsing the html documents using the strategy described in [4].

DISCOVERING LANDMARK NODES

Finding nodes that are good landmarks is not a trivial task. Valdez and Chignell [5] "anticipated that landmarks would tend to be connected to more objects than nonlandmarks, in the same way that major hubs serve as landmarks in airline systems." While running some experiments they observed a high correlation between the recall of words in a hypertext and their second-order connectedness. The second-order connectedness is defined as the number of nodes that can be reached by a node when following at most two links. As observed in [2], since hypertexts are directed graphs, it is possible to extend the idea and postulate that nodes that have high back second-order connectedness are also good landmarks. The back second-order connectedness of a node is the number of nodes that can reach the specified node in two steps. Similarly, the number of nodes that can be reached from the node by following only one link (the outdegree of the node) and the number of nodes that can reach the node following only one link (the indegree) should be also used in calculating the importance of the node.

Thus, the importance of a node can be calculated to be the weighted sum of the second-order connectedness (SOC), the back second-order connectedness (BSOC), the indegree (I) and the outdegree (O). After the importance of the nodes are calculated, the landmark can be defined to be those nodes whose importance value is greater than a threshold. We used a threshold value of ten percent of the total number of nodes in the information space. Thus, the procedure for discovering landmarks can be summarized as follows:

Calculate
importance = (I + O) * wt1 + (SOC + BSOC) * wt2
where wt1 + wt2 = 1.0 . We found the best result using wt1 = 0.4 and wt2 = 0.6 .
Iff importance > 10% of total number of nodes, the given node is a landmark.

SHOWING NODE CONTEXT

Once the importance of the nodes are calculated, the node context is shown by the following procedure:

For each node n that has a link to the node of interest i, if n is a landmark node and importance of n is greater than i, we make n the node of interest and recursively call our procedure for n. Figure 1 shows the context of the WWW page of the first author. Two landmark pages People.Students.html and Multimedia.html had links to this page. Thus they became nodes of interest. The procedure was recursively called for these pages. The recursion stopped when we reached gvutop.html since its importance is greater than all other nodes.

FIGURE 1: Context of Sougata Mukherjea. Indicates that he is a student in GVU and part of the Multimedia group.
For some node, say i, it may happen that none of the nodes that have links to i are landmarks. For these nodes we find n, the node which has the maximum importance among the nodes that have links to i and moreover, the importance of n is greater than the importance of i. n becomes the new node of interest and the procedure is recursively called for n. For example, for the node visdebug.gif, none of the nodes that have links to it were landmarks. Therefore, we selected the node visdebug.html, the most important node that had link to it and called the procedure recursively for that node. The context for visdebug.gif was found to be following the path from gvutop.html to visdebug.gif via research.html, SoftViz.html and visdebug.html.
It may happen that for a node i none of the nodes that have links to it are landmarks and none have importance greater than i. For these nodes, we show the context by finding the shortest distance to this node from the most important node. Thus, Figure 2 shows the context of section3.2.html. No landmark node links to it and it's importance is greater than all nodes that link to it. Therefore, we show the shortest path from the most important node, gvutop.html to this node.

FIGURE 2: Context of section3.2.html. Indicates that it is a page in the Multimedia research area.

CONCLUSION

We have discussed a useful procedure to show the context of nodes in the WWW. Our procedure gives a good insight about the position of the node with respect to the overall information space. For example, looking at Figure 1, one gets a good idea of the position of the first author in the GVU Center. It shows that he is a student and is part of the Multimedia group. Another advantage of the procedure is that it is computationally very cheap. Moreover, this method is not restricted to WWW but can be applied to any hypermedia system.

However, a major limitation of our system is that it uses just structural analysis for determining the importance of the nodes. This leads to unexpected results sometimes. For example, some new PhD students who have not yet decided on their research area, work in many areas. Since they have links to all these areas, their importance is high by our calculation. However, this does not seem correct. Thus, some contextual analysis is also needed. An useful way to do this is to make the importance of the node dependent on the number of accesses to the node. This can can be easily done by incorporating a web access log analysis tool [4] into our system. Finding other contextual methods of determining the importance of a node is an open research issue.

ACKNOWLEDGEMENT

This work is supported by grants from Digital Equipment Corporation, Bell South Enterprises, Inc. and Emory University System of Health Care, Atlanta, Georgia as part of the Hypermedia Interface for Multimedia Databases project.

References

1: M. Andreessen. NCSA Mosaic Technical Summary. Technical report, National Center for Supercomputing Applications, 1993.
2: R. Botafogo, E. Rivlin, and B. Shneiderman. Structural Analysis of Hypertexts: Identifying Hierarchies and Useful Metrics. ACM Transactions on Office Information Systems, 10(2):142-180, 1992.
3: S. Mukherjea, J. Foley, and S. Hudson. Visualizing Complex Hypermedia Networks through Multiple Hierarchical Views. To appear in Proceedings of ACM SIGCHI '95, May 1995.
4: J. Pitkow and K. Bharat. WEBVIZ: A Tool for World-Wide Web Access Log Visualization. In Proceedings of the First International World-Wide Web Conference, Geneva, Switzerland, May 1994.
5: F. Valdez and M. Chignell. Browsing Models for Hypermedia Databases. In Proceedings of the Human Factors Society, 32nd Annual Meeting, Santa Monica, Ca, 1988.