Showing the Context of Nodes in the World-Wide Web
Sougata Mukherjea, James D. Foley
- Graphics, Visualization & Usability Center
- College of Computing
- Georgia Institute of Technology
- E-mail: sougata@cc.gatech.edu, foley@cc.gatech.edu
- ABSTRACT:
- This paper describes a method for showing the context of nodes in the World-Wide Web.
The World-Wide Web presents a large amount of information to the user. Consequently, it suffers from
the well-known "lost in hyperspace" problem. One way to solve the problem is to show
the user where they are in the context of the overall information space. Since the overall
information space is large, we need to show the node's context with respect to only the
important nodes. In this paper we discuss our method of showing the context and show some
examples of our implementation.
- KEYWORDS:
- Hypermedia, Visualization, Structural analysis, World-Wide Web.
INTRODUCTION
One of the major problems with current hypermedia systems is being
lost in hyperspace. For example, in Mosaic [1],
the popular interface to the World-Wide Web, the most widely used hypermedia system today,
the process of jumping from one location to another can easily confuse the user. One of
the main reasons for this is that the user does not know the context of the node with
respect to the overall information space. Similarly when the user uses the
Open URL command to jump to a particular node, some information about the node's
context would be very useful.
One common strategy to solve this problem is to use an overview diagram showing the
overall graph structure. However, the problem with such diagrams is that for any large
information space like the WWW, they are too confusing for the user. Therefore,
instead of showing the whole space, we need to show how the node can be reached from
important nodes (known as landmarks in the hypermedia literature). This is
similar to the common geographical navigation strategy of finding where we are in the
context of important landmarks.
This paper discusses a simple but useful method of showing the context of nodes of the
World-Wide Web with respect to landmark nodes. We have implemented our method in the
Navigational View Builder [3], a tool for forming effective
visualizations of hypermedia systems. We show examples of how our method found the
context of some of the WWW pages about the research activities at the Graphics,
Visualization & Usability Center (GVU) at Georgia Tech
(URL: http://www.gatech.edu/gvu/gvutop.html). Note that the node and link structure of
the WWW was extracted by parsing the HTML documents using the strategy described in
[4].
DISCOVERING LANDMARK NODES
Finding nodes that are good landmarks is not a trivial task. Valdez and Chignell
[5] "anticipated that landmarks would tend to be connected
to more objects than nonlandmarks, in the same way that major hubs serve as landmarks in
airline systems." While running some experiments they observed a high correlation between
the recall of words in a hypertext and their second-order connectedness. The
second-order connectedness is defined as the number of nodes that can be reached from a node
by following at most two links. As observed in [2], since
hypertexts are directed graphs, it is possible to extend the idea and postulate that nodes
that have high back second-order connectedness are also good landmarks. The back
second-order connectedness of a node is the number of nodes that can reach the specified
node by following at most two links. Similarly, the number of nodes that can be reached from the node by
following only one link (the outdegree of the node) and the number of nodes that can reach
the node by following only one link (the indegree) should also be used in calculating the
importance of the node.
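As an illustration of these definitions, the following minimal Python sketch counts second-order and back second-order connectedness on a small directed graph. The graph, the node names, and the helper names `within_two_links` and `reverse_graph` are hypothetical, and the sketch assumes that a node is not counted as reachable from itself, which the definitions above do not state explicitly.

```python
from collections import defaultdict

def within_two_links(adj, start):
    """Nodes reachable from `start` by following one or two links.

    Assumption: `start` itself is excluded from the count.
    """
    first = set(adj.get(start, ()))
    second = set()
    for mid in first:
        second.update(adj.get(mid, ()))
    return (first | second) - {start}

def reverse_graph(adj):
    """Return the graph with every link reversed (target -> sources)."""
    rev = defaultdict(set)
    for src, targets in adj.items():
        for dst in targets:
            rev[dst].add(src)
    return rev

# Hypothetical toy link structure: page -> set of pages it links to.
links = {
    "top": {"research", "people"},
    "research": {"multimedia"},
    "people": {"student"},
    "multimedia": {"student"},
    "student": set(),
}

# Second-order connectedness: forward reachability within two links.
soc = {n: len(within_two_links(links, n)) for n in links}

# Back second-order connectedness: the same count on the reversed graph.
rev = reverse_graph(links)
bsoc = {n: len(within_two_links(rev, n)) for n in links}
```

In this toy graph the root page has high forward connectedness and no backward connectedness, while the leaf page is the reverse, which is why both measures are considered.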
Thus, the importance of a node can be calculated to be the weighted sum of the
second-order connectedness (SOC), the back second-order connectedness (BSOC), the indegree
(I) and the outdegree (O). After the importance of each node is calculated, the landmarks
can be defined to be those nodes whose importance value is greater than a threshold. We
used a threshold value of ten percent of the total number of nodes in the information
space. Thus, the procedure for discovering landmarks can be summarized as follows:
- Calculate
importance = (I + O) * wt1 + (SOC + BSOC) * wt2
where wt1 + wt2 = 1.0 . We found the best result using wt1 = 0.4
and wt2 = 0.6 .
- Iff importance > 10% of total number of nodes, the given node is a landmark.
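The two steps above can be sketched in Python as follows. This is only an illustrative sketch, not the Navigational View Builder code: the function names, the dictionary-of-sets graph representation, and the toy graph are assumptions, as is the choice not to count a node as reachable from itself. The weights 0.4 and 0.6 and the ten percent threshold are the values given in the text.

```python
def neighbours_within_two(adj, start):
    """Nodes reachable from `start` in one or two links (start excluded)."""
    first = set(adj.get(start, ()))
    second = set()
    for mid in first:
        second |= set(adj.get(mid, ()))
    return (first | second) - {start}

def importance_scores(links, wt1=0.4, wt2=0.6):
    """importance = (I + O) * wt1 + (SOC + BSOC) * wt2 for every node."""
    nodes = set(links) | {d for targets in links.values() for d in targets}
    # Reversed graph: node -> set of nodes that link to it.
    back = {n: set() for n in nodes}
    for src, targets in links.items():
        for dst in targets:
            back[dst].add(src)
    scores = {}
    for n in nodes:
        outdeg = len(links.get(n, ()))
        indeg = len(back[n])
        soc = len(neighbours_within_two(links, n))
        bsoc = len(neighbours_within_two(back, n))
        scores[n] = (indeg + outdeg) * wt1 + (soc + bsoc) * wt2
    return scores

def find_landmarks(links, fraction=0.10):
    """Nodes whose importance exceeds `fraction` of the total node count."""
    scores = importance_scores(links)
    threshold = fraction * len(scores)
    return {n for n, s in scores.items() if s > threshold}

# Hypothetical toy link structure: page -> set of pages it links to.
links = {
    "top": {"research", "people"},
    "research": {"multimedia"},
    "people": {"student"},
    "multimedia": {"student"},
    "student": set(),
}
scores = importance_scores(links)
landmarks = find_landmarks(links)
```

On a graph this small the ten percent threshold is only 0.5, so every page qualifies as a landmark; the threshold becomes selective only as the number of nodes grows. The `fraction` parameter is exposed here purely so that the effect of the threshold can be explored.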
SHOWING NODE CONTEXT
Once the importance of each node is calculated, the node context is shown by the
following procedure:
- For each node n that has a link to the node of interest i, if
n is a landmark node and the importance of n is greater than that of i, we
make n the node of interest and recursively call our procedure for n.
Figure 1 shows the context of the WWW page of the first author. Two landmark pages
People.Students.html and Multimedia.html had links to this page. Thus
they became nodes of interest. The procedure was recursively called for these pages. The
recursion stopped when we reached gvutop.html since its importance is greater
than that of all other nodes.
FIGURE 1:
Context of Sougata Mukherjea. Indicates that he is a student in GVU and part of
the Multimedia group.
- For some node, say i, it may happen that none of the nodes that have links
to i are landmarks. For these nodes we find n, the node which has the
maximum importance among the nodes that have links to i, provided that the
importance of n is greater than the importance of i. n then
becomes the new node of interest and the procedure is recursively called for n.
For example, for the node visdebug.gif, none of the nodes that have links to it
were landmarks. Therefore, we selected the node visdebug.html, the most important
node that had a link to it, and called the procedure recursively for that node. The context
for visdebug.gif was found to be the path from gvutop.html to
visdebug.gif via research.html, SoftViz.html and
visdebug.html.
- It may happen that for a node i none of the nodes that have links to it
are landmarks and none have importance greater than i. For these nodes, we show
the context by finding the shortest path to this node from the most important node.
Thus, Figure 2 shows the context of section3.2.html. No landmark node links
to it and its importance is greater than that of all nodes that link to it. Therefore, we show
the shortest path from the most important node, gvutop.html, to this node.
FIGURE 2:
Context of section3.2.html. Indicates that it is a page in the Multimedia research area.
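The three cases of the procedure can be sketched in Python as shown below. This is a hedged illustration rather than the actual implementation: the function names, the representation of the context as a set of (source, target) links, and the toy graph with its importance values are all hypothetical. The sketch also makes one simplifying assumption: when no landmark with greater importance links to the node, it falls back to the most important linking node with greater importance, and only then to the shortest path.

```python
from collections import deque

def shortest_path(links, source, target):
    """Breadth-first search; returns the node list or None if unreachable."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in links.get(node, ()):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None

def context_edges(links, importance, landmarks, node, edges=None):
    """Collect the links that form the context of `node`."""
    if edges is None:
        edges = set()
    preds = [p for p, targets in links.items() if node in targets]
    higher = [p for p in preds if importance[p] > importance[node]]
    # Case 1: every more important landmark that links to the node.
    chosen = [p for p in higher if p in landmarks]
    # Case 2: otherwise the single most important linking node.
    if not chosen and higher:
        chosen = [max(higher, key=lambda p: importance[p])]
    if chosen:
        for p in chosen:
            if (p, node) not in edges:
                edges.add((p, node))
                context_edges(links, importance, landmarks, p, edges)
        return edges
    # Case 3: shortest path from the most important node overall.
    top = max(importance, key=lambda n: importance[n])
    path = shortest_path(links, top, node) if top != node else None
    if path:
        edges.update(zip(path, path[1:]))
    return edges

# Hypothetical link structure and importance values, for illustration only.
links = {
    "gvutop": {"people", "multimedia", "softviz"},
    "people": {"author"},
    "multimedia": {"author", "section"},
    "softviz": {"visdebug"},
    "notes": {"section"},
}
importance = {"gvutop": 10, "people": 6, "multimedia": 7, "softviz": 3,
              "author": 2, "section": 8, "visdebug": 1, "notes": 1}
landmarks = {"gvutop", "people", "multimedia"}

author_ctx = context_edges(links, importance, landmarks, "author")
visdebug_ctx = context_edges(links, importance, landmarks, "visdebug")
section_ctx = context_edges(links, importance, landmarks, "section")
```

Because each recursive call moves to a node of strictly greater importance, the recursion in this sketch always terminates, at the latest when it reaches the most important node.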
CONCLUSION
We have discussed a useful procedure to show the context of nodes in the WWW.
Our procedure gives good insight into the position of the node with respect
to the overall information space. For example, looking at Figure 1, one gets a
good idea of the position of the first author in the GVU Center. It shows that
he is a student and is part of the Multimedia group. Another advantage of the
procedure is that it is computationally very cheap. Moreover, this method is not
restricted to the WWW but can be applied to any hypermedia system.
However, a major limitation of our system is that it uses just structural
analysis for determining the importance of the nodes. This sometimes leads to unexpected
results. For example, some new PhD students who have not yet decided
on their research area work in many areas. Since they have links to all these
areas, their importance is high by our calculation. However, this does not seem
correct. Thus, some contextual analysis is also needed. A useful way to do this
is to make the importance of the node dependent on the number of accesses to the
node. This can be easily done by incorporating a web access log analysis
tool [4] into our system. Finding other
contextual methods of determining the importance of a node is an open research issue.
ACKNOWLEDGEMENT
This work is supported by grants from Digital Equipment Corporation, Bell South
Enterprises, Inc. and Emory University System of Health Care, Atlanta, Georgia
as part of the Hypermedia Interface for Multimedia Databases project.
References
- [1] M. Andreessen. NCSA Mosaic Technical Summary. Technical report, National Center for Supercomputing Applications, 1993.
- [2] R. Botafogo, E. Rivlin, and B. Shneiderman. Structural Analysis of Hypertexts: Identifying Hierarchies and Useful Metrics. ACM Transactions on Office Information Systems, 10(2):142-180, 1992.
- [3] S. Mukherjea, J. Foley, and S. Hudson. Visualizing Complex Hypermedia Networks through Multiple Hierarchical Views. To appear in Proceedings of ACM SIGCHI '95, May 1995.
- [4] J. Pitkow and K. Bharat. WEBVIZ: A Tool for World-Wide Web Access Log Visualization. In Proceedings of the First International World-Wide Web Conference, Geneva, Switzerland, May 1994.
- [5] F. Valdez and M. Chignell. Browsing Models for Hypermedia Databases. In Proceedings of the Human Factors Society, 32nd Annual Meeting, Santa Monica, Ca, 1988.