Visualizing Complex Hypermedia Networks through Multiple Hierarchical Views
Sougata Mukherjea, James D. Foley, Scott Hudson
Graphics, Visualization & Usability Center
College of Computing
Georgia Institute of Technology
E-mail: sougata@cc.gatech.edu, foley@cc.gatech.edu, hudson@cc.gatech.edu
ABSTRACT:
Our work concerns visualizing the information space of hypermedia systems using
multiple hierarchical views. Although overview diagrams are useful for helping the user to
navigate in a hypermedia system, for any real-world system they become too complicated
and large to be really useful. This is because these diagrams represent complex network
structures which are very difficult to visualize and comprehend. On the other hand,
effective visualizations of hierarchies have been developed. Our strategy is to provide
the user with different hierarchies, each giving a different perspective to the underlying
information space to help the user better comprehend the information. We propose an
algorithm based on content and structural analysis to form hierarchies from hypermedia
networks. The algorithm is automatic but can be guided by the user. The multiple
hierarchies can be visualized in various ways. We give examples of the implementation of
the algorithm on two hypermedia systems.
KEYWORDS:
Hypermedia, Overview Diagrams, Information Visualization, Hierarchization.
INTRODUCTION
Overview diagrams are one of the best tools for orientation and navigation in hypermedia
documents [17]. By presenting a map of the underlying
information space, they allow the users to see where they are, what other information is
available and how to access the other information. However, for any real-world hypermedia
system with many nodes and links, the overview diagrams represent large complex network
structures. They are generally shown as 2D or 3D graphs and comprehending such large
complex graphs is extremely difficult. The layout of graphs is also a very difficult
problem [1]. Other attempts to visualize networks, such as
Semnet [3], have not been very successful.
In [13], Parunak notes that: "The insight for hypermedia is
that a hyperbase structured as a set of distinguishable hierarchies will offer
navigational and other cognitive benefits that an equally complex system of
undifferentiated links does not, even if the union of all the hierarchies is not itself
hierarchical." Neuwirth et al. [12] also observed that the
ability to view knowledge from different perspectives is important. Thus, if different
hierarchies can be formed, each giving a different perspective on the underlying
information, the user will be able to comprehend the information better. It should also be
noted that, unlike networks, hierarchies can be visualized very effectively. Examples are
Treemaps [7] and Cone Trees [15].
This paper proposes an algorithm for forming hierarchies from hypermedia graphs. It uses
both structural and content analysis to identify the hierarchies. The structural analysis
looks at the structure of the graph while the content analysis looks at the contents of
the nodes. (Note that the content analysis assumes a database-oriented hypermedia system
where the nodes are described with attributes). Although our algorithm is automatic,
forming the "best" possible hierarchy representing the graph, the user can guide the
process so that hierarchies giving different perspectives to the underlying information
can be formed. These hierarchies can be visualized in different ways.
Section 2 presents our hierarchization process. Section 3 shows the implementation of
the algorithm in the Navigational View Builder, a system we are building for
visualizing the information space of hypermedia systems [10], [11]. This section
discusses the application of the algorithm to a demo automobile database and a section of
the World-Wide Web. Section 4 discusses how the hierarchies can be transformed into
other forms of data organization. Section 5 discusses related work and Section 6
concludes.
THE HIERARCHIZATION PROCESS
New Data Structure
For our hierarchization process we use a data structure which we call the pre-tree.
A pre-tree is intermediate between a graph and a tree. It has a node called the
root which does not have any parent node. However, unlike a real tree, its
descendants need not be trees themselves; they may be arbitrary graphs. These
descendants thus form a list of graphs and are called branches. There is
one restriction: nodes from different branches cannot have links between them. An
example pre-tree is shown in Figure 1. Like multi-trees [4], the pre-tree is a data
structure that is not as complex as a graph but not as simple as a tree. Although the
term "pre-tree" has not been used before, this data structure has a long history in
top-down clustering techniques [5]. Top-down clustering would often be halted when new
divisions were not auspicious, leaving a final structure which is essentially a pre-tree.
FIGURE 1:
An example pre-tree. It has a root node which does not have any parent. The
descendants of the root node are graphs. However, none of these graphs have any links
between them. Our hierarchization algorithm tries to identify the best pre-tree to
represent the given graph. The final tree is formed by calling the algorithm recursively
for the branches.
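The pre-tree definition can be written down as a small data structure. The following is a minimal sketch in Python, not the representation used in the Navigational View Builder; the names Graph, PreTree and is_valid are assumptions introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Link = Tuple[str, str]  # (source node, target node)


@dataclass
class Graph:
    """An arbitrary directed graph: node names plus links between them."""
    nodes: Set[str]
    links: Set[Link] = field(default_factory=set)


@dataclass
class PreTree:
    """A parentless root plus a list of branches, each an arbitrary graph."""
    root: str
    branches: List[Graph]

    def is_valid(self) -> bool:
        """Check the single restriction: no link joins different branches."""
        owner = {}
        for index, branch in enumerate(self.branches):
            for node in branch.nodes:
                if node in owner or node == self.root:
                    return False  # a node may sit in only one branch
                owner[node] = index
        for index, branch in enumerate(self.branches):
            for src, dst in branch.links:
                if owner.get(src) != index or owner.get(dst) != index:
                    return False  # this link leaves its own branch
        return True
```

A branch may contain cycles and still be valid; only links that cross from one branch into another are ruled out.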
Hierarchization Algorithm
The algorithm tries to identify a suitable pre-tree from a given graph. Thus a root node
is identified and the other nodes are partitioned into branches. This root node forms the
root of the final hierarchy. The algorithm is recursively called for each of the branches
and the trees formed by these recursive calls become children of the root of the final
hierarchy. The recursion stops if a branch has very few nodes or the required depth of the
final tree has been reached. It may also happen that for certain branches, no suitable
pre-trees can be formed. In these cases, the nodes of the branches become children of the
parent of the branch. (This case generally occurs for branches with very few nodes).
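The recursion described above can be sketched as follows. This is an illustrative outline under stated assumptions, not our actual implementation: the function choose_pretree is a hypothetical caller-supplied callback that returns the selected pre-tree as a (root label, list of branch node sets) pair, or None when no suitable pre-tree exists, and the thresholds max_depth and min_nodes are placeholder values. Trees are represented as (label, children) pairs.

```python
def hierarchize(nodes, links, choose_pretree, depth=0, max_depth=3, min_nodes=3):
    """Return the list of subtrees that should hang under the caller's parent.

    nodes: set of node names; links: set of (source, target) pairs.
    Each returned subtree is a (label, children) pair.
    """
    # Fallback: the nodes themselves become children of the parent.
    flat = [(name, []) for name in sorted(nodes)]

    # Stop when the branch is very small or the required depth is reached.
    if len(nodes) < min_nodes or depth >= max_depth:
        return flat

    pretree = choose_pretree(nodes, links)
    if pretree is None:  # no suitable pre-tree for this branch
        return flat

    root, branches = pretree
    children = []
    for branch in branches:
        # Keep only the links that stay inside this branch.
        inner = {(a, b) for (a, b) in links if a in branch and b in branch}
        children += hierarchize(branch, inner, choose_pretree,
                                depth + 1, max_depth, min_nodes)
    return [(root, children)]
```

Because the depth bound is checked before every recursive call, the sketch terminates even if the callback returns branches that are not smaller than the input.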
For identifying potential pre-trees both content and structural analysis are used.
- Content analysis:
For content analysis, for each attribute, the nodes of the graph are partitioned into
branches based on the attribute values by Content-based Clustering. The
clustering algorithm is explained in [11]. If too many or
too few branches are formed, the attribute is not suitable for forming a pre-tree.
Otherwise a new pre-tree is formed with these branches. The root of the pre-tree is a
cluster representing all the nodes of the graph.
- Structural analysis:
A pre-tree is formed for each node in the graph that can reach all other nodes. Each such
node is designated as the root of a pre-tree. The branches are the branches of the
spanning tree formed by doing a breadth-first search from the designated root node. (A
detailed analysis is omitted for brevity. The algorithm is explained with
examples in the next section.)
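The structural analysis can be illustrated with a short Python sketch. It assumes directed links whose endpoints all belong to the given node set, and it groups every non-root node with the child of the root through which the breadth-first search first reached it; the function names are ours and are introduced only for this example.

```python
from collections import deque


def bfs(root, successors):
    """Breadth-first search; returns ({node: parent}, {node: distance})."""
    parent, dist = {root: None}, {root: 0}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for nxt in sorted(successors.get(current, ())):
            if nxt not in dist:
                parent[nxt], dist[nxt] = current, dist[current] + 1
                queue.append(nxt)
    return parent, dist


def structural_pretrees(nodes, links):
    """Return (root, branches) for every node that can reach all others."""
    successors = {}
    for src, dst in links:
        successors.setdefault(src, set()).add(dst)

    candidates = []
    for root in sorted(nodes):
        parent, dist = bfs(root, successors)
        if len(dist) != len(nodes):
            continue  # this node cannot reach every other node
        # Visit nodes by increasing depth so a parent is labeled before its children.
        top_of = {}
        for node in sorted(dist, key=lambda n: (dist[n], n)):
            if node == root:
                continue
            top_of[node] = node if parent[node] == root else top_of[parent[node]]
        branches = {}
        for node, top in top_of.items():
            branches.setdefault(top, set()).add(node)
        candidates.append((root, list(branches.values())))
    return candidates
```

Links of the original graph that join nodes in different branches are not represented in the result; they are the links counted as lost when the candidate pre-trees are ranked.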
Both content and structural analysis can identify several potential pre-trees. A
metric is used to rank these pre-trees. The metric consists of the following
submetrics:
- Information lost in the formation of the pre-tree: When the nodes are partitioned
for forming the branches, all links joining nodes in different branches are removed. Thus
valuable information is lost and a submetric calculates the ratio of the number of links
remaining in the branches to the total number of links in the original graph to rank the
pre-trees in order of the least amount of information lost.
- "Treeness" of the branches: Since our overall objective is to form trees, it is
advantageous if the branches of the pre-tree are already close to trees. If all the
branches only consisted of trees, there would be a total of n - c links where
n is the total number of nodes in the branches and c is the number of
connected components. Thus a submetric which calculates the ratio (n-c)/l
where l is the total number of links is an indication of the "treeness" of the branches.
- "Goodness" of the root: For a structural pre-tree the goodness of the root is
determined by the sum of the distances of the shortest path from the root to all other
nodes. A "good" root will reach all other nodes by following only a few links so that
the resulting tree is not very deep. A deep tree is not desirable since it will force the
user to follow a long and tedious path to reach some information. For content analysis the
goodness of the root is determined by the relevance of the attribute. (For example, for an
automobile database, the manufacturer of the cars is a more relevant attribute than the
number of doors of the cars).
Each submetric returns a number between 0 and 1. The overall metric is calculated by a
weighted sum of the submetrics where the weight is determined by the relative importance
of the submetrics.
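A hedged sketch of the ranking metric follows. Several details are assumptions made only for this example: l in the treeness ratio is taken to be the number of links remaining inside the branches, treeness is defined as 1 when no links remain, the goodness of the root is passed in as a number already scaled to the range 0 to 1, and the weighted sum is divided by the sum of the weights so that the overall score also stays between 0 and 1.

```python
def count_components(nodes, links):
    """Number of connected components, treating links as undirected."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)
    return len({find(n) for n in nodes})


def score_pretree(branches, links, root_goodness, weights=(1.0, 1.0, 1.0)):
    """Weighted combination of the three submetrics for one candidate pre-tree.

    branches: list of node sets; links: all links of the original graph.
    weights must not all be zero.
    """
    owner = {n: i for i, branch in enumerate(branches) for n in branch}
    kept = [(a, b) for a, b in links
            if a in owner and b in owner and owner[a] == owner[b]]

    # Submetric 1: fraction of the original links that survive.
    retained = len(kept) / len(links) if links else 1.0

    # Submetric 2: (n - c) / l, which is 1 when every branch is a tree.
    n = len(owner)
    c = count_components(owner, kept)
    treeness = (n - c) / len(kept) if kept else 1.0

    subs = (retained, treeness, root_goodness)
    return sum(w * s for w, s in zip(weights, subs)) / sum(weights)
```

Setting a weight to zero removes the corresponding submetric from the ranking, which corresponds to the user declaring that submetric unimportant.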
The Role of the User
By default, the entire process is automatic and forms the "best" hierarchical
form for the original graph. However, the user can guide the process both during the
translation of the graph to a tree and during the visualization of the tree.
- Translation phase:
- The users can control the various variables that are used in the translation
process. For example, they can control the variable which specifies the maximum possible
depth of the tree (the recursion stops when this depth is reached).
- The user can control the relative importance of the various submetrics in the
overall metric that is used to rank a given pre-tree. For example, the user can specify
that the "goodness" of a root is not a useful criterion for judging pre-trees. The user
can also assign different weights to different link types to influence the submetric
calculating the amount of information lost.
- The algorithm generally selects the best possible pre-tree at each level. However,
the user can choose the pre-tree instead. The user is shown the candidate pre-trees,
ranked by the metric, and can choose one of them. The user can also
specify down to what level of the hierarchy pre-trees are chosen this way. By choosing
different pre-trees during different runs of the algorithm, different hierarchies,
each giving a different perspective on the data, can be formed.
- Visualization phase:
- Besides a 2D tree, the hierarchy can also be visualized as Cone Trees, Treemaps
or as a Table of Contents of a book (which is formed by listing the nodes in the order of
a depth-first search).
- Different visual attributes can be bound to information attributes in the views.
This is an extension of the work reported in [10].
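The Table of Contents view mentioned above amounts to a depth-first (preorder) listing of the hierarchy. A minimal sketch, assuming the hierarchy is stored as nested (label, children) pairs, which is a representation chosen here for illustration only:

```python
def table_of_contents(tree, indent="  "):
    """List the labels of a (label, children) tree in depth-first order."""
    lines = []

    def visit(node, depth):
        label, children = node
        lines.append(indent * depth + label)  # indent by nesting level
        for child in children:
            visit(child, depth + 1)

    visit(tree, 0)
    return lines
```

Each node appears before its descendants and indentation reflects depth, as in the contents page of a book.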
IMPLEMENTATION
The algorithm has been implemented in the Navigational View Builder, a system for forming
overview diagrams of hypermedia systems. Figure 2 represents an overview diagram of an
automobile database. There are a lot of interconnected nodes showing, for example, textual
information about the cars, images of the cars, TV advertisements and audio of previous
buyers' comments. There are also links to other cars with similar price and other models
by the same manufacturer. From this complex network a hierarchy can be formed
automatically. The top-level root of this tree and its children are shown in the left hand
screen of Figure 3. In this case, the attribute Price was used to form the
initial partitioning and the root represents a cluster for all the nodes.
FIGURE 2:
An overview diagram of an automobile database. The diagram is very difficult
to comprehend.
FIGURE 3:
The left hand screen shows the default tree formed for the automobile database. The
top-level partitioning is by the attribute Price. The right hand screen shows
the tree formed if the top-level partitioning is done by the attribute Country.
The user can form different hierarchies by selecting other pre-trees. For example, if the
user wants to select the pre-tree at the initial level, the dialog box shown in
Figure 4 pops up. If the user chooses to partition by the attribute Country,
the tree shown in the right hand screen of Figure 3 is formed. In this figure some of the
children represent clusters for countries. For example, the node labeled Japan represents
all the Japanese cars and its children are shown in the left hand screen of Figure 5. Here
the partitioning is done by the attribute Manufacturer. For some other countries
the nodes in the cluster formed a tree. In these cases the roots of the trees were
identified by structural analysis and they became the children of the overall root. Thus
for Sweden, Saab-Info is the root of the tree for all nodes related to Swedish cars. Its
children are shown in the right hand screen of Figure 5.
FIGURE 4:
At each level various pre-trees can be used. A metric ranks these pre-trees. By default
the pre-tree with the best metric is selected. However, the user can select others using
the above menu.
FIGURE 5:
Examples of Content and Structural analysis for forming pre-trees. The left hand
screen represents the nodes for Japan. The root is a cluster representing all Japanese
cars. The nodes are partitioned by the attribute Manufacturer. The right hand
screen is for Swedish cars. These nodes form a tree with the node Saab-Info as the root.
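Partitioning by an attribute such as Country or Manufacturer can be illustrated with a simple grouping function. This sketch groups nodes by exact attribute value and is a stand-in for, not a reproduction of, the content-based clustering algorithm of [11]; the limits min_branches and max_branches and the node names other than Saab-Info are hypothetical.

```python
from collections import defaultdict


def partition_by_attribute(records, attribute, min_branches=2, max_branches=10):
    """Group node names by the value of one attribute.

    records: {node name: {attribute name: value}}.
    Returns {value: sorted node names}, or None when the attribute yields
    too few or too many branches to be a useful pre-tree.
    """
    groups = defaultdict(list)
    for name, attrs in records.items():
        groups[attrs.get(attribute)].append(name)  # missing values share one group
    if not (min_branches <= len(groups) <= max_branches):
        return None
    return {value: sorted(names) for value, names in groups.items()}
```

For a quantitative attribute such as Price, exact values would produce too many branches, so a real implementation would cluster the values into ranges first.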
Figure 6 shows a 3D Tree view of this hierarchy. In this view, the colors of the nodes
represent various countries and the colors of the links represent link types. Various
zooming and filtering operations that are mentioned in
[15] are possible for this 3D tree. Moreover, smooth
animation is used so that the view changes are not abrupt and allow the user to see the
changes easily. (Note that the implementation is done using C++, Motif and Open Inventor
[18].)
FIGURE 6:
A 3D Tree view of a hierarchy of the automobile database. Initial partitioning is by the
attribute Country. Node colors represent different countries and link colors represent
different link types.
Forming Hierarchies in the World-Wide Web
Let us now look at an example from perhaps the most popular hypermedia system, the
World-Wide Web. For input to the Navigational View Builder, information was
automatically extracted from the WWW about the various files, their authors, their links
to other files and other information by parsing the HTML documents using the method
described in [14]. Figure 7 shows an unstructured overview
diagram of the WWW pages about the research activities at the Graphics, Visualization
& Usability (GVU) Center at Georgia Tech
(URL: http://www.gatech.edu/gvu/gvutop.html). Obviously, this information is very complicated.
FIGURE 7:
An overview diagram of the World-Wide Web pages about the research activities at GVU. It
indicates clearly why traditional overview diagrams are useless for real-world hypermedia systems.
The left hand screen of Figure 8 shows the top level of the hierarchy automatically
created for the data by the system. The file research.html, which lists the
various research activities of the GVU Center, is the root. It has branches to the major
research areas as well as to gvutop.html, a file containing general information
about GVU. The right hand side of Figure 8 shows a view of a section of this hierarchy
where the nodes are listed as the table of contents of a book.
FIGURE 8:
The left hand screen shows the top level of the default hierarchy formed for the
GVU WWW pages. research.html is the root and the major research areas are shown.
The right hand screen shows a book view of a portion of this hierarchy showing research in
Software Visualization.
A major drawback of the World-Wide Web is that very few useful semantic attributes are
defined for the pages. To create some other meaningful hierarchies, attributes like the
topic of the page (whether it is a research page or a personal page, etc.) were inserted
manually. (Efforts are underway to incorporate metadata into the WWW and hopefully in the near
future we can extract all useful information from the WWW automatically.) The left hand
screen of Figure 9 represents a Treemap view of a hierarchy formed when the initial
partitioning is done by the topic of the page. Colors represent the kind
of users who created the pages. Green represents Ph.D. students, and the color
plate indicates that Ph.D. students are the primary authors of the pages.
FIGURE 9:
The left hand screen shows a Treemap view of a hierarchy of the GVU WWW pages. Initial
partitioning is by the attribute Topic. Colors represent different types of
authors. The selected node is visdebug.html. The corresponding WWW page is
shown on the right.
Multiple hierarchies, each giving a different perspective on the underlying information
space, can be formed. If a user selects a node in one view, its positions in the other
views are also highlighted. Thus, these views help the user in comprehending the data. It
should also be noted that the user can go directly to the corresponding WWW page for the
selected node. For example, in the Treemap view the node visdebug.html is highlighted, and
the corresponding WWW page is shown on the right hand screen of Figure 9.
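Highlighting a selected node in every view requires locating that node in each hierarchy. One simple way to do this, assuming each hierarchy is stored as nested (label, children) pairs and that node labels are unique within a hierarchy, is sketched below; the function and view names are illustrative and do not come from the Navigational View Builder.

```python
def find_path(tree, target):
    """Return the labels from the root down to target, or None if absent."""
    label, children = tree
    if label == target:
        return [label]
    for child in children:
        sub_path = find_path(child, target)
        if sub_path is not None:
            return [label] + sub_path
    return None


def locate_in_views(views, target):
    """Map each view name to the path of target in that view's hierarchy."""
    return {name: find_path(tree, target) for name, tree in views.items()}
```

A view in which the node does not occur maps to None, so the caller can skip highlighting there.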
GENERATING OTHER VIEWS
Once a hierarchy is formed from the original graph structure, the hierarchy can be
transformed to other data organizations as well. Visualizations can be formed for these
data organizations also. For example, if the original partitioning for forming the
hierarchy was done by a quantitative attribute, a linear structure sorted by that
attribute can be formed from the subtrees of the root node.
Figure 10 represents a perspective wall [9] view of a
linear arrangement of the GVU WWW pages sorted by the last modification times of the
files. From the hierarchy whose initial partitioning was by the attribute
last-modified-time, the files were divided into partitions based on the time when
they were last modified. These partitions were arranged on walls. Only some walls are in
focus at a given time. The user can easily control which walls are in focus
through a scrollbar. Similarly, for the automobile database a Perspective Wall view can be
formed where the cars are sorted by the attribute Price.
FIGURE 10:
A Perspective Wall view showing a linear arrangement of the files based on the last
modification time. The different walls show files which were last modified in different
time frames. Only some walls are in focus at a given time.
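The arrangement of items on walls can be illustrated by sorting on a quantitative attribute and cutting the sorted sequence into intervals. The sketch below is a simplification: it bins the raw values into equal-width intervals directly, whereas our system derives the partitions from the subtrees of the hierarchy. The function name and the equal-width choice are assumptions.

```python
def wall_partitions(items, num_walls):
    """Split items into num_walls lists ordered by a numeric attribute.

    items: {name: numeric value}, e.g. last-modified time or price.
    """
    walls = [[] for _ in range(num_walls)]
    if not items:
        return walls
    ordered = sorted(items, key=lambda name: (items[name], name))
    low, high = items[ordered[0]], items[ordered[-1]]
    width = (high - low) / num_walls or 1  # avoid a zero width when all values match
    for name in ordered:
        index = min(int((items[name] - low) / width), num_walls - 1)
        walls[index].append(name)
    return walls
```

Each inner list corresponds to one wall, and the order of the lists gives the left-to-right arrangement.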
Other views can also be generated. For example, a tabular view showing information like
average price, mileage, etc. for various car models and also such useful statistics for
different manufacturers of the cars can be formed by a depth-first traversal of the
hierarchical structure whose partitionings are done by the attributes
Manufacturer and Car-Model.
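A tabular summary of this kind can be produced in one depth-first pass over the hierarchy. The following sketch assumes a (label, children) tree whose leaves are individual cars with a known price; the row layout and the function name are illustrative assumptions.

```python
def summarize(tree, prices):
    """Depth-first traversal producing (depth, label, count, average price) rows.

    tree: nested (label, children) pairs; every leaf label must be a key of prices.
    Rows are emitted for internal nodes only, in preorder.
    """
    rows = []

    def visit(node, depth):
        label, children = node
        if not children:
            return [prices[label]]  # a leaf contributes its own price
        slot = len(rows)
        rows.append(None)  # reserve the row so parents precede children
        values = []
        for child in children:
            values += visit(child, depth + 1)
        rows[slot] = (depth, label, len(values), sum(values) / len(values))
        return values

    visit(tree, 0)
    return rows
```

With a hierarchy partitioned first by Manufacturer and then by Car-Model, the depth-1 rows give per-manufacturer statistics and the depth-2 rows give per-model statistics.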
RELATED WORK
Our structural analysis is similar to that described in [2]
for identifying hierarchies from hypermedia structures. Although using just structural
analysis to identify hierarchies works for hypertext systems with simpler underlying
structures, identifying meaningful hierarchies by structural analysis alone is difficult
for real-world systems. Content analysis is also essential, as the examples in this paper show.
[6] describes a method to linearize complex hyper-networked
nodes to facilitate browsing using a book metaphor. However, this work also uses structural analysis only.
This paper is also related to systems that deal with graphical presentation of
information automatically or semi-automatically. Examples include APT
[8] and SAGE [16]. However,
our information domain is different from these systems - these systems deal with highly
structured information. The views that we want to develop are also different. The previous
systems generally produced bar diagrams, scatter plots and such graph views.
CONCLUSION
One of the best ways to comprehend a large complicated information structure is to form
multiple simpler structures each highlighting different aspects of the original
structure. Our work tries to use this philosophy to make a complex hypermedia system
understandable to the user. We believe that by forming various effective views of the
underlying space, we would allow the user to better understand the complex information.
We give examples of the hierarchization process from two complicated hypermedia systems to
illustrate our point. These examples show that our algorithm was able to extract
meaningful hierarchies which gave better insights into the complex information spaces.
Future work is planned along the following directions:
- Visualizing Larger Databases: Although a detailed complexity analysis is
beyond the scope of this paper, it can be shown that the major bottleneck of the algorithm
is the structural analysis to identify roots. [2] uses an
algorithm to identify roots. On the other hand we use
a algorithm to identify roots (by calling the
breadth-first search for each node). Although in the worst case l =
, on average l =
and our algorithm will perform better. For the WWW
database with about 400 nodes and 800 links our algorithm took about 7 seconds on an SGI
Reality Engine. Although this is acceptable, we will face problems for larger databases.
We are investigating ways to enhance the performance by improving the efficiency of the
code and using probabilistic algorithms to identify roots. Moreover, even cone trees and
treemaps are not able to visualize larger databases effectively. New visualization techniques are needed.
- Usability Studies: A limitation of our system is that no evaluation of how
useful our views really are has been done so far. We plan to do serious usability studies
in the near future. These studies may give us new insights that will help to improve our system.
ACKNOWLEDGEMENT
This work is supported by grants from Digital Equipment Corporation, Bell South
Enterprises, Inc. and Emory University System of Health Care, Atlanta, Georgia as part of
the Hypermedia Interface for Multimedia Databases project. We would also like to thank
the reviewers of this paper for their useful comments.
REFERENCES
[1] G. Battista, P. Eades, R. Tamassia, and I. Tollis. Algorithms for Drawing Graphs: an
Annotated Bibliography. Technical report, Brown University, June 1993.
[2] R. Botafogo, E. Rivlin, and B. Shneiderman. Structural Analysis of Hypertexts:
Identifying Hierarchies and Useful Metrics. ACM Transactions on Office Information
Systems, 10(2):142-180, 1992.
[3] K. Fairchild, S. Poltrok, and G. Furnas. Semnet: Three-dimensional Graphic
Representations of Large Knowledge Bases. In R. Guindon, editor, Cognitive Science and its
Applications for Human-Computer Interaction. Lawrence Erlbaum, 1988.
[4] G. Furnas and J. Zacks. Multitrees: Enriching and Reusing Hierarchical Structures. In
Proceedings of the ACM SIGCHI '94 Conference on Human Factors in Computing Systems, pages
330-336, Boston, Ma, April 1994.
[5] J. Hartigan. Clustering Algorithms. John Wiley and Sons, 1975.
[6] S. Ichimura and Y. Matsushita. Another Dimension to Hypermedia Access. In Proceedings
of Hypertext '93 Conference, pages 63-72, Seattle, Wa, November 1993.
[7] B. Johnson and B. Shneiderman. Treemaps: A Space-filling Approach to the Visualization
of Hierarchical Information. In Proceedings of IEEE Visualization '91 Conference, pages
284-291, San Diego, Ca, October 1991.
[8] J. MacKinlay. Automating the Design of Graphical Presentation of Relational
Information. ACM Transactions on Graphics, 5(2):110-141, April 1986.
[9] J. D. Mackinlay, S. Card, and G. Robertson. Perspective Wall: Detail and Context
Smoothly Integrated. In Proceedings of the ACM SIGCHI '91 Conference on Human Factors in
Computing Systems, pages 173-179, New Orleans, La, April 1991.
[10] S. Mukherjea and J. Foley. Navigational View Builder: A Tool for Building
Navigational Views of Information Spaces. In ACM SIGCHI '94 Conference Companion, pages
289-290, Boston, Ma, April 1994.
[11] S. Mukherjea, J. Foley, and S. Hudson. Interactive Clustering for Navigating in
Hypermedia Systems. In Proceedings of the ACM European Conference of Hypermedia
Technology, pages 136-144, Edinburgh, Scotland, September 1994.
[12] C. Neuwirth, D. Kauffer, R. Chimera, and G. Terilyn. The Notes Program: A Hypertext
Application for Writing from Source Texts. In Proceedings of Hypertext '87 Conference,
pages 121-135, Chapel Hill, NC, November 1987.
[13] H. Parunak. Hypermedia Topologies and User Navigation. In Proceedings of Hypertext
'89 Conference, pages 43-50, Pittsburgh, Pa, November 1989.
[14] J. Pitkow and K. Bharat. WEBVIZ: A Tool for World-Wide Web Access Log Visualization.
In Proceedings of the First International World-Wide Web Conference, Geneva, Switzerland,
May 1994.
[15] G. G. Robertson, J. D. Mackinlay, and S. Card. Cone Trees: Animated 3D Visualizations
of Hierarchical Information. In Proceedings of the ACM SIGCHI '91 Conference on Human
Factors in Computing Systems, pages 189-194, New Orleans, La, April 1991.
[16] S. Roth, J. Kolojejchick, J. Mattis, and J. Goldstein. Interactive Graphic Design
Using Automatic Presentation Knowledge. In Proceedings of the ACM SIGCHI '94 Conference on
Human Factors in Computing Systems, pages 112-117, Boston, Ma, April 1994.
[17] K. Utting and N. Yankelovich. Context and Orientation in Hypermedia Networks. ACM
Transactions on Office Information Systems, 7(1):58-84, 1989.
[18] J. Wernecke. The Inventor Mentor: Programming Object-Oriented 3D Graphics with Open
Inventor. Addison-Wesley Publishing Company, 1994.