Parallel discrete event simulation systems (PDES) are used to simulate
large-scale applications such as modeling telecommunication networks,
transportation grids, and battlefield scenarios. While a large amount
of PDES research has focused on employing multiprocessors and
multicomputers, the use of networks of workstations interconnected
through Ethernet or ATM has evolved into a popular and effective platform
for PDES. Nonetheless, the development of efficient PDES systems in
network computing environments is not without its share of
difficulties that severely degrade simulator performance. To better
understand how these difficulties degrade performance, and to develop
new algorithms that mitigate them, we investigate the use of graphical
visualization to provide insight into simulator execution and
performance. We began with a general-purpose network computing
visualization system, PVaniM, and used it to investigate the execution
of an advanced version of Time Warp, called Georgia Tech Time Warp
(GTW), which executes in network computing
environments. Because PDES systems such as GTW are essentially
middleware that support their own applications, we soon realized these
systems require their own middleware-specific visualization
support. To this end, we have extended PVaniM into a new system,
called PVaniM-GTW, by adding middleware-specific views. Our experience
with PVaniM-GTW indicates that these enhancements satisfy the needs of
PDES middleware better than general-purpose visualization systems,
without requiring the end user to develop application-specific
visualizations.
The following describes the middleware-specific views that have been
added to PVaniM. A description of PVaniM's default views may be
found here.
The Processor Advance Time (PAT) view is located to the right of the
Host List view. The processor advance time for a processor is defined
as the amount of wall clock time needed to advance the simulation by a
single unit of simulation time. When the PAT values among the host
processors differ, a load imbalance exists. The BGE algorithm
migrates clusters of logical processes (LPs) to the appropriate
machines so that the PAT values across all machines are approximately
equal. Consequently, this view gives an immediate indication of how
well the BGE algorithm is balancing the load. Processors not in use
have a PAT value of zero.
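The PAT definition above can be illustrated with a short sketch. It is
not taken from PVaniM-GTW or GTW; the function names and the
max-to-min imbalance ratio are assumptions made only for illustration.

```python
from typing import Dict


def processor_advance_time(wall_clock_seconds: float,
                           sim_time_advanced: float) -> float:
    """Wall clock seconds needed to advance one unit of simulation time.

    Hypothetical helper: returns 0.0 when no simulation time was advanced,
    mirroring the convention that processors not in use show a PAT of zero.
    """
    if sim_time_advanced <= 0.0:
        return 0.0
    return wall_clock_seconds / sim_time_advanced


def pat_imbalance(pats: Dict[str, float]) -> float:
    """Ratio of the largest to the smallest PAT among active hosts.

    Assumed indicator (not from the source): 1.0 means the active hosts
    are balanced; larger values mean a larger load imbalance.
    Hosts with a PAT of zero are treated as not in use and ignored.
    """
    active = [p for p in pats.values() if p > 0.0]
    if len(active) < 2:
        return 1.0
    return max(active) / min(active)


if __name__ == "__main__":
    # Hypothetical sample: (wall clock seconds, simulation time advanced).
    samples = {"hostA": (12.0, 4.0), "hostB": (6.0, 4.0), "hostC": (0.0, 0.0)}
    pats = {h: processor_advance_time(w, s) for h, (w, s) in samples.items()}
    print(pats)
    print(pat_imbalance(pats))
```

A load balancer in the spirit of the description above would try to
drive this ratio toward 1.0 by moving clusters away from the hosts with
the largest PAT values.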
Positioned to the right of the PAT view, the Clusters / Primary
Rollbacks (PRBS) view is a toggled view that displays either how the
clusters are distributed among the active hosts or the percentage of
processed events that are rolled back during the sampling interval
because of a late-arriving application message (a.k.a. a straggler
message).
The clusters view is used in conjunction with other information to
determine if the BGE algorithm is operating correctly.
Primary rollbacks serve as one of the major indicators of GTW
performance. The fewer events that are rolled back due to straggler
messages, the fewer erroneous event computations are performed,
which ultimately yields better simulator performance.
The Secondary Rollbacks view is a toggling view (shared with PVaniM's
Load view) that shows the percentage of processed events that are
rolled back during the sampling interval due to the processing of an
anti-message. This view provides insight into how far an erroneous
computation has spread, by indicating the fan-out of the LP
communication links along which messages are scheduled in the
application being simulated.
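The two rollback percentages described above can be computed from
per-interval counters, as in the following sketch. The `IntervalCounts`
record and its field names are hypothetical; the source does not say
how GTW or PVaniM-GTW actually stores these counters.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class IntervalCounts:
    """Hypothetical counters collected over one sampling interval."""
    events_processed: int
    primary_rolled_back: int    # events undone because of straggler messages
    secondary_rolled_back: int  # events undone because of anti-messages


def rollback_percentages(counts: IntervalCounts) -> Tuple[float, float]:
    """Return (primary %, secondary %) of processed events rolled back."""
    if counts.events_processed == 0:
        # Nothing was processed in this interval, so report zero for both.
        return (0.0, 0.0)
    scale = 100.0 / counts.events_processed
    return (counts.primary_rolled_back * scale,
            counts.secondary_rolled_back * scale)


if __name__ == "__main__":
    sample = IntervalCounts(events_processed=200,
                            primary_rolled_back=50,
                            secondary_rolled_back=10)
    print(rollback_percentages(sample))
```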
The Aborted Events view is a toggling view (shared with PVaniM's
Memory Usage view) that shows the percentage of processed events that
are aborted during the sampling interval. The GTW system allocates a
fixed number of event buffers during initialization and manages those
buffers itself to avoid costly memory allocation system calls at
runtime. An event is aborted if the scheduling of a future event fails
because all event buffers are currently in use. This approach is used
to prevent a processor from becoming overly optimistic. Usually,
events are aborted because of a slow GVT calculation process or a
general lack of event buffers due to a large set of pending events.
Like rollbacks, aborted events have a detrimental effect on system
performance and should be avoided whenever possible.
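The abort rule described above can be modeled with a small sketch of a
fixed-size buffer pool. This is a simplified illustration, not GTW's
implementation: the class, its method names, and the choice to count
aborted events among the events attempted are all assumptions.

```python
class EventBufferPool:
    """Hypothetical model of a fixed pool of event buffers.

    The pool size is set once at initialization and never grows, so
    scheduling can fail when every buffer is in use.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.in_use = 0
        self.attempted = 0  # events whose processing was started
        self.aborted = 0    # events abandoned for lack of a free buffer

    def process_event(self, schedules: int) -> bool:
        """Process one event that schedules `schedules` future events.

        Returns False if the event is aborted because a buffer could not
        be obtained; buffers it had already claimed are handed back.
        """
        self.attempted += 1
        claimed = 0
        for _ in range(schedules):
            if self.in_use == self.capacity:
                self.in_use -= claimed
                self.aborted += 1
                return False
            self.in_use += 1
            claimed += 1
        return True

    def reclaim(self, count: int) -> None:
        """Free buffers, for example once a new GVT value allows it."""
        self.in_use = max(0, self.in_use - count)

    def aborted_percentage(self) -> float:
        """Percentage of attempted events that were aborted."""
        if self.attempted == 0:
            return 0.0
        return 100.0 * self.aborted / self.attempted


if __name__ == "__main__":
    pool = EventBufferPool(capacity=3)
    print(pool.process_event(2))   # succeeds, two buffers now in use
    print(pool.process_event(2))   # aborted, only one buffer was free
    pool.reclaim(2)                # buffers freed after GVT advances
    print(pool.process_event(2))   # succeeds again
    print(pool.aborted_percentage())
```

In this model a slow `reclaim` step, standing in for a slow GVT
calculation, leaves the pool full for longer and raises the aborted
percentage, which matches the cause named in the text.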
Christopher Carothers
Parallel Simulation and Computer Architecture
College of Computing
Georgia Institute of Technology

Brad Topol
Graphics, Visualization & Usability Center
College of Computing
Georgia Institute of Technology

Richard Fujimoto
Parallel Simulation and Computer Architecture
College of Computing
Georgia Institute of Technology

John Stasko
Graphics, Visualization & Usability Center
College of Computing
Georgia Institute of Technology

Vaidy Sunderam
Department of Math and Computer Science
Emory University