Principal Investigators:
Karsten Schwan,
Mustaque Ahamad
Technical contacts:
Karsten Schwan
(schwan@cc.gatech.edu)
and
Mustaque Ahamad
(mustaq@cc.gatech.edu)
High performance computing systems are moving from single, large-scale, homogeneous platforms like the CM-5 to heterogeneous platforms containing both uniprocessor and SMP nodes. In addition, by leveraging supercomputer technologies (e.g., Myricom and/or Dolphin boards), cluster machines can take on tasks formerly requiring supercomputer support. In fact, manufacturers are now attempting to leverage PC and workstation markets by developing products that can span the entire range of performance requirements from uniprocessor to supercomputer applications.
Given the architectural trends outlined above, end users must be able to write application programs so that they can run on any such platform. For operating systems, this creates pressure to support a single programming model capable of spanning all such machines. Shared memory, or more precisely, distributed shared memory coupled with threads, is one such model, but its scalability is recognized to be limited to moderate-size machines. The object model, on the other hand, has been shown to be useful for very large scale applications like those involving the World Wide Web; however, existing commercial implementations do not offer the performance necessary for use with high performance applications on all of the machines indicated above. More specifically, CORBA-compliant object representations and implementations of object communications do not adequately address the performance requirements of supercomputer applications. At the same time, parallel implementations of C++ do not permit programmers to vary object representations or the implementations of object communications to exploit specific attributes of the underlying computational platforms.
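To make this point concrete, the following C++ sketch illustrates the kind of representation flexibility at issue. The names are ours and purely illustrative (they are not COBS or CORBA interfaces): a single object interface is backed by two interchangeable representations, one adequate for a small SMP node and one that avoids memory contention on machines with many processors. It is exactly this choice that current commercial object systems and parallel C++ dialects do not expose to the programmer.

    // Hypothetical sketch (not the COBS API): one object interface, two
    // representations, so a program can pick the variant that matches the
    // underlying platform without changing client code.
    #include <atomic>
    #include <functional>
    #include <iostream>
    #include <memory>
    #include <mutex>
    #include <thread>
    #include <vector>

    // Abstract object: clients see only the operations, never the representation.
    struct Accumulator {
        virtual void add(long v) = 0;
        virtual long total() = 0;
        virtual ~Accumulator() = default;
    };

    // Representation A: a single shared copy guarded by a lock -- simple and
    // adequate on a small SMP node.
    class LockedAccumulator : public Accumulator {
        long sum_ = 0;
        std::mutex m_;
    public:
        void add(long v) override { std::lock_guard<std::mutex> g(m_); sum_ += v; }
        long total() override { std::lock_guard<std::mutex> g(m_); return sum_; }
    };

    // Representation B: per-thread partial sums combined on read -- avoids
    // contention on machines with many processors.
    class PartitionedAccumulator : public Accumulator {
        std::vector<std::atomic<long>> parts_;
    public:
        explicit PartitionedAccumulator(size_t slots) : parts_(slots) {}
        void add(long v) override {
            size_t slot = std::hash<std::thread::id>{}(std::this_thread::get_id()) % parts_.size();
            parts_[slot].fetch_add(v, std::memory_order_relaxed);
        }
        long total() override {
            long s = 0;
            for (auto& p : parts_) s += p.load(std::memory_order_relaxed);
            return s;
        }
    };

    int main() {
        // The client program is identical under either representation;
        // only the construction site changes.
        std::unique_ptr<Accumulator> acc = std::make_unique<PartitionedAccumulator>(16);
        std::vector<std::thread> workers;
        for (int t = 0; t < 4; ++t)
            workers.emplace_back([&] { for (int i = 0; i < 100000; ++i) acc->add(1); });
        for (auto& w : workers) w.join();
        std::cout << "total = " << acc->total() << "\n";  // prints 400000
    }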
The COBS project is developing a uniform programming model for high performance, heterogeneous machines. The intent of our work is to have broad impact by leveraging commercial object technologies, while simultaneously attaining high performance through novel research insights into efficient object representations.
Distributed objects will also support communication across different machines, as well as human-machine and human-human interactions. As a result, they will be formulated such that programs can use them to share information across different address spaces and across homogeneous and heterogeneous sets of machines.
The COBS project's work is important because (1) it addresses the needs of high performance, scientific programs while (2) also addressing broadly relevant commercial applications that share information across networked systems at lower bandwidths (e.g., local or metropolitan area networks). The intellectual contribution of this research is the identification and evaluation of the mechanisms, at the level of object implementation, that facilitate the construction of alternative object representations, where representations may be tailored to the levels of performance and functionality required by target applications. COBS will offer a single set of mechanisms with which objects may be constructed that span this breadth of applications and target machines. At the same time, we will produce a reference implementation at the object level with which we hope to influence current industry standards efforts (i.e., OMG's standards).
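As a hypothetical illustration of such an object-level mechanism (the names and the attribute set below are assumptions made for exposition, not the actual COBS interfaces), the sketch lets an application declare attributes when an object is created and has a small factory select a matching representation; a real implementation would replace the log line with genuine replication or messaging over the network.

    // Hypothetical sketch of an attribute-driven object factory (illustrative
    // names only): the application states what it needs, and the runtime
    // chooses an object representation to match.
    #include <iostream>
    #include <map>
    #include <memory>
    #include <mutex>
    #include <string>

    // A tiny shared-dictionary object; clients see only the operations.
    struct Dictionary {
        virtual void put(const std::string& k, const std::string& v) = 0;
        virtual std::string get(const std::string& k) = 0;
        virtual ~Dictionary() = default;
    };

    // Representation for a single address space: one copy, one lock.
    class LocalDictionary : public Dictionary {
        std::map<std::string, std::string> data_;
        std::mutex m_;
    public:
        void put(const std::string& k, const std::string& v) override {
            std::lock_guard<std::mutex> g(m_); data_[k] = v;
        }
        std::string get(const std::string& k) override {
            std::lock_guard<std::mutex> g(m_); return data_[k];
        }
    };

    // Representation that would propagate updates to remote copies; the
    // "network" here is only a log line standing in for a real transport.
    class ReplicatedDictionary : public LocalDictionary {
    public:
        void put(const std::string& k, const std::string& v) override {
            LocalDictionary::put(k, v);
            std::cout << "[replicate] " << k << " -> " << v << "\n";
        }
    };

    // Attributes the application can declare when the object is created.
    struct ObjectAttributes {
        bool shared_across_nodes = false;   // does state cross address spaces?
    };

    // The construction mechanism: pick a representation from the attributes.
    std::unique_ptr<Dictionary> make_dictionary(const ObjectAttributes& a) {
        if (a.shared_across_nodes)
            return std::make_unique<ReplicatedDictionary>();
        return std::make_unique<LocalDictionary>();
    }

    int main() {
        ObjectAttributes attrs;
        attrs.shared_across_nodes = true;   // ask for a distributed variant
        auto dict = make_dictionary(attrs);
        dict->put("experiment", "run-42");
        std::cout << dict->get("experiment") << "\n";
    }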
The expected results of this research are directly derived from the unique proposed collaboration between the IBM TJ Watson industrial research group and the Georgia Tech academic research team.
The primary role of the Georgia Tech team is to perform exploratory implementations and evaluations on equipment platforms with homogeneous operating systems, in order to experiment with alternative representations of concurrent and distributed objects. Two platforms will be used, effectively leveraging both a recent large SUR grant from IBM to Georgia Tech (approx. $800,000) and recent equipment awards to Georgia Tech from the National Science Foundation (approx. $1,500,000 total): (1) workstation clusters and SMP machines linked with ATM or Myricom network boards, and (2) an IBM SP-2 machine. The deliverables resulting from this work are software libraries and graphical tools supporting their usage, demonstrated with applications that address the high performance computing needs of scientific and engineering end users.
The primary role of the IBM TJ Watson team will be the simultaneous, joint development of object technologies for heterogeneous operating systems on ATM-networked computing platforms (i.e., for the AIX and OS/2 operating systems), along with the provision and maintenance of reference implementations leveraging IBM object technologies, thereby making the developed technologies relevant in commercial contexts (e.g., by working with OpenDoc and with Independent Software Vendors, or ISVs).
The expected outcomes of our collaboration are (1) the development of object technologies that break new ground in research and (2) the pursuit of these developments in contexts that facilitate the rapid transfer of technologies to applications in commercial and military environments. We thereby hope to address the principal shortcoming of many past efforts concerning high performance, distributed and parallel objects, such as Concurrent C++ at Caltech, research systems like Eden and Amber, and our own previous work described in [clouds], [Ghe93], [CMS93], [MS93c], and [Sch90c].
Specific expected results and deliverables of the proposed research at Georgia Tech are aimed at supporting future high performance computing environments, which our research group calls Distributed Laboratories. In particular, we aim to support the effective use of modern high performance machines and applications by end users. Essentially, such users employ computational `instruments' in their `virtual laboratories' to investigate scientific or engineering phenomena. Furthermore, end users tend to use not one but multiple such instruments simultaneously, to generate data, analyze it, correlate information, and display it. Finally, multiple scientists will typically collaborate through cooperative, on-line use of such shared instruments.
The object technologies to be developed by our group will address the efficient sharing of state in distributed scientific or engineering laboratories. Specifically, such state sharing occurs across distributed program modules (i.e., between different computational `instruments'), between the program and human users (e.g., on-line visualizations of program data), and between different human users (e.g., shared data visualizations manipulated by cooperating end users).
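The sketch below illustrates this sharing pattern in miniature; the class names and the observer-style interface are assumptions made for illustration rather than the COBS design. A single shared object holds instrument state, and attached parties, here an on-line visualization and a stand-in for a collaborating user's display, are notified of each update.

    // Hypothetical sketch (illustrative names, not the COBS libraries) of the
    // sharing pattern described above: one shared object holds instrument
    // state, and attached observers see every update as it happens.
    #include <functional>
    #include <iostream>
    #include <vector>

    // The shared state produced by a computational "instrument".
    class SharedTimeSeries {
    public:
        using Observer = std::function<void(double /*time*/, double /*value*/)>;

        // Program modules, visualizations, or user proxies attach themselves here.
        void attach(Observer obs) { observers_.push_back(std::move(obs)); }

        // The producing module appends a sample; all attached parties see it.
        void append(double t, double v) {
            samples_.push_back({t, v});
            for (auto& obs : observers_) obs(t, v);
        }

    private:
        struct Sample { double t, v; };
        std::vector<Sample> samples_;
        std::vector<Observer> observers_;
    };

    int main() {
        SharedTimeSeries series;

        // A local on-line visualization (stands in for a real display tool).
        series.attach([](double t, double v) {
            std::cout << "viz: t=" << t << " value=" << v << "\n";
        });

        // A proxy that would forward updates to a collaborating user's machine;
        // the network send is only simulated by a log line.
        series.attach([](double t, double v) {
            std::cout << "send-to-collaborator: t=" << t << " value=" << v << "\n";
        });

        // The computational instrument generating data.
        for (int step = 0; step < 3; ++step)
            series.append(step * 0.1, 42.0 + step);
    }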
Joint work with researchers at IBM TJ Watson will generalize these results to other platforms and applications, add innovative technologies for object construction, and increase the potential for technology transfer. The expected contributions of the IBM TJ Watson team include the reference implementations and commercial technology transfer activities described above.
[CMS93] Christian Clemencon, Bodhisattwa Mukherjee, and Karsten Schwan, "Distributed Shared Abstractions (DSA) on Large-Scale Multiprocessors," Symposium on Experiences with Distributed and Multiprocessor Systems (SEDMS-4), September 1993.
[Indigo] Prince Kohli, Mustaque Ahamad, and Karsten Schwan, "Indigo: User-level Support for Building Distributed Shared Abstractions," Fourth IEEE International Symposium on High-Performance Distributed Computing (HPDC-4), August 1995.
[Sch94a] Ahmed Gheith, Bodhi Mukherjee, Dilma Silva, and Karsten Schwan, "KTK: Kernel Support for Configurable Objects and Invocations," Second International Workshop on Configurable Distributed Systems, IEEE/ACM, March 1994.
[Sch90c] Karsten Schwan and Win Bo, "Topologies -- Computational Messaging for Multicomputers," ACM Transactions on Computer Systems, May 1990.
[clouds] Partha Dasgupta, Richard J. LeBlanc, Mustaque Ahamad, and Umakishore Ramachandran, "The CLOUDS Distributed Operating System," IEEE Computer, November 1991.
[Ghe93] Ahmed Gheith and Karsten Schwan, "CHAOS-Arc -- Kernel Support for Multi-Weight Objects, Invocations, and Atomicity in Real-Time Applications," ACM Transactions on Computer Systems, May 1989.
[MS93c] Bodhisattwa Mukherjee and Karsten Schwan, "Improving Performance by Use of Adaptive Objects: Experimentation with a Configurable Multiprocessor Thread Package," Proceedings of the Second International Symposium on High Performance Distributed Computing (HPDC-2), July 1993.
[Kindler] Thomas Kindler, Karsten Schwan, Dilma Silva, Mary Trauner, and Fred Aleya, "Parallelization of Spectral Models for Atmospheric Transport Processes," to appear in Concurrency: Practice and Experience, 1995.
[MD] Greg Eisenhauer and Karsten Schwan, "Design and Analysis of a Parallel Molecular Dynamics Application," submitted to the Journal of Parallel and Distributed Computing, February 1995.
[RT] Karsten Schwan, Hongyi Zhou, and Ahmed Gheith, "Multiprocessor Real-Time Threads," Operating Systems Review, January 1992.
[ATKSS] Mustaque Ahamad, Francisco Jose Torres-Rojas, Rammohan Kordale, Jasjit Singh, and Shawn Smith, "Detecting Mutual Consistency of Shared Objects."
[KA95] Rammohan Kordale and Mustaque Ahamad, "Object Caching in a CORBA-Compliant System," Technical Report GIT-CC-95-23, College of Computing, Georgia Institute of Technology.
[TW] Kaushik Ghosh, Richard Fujimoto, and Karsten Schwan, "Time Warp Simulation in Time-Constrained Systems," 7th Workshop on Parallel and Distributed Simulation (PADS), May 1993.
[PORTS] Kaushik Ghosh, Kiran Panesar, Richard Fujimoto, and Karsten Schwan, "PORTS: A Parallel, Optimistic, Real-Time Simulator," 8th Workshop on Parallel and Distributed Simulation (PADS), July 1994.