The COBS Project

Principal Investigators: Karsten Schwan, Mustaque Ahamad
Technical contacts: Karsten Schwan (schwan@cc.gatech.edu) and Mustaque Ahamad (mustaq@cc.gatech.edu)


Configurable OBjects for High Performance Systems

High performance computing systems are moving from single, large-scale, homogeneous platforms like the CM-5 to heterogeneous platforms containing both uniprocessor and SMP nodes. In addition, by leveraging supercomputer technologies (e.g., Myricom and/or Dolphin boards), cluster machines can take on tasks formerly requiring supercomputer support. In fact, manufacturers are now attempting to leverage PC and workstation markets by developing products that can span the entire range of performance requirements from uniprocessor to supercomputer applications.

Given the architectural trends outlined above, end users must be able to write application programs that run on any such platform. For operating systems, this creates pressure to support a single programming model capable of spanning all such machines. Shared memory, or more precisely distributed shared memory coupled with threads, is one such model, but it has been recognized that its scalability is limited to moderate-size machines. The object model, on the other hand, has been shown to be useful for very large-scale applications like those involving the World Wide Web; however, existing commercial implementations do not offer the performance necessary for its use with high performance applications on all of the machines indicated above. More specifically, CORBA-compliant object representations and the implementation of object communications do not adequately address the performance requirements of supercomputer applications. At the same time, parallel implementations of C++ do not permit programmers to vary object representations or the implementations of object communications to exploit specific attributes of underlying computational platforms.

The COBS project is developing a uniform programming model for high performance, heterogeneous machines. The intent of our work is to have broad impact by leveraging commercial object technologies, while simultaneously attaining high performance by gaining and using novel research insights on efficient object representations:

  1. Novel formulations of concurrent and distributed objects will address low-latency communications on high performance parallel machines, such as the IBM SP-2 and shared memory platforms. First results of this work are described in [CMS93]. The intent of this work is to support the communication and synchronization requirements of single, large-scale, parallel applications on the designated target platforms.

  2. Object formulations will be extended so that programs running on high performance platforms can run equally easily on workstation clusters, such as machines connected by Myricom or ATM network boards. In effect, such distributed objects will extend an application's address space across large-scale, distributed memory machines. The Indigo system developed in our earlier research will be one basis of this work, in conjunction with the configurable object model described in [Sch94a].

  3. Objects will range from `memory' objects permitting programs to share unstructured, raw data stored in memory pages to typed and structured objects explicitly defined by application programs, so that page objects may be used when appropriate, but performance may be improved further by exploiting class information or even information about how object state is physically distributed (fragmented) across different machines (see [CMS93], [Sch90c]).

  4. Object implementations will take advantage of a variety of performance techniques, including replication (see Catamaran), and the dynamic adaptation of selected aspects of their implementations, so that performance-sensitive implementation attributes may be dynamically changed in response to runtime variations in application behavior or execution environments (see [Sch94a]).

  5. Objects can interoperate across different machines and different operating systems, by using evolving industry standards like the Common Object Services (COS) as defined by OMG.
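
As an illustration of item 4 above, the following is a minimal sketch of a replicated object that adapts one implementation attribute at run time. It is not the COBS or Catamaran implementation. The class names, the two propagation policies ("update" and "invalidate"), the window-based adaptation rule, and the message counter are all assumptions made for this example, and the replicas are simulated inside a single process.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical cached copy of the object's state on one node."""
    value: int = 0
    valid: bool = True


class AdaptiveReplicatedObject:
    """Illustrative object whose write-propagation policy changes at run time."""

    def __init__(self, n_replicas, window=8):
        self.primary = 0                      # authoritative copy of the state
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.policy = "update"                # adaptable implementation attribute
        self.window = window                  # operations observed per decision
        self.reads = 0
        self.writes = 0
        self.messages = 0                     # simulated network messages

    def write(self, value):
        self.primary = value
        self.writes += 1
        if self.policy == "update":
            # Push the new value to every replica.
            for r in self.replicas:
                r.value, r.valid = value, True
                self.messages += 1
        else:
            # Invalidate only the replicas that still hold a valid copy.
            for r in self.replicas:
                if r.valid:
                    r.valid = False
                    self.messages += 1
        self._adapt()

    def read(self, i):
        r = self.replicas[i]
        self.reads += 1
        if not r.valid:
            # Fetch the current state from the primary on demand.
            r.value, r.valid = self.primary, True
            self.messages += 1
        value = r.value
        self._adapt()
        return value

    def _adapt(self):
        # After each window of operations, pick the policy that suits the mix:
        # read-dominated phases favor update, write-dominated favor invalidate.
        if self.reads + self.writes < self.window:
            return
        self.policy = "update" if self.reads >= self.writes else "invalidate"
        self.reads = self.writes = 0
```

In this sketch a burst of writes switches the object to the invalidate policy, and a later read-dominated phase switches it back, while callers keep using the same read and write interface.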

Distributed objects will also support communication between machines, between humans and machines, and between humans. As a result, they will be formulated such that programs can use them to share information across different address spaces and across homogeneous and heterogeneous sets of machines.

The COBS project's work is important because (1) it addresses the needs of high performance, scientific programs while (2) also addressing broadly relevant commercial applications that share information across networked systems at lower bandwidths (e.g., local or metropolitan area networks). The intellectual contribution of this research is the identification and evaluation of the mechanisms required at the level of object implementation that facilitate the construction of alternative object representations, where representations may be tailored to the levels of performance and functionality required by target applications. COBS will offer a single set of mechanisms with which objects may be constructed that span this breadth of applications and target machines. At the same time, we will produce a reference implementation at the object level, with which we hope to influence current industry standards efforts (i.e., OMG's standards).

Expected Outcomes

The expected results of this research are directly derived from the unique proposed collaboration between the IBM TJ Watson industrial research group and the Georgia Tech academic research team.

The primary role of the Georgia Tech team is to perform exploratory implementations and evaluations on equipment platforms with homogeneous operating systems, in order to experiment with alternative representations of concurrent and distributed objects. Two platforms will be used, effectively leveraging both a recent large SUR grant from IBM to Georgia Tech (approx. $800,000) and recent equipment awards to Georgia Tech from the National Science Foundation (approx. $1,500,000 total): (1) workstation clusters and SMP machines linked with ATM or Myricom network boards, and (2) an IBM SP-2 machine. The deliverables resulting from this work are software libraries and graphical tools supporting their usage, demonstrated with applications that address the high performance computing needs of scientific and engineering end users.

The primary role of the IBM TJ Watson team will be the simultaneous and joint development of object technologies for heterogeneous operating systems on ATM-networked computing platforms (i.e., for the AIX and OS/2 operating systems), and the provision and maintenance of reference implementations that leverage IBM object technologies and make the developed technologies relevant in commercial contexts (e.g., by working with OpenDoc and with Independent Software Vendors, or ISVs).

The expected outcomes of our collaboration are (1) the development of object technologies that break new ground in research while (2) performing such developments in contexts that facilitate the rapid transfer of technologies to applications in commercial and military environments. We thereby hope to address the principal shortcoming of many past efforts concerning high performance, distributed and parallel objects, such as Concurrent C++ at Caltech, research systems like Eden and Amber, and our own previous work described in [clouds], [Ghe93], [CMS93], [MS93c] and [Sch90c].

Specific expected results and deliverables of the proposed research at Georgia Tech are aimed at supporting future high performance computing environments, called Distributed Laboratories by our research group. Specifically, we aim to support the effective use of modern high performance machines and applications by end users. Essentially, such users employ computational `instruments' in their `virtual laboratories' to investigate certain scientific or engineering phenomena. Furthermore, end users tend to use not one but multiple such instruments simultaneously, to generate data, analyze it, correlate information, display it, etc. Finally, multiple scientists will typically collaborate through cooperative, on-line use of such shared instruments.

The object technologies to be developed by our group will address the efficient sharing of state in distributed scientific or engineering laboratories. Specifically, such state sharing occurs across distributed program modules (i.e., between different computational `instruments'), between the program and human users (e.g., on-line visualizations of program data), and between different human users (e.g., shared data visualizations manipulated by cooperating end users):

  1. Development of distributed and concurrent objects on the IBM SP-2 machine and on an ATM- (and Myrinet-)networked cluster of uni- and multiprocessor workstations. Such objects will be used to maintain the shared state of high performance, scientific applications developed jointly with end users in related research programs. These applications are being developed in `Distributed Laboratory' settings already addressed by current, funded research, including (1) a large-scale atmospheric modeling code ([Kindler]), (2) embedded and real-time applications ([RT]), (3) molecular simulations developed jointly with physicists ([MD]) and (4) high performance, distributed discrete event simulations ([TW], [PORTS]).

  2. Development of and experimentation with specific performance-relevant techniques for the construction of concurrent and distributed objects, including the ability to cache and replicate object state ([ATKSS], [KA95]), the exploitation of class information to optimize inter-object communications ([Indigo]), the explicit fragmentation of objects ([CMS93]), the association of active events with object implementations ([CMS93]), and the ability to dynamically adapt selected object attributes ([Sch94a]).

  3. Development and provision of software libraries and tools for the construction, evaluation, and runtime support of concurrent and distributed objects, on the aforementioned multiprocessor and distributed system platforms. Library APIs will be consistent with the APIs of the libraries developed and used by our IBM research partners, so that applications developed by either team can be run on any of the target platforms explored in this research. In addition, programming and performance profiling tools will be shared by both research teams.
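
The explicit fragmentation of objects mentioned in item 2 above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the Distributed Shared Abstractions design of [CMS93]: the class names, the node labels, and the byte-sum placement rule are hypothetical, and the fragments are simulated within one process rather than placed on separate machines.

```python
class Fragment:
    """One piece of the object's state, notionally resident on one node."""

    def __init__(self, node):
        self.node = node      # hypothetical label of the hosting node
        self.data = {}        # the part of the state stored on that node


class FragmentedTable:
    """Illustrative table whose state is split across several fragments."""

    def __init__(self, nodes):
        self.fragments = [Fragment(n) for n in nodes]

    def _owner(self, key):
        # Deterministic placement rule (an assumption for this sketch):
        # the sum of the key's bytes selects the owning fragment.
        index = sum(key.encode()) % len(self.fragments)
        return self.fragments[index]

    def put(self, key, value):
        # The invocation is routed to the single fragment that owns the key.
        self._owner(key).data[key] = value

    def get(self, key):
        return self._owner(key).data[key]

    def size(self):
        # A whole-object operation combines results from every fragment.
        return sum(len(f.data) for f in self.fragments)
```

Callers see one table with put, get, and size operations. Per-key operations touch one fragment, and only whole-object operations such as size need to visit all of them, which is the kind of information an implementation could exploit to reduce communication.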

Joint work with researchers at IBM TJ Watson will result in the generalization of such results to other platforms and applications, add innovative technologies for object construction, and result in higher potentials for technology transfer. Expected results provided by the IBM TJ Watson team include:

  1. Provision of a CORBA-compliant reference implementation that can operate on heterogeneous platforms, including both AIX Unix and OS/2 platforms, by leveraging existing IBM object technologies (e.g., DSOM and SOM or OpenDoc). This implementation will also be used by the Georgia Tech researchers on the shared IBM SP-2 platform.

  2. Development and investigation of a specific mechanism required for efficient, distributed object implementations, called the Event Response Architecture (ERA), on an ATM-networked workstation platform.

  3. Development of a graphical tool for composing primitive objects defined at the ERA level of abstraction into objects offering the functionality, interfaces, and performance sought by application programs. Performance profiling will be possible in the context of this tool, as well.

  4. Development of and experimentation with commercially relevant applications for distributed objects, specifically those requiring the mechanisms provided by the ERA model. The general class of applications the IBM team has been investigating is shared virtual environments, such as shared documents, shared visualizations, and shared workspaces.

References

[CMS93] Christian Clemencon, Bodhisattwa Mukherjee, and Karsten Schwan, Distributed Shared Abstractions (DSA) on Large-Scale Multiprocessors, Symposium on Experiences with Distributed and Multiprocessor Systems (SEDMS-4), Sept. 1993.

[Indigo] Prince Kohli, Mustaque Ahamad, and Karsten Schwan, Indigo: User-level Support for Building Distributed Shared Abstractions, Fourth IEEE International Symposium on High-Performance Distributed Computing (HPDC-4), August 1995.

[Sch94a] Ahmed Gheith, Bodhi Mukherjee, Dilma Silva, and Karsten Schwan, KTK: Kernel Support for Configurable Objects and Invocations, Second International Workshop on Configurable Distributed Systems, IEEE, ACM, March 1994.

[Sch90c] Karsten Schwan and Win Bo, Topologies -- Computational Messaging for Multicomputers, ACM Transactions on Computer Systems, May 1990.

[clouds] Partha Dasgupta, Richard J. LeBlanc, Mustaque Ahamad, and Umakishore Ramachandran, The CLOUDS Distributed Operating System, IEEE Computer, Nov. 1991.

[Ghe93] Ahmed Gheith and Karsten Schwan, CHAOS-Arc -- Kernel Support for Multi-Weight Objects, Invocations, and Atomicity in Real-Time Applications, ACM Transactions on Computer Systems, May 1989.

[MS93c] Bodhisattwa Mukherjee and Karsten Schwan, Improving Performance by Use of Adaptive Objects: Experimentation with a Configurable Multiprocessor Thread Package, Proc. of Second International Symposium on High Performance Distributed Computing (HPDC-2), July 1993.

[Kindler] Thomas Kindler, Karsten Schwan, Dilma Silva, Mary Trauner and Fred Aleya, Parallelization of Spectral Models for Atmospheric Transport Processes, to appear in Concurrency: Practice and Experience, 1995.

[MD] Greg Eisenhauer and Karsten Schwan, Design and Analysis of a Parallel Molecular Dynamics Application, submitted to the Journal of Parallel and Distributed Computing, February 1995.

[RT] Karsten Schwan, Hongyi Zhou and Ahmed Gheith, Multiprocessor Real-Time Threads, Operating Systems Review, Jan. 1992.

[ATKSS] Mustaque Ahamad, Francisco Jose Torres-Rojas, Rammohan Kordale, Jasjit Singh and Shawn Smith, Detecting Mutual Consistency of Shared Objects.

[KA95] Rammohan Kordale and Mustaque Ahamad, Object Caching in a CORBA-compliant System, Technical Report GIT-CC-95-23, College of Computing, Georgia Institute of Technology.

[TW] Kaushik Ghosh, Richard Fujimoto and Karsten Schwan, Time Warp Simulation in Time-Constrained Systems, 7th Workshop on Parallel and Distributed Simulation (PADS), May 1993.

[PORTS] Kaushik Ghosh, Kiran Panesar, Richard Fujimoto and Karsten Schwan, PORTS: A Parallel, Optimistic, Real-Time Simulator, 8th Workshop on Parallel and Distributed Simulation (PADS), July 1994.


Maintainer:
Prince Kohli (pkohli@cc.gatech.edu)
Last Modified Tue Aug 29