
Implementation

The mesh generator has been implemented on several parallel computing platforms: i) the IBM SP2, ii) the Transtech Paramid, and iii) a Dell PC cluster.

The IBM SP2 machine at CTU in Prague is currently equipped with 15 Power2 processors running at 66.7 MHz and 8 P2SC processors running at 120 MHz, each with at least 128 MB of memory and 2 GB of disk space. The processors are interconnected by standard Ethernet (10 Mbit/s) and by the HPS (high performance switch, 40 MB/s). The machines run AIX 4.1. A set of queues, managed by a job scheduler, is configured on top of the processors. The queues differ in priority, CPU limit, and access to a particular set of processors. The processors are organized into several groups (pools) according to their accessibility and performance. Note that the configuration of the SP2 machine described above is quite recent. At the time the parallel mesh generator was actually implemented, the SP2 machine was equipped with processors of a single type (Power2). Therefore, differences in the performance of individual processors have not been taken into account in the domain decomposition strategy.

The Transtech Paramid machine (UWC Cardiff) possesses 48 Intel i860xp vector processors, each with 16 MB of memory. The communication is based on T805 transputers with a typical speed of 1.2 MB/s. The processors are organized into nodes of three mutually interconnected processors; connection to other processors is realized at the node level. Jobs are spawned from a host machine (a Sparc 10 workstation) on a first-in first-out basis. Parallel jobs can be run on different processor topologies (pipe, grid, or torus). This restricts the number of processors that can be requested for a parallel job, because only specific configurations of a given topology are available. In the presented implementation, the torus topology has been used because it provides the most suitable connections between the nodes with respect to the communication requirements of the mesh generator.

The PC cluster is based on 4 Dell PCs, each containing 2 Pentium II Xeon processors running at either 400 or 450 MHz, with 512 MB of memory and 512 kB of cache. These computers are connected by a 3COM 3300 switch (10/100 Mbit/s Ethernet). The machines run Windows NT 4.0. Note that this cluster represents a heterogeneous parallel computing platform combining shared and distributed memory. It is therefore beneficial to use a message passing library that supports not only TCP/IP communication, used between the machines in the cluster, but also shared memory communication between the processors within a single machine. The MPI/Pro for Windows NT message passing library, which has actually been used, supports both types of communication. The heterogeneity, on the other hand, has not been considered significant, and the system has been treated as homogeneous.

Two different message passing libraries have been used to implement the communication: i) MPI (Message Passing Interface) and ii) Parmacs (Parallel Macros). MPI is the primary message passing library, used for the implementation on the IBM SP2 and on the PC cluster. Since MPI is not available on the Transtech Paramid machine, Parmacs has been chosen there as an alternative message passing library.

MPI offers a full range of tools for point-to-point communication, collective operations, process topologies and groups, and communication contexts. Some other tools, e.g., for task management or remote execution (both available in PVM (Parallel Virtual Machine)), are not included in the current standard specification. One important feature of the point-to-point communication is fairness. MPI guarantees fairness only if exactly two processes are involved in the point-to-point communication in a single-threaded environment. In that case, any two communications between these two processes are ordered and messages do not overtake one another. This guarantees that the message passing code is deterministic. However, fairness is not guaranteed if more than two processes are involved in the communication. It is then possible that a destination process, repeatedly posting a receive that matches a particular send, will never receive that message, because it is overtaken each time by another message sent from another source. The same situation may arise in a multi-threaded process if the semantics of the thread execution does not define the relative order of two send or receive operations executed by two distinct threads.
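
The fairness issue can be illustrated by the following self-contained C fragment (not part of the mesh generator; the message counts and the tag are arbitrary), in which one process receives with MPI_ANY_SOURCE while all other processes flood it with sends. The standard guarantees only the pairwise non-overtaking order; it does not prescribe how the wildcard receives are distributed among the senders.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size, i, buf;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            /* Post wildcard receives. Messages from any single sender
               arrive in the order they were sent (non-overtaking), but
               which sender is matched next is unspecified - with a
               steady flood from some senders, a message from another
               sender may be bypassed arbitrarily long (no fairness). */
            for (i = 0; i < 100 * (size - 1); i++) {
                MPI_Recv(&buf, 1, MPI_INT, MPI_ANY_SOURCE, 0,
                         MPI_COMM_WORLD, &status);
                printf("message %d received from rank %d\n",
                       buf, status.MPI_SOURCE);
            }
        } else {
            /* Every other rank sends 100 messages to rank 0. */
            for (i = 0; i < 100; i++)
                MPI_Send(&i, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }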

The Parmacs message passing library is not nearly as rich as MPI. Most importantly, collective operations are not available. Parmacs only provides the user with the hierarchy of the process spawning tree, and it is up to the user to implement the collective communication on top of it, as shown in the sketch below. The point-to-point communication is also limited to the basic modes (synchronous and asynchronous). The most critical aspect of Parmacs is that the fairness of communication is not guaranteed for the asynchronous mode at all. This seriously complicates the implementation of repeated multiple asynchronous communication.
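
The following C sketch indicates how a user-level broadcast has to be built from point-to-point operations under Parmacs. For illustration, the spawning tree is assumed to be the usual binary numbering of processes 0 to nproc-1; psend() and precv() are hypothetical wrappers around the Parmacs synchronous point-to-point primitives (the actual Parmacs macro names differ).

    /* User-level broadcast over a binary spawning tree, as required
     * under Parmacs, which lacks collective operations.  psend() and
     * precv() are hypothetical stand-ins for the Parmacs synchronous
     * point-to-point primitives; the binary tree over process ids is
     * assumed for illustration (children of id are 2*id+1, 2*id+2). */

    extern void psend(int dest, void *buf, int len);  /* hypothetical */
    extern void precv(int src,  void *buf, int len);  /* hypothetical */

    void tree_broadcast(int id, int nproc, void *buf, int len)
    {
        int left  = 2 * id + 1;      /* children in the binary tree */
        int right = 2 * id + 2;

        if (id > 0)                  /* non-root: get data from parent */
            precv((id - 1) / 2, buf, len);

        if (left  < nproc) psend(left,  buf, len);  /* pass data down */
        if (right < nproc) psend(right, buf, len);
    }

A reduction or gather operation follows the same pattern with the direction of the messages reversed, which is why the lack of built-in collectives mainly costs implementation effort rather than performance.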




Daniel Rypl
2005-12-03