
Parallel Mesh Generation

The mesh generation uses a tree-based approach and is designed for parallel processing on distributed-memory computing platforms [15,16]. The parallelization strategy is based on the domain decomposition concept. Two levels of domain decomposition have been considered: the model level and the model entity parametric tree level. The discretization is accomplished by fitting templates into the cells of a generalized parametric tree data structure built over the individual model entities. The compatibility of the tree structures on the processor interface is ensured by an iterative process. The actual parallel computing scheme is based on the master and slaves paradigm. Since a dynamic load balancing mechanism is employed, an even distribution of the work load among the processors is ensured. A very favourable ratio between computation and communication has been achieved and a considerable speedup has been observed.

The algorithm has been successfully implemented on several parallel computing platforms: IBM SP2, IBM SP, Transtech Paramid and a Dell PC cluster. Two message passing libraries have been used to implement the communication: (i) MPI (Message Passing Interface) and (ii) Parmacs (Parallel Macros). MPI is the primary message passing library, used for the implementation on the IBM SP2, the IBM SP and the PC cluster. Since MPI is not available on the Transtech Paramid machine, Parmacs has been chosen there as an alternative message passing library.

To demonstrate the parallel performance of the algorithm, a set of examples is presented: a chair (Fig. 1), a mechanical joint (Fig. 2) and a junction of two pipes (Fig. 3). The chair has been discretized by two uniform two-dimensional meshes. The smaller one contains 338,512 nodes and the larger one 1,045,504 nodes. Similarly, two uniform three-dimensional meshes comprising 152,186 and 518,929 nodes, respectively, have been generated to discretize the model of the junction.
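
The effect of dynamic load balancing in a master and slaves scheme can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the implementation used in the mesh generator: the function names, the task costs and the round-robin baseline are all hypothetical, and communication cost is ignored. The master hands the next task to whichever slave becomes idle first, and the result is compared with a fixed assignment made in advance.

```python
import heapq


def dynamic_schedule(task_costs, n_slaves):
    """Simulate a master handing out tasks on demand.

    Each slave receives the next task as soon as it becomes idle.
    Returns (makespan, per-slave busy time). Communication is ignored.
    """
    # Heap of (time at which the slave becomes idle, slave id).
    idle_at = [(0.0, s) for s in range(n_slaves)]
    heapq.heapify(idle_at)
    busy = [0.0] * n_slaves
    for cost in task_costs:
        t, s = heapq.heappop(idle_at)  # earliest idle slave gets the task
        busy[s] += cost
        heapq.heappush(idle_at, (t + cost, s))
    makespan = max(t for t, _ in idle_at)
    return makespan, busy


def static_schedule(task_costs, n_slaves):
    """Baseline: tasks assigned round-robin in advance, regardless of cost."""
    busy = [0.0] * n_slaves
    for i, cost in enumerate(task_costs):
        busy[i % n_slaves] += cost
    return max(busy), busy


# Hypothetical task costs (e.g. meshing times of model entities);
# these numbers are illustrative and not taken from the paper.
costs = [8, 1, 1, 1, 8, 1, 1, 1, 8, 1, 1, 1]
dyn_makespan, dyn_busy = dynamic_schedule(costs, 4)
sta_makespan, sta_busy = static_schedule(costs, 4)
print("dynamic:", dyn_makespan, dyn_busy)
print("static: ", sta_makespan, sta_busy)
```

For these made-up costs the on-demand assignment finishes in 10 time units, whereas the fixed round-robin assignment needs 24, because all expensive tasks happen to fall on the same slave.
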
The mechanical joint has been discretized by four three-dimensional meshes. Two uniform meshes contain 162,300 and 524,979 nodes, respectively, and two graded meshes comprise 165,771 and 574,053 nodes, respectively. Since the meshes are generally too large for the memory available on the Transtech Paramid machine, only the results from the IBM SP and the PC cluster are presented.

Since the master and slaves parallel computing scheme has been adopted, a separate processor must be allocated for the master process. Note, however, that the master processor has not been counted in the evaluation of the speedup and efficiency. This is justified because it has been verified (on the IBM SP2 only) that the master process can run together with one slave process on the same processor without impact on the performance. Despite this fact, the speedup on the PC cluster was evaluated only up to a limited number of slave processors, because license restrictions limit the total number of processes to 8 (under Windows NT). The execution times and speedups for the individual meshes and the hardware and software platforms are summarized in Figs. 4-11. Note that the speedup on the PC cluster is always evaluated using the single-processor time obtained on the faster processor (450 MHz). This results in a slightly underestimated speedup if a slower processor was also involved in the computation.
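
The way speedup and efficiency are evaluated here can be written down in a few lines. The sketch below assumes the usual definitions (speedup as single-processor time divided by parallel time, efficiency as speedup divided by the number of processors) and counts only the slave processors, as described above. The function name and all timings are hypothetical and are not taken from the figures.

```python
def speedup_and_efficiency(t_single, t_parallel, n_slaves):
    """Return (speedup, efficiency), counting slave processors only.

    The master processor is deliberately excluded from n_slaves.
    """
    if t_single <= 0 or t_parallel <= 0 or n_slaves < 1:
        raise ValueError("times must be positive and n_slaves at least 1")
    speedup = t_single / t_parallel
    efficiency = speedup / n_slaves
    return speedup, efficiency


# Hypothetical timings in seconds (illustrative only).
t_fast = 120.0   # single-processor time on the faster processor
t_slow = 150.0   # single-processor time on a slower processor
t_par = 20.0     # parallel time on a mix of fast and slow processors

# Reference taken on the faster processor, as done for the PC cluster.
s_fast, e_fast = speedup_and_efficiency(t_fast, t_par, n_slaves=8)
# The same run measured against the slower processor's reference time.
s_slow, e_slow = speedup_and_efficiency(t_slow, t_par, n_slaves=8)
print(s_fast, e_fast)
print(s_slow, e_slow)
```

Because the reference time of the faster processor is the smaller of the two, the speedup computed from it is the lower value, which is why it slightly underestimates the speedup of a run that also involved slower processors.
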


Daniel Rypl
2005-12-03