To demonstrate the parallel performance of the algorithm, a set of examples is presented: a junction of two pipes (Fig. 2), a chair (Fig. 3), and a mechanical joint (Fig. 4). The junction has been discretized by two uniform three-dimensional meshes, the smaller containing 152,186 nodes and the larger 518,929 nodes. Similarly, two uniform two-dimensional meshes comprising 338,512 and 1,045,504 nodes, respectively, have been generated to discretize the model of the chair. The mechanical joint has been discretized by four three-dimensional meshes: two uniform meshes containing 162,300 and 524,979 nodes, respectively, and two graded meshes comprising 165,771 and 574,053 nodes, respectively. Note that the number of nodes rather than the number of elements is used to describe the size of a mesh, because the number of elements is slightly misleading owing to the mixed nature of the meshes.

Since the meshes are generally too large for the memory available on the Transtech Paramid machine, only the results from the IBM SP2 and the PC cluster are presented. The size of the smaller mesh has always been chosen with respect to the memory available on the SP2 machine so that the discretization can be completed on a single processor, which is important for the speedup evaluation. The larger mesh is about three times as big as the smaller one. For the larger mesh, the speedup on the SP2 machine has been calculated only approximately, by estimating the time required to complete the discretization on a single processor. This estimate is based on the number of generated nodes and takes into account the linear computational complexity of the algorithm. The speedup on the PC cluster is always evaluated using the single-processor time obtained on the faster processor (450 MHz). This results in a slightly underestimated speedup whenever a slower processor is also involved in the computation.
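The single-processor estimate for the larger mesh can be illustrated by a minimal sketch, assuming that the measured time on the smaller mesh is simply scaled by the ratio of node counts (which is what linear complexity implies). The function name and the sample timing are hypothetical and serve only as an illustration; the node counts are those of the two junction meshes quoted above.

```python
def estimate_single_processor_time(t_small: float, nodes_small: int, nodes_large: int) -> float:
    """Scale a measured single-processor time by the node-count ratio.

    Assumes the run time grows linearly with the number of generated nodes.
    """
    if t_small < 0 or nodes_small <= 0 or nodes_large <= 0:
        raise ValueError("time must be non-negative and node counts positive")
    return t_small * nodes_large / nodes_small


# Node counts of the smaller and larger junction mesh (from the text).
NODES_SMALL = 152_186
NODES_LARGE = 518_929

# Hypothetical measured time in seconds; NOT a value taken from the tables.
t_small_measured = 100.0

t_large_estimated = estimate_single_processor_time(
    t_small_measured, NODES_SMALL, NODES_LARGE
)
print(f"estimated single-processor time: {t_large_estimated:.1f} s")
```

Any speedup derived from such an estimate is only approximate, since the time for the larger mesh on one processor is never actually measured.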
Since the master-slave parallel computing scheme has been adopted, a separate processor must be allocated for the master process. Note, however, that the master processor has not been considered in the evaluation of the speedup and efficiency. This is justified because it has been verified that the master process can run together with one slave process on the same processor without affecting the performance. Despite this fact, the speedup on the PC cluster was evaluated only up to 7 slave processors due to license restrictions limiting the total number of processes to 8. The execution times, speedups, and efficiencies are summarized in Tables 1 - 8. Note that speedups and efficiencies calculated only approximately are marked by an asterisk.
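The evaluation described above can be sketched as follows, assuming the usual definitions: speedup is the single-processor time divided by the parallel time, and efficiency is the speedup divided by the number of slave processors (the master is excluded, as stated in the text). The function names, the row layout, and the sample timings are hypothetical and are not taken from the tables.

```python
def speedup_and_efficiency(t_single: float, t_parallel: float, n_slaves: int) -> tuple[float, float]:
    """Return (speedup, efficiency) for a run on n_slaves slave processors.

    The master processor is deliberately not counted in n_slaves.
    """
    if t_single <= 0 or t_parallel <= 0 or n_slaves <= 0:
        raise ValueError("times and slave count must be positive")
    speedup = t_single / t_parallel
    efficiency = speedup / n_slaves
    return speedup, efficiency


def format_row(n_slaves: int, t_parallel: float, t_single: float, approximate: bool = False) -> str:
    """Format one result row; an asterisk marks values based on an estimated t_single."""
    speedup, efficiency = speedup_and_efficiency(t_single, t_parallel, n_slaves)
    mark = "*" if approximate else ""
    return f"{n_slaves:>2d}  {t_parallel:8.1f}  {speedup:6.2f}{mark}  {efficiency:5.2f}{mark}"


# Hypothetical timings in seconds, for illustration only.
t_single = 120.0
for n_slaves, t_parallel in [(2, 62.0), (4, 33.0), (7, 20.0)]:
    print(format_row(n_slaves, t_parallel, t_single, approximate=True))
```

Passing `approximate=True` mirrors the asterisk convention used for results whose single-processor time was estimated rather than measured.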

*Daniel Rypl
2005-12-03*