The analysis has been performed on two different parallel hardware platforms: a PC cluster and an IBM SP2 machine. The PC cluster consists of four DELL 610 workstations, each equipped with two processors. Two workstations contain dual PII Xeon processors running at 450 MHz with 512 MB of shared system memory, and the remaining two contain dual PII Xeon processors at 400 MHz, also with 512 MB of shared memory. The workstations are connected by a 100 Mb/s Fast Ethernet network using a 3Com SuperStack II switch, model 3300. All workstations run the Windows NT 4.0 operating system. Note that this cluster represents a heterogeneous parallel computing platform combining shared and distributed memory. The communication is based on the MPI/Pro for Windows NT message passing library (MPI Software Technology, Inc.), which supports both distributed and shared memory communication. The IBM SP2 (installed at the CTU computing centre) is a heterogeneous machine equipped with P2SC processors running at 120 and 160 MHz, each node having at least 128 MB of system memory. The SP nodes, running the AIX 4.1 operating system, are connected by an HPS switch allowing simultaneous bidirectional transfer of 40 MB/s between any two nodes. The communication is based on MPI built on top of the native MPL message passing library.
The results achieved on the PC cluster and the SP2 machine are presented in Tables 5 and 6 and in Figs 6 and 7, respectively. Note that the heterogeneity of the computing platforms has been taken into account neither in the mesh partitioning (all partitions are equally load balanced) nor in the speedup or efficiency evaluation. Since the single-processor computation was always performed on the most powerful processor, the speedup is slightly underestimated whenever a slower processor participated in the calculation. The degradation of the speedup profile is also caused by the adopted static load balancing. Since the computational complexity in some regions increases considerably during the analysis (due to strain softening), the load balance is disturbed, leaving the less loaded processors idle. This effect becomes more significant as the number of processors increases. Despite these facts, the achieved speedup and efficiency are significant, leading to a considerable reduction of the computational time.
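To make the evaluation above explicit, the standard definitions of speedup, S_p = T_1/T_p, and parallel efficiency, E_p = S_p/p, can be sketched as follows. This is a minimal illustration only; the wall-clock timings used here are hypothetical and are not the measured values reported in Tables 5 and 6.

```python
# Sketch of the standard speedup/efficiency evaluation used in the text.
# All timing values below are hypothetical placeholders, not measured data.

def speedup(t_serial: float, t_parallel: float) -> float:
    """S_p = T_1 / T_p, with T_1 measured on the most powerful processor."""
    return t_serial / t_parallel

def efficiency(t_serial: float, t_parallel: float, n_procs: int) -> float:
    """E_p = S_p / p; values below 1 reflect communication overhead,
    heterogeneity, and the load imbalance discussed above."""
    return speedup(t_serial, t_parallel) / n_procs

if __name__ == "__main__":
    # Hypothetical wall-clock times (seconds) for 1, 2, 4, and 8 processors.
    timings = {1: 1000.0, 2: 520.0, 4: 280.0, 8: 160.0}
    t1 = timings[1]
    for p, tp in timings.items():
        print(f"p={p}: S={speedup(t1, tp):.2f}, E={efficiency(t1, tp, p):.2f}")
```

Note that because T_1 is taken on the fastest processor while slower processors participate in the parallel runs, the speedup computed this way slightly underestimates the true parallel performance of the heterogeneous configurations.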