
Example

A 3D notched specimen has been analyzed in three-point bending using direct explicit integration. The specimen geometry is shown in Fig. 4. The employed constitutive model is a nonlocal variant of the rotating crack model with transition to scalar damage (see Section 3). The constitutive properties are summarized in Table 4. In order to simulate a static test, the specimen loading has been controlled by the prescribed displacement of a transverse edge in the middle of the top specimen surface (see Fig. 4), which has been determined from the requirement of minimal inertia forces [9]. The mesh contains 1964 nodes and 9324 linear tetrahedral elements. The total number of time steps analyzed is 7500. The modified node-cut strategy has been used. The partitions have been generated prior to the analysis and have been kept constant throughout the whole analysis (static load balancing). An example of the domain decomposition for a 4-processor analysis is depicted in Fig. 5.
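As a rough illustration of what one explicit time step involves, the following sketch advances a lumped-mass system by the central difference (leapfrog) scheme. This is an assumption made for illustration: the text above only states that direct explicit integration has been used and does not restate the scheme. The names `explicit_steps` and `internal_force` are hypothetical, and external loads, damping and prescribed displacements are left out to keep the sketch short.

```python
def explicit_steps(u, v, mass, internal_force, dt, n_steps):
    """Advance M*a = -f_int(u) by n_steps central difference steps.

    u, v           : initial displacements and velocities (lists)
    mass           : lumped (diagonal) mass entries, one per degree of freedom
    internal_force : callable returning the internal force vector for given u
    Returns the displacements after n_steps steps.
    """
    u = list(u)
    # Start-up: shift the velocity to the half step, v_{1/2} = v_0 + dt/2 * a_0.
    f = internal_force(u)
    v_half = [vi - 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, mass)]
    for _ in range(n_steps):
        # Displacement update uses the half-step velocity.
        u = [ui + dt * vh for ui, vh in zip(u, v_half)]
        # With a diagonal mass matrix no linear system has to be solved.
        f = internal_force(u)
        v_half = [vh - dt * fi / mi for vh, fi, mi in zip(v_half, f, mass)]
    return u


# Hypothetical check: unit mass on a unit linear spring, released from u = 1.
# After roughly one period (2*pi) the displacement should be close to 1 again.
u_end = explicit_steps([1.0], [0.0], [1.0], lambda u: [1.0 * u[0]], 0.001, 6283)
print(u_end[0])
```

Because each step needs only the evaluation of internal forces and a division by the lumped mass, the work per step is local to the elements of a partition, which is what makes a fixed domain decomposition a natural fit for this kind of analysis.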

The analysis has been performed on two different parallel hardware platforms: a PC cluster and an IBM SP2 machine. The PC cluster consists of four DELL 610 workstations, each equipped with two processors. Two workstations contain dual PII Xeon processors at 450 MHz with 512 MB of shared system memory and the remaining two comprise dual PII Xeon processors at 400 MHz with 512 MB of shared memory. The workstations are connected by a 100 Mb Fast Ethernet network using a 3Com Superstack II switch, model 3300. All workstations are running the Windows NT 4.0 operating system. Note that this cluster represents a heterogeneous parallel computing platform with a combination of shared and distributed memory. The communication is based on the MPI/Pro for Windows NT message passing library (MPI Software Technology, Inc.¹), which supports both distributed and shared memory communication. The IBM SP2 (installed at the CTU computing centre) is a heterogeneous machine equipped with P2SC processors running at 120 and 160 MHz and having at least 128 MB of system memory. The SP nodes, running the AIX 4.1 operating system, are connected by an HPS switch, allowing simultaneous bidirectional transfer of 40 MB/sec between any two nodes. The communication is based on MPI built on top of the native MPL message passing library.
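To put the two interconnects side by side, the sketch below estimates the time needed to send one message over each of them from the nominal figures quoted above. It is only a back-of-the-envelope model: latency is ignored by default, 1 MB is taken as 10^6 bytes, and the message size (three double-precision values for each of 200 shared nodes) is a hypothetical choice, not a value from the analysis.

```python
def transfer_time(n_bytes, bandwidth_bytes_per_s, latency_s=0.0):
    """Simple linear cost model: latency plus size divided by bandwidth."""
    return latency_s + n_bytes / bandwidth_bytes_per_s


# Nominal peak rates taken from the hardware description (assumed achievable).
FAST_ETHERNET = 100e6 / 8   # 100 Mb/s, i.e. 12.5 MB/s
HPS_SWITCH = 40e6           # 40 MB/s between any two SP nodes

# Hypothetical message: 200 shared nodes, 3 doubles (8 bytes) each.
message_bytes = 200 * 3 * 8

t_ethernet = transfer_time(message_bytes, FAST_ETHERNET)
t_hps = transfer_time(message_bytes, HPS_SWITCH)
print(t_ethernet, t_hps, t_ethernet / t_hps)
```

Under these assumptions the ratio of the two times equals the ratio of the nominal bandwidths. For short messages the real cost is typically dominated by latency, which the hardware description does not quote and which this model therefore leaves as a parameter.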

The results achieved on the PC cluster and on the SP2 machine are presented in Tables 5 and 6 and in Figs. 6 and 7, respectively. Note that the heterogeneity of the computing platforms has been taken into account neither in the mesh partitioning (all partitions carry an equal load) nor in the speedup or efficiency evaluation. Since the single-processor computation has always been performed on the most powerful processor, the speedup is slightly underestimated whenever a slower processor has participated in the calculation. The degradation of the speedup profile is also caused by the adopted static load balancing. Since the computational complexity in some regions increases considerably during the analysis (strain softening), the load balance is disturbed, leaving the less loaded processors idle. This effect becomes more significant as the number of processors increases. Despite these facts, the achieved speedup and efficiency are significant, leading to a considerable reduction of the computational time.
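The speedup and efficiency referred to above are assumed here to follow the usual definitions, S = T1 / Tp and E = S / p, where T1 is the single-processor time and Tp the time on p processors. The sketch below also shows why equal partitions on processors of unequal speed cap the attainable speedup. It assumes that processor speed scales with clock rate and that communication is free, both of which are simplifications. All function names are hypothetical and the timings in the example are made up, not taken from Tables 5 and 6.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T1 / Tp."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Efficiency E = S / p."""
    return speedup(t_serial, t_parallel) / n_procs


def equal_partition_speedup_bound(speeds):
    """Upper bound on speedup for equally sized partitions.

    The serial reference runs on the fastest processor (time W / max speed),
    while the parallel run waits for the slowest one (time (W / p) / min speed).
    Communication and load imbalance due to strain softening are ignored.
    """
    p = len(speeds)
    return p * min(speeds) / max(speeds)


# Made-up timings, for illustration only.
print(speedup(100.0, 40.0), efficiency(100.0, 40.0, 4))

# Four processors, two at 450 MHz and two at 400 MHz, as in the PC cluster.
bound = equal_partition_speedup_bound([450.0, 450.0, 400.0, 400.0])
print(bound)
```

With the clock rates of the PC cluster the bound is 4 * 400 / 450, i.e. about 3.56 rather than 4, which illustrates the slight underestimation of speedup mentioned in the text whenever a slower processor takes part.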



Footnotes

¹ MPI Software Technology, Inc.: www.mpi-softech.com



Daniel Rypl
2005-12-03