A 3D notched specimen has been analyzed in three-point bending using direct explicit integration. The specimen geometry is shown in Fig. 12. Initially, the constitutive model is a nonlocal variant of the rotating crack model. Once the cracking process reaches a certain critical state (identified by the ratio of principal stress to tensile strength and by the ratio of current shear stiffness to shear modulus), the procedure switches to a damage-type formulation. The final stage is then described by a damage model that uses the anisotropic stiffness multiplied by a scalar factor, which decays to zero as cracking continues. The constitutive properties are summarized in Table 1. To simulate a static test, the specimen loading has been controlled by the prescribed displacement of a transverse edge in the middle of the top surface of the specimen, with the displacement determined from the requirement of minimal inertia forces. The mesh contains 1964 nodes and 9324 linear tetrahedral elements. The total number of time steps analyzed is 7500. The modified node-cut strategy (allowing for the nonlocal material model) has been used.
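As a rough illustration of what direct explicit integration involves, the following minimal sketch advances a single degree of freedom with the central difference scheme. The choice of central differences is an assumption (the text only says "direct explicit integration"), and the function name `central_difference` and the callable `internal_force` are hypothetical. With a lumped (diagonal) mass, each step needs only an internal force evaluation and no equation solving, which is what makes the approach attractive for parallel computation.

```python
import math


def central_difference(internal_force, mass, u0, v0, dt, n_steps):
    """Integrate m*a + f_int(u) = 0 with the explicit central difference scheme.

    internal_force: callable returning the internal force for displacement u
                    (hypothetical stand-in for the constitutive model).
    Returns the list of displacements at steps 0..n_steps.
    """
    a0 = -internal_force(u0) / mass
    # Fictitious displacement at step -1, obtained from a Taylor expansion.
    u_prev = u0 - dt * v0 + 0.5 * dt * dt * a0
    u = u0
    history = [u0]
    for _ in range(n_steps):
        a = -internal_force(u) / mass           # acceleration from equilibrium
        u_next = 2.0 * u - u_prev + dt * dt * a  # explicit update, no solve
        u_prev, u = u, u_next
        history.append(u)
    return history


# Linear spring with unit stiffness and mass: the exact solution is cos(t).
displacements = central_difference(lambda u: 1.0 * u, 1.0, 1.0, 0.0, 0.01, 100)
print(displacements[-1], math.cos(1.0))
```

The scheme is only conditionally stable: for a linear oscillator with angular frequency omega, the time step must stay below 2/omega.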
The mesh partitioning implementation is based on the METIS partitioning library. A general front-end application to METIS, serving simultaneously as a data converter between the (sequential) mesh generator and the object-oriented computational code, has been written. This application first transforms the general mesh into an appropriate graph structure, according to the selected cut strategy. A METIS graph partitioning routine is then used to obtain the mesh partitioning, which is further modified to account for the zones involved in the averaging algorithms. The partitions have been generated prior to the analysis and kept constant throughout the whole analysis (static load balancing). An example of the domain decomposition for a 4-processor analysis is depicted in Fig. 13.
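The mesh-to-graph step can be sketched as follows. This is a minimal illustration, not the front-end described above: it assumes that the node-cut strategy amounts to partitioning the dual graph of the mesh (elements as graph vertices, an edge between two tetrahedra that share a face), and the function name `dual_graph_csr` is hypothetical. The result uses the compressed adjacency layout (the `xadj` and `adjncy` arrays) that METIS expects as graph input; the partitioning call itself and the subsequent modification for the nonlocal averaging zones are omitted.

```python
from collections import defaultdict
from itertools import combinations


def dual_graph_csr(tets):
    """Build the element (dual) graph of a tetrahedral mesh in CSR form.

    tets: list of 4-tuples of node ids, one tuple per element.
    Two elements are adjacent when they share a triangular face.
    Returns (xadj, adjncy): the neighbours of element e are
    adjncy[xadj[e]:xadj[e + 1]].
    """
    # Map every triangular face (sorted node triple) to the elements owning it.
    face_to_elems = defaultdict(list)
    for elem, nodes in enumerate(tets):
        for face in combinations(sorted(nodes), 3):
            face_to_elems[face].append(elem)

    # An interior face is shared by exactly two elements.
    neighbours = [set() for _ in tets]
    for elems in face_to_elems.values():
        if len(elems) == 2:
            a, b = elems
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Flatten the adjacency sets into the CSR arrays.
    xadj, adjncy = [0], []
    for nbrs in neighbours:
        adjncy.extend(sorted(nbrs))
        xadj.append(len(adjncy))
    return xadj, adjncy


# Three tetrahedra in a chain: 0-1 share face (1,2,3), 1-2 share face (2,3,4).
xadj, adjncy = dual_graph_csr([(0, 1, 2, 3), (1, 2, 3, 4), (2, 3, 4, 5)])
print(xadj, adjncy)  # [0, 1, 3, 4] [1, 0, 2, 1]
```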
The results achieved on the Dell PC cluster and the SP2 machine are presented in Figs. 14 and 15, respectively. Note that the heterogeneity of the computing platforms has been taken into account neither in the mesh partitioning (all partitions are equally load balanced) nor in the speedup or efficiency evaluation. Since the single-processor computation has always been performed on the most powerful processor, the speedup is slightly underestimated whenever a slower processor has participated in the calculation. The speedup profile is also degraded by the adopted static load balancing. Since the computational complexity in some regions increases considerably during the analysis (strain softening), the load balance is disturbed, leaving the less loaded processors idle. This effect becomes more significant as the number of processors increases. Despite these facts, the achieved speedup and efficiency are significant, leading to a considerable reduction of the computational time.
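The quantities discussed above can be made concrete with a short sketch. It uses the standard definitions (speedup as the ratio of single-processor time to parallel time, efficiency as speedup divided by the number of processors) and a simple idle-fraction measure for an unbalanced step. The function names and all numbers in the usage lines are hypothetical illustrations, not measurements from Figs. 14 and 15.

```python
def speedup_and_efficiency(t_serial, t_parallel, n_procs):
    """Return (speedup, efficiency) for one parallel run.

    speedup    = t_serial / t_parallel
    efficiency = speedup / n_procs  (values above 1 indicate superlinear speedup)
    """
    speedup = t_serial / t_parallel
    return speedup, speedup / n_procs


def idle_fraction(loads):
    """Fraction of processor time spent idle in one synchronized step.

    loads: per-processor work (for example, seconds of computation).
    The step lasts as long as the most loaded processor, so the
    remaining processors wait for max(loads) - load each.
    """
    step_time = max(loads)
    mean_load = sum(loads) / len(loads)
    return 1.0 - mean_load / step_time


# Hypothetical timings: 100 s on one processor, 30 s on four.
print(speedup_and_efficiency(100.0, 30.0, 4))

# Hypothetical loads: softening doubles the work in one partition only.
print(idle_fraction([10.0, 10.0, 10.0, 20.0]))
```

With static partitions, the idle fraction grows whenever the work in one partition increases during the analysis, which is the mechanism behind the degraded speedup profile described above.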
The same problem was solved using the microplane material model M4. The model geometry (see Fig. 16) was slightly modified to enable the use of a uniform structured mesh. This is necessary to ensure that the material properties (summarized in Table 2) are the same for each element; otherwise, a separate fitting procedure would be required for each element to specify appropriate material properties. Note that this dependence of the material properties on the element size can be eliminated by introducing a nonlocal version of the microplane model. Again, the static loading was simulated by a prescribed displacement of transverse edges on the top surface of the specimen.
The structured mesh contains 2772 nodes and 2030 linear brick elements (each with 8 integration points). The analysis has been performed using 7500 time increments. The load-time history of the reaction at the prescribed outer node is depicted in Fig. 17. The achieved computation times and speedups on the PC cluster are presented in Fig. 18. Note that superlinear speedup has been achieved, which can be explained by the larger amount of available cache and by the computation-to-communication ratio remaining high.