The adopted parallelization strategy is based on the (spatial) domain decomposition concept. From the point of view of the finite element method, there are generally two dual partitioning approaches. Depending on the character of the cut that divides the problem mesh into partitions, one can distinguish between node-cut and element-cut strategies. In the node-cut strategy (Figure 3), the elements are uniquely assigned to individual partitions by leading the cut along element sides. Nodes on these element sides are shared by two or more adjacent partitions and are called shared nodes. In the element-cut strategy (Figure 4), the nodes are uniquely assigned to individual partitions by running the cut across elements. The divided elements are shared by two or more adjacent partitions and are called shared elements. For each partition, those nodes of a shared element that are not assigned to that partition are called remote nodes. Clearly, such configurations require the standard sequential code to be modified so that some data are exchanged between partitions. For a shared node, the contributions from the individual partitions sharing it must be assembled; for a remote node, the nodal values must be communicated from the partition owning it to the other partitions that own a shared element incident to that node. This data exchange can be done quite efficiently because, for a given mesh and its partitioning, the individual communication maps, set up initially on each partition, do not change (unless the problem is repartitioned to recover the load balance).
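The node-cut bookkeeping described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function names `shared_node_maps` and `assemble_shared`, the dictionary-based data layout, and the in-process emulation of the message exchange are all assumptions made for illustration only.

```python
from collections import defaultdict

def shared_node_maps(elements, elem_part):
    """Build static communication maps for a node-cut partitioning.

    elements  : list of node-id tuples, one per element
    elem_part : partition id owning each element (same order)
    Returns {partition: {neighbour partition: [shared node ids]}}.
    """
    # Collect, for every node, the set of partitions whose elements touch it.
    node_parts = defaultdict(set)
    for nodes, part in zip(elements, elem_part):
        for n in nodes:
            node_parts[n].add(part)

    # A node touched by more than one partition is a shared node.
    comm = defaultdict(lambda: defaultdict(list))
    for n in sorted(node_parts):
        parts = node_parts[n]
        if len(parts) > 1:
            for p in parts:
                for q in parts:
                    if p != q:
                        comm[p][q].append(n)
    return {p: dict(nb) for p, nb in comm.items()}

def assemble_shared(local_values, comm):
    """Emulate the exchange: add the neighbours' partial sums at shared nodes.

    local_values : {partition: {node: partial contribution}}
    """
    result = {p: dict(v) for p, v in local_values.items()}
    for p, neighbours in comm.items():
        for q, nodes in neighbours.items():
            for n in nodes:
                result[p][n] += local_values[q][n]
    return result

# Hypothetical 1D mesh: four two-node elements split into two partitions.
elements = [(0, 1), (1, 2), (2, 3), (3, 4)]
elem_part = [0, 0, 1, 1]
comm = shared_node_maps(elements, elem_part)

# Each element contributes 1.0 to each of its nodes (partial sums per partition).
local_values = {0: {0: 1.0, 1: 2.0, 2: 1.0}, 1: {2: 1.0, 3: 2.0, 4: 1.0}}
assembled = assemble_shared(local_values, comm)
```

Because the maps depend only on the mesh and its partitioning, they are built once and reused for every exchange, which is the property that makes the communication cheap.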
When a nonlocal constitutive model is considered, additional issues have to be addressed. Generally, the response of the nonlocal material model at a given material point depends on other points in its neighbourhood. However, if such a point (typically an integration point) is close to the partition boundary, its response depends on off-partition points. The evaluation of a nonlocal quantity therefore requires assembling local contributions from points on the same partition as well as remote contributions from points on adjacent (and possibly even non-adjacent) partitions. On distributed-memory computing platforms this would lead to an extremely fine-grained communication pattern, which is not acceptable. To remove this remote dependency, each partition mirrors a band of elements that are owned by other partitions. These elements are called remote elements and their nodes remote nodes. The width of the band is given by the interaction radius used in the nonlocal averaging. Partitions possessing remote elements and remote nodes must communicate with the partitions for which these elements and nodes are local in order to keep them up to date. This requires only the introduction of additional communication maps and uses the standard communication scheme. The nonlocal averaging for all material points local to a given partition is then performed locally on that partition. Keep in mind that the efficiency of this approach, which relies on the limited support of the weight function, depends on the size of the interaction radius.
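The role of the mirrored band can be sketched in a few lines of Python. The sketch works on integration points rather than whole elements, uses a brute-force distance search, and adopts a truncated quartic weight function; all three are simplifying assumptions, and the names `remote_band`, `bell_weight`, and `nonlocal_average` are hypothetical.

```python
import math

def remote_band(points, owner, part, radius):
    """Return (local, band): indices of points owned by `part`, and indices of
    off-partition points lying within `radius` of at least one local point."""
    local = [i for i, o in enumerate(owner) if o == part]
    band = [
        j for j, o in enumerate(owner)
        if o != part
        and any(math.dist(points[i], points[j]) <= radius for i in local)
    ]
    return local, band

def bell_weight(r, radius):
    """Weight with limited support (assumed form): zero for r >= radius."""
    return (1.0 - (r / radius) ** 2) ** 2 if r < radius else 0.0

def nonlocal_average(points, values, local, band, radius):
    """Normalized weighted average at each local point, using only data that
    is available on the partition (local points plus the mirrored band)."""
    available = local + band
    averaged = {}
    for i in local:
        weights = [bell_weight(math.dist(points[i], points[j]), radius)
                   for j in available]
        weighted_sum = sum(w * values[j] for w, j in zip(weights, available))
        averaged[i] = weighted_sum / sum(weights)
    return averaged

# Hypothetical 1D example: four points, two partitions, interaction radius 1.5.
points = [(0.0,), (1.0,), (2.0,), (3.0,)]
owner = [0, 0, 1, 1]
values = [1.0, 2.0, 3.0, 4.0]
local, band = remote_band(points, owner, 0, 1.5)
averaged = nonlocal_average(points, values, local, band, 1.5)
```

In this example only the point at x = 2 has to be mirrored on partition 0. The point at x = 3 lies outside the support of every local weight function, so omitting it does not change the averaged values.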
From the computational point of view, the node-cut strategy is usually considered more efficient due to the smaller amount of interprocessor data transfer and because the duplicated processing of shared elements is avoided.
The above scheme with a mirrored band of elements is also well suited to efficient parallelization of the employed error estimation. For the node-cut concept, the error estimation on element patches is always local. The nodal patch problem, on the other hand, is local only if it is related to a local node. At shared nodes, the nodal patch error assessment needs access to off-processor data. However, if it is ensured that the mirrored band of elements always contains at least one layer of elements (in other words, all elements incident to a shared node are mirrored on the appropriate partition(s)), then the shared node patch problems can also be solved locally. In the case of element-cut, the error estimation on element patches as well as on nodal patches is local. Note that there is no need to evaluate the error on remote element patches or remote node patches. This makes the element-cut concept more appropriate for parallel error estimation.
Since the mirrored elements are kept up to date by the analysis itself, the overall communication of the parallel error estimator consists only of the exchange of the error and solution norms summed on the individual partitions. Note, however, that in the case of the element-cut strategy it is more convenient to form the sums over local nodal patches, which avoids duplicated contributions from shared elements.
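The reduction of partition sums can be sketched as follows. The sketch assumes that squared error and solution norm contributions are stored per entity (element or nodal patch) and that the global relative error is taken as the square root of the ratio of the summed squared norms; both the definition and the names `partition_sums` and `global_relative_error` are assumptions for illustration, and the reduction is emulated in-process rather than performed by message passing.

```python
import math

def partition_sums(err_sq, sol_sq, owned):
    """Squared error and solution norms summed over the entities in `owned`."""
    return (sum(err_sq[k] for k in owned), sum(sol_sq[k] for k in owned))

def global_relative_error(per_partition):
    """Emulate the single reduction: add the partition sums, then take the root."""
    e2 = sum(e for e, _ in per_partition)
    u2 = sum(u for _, u in per_partition)
    return math.sqrt(e2 / u2)

# Hypothetical per-entity squared norms for four entities.
err_sq = [0.01, 0.09, 0.01, 0.01]
sol_sq = [1.0, 1.0, 1.0, 1.0]

# Each entity is owned by exactly one partition: no contribution is duplicated.
unique = [partition_sums(err_sq, sol_sq, [0, 1]),
          partition_sums(err_sq, sol_sq, [2, 3])]
eta = global_relative_error(unique)

# If entity 1 were a shared element summed by both partitions, its
# contribution would be counted twice and the global norm would be biased.
duplicated = [partition_sums(err_sq, sol_sq, [0, 1]),
              partition_sums(err_sq, sol_sq, [1, 2, 3])]
eta_biased = global_relative_error(duplicated)
```

Summing over entities that are uniquely owned, such as local nodal patches in the element-cut strategy, keeps every contribution in exactly one partition sum, so only two scalars per partition need to be exchanged.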