
Introduction

Although the performance of today's hardware is steadily increasing, the application of distributed parallel algorithms is often the only way to make the solution of large realistic problems feasible, both in terms of time and storage. Generally, to achieve high parallelism with favorable load balancing, such algorithms should satisfy the following goals: i) most of the algorithm should be performed in parallel (preferably using all available processors), ii) the volume and number of interprocessor communications should be minimized. From this point of view, most sequential algorithms are inherently unsuitable for parallelization, and it is necessary to compromise between the requirements stated above. This is also the case for classical unstructured mesh generation strategies. Since the cost of computation and communication differs from one hardware platform to another, it is even more difficult to find an acceptable parallelization strategy portable to different platforms.

Most parallelization strategies are based on some kind of decomposition in space or in time. This is also true for mesh generation algorithms, for which space is the dominant characteristic, leading to the domain decomposition concept. To respect the above mentioned requirements for effective parallelization and load balancing (not only of the mesh generation but also of the subsequent computational analysis), the domain decomposition should fulfill the following conditions: i) the individual domains should be of approximately the same complexity (taking into account the performance of the processors), ii) the interface between the domains should be minimized. However, domain decomposition for mesh generation algorithms is not straightforward. Firstly, some background data structure is required both for the estimation of domain complexity and for the domain decomposition itself. Secondly, the application of sophisticated decomposition algorithms, such as ORB, IRB, or SRB, may become unsatisfactory when only a rough complexity prediction is available. Therefore the first condition is usually weakened (allowing for dynamic load balancing) to: the total workload of the domains processed on each processor should be approximately the same (taking into account the performance of the processor).
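The weakened condition above admits a simple greedy realization. As an illustration only (not part of the cited algorithms), the following sketch assigns domains of estimated cost to processors of given relative speed using the longest-processing-time heuristic, so that the scaled load per processor ends up approximately equal; the function and parameter names are our own:

```python
import heapq

def assign_domains(domain_costs, proc_speeds):
    """Greedy longest-processing-time assignment of domains to processors.

    domain_costs: estimated workload of each domain
    proc_speeds:  relative performance of each processor
    Returns a list mapping domain index -> processor index.
    """
    # min-heap of (current scaled load, processor index)
    heap = [(0.0, p) for p in range(len(proc_speeds))]
    heapq.heapify(heap)
    assignment = [None] * len(domain_costs)
    # place the most expensive domains first (LPT heuristic)
    for d in sorted(range(len(domain_costs)), key=lambda i: -domain_costs[i]):
        load, p = heapq.heappop(heap)
        assignment[d] = p
        # a faster processor accumulates load more slowly
        heapq.heappush(heap, (load + domain_costs[d] / proc_speeds[p], p))
    return assignment
```

In practice the cost estimates are only approximate, which is exactly why dynamic load balancing remains necessary on top of any such static assignment.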

In the octree based algorithms [1,4,2], a single global octree data structure, used also for spatial localization and mesh size control, serves as the means of domain decomposition. A set of octants is assigned to each processor on the basis of recursive bisection. The compatibility of the final mesh on the interprocessor interfaces is ensured simply by the use of appropriate templates (predefined sets of elements filling an octant entirely) in interior octants. Since the octree contains information frequently accessed by all of the domains, storing it on a single processor would require an excessive number of fine-grain communications. Therefore the whole octree data structure is stored on each processor, which results in the allocation of a non-negligible amount of memory on each processor. Moreover, the distribution of the tree among the processors in terms of its subtrees is usually far from the ideal decomposition and induces further communication whenever the octree is traversed.
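The recursive bisection of octants can be sketched as follows. This is a simplified illustration under our own assumptions (not the cited implementations): the terminal octants are taken in a linear order with per-octant work estimates, and the weighted list is recursively halved until one contiguous slice remains per processor:

```python
def bisect(octant_weights, n_procs):
    """Distribute linearly ordered terminal octants among n_procs
    processors (a power of two) by recursive bisection of weights.
    Returns a per-octant list of processor ids."""
    def split(indices, procs_lo, procs_hi, out):
        if procs_hi - procs_lo == 1:
            for i in indices:
                out[i] = procs_lo
            return
        # find the cut where the accumulated weight reaches one half
        total = sum(octant_weights[i] for i in indices)
        acc, cut = 0.0, len(indices)
        for k, i in enumerate(indices):
            acc += octant_weights[i]
            if acc >= total / 2:
                cut = k + 1
                break
        mid = (procs_lo + procs_hi) // 2
        split(indices[:cut], procs_lo, mid, out)
        split(indices[cut:], mid, procs_hi, out)
    out = [0] * len(octant_weights)
    split(list(range(len(octant_weights))), 0, n_procs, out)
    return out
```

Because each processor receives a contiguous range of the ordering rather than a geometrically compact subtree, the resulting subtree distribution can be far from ideal, which is the source of the extra traversal communication mentioned above.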

The domain decomposition in the advancing front technique [3,5] is most frequently based on a background grid (e.g., from the previous step of an adaptive analysis). Splitting is done using a greedy algorithm with some surface-to-volume ratio optimization. The initial front, formed by the appropriate part of the model boundary with respect to a given domain, propagates inside the domain until it approaches the interprocessor boundary or closes up with an existing front inside the domain. From this point of view, the minimization of the surface-to-volume ratio becomes very important. The mesh generation in the remaining space along the interprocessor boundaries is performed in the last stage, usually at significantly reduced parallelism, and requires a lot of fine-grain communication or a redistribution of the background data. In either case, this last stage of the mesh generation reduces the overall performance significantly.
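A greedy background-grid split of this kind can be sketched as follows. This is our own minimal illustration, not the algorithm of [3,5]: one domain is grown over the background elements, always absorbing the frontier element with the most neighbours already inside, which keeps the domain boundary (a proxy for the surface-to-volume ratio) small; the names and the fixed seed element are assumptions:

```python
def greedy_partition(adjacency, weights, target):
    """Grow one domain over a background grid.

    adjacency: elem -> set of neighbouring elems
    weights:   per-element cost estimate
    target:    desired total cost of the grown domain
    Returns the set of elements absorbed into the domain.
    """
    seed = 0  # assumed starting element on the model boundary
    domain = {seed}
    cost = weights[seed]
    frontier = set(adjacency[seed])
    while cost < target and frontier:
        # prefer the frontier element most connected to the domain,
        # which tends to minimize the domain boundary as it grows
        best = max(frontier,
                   key=lambda e: sum(1 for n in adjacency[e] if n in domain))
        frontier.remove(best)
        domain.add(best)
        cost += weights[best]
        frontier |= {n for n in adjacency[best] if n not in domain}
    return domain
```

Repeating this growth for each domain in turn yields the greedy splitting; the surface-to-volume optimization in the cited work refines exactly the selection step sketched here.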

In the case of Delaunay triangulation [6], an initial Delaunay mesh built from the boundary nodes is used as the background grid for the domain decomposition. Since the workload prediction, based on the mesh size specification interpolated from the boundary nodes, provides only approximate values, a simple greedy algorithm is employed. The triangulations corresponding to the interprocessor interfaces are enriched to reflect the interpolated mesh size and distributed to both domains sharing them. Each domain is then discretized using a fully sequential Delaunay triangulator.
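The approximate nature of such a workload prediction is easy to see from a back-of-the-envelope estimate. As an illustration under our own assumptions (not the formula of [6]), the expected number of elements inside a background triangle can be taken as its area divided by the area of an ideal equilateral element of the mesh size interpolated at the centroid:

```python
import math

def estimated_elements(tri, sizes):
    """Rough element count inside one 2D background triangle.

    tri:   three (x, y) vertices of the background triangle
    sizes: prescribed mesh sizes at those three vertices
    """
    (x1, y1), (x2, y2), (x3, y3) = tri
    # triangle area from the cross product of two edge vectors
    area = abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2.0
    h = sum(sizes) / 3.0  # linear interpolation at the centroid
    # an equilateral triangle of edge h has area sqrt(3)/4 * h^2
    return area / (math.sqrt(3) / 4.0 * h * h)
```

Since the boundary-interpolated size may differ considerably from the size actually realized in the interior, estimates of this kind justify only a simple greedy assignment rather than a sophisticated decomposition.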

This paper presents an innovative approach to parallel unstructured mesh generation. The basic features of the proposed algorithm are described in the following sections.




Daniel Rypl
2005-12-03