Next: Implementation Up: Top Previous: Discretization strategy

Parallel computing scheme

The current implementation uses a master-slave parallel computing scheme (Fig. 1). In the initial phase, the master processor reads, parses, and checks the input data. The resulting model is then broadcast to all slave processors. For each relevant model entity, an approximate load level is calculated on the first available slave processor. The total load level, accumulated over the individual relevant model entities, is then broadcast to the slaves. Next, the relevant model entities are split into subdomains, again using the first available slave processor. The completed domain decomposition, collected by the master together with the workload of the individual subdomains, is then broadcast to the slave processors.

In the following phase, a dynamic load balancing mechanism is applied. The not yet assigned subdomain with the largest estimated workload is assigned to the first available slave processor, and the parametric tree of that subdomain is built on this processor. After all subdomains have been processed, the complete subdomain-to-processor assignment is broadcast to all slaves.

In the next step, the boundary tree exchange runs as a loop on all slaves. For each subdomain assigned to a given slave processor, the boundary tree structures are extracted and sent to the appropriate subdomains (slaves), which update their basic tree structures accordingly and, if necessary, invoke a further boundary tree exchange. The role of the master processor in this process is limited to "listening" to the exchange messages in order to detect when the process has completed.

Once the parametric tree structures have been updated with respect to all boundary tree structures, mesh generation starts on the slave processors, followed immediately by mesh smoothing. The master processor is notified only of the numbers of elements and nodes generated in each subdomain and on its boundary. These numbers are used to set up the final numbering ranges, which are broadcast to all slaves. Each slave processor then renumbers all subdomains assigned to it. In the final phase, the output data are written to the local device of each processor: the domain decomposition is output on the master processor, while the mesh data are stored on the slave processors.
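
The dynamic load balancing step described above can be illustrated with a small serial simulation. This is only a sketch under stated assumptions, not the implementation used here: the function name assign_subdomains, the workload units, and the premise that a slave becomes available again after a time equal to the estimated workload of its subdomain are all hypothetical. It shows how handing the largest unassigned subdomain to the first available slave tends to even out the per-processor load.

```python
import heapq


def assign_subdomains(workloads, n_slaves):
    """Simulate largest-workload-first assignment to the first available slave.

    workloads: dict mapping subdomain id -> estimated workload (hypothetical units).
    Assumption: a slave is busy for exactly the estimated workload of its subdomain.
    Returns (assignment dict subdomain -> slave id, list of total load per slave).
    """
    # Each heap entry is (time at which the slave becomes free, slave id),
    # so the heap top is always the first available slave.
    free_at = [(0.0, slave) for slave in range(n_slaves)]
    heapq.heapify(free_at)

    assignment = {}
    # Largest estimated workload first; ties are broken by subdomain id.
    for sub in sorted(workloads, key=lambda k: (-workloads[k], k)):
        busy_until, slave = heapq.heappop(free_at)
        assignment[sub] = slave
        heapq.heappush(free_at, (busy_until + workloads[sub], slave))

    # Accumulate the total estimated load that ended up on each slave.
    loads = [0.0] * n_slaves
    for sub, slave in assignment.items():
        loads[slave] += workloads[sub]
    return assignment, loads


if __name__ == "__main__":
    # Hypothetical workloads for five subdomains shared by two slaves.
    assignment, loads = assign_subdomains(
        {"A": 8, "B": 5, "C": 4, "D": 3, "E": 2}, n_slaves=2
    )
    print(assignment)
    print(loads)
```

In an actual parallel run the "first available" slave would be determined by when it reports completion to the master rather than by a simulated clock, so the resulting assignment depends on the real processing times and not only on the estimates.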




Daniel Rypl
2005-12-03