In the current implementation, a master-slave parallel computing paradigm with dynamic load balancing (Verhoeven et al.) has been adopted. In the initial phase, the input data are read, parsed, and checked by the master processor. The constituted model is then broadcast to all slave processors. For each model entity, its approximate load level is calculated on the first available slave processor and sent back to the master.

In the following phase, a dynamic load balancing mechanism is applied. This mechanism relies on the fact that there are (many) more subdomains to be discretized than available slave processors, which usually holds for any moderately complex model. The not-yet-assigned subdomain with the largest estimated workload is assigned to the first available slave processor, and the parametric tree of that subdomain is built on this processor. After all subdomains have been processed, the complete subdomain-to-processor assignment is broadcast to all slaves.

In the next step, the boundary tree exchange process is looped over all slaves. For each subdomain assigned to a given slave processor, the boundary trees are extracted and sent to the appropriate subdomains (slaves), which update their basic trees accordingly and which, if necessary, invoke a further boundary tree exchange. Once the parametric trees have been updated with respect to all boundary trees, mesh generation starts on the slave processors, followed immediately by mesh smoothing. The master processor is only notified of the numbers of generated elements and nodes in each subdomain and on its boundary. These numbers are used to set up the final numbering ranges, which are broadcast to all slaves. Each slave processor then renumbers all subdomains assigned to it. In the final phase, the output data are written to the local device of each processor.
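The largest-workload-first assignment described above can be sketched as the following greedy scheduler. This is a simplified sequential simulation under the assumption that "first available" means the slave with the least accumulated work so far; the function and variable names are illustrative, not taken from the actual implementation:

```python
import heapq

def assign_subdomains(workloads, num_slaves):
    """Greedy dynamic load balancing: repeatedly give the largest
    remaining subdomain to the first available slave (simulated here
    as the slave with the least accumulated work)."""
    # Process subdomain indices in order of decreasing estimated workload.
    order = sorted(range(len(workloads)), key=lambda i: -workloads[i])
    # Min-heap of (accumulated_work, slave_id); the top entry is the
    # slave that becomes available first.
    slaves = [(0.0, s) for s in range(num_slaves)]
    heapq.heapify(slaves)
    assignment = {}
    for i in order:
        work, s = heapq.heappop(slaves)
        assignment[i] = s
        heapq.heappush(slaves, (work + workloads[i], s))
    return assignment

# Example: six subdomains distributed over two slaves.
print(assign_subdomains([5.0, 1.0, 3.0, 2.0, 4.0, 6.0], 2))
```

In a message-passing setting the same logic would run on the master, which hands out one subdomain per idle-slave notification instead of simulating availability locally.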
The domain decomposition is output on the master processor while the mesh data are stored on the slave processors.
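The setup of the final numbering ranges from the gathered per-subdomain counts can be illustrated as an exclusive prefix sum. This is a hypothetical sketch computed locally; in the actual code the counts would be gathered from and the ranges broadcast to the slaves via message passing:

```python
def numbering_ranges(counts):
    """Given the number of entities (e.g. nodes or elements) generated
    in each subdomain, return the half-open global numbering range
    [start, end) of every subdomain via an exclusive prefix sum."""
    ranges = []
    offset = 0
    for c in counts:
        ranges.append((offset, offset + c))
        offset += c
    return ranges

# Example: three subdomains produced 10, 7, and 12 nodes.
print(numbering_ranges([10, 7, 12]))  # [(0, 10), (10, 17), (17, 29)]
```

Each slave can then renumber its local entities by adding the start of its range, yielding globally unique, contiguous numbers without further communication.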