Next: Parallel Implementation Up: Parallelization Strategy Previous: Parallel Computing Scheme


Parallel Performance

The parallel performance of the mesh generator is evaluated in terms of the speedup, efficiency, and load balance [96]. The speedup is defined as

S(n) = \frac{T(1)}{T(n)} ,   (3.31)

where T(1) is the execution time required to run the program on one processor and T(n) is the time required to run the same program in parallel on n processors. The efficiency is related to the speedup according to the relation

E(n) = \frac{S(n)}{n} .   (3.32)

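The two definitions above can be illustrated by a short sketch; the timings below are hypothetical example values, not measurements of the presented mesh generator:

```python
def speedup(t1, tn):
    """Speedup S(n) = T(1) / T(n), Eq. (3.31)."""
    return t1 / tn

def efficiency(s, n):
    """Efficiency E(n) = S(n) / n, Eq. (3.32)."""
    return s / n

# Hypothetical timings: 120 s on one processor, 20 s on 8 processors.
s = speedup(120.0, 20.0)   # 6.0
e = efficiency(s, 8)       # 0.75, i.e. 75 %
```

With these (made-up) timings the program runs 6 times faster on 8 processors, corresponding to a 75 % efficiency.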
Although the speedup and efficiency may seem to be functions of the number of processors alone, they usually also depend on the problem size and on the proportions of the sequential and parallel parts of the code. In this context, there are two important rules concerning the speedup and efficiency. Amdahl's law states that if s is the proportion of a calculation that is serial and p = 1 - s is the portion which can be parallelized, then the maximum speedup possible, no matter how many processors are used, is given by

S_{max} = \frac{1}{s} .   (3.33)

Assuming that \sigma is the constant time taken to do the sequential part of the calculations and \varphi(N) is the time consumed to do the parallelizable part of the calculations for a problem of size N on a single processor, the speedup achieved on n processors can be, according to Eq. (3.31), written as

S(n) = \frac{\sigma + \varphi(N)}{\sigma + \varphi(N)/n} .   (3.34)

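The saturation of the speedup predicted by Amdahl's law can be demonstrated numerically with the familiar formula S(n) = 1/(s + p/n); the 5 % serial fraction below is an arbitrary example value:

```python
def amdahl_speedup(s, n):
    """Speedup 1 / (s + p/n) for serial fraction s and parallel
    fraction p = 1 - s on n processors (Amdahl's law)."""
    p = 1.0 - s
    return 1.0 / (s + p / n)

# With a hypothetical 5 % serial fraction the speedup saturates:
# amdahl_speedup(0.05, 10)  -> about 6.9
# amdahl_speedup(0.05, 100) -> about 16.8
# and never exceeds the bound 1/s = 20 of Eq. (3.33).
```

Note how going from 10 to 100 processors buys less than a factor of 2.5, because the serial 5 % of the work dominates more and more of the run time.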
Gustafson's law says that if \varphi(N) \to \infty for N \to \infty and \sigma remains constant, then any desired efficiency can be achieved on any number of processors by increasing the problem size sufficiently. Taking into account the computational complexity O(N) of the presented mesh generation algorithm and assuming that the parallelizable part of the code scales up ideally, the time \varphi(N) can be expressed as

\varphi(N) = c N ,   (3.35)

where c is a constant. The limit value of the speedup for the problem size growing without bound can then be calculated as

\lim_{N \to \infty} S(n) = \lim_{N \to \infty} \frac{\sigma + c N}{\sigma + c N / n} = n ,   (3.36)

which shows that an ideal speedup can be achieved for a large enough problem (mesh). Whereas the efficiency theoretically cannot exceed 100 %, in practice a superlinear speedup (S(n) > n) may occur. This is caused by the fact that when a problem is distributed over many processors, the effective total size of the cache being used may increase. The distribution may also change the order in which non-deterministic operations are carried out, which can lead to earlier completion of the problem solution.
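This limit behaviour can be checked numerically. In the sketch below, the sequential time sigma, the per-element cost c, and the processor count n are hypothetical example values:

```python
def scaled_speedup(sigma, c, N, n):
    """Speedup (sigma + c*N) / (sigma + c*N/n), i.e. Eq. (3.34)
    with phi(N) = c*N and ideal scaling of the parallel part."""
    phi = c * N
    return (sigma + phi) / (sigma + phi / n)

# Hypothetical values: sigma = 1 s, c = 1e-4 s per element, n = 16:
for N in (10_000, 1_000_000, 100_000_000):
    print(N, round(scaled_speedup(1.0, 1e-4, N, 16), 2))
# the speedup approaches n = 16 as the problem size N grows
```

For a fixed number of processors the speedup climbs towards n as N increases, which is the statement of Eq. (3.36).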

The load balance represents the degree to which the work is evenly distributed among the available processors. The application usually executes most quickly if it is perfectly load balanced, that is, if each processor has exactly the same amount of work to do. The load balance can be quantified using the relation

LB = \frac{Q_{min}}{Q_{max}} ,   (3.37)

where Q is a conveniently chosen quantity (typically the execution time). The Q_{min} and Q_{max} may be expressed as

Q_{min} = \min_{i} Q_i , \qquad Q_{max} = \max_{i} Q_i ,   (3.38)

where Q_i is the quantity achieved on processor i. Thus, in the case when Q represents the execution time, Q_{min} and Q_{max} correspond to the finishing times of the first and last processors, respectively.
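Assuming the load balance of Eq. (3.37) is taken as the ratio Q_min / Q_max with Q the per-processor execution time, it can be evaluated as in this sketch (the finishing times are hypothetical):

```python
def load_balance(times):
    """Load balance Q_min / Q_max, Eq. (3.37), with Q taken as the
    per-processor execution time; 1.0 corresponds to a perfect balance."""
    return min(times) / max(times)

# Hypothetical finishing times of four processors (in seconds):
lb = load_balance([9.8, 10.0, 9.5, 10.2])   # about 0.93
```

A value close to 1.0 indicates that all processors finish at nearly the same time; a small value signals that some processors sit idle while the most loaded one is still working.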




Daniel Rypl
2005-12-07