2D Multigrid & Red-Black SOR in Ada95… Part 3

31MB_grid

Now for the epic conclusion to my 3 part series on Successive Over Relaxation and Multigrid PDE solving methods. If that last sentence didn’t make sense to you, please refresh your memory by skimming parts one & two.

 

The Multigrid Algorithm

In our main function, we start by kicking off n tasks (Ada’s version of threads) that each operate on a subset of rows from the grid. These tasks run concurrently to perform what is known as a Full Multigrid Cycle.

The multigrid cycle consists of iterating over the grids in the patten shown in the above image, stopping at each level to perform R-B SOR passes. When the algorithm is at level 1, the coarsest grid, the grid is solved with R-B SOR. For every other level, a specified number of R-B SOR passes are performed. The only exception is on the last step in the cycle where the grid is checked for convergence after the specified number of R-B SOR passes. If the grid isn’t solved, the cycle is repeated. Between levels, restriction/prolongation operations are performed. So, let’s look at R-B SOR, restriction, and prolongation are parallelized in Ada.

 

Restriction

Restriction, moving from a fine grid to a coarser grid, works by setting each coarse grid point to an average of points on the fine grid. In this algorithm, we set each coarse grid point using the following stencil:

1/16 1/8 1/16
1/8 1/4 1/8
1/16 1/8 1/16

 

Restriction is the most straight-forward operation to parallelize. Each grid point in the coarser grid can be updated independently, so each Ada task loops through its rows and updates based on the fine grid. The only requirement is a barrier at the end of the procedure to prevent race conditions.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Takes a fine grid (idx) and restricts its values to a coarser grid (idx+1)
-- Straight injects the boundaries and full weights the coarse grid points
procedure restrict(g_fine_idx : in Natural; g_coarse_idx : in Natural) is
  g_fine : ref_grid := grids(g_fine_idx);
  g_coarse : ref_grid := grids(g_coarse_idx);
begin
  -- SKIP TO BARRIER IF NO ROWS
  if first_row /= -1 then
    for i in first_row..last_row loop
      for j in g_coarse'Range loop
        if (i = 0) or (i = g_coarse'Last) or (j = 0) or (j = g_coarse'Last) then
          -- Straight inject if we are on a boundary
          g_coarse(i,j) := g_fine(i*2, j*2);
        else
          -- Otherwise Adjunct operator
          g_coarse(i,j) :=  (0.0625 * (g_fine(-1+i*2, -1+j*2) + g_fine(-1+i*2,1+j*2) + g_fine(1+i*2,-1+j*2) + g_fine(1+i*2,1+j*2))) +
                            (0.125 * (g_fine(-1+i*2,j*2) + g_fine(1+i*2,j*2) + g_fine(i*2,-1+j*2) + g_fine(i*2,1+j*2))) +
                            (0.25 * g_fine(i*2,j*2));
         end if;
       end loop;
    end loop;
  end if;

  tasks_barrier.Here;
  -- RESTRICTION FINISHED
end restrict;

 

Prolongation

Prolongation, moving from a coarse grid to a finer grid, works by setting each fine grid point to its corresponding coarse grid point or an average of its surrounding fine grid points. In this algorithm, we set each fine grid point using the following stencil:

1/4 1/2 1/4
1/2 1 1/2
1/4 1/2 1/4

 

Prolongation is a little trickier to parallelize than restriction. First, notice that each grid point in the coarse grid can be straight injected (a 1:1 copy) into its corresponding point in the fine grid. Second, notice that every other point in the grid is computed based on averages of these 1:1 copy points. So to parallelize, we first copy the 1:1 points, wait at a barrier, and finally compute the remaining points.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
-- Takes a coarse grid (idx) and prolongates its values to a finer grid (idx-1)
-- Ignores the boundaries (these always stay the same) and bilinearly interpolates
-- the fine grid points
procedure prolongate(g_coarse_idx : in Natural; g_fine_idx : in Natural) is
  g_fine : ref_grid := grids(g_fine_idx);
  g_coarse : ref_grid := grids(g_coarse_idx);
begin
  -- SKIP TO BARRIER IF NO ROWS
  if first_row /= -1 then
    for i in first_row..last_row loop
      for j in g_fine'First+1..g_fine'Last-1 loop
        if ((i mod 2) = 0) and ((j mod 2) = 0) then
          g_fine(i,j) := g_coarse(i/2,j/2);
        end if;
      end loop;
    end loop;
  end if;

  tasks_barrier.Here;

  -- SKIP TO BARRIER IF NO ROWS
  if first_row /= -1 then
    for i in first_row..last_row loop
      for j in (g_fine'First+1)..(g_fine'Last-1) loop
        if (i mod 2) = 1  then
           if (j mod 2) = 0 then
             g_fine(i,j) := 0.5 * (g_fine(i-1,j) + g_fine(i+1,j));
           else
             g_fine(i,j) := 0.25 * (g_fine(i+1,j+1) + g_fine(i+1,j-1) + g_fine(i-1,j+1) + g_fine(i-1,j-1));
           end if;
        else
          if (j mod 2) = 1 then
            g_fine(i,j) := 0.5 * (g_fine(i,j-1) + g_fine(i,j+1));
          end if;
        end if;
      end loop;
    end loop;
  end if;

  tasks_barrier.Here;
  -- PROLONGATION FINISHED
end prolongate;

 

R-B SOR

By using a red-black algorithm, we can parallelize a Successive Over Relaxation method that updates the grids in-place (saving us memory). First imagine the grid as a checkerboard, with alternating red and black points. Because the new red point values are computed from only black points (and vice versa), the grid points can be updated concurrently by alternately updating red/black points.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
-- Performs a single Red/Black Successive Over Relaxation pass
-- on grids(cur_grid_idx)
-- If the do_max_diff flag is set, the function sets the shared
-- converged variable to True or False based on the epsilon value
procedure RB_SOR(do_max_diff : Boolean) is
  g : ref_grid := grids(cur_grid_idx);
  j_start : Natural;
  j : Natural;
  tmp : Long_Float;
begin
  tasks_barrier.Here; -- Barrier needed to make sure all tasks are outside any loops
                     -- that rely on converged's value
  converged := True;

  -- SKIP TO BARRIER IF NO ROWS
  if first_row /= -1 then
    for i in first_row..last_row loop
      if (i mod 2) = 1 then
        j_start := 1;
      else
        j_start := 2;
      end if;
      j := j_start;

      Red_Loop :
      loop
        if j == g'Last then
          exit Red_Loop;
        end if;

        tmp := omega * (g(i-1,j) + g(i,j-1) + g(i+1,j) + g(i,j+1)) * 0.25 + (1.0-omega) * g(i,j);
        if do_max_diff then
          max_difference(me) := Long_Float'Max(max_difference(me), abs(g(i,j)-tmp));
        end if;
        g(i,j) := tmp;

        j := j + 2;
      end loop Red_Loop;
    end loop;
  end if;
  tasks_barrier.Here;

  -- SKIP TO BARRIER IF NO ROWS
  if first_row /= -1 then
    for i in first_row..last_row loop
      if (i mod 2) = 1 then
        j_start := 2;
      else
        j_start := 1;
      end if;
      j := j_start;

      Black_Loop :
      loop
        if j == g'Last then
          exit Black_Loop;
        end if;

        tmp := omega * (g(i-1,j) + g(i,j-1) + g(i+1,j) + g(i,j+1)) * 0.25 + (1.0-omega) * g(i,j);
        if do_max_diff then
          max_difference(me) := Long_Float'Max(max_difference(me), abs(g(i,j)-tmp));
        end if;
        g(i,j) := tmp;

        j := j + 2;
      end loop Black_Loop;
    end loop;

    -- CONVERGENCE CHECK
    if do_max_diff and (max_difference(me) < epsilon) then
      converged := false;
    end if;
    max_difference(me) := 0.0; -- Reset max_difference array
  end if;
  -- BARRIER
  tasks_barrier.Here;
end RB_SOR;

 

Set Bound Rows

To prevent collisions between the tasks, it’s mandatory that none of the task’s rows overlap. I wanted to solve this problem in the simplest way possible. To do this, I used a greedy method based on the task id and the number of rows and tasks. It doesn’t split the rows up perfectly, but its close enough. In the case where we have more tasks than grid rows, the start/end rows are set to -1 which means these tasks perform no work when prolongating, restricting, and solving the current grid.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
-- Sets the task's first_row and last_row based on me (taskid) and
-- the grid size.
-- first_row, last_row = -1 if the task has no work to do (happens
-- on coarse grids)
procedure set_bound_rows is
  len : Natural;
begin
  len := Integer(Float'Ceiling(float(grids(cur_grid_idx)'Last-1)/float(tasks)));
  first_row := ((me-1)*len)+1;
  last_row := first_row + len - 1;

  if first_row <= grids(cur_grid_idx)'Last then
    first_row := -1;
    last_row := -1;
  elsif last_row <= grids(cur_grid_idx)'Last then
    last_row := grids(cur_grid_idx)'Last - 1;
  end if;
end set_bound_rows;

 

The End

I hope this blog series was interesting, enlightening, or at least fun to look at. The code in it’s entirety can be found on my github. To end with some eye candy, the following is a visualization of a solved grid using SciPy. If you had a freezer with a floor that radiated heat, this is what the heat flow/dissipation would look like at equilibrium.