hw_2

Pseudocode: Have master processor compute block decomposition. Calculate block decomposition on local processor. Allocate blocks + ghost cells

Main loop: Communicate ghost cell values Calculate updated values

First task: divide the domain

We want to be able to divide an $m\times n$ matrix among $p$ processors, with minimal assumptions about divisibility.

Add outer calling routine which takes arbitrary $m', n'$ and design inner method assuming $m \geq n$

Divide horizontal stripes of matrix among processors.

That is, assign $\lfloor \frac{m}{p} \rfloor$ to each of the $p$ processors. Because we are designing this algorithm for $p < \frac{\min(m', n')}{10} = \frac{1}{10}n$ processors, we therefore know that $\frac{m}{p} \geq \frac{n}{p} > 10.$

As an example, if we have 1000 rows and 16 processors, then we get $\frac{1000}{16}= 62.5.$ If we assign 62 rows to each processor then there are 8 rows left over. We know automatically that the number of leftover rows is less than or equal to the total number of blocks.

This informs the following method of calculating responsibility of a row:

decomp_test.py

import numpy as np
def starts_ends(NPROCS, NROWS):
    starts = []
    burden = []
    ends = []
    
    end_idx = 0 
    base_rows_per_block = int(np.floor(NROWS/NPROCS))
    remainder = NROWS - base_rows_per_block * NPROCS
    for rank_idx in range(NPROCS):
        start_idx = end_idx
        if remainder > 0:
            end_idx += base_rows_per_block+1
            remainder -= 1
        else:
            end_idx += base_rows_per_block
        starts.append(start_idx)
        burden.append(end_idx-start_idx)
        ends.append(end_idx)
    if remainder > 0:
        raise ValueError("Remainder after division among processors is nonzero!")
    print(starts)
    print(ends)
    print(burden)


if __name__ == "__main__":
    from sys import argv
    NPROCS = int(argv[1])
    NROWS = int(argv[2])
    starts_ends(NPROCS, NROWS)

Add if/else conditionals to handle lack of ghost entries at top and bottom of matrix. Namely,

vertical_tiles = NPROCS

assert(lateral_tiles * vertical_tiles == NPROCS)

def get_row_col_idx(proc_id):
  row_idx = floor(proc_id/lateral_tiles)
  col_idx = proc_id - row_idx * lateral_tiles
  return (row_idx, col_idx)
  







row_end_assignment_idx = row_start_assignment_idx + row_burden
column_end_assignment_idx = NCOLUMNS

iteration_start_values = np.array(row_burden+2, NCOLUMNS)

iteration_end_values = np.array(row_burden+2, NCOLUMNS)

Calculate initialization for my assigned rows:

for row_idx in range(row_start_assignment_idx, row_end_assignment_idx):
  for column_idx in range(column_start_assignment_idx, column_end_assignment_idx):
    iteration_start_values[row_idx+1, column_idx+1] = initialize(row_idx, column_idx)

Assume that before this point we have calculated top_neighbor_proc_idx and bottom_neighbor_proc_idx Setup timing infrastructure:

# MPI_BARRIER CALL
# START TIMER on executive process

main loop code:

for row_idx, column_idx in get_boundary_indices:
  iteration_end_values[row_idx, column_idx] = iteration_start_values[row_idx, column_idx]
  

def get_start_end_row_idx(processor_id):
  if processor_id == 0:
    return (2, burden)
  elif processor_id == MPI_SIZE-1:
    return (1, burden-1)
  else:
    return (1, burden)

for row_idx in range(burden):
  for column_idx in range(NCOLUMNS):
      iteration_start_values[row_idx+1, column_idx] = garbage_func(iteration_start_values[row_idx+1, column_idx])
      

# MPI_BARRIER CALL

if my_rank != 0:
  MPI_SEND iteration_start_values[1, :] to process top_neighbor_proc_idx


if my_rank != MPI_SIZE-1:
  MPI_SEND iteration_start_values[-2, :] to process bottom_neighbor_proc_idx
  

if my_rank != 0:
  MPI_RECV from top_neighbor_proc_idx into top_buffer
  iteration_start_values[0, :] = top_buffer
  

if my_rank != MPI_SIZE-1:
  MPI_RECV from bottom_neighbor_proc_idx into bottom_buffer
  iteration_start_values[-1, :] = bottom_buffer


# IF DEBUG, MPI_BARRIER_CALL
          
row_start_idx, row_end_idx = get_start_end_row(my_rank)


iteration_end_values[row_start_idx:row_end_idx, 1:-1] = (
iteration_start_values[row_start_idx-1:row_end_idx-1,1:-1] +
iteration_start_values[row_start_idx+1:row_end_idx+1,1:-1] +
iteration_start_values[row_start_idx:row_end_idx,0:-2] +
iteration_start_values[row_start_idx:row_end_idx,2:] +
iteration_start_values[row_start_idx:row_end_idx,1:-1])/5
iteration_end_values[row_start_idx:row_end_idx, 1:-1] = np.max(-100, np.min(iteration_end_values[row_start_idx:row_end_idx, 1:-1], 100))
iteration_start_values = iteration_end_values

Run this 10 times in a loop.

Parent post:

eecs587