hw_2
Pseudocode: Have master processor compute block decomposition. Calculate block decomposition on local processor. Allocate blocks + ghost cells
Main loop: Communicate ghost cell values Calculate updated values
First task: divide the domain
We want to be able to divide an matrix among
processors, with minimal assumptions about divisibility.
Add outer calling routine which takes arbitrary and
design inner method assuming
Divide horizontal stripes of matrix among processors.
That is, assign to each of the
processors.
Because we are designing this algorithm for
processors,
we therefore know that
As an example, if we have 1000 rows and 16 processors, then
we get If we assign 62 rows to
each processor then there are 8 rows left over. We know automatically that
the number of leftover rows is less than or equal to the total number of blocks.
This informs the following method of calculating responsibility of a row:
decomp_test.py
import numpy as np def starts_ends(NPROCS, NROWS): starts = [] burden = [] ends = [] end_idx = 0 base_rows_per_block = int(np.floor(NROWS/NPROCS)) remainder = NROWS - base_rows_per_block * NPROCS for rank_idx in range(NPROCS): start_idx = end_idx if remainder > 0: end_idx += base_rows_per_block+1 remainder -= 1 else: end_idx += base_rows_per_block starts.append(start_idx) burden.append(end_idx-start_idx) ends.append(end_idx) if remainder > 0: raise ValueError("Remainder after division among processors is nonzero!") print(starts) print(ends) print(burden) if __name__ == "__main__": from sys import argv NPROCS = int(argv[1]) NROWS = int(argv[2]) starts_ends(NPROCS, NROWS)
Add if/else conditionals to handle lack of ghost entries at top and bottom of matrix. Namely,
vertical_tiles = NPROCS
assert(lateral_tiles * vertical_tiles == NPROCS)
def get_row_col_idx(proc_id):
row_idx = floor(proc_id/lateral_tiles)
col_idx = proc_id - row_idx * lateral_tiles
return (row_idx, col_idx)
row_end_assignment_idx = row_start_assignment_idx + row_burden
column_end_assignment_idx = NCOLUMNS
iteration_start_values = np.array(row_burden+2, NCOLUMNS)
iteration_end_values = np.array(row_burden+2, NCOLUMNS)
Calculate initialization for my assigned rows:
for row_idx in range(row_start_assignment_idx, row_end_assignment_idx):
for column_idx in range(column_start_assignment_idx, column_end_assignment_idx):
iteration_start_values[row_idx+1, column_idx+1] = initialize(row_idx, column_idx)
Assume that before this point we have calculated top_neighbor_proc_idx
and bottom_neighbor_proc_idx
Setup timing infrastructure:
# MPI_BARRIER CALL
# START TIMER on executive process
main loop code:
for row_idx, column_idx in get_boundary_indices:
iteration_end_values[row_idx, column_idx] = iteration_start_values[row_idx, column_idx]
def get_start_end_row_idx(processor_id):
if processor_id == 0:
return (2, burden)
elif processor_id == MPI_SIZE-1:
return (1, burden-1)
else:
return (1, burden)
for row_idx in range(burden):
for column_idx in range(NCOLUMNS):
iteration_start_values[row_idx+1, column_idx] = garbage_func(iteration_start_values[row_idx+1, column_idx])
# MPI_BARRIER CALL
if my_rank != 0:
MPI_SEND iteration_start_values[1, :] to process top_neighbor_proc_idx
if my_rank != MPI_SIZE-1:
MPI_SEND iteration_start_values[-2, :] to process bottom_neighbor_proc_idx
if my_rank != 0:
MPI_RECV from top_neighbor_proc_idx into top_buffer
iteration_start_values[0, :] = top_buffer
if my_rank != MPI_SIZE-1:
MPI_RECV from bottom_neighbor_proc_idx into bottom_buffer
iteration_start_values[-1, :] = bottom_buffer
# IF DEBUG, MPI_BARRIER_CALL
row_start_idx, row_end_idx = get_start_end_row(my_rank)
iteration_end_values[row_start_idx:row_end_idx, 1:-1] = (
iteration_start_values[row_start_idx-1:row_end_idx-1,1:-1] +
iteration_start_values[row_start_idx+1:row_end_idx+1,1:-1] +
iteration_start_values[row_start_idx:row_end_idx,0:-2] +
iteration_start_values[row_start_idx:row_end_idx,2:] +
iteration_start_values[row_start_idx:row_end_idx,1:-1])/5
iteration_end_values[row_start_idx:row_end_idx, 1:-1] = np.max(-100, np.min(iteration_end_values[row_start_idx:row_end_idx, 1:-1], 100))
iteration_start_values = iteration_end_values
Run this 10 times in a loop.