Variable resolution computational cost estimation!
Several people in our research group need to answer the following question: if you know the cost of running an ESM configuration whose compute time is almost entirely due to the atmospheric component (e.g., Aquaplanet, SCREAM, etc.) at nominal 1° resolution, can you compute the cost of running a regionally refined model configuration?
Here are (some of) my starting assumptions:
- The (compute) cost of advancing one "time step" within a grid cell (time step will be a shorthand for everything that happens between physics updates) does not change with the size of a grid cell.
- The nominal 1° reference simulation uses a grid partition/load balancing such that the amount of inter-node communication required per node is comparable to what it will be for the variable resolution run. E.g., a simulation in which all physics/dynamics work runs within a single node (even if it uses multiple sockets) may be a bad reference cost.
- The model has been tailored to the target computer such that increasing the number of nodes in a simulation does not cause drastic increases in time spent in MPI calls and, what is possibly more troubling, filesystem and IO calls.
- Remapping, e.g. for file output, is a negligible percentage of total runtime.
With that in mind, I think you can (approximately) compute the cost by counting elements and accounting for the new time step.
One way to do this if you don't yet have a particular grid in mind is to define a density function $\rho$ over the surface of the earth, with units of, e.g., grid points per sq km, that matches the final grid, and integrate it over the surface of the earth. For a constant function $\rho = \rho_0$, we get approximately 35,000 gridpoints, which is approximately right for a grid with nominal 1° grid spacing. Discarding a couple of constant coefficients, this means that you can specify a function $\Delta x$ of position that describes the nominal radius of a grid cell and then calculate
$$N \approx \int_{\mathrm{Earth}} \frac{dA}{\Delta x^{2}}.$$
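The integral above can be approximated numerically before any mesh exists. The following is a minimal sketch, not a definitive implementation: the function names, the 111 km and 14 km spacings, the 2000 km refinement cap, and its center are all hypothetical choices made for illustration, and the constant shape factors are dropped as in the text.

```python
import math

EARTH_RADIUS_KM = 6371.0


def count_grid_points(spacing_km, n_lat=360, n_lon=720):
    """Midpoint-rule estimate of N = integral over the sphere of 1 / spacing^2.

    spacing_km(lat, lon) returns the nominal grid spacing in km at a point
    given in radians. Constant shape factors are deliberately dropped.
    """
    dlat = math.pi / n_lat
    dlon = 2.0 * math.pi / n_lon
    total = 0.0
    for i in range(n_lat):
        lat = -math.pi / 2.0 + (i + 0.5) * dlat
        # Area of one lat-lon quadrature cell on the sphere, in sq km.
        cell_area = EARTH_RADIUS_KM**2 * math.cos(lat) * dlat * dlon
        for j in range(n_lon):
            lon = -math.pi + (j + 0.5) * dlon
            total += cell_area / spacing_km(lat, lon) ** 2
    return total


def uniform_spacing(lat, lon):
    """Hypothetical uniform mesh: about 111 km (1 degree at the equator)."""
    return 111.0


def refined_spacing(lat, lon):
    """Hypothetical mesh: 14 km inside a 2000 km cap, 111 km elsewhere."""
    center_lat, center_lon = math.radians(40.0), math.radians(-100.0)
    cos_angle = (math.sin(lat) * math.sin(center_lat)
                 + math.cos(lat) * math.cos(center_lat)
                 * math.cos(lon - center_lon))
    # Great-circle distance from the refinement center, clamped for safety.
    distance_km = EARTH_RADIUS_KM * math.acos(max(-1.0, min(1.0, cos_angle)))
    return 14.0 if distance_km < 2000.0 else 111.0


n_uniform = count_grid_points(uniform_spacing)
n_refined = count_grid_points(refined_spacing)
print(f"uniform: {n_uniform:.0f} points, refined: {n_refined:.0f} points")
```

With these assumed values the uniform case comes out at a few tens of thousands of points, the same order as the nominal 1° figure; the exact number depends on the constants and the spacing you choose. A real mesh generator will also add a transition zone between resolutions, which this step function ignores.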
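The time step decreases approximately linearly with the nominal distance between gridpoints, so if the reference run has grid spacing $\Delta x_{\mathrm{ref}}$ and time step $\Delta t_{\mathrm{ref}}$, and the smallest grid spacing in your variable resolution mesh is, e.g., $\Delta x_{\min}$, then the time step is
$$\Delta t_{\mathrm{vr}} = \Delta t_{\mathrm{ref}} \, \frac{\Delta x_{\min}}{\Delta x_{\mathrm{ref}}},$$
so you must take $\Delta x_{\mathrm{ref}} / \Delta x_{\min}$ times as many time steps. Suppose we find the total cost $C_{\mathrm{ref}}$ (in whatever units you want to do your accounting in) for, e.g., a duration $T$ of total simulation.
We can calculate the number of time steps
$$n_{\mathrm{ref}} = \frac{T}{\Delta t_{\mathrm{ref}}}.$$
The computational cost per timestep is then
$$c_{\mathrm{step}} = \frac{C_{\mathrm{ref}}}{n_{\mathrm{ref}}}.$$
If the number of grid cells in the reference run is $N_{\mathrm{ref}}$, the cost per timestep per element is
$$c_{\mathrm{elem}} = \frac{C_{\mathrm{ref}}}{n_{\mathrm{ref}} \, N_{\mathrm{ref}}}.$$
Under the assumptions above, the final cost just requires multiplying
$$C_{\mathrm{vr}} = c_{\mathrm{elem}} \, N_{\mathrm{vr}} \, \frac{T}{\Delta t_{\mathrm{vr}}},$$
where $N_{\mathrm{vr}}$ is the number of grid cells in the variable resolution mesh.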
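The bookkeeping above is short enough to write down as a function. This is a minimal sketch under the assumptions listed at the top of the post; the function name, argument names, and all numbers in the example call are hypothetical and only illustrate the arithmetic.

```python
def estimate_vr_cost(ref_cost, sim_length_s, ref_dt_s, ref_cells,
                     ref_spacing_km, vr_cells, vr_min_spacing_km):
    """Scale a measured reference cost to a variable resolution run."""
    # Number of time steps taken by the reference run.
    ref_steps = sim_length_s / ref_dt_s
    # Cost per time step, then per time step per grid cell.
    cost_per_step = ref_cost / ref_steps
    cost_per_step_per_cell = cost_per_step / ref_cells
    # Assumption from the text: the time step shrinks linearly with the
    # smallest grid spacing in the mesh.
    vr_dt_s = ref_dt_s * vr_min_spacing_km / ref_spacing_km
    vr_steps = sim_length_s / vr_dt_s
    return cost_per_step_per_cell * vr_cells * vr_steps


# Hypothetical numbers, for illustration only.
cost = estimate_vr_cost(
    ref_cost=1000.0,              # e.g. core-hours for the reference run
    sim_length_s=365 * 86400.0,   # one simulated year
    ref_dt_s=1800.0,
    ref_cells=40_000,
    ref_spacing_km=111.0,
    vr_cells=80_000,
    vr_min_spacing_km=111.0 / 8,  # eight times finer than the reference
)
print(f"estimated variable resolution cost: {cost:.0f}")
```

For equal simulated lengths the result reduces to the reference cost times the ratio of cell counts times the ratio of grid spacings (here 1000 x 2 x 8), so the simulated duration and the reference time step cancel out of the final number.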
Interesting note: I could test how well this analysis works using CESM on greatlakes with, e.g., a variable resolution CAM4 physics Aquaplanet run or something.