Developers: Understanding an analysis task

This tutorial walks a new developer through an existing analysis task to get a more in-depth understanding of the code. This tutorial is meant as the starting point for the Developers: Adding a new analysis task tutorial. It is a common practice to find an existing analysis task that is as close as possible to the new analysis, and to copy that existing task as a template for the new task. This tutorial describes an existing analysis task, and Developers: Adding a new analysis task uses it as a starting point for developing a new task.

1. The big picture

MPAS-Analysis is meant to provide a first look at E3SM simulation output from the MPAS-Ocean and MPAS-Seaice components. The analysis is intended to be robust and automated. However, there is currently little effort to ensure that the time period covered by the observations and model output are the same. In other words, we often compare pre-industrial simulation results with present-day observations. The justification for this is twofold. First, we typically have few if any observations covering the pre-industrial period. Second, we may be attempting to reduce biases that we assess to be much larger than expected differences between pre-industrial and present-day climate conditions. Under these conditions, MPAS-Analysis provides us with a useful first impression of how our simulation is doing.

1.1 MPAS output

The primary output from MPAS-Ocean and MPAS-Seaice consists of monthly and daily averages of a large number of data fields.

The components also produce a smaller amount of more specialized output, such as monthly maximum/minimum values.

MPAS data is provided on unstructured meshes, meaning that it isn’t particularly amenable to analysis with standard tools such as ESMValTool. Additionally, E3SM’s science campaigns require unique, sometimes regionally focused analysis not available in existing tools.

1.2 Analysis tasks

MPAS-Analysis is designed to run a series of interdependent analysis tasks in parallel with one another. It builds up a dependency graph between the tasks, allowing independent tasks to run at the same time while putting dependent tasks on hold until the tasks they depend on have completed. Additionally, MPAS-Analysis has some rudimentary tools for keeping track of the resources that computationally intensive tasks require, to prevent the tool from running out of memory.

Currently, nearly all operations in MPAS-Analysis must run on a single HPC node. (The exception is ncclimo, which is used to generate climatologies, and which can run in parallel across up to 12 nodes if desired.) We hope to support broader task parallelism in the not-too-distant future using the parsl python package.

Each analysis task is a class that descends from the AnalysisTask base class. Tasks can also have “subtasks” that do part of the work needed for the final analysis. A subtask might perform a computation on a specific region, period of time, or season. It might combine data from other subtasks into a single dataset. Or it might plot the data computed by a previous task. The advantages of dividing up the work are that 1) each subtask can potentially run in parallel with other subtasks and 2) the same subtask can sometimes be reused across multiple analysis tasks.
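To make this structure concrete, here is a minimal, hypothetical sketch of a task class (the name and body are invented purely for illustration; we tour a real task in detail below):

from mpas_analysis.shared import AnalysisTask


class ExampleTask(AnalysisTask):
    # a hypothetical task, shown only to illustrate the structure

    def __init__(self, config):
        # register the task with a unique name, a component and some tags
        super().__init__(config=config, taskName='exampleTask',
                         componentName='ocean', tags=['climatology'])

    def setup_and_check(self):
        # check that the simulation was configured to support this analysis
        super().setup_and_check()

    def run_task(self):
        # do the actual work, sending output to the task's log file
        self.logger.info('Running the example analysis...')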

1.3 Shared framework

MPAS-Analysis includes a shared framework used across analysis tasks. The framework is made up mostly of functions that can be called from within analysis tasks but also includes some analysis tasks and subtasks that are common to MPAS-Ocean, MPAS-Seaice and potentially other MPAS components (notably MALI) that may be supported in the future.

This tutorial will not go through the shared framework in detail. In addition to the AnalysisTask base class, the shared framework includes the following packages:

$ ls mpas_analysis/shared
climatology
constants
generalized_reader
html
interpolation
io
mpas_xarray
plot
projection
regions
time_series
timekeeping
transects
...

A separate tutorial will explore the shared framework and how to modify it.
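For example, the analysis task we tour below imports the following pieces of the shared framework (the same imports appear in the full code listing in section 4):

from mpas_analysis.shared import AnalysisTask
from mpas_analysis.shared.climatology import RemapMpasClimatologySubtask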

2. Tour of an analysis task (ClimatologyMapOHCAnomaly)

Aside from some code that takes care of managing analysis tasks and generating web pages, MPAS-Analysis is made up almost entirely of analysis tasks and the shared functions they can call. Since adding new analysis nearly always means creating a new class for the task, we start with a tour of an existing analysis task as well as the AnalysisTask base class that it descends from.

We will use ClimatologyMapOHCAnomaly as an example analysis task for this tour because it will turn out to be a useful starting point for the analysis we want to add in Developers: Adding a new analysis task. You can read more about climatologyMapOHCAnomaly in the User’s Guide.

It will be useful to open the following link in your browser to have a look at the code directly: ClimatologyMapOHCAnomaly

If you want to be a little more adventurous, you can also pull up the code for the base class: AnalysisTask

2.1 Attributes

Classes can contain pieces of data called attributes. In MPAS-Analysis, the objects representing tasks share several attributes that they inherit from the AnalysisTask class. A few of the most important attributes of an analysis task are:

  • config - an object for getting the values of config options

  • namelist - an object for getting namelist options from the E3SM simulation

  • runStreams - an object for finding MPAS output files in the run directory. In practice, this is always a restart file used to get the MPAS mesh and, for MPAS-Ocean, the vertical coordinate.

  • historyStreams - an object for finding MPAS history streams (often timeSeriesStatsMonthlyOutput).

  • calendar - the name of the calendar that was used in the MPAS run (in practice always 'noleap' or until recently 'gregorian_noleap').

  • xmlFileNames - a list of XML files associated with plots produced by this analysis task. As we will discuss, these are used to help populate the web page showing the analysis.

  • logger - an object that handles sending output to log files (rather than the terminal) while the analysis is running. During the run_task() phase of the analysis, when tasks are running in parallel with each other, make sure to use logger.info() instead of print() so that output goes to the log file.

Within the methods of an analysis task class, these attributes can be accessed using the self object, e.g. self.config. It is often helpful to make a local reference to the object to make the code more compact, e.g.:

config = self.config
seasons = config.getexpression('climatologyMapOHCAnomaly', 'seasons')
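Similarly, during run_task(), output should go through the logger rather than print(). A minimal sketch (not taken from an actual task):

def run_task(self):
    # this goes to the task's log file rather than the terminal
    self.logger.info('Computing the climatology...')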

The analysis task we’re looking at, ClimatologyMapOHCAnomaly, has some attributes of its own:

  • mpas_climatology_task - the task that produced the climatology to be remapped and plotted

  • ref_year_climatology_task - the task that produced the climatology from the first year, to be remapped and then subtracted from the main climatology (since we want to plot an anomaly from the beginning of the simulation)

2.2 Constructor

Almost all classes have “constructors”, which are methods for making a new object of that class. In python, the constructor is called __init__(). In general, the __ (double underscore) is used in python to indicate a function or method with special meaning.
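As a generic Python illustration (not specific to MPAS-Analysis):

class Example:
    def __init__(self, value):
        # the constructor stores its argument as an attribute of the new object
        self.value = value

# calling the class invokes __init__() and returns the new object
example = Example(value=2)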

The constructor of a subclass (such as ClimatologyMapOHCAnomaly) always calls the constructor of the superclass (AnalysisTask in this case). So we’ll talk about the constructor for ClimatologyMapOHCAnomaly first and then get to AnalysisTask.

The constructor for ClimatologyMapOHCAnomaly starts off like this:

def __init__(self, config, mpas_climatology_task,
             ref_year_climatology_task, control_config=None):

As with all methods, it takes the self object as the first argument. Then, it takes a config object, which is true of all analysis tasks. After that, it has some other arguments that are more specific to the analysis being performed. Here, we have two other analysis tasks as arguments: mpas_climatology_task and ref_year_climatology_task. As described in the previous section, these are tasks for computing climatologies that will later be remapped to a comparison grid for plotting. A little later in the constructor, we store references to these tasks as attributes:

self.mpas_climatology_task = mpas_climatology_task
self.ref_year_climatology_task = ref_year_climatology_task

Returning to the constructor above, the first thing we do is to call the super class’s __init__() method:

def __init__(self, config, mpas_climatology_task,
             ref_year_climatology_task, control_config=None):
    """
    Construct the analysis task.

    Parameters
    ----------
    config : mpas_tools.config.MpasConfigParser
        Configuration options

    mpas_climatology_task : mpas_analysis.shared.climatology.MpasClimatologyTask
        The task that produced the climatology to be remapped and plotted

    ref_year_climatology_task : mpas_analysis.shared.climatology.RefYearMpasClimatologyTask
        The task that produced the climatology from the first year to be
        remapped and then subtracted from the main climatology

    control_config : mpas_tools.config.MpasConfigParser, optional
        Configuration options for a control run (if any)
    """

    field_name = 'deltaOHC'
    # call the constructor from the base class (AnalysisTask)
    super().__init__(config=config, taskName='climatologyMapOHCAnomaly',
                     componentName='ocean',
                     tags=['climatology', 'horizontalMap', field_name,
                           'publicObs', 'anomaly'])

We’re passing along the config options to the base class so it can store them. Then, we’re giving the task a unique taskName (the same as the class name except that it starts with a lowercase letter). We’re saying that the MPAS componentName is the ocean.

Then, we give the task a number of tags that can be helpful in determining whether or not to generate this particular analysis based on the Generate Option. The tags describe various aspects of the analysis. Here, we will produce plots of a climatology (as opposed to a time series). The plot will be a horizontalMap. It will involve the variable deltaOHC. This analysis doesn’t involve any observations, but we include a publicObs tag to indicate that it doesn’t require any proprietary observational data sets that we do not have the rights to make public. (Currently, we have a few such data sets for things like Antarctic melt rates.) Finally, the analysis involves an anomaly computed relative to the beginning of the simulation.
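As a hypothetical illustration of how these tags get used (see the Generate Option documentation for the exact syntax), a user’s config file could skip every task carrying the anomaly tag with something like:

[output]
# generate everything except tasks tagged 'anomaly'
generate = ['all', 'no_anomaly']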

From there, we get the values of some config options, raising errors if we find something unexpected:

section_name = self.taskName

# read in what seasons we want to plot
seasons = config.getexpression(section_name, 'seasons')

if len(seasons) == 0:
    raise ValueError(f'config section {section_name} does not contain '
                     f'valid list of seasons')

comparison_grid_names = config.getexpression(section_name,
                                             'comparisonGrids')

if len(comparison_grid_names) == 0:
    raise ValueError(f'config section {section_name} does not contain '
                     f'valid list of comparison grids')

depth_ranges = config.getexpression('climatologyMapOHCAnomaly',
                                    'depthRanges',
                                    use_numpyfunc=True)

By default, these config options look like this:

[climatologyMapOHCAnomaly]
## options related to plotting horizontally remapped climatologies of
## ocean heat content (OHC) against control model results (if available)

...

# Months or seasons to plot (Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct,
# Nov, Dec, JFM, AMJ, JAS, OND, ANN)
seasons =  ['ANN']

# comparison grid(s) ('latlon', 'antarctic') on which to plot analysis
comparisonGrids = ['latlon']

# A list of pairs of minimum and maximum depths (positive up, in meters) to
# include in the vertical sums.  The default values are the equivalents of the
# default ranges of the timeSeriesOHCAnomaly task, with a value of -10,000 m
# intended to be well below the bottom of the ocean for all existing MPAS-O
# meshes.
depthRanges = [(0.0, -10000.0), (0.0, -700.0), (-700.0, -2000.0), (-2000.0, -10000.0)]

We plot only the annual mean OHC anomaly, and we plot it only on a global latitude-longitude grid. The depth ranges are:

  • the full ocean column

  • sea surface to 700 m depth

  • 700 m to 2000 m depth

  • 2000 m to the seafloor

A user would be free to change any of these config options, and the analysis should run correctly. They could choose to plot on a different comparison grid, add new seasons, or change the depth range. As long as they ran the analysis in a fresh directory (or purged output from a previous analysis run), this should work correctly.
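For example, a user’s config file might override just the options they want to change, drawing values from the lists in the comments above:

[climatologyMapOHCAnomaly]
# also plot the January-February-March mean
seasons = ['ANN', 'JFM']

# plot on both the global latitude-longitude and Antarctic stereographic grids
comparisonGrids = ['latlon', 'antarctic']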

Next, we store some values that will be useful later:

mpas_field_name = 'deltaOHC'

variable_list = ['timeMonthly_avg_activeTracers_temperature',
                 'timeMonthly_avg_layerThickness']

This particular analysis involves four different depth ranges over which we compute the ocean heat content. The remainder of the analysis is performed separately for each of these depth ranges in subtasks. We loop over the depth ranges and add a subtask that will first compute the ocean heat content (OHC) and then remap it to the comparison grids (RemapMpasOHCClimatology):

for min_depth, max_depth in depth_ranges:
    depth_range_string = \
        f'{np.abs(min_depth):g}-{np.abs(max_depth):g}m'
    remap_climatology_subtask = RemapMpasOHCClimatology(
        mpas_climatology_task=mpas_climatology_task,
        ref_year_climatology_task=ref_year_climatology_task,
        parent_task=self,
        climatology_name=f'{field_name}_{depth_range_string}',
        variable_list=variable_list,
        comparison_grid_names=comparison_grid_names,
        seasons=seasons,
        min_depth=min_depth,
        max_depth=max_depth)

    self.add_subtask(remap_climatology_subtask)

    ...

We will explore the RemapMpasOHCClimatology subtask later in the tutorial so we will not discuss it further here.

Still within the loop over depth ranges, we then add a subtask (PlotClimatologyMapSubtask) for each plot we want to create, one for each comparison grid and season. (By default, there is only one comparison grid and one “season”: the full year, ANN.)

for min_depth, max_depth in depth_ranges:
    ...
    out_file_label = f'deltaOHC_{depth_range_string}'
    remap_observations_subtask = None
    if control_config is None:
        ref_title_label = None
        ref_field_name = None
        diff_title_label = 'Model - Observations'

    else:
        control_run_name = control_config.get('runs', 'mainRunName')
        ref_title_label = f'Control: {control_run_name}'
        ref_field_name = mpas_field_name
        diff_title_label = 'Main - Control'

    for comparison_grid_name in comparison_grid_names:
        for season in seasons:
            # make a new subtask for this season and comparison grid
            subtask_name = f'plot{season}_{comparison_grid_name}_{depth_range_string}'

            subtask = PlotClimatologyMapSubtask(
                self, season, comparison_grid_name,
                remap_climatology_subtask, remap_observations_subtask,
                controlConfig=control_config, subtaskName=subtask_name)

            subtask.set_plot_info(
                outFileLabel=out_file_label,
                fieldNameInTitle=f'$\\Delta$OHC over {depth_range_string}',
                mpasFieldName=mpas_field_name,
                refFieldName=ref_field_name,
                refTitleLabel=ref_title_label,
                diffTitleLabel=diff_title_label,
                unitsLabel=r'GJ m$^{-2}$',
                imageCaption=f'Anomaly in Ocean Heat Content over {depth_range_string}',
                galleryGroup='OHC Anomaly',
                groupSubtitle=None,
                groupLink='ohc_anom',
                galleryName=None)

            self.add_subtask(subtask)

First, we make sure the subtask has a unique name. If two tasks or subtasks have the same taskName and subtaskName, MPAS-Analysis will only run the last one and the task manager may become confused.

Then, we create a subtask object that is an instance of the PlotClimatologyMapSubtask class. This class is shared between several ocean analysis tasks for plotting climatologies as horizontal maps. It can plot just MPAS output, remapped to one or more comparison grids and averaged over one or more seasons. It can also plot that data against an observational field that has been remapped to the same comparison grid and averaged over the same seasons. In this case, there are no observations available for comparison (remap_observations_subtask = None). A user may have provided a “control” run of MPAS-Analysis to compare with this analysis run (a so-called “model vs. model” comparison). If so, control_config will have config options describing the other analysis run. If not, control_config is None.

Next, we call the set_plot_info() method of PlotClimatologyMapSubtask to provide things like the title and units for the plot and the field to plot. We also provide information needed for the final analysis web page, such as the name of the gallery group. (We do not provide a gallery name within the gallery group because there will be no other galleries within this group.) All the plots for a given comparison grid will end up in the same gallery, with the different depths and seasons one after another.

Finally, we call add_subtask() to add the subtask to this task.

2.3 setup_and_check() method

The setup_and_check() method of an analysis task is called once it is clear that this particular analysis has been requested (but before the analysis is actually ready to run). This is in contrast to the constructor, which is run for every analysis task every time MPAS-Analysis runs, because we need information from the analysis task (its name, component and tags) in order to determine whether it should run or not.

In this method, we typically perform checks to make sure the simulation has been configured properly to run the analysis, for example that the necessary analysis member was enabled.

def setup_and_check(self):
    """
    Checks whether analysis is being performed only on the reference year,
    in which case the analysis will not be meaningful.

    Raises
    ------
    ValueError: if attempting to analyze only the reference year
    """

    # first, call setup_and_check from the base class (AnalysisTask),
    # which will perform some common setup, including storing:
    #     self.runDirectory , self.historyDirectory, self.plotsDirectory,
    #     self.namelist, self.runStreams, self.historyStreams,
    #     self.calendar
    super().setup_and_check()

    start_year, end_year = self.mpas_climatology_task.get_start_and_end()
    ref_start_year, ref_end_year = \
        self.ref_year_climatology_task.get_start_and_end()

    if (start_year == ref_start_year) and (end_year == ref_end_year):
        raise ValueError('OHC Anomaly is not meaningful and will not work '
                         'when climatology and ref year are the same.')

In this particular case, we first call the super class’ version of the setup_and_check() method. This takes care of some important setup.

Then, we use this method to check whether the user has specified meaningful values for the climatology start and end years and the reference year. If they happen to be the same, it doesn’t really make sense to run the analysis, so an error is raised and the analysis gets skipped.

The ClimatologyMapOHCAnomaly task has delegated all of its work to its subtasks, so it doesn’t define a run_task() method. Tasks or subtasks that actually do the work typically need to define this method, as we will explore below.

3. Tour of a subtask (RemapMpasOHCClimatology)

The class RemapMpasOHCClimatology is, in some ways, more complicated than its “parent” task ClimatologyMapOHCAnomaly. It descends not from the AnalysisTask base class but from another subtask class, RemapMpasClimatologySubtask. This tutorial won’t attempt to cover RemapMpasClimatologySubtask in all its detail. The basics are that it starts with MPAS climatology data over one or more seasons that has previously been computed by an MpasClimatologyTask task. It remaps that data from the MPAS mesh to one or more comparison grids (e.g. global latitude-longitude or Antarctic stereographic), where it can be plotted and compared with observations or another MPAS-Analysis run.

Here, we do not use RemapMpasClimatologySubtask directly because we need to add to its functionality. We need to compute the OHC, which is not available directly from MPAS-Ocean output, from the monthly-mean temperature and layer thickness.

3.1 Attributes

The docstring indicates the attributes that RemapMpasOHCClimatology includes. (It also has all the attributes of its super class, RemapMpasClimatologySubtask, and that class’ super class, AnalysisTask, but we don’t redundantly document these in the docstring in part because that would be a maintenance nightmare.)

class RemapMpasOHCClimatology(RemapMpasClimatologySubtask):
    """
    A subtask for computing climatologies of ocean heat content from
    climatologies of temperature

    Attributes
    ----------
    ref_year_climatology_task : mpas_analysis.shared.climatology.RefYearMpasClimatologyTask
        The task that produced the climatology from the first year to be
        remapped and then subtracted from the main climatology

    min_depth, max_depth : float
        The minimum and maximum depths for integration
    """

The attributes are a task for computing the climatology over the reference year (usually the start of the simulation), ref_year_climatology_task, and the minimum and maximum depths over which the ocean heat content will be integrated.

3.2 Constructor

def __init__(self, mpas_climatology_task, ref_year_climatology_task,
             parent_task, climatology_name, variable_list, seasons,
             comparison_grid_names, min_depth, max_depth):

    """
    Construct the analysis task and adds it as a subtask of the
    ``parent_task``.

    Parameters
    ----------
    mpas_climatology_task : mpas_analysis.shared.climatology.MpasClimatologyTask
        The task that produced the climatology to be remapped

    ref_year_climatology_task : mpas_analysis.shared.climatology.RefYearMpasClimatologyTask
        The task that produced the climatology from the first year to be
        remapped and then subtracted from the main climatology

    parent_task :  mpas_analysis.shared.AnalysisTask
        The parent task, used to get the ``taskName``, ``config`` and
        ``componentName``

    climatology_name : str
        A name that describes the climatology (e.g. a short version of
        the important field(s) in the climatology) used to name the
        subdirectories for each stage of the climatology

    variable_list : list of str
        A list of variable names in ``timeSeriesStatsMonthly`` to be
        included in the climatologies

    seasons : list of str, optional
        A list of seasons (keys in ``shared.constants.monthDictionary``)
        to be computed or ['none'] (not ``None``) if only monthly
        climatologies are needed.

    comparison_grid_names : list of {'latlon', 'antarctic'}
        The name(s) of the comparison grid to use for remapping.

    min_depth, max_depth : float
        The minimum and maximum depths for integration
    """

    depth_range_string = f'{np.abs(min_depth):g}-{np.abs(max_depth):g}m'
    subtask_name = f'remapMpasClimatology_{depth_range_string}'
    # call the constructor from the base class
    # (RemapMpasClimatologySubtask)
    super().__init__(
        mpas_climatology_task, parent_task, climatology_name,
        variable_list, seasons, comparison_grid_names,
        subtaskName=subtask_name)

    self.ref_year_climatology_task = ref_year_climatology_task
    self.run_after(ref_year_climatology_task)
    self.min_depth = min_depth
    self.max_depth = max_depth

Most of the arguments to the constructor are passed along to the constructor of RemapMpasClimatologySubtask. These include a reference to the task for computing MPAS climatologies (used to find the input files and to make sure this subtask waits until that task is finished), a reference to the “parent” ClimatologyMapOHCAnomaly task (used to get some of its attributes), the name of the climatology supplied by the parent (something like deltaOHC_0-700m, depending on the depth range), a list of the variables that go into computing the OHC, the season(s) over which the climatology was requested, and the comparison grid(s) to plot on. We also construct a unique name for this subtask from the depth range and pass it along as subtaskName.

The ref_year_climatology_task that computes the climatology over the reference year is retained as an attribute of the class, along with the depth range. These attributes will all be needed later when we compute the OHC. We indicate that this task must wait for the reference climatology to be available by calling run_after(). The super class will do the same for the mpas_climatology_task task. It will also add this task as a subtask of the parent task.

3.3 setup_and_check() method

As in the parent task, we need to define the setup_and_check() method.

def setup_and_check(self):
    """
    Perform steps to set up the analysis and check for errors in the setup.
    """

    # first, call setup_and_check from the base class
    # (RemapMpasClimatologySubtask), which will set up remappers and add
    # variables to mpas_climatology_task
    super().setup_and_check()

    # don't add the variables and seasons to mpas_climatology_task until
    # we're sure this subtask is supposed to run
    self.ref_year_climatology_task.add_variables(self.variableList,
                                                 self.seasons)

In this particular case, we first call the super class’ version of the setup_and_check() method. This takes care of some important setup including adding the variables and season(s) we need to the mpas_climatology_task.

Then, we use this method to add variables we need and the requested season(s) to the task for computing the climatology over the reference year (ref_year_climatology_task). We don’t do this in the constructor because if we did, we would always be asking for the variables needed to compute the OHC even if we don’t actually end up computing it. This could be a big waste of time and disk space. The super class RemapMpasClimatologySubtask can’t take care of this for us because it isn’t designed for computing anomalies, just “normal” climatologies over a range of years.

3.4 run_task() method

Normally, the main work of a task happens in the run_task() method. The RemapMpasOHCClimatology class doesn’t define this method because it is happy to inherit the run_task() method from its super class, RemapMpasClimatologySubtask.

An abbreviated version of that method looks like this:

def run_task(self):
    """
    Compute the requested climatologies
    """
    ...
    for season in self.seasons:
        self._mask_climatologies(season, dsMask)
    ...

It calls a private helper method:

def _mask_climatologies(self, season, dsMask):
    """
    For each season, creates a masked version of the climatology
    """
    ...
    if not os.path.exists(maskedClimatologyFileName):
        ...

        # customize (if this function has been overridden)
        climatology = self.customize_masked_climatology(climatology,
                                                        season)

        write_netcdf(climatology, maskedClimatologyFileName)

This private method (the leading underscore indicates that it is private), in turn, calls the customize_masked_climatology() method, which is our chance to make changes to the climatology before it gets remapped. That’s where we will actually compute the OHC from variables available from MPAS output.

3.5 customize_masked_climatology() method

Here is how we compute the OHC itself:

def customize_masked_climatology(self, climatology, season):
    """
    Compute the ocean heat content (OHC) anomaly from the temperature
    and layer thickness fields.

    Parameters
    ----------
    climatology : xarray.Dataset
        the climatology data set

    season : str
        The name of the season to be masked

    Returns
    -------
    climatology : xarray.Dataset
        the modified climatology data set
    """

    ohc = self._compute_ohc(climatology)

    ...

We call a private helper method to do the actual work, so let’s take a look at that before we continue with customize_masked_climatology().

def _compute_ohc(self, climatology):
    """
    Compute the OHC from the temperature and layer thicknesses in a given
    climatology data sets.
    """
    ds_restart = xr.open_dataset(self.restartFileName)
    ds_restart = ds_restart.isel(Time=0)

    # specific heat [J/(kg*degC)]
    cp = self.namelist.getfloat('config_specific_heat_sea_water')
    # [kg/m3]
    rho = self.namelist.getfloat('config_density0')

    units_scale_factor = 1e-9

    n_vert_levels = ds_restart.sizes['nVertLevels']

    z_mid = compute_zmid(ds_restart.bottomDepth, ds_restart.maxLevelCell-1,
                         ds_restart.layerThickness)

    vert_index = xr.DataArray.from_dict(
        {'dims': ('nVertLevels',), 'data': np.arange(n_vert_levels)})

    temperature = climatology['timeMonthly_avg_activeTracers_temperature']
    layer_thickness = climatology['timeMonthly_avg_layerThickness']

    masks = [vert_index < ds_restart.maxLevelCell,
             z_mid <= self.min_depth,
             z_mid >= self.max_depth]
    for mask in masks:
        temperature = temperature.where(mask)
        layer_thickness = layer_thickness.where(mask)

    ohc = units_scale_factor * rho * cp * layer_thickness * temperature
    ohc = ohc.sum(dim='nVertLevels')
    return ohc

This function uses a combination of mesh information taken from an MPAS restart file (available from the self.restartFileName attribute inherited from RemapMpasClimatologySubtask), namelist options available from the self.namelist reader (inherited from AnalysisTask), and temperature and layer_thickness from the climatology dataset itself. As the docstring for customize_masked_climatology() states, climatology is an xarray.Dataset. We know it has the variables timeMonthly_avg_activeTracers_temperature and timeMonthly_avg_layerThickness because we requested them back in the constructor of ClimatologyMapOHCAnomaly. We compute the ohc as an xarray.DataArray that we return from this helper method.
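As a rough sanity check on the units, here is the same formula applied to a single made-up cell and layer (the values of cp and rho are representative, not necessarily those in a given simulation’s namelist):

# illustrative values only
cp = 3996.0              # specific heat [J/(kg*degC)]
rho = 1026.0             # density [kg/m3]
temperature = 10.0       # [degC]
layer_thickness = 100.0  # [m]

# the 1e-9 scale factor converts J m^-2 to GJ m^-2
ohc = 1e-9 * rho * cp * layer_thickness * temperature
print(f'{ohc:.1f} GJ m^-2')  # roughly 4.1 GJ m^-2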

Back to customize_masked_climatology(), we have:

def customize_masked_climatology(self, climatology, season):
    ...
    ohc = self._compute_ohc(climatology)

    ref_file_name = self.ref_year_climatology_task.get_file_name(season)
    ref_year_climo = xr.open_dataset(ref_file_name)
    if 'Time' in ref_year_climo.dims:
        ref_year_climo = ref_year_climo.isel(Time=0)
    ref_ohc = self._compute_ohc(ref_year_climo)

    climatology['deltaOHC'] = ohc - ref_ohc
    climatology.deltaOHC.attrs['units'] = 'GJ m^-2'
    start_year = self.ref_year_climatology_task.startYear
    climatology.deltaOHC.attrs['description'] = \
        f'Anomaly from year {start_year} in ocean heat content'
    climatology = climatology.drop_vars(self.variableList)

    return climatology

We use the same helper function to compute the ref_ohc using the climatology for the reference year. Then, we compute the anomaly (the difference between these two, deltaOHC) and we add some attributes, units and description, to make the NetCDF output that will go into the analysis output directory a little more useful.

4. The full code for posterity

Since the ClimatologyMapOHCAnomaly analysis task may evolve in the future, here is the full analysis task as described in this tutorial:

# This software is open source software available under the BSD-3 license.
#
# Copyright (c) 2022 Triad National Security, LLC. All rights reserved.
# Copyright (c) 2022 Lawrence Livermore National Security, LLC. All rights
# reserved.
# Copyright (c) 2022 UT-Battelle, LLC. All rights reserved.
#
# Additional copyright and license information can be found in the LICENSE file
# distributed with this code, or at
# https://raw.githubusercontent.com/MPAS-Dev/MPAS-Analysis/main/LICENSE
import xarray as xr
import numpy as np

from mpas_analysis.shared import AnalysisTask
from mpas_analysis.shared.climatology import RemapMpasClimatologySubtask
from mpas_analysis.ocean.plot_climatology_map_subtask import \
    PlotClimatologyMapSubtask
from mpas_analysis.ocean.utility import compute_zmid


class ClimatologyMapOHCAnomaly(AnalysisTask):
    """
    An analysis task for comparison of the anomaly from a reference year
    (typically the start of the simulation) of ocean heat content (OHC)

    Attributes
    ----------
    mpas_climatology_task : mpas_analysis.shared.climatology.MpasClimatologyTask
        The task that produced the climatology to be remapped and plotted

    ref_year_climatology_task : mpas_analysis.shared.climatology.RefYearMpasClimatologyTask
        The task that produced the climatology from the first year to be
        remapped and then subtracted from the main climatology
    """

    def __init__(self, config, mpas_climatology_task,
                 ref_year_climatology_task, control_config=None):
        """
        Construct the analysis task.

        Parameters
        ----------
        config : mpas_tools.config.MpasConfigParser
            Configuration options

        mpas_climatology_task : mpas_analysis.shared.climatology.MpasClimatologyTask
            The task that produced the climatology to be remapped and plotted

        ref_year_climatology_task : mpas_analysis.shared.climatology.RefYearMpasClimatologyTask
            The task that produced the climatology from the first year to be
            remapped and then subtracted from the main climatology

        control_config : mpas_tools.config.MpasConfigParser, optional
            Configuration options for a control run (if any)
        """

        field_name = 'deltaOHC'
        # call the constructor from the base class (AnalysisTask)
        super().__init__(config=config, taskName='climatologyMapOHCAnomaly',
                         componentName='ocean',
                         tags=['climatology', 'horizontalMap', field_name,
                               'publicObs', 'anomaly'])

        self.mpas_climatology_task = mpas_climatology_task
        self.ref_year_climatology_task = ref_year_climatology_task

        section_name = self.taskName

        # read in what seasons we want to plot
        seasons = config.getexpression(section_name, 'seasons')

        if len(seasons) == 0:
            raise ValueError(f'config section {section_name} does not contain '
                             f'valid list of seasons')

        comparison_grid_names = config.getexpression(section_name,
                                                     'comparisonGrids')

        if len(comparison_grid_names) == 0:
            raise ValueError(f'config section {section_name} does not contain '
                             f'valid list of comparison grids')

        depth_ranges = config.getexpression('climatologyMapOHCAnomaly',
                                            'depthRanges',
                                            use_numpyfunc=True)

        mpas_field_name = 'deltaOHC'

        variable_list = ['timeMonthly_avg_activeTracers_temperature',
                         'timeMonthly_avg_layerThickness']

        for min_depth, max_depth in depth_ranges:
            depth_range_string = \
                f'{np.abs(min_depth):g}-{np.abs(max_depth):g}m'
            remap_climatology_subtask = RemapMpasOHCClimatology(
                mpas_climatology_task=mpas_climatology_task,
                ref_year_climatology_task=ref_year_climatology_task,
                parent_task=self,
                climatology_name=f'{field_name}_{depth_range_string}',
                variable_list=variable_list,
                comparison_grid_names=comparison_grid_names,
                seasons=seasons,
                min_depth=min_depth,
                max_depth=max_depth)

            self.add_subtask(remap_climatology_subtask)

            out_file_label = f'deltaOHC_{depth_range_string}'
            remap_observations_subtask = None
            if control_config is None:
                ref_title_label = None
                ref_field_name = None
                diff_title_label = 'Model - Observations'

            else:
                control_run_name = control_config.get('runs', 'mainRunName')
                ref_title_label = f'Control: {control_run_name}'
                ref_field_name = mpas_field_name
                diff_title_label = 'Main - Control'

            for comparison_grid_name in comparison_grid_names:
                for season in seasons:
                    # make a new subtask for this season and comparison grid
                    subtask_name = f'plot{season}_{comparison_grid_name}_{depth_range_string}'

                    subtask = PlotClimatologyMapSubtask(
                        self, season, comparison_grid_name,
                        remap_climatology_subtask, remap_observations_subtask,
                        controlConfig=control_config, subtaskName=subtask_name)

                    subtask.set_plot_info(
                        outFileLabel=out_file_label,
                        fieldNameInTitle=f'$\\Delta$OHC over {depth_range_string}',
                        mpasFieldName=mpas_field_name,
                        refFieldName=ref_field_name,
                        refTitleLabel=ref_title_label,
                        diffTitleLabel=diff_title_label,
                        unitsLabel=r'GJ m$^{-2}$',
                        imageCaption=f'Anomaly in Ocean Heat Content over {depth_range_string}',
                        galleryGroup='OHC Anomaly',
                        groupSubtitle=None,
                        groupLink='ohc_anom',
                        galleryName=None)

                    self.add_subtask(subtask)

    def setup_and_check(self):
        """
        Checks whether analysis is being performed only on the reference year,
        in which case the analysis will not be meaningful.

        Raises
        ------
        ValueError: if attempting to analyze only the reference year
        """

        # first, call setup_and_check from the base class (AnalysisTask),
        # which will perform some common setup, including storing:
        #     self.runDirectory , self.historyDirectory, self.plotsDirectory,
        #     self.namelist, self.runStreams, self.historyStreams,
        #     self.calendar
        super().setup_and_check()

        start_year, end_year = self.mpas_climatology_task.get_start_and_end()
        ref_start_year, ref_end_year = \
            self.ref_year_climatology_task.get_start_and_end()

        if (start_year == ref_start_year) and (end_year == ref_end_year):
            raise ValueError('OHC Anomaly is not meaningful and will not work '
                             'when climatology and ref year are the same.')


class RemapMpasOHCClimatology(RemapMpasClimatologySubtask):
    """
    A subtask for computing climatologies of ocean heat content from
    climatologies of temperature

    Attributes
    ----------
    ref_year_climatology_task : mpas_analysis.shared.climatology.RefYearMpasClimatologyTask
        The task that produced the climatology from the first year to be
        remapped and then subtracted from the main climatology

    min_depth, max_depth : float
        The minimum and maximum depths for integration
    """

    def __init__(self, mpas_climatology_task, ref_year_climatology_task,
                 parent_task, climatology_name, variable_list, seasons,
                 comparison_grid_names, min_depth, max_depth):

        """
        Construct the analysis task and adds it as a subtask of the
        ``parent_task``.

        Parameters
        ----------
        mpas_climatology_task : mpas_analysis.shared.climatology.MpasClimatologyTask
            The task that produced the climatology to be remapped

        ref_year_climatology_task : mpas_analysis.shared.climatology.RefYearMpasClimatologyTask
            The task that produced the climatology from the first year to be
            remapped and then subtracted from the main climatology

        parent_task :  mpas_analysis.shared.AnalysisTask
            The parent task, used to get the ``taskName``, ``config`` and
            ``componentName``

        climatology_name : str
            A name that describes the climatology (e.g. a short version of
            the important field(s) in the climatology) used to name the
            subdirectories for each stage of the climatology

        variable_list : list of str
            A list of variable names in ``timeSeriesStatsMonthly`` to be
            included in the climatologies

        seasons : list of str, optional
            A list of seasons (keys in ``shared.constants.monthDictionary``)
            to be computed or ['none'] (not ``None``) if only monthly
            climatologies are needed.

        comparison_grid_names : list of {'latlon', 'antarctic'}
            The name(s) of the comparison grid to use for remapping.

        min_depth, max_depth : float
            The minimum and maximum depths for integration
        """

        depth_range_string = f'{np.abs(min_depth):g}-{np.abs(max_depth):g}m'
        subtask_name = f'remapMpasClimatology_{depth_range_string}'
        # call the constructor from the base class
        # (RemapMpasClimatologySubtask)
        super().__init__(
            mpas_climatology_task, parent_task, climatology_name,
            variable_list, seasons, comparison_grid_names,
            subtaskName=subtask_name)

        self.ref_year_climatology_task = ref_year_climatology_task
        self.run_after(ref_year_climatology_task)
        self.min_depth = min_depth
        self.max_depth = max_depth

    def setup_and_check(self):
        """
        Perform steps to set up the analysis and check for errors in the setup.
        """

        # first, call setup_and_check from the base class
        # (RemapMpasClimatologySubtask), which will set up remappers and add
        # variables to mpas_climatology_task
        super().setup_and_check()

        # don't add the variables and seasons to mpas_climatology_task until
        # we're sure this subtask is supposed to run
        self.ref_year_climatology_task.add_variables(self.variableList,
                                                     self.seasons)

    def customize_masked_climatology(self, climatology, season):
        """
        Compute the ocean heat content (OHC) anomaly from the temperature
        and layer thickness fields.

        Parameters
        ----------
        climatology : xarray.Dataset
            the climatology data set

        season : str
            The name of the season to be masked

        Returns
        -------
        climatology : xarray.Dataset
            the modified climatology data set
        """

        ohc = self._compute_ohc(climatology)
        ref_file_name = self.ref_year_climatology_task.get_file_name(season)
        ref_year_climo = xr.open_dataset(ref_file_name)
        if 'Time' in ref_year_climo.dims:
            ref_year_climo = ref_year_climo.isel(Time=0)
        ref_ohc = self._compute_ohc(ref_year_climo)

        climatology['deltaOHC'] = ohc - ref_ohc
        climatology.deltaOHC.attrs['units'] = 'GJ m^-2'
        start_year = self.ref_year_climatology_task.startYear
        climatology.deltaOHC.attrs['description'] = \
            f'Anomaly from year {start_year} in ocean heat content'
        climatology = climatology.drop_vars(self.variableList)

        return climatology

    def _compute_ohc(self, climatology):
        """
        Compute the OHC from the temperature and layer thicknesses in a given
        climatology data sets.
        """
        ds_restart = xr.open_dataset(self.restartFileName)
        ds_restart = ds_restart.isel(Time=0)

        # specific heat [J/(kg*degC)]
        cp = self.namelist.getfloat('config_specific_heat_sea_water')
        # [kg/m3]
        rho = self.namelist.getfloat('config_density0')

        units_scale_factor = 1e-9

        n_vert_levels = ds_restart.sizes['nVertLevels']

        z_mid = compute_zmid(ds_restart.bottomDepth, ds_restart.maxLevelCell-1,
                             ds_restart.layerThickness)

        vert_index = xr.DataArray.from_dict(
            {'dims': ('nVertLevels',), 'data': np.arange(n_vert_levels)})

        temperature = climatology['timeMonthly_avg_activeTracers_temperature']
        layer_thickness = climatology['timeMonthly_avg_layerThickness']

        masks = [vert_index < ds_restart.maxLevelCell,
                 z_mid <= self.min_depth,
                 z_mid >= self.max_depth]
        for mask in masks:
            temperature = temperature.where(mask)
            layer_thickness = layer_thickness.where(mask)

        ohc = units_scale_factor * rho * cp * layer_thickness * temperature
        ohc = ohc.sum(dim='nVertLevels')
        return ohc