Moving variable mapping outside of mpas_xarray
Xylar Asay-Davis
date: 2017/02/10
Summary
In discussions with @pwolfram, it became clear that we would like to keep mpas_xarray as general as possible, rather than adding code specific to MPAS-Analysis. In particular, the capability for mapping variable names that is currently part of mpas_xarray is likely a capability that only MPAS-Analysis will need when opening xarray data sets. Likewise, there is a desire for mpas_xarray not to use any functionality outside of its own module so that it remains autonomous from MPAS-Analysis. At the same time, it is desirable for efficiency and parallelism to perform certain operations during the preprocessing step within xarray, rather than constructing a data set first and then (in serial) performing manipulations (e.g. creating a time coordinate and slicing variables). The solution will be tested by making sure it produces bit-for-bit identical results to those from the develop branch for typical test cases on LANL IC and Edison.
Requirements
Requirement: mpas_xarray does not include MPAS-Analysis specific functionality
Date last modified: 2017/02/10
Contributors: Xylar Asay-Davis
MPAS-Analysis specific functionality such as variable mapping should be removed from mpas_xarray so it can remain an independent module, requiring minimal modification to accommodate MPAS-Analysis’ needs.
Requirement: MPAS-Analysis specific functionality should be supported in xarray preprocessing
Date last modified: 2017/02/10
Contributors: Xylar Asay-Davis
There should be a way to perform MPAS-Analysis specific functionality such as mapping variables during preprocessing. This functionality should be relatively easy to extend as new preprocessing needs arise.
Algorithmic Formulations (optional)
Algorithm: mpas_xarray does not include MPAS-Analysis specific functionality
Date last modified: 2017/02/10
Contributors: Xylar Asay-Davis
All functions and function arguments related to variable mapping will be removed from mpas_xarray and moved elsewhere.
Algorithm: MPAS-Analysis specific functionality should be supported in xarray preprocessing
Date last modified: 2017/02/15
Contributors: Xylar Asay-Davis
A new utility function, open_multifile_dataset, will be added to mpas_xarray. It simplifies current calls to xarray.open_mfdataset by hiding the preprocessor and taking care of removing redundant time indices once the dataset has been built. (This function doesn’t directly address the requirement but is meant to make mpas_xarray easier to use. It made sense to add it because it has a one-to-one correspondence with other functionality, described below, that does address the requirement.)
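The step of removing redundant time indices can be pictured with a small, library-free sketch. It assumes that "redundant" means a time value that has already appeared earlier in the concatenated data set, and that the first occurrence is the one to keep. The function name unique_time_indices is hypothetical and is not part of mpas_xarray.

```python
def unique_time_indices(times):
    """Return the indices of the first occurrence of each time value.

    Hypothetical helper for illustration only; `times` stands in for the
    values of the 'Time' coordinate after files have been concatenated.
    """
    seen = set()
    keep = []
    for index, time in enumerate(times):
        if time not in seen:
            seen.add(time)
            keep.append(index)
    return keep


# Two files that overlap by one time slice (days since start of simulation)
times = [0.0, 31.0, 59.0, 59.0, 90.0]
indices = unique_time_indices(times)
deduplicated = [times[i] for i in indices]
```

With an xarray data set, indices computed this way could then be applied with something like ds.isel(Time=indices).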
A new module, generalized_reader, will also be added with its own open_multifile_dataset function. This version takes additional arguments including a variable map and start and end dates for the dataset. generalized_reader.open_multifile_dataset will create a data set by calling xarray.open_mfdataset with its own preprocessing function, generalized_reader._preprocess, that first maps variable names, then calls mpas_xarray.preprocess to finish the job. Once the dataset has been constructed, redundant time indices are removed and the ‘Time’ coordinate is sliced to be between the supplied start and end dates.
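The ordering of the two preprocessing stages (rename first, then hand off to the generic preprocessor) can be sketched without xarray. In this sketch plain dictionaries stand in for data sets, and base_preprocess and generalized_preprocess are hypothetical stand-ins, not the actual mpas_xarray.preprocess or generalized_reader._preprocess. The rename dictionary is assumed to be already resolved from old to new names. functools.partial is used because xarray.open_mfdataset calls its preprocess argument with a single data set, so any extra arguments have to be bound beforehand.

```python
from functools import partial


def base_preprocess(ds, variable_list):
    """Stand-in for the generic preprocessor: keep only requested variables."""
    return {name: values for name, values in ds.items()
            if name in variable_list}


def generalized_preprocess(ds, rename, variable_list):
    """Rename variables first, then defer to the generic preprocessor."""
    renamed = {rename.get(name, name): values for name, values in ds.items()}
    return base_preprocess(renamed, variable_list)


# Bind the extra arguments so the result takes a single data set argument
preprocess = partial(
    generalized_preprocess,
    rename={'time_avg_daysSinceStartOfSim': 'daysSinceStartOfSim'},
    variable_list=['daysSinceStartOfSim'])

# A dictionary standing in for the contents of one input file
raw = {'time_avg_daysSinceStartOfSim': [15.5, 45.0],
       'unused_variable': [0, 0]}
result = preprocess(raw)
```

A callable built this way has the single-argument form that xarray.open_mfdataset expects for its preprocess keyword.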
This solution may add some confusion in terms of which reader should be used to open xarray datasets. It is my sense that most developers adding new functionality will do so by modifying existing scripts, and these examples should make it clear which version of open_multifile_dataset is most appropriate. Nevertheless, clear documentation of generalized_reader and mpas_xarray, and of their differences, is needed.
Here is a typical usage of generalized_reader.open_multifile_dataset:
from mpas_analysis.shared.generalized_reader.generalized_reader \
    import open_multifile_dataset

# MPAS calendar name (assumed for this example)
calendar = 'gregorian_noleap'
file_name = 'example_jan_feb.nc'
timestr = ['xtime_start', 'xtime_end']
var_list = ['time_avg_avgValueWithinOceanRegion_avgSurfaceTemperature']
# each MPAS-Analysis name maps to a list of possible MPAS variable names
variable_map = {
    'avgSurfaceTemperature':
        ['time_avg_avgValueWithinOceanRegion_avgSurfaceTemperature',
         'other_string',
         'yet_another_string'],
    'daysSinceStartOfSim':
        ['time_avg_daysSinceStartOfSim',
         'xtime',
         'something_else']}
ds = open_multifile_dataset(file_names=file_name,
                            calendar=calendar,
                            time_variable_name=timestr,
                            variable_list=var_list,
                            start_date='0001-01-01',
                            end_date='9999-12-31',
                            variable_map=variable_map,
                            year_offset=1850)
Here is the same for mpas_xarray.open_multifile_dataset, without the variable map or the start and end dates:
from mpas_analysis.shared.mpas_xarray.mpas_xarray \
    import open_multifile_dataset

# MPAS calendar name (assumed for this example)
calendar = 'gregorian_noleap'
file_name = 'example_jan_feb.nc'
timestr = ['xtime_start', 'xtime_end']
var_list = ['time_avg_avgValueWithinOceanRegion_avgSurfaceTemperature']
ds = open_multifile_dataset(file_names=file_name,
                            calendar=calendar,
                            time_variable_name=timestr,
                            variable_list=var_list,
                            year_offset=1850)
Design and Implementation
Implementation: mpas_xarray does not include MPAS-Analysis specific functionality
Date last modified: 2017/02/15
Contributors: Xylar Asay-Davis
A test branch can be found here: xylar/MPAS-Analysis/variable_mapping_reorg
I have removed map_variable and rename_variables from mpas_xarray. I also removed any mention of the variable map from the rest of mpas_xarray.
This branch also includes several other cleanup operations that do not address any requirements. These include:
- I added a new helper function, open_multifile_dataset, for opening an xarray data set in a single, simple command without reference to the preprocessor. This function should make opening new data sets more intuitive for mpas_xarray users.
- I made several utility functions non-public (it is unclear to me why anyone would want to call these directly):
    - _assert_valid_datetimes
    - _assert_valid_selections
    - _ensure_list
    - _get_datetimes
- I have removed the ability to run mpas_xarray.py as a script, along with the associated tests. This is on the premise that 1) the tests were outdated and would have needed to be updated to work with the current code and 2) unit testing in test/test_mpas_xarray.py takes care of this capability in a better way.
- I have tried to make variable names a bit more verbose in various places. However, at @pwolfram’s request, I have left ds for datasets, following the xarray convention.
- I have tried to improve the docstrings using a syntax that should be useful for generating documentation later on.
- I have updated unit testing to work with the new interface, notably the open_multifile_dataset function.
Implementation: MPAS-Analysis specific functionality should be supported in xarray preprocessing
Date last modified: 2017/02/15
Contributors: Xylar Asay-Davis
In the same branch as above, I have added a generalized_reader module that extends the capabilities of mpas_xarray to include mapping of variable names.
The file structure is as follows:
mpas_analysis/shared/
    - generalized_reader/
        __init__.py
        generalized_reader.py
generalized_reader.py contains a function open_multifile_dataset that is similar to the one in mpas_xarray but with additional arguments needed by analysis:
- variable_map, a map between MPAS and MPAS-Analysis variable names
- start_date, the start date of the analysis
- end_date, the end date of the analysis
This function performs the same steps as mpas_xarray.open_multifile_dataset but uses the local preprocessing function, _preprocess, and also slices the ‘Time’ coordinate using the given start and end dates as a final step.
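The final slicing step can be illustrated with a minimal sketch. It assumes that both the start and end dates are included in the selection and, for simplicity, uses the standard library's Gregorian datetime instead of an MPAS calendar. The helper select_time_range is hypothetical and not part of generalized_reader.

```python
from datetime import datetime


def select_time_range(times, start_date, end_date):
    """Return the indices of times within [start_date, end_date], inclusive.

    Hypothetical helper; dates are 'YYYY-MM-DD' strings and `times` is a
    list of datetime objects standing in for the 'Time' coordinate.
    """
    start = datetime.strptime(start_date, '%Y-%m-%d')
    end = datetime.strptime(end_date, '%Y-%m-%d')
    return [index for index, time in enumerate(times)
            if start <= time <= end]


# Three monthly time slices; only the last two fall in the requested range
times = [datetime(1850, 1, 16), datetime(1850, 2, 15), datetime(1850, 3, 16)]
selected = select_time_range(times, '1850-02-01', '1850-03-16')
```

With xarray, a label-based selection such as ds.sel(Time=slice(start, end)) would play the same role and likewise includes both endpoints.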
The generalized_reader._preprocess function first maps variable names, then calls mpas_xarray.preprocess to do the rest of the preprocessing as normal.
Two private functions, _map_variable_name and _rename_variables (taken out of mpas_xarray), are used to perform variable-name mapping.
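As a rough illustration of variable-name mapping, the sketch below resolves a variable map (in the form used in the usage example above) against the names actually present in a file. It is not the code of _map_variable_name or _rename_variables. The helper resolve_variable_names is hypothetical, and two behaviors are assumptions made for the sketch: the first matching candidate wins, and entries with no match are skipped silently.

```python
def resolve_variable_names(available_names, variable_map):
    """Build an {old_name: new_name} dictionary for renaming.

    For each entry in `variable_map`, the first candidate present in
    `available_names` is chosen. Entries with no matching candidate are
    skipped (an assumption made for this illustration).
    """
    rename = {}
    for new_name, candidates in variable_map.items():
        for candidate in candidates:
            if candidate in available_names:
                rename[candidate] = new_name
                break
    return rename


variable_map = {
    'avgSurfaceTemperature':
        ['time_avg_avgValueWithinOceanRegion_avgSurfaceTemperature',
         'other_string',
         'yet_another_string'],
    'daysSinceStartOfSim':
        ['time_avg_daysSinceStartOfSim',
         'xtime',
         'something_else']}

# Names assumed to be present in a particular output file
available = {'xtime',
             'time_avg_avgValueWithinOceanRegion_avgSurfaceTemperature',
             'nCells'}
rename = resolve_variable_names(available, variable_map)
```

A dictionary of this form is what xarray.Dataset.rename accepts, so the result could be applied with ds.rename(rename).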
Testing
Testing and Validation: mpas_xarray does not include MPAS-Analysis specific functionality
Date last modified: 2017/02/15
Contributors: Xylar Asay-Davis
In xylar/MPAS-Analysis/variable_mapping_reorg, the unit testing for mpas_xarray has been updated. This includes moving unit testing for variable mapping elsewhere.
I will make sure all tests with config files in the configs/lanl and configs/edison directories produce bit-for-bit identical results to the current develop branch.
Testing and Validation: MPAS-Analysis specific functionality should be supported in xarray preprocessing
Date last modified: 2017/02/10
Contributors: Xylar Asay-Davis
Largely, the same as above.
I have added unit testing for generalized_reader (via the standalone generalized_reader.open_multifile_dataset function). These tests ensure that:
- variable mapping works as expected
- start and end dates work as expected