Moving variable mapping outside of mpas_xarray
==============================================

Xylar Asay-Davis
date: 2017/02/10

Summary
-------

In discussions with @pwolfram, it became clear that we would like to keep mpas_xarray as general as possible, rather than adding code specific to MPAS-Analysis. In particular, the variable-name mapping that is currently part of mpas_xarray is likely a capability that only MPAS-Analysis will need when opening xarray data sets. Likewise, mpas_xarray should not use any functionality outside of its own module, so that it remains independent of MPAS-Analysis.

At the same time, for efficiency and parallelism it is desirable to perform certain operations during the preprocessing step within xarray, rather than constructing a data set first and then manipulating it in serial (e.g. creating a time coordinate and slicing variables). The solution will be tested by making sure it produces bit-for-bit identical results to those from the ``develop`` branch for typical test cases on LANL IC and Edison.

Requirements
------------


Requirement: mpas_xarray does not include MPAS-Analysis specific functionality
Date last modified: 2017/02/10
Contributors: Xylar Asay-Davis

MPAS-Analysis specific functionality such as variable mapping should be removed from mpas_xarray so it can remain an independent module, requiring minimal modification to accommodate MPAS-Analysis' needs.

Requirement: MPAS-Analysis specific functionality should be supported in xarray preprocessing
Date last modified: 2017/02/10
Contributors: Xylar Asay-Davis

There should be a way to perform MPAS-Analysis specific functionality such as mapping variables during preprocessing. This functionality should be relatively easy to extend as new preprocessing needs arise.

Algorithmic Formulations (optional)
-----------------------------------


Algorithm: mpas_xarray does not include MPAS-Analysis specific functionality
Date last modified: 2017/02/10
Contributors: Xylar Asay-Davis

All functions and function arguments related to variable mapping will be removed from mpas_xarray and moved elsewhere.

Algorithm: MPAS-Analysis specific functionality should be supported in xarray preprocessing
Date last modified: 2017/02/15
Contributors: Xylar Asay-Davis

A new utility function, ``open_multifile_dataset``, will be added to ``mpas_xarray``. It simplifies current calls to ``xarray.open_mfdataset`` by hiding the preprocessor and by removing redundant time indices once the dataset has been built. (This function doesn't directly address the requirement but is meant to make ``mpas_xarray`` easier to use. It made sense to add it because it has a one-to-one correspondence with other functionality, described below, that does address the requirement.)

A new module, ``generalized_reader``, will also be added with its own ``open_multifile_dataset`` function. This version takes additional arguments, including a variable map and start and end dates for the dataset. ``generalized_reader.open_multifile_dataset`` will create a data set by calling ``xarray.open_mfdataset`` with its own preprocessing function, ``generalized_reader._preprocess``, which first maps variable names, then calls ``mpas_xarray.preprocess`` to finish the job. Once the dataset has been constructed, redundant time indices are removed and the 'Time' coordinate is sliced to lie between the supplied start and end dates.

This solution may add some confusion about which reader should be used to open xarray datasets. It is my sense that most developers adding new functionality will do so by modifying existing scripts, and these examples should make it clear which version of ``open_multifile_dataset`` is most appropriate. Nevertheless, clear documentation of ``generalized_reader`` and ``mpas_xarray``, and of the differences between them, is needed.

Here is a typical usage of ``generalized_reader.open_multifile_dataset``:

.. code-block:: python

    from mpas_analysis.shared.generalized_reader.generalized_reader \
        import open_multifile_dataset

    # `calendar` must already be defined by the caller
    file_name = 'example_jan_feb.nc'
    timestr = ['xtime_start', 'xtime_end']
    var_list = ['time_avg_avgValueWithinOceanRegion_avgSurfaceTemperature']
    # each MPAS-Analysis name maps to a list of candidate MPAS names
    variable_map = {
        'avgSurfaceTemperature':
            ['time_avg_avgValueWithinOceanRegion_avgSurfaceTemperature',
             'other_string',
             'yet_another_string'],
        'daysSinceStartOfSim':
            ['time_avg_daysSinceStartOfSim',
             'xtime',
             'something_else']}
    ds = open_multifile_dataset(file_names=file_name,
                                calendar=calendar,
                                time_variable_name=timestr,
                                variable_list=var_list,
                                start_date='0001-01-01',
                                end_date='9999-12-31',
                                variable_map=variable_map,
                                year_offset=1850)

Here is the same for ``mpas_xarray.open_multifile_dataset``, without the variable map or the start and end dates:

.. code-block:: python

    from mpas_analysis.shared.mpas_xarray.mpas_xarray \
        import open_multifile_dataset

    # `calendar` must already be defined by the caller
    file_name = 'example_jan_feb.nc'
    timestr = ['xtime_start', 'xtime_end']
    var_list = ['time_avg_avgValueWithinOceanRegion_avgSurfaceTemperature']
    ds = open_multifile_dataset(file_names=file_name,
                                calendar=calendar,
                                time_variable_name=timestr,
                                variable_list=var_list,
                                year_offset=1850)
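The last two steps described above (removing redundant time indices, then slicing 'Time' between the start and end dates) can be illustrated without xarray. The following is only a minimal sketch on plain Python lists, not the MPAS-Analysis implementation. The function names ``remove_repeated_times`` and ``slice_times`` are hypothetical, times are represented as days since the start of the simulation, and it is assumed that the first record is kept for each repeated time and that both ends of the date range are inclusive.

.. code-block:: python

    def remove_repeated_times(times, values):
        """Keep only the first record for each time, preserving order."""
        seen = set()
        kept_times, kept_values = [], []
        for time, value in zip(times, values):
            if time not in seen:
                seen.add(time)
                kept_times.append(time)
                kept_values.append(value)
        return kept_times, kept_values


    def slice_times(times, values, start, end):
        """Keep records with start <= time <= end (assumed inclusive)."""
        pairs = [(t, v) for t, v in zip(times, values) if start <= t <= end]
        return [t for t, _ in pairs], [v for _, v in pairs]


    # Hypothetical data: two files whose records overlap at day 31
    times = [0.0, 31.0, 31.0, 59.0]
    sst = [10.1, 10.4, 10.4, 10.9]

    times, sst = remove_repeated_times(times, sst)
    times, sst = slice_times(times, sst, start=15.0, end=60.0)

Both steps act on the fully assembled data set, which is why the text places them after the call to ``xarray.open_mfdataset`` rather than in the per-file preprocessor.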

Design and Implementation
-------------------------


Implementation: mpas_xarray does not include MPAS-Analysis specific functionality
Date last modified: 2017/02/15
Contributors: Xylar Asay-Davis

A test branch can be found in ``xylar/MPAS-Analysis/variable_mapping_reorg``.

I have removed ``map_variable`` and ``rename_variables`` from ``mpas_xarray``. I also removed any mention of the variable map from the rest of ``mpas_xarray``.

This branch also includes several other cleanup operations that do not address any requirements. These include:

* I added a new helper function, ``open_multifile_dataset``, for opening an xarray data set in a single, simple command without reference to the preprocessor. This function should make opening new data sets more intuitive for mpas_xarray users.
* I made several utility functions non-public (it is unclear to me why anyone would want to call these directly):

  * ``_assert_valid_datetimes``
  * ``_assert_valid_selections``
  * ``_ensure_list``
  * ``_get_datetimes``

* I have removed the ability to run ``mpas_xarray.py`` as a script, along with the associated tests. This is on the premise that 1) the tests were outdated and would have needed to be updated to work with the current code and 2) unit testing in ``test/test_mpas_xarray.py`` takes care of this capability in a better way.
* I have tried to make variable names a bit more verbose in various places. However, at @pwolfram's request, I have left ``ds`` for datasets, following the ``xarray`` convention.
* I have tried to improve the docstrings using a syntax that should be useful for generating documentation later on.
* I have updated unit testing to work with the new interface, notably the ``open_multifile_dataset`` function.

Implementation: MPAS-Analysis specific functionality should be supported in xarray preprocessing
Date last modified: 2017/02/15
Contributors: Xylar Asay-Davis

In the same branch as above, I have added a ``generalized_reader`` module that extends the capabilities of ``mpas_xarray`` to include mapping of variable names. The file structure is as follows:

.. code-block:: bash

    mpas_analysis/shared/
      - generalized_reader/
          __init__.py
          generalized_reader.py

``generalized_reader.py`` contains a function ``open_multifile_dataset`` that is similar to the one in ``mpas_xarray`` but with additional arguments needed by analysis:

* ``variable_map``, a map between MPAS and MPAS-Analysis variable names
* ``start_date``, the start date of the analysis
* ``end_date``, the end date of the analysis

This function performs the same steps as ``mpas_xarray.open_multifile_dataset`` but uses the local preprocessing function, ``_preprocess``, and also slices the 'Time' coordinate using the given start and end dates as a final step.

The ``generalized_reader._preprocess`` function first maps variable names, then calls ``mpas_xarray.preprocess`` to do the rest of the preprocessing as normal.

Two private functions, ``_map_variable_name`` and ``_rename_variables`` (taken out of ``mpas_xarray``), are used to perform variable-name mapping.
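To make the order of operations concrete, here is a rough sketch of variable-name mapping followed by a call into a base preprocessor. It is not the code in ``generalized_reader.py``: a plain dictionary stands in for an xarray data set, and ``map_variable_name``, ``rename_variables``, ``preprocess_with_map`` and ``base_preprocess`` are hypothetical stand-ins for the private functions and for ``mpas_xarray.preprocess``. The sketch assumes that the first candidate name found in the data set wins.

.. code-block:: python

    from functools import partial


    def map_variable_name(variable_name, variable_map, available_names):
        """Return the first candidate MPAS name present in available_names."""
        for candidate in variable_map[variable_name]:
            if candidate in available_names:
                return candidate
        raise ValueError('No candidate found for {}'.format(variable_name))


    def rename_variables(data, variable_map):
        """Return a copy of data keyed by MPAS-Analysis names where mapped."""
        # Invert the map: each candidate MPAS name points at its analysis name
        analysis_names = {}
        for analysis_name, candidates in variable_map.items():
            for candidate in candidates:
                analysis_names[candidate] = analysis_name
        return {analysis_names.get(name, name): values
                for name, values in data.items()}


    def preprocess_with_map(data, variable_map, base_preprocess):
        """Map variable names first, then hand off to the base preprocessor."""
        return base_preprocess(rename_variables(data, variable_map))


    def base_preprocess(data):
        """Stand-in for mpas_xarray.preprocess; does nothing in this sketch."""
        return data


    variable_map = {
        'avgSurfaceTemperature':
            ['time_avg_avgValueWithinOceanRegion_avgSurfaceTemperature',
             'other_string'],
        'daysSinceStartOfSim': ['time_avg_daysSinceStartOfSim', 'xtime']}

    # Hypothetical contents of a single file
    data = {
        'time_avg_avgValueWithinOceanRegion_avgSurfaceTemperature': [10.1, 10.4],
        'time_avg_daysSinceStartOfSim': [0.0, 31.0]}

    # Bind the extra arguments so the result takes a single data set
    preprocess = partial(preprocess_with_map, variable_map=variable_map,
                         base_preprocess=base_preprocess)
    result = preprocess(data)

Binding the extra arguments with ``functools.partial`` matters because the ``preprocess`` argument of ``xarray.open_mfdataset`` is called with a single data set per file.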

Testing
-------


Testing and Validation: mpas_xarray does not include MPAS-Analysis specific functionality
Date last modified: 2017/02/15
Contributors: Xylar Asay-Davis

In the branch ``xylar/MPAS-Analysis/variable_mapping_reorg``, the unit testing for mpas_xarray has been updated. This includes moving unit testing for variable mapping elsewhere.

I will make sure all tests with config files in the ``configs/lanl`` and ``configs/edison`` directories produce results that are bit-for-bit identical to those from the current ``develop`` branch.
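One strict way to check for bit-for-bit agreement is to compare checksums of the output files from the two branches. The snippet below is a sketch of that idea using only the standard library. It is not part of MPAS-Analysis, and the helper names ``file_digest`` and ``are_bit_for_bit`` and the file names are hypothetical. Output files that embed creation timestamps or other metadata would need a field-by-field comparison instead.

.. code-block:: python

    import hashlib
    import os
    import tempfile


    def file_digest(path, chunk_size=1 << 20):
        """Return the SHA-256 hex digest of a file, reading it in chunks."""
        digest = hashlib.sha256()
        with open(path, 'rb') as handle:
            for chunk in iter(lambda: handle.read(chunk_size), b''):
                digest.update(chunk)
        return digest.hexdigest()


    def are_bit_for_bit(path_a, path_b):
        """Return True if the two files have identical contents."""
        return file_digest(path_a) == file_digest(path_b)


    # Demonstration with two small temporary files (hypothetical names)
    with tempfile.TemporaryDirectory() as directory:
        develop_path = os.path.join(directory, 'develop_output.bin')
        branch_path = os.path.join(directory, 'branch_output.bin')
        for path in (develop_path, branch_path):
            with open(path, 'wb') as handle:
                handle.write(b'\x00\x01\x02')
        identical = are_bit_for_bit(develop_path, branch_path)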

Testing and Validation: MPAS-Analysis specific functionality should be supported in xarray preprocessing
Date last modified: 2017/02/10
Contributors: Xylar Asay-Davis

Largely the same as above.

I have added unit testing for ``generalized_reader`` (via the standalone ``generalized_reader.open_multifile_dataset`` function). These tests ensure that:

* variable mapping works as expected
* start and end dates work as expected