.. role:: raw-html-m2r(raw) :format: html Analysis Task Template ====================== :raw-html-m2r:`

Xylar Asay-Davis
date: 2017/03/08

` .. raw:: html

Summary

A new template python file for analysis tasks will be added to the repository. The template will include a list of functions that each analysis task *must* implement and example syntax for docstrings used to commend both the full analysis task and the individual functions. The existing analysis tasks will be updated to be consistent with this template. The ``run_analysis.py`` driver script will also be updated to work with the template. The template is needed to: #. serve as a starting point for writing new analysis tasks #. ensure that tasks implement a standard set of functions, making it easier to perform actions (such as checking whether the task should be run, checking for required model and observations files, purging files from a previous analysis run, and running the analysis) on each analysis task in sequence (and, in the future, in parallel) #. demonstrate the syntax and style of docstrings required to comment/document each task and each function .. raw:: html

Requirements

.. raw:: html

Requirement: Template for Analysis Tasks
Date last modified: 2017/03/08
Contributors: Xylar Asay-Davis

The template should include each function that each analysis task *must* implement and example docstring both for the task as a whole and for each funciton. .. raw:: html

Requirement: Validation within Analysis Tasks
Date last modified: 2017/03/08
Contributors: Xylar Asay-Davis

Validation, such as checking config options or adding new ones if they are missing, or checking if required data files are present, should be performed within a function in each task (rather than in ``run_analysis.py``\ , as is sometimes the current case). .. raw:: html

Requirement: Analysis Continues even when Analysis Task Fails
Date last modified: 2017/03/08
Contributors: Xylar Asay-Davis

If validation fails, an error message should be printed but other analysis tasks should be allowed to run. Similarly, if a given analysis task raises an exception, the error and stack trace should be printed but other analysis tasks should still be run. .. raw:: html

Requirement: List of Tasks to Perform
Date last modified: 2017/03/16
Contributors: Xylar Asay-Davis

There should be a single place where new tasks are added to ``run_analysis.py``\ , as is presently the case. Yet, there should be a way to create a list of tasks to be performed and later determine whether, when and how those tasks are to be run. This capability also allows for operations like purging files from a prevous run to be added in the future. The capability is also requried to allow for later task parallelism. Currently, a task module is imported, there is a check to see if that task should be run, and the task is performed in immediate sequence. .. raw:: html

Algorithmic Formulations (optional)

.. raw:: html

Design solution: Template for Analysis Tasks
Date last modified: 2017/03/16
Contributors: Xylar Asay-Davis

A base class, ``AnalysisTask`` will be added under ``shared/analysis_task.py``. This class will include methods: * ``__init__``\ : construct the task, including assigning variable and streams maps (optional). * ``setup_and_check``\ : performs common tasks to all analysis, such as reading namelist and streams files * ``run``\ : the base class version does nothing The template will show how to set up a child class that decends from ``AnalysisTask``. It will show examples of: * ``__init__``\ : construct the task, including assigning the ``taskName``\ , ``componentName`` and ``categories`` of the analysis, and calling the base class's constructor. * ``setup_and_check``\ : first, calls the base class' version of ``setup_and_check``\ , then, determines if the configuration is valid for running this task (e.g. if necessary files and config options are present) * ``run``\ : runs the analysis task The template will be located at: .. code-block:: bash mpas_analysis/ - analysis_task_template.py That is, it is the only file (other than ``__init__.py``\ ) in the base of the ``mpas_analysis`` directory, making it easy to find. This way, it will be the first file most developers see when they look in ``mpas_analysis`` itself. A reference to the template as the starting point for new developers will be added to the readme. .. raw:: html

Design solution: Validation within Analysis Tasks
Date last modified: 2017/03/16
Contributors: Xylar Asay-Davis

The ``setup_and_check`` method within each analysis task can be used to determine if necessary input files are present and/or if config options are set as expected. The template will provide examples of doing this. Existing checks for missing observations files in ``run_analysis.py`` will be moved to individual analyses. This will make clearer which checks correspond with which analysis tasks and will make clearer where such checks should be added within future analysis tasks. Similarly, the addition of the ``startDate`` and ``endDate`` config options will be moved to the corresponding analysis tasks. .. raw:: html

Design solution: Analysis Continues even when Analysis Task Fails
Date last modified: 2017/03/08
Contributors: Xylar Asay-Davis

A try/except will be used around both ``setup_and_check`` and ``run`` calls to make sure an error message and stack trace are printed, but execution will continue for other tasks. .. raw:: html

Design solution: List of Tasks to Perform
Date last modified: 2017/03/16
Contributors: Xylar Asay-Davis

By having a common base class for all analysis tasks, each task can be checked to see if it should be run based on the ``generate`` command-line or config option. If so, its ``setup_and_check`` function will be run to make sure the configuration is right (and will print a warning if not). If ``setup_and_check`` passes, the analysis can be added to a list of functions to be run. Later, a loop through the list can be used to run each analysis. Some analysis tasks require extra arguments (e.g. the field to be analyzed in the case of ``ocean.modelvsobs`` and the streams and variable maps for all analysis tasks). These arguments will be passed to ``__init__`` and stored as member variables that can later be accessed via ``self.``. .. raw:: html

Design and Implementation

Implementation is in the branch: https://github.com/xylar/MPAS-Analysis/tree/analysis_task_template .. raw:: html

Implementation: Template for Analysis Tasks
Date last modified: 2017/03/16
Contributors: Xylar Asay-Davis

Here is the suggested base class ``AnalysisTask`` in full, intended to make discussion of individual lines easier: .. code-block:: python """ Defines the base class for analysis tasks. Authors ------- Xylar Asay-Davis Last Modified ------------- 03/16/2017 """ from ..shared.io import NameList, StreamsFile from ..shared.io.utility import build_config_full_path, make_directories class AnalysisTask(object): # {{{ """ The base class for analysis tasks. Authors ------- Xylar Asay-Davis Last Modified ------------- 03/16/2017 """ def __init__(self, config, streamMap=None, variableMap=None): # {{{ """ Construct the analysis task. Individual tasks (children classes of this base class) should first call this method to perform basic initialization, then, define the `taskName`, `componentName` and list of `categories` for the task. Parameters ---------- config : instance of MpasAnalysisConfigParser Contains configuration options streamMap : dict, optional A dictionary of MPAS-O stream names that map to their mpas_analysis counterparts. variableMap : dict, optional A dictionary of MPAS-O variable names that map to their mpas_analysis counterparts. Authors ------- Xylar Asay-Davis Last Modified ------------- 03/16/2017 """ self.config = config self.streamMap = streamMap self.variableMap = variableMap # }}} def setup_and_check(self): # {{{ """ Perform steps to set up the analysis (e.g. reading namelists and streams files). After this call, the following member variables are set: self.inDirectory : the base input directory self.plotsDirectory : the directory for writing plots (which is also created if it doesn't exist) self.namelist : the namelist reader self.streams : the streams file reader self.calendar : the name of the calendar ('gregorian' or 'gregoraian_noleap') Individual tasks (children classes of this base class) should first call this method to perform basic setup, then, check whether the configuration is correct for a given analysis and perform additional, analysis-specific setup. For example, this function could check if necessary observations and other data files are found, then, determine the list of files to be read when the analysis is run. Authors ------- Xylar Asay-Davis Last Modified ------------- 03/16/2017 """ # read parameters from config file self.inDirectory = self.config.get('input', 'baseDirectory') self.plotsDirectory = build_config_full_path(self.config, 'output', 'plotsSubdirectory') namelistFileName = self.config.get('input', 'oceanNamelistFileName') self.namelist = NameList(namelistFileName, path=self.inDirectory) streamsFileName = self.config.get('input', 'oceanStreamsFileName') self.streams = StreamsFile(streamsFileName, streamsdir=self.inDirectory) self.calendar = self.namelist.get('config_calendar_type') make_directories(self.plotsDirectory) # }}} def run(self): # {{{ """ Runs the analysis task. Individual tasks (children classes of this base class) should first call this method to perform any common steps in an analysis task, then, perform the steps required to run the analysis task. Authors ------- Xylar Asay-Davis Last Modified ------------- 03/16/2017 """ return # }}} def check_generate(self): # {{{ """ Determines if this analysis should be generated, based on the `generate` config option and `taskName`, `componentName` and `categories`. Individual tasks do not need to create their own versions of this function. Returns ------- generate : bool Whether or not this task should be run. Raises ------ ValueError : If one of `self.taskName`, `self.componentName` or `self.categories` has not been set. Authors ------- Xylar Asay-Davis Last Modified ------------- 03/16/2017s """ for memberName in ['taskName', 'componentName', 'categories']: if not hasattr(self, memberName): raise ValueError('Analysis tasks must define self.{} in their ' '__init__ method.'.format(memberName)) if (not isinstance(self.categories, list) and self.categories is not None): raise ValueError('Analysis tasks\'s member self.categories ' 'must be NOne or a list of strings.') config = self.config generateList = config.getExpression('output', 'generate') generate = False for element in generateList: if '_' in element: (prefix, suffix) = element.split('_', 1) else: prefix = element suffix = None allSuffixes = [self.componentName] if self.categories is not None: allSuffixes = allSuffixes + self.categories noSuffixes = [self.taskName] + allSuffixes if prefix == 'all': if (suffix in allSuffixes) or (suffix is None): generate = True elif prefix == 'no': if suffix in noSuffixes: generate = False elif element == self.taskName: generate = True return generate # }}} # }}} # vim: foldmethod=marker ai ts=4 sts=4 et sw=4 ft=python And here is the suggested template in full: .. code-block:: python """ This is an example analysis task to be used as a template for new tasks. It should be copied into one of the component folders (`ocean`, `sea_ice`, `land_ice`, etc.) and modified as needed. Don't forget to remove this docstring. (It's not needed.) Authors ------- Xylar Asay-Davis Last Modified ------------- 03/16/2017 """ # import python modules here # import mpas_analysis module here (those with relative paths starting with # dots) from ..shared.analysis_task import AnalysisTask class MyTask(AnalysisTask): # {{{ """ Authors ------- Last Modified ------------- """ def __init__(self, config, streamMap=None, variableMap=None, myArg='myDefaultValue'): # {{{ """ Construct the analysis task. Parameters ---------- config : instance of MpasAnalysisConfigParser Contains configuration options streamMap : dict, optional A dictionary of MPAS-O stream names that map to their mpas_analysis counterparts. variableMap : dict, optional A dictionary of MPAS-O variable names that map to their mpas_analysis counterparts. myNewArg : str, optional Authors ------- Last Modified ------------- """ # first, call the constructor from the base class (AnalysisTask) super(MyTask, self).__init__(config, streamMap, variableMap).__init__(config, streamMap, variableMap) # next, name the task, the component (ocean, sea_ice, etc.) and the # categories (if any) of the component ('timeSeries', 'climatologyMap' # etc.) self.taskName = 'myTask' self.componentName = 'component' self.categories = ['category1', 'category2'] # then, store any additional arguments for use later on. These would # likely include things like the name of a field, region, month, # season, etc. to be analyzed so that the same subclass of AnalysisTask # can perform several different tasks (potentially in parallel) self.myArg = myArg # }}} def setup_and_check(self): # {{{ """ Perform steps to set up the analysis and check for errors in the setup. Raises ------ ValueError: if myArg has an invalid value Authors ------- Last Modified ------------- """ # first, call setup_and_check from the base class (AnalysisTask), # which will perform some common setup, including storing: # self.inDirectory, self.plotsDirectory, self.namelist, self.streams # self.calendar super(MyTask, self).__init__(config, streamMap, variableMap).setup_and_check() # then, perform additional checks specific to this analysis possibleArgs = ['blah', 'thing', 'stuff'] if self.myArg not in possibleArgs: # Note: we're going to allow a long line in this case because it # would be confusing to break up the string (even though it # violates the PEP8 standard) raise ValueError('MyTask must be constructed with argument myArg having one of the values\n' '{}.'.format(possibleArgs)) section = 'MyTask' startDate = '{:04d}-01-01_00:00:00'.format( self.config.getint(section, 'startYear')) if not self.config.has_option(section, 'startDate'): self.config.set(section, 'startDate', startDate) endDate = '{:04d}-12-31_23:59:59'.format( self.config.getint(section, 'endYear')) if not self.config.has_option(section, 'endDate'): self.config.set(section, 'endDate', endDate) # }}} def run(self): # {{{ """ Runs the analysis task. Individual tasks (children classes of this base class) should first call this method to perform any common steps in an analysis task, then, perform the steps required to run the analysis task. Authors ------- Last Modified ------------- """ # here is where the main "meat" of the analysis task goes self._my_sub_task('someText', arg2='differentText') return # }}} # here is where you add helper methods that are meant to be non-public # (they start with an underscore), meaning you don't expect anyone to # access them outside of this file. Typically you won't put as much in # the docstring as you would for a public function or method. # # you can either pass arguments (with or without defaults) or you can # "save" arguments as member variables of `self` and then get them back # (like `self.myArg` here). def _my_sub_task(self, arg1, arg2=None): # {{{ """ """ # perform the task print 'myArg:', self.myArg print 'arg1:', arg1 if arg2 is not None: print 'arg2:', arg2 # }}} # }}} # vim: foldmethod=marker ai ts=4 sts=4 et sw=4 ft=python .. raw:: html

Implementation: Validation within Analysis Tasks
Date last modified: 2017/03/08
Contributors: Xylar Asay-Davis

Here is an example (from ``ocean.climatology_map.ClimatologyMap``\ ) of what the new ``__init__`` and ``setup_and_check`` methods : .. code-block:: python def __init__(self, config, streamMap=None, variableMap=None, fieldName=None): # {{{ """ Construct the analysis task. Parameters ---------- config : instance of MpasAnalysisConfigParser Contains configuration options streamMap : dict, optional A dictionary of MPAS-O stream names that map to their mpas_analysis counterparts. variableMap : dict, optional A dictionary of MPAS-O variable names that map to their mpas_analysis counterparts. fieldName : {'sst', 'mld', 'sss'} The name of the field to be analyzed Raises ------ ValueError : if `fieldName` is not provided or is not one of the supported values Authors ------- Xylar Asay-Davis Last Modified ------------- 03/16/2017 """ # first, call the constructor from the base class (AnalysisTask) AnalysisTask.__init__(config, streamMap, variableMap) upperFieldNames = {'sst': 'SST', 'mld': 'MLD', 'sss': 'SSS' # 'nino34': 'Nino34', # 'mht': 'MHT' # 'moc': 'MOC' } if fieldName is None: raise ValueError('fieldName must be supplied.') if fieldName not in upperFieldNames.keys(): raise ValueError('fieldName must be one of {}.'.format( upperFieldNames.keys())) self.fieldName = fieldName self.upperFieldName = upperFieldNames[fieldName] # name the task, component and category self.taskName = 'climatologyMap{}'.format(self.upperFieldName) self.componentName = 'ocean' self.categories = ['climatologyMap', fieldName] # }}} def setup_and_check(self): # {{{ """ Perform steps to set up the analysis and check for errors in the setup. Raises ------ OSError If files are not present Authors ------- Xylar Asay-Davis Last Modified ------------- 03/16/2017 """ config = self.config section = 'climatology' startDate = '{:04d}-01-01_00:00:00'.format( config.getint(section, 'startYear')) if not config.has_option(section, 'startDate'): config.set(section, 'startDate', startDate) endDate = '{:04d}-12-31_23:59:59'.format( config.getint(section, 'endYear')) if not config.has_option(section, 'endDate'): config.set(section, 'endDate', endDate) return # }}} Much of this code has been taken out of ``run_analysis.py``\ , simplifying and clarifying the code. .. raw:: html

Implementation: Analysis Continues even when Analysis Task Fails
Date last modified: 2017/03/16
Contributors: Xylar Asay-Davis

Calls to ``check`` and ``run`` methods in ``run_analysis.py`` are inside of ``try/except`` blocks, which catch the exceptions and print the stack trace but don't cause the code to exit. .. code-block:: python try: analysisTask.check() ... except: traceback.print_exc(file=sys.stdout) print "ERROR: analysis module {} failed during check and " \ "will not be run".format(analysisTask.taskName) ... try: analysisModule.run() except: traceback.print_exc(file=sys.stdout) print "ERROR: analysis module {} failed during run".format( analysisTask.taskName) .. raw:: html

Implementation: List of Tasks to Perform
Date last modified: 2017/03/16
Contributors: Xylar Asay-Davis

The tasks are imported and added to an anaysis list as follows: .. code-block:: python analyses = [] # Ocean Analyses from mpas_analysis.ocean.time_series_ohc import TimeSeriesOHC analyses.append(TimeSeriesOHC(config, streamMap=oceanStreamMap, variableMap=oceanVariableMap)) from mpas_analysis.ocean.time_series_sst import TimeSeriesSST analyses.append(TimeSeriesSST(config, streamMap=oceanStreamMap, variableMap=oceanVariableMap)) from mpas_analysis.ocean.climatology_map import ClimatologyMap \ as ClimatologyMapOcean for fieldName in ['sst', 'mld', 'sss']: analyses.append(ClimatologyMapOcean(config, streamMap=oceanStreamMap, variableMap=oceanVariableMap, fieldName=fieldName)) # Sea Ice Analyses from mpas_analysis.sea_ice.timeseries import TimeSeries as TimeSeriesSeaIce analyses.append(TimeSeriesSeaIce(config, streamMap=seaIceStreamMap, variableMap=seaIceVariableMap)) from mpas_analysis.sea_ice.climatology_map import ClimatologyMap \ as ClimatologyMapSeaIce analyses.append(ClimatologyMapSeaIce(config, streamMap=seaIceStreamMap, variableMap=seaIceVariableMap)) The ``analyses`` list is a list of instances of subclasses of ``AnalysisTask``. Subsequent calls to analysis functions can loop over analyses, as in the following example for calling ``run``\ : .. code-block:: python # run each analysis task for analysisTask in analyses: try: analysisTask.run() except: traceback.print_exc(file=sys.stdout) print "ERROR: analysis module {} failed during run".format( analysisTask.taskName) :raw-html-m2r:`

Testing

` .. raw:: html

Testing and Validation: Template for Analysis Tasks
Date last modified: 2017/03/10
Contributors: Xylar Asay-Davis

Ideally, the test here would be having another developer create an analysis task based on this template. Realistically, this won't happen before the template gets merged into the repository, so I'm counting on feedback from other developers to "test" the template before it gets merged, and there will probably need to be subsequent PRs to make changes as issues arise. .. raw:: html

Testing and Validation: Validation within Analysis Tasks
Date last modified: 2017/03/16
Contributors: Xylar Asay-Davis

I have added ``setup_and_check`` functions within each analysis task. So far, these check for only a subset of the necessary configuration and input files, and could (and should) be expanded in the future. I have verified that all ``setup_and_check`` routines fail when the path to their respective observations and/or preprocessed reference run is not found. .. raw:: html

Testing and Validation: Analysis Continues even when Analysis Task Fails
Date last modified: 2017/03/10
Contributors: Xylar Asay-Davis

I have verified using the ``GMPAS_QU240`` test case and by deliberately introducing errors in the file paths that an error in a given analysis task (either during ``setup_and_check`` or ``run``\ ) causes that task to print a stack trace and an error message but does not prevent other tasks from running. .. raw:: html

Testing and Validation: List of Tasks to Perform
Date last modified: 2017/03/10
Contributors: Xylar Asay-Davis

As stated in implementation, there is a single place in ``run_analysis.py`` where a developer would add her or his task to the analysis. I think this requirement has been satisfied without requiring testing.