.. _quick_start:


Quick Start Guide
=================

Analysis for simulations produced with Model for Prediction Across Scales
(MPAS) components and the Energy Exascale Earth System Model (E3SM), which
used those components.


.. image:: _static/sst_example.png
   :target: _static/sst_example.png
   :alt: sea surface temperature


Installation
------------

MPAS-Analysis is available as an anaconda package via the ``conda-forge`` channel:

.. code-block::

   conda config --add channels conda-forge
   conda create -n mpas-analysis mpas-analysis
   conda activate mpas-analysis

To use the latest version for developers, you will need to set up a conda
environment with the following packages:


* python >= 3.6
* numpy
* scipy
* matplotlib >= 3.0.2
* netCDF4
* xarray >= 0.10.0
* dask
* bottleneck
* basemap
* lxml
* nco >= 4.8.1
* pyproj
* pillow
* cmocean
* progressbar2
* requests
* setuptools
* shapely
* cartopy
* geometric_features
* gsw
* pyremap

These can be installed via the conda command:

.. code-block::

   conda config --add channels conda-forge
   conda create -n mpas-analysis python=3.7 numpy scipy "matplotlib>=3.0.2" \
       netCDF4 "xarray>=0.10.0" dask bottleneck basemap lxml "nco>=4.8.1" pyproj \
       pillow cmocean progressbar2 requests setuptools shapely cartopy \
       geometric_features gsw pyremap
   conda activate mpas-analysis

Then, get the code from:
 `https://github.com/MPAS-Dev/MPAS-Analysis <https://github.com/MPAS-Dev/MPAS-Analysis>`_

Download analysis input data
----------------------------

If you installed the ``mpas-analysis`` package, download the data that is
necessary to MPAS-Analysis by running:

.. code-block::

   download_analysis_data -o /path/to/mpas_analysis/diagnostics

If you are using the git repository, run:

.. code-block::

   ./download_analysis_data.py -o /path/to/mpas_analysis/diagnostics

where ``/path/to/mpas_analysis/diagnostics`` is the main folder that will contain
two subdirectories:


* ``mpas_analysis``\ , which includes mapping and region mask files for
  standard resolution MPAS meshes
* ``observations``\ , which includes the pre-processed observations listed in the
  `Observations table <http://mpas-analysis.readthedocs.io/en/latest/observations.html>`_
  and used to evaluate the model results

Once you have downloaded the analysis data, you will point to its location
(your equivalent of ``path/to/mpas_analysis/diagnostics`` above) in the config
option ``baseDirectory`` in the ``[diagnostics]`` section.

Download Natural Earth data for cartopy
---------------------------------------

The cartopy package (used for creating inset maps) requires shapes of the land,
ocean and coastline from `Natural Earth <https://www.naturalearthdata.com>`_.
Typically, these data are downloaded automatically by cartopy.  However, for
systems with compute nodes that cannot reach the internet, you will need to
download the data manually into your conda environment from a login node before
launching any MPAS-Analysis jobs:

.. code-block::

   download_natural_earth_110m

(or if using the git repo: ``./download_natural_earth_110m.py``\ ).

If the data have already been downloaded, you will see nothing.  Otherwise, you
should see a warning that the data are being downloaded.

**Note**\ : If you are having issues downloading the shape files (e.g., a time out error or forbidden error), follow these steps:


#. Run the following in python on your local machine (i.e., one that has no trouble downloading these files):
   .. code-block::

      import cartopy.io.shapereader as shpreader
      for name in ['ocean', 'coastline', 'land']:
       shpfilename = shpreader.natural_earth(resolution='110m',
                                             category='physical',
                                             name=name)
       shpreader.Reader(shpfilename)

#. On your local machine, run ``python -c "import cartopy; print(cartopy.config['data_dir'])"``. This will print out the directory in which the natural earth shapefiles are being placed locally.
#. Copy these files onto the remote machine you are working on. Include folders ``shapefiles/natural_earth/physical/*`` where ``*`` is the set of shapefiles that were downloaded.
#. On your remote machine, run ``python -c "import cartopy; print(cartopy.config['data_dir'])"``. Copy the ``shapefiles`` folder and all contents over to this location.
#. ``cartopy`` should now be able to find these files for ``MPAS-Analysis``.`

List Analysis
-------------

If you installed the ``mpas-analysis`` package, list the available analysis tasks
by running:

.. code-block::

   mpas_analysis --list

If using a git repository, run:

.. code-block::

   python -m mpas_analysis --list

This lists all tasks and their tags.  These can be used in the ``generate``
command-line option or config option.  See ``mpas_analysis/config.default``
for more details.

Running the analysis
--------------------


#. Create and empty config file (say ``config.myrun``\ ), copy ``config.example``\ ,
   or copy one of the example files in the ``configs`` directory (if using a
   git repo) or download one from the
   `example configs directory <https://github.com/MPAS-Dev/MPAS-Analysis/tree/develop/configs>`_.
#. Either modify config options in your new file or copy and modify config
   options from ``mpas_analysis/config.default`` (in a git repo) or directly
   from GitHub:
   `config.default <https://github.com/MPAS-Dev/MPAS-Analysis/tree/develop/mpas_anlysis/config.default>`_.
#. If you installed the ``mpas-analysis`` package, run:
   ``mpas_analysis config.myrun``.  If using a git checkout, run:
   ``python -m mpas_analysis config.myrun``.  This will read the configuraiton
   first from ``mpas_analysis/config.default`` and then replace that
   configuraiton with any changes from from ``config.myrun``
#. 
   If you want to run a subset of the analysis, you can either set the
   ``generate`` option under ``[output]`` in your config file or use the
   ``--generate`` flag on the command line.  See the comments in
   ``mpas_analysis/config.default`` for more details on this option.

   **Requirements for custom config files:**


* At minimum you should set ``baseDirectory`` under ``[output]`` to the folder
  where output is stored.  **NOTE** this value should be a unique
  directory for each run being analyzed.  If multiple runs are analyzed in
  the same directory, cached results from a previous analysis will not be
  updated correctly.
* Any options you copy into the config file **must** include the
  appropriate section header (e.g. '[run]' or '[output]')
* You do not need to copy all options from ``mpas_analysis/config.default``.
  This file will automatically be used for any options you do not include
  in your custom config file.
* You should **not** modify ``mpas_analysis/config.default`` directly.

List of MPAS output files that are needed by MPAS-Analysis:
-----------------------------------------------------------


* mpas-o files:

  * ``mpaso.hist.am.timeSeriesStatsMonthly.*.nc`` (Note: since OHC
    anomalies are computed wrt the first year of the simulation,
    if OHC diagnostics is activated, the analysis will need the
    first full year of ``mpaso.hist.am.timeSeriesStatsMonthly.*.nc``
    files, no matter what ``[timeSeries]/startYear`` and
    ``[timeSeries]/endYear``  are. This is especially important to know if
    short term archiving is used in the run to analyze: in that case, set
    ``[input]/runSubdirectory``\ , ``[input]/oceanHistorySubdirectory`` and
    ``[input]/seaIceHistorySubdirectory`` to the appropriate run and archive
    directories and choose ``[timeSeries]/startYear`` and
    ``[timeSeries]/endYear`` to include only data that have been short-term
    archived).
  * ``mpaso.hist.am.meridionalHeatTransport.0001-03-01.nc`` (or any
    ``hist.am.meridionalHeatTransport`` file)
  * ``mpaso.rst.0002-01-01_00000.nc`` (or any other mpas-o restart file)
  * ``streams.ocean``
  * ``mpaso_in``

* mpas-seaice files:

  * ``mpasseaice.hist.am.timeSeriesStatsMonthly.*.nc``
  * ``mpasseaice.rst.0002-01-01_00000.nc`` (or any other mpas-seaice restart
    file)
  * ``streams.seaice``
  * ``mpassi_in``

Note: for older runs, mpas-seaice files will be named:


* ``mpascice.hist.am.timeSeriesStatsMonthly.*.nc``
* ``mpascice.rst.0002-01-01_00000.nc``
* ``streams.cice``
* ``mpas-cice_in``
  Also, for older runs mpaso-in will be named:
* ``mpas-o_in``

Purge Old Analysis
------------------

To purge old analysis (delete the whole output directory) before running run
the analysis, add the ``--purge`` flag.  If you installed ``mpas-analysis`` as
a package, run:

.. code-block::

   mpas_analysis --purge <config.file>

If you are running in the repo, use:

.. code-block::

   python -m mpas_analysis --purge <config.file>

All of the subdirectories listed in ``output`` will be deleted along with the
climatology subdirectories in ``oceanObservations`` and ``seaIceObservations``.

It is a good policy to use the purge flag for most changes to the config file,
for example, updating the start and/or end years of climatologies (and
sometimes time series), changing the resolution of a comparison grid, renaming
the run, changing the seasons over which climatologies are computed for a given
task, updating the code to the latest version.

Cases where it is reasonable not to purge would be, for example, changing
options that only affect plotting (color map, ticks, ranges, font sizes, etc.),
rerunning with a different set of tasks specified by the ``generate`` option
(though this will often cause climatologies to be re-computed with new
variables and may not save time compared with purging), generating only the
final website with ``--html_only``\ , and re-running after the simulation has
progressed to extend time series (however, not recommended for changing the
bounds on climatologies, see above).

Running in parallel via a queueing system
-----------------------------------------

If you are running from a git repo:


#. If you are running from a git repo, copy the appropriate job script file
   from ``configs/<machine_name>`` to the root directory (or another directory
   if preferred). The default cript, ``configs/job_script.default.bash``\ , is
   appropriate for a laptop or desktop computer with multiple cores.
#. If using the ``mpas-analysis`` conda package, download the job script and/or
   sample config file from the
   `example configs directory <https://github.com/MPAS-Dev/MPAS-Analysis/tree/develop/configs>`_.
#. Modify the number of parallel tasks, the run name, the output directory
   and the path to the config file for the run.
#. Note: the number of parallel tasks can be anything between 1 and the
   number of analysis tasks to be performed.  If there are more tasks than
   parallel tasks, later tasks will simply wait until earlier tasks have
   finished.
#. Submit the job using the modified job script

If a job script for your machine is not available, try modifying the default
job script in ``configs/job_script.default.bash`` or one of the job scripts for
another machine to fit your needs.

Instructions for creating a new analysis task
---------------------------------------------

Analysis tasks can be found in a directory corresponding to each component,
e.g., ``mpas_analysis/ocean`` for MPAS-Ocean. Shared functionality is contained
within the ``mpas_analysis/shared`` directory.


#. create a new task by ``copying mpas_analysis/analysis_task_template.py`` to
   the appropriate folder (\ ``ocean``\ , ``sea_ice``\ , etc.) and modifying it as
   described in the template.  Take a look at
   ``mpas_analysis/shared/analysis_task.py`` for additional guidance.
#. note, no changes need to be made to ``mpas_analysis/shared/analysis_task.py``
#. modify ``mpas_analysis/config.default`` (and possibly any machine-specific
   config files in ``configs/<machine>``\ )
#. import new analysis task in ``mpas_analysis/<component>/__init__.py``
#. add new analysis task to ``mpas_analysis/__main__.py`` under
   ``build_analysis_list``\ , see below.

A new analysis task can be added with:

.. code-block::

      analyses.append(<component>.MyTask(config, myArg='argValue'))

This will add a new object of the ``MyTask`` class to a list of analysis tasks
created in ``build_analysis_list``.  Later on in ``run_analysis``\ , it will first
go through the list to make sure each task needs to be generated
(by calling ``check_generate``\ , which is defined in ``AnalysisTask``\ ), then,
will call ``setup_and_check`` on each task (to make sure the appropriate AM is
on and files are present), and will finally call ``run`` on each task that is
to be generated and is set up properly.

Generating Documentation
------------------------

To generate the ``sphinx`` documentation, run:

.. code-block::

   conda config --add channels conda-forge
   conda remove -y --all -n mpas-analysis-docs
   conda env create -f docs/environment.yml
   conda install -y -n mpas-analysis-docs mock pillow sphinx sphinx_rtd_theme
   conda activate mpas-analysis-docs
   pip install .
   rm -rf build dist mpas_analysis.egg-info
   cd docs
   make clean
   make html