Tools

Some additional tools are provided with the library.

Benchmarking

Benchmarking tools for testing and comparing outputs between different files. Some of these functions are also used for testing.

pysd.tools.benchmarking.runner(model_file, canonical_file=None, transpose=False, data_files=None)

Translates and runs a model and returns its output and the canonical output.

Parameters:

model_file (str) – Name of the original model file. Must be ‘.mdl’ or ‘.xmile’.
canonical_file (str or None (optional)) – Canonical output file to read. If None, will search for ‘output.csv’ and ‘output.tab’ in the model directory. Default is None.
transpose (bool (optional)) – If True reads transposed canonical file, i.e. one variable per row. Default is False.
data_files (list (optional)) – List of the data files needed to run the model.

Returns:

output, canon – pandas.DataFrame of the model output and the canonical output.

Return type:

(pandas.DataFrame, pandas.DataFrame)
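As a rough illustration, runner can be combined with assert_frames_close (documented in this section) to check a model against its canonical output. This is only a sketch: the helper name check_model and the default tolerances are assumptions made for the example, not part of PySD.

```python
def check_model(model_file, rtol=1e-5, atol=1e-5):
    """Run a model and compare its output with the canonical output."""
    # Imported here because PySD is only needed when the helper is called.
    from pysd.tools.benchmarking import runner, assert_frames_close

    # With canonical_file=None, runner looks for 'output.csv' or
    # 'output.tab' in the model directory.
    output, canon = runner(model_file)

    # Raises AssertionError if any column is not close; verbose=True adds
    # the actual values, expected values and differences to the message.
    assert_frames_close(output, canon, verbose=True, rtol=rtol, atol=atol)
    return output
```

A call such as check_model("my_model.mdl") would then either return the model output or raise, where "my_model.mdl" stands for a hypothetical model file with a canonical output next to it.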

pysd.tools.benchmarking.assert_frames_close(actual, expected, assertion='raise', verbose=False, precision=2, **kwargs)

Compare DataFrame items by column and raise AssertionError if any column is not equal.

Ordering of columns is unimportant, items are compared only by label. NaN and infinite values are supported.

Parameters:

actual (pandas.DataFrame) – Actual value from the model output.
expected (pandas.DataFrame) – Expected model output.
assertion (str (optional)) – Action to take when the two frames are not close: “raise” raises an error, “warning” shows a warning message, and “return” returns information about the differences. Default is “raise”.
verbose (bool (optional)) – If True, the error or warning message also prints the actual values, the expected values and their difference for every column that is not close. Default is False.
precision (int (optional)) – Precision used to print the numerical values in the verbose message. Default is 2.
kwargs – Optional rtol and atol values for assert_allclose.

Returns:

(cols, first_false_time, first_false_cols) or None – If assertion is ‘return’, returns the set of all columns that differ, the time at which the first difference was found, and the set of variables that differed at that time. If assertion is not ‘return’, returns None.

Return type:

(set, float, set) or None

Examples

>>> assert_frames_close(
...     pd.DataFrame(100, index=range(5), columns=range(3)),
...     pd.DataFrame(100, index=range(5), columns=range(3)))

>>> assert_frames_close(
...     pd.DataFrame(100, index=range(5), columns=range(3)),
...     pd.DataFrame(110, index=range(5), columns=range(3)),
...     rtol=.2)

>>> assert_frames_close(
...     pd.DataFrame(100, index=range(5), columns=range(3)),
...     pd.DataFrame(150, index=range(5), columns=range(3)),
...     rtol=.2)  
Traceback (most recent call last):
...
AssertionError:
Following columns are not close:
    '0'

>>> assert_frames_close(
...     pd.DataFrame(100, index=range(5), columns=range(3)),
...     pd.DataFrame(150, index=range(5), columns=range(3)),
...     verbose=True, rtol=.2)  
Traceback (most recent call last):
...
AssertionError:
Following columns are not close:
    '0'
Column '0' is not close.
Expected values:
    [150, 150, 150, 150, 150]
Actual values:
    [100, 100, 100, 100, 100]
Difference:
    [50, 50, 50, 50, 50]

>>> assert_frames_close(
...     pd.DataFrame(100, index=range(5), columns=range(3)),
...     pd.DataFrame(150, index=range(5), columns=range(3)),
...     rtol=.2, assertion="warning")
...
UserWarning:
Following columns are not close:
    '0'

References

Derived from: http://nbviewer.jupyter.org/gist/jiffyclub/ac2e7506428d5e1d587b

pysd.tools.benchmarking.assert_allclose(x, y, rtol=1e-05, atol=1e-05)

Asserts if numeric values from two arrays are close.

Parameters:

x (ndarray) – Expected value.
y (ndarray) – Actual value.
rtol (float (optional)) – Relative tolerance on the error. Default is 1.e-5.
atol (float (optional)) – Absolute tolerance on the error. Default is 1.e-5.

Return type:

None
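To make the roles of rtol and atol concrete, here is a minimal pure-Python sketch of a combined tolerance check. It assumes the NumPy-style criterion |x - y| <= atol + rtol * |y|, which this page does not state explicitly; the function name is_close is hypothetical and it returns a bool rather than asserting.

```python
def is_close(x, y, rtol=1e-5, atol=1e-5):
    """Return True if every pair satisfies |x - y| <= atol + rtol * |y|."""
    # Assumed criterion (NumPy style); not taken from the PySD source.
    return all(abs(a - b) <= atol + rtol * abs(b) for a, b in zip(x, y))


# Same numbers as in the assert_frames_close examples, with rtol=0.2
print(is_close([100.0] * 5, [110.0] * 5, rtol=0.2))  # True: 10 <= 22
print(is_close([100.0] * 5, [150.0] * 5, rtol=0.2))  # False: 50 > 30
```

Under this assumption the relative term dominates for large values, while atol keeps the comparison meaningful for values close to zero.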

Exporting netCDF data_vars to csv or tab

Simulation results can be stored as netCDF (.nc) files (see Storing simulation results on a file).

The pysd.tools.ncfiles.NCFile class allows loading netCDF files generated with PySD as an xarray.Dataset. When the argument parallel=True is passed to the constructor, each xarray.DataArray inside the Dataset is loaded as a dask array, with chunks=-1.

Once the Dataset is loaded, a subset (or all) of the data_vars can be exported into:

A pandas.DataFrame, using the pysd.tools.ncfiles.NCFile.to_df() method
A *.csv or *.tab file, using the pysd.tools.ncfiles.NCFile.to_text_file() method

Alternatively, to get further control of the chunking, users can load the xarray.Dataset using xarray.open_dataset() and then use the pysd.tools.ncfiles.NCFile.ds_to_df() or pysd.tools.ncfiles.NCFile.df_to_text_file() static methods.
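The two routes described above could look roughly as follows. This is a sketch under assumptions: the helper names, file paths and variable names are hypothetical, and the chunks argument is simply forwarded to xarray.open_dataset (which requires dask when chunks are given).

```python
from pathlib import Path


def export_simple(nc_path, out_path, variables=None):
    """Let NCFile open the file and write the selected variables."""
    # Imported here because PySD is only needed when the helper is called.
    from pysd.tools.ncfiles import NCFile

    nc = NCFile(nc_path)
    # Returns the exported data as a pandas.DataFrame
    return nc.to_text_file(outfile=out_path, subset=variables)


def export_chunked(nc_path, out_path, chunks, variables=None):
    """Open the Dataset manually to control chunking, then export it."""
    import xarray as xr
    from pysd.tools.ncfiles import NCFile

    with xr.open_dataset(nc_path, chunks=chunks) as ds:
        # Static helper: no NCFile instance is needed here
        df = NCFile.ds_to_df(ds, subset=variables)

    NCFile.df_to_text_file(df, Path(out_path))
    return df
```

For instance, export_simple("results.nc", "results.csv", ["stock_a"]) would export a single (hypothetical) variable, while export_chunked("results.nc", "results.tab", {"time": 100}) would open the file with a user-chosen chunk size along time.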

Tools for importing and converting netCDF files generated from simulations run using PySD.

class pysd.tools.ncfiles.NCFile(filename: str | Path, parallel: bool | None = False)

Helper class to extract data from netCDF files.

Parameters:

filename (str or pathlib.Path) – Path to the netCDF file to process.
parallel (bool (optional)) – When True, the Dataset is opened using chunks=-1 (see xarray documentation for details) and DataArrays are processed in parallel using dask delayed. Dask is not included as a requirement for pysd, hence it must be installed separately. Setting parallel=True is highly recommended when the Dataset contains large multidimensional DataArrays.

to_text_file(outfile: str | Path | None = 'result.tab', subset: list | None = None, time_in_row: bool | None = False) → pandas.DataFrame

Convert netCDF file contents into comma separated or tab delimited file.

Parameters:

outfile (str or pathlib.Path (optional)) – Path to the output file.
subset (list (optional)) – List of variables to export from the netCDF.
time_in_row (bool (optional)) – Whether time increases along row. Default is False.

Returns:

df – DataFrame with all columns specified in subset.

Return type:

pandas.DataFrame

to_df(subset: list | None = None) → pandas.DataFrame

Wrapper around the ds_to_df static method. Converts the xarray.Dataset into a pandas DataFrame.

Parameters:

subset (list (optional)) – List of variables to export from the Dataset.

Returns:

df – DataFrame with all columns specified in subset.

Return type:

pandas.DataFrame

open_nc() → xarray.Dataset

Loads the netCDF file into an xarray Dataset. It is a thin wrapper around xr.open_dataset that simplifies the interface for the PySD use case (loading simulation results).

Return type:

xarray.Dataset

static ds_to_df(ds: xarray.Dataset, subset: list | None = None, parallel: bool | None = False, index_dim: str | None = 'time') → pandas.DataFrame

Convert xarray.Dataset into a pandas DataFrame.

Parameters:

ds (xarray.Dataset) – Dataset object.
subset (list (optional)) – List of variables to export from the Dataset.
parallel (bool (optional)) – When True, DataArrays are processed in parallel using dask delayed. Setting parallel=True is highly recommended when DataArrays are large and multidimensional.
index_dim (str (optional)) – Name of dimensions to use as index of the resulting DataFrame (usually “time”).

Returns:

df – DataFrame with all columns specified in subset.

Return type:

pandas.DataFrame

static df_to_text_file(df: pandas.DataFrame, outfile: pathlib.Path, time_in_row: bool | None = False) → None

Store pandas DataFrame into csv or tab file.

Parameters:

df (pandas.DataFrame) – DataFrame to save as csv or tab file.
outfile (str or pathlib.Path) – Path of the output file.
time_in_row (bool (optional)) – Whether time increases along a row (True) or along a column (False). Default is False.

Return type:

None

static da_to_dict(da: xarray.DataArray, index_dim: str) → dict

Splits a DataArray into a dictionary, with keys equal to the name of the variable plus all combinations of the cartesian product of coordinates within brackets, and values equal to the data corresponding to those coordinates along the index_dim dimension.

Parameters:

index_dim (str) – The coordinates of this dimension will not be fixed during indexing of the DataArray (i.e. the indexed data will be a scalar or an array along this dimension).
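The key naming described above can be illustrated with a small pure-Python sketch. It only builds the keys, not the values, and the comma separator inside the brackets as well as the handling of variables without extra dimensions are assumptions, not taken from the PySD source. The function name flatten_keys and the sample variable are hypothetical.

```python
from itertools import product


def flatten_keys(name, coords):
    """Build one key per combination of coordinate values.

    coords maps every dimension except the index dimension
    (usually "time") to its list of coordinate values.
    """
    if not coords:
        # Assumption: a variable with no extra dimensions keeps its name
        return [name]
    keys = []
    # Cartesian product of the coordinates, last dimension varying fastest
    for combo in product(*coords.values()):
        keys.append(f"{name}[{','.join(map(str, combo))}]")
    return keys


keys = flatten_keys(
    "population",
    {"region": ["north", "south"], "age": ["young", "old"]},
)
print(keys)
```

Each such key would then map to the data obtained by fixing those coordinates and keeping the index_dim dimension free.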

static da_to_dict_delayed(da: xarray.DataArray, index_dim: str) → dict

Same as da_to_dict, but using dask delayed and compute. This function runs much faster when da is a dask array (chunked).

To use it on its own, you must first make the following imports:

from dask import delayed, compute
from dask.diagnostics import ProgressBar

Parameters:

index_dim (str) – The coordinates of this dimension will not be fixed during indexing (the indexed data will be an array along this dimension).

static dict_to_df(d: dict) → pandas.DataFrame

Convert a dict to a pandas DataFrame.

Parameters:

d (dict) – Dictionary to convert to pandas DataFrame.