Projects#

Introduction#

Projects are the recommended way to organize your work in acm. A project groups together all the components of a research analysis, such as the statistic computation, the emulator training, and the cosmological inference. This makes it easier to keep track of the different components and to share your work with others.

The projects are stored under the projects directory in the acm repository. Each project is a subdirectory of projects, and contains all the scripts, tests, notebooks, etc. specific to a dataset and/or an analysis.

Project dependent

Each project defines its own way to handle the statistics and files, as well as the organization of the code.

Bridges with the acm package#

acm.observables classes#

The acm.observables classes are used to define one class per statistic, which handles:

  • the formatting of the statistic to a single file (see Storing the data)

  • retrieving the statistic from the data files, the model and its errors

  • applying filters to the data and model outputs

The base class is an abstract class that the project classes should inherit from.

The methods to implement are:

  • stat_name : the name of the statistic

  • paths : a dictionary with the paths to the data, model and errors (see Storing the data)

  • summary_coords_dict : a dictionary with the coordinates of the summary statistics (see Coordinates)

Note

Some methods are left unimplemented in the base class and are optional: implement them only if you need them. Calling one of them raises a NotImplementedError if the project class does not implement it. Those methods are:

  • create_lhc : to create the LHC file

  • create_covariance : to create the covariance array that will be stored in the LHC file

  • create_emulator_error : to create the emulator error file

  • create_emulator_covariance : to create the emulator covariance file

Those methods depend on the file format of the statistics, and cannot be defined in the base class. However, if the creation of the files is not needed, the methods can be left unimplemented.
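The split between mandatory and optional members described above can be sketched with Python's abc module. This is a minimal illustration, not the actual acm implementation: BaseObservableSketch, TwoPointSketch, the "tpcf" name and all values are hypothetical, and the real base class may expose these members differently (the API section documents stat_name and paths as constructor arguments).

```python
from abc import ABC, abstractmethod


class BaseObservableSketch(ABC):
    """Hypothetical stand-in for the acm.observables base class."""

    @property
    @abstractmethod
    def stat_name(self) -> str:
        """Name of the statistic."""

    @property
    @abstractmethod
    def paths(self) -> dict:
        """Paths to the data, model and errors."""

    @property
    @abstractmethod
    def summary_coords_dict(self) -> dict:
        """Coordinates of the summary statistics."""

    # Optional hook: only needed if the project creates the LHC file.
    def create_lhc(self, *args, **kwargs):
        raise NotImplementedError("create_lhc is not implemented for this project")


class TwoPointSketch(BaseObservableSketch):
    """Hypothetical project class implementing only the mandatory members."""

    @property
    def stat_name(self) -> str:
        return "tpcf"

    @property
    def paths(self) -> dict:
        return {"data_dir": "/path/to/data"}  # placeholder path

    @property
    def summary_coords_dict(self) -> dict:
        return {"multipoles": [0, 2], "sep": [0.1, 0.3, 0.7]}


obs = TwoPointSketch()
try:
    obs.create_lhc()
    lhc_created = True
except NotImplementedError:
    # The optional method was never overridden, so the call fails loudly.
    lhc_created = False
```

With this layout, forgetting a mandatory member fails at instantiation, while a missing optional method only fails when it is actually called.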

Then, the class can be used to call the statistics, the parameters, the model, get a prediction, etc.

To combine statistics, a CombinedObservable class is available. It takes a list of Observable instances and combines them in a single class. The methods are the same, with the filters applied to each statistic.

See also

For examples of how to use these classes, see examples

Global parameters#

Some parameters are shared between the different components of the project, such as the paths and the coordinates. To avoid copying this information through all the classes, we recommend creating a paths dictionary and a coordinates dictionary in a default.py file in the project, which all the components of the project can access.
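As an illustration, a default.py file could look like the sketch below. The dictionary keys for the paths follow the keys listed in the API section; everything else (BASE_DIR, the directory names, the "tpcf" entry and its coordinate values) is a placeholder chosen for this example.

```python
# default.py (hypothetical content; every path and value is a placeholder)
from pathlib import Path

BASE_DIR = Path("/path/to/project/data")

# Paths shared by all the statistics of the project.
paths = {
    "data_dir": BASE_DIR / "data",
    "covariance_dir": BASE_DIR / "covariance",
    "error_dir": BASE_DIR / "emulator_error",
    "model_dir": BASE_DIR / "models",
}

# Coordinates of the summary statistics, one entry per statistic.
summary_coords = {
    "tpcf": {"multipoles": [0, 2], "sep": [0.1, 0.3, 0.7]},
}
```

Each statistic class can then import these dictionaries instead of redefining them.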

Integration in the acm package#

You can integrate the project into the acm package by adding a subfolder in acm/projects with the name of the project, and importing the project classes in its __init__.py file. In this subfolder, you can add the default.py file and all the statistics classes created through the acm.observables classes (we recommend one file per statistic).

This way, the statistic handling can be easily accessed through the acm package, and the project can be shared with others.

How to add a project or a statistic to the acm package#

Adding a project#

To add a project to the acm package, you need to create a subfolder in the acm/projects directory, with the name of the project. In this folder, you need to add :

  • An __init__.py file, that imports the project classes

  • A default.py file, that contains the default paths and coordinates dictionaries for the project

  • A base.py file, that contains the base Observable class for the project

  • The acm.observables classes files that handle the statistics for the project

You can also create a subfolder in the projects directory that contains all the scripts, notebooks, tests, etc. specific to the project.
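The file layout listed above can be sketched as a small scaffolding script. The helper scaffold_project, the project name "my_project" and the statistic name "tpcf" are hypothetical; the script only creates empty placeholder files in a temporary directory and is not part of acm.

```python
import tempfile
from pathlib import Path

# Files expected in every project folder, as listed above.
PROJECT_FILES = ("__init__.py", "default.py", "base.py")


def scaffold_project(acm_projects_dir: Path, name: str, statistics: list[str]) -> Path:
    """Create empty placeholder files for a new project (hypothetical helper)."""
    project_dir = acm_projects_dir / name
    project_dir.mkdir(parents=True, exist_ok=True)
    for filename in PROJECT_FILES:
        (project_dir / filename).touch()
    # One file per statistic.
    for stat in statistics:
        (project_dir / f"{stat}.py").touch()
    return project_dir


with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_project(Path(tmp) / "acm" / "projects", "my_project", ["tpcf"])
    created_files = sorted(p.name for p in created.iterdir())
```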

Adding a statistic#

To add the computation of the statistic to the acm package, you need to provide a class in acm.estimators that computes the statistic.

Note

It is also recommended to provide documentation for the statistic, explaining how it is computed and what it represents, to be added to the doc under the Statistics section.

See also

To handle the statistic in a project, see acm.observables classes

API#

class acm.observables.base.Observable(stat_name: str, paths: dict = None, select_filters: dict = None, slice_filters: dict = None, select_indices: list = None, select_indices_on: list = ['y', 'covariance_y', 'emulator_error', 'emulator_covariance_y'], flat_output_dims: int = None, squeeze_output: bool = False, numpy_output: bool = False)[source]#

Bases: object

Class to load a compressed Observable file or model and apply filters to their outputs.

Parameters:
  • stat_name (str) – Name of the statistic to load. Also the name of the file containing the data.

  • paths (dict, optional) – Paths to the compressed Observable files or models. If None, the internal dataset will be None. Defaults to None.

  • select_filters (dict, optional) – Filters to select values in coordinates. Defaults to None.

  • slice_filters (dict, optional) – Filters to slice values in coordinates. Defaults to None.

  • select_indices (list, optional) – Indices to select in the flattened data vector. Cannot be used with select_filters or slice_filters. Defaults to None.

  • select_indices_on (list, optional) – List of data variables to apply the indices selection on. Defaults to [‘y’, ‘covariance_y’, ‘emulator_error’, ‘emulator_covariance_y’].

  • flat_output_dims (int, optional) – If 2, the output will be flattened on two dimensions (sample and features). If 1, the output will be flattened on a single dimension (dims) - Not recommended. If None, the output will not be flattened. Defaults to None.

  • squeeze_output (bool, optional) – If True, the output will be squeezed to remove single-dimensional entries. Defaults to False.

  • numpy_output (bool, optional) – If True, the output will be converted to a numpy array. Defaults to False.

Paths

The data is expected to be in paths[key]/stat_name.npy, in which an xarray Dataset is stored. The possible keys are:

  • ‘data_dir’: directory containing the data (x, y)

  • ‘covariance_dir’: directory containing the covariance of the data (covariance_y)

  • ‘error_dir’: directory containing the emulator error of the data (emulator_error, emulator_covariance_y)

  • ‘model_dir’: directory containing the trained model (model.pth)

  • ‘checkpoint_name’: name of the checkpoint file (default: ‘model.pth’)

Example

slice_filters = {'sep': (0, 0.5),}
select_filters = {'multipoles': [0, 2],}

will return the summary statistics for 0 < sep < 0.5 and multipoles 0 and 2
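To make the difference between the two filter types concrete, here is a minimal sketch on plain Python lists. The class itself works on xarray objects; filter_indices and the coordinate values are hypothetical, and the strict bounds simply follow the inequality written above (the actual boundary handling may differ).

```python
def filter_indices(coords: dict, select_filters: dict = None, slice_filters: dict = None) -> dict:
    """Return, for each coordinate, the indices of the values kept by the filters."""
    select_filters = select_filters or {}
    slice_filters = slice_filters or {}
    kept = {}
    for name, values in coords.items():
        indices = range(len(values))
        if name in select_filters:
            # Select filter: keep only the values explicitly listed.
            indices = [i for i in indices if values[i] in select_filters[name]]
        if name in slice_filters:
            low, high = slice_filters[name]
            # Slice filter: keep a range of values (strict bounds are an assumption).
            indices = [i for i in indices if low < values[i] < high]
        kept[name] = list(indices)
    return kept


coords = {"multipoles": [0, 2, 4], "sep": [0.1, 0.3, 0.7]}
kept = filter_indices(
    coords,
    select_filters={"multipoles": [0, 2]},
    slice_filters={"sep": (0, 0.5)},
)
```

Here the hexadecapole (multipole 4) and the separation bin at 0.7 are dropped, while the other bins are kept.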

static stack_on_attribute(attribute: str | dict, dataarray: DataArray, **kwargs) DataArray[source]#

Stacks a DataArray on the dimensions given.

Parameters:
  • attribute (str | Mapping) – The dimension(s) to stack on. If a string, will be read from the DataArray attributes. Will be used as the dim to stack on (see xarray.DataArray.stack)

  • dataarray (xarray.DataArray) – The DataArray to stack the dimensions on.

  • **kwargs – Additional keyword arguments to pass to the stack method.

Returns:

The stacked DataArray

Return type:

xarray.DataArray

apply_filters(dataarray: DataArray) DataArray[source]#

Apply the class filters on a given DataArray or Dataset.

Parameters:

dataarray (xarray.DataArray) – The DataArray to apply the filters on.

Returns:

The filtered DataArray.

Return type:

xarray.DataArray

flatten_output(dataarray: DataArray) DataArray[source]#

Flatten the output of a given DataArray by stacking all dimensions over attributes ‘sample’ and ‘features’, containing the list of dimensions to stack on.

If flat_output_dims is 2, stacks on both ‘sample’ and ‘features’ attributes. If flat_output_dims is 1, stacks all dimensions into a single dimension ‘dims’. Otherwise, returns the DataArray as is.

Parameters:

dataarray (xarray.DataArray) – The DataArray to flatten.

Returns:

The flattened DataArray.

Return type:

xarray.DataArray
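The effect of flat_output_dims on the output shape can be sketched without xarray. In this illustration, flattened_shape, the dimension names and their sizes are hypothetical; only the ‘sample’ and ‘features’ attribute names and the three flat_output_dims cases come from the description above.

```python
from math import prod


def flattened_shape(sizes: dict, attrs: dict, flat_output_dims=None) -> tuple:
    """Shape of the output after flattening, given named dimension sizes."""
    if flat_output_dims == 2:
        # One axis per attribute: product of the sizes of the dimensions it lists.
        return tuple(prod(sizes[d] for d in attrs[key]) for key in ("sample", "features"))
    if flat_output_dims == 1:
        # Everything stacked into a single axis.
        return (prod(sizes.values()),)
    # No flattening: the shape is unchanged.
    return tuple(sizes.values())


sizes = {"cosmo_idx": 5, "hod_idx": 10, "multipoles": 2, "sep": 3}
attrs = {"sample": ["cosmo_idx", "hod_idx"], "features": ["multipoles", "sep"]}

shape_2d = flattened_shape(sizes, attrs, flat_output_dims=2)
shape_1d = flattened_shape(sizes, attrs, flat_output_dims=1)
shape_raw = flattened_shape(sizes, attrs)
```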

apply_indices_selection(dataarray: DataArray) DataArray[source]#

Apply the indices selection on a given DataArray. Should be called after filters are applied and before flattening. Does nothing if select_indices is None.

Parameters:

dataarray (xarray.DataArray) – The DataArray to apply the indices selection on.

Returns:

The DataArray with the selected indices.

Return type:

xarray.DataArray

get_coordinate_list(name: str) list[source]#

Returns the list of values of a coordinate of the dataset

Parameters:

name (str) – The name of the coordinate to retrieve.

Returns:

The list of values of the specified coordinate.

Return type:

list

property x_names: list#

Returns the list of the parameters coordinate of the x dataset.

Returns:

The list of the parameters of the x dataset.

Return type:

list

property emulator_error#

Returns the emulator error of the statistic, with filters applied. Reads the emulator error from the error_dir if it is provided, otherwise uses the get_emulator_error method if implemented.

property emulator_covariance_y#

Returns the covariance of the emulator error of the statistic, with filters applied. Reads the emulator covariance from the error_dir if it is provided, otherwise uses the get_emulator_covariance_y method if implemented.

property checkpoint_fn: str#

Path to the checkpoint file of the model, constructed from the paths and the statistic name.

load_model(checkpoint_fn: str = None) sunbird.emulators.FCN[source]#

Loads and returns the trained theory model.

get_model_prediction(x, model=None, coords=None, attrs=None, nofilters: bool = False) DataArray[source]#

Get the prediction from the model.

Parameters:
  • x (array_like, dict) – Input features.

  • model (FCN) – Trained theory model. If None, the model attribute of the class is used. Defaults to None.

  • coords (dict, optional) – Coordinates for the output DataArray. If None, the coordinates of _dataset.y are used. Defaults to None.

  • attrs (dict, optional) – Attributes for the output DataArray. If None, the attributes of _dataset.y are used. Defaults to None.

  • nofilters (bool, optional) – If True, no filters are applied to the output and the full DataArray is returned. Defaults to False.

Returns:

Model prediction.

Return type:

array_like

get_covariance_matrix(volume_factor: float = 64, prefactor: float = 1) ndarray[source]#

Covariance matrix for the statistic. The prefactor is here for corrections if needed, and the volume factor is the volume correction of the boxes.

get_emulator_covariance_matrix(prefactor: float = 1) ndarray[source]#

Emulator covariance matrix for the statistic. The prefactor is here for corrections if needed.

get_save_handle(save_dir: str | Path = None) str | Path[source]#

Creates a handle that includes the statistics and filters used. This can be used to save anything related to this observable.

Parameters:

save_dir (str | Path, optional) – Directory where the results will be saved. If provided, the directory is created if it does not exist. If None, the handle is returned as a string. Defaults to None.

Returns:

The handle for saving the results, to be completed with the file extension. Returned as a Path instance if save_dir is provided as a Path.

Return type:

str|Path

class acm.observables.combined.CombinedModel(observables: list[Observable])[source]#

Bases: object

Class for the combination of theory models.

Parameters:

observables (list[Observable]) – List of observables to be combined, initialized with their respective filters.

get_prediction(x)[source]#

Get the prediction from the model.

Parameters:

x (array_like) – Input features.

Returns:

Model prediction, with respective filters applied to each observable.

Return type:

array_like
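A rough sketch of how such a combination could work is given below. It assumes that the per-observable predictions are concatenated in list order, which the description above does not state explicitly; ConstantModelSketch and CombinedModelSketch are hypothetical stand-ins, not the acm classes.

```python
class ConstantModelSketch:
    """Hypothetical observable returning a fixed, already-filtered prediction."""

    def __init__(self, values):
        self.values = list(values)

    def get_model_prediction(self, x):
        # A real observable would evaluate its emulator at x and apply its filters.
        return self.values


class CombinedModelSketch:
    """Hypothetical combination of several observables."""

    def __init__(self, observables):
        self.observables = observables

    def get_prediction(self, x):
        # Assumption: predictions are concatenated in the order of the list.
        prediction = []
        for observable in self.observables:
            prediction.extend(observable.get_model_prediction(x))
        return prediction


model = CombinedModelSketch([ConstantModelSketch([1.0, 2.0]), ConstantModelSketch([3.0])])
combined = model.get_prediction(x=[0.3, 0.8])
```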

class acm.observables.combined.CombinedObservable(observables: list[Observable])[source]#

Bases: object

Class for the combination of observables. For readability, it has list-valued properties that give easy access to the observables in self.observables.

Parameters:

observables (list[Observable]) – List of observables to be combined, initialized with their respective filters.

property stat_name: list#

Name of the statistic.

property x: ndarray#

Input features (samples).

Note: We assume all observables have the same input features, so we just return the first from the list.

property x_names: list#

Names of the input features.

Note: We assume all observables have the same input features, so we just return the first from the list.

property model#

Theory model of the combination of observables. model.get_prediction(x) returns the prediction of the combination of observables, with the respective filters applied to each observable.

get_model_prediction(x) ndarray[source]#

Get the prediction from the model.

Parameters:

x (array_like) – Input features.

Returns:

Model prediction.

Return type:

array_like

get_covariance_matrix(volume_factor: float = 64, prefactor: float = 1) ndarray[source]#

Covariance matrix for the statistic. The prefactor is here for corrections if needed, and the volume factor is the volume correction of the boxes.

get_emulator_covariance_matrix(prefactor: float = 1) ndarray[source]#

Emulator covariance matrix for the statistic. The prefactor is here for corrections if needed.

get_save_handle(save_dir: str | Path = None) str | Path[source]#

Creates a handle that combines the handles of the observables, separated by a ‘+’. They contain the statistic name and the filters used. This can be used to save anything related to this observable.

Parameters:

save_dir (str | Path, optional) – Directory where the results will be saved. If provided, the directory is created if it does not exist. If None, the handle is returned as a string. Defaults to None.

Returns:

The handle for saving the results, to be completed with the file extension. Returned as a Path instance if save_dir is provided as a Path.

Return type:

str|Path
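The behaviour described above can be sketched as follows. The function combined_handle and the individual handles "tpcf" and "pk" are placeholders: the exact format of each observable's handle is not specified here, and the handling of a save_dir given as a string is an assumption.

```python
import tempfile
from pathlib import Path


def combined_handle(handles, save_dir=None):
    """Join individual handles with '+' and prepend save_dir if given (hypothetical helper)."""
    handle = "+".join(handles)
    if save_dir is None:
        return handle
    # Create the output directory if it does not exist yet.
    Path(save_dir).mkdir(parents=True, exist_ok=True)
    if isinstance(save_dir, Path):
        return save_dir / handle
    # Assumption: a str save_dir yields a str handle.
    return str(Path(save_dir) / handle)


name_only = combined_handle(["tpcf", "pk"])

with tempfile.TemporaryDirectory() as tmp:
    as_path = combined_handle(["tpcf", "pk"], save_dir=Path(tmp) / "results")
    dir_created = as_path.parent.is_dir()
```

The returned handle has no file extension, so the caller appends one when saving, for example with as_path.with_suffix(".npy").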