# POET
[POET](https://doi.org/10.5281/zenodo.4757913) is a coupled reactive
transport simulator implementing a parallel architecture and a fast,
original MPI-based Distributed Hash Table.
![POET's Coupling Scheme](./docs/Scheme_POET_en.svg)
## Parsed code documentation
A parsed version of POET's documentation can be found at [Gitlab
pages](https://naaice.git-pages.gfz-potsdam.de/poet).
## External Libraries
The following external libraries are shipped with POET:
- **CLI11** - <https://github.com/CLIUtils/CLI11>
- **IPhreeqc** with patches from GFZ/UP -
<https://github.com/usgs-coupled/iphreeqc> -
<https://git.gfz-potsdam.de/naaice/iphreeqc>
- **tug** - <https://git.gfz-potsdam.de/naaice/tug>
## Installation
### Requirements
To compile POET, the following software must be installed:
- C/C++ compiler (tested with GCC)
- MPI-Implementation (tested with OpenMPI and MVAPICH)
- CMake 3.9+
- Eigen3 3.4+ (required by `tug`)
- *optional*: `doxygen` with `dot` bindings for documentation
- R language and environment including headers or `-dev` packages
(distro dependent)
The following R packages (and their dependencies) must also be
installed:
- [Rcpp](https://cran.r-project.org/web/packages/Rcpp/index.html)
- [RInside](https://cran.r-project.org/web/packages/RInside/index.html)
- [qs](https://cran.r-project.org/web/packages/qs/index.html)
This can be simply achieved by issuing the following commands:
```sh
# start R environment
$ R
# install R dependencies (case sensitive!)
> install.packages(c("Rcpp", "RInside","qs"))
> q(save="no")
```
### Clone the repository
POET can be cloned anonymously from this repository over HTTPS. Make
sure to also download the submodules:
```sh
git clone --recurse-submodules https://git.gfz-potsdam.de/naaice/poet.git
```
The `--recurse-submodules` option is a shorthand for running the
following commands after cloning:
```sh
cd poet
git submodule init && git submodule update
```
### Compiling source code
POET is built with CMake. You can generate Makefiles by running the
usual:
```sh
mkdir build && cd build
cmake ..
```
This creates the directory `build`, processes the CMake files, and
generates Makefiles from them. You can now run `make` to start the
build process.
If everything went well, you'll find the executables at
`build/src/poet`; however, it is recommended to install the POET
project structure to a desired `CMAKE_INSTALL_PREFIX` with `make install`.
During the generation of Makefiles, various options can be specified
via `cmake -D <option>=<value> [...]`. Currently, there are the
following available options:
- **POET_DHT_Debug**=_boolean_ - toggles the output of detailed
statistics about DHT usage. Defaults to _OFF_.
- **POET_ENABLE_TESTING**=_boolean_ - enables a small set of unit tests
(more to come). Defaults to _OFF_.
- **POET_PHT_ADDITIONAL_INFO**=_boolean_ - enables counting the
accesses to each PHT bucket. Use with caution, as this slows things
down significantly. Defaults to _OFF_.
- **POET_PREPROCESS_BENCHS**=*boolean* - enables the preprocessing of
predefined models/benchmarks. Defaults to *ON*.
- **USE_AI_SURROGATE**=*boolean* - includes the functions of the AI
surrogate model. When active, CMake relies on `find_package()` to find
an implementation of `Threads` and a Python environment with NumPy and
Keras installed. Defaults to _OFF_.
### Example: Build from scratch
Assuming that only the C/C++ compiler, MPI libraries, R runtime
environment and CMake have been installed, POET can be installed as
follows:
```sh
# start R environment
$ R
# install R dependencies
> install.packages(c("Rcpp", "RInside","qs"))
> q(save="no")
# cd into POET project root
$ cd <POET_dir>
# Build process
$ mkdir build && cd build
$ cmake -DCMAKE_INSTALL_PREFIX=/home/<user>/poet ..
$ make -j<max_numprocs>
$ make install
```
This will install a POET project structure into `/home/<user>/poet`,
referred to hereinafter as `<POET_INSTALL_DIR>`. With this version of
POET we **do not recommend** installing into system hierarchies such as
`/usr/local/`.
The corresponding directory tree would look like this:
```sh
poet
├── bin
│   ├── poet
│   └── poet_init
└── share
└── poet
├── barite
│   ├── barite_200.rds
│   ├── barite_200_rt.R
│   ├── barite_het.rds
│   └── barite_het_rt.R
├── dolo
│   ├── dolo_inner_large.rds
│   ├── dolo_inner_large_rt.R
│   ├── dolo_interp.rds
│   └── dolo_interp_rt.R
└── surfex
├── PoetEGU_surfex_500.rds
└── PoetEGU_surfex_500_rt.R
```
With the installation of POET, two executables are provided:
- `poet` - the main executable to run simulations
- `poet_init` - a preprocessor to generate input files for POET from
R scripts
Preprocessed benchmarks, together with the corresponding *runtime*
setups, can be found in the `share/poet` directory. More on these files
and how to create them follows below.
## Running
Run POET by `mpirun ./poet [OPTIONS] <RUNFILE> <SIMFILE>
<OUTPUT_DIRECTORY>` where:
- **OPTIONS** - POET options (explained below)
- **RUNFILE** - Runtime parameters described as R script
- **SIMFILE** - Simulation input prepared by `poet_init`
- **OUTPUT_DIRECTORY** - path, where all output of POET should be
stored
### POET command line arguments
The following parameters can be set:
| Option | Value | Description |
|-----------------------------|--------------|----------------------------------------------------------------------------------|
| **--work-package-size=** | _1..n_ | size of work packages (defaults to _5_) |
| **-P, --progress** | | show progress bar |
| **--ai-surrogate** | | activates the AI surrogate chemistry model (defaults to _OFF_) |
| **--dht**                   |              | enables DHT usage (defaults to _OFF_)                                              |
| **--qs**                    |              | store results using qs::qsave() (.qs extension) instead of default RDS (.rds)     |
| **--dht-strategy=**         | _0-1_        | changes the DHT strategy. **NOT IMPLEMENTED YET** (defaults to _0_)                |
| **--dht-size=**             | _1-n_        | size of the DHT per process in MByte (defaults to _1000_)                          |
| **--dht-snaps=** | _0-2_ | disable or enable storage of DHT snapshots |
| **--dht-file=** | `<SNAPSHOT>` | initializes DHT with the given snapshot file |
| **--interp-size** | _1-n_ | size of PHT (interpolation) per process in megabyte |
| **--interp-bucket-entries** | _1-n_ | number of entries to store at maximum in one PHT bucket |
| **--interp-min** | _1-n_ | number of entries in PHT bucket needed to start interpolation |
#### Additions to `dht-snaps`
The following values can be set:
- _0_ = snapshots are disabled
- _1_ = stores a snapshot only at the end of the simulation, named
`<OUTPUT_DIRECTORY>.dht`
- _2_ = stores a snapshot at the end and after each iteration; the
per-iteration snapshot files are stored in `<OUTPUT_DIRECTORY>/iter<n>.dht`
### Example: Running from scratch
We will continue the above example and start a simulation with
*barite_het*, whose simulation files can be found in
`<POET_INSTALL_DIR>/share/poet/barite/barite_het*`. Heterogeneous
diffusion is used as the transport process. It is a small 2D grid of
2x5 cells, simulating 50 time steps with a time step size of 100
seconds. To start the simulation with 4 processes, `cd` into your
previously installed `<POET_INSTALL_DIR>/bin` and run:
```sh
cp ../share/poet/barite/barite_het* .
mpirun -n 4 ./poet barite_het_rt.R barite_het.rds output
```
After the simulation has finished, all data generated by POET can be
found in the directory `output`.
You might want to use the DHT to cache previously simulated data and
reuse it in later time steps. Simply append `--dht` to POET's options
to activate the DHT. To additionally produce a DHT snapshot after each
iteration, append the `--dht-snaps=<value>` option. The resulting call
would look like this:
```sh
mpirun -n 4 ./poet --dht --dht-snaps=2 barite_het_rt.R barite_het.rds output
```
### Example: Preparing Environment and Running with AI surrogate
To run the AI surrogate, you need to have Keras installed in your
Python environment. The implementation in POET is agnostic to the exact
Keras version, but the provided model file must match your Keras version.
Using Keras 3 with `.keras` model files is recommended. The compilation
process of POET remains mostly the same as shown above, but the CMake
option `-DUSE_AI_SURROGATE=ON` must be set.
To use the AI surrogate, you must declare several values in the R input
script. This can be done either directly in the input script or in an
additional file. Such a file is provided by adding its path as the
element `ai_surrogate_input_script` to the `chemistry_setup` list in the
R input script.
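If you keep the AI declarations in a separate file, the reference could
look like the following sketch. The file name `ai_surrogate_setup.R` is
just a placeholder; only the `chemistry_setup` list and its element
`ai_surrogate_input_script` are prescribed by POET:
```r
# Placeholder file name; only the element name `ai_surrogate_input_script`
# and the `chemistry_setup` list are prescribed by POET.
chemistry_setup$ai_surrogate_input_script <- "ai_surrogate_setup.R"
```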
The following variables and functions must be declared:
- `model_file_path` [*string*]: Path to the Keras model file with which
the AI surrogate model is initialized.
- `validate_predictions(predictors, prediction)` [*function*]: Returns a boolean
vector of length `nrow(prediction)`. The output of this function defines
which predictions are considered valid and which are rejected. The regular
simulation is only run for the rejected values, and its results
are added to the training data buffer of the AI surrogate model.
This can, e.g., be implemented as a mass balance threshold between the
predictors and the prediction (a minimal sketch is shown after this list).
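A minimal sketch of the required declarations in such a setup file,
assuming all columns of `predictors` and `prediction` are numeric
concentrations; the model path and the 1 % threshold are placeholders:
```r
# Path to the Keras model that initializes the surrogate (placeholder path).
model_file_path <- "models/barite_surrogate.keras"

# Accept a prediction only if its row-wise total mass deviates by less than
# 1 % from the corresponding predictor row (threshold is illustrative).
validate_predictions <- function(predictors, prediction) {
  mass_in  <- rowSums(predictors)
  mass_out <- rowSums(prediction)
  abs(mass_out - mass_in) / mass_in < 0.01
}
```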
The following variables and functions can be declared:
- `batch_size` [*int*]: Batch size for the inference and training functions,
defaults to 2560.
- `training_epochs` [*int*]: Number of training epochs with each training data
set, defaults to 20.
- `training_data_size` [*int*]: Size of the training data buffer. After the
buffer has been filled, the model starts training and removes this amount of
data from the front of the buffer. Defaults to the size of the Field.
- `use_Keras_predictions` [*bool*]: Decides if the Keras prediction function
should be used instead of the custom C++ implementation (Keras might be faster
for larger models, especially on GPU). Defaults to false.
- `disable_training` [*bool*]: Deactivates the training functions.
- `save_model_path` [*string*]: After each training step the current model
is saved to this path as a .keras file.
- `preprocess(df)` [*function*]:
Returns the scaled/transformed data frame. The default implementation uses no
scaling or transformations.
- `postprocess(df)` [*function*]:
Returns the rescaled/backtransformed data frame. `postprocess()` is expected
to invert `preprocess()`, so that applying both returns the original data
frame (an example is sketched below). The default implementation uses no
scaling or transformations.
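The optional settings follow the same pattern. The sketch below assumes
that a simple log transform suits your data, which is purely
illustrative; the important point is that `postprocess()` undoes
`preprocess()`:
```r
# Optional tuning values (all numbers below are illustrative).
batch_size      <- 1280   # batch size for inference and training
training_epochs <- 10     # epochs per training data set

# Example scaling pair: postprocess(preprocess(df)) returns the original df.
preprocess  <- function(df) log10(df + 1e-12)
postprocess <- function(df) 10^df - 1e-12
```
With such a setup file in place, an example run of the *barite_50ai*
benchmark with the AI surrogate could look like this: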
```sh
cd <installation_dir>/bin
# copy the benchmark files to the installation directory
cp <project_root_dir>/bench/barite/{barite_50ai*,db_barite.dat,barite.pqi} .
# preprocess the benchmark
./poet_init barite_50ai.R
# run POET with AI surrogate and GPU utilization
srun --gres=gpu -N 1 -n 12 ./poet --ai-surrogate barite_50ai_rt.R barite_50ai.rds output
```
Keep in mind that the AI surrogate is currently not stable and might not
produce any valid predictions.
## Defining a model
In order to provide a model to POET, you need to set up an R script
which can then be used by `poet_init` to generate the simulation
input. The required parameters are described in the
[Wiki](https://git.gfz-potsdam.de/naaice/poet/-/wikis/Initialization).
We try to keep that document up to date. However, if you encounter
missing information or need help, please get in touch with us via the
issue tracker or e-mail.
`poet_init` can be used as follows:
```sh
./poet_init [-o, --output output_file] [-s, --setwd] <script.R>
```
where:
- **output** - name of the output file (defaults to the input file
name with the extension `.rds`)
- **setwd** - set the working directory to the directory of the input
file (e.g. to allow relative paths in the input script). However,
the output file will be stored in the directory from which
`poet_init` was called.
## About the usage of MPI_Wtime()
The implemented time measurement functions use `MPI_Wtime()`. Some
important information from the Open MPI man page:
> For example, on platforms that support it, the clock_gettime()
> function will be used to obtain a monotonic clock value with whatever
> precision is supported on that platform (e.g., nanoseconds).