# POET

[POET](https://doi.org/10.5281/zenodo.4757913) is a coupled reactive
transport simulator implementing a parallel architecture and a fast,
original MPI-based Distributed Hash Table.

## Parsed code documentation

A parsed version of POET's documentation can be found at [GitLab
pages](https://naaice.git-pages.gfz-potsdam.de/poet).

## External Libraries

The following external libraries are shipped with POET:

- **CLI11** - <https://github.com/CLIUtils/CLI11>
- **IPhreeqc** with patches from GFZ/UP -
  <https://github.com/usgs-coupled/iphreeqc> -
  <https://git.gfz-potsdam.de/naaice/iphreeqc>
- **tug** - <https://git.gfz-potsdam.de/naaice/tug>

## Installation

### Requirements

To compile POET, the following software needs to be installed:

- C/C++ compiler (tested with GCC)
- MPI implementation (tested with OpenMPI and MVAPICH)
- CMake 3.9+
- Eigen3 3.4+ (required by `tug`)
- *optional*: `doxygen` with `dot` bindings for documentation
- R language and environment including headers or `-dev` packages
  (distro dependent)

The following R packages (and their dependencies) must also be
installed:

- [Rcpp](https://cran.r-project.org/web/packages/Rcpp/index.html)
- [RInside](https://cran.r-project.org/web/packages/RInside/index.html)
- [qs](https://cran.r-project.org/web/packages/qs/index.html)

This can be achieved by issuing the following commands:

```sh
# start R environment
$ R

# install R dependencies (case sensitive!)
> install.packages(c("Rcpp", "RInside", "qs"))
> q(save="no")
```

### Clone the repository

POET can be cloned anonymously from this repository over HTTPS. Make
sure to also download the submodules:

```sh
git clone --recurse-submodules https://git.gfz-potsdam.de/naaice/poet.git
```

If the repository was cloned without `--recurse-submodules`, the
submodules can be fetched afterwards with:

```sh
cd poet
git submodule init && git submodule update
```

### Compiling source code

POET is built with CMake. You can generate Makefiles by running the
usual commands:

```sh
mkdir build && cd build
cmake ..
```

This creates the directory `build`, processes the CMake files and
generates Makefiles from them. You can now run `make` to start the
build process.

If everything went well, you will find the executables at
`build/src/poet`, but it is recommended to install the POET project
structure to a desired `CMAKE_INSTALL_PREFIX` with `make install`.

During the generation of Makefiles, various options can be specified
via `cmake -D <option>=<value> [...]`. Currently, the following
options are available:

- **POET_DHT_Debug**=_boolean_ - toggles the output of detailed
  statistics about DHT usage. Defaults to _OFF_.
- **POET_ENABLE_TESTING**=_boolean_ - enables a small set of unit tests
  (more to come). Defaults to _OFF_.
- **POET_PHT_ADDITIONAL_INFO**=_boolean_ - enables counting the
  accesses to each PHT bucket. Use with caution, as this slows down
  execution significantly. Defaults to _OFF_.
- **POET_PREPROCESS_BENCHS**=_boolean_ - enables the preprocessing of
  predefined models/benchmarks. Defaults to _ON_.
- **USE_AI_SURROGATE**=_boolean_ - includes the functions of the AI
  surrogate model. When active, CMake relies on `find_package()` to
  find an implementation of `Threads` and a Python environment in which NumPy
and Keras need to be installed. Defaults to _OFF_.
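
For illustration, the options above are plain `-D` definitions passed at configure time. The following sketch collects a hypothetical selection (unit tests on, benchmark preprocessing off, AI surrogate off) in the shell's positional parameters and only echoes the resulting `cmake` call for review. The chosen values and the install prefix are assumptions, not recommendations.

```sh
# Illustrative selection of POET build options (values are examples only)
set -- \
  -DCMAKE_INSTALL_PREFIX="$HOME/poet" \
  -DPOET_ENABLE_TESTING=ON \
  -DPOET_PREPROCESS_BENCHS=OFF \
  -DUSE_AI_SURROGATE=OFF

# Dry run: print the configure command so it can be checked first
echo cmake "$@" ..

# To configure for real, run the same command from inside build/:
# cmake "$@" ..
```

Passing the options via the quoted `"$@"` keeps each `-D` flag a separate argument, even if a value such as the install prefix contains spaces.
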

### Example: Build from scratch

Assuming that only the C/C++ compiler, MPI libraries, Eigen3, the R
runtime environment and CMake have been installed, POET can be
installed as follows:

```sh
# start R environment
$ R

# install R dependencies
> install.packages(c("Rcpp", "RInside", "qs"))
> q(save="no")

# cd into POET project root
$ cd <POET_dir>

# Build process
$ mkdir build && cd build
$ cmake -DCMAKE_INSTALL_PREFIX=/home/<user>/poet ..
$ make -j<max_numprocs>
$ make install
```

This installs a POET project structure into `/home/<user>/poet`,
which is hereinafter called `<POET_INSTALL_DIR>`. With this version of
POET, we **do not recommend** installing into hierarchies like
`/usr/local/`.

The corresponding directory tree looks like this:

```sh
poet
├── bin
│   ├── poet
│   └── poet_init
└── share
    └── poet
        ├── barite
        │   ├── barite_200.rds
        │   ├── barite_200_rt.R
        │   ├── barite_het.rds
        │   └── barite_het_rt.R
        ├── dolo
        │   ├── dolo_inner_large.rds
        │   ├── dolo_inner_large_rt.R
        │   ├── dolo_interp.rds
        │   └── dolo_interp_rt.R
        └── surfex
            ├── PoetEGU_surfex_500.rds
            └── PoetEGU_surfex_500_rt.R
```

With the installation of POET, two executables are provided:

- `poet` - the main executable to run simulations
- `poet_init` - a preprocessor to generate input files for POET from
  R scripts

Preprocessed benchmarks can be found in the `share/poet` directory
with a corresponding *runtime* setup. More on these files and how to
create them follows below.

## Running

Run POET with `mpirun ./poet [OPTIONS] <RUNFILE> <SIMFILE> <OUTPUT_DIRECTORY>`,
where:

- **OPTIONS** - POET options (explained below)
- **RUNFILE** - Runtime parameters described as an R script
- **SIMFILE** - Simulation input prepared by `poet_init`
- **OUTPUT_DIRECTORY** - path where all output of POET should be
stored
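
The argument order above (options first, then the three files) is easy to get wrong. As a minimal sketch, the hypothetical shell function `run_poet` below (not part of POET) checks that RUNFILE and SIMFILE exist before any MPI processes are started, then forwards everything to `mpirun`; setting `DRY_RUN=1` only prints the command. It assumes `mpirun` accepts `-n <NP>`, as in the examples of this README.

```sh
# Hypothetical helper (not part of POET): check inputs, then launch POET.
# Usage: run_poet <NP> <RUNFILE> <SIMFILE> <OUTPUT_DIRECTORY> [POET OPTIONS...]
# Set DRY_RUN=1 to print the mpirun command instead of executing it.
run_poet() {
  if [ "$#" -lt 4 ]; then
    echo "usage: run_poet NP RUNFILE SIMFILE OUTPUT_DIRECTORY [OPTIONS...]" >&2
    return 2
  fi
  np=$1 runfile=$2 simfile=$3 outdir=$4
  shift 4   # the remaining arguments are POET options

  # Both input files must exist before MPI processes are started
  for f in "$runfile" "$simfile"; do
    if [ ! -f "$f" ]; then
      echo "run_poet: input file not found: $f" >&2
      return 1
    fi
  done

  # POET options come first, followed by the three positional arguments
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo mpirun -n "$np" ./poet "$@" "$runfile" "$simfile" "$outdir"
  else
    mpirun -n "$np" ./poet "$@" "$runfile" "$simfile" "$outdir"
  fi
}
```

For example, `run_poet 4 barite_het_rt.R barite_het.rds output --dht` would launch `mpirun -n 4 ./poet --dht barite_het_rt.R barite_het.rds output`.
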

### POET command line arguments

The following parameters can be set:

| Option | Value | Description |
|-----------------------------|--------------|----------------------------------------------------------------------------------|
| **--work-package-size=** | _1..n_ | size of work packages (defaults to _5_) |
| **-P, --progress** | | shows a progress bar |
| **--ai-surrogate** | | activates the AI surrogate chemistry model (defaults to _OFF_) |
| **--dht** | | enables DHT usage (defaults to _OFF_) |
| **--qs** | | stores results using `qs::qsave()` (`.qs` extension) instead of the default RDS (`.rds`) |
| **--dht-strategy=** | _0-1_ | changes the DHT strategy. **NOT IMPLEMENTED YET** (defaults to _0_) |
| **--dht-size=** | _1-n_ | size of the DHT per involved process in megabytes (defaults to _1000 MByte_) |
| **--dht-snaps=** | _0-2_ | disables or enables storage of DHT snapshots (see below) |
| **--dht-file=** | `<SNAPSHOT>` | initializes the DHT with the given snapshot file |
| **--interp-size** | _1-n_ | size of the PHT (interpolation) per process in megabytes |
| **--interp-bucket-entries** | _1-n_ | maximum number of entries stored in one PHT bucket |
| **--interp-min** | _1-n_ | number of entries in PHT bucket needed to start interpolation |
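
Because `--dht-size` is given per process, the total memory reserved for the DHT grows with the number of MPI processes. A rough upper bound can be computed with the hypothetical helper below, assuming every rank allocates the full `--dht-size`; if only some ranks hold a DHT share, the real figure is lower.

```sh
# Hypothetical helper: upper bound (in MB) of the memory reserved for the DHT,
# assuming each of the NP ranks allocates --dht-size megabytes.
# Usage: dht_total_mb <NP> [DHT_SIZE_MB]   (DHT_SIZE_MB defaults to 1000)
dht_total_mb() {
  np=$1
  dht_size_mb=${2:-1000}   # documented default of --dht-size
  echo $((np * dht_size_mb))
}

dht_total_mb 4         # 4 processes, default size: prints 4000
dht_total_mb 12 2000   # 12 processes with --dht-size=2000: prints 24000
```
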

#### Additions to `dht-snaps`

The following values can be set:

- _0_ = snapshots are disabled
- _1_ = only stores a snapshot at the end of the simulation with the
  name `<OUTPUT_DIRECTORY>.dht`
- _2_ = stores a snapshot at the end and after each iteration;
  iteration snapshot files are stored in `<DIRECTORY>/iter<n>.dht`

### Example: Running from scratch

We continue the example above and start a simulation with
*barite_het*, whose simulation files can be found in
`<POET_INSTALL_DIR>/share/poet/barite/barite_het*`. Heterogeneous
diffusion is used for transport. It is a small 2x5 2D grid,
simulating 50 time steps with a time step size of 100 seconds. To
start the simulation with 4 processes, `cd` into your previously
installed `<POET_INSTALL_DIR>/bin` and run:

```sh
cp ../share/poet/barite/barite_het* .
mpirun -n 4 ./poet barite_het_rt.R barite_het.rds output
```

After the simulation has finished, all data generated by POET can be
found in the directory `output`.

You might want to use the DHT to cache previously simulated data and
reuse it in later time steps. Append `--dht` to the POET options to
activate the DHT. To also produce a DHT snapshot after each iteration,
append `--dht-snaps=2`. The resulting call looks like this:

```sh
mpirun -n 4 ./poet --dht --dht-snaps=2 barite_het_rt.R barite_het.rds output
```
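
With `--dht-snaps=2`, several snapshot files accumulate. Assuming they are written as `iter<n>.dht` into the output directory (the `<DIRECTORY>` above), the hypothetical helper `latest_snapshot` below picks the one with the highest iteration number, for instance to pass it to `--dht-file` in a follow-up run. It compares `<n>` numerically, so `iter10.dht` wins over `iter9.dht`.

```sh
# Hypothetical helper: print the snapshot iter<n>.dht with the highest <n>
# in the given directory; returns 1 if no such file exists.
latest_snapshot() {
  dir=$1
  best="" best_n=-1
  for f in "$dir"/iter*.dht; do
    [ -e "$f" ] || continue                    # glob matched nothing
    n=${f##*/iter}                             # strip directory and "iter"
    n=${n%.dht}                                # strip the extension
    case $n in ''|*[!0-9]*) continue ;; esac   # skip non-numeric names
    if [ "$n" -gt "$best_n" ]; then
      best_n=$n best=$f
    fi
  done
  [ -n "$best" ] || return 1
  echo "$best"
}
```

A follow-up run could then initialize its DHT from that file, e.g. `mpirun -n 4 ./poet --dht --dht-file="$(latest_snapshot output)" barite_het_rt.R barite_het.rds output_restart`, where `output_restart` is an arbitrary new output directory.
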

### Example: Preparing Environment and Running with AI surrogate

To run the AI surrogate, you need to have Keras installed in your
Python environment. The implementation in POET is agnostic to the exact
Keras version, but the provided model file must match your Keras version.
Using Keras 3 with `.keras` model files is recommended. The compilation
process of POET remains mostly the same as shown above, but the CMake
option `-DUSE_AI_SURROGATE=ON` must be set.

To use the AI surrogate, you must declare several values in the R input
script. This can be done either directly in the input script or in an
additional file. This file can be provided by adding its path as the
element `ai_surrogate_input_script` to the `chemistry_setup` list in the
R input script.

The following variables and functions must be declared:

- `model_file_path` [*string*]: Path to the Keras model file with which
  the AI surrogate model is initialized.

- `validate_predictions(predictors, prediction)` [*function*]: Must return a
  boolean vector of length `nrow(prediction)`. The output of this function
  defines which predictions are considered valid and which are rejected.
  The predictors and predictions are passed in their original (not
  transformed) scale. Regular simulation will only be done for the rejected
  values. The input data of the rejected rows and the respective true results
  from simulation will be added to the training data buffer of the AI surrogate
  model. This can, e.g., be implemented as a mass balance threshold between the
  predictors and the prediction.

The following variables and functions can be declared:

- `batch_size` [*int*]: Batch size for the inference and training functions,
  defaults to 2560.

- `training_epochs` [*int*]: Number of training epochs with each training data
  set, defaults to 20.

- `training_data_size` [*int*]: Size of the training data buffer. After
  the buffer has been filled, the model starts training and removes this amount
  of data from the front of the buffer. Defaults to the size of the field.

- `use_Keras_predictions` [*bool*]: Decides whether the Keras prediction function
  should be used instead of the custom C++ implementation (Keras might be faster
  for larger models, especially on GPU). Defaults to false.

- `disable_training` [*bool*]: Deactivates the training functions. Defaults to
  false.

- `train_only_invalid` [*bool*]: Use only the data from PHREEQC for training
  instead of the whole field (which might contain the model's own predictions).
  Defaults to false.

- `save_model_path` [*string*]: After each training step, the current model
  is saved to this path as a `.keras` file.

- `preprocess(df)` [*function*]: Returns the scaled/transformed data frame. The
  default implementation uses no scaling or transformations.

- `postprocess(df)` [*function*]: Returns the rescaled/backtransformed data frame.
  `postprocess()` is expected to invert `preprocess()`, i.e.
  `postprocess(preprocess(df))` should return `df`. The default implementation
  uses no scaling or transformations.

- `assign_clusters(df)` [*function*]: Must return a vector of length
  `nrow(df)` that contains cluster labels as 0/1. According to these
  labels, two separate models will be used for inference and training. Cluster
  assignments can, e.g., be done for the reactive and non-reactive parts of the
  field.

- `model_reactive_file_path` [*string*]: Path to the Keras model file with
  which the AI surrogate model for the reactive cluster is initialized. If
  omitted, the models for both clusters will be initialized from
`model_file_path`
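
As a sketch of such an additional file, the shell snippet below writes a hypothetical `ai_surrogate_input.R` with the required declarations and some optional ones. The model file name, the check in `validate_predictions()` (every predicted value must be finite, a simple stand-in rather than the mass balance threshold mentioned above) and the `log1p()`/`expm1()` transformation pair are assumptions for illustration. The other optional values shown are the documented defaults.

```sh
# Write a hypothetical AI surrogate settings file (name and values are
# illustrative). The quoted 'EOF' stops the shell from expanding anything.
cat > ai_surrogate_input.R <<'EOF'
## Required
# Keras model used to initialize the surrogate (hypothetical file name)
model_file_path <- "barite_50ai_model.keras"

# Accept a row only if every predicted value is finite (no NA/NaN/Inf).
# A simple plausibility check, not a mass balance.
validate_predictions <- function(predictors, prediction) {
  p <- as.matrix(prediction)
  rowSums(!is.finite(p)) == 0
}

## Optional (documented defaults)
batch_size <- 2560
training_epochs <- 20
use_Keras_predictions <- FALSE
disable_training <- FALSE
train_only_invalid <- FALSE

## Optional transformation pair (assumption: all columns stay above -1)
# postprocess() exactly inverts preprocess()
preprocess <- function(df) log1p(df)
postprocess <- function(df) expm1(df)
EOF
```

The file is then referenced from the R input script by adding the element `ai_surrogate_input_script = "ai_surrogate_input.R"` to the `chemistry_setup` list.
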

The following commands prepare and run the `barite_50ai` benchmark with
the AI surrogate enabled:

```sh
cd <POET_INSTALL_DIR>/bin

# copy the benchmark files into the current directory
cp <project_root_dir>/bench/barite/{barite_50ai*,db_barite.dat,barite.pqi} .

# preprocess the benchmark
./poet_init barite_50ai.R

# run POET with AI surrogate and GPU utilization (Slurm)
srun --gres=gpu -N 1 -n 12 ./poet --ai-surrogate barite_50ai_rt.R barite_50ai.rds output
```

Keep in mind that the AI surrogate is currently not stable and might
not produce any valid predictions.

## Defining a model

In order to provide a model to POET, you need to set up an R script,
which is then used by `poet_init` to generate the simulation input.
The required parameters are described in the
[Wiki](https://git.gfz-potsdam.de/naaice/poet/-/wikis/Initialization).
We try to keep the Wiki up to date. However, if you encounter
missing information or need help, please get in touch with us via the
issue tracker or e-mail.

`poet_init` can be used as follows:

```sh
./poet_init [-o, --output output_file] [-s, --setwd] <script.R>
```

where:

- **output** - name of the output file (defaults to the input file
  name with the extension `.rds`)
- **setwd** - set the working directory to the directory of the input
  file (e.g. to allow relative paths in the input script). However,
  the output file will be stored in the directory from which
  `poet_init` was called.

## About the usage of MPI_Wtime()

The implemented time measurement functions use `MPI_Wtime()`. Some
important information from the Open MPI man page:

For example, on platforms that support it, the `clock_gettime()`
function will be used to obtain a monotonic clock value with whatever
precision is supported on that platform (e.g., nanoseconds).