Update Readme

This commit is contained in:
Max Lübke 2024-05-06 09:09:24 +00:00
parent 20a0c453b0
commit 2e265443b9
5 changed files with 101 additions and 159 deletions

View File

@ -15,15 +15,10 @@ list(APPEND CMAKE_MODULE_PATH "${POET_SOURCE_DIR}/CMake")
get_poet_version()
# set(GCC_CXX_FLAGS "-D STRICT_R_HEADERS") add_definitions(${GCC_CXX_FLAGS})
find_package(MPI REQUIRED)
find_package(RRuntime REQUIRED)
# add_compile_options(-fsanitize=address -fno-omit-frame-pointer)
# add_link_options(-fsanitize=address)
add_subdirectory(src)
add_subdirectory(bench)

108
README.md
View File

@ -20,10 +20,10 @@ pages](https://naaice.git-pages.gfz-potsdam.de/poet).
The following external header library is shipped with POET:
- **argh** - https://github.com/adishavit/argh (BSD license)
- **PhreeqcRM** with patches from GFZ -
https://www.usgs.gov/software/phreeqc-version-3 -
https://git.gfz-potsdam.de/mluebke/phreeqcrm-gfz
- **tug** - https://git.gfz-potsdam.de/sec34/tug
- **IPhreeqc** with patches from GFZ -
https://github.com/usgs-coupled/iphreeqc -
https://git.gfz-potsdam.de/naaice/iphreeqc
- **tug** - https://git.gfz-potsdam.de/naaice/tug
## Installation
@ -35,6 +35,7 @@ To compile POET you need several software to be installed:
- MPI-Implementation (tested with OpenMPI and MVAPICH)
- R language and environment
- CMake 3.9+
- Eigen3 3.4+ (required by `tug`)
- *optional*: `doxygen` with `dot` bindings for documentiation
The following R libraries must then be installed, which will get the
@ -107,58 +108,50 @@ The correspondending directory tree would look like this:
```sh
poet
├── bin
│ └── poet
├── R_lib
│ └── kin_r_library.R
│   ├── poet
│   └── poet_init
└── share
└── poet
└── bench
├── barite
│ ├── barite_interp_eval.R
│ ├── barite.pqi
│ ├── barite.R
│ └── db_barite.dat
├── dolo
│ ├── dolo_diffu_inner_large.R
│ ├── dolo_diffu_inner.R
│ ├── dolo_inner.pqi
│ ├── dolo_interp_long.R
│ └── phreeqc_kin.dat
└── surfex
├── ExBase.pqi
├── ex.R
├── SMILE_2021_11_01_TH.dat
├── SurfExBase.pqi
└── surfex.R
├── barite
│   ├── barite_200.rds
│   ├── barite_200_rt.R
│   ├── barite_het.rds
│   └── barite_het_rt.R
├── dolo
│   ├── dolo_inner_large.rds
│   ├── dolo_inner_large_rt.R
│   ├── dolo_interp.rds
│   └── dolo_interp_rt.R
└── surfex
├── PoetEGU_surfex_500.rds
└── PoetEGU_surfex_500_rt.R
```
The R libraries will be loaded at runtime and the paths are hardcoded
absolute paths inside `poet.cpp`. So, if you consider to move
`bin/poet` either change paths of the R source files and recompile
POET or also move `R_lib/*` relative to the binary.
With the installation of POET, two executables are provided:
- `poet` - the main executable to run simulations
- `poet_init` - a preprocessor to generate input files for POET from R scripts
The benchmarks consist of input scripts, which are provided as .R files.
Additionally, Phreeqc scripts and their corresponding databases are required,
stored as .pqi and .dat files, respectively.
Preprocessed benchmarks can be found in the `share/poet` directory with an
according *runtime* setup. More on those files and how to create them later.
## Running
Run POET by `mpirun ./poet <OPTIONS> <SIMFILE> <OUTPUT_DIRECTORY>`
Run POET by `mpirun ./poet [OPTIONS] <RUNFILE> <SIMFILE> <OUTPUT_DIRECTORY>`
where:
- **OPTIONS** - runtime parameters (explained below)
- **SIMFILE** - simulation described as R script (e.g.
`<POET_INSTALL_DIR>/share/poet/bench/dolo/dolo_interp_long.R`)
- **OPTIONS** - POET options (explained below)
- **RUNFILE** - Runtime parameters described as R script
- **SIMFILE** - Simulation input prepared by `poet_init`
- **OUTPUT_DIRECTORY** - path, where all output of POET should be stored
### Runtime options
### POET options
The following parameters can be set:
| Option | Value | Description |
|-----------------------------|--------------|--------------------------------------------------------------------------------------------------------------------------|
| **--work-package-size=** | _1..n_ | size of work packages (defaults to _5_) |
| **--ignore-result** | | disables store of simulation resuls |
| **-P, --progress** | | show progress bar |
| **--dht** | | enabling DHT usage (defaults to _OFF_) |
| **--dht-strategy=** | _0-1_ | change DHT strategy. **NOT IMPLEMENTED YET** (Defaults to _0_) |
| **--dht-size=** | _1-n_ | size of DHT per process involved in megabyte (defaults to _1000 MByte_) |
@ -180,14 +173,16 @@ Following values can be set:
### Example: Running from scratch
We will continue the above example and start a simulation with
`dolo_diffu_inner.R`. As transport a simple fixed-coefficient diffusion is used.
It's a 2D, 100x100 grid, simulating 10 time steps. To start the simulation with
4 processes `cd` into your previously installed POET-dir
`<POET_INSTALL_DIR>/bin` and run:
We will continue the above example and start a simulation with *barite_het*,
which simulation files can be found in
`<INSTALL_DIR>/share/poet/barite/barite_het*`. As transport a heterogeneous
diffusion is used. It's a small 2D grid, 2x5 grid, simulating 50 time steps with
a time step size of 100 seconds. To start the simulation with 4 processes `cd`
into your previously installed POET-dir `<POET_INSTALL_DIR>/bin` and run:
```sh
mpirun -n 4 ./poet ../share/poet/bench/dolo/dolo_diffu_inner.R/ output
cp ../share/poet/barite/barite_het* .
mpirun -n 4 ./poet barite_het_rt.R barite_het.rds output
```
After a finished simulation all data generated by POET will be found
@ -200,9 +195,32 @@ produced. This is done by appending the `--dht-snaps=<value>` option. The
resulting call would look like this:
```sh
mpirun -n 4 ./poet --dht --dht-snaps=2 ../share/poet/bench/dolo/dolo_diffu_inner.R/ output
mpirun -n 4 ./poet --dht --dht-snaps=2 barite_het_rt.R barite_het.rds output
```
## Defining a model
In order to provide a model to POET, you need to setup a R script which can then
be used by `poet_init` to generate the simulation input. Which parameters are
required can be found in the
[Wiki](https://git.gfz-potsdam.de/naaice/poet/-/wikis/Initialization). We try to
keep the document up-to-date. However, if you encounter missing information or
need help, please get in touch with us via the issue tracker or E-Mail.
`poet_init` can be used as follows:
```sh
./poet_init [-o, --output output_file] [-s, --setwd] <script.R>
```
where:
- **output** - name of the output file (defaults to the input file name
with the extension `.rds`)
- **setwd** - set the working directory to the directory of the input file (e.g.
to allow relative paths in the input script). However, the output file
will be stored in the directory from which `poet_init` was called.
## About the usage of MPI_Wtime()
Implemented time measurement functions uses `MPI_Wtime()`. Some

View File

@ -14,7 +14,6 @@ if(DOXYGEN_FOUND)
doxygen_add_docs(doxygen
${PROJECT_SOURCE_DIR}/src
${PROJECT_SOURCE_DIR}/README.md
${PROJECT_SOURCE_DIR}/docs/Input_Scripts.md
${PROJECT_SOURCE_DIR}/docs/Output.md
COMMENT "Generate html pages")
endif()

View File

@ -1,86 +0,0 @@
# Input Scripts
In the following the expected schemes of the input scripts is described.
Therefore, each section of the input script gets its own chapter. All sections
should return a `list` as results, which are concatenated to one setup list at
the end of the file. All values must have the same name in order to get parsed
by POET.
## Grid initialization
| name | type | description |
|----------------|----------------|-----------------------------------------------------------------------|
| `n_cells` | Numeric Vector | Number of cells in each direction |
| `s_cells` | Numeric Vector | Spatial resolution of grid in each direction |
| `type` | String | Type of initialization, can be set to *scratch*, *phreeqc* or *rds* |
## Diffusion parameters
| name | type | description |
|----------------|----------------------|-------------------------------------------|
| `init` | Named Numeric Vector | Initial state for each diffused species |
| `vecinj` | Data Frame | Defining all boundary conditions row wise |
| `vecinj_inner` | List of Triples | Inner boundaries |
| `vecinj_index` | List of 4 elements | Ghost nodes boundary conditions |
| `alpha` | Named Numeric Vector | Constant alpha for each species |
### Remark on boundary conditions
Each boundary condition should be defined in `vecinj` as a data frame, where one
row holds one boundary condition.
To define inner (constant) boundary conditions, use a list of triples in
`vecinj_inner`, where each triples is defined by $(i,x,y)$. $i$ is defining the
boundary condition, referencing to the row in `vecinj`. $x$ and $y$ coordinates
then defining the position inside the grid.
Ghost nodes are set by `vecinj_index` which is a list containing boundaries for
each celestial direction (**important**: named by `N, E, S, W`). Each direction
is a numeric vector, also representing a row index of the `vecinj` data frame
for each ghost node, starting at the left-most and upper cell respectively. By
setting the boundary condition to $0$, the ghost node is set as closed boundary.
#### Example
Suppose you have a `vecinj` data frame defining 2 boundary conditions and a grid
consisting of $10 \times 10$ grid cells. Grid cell $(1,1)$ should be set to the
first boundary condition and $(5,6)$ to the second. Also, all boundary
conditions for the ghost nodes should be closed. Except the southern boundary,
which should be set to the first boundary condition injection. The following
setup describes how to setup your initial script, where `n` and `m` are the
grids cell count for each direction ($n = m = 10$):
```R
vecinj_inner <- list (
l1 = c(1, 1, 1),
l2 = c(2, 5, 6)
)
vecinj_index <- list(
"N" = rep(0, n),
"E" = rep(0, m),
"S" = rep(1, n),
"W" = rep(0, m)
)
```
## Chemistry parameters
| name | type | description |
|----------------|--------------|----------------------------------------------------------------------------------|
| `database` | String | Path to the Phreeqc database |
| `input_script` | String | Path the the Phreeqc input script |
| `dht_species` | Named Vector | Indicates significant digits to use for each species for DHT rounding. |
| `pht_species` | Named Vector | Indicates significant digits to use for each species for Interpolation rounding. |
## Final setup
| name | type | description |
|----------------|----------------|------------------------------------------------------------|
| `grid` | List | Grid parameter list |
| `diffusion` | List | Diffusion parameter list |
| `chemistry` | List | Chemistry parameter list |
| `iterations` | Numeric Value | Count of iterations |
| `timesteps` | Numeric Vector | $\Delta t$ to use for specific iteration |
| `store_result` | Boolean | Indicates if results should be stored |
| `out_save` | Numeric Vector | *optional:* At which iteration the states should be stored |

View File

@ -35,34 +35,50 @@ corresponding values can be found in `<OUTPUT_DIRECTORY>/timings.rds`
and possible to read out within a R runtime with
`readRDS("timings.rds")`. There you will find the following values:
| Value | Description |
|--------------------|----------------------------------------------------------------------------|
| simtime | time spent in whole simulation loop without any initialization and cleanup |
| simtime\_transport | measured time in *transport* subroutine |
| simtime\_chemistry | measured time in *chemistry* subroutine (actual parallelized part) |
| Value | Description |
| --------- | -------------------------------------------------------------------------- |
| simtime | time spent in whole simulation loop without any initialization and cleanup |
| chemistry | measured time in *chemistry* subroutine |
| diffusion | measured time in *diffusion* subroutine |
### chemistry subsetting
### Chemistry subsetting
If running parallel there are also measured timings which are subsets of
*simtime\_chemistry*.
| Value | Description |
| ------------- | --------------------------------------------------------- |
| simtime | overall runtime of chemistry |
| loop | time spent in send/recv loop of master |
| sequential | sequential part of the master (e.g. shuffling field) |
| idle\_master | idling time of the master waiting for workers |
| idle\_worker | idling time (waiting for work from master) of the workers |
| phreeqc\_time | accumulated times for Phreeqc calls of every worker |
| Value | Description |
|-----------------------|-----------------------------------------------------------|
| chemistry\_loop | time spent in send/recv loop of master |
| chemistry\_sequential | sequential part of master chemistry |
| idle\_master | idling time (waiting for any free worker) of the master |
| idle\_worker | idling time (waiting for work from master) of the workers |
| phreeqc\_time | accumulated times for Phreeqc calls of every worker |
### DHT usage {#DHT-usage}
#### DHT usage
If running in parallel and with activated DHT, two more timings and also
some profiling about the DHT usage are given:
| Value | Description |
|-----------------|---------------------------------------------------------|
| dht\_fill\_time | time to write data to DHT |
| dht\_get\_time | time to retreive data from DHT |
| dh\_hits | count of data points retrieved from DHT |
| dht\_miss | count of misses/count of data points written to DHT |
| --------------- | ------------------------------------------------------- |
| dht\_hits | count of data points retrieved from DHT |
| dht\_evictions | count of data points evicted by another write operation |
| dht\_get\_time | time to retreive data from DHT |
| dht\_fill\_time | time to write data to DHT |
#### Interpolation
If using interpolation, the following values are given:
| Value | Description |
| -------------- | --------------------------------------------------------------------- |
| interp\_w | time spent to write to PHT |
| interp\_r | time spent to read from DHT/PHT/Cache |
| interp\_g | time spent to gather results from DHT |
| interp\_fc | accumulated time spent in interpolation function call |
| interp\_calls | count of interpolations |
| interp\_cached | count of interpolation data sets, which where cached in the local map |
### Diffusion subsetting
| Value | Description |
| --------- | ------------------------------------------ |
| simtime | overall runtime of diffusion |