Update Readme

Max Lübke 2024-05-06 09:09:24 +00:00
parent a12ac2c3d5
commit 0992143be5
5 changed files with 101 additions and 159 deletions

@@ -15,15 +15,10 @@ list(APPEND CMAKE_MODULE_PATH "${POET_SOURCE_DIR}/CMake")
 get_poet_version()
 # set(GCC_CXX_FLAGS "-D STRICT_R_HEADERS") add_definitions(${GCC_CXX_FLAGS})
 find_package(MPI REQUIRED)
 find_package(RRuntime REQUIRED)
-# add_compile_options(-fsanitize=address -fno-omit-frame-pointer)
-# add_link_options(-fsanitize=address)
 add_subdirectory(src)
 add_subdirectory(bench)

README.md

@@ -20,10 +20,10 @@ pages](https://naaice.git-pages.gfz-potsdam.de/poet).
 The following external header library is shipped with POET:
 - **argh** - https://github.com/adishavit/argh (BSD license)
-- **PhreeqcRM** with patches from GFZ -
-https://www.usgs.gov/software/phreeqc-version-3 -
-https://git.gfz-potsdam.de/mluebke/phreeqcrm-gfz
-- **tug** - https://git.gfz-potsdam.de/sec34/tug
+- **IPhreeqc** with patches from GFZ -
+https://github.com/usgs-coupled/iphreeqc -
+https://git.gfz-potsdam.de/naaice/iphreeqc
+- **tug** - https://git.gfz-potsdam.de/naaice/tug
 ## Installation
@@ -35,6 +35,7 @@ To compile POET you need several software to be installed:
 - MPI-Implementation (tested with OpenMPI and MVAPICH)
 - R language and environment
 - CMake 3.9+
+- Eigen3 3.4+ (required by `tug`)
 - *optional*: `doxygen` with `dot` bindings for documentiation
 The following R libraries must then be installed, which will get the
@@ -107,58 +108,50 @@ The correspondending directory tree would look like this:
 ```sh
 poet
 ├── bin
-│   └── poet
-├── R_lib
-│   └── kin_r_library.R
+│   ├── poet
+│   └── poet_init
 └── share
     └── poet
-        └── bench
-            ├── barite
-            │   ├── barite_interp_eval.R
-            │   ├── barite.pqi
-            │   ├── barite.R
-            │   └── db_barite.dat
-            ├── dolo
-            │   ├── dolo_diffu_inner_large.R
-            │   ├── dolo_diffu_inner.R
-            │   ├── dolo_inner.pqi
-            │   ├── dolo_interp_long.R
-            │   └── phreeqc_kin.dat
-            └── surfex
-                ├── ExBase.pqi
-                ├── ex.R
-                ├── SMILE_2021_11_01_TH.dat
-                ├── SurfExBase.pqi
-                └── surfex.R
+        ├── barite
+        │   ├── barite_200.rds
+        │   ├── barite_200_rt.R
+        │   ├── barite_het.rds
+        │   └── barite_het_rt.R
+        ├── dolo
+        │   ├── dolo_inner_large.rds
+        │   ├── dolo_inner_large_rt.R
+        │   ├── dolo_interp.rds
+        │   └── dolo_interp_rt.R
+        └── surfex
+            ├── PoetEGU_surfex_500.rds
+            └── PoetEGU_surfex_500_rt.R
 ```
-The R libraries will be loaded at runtime and the paths are hardcoded
-absolute paths inside `poet.cpp`. So, if you consider to move
-`bin/poet` either change paths of the R source files and recompile
-POET or also move `R_lib/*` relative to the binary.
-The benchmarks consist of input scripts, which are provided as .R files.
-Additionally, Phreeqc scripts and their corresponding databases are required,
-stored as .pqi and .dat files, respectively.
+With the installation of POET, two executables are provided:
+- `poet` - the main executable to run simulations
+- `poet_init` - a preprocessor to generate input files for POET from R scripts
+Preprocessed benchmarks can be found in the `share/poet` directory with an
+according *runtime* setup. More on those files and how to create them later.
 ## Running
-Run POET by `mpirun ./poet <OPTIONS> <SIMFILE> <OUTPUT_DIRECTORY>`
+Run POET by `mpirun ./poet [OPTIONS] <RUNFILE> <SIMFILE> <OUTPUT_DIRECTORY>`
 where:
-- **OPTIONS** - runtime parameters (explained below)
-- **SIMFILE** - simulation described as R script (e.g.
-`<POET_INSTALL_DIR>/share/poet/bench/dolo/dolo_interp_long.R`)
+- **OPTIONS** - POET options (explained below)
+- **RUNFILE** - Runtime parameters described as R script
+- **SIMFILE** - Simulation input prepared by `poet_init`
 - **OUTPUT_DIRECTORY** - path, where all output of POET should be stored
-### Runtime options
+### POET options
 The following parameters can be set:
 | Option | Value | Description |
 |-----------------------------|--------------|--------------------------------------------------------------------------------------------------------------------------|
 | **--work-package-size=** | _1..n_ | size of work packages (defaults to _5_) |
-| **--ignore-result** | | disables store of simulation resuls |
+| **-P, --progress** | | show progress bar |
 | **--dht** | | enabling DHT usage (defaults to _OFF_) |
 | **--dht-strategy=** | _0-1_ | change DHT strategy. **NOT IMPLEMENTED YET** (Defaults to _0_) |
 | **--dht-size=** | _1-n_ | size of DHT per process involved in megabyte (defaults to _1000 MByte_) |
@@ -180,14 +173,16 @@ Following values can be set:
 ### Example: Running from scratch
-We will continue the above example and start a simulation with
-`dolo_diffu_inner.R`. As transport a simple fixed-coefficient diffusion is used.
-It's a 2D, 100x100 grid, simulating 10 time steps. To start the simulation with
-4 processes `cd` into your previously installed POET-dir
-`<POET_INSTALL_DIR>/bin` and run:
+We will continue the above example and start a simulation with *barite_het*,
+which simulation files can be found in
+`<INSTALL_DIR>/share/poet/barite/barite_het*`. As transport a heterogeneous
+diffusion is used. It's a small 2D grid, 2x5 grid, simulating 50 time steps with
+a time step size of 100 seconds. To start the simulation with 4 processes `cd`
+into your previously installed POET-dir `<POET_INSTALL_DIR>/bin` and run:
 ```sh
-mpirun -n 4 ./poet ../share/poet/bench/dolo/dolo_diffu_inner.R/ output
+cp ../share/poet/barite/barite_het* .
+mpirun -n 4 ./poet barite_het_rt.R barite_het.rds output
 ```
 After a finished simulation all data generated by POET will be found
@@ -200,9 +195,32 @@ produced. This is done by appending the `--dht-snaps=<value>` option. The
 resulting call would look like this:
 ```sh
-mpirun -n 4 ./poet --dht --dht-snaps=2 ../share/poet/bench/dolo/dolo_diffu_inner.R/ output
+mpirun -n 4 ./poet --dht --dht-snaps=2 barite_het_rt.R barite_het.rds output
 ```
+## Defining a model
+In order to provide a model to POET, you need to setup a R script which can then
+be used by `poet_init` to generate the simulation input. Which parameters are
+required can be found in the
+[Wiki](https://git.gfz-potsdam.de/naaice/poet/-/wikis/Initialization). We try to
+keep the document up-to-date. However, if you encounter missing information or
+need help, please get in touch with us via the issue tracker or E-Mail.
+`poet_init` can be used as follows:
+```sh
+./poet_init [-o, --output output_file] [-s, --setwd] <script.R>
+```
+where:
+- **output** - name of the output file (defaults to the input file name
+with the extension `.rds`)
+- **setwd** - set the working directory to the directory of the input file (e.g.
+to allow relative paths in the input script). However, the output file
+will be stored in the directory from which `poet_init` was called.
 ## About the usage of MPI_Wtime()
 Implemented time measurement functions uses `MPI_Wtime()`. Some
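This README update splits a run into two steps. First `poet_init` turns a model R script into the `.rds` simulation input. Then `mpirun ./poet <RUNFILE> <SIMFILE> <OUTPUT_DIRECTORY>` runs it. The helper below wraps both steps in a small shell function. It is a minimal sketch and is not shipped with POET.

Several parts are assumptions:
- The names `rds_name`, `run_cmd` and `poet_workflow` and the `DRY_RUN` variable are hypothetical.
- The script is assumed to run from `<POET_INSTALL_DIR>/bin`.
- The default `.rds` name is assumed to replace a trailing `.R` suffix of the input file.

```sh
#!/bin/sh
# Hypothetical helper, not part of POET.
# Assumes the current directory is <POET_INSTALL_DIR>/bin.

# Mirror the documented poet_init default: the input file name with a .rds
# extension (assumption: a trailing ".R" suffix is replaced).
rds_name() {
  rds_base=$(basename "$1")
  printf '%s.rds\n' "${rds_base%.R}"
}

# Execute a command, or only print it when DRY_RUN=1.
run_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# usage: poet_workflow <model.R> <RUNFILE> <OUTPUT_DIRECTORY> [nprocs]
poet_workflow() {
  wf_simfile=$(rds_name "$1")
  # Step 1: preprocess the model script into the simulation input (SIMFILE).
  run_cmd ./poet_init -o "$wf_simfile" "$1" || return 1
  # Step 2: run POET with the runtime script, the SIMFILE and an output dir
  # (defaults to 4 MPI processes, as in the README example).
  run_cmd mpirun -n "${4:-4}" ./poet "$2" "$wf_simfile" "$3"
}
```

For example, `DRY_RUN=1 poet_workflow my_model.R barite_het_rt.R output` only prints the two commands without running them. Here `my_model.R` is a placeholder name.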


@@ -14,7 +14,6 @@ if(DOXYGEN_FOUND)
 doxygen_add_docs(doxygen
 ${PROJECT_SOURCE_DIR}/src
 ${PROJECT_SOURCE_DIR}/README.md
-${PROJECT_SOURCE_DIR}/docs/Input_Scripts.md
 ${PROJECT_SOURCE_DIR}/docs/Output.md
 COMMENT "Generate html pages")
 endif()


@@ -1,86 +0,0 @@
# Input Scripts
In the following the expected schemes of the input scripts is described.
Therefore, each section of the input script gets its own chapter. All sections
should return a `list` as results, which are concatenated to one setup list at
the end of the file. All values must have the same name in order to get parsed
by POET.
## Grid initialization
| name | type | description |
|----------------|----------------|-----------------------------------------------------------------------|
| `n_cells` | Numeric Vector | Number of cells in each direction |
| `s_cells` | Numeric Vector | Spatial resolution of grid in each direction |
| `type` | String | Type of initialization, can be set to *scratch*, *phreeqc* or *rds* |
## Diffusion parameters
| name | type | description |
|----------------|----------------------|-------------------------------------------|
| `init` | Named Numeric Vector | Initial state for each diffused species |
| `vecinj` | Data Frame | Defining all boundary conditions row wise |
| `vecinj_inner` | List of Triples | Inner boundaries |
| `vecinj_index` | List of 4 elements | Ghost nodes boundary conditions |
| `alpha` | Named Numeric Vector | Constant alpha for each species |
### Remark on boundary conditions
Each boundary condition should be defined in `vecinj` as a data frame, where one
row holds one boundary condition.
To define inner (constant) boundary conditions, use a list of triples in
`vecinj_inner`, where each triples is defined by $(i,x,y)$. $i$ is defining the
boundary condition, referencing to the row in `vecinj`. $x$ and $y$ coordinates
then defining the position inside the grid.
Ghost nodes are set by `vecinj_index` which is a list containing boundaries for
each celestial direction (**important**: named by `N, E, S, W`). Each direction
is a numeric vector, also representing a row index of the `vecinj` data frame
for each ghost node, starting at the left-most and upper cell respectively. By
setting the boundary condition to $0$, the ghost node is set as closed boundary.
#### Example
Suppose you have a `vecinj` data frame defining 2 boundary conditions and a grid
consisting of $10 \times 10$ grid cells. Grid cell $(1,1)$ should be set to the
first boundary condition and $(5,6)$ to the second. Also, all boundary
conditions for the ghost nodes should be closed. Except the southern boundary,
which should be set to the first boundary condition injection. The following
setup describes how to setup your initial script, where `n` and `m` are the
grids cell count for each direction ($n = m = 10$):
```R
vecinj_inner <- list (
l1 = c(1, 1, 1),
l2 = c(2, 5, 6)
)
vecinj_index <- list(
"N" = rep(0, n),
"E" = rep(0, m),
"S" = rep(1, n),
"W" = rep(0, m)
)
```
## Chemistry parameters
| name | type | description |
|----------------|--------------|----------------------------------------------------------------------------------|
| `database` | String | Path to the Phreeqc database |
| `input_script` | String | Path the the Phreeqc input script |
| `dht_species` | Named Vector | Indicates significant digits to use for each species for DHT rounding. |
| `pht_species` | Named Vector | Indicates significant digits to use for each species for Interpolation rounding. |
## Final setup
| name | type | description |
|----------------|----------------|------------------------------------------------------------|
| `grid` | List | Grid parameter list |
| `diffusion` | List | Diffusion parameter list |
| `chemistry` | List | Chemistry parameter list |
| `iterations` | Numeric Value | Count of iterations |
| `timesteps` | Numeric Vector | $\Delta t$ to use for specific iteration |
| `store_result` | Boolean | Indicates if results should be stored |
| `out_save` | Numeric Vector | *optional:* At which iteration the states should be stored |


@@ -35,34 +35,50 @@ corresponding values can be found in `<OUTPUT_DIRECTORY>/timings.rds`
 and possible to read out within a R runtime with
 `readRDS("timings.rds")`. There you will find the following values:
 | Value | Description |
-|--------------------|----------------------------------------------------------------------------|
+| --------- | -------------------------------------------------------------------------- |
 | simtime | time spent in whole simulation loop without any initialization and cleanup |
-| simtime\_transport | measured time in *transport* subroutine |
-| simtime\_chemistry | measured time in *chemistry* subroutine (actual parallelized part) |
-### chemistry subsetting
-If running parallel there are also measured timings which are subsets of
-*simtime\_chemistry*.
-| Value | Description |
-|-----------------------|-----------------------------------------------------------|
-| chemistry\_loop | time spent in send/recv loop of master |
-| chemistry\_sequential | sequential part of master chemistry |
-| idle\_master | idling time (waiting for any free worker) of the master |
-| idle\_worker | idling time (waiting for work from master) of the workers |
-| phreeqc\_time | accumulated times for Phreeqc calls of every worker |
-### DHT usage {#DHT-usage}
+| chemistry | measured time in *chemistry* subroutine |
+| diffusion | measured time in *diffusion* subroutine |
+### Chemistry subsetting
+| Value | Description |
+| ------------- | --------------------------------------------------------- |
+| simtime | overall runtime of chemistry |
+| loop | time spent in send/recv loop of master |
+| sequential | sequential part of the master (e.g. shuffling field) |
+| idle\_master | idling time of the master waiting for workers |
+| idle\_worker | idling time (waiting for work from master) of the workers |
+| phreeqc\_time | accumulated times for Phreeqc calls of every worker |
+#### DHT usage
 If running in parallel and with activated DHT, two more timings and also
 some profiling about the DHT usage are given:
 | Value | Description |
-|-----------------|---------------------------------------------------------|
-| dht\_fill\_time | time to write data to DHT |
-| dht\_get\_time | time to retreive data from DHT |
-| dh\_hits | count of data points retrieved from DHT |
-| dht\_miss | count of misses/count of data points written to DHT |
+| --------------- | ------------------------------------------------------- |
+| dht\_hits | count of data points retrieved from DHT |
 | dht\_evictions | count of data points evicted by another write operation |
+| dht\_get\_time | time to retreive data from DHT |
+| dht\_fill\_time | time to write data to DHT |
+#### Interpolation
+If using interpolation, the following values are given:
+| Value | Description |
+| -------------- | --------------------------------------------------------------------- |
+| interp\_w | time spent to write to PHT |
+| interp\_r | time spent to read from DHT/PHT/Cache |
+| interp\_g | time spent to gather results from DHT |
+| interp\_fc | accumulated time spent in interpolation function call |
+| interp\_calls | count of interpolations |
+| interp\_cached | count of interpolation data sets, which where cached in the local map |
+### Diffusion subsetting
+| Value | Description |
+| --------- | ------------------------------------------ |
+| simtime | overall runtime of diffusion |