Update Readme

2025-12-15 12:28:22 +01:00 · 2024-05-06 09:09:24 +00:00 · 2024-05-06 09:09:24 +00:00 · 2e265443b9
commit 2e265443b9
parent 20a0c453b0
5 changed files with 101 additions and 159 deletions
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -15,15 +15,10 @@ list(APPEND CMAKE_MODULE_PATH "${POET_SOURCE_DIR}/CMake")

 get_poet_version()

-# set(GCC_CXX_FLAGS "-D STRICT_R_HEADERS") add_definitions(${GCC_CXX_FLAGS})
-
 find_package(MPI REQUIRED)

 find_package(RRuntime REQUIRED)

-# add_compile_options(-fsanitize=address -fno-omit-frame-pointer)
-# add_link_options(-fsanitize=address)
-
 add_subdirectory(src)
 add_subdirectory(bench)

--- a/README.md
+++ b/README.md
@@ -20,10 +20,10 @@ pages](https://naaice.git-pages.gfz-potsdam.de/poet).
 The following external header library is shipped with POET:

 - **argh** - https://github.com/adishavit/argh (BSD license)
- **PhreeqcRM** with patches from GFZ -
-  https://www.usgs.gov/software/phreeqc-version-3 -
-  https://git.gfz-potsdam.de/mluebke/phreeqcrm-gfz
- **tug** - https://git.gfz-potsdam.de/sec34/tug
+- **IPhreeqc** with patches from GFZ -
+  https://github.com/usgs-coupled/iphreeqc -
+  https://git.gfz-potsdam.de/naaice/iphreeqc
+- **tug** - https://git.gfz-potsdam.de/naaice/tug

 ## Installation

@@ -35,6 +35,7 @@ To compile POET you need several software to be installed:
 - MPI-Implementation (tested with OpenMPI and MVAPICH)
 - R language and environment
 - CMake 3.9+
+- Eigen3 3.4+ (required by `tug`)
 - *optional*: `doxygen` with `dot` bindings for documentation

 The following R libraries must then be installed, which will get the
@@ -107,58 +108,50 @@ The correspondending directory tree would look like this:
 ```sh
 poet
 ├── bin
-│   └── poet
-├── R_lib
-│   └── kin_r_library.R
+│   ├── poet
+│   └── poet_init
 └── share
    └── poet
-        └── bench
-            ├── barite
-            │   ├── barite_interp_eval.R
-            │   ├── barite.pqi
-            │   ├── barite.R
-            │   └── db_barite.dat
-            ├── dolo
-            │   ├── dolo_diffu_inner_large.R
-            │   ├── dolo_diffu_inner.R
-            │   ├── dolo_inner.pqi
-            │   ├── dolo_interp_long.R
-            │   └── phreeqc_kin.dat
-            └── surfex
-                ├── ExBase.pqi
-                ├── ex.R
-                ├── SMILE_2021_11_01_TH.dat
-                ├── SurfExBase.pqi
-                └── surfex.R
+        ├── barite
+        │   ├── barite_200.rds
+        │   ├── barite_200_rt.R
+        │   ├── barite_het.rds
+        │   └── barite_het_rt.R
+        ├── dolo
+        │   ├── dolo_inner_large.rds
+        │   ├── dolo_inner_large_rt.R
+        │   ├── dolo_interp.rds
+        │   └── dolo_interp_rt.R
+        └── surfex
+            ├── PoetEGU_surfex_500.rds
+            └── PoetEGU_surfex_500_rt.R
 ```

-The R libraries will be loaded at runtime and the paths are hardcoded
-absolute paths inside `poet.cpp`. So, if you consider to move
-`bin/poet` either change paths of the R source files and recompile
-POET or also move `R_lib/*` relative to the binary.
+With the installation of POET, two executables are provided: 
+  - `poet` - the main executable to run simulations
+  - `poet_init` - a preprocessor to generate input files for POET from R scripts

-The benchmarks consist of input scripts, which are provided as .R files.
-Additionally, Phreeqc scripts and their corresponding databases are required,
-stored as .pqi and .dat files, respectively.
+Preprocessed benchmarks can be found in the `share/poet` directory, each with a
+corresponding *runtime* setup. How to create these files is described below.

 ## Running

-Run POET by `mpirun ./poet <OPTIONS> <SIMFILE> <OUTPUT_DIRECTORY>`
+Run POET by `mpirun ./poet [OPTIONS] <RUNFILE> <SIMFILE> <OUTPUT_DIRECTORY>`
 where:

- **OPTIONS** - runtime parameters (explained below)
- **SIMFILE** - simulation described as R script (e.g.
-  `<POET_INSTALL_DIR>/share/poet/bench/dolo/dolo_interp_long.R`)
+- **OPTIONS** - POET options (explained below)
+- **RUNFILE** - Runtime parameters described as an R script
+- **SIMFILE** - Simulation input prepared by `poet_init`
 - **OUTPUT_DIRECTORY** - path, where all output of POET should be stored

-### Runtime options
+### POET options

 The following parameters can be set:

 | Option                      | Value        | Description                                                                                                              |
 |-----------------------------|--------------|--------------------------------------------------------------------------------------------------------------------------|
 | **--work-package-size=**    | _1..n_       | size of work packages (defaults to _5_)                                                                                  |
-| **--ignore-result**         |              | disables store of simulation resuls                                                                                      |
+| **-P, --progress**          |              | show progress bar                                                                                                        |
 | **--dht**                   |              | enabling DHT usage (defaults to _OFF_)                                                                                   |
 | **--dht-strategy=**         | _0-1_        | change DHT strategy. **NOT IMPLEMENTED YET** (Defaults to _0_)                                                           |
 | **--dht-size=**             | _1-n_        | size of DHT per process involved in megabyte (defaults to _1000 MByte_)                                                  |
@@ -180,14 +173,16 @@ Following values can be set:

 ### Example: Running from scratch

-We will continue the above example and start a simulation with
-`dolo_diffu_inner.R`. As transport a simple fixed-coefficient diffusion is used.
-It's a 2D, 100x100 grid, simulating 10 time steps. To start the simulation with
-4 processes `cd` into your previously installed POET-dir
-`<POET_INSTALL_DIR>/bin` and run:
+We will continue the above example and start a simulation with *barite_het*,
+whose simulation files can be found in
+`<POET_INSTALL_DIR>/share/poet/barite/barite_het*`. Transport is modeled as
+heterogeneous diffusion on a small 2x5 2D grid, simulating 50 time steps with
+a time step size of 100 seconds. To start the simulation with 4 processes, `cd`
+into `<POET_INSTALL_DIR>/bin` of your previous installation and run:

 ```sh
-mpirun -n 4 ./poet ../share/poet/bench/dolo/dolo_diffu_inner.R/ output
+cp ../share/poet/barite/barite_het* .
+mpirun -n 4 ./poet barite_het_rt.R barite_het.rds output
 ```

 After a finished simulation all data generated by POET will be found
@@ -200,9 +195,32 @@ produced. This is done by appending the `--dht-snaps=<value>` option. The
 resulting call would look like this:

 ```sh
-mpirun -n 4 ./poet --dht --dht-snaps=2 ../share/poet/bench/dolo/dolo_diffu_inner.R/ output
+mpirun -n 4 ./poet --dht --dht-snaps=2 barite_het_rt.R barite_het.rds output
 ```

+## Defining a model
+
+To provide a model to POET, you need to set up an R script, which `poet_init`
+then uses to generate the simulation input. The required parameters are
+documented in the
+[Wiki](https://git.gfz-potsdam.de/naaice/poet/-/wikis/Initialization). We try to
+keep this document up to date. However, if you encounter missing information or
+need help, please get in touch with us via the issue tracker or e-mail.
+
+`poet_init` can be used as follows:
+
+```sh
+./poet_init [-o, --output output_file] [-s, --setwd]  <script.R>
+```
+
+where: 
+
+- **output** - name of the output file (defaults to the input file name
+  with the extension `.rds`)
+- **setwd** - set the working directory to the directory of the input file (e.g.
+  to allow relative paths in the input script). However, the output file
+  will be stored in the directory from which `poet_init` was called.
+
 ## About the usage of MPI_Wtime()

 The implemented time measurement functions use `MPI_Wtime()`. Some
--- a/docs/CMakeLists.txt
+++ b/docs/CMakeLists.txt
@@ -14,7 +14,6 @@ if(DOXYGEN_FOUND)
  doxygen_add_docs(doxygen
    ${PROJECT_SOURCE_DIR}/src
    ${PROJECT_SOURCE_DIR}/README.md
-    ${PROJECT_SOURCE_DIR}/docs/Input_Scripts.md
    ${PROJECT_SOURCE_DIR}/docs/Output.md
    COMMENT "Generate html pages")
 endif()
--- a/docs/Input_Scripts.md
+++ b/docs/Input_Scripts.md
@@ -1,86 +0,0 @@
-# Input Scripts
-
-In the following the expected schemes of the input scripts is described.
-Therefore, each section of the input script gets its own chapter. All sections
-should return a `list` as results, which are concatenated to one setup list at
-the end of the file. All values must have the same name in order to get parsed
-by POET.
-
-## Grid initialization
-
-| name           | type           | description                                                           |
-|----------------|----------------|-----------------------------------------------------------------------|
-| `n_cells`      | Numeric Vector | Number of cells in each direction                                     |
-| `s_cells`      | Numeric Vector | Spatial resolution of grid in each direction                          |
-| `type`         | String         | Type of initialization, can be set to *scratch*, *phreeqc* or *rds*   |
-
-## Diffusion parameters
-
-| name           | type                 | description                               |
-|----------------|----------------------|-------------------------------------------|
-| `init`         | Named Numeric Vector | Initial state for each diffused species   |
-| `vecinj`       | Data Frame           | Defining all boundary conditions row wise |
-| `vecinj_inner` | List of Triples      | Inner boundaries                          |
-| `vecinj_index` | List of 4 elements   | Ghost nodes boundary conditions           |
-| `alpha`        | Named Numeric Vector | Constant alpha for each species           |
-
-### Remark on boundary conditions
-
-Each boundary condition should be defined in `vecinj` as a data frame, where one
-row holds one boundary condition.
-
-To define inner (constant) boundary conditions, use a list of triples in
-`vecinj_inner`, where each triples is defined by $(i,x,y)$. $i$ is defining the
-boundary condition, referencing to the row in `vecinj`. $x$ and $y$ coordinates
-then defining the position inside the grid. 
-
-Ghost nodes are set by `vecinj_index` which is a list containing boundaries for
-each celestial direction (**important**: named by `N, E, S, W`). Each direction
-is a numeric vector, also representing a row index of the `vecinj` data frame
-for each ghost node, starting at the left-most and upper cell respectively. By
-setting the boundary condition to $0$, the ghost node is set as closed boundary.
-
-#### Example
-
-Suppose you have a `vecinj` data frame defining 2 boundary conditions and a grid
-consisting of $10 \times 10$ grid cells. Grid cell $(1,1)$ should be set to the
-first boundary condition and $(5,6)$ to the second. Also, all boundary
-conditions for the ghost nodes should be closed. Except the southern boundary,
-which should be set to the first boundary condition injection. The following
-setup describes how to setup your initial script, where `n` and `m` are the
-grids cell count for each direction ($n = m = 10$):
-
-```R
-vecinj_inner <- list (
-  l1 = c(1, 1, 1),
-  l2 = c(2, 5, 6)
-)
-
-vecinj_index <- list(
-  "N" = rep(0, n),
-  "E" = rep(0, m),
-  "S" = rep(1, n),
-  "W" = rep(0, m)
-)
-```
-
-## Chemistry parameters
-
-| name           | type         | description                                                                      |
-|----------------|--------------|----------------------------------------------------------------------------------|
-| `database`     | String       | Path to the Phreeqc database                                                     |
-| `input_script` | String       | Path the the Phreeqc input script                                                |
-| `dht_species`  | Named Vector | Indicates significant digits to use for each species for DHT rounding.           |
-| `pht_species`  | Named Vector | Indicates significant digits to use for each species for Interpolation rounding. |
-
-## Final setup
-
-| name           | type           | description                                                |
-|----------------|----------------|------------------------------------------------------------|
-| `grid`         | List           | Grid parameter list                                        |
-| `diffusion`    | List           | Diffusion parameter list                                   |
-| `chemistry`    | List           | Chemistry parameter list                                   |
-| `iterations`   | Numeric Value  | Count of iterations                                        |
-| `timesteps`    | Numeric Vector | $\Delta t$ to use for specific iteration                   |
-| `store_result` | Boolean        | Indicates if results should be stored                      |
-| `out_save`     | Numeric Vector | *optional:* At which iteration the states should be stored |
--- a/docs/Output.md
+++ b/docs/Output.md
@@ -35,34 +35,50 @@ corresponding values can be found in `<OUTPUT_DIRECTORY>/timings.rds`
 and possible to read out within a R runtime with
 `readRDS("timings.rds")`. There you will find the following values:

-| Value              | Description                                                                |
-|--------------------|----------------------------------------------------------------------------|
-| simtime            | time spent in whole simulation loop without any initialization and cleanup |
-| simtime\_transport | measured time in *transport* subroutine                                    |
-| simtime\_chemistry | measured time in *chemistry* subroutine (actual parallelized part)         |
+| Value     | Description                                                                |
+| --------- | -------------------------------------------------------------------------- |
+| simtime   | time spent in whole simulation loop without any initialization and cleanup |
+| chemistry | measured time in *chemistry* subroutine                                    |
+| diffusion | measured time in *diffusion* subroutine                                    |

-### chemistry subsetting
+### Chemistry subsetting

-If running parallel there are also measured timings which are subsets of
-*simtime\_chemistry*.
+| Value         | Description                                               |
+| ------------- | --------------------------------------------------------- |
+| simtime       | overall runtime of chemistry                              |
+| loop          | time spent in send/recv loop of master                    |
+| sequential    | sequential part of the master (e.g. shuffling field)      |
+| idle\_master  | idling time of the master waiting for workers             |
+| idle\_worker  | idling time (waiting for work from master) of the workers |
+| phreeqc\_time | accumulated times for Phreeqc calls of every worker       |

-| Value                 | Description                                               |
-|-----------------------|-----------------------------------------------------------|
-| chemistry\_loop       | time spent in send/recv loop of master                    |
-| chemistry\_sequential | sequential part of master chemistry                       |
-| idle\_master          | idling time (waiting for any free worker) of the master   |
-| idle\_worker          | idling time (waiting for work from master) of the workers |
-| phreeqc\_time         | accumulated times for Phreeqc calls of every worker       |
-
-### DHT usage {#DHT-usage}
+#### DHT usage

 If running in parallel and with activated DHT, two more timings and also
 some profiling about the DHT usage are given:

 | Value           | Description                                             |
-|-----------------|---------------------------------------------------------|
-| dht\_fill\_time | time to write data to DHT                               |
-| dht\_get\_time  | time to retreive data from DHT                          |
-| dh\_hits        | count of data points retrieved from DHT                 |
-| dht\_miss       | count of misses/count of data points written to DHT     |
+| --------------- | ------------------------------------------------------- |
+| dht\_hits       | count of data points retrieved from DHT                 |
 | dht\_evictions  | count of data points evicted by another write operation |
+| dht\_get\_time  | time to retrieve data from DHT                          |
+| dht\_fill\_time | time to write data to DHT                               |
+
+#### Interpolation
+
+If using interpolation, the following values are given:
+
+| Value          | Description                                                           |
+| -------------- | --------------------------------------------------------------------- |
+| interp\_w      | time spent to write to PHT                                            |
+| interp\_r      | time spent to read from DHT/PHT/Cache                                 |
+| interp\_g      | time spent to gather results from DHT                                 |
+| interp\_fc     | accumulated time spent in interpolation function call                 |
+| interp\_calls  | count of interpolations                                               |
+| interp\_cached | count of interpolation data sets, which were cached in the local map  |
+
+### Diffusion subsetting
+
+| Value     | Description                                |
+| --------- | ------------------------------------------ |
+| simtime   | overall runtime of diffusion               |
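
The README changes in this commit describe a two-step workflow: `poet_init` turns a model script into an `.rds` simulation input, and `poet` then runs it together with a runtime script. The following is a minimal sketch of that workflow, not a script shipped with POET. The model script name `barite_het.R` is hypothetical (only the `.rds` and `_rt.R` files appear in the installed tree), the `run` helper and the `DRY_RUN` switch are illustrative, and the `-o` spelling is taken from the usage line above without being checked against the binary. By default the commands are only printed.

```sh
#!/bin/sh
# Sketch of the two-step POET workflow: preprocess with poet_init, then run poet.
# ASSUMPTION: MODEL is a hypothetical model script name, not a file from the repo.

MODEL="${MODEL:-barite_het.R}"         # hypothetical input script for poet_init
RUNFILE="${RUNFILE:-barite_het_rt.R}"  # runtime parameters (R script)
SIMFILE="${SIMFILE:-barite_het.rds}"   # simulation input written by poet_init
OUTDIR="${OUTDIR:-output}"             # directory for all POET output
NPROCS="${NPROCS:-4}"                  # number of MPI processes
DRY_RUN="${DRY_RUN:-1}"                # 1: only print commands, 0: execute them

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Step 1: generate the simulation input from the model script.
run ./poet_init -o "$SIMFILE" "$MODEL"

# Step 2: run the simulation. Options come first, followed by RUNFILE,
# SIMFILE and OUTPUT_DIRECTORY.
run mpirun -n "$NPROCS" ./poet --dht --dht-snaps=2 "$RUNFILE" "$SIMFILE" "$OUTDIR"
```

Running the script from `<POET_INSTALL_DIR>/bin` with `DRY_RUN=0` would execute both steps. Keeping the dry run as the default makes it easy to check the argument order before starting an MPI job.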