Update Readme

2025-12-16 12:54:50 +01:00 · 2024-05-06 09:09:24 +00:00
commit 0992143be5
parent a12ac2c3d5
5 changed files with 101 additions and 159 deletions
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@ -15,15 +15,10 @@ list(APPEND CMAKE_MODULE_PATH "${POET_SOURCE_DIR}/CMake")
 get_poet_version()
 # set(GCC_CXX_FLAGS "-D STRICT_R_HEADERS") add_definitions(${GCC_CXX_FLAGS})
 find_package(MPI REQUIRED)
 find_package(RRuntime REQUIRED)
 # add_compile_options(-fsanitize=address -fno-omit-frame-pointer)
 # add_link_options(-fsanitize=address)
 add_subdirectory(src)
 add_subdirectory(bench)
--- a/README.md
+++ b/README.md
@ -20,10 +20,10 @@ pages](https://naaice.git-pages.gfz-potsdam.de/poet).
 The following external header library is shipped with POET:
 - **argh** - https://github.com/adishavit/argh (BSD license)
- **PhreeqcRM** with patches from GFZ -
+- **IPhreeqc** with patches from GFZ -
-  https://www.usgs.gov/software/phreeqc-version-3 -
+  https://github.com/usgs-coupled/iphreeqc -
-  https://git.gfz-potsdam.de/mluebke/phreeqcrm-gfz
+  https://git.gfz-potsdam.de/naaice/iphreeqc
- **tug** - https://git.gfz-potsdam.de/sec34/tug
+- **tug** - https://git.gfz-potsdam.de/naaice/tug
 ## Installation
@ -35,6 +35,7 @@ To compile POET you need several software to be installed:
 - MPI-Implementation (tested with OpenMPI and MVAPICH)
 - R language and environment
 - CMake 3.9+
 - Eigen3 3.4+ (required by `tug`)
 - *optional*: `doxygen` with `dot` bindings for documentation
 The following R libraries must then be installed, which will get the
@ -107,58 +108,50 @@ The corresponding directory tree would look like this:
 ```sh
 poet
 ├── bin
-│   └── poet
+│   ├── poet
-├── R_lib
+│   └── poet_init
 │   └── kin_r_library.R
 └── share
    └── poet
-        └── bench
+        ├── barite
-            ├── barite
+        │   ├── barite_200.rds
-            │   ├── barite_interp_eval.R
+        │   ├── barite_200_rt.R
-            │   ├── barite.pqi
+        │   ├── barite_het.rds
-            │   ├── barite.R
+        │   └── barite_het_rt.R
-            │   └── db_barite.dat
+        ├── dolo
-            ├── dolo
+        │   ├── dolo_inner_large.rds
-            │   ├── dolo_diffu_inner_large.R
+        │   ├── dolo_inner_large_rt.R
-            │   ├── dolo_diffu_inner.R
+        │   ├── dolo_interp.rds
-            │   ├── dolo_inner.pqi
+        │   └── dolo_interp_rt.R
-            │   ├── dolo_interp_long.R
+        └── surfex
-            │   └── phreeqc_kin.dat
+            ├── PoetEGU_surfex_500.rds
-            └── surfex
+            └── PoetEGU_surfex_500_rt.R
                ├── ExBase.pqi
                ├── ex.R
                ├── SMILE_2021_11_01_TH.dat
                ├── SurfExBase.pqi
                └── surfex.R
 ```
-The R libraries will be loaded at runtime and the paths are hardcoded
+With the installation of POET, two executables are provided: 
-absolute paths inside `poet.cpp`. So, if you consider to move
+  - `poet` - the main executable to run simulations
-`bin/poet` either change paths of the R source files and recompile
+  - `poet_init` - a preprocessor to generate input files for POET from R scripts
 POET or also move `R_lib/*` relative to the binary.
-The benchmarks consist of input scripts, which are provided as .R files.
+Preprocessed benchmarks can be found in the `share/poet` directory together with a
-Additionally, Phreeqc scripts and their corresponding databases are required,
+corresponding *runtime* setup. These files and how to create them are described later.
 stored as .pqi and .dat files, respectively.
 ## Running
-Run POET by `mpirun ./poet <OPTIONS> <SIMFILE> <OUTPUT_DIRECTORY>`
+Run POET by `mpirun ./poet [OPTIONS] <RUNFILE> <SIMFILE> <OUTPUT_DIRECTORY>`
 where:
- **OPTIONS** - runtime parameters (explained below)
+- **OPTIONS** - POET options (explained below)
- **SIMFILE** - simulation described as R script (e.g.
+- **RUNFILE** - Runtime parameters described as R script 
-  `<POET_INSTALL_DIR>/share/poet/bench/dolo/dolo_interp_long.R`)
+- **SIMFILE** - Simulation input prepared by `poet_init`
 - **OUTPUT_DIRECTORY** - path, where all output of POET should be stored
-### Runtime options
+### POET options
 The following parameters can be set:
 | Option                      | Value        | Description                                                                                                              |
 |-----------------------------|--------------|--------------------------------------------------------------------------------------------------------------------------|
 | **--work-package-size=**    | _1..n_       | size of work packages (defaults to _5_)                                                                                  |
-| **--ignore-result**         |              | disables store of simulation results                                                                                     |
+| **-P, --progress**          |              | show progress bar                                                                                                        |
 | **--dht**                   |              | enabling DHT usage (defaults to _OFF_)                                                                                   |
 | **--dht-strategy=**         | _0-1_        | change DHT strategy. **NOT IMPLEMENTED YET** (Defaults to _0_)                                                           |
 | **--dht-size=**             | _1-n_        | size of DHT per process involved in megabyte (defaults to _1000 MByte_)                                                  |
@ -180,14 +173,16 @@ Following values can be set:
 ### Example: Running from scratch
-We will continue the above example and start a simulation with
+We will continue the above example and start a simulation with *barite_het*,
-`dolo_diffu_inner.R`. As transport a simple fixed-coefficient diffusion is used.
+whose simulation files can be found in
-It's a 2D, 100x100 grid, simulating 10 time steps. To start the simulation with
+`<INSTALL_DIR>/share/poet/barite/barite_het*`. Transport uses a heterogeneous
-4 processes `cd` into your previously installed POET-dir
+diffusion. It's a small 2D grid (2x5), simulating 50 time steps with
-`<POET_INSTALL_DIR>/bin` and run:
+a time step size of 100 seconds. To start the simulation with 4 processes `cd`
 into your previously installed POET-dir `<POET_INSTALL_DIR>/bin` and run:
 ```sh
-mpirun -n 4 ./poet ../share/poet/bench/dolo/dolo_diffu_inner.R/ output
+cp ../share/poet/barite/barite_het* .
 mpirun -n 4 ./poet barite_het_rt.R barite_het.rds output
 ```
 After a finished simulation all data generated by POET will be found
@ -200,9 +195,32 @@ produced. This is done by appending the `--dht-snaps=<value>` option. The
 resulting call would look like this:
 ```sh
-mpirun -n 4 ./poet --dht --dht-snaps=2 ../share/poet/bench/dolo/dolo_diffu_inner.R/ output
+mpirun -n 4 ./poet --dht --dht-snaps=2 barite_het_rt.R barite_het.rds output
 ```
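The option handling above can be sketched as a small wrapper that assembles the `poet` call from environment variables. This is only an illustration: the `POET_*` variable names and the `build_poet_cmd` function are hypothetical and not part of POET. Only the command-line flags themselves are taken from the options table. The function prints the command instead of executing it, so it can be inspected first.

```sh
#!/bin/sh
# Hypothetical helper (not shipped with POET): prints a poet command line
# assembled from optional POET_* environment variables.
build_poet_cmd() {
    nprocs=$1    # number of MPI processes
    runfile=$2   # runtime parameters (R script)
    simfile=$3   # simulation input prepared by poet_init
    outdir=$4    # output directory
    opts=""
    if [ -n "${POET_WP_SIZE:-}" ]; then
        opts="$opts --work-package-size=$POET_WP_SIZE"
    fi
    if [ "${POET_DHT:-0}" = 1 ]; then
        opts="$opts --dht"
        if [ -n "${POET_DHT_SIZE:-}" ]; then
            opts="$opts --dht-size=$POET_DHT_SIZE"
        fi
        if [ -n "${POET_DHT_SNAPS:-}" ]; then
            opts="$opts --dht-snaps=$POET_DHT_SNAPS"
        fi
    fi
    echo "mpirun -n $nprocs ./poet$opts $runfile $simfile $outdir"
}

# Assemble the DHT call for the barite_het benchmark.
POET_DHT=1
POET_DHT_SNAPS=2
build_poet_cmd 4 barite_het_rt.R barite_het.rds output
```

With `POET_DHT` unset, the same function prints the plain call without any DHT options.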
 ## Defining a model
 In order to provide a model to POET, you need to set up an R script which can then
 be used by `poet_init` to generate the simulation input. Which parameters are
 required can be found in the
 [Wiki](https://git.gfz-potsdam.de/naaice/poet/-/wikis/Initialization). We try to
 keep the document up-to-date. However, if you encounter missing information or
 need help, please get in touch with us via the issue tracker or E-Mail.
 `poet_init` can be used as follows:
 ```sh
 ./poet_init [-o, --output output_file] [-s, --setwd]  <script.R>
 ```
 where: 
 - **output** - name of the output file (defaults to the input file name
  with the extension `.rds`)
 - **setwd** - set the working directory to the directory of the input file (e.g.
  to allow relative paths in the input script). However, the output file
  will be stored in the directory from which `poet_init` was called.
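The two-step workflow (preprocess with `poet_init`, then run `poet`) might be scripted as follows. This is a sketch under assumptions: the file names `models/my_model.R` and `my_model_rt.R`, the `poet_workflow` function and the `DRY_RUN` switch are hypothetical. The output name is passed explicitly with `-o`, so the sketch does not rely on how the default name is derived. By default the commands are only printed.

```sh
#!/bin/sh
# Hypothetical wrapper (not shipped with POET) around poet_init and poet.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

poet_workflow() {
    script=$1        # model definition (R script) for poet_init
    runfile=$2       # runtime parameters (R script) for poet
    outdir=$3        # output directory for poet
    nprocs=${4:-4}   # number of MPI processes
    # Name the simulation input after the script, written to the current directory.
    simfile="$(basename "${script%.R}").rds"
    # -s lets the script use paths relative to its own directory.
    run ./poet_init -o "$simfile" -s "$script"
    run mpirun -n "$nprocs" ./poet "$runfile" "$simfile" "$outdir"
}

poet_workflow models/my_model.R my_model_rt.R output 4
```

Setting `DRY_RUN=0` would execute both commands in order. Because `poet_init` stores its output in the directory it was called from, the generated `.rds` file is then directly available to the following `poet` call.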
 ## About the usage of MPI_Wtime()
 The implemented time measurement functions use `MPI_Wtime()`. Some
--- a/docs/CMakeLists.txt
+++ b/docs/CMakeLists.txt
@ -14,7 +14,6 @@ if(DOXYGEN_FOUND)
  doxygen_add_docs(doxygen
    ${PROJECT_SOURCE_DIR}/src
    ${PROJECT_SOURCE_DIR}/README.md
    ${PROJECT_SOURCE_DIR}/docs/Input_Scripts.md
    ${PROJECT_SOURCE_DIR}/docs/Output.md
    COMMENT "Generate html pages")
 endif()
--- a/docs/Input_Scripts.md
+++ b/docs/Input_Scripts.md
@ -1,86 +0,0 @@
 # Input Scripts
 The following describes the expected scheme of the input scripts.
 Therefore, each section of the input script gets its own chapter. All sections
 should return a `list` as results, which are concatenated to one setup list at
 the end of the file. All values must have the same name in order to get parsed
 by POET.
 ## Grid initialization
 | name           | type           | description                                                           |
 |----------------|----------------|-----------------------------------------------------------------------|
 | `n_cells`      | Numeric Vector | Number of cells in each direction                                     |
 | `s_cells`      | Numeric Vector | Spatial resolution of grid in each direction                          |
 | `type`         | String         | Type of initialization, can be set to *scratch*, *phreeqc* or *rds*   |
 ## Diffusion parameters
 | name           | type                 | description                               |
 |----------------|----------------------|-------------------------------------------|
 | `init`         | Named Numeric Vector | Initial state for each diffused species   |
 | `vecinj`       | Data Frame           | Defining all boundary conditions row wise |
 | `vecinj_inner` | List of Triples      | Inner boundaries                          |
 | `vecinj_index` | List of 4 elements   | Ghost nodes boundary conditions           |
 | `alpha`        | Named Numeric Vector | Constant alpha for each species           |
 ### Remark on boundary conditions
 Each boundary condition should be defined in `vecinj` as a data frame, where one
 row holds one boundary condition.
 To define inner (constant) boundary conditions, use a list of triples in
 `vecinj_inner`, where each triple is defined by $(i,x,y)$. $i$ identifies the
 boundary condition by referencing the row in `vecinj`. The $x$ and $y$ coordinates
 then define the position inside the grid.
 Ghost nodes are set by `vecinj_index` which is a list containing boundaries for
 each cardinal direction (**important**: named by `N, E, S, W`). Each direction
 is a numeric vector, also representing a row index of the `vecinj` data frame
 for each ghost node, starting at the left-most and upper cell respectively. By
 setting the boundary condition to $0$, the ghost node is set as closed boundary.
 #### Example
 Suppose you have a `vecinj` data frame defining 2 boundary conditions and a grid
 consisting of $10 \times 10$ grid cells. Grid cell $(1,1)$ should be set to the
 first boundary condition and $(5,6)$ to the second. Also, all boundary
 conditions for the ghost nodes should be closed, except the southern boundary,
 which should be set to the first boundary condition injection. The following
 setup describes how to set up your initial script, where `n` and `m` are the
 grid's cell counts for each direction ($n = m = 10$):
 ```R
 vecinj_inner <- list (
  l1 = c(1, 1, 1),
  l2 = c(2, 5, 6)
 )
 vecinj_index <- list(
  "N" = rep(0, n),
  "E" = rep(0, m),
  "S" = rep(1, n),
  "W" = rep(0, m)
 )
 ```
 ## Chemistry parameters
 | name           | type         | description                                                                      |
 |----------------|--------------|----------------------------------------------------------------------------------|
 | `database`     | String       | Path to the Phreeqc database                                                     |
 | `input_script` | String       | Path to the Phreeqc input script                                                 |
 | `dht_species`  | Named Vector | Indicates significant digits to use for each species for DHT rounding.           |
 | `pht_species`  | Named Vector | Indicates significant digits to use for each species for Interpolation rounding. |
 ## Final setup
 | name           | type           | description                                                |
 |----------------|----------------|------------------------------------------------------------|
 | `grid`         | List           | Grid parameter list                                        |
 | `diffusion`    | List           | Diffusion parameter list                                   |
 | `chemistry`    | List           | Chemistry parameter list                                   |
 | `iterations`   | Numeric Value  | Count of iterations                                        |
 | `timesteps`    | Numeric Vector | $\Delta t$ to use for specific iteration                   |
 | `store_result` | Boolean        | Indicates if results should be stored                      |
 | `out_save`     | Numeric Vector | *optional:* At which iteration the states should be stored |
--- a/docs/Output.md
+++ b/docs/Output.md
@ -35,34 +35,50 @@ corresponding values can be found in `<OUTPUT_DIRECTORY>/timings.rds`
 and can be read within an R runtime with
 `readRDS("timings.rds")`. There you will find the following values:
-| Value              | Description                                                                |
+| Value     | Description                                                                |
-|--------------------|----------------------------------------------------------------------------|
+| --------- | -------------------------------------------------------------------------- |
-| simtime            | time spent in whole simulation loop without any initialization and cleanup |
+| simtime   | time spent in whole simulation loop without any initialization and cleanup |
-| simtime\_transport | measured time in *transport* subroutine                                    |
+| chemistry | measured time in *chemistry* subroutine                                    |
-| simtime\_chemistry | measured time in *chemistry* subroutine (actual parallelized part)         |
+| diffusion | measured time in *diffusion* subroutine                                    |
-### chemistry subsetting
+### Chemistry subsetting
-If running parallel there are also measured timings which are subsets of
+| Value         | Description                                               |
-*simtime\_chemistry*.
+| ------------- | --------------------------------------------------------- |
 | simtime       | overall runtime of chemistry                              |
 | loop          | time spent in send/recv loop of master                    |
 | sequential    | sequential part of the master (e.g. shuffling field)      |
 | idle\_master  | idling time of the master waiting for workers             |
 | idle\_worker  | idling time (waiting for work from master) of the workers |
 | phreeqc\_time | accumulated times for Phreeqc calls of every worker       |
-| Value                 | Description                                               |
+#### DHT usage
 |-----------------------|-----------------------------------------------------------|
 | chemistry\_loop       | time spent in send/recv loop of master                    |
 | chemistry\_sequential | sequential part of master chemistry                       |
 | idle\_master          | idling time (waiting for any free worker) of the master   |
 | idle\_worker          | idling time (waiting for work from master) of the workers |
 | phreeqc\_time         | accumulated times for Phreeqc calls of every worker       |
 ### DHT usage {#DHT-usage}
 If running in parallel and with activated DHT, two more timings and also
 some profiling about the DHT usage are given:
 | Value           | Description                                             |
-|-----------------|---------------------------------------------------------|
+| --------------- | ------------------------------------------------------- |
-| dht\_fill\_time | time to write data to DHT                               |
+| dht\_hits       | count of data points retrieved from DHT                 |
 | dht\_get\_time  | time to retrieve data from DHT                          |
 | dh\_hits        | count of data points retrieved from DHT                 |
 | dht\_miss       | count of misses/count of data points written to DHT     |
 | dht\_evictions  | count of data points evicted by another write operation |
 | dht\_get\_time  | time to retrieve data from DHT                          |
 | dht\_fill\_time | time to write data to DHT                               |
 #### Interpolation
 If using interpolation, the following values are given:
 | Value          | Description                                                           |
 | -------------- | --------------------------------------------------------------------- |
 | interp\_w      | time spent to write to PHT                                            |
 | interp\_r      | time spent to read from DHT/PHT/Cache                                 |
 | interp\_g      | time spent to gather results from DHT                                 |
 | interp\_fc     | accumulated time spent in interpolation function call                 |
 | interp\_calls  | count of interpolations                                               |
 | interp\_cached | count of interpolation data sets, which were cached in the local map |
 ### Diffusion subsetting
 | Value     | Description                                |
 | --------- | ------------------------------------------ |
 | simtime   | overall runtime of diffusion               |