diff --git a/CMakeLists.txt b/CMakeLists.txt
index 1c17fb843..f2f4434bf 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -15,15 +15,10 @@ list(APPEND CMAKE_MODULE_PATH "${POET_SOURCE_DIR}/CMake")
 
 get_poet_version()
 
-# set(GCC_CXX_FLAGS "-D STRICT_R_HEADERS")
 add_definitions(${GCC_CXX_FLAGS})
-
 find_package(MPI REQUIRED)
 find_package(RRuntime REQUIRED)
 
-# add_compile_options(-fsanitize=address -fno-omit-frame-pointer)
-# add_link_options(-fsanitize=address)
-
 add_subdirectory(src)
 add_subdirectory(bench)
 
diff --git a/README.md b/README.md
index a273f7f10..e6535ed5f 100644
--- a/README.md
+++ b/README.md
@@ -20,10 +20,10 @@ pages](https://naaice.git-pages.gfz-potsdam.de/poet).
 The following external header library is shipped with POET:
 
 - **argh** - https://github.com/adishavit/argh (BSD license)
-- **PhreeqcRM** with patches from GFZ -
-  https://www.usgs.gov/software/phreeqc-version-3 -
-  https://git.gfz-potsdam.de/mluebke/phreeqcrm-gfz
-- **tug** - https://git.gfz-potsdam.de/sec34/tug
+- **IPhreeqc** with patches from GFZ -
+  https://github.com/usgs-coupled/iphreeqc -
+  https://git.gfz-potsdam.de/naaice/iphreeqc
+- **tug** - https://git.gfz-potsdam.de/naaice/tug
 
 ## Installation
 
@@ -35,6 +35,7 @@ To compile POET you need several software to be installed:
 - MPI-Implementation (tested with OpenMPI and MVAPICH)
 - R language and environment
 - CMake 3.9+
+- Eigen3 3.4+ (required by `tug`)
 - *optional*: `doxygen` with `dot` bindings for documentiation
 
 The following R libraries must then be installed, which will get the
@@ -107,58 +108,50 @@ The correspondending directory tree would look like this:
 
 ```sh
 poet
 ├── bin
-│   └── poet
-├── R_lib
-│   └── kin_r_library.R
+│   ├── poet
+│   └── poet_init
 └── share
     └── poet
-        └── bench
-            ├── barite
-            │   ├── barite_interp_eval.R
-            │   ├── barite.pqi
-            │   ├── barite.R
-            │   └── db_barite.dat
-            ├── dolo
-            │   ├── dolo_diffu_inner_large.R
-            │   ├── dolo_diffu_inner.R
-            │   ├── dolo_inner.pqi
-            │   ├── dolo_interp_long.R
-            │   └── phreeqc_kin.dat
-            └── surfex
-                ├── ExBase.pqi
-                ├── ex.R
-                ├── SMILE_2021_11_01_TH.dat
-                ├── SurfExBase.pqi
-                └── surfex.R
+        ├── barite
+        │   ├── barite_200.rds
+        │   ├── barite_200_rt.R
+        │   ├── barite_het.rds
+        │   └── barite_het_rt.R
+        ├── dolo
+        │   ├── dolo_inner_large.rds
+        │   ├── dolo_inner_large_rt.R
+        │   ├── dolo_interp.rds
+        │   └── dolo_interp_rt.R
+        └── surfex
+            ├── PoetEGU_surfex_500.rds
+            └── PoetEGU_surfex_500_rt.R
 ```
 
-The R libraries will be loaded at runtime and the paths are hardcoded
-absolute paths inside `poet.cpp`. So, if you consider to move
-`bin/poet` either change paths of the R source files and recompile
-POET or also move `R_lib/*` relative to the binary.
+With the installation of POET, two executables are provided:
+ - `poet` - the main executable to run simulations
+ - `poet_init` - a preprocessor to generate input files for POET from R scripts
 
-The benchmarks consist of input scripts, which are provided as .R files.
-Additionally, Phreeqc scripts and their corresponding databases are required,
-stored as .pqi and .dat files, respectively.
+Preprocessed benchmarks can be found in the `share/poet` directory, each with a
+corresponding *runtime* setup. More on these files and how to create them below.
 
 ## Running
 
-Run POET by `mpirun ./poet `
+Run POET by `mpirun ./poet [OPTIONS] <RUNFILE> <SIMFILE> <OUTPUT_DIRECTORY>`
 where:
-- **OPTIONS** - runtime parameters (explained below)
-- **SIMFILE** - simulation described as R script (e.g.
-  `/share/poet/bench/dolo/dolo_interp_long.R`)
+- **OPTIONS** - POET options (explained below)
+- **RUNFILE** - Runtime parameters, described as an R script
+- **SIMFILE** - Simulation input prepared by `poet_init`
 - **OUTPUT_DIRECTORY** - path, where all output of POET should be stored
 
-### Runtime options
+### POET options
 
 The following parameters can be set:
 
 | Option | Value | Description |
 |-----------------------------|--------------|--------------------------------------------------------------------------|
 | **--work-package-size=** | _1..n_ | size of work packages (defaults to _5_) |
-| **--ignore-result** | | disables store of simulation resuls |
+| **-P, --progress** | | show progress bar |
 | **--dht** | | enabling DHT usage (defaults to _OFF_) |
 | **--dht-strategy=** | _0-1_ | change DHT strategy. **NOT IMPLEMENTED YET** (Defaults to _0_) |
 | **--dht-size=** | _1-n_ | size of DHT per process involved in megabyte (defaults to _1000 MByte_) |
@@ -180,14 +173,16 @@ Following values can be set:
 
 ### Example: Running from scratch
 
-We will continue the above example and start a simulation with
-`dolo_diffu_inner.R`. As transport a simple fixed-coefficient diffusion is used.
-It's a 2D, 100x100 grid, simulating 10 time steps. To start the simulation with
-4 processes `cd` into your previously installed POET-dir
-`/bin` and run:
+We will continue the above example and start a simulation with *barite_het*,
+whose simulation files can be found in
+`/share/poet/barite/barite_het*`. A heterogeneous diffusion is used as
+transport. It's a small 2D grid of 2x5 cells, simulating 50 time steps with a
+time step size of 100 seconds. To start the simulation with 4 processes, `cd`
+into your previously installed POET-dir `/bin` and run:
 
 ```sh
-mpirun -n 4 ./poet ../share/poet/bench/dolo/dolo_diffu_inner.R/ output
+cp ../share/poet/barite/barite_het* .
+mpirun -n 4 ./poet barite_het_rt.R barite_het.rds output
 ```
 
 After a finished simulation all data generated by POET will be found
@@ -200,9 +195,32 @@ produced. This is done by appending the `--dht-snaps=` option. The resulting
 call would look like this:
 ```sh
-mpirun -n 4 ./poet --dht --dht-snaps=2 ../share/poet/bench/dolo/dolo_diffu_inner.R/ output
+mpirun -n 4 ./poet --dht --dht-snaps=2 barite_het_rt.R barite_het.rds output
 ```
 
+## Defining a model
+
+In order to provide a model to POET, you need to set up an R script which can
+then be used by `poet_init` to generate the simulation input. The required
+parameters are documented in the
+[Wiki](https://git.gfz-potsdam.de/naaice/poet/-/wikis/Initialization). We try to
+keep that document up-to-date; however, if you encounter missing information or
+need help, please get in touch with us via the issue tracker or e-mail.
+
+`poet_init` can be used as follows:
+
+```sh
+./poet_init <input_script> [-o, --output output_file] [-s, --setwd]
+```
+
+where:
+
+- **output** - name of the output file (defaults to the input file name
+  with the extension `.rds`)
+- **setwd** - set the working directory to the directory of the input file (e.g.
+  to allow relative paths in the input script). However, the output file
+  will be stored in the directory from which `poet_init` was called.
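+
+A typical workflow could then look like this (a minimal sketch; `my_model.R` and
+`my_model_rt.R` are hypothetical names for a model script and its runtime
+setup):
+
+```sh
+# preprocess the model script; by default this produces my_model.rds
+./poet_init my_model.R
+# run the simulation with the corresponding runtime setup
+mpirun -n 4 ./poet my_model_rt.R my_model.rds output
+```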
+
 ## About the usage of MPI_Wtime()
 
 Implemented time measurement functions uses `MPI_Wtime()`.
 Some
diff --git a/docs/CMakeLists.txt b/docs/CMakeLists.txt
index 4781c2b87..fa9038767 100644
--- a/docs/CMakeLists.txt
+++ b/docs/CMakeLists.txt
@@ -14,7 +14,6 @@ if(DOXYGEN_FOUND)
   doxygen_add_docs(doxygen
     ${PROJECT_SOURCE_DIR}/src
     ${PROJECT_SOURCE_DIR}/README.md
-    ${PROJECT_SOURCE_DIR}/docs/Input_Scripts.md
     ${PROJECT_SOURCE_DIR}/docs/Output.md
     COMMENT "Generate html pages")
 endif()
diff --git a/docs/Input_Scripts.md b/docs/Input_Scripts.md
deleted file mode 100644
index 1c03de749..000000000
--- a/docs/Input_Scripts.md
+++ /dev/null
@@ -1,86 +0,0 @@
-# Input Scripts
-
-In the following the expected schemes of the input scripts is described.
-Therefore, each section of the input script gets its own chapter. All sections
-should return a `list` as results, which are concatenated to one setup list at
-the end of the file. All values must have the same name in order to get parsed
-by POET.
-
-## Grid initialization
-
-| name | type | description |
-|----------------|----------------|----------------------------------------------------------------------|
-| `n_cells` | Numeric Vector | Number of cells in each direction |
-| `s_cells` | Numeric Vector | Spatial resolution of grid in each direction |
-| `type` | String | Type of initialization, can be set to *scratch*, *phreeqc* or *rds* |
-
-## Diffusion parameters
-
-| name | type | description |
-|----------------|----------------------|-------------------------------------------|
-| `init` | Named Numeric Vector | Initial state for each diffused species |
-| `vecinj` | Data Frame | Defining all boundary conditions row wise |
-| `vecinj_inner` | List of Triples | Inner boundaries |
-| `vecinj_index` | List of 4 elements | Ghost nodes boundary conditions |
-| `alpha` | Named Numeric Vector | Constant alpha for each species |
-
-### Remark on boundary conditions
-
-Each boundary condition should be defined in `vecinj` as a data frame, where one
-row holds one boundary condition.
-
-To define inner (constant) boundary conditions, use a list of triples in
-`vecinj_inner`, where each triples is defined by $(i,x,y)$. $i$ is defining the
-boundary condition, referencing to the row in `vecinj`. $x$ and $y$ coordinates
-then defining the position inside the grid.
-
-Ghost nodes are set by `vecinj_index` which is a list containing boundaries for
-each celestial direction (**important**: named by `N, E, S, W`). Each direction
-is a numeric vector, also representing a row index of the `vecinj` data frame
-for each ghost node, starting at the left-most and upper cell respectively. By
-setting the boundary condition to $0$, the ghost node is set as closed boundary.
-
-#### Example
-
-Suppose you have a `vecinj` data frame defining 2 boundary conditions and a grid
-consisting of $10 \times 10$ grid cells. Grid cell $(1,1)$ should be set to the
-first boundary condition and $(5,6)$ to the second. Also, all boundary
-conditions for the ghost nodes should be closed. Except the southern boundary,
-which should be set to the first boundary condition injection. The following
-setup describes how to setup your initial script, where `n` and `m` are the
-grids cell count for each direction ($n = m = 10$):
-
-```R
-vecinj_inner <- list (
-  l1 = c(1, 1, 1),
-  l2 = c(2, 5, 6)
-)
-
-vecinj_index <- list(
-  "N" = rep(0, n),
-  "E" = rep(0, m),
-  "S" = rep(1, n),
-  "W" = rep(0, m)
-)
-```
-
-## Chemistry parameters
-
-| name | type | description |
-|----------------|--------------|------------------------------------------------------------------------------------|
-| `database` | String | Path to the Phreeqc database |
-| `input_script` | String | Path the the Phreeqc input script |
-| `dht_species` | Named Vector | Indicates significant digits to use for each species for DHT rounding. |
-| `pht_species` | Named Vector | Indicates significant digits to use for each species for Interpolation rounding. |
-
-## Final setup
-
-| name | type | description |
-|----------------|----------------|-------------------------------------------------------------|
-| `grid` | List | Grid parameter list |
-| `diffusion` | List | Diffusion parameter list |
-| `chemistry` | List | Chemistry parameter list |
-| `iterations` | Numeric Value | Count of iterations |
-| `timesteps` | Numeric Vector | $\Delta t$ to use for specific iteration |
-| `store_result` | Boolean | Indicates if results should be stored |
-| `out_save` | Numeric Vector | *optional:* At which iteration the states should be stored |
diff --git a/docs/Output.md b/docs/Output.md
index 4044c84a6..82644d347 100644
--- a/docs/Output.md
+++ b/docs/Output.md
@@ -35,34 +35,50 @@ corresponding values can be found in `/timings.rds` and
 possible to read out within a R runtime with `readRDS("timings.rds")`. There
 you will find the following values:
 
-| Value | Description |
-|--------------------|-----------------------------------------------------------------------------|
-| simtime | time spent in whole simulation loop without any initialization and cleanup |
-| simtime\_transport | measured time in *transport* subroutine |
-| simtime\_chemistry | measured time in *chemistry* subroutine (actual parallelized part) |
+| Value | Description |
+| --------- | -------------------------------------------------------------------------- |
+| simtime | time spent in whole simulation loop without any initialization and cleanup |
+| chemistry | measured time in *chemistry* subroutine |
+| diffusion | measured time in *diffusion* subroutine |
 
-### chemistry subsetting
+### Chemistry subsetting
 
-If running parallel there are also measured timings which are subsets of
-*simtime\_chemistry*.
+| Value | Description |
+| ------------- | --------------------------------------------------------- |
+| simtime | overall runtime of chemistry |
+| loop | time spent in send/recv loop of master |
+| sequential | sequential part of the master (e.g. shuffling field) |
+| idle\_master | idling time of the master waiting for workers |
+| idle\_worker | idling time (waiting for work from master) of the workers |
+| phreeqc\_time | accumulated times for Phreeqc calls of every worker |
 
-| Value | Description |
-|-----------------------|-----------------------------------------------------------|
-| chemistry\_loop | time spent in send/recv loop of master |
-| chemistry\_sequential | sequential part of master chemistry |
-| idle\_master | idling time (waiting for any free worker) of the master |
-| idle\_worker | idling time (waiting for work from master) of the workers |
-| phreeqc\_time | accumulated times for Phreeqc calls of every worker |
-
-### DHT usage {#DHT-usage}
+#### DHT usage
 
 If running in parallel and with activated DHT, two more timings and also some
 profiling about the DHT usage are given:
 
 | Value | Description |
-|-----------------|---------------------------------------------------------|
-| dht\_fill\_time | time to write data to DHT |
-| dht\_get\_time | time to retreive data from DHT |
-| dh\_hits | count of data points retrieved from DHT |
-| dht\_miss | count of misses/count of data points written to DHT |
+| --------------- | ------------------------------------------------------- |
+| dht\_hits | count of data points retrieved from DHT |
 | dht\_evictions | count of data points evicted by another write operation |
+| dht\_get\_time | time to retrieve data from DHT |
+| dht\_fill\_time | time to write data to DHT |
+
+#### Interpolation
+
+If using interpolation, the following values are given:
+
+| Value | Description |
+| -------------- | ----------------------------------------------------------------------- |
+| interp\_w | time spent to write to PHT |
+| interp\_r | time spent to read from DHT/PHT/Cache |
+| interp\_g | time spent to gather results from DHT |
+| interp\_fc | accumulated time spent in interpolation function calls |
+| interp\_calls | count of interpolations |
+| interp\_cached | count of interpolation data sets, which were cached in the local map |
+
+### Diffusion subsetting
+
+| Value | Description |
+| --------- | ------------------------------------------ |
+| simtime | overall runtime of diffusion |
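+
+### Example: Reading the timings
+
+As a minimal sketch (assuming the simulation output was written to a directory
+named `output` and that `Rscript` is available), the recorded values can be
+inspected from the command line like this:
+
+```sh
+# print the structure of all timing values recorded in timings.rds
+Rscript -e 'str(readRDS("output/timings.rds"))'
+```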