From 4090c0a78f258af93702fb85647b352d7b5ccb88 Mon Sep 17 00:00:00 2001 From: Marco De Lucia Date: Tue, 11 Jun 2024 16:50:02 +0200 Subject: [PATCH 01/22] feat: fast serialization/storage using qs package via `--qs` flag --- README.md | 97 +++++++++++++++++++++------------------ R_lib/kin_r_library.R | 54 +++++++++++++++++----- src/poet.cpp | 103 ++++++++++++++++++++++++++---------------- src/poet.hpp.in | 3 +- 4 files changed, 160 insertions(+), 97 deletions(-) diff --git a/README.md b/README.md index b17d73c12..ec75dceaf 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,5 @@ # POET @@ -87,7 +87,7 @@ follows: $ R # install R dependencies -> install.packages(c("Rcpp", "RInside")) +> install.packages(c("Rcpp", "RInside","qs")) > q(save="no") # cd into POET project root @@ -133,13 +133,14 @@ With the installation of POET, two executables are provided: - `poet` - the main executable to run simulations - `poet_init` - a preprocessor to generate input files for POET from R scripts -Preprocessed benchmarks can be found in the `share/poet` directory with an -according *runtime* setup. More on those files and how to create them later. +Preprocessed benchmarks can be found in the `share/poet` directory +with an according *runtime* setup. More on those files and how to +create them later. 
## Running -Run POET by `mpirun ./poet [OPTIONS] ` -where: +Run POET by `mpirun ./poet [OPTIONS] +` where: - **OPTIONS** - POET options (explained below) - **RUNFILE** - Runtime parameters described as R script @@ -154,8 +155,9 @@ The following parameters can be set: |-----------------------------|--------------|--------------------------------------------------------------------------------------------------------------------------| | **--work-package-size=** | _1..n_ | size of work packages (defaults to _5_) | | **-P, --progress** | | show progress bar | -| **--ai-surrogate** | | activates the AI surrogate chemistry model (defaults to _OFF_) | +| **--ai-surrogate** | | activates the AI surrogate chemistry model (defaults to _OFF_) | | **--dht** | | enabling DHT usage (defaults to _OFF_) | +| **--qs** | | store results using qs::qsave() (.qs extension) instead of default RDS (.rds) | | **--dht-strategy=** | _0-1_ | change DHT strategy. **NOT IMPLEMENTED YET** (Defaults to _0_) | | **--dht-size=** | _1-n_ | size of DHT per process involved in megabyte (defaults to _1000 MByte_) | | **--dht-snaps=** | _0-2_ | disable or enable storage of DHT snapshots | @@ -253,12 +255,13 @@ produce any valid predictions. ## Defining a model -In order to provide a model to POET, you need to setup a R script which can then -be used by `poet_init` to generate the simulation input. Which parameters are -required can be found in the -[Wiki](https://git.gfz-potsdam.de/naaice/poet/-/wikis/Initialization). We try to -keep the document up-to-date. However, if you encounter missing information or -need help, please get in touch with us via the issue tracker or E-Mail. +In order to provide a model to POET, you need to setup a R script +which can then be used by `poet_init` to generate the simulation +input. Which parameters are required can be found in the +[Wiki](https://git.gfz-potsdam.de/naaice/poet/-/wikis/Initialization). +We try to keep the document up-to-date. 
However, if you encounter +missing information or need help, please get in touch with us via the +issue tracker or E-Mail. `poet_init` can be used as follows: @@ -268,46 +271,50 @@ need help, please get in touch with us via the issue tracker or E-Mail. where: -- **output** - name of the output file (defaults to the input file name - with the extension `.rds`) -- **setwd** - set the working directory to the directory of the input file (e.g. - to allow relative paths in the input script). However, the output file - will be stored in the directory from which `poet_init` was called. +- **output** - name of the output file (defaults to the input file + name with the extension `.rds`) +- **setwd** - set the working directory to the directory of the input + file (e.g. to allow relative paths in the input script). However, + the output file will be stored in the directory from which + `poet_init` was called. ## Additional functions for the AI surrogate -The AI surrogate can be activated for any benchmark and is by default initiated -as a sequential keras model with three hidden layer of depth 48, 96, 24 with -relu activation and adam optimizer. All functions in `ai_surrogate_model.R` can -be overridden by adding custom definitions via an R file in the input script. -This is done by adding the path to this file in the input script. Simply add the -path as an element called `ai_surrogate_input_script` to the `chemistry_setup` -list. Please use the global variable `ai_surrogate_base_path` as a base path +The AI surrogate can be activated for any benchmark and is by default +initiated as a sequential keras model with three hidden layer of depth +48, 96, 24 with relu activation and adam optimizer. All functions in +`ai_surrogate_model.R` can be overridden by adding custom definitions +via an R file in the input script. This is done by adding the path to +this file in the input script. 
Simply add the path as an element +called `ai_surrogate_input_script` to the `chemistry_setup` list. +Please use the global variable `ai_surrogate_base_path` as a base path when relative filepaths are used in custom funtions. -**There is currently no default implementation to determine the validity of -predicted values.** This means, that every input script must include an R source -file with a custom function `validate_predictions(predictors, prediction)`. -Examples for custom functions can be found for the barite_200 benchmark +**There is currently no default implementation to determine the +validity of predicted values.** This means, that every input script +must include an R source file with a custom function +`validate_predictions(predictors, prediction)`. Examples for custom +functions can be found for the barite_200 benchmark -The functions can be defined as follows: +The functions can be defined as follows: -`validate_predictions(predictors, prediction)`: Returns a boolean index vector -that signals for each row in the predictions if the values are considered valid. -Can eg. be implemented as a mass balance threshold between the predictors and -the prediction. +`validate_predictions(predictors, prediction)`: Returns a boolean +index vector that signals for each row in the predictions if the +values are considered valid. Can eg. be implemented as a mass balance +threshold between the predictors and the prediction. -`initiate_model()`: Returns a keras model. Can be used to load pretrained -models. +`initiate_model()`: Returns a keras model. Can be used to load +pretrained models. `preprocess(df, backtransform = FALSE, outputs = FALSE)`: Returns the -scaled/transformed/backtransformed dataframe. The `backtransform` flag signals -if the current processing step is applied to data that's assumed to be scaled -and expects backtransformed values. The `outputs` flag signals if the current -processing step is applied to the output or tatget of the model. 
This can be -used to eg. skip these processing steps and only scale the model input. +scaled/transformed/backtransformed dataframe. The `backtransform` flag +signals if the current processing step is applied to data that's +assumed to be scaled and expects backtransformed values. The `outputs` +flag signals if the current processing step is applied to the output +or tatget of the model. This can be used to eg. skip these processing +steps and only scale the model input. -`training_step (model, predictor, target, validity)`: Trains the model after -each iteration. `validity` is the bool index vector given by -`validate_predictions` and can eg. be used to only train on values that have not -been valid predictions. \ No newline at end of file +`training_step (model, predictor, target, validity)`: Trains the model +after each iteration. `validity` is the bool index vector given by +`validate_predictions` and can eg. be used to only train on values +that have not been valid predictions. diff --git a/R_lib/kin_r_library.R b/R_lib/kin_r_library.R index cb8eaecd3..143c72df5 100644 --- a/R_lib/kin_r_library.R +++ b/R_lib/kin_r_library.R @@ -1,4 +1,4 @@ -## Time-stamp: "Last modified 2023-08-15 11:58:23 delucia" +## Time-stamp: "Last modified 2024-06-11 14:26:33 delucia" ### Copyright (C) 2018-2023 Marco De Lucia, Max Luebke (GFZ Potsdam) ### @@ -35,14 +35,18 @@ master_init <- function(setup, out_dir, init_field) { setup$iterations <- setup$maxiter setup$simulation_time <- 0 + dgts <- as.integer(ceiling(log10(setup$maxiter))) + ## string format to use in sprintf + fmt <- paste0("%0", dgts, "d") + if (is.null(setup[["store_result"]])) { setup$store_result <- TRUE } if (setup$store_result) { - init_field_out <- paste0(out_dir, "/iter_0.rds") + init_field_out <- paste0(out_dir, "/iter_", sprintf(fmt = fmt, 0), ".", setup$out_ext) init_field <- data.frame(init_field, check.names = FALSE) - saveRDS(init_field, file = init_field_out) + SaveRObj(x = init_field, path = 
init_field_out) msgm("Stored initial field in ", init_field_out) if (is.null(setup[["out_save"]])) { setup$out_save <- seq(1, setup$iterations) @@ -69,7 +73,7 @@ master_iteration_end <- function(setup, state_T, state_C) { ## comprised in setup$out_save if (setup$store_result) { if (iter %in% setup$out_save) { - nameout <- paste0(setup$out_dir, "/iter_", sprintf(fmt = fmt, iter), ".rds") + nameout <- paste0(setup$out_dir, "/iter_", sprintf(fmt = fmt, iter), ".", setup$out_ext) state_T <- data.frame(state_T, check.names = FALSE) state_C <- data.frame(state_C, check.names = FALSE) @@ -77,13 +81,14 @@ master_iteration_end <- function(setup, state_T, state_C) { prediction_time = if(exists("ai_prediction_time")) as.integer(ai_prediction_time) else NULL, training_time = if(exists("ai_training_time")) as.integer(ai_training_time) else NULL, valid_predictions = if(exists("validity_vector")) validity_vector else NULL) - saveRDS(list( - T = state_T, - C = state_C, - simtime = as.integer(setup$simulation_time), - totaltime = as.integer(totaltime), - ai_surrogate_info = ai_surrogate_info - ), file = nameout) + + SaveRObj(x = list( + T = state_T, + C = state_C, + simtime = as.integer(setup$simulation_time), + totaltime = as.integer(totaltime), + ai_surrogate_info = ai_surrogate_info + ), path = nameout) msgm("results stored in <", nameout, ">") } } @@ -172,3 +177,30 @@ GetWorkPackageSizesVector <- function(n_packages, package_size, len) { ids <- rep(1:n_packages, times = package_size, each = 1)[1:len] return(as.integer(table(ids))) } + + +## Handler to read R objs from binary files using either builtin +## readRDS() or qs::qread() based on file extension +ReadRObj <- function(path) { + ## code borrowed from tools::file_ext() + pos <- regexpr("\\.([[:alnum:]]+)$", path) + extension <- ifelse(pos > -1L, substring(path, pos + 1L), "") + + switch(extension, + rds = readRDS(path), + qs = qs::qread(path)) +} + +## Handler to store R objs to binary files using either builtin +## 
saveRDS() or qs::qsave() based on file extension +SaveRObj <- function(x, path) { + msgm("Storing to", path) + ## code borrowed from tools::file_ext() + pos <- regexpr("\\.([[:alnum:]]+)$", path) + extension <- ifelse(pos > -1L, substring(path, pos + 1L), "") + + switch(extension, + rds = saveRDS(object = x, file=path), + qs = qs::qsave(x=x, file = path)) +} + diff --git a/src/poet.cpp b/src/poet.cpp index 4a0abc2c1..06e15b16a 100644 --- a/src/poet.cpp +++ b/src/poet.cpp @@ -52,17 +52,23 @@ static int MY_RANK = 0; static std::unique_ptr global_rt_setup; -// we need some layz evaluation, as we can't define the functions before the R -// runtime is initialized +// we need some lazy evaluation, as we can't define the functions +// before the R runtime is initialized static std::optional master_init_R; static std::optional master_iteration_end_R; static std::optional store_setup_R; +static std::optional ReadRObj_R; +static std::optional SaveRObj_R; +static std::optional source_R; static void init_global_functions(RInside &R) { R.parseEval(kin_r_library); - master_init_R = Rcpp::Function("master_init"); + master_init_R = Rcpp::Function("master_init"); master_iteration_end_R = Rcpp::Function("master_iteration_end"); - store_setup_R = Rcpp::Function("StoreSetup"); + store_setup_R = Rcpp::Function("StoreSetup"); + source_R = Rcpp::Function("source"); + ReadRObj_R = Rcpp::Function("ReadRObj"); + SaveRObj_R = Rcpp::Function("SaveRObj"); } // HACK: this is a step back as the order and also the count of fields is @@ -150,8 +156,16 @@ ParseRet parseInitValues(char **argv, RuntimeParameters ¶ms) { params.use_ai_surrogate = cmdl["ai-surrogate"]; + // MDL: optional flag "qs" to switch to qsave() + params.out_ext = "rds"; + if (cmdl["qs"]) { + MSG("Enabled output"); + params.out_ext = "qs"; + } + if (MY_RANK == 0) { // MSG("Complete results storage is " + BOOL_PRINT(simparams.store_result)); + MSG("Output format/extension is " + params.out_ext); MSG("Work Package Size: " + 
std::to_string(params.work_package_size)); MSG("DHT is " + BOOL_PRINT(params.use_dht)); MSG("AI Surrogate is " + BOOL_PRINT(params.use_ai_surrogate)); @@ -207,18 +221,22 @@ ParseRet parseInitValues(char **argv, RuntimeParameters ¶ms) { // R["dht_log"] = simparams.dht_log; try { - Rcpp::Function source("source"); - Rcpp::Function readRDS("readRDS"); + // Rcpp::Function source("source"); + // Rcpp::Function ReadRObj("ReadRObj"); + // Rcpp::Function SaveRObj("SaveRObj"); - Rcpp::List init_params_ = readRDS(init_file); + Rcpp::List init_params_ = ReadRObj_R.value()(init_file); params.init_params = init_params_; - + global_rt_setup = std::make_unique(); - *global_rt_setup = source(runtime_file, Rcpp::Named("local", true)); + *global_rt_setup = source_R.value()(runtime_file, Rcpp::Named("local", true)); *global_rt_setup = global_rt_setup->operator[]("value"); + // MDL add "out_ext" for output format to R setup + (*global_rt_setup)["out_ext"] = params.out_ext; + params.timesteps = - Rcpp::as>(global_rt_setup->operator[]("timesteps")); + Rcpp::as>(global_rt_setup->operator[]("timesteps")); } catch (const std::exception &e) { ERRMSG("Error while parsing R scripts: " + std::string(e.what())); @@ -450,18 +468,21 @@ std::vector getSpeciesNames(const Field &&field, int root, int main(int argc, char *argv[]) { int world_size; - + MPI_Init(&argc, &argv); { MPI_Comm_size(MPI_COMM_WORLD, &world_size); MPI_Comm_rank(MPI_COMM_WORLD, &MY_RANK); - + RInsidePOET &R = RInsidePOET::getInstance(); - + if (MY_RANK == 0) { MSG("Running POET version " + std::string(poet_version)); } + + + init_global_functions(R); RuntimeParameters run_params; @@ -473,19 +494,19 @@ int main(int argc, char *argv[]) { case ParseRet::PARSER_OK: break; } - + InitialList init_list(R); init_list.importList(run_params.init_params, MY_RANK != 0); - + MSG("RInside initialized on process " + std::to_string(MY_RANK)); - + std::cout << std::flush; - + MPI_Barrier(MPI_COMM_WORLD); - + ChemistryModule 
chemistry(run_params.work_package_size, init_list.getChemistryInit(), MPI_COMM_WORLD); - + const ChemistryModule::SurrogateSetup surr_setup = { getSpeciesNames(init_list.getInitialGrid(), 0, MPI_COMM_WORLD), run_params.use_dht, @@ -501,56 +522,58 @@ int main(int argc, char *argv[]) { if (MY_RANK > 0) { chemistry.WorkerLoop(); } else { - init_global_functions(R); // R.parseEvalQ("mysetup <- setup"); // // if (MY_RANK == 0) { // get timestep vector from // // grid_init function ... // *global_rt_setup = - master_init_R.value()(*global_rt_setup, run_params.out_dir, - init_list.getInitialGrid().asSEXP()); + master_init_R.value()(*global_rt_setup, run_params.out_dir, + init_list.getInitialGrid().asSEXP()); // MDL: store all parameters // MSG("Calling R Function to store calling parameters"); // R.parseEvalQ("StoreSetup(setup=mysetup)"); + R["out_ext"] = run_params.out_ext; + R["out_dir"] = run_params.out_dir; + if (run_params.use_ai_surrogate) { /* Incorporate ai surrogate from R */ R.parseEvalQ(ai_surrogate_r_library); /* Use dht species for model input and output */ R["ai_surrogate_species"] = init_list.getChemistryInit().dht_species.getNames(); - R["out_dir"] = run_params.out_dir; - + const std::string ai_surrogate_input_script = init_list.getChemistryInit().ai_surrogate_input_script; - - MSG("AI: sourcing user-provided script"); - R.parseEvalQ(ai_surrogate_input_script); - + + MSG("AI: sourcing user-provided script"); + R.parseEvalQ(ai_surrogate_input_script); + MSG("AI: initialize AI model"); - R.parseEval("model <- initiate_model()"); + R.parseEval("model <- initiate_model()"); R.parseEval("gpu_info()"); - } - + } + MSG("Init done on process with rank " + std::to_string(MY_RANK)); - + // MPI_Barrier(MPI_COMM_WORLD); - + DiffusionModule diffusion(init_list.getDiffusionInit(), init_list.getInitialGrid()); - + chemistry.masterSetField(init_list.getInitialGrid()); - + Rcpp::List profiling = RunMasterLoop(R, run_params, diffusion, chemistry); - + MSG("finished 
simulation loop"); - + R["profiling"] = profiling; R["setup"] = *global_rt_setup; + R["setup$out_ext"] = run_params.out_ext; string r_vis_code; r_vis_code = - "saveRDS(profiling, file=paste0(setup$out_dir,'/timings.rds'));"; + "SaveRObj(x = profiling, path = paste0(out_dir, '/timings.', setup$out_ext));"; R.parseEval(r_vis_code); - + MSG("Done! Results are stored as R objects into <" + run_params.out_dir + - "/timings.rds>"); + "/timings." + run_params.out_ext + ">"); } } diff --git a/src/poet.hpp.in b/src/poet.hpp.in index cca89e264..660a9e074 100644 --- a/src/poet.hpp.in +++ b/src/poet.hpp.in @@ -39,7 +39,7 @@ static const inline std::string ai_surrogate_r_library = R"(@R_AI_SURROGATE_LIB@ static const inline std::string r_runtime_parameters = "mysetup"; const std::set flaglist{"ignore-result", "dht", "P", "progress", - "interp", "ai-surrogate"}; + "interp", "ai-surrogate", "qs"}; const std::set paramlist{ "work-package-size", "dht-strategy", "dht-size", "dht-snaps", "dht-file", "interp-size", "interp-min", "interp-bucket-entries"}; @@ -51,6 +51,7 @@ constexpr uint32_t CHEM_DHT_SIZE_PER_PROCESS_MB = 1.5E3; struct RuntimeParameters { std::string out_dir; std::vector timesteps; + std::string out_ext; // MDL added to accommodate qs::qsave/qread bool print_progressbar; uint32_t work_package_size; From d35a9a6d95588d571d53345d324ac94b6d3c8f07 Mon Sep 17 00:00:00 2001 From: Marco De Lucia Date: Tue, 11 Jun 2024 18:45:40 +0200 Subject: [PATCH 02/22] fixed initializer. 
Format is given by extension in the `-o` argument --- src/initializer.cpp | 32 ++++++++++++++++++++------------ src/poet.cpp | 3 +-- 2 files changed, 21 insertions(+), 14 deletions(-) diff --git a/src/initializer.cpp b/src/initializer.cpp index d9a8afadf..2c76e5420 100644 --- a/src/initializer.cpp +++ b/src/initializer.cpp @@ -12,15 +12,15 @@ int main(int argc, char **argv) { + // pre-register expected parameters before calling `parse` argh::parser cmdl({"-o", "--output"}); - cmdl.parse(argc, argv); if (cmdl[{"-h", "--help"}] || cmdl.pos_args().size() != 2) { std::cout << "Usage: " << argv[0] << " [-o, --output output_file]" - " [-s, --setwd] " - " " + << " [-s, --setwd] " + << " " << std::endl; return EXIT_SUCCESS; } @@ -28,6 +28,7 @@ int main(int argc, char **argv) { RInside R(argc, argv); R.parseEvalQ(init_r_library); + R.parseEvalQ(kin_r_library); std::string input_script = cmdl.pos_args()[1]; std::string normalized_path_script; @@ -53,11 +54,19 @@ int main(int argc, char **argv) { std::string output_file; - cmdl({"-o", "--output"}, - curr_path + "/" + - in_file_name.substr(0, in_file_name.find_last_of('.')) + ".rds") >> - output_file; + // MDL: some test to understand + // std::string output_ext = ".rds"; + // if (cmdl["q"]) output_ext = ".qs"; + // std::cout << "Output ext: " << output_ext << " ; infile substr: " + // << in_file_name.substr(0, in_file_name.find_last_of('.')) << std::endl; + // cmdl({"-o", "--output"}, + // curr_path + "/" + + // in_file_name.substr(0, in_file_name.find_last_of('.')) + ".qs") >> output_file; + + cmdl({"-o", "--output"}) >> output_file; + + if (cmdl[{"-s", "--setwd"}]) { const std::string dir_path = Rcpp::as( Rcpp::Function("dirname")(normalized_path_script)); @@ -71,13 +80,12 @@ int main(int argc, char **argv) { init.initializeFromList(setup); - // replace file extension by .rds - Rcpp::Function save("saveRDS"); - - save(init.exportList(), Rcpp::wrap(output_file)); + // use the generic handler defined in kin_r_library.R + 
Rcpp::Function SaveRObj_R("SaveRObj"); + SaveRObj_R(init.exportList(), Rcpp::wrap(output_file)); std::cout << "Saved result to " << output_file << std::endl; // parseGrid(R, grid, results); return EXIT_SUCCESS; -} \ No newline at end of file +} diff --git a/src/poet.cpp b/src/poet.cpp index 06e15b16a..229c48255 100644 --- a/src/poet.cpp +++ b/src/poet.cpp @@ -156,10 +156,9 @@ ParseRet parseInitValues(char **argv, RuntimeParameters ¶ms) { params.use_ai_surrogate = cmdl["ai-surrogate"]; - // MDL: optional flag "qs" to switch to qsave() + // MDL: optional flag "--qs" to switch to qsave() params.out_ext = "rds"; if (cmdl["qs"]) { - MSG("Enabled output"); params.out_ext = "qs"; } From eee1f0d689fc8e7ccfe9cefa8668fbede0a93a1a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Max=20L=C3=BCbke?= Date: Wed, 12 Jun 2024 09:37:36 +0200 Subject: [PATCH 03/22] refactor: Rework deferred R function evaluation fix: Unique pointer behaviour of `global_rt_setup` was messed up --- src/Base/RInsidePOET.hpp | 60 ++++++++++++------- src/Chemistry/SurrogateModels/DHT_Wrapper.cpp | 4 +- .../SurrogateModels/InterpolationModule.cpp | 4 +- src/Init/InitialList.hpp | 8 +-- src/poet.cpp | 43 +++++++------ test/testField.cpp | 4 +- test/testNamedVector.cpp | 10 ++-- 7 files changed, 76 insertions(+), 57 deletions(-) diff --git a/src/Base/RInsidePOET.hpp b/src/Base/RInsidePOET.hpp index 2897fc5a8..466c49375 100644 --- a/src/Base/RInsidePOET.hpp +++ b/src/Base/RInsidePOET.hpp @@ -1,17 +1,13 @@ -#ifndef RPOET_H_ -#define RPOET_H_ +#pragma once #include #include #include -#include #include -#include -#include +#include #include -#include -#include +namespace poet { class RInsidePOET : public RInside { public: static RInsidePOET &getInstance() { @@ -33,44 +29,64 @@ private: RInsidePOET() : RInside(){}; }; -template class RHookFunction { +/** + * @brief Deferred evaluation function + * + * The class is intended to call R functions within an existing RInside + * instance. 
The problem with "original" Rcpp::Function is that they require: + * 1. RInside instance already present, restricting the declaration of + * Rcpp::Functions in global scope + * 2. Require the function to be present. Otherwise, they will throw an + * exception. + * This class solves both problems by deferring the evaluation of the function + * until the constructor is called and evaluating whether the function is + * present or not, without throwing an exception. + * + * @tparam T Return type of the function + */ +class DEFunc { public: - RHookFunction() {} - RHookFunction(RInside &R, const std::string &f_name) { + DEFunc() {} + DEFunc(const std::string &f_name) { try { - this->func = Rcpp::Function(Rcpp::as(R.parseEval(f_name.c_str()))); + this->func = std::make_shared(f_name); } catch (const std::exception &e) { } } - RHookFunction(SEXP f) { + DEFunc(SEXP f) { try { - this->func = Rcpp::Function(f); + this->func = std::make_shared(f); } catch (const std::exception &e) { } } - template T operator()(Args... 
args) const { + if (func) { + return (*this->func)(args...); } else { throw std::exception(); } } - RHookFunction &operator=(const RHookFunction &rhs) { + DEFunc &operator=(const DEFunc &rhs) { this->func = rhs.func; return *this; } - RHookFunction(const RHookFunction &rhs) { this->func = rhs.func; } + DEFunc(const DEFunc &rhs) { this->func = rhs.func; } - bool isValid() const { return this->func.has_value(); } + bool isValid() const { return static_cast(func); } - SEXP asSEXP() const { return Rcpp::as(this->func.value()); } + SEXP asSEXP() const { + if (!func) { + return R_NilValue; + } + return Rcpp::as(*this->func.get()); + } private: - std::optional func; + std::shared_ptr func; }; -#endif // RPOET_H_ +} // namespace poet \ No newline at end of file diff --git a/src/Chemistry/SurrogateModels/DHT_Wrapper.cpp b/src/Chemistry/SurrogateModels/DHT_Wrapper.cpp index eac73b14a..a7c1827a6 100644 --- a/src/Chemistry/SurrogateModels/DHT_Wrapper.cpp +++ b/src/Chemistry/SurrogateModels/DHT_Wrapper.cpp @@ -25,6 +25,7 @@ #include "Init/InitialList.hpp" #include "Rounding.hpp" +#include #include #include #include @@ -267,7 +268,8 @@ LookupKey DHT_Wrapper::fuzzForDHT_R(const std::vector &cell, NamedVector input_nv(this->output_names, cell); - const std::vector eval_vec = hooks.dht_fuzz(input_nv); + const std::vector eval_vec = + Rcpp::as>(hooks.dht_fuzz(input_nv)); assert(eval_vec.size() == this->key_count); LookupKey vecFuzz(this->key_count + 1, {.0}); diff --git a/src/Chemistry/SurrogateModels/InterpolationModule.cpp b/src/Chemistry/SurrogateModels/InterpolationModule.cpp index 455c96729..e6015b14b 100644 --- a/src/Chemistry/SurrogateModels/InterpolationModule.cpp +++ b/src/Chemistry/SurrogateModels/InterpolationModule.cpp @@ -9,6 +9,7 @@ #include "Rounding.hpp" #include +#include #include #include @@ -94,7 +95,8 @@ void InterpolationModule::tryInterpolation(WorkPackage &work_package) { if (hooks.interp_pre.isValid()) { NamedVector nv_in(this->out_names, 
work_package.input[wp_i]); - auto rm_indices = hooks.interp_pre(nv_in, pht_result.in_values); + std::vector rm_indices = Rcpp::as>( + hooks.interp_pre(nv_in, pht_result.in_values)); pht_result.size -= rm_indices.size(); diff --git a/src/Init/InitialList.hpp b/src/Init/InitialList.hpp index 3e6ae7654..3a3c5ea23 100644 --- a/src/Init/InitialList.hpp +++ b/src/Init/InitialList.hpp @@ -215,10 +215,10 @@ private: public: struct ChemistryHookFunctions { - RHookFunction dht_fill; - RHookFunction> dht_fuzz; - RHookFunction> interp_pre; - RHookFunction interp_post; + poet::DEFunc dht_fill; + poet::DEFunc dht_fuzz; + poet::DEFunc interp_pre; + poet::DEFunc interp_post; }; struct ChemistryInit { diff --git a/src/poet.cpp b/src/poet.cpp index 229c48255..9151016a5 100644 --- a/src/poet.cpp +++ b/src/poet.cpp @@ -4,7 +4,8 @@ ** ** Copyright (C) 2018-2022 Marco De Lucia, Max Luebke (GFZ Potsdam) ** -** Copyright (C) 2023-2024 Max Luebke (University of Potsdam) +** Copyright (C) 2023-2024 Marco De Lucia (GFZ Potsdam), Max Luebke (University +** of Potsdam) ** ** POET is free software; you can redistribute it and/or modify it under the ** terms of the GNU General Public License as published by the Free Software @@ -36,7 +37,6 @@ #include #include #include -#include #include #include "Base/argh.hpp" @@ -54,21 +54,21 @@ static std::unique_ptr global_rt_setup; // we need some lazy evaluation, as we can't define the functions // before the R runtime is initialized -static std::optional master_init_R; -static std::optional master_iteration_end_R; -static std::optional store_setup_R; -static std::optional ReadRObj_R; -static std::optional SaveRObj_R; -static std::optional source_R; +static poet::DEFunc master_init_R; +static poet::DEFunc master_iteration_end_R; +static poet::DEFunc store_setup_R; +static poet::DEFunc ReadRObj_R; +static poet::DEFunc SaveRObj_R; +static poet::DEFunc source_R; static void init_global_functions(RInside &R) { R.parseEval(kin_r_library); - master_init_R = 
Rcpp::Function("master_init"); - master_iteration_end_R = Rcpp::Function("master_iteration_end"); - store_setup_R = Rcpp::Function("StoreSetup"); - source_R = Rcpp::Function("source"); - ReadRObj_R = Rcpp::Function("ReadRObj"); - SaveRObj_R = Rcpp::Function("SaveRObj"); + master_init_R = DEFunc("master_init"); + master_iteration_end_R = DEFunc("master_iteration_end"); + store_setup_R = DEFunc("StoreSetup"); + source_R = DEFunc("source"); + ReadRObj_R = DEFunc("ReadRObj"); + SaveRObj_R = DEFunc("SaveRObj"); } // HACK: this is a step back as the order and also the count of fields is @@ -224,12 +224,12 @@ ParseRet parseInitValues(char **argv, RuntimeParameters ¶ms) { // Rcpp::Function ReadRObj("ReadRObj"); // Rcpp::Function SaveRObj("SaveRObj"); - Rcpp::List init_params_ = ReadRObj_R.value()(init_file); + Rcpp::List init_params_(ReadRObj_R(init_file)); params.init_params = init_params_; - - global_rt_setup = std::make_unique(); - *global_rt_setup = source_R.value()(runtime_file, Rcpp::Named("local", true)); - *global_rt_setup = global_rt_setup->operator[]("value"); + + global_rt_setup = std::make_unique( + source_R(runtime_file, Rcpp::Named("local", true))); + *global_rt_setup = (*global_rt_setup)["value"]; // MDL add "out_ext" for output format to R setup (*global_rt_setup)["out_ext"] = params.out_ext; @@ -524,9 +524,8 @@ int main(int argc, char *argv[]) { // R.parseEvalQ("mysetup <- setup"); // // if (MY_RANK == 0) { // get timestep vector from // // grid_init function ... 
// - *global_rt_setup = - master_init_R.value()(*global_rt_setup, run_params.out_dir, - init_list.getInitialGrid().asSEXP()); + *global_rt_setup = master_init_R(*global_rt_setup, run_params.out_dir, + init_list.getInitialGrid().asSEXP()); // MDL: store all parameters // MSG("Calling R Function to store calling parameters"); // R.parseEvalQ("StoreSetup(setup=mysetup)"); diff --git a/test/testField.cpp b/test/testField.cpp index 0800b4dbd..51858ecc4 100644 --- a/test/testField.cpp +++ b/test/testField.cpp @@ -89,14 +89,14 @@ TEST_CASE("Field") { } SUBCASE("Apply R function (set Na to zero)") { - RHookFunction to_call(R, "simple_field"); + poet::DEFunc to_call("simple_field"); Field field_proc = to_call(dut.asSEXP()); CHECK_EQ(field_proc["Na"], FieldColumn(dut.GetRequestedVecSize(), 0)); } SUBCASE("Apply R function (add two fields)") { - RHookFunction to_call(R, "extended_field"); + poet::DEFunc to_call("extended_field"); Field field_proc = to_call(dut.asSEXP(), dut.asSEXP()); CHECK_EQ(field_proc["Na"], diff --git a/test/testNamedVector.cpp b/test/testNamedVector.cpp index 7b86c7496..71d575ba0 100644 --- a/test/testNamedVector.cpp +++ b/test/testNamedVector.cpp @@ -9,7 +9,7 @@ #include "testDataStructures.hpp" TEST_CASE("NamedVector") { - RInsidePOET &R = RInsidePOET::getInstance(); + poet::RInsidePOET &R = poet::RInsidePOET::getInstance(); R["sourcefile"] = RInside_source_file; R.parseEval("source(sourcefile)"); @@ -36,14 +36,14 @@ TEST_CASE("NamedVector") { } SUBCASE("Apply R function (set to zero)") { - RHookFunction> to_call(R, "simple_named_vec"); + poet::DEFunc to_call("simple_named_vec"); nv = to_call(nv); CHECK_EQ(nv[2], 0); } SUBCASE("Apply R function (second NamedVector)") { - RHookFunction> to_call(R, "extended_named_vec"); + poet::DEFunc to_call("extended_named_vec"); const std::vector names{{"C", "H", "Mg"}}; const std::vector values{{0, 1, 2}}; @@ -56,8 +56,8 @@ TEST_CASE("NamedVector") { } SUBCASE("Apply R function (check if zero)") { - RHookFunction 
to_call(R, "bool_named_vec"); + poet::DEFunc to_call("bool_named_vec"); - CHECK_FALSE(to_call(nv)); + CHECK_FALSE(Rcpp::as(to_call(nv))); } } From e7c0f6cc49b68bfc704699aa39b6dce2d7694865 Mon Sep 17 00:00:00 2001 From: Marco De Lucia Date: Thu, 13 Jun 2024 09:31:19 +0200 Subject: [PATCH 04/22] Update README.md --- README.md | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/README.md b/README.md index ec75dceaf..3c86f7821 100644 --- a/README.md +++ b/README.md @@ -12,7 +12,7 @@ Distributed Hash Table. ## Parsed code documentiation -A parsed version of POET's documentiation can be found at [Gitlab +A parsed version of POET's documentation can be found at [Gitlab pages](https://naaice.git-pages.gfz-potsdam.de/poet). ## External Libraries @@ -29,25 +29,24 @@ The following external header library is shipped with POET: ### Requirements -To compile POET you need several software to be installed: +To compile POET you need following software to be installed: - C/C++ compiler (tested with GCC) - MPI-Implementation (tested with OpenMPI and MVAPICH) -- R language and environment - CMake 3.9+ - Eigen3 3.4+ (required by `tug`) -- *optional*: `doxygen` with `dot` bindings for documentiation +- *optional*: `doxygen` with `dot` bindings for documentation +- R language and environment (distro dependent) -The following R libraries must then be installed, which will get the -needed dependencies automatically: +The following R packages (and their dependencies) must also be installed: - [Rcpp](https://cran.r-project.org/web/packages/Rcpp/index.html) - [RInside](https://cran.r-project.org/web/packages/RInside/index.html) +- [qs](https://cran.r-project.org/web/packages/qs/index.html) ### Compiling source code -The generation of makefiles is done with CMake. You should be able to generate -Makefiles by running: +POET is built with CMake. 
You can generate Makefiles by running the usual:

 ```sh
 mkdir build && cd build
@@ -58,7 +57,7 @@ This will create the directory `build` and processes the CMake files
 and generate Makefiles from it. You're now able to run `make` to start
 build process.

-If everything went well you'll find the executable at
+If everything went well you'll find the executables at
 `build/app/poet`, but it is recommended to install the POET project
 structure to a desired `CMAKE_INSTALL_PREFIX` with `make install`.

From 2e115c865b981701115ed8cb4dd362a85db74c8c Mon Sep 17 00:00:00 2001
From: Marco De Lucia
Date: Thu, 12 Sep 2024 12:36:11 +0200
Subject: [PATCH 05/22] Fixing rebase conflicts

---
 README.md    | 114 ++++++++++++++++++++++++++++++---------------------
 ext/iphreeqc |   2 +-
 src/poet.cpp |  24 ++++++-----
 3 files changed, 83 insertions(+), 57 deletions(-)

diff --git a/README.md b/README.md
index 3c86f7821..c192eda8a 100644
--- a/README.md
+++ b/README.md
@@ -1,12 +1,16 @@
 # POET
-[POET](https://doi.org/10.5281/zenodo.4757913) is a coupled reactive transport
-simulator implementing a parallel architecture and a fast, original MPI-based
-Distributed Hash Table.
+[POET](https://doi.org/10.5281/zenodo.4757913) is a coupled reactive
+transport simulator implementing a parallel architecture and a fast,
+original MPI-based Distributed Hash Table.

 ![POET's Coupling Scheme](./docs/Scheme_POET_en.svg)

@@ -17,7 +21,7 @@ pages](https://naaice.git-pages.gfz-potsdam.de/poet).

 ## External Libraries

-The following external header library is shipped with POET:
+The following external libraries are shipped with POET:

 - **argh** - https://github.com/adishavit/argh (BSD license)
 - **IPhreeqc** with patches from GFZ -
@@ -36,17 +40,32 @@ To compile POET you need following software to be installed:
 - CMake 3.9+
 - Eigen3 3.4+ (required by `tug`)
 - *optional*: `doxygen` with `dot` bindings for documentation
-- R language and environment (distro dependent)
+- R language and environment including headers or `-dev` packages
+  (distro dependent)

-The following R packages (and their dependencies) must also be installed:
+The following R packages (and their dependencies) must also be
+installed:

 - [Rcpp](https://cran.r-project.org/web/packages/Rcpp/index.html)
 - [RInside](https://cran.r-project.org/web/packages/RInside/index.html)
 - [qs](https://cran.r-project.org/web/packages/qs/index.html)

+This can be simply achieved by issuing the following commands:
+
+```sh
+# start R environment
+$ R
+
+# install R dependencies (case sensitive!)
+> install.packages(c("Rcpp", "RInside","qs"))
+> q(save="no")
+```
+
+
 ### Compiling source code

-POET is built with CMake. You can generate Makefiles by running the usual:
+POET is built with CMake. You can generate Makefiles by running the
+usual:

 ```sh
 mkdir build && cd build
@@ -58,22 +77,22 @@ and generate Makefiles from it. You're now able to run `make` to start
 build process.

 If everything went well you'll find the executables at
-`build/app/poet`, but it is recommended to install the POET project
+`build/src/poet`, but it is recommended to install the POET project
 structure to a desired `CMAKE_INSTALL_PREFIX` with `make install`.

 During the generation of Makefiles, various options can be specified
 via `cmake -D