From ee92c75330b1458440060c9e00c9b0e70878415c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Max=20L=C3=BCbke?= <mluebke@uni-potsdam.de>
Date: Thu, 29 Aug 2024 14:05:01 +0200
Subject: [PATCH 1/3] doc: Update R package installation process for AI
 surrogate

---
 README.md | 95 +++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 78 insertions(+), 17 deletions(-)

diff --git a/README.md b/README.md
index cb6de363c..b17d73c12 100644
--- a/README.md
+++ b/README.md
@@ -201,6 +201,56 @@ resulting call would look like this:
 mpirun -n 4 ./poet --dht --dht-snaps=2 barite_het_rt.R barite_het.rds output
 ```
 
+### Example: Preparing Environment and Running with AI surrogate
+
+To run the AI surrogate, you need to install the R package `keras3`. The
+compilation process of POET remains the same as shown above.
+
+In the following code block, the installation process on the Turing Cluster is
+shown. `miniconda` is used to create a virtual environment to install
+tensorflow/keras. Please adapt the installation process to your needs.
+
+<!-- Start an R interactive session and install the required packages: -->
+
+```sh
+# First, install the required R packages
+R -e "install.packages('keras3', repos='https://cloud.r-project.org/')"
+
+# manually create a virtual environment to install keras/python using conda,
+# as the `keras::install_keras()` function is broken on the Turing Cluster
+cd poet
+
+# create a virtual environment in the .ai directory with python 3.11
+conda create -p ./.ai python=3.11
+conda activate ./.ai
+
+# install tensorflow and keras
+pip install keras "tensorflow[and-cuda]"
+
+# add conda's python path to the R environment
+# make sure to have the conda environment activated
+echo -e "RETICULATE_PYTHON=$(which python)\n" >> ~/.Renviron
+```
+
+After setting up the R environment, recompile POET and you're ready to run the
+AI surrogate.
+
+```sh
+cd <installation_dir>/bin
+
+# copy the benchmark files to the installation directory
+cp <project_root_dir>/bench/barite/{barite_50ai*,db_barite.dat,barite.pqi} .
+
+# preprocess the benchmark
+./poet_init barite_50ai.R
+
+# run POET with AI surrogate and GPU utilization
+srun --gres=gpu -N 1 -n 12 ./poet --ai-surrogate barite_50ai_rt.R barite_50ai.rds output
+```
+
+Keep in mind that the AI surrogate is currently not stable and might not
+produce valid predictions.
+
 ## Defining a model
 
 In order to provide a model to POET, you need to setup a R script which can then
@@ -224,29 +274,40 @@ where:
   to allow relative paths in the input script). However, the output file
   will be stored in the directory from which `poet_init` was called.
 
-## About the usage of MPI_Wtime()
-
-Implemented time measurement functions uses `MPI_Wtime()`. Some
-important information from the OpenMPI Man Page:
-
-For example, on platforms that support it, the clock_gettime()
-function will be used to obtain a monotonic clock value with whatever
-precision is supported on that platform (e.g., nanoseconds).
-
 ## Additional functions for the AI surrogate
 
-The AI surrogate can be activated for any benchmark and is by default initiated as a sequential keras model with three hidden layer of depth 48, 96, 24 with relu activation and adam optimizer. All functions in `ai_surrogate_model.R` can be overridden by adding custom definitions via an R file in the input script.
-This is done by adding the path to this file in the input script. Simply add the path as an element called `ai_surrogate_input_script` to the `chemistry_setup` list.
-Please use the global variable `ai_surrogate_base_path` as a base path when relative filepaths are used in custom funtions.
+The AI surrogate can be activated for any benchmark and is by default initiated
+as a sequential keras model with three hidden layers of depth 48, 96, 24 with
+relu activation and adam optimizer. All functions in `ai_surrogate_model.R` can
+be overridden by adding custom definitions via an R file in the input script.
+This is done by adding the path to this file in the input script. Simply add the
+path as an element called `ai_surrogate_input_script` to the `chemistry_setup`
+list. Please use the global variable `ai_surrogate_base_path` as a base path
+when relative filepaths are used in custom functions.
 
-**There is currently no default implementation to determine the validity of predicted values.** This means, that every input script must include an R source file with a custom function `validate_predictions(predictors, prediction)`. Examples for custom functions can be found for the barite_200 benchmark
+**There is currently no default implementation to determine the validity of
+predicted values.** This means that every input script must include an R source
+file with a custom function `validate_predictions(predictors, prediction)`.
+Examples for custom functions can be found for the barite_200 benchmark.
 
 The functions can be defined as follows:  
 
-`validate_predictions(predictors, prediction)`: Returns a boolean index vector that signals for each row in the predictions if the values are considered valid. Can eg. be implemented as a mass balance threshold between the predictors and the prediction.
+`validate_predictions(predictors, prediction)`: Returns a boolean index vector
+that signals for each row in the predictions if the values are considered valid.
+Can e.g. be implemented as a mass balance threshold between the predictors and
+the prediction.
 
-`initiate_model()`: Returns a keras model. Can be used to load pretrained models.
+`initiate_model()`: Returns a keras model. Can be used to load pretrained
+models.
 
-`preprocess(df, backtransform = FALSE, outputs = FALSE)`: Returns the scaled/transformed/backtransformed dataframe. The `backtransform` flag signals if the current processing step is applied to data that's assumed to be scaled and expects backtransformed values. The `outputs` flag signals if the current processing step is applied to the output or tatget of the model. This can be used to eg. skip these processing steps and only scale the model input.
+`preprocess(df, backtransform = FALSE, outputs = FALSE)`: Returns the
+scaled/transformed/backtransformed dataframe. The `backtransform` flag signals
+if the current processing step is applied to data that's assumed to be scaled
+and expects backtransformed values. The `outputs` flag signals if the current
+processing step is applied to the output or target of the model. This can be
+used e.g. to skip these processing steps and only scale the model input.
 
-`training_step (model, predictor, target, validity)`: Trains the model after each iteration. `validity` is the bool index vector given by `validate_predictions` and can eg. be used to only train on values that have not been valid predictions.
\ No newline at end of file
+`training_step(model, predictor, target, validity)`: Trains the model after
+each iteration. `validity` is the boolean index vector given by
+`validate_predictions` and can e.g. be used to train only on values that were
+not valid predictions.
\ No newline at end of file

From 23b4182a97e37d3e938eb521f1e46e89747bf6a6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Max=20L=C3=BCbke?= <mluebke@uni-potsdam.de>
Date: Thu, 29 Aug 2024 14:05:07 +0200
Subject: [PATCH 2/3] chore: Remove unnecessary code in FindRRuntime.cmake

---
 CMake/FindRRuntime.cmake | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/CMake/FindRRuntime.cmake b/CMake/FindRRuntime.cmake
index ade64af3a..8dd4de83c 100644
--- a/CMake/FindRRuntime.cmake
+++ b/CMake/FindRRuntime.cmake
@@ -24,8 +24,6 @@ else()
   message(FATAL_ERROR "No R runtime found!")
 endif()
 
-mark_as_advanced(R_INCLUDE_DIR R_LIBRARY R_EXE)
-
 set(R_LIBRARIES ${R_LIBRARY})
 set(R_INCLUDE_DIRS ${R_INCLUDE_DIR})
 
@@ -45,8 +43,6 @@ find_path(R_Rcpp_INCLUDE_DIR Rcpp.h
   HINTS ${RCPP_PATH}
   PATH_SUFFIXES include)
 
-mark_as_advanced(R_Rcpp_INCLUDE_DIR)
-
 list(APPEND R_INCLUDE_DIRS ${R_Rcpp_INCLUDE_DIR})
 
 # find RInside libraries and include path
@@ -72,8 +68,6 @@ find_path(R_RInside_INCLUDE_DIR RInside.h
 list(APPEND R_LIBRARIES ${R_RInside_LIBRARY})
 list(APPEND R_INCLUDE_DIRS ${R_RInside_INCLUDE_DIR})
 
-mark_as_advanced(R_RInside_LIBRARY R_RInside_INCLUDE_DIR)
-
 # putting all together into interface library
 
 add_library(RRuntime INTERFACE IMPORTED)

From 9119504dcb3b5352b02c0f504d0834c288972d0c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Max=20L=C3=BCbke?= <mluebke@uni-potsdam.de>
Date: Thu, 29 Aug 2024 14:05:36 +0200
Subject: [PATCH 3/3] chore: remove preprocessed file from project structure

---
 bench/barite/barite_50ai.rds | Bin 5146 -> 0 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)
 delete mode 100644 bench/barite/barite_50ai.rds

diff --git a/bench/barite/barite_50ai.rds b/bench/barite/barite_50ai.rds
deleted file mode 100644
index efc230f27bb201b6420fd86ef92f97cbc1e20398..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001

literal 5146
zcmcgvX;2f`n(c0<#TA}{fFPhP8WaQ-Widcr+X@CWG+S6DqAaonBai?A@*EVJU5OEp
zMImeoA*`~5z=N=3K$Z{#Bx#6Y4<W=5AS8^_Ggb3usvgsSW^Ub|r_OiI_nq%nU9IGO
z-~Dm+@R)akPSCyNR*#*QJBu~3ysH#ET)2m%b4!;rk<GBBBwgsUE^1ajZ0B$PFZ-&s
z=|{a{bx69vRuNQbUM?yK9c0iXQhKZNm!iLp6!sSPHkB&(7R;Y&`l{3!V+gFBI+OV_
z_wTc>Hl%w54M3o9mWmExeSF?M8>ox3<~ra8GAToT9RvM_u%?~j06<H`sci{$l~1R%
z+1O0?gif<ZmQd#-tp~V7sbu4he|2=ft<R6KvD>km5~7M$hBo-0-yyp7ra1z(!%}Dq
zT|%RGgZ2r-*US+iGGU0p4&9Fp*o76GPvnX_W5mvwZEo9^WM>ob&lP|(zHGY1+Irl)
z@u!aRnN|SgPa(alXV5c$3VB0WcT?O+UhkG_VH(lg<o~4$>*$$r_Q~DtWBG|_^b$c!
z`gIa?dUxozPLX3vGZJ>I8rcd>`o_ouaEy0@^$F|kqiv^_+JLdY?ACEJm~Pj<8(trG
z2W>4hU@UhB1K5gUtRS=d#cs8~C5gLczkzu7xT)qu^z7RPwpP5#Py*(SfwDd`5+4!1
zfZV)Kju0&5D4^ryq>riU<TV<VT+^FM3#fs%G__PuoQD|%AL_pj|GJliuX)ZDkf>IO
zrr&)rM9>Khf`HNYuXtR(Mu?Lx8{=C8aS2;K*&vRhteqTqrVAw75!VM#p4h3TWy&S+
z>HWw}52yen53<-^#lQ}4x9H3=(#xNnw=wwAl}`4^qVbzn4NMtt9uM|OxFRBkpTOQE
zz1apYL^@m7gl>%_-~<cJj`_iO1x;M(tRY{tu9<)Z0ZaXCYeLIwbCsnv&o|FNE<4ve
zOV(vv@5U|vbhAy*+hPQ4RJ+Dq+rL`QO?Kxdleo#L{lk3=*>Hhj>(SQZm4DmWiesNM
z=UXT0PTx$5b?uEPA>Bfpaqn%+xLY)z`7FinnyEF*fX$PN-^luj(jL&ud{l8KZuJj(
zv(dXnjTslbJ^J+RzYbjQHq1NWGi9)|K|h6&@J57{FB^|~{s&8QveGwcDY}gREP}<S
zNVx2dA6G+UYcT5}lKqTz#EU|h^5y&meS42S<;#`cD(AL*PI0m!X*1e)d;gIL@L&GS
zM*Hk?iw+)l?nqYp)&aq7)QcJI|0X^O8+MBG58S!l-_SLGpf_*iFZh0tY{>YlYdbw;
zmF^xV8I0q@fp1D+EuwbVwx4v!^VIIE(h5l1ErKdGY(=ZxqMTgJPuy)A`?^NNox7#{
z%l!rP%&lEYJg)v~j@V6Olgn1V$u0yL9(uaC0UKO@v4SqHUpURK#SP%NzZXw(lSMOK
zr6Cr}hc&+@qil?$xaB$Mu5v#kXnCKTV0NuLpgfvfcWEC8qv_f5oZsO*x~q}v{|`oD
zY)|YqPyJW__w%f=Z)$OT=do2~|ISQHX71kP;VSRhe7Dubz%zPlmcsh|=eCH3FuZeX
zbCYvRl<$N3*A>S?^g}8>^wVwB<5q=pUHbZ|nX>LTZ3wAMq)s=66ZG1`qM%0OJHP9S
z`4wzt^wL(h)Lm?3%xVo64iZvv+)2CqN$1D!5m#`NSZ4&mlYe8_W*DCk7rT=)=}d_)
z7u5&ldAC@`$l`DM8mG`9%hjIcTI|`*_=fJ}E!u+lAmbwJ3VAxpuH0Md?L<%a>?SCx
zr==reSeGiTdM=Z^<|!O>^y>#LlLIPD-d)qCo|#%5Z~zk`c0o<4Vj}RUGp5yE0#}oc
zy1O>&wd(LgYw1RNLyhz-4ldLM61<U=9P}o@So|JP?iQ_l)4z87VM))ri&87A>?~lt
zk)>mFJ3urpcps0QPE>!D&3rVaq0V_fcxWwE=UO<W&owlx*(0&?Spli0#Utj^B%6Pe
zS(Z;`M1mH_M;c)S6lF5AaC(~4j@-f3l;OA+1BlNGfIF0+T#o_Ef$j9#;YSzL6|G3Y
zV;!5SdmRDi)-~xPb)V591u?$pRajvF{?#mqQ|3mz3M2$2!^id&*%|gJk1zSZju%s*
zfwETN_u&_`oE^{#?w!=S&*7!x^IyvmF*Qq3x`*e&by|+Srv=9!alRAa|3Ped6jkW!
zGI2<hM0vf?cr@>dUM}oO^gYjko;kZ8U)O8v>O*D<<#}!mmz)wRJ~c811CJyWQicYL
zjuvPn@qkpg%Y-Jq;D}#+@DTB)Q9se=fwWBCpMb0PFo&yz_saXdKd4r3S$;egxro@v
zK1;3mA<q)HT+G~;m&v<&hoiF}cMxYP+7J%M2AeIGx}7A3Js6RQ*2gM=1^NX(>R#6a
zZxGx@9>z}~GpewH9SDOKp<2nVv~*6o!~)LqVlGZFS9rneZK8*twB%6=3WY^E;~DUy
zYWj}m!OvI&5)uPWYeuYJ1_$!4z-Q%-zY2Fx^)+-PnldA(eTj<K9yG7f-~?Jy2E+xg
z_+iMG^e8$H4&?`J<RiWVJD#|HH0|w+yZ0}z)mAp6Htq?TWfS{@NBU4xY;}>5R=jDo
zT_6gNqFjP(T?&TKN?ij{9V~Y*e^ra=-W<j|TzpBY@C6pB?r(v}bt|n<1HT;lt^xb5
zxyUOB;#{m|X&G8_p8BcP!O5T2;p(m+>g$QU1k1Y`R1^Dg{WTdlfl!^z%Qo5<hc{*L
zwk?Q8?)4SNYYdBN4@>Pb#wT5e!YDJJMu)c}%N%SvI<ubQtpgAqL+bkQXV^=XT~>ZS
zD-8~hT5&DMAE|c@-@XdV0yPVQVDWjk)9_}t{a)jPb92#2CU|l)XJt`|S#PJ2UKAam
zkIaoX#C%}86#&&l577CVGX)Puvrjptb1oi9SSXo1tZOtuh(^vM)fg@zS7r>OkRGLo
zwzKM1w0=lk_wD-E^>wl$cxr80Z`4&Vr6ud^v}4v+pNG_U)(wGq7hP%1go*(*f}s@*
z+0~peq|FEql&*&n^G3es-05emz(bI|@jrt`hrKqmLimqLFVR?r_=~_4UnGX!>qmtG
zNn2x!jO7yvsn<t+=-n;H)EN6;!6qOiRnK#0v(={?D|1z`u6~+YF{PP~nk%WxkJ6LC
z0i_=CYsp@&Pg2{r4`?$>%l>LN@@4U|3G@l#ln)xf;8`-JHgAlr%o-nKLbJ-<>x0G>
za1av5X5lzR%$uBc^*J5*xAv7XEoh$txsjrydnMeADL4~pj}20b9m#@R0~0Q_*3O4b
zg2$G%!Ac3Q-l+&3kq6>mn6>$I)bJ=$sjPh-SG7i!d16e|oUeuy&3Jl4oZP45@6B!>
zH1_wGfN{Zdzg*05;+O;&3kTehr{-U&d|Zb9lrmLXwoG0U*OuftnvybONDXx0=i|$k
zZ7SsBn&xQ!ddFcavLHaTpAq{R81X56U7}l%vLt35PzWa8h>HzD*GW7<Y8IL;qmwIE
z<SOloPm`E4i^2Srr?grgGC>0SO}?WhPR1F2+h9zN(A8rt9+p@QKx?Y)HPWKuhOHwN
zH0k*1d!fdGOXzX~i2Qt4{qpm8@-XM;keBtIgDQoz#ffs^drxJR*(53DS4$mYKCTK~
z1x&JJ#BJzA864_ljeCM#9vyeh2~sQ>1j3S_&g>-5!|sHebbjj*!YnT8Y7%|CP=xN9
z3}giNxrObl6G+3EbUx9_KWnDnzQf7u^TBeZJ6XJF_-S9X?Kl*Awg;BJ*Q*xN=zCR-
z10c$gz2=wVKK&|(uiD~Ug~a>K{mKn6dl%N&VF?u1o+oq<)8=J4fp+2Bmlu#7RLj;K
znsI5vzB{A*;PrKi(e!~|hjIW!cH<L?oU}bo;kw)$`UIaxYoZkCU8Ei~`)n||<?SVB
z3v)Sb5VVxi@gcvp%GbPqXiP91U;ZGg`K<JawVt~130})6=f;ybS*+c?*gD#yb8p<I
z+O00Ymp;yX$ggZPnKIK#QMfvlHzw@p;NG`qP2Vd8nQ(u#C9vBaVYzxivD8^*<=MA7
z#K{2F8g|y#0k^M@!<GU=h|rz~YuxB7u++VbwIyRdu{A<sowcX+h1TcW<TRD=iw<M$
zVRoaF!5VCMpf8DFSYA~RGKGz94Ife*6ldaTg6gNC#MeXKSaJztZ2X|#PJ}2EKMWr}
zPm{F|T1R?q`;@FK-lwuSdoG7#ave}^J7o^yY(??$uDr{ULwOfstKUfy6;TnN-hrf@
zF*S|Q7~cu_8@H*JM2TYF2ke*AgA+R;XL;H$tA4{G8wsF|AN^L`QEZoUG85rck4XHc
z@VVfnTZ?X`YEJsh<MuDHO)jhd^P_S_2q@tKv9LHcNk-F2Gm;l;&h04^59%CxJG~;~
zkOhf(W7pA+R#K_e@RT-nS-8!Z6THQ&2~Bo39w}-1A<<}nMde(vR$7PR7{#=>2a!eP
z6ul^+!YBMS2+<~fT;JO93X~fYQa%cv<OOkggw6Uv*)1oV8K@y-0^E=-Eo@>@jXp1E
z9YR@YePo<8Q$<Z8YMHkvddy&MU|R!l$|9E(OCDbsUjk300>Pz>6-a=^#{_4Aa~XEO
z?Gz7*pR{)KH(xG;rRHqz?P7yxoys-y4Y5)9yt*Z(uen)gy0PlVyV5Uv^Hv&@A=^)g
zw}BF$*xVv2bYy?tbD~)(z9(^M`ci|2THTT6f^5`h8D|ta_&(!lXcZ2LeGY$3V4<@<
zE=@h0+Ebu#ofGz&a79Cfr7myG=KI7US0*B_QQ?5TaZSw?ji6Rk#ADT2%;7uDgimWs
zsG;d%C6&b@k;%*Q+_(jtuwb+|9zxn6I6{TXj~js#ymZONxPQW3bH**ZFiG7JyF0*o
zq%MR+h*H}dIJYctNL8=D$jZ=OB0$_yo(b6fvjh!PAVSTTYH3I?ZuV2C4{)>j1yBq=
z+5=t!bM#=O(F1z`D`6AN=&3Jle^;w)iJV&@NKkON5_ZTMc`Qc{9Vkr&!nW`afm48Q
zk5O7G5>?a0$z#gysghrcXUcXSh61jL1_fSO>0jDOrDHn%J#=V@pP_+Qo0z~z&k;o%
zMVT#!z}rv}g{${#`a%6Rs+w$cOu#b%ZsW83TDpQ*OGESNFW2m%1(I4pPWs4Ck)i3c
zhnIBp_xiQ=%ch`x7?QDTv8>ZcK57~~U9~g#rLsA?C7X$8+8Uh2ufDAX<yL6IrD<sh
z@mtUYQ<HTl^(Yx*5`o&#A;}rM`JqHUB>I_!UG*Ag`Osl{CY3>3^fTTVHuwP`+I%{(
zy`L8o<!&qe>CxaKVB(tW>16W`wP!kx8OM9%pfV8{5hVPil0Ns9@`Z!-b7~!W;L!OF
zXHZ!sOoDCpcM~3B#O}jCs~*_Q=`$B5p2=Y_A(-dp^!2&uYfbb%&;etoHv=IR9ZU}d
zR`k(8_T~ql9KXl{oyROm@n%Dc73w#?5Q{nKNT}=gL>0a#rh0YLVVmH<Qqhzsl)l0w
zs?U`+JLg-hE`y4r;EN}9YsF-o#92#oMNIVFGat~{M@ES}Wr5=I5Uq79I^TmC)k10r
z4gX8`!wC7tYlmn{uoxwa=4Kqf`d+Oop9haWb@B5Qn^De`fE0a>x0lHu`0n2T%<E<d