Extend Readme
parent
6bac8f5a22
commit
69350034eb
89
README.org
@@ -8,7 +8,88 @@ multiplication on a single CPU core while using SYCL for both OpenMP and GPU
parallelization. Subsequently, we will record and analyze the execution times.

At this stage, the project showcases how to transfer and manipulate data on the
- GPU using the Unified Shared Memory (USM) model with explicit data movement.
- Unfortunately, I've encountered a hurdle as my current implementation with =hip=
- lacks a valid USM provider for my graphics card, the AMD Radeon RX 6700 XT,
- preventing me from achieving implicit data movement for demonstration 😔
+ GPU using +the Unified Shared Memory (USM) model with explicit data movement+ an
+ abstract view of the host and device memory using buffers and accessors. I will
+ not attempt to implement those functions using Unified Shared Memory.

For details on the implementation, on how specific functions are used, and on
the reasoning behind certain design choices, please refer to the source code
itself. Its comments explain the code's functionality and rationale.

* Compilation

Regrettably, integrating Intel's oneAPI with the AMD GPU plugin proves to be
quite challenging on Arch Linux, primarily because the plugin depends on an
older version of ROCm than the one available in the official repositories. While
I could have compiled my own ROCm/HIP version, I opted for a more convenient
solution and turned to the [[https://github.com/AdaptiveCpp/AdaptiveCpp/tree/develop][AdaptiveCpp]] compiler, which targets CPUs through
OpenMP and GPUs through CUDA and ROCm. You can find a version of AdaptiveCpp
compatible with AMD GPUs on the AUR (Arch User Repository).

If your goal is to run the benchmarks on an AMD GPU with AdaptiveCpp, I
recommend using [[https://github.com/sobc/pkgbuilds/tree/master/hipsycl-rocm-git][this]] specific PKGBUILD. Other versions that rely on ROCm might
not build correctly at the moment. I've already raised an issue with the
responsible maintainer of the PKGBUILDs to address this compatibility issue.

Currently, the CMake setup can only generate Makefiles for AdaptiveCpp. I intend
to add CMake support for Intel's oneAPI as soon as I have a working version of
that compiler.

To generate Makefiles for AdaptiveCpp, follow these steps:

#+BEGIN_SRC bash
# Create a build directory and navigate to it
mkdir build && cd build

# Adjust the path to AdaptiveCpp and your target devices according to your system.
# A backend and its device architecture are separated by a colon.
cmake .. -DAdaptiveCpp_DIR=/opt/AdaptiveCpp/ROCm/lib/cmake/AdaptiveCpp \
         -DACPP_TARGETS="omp.accelerated;hip.integrated-multipass:gfx90c"
#+END_SRC

You can find more information about =ACPP_TARGETS= and the compilation process in
the [[https://github.com/AdaptiveCpp/AdaptiveCpp/blob/develop/doc/compilation.md][AdaptiveCpp compilation documentation]].

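If you are unsure which architecture name belongs in =ACPP_TARGETS=, the
following sketch may help. It assumes that ROCm's =rocminfo= tool is installed
and that targets are written as =backend:architecture=, as the AdaptiveCpp
compilation documentation describes. =list_gfx_archs= and =hip_targets= are
hypothetical helper names and not part of this project.

#+BEGIN_SRC bash
# List the distinct gfx architecture names of the agents rocminfo reports.
# Assumption: rocminfo (shipped with ROCm) is installed and in PATH.
list_gfx_archs() {
    rocminfo | awk '$1 == "Name:" && $2 ~ /^gfx[0-9a-f]+$/ { print $2 }' | sort -u
}

# Build an ACPP_TARGETS value for the OpenMP backend plus one HIP architecture.
hip_targets() {
    echo "omp.accelerated;hip.integrated-multipass:$1"
}
#+END_SRC

Running =list_gfx_archs= prints one architecture per line, and passing one of
them to =hip_targets= yields a string that can be handed to =-DACPP_TARGETS==.
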
Once your Makefiles are generated, you can build the project using the following
command:

#+BEGIN_SRC bash
make -j"$(nproc)"
#+END_SRC

The compiled executable can be found in the =build/src= directory.

* Data

I provide six matrices in three sizes, two per size:

- =sma*.txt= are 16x16 matrices
- =med*.txt= are 2048x2048 matrices
- =big*.txt= are 8192x8192 matrices

All matrices are stored in text files under =data=.

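To get a feel for these sizes, the following back-of-the-envelope sketch prints
the element count, the approximate memory per matrix, and the approximate
operation count of a straightforward triple-loop multiplication (about 2·n³).
=ELEM_BYTES=4= assumes 4-byte elements and =matrix_cost= is a hypothetical
helper; neither is taken from the project's source.

#+BEGIN_SRC bash
# Assumption: each matrix element occupies 4 bytes.
ELEM_BYTES=4

# Print rough size and cost figures for one n x n matrix multiplication.
matrix_cost() {
    n=$1
    elems=$((n * n))
    kib=$((elems * ELEM_BYTES / 1024))
    ops=$((2 * n * n * n))   # roughly n^3 multiplications plus n^3 additions
    echo "n=$n elements=$elems KiB=$kib ops=$ops"
}

for n in 16 2048 8192; do
    matrix_cost "$n"
done
#+END_SRC

Under these assumptions, going from the medium to the big matrices multiplies
the operation count by 64.
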
*Warning*: If you're about to run the benchmark with the big matrices, please
disable the single-core CPU benchmark, unless you want to sit and wait forever.
Do this by calling cmake with =-DSEQ_BENCH=OFF= and recompiling the executable.

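A minimal sketch of that reconfiguration, assuming it is run from the repository
root and that the build directory has already been configured once, so CMake's
cache still holds the AdaptiveCpp settings. =BUILD_DIR= and
=reconfigure_without_seq= are hypothetical names, and the sketch uses
=cmake --build= instead of calling =make= directly.

#+BEGIN_SRC bash
# Assumption: the build directory is ./build unless BUILD_DIR says otherwise.
BUILD_DIR="${BUILD_DIR:-build}"

# Switch the sequential benchmark off and rebuild.
# Options already stored in the CMake cache are left untouched.
reconfigure_without_seq() {
    cmake -S . -B "$BUILD_DIR" -DSEQ_BENCH=OFF &&
    cmake --build "$BUILD_DIR" --parallel "$(nproc)"
}
#+END_SRC

Call =reconfigure_without_seq= before running the big matrices; passing
=-DSEQ_BENCH=ON= the same way should re-enable the sequential run, assuming the
option is a regular on/off switch.
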
The table below lists the checksum of the product for every pair of equally
sized matrices. Let me know if you get different checksums.

| Matrix A   | Matrix B   | Checksum     |
|------------+------------+--------------|
| =sma1.txt= | =sma1.txt= | =0xe6134d8e= |
| =sma2.txt= | =sma2.txt= | =0xf1ba0ac6= |
| =sma1.txt= | =sma2.txt= | =0xe71fdf1e= |
| =sma2.txt= | =sma1.txt= | =0x36b44d2c= |
|------------+------------+--------------|
| =med1.txt= | =med1.txt= | =0xd92eb6d6= |
| =med2.txt= | =med2.txt= | =0x9f0e1206= |
| =med1.txt= | =med2.txt= | =0x4cf45b91= |
| =med2.txt= | =med1.txt= | =0xfdeb52bf= |
|------------+------------+--------------|
| =big1.txt= | =big1.txt= | =0xde9b4c0d= |
| =big2.txt= | =big2.txt= | =0x5365fc1=  |
| =big1.txt= | =big2.txt= | =0xb185e6c1= |
| =big2.txt= | =big1.txt= | =0x59f5ffef= |

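When comparing your own results against this table, note that a checksum may be
printed with or without leading zeros (=0x5365fc1=, for example, has only seven
hex digits). The sketch below normalizes two values to eight hex digits before
comparing them. It assumes the checksums are 32-bit values;
=normalize_checksum= and =same_checksum= are hypothetical helpers, not part of
the project.

#+BEGIN_SRC bash
# Print a hex checksum in the fixed form 0x%08x (assumes 32-bit checksums).
normalize_checksum() {
    printf '0x%08x\n' "$1"
}

# Succeed if both arguments denote the same value, regardless of
# letter case or leading zeros.
same_checksum() {
    [ "$(normalize_checksum "$1")" = "$(normalize_checksum "$2")" ]
}

normalize_checksum 0x5365fc1    # prints 0x05365fc1
same_checksum 0x5365fc1 0x05365FC1 && echo "match"
#+END_SRC

This only compares how values are written; it says nothing about how the
benchmark computes the checksum.
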