#+title: Matrix multiplication with SYCL, yay

This project demonstrates SYCL syntax by means of a small, straightforward
program.

Its primary objective is to serve as a benchmark: matrix multiplication is
executed on a single CPU core and, through SYCL, in parallel via OpenMP and on
the GPU. The execution times are then recorded and analyzed.
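
The single-core baseline of such a benchmark can be sketched in plain C++ as
follows. This is a minimal sketch, not the project's actual code: the names
=multiply_sequential= and =time_ms=, the row-major storage in a =std::vector=,
and the loop order are all assumptions made for illustration.

#+BEGIN_SRC cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not taken from the project): multiplies two square
// n x n matrices stored in row-major order on a single CPU core.
template <typename T>
std::vector<T> multiply_sequential(const std::vector<T>& a,
                                   const std::vector<T>& b,
                                   std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<T> c(n * n, T{});
    // The i-k-j loop order keeps the innermost accesses to b and c contiguous.
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const T a_ik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += a_ik * b[k * n + j];
            }
        }
    }
    return c;
}

// Hypothetical helper: runs a callable once and returns the elapsed
// wall-clock time in milliseconds.
template <typename F>
double time_ms(F&& run) {
    const auto start = std::chrono::steady_clock::now();
    run();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
#+END_SRC

A call such as =time_ms([&] { c = multiply_sequential(a, b, n); })= then yields
one timing sample. Using =std::chrono::steady_clock= avoids jumps caused by
adjustments of the system clock.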

At this stage, the project showcases how to transfer and manipulate data on the
GPU using +the Unified Shared Memory (USM) model with explicit data movement+ an
abstract view of host and device memory based on buffers and accessors. I do
not intend to implement these functions using Unified Shared Memory.

For details about the implementation, the use of specific functions, and the
reasoning behind certain design choices, please refer to the source code
itself. Its comments generally explain what the code does and why.

* Prerequisites

To use the project, you'll need the following prerequisites:

** Mandatory Prerequisites

- A functional SYCL compiler. You can choose from options like Intel's oneAPI or
  AdaptiveCpp.

- The "xxhash" library.

** Optional Prerequisite

- CMake (for generating build files)

* Compilation

** Using Intel oneAPI

The code now also runs with Intel's oneAPI, and the CMake generation process has
been adapted accordingly.

#+BEGIN_SRC bash
# Source Intel's environment script, including the bundled LLVM toolchain
. /opt/intel/oneapi/setvars.sh --include-intel-llvm

# Create a build directory and navigate to it
mkdir build && cd build

# Configure the build, using the clang++ currently on PATH as the C++ compiler
CXX=$(which clang++) cmake .. -DUSE_INTELSYCL=ON \
      -DCMAKE_BUILD_TYPE="Release"

# Compile the executable
make
#+END_SRC

** Using AdaptiveCpp

Unfortunately, combining Intel's oneAPI with the AMD GPU plugin is quite
challenging on Arch Linux, mainly because the plugin depends on an older version
of ROCm than the one available in the official repositories. Rather than
compiling my own ROCm/HIP version, I opted for a more convenient solution and
turned to the [[https://github.com/AdaptiveCpp/AdaptiveCpp/tree/develop][AdaptiveCpp]] compiler, which offers CPU acceleration as well as GPU
acceleration through CUDA and ROCm. A version of AdaptiveCpp compatible with AMD
GPUs is available on the AUR (Arch User Repository).

If you want to run the benchmarks on an AMD GPU with AdaptiveCpp, I recommend
using [[https://github.com/sobc/pkgbuilds/tree/master/hipsycl-rocm-git][this]] specific PKGBUILD. Other versions that rely on ROCm might not build
correctly at the moment. I have already raised an issue with the maintainer of
those PKGBUILDs to address this compatibility issue.

As with Intel's oneAPI, the build files for AdaptiveCpp are generated with
CMake.

To generate Makefiles for AdaptiveCpp, follow these steps:

#+BEGIN_SRC bash
# Create a build directory and navigate to it
mkdir build && cd build

# Adjust the path to AdaptiveCpp and your target devices according to your system
cmake .. -DUSE_ACPP=ON \
      -DAdaptiveCpp_DIR=/opt/AdaptiveCpp/ROCm/lib/cmake/AdaptiveCpp \
      -DACPP_TARGETS="omp.accelerated;hip.integrated-multipass:gfx90c" \
      -DCMAKE_BUILD_TYPE="Release"
#+END_SRC

You can find more information about =ACPP_TARGETS= and the compilation process in
the documentation [[https://github.com/AdaptiveCpp/AdaptiveCpp/blob/develop/doc/compilation.md][here]].

Once your Makefiles are generated, you can build the project using the following
command:

#+BEGIN_SRC bash
make -j$(nproc)
#+END_SRC

The compiled executable can be found in the =build/src= directory.

* Data Information

I have provided a set of six matrices in three different sizes, two per size:

- =sma*.txt=: These matrices are of size 16x16
- =med*.txt=: These matrices are of size 2048x2048
- =big*.txt=: These matrices are of size 8192x8192

All of these matrices are stored as text files in the =data/= directory.
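
Loading such a text file could look roughly like the sketch below. It assumes
that a file holds the matrix entries as whitespace-separated numbers in
row-major order; the actual file layout is not described here, so treat that
layout, the element type, and the hypothetical names =read_square_matrix= and
=parse_square_matrix= as assumptions.

#+BEGIN_SRC cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Assumption: the input holds n * n whitespace-separated numbers in
// row-major order. The real layout of the files in data/ may differ.
template <typename T>
std::vector<T> read_square_matrix(std::istream& in, std::size_t n) {
    assert(n > 0);
    std::vector<T> values;
    values.reserve(n * n);
    T value{};
    // Stop after n * n entries or as soon as extraction fails.
    while (values.size() < n * n && in >> value) {
        values.push_back(value);
    }
    if (values.size() != n * n) {
        throw std::runtime_error("matrix input holds fewer values than expected");
    }
    return values;
}

// Convenience wrapper for text that is already in memory.
template <typename T>
std::vector<T> parse_square_matrix(const std::string& text, std::size_t n) {
    std::istringstream in(text);
    return read_square_matrix<T>(in, n);
}
#+END_SRC

For one of the small matrices, one would open the file with =std::ifstream= and
call, for example, =read_square_matrix<int>(file, 16)=.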

*Important note*:

When working with the large matrices (=big*.txt=), it is advisable to disable
the single-core CPU benchmark to avoid exceedingly long execution times. To do
so, invoke CMake with the option =-DSYCL_EX_COMPILE_SEQUENTIAL_BENCH=OFF= and
recompile the executable.
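
A rough operation count shows why the large matrices are so costly on a single
core. Assuming the sequential benchmark uses the textbook triple-loop algorithm
(an assumption, not something stated in this README), multiplying two
$n \times n$ matrices takes about $2n^3$ arithmetic operations, one
multiplication and one addition per inner-loop step:

\[
2 \cdot 2048^3 = 2 \cdot 2^{33} = 2^{34} \approx 1.7 \times 10^{10},
\qquad
2 \cdot 8192^3 = 2 \cdot 2^{39} = 2^{40} \approx 1.1 \times 10^{12}.
\]

Under that assumption, the =big*.txt= matrices require $4^3 = 64$ times as much
work as the =med*.txt= matrices.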

The table below lists the checksums of the products for all combinations of
same-sized matrices. Please feel free to reach out if you obtain different
checksums or have further questions.

| Matrix A   | Matrix B   | Checksum     |
|------------+------------+--------------|
| =sma1.txt= | =sma1.txt= | =0xe6134d8e= |
| =sma2.txt= | =sma2.txt= | =0xf1ba0ac6= |
| =sma1.txt= | =sma2.txt= | =0xe71fdf1e= |
| =sma2.txt= | =sma1.txt= | =0x36b44d2c= |
|------------+------------+--------------|
| =med1.txt= | =med1.txt= | =0xd92eb6d6= |
| =med2.txt= | =med2.txt= | =0x9f0e1206= |
| =med1.txt= | =med2.txt= | =0x4cf45b91= |
| =med2.txt= | =med1.txt= | =0xfdeb52bf= |
|------------+------------+--------------|
| =big1.txt= | =big1.txt= | =0xde9b4c0d= |
| =big2.txt= | =big2.txt= | =0x05365fc1= |
| =big1.txt= | =big2.txt= | =0xb185e6c1= |
| =big2.txt= | =big1.txt= | =0x59f5ffef= |