In order to simplify development, the ADM-PA101 has been set up to run PetaLinux, allowing the soft-cores to be added to the Slurm cluster since the card has Ethernet access. To enable this, we need to configure PetaLinux to boot via ‘tftp’ and mount its root filesystem over NFS.
By default, PetaLinux configures the Ethernet port with a random MAC address. To allow a DHCP-assigned IP address based on the MAC address, the following variables need to be set:
CONFIG_SUBSYSTEM_ETHERNET_VERSAL_CIPS_0_PSPMC_0_PSV_ETHERNET_0_MAC="00:c0:ff:ee:00:00"
CONFIG_SUBSYSTEM_ETHERNET_VERSAL_CIPS_0_PSPMC_0_PSV_ETHERNET_0_USE_DHCP=y
The hostname can be set using CONFIG_SUBSYSTEM_HOSTNAME="fpga01".
The default PetaLinux configuration will set up root and petalinux users. This configuration can be overridden as follows:
CONFIG_ADD_EXTRA_USERS="root:root;user1:initialpassword;"
CONFIG_CREATE_NEW_GROUPS="aie;"
CONFIG_ADD_USERS_TO_GROUPS="user1:audio,video,aie;"
CONFIG_ADD_USERS_TO_SUDOERS="user1"
NOTE: This sets the default root password to ‘root’ and should be changed. The petalinux-build command will raise a warning to remind you to change this.
In the above example, user1 has sudo access through the addition of CONFIG_ADD_USERS_TO_SUDOERS="user1". The example also shows how groups can be added.
NOTE: The first build of PetaLinux should be used to create the root filesystem (or use petalinux-build -c rootfs to rebuild), which should then be expanded into the NFS share directory (e.g. /tftpboot/nfsroot).
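For reference, a minimal sketch of expanding the generated root filesystem into the NFS share is shown below; the archive path images/linux/rootfs.tar.gz is the usual PetaLinux output location, but verify it for your build before copying:
sudo mkdir -p /tftpboot/nfsroot
sudo tar -xzf images/linux/rootfs.tar.gz -C /tftpboot/nfsroot   # expand the rootfs into the NFS export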
Using NFS for the root filesystem should be a trivial configuration change using petalinux-config. However, by default, the Xilinx PetaLinux configuration uses the NFS v4 protocol for the client. Unfortunately, this is incompatible with the default Debian NFS server running on our login node. The answer is to force the PetaLinux boot to use NFS v3, which can be set in the BOOTARGS using the PetaLinux config UI or in the BOOTARGS variable of the project-spec/configs/config file in the PetaLinux project directory (sw/petalinux/base):
CONFIG_SUBSYSTEM_BOOTARGS_GENERATED="console=ttyAMA0 earlycon=pl011,mmio32,0xFF000000,115200n8 clk_ignore_unused root=/dev/nfs nfsroot=c0.ff.ee.00:/tftpboot/nfsroot,tcp,v3 ip=dhcp rw"
Here we can see that the root file system is being set to an NFS mount (root=/dev/nfs) with the nfsroot option including the server and path, as well as forcing tcp and v3 of the NFS protocol.
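On the server side, the directory must of course be exported; a sketch of an /etc/exports entry on the login node is shown below (the subnet and options are purely illustrative):
/tftpboot/nfsroot 192.168.0.0/24(rw,sync,no_root_squash,no_subtree_check)
followed by sudo exportfs -ra to re-export the shares.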
Unfortunately, the CONFIG_SUBSYSTEM_BOOTARGS_GENERATED setting, as the name suggests, is generated and gets wiped during the build. Therefore, the documentation states that the boot command arguments need to be placed in the chosen section of sw/petalinux/system-user.dtsi as follows:
chosen {
stdout-path = "serial0:115200";
bootargs = "console=ttyAMA0 earlycon=pl011,mmio32,0xFF000000,115200n8 clk_ignore_unused root=/dev/nfs nfsroot=c0.ff.ee.00:/tftpboot/nfsroot,tcp,v3 ip=dhcp rw";
};
However, this breaks the build when petalinux-build generates other .dtsi files, and we are unable to proceed further.
After much experimentation, the following approach can be used to build a PetaLinux image for the uSD card that will boot over ‘tftp’ and mount the root filesystem over NFS.
1. Expand ps_base_sw-admpa101-v1_2_0.tar.gz in a working directory
2. source <petalinux_tools_directory>/settings.sh
3. source <vivado_directory>/settings64.sh
4. cd ps_base_sw-admpa101-v1_2_0/fpga/proj/base
5. vivado -mode batch -source mkxpr-base.tcl
6. vivado -mode batch -source do_build.tcl
7. cd ps_base_sw-admpa101-v1_2_0/sw/petalinux
8. petalinux-create -t project -s ../../os/simple.bsp
9. cd simple
10. petalinux-build
11. Copy the diff (config.patch) to ps_base_sw-admpa101-v1_2_0/sw/petalinux/simple
12. patch -b project-spec/configs/config config.patch, or edit project-spec/configs/config directly to make the required changes above
13. petalinux-build
14. petalinux-package --boot --u-boot (builds BOOT.BIN)
15. Copy image.ub, boot.scr and BOOT.BIN from /tftpboot to the uSD card (petalinux-build will place the files in /tftpboot by default).

Note: Ignore the following warning, as once NFS is enabled the user accounts will be configured from the NFS root file system:
WARNING: petalinux-image-minimal-1.0-r0 do_rootfs: Enabling autologin to user root. This configuration should NOT be used in production!
As mentioned above, this build assumes that there is an expanded rootfs for the ARM cores in /tftpboot/nfsroot (from a previous petalinux-build -c rootfs).
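Once the card has booted, it is worth a quick sanity check that the root filesystem really is the NFS export, for example:
cat /proc/cmdline   # should contain root=/dev/nfs and the nfsroot=... option
findmnt /           # the source for / should be the NFS server path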
RAJAPerf is a benchmark suite of loop-based computational kernels relevant to HPC.
The DongshanNezhaSTU board contains the Allwinner D1 C906, which supports the V vector extension (version 0.7.1). The chip contains 128-bit wide vector registers and supports element sizes up to 32-bit. Because of this, we compiled RAJAPerf with single-precision floating-point numbers to enable speedup from vectorization.
We also compare the performance against the StarFive JH7110 (VF2), which contains a quad-core SiFive U74, and a Fujitsu Arm A64FX system, which has SIMD instructions (NEON) as well as scalable vectors (SVE). The A64FX processor is designed for HPC applications and is completely different in nature from the RISC-V cores, which are designed for embedded systems and single-board computers (SBCs). However, a comparison against the A64FX is still useful as it can highlight important differences and potential design improvements for an HPC-class RISC-V processor in the future. Because the C906 only contains a single core, all benchmarks are run on a single core to enable direct comparison across CPUs, and only NEON with 128-bit vector width is used on the A64FX.
The RISC-V results are compiled using the XuanTie GCC 8.4, with -O3 -march=rv64gcv0p7 -ffast-math for vector and -O3 -march=rv64gc -ffast-math for scalar; for Arm we used GCC 11.2 with -O3 -ffast-math -mcpu=a64fx -march=armv8.2-a+simd+nosve for vector and -O3 -ffast-math -mcpu=a64fx -march=armv8.2-a+nosimd+nosve for scalar.
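To make the build concrete, a configuration along the following lines could be used for the D1 vector runs (a sketch only: the repository URL is the upstream LLNL one, and the single-precision switch is omitted because its exact option name is not given here):
git clone --recursive https://github.com/LLNL/RAJAPerf.git
cd RAJAPerf && mkdir build && cd build
cmake -DCMAKE_CXX_COMPILER=riscv64-unknown-linux-gnu-g++ \
      -DCMAKE_CXX_FLAGS="-O3 -march=rv64gcv0p7 -ffast-math" ..
make -j4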
In the following plots we show runtimes for the RAJAPerf kernel normalised against the kernel’s scalar runtime. For the A64FX, normalisation is against running in scalar mode on the A64FX, whereas for the Allwinner D1 and StarFive JH7110 it is normalised against running scalar on the D1. The orange and purple bars show the vectorisation performance difference on the A64FX and D1 respectively, and the green bars show a comparison of the scalar performance between the JH7110 (VF2) and the D1.
It can be observed from these plots that for most linear algebra kernels, the vectorised code on the RISC-V D1 is faster compared to its scalar counterpart.
Below we also tested LLVM 15.0, which is able to vectorize more kernels than XuanTie GCC 8.4, but generates RVV 1.0 code. We utilized the RVV-rollback tool https://github.com/RISCVtestbed/rvv-rollback to translate some of the kernels, and the speedup can be seen in the plots below.
Kernels vectorized by GCC:
Kernels not vectorized by GCC:
Kernels vectorized by GCC, but no vector instructions were executed at runtime:
Clang contains settings for vector-length-specific code (VLS - via -riscv-v-vector-bits-min=128) and vector-length-agnostic code (VLA - via -scalable-vectorization=on), which we showed in the plots above. It can be seen that Clang and GCC have different performance in terms of vectorizing and executing vector instructions for the different kernels.
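As an illustration, the two modes correspond to invocations along these lines (kernel.c is a hypothetical source file; older clang versions may additionally need -menable-experimental-extensions, as noted later):
clang -target riscv64 -march=rv64gcv -O2 -mllvm -riscv-v-vector-bits-min=128 -c kernel.c   # VLS
clang -target riscv64 -march=rv64gcv -O2 -mllvm -scalable-vectorization=on -c kernel.c     # VLA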
For more details of the above results, see the following publications:
ld (linker), as (assembler), and objdump (displays object file information).
The first toolchain is the RISC-V GNU Compiler Toolchain, which is available at https://github.com/riscv-collab/riscv-gnu-toolchain. The README provides comprehensive instructions to compile the toolchain.
Different versions of this toolchain have already been installed on the login node and can be loaded directly using module load, following the instructions here. Once loaded, the compilers and binutils can be called directly, e.g.
[username@riscv-login ~]$ module load riscv64-linux/gnu-12.2
[username@riscv-login ~]$ riscv64-unknown-linux-gnu-gcc --version
riscv64-unknown-linux-gnu-gcc (g) 12.2.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Notes:
- Binaries are prefixed riscv(32/64)-unknown-(elf/linux-gnu)- for (32/64)-bit and (newlib/glibc) respectively
- The target ISA is selected with -march=ISA-string, e.g. -march=rv64gc. For more options, see https://gcc.gnu.org/onlinedocs/gcc/RISC-V-Options.html

The toolchain also includes a simulator (e.g. QEMU), which allows us to run RISC-V binaries on the host. To build the simulator, after configuring and building the GNU toolchain, additionally run $ make build-sim SIM=qemu. To use the simulator, just run $ qemu-riscv64 (application).
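Putting the two together, a quick check of the toolchain and simulator might look like this (hello.c is a hypothetical source file):
$ riscv64-unknown-linux-gnu-gcc -O2 -march=rv64gc -o hello hello.c
$ qemu-riscv64 ./hello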
Note:
- The QEMU simulator is already included in the riscv64-linux/gnu-12.2 module on riscv-login
- To build with a specific host compiler, edit Makefile.in under build-qemu and add the following flags to configure:
--cc=[c compiler] \
--cxx=[c++ compiler]
LLVM also supports RISC-V, and at the moment provides better vector (1.0) support than gcc. To build the LLVM project, the gnu toolchain has to be built first. For reference see https://llvm.org/docs/CMake.html and https://llvm.org/docs/GettingStarted.html. Most importantly for building LLVM for RISC-V, the following flags have to be added to cmake (e.g. for 64-bit):
cmake ... -DLLVM_TARGETS_TO_BUILD="RISCV" \
-DLLVM_ENABLE_PROJECTS="clang;lld" \
-DLLVM_ENABLE_RUNTIMES="compiler-rt;libcxx;libcxxabi;libunwind" \
-DLLVM_DEFAULT_TARGET_TRIPLE="riscv64-linux-gnu" \
-DDEFAULT_SYSROOT="$(INSTALL_DIR)/sysroot"
where $(INSTALL_DIR) is the gcc toolchain install directory. However, since -DDEFAULT_SYSROOT is set, the flag -DGCC_INSTALL_PREFIX will be ignored, which is actually necessary to find libgcc. A workaround is to merge the paths.
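One possible reading of "merging the paths" is to copy the GCC runtime tree into the sysroot so that clang can locate libgcc without -DGCC_INSTALL_PREFIX; a rough sketch (the exact layout is an assumption, and the PR below avoids the problem altogether):
cp -a $INSTALL_DIR/lib/gcc $INSTALL_DIR/sysroot/lib/   # make the libgcc tree visible inside the sysroot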
This has been implemented in a PR https://github.com/riscv-collab/riscv-gnu-toolchain/pull/1166, which is currently the easiest way to build the LLVM project. To build this toolchain:
$ git clone https://github.com/cmuellner/riscv-gnu-toolchain.git
$ cd riscv-gnu-toolchain/
$ git checkout origin/llvm-new
$ ./configure --prefix=$(prefix) --with-arch=rv64gc --with-abi=lp64d --enable-llvm --enable-linux
$ make
The LLVM binaries will be built in the same location, in $prefix.
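The resulting clang defaults to the riscv64-linux-gnu target (set via -DLLVM_DEFAULT_TARGET_TRIPLE above), so it can be used directly, for example (hello.c is a hypothetical source file):
$ $prefix/bin/clang -O2 -march=rv64gc -o hello hello.c
$ qemu-riscv64 ./hello   # assuming the QEMU simulator was also built, as described above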
Notes:
- Run git submodule update --init --recursive, then cd LLVM and git fetch to pull the latest LLVM.
- To build with a specific host compiler, edit Makefile.in under build-llvm-linux and add the following flags to cmake:
-DCMAKE_C_COMPILER="[c compiler]" \
-DCMAKE_CXX_COMPILER="[c++ compiler]"
The upstream LLVM compiler (clang) by default supports the vector extension and auto-vectorization. To build gcc with vector support and auto-vectorization, the rvv-next branch needs to be checked out.
Notes:
- For clang, compile with -march=rv64gcv -menable-experimental-extensions -O2 -mllvm --riscv-v-vector-bits-min=128 or -march=rv64gcv -menable-experimental-extensions -O2 -mllvm -scalable-vectorization=on
- For gcc (rvv-next), configure with --with-arch=rv64gcv and compile with -O3
The toolchain contains the debugger riscv64-unknown-linux-gnu-gdb. To debug RISC-V executables on the host, we need to use it in conjunction with the QEMU simulator. To do so, we first connect QEMU to the application by adding the -g (port) flag, e.g.
$ qemu-riscv64 -g 1234 ./hello-world
Next we need to set up gdb to connect to the QEMU instance. In a separate terminal, create the file .gdbinit, and include the target to connect to the port. For example,
$ cat .gdbinit
target remote localhost:1234
tui enable
layout asm
break main
This will allow us to debug with the text user interface, with a breakpoint at main.
Then, we can simply run the debugger
$ riscv64-unknown-linux-gnu-gdb ./hello-world
and commence debugging. There may be additional instructions prompted on screen here, which should be followed.
A major caveat is that the first ratified RVV is version 1.0 (spec), whereas the C920 and C906 cores in Sophon SG2042 and the Allwinner D1 SoCs were designed to support RVV 0.7.1 (spec). The two specs are similar but not compatible. For more information, see 1 2.
On riscv-login, the following compilers modules (see Getting Started) support RVV 0.7.1:
riscv64-linux/gnu-8.4-rvv
riscv64-linux/gnu-9.2-rvv
riscv64-linux/gnu-10.2-rvv
The following compiler modules support RVV 1.0
riscv64-linux/gnu-10.2-rvv
riscv64-linux/llvm-15.0
riscv64-linux/llvm-16.0
The simplest way to work with RVV 0.7.1 is in assembly language. The spec provides some examples of how to do so. Tests of memcpy and strcpy speeds on Allwinner D1 hardware using RVV 0.7.1 have been recorded here.
Notes:
- Compile with -march=...v (e.g. -march=rv64gcv to include the vector extension; to specify the version, -march=rv64gcv0p7)
- riscv64-linux/gnu-8.4-rvv provides the best auto-vectorisation
- RVV 0.7.1 intrinsics are supported by the riscv64-linux/gnu-10.2-rvv compiler: https://occ-oss-prod.oss-cn-hangzhou.aliyuncs.com/resource//1663142187133/Xuantie+900+Series+RVV-0.7.1+Intrinsic+Manual.pdf

Because RVV 1.0 is the ratified version, there is significantly more support by compilers. The latest LLVM compiler and toolchain provide support for vector intrinsics (v0.10) and auto-vectorization.
Notes:
- Compile with -march=...v (e.g. -march=rv64gcv to include the vector extension; to specify the version, -march=rv64gcv1p0)
- When building the rvv-next branch toolchain, also pull the riscv-gcc-rvv-next branch in riscv-gcc
- For gcc (rvv-next), configure with --with-arch=rv64gcv and compile with -ftree-vectorize or -O3 (see 1 2)
- For clang, use -march=rv64gv -target riscv64 -O2 -mllvm --riscv-v-vector-bits-min=N (e.g. N = 128) for vector length specific code, and -march=rv64gv -target riscv64 -O2 -mllvm -scalable-vectorization=on for vector length agnostic code
- Auto-vectorization reports can be obtained with -fopt-info-vec-all for gcc or -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize for clang. (See https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html#index-fopt-info-1337 and https://llvm.org/docs/Vectorizers.html)

Examples:
We have introduced a tool to translate RVV 1.0 assembly code to RVV 0.7, which is available at https://github.com/RISCVtestbed/rvv-rollback. It is tested for the following workflow:
1. Compile the source to RVV 1.0 assembly (.s)
2. Translate the RVV 1.0 .s to RVV 0.7 .s
3. Compile the .s to .o
The tool does not support some features introduced in v1.0, such as fractional LMUL and 64-bit elements.
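In practice, the workflow above might look roughly as follows (file names are hypothetical, and the exact rvv-rollback invocation should be taken from the tool's README rather than from this sketch):
clang -target riscv64 -march=rv64gcv -O2 -S kernel.c -o kernel_rvv1p0.s         # generate RVV 1.0 assembly
<rvv-rollback> kernel_rvv1p0.s > kernel_rvv0p7.s                                # translate to RVV 0.7 (invocation assumed)
riscv64-unknown-linux-gnu-gcc -march=rv64gcv0p7 -c kernel_rvv0p7.s -o kernel.o  # assemble with the XuanTie toolchain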