Speedbuilding LLVM/Clang in 2 minutes on ARM
Frederic Cambus May 12, 2021 [LLVM] [Compilers] [Toolchains]
This post is the AArch64 counterpart of my "Speedbuilding LLVM/Clang in 5 minutes" article.
After publishing the previous post and sharing its URL with some friends on IRC, I was asked if I wanted to try doing the same on a 160-core ARM machine. Finding out what my answer was is left as an exercise to the reader :-)
The system I'm using for this experiment is a BM.Standard.A1.160 bare-metal machine from Oracle Cloud, which has a dual-socket motherboard with two 80-core Ampere Altra CPUs, for a total of 160 cores, and 1024 GB of RAM. This is, to the best of my knowledge, the fastest AArch64 server machine available at this time.
The system is running Oracle Linux Server 8.3 with up-to-date packages and kernel.
The full result of cat /proc/cpuinfo is available here.
uname -a
Linux benchmarks 5.4.17-2102.201.3.el8uek.aarch64 #2 SMP Fri Apr 23 09:42:46 PDT 2021 aarch64 aarch64 aarch64 GNU/Linux
Let's start by installing required packages:
dnf in clang git lld
Unfortunately the CMake version available in the packages repository (3.11.4) is too old to build the main branch of the LLVM Git repository, and Ninja is not available either.
Let's bootstrap Pkgsrc to build and install them:
git clone https://github.com/NetBSD/pkgsrc.git
cd pkgsrc/bootstrap
./bootstrap --make-jobs=160 --unprivileged
===> bootstrap started: Wed May 12 12:23:34 GMT 2021
===> bootstrap ended: Wed May 12 12:26:08 GMT 2021
We then need to add ~/pkg/bin and ~/pkg/sbin to the path:
export PATH=$PATH:$HOME/pkg/bin:$HOME/pkg/sbin
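A quick sanity check after updating the path can be sketched as follows. The helper name `missing_tools` is hypothetical and not part of Pkgsrc; it relies only on the POSIX `command -v` builtin and prints every tool it cannot find. At this stage bmake should be found, while cmake and ninja may still be reported until they are built.

```shell
#!/bin/sh
# Hypothetical helper: print each given command that is NOT found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Tools this walkthrough relies on; an empty result means all were found.
missing_tools bmake cmake ninja
```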
For faster Pkgsrc builds, we can edit ~/pkg/etc/mk.conf and add:
MAKE_JOBS= 160
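The value 160 matches this machine's core count. On other hardware, the line could be derived from the number of online CPUs. This is a minimal sketch: the function name is hypothetical, and `_NPROCESSORS_ONLN` is a `getconf` extension available on Linux and the BSDs, not something Pkgsrc itself requires.

```shell
#!/bin/sh
# Hypothetical helper: print a MAKE_JOBS line sized to the online CPU count.
mk_conf_jobs_line() {
    cpus=$(getconf _NPROCESSORS_ONLN)
    printf 'MAKE_JOBS=\t%s\n' "$cpus"
}

# Print the line; it can then be added to ~/pkg/etc/mk.conf by hand.
mk_conf_jobs_line
```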
Let's build and install CMake and Ninja:
cd ~/pkgsrc/devel/cmake
bmake install package clean clean-depends
cd ~/pkgsrc/devel/ninja-build
bmake install package clean clean-depends
The compiler used for the builds is Clang 10.0.1:
clang --version
clang version 10.0.1 (Red Hat 10.0.1-1.0.1.module+el8.3.0+7827+89335dbf)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /bin
Regarding linkers, we are using GNU ld and GNU gold from binutils 2.30, and LLD 10.0.1:
GNU ld version 2.30-79.0.1.el8
GNU gold (version 2.30-79.0.1.el8) 1.15
LLD 10.0.1 (compatible with GNU linkers)
For all the following runs, I'm building from the Git repository main branch commit cf4610d27bbb5c3a744374440e2fdf77caa12040. The build directory is of course fully erased between each run.
commit cf4610d27bbb5c3a744374440e2fdf77caa12040
Author: Victor Huang <wei.huang@ibm.com>
Date: Wed May 12 10:56:54 2021 -0500
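Erasing the build directory between runs can be sketched with a small helper. The name `fresh_build_dir` is an assumption made for illustration; the post only states that the directory is erased, not how.

```shell
#!/bin/sh
# Hypothetical helper: recreate an empty build directory.
# Refuses an empty argument so a typo cannot reach `rm -rf`.
fresh_build_dir() {
    [ -n "$1" ] || { echo "usage: fresh_build_dir DIR" >&2; return 1; }
    rm -rf -- "$1" && mkdir -p -- "$1"
}

# Usage, from the llvm-project checkout:
#   fresh_build_dir build && cd build
```

Starting every run from an empty directory keeps stale object files and a cached CMake configuration from skewing the timings.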
I'm not sure what the underlying storage is, but with 1 TB of RAM there is no reason not to use a ramdisk.
mkdir /mnt/ramdisk
mount -t tmpfs -o size=32g tmpfs /mnt/ramdisk
cd /mnt/ramdisk
To get a baseline, let's do a full release build on this machine:
cd llvm-project
mkdir build
cd build
cmake -DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_PROJECTS=clang \
../llvm
time make -j160
real 7m3.226s
user 403m28.362s
sys 6m41.331s
By default, CMake generates Makefiles. As documented in the "Getting Started with the LLVM System" tutorial, most LLVM developers use Ninja.
Let's switch to generating Ninja build files, and using ninja to build:
cmake -DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_PROJECTS=clang \
-GNinja ../llvm
time ninja
[4182/4182] Linking CXX executable bin/c-index-test
real 4m20.403s
user 427m27.118s
sys 7m2.320s
By default, GNU ld is used for linking. Let's switch to using gold:
cmake -DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_PROJECTS=clang \
-DLLVM_USE_LINKER=gold \
-GNinja ../llvm
time ninja
[4182/4182] Linking CXX executable bin/c-index-test
real 4m1.062s
user 427m1.648s
sys 6m58.282s
LLD has been a viable option for some years now. Let's use it:
cmake -DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_PROJECTS=clang \
-DLLVM_USE_LINKER=lld \
-GNinja ../llvm
time ninja
[4182/4182] Linking CXX executable bin/clang-scan-deps
real 3m58.476s
user 428m3.807s
sys 7m14.418s
Using GNU gold instead of GNU ld saves about 19 seconds (4m20s down to 4m1s), and switching to LLD shaves a few more seconds off the build (3m58s).
If we want to build faster, we can make some compromises and start stripping the build by removing some components.
Let's start by disabling additional architecture support:
cmake -DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_PROJECTS=clang \
-DLLVM_USE_LINKER=lld \
-DLLVM_TARGETS_TO_BUILD="AArch64" \
-GNinja ../llvm
time ninja
[3195/3195] Linking CXX executable bin/c-index-test
real 3m10.312s
user 326m54.898s
sys 5m24.770s
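To put the savings in proportion, the drop in build steps (4182 to 3195) and in wall-clock time (3m58.476s to 3m10.312s) can be expressed as percentages. This is a minimal sketch; `pct_reduction` is a hypothetical helper name.

```shell
#!/bin/sh
export LC_ALL=C  # keep "." as the decimal separator

# Percentage reduction going from a "before" value to an "after" value.
pct_reduction() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) * 100 / before }'
}

pct_reduction 4182 3195        # build steps
pct_reduction 238.476 190.312  # wall-clock seconds, LLD run vs AArch64-only run
```

Restricting the build to the AArch64 target removes a bit under a quarter of the build steps and about a fifth of the wall-clock time.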
We can verify the resulting Clang binary only supports AArch64 targets:
bin/clang --print-targets
Registered Targets:
aarch64 - AArch64 (little endian)
aarch64_32 - AArch64 (little endian ILP32)
aarch64_be - AArch64 (big endian)
arm64 - ARM64 (little endian)
arm64_32 - ARM64 (little endian ILP32)
Let's go further and disable the static analyzer and the ARC Migration Tool:
cmake -DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_PROJECTS=clang \
-DLLVM_USE_LINKER=lld \
-DLLVM_TARGETS_TO_BUILD="AArch64" \
-DCLANG_ENABLE_STATIC_ANALYZER=OFF \
-DCLANG_ENABLE_ARCMT=OFF \
-GNinja ../llvm
time ninja
[3146/3146] Creating library symlink lib/libclang-cpp.so
real 3m6.474s
user 319m25.914s
sys 5m20.924s
Let's disable building some LLVM tools and utils:
cmake -DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_PROJECTS=clang \
-DLLVM_USE_LINKER=lld \
-DLLVM_TARGETS_TO_BUILD="AArch64" \
-DCLANG_ENABLE_STATIC_ANALYZER=OFF \
-DCLANG_ENABLE_ARCMT=OFF \
-DLLVM_BUILD_TOOLS=OFF \
-DLLVM_BUILD_UTILS=OFF \
-GNinja ../llvm
time ninja
[2879/2879] Creating library symlink lib/libclang-cpp.so
real 2m59.659s
user 298m47.482s
sys 4m57.430s
Compared to the previous build, the following binaries were not built: FileCheck, count, lli-child-target, llvm-jitlink-executor, llvm-PerfectShuffle, not, obj2yaml, yaml2obj, and yaml-bench.
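One way to produce such a list is to compare the contents of the `bin` directories of two builds. The sketch below assumes both trees are still available (for example, the previous `bin` directory was copied aside before erasing the build directory); the function name and the paths in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list files present in directory $1 but absent from $2.
only_in_first() {
    a=$(mktemp) && b=$(mktemp) || return 1
    ls -1 "$1" | sort > "$a"
    ls -1 "$2" | sort > "$b"
    comm -23 "$a" "$b"   # keep only lines unique to the first listing
    rm -f "$a" "$b"
}

# Usage, with a saved copy of the previous build's bin directory:
#   only_in_first /tmp/previous-bin bin
```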
We are reaching the end of our journey here. At this point, we are done stripping out things.
Let's disable optimizations and do a last run:
cmake -DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_PROJECTS=clang \
-DLLVM_USE_LINKER=lld \
-DLLVM_TARGETS_TO_BUILD="AArch64" \
-DCLANG_ENABLE_STATIC_ANALYZER=OFF \
-DCLANG_ENABLE_ARCMT=OFF \
-DLLVM_BUILD_TOOLS=OFF \
-DLLVM_BUILD_UTILS=OFF \
-DCMAKE_CXX_FLAGS_RELEASE="-O0" \
-GNinja ../llvm
time ninja
[2879/2879] Linking CXX executable bin/c-index-test
real 2m37.003s
user 231m53.133s
sys 4m56.675s
So this is it: this machine can build a full LLVM/Clang release build in a bit less than four minutes (3m58s with Ninja and LLD), and a stripped-down build with optimizations disabled in a little over two and a half minutes (2m37s). This is absolutely mind-blowing... The future is now!
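The wall-clock times from all the runs can be lined up against the initial make baseline with a short script. This is a sketch: the `summarize` function and the run labels are made up for illustration, and only the durations come from the runs above.

```shell
#!/bin/sh
export LC_ALL=C  # keep "." as the decimal separator

# Read "DURATION LABEL" lines and print seconds plus the speedup relative
# to the first line, which is treated as the baseline.
summarize() {
    awk '
    # Convert "XmY.ZZZs" to seconds.
    function secs(t,    p) { split(t, p, /[ms]/); return p[1] * 60 + p[2] }
    NR == 1 { base = secs($1) }
    {
        t = secs($1); label = $0; sub(/^[^ ]+ +/, "", label)
        printf "%-28s %8.1fs  %.2fx\n", label, t, base / t
    }'
}

summarize <<'EOF'
7m3.226s make, GNU ld
4m20.403s ninja, GNU ld
4m1.062s ninja, gold
3m58.476s ninja, LLD
3m10.312s AArch64 target only
3m6.474s no analyzer, no ARCMT
2m59.659s no tools, no utils
2m37.003s -O0
EOF
```

The last row comes out roughly 2.7 times faster than the make baseline. The single largest gain is the switch from make to Ninja.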