Speedbuilding LLVM/Clang in 3 minutes on Power10
Frederic Cambus March 28, 2024 [LLVM] [Compilers] [Toolchains]
This post is the Power10 counterpart of my "Speedbuilding LLVM/Clang in 5 minutes" and "Speedbuilding LLVM/Clang in 2 minutes on ARM" articles.
The system I'm using for this experiment is an IBM POWER10 9043-MRX (E1050) server with 24 cores (192 hardware threads) and 2 TB of RAM.
The system is running AlmaLinux 9.3 with up-to-date packages and kernel.
The full result of cat /proc/cpuinfo is available here.
uname -a
Linux benchmarks 5.14.0-284.11.1.el9_2.ppc64le #1 SMP Tue May 9 09:51:51 UTC 2023 ppc64le ppc64le ppc64le GNU/Linux
The compiler used for the builds is Clang 16.0.6:
clang --version
clang version 16.0.6 (Red Hat 16.0.6-1.el9)
Target: ppc64le-redhat-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Regarding linkers, we are using GNU ld and GNU gold from binutils 2.35, LLD 16.0.6, and mold 2.30.0:
GNU ld version 2.35.2-42.el9_3.1
GNU gold (version 2.35.2-42.el9_3.1) 1.16
LLD 16.0.6 (compatible with GNU linkers)
mold 2.30.0 (compatible with GNU ld)
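The version lines above look like the first line of each linker's --version output. A minimal sketch for collecting them on another machine, assuming the linkers are installed under their conventional names (ld.bfd, ld.gold, ld.lld and mold); print_linker_versions is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first line of --version for each linker
# compared in this post. Assumes the conventional binary names below;
# a missing linker is reported instead of aborting the script.
print_linker_versions() {
  local linker
  for linker in ld.bfd ld.gold ld.lld mold; do
    if command -v "$linker" >/dev/null 2>&1; then
      # Linkers print a multi-line banner; keep only the first line.
      printf '%s: %s\n' "$linker" "$("$linker" --version 2>/dev/null | head -n 1)"
    else
      printf '%s: not found\n' "$linker"
    fi
  done
}

print_linker_versions
```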
For all the following runs, I'm building from commit d7975c9d93fb4a69c0bd79d7d5b3f6be77a25c73 on the main branch of the Git repository. The build directory is of course fully erased between runs.
commit d7975c9d93fb4a69c0bd79d7d5b3f6be77a25c73
Author: Alexey Bataev <a.bataev@outlook.com>
Date: Thu Mar 28 10:35:15 2024 -0400
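A minimal sketch of that reset, assuming an existing llvm-project clone; the helper names are hypothetical, and only the commit hash comes from this post:

```shell
#!/usr/bin/env bash
# Commit used for every run in this post.
LLVM_COMMIT=d7975c9d93fb4a69c0bd79d7d5b3f6be77a25c73

# Hypothetical helper: put an existing llvm-project clone on that commit.
checkout_benchmark_commit() {
  local repo=$1
  git -C "$repo" checkout --detach "$LLVM_COMMIT"
}

# Hypothetical helper: delete and recreate the build directory, so that no
# CMake cache or object files from a previous run can be reused.
fresh_build_dir() {
  local dir=$1
  rm -rf -- "$dir"
  mkdir -p -- "$dir"
}
```

Running checkout_benchmark_commit llvm-project once, then fresh_build_dir llvm-project/build before each configure step, gives every run the same clean starting point.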
To get a baseline, let's do a full release build on this machine:
cd llvm-project
mkdir build
cd build
cmake -DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_PROJECTS=clang \
../llvm
time make -j192
real 6m24.963s
user 558m22.641s
sys 6m36.038s
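The -j192 argument gives make one job per hardware thread. A sketch for repeating such measurements on another machine, assuming bash: the hypothetical build_jobs helper takes the job count from nproc (GNU coreutils) instead of hard-coding it, and the hypothetical wall_seconds helper reports only the real time, using the TIMEFORMAT variable of bash's time keyword.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one make job per hardware thread available to this
# process. nproc respects CPU affinity, so a pinned shell may see fewer.
build_jobs() {
  nproc
}

# Hypothetical helper: run a command with its output discarded and print its
# wall-clock time in seconds with three decimals. The function returns the
# command's own exit status.
wall_seconds() {
  local TIMEFORMAT='%3R'
  { time "$@" >/dev/null 2>&1; } 2>&1
}

wall_seconds sleep 0.2
```

For instance, wall_seconds make -j"$(build_jobs)" should run the same -j192 build as above on this server.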
By default, CMake generates Makefiles. As documented in the "Getting Started with the LLVM System" tutorial, most LLVM developers use Ninja.
Let's switch to generating Ninja build files and building with ninja:
cmake -DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_PROJECTS=clang \
-GNinja ../llvm
time ninja
[4996/4996] Linking CXX executable bin/c-index-test
real 4m18.966s
user 646m50.702s
sys 7m4.562s
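Ninja cuts the wall-clock time noticeably, even though user time goes up. A small sketch to put a number on it, converting bash's time format into seconds with awk; to_seconds and speedup are hypothetical helper names, fed with the real times reported above for Make and Ninja:

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a bash `time` field such as 6m24.963s
# into plain seconds.
to_seconds() {
  awk -v t="$1" 'BEGIN {
    split(t, parts, "m")      # parts[1] = minutes, parts[2] = e.g. "24.963s"
    sub(/s$/, "", parts[2])   # drop the trailing "s"
    printf "%.3f\n", parts[1] * 60 + parts[2]
  }'
}

# Hypothetical helper: how many times faster the second run is than the first.
speedup() {
  awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
    'BEGIN { printf "%.2f\n", a / b }'
}

to_seconds 6m24.963s           # Make:  384.963
to_seconds 4m18.966s           # Ninja: 258.966
speedup 6m24.963s 4m18.966s    # 1.49
```

From these two runs, that is about 126 seconds saved, a speedup of roughly 1.49x.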
By default, GNU ld is used for linking. Let's switch to using gold:
cmake -DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_PROJECTS=clang \
-DLLVM_USE_LINKER=gold \
-GNinja ../llvm
time ninja
[4996/4996] Linking CXX executable bin/c-index-test
real 4m16.043s
user 644m52.475s
sys 6m22.136s
LLD has been a viable option for some years now. Let's use it:
cmake -DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_PROJECTS=clang \
-DLLVM_USE_LINKER=lld \
-GNinja ../llvm
time ninja
[4996/4996] Linking CXX executable bin/c-index-test
real 4m6.797s
user 644m10.316s
sys 7m19.764s
Since I wrote the previous posts in the series, mold has matured and gained PowerPC support. Let's try it:
cmake -DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_PROJECTS=clang \
-DLLVM_USE_LINKER=mold \
-GNinja ../llvm
time ninja
[4996/4996] Linking CXX executable bin/c-index-test
real 4m4.206s
user 642m24.880s
sys 6m23.151s
Using GNU gold instead of GNU ld results in a slightly faster build (about 3 seconds), while LLD and mold shave roughly 12 and 15 seconds respectively off the GNU ld time. For the remainder of the article, I will stick to using LLD as the linker.
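The same arithmetic can be applied to the four linker runs. A sketch using only the real times reported above, with GNU ld (the first line) as the baseline; linker_times and compare_linkers are hypothetical helpers:

```shell
#!/usr/bin/env bash
# Wall-clock ("real") times of the Ninja builds above, one linker per line.
linker_times() {
  cat <<'EOF'
ld 4m18.966s
gold 4m16.043s
lld 4m6.797s
mold 4m4.206s
EOF
}

# Hypothetical helper: print each linker's wall time in seconds and the
# time saved relative to the first input line (GNU ld).
compare_linkers() {
  awk '{
    split($2, p, "m"); sub(/s$/, "", p[2])
    secs = p[1] * 60 + p[2]
    if (NR == 1) base = secs
    printf "%-5s %8.3f s  saved %6.3f s\n", $1, secs, base - secs
  }'
}

linker_times | compare_linkers
```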
If we want to build faster, we can make some compromises and strip down the build by removing some components.
Let's start by building only the PowerPC backend instead of all supported targets:
cmake -DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_PROJECTS=clang \
-DLLVM_USE_LINKER=lld \
-DLLVM_TARGETS_TO_BUILD="PowerPC" \
-GNinja ../llvm
time ninja
[3787/3787] Linking CXX executable bin/c-index-test
real 3m17.436s
user 476m8.062s
sys 4m57.820s
We can verify the resulting Clang binary only supports PowerPC targets:
bin/clang --print-targets
Registered Targets:
ppc32 - PowerPC 32
ppc32le - PowerPC 32 LE
ppc64 - PowerPC 64
ppc64le - PowerPC 64 LE
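When scripting this kind of check, a minimal sketch is to parse the --print-targets output and fail if any registered target name does not start with ppc. only_ppc_targets is a hypothetical helper, and it assumes the output layout shown above: a header line followed by one indented target per line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if stdin lists at least one target and
# every listed target name starts with "ppc".
only_ppc_targets() {
  awk '
    /Registered Targets:/ { next }   # skip the header line
    NF == 0 { next }                 # skip blank lines
    { seen++; if ($1 !~ /^ppc/) bad++ }
    END { if (seen > 0 && bad == 0) exit 0; exit 1 }
  '
}

# Usage against a freshly built binary:
#   bin/clang --print-targets | only_ppc_targets && echo "PowerPC only"
```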
Let's go further and disable the static analyzer and the ARC Migration Tool:
cmake -DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_PROJECTS=clang \
-DLLVM_USE_LINKER=lld \
-DLLVM_TARGETS_TO_BUILD="PowerPC" \
-DCLANG_ENABLE_STATIC_ANALYZER=OFF \
-DCLANG_ENABLE_ARCMT=OFF \
-GNinja ../llvm
time ninja
[3717/3717] Linking CXX executable bin/c-index-test
real 3m16.444s
user 462m24.103s
sys 4m48.255s
Let's stop building the LLVM tools and utils by default:
cmake -DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_PROJECTS=clang \
-DLLVM_USE_LINKER=lld \
-DLLVM_TARGETS_TO_BUILD="PowerPC" \
-DCLANG_ENABLE_STATIC_ANALYZER=OFF \
-DCLANG_ENABLE_ARCMT=OFF \
-DLLVM_BUILD_TOOLS=OFF \
-DLLVM_BUILD_UTILS=OFF \
-GNinja ../llvm
time ninja
[3324/3324] Linking CXX executable bin/c-index-test
real 3m11.458s
user 429m11.170s
sys 4m15.618s
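Across the three stripping steps, wall-clock time moves much less than CPU time. A sketch that quantifies this from the real and user times reported above, relative to the full LLD build; the step labels and the stripping_times and stripping_savings helpers are hypothetical:

```shell
#!/usr/bin/env bash
# Step label, real time, user time, as reported above.
stripping_times() {
  cat <<'EOF'
full-lld 4m6.797s 644m10.316s
ppc-only 3m17.436s 476m8.062s
no-sa-arcmt 3m16.444s 462m24.103s
no-tools-utils 3m11.458s 429m11.170s
EOF
}

# Hypothetical helper: percentage of wall time and CPU (user) time saved by
# each step, relative to the first input line.
stripping_savings() {
  awk '
    function secs(t,   p) { split(t, p, "m"); sub(/s$/, "", p[2]); return p[1] * 60 + p[2] }
    {
      real = secs($2); user = secs($3)
      if (NR == 1) { real0 = real; user0 = user }
      printf "%-15s real saved %4.1f%%, CPU saved %4.1f%%\n", $1,
             100 * (1 - real / real0), 100 * (1 - user / user0)
    }
  '
}

stripping_times | stripping_savings
```

By these numbers, the three steps cut user time by about a third but wall time by only about 22%, and the last two steps move wall time by about 6 seconds combined.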
We are reaching the end of our journey here: at this point, we are done stripping things out.
Unlike the previous builds done in 2021 on x86 and ARM, disabling optimizations by building with the "-O0" flag results in consistently slower build times on this server.
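The exact flags used for the -O0 experiment are not shown here. As a sketch under that caveat, one way to apply it to the stripped configuration is to override the CMake variables CMAKE_C_FLAGS_RELEASE and CMAKE_CXX_FLAGS_RELEASE, which hold the flags used to compile the project's C and C++ sources in a Release build. o0_cmake_args is a hypothetical helper, and keeping -DNDEBUG is an assumption meant to leave assertions disabled as in a normal Release build.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print, one per line, the CMake arguments of the
# stripped build with the Release optimization level replaced by -O0.
# Assumption: -DNDEBUG is kept so assertions stay disabled.
o0_cmake_args() {
  printf '%s\n' \
    -DCMAKE_C_COMPILER=clang \
    -DCMAKE_CXX_COMPILER=clang++ \
    -DCMAKE_BUILD_TYPE=Release \
    "-DCMAKE_C_FLAGS_RELEASE=-O0 -DNDEBUG" \
    "-DCMAKE_CXX_FLAGS_RELEASE=-O0 -DNDEBUG" \
    -DLLVM_ENABLE_PROJECTS=clang \
    -DLLVM_USE_LINKER=lld \
    -DLLVM_TARGETS_TO_BUILD=PowerPC \
    -DCLANG_ENABLE_STATIC_ANALYZER=OFF \
    -DCLANG_ENABLE_ARCMT=OFF \
    -DLLVM_BUILD_TOOLS=OFF \
    -DLLVM_BUILD_UTILS=OFF \
    -GNinja
}

# Usage, from an empty build directory (one argument per line, so the
# flags containing spaces stay intact):
#   mapfile -t args < <(o0_cmake_args)
#   cmake "${args[@]}" ../llvm && time ninja
```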
So this is it: this machine can complete a full LLVM/Clang release build in a bit more than four minutes, and a stripped-down build in just over three minutes.