The state of toolchains in OpenBSD
Frederic Cambus May 19, 2021 [OpenBSD] [GCC] [LLVM] [Compilers] [Toolchains]
For most of the 2010s, the OpenBSD base system has been stuck with GCC 4.2.1. It was released in July 2007, imported into the OpenBSD source tree in October 2009, and became the default compiler on the amd64, i386, hppa, sparc64, socppc and macppc platforms in OpenBSD 4.8, released in November 2010. As specified in the commit message during import, this is the last version released under the GPLv2 license.
OpenBSD was not the only operating system sticking to GCC 4.2.1 for licensing reasons: FreeBSD did the same, as did Mac OS X.
As a general rule, and this is not OpenBSD specific, being stuck with old compilers is problematic for several reasons:
- The main reason is support for newer C and C++ standards. While the C standards committee is conservative and the language evolves slowly, the pace at which new C++ standards appear has been accelerating, with a new version emerging every 3 years.
- Another major reason is support for new architectures. Not only new ISAs like ARMv8 and RISC-V, but also x86-64 microarchitecture updates.
- Old compilers do not receive bug fixes, new optimizations, or advances in diagnostics (better warnings) and security features.
The last point has been partially mitigated on OpenBSD, as several developers have worked on fixing OpenBSD-related issues and backporting fixes, as detailed in Miod's excellent "Compilers in OpenBSD" post from 2013.
Regarding support for new architectures, the astute reader will know that all OpenBSD supported platforms are self-hosted and releases must be built using the base system compiler on real hardware. No cross-compilation, no emulators. The ARMv8 architecture was announced in 2011, a few years after GCC 4.2.1 was released. By 2016, 64-bit ARMv8 devices were becoming widely available and more affordable. During the g2k16 hackathon, the Castle Inn pub had become a favorite meet-up point among OpenBSD developers, and the topic came up at one of the evening gatherings. I happened to be sitting nearby when patrick@ discussed with deraadt@ the possibility of importing LLVM to make a future OpenBSD/arm64 porting effort possible, and Theo said that there was nothing blocking it. The next day, pascal@ mentioned he already had some Makefiles to replace the LLVM build system, and when someone asked how long it would take to get them in shape for import, he said he didn't know, then smiled and said: "Let's find out! :-)". Before the end of the hackathon, he imported LLVM 3.8.1 along with Clang. Patrick's g2k16 hackathon report retraces the events and gives more technical details.
OpenBSD/arm64 became the first platform to use Clang as base system compiler and LLD as the default linker. Clang then became the default compiler on amd64 and i386 in July 2017, on armv7 in January 2018, on octeon in July 2019, on powerpc in April 2020, and finally on loongson in December 2020.
LLVM was updated regularly along the way, up to version 8.0.1, which was the last version released under the NCSA license. All later LLVM versions have been released under the Apache 2.0 license, and code under that license could not be included in OpenBSD. The project's copyright policy page details OpenBSD's stance on the license, and Mark Kettenis' objection on the llvm-dev mailing list gives more background information.
While staying with LLVM 8.0.1 would not have been an immediate problem for the OpenBSD kernel and the base system userland, which use C99, the project also includes third-party C++ codebases for parts of the graphics stack and, of course, LLVM itself. Jonathan Gray hinted at the problem on the openbsd-misc mailing list, mentioning that not updating was becoming increasingly painful. In the third-party software ecosystem, C99 still reigns supreme in C codebases, while maintainers of C++ codebases have been eager to adopt new C++ standards (and for good reasons). The recent RFC on cfe-dev about bumping toolchain requirements for LLVM to Clang 6.0 (released in March 2018) shows that LLVM is no exception. A compromise was thus inevitable, and LLVM 10.0.0 was imported into -current in August 2020.
At the time the OpenBSD 6.9 branch was created, the CVS tree contained LLVM 10.0.1, GCC 4.2.1, and GCC 3.3.6. However, it's important to understand that not all compilers are built on all platforms:
- Clang is the default compiler on amd64, arm64, armv7, i386, loongson, macppc, octeon, powerpc64, and riscv64.
- GCC 4.2.1 is still the default compiler on alpha, hppa, landisk, and sparc64.
- OpenBSD/luna88k is the only platform still using GCC 3.3.6, as m88k support was removed in GCC 3.4.
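With several compilers coexisting in the tree, it can be handy to check which one `cc` really is on a given machine. Both GCC and Clang can dump their predefined macros with `cc -dM -E -x c /dev/null`, and the following is a minimal Python sketch that classifies such a dump. The function names and the sample input are made up for illustration and are not part of any OpenBSD tool. One detail worth knowing: Clang by default also defines `__GNUC__` for compatibility, so `__clang__` has to be checked first.

```python
import re

# Matches lines such as "#define __clang_major__ 10" from `cc -dM -E` output.
_DEFINE_RE = re.compile(r"^#define\s+(\w+)\s*(.*)$")


def parse_macros(dump):
    """Turn the text of a predefined-macro dump into a {name: value} dict."""
    macros = {}
    for line in dump.splitlines():
        match = _DEFINE_RE.match(line.strip())
        if match:
            macros[match.group(1)] = match.group(2).strip()
    return macros


def identify_compiler(macros):
    """Return (family, version) for a dict of predefined macros.

    Clang is tested first: it also defines __GNUC__ for compatibility,
    so checking __GNUC__ alone would misreport Clang as GCC.
    """
    if "__clang__" in macros:
        family = "clang"
        keys = ("__clang_major__", "__clang_minor__", "__clang_patchlevel__")
    elif "__GNUC__" in macros:
        family = "gcc"
        keys = ("__GNUC__", "__GNUC_MINOR__", "__GNUC_PATCHLEVEL__")
    else:
        return ("unknown", "")
    version = ".".join(macros.get(key, "0") for key in keys)
    return (family, version)


# Illustrative input only (hypothetical), not captured from a real system.
SAMPLE_DUMP = """\
#define __GNUC__ 4
#define __GNUC_MINOR__ 2
#define __GNUC_PATCHLEVEL__ 1
#define __clang__ 1
#define __clang_major__ 10
#define __clang_minor__ 0
#define __clang_patchlevel__ 1
"""

if __name__ == "__main__":
    print(identify_compiler(parse_macros(SAMPLE_DUMP)))
```

To try it on real data, the output of the `cc -dM -E -x c /dev/null` command can be passed to `parse_macros()` instead of the sample text.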
Following the OpenBSD 6.9 release, OpenBSD-current has been updated to LLVM 11.1.0 and GCC 4.2.1 is not built anymore on amd64. GCC 8.4.0 (released in March 2020) is available in the ports collection.
Among the remaining platforms still using GCC 4.2.1 as the default compiler, only sparc64 will be able to switch in the future. LLVM has a Sparc V9 backend and work has been done in OpenBSD to make the switch possible. For all the other remaining ones, there are no alpha, hppa, sh4, or m88k backends in LLVM, and even if this changed in the future, the hardware is too slow to be able to self-host the compiler.
Regarding linkers, LLD is the default linker on amd64, arm64, armv7, i386, powerpc64, and riscv64. All other platforms still use GNU ld from binutils 2.17. Realistically, it should be possible to switch to LLD in the future on the following platforms: loongson, macppc, octeon, and sparc64.
At this point, all relevant architectures have modern and up-to-date toolchains, and we can look ahead with confidence on that front.