Diving into toolchains
Frederic Cambus June 08, 2021 [Compilers] [Toolchains]
I've been wanting to learn more about compilers and toolchains in general for a while now. In June 2016, I asked about recommended readings on lexers and parsers on Twitter. However, I have to confess that I didn't go forward with reading the Dragon Book.
Instead, I got involved as a developer in the OpenBSD and NetBSD projects, and witnessing the evolution of toolchains within those systems played a big role in maintaining my interest in and fascination with the topic. In retrospect, it is apparent that the work I did on porting and packaging software for those systems really helped to put in perspective how the different parts of the toolchains interact to produce binaries.
Approximately one year ago, I asked again on Twitter whether anyone had worked on compilers and toolchains professionally and could share real-world advice on how to gain expertise in the field. I got several interesting answers and started to collect and read more resources on the topic. Some of the links I collected ended up on toolchains.net, a collection of toolchain resources which I put online in February.
But the answer that resonated the most with me was Howard's advice to learn by doing. Because I seem to be the kind of person who needs to see some concrete results in order to stay motivated, that's exactly what I decided to do.
I started by doing some cleanups in the binutils package in NetBSD's pkgsrc, which resulted in a series of commits:
2020-12-20 | ca38479 | Remove now unneeded OpenBSD specific checks in gold |
2020-12-15 | 7263eee | Add missing TEST_DEPENDS on devel/dejagnu |
2020-12-14 | b1637da | Don't use hard-coded -ldl in the gold test suite. |
2020-12-13 | 146def2 | Remove apparently unneeded patch for libiberty |
2020-12-12 | 6b347a9 | Remove CFLAGS.OpenBSD+= -Wno-bounded directive |
2020-12-11 | f53b2d8 | Remove now unneeded patch dropping hidden symbols warning |
2020-12-10 | b037380 | Enable building gold on Linux |
2020-12-03 | 75d00bc | Remove now unneeded workaround for binutils 2.24 |
2020-12-03 | adfee30 | Drop all Bitrig related patches |
Meanwhile, I also got the opportunity to update our package and apply security fixes:
2021-02-11 | 761e000 | Update to binutils 2.36.1 |
2021-01-27 | ba983e5 | Update to binutils 2.36 |
2021-01-07 | 7aef5c0 | Add upstream fixes for CVE-2020-35448 |
2020-12-06 | 99fdf39 | Update to binutils 2.35.1 |
I eventually took maintainership of binutils in pkgsrc.
Building it repeatedly with different compilers exposed different warnings, and I've also run builds through Clang's static analyzer.
All of this resulted in the opportunity to contribute to binutils itself:
2021-04-14 | 5f47741 | Remove unneeded tests for definitions of NT_NETBSDCORE values |
2021-04-12 | 0fa29e2 | Remove now unneeded #ifdef check for NT_NETBSD_PAX |
2021-03-12 | be3b926 | Add values for NetBSD .note.netbsd.ident notes (PaX) |
2021-01-26 | e37709f | Fix a double free in objcopy's memory freeing code |
Most recently, I also wrote several blog posts on the topic:
- The state of toolchains in NetBSD
- Speedbuilding LLVM/Clang in 5 minutes
- Speedbuilding LLVM/Clang in 2 minutes on ARM
- The state of toolchains in OpenBSD
- Playing with DJGPP and GCC 10 on DOS
And the journey continues. I'm following a different path from traditional compiler courses, which start with lexers and parsers: I'm doing the curriculum in reverse, so to speak, starting from binaries instead. I will be focusing on the final stages of the pipeline for now: assembling assembly code into machine code and producing binaries.
My next steps are to read the full ELF specification, followed by the Linkers and Loaders book, and then refresh my ASM skills. My favorite course at university was the computer architecture one and especially its MIPS assembly part, so I'm looking to revisit the subject but with ARM64 assembly this time.