Tuesday 6 May 2014

Building Go - First Steps

As I have mentioned in my previous posts, my goal is to add a new architecture (PowerPC) to the Go toolchain. If successful, you will be able to both compile and install the toolchain on a PowerPC machine. Also, you will be able to cross-compile PowerPC (ppc64) binaries using x86/arm machines. This is due in part to the multi-platform capabilities inherited from Plan 9.

Logically, each of the four fundamental tools in the toolchain (Assembler, Linker, C Compiler, Go Compiler) has some mixture of platform-neutral and platform-specific code. This is evident from the fact that there are platform-neutral directories (ld, cc, gc) as well as platform-specific ones (e.g. 5a, 5l, 5c, 5g). It is clear that I need to create ppc64 equivalents (e.g. 9a, 9l, 9c, 9g).

At a very superficial level, ppc64 shares some similarities with arm. They are both considered RISC architectures. There are still some very large differences, including the endianness and number of registers, but the arm code seems like a better place to start than x86/amd64, which differ from ppc64 far more. I copied the 5x directories to 9x as a starting point to get something compiling.

Running an initial build of the toolchain (using the "src/all.bash" script) falls flat on its face.
$ ./all.bash
# Building C bootstrap tool.
cmd/dist
go tool dist: unknown architecture: ppc64
This is not entirely unexpected. I did not declare the ppc64 architecture anywhere in the code and I have not mapped it to the chosen number '9'. The question is where to put the declaration and mapping. A good place to start is with the "cmd/dist" entry shown in the build log from the "all.bash" script.

The toolchain uses a special C program called 'dist' instead of make to compile the tools, which was quite unexpected. The build log doesn't show the command-line parameters, and the dist command doesn't have any debugging symbols, which makes it rather difficult to figure out what goes wrong. Let's fix that by making these changes to the 'make.bash' file:
make.bash:
${CC:-gcc} $mflag -g -O0 -Wall -Werror -o cmd/dist/dist -Icmd/dist "$DEFGOROOT" cmd/dist/*.c
...echo ./cmd/dist/dist bootstrap $buildall $GO_DISTFLAGS -v

./cmd/dist/dist bootstrap $buildall $GO_DISTFLAGS -v # builds go_bootstrap

After some careful debugging I discovered that the declarations should go in the cmd/dist/build.c and cmd/dist/unix.c files:
build.c:
// The known architecture letters.
static char *gochars = "56689";
// The known architectures.
static char *okgoarch[] = {
// same order as gochars
"arm",
"amd64",
"amd64p32",
"386",
"ppc64",
};
unix.c:
else if(contains(u.machine, "arm"))
gohostarch = "arm";
else if(contains(u.machine, "ppc64"))
gohostarch = "ppc64";
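
The two excerpts above amount to a pair of lookups: a table that pairs each architecture name with its toolchain letter, and a substring match on the machine name reported by the host. Here is a minimal sketch of that idea in Go rather than C. The helper names archChar and hostArch are my own for illustration and do not exist in cmd/dist, and only the two host checks shown in the excerpt are reproduced.

```go
package main

import (
	"fmt"
	"strings"
)

// Parallel tables in the style of the build.c excerpt:
// goChars[i] is the toolchain letter for okGoArch[i].
const goChars = "56689"

var okGoArch = []string{"arm", "amd64", "amd64p32", "386", "ppc64"}

// archChar is a hypothetical helper that returns the toolchain letter
// for an architecture name, or an error if the name is not in the table.
func archChar(arch string) (byte, error) {
	for i, name := range okGoArch {
		if name == arch {
			return goChars[i], nil
		}
	}
	return 0, fmt.Errorf("unknown architecture: %s", arch)
}

// hostArch is a hypothetical helper that guesses the architecture from a
// uname-style machine string using substring matches, as in the excerpt.
func hostArch(machine string) (string, error) {
	switch {
	case strings.Contains(machine, "arm"):
		return "arm", nil
	case strings.Contains(machine, "ppc64"):
		return "ppc64", nil
	}
	return "", fmt.Errorf("unknown architecture: %s", machine)
}

func main() {
	for _, machine := range []string{"armv7l", "ppc64", "sparc"} {
		arch, err := hostArch(machine)
		if err != nil {
			fmt.Println(err)
			continue
		}
		c, err := archChar(arch)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s -> %s -> tools %ca, %cl, %cc, %cg\n", machine, arch, c, c, c, c)
	}
}
```

Keeping the letters and names in two parallel tables means a new port touches both in the same position, which is why the "same order as gochars" comment matters.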


Now when I run the all.bash script it gets a lot further along but ultimately fails due to missing dependencies. I will go over the problem and my solution in the next post.

Sunday 4 May 2014

Origin of the Go toolchain

I will devote some time to presenting what I understand to be the origins of the toolchain. I think it helps to explain why the design looks like it does. Please take it with a grain of salt because I may not have the complete and accurate picture. I ask any experts out there to please point out any misunderstandings or inaccuracies.

The Go toolchain has a history that extends back to Plan 9, an operating system developed at Bell Labs starting about 30 years ago and intended to be the successor to Unix. You need only check the Wikipedia article to see the resemblance between the Go mascot and the Plan 9 mascot. I could devote an entire post to Plan 9 alone. I encourage anyone with an understanding of Unix/Linux to give it a try in a VirtualBox image.

To make a long story short, Plan 9 is a distributed operating system that allows you to treat a collection of computers as one logical system. A system can be composed of nodes with different processor architectures (e.g. arm, powerpc, x86, mips). In order to support such a system there needs to be a platform-independent programming language, in this case ANSI C. There also needs to be a way to easily compile and distribute architecture-specific binaries.

It is for this reason that Plan 9 has a variety of C compilers that you can invoke. Each one compiles C code into an architecture-specific object file. The object files are linked together with the correct linker for the architecture to produce the final binary. This can be repeated for each architecture so that your program can run anywhere in the system. Plan 9's sophisticated union mounts are configured so that the '/bin' directory usually contains the binaries for the current node's architecture, making it transparent to most users.

Plan 9's compilers, linkers and assemblers use a two-character naming convention. One example is the "8c" tool, which is the C compiler for x86. Similarly, "5l" is the linker for ARM. I'm not certain in all cases how the first character is chosen. I found a table with the historical architectures in the C compiler documentation. When you want to refer to the tools in an architecture-independent way there are two-letter names: cc (C compiler), ld (linker).

Getting back to the Go toolchain, it retains the naming convention with some additions. They have chosen the number '6' to represent the relatively new AMD64 (64-bit x86) architecture. Also, they have added the letter 'g' to represent the Go compiler (e.g. 8g for the x86 Go compiler, gc for the general name). The Go tool hides most of these internals behind its high-level commands, but you can still find them in the pkg/tool/archname directory of your installation.
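
To make the naming convention concrete, here is a small sketch that builds the two-character tool names from an architecture character and a tool letter. The function toolName is a name I made up for illustration and is not part of the toolchain. The letters and architectures are only the ones mentioned in this post.

```go
package main

import "fmt"

// toolName is a hypothetical helper that joins an architecture character
// (e.g. '8') with a tool letter (e.g. 'c') to form a name such as "8c".
func toolName(archChar, kind byte) string {
	return string([]byte{archChar, kind})
}

func main() {
	// Tool letters described in the post.
	kinds := []struct {
		letter byte
		desc   string
	}{
		{'a', "assembler"},
		{'c', "C compiler"},
		{'g', "Go compiler"},
		{'l', "linker"},
	}
	// Architecture characters described in the post.
	archs := []struct {
		char byte
		name string
	}{
		{'5', "ARM"},
		{'6', "AMD64"},
		{'8', "x86"},
	}
	for _, a := range archs {
		for _, k := range kinds {
			fmt.Printf("%s: %s for %s\n", toolName(a.char, k.letter), k.desc, a.name)
		}
	}
}
```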

The Go toolchain also uses a unique family of high-level assembly languages originally devised in Plan 9. You generally can't copy assembly language snippets written for other assemblers and embed them in your Go programs. For example, there are special move pseudo-instructions in place of the usual load/store instructions that are common in some architectures, such as PowerPC. Also, there are certain addressing modes used in all of the assemblers.

Go has its own object file and archive file format based on Plan 9. Plan 9 existed before ELF so you'll find that even the Linux 'nm' tool is unable to parse the archive (*.a) files in your GOPATH/pkg directory. You will notice the same kinds of issues using any OS-specific tool on these files in Windows, Mac OS, Solaris and FreeBSD. Fortunately, Go provides its own 'nm' and 'objdump' tools to inspect object and archive files. In most cases, you can use OS tools to analyse the final executable because the Go linker eventually converts everything into the OS-specific format.
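
One quick way to see what kind of file you are looking at is to check its first few bytes. The sketch below is a hedged illustration only: classify is a hypothetical helper, it recognises just the standard ELF magic and the standard Unix ar archive magic, and I have not verified here what a given Go release writes at the start of its own object and archive files. A file can also carry a familiar wrapper while holding members that OS tools cannot parse.

```go
package main

import (
	"bytes"
	"fmt"
)

var (
	elfMagic = []byte("\x7fELF")   // start of every ELF file
	arMagic  = []byte("!<arch>\n") // start of a Unix ar archive
)

// classify is a hypothetical helper that names a file format based on
// the leading bytes of the file. Anything unrecognised is "unknown".
func classify(header []byte) string {
	switch {
	case bytes.HasPrefix(header, elfMagic):
		return "ELF"
	case bytes.HasPrefix(header, arMagic):
		return "ar archive"
	default:
		return "unknown"
	}
}

func main() {
	// Made-up sample headers; in practice you would read the first
	// bytes of a real file with os.Open and io.ReadFull.
	samples := [][]byte{
		[]byte("\x7fELF\x02\x01\x01"),
		[]byte("!<arch>\n__.SYMDEF"),
		[]byte("something else"),
	}
	for _, s := range samples {
		fmt.Printf("%q -> %s\n", s, classify(s))
	}
}
```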

I'm going to stop here. I hope that this post has been useful, interesting and accurate.

Saturday 3 May 2014

Introduction

Welcome to the Adventures on the Go blog. I am relatively new to the Go language. What better way to build a thorough understanding than to look at how the toolchain and compiler work?

The toolchain currently supports a number of popular architectures (x86, amd64 and arm) but lacks support for PowerPC. Also, the toolchain is largely undocumented, except for some old Plan 9 documents from more than 10 years ago. This seems like a good area to explore for a project.

Porting the toolchain to a new processor architecture will certainly be difficult to achieve, considering my lack of experience in compilers and linkers. I hope to document what I learn about the toolchain along the way so that it can help others who want to explore it. This is the central purpose of this blog. I hope that it is helpful.