Tagged with: [ autotools ] [ gnu ] [ make ]
So there you are: you want to install some tool or application that didn’t come in a package, or you want to use the
latest version. You download the tarball (a *.tar.gz file), untar it, run
./configure && make && make install and all
is well. But what exactly is it that you are doing? Why must unix users compile everything by themselves? Wouldn’t
life be much easier if we could all download a binary and run it, just like on closed-source OSes? Are binaries so
evil that we must compile everything ourselves manually, just so we know what we are installing? Well, yes and no.
As far as I can tell, there are 3 advantages to compiling applications manually. The first reason is freedom. You can browse the source, maybe even make adjustments, and compile it for yourself. But that alone isn’t good enough. Most people (I dare say at least 98% of them) just want stuff to work, so downloading a binary would save lots of time. And if the source is available online, people can still browse and modify it.
Another reason is that we can scrutinize the source code, so we know nothing evil will be done by our application. But honestly, when was the last time you downloaded a tarball and actually spent the next few days or weeks browsing through the complete source in order to make sure that nothing strange is going on? Never!? Thought so!
The last reason might be the most important one: we can actually compile our application the way we want to. Consider this: suppose we have an Intel Core i5 processor in our system. Which would be the more efficient application: the one written for the most generic 80386 processor, or the one actually optimized for the Intel Core i5? And would you want your application to be 32-bit when you have a 64-bit processor? Of course not.
So it all boils down to this: when compiling an application manually, you get the most optimized application for YOUR system.
How does that differ from the standard applications you install through your distribution’s package system? Those
packages are mostly non-optimal for your system, since those binaries must be generic enough that ALL systems running
that distribution can use them. There are a few common flavors: i386, i686 and
x86_64 (we consider Intel only for the
moment). So when you have a 486 running, you can only install packages that were actually compiled for the 80386 (i386);
i686 uses more efficient code but has got its own quirks.
So in the end, compiling for your specific system benefits your applications. What about a highly-optimized Apache on your web servers? Or the best MySQL server your system can run?
The GNU Build tools
The GNU build tools are tools used by programmers to create automatic scripts that make it possible to change the way your software is built. They are also known as autotools and consist of a few (simple) applications. But before we can actually discuss the autotools, we must know how a “normal” application is built on the unix platform. So the first 2 posts are about 2 tools: make and autoconf.
We assume a C program. In our case a simple example like
hello world will do. It consists of a source file (hello.c) and
maybe a header file (hello.h). This has to be compiled with a compiler (gcc), so we could write a shell script that contains
the compile commands to make life easier for everyone. But when we get into more complex sources, shell scripts aren’t a
great option anymore. Our sources can consist of many smaller objects, each with their own dependencies and rules. So
when you change one header file, it might be that you need to compile 3 objects again. Of course, if your
application consists of 100 of those objects, you don’t want to recompile the other 97 each time you change a
single thing inside that header file.
To manage this, we use makefiles. Makefiles are sorta-kinda scripts that are processed by a tool called
make. If you
do PHP, you might have heard of
ant, which can be considered makefiles in a sense. Make will figure out the
dependencies and, according to the rules you specify, execute stuff to build your software. Normally
you will have 2 main targets. The
all target (executed when you don’t specify a target with
make) will turn your source code into a binary or binaries. It normally consists of smaller sub-targets, depending on
the size of the software. The
install target is where make will copy all the binaries and libraries to the correct
place on your system. Since you don’t want to do that all the time, the
install target is kept outside the all
target. Another reason is that you almost certainly need
root permissions to install your software, which you don’t need
for the actual compiling, so it separates permissions as well.
```makefile
# This default rule will compile hello into a binary
all:
	gcc -o hello hello.c

# This rule will install the binary into /usr/local/bin with the correct mode.
# Note that we need to be root to do this.
install:
	install -m 0755 hello /usr/local/bin
```
Now as you can see, a makefile can be pretty simple, and in this case, maybe a shell script would have sufficed. But this
file is not generic enough. What about users who don’t have the
install program? What about users
who want to install the software into another directory? They must change the makefile themselves in order to get
things running. And this is still a simple makefile.
```makefile
# We use "gcc" as our compiler
CC=gcc
# "gcc" is also our linker
LD=gcc
# These are the flags for "gcc"
CFLAGS=-g -Wall
# There are no linker flags we specify
LDFLAGS=
# These are the objects we need to create
OBJS=hello.o
# This is the main binary
PROG=hello
# This is our install program
INSTALL=install
# This is the path where we need to install our program
INSTALL_PATH=/usr/local/bin

# This is the default target. It only depends on the main program
all: $(PROG)

# Main binary depends on our objects. After the objects are
# created, link them into the main binary
$(PROG): $(OBJS)
	$(LD) $(LDFLAGS) $(OBJS) -o $(PROG)

# This rule will be triggered when we need to convert a .c file into a .o file
# The $< stands for the .c file, the $@ for the .o file.
.c.o:
	$(CC) $(CFLAGS) $(INCLUDES) -c $< -o $@

install: $(PROG)
	$(INSTALL) -m 0755 $(PROG) $(INSTALL_PATH)

# This target will clean up everything so we can start from scratch again
clean:
	$(RM) *.o $(PROG)

.PHONY: install clean
```
Ok, so this looks a lot more complicated, but in fact, it really isn’t. The first few lines are nothing more than simple
definitions of what software and options we use. As you can see, we have split the C compiler and linker, so anyone
who chooses to can use
cc for compiling and
ld for linking the software, or use any other compiler/linker they
want by merely changing the few top lines.
We have added an extra target (clean) that will clean up all object files and binaries so we can start again if we want
to. Also, almost everything depends on each other. The
all target depends on the main program (
PROG), and that depends
on the objects, which are compiled from
.c files by the
.c.o rule. Complex? Yes, in our case too complex, but
we could add 100 different objects, in which case we would only need to change the
OBJS= line. The rest of the makefile doesn’t
have to be touched.
As I’ve stated before, make will try to be as lazy as possible and only recompile things when they need to be recompiled. So when you change one source file, only that object will be recompiled, not all the other ones. But how does make know when to recompile? Surely, it’s easy to figure out that a source has changed (it uses its timestamp for this), but what happens when we change a header file? How does make know which objects use that header file?
We have 2 solutions: the first solution is to manually add the dependencies to the makefile. It’s easy:
hello.o: hello.h world.h
This tells make that whenever hello.h or world.h has changed, hello.o needs to be rebuilt. But again, this isn’t generic enough. Suppose we have 100 different objects, with 100 different header files, each with their own dependencies and inter-dependencies. It becomes unmanageable quickly.
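Even a small project shows how these hand-written lines pile up. A makefile might need a block like the following (file names invented for illustration), and every #include you add or remove means editing it again:

```makefile
# Every object lists every header it includes -- by hand.
main.o:   defs.h config.h util.h
parser.o: defs.h parser.h
util.o:   defs.h util.h
output.o: defs.h config.h output.h
```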
A better solution is to use a tool like
makedepend. This tool will scan your source code and figure out the dependencies,
which are added automatically to your makefile. So in our case, we just add another target to our makefile:
```makefile
MAKEDEPEND=makedepend

# Since our sources are nothing more than our objects, with the .o changed
# to .c, we use this rule:
SRCS=$(OBJS:.o=.c)

depend:
	$(MAKEDEPEND) -Y $(SRCS)
```
What makedepend does is modify the actual Makefile by scanning all sources we specify (our
SRCS list) and adding all
dependencies at the END of the makefile.
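After a run, the end of the Makefile will contain a block roughly like this (makedepend marks its section with a “DO NOT DELETE” comment; the exact dependency lines depend on your #include statements):

```makefile
# DO NOT DELETE

hello.o: hello.h
```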
So before we run
make, we must run
make depend first. Of course, you can add depend as a dependency of the
all target as well, so it runs automatically:
all: depend $(PROG)
So now we have a makefile that takes care of many things: it solves our dependencies, it’s easy to modify for users
who want things just a little different, and it works in most cases. But still, we need more automation! Our
makefile uses the
makedepend application. What if it is not present on a user’s system? Can we use
another compiler? And maybe our software uses 3rd party libraries like
libxml. If one is not present, is our software
capable of using another library? And maybe the most important thing: can we find the most optimal compiler options
at build time and modify our makefile to use them? That would result in the most optimal binary for that particular
user. It turns out this can all be automated pretty easily with the help of autoconf.
In the next part, we will discuss what autoconf is, and how it affects our Makefile. Stay tuned…