The GNU Build tools, part 1
Tagged with: [ autotools ] [ gnu ] [ make ]
So there you are: you want to install some tool or application that didn’t come in a package, or you want to use the latest version. You download the tarball (a *.tar.gz file), untar it, do a ./configure && make && make install and all is well. But what exactly is it that you are doing? Why must Unix users compile everything themselves? Wouldn’t life be much easier if we could all download a binary and run it, just like on closed-source OSes? Are binaries so evil that we must compile everything manually, just so we know what we are installing? Well, yes and no…
As far as I can tell, there are three advantages to compiling applications manually. The first is freedom: you can browse the source, maybe even make adjustments, and compile it for yourself. But that alone isn’t good enough. Most people (I dare say at least 98% of them) just want stuff to work, so downloading a binary would save lots of time. And if the source is available online, people can still browse and modify it.
Another reason is that we can scrutinize the source code, so we know nothing evil will be done by our application. But honestly, when was the last time you downloaded a tarball and actually spent the next few days or weeks browsing through the complete source to make sure nothing strange is going on? Never!? Thought so!
The last reason might be the most important one: we can compile our application exactly the way we want to. Consider this: suppose we have an Intel Core i5 processor in our system. Which would be the more efficient application: one written for the most generic 80386 processor, or one actually optimized for the Intel Core i5? And would you want your application to be 32-bit when you have a 64-bit processor? Of course not.
So it all boils down to this: when compiling an application manually, you get the most optimized application for YOUR system.
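To make that concrete, here is a minimal sketch (assuming gcc is the compiler, and the file names are just examples): the same source can be built generically or tuned for the machine that compiles it, purely by changing flags.
# Generic build: runs on pretty much any x86 machine
gcc -O2 -o hello-generic hello.c

# Tuned build: -march=native lets gcc use every instruction-set
# extension the local CPU supports
gcc -O2 -march=native -o hello-native hello.c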
How does that differ from the standard applications you install through your distribution’s package system? Those packages are mostly non-optimal for your system, since the binaries must be generic enough that ALL systems running that distribution can use them. There are a few common flavors: i386, i686 and x86_64 (we consider Intel only for the moment). So when you have a 486 running, you can only install packages that were actually compiled for the 80386 (i386). The i686 flavor uses more efficient code but has its own quirks.
So in the end, compiling for your specific system benefits your applications. Think of a highly optimized Apache on your web servers, or the best MySQL server your system can run.
The GNU Build tools
The GNU build tools are tools used by programmers to create automatic scripts that make it possible to change the way your software is built. They are also known as the autotools and consist of a few (simple) applications. But before we can actually discuss the autotools, we must know how a "normal" application is built on the Unix platform. So the first two posts are about two tools: make and autoconf.
Make
We assume a C program; in our case a simple example like hello world will do. It consists of a source file (hello.c) and maybe a header file (hello.h). This has to be compiled with a compiler (gcc), so we could write a shell script that contains the compile commands to make life easier for everyone. But when we get into more complex sources, shell scripts aren’t a great option anymore. Our sources can consist of many smaller objects, each with their own dependencies and rules. So when you change one header file, it might be that three objects need to be compiled again. Of course, if your application consists of 100 of those objects, you don’t want to recompile the other 97 each time you change a single thing inside that header file.
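For the simple case, such a hypothetical build script could be as short as the sketch below; note that it blindly recompiles everything on every run, which is exactly the problem just described.
#!/bin/sh
# build.sh - a naive, hypothetical build script for our example
gcc -c -o hello.o hello.c   # compile the source into an object file
gcc -o hello hello.o        # link the object file into the final binary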
To manage this, we use makefiles. Makefiles are script-like files that are processed by a tool called make. If you do PHP, you might have heard of phing or ant, which can be considered makefiles in a sense. Make will figure out the dependencies and, according to the rules you specify, execute the commands needed to build your software. Normally you will have two main targets: all and install. The all target (executed when you don’t specify a target with make) will turn your source code into a binary or binaries. It normally consists of smaller sub-targets, depending on the size of the software. The install target is where make copies all the binaries and libraries to the correct place on your system. Since you don’t want to do that every time, the install target is kept outside the all target. Another reason is that you almost certainly need root permissions to install your software, which you don’t need for the actual compiling, so it separates permissions as well.
# This default rule will compile hello into a binary
all:
	gcc -o hello hello.c

# This rule will install the binary into /usr/local/bin with the correct mode. Note that we
# need to be root to do this.
install:
	install -m 0755 hello /usr/local/bin
Now as you can see, a makefile can be pretty simple, and in this case maybe a shell script would have sufficed. But this file is not generic enough. What about the users who don’t have the gcc or install programs? What about the users who want to install the software into another directory? They must change the makefile themselves in order to get things running. And this is still a simple makefile.
# We use "gcc" as our compiler CC=gcc # "gcc" is also our linker LD=gcc # There are the flags for "gcc" CFLAGS=-g -Wall # THere are no linker flags we specify LDFLAGS= # These are the objects we need to create OBJS=hello.o # This is the main binary PROG=hello # This is our install program INSTALL=install # This is the path where we need to install our program INSTALL_PATH=/usr/local/bin # This is the default target. It only depends on the main program all: $(PROG) # Main binary depends on our objects. After the objects are # created, link them into the main binary $(PROG): $(OBJS) $(LD) $(LDFLAGS) $(OBJS) -o $(PROG) # This rule will be triggered when we need to convert a .c file into a .o file # The $< stands for the .c file, the $@ for the .o file. .c.o: $(CC) $(CFLAGS) $(INCLUDES) -c $< -o $@ install: $(PROG) $(INSTALL) -m 0755 $(PROG) $(INSTALL_PATH) # This target will cleanup everything so we can start from scratch again clean: $(RM) *.o $(PROG) .PHONY: install clean
OK, so this looks a lot more complicated, but in fact it really isn’t. The first few lines are nothing more than simple variable definitions for the software and options we use. As you can see, we have split the C compiler and the linker, so anyone who chooses to can use cc for compiling and ld for linking the software, or use any other compiler/linker they want by merely changing the top few lines.
We have added an extra target (clean) that will clean up all object files and binaries so we can start from scratch if we want to. Also, almost everything depends on something else: the all target depends on the main program (PROG), which depends on the objects, which are compiled from .c into .o files by the .c.o rule. Complex? Yes, in our case too complex, but we could add 100 different objects and we would only need to change the OBJS= line. The rest of the makefile doesn’t have to be touched.
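As a side note (this works with GNU make as well as POSIX make): you don’t even have to edit the file, because variables given on the command line override the ones defined in the makefile. The compiler and path below are just examples.
# Build with clang instead of gcc, without touching the makefile
make CC=clang LD=clang

# Install into a hypothetical per-user directory instead of /usr/local/bin
make install INSTALL_PATH=$HOME/bin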
Solving dependencies
As I’ve stated before, make will try to be as lazy as possible and only recompile things when they need to be recompiled. So when you change one source file, only that object will be recompiled, not all the others. But how does make know when to recompile? Surely it’s easy to figure out that a source file has changed (it uses its timestamp for this), but what happens when we change a header file? How does make know which objects use that header file?
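You can see the timestamp mechanism at work with a quick, hypothetical session like the sketch below; the header-file question is what the rest of this section solves.
$ make            # first run: hello.c is compiled into hello.o, then linked into hello
$ make            # nothing changed, so make reports there is nothing to be done
$ touch hello.c   # bump the timestamp of the source file
$ make            # hello.o is now older than hello.c, so it (and hello) gets rebuilt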
We have two solutions. The first solution is to manually add the dependencies to the makefile. It’s easy:
hello.o: hello.h world.h
This tells make that whenever hello.h or world.h has changed, hello.o needs to be rebuilt. But again, this isn’t generic enough. Suppose we have 100 different objects, with 100 different header files, each with their own dependencies and inter-dependencies. It becomes unmanageable quickly.
A better solution is to use a tool like makedepend. This tool will scan the source code and figure out the dependencies, which are then added automatically to your makefile. So in our case, we just add another target to our makefile:
MAKEDEPEND=makedepend

# Since our sources are nothing more than our objects, with the .o changed to .c, we use this rule:
SRCS=$(OBJS:.o=.c)

depend:
	$(MAKEDEPEND) -Y $(SRCS)
What makedepend does is modify the actual Makefile by scanning all the sources we specify (our SRCS list) and adding all the dependencies at the END of the makefile.
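For our hypothetical hello.c that includes hello.h, the appended section would look roughly like this (makedepend marks its own block with a "DO NOT DELETE" line so it can rewrite it on later runs, and the -Y flag keeps system headers out of the list):
# DO NOT DELETE

hello.o: hello.h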
So before we run make, we must run make depend first. Of course, you can add it as a dependency of the all target as well, so it runs automatically:
all: depend $(PROG)
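With that in place, a typical (sketched) session for a user now boils down to:
$ make                # runs makedepend first, then compiles and links hello
$ sudo make install   # copies hello to /usr/local/bin (needs root)
$ make clean          # removes hello and the *.o files, for a fresh start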
So now we have a makefile that takes care of many things: it can solve our dependencies, it’s easy to modify for users who want things to be just a little different, and it works in most cases. But still, we need more automation! Our makefile uses the gcc, install and makedepend applications. What if they are not present on a user’s system? Can we use another compiler? And maybe our software uses third-party libraries like libxml; if one is not present, is our software capable of using another library? And maybe the most important thing: can we find out the most optimal compiler options at build time, so we can modify our makefile to use them? That would result in the most optimal binary for that particular user. It turns out this can all be automated pretty easily with the help of autoconf.
In the next part, we will discuss what autoconf is, and how it affects our Makefile. Stay tuned…