Using syslog for your php applications

Warning: This blogpost has been posted over two years ago. That is a long time in development-world! The story here may not be relevant, complete or secure. Code might not be complete or obsoleted, and even my current vision might have (completely) changed on the subject. So please do read further, but use it with caution.

« PHP 5.4: RegexIterator::getRegex() Talking at the PHP Benelux 2011 »

Posted on 12 Jan 2011
Tagged with: [ logging ] [ PHP ] [ syslog ]

Linux, and other unices have an excellent system to centralize log events. This is done through syslog. This system removes the need for every application to maintain their own log files and let the syslog server handle all the events. Depending on the type of event that is logged, it can take additional action like alerting you through email or even text-messaging if a critical event occurs. No system administrator can live without syslog and is normally the first place to look for signs of trouble on a system. So instead of writing our own log system for your application, why not use an existing one?

Where to place your (custom) logfiles?

I occasionally see people create logfiles on weird places: in their custom log directory called /log (yes, a directory directory on root), inside the root of an application /www/myapp/log, or even worse, inside a public_html directory because of the “easy access” it provides (never mind the security issues). All these are not considered best practice (and I would like to nominate some for being the worst practice), so instead you should use the standard log directory: /var/log

Why use /var/log?

First of all, it’s the correct place for log-files according to the File Hierarchy Standard. Secondly, systems like logrotate have an easier time on finding and rotating the log-files. It’s not the first time log-files eat up all disk space on production servers because they weren’t properly rotated. Third, because we use the standard paths, the life of everybody is much easier. We don’t have to browse the whole server to look for log-files, we know where we can find them. Especially when time is of the essence (your system doesn’t respond any you need to find out why) you don’t want to waste too much time finding your logs. Things should be stored where others expect them..

Not caring how or where to log: meet syslog

But let’s go a step further and suppose you’ve got a system that logs nicely into the /var/log directory. So you log your custom error-events to /var/log/yourapplication/errors.log, and maybe you have made a system that will log warnings to /var/log/yourapplication/warnings.log. But lets assume you want to be notified when an error occurs. Do you modify your code? Assuming you played nice you’ve probably created a log-class with a log()-method that will take care of logging. The only thing you need to do is change that code so it emails you as soon as something happens. But where do we configure who to mail or even when to mail? And what if we want to aggregate all logs from all your web-server nodes onto 1 single system. How many modifications do you need inside your applications?

So instead, we let a generic system like syslog handle this. It’s so easy, it’s almost ridiculous :). The ONLY thing we need to do from a program perspective is send the event through the syslog() method of php, together with a severity & facility. At that point, the message gets picked up by the syslog-daemon, examines the message and it will take appropriate action. It could be that it will write the message to a log-file, it might email you, it might store the message to a database. Whatever you prefer. All without modification of your code.

openlog('myapplication', LOG_NDELAY, LOG_USER);
syslog(LOG_NOTICE, "Something has happened");
syslog(LOG_ERR, "Something bad has happened");

All we do is open the syslog connection with some default parameters and a custom application tag. This is needed later on so syslog can filter out our messages. Our default facility is USER, the 3rd parameter from the openlog() call. We come back to facilities later. Logging is as easy as calling syslog() with the severity (we discus this too later on), and the actual error message.

The syslog() call actually pushes the message (together with additional information like a priority, a timestamp and our application tag) to the /dev/log socket (or through a UDP network connection). On the other end the syslog-daemon reads from this socket and since we have used the “LOG_USER” facility, the message would probably be logged inside /var/log/user.log. You can try to see if the message has ended up there. Since log-files are normally “privileged” for the superuser, you might have to “sudo” to view the log:

$ sudo tail /var/log/user.log
Jan   4 21:53:42 debian-jth myapplication: Something has happened
Jan   4 21:53:42 debian-jth myapplication: Something bad has happened

As you can see, the last 2 lines are the events fired from your php application. It has got the date, time, the hostname, the application “tag”, and the actual message. You can aggregate multiple sources into one syslog file, so it might be possible to have multiple sources all log to the same log-file, which gives you something like this:

$ sudo tail /var/log/user.log
Jan   4 21:53:42 webnode-001 myapplication: Something bad has happened
Jan   4 21:53:54 webnode-004 myapplication: User 2345 has entered a bad password.
Jan   4 21:54:20 webnode-004 myapplication: User 2345 has successfully logged in.

As you can see you can create a simple system that keeps track on everything that goes on in a clustered system this way. More on how to centralize your event logs later.

About syslog itself

The syslog protocol is currently standardized in RFC 5424 but it dates back to the 1980’s where it was used for only a couple of programs like sendmail. The most important bits you need to worry about are the severities and facilities. It’s a very simple text-based protocol that has not much more information than a priority, a timestamp, an application tag and the actual message. This string can be send to anywhere where there is a syslog server listening on. In the “old” days this was on UDP (port 514), but nowadays syslog reads through /dev/log or a combination of both (or even other ways).

Severities and Facilities

As stated, a message must be accompanied by a priority. A priority is a combination of a severity and a facility. A facility is like a “division” for the message. Although they are numerical (ranging from 0 to 23) they map to constants: LOG_AUTH, LOG_MAIL, LOG_USER, LOG_LOCAL0..7 etc.. You can find them all in the PHP manual. Most syslog daemons will filter messages according to this facility, but that would mean we can only have 8 custom applications (when we use LOG_LOCAL0 to LOG_LOCAL7). It would be a lot nicer when you use the name of the application to filter out our messages, but not all syslog daemons can do this. If you have your custom application, always try to use a facility of LOG_LOCAL0 to 7. The others are most likely reserved for other systems on your server.

The severity parameter reflects the seriousness of the message and range from LOG_DEBUG for debugging messages to LOG_EMERG, which must only be issues when the system is not usable anymore (user applications should never issue those and LOG_ALERT should be the maximum severity it can publish). They almost map nicely on the normal php-error levels so it’s even possible to send “normal” php errors to syslog with the help of a custom error-handler inside your application.

The priority itself nothing more than a simple calculation of:** severity*8+facility**. When the priority is calculated, it’s added to the message string to the syslog daemon. You don’t need to worry about the priority. That is calculated for you automatically by the syslog() call.

Message sizes

Another aspect about syslog, are the size of the messages you send. It really depends on the syslog daemon you are using in what the maximum message is, but the RFC states that the MINIMUM maximum length is 480 bytes. If you have longer messages, be prepared that your messages MIGHT get cut off. Either issue multiple messages or at least place the important bits in the front.

Using syslog-ng

I’m using syslog-ng as the syslog daemon. There are others like the “standard” sysklog package (consisting of *syslogd** and klogd) and the more recent rsyslogd. The main reason I like syslog-ng is that it’s very flexible in use. You can add multiple sources so you can use one single log-server where all other servers log to and you can filter out almost everything. Even it’s output is very flexible: it’s really simple to filter out messages that syslog needs to send over email, store on disk or even add to a database if you like.

In theory it doesn’t really matter which syslog system you use but you should use a syslog daemon that is capable of filtering out on an application tag if you really want to get the most out of syslog.

Filtering our application into a separate logfile

Now that we got our application reporting to syslog, which in turn places our events into /var/log/user.log, it’s time to change some things. First of all we like to pass all messages from our application into custom log-files. You should note that we don’t change anything in our code, only in the syslog configuration. We can filter out based on our application tag like this:

    destination df_userapp { file("/var/log/userapp.log"); };
    
    filter f_userapp { program("myapplication"); };
    
    log {
      source(s_all);
      filter(f_userapp);
      destination(df_userapp);
    };

The configuration is pretty self-explanatory. We create a new “destination” (to a file), which decides where the event is stored. We create a new filter which filters out the events we need (in this case, only events that have the “myapplication” tag) and finally we create a “log” where we tell syslog-ng that we want to collect all events (from all sources), filter out on our tag, and store them into our destination. We can add multiple sources, multiple filters and even multiple destinations. I suggest you go and read up on syslog-ng to find out more..

Emailing our critical errors

Let’s email out critical errors. Again, no need to change our code. We only need to create a new “log” entry with a different severity-filter and a different destination. In our case, this is a simple program that mails the input to a destination, but it might as well be any other type of program. This program is needed because there is no “standard” mail functionality in syslog.

    destination dm_userapp { program("/usr/local/sbin/maillog.pl"); };
    
    filter f_userapp { program("myapplication"); };
    
    log {
      source(s_all);
      filter(f_userapp);
      filter(f_at_least_err);
      destination(dm_userapp);
    };

This is not much different than the previous configuration, and you can have both of them running in syslog-ng. We have an additional filter that filters our on the severity and only filters our events with a severity of “error” or higher. This is a filter already present in syslog-ng.conf, so we don’t have to implement it again. Our destination is not a file anymore, but a program. This program mails the event to a user and is nothing more than a simple script (in this case written in perl, but might have been php):

    #!/usr/bin/perl -n
    # thanks to Brian Dowling for an example with security in mind.
    
    $TO = ‘root@localhost’;
    $FROM = $TO;
    
    s/^<\d{1,2}>//;
    
    open(MAIL, “|/usr/sbin/sendmail -t”);
    
    print MAIL <<”EOT”;
    To: $TO
    From: $FROM
    Subject: Log Alert: $_
    $_
    EOT

(Make sure you change the email-address. Another way could be to do this through a configuration-setting)

When you run our first example php file again after you added the configuration in syslog, you should receive emails with the events. Note that the standard “notice” event should not be emailed (but still it’s stored in the user.log or in your custom log if you have made one).

Sending to a remote server

As a last example, we send our data to a remote server as a good security practice. When something happens to your server, at least the log-files are secured onto a different system. It’s an easy way to aggregate log-files from multiple nodes inside a web-cluster. It makes it easier to track users who get moved from server to server during a session (when your load balancer does not fixate a user-session to a single server).

It means we are dealing with 2 syslog servers here. The “sending” syslog server, the server that actually receives the message from your application. And the “receiving” syslog server. This server listens for sending syslog servers. You must make sure that the “receiving” syslog-server actually listens for incoming events. This normally means you have to uncomment the “udp()” line in the syslog-ng config. On the sending syslog-server, you only need to add another destination:

    destination d_loghost { udp("192.168.1.100" port(514)); };
    
    filter f_userapp { program("myapplication"); };
    
    log {
      source(s_all);
      filter(f_userapp);
      destination(d_loghost);
    };

This shouldn’t look too complicated. We just add another destination, in this case a UDP connection. The host is the receiving syslog-server that listens for the events.

Without syslog-ng

If you don’t use syslog-ng, or just want to send everything to a remote server, you can use the standard syslog daemon. Sending to a remote host is just as easy:

local0.*  @192.168.1.100

where local0.* means everything from the local0 facility (make sure your application only uses this facility). The @ means it must send it to the specified host, in this case 192.168.1.100 where a syslog-ng server is running and accepting input over udp.

Additional experiment

As stated before the syslog() sends its output to /dev/log or to a host over UDP to port 514. If you have your own syslog server that listens to UDP (or TCP), you can try to create a simple function that doesn’t use the syslog() call, but actually sends over the data through UDP. You need to use the actual syslog-format as stated in the RFC, but it’s a good exercise. Give it a try and don’t forget to add it to github as a gist or repository and mail the results in the comments :)

Conclusion

If you are thinking about writing a custom error-system to handle your log-files, take a good look at syslog. There is not much extra you gain from writing your own handler. Syslog and syslog-ng in particular is fast and flexible. You probably already adapt a lot of “unix” features like crontabs, init-scripts and maybe even more. It’s time to add syslog to this list as well…

« PHP 5.4: RegexIterator::getRegex() Talking at the PHP Benelux 2011 »