Bit manipulation in PHP


Warning: This blogpost has been posted over two years ago. That is a long time in development-world! The story here may not be relevant, complete or secure. Code might be incomplete or obsolete, and even my current vision on the subject might have (completely) changed. So please do read further, but use it with caution.


Posted on 02 Jun 2010
Tagged with: [ bit manipulation ]

Although you will probably never need it as much as a C programmer does, it’s not a bad idea to know how bit manipulation works. This post explains what bit manipulation is, why you could use it, and how you are already using it (knowingly or not).

As you probably know, computers work with a base-2 system called the binary system. A bit (short for binary digit) is simply either a 0 or a 1. A group of 8 bits is called a byte, 1024 bytes is called a kilobyte, 1024 kilobytes is a megabyte, and so on.

A byte can be represented in many ways:

Decimal:          65
Hexadecimal:      0x41
ASCII character:  'A'
Binary:           01000001

As you can see, it’s all the same value, but written differently.
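You can check this yourself in PHP. The following is a minimal sketch that uses only built-in functions (`dechex()`, `chr()`, `sprintf()`, `ord()`, `bindec()`). It prints the value in each representation and then converts each one back to the same integer:

```php
<?php
$byte = 65;

// The same value, written four different ways
echo "Decimal:          ", $byte, "\n";                    // 65
echo "Hexadecimal:      0x", dechex($byte), "\n";         // 0x41
echo "ASCII character:  '", chr($byte), "'\n";            // 'A'
echo "Binary:           ", sprintf('%08b', $byte), "\n";  // 01000001

// And back again: all three of these are the integer 65
$fromHex    = 0x41;
$fromChar   = ord('A');
$fromBinary = bindec('01000001');
```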

Each bit in a byte has a value:

bit 0: 1
bit 1: 2
bit 2: 4
bit 3: 8
bit 4: 16
bit 5: 32
bit 6: 64
bit 7: 128

In binary 01000001, bit 6 and bit 0 are set to 1. Bits are counted from right to left, starting at 0.

Bit 6 = 64 and bit 0 = 1, so 64 + 1 = 65, which happens to be the decimal value of the byte we are working on. It all fits perfectly :)
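The same addition can be done in code. Below is a small sketch that walks over the 8 bits of 65 and adds up the value of every bit that is set. It uses the `&` and `<<` operators, which are covered in the next part of this post. `$bit`, `$weight` and `$total` are just illustrative names.

```php
<?php
$value = 65; // 01000001
$total = 0;

for ($bit = 7; $bit >= 0; $bit--) {
    $weight = 1 << $bit;          // value of this bit: 128, 64, ..., 1
    if ($value & $weight) {       // is this bit set in $value?
        echo "bit $bit is set, worth $weight\n";
        $total += $weight;
    }
}

echo $total, "\n"; // 65
```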

Now, so much for the basics..

Meet your friends: or, xor, and

There are a few basic bit manipulation commands we can use:

or
When or-ing two bits, the outcome will be 1 if at least one bit is 1.
xor
When xor-ing two bits, the outcome will be 1 if both bits are different.
and
When and-ing two bits, the outcome will be 1 if both bits are 1.

There are some more:

not
If the bit you are NOT-ing is 1, the outcome will be 0, otherwise 1 (it simply flips the bit).
shift left <<
The bits in the variable will be shifted X places to the left (more on shifting later).
shift right >>
The bits in the variable will be shifted X places to the right (again, more later).
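PHP writes these operators as `&` (and), `|` (or), `^` (xor), `~` (not), `<<` (shift left) and `>>` (shift right). Be aware that PHP's `and`, `or` and `xor` keywords are logical operators that work on booleans, not on bits.

Here is a quick sketch that applies each bitwise operator to two small integers. The results are printed in binary:

```php
<?php
$a = 0b1100; // 12
$b = 0b1010; // 10

// '%04b' prints the value in binary, padded to at least 4 digits
printf("a & b  = %04b\n", $a & $b);   // 1000
printf("a | b  = %04b\n", $a | $b);   // 1110
printf("a ^ b  = %04b\n", $a ^ $b);   // 0110
printf("~a     = %04b\n", ~$a & 0xF); // 0011 (PHP ints are 64 bits, so keep only the low 4)
printf("a << 1 = %05b\n", $a << 1);   // 11000
printf("a >> 1 = %04b\n", $a >> 1);   // 0110
```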

Some basic math

0 AND 0 = 0
0 AND 1 = 0
1 AND 0 = 0
1 AND 1 = 1

0 OR 0 = 0
0 OR 1 = 1
1 OR 0 = 1
1 OR 1 = 1

0 XOR 0 = 0
0 XOR 1 = 1
1 XOR 0 = 1
1 XOR 1 = 0
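These three tables can also be generated by looping over both input bits. Below is a minimal sketch that collects the results in a hypothetical `$table` array so they are easy to check:

```php
<?php
// One closure per bitwise operator, applied to single bits
$ops = [
    'AND' => function ($x, $y) { return $x & $y; },
    'OR'  => function ($x, $y) { return $x | $y; },
    'XOR' => function ($x, $y) { return $x ^ $y; },
];

$table = [];
foreach ($ops as $name => $op) {
    foreach ([0, 1] as $x) {
        foreach ([0, 1] as $y) {
            $table[$name][] = $op($x, $y);
            echo "$x $name $y = ", $op($x, $y), "\n";
        }
    }
    echo "\n";
}
```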

Great, but why do I care?

Suppose you have a variable that holds 16 different bits. Each bit is a “flag” that marks whether one specific condition is on or off.

$error_level = 0;

Here, the variable is called $error_level and is set to 0. This means all the bits (flags) are 0 as well.

Now, we want to set the 3rd flag (bit 2). We know bit 2 has a value of 4, so we could just say:

$error_level = 4;

But this causes a problem: because we overwrite the whole value, any bits that were already set will be cleared. So we solve this by OR-ing the flag:

$error_level = $error_level | 4;   // or, shorter: $error_level |= 4;

Now, suppose $error_level is 131 (bits 0, 1 and 7 are set) and we are setting bit 2:

10000011
00000100 |
--------
10000111
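Here is the same step in PHP, starting from binary 10000011. `|=` is shorthand for `$error_level = $error_level | 4`. The sketch also shows that setting a bit that is already set changes nothing:

```php
<?php
$error_level = 0b10000011; // bits 0, 1 and 7 set

// Set bit 2 (value 4) without touching the other bits
$error_level |= 4;
printf("%08b\n", $error_level); // 10000111

// OR-ing the same bit again leaves the value unchanged
$error_level |= 4;
printf("%08b\n", $error_level); // 10000111
```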

Since we are dealing with bits, we only want to change the bit in question and leave all other bits alone. OR-ing does exactly that: we only tell PHP which bit (or bits) we want to set, and no other bits will be harmed in the process.

Suppose we want to make sure bit 4 (value 16) is not set, without knowing whether it is currently set or not:

$error_level = $error_level & ~16;

Looks complicated, but let’s take a look:

The ~16 means: NOT 16. Every bit that is 0 in 16 becomes 1, and the one bit that is 1 in 16 (bit 4) becomes 0. Using a value like this to select which bits to keep or change is called ‘masking’ bits. That gives us this:

 16:   00010000
 ~16:  11101111

Now, we are going to AND this value to our variable:

10000111     <- random value that is in error-level, could be anything
11101111&    <- our ~16
---------
10000111

As you can see, nothing has changed. This is because bit 4 wasn’t set in the first place. Now, let’s try it with an $error_level value where bit 4 IS set:

10111101     <- almost all bits are set, including bit 4
11101111&    <- our ~16
---------
10101101

As you can see, the result is the same as the original, except for bit 4, which is now 0.
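In PHP, `&= ~16` clears bit 4 (value 16). PHP integers are 64 bits wide, so `~16` really has 63 bits set rather than the 7 shown above. Because AND-ing only keeps bits that were already set in `$error_level`, the result is still the same as in the 8-bit diagrams. A minimal sketch:

```php
<?php
$error_level = 0b10111101;

// Clear bit 4 (value 16); all other bits are kept
$error_level &= ~16;
printf("%08b\n", $error_level); // 10101101

// Clearing a bit that is not set changes nothing
$error_level &= ~16;
printf("%08b\n", $error_level); // 10101101
```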
You can also set multiple bits at the same time:

$error_level = $error_level | 1 | 8 | 128;   // set bits 0, 3 and 7 in one go

10000000
10001001|
---------
10001001

Now, attentive readers may have noticed that OR-ing values looks a lot like adding them up with +. Be very careful with this: + only gives the same result when no bit is set in both values. If a bit is already set, adding it again carries over into the next bit instead of leaving it alone. In the example above, 128 + 137 = 265, not 137. This easily happens with flags that are made of several bits. When dealing with bit manipulation, use the bitwise operators, not the arithmetic ones.
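Here is a small demonstration of where `+` and `|` part ways, using the two values from the diagram above. The two operators agree only when the operands share no set bits:

```php
<?php
$a = 0b10000000; // 128
$b = 0b10001001; // 137, bit 7 is set in both $a and $b

$viaOr   = $a | $b; // 137: bit 7 simply stays set
$viaPlus = $a + $b; // 265: bit 7 + bit 7 carries into bit 8

printf("%09b\n%09b\n", $viaOr, $viaPlus); // prints 010001001 and 100001001

// Without overlapping bits the two agree
$c = 0b00001001;
var_dump(($a | $c) === ($a + $c)); // bool(true)
```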

Neat tricks with bits

$value ^ $value = 0

In assembly this is (was) one of the quickest ways to set a register to zero. It makes sense:

10011011
10011011^
---------
00000000

Since all bits are equal to each other, every bit will produce a zero.
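In PHP the bitwise XOR operator is `^`. A `xor` keyword also exists, but it is a logical operator: it converts both sides to booleans and returns `true` or `false`. It also binds more loosely than `=`, so you should wrap it in parentheses. A sketch showing both:

```php
<?php
$value = 0b10011011;

// Bitwise: every bit is XOR-ed with itself, so the result is always 0
$zero = $value ^ $value;
var_dump($zero); // int(0)

// Logical: both operands are truthy, so "true xor true" is false
$logical = ($value xor $value);
var_dump($logical); // bool(false)
```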

$value = $value >> 1;

This will divide the value by 2.

      00010100  >> 1    (20 decimal)
      00001010  bits shifted right 1 place (a 0 is shifted in on the left, the rightmost bit falls off). 10 decimal.

This works for any non-negative integer; odd numbers are rounded down (21 >> 1 is 10).
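Below is a quick sketch of both shift directions in PHP. Shifting right by n divides by 2^n, and shifting left by n multiplies by 2^n. It also shows the one catch with negative numbers: `>>` rounds towards minus infinity, whereas `intdiv()` (PHP 7+) truncates towards zero.

```php
<?php
// Shifting right by n divides by 2^n, dropping the remainder
var_dump(20 >> 1);  // int(10)
var_dump(21 >> 1);  // int(10), odd values are rounded down
var_dump(20 >> 2);  // int(5)

// Shifting left by n multiplies by 2^n
var_dump(10 << 1);  // int(20)
var_dump(5 << 3);   // int(40)

// Negative numbers: >> rounds towards minus infinity,
// intdiv() truncates towards zero
var_dump(-5 >> 1);       // int(-3)
var_dump(intdiv(-5, 2)); // int(-2)
```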

Check quickly if a number is even or odd (without doing a divide):

if ($value & 1) {
  print "odd";
} else {
  print "even";
}

This will mask the first bit (bit 0). When this bit is 1, the value is odd (check it out yourself!)
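Here is a small reusable version of that check. `isOdd()` is a hypothetical helper name, not a PHP built-in. Because PHP integers use two's complement, the same test also works for negative numbers:

```php
<?php
// Hypothetical helper: true when bit 0 is set, i.e. the number is odd
function isOdd(int $n): bool
{
    return ($n & 1) === 1;
}

foreach ([0, 7, 20, -3] as $n) {
    echo $n, isOdd($n) ? " is odd\n" : " is even\n";
}
```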

Why PHP does not do bits

You probably assign a variable in PHP this way:

$var = 1;

This will assign the value 1 to $var. Note that you don’t specify what kind of variable it is. You assign it as an integer, but you can also use it as a string or as a float; PHP doesn’t mind. Its internal structure (the zval) does all the hard work for you, converting things the way you want them to be. That’s one of the strengths (some say weaknesses) of PHP. But since you don’t specify the type, you cannot tell PHP that $var is a single bit. PHP simply does not work this way.

Why PHP DOES do bits

OK, so I lied.. you can do bit manipulation in PHP, but not quite the way you’d expect. What we can do is use a variable as a container that stores bits. Usually that is an integer, but the bitwise operators work on strings too. Bitwise manipulation can come in handy even in PHP from time to time. It’s a simple way of dealing with on-off flags inside your code.
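To illustrate the string case: when both operands are strings, PHP applies `&`, `|` and `^` to the ASCII codes of the characters and returns a string. For the letters A-Z, the upper-case and lower-case forms differ only in bit 5 (value 32, which is also the code of a space). That makes for a neat, if obscure, trick:

```php
<?php
// 'A' is 0x41 (01000001), ' ' is 0x20 (00100000), 'a' is 0x61 (01100001)
$lower = 'A' | ' ';    // set bit 5: 'A' becomes 'a'
$upper = 'a' ^ ' ';    // flip bit 5: 'a' becomes 'A'

// Works character by character on whole strings of equal length
$word = 'PHP' | '   '; // three spaces

var_dump($lower, $upper, $word); // string(1) "a", string(1) "A", string(3) "php"
```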

Suppose we have this:

class myfile {
  protected bool $isDirectory;
  protected bool $isReadOnly;
  protected bool $isSymlink;
  protected bool $isBlockDevice;
  protected bool $isCharDevice;
  // ...
}

Instead of all these properties (including their getters/setters), you could have one property and one method:

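class myfile {
  const DIRECTORY = 1;
  const READONLY  = 2;
  const SYMLINK   = 4;
  const BLOCKDEV  = 8;
  const CHARDEV   = 16;

  protected int $iFileFlags = 0;

  public function __construct(int $flags = 0) {
    $this->iFileFlags = $flags;
  }

  public function getFlags(): int {
    return $this->iFileFlags;
  }
}

$f = new myfile(myfile::DIRECTORY | myfile::SYMLINK);

// Both flags must be set, so compare the result against the full mask
$mask = myfile::DIRECTORY | myfile::SYMLINK;
if (($f->getFlags() & $mask) === $mask) {
  print "File is a symlinked directory!";
}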
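To show what the setter side of this idea could look like, here is a minimal sketch that combines the set, clear and test techniques from earlier in this post. The class `FileFlags` and its methods `set()`, `clear()` and `has()` are hypothetical names. The typed property requires PHP 7.4 or later.

```php
<?php
// Hypothetical sketch: one int property plus small helpers
// to set, clear and test flags
class FileFlags
{
    const DIRECTORY = 1;
    const READONLY  = 2;
    const SYMLINK   = 4;
    const BLOCKDEV  = 8;
    const CHARDEV   = 16;

    private int $flags = 0;

    public function set(int $mask): void
    {
        $this->flags |= $mask;   // turn the bits in $mask on
    }

    public function clear(int $mask): void
    {
        $this->flags &= ~$mask;  // turn the bits in $mask off
    }

    public function has(int $mask): bool
    {
        // true only if *all* bits in $mask are set
        return ($this->flags & $mask) === $mask;
    }
}

$f = new FileFlags();
$f->set(FileFlags::DIRECTORY | FileFlags::SYMLINK);
var_dump($f->has(FileFlags::DIRECTORY | FileFlags::SYMLINK)); // bool(true)

$f->clear(FileFlags::SYMLINK);
var_dump($f->has(FileFlags::DIRECTORY | FileFlags::SYMLINK)); // bool(false)
var_dump($f->has(FileFlags::DIRECTORY));                      // bool(true)
```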

Conclusion

Bit manipulation comes in very handy from time to time. It can save memory and time, and CAN increase readability when used correctly. Examples of bit manipulation are easy to spot. Just look at the error_reporting function in PHP: http://nl.php.net/manual/en/function.error-reporting.php. Even if you didn’t understand what was going on there before, I hope you do now…
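Here is an example of that in action: the common "report everything except notices" setting, built with the same masking trick as the ~16 example. `error_reporting()`, `E_ALL`, `E_NOTICE` and `E_WARNING` are real PHP built-ins; the particular level chosen here is only an example:

```php
<?php
// Report all errors except notices: clear the E_NOTICE bit from E_ALL
error_reporting(E_ALL & ~E_NOTICE);

// error_reporting() without arguments returns the current level,
// so individual flags can be tested with &
$level = error_reporting();
var_dump(($level & E_NOTICE) !== 0);  // bool(false): notices are off
var_dump(($level & E_WARNING) !== 0); // bool(true): warnings are on
```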
