Incrementing values in PHP

Posted on 13 Oct 2015
Tagged with: [ Bytecode ]  [ Internals ]  [ PHP ]

Take a variable and increment it by 1. That sounds like a simple enough job, right? From a PHP developer’s point of view it might seem so, but is it really? There are bound to be some catches (otherwise we wouldn’t write a blogpost about it). There are a few different ways to increment a value, and while they MIGHT seem similar, they work and behave differently under the hood of PHP, which can lead to - let’s say - interesting results.

There seem to be many different ways of adding 1 to a variable. Take a look at these three examples:

$a = 1;
$a++;           # Unary increment operator
var_dump($a);

$b = 1;
$b += 1;        # Add assignment operator
var_dump($b);

$c = 1;
$c = $c + 1;    # Standard add operator
var_dump($c);

Different code, but all three blocks will increment the number. But will they all result in the same output?

int(2)
int(2)
int(2)

That seems intuitive enough, and the three look equivalent. So it seems that using $a++ is just as valid as using $a += 1 for incrementing. But let’s take a look at another example:

$a = "foo";
$a++;
var_dump($a);

$a = "foo";
$a += 1;
var_dump($a);

$a = "foo";
$a = $a + 1;
var_dump($a);

string(3) "fop"
int(1)
int(1)

I reckon most of you weren’t expecting this outcome! Some of you probably knew that incrementing a string results in different characters and guessed the fop string right, but the two int(1)’s? Where do they come from? From a PHP developer’s point of view this seems very inconsistent, and it now seems that these three statements actually aren’t equal. So let’s take a look at what is actually happening under the hood of PHP when executing the code.

The bytecode

When a PHP script runs, the first thing PHP does is compile your PHP code into an intermediate format called byte code (this also debunks the idea that PHP is a truly interpreted language: it’s the byte code that gets interpreted, not the actual PHP source code). Our first example (the one with $a, $b and $c) compiles to the following byte code:

compiled vars:  !0 = $a, !1 = $b, !2 = $c
line     #* E I O op                 fetch          ext  return  operands
---------------------------------------------------------------------------
    3     0  E >   ASSIGN                                         !0, 1
    4     1        POST_INC                               ~1      !0
          2        FREE                                           ~1
    5     3        SEND_VAR                                       !0
          4        DO_FCALL                            1          'var_dump'

    7     5        ASSIGN                                         !1, 1
    8     6        ASSIGN_ADD                          0          !1, 1
    9     7        SEND_VAR                                       !1
          8        DO_FCALL                            1          'var_dump'

   11     9        ASSIGN                                         !2, 1
   12    10        ADD                                    ~7      !2, 1
         11        ASSIGN                                         !2, ~7
   13    12        SEND_VAR                                       !2
         13        DO_FCALL                            1          'var_dump'

         14      > RETURN                                         1

You can easily create this kind of opcode dump yourself with the help of Derick Rethans’ VLD extension, or online through 3v4l.org. Don’t worry about what it all means. If we get rid of all the uninteresting things, we only keep these lines:

compiled vars:  !0 = $a, !1 = $b, !2 = $c
line     #* E I O op                 fetch          ext  return  operands
---------------------------------------------------------------------------
  4     1        POST_INC                               ~1      !0
        2        FREE                                           ~1

  8     6        ASSIGN_ADD                          0          !1, 1

 12    10        ADD                                    ~7      !2, 1
       11        ASSIGN                                         !2, ~7

So $a++ results in two opcodes (POST_INC and FREE), $a += 1 in one (ASSIGN_ADD), and $a = $a + 1 in two again (ADD and ASSIGN). Notice that all three produce different opcodes, which already implies that the actual code executed by PHP will differ as well.

Unary increment operator

Let’s talk about the first way of incrementing, the unary increment operator ($a++). This PHP code results in the POST_INC opcode (its partner PRE_INC would be the result of ++$a, and you should know the difference between the two). The second opcode, FREE, frees up the result from POST_INC, as we don’t use its return value (POST_INC changes the actual operand in-place). We can ignore this opcode for our case.
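As a quick refresher on the difference between the two opcodes, here is a minimal sketch. The variable names are only illustrative; the point is what each expression evaluates to.

```php
<?php
// Post-increment (POST_INC): the expression yields the OLD value,
// and $a is incremented afterwards.
$a = 5;
$post = $a++;   // $post is 5, $a is now 6

// Pre-increment (PRE_INC): $b is incremented first,
// and the expression yields the NEW value.
$b = 5;
$pre = ++$b;    // $pre is 6, $b is now 6

var_dump($post, $a, $pre, $b);
```

When the result of the expression is thrown away, as in a bare $a++; statement, both forms leave the variable in the same state.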

The magic that defines what happens when these opcodes are executed is located in a file called zend_vm_def.h, which can be found in the C source code of PHP. It’s a large C-language header file full of macros, so it might be a bit hard to read, even if you know C. Let’s take a look at what happens during a POST_INC opcode call, defined at line 971 of that file (don’t worry, you don’t need to know C):

In a nutshell it does the following:

  • Check if the variable ($a in PHP code, which in our bytecode is referenced as !0) is of the type long. Basically this checks whether the variable contains an integer (even though PHP is loosely typed, every variable still has a “type”, and it can switch types, as we will see later). If it’s a long, it calls the C function fast_increment_function() and continues with the next opcode.
  • If the variable is not a long, it does some basic checks to see if incrementing is possible (you can’t do this on string offsets, for instance: $a = "foobar"; $a[2]++ will result in an error).
  • Next, check if the variable is a non-existing property of an object and the object has the __get and __set magic PHP methods. If so, use __get to fetch the correct value, call fast_increment_function() and store the value by calling the __set method (it actually calls these methods from C, not from within PHP).
  • Finally, if the variable is not such a property, just call increment_function().

As you can see, incrementing behaves differently based on the type of the variable. It pretty much boils down to calling fast_increment_function() when it’s a number or a magic property, and calling increment_function() otherwise. We’ll discuss these functions below, as the real work is done there.

The fast_increment_function()

The fast_increment_function() is a function located in zend_operators and its job is to increment a certain variable as fast as possible.

If the given variable is a long, it will actually use some very fast assembly code to increment the value. If the value has reached the maximum int value (LONG_MAX), the variable automatically gets converted to a double. Since this piece of code is written in assembly, this is the fastest way to increment a number (provided that the compiler cannot optimize its C code better than this assembly code), but it only works when the variable is a long. If the given variable is not a long, it simply redirects to increment_function(). Since incrementing (and decrementing) mostly happens in very tight inner loops (for-statements, for instance), doing this as fast as possible is essential to keeping PHP quick.
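The overflow behaviour is easy to observe from userland. A small sketch, using the PHP_INT_MAX constant as the userland counterpart of LONG_MAX:

```php
<?php
$n = PHP_INT_MAX;           // largest value a long can hold on this platform
$before = gettype($n);      // "integer"

$n++;                       // overflow: the value is converted to a double
$after = gettype($n);       // "double"

var_dump($before, $after, is_float($n));
```

Note that gettype() reports a float as "double" for historical reasons, which happens to match the internal naming used here.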

increment_function()

So if the fast_increment_function() is the fast way of incrementing a number, the increment_function is the slow way of doing this. How something is incremented from this point, is again based on the type of the variable.

  • If the variable is a long, it will simply increase the number (and convert it to a double, if we reached the maximum value that can be stored inside a long). Most of the time, this would already be taken care of by the fast_increment_function, but it might happen that we enter this function with a long anyway, so we must check it here as well.
  • If the variable is a double, we simply increase the double.
  • If the variable is a NULL, we return a long 1 (always!).
  • If the variable is a string, we do some magic we discuss later.
  • If the variable is an object that has internal operator functionality, call the add operator to add the long 1 to it. Note that this only works for internal classes that have manually defined these operator functions, as you cannot define operators on objects in userland PHP code. The only class I found in a quick scan through the PHP source code that actually implements this is the GMP class, so you can do $a = gmp_init(1) + gmp_init(3); // gmp(4). This is a new feature of GMP since PHP 5.6, but operator overloading is not directly possible in PHP itself.
  • If the variable is of some other type than the ones above, we can’t increment it and return a failure code.

So it takes care of objects, doubles, nulls etc. It does not handle for instance booleans, indicating that you cannot increment a boolean. So $a = false; $a++ won’t work, but also won’t return an error. It just won’t change the variable (it stays false).
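A short sketch of the simpler cases from the list above. Booleans are left out on purpose, since they are covered in the conclusion and newer PHP versions may emit a diagnostic when you try to increment them.

```php
<?php
$null = null;
$null++;            // NULL always becomes int(1)

$float = 1.5;
$float++;           // a double is simply increased: float(2.5)

$int = 41;
$int++;             // a long is simply increased: int(42)

var_dump($null, $float, $int);
```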

Incrementing strings

Now for the fun part. Incrementing strings. Dealing with strings is always tricky, but here is what happens:

First, a check is done to see if a given string actually contains a number. For instance, the string "123" contains the number 123. This string-number will be converted into an actual long (thus int(123)). There are a few catches though when trying to convert:

  • Leading white space is stripped.
  • Hex numbers are supported (0x123) in PHP 5; PHP 7 and later no longer treat hexadecimal strings as numeric.
  • Octal and binary notation (0123 and 0b11) are not supported: "0123" is simply read as the decimal number 123.
  • Scientific notation is supported (1E5).
  • Doubles are supported.
  • Strings with a numeric prefix or suffix (like 135abc or ab123) are not supported and are not considered a number.

If the output of this check is a long or double, it will simply increase the number. This means that when using a string 123 and increment it, the output will be int(124) (note that it changes the variable type from a string to an int!).

If the string could not be converted into a long or double, it will call the function increment_string() instead.
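The following sketch shows both paths: numeric strings that change type when incremented, and a string with a numeric prefix that falls through to the string increment. The variable names are only illustrative.

```php
<?php
$s = "123";
$s++;               // numeric string: becomes int(124)

$d = "1.5";
$d++;               // numeric string holding a double: becomes float(2.5)

$e = "1E5";
$e++;               // scientific notation: becomes float(100001)

$t = "135abc";
$t++;               // not a number: stays a string, "135abd"

var_dump($s, $d, $e, $t);
```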

increment_string()

PHP uses a Perl-like string incrementing system. If a string is empty, it will simply return string("1"). Otherwise, it will use a carry-system to increment the string:

Start from the back of the string. If the character is between ‘a’ and ‘z’, increment this character (a becomes b, etc). If the character is z, wrap around to a, and carry one over to the string position before it.

So: a becomes b, ab becomes ac (no carry needed), az becomes ba (z becomes a and a becomes b because we carry one character).

Same goes with uppercase A to Z and with digits 0 to 9. When incrementing a 9 it wraps to 0 and carries one.

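When we reach the beginning of the string and still need to carry, we add another character IN FRONT of the string, of the same type as the one we carried from ("a" for lowercase, "A" for uppercase, "1" for digits):

 "z" =>  "aa"
"Zz" => "AAa"
"9z" => "10a"
"Z9" => "AA0"

Note that a string consisting only of digits, like "9", never gets this far: it is a numeric string, so it simply becomes int(10).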

So when incrementing a string, we can never change the type of each character: if it’s a lowercase letter, it will always stay a lower case letter.
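To make the carry rules concrete, here is a userland sketch of them. The function name sketch_increment_string() is hypothetical and this is not PHP’s actual C implementation; it also skips the numeric-string check that the real ++ operator performs first, and it assumes that a character outside a-z, A-Z and 0-9 simply stops the carry.

```php
<?php
// Hypothetical userland re-implementation of the carry rules described above.
function sketch_increment_string(string $s): string
{
    if ($s === '') {
        return '1';                       // empty string becomes "1"
    }

    $chars = str_split($s);
    $prepend = '';

    // Walk from the last character towards the first.
    for ($i = count($chars) - 1; $i >= 0; $i--) {
        $o = ord($chars[$i]);

        if ($o >= ord('a') && $o <= ord('z')) {
            $first = 'a'; $last = 'z'; $prepend = 'a';
        } elseif ($o >= ord('A') && $o <= ord('Z')) {
            $first = 'A'; $last = 'Z'; $prepend = 'A';
        } elseif ($o >= ord('0') && $o <= ord('9')) {
            $first = '0'; $last = '9'; $prepend = '1';
        } else {
            // Assumption: any other character stops the carry.
            return implode('', $chars);
        }

        if ($chars[$i] !== $last) {
            $chars[$i] = chr($o + 1);     // no wrap-around, so no carry: done
            return implode('', $chars);
        }

        $chars[$i] = $first;              // wrap around and carry to the left
    }

    // We carried past the first character: grow the string at the front.
    return $prepend . implode('', $chars);
}

var_dump(sketch_increment_string('az'));  // string(2) "ba"
var_dump(sketch_increment_string('Zz'));  // string(3) "AAa"
var_dump(sketch_increment_string('9z'));  // string(3) "10a"
```

For strings that are not numeric, the sketch should give the same result as the native operator, so $s = "Zz"; $s++; also ends up as "AAa".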

But be careful: when incrementing a “string-number” multiple times:

Incrementing string("2D9") will result in string("2E0") (since string("2D9") is not a number, thus the regular string increment will happen). But, when incrementing string("2E0"), it will result in double(3): 2E0 is the scientific notation for 2, thus it will convert it to a double, and then increment that double into 3. So be careful with loops and increments!
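The type switch described above takes only two increments to reproduce. A small sketch, with $id as an illustrative variable name:

```php
<?php
$id = "2D9";

$id++;              // "2D9" is not numeric: string increment gives "2E0"
$first = $id;

$id++;              // "2E0" IS numeric (2 x 10^0), so it becomes float(3)
$second = $id;

var_dump($first, $second);
```

If you generate identifiers by repeatedly incrementing a string, a sequence like this can silently turn your string into a float halfway through.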

 

This string-increment system might also explain why we can increment the string “Z” to “AA”, but cannot decrement “AA” back to “Z”. We could decrement the last “A” back to a “Z”, but what would we do with the first “A”? Should it also decrement to a “Z” because of a (negative) carry? And what about “0A”? Would that become “Z”? If so, incrementing that again would result in “AA”, not “0A”. In other words: we cannot simply remove characters when decrementing the way we can add characters when incrementing.

 

Add assignment expression

So let’s take a look at the second piece of PHP code, the add assignment expression ($a += 1). This seems similar to the unary increment operator, but it behaves differently, both in the generated opcodes and in actual execution. It is ultimately processed by zend_binary_assign_op_helper, which, after some checks, calls add_function (http://lxr.php.net/xref/phpng/Zend/zend_operators.c#921) with two operands: $a and our int(1) value.

add_function()

The add_function behaves differently based on the types of the variables. It mainly consists of doing a type-check on the operand pair, to see what the variable types are of both operands:

  • If the two operands are both long, their values are simply added (and the result is converted to a double if overflowed).
  • If the two operands are a long and a double, both will be converted to a double, and added.
  • If the two operands are doubles, they are simply added together.
  • If both operands are arrays, they will be merged based on keys: $a = [ 'a', 'b' ] + [ 'c', 'd' ]; will result in [ 'a', 'b' ], because both arrays happen to use the same keys (0 and 1) and the values of the first array win. Note that it does not merge on values, only on keys.
  • Next, it will see if the operands are objects, and check if the first operand has internal operator functionality (just like in the increment_function() method). Again, this is not something that you can create yourself in PHP; it is only supported for internal classes like the GMP class.

If all else fails because the operands are of different types (like a string and a long), it will convert both operands into numbers through the zendi_convert_scalar_to_number method. Once converted, it basically retries the whole add_function, and this time it will probably match one of the pairs above.
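The operand pairs handled directly by add_function() can be illustrated from userland. This is only a sketch of the observable results, with illustrative variable names:

```php
<?php
$ints  = 1 + 2;                 // long + long: int(3)
$mixed = 1 + 1.5;               // long + double: float(2.5)
$over  = PHP_INT_MAX + 1;       // long + long overflow: converted to a double

// Array + array is a union on KEYS: existing keys in the left array win.
$union = ['a', 'b'] + ['c', 'd'];                    // keys 0 and 1 clash
$keyed = ['x' => 1] + ['x' => 2, 'y' => 3];          // only 'y' is added

var_dump($ints, $mixed, $over, $union, $keyed);
```

If you want the values of both arrays appended instead, array_merge() is the function to reach for, not the + operator.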

zendi_convert_scalar_to_number()

Converting a scalar to number depends on the scalar type. It basically boils down to this:

  • If the scalar is a string, check to see if it contains a number through is_numeric_string. If it does not contain a numerical value, return int(0).
  • If the scalar is null, or boolean false, return int(0).
  • If the scalar is a boolean true, return int(1).
  • If the scalar is a resource, return the numerical value of the resource number.
  • If the scalar is an object, try and cast the object to a long (just like the internal operators, there could also be internal cast functionality, again, not always implemented, and only available for core classes, not php userland classes).
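A few of these conversions are easy to observe through the + operator. The sketch below sticks to operands that convert cleanly; non-numeric strings are left out because their handling differs between PHP versions.

```php
<?php
$fromNull   = null + 1;     // null converts to int(0): result int(1)
$fromFalse  = false + 1;    // false converts to int(0): result int(1)
$fromString = "5" + 1;      // numeric string converts to int(5): result int(6)
$fromFloat  = "1.5" + 1;    // numeric string converts to float(1.5): result float(2.5)

var_dump($fromNull, $fromFalse, $fromString, $fromFloat);
```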

Add operator

The add operator is the simplest of the three. It boils down to calling the function fast_add_function(). Just like fast_increment_function(), it uses some direct assembly code to add the numbers if both operands are a long or a double. If that’s not possible, it redirects to add_function(), the same function that is used by the assignment expression.

Since the add operator and the add-assignment expression use the same underlying functionality, $a = $a + 1 and $a += 1 produce the same result. The only difference is that the add operator CAN take the fast path when both operands are long or double, so IF you want to do some micro-optimization, $a = $a + 1 can be faster than $a += 1 because of fast_add_function(), even though it needs an additional opcode (ASSIGN) to store the result back into $a.

Conclusion

Incrementing a value behaves differently from adding to a value: the add_function actually converts types into compatible pairs, while the increment_function does not. We can explain the following results now:

$a = false;
$a++;
var_dump($a);   // bool(false)

$a = false;
$a += 1;
var_dump($a);   // int(1)

$a = false;
$a = $a + 1;
var_dump($a);   // int(1)

Since increment_function does not convert the boolean value (it’s not a number, or a string that can be converted into a number), it fails (silently) and does not increment the value, thus leaving it at bool(false). The add_function tries to match a boolean and long pair, which doesn’t exist. Thus it converts both values to long: the bool(false) gets converted to int(0) and int(1) just stays int(1). Now we have a long & long pair, so the add_function can simply add them, resulting in int(1). (Question: what would a boolean true + int(1) become?)

Another weird thing we can explain now:

$a = "foo";
$a++;
var_dump($a);   // string("fop")

$a = "foo";
$a += 1;
var_dump($a);   // int(1)

$a = "foo";
$a = $a + 1;
var_dump($a);   // int(1)

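The increment does a normal string increment, as it cannot convert the string into a number. The add expressions convert the strings into longs by checking if a number is present. Since there isn’t one, the string is converted to int(0) and int(1) is simply added to it. Note that this is the behaviour of the PHP versions current at the time of writing; on PHP 8 and later, adding a non-numeric string such as "foo" to a number throws a TypeError instead.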