Incrementing values in PHP
Take a variable and increment it by 1. That sounds like a simple enough job, right? From a PHP developer’s point of view that might seem to be the case, but is it really? There are bound to be some catches (otherwise we wouldn’t write a blog post about it). There are a few different ways to increment a value, and although they MIGHT seem similar, they work and behave differently under the hood of PHP, which can lead to, let’s say, interesting results.
Let’s take a look. There seem to be many different ways of adding 1 to a variable. Take a look at these three examples:
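A minimal sketch of the three variants, reconstructed from the opcode dump shown later in this post (the variable names come from its compiled vars line; the exact layout of the snippet is an assumption):

```php
<?php

$a = 1;
$a++;
var_dump($a); // int(2)

$b = 1;
$b += 1;
var_dump($b); // int(2)

$c = 1;
$c = $c + 1;
var_dump($c); // int(2)
```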
Different code, but all three blocks will increment the number. But will they all result in the same output?
All three print int(2), which seems intuitive enough: they all look equivalent. So it seems that using $a++ is just as valid as using $a += 1 for incrementing. But let’s take a look at another example:
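The same three statements applied to a string could look like the sketch below. The starting value "foo" is an assumption inferred from the fop result discussed next. The outputs in the comments are the ones this post describes for the PHP 5 series; newer PHP versions may warn or throw a TypeError for the two additions, which is why they are wrapped in a try block here.

```php
<?php
// Unary increment: a string increment.
$a = "foo";
$a++;
var_dump($a); // string(3) "fop"

// The two add forms. Guarded, because newer PHP versions may reject
// arithmetic on a non-numeric string instead of converting it.
try {
    $b = "foo";
    $b += 1;
    var_dump($b); // int(1) on the PHP 5 series

    $c = "foo";
    $c = $c + 1;
    var_dump($c); // int(1) on the PHP 5 series
} catch (TypeError $e) {
    echo "TypeError: ", $e->getMessage(), "\n";
}
```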
I reckon most aren’t expecting this outcome! Maybe some of you knew that incrementing a string results in different characters and guessed the fop string right, but the two int(1)’s? Where do they come from? From a PHP developer’s point of view this seems very inconsistent, and it now seems that these three statements actually aren’t equal. So let’s take a look at what is actually happening under the hood of PHP when executing the code.
The bytecode
When a PHP script runs, the first thing PHP does is compile your PHP code into an intermediate format called bytecode (this also debunks the idea that PHP is a truly interpreted language: it’s the bytecode that gets interpreted, not the actual PHP source code). Our first example compiles to the following bytecode:
compiled vars:  !0 = $a, !1 = $b, !2 = $c
line  #*  E I O  op          fetch  ext  return  operands
---------------------------------------------------------
   3   0  E >    ASSIGN                          !0, 1
   4   1         POST_INC                ~1      !0
       2         FREE                            ~1
   5   3         SEND_VAR                        !0
       4         DO_FCALL           1            'var_dump'
   7   5         ASSIGN                          !1, 1
   8   6         ASSIGN_ADD         0            !1, 1
   9   7         SEND_VAR                        !1
       8         DO_FCALL           1            'var_dump'
  11   9         ASSIGN                          !2, 1
  12  10         ADD                     ~7      !2, 1
      11         ASSIGN                          !2, ~7
  13  12         SEND_VAR                        !2
      13         DO_FCALL           1            'var_dump'
      14    >    RETURN                          1
You can easily generate this kind of opcode dump yourself with the help of Derick Rethans’ VLD debugger, or online through 3v4l.org. Don’t worry about what it all means. If we get rid of all the uninteresting parts, only these lines remain:
compiled vars:  !0 = $a, !1 = $b, !2 = $c
line  #*  E I O  op          fetch  ext  return  operands
---------------------------------------------------------
   4   1         POST_INC                ~1      !0
       2         FREE                            ~1
   8   6         ASSIGN_ADD         0            !1, 1
  12  10         ADD                     ~7      !2, 1
      11         ASSIGN                          !2, ~7
So an $a++ results in two opcodes (POST_INC and FREE), $a += 1 in one (ASSIGN_ADD), and $a = $a + 1 in two again (ADD and ASSIGN). Notice that all three result in different opcodes, which already implies that the code PHP actually executes will differ as well.
Unary increment operator
Let’s talk about the first way of incrementing, the unary increment operator ($a++). This PHP code results in the POST_INC opcode (its partner PRE_INC would be the result of ++$a, and you should know the difference between the two). The second opcode, FREE, frees up the result of POST_INC, as we don’t use its return value (POST_INC changes the actual operand in place). We can ignore this opcode for our case.
The magic that defines what happens when these opcodes are executed is located in the file zend_vm_def.h, which can be found in the C source code of PHP. It’s a large C header file full of macros, so it might be a bit hard to read, even if you know C. Let’s take a look at what happens during a POST_INC opcode call, defined at line 971 of that file (don’t worry, you don’t need to know C).
In a nutshell it does the following:
- Check if the variable ($a in PHP code, which in our bytecode is referenced as !0) is of the type long. Basically this checks whether the variable contains an integer number (even though PHP is loosely typed, every variable still has a “type”, and it can switch types, as we will see later). If it’s a long, it calls the C function fast_increment_function() and continues with the next opcode.
- If the variable is not a number, it does some basic checks to see if incrementing is possible (you can’t do this on string offsets, for instance: $a = "foobar"; $a[2]++ results in an error).
- Next, check if the variable is a non-existing property of an object and whether that object has the __get and __set magic methods. If so, use __get to fetch the current value, call fast_increment_function(), and store the value by calling __set (these methods are called from C, not from within PHP).
- Finally, if the variable is not a property, just call increment_function().
As you can see, incrementing a value behaves differently based on the type of the variable. It pretty much boils down to calling fast_increment_function() when it’s a number or a magic property, and calling increment_function() otherwise. We’ll discuss these functions below, as the real work is done there.
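The __get and __set path from the list above can be observed from userland. This is a minimal sketch: the Magic class, its hits property and the log array are made up for illustration, and the log only records which magic methods PHP calls.

```php
<?php
// Hypothetical class: "hits" is not a declared property, so every access
// to it goes through the magic methods.
class Magic
{
    private $data = ['hits' => 41];
    public $log = [];

    public function __get($name)
    {
        $this->log[] = "get $name";
        return $this->data[$name];
    }

    public function __set($name, $value)
    {
        $this->log[] = "set $name";
        $this->data[$name] = $value;
    }
}

$m = new Magic();
$m->hits++;          // read through __get, increment, write back through __set
print_r($m->log);    // "get hits" followed by "set hits"
var_dump($m->hits);  // int(42)
```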
The fast_increment_function()
The fast_increment_function() is a function located in zend_operators and its job is to increment a variable as fast as possible.
If the given variable is a long, it uses some very fast assembly code to increment the value. If the value has reached the maximum int value (LONG_MAX), the variable is automatically converted to a double. Since this piece of code is written in assembly, it is the fastest way to increase a number (provided that the compiler cannot optimize the equivalent C code better than this assembly code), but it only works when the variable is a long. If the given variable is not a long, it simply redirects to increment_function(). Since incrementing (and decrementing) mostly happens in very tight inner loops (like in for statements), doing this as fast as possible is essential to keeping PHP quick.
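The switch from long to double at the maximum value is easy to see from PHP itself. A small sketch, using PHP_INT_MAX as the userland counterpart of LONG_MAX:

```php
<?php
$i = PHP_INT_MAX;
var_dump(is_int($i));   // bool(true)

$i++;                   // overflows the long, so the value becomes a double
var_dump(is_float($i)); // bool(true)
```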
increment_function()
So if fast_increment_function() is the fast way of incrementing a number, increment_function() is the slow way of doing it. How something is incremented from this point on again depends on the type of the variable.
- If the variable is a long, it simply increases the number (and converts it to a double if we reached the maximum value that can be stored inside a long). Most of the time this is already taken care of by fast_increment_function(), but we might enter this function with a long anyway, so we must check for it here as well.
- If the variable is a double, we simply increase the double.
- If the variable is NULL, we return a long 1 (always!).
- If the variable is a string, we do some magic that we discuss later.
- If the variable is an object and has internal operator functionality, call the add operator to add the long 1 to it. Note that this only works for internal classes that have manually defined these operator functions, as you cannot define operators on objects in userland PHP code. The only class I found in a quick scan through the PHP source code that actually implements this is the GMP class, so you can do $a = gmp_init(1) + gmp_init(3); // gmp(4). This is a new feature of GMP since PHP 5.6, but operator overloading is not directly possible in userland PHP.
- If the variable is of some other type than the ones above, we can’t increment it and return a failure code.
So it takes care of objects, doubles, nulls, etc. It does not handle booleans, for instance, which means that you cannot increment a boolean. So $a = false; $a++ won’t work, but it won’t raise an error either. It just doesn’t change the variable (it stays false).
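A quick sketch of the non-string cases from the list above (the variable names are arbitrary). Recent PHP versions may additionally emit a warning for the boolean case, but the value still does not change:

```php
<?php
$n = null;
$n++;
var_dump($n); // int(1)

$d = 1.5;
$d++;
var_dump($d); // float(2.5)

$f = false;
$f++;
var_dump($f); // bool(false), the increment is silently ignored
```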
Incrementing strings
Now for the fun part. Incrementing strings. Dealing with strings is always tricky, but here is what happens:
First, a check is done to see if the given string actually contains a number. For instance, the string 123 contains the number 123. This string-number is converted into an actual long (thus int(123)). There are a few catches, though, when trying to convert:
- Leading whitespace is skipped.
- Hex numbers are supported (0x123). This holds for the PHP 5 series discussed here; PHP 7 no longer treats hex strings as numeric.
- Octal and binary (0123 and b11) are not supported.
- Scientific notation is supported (1E5).
- Doubles are supported.
- Strings with a numeric prefix or postfix (like 135abc or ab123) are not supported and are not considered a number.
If the outcome of this check is a long or double, the number is simply increased. This means that when you take the string 123 and increment it, the result is int(124) (note that the variable type changes from a string to an int!).
If the string could not be converted into a long or double, the function increment_string() is called instead.
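A few of the cases above, as a small sketch. Only inputs that behave the same across PHP versions are used; hex strings are left out on purpose, since their handling changed in PHP 7.

```php
<?php
$int = "123";
$int++;
var_dump($int);   // int(124): the string became a long

$float = "1.5";
$float++;
var_dump($float); // float(2.5)

$sci = "1E5";
$sci++;
var_dump($sci);   // float(100001): scientific notation yields a double

$str = "ab123";
$str++;
var_dump($str);   // string(5) "ab124": not numeric, so a string increment
```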
increment_string()
PHP uses a Perl-like string incrementing system. If the string is empty, it simply returns string("1"). Otherwise, it uses a carry system to increment the string:
Start from the back of the string. If the character is between ‘a’ and ‘z’, increment this character (a becomes b, etc). If the character is z, wrap around to a and carry one over to the string position before it.
So: a becomes b, ab becomes ac (no carry needed), az becomes ba (z becomes a, and a becomes b because we carry one).
The same goes for uppercase A to Z and for the digits 0 to 9. When incrementing a 9, it wraps to 0 and carries one.
When we reach the beginning of the string and still need to carry, we just add another character IN FRONT of the string, of the same type as the one we carried: z becomes aa and Z becomes AA.
So when incrementing a string, we can never change the type of a character: if it’s a lowercase letter, it will always stay a lowercase letter.
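To make the carry rules concrete, here is a userland sketch of them. The function perl_increment() is hypothetical and not part of PHP; the real logic lives in C inside the engine. Two details are assumptions of this sketch rather than something described above: a carry out of a digit prepends 1, and any non-alphanumeric character ends the carry.

```php
<?php
// Hypothetical helper that mimics the carry rules for non-numeric strings.
function perl_increment($s)
{
    if ($s === '') {
        return '1';
    }

    // [first, last] character of each class that takes part in the carry.
    $ranges = [['a', 'z'], ['A', 'Z'], ['0', '9']];
    $prefix = '';

    for ($pos = strlen($s) - 1; $pos >= 0; $pos--) {
        $code  = ord($s[$pos]);
        $range = null;
        foreach ($ranges as $candidate) {
            if ($code >= ord($candidate[0]) && $code <= ord($candidate[1])) {
                $range = $candidate;
                break;
            }
        }
        if ($range === null) {
            // Assumption: any other character ends the carry and stays as is.
            return $s;
        }
        if ($s[$pos] !== $range[1]) {
            $s[$pos] = chr($code + 1); // no wrap-around, so no carry: done
            return $s;
        }
        $s[$pos] = $range[0];          // wrap around and carry to the left
        // A carry out of the string adds '1' for digits, else the range start.
        $prefix = $range[0] === '0' ? '1' : $range[0];
    }

    return $prefix . $s;
}

var_dump(perl_increment('a'));  // string(1) "b"
var_dump(perl_increment('az')); // string(2) "ba"
var_dump(perl_increment('Zz')); // string(3) "AAa"
var_dump(perl_increment('a9')); // string(2) "b0"
```

Numeric strings never reach this code path in PHP itself, since they are converted to a long or double first.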
But be careful when incrementing a “string-number” multiple times. Incrementing string("2D9") results in string("2E0") (since 2D9 is not a number, the regular string increment happens). But incrementing string("2E0") results in double(3): 2E0 is the scientific notation for 2, so it is converted to a double, which is then incremented to 3. So be careful with loops and increments!
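The two steps look like this in code (the variable names are arbitrary):

```php
<?php
$s = "2D9";

$s++;
$afterFirst = $s;
var_dump($afterFirst); // string(3) "2E0": plain string increment

$s++;
var_dump($s);          // float(3): "2E0" is numeric, so it became a double
```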
This string-increment system might also explain why we can increment the string “Z” to “AA”, but cannot decrement “AA” back to “Z”. We could decrement the last “A” back to a “Z”, but what would we do with the first “A”? Should it also decrement to a “Z” because of a (negative) carry? And what about “0A”? Would that become Z? But if so, incrementing that again would result in AA. In other words: we cannot simply remove characters when decrementing the way we can add characters when incrementing.
Add assignment expression
So let’s take a look at the second piece of PHP code, the add assignment expression (basically $a += 1). This seems similar to the unary increment operator, but behaves differently, both in the generated opcodes and in actual execution. It is ultimately processed by zend_binary_assign_op_helper, which after some checks calls add_function (http://lxr.php.net/xref/phpng/Zend/zend_operators.c#921) with two operands: $a and our int(1) value.
add_function()
The add_function behaves differently based on the types of the variables. It mainly consists of a type check on the operand pair, to see what the types of both operands are:
- If both operands are long, their values are simply added (and the result is converted to a double on overflow).
- If one operand is a long and the other a double, both are converted to doubles and added.
- If both operands are doubles, they are simply added together.
- If both operands are arrays, they are merged based on keys: $a = [ 'a', 'b' ] + [ 'c', 'd' ]; results in [ 'a', 'b' ], because both arrays happen to have the same keys (0 and 1), so the values of the first array win. Note that it does not merge on values, only on keys.
- Next, it checks whether the operands are objects and whether the first operand has internal operator functionality (just like in increment_function()). Again, this is not something you can create yourself in PHP; it is only supported for internal classes like the GMP class.
If all of this fails because the operands are of different types (like a string and a long), it converts both operands to numbers through the zendi_convert_scalar_to_number method. Once converted, it basically retries the whole add_function again, but this time it will probably match one of the pairs above.
zendi_convert_scalar_to_number()
Converting a scalar to number depends on the scalar type. It basically boils down to this:
- If the scalar is a string, check whether it contains a number through is_numeric_string. If it does not contain a numerical value, return int(0).
- If the scalar is null or boolean false, return int(0).
- If the scalar is boolean true, return int(1).
- If the scalar is a resource, return the numerical value of the resource number.
- If the scalar is an object, try to cast the object to a long (just like internal operators, there can also be internal cast functionality; again, this is not always implemented and only available for core classes, not userland PHP classes).
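Some of these conversions can be seen directly by adding 1 to values of different types. This sketch sticks to inputs that behave the same across PHP versions; non-numeric strings are left out, because newer versions no longer quietly turn them into int(0).

```php
<?php
$fromNull   = null + 1;
$fromFalse  = false + 1;
$fromInt    = "5" + 1;
$fromDouble = "1.5" + 1;

var_dump($fromNull);   // int(1): null is treated as 0
var_dump($fromFalse);  // int(1): false is treated as 0
var_dump($fromInt);    // int(6): "5" is a numeric string holding a long
var_dump($fromDouble); // float(2.5): "1.5" is a numeric string holding a double
```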
Add operator
The add operator is the simplest one of the three. It boils down to calling the function fast_add_function(). Just like fast_increment_function(), it uses some direct assembly code to add the numbers if both operands are a long or double. If that is not possible, it redirects to add_function(), which is the same one that is used by the assignment expression.
Since the add operator and the add assignment expression use the same underlying functionality, $a = $a + 1 and $a += 1 behave the same. The only exception is that the add operator CAN take the fast path if both operands are long or double, so IF you want to do some micro-optimization, $a = $a + 1 may be faster than $a += 1 thanks to fast_add_function(). Keep in mind, though, that it also has to execute the additional ASSIGN opcode to store the result back into $a, so the gain is small at best.
Conclusion
Incrementing values behaves differently from adding values: add_function actually converts types into compatible pairs, while increment_function does not. We can now explain why incrementing false leaves it at bool(false), while adding 1 to false gives int(1):
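A minimal sketch of that comparison, using the same three statements as before (the exact original snippet is an assumption):

```php
<?php
$a = false;
$a++;
var_dump($a); // bool(false)

$b = false;
$b += 1;
var_dump($b); // int(1)

$c = false;
$c = $c + 1;
var_dump($c); // int(1)
```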
Since increment_function does not convert the boolean value (it’s not a number, or a string that can be converted into a number), it fails (silently) and does not increment the value, leaving it at bool(false). The add_function tries to match a boolean and long pair, which doesn’t exist. Thus it converts both values to long: the bool(false) gets converted to int(0) and int(1) just stays int(1). Now we have a long & long pair, so add_function can simply add them, resulting in int(1). (Question: what would a boolean true + int(1) become?)
Another weird thing we can explain now is the string example from the beginning. The increment does a normal string increment, as it cannot convert the string into a number. The add expressions convert the string into a long by checking if a number is present. Since there isn’t one, the string is converted to int(0) and int(1) is simply added to it.