PHP 7 performance improvements (4/5): References mismatch

Julien Pauli, PHP contributor and release manager, details what changed between PHP 5 and PHP 7, and how to migrate and make effective use of the language optimizations. All statements are documented with specific examples and Blackfire profiles. Fourth episode: References mismatch generating no overhead.

By Julien Pauli, on Dec 05, 2016

The following is the 4th episode of our blog series on PHP 7 performance improvements, detailed by Julien Pauli, PHP Contributor and Release Manager.

Read episode 1: Packed arrays

Read episode 2: ints/floats are free in PHP 7

Read episode 3: Encapsed strings optimization

Read episode 5: Immutable arrays

A reference mismatch happens when you pass a non-reference variable to a function parameter declared by reference, or the opposite. Something like this:

function foo(&$arg) { }
$var = 'str';
foo($var);
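The opposite direction can be sketched as follows: the variable already belongs to a reference set, but the function takes its parameter by value. The names byValue(), $data and $alias are made up for this illustration and do not come from the original example.

```php
<?php
// Hypothetical example: byValue(), $data and $alias are illustrative names.
function byValue($arg)
{
    // $arg is a by-value parameter: writing to it must not leak outside.
    $arg .= ' changed';
    return $arg;
}

$data = 'str';
$alias = &$data;            // $data is now part of a reference set

$result = byValue($data);   // reference variable passed to a by-value parameter

var_dump($data);            // the caller's variable is left untouched
var_dump($result);
```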

Do you know what horrible things happen under the hood in the PHP 5 engine? In PHP 5, whenever there is a reference mismatch, the engine is forced to duplicate the variable before passing it as an argument to the function. As you probably guessed, if the variable contains something big, like an array of several thousand entries or a large string, the copy takes a lot of time.

This is because of the way variables and references used to work in PHP 5. When entering the function body, the engine doesn't know yet whether you will change the argument value. If you do, then since the argument is taken by reference, the change must be visible outside the function when a reference variable was passed. What if that is not the case (as in our example)? The engine must then create a reference from the non-reference variable you passed in the function call. In PHP 5, this procedure leads the engine to fully duplicate the variable's content (calling memcpy() many times on widely scattered pointers, which causes a lot of slow memory accesses).

In PHP 7, when the engine wants to create a reference from a non-reference variable, it simply wraps the variable in a newly created reference, and that's all. There is no memory copy anywhere. This is because variables work in a very different way internally in PHP 7, and references have been deeply reworked.
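One way to observe this yourself is to measure memory from inside a by-reference function. The sketch below is only an illustration: memoryInsideCall() and $other are hypothetical names, and sharing the array with a second plain variable is an assumption added here so that the array really is shared when the reference gets created.

```php
<?php
// Hypothetical probe: memoryInsideCall() is an illustrative name.
// It takes its parameter by reference and only reports memory usage.
function memoryInsideCall(&$arg)
{
    return memory_get_usage();
}

$var = range(1, 100000);   // a reasonably large array
$other = $var;             // assumption: share the array with a second plain variable

$before = memory_get_usage();
$inside = memoryInsideCall($var);   // non-reference variable, by-ref parameter

// On PHP 7 this is expected to stay far below the size of the array itself.
printf("Memory growth during the call: %d bytes\n", $inside - $before);
```

Running the same script on both versions gives a rough comparison; the exact byte counts depend on the PHP build, so none are quoted here.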

Look at this piece of code:

function bar(&$a) { $f = $a; }

$var = range(1,1024);

for ($i = 0; $i < 1000; $i++) {
    bar($var);
}

There is a double mismatch. When you call bar(), you force the engine to create a reference from $var to $a, as the signature says &$a. Then, now that $a is part of a reference set ($var-$a) inside the bar() body, you assign it by value to $f: this is another mismatch. $f is not assigned by reference, so $a, $f and $var must not all be linked together: $var-$a form one set, and $f stands alone. Once more, you force the engine to perform some copies. In PHP 7, creating variables to or from references is pretty light; only copy-on-write may happen after that.
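If you want a rough number on your own machine, the same loop can be wrapped in a simple timer. This is a minimal sketch using microtime(), not a substitute for a real profile; the variable names $start and $elapsed are additions for this illustration.

```php
<?php
// Same code as above, with a wall-clock timer around the loop.
function bar(&$a) { $f = $a; }

$var = range(1, 1024);

$start = microtime(true);
for ($i = 0; $i < 1000; $i++) {
    bar($var);              // double mismatch on every call
}
$elapsed = microtime(true) - $start;

printf("PHP %s: 1000 calls took %.3f ms\n", PHP_VERSION, $elapsed * 1000);
```

Run it under PHP 5 and PHP 7 and compare the two lines of output.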

[Figure: ref-mis]

Keep in mind that if you don’t fully understand how references work in PHP (and not that many people understand them deeply and fully), you’d better not use them at all.

We have seen that PHP 7 once again saves a lot of resources in such use cases compared to PHP 5, but we have not triggered copy-on-write in our examples. Had we written to the variable, things would have been different: PHP would have been forced to duplicate the memory behind it. PHP 7 does, however, improve cases where mismatches are not done on purpose, like calling count($array) with $array being part of a reference: there is no overhead in PHP 7, but it burns your CPU in PHP 5 (assuming a big enough array, like the ones gathered from SQL queries, for example).
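The difference between the read-only case and the write case can be sketched like this. The helper names growthReadOnly() and growthAfterWrite() are hypothetical, and the sketch assumes PHP 7 behaviour; it only reports memory growth, it does not measure time.

```php
<?php
// Hypothetical helpers: growthReadOnly() and growthAfterWrite() are illustrative names.
function growthReadOnly(&$a)
{
    $before = memory_get_usage();
    $f = $a;                       // by-value assignment from a reference, no write
    return memory_get_usage() - $before;
}

function growthAfterWrite(&$a)
{
    $before = memory_get_usage();
    $f = $a;
    $f[] = 0;                      // writing to $f triggers copy-on-write
    return memory_get_usage() - $before;
}

$var = range(1, 100000);

$readOnly = growthReadOnly($var);
$afterWrite = growthAfterWrite($var);

printf("read only:   %d bytes\n", $readOnly);
printf("after write: %d bytes\n", $afterWrite);
```

On PHP 7, the first figure should stay tiny while the second should be roughly the size of the array, since the write is what forces the duplication.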

Next week (last episode): Immutable arrays.

Happy PHP 7’ing,

Julien Pauli

Julien is a web and system architect. He's been using PHP for more than a decade together with frameworks such as Symfony. He is now working at SensioLabs and is a PHP contributor and PHP 5.5/5.6 release manager. He tries to make PHP and its ecosystem better and more efficient.