PHP 5.4: RegexIterator::getRegex()

Posted on 06 Jan 2011
Tagged with: [ core ]  [ patch ]  [ PHP ]  [ regexiterator ]  [ spl ]

Recently, my colleague Jeroen van Dijk needed to extend (or better yet: override) the accept() method of the RegexIterator. This turned out to be harder in practice than it sounds. After extending and overriding multiple methods he found an acceptable solution, but there was room for improvement. Starting with PHP 5.4, that improvement is available through the RegexIterator::getRegex() method.

The problem

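SPL’s RegexIterator::accept() matches the pattern against the string value (or key) of each element, so out of the box it only handles a flat list of strings like this: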

    Array
    (
        [0] => cat
        [1] => hat
        [2] => sat
        [3] => bat
    )
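For reference, here is a minimal sketch of the stock RegexIterator filtering such a flat list. The pattern `/^[ch]at$/` and the variable names are only chosen for illustration:

```php
<?php
// Wrap the flat list of strings in an iterator that RegexIterator can consume.
$words = new ArrayIterator(array('cat', 'hat', 'sat', 'bat'));

// Default mode (RegexIterator::MATCH): keep the elements whose value matches.
$iterator = new RegexIterator($words, '/^[ch]at$/');

// Collect the accepted values, discarding the original keys.
$matches = iterator_to_array($iterator, false);

print_r($matches);
```

Only the elements accepted by the pattern end up in `$matches`; everything else is skipped by the built-in accept().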

Now, let’s pretend that we have a deeper nested structure:

    Array
    (
        [0] => stdClass Object
            (
                [item] => hat
            )
    
        [1] => stdClass Object
            (
                [item] => cat
            )
    
        [2] => stdClass Object
            (
                [item] => sat
            )
    
        [3] => stdClass Object
            (
                [item] => bat
            )
    )

To create a RegexIterator that can handle this, we need to extend the class, something like this:

<?php
class MyRegexIterator extends RegexIterator {
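  // Our own copy of the pattern: before PHP 5.4 the parent class does not expose it.
  protected $_regex;

  // Same defaults as RegexIterator::__construct(), so the optional arguments stay optional.
  public function __construct($iterator, $regex, $mode = RegexIterator::MATCH, $flags = 0, $preg_flags = 0) {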
    $this->_regex = $regex;
    parent::__construct($iterator, $regex, $mode, $flags, $preg_flags);
  }

  public function getRegex() {
    return $this->_regex;
  }

  public function accept() {
    $current = $this->current();
    if (! isset($current->item)) {
      return false;
    }

    if (preg_match($this->getRegex(), $current->item)) {
      return true;
    }

    return false;
  }
}

This needs an “ugly” workaround: saving the regex in a custom constructor, even though deep under PHP’s hood this data is already present. With the help of the new getRegex() method, we can leave the constructor alone:

<?php

class MyNewerRegexIterator extends RegexIterator {
  public function accept() {
    $current = $this->current();
    if (! isset($current->item)) {
      return false;
    }

    if (preg_match($this->getRegex(), $current->item)) {
      return true;
    }

    return false;
  }
}
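To see the approach end to end, the following sketch runs a getRegex()-based subclass over the nested structure from the problem description. The class name `ItemRegexIterator`, the pattern and the variable names are hypothetical and only serve the example; it assumes PHP 5.4 or later, where RegexIterator::getRegex() exists:

```php
<?php
// Hypothetical subclass for illustration; same accept() logic as shown above.
class ItemRegexIterator extends RegexIterator {
  // The next line is an ordinary comment before PHP 8; on PHP 8.1+ it is an
  // attribute that silences the return-type deprecation notice.
  #[\ReturnTypeWillChange]
  public function accept() {
    $current = $this->current();
    // Match against the "item" property, using the pattern stored by the parent.
    return isset($current->item)
      && preg_match($this->getRegex(), $current->item) === 1;
  }
}

// Build the list of stdClass objects from the example structure.
$objects = array();
foreach (array('hat', 'cat', 'sat', 'bat') as $word) {
  $object = new stdClass();
  $object->item = $word;
  $objects[] = $object;
}

// No custom constructor needed: the parent keeps track of the pattern.
$iterator = new ItemRegexIterator(new ArrayIterator($objects), '/^[ch]at$/');

// Collect the "item" value of every object the iterator accepts.
$matched = array();
foreach ($iterator as $object) {
  $matched[] = $object->item;
}

print_r($matched);
```

The accept() body is condensed into a single boolean expression here, which is a stylistic choice and behaves the same as the longer if/return version.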

The patch that adds this method has been accepted into PHP 5.4, which makes life just a little better… Stay tuned for more patches :)