Markus Malkusch's weblog 

Entries tagged [php]

Monkey patch a PHP class

by Markus Malkusch


Posted on Friday Jan 08, 2016 at 01:01AM in Technology


Monkey patch

PHP-Mock is a testing library for mocking built-in PHP functions. This is done with a namespace monkey patch of a PHP function. Today I want to show a way to monkey patch a userland class with plain PHP (i.e. without extensions like runkit or UOPZ).

First of all, the motivation: Why the hell would I want such a thing? Honestly, I don't! What I want is to integrate PHP-Mock with Prophecy. To provide complete functionality, that integration must be able to prophesize functions which have call-by-reference parameters (e.g. exec()). Unfortunately upstream has one offending line of code which simply eats references. I wanted to prepare a pull request, but upstream seems to be reluctant:

So I would even consider that not being able to prophesize such API is a Prophecy feature rather than a bug [..]. This goes in line with the PhpSpec way: making it hard to work with badly designed APIs [..].

I do very much agree that PHP has a badly designed API, but I still want to provide the possibility to prophesize such badly designed functions. So let's replace that reference-eating class locally. I did this experimental operation in c9cd844. In this commit you see (next to some documentation and an accidentally committed test) my own implementation of Prophecy\Doubler\ClassPatch\ProphecySubjectPatch and a change in composer.json:

     "autoload": {
-        "psr-4": {"phpmock\\prophecy\\": "classes/"}
+        "psr-4": {
+            "phpmock\\prophecy\\": "classes/",
+            "Prophecy\\": "overwrites/Prophecy/"
+        }
     },

The magic is PHP's autoloading. Nowadays autoloading is effectively done by Composer's autoloader. Let's find some implicit functionality to provide a class path which has a higher priority in discovering a class definition. In ClassLoader::findFileWithExtension() you can read that PSR-4 is preferred over PSR-0. Luckily Prophecy uses PSR-0. By providing a PSR-4 class path for the namespace Prophecy, Composer's autoloader will first check whether I have an implementation for any class in Prophecy\*. If not, it continues searching its other sources, which include the original Prophecy PSR-0 class path.
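To make the lookup order tangible, here is a simplified model of a prefix-based class lookup. It is not Composer's actual code: findClassFile(), the injected $exists callback and the vendor path are assumptions for illustration, and the PSR-0 part ignores the underscore rule.

```php
<?php

// Simplified model of the lookup order; NOT Composer's implementation.
// $exists is injected so the sketch runs without touching the file system.
function findClassFile($class, array $psr4, array $psr0, callable $exists)
{
    // PSR-4 first: the namespace prefix is stripped before mapping to a path.
    foreach ($psr4 as $prefix => $dir) {
        if (strpos($class, $prefix) === 0) {
            $relative = substr($class, strlen($prefix));
            $file = $dir . str_replace('\\', '/', $relative) . '.php';
            if ($exists($file)) {
                return $file;
            }
        }
    }
    // PSR-0 second: the full class name is mapped below the base directory.
    foreach ($psr0 as $prefix => $dir) {
        if (strpos($class, $prefix) === 0) {
            $file = $dir . str_replace('\\', '/', $class) . '.php';
            if ($exists($file)) {
                return $file;
            }
        }
    }
    return null;
}

$psr4 = ['Prophecy\\' => 'overwrites/Prophecy/'];
$psr0 = ['Prophecy\\' => 'vendor/phpspec/prophecy/src/']; // assumed path

// Pretend file system: the patched class exists twice, Prophet only upstream.
$files = [
    'overwrites/Prophecy/Doubler/ClassPatch/ProphecySubjectPatch.php',
    'vendor/phpspec/prophecy/src/Prophecy/Doubler/ClassPatch/ProphecySubjectPatch.php',
    'vendor/phpspec/prophecy/src/Prophecy/Prophet.php',
];
$exists = function ($file) use ($files) {
    return in_array($file, $files, true);
};

$patched  = findClassFile('Prophecy\Doubler\ClassPatch\ProphecySubjectPatch', $psr4, $psr0, $exists);
$upstream = findClassFile('Prophecy\Prophet', $psr4, $psr0, $exists);

echo $patched, "\n";  // the local overwrite wins
echo $upstream, "\n"; // everything else still comes from upstream
```

Only the class I provide myself is shadowed. Every other Prophecy class falls through to the PSR-0 path.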

But wait! I'm leveraging implementation details here. This is highly fragile and could break with any patch release of Composer or Prophecy without any further notice. Monkey patching is a very dangerous tool, but there can be use cases where it might be helpful. Having that implicit behaviour as an explicit feature in Composer would remove the fragility of user land monkey patching.


Deadlocks are not dangerous. Just try again.

by Markus Malkusch


Posted on Sunday Aug 02, 2015 at 06:20PM in Technology


TL;DR

Set the appropriate isolation level, wrap a unit of work into a transaction and do expect it to fail. Therefore I recommend a pattern like this:

for ($i = 3; true; $i--) {
    $pdo->beginTransaction();
    try {

        // Do the unit of work

        $pdo->commit();
        break;

    } catch (\PDOException $e) {
        $pdo->rollback();
        if ($i <= 0) {
            throw $e;
        }
    }
}

The long way

Let's consider this very simple script which should just increase a counter per request:

$increaseCounter = function(\PDO $pdo, $id = 1) {
    $select = $pdo->prepare("SELECT counter FROM counter WHERE id = ?");
    $select->execute([$id]);
    $counter = $select->fetchColumn();

    $counter++;

    $pdo->prepare("UPDATE counter SET counter = ? WHERE id = ?")
        ->execute([$counter, $id]);
};

I'd like to refer to the above script simply as $increaseCounter in the rest of this article. Also, for the purpose of this article we simply ignore that the same can be achieved with a single UPDATE query. Take it as an example for any use case where a unit of work consists of several database queries.

So let's emulate 1000 requests and see what happens:

for ($i = 0; $i < 1000; $i++) {
    $increaseCounter($pdo);
}

Nothing unexpected happened. The counter increases sequentially to 1000. But this emulation is not the reality of the web. Requests do not arrive sequentially, they arrive concurrently. We can emulate this contention by simply forking the process. As this article is not about forking itself, I won't pollute it with that pcntl_* noise and just use the spork library for simplicity.

$concurrency = 4;
$manager = new \Spork\ProcessManager();

for ($i = 0; $i < $concurrency; $i++) {
    $manager->fork(function () use ($concurrency, $increaseCounter) {

        // each child needs its own connection.
        $pdo = new \PDO("mysql:host=localhost;dbname=test", "test");

        for ($i = 0; $i < 1000 / $concurrency; $i++) {
            $increaseCounter($pdo);
        }
    });
}

Now the counter increased concurrently only to around 500 on my multi-core machine. The race condition which happens here is quite obvious. Multiple processes read the same value while only one of them will effectively set the counter. The other updates are lost.
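The lost update can be replayed deterministically without any database. This is only a hand-written interleaving of two imaginary processes A and B, with variable names made up for the illustration:

```php
<?php

$stored = 0;            // the value in the counter table

$readByA = $stored;     // process A: SELECT
$readByB = $stored;     // process B: SELECT, before A has written

$stored = $readByA + 1; // process A: UPDATE
$stored = $readByB + 1; // process B: UPDATE, overwrites A's result

echo $stored, "\n";     // prints 1, although two increments ran
```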

Then let's wrap the whole thing into a transaction (assuming we are using a DBS which can do that, e.g. MySQL's InnoDB). That's why we have the I in ACID. To keep the code example readable I will omit the outer forking skeleton. You can assume for all following examples that those 4 forks are still created and run the code concurrently. But let's focus on the actual code of these forks and increase the counter concurrently within transactions.

$manager->fork(function () use ($concurrency, $increaseCounter) {
    $options = [
        PDO::ATTR_ERRMODE    => PDO::ERRMODE_EXCEPTION,
        PDO::ATTR_AUTOCOMMIT => false,
    ];
    $pdo = new \PDO("mysql:host=localhost;dbname=test", "test", null, $options);

    for ($i = 0; $i < 1000 / $concurrency; $i++) {
        $pdo->beginTransaction();
        $increaseCounter($pdo);
        $pdo->commit();
    }
});

Surprisingly the result is still not 1000. Why is that so? Let's learn about isolation levels. In our case MySQL uses the level REPEATABLE READ by default. This level is actually quite consistent, but as our example is not using SELECT ... FOR UPDATE, MySQL is still not locking the row. So let's increase the isolation level to SERIALIZABLE and get that counter up to 1000. I will also skip the boilerplate for forking and connecting to the database. Let's concentrate on the transaction and use your imagination for the missing code.

$pdo->exec("SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE");

for ($i = 0; $i < 1000 / $concurrency; $i++) {
    $pdo->beginTransaction();
    $increaseCounter($pdo);
    $pdo->commit();
}

Unfortunately before reaching 1000 the forked children got killed by an uncaught PDOException. Fortunately the exception tells us very clearly what we can do to reach the 1000:

SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction

Ok, let's repeat those failing units of work. Disclaimer: Children, please don't do that at home. There should be a timeout around that loop. We do it here because we just want to see a 1000 and after that we can throw the code away.

$pdo->exec("SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE");

for ($i = 0; $i < 1000 / $concurrency; $i++) {
    $pdo->beginTransaction();
    try {
        $increaseCounter($pdo);
        $pdo->commit();

    } catch (\PDOException $e) {
        $pdo->rollback();
        $i--; // that's a very subtle loop, isn't it?
    }
}
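The timeout which the disclaimer asks for could look roughly like the following sketch. retryTransaction() is a hypothetical helper of my own naming, not part of PDO or any library, and the 3 second default is an arbitrary assumption. It repeats only serialization failures (SQLSTATE 40001) and gives up once the deadline has passed.

```php
<?php

// Hypothetical helper: runs $unitOfWork inside a transaction and repeats it
// on a serialization failure until $timeoutSeconds have elapsed.
function retryTransaction(\PDO $pdo, callable $unitOfWork, $timeoutSeconds = 3.0)
{
    $deadline = microtime(true) + $timeoutSeconds;

    while (true) {
        $pdo->beginTransaction();
        try {
            $result = $unitOfWork($pdo);
            $pdo->commit();
            return $result;

        } catch (\PDOException $e) {
            $pdo->rollBack();

            // errorInfo[0] holds the SQLSTATE; 40001 is the deadlock case.
            $isDeadlock = isset($e->errorInfo[0]) && $e->errorInfo[0] === '40001';
            if (!$isDeadlock || microtime(true) >= $deadline) {
                throw $e;
            }

            // Short random pause so the competing processes drift apart.
            usleep(mt_rand(1000, 20000));
        }
    }
}
```

A fork would then call retryTransaction($pdo, $increaseCounter) inside its loop and needs no manual $i-- bookkeeping.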

Eureka! After repeating every deadlocked unit of work I can observe exactly 1000 in the database. You can avoid that boilerplate by using e.g. TransactionalMutex:

$pdo->exec("SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE");
$mutex = new \malkusch\lock\mutex\TransactionalMutex($pdo);

for ($i = 0; $i < 1000 / $concurrency; $i++) {
    $mutex->synchronized(function () use ($pdo, $increaseCounter) {
        $increaseCounter($pdo);
    });
}
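The SELECT ... FOR UPDATE mentioned earlier is another route to a correct counter. The following variant of $increaseCounter is a sketch under assumptions: it expects MySQL/InnoDB, it expects the caller to wrap it in beginTransaction() and commit(), and I did not benchmark it against the SERIALIZABLE setup above.

```php
<?php

// Variant of $increaseCounter with a locking read. The name
// $increaseCounterLocked is made up for this sketch.
$increaseCounterLocked = function (\PDO $pdo, $id = 1) {
    // FOR UPDATE locks the row until the transaction ends. A concurrent
    // transaction blocks here instead of reading a stale value.
    $select = $pdo->prepare(
        "SELECT counter FROM counter WHERE id = ? FOR UPDATE"
    );
    $select->execute([$id]);
    $counter = $select->fetchColumn();

    $counter++;

    $pdo->prepare("UPDATE counter SET counter = ? WHERE id = ?")
        ->execute([$counter, $id]);
};
```

Even with the lock in place a transaction can still fail, so the advice to expect failure and to try again stays valid.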

I'd like to finish this article with some warm words from MySQL's manual: How to Cope with Deadlocks

Always be prepared to re-issue a transaction if it fails due to deadlock. Deadlocks are not dangerous. Just try again.


yield finally: The exception eater

by Markus Malkusch


Posted on Monday Jun 01, 2015 at 08:43PM in Technology


What do you think the following code fragment will print?

<?php

function generate()
{
    try {
        yield 1;
        yield 2;
    } finally {
        echo "finally\n";
    }
}

foreach (generate() as $i) {
    echo $i, "\n";
    throw new Exception();
}

PHP-5.5 introduced two new language constructs: yield and finally. They are not really related to each other, but together they allow nice things such as easily building an iterator which is guaranteed to free its resources (e.g. an iterator over an unbuffered query).

In order to demonstrate the execution flow, I created that simple code fragment. This is the output of my PHP-5.6:

1
finally

But this is not what I expected. Something is missing. Do you know what? I will tell you at the very end. So let's see if I did something nasty and check the documentation regarding yield and finally:

If the generator contains (relevant) finally blocks those will be run.

So nothing wrong with my code. Then let's file the bug report and let it run on different PHP versions. And we can finally find the missing stack trace in PHP-5.5:

1
finally

Fatal error: Uncaught exception 'Exception'

Update

I am happy to see that the bug was fixed within a pleasantly short time span in 8405265578d2df8d76be223910b3e44aff4bdfef
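The self-cleaning iterator mentioned above can be sketched with a file handle instead of an unbuffered query. readLines() is a hypothetical helper name. The point is that the finally block runs as soon as the generator is destroyed, even when the consumer stops early:

```php
<?php

// Hypothetical helper: yields the lines of a file and closes the handle in
// a finally block, no matter how the iteration ends.
function readLines($path)
{
    $handle = fopen($path, 'r');
    try {
        while (($line = fgets($handle)) !== false) {
            yield rtrim($line, "\n");
        }
    } finally {
        fclose($handle);
        echo "handle closed\n";
    }
}

$path = tempnam(sys_get_temp_dir(), 'lines');
file_put_contents($path, "first\nsecond\nthird\n");

foreach (readLines($path) as $line) {
    echo $line, "\n";
    break; // stop early: the generator is destroyed and finally runs
}

unlink($path);
```

This prints "first" followed by "handle closed".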


PHP has hated BLOBs for a long time.

by Markus Malkusch


Posted on Sunday May 10, 2015 at 02:41PM in Technology


There seems to be this uninformed dogma about databases and files. People tend to constantly repeat the mantra "Store files in the file system". While the disadvantages are obvious (no ACID), the only advantage its disciples bring up is performance.

There's this excellent research regarding performance "To BLOB or Not To BLOB", which comes to the conclusion:

Objects smaller than 256K are best stored in a database while objects larger than 1M are best stored in the filesystem.

But please don't stop reading here. This statement is valid only for SQL Server 2005. Therefore I wanted to run some benchmarks for some current DBS. I chose PHP for those benchmarks, as I am afraid that Java is indeed too fast with memory mapped files. Surprisingly I came to the conclusion that for PHP the mantra is indeed correct. This conclusion came quickly, even before I could implement the benchmark. For 8 years PHP has had an unresolved bug which doesn't allow streaming BLOBs from the database. This makes any further research needless.


It is not safe to rely on the system's timezone settings, but only sometimes.

by Markus Malkusch


Posted on Friday May 08, 2015 at 09:47PM in Technology


It is not safe to rely on the system's timezone settings.

Since version 5.4 PHP has been yelling about the missing configuration setting date.timezone:

Warning: date(): It is not safe to rely on the system's timezone settings. You are *required* to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods and you are still getting this warning, you most likely misspelled the timezone identifier. We selected the timezone 'UTC' for now, but please set date.timezone to select your timezone.

Back in the day my first reaction was: Why? I do configure the system's timezone and indeed would consider not doing so a bad idea, but why does PHP think it needs its own timezone setting? The motivation for this annoyance can be found in the PHP-INTERNALS:

you, as an admin, are required to make an informed decision on what you want your timezone to be. There have been way too many bug reports where people had no clue, so now we throw a warning.

Now after zillions of such warnings I just got used to it and configure it like a good sheep. But what is this? There's this killer feature function easter_date():

easter_date() uses the TZ environment variable to determine the time zone it should operate in, rather than using PHP's default time zone.

WTF?


The weakly typed String Comparison in PHP

by Markus Malkusch


Posted on Thursday May 07, 2015 at 02:53AM in Technology


What do you think? Is the following code fragment true or false?

"1e1"=="0xa"

Today I learned that this is true. The section Comparison Operators does explain that:

If [..] the comparison involves numerical strings, then each string is converted to a number and the comparison performed numerically.

So let's see how "numeric string" is defined and compare some other representations of 10:

var_dump(
    0b1010, "10" == "0b1010", // false
    012,    "10" == "012",    // false
    0xa,    "10" == "0xa",    // true
    1E+1,   "10" == "1E+1",   // true
    1e1,    "10" == "1e1",    // true
    10.0,   "10" == "10.0",   // true
    +10,    "10" == "+10"     // true
);

Surprisingly the behaviour is not consistent with the binary and octal representation. Let's ask the PHP guys about that:

The manual section about "String conversions to number" doesn't explicitly mention that hex strings ('0x') are allowed, but neither octal ('0') nor binary ('0b') strings. However, PHP 7 is going remove support for converting hex strings to number.

So they obviously fixed that inconsistent behaviour by changing the definition of "numerical string" and removing the hexadecimal representation from the conversion. This means finally "1e1" == "0xa" won't be true any more. But "10" == "1e1" will still be.
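If the strings are meant to be compared as strings, the strict operator avoids the conversion altogether. A small sketch which deliberately leaves out "0xa", because its result depends on the PHP version as described above:

```php
<?php

// Loose comparison: both sides are numeric strings, so they become numbers.
var_dump("10" == "1e1");             // bool(true)

// Strict comparison and strcmp() compare the strings themselves.
var_dump("10" === "1e1");            // bool(false)
var_dump(strcmp("10", "1e1") === 0); // bool(false)

// "012" is a numeric string, but it is read as decimal 12, not as octal 10.
var_dump("10" == "012");             // bool(false)

// is_numeric() shows which strings take part in the numeric conversion.
foreach (["10", "012", "1e1", "0b1010"] as $string) {
    printf("%-8s %s\n", $string, is_numeric($string) ? "numeric" : "not numeric");
}
```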

So what did I learn from this? [Image: fly.jpg]


Mocking a built-in PHP function e.g. time()

by Markus Malkusch


Posted on Sunday Dec 07, 2014 at 03:59PM in Technology


When it comes to writing unit tests you sometimes want to mock a built-in function like time(). These libraries do exist:

I extend this list with the library PHP-Mock. PHP-Mock doesn't need any further extension. It uses PHP's namespace fallback policy:

For functions […], PHP will fall back to global functions […] if a namespaced function […] does not exist.

There exist plenty of resources which explain this technique in greater detail. This short example illustrates mocking an unqualified call to time() in the namespace foo:

namespace foo;

function time() {
    return 1234;
}

assert(1234 == time());
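The example above hard-codes the return value. A test usually wants to install and swap the replacement at runtime. The following is a minimal sketch of that idea and not PHP-Mock's actual implementation: FunctionMockRegistry and defineFunctionMock() are hypothetical names, and the generated stub simply delegates to whatever callable is currently registered.

```php
<?php

// Hypothetical registry, for illustration only; not PHP-Mock's API.
class FunctionMockRegistry
{
    /** @var callable[] implementations keyed by qualified function name */
    public static $implementations = [];
}

// Defines $namespace\$name once via eval() and lets it delegate to the
// registered callable, which can be swapped by calling this again.
function defineFunctionMock($namespace, $name, callable $implementation)
{
    $fqn = $namespace . '\\' . $name;
    FunctionMockRegistry::$implementations[$fqn] = $implementation;

    if (function_exists($fqn)) {
        return; // the stub exists already, only the implementation changed
    }

    $key = var_export($fqn, true); // safely quoted array key for the stub
    eval("
        namespace $namespace {
            function $name() {
                return call_user_func_array(
                    \\FunctionMockRegistry::\$implementations[$key],
                    func_get_args()
                );
            }
        }
    ");
}

defineFunctionMock('foo', 'time', function () {
    return 1234;
});

echo \foo\time(), "\n"; // 1234
```

Code inside the namespace foo which calls time() unqualified ends up in this stub, thanks to the fallback policy quoted above.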

While developing and using PHP-Mock I ran into a surprising restriction which I documented in bug #68541:

Class methods do resolve unqualified function calls only once in their life time.

This means that the mocked function must be defined before the first call of a class method which would call the unqualified built-in function. Most of the time this restriction is no bother, because a typical test defines the mock before calling the tested method. This piece of code illustrates the restriction:

namespace foo;

class Foo
{
    
    public static function time()
    {
        return time();
    }
}

Foo::time(); // If you remove this line all assertions are true.

// It doesn't matter if you eval the namespaced function or include it.
eval("
    namespace foo {
        function time() {
            return 1234;
        }
    }
");

assert (1234 == time());
assert (1234 == Foo::time()); // This assertion fails, Foo::time() calls \time()!