Markus Malkusch's weblog 

The hidden costs of using URL

by Markus Malkusch


Posted on Friday Oct 07, 2016 at 03:27PM in Technology


I'm a fan of using existing types, so from time to time I use the class URL for URLs. My Whois API holds quite a large number of such URL objects in a Spring-managed bean. I recently did a Spring upgrade and experienced a significant performance regression on application startup. It turned out that Spring had made a tiny change which effectively calls hashCode() on its beans. And exactly this call, URL.hashCode(), is unexpectedly expensive. The API documentation could be more explicit about that:

The hash code is based upon all the URL components relevant for URL comparison. As such, this operation is a blocking operation.

"Blocking" might hint at the performance impact, or just refer to the fact that this method is synchronized. Digging deeper into the hash code generation, there's this line of code: InetAddress addr = getHostAddress(u); When generating the hash code for a URL object, the hostname of the URL is resolved via DNS to its IP address. I.e. URL.hashCode() effectively waits on a network operation.


Whois API

by Markus Malkusch


Posted on Monday Mar 14, 2016 at 12:24PM in Technology


There is this hobby project which I must have started at least 12 years ago, as that's the date of the oldest news entry. This project was about a convenient web interface to do bulk domain checks. I found it quite useful that you can enter example and it returns a result for all top level domains, e.g. example.net, example.com, example.org. Well, back in those days "all top level domains" was not the same as today. Today we can count almost 1000 top level domains. Also it's quite remarkable that I did this in vanilla PHP with non-blocking IO. I just didn't call that thing an event loop or reactive programming as the cool kids do today.

But time passed and the project lost my attention, until I checked the web access statistics. Holy shit, it got thousands of hits per day. It seems people had started to abuse my script, using my server as a proxy for automated whois queries. I stopped that madness by giving them a Captcha and a 429 response code. It took a while but now the traffic seems to be human again.

So obviously there's a demand for automated domain checks. Over the years a side project emerged out of this old project: a comprehensive whois server list. I aggregate data from several sources and check the quality of that data to compile a very accurate whois server list. With this list I want to give it a try and build a Whois API which people can use to query Whois servers automatically and check the availability of a domain name. You can get a free API key which gives you 11000 free requests each month. These are the endpoints:

/check?domain={domain} Checks if a domain is available or already registered. The result also includes the whois server response.
/whois?host={host}&query={query} Queries an arbitrary whois server

This Whois API uses my compiled whois server list, which tries to support all available top level domains (more than 500). It also avoids the rate limits of the respective whois servers. Response times are quite long (more than 1s on average). I therefore recommend consuming the API concurrently. You can also use one of the existing clients: Java Whois API, PHP Whois API, Wordpress Domain Check Plugin.
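As a minimal sketch of a bulk check, a client might first build one /check request per top level domain and then send those requests concurrently. The base URL and the function name buildCheckUrls are placeholders invented for this illustration, not part of the real API, and the API key is left out because this post doesn't describe how it is passed. The sketch assumes PHP 7 or later.

```php
<?php

// Hypothetical base URL; the real host of the API is not stated in this post.
const WHOIS_API_BASE = 'https://whois-api.example';

/**
 * Builds the /check URL for every given top level domain.
 *
 * @param string   $name The name to check, e.g. "example".
 * @param string[] $tlds Top level domains without the leading dot.
 * @return string[] Request URLs keyed by the full domain name.
 */
function buildCheckUrls(string $name, array $tlds): array
{
    $urls = [];
    foreach ($tlds as $tld) {
        $domain = "$name.$tld";
        // http_build_query() takes care of URL-encoding the parameter.
        $urls[$domain] = WHOIS_API_BASE . '/check?' . http_build_query(['domain' => $domain]);
    }
    return $urls;
}

print_r(buildCheckUrls('example', ['net', 'com', 'org']));
```

The resulting URLs could then be handed to whatever concurrent HTTP client you prefer.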


Monkey patch a PHP class

by Markus Malkusch


Posted on Friday Jan 08, 2016 at 01:01AM in Technology


Monkey patch

PHP-Mock is a testing library for mocking built-in PHP functions. This is done with a namespace monkey patch of a PHP function. Today I want to show a way to monkey patch a userland class with plain PHP (i.e. without extensions like runkit or UOPZ).

First of all, the motivation: Why the hell would I want such a thing? Honestly, I don't! What I want, is integrating PHP-Mock with Prophecy. To provide complete functionality, that integration must be able to prophesize functions which have call-by-reference parameters (e.g. exec()). Unfortunately upstream has one offending line of code which simply eats references. I wanted to prepare a pull request, but upstream seems to be reluctant:

So I would even consider that not being able to prophesize such API is a Prophecy feature rather than a bug [..]. This goes in line with the PhpSpec way: making it hard to work with badly designed APIs [..].

I very much agree that PHP has a badly designed API, but I still want to make it possible to prophesize such badly designed functions. So let's replace that reference eating class locally. I did this experimental operation in c9cd844. In this commit you see (next to some documentation and an accidentally committed test) my own implementation of Prophecy\Doubler\ClassPatch\ProphecySubjectPatch and a change in composer.json:

     "autoload": {
-        "psr-4": {"phpmock\\prophecy\\": "classes/"}
+        "psr-4": {
+            "phpmock\\prophecy\\": "classes/",
+            "Prophecy\\": "overwrites/Prophecy/"
+        }
     },

The magic is PHP's autoloading. Nowadays autoloading is effectively done by Composer's autoloader. Let's find some implicit functionality to provide a class path which has a higher priority when discovering a class definition. In ClassLoader::findFileWithExtension() you can read that PSR-4 is preferred over PSR-0. Luckily Prophecy uses PSR-0. By providing a PSR-4 class path for the namespace Prophecy, Composer's autoloader will first see if I have an implementation for any class in Prophecy\*. If not, it continues searching its other sources, which include the original Prophecy PSR-0 class path.

But wait! I'm leveraging implementation details here. This is highly fragile and could break with any patch release of Composer or Prophecy without any further notice. Monkey patching is a very dangerous tool, but there can be use cases where it might be helpful. Having that implicit behaviour as an explicit feature in Composer would remove the fragility of user land monkey patching.
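To make the lookup order tangible, here is a toy model of it. This is not Composer's code: findClassFile, the injected $fileExists callback and the vendor path are assumptions made up for this sketch, and PSR-0 details such as underscore handling are ignored. It only shows that a PSR-4 match wins and that PSR-0 serves as the fallback. The sketch assumes PHP 7.1 or later.

```php
<?php

/**
 * Toy model: PSR-4 prefixes are tried before PSR-0 prefixes.
 *
 * @param string   $class      Fully qualified class name.
 * @param array    $psr4       Map of namespace prefix => base directory.
 * @param array    $psr0       Map of namespace prefix => base directory.
 * @param callable $fileExists Predicate telling whether a path exists.
 * @return string|null The first existing candidate path, or null.
 */
function findClassFile(string $class, array $psr4, array $psr0, callable $fileExists): ?string
{
    // PSR-4: the prefix is stripped, only the remainder becomes a path.
    foreach ($psr4 as $prefix => $dir) {
        if (strpos($class, $prefix) === 0) {
            $relative = substr($class, strlen($prefix));
            $path = $dir . str_replace('\\', '/', $relative) . '.php';
            if ($fileExists($path)) {
                return $path;
            }
        }
    }
    // PSR-0: the whole class name, including the prefix, becomes a path.
    foreach ($psr0 as $prefix => $dir) {
        if (strpos($class, $prefix) === 0) {
            $path = $dir . str_replace('\\', '/', $class) . '.php';
            if ($fileExists($path)) {
                return $path;
            }
        }
    }
    return null;
}

$psr4 = ['Prophecy\\' => 'overwrites/Prophecy/'];
$psr0 = ['Prophecy\\' => 'vendor/phpspec/prophecy/src/']; // assumed location

// Pretend only the patched class exists locally, everything else in vendor/.
$patched = 'overwrites/Prophecy/Doubler/ClassPatch/ProphecySubjectPatch.php';
$exists = function (string $path) use ($patched): bool {
    return $path === $patched || strpos($path, 'vendor/') === 0;
};

echo findClassFile('Prophecy\\Doubler\\ClassPatch\\ProphecySubjectPatch', $psr4, $psr0, $exists), "\n";
echo findClassFile('Prophecy\\Prophet', $psr4, $psr0, $exists), "\n";
```

The patched class resolves to the overwrites/ directory, while every other Prophecy class falls through to the original PSR-0 path.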


Deadlocks are not dangerous. Just try again.

by Markus Malkusch


Posted on Sunday Aug 02, 2015 at 06:20PM in Technology


TL;DR

Set the appropriate isolation level, wrap a unit of work into a transaction and do expect it to fail. Therefore I recommend a pattern like this:

for ($i = 3; true; $i--) {
    $pdo->beginTransaction();
    try {

        // Do the unit of work

        $pdo->commit();
        break;

    } catch (\PDOException $e) {
        $pdo->rollback();
        if ($i <= 0) {
            throw $e;
        }
    }
}
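The pattern can also be packaged as a small helper. This is only a sketch: the function name transactional and its parameters are made up for this illustration, and it assumes PHP 7 or later and a connection in PDO::ERRMODE_EXCEPTION mode. Unlike the loop above, it only rolls back while a transaction is still open.

```php
<?php

/**
 * Runs a unit of work inside a transaction and repeats it when it fails.
 *
 * @param \PDO     $pdo        Connection with PDO::ERRMODE_EXCEPTION enabled.
 * @param callable $unitOfWork Receives the connection, may return a value.
 * @param int      $retries    Number of repetitions after the first failure.
 * @return mixed Whatever the unit of work returned.
 */
function transactional(\PDO $pdo, callable $unitOfWork, int $retries = 3)
{
    for ($i = $retries; true; $i--) {
        $pdo->beginTransaction();
        try {
            $result = $unitOfWork($pdo);
            $pdo->commit();
            return $result;
        } catch (\PDOException $e) {
            // A failed commit may already have ended the transaction.
            if ($pdo->inTransaction()) {
                $pdo->rollBack();
            }
            if ($i <= 0) {
                throw $e; // give up after the last retry
            }
        }
    }
}
```

A call would then look like transactional($pdo, function (\PDO $pdo) { /* do the unit of work */ });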

The long way

Let's consider this very simple script which should just increase a counter per request:

$increaseCounter = function(\PDO $pdo, $id = 1) {
    $select = $pdo->prepare("SELECT counter FROM counter WHERE id = ?");
    $select->execute([$id]);
    $counter = $select->fetchColumn();

    $counter++;

    $pdo->prepare("UPDATE counter SET counter = ? WHERE id = ?")
        ->execute([$counter, $id]);
};

In the rest of this article I'll refer to the above script simply as $increaseCounter. Also, for the purpose of this article we simply ignore that the same can be achieved with a single UPDATE query. Take it as a stand-in for any use case where a unit of work consists of several database queries.
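For reference, the single-query variant that this article deliberately ignores might look like the following sketch. The name $increaseCounterAtomically is made up here; the table and column names are the ones from the script above.

```php
<?php

// The database reads and writes the counter in one statement,
// so no other request can slip in between the read and the write.
$increaseCounterAtomically = function (\PDO $pdo, $id = 1) {
    $pdo->prepare("UPDATE counter SET counter = counter + 1 WHERE id = ?")
        ->execute([$id]);
};
```

The remaining examples stick with the two-query $increaseCounter, because that is the shape a real unit of work usually has.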

So let's emulate 1000 requests and see what happens:

for ($i = 0; $i < 1000; $i++) {
    $increaseCounter($pdo);
}

Nothing unexpected happened. The counter increases sequentially to 1000. But this emulation is not the reality of the web. Requests don't arrive sequentially, they arrive concurrently. We can emulate this contention by simply forking the process. As this article is not about forking itself, I won't pollute it with that pcntl_* noise and, for simplicity, just use the spork library.

$concurrency = 4;
$manager = new \Spork\ProcessManager();

for ($i = 0; $i < $concurrency; $i++) {
    $manager->fork(function () use ($concurrency, $increaseCounter) {

        // each child needs its own connection.
        $pdo = new \PDO("mysql:host=localhost;dbname=test", "test");

        for ($i = 0; $i < 1000 / $concurrency; $i++) {
            $increaseCounter($pdo);
        }
    });
}

Now, running concurrently, the counter only reached around 500 on my multi-core machine. The race condition which happens here is quite obvious. Multiple processes read the same value while only one of them will effectively set the counter. The other updates are lost.

Then let's wrap the whole thing into a transaction (assuming we are using a DBS which can do that, e.g. MySQL's InnoDB). That's why we have the I in ACID. To keep the code examples readable I will omit the outer forking skeleton. You can assume for all following examples that those 4 forks are still created and run the code concurrently. But let's focus on the actual code of these forks and increase the counter concurrently within transactions.

$manager->fork(function () use ($concurrency, $increaseCounter) {
    $options = [
        PDO::ATTR_ERRMODE    => PDO::ERRMODE_EXCEPTION,
        PDO::ATTR_AUTOCOMMIT => false,
    ];
    $pdo = new \PDO("mysql:host=localhost;dbname=test", "test", null, $options);

    for ($i = 0; $i < 1000 / $concurrency; $i++) {
        $pdo->beginTransaction();
        $increaseCounter($pdo);
        $pdo->commit();
    }
});

Surprisingly the result is still not 1000. Why is that? Let's learn about isolation levels. In our case MySQL uses the level REPEATABLE READ by default. This level is actually quite consistent, but as our example is not using SELECT ... FOR UPDATE, MySQL is still not locking the row. So let's increase the isolation level to SERIALIZABLE and get that counter up to 1000. Also I will skip the boilerplate about forking and connecting to the database. Let's concentrate on the transaction and use your imagination for the missing code.

$pdo->exec("SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE");

for ($i = 0; $i < 1000 / $concurrency; $i++) {
    $pdo->beginTransaction();
    $increaseCounter($pdo);
    $pdo->commit();
}

Unfortunately before reaching 1000 the forked children got killed by an uncaught PDOException. Fortunately the exception tells us very clearly what we can do to reach the 1000:

SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction

Ok, let's repeat those failing units of work. Disclaimer: Children, please don't do that at home. There should be a timeout around that loop. We do it here because we just want to see a 1000 and after that we can throw the code away.

$pdo->exec("SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE");

for ($i = 0; $i < 1000 / $concurrency; $i++) {
    $pdo->beginTransaction();
    try {
        $increaseCounter($pdo);
        $pdo->commit();

    } catch (\PDOException $e) {
        $pdo->rollback();
        $i--; // that's a very subtle loop, isn't it?
    }
}

Eureka! After repeating every deadlocked transaction I can finally observe a real 1000 in the database. You can avoid that boilerplate by using e.g. TransactionalMutex:

$pdo->exec("SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE");
$mutex = new \malkusch\lock\mutex\TransactionalMutex($pdo);

for ($i = 0; $i < 1000 / $concurrency; $i++) {
    $mutex->synchronized(function () use ($pdo, $increaseCounter) {
        $increaseCounter($pdo);
    });
}

I'd like to finish this article with some warm words from MySQL's manual: How to Cope with Deadlocks

Always be prepared to re-issue a transaction if it fails due to deadlock. Deadlocks are not dangerous. Just try again.


yield finally: The exception eater

by Markus Malkusch


Posted on Monday Jun 01, 2015 at 08:43PM in Technology


What do you think the following code fragment will print?

<?php

function generate()
{
    try {
        yield 1;
        yield 2;
    } finally {
        echo "finally\n";
    }
}

foreach (generate() as $i) {
    echo $i, "\n";
    throw new Exception();
}

PHP-5.5 introduced two new language constructs: yield and finally. They are not really related to each other, but together they allow nice things, such as easily building an iterator which is guaranteed to free its resources (e.g. an iterator over an unbuffered query).

In order to demonstrate the execution flow, I created that simple code fragment. This is the output of my PHP-5.6:

1
finally

But this is not what I expected. Something is missing. Do you know what? I will tell you at the very end. So let's see if I did something nasty and check the documentation regarding yield and finally:

If the generator contains (relevant) finally blocks those will be run.

So nothing wrong with my code. Then let's file the bug report and let it run on different PHP versions. And we can finally find the missing stack trace in PHP-5.5:

1
finally

Fatal error: Uncaught exception 'Exception'

Update

I'm happy to see that the bug was fixed within a pleasantly short time span in 8405265578d2df8d76be223910b3e44aff4bdfef.
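Coming back to the resource-freeing iterator mentioned earlier, a minimal sketch could look like this. The function name lines and the in-memory stream are assumptions chosen for illustration; an unbuffered query would follow the same shape, with the cleanup in the finally block.

```php
<?php

/**
 * Yields the lines of an open stream and closes the stream when the
 * generator finishes or is destroyed early.
 *
 * @param resource $handle A readable stream, e.g. from fopen().
 */
function lines($handle)
{
    try {
        while (($line = fgets($handle)) !== false) {
            yield rtrim($line, "\r\n");
        }
    } finally {
        // Runs after the last line and also when the consumer stops early.
        fclose($handle);
    }
}

$handle = fopen('php://memory', 'w+');
fwrite($handle, "first\nsecond\n");
rewind($handle);

foreach (lines($handle) as $line) {
    echo $line, "\n";
    break; // stop early; destroying the generator triggers the finally block
}

var_dump(is_resource($handle)); // the stream has been closed by now
```

The consumer never has to remember the cleanup; breaking out of the loop is enough.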


PHP has hated BLOBs for a long time.

by Markus Malkusch


Posted on Sunday May 10, 2015 at 02:41PM in Technology


There seems to be this uninformed dogma about databases and files. People tend to constantly repeat the mantra "Store files in the file system". While the disadvantages are obvious (no ACID), the disciples bring performance concerns as the only advantage.

There's this excellent research regarding performance "To BLOB or Not To BLOB", which comes to the conclusion:

Objects smaller than 256K are best stored in a database while objects larger than 1M are best stored in the filesystem.

But please don't stop reading here. This statement is valid only for SQL Server 2005. Therefore I wanted to run some benchmarks for some current DBS. I chose PHP for those benchmarks, as I am afraid that Java is indeed too fast with memory mapped files. Surprisingly I came to the conclusion that for PHP the mantra is indeed correct. This conclusion came quickly, even before I could implement the benchmark. For 8 years PHP has had an unresolved bug which doesn't allow streaming BLOBs from the database. This makes any further research needless.


It is not safe to rely on the system's timezone settings, but only sometimes.

by Markus Malkusch


Posted on Friday May 08, 2015 at 09:47PM in Technology


It is not safe to rely on the system's timezone settings.

Since version 5.4 PHP is yelling about the missing configuration setting date.timezone:

Warning: date(): It is not safe to rely on the system's timezone settings. You are *required* to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods and you are still getting this warning, you most likely misspelled the timezone identifier. We selected the timezone 'UTC' for now, but please set date.timezone to select your timezone.

Back in the day my first reaction was: Why? I do configure the system's timezone and indeed would consider not doing so a bad idea, but why does PHP think it needs its own timezone setting? The motivation for this annoyance can be found in the PHP-INTERNALS:

you, as an admin, are required to make an informed decision on what you want your timezone to be. There have been way too many bug reports where people had no clue, so now we throw a warning.

Now after zillions of such warnings I just got used to it and configure it like a good sheep. But what is this? There's this killer feature function easter_date():

easter_date() uses the TZ environment variable to determine the time zone it should operate in, rather than using PHP's default time zone.

WTF?


The weakly typed String Comparison in PHP

by Markus Malkusch


Posted on Thursday May 07, 2015 at 02:53AM in Technology


What do you think? Is the following code fragment true or false?

"1e1"=="0xa"

Today I learned that this is true. The section Comparison Operators does explain that:

If [..] the comparison involves numerical strings, then each string is converted to a number and the comparison performed numerically.

So let's see how "numeric string" is defined and compare some other representations of 10:

var_dump(
    0b1010, "10" == "0b1010", // false
    012,    "10" == "012",    // false
    0xa,    "10" == "0xa",    // true
    1E+1,   "10" == "1E+1",   // true
    1e1,    "10" == "1e1",    // true
    10.0,   "10" == "10.0",   // true
    +10,    "10" == "+10"     // true
);

Surprisingly the behaviour is not consistent for the binary and octal representations. Let's ask the PHP guys about that:

The manual section about "String conversions to number" doesn't explicitly mention that hex strings ('0x') are allowed, but neither octal ('0') nor binary ('0b') strings. However, PHP 7 is going remove support for converting hex strings to number.

So they obviously fixed that inconsistent behaviour by changing the definition of "numerical string" and removing the hexadecimal representation from the conversion. This means finally "1e1" == "0xa" won't be true any more. But "10" == "1e1" will still be.
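If the numeric conversion is not wanted at all, comparing with === or strcmp() sidesteps it, because both look at the strings themselves. A short sketch:

```php
<?php

// Loose comparison converts both numeric strings to numbers first.
var_dump("10" == "1e1");              // bool(true), compared as 10 and 10.0

// Strict comparison and strcmp() compare the strings themselves.
var_dump("10" === "1e1");             // bool(false)
var_dump(strcmp("10", "1e1") === 0);  // bool(false)
```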

So what did I learn from this? fly.jpg


Mocking a built-in PHP function e.g. time()

by Markus Malkusch


Posted on Sunday Dec 07, 2014 at 03:59PM in Technology


When it comes to writing unit tests you sometimes want to mock a built-in function like time(). These libraries do exist:

I extend this list with the library PHP-Mock. PHP-Mock doesn't need any further extension. It uses PHP's namespace fallback policy:

For functions […], PHP will fall back to global functions […] if a namespaced function […] does not exist.

There exist plenty of resources which explain this technique in greater detail. This short example illustrates mocking an unqualified call to time() in the namespace foo:

namespace foo;

function time() {
    return 1234;
}

assert(1234 == time());

While developing and using PHP-Mock I ran into a surprising restriction which I documented in bug #68541:

Class methods do resolve unqualified function calls only once in their life time.

This means that the mocked function must be defined before the first call of a class method which calls the unqualified built-in function. Most of the time this restriction doesn't matter, because a typical test defines the mock before calling the tested method. This piece of code illustrates the restriction:

namespace foo;

class Foo
{
    
    public static function time()
    {
        return time();
    }
}

Foo::time(); // If you remove this line all assertions are true.

// It doesn't matter if you eval the namespaced function or include it.
eval("
    namespace foo {
        function time() {
            return 1234;
        }
    }
");

assert (1234 == time());
assert (1234 == Foo::time()); // This assertion fails, Foo::time() calls \time()!


Never clean a Lenovo Thinkpad keyboard

by Markus Malkusch


Posted on Saturday Nov 08, 2014 at 02:04AM in Technology


Last time I cleaned the keyboard of my Lenovo ThinkPad T410s I showered it completely in the basin (to remove the diarrhoea of my daughter). I assumed that the keyboard was waterproof, or at least wouldn't be damaged by water. I was wrong. Lenovo made a good effort to build the keyboard waterproof, but in the wrong direction. Of course some water did enter the keyboard, which is no problem for any of the components. But unfortunately the infiltrated water was locked inside the waterproof keyboard forever. Using the keyboard was not possible. After around 4 weeks of drying I replaced the keyboard.

Now, half a year later, I decided to sell my Lenovo ThinkPad T410s (because the T420s is so cheap on eBay). Let's clean the notebook a little bit. Maybe with a wet sponge to remove all the hairs, breadcrumbs and dust from the keyboard. This again was a bad idea. The keyboard has now been drying on the heater for around one week and I can still see the water inside of it.


Flickering and blank screens at high resolution with Intel HD

by Markus Malkusch


Posted on Wednesday Sep 24, 2014 at 07:40PM in Technology


Under some circumstances the graphics chip of my Lenovo T410s (Intel HD) produces annoying flickering and blank screens. Those circumstances appear to be a high resolution (in my case 2560x1440 pixels on my Dell U2711 via DisplayPort).

The kernel log might show something like this:

WARNING: CPU: 0 PID: 1815 at drivers/gpu/drm/i915/intel_display.c:953 ironlake_crtc_disable+0x86/0x816()
pipe_off wait timed out
…
Call Trace:
dump_stack+0x49/0x6a
warn_slowpath_common+0x78/0x90
ironlake_crtc_disable+0x86/0x816
warn_slowpath_fmt+0x45/0x4a
ironlake_crtc_disable+0x86/0x816
__intel_set_mode+0xbfe/0x11d4
intel_set_mode+0xd/0x27
intel_crtc_set_config+0x70e/0xa13
drm_mode_set_config_internal+0x48/0xad
restore_fbdev_mode+0x8f/0xa8
drm_fb_helper_restore_fbdev_mode_unlocked+0x1d/0x34
get_vtime_delta+0xd/0x59
drm_fb_helper_set_par+0x3a/0x58
fb_set_var+0x246/0x32c
finish_task_switch+0x44/0xd2
__schedule+0x5cb/0x755
fbcon_blank+0x71/0x230
do_unblank_screen+0xe1/0x15b
complete_change_console+0x4b/0xb6
vt_ioctl+0x915/0xf83
tty_ioctl+0x8f1/0x960
vtime_user_enter+0x23/0x3e
syscall_trace_leave+0x185/0x190
do_vfs_ioctl+0x3f3/0x43c
__fget+0x64/0x6c
SyS_ioctl+0x33/0x59
tracesys+0xe1/0xe6
--[ end trace 0a29c17dff6acf50 ]---
[drm:ibx_irq_handler] *ERROR* PCH transcoder B FIFO underrun

I found someone with the newer graphics chip HD 4000 whose problems sound very similar to mine. Fortunately Intel is giving hints to identify the problem, a workaround and a solution:

This issue is a BIOS issue specifically with the Memory Reference Code (MRC) version 1.4.0.0 or older. Please contact your system or motherboard manufacturer for a system BIOS update for your system or motherboard that includes MRC 1.5.0.0 or newer. If you have a system with 2 or more memory modules and are comfortable with removing all but one of them so that your memory is in single channel mode, try and remove all but one of them and the issue will go away. This can be used as a workaround until you are able to update your system BIOS from your system or motherboard manufacturer that includes MRC 1.5.0.0 or newer.

Running the system with one RAM module is a nice start to identify the problem. After that let's replace the old T410s BIOS 1.47 with the latest version 1.50 and see how the system performs with two RAM modules.

Bingo! The system runs now for more than one day without flickering or blank screen. But unfortunately it's not fixed completely: Waking up from hibernation brings you back into flickering land!


Booting ISO image with LILO

by Markus Malkusch


Posted on Wednesday Sep 24, 2014 at 12:59AM in Technology


I got into the painful situation of having to update the BIOS of my Lenovo T410s. The last time I did a BIOS update, I remember vendors were shipping floppy disk images. Fortunately today vendors are using a more modern vehicle: the CD-ROM. Unfortunately my T410s doesn't have that 26-year-old technology.

I didn't find a native solution to boot an ISO with LILO. But LILO can load the bootloader MEMDISK from the SYSLINUX project. MEMDISK itself can boot an ISO image:

For ISO images, the parameter 'iso' must be passed to MEMDISK.

All you have to do is get MEMDISK, which on Gentoo means emerge syslinux. Then create a folder in which you put the memdisk bootloader and the bootable ISO image. In my case:

mkdir /boot/bios-update
cp /usr/share/syslinux/memdisk 6uuj20uc.iso /boot/bios-update

Now add that image to your lilo.conf:

image = /boot/bios-update/memdisk
    initrd = /boot/bios-update/6uuj20uc.iso
    append = iso

Don't forget to run the mandatory lilo after editing the file. Happy rebooting!