Practical fault detection on timeseries part 2: first macros and templates

In the previous fault detection article, we saw how we can cover a lot of ground in fault detection with simple methods and technology that is available today. It included an example of a simple but effective approach to finding sudden spikes (peaks and drops) within fluctuating time series. This post continues that work and gives you the means to implement it yourself with minimal effort. I’m sharing with you:

  • Bosun macros that detect our most common symptoms of problems that are not trivially detectable
  • A Bosun notification template that provides a decent amount of information
  • Grafana and Graph-Explorer dashboards and integration for further troubleshooting
We reuse this stuff for a variety of cases where the data behaves similarly and I suspect that you will be able to apply this to a bunch of your monitoring targets as well.
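To give a feel for the kind of check involved, here is a minimal Python sketch that flags sudden peaks and drops in a fluctuating series. It is not the Bosun macro itself. The rolling-median baseline, the median-absolute-deviation spread, the window size and the `threshold` multiplier are all assumptions chosen for illustration.

```python
from statistics import median

def find_spikes(values, window=5, threshold=3.0):
    """Return indices whose value deviates sharply from the recent baseline.

    Baseline: median of the previous `window` points (assumed choice).
    Spread: median absolute deviation (MAD) of that same window.
    A point is flagged when |value - baseline| > threshold * spread.
    """
    spikes = []
    for i in range(window, len(values)):
        recent = values[i - window:i]
        baseline = median(recent)
        spread = median(abs(v - baseline) for v in recent)
        # Flat data would give a zero spread; fall back to 1% of the
        # baseline so tiny wiggles on a flat line are not flagged.
        spread = max(spread, abs(baseline) * 0.01, 1e-9)
        if abs(values[i] - baseline) > threshold * spread:
            spikes.append(i)
    return spikes

series = [10, 12, 11, 13, 12, 11, 40, 12, 11, 13, 12, 0, 12]
print(find_spikes(series))  # [6, 11]: the peak to 40 and the drop to 0
```

Using medians rather than means keeps a single spike from dragging the baseline along with it, which is what makes this tolerable on noisy series.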

Practical fault detection & alerting. You don't need to be a data scientist

As we try to retain visibility into our increasingly complicated applications and infrastructure, we’re building out more advanced monitoring systems. Specifically, a lot of work is being done on alerting via fault and anomaly detection. This post covers some common notions around these new approaches, debunks some of the myths that call for over-complicated solutions, and provides practical pointers that any programmer or sysadmin can implement without becoming a data scientist.

IT-Telemetry Google group. Trying to foster more collaboration around operational insights.

The discipline of collecting infrastructure & application performance metrics, aggregation, storage, visualizations and alerting has many terms associated with it... Telemetry. Insights engineering. Operational visibility. I've seen a bunch of people present their work in advancing the state of the art in this domain: from Anton Lebedevich's statistics for monitoring series, Toufic Boubez' talks on anomaly detection and Twitter's work on detecting mean shifts to projects such as flapjack (which aims to offload the alerting responsibility from your monitoring apps), the metrics 2.0 standardization effort or Etsy's Kale stack which tries to bring interesting changes in timeseries to your attention with minimal configuration.

A real whisper-to-InfluxDB program.

The whisper-to-influxdb migration script I posted earlier is pretty bad: it's a shell script without concurrency, and it has an undiagnosed performance issue. I hinted that one could write a Go program using the unofficial whisper-go bindings and the influxdb Go client library. That's what I did now, it's at It uses configurable numbers of workers for both whisper fetches and InfluxDB commits, but it's still a bit naive in the sense that it commits to InfluxDB one series at a time, irrespective of how many records are in it.
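The split between fetch workers and commit workers can be sketched as a two-stage pipeline. This is a Python illustration of the pattern, not the Go program: `fetch_series` and `commit_series` are hypothetical stand-ins for the whisper read and the InfluxDB write, and the worker counts are plain parameters.

```python
import queue
import threading

def run_pipeline(paths, fetch_series, commit_series, fetch_workers=4, commit_workers=2):
    """Fetch every path with `fetch_workers` threads, commit the results with
    `commit_workers` threads, and return the number of committed series.

    fetch_series(path) -> series and commit_series(series) are hypothetical
    callables standing in for the whisper read and the InfluxDB write.
    """
    to_fetch = queue.Queue()
    to_commit = queue.Queue(maxsize=100)  # bounded: fetchers wait if commits lag
    committed = []
    lock = threading.Lock()

    for p in paths:
        to_fetch.put(p)  # filled up front, so an empty queue means "done"

    def fetcher():
        while True:
            try:
                path = to_fetch.get_nowait()
            except queue.Empty:
                return
            to_commit.put(fetch_series(path))

    def committer():
        while True:
            series = to_commit.get()
            if series is None:  # sentinel: no more work
                return
            commit_series(series)
            with lock:
                committed.append(series)

    fetchers = [threading.Thread(target=fetcher) for _ in range(fetch_workers)]
    committers = [threading.Thread(target=committer) for _ in range(commit_workers)]
    for t in fetchers + committers:
        t.start()
    for t in fetchers:
        t.join()
    # All real items are queued before the sentinels, so FIFO order
    # guarantees every series is committed before any committer stops.
    for _ in committers:
        to_commit.put(None)
    for t in committers:
        t.join()
    return len(committed)
```

Batching several series per commit, the obvious next step given the naivety mentioned above, would go in `committer`.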

InfluxDB as a graphite backend, part 2

Updated Oct 1, 2014 with a new Disk space efficiency section that fixes some mistakes and adds clarity.

The Graphite + InfluxDB series continues.

  • In part 1, “On Graphite, Whisper and InfluxDB” I described the problems of Graphite’s whisper and ceres, why I disagree with common graphite clustering advice as being the right path forward, what a great timeseries storage system would mean to me, why InfluxDB - despite being the youngest project - is my main interest right now, and introduced my approach for combining both and leveraging their respective strengths: InfluxDB as an ingestion and storage backend (and at some point, realtime processing and pub-sub) and graphite for its renowned data processing-on-retrieval functionality. Furthermore, I introduced some tooling: carbon-relay-ng to easily route streams of carbon data (metrics datapoints) to storage backends, allowing me to send production data to Carbon+whisper as well as InfluxDB in parallel, and graphite-api, the simpler Graphite API server, with graphite-influxdb to fetch data from InfluxDB.
  • Not Graphite related, but I wrote influx-cli which I introduced here. It makes it easy to interface with InfluxDB and to measure the duration of operations, which will become useful for this article.
  • In the Graphite & Influxdb intermezzo I shared a script to import whisper data into InfluxDB and noted some write performance issues I was seeing, but the better part of the article described the various improvements done to carbon-relay-ng, which is becoming an increasingly versatile and useful tool.
  • In part 2, which you are reading now, I’m going to describe recent progress, share more info about my setup, testing results, state of affairs, and ideas for future work.
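What carbon-relay-ng does for the dual Carbon+whisper / InfluxDB setup, routing each line of the carbon plaintext protocol (`metric.path value timestamp`) to one or more backends, can be illustrated with a small Python sketch. The prefix-based route table and the backend names are hypothetical; carbon-relay-ng's own configuration and matching rules differ.

```python
def parse_carbon_line(line):
    """Parse one carbon plaintext line: '<metric.path> <value> <unix_timestamp>'."""
    parts = line.strip().split()
    if len(parts) != 3:
        raise ValueError(f"malformed carbon line: {line!r}")
    path, value, ts = parts
    return path, float(value), int(float(ts))

def route(lines, routes):
    """Fan each datapoint out to every backend whose prefix matches.

    `routes` is a hypothetical {backend_name: metric_path_prefix} mapping;
    an empty prefix matches everything. Malformed lines are skipped.
    Returns {backend_name: [(path, value, ts), ...]}.
    """
    out = {backend: [] for backend in routes}
    for line in lines:
        try:
            point = parse_carbon_line(line)
        except ValueError:
            continue
        for backend, prefix in routes.items():
            if point[0].startswith(prefix):
                out[backend].append(point)
    return out

lines = ["servers.web1.cpu 0.5 1400000000", "stats.hits 12 1400000000", "garbage"]
print(route(lines, {"whisper": "", "influxdb": "servers."}))
```

Sending every point to both stores, as in the parallel setup described above, is just the case where both prefixes are empty.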

Pixie: simple photo management using directory layouts and tags.

So you have a few devices with pictures, and maybe some additional pictures your friends sent you. You have a lot of pictures of the same thing, probably at too high a resolution. Some may require some editing. How do you easily create photo albums out of this mess? And how do you do it in a way that keeps a simple and elegant, yet flexible file/directory layout for portability and simplicity?

Metrics 2.0: a proposal

  • Graphite’s metrics are strings comprised of dot-separated nodes which, due to their ordering, can be represented as a tree. Many other places use a similar format (stats in /proc etc).
  • OpenTSDB’s metrics are shorter, because they move some of the dimensions (server, etc) into key-value tags.
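To make the difference between the two formats concrete, here is a Python sketch that turns a dot-separated Graphite name into key-value tags using a positional schema. The schema and tag names are hypothetical. The point is that a tag carries its own meaning, while a node in a dotted name only means something through its position.

```python
def graphite_to_tags(name, schema):
    """Map dot-separated nodes to tags using a positional schema.

    schema: hypothetical list of tag keys, one per node position,
    e.g. ["env", "host", "service", "what"]. Raises ValueError on mismatch.
    """
    nodes = name.split(".")
    if len(nodes) != len(schema):
        raise ValueError(f"{name!r} has {len(nodes)} nodes, schema expects {len(schema)}")
    return dict(zip(schema, nodes))

def tags_to_key(tags):
    """Canonical, order-independent string form of a tag set."""
    return " ".join(f"{k}={v}" for k, v in sorted(tags.items()))

tags = graphite_to_tags("prod.web1.nginx.requests", ["env", "host", "service", "what"])
print(tags_to_key(tags))  # env=prod host=web1 service=nginx what=requests
```

Once tags are explicit, their order no longer matters, and adding a new dimension does not shift the position of every other node.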
I think we can do better…
I think our metrics format is restrictive and we do ourselves a disservice using it:

Graphite-ng: A next-gen graphite server in Go.

I've been a graphite contributor for a while (and still am). It's a great tool for timeseries metrics. Two weeks ago I started working on Graphite-ng: it's somewhere between an early clone/rewrite, a redesign, and an experiment playground, written in Golang. The focus of my work so far is the API web server, which is a functioning prototype that answers requests like


I.e. it lets you retrieve your timeseries, processed by function pipelines which are set up on the fly based on a spec in your http/rest arguments. Currently it only fetches metrics from text files but I'm working on decent metrics storage as well.
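Setting up a function pipeline on the fly from a request argument boils down to parsing a nested expression and evaluating it. Here is a Python sketch with a tiny recursive-descent parser for Graphite-style targets such as `sum(scale(a,2),b)`. It is not Graphite-ng's Go code. The three-function set and the in-memory `series` dict (standing in for the text-file storage) are assumptions.

```python
import re

# Tokens: names (functions or dotted metric paths), numbers, parens, commas.
TOKEN = re.compile(r"\s*([A-Za-z_][\w.]*|-?\d+(?:\.\d+)?|[(),])")
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

# Hypothetical function set; each receives already-evaluated arguments.
FUNCTIONS = {
    "scale": lambda s, factor: [v * factor for v in s],
    "offset": lambda s, amount: [v + amount for v in s],
    "sum": lambda *series: [sum(vals) for vals in zip(*series)],
}

def tokenize(text):
    text = text.strip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected input at {pos}: {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def render(target, series):
    """Evaluate a target like 'sum(scale(a,2),b)' against in-memory series.

    Grammar: expr := NUMBER | NAME | NAME '(' expr (',' expr)* ')'
    """
    tokens = tokenize(target)

    def parse(i):
        tok = tokens[i]
        if NUMBER.fullmatch(tok):
            return float(tok), i + 1
        if i + 1 < len(tokens) and tokens[i + 1] == "(":
            func = FUNCTIONS[tok]
            args, i = [], i + 2
            while True:
                value, i = parse(i)
                args.append(value)
                if tokens[i] == ",":
                    i += 1
                elif tokens[i] == ")":
                    return func(*args), i + 1
                else:
                    raise ValueError(f"expected ',' or ')', got {tokens[i]!r}")
        return list(series[tok]), i + 1  # plain metric name

    result, end = parse(0)
    if end != len(tokens):
        raise ValueError(f"trailing input in {target!r}")
    return result

print(render("sum(scale(a,2),b)", {"a": [1, 2, 3], "b": [10, 20, 30]}))  # [12.0, 24.0, 36.0]
```

Because each function call is evaluated only after its arguments, the pipeline shape follows directly from the nesting in the request.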

Hi Planet Devops and Infratalk

This blog just got added to planet devops and infra-talk, so for my new readers: you might know me as Dieterbe on irc, github or twitter. Since my move from Belgium to NYC (to do backend stuff at Vimeo) I’ve started writing more about devops-y topics (whereas I used to write more about general hacking and Arch Linux release engineering and (automated) installations). I’ll mention some earlier posts you might be interested in:

Profiling and behavior testing of processes and daemons, and Devopsdays NYC

Profiling a process run

I wanted the ability to run a given process and get
a plot of key metrics (cpu usage, memory usage, disk i/o) throughout the duration of the process run.
Something light-weight with minimal dependencies so I can easily install it on a server for a one-time need.
I couldn’t find a tool for it, so I wrote profile-process
which does exactly that in <100 lines of python.
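A stripped-down version of the idea, sampling a child process's resident memory while it runs, might look like the Python sketch below. It is not profile-process itself: it only covers memory, it reads Linux's /proc/<pid>/status directly (so it assumes Linux), and it leaves the plotting out.

```python
import subprocess
import time

def read_rss_kb(pid):
    """Return VmRSS in kB from /proc/<pid>/status, or None if unavailable."""
    try:
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):  # e.g. "VmRSS:     1234 kB"
                    return int(line.split()[1])
    except OSError:  # process already gone, or no /proc on this platform
        pass
    return None  # zombies have no VmRSS line

def profile(cmd, interval=0.05):
    """Run `cmd` and sample its RSS every `interval` seconds (Linux only).

    Returns (returncode, [(elapsed_seconds, rss_kb), ...]).
    """
    start = time.monotonic()
    proc = subprocess.Popen(cmd)
    samples = []
    while proc.poll() is None:
        rss = read_rss_kb(proc.pid)
        if rss is not None:
            samples.append((time.monotonic() - start, rss))
        time.sleep(interval)
    return proc.returncode, samples
```

CPU and disk I/O could be sampled the same way from /proc/<pid>/stat and /proc/<pid>/io, and the resulting list fed to any plotting tool.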

black-box behavior testing processes/daemons

I wrote simple-black-box to do this.
It runs the subject(s) in a crafted sandbox, sends input (http requests, commands, …)
and lets you make assertions on http/statsd requests/responses, network listening state, processes running, log entries,
file existence/checksums in the VFS/swift clusters, etc.
Each test-case is a scenario.
It can also use logstash to give a centralized “distributed stack trace” when you need to debug a failure after multiple processes have interacted and acted upon received messages, or to compare behavior across different scenario runs.
You can integrate this with profile-process to compare runtime behaviors across testcases/scenarios.
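The scenario idea can be sketched in Python: run the subject as a separate process, feed it input, and assert only on externally observable behavior. The `Scenario` structure below is hypothetical and far simpler than simple-black-box, which also checks HTTP/statsd traffic, listening sockets, log entries and files. Here only stdout and the exit code are checked.

```python
import subprocess
import sys
from dataclasses import dataclass, field

@dataclass
class Scenario:  # hypothetical, minimal scenario description
    name: str
    cmd: list
    stdin: str = ""
    expect_exit: int = 0
    expect_in_stdout: list = field(default_factory=list)

def run_scenario(s, timeout=10):
    """Run one scenario as a black box; return failure messages (empty = pass)."""
    proc = subprocess.run(s.cmd, input=s.stdin, capture_output=True,
                          text=True, timeout=timeout)
    failures = []
    if proc.returncode != s.expect_exit:
        failures.append(f"{s.name}: exit {proc.returncode}, expected {s.expect_exit}")
    for needle in s.expect_in_stdout:
        if needle not in proc.stdout:
            failures.append(f"{s.name}: {needle!r} not in stdout")
    return failures

# The subject here is a throwaway one-liner that upper-cases its input.
upper = Scenario(
    name="uppercases input",
    cmd=[sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"],
    stdin="hello",
    expect_in_stdout=["HELLO"],
)
print(run_scenario(upper) or "PASS")
```

Keeping each scenario as plain data makes it easy to run the same subject through many scenarios and compare the outcomes.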

Anthracite, an event database to enrich monitoring dashboards and to allow visual and numerical analysis of events that have a business impact


Graphite can show events such as code deploys and puppet changes as vertical markers on your graph. With the advent of new graphite dashboards and interfaces where we can have popups and annotations to show metadata for each event (by means of client-side rendering), it’s time we have a database to track all events along with categorisation and text descriptions (which can include rich text and hyperlinks). Graphite is meant for time series (metrics over time), Anthracite aims to be the companion for annotated events.
More precisely, Anthracite aims to be a database of “relevant events” (see further down), for the purpose of enriching monitoring dashboards, as well as allowing visual and numerical analysis of events that have a business impact (for the latter, see “Thoughts on incident nomenclature, severity levels and incident analysis” below).
It has a TCP receiver, a database (sqlite3), an HTTP interface to deliver event data in many formats and a simple web frontend for humans.
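The storage side of such an event database can be sketched with Python's built-in sqlite3 module. The table layout below (unix timestamp, free-text description, comma-separated tags) is an assumption for illustration, not Anthracite's actual schema.

```python
import sqlite3

def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        " id INTEGER PRIMARY KEY,"
        " ts INTEGER NOT NULL,"            # unix timestamp of the event
        " description TEXT NOT NULL,"     # may contain rich text / links
        " tags TEXT NOT NULL DEFAULT '')"  # comma-separated, e.g. 'deploy,web'
    )
    db.execute("CREATE INDEX IF NOT EXISTS events_ts ON events (ts)")
    return db

def add_event(db, ts, description, tags=()):
    db.execute("INSERT INTO events (ts, description, tags) VALUES (?, ?, ?)",
               (ts, description, ",".join(tags)))
    db.commit()

def events_between(db, start, end, tag=None):
    """Events with start <= ts < end, oldest first, optionally filtered by tag."""
    rows = db.execute(
        "SELECT ts, description, tags FROM events WHERE ts >= ? AND ts < ? ORDER BY ts",
        (start, end),
    ).fetchall()
    if tag is not None:
        rows = [r for r in rows if tag in r[2].split(",")]
    return rows
```

A dashboard overlay then only needs `events_between` for the graph's time range, and the same rows can be counted per category for numerical analysis.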

Histograms in statsd, and graphing them over time with graphite.

I submitted a pull request to statsd that adds histogram support.
[Figure: example histogram, from Wikipedia]
(Refresher: a histogram is [a visualization of] a frequency distribution of data, summarizing your data by keeping frequencies for entire classes, i.e. ranges of data. See the Wikipedia article on histograms.)
It’s commonly documented how to plot single histograms, that is, 2D diagrams consisting of rectangles whose

  • area is proportional to the frequency of a variable
  • width is equal to the class interval
Class intervals go on x-axis, frequencies on y-axis.

Note: histogram class intervals usually have the same width.
My implementation allows arbitrary class intervals with potentially different widths, as well as an infinite upper boundary.
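The binning itself is simple. Here is a Python sketch with arbitrary, unequal class intervals and an infinite last upper bound. Treating each bound as inclusive (`value <= bound`) is an assumption of this sketch, not necessarily what the statsd patch does.

```python
import bisect
import math

def histogram(values, upper_bounds):
    """Count values per class interval.

    upper_bounds must be sorted ascending; the last one may be math.inf.
    Each value goes to the first bin whose upper bound is >= the value
    (inclusive bounds, an assumption). Values above the last bound are
    dropped, which cannot happen when the last bound is math.inf.
    """
    counts = [0] * len(upper_bounds)
    for v in values:
        i = bisect.bisect_left(upper_bounds, v)  # first bound >= v
        if i < len(upper_bounds):
            counts[i] += 1
    return counts

print(histogram([3, 7, 10, 11, 250, 9999], [10, 100, 1000, math.inf]))  # [3, 1, 1, 1]
```

Emitting one counter per bin each flush interval is what turns these counts into ordinary timeseries that graphite can plot over time.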

Plotting histograms.. over time

Moving to New York City

I have a one-way ticket to NYC on Sept. 21st. Vimeo HQ is in Manhattan, and practically the whole team works in the building so it makes sense for me to relocate and join them locally. I'm looking forward to working with the colleagues face to face, but mainly I'm looking forward to the experience of living in such a different place, and exploring the US. In fact, I already have some small trips planned (Hamptons NY, camping in Pennsylvania, skiing in New York this winter) with some friends I met last year in NY.

Resigning as Arch Linux developer

A few days ago, I resigned as Arch Linux developer. I'm sad to go, but I felt like my work on Arch became a drag, so it was time to make my decreased interest official. The Releng team we started more than 3 years ago is now dead, but other developers are showing interest in iso building and installer scripts, so as long as they don't burn out, you'll see new isos again.

Joining Vimeo

Working on scalable information retrieval systems at the University of Ghent has been very fun: interesting and challenging work, smart team, and an environment that fosters growth and innovation. I could definitely see myself continuing there... However, Vimeo got in touch and told me about their plans... specifically what's going into the new version and what other stuff they have on their roadmap. I can honestly say Vimeo is the most beautiful web property I've ever seen [*], not just that, they also provide a top product/service, and host a great community of passionate people who create some of the most beautiful online videos I've ever seen.

Lighttpd socket Arch Linux /var/run tmpfs tmpfiles.d

On Arch Linux, and probably many other distros, /run is a new tmpfs, and /var/run symlinks to it. With Lighttpd you might have a fastcgi socket defined as something like "/var/run/lighttpd/sockets/mywebsite.sock". This no longer works: after each reboot /var/run is an empty directory, so lighttpd won't start, and /var/log/lighttpd/error.log will tell you:

    2012-03-16 09:21:34: (log.c.166) server started
    2012-03-16 09:21:34: (mod_fastcgi.c.977) bind failed for: unix:/var/run/lighttpd/sockets/mywebsite.sock-0 No such file or directory
    2012-03-16 09:21:34: (mod_fastcgi.c.1397) [ERROR]: spawning fcgi failed.

Luamail: a mail client built into luakit

Similarly to how back in 2009 there was no browser that works in a way I find sane, and I started solving that with uzbl, now I'm fed up with the lack of an email client that works in a way I find sane. Uzbl turned out to be a bit cumbersome for my taste, so I switched to the uzbl-inspired but more pragmatic luakit browser, which is much in the same vein, except that all configuration, extensions, event handling, programmatic input etc are done by interfacing with Lua APIs. Now I want to build the "luakit of email clients". Let me explain what that's all about...

Hitchhiking.. try it.

For the last few months, I’ve started to actively use hitchhiking as a means to travel between home and work. What started as a “I’m not sure about this, it seems a bit awkward, but I do want to know how it goes and feels, so I’ll try it out once” ended up being “this is great, I’m doing it every day and loving it”. Here’s why you should try it and why it may make your life more awesome.

My metal band

Since the audience of this blog is largely technical, I don't post much about other topics, but I feel it's time for a short summary about one of my "real life projects". In the spring of 2009 I joined a progressive death metal band. I've been drumming since I was 17, but during the last 2 years I've been practising and rehearsing like never before.[1] When you hear yourself on tape for the first time, it's a bit disillusioning, as you suddenly hear every imperfection, many of which you didn't realise you had (or didn't think were very noticeable).

Thank you Google!

Google, you've had a lot of bad words thrown at you lately. "Evil", "big brother", "dangerous", .... But I just wanted to say: thank you. You provide us some nice services. Google search, Gmail, analytics, google maps, ... All of these products are/were game changers and made the life of people all over the world easier. Many people take them for granted and don't realise what it takes to design, engineer and operate these applications.

Dvcs-autosync: An open source dropbox clone... well.. almost

I found the Dvcs-autosync project on the vcs-home mailing list, which btw is a great list for folks who are doing stuff like maintaining their home directory in a vcs.
In short:

Use cases:

Why rewriting git history? And why should commits be in imperative present tense?

There are tons of articles describing how you can rewrite history with git, but they do not answer “why should I do it?”. A similar question is “what are the tradeoffs / how do I apply this in my distributed workflow?”.
Also, git developers strongly encourage/command you to write commit messages in imperative present tense, but do not say why. So, why?
I’ll try to answer these to the best of my abilities, largely based on how I see things. I won’t get too detailed (there are enough manuals and tutorials for the exact concepts and commands).

Can we build a simple, cross-distribution installation framework?

Today at Fosdem 2011 I did my talk Can we build a simple, cross-distribution installation framework? Basically, using the Arch Installation Framework as a starting point, along with the notion that most of the code is actually not Arch-specific, I addressed other distros to check if there was any interest in sharing workload on the distribution-agnostic aspects of the framework. If other distros with a similar philosophy of little-abstractions/KISS would join, we would all reap the benefits of a simple, yet quite featureful installer.

Dir 2011, Fosdem 2011

On February 4, I'll be in Amsterdam at DIR 2011, the 11th Dutch-Belgian Information Retrieval Workshop. After that, I'm going to the devopsdinner and Fosdem beer event in Brussels. On February 5/6 of course, Fosdem itself. Looking forward to the systemd talk. On Sunday I'll do a talk about simple shell based Gnu/Linux installers; like I mentioned earlier, I hope devs from other "lightweight"/kiss-style distros will be present (Gentoo and other *too's, Crux, *ppix, ...).

Building a search engine

I started working at IBCN, the research group of the University of Ghent. I was looking to get back to the challenging world of high-performance and large-scale (web) applications, but I also wanted something more conceptual and researchy, rather than the highly hands-on dev- and ops work I've been doing for a few years now. The Bom-vl project is pretty broad: it aims to make the Flemish cultural heritage media more usable by properly digitizing, archiving and making public the (currently mostly analog) archives from providers such as TV stations.

Blog moved

This blog now runs on Pyblosxom on Lighttpd on my new Linode machine. The moving/conversion process wasn't as smooth as I thought it would be, as I needed to work quite a bit on pyblosxom and implement some new plugins to get certain features working (like syntax highlighting). Also, my previous hosting provider removed my account before the contract expired. But luckily I managed to restore everything and all should work pretty much as before. In particular: RSS feeds should still be working on the old urls and the same GUIDs are used to avoid spamming anyone; syntax highlighting of all old entries and comments works, but posting highlighted code in new comments doesn't yet.

Libui-sh: a library providing UI functions for shell scripts


When you write bash/shell scripts, do you write your own error/debug/logging/abort functions?
Logic that requests the user to input a boolean, string, password, selection out of a list,
date/time, integer, … ?

Libui-sh is written to take care of all that.
libui-sh is meant to be a general-purpose UI abstraction library for shell scripts.
Low impact, easy to use, but still flexible.
CLI by default, but it can optionally use ncurses dialogs as well.

Migrating blogs from Drupal to Pyblosxom

pyblosxom is a pretty cool blogging platform written in python.
Like many of the modern minimal blog engines it works with plaintext files only (no database), has a relatively small codebase, supports many plugins (like markdown support), is written in a proper scripting language, has a simple and clean file structure, is seo-friendly, and so on.
The one feature that sets it apart from other minimal blog engines is that it supports comments, and doesn’t just rely on an external service like disqus, but stores comments as plaintext files as well.
Some features seem a bit overengineered (like multiple possible locations to store themes (known as “flavours”) and templates; I’m a fan of convention over configuration and keeping things simple), but discussing this with the maintainer revealed this is because pyblosxom is meant as a reimplementation of the original perl-based blosxom project. Over time features could be simplified and/or redesigned.
So I plan to migrate this blog from drupal to pyblosxom.
To do this, I’m building the tool drupal-to-pyblosxom.
The goal is to convert posts, associated metadata (publish time, tags) and comments from the drupal database to pyblosxom files. Source code display should be converted too (merely a matter of converting between different plugin conventions), and images shown should be downloaded. Currently I’m about halfway, if there’s anyone out there with a similar use case, help is welcome ;)
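For the posts themselves, the conversion boils down to writing one plaintext file per entry. The Python sketch below assumes the post was already read from the Drupal database into a dict (the field names are hypothetical). It writes the title on the first line followed by a `#tags` metadata line (pyblosxom's `#key value` metadata convention, with tags read by a tags plugin), and it sets the file's modification time to the publish time, since pyblosxom dates entries by mtime by default. Check those conventions against your pyblosxom version and plugins.

```python
import os
import re

def slugify(title):
    """Lowercase, keep alphanumerics, join words with dashes."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-") or "untitled"

def write_entry(datadir, post):
    """Write one post as <datadir>/<category>/<slug>.txt and return the path.

    `post` is a hypothetical dict: title, body, created (unix time),
    tags (list of str), category (str, may be '').
    """
    directory = os.path.join(datadir, post.get("category", ""))
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, slugify(post["title"]) + ".txt")
    lines = [post["title"]]                    # first line: entry title
    if post.get("tags"):
        lines.append("#tags " + ",".join(post["tags"]))  # metadata line
    lines.append(post["body"])
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    # Preserve the original publish time as access and modification time.
    os.utime(path, (post["created"], post["created"]))
    return path
```

Comments and downloaded images would need similar per-file writers; they are left out here.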

Checking if a git clone has any unique content, git/svn scripts

When cleaning up a system and going over git repositories I often wonder if a git repo contains any interesting, but unpushed work. (i.e. “unique” content)
I heard bzr (or was it hg…) can do it out-of-the-box, but I couldn’t find any existing solution for git.
So I wrote a script to do this. It checks a repo for unique commits, tags, branches, dirty files/index, added files, or stashed states. It can compare against a specific remote or all of them, and it exits with an appropriate exit code.
The script is part of a bigger git-scripts repo (most of the scripts written by random people). Although the original repo creator hasn’t gotten back to me, this seems like a good starting point for bringing some order to the wide spread of git scripts.
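A few of those checks can be sketched in Python on top of ordinary git commands: `git status --porcelain` for dirty, added or untracked files, `git stash list` for stashes, and `git log --branches --not --remotes` for local commits that no remote has. This is not the author's script. It covers fewer cases (no tags, no per-remote comparison), and the decision logic lives in a pure function so it can be tested without a repository.

```python
import subprocess

def git(repo, *args):
    """Run a git command in `repo` and return its stdout."""
    return subprocess.run(["git", "-C", repo, *args], check=True,
                          capture_output=True, text=True).stdout

def unique_content(status_out, stash_out, unpushed_out):
    """Return the reasons why a clone holds unique content (empty = safe to delete)."""
    reasons = []
    if status_out.strip():
        reasons.append("dirty working tree, index or untracked files")
    if stash_out.strip():
        reasons.append("stashed states")
    if unpushed_out.strip():
        reasons.append("commits not present on any remote")
    return reasons

def check(repo="."):
    return unique_content(
        git(repo, "status", "--porcelain"),
        git(repo, "stash", "list"),
        git(repo, "log", "--branches", "--not", "--remotes", "--oneline"),
    )
```

To get the exit-code behavior, a caller can end with `sys.exit(1 if check(path) else 0)`.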

Here are some other scripts I find pretty useful:

Filesystem code in AIF

In light of the work and discussions around supporting Nilfs2 and Btrfs on Arch Linux and its installer AIF,
I’ve shared some AIF filesystem code design insights and experiences on the arch-releng mailing list.
This is some hard to understand code. Partly because it’s in bash (and I’ve needed to work around some limitations in bash),
partly because there is some complex logic going on.

I think it’s very useful material for those who are interested (it can also help understanding the user aspect),
so I wanted to share an improved version here.
On a related topic: I proposed to do a session at Fosdem 2011/“distro miniconf” about simple (console based) installers for Linux,
and how multiple distributions could share efforts maintaining installation tools, because there are a lot of cross-distribution concerns
which are not trivial to get right (mostly filesystems, but I also think about clock adjustments, bootloaders, etc).
Several distros already use the Arch installer (or a fork of it), for example Pentoo,
but I think cooperation could be much better and more efficient.


Handing off uzbl to a new project leader

As of yesterday, Brendan ‘bct’ Taylor is the new Uzbl project leader / maintainer.
Yesterday I did the newspost on which explains the reasoning. I can add that it feels pretty weird “giving away” and “leaving behind” a project you spent so much time on, which grew a large (well, for a FOSS side project with a hacker audience) base of users and contributors, and which served as inspiration for various other projects.

Rsyncbench, an rsync benchmarking tool

Background info:
I’m currently in the process of evaluating (V)PS hosting providers and backup solutions. The idea being: I want a (V)PS to run my stuff, which doesn’t need much disk space,
but in the meantime it might be a good idea to look for online backup solutions (oops did I say “online”? I meant “cloud”), like on the (V)PS itself, or maybe as a separate solution.
But I have a diverse set of data (my personal data is mostly a lot of small plaintext files, and my mom has a Windows VM for which I considered syncing the entire vdi file).
At this point the biggest contenders are Linode (which offers quite some flexibility and management tools, but becomes expensive when you want extra disk space: $2/month per GB), Rackspace backup, which gives you 10GB for $5/month but has nice backup tools, so I could back up only the important files from within the Windows VM (~200MB), and Hetzner, which offers powerful physical private servers with a lot of storage (160GB) for €29/month, but less flexibility (e.g. kvm-over-ip costs an extra €15/month).

Another issue: given the limited capacity of Belgian internet connections, I needed to figure out how much bandwidth rsync really needs, so I can calculate whether a backup run that includes syncing the full vdi file still takes a reasonable amount of time.

I couldn’t find an rsync benchmarking tool, so I wrote my own.


  • simple
  • non-invasive: you specify the target and destination hosts (just localhost is fine too), and file locations
  • measures time spent, bytes sent (measured with tcpdump), and data sent (rsync’s statistics which takes compression into account)
  • supports plugins
  • generates png graphs using Gnuplot
  • two current plugins: one uses files of various sizes, both randomly generated (/dev/urandom) and easily compressible (/dev/zero), and covers use cases such as an initial sync, a second sync (no-op), and syncing with a data block appended and prepended. The other plugin collects vdi files from rsnapshot directories and measures the rsyncing from each image to the next
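The test data for the first plugin (random data as from /dev/urandom, highly compressible data as from /dev/zero, and a data block prepended to an existing file) can be sketched in Python with `os.urandom` and zero bytes. The helper names are hypothetical, not rsyncbench's plugin API. The zlib ratio is only a rough hint of how much rsync's compression might save on each kind of file.

```python
import os
import zlib

def make_test_file(path, size, kind):
    """Write `size` bytes of 'random' (incompressible) or 'zero' data."""
    if kind not in ("random", "zero"):
        raise ValueError(f"unknown kind: {kind!r}")
    chunk = 1 << 20  # write in 1 MiB chunks to keep memory use flat
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(os.urandom(n) if kind == "random" else bytes(n))
            remaining -= n

def prepend_block(path, block):
    """Prepend `block` to the file (the 'data block prepended' case)."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(block + data)

def compression_ratio(path):
    """Compressed size / original size, using zlib's default level."""
    with open(path, "rb") as f:
        data = f.read()
    return len(zlib.compress(data)) / max(len(data), 1)
```

The prepended case is the interesting one for rsync: its rolling checksum should still find the unchanged blocks even though every byte offset has shifted.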

An rss2email fork that sucks less

Rss2email is a great tool. I like getting all my news messages in my mailbox and using smtp to make the “news delivery” process more robust makes sense.
However, there are some things I didn’t like about it so I made a github repo where I maintain an alternative version which (imho) contains several useful improvements, both for end users and for developers/downstreams.
Also, this was a nice opportunity for me to improve my python skills :)

Here is how it compares:

What the open source community can learn from Devops

Being active as both a developer and ops person in my professional life, and both an open source developer and packager in my spare time, I noticed some common ground between both worlds, and I think the open source community can learn from the Devops movement, which is solving problems in the professional tech world.

For the sake of getting a point across, I’ll simplify some things.

First, a crash course on Devops…

the "Community Contributions" section on the Arch Linux forums is a goldmine

The Community contributions subforum of the Arch Linux forums is awesome. It is the birthplace of many applications, most of them not Arch Linux specific. File managers, media players, browsers, window managers, text editors, todo managers, and so on. Many shell scripts, urxvt extensions and dwm patches as well. Most of the apps are designed after suckless/KISS principles, but there are also some GUI programs. If you like to discover new apps and tools, check it out.

Back from Canada, Archcon

I’m back from Canada/Archcon, and it was great. I’ve been in Toronto for 11 days, and visited Montreal for 3 days.


Archcon was small (20-ish people; that’s what you get for doing it in Canada ;) but very nice.
Interesting talks, informal, good vibe, decent logistics and catering.
This year it happened because Dusty and Ricardo actually just wanted to have a conference without worrying too much about the attendance,
next year we should do it again because Arch (conferences) rock(s), and because we need more visitors. More central locations such as Seattle and Europe have been suggested.
Either way, next year both Judd (founder) and Aaron (current overlord) should be there. (this year they both had lame excuses like family reunions and “almost getting married”. Congrats btw, Aaron!)

It was an absolute pleasure to meet some more of my fellow devs, and users.
Here is a pic from the group (unfortunately, a few are missing)

Off to Toronto July 14-28, Archcon

As mentioned earlier, I’ll be at Archcon in Toronto in a few weeks.
It’s a very small conference, and the first of its kind. At the last FrOSCon we played with the idea of holding an informal Arch conference in Europe, but those were just ideas. Dusty and Ricardo beat us with an actual implementation.
This is great, and one of the milestones in Arch Linux history, which is why I want to be there and help make it better.

Restoring ssh connections on resume

I use pm-utils for hibernation support. It has a hooks system which can execute stuff upon hibernate/suspend/thaw/resume/..., but the hooks run as root. If you want to run stuff as a regular user you could do something like su $user -c <command>, but these commands have no access to your user environment. In my user environment I have a variable which I need access to, namely SSH_AUTH_SOCK, which points to my agent which has some unlocked ssh keys.
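One possible workaround, sketched in Python: have the user session record `SSH_AUTH_SOCK` in a file, and have the root-run hook read that file and set the variable explicitly in the command it runs through `su`. The file location is a hypothetical choice, and this is not necessarily how the full article solves it.

```python
import os
import shlex

def save_agent_socket(path):
    """Run from the user's session: remember where the agent socket lives."""
    sock = os.environ.get("SSH_AUTH_SOCK")
    if sock:
        with open(path, "w") as f:
            f.write(sock + "\n")
    return sock

def su_command(user, command, saved_path):
    """Build the argv a root-run resume hook could pass to subprocess.run().

    The saved socket path is put into the shell command itself, so the
    result does not depend on su preserving the caller's environment.
    `saved_path` is a hypothetical file written by save_agent_socket().
    """
    with open(saved_path) as f:
        sock = f.read().strip()
    inner = f"SSH_AUTH_SOCK={shlex.quote(sock)} {command}"
    return ["su", user, "-c", inner]
```

Quoting the socket path with `shlex.quote` keeps an unusual path from being interpreted by the shell that `su -c` starts.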


Uzbl, monitoring, AIF talks

I recently did two talks, for which the videos are now online.

If all goes well, I’ll be at ArchCon this summer, where I’ll be doing these talks:

We’re not sure yet if those talks will get videotaped.

facebook usrbincrash php implementation

Implementation for the Facebook usr bin crash puzzle (how/why). I haven't touched the code for a few months, but it's better to put it online than to let it rot. There are 2 branches:

  • master: basically what I submitted to FB, and what just works
  • withpruning: an attempt at further optimization (it only improves the runtime in some cases), but I didn't finish that version and there's a bug in it somewhere

In the repo you'll also find various test input files supplied by the community on the forums, and a script to benchmark the implementation on all input files.
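Assuming the usual formulation of the puzzle (drop at least a given total weight at minimum total cost, with every item type available in unlimited quantity), the core is an unbounded knapsack-style dynamic program. This Python sketch shows the plain DP only. It is not the PHP implementation and has none of the pruning from the second branch; integer weights are assumed.

```python
def min_cost_to_drop(target, items):
    """Minimum total cost of items whose weights sum to at least `target`.

    items: list of (weight, cost) pairs with positive integer weights,
    each usable any number of times.
    best[w] = cheapest way to drop at least w weight units.
    """
    INF = float("inf")
    best = [0] + [INF] * target
    for w in range(1, target + 1):
        for weight, cost in items:
            # This item covers `weight` units; whatever is left (if any)
            # must still be covered, hence max(0, ...).
            candidate = best[max(0, w - weight)] + cost
            if candidate < best[w]:
                best[w] = candidate
    return best[target]

print(min_cost_to_drop(9, [(3, 5), (5, 8)]))  # 15: three items of weight 3
```

The table has `target + 1` entries and each is updated once per item type, so the running time is O(target × number of item types).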


Not working for Facebook

In november last year, I was contacted by Facebook HR.
They found my background interesting and thought I might be a good
fit for an “application operations engineer” position in Palo Alto, California. (it is
basically the link between their infrastructure engineering and operations/support

Arch Linux interview and Uzbl article

Arch Linux team interview @ OSNews Uzbl @ Apologies for only informing you about the second article now. I assumed most of you follow LWN (you probably should) or found the article anyway. Of all the articles written about uzbl, none came close to the quality of Koen's work. So even though it's a bit dated, it's still worth a read.

ext3 logical partition resizing

You probably know you can resize primary partitions by deleting them and recreating them, keeping the starting block the same but using a higher block as ending point. You can then grow the filesystem.
But what about logical partitions? A while back I had to resize an ext3 logical partition which ended at the end of the last logical partition. I learned some useful stuff, but I only made some quick scratch notes and I don’t remember all details, so:
Do not expect a nice tutorial here, it’s more of a commented dump of my scratch notes and some vague memories.
The information in this post is not 100% accurate.

I wondered if I could just drop and recreate the extended partition (and if needed, recreate all contained logical partitions, the last one being bigger of course), but I couldn't find information about that anywhere.

About the maemo summit 2009 and the nokia n900

So I’m back from the 3-day maemo summit in Amsterdam. It was very nice. Very well organized, and Nokia definitely invested enough in catering, fancy-suited people and such to please all 400 of us. I met several interesting people, both from the community, as well as Nokia guys.
The talks were diverse, but interesting (duh?). I will especially remember the kickoff with its fancy visual effects and loud music that set the mood straight for the entire weekend.
The best moment was, of course, when it was announced that every summit participant would receive an N900. Uncontrolled happiness all around.

nokia n900 & maemo summit 2009

I have been looking for the "perfect mobile companion device" for a while already. Basically I want a "pocket PC that can do as much as possible, over which I have as much control as possible so I can do things my way, but that still fits in a pocket and can do gsm and such". So, something like a netbook, but really portable, and that can also do telephony stuff.

Opening files automatically on mainstream Linux desktops

Xfce/Gnu/Linux works amazingly well on my mom's workstation, with one exception: opening files automatically with the correct program.

The two biggest culprits are:

  • Gtk’s “open file with” dialog: if any Gtk program doesn’t know how to open a file it brings up this dialog that is horrible to use. You can search through your entire VFS for the right executable. No thumbnails, no usage of .desktop files, $PATH, autocompletion and not even limiting the scope to directories such as /usr/bin
  • Mozilla software such as Firefox and Thunderbird: they only seem to differentiate files by their mimetype, not by extension. There are add-ons to make it easier to edit these preferences, but eventually you’re in a dead end because you get files with correct extensions but unhelpful mimetypes (application/octet-stream)
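The fallback missing from the Mozilla behavior described above, trusting the file extension when the reported mimetype is the generic `application/octet-stream`, is a few lines with Python's standard `mimetypes` module. This is a sketch of the idea, not how Firefox or Thunderbird work internally.

```python
import mimetypes

GENERIC = {"application/octet-stream", ""}

def effective_mimetype(filename, reported=None):
    """Prefer a specific reported mimetype; otherwise guess from the extension."""
    if reported and reported not in GENERIC:
        return reported
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"

print(effective_mimetype("report.pdf", "application/octet-stream"))  # application/pdf
```

The resulting mimetype is what a .desktop-file based lookup would then map to an application.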

Luckily the fd.o guys have come up with .desktop files.

Arch Linux 2009.08 & Froscon 2009

So, the Arch Linux 2009.08 release is now behind us, nicely on schedule. I hope people will like AIF, because it was a lot of work and we didn't receive much feedback. I personally like using it for my fancy backup restoration approach. But I'm sure that if more people looked at the code, we would find quite a few design and implementation details that could be improved. (With uzbl I was amazed how much difference it makes when many people have ideas and opinions about every little detail.) Later this week I'm off to the Counting Cows festival in France, and the week after that (August 22-23) I'm going to FrOSCon in Germany, where I will meet some of my Arch Linux colleagues in real life, which I'm really looking forward to.

AIF automatic lvm/dm_crypt installations and test suite

We're working hard on a new Arch release (it should be done by FrOSCon). Amongst the slew of fixes and improvements there are also some cool new things I'm working on. First of all, I worked more on the automatic installations. Now you can easily install an LVM-based Arch system on top of dm_crypt, for example. You type this command: aif -p automatic -c /usr/share/aif/examples/fancy-install-on-sda and bam, you have a complete working system with LVM, dm_crypt etc. all set up.

MySQL status variables caveats

While setting up Zenoss and reading the MySQL documentation about status variables, I learned:

  • All Select_* variables (the "Select statistics" graph in Zenoss) are actually about joins, not (all) selects. This also explains why there is no clear relation to Com_select, which counts SELECT statements (the "Command statistics: selects" graph in Zenoss).
  • Com_select does not denote all incoming SELECT commands: if a query is answered from the query cache, Com_select is not incremented.
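To make the second caveat concrete: according to the MySQL documentation, a SELECT answered from the query cache increments Qcache_hits instead of Com_select, so the total number of SELECTs is roughly Com_select + Qcache_hits (this only applies to versions that still have the query cache, which was removed in MySQL 8.0). A small Python sketch, assuming the counters have already been fetched (e.g. from SHOW GLOBAL STATUS) into a dict of name to value; the sample numbers are made up.

```python
def select_summary(status):
    """Summarize SELECT-related MySQL status counters.

    `status` maps status variable names to values, as returned by
    SHOW GLOBAL STATUS (values may be strings).
    """
    com_select = int(status.get("Com_select", 0))
    qcache_hits = int(status.get("Qcache_hits", 0))
    # Cached SELECTs bump Qcache_hits, not Com_select, so add both.
    total_selects = com_select + qcache_hits
    hit_ratio = qcache_hits / total_selects if total_selects else 0.0
    # Select_* counters describe joins (scans, range joins, ...), not SELECT statements.
    join_counters = {
        name: int(value) for name, value in status.items() if name.startswith("Select_")
    }
    return {
        "total_selects": total_selects,
        "query_cache_hit_ratio": hit_ratio,
        "join_counters": join_counters,
    }


# Made-up sample values, just to show the shape of the input.
sample = {"Com_select": "300", "Qcache_hits": "100", "Select_scan": "40", "Select_full_join": "2"}
summary = select_summary(sample)
```

For this sample, total_selects is 400 and the query cache hit ratio 0.25, whereas a graph of Com_select alone would suggest only 300 SELECTs.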

Automatic installations with AIF

Yesterday I finished the first working version of AIF's automatic procedure, along with a sample config for a basic install.

For me personally this means I can start working on the next step towards my goal of having all my systems' "metadata" centrally stored (along with my real "data"), and the possibility to reconstruct all my systems in a deployment-meets-backup-restore fashion (see rethinking_the_backup_paradigm_a_higher-level...).

FOSDEM 2009

I'm particularly interested in:

  • The out-of-the-box concepts of Exherbo. I hope to see more things like this.
  • Various talks at openSUSE about distribution development, their build service, etc.
  • Tools such as func and puppet.
  • syslinux, upstart, ext4 etc.
  • Some MySQL stuff and the talk about filesystem I/O from a database perspective.

Arch Linux release engineering

I don't think I've ever seen so much anxiety/impatience/hope/buzz for a new Arch Linux release (this is because of 2.6.28 with ext4 support). The last release was 6 months ago, which is not so good, and the arch-installer project has been slacking for a while too. But the Arch devs have been very busy and have many things going on. You know how it goes... That's why some new people have stepped up to help out on a new release: today we are on the verge of a 2009-01 release (though that has been said so many times lately ;-) and together with Aaron we have started a new project: the Arch Linux Release Engineering team.

CakePHP and a paradigm shift to a code generation based approach?

At my new job, I’m writing a quite full-featured web application.
I’ve chosen to use CakePHP.
Why? Well, it may be 2 years since I last used it, but I’ve followed the project and its planet, and it seems to have matured and gained even more momentum.
I want to use something that is widely used, so there is plenty of stuff available for it, and that is RAD, flexible and powerful.
I noticed things such as CLI support and documentation have improved tremendously too.

However, I find that still, the recommended (or at least “most commonly used”) practices are not as efficient as they could be, and that emphasis is placed on the wrong aspects.
See, even though the bake tool has come a long way since I last used it, it’s still used to “generate some standard models/controllers/views”, after which the developer takes it from there (further editing the resulting files himself).
Fine-tuning generated code by editing the templates (in fact, only views have templates; the PHP code of models and controllers is hardcoded in the scripts that generate them) is still an obscure practice…
Also, there are very few command-line switches (right now you can choose your app dir, whether you want to bake a model, controller or view, and its name).
All other things (validation rules, associations, index/view/edit/add actions/views, which components, overwrite yes/no etc.) are handled interactively.
There are also some smaller annoyances: for example, when you specify one option like the name of the model, it assumes you don’t want interactivity and produces a model containing nothing more than the class definition and the member variable $name, which is usually worthless.
One thing that is pretty neat, though: if you update $this->recursive in a model, the baked views will contain stuff for the associated things. But so much more could be done…

#1 productivity tip: showers

When you’re stuck on a problem, or not even stuck but you just want to boost your creative/out-of-the-box thinking…
Take a shower. When I’m thinking about a problem and I take a shower, the ideas and thoughts just start popping up, one after another, or sometimes even two at the same time. It’s amazing. And it works every time.

Looking for a new job

The adventure at Netlog didn't work out entirely, so I'm looking for a new challenge! My ideal (slightly utopian) job would be: Conceptual engineering while still staying close to the technical side, most notably system engineering and development. Innovative: go where no one has gone before. Integrated in the open-source world. (Bonus points for companies where open source is key to their business model.) For a detailed overview of my interests and skills, I refer to: My LinkedIn profile: My Curriculum Vitae.

AIF: the brand new Arch Linux Installation Framework

Recently I started thinking about writing my own automatic installer that would set up my system exactly the way I want.
(See rethinking_the_backup_paradigm_a_higher-level…)

I looked at the official Arch install scripts to see if I could reuse parts of their code, but unfortunately the code was just one big chunk of bash with the main program and “flow control” (you must first do this step, then that), UI code (dialogs etc.) and backend logic (create filesystems, …) all tangled up and mixed very closely together.
Functionality-wise the installer works fine, but I guess the code behind it is the result of years of adding features and quick fixes without refactoring, which makes it impossible to reuse any of the code.

So I started to write AIF: the Arch Linux Installation Framework

Handling a remote rename/move with Git

I recently had to rename a repo on my GitHub account. GitHub has made this very easy, but it's just one side of the issue. Obviously you must also update any references to this remote in other clones, otherwise pushes, fetches etc. won't work anymore. You can do this in two ways:

  • open .git/config and modify the url for the remote manually
  • git remote rm origin && git remote add origin$user/$project.git

That's it!

Muse ... wow

Weird as it might sound, I'd never bothered to listen to Muse songs... until now. Some people have recommended the band to me, so I really had to stop ignoring them someday. And wow... what have I been missing all that time :/ Songs like Butterflies and Hurricanes and Citizen Erased are now among the most beautiful songs I've ever heard.

dautostart, a standalone freedesktop-compliant application starter

I couldn’t find a standalone application/script that implements freedesktop compliant (XDG based) autostarting of applications, so I decided to write my own.
The project is at .

Right now, all the basics seem to work (except “Autostart Of Applications After Mount” of the spec).
It’s probably not bug-free. I hacked it together in a few hours (but it works for me :-). Bug reports welcome!
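The core rules of XDG autostart can be sketched in a few lines of Python. This is not dautostart's code, just an illustration of the spec's main rules as I understand them: look in $XDG_CONFIG_HOME/autostart (default ~/.config/autostart) first and then in the autostart subdirectory of each $XDG_CONFIG_DIRS entry (default /etc/xdg); a file name found in a more important directory shadows the same name further down; Hidden=true disables an entry; OnlyShowIn/NotShowIn restrict it to certain desktops. TryExec, field codes such as %U in Exec, and actually launching the commands are deliberately left out, and all function names are hypothetical.

```python
import configparser
import os
from pathlib import Path


def autostart_dirs(env=os.environ):
    """Autostart directories, most important first, using the XDG Base Directory defaults."""
    config_home = env.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    config_dirs = env.get("XDG_CONFIG_DIRS") or "/etc/xdg"
    return [Path(config_home) / "autostart"] + [
        Path(d) / "autostart" for d in config_dirs.split(":") if d
    ]


def parse_entry(text):
    """Return the [Desktop Entry] group of a .desktop file as a dict."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keys such as OnlyShowIn are case-sensitive
    parser.read_string(text)
    if not parser.has_section("Desktop Entry"):
        return {}
    return dict(parser["Desktop Entry"])


def select_entries(listings, desktop=None):
    """Pick (file name, Exec) pairs to start.

    `listings` holds one {file name: entry dict} mapping per directory,
    most important directory first. `desktop` is the current desktop (e.g. "XFCE").
    """
    seen = set()
    to_start = []
    for listing in listings:
        for name in sorted(listing):
            if name in seen:
                continue  # shadowed by a file with the same name in a more important dir
            seen.add(name)  # even a Hidden entry shadows the ones below it
            entry = listing[name]
            if entry.get("Hidden", "false") == "true":
                continue
            only = [d for d in entry.get("OnlyShowIn", "").split(";") if d]
            never = [d for d in entry.get("NotShowIn", "").split(";") if d]
            if (only and desktop not in only) or desktop in never:
                continue
            if "Exec" in entry:
                to_start.append((name, entry["Exec"]))
    return to_start


def scan(dirs):
    """Read every *.desktop file from the directories that exist."""
    listings = []
    for directory in dirs:
        if directory.is_dir():
            listings.append({
                path.name: parse_entry(path.read_text(encoding="utf-8"))
                for path in directory.glob("*.desktop")
            })
    return listings
```

Putting it together, select_entries(scan(autostart_dirs()), desktop="XFCE") would list what an Xfce session should start. Keeping the filesystem scan separate from the selection logic makes the shadowing and Hidden rules easy to test with in-memory entries.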

I'm done with Gnome/Gconf

I’m managing my ~ in svn, but using GNOME & GConf makes this rather hard.
They mangle cache data together with user data and user preferences, and spread that mix over several directories in your home (.gconf, .gnome2 etc.).
The .gconf directory is the worst. This is where many applications store all their stuff: user preferences, but also various %gconf.xml files, which seem to be updated automatically every time ‘something’ happens: they keep track of timestamps for various events such as when you press numlock or become available on Pidgin.
I’m fine with the fact that they do that. I’m sure it enables them to provide some additional functionality. But they need to do it in clearly separated places (such as XDG’s $XDG_CACHE_HOME directory).

Requirements for the perfect GTD tool

I’ve been reading GTD (Getting Things Done) lately and it’s absolutely a great and inspiring book.
Having turned my home office space into a real zen place, I want to start implementing GTD in my digital life, but it seems very hard to find a good tool that fully implements GTD (even though there are a lot of tools out there).

The most interesting ones I’ve looked at so far (each for different reasons) are ThinkingRock, Tracks and yagtd (the latter requires the most work before it does everything I need, but its code base is also the easiest to dive into). I’m keeping my eyes open because there are certainly more things to discover.

Even though there are probably no applications out there that can do everything I want, I just wanted to share my feature-wishlist. These are the requirements I find that a really good tool should comply with:

Dump your AZERTY and QWERTY, because the only keyboard layout that makes sense is Dvorak!

For a while now I have been typing solely on the Dvorak keyboard layout. I roughly estimate it has been 4 or 5 months now (the first month was a pain in the ass because I had to relearn typing pretty much from scratch), but now my typing speed is starting to exceed what it used to be in QWERTY, and I still have plenty of headroom to improve.

For those who have no clue what I’m talking about: think for 30 seconds about which characters you type the most and which the least (e.g. which characters occur the most/least in the language you type?).

Ok you got them? Now look at your keyboard and spot where these characters are. Now consider where your fingers are most of the time (if you’ve never learned to type: the ‘base position’ for your fingers is on the middle row). Notice anything strange?
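The same thought experiment can be run on real text. A rough Python sketch that only looks at letters (ignoring punctuation keys, finger assignment and which row the other keys are on) and measures the share of letters sitting on the home row of each layout: a s d f g h j k l on QWERTY, a o e u i d h t n s on Dvorak.

```python
# Letters on the home row of each layout (punctuation keys left out).
QWERTY_HOME = set("asdfghjkl")
DVORAK_HOME = set("aoeuidhtns")


def home_row_share(text, home_row):
    """Fraction of the letters in `text` that can be typed without leaving the home row."""
    letters = [c for c in text.lower() if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c in home_row for c in letters) / len(letters)


sentence = "The rain in Spain stays mainly in the plain"
qwerty = home_row_share(sentence, QWERTY_HOME)
dvorak = home_row_share(sentence, DVORAK_HOME)
```

For that sentence, 12 of the 35 letters are on the QWERTY home row versus 27 of 35 on Dvorak's. One sentence proves little, so it is more convincing to feed it a larger sample of your own writing.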

Announcing the Netlog Developer Pages

At work, we've set up the Netlog Developer Pages. It is the place where you can (or will) find all information about our OpenSocial implementation, our own API, skin development, sample code and so on. We've also launched a group where you can communicate with fellow developers and Netlog employees. The page also features a blog where you can follow what is going on in the Netlog Tech team. PS: We've also updated

Windows sucks

I had to fix a problem at my dad's company... "The network was broken." It was a NetBEUI network connecting some Windows stations; it had been running for years, and now suddenly the nodes couldn't find each other. One of the boxes (Windows 2000 iirc) had 2 network cards: one for the network, the other not used for anything (not even connected). Disabling the latter, without even touching the former, fixed half of the network.

I survived LCL 31-3-2008

On 31-3-2008, LCL, one of the most used datacenters in Belgium (and the only one in Belgium with a 0% downtime record), had major power issues in its datacenter in Diegem, bringing lots of Belgian parties offline (more specifics on the net).

If you’re one of the sysadmins of a website with 35M members and 150M hits per day this means you’re in for an exciting night …

DDM : a Distributed Data Manager

UPDATE: this information is outdated. See for latest information.


If you have multiple sets of data (e.g. music, images, documents, movies, …) and you use these on more than one system (e.g. a laptop and a file server), then you probably also have some ‘rules’ on how you use these on your systems. For example, after capturing new images you might put them on your laptop first, but you like to sync them to your file server frequently. On the other hand, you also want all your high-res images (stored on the server) available for editing on the laptop, and to make it more complicated, you might have the same images in a smaller format on your server (for gallery programs etc.) and want these (or a select few albums of them) available on the road.

The more different types of data you have and the more specific your workflows, the harder it becomes to keep your data up to date and consistent across your boxes. You could manually rsync/(s)cp your data, but you end up with a mess (at least that’s how it turned out on my boxes). Putting everything under version control is great for text files and such, but it’s not an option for bigger (binary) files.

I wanted to keep all my stuff neatly organised in my home directories and to create good workflows with as little hassle as possible, so I decided to write DDM: the Distributed Data Manager.

Tweaking Lighttpd stat() performance with fcgi-stat-accel

If you serve lots of (small) files with Lighttpd, you might notice you’re not getting the throughput you would expect. Other factors (such as latencies caused by random read patterns) aside, a real show-stopper is the stat() system call, which is blocking (no parallelism). Some clever guys thought of a way to solve this: a FastCGI program that does the stat() on Lighty's behalf. In the meantime your Lighty thread can do other stuff, and when the FastCGI program returns, Lighty doesn’t have to wait for its own stat() because the stat information will already be in the Linux cache.
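The same pattern can be sketched outside Lighttpd: hand the blocking stat() to a helper so the main loop never waits on it, and by the time the request is actually handled the metadata is cached. A Python sketch under that assumption, using a thread pool in place of the FastCGI helper; the StatPrefetcher class is hypothetical and not part of fcgi-stat-accel or Lighttpd.

```python
import os
from concurrent.futures import ThreadPoolExecutor


class StatPrefetcher:
    """Run os.stat() in worker threads so the caller is never blocked by it."""

    def __init__(self, workers=8):
        self._pool = ThreadPoolExecutor(max_workers=workers)

    @staticmethod
    def _stat(path):
        try:
            return os.stat(path)
        except OSError:
            return None  # missing file or no permission: the caller can send a 404/403

    def warm(self, path):
        """Start a stat() in the background and return a Future immediately."""
        return self._pool.submit(self._stat, path)

    def close(self):
        self._pool.shutdown(wait=True)
```

An event loop would call warm(path) when a request arrives, keep serving other clients, and only handle that request once the future is done; the stat() it then does itself should be answered from the kernel's cache. The actual gain depends on the workload and the filesystem.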

Hacking into my router by brute-forcing http authentication

I forgot the username and password to access the web panel of my router.
Luckily I knew some possible usernames and some patterns that I could have used to construct my password, so I just had to try all the combinations… Too much work to do manually, but easily done with a script.

Here is the PHP script that I came up with (obviously stripped of my personal stuff). It got my account in less than a second :)
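A rough Python equivalent of the two building blocks such a script needs, assuming the router uses HTTP Basic authentication: enumerating every combination of candidate usernames and password fragments, and building the Authorization header for each guess. This is not the original PHP script; the function names and example fragments are hypothetical, and the sketch stops short of sending any requests.

```python
import base64
import itertools


def candidate_credentials(usernames, words, suffixes):
    """Yield every (username, password) pair, with passwords built as word + suffix."""
    for user, word, suffix in itertools.product(usernames, words, suffixes):
        yield user, word + suffix


def basic_auth_header(user, password):
    """HTTP Basic authentication header: base64 of "user:password"."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Hypothetical fragments; the real ones were personal.
guesses = list(candidate_credentials(["admin", "root"], ["router", "secret"], ["", "1", "123"]))
```

The number of guesses is the product of the list sizes (here 2 × 2 × 3 = 12), which is why a handful of known patterns is cheap to exhaust while a truly random password is not.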

Video of me drumming

I recorded a little drum video while practicing. It's just some improvisation... Find the mistakes! ;-) Recorded with two Shure SM57s, an M-Audio MobilePre USB and the iSight cam in my MacBook Pro. The mic setup had to be close enough to sound direct and focused without too much reverb, yet far enough to let all instruments come through. After some experimenting, I decided to place the mics behind the kit, next to me (one on each side), just below hip height.

Bye CakePHP, bye dAuth... Hello Drupal!

I’m afraid the time has come to say goodbye to CakePHP, and to the projects I’ve been working on for it.
I still like Cake… In fact, the further development of 1.2 goes, the more I like it (well, generally speaking, that is… there are some minor things I don’t like, but that’s not important now). The truth of the matter is I like to develop, I like the PHP language and I enjoy working with Cake.
But... all the sites I currently work on are community sites or blogs

Asymmetric keys instead of passwords for SSH authentication to increase security and convenience

I've been using OpenSSH for a while already, and although I've seen mentions of "public key authentication" and "RSA encryption" several times in its config files, I never bothered to figure out what it did exactly, and stuck to password authentication. But now the guys at work explained how it works, and after reading more about it, I'm totally hooked! It's a feature in SSH protocol version 2 (thus it's been around for a while already, e.g.
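The core idea behind public key authentication fits in a few lines: the server sends something unpredictable, the client signs it with its private key, and the server checks the signature with the public key it has on file (in OpenSSH, the user's authorized_keys), so no secret ever crosses the wire. A deliberately toy Python sketch with textbook RSA and tiny primes, purely to show that flow: it is trivially breakable, and real SSH signs data bound to the session identifier, with proper padding, rather than a bare random challenge.

```python
import hashlib
import secrets

# Textbook RSA with tiny primes p=61, q=53: for illustration only, NOT secure.
N = 61 * 53  # public modulus (3233)
E = 17       # public exponent: the server knows N and E
D = 2753     # private exponent: only the client knows it; E*D = 1 mod (60*52)


def digest(challenge: bytes) -> int:
    """Hash the challenge and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(challenge).digest(), "big") % N


def client_sign(challenge: bytes) -> int:
    """Client side: sign with the private exponent."""
    return pow(digest(challenge), D, N)


def server_verify(challenge: bytes, signature: int) -> bool:
    """Server side: check the signature using only the public key (N, E)."""
    return pow(signature, E, N) == digest(challenge)


challenge = secrets.token_bytes(16)  # fresh random data for every login attempt
signature = client_sign(challenge)
```

Because the challenge changes every time, a recorded signature is worthless for the next login, which is one of the reasons key-based login is safer than sending a password.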

The perfect GTK music player: can Exaile replace Amarok?

I’ve always liked Amarok: it does everything I always wanted, and more. It looks perfect in every way …
But... it uses the Qt library, and although there are tricks to make Qt applications fit in better with your GTK desktop/theme, it will never fit in perfectly, not only graphically but also because you still need to load the Qt libraries when you want to listen to some music, and it is built to interact with the KDE desktop environment.

So, I’ve been looking for an alternative, a GTK application strong enough to actually be able to replace Amarok, the king of all software music players.

Webpages should not contain "add to Digg / / Technorati /..." links

I don't like pages / articles / blog posts /.. accompanied by "Digg this", "add to" or "add to Technorati" links. Why not? Because this is meta level functionality. Not functionality of the blog/article/page in question, but on a higher level. And thus this should be handled on a higher level: the web browser. Just like we can create and manage bookmarks (I mean the old fashioned ones, not the delicious ones) in our browser: this is not the task of a web page.

dAuth is a secure authentication system for CakePHP. It uses techniques such as the challenge-response paradigm, customizable multiple-stage password hashing, brute force (hammering) detection, session hijacking prevention etc. Read all about it. You can download the files separately on the aforementioned page or get the tarball that somebody was kind enough to create. (Damn, I'm lazy today.) I don't maintain this any more!
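A minimal Python sketch of the general challenge-response idea with two hashing stages, assuming SHA-256 and a per-user salt. This is not dAuth's actual code or hashing scheme (dAuth was a CakePHP component), and all function names are hypothetical. The point is that the password itself never travels over the wire, and a captured response is useless for the next login because the challenge changes. Note that in this simple form the stored hash is itself password-equivalent, so it must be protected like a password.

```python
import hashlib
import hmac
import secrets


def sha256_hex(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def stored_hash(password, salt):
    """Stage 1: what the server keeps in its database instead of the password."""
    return sha256_hex(salt + password)


def new_challenge():
    """Server: a fresh random challenge for every login attempt."""
    return secrets.token_hex(16)


def client_response(password, salt, challenge):
    """Client: redo stage 1 locally, then mix in the one-time challenge (stage 2)."""
    return sha256_hex(challenge + stored_hash(password, salt))


def server_verify(stored, challenge, response):
    """Server: compute the expected response from the stored hash; compare in constant time."""
    expected = sha256_hex(challenge + stored)
    return hmac.compare_digest(expected, response)
```

In a web setting the client-side steps would run in the browser, with the server sending the salt and challenge along with the login form.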

PhpDeliciousClient, a php cli client to administer accounts

PhpDeliciousClient is a console based client for doing maintenance on accounts. I wrote it because - to my knowledge - there currently is no good program (including the personalized web page itself) that lets you make changes to your data in a powerful, productive manner. (with data I primarily mean tags. Posts and bundles are considered less important). You probably are familiar with the fact that a Delicious account (or any tag based meta data organizing system, for that matter) can soon become bloated: It gets filled with way too many tags.


Introduction

PhpDeliciousClient is a CLI program for administering your Delicious account. When you invoke it from the command line, you have some methods to administer your tags and your posts. It's written in PHP and uses Ed Eliot's PhpDelicious class (included in the download) to contact the API. PhpDeliciousClient is licensed under the GPL v2, while the PhpDelicious class is licensed under the BSD license.

Why?

The main reason I started developing it is that administering your account on the web interface can be cumbersome.
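Typical maintenance on a bloated tag collection boils down to a few set operations. A Python sketch on a local copy of the data (posts as a mapping from URL to a set of tags); it does not talk to the Delicious API, and the function names are hypothetical rather than PhpDeliciousClient's actual commands.

```python
from collections import Counter


def rare_tags(posts, max_uses=1):
    """Tags used on at most `max_uses` posts: candidates for merging or deleting."""
    counts = Counter(tag for tags in posts.values() for tag in tags)
    return sorted(tag for tag, n in counts.items() if n <= max_uses)


def merge_tags(posts, sources, target):
    """Replace every tag in `sources` by `target`.

    Returns only the posts whose tag set changes, i.e. the ones that would
    need an update call.
    """
    sources = set(sources)
    changed = {}
    for url, tags in posts.items():
        if tags & sources:
            changed[url] = (tags - sources) | {target}
    return changed


posts = {
    "http://example.org/a": {"php", "programming"},
    "http://example.org/b": {"php", "cake"},
    "http://example.org/c": {"PHP"},
}
```

Returning only the changed posts keeps the number of remote updates small, which matters when every change is a separate API call.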

Getting statistics about events that don't trigger page requests with Google Analytics

You probably already heard of Google Analytics. It’s a pretty nice program that (basically) gathers data about visits of your site and creates reports of it. It works by including some JavaScript code on your page, so that each page request triggers a call to the Analytics tracker sending along some data such as which page is requested and which resolution was used. (no personal or other privacy-sensitive data is sent). But here is the deal! I just discovered that you can also track events that don’t require page requests!
Think of links to files or to external locations, JavaScript events (Ajax anyone?) or even Flash events (but who is crazy enough to use Flash anyway?).

Bye, Google sandbox!

Today I'm finally out of Google's sandbox. Google has this system called the sandbox, into which new pages go for 6 months, to prevent scammers/spammers from constantly resurrecting dummy pages (and scoring well in Google). During these 6 months a page will score very badly in search results, even if it should rate very well for the specific keywords. Smart people will look at my first post, dated 03/04/2007, but keep in mind that before this blog existed I already had a dummy page with my name on it (the keywords I want to score on), put up as soon as I could, because back then I already knew I wanted to put a blog here and I wanted to get out of the sandbox as soon as possible.

Drag 'n drop tutorial with the CakePHP 1.2 Ajax helper, Prototype framework and Scriptaculous library


During the development of my thesis I wanted to create a drag ‘n drop interface. But I had never done anything like that, I had never used CakePHP’s Ajax helper, and I had never used the more advanced functionality of Scriptaculous/Prototype. Hell, I had never even touched Ajax before this!

Although there are some basic CakePHP/Ajax tutorials out there, I still had a hard time because some knowledge about Ajax (in CakePHP) was assumed in all of those. After a lot of googling I even found a tutorial called CakePHP: Sortable AJAX Drag & Drops - The Basics
“Perfect!” I thought, until, after staring at the article for a long while, I started to notice that “$ajax->drag”, “$ajax->drop” and “$ajax->dropRemote” are used nowhere in it (those are calls on the CakePHP Ajax helper that make objects draggable, or turn them into a dropbox that draggables can be dropped into). So the only more or less suitable tutorial about drag ‘n drop was actually about sorting and didn’t use the drag/drop function calls at all, even though it contains very useful information.

Long story short: I finally got it working (thanks to Krazylegz and kristofer and possibly others too; it has been a while, so I may forget someone ;-), and learned a lot in the process. I will share what I learned with you guys, so that hopefully it’s a bit easier for you than what I had to go through.

Open source and software patents from an ethical perspective

For school I had to write an ethics report. It had to contain 3 elements:

  • An introductory text / personal reflection on professional and business ethics
  • Ethical considerations regarding the thesis
  • A more extensive treatise with ethical considerations on a question of choice from the domain of science, technology, professional life and business. In my case, the question of choice became "open source and software patents".

As some of you know, I consider ethics a very important aspect of life, so I'm glad I was able to explore the issue of open source and software patents in more depth, because it's something I'm interested in.

I just became a "System & Network Architect"

I just signed my contract at Incrowd, the company behind sites such as redbox and facebox.

I will be working there in a team of young, enthusiastic people. Some of them are already familiar to me: my old friend Lieven (we played in a band together and kept in touch afterwards) and my ex-classmate Jurriaan. Both of them love their jobs btw :-).

My official title is “System & Network architect”.
The things I will be doing there are:

Debian changing route: the end of the perfect server linux distribution?

From the very little experience I have with Debian, and from the stuff I’ve been reading about it, I think I can safely say Debian has always been a special distribution: packages always take a very long time to get into the stable tree, because Debian wants to be a rock-solid system where packages go through a lot of testing (“We release it when it’s done”). The end result is a distro without the latest software and with less flexibility than, say, Gentoo or Arch (you’d often need to adapt your way of doing things to the “Debian way”, or be prepared to look for help in really obscure places and probably break things), but also a stable distro where everything works very decently. That, combined with no licensing fees (unlike, for example, Red Hat), makes it the perfect choice for a server in small companies, where money is more important than features such as professional support or official certifications.

However, it seems like Debian is taking a route that will make it lose its advantages over other distributions in the server market:

Figuring out CakePHP's new AuthComponent

In the Cake community, there has always been much interest in authentication/authorization systems. The issue of authentication has been addressed in several add-ons provided by the community, such as DAuth (written by me), OthAuth (written by Crazylegs) and many others.

However, one of the additions to the 1.2 branch, which is currently in active development, is a built-in auth module. The module isn’t finished yet, but it is certainly worth looking at. (In fact, I’m thinking about making a new dAuth version built on Cake’s own auth system.) As most bakers know, there is very little information about the 1.2 branch in general, and the auth component in particular. So what I will try to do is delve into the code, mess with it, and explain my findings in this post.

My favorite bash tricks

Hello everyone. This post is about bash, the shell that gives so many users easy access to the underlying power of their system (not bash the quote database, although I really like that website too ;-) ). Most people know the basics, but getting to know it better can really increase your productivity. And when that happens, you might start loving bash as much as I do ;-) I assume you have a basic knowledge of bash, the history mechanism, and ~/.bash* files.

Hello world!

Finally, my own website... I'd wanted to get this up for a long time already. My initial idea was writing (and styling) it all from scratch using the marvelous CakePHP framework, along with an authentication system I wrote, dAuth. However, due to my lack of time I decided to use the excellent Drupal platform, which I'm quite sure will get the job done equally well while drastically freeing up my time, so I can invest it in other projects :-) Dries Buytaert's talk at FOSDEM this year really helped me make that decision ;-) So, what will this site be about?

