arch bash cakephp conf dauth devops drupal foss git golang information age life linux lua mail monitoring music mysql n900 netlog openstack perf photos php productivity python thesis travel uzbl vimeo web2.0

IT-Telemetry Google group. Trying to foster more collaboration around operational insights.

The discipline of collecting infrastructure & application performance metrics, aggregating and storing them, visualizing them and alerting on them goes by many names: telemetry, insights engineering, operational visibility. I've seen a bunch of people present their work in advancing the state of the art in this domain:
from Anton Lebedevich's statistics for monitoring series, Toufic Boubez' talks on anomaly detection and Twitter's work on detecting mean shifts to projects such as flapjack (which aims to offload the alerting responsibility from your monitoring apps), the metrics 2.0 standardization effort or Etsy's Kale stack which tries to bring interesting changes in timeseries to your attention with minimal configuration.

Much of this work is being shared via conference talks and blog posts, especially around anomaly and fault detection, and I couldn't find a location for collaboration, quicker feedback and discussions on more abstract (algorithmic/mathematical) topics or those that cross project boundaries. So I created the IT-telemetry Google group. If I missed something existing, let me know. I can shut this down and point to whatever already exists. Either way I hope this kind of avenue proves useful to people working on these kinds of problems.

A real whisper-to-InfluxDB program.

The whisper-to-influxdb migration script I posted earlier is pretty bad: a shell script, without concurrency, and with an undiagnosed performance issue. I hinted that one could write a Go program using the unofficial whisper-go bindings and the influxdb Go client library. That's what I did now, it's at github.com/vimeo/whisper-to-influxdb. It uses configurable amounts of workers for both whisper fetches and InfluxDB commits, but it's still a bit naive in the sense that it commits to InfluxDB one series at a time, irrespective of how many records are in it. My series, and hence my commits, have at most 60k records, and presumably InfluxDB could handle a lot more per commit, so we might leverage better batching later. Either way, this way I can consistently commit about 100k series every 2.5 hours (or 10/s), where each series has a few thousand points on average, with peaks up to 60k points. I usually play with 1 to 30 InfluxDB workers. Even though I've hit a few InfluxDB issues, this tool has enabled me to fill in gaps after outages and to do a restore from whisper after a complete database wipe.
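To illustrate the worker setup, here's a minimal Go sketch of the two pools; this is not the tool's actual code, and readSeries/commitSeries are hypothetical stand-ins for the whisper-go fetch and the InfluxDB client commit:

package main

import "sync"

type series struct {
	name   string
	points [][2]float64 // [timestamp, value] pairs
}

func readSeries(name string) series { return series{name: name} } // stub for the whisper fetch
func commitSeries(s series)         {}                            // stub for the InfluxDB commit

func migrate(names []string, whisperWorkers, influxWorkers int) {
	toRead := make(chan string)
	toCommit := make(chan series)
	var readers, writers sync.WaitGroup
	for i := 0; i < whisperWorkers; i++ {
		readers.Add(1)
		go func() {
			defer readers.Done()
			for name := range toRead {
				toCommit <- readSeries(name) // fetch all points for one series
			}
		}()
	}
	for i := 0; i < influxWorkers; i++ {
		writers.Add(1)
		go func() {
			defer writers.Done()
			for s := range toCommit {
				commitSeries(s) // one commit per series, however many points it holds
			}
		}()
	}
	for _, n := range names {
		toRead <- n
	}
	close(toRead)
	readers.Wait()
	close(toCommit)
	writers.Wait()
}

func main() {
	migrate([]string{"stats.web1.cpu", "stats.web2.cpu"}, 10, 4) // made-up series names
}

Tuning the two worker counts independently is what lets you balance whisper read throughput against how hard you push InfluxDB.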

InfluxDB as a graphite backend, part 2



Updated Oct 1, 2014 with a new Disk space efficiency section which fixes some mistakes and adds more clarity.

The Graphite + InfluxDB series continues.

  • In part 1, "On Graphite, Whisper and InfluxDB" I described the problems of Graphite's whisper and ceres, why I don't consider the common graphite clustering advice the right path forward, what a great timeseries storage system would mean to me, why InfluxDB - despite being the youngest project - is my main interest right now, and introduced my approach for combining both and leveraging their respective strengths: InfluxDB as an ingestion and storage backend (and at some point, realtime processing and pub-sub) and graphite for its renowned data processing-on-retrieval functionality. Furthermore, I introduced some tooling: carbon-relay-ng to easily route streams of carbon data (metrics datapoints) to storage backends, allowing me to send production data to Carbon+whisper as well as InfluxDB in parallel, and graphite-api, the simpler Graphite API server, with graphite-influxdb to fetch data from InfluxDB.
  • Not Graphite related, but I wrote influx-cli which I introduced here. It lets you easily interface with InfluxDB and measure the duration of operations, which will become useful for this article.
  • In the Graphite & Influxdb intermezzo I shared a script to import whisper data into InfluxDB and noted some write performance issues I was seeing, but the better part of the article described the various improvements done to carbon-relay-ng, which is becoming an increasingly versatile and useful tool.
  • In part 2, which you are reading now, I'm going to describe recent progress, share more info about my setup, testing results, the state of affairs, and ideas for future work.

read more

Graphite & Influxdb intermezzo: migrating old data and a more powerful carbon relay

Influx-cli: a commandline interface to Influxdb.

Time for another side project: influx-cli, a commandline interface to influxdb.
Nothing groundbreaking, and it behaves pretty much as you would expect if you've ever used the mysql, pgsql, vsql, etc tools before.
But I did want to highlight a few interesting features.


read more

Darktable: a magnificent photo manager and editor

A post about the magnificent darktable photo manager/editor and why I'm abandoning Pixie.
read more

Beautiful Go patterns for concurrent access to shared resources and coordinating responses

It's a pretty common thing in backend Go programs to have multiple goroutines concurrently needing to modify a shared resource, and needing a response that tells them whether the operation succeeded and/or other auxiliary information. Something centralized manages the shared state, the changes to it and the responses.
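A minimal sketch of that centralized pattern (the names and the reservation example are made up for illustration): one goroutine owns the state, requests arrive over a channel, and each request carries its own reply channel:

package main

import "fmt"

// request asks the manager to reserve some amount of a shared resource;
// the manager answers over the reply channel whether it succeeded.
type request struct {
	amount int
	reply  chan bool
}

// manager is the only goroutine that ever touches the shared state.
func manager(capacity int, requests <-chan request) {
	used := 0
	for req := range requests {
		ok := used+req.amount <= capacity
		if ok {
			used += req.amount
		}
		req.reply <- ok
	}
}

func main() {
	requests := make(chan request)
	go manager(10, requests)

	// any number of goroutines can send requests concurrently
	reply := make(chan bool)
	requests <- request{amount: 4, reply: reply}
	fmt.Println("granted:", <-reply)
}

Because only the manager goroutine touches the state, there's no need for locks, and the reply channel gives each caller its own answer.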


read more

Monitorama PDX & my metrics 2.0 presentation

Earlier this month we had another iteration of the Monitorama conference, this time in Portland, Oregon.


(photo by obfuscurity)


read more

On Graphite, Whisper and InfluxDB

Graphite, and the storage Achilles heel

Graphite is a neat timeseries metrics storage system that comes with a powerful querying API, mainly thanks to its wide range of processing functions.
For medium to large setups, the storage aspect quickly becomes a pain point. Whisper, the default graphite storage backend, is a simple format that uses one file per metric (timeseries).
read more

Metrics 2.0 now has its own website!

Metrics 2.0 started as a half-formal proposal and an implementation via graph-explorer, but is broad enough in scope that it deserves its own website, its own spec, its own community. That's why I launched metrics20.org and a discussion group.
read more

Introduction talk to metrics 2.0 and Graph-Explorer

This week I had the opportunity to present metrics 2.0 and Graph-Explorer at the Full-stack engineering meetup.
read more

Pixie: simple photo management using directory layouts and tags.

So you have a few devices with pictures, and maybe some additional pictures your friends sent you. You have a lot of pictures of the same thing, probably at too high a resolution. Some may require some editing. How do you easily create photo albums out of this mess? And how do you do it in a way that keeps a simple and elegant, yet flexible file/directory layout for portability and simplicity?
read more

Vimeo holiday special & other great videos

We (the Vimeo Staff) just released the 2013 Vimeo Holiday Special, embedded below.
(and I have a line in it! "Who's behind this?")


read more

Metrics 2.0: a proposal

  • Graphite's metrics are strings composed of dot-separated nodes which, due to their ordering, can be represented as a tree. Many other places use a similar format (stats in /proc etc).
  • OpenTSDB's metrics are shorter, because they move some of the dimensions (server, etc) into key-value tags.
I think we can do better...
I think our metrics format is restrictive and we do ourselves a disservice using it:
read more

Graphite-ng: A next-gen graphite server in Go.

I've been a graphite contributor for a while (and still am). It's a great tool for timeseries metrics. Two weeks ago I started working on Graphite-ng: it's somewhere between an early clone/rewrite, a redesign, and an experiment playground, written in Golang. The focus of my work so far is the API web server, which is a functioning prototype; it answers requests like

/render/?target=sum(scale(stats.web2,5.12),derivative(stats.web2))

I.e. it lets you retrieve your timeseries, processed by function pipelines which are set up on the fly based on a spec in your http/rest arguments. Currently it only fetches metrics from text files, but I'm working on decent metrics storage as well.
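To give a feel for what such an on-the-fly pipeline can look like in Go, here's a purely illustrative sketch (not graphite-ng's actual code) where each processing function consumes one stream of values and produces another:

package main

import "fmt"

// A processor consumes one stream of values and produces another.
type processor func(in <-chan float64) <-chan float64

// scale multiplies every value by factor.
func scale(factor float64) processor {
	return func(in <-chan float64) <-chan float64 {
		out := make(chan float64)
		go func() {
			defer close(out)
			for v := range in {
				out <- v * factor
			}
		}()
		return out
	}
}

// derivative emits the difference between consecutive values.
func derivative() processor {
	return func(in <-chan float64) <-chan float64 {
		out := make(chan float64)
		go func() {
			defer close(out)
			prev, first := 0.0, true
			for v := range in {
				if !first {
					out <- v - prev
				}
				prev, first = v, false
			}
		}()
		return out
	}
}

func main() {
	src := make(chan float64)
	go func() {
		for _, v := range []float64{1, 2, 4, 8} {
			src <- v
		}
		close(src)
	}()
	// chain processors on the fly, e.g. derivative(scale(src, 5.12))
	for v := range derivative()(scale(5.12)(src)) {
		fmt.Println(v)
	}
}

The point is that a request spec can be parsed into a chain of such functions at request time, with each stage running concurrently and streaming points to the next.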


read more

A few common graphite problems and how they are already solved.

Hi Planet Devops and Infratalk

This blog just got added to planet devops and infra-talk, so for my new readers: you might know me as Dieterbe on irc, github or twitter. Since my move from Belgium to NYC (to do backend stuff at Vimeo) I've started writing more about devops-y topics (whereas I used to write more about general hacking and arch linux release engineering and (automated) installations). I'll mention some earlier posts you might be interested in:
read more

Profiling and behavior testing of processes and daemons, and Devopsdays NYC

Profiling a process run

I wanted the ability to run a given process and get a plot of key metrics (cpu usage, memory usage, disk i/o) throughout the duration of the process run. Something light-weight with minimal dependencies so I can easily install it on a server for a one-time need. I couldn't find a tool for it, so I wrote profile-process, which does exactly that in <100 lines of python.

Black-box behavior testing of processes/daemons

I wrote simple-black-box to do this. It runs the subject(s) in a crafted sandbox, sends input (http requests, commands, ...) and lets you make assertions on http/statsd requests/responses, network listening state, processes running, log entries, file existence/checksums in the VFS/swift clusters, etc. Each test case is a scenario. It can also use logstash to give a centralized "distributed stack trace" when you need to debug a failure involving multiple processes interacting and acting upon received messages, or to compare behavior across different scenario runs. You can integrate this with profile-process to compare runtime behaviors across testcases/scenarios.
read more

Graph-Explorer: A graphite dashboard unlike any other

The above sounds like a marketing phrase, and I'm just as skeptical of those as you are, but I feel it's warranted here. Not because GE is necessarily better, but because it's certainly different.
read more

Client-side rendered graphite charts for all

Client-side rendering of charts, as opposed to using graphite's server-side generated PNGs, allows various interactivity features, such as:
read more

Anthracite, an event database to enrich monitoring dashboards and to allow visual and numerical analysis of events that have a business impact

Introduction

Graphite can show events such as code deploys and puppet changes as vertical markers on your graph. With the advent of new graphite dashboards and interfaces where we can have popups and annotations to show metadata for each event (by means of client-side rendering), it's time we had a database to track all events along with categorisation and text descriptions (which can include rich text and hyperlinks). Graphite is meant for time series (metrics over time); Anthracite aims to be the companion for annotated events.
More precisely, Anthracite aims to be a database of "relevant events" (see further down), for the purpose of enriching monitoring dashboards, as well as allowing visual and numerical analysis of events that have a business impact (for the latter, see "Thoughts on incident nomenclature, severity levels and incident analysis" below).
It has a TCP receiver, a database (sqlite3), an http interface to deliver event data in many formats, and a simple web frontend for humans.


read more

Histograms in statsd, and graphing them over time with graphite.

I submitted a pull request to statsd which adds histogram support. Example histogram, from Wikipedia
(refresher: a histogram is [a visualization of] a frequency distribution of data, summarizing your data by keeping frequencies for entire classes (ranges of data). histograms - Wikipedia)
It's commonly documented how to plot a single histogram, that is, a 2D diagram consisting of rectangles whose

  • area is proportional to the frequency of a variable
  • width is equal to the class interval
Class intervals go on the x-axis, frequencies on the y-axis.

Note: histogram class intervals are supposed to have the same width.
My implementation allows arbitrary class intervals with potentially different widths, as well as an infinite upper boundary.
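To make that concrete, here is an illustrative Go snippet of the bucketing idea (statsd itself is javascript, and the bounds and values below are made-up examples): class intervals are defined by their upper bounds, with infinity as the last one.

package main

import (
	"fmt"
	"math"
)

func main() {
	// upper bounds of the class intervals; widths may differ,
	// and the last interval is unbounded
	bounds := []float64{10, 50, 100, math.Inf(1)}
	counts := make([]int, len(bounds))

	for _, v := range []float64{3, 7, 12, 60, 250, 9001} {
		for i, b := range bounds {
			if v <= b {
				counts[i]++
				break
			}
		}
	}
	fmt.Println(counts) // frequency per class interval: [2 1 1 2]
}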

Plotting histograms.. over time


read more

Moving to New York City

I have a one-way ticket to NYC on Sept. 21st. Vimeo HQ is in Manhattan, and practically the whole team works in the building so it makes sense for me to relocate and join them locally. I'm looking forward to working with the colleagues face to face, but mainly I'm looking forward to the experience of living in such a different place, and exploring the US. In fact, I already have some small trips planned (Hamptons NY, camping in Pennsylvania, skiing in New York this winter) with some friends I met last year in NY. We're also looking into possibly going to California or Mexico for the holidays.
I've been living in Ghent for about a year now and absolutely loved it. I'll definitely miss Ghent, and I think NYC might be a bit too hectic for my taste, but I just have to try this and see :) There's so much to learn and explore. I especially have high expectations of the music scene, which I'd like to get involved in. (btw my ex-band is recording an EP and looking for a new drummer)

As for things in Belgium: selling furniture, moving out of my apartment next weekend, having a goodbye drink Friday Sept. 14th, flying a week later and returning in the summer of 2013 for the Gentse Feesten. Maybe I'll visit sooner, depending on how homesick I get :)

Resigning as Arch Linux developer

A few days ago, I resigned as Arch Linux developer.
I'm sad to go, but I felt like my work on Arch became a drag, so it was time I made my decreased interest official. The Releng team we started more than 3 years ago is now dead, but other developers are showing interest in iso building and installer scripts, so as long as they don't burn out, you'll see new isos again. More information in my resignation mail linked above.

Dieter

Dell crowbar openstack swift

Learned about Dell Crowbar the other day. It seems to be (becoming) a tool I've wanted for quite a while, because it takes automating physical infrastructure to a new level, and it's also convenient for virtual infrastructure.
read more

Joining Vimeo

Working on scalable information retrieval systems at the university of Ghent has been very fun: interesting and challenging work, smart team, and an environment that fosters growth and innovation. I could definitely see myself continuing there...
However, Vimeo got in touch and told me about their plans... specifically what's going into the new version and what other stuff they have on their roadmap. I can honestly say Vimeo is the most beautiful web property I've ever seen [*]. Not just that: they also provide a top product/service, and host a great community of passionate people who create some of the most beautiful online videos I've ever seen. (examples: Vietnam travel report, Sabian cymbals taking advertising to a whole new level in this video with my musician hero Mike Portnoy, a city time lapse video)
From what I can tell, they also do product management well: they know what their territory is, and how to cultivate it through stellar community management. They are not a general purpose video site and hence do not compete directly with YouTube or Facebook.

And now, I have the opportunity to be a part of that. After much pondering I decided to go for it. Resigning at the university was hard but smooth; I felt I had to take this chance and they were very supportive.
I'll be working on the infrastructure/backend side of things; I'm actually working on transcoding infrastructure right now. I'll be working from my place in Ghent; a move to NYC at some point in the future might happen, but we'll see...

[*] When they told me the new version would be more appealing than the old, I couldn't believe that was possible. But to my own surprise, they succeeded.

Lighttpd socket Arch Linux /var/run tmpfs tmpfiles.d

On Arch Linux, and probably many other distros, /run is a new tmpfs, and /var/run symlinks to it. With Lighttpd you might have a fastcgi socket defined something like "/var/run/lighttpd/sockets/mywebsite.sock". This won't work anymore, as after each reboot /var/run is an empty directory and lighttpd won't start; /var/log/lighttpd/error.log will tell you:
2012-03-16 09:21:34: (log.c.166) server started 
2012-03-16 09:21:34: (mod_fastcgi.c.977) bind failed for: unix:/var/run/lighttpd/sockets/mywebsite.sock-0 No such file or directory 
2012-03-16 09:21:34: (mod_fastcgi.c.1397) [ERROR]: spawning fcgi failed. 
2012-03-16 09:21:34: (server.c.945) Configuration of plugins failed. Going down.
That's where the new tmpfiles.d mechanism comes in. It creates files and directories as described in its config files, and gets invoked on boot. Like so:
$ cat /etc/tmpfiles.d/lighttpd.conf 
d /run/lighttpd/sockets 0700 http http
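# fields: type (d = create directory), path, mode, owner, group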

Thailand, Berlin Velocity EU, NYC, Ghent and more metal

I've been meaning to write about a lot of stuff in separate posts, but they kept getting delayed, so I'll just briefly share everything in one post.
read more

Luamail: a mail client built into luakit

Similar to how back in 2009 there was no browser that worked in a way I find sane (which I started solving with uzbl), I'm now fed up with the lack of an email client that works in a way I find sane. Uzbl turned out to be a bit cumbersome for my taste, so I switched to the uzbl-inspired but more pragmatic luakit browser, which is much in the same vein, except that all configuration, extensions, event handling, programmatic input etc are done by interfacing with Lua APIs. Now I want to build the "luakit of email clients". Let me explain what that's all about...

Basically the story is pretty much the same as it was with uzbl. There are no mail clients which offer a high level of customization and interfacing possibilities. There are some mail clients aimed at "power users" and some "lightweight mail clients" like mutt/alpine/nmh etc, but those are also restricted in extensibility and often crippled in terms of features. Currently I'm using Claws-mail, which is the least sucky client I found, but it's also nowhere near what I want.
read more

Hitchhiking.. try it.

For the last few months, I've started to actively use hitchhiking as a means to travel between home and work. What started as a "I'm not sure about this, it seems a bit awkward, but I do want to know how it goes and feels, so I'll try it out once" ended up being "this is great, I'm doing it every day and loving it". Here's why you should try it and why it may make your life more awesome.
read more

Poor man's pickle implementations benchmark

Where are the new Arch Linux release images?

My metal band

Since the audience of this blog is largely technical, I don't post much about other topics, but I feel it's time for a short summary about one of my "real life projects".
In the spring of 2009 I joined a progressive death metal band. I've been drumming since I was 17, but during the last 2 years I've been practising and rehearsing like never before.[1]
When you hear yourself on tape for the first time, it's a bit of disillusionment as you suddenly hear every imperfection, many of which you didn't realise you had (or didn't think were very noticeable).
So 2 years of practicing, rehearsing, test recordings, real recordings, and mixing sessions (where you really grow a good ear for imperfections) later, we are now getting to the point where we can nail our stuff and are looking forward to our first gig, which will be June 3rd in jh sjatoo in Kalken. We've written about 7 songs, of which at this point we play 5. I wish we had proper recordings of all of them, but "Total Annihilation" captures several aspects of our style:
In early 2010 I treated myself (found a nice 2nd hand deal) to a new PDP birch kit with Zildjian Z Custom cymbals (that was actually at the time I was in the interview process for a Palo Alto position at Facebook, so I might have needed to sell it again soon after, but that didn't happen).
Here are some pics: 1, 2, 3, 4. More info about the band:

[1] Hence the need I had to find a new maintainer for Uzbl.

Thank you Google!

Google, a lot of bad words have been thrown at you lately. "Evil", "big brother", "dangerous", ....
But I just wanted to say: thank you.
You provide us some nice services. Google search, Gmail, analytics, google maps, ...
All of these products are/were game changers and have made the lives of people all over the world easier.
Many people take them for granted and don't realise what it takes to design, engineer and operate these applications.
That you provide them for free makes it even more amazing. And as far as advertisements are concerned; many business models rely on them and I don't see that changing any time soon.
I'll take a personalized ad over a generic ad any day. The more you can optimize targeted ads on my screen, the more useful I'll find them.
Just don't overdo them, but you know that already.

Dieter

Dvcs-autosync: An open source dropbox clone... well.. almost

I found the Dvcs-autosync project on the vcs-home mailing list, which btw is a great list for folks who are doing stuff like maintaining their home directory in a vcs.
In short: Use cases:
read more

Let's make the world a better place. Let's stop the abuse of SI prefixes

This has been on my mind for a while. But now I actually took some time to launch a little project to do something about it.
1024 just ain't 1000. stop abusing SI prefixes!

Why rewriting git history? And why should commits be in imperative present tense?

There are tons of articles describing how you can rewrite history with git, but they do not answer "why should I do it?". A similar question is "what are the tradeoffs / how do I apply this in my distributed workflow?".
Also, git developers strongly encourage/command you to write commit messages in the imperative present tense, but do not say why. So, why?
I'll try to answer these to the best of my abilities, largely based on how I see things. I won't get too detailed (there are enough manuals and tutorials for the exact concepts and commands).
read more

Can we build a simple, cross-distribution installation framework?

Today at Fosdem 2011 I did my talk Can we build a simple, cross-distribution installation framework? Basically, using the Arch Installation Framework as a starting point, along with the notion that most of the code is actually not Arch-specific, I addressed other distros to check if there was any interest in sharing the workload on the distribution-agnostic aspects of the framework. If other distros with a similar philosophy of little-abstractions/KISS were to join, we would all reap the benefits of a simple, yet quite featureful installer. There was some interest, so we'll see what happens.

Dir 2011, Fosdem 2011

On February 4, I'll be in Amsterdam at DIR 2011, the 11th Dutch-Belgian Information Retrieval Workshop. After that, I'm going to the devopsdinner and the Fosdem beer event in Brussels. On February 5/6, of course, Fosdem itself. Looking forward to the systemd talk. On Sunday I'll do a talk about simple shell based Gnu/Linux installers; as mentioned earlier, I hope devs from other "lightweight"/KISS-style distros will be present (Gentoo and other *too's, Crux, *ppix, ... You know who you are). It would be interesting to share some common codebase for distribution independent topics (like filesystems), or at least discuss how feasible it would be.

Building a search engine

I started working at IBCN, the research group of the university of Ghent. I was looking to get back to the challenging world of high-performance and large-scale (web) applications, but I also wanted something more conceptual and researchy, rather than the highly hands-on dev and ops work I've been doing for a few years now.
The Bom-vl project is pretty broad: it aims to make the Flemish cultural heritage media more usable by properly digitizing, archiving and making public the (currently mostly analog) archives from providers such as TV stations.

Currently, I believe there's some >100TB of media in our cluster (mostly from VRT, afaik), along with associated textual descriptions/metadata, with more to follow. The application is currently for a selected audience but the goal is to make it public in the near future. I'm part of the search engine team, we aim to provide users with the most relevant hits for their queries, by using existing technology (think Lucene, hadoop, etc) or devising our own where needed. As I'm charged with a similarity search problem ("other videos which might also interest you"), I'm studying information retrieval topics such as index and algorithm design and various vector models. Starting next week, I'll probably start implementing and testing some approaches.

Blog moved

This blog now runs on Pyblosxom on Lighttpd on my new Linode machine.
The moving/conversion process wasn't as smooth as I thought it would be as I needed to work quite a bit on pyblosxom and implement some new plugins to get certain features working (like syntax highlighting).
Also, my previous hosting provider removed my account before the contract expired.
But luckily I managed to restore everything and all should work pretty much as before. In particular:
RSS feeds should still work on the old URLs and the same GUIDs are used to avoid spamming anyone; syntax highlighting of all old entries and comments works, but posting highlighted code in new comments doesn't yet.

In related news: pyblosxom has improved quite a bit and is nearing a 1.5 release.
The drupal-to-pyblosxom tool is now quite polished and comes with a bunch of "check whether everything is ok" scripts.

If you notice anything funny, let me know.

Libui-sh: a library providing UI functions for shell scripts


When you write bash/shell scripts, do you write your own error/debug/logging/abort functions? Logic that requests the user to input a boolean, string, password, selection out of a list, date/time, integer, ...?

Libui-sh is written to take care of all that. libui-sh is meant to be a general-purpose UI abstraction library for shell scripts: low impact, easy to use, but still flexible. It is CLI by default, and can optionally use ncurses dialogs as well.

read more

Migrating blogs from Drupal to Pyblosxom

pyblosxom is a pretty cool blogging platform written in python.
Like many of the modern minimal blog engines it works with plaintext files only (no database), has a relatively small codebase, supports many plugins (like markdown support), is written in a proper scripting language, has a simple and clean file structure, is seo-friendly, and so on.
The one feature that sets it apart from other minimal blog engines is that it supports comments, and doesn't just rely on an external service like disqus, but stores comments as plaintext files as well.
Some features seem a bit overengineered (like multiple possible locations to store themes (known as "flavours") and templates; I'm a fan of convention over configuration and keeping things simple), but discussing this with the maintainer revealed this is because pyblosxom is meant as a reimplementation of the original perl-based blosxom project. Over time features could be simplified and/or redesigned.
So I plan to migrate this blog from drupal to pyblosxom.
To do this, I'm building the tool drupal-to-pyblosxom.
The goal is to convert posts, associated metadata (publish time, tags) and comments from the drupal database to pyblosxom files. Source code display should be converted too (merely a matter of converting between different plugin conventions), and images shown should be downloaded. Currently I'm about halfway; if there's anyone out there with a similar use case, help is welcome ;)
read more

Checking if a git clone has any unique content, git/svn scripts

When cleaning up a system and going over git repositories I often wonder if a git repo contains any interesting, but unpushed work. (i.e. "unique" content)
I heard bzr (or was it hg...) can do it out-of-the-box, but I couldn't find any existing solution for git.
So I wrote a script to do this. It checks a repo for unique commits, tags, branches, dirty files/index, added files, or stashed states, in comparison to a specific remote or all of them, and uses an appropriate exit code.
git-remote-in-sync.sh
The script is part of a bigger git-scripts repo (most of the scripts written by random people). Although the original repo creator hasn't gotten back to me, this seems like a good starting point to bring some sense of order to the sprawl of git scripts.

Here are some other scripts I find pretty useful:
read more

Filesystem code in AIF

In light of the work and discussions around supporting Nilfs2 and Btrfs on Arch Linux and its installer AIF, I've shared some AIF filesystem code design insights and experiences on the arch-releng mailing list. This is some hard-to-understand code, partly because it's in bash (and I've needed to work around some limitations in bash), partly because there is some complex logic going on.

I think it's very useful material for those who are interested (it can also help in understanding the user aspect), so I wanted to share an improved version here.
On a related topic: I proposed to do a session at Fosdem 2011 / the "distro miniconf" about simple (console-based) installers for Linux, and how multiple distributions could share the effort of maintaining installation tools, because there are a lot of cross-distribution concerns which are not trivial to get right (mostly filesystems, but also things like clock adjustments, bootloaders, etc). Several distros already use the Arch installer (or a fork of it), for example Pentoo, but I think cooperation could be much better and more efficient.

Anyway:

read more

Handing off uzbl to a new project leader

As of yesterday, Brendan 'bct' Taylor is the new Uzbl project leader / maintainer.
Yesterday I did the newspost on uzbl.org which explains the reasoning. I can add that it feels pretty weird "giving away" and "leaving behind" a project you spent so much time on, which grew a large (well, for a FOSS side project with a hacker audience) base of users and contributors, and which served as inspiration for various other projects.
read more

Rsyncbench, an rsync benchmarking tool

Background info:
I'm currently in the process of evaluating (V)PS hosting providers and backup solutions. The idea being: I want a (V)PS to run my stuff, which doesn't need much disk space, but in the meantime it might be a good idea to look for online backup solutions (oops, did I say "online"? I meant "cloud"), either on the (V)PS itself or maybe as a separate solution. But I've got a diverse set of data (my personal data is mostly a lot of small plaintext files; my mom has a windows VM for which I considered syncing the entire vdi file).
At this point the biggest contenders are Linode (which offers quite some flexibility and management tools, but becomes expensive when you want extra disk space: $2/GB/month), Rackspace backup (which gives you 10GB for $5/month, but they have nice backup tools so I could back up only the important files from within the windows VM (~200MB)), and then there's Hetzner, which offers powerful physical private servers with a lot of storage (160GB) for 29eur/month, but less flexibility (i.e. kvm-over-ip costs an extra 15eur/month).

Another issue: given the limited capacity of Belgian internet connections, I needed to figure out how much bandwidth rsync really needs, so I can calculate whether the duration of a backup run including syncing the full vdi file is still reasonable.

I couldn't find an rsync benchmarking tool, so I wrote my own.

Features:

  • simple
  • non invasive: you specify the target and destination hosts (just localhost is fine too), and file locations
  • measures time spent, bytes sent (measured with tcpdump), and data sent (rsync's statistics which takes compression into account)
  • supports plugins
  • generates png graphs using Gnuplot
  • two current plugins: one uses files of various sizes, both randomly generated (/dev/urandom) and easily compressible (/dev/zero), and covers use cases like an initial sync, a second sync (no-op), and syncing with a data block appended and prepended. The other plugin collects vdi files from rsnapshot directories and measures the rsyncing from each image to the next


read more

An rss2email fork that sucks less

Rss2email is a great tool. I like getting all my news messages in my mailbox and using smtp to make the "news delivery" process more robust makes sense.
However, there are some things I didn't like about it so I made a github repo where I maintain an alternative version which (imho) contains several useful improvements, both for end users and for developers/downstreams.
Also, this was a nice opportunity for me to improve my python skills :)

Here is how it compares:

read more

What the open source community can learn from Devops

Being active as both a developer and an ops person in my professional life, and as both an open source developer and a packager in my spare time, I noticed some common ground between both worlds, and I think the open source community can learn from the Devops movement which is solving problems in the professional tech world.

For the sake of getting a point across, I'll simplify some things.

First, a crash course on Devops...


read more

the "Community Contributions" section on the Arch Linux forums is a goldmine

The Community contributions subforum of the Arch Linux forums is awesome.
It is the birthplace of many applications, most of them not Arch Linux specific.
File managers, media players, browsers, window managers, text editors, todo managers, and so on. Many shell scripts, urxvt extensions and dwm patches as well.
Most of the apps are designed after suckless/KISS principles, but there are also some GUI programs.

If you like to discover new apps and tools, check it out.

Review of "Python 3 Object Oriented Programming"

Dusty Phillips, Arch Linux shwag guy, Archcon co-organizer, (python) consultant and since recently buddy of mine wrote his first book: Python 3 Object Oriented Programming.

I got the opportunity to get a free pdf copy in exchange for a review on my blog, so here we go.
Mind you, my Python knowledge is very limited. I have done some python programming, and I once read (most of)
read more

Back from Canada, Archcon

I'm back from Canada/Archcon, and it was great. I've been in Toronto for 11 days, and visited Montreal for 3 days.

Archcon

Archcon was small (20-ish people) - that's what you get for doing it in Canada ;) - but very nice.
Interesting talks, informal, good vibe, decent logistics and catering.
This year it happened because Dusty and Ricardo actually just wanted to have a conference without worrying too much about the attendance; next year we should do it again because Arch (conferences) rock(s), and because we need more visitors. More central locations such as Seattle and Europe have been suggested.
Either way, next year both Judd (founder) and Aaron (current overlord) should be there. (This year they both had lame excuses like family reunions and "almost getting married". Congrats btw, Aaron!)

It was an absolute pleasure to meet some more of my fellow devs, and users.
Here is a pic from the group (unfortunately, a few are missing)
read more

Off to Toronto July 14-28, Archcon

As mentioned earlier, I'll be at Archcon in Toronto in a few weeks.
It's a very small conference, and the first of its kind. At the last FrOSCon we had been playing with the idea of holding an informal Arch conference in Europe, but those were just ideas. Dusty and Ricardo beat us to it with an actual implementation.
This is great, and one of the milestones in Arch Linux history, which is why I want to be there and help make it better.
read more

Restoring ssh connections on resume

I use pm-utils for hibernation support.
It has a hooks system which can execute stuff upon hibernate/suspend/thaw/resume/..., but the hooks run as root.
If you want to run stuff as a regular user you could do something like

su $user -c <command>

..but these commands have no access to your user environment.
In my user environment I have a variable which I need access to, namely SSH_AUTH_SOCK, which points to my agent which has some unlocked ssh keys. Obviously you don't want to re-enter your ssh key passwords every time you resume.
(In fact, I started using hibernate/resume because I got tired of having to enter 4 passwords on boot (1 for dm_crypt, 1 for login, 2 for ssh keys), not because it is much faster.)

The solution is very simple. Use this:

sudo pm-hibernate && do-my-stuff.sh

This way, do-my-stuff.sh will be executed when you resume, after the complete environment has been restored.
Ideal to kill old ssh processes, and setup tunnels and ssh connections again.
I'm probably gonna integrate this into my microDE

Uzbl, monitoring, AIF talks

I recently did two talks, for which the videos are now online.

If all goes well, I'll be at ArchCon this summer, where I'll be doing these talks:

We're not sure yet if those talks will get videotaped.

read more

facebook usrbincrash php implementation

Implementation for Facebook usr bin crash puzzle. (how/why)

I haven't touched the code for a few months, but better to put it online than to let it rot.
http://github.com/Dieterbe/facebookpuzzles/

2 branches:

  • master: basically what I submitted to FB, and what just works
  • withpruning: an attempt at further optimisation (it only improves the runtime in some cases), but I didn't finish that version and there's a bug in it somewhere

In the repo you'll also find various test input files supplied by the community on the forums, and a script to benchmark the implementation on all input files.

Not working for Facebook

In November last year, I was contacted by Facebook HR. They found my background interesting and thought I might be a good fit for an "application operations engineer" position in Palo Alto, California (it is basically the link between their infrastructure engineering and operations/support teams).
read more

Fosdem 2010

I'll be at fosdem - 10th edition - again this year.
I'm going to FOSDEM, the Free and Open Source Software Developers' European Meeting

I'll be presenting a lightning talk about uzbl.
Also, Arch Linux guys Roman, JGC, Thomas and I will hang out at the distro miniconf. We might join the infrastructure round-table panel, but there is no concrete information yet.

More stuff I'm looking forward to:
read more

Arch Linux interview and Uzbl article

Apologies for only informing you about the second article now. I assumed most of you follow LWN (you probably should) or found the article anyway.
Of all the articles written about uzbl, none came close to the quality of Koen's work. So even though it's a bit dated, it's still worth a read.

RRDtool: updating RRA settings and keeping your collected data

When you use rrdtool, it can happen that you first create your databases, then collect a whole bunch of data and decide later you want more accuracy/longer periods.
Especially when using
read more

ext3 logical partition resizing

You probably know you can resize primary partitions by deleting them and recreating them, keeping the starting block the same but using a higher block as the ending point. You can then grow the filesystem.
But what about logical partitions? A while back I had to resize an ext3 logical partition which ended at the end of the last logical partition. I learned some useful stuff, but I only made some quick scratch notes and I don't remember all the details, so:
Do not expect a nice tutorial here; it's more of a commented dump of my scratch notes and some vague memories.
The information in this post is not 100% accurate.

I wondered if I could just drop and recreate the extended partition (and if needed, recreate all contained logical partitions, the last one being bigger of course), but I could not find any information about that anywhere.

read more

About the maemo summit 2009 and the nokia n900

So I'm back from the 3-day maemo summit in Amsterdam. It was very nice. Very well organized, and Nokia definitely invested enough in catering, fancy-suited people and such to please all 400 of us. I met several interesting people, both from the community, as well as Nokia guys.
The talks were diverse, but interesting (duh?). I will especially remember the kickoff with its fancy visual effects and loud music that set the mood straight for the entire weekend.
The best moment was, of course, when it was announced that every summit participant would receive an n900. Uncontrolled happiness all around.

read more

nokia n900 & maemo summit 2009

I have been looking for the "perfect mobile companion device" for a while already. Basically I want a "pocket PC that can do as much as possible, over which I have as much control as possible so I can do things my way, but that still fits in a pocket and can do gsm and such".
So, something like a netbook, but really portable, and that can also do telephony stuff.
Nokia's recently announced n900 seems to be very close to what I'm looking for.
It could have been a tad bigger (to make typing easier), but other than that it looks perfect: powerful, high resolution display, Linux with a "usual" userspace (unlike Android) to give me all the freedom I'm looking for, a keyboard, plenty of space and many goodies such as wifi, a-gps, fm receiver/transmitter, IR, bluetooth, a digital camera, tv-out and so on.

This device has ignited my interest in Maemo and all things related, so I'll be in Amsterdam on October 9-10-11, at the Maemo summit 2009. I was lucky enough to score a place in the ibis hotel Amsterdam, as Nokia has reserved more rooms than they could fill with invited speakers and their own personnel ;-)
I'm hoping it will be possible to buy a device at the conference. The timing would be perfect. Nokia seems to be a really cool company and so far, they haven't disappointed...

Somewhat related: a Mer (maemo alternative) developer told me he was very interested in making uzbl available on Mer, so he did just that. I'm curious myself how usable uzbl will be on the n900. Only one way to find out :)

Opening files automatically on mainstream Linux desktops

Xfce/Gnu/Linux works amazingly well on my mom's workstation, with one exception: opening files automatically with the correct program.

The two biggest culprits are:

  • Gtk's "open file with" dialog: if any Gtk program doesn't know how to open a file it brings up this dialog that is horrible to use. You can search through your entire VFS for the right executable. No thumbnails, no usage of .desktop files, $PATH, autocompletion and not even limiting the scope to directories such as /usr/bin
  • Mozilla software such as Firefox and Thunderbird: they only seem to differentiate files by their mimetype, not by extension. There are add-ons to make it easier to edit these preferences, but eventually you're in a dead end because you get files with correct extensions but unuseful mimetimes (application/octet-stream)

Luckily the fd.o guys have come up with .desktop files.
read more

Snip: a dead-simple but quite powerful text expander and more

Inspired by Snippy and snippits, I wrote a simple tool called snip.
It automatically fills in text for you (which can be dynamically generated) and/or performs custom keypresses and operations.

read more

Wishlist

I'm starting to keep track of some things I want. I've picked Amazon because they have many items in their database.
wishlist

Froscon 2009 afterthoughts

A script that pulls photos from facebook

Fbcmd is pretty cool.
I quickly hacked this script together which pulls all photo albums from friends on facebook, so I have them available where I want. (It should also pull your own albums, but I don't have any so I can't check that)
read more

Arch Linux 2009.08 & Froscon 2009

So, the Arch Linux 2009.08 release is now behind us, nicely on schedule.
I hope people will like AIF, because it was a lot of work and we didn't receive much feedback. I personally like using it to apply my fancy backup restoration approach.
But I'm sure that if more people looked at the code, we would find quite some design and implementation things that could be improved. (With uzbl I was amazed how much difference it can make when many people all have ideas and opinions about every little detail.)

Later this week I'm off to the Counting Cows festival in France, and the week after that (august 22-23) I'm going to FrOSCon in Germany where I will meet some of my Arch Linux colleagues in real life, which I'm really looking forward to.

If anyone wants a ride to froscon, let me know. But note I'll try to maximize my time there (leave Saturday early and come back late on Sunday; I even took a day off on Monday, so I might stay a day longer if I find more interested people to hang out with there).

AIF automatic lvm/dm_crypt installations and test suite

We're working hard on a new Arch release. (should be done by froscon)

Amongst the slew of fixes and improvements there are also some cool new things I'm working on.
First of all, I worked more on the automatic installations. Now you can easily install an LVM based Arch system on top of dm_crypt for example.
You type this command:

aif -p automatic -c /usr/share/aif/examples/fancy-install-on-sda

And bam, you have a complete working system with LVM, dm_crypt etc all set up. You just need to change your keymap, hostname, network config and such (or configure that beforehand in the config file for AIF).

Another thing I started working on is a very simple test suite.
Basically, when launching a test, the following steps are invoked

  • installation of an arch system with aif's automatic procedure using a certain config file
  • installation of a verification script onto the target system and configuration of the target to run the script on boot (DAEMONS variable in /etc/rc.conf)
  • if aif ended successfully: automatic reboot.. and tada!

The verification script will check things like availability (and size) of LVM volumes, amount of swap space, keyboard layout, network and so on.
Here's a picture of a rough first version:

Stay tuned!

Mysql status variables caveats

While setting up Zenoss and reading Mysql documentation about status variables I learned:

  • All select_* variables ("Select statistics" graph in Zenoss) are actually about joins, not (all) selects. This also explains why there is no clear relation to com_select (which shows the amount of selects). ("Command statistics:selects" graph in Zenoss)
  • Com_select does not denote all incoming select commands. If you have a hit on your query cache, com_select is not incremented. So I thought we were doing less qps while in fact we were just getting more cache hits. Qcache_hits gets incremented on cache hits (but is not monitored by Zenoss)

Zenoss & Mysql monitoring

I've been playing with Zenoss (2.4) for the first time. Here are my thoughts:
read more

Uzbl. A browser that adheres to the unix philosophy.

I need a browser that is fast, not bloated, stores my data (bookmarks, history, account settings, preferences, ...) in simple text files that I can keep under version control, something that does not reinvent the wheel, something that I can control.

Well, I could not find it.
So I started the uzbl browser project.
read more

Poor man's dmenu benchmark

I wanted to know how responsive dmenu and awk, sort, uniq are on a 50MB file (625000 entries of 80 1-byte chars each).


read more

Automatic installations with AIF

Yesterday I finished the first working version of AIF's automatic procedure, along with a sample config for a basic install..

For me personally this means I can start working on the next step towards my goal of having all my systems "metadata" centrally stored (along with my real "data"), and the possibility to reconstruct all my systems in a deployment-meets-backup-restore fashion ( see rethinking_the_backup_paradigm_a_higher-level... )


read more

Fosdem 2009

I'm going to FOSDEM, the Free and Open Source Software Developers' European Meeting

I'm particularly interested in:

Arch Linux release engineering

I don't think I've ever seen so much anxiety/impatience/hope/buzz for a new Arch Linux release. (this is because of 2.6.28 with ext4 support).
The last release was 6 months ago, which is not so good.. also, the arch-installer project has been slacking for a while. But the Arch devs have been very busy with many things going on. You know how it goes...

That's why some new people have stepped up to help out on a new release:
Today, we are on the verge of a 2009-01 release (though that has been said so many times lately ;-) and together with Aaron we have started a new project: the Arch Linux Release Engineering team.
Members of this team are Aaron himself, Gerhard Brauer and me.

Our goals:

  • coordinated releases following the rhythm of kernel releases (that's a release every 2-3 months, baby!).
  • anticipate the availability of the kernel in the testing repos instead of having to wait for it to go to core before building alpha/beta images
  • migration to AIF as the new Arch installer (woot!)
  • testing. Leveraging the possibilities of AIF as an unattended installer, we should be able to script full installations and health checking of the resulting system.
  • involving the community more? releasing "testing isos"? not sure about that. we'll see...

We also have:

Oh yeah, AIF is mirrored @ http://projects.archlinux.org/ and available packaged in the official repos!

CakePHP and a paradigm shift to a code generation based approach?

At my new job, I'm writing a quite full-featured web application.
I've chosen to use CakePHP.
Why? Well, it may be 2 years since I last used it, but I've followed the project and its planet, and it seems to have matured and gained even more momentum.
I want to use something that is widely used so there is plenty of stuff available for it; it's RAD, flexible and powerful.
I noticed things such as CLI support and documentation have improved tremendously too.

However, I find that still, the recommended (or at least "most commonly used") practices are not as efficient as they could be, and that emphasis is placed on the wrong aspects.
See, even though the bake tool has come a long way since I last used it, it's still used to "generate some standard models/controllers/views" and the developer can take it from there [further editing the resulting files himself].
Fine-tuning generated code by editing the templates (in fact, only views have templates; the php code of models and controllers is hardcoded in the scripts that generate them) is still an obscure practice...
Also, there are very few commandline switches (right now you can choose your app dir, whether you want to bake a model, controller or view, and its name).
All other things (validation rules, associations, index/view/edit/add actions/views, which components, overwrite yes/no, etc.) are all handled interactively.
There are also some smaller annoyances, such as: when you specify one option like the name of the model, it assumes you don't want interactivity and produces a model containing nothing more than the class definition and the member variable $name, which is usually worthless.
One thing that is pretty neat though: if you update $this->recursive in a model, the baked views will contain stuff for the associated things. But so much more could be done...

read more

Jobhunt over.

What better way to launch the new year than starting to work as a System Engineer/Developer for a consulting firm where everyone breathes Linux and Open Source?
Next week I'll start at Kangaroot. Woohoo.

new AIF release

My holiday present for Arch devs and users: AIF alpha-0.6!

Changes since alpha 0.5:
read more

#1 productivity tip: showers

When you're stuck on a problem, or not even stuck but you just want to boost your creative/out-of-the-box thinking...
Take a shower. When I'm thinking about a problem and I take a shower, the ideas and thoughts just start popping up, one after the other, or sometimes even two at the same time. It's amazing. And it works every time.
read more

Looking for a new job

The adventure at Netlog didn't work out entirely, so I'm looking for a new challenge!

My new ideal (slightly utopic) job would be:

  • Conceptual engineering while still being close to the technical side as well, most notably system engineering and development.
  • Innovative: go where no one has gone before.
  • Integrated in the open-source world. (Bonus points for companies where open source is key in their business model)

To get a detailed overview of my interests and skills, I refer to:

AIF: the brand new Arch Linux Installation Framework

Recently I started thinking about writing my own automatic installer that would set up my system exactly the way I want.
(See rethinking_the_backup_paradigm_a_higher-level...)

I looked at the official Arch install scripts to see if I could reuse parts of their code, but unfortunately the code was just one big chunk of bash with the main program and "flow control" (you must first do this step, then that), UI code (dialogs etc) and backend logic (create filesystems, ...) all tangled up and mixed very closely together.
Functionality-wise the installer works fine, but I guess the code behind it is the result of years of adding features and quick fixes without refactoring, making it impossible to reuse any of the code.

So I started to write AIF: the Arch Linux Installation Framework (actually it had another name until recently), with these 3 goals in mind:
read more

Handling a remote rename/move with Git

I recently had to rename a repo on my Github account. Github has made this very easy but it's just one side of the issue. Obviously you must also update any references to this remote in other clones, otherwise pushes, fetches etc won't work anymore.

You can do this in two ways:

  • open .git/config and modify the url for the remote manually
  • git remote rm origin && git remote add origin git@github.com:$user/$project.git

That's it! All will work fine again.

Muse ... wow

Weird as it might sound, I've never bothered to listen to Muse songs.. until now. Some people have recommended the band to me, so I really had to stop ignoring them someday. And wow.. what have I been missing all that time :/
Songs like Butterflies and Hurricanes and Citizen Erased are now among the most beautiful songs I've ever heard.

dautostart, a standalone freedesktop-compliant application starter

I couldn't find a standalone application/script that implements freedesktop compliant (XDG based) autostarting of applications, so I decided to write my own.
The project is at http://github.com/Dieterbe/dautostart .

Right now, all the basics seem to work (except "Autostart Of Applications After Mount" of the spec).
It's probably not bug-free. I hacked it together in a few hours (but it works for me :-). Bug reports welcome!

read more

Visual feedback of the exit status of the previous command in bash

Put this in your .bashrc, and the current directory in your PS1 will be printed green if the previous command had exit state 0, red otherwise. No more typing 'echo $?', ' && echo ok', '|| echo failed' etc on the command line.
read more

I'm done with Gnome/Gconf

I'm managing my ~ in svn but using gnome & gconf makes this rather hard.
They mangle cache data together with user data and user preferences and spread that mix over several directories in your home (.gconf, .gnome2 etc).
The .gconf directory is the worst. This is where many applications store all their stuff: user preferences, but also various %gconf.xml files, which seem to be updated automatically every time 'something' happens. They keep track of timestamps for various events such as when you press numlock or become available on pidgin.
I'm fine with the fact they do that. I'm sure it enables them to provide some additional functionality. But they need to do it in clearly separated places (such as xdg's $XDG_CACHE_HOME directory)
read more

DDM v0.4 released

DDM v0.4 has been released.
Since the last release many, many things have been changed/fixed/added.

read more

My projects are on github now

I've put my (somewhat interesting) projects on GitHub.
Git is a damn cool VCS for distributed development, and I think Github integrates with it really nicely, adding some useful aspects for following and collaborating on projects.
The projects I have migrated to my GitHub profile are:
read more

A fast way to get stuff out of your head and into your GTD inbox

Often while you're occupied with something, some thought pops into your head. Something that you want to remember/do something about.
read more

Requirements for the perfect GTD tool

I've been reading GTD lately and it's absolutely a great and inspiring book.
Having turned my home office space into a real Zen space, I want to start implementing GTD in my digital life, but it seems very hard to find a tool that fully implements GTD (even though there are a lot of tools out there).

The most interesting ones (each for different reasons) I've looked at so far are Thinkingrock, tracks and yagtd (the latter requiring the most work before it does everything I need, but its code base is also the easiest to dive into). I'm keeping my eyes open because there are certainly more things to discover.

Even though there are probably no applications out there that can do everything I want, I just wanted to share my feature wishlist. These are the requirements I think a really good tool should comply with:

read more

Rethinking the backup paradigm: a higher-level approach

In this post I explain my vision on the concept of backups and why several common practices are, in my opinion, suboptimal: they become unnecessary, or at least much easier, when you manage data at a higher level, employing patterns such as versioning important directories and distributed data management.
read more

Dump your azerty and qwerty because the only keyboard layout that makes sense is Dvorak!

For a while now I have been typing solely in the Dvorak keyboard layout. I roughly estimate it has been 4 or 5 months now - the first month was a pain in the ass because I had to relearn typing pretty much from scratch - but now my typing speed is starting to exceed what it used to be in qwerty, and I still have much headroom to improve.

For those who have no clue what I'm talking about: think for 30 seconds which characters you type the most and which the least (eg: which characters occur the most/least in the language you type?).

Ok you got them? Now look at your keyboard and spot where these characters are. Now consider where your fingers are most of the time (if you've never learned to type: the 'base position' for your fingers is on the middle row). Notice anything strange?
read more

Announcing the Netlog Developer Pages

At work, we've set up the Netlog Developer Pages.

It is the place where you can/will find all information about our OpenSocial implementation, our own API, skin development, sample code and so on.
We've also launched a group where you can communicate with fellow developers and Netlog employees.
The page also features a blog where you can follow what is going on in the Netlog Tech team.

PS: We've also updated our jobs page

Windows sucks

I had to fix a problem at my dad's company...
"The network was broken."

It was a NetBEUI network connecting some Windows stations - it had been running for years - and now suddenly the nodes couldn't find each other.
One of the boxes (Windows 2000 iirc) had 2 network cards: one for the network, the other not used for anything (not even connected). Disabling the latter - not even touching the former - fixed half of the network.

There was another box that couldn't find any other node in the network. This happened to be a box whose PS/2 mouse had broken. It had a USB mouse connected, but since it was Windows 95 the mouse was not supported. I removed the USB mouse and attached another PS/2 mouse. Not only did this fix the mouse, the box could suddenly find the other boxes again...

Windows really does suck.

And the worst part is: even though all is fine now, I have no clue how long it will keep working, or what the cause will be the next time it breaks.

I survived LCL 31-3-2008

On 31-3-2008, LCL - one of the most used datacenters in Belgium, and the only one with a 0% downtime record - had major power issues at their datacenter in Diegem, taking lots of Belgian parties offline (more specifics on the net).

If you're one of the sysadmins of a website with 35M members and 150M hits per day this means you're in for an exciting night ...
read more

DDM : a Distributed Data Manager

UPDATE: this information is outdated. See http://github.com/Dieterbe/ddm/tree/master for latest information.

Introduction

If you have multiple sets of data (e.g. music, images, documents, movies, ...) and you use these on more than one system (e.g. a laptop and a file server), then you probably also have some 'rules' on how you use them on your systems. For example, after capturing new images you may put them on your laptop first, but you like to sync them to your file server frequently. On the other hand you also want all your high-res images (stored on the server) available for editing on the laptop, and to make it more complicated you might have the same images in a smaller format on your server (for gallery programs etc.) and want these (or a select few albums of them) available on the road.

The more different types of data you have and the more specific your work flows are, the harder it becomes to keep your data up to date and consistent across your boxes. You could manually rsync/(s)cp your data, but you end up with a mess (at least that's how it turned out on my boxes). Putting everything under version control is great for text files and such, but it's not an option for bigger (binary) files.

I wanted to keep all my stuff neatly organised in my home directories and I wanted to create good work flows with as little hassle as possible, so I decided to write DDM: the Distributed Data Manager.
read more

Tweaking Lighttpd stat() performance with fcgi-stat-accel

If you serve lots of (small) files with Lighttpd you might notice you're not getting the throughput you would expect. Other factors (such as latencies caused by the random read patterns) aside, a real show stopper is the stat() system call, which is a blocking call (no parallelism). Some clever guys thought of a way to solve this: a fastcgi program that does the stat() first, so that by the time Lighty needs it, the stat information is already in the Linux cache and it doesn't have to wait. In the meanwhile your Lighty thread can do other stuff.
read more

I'm not going to Fosdem 2008

I wish I could put this on my webpage:

I’m going to FOSDEM, the Free and Open Source Software Developers’ European Meeting

read more

Per-directory bash history

I've been thinking about how a specific bash history for each directory could improve productivity, and unlike what I feared, it was actually pretty easy to find a solution on the net.
read more
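One possible approach (a sketch of the general idea, not necessarily the exact solution the post refers to) is to wrap cd so that HISTFILE always points at a history file inside the current directory:

    # flush history to the old directory's file, then switch to the new one
    cd() {
        builtin cd "$@" || return
        history -a                          # append what was typed here so far
        export HISTFILE="$PWD/.dir_bash_history"
        history -c                          # clear the in-memory history
        history -r                          # load this directory's history file
    }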

The key to mastering a musical instrument ...

The key to mastering a musical instrument is learning another.

Hacking into my router by brute-forcing http authentication

I forgot the username and password to access the web panel of my router.
Luckily I knew some possible usernames and some patterns that I could have used to construct my password, so I just had to try all the combinations... Too much work to do manually but easily done when scripted.

Here is the php script that I came up with (obviously stripped of my personal stuff). It got my account in less than a second :)
read more
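The script itself is php, but the idea translates directly to a few lines of shell with curl (the router address and the candidate lists below are just placeholders):

    #!/bin/bash
    # try every user/password combination against HTTP basic auth;
    # anything other than a 401 means we found working credentials
    users="admin root dieter"
    passwords="secret1 secret2 secret3"
    for u in $users; do
        for p in $passwords; do
            code=$(curl -s -o /dev/null -w '%{http_code}' --user "$u:$p" http://192.168.1.1/)
            if [ "$code" != "401" ]; then
                echo "found: $u / $p (HTTP $code)"
                exit 0
            fi
        done
    done
    echo "no combination worked"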

gtk dialogs for (shell)scripts with zenity and the ask-pass gui tools for ssh-add

Phew! Where to start? Probably at this blog post. It's about making it very easy to work with external encrypted volumes. I'm not going to talk about the article itself but about a great tool I discovered thanks to it: Zenity. It's an LGPL-licensed program written in C by some guys from Gnome and Sun. You can call it from any script and present a user with a gtk widget such as a password dialog, file chooser, calendar, ... It has
read more
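To give an idea of how little code it takes (the dialog texts are just examples), calling Zenity from a shell script looks roughly like this:

    # yes/no question, hidden-text entry and a file chooser, straight from bash
    if zenity --question --text="Mount the encrypted volume?"; then
        passphrase=$(zenity --entry --hide-text --text="Enter the passphrase:")
        keyfile=$(zenity --file-selection --title="Pick a key file")
    fi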

Simple command to retrieve stuff from RAM after closing/crashing an application

After an app is closed or has crashed, its data is often still in your RAM and you can very easily get it back by grepping /proc/kcore.
thanks Martin for this tip!
http://www.matusiak.eu/numerodix/blog/index.php/2007/09/10/recover-lost-...
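In practice that boils down to something like this (run as root; -a makes grep treat the binary kcore image as text, and -C adds some context around the hit):

    # this scans (a view of) all memory, so it can take a while
    grep -a -C 5 'some string you remember typing' /proc/kcore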

Emulating two-dimensional (or even multi-dimensional) arrays in bash

Ever needed to use arrays of two or more dimensions, but got stuck on Bash's limited array support, which provides only one dimension?

There is a trick that lets you dynamically create variable names. Using this, you can emulate additional dimensions.

read more
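A minimal sketch of that trick, using eval to build the variable name and ${!name} to read it back (the function and variable names are made up for the example):

    # emulate matrix[row][col] with dynamically named variables
    set_cell() {                            # usage: set_cell row col value
        eval "matrix_${1}_${2}=\"\$3\""
    }
    get_cell() {                            # usage: get_cell row col
        local name="matrix_${1}_${2}"
        echo "${!name}"
    }

    set_cell 2 3 "hello"
    get_cell 2 3                            # prints: hello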

Nagios monitoring in your desktop panel aka Xfce Genmon panel plugin rules!

FOSS is written by users, for users, and what I've been doing/experiencing this afternoon is a perfect example of that.
read more

Upgrading Drupal the easy way

I just upgraded this site to Drupal 5.2. The package came with upgrade instructions consisting of 11 steps to complete the upgrade process, but after reading them a few times I realized it could be done more easily.
read more

Bye CakePHP, bye dAuth... Hello Drupal!

I'm afraid the time has come to say goodbye to CakePHP, and to the projects I've been working on for it.
I still like Cake... In fact, the further the development of 1.2 progresses, the more I like it (well, generally speaking that is... there are some minor things I don't like, but that's not important now). The truth of the matter is I like to develop, I like the PHP language and I enjoy working with Cake.
But... all the sites I currently work on are community sites or blogs, and although some of them have some specific requirements, in the end it's all very generic, and a full-blown content management system like Drupal proves much more useful and featureful than developing my own application in a web application framework such as Cake (even if that's becoming easier and easier to do).
read more

Asymmetric keys instead of passwords for SSH authentication to increase security and convenience

I've been using OpenSSH for a while already, and although I've seen mentions of "public key authentication" and "RSA encryption" several times in its config files, I never decided to figure out what they did exactly, and stuck to password authentication. But now the guys at work explained how it works and after reading more about it, I'm totally hooked on it!

It's a feature of ssh protocol version 2 (so it has been around for a while already, i.e. we can all use it without updating anything) which essentially comes down to this: you generate an asymmetric key pair and distribute the public key to all remote hosts. When logging in to such a host, the host will encrypt a random value which only you can decrypt (using the private key), and by decrypting it you prove your identity. To secure your private key you store it encrypted with a password. ssh-agent (which is bundled with openssh) is the tool that interacts with ssh to perform this task: when logging in to a host, ssh-agent will open the private key for you automatically if it can decrypt it with the password it receives from you. The problem is that you have to load the key (enter your password and decrypt it) again each time.

This is where keychain comes in, or you can use SSH Agent (don't confuse this with the ssh-agent that comes with openssh) if you're a Mac user and like GUIs. These tools basically ask you the passwords for all private keys you wish to use in a session (by session I mean "the whole time of using your computer"), decrypt the encrypted keys on your hard disk and cache the decrypted keys in RAM, so they can be used in all terminals you open.
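The whole workflow, from generating the key pair to caching it with keychain, comes down to just a few commands (a sketch; "user@remotehost" and the key path are placeholders, and ssh-copy-id is simply a convenient way to append your public key to the remote authorized_keys):

    ssh-keygen -t rsa                 # generate the key pair, pick a passphrase
    ssh-copy-id user@remotehost       # install the public key on the remote host
    # in ~/.bash_profile: ask the passphrase once, cache the key for the session
    keychain ~/.ssh/id_rsa
    . ~/.keychain/$(hostname)-sh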

For more information:
OpenSSH key management, Part 1: Understanding RSA/DSA authentication
OpenSSH key management, Part 2: Introducing ssh-agent and keychain
OpenSSH key management, Part 3: Agent forwarding and keychain improvements (freaks only ;-))

Have fun

The perfect GTK music player: can Exaile replace Amarok?

I've always liked Amarok: it does everything I always wanted, and more. It looks perfect in every way ...
But... it uses the Qt library, and although there are tricks to make Qt applications fit in better with your gtk desktop/theme, it will never fit in perfectly - not only graphically, but also because you still need to load the Qt libraries when you want to listen to some music, and it is built to interact with the KDE desktop environment.

So, I've been looking for an alternative, a GTK application strong enough to actually be able to replace Amarok, the king of all software music players.
read more

Webpages should not contain "add to Digg / Del.icio.us / Technorati /..." links

I don't like pages / articles / blog posts /.. accompanied by "Digg this", "add to Del.icio.us" or "add to Technorati" links.
Why not? Because this is meta-level functionality. Not functionality of the blog/article/page in question, but of a higher level. And thus it should be handled on a higher level: the web browser. Just like we create and manage bookmarks (I mean the old-fashioned ones, not the delicious ones) in our browser: this is not the task of a web page. (we all know how silly "bookmark this" links look on a page, right?)

Whether you like these kinds of services or not is up to you (personally I think the most popular content is often the most subjective, biased and close-minded, not to mention too mainstream for "real" geeks, but that's another story), but people who are serious about it should just make sure they can do it for any page they visit (e.g. use a Firefox extension to enable digging and adding to delicious) so we can get rid of this ugly clutter that is put on some pages.

I know not every browser supports this yet (either by default or by extending it with (3rd party) plug-ins), and even when it is supported, not everyone enables this functionality, so it's a bit of a chicken-and-egg problem.

But then again, most web2.0 people already use a browser that supports it, or will support it in the very near future, so let's get rid of this inappropriately placed meta-functionality !

Afgestudeerd - graduated

Foreign visitors: yay, I graduated today \o/

Dutchies: yay, graduated...
Had the proclamation today; I passed with satisfaction, not even a single fail \o/
And a 12/20 for the Master's thesis :-)

PhpDeliciousClient, a php cli client to administer del.icio.us accounts

PhpDeliciousClient is a console based client for doing maintenance on Del.icio.us accounts.
I wrote it because - to my knowledge - there currently is no good program (including the personalized del.icio.us web page itself) that lets you make changes to your del.icio.us data in a powerful, productive manner. (with data I primarily mean tags. Posts and bundles are considered less important).

You are probably familiar with the fact that a Delicious account (or any tag-based metadata organizing system, for that matter) can soon become bloated: it gets filled with way too many tags. Several of those tags mean the same thing (fun, funny, humor, ...) or include one another (humor, jokes, ...). You can group them in bundles, but even then you need to add all the tags to a post if you want it to appear in the results for each tag. Not very convenient. Also, if you have your del.icio.us bookmarks available in Firefox, you'd have a menu with several hundred entries (one for each tag), each submenu usually containing just a few entries (or worse: just one).

When I got into this situation I tried to fix it, but it was a hell of a task to do on the Delicious webpage itself, and although I found some other tools, they were far too basic, outdated, dependent on other stuff or just not meant for this kind of task, so I decided to write my own.

The result is a php command line program called PhpDeliciousClient (as you can see, I added it to the menu on the left too), which uses the PhpDelicious library to access the Del.icio.us api.

The primary focus of the program is to help you bring your tags into balance, as efficiently as possible. Other stuff, which can be done just fine on the delicious page (editing single posts, changing your password, ...), is not implemented.

It's a bit hacky and I don't give any guarantees, but I can tell you I used it to edit my own Del.icio.us page, going from about 400 tags to about 80 without any problems.

That said, head over to the PhpDeliciousClient project page for some more information, and to download it ;-)

Getting statistics about events that don't trigger page requests with Google Analytics

You probably already heard of Google Analytics. It's a pretty nice program that (basically) gathers data about visits of your site and creates reports of it. It works by including some JavaScript code on your page, so that each page request triggers a call to the Analytics tracker sending along some data such as which page is requested and which resolution was used. (no personal or other privacy-sensitive data is sent). But here is the deal! I just discovered that you can also track events that don't require page requests!
Think of links to files or to external locations, JavaScript events (Ajax anyone?) or even Flash events (but who is crazy enough to use Flash anyway?).
read more

Bye, Google sandbox!

Today I'm finally out of Google's Sandbox.

Google has this system called the sandbox, which new pages go into for 6 months, in order to prevent scammers/spammers from resurrecting dummy pages - and scoring well in Google - all the time.
During these 6 months a page will score very badly in search results, even if it should rate very well for the specific keywords.

Smart people will notice my first post is dated 03/04/2007, but keep in mind that before this blog existed I already put up a dummy page with my name on it (the keywords I want to score on) as soon as I could, because back then I already knew I wanted to put a blog here and I wanted to get out of the sandbox as soon as possible.

I know some people who kept forgetting my URL and couldn't find it in Google - well, you will find me now!

You can't make bits harder to copy

I just watched Cory Doctorow's talk which is part of the Authors@Google series on youtube.

He made some great points about where the (music) industry gets/does it wrong and about some fundamental flaws in our law systems (especially with regards to copyright). All of which are of course results of the challenges imposed by the "information age". (which I also introduced in
read more

Drag 'n drop tutorial with the CakePHP 1.2 Ajax helper, Prototype framework and Scriptaculous library

Introduction

During the development of my thesis I wanted to create a drag 'n drop interface. But I had never done anything like that: I had never used CakePHP's Ajax helper, nor had I ever used the more advanced functionality of Scriptaculous/Prototype. Hell, I had never even touched Ajax before this!

Although there are some basic CakePHP/Ajax tutorials out there, I still had a hard time because all of them assumed some knowledge about Ajax (in CakePHP). After a lot of googling I even found a tutorial called CakePHP: Sortable AJAX Drag & Drops - The Basics.
"Perfect!" I thought, until, after staring at the article for a long while, I started to notice that nowhere in the article is "$ajax->drag", "$ajax->drop" or "$ajax->dropRemote" used. (Those are calls on the CakePHP Ajax helper to make objects draggable, or to turn them into a dropbox where draggables can be dropped.) So the only more or less suitable tutorial about drag 'n drop was actually about sorting and didn't use the drag/drop function calls at all, even though it contains very useful information.

Long story short: I finally got it working (thanks to Krazylegz and kristofer and possibly others too, it has been a while so I may be forgetting someone ;-), and learned a lot in the process. I will share what I learned with you guys so that hopefully it's a bit easier for you than what I had to go through.


read more

Open source and software patents from an ethical perspective

For school I had to write an ethics report.
It had to contain 3 elements:

  • An introductory text / personal reflection on professional and business ethics
  • Ethical considerations regarding my thesis
  • A more extensive essay with ethical considerations on a topic of choice from the domain of science, technology, professional and business life. In my case this topic of choice became "open source and software patents"

As some of you know, I consider ethics a very important aspect of life, and I'm glad I was able to explore the issue of open source and software patents in more depth, because this is something I'm interested in.

You can download my paper here. Feel free to read it through and let me know what you think!

A very interesting article I came across (well, actually my Ethics lecturer mailed it to me ;-) is Community Ethics and Challenges to Intellectual Property, written by Kaido Kikkas. Be sure to go through this one too, because that guy hits the nail on the head!
The text above also brings up the life story of Edward Howard Armstrong. When I read about it I immediately went through his entire biography on Wikipedia. In short, it comes down to this: the man made some brilliant inventions, but the company he worked for screwed him over and ruined him through expensive lawsuits. In the end his marriage and his whole life fell apart, and he committed suicide.
It's unbelievable how far the egoism of a company can go. Let's learn from history and take a critical stance towards the new issues we now face, such as intellectual property and patents. We don't want to make the same mistakes, do we?

In my search for more info about this man I stumbled upon this page: doomed engineers. Have a look!

Last but not least, I'd also like to recommend the movie Revolution OS to everyone. It's been a long time since I saw it, but it gives a great picture of the rise of free software (and Linux in particular), and above all you learn about some unethical practices of software giants, you know which one I mean...

Thesis finished

Yesterday, after a night of hunting down and fixing spelling errors, passages that could be explained better and other small details,
I got my thesis printed and delivered the six books to my school.

read more

I just became a "System & Network Architect"

I just signed my contract at Incrowd, the company behind sites such as redbox and facebox.

I will be working there in a team of young, enthusiastic people. Among them, some are already familiar to me: my old friend Lieven (we used to play in a band together and kept in touch afterwards) and my ex-classmate Jurriaan. Both of them love their jobs btw :-).

My official title is "System & Network architect".
What I will be doing there is keeping the "lower level" (hardware, network, databases) secure, stable and performing well.
read more

Debian changing route: the end of the perfect server linux distribution?

From the very little experience I have with Debian, and from the stuff I've been reading about it, I think I can safely say Debian has always been a special distribution: packages take very long to get into the stable tree, because Debian wants to be a rock-solid system where packages go through a lot of testing ("we release it when it's done"). The end result is a distro where you don't have the latest software, nor as much flexibility as, say, Gentoo or Arch: you'd often need to adapt your way of doing things to the "Debian way" (or be prepared to look for help in really obscure places and probably break things), but the end result is a stable distro where everything works very decently. That, combined with no licensing fees (unlike, for example, Red Hat), makes it the perfect choice for a server in small companies, where money is more important than features such as professional support or official certifications.

However, it seems like Debian is taking a route that will make it lose its advantages over other distributions in the server market:
read more

Figuring out CakePHP's new AuthComponent

In the Cake community, there has always been much interest in authentication/authorization systems. The issue of authentication has been addressed in several add-ons provided by the community, such as DAuth (written by me), OthAuth (written by Crazylegs) and many others.

However, one of the additions to the 1.2 branch, which is currently in active development, is a built-in auth module. A module that isn't finished yet, but it sure is worth looking at. (In fact I'm thinking about making a new dAuth version built on Cake's own auth system.) As most bakers know, there is very little information about the 1.2 branch in general, and the auth component in particular. So what I will try to do is delve into the code, mess with it, and explain my findings in this post. This first post will mostly be about deciphering the source code; messing with it will probably come a little later.
read more

Kwartee 4 2007 report

This weekend (17-18 March) I went to Kwartee 4.
Kwartee weekends are organized by Formaat (formerly known as VFJ); this edition took place at the Destelheide training centre in Dworp (close to Halle, south of Brussels).
Two men strong (Steven and I), we represented youth club SjaTOo.


read more

Fosdem 2007 review

Every year, during a special weekend in February, the Université Libre de Bruxelles suddenly becomes a little more geeky.
It's that time of the year when many European (and some inter-continental) colleagues join us at
Fosdem: the Free and Open source Software Developers' European Meeting (more info here).


read more

My favorite bash tricks

Hello everyone.
This post is about bash, the shell providing so many users easy access to the underlying power of their system.
(not bash the quote database, although I really like that website too ;-) )
Most people know the basics, but getting to know it better can really increase your productivity. And when that happens, you might start loving bash as much as I do ;-)

I assume you have a basic knowledge of bash, the history mechanism, and ~/.bash* files.
So here they are: my favorite tricks, key combos and some bonus stuff:

Tricks

  • "cd -" takes you back to the previous directory, wherever that was. Press again to cycle back.
  • putting arguments between braces {like,so} makes bash expand them into multiple words, so one command can act on several "arguments" at once. Bash will make the cartesian product when you combine multiple brace expressions in one expression. Some less-obvious tricks with this method are mentioned here
  • HISTIGNORE: with this variable you control which commands get saved in your history. Here is a nice explanation. Especially the space trick is very useful imho (see the sketch after this list).
  • CDPATH: here is a great explanation ;-)
  • readline (library used by bash) trick: put this in your ~/.inputrc (or /etc/inputrc) :
    "\e[5~": history-search-backward
    "\e[6~": history-search-forward
    

    This way you can do *something*+pageup/pagedown to cycle through your history for commands starting with *something*
    You can use the up/down arrows too, their codes are "\e[A" and "\e[B"

  • for more "natural" history saving behavior (when having several terminals open): put this in .bash_profile:

    PROMPT_COMMAND='history -a'
    

    (this writes each command to the history file as it is entered, instead of only writing everything at shell exit).
    And type

    shopt -s histappend
    

    to append to the history file instead of overwriting it. (this might be the default on some distros; I think it was on Gentoo)
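As promised above, a small sketch of how HISTIGNORE and CDPATH might look in a .bashrc (the ~/projects entry is just an example path):

    # don't save duplicates, bare 'ls', or anything typed with a leading space
    export HISTIGNORE="&:ls:[ ]*"
    # 'cd somedir' also searches for somedir under ~ and ~/projects
    export CDPATH=.:~:~/projects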

Shortcuts/keycombos

  • ctrl+r : search through your history. Repeatedly press ctrl+r to cycle through hits.
  • ctrl-u : cut everything on the current line before the cursor.
  • ctrl-y : paste text that was cut using ctrl-u. (starting at the cursor)
  • !$: equals the last word of the previous command. (great when performing several operations on the same file)

Bonus material

  • Bash completion, an add-on for bash which adds completion for arguments of your commands. It's pretty smart and configurable. (it's in portage, and probably in repos of all popular distros out there)
  • This script provides you an interface to the rafb pastebin!
  • Recursively delete all .svn folders in this folder, and in folders below. "find . -name .svn -print0 | xargs -0 rm -rf"
  • Recursively count number of files in a directory: "find -not -type d | wc -l"

Conclusion

Those were all important tricks I'm currently using. On the web you'll find lots more useful tips :-).
If that still isn't enough, there is also man bash :o

With aliases and scripts (and involving tools like sed or awk) the possibilities become pretty much endless. But for that I refer to tldp.org and your favorite web search engine.

Hello world!

Finally, my own website...
I've wanted to get this up for a long time. My initial idea was to write (and style) it all from scratch using the marvelous CakePHP framework, along with an authentication system I wrote, dAuth.
However, due to my lack of time I decided to use the excellent drupal platform, which I'm quite sure will get the job done equally well while freeing up a lot of my time, so I can invest it in other projects :-)
Dries Buytaert's talk on fosdem this year really helped on making that decision ;-)

So, what will this site be about?

  • me
  • my interests
    • Free & Opensource software, and the thoughts/ideals behind it
    • PHP scripting/programming (I like C(++), bash and j2se too but I'm not as good at it as I am at php)
    • Audio recording/mixing/production
    • Drumming, one of my greatest hobbies
    • Music, bands, movies,... I like
    • productivity (TDD, automation scripts, shell/DE tweaks, ...)
    • ethics and philosophy, these aspects are really important in my life
    • Jeugdhuis SjaTOo, our local youth club

Now let's get started ;-)