Graphite & Influxdb intermezzo: migrating old data and a more powerful carbon relay

Migrating data from whisper into InfluxDB

"How do i migrate whisper data to influxdb" is a question that comes up regularly, and I've always replied it should be easy to write a tool to do this. I personally had no need for this, until a recent small influxdb outage where I wanted to sync data from our backup server (running graphite + whisper) to influxdb, so I wrote a script:
# whisper dir without trailing slash.
start=$(date -d 'sep 17 6am' +%s)
end=$(date -d 'sep 17 12pm' +%s)
pipe_path=$(mktemp -u)
mkfifo $pipe_path
function influx_updater() {
    influx-cli -db $db -async < $pipe_path
influx_updater &
while read wsp; do
  series=$(basename ${wsp//\//.} .wsp)
  echo "updating $series ..." --from=$start --until=$end $wsp_dir/$wsp.wsp | grep -v 'None$' | awk '{print "insert into \"'$series'\" values ("$1"000,1,"$2")"}' > $pipe_path
done < <(find $wsp_dir -name '*.wsp' | sed -e "s#$wsp_dir/##" -e "s/.wsp$//")

It relies on the recently introduced asynchronous inserts feature of influx-cli - which commits inserts in batches to improve the speed - and the whisper-fetch tool.
You could probably also write a Go program using the unofficial whisper-go bindings and the influxdb Go client library. But I wanted to keep it simple. Especially when I found out that whisper-fetch is not a bottleneck: starting whisper-fetch, and reading out - in my case - 360 datapoints of a file always takes about 50ms, whereas InfluxDB at first only needed a few ms to flush hundreds of records, but that soon increased to seconds.
Maybe it's a bug in my code, I didn't test this much, because I didn't need to; but people keep asking for a tool so here you go. Try it out and maybe you can fix a bug somewhere. Something about the write performance here must be wrong.

A more powerful carbon-relay-ng

carbon-relay-ng received a bunch of love and has been a great help in my graphite+influxdb experiments.

Here's what changed:
  • First I made it so that you can adjust routes at runtime while data is flowing through, via a telnet interface.
  • Then Paul O'Connor built an embedded web interface to manage your routes in an easier and prettier way (pictured above)
  • The relay now also emits performance metrics via statsd (I want to make this better by using go-metrics which will hopefully get expvar support at some point - any takers?).
  • Last but not least, I borrowed the diskqueue code from NSQ so now we can also spool to disk to bridge downtime of endpoints and re-fill them when they come back up
Beside our metrics storage, I also plan to put our anomaly detection (currently playing with heka and kale) and carbon-tagger behind the relay, centralizing all routing logic, making things more robust, and simplifying our system design. The spooling should also help to deploy to our metrics gateways at other datacenters, to bridge outages of datacenter interconnects.

I used to think of carbon-relay-ng as the python carbon-relay but on steroids, now it reminds me more of something like nsqd but with an ability to make packet routing decisions by introspecting the carbon protocol,
or perhaps Kafka but much simpler, single-node (no HA), and optimized for the domain of carbon streams.
I'd like the HA stuff though, which is why I spend some of my spare time figuring out the intricacies of the increasingly popular raft consensus algorithm. It seems opportune to have a simpler Kafka-like thing, in Go, using raft, for carbon streams. (note: InfluxDB might introduce such a component, so I'm also a bit waiting to see what they come up with)

Reminder: notably missing from carbon-relay-ng is round robin and sharding. I believe sharding/round robin/etc should be part of a broader HA design of the storage system, as I explained in On Graphite, Whisper and InfluxDB. That said, both should be fairly easy to implement in carbon-relay-ng, and I'm willing to assist those who want to contribute it.