Introduction talk to metrics 2.0 and Graph-Explorer

This week I had the opportunity to present metrics 2.0 and Graph-Explorer at the Full-stack engineering meetup.
read more

What the open source community can learn from Devops

Being active as both a developer and an ops person in my professional life, and as both an open source developer and a packager in my spare time, I noticed some common ground between the two worlds. I think the open source community can learn from the Devops movement, which is solving problems in the professional tech world.

For the sake of getting a point across, I'll simplify some things.

First, a crash course on Devops...


read more

Profiling and behavior testing of processes and daemons, and Devopsdays NYC

Profiling a process run

I wanted the ability to run a given process and get
a plot of key metrics (cpu usage, memory usage, disk i/o) throughout the duration of the process run.
Something light-weight with minimal dependencies so I can easily install it on a server for a one-time need.
I couldn't find a tool for it, so I wrote profile-process,
which does exactly that in <100 lines of Python.
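
To make the idea concrete, here is a minimal sketch of the run-and-sample loop. It is not the profile-process code: the names `profile` and `read_rss_kib` are made up for illustration, it only samples resident memory (no cpu or disk i/o, no plotting), and the default sampler assumes a Linux /proc filesystem.

```python
import subprocess
import sys
import time


def read_rss_kib(pid):
    """Return the resident memory of `pid` in KiB, or None if unavailable.

    Assumption: Linux, where /proc/<pid>/status has a "VmRSS:" line.
    """
    try:
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except OSError:
        pass  # process already gone, or no /proc on this platform
    return None


def profile(cmd, sample=read_rss_kib, interval=0.1):
    """Run `cmd` and call `sample(pid)` every `interval` seconds until it exits.

    Returns (exit code, list of (seconds since start, sample value)).
    """
    proc = subprocess.Popen(cmd)
    start = time.monotonic()
    samples = []
    while proc.poll() is None:
        samples.append((time.monotonic() - start, sample(proc.pid)))
        time.sleep(interval)
    return proc.returncode, samples


if __name__ == "__main__":
    code, samples = profile([sys.executable, "-c", "import time; time.sleep(0.5)"])
    for elapsed, rss in samples:
        print(f"{elapsed:6.2f}s  rss={rss} KiB")
```

The sampler is passed in as a function so that other metrics can be collected the same way; the list of samples is what you would hand to a plotting tool.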

black-box behavior testing processes/daemons

I wrote simple-black-box to do this.
It runs the subject(s) in a crafted sandbox, sends input (http requests, commands, ...)
and lets you make assertions on http/statsd requests/responses, network listening state, processes running, log entries,
file existence/checksums in the VFS/swift clusters, etc.
Each test-case is a scenario.
It can also use logstash to give a centralized "distributed stack trace" when you need to debug a failure that involves multiple processes interacting and acting upon received messages, or to compare behavior across different scenario runs.
You can integrate this with profile-process to compare runtime behaviors across testcases/scenarios.
read more

RRDtool: updating RRA settings and keeping your collected data

When you use rrdtool, it can happen that you first create your databases, then collect a whole bunch of data and decide later you want more accuracy/longer periods.
Especially when using
read more

Metrics 2.0 now has its own website!

Metrics 2.0 started as a half-formal proposal and an implementation via graph-explorer, but is broad enough in scope that it deserves its own website, its own spec, its own community. That's why I launched metrics20.org and a discussion group.
read more

Graphite & Influxdb intermezzo: migrating old data and a more powerful carbon relay

Client-side rendered graphite charts for all

Client-side rendering of charts, as opposed to using graphite's server-side generated PNGs, allows various interactivity features, such as:
read more

Monitorama PDX & my metrics 2.0 presentation

Earlier this month we had another iteration of the Monitorama conference, this time in Portland, Oregon.


(photo by obfuscurity)


read more

A real whisper-to-InfluxDB program.

The whisper-to-influxdb migration script I posted earlier is pretty bad: a shell script, without concurrency, and with an undiagnosed performance issue. I hinted that one could write a Go program using the unofficial whisper-go bindings and the influxdb Go client library. That's what I have now done; it's at github.com/vimeo/whisper-to-influxdb. It uses a configurable number of workers for both whisper fetches and InfluxDB commits, but it's still a bit naive in the sense that it commits to InfluxDB one series at a time, irrespective of how many records are in it. My series, and hence my commits, have at most 60k records, and presumably InfluxDB could handle a lot more per commit, so we might leverage better batching later. Either way, this way I can consistently commit about 100k series every 2.5 hours (or roughly 10/s), where each series has a few thousand points on average, with peaks up to 60k points. I usually play with 1 to 30 InfluxDB workers. Even though I've hit a few InfluxDB issues, this tool has enabled me to fill in gaps after outages and to do a restore from whisper after a complete database wipe.
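
The two worker pools described above can be sketched as follows. This is an illustration in Python, not the Go code from the repository: `run_pipeline`, `fetch` and `commit` are hypothetical names, and the fetch and commit callables stand in for the whisper read and the InfluxDB write.

```python
import queue
import threading


def run_pipeline(series_names, fetch, commit, fetch_workers=4, commit_workers=2):
    """Fetch every series with one pool of workers and commit it with another."""
    to_fetch = queue.Queue()
    to_commit = queue.Queue()
    done = object()  # sentinel that tells a worker to stop

    def fetcher():
        while True:
            name = to_fetch.get()
            if name is done:
                break
            to_commit.put((name, fetch(name)))

    def committer():
        while True:
            item = to_commit.get()
            if item is done:
                break
            name, points = item
            commit(name, points)  # one series per commit, no batching

    fetchers = [threading.Thread(target=fetcher) for _ in range(fetch_workers)]
    committers = [threading.Thread(target=committer) for _ in range(commit_workers)]
    for worker in fetchers + committers:
        worker.start()

    for name in series_names:
        to_fetch.put(name)
    for _ in fetchers:
        to_fetch.put(done)  # one sentinel per fetch worker
    for worker in fetchers:
        worker.join()

    # All fetches are queued for commit by now, so the committers can be told to stop.
    for _ in committers:
        to_commit.put(done)
    for worker in committers:
        worker.join()
```

Better batching would mean having each commit worker collect several series before writing; the sketch keeps the naive one-series-per-commit behavior.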

Graph-Explorer: A graphite dashboard unlike any other

The above sounds like a marketing phrase, and I'm just as skeptical of those as you are, but I feel it's justified. Not because GE is necessarily better, but because it's certainly different.
read more

Graphite-ng: A next-gen graphite server in Go.

I've been a graphite contributor for a while (and still am). It's a great tool for timeseries metrics. Two weeks ago I started working on Graphite-ng: it's somewhere between an early clone/rewrite, a redesign, and an experiment playground, written in Golang. The focus of my work so far is the API web server, which is a functioning prototype. It answers requests like

/render/?target=sum(scale(stats.web2,5.12),derivative(stats.web2))

I.e. it lets you retrieve your timeseries, processed by function pipelines which are set up on the fly based on a spec in your http/rest arguments. Currently it only fetches metrics from text files, but I'm working on decent metrics storage as well.
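
As a rough illustration of what "function pipelines set up on the fly" means, here is a small sketch that parses a target expression like the one above and evaluates it against in-memory series. It is written in Python rather than Go and is not how Graphite-ng is implemented: `tokenize`, `parse`, `evaluate` and `render` are hypothetical helpers, and the three functions are simplified versions that treat missing points as None.

```python
import re

TOKEN = re.compile(r"\s*([A-Za-z_][\w.]*|-?\d+(?:\.\d+)?|[(),])")
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def tokenize(target):
    """Split a target expression into names, numbers, parentheses and commas."""
    target = target.strip()
    pos, tokens = 0, []
    while pos < len(target):
        match = TOKEN.match(target, pos)
        if not match:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens):
    """Consume tokens and return a nested tree of call/const/metric nodes."""
    tok = tokens.pop(0)
    if tokens and tokens[0] == "(":
        tokens.pop(0)
        args = []
        while tokens[0] != ")":
            args.append(parse(tokens))
            if tokens[0] == ",":
                tokens.pop(0)
        tokens.pop(0)  # drop the closing parenthesis
        return ("call", tok, args)
    if NUMBER.fullmatch(tok):
        return ("const", float(tok))
    return ("metric", tok)


def scale(series, factor):
    return [None if v is None else v * factor for v in series]


def derivative(series):
    """Difference between consecutive points; the first point has no predecessor."""
    out = [None] if series else []
    for prev, cur in zip(series, series[1:]):
        out.append(None if prev is None or cur is None else cur - prev)
    return out


def sum_series(*many):
    """Pointwise sum across series, skipping missing points."""
    out = []
    for column in zip(*many):
        present = [v for v in column if v is not None]
        out.append(sum(present) if present else None)
    return out


FUNCTIONS = {"scale": scale, "derivative": derivative, "sum": sum_series}


def evaluate(node, store):
    """Walk the tree bottom-up; `store` maps metric names to lists of values."""
    if node[0] == "const":
        return node[1]
    if node[0] == "metric":
        return store[node[1]]
    _, name, args = node
    return FUNCTIONS[name](*(evaluate(arg, store) for arg in args))


def render(target, store):
    return evaluate(parse(tokenize(target)), store)
```

For example, `render("sum(scale(stats.web2,5.12),derivative(stats.web2))", {"stats.web2": [1.0, 2.0, 4.0]})` builds the pipeline from the string and runs it over the stored series. The `store` dict plays the role of the text files mentioned above.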


read more

InfluxDB as a graphite backend, part 2



Updated Oct 1, 2014 with a new Disk space efficiency section which fixes some mistakes and adds more clarity.

The Graphite + InfluxDB series continues.

  • In part 1, "On Graphite, Whisper and InfluxDB", I described the problems of Graphite's whisper and ceres, why I disagree with common graphite clustering advice as being the right path forward, what a great timeseries storage system would mean to me, and why InfluxDB, despite being the youngest project, is my main interest right now. I introduced my approach for combining both and leveraging their respective strengths: InfluxDB as an ingestion and storage backend (and at some point, realtime processing and pub-sub) and graphite for its renowned data processing-on-retrieval functionality. Furthermore, I introduced some tooling: carbon-relay-ng to easily route streams of carbon data (metrics datapoints) to storage backends, allowing me to send production data to Carbon+whisper as well as InfluxDB in parallel; and graphite-api, the simpler Graphite API server, with graphite-influxdb to fetch data from InfluxDB.
  • Not Graphite related, but I wrote influx-cli, which I introduced here. It makes it easy to interface with InfluxDB and measure the duration of operations, which will become useful for this article.
  • In the Graphite & Influxdb intermezzo I shared a script to import whisper data into InfluxDB and noted some write performance issues I was seeing, but the better part of the article described the various improvements made to carbon-relay-ng, which is becoming an increasingly versatile and useful tool.
  • In part 2, which you are reading now, I'm going to describe recent progress and share more info about my setup, testing results, the state of affairs, and ideas for future work.

read more

A few common graphite problems and how they are already solved.

On Graphite, Whisper and InfluxDB

Graphite, and the storage Achilles heel

Graphite is a neat timeseries metrics storage system that comes with a powerful querying api, mainly thanks to its large set of available processing functions.
For medium to large setups, the storage aspect quickly becomes a pain point. Whisper, the default graphite storage format, is a simple storage format, using one file per metric (timeseries).
read more

Anthracite, an event database to enrich monitoring dashboards and to allow visual and numerical analysis of events that have a business impact

Introduction

Graphite can show events such as code deploys and puppet changes as vertical markers on your graph. With the advent of new graphite dashboards and interfaces where we can have popups and annotations to show metadata for each event (by means of client-side rendering), it's time we have a database to track all events along with categorisation and text descriptions (which can include rich text and hyperlinks). Graphite is meant for time series (metrics over time), Anthracite aims to be the companion for annotated events.
More precisely, Anthracite aims to be a database of "relevant events" (see further down), for the purpose of enriching monitoring dashboards, as well as allowing visual and numerical analysis of events that have a business impact (for the latter, see "Thoughts on incident nomenclature, severity levels and incident analysis" below).
It has a TCP receiver, a database (sqlite3), an HTTP interface to deliver event data in many formats, and a simple web frontend for humans.


read more

Metrics 2.0: a proposal

  • Graphite's metrics are strings comprised of dot-separated nodes which, due to their ordering, can be represented as a tree. Many other places use a similar format (stats in /proc etc).
  • OpenTSDB's metrics are shorter, because they move some of the dimensions (server, etc) into key-value tags.
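
The two formats in the list above can be compared with a short sketch. The metric name "servers.web1.cpu.idle", the tag key "server" and the helper names are made up for illustration; this shows the general idea of a dotted tree versus key-value tags, not the metrics 2.0 proposal or OpenTSDB's actual API.

```python
def build_tree(metrics):
    """Arrange dot-separated metric names into a nested dict (the tree view)."""
    tree = {}
    for metric in metrics:
        node = tree
        for part in metric.split("."):
            node = node.setdefault(part, {})
    return tree


def move_to_tags(metric, tag_positions):
    """Move selected nodes out of a dotted name into key-value tags.

    `tag_positions` maps a node index to the tag key it should become,
    e.g. {1: "server"}. The remaining nodes form the shorter metric name.
    """
    name_nodes, tags = [], {}
    for index, node in enumerate(metric.split(".")):
        if index in tag_positions:
            tags[tag_positions[index]] = node
        else:
            name_nodes.append(node)
    return ".".join(name_nodes), tags


if __name__ == "__main__":
    print(build_tree(["servers.web1.cpu.idle", "servers.web2.cpu.idle"]))
    print(move_to_tags("servers.web1.cpu.idle", {1: "server"}))
```

Note that in the dotted form the meaning of each node depends purely on its position, which is why the caller has to say which position holds the server name.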
I think we can do better...
I think our metrics format is restrictive and we do ourselves a disservice by using it:
read more

Histograms in statsd, and graphing them over time with graphite.

I submitted a pull request to statsd which adds histogram support.
(Figure: example histogram, from Wikipedia)
(refresher: a histogram is [a visualization of] a frequency distribution of data, summarizing your data by keeping frequencies for entire classes (ranges of data). Source: histograms - Wikipedia)
It's commonly documented how to plot a single histogram, that is, a 2D diagram consisting of rectangles whose

  • area is proportional to the frequency of a variable
  • width is equal to the class interval
Class intervals go on the x-axis, frequencies on the y-axis.

Note: histogram class intervals are supposed to have the same width.
My implementation allows arbitrary class intervals with potentially different widths, as well as an upper boundary of infinity.
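
Counting values into classes of unequal width, with an open-ended last class, can be sketched like this. It is an illustration in Python, not the code from the statsd pull request; the function name and the choice that each class includes its lower bound and excludes its upper bound are assumptions.

```python
import bisect
import math


def histogram(values, upper_bounds):
    """Count how many values fall into each class.

    Each class is identified by its upper bound; a value lands in the class
    with the smallest upper bound strictly greater than the value. The first
    class has no lower bound. Use math.inf as the last bound to catch everything.
    """
    bounds = sorted(upper_bounds)
    counts = [0] * len(bounds)
    for value in values:
        index = bisect.bisect_right(bounds, value)
        if index == len(bounds):
            raise ValueError(f"{value} is not below any upper bound")
        counts[index] += 1
    return dict(zip(bounds, counts))


if __name__ == "__main__":
    timings_ms = [5, 10, 12, 99, 1000]
    for bound, count in histogram(timings_ms, [10, 100, math.inf]).items():
        print(f"< {bound}: {count}")
```

Because the classes can have different widths, plotting these counts as bars of equal width is no longer a true histogram in the area-proportional sense; the counts per class are still useful to graph over time.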

Plotting histograms.. over time


read more

Hi Planet Devops and Infratalk

This blog just got added to planet devops and infra-talk, so for my new readers: you might know me as Dieterbe on IRC, GitHub or Twitter. Since my move from Belgium to NYC (to do backend stuff at Vimeo) I've started writing more about devops-y topics (whereas I used to write more about general hacking, Arch Linux release engineering and (automated) installations). I'll mention some earlier posts you might be interested in:
read more

Dell crowbar openstack swift

Learned about Dell Crowbar the other day. It seems to be (becoming) a tool I've wanted for quite a while, because it takes automating physical infrastructure to a new level, and is also convenient on virtual.
read more