Wed, 02 May 2012

Dell crowbar openstack swift

Learned about Dell Crowbar the other day. It seems to be (becoming) a tool I've wanted for quite a while, because it takes automating physical infrastructure to a new level, and is also convenient on virtual.

::Read more

Mon, 12 Nov 2012

Anthracite, an event database to enrich monitoring dashboards and to allow visual and numerical analysis of events that have a business impact


Graphite can show events such as code deploys and puppet changes as vertical markers on your graph. With the advent of new graphite dashboards and interfaces where we can have popups and annotations to show metadata for each event (by means of client-side rendering), it's time we have a database to track all events along with categorisation and text descriptions (which can include rich text and hyperlinks). Graphite is meant for time series (metrics over time), Anthracite aims to be the companion for annotated events.
More precisely, Anthracite aims to be a database of "relevant events" (see further down), for the purpose of enriching monitoring dashboards, as well as allowing visual and numerical analysis of events that have a business impact (for the latter, see "Thoughts on incident nomenclature, severity levels and incident analysis" below)
It has a TCP receiver, a database (sqlite3), a http interface to deliver event data in many formats and a simple web frontend for humans.

design goals:

  • do one thing and do it well. aim for integration.
  • take inspiration from graphite:
    • simple TCP protocol
    • automatically create new event types as they are used
    • run on port 2005 by default (carbon is 2003,2004)
    • deliver events in various formats (html, raw, json, sqlite,...)
    • stay out of the way
  • super easy to install and run: install dependencies, clone repo. the app is ready to run
I have a working prototype on

About "relevant events"

I recommend you submit any event that has or might have a relevant effect on:

  • your application behavior
  • monitoring itself (for example you fixed a bug in metrics reporting. it shouldn't look like the app behavior changed)
  • the business (press coverage, viral videos, etc), because this also affects your app usage and metrics.

Formats and conventions

The TCP receiver listens for lines in this format:

<unix_timestamp> <type> <description>

There are no restrictions for type and description, other than that they must be non-empty strings.
I do have some suggestions which I'll demonstrate through fictive examples;
but note that there's room for improvement, see the section below)

# a deploy_* type for each project
ts "deploy e8e5e4 initiated by Nicolas --"
ts puppet "all nodes of class web_cluster_1: modified apache.conf; restart service apache"
ts incident_sev2_start "mysql2 crashed, site degraded"
ts incident_sev2_resolved "replaced db server"
ts incident "hurricane Sandy, systems unaffected but power outages among users, expect lower site usage"
# in those exceptional cases of manual production changes, try to not forget adding your event
ts manual_dieter "i have to try this firewall thing on the LB"
ts backup "backup from database slave vimeomysql22"

Thoughts on incident nomenclature, severity levels and incident analysis

Because there are so many unique and often subtle pieces of information pertaining to each individual incident, it's often hard to map an incident to a simple severity level or keyword. When displaying events as popups on graphs I think no severity levels are needed, the graphs and event descriptions are much more clear than any severity level could convey.
However, I do think these levels are very useful for reporting and numerical analysis.
On slide 53 of the metametrics slidedeck Allspaw mentions severity levels, which can be paraphrased in terms of service degradation for the end user: 1 (full), 2 (major), 3 (minor), 4 (no).
I would like to extend this notion into the opposite spectrum, and have similar levels on the positivie scale, so that they represent positive incidents (like viral videos, press mentions, ...) as opposed to problematic ones (outages).
For incident analysis we need a rich nomenclature and schema: incidents can (presumably) have a positive or negative impact, can be self-induced or not, and can be categorized with severity levels; they can also be planned for (maintenance, release announcements) or not. While we're at it, how about events to mark point in time where the cause was detected, as well as resolved, so we can calculate TTD and TTR (see Allspaw slidedeck)?
Since basically any event can have a positive or negative impact, an option is to leave out the type 'incident' and give a severity field for every event type.
I'm thinking of a good nomenclature and a schema to express all this. (btw, notice how in common ops literature the word incident is usually associated with outages and bad things; while actually an incident can just as well be a positive event) as well as UI features to support this analysis.

I need your help

The tcp receiver works, the backend works, i have a crude (but functional) web app, and a simple http api to retrieve the events in all kinds of formats. Next up are:
  • monitoring dashboard for graphite that gathers events from anthracite, can show metadata, and can mark a timeframe between start and stop events
  • plugings for puppet, chef to automatically submit their relevant events
  • a better web UI and actually provide features to do statistics on events and analysis such as TTD, TTR, with colors for severity levels etc

Wed, 23 Apr 2014

Metrics 2.0 now has its own website!

Metrics 2.0 started as a half-formal proposal and an implementation via graph-explorer, but is broad enough in scope that it deserves its own website, its own spec, its own community. That's why I launched and a discussion group.

from the website:

We have pretty good storage of timeseries data, collection agents, and dashboards. But the idea of giving timeseries a "name" or a "key" is profoundly limiting us. Especially when they're not standardized and missing information.
Metrics 2.0 aims for self-describing, standardized metrics using orthogonal tags for every dimension. "metrics" being the pieces of information that point to, and describe timeseries of data.

By adopting metrics 2.0 you can:
  • increase compatibility between tools
  • get immediate understanding of metrics
  • build graphs, plots, dashboards and alerting expressions with minimal hassle

Mon, 21 Jan 2013

Profiling and behavior testing of processes and daemons, and Devopsdays NYC

Profiling a process run

I wanted the ability to run a given process and get
a plot of key metrics (cpu usage, memory usage, disk i/o) throughout the duration of the process run.
Something light-weight with minimal dependencies so I can easily install it on a server for a one-time need.
Couldn't find a tool for it, so I wrote profile-process
which does exactly that in <100 lines of python.

black-box behavior testing processes/daemons

I wrote simple-black-box to do this.
It runs the subject(s) in a crafted sandbox, sends input (http requests, commands, ...)
and allows to make assertions on http/statsd requests/responses, network listening state, processes running, log entries,
file existence/checksums in the VFS/swift clusters, etc.
Each test-case is a scenario.
It also can use logstash to give a centralized "distributed stack trace" when you need to debug a failure after multiple processes interacting and acting upon received messages; or to compare behavior across different scenario runs.
You can integrate this with profile-process to compare runtime behaviors across testcases/scenarios.

Simple-black-box talk @ Devopsdays NYC

I did a quick 5min talk, despite some display/timing issues it was well received. (in particular I got some really positive feedback from one person and still wonder if that was a recruiter attempting to hire me -but being shy about it...- it was quite awkward)
raw uncut video. Go to 'New York, January 18th, 2013' from 02:36:25 to 02:41:15

More random thoughts about Devopsdays NYC

  • I'm getting tired of people on stage making a big revelation out of adding an index to a database column. This happens too often at web-ops/devops conferences, it's embarrassing. But at least it's not like the "how we made our site 1000x faster"-style Velocity talks that should have been named "caching and query optimization for newbies"
  • Paperless post confirms again they got their act together and keeps us up to date with their great work. Follow them.
  • Knights of the Provisioning Round Table - Bare Metal Provisioning was mostly (to my surprise) 4 individuals presenting their solution instead of a real round-table, but (to my surprise again) they were not as similar/repetitive as I expected and the pros/cons of all solutions were compared more in depth than I dared to hope. I covered dell crowbar before and like it, though I wonder when this thing is actually gonna be reliable.
  • Dave Zwieback and John Willis gave hilarious talks
  • Tried to start an open space discussion around collaboration patterns and anti-patterns, which I think is a very interesting subject, because how individuals in a team collaborate is crucial to success, but yet very little is written about it (that I could find). I would hope we can distill the years of aggregate experience of people into concise patterns and anti-patterns and document how (not) well they work for development styles (such as agile/scrum), team size, company structure/hierarchy, roadmap/technical debt pressure, etc. And especially in light of any of these things changing, because I've found people can be very change-resistive.
  • DevOps At Obama for America & The Democratic National Committee was good, I thought it would be a rehash of what was said at Coding Forward New York City: Meet the Developers Behind the Obama Campaign but there were a bunch of interesting insights about state of the art technology in here (mostly Amazon stuff)
  • A bunch of talks where the same could have been said in half the time, or less
Random thoughts about some sponsors:
  • Librato is quite cool. It's basically how my open source tool graph-explorer would look like after finishing a bunch of TODO's, combining it with graphite, polishing it all up, and offering it as a hosted solution. I find it interesting if this is a successful business with only such a limited scope
  • Even cooler is datadog. It goes beyond just metrics and doesn't just provide hosted graphing, it provides a solution for a philosophy that aims for a centralized insight of all your operational data, related collaboration and prioritized alerts that are to the point. They get a lot of things right, the open source world has some catching up to do
Interesting that both use Cassandra and free-form tags for flexibility, validating the approach I'm taking with graph-explorer. Now Graphite could use a distributed metrics storage backend over which one can do map-reduce style jobs to gather intelligence from metrics archives (maybe based on Cassandra too?), but that's another story.

Anyway, living in NYC with its vibrant ecosystem of devops people and companies organizing plenty of meet-ups and talks on their own makes it less pressing to have an event like Devopsdays, though it was certainly a good event, thanks to the sponsors and the volunteers.

Sun, 24 Mar 2013

Hi Planet Devops and Infratalk

This blog just got added to planet devops and infra-talk, so for my new readers: you might know me as Dieterbe on irc, github or twitter. Since my move from Belgium to NYC (to do backend stuff at Vimeo) I've started writing more about devops-y topics (whereas I used to write more about general hacking and arch linux release engineering and (automated) installations). I'll mention some earlier posts you might be interested in: FWIW, I'm attending Monitorama next weekend in Boston.

Wed, 14 Nov 2012

Client-side rendered graphite charts for all

Client-side rendering of charts as opposed to using graphite's server side generated png's allows various interactivity features, such as:

  • interactive realtime zooming and panning of the graph, timeline sliders
  • realtime switching between various rendering modes (lines, stacked, etc)
  • toggling certain targets on/off, reordering them, highlighting their plot when hoovering over the legend, etc
  • basic data manipulation, such as smoothing to see how averages compare (akin to movingAverage, but now interactive)
  • popups detailing the exact metrics when hoovering over the chart's datapoints
  • popups for annotated events. (a good use for anthracite).

Those are all features of charting libraries such as flot and rickshaw, the only remaining work is creating a library that unleashes the power of such a framework, integrates it with the graphite api datasource, and makes it available over a simple but powerful api.

That's what I'm trying to achieve with
It's based on rickshaw.
It gives you a JavaScript api to which you specify your graphite targets and some options and it'll give you your graph with extra interactivity sauce. Note that graphite has a good and rich api, one that is widely known and understood, that's why I decided to keep it exposed and not abstract any more than needed.

There are many graphite dashboards with a different focus, but as far as plotting graphs, what they need is usually very similar: a plot to draw multiple graphite targets, a legend, an x-axis, 1 or 2 y-axis, lines or stacked bands, so I think there's a lot of value in having many dashboard projects share the same code for the actual charts, and I hope we can work together on this.

Thu, 29 May 2014

Monitorama PDX & my metrics 2.0 presentation

Earlier this month we had another iteration of the Monitorama conference, this time in Portland, Oregon.

(photo by obfuscurity)

I think the conference was better than the first one in Boston, much more to learn. Also this one was quite focused on telemetry (timeseries metrics processing), lots of talks on timeseries analytics, not so much about things like sensu or nagios. Adrian Cockroft's keynote brought some interesting ideas to the table, like building a feedback loop into the telemetry to drive infrastructure changes (something we do at Vimeo, I briefly give an example in the intro of my talk) or shortening the time from fault to alert (which I'm excited to start working on soon)
My other favorite was Noah Kantrowitz's talk about applying audio DSP techniques to timeseries, I always loved audio processing and production. Combining these two interests hadn't occurred to me so now I'm very excited about the applications.
The opposite, but just as interesting of an idea - conveying information about system state as an audio stream - came up in puppetlab's monitorama recap and that seems to make a lot of sense as well. There's a lot of information in a stream of sound, it is much denser than text, icons and perhaps even graph plots. Listening to an audio stream that's crafted to represent various information might be a better way to get insights into your system.

I'm happy to see the idea reinforced that telemetry is a key part of modern monitoring. For me personally, telemetry (the tech and the process) is the most fascinating part of modern technical operations, and I'm glad to be part of the movement pushing this forward. There's also a bunch of startups in the space (many stealthy ones), validating the market. I'm curious to see how this will play out.

I had the privilege to present metrics 2.0 and Graph-Explorer. As usual the slides are on slideshare and the footage on the better video sharing platform ;-) .
I'm happy with all the positive feedback, although I'm not aware yet of other tools and applications adopting metrics 2.0, and I'm looking forward to see some more of that, because ultimately that's what will show if my ideas are any good.

Wed, 09 Jan 2013

Graph-Explorer: A graphite dashboard unlike any other

The above sounds like a marketing phrase and I'm just as skeptical of them as you, but I feel it's in place. Not because GE is necessarily better, but it's certainly different.

In web operations, mostly when troubleshooting but also for capacity planning, I often find myself having very specific information needs from my time-series, and these information needs vary a lot over time. This usually means I need to correlate or compare things that no one anticipated. Things that relate to specific machines, specific services across machines, or a few specific metrics of which only the ops team knows how they are related and cross various scopes (application, network, system, etc).
I should have an easy way to filter metrics by any information in the metric's name or values.
I should be able to group metrics into graphs the way I want. (example: when viewing filesystem usage of servers, I should be able to group by server (one graph per server listing the filesystems, but also by mountpoint to compare servers on one graph).
I should be able -with minimal effort- to view metrics by their gauge/count value, but also by their rate of change and where appropriate, as a percentage of a maximum (like diskspace used).
It should be trivial to manipulate the graph interactively (toggling things on/off, switching between lines/stacked mode, inspecting datapoints, zooming, ...).
It should show me all events, colorcoded by type, with text description, and interactive so that it can use hyperlinks.
And most of all, the code should be as simple as possible and it should be easy to get running.

Dashboards which show specific predefined KPI's (this covers most graphite dashboards) are clearly unsuitable for this use case. Template-based "metric exploration" dashboards like cacti and ganglia are in my experience way too limited. Graph composing dashboards (like the stock graphite one, or graphiti) require much manual work to get the graph you want. I couldn't find anything even close to what I wanted, so I started Graph-Explorer.

The approach I'm taking is using plugins which add metadata to metrics (tags for server, service, mountpoint, interface name, ...), having them define how to render as a count, as a rate, as a percent of some max allowed value (or a metric containing the max), and providing a query language which allow you to match/filter metrics, group them into graphs by tag, and render them how you want them. The plugins promote standardized metric naming and reuse across organisations, not in the least because most correspond to plugins for the Diamond monitoring agent.

Furthermore, because it uses my graphitejs plugin (which now btw supports flot as a backend for fast canvas-based graphs and annotated events from anthracite) the manual interactions mentioned earlier are supported or at least on the roadmap.

Graph Explorer is not yet where I want it, but it's already a very useful tool at Vimeo.

Fri, 03 Sep 2010

What the open source community can learn from Devops

Being active as both a developer and ops person in the professional life, and both an open source developer and packager in my spare time, I noticed some common ground between both worlds, and I think the open source community can learn from the Devops movement which is solving problems in the professional tech world.

For the sake of getting a point across, I'll simplify some things.

First, a crash course on Devops...

::Read more

Sun, 23 Feb 2014

Introduction talk to metrics 2.0 and Graph-Explorer

This week I had the opportunity to present metrics 2.0 and Graph-Explorer at the Full-stack engineering meetup.
I could easily talk for hours about this stuff but the talk had to be about 20 minutes so I paraphrased it to be only about the basic concepts and ideas and some practical use cases and features. I think it serves as a pretty good introduction and a showcase of the most commonly used features (graph composition, aggregations and unit conversion), and some new stuff such as alerting and dashboards.

The talk also briefly covers native metrics 2.0 through your metrics pipeline using statsdaemon and carbon-tagger. I'm psyched that by formatting metrics at the source a little better and having an aggregation daemon that expresses the performed operations by updating the metric tags, all the foundations are in place for some truly next-gen UI's and applications (one of them already being implemented: graph-explorer can pretty much generate all graphs I need by just phrasing an information need as a proper query)

The video and slides are available and also embedded below.

I would love to do another talk that allows me to dive into more of the underlying ideas, the benefits of metrics2.0 for things like metric storage systems, graph renderers, anomaly detection, dashboards, etc.

Hope you like it!

Sat, 07 Sep 2013

Graphite-ng: A next-gen graphite server in Go.

I've been a graphite contributor for a while (and still am). It's a great tool for timeseries metrics. Two weeks ago I started working on Graphite-ng: it's somewhere between an early clone/rewrite, a redesign, and an experiment playground, written in Golang. The focus of my work so far is the API web server, which is a functioning prototype, it answers requests like


I.e. it lets you retrieve your timeseries, processed by function pipelines which are setup on the fly based on a spec in your http/rest arguments. Currently it only fetches metrics from text files but I'm working on decent metrics storage as well.

There's a few reasons why I decided to start a new project from scratch:

  • With graphite, it's a whole ordeal to get all components properly installed. Deploying in production environments is annoying and even more so when you just want a graphite setup on your personal netbook.
  • the graphite development process slows contributors down a lot. A group of 3rd/4th generation maintainers jointly manages the project, but it's hard to get big changes through, because they (understandably) don't feel authoritative enough to judge those changes and the predecessors have disappeared or are too busy with other things. And also ...
  • there's a high focus on backwards compatibility, which can be a good thing, but it's hard to get rid of old design mistakes, especially when fixing unclear (but arguably broken) early design decisions (or oversights) lead to different outputs
  • Graphite suffers feature creep: it has an events system, a PNG renderer, an elaborate composer web UI, etc. There's a lot of internal code dependencies holding you back from focusing on a specific problem
  • Carbon (the metrics daemon) has a pretty hard performance and scalability ceiling. Peter's article explains this well; I think we'll need some form of rewrite. Peter suggests some solutions but they are basically workarounds for Python's shortcomings. I'm also thinking of using pypy. But last time I checked pypy just wasn't there yet.
  • I want to become a good Go programmer
Note: the Graphite project is still great, the people managing do good work, but it's fairly natural for a code base that large and complicated to end up in this situation.I'm not at all claiming graphite-ng is, or ever will be better but I need a fresh start to try some disruptive ideas, using Go means having a runtime very suited for concurrency and parallelism, you can compile the whole thing down into a single executable file, and its performance looks promising. Leaving out the non-essentials (see below) allows for an elegant and surprisingly small, hackable code base.

The API server I developed sets up a processing pipeline as directed by your query: every processing function runs in a goroutine for concurrency and the metrics flow through using Go channels. It literally compiles a program and executes it. You can add your own functions to collect, process, and return metrics by writing simple plugins.
As for timeseries storage, for now it uses simple text files, but I'm experimenting and thinking what would be the best metric store(s) that works on small scale (personal netbook install) to large scale ("I have millions of metrics that need to be distributed across nodes, the system should be HA and self-healing in failure scenarios, easily maintainable, and highly performant") and is still easy to deploy, configure and run. Candidates are whisper-go, kairosdb, my own elasticsearch experiment etc.
I won't implement rendering images, because I think client-side rendering using something like timeserieswidget is superior. I can also leave out events because anthracite already does that. There's a ton of dashboards out there (graph-explorer, descartes, etc) so that can be left out as well.

For more information, see the Graphite-ng homepage.

PS: props to Felix Geisendörfer who suggested a graphite clone in Go first, it seemed like a huge undertaking but the right thing to do, I had some time so I went for it!

Sat, 14 Sep 2013

Metrics 2.0: a proposal

  • Graphite's metrics are strings comprised of dot-separated nodes which, due to their ordering, can be represented as a tree. Many other places use a similar format (stats in /proc etc).
  • OpenTSDB's metrics are shorter, because they move some of the dimensions (server, etc) into key-value tags.
I think we can do better...
I think our metrics format is restrictive and we do our self a disservice using it:
  • the metrics are not fully self-describing (they need additional information to become useful for interpretation, e.g. a unit, and information about what they mean)
  • they impose needless limitations (no dots allowed, strict ordering)
  • aggregators have no way to systematically and consistently express the operation they performed
  • we can't include additional information because that would create a new metric ID
  • we run into scale/display/correctness problems because storage rollups, rendering consolidation, as well as aggregation api functions need to be aware of how the metric is to be aggregated, and this information is often lacking or incorrect.
  • there's no consistency into what information goes where or how it's called (which field is the one that has the hostname?)
We can solve all of this. In an elegant way, even!

Over the past year, I've been working a lot on a graphite dashboard (Graph-Explorer). It's fairly unique in the sense that it leverages what I call "structured metrics" to facilitate a powerful way to build graphs dynamically. (see also "a few common graphite problems and how they are already solved" and the Graph-Explorer homepage). I implemented two solutions. ( structured_metrics, carbon-tagger) for creating, transporting and using these metrics on which dashboards like Graph-Explorer can rely. The former converts graphite metrics into a set of key-value pairs (this is "after the fact" so a little annoying), carbon-tagger uses a prototype extended carbon (graphite) protocol (called "proto2") to maintain an elasticsearch tags database on the fly, but the format proved too restrictive.

Now for Graphite-NG I wanted to rethink the metric, taking the simplicity of Graphite, the ideas of structured_metrics and carbon_tagger (but less annoying) and OpenTSDB (but more powerful), and propose a new way to identify and use metrics and organize tags around them.

The proposed ingestion (carbon etc) protocol looks like this:

<intrinsic_tags>  <extrinsic_tags> <value> <timestamp>
  • intrinsic_tags and extrinsic_tags are space-separated strings, those strings can be a regular value, or a key=value pair and they describe one dimension (aspect) of the metric.
    • an intrinsic tag contributes to the identity of the metric. If this section changes, we get a new metric
    • an extrinsic tag provides additional information about the metric. changes in this set doesn't change the metric identity
    Internally, the metric ID is nothing more than the string of intrinsic tags you provided (similar to Graphite style). When defining a new metric, write down the intrinsic tags/nodes (like you would do with Graphite except you can use any order), and then keep 'em in that order (to keep the same ID). The ordering does not affect your ability to work with the metrics in any way. The backend indexes the tags and you'll usually rely on those to work with the metrics, and rarely with the ID itself.
  • A metric in its basic form (without extrinsic tags) would look like:
    graphite: stats.gauges.db15.mysql.bytes_received
    opentsdb: mysql.bytes_received host=db15
    proposal: service=mysql server=db15 direction=in unit=B
  • key=val tags are most useful: the key is a name of the aspect, which allows you to express "I want to see my metrics averaged/summed/.. by this aspect, in different graphs based on another aspect, etc" (e.g. GEQL statements), and because you can use them for filtering, so I highly recommend to take some time to come up with a good key for every tag. However sometimes it can be hard to come up with a concise, but descriptive key for a tag, hence they are not mandatory. For regular words without a key, the backend will assign dummy keys ('n1', 'n2', etc) to facilitate those features without hassle. Note the two spaces between intrinsic and extrinsic. With extrinsic tags the example could look like:
    service=mysql server=db15 direction=in unit=B  src=diamond processed_by_statsd env=prod
  • Tags can contain any character except whitespace and null. Specifically: dots (great for metrics that contain an ip, histogram bin, a percentile, etc) and slashes (unit can be 'B/s' too)
  • the unit tag is mandatory. It allows dashboards to show the proper label on the Y-axis, and to do conversions (for example in Graph Explorer, if your metric is an amount of B used on a disk drive, and you request the increase in GB per day. it will automatically convert (and derive) the data). We should aim for standardization of units. I maintain a table of standardized units & prefixes which uses SI and IEC as starting point, and extends it with units commonly used in IT.

Further thoughts/comparisons


  • The concept of extrinsic tags is something I haven't seen before, but I think it makes a lot of sense because we often want to pass extra information but we couldn't because it would create a new metric. It also makes the metrics more self-describing.
  • Sometimes it makes sense for tags only to apply to a metric in a certain time frame. For intrinsic tags this is already the case by design, for extrinsic tags the database could maintain a from/to timestamp based on when it saw the tag for the given metric
  • metric finding: besides the obvious left-to-right auto-complete and pattern matching (which allows searching for substrings in any order), we can now also build an interface that uses facet searches to suggest/auto-complete tags, and filter by toggling on/off tags.
  • Daemons that sit in the pipeline and yield aggregated/computed metrics can do this in a much more useful way. For example a statsd daemon that computes a rate per second for metrics with 'unit=B' can yield a metric with 'unit=B/s'.
  • We can standardize tag keys and values, other than just the unit tag. Beyond the obvious compatibility benefits between tools, imagine querying for:
    • 'unit=(Err|Warn)' and getting all errors and warnings across the entire infrastructure (no matter which tool generated the metric), and grouping/drilling down by tags
    • '$hostname direction=in' and seeing everything coming in to the server, network traffic on the NIC, as well as files being uploaded.
    Also, metrics that are summary statistics (i.e. with statsd) will get intrinsic tags like 'stat=median' or 'stat=upper_90'. This has three fantastic consequences:
    • aggregation (rollups from higher to lower resolution) knows what to do without configurating aggregation functions, because it can be deduced from the metric itself
    • renderers that have to render >1 datapoints per pixel, will produce more accurate, relevant graphs because they can deduce what the metric is meant to represent
    • API functions such as "cumulative", "summarize" and "smartSummarize" don't need to be configured with an explicit aggregation function

From the Graphite perspective, specifically

  • dots and slashes are allowed
  • The tree model is slow to navigate, sometimes hard to find your metrics, and makes it really hard to write queries (target statements) that need metrics in different locations of the tree (because the only way to retrieve multiple metrics in a target is wildcards)
  • The tree model causes people to obsess over node ordering to find the optimal tree organization, but no ordering allows all query use cases anyway, so you'll be limited no matter how much time you spend organising metrics.
  • We do away with the tree entirely. A multi-dimensional tag database is way more powerfuland allows for great "metric finding" (see above)
  • Graphite has no tags support
  • you don't need to configure aggregation functions anymore, less room for errors ("help, my scale is wrong when i zoom out"), better rendering
  • when using statsd, you don't need prefixes like "stats.". In fact that whole prefix/postfix/namespacing thing becomes moot

From the OpenTSDB perspective, specifically

  • allow dots anywhere
  • 'http.hits' becomes 'http unit=Req' (or 'unit=Req http', as long as you pick one and stick with it)
  • probably more, I'm not very familiar with it

From the structured_metrics/carbon-tagger perspective, specifically

  • not every tag requires a key (on metric input), but you still get the same benefits
  • You're not forced to order the tags in any way
  • sometimes relying on the 'plugin' tag is convenient but it's not really intrinsic to the metric, now we can use it as extrinsic tag

Backwards compatibility

  • Sometimes you merely want the ability to copy a "metric id" from the app source code, and paste it in a "/render/?target=" url to see that particular metric. You can still do this: copy the intrinsic tags string and you have the id.
  • if you wanted to, you could easily stay compatible with the official graphite protocol: for incoming metrics, add a 'unit=Unknown' tag and optionally turn the dots into spaces (so that every node becomes a tag), so you can mix old-style and new style metrics in the same system.

Sun, 18 May 2014

On Graphite, Whisper and InfluxDB

Graphite, and the storage Achilles heel

Graphite is a neat timeseries metrics storage system that comes with a powerful querying api, mainly due to the whole bunch of available processing functions.
For medium to large setups, the storage aspect quickly becomes a pain point. Whisper, the default graphite storage format, is a simple storage format, using one file per metric (timeseries).
  • It can't keep all file descriptors in memory so there's a lot of overhead in constantly opening, seeking, and closing files, especially since usually one write comes in for all metrics at the same time.
  • Using the rollups feature (different data resolutions based on age) causes a lot of extra IO.
  • The format is also simply not optimized for writes. Carbon, the storage agent that sits in front of whisper has a feature to batch up writes to files to make them more sequential but this doesn't seem to help much.
  • Worse, due to various implementation details the carbon agent is surprisingly inefficient and cpu-bound. People often run into cpu limitations before they hit the io bottleneck. Once the writeback queue hits a certain size, carbon will blow up.
Common recommendations are to run multiple carbon agents and running graphite on SSD drives.
If you want to scale out across multiple systems, you can get carbon to shard metrics across multiple nodes, but the complexity can get out of hand and manually maintaining a cluster where nodes get added, fail, get phased out, need recovery, etc involves a lot of manual labor even though carbonate makes this easier. This is a path I simply don't want to go down.

These might be reasonable solutions based on the circumstances (often based on short-term local gains), but I believe as a community we should solve the problem at its root, so that everyone can reap the long term benefits.

In particular, running Ceres instead of whisper, is only a slight improvement, that suffers from most of the same problems. I don't see any good reason to keep working on Ceres, other than perhaps that it's a fun exercise. This probably explains the slow pace of development.
However, many mistakenly believe Ceres is "the future".
Switching to LevelDB seems much more sensible but IMHO still doesn't cut it as a general purpose, scalable solution.

The ideal backend

I believe we can build a backend for graphite that
  • can easily scale from a few metrics on my laptop in power-save mode to millions of metrics on a highly loaded cluster
  • supports nodes joining and leaving at runtime and automatically balancing the load across them
  • assures high availability and heals itself in case of disk or node failures
  • is simple to deploy. think: just run an executable that knows which directories it can use for storage, elasticsearch-style automatic clustering, etc.
  • has the right read/write optimizations. I've never seen a graphite system that is not write-focused, so something like LSM trees seems to make a lot of sense.
  • can leverage cpu resources (e.g. for compression)
  • provides a more natural model for phasing out data. Optional, runtime-changeable rollups. And an age limit (possibly, but not necessarily round robin)
While we're at it. pub-sub for realtime analytics would be nice too. Especially when it allows to use the same functions as the query api.
And getting rid of the metric name restrictions such as inability to use dots or slashes.


There's a lot of databases that you could hook up to graphite. riak, hdfs based (opentsdb), Cassandra based (kairosdb, blueflood, cyanite), etc. Some of these are solid and production ready, and would make sense depending on what you already have and have experience with. I'm personally very interested in playing with Riak, but decided to choose InfluxDB as my first victim.

InfluxDB is a young project that will need time to build maturity, but is on track to meet all my goals very well. In particular, installing it is a breeze (no dependencies), it's specifically built for timeseries (not based on a general purpose database), which allows them to do a bunch of simplifications and optimizations, is write-optimized, and should meet my goals for scalability, performance, and availability well. And they're in NYC so meeting up for lunch has proven to be pretty fruitful for both parties. I'm pretty confident that these guys can pull off something big.

Technically, InfluxDB is a "timeseries, metrics, and analytics" databases with use cases well beyond graphite and even technical operations. Like the alternative databases, graphite-like behaviors such as rollups management and automatically picking the series in the most appropriate resolutions, is something to be implemented on top of it. Although you never know, it might end up being natively supported.

Graphite + InfluxDB

InfluxDB developers plan to implement a whole bunch of processing functions (akin to graphite, except they can do locality optimizations) and add a dashboard that talks to InfluxDB natively (or use Grafana), which means at some point you could completely swap graphite for InfluxDB. However, I think for quite a while, the ability to use the Graphite api, combine backends, and use various graphite dashboards is still very useful. So here's how my setup currently works:
  • carbon-relay-ng is a carbon relay in Go. It's a pretty nifty program to partition and manage carbon metrics streams. I use it in front of our traditional graphite system, and have it stream - in realtime - a copy of a subset of our metrics into InfluxDB. This way I basically have our unaltered Graphite system, and in parallel to it, InfluxDB, containing a subset of the same data.
    With a bit more work it will be a high performance alternative to the python carbon relay, allowing you to manage your streams on the fly. It doesn't support consistent hashing, because CH should be part of a strategy of a highly available storage system (see requirements above), using CH in the relay still results in a poor storage system, so there's no need for it.
  • I contributed the code to InfluxDB to make it listen on the carbon protocol. So basically, for the purpose of ingestion, InfluxDB can look and act just like a graphite server. Anything that can write to graphite, can now write to InfluxDB. (assuming the plain-text protocol, it doesn't support the pickle protocol, which I think is a thing to avoid anyway because almost nothing supports it and you can't debug what's going on)
  • graphite-api is a fork/clone of graphite-web, stripped of needless dependencies, stripped of the composer. It's conceived for many of the same reasons behind graphite-ng (graphite technical debt, slow development pace, etc) though it doesn't go to such extreme lengths and for now focuses on being a robust alternative for the graphite server, api-compatible, trivial to install and with a faster pace of development.
  • That's where graphite-influxdb comes in. It hooks InfluxDB into graphite-api, so that you can query the graphite api, but using data in InfluxDB. It should also work with the regular graphite, though I've never tried. (I have no incentive to bother with that, because I don't use the composer. And I think it makes more sense to move the composer into a separate project anyway).
With all these parts in place, I can run our dashboards next to each other - one running on graphite with whisper, one on graphite-api with InfluxDB - and simply look whether the returned data matches up, and which dashboards loads graphs faster. Later i might do more extensive benchmarking and acceptance testing.

If all goes well, I can make carbon-relay-ng fully mirror all data, make graphite-api/InfluxDB the primary, and turn our old graphite box into a live "backup". We'll need to come up with something for rollups and deletions of old data (although it looks like by itself influx is already more storage efficient than whisper too), and I'm really looking forward to the InfluxDB team building out the function api, having the same function api available for historical querying as well as realtime pub-sub. (my goal used to be implementing this in graphite-ng and/or carbon-relay-ng, but if they do this well, I might just abandon graphite-ng)

To be continued..

Thu, 04 Apr 2013

A few common graphite problems and how they are already solved.

metrics often seem to lack details, such as units and metric types

looking at a metric name, it's often hard to know
  • the unit a metric is measured in (bits, queries per second, jiffies, etc)
  • the "type" (a rate, an ever increasing counter, gauge, etc)
  • the scale/prefix (absolute, relative, percentage, mega, milli, etc)
structured_metrics solves this by adding these tags to graphite metrics:
  • what
    what is being measured?: bytes, queries, timeouts, jobs, etc
  • target_type
    must be one of the existing clearly defined target_types (count, rate, counter, gauge)
    These match statsd metric types (i.e. rate is per second, count is per flushInterval)
In Graph-Explorer these tags are mandatory, so that it can show the unit along with the prefix (i.e. 'Gb/s') on the axis.
This will also allow you to request graphs in a different unit and the dashboard will know how to convert (say, Mbps to GB/day)

tree navigation/querying is cumbersome, metrics search is broken. How do I organize the tree anyway?

the tree is a simplistic model. There is simply too much dimensionality that can't be expressed in a flat tree. There's no way you can organize it so that will it satisfy all later needs. A tag space like structured_metrics makes it obsolete. with Graph-Explorer you can do (full-text) search on metric name, by any of their tags, and/or by added metadata. So practically you can filter by things like server, service, unit (e.g. anything expressed in bits/bytes per second, or anything denoting errors). All this irrespective of the source of a metric or the "location in the tree".

no interactivity with graphs

timeserieswidget allows you to easily add interactive graphite graph objects to html pages. You get modern features like togglable/reorderable metrics, realtime switching between lines/stacked, information popups on hoover, highlighting, smoothing, and (WIP) realtime zooming. It has a canvas (flot) and svg (rickshaw/d3) backend. So it basically provides a simpler api to use these libraries specifically with graphite.
There's a bunch of different graphite dashboards with different takes on graph composition/configuration and workflow, but the actual rendering of graphs usually comes down to plotting some graphite targets with a legend. timeserieswidget aims to be a drop-in plugin that brings all modern features so that different dashboards can benefit from a common, shared codebase, because static PNGs are a thing from the past


events lack text annotations, they are simplistic and badly supported

Graphite is a great system for time series metrics. Not for events. metrics and events are very different things across the board. drawAsInFinite() is a bit of a hack.
  • anthracite is designed specifically to manage events.
    It brings extra features such as different submission scripts, outage annotations, various ways to see events and reports with uptime/MTTR/etc metrics.
  • timeserieswidget displays your events on graphs along with their metadata (which can be just some text or even html code).
    this is where client side rendering shines


cumbersome to compose graphs

There's basically two approaches:
  • interactive composing: with the graphite composer, you navigate through the tree and apply functions. This is painfull, dashboards like descartes and graphiti can make this easier
  • use a dashboard that uses predefined templates (gdash and others) They often impose a strict navigation path to reach pages which may or may not give you the information you need (usually less or way more)
With both approaches, you usually end up with an ever growing pile of graphs that you created and then keep for reference.
This becomes unwieldy but is useful for various use cases and needs.
However, neither approach is convenient for changing information needs.
Especially when troubleshooting, one day you might want to compare the rate of increase of open file handles on a set of specific servers to the traffic on given network switches, the next day it's something completely different.
With Graph-Explorer:
  • GE gives you a query interface on top of structured_metric's tag space. this enables a bunch of things (see above)
  • you can yield arbitrary targets for each metric, to look at the same thing from a different angle (i.e. as a rate with `derivative()` or as a daily summary), and you can of course filter by angle
  • You can group metrics into graphs by arbitrary tags (e.g. you can see bytes used of all filesystems on a graph per server, or compare servers on a graph per filesystem). This feature always results in the "wow that's really cool" every time I show it
  • GE includes 'what' and 'target_type' in the group_by tags by default so basically, if things are in a different unit (B/s vs B vs b etc) it'll put them in separate graphs (controllable in query)
  • GE automatically generates the graph title and vertical title (always showing the 'what' and the unit), and shows all metrics' extra tags. This also gives you a lot of inspiration to modify or extend your query

limited options to request a specific time range

GE's query language supports freeform `from` and `to` clauses.

Referenced projects

  • anthracite:
    event/change logging/management with a bunch of ingestion scripts and outage reports
  • timeserieswidget:
    jquery plugin to easily get highly interactive graphite graphs onto html pages (dashboards)
  • structured_metrics:
    python library to convert graphite metrics tree into a tag space with clearly defined units and target types, and arbitrary metadata.
  • graph-explorer:
    dashboard that provides a query language so you can easily compose graphs on the fly to satisfy varying information needs.
All tools are designed for integration with other tools and each other. Timeserieswidget gets data from anthracite, graphite and elasticsearch. Graph-Explorer uses structured_metrics and timeserieswidget.

Future work

There's a whole lot going on in the monitoring space, but I'd like to highlight a few things I personally want to work more on:
  • I spoke with Michael Leinartas at Monitorama (and there's also a launchpad thread). We agreed that native tags in graphite are the way forward. This will address some of the pain points I'm already fixing with structured_metrics but in a more native way. I envision submitting metrics would move from:
    stats.serverdb123.mysql.queries.selects 895 1234567890
    to something more along these lines:
    host=serverdb123 service=mysql type=select what=queries target_type=rate 895 1234567890
    host=serverdb123 service=mysql type=select unit=Queries/s 895 1234567890
    h=serverdb123 s=mysql t=select queries r 895 1234567890
  • switch Anthracite backend to ElasticSearch for native integration with logstash data (and allow you to use kibana)