Monitoring – NIH style (part 2)

19.02.2013 Jonathan Buch

This expands on the idea in the first part of this blog series. We will still be working NIH style here – this time to improve the visuals, user-interface and information density.
The idea is still: collect arbitrary information, stay small, display distilled information. The goal is to learn more about how to visualize things, and of course to do it within the constraints mentioned in the previous blog entry.

Prototype #2

General idea:

Web interface (Cubism)
real database
real programming language

Cubism (also recently introduced into Jolokia) is a wonderful JavaScript library that uses D3 to plot information in a browser window as horizon graphs. It features automatic updates to the graphs, for which we will write a server-side application. As before, environmental restrictions apply (I would have loved to use Jolokia, but it is not available everywhere), so we will roll our own.

Client

While the Cubism documentation isn’t bad, it is a little thin on the “how do I start to use it” side. The best source of information I found was the HTML source of the main Cubism site. For further details, the API documentation on the Cubism wiki is immensely useful.
Parts of a Cubism page:

A metric definition
This is a JavaScript function which feeds a Cubism callback with an array of values. It receives a start date, a stop date and a step (the time between updates, i.e. the granularity of the information).
An HTML element to which Cubism can append the graphs.
A cubism context, which sets up cubism-specifics.
Here the graphs are set up, most importantly how much information will be displayed.
The metric instances themselves
A set of d3 commands to put the graphs inside the specified HTML element.
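
The first item in the list, the metric definition, can be tried out without any server. The following is a minimal sketch, not the code used in this article: sineRequest is a made-up name, and the sine wave merely stands in for real measurements. It only shows the shape of the function, which takes start, stop, step and a callback and delivers one value per step.

```javascript
// Hypothetical request function with the shape Cubism expects for a metric:
// (start, stop, step, callback). start and stop are Date objects,
// step is the granularity in milliseconds.
function sineRequest(start, stop, step, callback) {
  var values = [];
  for (var t = +start; t < +stop; t += step) {
    // One value per step; a sine wave stands in for real measurements.
    values.push(Math.sin(t / 60000));
  }
  // First argument is the error (null on success), second the values.
  callback(null, values);
}

// In the browser, with a Cubism context available, it would be registered as:
//   var demo = context.metric(sineRequest, "demo sine");
```

Calling it with a window of 80 seconds and a step of 10 seconds yields 8 values.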

For brevity I will try to limit all code here to the really necessary parts while being verbose enough that crucial information does not get lost.
HTML parts:

<style>
@import url(//square.github.com/cubism/style.css);
#mymetric { min-height: 155px; }
</style>
<div id="body">
  <h2>Metric:</h2>
  <div id="mymetric" ></div>
</div>
<script type="text/javascript" src="http://d3js.org/d3.v2.js"></script>
<script type="text/javascript" src="http://raw.github.com/square/cubism/master/cubism.v1.js"></script>
<script type="text/javascript" src="http://code.jquery.com/jquery-1.8.2.min.js"></script>
<script type="text/javascript" src="mymetrics.js"></script>

This will set up your basic structure and can be prettified as much as you care to.
Next up is configuring the cubism context:

var context = cubism.context()
.serverDelay(500)
.clientDelay(100)
.step(10e3)
.size(960);

This specifies that neither the server nor the client reacts instantly (delays of 500 ms and 100 ms are allowed for), that the graph is updated only once every 10 seconds, and that at most 960 data-points are displayed.
I will leave generating the metrics for last, as that is the part I had the most problems with. But here is how we will create a metric to be used:
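
To get a feeling for what these two numbers mean together, here is a small back-of-the-envelope calculation. It is only a sketch; the variable names are made up and the values are copied from the context configuration above.

```javascript
// Values copied from the context configuration above.
var stepMs = 10e3; // one data-point every 10 seconds
var size = 960;    // number of data-points kept on screen

// Total time span covered by one full graph.
var windowMs = stepMs * size;
var windowMinutes = windowMs / 60000;

console.log(windowMinutes + " minutes"); // prints "160 minutes"
```

So a full graph covers 2 hours and 40 minutes, which is also the amount of history requested when the page is first loaded.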

var site = "TST";
var metricGenerator = metricGeneratorForHost("localhost:8080");
var mReqTime = metricGenerator.metric(site, "maxRequestTime").divide(1000);
mReqTime.toString = function() { return site+" maxReqT" };

The metricGenerator returns a metric which returns the maximum request time of a JBoss request in milliseconds. We add a filter which divides each data-point by 1000 to get more readable numbers, and define a toString function so the description takes less space in the graph.
Now we bind the metrics to an HTML element:

d3.select("#mymetric").call(function(div) {
  div.append("div")
      .attr("class", "axis")
      .call(context.axis().orient("top"));
  div.selectAll(".horizon")
      .data([mReqTime])
      .enter().append("div")
      .attr("class", "horizon")
      .call(context.horizon());
  div.append("div")
      .attr("class", "rule")
      .call(context.rule());
});
// On mousemove, reposition the chart values to match the rule.
context.on("focus", function(i) {
  d3.selectAll(".value").style("right", i == null ? null : context.size() - i + "px");
});

This produces a time axis on top and the horizon charts. Additionally we catch a mouse-over which will show the concrete values for each horizon chart below the cursor. The horizon chart of course has more options, color only being one of them. The d3 data() function takes an array of one or more metric definitions.
Now for the metricGenerator: it is not bug-free, but it works for me™.

var metricGeneratorForHost = function(host) {
  if (!arguments.length) host = "localhost:8080";
  var source = {};
  var cubism_cubeFormatDate = d3.time.format.iso;
  source.metric = function(site, expression) {
    return context.metric(function(start, stop, step, callback) {
      // host carries no scheme, so prepend one to get an absolute URL.
      var url = "http://" + host + "/1.0/metric"
          + "?site=" + encodeURIComponent(site)
          + "&expression=" + encodeURIComponent(expression)
          + "&start=" + cubism_cubeFormatDate(start)
          + "&stop=" + cubism_cubeFormatDate(stop)
          + "&step=" + step;
      d3.json(url, function(data) {
          if (!data) return callback(new Error("unable to load data"));
          data.forEach(function(d) {
            // Keep the parsed date and a numeric value on each row.
            d.date = cubism_cubeFormatDate.parse(d.date);
            d.value = parseInt(d.value, 10);
          });
          // Cubism only wants the values, one per step, in order.
          callback(null, data.map(function (d) { return d.value; }) );
      });
    }, site + " " + expression);
  };
  // Returns the host this generator queries.
  source.toString = function() {
    return host;
  };
  return source;
};

This sets up the basic prototype of a metric which does background HTTP requests to pull data. We will deliver JSON from the server, which gets parsed by d3. Cubism does not care at all in which form the data arrives; all it cares about are the final data-points handed to the callback function. The array of data-points has to contain exactly (stop - start) / step elements. d3.json(url, callback) fetches the JSON, parses it and hands it over to the callback. A response looks like this:

[{ "site": "TST", "date": "2006-01-02T15:04:05.000Z", "value": "12021" }]

The callback takes each element of the array, converts the value to an int and finally (in a rather convoluted way) hands an array of integers on to Cubism: callback(null, [12021]). If it somehow fails to contact the server correctly, it calls the Cubism callback with an error instead.
This finishes up the client side of things. We just need a web-service which delivers the above JSON.
Example Request: http://localhost:8080/1.0/metric?site=TST&expression=maxRequestTime&start=2006-01-02T15:04:05.000Z&stop=2006-01-02T15:05:25.000Z&step=10000
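
As a quick check of the (stop - start) / step rule, the number of data-points Cubism expects for the example request can be worked out from the URL itself. This is only a sketch: expectedPoints is a hypothetical helper, and it relies on the standard URL class found in current browsers and Node.js.

```javascript
// The example request from above.
var exampleUrl = new URL(
  "http://localhost:8080/1.0/metric?site=TST&expression=maxRequestTime" +
  "&start=2006-01-02T15:04:05.000Z&stop=2006-01-02T15:05:25.000Z&step=10000");

// Hypothetical helper: how many values must the response contain?
function expectedPoints(url) {
  var query = url.searchParams;
  var start = Date.parse(query.get("start")); // milliseconds since epoch
  var stop = Date.parse(query.get("stop"));
  var step = parseInt(query.get("step"), 10); // milliseconds
  return Math.floor((stop - start) / step);
}

console.log(expectedPoints(exampleUrl)); // prints 8
```

The request spans 80 seconds at a step of 10 seconds, so the server should answer with 8 values.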
A note on how Cubism executes the metric callback: on page load, it attempts to fill all data-points, so it issues one massive request for all missing information. Once loaded, I have observed (at least for a step of 10 seconds) that it asks for the last 70 seconds, not only for the last value.
The interpolation of the data is not done on the client side, although it could be. Additionally, there are a few bugs which I just didn’t have the time to look into. For example, the last 7 data-points are always the same, and when asking for the full data-set the time is off (by the amount of the daylight saving time shift). This is not too critical for me, so I didn’t investigate further.
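
For the curious, filling gaps on the client could look roughly like the following. This is a sketch under assumptions, not part of the project: alignToSteps is a made-up helper, it expects rows with ISO dates as in the JSON shown earlier, and it uses a simple carry-forward of the last known value rather than real interpolation. Slots before the first known value stay NaN; how Cubism renders those was not checked.

```javascript
// Hypothetical helper: place rows on the fixed grid Cubism asks for and
// carry the last known value forward into empty slots.
// start and stop are timestamps in milliseconds, step is in milliseconds.
function alignToSteps(rows, start, stop, step) {
  var count = Math.floor((stop - start) / step);
  var values = [];
  for (var i = 0; i < count; i++) values.push(NaN);

  // Drop each row into the slot its timestamp falls into.
  rows.forEach(function (d) {
    var slot = Math.floor((Date.parse(d.date) - start) / step);
    if (slot >= 0 && slot < count) values[slot] = parseInt(d.value, 10);
  });

  // Carry-forward: an empty slot inherits the value of its predecessor.
  for (var j = 1; j < count; j++) {
    if (isNaN(values[j])) values[j] = values[j - 1];
  }
  return values;
}

var start = Date.parse("2006-01-02T15:04:05.000Z");
var rows = [
  { site: "TST", date: "2006-01-02T15:04:05.000Z", value: "10" },
  { site: "TST", date: "2006-01-02T15:04:25.000Z", value: "30" }
];
console.log(alignToSteps(rows, start, start + 40000, 10000)); // [ 10, 10, 30, 30 ]
```

The result always has exactly (stop - start) / step entries, which is the one thing Cubism insists on.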

Server

The server part is split into the web-server part and the information-gathering part. A sqlite3 database is used to exchange data between the two. As we are adventurous, we will use the fancy new Go language from Rob Pike (Google).
To avoid showing too much code here, I have pushed a (cleaned) version of my project to GitHub: https://github.com/BuJo/cubismsource
There you can follow my ~~misadventures~~ development path and learning process – my proficiency in Go is sadly lacking. But I have put up a (hopefully) decent README to get an interested party started.
The code is split into:

cubismsource: web application which delivers the HTML site and the metrics from the database to Cubism
jboss2sqlite: application to periodically save the JBoss status into the database
jbossinfo: small library for handling the JBoss status XML

This is a partial screenshot of three running JBoss instances running almost the same application. The observant reader will of course spot oddities (like the maximum request time of the first instance climbing quite high, or the first instance having a hiccup at the end).
The result is quite close to something like Cube or Jolokia – just a little less clever! Feel free to use the code in any way you want – it is “finished” in the sense that it is feature complete and unlikely to be extended by me.
So, what did we gain by putting the odd minute here and there into this?

The need for a third monitor
Beautiful forms and colors – the co-workers’ envy
Knowledge of visualizing things via Cubism
But we lose points for the added complexity (~650 LoC), which makes it less accessible from outside.

As with my previous blog articles – this is less about a “solution”, more a way to understand and learn a little more.