BlueSteel started as a tool to keep track of the performance of a 3D engine. One of the main problems was constant regressions affecting the time spent on rendering frames. Bisecting the code after discovering a regression to find its origin was very time consuming. It made a lot of sense to create a tool able to benchmark every commit of that 3D engine project and to send an email whenever a fluctuation appeared.

First hack iteration

The first iteration was a quick hack. It was written in Python using Django 1.1. The project itself was composed of:

  • A benchmark script.
  • A webpage for presenting the results.

The benchmark script was in charge of all the hard work: from asking the web service whether a commit was already benchmarked, to pulling, building, and benchmarking that specific commit.
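The original script is not shown in the article, but its flow can be sketched roughly like this. The command lines and the split between "pending commits" logic and the pull + build + benchmark stage are illustrative assumptions, not the real implementation:

```python
import subprocess

def commits_to_benchmark(all_commits, already_benchmarked):
    """Return the commits still waiting for a benchmark run, oldest first.
    In BlueSteel, the 'already benchmarked' list came from asking the
    web service."""
    done = set(already_benchmarked)
    return [c for c in all_commits if c not in done]

def run_benchmark(commit_hash):
    """Pull + build + benchmark one commit. The make and benchmark
    commands are placeholders for the 3D engine's real invocations."""
    subprocess.run(["git", "checkout", commit_hash], check=True)
    subprocess.run(["make", "-j4"], check=True)
    subprocess.run(["./benchmark", "--output", f"{commit_hash}.json"], check=True)
```

Keeping the "what is left to do" decision separate from the "do it" step makes the script easy to restart after a crash: it simply asks again and resumes.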

The web page was provided by a simple app living in a Django project. That app served a very thin HTML page with JavaScript code that retrieved the information from the web service and generated HTML on the fly.

First iteration

This first iteration was able to prove the concept while at the same time helping reduce the amount of time required to solve an FPS degradation. At the same time, it was missing a lot of features: there was just basic information about each commit (author name and the first 7 characters of the commit hash), and the benchmark script and the web app were completely tied to the 3D engine git project.

And the first step toward analyzing the code inside a commit was also in place. For every commit there was a row of information showing the number of lines added and removed, both for regular .h / .cpp files and for .test.cpp files.
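Such per-commit counts can be extracted from a unified diff. This is a minimal sketch, assuming the ".test.cpp suffix means test file" convention mentioned above; the real BlueSteel code may have worked differently:

```python
def classify_diff_lines(diff_text):
    """Count added/removed lines in a unified diff, split between test
    files (*.test.cpp) and regular source files (.h / .cpp)."""
    counts = {"src": {"added": 0, "removed": 0},
              "test": {"added": 0, "removed": 0}}
    bucket = "src"
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # A new file header: decide which bucket its lines belong to.
            path = line[4:].strip()
            bucket = "test" if path.endswith(".test.cpp") else "src"
        elif line.startswith("+") and not line.startswith("+++"):
            counts[bucket]["added"] += 1
        elif line.startswith("-") and not line.startswith("---"):
            counts[bucket]["removed"] += 1
    return counts
```

Feeding it the output of `git show --patch <commit>` yields the two added/removed pairs shown on each commit's row.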

Second iteration

This project was never properly funded with development time, but the time spent on it was used properly!

Everything was better in this iteration. The project was using Django 1.4. All the web pages were created with templates. Support for multiple workers was added. There were huge improvements on the server side. Shades of the steelblue color were used everywhere. And more information was shown for every commit entry.

Here we can see a picture of it in action:

Second iteration

Because of time constraints, the project was still tied to the 3D engine project, but it allowed its users to rapidly pinpoint and fix the problematic commits that affected performance.

At this stage the project acquired its name. There were many jokes about the name of the HTML color used in the theme (steelblue) and Blue Steel, the face made by the fictional character Derek Zoolander in the film Zoolander. Joking aside, the real source of the name is the blue shade obtained while tempering steel, a process that increases its toughness.

Third iteration

While BlueSteel was working every day, providing valuable information for every commit fed to it, some work was initiated to improve BlueSteel's code analysis skills.

Checking the code coverage value of a codebase can be misleading, because you can end up with this scenario inside a random test:

  • You call a function that exercises 1000+ lines of code.
  • You verify virtually nothing about the result.
  • The code coverage tool marks all those 1000+ lines as covered.
  • The result is meaningless (or misleading).
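A tiny illustration of that scenario, with made-up function names standing in for real engine code:

```python
def render_scene(n):
    """Stand-in for a function whose body spans many lines."""
    frames = []
    for i in range(n):
        frames.append(i * i)  # imagine hundreds of lines of real work here
    return frames

def test_render_scene_misleading():
    """Every line of render_scene executes, so a line-coverage tool
    reports 100% coverage -- yet nothing about the result is verified."""
    render_scene(1000)  # no assert: coverage is high, confidence is zero

def test_render_scene_meaningful():
    """The same call, but the result is actually checked."""
    assert render_scene(3) == [0, 1, 4]
```

Both tests produce identical coverage numbers; only the second one would catch a regression.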

Instead of doing that, BlueSteel takes an upside-down approach:

  • For every line of code inside a commit:
  • The line is changed / mutated / removed.
  • The code is compiled. (1)
  • The tests are executed. (2)
  • At this point, if both (1) and (2) ended successfully, you know that the original line was not covered by tests.
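The steps above can be sketched in a few lines. This is a simplified model, not BlueSteel's code: the only mutation shown is removing the line (via a comment), and `build_cmd` / `test_cmd` are placeholders for the project's real commands:

```python
import subprocess

def uncovered_lines(source_path, build_cmd, test_cmd):
    """For each line of the file: comment it out, rebuild, run the tests.
    If both still succeed, the original line is -- by this definition --
    not covered by tests. Returns the 1-based uncovered line numbers."""
    with open(source_path) as f:
        original = f.readlines()
    uncovered = []
    for i in range(len(original)):
        mutated = original[:]
        mutated[i] = "// " + mutated[i]  # the simplest mutation: drop the line
        with open(source_path, "w") as f:
            f.writelines(mutated)
        try:
            compiled = subprocess.run(build_cmd).returncode == 0           # (1)
            tests_ok = compiled and subprocess.run(test_cmd).returncode == 0  # (2)
            if tests_ok:
                uncovered.append(i + 1)  # line survived both steps
        finally:
            with open(source_path, "w") as f:  # always restore the original
                f.writelines(original)
    return uncovered
```

The cost is obvious: one full build plus one test run per line, which is why restricting the analysis to the lines touched by a commit (rather than the whole codebase) matters so much.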

The result of this algorithm was largely ignored by many engineers (no one wants a bot telling them that they are not writing tests :D), but it was a huge improvement in understanding the current status of the code. BlueSteel thoroughly marked every line of uncovered code, and we were able to spot which parts of the code were the most fragile.

Here is a picture of it:

With this algorithm, we discovered uncompiled code, untested code, unexercised code, etc. Because the algorithm has to parse C++ code, which is a very difficult task, there is plenty of room for improvement.

Fourth iteration

The fourth iteration started as an attempt to make BlueSteel truly independent of the project being benchmarked. Because of that, it ended up as a full rewrite of the BlueSteel project, applying all the knowledge gathered in the previous iterations.

Now, BlueSteel can fully understand git repositories and all the data associated with them. This allows BlueSteel to benchmark every commit on every branch with a benchmark definition.

A benchmark definition stores the steps required to benchmark a given git commit. Think of it as a list of commands that will be executed while the git project is checked out at that precise commit. After the execution of all these steps, the results are stored in something called a benchmark execution.
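As a rough model of that idea, a definition could be just a named, ordered list of commands, executed with the repository at the commit under test. The structure, commands, and the injectable `runner` hook below are illustrative assumptions, not BlueSteel's actual schema:

```python
import subprocess

# A hypothetical benchmark definition: an ordered list of shell commands
# to run with the repository checked out at the commit under test.
BENCHMARK_DEFINITION = {
    "name": "frame-time",
    "steps": [
        ["git", "submodule", "update", "--init"],
        ["make", "-j4"],
        ["./engine", "--benchmark", "--report", "results.json"],
    ],
}

def execute_definition(definition, commit_hash, runner=subprocess.run):
    """Check out the commit, then run each step in order, stopping at the
    first failure. The list of return codes is the raw material for a
    'benchmark execution' record."""
    codes = []
    for step in [["git", "checkout", commit_hash]] + definition["steps"]:
        result = runner(step)
        codes.append(result.returncode)
        if result.returncode != 0:
            break
    return codes
```

The `runner` parameter exists only so the flow can be exercised without a real repository; in normal use `subprocess.run` does the work.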

Benchmark executions show us the fluctuations of the results by representing them on screen with charts, text, etc. These benchmark executions can also notify the commit authors in case the fluctuations go beyond given thresholds.
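The threshold check itself can be as simple as comparing each execution's metric with the previous one. A minimal sketch, assuming a single scalar metric per commit (the real system tracks richer results):

```python
def fluctuations_over_threshold(results, threshold_pct):
    """Compare each benchmark result with the previous commit's result and
    return the commits whose metric moved by more than threshold_pct
    percent -- i.e. the commits whose authors would be notified.
    `results` maps commit hash -> metric, in commit order."""
    flagged = []
    items = list(results.items())
    for (_, prev), (commit, curr) in zip(items, items[1:]):
        change = (curr - prev) / prev * 100.0
        if abs(change) > threshold_pct:
            flagged.append((commit, round(change, 1)))
    return flagged
```

Anything returned by this function is a candidate for an email, pointing the author at the exact commit where the metric jumped.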

BlueSteel is still in heavy development as of 2016-05-09, and I will update this article with new information :D
