About seven years ago, I started writing tests for the code I write. Here I would like to share my experience, my thoughts, and some tips I picked up along the way.
Unit Tests need to be a commit hook
Over these years, I have seen many examples of code that breaks, or even stops compiling, when it does not get proper attention. Most of the time this happens because code evolves: changes are made, and if nobody pays attention to the code that depends on them, that code "degrades" or stops working.
An example of that would be:
- Library code is written to do X.
- App code is written to use that library.
... time passes ...
- The library is refactored and its API changes. (The engineer is not aware of the App code.)
... time passes ...
- The App code no longer compiles!
If this happens to production code, imagine how fast Unit Tests degrade when they are not run often. This is why, in our case, we run all the Unit Tests on every commit.
Unit Tests need to be fast
This is very important: running the whole test suite needs to be fast. The faster it is, the more productive teammates will be. To achieve that speed, I recommend faking/mocking any I/O. Also consider running tests in parallel.
Here are some execution times from the project I have been working on:
- ~4500 C++ Unit Tests: ~0.8s in total
- ~50 C++ Visual Tests: ~0.1s each, 50 x 0.1 = ~5s
- ~10 C++ Performance Tests: ~5s each, 10 x 5 = ~50s
As you can see, the Unit Tests add only ~0.8s before each commit, which is great! This is because all the external libraries are faked/mocked, so the code never has to wait on I/O of any kind.
Visual tests require the GPU, but most of their content is procedurally generated, so they still run very fast.
Performance tests usually require many iterations and a lot of time, which makes them costly. In our case, they do not block a commit; they are executed later on another machine running BlueSteel.
This means a teammate spends only a few seconds before each commit, which in my opinion is a good deal for the confidence that the code still works as expected.
Unit Tests need team commitment
This is crucial for the success of the practice. Every team member working on the code needs to agree to write tests. Disagreement about the practice can lead to bad results, such as a poorly maintained test infrastructure or features breaking more often.
First test is expensive
The first test is probably one of the most expensive ones in terms of development time, mainly because of the initial setup. In my experience, this is one of the biggest barriers to writing tests. At first sight, spending a couple of days (or even a week) on the setup needed to run tests might look like a waste of time, but down the road it pays off big time: it prevents regressions, makes refactors faster and more confident, and keeps part of the developer's context written down in the code.
Tests tend to be cheaper than Regressions
Something I found over these years is that writing tests tends to be cheaper than fixing regressions. Let me explain. Writing a test means spending engineering time on it, and engineering time is expensive.
Let's work through an example with some approximate figures:
- Engineering salary: ~$150,000 per year.
- Working hours in a year: 52 weeks * 40 hours a week = ~2,080
- Price per hour: 150,000 / 2,080 = ~$72
Now, imagine that we write a test to cover a feature and it takes about an hour to write. Someone could argue that writing the test is expensive because it costs ~$72.
What usually goes unnoticed, though, are the benefits of that test:
- This test is going to check the working state of that feature in milliseconds.
- The cost of execution of that test is very small.
- This test is going to run on every commit made by you or any other team member.
- This test captures the context and intent of the engineer who wrote it.
Now, if a human checks the working state of that feature on every commit, and the manual check takes ~1 min at best, we quickly see how much more expensive human time is than machine time.
- Price per minute: ~$72 / 60 minutes = ~$1.2
In a perfect scenario with no other costs, this means that after the 60th commit, testing that feature by hand becomes more expensive than testing it with a machine.
From my point of view, using human time for repetitive tasks is absurd, expensive, and not scalable. Machines should own repetitive tasks because they are faster, cheaper, and most likely better at them.
On the other hand, regressions tend to be expensive because there is a high chance that the engineer in charge of fixing one will not have the proper context, so they first need to investigate it, which translates into development time.
Keep in mind that Unit Tests and QA engineering are not mutually exclusive, but they need a proper balance, and each one should be used where its benefits are strongest.
Focus on Fakes/Mocks
Over this time I found that one of the most important aspects of testing is having good fakes/mocks for the external modules our code interacts with.
Good fakes/mocks make our tests incredibly fast (because they avoid slow interactions with the real modules) and allow smaller, more specific tests to be written.
In our case, we interact with the Network and OpenGL in many places, so we have FakeNetwork and FakeGL objects in our test folder.
These objects allow an engineer to write tests with the following important steps:
1.- Configure the fake objects' characteristics for a Scenario A.
2.- Call the code we want to test.
3.- Check that the code does what we wanted for Scenario A.
For example, let me write some pseudocode to make this idea clearer. Let's imagine that we want to test that our 3D engine can take a screenshot and return it correctly:
    // Initialize and configure Fake objects.
    vector<Pixel> pixels = make_default_image();
    FakeGL* gl = new FakeGL();
    gl->set_back_buffer_pixels(pixels);

    // Setup and call of the real code.
    Engine* eng = new Engine(gl);
    ...
    vector<Pixel> screenshot_pixels = eng->make_screenshot(...);

    // Check the correctness of the data.
    ASSERT_EQUAL(pixels, screenshot_pixels);

As you can see in the pseudocode above, we first initialize our FakeGL object and then configure it with very specific values for the pixels in the back buffer. We don't really care who put those pixels into the back buffer, or how. All we care about here is that the make_screenshot function returns the pixels that are in the back buffer.
After we call our library code, we want to check that the returned pixels are equal to the pixels we initially put inside the FakeGL's back buffer.
With that test, we did not need to execute an expensive render operation, and we tested that our make_screenshot function is able to return pixels correctly. This is a very good way to keep tests furiously fast.
Focus on Mutation Coverage and not on Code Coverage
Over these years, I learned that there are two kinds of "Code Coverage". The first one, probably the most widespread, reports how much code was executed while running the tests. Or to put it another way:
Code coverage is the percentage of code which is covered by automated tests.
Code coverage is VERY deceiving. For example, someone can write a single-line test that executes (and therefore "covers") 52% of a library's code. That number is weak on its own, and it gets even weaker when the coverage report says nothing about which branches of an if-statement were taken or how execution differed across iterations of a for-loop.
The second kind of coverage I learned about is called Mutation Coverage, and it comes from Mutation Testing. This coverage is far more interesting because it works almost in reverse of the first one.
Mutation testing cleverly modifies one line of your code (a "mutation"), compiles it, and runs all the tests. If the compilation succeeds and all the tests still pass, that line is not really covered by the tests. It can't be more precise than that!
That precision quickly exposes which if-statements, for-loops, or even plain code are not fully covered by tests. From my point of view, a mutation coverage value precisely exposes how much of the code lacks tests.
The dark side of Mutation Testing is that it requires a lot of CPU power: for every small change, the project needs to be recompiled and the tests executed again. Be prepared, because the mutation testing procedure can take a long time.