(Tech: C#, Microsoft SQL)
Recently I was investigating a bug which kept me, and a few of my mates, occupied way longer than we are willing to admit :)
One summer’s morning
It started on a fresh summer’s morning…
After I’ve completed my code changes, like a very good software citizen, I ran the project’s unit tests.
They were all green and I was riding the wave of success, nothing could stand in my way today…I’m flying through my work.
Last formality is to create a Pull Request and wait for the build server to succeed before reviewers will approve my changes.
Life…is…easy, everything is in harmony.
Until I received an email from the build server - Failing tests.
They were all green locally, surely this can’t be right.
I kicked off another build, now they pass.
Kicked another build, now they fail.
Kicked off yet another build, failed again.
This carried on a few more times with the hope I’ll find a pattern, no luck.
It was very strange as my changes did not relate to the failing tests.
I then realized it was one of those “fun” intermittent failing tests.
And so my investigation was born…
I set my acceptance criteria to 5 successful consecutive builds.
I had two failing tests.
In short, the tests set a property and then saved that property to the database.
public void ThenInvoiceCommentIsStored()
The verification failed when that property was expected to have a string value but the actual value was null.
invoices.Lines.Comment should be
Usually when I have intermittent failing tests, past experience taught me dates/times/timezone are involved.
In my case, none of those were present.
I’ve checked server build plans, added debugging messages, second guessing frameworks we used (long shot, I know), adding delays to check race conditions and a few other ridiculous ideas not worth mentioning.
None of my plans worked.
The investigation was a very slow and painful process as every code change took 5 minutes to build.
Die bug, die!
I decided to turn my focus back to the origin of the problem, the failing tests.
I zoomed in on the one particular test - ThenInvoiceCommentIsStored.
By looking at the test, the intent is to assert the first element in the list.
I’ve added a breakpoint and discovered there were actually two items in the Lines list.
Hypothesis: Is it possible we updated the data correctly but the list is only in the wrong order?
I changed the test to find the Line item by ID (filter), rather than the first element in the Lines list.
public void ThenInvoiceCommentIsStored()
BOOM, the beast was slain…5 successful consecutive builds!
The actual problem was a mistake in the test.
The test indicated we expect the first element in the list to change.
That holds true, so long as there is only one item in the list.
As soon as we have two items in the list, then we have to investigate sort order.
The extra data silently changed the test.
Because we query the data directly from the database, with no sort order, the database will return the records based on its default sort order.
Our tests were at the mercy of the database sort order.
Usually the database’s default sort order is based on a table’s Primary Key.
In our case,
Lines.ID was the Primary Key and was of Data Type
CREATE TABLE [dbo].[Lines](
Example: Two Line records were added in exactly this order but with only one digit difference:
488d6b1a-898e-4a13-acb3-a9e500d66962 (sample only, GUIDS are unique)
On a query of the data, even though the data input was done with
2xxxxxxx-xxxx-xxxx... will always show up first because of the Ascending Primary Key sort order
SELECT ID FROM Lines
This had a devastating impact on the server unit tests as we had intermittent test failures.
In some cases we were lucky and the GUID order worked out well for asserting the first element in the Lines list, other times the updated Line item was somewhere in the list.
What about the green local tests?
I’m still unclear as to why they passed.
The only difference being that on the server, the database is created every time.
Locally, the database is created only the first time you run the unit tests.
I decided to let this one go as the change fixed the server build and locally the tests still passed.
My initial mission was completed.
What I have learned?
Write your tests with more intent.
Your test must speak to future you about what you were trying to test.
When you look back at code you’ve written 3 months ago, you will thank the past you :)
Perseverance - failure was not an option.
The longer the problem dragged out, the more drive I had to fix the bug.
Once you’ve invested so much time in an issue, the victory is so much sweeter.
Use it…don’t use it :)