ploeh blog

Accountability and free speech

Tuesday, 09 February 2021 07:53:00 UTC

A most likely naive suggestion.

A few years ago, my mother (born 1940) went to Paris on vacation with her older sister. She was a little concerned if she'd be able to navigate the Métro, so she asked me to print out all sorts of maps in advance. I wasn't keen on another encounter with my nemesis, the printer, so I instead showed her how she could use Google Maps to get on-demand routes. Google Maps now include timetables and line information for many metropolitan subway lines around the world. I've successfully used it to find my way around London, Paris, New York, Tokyo, Sydney, and Melbourne.

It even works in my backwater home-town:

Snapshot of Copenhagen Metro route generated by Google Maps.

It's rapidly turning into indispensable digital infrastructure, and that's beginning to trouble me.

You can still get paper maps of the various rapid transit systems around the world, but for how long?

Digital infrastructure #

If you're old enough, you may remember phone books. In the days of land lines and analogue phones, you'd look up a number and then dial it. As the internet became increasingly ubiquitous, phone directories went online as well. Paper phone books were seen as waste, and ultimately disappeared from everyday life.

You can think of paper phone books as a piece of infrastructure that's now gone. I don't miss them, but it's worth reflecting on what impact their disappearing has. Today, if I need to find a phone number, Google is often the place I go. The physical infrastructure is gone, replaced by a digital infrastructure.

Google, in particular, now provides important infrastructure for modern society. Not only web search, but also maps, video sharing, translation, emails, and much more.

Other companies offer other services that verge on being infrastructure. Facebook is more than just updates to friends and families. Many organisations, including schools and universities, coordinate activities via Facebook, and the political discourse increasingly happens there and on Twitter.

We've come to rely on these 'free' services to a degree that resembles our reliance on physical infrastructure like the power grid, running water, roads, telephones, etcetera.

TANSTAAFL #

Robert A. Heinlein coined the phrase There ain't no such thing as a free lunch (TANSTAAFL) in his excellent book The Moon is a Harsh Mistress. Indeed, the digital infrastructure isn't free.

Recent events have magnified some of the cost we're paying. In January, Facebook and Twitter suspended the then-president of the United States from the platforms. I've little sympathy for Donald Trump, who strikes me as an uncouth narcissist, but since I'm a Danish citizen living in Denmark, I don't think that I ought to have much of an opinion about American politics.

I do think, on the other hand, that the suspension sets an uncomfortable precedent.

Should we let a handful of tech billionaires control essential infrastructure? This time, the victim was someone half the world was glad to see go, but who's next? Might these companies suspend the accounts of politicians who work against them?

Never in the history of the world has decision power over so many people been so concentrated. I'll remind my readers that Facebook, Twitter, Google, etcetera are in worldwide use. The decisions of a handful of majority shareholders can now affect billions of people.

If you're suspended from one of these platforms, you may lose your ability to participate in school activities, or from finding your way through a foreign city, or from taking part of the democratic discussion.

The case for regulation #

In this article, I mainly want to focus on the free-speech issue related mostly to Facebook and Twitter. These companies make money from ads. The longer you stay on the platform, the more ads they can show you, and they've discovered that nothing pulls you in like anger. These incentives are incredibly counter-productive for society.

Users are implicitly encouraged to create or spread anger, yet few are held accountable. One reason is that you can easily create new accounts without disclosing your real-world identity. This makes it difficult to hold users accountable for their utterances.

Instead of clear rules, users are suspended for inexplicable and arbitrary reasons. It seems that these are mostly the verdicts of algorithms, but as we've seen in the case of Donald Trump, it can also be the result of an undemocratic, but political decision.

Everyone can lose their opportunities for self-expression for arbitrary reasons. This is a free-speech issue.

Yes, free speech.

I'm well aware that Facebook and Twitter are private companies, and no-one has any right to an account on those platforms. That's why I started this essay by discussing how these services are turning into infrastructure.

More than a century ago, the telephone was new technology operated by private companies. I know the Danish history best, but it serves well as an example. KTAS, one of the world's first telephone companies, was a private company. Via mergers and acquisitions, it still is, but as it grew and became the backbone of the country's electronic communications network, the government introduced legislation. For decades, it and its sister companies were regional monopolies.

The government allowed the monopoly in exchange for various obligations. Even when the monopoly ended in 1996, the original monopoly companies inherited these legal obligations. For instance, every Danish citizen has the right to get a land-line installed, even if that installation by itself is a commercial loss for the company. This could be the case on certain remote islands or other rural areas. The Danish state compensates the telephone operator for this potential loss.

Many other utility companies run in a similar fashion. Some are semi-public, some are private, but common to water, electricity, garbage disposal, heating, and other operators is that they are regulated by government.

When a utility becomes important to the functioning of society, a sensible government steps in to regulate it to ensure the further smooth functioning of society.

I think that the services offered by Google, Facebook, and Twitter are fast approaching a level of significance that ought to trigger government regulation.

I don't say that lightly. I'm actually quite libertarian in my views, but I'm no anarcho-libertarian. I do acknowledge that a state offers essential services such as protection of property rights, a judicial system, and so on.

Accountability #

How should we regulate social media? I think we should start by exchanging accountability for the right to post.

Let's take another step back for a moment. For generations, it's been possible to send a letter to the editor of a regular newspaper. If the editor deems it worthy for publication, it'll be printed in the next issue. You don't have any right to get your letter published; this happens at the discretion of the editor.

Why does it work that way? It works that way because the editor is accountable for what's printed in the paper. Ultimately, he or she can go to jail for what's printed.

Freedom of speech is more complicated than it looks at first glance. For instance, censorship in Denmark was abolished with the 1849 constitution. This doesn't mean that you can freely say whatever you'd like; it only means that government has no right to prevent you from saying or writing something. You can still be prosecuted after the fact if you say or write something libellous, or if you incite violence, or if you disclose secrets that you've agreed to keep, etcetera.

This is accountability. It works when the person making the utterance is known and within reach of the law.

Notice, particularly, that an editor-in-chief is accountable for a newspaper's contents. Why isn't Facebook or Twitter accountable for content?

These companies have managed to spin a story that they're platforms rather than publishers. This argument might have had some legs ten years ago. When I started using Twitter in 2009, the algorithm was easy to understand: My feed showed the tweets from the accounts that I followed, in the order that they were published. This was easy to understand, and Twitter didn't, as far as I could tell, edit the feed. Since no editing took place, I find the platform argument applicable.

Later, the social networks began editing the feeds. No humans were directly involved, but today, some posts are amplified while others get little attention. Since this is done by 'algorithms' rather than editors, the companies have managed to convince lawmakers that they're still only platforms. Let's be real, though. Apparently, the public needs a programmer to say the following, and I'm a programmer.

Programmers, employees of those companies, wrote the algorithms. They experimented and tweaked those algorithms to maximise 'engagement'. Humans are continually involved in this process. Editing takes place.

I think it's time to stop treating social media networks as platforms, and start treating them as publishers.

Practicality #

Facebook and Twitter will protest that it's practically impossible for them to 'edit' what happens on the networks. I find this argument unconvincing, since editing already takes place.

I'd like to suggest a partial way out, though. This brings us back to regulation.

I think that each country or jurisdiction should make it possible for users to opt in to being accountable. Users should be able to verify their accounts as real, legal persons. If they do that, their activity on the platform should be governed by the jurisdiction in which they reside, instead of by the arbitrary and ill-defined 'community guidelines' offered by the networks. You'd be accountable for your utterances according to local law where you live. The advantage, though, is that this accountability buys you the right to be present on the platform. You can be sued for your posts, but you can't be kicked off.

I'm aware that not everyone lives in a benign welfare state such as Denmark. This is why I suggest this as an option. Even if you live in a state that regulates social media as outlined above, the option to stay anonymous should remain. This is how it already works, and I imagine that it should continue to work like that for this group of users. The cost of staying anonymous, though, is that you submit to the arbitrary and despotic rules of those networks. For many people, including minorities and citizens of oppressive states, this is likely to remain a better trade-off.

Problems #

This suggestion is in no way perfect. I can already identify one problem with it.

A country could grant its citizens the right to conduct infowar on an adversary; think troll armies and the like. If these citizens are verified and 'accountable' to their local government, but this government encourages rather than punishes incitement to violence in a foreign country, then how do we prevent that?

I have a few half-baked ideas, but I'd rather leave the problem here in the hope that it might inspire other people to start thinking about it, too.

Conclusion #

This is a post that I've waited for a long time for someone else to write. If someone already did, I'm not aware of it. I can think of three reasons:

It's a really stupid idea.
It's a common idea, but no-one talks about it because of pluralistic ignorance.
It's an original idea that no-one else have had.

I don't believe much in the last option, and I'm afraid that the first is the most likely. I've no desire to look stupid, but on the other hand, I can't help keep thinking about this.

The idea is, in short, to make it optional for users to 'buy' the right to stay on a social network for the price of being held legally accountable. This requires some national as well as international regulation of the digital infrastructure.

Could it work? I don't know, but isn't it worth discussing?

Comments

Jboy Flaga #

Hi Mark, I would just like to say that I like the idea :)

Thank you for writing this.

2021-02-18 13:34 UTC

ASP.NET POCO Controllers: an experience report

Monday, 01 February 2021 08:10:00 UTC

Controllers don't have to derive from a base class.

In most tutorials about ASP.NET web APIs you'll be told to let Controller classes derive from a base class. It may be convenient if you believe that productivity is measured by how fast you can get an initial version of the software into production. Granted, sometimes that's the case, but usually there's a price to be paid. Did you produce legacy code in the process?

One common definition of legacy code is that it's code without tests. With ASP.NET I've repeatedly found that when Controllers derive from ControllerBase they become harder to unit test. It may be convenient to have access to the Url and User properties, but this tends to make the arrange phase of a unit test much more complex. Not impossible; just more complex than I like. In short, inheriting from ControllerBase just to get access to, say, Url or User violates the Interface Segregation Principle. That ControllerBase is too big a dependency for my taste.

I already use self-hosting in integration tests so that I can interact with my REST APIs via HTTP. When I want to test how my API reacts to various HTTP-specific circumstances, I do that via integration tests. So, in a recent code base I decided to see if I could write an entire REST API in ASP.NET Core without inheriting from ControllerBase.

The short answer is that, yes, this is possible, and I'd do it again, but you have to jump through some hoops. I consider that hoop-jumping a fine price to pay for the benefits of simpler unit tests and (it turns out) better separation of concerns.

In this article, I'll share what I've learned. The code shown here is part of the sample code base that accompanies my book Code That Fits in Your Head.

POCO Controllers #

Just so that we're on the same page: A POCO Controller is a Controller class that doesn't inherit from any base class. In my code base, they're defined like this:

[Route("")]
public sealed class HomeController

[Authorize(Roles = "MaitreD")]
public class ScheduleController

As you can tell, they don't inherit from ControllerBase or any other base class. They are annotated with attributes, like [Route], [Authorize], or [ApiController]. Strictly speaking, that may disqualify them as true POCOs, but in practice I've found that those attributes don't impact the sustainability of the code base in a negative manner.

Dependency Injection is still possible, and in use:

[ApiController]
public class ReservationsController
{
    public ReservationsController(
        IClock clock,
        IRestaurantDatabase restaurantDatabase,
        IReservationsRepository repository)
    {
        Clock = clock;
        RestaurantDatabase = restaurantDatabase;
        Repository = repository;
    }
 
    public IClock Clock { get; }
    public IRestaurantDatabase RestaurantDatabase { get; }
    public IReservationsRepository Repository { get; }
 
    // Controller actions and other members go here...

I consider Dependency Injection nothing but an application of polymorphism, so I think that still fits the POCO label. After all, Dependency Injection is just argument-passing.

HTTP responses and status codes #

The first thing I had to figure out was how to return various HTTP status codes. If you just return some model object, the default status code is 200 OK. That's fine in many situations, but if you're implementing a proper REST API, you should be using headers and status codes to communicate with the client.

The ControllerBase class defines many helper methods to return various other types of responses, such as BadRequest or NotFound. I had to figure out how to return such responses without access to the base class.

Fortunately, ASP.NET Core is now open source, so it isn't too hard to look at the source code for those helper methods to see what they do. It turns out that they are just discoverable methods that create and return various result objects. So, instead of calling the BadRequest helper method, I can just return a BadRequestResult:

return new BadRequestResult();

Some of the responses were a little more involved, so I created my own domain-specific helper methods for them:

private static ActionResult Reservation201Created(int restaurantId, Reservation r)
{
    return new CreatedAtActionResult(
        actionName: nameof(Get),
        controllerName: null,
        routeValues: new { restaurantId, id = r.Id.ToString("N") },
        value: r.ToDto());
}

Apparently, the controllerName argument isn't required, so should be null. I had to experiment with various combinations to get it right, but I have self-hosted integration tests that cover this result. If it doesn't work as intended, tests will fail.

Returning this CreatedAtActionResult object produces an HTTP response like this:

HTTP/1.1 201 Created
Content-Type: application/json; charset=utf-8
Location: https://example.net:443/restaurants/2112/reservations/276d124f20cf4cc3b502f57b89433f80

{
  "id": "276d124f20cf4cc3b502f57b89433f80",
  "at": "2022-01-14T19:45:00.0000000",
  "email": "donkeyman@example.org",
  "name": "Don Keyman",
  "quantity": 4
}

I've edited and coloured the response for readability. The Location URL actually also includes a digital signature, which I've removed here just to make the example look a little prettier.

In any case, returning various HTTP headers and status codes is less discoverable when you don't have a base class with all the helper methods, but once you've figured out the objects to return, it's straightforward, and the code is simple.

Generating links #

A true (level 3) REST API uses hypermedia as the engine of application state; that is, links. This means that a HTTP response will typically include a JSON or XML representation with several links. This article includes several examples.

ASP.NET provides IUrlHelper for exactly that purpose, and if you inherit from ControllerBase the above-mentioned Url property gives you convenient access to just such an object.

When a class doesn't inherit from ControllerBase, you don't have convenient access to an IUrlHelper. Then what?

It's possible to get an IUrlHelper via the framework's built-in Dependency Injection engine, but if you add such a dependency to a POCO Controller, you'll have to jump through all sorts of hoops to configure it in unit tests. That was exactly the situation I wanted to avoid, so that got me thinking about design alternatives.

That was the real, underlying reason I came up with the idea to instead add REST links as a cross-cutting concern. The Controllers are wonderfully free of that concern, which helps keeping the complexity down of the unit tests.

I still have test coverage of the links, but I prefer testing HTTP-related behaviour via the HTTP API instead of relying on implementation details:

[Theory]
[InlineData("Hipgnosta")]
[InlineData("Nono")]
[InlineData("The Vatican Cellar")]
public async Task RestaurantReturnsCorrectLinks(string name)
{
    using var api = new SelfHostedApi();
    var client = api.CreateClient();
 
    var response = await client.GetRestaurant(name);
 
    var expected = new HashSet<string?>(new[]
    {
        "urn:reservations",
        "urn:year",
        "urn:month",
        "urn:day"
    });
    var actual = await response.ParseJsonContent<RestaurantDto>();
    var actualRels = actual.Links.Select(l => l.Rel).ToHashSet();
    Assert.Superset(expected, actualRels);
    Assert.All(actual.Links, AssertHrefAbsoluteUrl);
}

This test verifies that the representation returned in response contains four labelled links. Granted, this particular test only verifies that each link contains an absolute URL (which could, in theory, be http://www.example.com), but the whole point of fit URLs is that they should be opaque. I've other tests that follow links to verify that the API affords the desired behaviour.

Authorisation #

The final kind of behaviour that caused me a bit of trouble was authorisation. It's easy enough to annotate a Controller with an [Authorize] attribute, which is fine as long all you need is role-based authorisation.

I did, however, run into one situation where I needed something closer to an access control list. The system that includes all these code examples is a multi-tenant restaurant reservation system. There's one protected resource: a day's schedule, intended for use by the restaurant's staff. Here's a simplified example:

GET /restaurants/2112/schedule/2021/2/23 HTTP/1.1
Authorization: Bearer eyJhbGciOiJIUzI1NiIsInCI6IkpXVCJ9.eyJ...

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
{
  "name": "Nono",
  "year": 2021,
  "month": 2,
  "day": 23,
  "days": [
    {
      "date": "2021-02-23",
      "entries": [
        {
          "time": "19:45:00",
          "reservations": [
            {
              "id": "2c7ace4bbee94553950afd60a86c530c",
              "at": "2021-02-23T19:45:00.0000000",
              "email": "anarchi@example.net",
              "name": "Ann Archie",
              "quantity": 2
            }
          ]
        }
      ]
    }
  ]
}

Since this resource contains personally identifiable information (email addresses) it's protected. You have to present a valid JSON Web Token with required claims. Role claims, however, aren't enough. A minimum bar is that the token must contain a sufficient role claim like "MaitreD" shown above, but that's not enough. This is, after all, a multi-tenant system, and we don't want one restaurant's MaitreD to be able to see another restaurant's schedule.

If ASP.NET can address that kind of problem with annotations, I haven't figured out how. The Controller needs to check an access control list against the resource being accessed. The above-mentioned User property of ControllerBase would be really convenient here.

Again, there are ways to inject an entire ClaimsPrincipal class into the Controller that needs it, but once more I felt that that would violate the Interface Segregation Principle. I didn't need an entire ClaimsPrincipal; I just needed a list of restaurant IDs that a particular JSON Web Token allows access to.

As is often my modus operandi in such situations, I started by writing the code I wished to use, and then figured out how to make it work:

[HttpGet("restaurants/{restaurantId}/schedule/{year}/{month}/{day}")]
public async Task<ActionResult> Get(int restaurantId, int year, int month, int day)
{
    if (!AccessControlList.Authorize(restaurantId))
        return new ForbidResult();
 
    // Do the real work here...

AccessControlList is a Concrete Dependency. It's just a wrapper around a collection of IDs:

public sealed class AccessControlList
{
    private readonly IReadOnlyCollection<int> restaurantIds;

I had to create a new class in order to keep the built-in ASP.NET DI Container happy. Had I been doing Pure DI I could have just injected IReadOnlyCollection<int> directly into the Controller, but in this code base I used the built-in DI Container, which I had to configure like this:

services.AddHttpContextAccessor();
services.AddTransient(sp => AccessControlList.FromUser(
    sp.GetService<IHttpContextAccessor>().HttpContext.User));

Apart from having to wrap IReadOnlyCollection<int> in a new class, I found such an implementation preferable to inheriting from ControllerBase. The Controller in question only depends on the services it needs:

public ScheduleController(
    IRestaurantDatabase restaurantDatabase,
    IReservationsRepository repository,
    AccessControlList accessControlList)

The ScheduleController uses its restaurantDatabase dependency to look up a specific restaurant based on the restaurantId, the repository to read the schedule, and accessControlList to implement authorisation. That's what it needs, so that's its dependencies. It follows the Interface Segregation Principle.

The ScheduleController class is easy to unit test, since a test can just create a new AccessControlList object whenever it needs to:

[Fact]
public async Task GetScheduleForAbsentRestaurant()
{
    var sut = new ScheduleController(
        new InMemoryRestaurantDatabase(Some.Restaurant.WithId(2)),
        new FakeDatabase(),
        new AccessControlList(3));
 
    var actual = await sut.Get(3, 2089, 12, 9);
 
    Assert.IsAssignableFrom<NotFoundResult>(actual);
}

This test requests the schedule for a restaurant with the ID 3, and the access control list does include that ID. The restaurant, however, doesn't exist, so despite correct permissions, the expected result is 404 Not Found.

Conclusion #

ASP.NET has supported POCO Controllers for some time now, but it's clearly not a mainstream scenario. The documentation and Visual Studio tooling assumes that your Controllers inherit from one of the framework base classes.

You do, therefore, have to jump through a few hoops to make POCO Controllers work. The result, however, is more lightweight Controllers and better separation of concerns. I find the jumping worthwhile.

Comments

Dave Shaw #

Hi Mark, as I read the part about BadRequest and NotFound being less discoverable, I wondered if you had considered creating a ControllerHelper (terrible name I know) class with BadRequest, etc?
Then by adding using static Resteraunt.ControllerHelper as a little bit of boilerplate code to the top of each POCO controller, you would be able to do return BadRequest(); again.

2021-02-03 22:00 UTC

Mark Seemann #

Dave, thank you for writing. I hadn't thought of that, but it's a useful idea. Someone would have to figure out how to write such a helper class, but once it exists, it would offer the same degree of discoverability. I'd suggest a name like Result or HttpResult, so that you could write e.g. Result.BadRequest().

Wouldn't adding a static import defy the purpose, though? As I see it, C# discoverability is enabled by 'dot-driven development'. Don't you lose that by a static import?

2021-02-04 7:18 UTC

Self-hosted integration tests in ASP.NET

Monday, 25 January 2021 07:45:00 UTC

A way to self-host a REST API and test it through HTTP.

In 2020 I developed a sizeable code base for an online restaurant REST API. In the spirit of outside-in TDD, I found it best to test the HTTP behaviour of the API by actually interacting with it via HTTP.

Sometimes ASP.NET offers more than one way to achieve the same end result. For example, to return 200 OK, you can use both OkObjectResult and ObjectResult. I don't want my tests to be coupled to such implementation details, so by testing an API via HTTP instead of using the ASP.NET object model, I decouple the two.

You can easily self-host an ASP.NET web API and test it using an HttpClient. In this article, I'll show you how I went about it.

The code shown here is part of the sample code base that accompanies my book Code That Fits in Your Head.

Reserving a table #

In true outside-in fashion, I'll first show you the test. Then I'll break it down to show you how it works.

[Fact]
public async Task ReserveTableAtNono()
{
    using var api = new SelfHostedApi();
    var client = api.CreateClient();
    var at = DateTime.Today.AddDays(434).At(20, 15);
    var dto = Some.Reservation.WithDate(at).WithQuantity(6).ToDto();
 
    var response = await client.PostReservation("Nono", dto);
 
    await AssertRemainingCapacity(client, at, "Nono", 4);
    await AssertRemainingCapacity(client, at, "Hipgnosta", 10);
}

This test uses xUnit.net 2.4.1 to make a reservation at the restaurant named Nono. The first line that creates the api variable spins up a self-hosted instance of the REST API. The next line creates an HttpClient configured to communicate with the self-hosted instance.

The test proceeds to create a Data Transfer Object that it posts to the Nono restaurant. It then asserts that the remaining capacity at the Nono and Hipgnosta restaurants are as expected.

You'll see the implementation details soon, but I first want to discuss this high-level test. As is the case with most of my code, it's far from perfect. If you're not familiar with this code base, you may have plenty of questions:

Why does it make a reservation 434 days in the future? Why not 433, or 211, or 1?
Is there anything significant about the quantity 6?
Why is the expected remaining capacity at Nono 4?
Why is the expected remaining capacity at Hipgnosta 10? Why does it even verify that?

To answer the easiest question first: There's nothing special about 434 days. The only requirement is that it's a positive number, so that the reservation is in the future. That makes this test a great candidate for a property-based test.

The three other questions are all related. A bit of background is in order. I wrote this test during a process where I turned the system into a multi-tenant system. Before that change, there was only one restaurant, which was Hipgnosta. I wanted to verify that if you make a reservation at another restaurant (here, Nono) it changes the observable state of that restaurant, and not of Hipgnosta.

The way these two restaurants are configured, Hipgnosta has a single communal table that seats ten guests. This explains why the expected capacity of Hipgnosta is 10. Making a reservation at Nono shouldn't affect Hipgnosta.

Nono has a more complex table configuration. It has both standard and communal tables, but the largest table is a six-person communal table. There's only one table of that size. The next-largest tables are four-person tables. Thus, a reservation for six people reserves the largest table that day, after which only four-person and two-person tables are available. Therefore the remaining capacity ought to be 4.

The above test knows all this. You are welcome to criticise such hard-coded knowledge. There's a real risk that it might make it more difficult to maintain the test suite in the future.

Certainly, had this been a unit test, and not an integration test, I wouldn't have accepted so much implicit knowledge - particularly because I mostly apply functional architecture, and pure functions should have isolation. Functions shouldn't depend on implicit global state; they should return a value based on input arguments. That's a bit of digression, though.

These are integration tests, which I mostly use for smoke tests and to verify HTTP-specific behaviour. I have unit tests for fine-grained testing of edge cases and variations of input. While I wouldn't accept so much implicit knowledge from a unit test, I find that it so far works well with integration tests.

Self-hosting #

It only takes a WebApplicationFactory to self-host an ASP.NET API. You can use it directly, but if you want to modify the hosted service in some way, you can also inherit from it.

I want my self-hosted integration tests to run as state-based tests that use an in-memory database instead of SQL Server. I've defined SelfHostedApi for that purpose:

internal sealed class SelfHostedApi : WebApplicationFactory<Startup>
{
    protected override void ConfigureWebHost(IWebHostBuilder builder)
    {
        builder.ConfigureServices(services =>
        {
            services.RemoveAll<IReservationsRepository>();
            services.AddSingleton<IReservationsRepository>(new FakeDatabase());
        });
    }
}

The way that WebApplicationFactory works, its ConfigureWebHost method runs after the Startup class' ConfigureServices method. Thus, when ConfigureWebHost runs, the services collection is already configured to use SQL Server. As Julius H so kindly pointed out to me, the RemoveAll extension method removes all existing registrations of a service. I use it to remove the SQL Server dependency from the system, after which I replace it with a test-specific in-memory implementation.

Since the in-memory database is configured with Singleton lifetime, that instance is going to be around for the lifetime of the service. While it's only keeping track of things in memory, it'll keep state until the service shuts down, which happens when the above api variable goes out of scope.

Notice that I named the class by the role it plays rather than which base class it derives from.

Posting a reservation #

The PostReservation method is an extension method on HttpClient:

internal static async Task<HttpResponseMessage> PostReservation(
    this HttpClient client,
    string name,
    object reservation)
{
    string json = JsonSerializer.Serialize(reservation);
    using var content = new StringContent(json);
    content.Headers.ContentType.MediaType = "application/json";
 
    var resp = await client.GetRestaurant(name);
    resp.EnsureSuccessStatusCode();
    var rest = await resp.ParseJsonContent<RestaurantDto>();
    var address = rest.Links.FindAddress("urn:reservations");
 
    return await client.PostAsync(address, content);
}

It's part of a larger set of methods that enables an HttpClient to interact with the REST API. Three of those methods are visible here: GetRestaurant, ParseJsonContent, and FindAddress. These, and many other, methods form a client API for interacting with the REST API. While this is currently test code, it's ripe for being extracted to a reusable client SDK library.

I'm not going to show all of them, but here's GetRestaurant to give you a sense of what's going on:

internal static async Task<HttpResponseMessage> GetRestaurant(this HttpClient client, string name)
{
    var homeResponse = await client.GetAsync(new Uri("", UriKind.Relative));
    homeResponse.EnsureSuccessStatusCode();
    var homeRepresentation = await homeResponse.ParseJsonContent<HomeDto>();
    var restaurant = homeRepresentation.Restaurants.First(r => r.Name == name);
    var address = restaurant.Links.FindAddress("urn:restaurant");
 
    return await client.GetAsync(address);
}

The REST API has only a single documented address, which is the 'home' resource at the relative URL ""; i.e. the root of the API. In this incarnation of the API, the home resource responds with a JSON array of restaurants. The GetRestaurant method finds the restaurant with the desired name and finds its address. It then issues another GET request against that address, and returns the response.

Verifying state #

The verification phase of the above test calls a private helper method:

private static async Task AssertRemainingCapacity(
    HttpClient client,
    DateTime date,
    string name,
    int expected)
{
    var response = await client.GetDay(name, date.Year, date.Month, date.Day);
    var day = await response.ParseJsonContent<CalendarDto>();
    Assert.All(
        day.Days.Single().Entries,
        e => Assert.Equal(expected, e.MaximumPartySize));
}

It uses another of the above-mentioned client API extension methods, GetDay, to inspect the REST API's calendar entry for the restaurant and day in question. Each calendar contains a series of time entries that lists the largest party size the restaurant can accept at that time slot. The two restaurants in question only have single seatings, so once you've booked a six-person table, you have it for the entire evening.

Notice that verification is done by interacting with the system itself. No Back Door Manipulation is required. I favour this if at all possible, since I believe that it offers better confidence that the system behaves as it should.

Conclusion #

It's been possible to self-host .NET REST APIs for testing purposes at least since 2012, but it's only become easier over the years. All you need to get started is the WebApplicationFactory<TEntryPoint> class, although you're probably going to need a derived class to override some of the system configuration.

From there, you can interact with the self-hosted system using the standard HttpClient class.

Since I configure these tests to run on an in-memory database, the execution time is comparable to 'normal' unit tests. I admit that I haven't measured it, but that's because I haven't felt the need to do so.

Comments

Grzegorz Gałęzowski #

Hi Mark, 2 questions from me:

What kind of integration does this test verify? The integration between the HTTP request processing part and the service logic? If so, why fake the database only and not the whole logic?
If you fake the database here, then would you have a separate test for testing integration with DB (e.g. some kind of "adapter test", as described in the GOOS book)?

2021-01-25 20:34 UTC

Mark Seemann #

Grzegorz, thank you for writing.

1. As I already hinted at, the motivation for this test was that I was expanding the code from a single-tenant to a multi-tenant system. I did this in small steps, using tests to drive the augmented behaviour. Before I added the above test, I'd introduced the concept of a restaurant ID to identify each restaurant, and initially added a method overload to my data access interface that required such an ID:

Task Create(int restaurantId, Reservation reservation);

I was using the Strangler pattern to be able to move in small steps. One step was to add overloads like the above. Subsequent steps would be to move existing callers from the 'legacy' overload to the new overload, and then finally delete the 'legacy' overload.

When moving from a single-tenant system to a multi-tenant system, I had to grandfather in the existing restaurant, which I arbitrarily chose to give the restaurant ID 1. During this process, I kept the existing system working so as to not break existing clients.

I gradually introduced method overloads that took a restaurant ID as a parameter (as above), and then deleted the legacy methods that didn't take a restaurant ID. In order to not break anything, I'd initially be calling the new methods with the hard-coded restaurant ID 1.

The above test was one in a series of tests that followed. Their purpose was to verify that the system had become truly multi-tenant. Before I added that test, an attempt to make a reservation at Nono would result in a call to Create(1, reservation) because there was still hard-coded restaurant IDs left in the code base.

(To be honest, the above is a simplified account of events. It was actually more complex than that, since it involved hypermedia controls. I also, for reasons that I may divulge in the future, didn't perform this work behind a feature flag, which under other circumstances would have been appropriate.)

What the above test verifies, then, is that the reservation gets associated with the correct restaurant, instead of the restaurant that was grandfathered in (Hipgnosta). An in-memory (Fake) database was sufficient to demonstrate that.

Why didn't I use the real, relational database? Read on.

2. As I've recently discussed, integration tests that involve SQL Server are certainly possible. They tend to be orders-of-magnitudes slower than unit tests that run entirely in memory, so I only add them if I deem them necessary.

I interact with databases according to the Dependency Inversion Principle. The above Create method is an example of that. The method is defined by the needs of the client code rather than on implementation details.

The actual implementation of the data access interface is, as you imply, an Adapter. It adapts the SQL Server SDK (i.e. ADO.NET) to my data access interface. As long as I can keep such Adapters Humble Objects I don't cover them by tests.

Such Adapters are often just mappings: This class field maps to that table column, that field to that other column, and so on. If there's no logic, there isn't much to test.

This doesn't mean that bugs can't appear in database Adapters. If that happens, I may introduce regression tests that involve the database. I prefer to do this in a separate test suite, however, since such tests tend to be slow, and TDD test suites should be fast.

2021-01-26 8:32 UTC

Brian Elgaard Bennett #

Hi Mark, I have been a fan of "integration" style testing since I found Andrew Locks's post on the topic. In fact, I personally find that it makes sense to take that idea to one extreme.

As far as I can see, your test assumes something like "Given that no reservations have been made". Did you consider to make that "Given" explicitly recognizable in your test?

I am asking because this is a major pet topic of mine and I would love to hear your thoughts. I am not suggesting writing Gherkin instead of C# code, just if and how you would make the "Given" assumption explicit.

A more practical comment is on your use of services.RemoveAll<IReservationsRepository>(); It seems extreme to remove all, as all you want to achieve is to overwrite a few service declarations. Andrew Lock demonstrates how to keep all the ASP.NET middleware intact by keeping most of the service registrations.

I usually use ConfigureTestServices on WebApplicationFactory, something like,

factory.WithWebHostBuilder(builder => builder.ConfigureTestServices(MyTestCompositionRoot.Initialize));

2021-02-22 10:40 UTC

Mark Seemann #

Brian, thank you for writing. You're correct that one implicit assumption is that no prior reservations have been made to Nono on that date. The reality is, as you suggest, that there are no reservations at all, but the test doesn't require that. It only relies on the weaker assumption that there are prior reservations to neither Nono nor Hipgnosta on the date in question.

I often find it tricky to make 'negative' assumptions explicit. How much should one highlight? Is it also relevant to highlight that SelfHostedApi isn't going to send notification emails? That it doesn't write to a persistent log? That it doesn't have the capability to launch missiles?

I'm not saying that it's irrelevant to highlight certain assumptions, but I'm not sure that I have a clear heuristic for it. After all, it depends on what the reader cares about, and how clear the assumptions are.

If it became clear to me that the assumption of 'no prior reservations' was important to the reader, I might consider to embed that in the name of the test, or perhaps by adding a comment. Another option is to hide the initialisation of SelfHostedApi behind a required factory method so that you'd have to initialise it as using var api = SelfHostedApi.StartEmpty();

In this particular code base, though, that SelfHostedApi is used repeatedly. A programmer who regularly works with that code base will quickly internalise the assumptions embedded in that self-hosted service.

Granted, it can be a dangerous strategy to rely too much on implicit assumptions, because it can make it more difficult to change things if those assumptions change in the future. I do, after all, favour the wisdom from the Zen of Python: Explicit is better than implicit.

Making such things as we discuss here explicit does, however, tend to make the tests more verbose. Pragmatically, then, there's a balance to strike. I'm not claiming that the test shown here is perfect. In a a living code base, this would be a concern that I would keep a constant eye on. I'd fiddle with it, trying out making things more explicit, less explicit, and so on, to see what works best in context.

Your other point about services.RemoveAll<IReservationsRepository>() I don't understand. Why is that extreme? It's literally the most minimal, precise change I can make. It removes only the IReservationsRepository, of which there's exactly one Singleton instance:

var connStr = Configuration.GetConnectionString("Restaurant");
services.AddSingleton<IReservationsRepository>(sp =>
{
    var logger = sp.GetService<ILogger<LoggingReservationsRepository>>();
    var postOffice = sp.GetService<IPostOffice>();
    return new EmailingReservationsRepository(
        postOffice,
        new LoggingReservationsRepository(
            logger,
            new SqlReservationsRepository(connStr)));
});

The integration test throws away that Singleton object graph, but nothing else.

2021-02-23 9:38 UTC

Brian Elgaard Bennett #

Hi Mark, thanks for a thorough answer.

Alas, my second question was not thought through - I somehow thought you threw away all service registrations, which you obviously do not. In fact, I do exactly what you do, usually with Replace and thereby implicitly assume that there is only one registration. My Bad.

We very much agree in the importance of favouring explicit assumptions.

The solution to stating 'negative' assumptions, as long as we talk about the 'initial context' (the 'Given') is actually quite simple if your readers know that you always state assumptions explicitly. If none are stated, it's the same as saying 'the world is empty' - at least when it comes to the bounded context of whatever you are testing. I find that the process of identifying a minimal set of assumptions per test is a worthwhile exercise.

So, IMHO it boils down to consistently stating all assumptions and making the relevant bounded context clear.

2021-02-23 12:18 UTC

Mark Seemann #

Brian, thank you for writing. I agree that it's a good rule of thumb to expect the world to be empty unless explicitly stated otherwise. This is the reason that I love functional programming.

One question, however, is whether a statement is sufficiently explicit. The integration test shown here is a good example. What's actually happening is that SelfHostedApi sets up a self-hosted service configured in a certain way. It contains three restaurants, of which two are relevant in this particular test. Each restaurant comes with quite a bit of configuration: opening time, closing time, seating duration, and the configuration of tables. Just consider the configuration of Nono:

{
  "Id": 2112,
  "Name": "Nono",
  "OpensAt": "18:00",
  "LastSeating": "21:00",
  "SeatingDuration": "6:00",
  "Tables": [
    {
      "TableType": "Communal",
      "Seats": 6
    },
    {
      "TableType": "Communal",
      "Seats": 4
    },
    {
      "TableType": "Standard",
      "Seats": 2
    },
    {
      "TableType": "Standard",
      "Seats": 2
    },
    {
      "TableType": "Standard",
      "Seats": 4
    },
    {
      "TableType": "Standard",
      "Seats": 4
    }
  ]
}

The code base also enables a test writer to express the above configuration as code instead of JSON, but the point I'm trying to make is that you probably don't want that amount of detail to appear in the Arrange phase of each test, regardless of how explicit it'd be.

To work around an issue like this, you might instead define an immutable test data variable called nono. Then, in each test that uses these restaurants, you might include something like AddRestaurant(nono) or AddRestaurant(hipgnosta) in the Arrange phase (hypothetical API). That's not quite as explicit, because it hides the actual values away, but is probably still good enough.

Then you find yourself always adding the same restaurants, so you say to yourself: It'd be nice if I had a helper method like AddStandardRestaurants. So, you may add such a helper method.

But you find that for the integration tests, you always call that helper method, so ultimately, you decide to just roll it into the definition of SelfHostedApi. That's basically what's going on here.

I don't recall whether I explicitly stated the following in the article, but I use integration tests consistent with the test pyramid. The purpose of the integration tests is to verify that components integrate correctly, and that the HTTP API behaves according to contract. In my experience, I can let the 'givens' be somewhat implicit and still keep the code maintainable. For integration tests, that is.

On the unit level, I favour pure functions, which are intrinsically testable. Due to isolation, everything that affects the behaviour of a pure function must be explicitly passed as arguments, and as a bi-product, they become explicit as part of a test's Arrange phase.

But, again, let me reiterate that I agree with you. I don't claim that the code shown here is perfect. It's so implicit that it makes me uncomfortable too, but on the other hand, I think that a test like the above communicates intent well. The more explicit details one adds, the more they may drown out the intent. It's a difficult balance to strike.

2021-02-25 12:34 UTC

Brian Elgaard Bennett #

Hi Mark, I'm happy that we agree. Still, this is a pet topic of mine so please allow me one additional comment.

I wish all code had at least a functional core, as that would make testing so much easier. You could say that explicitly stating initial context resembles testing pure functions.

I do believe that it is possible to state initial context without being overwhelmed by details. It can by no means be fully automated - this is where the mind of a tester must be used. Also, it will never be perfect, but at least it can be documented somewhat systematically and kept under version control, so the reasoning can be challenged.

In your example, I would not add Nono or Hipgnosta or a standard restaurant, as those names say noting about what makes my test pass.

Rather, I would add e.g. a "restaurant with a 6 and a 4 person table", as this seems to be what makes your test pass, if I understood it correctly. In another test I might have "restaurant which is open from 6 PM with last seating at 9 PM" if I want to test time aspects of booking. It may be the same Json file used in those two tests, because what's most important is that you document what makes each individual test pass.

However, if it's not too much trouble (it sometimes is) I would even try to minimize the initial context. For example, I would test timing aspects with a minimal restaurant with a single table. Unless, timing and tables interact - maybe late bookings are OK if the restaurant is starved for customers? This is where the "tester mind" comes in handy.

In your restaurant example, that would mean having quite a few restaurant definitions, but only definitions which include the needed context for each test. My experience is that minimizing the initial context will teach you a lot about the code. This is similar to unit tests which focus on a small part of code, but "minimizing initial context" allows for less fragile tests than when "minimizing code".

We started using this approach because our tests did not give sufficient confidence. Tests were flaky and it was difficult to see what was tested and why. I can safely say that this is much better now.

My examples of initial context above may appear like they could grow out of control, but our experience so far is that it is possible to find a relatively small set of equivalence classes of initial context items, even for fairly complex functionality.

2021-02-28 15:24 UTC

Mark Seemann #

Brian, I agree that in unit testing, being able to reduce test cases to minimal examples is useful. Once you have those minimal examples, being explicit about all preconditions is more attainable. This is why, when I coach teams, I try to teach them about equivalence class partitioning. If, within each equivalence class, you can impose some sort of (partial) ordering, you may be able to pick a canonical representation that constitutes the minimal example.

This is basically what the shrinking process of property-based testing frameworks attempt to do.

Thus, when my testing goal is to explore the boundaries of the system under test, I go to great lengths to ensure that the arrange phase is as explicit as possible. I recently published an example of how I use property-based testing to such an end.

The goal of the integration test in the present article, however, is not to explore boundary cases or verify the correctness of algorithms in question. I have unit tests for that. The goal is to examine whether things may be correctly 'clicked together'.

Expressing an explicit minimal context for the above test is more involved than you outline. Having two restaurants, with two differently sized tables, is only the beginning. You must also ensure that they are open at the same time, so that an attempted reservation would fit both. Furthermore, you must then pick the date and time of the reservation to fit within that time window. This is still not hopelessly difficult, and might make for a good exercise, but it could easily add 4-5 extra lines of code to the test.

And again: had this been a unit test, I'd happily added those extra lines, but for an integration test, I felt that it was less important.

2021-03-02 11:24 UTC

Parametrised test primitive obsession code smell

Monday, 18 January 2021 06:30:00 UTC

Watch out for this code smell with some unit testing frameworks.

In a previous article you saw this state-based integration test:

[Theory]
[InlineData(1049, 19, 00, "juliad@example.net", "Julia Domna", 5)]
[InlineData(1130, 18, 15, "x@example.com", "Xenia Ng", 9)]
[InlineData( 956, 16, 55, "kite@example.edu", null, 2)]
[InlineData( 433, 17, 30, "shli@example.org", "Shanghai Li", 5)]
public async Task PostValidReservationWhenDatabaseIsEmpty(
    int days,
    int hours,
    int minutes,
    string email,
    string name,
    int quantity)
{
    var at = DateTime.Now.Date + new TimeSpan(days, hours, minutes, 0);
    var db = new FakeDatabase();
    var sut = new ReservationsController(
        new SystemClock(),
        new InMemoryRestaurantDatabase(Grandfather.Restaurant),
        db);
 
    var dto = new ReservationDto
    {
        Id = "B50DF5B1-F484-4D99-88F9-1915087AF568",
        At = at.ToString("O"),
        Email = email,
        Name = name,
        Quantity = quantity
    };
    await sut.Post(dto);
 
    var expected = new Reservation(
        Guid.Parse(dto.Id),
        DateTime.Parse(dto.At, CultureInfo.InvariantCulture),
        new Email(dto.Email),
        new Name(dto.Name ?? ""),
        dto.Quantity);
    Assert.Contains(expected, db.Grandfather);
}

This was the test after I improved it. Still, I wasn't satisfied with it. It has several problems. Take a few moments to consider it. Can you identify any problems? Which ones?

The code shown here is part of the sample code base that accompanies my book Code That Fits in Your Head.

Size #

I know that you're not familiar with all the moving parts. You don't know how ReservationDto or Reservation are implemented. You don't know what InMemoryRestaurantDatabase is, or how ReservationsController behaves. Still, the issues I have in mind aren't specific to a particular code base.

I feel that the method is verging on being too big. Quantifiably, it doesn't fit in an 80x24 box, but that's just an arbitrary threshold anyway. Still, I think it's grown to a size that makes me uncomfortable.

If you aren't convinced, think of this code example as a stand-in for something larger. In the above test, a reservation contains five smaller values (Id, At, Email, Name, and Quantity). How would a similar test look if the object in question contains ten or twenty values?

In the decades I've been programming and consulting, I've seen plenty of code bases. Data objects made from twenty fields are hardly unusual.

What would a similar test look like if the dto and the expected object required twenty smaller values?

The test would be too big.

Primitive obsession #

A test like this one contains a mix of essential behaviour and implementation details. The behaviour that it verifies is that when you Post a valid dto, the data makes it all the way to the database.

Exactly how the dto or the expected value are constructed is less relevant for the test. Yet it's intermingled with the test of behaviour. The signal-to-noise ratio in the test isn't all that great. What can you do to improve things?

As given, it seems difficult to do much. The problem is primitive obsession. While this is a parametrised test, all the method parameters are primitives: integers and strings. The makes it hard to introduce useful abstractions.

In C# (and probably other languages as well) parametrised tests often suffer from primitive obsession. The most common data source is an attribute (AKA annotation), like xUnit.net's [InlineData] attribute. This isn't a limitation of xUnit.net, but rather of .NET attributes. You can only create attributes with primitive values and arrays.

What is a limitation of xUnit.net (and the other mainstream .NET testing frameworks, as far as I know) is that tests aren't first-class values. In Haskell, by contrast, it's easy to write parametrised tests using the normal language constructs exactly because tests are first-class values. (I hope that the next version of xUnit.net will support tests as first-class values.)

Imagine that instead of only five constituent fields, you'd have to write a parametrised test for objects with twenty primitive values. As long as you stick with attribute-based data sources, you'll be stuck with primitive values.

Granted, attributes like [InlineData] are lightweight, but over the years, my patience with them has grown shorter. They lock me into primitive obsession, and I don't appreciate that.

Essential test #

While tests as first-class values aren't an option in xUnit.net, you can provide other data sources for the [Theory] attribute than [InlineData]. It's not as lightweight, but it breaks the primitive obsession and re-enables normal code design techniques. It enables you to reduce the test itself to its essence. You no longer have to think in primitives, but can instead express the test unshackled by constraints. As a first pass, I'd like the test to look like this:

[Theory]
[ClassData(typeof(PostValidReservationWhenDatabaseIsEmptyTestCases))]
public async Task PostValidReservationWhenDatabaseIsEmpty(
    ReservationDto validDto, Reservation expected)
{
    var db = new FakeDatabase();
    var sut = new ReservationsController(
        new SystemClock(),
        new InMemoryRestaurantDatabase(Grandfather.Restaurant),
        db);
 
    await sut.Post(validDto);
 
    Assert.Contains(expected, db.Grandfather);
}

This version of the test eliminates the noise. How validDto and expected are constructed is an implementation detail that has little bearing on the behaviour being tested.

For a reader of the code, it's should now be clearer what's at stake here: If you Post a validDto the expected reservation should appear in the database.

Reducing the test code to its essentials made me realise something that hitherto had escaped me: that I could name the DTO by role. Instead of just dto, I could call the parameter validDto.

Granted, I could also have done that previously, but I didn't think of it. There's was so much noise in that test that I didn't stop to consider whether dto sufficiently communicated the role of that variable.

The less code, the easier it becomes to think such things through, I find.

In any case, the test code now much more succinctly expresses the essence of the desired behaviour. Notice how I started my refactoring by writing the desired test code. I've yet to implement the data source. Now that the data source expresses test data as full objects, I'm not so concerned with whether or not that's going to be possible. Of course it's possible.

Object data source #

You can define data sources for xUnit.net as classes or methods. In C# I usually reach for the [ClassData] option, since an object (in C#, that is) gives me better options for further decomposition. For example, I can define a class and delegate the details to helper methods:

private class PostValidReservationWhenDatabaseIsEmptyTestCases :
    TheoryData<ReservationDto, Reservation>
{
    public PostValidReservationWhenDatabaseIsEmptyTestCases()
    {
        AddWithName(1049, 19, 00, "juliad@example.net", "Julia Domna", 5);
        AddWithName(1130, 18, 15, "x@example.com", "Xenia Ng", 9);
        AddWithoutName(956, 16, 55, "kite@example.edu", 2);
        AddWithName(433, 17, 30, "shli@example.org", "Shanghai Li", 5);
    }
 
    // More members here...

Here, I'm taking advantage of xUnit.net's built-in TheoryData<T1, T2> base class, but that's just a convenience. All you have to do is to implement IEnumerable<object[]>.

As you can see, the constructor adds the four test cases by calling two private helper methods. Here's the first of those:

private const string id = "B50DF5B1-F484-4D99-88F9-1915087AF568";
 
private void AddWithName(
    int days,
    int hours,
    int minutes,
    string email,
    string name,
    int quantity)
{
    var at =
        DateTime.Now.Date + new TimeSpan(days, hours, minutes, 0);
    Add(new ReservationDto
        {
            Id = id,
            At = at.ToString("O"),
            Email = email,
            Name = name,
            Quantity = quantity
        },
        new Reservation(
            Guid.Parse(id),
            at,
            new Email(email),
            new Name(name),
            quantity));
}

The other helper method is almost identical, although it has a slight variation when it comes to the reservation name:

private void AddWithoutName(
    int days,
    int hours,
    int minutes,
    string email,
    int quantity)
{
    var at =
        DateTime.Now.Date + new TimeSpan(days, hours, minutes, 0);
    Add(new ReservationDto
        {
            Id = id,
            At = at.ToString("O"),
            Email = email,
            Quantity = quantity
        },
        new Reservation(
            Guid.Parse(id),
            at,
            new Email(email),
            new Name(""),
            quantity));
}

In total, this refactoring results in more code, so how is this an improvement?

The paradox of decomposition #

In object-oriented design, decomposition tends to lead to more code. If you want to isolate and make reusable a particular piece of behaviour, you'll usually introduce an interface or a base class. Even stateless functions need a static class to define them. (To be fair, functional programming isn't entirely devoid of such overhead associated with decomposition, but the cost tends to smaller.) This leads to more code, compared with the situation before decomposition.

This is a situation you may also encounter if you attempt to refactor to design patterns, or follow the SOLID principles. You'll have more code than when you started. This often leads to resistance to such 'code bloat'.

It's fine to resist code bloat. It's also fine to dislike 'complexity for complexity's sake'. Try to evaluate each potential code change based on advantages and disadvantages. I'm not insisting that the above refactoring is objectively better. I did feel, however, that I had a problem that I ought to address, and that this was a viable alternative. The result is more code, but each piece of code is smaller and simpler.

You can, conceivably, read the test method itself to get a feel for what it tests, even if you don't know all the implementation details. You can read the four statements in the PostValidReservationWhenDatabaseIsEmptyTestCases constructor without, I hope, understanding all the details about the two helper methods. And you can read AddWithName without understanding how AddWithoutName works, and vice versa, because these two methods don't depend on each other.

Conclusion #

In this article, I've described how the use of code annotations for parametrised tests tend to pull in the direction of primitive obsession. This is a force worth keeping an eye on, I think.

You saw how to refactor to class-based test data generation. This enables you to use objects instead of primitives, thus opening your design palette. You can now use all your usual object-oriented or functional design skills to factor the code in a way that's satisfactory.

Was it worth it in this case? Keep in mind that the original problem was already marginal. While the code didn't fit in a 80x24 box, it was only 33 lines of code (excluding the test data). Imagine, however, that instead of a five-field reservation, you'd be dealing with a twenty-field data class, and such a refactoring begins to look more compelling.

Is the code now perfect? It still isn't. I'm a little put off by the similarity of AddWithName and AddWithoutName. I'm also aware that there's a trace of production code duplicated in the test case, in the way that the test code duplicates how a valid ReservationDto relates to a Reservation. I'm on the fence whether I should do anything about this.

At the moment I'm inclined to heed the rule of three. The duplication is still too insubstantial to warrant refactoring, but it's worth keeping an eye on.

Comments

Mateusz #

Hi Mark, fair point! Indeed, sometimes [InlineData] is too simple and having test data creation moved to a separate class with all its advantages is a valid approach (using f.e. [ClassData]).

I just wanted to clarify on TheoryData class, I noticed that it was added in xUnit v3. But I can't find this new version anywhere on NuGet - are you using it from some early adopter feed? Thanks!

2021-05-22 12:37 UTC

Mark Seemann #

Mateusz, thank you for writing. The code shown in this article uses xUnit.net 2.4.1, which (apparently) includes the TheoryData base classes. I must admit that I don't know exactly when they were added to the library.

2021-05-23 18:39 UTC

Waiting to happen

Monday, 11 January 2021 06:31:00 UTC

A typical future test maintenance problem.

In a recent article I showed a unit test and parenthetically mentioned that it might have a future maintenance problem. Here's a more recent version of the same test. Can you tell what the future issue might be?

[Theory]
[InlineData("2023-11-24 19:00", "juliad@example.net", "Julia Domna", 5)]
[InlineData("2024-02-13 18:15", "x@example.com", "Xenia Ng", 9)]
[InlineData("2023-08-23 16:55", "kite@example.edu", null, 2)]
[InlineData("2022-03-18 17:30", "shli@example.org", "Shanghai Li", 5)]
public async Task PostValidReservationWhenDatabaseIsEmpty(
    string at, string email, string name, int quantity)
{
    var db = new FakeDatabase();
    var sut = new ReservationsController(
        new SystemClock(),
        new InMemoryRestaurantDatabase(Grandfather.Restaurant),
        db);
 
    var dto = new ReservationDto
    {
        Id = "B50DF5B1-F484-4D99-88F9-1915087AF568",
        At = at,
        Email = email,
        Name = name,
        Quantity = quantity
    };
    await sut.Post(dto);
 
    var expected = new Reservation(
        Guid.Parse(dto.Id),
        DateTime.Parse(dto.At, CultureInfo.InvariantCulture),
        new Email(dto.Email),
        new Name(dto.Name ?? ""),
        dto.Quantity);
    Assert.Contains(expected, db.Grandfather);
}

To be honest, there's more than one problem with this test, but presently I'm going to focus on one of them.

Since you don't know the details of the implementation, you may not be able to tell what the problem might be. It's not a trick question. On the other hand, you might still be able to guess, just from the clues available in the above code listing.

The code shown here is part of the sample code base that accompanies my book Code That Fits in Your Head.

Sooner or later #

Here are some clues to consider: I'm writing this article in the beginning of 2021. Consider the dates supplied via the [InlineData] attributes. Seen from 2021, they're all in the future.

Notice, as well, that the sut takes a SystemClock dependency. You don't know the SystemClock class (it's a proprietary class in this code base), but from the name I'm sure that you can imagine what it represents.

From the perspective of early 2021, all dates are going to be in the future for more than a year. What is going to happen, though, once the test runs after March 18, 2022?

That test case is going to fail.

You can't tell from the above code listing, but the system under test rejects reservations in the past. Once March 18, 2022 has come and gone, the reservation at "2022-03-18 17:30" is going to be in the past. The sut will reject the reservation, and the assertion will fail.

You have to be careful with tests that rely on the system clock.

Test Double? #

The fundamental problem is that the system clock is non-deterministic. A typical reaction to non-determinism in unit testing is to introduce a Test Double of some sort. Instead of using the system clock, you could use a Stub as a stand-in for the real time.

This is possible here as well. The ReservationsController class actually depends on an IClock interface that SystemClock implements. You could define a test-specific ConstantClock implementation that would always return a constant date and time. This would actually work, but would rely on an implementation detail.

At the moment, the ReservationsController only calls Clock.GetCurrentDateTime() a single time to get the current time. As soon as it has that value, it passes it to a pure function, which implements the business logic:

var now = Clock.GetCurrentDateTime();
if (!restaurant.MaitreD.WillAccept(now, reservations, reservation))
    return NoTables500InternalServerError();

A ConstantClock would work, but only as long as the ReservationsController only calls Clock.GetCurrentDateTime() once. If it ever began to call this method multiple times to detect the passing of time, using a constant time value would mostly likely again break the test. This seems brittle, so I don't want to go that way.

Relative time #

Working with the system clock in automated tests is easier if you deal with relative time. Instead of defining the test cases as absolute dates, express them as days into the future. Here's one way to refactor the test:

[Theory]
[InlineData(1049, 19, 00, "juliad@example.net", "Julia Domna", 5)]
[InlineData(1130, 18, 15, "x@example.com", "Xenia Ng", 9)]
[InlineData( 956, 16, 55, "kite@example.edu", null, 2)]
[InlineData( 433, 17, 30, "shli@example.org", "Shanghai Li", 5)]
public async Task PostValidReservationWhenDatabaseIsEmpty(
    int days,
    int hours,
    int minutes,
    string email,
    string name,
    int quantity)
{
    var at = DateTime.Now.Date + new TimeSpan(days, hours, minutes, 0);
    var db = new FakeDatabase();
    var sut = new ReservationsController(
        new SystemClock(),
        new InMemoryRestaurantDatabase(Grandfather.Restaurant),
        db);
 
    var dto = new ReservationDto
    {
        Id = "B50DF5B1-F484-4D99-88F9-1915087AF568",
        At = at.ToString("O"),
        Email = email,
        Name = name,
        Quantity = quantity
    };
    await sut.Post(dto);
 
    var expected = new Reservation(
        Guid.Parse(dto.Id),
        DateTime.Parse(dto.At, CultureInfo.InvariantCulture),
        new Email(dto.Email),
        new Name(dto.Name ?? ""),
        dto.Quantity);
    Assert.Contains(expected, db.Grandfather);
}

The absolute dates always were fairly arbitrary, so I just took the current date and converted the dates to a number of days into the future. Now, the first test case will always be a date 1,049 days (not quite three years) into the future, instead of November 24, 2023.

The test is no longer a failure waiting to happen.

Conclusion #

Treating test cases that involve time and date as relative to the current time, instead of as absolute values, is usually a good idea if the system under test depends on the system clock.

It's always a good idea to factor as much code as you can as pure functions, like the above WillAccept method. Pure functions don't depend on the system clock, so here you can safely pass absolute time and date values. Pure functions are intrinsically testable.

Still, as the test pyramid suggests, relying exclusively on unit tests isn't a good idea. The test shown in this article isn't really a unit test, but rather a state-based integration test. It relies on both the system clock and a Fake database. Expressing the test cases for this test as relative time values effectively addresses the problem introduced by the system clock.

There are plenty of other problems with the above test. One thing that bothers me is that the 'fix' made the line count grow. It didn't quite fit into a 80x24 box before, but now it's even worse! I should do something about that, but that's a topic for another article.

Dynamic test oracles for rho problems

Monday, 04 January 2021 06:26:00 UTC

A proof of concept of cross-branch testing for compiled languages.

Hillel Wayne recently published an article called Cross-Branch Testing. It outlines an approach to a class of problems that are hard to test. He mentions computer vision and simulations, among others. I can add that it's also difficult to write intuitive tests of convex hulls and Conway's game of life.

Hillel Wayne calls these rho problems, 'just because'. I'm totally going to run with that term.

In the article, he outlines an approach where you test an iteration of rho code against a 'last known good' snapshot. He uses git worktree to set up a snapshot of the reference implementation. He then writes a property that compares the refactored code's behaviour against the reference.

The example code is in Python, which is a language that I don't know. As far as I can tell, it works because Python is 'lightweight' enough that you can load and execute source code directly. I found that the approach makes much sense, but I wondered how it would apply for statically typed, compiled languages. I decided to create a proof of concept in F#.

Test cases from Python #

My first problem was to port Hillel Wayne's example rho problem to F#. The function f doesn't have any immediate mathematical properties; nor is its behaviour intuitive. While I think that I understand what each line of code in f means, I don't really know Python. Since one of the properties of rho problems is that bugs can be subtle, I didn't trust myself to be able to port the Python code to F# without some test cases.

To solve that problem, I first found an online Python interpreter and pasted the f function into it. I then wrote code to print the output of a function call:

print(f'1, 2, 3, { f(1, 2, 3) }')

This line of code produces this output:

1, 2, 3, True

In other words, I could produce comma-separated values of input and actual output.

Hillel Wayne wrote properties using Hypothesis, which, it seems, by default runs each property 200 times.

In F# I'm going to use FsCheck, so I first used F# Interactive with FsCheck to produce 200 Python print statements like the above:

> Arb.Default.Int32().Generator
|> Gen.three
|> Gen.map (fun (x, y, z) -> sprintf "print(f'%i, %i, %i, { f(%i, %i, %i) }')" x y z x y z)
|> Gen.sample 100 200
|> List.iter (printfn "%s");;
print(f'-77, 67, 84, { f(-77, 67, 84) }')
print(f'58, -46, 3, { f(58, -46, 3) }')
print(f'21, 13, 94, { f(21, 13, 94) }')
...

This is a throwaway data pipeline that starts with an FsCheck integer generator, creates a triple from it, turns that triple into a Python print statement, and finally writes 200 of those to the console. The above code listing only shows the first three lines of output, while the rest are indicated by an ellipsis.

I copied those 200 print statements over to the online Python interpreter and ran the code. That produced 200 comma-separated values like these:

-77, 67, 84, False
58, -46, 3, False
21, 13, 94, True
...

These can serve as test cases for porting the Python code to F#.

Port to F# #

The next step is to write a parametrised test, using a provisional implementation of f:

[<Theory; MemberData(nameof fTestCases)>]
let ``test f`` x y z expected = expected =! f x y z

This test uses xUnit.net 2.4.1 and Unquote 5.0.0. As you can tell, apart from the annotations, it's a true one-liner. It calls the f function with the three supplied arguments x, y, and z and compares the return value with the expected value.

The code uses the new nameof feature of F# 5. fTestCases is a function in the same module that holds the test:

// unit -> seq<obj []>
let fTestCases () =
    use strm = typeof<Anchor>.Assembly.GetManifestResourceStream streamName
    use rdr = new StreamReader (strm)
    let s = rdr.ReadToEnd ()
    s.Split Environment.NewLine |> Seq.map csvToTestCase

It reads an embedded resource stream of test cases, like the above comma-separated values. Even though the values are in a text file, it's easier to embed the file in the test assembly, because it nicely dispenses with the problem of copying a text file to the appropriate output directory when the code compiles. That would, however, be an valid alternative.

Anchor is a dummy type to support typeof, and streamName is just a string constant that identifies the name of the stream.

The csvToTestCase function converts each line of comma-separated values to test cases for the [<Theory>] attribute:

// string -> obj []
let csvToTestCase (csv : string) =
    let values = csv.Split ','
    [|
        values.[0] |> Convert.ToInt32 |> box
        values.[1] |> Convert.ToInt32 |> box
        values.[2] |> Convert.ToInt32 |> box
        values.[3] |> Convert.ToBoolean |> box
    |]

It's not the safest code I could write, but this is, after all, a proof of concept.

The most direct port of the Python code I could produce is this:

// f : int -> int -> int -> bool
let f (x : int) (y : int) (z : int) =
    let mutable mx = bigint x
    let mutable my = bigint y
    let mutable mz = bigint z
    let mutable out = 0I
    for i in [0I..9I] do
        out <- out * mx + abs (my * mz - i * i)
        let x' = mx
        let y' = my
        let z' = mz
        mx <- y' + 1I
        my <- z'
        mz <- x'
    abs out % 100I < 10I

As F# code goes, it's disagreeable, but it passes all 200 test cases, so this will serve as an initial implementation. The out variable can grow to values that overflow even 64-bit integers, so I had to convert to bigint to get all test cases to pass.

If I make the same mutation to the code that Hillel Wayne did (abs out % 100I < 9I) two test cases fail. This gives me some confidence that I have a degree of problem coverage comparable to his.

Test oracle #

Now that a reference implementation exists, we can use it as a test oracle for refactorings. You can, for example, add a little test-only utility to your program portfolio:

open Prod
open FsCheck
 
[<EntryPoint>]
let main argv =
    Arb.Default.Int32().Generator
    |> Gen.three
    |> Gen.sample 100 200
    |> List.iter (fun (x, y, z) -> printfn "%i, %i, %i, %b" x y z (f x y z))
 
    0 // return an integer exit code

Notice that the last step in the pipeline is to output the values of each x, y, and z, as well as the result of calling f x y z.

This is a command-line executable that uses FsCheck to produce new test cases by calling the f function. It looks similar to the above one-off script that produced Python code, but this one instead just produces comma-separated values. You can run it from the command line to produce a new sample of test cases:

$ ./foracle
29, -48, -78, false
-8, -25, 13, false
-74, 34, -68, true
...

As above, I've used an ellipsis to indicate that in reality, 200 lines of comma-separated values scroll by.

When you use Bash, you can even pipe the output straight to a file:

$ ./foracle > csv.txt

You can now take the new comma-separated values and update the test values that the above test f test uses.

In other words, you use version n of f as a test oracle for version n + 1. When iteration n + 1 is a function of iteration n, you have a so-called dynamic system, so I think that we can call this technique dynamic test oracles.

The above foracle program is just a proof of concept. You could make it more flexible by making it take command-line arguments that would let you control the sample size and FsCheck's size parameter (the hard-coded 100 in the above code listing).

Refactoring #

With the confidence instilled by the test cases, we can now refactor the f function:

// f : int -> int -> int -> bool
let f (x : int) (y : int) (z : int) =
    let imp (x, y, z, out) i =
        let out = out * x + abs (y * z - i * i)
        y + 1I, z, x, out
    let (_, _, _, out) = List.fold imp (bigint x, bigint y, bigint z, 0I) [0I..9I]
    abs out % 100I < 10I

Instead of all those mutable variables, the function is, after all, just a left fold. Phew, I feel better now.

Conclusion #

This article demonstrated a proof of concept where you use a known good version (n) of the code as a test oracle for the next version (n + 1). In interpreted languages, you may be able to load two versions of the code base side by side, but that's rarely practical in a statically typed compiled language like F#. Instead, I used a utility program to generate test cases that can be used as a data source for a parametrised test.

The example rho problem takes only integers as input, and returns a simple Boolean value, so in this case it's trivial to encode each test case as a line of comma-separated values. For (real) problems that may involve more complex types, it'd be better to use another serialisation format, such as JSON or XML.

An outstanding issue is whether it's possible to implement shrinking behaviour when tests fail. Currently, the proof of concept just uses a set of serialised test cases. Normally, when a property-based testing framework like FsCheck detects a counter-example, it'll shrink the counter-example to values that are easier to understand than the original. This proof of concept doesn't do that. I'm not sure if a framework like FsCheck currently contains enough extensibility points to enable that sort of behaviour. I'll leave that question open for any reader interested in taking on that problem.

Comments

Alex Nguyen #

Hi Mark! Thanks for another thought provoking post.

I believe you and Hillel are writing characterization tests, which you've mentioned in the past. Namely, you're both using the behavior of existing code to verify the correctness of a refactor. The novel part to me is that Hillel is using code as the test oracle. Your solution serializes the oracle to a static file. The library I use for characterization tests (ApprovalTests) does this as well.

I believe shrinking is impossible when the oracle is a static file. However with Hillel's solution the oracle may be consulted at any time, making shrinking viable. If only there was a practical way to combine the two...

2021-01-06 23:01 UTC

Tyson Williams #

A thought provoking post indeed!

In F# I'm going to use FsCheck...

I think that is a fine choice given the use case laid out in this post. In general though, I think Hedgehog is a better property-based testing library. Its killer feature is integrated shrinking, which means that all generators can also shrink and this extra power is essentially free.

For the record (because this can be a point of confusion), Haskell has QuickCheck and (Haskell) Hedgehog while F# has ports from Haskell called FsCheck and (FSharp) Hedgehog.

Jacob Stanley gave this excellent talk at YOW! Lambda Jam 2017 that explains the key idea that allows Hedgehog to have integrated shrinking. (Spoiler: A generic type that is invariant in its only type parameter is replaced by a different generic type that is monadic in its only type parameter. API design guided by functional programming for the win!)

Normally, when a property-based testing framework like FsCheck detects a counter-example, it'll shrink the counter-example to values that are easier to understand than the original.

In my experience, property-based tests written with QuickCheck / FsCheck do not normally shrink. I think this is because of the extra work required to enable shrinking. For an anecdotal example, consider this post by Fraser Tweedale. He believed that it would be faster to add (Haskell) Hedgehog as a dependency and create a generator for it than to add shrinking to his existing generator in QuickCheck.

In other words, you use version n of f as a test oracle for version n + 1. When iteration n + 1 is a function of iteration n, you have a so-called dynamic system, so I think that we can call this technique dynamic test oracles.

I am confused by this paragraph. I interpret your word "When" at the start of the second sentence as a common-language synonym for the mathematical word "If". Here is roughly how I understand that paragraph, where A stands for "version / iteration n of f" and B stands for "version / iteration n + 1 of f". "A depends on B. If B depends on A, then we have a dynamic system. Therefore, we have a dynamic system." I feel like the paragraph assumes (because it is obvious?) that version / iteration n + 1 of f depends on version / iteration n of f. In what sense is that the case?

An outstanding issue is whether it's possible to implement shrinking behaviour when tests fail. [...] I'll leave that question open for any reader interested in taking on that problem.

I am interested!

Quoting Mark and then Alex.

Hillel Wayne [...] outlines an approach where you test an iteration of rho code against a 'last known good' snapshot. He uses git worktree to set up a snapshot of the reference implementation. He then writes a property that compares the refactored code's behaviour against the reference.

The example code is in Python, which is a language that I don't know. As far as I can tell, it works because Python is 'lightweight' enough that you can load and execute source code directly. I found that the approach makes much sense, but I wondered how it would apply for statically typed, compiled languages. I decided to create a proof of concept in F#.

I believe shrinking is impossible when the oracle is a static file. However with Hillel's solution the oracle may be consulted at any time, making shrinking viable.

I want to start by elaborating on this to make sure we are all on the same page. I think of shrinking as involving two parts. On the one hand, we have the "shrink tree", which contains the values to test during the shrinking process. On the other hand, for each input tested, we need to know if the output should cause the test to pass or fail.

With Hedgehog, getting a shrink tree would not be too difficult. For a generator with type parameter 'a, the current generator API returns a "random" shrink tree of type 'a in which the root is an instance a of the type 'a and the tree completely depends on a. It should be easy to expose an additional function that accepts inputs of type 'a Gen and 'a and returns the tree with the given 'a as its root.

The difficult part is being able to query the test oracle. As Mark said, this seems easy to do in a dynamically-typed language like Python. In contrast, the fundamental issue with a statically-typed language like F# is that the compiled code exists in an assembly and only one assembly of a given name can be loaded in a given process at the same time.

This leads me to two ideas for workarounds. First, we could query the test oracle in a different process. I imagine an entry point could be generated that gives direct access to the test oracle. Then the test process could query the test oracle by executing this generated process. Second, we could generate a different assembly that exposes the test oracle. Then the test process could load this generated assembly to query the test oracle. The second approach seems like it would have a faster query time but be harder to implement. The first approach seems easier to implement but would probably have a slower query time. Maybe the query time would be fast enough though, especially if it was only queried when shrinking.

But given such a solution, who wants to restrict access to the test oracle only to shrinking? If the test oracle is always available, then there is no need to store input-output pairs. Instead of always checking that the system under test works correctly for a previously selected set of inputs, the property-based test can check that the system under test has the expected behavior for a unique set of inputs each time the property-based test is executed. In my experience, this is the default behavior of a property-based test.

One concern that some people might have is the idea of checking into the code repository the binary containing the test oracle. My first though is that the size of this is likely so small that it does not matter. My second thought is that the binary containing the test oracle does not have to be included in the repository. Instead, the workflow could be to (1) create the property-based test that uses the compiled test oracle, (2) refactor the system under test, (3) observe that the property-based test still passes, (4) commit the refactored code, and (5) discard the remaining changes, which will delete the property-based test and the compiled test oracle.

Instead of completely removing that property-based test, it might be better to leave it there with input-output pairs stored in a file. Then the conversion from that state of the property-based test to the one that uses the compiled test oracle will be much smaller.

2021-01-07 19:27 UTC

Mark Seemann #

Alex, thank you for writing. Yes, I think that calling this a Characterisation Test is correct. I wasn't aware of the ApprovalTests library; thank you for mentioning it.

When I originally wrote the article, I was under the impression that shrinking might still be possible. I admit, though, that I hadn't thought things through. I think that Tyson Williams argues convincingly that this isn't possible.

2021-01-15 13:42 UTC

Mark Seemann #

Tyson, thank you for writing. I'm well aware of Hedgehog, and I'm keen on the way it works. I rarely use it, however, as it so far doesn't quite seem to have the same degree of 'industrial strength' to it that FsCheck has. Additionally, I find that shrinking is less important in practice than it might seem in theory.

I'm not sure that I understand your confusion about the term dynamic. You write:

"A depends on B."

Why do you write that? I don't think, in the way you've labelled iterations, that A depends on B.

When it comes to shrinking, I think that you convincingly argues that it can't be done unless one is able to query the oracle. As long as all you have is a list of test cases, you can't do that... unless, perhaps, you were to also generate and run all the shrunk test cases when you capture the list of test cases... Again, I haven't thought this through, so there may be some obvious gotcha that I'm missing.

I would be wary of trying to host the previous iteration in a different process. This is technically possible, but, in .NET at least, quite cumbersome. You'll have to deal with data marshalling and lifetime management of the second process. It was difficult enough in .NET framework back when remoting was a thing; I'm not even sure how one would go about such a problem in .NET Core - particularly if you want it to work on both Windows, Linux, and Mac. HTTP?

2021-01-16 13:24 UTC

Tyson Williams #

[Hedgehog] so far doesn't quite seem to have the same degree of 'industrial strength' to it that FsCheck has.

That seems reasonable. I haven't used FsCheck, so I wouldn't know myself. Several of us are making many great improvements to F# Hedgehog right now.

When it comes to shrinking, I think that you convincingly argues that it can't be done unless one is able to query the oracle. As long as all you have is a list of test cases, you can't do that... unless, perhaps, you were to also generate and run all the shrunk test cases when you capture the list of test cases... Again, I haven't thought this through, so there may be some obvious gotcha that I'm missing.

That would be too many test cases. The shrinking process finds two values n and n+1 such that the test passes for n and fails for n+1. This can be viewed as a constraint. The objective is to minimize the value of n. The only way to ensure that some value is the minimum is to test all values smaller than it. However, doing so is not practical. Property-based tests uses randomness precisely because it is not practical to test every possible value.

Instead, the shrinking process uses binary search as a heuristic to find a value satisfying the constraint that is rather small but not necessarily the smallest.

Why do you write that? I don't think, in the way you've labelled iterations, that A depends on B.

Ok. I will go slower and ask smaller questions.

When iteration n + 1 is a function of iteration n [...]

Does this phrase have the same meaning if "When" is replaced by "If"?

In other words, you use version n of f as a test oracle for version n + 1. When iteration n + 1 is a function of iteration n, you have a so-called dynamic system, so I think that we can call this technique dynamic test oracles.

I understand how version n of f is being used as a test oracle for version n + 1. In this blog post, in what sense is something of iteration n + 1 is a function of iteration n?

2021-01-30 16:36 UTC

Mark Seemann #

Tyson, thank you for writing.

"Does this phrase have the same meaning if "When" is replaced by "If"?"

I'm not sure that there's a big difference, but then, I don't know how you parse that. As Kevlin Henney puts it,

"The act of describing a program in unambiguous detail and the act of programming are one and the same."

Kevlin Henney

It seems to me that you're attempting to parse my prose as though it was an unambiguous description, which it can't be.

A dynamic system is a system such that x_t+1 = f(x_t), where x_t is the value of x at time t, and x_t+1 is the value of x at time t+1. For simplicity, this is the definition of a dynamic system in discrete time. Mathematically, you can also express it in continuous time using calculus.

2021-02-02 6:46 UTC

Tyson Williams #

It seems to me that you're attempting to parse my prose as though it was an unambiguous description, which it can't be.

Oh, yes. My mistake. I meant to phrase in slightly differently thereby changing it from essentially an impossible question to one that only you can answer. Here is what I meant to ask.

Does this phrase have the same meaning to you if "When" is replaced by "If"?

No matter though. I simply misunderstood your description / defintion of a dynamical system. I understand now. Thanks for your patience and willingness to explain it to me again.

2021-03-25 03:47 UTC

An F# demo of validation with partial data round trip

Monday, 28 December 2020 09:22:00 UTC

An F# port of the previous Haskell proof of concept.

This article is part of a short article series on applicative validation with a twist. The twist is that validation, when it fails, should return not only a list of error messages; it should also retain that part of the input that was valid.

In the previous article you saw a Haskell proof of concept that demonstrated how to compose the appropriate applicative functor with a suitable semigroup to make validation work as desired. In this article, you'll see how to port that proof of concept to F#.

Data definitions #

Like in the previous article, we're going to need some types. These are essentially direct translations of the corresponding Haskell types:

type Input = { Name : string option; DoB : DateTime option; Address : string option}
type ValidInput = { Name : string; DoB : DateTime; Address : string }

The Input type plays the role of the input we'd like to validate, while ValidInput presents validated data.

If you're an F# fan, you can bask in the reality that F# records are terser than Haskell records. I like both languages, so I have mixed feelings about this.

Computation expression #

Haskell's main workhorse is its type class system. F# doesn't have that, but it has computation expressions, which in F# 5 got support for applicative functors. That's just what we need, and it turns out that there isn't a lot of code we have to write to make all of this work.

To recap from the Haskell proof of concept: We need a Result-like container that returns a tuple for errors. One element of the tuple should be a an endomorphism, which forms a monoid (and therefore also a semigroup). The other element should be a list of error messages - another monoid. In F# terms we'll write it as (('b -> 'b) * 'c list).

That's a tuple, and since tuples form monoids when their elements do the Error part of Result supports accumulation.

To support an applicative computation expression, we're going to need a a way to merge two results together. This is by far the most complicated piece of code in this article, all six lines of code:

module Result =
    // Result<'a       ,(('b -> 'b) * 'c list)> ->
    // Result<'d       ,(('b -> 'b) * 'c list)> ->
    // Result<('a * 'd),(('b -> 'b) * 'c list)>
    let merge x y =
        match x, y with
        | Ok xres, Ok yres -> Ok (xres, yres)
        | Error (f, e1s), Error (g, e2s)  -> Error (f >> g, e2s @ e1s)
        | Error e, Ok _ -> Error e
        | Ok _, Error e -> Error e

The merge function composes two input results together. The results have Ok types called 'a and 'd, and if they're both Ok values, the return value is an Ok tuple of 'a and 'd.

If one of the results is an Error value, it beats an Ok value. The only moderately complex operations is when both are Error values.

Keep in mind that an Error value in this instance contains a tuple of the type (('b -> 'b) * 'c list). The first element is an endomorphism 'b -> 'b and the other element is a list. The merge function composes the endomorphism f and g by standard function composition (the >> operator), and concatenates the lists with the standard @ list concatenation operator.

Because I'm emulating how the original forum post's code behaves, I'm concatenating the two lists with the rightmost going before the leftmost. It doesn't make any other difference than determining the order of the error list.

With the merge function in place, the computation expression is a simple matter:

type ValidationBuilder () =
    member _.BindReturn (x, f) = Result.map f x
    member _.MergeSources (x, y) = Result.merge x y

The last piece is a ValidationBuilder value:

[<AutoOpen>]
module ComputationExpressions =
    let validation = ValidationBuilder ()

Now, whenever you use the validation computation expression, you get the desired functionality.

Validators #

Before we can compose some validation functions, we'll need to have some validators in place. These are straightforward translations of the Haskell validation functions, starting with the name validator:

// Input -> Result<string,((Input -> Input) * string list)>
let validateName ({ Name = name } : Input) =
    match name with
    | Some n when n.Length > 3 -> Ok n
    | Some _ ->
        Error (
            (fun (args : Input) -> { args with Name = None }),
            ["no bob and toms allowed"])
    | None -> Error (id, ["name is required"])

When the name is too short, the endomorphism resets the Name field to None.

The date-of-birth validation function works the same way:

// DateTime -> Input -> Result<DateTime,((Input -> Input) * string list)>
let validateDoB (now : DateTime) ({ DoB = dob } : Input) =
    match dob with
    | Some d when d > now.AddYears -12 -> Ok d
    | Some _ ->
        Error (
            (fun (args : Input) -> { args with DoB = None }),
            ["get off my lawn"])
    | None -> Error (id, ["dob is required"])

Again, like in the Haskell proof of concept, instead of calling DateTime.Now from within the function, I'm passing now as an argument to keep the function pure.

The address validation concludes the set of validators:

// Input -> Result<string,(('a -> 'a) * string list)>
let validateAddress ({ Address = address }: Input) =
    match address with
    | Some a -> Ok a
    | None -> Error (id, ["add1 is required"])

The inferred endomorphism type here is the more general 'a -> 'a, but it's compatible with Input -> Input.

Composition #

All three functions have compatible Error types, so they ought to compose with the applicative computation expression to produce the desired behaviour:

// DateTime -> Input -> Result<ValidInput,(Input * string list)>
let validateInput now args =
    validation {
        let! name = validateName args
        and! dob = validateDoB now args
        and! address = validateAddress args
        return { Name = name; DoB = dob; Address = address }
    }
    |> Result.mapError (fun (f, msgs) -> f args, msgs)

The validation expression alone produces a Result<ValidInput,((Input -> Input) * string list)> value. To get an Input value in the Error tuple, we need to 'run' the Input -> Input endomorphism. The validateInput function does that by applying the endomorphism f to args when mapping the error with Result.mapError.

Tests #

To test that the validateInput works as intended, I first copied all the code from the original forum post. I then wrote eight characterisation tests against that code to make sure that I could reproduce the desired functionality.

I then wrote a parametrised test against the new function:

[<Theory; ClassData(typeof<ValidationTestCases>)>]
let ``Validation works`` input expected =
    let now = DateTime.Now
    let actual = validateInput now input
    expected =! actual

The ValidationTestCases class is defined like this:

type ValidationTestCases () as this =
    inherit TheoryData<Input, Result<ValidInput, Input * string list>> ()

This class produces a set of test cases, where each test case contains an input value and the expected output. To define the test cases, I copied the eight characterisation tests I'd already produced and adjusted them so that they fit the simpler API of the validateInput function. Here's a few examples:

let eightYearsAgo = DateTime.Now.AddYears -8
do this.Add (
    { Name = Some "Alice"; DoB = Some eightYearsAgo; Address = None },
    Error (
        { Name = Some "Alice"; DoB = Some eightYearsAgo; Address = None },
        ["add1 is required"]))
 
do this.Add (
    { Name = Some "Alice"; DoB = Some eightYearsAgo; Address = Some "x" },
    Ok ({ Name = "Alice"; DoB = eightYearsAgo; Address = "x" }))

The first case expects an Error value because the Input value has no address. The other test case expects an Ok value because all input is fine.

I copied all eight characterisation tests over, so now I have those eight tests, as well as the modified eight tests for the applicative-based API shown here. All sixteen tests pass.

Conclusion #

I find this solution to the problem elegant. It's always satisfying when you can implement what at first glance looks like custom behaviour using universal abstractions.

Besides the aesthetic value, I also believe that this keeps a team more productive. These concepts of monoids, semigroups, applicative functors, and so on, are concepts that you only have to learn once. Once you know them, you'll recognise them when you run into them. This means that there's less code to understand.

An ad-hoc implementation as the original forum post suggested (even though it looked quite decent) always puts the onus on a maintenance developer to read and understand even more one-off infrastructure code.

With an architecture based on universal abstractions and well-documented language features, a functional programmer that knows these things will be able to pick up what's going on without much trouble. Specifically, (s)he will recognise that this is 'just' applicative validation with a twist.

This article is the December 28 entry in the F# Advent Calendar in English 2020.

Next: A C# port of validation with partial round trip.

A Haskell proof of concept of validation with partial data round trip

Monday, 21 December 2020 06:54:00 UTC

Which Semigroup best addresses the twist in the previous article?

In this article, I'll show how I did a quick proof of concept in Haskell.

Data definitions #

You can't use the regular Either instance of Applicative for validation because it short-circuits on the first error. In other words, you can't collect multiple error messages, even if the input has multiple issues. Instead, you need a custom Applicative instance. You can easily write such an instance yourself, but there are a couple of libraries that already do this. For this prototype, I chose the validation package.

import Data.Bifunctor
import Data.Time
import Data.Semigroup
import Data.Validation

Apart from importing Data.Validation, I also need a few other imports for the proof of concept. All of them are well-known. I used no language extensions.

For the proof of concept, the input is a triple of a name, a date of birth, and an address:

data Input = Input {
  inputName :: Maybe String,
  inputDoB :: Maybe Day,
  inputAddress :: Maybe String }
  deriving (Eq, Show)

The goal is actually to parse (not validate) Input into a safer data type:

data ValidInput = ValidInput {
  validName :: String,
  validDoB :: Day,
  validAddress :: String }
  deriving (Eq, Show)

If parsing/validation fails, the output should report a collection of error messages and return the Input value with any valid data retained.

Looking for a Semigroup #

My hypothesis was that validation, even with that twist, can be implemented elegantly with an Applicative instance. The validation package defines its Validation data type such that it's an Applicative instance as long as its error type is a Semigroup instance:

Semigroup err => Applicative (Validation err)

The question is: which Semigroup can we use?

Since we need to return both a list of error messages and a modified Input value, it sounds like we'll need a product type of some sorts. A tuple will do; something like (Input, [String]). Is that a Semigroup instance, though?

Tuples only form semigroups if both elements give rise to a semigroup:

(Semigroup a, Semigroup b) => Semigroup (a, b)

The second element of my candidate is [String], which is fine. Lists are Semigroup instances. But what about Input? Can we somehow combine two Input values into one? It's not entirely clear how we should do that, so that doesn't seem too promising.

What we need to do, however, is to take the original Input and modify it by (optionally) resetting one or more fields. In other words, a series of functions of the type Input -> Input. Aha! There's the semigroup we need: Endo Input.

So the Semigroup instance we need is (Endo Input, [String]), and the validation output should be of the type Validation (Endo Input, [String]) a.

Validators #

Cool, we can now implement the validation logic; a function for each field, starting with the name:

validateName :: Input -> Validation (Endo Input, [String]) String
validateName (Input (Just name) _ _) | length name > 3 = Success name
validateName (Input (Just _) _ _) =
  Failure (Endo $ \x -> x { inputName = Nothing }, ["no bob and toms allowed"])
validateName _ = Failure (mempty, ["name is required"])

This function reproduces the validation logic implied by the forum question that started it all. Notice, particularly, that when the name is too short, the endomorphism resets inputName to Nothing.

The date-of-birth validation function works the same way:

validateDoB :: Day -> Input -> Validation (Endo Input, [String]) Day
validateDoB now (Input _ (Just dob) _) | addGregorianYearsRollOver (-12) now < dob =
  Success dob
validateDoB _ (Input _ (Just _) _) =
  Failure (Endo $ \x -> x { inputDoB = Nothing }, ["get off my lawn"])
validateDoB _ _ = Failure (mempty, ["dob is required"])

Again, the validation logic is inferred from the forum question, although I found it better keep the function pure by requiring a now argument.

The address validation is the simplest of the three validators:

validateAddress :: Monoid a => Input -> Validation (a, [String]) String
validateAddress (Input _ _ (Just a)) = Success a
validateAddress _ = Failure (mempty, ["add1 is required"])

This one's return type is actually more general than required, since I used mempty instead of Endo id. This means that it actually works for any Monoid a, which also includes Endo Input.

Composition #

All three functions return Validation (Endo Input, [String]), which has an Applicative instance. This means that we should be able to compose them together to get the behaviour we're looking for:

validateInput :: Day -> Input -> Either (Input, [String]) ValidInput
validateInput now args =
  toEither $
  first (first (`appEndo` args)) $
  ValidInput <$> validateName args <*> validateDoB now args <*> validateAddress args

That compiles, so it probably works.

Sanity check #

Still, it'd be prudent to check. Since this is only a proof of concept, I'm not going to set up a test suite. Instead, I'll just start GHCi for some ad-hoc testing:

λ> now <- localDay <&> zonedTimeToLocalTime <&> getZonedTime
λ> validateInput now & Input Nothing Nothing Nothing
Left (Input {inputName = Nothing, inputDoB = Nothing, inputAddress = Nothing},
      ["name is required","dob is required","add1 is required"])
λ> validateInput now & Input (Just "Bob") Nothing Nothing
Left (Input {inputName = Nothing, inputDoB = Nothing, inputAddress = Nothing},
      ["no bob and toms allowed","dob is required","add1 is required"])
λ> validateInput now & Input (Just "Alice") Nothing Nothing
Left (Input {inputName = Just "Alice", inputDoB = Nothing, inputAddress = Nothing},
      ["dob is required","add1 is required"])
λ> validateInput now & Input (Just "Alice") (Just & fromGregorian 2002 10 12) Nothing
Left (Input {inputName = Just "Alice", inputDoB = Nothing, inputAddress = Nothing},
      ["get off my lawn","add1 is required"])
λ> validateInput now & Input (Just "Alice") (Just & fromGregorian 2012 4 21) Nothing
Left (Input {inputName = Just "Alice", inputDoB = Just 2012-04-21, inputAddress = Nothing},
      ["add1 is required"])
λ> validateInput now & Input (Just "Alice") (Just & fromGregorian 2012 4 21) (Just "x")
Right (ValidInput {validName = "Alice", validDoB = 2012-04-21, validAddress = "x"})

In order to make the output more readable, I've manually edited the GHCi session by adding line breaks to the output.

It looks like it's working like it's supposed to. Only the last line successfully parses the input and returns a Right value.

Conclusion #

Before I started this proof of concept, I had an inkling of the way this would go. Instead of making the prototype in F#, I found it more productive to do it in Haskell, since Haskell enables me to compose things together. I particularly appreciate how a composition of types like (Endo Input, [String]) is automatically a Semigroup instance. I don't have to do anything. That makes the language great for prototyping things like this.

Now that I've found the appropriate semigroup, I know how to convert the code to F#. That's in the next article.

Next: An F# demo of validation with partial data round trip.

Comments

Tyson Williams #

Great work and excellent post. I just had a few clarification quesitons.

...But what about Input? Can we somehow combine two Input values into one? It's not entirely clear how we should do that, so that doesn't seem too promising.

What we need to do, however, is to take the original Input and modify it by (optionally) resetting one or more fields. In other words, a series of functions of the type Input -> Input. Aha! There's the semigroup we need: Endo Input.

How rhetorical are those questions? Whatever the case, I will take the bait.

Any product type forms a semigroup if all of its elements do. You explicitly stated this for tuples of length 2; it also holds for records such as Input. Each field on that record has type Maybe a for some a, so it suffices to select a semigroup involving Maybe a. There are few different semigropus involving Maybe that have different functions.

I think the most common semigroup for Maybe a has the function that returns the first Just _ if one exists or else returns Nothing. Combining that with Nothing as the identity element gives the monoid that is typically associated with Maybe a (and I know by the name monoidal plus). Another monoid, and therefore a semigroup, is to return the last Just _ instead of the first.

Instead of the having a preference for Just _, the function could have a preference for Nothing. As before, when both inputs are Just _, the output could be either of the inputs.

I think either of those last two semigroups will achieved the desired behavior in the problem at hand. Your code never replaces an instace of Just a with a different instance, so we don't need a preference for some input when they are both Just _.

In the end though, I think the semigroup you derived from Endo leads to simpler code.

At the end of the type signature for validateName / validateDoB / validateAddress, what does String / Day / String mean?

Why did you pass all three arguments into every parsing/validation function? I think it is a bit simpler to only pass in the needed argument. Maybe you thought this was good enough for prototype code.

Why did you use add1 in your error message instead of address? Was it only for prototype code to make the message a bit shorter?

2020-12-21 14:21 UTC

Mark Seemann #

Tyson, thank you for writing. The semigroup you suggest, I take it, would look something like this:

newtype Perhaps a = Perhaps { runPerhaps :: Maybe  a } deriving (Eq, Show)
 
instance Semigroup (Perhaps a) where
  Perhaps Nothing <> _ = Perhaps Nothing
  _ <> Perhaps Nothing = Perhaps Nothing
  Perhaps (Just x) <> _ = Perhaps (Just x)

That might work, but it's an atypical semigroup. I think that it's lawful - at least, I can't come up with a counterexample against associativity. It seems reminiscent of Boolean and (the All monoid), but it isn't a monoid, as far as I can tell.

Granted, a Monoid constraint isn't required to make the validation code work, but following the principle of least surprise, I still think that picking a well-known semigroup such as Endo is preferable.

Regarding your second question, the type signature of e.g. validateName is:

validateName :: Input -> Validation (Endo Input, [String]) String

Like Either, Validation has two type arguments: err and a; it's defined as data Validation err a. In the above function type, the return value is a Validation value where the err type is (Endo Input, [String]) and a is String.

All three validation functions share a common err type: (Endo Input, [String]). On the other hand, they return various a types: String, Day, and String, respectively.

Regarding your third question, I could also have defined the functions so that they would only have taken the values they'd need to validate. That would better fit Postel's law, so I should probably have done that...

As for the last question, I was just following the 'spec' implied by the original forum question.

2020-12-22 15:05 UTC

Validation, a solved problem?

Monday, 14 December 2020 08:28:00 UTC

A validation problem with a twist.

Until recently, I thought that data validation was a solved problem: Use an applicative functor. I then encountered a forum question that for a few minutes shook my faith.

After brief consideration, though, I realised that all is good. Validation, even with a twist, is successfully modelled with an applicative functor. Faith in computer science restored.

The twist #

Usually, when you see a demo of applicative validation, the result of validating is one of two: either a parsed result, or a collection of error messages.

λ> validateReservation $ ReservationJson "2017-06-30 19:00:00+02:00" 4 "Jane Doe" "j@example.com"
Validation (Right (Reservation {
    reservationDate = 2017-06-30 19:00:00 +0200,
    reservationQuantity = 4,
    reservationName = "Jane Doe",
    reservationEmail = "j@example.com"}))

λ> validateReservation $ ReservationJson "2017/14/12 6pm" 4.1 "Jane Doe" "jane.example.com"
Validation (Left ["Not a date.","Not a positive integer.","Not an email address."])

λ> validateReservation $ ReservationJson "2017-06-30 19:00:00+02:00" (-3) "Jane Doe" "j@example.com"
Validation (Left ["Not a positive integer."])

(Example from Applicative validation.)

What if, instead, you're displaying an input form? When users enter data, you want to validate it. Imagine, for the rest of this short series of articles that the input form has three fields: name, date of birth, and address. Each piece of data has associated validation rules.

If you enter a valid name, but an invalid date of birth, you want to clear the input form's date of birth, but not the name. It's such a bother for a user having to retype valid data just because a single field turned out to be invalid.

Imagine, for example, that you want to bind the form to a data model like this F# record type:

type Input = { Name : string option; DoB : DateTime option; Address : string option}

Each of these three fields is optional. We'd like validation to work in the following way: If validation fails, the function should return both a list of error messages, and also the Input object, with valid data retained, but invalid data cleared.

One of the rules implied in the forum question is that names must be more than three characters long. Thus, input like this is invalid:

{ Name = Some "Tom"; DoB = Some eightYearsAgo; Address = Some "x" }

Both the DoB and Address fields, however, are valid, so, along with error messages, we'd like our validation function to return a partially wiped Input value:

{ Name = None; DoB = Some eightYearsAgo; Address = Some "x" }

Notice that both DoB and Address field values are retained, while Name has been reset.

A final requirement: If validation succeeds, the return value should be a parsed value that captures that validation took place:

type ValidInput = { Name : string; DoB : DateTime; Address : string }

That requirement is straightforward. That's how you'd usually implement application validation. It's the partial data round-trip that seems to throw a spanner in the works.

How should we model such validation?

Theory, applied #

There's a subculture of functional programming that draws heavily on category theory. This is most prevalent in Haskell. I've been studying category theory in an attempt to understand what it's all about. I even wrote a substantial article series about some design patterns and how they relate to theory.

One thing I learned after I'd named that article series is that most of the useful theoretical concepts come from abstract algebra, with the possible exception of monads.

People often ask me: does all that theory have any practical use?

Yes, it does, as it turns out. It did, for example, enable me to identify a solution to the above twist in five to ten minutes.

It's a discussion that I often have, particularly with the always friendly F# community. Do you have to understand functors, monads, etcetera to be a productive F# developer?

To anyone who wants to learn F# I'd respond: Don't worry about that at the gate. Find a good learning resource and dive right in. It's a friendly language that you can learn gradually.

Sooner or later, though, you'll run into knotty problems that you may struggle to address. I've seen this enough times that it looks like a pattern. The present forum question is just one example. A beginner or intermediate F# programmer will typically attempt to solve the problem in an ad-hoc manner that may or may not be easy to maintain. (The solution proposed by the author of that forum question doesn't, by the way, look half bad.)

To be clear: there's nothing wrong with being a beginner. I was once a beginner programmer, and I'm still a beginner in multiple ways. What I'm trying to argue here is that there is value in knowing theory. With my knowledge of abstract algebra and how it applies to functional programming, it didn't take me long to identify a solution. I'll get to that later.

Before I outline a solution, I'd like to round off the discussion of applied theory. That question about monads comes up a lot. Do I have to understand functors, monads, etcetera to be a good F# developer?

I think it's like asking Do I have to understand polymorphism, design patterns, the SOLID principles, etcetera to be a good object-oriented programmer?

Those are typically not the first topics people are taught about OOD. I would assert, however, that understanding such topics do help. They may not be required to get started with OOP, but knowing them makes you a better programmer.

I think the same is true for functional programming. It's just a different skill set that makes you better in that paradigm.

Solution outline #

When you know a bit of theory, you may know that validation can be implemented with an applicative sum type like Either (AKA Option), with one extra requirement.

Either has two dimensions, left or right (success or failure, ok or error, etcetera). The applicative nature of it already supplies a way to compose the successes, but what if there's more than one validation error?

In my article about applicative validation I showed how to collect multiple error messages in a list. Lists, however, form a monoid, so I typed the validation API to be that flexible.

In fact, all you need is a semigroup. When I wrote the article on applicative validation, Haskell's Semigroup type class wasn't yet a supertype of Monoid, and I (perhaps without sufficient contemplation) just went with Monoid.

What remains is that applicative validation can collect errors for any semigroup of errors. All we need to solve the above validation problem with a twist, then, is to identify a suitable semigroup.

I don't want to give away everything in this article, so I'm going to leave you with this cliffhanger. Which semigroup solves the problem? Read on.

As is often my modus operandi, I first did a proof of concept in Haskell. With its type classes and higher-kinded polymorphism, it's much faster to prototype solutions than even in F#. In the next article, I'll describe how that turned out.

After the Haskell article, I'll show how it translates to F#. You can skip the Haskell article if you like.

Conclusion #

I still think that validation is a solved problem. It's always interesting when such a belief for a moment is challenged, and satisfying to discover that it still holds.

This is, after all, not proof of anything. Perhaps tomorrow, someone will throw another curve ball that I can't catch. If that happens, I'll have to update my beliefs. Until then, I'll consider validation a solved problem.

Next: A Haskell proof of concept of validation with partial data round trip.

Branching tests

Monday, 07 December 2020 06:25:00 UTC

Is it ever okay to branch and loop in a unit test?

When I coach development organisations about unit testing and test-driven development, there's often a sizeable group of developers who don't see the value of unit testing. Some of the arguments they typically use are worth considering.

A common complaint is that it's difficult to see the wisdom in writing code to prevent defects in code. That's not an unreasonable objection.

We have scant scientific knowledge about software engineering, but the little we know suggests that the number of defects is proportional to lines of code. The more lines of code, the more defects.

If that's true, adding more code - even when it's test code - seems like a bad idea.

Reasons to trust test code #

First, we should consider the possibility that the correlation between lines of code and defects doesn't mean that defects are evenly distributed. As Adam Tornhill argues in Your Code as a Crime Scene, defects tend to cluster in hotspots.

You can have a large proportion of your code base which is, for all intents and purpose, bug-free, and hotspots where defects keep spawning.

If this is true, adding test code isn't a problem if you can keep it bug-free.

That, however, sounds like a chicken-and-the-egg kind of problem. How can you know that test code is bug-free without tests?

I've previously answered that question. In short, you can trust a test for two reasons:

You've seen it fail (haven't you?)
It's simple

I usually think of the simplicity criterion as a limit on cyclomatic complexity: it should be 1. This means no branching and no loops in your tests.

That's what this article is actually about.

What's in a name? #

I was working with an online restaurant reservation system (example code), and had written this test:

[Theory]
[InlineData("2023-11-24 19:00", "juliad@example.net", "Julia Domna", 5)]
[InlineData("2024-02-13 18:15", "x@example.com", "Xenia Ng", 9)]
public async Task PostValidReservationWhenDatabaseIsEmpty(
    string at,
    string email,
    string name,
    int quantity)
{
    var db = new FakeDatabase();
    var sut = new ReservationsController(db);
 
    var dto = new ReservationDto
    {
        At = at,
        Email = email,
        Name = name,
        Quantity = quantity
    };
    await sut.Post(dto);
 
    var expected = new Reservation(
        DateTime.Parse(dto.At, CultureInfo.InvariantCulture),
        dto.Email,
        dto.Name,
        dto.Quantity);
    Assert.Contains(expected, db);
}

This is a state-based test that verifies that a valid reservation makes it to the database. The test has a cyclomatic complexity of 1, and I've seen it fail, so all is good. (It may, in fact, contain a future maintenance problem, but that's a topic for another article.)

The code shown here is part of the sample code base that accompanies my book Code That Fits in Your Head.

What constitutes a valid reservation? At the very least, we should demand that At is a valid date and time, and that Quantity is a positive number. The restaurant would like to be able to email a confirmation to the user, so an email address is also required. Email addresses are notoriously difficult to validate, so we'll just require that the the string isn't null.

What about the Name? I thought about this a bit and decided that, according to Postel's law, the system should accept null names. The name is only a convenience; the system doesn't need it, it's just there so that when you arrive at the restaurant, you can say "I have a reservation for Julia" instead of giving an email address to the maître d'hôtel. But then, if you didn't supply a name when you made the reservation, you can always state your email address when you arrive. To summarise, the name is just a convenience, not a requirement.

This decision meant that I ought to write a test case with a null name.

That turned out to present a problem. I'd defined the Reservation class so that it didn't accept null arguments, and I think that's the appropriate design. Null is just evil and has no place in my domain models.

That's not a problem in itself. In this case, I think it's acceptable to convert a null name to the empty string.

Copy and paste #

Allow me to summarise. If you consider the above unit test, I needed a third test case with a null name. In that case, expected should be a Reservation value with the name "". Not null, but "".

As far as I can tell, you can't easily express that in PostValidReservationWhenDatabaseIsEmpty without increasing its cyclomatic complexity. Based on the above introduction, that seems like a no-no.

What's the alternative? Should I copy the test and adjust the single line of code that differs? If I did, it would look like this:

[Theory]
[InlineData("2023-08-23 16:55", "kite@example.edu", null, 2)]
public async Task PostValidReservationWithNullNameWhenDatabaseIsEmpty(
    string at,
    string email,
    string name,
    int quantity)
{
    var db = new FakeDatabase();
    var sut = new ReservationsController(db);
 
    var dto = new ReservationDto
    {
        At = at,
        Email = email,
        Name = name,
        Quantity = quantity
    };
    await sut.Post(dto);
 
    var expected = new Reservation(
        DateTime.Parse(dto.At, CultureInfo.InvariantCulture),
        dto.Email,
        "",
        dto.Quantity);
    Assert.Contains(expected, db);
}

Apart from the values in the [InlineData] attribute and the method name, the only difference from PostValidReservationWhenDatabaseIsEmpty is that expected has a hard-coded name of "".

This is not acceptable.

There's a common misconception that the DRY principle doesn't apply to unit tests. I don't see why this should be true. The DRY principle exists because copy-and-paste code is difficult to maintain. Unit test code is also code that you have to maintain. All the rules about writing maintainable code also apply to unit test code.

Branching in test #

What's the alternative? One option (that shouldn't be easily dismissed) is to introduce a Test Helper to perform the conversion from a nullable name to a non-nullable name. Such a helper would have a cyclomatic complexity of 2, but could be unit tested in isolation. It might even turn out that it'd be useful in the production code.

Still, that seems like overkill, so I instead made the taboo move and added branching logic to the existing test to see how it'd look:

[Theory]
[InlineData("2023-11-24 19:00", "juliad@example.net", "Julia Domna", 5)]
[InlineData("2024-02-13 18:15", "x@example.com", "Xenia Ng", 9)]
[InlineData("2023-08-23 16:55", "kite@example.edu", null, 2)]
public async Task PostValidReservationWhenDatabaseIsEmpty(
    string at,
    string email,
    string name,
    int quantity)
{
    var db = new FakeDatabase();
    var sut = new ReservationsController(db);
 
    var dto = new ReservationDto
    {
        At = at,
        Email = email,
        Name = name,
        Quantity = quantity
    };
    await sut.Post(dto);
 
    var expected = new Reservation(
        DateTime.Parse(dto.At, CultureInfo.InvariantCulture),
        dto.Email,
        dto.Name ?? "",
        dto.Quantity);
    Assert.Contains(expected, db);
}

Notice that the expected name is now computed as dto.Name ?? "". Perhaps you think about branching instructions as relating exclusively to keywords such as if or switch, but the ?? operator is also a branching instruction. The test now has a cyclomatic complexity of 2.

Is that okay?

To branch or not to branch #

I think that in this case, it's okay to slightly increase the cyclomatic complexity of the test. It's not something I just pull out of my hat, though. I think it's possible to adjust the above heuristics to embrace this sort of variation.

To be clear, I consider this an advanced practice. If you're just getting started with unit testing, try to keep tests simple. Keep the cyclomatic complexity at 1.

Had I been in the above situation a couple of years ago, I might not have considered this option. About a year ago, though, I watched John Hughes' presentation Building on developers' intuitions to create effective property-based tests. When he, about 15 minutes in, wrote a test with a branching instruction, I remember becoming quite uncomfortable. This lasted for a while until I understood where he was going with it. It's truly an inspiring and illuminating talk; I highly recommend it.

How it relates to the problem presented here is through coverage. While the PostValidReservationWhenDatabaseIsEmpty test now has a cyclomatic complexity of 2, it's a parametrised test with three test cases. Two of these cover one branch, and the third covers the other.

What's more important is the process that produced the test. I added one test case at a time, and for each case, I saw the test fail.

Specifically, when I added the third test case with the null name, I first added the branching expression dto.Name ?? "" and ran the two existing tests. They still both passed, which bolstered my belief that they both exercised the left branch of that expression. I then added the third case and saw that it (and only it) failed. This supported my belief that the third case exercised the right branch of ??.

Branching in unit tests isn't something I do lightly. I still believe that it could make the test more vulnerable to future changes. I'm particularly worried about making a future change that might shift one or more of these test cases into false negatives in the form of tautological assertions.

Conclusion #

As you can tell, when I feel that I'm moving onto thin ice, I move deliberately. If there's one thing I've learned from decades of professional programming it's that my brain loves jumping to conclusions. Moving slowly and deliberately is my attempt at countering this tendency. I believe that it enables me to go faster in the long run.

I don't think that branching in unit tests should be common, but I believe that it may be occasionally valid. The key, I think, is to guarantee that each branch in the test is covered by a test case. The implication is that there must be at least as many test cases as the cyclomatic complexity. In other words, the test must be a parametrised test.

Comments

James Morcom #

Hi Mark, I guess there is implicit cyclomatic complexity in the testing framework itself (For example, it loops through the InlineData records). That feels fine though, does this somehow have less cost than cyclomatic complexity in the test code itself? I guess, as you mentioned, it's acceptable because the alternative is violation of DRY.

With this in mind, I wonder how you feel about adding an expectedName parameter to the InlineData attributes, instead of the conditional in the test code? Maybe it's harder to read though when the test data includes input and output.

2020-12-07 08:36 UTC

Mark Seemann #

James, thank you for writing. I consider the cyclomatic complexity of a method call to be 1, and Visual Studio code metrics agree with me. Whatever happens in a framework should, in my opinion, likewise be considered as encapsulated abstraction that's none of our business.

Adding an expectedName parameter to the method is definitely an option. I sometimes do that, and I could have done that here, too. In this situation, I think it's a toss-up. It'd make it harder for a later reader of the code to parse the test cases, but would simplify the test code itself, so that alternative comes with both advantages and disadvantages.

2020-12-08 11:02 UTC

Romain Deneau @DeneauRomain #

Hi Mark. To build up on the additional expectedName parameter, instead of keeping a single test with the 3 cases but the last being a edge case, I prefer introduce a specific test for the last case.

Then, to remove the duplication, we can extract a common method which will take this additional expectedName parameter:

		[Theory]
		[InlineData("2023-11-24 19:00", "juliad@example.net", "Julia Domna", 5)]
		[InlineData("2024-02-13 18:15", "x@example.com",      "Xenia Ng",    9)]
		public async Task PostValidReservationWithNameWhenDatabaseIsEmpty
		           (string at,          string email,         string name,   int quantity) =>
			PostValidReservationWhenDatabaseIsEmpty(at, email, name, expectedName: name, quantity);

		[Fact]
		public async Task PostValidReservationWithoutNameWhenDatabaseIsEmpty() =>
			PostValidReservationWhenDatabaseIsEmpty(
				at          : "2023-11-24 19:00",
				email       : "juliad@example.net",
				name        : null,
				expectedName: "",
				quantity    : 5);

		private async Task PostValidReservationWhenDatabaseIsEmpty(
			    string at,
			    string email,
			    string name,
			    string expectedName,
			    int    quantity)
		{
		    var db = new FakeDatabase();
		    var sut = new ReservationsController(db);
		 
		    var dto = new ReservationDto
		    {
		        At       = at,
		        Email    = email,
		        Name     = name,
		        Quantity = quantity,
		    };
		    await sut.Post(dto);
		 
		    var expected = new Reservation(
		        DateTime.Parse(dto.At, CultureInfo.InvariantCulture),
		        dto.Email,
		        expectedName, // /!\ Not `dto.Name`,
		        dto.Quantity);
		    Assert.Contains(expected, db);
		}

2020-12-09 8:44 UTC

Mark Seemann #

Romain, thank you for writing. There are, indeed, many ways to skin that cat. If you're comfortable with distributing a test over more than one method, I instead prefer to use another data source for the [Theory] attribute:

private class PostValidReservationWhenDatabaseIsEmptyTestCases :
    TheoryData<ReservationDto, Reservation>
{
    public PostValidReservationWhenDatabaseIsEmptyTestCases()
    {
        AddWithName(new DateTime(2023, 11, 24, 19, 0, 0), "juliad@example.net", "Julia Domna", 5);
        AddWithName(new DateTime(2024, 2, 13, 18, 15, 0), "x@example.com", "Xenia Ng", 9);
        AddWithoutName(new DateTime(2023, 8, 23, 16, 55, 0), "kite@example.edu", 2);
    }
 
    private void AddWithName(DateTime at, string email, string name, int quantity)
    {
        Add(new ReservationDto
            {
                At = at.ToString("O"),
                Email = email,
                Name = name,
                Quantity = quantity
            },
            new Reservation(at, email, name, quantity));
    }
 
    private void AddWithoutName(DateTime at, string email, int quantity)
    {
        Add(new ReservationDto { At = at.ToString("O"), Email = email, Quantity = quantity },
            new Reservation(at, email, "", quantity));
    }
}
 
 
[Theory, ClassData(typeof(PostValidReservationWhenDatabaseIsEmptyTestCases))]
public async Task PostValidReservationWhenDatabaseIsEmpty(
    ReservationDto dto, Reservation expected)
{
    var db = new FakeDatabase();
    var sut = new ReservationsController(db);
 
    await sut.Post(dto);
 
    Assert.Contains(expected, db);
}

Whether you prefer one over the other is, I think, subjective. I like my alternative, using a [ClassData] source, better, because I find it a bit more principled and 'pattern-based', if you will. I also like how small the actual test method becomes.

Your solution, on the other hand, is more portable, in the sense that you could also apply it in a testing framework that doesn't have the sort of capability that xUnit.net has. That's a definite benefit with your suggestion.

2020-12-10 20:05 UTC

Next Previous

Page 18 of 73

"Our team wholeheartedly endorses Mark. His expert service provides tremendous value."
Hire me!

ploeh blog danish software design

Digital infrastructure #

TANSTAAFL #

The case for regulation #

Accountability #

Practicality #

Problems #

Conclusion #

Comments

POCO Controllers #

HTTP responses and status codes #

Generating links #

Authorisation #

Conclusion #

Comments

Reserving a table #

Self-hosting #

Posting a reservation #

Verifying state #

Conclusion #

Comments

Size #

Primitive obsession #

Essential test #

Object data source #

The paradox of decomposition #

Conclusion #

Comments

Sooner or later #

Test Double? #

Relative time #

Conclusion #

Test cases from Python #

Port to F# #

Test oracle #

Refactoring #

Conclusion #

Comments

Data definitions #

Computation expression #

Validators #

Composition #

Tests #

Conclusion #

Data definitions #

Looking for a Semigroup #

Validators #

Composition #

Sanity check #

Conclusion #

Comments

The twist #

Theory, applied #

Solution outline #

Conclusion #

Reasons to trust test code #

What's in a name? #

Copy and paste #

Branching in test #

To branch or not to branch #

Conclusion #

Comments