Code coverage is a useless target measure

Monday, 16 November 2015 08:38:00 UTC

Aiming for a particular percentage of code coverage is counter-productive.

It's the end of 2015, and here I thought that it was common knowledge that using code coverage as a metric for code quality is useless at best. After all, Martin Fowler wrote a good article on the subject in 2012, but the fundamental realisation is much older than that. Apparently, it's one of those insights that one assumes that everyone else already knows, but in reality, that's not the case.

Let's make it clear, then: don't set goals for code coverage.

You may think that it could make your code base better, but asking developers to reach a certain code coverage goal will only make your code worse.

I'll show you some examples, but on a general level, the reason is that as with all other measurements, you get what you measure. Unfortunately, we can't measure productivity, so measuring code coverage will produce results that are entirely unrelated to software quality.

"People respond to incentives, although not necessarily in ways that are predictable or manifest. Therefore, one of the most powerful laws in the universe is the law of unintended consequences." - Super Freakonomics
Incentives with negative consequences are called perverse incentives; asking developers to reach a particular code coverage goal is clearly a perverse incentive.

It doesn't matter whether you set the target at 100% code coverage, 90%, 80%, or some other number.

Reaching 100% coverage is easy #

Here's a simple code example:

public class GoldCustomerSpecification : ICustomerSpecification
{
    public bool IsSatisfiedBy(Customer candidate)
    {
        return candidate.TotalPurchases >= 10000;
    }
}

Imagine that you have been asked to reach a high level of code coverage, and that this class is still not covered by tests. Not only that, but you have bugs to fix, meetings to go to, new features to implement, documentation to write, time sheets to fill out, and the project is already behind schedule, over budget, and your family is complaining that you're never home.

Fortunately, it's easy to achieve 100% code coverage of the GoldCustomerSpecification class:

[Fact]
public void MeaninglessTestThatStillGivesFullCoverage()
{
    try
    {
        var sut = new GoldCustomerSpecification();
        sut.IsSatisfiedBy(new Customer());
    }
    catch { }
}

This test achieves 100% code coverage of the GoldCustomerSpecification class, but is completely useless. Because of the try/catch block and the lack of assertions, this test will never fail. This is what Martin Fowler calls Assertion-Free Testing.

If you can declare a rule that your code base must have so-and-so test coverage, however, you can also declare that all unit tests must have assertions, and must not have try/catch blocks.

Despite of this new policy, you still have lots of other things you need to attend to, so instead, you write this test:

[Fact]
public void SlightlyMoreInnocuousLookingTestThatGivesFullCoverage()
{
    var sut = new GoldCustomerSpecification();
    var actual = sut.IsSatisfiedBy(new Customer());
    Assert.False(actual);
}

This test also reaches 100% coverage of the GoldCustomerSpecification class.

What's wrong with this test? Nothing, as such. It looks like a fine test, but in itself, it doesn't prevent regressions, or proves that the System Under Test works as intended. In fact, this alternative implementation also passes the test:

public class GoldCustomerSpecification : ICustomerSpecification
{
    public bool IsSatisfiedBy(Customer candidate)
    {
        return false;
    }
}

If you want your tests to demonstrate that the software works as intended, particularly at boundary values, you'll need to add more tests:

[Theory]
[InlineData(100, false)]
[InlineData(9999, false)]
[InlineData(10000, true)]
[InlineData(20000, true)]
public void IsSatisfiedReturnsCorrectResult(
    int totalPurchases,
    bool expected)
{
    var sut = new GoldCustomerSpecification();
 
    var candidate = new Customer { TotalPurchases = totalPurchases };
    var actual = sut.IsSatisfiedBy(candidate);
 
    Assert.Equal(expected, actual);
}

This is a much better test, but it doesn't increase code coverage! Code coverage was already 100% with the SlightlyMoreInnocuousLookingTestThatGivesFullCoverage test, and it's still 100% with this test. There's no correlation between code coverage and the quality of the test(s).

Code coverage objectives inhibit quality improvement #

Not only is test coverage percentage a meaningless number in itself, but setting a goal that must be reached actually hinders improvement of quality. Take another look at the GoldCustomerSpecification class:

public class GoldCustomerSpecification : ICustomerSpecification
{
    public bool IsSatisfiedBy(Customer candidate)
    {
        return candidate.TotalPurchases >= 10000;
    }
}

Is the implementation good? Can you think of any improvements to this code?

What happens if candidate is null? In that case, a NullReferenceException will be thrown. In other words, the IsSatisfiedBy method doesn't properly check that its preconditions are satisfied (which means that encapsulation is broken).

A better implementation would be to explicitly check for null:

public class GoldCustomerSpecification : ICustomerSpecification
{
    public bool IsSatisfiedBy(Customer candidate)
    {
        if (candidate == null)
            throw new ArgumentNullException(nameof(candidate));
 
        return candidate.TotalPurchases >= 10000;
    }
}

The problem, though, is that if you do this, coverage drops! That is, unless you write another test case...

Developers in a hurry often refrain from making the code better, because it would hurt their coverage target - and they don't feel they have time to also write the tests that go with the improvement in question.

Instituting a code coverage target - any percentage - will have that effect. Not only does the coverage number (e.g. 87%) tell you nothing, but setting it as a target will make the code base worse.

Attitude #

You may argue that I'm taking too dim a view on developers, but I've seen examples of the behaviour I describe. People mostly have good intentions, but if you put enough pressure on them, they'll act according to that pressure. This is the reason we need to be aware of perverse incentives.

You may also argue that if a team is already doing Test Driven Development, and in general prioritise code quality, then coverage will already be high. In that case, will it hurt setting a target? Perhaps not, but it's not going to help either. At best, the target will be irrelevant.

This article, however, doesn't discuss teams that already do everything right; it describes the negative consequences that code coverage targets will have on teams where managers or lead developers mistakenly believe that setting such goals is a good idea.

Code coverage is still useful #

While it's dangerous to use code coverage for target setting, collecting coverage metrics can still be useful.

Some people use it to find areas where coverage is weak. There may be good reasons that some parts of a code base are sparsely covered by tests, but doing a manual inspection once in a while is a good idea. Perhaps you find that all is good, but you may also discover that a quality effort is overdue.

In some projects, I've had some success watching the code coverage trend. When I review pull requests, I first review the changes by looking at them. If the pull request needs improvement, I work with the contributor to get the pull request to an acceptable quality. Once that is done, I've already made my decision on merging the code, and then I measure code coverage. It doesn't influence my decision to merge, but it tells me about the trend. On some projects, I've reported that trend back to the contributor while closing the pull request. I wouldn't report the exact number, but I'd remark that coverage went up, or down, or remained the same, by 'a little', 'much', etc. The point of that is to make team members aware that testing is important.

Sometimes, coverage goes down. There are many good reasons that could happen. Watching how coverage evolves over time doesn't mean that you have to pounce on developers every time it goes down, but it means that if something looks odd, it may be worth investigating.

Summary #

Don't use code coverage as an objective. Code coverage has no correlation with code quality, and setting a target can easily make the quality worse.

On the other hand, it can be useful to measure code coverage once in a while, but it shouldn't be your main source of information about the status of your source code.


Null has no type, but Maybe has

Friday, 13 November 2015 08:24:00 UTC

In C#, null has no type, but most variables can be null; you can't really trust the type system. A Maybe, on the other hand, always has a type, which means that Maybe is a saner approach to the question of values that may or may not be present.

A few days ago, I was looking at some C# code that, reduced to essentials, looked like this:

string foo = null;
var isNullAString = foo is string;

What is the value of isNullAString after execution?

Since foo is declared as a string, I thought that the answer clearly had to be true. Much to my surprise, it turns out that it's false.

Wondering if I was exceptionally bad at predicting the type of null values, I created a Twitter poll. 235 votes later, the outcome was this:

Poll results showing that 44% thought that the answer is true, and 56% thought that the answer is false.

Forty-four percent of respondents (some 103 people) were as wrong as I was! At one point, while the poll was still open and some 100 people had responded, the distribution was even fifty-fifty. Ultimately, I believe that the final results are artificially skewed toward false, because people could try the code first, before answering, and there's evidence that at least one person did that.

In short, that a null string isn't a string doesn't make much sense to a lot of people.

It's not a bug, though. It's explicitly stated in section 7.10.10 of the C# language specification:

"If E is [...] the null literal, of if the type of E is a reference type or a nullable type and the value of E is null, the result is false."
The specification doesn't offer much of an explanation, but Eric Lippert shares some information on the topic.

It still doesn't make any sense to me...

Apparently, the rules of C#'s type system is: a variable is guaranteed to be of a certain type, except when it isn't. Once again, null throws a wrench into any attempt to reason sanely about code.

The .NET Rocks! episode about less is more sparked a ton of comments; most of them in defence of null. People don't seem to understand just how malicious null references are. That null has no type is yet another example.

I think that the main reason that people defend null is that they have a hard time imagining other ways of modelling situations where a value may or may not be present. Even when introduced to the Maybe monad, most people remain unconvinced, because it's difficult to understand how Maybe is better than null.

The difference is clear: only values explicitly declared as Maybes can be Maybes, and Maybe values always have a type!

In F#, Maybe is called option, and it's always typed. The logical equivalent of the above type check would be this in F#:

let foo : string option = None
let isNoneAStringOption = foo :? string option

Only, this doesn't even compile!

If you try this in F#, the compiler will complain:

"error FS0016: The type 'string option' does not have any proper subtypes and cannot be used as the source of a type test or runtime coercion."
That expression doesn't even make sense in F#. Of course foo is a string option, because it's the only thing it can be!

You'll have to upcast foo to obj in order to be able to perform the type check:

let foo : string option = None
let isNoneAStringOption = box foo :? string option

As expected, this evaluates to true. Of course isNoneAStringOption is true, even when it's None! What else could it possibly be?

In Haskell, it doesn't even make sense to ask such a question, because there's no type hierarchy. In Haskell, you can't upcast a value to its base type, because there's no inheritance.

In short, null values invalidate all rules and guarantees that the C# type system attempts to make. It's truly a toxic language 'feature'.


Comments

null certainly is a toxic feature of C#. It was one of the key reasons behind me creating the Succinc<T> library. It brings all the wonderfulness of F#'s options to C#. It let's one write equivalent code to your example:

var foo = Option<string>.None();
var isNoneAStringOption = foo is Option<string>;

Which gives the same true result as the F# code.

Just because a language has a feature, doesn't mean we have to use it. The same applies just as much to null as to switch and goto. Alternatives exist to these features.

2015-11-17 10:00 UTC

Service Locator violates encapsulation

Monday, 26 October 2015 08:39:00 UTC

Service Locator violates encapsulation in statically typed languages because it doesn't clearly communicate preconditions.

The horse has been long dead, but some people still want to ride it, so I'll beat it yet again. Over the years, I've made various attempts to explain why Service Locator is an anti-pattern (e.g. that it violates SOLID), but recently it struck me that most of my arguments have been focused on symptoms without ever addressing the fundamental problem.

As an example of discussing symptoms, in my original article, I described how IntelliSense is hurt by the use of Service Locator. In 2010, it never occurred to me that the underlying problem is that encapsulation is violated.

Consider my original example:

public class OrderProcessor : IOrderProcessor
{
    public void Process(Order order)
    {
        var validator = Locator.Resolve<IOrderValidator>();
        if (validator.Validate(order))
        {
            var shipper = Locator.Resolve<IOrderShipper>();
            shipper.Ship(order);
        }
    }
}

This is C# code, but it'd be similar in Java or another comparable statically typed language.

Pre- and postconditions #

One of the major benefits of encapsulation is abstraction: relieving you of the burden of having to understand every implementation detail of every piece of code in your code base. Well-designed encapsulation enables you to use a class without knowing all the intricate details of how it's implemented. This is done by establishing a contract for interaction.

As Object-Oriented Software Construction explains, a contract consists of a set of pre- and postconditions for interaction. If the client satisfies the preconditions, the object promises to satisfy the postconditions.

In statically typed languages like C# and Java, many preconditions can be expressed with the type system itself, as I've previously demonstrated.

If you look at the public API for the above OrderProcessor class, then what would you think its preconditions are?

public class OrderProcessor : IOrderProcessor
{
    public void Process(Order order)
}

As far as we can tell, there aren't many preconditions. The only one I can identify from the API is that there ought to be an Order object before you can call the Process method.

Yet, if you attempt to use OrderProcessor using only that precondition, it's going to fail at run-time:

var op = new OrderProcessor();
op.Process(order); // throws

The actual preconditions are:

  • There ought to be an Order object (this one we already identified).
  • There ought to be an IOrderValidator service in some Locator global directory.
  • There ought to be an IOrderShipper service in some Locator global directory.
Two out of three preconditions were invisible at compile-time.

As you can see, Service Locator violates encapsulation because it hides the preconditions for correct use.

Passing arguments #

Several people have jokingly identified Dependency Injection as a glorified term for passing arguments, and there may be some truth to that. The easiest way to make the preconditions apparent would be to use the type system to advertise the requirements. After all, we already figured out that an Order object is required. This was evident because Order is an argument to the Process method.

Can you make the need for IOrderValidator and IOrderShipper as apparent as the need for the Order object using the same technique? Is the following a possible solution?

public void Process(
    Order order,
    IOrderValidator validator,
    IOrderShipper shipper)

In some circumstances, this could be all you need to do; now the three preconditions are equally apparent.

Unfortunately, often this isn't possible. In this case, OrderProcessor implements the IOrderProcessor interface:

public interface IOrderProcessor
{
    void Process(Order order);
}

Since the shape of the Process method is already defined, you can't add more arguments to it. You can still make the preconditions visible via the type system by requiring the caller to pass the required objects as arguments, but you'll need to pass them via some other member. The constructor is the safest channel:

public class OrderProcessor : IOrderProcessor
{
    private readonly IOrderValidator validator;
    private readonly IOrderShipper shipper;
 
    public OrderProcessor(IOrderValidator validator, IOrderShipper shipper)
    {
        if (validator == null)
            throw new ArgumentNullException("validator");
        if (shipper == null)
            throw new ArgumentNullException("shipper");
            
        this.validator = validator;
        this.shipper = shipper;
    }
 
    public void Process(Order order)
    {
        if (this.validator.Validate(order))
            this.shipper.Ship(order);
    }
}

With this design, the public API now looks like this:

public class OrderProcessor : IOrderProcessor
{
    public OrderProcessor(IOrderValidator validator, IOrderShipper shipper)
 
    public void Process(Order order)
}

Now it's clear that all three object are required before you can call the Process method; this version of the OrderProcessor class advertises its preconditions via the type system. You can't even compile client code unless you pass arguments to constructor and method (you can pass null, but that's another discussion).

Summary #

Service Locator is an anti-pattern in statically typed, object-oriented languages because it violates encapsulation. The reason is that it hides preconditions for proper usage.

If you need an accessible introduction to encapsulation, you should consider watching my Encapsulation and SOLID Pluralsight course. If you wish to learn more about Dependency Injection, you can read my award-winning book Dependency Injection in .NET.


Comments

If we take a look at the original example, we should notice that terms from multiple domains are interleaving. Therefore, the OrderProcessor is violating context independence as described in GOOS book. To become context independent OrderProcessor should make his relationships explicit by allowing to pass them in constructor.

It is a slightly different perspective of the problem, but conclusion is the same, because context independence also concerns encapsulation.

2015-10-27 20:00 UTC

Is it only the usage of a Service Locator within a class that's an anti-pattern? That is, as long as OrderProcessor makes its dependencies explicit via the constructor, there's nothing wrong with using a Service Locator to get those dependencies when creating a OrderProcessor instance?

2015-11-19 23:21 UTC

Jeffrey, thank you for writing. I'm not sure I fully understand, but perhaps you are considering whether the use of a DI Container as a composition engine is also an anti-pattern?

If so, you can use a DI Container from within your Composition Root, but personally, I still prefer Pure DI.

2015-11-20 7:06 UTC UTC

Visual Value Verification

Monday, 19 October 2015 08:08:00 UTC

Sometimes, the most efficient way to verify the outcome of executing a piece of code is to visualise it.

Recently, I've been working my way through Real World Haskell, and although some of the exercises in the book are exasperating, others are stimulating and engaging. One of the good exercises is to use the Graham Scan algorithm to find the convex hull for a set of points.

This proved to be unexpectedly difficult for me, but I also found the exercise captivating, so I kept at it. My main problems turned out to be related to the algorithm itself, so during the exercise, I temporarily switched to F# in order to work out the kinks of my implementation. This enabled me to focus on the problem itself without also having to fight with an unfamiliar programming language.

Surprisingly, it turned out that one of my biggest problems was that I didn't have a good way to verify my implementation.

Return values #

Since I was approaching the problem with Functional Programming, it ought to be easy to unit test. After all, Functional design is intrinsically testable. My overall function to find the convex hull looks like this:

let inline hull points = // ...

In simplified form, the type of this function is (^a * ^d) list -> (^a * ^d) list where the ^a and ^d generic type arguments have a whole lot of constraints that I don't want to bother you with. In practice, both ^a and ^d can be integers, so that the hull function gets the type (int * int) list -> (int * int) list. In other words: you supply a list of integer points, and you get a list of integer points back.

Here's a simple example:

> hull [(3, 1); (2, 3); (2, 4); (2, 5); (3, 7); (1, 2); (1, 6)];;
val it : (int * int) list = [(3, 1); (3, 7); (2, 5); (1, 6); (1, 2)]

Quick! At a glance: is this result correct or incorrect?

How about this result?

> hull [(5, -2); (5, 6); (-4, 7); (-6, 0); (-8, 0); (-2, 5); (-3, -4); (-2, -2);
   (-9, -7); (2, -9); (4, -2); (2, -10); (4, -10); (4, -9); (2, -10); (3, -9);
   (8, 2); (-8, -5); (-9, -4); (5, -6); (6, 4); (8, -10); (-5, 0); (5, 9);
   (-5, -4); (-6, 8); (0, -9); (7, -4); (6, 4); (-8, -5); (-7, -7); (8, -9);
   (7, -3); (6, 4); (-6, -8); (-4, 4); (-2, -2); (-6, -10); (0, 1); (5, -7);
   (-5, 4); (5, -5); (6, 4); (0, 7); (5, 5); (-1, -4); (-6, 0); (-9, 3);
   (5, 6); (-7, 7); (4, -10); (5, -8); (9, -1); (0, -9); (6, 6); (6, -6);
   (9, 8); (-10, -2); (-3, 2); (-5, -7)];;
val it : (int * int) list =
  [(-6, -10); (2, -10); (4, -10); (8, -10); (9, -1); (9, 8); (5, 9); (-6, 8);
   (-7, 7); (-9, 3); (-10, -2); (-9, -7)]

(In the first example, the output is incorrect, but in the second, it's correct.)

It's easy enough to write automated unit tests once you know what the expected outcome should be. In this case, my problem was that I didn't have an easy way to calculate if a given list of points was the correct answer or not. After all, I was trying to implement a function that could be used for this purpose, but I needed to know if the function returned the correct values.

In the beginning, I tried to plot the values into Excel, in order to draw them as diagrams, but that soon turned out to be tedious and inefficient.

Then I considered Property-Based Testing, but I couldn't come up with a good set of properties that didn't involve half of the algorithm I was trying to implement.

Visual Value Verification #

The concept of a convex hull is simple, and easy to verify if you can visualise it. That's what I tried to do with Excel, but here my problem was that the process was too cumbersome.

Instead, I decided to pull in FSharp.Charting, because it enabled me to easily visualise the result of calling the hull function. This is all it takes:

open System
open FSharp.Charting
 
let inline hullChart points =
    let hullPoints = hull points
    let hullChart =
        let closedHull = hullPoints @ [hullPoints.Head]
        Chart.Line(closedHull, Name = "Hull")
        |> Chart.WithStyling(Color = Drawing.Color.Blue)
    let pointsChart =
        Chart.Point(points, Name = "Points")
        |> Chart.WithStyling(Color = Drawing.Color.Black)
    [hullChart; pointsChart]
    |> Chart.Combine
    |> Chart.WithYAxis(MajorGrid = ChartTypes.Grid(Enabled = false))
    |> Chart.WithXAxis(MajorGrid = ChartTypes.Grid(Enabled = false))

The signature of the hullChart function is (^a * ^d) list -> FSharp.Charting.ChartTypes.GenericChart (where, again, the ^a and ^d types are generic type arguments with various type constraints; think of them as numbers). It first calls the hull function with points. Then it creates a line chart to draw the hull, and a point chart to plot in all the input points. Finally, it combines both charts into a single chart.

With the hullChart function, it's easy to do ad-hoc testing in F# Interactive and visually inspect the results of calling the hull function with various input. At one point, I had a particular problem with my interpretation of the Graham Scan algorithm, and this was easy to see using the hullChart function, which would produce this chart:

A hull diagram that shows the calculated hull to be concave.

With this chart it's easy to see, at a glance, that the calculated hull is concave, and thus not a convex hull. There's an error in the implementation. (This is the first result set that I asked you to evaluate above.)

Struggling on with the exercise, I was able to solve my immediate problem and produce a convex hull from that particular input. Did that mean that I now had the correct implementation, or could there be other errors? I needed more test results before I felt that I had confidence in my implementation.

This, on the other hand, was now easy to get.

First, I could randomly generate points like this:

let randomPoints (r : Random) =
    [1..r.Next(1, 100)]
    |> List.map (fun _ -> (r.Next(-10, 10), r.Next(-10, 10)))

For ad-hoc testing, I could now create a random set of points and show the calculated hull:

> (randomPoints (Random()) |> hullChart).ShowChart();;

Immediately, a window would pop up, enabling me to visually verify the calculated hull value. Literally, verification at a glance.

From Visual Value Verification to automated tests #

You may object to this approach because such testing isn't sustainable. Ultimately, we'd like to have a suite of automated tests that can give us a succeed or failure result.

Still, the ability to visually verify the calculated hull values enabled me to produce a set of input points, as well as calculated hull points that I knew to be correct. I could, for example, use the randomPoints function to produce 100 input sets. For each of these 100 input sets, I could visually inspect the diagrams.

Here's an example of six diagrams, instead of 100, just to give you an idea about how quickly you can verify such a set:

Six individual hull diagrams arranged in a 2x3 grid, each of them displaying convex hulls.

If all of the generated diagrams look okay, you know that for at least these 100 sets, the output of the hull function is correct. You can now capture those input values and the corresponding (correct) output values as a parametrised test. Here's an xUnit.net example with five test cases:

// No [<ClassData>] attribute in xUnit.net 2.0 :(
type HullDataAttribute() =
    inherit Xunit.Sdk.DataAttribute ()
    override this.GetData testMethod =
        // The following input data comes from randomly generated points.
        // The expected values come from a prototype of the hull function where
        // the output was subsequently visually inspected by drawing the input
        // points and the calculated hull on a coordinate system to verify that
        // the hull prototype function calculated the correct values.
        seq {
            yield
                [|
                    // Points (input):
                    [(3, 1); (3, 7); (2, 5); (2, 4); (1, 6); (2, 3); (1, 2)]
                    // Expected:
                    [(3, 1); (3, 7); (1, 6); (1, 2)]
                |]
            yield
                [|
                    [(1, -4); (2, 5); (1, 3); (1, -3); (1, -2); (0, 4)]
                    [(1, -4); (2, 5); (0, 4)]
                |]
            yield
                [|
                    [(1, 1); (0, 3); (-2, 1); (-4, 3); (5, 2); (3, 2); (5, 5); (2, 5); (1, 3); (1, -3); (1, -2); (7, -4); (-1, 1); (-3, 0); (-5, -2); (1, -4); (0, 1); (0, 4); (3, -3); (6, 1)]
                    [(1, -4); (7, -4); (6, 1); (5, 5); (2, 5); (-4, 3); (-5, -2)]
                |]
            yield
                [|
                    [(-7, -7); (4, -7); (2, 3); (4, 4); (3, 1); (2, -1); (-3, -5); (4, -2); (-1, -7); (-6, 9); (4, 4); (-8, -2); (9, 4); (3, 0); (7, 0); (-7, 3); (0, 9); (4, -7); (-7, -6); (-1, 7); (6, 5); (7, -3); (-8, -8); (-6, -2); (3, 5); (-5, 7); (8, 1); (3, -2); (-9, -4); (-7, 8)]
                    [(-8, -8); (4, -7); (7, -3); (9, 4); (0, 9); (-6, 9); (-7, 8); (-9, -4)]
                |]
            yield
                [|
                    [(3, -3); (-9, -3); (0, 7); (3, 8); (3, -9); (1, 3); (-9, 5); (-4, 9); (-2, -10); (8, -2); (-4, 2); (-7, -9); (-5, -10); (0, 2); (9, -7); (6, -4); (4, 7); (-9, -7); (2, 1); (-3, -5); (-5, -1); (9, 6); (-3, 1); (6, -6); (-5, -4); (-6, 5); (0, 9); (-2, -9); (-6, -10); (-8, -1); (-4, -9); (8, -1); (-5, -5); (9, -6); (4, -8); (-3, 7); (2, 3); (-8, 6); (3, -4); (3, 4); (-6, -5); (-4, 3); (9, -10); (5, 4); (-1, 9); (9, 1); (-1, 7); (8, -7); (1, -1); (0, -9); (2, 1); (0, -8); (8, -3); (-8, 7); (7, 1); (-2, 8); (-4, -2); (-5, -10); (4, -6); (0, -5); (-1, -6); (5, 4); (-7, 6); (-3, 4); (4, 8); (-6, -7); (5, 2); (-9, 2); (5, -6); (4, 2); (7, 8); (7, 7)]
                    [(-6, -10); (-5, -10); (-2, -10); (9, -10); (9, -7); (9, -6); (9, 1); (9, 6); (7, 8); (0, 9); (-1, 9); (-4, 9); (-8, 7); (-9, 5); (-9, 2); (-9, -3); (-9, -7)]
                |]
        }
 
[<TheoryHullData>]
let ``hull returns correct result``
    (points : (int * intlist)
    (expected : (int * intlist) = 
 
    let actual = hull points
    expected =! actual

(The =! operator is an assertion operator from Unquote; read it as should equal - i.e. expected should equal actual.)

This gives you a deterministic test suite you can run repeatedly to protect the hull function against regressions.

Summary #

Sometimes, the nature of the problem is such that the easiest way to verify that the System Under Test (SUT) produces the correct results, is to visually verify the resulting value of exercising the SUT. We can call this Visual Value Verification (VVV).

In this article, I used the problem of finding the convex hull for a set of points as an example, but I've encountered other problems where this technique has proven useful. The most prominent that comes to mind is when implementing Conway's Game of Life; that's another problem where, just by looking at lists of numbers, you can't easily verify that your implementation is correct.

Once you've visually verified that output looks correct, you can capture the known input and output into a test case that you can run repeatedly.


Command Query Separation when Queries should have side-effects

Thursday, 08 October 2015 15:50:00 UTC

How can you adhere to Command Query Separation if your Domain Model requires Queries to have side-effects?

Command Query Separation (CQS) can be difficult until you get the hang of it; then it's not so difficult - just like most other things in life :)

In a previous article, I covered how to retrieve server-generated IDs after Create operations. That article discussed how to prevent Commands from turning into Queries. In the present article, you'll see some examples of how to prevent Queries from turning into Commands.

Context #

This article was triggered by a viewer's question related to my Encapsulation and SOLID Pluralsight course. As I interpret it, the hypothetical scenario is some school or university exam taking software:

"If a student has not submitted a solution to an exercise yet, when and if they look at the exercise hint for the first time, flag that hint as viewed. The points granted to a student's solution will be subtracted by 5 points, if the related hint is flagged as viewed."
As stated here, it sounds like a Query (reading the exercise hint) must have a side-effect. This time, we can't easily wave it away by saying that the side-effect is one that the client isn't responsible for, so it'll be OK. If the side-effect had been an audit log, we could have gotten away with that, but here the side-effect is within the Domain Model itself.

How can you implement this business requirement while still adhering to CQS? Perhaps you'd like to pause reading for a moment to reflect on this question; then you can compare your notes to mine.

Is it even worth applying CQS to this problem, or should we simply give up? After all, the Domain Model seems to inherently associate certain Queries with side-effects.

In my opinion, it's exactly in such scenarios that CQS really shines. Otherwise, you're looking at the code as a team developer, and you go: Why did the score just go down? I didn't change anything! You can waste hours when side-effects are implicit. Applying CQS makes side-effects explicit, and as the Zen of Python goes:

Explicit is better than implicit.
There are various ways to address this apparently impossible problem. You don't have to use any of them, but the first key to choosing your tools is to have something to choose from.

Contextual types #

With the requirements given above, we don't know what we're building. Is it a web-based application? An app? A desktop application? Let's, for a while, imagine that we're developing an app or desktop application. In my fevered imagination, this sort of application may have all the questions and hints preloaded in memory, or in files, and continually displays the current score on the screen. There may not be further persistent storage, or perhaps the application publishes the final scores for the exam to a central server once the exam is over. Think occasionally connected clients.

In this type of scenario, the most important point is to keep the score up-to-date in memory. This can easily be done with a contextual or 'amplified' type. In this case, we can call it Scored<T>:

public sealed class Scored<T>
{
    public readonly T Item;
    public readonly int Score;
 
    public Scored(T item, int score)
    {
        if (item == null)
            throw new ArgumentNullException(nameof(item));
 
        this.Item = item;
        this.Score = score;
    }
 
    public Scored<T> Add(int scoreDelta)
    {
        return new Scored<T>(this.Item, this.Score + scoreDelta);
    }
 
    public override bool Equals(object obj)
    {
        var other = obj as Scored<T>;
        if (other == null)
            return base.Equals(obj);
 
        return object.Equals(this.Item, other.Item)
            && object.Equals(this.Score, other.Score);
    }
 
    public override int GetHashCode()
    {
        return
            this.Item.GetHashCode() ^
            this.Score.GetHashCode();
    }
}

The Scored<T> class enables you to carry a score value around within a computation. In order to keep the example as simple as possible, I modelled the score as an integer, but perhaps you should consider refactoring from Primitive Obsession to Domain Modelling; that's a different story, though.

This means you can model your API in such a way that a client must supply the current score in order to retrieve a hint, and the new score is returned together with the hint:

public interface IHintQuery
{
    Scored<Hint> Read(int hintId, int currentScore);
}

The Read method is a Query, and there's no implied side-effect by calling it. Since the return type is Scored<Hint>, it should be clear to the client that the score may have changed.

An implementation could look like this:

public class HintQuery : IHintQuery
{
    private readonly IHints hints;
 
    public HintQuery(IHints hints)
    {
        if (hints == null)
            throw new ArgumentNullException(nameof(hints));
 
        this.hints = hints;
    }
 
    public Scored<Hint> Read(int hintId, int currentScore)
    {
        var valFromInner = this.hints.FirstById(hintId);
        return new Scored<Hint>(valFromInner, currentScore).Add(-5);
    }
}

The Read method uses an injected (lower-level) Query interface to read the answer hint, packages the result in a Scored<Hint> value, and subtracts 5 points from the score.

Both the score type (int) and Scored<T> are immutable. No side-effects occur while the client reads the answer hint, but the score is nonetheless adjusted.

In this scenario, the score travels around in the memory of the application. Perhaps, after the exam is over, the final score can be sent to a central repository for record-keeping. This architecture could work well in client-side implementations, but may be less suitable in stateless web scenarios.

Pessimistic locking #

If you're developing a web-based exam-taking system, you may want to be able to use stateless web servers for scalability or redundancy reasons. In such cases, perhaps keeping the score in memory isn't a good idea.

You could still use the above model, but the client must remember to save the updated score before returning an HTTP response to the browser. Perhaps you find this unsatisfactorily fail-safe, so here's an alternative: use pessimistic locking.

Essentially, you can expose an interface like this:

public interface IHintRepository
{
    void UnlockHint(int hintId);
 
    Hint Read(int hintId);
}

If a client attempts to call the Read method without first unlocking the hint, the method will throw an exception. First, you'll have to unlock the hint using the UnlockHint method, which is clearly a Command.

This is less discoverable, because you can't tell by the type signature of the Read method that it may fail for that particular reason, but it safely protects the system from accidentally reading the hint without impacting the score.

(In type systems with Sum types, you can make the design clearer by statically modelling the return type to be one of several mutually exclusive cases: hint, no hint (hintId doesn't refer to an existing hint), or hint is still locked.)

This sort of interface might in fact align well with a good User Experience, because you might want to ask the user if he or she is sure that (s)he wants to see the hint, given the cost. Such a user interface warning would be followed by a call to UnlockHint if the user agrees to the score deduction.

An implementation of UnlockHint would leave behind a permanent record that the answer hint was unlocked by a particular user, and that record can then subsequently be used when calculating the final score.

Summary #

Sometimes, it can be difficult to see how to both follow CQS and implement the desired business logic. In my experience, it's always possible to recast the problem in such a way that this is possible, but it may take some deliberation before it clicks.

Must you always follow CQS? Not necessarily, but if you understand what your options are, then you know what you're saying no to if you decide not to do it. That's quite a different situation from not having any idea about how to apply the principle.

In this article, I showed two options for reconciling CQS with a Domain Model where a Query seems to have side-effects.


Comments

Hi Mark, don't you think that the pessimistic locking is a case of temporal coupling?
2015-10-09 07:08 UTC

Philippe, thank you for writing. That's a great observation, and one that I must admit that I hadn't considered myself!

At least, in this case encapsulation is still intact because pre- and post-conditions are preserved. You can't leave the system in an incorrect state.

The reason I described the option using Scored<T> before the pessimistic locking alternative is that I like the first option best. Among other benefits, it doesn't suffer from temporal coupling.

2015-10-09 07:59 UTC

Hi Mark, those are all nice solutions!

I think there are also other options, for example sending a "Excercice hint viewed" notification which could then be handled by a subscriber calling a command.
But this is at the cost of some indirection, so it's nice to have other choices.

2015-10-09 12:01 UTC

Loïc, thank you for writing. I'm sure there are other alternatives than the ones I've outlined. The purpose of the article wasn't to provide an exhaustive list of options, but rather explain that it is possible to adhere to the CQS, even though sometimes it seems difficult.

Specifically, are you suggesting to send a notification when the Query is made? Isn't that a side-effect?

2015-10-09 12:23 UTC

There are some alternatives way in which I would consider handling this if I'm being honest. We always want to retrieve the hint. We singularly want to reduce the person's score by 5 points if they have not seen this hint before. This depreciation in points is idempotent and should only be executed if the hint hasn't been viewed before. Contextual information associated to the returned hint, such as last time viewed by current user, would inform the triggering of the command.

I think this is OK, because we care whether a user has viewed a hint. A hint having been viewed by a user means something, so returning it from the query feels valid. Acting up on this accordingly also feels valid, but the command itself becomes nicely idempotent as it understand the single-hit decrease in the points.

2015-10-09 13:18 UTC

Applications and their side effects

Wednesday, 23 September 2015 09:30:00 UTC

All applications have side-effects, but you can isolate those side-effects at the boundaries.

In my Encapsulation and SOLID Pluralsight course, I introduce Command-Query Separation (CQS) as a fundamental tool that will help you think about encapsulation. (I didn't come up with this myself, but rather picked it up from Bertrand Meyer's Object-Oriented Software Construction.)

Despite the age of the CQS principle, it's still news to many people, and I get lots of questions about it; I attempt to answer them as well as I can. Most questions are about specific situations where the inquirer can't see a way out of issuing a Query and at the same time producing a side-effect.

Perhaps the most common situation comes up when auditing comes into play. In some domains, auditing is a legal requirement. Asking any question must produce the side-effect that an audit record is created.

How can you reconcile such requirements with CQS?

Definition #

It may be helpful to first take a step back and attempt to answer the question: what's a side-effect, anyway?

Perhaps we can learn from Functional Programming (gee, who'd have thunk!?), because Functional Programming is all about taming side-effects. Apparently, Simon Peyton-Jones once said during an introduction to Haskell, that if your program has no side-effects, it cannot print a result, it cannot ask for input, it cannot interact with the network or the file system. All it does is heat up the CPU, after which someone from the audience interjected that heating up the CPU is also a side-effect, so, technically speaking, if you want your program to have no side-effects, you cannot even run it. (I only have this story on second hand, but it makes an important point.)

Clearly, there's no such thing as a truly side-effect free program, so how do we define what a side-effect is?

In strictly Functional languages, a side-effect occurs whenever a function isn't referentially transparent. This fancy term means that you can replace a function call with its return value. If a function call isn't equivalent to its return value, it's either because the function call also has a side-effect, or because some side-effect caused the return value to change.

This is closely related to the popular definition of a Query in CQS: Asking the question mustn't change the answer. This is, though, a weaker statement, because it allows a change in global state (e.g. another process updating a database record) to change the answer between two identical queries.

In a completely different context, in REST it's often helpful to distinguish between safe and idempotent requests. The term safe is closely related to a side-effect free Query. As REST in Practice states (p. 38): "By safe, we mean a GET request generates no server-side effects for which the client can be held responsible" (my emphasis). That can often be a useful distinction when thinking about CQS. A Query may cause a side-effect to happen (such as an audit record being written), but that side-effect doesn't concern the client.

Applications are never side-effect free #

All of the above is useful because there's a large set of side-effects we can ignore in practice. We can ignore that the CPU heats up. We can ignore that web servers log HTTP requests. We can (probably) ignore that audit records are written. Such side-effects don't change the answer of a Query.

There may still be cases where you need to deal explicitly with side-effects. You may wish to acknowledge to a user that a file was written. You may want to return a receipt to a client that your service received a document.

It's important to realise that at the application level, applications are all about side-effects.

  • Applications may have GUIs; every time the application updates the screen, that's a side-effect.
  • An application may be a REST service; it handles HTTP, which is modelled on the Request-Response pattern. Even POST requests have responses. Everything in HTTP looks like Queries, because responses are return values.
  • Applications may write to a database; clearly, side-effects are involved.
CQS, or referentially transparent functions, are principles we apply at the source code level to make the code easier to reason about and maintain. It's a question of separation of concerns: we separate pure computation from side-effects.

Applications, on the other hand, are compositions of all relevant concerns. As I've previously described, at the boundaries, applications aren't Object-Oriented. Neither are they Functional. All applications must have a boundary where Encapsulation, or FP, or CQS, doesn't apply. The trick is to keep that boundary as thin as possible.

Thus, if you must violate CQS, do it in the boundary of the application. As an example, perhaps you're creating a REST service that enables clients to create new resources with POST requests. As a response, you want to return the address of the new resource. That violates CQS, but if you keep that violation at the entry point of the request, you've isolated the violation to the boundary of the application.

Summary #

It can be difficult to follow CQS, until you get your head around it, but it's always possible to apply it - except at the application boundary.

How do we know that we can always apply CQS? We know this from Functional Programming. Strict Functional languages like Haskell model everything as Queries (except at the boundaries), and Haskell is a Turing-complete language. If you can model everything as Queries, it should be clear that you can also separate Commands and Queries: if in doubt, eliminate the Command; FP shows how to do that.

Even if you're working in an Object-Oriented language, learn some Functional Programming. It'll teach you how to apply CQS, which is a cornerstone of Encapsulation.


Comments

While I agree most part of this, I thought I would also point out that the CQS brings quite a bit of complications that doesn’t justify the benefit it brings. Hence CQS in my opinion should not be the default reference architecture for the entire application. If it has to be used, it must be with in a bounded context. Udi Dahan the early advocate of CRQS caution on its usage http://udidahan.com/2011/04/22/when-to-avoid-cqrs/ And more details from Martin Fowler http://martinfowler.com/bliki/CQRS.html
2015-09-29 09:44 UTC

Unit testing internals

Tuesday, 22 September 2015 11:56:00 UTC

FAQ: How should you unit test internals? A: Through the public API.

This question seems to come up repeatedly: I have some internal (package-private in Java) code. How do I unit test it?

The short answer is: you unit test it as you unit test all other code: through the public API of the System Under Test (SUT).

Purpose of automated testing #

Details can be interesting, but don't lose sight of the big picture. Why do you test your software with automated tests?

Automated testing (as opposed to manual testing) only serves a single purpose: it prevents regressions. Some would say that it demonstrates that the software works correctly, but that's inaccurate. Automated tests can only demonstrate that the software works correctly if the tests are written correctly, but that's a different discussion.

Assuming that all automated tests are correct, then yes: automated tests also demonstrate that the software works, but it's still regression testing. The tests were written to demonstrate that the software worked correctly once. Running the tests repeatedly only demonstrates that it still works correctly.

What does it mean that the software works correctly? When it comes to automated testing, the verification is only as good as the tests. If the tests are good, the verification is strong. If the tests are weak, the verification is weak.

Consider the purpose of writing the software you're currently being paid to produce. Is the purpose of the software to pass all tests? Hardly. The purpose of the software is to solve some problem, to make it easier to perform some task, or (if you're writing games) to entertain. The tests are proxies of the actual purpose.

It ought to be evident, then, that automated tests should be aligned with the purpose of the software. There's nothing new in this: Dan North introduced Behaviour Driven Development (BDD) in order to emphasise that testing should be done with the purpose of the software in mind. You should test the behaviour of the software, not its implementation.

Various software can have different purposes. The software typically emphasised by BDD tends to be business software that solves a business problem. Often, these are full applications. Other types of software include reusable libraries. These exist to be reusable. Common to all is that they have a reason to exist: they have externally visible behaviour that some consumer cares about.

If you want to test a piece of software to prevent regressions, you should make sure that you're testing the externally visible behaviour of the software.

Combinatorics #

In an ideal world, then, all automated testing should be done against the public interface of the SUT. If the application is a web application, testing should be done against the HTML and JavaScript. If the application is a mobile app, testing should be done somehow by automating user interaction against its GUI. In reality, these approaches to testing tend to be brittle, so instead, you can resort to subcutaneous testing.

Even if you're developing a reusable library, or a command-line executable, if you're doing something even moderately complex, you run into another problem: a combinatorial explosion of possible paths through the code. As J.B. Rainsberger explains much better than I can do, if you combine software modules (e.g. validation, business logic, data access, authentication and authorisation, caching, logging, etc) you can easily have tens of thousands distinct paths through a particular part of your software - all via a single entry point.

This isn't related to BDD, or business problems, or agile... It's a mathematical result. There's a reason the Test Pyramid looks like it does.

When you combine a procedure with four distinct paths with another with five paths, the number of possible paths isn't (4 + 5 =) nine; it's (4 * 5 =) twenty. As you combine units together, you easily reach tens of thousands distinct paths through your software. (This is basic combinatorics).

You aren't going to write tens of thousands of automated tests.

In the ideal world, we would like to only test the behaviour of software against its public interface. In the real world, we have to separate any moderately complex software into modules, and instead test the behaviour of those modules in isolation. This prevents the combinatorial explosion.

If you cut up your software in an appropriate manner, you'll get modules that are independent of each other. Many books and articles have been written about how to do this. You've probably heard about Layered Application Architecture, Hexagonal Architecture, or Ports and Adapters. The recent interest related to microservices is another (promising) attempt at factoring code into smaller units.

You still need to care about the behaviour of those modules. Splitting up your software doesn't change the overall purpose of the software (whatever that is).

Internals #

When I do code reviews, often the code is already factored into separate concerns, but internal classes are everywhere. What could be the reason for that?

When I ask, answers always fall into one of two categories:

  • Members (or entire classes) are poorly encapsulated, so the developers don't want to expose these internals for fear of destabilising the system.
  • Types or members are kept hidden in order to protect the code base from backwards compatibility issues.
The first issue is easy to deal with. Consider a recent example:

private void CalculateAverage()
{
    this.average =
        TimeSpan.FromTicks(
            (long)this.durations.Average(ts => ts.Ticks));
}

This CalculateAverage method is marked private because it's unsafe to call it. this.durations can be null, in which case the method would throw an exception. It may feel like a solution to lock down the method with an access modifier, but really it only smells of poor encapsulation. Instead, refactor the code to a proper, robust design.

There are valid use cases for the private and internal access modifiers, but the majority of the time I see private and internal code, it merely smells of poor design. If you change the design, you could make types and members public, and feel good about it.

The other issue, concerns about compatibility, can be addressed as well. In any case, for most developers, this is essentially a theoretical issue, because most code written isn't for public consumption anyway. If you also control all consumers of your API, you may not need to worry about compatibility. If you need to change the name of a class, just let your favourite refactoring tool do this for you, and all is still good. (I'd still recommend that you should adopt a Ranger approach to your Zoo software, but that's a different topic.)

The bottom line: you don't have to hide most of your code behind access modifiers. There are good alternatives that lead to better designs.

Testing internals #

That was a long detour around reasons for testing, as well as software design in general. How, then, do you test internal code?

Through the public API. Remember: the reason for testing is to verify that the observable behaviour of the SUT is correct. If a class or member is internal, it isn't visible; it's not observable. It may exist in order to support the public API, but if you explicitly chose to not make it visible, you also stated that it can't be directly observable. You can't have it both ways. (Yes, I know that .NET developers will point the [InternalsVisibleTo] attribute out to me, but this attribute isn't a solution; it's part of the problem.)

Why would you test something that has no 'official' existence?

I think I know the answer to that question. It's because of the combinatorial explosion problem. You want the software to solve a particular problem, and to keep everything else 'below the surface'. Unfortunately, as the complexity of the software grows, you realise (explicitly or implicitly) that you can't cover the entire code base with high-level BDD-style tests. You want to unit test the internals.

The problem with doing this is that it's just as brittle as testing through a GUI. Every time you change the internals, you'll have to change your tests.

A well-designed system tends to be more stable when it comes to unit testing. A poorly designed system is often characterised by tests that are too coupled to implementation details. As soon as you decide to change something in such a system, you'll need to change the tests as well.

Recall that automated testing is regression testing. The only information we get from running tests is whether or not existing tests passed or failed. We don't learn if the tests are correct. How do we know that tests are correct? Essentially, we know because we review them, see them fail, and never touch them again. If you constantly have to fiddle with your tests, how do you know they still test the right behaviour?

Design your sub-modules well, on the other hand, and test maintenance tends to be much lower. If you have well-designed sub-modules, though, you don't have to make all types internal. Make them public. They are part of your solution.

Summary #

A system's public API should enable you to exercise and verify its observable behaviour. That's what you should care about, because that's the SUT's reason to exist. It may have internal types and members, but these are implementation details. They exist to support the system's observable behaviour, so test them through the public API.

If you can't get good test coverage of the internal parts through the public API, then why do these internal parts exist? If they exhibit no observable behaviour, couldn't you delete them?

If they do exist for a reason, but you somehow can't reach them through the public API, it's a design smell. Address that smell instead of trying to test internals.


Comments

Dear Mark,

Thank you for this excellent article. I got one question: in the "Internals" section, you state that there are valid use cases for the internal access modifier - can you name some of them for me?

I'm also a proponent of keeping nearly all types public in reusable code bases even if most of them are not considered to be part of the API that a client usually consumes - therefore I would only think about marking a class internal if it cannot protect it's invariants properly and fail fast when it's used in the wrong way. But you also mentioned that in this article, too.

When I did a Google search on the topic, I couldn't find any useful answers either. The best one is probably from Eric Lippert on a Stack Overflow question, stating that big important classes that are hard to verify in the development process should be marked internal. But one can easily counter that by not designing code in such a way.

Anyways, it would be very kind of you if you could provide some beneficial use cases for the use of internal.

Sincerely,
Kenny

2015-10-03 08:14 UTC

Kenny, thank you for writing. The answer given by Eric Lippert does, in my opinion, still paint an appropriate picture. There's always a cost to making types or members public, and I don't always wish to pay that cost. The point that I'm trying to make in the present article is that while this cost exists, it's not so high that it justifies unit testing internals via mechanisms like Reflections or [InternalsVisibleTo]. The cost of doing that is higher than making types or members public.

Still, there are many cases where it's possible to cover internal or private code though a public API. Sometimes, I may be in doubt that the way I've modelled the internal code is the best way to do so, and then I'd rather avoid the cost of making it public. After all, making code public means that you're making a promise that it'll be around without breaking changes for a long time.

Much of my code has plenty of private or internal code. An example is Hyprlinkr, which has private helper methods for one of its central features. These private helpers only exist in order to make the feature implementation more readable, but are not meant for public consumption. They're all covered by tests that exercise the various public members of the class.

Likewise, you can consider the private DurationStatistics class from my Temporary Field code smell article. At first, you may want to keep this class private, because you're not sure that it's stable (i.e. that it isn't going to change). Later, you may realise that you'll need that code in other parts of your code base, so you move it to an internal class. If you're still not convinced that it's stable, you may feel better keeping it internal rather than making it public. Ultimately, you may realise that it is, indeed, stable, in which case you may decide to make it public - or you may decide that no one has had the need for it, so it doesn't really matter, after all.

My goal with this article wasn't to advise against internal code, but only to advise against trying to directly call such code for unit testing purposes. In my experience, when people ask how to unit test internal code, it's a problem they have because they haven't used Test-Driven Development (TDD). If you do TDD, you'll have sufficient coverage, and then it doesn't matter how you choose to organise the internals of your code base.

2015-10-03 11:45 UTC

Public types hidden in plain sight

Monday, 21 September 2015 14:12:00 UTC

Instead of making types internal to a package, you can hide them in plain sight as public types.

When I review object-oriented code, I often see lots of internal (package-private in Java) classes and members. Sometimes, most of a code base is written at that access level.

When I ask for the reasoning behind this, the answer is invariably: encapsulation.

Encapsulation is one of the most misunderstood concepts in programming. If you think it's about making everything inaccessible, then there's only one logical conclusion: all your code must have this API:

public class Program
{
    public static int Main(string[] args)
    {
        // Everything else goes here!
    }
}

If you've seen my Pluralsight course about encapsulation, you'll know that I prefer Bertrand Meyer's view, as outlined in Object-Oriented Software Construction. Essentially, it's about design by contract (pre- and post-conditions).

Once I start to ask more pointed questions about enthusiastic use of internal-only access, it turns out that often, the underlying reason is that the developers want to be able to change the design of their internal types and members, without breaking existing clients. A noble goal.

Imagine that such a code base has existing clients. While the maintainers can't change the public types without breaking these clients, they can change all internal types as much as they want.

Unfortunately, there are disadvantages to this design strategy. You can't easily unit test such code bases. There are ways, but none of them are good.

What if, instead of making types internal, you made them public? Are there ways to prevent clients from relying on such types? Yes, plenty of ways.

Public, append-only #

My preferred approach to this problem is to make types public anyway. If my overall feeling about a type is that "it seems okay", I make it public, even if I know that there's a risk involved with this.

Consider, as an example, the private DurationStatistics class I extracted in a recent article. In that article, I kept the class private, because I wanted to discuss visibility in a separate article.

You can easily promote the DurationStatistics class to a public class:

public class DurationStatistics

Here you only see the declaration of the class, because the implementation is exactly the same as in the other article. The code is moved out of the Estimator class and put into its own file, though.

How can you be sure that DurationStatistics is 'done'? How can you be sure that you're not going to need to change it later?

You can't.

You can, however, deal with that when it occurs. You can still add new members to the class, but you can't remove members or change the type's name.

If you need to change something (thereby breaking compatibility), then don't change it. Instead, add the new type or member, but leave the old artefact in place and deprecate it. In .NET, you can do this by adorning the member or type with the [Obsolete] attribute. You can even add a message that points clients to the new, preferred way of doing things:

[Obsolete("DurationStatistics is being retired (our fault). Use Foo instead.")]
public class DurationStatistics

This will cause a compiler warning in all client code using the DurationStatistics class. You should make the message as helpful and friendly as possible.

This may leave your code with compiler warnings. This is good, because you should work to remove these warnings from your own code base. The only place you should leave them in place is in your unit tests. As long as you have a deprecated type in your published code base, you should also keep it covered by unit tests. This will cause compiler warnings in your unit tests, but you can suppress such warnings explicitly there:

[Theory]
[InlineData(new[] { 1, 1, 1 }, 1)]
[InlineData(new[] { 2, 3, 4 }, 3)]
public void AverageIsCorrect(int[] seconds, int expectedSeconds)
{
#pragma warning disable 618
    var sut = 
        new DurationStatistics(
            seconds.Select(s => TimeSpan.FromSeconds(s)).ToArray());
#pragma warning restore 618
    var actual = sut.Average;
    Assert.Equal(expectedSeconds, actual.TotalSeconds);
}

Keep the 'disable scope' as small as possible, and always remember to restore the warning after disabling it.

Perhaps you think that this approach of just making everything public and dealing with the consequences later is an unprofessional, unhygienic, and ugly way to evolve code, but it's really the logical way to produce and publish SOLID code.

Obsolete by default #

A variant of the above technique is that instead of making a type or member internal, you can deprecate it right away. This will immediately warn clients that they're doing something they aren't supposed to do.

If you ever decide to 'promote' that type to a bona fide member of your API, you can simply remove the [Obsolete] attribute.

Separate namespaces #

You can also hide public types in separate namespaces. If you don't document the types in those 'hidden' namespaces, clients will again have to do something explicit to use them.

This is a softer way of hiding public types in plain sight, because clients get no compiler warnings if they use those 'hidden' types. Still, it can be quite effective.

My experience with maintaining public software (e.g. the now-more-than-six-year-old AutoFixture project) is that the most common problem is that users don't even find the public types I want them to find! If you knew how many times I've seen people reinvent a feature already available in the API... And that's even when all the interesting stuff is in the same namespace.

Putting types in separate namespaces is, in my experience, quite an effective way of hiding them.

Separate libraries #

Ultimately, you can put your volatile types in a separate library. In the Estimation example, you can ship the Estimator class (your 'real' public API) in one library, but put DurationStatistics in another library:

The Estimator class is in the Estimation library, whereas the DurationStatistics are in another library named Estimation.Statistics.

The Estimation library references the Estimation.Statistics library, so it can use all the public types in that library. You can unit test the public types in the Estimation library, but you can also unit test the public types in the Estimation.Statistics library.

When you publish your API, you give clients a reference to Estimation, but not to Estimation.Statistics. You still ship Estimation.Statistics as a dependency of your API, but clients shouldn't reference it.

Specifically, if you wish to publish your API as a NuGet package, you can use the <references> element to ensure that only the 'official' library is added to a project:

The library Ploeh.Samples.Estimation is referenced, but Ploeh.Samples.Estimation.Statistics isn't.

In this library, I installed the (local) Estimation NuGet package, and as you can see, only Ploeh.Samples.Estimation is referenced, whereas Ploeh.Samples.Estimation.Statistics isn't. This means that the client can easily use the official API, e.g. to create a new instance of the Estimator class:

this.estimator = new Estimator(TimeSpan.FromMinutes(1));

On the other hand, if a client developer attempts to use the public DurationStatistics class, that's not possible:

The DurationStatistics class isn't available, and Visual Studio can't even suggest pulling in the appropriate reference.

Not only is DurationStatistics not available, all Visual Studio can suggest is to create it; it can't suggest pulling in the appropriate assembly reference, which is a good thing.

The library is still there, it's just not referenced, so Estimator still works at run-time.

The trick to set up a NuGet package in this way is to use the <references> element in the .nuspec file:

<?xml version="1.0"?>
<package >
  <metadata>
    <id>Estimation</id>
    <version>1.0.0</version>
    <authors>Mark Seemann</authors>
    <owners>Mark Seemann</owners>
    <requireLicenseAcceptance>false</requireLicenseAcceptance>
    <description>
      This is an example that demonstrates how dependent libraries can
      hide public types in plain sight.
    </description>
    <copyright>Copyright Mark Seemann 2015</copyright>
    <references>
      <reference file="Ploeh.Samples.Estimation.dll" />
    </references>
  </metadata>
  <files>
    <file src="..\Estimation\bin\Debug\Ploeh.Samples.Estimation.dll" target="lib\net45" />
    <file src="..\Estimation\bin\Debug\Ploeh.Samples.Estimation.Statistics.dll" target="lib\net45" />
  </files>
</package>

Notice that both Ploeh.Samples.Estimation.dll and Ploeh.Samples.Estimation.Statistics.dll are included in the NuGet package, but that only Ploeh.Samples.Estimation.dll should be referenced when the NuGet package is added.

For good measure I should point out that in order to create these demo files, I only installed the Estimation NuGet package from my local disk (using the -Source switch), so don't go and look for it on nuget.org.

Summary #

There are plenty of ways to hide public types in plain sight. Often, one or more of these options are better than misusing the internal/package-private access modifier. This doesn't mean that you must never make a class internal, but it means that you should consider the advantages and disadvantages of all options before making such a decision.

The main disadvantage of using the internal access modifier is that you can't easily unit test such a class. The options provided here should give you plenty of alternatives that will enable you to easily unit test classes while still hiding them from client developers.


Comments

Børge Nordli #
Your main argument is the non-testability of internal classes. What is wrong with the InternalsVisibleTo assembly attribute? Isn't this one of the main use cases for this attribute?
2015-08-21 17:05 UTC

The problem with the [InternalsVisibleTo] attribute is exactly that it enables you to unit test internals, which you shouldn't.

2015-09-22 12:24 UTC

Dear Mark,

Thanks for another great article. I totally agree with your point of view, but I can also see why Børge might be somewhat unsatisfied with the answer to not use the InternalsVisibleTo attribute in the context of automated testing.

I think that using internal to hide certain types of a reusable code base is a violation of the Open/Closed Principle, because there is no easy way to extend the internal types without touching the source code (you can use reflection to call internal methods, but that's annoying, and you simply cannot derive from an internal class or implement an internal interface.)

What are your thoughts on this argument?

With regards,
Kenny

2015-10-03 04:37 UTC

Kenny, thank you for writing. Overall, I like that additional argument. In order to keep this discussion balanced, I think it's worth pointing out that your argument certainly applies to reusable software, but that lots of software isn't (or rather, shouldn't be) reusable. Udi Dahan has pointed out that reuse may be overrated, or even have negative consequences, and while I've been disagreeing with certain parts of that claim, in general I think there's much wisdom in that observation.

Some software is written to be reusable, but much software shouldn't be. This is the sort of software I call Zoo software. Your argument about violation of the OCP certainly applies to Wildlife Software, but only indirectly to Zoo software.

2015-10-03 09:30 UTC

Temporary Field code smell

Friday, 18 September 2015 11:52:00 UTC

Temporary Field is a well-known code smell. Here's an example and remedy.

The Temporary Field code smell was described more than a decade ago, but I keep encountering it when doing code reviews. Despite its vintage, I couldn't find a good example, so I decided to provide one.

The code smell is described in Refactoring:

"Sometimes you see an object in which an instance variable is set only in certain circumstances. Such code is difficult to understand, because you expect an object to need all of its variables. [...]

"A common case of temporary field occurs when a complicated algorithm needs several variables. Because the programmer didn't want to pass around a huge parameter list (who does?), he put them in fields."

- Refactoring, Martin Fowler et al., Addison-Wesley 1999. p. 84

Unfortunately, Refactoring doesn't provide an example, and I couldn't find a good, self-contained example on the web either.

Example: estimate a duration #

In this example, a developer was asked to provide an estimate of a duration, based on a collection of previously observed durations. The requirements are these:

  • There's a collection of previously observed durations. These must be used as statistics upon which to base the estimate.
  • It's better to estimate too high than too low.
  • Durations are assumed to be normal distributed.
  • The estimate should be higher than the actual duration in more than 99% of the times.
  • If there are no previous observations, a default estimate must be used as a fall-back mechanism.
Does this sound like a complicated algorithm?

It's not that bad, actually. Because the distribution is assumed to be normal, you can find a good estimate by calculating the average, and add three times the standard deviation.

Here's how our developer attempted to solve the problem:

public class Estimator
{
    private readonly TimeSpan defaultEstimate;
    private IReadOnlyCollection<TimeSpan> durations;
    private TimeSpan average;
    private TimeSpan standardDeviation;
 
    public Estimator(TimeSpan defaultEstimate)
    {
        this.defaultEstimate = defaultEstimate;
    }
 
    public TimeSpan CalculateEstimate(
        IReadOnlyCollection<TimeSpan> durations)
    {
        if (durations == null)
            throw new ArgumentNullException(nameof(durations));
 
        if (durations.Count == 0)
            return this.defaultEstimate;
 
        this.durations = durations;
        this.CalculateAverage();
        this.CalculateStandardDeviation();
 
        var margin = TimeSpan.FromTicks(this.standardDeviation.Ticks * 3);
        return this.average + margin;
    }
 
    private void CalculateAverage()
    {
        this.average =
            TimeSpan.FromTicks(
                (long)this.durations.Average(ts => ts.Ticks));
    }
 
    private void CalculateStandardDeviation()
    {
        var variance =
            this.durations.Average(ts => 
                Math.Pow(
                    (ts - this.average).Ticks,
                    2));
        this.standardDeviation = 
            TimeSpan.FromTicks((long)Math.Sqrt(variance));
    }
}

The CalculateEstimate method directly uses the temporary field durations, as well as implicitly the fields average and standardDeviation. The reason for this is to avoid passing parameters around. Both the average and the standard deviation depend on the durations, but the standard deviation also depends on the average.

These dependencies are difficult to see in this code snippet:

this.durations = durations;
this.CalculateAverage();
this.CalculateStandardDeviation();

What if I wanted to switch the order around?

this.durations = durations;
this.CalculateStandardDeviation();
this.CalculateAverage();

This compiles! It also produces a result if you invoke the CalculateEstimate method. No exception is thrown, but the result returned is incorrect!

Not only is this code difficult to understand, it's also brittle. Furthermore, it's not thread-safe.

This was most likely done with the best of intentions. After all, in Clean Code you learn that methods with zero arguments are better than methods with one argument (which are better than methods with two arguments, and so on).

There are better ways to factor the code, though.

Extract Class #

In Refactoring, the suggested cure is to extract a class that contains only the temporary fields. This refactoring is called Extract Class. Your first step can be to introduce a private, nested class within the containing class. In the case of the example, you might call this class DurationStatistics. It can be used from CalculateEstimate in this way:

public TimeSpan CalculateEstimate(
    IReadOnlyCollection<TimeSpan> durations)
{
    if (durations == null)
        throw new ArgumentNullException(nameof(durations));
 
    if (durations.Count == 0)
        return this.defaultEstimate;
 
    var stats = new DurationStatistics(durations);
    var margin = TimeSpan.FromTicks(stats.StandardDeviation.Ticks * 3);
    return stats.Average + margin;
}

It's now much clearer what's going on. You have some statistics based on the durations, and those contain both the average and the standard deviation. The flow of data is also clear: you need the stats to get the average and the standard deviation, and you need stats.StandardDeviation in order to calculate the margin, and the margin before you can calculate the return value. If you attempt to move those lines of code around, it's no longer going to compile.

This solution is also thread-safe.

Here's the full code of the Estimator class, after the Extract Class refactoring:

public class Estimator
{
    private readonly TimeSpan defaultEstimate;
 
    public Estimator(TimeSpan defaultEstimate)
    {
        this.defaultEstimate = defaultEstimate;
    }
 
    public TimeSpan CalculateEstimate(
        IReadOnlyCollection<TimeSpan> durations)
    {
        if (durations == null)
            throw new ArgumentNullException(nameof(durations));
 
        if (durations.Count == 0)
            return this.defaultEstimate;
 
        var stats = new DurationStatistics(durations);
        var margin = TimeSpan.FromTicks(stats.StandardDeviation.Ticks * 3);
        return stats.Average + margin;
    }
 
    private class DurationStatistics
    {
        private readonly IReadOnlyCollection<TimeSpan> durations;
        private readonly Lazy<TimeSpan> average;
        private readonly Lazy<TimeSpan> standardDeviation;
 
        public DurationStatistics(IReadOnlyCollection<TimeSpan> durations)
        {
            if (durations == null)
                throw new ArgumentNullException(nameof(durations));
            if (durations.Count == 0)
                throw new ArgumentException(
                    "Empty collection not allowed.",
                    nameof(durations));
 
            this.durations = durations;
            this.average = new Lazy<TimeSpan>(this.CalculateAverage);
            this.standardDeviation =
                new Lazy<TimeSpan>(this.CalculateStandardDeviation);
        }
 
        public TimeSpan Average
        {
            get { return this.average.Value; }
        }
 
        public TimeSpan StandardDeviation
        {
            get { return this.standardDeviation.Value; }
        }
 
        private TimeSpan CalculateAverage()
        {
            return TimeSpan.FromTicks(
                (long)this.durations.Average(ts => ts.Ticks));
        }
        private TimeSpan CalculateStandardDeviation()
        {
            var variance =
                this.durations.Average(ts =>
                    Math.Pow(
                        (ts - this.average.Value).Ticks,
                        2));
            return TimeSpan.FromTicks((long)Math.Sqrt(variance));
        }
    }
}

As you can see, the Estimator class now only has a single read-only field, which ensures that it isn't temporary.

The original intent of not passing parameters around is still preserved, so this solution still adheres to that Clean Code principle.

The DurationStatistics class lazily calculates the average and standard deviation, and memoizes their results (or rather, that's what Lazy<T> does).

Instances of DurationStatistics have a shorter lifetime than the containing Estimator object.

The DurationStatistics class is a private, nested class, but a next step might be to pull it out to a public class in its own right. You don't have to do this. Sometimes I leave such classes as private classes, because they only exist to organise the code better; their purpose and behaviour are still narrow in scope, and associated with the containing class. In other cases, such a refactoring may uncover a great way to model a particular type of problem. If that's the case, making it a public class could make the entire code base better, because you've now introduced a reusable concept.

Summary #

Class fields are intended to be used by the class. They constitute the data part of data with behaviour. If you have temporary values, either pass them around as arguments, or extract a new class to contain them.


Comments

The refactored example is better because we are no longer sharing the fields across methods but wrapping it in a extraced class. Is this a good way of summarizing the changes?
How about passing everything as parameters and returning the calculated value as result, would this be considered dirty code?
Just trying to get the most out of this example.
2015-09-25 22:20 UTC

Tom, thank you for writing. Great observation about passing arguments around instead! In this case, passing the necessary arguments around would also be a good solution. It would look like this:

public class Estimator
{
    private readonly TimeSpan defaultEstimate;
 
    public Estimator(TimeSpan defaultEstimate)
    {
        this.defaultEstimate = defaultEstimate;
    }
 
    public TimeSpan CalculateEstimate(
        IReadOnlyCollection<TimeSpan> durations)
    {
        if (durations == null)
            throw new ArgumentNullException(nameof(durations));
 
        if (durations.Count == 0)
            return this.defaultEstimate;
            
        var average = CalculateAverage(durations);
        var standardDeviation =
            CalculateStandardDeviation(durations, average);
 
        var margin = TimeSpan.FromTicks(standardDeviation.Ticks * 3);
        return average + margin;
    }
 
    private static TimeSpan CalculateAverage(
        IReadOnlyCollection<TimeSpan> durations)
    {
        return TimeSpan.FromTicks((long)durations.Average(ts => ts.Ticks));
    }
 
    private static TimeSpan CalculateStandardDeviation(
        IReadOnlyCollection<TimeSpan> durations,
        TimeSpan average)
    {
        var variance =
            durations.Average(ts => 
                Math.Pow(
                    (ts - average).Ticks,
                    2));
        return TimeSpan.FromTicks((long)Math.Sqrt(variance));
    }
}

This refactoring also eliminates the temporary fields, and is probably even easier to understand.

To be quite honest, that's what I would have done with this particular example if it had been production code. The reason I wanted to show the Extract Class refactoring instead is that passing arguments doesn't scale well when you add more intermediate values. In this example, the maximum number of arguments you have to pass is two, which is easily within acceptable limits, but what if you have 15 intermediate values?

Passing 15 method arguments around is well beyond most people's threshold, so instead they sometimes resort to temporary fields. The point here is that this isn't necessary, because you have the alternative of extracting classes.

Why didn't I show an example with 15 intermediate values, then? Well, first, I couldn't think of a realistic example, and even if I could, it would be so complicated that no one would read the article :)

2015-09-26 07:31 UTC

Sounds reasonable - Passing one or two around is okay but at a certain point an Extract Class would make more sense.

No, it's a good example otherwise it would be too confusing as you mentioned. Interesting post, thanks!

2015-09-26 12:00 UTC

Ad hoc Arbitraries with FsCheck.Xunit

Tuesday, 08 September 2015 11:11:00 UTC

When using FsCheck with xUnit.net, you can define ad hoc Arbitraries in-line in your test functions.

Writing properties with FsCheck and using xUnit.net as a test host is a nice combination. Properties are written as normal functions annotated with the Property attribute:

[<Property>]
let ``findNeighbors returns 8 cells`` (cell : int * int) =
    let actual : Set<int * int> = findNeighbors cell
    8 =! actual.Count

FsCheck takes care of generating values for the cell argument. This works well in many cases, but sometimes you need a bit more control over the range of values being generated by FsCheck.

Motivating example #

You may already have guessed it from the above code snippet, but the example for this article is a property-based approach to Conway's Game of Life. One of the properties of the game is that any live cell with more than three live neighbours dies.

This means that you have to write a property where one of the values is the number of live neighbours. That number must be a (randomly generated) number between 4 and 8 (including both ends), because the maximum number of neighbours in a cell grid is eight. How do you get FsCheck to give you a number between 4 and 8?

Crude solution: conditional property #

There are various solutions to the problem of getting fine control over certain values generated by FsCheck, but the ones I first learned turn out to be problematic.

One attempt is to use conditional properties, where you supply a boolean expression, and FsCheck will throw away all properties that fail the boolean test:

[<Property>]
let ``Any live cell with more than three live neighbors dies``
    (cell : int * int)
    (neighborCount : int) =
    (3 < neighborCount && neighborCount <= 8) ==> lazy
 
    let neighborCells = findNeighbors cell |> pickRandom neighborCount
    let actual = calculateNextState (cell :: neighborCells) cell
    Dead =! actual

The ==> operator is an FsCheck-specific operator that indicates that FsCheck should disregard all properties where the input arguments don't satisfy the boolean condition on the left side. The use of lazy ensures that FsCheck will only attempt to evaluate the property when the condition is true.

That's simple and tidy, but unfortunately doesn't work. If you attempt to run this property, you'll get a result like this:

Test 'Ploeh.Katas.GameOfLifeProperties.Any live cell with more than three live neighbors dies' failed: 
Arguments exhausted after 34 tests.

The reason is that FsCheck generates random integers for neighborCount, and most of these values (statistically) fall outside of the range 4-8. After a (default) maximum of 1000 attempts, FsCheck gives up, because at that time, it's only managed to find 34 values that satisfy the condition, but it wants to find 100.

What a disappointment.

Crude solution: custom Arbitrary #

Another apparent solution is to define a custom Arbitrary for FsCheck.Xunit. The mechanism is to define a static class with your custom rule, and register that with the Property attribute. The class must have a static method that returns an Arbitrary<'a>.

In this particular example, you'll need to define a custom Arbitrary that only picks random numbers between 4 and 8. That's easy, but there's a catch: if you change the way int values are generated, you're also going to impact the generated cell values, because a cell here is an int * int tuple.

Since you only need a small number, you can cheat and customize byte values instead:

type ByteBetween1and8 =
    static member Byte () = Gen.elements [1uy .. 8uy] |> Arb.fromGen
 
[<Property(Arbitrary = [| typeof<ByteBetween1and8> |])>]
let ``Any live cell with more than three live neighbors dies``
    (cell : int * int)
    (neighborCount : byte) =
    neighborCount > 3uy ==> lazy
 
    let neighborCells = findNeighbors cell |> pickRandom (int neighborCount)
    let actual = calculateNextState (cell :: neighborCells) cell
    Dead =! actual

The ByteBetween1and8 type is a static class with a single static member that returns Arbitrary<byte>. By using the Arbitrary property of the Property attribute (!), you can register this custom Arbitrary with the Property.

This 'solution' side-steps the issue by using a substitute data type instead of the desired data type. There are several problems with this:

  • You have to convert the byte value back to an integer in order to use it with the System Under Test: int neighborCount.
  • 3uy is less readable than 3.
  • A newcomer will wonder why neighborCount is a byte instead of an int.
  • You can't generalise this solution. It works because F# has more than one number type, but if you need to generate strings, you're out of luck: there's only one (normal) type of string in .NET.
Next to these issues with using bytes instead of ints, there's the larger problem that it's awkward to have to define an entire new type (ByteBetween1and8) for this purpose. Finally, this mechanism isn't type-safe. ByteBetween1and8 doesn't implement any interface. It doesn't have to, because the Arbitrary property on the Property attribute is just an array of Type instances. They can be any type, and your code will compile, but if one of those Types don't meet the requirements, an exception will be thrown at run-time.

This way works, but is hardly elegant or safe.

Ad hoc, in-line Arbitraries #

There's a safe way to define ad hoc, in-line Arbitraries with FsCheck.Xunit:

[<Property>]
let ``Any live cell with > 3 live neighbors dies`` (cell : int * int) =
    let nc = Gen.elements [4..8] |> Arb.fromGen
    Prop.forAll nc (fun neighborCount ->
        let liveNeighbors =
            cell
            |> findNeighbors
            |> shuffle
            |> Seq.take neighborCount
            |> Seq.toList
        
        let actual : State =
            calculateNextState (cell :: liveNeighbors |> shuffle |> set) cell
 
        Dead =! actual)

Using Prop.forAll enables you to execute your property with a custom Arbitrary, but mixed with the 'normally' generated values that the function receives via its arguments. In this example, nc is an Arbitrary<int>. Notice how it's explicitly used to populate the neighborCount value of the property, whereas the cell value arrives via normal means.

This is type-safe, because nc is an Arbitrary<int>, which means that neighborCount is statically inferred to be an int.

If you need more than a single ad hoc Arbitrary, you can always create Arbitraries for each of them, and then use Gen.map2, Gen.map3, and so on, to turn those individual Arbitraries into a single Arbitrary of a tuple. You can then use that tuple with Prop.forAll.

Summary #

FsCheck is a well-designed library that you can combine in lots of interesting ways. In this article you learned how to use Prop.forAll to evaluate a property with a mix of normal, arbitrarily generated values, and an ad hoc, in-line Arbitrary.

Addendum 2016-03-01: You can write such properties slightly better using the backward pipe operator.


Page 42 of 73

"Our team wholeheartedly endorses Mark. His expert service provides tremendous value."
Hire me!