The dangers of primitive obsession

Posted on 2015-01-20 Updated on 2016-04-06

Primitive obsession is, like many facets of modern programming, something that has been written about many times before (e.g. http://sourcemaking.com/refactoring/primitive-obsession, http://blog.thecodewhisperer.com/2013/03/04/primitive-obsession-obsession/), yet still we see its ensnaring of unwary programmers.

For those who may not know, primitive obsession means making use of primitive data types (int, long, string, array etc) to represent what is, really, a domain object. Easy examples of such types are string for email addresses, int for quantity and, what we’re here to talk about today, Guid for Id fields.

Writing value objects to represent these items instead of primitives has a number of benefits. First: it reduces code duplication. Let’s say that a system is using string to represent a user’s UserName. Business rules state that UserNames have a maximum of 25 characters. Take the following example code:

public class ServiceA
{
    public void DoSomething(string username)
    {
        if ( username.Length &gt; 25 )
            throw new ArgumentException(&quot;Username cannot be more than 25 characters&quot;);

        ...
    }
}

This appears fine until we find the following elsewhere in the solution:

public class ServiceB
{
    public void DoSomethingElse(string username)
    {
        if ( username.Length &gt; 25 )
            throw new ArgumentException(&quot;Username cannot be more than 25 characters&quot;);

        ...
    }
}

The code that double checks the validity of the username has been duplicated! Obviously this is bad. Building a new class to encompass the concept of a username is the preferred solution for this. E.g.

public class Username
{
    public Username(string username)
    {
        if (username.Length &gt; 25)
            throw new ArgumentException(String.Format(&quot;'{0}' is not a valid username&quot;, username));

        Value = username;
    }

    public string Value { get; private set; }

    //     Equals, GetHashcode, ToString methods go here
}

Since Username protects its invariants and encapsulates the business rules each service can be secure in the knowledge that the username it has been passed is valid and that they can continue without having to verify this themselves. (Notice that the class is also publicly immutable. This means that objects of this type don’t change their hashcodes and are therefore suitable to be inserted into any kind of collection. Anyway, it makes no sense to change the value inside a Username object after it’s been created. Just create a new one.) This allows the two services to change their definitions to:

public class ServiceB
{
    public void DoSomethingElse(Username username)
    {
        ...
    }
}

This benefit of using specifically-written classes instead of primitives is what’s been written about before; this is well understood. I would now like to talk about the second big benefit; accidental type reuse.

Imagine the two following types in a domain:

public class Customer
{
    public int Id { get; private set; }
}

public class Order
{
    public int Id { get; private set; }
    public Customer Customer { get; private set; }
}

Now look at the following code, and see if you can see the (rather subtle) bug. A clue: the compiler is absolutely fine with what’s been written.

Given:

public Order GetCustomerForOrder(int orderId)
{
    var customers =
        from order in orders
        where order.Id == orderId
        select order.Customer;

    return customers.Single();
}

Why doesn’t this work:

Order order = // Get order from somewhere ...;

Customer customer = GetCustomerForOrder(order.CustomerId);

emailSender.SendOrderProcessedEmail(customer);

The worst thing about this bug is that there isn’t even a runtime error; sometimes the emails go out just fine, but to the wrong customer. What’s going on?

The answer is here: GetCustomerForOrder(order.CustomerId). If you look at the signature of GetCustomerForOrder, you’ll see that its parameter is actually named “orderId”. We’ve accidentally passed in the Customer ID of an Order rather than the ID of the Order itself, and the compiler let us because we’ve given both fields the type int.

Imagine if our domain looked like this instead:

public class Customer
{
    public CustomerId Id { get; private set; }
}

public class Order
{
    public OrderId Id { get; private set; }
    public Customer Customer { get; private set; }
}

With this domain our error-prone code from earlier wouldn’t have compiled and the bug would never have made it to production. We’ve used the compiler to enforce correctness; something that we should always strive to do.

Note: This is a very real issue: The above case is very similar to a bug that I was tasked with solving at my first job. New starters were put on the support team, and it wasn’t long before I’d sunk a ton of time into a bug just like this. Our auditing system was giving what the business was sure were incorrect figures. It later turned out that the reporting call was passing in the wrong ID and the analytics were going against whichever customer happened to have the ID “1838”, for example.

This entry was posted in .NET, C#, General chat and tagged .NET, C#, Coding style, Primitive obsession.

Richiban

A code monkey's tales of C#, F# and Powershell. I’d tell you a UDP joke, but you may not get it.

The dangers of primitive obsession

One thought on “The dangers of primitive obsession”

Leave a comment Cancel reply

The dangers of primitive obsession

Share this:

Related

One thought on “The dangers of primitive obsession”

Leave a comment Cancel reply