IEnumerable, ICollection, IReadOnlyCollection – an API analysis on .NET and XUnit

Today, for the first time since I started using XUnit, I wrote a theory whose individual test parameters could not easily be defined as compile-time constants.

In the next section I explain the motivation and the use case that arose here. The section after that summarizes the .NET API behind the issue and what is wrong with it.

Use Case: XUnit MemberData Attribute

A theory in XUnit is a parametrized test function that can be fed with several sets of input and, if necessary, the corresponding expected results. (If you don’t need any parameters, your test case checks a single fact, which is why the corresponding attribute of such a test method is [Fact].)

As a theory is parametrized, those parameters have to be populated from some source. XUnit provides several ways to do that.

The [InlineData] attribute defines a single set of parameters. A theory can carry multiple [InlineData] attributes, which usually leads to a readable, easy-to-maintain list of input tuples. On the other hand, [InlineData] is still an attribute, and as such its arguments are restricted by the C# specification to a small set of compile-time constant values (unfortunately I couldn’t find the corresponding section in a current version of the language specification on MSDN, but the behaviour of Visual Studio 2013 and 2015 suggests the restriction still holds). This makes it hard to provide a more complex data model as input. Either the parameters have to be split into simple values, with the test method building the actual input first, or you have to use one of the other variants.

When more or less any object that can be created will do, you could use e.g. AutoFixture, a library that generates objects according to given constraints in a parametrized, random fashion. For this kind of test there is [AutoDomainData] and other attributes defined in AutoFixture.XUnit. With those attributes AutoFixture generates the input, but fine-grained control over what exactly is generated quickly becomes complex to implement again, so AutoFixture is a good fit for more or less randomized input data.

What is still missing is a way for [InlineData] to call functions to obtain the actual parameters. For this, XUnit provides the [MemberData] attribute, formerly known as [PropertyData]. It is attached to a theory and takes a string, which has to be the name of a static property of the test class with the type IEnumerable<object[]>.
Each value of this property is then used for one separate test case. If the property only returns object arrays of compile-time constants, it is equivalent to several [InlineData] attributes. In contrast to [InlineData], however, the property getter can contain arbitrary code, so generating complex input data sets becomes possible.
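
To make the comparison concrete, here is a minimal sketch of the same theory fed once by [InlineData] and once by [MemberData]. It assumes xUnit.net 2.x and C# 6 or later (for nameof); the class, property and method names are made up for illustration.

```csharp
using System.Collections.Generic;
using Xunit;

public class AdditionTests
{
    // Hypothetical data source: each object[] is one set of theory arguments.
    public static IEnumerable<object[]> AdditionCases
    {
        get
        {
            // Arbitrary code is allowed here, unlike in an attribute argument.
            for (int i = 0; i < 3; i++)
            {
                yield return new object[] { i, i + 1, 2 * i + 1 };
            }
        }
    }

    // Variant 1: one attribute per case, compile-time constants only.
    [Theory]
    [InlineData(0, 1, 1)]
    [InlineData(1, 2, 3)]
    [InlineData(2, 3, 5)]
    public void Add_WithInlineData(int a, int b, int expected)
    {
        Assert.Equal(expected, a + b);
    }

    // Variant 2: the same three cases, produced by the property above.
    [Theory]
    [MemberData(nameof(AdditionCases))]
    public void Add_WithMemberData(int a, int b, int expected)
    {
        Assert.Equal(expected, a + b);
    }
}
```

Both theories run the same three cases; only the second one could also build its arguments from objects that are not compile-time constants.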

Mutability and Expectations of IEnumerable

The MemberData attribute links to a property of type IEnumerable<object[]>.
Each object array then populates the theory by assigning its elements, in order, to the theory parameters.

When I came to work this morning I thought this might be a good way to generate several sets of input data on the fly. Those are quite complex, and C# has the pretty yield keyword, which returns a single item as one element of a sequence exposed as an IEnumerable or IEnumerator. The particular strength of this language feature is that the sequence is not materialized completely when it is used; it may generate one object at a time, keeping the memory footprint small even for very large enumerations.

As an example, consider the endless Fibonacci sequence: it can be iterated, but calculating the element at a given index i requires knowledge of the two predecessors at indices i-1 and i-2.
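
A minimal sketch of such a generator, using yield return and BigInteger so the values do not overflow. The class and method names are only illustrative.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Numerics;

public static class Sequences
{
    // Endless generator: only the two most recent values are kept in memory,
    // and nothing is computed until a consumer asks for the next element.
    public static IEnumerable<BigInteger> Fibonacci()
    {
        BigInteger previous = 0, current = 1;
        while (true)
        {
            yield return previous;
            BigInteger next = previous + current;
            previous = current;
            current = next;
        }
    }
}
```

The consumer decides how much of the sequence is produced: Sequences.Fibonacci().Take(10) yields 0, 1, 1, 2, 3, 5, 8, 13, 21, 34 and then stops, whereas calling ToList() directly on Fibonacci() would never return.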

Fibonacci elements are simple numbers and therefore immutable in C#. But what if you have a similar case where you want to return objects with state, and it is much easier to mutate them and yield return them several times than to generate new ones? My intuition about IEnumerable, in contrast to ICollection, was that enumerators are designed to be used “on the fly”. So I implemented an XUnit theory where the property linked to the MemberData attribute returned the same object several times, adding more child objects to the complex data structure between invocations of yield.

Suddenly my tests failed. I carefully worked through what should happen and what the expected result would be given the state of the object at the moment it was yielded, since both are handed to the theory together.

Digging through it, I realized that the XUnit test runner in fact enumerates the complete property before running the first test.
You may want to do that, and LINQ offers the ToList() extension method on IEnumerable for exactly this purpose: it enumerates the sequence and populates a List with the elements returned. But here I wanted to use the benefit of the generator pattern; otherwise I could have provided a List myself.
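
The effect can be reproduced without XUnit. The following sketch is not the test runner's actual code; it only assumes that the runner buffers the sequence (here via ToList()) before using it. All names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class GrowingData
{
    // Hypothetical generator: yields the SAME list instance three times,
    // adding one child between the yields.
    public static IEnumerable<object[]> Cases()
    {
        var children = new List<string>();
        for (int expectedCount = 1; expectedCount <= 3; expectedCount++)
        {
            children.Add("child" + expectedCount);
            yield return new object[] { children, expectedCount };
        }
    }

    // Stand-in for the theory: counts the cases in which the list has
    // exactly as many children as the generator announced.
    public static int CountMatches(IEnumerable<object[]> cases)
    {
        int matches = 0;
        foreach (object[] testCase in cases)
        {
            var children = (List<string>)testCase[0];
            var expectedCount = (int)testCase[1];
            if (children.Count == expectedCount) matches++;
        }
        return matches;
    }
}
```

Consumed on the fly, CountMatches(GrowingData.Cases()) returns 3, because every check runs before the generator mutates the list again. Buffered first, CountMatches(GrowingData.Cases().ToList()) returns 1: all three arrays reference the same list, which already holds three children by the time the first check runs.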

Even worse: if I had yielded just the one object and the theory had processed it as a single parameter and passed, I might never have realized that the theory effectively ran with a single object, making it nothing more than a [Fact] again.

ICollection extends IEnumerable. It adds methods to manipulate the collection, such as Add(), Clear() and Remove(), and two properties: IsReadOnly and Count. Count in particular is the one that IMHO makes the difference here: when the number of elements is known beforehand, it is (memory concerns aside) easy to generate all members as well. Conversely, using IEnumerable pays off either to keep a public interface as general as possible or to provide very long, up to unlimited, sequences of items.

With this inheritance one could argue that any ICollection is in fact an IEnumerable as well. That’s true, and it is one of Brad Wilson’s arguments on the xUnit GitHub issue I opened for this case. He argues for IReadOnlyCollection, which does not allow manipulating the membership of the collection itself. Nevertheless, as the individual items are still mutable, that only addresses half of the point.
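
A small sketch of that last point, with hypothetical names: IReadOnlyCollection fixes the membership and exposes Count, but it says nothing about the state of the items.

```csharp
using System.Collections.Generic;

public static class ReadOnlyDemo
{
    // List<T> implements IReadOnlyCollection<T>, so it can be returned as one.
    public static IReadOnlyCollection<List<int>> Build()
    {
        return new List<List<int>>
        {
            new List<int> { 1 },
            new List<int> { 2 }
        };
    }

    // Mutates the first item through the read-only view and returns
    // that item's new element count.
    public static int MutateFirstItem(IReadOnlyCollection<List<int>> collection)
    {
        // collection.Add(...) would not compile: the interface has no Add().
        foreach (List<int> item in collection)
        {
            item.Add(42); // the element itself is still freely mutable
            return item.Count;
        }
        return 0;
    }
}
```

With the collection from Build(), Count is 2 before and after the call, while MutateFirstItem returns 2 because the first inner list has grown from one element to two.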

Words and their Connotations matter

Enumerable inherently signals that it’s not fixed: it’s enumerated, not a list. Someone who collects postage stamps has a collection. He might know how many stamps he has; do you see the analogy to the ICollection interface? Of course it’s possible to look at every single stamp, that is, to enumerate through the collection: the collection is enumerable.

It’s dangerous to rely on natural language when reading source code, but when naming is done well, the intuition you get from reading code should match what the code does. It’s bad style to use names that don’t match.

My Proposal on GitHub

To address that, in the GitHub issue I “therefore propose[d] to do two things to clarify it”:

  • clearly state how it behaves in the documentation of the attribute
  • change behaviour or API:
    • either the enumerable should not be enumerated completely beforehand, or
    • the property should preferably be of type ICollection<object[]> instead, which may be basically the same implementation, but makes it clearer that it’s a list already and that all objects are inside from the start

In his response, Brad argued:

This [(yielding the same object twice, mutating its state in between)] breaks the contract/expectations of IEnumerable (not just for xUnit.net). The world you live in is one where .ToList() exists, and is regularly used.

As mentioned above, that is true in the real world, but (please don’t take this as an offense) can you point me to a place where that contract is defined? IEnumerable is far too often used that way, but that is not what it is meant to be.

LINQ to SQL would be disgusting to use if all LINQ methods called ToList(); this would turn large data sets into a serious risk regarding memory consumption. In fact, some LINQ methods do buffer: in LINQ to Objects, Intersect() reads its second sequence completely before it yields the first result, and Union() keeps track of every element it has already returned. You have to be very careful when using those methods with large amounts of data, often working around them or at least redesigning the query.
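
The difference between streaming and buffering operators can be observed in LINQ to Objects by counting how many elements are actually pulled from a source. This is a sketch with hypothetical helper names; it says nothing about how LINQ to SQL translates the same operators.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LinqLaziness
{
    // Wraps a sequence and counts in pulled[0] how many elements a consumer
    // actually requests. An array is used because iterators cannot take ref parameters.
    public static IEnumerable<int> Counted(IEnumerable<int> source, int[] pulled)
    {
        foreach (int value in source)
        {
            pulled[0]++;
            yield return value;
        }
    }

    // Where() streams: First() stops as soon as one even number shows up.
    public static int PulledByWhereFirst()
    {
        var pulled = new int[1];
        Counted(Enumerable.Range(1, 1000), pulled).Where(x => x % 2 == 0).First();
        return pulled[0];
    }

    // Intersect() buffers its second argument before yielding anything.
    public static int PulledFromSecondByIntersectFirst()
    {
        var pulled = new int[1];
        Enumerable.Range(1, 1000)
            .Intersect(Counted(Enumerable.Range(1, 1000), pulled))
            .First();
        return pulled[0];
    }
}
```

PulledByWhereFirst() returns 2, since only the elements 1 and 2 are ever produced. PulledFromSecondByIntersectFirst() returns 1000, although the very first element already matches.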

Conclusion

I agree with Brad’s argument for any part of code that USES a foreign API. Because of those expectations you can’t rely on an Enumerable being used as an on-the-fly iteration.

I agree that my assumption was wrong when I wanted to use the IEnumerable property of the XUnit API as such. But I strongly oppose the conclusion: the fact that many APIs and projects, even those of Microsoft and the .NET Framework itself, use ToList() on an enumerable doesn’t imply that we couldn’t use it in a more flexible way. And it shouldn’t lead to bad code that invites people to use the features of an enumerable, namely the generator behaviour that makes yield special, and then does not honour them.

I have to admit that I don’t yet fully understand how test discovery works and what the test runners require at which stage. It may be impossible to implement the interface as the generating IEnumerable I proposed.

If it is possible, I would propose to do it: iterate the property on the fly, with all the benefits I wrote about above.
If not, well, shit happens, but then the documentation should clearly mention it, as AFAIK there is no written-down contract that requires an enumerable to contain immutable and distinct objects.

 
