Alois Kraus

blog

Beware of SortedSet<T> and a custom Comparer

When dealing with custom data structures, funny things can happen. This is especially true if multiple threads can mutate state and several locks are in play. Reviewing such code is quite challenging, and some bugs still slip through. After quite a lot of debugging, the final bug turned out to have nothing to do with multithreading but with a problematic IComparable implementation. What will the following small program print as output?

using System;
using System.Collections.Generic;

namespace SortedSetTest
{
    class Data : IComparable<Data>
    {
        public byte[] Payload
        {
            get;
            set;
        }
        public int CompareTo(Data other)
        {
            return this.Payload == other.Payload ? 0 : 1;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var set = new SortedSet<Data>();

            var payload = new byte[10];
            var payload2 = new byte[20];
            var payload3 = new byte[30];

            var d1 = new Data { Payload = payload };
            var d2 = new Data { Payload = payload2 };
            var d3 = new Data { Payload = payload3 };

            set.Add(d1);
            set.Add(d2);
            set.Add(d3);

            set.Remove(d1);
            set.Remove(d3);
            set.Remove(d2);

            Console.WriteLine($"Final Count: {set.Count}");
        }
    }
}

You can choose from

  • 0
  • 1
  • 2
  • 3
  • Other
  • Exception

And the winner is:

Final Count: 1

This was unexpected. If you break the rules of the IComparable interface, anything can happen, from never-terminating sorts to silent data structure corruption. The generic interface documentation is not particularly helpful, but the non-generic version spells the rules out explicitly:

1) A.CompareTo(A) must return zero.

2) If A.CompareTo(B) returns zero, then B.CompareTo(A) must return zero.

3) If A.CompareTo(B) returns zero and B.CompareTo(C) returns zero, then A.CompareTo(C) must return zero.

4) If A.CompareTo(B) returns a value other than zero, then B.CompareTo(A) must return a value of the opposite sign.

5) If A.CompareTo(B) returns a value x not equal to zero, and B.CompareTo(C) returns a value y of the same sign as x, then A.CompareTo(C) must return a value of the same sign as x and y.

6) By definition, any object compares greater than (or follows) null, and two null references compare equal to each other.

If you break any of these rules you get undefined behavior, which depends entirely on the collection class you use. Never take shortcuts when implementing the IComparable interface. In my case, rules 4 and 6 were violated.

Violation of rule 4 (A, B are non-null Data instances with different Payload arrays):

    A.CompareTo(B) = 1

    B.CompareTo(A) = 1    Must have the opposite sign according to rule 4!

The spirit of this rule is that sorting requires a working < relation, which demands exactly that property. SortedSet<T> stores its data in a red-black tree. Since the comparison is broken, the tree operations that rely on it can fail in subtle ways, such as occasionally failing to remove an existing item from the collection.
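The rule 4 violation can be observed directly by calling the broken CompareTo from both sides. A minimal standalone sketch (the class name BrokenData is illustrative; the comparison logic is the one from the quiz program above):

```csharp
using System;

// The broken comparer from the post: a reference check plus a constant 1.
class BrokenData : IComparable<BrokenData>
{
    public byte[] Payload { get; set; }

    public int CompareTo(BrokenData other)
    {
        return this.Payload == other.Payload ? 0 : 1;
    }
}

class Demo
{
    static void Main()
    {
        var a = new BrokenData { Payload = new byte[10] };
        var b = new BrokenData { Payload = new byte[20] };

        // Rule 4 demands opposite signs, but both calls return 1.
        Console.WriteLine(a.CompareTo(b)); // 1
        Console.WriteLine(b.CompareTo(a)); // 1 -- violates rule 4
    }
}
```

From the red-black tree's point of view, every element is simultaneously greater than every other element, so lookups can walk down the wrong branch and miss nodes that are actually present.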

So how can this be fixed? That depends on what you treat as equal. If you care only about array sizes and not their contents, your CompareTo method becomes:

        public int CompareTo(Data other)
        {
            if (other == null)
            {
                return 1; // rule 6: any object compares greater than null
            }

            int lret;
            if (Object.ReferenceEquals(Payload, other.Payload))
            {
                lret = 0;
            }
            else if (other.Payload == null)
            {
                lret = 1;
            }
            else if (Payload == null)
            {
                lret = -1;
            }
            else
            {
                lret = Payload.Length.CompareTo(other.Payload.Length);
            }

            return lret;
        }

The fixed version also follows rule 6 for null values, which is good practice. For some time I thought that SortedSet<T> was broken, but as usual the BCL classes were fine and the fault was in our code. How hard can it be to write a comparison method? It turns out there are six rules to follow, which is quite a lot for a seemingly simple function.
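With the length-based comparer in place, the quiz program behaves as expected. A standalone sketch that combines the fixed CompareTo with the original Main (assembled here for illustration):

```csharp
using System;
using System.Collections.Generic;

// The Data class with the corrected, length-based CompareTo.
class Data : IComparable<Data>
{
    public byte[] Payload { get; set; }

    public int CompareTo(Data other)
    {
        if (other == null) return 1;                           // rule 6
        if (ReferenceEquals(Payload, other.Payload)) return 0; // same array
        if (other.Payload == null) return 1;
        if (Payload == null) return -1;
        return Payload.Length.CompareTo(other.Payload.Length); // order by size
    }
}

class Program
{
    static void Main()
    {
        var set = new SortedSet<Data>();
        set.Add(new Data { Payload = new byte[10] });
        set.Add(new Data { Payload = new byte[20] });
        set.Add(new Data { Payload = new byte[30] });

        // The length-based comparison now locates each element reliably,
        // regardless of removal order.
        set.Remove(new Data { Payload = new byte[30] });
        set.Remove(new Data { Payload = new byte[10] });
        set.Remove(new Data { Payload = new byte[20] });

        Console.WriteLine($"Final Count: {set.Count}"); // Final Count: 0
    }
}
```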

posted on Tuesday, March 22, 2016 11:18 AM

Feedback

# re: Beware of SortedSet<T> and a custom Comparer 3/23/2016 1:52 AM Tim VanFosson
I'll be honest, my first reaction was "Well, that's obviously wrong. You can't have a correct IComparable<T> that doesn't have the possibility of returning a negative value." That's not a very helpful response, though, so I wasn't planning to comment. However, I think you could have easily avoided this by simply testing your implementation across the various permutations. Some things can be hard to test, but this shouldn't be. I imagine that you would have saved a lot of time (not to mention the results of the incorrect implementation) if you had written sufficient tests to cover the possibilities.

Also, it's curious to me that you're determining ordering based eventually on the length rather than the content of the payload. Is it not possible in your case to have byte arrays with different content of the same length? If so, you still don't have a total ordering, only a partial ordering of the data.

# re: Beware of SortedSet<T> and a custom Comparer 3/23/2016 2:11 AM Stephen Cleary
As a general rule, if you're implementing IComparable<T>, then you should also be implementing IEquatable<T> (so that Equals(x) is the same as (CompareTo(x) == 0)). And if you're implementing IEquatable<T>, then you should also be overriding Equals(Object) and GetHashCode.

The complexity of comparison rules is why I wrote my little library called Comparers: https://github.com/StephenCleary/Comparers

The fluent syntax of my library helps avoid most of these kinds of pitfalls.
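Stephen's advice — pairing IComparable<T> with IEquatable<T>, Equals(Object), and GetHashCode — could look like this for a length-based ordering. The Buffer class and its null handling are an illustrative sketch, not code from the post:

```csharp
using System;

// Keep CompareTo, Equals and GetHashCode consistent with each other.
class Buffer : IComparable<Buffer>, IEquatable<Buffer>
{
    public byte[] Payload { get; set; }

    public int CompareTo(Buffer other)
    {
        if (other == null) return 1;                // rule 6: non-null > null
        int thisLen = Payload?.Length ?? -1;        // treat a null payload as smallest
        int otherLen = other.Payload?.Length ?? -1;
        return thisLen.CompareTo(otherLen);
    }

    // Equals(x) must agree with CompareTo(x) == 0.
    public bool Equals(Buffer other) => other != null && CompareTo(other) == 0;

    public override bool Equals(object obj) => Equals(obj as Buffer);

    // Objects that compare equal must return the same hash code.
    public override int GetHashCode() => Payload?.Length ?? -1;
}

class ConsistencyDemo
{
    static void Main()
    {
        var a = new Buffer { Payload = new byte[5] };
        var b = new Buffer { Payload = new byte[5] };
        Console.WriteLine(a.Equals(b));                        // True
        Console.WriteLine(a.CompareTo(b) == 0);                // True
        Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True
    }
}
```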

# re: Beware of SortedSet<T> and a custom Comparer 3/23/2016 3:51 AM Alois Kraus
@Tim: Yes more testing helps. The issue was hidden in already complex multithreaded code so it was not easy to spot. The SortedSet was used as container for a memory buffer pool so the contents of the arrays did not matter. I agree more tests would have helped but at some point you are writing too many tests for simple code. It is more economical to have good asserts which check for invariants and throw an exception when they are violated. That is how the issue was found at all.

@Stephen: That is really a nice library! Yes once you get started with implementing one Interface you need all others as well. But this was a rather specific data structure. The final solution was to not use a SortedSet but use a plain List<byte[]> array which is faster and has none of the issues.

# re: Beware of SortedSet<T> and a custom Comparer 3/23/2016 7:26 AM Chris Bowley
The big question for me is why use a SortedSet when there is no obvious ordering of the items. Wouldn't a HashSet with an IEqualityComparer based on reference equality of the Payload have been more appropriate? It should be much more efficient for any serious number of items.

# re: Beware of SortedSet<T> and a custom Comparer 3/25/2016 9:13 AM Alois Kraus
@Chris: The ordering is determined by its usage. It should act as a buffer for byte arrays with a LRU policy where the oldest items are evicted if nobody was using them. This at least in theory can help to get good CPU cache locality if the last added array is reused first.
