Geeks With Blogs
George Mamaladze .NET C# tips, tricks, tweaks. Effective use of data structures and algorithms. Clean code.

Download QuadTreeOptimization.zip c# project

Background

Today I'd like to share with you interesting experiences I made last several evenings when working on one of my open source hobby projects - Source Code Cloud Generator

http://sourcecodecloud.codeplex.com/

One of the tasks was arranging of nonintersecting rectangles on a 2D surface in certain pattern. For quick 2D collision detection (rectangle intersection) I decided to use QuadTree data structure [see: http://en.wikipedia.org/wiki/Quadtree]. I found one good (clean) open source c# implementation from Michael Coyle [many thanks!] at http://www.codeproject.com/KB/recipes/QuadTree.aspx. I started using it without major modifications.

After the first version of my app was done, I have started optimizing different aspects. One of performance highlights was drawing speed. About 50% of the overall drawing & layouting time was consumed by QuadTree. So I decided to OPTIMIZE this particular component.

Optimization

Optimization is very similar to refactoring - you are modifying code to make it better (in this case faster) preserving exactly the same behaviour. Very good differentiation between refactoring, bug-fix and optimization, was made by Robert C Martin in his book "Working Effectively with Legacy Code". All these three hold your functionality invariant - each of them trying to improve different aspects of code. In this case I wanted to improve the speed of the algorithm.

Approach

To be precise my goal was to optimize the QuadTree data structure in my particular use case - not making it efficient in general for all sorts of use cases. The overall scenario was quite long running and very UI and file system dependant. My application was gathering data by processing large amount of files and drawing visualization on screen. The QuadTree was somewhere in the middle of this chain.

The challenge was to isolate the QuadTree component while still working with REAL DATA in order to make optimization progress visible and measurable.

Step 1

So I built in simple logging functionality into two main methods of my component. It was recording all method calls with parameters and return values in common semicolon delimited text file.

 
HasContent;766,7109;469;312,5781;53,99999;False
Insert;766,7109;469;312,5781;53,99999;True
HasContent;798,7172;474,5293;248,5657;42,94143;True
HasContent;796,9323;474,7823;248,5657;42,94143;True
...

Then I have generated a set of log files which where result of several different runs of the real application. Logs very in size from 4 to 16MB and contained many thousand records.

Step 2

I wrote a set of regression tests, conservation tests, preserving basic behaviour. Like inserting a rectangle covering whole surface, inserting rectangle of zero size, etc.

Step 3

I have created a unit test which was fetching its test data from test case generator. The test case generator was accessing CSV file line by line generating test records out of it. Thus I was able to playback my log files inside NUnit.

This is the test itself:

        [Test]
        [TestCaseSource(typeof(TestDataReader), "TestCases")]
        public bool Test(string methodName, RectangleF rectangle)
        {
            switch (methodName)
            {
                case "Insert":
                    m_InsertStopwatch.Start();
                    QuadTree.Insert(new LayoutItem(rectangle));
                    m_InsertStopwatch.Stop();
                    return true;

                case "HasContent":
                    m_QueryStopwatch.Start();
                    var result = QuadTree.HasContent(rectangle);
                    m_QueryStopwatch.Stop();
                    return result;
            }

            Assert.Fail("Unknown method {0}", methodName);
            return false;
        }

and appropriate test case generator:

    public static class TestDataReader
    {
        public static IEnumerable TestCases
        {
            get
            {
                using (var logFile = File.OpenText("TestData.log"))
                {
                    while (!logFile.EndOfStream)
                    {
                        string line = logFile.ReadLine();
                        var parseResult = ParseTestCase(line);
                        yield return parseResult;
                    }
                }
            }
        }

        private static TestCaseData ParseTestCase(string line)
        {
         ...
         }
    }

Step 4

I have created two copies of QuadTree - the original one and one to be optimized. Also I made two test fixtures for each of them deriving from the base test fixture. Only difference was that they where working with instances of two different QuadTree implementations using the same interface.

    [TestFixture]
    public class ExhaustiveTestOriginal : ExhaustiveTestBase
    {
       
        [TestFixtureSetUp]
        public override void SetUp()
        {
            base.SetUp();
            this.QuadTree = new QuadTree<LayoutItem>(new RectangleF(new PointF(0, 0), new SizeF(2000, 2000)));
        }

        [TestFixtureTearDown]
        public override void TearDown()
        {
            base.TearDown();
        }
    }

Step 5

I was making optimizations I supposed to improve performance one by one, only one at a time and was able to verify the impact immediately. After several experiments I ended up with a solution which was almost two times faster then the original one.

Console output:

C:\>nunit-console.exe QuadTreeOptimization.dll /run=QuadTreeOptimization.Optimized /nodots
NUnit-Console version 2.6.0.11089

Selected test(s): QuadTreeOptimization.Optimized

Insert: 00:00:00.0049159
Query: 00:00:00.7418825
Total: 00:00:
00.7467984

Tests run: 123022, Errors: 0, Failures: 0, Inconclusive: 0, Time: 50,5165949 seconds
  Not run: 0, Invalid: 0, Ignored: 0, Skipped: 0


C:\>nunit-console.exe QuadTreeOptimization.dll /run=QuadTreeOptimization.Original /nodots
NUnit-Console version 2.6.0.11089

Selected test(s): QuadTreeOptimization.Original

Insert: 00:00:00.0087474
Query: 00:00:01.2876519
Total: 00:00:
01.2963993

Tests run: 123022, Errors: 0, Failures: 0, Inconclusive: 0, Time: 50,1728383 seconds
  Not run: 0, Invalid: 0, Ignored: 0, Skipped: 0

Here my optimization "changelog":
* Eliminating almost duplicate code.
* Replacing result accumulation in a List<> by yield return iterator.
* Replacing a List<> representing the node list with fixed length of 4 with an array.
* Replacing a List<> representing the node list with unlimited growing length with Stack<> - which is basically singly linked list.

It was fun having immediate verification after each refinement attempt. Several times it helped me to figure out very quickly that I was barking up the wrong tree.

Examples:
My first idea was to improve performance by balancing the tree. I started reading about sophisticated balancing strategies, until I found out that my QuadTree filled with REAL data is nearly perfectly balanced.
Then I used LinkedLists<> instead of lists everywhere which seemed to be intuitively more suitable. And again MEASURE-MODIFY-MEASURE-ROLLBACK. 


Interesting Findings

At the end I figured out that using of [TestCaseSource(typeof(TestDataReader), "TestCases")] attribute and organizing my series of assertions like each one as an independent test case was not the optimal solution.
1. Calling test method goes over reflection causing incredible performance penalty in overal execution time.
2. Test cases can only be executed all together, thus it's a recording and every next assertion assumes certain state left by all previous stimulus.
3. NUnit GUI was not able to handle more than several thousand test cases, because it was trying to generate the tree containing list of all test cases ending up with out of memory exception. That's why I primarily used NUnit console.
4. Time measured by NUnit is the overall time and not the "clean" test time. It is the sum of time needed for test case construction (in my case reading from file) + time needed for reflection invoke + test time itself. I was interested on performance measurement of method calls on my class, so I had to implement my own StopWatch around them.

At the end I put back optimized implementation in my original project and it worked perfectly at the first try.
The whole solution became if fact a throw away code and data. It was made to accomplish one very specific optimization mission - like Russian Soyuz spacecraft expendable launch system - less elegant, but "low risk of mission failure, a short time to launch and a relatively low cost." (Wikipeia about expendable launch systems.)

Posted on Wednesday, July 20, 2011 10:05 AM C# Language , Data Structures and Algorithms , Performance , Testability | Back to top

Copyright © George Mamaladze | Powered by: GeeksWithBlogs.net