Alois Kraus


  Home  |   Contact  |   Syndication    |   Login
  133 Posts | 8 Stories | 368 Comments | 162 Trackbacks


Article Categories


Post Categories


When you have a performance bug you usually look at your code why something takes longer than expected. In conjunction with a debugger this is a great approach for easy bugs. Things get much harder if the issue only surfaces sporadically at customer sites. Application logging which is always on helps you to see that there was a problem but since logging must not flood the hard disc you cannot log every method call with all parameters like IntelliTrace tries to do. The next level is application tracing which can generate enormous amounts of output is a good method to find production issues on customer sites. Now you see in your traces that something takes much longer than normally but you have no clue WHY it is taking so much longer. This is the time where system wide profiling with ETW (yes I keep coming back to it) is a nice way to drill deeper to solve otherwise impossible to solve bugs. Performance Counters are also a good approach to check if something strange is going on but then you still have (usually) no clue what the reason for the slowdown was. It is therefore the best to bring your application specific tracing and ETW in WPA together to find the slowdown in your application traces and then drill deeper in the other ETW profiling events.

As a rule of the thumb if you have carefully added tracing to your application you have also written the best trace viewer in the world to visualize your traces. No problem you can write an extractor tool to store your traces as text file in your own format which can be viewed by your good old trace viewer as usual (Tx is a nice library to extract events from an etl file). If you do want to stay within WPA you certainly want a better viewer than a plain list of your application traces. I use application traces to get an idea what the application was doing at a high level and then drill down to the details once I have found the slowdown in the traces. It would be cool to see in WPA your traces per thread with a duration timeline so I can see which methods take longest to execute. That allows a call stack like view where I can simply see in the graph which operations take most of the time.


The only issue with WPA is that in the graph there is no expand menu item (in the list view you can expand but there seems to be a limit how many items in the graph can be expanded)  and there is no possibility to easily filter for regions e.g. with a method execution time >0,1s. But there are workarounds. You need to bring up the advanced graph settings dialog



and to force the graph to expand all items you can define in the Expansion tab a query that matches all regions.


Then you get in the UI graph a nice per thread expansion of all called methods. Once you have visually found an interesting hot spot it is easy to drill deeper in the CPU Usage Precise graph to find unexpected waits for virtually anything.

You can check it out for yourself when you write your traces to ETW like with this little ETW tracer which depends on the EventSource Nuget package.

[EventSource(Name = "ETWTracer", Guid="950FD22D-086C-4D16-98A0-FCC2B8DE2B07")]
public sealed class Tracer : EventSource
    #region Singleton instance
    static public Tracer Log = new Tracer();

    [Event(1, Keywords = Keywords.InOut, Task = Tasks.InOut, Opcode = EventOpcode.Start)]
    public void MethodEnter(string Method)
    { WriteEvent(1, Method); }

    [Event(2, Keywords = Keywords.InOut, Level = EventLevel.Verbose, Task = Tasks.InOut, Opcode=EventOpcode.Stop)]
    public void MethodLeave(string Method, int DurationInus)
    { WriteEvent(2, Method, DurationInus); }

    [Event(3, Keywords = Keywords.Info, Task = Tasks.Info, Opcode = EventOpcode.Info)]
    public void Info(string Method, string Message)
    { WriteEvent(3, Method, Message); }
    #region Keywords / Tasks / Opcodes

    public class Keywords   // This is a bitvector
        public const EventKeywords InOut = (EventKeywords)0x0001;
        public const EventKeywords Info = (EventKeywords)0x0002;

    public class Tasks
        public const EventTask InOut = (EventTask)0x1;
        public const EventTask Info = (EventTask)0x2;


With this little class we can define a method/enter leave tracer with just a few lines

    class SimpleTracer : IDisposable
        string Method;
        Stopwatch Watch = Stopwatch.StartNew();
        public SimpleTracer(string method)
            Method = method;

        public void Info(string message)
            Tracer.Log.Info(Method, message);

        public void Dispose()
            Tracer.Log.MethodLeave(Method, (int)Watch.Elapsed.TotalMilliseconds * 1000);

and use it right away
       void Method1()
            using(new SimpleTracer("Method1"))
                using(new SimpleTracer("Method2"))
                    using(new SimpleTracer("Method3"))
                        using (new SimpleTracer("Method4"))

With PerfView you can enable collection your brand new ETW event provider by specifying the Guid for it. I still prefer the Guid and register them in the normal way with EventRegister for my own ETW provider since only PerfView and WPA knows how do the correct rundown based on the provider name by calculating its Guid by convention when the ETW provider class name is used with a * before the name.


Now it is time to load the generated data and load our own region of interest file. Go to Trace - Trace Properties and add your region file to it.


Then you get a new graph in the System Activity diagram list that you can add to your current analysis pane as usual.


You might be interested how the regions file does look like. Here it is:


<?xml version='1.0' encoding='utf-8' standalone='yes'?>
         <RegionRoot Guid="{65D8D484-90A9-4BA4-9247-51E7C159F000}" Name="" FriendlyName="">  <!-- This guid is simply a randum number -->
            <Region  Guid="{65D8D484-90A9-4BA4-9247-51E7C159F001}" Name="" FriendlyName="">  <!-- This guid is simply a randum number -->
                  <Event Provider="{950FD22D-086C-4D16-98A0-FCC2B8DE2B07}" Id="1" Version="0" /> <!-- Method Enter event. The guid is your ETW Provider guid. Event id and version must match! -->
                  <Event Provider="{950FD22D-086C-4D16-98A0-FCC2B8DE2B07}" Id="2" Version="0" /> <!-- Method Leave event. The guid is your ETW Provider guid. Event id and version must match! -->
                   <PayloadBased NameField="Method" />  <!-- The regions get the method names. It is a pity that no concatenation of more than one field is supported. -->
                    <Parent TID="true" /> <!-- Parent event must be from the same thread otherwise strange durations and correlations would be displayed as region -->
                    <Event TID="true">    <!-- The start stop events must originate from the same thread -->
                      <Payload FieldName="Method" TargetFieldName="Method" /> <!-- Region start/stop must come from the same method. 
                                                                                  You can use several payload rules here to match several conditions here. -->
                    <!-- Nesting does not really work although it would be nice feature to get a call stack like view 
                         The main problem is that the region matching algo takes not the nearest possible event as parent but the first matching one which 
                         is almost always wrong if you call yourself in a loop the nesting is awkward.
                         It would work if we would give each method enter leave a unique activity guid so we could nest based on matching activity guids
                         but no one is writing traces out in this way.
                     <SelfNest TID="true">

As you can see in the comments there are still quite a few things to be desired to make regions even more user friendly but they are a nice way to get a visual "call stack" like view with durations as bars for one or all threads. The call stack view works only if you have no recursive or duplicate method names which would make the SelfNest feature work correctly. If more than one parent region is possible the first one in time and not the nearest on the current thread is chosen which is unfortunate to get a more call stack like view. Zooming into the graph works well for the x axis but there is no way to zoom out on they y axis to see where the time is spent when many threads at the same time are working together.

Although still many things are left to be desired the region based trace view is much better than a plain trace method enter/leave view as you get it for Generic events:


This can be a killer feature if you have complex application tracing in your application and you want to combine your application know how with the system response to the actions of your application.

posted on Sunday, September 21, 2014 4:25 AM