Sunday, 29 March 2015

XML Serialization - Performance issue and Memory Leak

There are well-documented issues with the XmlSerializer class that can cause performance problems along with memory leaks. In case you have not read about them, try these: Link #1, Link #2, Link #3. The issue is related to the constructors that let you specify a base type and a list of derived types. Only the XmlSerializer(Type) and XmlSerializer(Type, String) constructors reuse the assembly they generate; the other constructors generate a new one on every call. We had a requirement to store and retrieve data for classes that share a common base class, e.g.

    public class BaseLog
    {
        public string Number0 { get; set; }
    }

    public class Log : BaseLog
    {
        public string Number { get; set; }
    }

    public class Log1 : BaseLog
    {
        public string Number1 { get; set; }
    }

    public class Log2 : BaseLog
    {
        public string Number2 { get; set; }
    }

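To deserialize the XML into the appropriate type, we wrote a helper method:

    // Requires: using System; using System.IO; using System.Xml; using System.Xml.Serialization;
    public static TActualType Deserialize<TBaseType, TActualType>(string xml, Type[] derivedTypes)
    {
        TextReader reader = null;
        try
        {
            reader = new StringReader(xml);

            using (var xmlReader = XmlReader.Create(reader))
            {
                // Hand the reader over to xmlReader so the finally block below skips it.
                reader = null;

                // A new XmlSerializer is constructed on every call.
                XmlSerializer xmlserializer = new XmlSerializer(typeof(TBaseType), derivedTypes);
                return (TActualType)xmlserializer.Deserialize(xmlReader);
            }
        }
        finally
        {
            if (reader != null)
            {
                reader.Dispose();
            }
        }
    }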

The line that constructs the XmlSerializer (highlighted in the original post) started to show signs of a memory leak and slow performance when we started to load test our application. I wrote a simple console application that runs the serialization and deserialization operations 100 times and ran it with Visual Studio's memory profiler. The results were scary :)

    int limit = 100;
    Stopwatch stopWatch = new Stopwatch();

    stopWatch.Restart();
    Console.WriteLine("Xml Serializer : No Cache");

    // Run the serialize/deserialize round trip exactly `limit` times.
    for (int counter = 0; counter < limit; counter++)
    {
        SerializeTest2(); // calls serialize/deserialize
    }

    stopWatch.Stop();
    Console.WriteLine(stopWatch.ElapsedMilliseconds);

private static void SerializeTest2()
        {
            Log a = new Log();
            a.Number0 = "PQR";
            a.Number = "ABC";
            var result = XmlSerializationHelper2.SerializeToXml(a);

            Log1 b = new Log1();
            b.Number0 = "PQR";
            b.Number1 = "ABC";
            var result1 = XmlSerializationHelper2.SerializeToXml(b);

            Log2 c = new Log2();
            c.Number0 = "PQR";
            c.Number2 = "ABC";
            var result2 = XmlSerializationHelper2.SerializeToXml(c);

            BaseLog d = new BaseLog();
            d.Number0 = "PQR";
            var result3 = XmlSerializationHelper2.SerializeToXml(d);

            BaseLog results = XmlSerializationHelper2.Deserialize(result, new Type[] { typeof(Log), typeof(Log1), typeof(Log2) });

            BaseLog results1 = XmlSerializationHelper2.Deserialize(result1, new Type[] { typeof(Log), typeof(Log1), typeof(Log2) });

            BaseLog results2 = XmlSerializationHelper2.Deserialize(result2, new Type[] { typeof(Log), typeof(Log1), typeof(Log2) });

            BaseLog resultss = XmlSerializationHelper2.Deserialize(result);

            BaseLog resultss1 = XmlSerializationHelper2.Deserialize(result1);

            BaseLog resultss2 = XmlSerializationHelper2.Deserialize(result2);

            BaseLog results3 = XmlSerializationHelper2.Deserialize(result3);

        }

It took ~6 seconds to run 100 loops, and memory usage kept increasing with each iteration. The reason for this is documented (and I repeat): dynamic assembly generation. Each XmlSerializer created this way generates a new assembly, and assemblies cannot be unloaded unless you unload the AppDomain, so the allocated memory keeps growing. It was so bad that our application would throw an "Out of Memory" exception whenever it had to deal with a large number of records (read: in excess of 1000 records). Below is the graph:

[Graph: memory usage increasing with each iteration of the loop]
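To see the dynamic assembly generation for yourself, a rough sketch like the one below counts the assemblies loaded in the current AppDomain before and after constructing a batch of serializers. The AssemblyLeakDemo class and its CountAssembliesAdded method are names made up for this illustration, and the two small classes are trimmed copies of the ones above. The count also includes any framework assemblies that get loaded lazily along the way, so treat the number as an indication rather than an exact measurement.

```csharp
using System;
using System.Xml.Serialization;

public class BaseLog { public string Number0 { get; set; } }
public class Log : BaseLog { public string Number { get; set; } }

// Hypothetical helper for illustration only.
public static class AssemblyLeakDemo
{
    // Returns how many more assemblies are loaded in the current AppDomain
    // after constructing `count` serializers with the (Type, Type[]) constructor.
    public static int CountAssembliesAdded(int count)
    {
        int before = AppDomain.CurrentDomain.GetAssemblies().Length;

        for (int i = 0; i < count; i++)
        {
            // Same constructor overload as in the Deserialize helper; nothing is cached here.
            new XmlSerializer(typeof(BaseLog), new[] { typeof(Log) });
        }

        return AppDomain.CurrentDomain.GetAssemblies().Length - before;
    }
}
```

Calling AssemblyLeakDemo.CountAssembliesAdded(100) from a console application and printing the result should, on .NET Framework, give a number that grows with the argument. I have not measured this on other runtimes, so the behaviour there is an assumption.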




Solutions:
1. Cache the XmlSerializer instances. I added some pretty simple caching logic, like below:

    // Requires: using System; using System.Collections.Generic; using System.Text; using System.Xml.Serialization;
    private static readonly Dictionary<string, XmlSerializer> xmlSerializerDictionary = new Dictionary<string, XmlSerializer>();

    private static readonly object syncRoot = new object();

    // Builds the cache key from the base type name and the derived type names (order-sensitive).
    private static string GetKey(Type type, Type[] derivedTypes)
    {
        StringBuilder builder = new StringBuilder();
        if (derivedTypes != null)
        {
            foreach (var derivedType in derivedTypes)
            {
                builder.AppendFormat("{0},", derivedType.FullName);
            }
        }

        return string.Format("{0}_{1}", type == null ? string.Empty : type.FullName, builder.ToString());
    }

    private static XmlSerializer GetSerializer(Type baseType, Type[] derivedTypes)
    {
        string key = GetKey(baseType, derivedTypes);

        XmlSerializer xmlserializer;
        lock (syncRoot)
        {
            // Create the serializer only on the first request for this key.
            if (!xmlSerializerDictionary.TryGetValue(key, out xmlserializer))
            {
                xmlserializer = new XmlSerializer(baseType, derivedTypes);
                xmlSerializerDictionary.Add(key, xmlserializer);
            }
        }

        return xmlserializer;
    }

After the change, the same program took ~120 milliseconds :O Memory usage stayed flat, and total memory usage came down significantly.

[Graph: flat memory usage after caching the XmlSerializer instances]
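As a variation on the cache above, here is a minimal sketch that swaps the lock and Dictionary for a ConcurrentDictionary holding Lazy values. This is not what we shipped; the XmlSerializerCache class and its Get method are names invented for this example, and the sample classes are trimmed copies of the ones from earlier. The Lazy wrapper is there because GetOrAdd may run its factory more than once under contention, and we want each expensive serializer to be built only once. The key also sorts the derived type names, so the order in which they are passed does not create duplicate entries.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Xml.Serialization;

public class BaseLog { public string Number0 { get; set; } }
public class Log : BaseLog { public string Number { get; set; } }
public class Log1 : BaseLog { public string Number1 { get; set; } }

// Hypothetical cache for illustration only.
public static class XmlSerializerCache
{
    private static readonly ConcurrentDictionary<string, Lazy<XmlSerializer>> Cache =
        new ConcurrentDictionary<string, Lazy<XmlSerializer>>();

    public static XmlSerializer Get(Type baseType, params Type[] derivedTypes)
    {
        if (baseType == null) throw new ArgumentNullException(nameof(baseType));
        derivedTypes = derivedTypes ?? Type.EmptyTypes;

        // Sort the names so { Log, Log1 } and { Log1, Log } map to the same entry.
        string key = baseType.AssemblyQualifiedName + "|" +
            string.Join(";", derivedTypes
                .Select(t => t.AssemblyQualifiedName)
                .OrderBy(name => name, StringComparer.Ordinal));

        // Only the Lazy that ends up stored in the dictionary has its Value read,
        // so the XmlSerializer constructor runs once per key.
        return Cache.GetOrAdd(
            key,
            _ => new Lazy<XmlSerializer>(() => new XmlSerializer(baseType, derivedTypes))).Value;
    }
}
```

With this in place, XmlSerializerCache.Get(typeof(BaseLog), typeof(Log), typeof(Log1)) hands back the same XmlSerializer instance on every call, whichever order the derived types are listed in.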



2. Forget about XmlSerializer if you can :). There are cases where JSON serialization is just as handy, and its performance is comparable to other dependable implementations. You can use Newtonsoft.Json with a simple implementation that lets you serialize and deserialize both base and derived classes.

    // Requires: using Newtonsoft.Json;
    public static class JsonSerializationHelper
    {
        public static string Serialize<T>(T obj)
        {
            // TypeNameHandling.All embeds type names in the JSON so derived types survive the round trip.
            return JsonConvert.SerializeObject(obj, typeof(T), new JsonSerializerSettings { TypeNameHandling = TypeNameHandling.All });
        }

        public static T Deserialize<T>(string content)
        {
            return JsonConvert.DeserializeObject<T>(content, new JsonSerializerSettings { TypeNameHandling = TypeNameHandling.All });
        }
    }

Running the same test against this implementation produces good results: it took ~250 milliseconds.

