Sunday 26 April 2015

Newtonsoft.Json wins!!

Newtonsoft.Json won a long time ago, and we should not be fighting over it anymore. I inherited an old code base that serialized to and deserialized from JSON using JavaScriptSerializer, a class that Microsoft included in the System.Web.Extensions assembly to provide built-in support for JSON. JavaScriptSerializer has not been enhanced since its launch, while Newtonsoft.Json has improved significantly in both features and performance. See the comparison sheet. That sheet alone should be enough to convince one and all to switch to Newtonsoft.Json. However, I thought of running my own test:


            // Requires: using System; using System.Diagnostics;
            // TypeA and TypeB (definitions not shown) each expose string Name and Value members.
            Console.WriteLine(Newtonsoft.Json.JsonConvert.SerializeObject(null)); // prints "null"
            var obj = new TypeA() { Name = "Name", Value = "Value" };
            var objB = new TypeB() { Name = "Name", Value = "Value" };

            // Time 100,000 serializations with JavaScriptSerializer.
            // Note: a new serializer is constructed on every iteration, so that cost is part of the measurement.
            Stopwatch watch = Stopwatch.StartNew();

            for (int i = 0; i < 100000; i++)
            {
                var javaScriptSerializer = new System.Web.Script.Serialization.JavaScriptSerializer();
                string jsonString = javaScriptSerializer.Serialize(i % 2 == 0 ? (object)obj : objB);
            }

            watch.Stop();
            Console.WriteLine(watch.ElapsedMilliseconds);

            // Time the same workload with Newtonsoft.Json.
            watch.Restart();

            for (int i = 0; i < 100000; i++)
            {
                string jsonString1 = Newtonsoft.Json.JsonConvert.SerializeObject(i % 2 == 0 ? (object)obj : objB);
            }

            watch.Stop();
            Console.WriteLine(watch.ElapsedMilliseconds);


Results are as expected: Newtonsoft.Json was 50% faster in this test. The absolute difference is only a matter of milliseconds, but once you add its support for rich CLR types into the mix, Newtonsoft.Json becomes the automatic choice.
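To make the "rich CLR types" point a little more concrete, here is a minimal sketch of a round trip through Newtonsoft.Json. The Order class and the RichTypeRoundTrip helper are hypothetical names invented for this illustration (they are not from the code base mentioned above); the only library calls used are JsonConvert.SerializeObject and JsonConvert.DeserializeObject<T>.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;

// Hypothetical type for illustration only; not part of the original code base.
public class Order
{
    public Guid Id { get; set; }
    public DateTime PlacedOn { get; set; }
    public TimeSpan Timeout { get; set; }
    public Dictionary<string, int> Quantities { get; set; }
}

public static class RichTypeRoundTrip
{
    // Serializes the order to a JSON string and reads it back into a new instance.
    public static Order RoundTrip(Order order)
    {
        string json = JsonConvert.SerializeObject(order);
        return JsonConvert.DeserializeObject<Order>(json);
    }
}
```

Calling RichTypeRoundTrip.RoundTrip on an Order whose Id, PlacedOn, Timeout and Quantities are populated should give back a copy with the same values, with no custom converters required.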

Monday 13 April 2015

Spell check on all resource files in solution

When you work on projects with large development teams, you run into situations where an issue that is trivial in isolation becomes a mammoth task once you add up everyone's mistakes :). I consider a team large when it has more than 6 developers. My current team has about 50 developers. (Don't smile. Sometimes that is what you get.)

A few examples: overdoing (sometimes irresponsibly!) the use of .NET resource files (*.resx), configuration files (Web.Config, App.Config), database tables for common settings, published content, etc. This leads to too much fragmentation in the code base. My current project has about 60 resource files, each with a good number of resource values, because every sub-team kept adding its own resource file instead of consolidating. That also meant that spelling errors in resource strings were spread across all those files, each with its own flavor of mistake: e.g. the word "plenty" was written as "planty" in one resource file and as "plent" in another. OK, so the task was to identify and fix these errors in a single sweep and ensure that the team uses a spell checker tool from then on.

I tried out some of the plugins available for Visual Studio (Link # 1, Link # 2), and all of them are pretty good at what they do, i.e. live spell checking of an open document in Visual Studio. None of those free tools has the option of scanning a whole code base and generating a report, though. There are some paid tools that do that (my suggestion is to buy one if you can). However, I thought I should write a program that solves the problem to some extent, if not fully.

I extended the code I wrote to consolidate all resource files so that it also performs a simple spell check using NHunspell. It has a very easy-to-use API and is quite powerful (the Hunspell engine behind it is used by OpenOffice). Since we use British English for our application, I had to download the dictionary separately from here. Once that was done, the coding part was straightforward: scan all resource files, read each value, split the value into words, spell check each word, and dump the report in XML format so that it can be opened in Excel and analyzed easily.

It isn't the complete implementation nor is it the most elegant one. However, it works for rainy days :)

// Requires the NHunspell package and: using System; using System.Data; using System.IO;
// using System.Linq; using System.Text; using System.Xml.Linq; using NHunspell;
static void Main(string[] args)
{
    using (var hunspell = new Hunspell(@"NHunspell\en_GB.aff", @"NHunspell\en_GB.dic"))
    {
        // Register project-specific words so they are not reported as errors.
        foreach (var line in File.ReadAllLines(@"NHunspell\CustomWords-en_GB.txt"))
        {
            hunspell.Add(line);
        }

        string dropLocation = @"D:\Dump\";

        DirectoryInfo dirInfo = new DirectoryInfo(@"D:\Projects\Application\");
        var resourceFiles = dirInfo.EnumerateFiles("*.resx", SearchOption.AllDirectories);

        // One report row per resource entry.
        DataTable dt = new DataTable("resources");
        dt.Columns.Add(new DataColumn("FilePath", typeof(string)));
        dt.Columns.Add(new DataColumn("Key", typeof(string)));
        dt.Columns.Add(new DataColumn("Value", typeof(string)));
        dt.Columns.Add(new DataColumn("HasSpellingErrors", typeof(string)));

        foreach (var resourceFile in resourceFiles)
        {
            XDocument xDoc = XDocument.Load(resourceFile.FullName);
            foreach (var x in xDoc.Descendants("data"))
            {
                string key = (string)x.Attribute("name");
                string value = (string)x.Element("value") ?? string.Empty;

                string[] words = value.Trim().Split().Distinct().ToArray(); // split on whitespace
                if (words.Length == 1) continue; // if single word, ignore it

                StringBuilder incorrectWords = new StringBuilder();
                foreach (string word in words)
                {
                    if (IgnoreWord(word)) continue; // conditions under which word is ignored
                    string finalWord = word.Replace("\"", string.Empty);
                    // Break up HTML-like fragments (tags, attributes) into candidate words.
                    var finalWords = finalWord.Split('>', '<', '\'', '=', '/', ':');
                    foreach (var y in finalWords)
                    {
                        if (IgnoreWord(y)) continue;

                        if (!hunspell.Spell(y))
                        {
                            incorrectWords.Append(y + "   ");
                        }
                    }
                }

                var row = dt.NewRow();
                row.ItemArray = new object[4] { resourceFile.FullName, key, value, incorrectWords.ToString() };
                dt.Rows.Add(row);
            }
        }

        // Write the report as XML so that it can be opened in Excel.
        dt.WriteXml(dropLocation + "excel.xml");
    }
    Console.ReadLine();
}


private static bool IgnoreWord(string y)
{
    // Ignore null or empty strings, strings that look like a URL (www.*.com), and all-caps strings.
    return string.IsNullOrEmpty(y)
        || (y.StartsWith("www.") && y.EndsWith(".com"))
        || y.All(c => Char.IsUpper(c));
}
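
The word-splitting step above is the roughest part of the program. As one possible alternative, and only a sketch, the same step could be done with regular expressions: strip anything that looks like an HTML tag first, then pull out runs of letters (allowing an apostrophe inside a word). ResourceWords and Extract are hypothetical names used for illustration and are not part of the original program; only System.Text.RegularExpressions and LINQ are used.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; a regex-based alternative to the Split calls.
public static class ResourceWords
{
    // Anything between '<' and '>' is treated as markup and dropped.
    private static readonly Regex Tag = new Regex(@"<[^>]*>");

    // A word is a run of ASCII letters, optionally joined by apostrophes (e.g. "don't").
    private static readonly Regex Word = new Regex(@"[A-Za-z]+(?:'[A-Za-z]+)*");

    // Returns the distinct (case-sensitive) words found in a resource value.
    public static List<string> Extract(string value)
    {
        string text = Tag.Replace(value ?? string.Empty, " ");
        return Word.Matches(text)
                   .Cast<Match>()
                   .Select(m => m.Value)
                   .Distinct()
                   .ToList();
    }
}
```

Each returned word could then be passed through IgnoreWord and hunspell.Spell as before. Note that this version only recognises ASCII letters, so it would need adjusting for resource values with accented characters.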