When you work on projects with large development teams, you run into situations where an issue that is trivial in isolation becomes a mammoth task once you add up everyone's mistakes :). I consider a team large when it has more than 6 developers. My current team has about 50. (Don't smile. Sometimes that is what you get.)
A few examples are overusing (sometimes irresponsibly!) .NET resource files (*.resx), configuration files (Web.config, App.config), database tables for common settings, published content, etc. This fragments the code base. My current project has about 60 resource files, each with a good number of resource values: every sub-team kept adding its own resource file instead of consolidating, and this is where it led. It also meant that spelling errors in resource strings were spread across all those files, each with its own flavor of mistake, e.g. the word "plenty" was written as "planty" in one resource file and as "plent" in another. So the task was to minimize such errors by finding and fixing them in a single sweep, and to make sure the team uses a spell-checker tool from then on.
I tried out some of the plugins available for Visual Studio (Link # 1, Link # 2), and they are all pretty good at what they do, i.e. live spell checking and verifying an open document in Visual Studio. None of those free tools has the option of scanning a whole code base and generating a report, though. There are some paid tools that do that (my suggestion is to buy one if you can). However, I thought I should write a program that solves the problem to some extent, if not fully.
I extended the code I wrote to consolidate all the resource files so that it also performs a simple spell check using NHunspell. It has a very easy-to-use API and is quite powerful (it wraps Hunspell, the spell checker used by OpenOffice). Since we use British English for our application, I had to download the dictionary separately from here. Once that was done, the coding part was straightforward: scan all resource files, read each value, split the value to find legal words, spell-check each word, and dump the report in XML format so that it can be opened in Excel and analyzed easily.
It isn't the complete implementation nor is it the most elegant one. However, it works for rainy days :)
using System;
using System.Data;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml.Linq;
using NHunspell;

class Program
{
    static void Main()
    {
        using (var hunspell = new Hunspell(@"NHunspell\en_GB.aff", @"NHunspell\en_GB.dic"))
        {
            // Teach the checker our project-specific words (one word per line).
            string[] lines = File.ReadAllLines(@"NHunspell\CustomWords-en_GB.txt");
            foreach (var line in lines)
            {
                hunspell.Add(line);
            }

            string dropLocation = @"D:\Dump\";
            DirectoryInfo dirInfo = new DirectoryInfo(@"D:\Projects\Application\");
            var resourceFiles = dirInfo.EnumerateFiles("*.resx", SearchOption.AllDirectories);

            // One row per resource entry; HasSpellingErrors lists the suspect words.
            DataTable dt = new DataTable("resources");
            dt.Columns.Add(new DataColumn("FilePath", typeof(string)));
            dt.Columns.Add(new DataColumn("Key", typeof(string)));
            dt.Columns.Add(new DataColumn("Value", typeof(string)));
            dt.Columns.Add(new DataColumn("HasSpellingErrors", typeof(string)));

            foreach (var resourceFile in resourceFiles)
            {
                XDocument xDoc = XDocument.Load(resourceFile.FullName);
                foreach (var x in xDoc.Descendants("data"))
                {
                    // A <data> entry carries its key in "name" and its text in <value>.
                    var valueElement = x.Element("value");
                    if (valueElement == null) continue;
                    string value = valueElement.Value;

                    string[] words = value.Trim().Split().Distinct().ToArray(); // Find words
                    if (words.Length == 1) continue; // if single word, ignore it

                    StringBuilder incorrectWords = new StringBuilder();
                    foreach (string word in words)
                    {
                        if (IgnoreWord(word)) continue; // conditions under which word is ignored.
                        string finalWord = word.Replace("\"", string.Empty);
                        var finalWords = finalWord.Split('>', '<', '\'', '=', '/', ':'); // html?
                        foreach (var y in finalWords)
                        {
                            if (IgnoreWord(y)) continue;
                            if (!hunspell.Spell(y))
                            {
                                incorrectWords.Append(y + " ");
                            }
                        }
                    }

                    var row = dt.NewRow();
                    row.ItemArray = new object[]
                    {
                        resourceFile.FullName,
                        (string)x.Attribute("name"),
                        value,
                        incorrectWords.ToString()
                    };
                    dt.Rows.Add(row);
                }
            }

            dt.WriteXml(Path.Combine(dropLocation, "excel.xml"));
        }
        Console.ReadLine();
    }

    private static bool IgnoreWord(string y)
    {
        // ignore empty or null string, string that represents a URL or is all CAPS string
        return string.IsNullOrEmpty(y)
            || (y.StartsWith("www.") && y.EndsWith(".com"))
            || y.All(char.IsUpper);
    }
}
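
The split-and-filter step is the part most worth testing on its own, so here is a minimal sketch of it pulled out into a helper that does not depend on NHunspell. The names `ResourceSpellCheck`, `ShouldIgnore` and `FindMisspelt` are hypothetical (they are not in the code above), and trimming trailing sentence punctuation is my assumption, not something the original program does. The spell check itself is passed in as a delegate, so a real run could hand over `hunspell.Spell`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ResourceSpellCheck
{
    // Same markup separators as the listing above, plus the double quote.
    private static readonly char[] MarkupSeparators = { '>', '<', '\'', '"', '=', '/', ':' };

    // Assumption: strip sentence punctuation so "time." is checked as "time".
    private static readonly char[] Punctuation = { '.', ',', ';', '!', '?', '(', ')' };

    // Same rules as IgnoreWord: empty, looks like a www...com URL, or all CAPS.
    public static bool ShouldIgnore(string token)
    {
        return string.IsNullOrEmpty(token)
            || (token.StartsWith("www.") && token.EndsWith(".com"))
            || token.All(char.IsUpper);
    }

    // Returns the distinct words in 'value' that 'isCorrect' rejects.
    public static List<string> FindMisspelt(string value, Func<string, bool> isCorrect)
    {
        var misspelt = new List<string>();

        // A null separator array means "split on any whitespace".
        var tokens = value.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Distinct();
        foreach (var token in tokens)
        {
            // Check the whole token first so URLs are skipped before being split.
            if (ShouldIgnore(token)) continue;

            foreach (var part in token.Split(MarkupSeparators, StringSplitOptions.RemoveEmptyEntries))
            {
                string word = part.Trim(Punctuation);
                if (ShouldIgnore(word)) continue;
                if (!isCorrect(word) && !misspelt.Contains(word))
                {
                    misspelt.Add(word);
                }
            }
        }
        return misspelt;
    }
}
```

With this in place, the inner loops of the program could shrink to something like `string.Join(" ", ResourceSpellCheck.FindMisspelt(value, hunspell.Spell))`, and the tokenizing rules can be exercised with a plain word list instead of a dictionary file.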