Friday 13 December 2019

So when does a Task start?

This question can trip you up if you don't pay attention. So, I wrote a simple piece of code to give clarity to the uninitiated.

private static async Task<int> WriteWithDelay()
{
    Console.WriteLine($"Write before delay in thread id : {Thread.CurrentThread.ManagedThreadId} at {DateTime.UtcNow.Ticks}");
    await Task.Delay(10000).ConfigureAwait(false);
    Console.WriteLine($"Write after delay in thread id : {Thread.CurrentThread.ManagedThreadId} at {DateTime.UtcNow.Ticks}");
    return Thread.CurrentThread.ManagedThreadId;
}

static async Task Main(string[] args)
{
    Task[] tasks = new Task[10];
    for (int i = 0; i < tasks.Length; i++)
    {
        tasks[i] = WriteWithDelay();
    }

    Console.WriteLine($"Before when all at {DateTime.UtcNow.Ticks}");
    await Task.WhenAll(tasks);

    Console.WriteLine($"After when all at {DateTime.UtcNow.Ticks}");
}

Well, check the output and the order in which things happen becomes clear: each task starts the moment WriteWithDelay() is called, and the method body runs synchronously on the calling thread (thread 1) until it hits the first await. Only the continuations after the delay run later, on thread pool threads.

Write before delay in thread id : 1 at 637118566941927002
Write before delay in thread id : 1 at 637118566942107032
Write before delay in thread id : 1 at 637118566942114110
Write before delay in thread id : 1 at 637118566942117029
Write before delay in thread id : 1 at 637118566942117029
Write before delay in thread id : 1 at 637118566942117029
Write before delay in thread id : 1 at 637118566942117029
Write before delay in thread id : 1 at 637118566942117029
Write before delay in thread id : 1 at 637118566942127007
Write before delay in thread id : 1 at 637118566942127007
Before when all at 637118566942127007
Write after delay in thread id : 4 at 637118567043690656
Write after delay in thread id : 11 at 637118567043690656
Write after delay in thread id : 5 at 637118567043690656
Write after delay in thread id : 7 at 637118567043690656
Write after delay in thread id : 13 at 637118567043690656
Write after delay in thread id : 9 at 637118567043690656
Write after delay in thread id : 12 at 637118567043690656
Write after delay in thread id : 8 at 637118567043690656
Write after delay in thread id : 6 at 637118567043690656
Write after delay in thread id : 10 at 637118567043700652
After when all at 637118567043871008

Stateful vs Stateless

So, this has been a question that I have faced multiple times over the years. Should you write services that are stateful, or should you go for a design that is stateless?

Disclaimer: All opinions expressed below are personal and point-in-time opinions. Software evolves very fast and new patterns emerge every year, so do not quote me :).

By stateful service, I mean a service that keeps data as part of the service itself. There are multiple options that allow us to implement stateful services, e.g. Service Fabric stateful services.

Let us dive into the pros and cons.


Stateful Services

Pros:

1. Keeps data close to where it is needed the most and minimizes network latency. (*there will still be limitations on the size and type of data that it can keep; you will eventually need a persistent backing store anyway.)
2. Can potentially improve the performance of the service.
3. The design invariably leads to partitioning the data as well, which increases the scale-out options.

Cons:

1. Enforces data affinity, and special considerations need to be put in place if you want to go for active-active sites. The same restriction applies to scale-out scenarios (think consistent hashing; see the sketch after this list).
2. Disaster recovery needs to be carefully planned, as there is a level of coupling between the compute plane and the data plane.
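To illustrate the scale-out point, here is a bare-bones consistent hashing sketch. It is purely illustrative (real implementations add virtual nodes and a stronger hash, and the class and method names are my own placeholders), but it shows why only the keys on the affected arc of the ring have to move when a node is added or removed.

// Requires System.Collections.Generic and System.Linq.
public class ConsistentHashRing
{
    // Sorted map of node-hash -> node-name forming the ring.
    private readonly SortedDictionary<uint, string> ring = new SortedDictionary<uint, string>();

    public void AddNode(string node) => ring[Hash(node)] = node;

    public void RemoveNode(string node) => ring.Remove(Hash(node));

    // Pick the first node clockwise from the key's hash (wrapping around to the start).
    public string GetNode(string key)
    {
        var h = Hash(key);
        foreach (var entry in ring)
        {
            if (entry.Key >= h) return entry.Value;
        }
        return ring.First().Value;
    }

    // Simple FNV-1a hash, good enough for a sketch.
    private static uint Hash(string s)
    {
        uint hash = 2166136261;
        foreach (var c in s)
        {
            hash ^= c;
            hash *= 16777619;
        }
        return hash;
    }
}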

Stateless Services

Pros:

1. Infrastructure (e.g. compute instances) can be scaled without much concern because there is no local data affinity.
2. Data is separated from compute, which keeps the conceptual/physical separation consistent - at least from a visualization perspective.
3. Easier to set up active-active sites, as long as the backend can hold up.
4. Disaster recovery for the compute plane is independent of disaster recovery of the data plane.

Cons:

1. Every data access is a remote call, so you give up the latency advantage of keeping data close to the compute, and the backing store has to hold up and scale as the service scales out.


References - link # 1 link # 2


Wednesday 20 November 2019

Azure Durable Functions vs everything else

If you were to develop and run complex workflows in the cloud, you would probably need to depend on industry-leading products like K2, or build your own workflows using .NET Workflow Foundation (which has its own drawbacks, like not being available on .NET Core) or the Durable Task Framework.

While the above-mentioned options are very much possible, if you don't have a requirement to run millions of workflows per day, you should probably evaluate Azure Durable Functions as well before you decide to go all-in.

Azure Durable Functions are a great way to instantiate and manage complex event-driven workflows. The workflows can follow many of the well-known workflow patterns like "fan out-fan in", "human intervention", "workflow chaining" etc. 
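To make the patterns concrete, here is a minimal sketch of "fan out-fan in" with Durable Functions 2.x. The function names ("FanOutOrchestrator", "ProcessItem") and the work items are my own placeholders, and it assumes the Microsoft.Azure.WebJobs, Microsoft.Azure.WebJobs.Extensions.DurableTask, System.Linq and System.Threading.Tasks namespaces.

[FunctionName("FanOutOrchestrator")]
public static async Task<int> RunOrchestrator(
    [OrchestrationTrigger] IDurableOrchestrationContext context)
{
    var items = new[] { "a", "b", "c" };    // placeholder work items

    // Fan out: schedule one activity per item; they run in parallel.
    var tasks = items.Select(item => context.CallActivityAsync<int>("ProcessItem", item));

    // Fan in: wait for all activities and aggregate the results.
    var results = await Task.WhenAll(tasks);
    return results.Sum();
}

[FunctionName("ProcessItem")]
public static int ProcessItem([ActivityTrigger] string item) => item.Length;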

There is great content available over the web to go through and build your workflows using ADF (Azure Durable Functions), e.g. this talk by Jeremy Likness.

There are a couple of factors to keep in mind before you get swayed either way.


  1. Take a good look at your input requirements and identify a trigger that works well with them. If your workflows are triggered via HTTP requests, you might evaluate HTTP triggers vs your own .NET Core web app (which can internally queue messages, so that the durable functions are triggered using queue triggers; see the sketch after this list).
  2. Ensure that you have chosen the right partition count and are using auto-scale appropriately. Docs
  3. ADF in its current avatar utilizes Azure Storage for instance management, and heavy load can lead to potential throttling and slowness. If you have extremely high load and low-latency requirements, perhaps it is a good idea to use an industry-leading product or to utilize the Durable Task Framework directly. DTF is extremely powerful and flexible, and you can add your own modifications via its extensible state providers or by modifying the original source code :). I believe this is an area of improvement for ADF.
  4. ADF does have a way to version orchestrations, but you are better off using a side-by-side deployment model instead of adding backward-compatible logic in your functions (and stopping and restarting all in-flight instances).
  5. ADF's disaster recovery options are pretty much covered via the disaster recovery options of the underlying App Service and Azure storage account.
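On point # 1, a minimal sketch of the queue-trigger route could look like the following; the queue name is a placeholder and it reuses the hypothetical "FanOutOrchestrator" from the earlier sketch.

[FunctionName("QueueStarter")]
public static async Task Run(
    [QueueTrigger("workflow-requests")] string message,
    [DurableClient] IDurableOrchestrationClient starter,
    ILogger log)
{
    // The web app drops a message on the queue; this client function starts one orchestration per message.
    string instanceId = await starter.StartNewAsync("FanOutOrchestrator", null, message);
    log.LogInformation($"Started orchestration with id {instanceId}");
}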


Happy decision making!!


Thursday 17 October 2019

Don't be afraid of the in-memory Cache

For a very long time, I had this doubt in my mind whether an "in-memory" cache is a justified solution when the number of items to be cached is quite high. I guess I had my own reasons for this, primarily because I had never tried to put so much data in such a process-level cache.

Of course, if it were 2005, we might still be using 32-bit hardware, and memory constraints would be just as important as optimized processing logic. But now that memory is cheap, we can be assured that it is OK to trade off memory for better performance.

So, I ran an experiment.

// Requires System.Runtime.Caching and System.Diagnostics.
var memoryCache = MemoryCache.Default;
Console.WriteLine(memoryCache.GetCount());
Console.WriteLine(Process.GetCurrentProcess().PrivateMemorySize64);

Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i < 1000 * 1000; i++)
{
    memoryCache.Add("key" + i.ToString(), Tuple.Create("key" + i.ToString(), true), new CacheItemPolicy() { AbsoluteExpiration = DateTime.UtcNow.AddDays(1) });
}
sw.Stop();
Console.WriteLine($"Time taken to write 1M values to cache {sw.ElapsedMilliseconds}");

Console.WriteLine(memoryCache.GetCount());
Console.WriteLine(Process.GetCurrentProcess().PrivateMemorySize64);

// Re-fetch the default cache and read everything back.
memoryCache = MemoryCache.Default;
Console.WriteLine(memoryCache.GetCount());
sw.Restart();
for (int i = 0; i < 1000 * 1000; i++)
{
    var val = memoryCache.Get("key" + i.ToString()) as Tuple<string, bool>;
    if (val == null)
    {
        Console.WriteLine("Found null");
    }
}
sw.Stop();

Console.WriteLine($"Time taken to read 1M values to cache {sw.ElapsedMilliseconds}");

I wanted to check the impact on memory usage and whether there is any apparent issue with keeping 1M records in an "in-memory" cache.

Well, there were no apparent issues.
0
19460096
Time taken to write 1M values to cache 4167
1000000
757383168
1000000
Time taken to read 1M values to cache 676

As you can see, there is not much delay in writing or reading so many records. The memory usage definitely jumps (from roughly 19 MB to about 757 MB of private memory for 1M small entries), but I guess we planned to use more memory anyway.

Tuesday 8 October 2019

Azure Storage Blob : Policies

This has been a feature missing from Azure Storage Blob capabilities - adding policies that perform actions once certain conditions are met, e.g. deleting a blob item after 1 month.

There are typical use cases for it. I can give you one example.

Let's imagine an API that needs to asynchronously process the messages it receives. A standard approach is to put the message in blob storage and leave a message in a queue with the address of the blob item, so that you can process the item later on.

The application's queue processor service can read the message, download the blob item and process the request. However, it also needs some "maintenance" code to delete the blob item so that you don't keep storing the message unnecessarily. This is code you could have avoided if the platform had a feature for it :).

PS: if the message is smaller than 64 KB then you can put it directly in the queue too, but we digress.
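For reference, a minimal sketch of the write side of that pattern with the classic WindowsAzure.Storage client might look like this; the container, queue and blob names are hypothetical, and connectionString/payload are assumed to exist. The lifecycle rule shown later in this post targets exactly this kind of temporary blob via its prefix.

// Store the (potentially large) payload as a blob...
var account = CloudStorageAccount.Parse(connectionString);
var blobClient = account.CreateCloudBlobClient();
var container = blobClient.GetContainerReference("queueinputblob");
var blob = container.GetBlockBlobReference($"messages/{Guid.NewGuid()}");
await blob.UploadTextAsync(payload);

// ...and leave a small queue message that only carries the blob's address.
var queueClient = account.CreateCloudQueueClient();
var queue = queueClient.GetQueueReference("incoming-requests");
await queue.AddMessageAsync(new CloudQueueMessage(blob.Uri.ToString()));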


Well, Microsoft Azure Storage has added support for policies for handling typical actions (like the one I mentioned) for Azure Blob. Link

Quite nifty feature.

I can define a rule like below to delete items after 1 month.

{
  "rules": [
    {
      "name": "deleteTemporaryData",
      "enabled": true,
      "type": "Lifecycle",
      "definition": {
        "filters": {
          "blobTypes": [ "blockBlob" ],
          "prefixMatch": [ "queueinputblob/messages" ]
        },
        "actions": {
          "baseBlob": {
            "delete": { "daysAfterModificationGreaterThan": 30 }
          }
        }
      }
    }
  ]
}


Monday 12 August 2019

nuget package wisdom - Cosmos DB Table API

I have been working in the .NET ecosystem for a long time, and managing correct assembly references in a large/distributed code base (i.e. one managed via multiple git repos) is indeed a hard task. More so when VS lets you pick a few "system" references from "Reference Assemblies" and the remaining ones from the "packages" location downloaded via NuGet feeds.

However, developers find a way to manage it in one way or another.

Once you achieve that, you are faced with two other hard tasks.

1. Maintaining the same nuget package version across solutions.
2. Ensuring that referenced nuget packages are the latest stable versions.

I will focus more on # 2, as I recently found out that many times you miss out on the goodness of new APIs / better performance if you do not upgrade.

Case in point:

I have a solution that utilizes the Azure Cosmos DB Table API as the storage technology. The code to read/write to the store was written more than a year ago and was using a NuGet package named "Microsoft.Azure.CosmosDB.Table".

The solution was using version 1.*.*. The average response time of the API for writing to the Cosmos table was ~180 ms, which is actually quite decent.

However, when I upgraded the solution to use version 2.1.1 (the latest), the average response time dropped to ~70 ms, which is even better.

The gain isn't much when you look at it from the perspective of absolute numbers. However, it adds up to a lot if your system is making hundreds of such calls.

Learning: keep updating to the newest version at regular intervals :)

Sunday 19 May 2019

Be careful with MemoryCache

Imagine a .NET code block like below:


var memCache = new System.Runtime.Caching.MemoryCache("Cache1");
memCache.Add("Key1", "Val1", new CacheItemPolicy() { AbsoluteExpiration = DateTime.UtcNow.AddDays(1) });
Console.WriteLine(memCache["Key1"]);

var memCache2 = new System.Runtime.Caching.MemoryCache("Cache1");
Console.WriteLine(memCache2["Key1"]);
Console.WriteLine(memCache["Key1"]);

What do you expect in output?

Well, as opposed to what some might expect, the 2nd line printed on the console is null.

In fact, if you extend the code as below, you will come to realize that every time you new up a MemoryCache instance, even with the same name within the same process, it is a different cache.

var memCache = new System.Runtime.Caching.MemoryCache("Cache1");
memCache.Add("Key1", "Val1", new CacheItemPolicy() { AbsoluteExpiration = DateTime.UtcNow.AddDays(1) });
Console.WriteLine(memCache["Key1"]);     // Val1

var memCache2 = new System.Runtime.Caching.MemoryCache("Cache1");
Console.WriteLine(memCache2["Key1"]);    // null - a different cache despite the same name
Console.WriteLine(memCache["Key1"]);     // Val1

memCache2.Add("Key1", "Val2", new CacheItemPolicy() { AbsoluteExpiration = DateTime.UtcNow.AddDays(1) });
Console.WriteLine(memCache2["Key1"]);    // Val2
Console.WriteLine(memCache["Key1"]);     // Val1 - the two caches do not share entries

Console.ReadLine();
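What I usually do instead is hold on to a single instance. This is just a sketch of the idea, assuming one cache per process is what you actually want (the class name is a placeholder; MemoryCache.Default is an equally valid choice):

// One cache for the whole process instead of newing up MemoryCache repeatedly.
public static class AppCache
{
    public static readonly System.Runtime.Caching.MemoryCache Instance =
        new System.Runtime.Caching.MemoryCache("Cache1");
}

// Usage: every call site sees the same entries.
// AppCache.Instance.Add("Key1", "Val1", new CacheItemPolicy() { AbsoluteExpiration = DateTime.UtcNow.AddDays(1) });
// Console.WriteLine(AppCache.Instance["Key1"]);   // Val1, from anywhere in the process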


Keep this in mind and save yourself from some weird performance issues :).

Tuesday 7 May 2019

Where is the bug, bro?

I recently ran into an interesting issue when using the JsonConvert.DeserializeObject method within a Parallel.ForEach call.

Here is the code:


public class Temp
{
   public int A { get; set; }

   public string B { get; set; }
}


static async Task Main(string[] args)
{
    var counter = 0;
    Temp t = new Temp();
    var str = JsonConvert.SerializeObject(t);

    while (counter <= 25)
    {
        var itemsInList = new List<string>();
        for (int i = 0; i < 10; i++)
        {
            itemsInList.Add(str);
        }

        var tempList = new List<Temp>();
        Parallel.ForEach(itemsInList, (item) =>
        {
            var y = JsonConvert.DeserializeObject<Temp>(item);
            tempList.Add(y);
        });

        foreach (var tempItem in tempList)
        {
            Console.WriteLine("A  :" + tempItem.A);
        }

        Console.Read();
        counter++;
    }
}


If you run this a few times, you will run into an odd "Object reference not set to an instance of an object" error.


Now that is odd: it is intermittent, and it seems to indicate that the issue is with the deserialization code. Well, that is not the case.

If you replace the Parallel.ForEach block with the following, you will notice that the new check never fires.

Parallel.ForEach(itemsInList, (item) =>
{
    var y = JsonConvert.DeserializeObject<Temp>(item);
    if (y == null)
    {
        Console.WriteLine("Some issue with deserialization");
    }
    tempList.Add(y);
});


Mystery deepens ??

Well, not really. As an experienced person would tell you after a bit of head scratching, the issue is with the List<Temp> instance, as it is not thread-safe.

If we replace that with a ConcurrentBag, we are good.

var tempList = new ConcurrentBag<Temp>();

Old knowledge but takes a minute to kick in. Nice!!

Thursday 31 January 2019

Redis: Advanced data structures

Whenever we think of a cache, we think of Redis, unless you are exclusively working with an in-memory cache. However, many a time we continue to look at Redis as a store of key-value pairs. That is a mistake!!

Imagine you have a situation where you have to keep information about the members of a group, and your app supports dynamic creation of groups - how would you store it?

Option 1:

Keep it like a Key-Value pair. E.g.

Key: {GroupName}
Value: {MemberName1}{Delimiter}{MemberName2}{Delimiter}{MemberName3}....

Here the delimiter can be of your choice, like ",".

It will work fine for you. The Redis documentation says that the maximum size of a value is 512 MB.

However, now imagine you need to add and remove members from this value. You would end up writing code that fetches the value, manipulates it and saves it again: 2 trips to Redis and some CPU cycles.

Option 2:

Keep it in a Set.

SetName: {GroupName}

You can simply add (SADD) and remove (SREM) members from it. There is even a single command (SMEMBERS) to list all the members of a set.

Now this is handy and faster than Option 1.
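A minimal sketch of Option 2 with the StackExchange.Redis client (the group and member names are made up):

var redis = ConnectionMultiplexer.Connect("localhost");
var db = redis.GetDatabase();

// Add/remove members directly on the server - no fetch-manipulate-save round trip.
db.SetAdd("group:engineering", "alice");
db.SetAdd("group:engineering", "bob");
db.SetRemove("group:engineering", "bob");

// SMEMBERS: list everyone in the group with a single command.
foreach (var member in db.SetMembers("group:engineering"))
{
    Console.WriteLine(member);
}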

Similarly, there are multiple advanced data structures built into Redis that can be leveraged in specific situations. Better to use them :).

Tuesday 8 January 2019

Accessing website running on Ubuntu VM from Windows Host

So this might appear simple, but it does require a bit of attention and work. 

First of all, let us create the Ubuntu VM on Hyper-V using the "Quick Create" option. You can follow the details from here.

Once you are through, you will have an Ubuntu VM running on Hyper-V with Enhanced Session mode, which is actually much nicer than doing all the hacks on your own to get a better remote desktop experience.

The network switch configured for the VM will be "Default Switch". Read more about it here. This means that you should be able to ping the VM from the host (and vice versa) using the "mshome.net" domain suffix.


Once you are done, you can host/run a web application on the Ubuntu VM. I used VS Code to create a .NET Core MVC web application and ran it. I configured the application to use the host name of the VM during startup so that .NET Core understands and allows that host name for accessing the site - by default, .NET Core's web server does not listen on other host names unless specified.
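A minimal sketch of that startup configuration (the VM name "ubuntuvm" is a placeholder; port 5002 is the one I open in the firewall below):

public static IWebHostBuilder CreateWebHostBuilder(string[] args) =>
    WebHost.CreateDefaultBuilder(args)
        // Listen on the VM's host name (reachable from the host via the Default Switch)
        // instead of just localhost, so the site answers at ubuntuvm.mshome.net:5002.
        .UseUrls("http://ubuntuvm.mshome.net:5002")
        .UseStartup<Startup>();

Setting the ASPNETCORE_URLS environment variable to the same value achieves the same thing.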


You should be able to access the web application on the VM itself using both the localhost URL and the host-name URL. However, if you attempt to access the application from the host machine at this point, it will fail.

Configure Firewall

We need to configure the firewall on the Ubuntu VM. I installed UFW on the VM and allowed port 5002. Here is a tutorial to achieve the same.

Once that is done, the application should be accessible from the host using the "VMName.mshome.net" host name.


Hope it helps.