Sunday 19 June 2016

Azure Stream Analytics Jobs and Reference data

If you use Azure Stream Analytics jobs with "Reference Data" inputs (which, for now, are essentially Blob storage), you may run into the following scenario.

Suppose your primary data repository is something other than Azure Blob storage (which is very likely, because Blob storage is not meant for that purpose), you need to reference part of that data in your Azure Stream Analytics jobs, and that reference data changes slowly. In that case you need to keep the primary data store and Azure Blob storage synchronized. There are multiple articles that can help you achieve that, like this. However, there are some finer points:

1. Most of the articles on the web seem to indicate that reference data can be marked as changed at a "minute" time grain, but I have not been able to get that working. The new Azure portal (portal.azure.com) does not even show "minute" as an option in the Blob storage path. The old portal (manage.windowsazure.com) does show the option, but if you try to save your input source, it fails :). Your best bet for the time grain is "hour".

2. You may think of using Azure Data Copy (in case your primary data source is supported as an input); however, there are a couple of gotchas:

a. The Azure Data Copy activity creates the output file with a naming convention that does not play well with Azure Stream Analytics jobs. You cannot specify the destination file name in the Azure Data Copy activity: it likes to create a file named "data_{guid}", whereas the Azure Stream Analytics job looks for a fixed file name as its reference data stream.

b. If you let your synchronization process run once every UTC hour, then your Azure Stream Analytics job will always be ahead of the synchronization process, and that delays the data refresh. Make sure each synchronization run completes before the next hour strikes on the UTC clock. I personally liked synchronizing every 15 minutes.
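To make the timing point concrete, here is a minimal Python sketch of how a synchronization process could work out which blob path to write for the upcoming UTC hour, and how much time it has left to finish. It only does the date arithmetic and touches no Azure API. The path pattern "refdata/{date}/{time}/products.csv", the date format YYYY-MM-DD, the time format HH, and both function names are assumptions for illustration; use whatever pattern your own reference data input is configured with.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical path pattern for the reference data input ("hour" time grain).
# {date} is assumed to be YYYY-MM-DD and {time} to be HH.
PATH_PATTERN = "refdata/{date}/{time}/products.csv"


def next_utc_hour(now):
    """Return the start of the next UTC hour. `now` must be timezone-aware."""
    utc_now = now.astimezone(timezone.utc)
    return utc_now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)


def next_hour_blob_path(now, pattern=PATH_PATTERN):
    """Blob path the sync should write so the file is ready for the next hour."""
    target = next_utc_hour(now)
    return (pattern
            .replace("{date}", target.strftime("%Y-%m-%d"))
            .replace("{time}", target.strftime("%H")))


def minutes_until_next_hour(now):
    """Whole minutes left before the next UTC hour strikes."""
    remaining = next_utc_hour(now) - now.astimezone(timezone.utc)
    return int(remaining.total_seconds() // 60)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print("write to:", next_hour_blob_path(now))
    print("minutes left:", minutes_until_next_hour(now))
```

Running something like this every 15 minutes means the file for the next hour is normally in place well before the job goes looking for it. Writing to a fixed file name under the dated folder also sidesteps the "data_{guid}" naming problem, provided your copy step lets you control the name.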

So that is that. Hope it helps.
