Sunday 12 June 2016

"Cannot have more than 5 receivers" - Azure Stream Analytics Jobs

If you try to write a little large or complex Azure Stream Analytics Job, whether due to logic requirement or whether you are trying to minimize the number of jobs to manage, there is a high chance that you will be required to utilize the UNION operator in your query so that you can club outputs. Well, that is a good option but there are certain restrictions around it.

Link#1 explains such issue. If you use EventHub as an input to the Azure Stream Analytics Job then you can not have more than 5 input streams.

Consider the below query as an example:

WITH Result1
(
   SELECT Min(S.Val) AS Col1,
               Max(S.Val) AS Col2,
               Avg(S.Val) Col3,
               S.Name AS Col4,
               S.Id AS Col5,
               S.F1 AS Col6,
               S.F2 AS Col7
   From
          InputStream S
   TIMESTAMP BY S.[TimeStamp]
   GROUP BY
          S.Name, S.Id, S.F1, S.F2, SlidingWindow(Min, 5)
),
Result2
(
   SELECT S1.Col4, 'Name' as 'Name'
   FROM Result1 S1
   UNION
   SELECT S1.Col5, 'ID' AS 'Id'
   FROM Result1 S1
   UNION
   SELECT S1.Col6, 'F1' AS 'Name'
   FROM Result1 S1
   UNION
   SELECT S1.Col7, 'F2' AS 'Name'
   FROM Result1 S1
   UNION
   SELECT S1.Col1, 'Min' AS 'Min'
   FROM Result1 S1
   UNION
   SELECT S1.Col2, 'Max' AS 'Max'
   FROM Result1 S1
   UNION
   SELECT S1.Col3, 'Avg' AS 'Avg'
   FROM Result1 S1
)
SELECT * INTO OutputStream FROM Result2

If this query utilizes EventHub as an Input source, this query will have problem in starting.

One possible solution is to break the Result2 into 2-3 queries and have their outputs sent to different output streams. There is no restriction (as yet) that stops you to add one output destination multiple times with different names. Let us check that:

WITH Result1
(
   SELECT Min(S.Val) AS Col1,
               Max(S.Val) AS Col2,
               Avg(S.Val) Col3,
               S.Name AS Col4,
               S.Id AS Col5,
               S.F1 AS Col6,
               S.F2 AS Col7
   From
          InputStream S
   TIMESTAMP BY S.[TimeStamp]
   GROUP BY
          S.Name, S.Id, S.F1, S.F2, SlidingWindow(Min, 5)
),
Result2
(
   SELECT S1.Col4, 'Name' as 'Name'
   FROM Result1 S1
   UNION
   SELECT S1.Col5, 'ID' AS 'Id'
   FROM Result1 S1
   UNION
   SELECT S1.Col6, 'F1' AS 'Name'
   FROM Result1 S1
   UNION
   SELECT S1.Col7, 'F2' AS 'Name'
   FROM Result1 S1  
),
Result3
(

  SELECT S1.Col1, 'Min' AS 'Min'
   FROM Result1 S1
   UNION
   SELECT S1.Col2, 'Max' AS 'Max'
   FROM Result1 S1
   UNION
   SELECT S1.Col3, 'Avg' AS 'Avg'
   FROM Result1 S1
)
SELECT * INTO OutputStream1 FROM Result2
SELECT * INTO OutputStream2 FROM Result3

That's one of the optional ways. Of course, it would be nicer if Azure Stream Analytics Jobs don't have such restrictions.






No comments:

Post a Comment