Sort an Array in Azure Data Factory / Synapse Pipeline via Regular Expressions

Problem Statement :

In Azure Data Factory / Synapse, there is no dedicated function under regular expressions to Sort an Array.

So is it possible to Sort an Array in ADF / Synapse without the use of Dataflow.

Prerequisites :

  1. Azure Data Factory / Synapse

Solution :

To Sort an Array, we can follow the below approach

GitHUB Code

  1. Create a Pipeline parameter : InputArray and Pipeline Variables as below

2. Initially Assign the InputArray parameter value to a variable: SourceArray which we would leverage and update based on the Sort Logic.

3. Set the Iteration range (equivalent to #1 in Code Snippet)

@range(0,add(length(variables('SourceArray')),-1))

4. Until Activity (Equivalent to While Iteration in Code)

5. Within Until Activity :

6. Get the Minimum Index in the Iteration

7. Get (Min + 1) Index value

@string(add(int(variables('CurrentItemValue')),1))

8. Compare the Elemental values (Equivalent to #2 in Code Snippet)

@greater(variables('SourceArray')[int(variables('CurrentItemValue'))], variables('SourceArray')[int(variables('NextItemValue'))])

9. In case if the initial element is greater than the next element value (TRUE Activities), Swap the values (Equivalent to #3 in Code Snippet)

Please refer the blog : Swapping Two Array Elements in Azure Data Factory / Synapse Pipeline via Regular Expressions for understanding the highlighted Section in above image .

Finally Reset the iteration ( Equivalent to #4 in Code Snippet)

@range(0,add(length(variables('SourceArray')),-1))

10. In case if the initial element is less than or equal than the next element value (FALSE Activities)

Increment the iteration ( Equivalent to yellow highlighted section in Code Snippet)

In Synapse / ADF , it is not possible to self reference a variable in Set Variable Activity. Hence, we would need to create a TempArray variable and then over ride the original array with the TempArray one.

11. The Final Activity : Debug is only for debugging purpose to get the sorted value in the last activity log (This can be avoided in real case scenarios as we are mapping the original sorted Array into a new variable)

Result :

Numeric :

Input :

Output :

String :

Input :

Output :

Note :

To get the Sort in Descending order, we can follow the below logics :

a) Reverse the Array once it is sorted in Ascending order

Reversing an Array in Azure Data Factory / Synapse Pipeline via Regular Expressions

b) In #8 , IF Activity ; use the below logic

@less(variables('SourceArray')[int(variables('CurrentItemValue'))], variables('SourceArray')[int(variables('NextItemValue'))])

Published by Nandan Hegde

Microsoft Data MVP |Microsoft Data platform Architect | Blogger | MSFT Community Champion I am a MSFT Data Platform MVP and Business Intelligence and Data Warehouse professional working within the Microsoft data platform eco-system which includes Azure Synapse Analytics ,Azure Data Factory ,Azure SQL Database and Power BI. To help people keep up with this ever-changing landscape, I frequently posts on LinkedIn, Twitter and to his blog on https://datasharkx.wordpress.com. LinkedIn Profile : www.linkedin.com/in/nandan-hegde-4a195a66 GitHUB Profile : https://github.com/NandanHegde15 Twitter Profile : @nandan_hegde15 MSFT MVP Profile : https://mvp.microsoft.com/en-US/MVP/profile/8977819f-95fb-ed11-8f6d-000d3a560942

Leave a comment

Design a site like this with WordPress.com
Get started