Auto Cancel Long Running Pipelines within Synapse / Azure Data Factory

Problem Statement :

As of 21 September 2023, pipelines in Synapse / Azure Data Factory do not have a timeout setting, so there is no out-of-the-box way to auto-cancel long-running pipelines or to send alerts / notifications about them.

Is it possible to auto-cancel long-running pipelines within Synapse / Azure Data Factory?

Prerequisites :

  1. Azure Data Factory / Synapse

Solution :

To achieve this, we have to build our own custom logic, as shown below:

GitHub Code

The activities highlighted in yellow are the additional custom logic that needs to be added to the existing pipeline.

Pipeline Parameters :

where :

a) TimeoutInSec : The timeout value of the pipeline, in seconds

b) QueueTimeInSec : The interval, in seconds, at which the pipeline's execution time and status are checked

c) SubscriptionID : SubscriptionID of the Subscription hosting the Azure Data Factory / Synapse

d) ResourceGroupName : Name of Resource group hosting the Azure Data Factory / Synapse

Pipeline Variables :

In the above flow, the Wait1, Wait2 and Fail1 activities stand in for the pipeline's normal flow of activities.

1. The ‘Set Pipeline Status’ Set variable activity records the outcome of the normal flow (Success or Failure) when the pipeline finishes without timing out.
Value :
@if(empty(replace(concat(activity('Wait1').error?.message,'^|',activity('Wait2')?.error?.message,'^|',activity('Fail1')?.error?.message),'^|','')),'Success','Failure')

The expression above, and the way the activities are chained in the sequential flow, are explained in the blog below :

Error Logging and the Art of Avoiding Redundant Activities in Azure Data Factory
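The logic of the ‘Set Pipeline Status’ expression can be mirrored outside the pipeline to see why it works. The Python below is only an illustrative sketch, not part of the pipeline: the function name pipeline_status is hypothetical, and it assumes (as the expression itself relies on) that a missing error message contributes nothing to the concatenated string.

```python
from typing import Optional

SEPARATOR = "^|"  # same separator the pipeline expression uses


def pipeline_status(*error_messages: Optional[str]) -> str:
    """Mirror of the 'Set Pipeline Status' expression (illustrative only).

    Joins the error messages with '^|', strips the separators again and
    returns 'Success' only when nothing is left over.
    """
    # Assumption: a null error message behaves like an empty string.
    joined = SEPARATOR.join(message or "" for message in error_messages)
    return "Success" if joined.replace(SEPARATOR, "") == "" else "Failure"


# No activity reported an error.
print(pipeline_status(None, None, None))
# The third activity (Fail1 in the example flow) reported an error.
print(pipeline_status(None, None, "Fail1 raised an error"))
```

If every message is empty, only the separators remain, and removing them leaves an empty string; any surviving text means at least one activity failed.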

2. In parallel with the actual flow, we use an Until activity and an If activity to check whether the pipeline has exceeded its allocated timeout and to take the necessary action.

a) Until Activity :

Expression :
@or(not(empty(variables('PipelineStatus'))),greaterOrEquals( variables('ExecutionTime'),pipeline().parameters.TimeoutInSec))

The Until activity keeps iterating until either the main flow of the pipeline has completed (Success or Failure) or the pipeline is still running and has exceeded its allocated timeout.

Within the Until activity :

i) The Wait3 activity waits for some time before the next iteration.

ii) The ‘Set TempExecutionTime’ Set variable activity computes the overall execution time up to that point.

Value :
@add(variables('ExecutionTime'),pipeline().parameters.QueueTimeInSec)

iii) The ‘Set ExecutionTime’ Set variable activity overwrites the ExecutionTime variable with that value.

Value :
@variables('TempExecutionTime')
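The Until loop described above can be simulated in a few lines of Python to see when it exits. This is a hedged sketch rather than the pipeline itself: simulate_until and main_flow_finishes_at_sec are hypothetical names, the wait is not actually performed, and the main flow is assumed to end in Success. The temporary variable mirrors the two Set variable activities, which are needed because a Set variable activity cannot reference the variable it is setting.

```python
from typing import Optional, Tuple


def simulate_until(timeout_in_sec: int,
                   queue_time_in_sec: int,
                   main_flow_finishes_at_sec: Optional[int] = None) -> Tuple[str, int]:
    """Return (PipelineStatus, ExecutionTime) at the moment the loop exits."""
    pipeline_status = ""   # stays empty until 'Set Pipeline Status' runs
    execution_time = 0     # the ExecutionTime variable

    while True:
        # Wait3 would pause for QueueTimeInSec here (skipped in this simulation).

        # 'Set TempExecutionTime' followed by 'Set ExecutionTime'.
        temp_execution_time = execution_time + queue_time_in_sec
        execution_time = temp_execution_time

        # Hypothetical stand-in for the main flow reaching 'Set Pipeline Status'.
        if (main_flow_finishes_at_sec is not None
                and execution_time >= main_flow_finishes_at_sec):
            pipeline_status = "Success"

        # Until expression: stop once a status is set or the timeout is reached.
        if pipeline_status != "" or execution_time >= timeout_in_sec:
            return pipeline_status, execution_time


print(simulate_until(60, 10, main_flow_finishes_at_sec=25))  # ('Success', 30)
print(simulate_until(60, 10))                                # ('', 60)
print(simulate_until(60, 25))                                # ('', 75)
```

The last call shows that the timeout is only detected at multiples of QueueTimeInSec, so a smaller interval gives a tighter cut-off at the cost of more iterations.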

3. The If activity checks whether the Until loop exited because the pipeline completed normally (Success or Failure), in which case no notification / cancellation is needed, or because of the timeout.

Expression :
@not(empty(variables('PipelineStatus')))

The expression checks whether the PipelineStatus variable is empty.

If it is empty, the normal flow of the pipeline is still in progress and has not yet reached the ‘Set Pipeline Status’ activity, which means the pipeline has exceeded its allocated timeout.
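As a rough illustration of this decision, the sketch below maps the PipelineStatus value to the action taken after the Until loop. The function name and the returned labels are hypothetical; in the pipeline, the timeout action would sit in the branch of the If activity that runs when the expression evaluates to false.

```python
def action_after_until(pipeline_status: str) -> str:
    """Decide what to do once the Until loop has exited (illustrative only)."""
    # Equivalent of @not(empty(variables('PipelineStatus'))).
    completed_normally = pipeline_status != ""
    if completed_normally:
        return "nothing to do"    # main flow finished with Success or Failure
    return "cancel or notify"     # status never set, so the timeout was hit


print(action_after_until("Success"))
print(action_after_until("Failure"))
print(action_after_until(""))
```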

In our use case, to cancel the pipeline run, we use a Web activity that calls the pipeline run cancel REST API.

The setup and details for cancelling the pipeline run are illustrated in the article below :

Cancel Azure Data Factory Pipeline Runs via Synapse / Data Factory
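For orientation, the sketch below only builds the URL that such a Web activity would send a POST request to for Azure Data Factory. It assumes the Azure Resource Manager endpoint and the 2018-06-01 API version of the Pipeline Runs cancel operation; the function name and the sample values are hypothetical. In the pipeline, the factory name and run ID would typically come from @pipeline().DataFactory and @pipeline().RunId, and authentication is left to the Web activity. Synapse workspaces expose their own cancel operation, which this sketch does not cover.

```python
from urllib.parse import quote


def adf_cancel_run_url(subscription_id: str,
                       resource_group_name: str,
                       factory_name: str,
                       run_id: str,
                       api_version: str = "2018-06-01") -> str:
    """Build the POST URL for cancelling an Azure Data Factory pipeline run."""
    return (
        "https://management.azure.com"
        f"/subscriptions/{quote(subscription_id, safe='')}"
        f"/resourceGroups/{quote(resource_group_name, safe='')}"
        "/providers/Microsoft.DataFactory"
        f"/factories/{quote(factory_name, safe='')}"
        f"/pipelineruns/{quote(run_id, safe='')}/cancel"
        f"?api-version={api_version}"
    )


# Placeholder values, for illustration only.
print(adf_cancel_run_url(
    "00000000-0000-0000-0000-000000000000",  # SubscriptionID parameter
    "my-resource-group",                     # ResourceGroupName parameter
    "my-data-factory",
    "11111111-1111-1111-1111-111111111111",
))
```

This is why the pipeline takes SubscriptionID and ResourceGroupName as parameters: both are part of the request path and are not available as system variables.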

To send a notification on timeout, one can use a Logic App and Web activity combination, replacing the pipeline cancellation Web activity with a Web activity that triggers the Logic App.

Output :

Scenario 1)

Normal success within the allocated time (disable the Fail activity in the example and update the expression in the ‘Set Pipeline Status’ activity accordingly)

Scenario 2)

Failure within allocated time

Scenario 3)

Pipeline execution exceeding Timeout

Published by Nandan Hegde

Microsoft Data MVP | Microsoft Data Platform Architect | Blogger | MSFT Community Champion

I am a MSFT Data Platform MVP and a Business Intelligence and Data Warehouse professional working within the Microsoft data platform ecosystem, which includes Azure Synapse Analytics, Azure Data Factory, Azure SQL Database and Power BI. To help people keep up with this ever-changing landscape, I frequently post on LinkedIn, Twitter and my blog at https://datasharkx.wordpress.com.

LinkedIn Profile : www.linkedin.com/in/nandan-hegde-4a195a66
GitHub Profile : https://github.com/NandanHegde15
Twitter Profile : @nandan_hegde15
MSFT MVP Profile : https://mvp.microsoft.com/en-US/MVP/profile/8977819f-95fb-ed11-8f6d-000d3a560942
