Problem Statement :
In my previous blog Event Trigger Data Sync from SQL Server to Synapse via Azure Data Factory / Synapse Pipeline, we were able to trigger a Synapse / ADF pipeline via SQL server leveraging a Dedicated SQL Pool.
Is it possible to instantaneously trigger Synapse / ADF Pipeline in case of any data changes (Insert / Update /Delete) within a table in SQL server database without leveraging Linked server/ Dedicated SQL Pool.
Prerequisites :
- SQL Server ( 2022 )
- Azure Blob Storage
- Synapse / Data Factory
Solution :
To trigger Synapse / ADF pipeline via SQL server, we would be leveraging CETAS (Polybase) feature of SQL server 2022.
- Ensure Polybase is Enabled on the database via which we want to trigger the Pipelines.
exec sp_configure @configname = 'polybase enabled', @configvalue = 1;
RECONFIGURE;
Run the following command and verify the return value is 1 to ensure PolyBase is installed.
SELECT SERVERPROPERTY ('IsPolyBaseInstalled') AS IsPolyBaseInstalled;
2. To Leverage CETAS functionality, enable Polybase Export by executing the below command
exec sp_configure @configname = 'allow polybase export', @configvalue = 1;
RECONFIGURE;
3. Set up External Data Source and File format for creation of an External table
a) Create a Master Key (If already not exists)
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<<>>';
b) Create Database Scoped Credential to Azure Blob storage
CREATE DATABASE SCOPED CREDENTIAL [BlobSAS]
WITH
IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=2022-11-02<<>>' ;
GO
Generate the SAS by Selecting Container & Object in Allowed Resource Types
Note : Access Key is not supported in SQL 2022 and only SAS is supported.
C) Create External Data Source
CREATE EXTERNAL DATA SOURCE [BlobSource]
WITH
( LOCATION = 'abs://<<BlobName>>.blob.core.windows.net/<<ContainerName>>' ,
CREDENTIAL = [BlobSAS],
) ;
D) Create External File Format
CREATE EXTERNAL FILE FORMAT [CommaDelimited] WITH
(
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS (FIELD_TERMINATOR = N',',
STRING_DELIMITER = N'"',
FIRST_ROW = 2,
USE_TYPE_DEFAULT = True)
)
4. The purpose behind #3 is for us to leverage this in the CETAS creation which in turn would generate a file in the Azure blob storage which we can leverage for Blob trigger purpose. So next action is to create a Stored procedure in SQL Server which can be triggered in case of any data change.
CREATE PROCEDURE dbo.EventTriggerPipeline
AS
IF EXISTS ( SELECT 1 FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'EventTrigger' AND TABLE_SCHEMA='dbo')
DROP EXTERNAL TABLE dbo.EventTrigger
CREATE EXTERNAL TABLE dbo.EventTrigger
WITH (
LOCATION='Adhoc/',
DATA_SOURCE=[BlobSource],
FILE_FORMAT =[CommaDelimited]
)
AS
SELECT GETDATE() AS Event
5. Set up Blob Event Type trigger on the Pipeline which needs to be triggered.
Note: The Blob path begins is equivalent to the Location specified in CETAS script in #4.
Also, at the end of pipeline, make sure to delete the file via which the Pipeline was triggered via Delete activity to avoid the below error :
“Cannot create external table. External table location already exists.“
6. The next aspect would be to set up Trigger in the SQL server on the table whose data changes needs to trigger the pipeline.
Create Table dbo.DataSharkX
(
C1 Varchar(10),
C2 Varchar(10)
);
Create Trigger EventPipe
On dbo.DataSharkx
AFTER INSERT,UPDATE,DELETE
AS
BEGIN
EXEC dbo.EventTriggerPipeline
END
So the set up is established which would flow in the below order whenever any Data changes (Insert / Update / Delete) happens within the SQL server table :
1) SQL Trigger activated => 2) SQL Server Stored Procedure Executed => 3) External Table creation (CETAS) => 4) File Creation in Blob => 5) ADF/Synapse Event Trigger activated => 6) ADF/Synapse Pipeline triggered => 7) Delete files via Delete activity at end
Result :
a) Insert Data into SQL Server table
1 record was inserted into table and files were generated in blob via CETAS