Problem Statement :
There are multiple properties associated with a file uploaded to Azure Blob Storage / Azure Data Lake Storage.
One can use the Get Metadata activity within a pipeline, but it returns only a limited subset of these properties.
Is it possible to get other properties of the file, such as Creation Time and Content-Type, in Synapse / Data Factory pipelines?
Prerequisites :
- Azure Data Factory / Synapse
- Azure Blob Storage / Azure Data Lake Storage
Solution :
1. We will use the Azure Blob Storage REST API operation Get Blob, which returns the file content along with the file properties as response headers.
2. Grant the Synapse / Data Factory managed identity the Storage Blob Data Reader role on the Azure Blob Storage account so that it can authenticate via Managed Identity:
a) Go to Access Control (IAM) of the Azure Blob Storage account, click Add and select Add role assignment.
b) Search for the Storage Blob Data Reader role and assign it to the Synapse workspace / Data Factory managed identity.
3. Create a pipeline within Synapse / Data Factory that uses a Web Activity to call the REST API.
URL :
For Azure Blob Storage:
https://<<StorageAccountName>>.blob.core.windows.net/<<ContainerName>>/<<FileName>>
For Azure Data Lake Storage:
https://<<DataLakeStorageName>>.dfs.core.windows.net/<<ContainerName>>/<<FileName/DirectoryName>>
Method : GET
Authentication : System Assigned Managed Identity
Resource : https://storage.azure.com/
Headers :
x-ms-version : 2017-11-09
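For testing outside a pipeline, the same Get Blob call can be sketched in Python with only the standard library. This is a minimal sketch, not the pipeline's own mechanism: it assumes you already hold an OAuth bearer token for the https://storage.azure.com/ resource (inside the pipeline, the managed identity supplies it), and the helper names blob_url, build_get_blob_request and fetch_properties are hypothetical.

```python
import urllib.parse
import urllib.request

# x-ms-version value used in the Web Activity settings above.
API_VERSION = "2017-11-09"


def blob_url(account: str, container: str, path: str, data_lake: bool = False) -> str:
    """Hypothetical helper mirroring the blob / dfs URL patterns shown above."""
    endpoint = "dfs" if data_lake else "blob"
    # Percent-encode container and path while keeping "/" as the separator.
    quoted = urllib.parse.quote(f"{container}/{path}", safe="/")
    return f"https://{account}.{endpoint}.core.windows.net/{quoted}"


def build_get_blob_request(url: str, token: str) -> urllib.request.Request:
    """Prepare a GET request equivalent to the Web Activity configuration.

    Assumes `token` is a bearer token issued for https://storage.azure.com/.
    """
    return urllib.request.Request(
        url,
        method="GET",
        headers={
            "Authorization": f"Bearer {token}",
            "x-ms-version": API_VERSION,
        },
    )


def fetch_properties(url: str, token: str) -> dict:
    """Send the request and return only the response headers (the file properties).

    Get Blob also returns the file content as the body; it is never read here.
    """
    with urllib.request.urlopen(build_get_blob_request(url, token)) as response:
        return dict(response.headers.items())
```

For example, fetch_properties(blob_url("mystorageaccount", "mycontainer", "folder/sample.csv"), token) would return a dictionary in which the x-ms-creation-time entry holds the creation time; the account, container and file names here are placeholders.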
Output :
Get Metadata Activity output :-
Web Activity Output (Azure Blob Storage) :-
where the x-ms-creation-time header represents the file creation time (this header is returned for x-ms-version 2017-11-09 and later).
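Date-valued headers such as x-ms-creation-time (and the standard Last-Modified header) come back as RFC 1123 strings. A minimal sketch for converting them into datetimes, assuming the response headers have already been collected into a dictionary; the helper name parse_header_date and the sample value in the usage note are hypothetical, not real output.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def parse_header_date(headers: Dict[str, str], name: str) -> Optional[datetime]:
    """Return the named RFC 1123 date header as a datetime, or None if it is absent.

    Header names are compared case-insensitively, as HTTP header names are not case-sensitive.
    """
    for key, value in headers.items():
        if key.lower() == name.lower():
            # "GMT" dates become timezone-aware datetimes with a UTC offset of zero.
            return parsedate_to_datetime(value)
    return None
```

For a hypothetical header value "Wed, 14 Jun 2023 10:15:30 GMT", parse_header_date returns 2023-06-14 10:15:30 UTC.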
Web Activity Output (Azure Data Lake Storage) :-
Directory Property :
Web Activity :
Hi, this blog is interesting. I have tried the above approach; however, I am also getting the content of the blob in the response. Any idea how to remove it?
Hey @jAC, thanks for reaching out and for your feedback :).
That is the standard response of the REST API and it cannot be restricted, but you can enable Secure Output in the Web Activity to restrict access to the rest of the output, and then pull in the required values using dynamic expressions such as @activity('Web').output.ADFWebActivityResponseHeaders['Content-Type']
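The dynamic expression in the reply reads one entry from the ADFWebActivityResponseHeaders dictionary of the Web Activity output. The same selection, keeping only the wanted headers and ignoring the blob content, can be sketched in Python as below; the helper name select_headers is hypothetical and the input is assumed to be a dictionary shaped like the activity output, not the pipeline's actual evaluation logic.

```python
from typing import Dict, List


def select_headers(activity_output: Dict, wanted: List[str]) -> Dict[str, str]:
    """Keep only the requested response headers from a Web Activity-style output.

    Mirrors @activity('Web').output.ADFWebActivityResponseHeaders['Content-Type']
    for several headers at once; any response body in the output is ignored.
    """
    headers = activity_output.get("ADFWebActivityResponseHeaders", {})
    # Exact-name lookup, like the dictionary access in the expression.
    return {name: headers[name] for name in wanted if name in headers}
```

Headers missing from the output are simply skipped rather than raising an error.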