Overcoming Limitations of Get Metadata Activity in Azure Synapse / Data Factory

Problem Statement :

A file uploaded to Azure Blob Storage / Azure Data Lake Storage has multiple properties associated with it.

The Get Metadata Activity in a pipeline exposes only the below subset of those properties :

Is it possible to get other properties of the file, such as Creation Time and Content-Type, in Synapse / Data Factory pipelines?

Prerequisites :

  1. Azure Data Factory / Synapse
  2. Azure Blob Storage / Azure Data Lake Storage

Solution :

  1. We will use the Azure Blob Storage REST API operation Get Blob to read the blob file properties.
  2. Grant the Synapse / Data Factory managed identity the Storage Blob Data Reader role on the Azure Blob Storage account, so that the pipeline can authenticate via Managed Identity.

     a) Go to Access Control (IAM) of the Azure Blob Storage account, click Add and select Add Role Assignment.

     b) Search for the Storage Blob Data Reader role and proceed to assign it to the Synapse / Data Factory managed identity.

  3. Create a pipeline within Synapse / Data Factory that uses a Web Activity to call the REST API.

URL :

In case of Azure Blob Storage

https://<<StorageAccountName>>.blob.core.windows.net/<<ContainerName>>/<<FileName>>

In case of Azure Data Lake Storage

https://<<DataLakeStorageName>>.dfs.core.windows.net/<<ContainerName>>/<<FileName/DirectoryName>>

Method : GET

Authentication : System Assigned Managed Identity

Resource : https://storage.azure.com/

Headers :

x-ms-version : 2017-11-09
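The settings above can be collected into a single Web Activity definition. The Python sketch below builds such a definition as a dictionary and prints it as JSON. It is only an illustration under assumptions, not the pipeline exported from the author's workspace: the activity name `Web`, the helper names `build_storage_url` and `build_web_activity`, and the sample account, container and file names are hypothetical, and the JSON property names (`typeProperties`, `authentication` with type `MSI`) are assumed to match the Web Activity schema, so compare the result with the JSON view of your own pipeline.

```python
import json
from urllib.parse import quote

# Endpoint suffixes taken from the two URL patterns above.
ENDPOINTS = {
    "blob": "blob.core.windows.net",  # Azure Blob Storage
    "dfs": "dfs.core.windows.net",    # Azure Data Lake Storage
}


def build_storage_url(account: str, container: str, path: str, kind: str = "blob") -> str:
    """Return https://<account>.<endpoint>/<container>/<path>."""
    if kind not in ENDPOINTS:
        raise ValueError(f"kind must be one of {sorted(ENDPOINTS)}")
    # Percent-encode the path but keep "/" so directories stay intact.
    encoded_path = quote(path.lstrip("/"), safe="/")
    return f"https://{account}.{ENDPOINTS[kind]}/{container}/{encoded_path}"


def build_web_activity(url: str, name: str = "Web") -> dict:
    """Assemble a Web Activity definition from the settings listed above."""
    return {
        "name": name,  # hypothetical activity name
        "type": "WebActivity",
        "typeProperties": {
            "url": url,
            "method": "GET",
            "headers": {"x-ms-version": "2017-11-09"},
            # Assumed schema for System Assigned Managed Identity.
            "authentication": {
                "type": "MSI",
                "resource": "https://storage.azure.com/",
            },
        },
    }


if __name__ == "__main__":
    # Placeholder names, not real resources.
    url = build_storage_url("mystorageaccount", "mycontainer", "sample.csv")
    print(json.dumps(build_web_activity(url), indent=2))
```

Passing `kind="dfs"` switches the URL to the Azure Data Lake Storage endpoint while the rest of the definition stays the same.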

Output :

Get Metadata Activity output :-

Web Activity Output (Azure Blob Storage) :-

where [x-ms-creation-time] represents the file creation time.
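If the header values are consumed outside the pipeline, they can be read like any other HTTP response headers. Below is a minimal Python sketch, assuming that `x-ms-creation-time` arrives in the usual HTTP date format (for example `Tue, 22 Aug 2023 10:15:30 GMT`). The helper name `blob_properties` and the sample values are hypothetical and are not taken from the output above.

```python
from email.utils import parsedate_to_datetime


def blob_properties(headers: dict) -> dict:
    """Pick the creation time and content type out of a response-header mapping."""
    # HTTP header names are case-insensitive, so normalise the keys first.
    lowered = {key.lower(): value for key, value in headers.items()}
    created = lowered.get("x-ms-creation-time")
    return {
        # Parse the HTTP date string into a datetime (None if the header is absent).
        "created": parsedate_to_datetime(created) if created else None,
        "content_type": lowered.get("content-type"),
    }


if __name__ == "__main__":
    # Made-up sample headers for illustration only.
    sample = {
        "x-ms-creation-time": "Tue, 22 Aug 2023 10:15:30 GMT",
        "Content-Type": "text/csv",
    }
    props = blob_properties(sample)
    print(props["created"].isoformat(), props["content_type"])
```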

Web Activity Output (Azure Data Lake Storage) :-

Directory Property :

Web Activity :

Published by Nandan Hegde

2 thoughts on “Overcoming Limitations of Get Metadata Activity in Azure Synapse / Data Factory”

  1. Hi, this blog is interesting. I have tried the above way of getting the data. However, I am also getting the data inside the blob in the response. Any idea how to remove it?

    1. Hey @jAC, thanks for reaching out and for your feedback :).
      Those are the standard response headers returned by the REST API, and you cannot restrict them. However, you can enable Secure Output in the Web Activity, thereby restricting access to the other data in the output, and then pull in only the required values with dynamic expressions like @activity('Web').output.ADFWebActivityResponseHeaders['Content-Type']
