Problem Statement :
Power BI REST API : Datasets – Execute Queries has the below limitation :
Is there an automated way to extract data from a Power BI dataset when the result exceeds 15 MB or 100,000 rows?
Prerequisites :
- Power BI Report / Dataset
- Azure Data Factory / Synapse
- Azure SQL Database / SQL Database
- Azure Blob Storage
- Logic App (this can be avoided if we leverage a Web activity / Dataflow in Data Factory / Synapse)
In my previous blog, we leveraged Service Principal authentication to extract data from Power BI; for a change of scenario, this use case leverages a Logic App and AD user authentication to extract data from the dataset.
Solution :
For the security configurations, please follow up to step #1 of the blog Copy Data from Power BI through Azure Data Factory / Synapse Pipeline – Part 3 to enable Managed Identity authentication.
Since we would also be leveraging user ID authentication, grant the user Dataset.Read.All rights on the dataset from which you want to extract the data.
GitHub Code
The pipeline would have 4 variables : RowCount, LowerRange, UpperRange and FinalQuery.
1. To overcome the row / size limitation, we need to get the overall row count (the total number of rows within the dataset), partition the dataset into batch ranges, extract the data partition by partition, and finally merge the partitions into a single file.
2. To get the overall row count, we leverage a Web activity to execute a query against the Power BI dataset.
URL : @concat('https://api.powerbi.com/v1.0/myorg/datasets/',pipeline().parameters.PBIDatasetId,'/executeQueries')
Body :
{"queries":[{"query":"@{pipeline().parameters.RowCountQuery}"}],"serializerSettings":{"includeNulls":true}}
Where the value for the parameter RowCountQuery would be a DAX query to get the row count :
EVALUATE SUMMARIZE(DataSharkX,"Count Row",COUNT(DataSharkX[EmployeeID]))
Below is my Power BI Dataset table :
Output :
3. We now need to extract the row count from the JSON response and assign it to the RowCount variable. The expression below stringifies the single result row, splits it on ':', takes the last piece, and strips the trailing '}' to leave just the count.
Expression :
@replace(string(last(split(string(activity('Row Count').output.results[0].tables[0].rows[0]),':'))),'}','')
Output :
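For reference, a minimal Python sketch of the same call and extraction (token acquisition, the dataset ID and the returned value are illustrative assumptions; the pipeline itself does this with the Web and Set Variable activities):

import requests

# Illustrative placeholders - not real values
dataset_id = "<dataset-id>"
token = "<aad-access-token>"

url = f"https://api.powerbi.com/v1.0/myorg/datasets/{dataset_id}/executeQueries"
body = {
    "queries": [{"query": 'EVALUATE SUMMARIZE(DataSharkX,"Count Row",COUNT(DataSharkX[EmployeeID]))'}],
    "serializerSettings": {"includeNulls": True},
}
resp = requests.post(url, json=body, headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()

# The single result row looks roughly like {"[Count Row]": 250000}; its one
# value is what the split/replace expression in step #3 extracts as well.
row = resp.json()["results"][0]["tables"][0]["rows"][0]
row_count = int(next(iter(row.values())))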
4. For this scenario, we plan to extract the data in batches of 90000 rows, so we have assigned the value 90000 to the PartitionRange pipeline parameter.
Now, based on the overall row count and the partition range, we need to determine the number of iterations / batches in which to extract the data.
For our use case, we used a Lookup activity to execute a SQL query against an Azure SQL Database.
Query value :
-- Note: the doubled at-signs are the Data Factory escape for a literal '@',
-- since this query value contains pipeline expressions.
declare @@initialize int = 1
declare @@partition int = @{pipeline().parameters.PartitionRange}
declare @@count int = @{variables('RowCount')}
-- Integer division: one loop batch per full partition, on top of the
-- seeded first range below.
declare @@maxrun int = ceiling(@@count/@@partition)
declare @@dividendcount int = @@partition
-- Seed the first range: 1;<partition>
declare @@con varchar(max) = (select CONCAT(@@initialize,';',@@dividendcount))
declare @@counter int = 0
while (@@counter<@@maxrun)
begin
set @@initialize = (select @@dividendcount)
set @@dividendcount = (select @@partition + @@dividendcount)
set @@con = (select CONCAT(@@con,'|',@@initialize,';',@@dividendcount))
set @@counter = (select @@counter+1)
end
select @@con AS ConValue
Output :
where we have generated 3 partitions in the ranges :
a) 1;90000
b) 90000;180000
c) 180000;270000
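As a cross-check, a minimal Python sketch of the same partition logic (the 250000 row count is an illustrative assumption consistent with the three ranges above):

# Mirrors the T-SQL above: integer division gives one loop batch per full
# partition, plus the seeded first range of 1;<partition>.
def generate_partitions(row_count: int, partition_range: int) -> str:
    lower, upper = 1, partition_range
    parts = [f"{lower};{upper}"]
    for _ in range(row_count // partition_range):
        lower, upper = upper, upper + partition_range
        parts.append(f"{lower};{upper}")
    return "|".join(parts)

print(generate_partitions(250000, 90000))
# 1;90000|90000;180000|180000;270000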
5. Split the ConValue (output of step #4) on '|' and iterate over it with a For Each activity.
Items :
@split(string(activity('Generate Partitions').output.value[0].ConValue),'|')
6. Within the For Each activity :
Set the values for the variables LowerRange & UpperRange.
A) LowerRange
Expression :
@substring(item(),0,indexOf(item(),';'))
Output :
B) UpperRange
Expression :
@substring(item(),add(indexOf(item(),';'),1),sub(sub(length(item()),indexOf(item(),';')),1))
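A quick sanity check of both expressions against one For Each item (Python, illustrative):

# item() for the second iteration of the For Each activity
item = "90000;180000"
sep = item.index(";")
lower_range = item[:sep]      # @substring(item(),0,indexOf(item(),';'))
upper_range = item[sep + 1:]  # @substring(item(),add(indexOf(item(),';'),1),...)
print(lower_range, upper_range)  # 90000 180000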
C) Generate the Final Query to execute on the Power BI dataset to extract the data in batches.
The DAX query below is the equivalent of the ROW_NUMBER function in T-SQL: we generate a Rank column in the source dataset and, in every iteration, filter the dataset to the rows whose rank falls between the LowerRange and UpperRange. Note that RANK.EQ assigns equal ranks to ties, so this behaves like a true row number only when EmployeeID values are unique.
DEFINE
    VAR Employee1 = SUMMARIZECOLUMNS ( DataSharkX[EmployeeID], DataSharkX[EmployeeName], DataSharkX[Rating], "EID", VALUES ( DataSharkX[EmployeeID] ) )
    VAR EmployeeTable = ADDCOLUMNS ( Employee1, "Rank", RANK.EQ ( [EID], DataSharkX[EmployeeID], ASC ) )
EVALUATE
    FILTER ( EmployeeTable, [Rank] >= @{variables('LowerRange')} && [Rank] < @{variables('UpperRange')} )
Output :
D) Trigger the Logic App synchronously; it will execute this FinalQuery against the Power BI dataset and upload the resulting file to Azure Blob Storage.
7. Logic App Design
The highlighted (SharePoint) section can be ignored in the current context if need be.
a) HTTP Request :
Where Query and Iteration are the input parameters that map to the FinalQuery and LowerRange variable values from ADF / Synapse.
b) Run Query against Dataset
Log in via the user ID that has the necessary access on the Power BI dataset, and select the relevant workspace name and dataset.
c) Create CSV Table
d) Create Blob
Connect to the resource group and provide the storage account details.
And the blob name is the expression below :
concat(utcNow(),triggerBody()?['Iteration'],'pbi.csv')
e) (Optional) Create File in SharePoint
Where the file name is the expression below :
concat(utcNow(),triggerBody()?['Iteration'],'pbi.csv')
f) For the synchronous call from the ADF / Synapse Webhook activity, instead of using a Request/Response action, we use an HTTP action to POST a message to the callBackUri.
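The callback contract itself is small: any POST to the callBackUri releases the waiting Webhook activity. A hedged Python sketch of that final call (in the Logic App this is just an HTTP action; the status/error body shape follows the documented Webhook activity contract and should be verified):

import requests

def complete_webhook(callback_uri: str, succeeded: bool = True) -> None:
    # An empty POST is enough to complete the Webhook activity; a JSON
    # body can additionally report a status or error back to the pipeline.
    body = {"statusCode": 200} if succeeded else {
        "statusCode": 500,
        "error": {"ErrorCode": "BatchFailed", "Message": "Query or upload failed"},
    }
    requests.post(callback_uri, json=body).raise_for_status()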
Output after all iterations :
8. To merge the individual files into a single file, leverage a Copy activity.
Source Setting :
where Source Dataset :
Sink Setting :
where Sink Dataset :
Output :
9. Finally, delete the individual partitioned data files.
Note :
One can leverage a Web activity in place of the Webhook / Logic App to execute the queries as well, but the partition range then needs to be much lower than 100,000, as the Web activity has a smaller response size limit (4 MB at the time of writing).
Error message for a response greater than that size :
Alternatively, one can use a Dataflow activity to execute the queries.
File Output :
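To make the overall batching idea concrete, here is a hedged end-to-end Python sketch of the loop the pipeline performs: one rank-filtered executeQueries call per partition, with the rows accumulated client-side (names, token handling and the batch size are illustrative assumptions; the pipeline itself does this with the activities above):

import requests

def extract_all(dataset_id: str, token: str, row_count: int, batch: int = 90000) -> list:
    url = f"https://api.powerbi.com/v1.0/myorg/datasets/{dataset_id}/executeQueries"
    headers = {"Authorization": f"Bearer {token}"}
    rows = []
    lower = 1
    while lower <= row_count:
        upper = lower + batch
        # Same ranked DAX as step #6, with the range substituted in
        dax = (
            'DEFINE VAR Employee1 = SUMMARIZECOLUMNS(DataSharkX[EmployeeID],'
            'DataSharkX[EmployeeName],DataSharkX[Rating],"EID",VALUES(DataSharkX[EmployeeID])) '
            'VAR EmployeeTable = ADDCOLUMNS(Employee1,"Rank",RANK.EQ([EID],DataSharkX[EmployeeID],ASC)) '
            f"EVALUATE FILTER(EmployeeTable,[Rank]>={lower} && [Rank]<{upper})"
        )
        body = {"queries": [{"query": dax}], "serializerSettings": {"includeNulls": True}}
        resp = requests.post(url, json=body, headers=headers)
        resp.raise_for_status()
        rows.extend(resp.json()["results"][0]["tables"][0]["rows"])
        lower = upper
    return rows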
Comments :
Thanks for the wonderful tutorial, I really appreciate your time and effort in writing this. There is a syntax mistake in LowerRange: it should be @substring(item(),0,indexOf(item(),';')).
Also, instead of using a Logic App, we can trigger a Web activity to fetch the data using the final query and save the output to blob storage (as a JSON file), then use a Copy activity to create a merged CSV or insert into a database. This also helps avoid the JSON-flattening cost of a Dataflow and saves money.
Note – make sure to save only the JSON result rows, using @activity('Extract data').output.results[0].tables[0].rows, otherwise the Copy activity will not be able to read the multiline output.
The lower partition range query is not working.