Tuesday, October 17, 2023

Azure Data Factory - how to ACTUALLY use storage event triggers to copy data from Blob Storage to Azure SQL Database (Azure SQL DB)

 This post is in response to an incredibly frustrating interaction with Azure.

I figured I would do something incredibly simple - import a Parquet file from Blob Storage into Azure SQL DB. When the file is written to Blob Storage (via CETAS, fwiw), it should fire the storage event trigger, which kicks off the copy pipeline.


The problem is that it doesn't work the way the tutorial (https://learn.microsoft.com/en-us/azure/data-factory/tutorial-copy-data-portal) shows. The copy activity fails saying it can't find the file. Why not? Well, it wasn't firewalls or anything like that. It's that the Blob Storage dataset requires a container name - and the trigger passes the container name as part of the folder path too. So I kept getting errors that it couldn't find mycontainer/mycontainer/myfolder/myfile.parquet. I found a single answer on Stack Overflow (https://stackoverflow.com/questions/67294521/azure-data-factory-how-to-get-new-file-location-from-triggerbody-folderpath/74738786) that solves it. So, here we go:

Pipeline parameters:
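These are just two string parameters on the pipeline, which the trigger fills in at run time. In the pipeline's JSON they're nothing more than this (a rough sketch - I call mine trigger_folder_path and trigger_file_name, name yours whatever you like):

"parameters": {
    "trigger_folder_path": {
        "type": "string"
    },
    "trigger_file_name": {
        "type": "string"
    }
}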

Source:


@pipeline().parameters.trigger_folder_path

@pipeline().parameters.trigger_file_name
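Those two expressions go in as the values for the source dataset's parameters on the copy activity's Source tab. The relevant bit of the copy activity's JSON looks roughly like this (SourceParquet, folder_path and file_name are just the names I'm using for the dataset and its parameters):

"inputs": [
    {
        "referenceName": "SourceParquet",
        "type": "DatasetReference",
        "parameters": {
            "folder_path": {
                "value": "@pipeline().parameters.trigger_folder_path",
                "type": "Expression"
            },
            "file_name": {
                "value": "@pipeline().parameters.trigger_file_name",
                "type": "Expression"
            }
        }
    }
]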


My Source Dataset:
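The source dataset is a Parquet dataset on the Blob Storage linked service. The important part is that the container is set once on the dataset, while the folder and file come in through dataset parameters. Roughly like this (the linked service name, dataset name and parameter names are mine - swap in your own):

{
    "name": "SourceParquet",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureBlobStorageLinkedService",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "folder_path": { "type": "string" },
            "file_name": { "type": "string" }
        },
        "type": "Parquet",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "mycontainer",
                "folderPath": {
                    "value": "@dataset().folder_path",
                    "type": "Expression"
                },
                "fileName": {
                    "value": "@dataset().file_name",
                    "type": "Expression"
                }
            }
        }
    }
}

That container property is exactly why the path gets doubled - the container is already baked into the dataset, so the trigger's folderPath must not include it again.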

Sink:
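The sink side is the boring part - an Azure SQL Database dataset pointing at the target table, used as the copy activity's output. Something like this (dataset, linked service, schema and table names are placeholders):

{
    "name": "SinkSqlTable",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureSqlDatabaseLinkedService",
            "type": "LinkedServiceReference"
        },
        "type": "AzureSqlTable",
        "typeProperties": {
            "schema": "dbo",
            "table": "MyTable"
        }
    }
}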

And the last part in the trigger that makes it all work:

Here's what to put in for trigger_folder_path, per Steve Johnson's answer on Stack Overflow. No, you don't have to edit the underlying JSON - you can modify the values above:
@substring(triggerBody().folderPath,add(indexof(triggerBody().folderPath,'/'),1),sub(length(triggerBody().folderPath),add(indexof(triggerBody().folderPath,'/'),1)))
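
And trigger_file_name just gets @triggerBody().fileName as-is. In the trigger's JSON, the pipeline parameter mapping ends up looking roughly like this (the pipeline name is whatever yours is called):

"pipelines": [
    {
        "pipelineReference": {
            "referenceName": "CopyParquetToSql",
            "type": "PipelineReference"
        },
        "parameters": {
            "trigger_folder_path": "@substring(triggerBody().folderPath,add(indexof(triggerBody().folderPath,'/'),1),sub(length(triggerBody().folderPath),add(indexof(triggerBody().folderPath,'/'),1)))",
            "trigger_file_name": "@triggerBody().fileName"
        }
    }
]

To see why the expression works: for a blob at mycontainer/myfolder/myfile.parquet, the trigger fires with folderPath = mycontainer/myfolder. indexof finds the first '/', and the substring keeps everything after it, so trigger_folder_path comes through as just myfolder. Since the container is already on the dataset, the copy activity now resolves mycontainer/myfolder/myfile.parquet instead of the doubled path.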


Why it's that hard to find/use, I have no idea. 
All Hail Steve Johnson!
