Microsoft Excel has become an essential source of data in most organisations.
Azure Data Factory now supports processing Excel files natively, simplifying ingestion by removing the need for intermediate CSV files.
Azure Data Factory (ADF) now has built-in functionality for ingesting data from .xls and .xlsx files. These files can be located in a range of stores, including Amazon S3, Amazon S3 Compatible Storage, Azure Blob Storage, Azure Data Lake Storage Gen1, Azure Data Lake Storage Gen2, Azure File Storage, File System, FTP/SFTP, Google Cloud Storage, HDFS, HTTP and Oracle Cloud Storage. Before ADF supported this functionality, data engineers needed workarounds, such as PowerShell scripts or Azure Functions, to convert the Excel file into CSV.
In this post, I will develop an ADF pipeline to load an Excel file from Azure Data Lake Storage Gen2 into an Azure SQL Database.
Step 1 – About the source file:
I have an Excel workbook titled ‘2018-2020.xlsx’ sitting in Azure Data Lake Storage Gen2 under the “excel dataset” folder. The workbook contains two sheets, “Data” and “Note”. The “Data” sheet holds exchange rates per date for different currencies, while the “Note” sheet has the full list of currencies with their codes and names.



Step 2 – Target database:
I provisioned an Azure SQL database called One51Training. This database will host the Exchange Rate data. At this stage it contains no tables; I will configure the ADF pipeline to create one table per sheet, with columns that reflect the header row of each sheet.

Step 3 – Building the data pipeline:
Now is the time to build and configure the ADF pipeline. I’ll be using the Copy Activity for the data transfer. To do this, the following ADF components are needed:
Linked Services: Contain the connection details and credentials for the source and destination stores.
Datasets: Represent a named view of the data that an activity reads from or writes to.
Pipeline: The logical workflow that groups the data transfer activities.
The following procedure outlines the required configuration:
- Create a new Linked Service for Azure Data Lake Storage Gen2
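For reference, the JSON behind such a linked service looks roughly like the sketch below. The linked service name, storage account URL and account key are placeholders for illustration only; in practice a managed identity or Azure Key Vault may be preferable to an inline account key.

{
    "name": "LS_ADLS_Gen2",
    "properties": {
        "type": "AzureBlobFS",
        "typeProperties": {
            "url": "https://<storage-account>.dfs.core.windows.net",
            "accountKey": {
                "type": "SecureString",
                "value": "<account-key>"
            }
        }
    }
}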

- Create a Linked Service for the Azure SQL Database
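Similarly, here is a minimal sketch of the Azure SQL Database linked service; the name and connection string values are placeholders (the password would normally be stored in Azure Key Vault rather than typed inline):

{
    "name": "LS_AzureSqlDb",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": "Data Source=<server>.database.windows.net;Initial Catalog=One51Training;User ID=<user>;Password=<password>;Encrypt=True;"
        }
    }
}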

- Navigate to the Dataset page and create a dataset for Azure Data Lake Storage Gen2 by selecting the Excel file.
In addition, I created a parameter to hold the sheet name. As mentioned earlier, the Excel file has two sheets: the first holds the rates and the second holds the currency names and codes.
One of the advantages of using parameters is reusability, and I will leverage that here, as ADF will iterate through all the sheets in the Excel file using the same dataset.
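As a rough sketch (the dataset, container and parameter names here are illustrative, not taken from my environment), the source dataset passes the “SheetName” parameter into the sheetName property as a dynamic expression:

{
    "name": "DS_Excel_Source",
    "properties": {
        "linkedServiceName": {
            "referenceName": "LS_ADLS_Gen2",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "SheetName": { "type": "string" }
        },
        "type": "Excel",
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",
                "fileName": "2018-2020.xlsx",
                "folderPath": "excel dataset",
                "fileSystem": "<container>"
            },
            "sheetName": {
                "value": "@dataset().SheetName",
                "type": "Expression"
            },
            "firstRowAsHeader": true
        }
    }
}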

- Create another dataset for the destination database, this time selecting Azure SQL Database as the dataset type. As with the source dataset, create a parameter to hold the table name. I will be creating a table per Excel sheet under the dbo schema.
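Again as an illustrative sketch, with “TableName” as an assumed parameter name, the sink dataset could look like this:

{
    "name": "DS_AzureSql_Sink",
    "properties": {
        "linkedServiceName": {
            "referenceName": "LS_AzureSqlDb",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "TableName": { "type": "string" }
        },
        "type": "AzureSqlTable",
        "typeProperties": {
            "schema": "dbo",
            "table": {
                "value": "@dataset().TableName",
                "type": "Expression"
            }
        }
    }
}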

- Create a new pipeline and add a “ForEach” activity. The “ForEach” activity will iterate through all sheets and copy their content into a table. To achieve this, I have created a parameter on the pipeline. The parameter type is Array, and its value is a JSON array containing the sheet names:
[
{"SheetName":"Data"},
{"SheetName":"Note"}
]

Use the pipeline parameter as the “Items” setting of the “ForEach” activity:
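In the pipeline JSON this corresponds roughly to the fragment below, assuming the pipeline parameter was named “SheetList” (the actual name is whatever you chose when creating it):

"parameters": {
    "SheetList": {
        "type": "array",
        "defaultValue": [
            { "SheetName": "Data" },
            { "SheetName": "Note" }
        ]
    }
}

and, on the “ForEach” activity itself:

"typeProperties": {
    "items": {
        "value": "@pipeline().parameters.SheetList",
        "type": "Expression"
    }
}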

Edit the “ForEach” loop and add a Copy activity, then configure both the Source and the Sink:

For the Sink configuration, I am using the “Auto create table” option and adding a DROP TABLE statement as a Pre-copy script:
Add dynamic content:
@{concat('DROP TABLE IF EXISTS ', item().SheetName)}
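For completeness, here is a sketch of how the Copy activity inside the loop ties these pieces together; the dataset and parameter names match the earlier sketches and are assumptions, and the sheet name is reused as the destination table name:

{
    "name": "CopySheetToTable",
    "type": "Copy",
    "inputs": [
        {
            "referenceName": "DS_Excel_Source",
            "type": "DatasetReference",
            "parameters": {
                "SheetName": {
                    "value": "@item().SheetName",
                    "type": "Expression"
                }
            }
        }
    ],
    "outputs": [
        {
            "referenceName": "DS_AzureSql_Sink",
            "type": "DatasetReference",
            "parameters": {
                "TableName": {
                    "value": "@item().SheetName",
                    "type": "Expression"
                }
            }
        }
    ],
    "typeProperties": {
        "source": {
            "type": "ExcelSource",
            "storeSettings": { "type": "AzureBlobFSReadSettings" }
        },
        "sink": {
            "type": "AzureSqlSink",
            "tableOption": "autoCreate",
            "preCopyScript": {
                "value": "@{concat('DROP TABLE IF EXISTS ', item().SheetName)}",
                "type": "Expression"
            }
        }
    }
}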

- Finally, execute the pipeline and check the result in the Azure SQL database:

One51 helps organisations with their data ingestion and data integration needs, using an approach and technology that make these solutions easy to implement and a more cost-effective alternative to traditional custom ETL development.
Contact us to learn more about our services.