Azure Data Factory (ADF) is one of the key components of the Microsoft data platform, either as a standalone service or as part of Azure Synapse Analytics. Alongside ADF, Microsoft has been developing Power BI Dataflows as another tool for data loading and transformation, with hundreds of out-of-the-box connectors.
It is unsurprising to see Data Factory in Microsoft Fabric as one of the core workloads, giving developers a modern experience to extract, transform, and load data at scale. The key difference is that Fabric ships with the next generation of Data Factory, which combines the best features of both ADF and Dataflows.
Fabric Data Factory has two major components:
- Data Pipelines provide workflows at scale and can handle petabyte-scale data. They support activities such as copy, loops, and lookups.
- Dataflows (Gen2) provide a low-code interface with connectors to hundreds of sources and more than 300 different transformations.
Fabric Data Factory is a fully managed cloud service and, like other cloud services, cannot directly reach on-prem databases or any database secured behind a firewall or virtual network. ADF addresses this with the Integration Runtime, or IR for short (managed, self-hosted and Azure), and Power BI Dataflows with the Power BI data gateway. Both are designed to establish secure connections between on-prem or private data sources and ADF or Dataflows.
Fast forward to Fabric, and there is, or will be, a similar concept for both Data Pipelines and Dataflows. That said, Microsoft Fabric is still in Preview: Data Pipelines do not yet support the IR and therefore cannot be used with on-prem data sources. This support is still under development and is expected to be released soon.
The good news is that Dataflows support the data gateway and can be used to extract data from on-prem and private data sources. There is no need to install a separate Power BI data gateway, as your existing gateways can serve Fabric dataflows. However, you might encounter issues with the dataflow refresh process. To fix that, make sure outbound traffic from the gateway server allows TCP on port 1433 to the *.datawarehouse.pbidedicated.windows.net endpoint.
This issue is documented on Microsoft Learn. For more details, see "On-premises data gateway considerations for data destinations in Dataflow Gen2" in the Microsoft Fabric documentation.
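To confirm that the firewall rule is in place, you can run a quick outbound connectivity test from the gateway server. The following is a minimal sketch in Python, not an official diagnostic tool: the function name `can_reach` is made up for illustration, and you must supply the concrete hostname your dataflow uses, because the wildcard `*.datawarehouse.pbidedicated.windows.net` cannot be resolved as written.

```python
import socket


def can_reach(host: str, port: int = 1433, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout.

    `can_reach` is a hypothetical helper for illustration. It only checks
    that the network path is open; it does not test TLS or authentication.
    """
    try:
        # create_connection resolves the name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Example call (placeholder hostname, replace with your actual endpoint):
# can_reach("<your-endpoint>.datawarehouse.pbidedicated.windows.net", 1433)
```

If the function returns `False` on the gateway server, outbound TCP on port 1433 is probably blocked, or the hostname could not be resolved. A `True` result only shows that the port is reachable, so a refresh can still fail for other reasons.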