Microsoft Fabric introduces Data Pipelines as a new addition to the Microsoft Data Factory family. This versatile tool combines a comprehensive range of activities designed to facilitate data movement, transformation, and orchestration, often requiring minimal or no coding. The fundamental building block, known as an “activity,” represents an executable task. Activities can be used to control the flow of a pipeline, through constructs such as “Until,” “Filter,” and conditional statements, or to move and transform data, through activities like “Copy,” “Stored Procedure,” “Script,” and “Get Metadata.”
By arranging activities in a sequence, users can define a logical flow of operations. The outcome of each activity, whether it succeeds, fails, or completes, can dictate the course of subsequent activities.
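To illustrate, the JSON below is a sketch of how such a dependency is expressed in the pipeline definition, using the activity schema that Fabric Data Pipelines share with Azure Data Factory. The activity names (“CopyRawFiles,” “LoadStagingTable”) are hypothetical placeholders:

```json
{
  "name": "LoadStagingTable",
  "type": "SqlServerStoredProcedure",
  "dependsOn": [
    {
      "activity": "CopyRawFiles",
      "dependencyConditions": [ "Succeeded" ]
    }
  ]
}
```

Replacing `"Succeeded"` with `"Failed"` or `"Completed"` lets the same activity run as an error handler or as an unconditional follow-up step instead.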
For scenarios where data retrieval from a source without in-line transformation is essential (commonly known as ELT or Extract, Load, Transform), the “Copy” activity is a reliable companion. Data pipelines offer support for an extensive array of source connections, including Azure SQL, Amazon Redshift, Dataverse, Snowflake, FTP, SFTP, and HTTP calls. While Microsoft continues to expand these connections, they may not be as numerous as those available in Azure Data Factory or Azure Synapse Analytics pipelines.
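As a rough sketch, a “Copy” activity pairs a source with a sink in its definition. The snippet below assumes an Azure SQL source landing data in a Lakehouse; the dataset and type names are illustrative rather than exact, and in practice the Fabric designer generates this JSON for you:

```json
{
  "name": "CopySalesOrders",
  "type": "Copy",
  "typeProperties": {
    "source": {
      "type": "AzureSqlSource",
      "sqlReaderQuery": "SELECT * FROM dbo.SalesOrders"
    },
    "sink": {
      "type": "LakehouseTableSink"
    }
  }
}
```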
However, Fabric Data Pipelines come with some limitations. Notably, they do not yet support Virtual Network (VNet) connectivity or the on-premises Data Gateway/Integration Runtime. Therefore, for datasets located on-premises or protected behind a firewall, Data Flow Gen2 is the recommended choice for now. Additionally, connectors cannot be parameterized.
At the time of writing, continuous integration/continuous deployment (CI/CD) for Fabric Data Pipelines is not yet available in the preview. More broadly, CI/CD in Fabric relies on a managed experience integrated with the workspace’s Git capabilities, in which Fabric handles branching and synchronization.
To execute transformations, data pipelines can invoke either Data Flow or Notebook activities. Data Flow offers a graphical, no-code experience, while Notebooks enable the execution of Apache Spark jobs and machine learning experiments.
In Azure Data Factory and Synapse Analytics Pipelines, the term “Trigger” is used to schedule execution. However, in Fabric Data Pipelines, it has been renamed to “Schedule,” although its function remains the same.
The Monitoring Hub serves as the central page for viewing and tracking all pipeline runs. It can be customized by adding or removing columns and applying filters, making it user-friendly. The hub provides valuable insights into each pipeline run, including throughput, duration, parallel copies, the number of files or records read or written, and various other metrics.
Despite its preview status, Microsoft Fabric boasts an array of compelling features. Data Pipelines, specifically designed for data movement and orchestration, are highly optimized to handle large volumes of data with minimal coding.
At One51, we are actively exploring the capabilities of Microsoft Fabric and conducting proof-of-concept projects to demonstrate its potential in real-world data projects. We welcome your feedback and encourage you to reach out if you require assistance with your data project journey.