16 October 2023
Published by One51

Microsoft Fabric Data Pipelines: What Do You Need to Know?

Microsoft Fabric introduces Data Pipelines as a new addition to the Microsoft Data Factory family. This versatile tool combines a comprehensive range of activities designed to facilitate data movement, transformation, and orchestration, often with minimal or no coding. The fundamental building block, known as an “activity,” represents an executable task. Activities can control the flow of a pipeline, using constructs such as “Until,” “Filter,” and conditional statements, or move and transform data through activities like “Copy,” “Stored Procedure,” “Script,” and “Get Metadata.”
 
By arranging activities in a sequence, users can define a logical flow of operations. The outcome of each activity, whether it succeeds, fails, or completes, can dictate the course of subsequent activities.
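To make the idea concrete, here is a minimal Python sketch of that outcome-driven flow. This is an illustrative model only, not Fabric's actual API: the activity names, the `run_pipeline` function, and the edge map are all invented for the example.

```python
# Hypothetical model of a pipeline: each activity returns an outcome,
# and the outcome ("succeeded" / "failed") selects the next activity.
def run_pipeline(activities, edges, start):
    """activities: name -> callable returning True (success) or False (failure).
    edges: (name, outcome) -> next activity name, or absent to stop."""
    executed = []
    current = start
    while current is not None:
        outcome = "succeeded" if activities[current]() else "failed"
        executed.append((current, outcome))
        current = edges.get((current, outcome))
    return executed

# Example: if "Copy" succeeds, run "StoredProcedure"; if it fails, run "Alert".
acts = {
    "Copy": lambda: True,
    "StoredProcedure": lambda: True,
    "Alert": lambda: True,
}
edges = {
    ("Copy", "succeeded"): "StoredProcedure",
    ("Copy", "failed"): "Alert",
}
print(run_pipeline(acts, edges, "Copy"))
# -> [('Copy', 'succeeded'), ('StoredProcedure', 'succeeded')]
```

The point of the sketch is the branching: the same pipeline takes a different path purely based on each activity's outcome, which is how success/failure dependencies work in the visual designer.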
 
For scenarios where data retrieval from a source without in-line transformation is essential (commonly known as ELT or Extract, Load, Transform), the “Copy” activity is a reliable companion. Data pipelines offer support for an extensive array of source connections, including Azure SQL, Amazon Redshift, Dataverse, Snowflake, FTP, SFTP, and HTTP calls. While Microsoft continues to expand these connections, they may not be as numerous as those available in Azure Data Factory or Azure Synapse Analytics pipelines.
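The ELT pattern behind the “Copy” activity can be sketched in plain Python (an illustrative model, not Fabric code): data is extracted and loaded verbatim, and any transformation happens afterwards at the destination.

```python
def extract(source):
    """Read raw rows from a source (here, just an in-memory list)."""
    return list(source)

def load(rows, destination):
    """Land rows in the destination unchanged -- no in-line transformation."""
    destination.extend(rows)

def transform(destination):
    """Transform after loading, as ELT prescribes (e.g., in the warehouse)."""
    return [{"name": r["name"].upper(), "amount": r["amount"] * 2}
            for r in destination]

source = [{"name": "widget", "amount": 3}]
warehouse = []
load(extract(source), warehouse)   # E and L: a faithful copy
assert warehouse == source         # nothing was changed in flight
print(transform(warehouse))        # T happens at the destination
# -> [{'name': 'WIDGET', 'amount': 6}]
```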
 
However, Fabric Data Pipelines come with some limitations. Notably, they do not support Virtual Networks (VNets) or the on-premises Data Gateway/Integration Runtime. Therefore, for datasets located on-premises or protected behind a firewall, Data Flow Gen2 is the recommended choice for now. Additionally, connectors cannot be parameterized.
 
At the time of writing, continuous integration/continuous deployment (CI/CD) for Fabric pipelines is not yet available in the preview. In general, CI/CD in Fabric relies on a managed experience integrated with the workspace’s Git capabilities, with Fabric handling the branching and synchronization processes.
 
To execute transformations, data pipelines can invoke either Data Flow or Notebook activities. Data Flow offers a graphical, no-code experience, while Notebooks enable the execution of Apache Spark jobs and machine learning experiments.
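Conceptually, the pipeline orchestrates while the notebook does the heavy lifting, much like a function call. The toy Python sketch below illustrates that division of labour; the names are hypothetical, and a real Fabric notebook would typically operate on Spark DataFrames rather than plain lists.

```python
def notebook_transform(rows):
    """Stand-in for a Notebook activity: in Fabric this would be a Spark
    job, e.g. grouping and aggregating a DataFrame."""
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0) + r["sales"]
    return totals

def pipeline(rows):
    """The pipeline holds no transformation logic of its own; it only
    orchestrates: stage the data, invoke the notebook, return the result."""
    staged = list(rows)                # the "Copy" step
    return notebook_transform(staged)  # the "Notebook" activity

print(pipeline([{"region": "EU", "sales": 10},
                {"region": "EU", "sales": 5},
                {"region": "US", "sales": 7}]))
# -> {'EU': 15, 'US': 7}
```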
 
In Azure Data Factory and Synapse Analytics Pipelines, the term “Trigger” is used to schedule execution. However, in Fabric Data Pipelines, it has been renamed to “Schedule,” although its function remains the same.
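Whatever the name, the mechanics are the same: a recurrence definition expands into a series of run times. A small Python sketch of that idea (the schedule shape here is invented for illustration and is not Fabric's actual schedule format):

```python
from datetime import datetime, timedelta

def next_runs(start, every_minutes, count):
    """Expand a simple recurrence into its next `count` run times."""
    return [start + timedelta(minutes=every_minutes * i)
            for i in range(1, count + 1)]

start = datetime(2023, 10, 16, 9, 0)
for t in next_runs(start, every_minutes=30, count=3):
    print(t.isoformat())
# 2023-10-16T09:30:00
# 2023-10-16T10:00:00
# 2023-10-16T10:30:00
```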
 
The Monitoring Hub serves as the central page for viewing and tracking all pipeline runs. It can be customized by adding or removing columns and applying filters, making it user-friendly. The hub provides valuable insights into each pipeline run, including throughput, duration, parallel copies, the number of files or records read or written, and various other metrics.
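Most of those figures are simple derivations from a run's raw counters; for instance, throughput is data volume over duration. A hedged sketch of how such metrics relate (the field names are invented for illustration, not the Monitoring Hub's actual schema):

```python
def run_metrics(bytes_read, rows_written, duration_seconds):
    """Derive the kind of figures the Monitoring Hub surfaces per run."""
    return {
        "throughput_mb_per_s": round(bytes_read / duration_seconds / 1_000_000, 2),
        "rows_per_s": round(rows_written / duration_seconds, 1),
        "duration_s": duration_seconds,
    }

print(run_metrics(bytes_read=250_000_000, rows_written=1_200_000,
                  duration_seconds=50))
# -> {'throughput_mb_per_s': 5.0, 'rows_per_s': 24000.0, 'duration_s': 50}
```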
 
Despite its preview status, Microsoft Fabric boasts an array of compelling features. Data Pipelines, designed specifically for data movement and orchestration, are highly optimized to handle substantial volumes of data with minimal coding.
 
At One51, we are actively exploring the capabilities of Microsoft Fabric and conducting proof-of-concept projects to demonstrate its potential in real-world data projects. We welcome your feedback and encourage you to reach out if you require assistance with your data project journey.

About One51

Drawing on a wealth of expertise and a deep understanding of the Energy, Supply Chain, FMCG and other industry sectors, One51 is dedicated to helping businesses navigate the complexities of their operations by harnessing the power of data-driven insights.
With a customer-centric approach, we collaborate closely with our clients to uncover hidden patterns, mitigate risks, and realise new avenues for innovation, all while bolstering their bottom line.
Our tailored solutions, aligned with best-of-breed cloud technologies and delivered using comprehensive analytics frameworks, enable companies to optimise their processes, identify growth opportunities, and make informed strategic decisions.
As a leader in helping companies harness the power of data, the One51 team will work with you to transform complex data into actionable intelligence, helping your business gain a competitive edge in a rapidly evolving landscape.
One51 offers a comprehensive range of services encompassing the entire data and analytics lifecycle, from strategy to successful implementation and ongoing support. Our expertise covers various areas, including Data Assessments, Data Strategy, Data Governance, Data Architecture, Data Management, Data Visualisation, Advanced Analytics and Managed Services.
For more information, please visit one51.consulting