Navigating the Azure Data Factory v2 Enhancements
By Insight Editor / 10 May 2018, updated on 16 May 2019 / Topics: Microsoft Azure
Azure Data Factory (ADF) was originally released as an Azure platform service in 2015, the same year it became generally available to end users. The service was designed to be the central resource for data orchestration activities in the cloud.
Whether the requirement is simply to copy data from source to destination or to kick off a Data Lake Analytics transformation job, ADF is the answer. ADF is a fully managed cloud service built for complex hybrid extract, transform, load (ETL); extract, load, transform (ELT); and data integration processing.
At the end of 2017, Microsoft released ADF version 2 (v2) in public preview, introducing various enhancements and incorporating customer feedback gathered since the initial release. While v2 is still in public preview, it’s open for all to test the functionality and the changes from ADF v1. Let’s walk through some of the key enhancements, including SQL Server Integration Services (SSIS) capabilities (finally!).
The overall concept of data sets, activities and pipelines remains intact within v2. However, the new version brings a few changes:
The following table introduces high-level differences between the two ADF services.
One of the larger changes is the shift from time slices and data set availability to a more traditional ETL scheduling approach. Instead of an activity waiting for a data set to become available while a pipeline is executing, the pipeline itself is triggered and kicks off the activity regardless of the state of the data set.
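The difference between the two scheduling models can be sketched with a small toy simulation. This is not ADF code and calls no ADF API; the function names and the sample schedule are assumptions made purely for illustration.

```python
from datetime import datetime, timedelta


def v1_style_runs(slices, ready):
    """Time-slice model: a slice is processed only if its input data set is marked ready."""
    return [s for s in slices if s in ready]


def v2_style_runs(start, interval, count):
    """Trigger model: the pipeline starts at every scheduled tick, whatever the data set state."""
    return [start + i * interval for i in range(count)]


# Hypothetical hourly schedule with three windows.
start = datetime(2018, 5, 10, 0, 0)
hour = timedelta(hours=1)
slices = [start + i * hour for i in range(3)]

# Assume the input for the second window never became available.
ready = {slices[0], slices[2]}

print("v1-style runs:", v1_style_runs(slices, ready))
print("v2-style runs:", v2_style_runs(start, hour, 3))
```

In the toy model, the v1-style run list skips the window whose input is missing, while the v2-style list contains every scheduled tick. Under the trigger model, any check on whether the data has arrived would have to be built into the pipeline itself.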
The Integration Runtime (IR) is the compute infrastructure used by ADF v2 for data movement, activity execution and SSIS package execution. The IR provides the bridge between the activity and the linked services it references. The linked service references the IR, and the IR in turn provides the compute environment where the activity runs, in the region nearest the target data store for the most efficient performance.
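As a rough illustration of how a linked service points at an IR, the sketch below builds a linked service definition as a Python dictionary and reads the IR reference back out. The names (OnPremSqlLinkedService, MySelfHostedIR) and the helper function are hypothetical, and the property layout (a connectVia block of type IntegrationRuntimeReference) is an assumption based on the ADF v2 JSON format that should be checked against the current documentation.

```python
import json

# Hypothetical linked service definition. All names are placeholders, and the
# property layout is an assumption to verify against the ADF v2 documentation.
linked_service = {
    "name": "OnPremSqlLinkedService",
    "properties": {
        "type": "SqlServer",
        "typeProperties": {"connectionString": "<connection string>"},
        # The linked service names the IR that supplies the compute environment.
        "connectVia": {
            "referenceName": "MySelfHostedIR",
            "type": "IntegrationRuntimeReference",
        },
    },
}


def integration_runtime_for(service):
    """Return the IR name a linked service points at, or None if it names none."""
    connect_via = service["properties"].get("connectVia")
    return connect_via["referenceName"] if connect_via else None


print(json.dumps(linked_service, indent=2))
print(integration_runtime_for(linked_service))
```

The activity never names an IR directly in this sketch. It names a linked service, and the IR follows from that reference, which matches the bridging role described above.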
The introduction of native SSIS capabilities in ADF v2 was a key addition for the cloud data orchestration service. It gives customers a stepping stone for moving off their on-premises servers and toward a cloud-first strategy, rather than completely re-architecting their existing data integration process from SSIS to ADF v1.
Along with the SSIS integration, many other features, such as control flow tasks and triggers, allow for greater flexibility in pipeline executions. As ADF v2 approaches general availability and moves out of public preview, users can submit feedback to Microsoft for further enhancements to the service.