9 June 2021
Published by One51

Using Azure Data Factory to Import RSS Feeds into your Data Warehouse

In our Tech Challenge #1, we walked through the steps required to import Excel data into an Azure SQL Database using Azure Data Factory. This Tech Challenge explores how Azure Data Factory can help extract data from an RSS feed.

RSS is a web feed that allows users and applications to access updates to websites in a standardised, computer-readable format. An RSS feed is XML-formatted plain text, which makes it compatible with many different programs. Azure Data Factory (ADF) supports the XML format for the following connectors: Amazon S3, Amazon S3 Compatible Storage, Azure Blob, Azure Data Lake Storage Gen1, Azure Data Lake Storage Gen2, Azure File Storage, File System, FTP, Google Cloud Storage, HDFS, HTTP, Oracle Cloud Storage, and SFTP.
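To see how machine-readable the format is, here is a minimal sketch that parses an RSS 1.0 (RDF) snippet with Python's standard library. The snippet and its values are invented for illustration; they are not taken from the RBA feed:

```python
import xml.etree.ElementTree as ET

# A minimal, illustrative RDF/RSS 1.0 snippet (not the actual RBA payload).
rss = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <item rdf:about="https://example.com/usd">
    <title>AUD/USD exchange rate</title>
    <description>1 AUD buys 0.7500 USD</description>
  </item>
</rdf:RDF>"""

# Map prefixes to the namespaces declared in the feed.
ns = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
}

root = ET.fromstring(rss)
# Each <item> is a child of the <rdf:RDF> root; pull out its title text.
titles = [item.findtext("rss:title", namespaces=ns)
          for item in root.findall("rss:item", ns)]
print(titles)  # → ['AUD/USD exchange rate']
```

Because the payload is just XML, any standard parser can read it; ADF's XML format support builds on the same idea.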

In this post, I will load exchange rates into an Azure SQL Database using ADF. The exchange rates are provided by the Reserve Bank of Australia (RBA) through an RSS feed, published on the RBA's Exchange Rates page.


Image 1

Image 2

  • For my target destination, I have provisioned an Azure SQL Database and created a table for the exchange rates.

    RATE_DESCRIPTION varchar(2000)

  • This table is my staging table, and I am going to use a truncate-and-load approach. Let’s navigate to the ADF portal and start developing the pipeline. First, I need an HTTP Linked Service to connect to the RBA website. Since the website and the RSS feed are public, I use the Anonymous authentication type and https://www.rba.gov.au/rss/rss-cb-exchange-rates.xml as the base URL.

Image 3
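Outside ADF, the same anonymous GET can be sketched with Python's standard library. This only constructs the request, to show that no credentials are attached; the URL is the base URL used above:

```python
from urllib.request import Request

BASE_URL = "https://www.rba.gov.au/rss/rss-cb-exchange-rates.xml"

# Anonymous authentication: the request carries no Authorization header,
# mirroring the "Anonymous" setting on the ADF HTTP Linked Service.
req = Request(BASE_URL, method="GET")
print(req.get_method(), req.has_header("Authorization"))  # → GET False
```

Anything that can issue a plain unauthenticated GET can read this feed, which is why the Linked Service needs nothing beyond the base URL.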

  • The source Linked Service is created and its connection tested successfully. Note that I also need an Azure SQL Database Linked Service for the target database.

Image 4

  • Next is creating the source and destination datasets. For the source, I need an HTTP dataset with the XML format. The destination dataset, on the other hand, will be an Azure SQL Database table. In this case, I have parameterised the table name so that I can set it later in the Copy Data activity.

Image 5

Image 6

  • Finally, I need to create a pipeline with the Copy Data activity and configure the Source, Sink, and Mapping.

Image 7

  • The source for the Copy Data activity is the HTTP dataset, with “GET” as the “Request method”. There is no need to configure the rest of the attributes.

Image 8

  • For the target, I used the parameterised Azure SQL dataset and set the “TableName” parameter value to “EXCHANGE_RATE”. Since I am taking a truncate-and-load approach, I provided the truncate table command as the “Pre-copy script”.

Image 9
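The truncate-and-load pattern can be sketched outside ADF as well. Below is a minimal Python example using SQLite as a stand-in for Azure SQL Database; SQLite has no TRUNCATE TABLE, so DELETE plays the part of the pre-copy script, and the column types are assumptions rather than the original DDL:

```python
import sqlite3

# SQLite stands in for Azure SQL Database; column types are assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE EXCHANGE_RATE (
        RATE_DESCRIPTION TEXT,
        RATE_DATE        TEXT,
        RATE_VALUE       REAL,
        CURRENCY_CODE    TEXT
    )"""
)
# A row left over from the previous pipeline run.
conn.execute(
    "INSERT INTO EXCHANGE_RATE VALUES ('stale row', '2021-06-01', 0.70, 'USD')"
)

# Pre-copy script: empty the staging table before every load.
conn.execute("DELETE FROM EXCHANGE_RATE")

# Copy activity: insert the freshly extracted rows.
fresh = [("AUD/USD exchange rate", "2021-06-09", 0.75, "USD")]
conn.executemany("INSERT INTO EXCHANGE_RATE VALUES (?, ?, ?, ?)", fresh)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM EXCHANGE_RATE").fetchone()[0]
print(count)  # → 1: only the fresh row survives the pre-copy step
```

The pre-copy script guarantees the staging table holds exactly one feed's worth of data after each run, which keeps downstream processing simple.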

  • The RSS feed provides the exchange rates as well as additional information about the feed, including the description and trend. I am only interested in the exchange rates, so I will extract and map only the Rate Description, Rate Date, Rate Value, and Currency Code. Reviewing the XML data shows that the exchange rates are exposed under the ['rdf:RDF']['item'] tag. This tag is the starting point, or in other words, the “Collection Reference”. Table 1 lists the rest of the tags I need.

    Collection Reference: $['rdf:RDF']['item']

Table 1

Image 10
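The collection-reference idea can be sketched in Python: iterating over every item under the rdf:RDF root is exactly what $['rdf:RDF']['item'] expresses, and each mapped column is read from a child tag of the item. The feed fragment and the child element names below are illustrative assumptions, not the RBA feed's exact schema:

```python
import xml.etree.ElementTree as ET

# Illustrative RDF/RSS fragment; the child tags of <item> are invented
# stand-ins for the real feed's elements.
feed = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <item rdf:about="https://example.com/usd">
    <description>1 AUD buys 0.7500 USD</description>
    <date>2021-06-09</date>
    <value>0.7500</value>
    <currency>USD</currency>
  </item>
  <item rdf:about="https://example.com/eur">
    <description>1 AUD buys 0.6300 EUR</description>
    <date>2021-06-09</date>
    <value>0.6300</value>
    <currency>EUR</currency>
  </item>
</rdf:RDF>"""

ns = {"rss": "http://purl.org/rss/1.0/"}
root = ET.fromstring(feed)

# $['rdf:RDF']['item'] as the collection reference: one output row per <item>,
# with each target column mapped from a child tag.
rows = [
    {
        "RATE_DESCRIPTION": item.findtext("rss:description", namespaces=ns),
        "RATE_DATE": item.findtext("rss:date", namespaces=ns),
        "RATE_VALUE": float(item.findtext("rss:value", namespaces=ns)),
        "CURRENCY_CODE": item.findtext("rss:currency", namespaces=ns),
    }
    for item in root.findall("rss:item", ns)
]
print(len(rows), rows[0]["CURRENCY_CODE"])  # → 2 USD
```

ADF's mapping tab does the same thing declaratively: the collection reference picks the repeating element, and each column mapping picks a tag inside it.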

  • Time to execute the pipeline and check my target table:

Image 11

Image 12

As you can see, the pipeline connected to the RBA website, retrieved the XML data, mapped the required records and columns, and loaded the exchange rates into my staging table.

One51 helps organisations with their data ingestion and data integration needs, using an approach and technology that make this type of solution easy to implement and a more cost-effective alternative to traditional custom ETL development.

Contact us to learn more about our services.