Saturday, January 10, 2015

Session 9 : Merge Join in SSIS
AnandSaturday, January 10, 2015 0 comments

Session 9 : Merge Join in SSIS


In this tutorial we will create a package in which we will join three datasets to form one dataset. We will use a Data Flow Task, three OLE DB Sources, four Sort transformations, and two Merge Join transformations. The Merge Join transformations will be used to join the datasets together. Only two datasets can be joined at a time with the Merge Join transformation, that is why we will use two Merge Joins. We will use the Sort transformations to presort our datasets before joining with the Merge Join, because the Merge Join requires that both datasets be sorted in the same order, using the columns that will be joined.

Using Merge Join:

We will start out with a connetion manager that is created for the Adventureworks database
Empty Control Flow


We will create a Data Flow Task by dragging it out of the toolbox.
Data Flow Task


Now we will create an OLE DB Source called "Products", in which we will select all of the products (see query below).
Products Data Flow Task


Double click on the Products OLE DB Source and the OLE DB Source Editor will appear. We select SQL Command from the Data Access Mode and will specify the query. This query will select the ProductID, Name, ProductNumber, and ProductSubcategoryID from the Product table
OLE DB Source Editor


Create an OLE DB Source called "Product Subcategory", in which we will select all of the Product Categories .
Product Subcategory Data Flow Task


Double click on the Product Subcategory OLE DB Source and the OLE DB Source Editor will appear. We select SQL Command from the Data Access Mode and will specify the query. This query will select the ProductSubcategoryID, and Name from the ProductCategory table.
OLE DB Source Editor


Create an OLE DB Source called "Purchase Order Detail", in which we will select all of the Details of the Purchase Orders
Purchase Order Detail


Double click on the Purchase Order Detail OLE DB Source and the OLE DB Source Editor will appear. We select SQL Command from the Data Access Mode and will specify the query. This query will select the PurchaseOrderID, ProductID, and UnitPrice from the PurchaseOrderDetail table.
OLE DB Source Editor


Now we will create two Sort components and join the pipeline from Products to one of the sort transformations and join the pipeline from Product Subcategory to the other sort transformation. Remember that both datasets that you are joining must be sorted the same before joining with a Merge Join.
Sort Transformations


Click on the Sort that we connected to the Products source and the Sort Transformation Editor will appear. We want to sort this dataset by ProductSubcategoryID, so click on the checkbox to the left of ProductSubcategoryID. This will put ProductSubcategoryID in the lower box. Now we can click OK when we are complete.
Sort Transformation Editor


Click on the Sort that we connected to the Product Subcategory source and the Sort Transformation Editor will appear. We want to sort this dataset by ProductSubcategoryID, so click on the checkbox to the left of ProductSubcategoryID. This will put ProductSubcategoryID in the lower box. Now we can click OK when we are complete.
Sort Transformation Editor


Now we will add a Merge Join transformation. This will allow us to join the Products and Product Subcategory sources together. Drag the pipeline arrow from the Products Sort to the Merge Join.
Merge Join


The Input Output Selection window will appear. Select Merge Join Left Input. This will mean that we are using the Products on the Left hand side of the join. If you are familiar with Left and Right joins in SQL this is a familiar concept. Choosing Left Input doesn't mean we are necessarily doing an outer or inner join (we define that later), it just defines what side of the join this input will be used as.
Input Output Selection window


Now drag the pipeline arrow from the Product Subcategory Sort to the Merge Join. This time it will most likely not prompt you for which side of the join you want to add this input, as we already selected Left Input for the previous input.
Merge Join


Now we will double click on the Merge Join and the Merge Join Transformation Editor will appear. You will notice that ProductSubcategoryID from both datasets has a line connected to each other. This means that this is the column that we are joining on. You will also notice that the Join Type can be "Inner Join", "Left Outer Join" or "Full Outer Join". In order to retain the other columns in the pipeline click on the checkboxes to the left of the column names. If any of the boxes to the left of the columns are not checked, it means that column will no longer be in the pipeline after the Merge Join. Notice that below there is a column called "Output Alias". This column allows us to rename columns to new names. This allows us to rename "Name" from Products to "ProductName" and renam "Name" from ProductSubcategory to "CategoryName". So after the Merge Join, this columns will now be known be these alias names. When completed click OK.
Merge Join Transformation Editor


Now we will create two Sort components and join the pipeline from the Merge Join to one of the sorts and join the pipeline from the Purchase Order Detail source to the other sort.
Sort Transformation


Click on the Sort that we connected to the Merge Join output and the Sort Transformation Editor will appear. We want to sort this dataset by ProductID, so click on the checkbox to the left of ProductID. This will put ProductID in the lower box. Now we can click OK when we are complete.
Sort Transformation Editor


Click on the Sort that we connected to the Purchase Order Detail source and the Sort Transformation Editor will appear. We want to sort this dataset by ProductID, so click on the checkbox to the left of ProductID. This will put ProductID in the lower box. Now we can click OK when we are complete.
Sort Transformation Editor


Now we will add a Merge Join transformation. This will allow us to join the Results of the first Merge Join and the Purchase Order Detail source together. Drag the pipeline arrow from the sort transformation of the first Merge Join to the Merge Join. The Input Output Selection window will appear. Select Merge Join Left Input. Then drag the pipeline arrow from the Purchase Order Detail sort to the Merge Join.
Merge Join


Now we will double click on the Merge Join and the Merge Join Transformation Editor will appear. You will notice that ProductID from both datasets has a line connected to each other. This means that this is the column that we are joining on. You will also notice that the Join Type can be "Inner Join", "Left Outer Join" or "Full Outer Join". In order to retain the other columns in the pipeline click on the checkboxes to the left of the column names. If any of the boxes to the left of the columns are not checked, it means that column will no longer be in the pipeline after the Merge Join. Click OK hen completed.
Merge Join Transformation Editor


Now we can output our joined dataset to any source we desire. I Hope you are now able to successfully join datasets using SSIS.
Destination Success

In Category :
About The Author Anand Anand is a Microsoft Certified MCITP (Business Intelligence Infrastructure Using Microsoft SQL Server 2008), MCTS (SQL Server 2008, Business Intelligence Development and Maintenance) with 8 + years of experience in the Finance , Education, Healthcare, Banking and Insurance, Telecom domain focused on delivering software design, development, and data migrations from diversified data sources using Business Intelligence analysis tools..

0 comments

Post a Comment