Session 9 : Merge Join in SSIS
In this tutorial we will create a package in which we will join three datasets to form one dataset. We will use a Data Flow Task, three OLE DB Sources, four Sort transformations, and two Merge Join transformations. The Merge Join transformations will be used to join the datasets together. Only two datasets can be joined at a time with the Merge Join transformation, that is why we will use two Merge Joins. We will use the Sort transformations to presort our datasets before joining with the Merge Join, because the Merge Join requires that both datasets be sorted in the same order, using the columns that will be joined.
We will create a Data Flow Task by dragging it out of the toolbox.
Now we will create an OLE DB Source called "Products", in which we will select all of the products (see query below).
Double click on the Products OLE DB Source and the OLE DB Source Editor will appear. We select SQL Command from the Data Access Mode and will specify the query. This query will select the ProductID, Name, ProductNumber, and ProductSubcategoryID from the Product table
Create an OLE DB Source called "Product Subcategory", in which we will select all of the Product Categories .
Double click on the Product Subcategory OLE DB Source and the OLE DB Source Editor will appear. We select SQL Command from the Data Access Mode and will specify the query. This query will select the ProductSubcategoryID, and Name from the ProductCategory table.
Create an OLE DB Source called "Purchase Order Detail", in which we will select all of the Details of the Purchase Orders
Double click on the Purchase Order Detail OLE DB Source and the OLE DB Source Editor will appear. We select SQL Command from the Data Access Mode and will specify the query. This query will select the PurchaseOrderID, ProductID, and UnitPrice from the PurchaseOrderDetail table.
Now we will create two Sort components and join the pipeline from Products to one of the sort transformations and join the pipeline from Product Subcategory to the other sort transformation. Remember that both datasets that you are joining must be sorted the same before joining with a Merge Join.
Click on the Sort that we connected to the Products source and the Sort Transformation Editor will appear. We want to sort this dataset by ProductSubcategoryID, so click on the checkbox to the left of ProductSubcategoryID. This will put ProductSubcategoryID in the lower box. Now we can click OK when we are complete.
Click on the Sort that we connected to the Product Subcategory source and the Sort Transformation Editor will appear. We want to sort this dataset by ProductSubcategoryID, so click on the checkbox to the left of ProductSubcategoryID. This will put ProductSubcategoryID in the lower box. Now we can click OK when we are complete.
Now we will add a Merge Join transformation. This will allow us to join the Products and Product Subcategory sources together. Drag the pipeline arrow from the Products Sort to the Merge Join.
The Input Output Selection window will appear. Select Merge Join Left Input. This will mean that we are using the Products on the Left hand side of the join. If you are familiar with Left and Right joins in SQL this is a familiar concept. Choosing Left Input doesn't mean we are necessarily doing an outer or inner join (we define that later), it just defines what side of the join this input will be used as.
Now drag the pipeline arrow from the Product Subcategory Sort to the Merge Join. This time it will most likely not prompt you for which side of the join you want to add this input, as we already selected Left Input for the previous input.
Now we will double click on the Merge Join and the Merge Join Transformation Editor will appear. You will notice that ProductSubcategoryID from both datasets has a line connected to each other. This means that this is the column that we are joining on. You will also notice that the Join Type can be "Inner Join", "Left Outer Join" or "Full Outer Join". In order to retain the other columns in the pipeline click on the checkboxes to the left of the column names. If any of the boxes to the left of the columns are not checked, it means that column will no longer be in the pipeline after the Merge Join. Notice that below there is a column called "Output Alias". This column allows us to rename columns to new names. This allows us to rename "Name" from Products to "ProductName" and renam "Name" from ProductSubcategory to "CategoryName". So after the Merge Join, this columns will now be known be these alias names. When completed click OK.
Now we will create two Sort components and join the pipeline from the Merge Join to one of the sorts and join the pipeline from the Purchase Order Detail source to the other sort.
Click on the Sort that we connected to the Merge Join output and the Sort Transformation Editor will appear. We want to sort this dataset by ProductID, so click on the checkbox to the left of ProductID. This will put ProductID in the lower box. Now we can click OK when we are complete.
Click on the Sort that we connected to the Purchase Order Detail source and the Sort Transformation Editor will appear. We want to sort this dataset by ProductID, so click on the checkbox to the left of ProductID. This will put ProductID in the lower box. Now we can click OK when we are complete.
Now we will add a Merge Join transformation. This will allow us to join the Results of the first Merge Join and the Purchase Order Detail source together. Drag the pipeline arrow from the sort transformation of the first Merge Join to the Merge Join. The Input Output Selection window will appear. Select Merge Join Left Input. Then drag the pipeline arrow from the Purchase Order Detail sort to the Merge Join.
Now we will double click on the Merge Join and the Merge Join Transformation Editor will appear. You will notice that ProductID from both datasets has a line connected to each other. This means that this is the column that we are joining on. You will also notice that the Join Type can be "Inner Join", "Left Outer Join" or "Full Outer Join". In order to retain the other columns in the pipeline click on the checkboxes to the left of the column names. If any of the boxes to the left of the columns are not checked, it means that column will no longer be in the pipeline after the Merge Join. Click OK hen completed.
Now we can output our joined dataset to any source we desire. I Hope you are now able to successfully join datasets using SSIS.
In this tutorial we will create a package in which we will join three datasets to form one dataset. We will use a Data Flow Task, three OLE DB Sources, four Sort transformations, and two Merge Join transformations. The Merge Join transformations will be used to join the datasets together. Only two datasets can be joined at a time with the Merge Join transformation, that is why we will use two Merge Joins. We will use the Sort transformations to presort our datasets before joining with the Merge Join, because the Merge Join requires that both datasets be sorted in the same order, using the columns that will be joined.
Using Merge Join:
We will start out with a connetion manager that is created for the Adventureworks databaseWe will create a Data Flow Task by dragging it out of the toolbox.
Now we will create an OLE DB Source called "Products", in which we will select all of the products (see query below).
Double click on the Products OLE DB Source and the OLE DB Source Editor will appear. We select SQL Command from the Data Access Mode and will specify the query. This query will select the ProductID, Name, ProductNumber, and ProductSubcategoryID from the Product table
Create an OLE DB Source called "Product Subcategory", in which we will select all of the Product Categories .
Double click on the Product Subcategory OLE DB Source and the OLE DB Source Editor will appear. We select SQL Command from the Data Access Mode and will specify the query. This query will select the ProductSubcategoryID, and Name from the ProductCategory table.
Create an OLE DB Source called "Purchase Order Detail", in which we will select all of the Details of the Purchase Orders
Double click on the Purchase Order Detail OLE DB Source and the OLE DB Source Editor will appear. We select SQL Command from the Data Access Mode and will specify the query. This query will select the PurchaseOrderID, ProductID, and UnitPrice from the PurchaseOrderDetail table.
Now we will create two Sort components and join the pipeline from Products to one of the sort transformations and join the pipeline from Product Subcategory to the other sort transformation. Remember that both datasets that you are joining must be sorted the same before joining with a Merge Join.
Click on the Sort that we connected to the Products source and the Sort Transformation Editor will appear. We want to sort this dataset by ProductSubcategoryID, so click on the checkbox to the left of ProductSubcategoryID. This will put ProductSubcategoryID in the lower box. Now we can click OK when we are complete.
Click on the Sort that we connected to the Product Subcategory source and the Sort Transformation Editor will appear. We want to sort this dataset by ProductSubcategoryID, so click on the checkbox to the left of ProductSubcategoryID. This will put ProductSubcategoryID in the lower box. Now we can click OK when we are complete.
Now we will add a Merge Join transformation. This will allow us to join the Products and Product Subcategory sources together. Drag the pipeline arrow from the Products Sort to the Merge Join.
The Input Output Selection window will appear. Select Merge Join Left Input. This will mean that we are using the Products on the Left hand side of the join. If you are familiar with Left and Right joins in SQL this is a familiar concept. Choosing Left Input doesn't mean we are necessarily doing an outer or inner join (we define that later), it just defines what side of the join this input will be used as.
Now drag the pipeline arrow from the Product Subcategory Sort to the Merge Join. This time it will most likely not prompt you for which side of the join you want to add this input, as we already selected Left Input for the previous input.
Now we will double click on the Merge Join and the Merge Join Transformation Editor will appear. You will notice that ProductSubcategoryID from both datasets has a line connected to each other. This means that this is the column that we are joining on. You will also notice that the Join Type can be "Inner Join", "Left Outer Join" or "Full Outer Join". In order to retain the other columns in the pipeline click on the checkboxes to the left of the column names. If any of the boxes to the left of the columns are not checked, it means that column will no longer be in the pipeline after the Merge Join. Notice that below there is a column called "Output Alias". This column allows us to rename columns to new names. This allows us to rename "Name" from Products to "ProductName" and renam "Name" from ProductSubcategory to "CategoryName". So after the Merge Join, this columns will now be known be these alias names. When completed click OK.
Now we will create two Sort components and join the pipeline from the Merge Join to one of the sorts and join the pipeline from the Purchase Order Detail source to the other sort.
Click on the Sort that we connected to the Merge Join output and the Sort Transformation Editor will appear. We want to sort this dataset by ProductID, so click on the checkbox to the left of ProductID. This will put ProductID in the lower box. Now we can click OK when we are complete.
Click on the Sort that we connected to the Purchase Order Detail source and the Sort Transformation Editor will appear. We want to sort this dataset by ProductID, so click on the checkbox to the left of ProductID. This will put ProductID in the lower box. Now we can click OK when we are complete.
Now we will add a Merge Join transformation. This will allow us to join the Results of the first Merge Join and the Purchase Order Detail source together. Drag the pipeline arrow from the sort transformation of the first Merge Join to the Merge Join. The Input Output Selection window will appear. Select Merge Join Left Input. Then drag the pipeline arrow from the Purchase Order Detail sort to the Merge Join.
Now we will double click on the Merge Join and the Merge Join Transformation Editor will appear. You will notice that ProductID from both datasets has a line connected to each other. This means that this is the column that we are joining on. You will also notice that the Join Type can be "Inner Join", "Left Outer Join" or "Full Outer Join". In order to retain the other columns in the pipeline click on the checkboxes to the left of the column names. If any of the boxes to the left of the columns are not checked, it means that column will no longer be in the pipeline after the Merge Join. Click OK hen completed.
Now we can output our joined dataset to any source we desire. I Hope you are now able to successfully join datasets using SSIS.
0 comments
Post a Comment