Combine multiple tables with UNION / UNION ALL in SQL Server, APPEND in Power Query or UNION in DAX

This post is also available in: Deutsch

If you want to combine several tables, you have to ask yourself whether you should do this already in the data source such as SQL Server, in the processing of the data (ETL) in Power Query or in Power BI and, above all, where exactly there is a difference.

First of all, the post is relatively detailed, for all of those who consider it too long (tl;dr) here is the comparison table:

Duplicate entries remainDuplicate entries are removedUnequal number of columnsDeviating column namesDifferent order of the columnsCombine more than 2 tables
UNION in SQL ServerWith UNION ALLWith UNIONxx
APPEND in Power QueryxPossible with “Remove duplicates”xxx
UNION in DAXxPossible with “DISTINCT”xx
Overview of the different possible combinations

Data transformation in the data source or in the frontend?

A question that comes up right at the beginning is, where should the joining of the tables take place?
In fact, this cannot be stated in a generalized way, since it depends on a number of factors. But in most cases it is better to do this at the beginning of the data transformation. Ideally, the union would take place in SQL Server.

However, it may not be possible to do this in SQL Server, either because the data is not yet available there, since it comes from different sources, such as an SQL table and a CSV file. In this case, the next level, here Power Query within Power BI or the new dataflows, would be a good choice.

Then there is the last possibility to combine the data directly in Power BI with DAX. Again, there are different scenarios why this could happen in the Power BI data model. For example, if a table is calculated in the DAX and is not yet available in Power Query. Usually you would do the append less in the frontend and more in the data source or power query.

Example data for the 3 scenarios

As an example we look at the target values for different locations. Let’s say that the target values come from different systems and so from different tables or files, we have to merge them again for the analysis.

Our first table looks like this:

EmpIDNameLocationIDLocationYearTarget
1089Frodo Baggins12The Shire2020120000
1232Samwise Gamgee12The Shire2020140000
1219Peregrin Took12The Shire2020100000
1024Meriadoc Brandybock12The Shire202095000
Employees of the location “The Shire” with their target values

We want to combine this with the table of our second location to get a complete table of all locations. Note that we have one column more in the second table, because here several regions belong to the location or to a common cluster:

EmpIDNameLocationIDLocationClusterYearTarget
9042Aragorn8Minas TirithCentral Middle Earth2020260000
9058Arwen8Minas TirithCentral Middle Earth2020190000
9012Faramir8Minas TirithCentral Middle Earth2020210000
9276Damrod8Minas TirithCentral Middle Earth2020180000
5412Theoden16RohanCentral Middle Earth2020220000
5419Eomer16RohanCentral Middle Earth2020180000
5487Eowyn16RohanCentral Middle Earth2020210000
5876Eothain16RohanCentral Middle Earth2020185000
5912Haleth16RohanCentral Middle Earth2020150000
5001Gamling16RohanCentral Middle Earth2020165000
1089Frodo Baggins12The Shire2020120000
Employees of the cluster “Central Middle Earth” with their target values

Important for this scenario is the last record from the cluster “Central Middle Earth”. The employee “1089 Frodo Baggins” actually is from the region of our first table, but during the year he was working in the region of “Rohan”/”Minas Tirith”, so the manager of that region entered a target value there. But since he is back now in “The Shire”, the manager there has also added him to his system. Accordingly we have here a double record.

For our test series we need a third table, here we will use the target values of the location “Moria”:

EmpIDNameLocationIDLocationYearTarget
7431Balin14Moria2020245000
7129Dwalin14Moria2020178000
7194Gimli 14Moria2020210000
7225Ori14Moria2020190000

We are going to test each version on 6 criteria:

  1. Syntax to combine the tables
  2. Handling of duplicate entries
  3. Behavior in case of unequal amount of columns
  4. Behavior in case of different column names
  5. Behavior in case of unequal order of the columns
  6. Behavior in case of more than 2 tables

1. UNION / UNION ALL in SQL Server

Let’s start with the merge in SQL Server.

A. Syntax to combine the tables in SQL Server

The syntax is quite simple, we combine the select queries of the individual tables with a UNION or UNION ALL

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetShire]

UNION

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetCentralMiddleEarth]

As a result we receive the combined tables:

B. Handling of duplicate entries in SQL Server

If you have paid attention, the double entry of the employee “Frodo Baggins” is missing in the above example. This is because the “UNION” command removes duplicate values.

If we want to have the duplicate entries as well, we have to use “UNION ALL”. The syntax is similar to the example above:

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetShire]

UNION ALL

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetCentralMiddleEarth]

This way we also receive the duplicate record:

C. Behavior in case of unequal amount of columns in SQL Server

In SQL Server we reach a limit here, because the number of columns must always be identical. Otherwise we run into an error if we try to integrate the cluster from the second table, for instance:

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetShire]

UNION ALL

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Cluster]
  ,[Year]
  ,[Target]
FROM [TargetCentralMiddleEarth]

Here we encounter the following error:

Msg 205, Level 16, State 1, Line 2
All queries combined using a UNION, INTERSECT or EXCEPT operator must have an equal number of expressions in their target lists.

An additional column would have to be caught in the remaining queries (NULL AS [Cluster]).

D. Behavior in case of different column names in SQL Server

Different column names are not a problem. The column names of the first query are always used. The following queries must simply have the correct number of columns:

SELECT [EmpID] AS [EmployeeID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetShire]

UNION ALL

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Cluster]
  ,[Year]
  ,[Target]
FROM [TargetCentralMiddleEarth]

Note that the column name of the first column adapts to the name of the first select:

E. Behavior in case of unequal order of the columns in SQL Server

As just mentioned, the SQL Server follows the order of the columns of the first select. The column names of the following queries are ignored. If the order of the columns in the following query changes accordingly, the values are taken in the order they appear in the select, as long as the file types match:

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetShire]

UNION ALL

SELECT [LocationID]
  ,[Name]
  ,[EmpID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetCentralMiddleEarth]

The values of the LocationID and the EmpID appear mixed in the columns:

If we put a column with text and a column with numerical values on the corresponding positions, we would run into a conversion error:

Msg 245, Level 16, State 1, Line 2
Conversion failed when converting the varchar value 'Frodo Baggins' to data type int.

F. Behavior in case of more than 2 tables in SQL Server

An unlimited number of tables can be combined together. For this purpose, additional UNION or UNION ALL statements simply have to be lined up one after the other:

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetShire]

UNION ALL

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetCentralMiddleEarth]

UNION ALL

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetMoria]

Each additional table is simply added below the existing ones:

2. APPEND in Power Query

In Power Query the behavior is slightly different than in SQL Server. Here the columns are combined only by the names and not by the order, which results in other possibilities.

A. Syntax to combine the tables in Power Query

Due to the visual way Power Query works, it would be a little unusual to use the programmatic syntax in M (= the query language of Power Query). I will therefore focus on creating the query with the graphical interface.

To create the query, the first selected table is marked and combined using the append function:

Then you only need to select the second table:

After that the combination is already done, in Power Query further transformations could be done:

B. Handling of duplicate entries in Power Query

Duplicates are not automatically removed by an append. Those still exist:

However, these can be filtered out in a next step using the Remove Duplicates function:

Afterwards the duplicate value is removed:

C. Behavior in case of unequal amount of columns in Power Query

As already mentioned, the append in Power Query uses the column names. Columns with the same name are combined, columns missing from a table are simply left empty:

D. Behavior in case of different column names in Power Query

The same is with column names. If the name changes, the content will not be combined.

As in the previous example, in the first table the column “EmpID” was renamed to “EmployeeID”, in the second table the name is kept:

E. Behavior in case of unequal order of the columns in Power Query

The order does not matter at all in Power Query, because the append takes place exclusively using the column name.

The append of the following two tables…

… handles Power Query without problems:

F. Behavior in case of more than 2 tables in Power Query

It is also possible to combine several tables without any difficulty. With the possibility of combining more than 2 tables, all listed tables can be combined:

3. UNION in DAX

Although it is used less frequently in practice, DAX not only allows the creation of Calculated Columns and Calculated Measure, but also Caculated Tables.

These are created in the Modeling Tab by clicking “New Table”:

A big difference to the other two procedures is certainly that the UNION command in DAX combines existing tables from the data model. This means that although the base tables can be hidden from the end user, they are still loaded into the data model. Together with the preparation of the data during the initialization of the model, the storage space of the used tables is multiplied in any case, since these tables are found again in the UNION table.

A. Syntax to combine the tables in DAX

The syntax is kept relatively simple here as well. With the UNION function the mentioned tables are simply put together:

A Union = UNION(‘A TargetShire’, ‘A TargetCentralMiddleEarth’)

As a result we get the combined table directly in Power BI:

B. Handling of duplicate entries in DAX

As just seen in the first scenario, duplicate entries are not being filtered out:

The UNION in DAX thus corresponds to a UNION ALL in SQL or an APPEND in Power Query. If you want to eliminate the additional entry, this can be achieved with the DISTINCT function:

A Union Distinct = DISTINCT( UNION( ‘A TargetShire’, ‘A TargetCentralMiddleEarth’ ) )

C. Behavior in case of unequal amount of columns in DAX

Like SQL, DAX also requires the identical number of columns. In scenario C, we unfortunately cannot connect the two tables with an unequal number of columns:

Each table argument of 'UNION' must have the same number of columns.

Here the table with fewer columns would have to be manipulated with a table function in DAX as a workaround.

D. Behavior in case of different column names in DAX

Also the naming of the columns behaves like in SQL Server. The names of the first table are simply used:

The column caption is changed accordingly if the order of the tables is changed at the UNION:

E. Behavior in case of unequal order of the columns in DAX

As in SQL Server, only the order of the columns matters in DAX. So as soon as the columns are not in the same order, the contents will be mixed:

F. Behavior in case of more than 2 tables in DAX

The syntax of UNION requires at least 2 tables to combine, but optionally any number of tables can be combined. These can easily be added as another argument to the function. Important here is that all tables must have an identical structure in order to achieve the desired result:

F Union = UNION( ‘F TargetShire’, ‘F TargetCentralMiddleEarthDAX’, ‘F TargetMoria’ )

Conclusion

Let’s move on to the summary. Combining several tables to one large table is possible in all 3 ways. As we have seen, the behavior of UNION in SQL Server and UNION in DAX within Power BI is very similar. Here simply tables with the same number of columns are placed directly under each other. The APPEND function in Power Query takes a different approach and combines the tables by column names.

I would like to summarize the behavior again in tabular form:

Duplicate entries remainDuplicate entries are removedUnequal number of columnsDeviating column namesDifferent order of the columnsCombine more than 2 tables
UNION in SQL ServerWith UNION ALLWith UNIONxx
APPEND in Power QueryxPossible with “Remove duplicates”xxx
UNION in DAXxPossible with “DISTINCT”xx
Overview of the different possible combinations

At the end the question remains which command you should use. As already written in the introduction this depends strongly on the context. All three approaches have a different behavior. Depending on the initial situation, each developer must decide for himself what makes most sense.

Denis Selimovic

Denis Selimovic

As a user from the very beginning, Denis Selimovic is passionate about Power BI and everything related to it. In his blog WhatTheFact.bi, he writes about the latest developments in Power BI and provides tips and tricks on the subject. Besides being a Power BI enthusiast, blog author, speaker and senior consultant, he is also co-organizer of the Meetup Power BI User Group Switzerland.

Kommentare:

0 0 votes
Article Rating
Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x
%d bloggers like this: