Combine multiple tables with UNION / UNION ALL in SQL Server, APPEND in Power Query or UNION in DAX
This post is also available in: German
If you want to combine several tables, you have to ask yourself whether you should do this already in the data source such as SQL Server, in the processing of the data (ETL) in Power Query or in Power BI and, above all, where exactly there is a difference.
First of all, the post is relatively detailed, for all of those who consider it too long (tl;dr) here is the comparison table:
Duplicate entries remain | Duplicate entries are removed | Unequal number of columns | Deviating column names | Different order of the columns | Combine more than 2 tables | |
---|---|---|---|---|---|---|
UNION in SQL Server | With UNION ALL | With UNION | – | x | – | x |
APPEND in Power Query | x | Possible with “Remove duplicates” | x | – | x | x |
UNION in DAX | x | Possible with “DISTINCT” | – | x | – | x |
Data transformation in the data source or in the frontend?
A question that comes up right at the beginning is, where should the joining of the tables take place?
In fact, this cannot be stated in a generalized way, since it depends on a number of factors. But in most cases it is better to do this at the beginning of the data transformation. Ideally, the union would take place in SQL Server.
However, it may not be possible to do this in SQL Server, either because the data is not yet available there, since it comes from different sources, such as an SQL table and a CSV file. In this case, the next level, here Power Query within Power BI or the new dataflows, would be a good choice.
Then there is the last possibility to combine the data directly in Power BI with DAX. Again, there are different scenarios why this could happen in the Power BI data model. For example, if a table is calculated in the DAX and is not yet available in Power Query. Usually you would do the append less in the frontend and more in the data source or power query.
Example data for the 3 scenarios
As an example we look at the target values for different locations. Let’s say that the target values come from different systems and so from different tables or files, we have to merge them again for the analysis.
Our first table looks like this:
EmpID | Name | LocationID | Location | Year | Target |
---|---|---|---|---|---|
1089 | Frodo Baggins | 12 | The Shire | 2020 | 120000 |
1232 | Samwise Gamgee | 12 | The Shire | 2020 | 140000 |
1219 | Peregrin Took | 12 | The Shire | 2020 | 100000 |
1024 | Meriadoc Brandybock | 12 | The Shire | 2020 | 95000 |
We want to combine this with the table of our second location to get a complete table of all locations. Note that we have one column more in the second table, because here several regions belong to the location or to a common cluster:
EmpID | Name | LocationID | Location | Cluster | Year | Target |
---|---|---|---|---|---|---|
9042 | Aragorn | 8 | Minas Tirith | Central Middle Earth | 2020 | 260000 |
9058 | Arwen | 8 | Minas Tirith | Central Middle Earth | 2020 | 190000 |
9012 | Faramir | 8 | Minas Tirith | Central Middle Earth | 2020 | 210000 |
9276 | Damrod | 8 | Minas Tirith | Central Middle Earth | 2020 | 180000 |
5412 | Theoden | 16 | Rohan | Central Middle Earth | 2020 | 220000 |
5419 | Eomer | 16 | Rohan | Central Middle Earth | 2020 | 180000 |
5487 | Eowyn | 16 | Rohan | Central Middle Earth | 2020 | 210000 |
5876 | Eothain | 16 | Rohan | Central Middle Earth | 2020 | 185000 |
5912 | Haleth | 16 | Rohan | Central Middle Earth | 2020 | 150000 |
5001 | Gamling | 16 | Rohan | Central Middle Earth | 2020 | 165000 |
1089 | Frodo Baggins | 12 | The Shire | 2020 | 120000 |
Important for this scenario is the last record from the cluster “Central Middle Earth”. The employee “1089 Frodo Baggins” actually is from the region of our first table, but during the year he was working in the region of “Rohan”/”Minas Tirith”, so the manager of that region entered a target value there. But since he is back now in “The Shire”, the manager there has also added him to his system. Accordingly we have here a double record.
For our test series we need a third table, here we will use the target values of the location “Moria”:
EmpID | Name | LocationID | Location | Year | Target |
---|---|---|---|---|---|
7431 | Balin | 14 | Moria | 2020 | 245000 |
7129 | Dwalin | 14 | Moria | 2020 | 178000 |
7194 | Gimli | 14 | Moria | 2020 | 210000 |
7225 | Ori | 14 | Moria | 2020 | 190000 |
We are going to test each version on 6 criteria:
- Syntax to combine the tables
- Handling of duplicate entries
- Behavior in case of unequal amount of columns
- Behavior in case of different column names
- Behavior in case of unequal order of the columns
- Behavior in case of more than 2 tables
1. UNION / UNION ALL in SQL Server
Let’s start with the merge in SQL Server.
A. Syntax to combine the tables in SQL Server
The syntax is quite simple, we combine the select queries of the individual tables with a UNION or UNION ALL
SELECT [EmpID]
,[Name]
,[LocationID]
,[Location]
,[Year]
,[Target]
FROM [TargetShire]
UNION
SELECT [EmpID]
,[Name]
,[LocationID]
,[Location]
,[Year]
,[Target]
FROM [TargetCentralMiddleEarth]
As a result we receive the combined tables:
B. Handling of duplicate entries in SQL Server
If you have paid attention, the double entry of the employee “Frodo Baggins” is missing in the above example. This is because the “UNION” command removes duplicate values.
If we want to have the duplicate entries as well, we have to use “UNION ALL”. The syntax is similar to the example above:
SELECT [EmpID]
,[Name]
,[LocationID]
,[Location]
,[Year]
,[Target]
FROM [TargetShire]
UNION ALL
SELECT [EmpID]
,[Name]
,[LocationID]
,[Location]
,[Year]
,[Target]
FROM [TargetCentralMiddleEarth]
This way we also receive the duplicate record:
C. Behavior in case of unequal amount of columns in SQL Server
In SQL Server we reach a limit here, because the number of columns must always be identical. Otherwise we run into an error if we try to integrate the cluster from the second table, for instance:
SELECT [EmpID]
,[Name]
,[LocationID]
,[Location]
,[Year]
,[Target]
FROM [TargetShire]
UNION ALL
SELECT [EmpID]
,[Name]
,[LocationID]
,[Location]
,[Cluster]
,[Year]
,[Target]
FROM [TargetCentralMiddleEarth]
Here we encounter the following error:
Msg 205, Level 16, State 1, Line 2
All queries combined using a UNION, INTERSECT or EXCEPT operator must have an equal number of expressions in their target lists.
An additional column would have to be caught in the remaining queries (NULL AS [Cluster]).
D. Behavior in case of different column names in SQL Server
Different column names are not a problem. The column names of the first query are always used. The following queries must simply have the correct number of columns:
SELECT [EmpID] AS [EmployeeID]
,[Name]
,[LocationID]
,[Location]
,[Year]
,[Target]
FROM [TargetShire]
UNION ALL
SELECT [EmpID]
,[Name]
,[LocationID]
,[Location]
,[Cluster]
,[Year]
,[Target]
FROM [TargetCentralMiddleEarth]
Note that the column name of the first column adapts to the name of the first select:
E. Behavior in case of unequal order of the columns in SQL Server
As just mentioned, the SQL Server follows the order of the columns of the first select. The column names of the following queries are ignored. If the order of the columns in the following query changes accordingly, the values are taken in the order they appear in the select, as long as the file types match:
SELECT [EmpID]
,[Name]
,[LocationID]
,[Location]
,[Year]
,[Target]
FROM [TargetShire]
UNION ALL
SELECT [LocationID]
,[Name]
,[EmpID]
,[Location]
,[Year]
,[Target]
FROM [TargetCentralMiddleEarth]
The values of the LocationID and the EmpID appear mixed in the columns:
If we put a column with text and a column with numerical values on the corresponding positions, we would run into a conversion error:
Msg 245, Level 16, State 1, Line 2
Conversion failed when converting the varchar value 'Frodo Baggins' to data type int.
F. Behavior in case of more than 2 tables in SQL Server
An unlimited number of tables can be combined together. For this purpose, additional UNION or UNION ALL statements simply have to be lined up one after the other:
SELECT [EmpID]
,[Name]
,[LocationID]
,[Location]
,[Year]
,[Target]
FROM [TargetShire]
UNION ALL
SELECT [EmpID]
,[Name]
,[LocationID]
,[Location]
,[Year]
,[Target]
FROM [TargetCentralMiddleEarth]
UNION ALL
SELECT [EmpID]
,[Name]
,[LocationID]
,[Location]
,[Year]
,[Target]
FROM [TargetMoria]
Each additional table is simply added below the existing ones:
2. APPEND in Power Query
In Power Query the behavior is slightly different than in SQL Server. Here the columns are combined only by the names and not by the order, which results in other possibilities.
A. Syntax to combine the tables in Power Query
Due to the visual way Power Query works, it would be a little unusual to use the programmatic syntax in M (= the query language of Power Query). I will therefore focus on creating the query with the graphical interface.
To create the query, the first selected table is marked and combined using the append function:
Then you only need to select the second table:
After that the combination is already done, in Power Query further transformations could be done:
B. Handling of duplicate entries in Power Query
Duplicates are not automatically removed by an append. Those still exist:
However, these can be filtered out in a next step using the Remove Duplicates function:
Afterwards the duplicate value is removed:
C. Behavior in case of unequal amount of columns in Power Query
As already mentioned, the append in Power Query uses the column names. Columns with the same name are combined, columns missing from a table are simply left empty:
D. Behavior in case of different column names in Power Query
The same is with column names. If the name changes, the content will not be combined.
As in the previous example, in the first table the column “EmpID” was renamed to “EmployeeID”, in the second table the name is kept:
E. Behavior in case of unequal order of the columns in Power Query
The order does not matter at all in Power Query, because the append takes place exclusively using the column name.
The append of the following two tables…
… handles Power Query without problems:
F. Behavior in case of more than 2 tables in Power Query
It is also possible to combine several tables without any difficulty. With the possibility of combining more than 2 tables, all listed tables can be combined:
3. UNION in DAX
Although it is used less frequently in practice, DAX not only allows the creation of Calculated Columns and Calculated Measure, but also Caculated Tables.
These are created in the Modeling Tab by clicking “New Table”:
A big difference to the other two procedures is certainly that the UNION command in DAX combines existing tables from the data model. This means that although the base tables can be hidden from the end user, they are still loaded into the data model. Together with the preparation of the data during the initialization of the model, the storage space of the used tables is multiplied in any case, since these tables are found again in the UNION table.
A. Syntax to combine the tables in DAX
The syntax is kept relatively simple here as well. With the UNION function the mentioned tables are simply put together:
A Union = UNION('A TargetShire', 'A TargetCentralMiddleEarth')
As a result we get the combined table directly in Power BI:
B. Handling of duplicate entries in DAX
As just seen in the first scenario, duplicate entries are not being filtered out:
The UNION in DAX thus corresponds to a UNION ALL in SQL or an APPEND in Power Query. If you want to eliminate the additional entry, this can be achieved with the DISTINCT function:
A Union Distinct = DISTINCT( UNION( 'A TargetShire', 'A TargetCentralMiddleEarth' ) )
C. Behavior in case of unequal amount of columns in DAX
Like SQL, DAX also requires the identical number of columns. In scenario C, we unfortunately cannot connect the two tables with an unequal number of columns:
Each table argument of 'UNION' must have the same number of columns.
Here the table with fewer columns would have to be manipulated with a table function in DAX as a workaround.
D. Behavior in case of different column names in DAX
Also the naming of the columns behaves like in SQL Server. The names of the first table are simply used:
The column caption is changed accordingly if the order of the tables is changed at the UNION:
E. Behavior in case of unequal order of the columns in DAX
As in SQL Server, only the order of the columns matters in DAX. So as soon as the columns are not in the same order, the contents will be mixed:
F. Behavior in case of more than 2 tables in DAX
The syntax of UNION requires at least 2 tables to combine, but optionally any number of tables can be combined. These can easily be added as another argument to the function. Important here is that all tables must have an identical structure in order to achieve the desired result:
F Union = UNION( 'F TargetShire', 'F TargetCentralMiddleEarthDAX', 'F TargetMoria' )
Conclusion
Let’s move on to the summary. Combining several tables to one large table is possible in all 3 ways. As we have seen, the behavior of UNION in SQL Server and UNION in DAX within Power BI is very similar. Here simply tables with the same number of columns are placed directly under each other. The APPEND function in Power Query takes a different approach and combines the tables by column names.
I would like to summarize the behavior again in tabular form:
Duplicate entries remain | Duplicate entries are removed | Unequal number of columns | Deviating column names | Different order of the columns | Combine more than 2 tables | |
---|---|---|---|---|---|---|
UNION in SQL Server | With UNION ALL | With UNION | – | x | – | x |
APPEND in Power Query | x | Possible with “Remove duplicates” | x | – | x | x |
UNION in DAX | x | Possible with “DISTINCT” | – | x | – | x |
At the end the question remains which command you should use. As already written in the introduction this depends strongly on the context. All three approaches have a different behavior. Depending on the initial situation, each developer must decide for himself what makes most sense.