Combine multiple tables with UNION / UNION ALL in SQL Server, APPEND in Power Query or UNION in DAX

October 2, 2020
Denis Selimovic
DAX, Microsoft, Power BI, Power Query / M, SQL Servercomment 3Created with Sketch Beta.0 comments

This post is also available in: German

If you want to combine several tables, you have to ask yourself whether you should do this already in the data source such as SQL Server, in the processing of the data (ETL) in Power Query or in Power BI and, above all, where exactly there is a difference.

First of all, the post is relatively detailed, for all of those who consider it too long (tl;dr) here is the comparison table:

Duplicate entries remain Duplicate entries are removed Unequal number of columns Deviating column names Different order of the columns Combine more than 2 tables
UNION in SQL Server With UNION ALL With UNION x x
APPEND in Power Query x Possible with “Remove duplicates” x x x
UNION in DAX x Possible with “DISTINCT” x x
Overview of the different possible combinations

Data transformation in the data source or in the frontend?

A question that comes up right at the beginning is, where should the joining of the tables take place?
In fact, this cannot be stated in a generalized way, since it depends on a number of factors. But in most cases it is better to do this at the beginning of the data transformation. Ideally, the union would take place in SQL Server.

However, it may not be possible to do this in SQL Server, either because the data is not yet available there, since it comes from different sources, such as an SQL table and a CSV file. In this case, the next level, here Power Query within Power BI or the new dataflows, would be a good choice.

Then there is the last possibility to combine the data directly in Power BI with DAX. Again, there are different scenarios why this could happen in the Power BI data model. For example, if a table is calculated in the DAX and is not yet available in Power Query. Usually you would do the append less in the frontend and more in the data source or power query.

Example data for the 3 scenarios

As an example we look at the target values for different locations. Let’s say that the target values come from different systems and so from different tables or files, we have to merge them again for the analysis.

Our first table looks like this:

EmpID Name LocationID Location Year Target
1089 Frodo Baggins 12 The Shire 2020 120000
1232 Samwise Gamgee 12 The Shire 2020 140000
1219 Peregrin Took 12 The Shire 2020 100000
1024 Meriadoc Brandybock 12 The Shire 2020 95000
Employees of the location “The Shire” with their target values

We want to combine this with the table of our second location to get a complete table of all locations. Note that we have one column more in the second table, because here several regions belong to the location or to a common cluster:

EmpID Name LocationID Location Cluster Year Target
9042 Aragorn 8 Minas Tirith Central Middle Earth 2020 260000
9058 Arwen 8 Minas Tirith Central Middle Earth 2020 190000
9012 Faramir 8 Minas Tirith Central Middle Earth 2020 210000
9276 Damrod 8 Minas Tirith Central Middle Earth 2020 180000
5412 Theoden 16 Rohan Central Middle Earth 2020 220000
5419 Eomer 16 Rohan Central Middle Earth 2020 180000
5487 Eowyn 16 Rohan Central Middle Earth 2020 210000
5876 Eothain 16 Rohan Central Middle Earth 2020 185000
5912 Haleth 16 Rohan Central Middle Earth 2020 150000
5001 Gamling 16 Rohan Central Middle Earth 2020 165000
1089 Frodo Baggins 12 The Shire 2020 120000
Employees of the cluster “Central Middle Earth” with their target values

Important for this scenario is the last record from the cluster “Central Middle Earth”. The employee “1089 Frodo Baggins” actually is from the region of our first table, but during the year he was working in the region of “Rohan”/”Minas Tirith”, so the manager of that region entered a target value there. But since he is back now in “The Shire”, the manager there has also added him to his system. Accordingly we have here a double record.

For our test series we need a third table, here we will use the target values of the location “Moria”:

EmpID Name LocationID Location Year Target
7431 Balin 14 Moria 2020 245000
7129 Dwalin 14 Moria 2020 178000
7194 Gimli 14 Moria 2020 210000
7225 Ori 14 Moria 2020 190000

We are going to test each version on 6 criteria:

  1. Syntax to combine the tables
  2. Handling of duplicate entries
  3. Behavior in case of unequal amount of columns
  4. Behavior in case of different column names
  5. Behavior in case of unequal order of the columns
  6. Behavior in case of more than 2 tables

1. UNION / UNION ALL in SQL Server

Let’s start with the merge in SQL Server.

A. Syntax to combine the tables in SQL Server

The syntax is quite simple, we combine the select queries of the individual tables with a UNION or UNION ALL

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetShire]

UNION

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetCentralMiddleEarth]

As a result we receive the combined tables:

image 23

B. Handling of duplicate entries in SQL Server

If you have paid attention, the double entry of the employee “Frodo Baggins” is missing in the above example. This is because the “UNION” command removes duplicate values.

If we want to have the duplicate entries as well, we have to use “UNION ALL”. The syntax is similar to the example above:

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetShire]

UNION ALL

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetCentralMiddleEarth]

This way we also receive the duplicate record:

image 25

C. Behavior in case of unequal amount of columns in SQL Server

In SQL Server we reach a limit here, because the number of columns must always be identical. Otherwise we run into an error if we try to integrate the cluster from the second table, for instance:

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetShire]

UNION ALL

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Cluster]
  ,[Year]
  ,[Target]
FROM [TargetCentralMiddleEarth]

Here we encounter the following error:

Msg 205, Level 16, State 1, Line 2
All queries combined using a UNION, INTERSECT or EXCEPT operator must have an equal number of expressions in their target lists.

An additional column would have to be caught in the remaining queries (NULL AS [Cluster]).

D. Behavior in case of different column names in SQL Server

Different column names are not a problem. The column names of the first query are always used. The following queries must simply have the correct number of columns:

SELECT [EmpID] AS [EmployeeID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetShire]

UNION ALL

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Cluster]
  ,[Year]
  ,[Target]
FROM [TargetCentralMiddleEarth]

Note that the column name of the first column adapts to the name of the first select:

image 27

E. Behavior in case of unequal order of the columns in SQL Server

As just mentioned, the SQL Server follows the order of the columns of the first select. The column names of the following queries are ignored. If the order of the columns in the following query changes accordingly, the values are taken in the order they appear in the select, as long as the file types match:

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetShire]

UNION ALL

SELECT [LocationID]
  ,[Name]
  ,[EmpID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetCentralMiddleEarth]

The values of the LocationID and the EmpID appear mixed in the columns:

image 28

If we put a column with text and a column with numerical values on the corresponding positions, we would run into a conversion error:

Msg 245, Level 16, State 1, Line 2
Conversion failed when converting the varchar value 'Frodo Baggins' to data type int.

F. Behavior in case of more than 2 tables in SQL Server

An unlimited number of tables can be combined together. For this purpose, additional UNION or UNION ALL statements simply have to be lined up one after the other:

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetShire]

UNION ALL

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetCentralMiddleEarth]

UNION ALL

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetMoria]

Each additional table is simply added below the existing ones:

image 29

2. APPEND in Power Query

In Power Query the behavior is slightly different than in SQL Server. Here the columns are combined only by the names and not by the order, which results in other possibilities.

A. Syntax to combine the tables in Power Query

Due to the visual way Power Query works, it would be a little unusual to use the programmatic syntax in M (= the query language of Power Query). I will therefore focus on creating the query with the graphical interface.

To create the query, the first selected table is marked and combined using the append function:

image 30

Then you only need to select the second table:

image 31

After that the combination is already done, in Power Query further transformations could be done:

image 32

B. Handling of duplicate entries in Power Query

Duplicates are not automatically removed by an append. Those still exist:

image 33

However, these can be filtered out in a next step using the Remove Duplicates function:

image 34

Afterwards the duplicate value is removed:

image 35

C. Behavior in case of unequal amount of columns in Power Query

As already mentioned, the append in Power Query uses the column names. Columns with the same name are combined, columns missing from a table are simply left empty:

image 36

D. Behavior in case of different column names in Power Query

The same is with column names. If the name changes, the content will not be combined.

As in the previous example, in the first table the column “EmpID” was renamed to “EmployeeID”, in the second table the name is kept:

image 37

E. Behavior in case of unequal order of the columns in Power Query

The order does not matter at all in Power Query, because the append takes place exclusively using the column name.

The append of the following two tables…

image 38
image 40

… handles Power Query without problems:

image 41

F. Behavior in case of more than 2 tables in Power Query

It is also possible to combine several tables without any difficulty. With the possibility of combining more than 2 tables, all listed tables can be combined:

image 42
image 43

3. UNION in DAX

Although it is used less frequently in practice, DAX not only allows the creation of Calculated Columns and Calculated Measure, but also Caculated Tables.

These are created in the Modeling Tab by clicking “New Table”:

image 44

A big difference to the other two procedures is certainly that the UNION command in DAX combines existing tables from the data model. This means that although the base tables can be hidden from the end user, they are still loaded into the data model. Together with the preparation of the data during the initialization of the model, the storage space of the used tables is multiplied in any case, since these tables are found again in the UNION table.

A. Syntax to combine the tables in DAX

The syntax is kept relatively simple here as well. With the UNION function the mentioned tables are simply put together:

A Union = UNION('A TargetShire', 'A TargetCentralMiddleEarth')

As a result we get the combined table directly in Power BI:

image 45

B. Handling of duplicate entries in DAX

As just seen in the first scenario, duplicate entries are not being filtered out:

image 46

The UNION in DAX thus corresponds to a UNION ALL in SQL or an APPEND in Power Query. If you want to eliminate the additional entry, this can be achieved with the DISTINCT function:

A Union Distinct = DISTINCT( UNION( 'A TargetShire', 'A TargetCentralMiddleEarth' ) )
image 47

C. Behavior in case of unequal amount of columns in DAX

Like SQL, DAX also requires the identical number of columns. In scenario C, we unfortunately cannot connect the two tables with an unequal number of columns:

Each table argument of 'UNION' must have the same number of columns.
image 48

Here the table with fewer columns would have to be manipulated with a table function in DAX as a workaround.

D. Behavior in case of different column names in DAX

Also the naming of the columns behaves like in SQL Server. The names of the first table are simply used:

image 49

The column caption is changed accordingly if the order of the tables is changed at the UNION:

image 51

E. Behavior in case of unequal order of the columns in DAX

As in SQL Server, only the order of the columns matters in DAX. So as soon as the columns are not in the same order, the contents will be mixed:

image 52

F. Behavior in case of more than 2 tables in DAX

The syntax of UNION requires at least 2 tables to combine, but optionally any number of tables can be combined. These can easily be added as another argument to the function. Important here is that all tables must have an identical structure in order to achieve the desired result:

F Union = UNION( 'F TargetShire', 'F TargetCentralMiddleEarthDAX', 'F TargetMoria' )
image 53

Conclusion

Let’s move on to the summary. Combining several tables to one large table is possible in all 3 ways. As we have seen, the behavior of UNION in SQL Server and UNION in DAX within Power BI is very similar. Here simply tables with the same number of columns are placed directly under each other. The APPEND function in Power Query takes a different approach and combines the tables by column names.

I would like to summarize the behavior again in tabular form:

Duplicate entries remain Duplicate entries are removed Unequal number of columns Deviating column names Different order of the columns Combine more than 2 tables
UNION in SQL Server With UNION ALL With UNION x x
APPEND in Power Query x Possible with “Remove duplicates” x x x
UNION in DAX x Possible with “DISTINCT” x x
Overview of the different possible combinations

At the end the question remains which command you should use. As already written in the introduction this depends strongly on the context. All three approaches have a different behavior. Depending on the initial situation, each developer must decide for himself what makes most sense.

Share:
5 1 vote
Article Rating
Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments