Load files with changing names from folders with changing names into Power Query
This post is also available in: German
What at first sounds complicated and not very useful, can be found in reality from time to time. Recently the automated export of source data from another program in CSV format confronted me with this task.
Daily folder for automated export
Unfortunately, when exporting from the source system – apart from the target directory – very little could be adjusted. The result is that the automated export job daily creates a new folder with the current date, similar to the following example:
In this example it is only necessary to load the files from the newest folder, because this folder contains the entire data. However, the file names also change with the daily export. The structure can be seen in the following picture:
To load the latest data we have to:
- Find the most recent folder
- Save the date from the folder name as a variable or function
- Load the folder and files dynamically with a variable / function
1. Find the most recent folder
To extract the information about the existing folders, we first have to load them into Power Query. Important here is that we create the folder name in Power Query as a parameter, because we need it for a dynamic loading later:
Then we can load the folder into Power Query:
Here we receive a lot of information about the files that are located in the subfolders. But not the content itself. This is often misunderstood when loading folders. But we are only interested in the last column “Folder Path”, which shows us the path of the corresponding file:
Accordingly, we remove all other columns except “Folder Path”, remove duplicate values, remove everything before the separator “_” and sort in descending order to get the newest folder at the top:
With the keep top rows function we keep only the top/most recent record:
Now comes one of the crucial steps. Currently we only have one tablewith one record. In order to get the folder name and therefore also the date from the file names as a dynamic value, we have to drill down to this value with the “drill down” function. This can be achieved with the right mouse button:
Now there is one more thing to adjust. Since we drill down to the value manually, the value is stored statically in the formula:
This would cause an error if the folder name was changed. So we replace the static value, to which we want to drill down, with the line number. It does notstart with 1, but as often in programming languages with 0:
Since I refer to this result in the next step, I name the query “TemplateDynamicSourceFolder”:
2. Save the date from the folder name as a variable or function
In step 1 we extracted the dynamic part of the folder and file names and theoretically we could access this result in the other queries.
If we have loaded one of the Sales_per_Employee files, we can try to make it dynamic with the query we just created.
We can try this out directly in the advanced editor:
In theory it should work, but the following error message will be shown:
Formula.Firewall: Query 'Sales_per_Employee' (step 'Changed Type') references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.
The reason for the error is – in a short version – that Power Query does not allow to use an additional external data source in the query. Ken Puls has already explained this in detail in his blog. Therefore we have 2 possibilities how to avoid this:
- We can integrate our query from the first step into the same query (see demo file at the end of the article)
- We can convert our query created in step 1 into a function
Which brings us to the last step
3. Load the folder and files dynamically with a variable / function
As said before we don’t receive this error if we define the detection of the folder name from step 1 as a function. Simply right-click on the query from step 1 and select “Create Function”:
The function requires a name and a parameter. But since we had already set the parameter in the first step, we simply have to assign a function name:
A new folder is automatically created with the function and the corresponding query:
Beim Dynamisieren können wir nun die Jahreszahlen durch unsere Funktion ersetzen, welche ja das Datum des letzten Ordners zurückliefert:
Der Load funktioniert durch die Funktion fehlerfrei und aktualisiert sich immer auf die neusten Dateien aus dem neusten Ordner.
Download Beispieldatei und Quelldateien für diesen Artikel