In this blog, Marcus Radue, Solution Architect, offers high-level guidance in the advantages of aggregation tables in Power BI. For a full overview on this topic and a demo, watch the full video at the bottom of the blog.
Aggregation tables are summary tables that allow you to speed up performance for large Power BI datasets in DirectQuery storage mode. For example, let’s say you have a transaction / fact type table where you have many different attribute columns. In the example data model below, that would include attributes like Date, Product, Customer, Stores, etc. These attributes might be important for your data, but the end users consuming a specific report from this dataset may spend 95% of the time slicing on specific attributes like Product or Date.
By configuring an aggregation table(s), you allow Power BI to cache or save those aggregated defined results from your implicit measures. Then, when creating your visualizations and DAX calculations, the formula engine recognizes that relationship to an aggregation table and will use that in the formula engine instead of the overall fact table.
Performance Difference Using Aggregation Tables
This can help with performance in a couple of ways. First is size: your fact table(s) may have upwards of hundreds of millions to billions of rows. This could be a storage constraint for those organizations only using Power BI Pro licensing. And depending on the data source performance, it should increase the speed of your DAX calculations that are funneled through your aggregation table(s).
So how does this work? First, bring all your tables in your Power BI dataset in as Direct Query storage mode. Make sure your aggregation tables are defined ahead of time. Next switch the storage mode from your related dimension type tables to “Dual”. These tables should be related to both your original fact(s) and your aggregation table(s). After that, switch the storage mode of your aggregation table(s) to “Import”. Finally configure your aggregation table(s) in Power BI desktop.
How much faster? Many variables impact report generation speed. Adding indexes to your database can improve speed and return time with a query whether there are aggregation tables or not. Other performance tuning options can variably change that query response time. However, in my demo, I ran both types of report queries on the same database (that had no performance tuning). Using aggregation tables vs not using them delivered a significant difference for those specific data sets. It took 90 seconds to return the summarization of the entire fact table of over 10 million rows compared to less than a second for measures connected to my aggregation table.
When Should You Use Aggregation Tables?
Let’s look at two scenarios where aggregation tables may be useful for you in your report and data model design.
Scenario One: Faster Results
In the first scenario, I need a Power BI report that is in DirectQuery. That’s a business requirement. I need data that’s more real-time, but I need my calculations to run faster. With that DirectQuery connection I’m not getting the results returned fast enough for end user satisfaction.
Scenario Two: Memory Limits
In the second scenario, the Power BI report is in import storage mode and the file is too large to publish online. This scenario applies to organizations that are using Power BI Pro licensing where you have a memory limit of 1 gigabyte for a Power BI Desktop file. For those using Power BI Premium, scenario two doesn’t fall into your use case. With Premium, you can upload files up to 40 gigabytes and use the large file storage format enabled through Power BI Premium.
Aggregation Table Setup Tips
We’ve looked at when and why to use aggregation tables. In this section, I’m going to cover some tips and guidelines for the setup of this feature. Remember to watch the full video demo for more details.
All Table Queries Should be Brought in as DirectQuery Storage Mode
Initially in your data model, you’ll want to bring in all your table queries as DirectQuery storage mode. If you initially bring them in as Import, then you will have to delete all your queries and re-load them as DirectQuery to make the switch. You can’t dynamically go from Import to DirectQuery, (or Dual as another option). If you start with DirectQuery you can always change it to either Dual or Import from there, but you can’t go the reverse direction.
Pre-Define Aggregation Tables
An aggregation table needs to be defined ahead of time. That is, you need to have some type of aggregated source object defined before bringing it into Power BI desktop. Calculated tables in DAX are not allowed in your aggregation table setup.
Configure Storage Modes
You will need to change the storage modes from some of your DirectQuery tables to Dual and Import. In the data model example below, starting from left to right, you can see by the solid line at the top that the first three tables (Date, Customer, and Sales) are all still in DirectQuery mode. Continuing to the right, my Product and Stores tables are in dual mode and have a dotted line over the top. Finally, my aggregation table on the right is in import mode, which removes that colored line at the top. Note that Product and Stores are linked to both the original factor transaction table (Sales) and the aggregation table.
Configure Aggregation Table
You will also want to configure your aggregation table in your data model. You can see an example of this below with the Product ID and Store ID, and with Revenue and Units as your measures.
You also want to indicate which attributes you want to group by, and which measures you want to aggregate by. In this example, I aggregate Revenue and Units as a sum. For my attributes of Product ID and Store ID, I group them so I can use them in slicers and on my axes in my visualizations.
After configuring my aggregation table in my data model, I can now test my query results. Theoretically columns connected to my aggregation table should return much quicker than those columns that are not connected.
First, I created a simple report page seen below. I opened the Performance Analyzer pane and recorded my query times.
You can see that my visuals return nearly all under a second except my card visual containing a summarization of my cost column. This should not be a surprise as the “Cost” column was not included in my aggregation table setup above.
To confirm if your DAX query is using the aggregation table in its query plan you can hop over to DAX studio. You can launch this tool from the external tools menu, if you have that configured, or open it separately and connect to your open Power BI desktop file.
Next, I will copy the card query summarizing cost and paste it into my DAX studio query window. I’ll enable the server timings setting and evaluate the query.
I can see that my rewrite attempt failed and that the query took almost 80 seconds to return.
Next, I will show the query that’s connected to the aggregation table with my card summarizing revenue.
This time around my query for revenue returns in less than a second. And sure enough a match to the aggregation table has been found in the query plan!
I hope this dive into aggregation tables has helped you decide when to use them to improve your data model and query performance, along with answering some basic questions about the configuration of aggregation tables.
For more assistance with aggregation tables, or other Power BI data management needs, be sure to watch the demo below, review my past blogs and webinars, or reach out to the Core data team.