Have you ever developed or worked on tabular models that take longer to process than you expect?
Have you ever wanted to decrease the processing time so you can process the tabular model more often?
In my experience, I have answered “yes” to both questions.
In this post I’d like to share a technique that may help identify an opportunity to improve processing performance.
Processing a Tabular Model
Processing a tabular model is the necessary step that brings data in from one or more sources and transforms it into a compressed in-memory data structure. Processing a model can be time-consuming for many reasons, and it can sometimes be hard to understand which part is taking the most time. Once you understand what is taking the most time, it becomes easier to focus on what can be done to improve performance.
It is also important to note that, while processing a model may take longer than anticipated, the model could already be running as efficiently as possible. Knowing how long each step of processing takes shows where to investigate further, but it is still up to the developer to judge whether those times are too long. After gaining experience with a specific model, and across many models, it becomes easier to gauge whether processing is taking longer than expected. Understanding how the Analysis Services engine works in detail, and following tabular development best practices, allows for the most efficient processing from the beginning.
Full Process Strategy
While there are many processing strategies, we will focus on a Full Process strategy as it is very simple to implement and very common. In addition, this technique can be used regardless of your processing strategy.
During a Full Process there are two primary stages: Process Data and Process Recalc. The Process Data command is run on each table in the model, typically in a multi-threaded, asynchronous manner. Once Process Data has been run on all tables, a Process Recalc is executed.
(Note: It is outside the scope of this article to go into detail on what happens within each of these stages. In summary, Process Data reads the data from the data source and brings it into a compressed tabular data structure, and Process Recalc builds additional data structures such as relationships, dictionaries, and hierarchies.)
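For reference, a Full Process of an entire database can be issued as a single TMSL refresh command of type "full", which runs both stages described above. A minimal sketch, assuming the same "Adventure Works Internet Sales" database used in the examples later in this post:

```json
{
  "refresh": {
    "type": "full",
    "objects": [
      {
        "database": "Adventure Works Internet Sales"
      }
    ]
  }
}
```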
Best Way to Process a Tabular Model in Development
A model can be processed in many ways including the SQL Server Management Studio (SSMS) user interface (UI), through a JSON script, and programmatically. In my opinion, the best way to process a tabular model during development is to use a JSON script executed from SSMS. To avoid writing this JSON by hand, you can easily generate it by using SSMS. This can be done by right-clicking on the object you want to process within SSMS and selecting the process option.
However, instead of selecting the options you want and clicking OK, you can generate a script by clicking the Script button near the top left corner of the dialog box. This creates the JSON script in SSMS, which is much easier to re-run while troubleshooting than going through the UI each time. It also makes detailed error information easier to see if processing fails. After the script is generated, click Cancel in the processing dialog box. This scripted JSON is what we will use to identify the slowest part of processing.
Pinpointing Slow Areas of a Process
To identify the slowest part of the process, I like to run a Process Data on each table, one at a time, followed by a Process Recalc at the database level. The JSON below will process one table at a time.
(Please note, each script must be run individually. This can be done by highlighting it in SSMS and executing it. Determining how long each table takes to process will allow you to identify which tables in a tabular model are taking the longest.)
{
  "refresh": {
    "type": "dataOnly",
    "objects": [
      {
        "database": "Adventure Works Internet Sales",
        "table": "Internet Sales"
      }
    ]
  }
}
If you have many tables, you can run Process Data on multiple tables at once to rule out a set of tables that may be less significant. However, if a batch of tables runs slowly, it may need to be broken down into individual table scripts to determine which table or tables are the slowest. Processing more than one table at a time will, by default, run in a multi-threaded manner, which is usually faster than processing the tables sequentially or forcing a multi-table process to run on a single thread.
{
  "refresh": {
    "type": "dataOnly",
    "objects": [
      {
        "database": "Adventure Works Internet Sales",
        "table": "Customer"
      },
      {
        "database": "Adventure Works Internet Sales",
        "table": "Internet Sales"
      }
    ]
  }
}
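If you do want to force a multi-table refresh to run one table at a time, for example to isolate per-table timings or limit memory pressure, the refresh command can be wrapped in a TMSL sequence command with maxParallelism set to 1. A sketch, using the same database and table names as above:

```json
{
  "sequence": {
    "maxParallelism": 1,
    "operations": [
      {
        "refresh": {
          "type": "dataOnly",
          "objects": [
            {
              "database": "Adventure Works Internet Sales",
              "table": "Customer"
            },
            {
              "database": "Adventure Works Internet Sales",
              "table": "Internet Sales"
            }
          ]
        }
      }
    ]
  }
}
```

Keep in mind this trades speed for clarity: as noted above, the default multi-threaded behavior will usually finish faster.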
After running Process Data on each table, a Process Recalc on the database should be run. Depending on the model, the Process Recalc could be slow and an area to focus on. Based on the results of running the scripts individually, it should be fairly easy to identify if there are one or more tables that process slowly, if the Process Recalc is taking a significant amount of time, or a combination of the two.
{
  "refresh": {
    "type": "calculate",
    "objects": [
      {
        "database": "Adventure Works Internet Sales"
      }
    ]
  }
}
Possible Root Causes
While it is beyond the scope of this article to troubleshoot the many different types of performance issues that can occur, here are several common issues that will provide a starting point:
- A primary cause of slow processing can be the underlying data source itself. A physical dimensional model (such as a star schema) is often the best source for a tabular model.
- The queries that extract data from the underlying source are another common problem, especially if they contain many joins. Depending on the data source, a problematic source query could range from a complex SQL Server query to a poorly performing web service.
- If the Process Recalc is slower than you expect, it could be because of complex calculated columns or calculated tables.
- A seemingly slow process could potentially be the hardware that is used to process the tabular model, or even the network connection.
Real Life Example and Solution
In one of our recent projects, a model took over an hour to process before consuming all the memory on the processing server, and it would often fail to process. Using the technique above, it was straightforward to see which areas were taking the most time. In this case, the causes of the performance issues ranged from complex SQL queries against an OLTP system to the use of many unnecessary calculated columns.
After building a physical dimensional model and redesigning the tabular model to eliminate many unnecessary calculated columns, the model processed in about two minutes. While it can be easy to create simple tabular models, understanding how the tabular engine works and following best practices can go a long way in preventing processing and query performance issues.
Application
In summary, identifying whether one or more tables or the Process Recalc is running slower than expected is the first step in identifying tabular model processing performance issues. Once the problem area has been identified, a more focused approach can be used to optimize the area in question.