December 3, 2019

“The Other Realtime”: Low-Latency Data Processing via DirectQuery

Post by: Data Team

So far in this blog series, we have talked about Realtime and Near-Realtime data processing. However, in the Power BI world, there is a third option that we call “the Other Realtime”: low-latency data analysis via a DirectQuery data model. In this case, we’re usually looking at data that is more transactional in nature.

What is an Example of “The Other Realtime?”

In the graphic below, I have a use-case example of what “the Other Realtime” might look like. The left-hand side shows orders that have been placed and need to be shipped off from a dock. This is based on a scenario that we encountered in the past. A client needed to monitor how quickly trucks were being loaded as they came out of the warehouse, to make sure certain time limits were met. They paid a penalty if they didn’t get those trucks off the dock in the allowed time.

Our client needed to see how quickly they were getting orders loaded onto the truck. They had to make sure they were on target to get the truck full and away from the dock within a two-hour window.
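To make the “on target” idea concrete, here is a minimal sketch of the pacing check a dashboard measure might perform. The field names, start times, and percentages are illustrative assumptions, not the client’s actual schema:

```python
from datetime import datetime, timedelta

# Two-hour dock window from the scenario above.
DOCK_WINDOW = timedelta(hours=2)

def on_target(load_start: datetime, now: datetime, pct_loaded: float) -> bool:
    """True if the loading rate so far would fill the truck before the deadline."""
    elapsed = now - load_start
    if pct_loaded >= 1.0:
        return True  # truck is already full
    if pct_loaded <= 0.0:
        return elapsed < DOCK_WINDOW  # nothing loaded yet; only late once the window passes
    projected_total = elapsed / pct_loaded  # time to reach 100% at the current pace
    return projected_total <= DOCK_WINDOW

start = datetime(2019, 12, 3, 8, 0)
print(on_target(start, start + timedelta(minutes=30), 0.40))  # ahead of pace: True
print(on_target(start, start + timedelta(minutes=90), 0.50))  # behind pace: False
```

In the real solution this kind of calculation would live in the data model rather than application code, but the arithmetic is the same: project the finish time from the current loading rate and compare it to the window.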

In a case like this, we probably have two different systems feeding the data. We might turn on something like transactional replication so that orders from the ERP system and shipping information from the warehouse floor picking and scanning system can be quickly replicated into an Azure SQL Managed Instance. The Managed Instance can host multiple databases. One thing to keep in mind with DirectQuery is that it can only run against a single database. In this use case, we solved that problem by creating cross-database queries that loaded the order and shipment data and joined it into a single unified Azure SQL database. (In the image, that’s the database at the bottom.)
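The unification step is just a join of the two replicated feeds on an order identifier. In the actual solution this was a cross-database SQL query inside Azure SQL; the in-memory Python sketch below shows the same shape of the logic, with assumed, illustrative field names:

```python
# Simulated rows replicated from the two source systems.
orders = [  # from the ERP system
    {"order_id": 1001, "customer": "Acme", "pallets": 12},
    {"order_id": 1002, "customer": "Globex", "pallets": 8},
]
shipments = [  # from the warehouse picking/scanning system
    {"order_id": 1001, "pallets_scanned": 7, "dock": "D4"},
]

def unify(orders, shipments):
    """Left-join orders to shipment scans on order_id, like the unified database view."""
    scans = {s["order_id"]: s for s in shipments}
    unified = []
    for o in orders:
        s = scans.get(o["order_id"], {})
        unified.append({
            **o,
            "pallets_scanned": s.get("pallets_scanned", 0),  # no scans yet -> 0
            "dock": s.get("dock"),
        })
    return unified

for row in unify(orders, shipments):
    print(row)
```

A left join is the natural choice here: every order should appear on the dashboard even before the first pallet has been scanned onto a truck.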

Then we connected Analysis Services via DirectQuery to that Azure SQL database, which provided very low-latency processing. It’s not second or sub-second latency, but it’s less than a minute from the time the data appears in the shipping system (which monitors the loading of the trucks) to the time we see it in Power BI. We get a very quick refresh rate in both Analysis Services and Power BI.

Advantages of “The Other Realtime”

There are a couple of advantages to using this kind of DirectQuery approach instead of a pure Realtime or Near-Realtime setup.

Up-to-Date Data
With “the Other Realtime” there’s no extra management overhead to maintain a separate copy of the data, such as the in-memory cache Analysis Services normally keeps. We are not doing our typical Analysis Services processing, so changes to the underlying data source are immediately reflected in queries against the data model.

Large Data Sets
Another advantage of this scenario is that data sets can be larger than the memory capacity of the Analysis Services server. In Analysis Services, we’re limited by the amount of RAM on the server where it lives. In typical model processing, we have to be able to hold the entire data set in memory during that processing step. That’s not true with DirectQuery, although there are some other data limitations we’ll cover at the end of this blog.

Provider-Side Query Acceleration
DirectQuery can also take advantage of provider-side query acceleration – for example, SQL Server’s memory-optimized columnstore indexes. We’re essentially pushing our processing from Analysis Services down to the SQL level, which is an interesting way of approaching query acceleration.

Query Efficiency Optimization
Finally, if the model does contain complex formulas that could potentially require multiple queries, Analysis Services performs optimization to ensure that the query plan executed against the back-end database will be as efficient as possible.

“Other Realtime” Factors

While “the Other Realtime” has many advantages, it has some drawbacks and limitations. Key considerations to be aware of when using this approach include:

Database Limit
The query can only run against a single database instance. As we saw above, there are ways to work around this.

Inefficient Modeling
Complex modeling may not be efficient within a DirectQuery model. It’s rare to have complex modeling requirements in a Realtime scenario, however. If you find yourself looking at complex modeling requirements, ask yourself whether the use case in question would be better addressed by a Near-Realtime solution.

Record Limit
We have a one-million-row limit on query results by default. Again, a DirectQuery Realtime use case with one million rows would be rare. If you have too many records, consider truncating inactive (historical) data from previous days.
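A simple retention policy like that might look as follows. This is a hedged sketch of the filtering logic a nightly cleanup job could apply to the DirectQuery-facing table; the schema and dates are illustrative assumptions:

```python
from datetime import date, timedelta

def retain_active(rows, today, keep_days=1):
    """Keep only rows whose ship_date falls within the last `keep_days` days."""
    cutoff = today - timedelta(days=keep_days - 1)
    return [r for r in rows if r["ship_date"] >= cutoff]

rows = [
    {"order_id": 1, "ship_date": date(2019, 12, 1)},  # historical, would be archived
    {"order_id": 2, "ship_date": date(2019, 12, 3)},  # today's active data
]
print(retain_active(rows, today=date(2019, 12, 3)))
```

In practice this would be a scheduled DELETE or partition-switch against the database rather than Python, with the historical rows moved to an archive table for non-realtime reporting.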

Supported Data Sources

Be aware that the main data sources supported by “the Other Realtime,” or DirectQuery, are Microsoft SQL Server, Azure SQL DB, Azure Synapse Analytics (formerly known as Azure SQL Data Warehouse, and also known in the on-premises world as the Microsoft Analytics Platform System – or APS), Oracle relational databases, and Teradata relational databases.
