How to understand application performance in Ursa Studio

Ursa Studio's design philosophy is to delegate as much computational work as possible to the database, so understanding how Ursa Studio leverages the database is key to understanding the performance profile of Ursa Studio as a whole.

There are two databases underpinning Ursa Studio: the application database and the client database. The application database handles the metadata that powers the app, such as measure definitions, user credentials, and ELT build history. Even though the application database handles most of the communication from Ursa Studio, this work is rarely processor-intensive and is typically not a source of sluggishness in the system. Indeed, customers on Hybrid Cloud and Ursa Cloud deployments might not even be aware of the existence of the application database, which is fine; it never houses what customers think of as "their data".

The client database is used whenever customer data is being processed. This includes data imports, ELT pipelines, report runs, and requests from Analytics Portal. Some data assets -- notably, value sets and lookup tables -- are also managed via system tables in the client database, because they often need to be joined with other data tables there.
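
To make that join pattern concrete, here is a minimal sketch in standard SQL; the table and column names are invented for illustration and do not reflect Ursa Studio's actual schema.

```sql
-- Hypothetical example: a value set stored as a system table in the client
-- database can be joined directly against a customer data table.
SELECT enc.patient_id,
       enc.diagnosis_code
FROM   encounters AS enc              -- customer data table (invented name)
JOIN   vs_diabetes_codes AS vs        -- system table holding the value set (invented name)
  ON   vs.code = enc.diagnosis_code
WHERE  enc.encounter_date >= DATE '2022-01-01';
```

Keeping the value set in the same database lets the join run entirely where the data lives, instead of shipping codes back and forth through the application tier.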

Data imports, ELT pipelines, and report runs are all implemented in Ursa Studio as long-running processes with dedicated connection pools, although dedicated pools do not guarantee that these processes will be free of data contention issues. These typically represent the most processor-intensive workloads managed by Ursa Studio, and often tie up resources on the client database.

Requests from Analytics Portal also hit the client database. Because responsiveness is a higher priority for Analytics Portal than for long-running processes such as ELTs, Ursa Studio has built in many layers of caching and optimization to service these requests as efficiently as possible.

The simplest of these caching mechanisms is the browser-side cache in Analytics Portal. Whenever a user navigates back to a chart or a case review table that they'd previously requested, the results are served immediately via a cache that's kept in the browser state, without needing to make a new request to the Ursa Studio server. This mechanism is quite limited, and does not survive a screen refresh.

Similarly, Ursa Studio keeps a database-side cache of responses to requests made by Analytics Portal users. When another user, or the same user in a new browser session, requests the same analysis, the results are served immediately from this cache instead of requiring the analysis to be re-computed. The results of all the charts in Analytics Portal are saved in this cache, as are the row counts displayed in the tabs at the top of the case review data table. The actual case-level data table shown onscreen in case review is not cached via this mechanism, and is recalculated upon each request.
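
Conceptually, this cache behaves like a keyed lookup table. The sketch below is a plausible shape for such a cache, not Ursa Studio's actual internal schema; all names are invented.

```sql
-- Hypothetical sketch of a response cache keyed by a hash of the request.
CREATE TABLE portal_response_cache (
    request_hash   VARCHAR(64) PRIMARY KEY,  -- hash of the analysis definition + parameters
    report_run_id  INTEGER NOT NULL,         -- which report instance produced the result
    response_json  TEXT NOT NULL,            -- serialized chart results or row counts
    cached_at      TIMESTAMP NOT NULL
);

-- Serving a repeated request becomes a single indexed lookup
-- (:request_hash stands in for a bound query parameter):
SELECT response_json
FROM   portal_response_cache
WHERE  request_hash = :request_hash;
```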

The database-side cache is invalidated whenever the report is re-run with new data. The final step of every report run is to prepopulate this cache with the requests users are most likely to make. The new report instance remains accessible throughout this "cache common requests" routine, which runs in the background.
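
Continuing the hypothetical schema above, the invalidate-then-prepopulate cycle might look like the following; again, the names and exact mechanics are illustrative, not Ursa Studio's internals.

```sql
-- Invalidate: drop cached responses tied to superseded report runs.
DELETE FROM portal_response_cache
WHERE  report_run_id <> :new_report_run_id;

-- Prepopulate: the background "cache common requests" routine re-inserts the
-- responses users are most likely to ask for, such as each chart's default view.
INSERT INTO portal_response_cache (request_hash, report_run_id, response_json, cached_at)
VALUES (:request_hash, :new_report_run_id, :response_json, CURRENT_TIMESTAMP);
```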

Requests from case review typically run quickly, because they are limited to a maximum of 250 records and generally do not involve processor-intensive work such as grouping. Long-form measures and interval measures display a view of the data rolled up to the case level; the results reported to the user are thus not simply one row per row of the underlying measure table. To eliminate the need to calculate this grouping on a per-request basis, every report run creates a companion table to the underlying measure table whose grain is one row per case, and this table is used whenever possible so as to minimize processing. There are circumstances where it is impossible to use the one-row-per-case optimized table, such as when there is a filter in place on a field that varies within each case. Such queries can sometimes be slow to run on large datasets.
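
The companion table can be pictured as a pre-computed rollup. Here is a minimal sketch, assuming a hypothetical measure table named measure_readmissions with a case_id column; the real naming scheme and rollup logic are Ursa Studio internals.

```sql
-- Hypothetical sketch of the one-row-per-case companion table built during a report run.
CREATE TABLE measure_readmissions_by_case AS
SELECT case_id,
       MIN(event_date) AS first_event_date,   -- case-level rollups of row-level fields
       MAX(event_date) AS last_event_date,
       COUNT(*)        AS event_count
FROM   measure_readmissions
GROUP BY case_id;

-- A typical case review request then reads the rollup directly, with no grouping:
SELECT *
FROM   measure_readmissions_by_case
ORDER BY case_id
FETCH FIRST 250 ROWS ONLY;                    -- case review's record cap
```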

Another factor that can add latency within case review is an ad-hoc derived field that carries a large computational burden. Window functions such as FIRST VALUE ACROSS INSTANCES are often the main culprits here. It's advisable to re-implement slow-running ad-hoc derived fields in the measure itself as soon as they have proved their value, so as to move the computational burden to the moment of the report run.
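
To illustrate why such fields are expensive, here is roughly the kind of query that sort of derived field implies. FIRST_VALUE is the standard SQL analogue of FIRST VALUE ACROSS INSTANCES, and the table and column names are again invented.

```sql
-- Hypothetical example of a window function over a measure table.
SELECT case_id,
       event_date,
       FIRST_VALUE(lab_result) OVER (
           PARTITION BY case_id               -- one partition per case
           ORDER BY event_date                -- must sort within each partition
       ) AS first_lab_result_in_case
FROM   measure_readmissions;
-- The partition-and-sort step must touch every row, which is why moving the
-- logic into the measure definition shifts the cost to report run time.
```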

Lastly, running data review on an object that is implemented as a view can also be a source of slowness. The fix, which moves the computational work into a long-running process, is to change the object from a view to a table and re-run the ELT.
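
The difference is easy to see in SQL terms. The sketch below illustrates the general view-versus-table tradeoff with invented names; in practice the change is made by reconfiguring the object in Ursa Studio and re-running the ELT.

```sql
-- As a view, the object's logic is re-executed on every data review request:
CREATE VIEW eligible_members AS
SELECT member_id, plan_code
FROM   members
WHERE  termination_date IS NULL;

-- As a table, the same logic runs once, inside the long-running ELT process:
DROP VIEW eligible_members;
CREATE TABLE eligible_members AS
SELECT member_id, plan_code
FROM   members
WHERE  termination_date IS NULL;
```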

Just as case review has its optimized (case-grained) table to handle most requests, measure explorer (as well as similar chart-driven screens such as report home) leverages a dimensional table, automatically created during the report run, which is a cube across every field designated on the measure as analytic. This table is often much faster to query and is used whenever possible. As with case review, there are circumstances where it is impossible to use this optimized table, such as when there is a filter or split on a non-analytic field; in that case Ursa Studio automatically routes the query to the raw measure table.
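
Here is a sketch of what such a dimensional table might contain, assuming a hypothetical measure with two analytic fields; the actual structure is generated automatically by Ursa Studio.

```sql
-- Hypothetical pre-aggregated cube across the measure's analytic fields.
CREATE TABLE measure_readmissions_cube AS
SELECT provider_group,                        -- analytic field (invented name)
       discharge_quarter,                     -- analytic field (invented name)
       COUNT(*)            AS denominator,
       SUM(numerator_flag) AS numerator       -- assumes a 0/1 numerator flag
FROM   measure_readmissions
GROUP BY provider_group, discharge_quarter;

-- A chart split by provider_group is then served from the much smaller cube:
SELECT provider_group,
       SUM(numerator) * 1.0 / SUM(denominator) AS rate
FROM   measure_readmissions_cube
GROUP BY provider_group;
```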

By choosing the "fully collapsed" configuration optimization for a measure in Measure Workshop, users can designate a measure's table to exist, in effect, only as this dimensional cube. That designation typically saves database storage space as well as report run time, but makes case review and non-analytic filters impossible.

In short, Analytics Portal makes a lot of routing decisions to use the most efficient possible queries for user requests. Users can "peek underneath the hood" by opening https://your.ursa.url/RequestHistory/quickPrint in a new tab to see these decisions for all the requests they've made in the current day. The request history page shows whether the request was handled from a cache, or from an optimized table, or from the raw measure table, and shows the SQL that was executed against the client database. For RDBMSes that support it, this page also shows the query plan.

As noted above, data imports, ELT pipelines, and report runs are the most processor-intensive workloads managed by Ursa Studio, and often tie up resources on the client database. For help troubleshooting ELT performance, see How to troubleshoot ELT performance in Ursa Studio and How to perform brute force troubleshooting in Ursa Studio.

