Validate aggregate measure results against a gold standard

Background and Strategy

"Happy families are all alike; every unhappy family is unhappy in its own way."

-- Leo Tolstoy, Anna Karenina

Integrating a new data source, like any significant data engineering effort, involves hundreds of small decisions and implementation steps. Each of these is an opportunity for failure, and, unhappily, a single failure anywhere along the way can produce catastrophically inaccurate results.

However, a corollary to this is that if the final results of an integration effort can be shown to exactly match a well-vetted, accurate "gold standard", it is usually safe to conclude that the implementation has been fully successful, at least in the portion of the data flow involved in generating the result in question. Like Tolstoy's happy families, successful implementations avoid the thousands of potential mishaps that might distinguish them, producing the same results as every other successful implementation.

Given this, a best practice for every integration effort is to obtain as many gold standard aggregate statistics as possible for a data source, develop the means to produce the same statistics in Ursa Studio, and then compare the results.

For example, when integrating a claims data package, it's not unusual for the originating organization to have reporting capabilities to produce gold standard aggregate, population-wide statistics like member counts; per-member-per-month spending; and counts (or per-member rates) of hospital inpatient admissions, ED visits, and office visits. These basic measures -- and many others -- are included in the Ursa Health Population Health Foundations (PHF) module that is typically included with each Ursa Studio license, meaning no significant new development should be needed to generate comparison results.
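As an informal cross-check outside of Ursa Studio, a few of these statistics can also be computed directly from the raw extract and compared against the gold standard report. The sketch below is purely illustrative: the eligibility and claims tables, and column names such as member_id, month, and paid_amount, are hypothetical stand-ins for whatever the actual data source provides.

    import pandas as pd

    # Hypothetical eligibility and claims extracts; real column names vary by source.
    eligibility = pd.DataFrame({
        "member_id": ["A", "A", "B", "B", "C"],
        "month":     ["2022-01", "2022-02", "2022-01", "2022-02", "2022-01"],
    })
    claims = pd.DataFrame({
        "member_id":   ["A", "B", "B", "C"],
        "month":       ["2022-01", "2022-01", "2022-02", "2022-01"],
        "paid_amount": [120.0, 450.0, 80.0, 1500.0],
    })

    # Population-wide statistics of the kind often available as gold standards.
    member_count  = eligibility["member_id"].nunique()
    member_months = len(eligibility)            # one eligibility row per member per month
    total_paid    = claims["paid_amount"].sum()
    pmpm          = total_paid / member_months  # per-member-per-month spending

    print(f"Members: {member_count}  Member-months: {member_months}  PMPM: {pmpm:.2f}")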

If and when discrepancies are found, the first step should be to double-check the concordance of time periods and exclusions between the two sets of results. If possible, start with the narrowest cohort definition available in the gold standard to minimize the opportunity for misalignment. If denominator and numerator values are broken out in the gold standard, break those out in the Ursa Studio results as well, which might help isolate the source of the overall discrepancy. Once agreement is achieved for any subset of patients or observations -- no matter how narrowly defined -- expand the scope of the analysis step by step until the discrepancy reemerges, then carefully scrutinize the logic that was newly involved in that last expansion.
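For instance, if both sets of results can be broken out by some cohort slice (product line, say), a simple slice-by-slice comparison of numerators and denominators can pinpoint where the disagreement lives. The sketch below is a hypothetical illustration of that idea; the slice names and values are made up.

    import pandas as pd

    # Hypothetical numerator/denominator breakouts for the same measure,
    # one row per cohort slice; names and values are illustrative only.
    gold = pd.DataFrame({
        "slice":       ["Commercial", "Medicare", "Medicaid"],
        "denominator": [12000, 8000, 5000],
        "numerator":   [950, 1100, 400],
    })
    ursa = pd.DataFrame({
        "slice":       ["Commercial", "Medicare", "Medicaid"],
        "denominator": [12000, 8000, 5210],
        "numerator":   [950, 1100, 415],
    })

    # Agreement on the narrow slices isolates the discrepancy to the logic
    # that only the disagreeing slice exercises.
    merged = gold.merge(ursa, on="slice", suffixes=("_gold", "_ursa"))
    merged["denom_diff"] = merged["denominator_ursa"] - merged["denominator_gold"]
    merged["num_diff"]   = merged["numerator_ursa"] - merged["numerator_gold"]
    mismatches = merged[(merged[["denom_diff", "num_diff"]] != 0).any(axis=1)]
    print(mismatches[["slice", "denom_diff", "num_diff"]])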

The process of flushing out the root cause of these discrepancies can be as much an art as a science, but the process described above -- starting small, finding a kernel of agreement, and slowly building out from there -- is as reliable a way as any to go about the task.

Detailed Implementation Guidance

  1. It can take some time to clear the bureaucratic hurdles involved in obtaining gold standard results; a best practice is to request this type of information early in the integration project, well before it is actually needed.

  2. Generally, nearly any aggregate statistics, if known to be accurate, can be productively put to use as a gold standard for validation purposes. (It will likely be much easier to produce a matching measure in Ursa Studio than it will be for the data source or other authority to generate and verify a new set of gold standard results.) The request for gold standard materials can and should therefore be quite broad and open-ended -- essentially, for whatever is on hand.

  3. Create a dedicated report for the gold standard review, and memorialize the Analytics Portal configuration needed to produce the matching results in a board. The board should also include, as comments, the respective values from the gold standard report, so that in the future the report can be rerun and used to validate that the integration logic is still sound. (And set the report to Prevent Passive ELT to avoid unnecessary runs during routine ELTs.)
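As a purely illustrative sketch of the kind of recheck this supports, the values memorialized in the board comments can later be compared against freshly rerun results, with a small tolerance for rounding. All names and numbers below are hypothetical.

    # Hypothetical memorialized gold-standard values (the numbers recorded in the
    # board comments) alongside freshly rerun Ursa Studio results.
    gold_standard = {"member_count": 25000, "pmpm": 312.40, "ed_visits_per_1000": 185.0}
    rerun_results = {"member_count": 25000, "pmpm": 312.40, "ed_visits_per_1000": 186.2}

    TOLERANCE = 0.001  # allow a 0.1% relative difference, e.g. for rounding

    for name, expected in gold_standard.items():
        actual = rerun_results[name]
        relative_diff = abs(actual - expected) / expected
        status = "OK" if relative_diff <= TOLERANCE else "MISMATCH"
        print(f"{name}: gold={expected}, rerun={actual} -> {status}")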

