Determine the grain size of each source data object
  • 03 Nov 2022
  • 5 Minutes to read
  • Dark
    Light

Determine the grain size of each source data object

  • Dark
    Light

Article summary

Background and Strategy

A data table is a simple structure: it has columns and rows. While it is common to receive a data dictionary that defines each column in a table, it is unusual to receive good documentation on its "grain size" -- what each row represents. Does this table contain claim headers or service line items? Tasks or subtasks (or both)? Transactions or final action claims? And what inclusion / exclusion criteria were used? Does the membership file include only patients with an active plan membership on the extract date, or those with a membership at any time in the extract's multi-year coverage period. It is often necessary to answer these questions oneself. This task discusses that exercise.

There is no single approach to recommend given the breadth of data one might encounter in an integration project. Generally speaking, sorting the data in a particular way, and looking at how the values in certain critical columns change or don't change, can reveal patterns that allow a clear interpretation of grain size to be made. For example, to tell the different between a patient table and an patient-periods table, sort by the patient identifier and observe whether there are multiple records for the same patient.

One specific (and common) grain size mystery, worth spending some time discussing here, is whether claim tables contain final action claims or transactions.

As background, claims data extracts typically take one of two forms: (1) “final action” data, in which each record represents the current state of a claim after accounting for all the updates applied to it so far; or (2) “transactional” data, in which each record represents a transaction establishing or updating the claim (or service line item on a claim).

It is common to receive claims data in transactional form, especially when receiving a data extract from a payor (whose administration systems are more oriented around processing transactions). However, because transactional claims data are more difficult for analysts to work with, and because most analytic use cases have no need to know about the transactional sausage-making that went in to producing the claim in its final form, the claims objects in the Ursa Health Core Data Model all use the “final action” standard. Consequently, the integration effort must often do the dirty work of reconciling transactional data to generate “final action” claims. This starts with an accurate understanding of the source data grain size.

A complicating factor in interpreting claims data is the structure of medical claims: a two-tier hierarchy consisting of a header (the parent) and one or more subordinate service line items (the children). (Note that pharmacy claims have no analogous structure; they are just singleton “claims”, without any need to specify whether a record is a “header” or “service line item”.) Because transactions for medical claims are used to modify both header-level and line-level information, transactional data for medical claims should typically be expected to take a similar form: a transaction header with one or more subordinate transaction line items (corresponding to each of the service line items on the claim it is modifying).

A good way to determine whether the data are transactional is to review the data and look for reversal transactions, which many administrative systems apply every time any modification is made to claim, making them a common marker of transactional data. The most visible indication that a record is a reversal is a negative number in an “amount” field – paid amount, unit count on a medical claim, days supply on a pharmacy claim, etc. There might also be a transaction type field – or something similarly named – that might identify reversals (along with other types of transactions), though the value be coded (e.g., “O” for original transactions, “R” for reversals, and “A” for adjustments).

More generally, any repetition of the same claim identifier (for pharmacy claims or header-level medical claims data) or repetition of the same claim identifier-service line number pair (for line-level medical claims data) suggests that the data are transactional.

How these values should be mapped into destination fields during semantic mapping is covered in a later task; for now, it is sufficient to reach a good understanding of the source data, and to document that understanding, either in the object description of the source data object containing the data, or in a thoughtfully configured Case Review memorialized in a board for the object.

Detailed Implementation Guidance

  1. Two (complementary) methods are recommended to document the grain size of each source data file. First, in simple cases it is probably sufficient to update the Object Description field in the Import object or Registered Table object. (E.g., "Contains one record per medical claim service line item."); second, if a particular useful configuration (sorting, filtering, ad hoc derived fields, etc.) has been set up in Case Review of an Import or Registered Table object, that configuration should be memorialized as a board.

Examples

Example 1: Claim service line items in final action format

Consider the following sample data:

pat_mrnclm_idlineproc_codepaid
110199123100
11027654350
1103123450
11114258634

The four records here represent the service line-level detail of 2 claims: two claim headers, the first claim (clm_id = 10) has 3 service line items, the second claim (clm_id = 11) has 1 service line item.


Example 2: Claim service line items in Example 1 as transactional data

The following table illustrates what a transactional version of the claims data in the example above might look like:

pat_mrnclm_idlineproc_codepaidtrx_clm_idtrx_detail_idtrx_typetrx_seq_no
110199123801001100101O1
110199123-801002100102R2
1101991231001003100103A3
110276543501001100201O1
110276543-501002100202R2
110276543501003100203A3
11031234501001100301O1
11031234501002100302R2
11031234501003100303A3
111142586341101110101O1
111287654501101110201O1
111142586-341102110102R2
111287654-501102110202R2
111142586341103110103A3

In this example, the first claim appears to have gone through two rounds of revision following the initial submission, for a total of three header transactions (trx_clm_id = {1001, 1002, 1003}), each with three "child" service line item transactions.

The second claim also has three transaction headers, but one of the two service line items on the original claim (for proc_code = 87654) appears to have been reversed (with trx_detail_id = 110202) but not reinstated in the subsequent transaction (trx_clm_id = 1103), leaving the claim with only one service line item (for proc_code = 42586) in its final action status.


Was this article helpful?