Determine the grain size of each source data object

Updated on 03 Nov 2022

Background and Strategy

A data table is a simple structure: it has columns and rows. While it is common to receive a data dictionary that defines each column in a table, it is unusual to receive good documentation on its "grain size", that is, what each row represents. Does this table contain claim headers or service line items? Tasks or subtasks (or both)? Transactions or final action claims? And what inclusion / exclusion criteria were used? Does the membership file include only patients with an active plan membership on the extract date, or those with a membership at any time in the extract's multi-year coverage period? It is often necessary to answer these questions oneself. This task discusses that exercise.

There is no single approach to recommend given the breadth of data one might encounter in an integration project. Generally speaking, sorting the data in a particular way, and looking at how the values in certain critical columns change or don't change, can reveal patterns that allow a clear interpretation of grain size to be made. For example, to tell the difference between a patient table and a patient-periods table, sort by the patient identifier and observe whether there are multiple records for the same patient.
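The patient versus patient-periods check can also be done outside of a visual review. The following is a minimal sketch in Python, assuming the rows have already been loaded as dictionaries; the column names (patient_id, start_date, end_date) and the sample values are hypothetical and not taken from any real extract.

```python
from collections import Counter


def records_per_key(rows, key_columns):
    """Count how many records share each combination of key column values."""
    return Counter(tuple(row[col] for col in key_columns) for row in rows)


# Hypothetical sample rows: patient "2" appears twice.
rows = [
    {"patient_id": "1", "start_date": "2020-01-01", "end_date": "2020-12-31"},
    {"patient_id": "2", "start_date": "2020-01-01", "end_date": "2020-06-30"},
    {"patient_id": "2", "start_date": "2021-01-01", "end_date": "2021-12-31"},
]

counts = records_per_key(rows, ["patient_id"])

# Keys that occur more than once suggest the grain is finer than "one row per patient".
repeated = {key: n for key, n in counts.items() if n > 1}
print(repeated)  # {('2',): 2}
```

If `repeated` is empty, the table is plausibly one row per patient; if not, the repeated keys are a good place to start reading the data to see what distinguishes the rows (here, the coverage dates).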

One specific (and common) grain size mystery, worth spending some time discussing here, is whether claim tables contain final action claims or transactions.

As background, claims data extracts typically take one of two forms: (1) “final action” data, in which each record represents the current state of a claim after accounting for all the updates applied to it so far; or (2) “transactional” data, in which each record represents a transaction establishing or updating the claim (or service line item on a claim).

It is common to receive claims data in transactional form, especially when receiving a data extract from a payor (whose administration systems are more oriented around processing transactions). However, because transactional claims data are more difficult for analysts to work with, and because most analytic use cases have no need to know about the transactional sausage-making that went into producing the claim in its final form, the claims objects in the Ursa Health Core Data Model all use the “final action” standard. Consequently, the integration effort must often do the dirty work of reconciling transactional data to generate “final action” claims. This starts with an accurate understanding of the source data grain size.

A complicating factor in interpreting claims data is the structure of medical claims: a two-tier hierarchy consisting of a header (the parent) and one or more subordinate service line items (the children). (Note that pharmacy claims have no analogous structure; they are just singleton “claims”, without any need to specify whether a record is a “header” or “service line item”.) Because transactions for medical claims are used to modify both header-level and line-level information, transactional data for medical claims should typically be expected to take a similar form: a transaction header with one or more subordinate transaction line items (corresponding to each of the service line items on the claim it is modifying).

A good way to determine whether the data are transactional is to look for reversal transactions, which many administrative systems apply every time any modification is made to a claim, making them a common marker of transactional data. The most visible indication that a record is a reversal is a negative number in an “amount” field (paid amount, unit count on a medical claim, days supply on a pharmacy claim, etc.). There might also be a transaction type field, or something similarly named, that identifies reversals (along with other types of transactions), though the value might be coded (e.g., “O” for original transactions, “R” for reversals, and “A” for adjustments).
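The two reversal markers described above (a negative amount, or a reversal code in a transaction type field) can be screened for with a small helper. This is an illustrative sketch only: the column names paid and trx_type and the code "R" follow the examples later in this article, and a real extract may use different names and codes.

```python
def looks_like_reversal(row, amount_columns=("paid",),
                        type_column="trx_type", reversal_codes=("R",)):
    """Return True if any amount field is negative or the type code marks a reversal."""
    for col in amount_columns:
        value = row.get(col)
        # Skip missing or blank amounts rather than failing on them.
        if value not in (None, "") and float(value) < 0:
            return True
    return row.get(type_column) in reversal_codes


# Illustrative rows; amounts are strings, as they often are in a raw file.
rows = [
    {"clm_id": "10", "line": "1", "paid": "80", "trx_type": "O"},
    {"clm_id": "10", "line": "1", "paid": "-80", "trx_type": "R"},
    {"clm_id": "10", "line": "3", "paid": "0", "trx_type": "R"},
    {"clm_id": "10", "line": "1", "paid": "100", "trx_type": "A"},
]

flags = [looks_like_reversal(row) for row in rows]
print(flags)  # [False, True, True, False]
```

Note that the third row is caught only by its type code: a reversal of a zero-dollar line has no negative amount, which is one reason to check both markers where both are available.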

More generally, any repetition of the same claim identifier (for pharmacy claims or header-level medical claims data) or repetition of the same claim identifier-service line number pair (for line-level medical claims data) suggests that the data are transactional.
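One way to apply this repetition test is to try a series of candidate keys, from coarsest to finest, and see which is the first to identify each row uniquely. The sketch below assumes rows loaded as dictionaries and uses column names from the examples later in this article; the helper names are hypothetical.

```python
def is_unique_key(rows, key_columns):
    """Return True if no two rows share the same values in key_columns."""
    seen = set()
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        if key in seen:
            return False
        seen.add(key)
    return True


def first_unique_key(rows, candidates):
    """Return the first candidate key that is unique across rows, or None."""
    for key_columns in candidates:
        if is_unique_key(rows, key_columns):
            return key_columns
    return None


candidates = [("clm_id",), ("clm_id", "line"), ("clm_id", "line", "trx_clm_id")]

# Final action style rows: each claim/line pair appears once.
final_rows = [
    {"clm_id": 10, "line": 1}, {"clm_id": 10, "line": 2}, {"clm_id": 11, "line": 1},
]
# Transactional style rows: the same claim/line pair repeats across transactions.
trx_rows = [
    {"clm_id": 10, "line": 1, "trx_clm_id": 1001},
    {"clm_id": 10, "line": 1, "trx_clm_id": 1002},
    {"clm_id": 10, "line": 2, "trx_clm_id": 1001},
]

print(first_unique_key(final_rows, candidates))  # ('clm_id', 'line')
print(first_unique_key(trx_rows, candidates))    # ('clm_id', 'line', 'trx_clm_id')
```

A key that is unique in a sample is only evidence, not proof, of the grain; it is worth running the same check on the full extract before documenting the conclusion.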

How these values should be mapped into destination fields during semantic mapping is covered in a later task; for now, it is sufficient to reach a good understanding of the source data, and to document that understanding, either in the object description of the source data object containing the data, or in a thoughtfully configured Case Review memorialized in a board for the object.

Detailed Implementation Guidance

Two complementary methods are recommended to document the grain size of each source data file. First, in simple cases it is probably sufficient to update the Object Description field in the Import object or Registered Table object (e.g., "Contains one record per medical claim service line item."). Second, if a particularly useful configuration (sorting, filtering, ad hoc derived fields, etc.) has been set up in Case Review of an Import or Registered Table object, that configuration should be memorialized as a board.

Examples

Example 1: Claim service line items in final action format

Consider the following sample data:

pat_mrn	clm_id	line	proc_code	paid
1	10	1	99123	100
1	10	2	76543	50
1	10	3	12345	0
1	11	1	42586	34

The four records here represent the service line-level detail of two claims: the first claim (clm_id = 10) has three service line items, and the second claim (clm_id = 11) has one. Each (clm_id, line) pair appears exactly once, which is consistent with final action data at the service line item grain.

Example 2: Claim service line items in Example 1 as transactional data

The following table illustrates what a transactional version of the claims data in the example above might look like:

pat_mrn	clm_id	line	proc_code	paid	trx_clm_id	trx_detail_id	trx_type	trx_seq_no
1	10	1	99123	80	1001	100101	O	1
1	10	1	99123	-80	1002	100102	R	2
1	10	1	99123	100	1003	100103	A	3
1	10	2	76543	50	1001	100201	O	1
1	10	2	76543	-50	1002	100202	R	2
1	10	2	76543	50	1003	100203	A	3
1	10	3	12345	0	1001	100301	O	1
1	10	3	12345	0	1002	100302	R	2
1	10	3	12345	0	1003	100303	A	3
1	11	1	42586	34	1101	110101	O	1
1	11	2	87654	50	1101	110201	O	1
1	11	1	42586	-34	1102	110102	R	2
1	11	2	87654	-50	1102	110202	R	2
1	11	1	42586	34	1103	110103	A	3

In this example, the first claim appears to have gone through two rounds of revision following the initial submission, for a total of three header transactions (trx_clm_id = {1001, 1002, 1003}), each with three "child" service line item transactions.

The second claim also has three transaction headers, but one of the two service line items on the original claim (for proc_code = 87654) appears to have been reversed (with trx_detail_id = 110202) but not reinstated in the subsequent transaction (trx_clm_id = 1103), leaving the claim with only one service line item (for proc_code = 42586) in its final action status.
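To make the relationship between the two examples concrete, the sketch below collapses the Example 2 transactions into the Example 1 records. It is one possible reading of this particular sample, not the reconciliation logic used in the Ursa Health Core Data Model: it assumes trx_seq_no orders the transactions for each service line item, that an adjustment ("A") restates the full amount rather than a delta, and that a line whose latest transaction is a reversal ("R") is no longer on the claim. The function name is hypothetical.

```python
COLUMNS = ("clm_id", "line", "proc_code", "paid", "trx_type", "trx_seq_no")

# Example 2 data, limited to the columns used here.
RAW = [
    (10, 1, "99123", 80, "O", 1), (10, 1, "99123", -80, "R", 2), (10, 1, "99123", 100, "A", 3),
    (10, 2, "76543", 50, "O", 1), (10, 2, "76543", -50, "R", 2), (10, 2, "76543", 50, "A", 3),
    (10, 3, "12345", 0, "O", 1), (10, 3, "12345", 0, "R", 2), (10, 3, "12345", 0, "A", 3),
    (11, 1, "42586", 34, "O", 1), (11, 2, "87654", 50, "O", 1),
    (11, 1, "42586", -34, "R", 2), (11, 2, "87654", -50, "R", 2),
    (11, 1, "42586", 34, "A", 3),
]
transactions = [dict(zip(COLUMNS, values)) for values in RAW]


def final_action_lines(transactions):
    """Keep the latest transaction per claim/line; drop lines that end in a reversal."""
    latest = {}
    for trx in transactions:
        key = (trx["clm_id"], trx["line"])
        if key not in latest or trx["trx_seq_no"] > latest[key]["trx_seq_no"]:
            latest[key] = trx
    return [
        {col: latest[key][col] for col in ("clm_id", "line", "proc_code", "paid")}
        for key in sorted(latest)
        if latest[key]["trx_type"] != "R"  # assumption: "R" codes a reversal
    ]


for record in final_action_lines(transactions):
    print(record)
```

Under these assumptions the result has four records, matching Example 1: three lines for clm_id = 10 (paid 100, 50, and 0) and one line for clm_id = 11 (paid 34). Simply summing paid per line would not work here, because it could not distinguish the reversed line (proc_code = 87654) from the legitimately zero-paid line (proc_code = 12345).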

What's Next

Semantic Mapping