Determine the mappings for identifier fields
  • 17 Feb 2023
  • 9 Minutes to read
  • Dark
    Light

Determine the mappings for identifier fields

  • Dark
    Light

Article summary

Background and Strategy

As described earlier, semantic mapping is the process of using source data to generate field values that conform to the destination data model standards. While much of semantic mapping involves the straightforward renaming of fields, the generation of identifier fields -- that is, the primary keys of the tables or views associated with each object -- is a special case that can be quite complicated. This task addresses that special case.

First, some background on how identifiers are meant to be constructed in Ursa Studio.

The unique identifiers for each instance of an object are typically formed by concatenating the Source ID value of the record's source system of origin with the local identifier that had served as the primary key in that source system. For example, a patient with a local identifier of "A123" originating in a source system with Source ID = "EDW" might end up with an identifier of "EDW|A123". This construction effectively guarantees uniqueness even after records originating in different source systems are intermingled. (See Data Model Keys for more details.)

The role of semantic mapping in the construction of these data model keys is to map local identifier fields in the source data -- e.g., the field containing "A123" in the example above -- to the appropriate "Source Local" identifier -- fields with names like Source Local Patient ID, Source Local Billing Provider ID, and Source Local Plan ID intended to store the original identifiers used in the source system. (The data mastering process, which follows semantic mapping, will handle the merging of duplicative records and the assignment of the final data model key values -- e.g., the "EDW|A123" in the example above.)

A quick point of clarification: a field like Source Local Plan ID should be populated with the source system's identifier for the entity that meets the definition of a plan in the destination system, not necessarily what the source system calls a "plan". In other words, if a source system uses a column labelled "payor_id" to store the identifier for a member's specific insurance product -- i.e., what would be considered the Plan in the Ursa Health Core Data Model -- that source field should be mapped to Source Local Plan ID, not Source Local Payor ID, even though the word "payor" appears in its column name.

This convention can get especially confusing when dealing with transaction identifiers in transactional claims source data. In particular, it is common to encounter tables that use the term "claim_id" or "clm_number" (or equivalent) to uniquely identify transactions, rather than the claims they are reversing or adjusting; but a claim and a transaction are different concepts in the Core Data Model, and so despite the use of the word "claim" in the source system column header those transaction-level identifier values should not be mapped to Source Local Claim ID, but rather a field like Source Local Transaction ID.

The examples above all represent one-to-one mappings, which are relatively straightforward, differences in nomenclature notwithstanding. But another complicating circumstance is when the cardinality of the source and destination concepts is different, resulting in one-to-many, many-to-one, and other source-destination relationships. Special care is often needed to handle these situations appropriately, as described below.

For example, a source system might have two concepts, “Staff” (i.e., individual providers) and “Organizations” (i.e., organizational providers), that both map onto the destination data model’s concept of “Provider” (i.e., individuals or organizations involved in providing health care); in this example, information about both “Staff” and “Organizations” should be mapped into the destination data model’s “Provider” fields – e.g., Provider NPI, Provider Description, etc. – during semantic mapping. This is a many-to-one mapping.

Now consider the practical implications of mapping these two source concepts into a single destination concept. Both the primary key of the Staff table and the primary key of the Organizations table must be mapped into the same destination field, Source Local Provider ID. This raises a potential problem: what if the original identifier value for a Staff record is the same as the original identifier value for an Organizations record? (This could easily happen, for example, if both tables are generated with incrementing integer identifier values.) To avoid the resulting collision, the original identifier values will need to be amended in some way that prevents these kinds of collisions. The standard remedy here is to prepend a token -- i.e., a character or short string of characters unique to the original concept, say “S” for Staff table records and “O” for Organization table records -- which guarantees uniqueness when those records are collected into a single destination object with a single identifier field.

In contrast, one-to-many mappings don’t have the same problem. For example, if a source system has a single table for all medical claims (i.e., both institutional and professional claims) that is being conformed to a data model with separate tables for institutional and professional claims, there is no need to decollide the primary key values in the source system’s medical claims table. That is, those primary keys can be directly mapped, “as is”, into the destination Source Local Claim ID field.

Relatedly, it is not uncommon to encounter a “zero-to-one” relationship between source and destination system concepts. For example, a source system might not have any formal concept of a provider represented in the dataset – that is, it might have no dedicated Providers table, and provider information simply exists as fields on other tables (as when, say, Prescribing Provider NPI and Prescribing Provider Name fields appear on a pharmacy claims table). We are still able to, and should, populate the destination system’s Providers table under these circumstances, and therefore we must generate appropriate Source Local Provider ID values from scratch, as it were.

This is accomplished by simply imagining the existence in the source system of such a concept – in this case, a Providers table – with a primary key of whatever appropriate field or combination of fields is on hand – e.g., an NPI or TIN, or even a descriptive field like a name, if necessary.

(An especially challenging variant of this "zero-to-one" scenario occurs when a transactional claims data file arrives without claim-level identifiers. Potential solutions to this are discussed in detail in the next task, Determine the mappings for claims or billing transaction fields.)

In these “zero-to-one” scenarios it is often the case that the same pseudo-key-generating field is not always available on every record. For example, a claims table might have a Prescribing Provider NPI but a Pharmacy Provider TIN. These represent two different entities, obviously, and both of them must end up as entries in the destination Providers object, so they must both be assigned a different unique identifier, even though one identifier will be constructed from an NPI while the other will be constructed from a TIN. Just as in the many-to-one scenario above, to the extent different fields must sometimes be used to populate the Source Local Provider ID, we must prepend to these pseudo-key values a unique token describing their origin, say “NPI” or “TIN”.

To keep all these interpretations and tokens straight, it’s a good idea to document each such translation to a “Source Local” field, and the associated tokens, in a central location so that any other developers participating in the integration will generate Source Local identifier field values correctly. For example, a term for Source Local Provider ID can be created in the Namespace for that source's integration work, in which these tokens can be documented.

Finally, “one-to-zero” scenarios, in which there is a source system concept that has no analogue in the destination model, are common. This commonly happens when the source system data model is more normalized than the destination data model. For example, a source system might have an Addresses table, with fields like City and ZIP Code, and with primary key Address ID; but the destination data model has no distinct concept of an address. (Fields like City and ZIP Code are just additional fields on objects like Patients or Providers.)

To the extent these additional source tables contain useful information, they should be accessed and semantically mapped just like any other table, even though they are not destined to survive as a distinct concept in the Natural Object Layer. The plan will simply be to join to them in the integration process and obtain whatever information is needed, and then discard the keys used to effect those joins. To do this, a “Source Local” identifier – e.g., Source Local Address ID – will be needed. The twist here is that there will be likely be no “Source Local” identifier in any of the existing Semantic Mapping Templates, and so a template field for that new identifier will need to be added. (See task Create or extend Semantic Mapping Templates, which describes how to create such a template.)

Detailed Implementation Guidance

  1. When working with transactional claims data, a good practical method for identifying Source Local Claim ID and Source Local Claim Service Line Item ID fields is to sort records first by patient, then by service date, and then by any field that is likely to be distinctive to that claim or claim service line item and also unlikely to change over the course of any adjustments. (The billing provider is a decent option here, or, if working with line-level medical claims, the service line number, CPT/HCPCS code, or revenue center code.) This sorting should cluster transactions for the same claim or claim service line item together, allowing a quick visual scan of the records to pick out identifier fields containing the same value over the course of the sequence of transactions (i.e., the Source Local Claim ID and Source Local Claim Service Line Item ID fields) as well as those that change.

  2. The standard Semantic Mapping Templates are stocked with a wide variety of the Source Local identifier fields discussed above.

  3. Document the conventions used to generate Source Local field values in Term entries in the namespace of the source system. Doing so will put the definitions of these fields at the fingertips of all users working with the source data. (Look for the small circled-question-mark icons next to field names throughout Ursa Studio; hovering over these will provide the field's term.)

  4. Note that in the Ursa Health Core Data Model the concept of a "provider" includes both individual and organizational providers, as well as non-clinical staff -- such as technicians, receptionists, etc. -- at a provider organization.

  5. Generally, NPIs are a reasonable identifier to use for providers, if available.

  6. Be careful of using Tax Identification Numbers (TINs) as unique identifiers for providers, especially individual providers. Often the TIN listed for an individual provider will actually identify the medical group or other parent organization invoved in billing for that individual provider.


Was this article helpful?