Data Model Keys
  • 16 Feb 2024
  • 2 Minutes to read
  • Dark
    Light

Data Model Keys

  • Dark
    Light

Article Summary

Data model keys are the fields that uniquely identify instances of most of the concepts represented by objects in the data model. They typically serve as the primary keys of the tables associated with these objects, and as the foreign keys used to link these objects to each another.

These fields are easily identifiable in the Ursa Health Core Data Model by their names, which always end in "ID". For example: Patient ID, Provider ID, Plan ID, EMR Encounter ID, etc. These fields are typically assigned to the Data Model Keys field group, which occupies the first position in the default field order; that is, when an object's contents are viewed in Case Review, the data model keys are usually the first columns on the left.

In the standard data integration process implemented in Ursa Studio, data model keys are constructed by concatenating the Source ID value of the record's source system with the value of a "source local key" field -- that is, an identifier that is locally unique in the source system. For example, a patient with a local identifier of "A123" originating in a source system with Source ID of "EDW" might end up with an identifier of "EDW|A123". This approach guarantees uniqueness in the Natural Object Layer of the data model and beyond, where data originating in different source systems are intermingled.

(Note that it's possible the record in the example above ends up with a Patient ID value other than "EDW|A123" due to master data management. For example, if data mastering finds that the patient associated with this record is the same patient as that represented elsewhere in the EDW system with local identifier "A012", the resulting record in the Patients object might take a Patient ID value of "EDW|A012", with the contents of that resulting Patients record representing the synthesis of patient features from both records.)

FAQ

Q: Why not generate entirely new ID values -- e.g., as incremental integer values -- when adding records to the data model?

A: There are a few rationales. First, in cases where data mastering isn't required, it's simpler to generate data model keys directly from source local keys, because there is no need to establish a centralized infrastructure that guarantees the newly created identifier is distributed to all tables that require it. Rather, the new data model key can simply be created "in place" from the local identifier.

Second, in cases where the source local key values are consistent over time, it maintains that consistency. There is often value in memorializing particularly interesting cases, and consistent identifiers makes this much easier.

Third, it makes it simpler to trace a record backwards in the data journey, from the data model or a report to the record's system of origin, without needing to reference a crosswalk table.

Q: Data model keys values might get long, depending on the length of the Source ID and source local key values; do these long strings degrade database performance when used in a join predicate?

A: Though it might vary between the database management systems supported by Ursa Studio, our understanding join field length has a negligible impact on join performance. Nonetheless, a best practice is to use short Source ID values of, ideally, 2 or 3 characters. This reduces any potential performance penalty and, more practically, makes case review easier by keeping the column width of data model key fields smaller.


Was this article helpful?

What's Next