- 08 Nov 2022
- 2 Minutes to read
- Print
- DarkLight
Determine which source tables contain information needed for mastering
- Updated on 08 Nov 2022
- 2 Minutes to read
- Print
- DarkLight
Background and Strategy
The first step of the mastering process, as implemented in Ursa Studio, is to gather together all records needed for mastering. This task describes how those records should be identified.
First, naturally, if a source system concept has its own dedicated file or table in the data package, the semantic mapping object used to map in that file or table should be included in the mastering. (Again, this is only for concepts for which a need to master has been positively identified in the previous task.)
However, it is common for data packages from a source system to be missing one or more of the dedicated dimension tables that you would typically expect to find in an appropriately normalized enterprise data warehouse. A common example of this involves provider information: in claims data extracts especially, providers are usually identified with “external” identifiers like NPIs rather than an “internal” Provider ID value, and consequently there is rarely a pressing need for a Provider table, and indeed a Provider table is rarely included in such extracts.
In these cases, the recommended approach is to include each record with any valid identifier – one of which should have been mapped to a “Source Local” identifier during semantic mapping – in the mastering. For example, if every professional claim includes Billing Provider NPI, Billing Provider TIN, and Billing Provider Description fields, and there is no Providers table in the extract, each professional claim should be included in the mastering.
Effectively, when mastering, we want every unique identifier value for a concept – found in any field in any table in the source data package – to be gathered up, because we want the resulting crosswalk table to include an entry for every possible local identifier value. Relatedly, we will eventually populate the respective natural object for each of these concepts in a similarly robust way, such that every natural objects should include an entry for every (mastered) instance of the concept. That is, to continue the example above, the Providers natural object should ultimately include an entry for every distinct billing provider referenced in at least one professional claim, institutional claim, or any other type of record.
Finally, even in cases where a dedicated dimension table for a concept is included in the package, it is prudent to adopt a precautionary approach and comb through the relevant fact tables to see if they contain identifier values that are missing from the dimension table. This happens more than you might expect. For example, a claim might reference an internal provider identifier value that doesn’t exist in the local provider file; if we only drew from the local provider file for mastering, we would fail to generate a proper Provider ID for that provider, leaving them in the sort of limbo state – present on the claim but not present in the Provider natural object – from which pernicious bugs often emerge.