Create and run Semantic Mapping objects

09 May 2023
9 Minutes to read

Print
Share
Dark
Light

Create and run Semantic Mapping objects

Updated on 09 May 2023
9 Minutes to read

Print
Share
Dark
Light

Article summary

Did you find this summary helpful?

Thank you for your feedback!

Background and Strategy

This task discusses the creation of the Semantic Mapping objects, putting the insights from the previous steps into use.

In addition to the straightforward mapping of source fields to template fields, there are a number of important configurations to consider. Many of the configuration options are self-explanatory; see the Ursa Studio product manual page for Semantic Mapping Objects for full reference details on how the controls on the screen work.

Additionally, some implementation best practices are described below.

1. Setting the object and table name values.

As described in an earlier task (see Determine what Semantic Mapping objects are needed), there is typically a one-to-one relationship between source data objects and semantic mapping objects. Because of this, it is recommended to name Semantic Mapping objects so that the connections to their respective source objects are easily identified. For example, if a source data object is named "Source X Medical Claims" (with table name, say, "sd_dataco_x_medical_claims"), the semantic mapping object would be named "Source X Medical Claims SM" (with table name "sm_dataco_x_medical_claims").

2. Setting Database Object Type to be table vs. view.

As with all objects in Ursa Studio, Semantic Mapping objects can be created as physical tables or views. The longer-term determination of which of these to use is discussed in a later task (see: Determine which Semantic Mapping and Local Transform objects should be views), but tables are the recommended option in the near term when frequent data review is likely to be needed.

3. Generating "Source Local" identifier values when one or more source fields might be NULL

An earlier task (see Determine the mappings for identifier fields) described the possibility of needing to generate "Source Local" identifiers from multiple source data fields. (For example, generating Source Local Provider ID values opportunistically from NPIs, TINs, or other provider identifiers, only some of which might be populated on any given source data record.) The recommended handling of this scenario is to create Intermediate Derived Fields using the Concatenate pattern (with the Returning NULL if one or more tokens are NULL option selected), with field names like Source Local Provider ID from NPI or Source Local Provider ID from TIN, and then use the Coalesce pattern to enforce the desired prioritization logic (for example, favoring NPI-derived identifiers over TIN-derived identifiers.

4. Accessing metadata from the Import object to generate important template field values.

Under normal circumstances, the Source ID field should always be populated during semantic mapping. Users can obtain the Source ID specified in an Import objects using the Import Metadata: Source ID option; for Registered Tables you will likely need to generate the Source ID "manually" using the Literal derived field pattern.

Another practically mandatory field -- useful as a way to characterize the freshness of the source data -- is the Source Data Effective Datetime field. The data effective date captured by the Import object can be invoked using the Import Metadata: Source Data Effective Date option.

It is sometimes useful to extract part of the source data file's filepath to populate template fields during semantic mapping. (For example, if the only information describing the period of health plan enrollment for patients listed in a particular file is a YYYYMM token in the filename itself.) Fields based on the filepath of the source data file can be generated by invoking Import Metadata: Filename, then applying the appropriate sequence of string functions, likely as Intermediate Derived Fields, before making the final assignment in the Template Fields panel.

5. Generating interval end dates.

An important semantic convention used throughout Ursa Studio is that intervals of time are denoted with an inclusive start date and an exclusive end date. For example, a full calendar year 2021 of health plan membership would have a start date of January 1, 2021 and an end date of January 1, 2022; the patient would be considered enrolled on January 1, 2021 and not enrolled on January 1, 2022.

(Note that the convention does not apply to event or encounter end dates. For example, if a claim covers a single date, say, January 1, the claim covered start and end dates will both be January 1 in the source data, and it is appropriate to leave the end date as January 1.)

Source data often use a different convention, i.e., inclusivity on both the period start date and period end date. (Under that convention, for example, a calendar year would have a start date of January 1 and an end date of Decmeber 31.) In those cases the semantic mapping must generate new, conformant end dates. This can easily be done using the Date Offset pattern.

6. Setting configurations in the Metadata panel.

Typically, the default visibility settings will be appropriate. (See Configure the object visibility settings for more details on object visibility settings.)

Complete the Case ID field if possible, and elect to Use Case Field for Primary Key. In addition to the benefits to data integrity, it will make Case Review faster, since the primary key can be used to paginate the results.

Setting the Temporal Class can be useful as a way to document the contents of the file, but it's not strictly needed for any technical purpose, and can be left blank if the object description is sufficiently informative as to the contents of the object.

You will likely not have generated a Patient ID value at this point (Semantic mapping typically generates only the Source Local Patient ID, with the final Patient ID value assigned after master data management, which occurs later), in which case the Patient ID control should be left blank.

7. Setting up Validation rules.

Users may add validation rules to Semantic Mapping Templates, and the pre-fab templates in the Ursa Health Core Data Model contain a number of relevant rules. (For example, the Professional Claim Service Line Item Fields template includes checks of correct string length for fields related to HCPCS codes, Place of Service codes, and NPIs.) These validation rules can be inherited by Semantic Mapping objects that invoke those templates selecting the Import Template Rules option in the Validation panel. The final set of rules to be applied to any given Semantic Mapping object can -- and typically should -- then be pared down to only those appropriate given the nature of the source data.

8. Cloning Semantic Mapping objects.

It's often the case that many of the configurations set for a Semantic Mapping object -- e.g., Mass-Remove Special Characters entries, Consider Values as Null values, some portion of the object and table names, etc. -- are portable to other Semantic Mapping objects associated with the same data source. Cloning an existing Semantic Mapping object is a convenient way to avoid some of the largely boilerplate setup that would otherwise be necessary. (Just be careful to remove settings that are not relevant.)

Examples

Example 1: Transactional medical claims

Consider the following source data file containing transactional medical claim service line item data:

pat_mrn	clm_id	line	proc_code	plan_paid	trx_clm_id	trx_detail_id	trx_type	trx_seq_no	dx_1	dx_2
1	10	1	99123	80	1001	100101	O	1	A010	Z0001
1	10	1	99123	-80	1002	100102	R	2	A010
1	10	1	99123	100	1003	100103	A	3	A010
1	10	2	76543	50	1001	100201	O	1	A010	Z0001
1	10	2	76543	-50	1002	100202	R	2	A010
1	10	2	76543	50	1003	100203	A	3	A010
1	10	3	12345	0	1001	100301	O	1	A010	Z0001
1	10	3	12345	0	1002	100302	R	2	A010
1	10	3	12345	0	1003	100303	A	3	A010
1	11	1	42586	34	1101	110101	O	1	A0109
1	11	2	87654	50	1101	110201	O	1	A0109
1	11	1	42586	-34	1102	110102	R	2	A0109
1	11	2	87654	-50	1102	110202	R	2	A0109
1	11	1	42586	34	1103	110103	A	3	A0109

After semantic mapping, the data might look like the following:

Source ID	Source Local Patient ID	Source Local Claim ID	Source Local Service Line Number	HCPCS Code	Increase to Plan Paid Amount	Source Local Transaction Header ID	Source Local Transaction Service Line Item ID	Transaction Type Operational ID	Is Reversal Transaction	Transaction Sequence Number	Diagnosis 1 ICD-10-CM Code	Diagnosis 2 ICD-10-CM Code
DS	MRN\|1	10	1	99123	80	1001	100101	O	0	1	A01.0	Z00.01
DS	MRN\|1	10	1	99123	-80	1002	100102	R	1	2	A01.0
DS	MRN\|1	10	1	99123	100	1003	100103	A	0	3	A01.0
DS	MRN\|1	10	2	76543	50	1001	100201	O	0	1	A01.0	Z00.01
DS	MRN\|1	10	2	76543	-50	1002	100202	R	1	2	A01.0
DS	MRN\|1	10	2	76543	50	1003	100203	A	0	3	A01.0
DS	MRN\|1	10	3	12345	0	1001	100301	O	0	1	A01.0	Z00.01
DS	MRN\|1	10	3	12345	0	1002	100302	R	1	2	A01.0
DS	MRN\|1	10	3	12345	0	1003	100303	A	0	3	A01.0
DS	MRN\|1	11	1	42586	34	1101	110101	O	0	1	A01.09
DS	MRN\|1	11	2	87654	50	1101	110201	O	0	1	A01.09
DS	MRN\|1	11	1	42586	-34	1102	110102	R	1	2	A01.09
DS	MRN\|1	11	2	87654	-50	1102	110202	R	1	2	A01.09
DS	MRN\|1	11	1	42586	34	1103	110103	A	0	3	A01.09

Note the following:

As described in Determine the mappings for identifier fields, source identifiers like pat_mrn and clm_id have been mapped to the appropriate "Source Local" identifier fields.
As described in Determine the mappings for claims or billing transaction fields, the transactional data has been properly interpreted, more notably:
- The identifiers for the transactional headers and line items accurately mapped to Source Local Transaction Header ID and Source Local Transaction Service Line Item ID, respectively;
- The paid_amt accurately assessed to represent the change in running total paid amount (and therefore mapped to Increase to Plan Paid Amount)
- The trx_type translated into the more robust binary flag, Is Reversal Transaction.
The diagnosis codes formatted to include the semantically conformant decimal.

Example 2: Monthly plan enrollment periods

Consider the following source data file from PayorCo providing a list of months of enrollment for each member.

member_id	member_month	policy_no	product_code
1	012021	A123456	GOLD4000
1	022021	A123456	GOLD4000
1	032021	A123456	GOLD4000
1	102021	A234567	GOLD6000
1	112021	A234567	GOLD6000
1	122021	A234567	GOLD6000

In this sample the patient appears to have a break in coverage starting in April 2021 and ending in October 2021 when coverage resumes (with a new insurance product).

After semantic mapping, the data might look like the following:

Source Local Patient ID	Source Local Payor ID	Source Local Plan ID	Period Start Date	Period End Date	Member Policy Number
1	PC	GOLD4000	1/1/2021	2/1/2021	A123456
1	PC	GOLD4000	2/1/2021	3/1/2021	A123456
1	PC	GOLD4000	3/1/2021	4/1/2021	A123456
1	PC	GOLD5000	10/1/2021	11/1/2021	A234567
1	PC	GOLD5000	11/1/2021	12/1/2021	A234567
1	PC	GOLD5000	12/1/2021	1/1/2022	A234567

Note the following:

Since there is no column identifying the payor sending the data (for understandable reasons -- it probably would have felt redundant to include in a file that only contains PayorCo data already), Source Local Payor ID is generated out of thin air, probably using a Literal pattern.
The product code is set to the Source Local Plan ID. The Ursa Studio definition of the term "plan" -- effectively, the package of health insurance benefits enrolled in by consumers -- is synonymous with "product" as used by many organizations.
The enrollment period dates use the standard Ursa Studio convention of inclusive start dates and exclusive end dates.

Was this article helpful?

What's Next

Validate the semantic mapping results