ELT Overview
Ursa Studio comes with its own integrated ELT tool, which can be accessed through the Object Workshop dashboard. Users choose objects for ELT via the checkboxes in the rightmost column of the objects list. Once a checkbox is checked, finer-grained options appear for selecting the scope of the ELT: users can include just the specific object, or also any of its upstream and downstream objects. Downstream views are automatically added when an object is built, even if the user has not chosen to run downstream objects. The Object Workshop dashboard shows a preview of the ELT before it runs.
Objects can be set to "Prevent Passive ELT", which prevents the object from being swept into an ELT via another object's "Run This and Upstream" or "Run This and Downstream" selection. The only way to include such an object in an ELT is to select it explicitly. This setting can be found in the Access and Use Restrictions panel.
For environments that support exporting to a storage container, users can set an ELT object to export to the storage container upon completion of that object's build via a checkbox in the ELT setup panel. These settings are saved as part of a saved ELT.
The preview panel contains an "Investigate ELT" icon button, which checks for two possible hygiene concerns among the objects in the ELT: first, objects that currently have validation errors; second, objects that have been updated since their last run. The presence of these warnings does not necessarily mean that the ELT will fail, but they are a good indicator of potential problems.
Users can choose to be notified via email, on a per-ELT basis, when the ELT completes or errors.
To run the ELT efficiently, the application determines the dependency tree of all the ELT objects and runs the ELT from start to finish, leveraging parallelism whenever possible. The ELT objects screen shows the progress of the ELT in real time. Validation warnings and errors are shown with each object after it finishes.
If any objects error, their downstream objects will not be built, and the ELT will be considered failed when it finishes. However, a validation error or a system error will not necessarily halt the ELT's progress: objects that are not downstream of the errored objects can still be built.
If the ELT finishes in an error status, or if it was manually canceled, the ELT can be resumed as a whole. In this case, only the objects that errored are rebuilt, along with those downstream objects whose ELT was stalled by the errors. Failed reports will also be rerun; when they are, only the particular measures that failed on each report are rerun, not the report as a whole. Users can also resume the ELT of a given failed object on a per-object basis while the rest of the ELT continues.
Users can also choose, on a per-object basis, how complete a rerun to perform: the full load, the post-load only, or validation only. Objects that failed during their main load must be rebuilt with the full load. Typically, an erroring object will have to be repaired to avoid failing again.
Users can trigger the running of any affected downstream reports at the end of an ELT via a checkbox in the ELT preview panel. A report is considered downstream of an object if it has any measures that reference that object. The reports are kicked off immediately after the ELT finishes and run one after another in sequence. If some of the ELT objects fail, the platform will only run those reports whose referenced objects have all been built successfully.
Users can exclude any particular report from being run after the ELT is complete via an "X" button alongside the report in the preview. These exclusions are preserved in saved ELTs. To reset the list of excluded reports, users can unselect and reselect the "Click to run affected reports" icon. Users can also exclude a report from being run post-ELT by setting the report's "Prevent Passive ELT" option.
An "In Progress" warning banner will appear on any object screen whose object is currently being built or is scheduled to soon be built as part of an underway ELT.
ELT Phases
The phases of an ELT for each object are as follows:
- Pre-load
  - Perform bespoke staging work, for incremental loads
  - Drop the existing table, for full loads
  - Run bespoke pre-SQL, if present
- Load
- Post-load
  - Run bespoke post-SQL, if present
  - Add indexes, if specified in Object Workshop
  - Run statistics, if supported by the client database
  - Grant permissions to the ursa_admin role
- Validation, if present
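As a rough illustration only, a full load of a simple object might translate into SQL along the following lines. The schema, table, column, and index names here are hypothetical, and the exact statements Ursa generates are not documented in this article.

```sql
-- Pre-load: drop the existing table (full loads only)
DROP TABLE IF EXISTS analytics.claim_line;

-- Load: materialize the object's generated query into a fresh table
CREATE TABLE analytics.claim_line AS
SELECT claim_id, patient_id, service_date, claim_amount
FROM staging.claim_line_raw;

-- Post-load: indexes, statistics, and permissions
CREATE INDEX ix_claim_line_patient_id ON analytics.claim_line (patient_id);
ANALYZE analytics.claim_line;
GRANT SELECT ON analytics.claim_line TO ursa_admin;
```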
Multiple ELT jobs can be run concurrently, and each can be monitored independently from the ELT screen. Ursa prevents collisions on ELT objects by prohibiting new ELTs that contain objects already involved in ongoing ELTs. If such a new ELT must be run, the existing ELT must first be cancelled.
Incremental Load
Incremental load is supported for certain object types. It has a two-part setup, both parts of which are managed in the Incremental Load panel of each object.
First, the upstream object must be set up to support incremental load. When an object is opted in to support incremental load, the user must identify the tracking field on that object. The tracking field must be a date or datetime field that represents the freshness of each row. The tracking field should never be null, and every value of the field in each new tranche of data must be more recent than the latest value of the field in all previous tranches. If no such field naturally presents itself, you can create a derived field using the "Instantiation Datetime" pattern and use that as the tracking field.
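Assuming the "Instantiation Datetime" pattern simply stamps each row with the time it was materialized, the derived field is conceptually equivalent to something like the following sketch (table and column names are illustrative):

```sql
-- Hypothetical sketch: stamp each row with the time it was loaded,
-- so every new tranche carries a strictly newer tracking value.
SELECT
    claim_id,
    claim_amount,
    CURRENT_TIMESTAMP AS instantiation_datetime  -- used as the tracking field
FROM staging.claim_line_raw;
```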
Import objects automatically support incremental load, using the load_end_datetime import metadata field as their tracking field.
Second, the downstream object must be selected to execute incremental load. Upon each successful load, the downstream object will keep a record of the latest value of the upstream tracking field, and subsequent incremental loads will only append the rows with a more recent value in the incoming data.
Even if an object is set up to support incremental load, it is possible to run a full load on the object by checking the "Run Full Load" checkbox for that object in the ELT setup screen. Indeed, the first time an incrementally loadable object is run, it necessarily receives a full load, because no most-recent tracking value has yet been recorded. Moreover, if the upstream object has changed its tracking field, or if the most recent ELT of the object failed, a full load will be run, with an eye toward supporting incremental load on the following run.
All objects that execute incremental load must have a primary key driven by the object's case field. This facilitates a deduplication step before the incremental data is appended to the table: any rows in the existing table whose primary keys appear in the updated tranche of data will be removed. A checkbox in the Incremental Load panel allows this step to be skipped as a performance optimization, if you are confident that the incoming data contains no updates to existing records. If you skip the deduplication step and the incoming data does contain updates to existing records, a primary key error will be thrown.
It is of course possible for an object to execute incremental load from its upstream objects while also supporting incremental load for its downstream objects. By chaining together incrementally loadable objects, it is possible to establish an entire ELT workflow using incremental load.
Object types that can support incremental load include: import objects, registered table objects, bespoke SQL objects, semantic mapping objects, integrators, and single stacks. Object types that can execute incremental load include: import objects, bespoke SQL objects, semantic mapping objects, and single stacks. Currently, only single stacks with a single upstream object can execute incremental load.
To illustrate the mechanics of incremental load, imagine an example in which an import object is set up to support incremental load, and its downstream semantic mapping object is set up to execute the incremental load. Because the upstream object is an import object, the load_end_datetime field is automatically used as the tracking field. The import object is run on Monday morning, and the semantic mapping object is run immediately thereafter. The semantic mapping object, because it is set up to execute incremental load, commits to memory the latest (max) value of load_end_datetime across the rows of the import object as being Monday at 11am.
The next day in this example, both objects are run again. The semantic mapping object first creates a staging table from the import object with the added restriction that rows with load_end_datetime before Monday at 11am are to be excluded. Then, it uses the staging table to perform deduplication, deleting any rows in the existing semantic mapping table whose case field is the same as any row in the staging table. Lastly, it appends the staging table onto the bottom of the semantic mapping table, and recommits to memory the new "latest" value of load_end_datetime across the incrementally loaded rows.
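In SQL terms, Tuesday's incremental run of the semantic mapping object might look roughly like the sketch below. The table names are illustrative, the case field is assumed to be claim_id, and the recorded tracking value is Monday at 11am; the actual statements Ursa issues may differ.

```sql
-- 1. Stage only the rows newer than the recorded tracking value
CREATE TEMP TABLE stg_claims AS
SELECT *
FROM analytics.claim_import
WHERE load_end_datetime > TIMESTAMP '2024-07-08 11:00:00';

-- 2. Deduplicate: delete existing rows whose case field reappears
DELETE FROM analytics.claim_semantic
WHERE claim_id IN (SELECT claim_id FROM stg_claims);

-- 3. Append the staged rows onto the existing table
INSERT INTO analytics.claim_semantic
SELECT * FROM stg_claims;

-- 4. Record the new "latest" tracking value for the next run
SELECT MAX(load_end_datetime) FROM stg_claims;
```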
It is important to understand the inherent limitations of incremental loading of data. The only means by which existing rows of data in the downstream table will be modified is via deletion during the deduplication step. This means that if a record has been removed from the upstream table, incremental load will not know to remove the record from the downstream table. In this case, users should opt for a full load to re-sync the downstream table with all the appropriate data.
Likewise, Object Workshop allows for sophisticated calculations that span records within a table. Imagine a table with a grain of one row per claim, with a derived field calculating the total number of claims across the table for the patient referenced in each claim. New incremental data might arrive that should affect this calculation, even for claims that are not themselves in the new tranche of data. Because existing records are never updated except during deduplication, those values would no longer be correct. In this case as well, users should opt for a full load to re-sync the downstream table with all the appropriate data.
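For instance, a derived field like the hypothetical patient_claim_count below spans every row for a patient, so appending new claims incrementally would leave the counts on untouched existing rows stale:

```sql
-- Hypothetical derived field: total claims per patient, computed across
-- the whole table. It is only correct when the whole table is rebuilt.
SELECT
    claim_id,
    patient_id,
    COUNT(*) OVER (PARTITION BY patient_id) AS patient_claim_count
FROM analytics.claim_semantic;
```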
Validation
Users can set up general-purpose and fully-customizable validation rules using the validation panel in Object Workshop. The rules are run in ELT post-processing, and can be run on any field in the object.
Validation rules take the form:
When <aggregation function> <optional filter clause> is <validation test> then throw an <error/warning>
Supported aggregation functions are:
- Count of records
- Distinct count
- Sum, min, max
Supported validation tests are:
- Is Zero or Is Not Zero
- Greater than or less than a value
- Greater than or less than a % of the total row count of the table
- Percentage change since last ELT
In other words, the aggregation function and the filter clause combine to generate a number, which might be the count of a certain type of record, or the maximum value of the field, or some other computed value. Then, the validation test decides if that number is valid, and if not, what severity of warning to throw.
For example,
- When <count of records> <where pat_age is less than 0> is <not zero> then throw an error
- When <the distinct count of plan IDs> (no filter) <has changed by more than 20% since the last ELT> then throw a warning
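Conceptually, the first example rule above reduces to a query like the following; the table and field names are hypothetical, and Ursa's generated SQL may differ.

```sql
-- When <count of records> <where pat_age is less than 0> is <not zero>
-- then throw an error
SELECT COUNT(*) AS rule_value
FROM analytics.patient
WHERE pat_age < 0;
-- If rule_value is not zero, the object is flagged with a validation error.
```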
Validation rules cannot be added to import objects, bespoke objects, or Ursa Standard objects.
New objects will automatically start with the following two validation rules:
- When the count of records is zero then throw a warning
- When the count of records has changed by over 20 percent since the last ELT then throw a warning
Users can drill down from a validation warning or error in the ELT screen directly into Data Review on the failed object, with the pertinent filters applied.
The validation panel for semantic mapping objects allows users to create validation fields. Any derived field pattern can be used as a validation field, and these validation fields can be used as an input into a validation rule.
Validation fields and validation rules can be added to semantic mapping templates, from where they can be imported into semantic mapping objects via a button in the validation panel of the semantic mapping object. Only validation fields and validation rules that reference mapped fields will be inherited.
Validation for Bespoke SQL Objects
Users can provide custom validation criteria in bespoke SQL objects that will be checked once the object has been fully refreshed. The validation specifications take the form of any number of semicolon-delimited SQL queries. Each query should return the following fields:
- is_error: 1 if this error merits turning the ELT bar red and stopping the ELT process, 0 otherwise.
- is_warning: 1 if this message merits turning the ELT bar orange. If both is_error and is_warning are set to 0, the message is considered purely informational. It will show up in the same place in the ELT bars, but it will not change the color of the bar.
- summary_text: description of the general class of error.
- detail_text: specifics of the particular error, with row-level detail.
There may be multiple detail texts for each summary text. In this case, the Admin is shown a list of the distinct summary texts on screen and can then drill down into their details.
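For example, a bespoke validation query that flags negative claim amounts might look like the following. The table and column names are illustrative, but the four output fields are the ones described above.

```sql
-- One of the semicolon-delimited validation queries. Each returned row
-- pairs one summary_text with row-level detail_text.
SELECT
    1 AS is_error,      -- turn the ELT bar red and stop the ELT
    0 AS is_warning,
    'Negative claim amounts found' AS summary_text,
    'claim_id ' || CAST(claim_id AS VARCHAR)
        || ' has amount ' || CAST(claim_amount AS VARCHAR) AS detail_text
FROM analytics.claim_semantic
WHERE claim_amount < 0;
```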
Object Backup
Users can choose to have an object maintain a backup of its most recently ELTed state. Backups can only be maintained on tables, not on views. Only one backup at a time is allowed, but users can "pause" the backups by unchecking the "Store Backup on Next ELT" checkbox, which keeps the current backup active. Within the Object Backup panel, users can see the name and timestamp of the current backup and can delete it. Backup tables can be browsed via the Data Review screen for each object.
Within the Data Review screen for objects with backups, users can elect to compare, on a row-by-row basis, the contents of the current object to its backup. When triggered, four new tabs become available within case review: (1) rows found in both the object and its backup with identical values across all comparable columns, (2) rows found in both with at least one difference in a comparable column, (3) rows that are in the object but not in the backup, and (4) rows that are only in the backup. The primary key of the object and its backup determines the contents of each tab, and only objects with an identified primary key can trigger the case comparison feature.
For the first two tabs, the backup columns are shown immediately adjacent to the object columns of the same name, and are highlighted if they have a different value. The "Not In Backup" tab only shows columns from the object, and the "Only In Backup" tab only shows columns from the backup. Filters on fields that are inapplicable to the current column set (for example, filters on object fields for the "Only In Backup" tab) are ignored.
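As a rough sketch, the four buckets could be derived with a full outer join on the primary key. Here claim_id stands in for the object's primary key, and the actual queries Ursa issues may differ.

```sql
-- Classify each row into one of the four comparison tabs
SELECT
    COALESCE(o.claim_id, b.claim_id) AS claim_id,
    CASE
        WHEN b.claim_id IS NULL THEN 'not in backup'
        WHEN o.claim_id IS NULL THEN 'only in backup'
        WHEN o.claim_amount IS DISTINCT FROM b.claim_amount
             THEN 'different'   -- at least one comparable column differs
        ELSE 'identical'        -- all comparable columns match
    END AS comparison_tab
FROM analytics.claim_semantic o
FULL OUTER JOIN analytics.claim_semantic_backup b
    ON o.claim_id = b.claim_id;
```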
Saved ELTs
Any ELT selection can be saved for future use. Note that what is saved is not the complete set of specific tables in the job but the set of instructions the user selected by means of the checkboxes. This makes the saved ELT more resilient to changes in the dependency tree: for example, if the saved ELT included all the downstream objects of a given object, running that saved ELT later will re-compute which objects are downstream at that time.
Users can rename, update or delete saved ELTs after creation.
ELT History
The statuses of the most recent ELTs are displayed in the Object Workshop dashboard, and users can drill down into the details of these jobs. From the detail screen, users can download a verbose server log of the ELT for troubleshooting. Data imports are also included in this list; even though the mechanics of imports are quite different on the server, they are presented in much the same way in the Object Workshop dashboard.
Users can see the ELT history of each object by drilling down into the object itself.