Create data exploration objects
- Updated on 01 Nov 2022
Background and Strategy
More than half of the developer time in a typical integration is spent reviewing fine-grained "case-level" data. In the earlier stages of an integration, the purpose of the review is to understand the source data; in the later stages, the purpose is to validate the results yielded by the integration logic.
Much of this case review can be done directly in the objects created to perform the integration -- i.e., using the Case Review feature in each object -- starting with the Import and Registered Table objects. However, circumstances often call for more power and flexibility than the original object's Case Review can afford. In scenarios like these, dedicated data exploration objects should be used to perform the more demanding investigations.
The solution is simple: create one or more dedicated objects for each integration whose sole purpose is to perform -- and potentially memorialize -- important investigations into the source data. As the integration proceeds, these objects can be cloned and reconfigured easily.
Once the integration is complete, these objects can be archived; alternatively, they can be labelled and configured to remain in place, but in the background, until a question arises that might benefit from a review of their contents.
Key Diagnostics / Heuristics
Does the data exploration require examining the relationship between two or more objects? If so, you'll need to use a data exploration object since Case Review doesn't allow joins.
Is Case Review timing out due to an especially complicated sequence of derived fields, filters, or sorting? You can replicate whatever Case Review logic you're attempting to perform in a new object, which can be run in an ELT (i.e., without any timeout).
Is Case Review timing out on even simple filter and/or sort configurations because the object has an enormous record count? You can use a data exploration object to create a much smaller sample of the enormous object by applying some appropriate restriction patterns.
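The three diagnostics above can be summarized as a small decision helper. This is only an illustrative sketch: the `ReviewSituation` class, its field names, and the `recommend` function are hypothetical and are not part of Ursa Studio; they simply restate the questions above in code form.

```python
from dataclasses import dataclass


@dataclass
class ReviewSituation:
    """Hypothetical description of a case-review task (illustrative names)."""
    needs_multiple_objects: bool      # must relate two or more objects
    times_out_on_complex_logic: bool  # heavy derived fields, filters, or sorting
    times_out_on_record_count: bool   # even simple filters or sorts time out


def recommend(situation: ReviewSituation) -> list[str]:
    """Return the reasons, if any, to create a data exploration object."""
    reasons = []
    if situation.needs_multiple_objects:
        # Case Review does not allow joins.
        reasons.append("join the objects in a data exploration object")
    if situation.times_out_on_complex_logic:
        # An ELT run is not subject to the Case Review timeout.
        reasons.append("replicate the logic in a new object and run it in an ELT")
    if situation.times_out_on_record_count:
        reasons.append("build a smaller sample using restriction patterns")
    # If no diagnostic applies, the original object's Case Review is enough.
    return reasons or ["stay in the original object's Case Review"]


if __name__ == "__main__":
    print(recommend(ReviewSituation(True, False, True)))
```

More than one diagnostic can apply at once, which is why the helper returns a list rather than a single answer.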
Detailed Implementation Guidance
Select the Prevent Passive ELT checkbox (found in the Access and Use Restrictions panel of all objects) to prevent data exploration objects from being inadvertently caught up in an ELT created using the Run This and Downstream option.
For the most part, a Single Stack object will provide all the configurability needed. In some scenarios, when the desired exploration might require the union of several objects, an Integrator object might be better. (It can also be convenient to create one of each.)
When working in Ursa Studio with others, it can be a good idea to identify the data exploration objects that you will be using with your initials, or some other personal identifier, and a name that cannot be confused with other objects. Ideally, there would be some platform-wide naming convention for these objects, such as "Test [Source ID] [developer initials] 001" (with table name test_[Source ID]_[developer initials]_001).
Similarly, assigning data exploration objects to the Data Diagnostics layer will make sure they are not visible to most other objects, so they won't be referenced inadvertently.
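As an illustration of the naming convention suggested above, the following sketch builds an object name and a matching table name from a source ID, a developer's initials, and a sequence number. The function name `exploration_object_names` is hypothetical, and two details are assumptions rather than platform rules: the sequence is zero-padded to three digits, and the table name is lowercased with non-alphanumeric characters collapsed to underscores. The sample values "ACME" and "JD" are made up.

```python
import re


def _slug(text: str) -> str:
    """Lowercase text and collapse non-alphanumeric runs to one underscore."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def exploration_object_names(source_id: str, initials: str,
                             sequence: int = 1) -> tuple[str, str]:
    """Return (object name, table name) following the suggested pattern.

    Assumed, not prescribed by the platform: three-digit zero padding and
    a lowercased, underscore-separated table name.
    """
    if not 1 <= sequence <= 999:
        raise ValueError("sequence must be between 1 and 999")
    suffix = f"{sequence:03d}"
    object_name = f"Test {source_id} {initials} {suffix}"
    table_name = f"test_{_slug(source_id)}_{_slug(initials)}_{suffix}"
    return object_name, table_name


if __name__ == "__main__":
    # Hypothetical source ID and initials.
    print(exploration_object_names("ACME", "JD", 1))
    # ('Test ACME JD 001', 'test_acme_jd_001')
```

Generating both names from the same inputs keeps the object name and its table name in step when objects are cloned and renumbered.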