Mining Data Vault Loading Views
I first heard about Data Vault (DV) in late 2011. As I learned about DV, I quickly realized its benefits in the realm of Data Warehousing (DW), however it was not until early 2012, after I had the opportunity to attend Dan Linsted’s 3-day DV “Boot camp” class, when I realized it’s true power. DV is flexible, scalable, auditable pattern-based data warehousing methodology that by design solves a myriad of “traditional” DW issues. More information on DV is available here as well as on LearnDataVault.com.
In a recent Data Vault (DV) implementation, we utilized views to encapsulate the logic for the core DV loading. The main benefit of this implementation was that the processing shifted from the Extract Transform Load (ETL) tool / engine to the Database (DB) engine. In addition, it simplified the ETL jobs design and development. Another benefit I “discovered” was that, when coupled with DB Metadata schema (and providing that DB & ETL naming conventions were utilized consistently), the DV Load View implementation enables extracting a great deal of useful data about the DV relations, dependencies, Source-To-Target (STT) mapping, the DV model, and even the related ETL jobs.
In the DV implementation described above, the DB engine was ORACLE, therefore I used ORACLE’s Metadata objects to extract the following data elements:
- DV Load View Name
- Staging Table Name (source)
- DV Table Name (target)
- Lookup Table Name(s)
- ETL Job Name (derived)
- DV Load View SQL
My initial goal was to be able to generate STT mapping documentation / specification to give to the ETL developers and later on to pass on as part of the Knowledge Transfer (KT) documentation to the Production Support team. Once I started looking at the data my query returned, I realized that there are even more applications for this data that I initially thought of. To name a few:
- Source-To-Target Mapping
- Finding Dependent Tables for a given Load View
- Validating DB Naming Conventions were properly used
- Validating Lookup Tables were properly referenced
- Finding which ETL Job(s) loads a given DV Target Table
- Finding which DV Target Tables are loaded by a given Source (Staging Table)
Having such query built can be beneficial when joining in a new project where little or no documentation about the DV implementation is available. With few small tweaks, the same logic can be used to “reverse engineer” the whole DV model’s STT Mapping. Another application for this query would be to assist ETL code generation (in conjunction with ETL code templates) as it already has the majority of the ETL job related data elements (source name, target name, job name etv) needed to generate a new job.
« Back to blog