Exploring Data Vault and Ensemble Modeling
In November 2018, I attended the PASS Summit, where I sat in on a session introducing innovative data modeling techniques — specifically Data Vault. I had no idea what this was or how it worked, but after the session I had a much clearer understanding and decided to explore it further. Through some research, I discovered the Certified Data Vault Data Modeler certification offered by the Genesee Academy in Golden, Colorado.
Earlier this week, I had the pleasure of attending that training. Hans Hultgren, who has been involved with Data Vault since its inception and serves on the consortium that defines and maintains its standards, led the course. The training spans three days and is available both at the Golden campus and at scheduled local events.
What Is Ensemble Modeling?
Ensemble Modeling has been around for some time and is now gaining traction in the enterprise data warehousing world. In this approach, the Core Business Concepts (CBC) are identified first and used as the foundation for modeling the warehouse. Once these concepts are established, the Naturally Occurring Business Relationships (NBR) between them — along with the context and history that describe them — are defined.
Using the Ensemble Logical Modeling (ELM) approach, a facilitator collaborates with business stakeholders to define these concepts. Because the model is aligned directly to the business, the resulting data warehouse is better positioned to answer business questions and support analytics.
Unified Decomposition
Ensemble Modeling relies on a process called Unified Decomposition, which breaks each business concept into three separate constructs: its keys, its associations (foreign keys), and its context. This logical separation forms the foundation of the Data Vault model.
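To make the separation concrete, here is a minimal sketch in Python. The source row, its field names, and the `decompose` helper are all hypothetical and exist only for illustration; they are not taken from the training material.

```python
# Hypothetical flat source row: an order placed by a customer.
source_row = {
    "customer_no": "C-1001",
    "order_no": "SO-555",
    "customer_name": "Acme Corp",
    "customer_city": "Golden",
    "order_total": 250.0,
}


def decompose(row):
    """Split one flat row into keys, associations, and context."""
    # Keys: the business keys that identify each concept.
    keys = {"customer": row["customer_no"], "order": row["order_no"]}
    # Associations: which key is related to which other key.
    associations = [("customer", row["customer_no"], "order", row["order_no"])]
    # Context: the descriptive attributes, grouped by the concept they describe.
    context = {
        "customer": {
            "customer_name": row["customer_name"],
            "customer_city": row["customer_city"],
        },
        "order": {"order_total": row["order_total"]},
    }
    return keys, associations, context


keys, associations, context = decompose(source_row)
```

Each of the three return values corresponds to one of the three constructs: the keys, the associations between them, and the context that describes them.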
The Three Core Objects of Data Vault
The Data Vault model consists of three primary components:
- Hubs (blue) – Represent the core business concepts. Each hub uniquely defines instances of those concepts across the enterprise.
- Links (green) – Represent the business relationships connecting hubs to each other.
- Satellites (yellow) – Contain the context that describes each business concept (attributes, changes, and history).
The hub contains:
- An enterprise-wide business key (often generated via concatenation or sequence)
- Metadata such as Load_DT_Stamp and RecordSource
- A primary key (sequence ID)
The link table holds:
- A unique ID for the relationship
- References to the hubs it connects
- Metadata columns
The satellite stores:
- Descriptive attributes and history about the related hub
- Load_DT_Stamp for version tracking
The satellite is where the data that would traditionally reside in the dimensions and facts of a dimensional model now lives.
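The three structures described above can be sketched as tables. This is only an illustration using Python's built-in sqlite3 module: the Load_DT_Stamp and RecordSource columns come from the description above, while the customer/order example, the table names, and every other column name are assumptions made for the sketch.

```python
import sqlite3

DDL = """
CREATE TABLE hub_customer (
    customer_id   INTEGER PRIMARY KEY,   -- primary key (sequence ID)
    customer_bk   TEXT NOT NULL UNIQUE,  -- enterprise-wide business key
    Load_DT_Stamp TEXT NOT NULL,         -- metadata: when the row was loaded
    RecordSource  TEXT NOT NULL          -- metadata: where the row came from
);
CREATE TABLE hub_order (
    order_id      INTEGER PRIMARY KEY,
    order_bk      TEXT NOT NULL UNIQUE,
    Load_DT_Stamp TEXT NOT NULL,
    RecordSource  TEXT NOT NULL
);
CREATE TABLE link_customer_order (
    customer_order_id INTEGER PRIMARY KEY,  -- unique ID for the relationship
    customer_id   INTEGER NOT NULL REFERENCES hub_customer(customer_id),
    order_id      INTEGER NOT NULL REFERENCES hub_order(order_id),
    Load_DT_Stamp TEXT NOT NULL,
    RecordSource  TEXT NOT NULL,
    UNIQUE (customer_id, order_id)
);
CREATE TABLE sat_customer (
    customer_id   INTEGER NOT NULL REFERENCES hub_customer(customer_id),
    Load_DT_Stamp TEXT NOT NULL,            -- one row per version of the context
    RecordSource  TEXT NOT NULL,
    customer_name TEXT,                     -- descriptive attributes
    customer_city TEXT,
    PRIMARY KEY (customer_id, Load_DT_Stamp)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# List the tables that were created, in alphabetical order.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
```

Note that the hubs hold no descriptive attributes and no foreign keys; the link holds only references; and the satellite holds only context, keyed by its hub and its load timestamp.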
Advantages of the Data Vault Approach
The Data Vault offers several advantages over other data warehousing models:
- Business-driven & agile – Built directly around business concepts and easily adaptable to change.
- Incremental growth – New concepts and sources can be added with minimal re-engineering.
- Simplified ETL – Once hubs are populated, links and satellites can be loaded in parallel for efficiency.
- Scalability – Growth simply means adding new hubs, links, or satellites.
- Flexibility – Supports integration with technologies like Hadoop or blockchain.
Because context is stored in satellites, changes are absorbed easily without disrupting core structures. This structure supports both enterprise-level analytics and data marts for more targeted reporting.
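As a rough illustration of how a change is absorbed, the sketch below adds context from a new source by creating a new satellite and leaves the hub untouched. It uses Python's sqlite3 module; the marketing source, the table names, and the segment column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE hub_customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_bk   TEXT NOT NULL UNIQUE,
    Load_DT_Stamp TEXT NOT NULL,
    RecordSource  TEXT NOT NULL
)""")


def hub_columns(conn):
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    return [row[1] for row in conn.execute("PRAGMA table_info(hub_customer)")]


columns_before = hub_columns(conn)

# A new (hypothetical) marketing source arrives later: only a satellite is added.
conn.execute("""
CREATE TABLE sat_customer_marketing (
    customer_id   INTEGER NOT NULL REFERENCES hub_customer(customer_id),
    Load_DT_Stamp TEXT NOT NULL,
    RecordSource  TEXT NOT NULL,
    segment       TEXT,
    PRIMARY KEY (customer_id, Load_DT_Stamp)
)""")

columns_after = hub_columns(conn)
hub_unchanged = columns_before == columns_after
```

The hub's columns are identical before and after, which is the point: new context means a new satellite, not an altered core table.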
Loading the Vault
The Data Vault loading process is highly efficient. Once hubs are populated, links and satellites can be loaded concurrently:
- Hubs – If the record exists, no action is taken; otherwise, a new record is inserted.
- Links – Loaded in parallel with satellites to record the relationships between hubs.
- Satellites – Loaded in groups based on data characteristics like rate of change or source type.
This structure supports parallelism, scalability, and agility in development.
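The hub rule above (insert only if the record does not already exist) can be sketched as follows, again with Python's sqlite3 module. The satellite rule shown here, appending a new version only when the context has changed, is an assumption on my part about a common pattern rather than something stated in the course notes. Table names, column names other than Load_DT_Stamp and RecordSource, and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_bk   TEXT NOT NULL UNIQUE,
    Load_DT_Stamp TEXT NOT NULL,
    RecordSource  TEXT NOT NULL
);
CREATE TABLE sat_customer (
    customer_id   INTEGER NOT NULL,
    Load_DT_Stamp TEXT NOT NULL,
    RecordSource  TEXT NOT NULL,
    customer_city TEXT,
    PRIMARY KEY (customer_id, Load_DT_Stamp)
);
""")


def load_hub(conn, business_key, load_dt, source):
    """Insert the business key only if it is not already in the hub."""
    conn.execute(
        "INSERT OR IGNORE INTO hub_customer "
        "(customer_bk, Load_DT_Stamp, RecordSource) VALUES (?, ?, ?)",
        (business_key, load_dt, source),
    )
    row = conn.execute(
        "SELECT customer_id FROM hub_customer WHERE customer_bk = ?",
        (business_key,),
    ).fetchone()
    return row[0]


def load_satellite(conn, customer_id, city, load_dt, source):
    """Append a new version only when the context differs from the latest one."""
    latest = conn.execute(
        "SELECT customer_city FROM sat_customer WHERE customer_id = ? "
        "ORDER BY Load_DT_Stamp DESC LIMIT 1",
        (customer_id,),
    ).fetchone()
    if latest is None or latest[0] != city:
        conn.execute(
            "INSERT INTO sat_customer VALUES (?, ?, ?, ?)",
            (customer_id, load_dt, source, city),
        )


# Three loads of the same customer; the city changes on the third.
for load_dt, city in [
    ("2018-11-06", "Golden"),
    ("2018-11-07", "Golden"),
    ("2018-11-08", "Denver"),
]:
    cid = load_hub(conn, "C-1001", load_dt, "crm")
    load_satellite(conn, cid, city, load_dt, "crm")

hub_rows = conn.execute("SELECT COUNT(*) FROM hub_customer").fetchone()[0]
sat_rows = conn.execute("SELECT COUNT(*) FROM sat_customer").fetchone()[0]
```

After the three loads the hub still holds a single row for the customer, while the satellite holds two versions: the original city and the changed one.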
The Training Experience
The Genesee Academy training was an excellent experience. Hans Hultgren demonstrated deep expertise and enthusiasm for the technique. The course can be taken in small groups in Golden, Colorado, or at client sites. Paige Baltzan and Erin Lynch also assisted with instruction and logistics.
Upon successful completion, participants receive the Certified Data Vault Data Modeler (CDVDM) credential. As of this writing, there are approximately 1,450 certified modelers worldwide.
References and Resources
Source Material
- Data Vault Modeling Certification 2018, Hans Hultgren, 2018, v.38
- Modeling the Agile Data Warehouse with Data Vault, Hans Hultgren, 2018