A data vault, warehouse, lake, and hub explained

Posted September 21, 2017
4 min read

Let’s cut right to the chase: you're reading this blog because you want expertise on storing data for your current data-driven business needs. Well, you've come to the right place! 

Throughout this post, we'll look at the definitions and sample use cases for data vaults, warehouses, lakes and hubs. The differences between them are subtle, but each serves a different purpose in today's data world.


What is a Data Vault?

Data Vault Definition

A data vault is a system made up of a model, a methodology and an architecture, designed specifically to solve a complete business problem as requirements change. As your business requirements evolve, the data vault maintains the historical record (an archive) of your data and relates it easily to the new data standard you have defined. I like to think of the data vault as a customized, dynamic solution that gives business users access to all data, both current and historical.

Data Vault Use Case

The biggest data vault use case is when a business, such as a bank, needs to audit its data.

Let’s say you need to update your security model to include additional fields and new applications in your enterprise. With a data vault, you can checkpoint the time you made the security model changes and update your infrastructure accordingly, including all associated applications. The business team then continues to receive the full view of historical and current information for the audit trail.
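To make the audit scenario concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes a simplified Data Vault layout: a hub table holding the business key, plus insert-only satellite tables holding descriptive attributes stamped with a load date. Real Data Vault models also use link tables and typically hashed keys, which are omitted here. The `user` entity, table names and columns are all hypothetical. When the security model changes, a new satellite is added instead of the old one being rewritten, so the full history stays queryable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hub: one row per business key (a hypothetical user), plus load metadata.
# Satellite v1: the original security attributes, insert-only, stamped with a load date.
conn.executescript("""
    CREATE TABLE hub_user (
        user_key  TEXT PRIMARY KEY,
        load_date TEXT NOT NULL,
        source    TEXT NOT NULL
    );
    CREATE TABLE sat_user_security (
        user_key  TEXT NOT NULL REFERENCES hub_user(user_key),
        load_date TEXT NOT NULL,
        role      TEXT,
        PRIMARY KEY (user_key, load_date)
    );
""")
conn.execute("INSERT INTO hub_user VALUES ('alice', '2017-01-01', 'hr_system')")
conn.execute("INSERT INTO sat_user_security VALUES ('alice', '2017-01-01', 'analyst')")

# The security model changes (new fields, new applications). Instead of altering
# or overwriting the old satellite, add a new one attached to the same hub.
conn.executescript("""
    CREATE TABLE sat_user_security_v2 (
        user_key    TEXT NOT NULL REFERENCES hub_user(user_key),
        load_date   TEXT NOT NULL,
        role        TEXT,
        mfa_enabled INTEGER,
        application TEXT,
        PRIMARY KEY (user_key, load_date)
    );
""")
conn.execute(
    "INSERT INTO sat_user_security_v2 VALUES ('alice', '2017-06-01', 'admin', 1, 'crm')"
)

# Audit trail: old and new security records for the same business key, in time order.
history = conn.execute("""
    SELECT load_date, role, NULL AS mfa_enabled, 'v1' AS model_version
    FROM sat_user_security WHERE user_key = :key
    UNION ALL
    SELECT load_date, role, mfa_enabled, 'v2'
    FROM sat_user_security_v2 WHERE user_key = :key
    ORDER BY load_date
""", {"key": "alice"}).fetchall()

print(history)  # [('2017-01-01', 'analyst', None, 'v1'), ('2017-06-01', 'admin', 1, 'v2')]
```

Because nothing is updated in place, the checkpoint is simply the load date at which the new satellite starts receiving rows.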


Sample technologies used today: RDBMS, Redshift, Snowflake

What is a Data Warehouse?

Data Warehouse Definition

A data warehouse is a consolidated, structured repository for storing data assets. Data warehouses typically model data in one of two ways: a star schema or third normal form (3NF). These are only the fundamental principles for structuring your data model. We have seen, advised on and implemented both, but they share one major flaw: everything must be strictly defined up front, in both the schema and the integration.


Data Warehouse Use Case

The most common use case for a data warehouse is to consolidate data in order to answer a business question, such as 'How many users are visiting my product pages from North America?' This ties the information you receive from your end users to a business question that must be answered from a structured data set. This is what most people would recognize as the cookie-cutter business intelligence solution.
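One way to answer the sample question is a star schema: a central fact table of events surrounded by descriptive dimension tables. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for a real warehouse such as Redshift or Snowflake; the table names, columns and rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive context for each visit.
    CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, continent TEXT NOT NULL);
    CREATE TABLE dim_page   (page_id   INTEGER PRIMARY KEY, page_type TEXT NOT NULL);

    -- Fact table: one row per page visit, pointing at the dimensions.
    CREATE TABLE fact_visit (
        visit_id  INTEGER PRIMARY KEY,
        user_id   INTEGER NOT NULL,
        region_id INTEGER NOT NULL REFERENCES dim_region(region_id),
        page_id   INTEGER NOT NULL REFERENCES dim_page(page_id)
    );

    INSERT INTO dim_region VALUES (1, 'North America'), (2, 'Europe');
    INSERT INTO dim_page   VALUES (10, 'product'), (11, 'blog');
    INSERT INTO fact_visit (user_id, region_id, page_id) VALUES
        (100, 1, 10), (100, 1, 10), (101, 1, 10), (102, 2, 10), (103, 1, 11);
""")

# "How many users are visiting my product pages from North America?"
(na_product_users,) = conn.execute("""
    SELECT COUNT(DISTINCT f.user_id)
    FROM fact_visit AS f
    JOIN dim_region AS r ON r.region_id = f.region_id
    JOIN dim_page   AS p ON p.page_id   = f.page_id
    WHERE r.continent = 'North America' AND p.page_type = 'product'
""").fetchone()

print(na_product_users)  # 2
```

The strictness mentioned above shows up here: a new attribute, such as device type, means changing the schema and the loading logic before the question can be asked.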



But there is an alternative approach that is becoming more popular, especially with cloud and other more powerful warehouses.

Organizations are adopting the ELT (extract, load, transform) approach. This means “staging” raw data in the warehouse itself (such as HP Vertica) and then letting the database engine perform the transformations that a separate ETL step would traditionally handle. Essentially, you perform the most expensive operations on the system that has the most resources.
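The ELT pattern can be sketched in a few lines. This is a hedged illustration rather than a CloverDX or HP Vertica workflow: Python's `sqlite3` stands in for the warehouse, and the CSV contents, table names and cleanup rules are hypothetical. Raw rows are loaded into a staging table exactly as they arrive, and all cleanup and typing is then done by the database with set-based SQL.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract: messy whitespace, inconsistent case, a missing amount.
RAW_CSV = """order_id,amount,country
1, 19.99 ,us
2,5.00,US
3,,de
"""

conn = sqlite3.connect(":memory:")

# Extract + Load: land the rows untouched in a staging table (everything as TEXT).
conn.execute("CREATE TABLE stg_orders (order_id TEXT, amount TEXT, country TEXT)")
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
conn.executemany("INSERT INTO stg_orders VALUES (:order_id, :amount, :country)", rows)

# Transform: the database engine does the cleanup and typing inside the warehouse.
conn.execute("""
    CREATE TABLE orders AS
    SELECT CAST(order_id AS INTEGER)  AS order_id,
           CAST(TRIM(amount) AS REAL) AS amount,
           UPPER(TRIM(country))       AS country
    FROM stg_orders
    WHERE TRIM(amount) <> ''
""")

totals = conn.execute(
    "SELECT country, ROUND(SUM(amount), 2) FROM orders GROUP BY country ORDER BY country"
).fetchall()
print(totals)  # [('US', 24.99)]
```

The Python side does no transformation at all; in a real warehouse, that is what lets the engine's resources do the heavy lifting.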


Sample technologies used today: RDBMS, Redshift, Snowflake, HP Vertica


What is a Data Lake?

Data Lake Definition

A data lake is a methodology for storing raw data in a single repository. The type of data stored in the lake does not matter: it could be structured, semi-structured, unstructured or binary. The fundamental idea of a data lake is to make all application data available so your data team can derive insights into a business problem or value proposition.

The challenge begins when you try to make sense of that data. If you are dumping everything into a data lake, how do you know which data you need and which you don't? How do you find where a given piece of data resides in the lake? Without careful management, a data lake can very quickly become a data swamp.
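Here is a minimal sketch of the raw-storage idea, and of the metadata that helps keep a lake from turning into a swamp. It assumes a plain local directory stands in for HDFS, S3 or Azure Data Lake. Files of any format are landed unchanged under a hypothetical `raw/<source>/<date>/` layout, each landing is recorded in a simple in-memory catalog, and structure is applied only when a consumer reads the file (often called schema-on-read). The paths, source names and catalog fields are illustrative assumptions, not a standard.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def land_raw(lake: Path, source: str, filename: str, payload: bytes, catalog: list) -> Path:
    """Store a file unchanged under raw/<source>/<ingest date>/ and record it in the catalog."""
    target = lake / "raw" / source / date.today().isoformat() / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)  # no parsing and no schema at write time
    catalog.append({
        "source": source,
        "path": target.relative_to(lake).as_posix(),
        "bytes": len(payload),
    })
    return target


catalog: list = []
with tempfile.TemporaryDirectory() as tmp:
    lake = Path(tmp)
    # Structured, semi-structured and binary data all land side by side.
    land_raw(lake, "crm", "contacts.csv", b"id,name\n1,Ada\n", catalog)
    land_raw(lake, "web", "clicks.json", b'{"user": 1, "page": "/pricing"}\n', catalog)
    land_raw(lake, "scanner", "invoice.pdf", b"%PDF-1.4\n", catalog)

    # Schema-on-read: the consumer locates the file via the catalog and parses it now.
    web_entry = next(e for e in catalog if e["source"] == "web")
    clicks = [json.loads(line) for line in (lake / web_entry["path"]).read_text().splitlines()]

print(sorted(e["source"] for e in catalog))  # ['crm', 'scanner', 'web']
print(clicks)  # [{'user': 1, 'page': '/pricing'}]
```

Without the catalog, answering "where does the click data live?" would mean crawling the whole lake, which is how swamps start.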

Data Lake Use Case

The use cases we see for creating a data lake revolve around reporting, visualization, analytics, and machine learning.


Here is the architecture we see evolving:

[Figure: the evolving data lake architecture]

Sample technologies used today: HDFS, S3, Azure Data Lake

What is a Data Hub?

Data Hub Definition

A data hub is a centralized system where data is stored, defined and served from. We like to think of it as a hybrid of a data lake and a data warehouse: like a lake, it provides a central repository where your applications can dump data, but it also adds a level of harmonization at ingest so the data is indexed and can easily be queried.

Note that this is not the same as a data warehouse architecture, as the ETL processing only indexes the data you have rather than mapping it into a strict structure. The challenge comes when you implement the data hub: how do you harmonize all of your siloed data sources?
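The difference between indexing at ingest and mapping into a strict schema can be sketched in plain Python. This is not how MarkLogic or any other product works internally; it is a hypothetical illustration in which each siloed source gets a small rule for extracting a shared key (here, a normalized email address), records are indexed under that key, and each original record is kept intact rather than forced into a target table structure. The source names, field names and key rules are all assumptions.

```python
from collections import defaultdict
from typing import Callable

Record = dict[str, object]

# Hypothetical per-source rules: where each silo keeps the shared key.
KEY_RULES: dict[str, Callable[[Record], str]] = {
    "crm": lambda r: str(r["Email"]).strip().lower(),
    "billing": lambda r: str(r["customer_email"]).strip().lower(),
}


def ingest(index: defaultdict, source: str, records: list[Record]) -> None:
    """Harmonize lightly at ingest: derive the common key, keep the record as-is."""
    for record in records:
        key = KEY_RULES[source](record)
        index[key].append({"source": source, "record": record})


index: defaultdict = defaultdict(list)
ingest(index, "crm", [{"Email": "Ada@Example.com ", "Name": "Ada Lovelace"}])
ingest(index, "billing", [{"customer_email": "ada@example.com", "plan": "pro"}])

# One lookup on the harmonized key returns every source's view of the customer.
view = index["ada@example.com"]
print([entry["source"] for entry in view])  # ['crm', 'billing']
```

Deciding on and maintaining those per-source key rules is exactly the harmonization challenge described above.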

Data Hub Use Case

In general, we see the same use cases for a data hub as we would for a data lake: reporting, visualization, analytics, and machine learning.


Sample technologies used today: MarkLogic

Conclusion

Hopefully, you have learned a little about each of these data models and the value each brings to dealing with multi-structured data.

At the end of the day, no single model or technology is superior to the others; the right choice depends on the use case.

This means that you must analyze your requirements, needs, and budget before deciding which approach to use. Technology is constantly evolving, and each of these models will evolve with it.
