Evolution
1980 - Data warehouse | Collect and store structured data to support refined analysis and reporting. |
2000 - Data lake | Collect and store raw data and conduct exploratory analysis. |
2021 - Data lakehouse | Unified platform that combines the benefits of data lakes and data warehouses. |
Aspect | Data Warehouse | Data Lake | Data Lakehouse |
Data Type | Structured, processed, and refined data | Raw data: structured, semi-structured, and unstructured | Combines raw and processed data |
Schema | Schema-on-write: Data is structured before storage | Schema-on-read: Structure applied when accessed | Flexible: Schema-on-read for raw data; schema-on-write for structured data |
Purpose | Optimized for business intelligence (BI), reporting, and predefined analytics | Designed for big data analytics, machine learning, and exploratory analysis | Unified analytics platform for BI, AI/ML, streaming, and real-time analytics |
Processing Approach | ETL: Data is cleaned and transformed before storage | ELT: Data is loaded first and transformed as needed | Both ETL and ELT; enables real-time processing |
Scalability | Less scalable and more expensive to scale | Highly scalable and cost-effective for large volumes of diverse data | Combines scalability of lakes with performance optimization of warehouses |
Users | Business analysts and decision-makers | Data scientists, engineers, and analysts | BI teams, data scientists, engineers |
Accessibility | More rigid; changes to structure are complex | Flexible; easy to update and adapt | Highly adaptable; supports schema evolution |
Security & Maturity | Mature security measures; better suited for sensitive data | Security measures evolving; risk of "data swamp" if not managed properly | Strong governance with ACID transactions; improved reliability |
Use Cases | Operational reporting, dashboards, KPIs | Predictive analytics, AI/ML models, real-time analytics | Unified platform for BI dashboards, AI/ML workflows, streaming analytics |
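The Processing Approach row can be made concrete with a small sketch. The records, the `transform` rule, and the function names below are hypothetical, chosen only to show the ordering difference: ETL cleans before storing, while ELT stores the raw rows and cleans them when they are queried.

```python
# Hypothetical raw records, as they might arrive from a source system.
RAW_ROWS = [
    {"id": "1", "amount": " 10.50 "},
    {"id": "2", "amount": "n/a"},  # dirty value that cannot be parsed
    {"id": "3", "amount": "7"},
]


def transform(row):
    """Clean one raw row; return None when the row cannot be repaired."""
    try:
        return {"id": int(row["id"]), "amount": float(row["amount"].strip())}
    except ValueError:
        return None


def etl(raw_rows):
    """ETL: transform first, then load only the cleaned rows."""
    store = []
    for row in raw_rows:
        clean = transform(row)
        if clean is not None:
            store.append(clean)
    return store


def elt(raw_rows):
    """ELT: load the raw rows untouched and expose a query-time cleaned view."""
    store = list(raw_rows)  # the raw copy is kept as-is

    def cleaned_view():
        return [c for c in (transform(r) for r in store) if c is not None]

    return store, cleaned_view


warehouse_rows = etl(RAW_ROWS)        # dirty row is dropped before storage
lake_rows, lake_view = elt(RAW_ROWS)  # dirty row is stored; filtered on read
```

In this sketch the ETL store never contains the unparseable row, whereas the ELT store keeps all three raw rows and can be re-transformed later with a different rule.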
Data warehouse 1980
Collect and store structured data to support refined analysis and reporting.
Pros | Cons |
Business intelligence | Struggles with spikes in data volume and velocity |
Analytics | Long processing times |
Structured data | No support for semi-structured and unstructured data |
Predefined schemas | Inflexible schemas |
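To illustrate why predefined schemas are both a strength and a limitation, here is a minimal schema-on-write sketch. The `SALES_SCHEMA` columns and the `write` helper are assumptions made up for the example, not part of any particular warehouse product: every row is checked against the schema before it is stored, so malformed rows and rows with new columns are rejected.

```python
# Hypothetical predefined schema: column name -> required Python type.
SALES_SCHEMA = {"order_id": int, "region": str, "amount": float}


class SchemaError(ValueError):
    """Raised when a row does not match the predefined schema."""


def validate(row, schema):
    """Reject rows with missing, extra, or wrongly typed columns."""
    if set(row) != set(schema):
        raise SchemaError(f"columns {sorted(row)} do not match {sorted(schema)}")
    for column, expected in schema.items():
        if not isinstance(row[column], expected):
            raise SchemaError(f"{column!r} must be {expected.__name__}")


def write(table, row, schema=SALES_SCHEMA):
    """Schema-on-write: validate first, store only if the row conforms."""
    validate(row, schema)
    table.append(row)


table = []
write(table, {"order_id": 1, "region": "EU", "amount": 99.0})

rejected = []
for bad in (
    {"order_id": 2, "region": "US"},                                # missing column
    {"order_id": 3, "region": "US", "amount": "12"},                # wrong type
    {"order_id": 4, "region": "US", "amount": 5.0, "coupon": "X"},  # new column
):
    try:
        write(table, bad)
    except SchemaError as exc:
        rejected.append(str(exc))
```

The stored data stays clean and predictable, which suits reporting, but accepting the new `coupon` column would require changing the schema first.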
Data lake 2000
Collect and store raw data and conduct exploratory analysis.
Pros | Cons |
Flexible data storage | Poor data reliability |
Streaming support | No transactional support |
Fast, cost-efficient storage in the cloud | Slow analysis performance |
Support for AI and ML | Data governance concerns (security, privacy) |
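The trade-off between flexible storage and poor reliability can be sketched with schema-on-read. The raw lines and the `read_with_schema` helper below are hypothetical: everything is stored exactly as received, and each reader applies its own structure at query time, silently skipping records that do not fit.

```python
import json

# Hypothetical raw events, landed in the lake exactly as received.
RAW_LINES = [
    '{"user": "ana", "clicks": 3}',
    '{"user": "bo", "clicks": "5", "device": "mobile"}',  # extra field, number as text
    'not json at all',                                    # stored anyway
]


def read_with_schema(lines, schema):
    """Apply `schema` (field -> caster) at read time; count unreadable records."""
    rows, skipped = [], 0
    for line in lines:
        try:
            record = json.loads(line)
            rows.append({field: cast(record[field]) for field, cast in schema.items()})
        except (json.JSONDecodeError, KeyError, TypeError, ValueError):
            skipped += 1
    return rows, skipped


# Two readers impose two different structures on the same raw data.
clicks, skipped = read_with_schema(RAW_LINES, {"user": str, "clicks": int})
devices, _ = read_with_schema(RAW_LINES, {"user": str, "device": str})
```

Nothing was rejected at write time, so the malformed third line is only discovered when someone reads it. That is the kind of reliability problem listed in the cons column.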
Data lakehouse 2021
Unified platform that combines the benefits of data lakes and data warehouses.
Pros | Cons |
ACID transactions and improved data reliability | Newer and less mature than data warehouses |
Unified platform for BI, AI/ML, and streaming analytics | More complex to design and operate |
Scalability of lakes with performance optimization of warehouses | Migrating existing warehouses and lakes takes effort |
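The comparison table credits the lakehouse with ACID transactions. One common way to get atomic writes on top of plain files is a commit log: data files are written first, then published together in a single log entry. The `ToyTable` class below is an in-memory toy model of that idea under assumed names, not the implementation of any specific table format.

```python
class ToyTable:
    """In-memory stand-in for a table made of data files plus a commit log."""

    def __init__(self):
        self.files = {}  # file name -> rows (written, possibly never committed)
        self.log = []    # each entry: tuple of file names published by one commit

    def commit(self, batches):
        """Write each batch as a file, then publish all of them in one log entry."""
        written = []
        for name, rows in batches.items():
            if any("id" not in row for row in rows):
                # Failing here leaves earlier files orphaned but unpublished.
                raise ValueError(f"batch {name!r} has a row without 'id'")
            self.files[name] = list(rows)
            written.append(name)
        self.log.append(tuple(written))  # the single append is the atomic step

    def read(self):
        """Readers only see rows from files referenced by the log."""
        return [row for entry in self.log for name in entry for row in self.files[name]]


table = ToyTable()
table.commit({"part-0": [{"id": 1}], "part-1": [{"id": 2}]})

try:
    # The second file is invalid, so the whole commit must stay invisible.
    table.commit({"part-2": [{"id": 3}], "part-3": [{"name": "no id"}]})
except ValueError:
    pass

visible = table.read()
```

After the failed commit, `part-2` exists as an orphaned file but is never referenced by the log, so readers still see only the rows from the first commit.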