
Revision as of 14:43, 8 April 2025

History

1980 - Data warehouse	Collects and stores structured data to support refined analysis and reporting.
2000 - Data lake	Collects and stores raw data for exploratory analysis.
2021 - Data lakehouse	Unified platform that combines the benefits of data lakes and data warehouses.

Aspect	Data Warehouse	Data Lake	Data Lakehouse
Data Type	Structured, processed, and refined data	Raw data: structured, semi-structured, and unstructured	Combines raw and processed data
Schema	Schema-on-write: Data is structured before storage	Schema-on-read: Structure applied when accessed	Flexible: Schema-on-read for raw data; schema-on-write for structured data
Purpose	Optimized for business intelligence (BI), reporting, and predefined analytics	Designed for big data analytics, machine learning, and exploratory analysis	Unified analytics platform for BI, AI/ML, streaming, and real-time analytics
Processing Approach	ETL: Data is cleaned and transformed before storage	ELT: Data is loaded first and transformed as needed	Both ETL and ELT; enables real-time processing
Scalability	Less scalable and more expensive to scale	Highly scalable and cost-effective for large volumes of diverse data	Combines scalability of lakes with performance optimization of warehouses
Users	Business analysts and decision-makers	Data scientists, engineers, and analysts	BI teams, data scientists, engineers
Accessibility	More rigid; changes to structure are complex	Flexible; easy to update and adapt	Highly adaptable; supports schema evolution
Security & Maturity	Mature security measures; better suited for sensitive data	Security measures evolving; risk of "data swamp" if not managed properly	Strong governance with ACID transactions; improved reliability
Use Cases	Operational reporting, dashboards, KPIs	Predictive analytics, AI/ML models, real-time analytics	Unified platform for BI dashboards, AI/ML workflows, streaming analytics
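
The Schema row above contrasts schema-on-write with schema-on-read. A minimal sketch of the difference follows, using plain Python lists as stand-ins for storage. The `SCHEMA` mapping, the function names, and the sample records are hypothetical and chosen only for illustration; they do not come from any specific warehouse or lake product.

```python
import json

# Hypothetical schema for illustration only; not taken from the table above.
SCHEMA = {"order_id": int, "amount": float}


def write_validated(store, record):
    """Schema-on-write: validate and coerce the record before it is stored."""
    row = {}
    for field, cast in SCHEMA.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        row[field] = cast(record[field])
    store.append(row)


def write_raw(store, payload):
    """Schema-on-read, write side: keep the payload exactly as received."""
    store.append(payload)


def read_with_schema(store):
    """Schema-on-read, read side: apply the structure only when queried."""
    rows = []
    for payload in store:
        record = json.loads(payload)
        try:
            rows.append({f: cast(record[f]) for f, cast in SCHEMA.items()})
        except (KeyError, ValueError, TypeError):
            continue  # malformed records are only discovered at read time
    return rows


warehouse, lake = [], []
write_validated(warehouse, {"order_id": "1", "amount": "9.5"})
write_raw(lake, '{"order_id": "1", "amount": "9.5", "note": "extra"}')
write_raw(lake, '{"order_id": "oops"}')
```

In this sketch the warehouse-style store rejects bad input up front, while the lake-style store accepts everything and defers the cost of validation to each reader. That trade-off is what the "Flexible" lakehouse entry tries to balance.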

@@ Line 1 : / Line 1 : @@
-= Evolution =
+= History =
 {| class="wikitable wtp"
 |-
@@ Line 32 : / Line 32 : @@
 |-
 | Use Cases || Operational reporting, dashboards, KPIs || Predictive analytics, AI/ML models, real-time analytics || Unified platform for BI dashboards, AI/ML workflows, streaming analytics
-|}
-== Data warehouse 1980 ==
-Collect and store structured data to provide support for for refined analysis and reporting.
-{| class="wikitable wtp"
-! Pros
-! Cons
-|-
-| Business Intelligence || Struggle with volume and velocity upticks
-|-
-| Analytics || Long processing time
-|-
-| Structured data || No support for semi-structured and unstructured data
-|-
-| Predefined schemas || Inflexible schemas
-|}
-== Data lake 2000 ==
-Collect and store raw data and conducting exploratory analysis.
-{| class="wikitable wtp"
-! Pros
-! Cons
-|-
-| Flexible data storage || Poor data reliability
-|-
-| Streaming support || No transactional support
-|-
-| Fast and cost efficient storage in the cloud || Slow analysis performance
-|-
-| Support for IA and ML || Data governance concerns (security, privacy)
-|}
-== Lake house ==
-{| class="wikitable wtp"
-! Pros
-! Cons
-|-
-| Flexible data storage || Poor data reliability
-|-
-| Streaming support || No transactional support
-|-
-| Fast and cost efficient storage in the cloud || Slow analysis performance
-|-
-| Support for IA and ML || Data governance concerns (security, privacy)
 |}