Databricks
Evolution
| Year | Technology | Description |
|---|---|---|
| 1980 | Data warehouse | Collect and store structured data to support refined analysis and reporting. |
| 2000 | Data lake | Collect and store raw data and conduct exploratory analysis. |
| 2021 | Data lakehouse | Unified platform that combines the benefits of data lakes and data warehouses. |
| Aspect | Data Warehouse | Data Lake | Data Lakehouse |
|---|---|---|---|
| Data Type | Structured, processed, and refined data | Raw data: structured, semi-structured, and unstructured | Combines raw and processed data |
| Schema | Schema-on-write: data is structured before storage | Schema-on-read: structure applied when accessed | Flexible: schema-on-read for raw data; schema-on-write for structured data (see the sketch after this table) |
| Purpose | Optimized for business intelligence (BI), reporting, and predefined analytics | Designed for big data analytics, machine learning, and exploratory analysis | Unified analytics platform for BI, AI/ML, streaming, and real-time analytics |
| Processing Approach | ETL: data is cleaned and transformed before storage | ELT: data is loaded first and transformed as needed | Both ETL and ELT; enables real-time processing |
| Scalability | Less scalable and more expensive to scale | Highly scalable and cost-effective for large volumes of diverse data | Combines the scalability of lakes with the performance optimization of warehouses |
| Users | Business analysts and decision-makers | Data scientists, engineers, and analysts | BI teams, data scientists, engineers |
| Accessibility | More rigid; changes to structure are complex | Flexible; easy to update and adapt | Highly adaptable; supports schema evolution |
| Security & Maturity | Mature security measures; better suited for sensitive data | Security measures evolving; risk of a "data swamp" if not managed properly | Strong governance with ACID transactions; improved reliability |
| Use Cases | Operational reporting, dashboards, KPIs | Predictive analytics, AI/ML models, real-time analytics | Unified platform for BI dashboards, AI/ML workflows, streaming analytics |
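
The schema-on-write versus schema-on-read distinction in the table above can be made concrete with a short PySpark example. This is only a minimal sketch, assuming a Databricks-style Spark environment; the table name, file path, and column names are illustrative assumptions rather than anything defined in this article.

```python
# Minimal sketch: schema-on-write vs. schema-on-read in PySpark.
# Assumes a Spark environment (e.g. Databricks); names and paths are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.getOrCreate()

# Schema-on-write (warehouse style): the structure is declared up front and
# enforced before the data is stored.
orders_schema = StructType([
    StructField("order_id", StringType(), nullable=False),
    StructField("customer_id", StringType(), nullable=False),
    StructField("amount", DoubleType(), nullable=True),
])
orders = spark.createDataFrame([("o-1", "c-42", 19.99)], schema=orders_schema)
orders.write.mode("overwrite").saveAsTable("sales_orders")  # hypothetical table name

# Schema-on-read (lake style): raw files are stored as-is and the structure is
# only inferred when the data is read back.
raw_events = spark.read.json("/mnt/raw/events/")  # hypothetical landing zone
raw_events.printSchema()
```
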
Data warehouse 1980
Collect and store structured data to support refined analysis and reporting, following an ETL (extract, transform, load) approach; a minimal sketch follows the table below.
| Pros | Cons |
|---|---|
| Business intelligence | Struggles with upticks in data volume and velocity |
| Analytics | Long processing times |
| Structured data | No support for semi-structured or unstructured data |
| Predefined schemas | Inflexible schemas |
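
The ETL pattern that warehouses rely on (clean and transform first, then load) can be sketched in PySpark roughly as follows; the staging path, column names, and target table are hypothetical and only serve to illustrate the ordering of the steps.

```python
# Minimal ETL sketch (warehouse pattern): transform before loading.
# Assumes a Spark environment; the source file, columns, and target table are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

raw = spark.read.csv("/mnt/staging/transactions.csv", header=True, inferSchema=True)

cleaned = (
    raw.dropna(subset=["transaction_id"])               # drop incomplete rows
       .withColumn("amount", F.col("amount").cast("double"))
       .filter(F.col("amount") > 0)                      # keep only valid amounts
)

# Load only the cleaned, structured result into the curated table.
cleaned.write.mode("append").saveAsTable("finance_transactions")
```
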
Data lake 2000
Collect and store raw data and conduct exploratory analysis, typically following an ELT (extract, load, transform) approach; a minimal sketch follows the table below.
| Pros | Cons |
|---|---|
| Flexible data storage | Poor data reliability |
| Streaming support | No transactional support |
| Fast and cost-efficient storage in the cloud | Slow analysis performance |
| Support for AI and ML | Data governance concerns (security, privacy) |
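
By contrast, the lake's ELT pattern lands raw data first and applies structure only when the data is queried. The sketch below makes the same assumptions as above (a Spark environment, with illustrative paths and field names).

```python
# Minimal ELT sketch (lake pattern): load raw data first, transform on demand.
# Assumes a Spark environment; paths and field names are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Load: land semi-structured events in the lake without reshaping them.
raw = spark.read.json("/mnt/ingest/clickstream/")
raw.write.mode("append").parquet("/mnt/lake/clickstream/")

# Transform at query time: structure is applied only when the data is read.
events = spark.read.parquet("/mnt/lake/clickstream/")
events.createOrReplaceTempView("clickstream")
daily_clicks = spark.sql("""
    SELECT to_date(event_time) AS day, count(*) AS clicks
    FROM clickstream
    GROUP BY to_date(event_time)
""")
daily_clicks.show()
```
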
Data lakehouse 2021
Unified platform that combines the benefits of data lakes and data warehouses (a minimal Delta Lake sketch follows the table below).
| Pros | Cons |
|---|---|
| Combines the scalability of data lakes with the performance of data warehouses | Newer approach; ecosystem and best practices are still maturing |
| ACID transactions and strong data governance | Relies on open table formats (such as Delta Lake) and careful data management |
| Flexible schemas with support for schema evolution | |
| Unified platform for BI, AI/ML, streaming, and real-time analytics | |
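
On Databricks, the lakehouse properties listed above (ACID transactions, schema evolution) come from storing tables in the open Delta Lake format. The following is a minimal sketch assuming a Delta-enabled Spark environment; the storage path and columns are illustrative.

```python
# Minimal lakehouse sketch using the Delta Lake format, which provides
# ACID transactions and schema evolution on top of cloud object storage.
# Assumes a Delta-enabled Spark environment (e.g. Databricks); names are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

orders = spark.createDataFrame([("o-1", 19.99)], ["order_id", "amount"])
orders.write.format("delta").mode("overwrite").save("/mnt/lakehouse/orders")

# Schema evolution: the new `currency` column is merged into the table
# instead of failing the append.
orders_v2 = spark.createDataFrame([("o-2", 5.0, "EUR")], ["order_id", "amount", "currency"])
(orders_v2.write.format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .save("/mnt/lakehouse/orders"))
```

Because every write goes through the Delta transaction log, concurrent readers see a consistent snapshot of the table, which is what the "strong governance with ACID transactions" entry in the comparison table refers to.
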