
"Databricks": difference between revisions

From Banane Atomic
No edit summary
Line 1: Line 1:
= Evolution =
= History =
{| class="wikitable wtp"  
{| class="wikitable wtp"  
|-
|-
Line 32: Line 32:
|-
| Use Cases || Operational reporting, dashboards, KPIs || Predictive analytics, AI/ML models, real-time analytics || Unified platform for BI dashboards, AI/ML workflows, streaming analytics
|}
== Data warehouse 1980 ==
Collect and store structured data to provide support for refined analysis and reporting.
{| class="wikitable wtp"
! Pros
! Cons
|-
| Business Intelligence || Struggle with volume and velocity upticks
|-
| Analytics || Long processing time
|-
| Structured data || No support for semi-structured and unstructured data
|-
| Predefined schemas || Inflexible schemas
|}
== Data lake 2000 ==
Collect and store raw data to conduct exploratory analysis.
{| class="wikitable wtp"
! Pros
! Cons
|-
| Flexible data storage || Poor data reliability
|-
| Streaming support || No transactional support
|-
| Fast and cost efficient storage in the cloud || Slow analysis performance
|-
| Support for AI and ML || Data governance concerns (security, privacy)
|}
== Data lakehouse ==
{| class="wikitable wtp"
! Pros
! Cons
|-
| Flexible data storage || Poor data reliability
|-
| Streaming support || No transactional support
|-
| Fast and cost efficient storage in the cloud || Slow analysis performance
|-
| Support for AI and ML || Data governance concerns (security, privacy)
|}
|}

Revision as of 8 April 2025 at 14:43

History

1980 - Data warehouse: Collect and store structured data to provide support for refined analysis and reporting.
2000 - Data lake: Collect and store raw data to conduct exploratory analysis.
2021 - Data lakehouse: Unified platform that combines the benefits of data lake and data warehouse solutions.
{| class="wikitable wtp"
! Aspect
! Data Warehouse
! Data Lake
! Data Lakehouse
|-
| Data Type || Structured, processed, and refined data || Raw data: structured, semi-structured, and unstructured || Combines raw and processed data
|-
| Schema || Schema-on-write: Data is structured before storage || Schema-on-read: Structure applied when accessed || Flexible: Schema-on-read for raw data; schema-on-write for structured data
|-
| Purpose || Optimized for business intelligence (BI), reporting, and predefined analytics || Designed for big data analytics, machine learning, and exploratory analysis || Unified analytics platform for BI, AI/ML, streaming, and real-time analytics
|-
| Processing Approach || ETL: Data is cleaned and transformed before storage || ELT: Data is loaded first and transformed as needed || Both ETL and ELT; enables real-time processing
|-
| Scalability || Less scalable and more expensive to scale || Highly scalable and cost-effective for large volumes of diverse data || Combines scalability of lakes with performance optimization of warehouses
|-
| Users || Business analysts and decision-makers || Data scientists, engineers, and analysts || BI teams, data scientists, engineers
|-
| Accessibility || More rigid; changes to structure are complex || Flexible; easy to update and adapt || Highly adaptable; supports schema evolution
|-
| Security & Maturity || Mature security measures; better suited for sensitive data || Security measures evolving; risk of "data swamp" if not managed properly || Strong governance with ACID transactions; improved reliability
|-
| Use Cases || Operational reporting, dashboards, KPIs || Predictive analytics, AI/ML models, real-time analytics || Unified platform for BI dashboards, AI/ML workflows, streaming analytics
|}
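
The schema-on-write versus schema-on-read distinction in the comparison above can be illustrated with a small, self-contained Python sketch. It is not Databricks code and uses no Databricks API: the `Warehouse` and `Lake` classes, the `SALES_SCHEMA` dictionary, and the sample records are all hypothetical names chosen for illustration. The warehouse-style store validates each record before storing it, while the lake-style store keeps raw payloads and applies the schema only when the data is read.

```python
import json

# Hypothetical schema for a "sales" dataset: column name -> type used to coerce values.
SALES_SCHEMA = {"order_id": int, "amount": float, "country": str}


def validate(record, schema):
    """Coerce a record to the schema; raise ValueError if the fields do not match."""
    missing = set(schema) - set(record)
    extra = set(record) - set(schema)
    if missing or extra:
        raise ValueError(
            f"schema mismatch: missing={sorted(missing)}, extra={sorted(extra)}"
        )
    return {name: cast(record[name]) for name, cast in schema.items()}


class Warehouse:
    """Schema-on-write: records are validated before they are stored."""

    def __init__(self, schema):
        self.schema = schema
        self.rows = []

    def write(self, record):
        # Bad records are rejected here, so stored rows are always clean.
        self.rows.append(validate(record, self.schema))

    def read(self):
        return list(self.rows)


class Lake:
    """Schema-on-read: raw payloads are stored as-is and parsed when queried."""

    def __init__(self):
        self.raw = []

    def write(self, payload):
        # No validation at write time: anything can land in the lake.
        self.raw.append(payload)

    def read(self, schema):
        rows = []
        for payload in self.raw:
            try:
                record = json.loads(payload)
                # Project onto the requested columns; extra fields are ignored.
                rows.append({name: cast(record[name]) for name, cast in schema.items()})
            except (ValueError, KeyError, TypeError):
                # Malformed or incomplete payloads only surface at read time.
                continue
        return rows


warehouse = Warehouse(SALES_SCHEMA)
warehouse.write({"order_id": 1, "amount": "19.90", "country": "FR"})

lake = Lake()
lake.write('{"order_id": 2, "amount": 5, "country": "DE", "note": "extra field"}')
lake.write("not json at all")

print(warehouse.read())
print(lake.read(SALES_SCHEMA))
```

In this sketch the lake accepts the malformed payload without complaint and drops it only at query time, which is one way to picture the "data swamp" risk listed in the table. A lakehouse, as described above, aims to keep the flexible storage while enforcing reliability guarantees on the structured layer.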