
Databricks

From Banane Atomic

History

1980 - Data warehouse: collects and stores structured data to support refined analysis and reporting.
2000 - Data lake: collects and stores raw data for exploratory analysis.
2021 - Data lakehouse: a unified platform that combines the benefits of data lakes and data warehouses.
| Aspect | Data Warehouse | Data Lake | Data Lakehouse |
|---|---|---|---|
| Data type | Structured, processed, and refined data | Raw data: structured, semi-structured, and unstructured | Combines raw and processed data |
| Schema | Schema-on-write: data is structured before storage | Schema-on-read: structure is applied when data is accessed | Flexible: schema-on-read for raw data, schema-on-write for structured data |
| Purpose | Optimized for business intelligence (BI), reporting, and predefined analytics | Designed for big data analytics, machine learning, and exploratory analysis | Unified analytics platform for BI, AI/ML, streaming, and real-time analytics |
| Processing approach | ETL: data is cleaned and transformed before storage | ELT: data is loaded first and transformed as needed | Supports both ETL and ELT; enables real-time processing |
| Scalability | Less scalable and more expensive to scale | Highly scalable and cost-effective for large volumes of diverse data | Combines the scalability of lakes with the performance optimizations of warehouses |
| Users | Business analysts and decision-makers | Data scientists, engineers, and analysts | BI teams, data scientists, and engineers |
| Flexibility | Rigid; changing the structure is complex | Flexible; easy to update and adapt | Highly adaptable; supports schema evolution |
| Security and maturity | Mature security measures; better suited for sensitive data | Security measures still evolving; risk of becoming a "data swamp" if not managed properly | Strong governance with ACID transactions; improved reliability |
| Use cases | Operational reporting, dashboards, KPIs | Predictive analytics, AI/ML models, real-time analytics | Unified platform for BI dashboards, AI/ML workflows, and streaming analytics |
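To make the schema-on-write versus schema-on-read row more concrete, here is a minimal, library-free Python sketch. It is not Databricks code: the class names `SchemaOnWriteStore` and `SchemaOnReadStore`, the `SCHEMA` dictionary and the sample order records are hypothetical, and real warehouses and lakes enforce schemas in their storage engines rather than with Python checks. The first store validates each record before accepting it (warehouse behaviour); the second keeps raw JSON strings untouched and applies the schema only when reading (lake behaviour).

```python
import json

# Hypothetical schema used by both stores: field name -> expected Python type.
SCHEMA = {"order_id": int, "amount": float}


def conforms(record: dict) -> bool:
    """True if the record has exactly the schema's fields, each with the expected type."""
    return set(record) == set(SCHEMA) and all(
        isinstance(record[name], expected) for name, expected in SCHEMA.items()
    )


class SchemaOnWriteStore:
    """Warehouse-style store (hypothetical): records are validated before they are stored."""

    def __init__(self) -> None:
        self.rows: list[dict] = []

    def write(self, record: dict) -> None:
        # Non-conforming data is rejected at ingestion time.
        if not conforms(record):
            raise ValueError(f"rejected at write time: {record!r}")
        self.rows.append(record)

    def read(self) -> list[dict]:
        # Everything stored already matches the schema, so reads need no checks.
        return list(self.rows)


class SchemaOnReadStore:
    """Lake-style store (hypothetical): raw payloads are kept as-is; the schema is applied on read."""

    def __init__(self) -> None:
        self.raw: list[str] = []

    def write(self, payload: str) -> None:
        # No parsing or validation at ingestion time.
        self.raw.append(payload)

    def read(self) -> list[dict]:
        rows = []
        for payload in self.raw:
            try:
                record = json.loads(payload)
            except json.JSONDecodeError:
                continue  # unparseable raw data stays in storage but is skipped here
            if isinstance(record, dict) and conforms(record):
                rows.append(record)
        return rows


warehouse = SchemaOnWriteStore()
warehouse.write({"order_id": 1, "amount": 9.5})
try:
    warehouse.write({"order_id": "2", "amount": 3.0})  # wrong type for order_id
except ValueError as err:
    print(err)

lake = SchemaOnReadStore()
for payload in ['{"order_id": 1, "amount": 9.5}', '{"order_id": 2}', "not json"]:
    lake.write(payload)  # all three payloads are accepted
print(len(lake.raw), len(lake.read()))  # 3 1
```

The trade-off from the table shows up directly: the warehouse-style store never holds bad rows but refuses data whose shape it does not expect, while the lake-style store accepts everything and leaves each reader to decide what to do with malformed records, which, left unmanaged, is how a lake drifts toward a "data swamp".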