Databricks

From Banane Atomic

Description

Databricks combines a data lakehouse with generative AI into a Data Intelligence Platform.
Generative AI makes it possible to query data in natural language and to optimize storage and costs based on past usage.

Components

Delta Lake

The storage layer of the data lakehouse. It provides:

  • ACID transactions
  • Scalable data and metadata handling
  • Audit history and time travel
  • Schema enforcement and evolution
  • Streaming and batch data processing
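To make a few of these features concrete, here is a minimal, purely illustrative sketch of an append-only commit log with all-or-nothing writes, schema enforcement, and time travel. The class `ToyVersionedTable` and its methods are hypothetical names invented for this example. It is a toy in-memory model, not Delta Lake's API or implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVersionedTable:
    """Toy in-memory table with an append-only commit log (NOT Delta Lake's API)."""

    schema: dict  # column name -> expected Python type (assumption for this sketch)
    _commits: list = field(default_factory=list)  # one list of rows per commit

    def commit(self, rows):
        """Validate every row first, then append the whole batch as one commit."""
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"columns {sorted(row)} do not match the schema")
            for name, expected in self.schema.items():
                if not isinstance(row[name], expected):
                    raise TypeError(f"column {name!r} expects {expected.__name__}")
        # Reached only if all rows are valid, so a batch is all-or-nothing.
        self._commits.append([dict(row) for row in rows])
        return len(self._commits)  # version number after this commit

    def read(self, version=None):
        """Return the rows as of `version` (latest if None); version 0 is empty."""
        upto = len(self._commits) if version is None else version
        return [row for batch in self._commits[:upto] for row in batch]

    def history(self):
        """Audit trail: (version, number of rows added) for each commit."""
        return [(v + 1, len(batch)) for v, batch in enumerate(self._commits)]


table = ToyVersionedTable(schema={"id": int, "name": str})
table.commit([{"id": 1, "name": "a"}])
table.commit([{"id": 2, "name": "b"}, {"id": 3, "name": "c"}])

try:
    # The second row has a string id, so the whole batch is rejected.
    table.commit([{"id": 4, "name": "d"}, {"id": "5", "name": "e"}])
except TypeError:
    pass  # the table is left unchanged

latest = table.read()
as_of_v1 = table.read(version=1)
```

Reading with `version=1` replays only the first commit, which is the idea behind time travel: older states stay reachable because commits are never rewritten in place.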

History

1980 - Data warehouse: collects and stores structured data to support refined analysis and reporting.
2000 - Data lake: collects and stores raw data for exploratory analysis.
2021 - Data lakehouse: unified platform that combines the benefits of data lakes and data warehouses.
| Aspect | Data Warehouse | Data Lake | Data Lakehouse |
|---|---|---|---|
| Data Type | Structured, processed, and refined data | Raw data: structured, semi-structured, and unstructured | Combines raw and processed data |
| Schema | Schema-on-write: data is structured before storage | Schema-on-read: structure applied when accessed | Flexible: schema-on-read for raw data; schema-on-write for structured data |
| Purpose | Optimized for business intelligence (BI), reporting, and predefined analytics | Designed for big data analytics, machine learning, and exploratory analysis | Unified analytics platform for BI, AI/ML, streaming, and real-time analytics |
| Processing Approach | ETL: data is cleaned and transformed before storage | ELT: data is loaded first and transformed as needed | Both ETL and ELT; enables real-time processing |
| Scalability | Less scalable and more expensive to scale | Highly scalable and cost-effective for large volumes of diverse data | Combines scalability of lakes with performance optimization of warehouses |
| Users | Business analysts and decision-makers | Data scientists, engineers, and analysts | BI teams, data scientists, engineers |
| Accessibility | More rigid; changes to structure are complex | Flexible; easy to update and adapt | Highly adaptable; supports schema evolution |
| Security & Maturity | Mature security measures; better suited for sensitive data | Security measures evolving; risk of "data swamp" if not managed properly | Strong governance with ACID transactions; improved reliability |
| Use Cases | Operational reporting, dashboards, KPIs | Predictive analytics, AI/ML models, real-time analytics | Unified platform for BI dashboards, AI/ML workflows, streaming analytics |
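The difference between schema-on-write and schema-on-read in the comparison above can be sketched in a few lines. This is an illustrative toy only: the sample records, the function names `write_with_schema` and `read_with_schema`, and the two-column schema are assumptions made for the example, not part of any Databricks API.

```python
import json

# Hypothetical raw events; the second one lacks the "amount" field.
RAW_EVENTS = [
    '{"user": "ana", "amount": "12.5"}',
    '{"user": "bob"}',
    '{"user": "cy", "amount": 3, "note": "extra field"}',
]


def write_with_schema(lines):
    """Schema-on-write: coerce and validate before storing; reject bad records."""
    stored, rejected = [], []
    for line in lines:
        record = json.loads(line)
        try:
            stored.append(
                {"user": str(record["user"]), "amount": float(record["amount"])}
            )
        except (KeyError, TypeError, ValueError):
            rejected.append(line)
    return stored, rejected


def read_with_schema(lines):
    """Schema-on-read: raw lines are kept as-is; structure is applied when queried."""
    for line in lines:
        record = json.loads(line)
        amount = record.get("amount")
        yield {
            "user": record.get("user"),
            "amount": float(amount) if amount is not None else None,
        }


# Warehouse style: only conforming records are stored.
warehouse_rows, rejected = write_with_schema(RAW_EVENTS)

# Lake style: everything is stored, gaps surface as None at read time.
lake_rows = list(read_with_schema(RAW_EVENTS))
```

In this sketch the warehouse-style path drops the incomplete record at load time, while the lake-style path keeps it and leaves the missing value for the reader to handle.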