Data Engineering and Cloud Platform Knowledge Base

This repository is where I keep track of what I’m learning and experimenting with across data engineering, distributed systems, and cloud platforms. It’s part study notes, part code lab, and part reference guide: a place to capture concepts as I work through them and to revisit later when I need a refresher.

I focus on technologies like Apache Spark, Kafka/Redpanda, Databricks, AWS, and Azure, along with supporting tools and practices that are important for building reliable data systems. You’ll find a mix of summaries, deep dives into tricky concepts, performance tuning notes, and small experiments that test how things work in practice.

The main goal of this repo is to strengthen my own understanding, but I also hope it can be useful to anyone else navigating similar topics. Think of it as a learning log that balances hands-on exploration with professional best practices, something between a personal notebook and a practical guide.

📑 Data Engineering Knowledge Base

Airflow

Azure

Data-formats

Databricks

Docs-deep-dive: Databricks

Scenarios: Adf, Architectures, Databricks, Kafka, Spark, Streaming