Posts

PySpark Kafka Stream: How to test it?

Spark partitioning vs bucketing: partitionBy vs bucketBy

Spark partitioning

Avro vs Parquet overview

Deploy serverless Spark jobs to AWS using GitHub Actions

Data Modeling - Why Data Engineers Need To Understand It - An Introduction To Data Engineering

Migrate a Parquet data lake to Delta Lake

Data architecture book review: Deciphering Data Architectures

Tuning Spark Optimization: A Guide to Efficiently Processing 1 TB Data

DataOps: the future of Data Engineering?

How to build Spark from source and deploy it to a Kubernetes cluster in 60 minutes

Creating a modern data platform