Posts

Spark partitioning

Avro vs Parquet overview

Deploy serverless Spark jobs to AWS using GitHub Actions

Data Modeling - Why Data Engineers Need To Understand It - An Introduction To Data Engineering

Migrate a Parquet data lake to Delta Lake

Data architecture book review: Deciphering Data Architectures

Tuning Spark Optimization: A Guide to Efficiently Processing 1 TB Data

DataOps: the future of Data Engineering?

How to build Spark from source and deploy it to a Kubernetes cluster in 60 minutes

Creating a modern data platform

Dive Deeper into Data Engineering on Databricks

Latency goes subsecond in Apache Spark Structured Streaming: Improving Offset Management in Project Lightspeed