Apache Spark: The Flavors of CI/CD

Credits:  Kesav Thakur
 
A look under the hood… What are the options for achieving continuous delivery of a Spark-based ETL job?
 
In this article, I will do my best to cover two topics from every if/else perspective:

The first is, of course, Apache Spark itself (Java, Scala, PySpark, sparklyr), or Spark on a managed platform (EMR, Databricks).

The second is continuous integration and delivery: the pipeline options built on Jenkins jobs, Docker/Kubernetes, or Airflow with EMR/Databricks.

If you are still reading, thanks for your interest; I will try to make this comprehensive. Let's get started with a design into which I will fit a Spark-based ETL job that needs a CI/CD pipeline.

Read more: Apache Spark : CI/CD
