Understanding the Issue

When using Spark Structured Streaming with Kafka, the startingOffsets option specifies where Spark should start reading data. The two most common values are "latest", which reads only new data, and "earliest", which reads from the beginning of the topic; a JSON string giving per-partition offsets is also accepted. However, some users have reported that Spark does not always honor the "earliest" setting. Instead of starting from the beginning, Spark starts reading from the latest offset, so records already in the topic are skipped and are effectively lost to the query.
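As a minimal sketch of what setting the option looks like in PySpark, the snippet below builds the source options and checks the offset value before the stream is opened. The helper names (kafka_source_options, read_kafka_stream), the broker address localhost:9092, and the topic name events are all assumptions made for illustration; they do not come from the original post. Note the plural spelling startingOffsets; the value check is there to catch a mistyped value such as "earliset" before it reaches Spark.

```python
import json


def kafka_source_options(bootstrap_servers, topic, starting_offsets="earliest"):
    """Build the option map for a Kafka streaming source (hypothetical helper)."""
    if starting_offsets not in ("earliest", "latest"):
        # Anything else must be a JSON object of per-partition offsets,
        # for example '{"events": {"0": 23, "1": -2}}'.
        try:
            parsed = json.loads(starting_offsets)
        except (TypeError, ValueError):
            parsed = None
        if not isinstance(parsed, dict):
            raise ValueError(
                f"unsupported startingOffsets value: {starting_offsets!r}"
            )
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": starting_offsets,
    }


def read_kafka_stream(spark, options):
    """Open the stream on an existing SparkSession with the given options."""
    return spark.readStream.format("kafka").options(**options).load()


# Assumed broker and topic, for illustration only.
options = kafka_source_options("localhost:9092", "events")
```

With a live SparkSession, read_kafka_stream(spark, options) would return the streaming DataFrame. Keep in mind that, per the Spark documentation, startingOffsets only applies when a new streaming query is started; a query that resumes from an existing checkpoint continues from where it left off, which is one situation in which "earliest" can appear to be ignored.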
Read more: https://saturncloud.io/blog/spark-structured-streaming-with-kafka-understanding-the-startingoffsetearliest-issue/