Channel: Active questions tagged consumer - Stack Overflow

Dataproc Serverless slow to consume Kafka topic


I use Dataproc Serverless with the Java API to read a Kafka topic. The topic has only 2 partitions and receives about 80 msg/sec. After reading the messages I repartition to 10, transform the data, then write to BigQuery. Because of the repartition I see two stages: one with 2 tasks that reads the data, and a second one with 10 tasks that transforms and writes it. With maxOffsetsPerTrigger=500 and minOffsetsPerTrigger=100, the read tasks vary a lot in duration, between 7 sec and 1 min 50 sec, while the transform-and-write tasks take around 10 sec. Any idea why reading the data takes so long, and how to optimize the code?
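As a sanity check, here is a back-of-envelope calculation (assuming the ~80 msg/sec rate stated above) of how long the topic needs to accumulate enough records for one micro-batch under these trigger settings. Note the fill times are far below the observed 7 sec to 1 min 50 read-task durations, which suggests the time is not spent waiting for data alone:

```java
// Back-of-envelope micro-batch sizing, using the rates and settings from the question.
public class TriggerSizing {
    public static void main(String[] args) {
        double msgPerSec = 80.0;              // assumed ingest rate from the question
        int maxOffsetsPerTrigger = 500;       // upper bound on records per micro-batch
        int minOffsetsPerTrigger = 100;       // batch waits until this many records exist

        // Time for the topic to accumulate the min / max number of records:
        double secToReachMin = minOffsetsPerTrigger / msgPerSec;
        double secToReachMax = maxOffsetsPerTrigger / msgPerSec;

        System.out.printf("time to fill min offsets: %.2f s%n", secToReachMin);
        System.out.printf("time to fill max offsets: %.2f s%n", secToReachMax);

        // With only 2 Kafka partitions, at most 2 read tasks share each batch:
        System.out.println("records per read task (max): " + (maxOffsetsPerTrigger / 2));
    }
}
```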

Dataset<Row> dfr = spark
        .readStream()
        .format("org.apache.spark.sql.kafka010.KafkaSourceProvider")
        .option("kafka.bootstrap.servers", kafkaServers)
        .option("kafka.sasl.kerberos.service.name", "kafka")
        .option("kafka.sasl.mechanism", "GSSAPI")
        .option("kafka.security.protocol", "SASL_SSL")
        .option("kafka.ssl.truststore.location", trustStoreName)
        .option("kafka.ssl.truststore.password", truststorePassword)
        .option("kafka.ssl.truststore.type", "JKS")
        .option("startingOffsets", "latest")
        .option("kafka.max.partition.fetch.bytes", "209715200")  // 200MB per partition
        .option("kafka.fetch.max.bytes", "1048576000")           // 1000MB total
        .option("subscribe", kafkaTopic)
        .option("maxOffsetsPerTrigger", maxOffsets)
        .option("minOffsetsPerTrigger", minOffsets)
        .option("failOnDataLoss", "false")
        .option("kafka.request.timeout.ms", 300000)
        .option("kafka.session.timeout.ms", 60000)
        .load();

Dataset<Row> dfr2 = dfr.selectExpr(
                "CAST(topic as STRING) as topic",
                "CAST(key AS STRING) AS key",
                "CAST(value AS STRING) AS xml",
                "timestamp", "partition", "offset")
        .repartition(10);

StructType outSchema = new StructType()
        .add("key", DataTypes.StringType)
        .add("topic", DataTypes.StringType)
        .add("partition", DataTypes.IntegerType)
        .add("offset", DataTypes.LongType)
        .add("JSON_COL", DataTypes.StringType)
        .add("DAT_MAJ_DWH", DataTypes.StringType);

// Create proper encoder - cast the result to Encoder<Row>
Encoder<Row> encoder = Encoders.row(outSchema);

Dataset<Row> jsonified = dfr2.mapPartitions(
        (MapPartitionsFunction<Row, Row>) (Iterator<Row> it) -> {
            List<Row> out = new ArrayList<>();
            DocumentBuilder builder = XML_BUILDER.get();
            while (it.hasNext()) {
                Row r = it.next();
                String topic = r.getString(0);
                String key = r.getString(1);
                String xml = r.getString(2);
                int part = r.getInt(4);
                long offset = r.getLong(5);
                String json = null;
                try {
                    builder.reset();
                    // Pass the XML string, not the Document
                    json = BusMessageXmlJson.toJson(xml);
                } catch (Exception ex) {
                    System.err.println("XML parse error partition=" + part
                            + " offset=" + offset + " msg=" + ex.getMessage());
                }
                String ts = r.getTimestamp(3).toInstant().toString();
                out.add(RowFactory.create(key, topic, part, offset, json, ts));
            }
            return out.iterator();
        },
        encoder
).withColumn("DAT_MAJ_DWH",
        date_format(to_timestamp(col("DAT_MAJ_DWH")), "yyyy-MM-dd'T'HH:mm:ss.SSSSSS")
).select("key", "topic", "partition", "offset", "JSON_COL", "DAT_MAJ_DWH");

StreamingQuery query = jsonified
        .writeStream()
        .queryName("spark-sdh-ndc-streaming-query")
        .foreachBatch((batchDF, batchId) -> {
            // Write this batch to BigQuery using batch API
            batchDF.select("JSON_COL", "DAT_MAJ_DWH").write()
                    .format("bigquery")
                    .option("temporaryGcsBucket", tempBucket)
                    .option("table", bigQueryTable)
                    .option("createDisposition", "CREATE_IF_NEEDED")
                    .option("intermediateFormat", "avro")
                    .option("writeMethod", "indirect")
                    .option("allowFieldAddition", "true")
                    .option("allowFieldRelaxation", "true")
                    .mode(SaveMode.Append)
                    .save();
            // 2. Commit offsets to Kafka AFTER successful write
            commitOffsetsToKafka(batchDF, kafkaServers, trustStoreName,
                    truststorePassword, consumerGroup);
            System.out.println("Batch " + batchId + " written successfully");
        })
        .option("checkpointLocation", checkpointPath)
        .trigger(Trigger.ProcessingTime(triggerInterval))
        .start();

System.out.println("Streaming query started successfully!");
System.out.println("Query ID: " + query.id());
System.out.println("Waiting for termination... (Ctrl+C to stop)");
query.awaitTermination();


