{"id":21820,"date":"2026-06-27T10:00:41","date_gmt":"2026-06-27T04:30:41","guid":{"rendered":"https:\/\/www.placementpreparation.io\/blog\/?p=21820"},"modified":"2026-07-03T15:56:19","modified_gmt":"2026-07-03T10:26:19","slug":"kafka-interview-questions-for-freshers","status":"publish","type":"post","link":"https:\/\/www.placementpreparation.io\/blog\/kafka-interview-questions-for-freshers\/","title":{"rendered":"Top Kafka Interview Questions for Freshers"},"content":{"rendered":"<div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p><strong>Key Takeaways<\/strong><\/p>\n<p>In this article, we will learn about:<\/p>\n<ul>\n<li>Core Kafka concepts like topics, partitions, brokers, producers, consumers, offsets, and consumer groups.<\/li>\n<li>Kafka architecture and how data flows inside a Kafka-based system.<\/li>\n<li>Important Kafka interview questions for freshers and Java developers.<\/li>\n<li>Kafka producer and consumer behaviour, message ordering, replication, and fault tolerance.<\/li>\n<li>Kafka use cases in data engineering, real-time analytics, event streaming, and microservices.<\/li>\n<li>Kafka with Java and Spring Boot concepts commonly asked in interviews.<\/li>\n<li>Scenario-based Kafka questions related to message loss, lag, duplicate messages, partitions, and performance issues.<\/li>\n<\/ul>\n<\/div><\/div><p>Kafka is a key skill for freshers and professionals preparing for data engineering, backend development, Java development, DevOps, and real-time application roles.<\/p><p>Apache Kafka is used for building high-performance data pipelines, streaming analytics, and event-driven systems, and more than <a href=\"https:\/\/kafka.apache.org\/powered-by\/\" 
rel=\"nofollow noopener\" target=\"_blank\">80% of Fortune 100 companies use Kafka<\/a>.<\/p><p>This article covers practical Kafka interview questions, including basic concepts, architecture, producers, consumers, topics, partitions, offsets, Kafka with Java\/<a href=\"https:\/\/www.placementpreparation.io\/blog\/spring-boot-interview-questions-for-freshers\/\">Spring Boot<\/a>, and scenario-based questions.<\/p><h2>Beginner Kafka Interview Questions<\/h2><p>Here are the basic Kafka interview questions freshers should prepare before moving into architecture and scenario-based topics.<\/p><p>These questions cover topics, brokers, producers, consumers, partitions, offsets, replication, and the basic message flow in 
Kafka.<\/p><h3>1. Why is Kafka used in modern applications?<\/h3><p>Kafka is used to move large amounts of data between systems in real time. It is commonly used when applications need fast, reliable, and scalable communication.<\/p><p>For example, an e-commerce platform may use Kafka to send order events to inventory, payment, notification, and analytics systems.<\/p><p>Kafka is useful because it supports:<\/p><ul>\n<li>High-throughput messaging<\/li>\n<li>Real-time data streaming<\/li>\n<li>Fault tolerance<\/li>\n<li>Event-driven architecture<\/li>\n<li>Decoupling between services<\/li>\n<\/ul><p>Instead of one service directly calling many other services, it can publish an event to Kafka, and other systems can consume it independently.<\/p><h3>2. Explain the role of a Kafka topic.<\/h3><p>A Kafka topic is a logical category where messages are stored. Producers write messages to topics, and consumers read messages from topics.<\/p><p><strong>For example:<\/strong><\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>Topic: payment-events<br>\nMessages: payment-created, payment-success, payment-failed<\/p>\n<\/div><\/div><p>A topic helps organize data based on business purpose. Common examples include:<\/p><ul>\n<li><strong>order-events<\/strong><\/li>\n<li><strong>user-activity<\/strong><\/li>\n<li><strong>payment-events<\/strong><\/li>\n<li><strong>application-logs<\/strong><\/li>\n<\/ul><p>Topics are split into partitions for scalability. Multiple consumers can read from the same topic, making Kafka useful for real-time systems, analytics, and data engineering pipelines.<\/p><h3>3. What is the purpose of a Kafka broker?<\/h3><p>A Kafka broker is a server that stores and manages Kafka data. 
A Kafka cluster usually has multiple brokers to handle large data volumes and provide fault tolerance.<\/p><p>A broker is responsible for:<\/p><ul>\n<li>Receiving messages from producers<\/li>\n<li>Storing messages in topic partitions<\/li>\n<li>Serving messages to consumers<\/li>\n<li>Managing partition replicas<\/li>\n<li>Coordinating with other brokers<\/li>\n<\/ul><p>For example, if a Kafka cluster has three brokers, topic partitions may be distributed across all three. This improves scalability because the load is shared across multiple servers.<\/p><p>In simple terms, brokers are the machines that make Kafka run.<\/p><h3>4. How does a Kafka producer work?<\/h3><p>A Kafka producer is an application that sends messages to a Kafka topic. For example, an order service can produce an event whenever a new order is placed.<\/p><p>Producer flow:<\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p><strong>Application<\/strong> &rarr; <strong>Producer<\/strong> &rarr; <strong>Kafka Topic<\/strong> &rarr; <strong>Partition<\/strong><\/p>\n<\/div><\/div><p>A producer chooses which topic to send data to. It can also attach a key, such as <strong>userId<\/strong> or <strong>orderId<\/strong>, to decide which partition receives the message.<\/p><p>Producers are commonly used in:<\/p><ul>\n<li>Order systems<\/li>\n<li>Payment systems<\/li>\n<li>Log collection<\/li>\n<li>IoT data streaming<\/li>\n<li>Real-time analytics<\/li>\n<\/ul><p>Producer settings such as acknowledgements, retries, and batching control the trade-off between throughput and delivery reliability.<\/p><h3>5. How does a Kafka consumer work?<\/h3><p>A Kafka consumer reads messages from one or more Kafka topics. 
For example, a notification service may consume order events and send SMS or email updates.<\/p><p>Consumer flow:<\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>Kafka Topic &rarr; Consumer &rarr; Application Logic<\/p>\n<\/div><\/div><p>A consumer reads data from partitions using offsets. The offset tells Kafka which messages have already been read.<\/p><p>Consumers are useful because different systems can read the same event for different purposes. For example, one consumer may update analytics, while another may update inventory.<\/p><p>This makes Kafka suitable for event-driven systems where many services react to the same data independently.<\/p><h3>6. What is a Kafka partition?<\/h3><p>A partition is a smaller division of a Kafka topic. Kafka splits topics into partitions to improve scalability and parallel processing.<\/p><p><strong>For example:<\/strong><\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>Topic: order-events<br>\nPartitions: P0, P1, P2<\/p>\n<\/div><\/div><p>Messages are distributed across partitions. If a key is provided, Kafka sends messages with the same key to the same partition. 
This helps maintain ordering for related messages.<\/p><p>Partitions are important because:<\/p><ul>\n<li>They improve throughput<\/li>\n<li>They allow parallel consumption<\/li>\n<li>They support scalability<\/li>\n<li>They help distribute data across brokers<\/li>\n<\/ul><p>Without partitions, a topic would be limited in how much data it can process at once.<\/p><h3>7. What is an offset in Kafka?<\/h3><p>An offset is a unique number assigned to each message inside a partition. It represents the position of a message in that partition.<\/p><p><strong>Example:<\/strong><\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>Partition 0: offset 0, offset 1, offset 2<\/p>\n<\/div><\/div><p>Consumers use offsets to track which messages they have already read. If a consumer stops and restarts, it can continue from the last committed offset.<\/p><p>Offsets are maintained per partition, not globally across the topic.<\/p><h3>8. What is a consumer group in Kafka?<\/h3><p>A consumer group is a group of consumers that work together to read data from a topic. 
Kafka assigns partitions among consumers in the same group.<\/p><p>For example, if a topic has 4 partitions and a consumer group has 2 consumers:<\/p><table class=\"tablepress\">\n<thead><tr>\n<td><b>Consumer<\/b><\/td>\n<td><b>Assigned Partitions<\/b><\/td>\n<\/tr><\/thead><tbody class=\"row-striping row-hover\">\n\n<tr>\n<td><span style=\"font-weight: 400;\">Consumer 1<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Partition 0, Partition 1<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Consumer 2<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Partition 2, Partition 3<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table><p>This allows parallel processing.<\/p><p>Important rule: within one consumer group, a partition is consumed by only one consumer at a time.<\/p><p>Consumer groups are useful for scaling applications. If more processing power is needed, more consumers can be added, as long as there are enough partitions.<\/p><h3>9. What is the difference between Kafka and a traditional message queue?<\/h3><p>Kafka and traditional message queues both move messages, but they work differently.<\/p><table class=\"tablepress\">\n<thead><tr>\n<td><b>Feature<\/b><\/td>\n<td><b>Traditional Queue<\/b><\/td>\n<td><b>Kafka<\/b><\/td>\n<\/tr><\/thead><tbody class=\"row-striping row-hover\">\n\n<tr>\n<td><span style=\"font-weight: 400;\">Message storage<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Often removed after consumption<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Stored based on retention<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Consumers<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Usually one consumer per message<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Multiple consumer groups can read same data<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Use case<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Task distribution<\/span><\/td>\n<td><span 
style=\"font-weight: 400;\">Event streaming and data pipelines<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Replay support<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Limited<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Strong replay support<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Scalability<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Depends on tool<\/span><\/td>\n<td><span style=\"font-weight: 400;\">High, through partitions<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table><p>Kafka is preferred when systems need event history, replay, high throughput, and multiple independent consumers.<\/p><p>Traditional queues are useful for simple task processing, while Kafka is better for event streaming.<\/p><h3>10. What is message retention in Kafka?<\/h3><p>Message retention defines how long Kafka stores messages in a topic. Kafka does not immediately delete messages after consumers read them. Instead, messages remain available based on retention settings.<\/p><p>Retention can be based on:<\/p><ul>\n<li>Time, such as 7 days<\/li>\n<li>Size, such as 10 GB<\/li>\n<li>Log compaction rules<\/li>\n<\/ul><p>Example:<\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>retention.ms = 604800000<\/p>\n<\/div><\/div><p>This keeps messages for 7 days (7 &times; 24 &times; 60 &times; 60 &times; 1000 ms) before they become eligible for deletion.<\/p><p>Retention is useful because consumers can replay old messages if needed. For example, if an analytics service fails, it can restart and process older events again from Kafka.<\/p><h3>11. What is replication in Kafka?<\/h3><p>Replication means keeping copies of Kafka partitions on multiple brokers. 
It helps Kafka remain available even if one broker fails.<\/p><p>For example, if a topic has replication factor 3, each partition will have three copies across different brokers.<\/p><table class=\"tablepress\">\n<thead><tr>\n<td><b>Term<\/b><\/td>\n<td><b>Meaning<\/b><\/td>\n<\/tr><\/thead><tbody class=\"row-striping row-hover\">\n\n<tr>\n<td><span style=\"font-weight: 400;\">Leader replica<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Handles reads and writes<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Follower replica<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Copies data from leader<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Replication factor<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Number of copies<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table><p>If the leader broker fails, Kafka can elect another replica as the new leader.<\/p><p>Replication improves fault tolerance and is one of the main reasons Kafka is trusted for production systems.<\/p><h3>12. What is the role of ZooKeeper or KRaft in Kafka?<\/h3><p>Older Kafka versions used ZooKeeper to manage cluster metadata, broker coordination, and controller election. 
Newer Kafka versions use KRaft mode, where Kafka manages metadata internally without ZooKeeper. Kafka 4.0 removed ZooKeeper support entirely.<\/p><p>Simple comparison:<\/p><table class=\"tablepress\">\n<thead><tr>\n<td><b>Feature<\/b><\/td>\n<td><b>ZooKeeper-based Kafka<\/b><\/td>\n<td><b>KRaft Kafka<\/b><\/td>\n<\/tr><\/thead><tbody class=\"row-striping row-hover\">\n\n<tr>\n<td><span style=\"font-weight: 400;\">Metadata management<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Uses ZooKeeper<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Managed by Kafka itself<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Architecture<\/span><\/td>\n<td><span style=\"font-weight: 400;\">More components<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Simpler architecture<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Status<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Legacy model, removed in Kafka 4.0<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Current model<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table><p>For freshers, it is enough to know that Kafka historically used ZooKeeper, but modern Kafka uses KRaft for simpler and more scalable cluster management.<\/p><h3>13. Why does Kafka use keys in messages?<\/h3><p>Kafka message keys are used to decide which partition a message should go to. 
If messages have the same key, the default partitioner hashes the key and sends them to the same partition, as long as the number of partitions does not change.<\/p><p>Example:<\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>Key: customerId_101<br>\nValue: order placed<\/p>\n<\/div><\/div><p>This is important when message ordering is required for the same entity.<\/p><p>For example, all events for the same order ID should go to the same partition so they are processed in order:<\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p><strong>order-created<\/strong> &rarr; <strong>payment-done<\/strong> &rarr; <strong>order-shipped<\/strong><\/p>\n<\/div><\/div><p>Without a key, the producer spreads messages across partitions. Older clients use round-robin, while clients from Kafka 2.4 onward default to a sticky strategy that fills a batch for one partition before moving to the next.<\/p><h3>14. How is message ordering handled in Kafka?<\/h3><p>Kafka guarantees ordering only within a partition, not across the whole topic. 
This means messages written to the same partition are read in the same order.<\/p><p>Example:<\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>Partition 0: M1 &rarr; M2 &rarr; M3<\/p>\n<\/div><\/div><p>If all events for one order go to the same partition, their order is maintained.<\/p><p>To maintain ordering for related messages, producers should use a key like:<\/p><ul>\n<li>orderId<\/li>\n<li>userId<\/li>\n<li>accountId<\/li>\n<\/ul><p>Kafka will send messages with the same key to the same partition.<\/p><p>If messages are spread across multiple partitions, global ordering is not guaranteed.<\/p><h3>15. What is the difference between topic, partition, and offset?<\/h3><p>These three are basic Kafka storage concepts.<\/p><table class=\"tablepress\">\n<thead><tr>\n<td><b>Concept<\/b><\/td>\n<td><b>Meaning<\/b><\/td>\n<\/tr><\/thead><tbody class=\"row-striping row-hover\">\n\n<tr>\n<td><span style=\"font-weight: 400;\">Topic<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Logical category of messages<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Partition<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Subdivision of a topic<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Offset<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Position of a message inside a partition<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table><p><strong>Example:<\/strong><\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" 
style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>Topic: order-events<br>\nPartition: 0<br>\nOffset: 15<\/p>\n<\/div><\/div><p>This means the message is stored in partition 0 of the order-events topic at offset 15.<\/p><p>A topic organizes messages, partitions make the topic scalable, and offsets help consumers track their reading position.<br>\nThis is one of the most important Kafka basic interview questions for freshers.<\/p><h3>16. What is a Kafka cluster?<\/h3><p>A Kafka cluster is a group of Kafka brokers working together. Instead of running Kafka on one server, production systems usually run multiple brokers as a cluster.<\/p><p>A Kafka cluster provides:<\/p><ul>\n<li>High availability<\/li>\n<li>Load distribution<\/li>\n<li>Fault tolerance<\/li>\n<li>Better scalability<\/li>\n<li>Replication support<\/li>\n<\/ul><p>For example, a cluster may have three brokers:<\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>Broker 1<br>\nBroker 2<br>\nBroker 3<\/p>\n<\/div><\/div><p>Topics and partitions are distributed across these brokers. If one broker fails, Kafka can continue working using replicas on other brokers.<\/p><p>This makes Kafka suitable for large-scale real-time applications.<\/p><h3>17. Why is Kafka considered scalable?<\/h3><p>Kafka is scalable because it uses partitions and brokers to distribute workload. 
A topic can be split into multiple partitions, and those partitions can be spread across different brokers.<\/p><p>Scalability happens at two levels:<\/p><p><strong>Producer side:<\/strong> Multiple producers can write data to Kafka.<br>\n<strong>Consumer side:<\/strong> Multiple consumers in a group can read partitions in parallel.<\/p><p>For example, if a topic has 6 partitions, a consumer group can use up to 6 consumers for parallel reading.<\/p><p>Kafka can handle large volumes of data because it does not depend on a single queue or single consumer.<\/p><h3>18. What is the difference between producer and consumer in Kafka?<\/h3><p>A producer sends data to Kafka, while a consumer reads data from Kafka.<\/p><table class=\"tablepress\">\n<thead><tr>\n<td><b>Feature<\/b><\/td>\n<td><b>Producer<\/b><\/td>\n<td><b>Consumer<\/b><\/td>\n<\/tr><\/thead><tbody class=\"row-striping row-hover\">\n\n<tr>\n<td><span style=\"font-weight: 400;\">Role<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Publishes messages<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Reads messages<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Connects to<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Kafka topic<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Kafka topic<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Tracks offset<\/span><\/td>\n<td><span style=\"font-weight: 400;\">No<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Yes<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Example<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Order service sends order event<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Email service reads order event<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table><p>Example:<\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" 
style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>Order Service &rarr; Kafka &rarr; Notification Service<\/p>\n<\/div><\/div><p>Here, the order service is the producer, and the notification service is the consumer.<\/p><p>This separation helps systems communicate without being directly dependent on each other.<\/p><h3>19. How does Kafka help in data engineering?<\/h3><p>Kafka is widely used in data engineering because it can collect, transport, and process real-time data from multiple sources.<\/p><p>Common data engineering use cases include:<\/p><ul>\n<li>Log collection<\/li>\n<li>Clickstream processing<\/li>\n<li>Fraud detection<\/li>\n<li>Real-time dashboards<\/li>\n<li>CDC pipelines<\/li>\n<li>ETL and ELT workflows<\/li>\n<li>Data lake ingestion<\/li>\n<\/ul><p>For example, Kafka can receive user activity events from a website and send them to Spark, Flink, or a data warehouse.<\/p><p>Kafka acts as a real-time data backbone between applications, databases, analytics tools, and storage systems.<\/p><h3>20. What is a Kafka record?<\/h3><p>A Kafka record is the actual message stored in a Kafka topic. It usually contains a key, value, timestamp, headers, and metadata.<\/p><p>A Kafka record can be understood like this:<\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>Key: user_101<br>\nValue: login event<br>\nTimestamp: event time<\/p>\n<\/div><\/div><p>The key helps decide the partition, while the value contains the actual message data.<\/p><p>Kafka records are immutable once written. 
Consumers can read them, but they do not modify the original record in Kafka.<\/p><p>This structure makes Kafka reliable for event streaming, logs, and real-time data processing.<\/p><h2>Intermediate Kafka Interview Questions<\/h2><p>These Kafka interview questions and answers focus on practical Kafka usage in real applications.<\/p><p>This section covers producer acknowledgements, consumer lag, retries, serialization, retention, compaction, rebalancing, and common configuration topics asked in backend and data engineering interviews.<\/p><h3>1. How do producer acknowledgements work in Kafka?<\/h3><p>Producer acknowledgements, or acks, decide when Kafka confirms that a message has been successfully written.<\/p><p>Common values:<\/p><table class=\"tablepress\">\n<thead><tr>\n<td><b>acks value<\/b><\/td>\n<td><b>Meaning<\/b><\/td>\n<\/tr><\/thead><tbody class=\"row-striping row-hover\">\n\n<tr>\n<td><span style=\"font-weight: 400;\">acks=0<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Producer does not wait for confirmation<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">acks=1<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Leader confirms after writing the message to its own log, without waiting for followers<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">acks=all<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Leader confirms after all in-sync replicas have the message<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table><p>For stronger durability, <strong>acks=all<\/strong> is preferred because the producer waits until all in-sync replicas have the message, so an acknowledged write survives a leader failure.<\/p><p>However, stronger acknowledgement adds latency. In real projects, the choice depends on whether the system values speed or reliability more.<\/p><p>For financial or order systems, reliability is usually more important.<\/p><h3>2. Explain consumer lag in Kafka.<\/h3><p>Consumer lag means the consumer is behind the latest messages in a topic partition. 
It shows how many messages are waiting to be processed.<\/p><p>Example:<\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>Latest offset: 1000<br>\nConsumer offset: 850<br>\nLag: 150<\/p>\n<\/div><\/div><p>A high lag means the producer is sending messages faster than the consumer can process them.<\/p><p>Common causes include:<\/p><ul>\n<li>Slow consumer logic<\/li>\n<li>Too few consumers<\/li>\n<li>Large message volume<\/li>\n<li>Database slowness<\/li>\n<li>Network issues<\/li>\n<li>Rebalancing delays<\/li>\n<\/ul><p>To reduce lag, we can optimize consumer processing, increase partitions, add consumers, improve database writes, or batch messages properly.<\/p><p>Consumer lag is an important monitoring metric in Kafka systems.<\/p><h3>3. How do serializers and deserializers work in Kafka?<\/h3><p>Kafka stores messages as bytes. Serializers convert application objects into bytes before sending them to Kafka. 
Deserializers convert bytes back into usable objects when consumers read messages.<\/p><p>Example:<\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>Producer Object &rarr; Serializer &rarr; Kafka Bytes<br>\nKafka Bytes &rarr; Deserializer &rarr; Consumer Object<\/p>\n<\/div><\/div><p>Common serializers include:<\/p><ul>\n<li>String serializer<\/li>\n<li>JSON serializer<\/li>\n<li>Avro serializer<\/li>\n<li>Protobuf serializer<\/li>\n<\/ul><p>The producer and consumer must use compatible serialization formats. If the producer sends Avro but the consumer is configured with a JSON deserializer, deserialization errors will occur.<\/p><h3>4. How does Kafka handle failed message processing?<\/h3><p>Kafka does not automatically know whether business processing succeeded. A consumer reads a message, processes it, and commits the offset. If processing fails before the offset is committed, the message can be retried.<\/p><p>Common approaches:<\/p><ul>\n<li>Retry the message<\/li>\n<li>Use manual offset commit<\/li>\n<li>Send failed messages to a dead-letter topic<\/li>\n<li>Log the error and continue<\/li>\n<li>Apply backoff between retries<\/li>\n<\/ul><p>Example flow:<\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>Consume &rarr; Process &rarr; Success &rarr; Commit Offset<br>\nConsume &rarr; Process &rarr; Failure &rarr; Retry \/ DLT<\/p>\n<\/div><\/div><p>For critical systems, the offset should be committed only after successful processing. 
This prevents data loss but may require handling duplicate processing safely.<\/p><h3>5. What is a dead-letter topic in Kafka?<\/h3><p>A dead-letter topic, or DLT, is a separate Kafka topic used to store messages that cannot be processed successfully after retries.<\/p><p><strong>Example:<\/strong><\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>Main topic: payment-events<br>\nDead-letter topic: payment-events-dlt<\/p>\n<\/div><\/div><p>A message may go to DLT because of:<\/p><ul>\n<li>Invalid format<\/li>\n<li>Missing fields<\/li>\n<li>Deserialization error<\/li>\n<li>Business validation failure<\/li>\n<li>Repeated processing failure<\/li>\n<\/ul><p>DLT is useful because it prevents one bad message from blocking the entire consumer flow. Teams can later inspect failed messages, fix the issue, and reprocess them if needed.<\/p><h3>6. 
How does consumer rebalancing work in Kafka?<\/h3><p>Consumer rebalancing happens when partitions are reassigned among consumers in the same consumer group.<\/p><p>It can happen when:<\/p><ul>\n<li>A new consumer joins<\/li>\n<li>A consumer leaves<\/li>\n<li>A consumer crashes<\/li>\n<li>Partitions are added<\/li>\n<li>Subscription changes<\/li>\n<\/ul><p><strong>Example:<\/strong><\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>Before: C1 &rarr; P0, P1 | C2 &rarr; P2, P3<br>\nAfter C3 joins: C1 &rarr; P0 | C2 &rarr; P1 | C3 &rarr; P2, P3<\/p>\n<\/div><\/div><p>Rebalancing helps distribute workload, but frequent rebalancing can slow down processing.<\/p><p>To reduce unnecessary rebalancing, consumers should process messages efficiently and send heartbeats within the expected time.<\/p><h3>7. What is log compaction in Kafka?<\/h3><p>Log compaction is a Kafka cleanup policy that keeps the latest value for each key instead of deleting messages only by time or size.<\/p><p><strong>Example:<\/strong><\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>user1 &rarr; old email<br>\nuser1 &rarr; new email<\/p>\n<\/div><\/div><p>With compaction, Kafka can keep the latest record for user1.<\/p><p>Log compaction is useful for:<\/p><ul>\n<li>User profile updates<\/li>\n<li>Account status<\/li>\n<li>Configuration data<\/li>\n<li>Database change events<\/li>\n<li>State recovery<\/li>\n<\/ul><p>It is different from normal retention. 
Retention removes old messages based on time or size, while compaction keeps the latest value for each key.<\/p><p>This is useful when consumers need the latest state of each entity.<\/p><h3>8. How do retries work in Kafka producers?<\/h3><p>Producer retries allow Kafka producers to resend messages if temporary failures occur. For example, if a broker is temporarily unavailable, the producer can retry sending the message.<\/p><p>Important producer settings include:<\/p><ul>\n<li><strong>retries<\/strong><\/li>\n<li><strong>retry.backoff.ms<\/strong><\/li>\n<li><strong>delivery.timeout.ms<\/strong><\/li>\n<li><strong>acks<\/strong><\/li>\n<li><strong>enable.idempotence<\/strong><\/li>\n<\/ul><p>Retries improve reliability, but they must be configured carefully. Without idempotence, a retry after a lost acknowledgement can write the same message twice, which is a risk in older or poorly configured setups.<\/p><p>For reliable producers, enable idempotence and use acks=all.<\/p><p>In interview answers, mention that retries are useful for temporary failures, but applications should still be designed to handle possible duplicates.<\/p><h3>9. 
What is idempotence in a Kafka producer?<\/h3><p>Idempotence means the producer can safely retry sending messages without the retries creating duplicate records in Kafka.<\/p><p>If idempotence is enabled, the broker assigns the producer an ID, the producer attaches a sequence number to each batch, and the broker discards any batch it has already written from the same producer session.<\/p><p>Configuration:<\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>enable.idempotence=true<\/p>\n<\/div><\/div><p>This is useful when network failures or broker errors cause producer retries.<\/p><p>Without idempotence, the producer may send the same message again and create duplicate records. With idempotence, the broker can detect repeated send attempts and write the record only once.<\/p><p>Idempotence is important for reliable event streaming, especially in payment, order, and financial systems.<\/p><h3>10. 
Explain the difference between retention and compaction.<\/h3><p>Retention and compaction are two Kafka cleanup strategies.<\/p><table class=\"tablepress\">\n<thead><tr>\n<td><b>Feature<\/b><\/td>\n<td><b>Retention<\/b><\/td>\n<td><b>Compaction<\/b><\/td>\n<\/tr><\/thead><tbody class=\"row-striping row-hover\">\n\n<tr>\n<td><span style=\"font-weight: 400;\">Removes data based on<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Time or size<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Older values for same key<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Keeps latest state?<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Not necessarily<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Yes<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Best for<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Event history<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Current state<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Example<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Keep logs for 7 days<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Keep latest user profile<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table><p>Retention is useful when we need event history for a limited period. Compaction is useful when we need the latest value for each key.<\/p><p>For example, clickstream data may use retention, while user settings may use compaction.<\/p><p>Kafka topics can be configured based on the business requirement.<\/p><h3>11. How does Kafka support fault tolerance?<\/h3><p>Kafka supports fault tolerance through replication. Each partition can have multiple replicas across different brokers.<\/p><p>One replica acts as the leader, and others act as followers. Producers and consumers interact with the leader. 
Followers copy data from the leader.<\/p><p>If the leader broker fails, Kafka can elect another in-sync replica as the new leader.<\/p><p>Fault tolerance depends on:<\/p><ul>\n<li>Replication factor<\/li>\n<li>In-sync replicas<\/li>\n<li>Acknowledgement settings<\/li>\n<li>Broker availability<\/li>\n<li>Proper topic configuration<\/li>\n<\/ul><p>For example, a replication factor of 3 means Kafka keeps three copies of the partition. This helps the system continue working even if one broker fails.<\/p><h3>12. What is the role of min.insync.replicas?<\/h3><p>min.insync.replicas defines the minimum number of in-sync replicas, including the leader, that must acknowledge a write for it to be considered successful when acks=all is used.<\/p><p><strong>Example:<\/strong><\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>replication.factor = 3<\/p>\n<p>min.insync.replicas = 2<\/p>\n<p>acks = all<\/p>\n<\/div><\/div><p>This means at least two replicas, counting the leader, must confirm the message.<\/p><p>This setting improves durability because Kafka does not accept writes unless enough replicas are in sync.<\/p><p>If too many replicas are unavailable, the producer receives an error instead of silently writing unsafe data.<\/p><p>It is commonly used in production systems where message loss is not acceptable.<\/p><h3>13. 
How would you choose the number of partitions for a topic?<\/h3><p>The number of partitions depends on throughput, parallelism, ordering needs, and future scaling.<\/p><p>More partitions allow:<\/p><ul>\n<li>More parallel consumers<\/li>\n<li>Higher throughput<\/li>\n<li>Better load distribution<\/li>\n<\/ul><p>But too many partitions can increase overhead for brokers, metadata, leader election, and rebalancing.<\/p><p>Important factors:<\/p><ul>\n<li>Expected message volume<\/li>\n<li>Number of consumers needed<\/li>\n<li>Ordering requirement<\/li>\n<li>Broker capacity<\/li>\n<li>Future growth<\/li>\n<li>Retention size<\/li>\n<\/ul><p>If strict ordering is needed for all messages, fewer partitions may be better. If high parallel processing is needed, more partitions help.<\/p><p>Partition planning is important because changing partitions later can affect key-based ordering.<\/p><h3>14. How does Kafka handle message replay?<\/h3><p>Kafka supports message replay because messages are stored based on retention and are not removed immediately after consumption. A consumer can reset its offset and read old messages again.<\/p><p>Replay is useful for:<\/p><ul>\n<li>Reprocessing failed data<\/li>\n<li>Rebuilding search indexes<\/li>\n<li>Re-running analytics<\/li>\n<li>Fixing downstream system errors<\/li>\n<li>Testing new consumers<\/li>\n<\/ul><p><strong>Example:<\/strong><\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>Reset offset from 5000 to 1000<br>\nConsumer reads again from offset 1000<\/p>\n<\/div><\/div><p>Kafka replay depends on whether the old messages are still available based on retention settings.<\/p><p>This feature makes Kafka different from many traditional queues.<\/p><h3>15. 
What is the difference between at-most-once, at-least-once, and exactly-once delivery?<\/h3><p>These are message delivery semantics.<\/p><table class=\"tablepress\">\n<thead><tr>\n<td><b>Delivery Type<\/b><\/td>\n<td><b>Meaning<\/b><\/td>\n<td><b>Risk<\/b><\/td>\n<\/tr><\/thead><tbody class=\"row-striping row-hover\">\n\n<tr>\n<td><span style=\"font-weight: 400;\">At-most-once<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Message may be lost but not repeated<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Data loss<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">At-least-once<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Message is not lost but may repeat<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Duplicates<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Exactly-once<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Message is processed once in supported conditions<\/span><\/td>\n<td><span style=\"font-weight: 400;\">More complexity<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table><p>At-least-once is commonly used because it is safer than losing messages. Applications must handle duplicates using idempotent processing.<\/p><p>Exactly-once is possible in Kafka for specific transactional workflows, but it needs proper producer and consumer configuration.<\/p><p>This concept is important for real-time systems where reliability matters.<\/p><h3>16. 
How does Kafka work with Spring Boot?<\/h3><p>Spring Boot applications can use Spring for Apache Kafka to create producers and consumers easily.<\/p><p>Common components include:<\/p><ul>\n<li><strong>KafkaTemplate<\/strong> for producing messages<\/li>\n<li><strong>@KafkaListener<\/strong> for consuming messages<\/li>\n<li>Producer configuration<\/li>\n<li>Consumer configuration<\/li>\n<li>Serializers and deserializers<\/li>\n<li>Error handlers<\/li>\n<\/ul><p>Example:<\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>@KafkaListener(topics = &quot;order-events&quot;, groupId = &quot;notification-service&quot;)<\/p>\n<p>public void consume(String message) {<\/p>\n<p>System.out.println(message);<\/p>\n<p>}<\/p>\n<\/div><\/div><p>This is common in Kafka Spring Boot interview questions, especially for Java backend roles.<\/p><p>Spring Boot reduces boilerplate and makes Kafka integration easier in microservices.<\/p><h3>17. How do Kafka headers help in messaging?<\/h3><p>Kafka headers store additional metadata with a message without changing the main key or value.<\/p><p>Examples of header data:<\/p><ul>\n<li>Trace ID<\/li>\n<li>Correlation ID<\/li>\n<li>Source service<\/li>\n<li>Event version<\/li>\n<li>Authentication context<\/li>\n<li>Message type<\/li>\n<\/ul><p>Headers are useful in microservices because they help with tracing, debugging, and routing.<\/p><p>For example, a <strong>correlationId<\/strong> header can track one request across multiple services.<\/p><p>Kafka headers should not store large business data. The main data should remain in the message value, while headers should carry lightweight metadata.<\/p><h3>18. 
How does Kafka differ from RabbitMQ?<\/h3><p>Kafka and RabbitMQ are both messaging systems, but their design goals differ.<\/p><table class=\"tablepress\">\n<thead><tr>\n<td><b>Feature<\/b><\/td>\n<td><b>Kafka<\/b><\/td>\n<td><b>RabbitMQ<\/b><\/td>\n<\/tr><\/thead><tbody class=\"row-striping row-hover\">\n\n<tr>\n<td><span style=\"font-weight: 400;\">Main use<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Event streaming<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Message queuing<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Message storage<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Retention-based log<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Queue-based delivery<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Replay<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Strong support<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Limited<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Throughput<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Very high<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Good<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Routing<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Topic\/partition model<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Exchange\/queue model<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Best for<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Data pipelines, analytics, events<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Task queues, command messaging<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table><p>Kafka is preferred for event streaming and high-volume pipelines. RabbitMQ is useful when complex routing and task queue patterns are needed.<\/p><p>The choice depends on project requirements.<\/p><h3>19. 
What is schema evolution in Kafka?<\/h3><p>Schema evolution means changing message structure over time without breaking existing producers or consumers.<\/p><p>For example, version 1 of an event may have:<\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>{ &quot;id&quot;: 1, &quot;name&quot;: &quot;Amit&quot; }<\/p>\n<\/div><\/div><p>Version 2 may add an email field:<\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>{ &quot;id&quot;: 1, &quot;name&quot;: &quot;Amit&quot;, &quot;email&quot;: &quot;amit@example.com&quot; }<\/p>\n<\/div><\/div><p>Consumers that do not yet know about the new field should ignore it and keep working.<\/p><p>Schema evolution is commonly managed using tools like Schema Registry with Avro, Protobuf, or JSON Schema.<\/p><p>It is important in Kafka because many systems may consume the same event.<\/p><h3>20. 
How do you monitor Kafka consumers?<\/h3><p>Kafka consumers are monitored using metrics such as consumer lag, offset progress, throughput, error rate, and rebalance frequency.<\/p><p>Important metrics:<\/p><ul>\n<li>Consumer lag<\/li>\n<li>Messages consumed per second<\/li>\n<li>Processing time<\/li>\n<li>Failed message count<\/li>\n<li>Rebalance count<\/li>\n<li>Offset commit status<\/li>\n<\/ul><p>Tools used include:<\/p><ul>\n<li>Kafka command-line tools<\/li>\n<li>Prometheus and Grafana<\/li>\n<li>Confluent Control Center<\/li>\n<li>Burrow<\/li>\n<li>Cloud monitoring tools<\/li>\n<\/ul><p>Consumer lag is one of the most important metrics. If lag keeps increasing, it means the consumer cannot keep up with incoming messages.<\/p><p>Monitoring helps detect issues before they affect business systems.<\/p><h2>Advanced Kafka Interview Questions<\/h2><p>These Kafka advanced interview questions focus on reliability, transactions, performance tuning, security, stream processing, scaling, and production-level troubleshooting.<\/p><p>They are useful for learners preparing for Kafka architecture interview questions and real-time scenario-based technical rounds.<\/p><h3>1. 
How would you design Kafka for high availability?<\/h3><p>To design Kafka for high availability, I would use multiple brokers, replication, proper partition distribution, and strong producer acknowledgements.<\/p><p>Important design choices:<\/p><ul>\n<li>Use at least 3 brokers<\/li>\n<li>Set replication factor to 3<\/li>\n<li>Use <strong>acks=all<\/strong><\/li>\n<li>Configure <strong>min.insync.replicas=2<\/strong><\/li>\n<li>Distribute partition leaders across brokers<\/li>\n<li>Monitor under-replicated partitions<\/li>\n<li>Use multiple Kafka controllers in KRaft mode<\/li>\n<li>Avoid single points of failure<\/li>\n<\/ul><p>This setup ensures that if one broker fails, Kafka can continue working using replicas on other brokers.<\/p><p>High availability also needs monitoring, backups for configurations, proper disk planning, and tested failure recovery processes.<\/p><h3>2. How do Kafka transactions work?<\/h3><p>Kafka transactions allow producers to write messages to multiple partitions atomically. 
This means either all writes succeed or none are committed.<\/p><p>Transactions are useful when a system consumes from one topic, processes data, and writes to another topic.<\/p><p>Example flow:<\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p><strong>Read from input topic<\/strong> &rarr; <strong>Process<\/strong> &rarr; <strong>Write to output topic<\/strong> &rarr; <strong>Commit transaction<\/strong><\/p>\n<\/div><\/div><p>Kafka transactions help support exactly-once processing in specific stream processing scenarios.<\/p><p>Important concepts include:<\/p><ul>\n<li>Transactional producer<\/li>\n<li>Transaction ID<\/li>\n<li>Atomic writes<\/li>\n<li>Committed and aborted transactions<\/li>\n<li>Consumer isolation level<\/li>\n<\/ul><p>Transactions are more complex than normal producer writes, so they should be used when atomicity is truly required.<\/p><h3>3. 
How would you tune Kafka producer performance?<\/h3><p>Producer performance can be improved by tuning batching, compression, acknowledgements, and retries.<\/p><p>Useful configurations:<\/p><table class=\"tablepress\">\n<thead><tr>\n<td><b>Setting<\/b><\/td>\n<td><b>Purpose<\/b><\/td>\n<\/tr><\/thead><tbody class=\"row-striping row-hover\">\n\n<tr>\n<td>batch.size<\/td>\n<td>Sends messages in batches<\/td>\n<\/tr>\n<tr>\n<td>linger.ms<\/td>\n<td>Waits briefly to collect more messages<\/td>\n<\/tr>\n<tr>\n<td>compression.type<\/td>\n<td>Reduces network usage<\/td>\n<\/tr>\n<tr>\n<td>acks<\/td>\n<td>Controls durability<\/td>\n<\/tr>\n<tr>\n<td>retries<\/td>\n<td>Handles temporary failures<\/td>\n<\/tr>\n<tr>\n<td>buffer.memory<\/td>\n<td>Controls producer buffer size<\/td>\n<\/tr>\n<\/tbody>\n<\/table><p>For high throughput, batching and compression are important. For high reliability, acks=all and idempotence should be used.<\/p><p>Tuning depends on whether the system needs speed, durability, or a balance of both.<\/p><h3>4. How would you tune Kafka consumer performance?<\/h3><p>Consumer performance depends on how fast messages are fetched, processed, and committed.<\/p><p>Optimization methods include:<\/p><ul>\n<li>Increase partitions and consumers<\/li>\n<li>Use batch processing<\/li>\n<li>Tune <strong>fetch.min.bytes<\/strong><\/li>\n<li>Tune <strong>max.poll.records<\/strong><\/li>\n<li>Optimize downstream database calls<\/li>\n<li>Avoid slow processing inside poll loop<\/li>\n<li>Use async processing carefully<\/li>\n<li>Monitor consumer lag<\/li>\n<\/ul><p>If consumers are slow because database writes are slow, adding more consumers may not fully solve the problem. The downstream system must also be optimized.<\/p><p>Consumer tuning should be done with monitoring because aggressive settings can increase memory usage or processing failures.<\/p><h3>5. 
How would you prevent duplicate processing in Kafka?<\/h3><p>Kafka systems often use at-least-once delivery, which means duplicates can happen. To prevent duplicate business impact, consumers should be idempotent.<\/p><p>Approaches include:<\/p><ul>\n<li>Use unique event IDs<\/li>\n<li>Store processed message IDs<\/li>\n<li>Use database constraints<\/li>\n<li>Design update operations safely<\/li>\n<li>Enable producer idempotence<\/li>\n<li>Commit offsets after successful processing<\/li>\n<li>Use transactions where required<\/li>\n<\/ul><p>Example:<\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>If paymentId already processed, skip duplicate event.<\/p>\n<\/div><\/div><p>Kafka can reduce duplicates, but applications should still handle them safely. This is especially important in payment, order, banking, and inventory systems.<\/p><h3>6. How does Kafka handle backpressure?<\/h3><p>Backpressure happens when producers send data faster than consumers or downstream systems can handle. Kafka absorbs this pressure by storing messages in topics based on retention.<\/p><p>However, if consumer lag keeps increasing, the system needs attention.<\/p><p>Ways to handle backpressure:<\/p><ul>\n<li>Scale consumers<\/li>\n<li>Increase partitions<\/li>\n<li>Optimize processing logic<\/li>\n<li>Batch database writes<\/li>\n<li>Tune fetch settings<\/li>\n<li>Use rate limiting<\/li>\n<li>Improve downstream system capacity<\/li>\n<li>Monitor lag and throughput<\/li>\n<\/ul><p>Kafka can buffer large amounts of data, but it should not be treated as a permanent storage solution for unprocessed messages.<\/p><p>Backpressure handling is important in real-time data pipelines.<\/p><h3>7. 
What is exactly-once semantics in Kafka?<\/h3><p>Exactly-once semantics means each message is processed once without loss or duplication within supported Kafka workflows.<\/p><p>Kafka supports exactly-once semantics using:<\/p><ul>\n<li>Idempotent producers<\/li>\n<li>Transactions<\/li>\n<li>Proper offset commits<\/li>\n<li>Transactional writes<\/li>\n<li>Consumer isolation level<\/li>\n<\/ul><p>This is commonly used in Kafka Streams or consume-process-produce patterns.<\/p><p>However, exactly-once does not automatically apply to every external system. If Kafka writes to a database, the database operation must also be handled carefully.<\/p><h3>8. How would you secure a Kafka cluster?<\/h3><p>Kafka security includes authentication, authorization, encryption, and network protection.<\/p><p>Important security practices:<\/p><ul>\n<li>Use SSL\/TLS for encryption<\/li>\n<li>Use SASL for authentication<\/li>\n<li>Configure ACLs for authorization<\/li>\n<li>Restrict topic access<\/li>\n<li>Avoid public broker exposure<\/li>\n<li>Secure ZooKeeper or KRaft metadata access<\/li>\n<li>Rotate credentials<\/li>\n<li>Monitor suspicious access<\/li>\n<li>Use separate users for producers and consumers<\/li>\n<\/ul><p><strong>Example:<\/strong><\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>Payment service can write to payment-events.<br>\nAnalytics service can only read payment-events.<\/p>\n<\/div><\/div><p>Kafka security is important because topics may contain sensitive business, user, or financial data.<\/p><h3>9. 
How do Kafka ACLs work?<\/h3><p>Kafka ACLs, or Access Control Lists, define what users or services are allowed to do in Kafka.<\/p><p>ACLs can control permissions such as:<\/p><ul>\n<li>Create topic<\/li>\n<li>Write to topic<\/li>\n<li>Read from topic<\/li>\n<li>Join consumer group<\/li>\n<li>Describe cluster<\/li>\n<li>Delete topic<\/li>\n<\/ul><p><strong>Example:<\/strong><\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>User payment-producer can WRITE to payment-events.<br>\nUser analytics-consumer can READ from payment-events.<\/p>\n<\/div><\/div><p>ACLs help protect topics from unauthorized access.<\/p><p>In production, each service should have only the permissions it needs. This follows the principle of least privilege and reduces security risk.<\/p><h3>10. 
How would you handle poison messages in Kafka?<\/h3><p>A poison message is a message that repeatedly fails processing because it has invalid data, unsupported format, or business rule issues.<\/p><p>Handling methods:<\/p><ul>\n<li>Retry with limit<\/li>\n<li>Move to dead-letter topic<\/li>\n<li>Log full error details<\/li>\n<li>Add alerting<\/li>\n<li>Validate schema before processing<\/li>\n<li>Fix and reprocess later<\/li>\n<li>Avoid blocking the entire consumer<\/li>\n<\/ul><p><strong>Example:<\/strong><\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p><strong>Invalid payment event<\/strong> &rarr; <strong>retry 3 times<\/strong> &rarr; <strong>send to payment-events-dlt<\/strong><\/p>\n<\/div><\/div><p>Poison messages should not stop the whole consumer group permanently. A proper DLT strategy helps maintain flow while preserving failed messages for investigation.<\/p><h3>11. 
How would you design Kafka topic naming conventions?<\/h3><p>Good topic naming helps teams understand ownership, purpose, and environment.<\/p><p>A common format is:<\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>domain.entity.event<\/p>\n<\/div><\/div><p>Example:<\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>orders.payment.completed<br>\nusers.profile.updated<br>\ninventory.stock.changed<\/p>\n<\/div><\/div><p>For environment-specific topics:<\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>dev.orders.created<br>\nqa.orders.created<br>\nprod.orders.created<\/p>\n<\/div><\/div><p>Good naming should be:<\/p><ul>\n<li>Clear<\/li>\n<li>Consistent<\/li>\n<li>Business-friendly<\/li>\n<li>Easy to search<\/li>\n<li>Not too long<\/li>\n<li>Aligned with team standards<\/li>\n<\/ul><p>Poor topic names like <strong>test1<\/strong>, <strong>data-topic<\/strong>, or <strong>new-events<\/strong> create confusion in large systems.<\/p><p>Topic naming becomes very important when many teams use the same Kafka cluster.<\/p><h3>12. 
How does Kafka support event-driven microservices?<\/h3><p>Kafka supports event-driven microservices by allowing services to publish and consume events without directly depending on each other.<\/p><p><strong>Example:<\/strong><\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>Order Service &rarr; order-created event &rarr; Kafka<br>\nInventory Service consumes event<br>\nPayment Service consumes event<br>\nNotification Service consumes event<\/p>\n<\/div><\/div><p>Each service reacts independently.<\/p><p>Benefits include:<\/p><ul>\n<li>Loose coupling<\/li>\n<li>Better scalability<\/li>\n<li>Asynchronous communication<\/li>\n<li>Real-time processing<\/li>\n<li>Easier integration between services<\/li>\n<\/ul><p>If one consumer is down, Kafka can retain messages until it comes back, depending on retention settings.<\/p><p>This design is useful for large systems like e-commerce, banking, logistics, and streaming platforms.<\/p><h3>13. 
How would you migrate data from one Kafka cluster to another?<\/h3><p>Kafka cluster migration can be done using tools like MirrorMaker 2, Kafka Connect, or custom consumers and producers.<\/p><p>Migration steps:<\/p><ul>\n<li>Identify topics to migrate<\/li>\n<li>Match topic configurations<\/li>\n<li>Replicate data to new cluster<\/li>\n<li>Sync consumer offsets if needed<\/li>\n<li>Test producer and consumer connectivity<\/li>\n<li>Validate message counts and lag<\/li>\n<li>Switch traffic gradually<\/li>\n<li>Monitor both clusters<\/li>\n<\/ul><p>For critical systems, migration should be done carefully to avoid message loss or duplicate processing.<\/p><p>A phased approach is safer than moving all producers and consumers at once.<\/p><p>Cluster migration is usually planned during infrastructure upgrades, cloud migration, or disaster recovery setup.<\/p><h3>14. How would you debug high under-replicated partitions?<\/h3><p>Under-replicated partitions mean some replicas are not fully caught up with the leader. This is a serious Kafka health issue.<\/p><p>I would check:<\/p><ul>\n<li>Broker availability<\/li>\n<li>Network issues<\/li>\n<li>Disk usage<\/li>\n<li>Broker logs<\/li>\n<li>Slow followers<\/li>\n<li>Replication throttling<\/li>\n<li>Partition leader distribution<\/li>\n<li>ISR count<\/li>\n<li>Controller logs<\/li>\n<\/ul><p>Possible causes include broker failure, overloaded disks, network latency, or insufficient broker resources.<\/p><p>If under-replicated partitions remain high, Kafka durability is at risk. The fix may involve restoring failed brokers, balancing partitions, increasing resources, or resolving network issues.<\/p><p>This is an important production-level Kafka troubleshooting question.<\/p><h3>15. How would you manage Kafka message schemas in production?<\/h3><p>In production, message schemas should be managed carefully so producers and consumers remain compatible. 
Tools like Schema Registry are commonly used with Avro, Protobuf, or JSON Schema.<\/p><p>Good practices:<\/p><ul>\n<li>Define schema contracts<\/li>\n<li>Use backward or forward compatibility<\/li>\n<li>Avoid breaking field changes<\/li>\n<li>Add optional fields instead of removing fields<\/li>\n<li>Version events properly<\/li>\n<li>Validate messages before publishing<\/li>\n<li>Document event structure<\/li>\n<\/ul><p>For example, adding an optional <strong>email<\/strong> field is safer than renaming an existing <strong>userId<\/strong> field.<\/p><p>Schema management is important because many services may depend on the same Kafka event.<\/p><h3>16. How would you handle disaster recovery in Kafka?<\/h3><p>Kafka disaster recovery means planning how to recover if a cluster, data center, or region fails.<\/p><p>Important practices include:<\/p><ul>\n<li>Use replication across brokers<\/li>\n<li>Keep backups of configurations<\/li>\n<li>Use MirrorMaker 2 for cross-cluster replication<\/li>\n<li>Monitor replication lag<\/li>\n<li>Document recovery steps<\/li>\n<li>Test failover regularly<\/li>\n<li>Store critical data in multiple zones or regions<\/li>\n<li>Define RPO and RTO clearly<\/li>\n<\/ul><p>For example, a company may run Kafka in one primary region and replicate important topics to another region.<\/p><p>Disaster recovery should be tested before real failures happen. Otherwise, teams may not know whether recovery actually works.<\/p><h3>17. 
How would you prevent data loss in Kafka?<\/h3><p>To prevent data loss, Kafka must be configured for durability at producer, broker, and consumer levels.<\/p><p>Important settings and practices:<\/p><ul>\n<li>Use replication factor 3<\/li>\n<li>Set acks=all<\/li>\n<li>Configure min.insync.replicas<\/li>\n<li>Enable producer idempotence<\/li>\n<li>Avoid auto-committing offsets before processing<\/li>\n<li>Monitor under-replicated partitions<\/li>\n<li>Use durable storage<\/li>\n<li>Handle retries properly<\/li>\n<li>Do not delete topics accidentally<\/li>\n<li>Set proper retention<\/li>\n<\/ul><p>For consumers, commit offsets only after successful processing. For producers, wait for strong acknowledgement.<\/p><p>Data loss prevention requires both Kafka configuration and correct application logic.<\/p><h3>18. How does Kafka Streams differ from normal Kafka consumers?<\/h3><p>A normal Kafka consumer reads messages and processes them using custom application logic. Kafka Streams is a client library used to build stream processing applications directly on Kafka.<\/p><table class=\"tablepress\">\n<thead><tr>\n<td><b>Feature<\/b><\/td>\n<td><b>Kafka Consumer<\/b><\/td>\n<td><b>Kafka Streams<\/b><\/td>\n<\/tr><\/thead><tbody class=\"row-striping row-hover\">\n\n<tr>\n<td><span style=\"font-weight: 400;\">Purpose<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Read messages<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Process and transform streams<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">State handling<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Manual<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Built-in state stores<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Joins\/windows<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Manual logic<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Built-in support<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Use 
case<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Simple consumption<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Stream processing<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table><p>Kafka Streams supports filtering, mapping, aggregation, joins, windowing, and stateful processing.<\/p><p>For example, it can calculate real-time order totals or detect suspicious transactions from event streams.<\/p><h3>19. How would you test Kafka-based applications?<\/h3><p>Kafka-based applications can be tested at different levels.<\/p><p>Testing approaches:<\/p><ul>\n<li>Unit test producer and consumer logic<\/li>\n<li>Use embedded Kafka for integration testing<\/li>\n<li>Test serialization and deserialization<\/li>\n<li>Validate topic names and message structure<\/li>\n<li>Test retry and DLT flows<\/li>\n<li>Test duplicate message handling<\/li>\n<li>Test consumer offset behaviour<\/li>\n<li>Test Spring Boot Kafka listeners<\/li>\n<\/ul><p>For example, in Spring Boot, embedded Kafka can be used to test whether a producer sends a message and a consumer receives it correctly.<\/p><h3>20. How would you troubleshoot Kafka performance issues in production?<\/h3><p>To troubleshoot Kafka performance issues, I would check producer, broker, consumer, and network metrics separately.<\/p><p>Important checks:<\/p><ul>\n<li>Producer latency<\/li>\n<li>Broker CPU and disk I\/O<\/li>\n<li>Network throughput<\/li>\n<li>Consumer lag<\/li>\n<li>Partition distribution<\/li>\n<li>Under-replicated partitions<\/li>\n<li>Message size<\/li>\n<li>Batch settings<\/li>\n<li>Compression settings<\/li>\n<li>Slow downstream systems<\/li>\n<\/ul><p>If producer latency is high, batching or broker load may be the issue. If consumer lag is increasing, consumers or downstream systems may be slow.<\/p><p>Kafka performance troubleshooting should be metric-based, not guesswork-based. 
Monitoring tools like Prometheus, Grafana, and Kafka command-line tools help identify the exact bottleneck.<\/p><h2>Conceptual and Scenario-based Kafka Interview Questions<\/h2><p>These scenario-based Kafka interview questions focus on how Kafka is used in real production systems such as fintech, e-commerce, logistics, streaming platforms, data engineering pipelines, and microservices.<\/p><p>These questions test whether you can think beyond definitions and explain message flow, failures, scaling, ordering, duplicates, lag, and reliability in practical situations.<\/p><h3>1. An e-commerce order service publishes events to Kafka, but the inventory service receives the same order event twice. How would you handle it?<\/h3><p>Duplicate events can happen in Kafka, especially when producers retry messages or consumers process a message but fail before committing the offset. I would not depend only on Kafka to prevent duplicates. Instead, I would make the consumer idempotent.<\/p><p>For example, every order event should have a unique orderId or eventId. Before updating inventory, the inventory service can check whether that event was already processed.<\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>orderId: ORD123<\/p>\n<p>eventType: ORDER_PLACED<\/p>\n<\/div><\/div><p>If ORD123 has already been processed, the consumer skips it. This prevents duplicate stock deduction. In production, this can be handled using a database unique constraint, a processed-event table, or idempotent business logic.<\/p><h3>2. A payment system cannot afford message loss. Which Kafka settings would you check?<\/h3><p>For a payment system, reliability is more important than speed. 
I would check producer, broker, and consumer configurations together.<\/p><p>Important settings include:<\/p><ul>\n<li>acks=all<\/li>\n<li>enable.idempotence=true<\/li>\n<li>retries configured properly<\/li>\n<li>replication.factor=3<\/li>\n<li>min.insync.replicas=2<\/li>\n<li>Manual offset commit after successful processing<\/li>\n<li>Proper dead-letter topic for failed messages<\/li>\n<\/ul><p>The producer should wait until all in-sync replicas acknowledge the message. The consumer should commit the offset only after payment processing succeeds. If processing fails, the message should be retried or sent to a dead-letter topic. This reduces the chance of message loss in a critical system.<\/p><h3>3. A food delivery app wants live order tracking using Kafka. How would you design the flow?<\/h3><p>For live order tracking, Kafka can be used to stream order status events from different services. Each stage of the order can publish an event to a topic like <strong>order-tracking-events<\/strong>.<\/p><p>Example events:<\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>ORDER_PLACED<\/p>\n<p>RESTAURANT_ACCEPTED<\/p>\n<p>FOOD_PREPARING<\/p>\n<p>OUT_FOR_DELIVERY<\/p>\n<p>DELIVERED<\/p>\n<\/div><\/div><p>The producer can be the order service, restaurant service, or delivery partner app. Consumers can be the customer app, notification service, analytics service, and support dashboard.<\/p><p>To keep the events of a single order in sequence, I would use orderId as the Kafka message key. This ensures all events for the same order go to the same partition, where Kafka preserves their order.<\/p><h3>4. Consumer lag is continuously increasing in a data pipeline. 
What would you investigate?<\/h3><p>Increasing consumer lag means the consumer is not processing messages as fast as producers are sending them. I would first check whether the issue is with the consumer, downstream system, or Kafka topic design.<\/p><p>I would investigate:<\/p><ul>\n<li>Consumer processing time<\/li>\n<li>Number of partitions<\/li>\n<li>Number of consumers in the group<\/li>\n<li>Database or API latency<\/li>\n<li>Batch size and poll settings<\/li>\n<li>Rebalance frequency<\/li>\n<li>Error retries slowing the consumer<\/li>\n<li>Message volume spikes<\/li>\n<\/ul><p>For example, if the consumer writes every event one by one to a database, database latency may cause lag. The fix may be batching writes, scaling consumers, increasing partitions, or optimizing the downstream system.<\/p><h3>5. A banking application must process account transactions in the correct order. How would Kafka help?<\/h3><p>Kafka guarantees ordering only within a partition, not across all partitions. For account transactions, I would use <strong>accountId<\/strong> as the message key so all transactions for the same account go to the same partition.<\/p><p>Example:<\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>Key: account_101<\/p>\n<p>Events: DEBIT &rarr; CREDIT &rarr; BALANCE_UPDATE<\/p>\n<\/div><\/div><p>This ensures transaction events for one account are consumed in the same order they were produced.<\/p><p>However, if events are spread across multiple partitions without a proper key, ordering may break. So, the partitioning strategy must be designed carefully. For banking-like systems, key-based partitioning and idempotent processing are both important.<\/p><h3>6. 
A logistics company wants to track vehicle location updates every few seconds. What Kafka design choices matter?<\/h3><p>For vehicle tracking, Kafka must handle high-volume, frequent location events. I would create a topic like <strong>vehicle-location-events<\/strong> and partition it based on <strong>vehicleId<\/strong>.<\/p><p>Important design choices include:<\/p><ul>\n<li>Use <strong>vehicleId<\/strong> as the message key<\/li>\n<li>Choose enough partitions for high throughput<\/li>\n<li>Use compression to reduce message size<\/li>\n<li>Set suitable retention based on business needs<\/li>\n<li>Use consumer groups for tracking, alerts, and analytics<\/li>\n<li>Monitor consumer lag closely<\/li>\n<\/ul><p>For example, one consumer can update the live map, another can detect route deviations, and another can store data for analytics. Kafka works well here because multiple systems can consume the same location stream independently.<\/p><h3>7. A fraud detection system needs real-time transaction analysis. How would Kafka fit into the architecture?<\/h3><p>Kafka can act as the real-time event backbone for fraud detection. Payment or transaction services can publish transaction events to a Kafka topic such as <strong>transaction-events<\/strong>.<\/p><p>A fraud detection consumer or stream processing application can read these events immediately and check rules such as:<\/p><ul>\n<li>Unusual transaction amount<\/li>\n<li>Multiple failed attempts<\/li>\n<li>New device login<\/li>\n<li>Location mismatch<\/li>\n<li>High-frequency transactions<\/li>\n<\/ul><p>If suspicious activity is found, the fraud service can publish another event to <strong>fraud-alert-events<\/strong>.<\/p><p>This design is useful because Kafka supports high-throughput event streaming and allows fraud detection, analytics, notification, and audit systems to consume the same transaction data independently.<\/p><h3>8. 
A Kafka consumer crashes after processing a message but before committing the offset. What happens?<\/h3><p>If a consumer processes a message successfully but crashes before committing the offset, Kafka still considers that message unprocessed. When the consumer restarts, it may read the same message again.<\/p><p>This creates duplicate processing risk.<\/p><p>Example:<\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p><strong>Message processed<\/strong> &rarr; <strong>Consumer crashes<\/strong> &rarr; <strong>Offset not committed<\/strong> &rarr; <strong>Message read again<\/strong><\/p>\n<\/div><\/div><p>To handle this, the consumer logic should be idempotent. For example, if the message creates an invoice, the system should check whether the invoice for that event already exists before creating another one.<\/p><p>This is why Kafka applications often follow at-least-once processing and design consumers to safely handle duplicates.<\/p><h3>9. A company wants to rebuild its analytics dashboard using old Kafka events. Is it possible?<\/h3><p>Yes, it is possible if the old Kafka events are still available within the topic retention period. 
Kafka stores messages based on retention settings, not based on whether they were already consumed.<\/p><p>To rebuild analytics, a new consumer group can be created and started from the earliest available offset.<\/p><p>Example:<\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p>New analytics consumer group &rarr; Read from beginning &rarr; Rebuild dashboard data<\/p>\n<\/div><\/div><p>This is useful for reprocessing, rebuilding indexes, fixing data errors, or launching new analytics services.<\/p><p>However, if retention has already deleted old events, Kafka cannot replay them. In that case, the data must come from long-term storage like a data lake, warehouse, or backup system.<\/p><h3>10. A microservices system has too many direct API calls between services. How can Kafka improve it?<\/h3><p>Kafka can reduce tight coupling between microservices by enabling event-driven communication. 
Instead of one service directly calling multiple services, it can publish an event to Kafka.<\/p><p>For example:<\/p><div class=\"su-note\" style=\"border-color:#dddfde;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\"><div class=\"su-note-inner su-u-clearfix su-u-trim\" style=\"background-color:#f7f9f8;border-color:#ffffff;color:#333333;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;\">\n<p><strong>Order Service<\/strong> &rarr; <strong>order-created event<\/strong> &rarr; <strong>Kafka<\/strong><\/p>\n<\/div><\/div><p>Then multiple services can consume the same event:<\/p><ul>\n<li>Inventory Service<\/li>\n<li>Payment Service<\/li>\n<li>Notification Service<\/li>\n<li>Analytics Service<\/li>\n<\/ul><p>This improves scalability because services do not need to wait for each other synchronously. If one consumer is temporarily down, it can resume later from its last offset. Kafka is useful when systems need asynchronous communication, event history, and independent scaling across services.<\/p><h2>Best Ways to Prepare for Kafka Interviews<\/h2><ul>\n<li><strong>Learn Event Streaming Basics First:<\/strong> Understand why Kafka is used for real-time data streaming, event-driven architecture, log-based messaging, and high-throughput data pipelines.<\/li>\n<li><strong>Understand Kafka Core Concepts:<\/strong> Revise topics, partitions, brokers, producers, consumers, consumer groups, offsets, replication factor, leader-follower replicas, and retention policies.<\/li>\n<li><strong>Focus on Kafka Architecture:<\/strong> Learn how Kafka stores messages, how partitions improve scalability, how brokers coordinate, and how Kafka maintains fault tolerance through replication.<\/li>\n<li><strong><a href=\"https:\/\/www.guvi.in\/courses\/project\/kafka-consumer-and-producer-with-spring-boot\/?utm_source=placement_preparation&amp;utm_medium=blog_cta&amp;utm_campaign=ats_friendly_resume_guide&amp;utm_content=start_your_journey\" 
target=\"_blank\" rel=\"noopener\">Practise Kafka with Java and Spring Boot<\/a>:<\/strong> For backend and Java roles, practise producer-consumer examples, Kafka listeners, serializers, deserializers, error handling, retries, and basic Spring Boot Kafka integration.<\/li>\n<li><strong>Prepare for Data Engineering Use Cases:<\/strong> Understand how Kafka is used in data pipelines, real-time dashboards, log processing, fraud detection, CDC, stream processing, and analytics workflows.<\/li>\n<li><strong>Solve Scenario-based Questions:<\/strong> Practise questions on consumer lag, duplicate messages, message ordering, failed consumers, topic partitioning, retention issues, and producer acknowledgement settings.<\/li>\n<li><strong>Use <a href=\"https:\/\/www.placementpreparation.io\/\">PlacementPreparation.io<\/a>:<\/strong> Practise Kafka MCQs, <a href=\"https:\/\/www.placementpreparation.io\/mock-test\">mock tests<\/a>, technical questions, and placement-focused exercises to strengthen your interview readiness.<\/li>\n<li><strong>Learn with GUVI and GUVI Zen Class:<\/strong> Use <a href=\"https:\/\/www.guvi.in\/courses\/\" target=\"_blank\" rel=\"noopener\">GUVI courses<\/a> to learn Java, backend development, data engineering basics, cloud, DevOps, and real-time application concepts in a structured way. You can also choose <a href=\"https:\/\/www.guvi.in\/zen-class\/\" target=\"_blank\" rel=\"noopener\">GUVI Zen Class for mentor-led learning<\/a>, hands-on projects, coding practice, and career guidance.<\/li>\n<\/ul><h2>Final Words<\/h2><p>Kafka is an important technology for data engineering, backend, Java, DevOps, and real-time application roles.<\/p><p>To prepare well, practise Kafka interview questions, architecture concepts, producer-consumer flows, partitions, offsets, Spring Boot integration, and scenario-based problems. Strong hands-on practice will help you answer Kafka questions with confidence.<\/p><h2>Frequently asked questions<\/h2><h3>1. 
What is Kafka used for in real projects?<\/h3><p>Kafka is used for real-time data streaming, event-driven communication, log processing, data pipelines, and microservices communication. You can use Kafka when multiple systems need to send and receive large volumes of data quickly. In interviews, you should explain Kafka with examples like order tracking, payment events, fraud detection, real-time dashboards, and user activity tracking.<\/p><h3>2. Is Kafka an ETL tool?<\/h3><p>Kafka is not a complete ETL tool by itself. It is mainly an event streaming platform that moves data between systems in real time. However, you can use Kafka as part of an ETL or ELT pipeline along with tools like Kafka Connect, Spark, Flink, or data warehouses. In interviews, you can say Kafka helps transport data, while transformation usually happens through stream processing or external tools.<\/p><h3>3. Is Kafka used in backend development?<\/h3><p>Yes, Kafka is widely used in backend development, especially in microservices and distributed systems. Backend services use Kafka to publish and consume events without depending on direct API calls for every action. For example, an order service can publish an order event, and payment, inventory, notification, and analytics services can consume it separately.<\/p><h3>4. Can Kafka replace REST APIs?<\/h3><p>Kafka cannot fully replace REST APIs because both solve different problems. REST APIs are best for request-response communication, such as fetching user details or submitting a form. Kafka is better for asynchronous event streaming, such as sending order updates, logs, or transaction events. In real systems, you can use both REST APIs and Kafka together.<\/p><h3>5. Is Kafka important for data engineering interviews?<\/h3><p>Yes, Kafka is important for data engineering interviews because it is commonly used in real-time data pipelines, log ingestion, streaming analytics, CDC, and event processing. 
You should prepare Kafka topics, partitions, brokers, producers, consumers, offsets, consumer groups, retention, and consumer lag. You can also practise scenario-based Kafka questions on message loss, duplicates, ordering, and scaling.<\/p><h3>6. Is Kafka difficult for freshers to learn?<\/h3><p>Kafka may look difficult at first because it has many new terms like brokers, partitions, offsets, replicas, and consumer groups. But freshers can learn it step by step by starting with the basic message flow: a producer sends data to a topic, Kafka stores it in partitions, and consumers read it. You should practise small producer-consumer examples to understand Kafka better.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Kafka is a key skill for freshers and professionals preparing for data engineering, backend development, Java development, DevOps, and real-time application roles. Apache Kafka is used for building high-performance data pipelines, streaming analytics, and event-driven systems, and more than 80% of Fortune 100 companies use Kafka. This article covers practical Kafka interview questions, including basic concepts, architecture, 
[&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":21949,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[45],"tags":[],"class_list":["post-21820","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-programming-interview-questions"],"_links":{"self":[{"href":"https:\/\/www.placementpreparation.io\/blog\/wp-json\/wp\/v2\/posts\/21820","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.placementpreparation.io\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.placementpreparation.io\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.placementpreparation.io\/blog\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/www.placementpreparation.io\/blog\/wp-json\/wp\/v2\/comments?post=21820"}],"version-history":[{"count":7,"href":"https:\/\/www.placementpreparation.io\/blog\/wp-json\/wp\/v2\/posts\/21820\/revisions"}],"predecessor-version":[{"id":21835,"href":"https:\/\/www.placementpreparation.io\/blog\/wp-json\/wp\/v2\/posts\/21820\/revisions\/21835"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.placementpreparation.io\/blog\/wp-json\/wp\/v2\/media\/21949"}],"wp:attachment":[{"href":"https:\/\/www.placementpreparation.io\/blog\/wp-json\/wp\/v2\/media?parent=21820"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.placementpreparation.io\/blog\/wp-json\/wp\/v2\/categories?post=21820"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.placementpreparation.io\/blog\/wp-json\/wp\/v2\/tags?post=21820"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}