Redpanda Architecture Pt 1

Reference Video

Topics

image

Origins of Kafka

image

Core Concepts in Kafka

image

image

What is Redpanda?

image

Redpanda Bits and Bytes

image

Redpanda Core

image

Seastar Concepts

image

Let's unpack "threads are pinned to a core":


1. Normal threading (default in most apps, like Kafka)

  • An application creates multiple threads.
  • The OS scheduler decides which CPU core runs each thread.
  • Threads can move between cores depending on load, availability, or scheduling.
  • This gives flexibility, but it:
    • causes context switches (a thread gets paused and moved to another core),
    • causes cache misses (data in one core's L1/L2 cache isn't available on the new core),
    • adds latency jitter.

2. Pinning (CPU affinity)

  • "Pinning" means a thread is locked (affined) to a specific CPU core.
  • The OS will always schedule that thread only on that one core.
  • Benefits:
    • No migration → the thread always runs on the same CPU.
    • Cache locality → data stays in that CPU's cache, improving performance.
    • Predictable latency → no interruptions from the scheduler moving threads around.
  • Downsides:
    • Less flexible: if one core is overloaded, the OS can't move its threads elsewhere.
    • Requires careful design to balance load across cores.
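
To make pinning concrete, here is a minimal Linux-specific sketch (pthread_setaffinity_np is a glibc extension, and the core number is an arbitrary choice for the example; build with g++ -pthread):

```cpp
// Minimal Linux sketch of CPU pinning via the glibc affinity API.
#include <pthread.h>   // pthread_setaffinity_np
#include <sched.h>     // cpu_set_t, CPU_ZERO, CPU_SET
#include <cstdio>
#include <thread>

int main() {
    std::thread worker([] {
        // Once pinned, this code always runs on the same core, so its
        // working set stays in that core's L1/L2 cache.
        std::printf("worker running on its pinned core\n");
    });

    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(2, &cpuset);  // pin to core 2 (arbitrary choice for the example)

    int rc = pthread_setaffinity_np(worker.native_handle(),
                                    sizeof(cpu_set_t), &cpuset);
    if (rc != 0)
        std::fprintf(stderr, "pthread_setaffinity_np failed: %d\n", rc);

    worker.join();
    return 0;
}
```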

3. In Redpanda / Seastar's case

  • Instead of traditional multithreading, Seastar uses a "shard-per-core" model:
    • Each core gets one dedicated shard (like an event loop).
    • That shard is pinned to the core.
    • It runs everything: networking, disk I/O, scheduling for that shard.
    • This eliminates almost all locking and cross-core coordination overhead.
  • Each shard processes requests independently, and inter-core communication happens explicitly via message passing (not shared-memory locks).

✅ So, "threads are pinned to a core" means: each execution unit (thread/shard) runs permanently on the same CPU core, giving predictable performance and cache efficiency, instead of being moved around by the OS scheduler.
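
Below is a minimal sketch of that model, assuming the Seastar library and its public API (app_template, smp::submit_to); the shard index is arbitrary. Cross-shard work is an explicit message to another core's reactor, not a lock:

```cpp
// Minimal Seastar sketch (assumes the Seastar library is installed).
// One reactor thread is pinned per core; smp::submit_to() sends work
// to another shard as an explicit message instead of sharing locks.
#include <seastar/core/app-template.hh>
#include <seastar/core/smp.hh>
#include <seastar/core/future.hh>
#include <iostream>

int main(int argc, char** argv) {
    seastar::app_template app;
    return app.run(argc, argv, [] {
        // Run with at least 2 cores (e.g. --smp 2) so shard 1 exists.
        return seastar::smp::submit_to(1, [] {
            std::cout << "hello from shard "
                      << seastar::this_shard_id() << "\n";
        });
    });
}
```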


🔹 1. Process vs Thread

  • A process is a program in execution:
    • It has its own memory space (heap, stack, code, etc.).
    • Example: java -jar kafka.jar starts a Kafka broker process.
  • A thread is a lightweight unit of execution inside a process:
    • It shares the same memory space as other threads in that process.
    • It has its own stack and program counter (so it can run independently).
    • Example: Kafka spawns threads for handling networking, log flushes, replication, etc.

👉 Think of a process as a house, and threads as people inside the house who share the same kitchen (memory) but can each do different tasks.


🔹 2. What does a thread actually do?

  • A thread executes a sequence of instructions (functions, loops, syscalls).
  • The OS schedules the thread on a CPU core.
  • Multiple threads in the same process can run concurrently (on different cores).

Example:

  • Thread A reads data from the network socket.
  • Thread B compresses and batches the data.
  • Thread C writes data to disk.

🔹 3. Does the application "send data via a thread"?

Not exactly.

  • The application creates threads to handle tasks (e.g., read, process, write).
  • Each thread operates on shared data structures in the process's memory.
  • Threads can pass data between each other via:
    • shared memory (since they live in the same process), or
    • queues, buffers, and synchronization primitives (locks, semaphores).

So it's not like a thread is a "pipe" that data flows through. 👉 Instead: a thread is a worker that executes instructions on data in memory. The application controls what the thread does.
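
A tiny self-contained C++ sketch of that idea: two threads in one process hand data to each other through a queue that lives in shared process memory (all names here are just for the example):

```cpp
// Two threads in one process share the same queue in memory: thread A
// "receives" data, thread B processes it. Synchronization is a mutex,
// not a pipe: the data never leaves the process's memory.
#include <cstdio>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

int main() {
    std::queue<std::string> shared_queue;  // lives in the process's heap
    std::mutex m;

    std::thread reader([&] {               // e.g. reads from a socket
        for (int i = 0; i < 3; ++i) {
            std::lock_guard<std::mutex> lk(m);
            shared_queue.push("record " + std::to_string(i));
        }
    });
    reader.join();                         // all records now sit in shared memory

    std::thread writer([&] {               // e.g. compresses / writes to disk
        std::lock_guard<std::mutex> lk(m);
        while (!shared_queue.empty()) {
            std::printf("processing %s\n", shared_queue.front().c_str());
            shared_queue.pop();
        }
    });
    writer.join();
    return 0;
}
```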


🔹 4. Example: Kafka

  • A Kafka broker process starts → a JVM process.
  • The JVM creates threads:
    • Network threads: handle socket I/O from producers/consumers.
    • I/O threads: append messages to the log.
    • Replica fetcher threads: replicate data across brokers.
  • Threads share the same heap memory, but each has its own execution flow.

✅ In short: a thread is not a data pipe; it's a unit of execution that runs code inside a process. The application assigns tasks to threads, and those threads can work with shared memory to process or pass data around.


Thread Per Core Benefits

image

Shard to Partition Mapping


🔹 Shard to Partition Mapping in Redpanda (Seastar Model)

  • In Redpanda, a partition is assigned to exactly one shard (CPU core) within a broker.
  • That shard is the exclusive owner of the partition: it handles all reads, writes, and replication logic for it.
  • There is no sharing of partitions across shards → this avoids locks and keeps the model deterministic.

👉 So yes: 1 partition → 1 shard (on a given broker).


🔹 How This Works in Practice

  1. Broker Setup
    • Each Redpanda broker runs with multiple shards (cores).
    • Example: a broker with 8 CPU cores → 8 shards.
  2. Partition Assignment
    • When a partition is created, Redpanda assigns it to a shard.
    • The partition → shard mapping is stored in metadata (using Raft consensus).
  3. Shard Locality
    • Once assigned, all producers/consumers that interact with that partition hit that shard directly.
    • This guarantees:
      • no cross-shard locks,
      • cache locality (NUMA-aware memory),
      • predictable performance.
  4. Scaling Partitions
    • More partitions = spread across more shards (and brokers).
    • If you have 100 partitions and 8 shards, partitions will be distributed ~evenly across shards.

🔹 Why This is Efficient

  • Each shard runs its own event loop with Seastar.
  • Since a shard owns the partition exclusively:
    • No lock contention.
    • No need for multiple threads touching the same partition.
    • CPU cache locality is preserved.

This is very different from Kafka's JVM/thread-pool model, where multiple threads may process partitions and need locks/synchronization.


🔹 Visual (Simplified)

Kafka Broker (JVM, Thread Pools):

Thread Pool
  ├── Partition 1 (handled by multiple threads w/ locks)
  ├── Partition 2 (ditto)
  └── Partition 3 ...

Redpanda Broker (Seastar, Shards):

Shard 0 → Partition 1, Partition 5
Shard 1 → Partition 2, Partition 6
Shard 2 → Partition 3
Shard 3 → Partition 4

👉 Each shard handles its own partitions: no overlap, no locks.


🔹 Important Note

  • A partition cannot be split across shards (it's always fully owned).
  • But a shard can own multiple partitions if you have more partitions than shards.
  • Redpanda's scheduler balances partitions across shards.

✅ Answer: yes, in Redpanda's Seastar model, a partition maps to exactly one shard (core). This lock-free ownership model is what gives Redpanda its high throughput and low latency.
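
To make the "100 partitions over 8 shards" claim concrete, here is a toy C++ sketch (round-robin assignment for illustration only; Redpanda's real assignment is metadata-driven and Raft-replicated):

```cpp
// Toy illustration: spread 100 partitions across 8 shards round-robin.
// Each partition gets exactly one owning shard, and the load evens out
// to 12 or 13 partitions per shard.
#include <cstdio>
#include <vector>

int main() {
    const unsigned shards = 8;
    const unsigned partitions = 100;
    std::vector<unsigned> owned(shards, 0);

    for (unsigned p = 0; p < partitions; ++p) {
        unsigned owner = p % shards;  // the single exclusive owner
        ++owned[owner];
    }
    for (unsigned s = 0; s < shards; ++s)
        std::printf("shard %u owns %u partitions\n", s, owned[s]);
    return 0;
}
```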


🔹 Why Rebalancing is Needed

  • In any event-streaming cluster, partitions need to be spread evenly for performance.
  • Situations that trigger rebalancing:
    • Adding/removing brokers (scaling up/down).
    • Adding/removing CPU cores (changing the shard count).
    • Increasing partitions on a topic.
    • Failure recovery (a broker goes down).

🔹 Kafka Partition Rebalancing (Traditional Way)

  • Kafka relies on a partition reassigner (via ZooKeeper or KRaft).
  • When brokers are added, Kafka shifts partitions across brokers, but:
    • Within a broker, partitions are handled by threads in pools (not pinned to a core).
    • Partition-to-thread mapping is dynamic, with potential contention.
  • Rebalancing is often manual and disruptive (CLI commands, the partition reassignment tool).
  • Data movement is expensive, because Kafka copies log segments across brokers during reassignment.

🔹 Redpanda Partition Rebalancing (Seastar Model)

1. Partition-to-Shard Pinning

  • Each partition is always owned by exactly one shard.
  • When partitions are assigned to a broker, Redpanda also ensures load balancing across shards within that broker.

2. Adding a New Broker

  • Redpanda automatically reassigns some partitions to the new broker.
  • Metadata (via Raft) is updated to reflect ownership.
  • The new broker takes over as partition leader or replica for some partitions.
  • Producers/consumers redirect automatically (via client metadata refresh).

👉 This is smoother than Kafka's rebalance because Redpanda has no external ZooKeeper layer.


3. Adding CPU Cores (More Shards)

  • Suppose a broker runs on 4 cores (shards) and you upgrade it to 8 cores.
  • Redpanda can redistribute partitions across the new shards.
  • Each partition is moved to a new shard if needed, but ownership is always exclusive.
  • This way, hardware scaling (more cores) leads to more parallelism without rewriting application logic.

4. Partition Expansion

  • If you increase partitions in a topic, Redpanda assigns new partitions to shards across brokers evenly.
  • Existing partitions remain pinned; there are no surprise reassignments unless explicitly rebalanced.

5. Failure Recovery

  • If a broker/shard fails, Redpanda promotes replicas (via Raft consensus) to leaders.
  • The partition moves to another shard/broker that has a replica.
  • Clients auto-discover the new leader.

🔹 Why Redpanda's Model Helps

| Aspect | Kafka | Redpanda (Seastar) |
| --- | --- | --- |
| Partition ownership | Dynamic threads | Fixed shard-per-core |
| Rebalancing trigger | Often manual | Mostly automatic |
| Intra-broker balance | Threads may contend | Explicit shard assignment |
| Scaling cores | No concept | Shards = cores, easy scaling |
| Data movement | Heavy (log copy) | Lighter (replicas managed via Raft) |

🔹 Example

Suppose:

  • Cluster = 2 brokers, 4 cores each → 8 shards total.
  • Topic = 8 partitions.

Initial mapping:

Broker1 Shard0 → Partition0
Broker1 Shard1 → Partition1
Broker1 Shard2 → Partition2
Broker1 Shard3 → Partition3
Broker2 Shard0 → Partition4
Broker2 Shard1 → Partition5
Broker2 Shard2 → Partition6
Broker2 Shard3 → Partition7

Now you add a third broker (4 cores):

  • Redpanda rebalances so that Broker3 takes ownership of some partitions (say, 2 and 6).
  • Partition ownership shifts smoothly, and Raft ensures replica consistency.
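
A toy sketch of that spread, purely for illustration (Redpanda's actual balancer is metadata-driven and moves as few partitions as possible): recompute a round-robin partition → (broker, shard) layout as the cluster grows from 2 to 3 brokers.

```cpp
// Toy rebalance illustration: 8 partitions over 2 brokers x 4 shards,
// then over 3 brokers x 4 shards. The new broker picks up a share of
// the partitions; each partition still has exactly one owning shard.
#include <cstdio>

int main() {
    const int partitions = 8;
    const int shards_per_broker = 4;
    for (int brokers : {2, 3}) {
        std::printf("--- %d brokers in the cluster ---\n", brokers);
        for (int p = 0; p < partitions; ++p) {
            int broker = p % brokers;                        // round-robin over brokers
            int shard  = (p / brokers) % shards_per_broker;  // then over that broker's shards
            std::printf("Partition%d -> Broker%d Shard%d\n",
                        p, broker + 1, shard);
        }
    }
    return 0;
}
```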

Kafka Thread Pooling

Let's explain thread pools in the context of Kafka only.


🔹 Why Kafka uses thread pools

  • Kafka brokers handle a huge number of concurrent tasks:
    • accepting requests from producers,
    • serving fetch requests from consumers,
    • replicating partitions between brokers,
    • flushing data to disk.
  • If Kafka created a new thread for every client request, it would waste CPU and memory.
  • Instead, Kafka uses thread pools:
    • A fixed number of threads is created at broker startup.
    • Incoming work is placed into queues.
    • Threads pick tasks from these queues and execute them.

🔹 Examples of thread pools inside Kafka

  1. Network Thread Pool
    • Each broker has a set of network threads.
    • They handle socket connections, parse requests, and enqueue them for processing.
    • The number of network threads = num.network.threads (configurable).
    • Example: if you set num.network.threads=3, Kafka creates 3 reusable threads to handle all incoming client connections.
  2. I/O / Request Handler Thread Pool
    • Requests received by network threads are handed off to I/O threads.
    • These handle actions like reading/writing data to partitions, updating metadata, etc.
    • Controlled by num.io.threads (see the config snippet after this list).
    • Example: with 8 I/O threads, they work in parallel to serve fetch/produce requests from the queue.
  3. Replica Fetcher Thread Pool
    • Brokers need to replicate partitions from each other.
    • Kafka uses a pool of replica fetcher threads, one per leader-follower connection.
    • They continuously pull new data from leaders and apply it to local logs.
  4. Controller Thread (special case)
    • The broker elected as controller uses a dedicated thread to manage partition leadership and cluster metadata.
    • This isn't a pool but a single thread with a special role.
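
For reference, the pool sizes discussed above are ordinary entries in the broker's server.properties; these keys are real Kafka settings, and the values shown are their defaults:

```properties
# Broker thread-pool sizing (server.properties)
num.network.threads=3    # socket/network thread pool
num.io.threads=8         # request handler (I/O) thread pool
num.replica.fetchers=1   # replica fetcher threads per source broker
```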

🔹 Why this matters

  • Efficiency: Threads are expensive, so Kafka recycles them.
  • Throughput: A pool keeps all CPU cores busy without creating too many threads.
  • Predictability: Pools prevent the system from spawning unbounded threads when under load (avoiding crashes).

✅ In short (Kafka terms): Kafka uses thread pools (network, I/O, replica fetchers) to process large numbers of concurrent requests with a fixed number of reusable threads. Instead of one thread per request, requests go into a queue, and a worker thread from the pool handles them.
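
A compact C++ sketch of this pattern (illustrative only; Kafka itself is a Java codebase): a fixed set of workers pulls requests from one shared, lock-protected queue, which is exactly where the locking overhead discussed later comes from.

```cpp
// Thread-pool sketch: 4 fixed workers service 8 "requests" from a
// shared queue. Note every handoff goes through the same queue lock.
#include <condition_variable>
#include <cstdio>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

int main() {
    std::queue<std::function<void()>> tasks;  // shared request queue
    std::mutex m;
    std::condition_variable cv;
    bool shutdown = false;

    // Fixed pool created at startup, like Kafka's num.io.threads.
    std::vector<std::thread> pool;
    for (int i = 0; i < 4; ++i) {
        pool.emplace_back([&] {
            for (;;) {
                std::function<void()> task;
                {
                    std::unique_lock<std::mutex> lk(m);  // the contention point
                    cv.wait(lk, [&] { return shutdown || !tasks.empty(); });
                    if (tasks.empty() && shutdown) return;
                    task = std::move(tasks.front());
                    tasks.pop();
                }
                task();  // execute the request outside the lock
            }
        });
    }

    // "Requests" arrive and are enqueued, not given their own threads.
    for (int r = 0; r < 8; ++r) {
        {
            std::lock_guard<std::mutex> lk(m);
            tasks.push([r] { std::printf("handled request %d\n", r); });
        }
        cv.notify_one();
    }

    {
        std::lock_guard<std::mutex> lk(m);
        shutdown = true;
    }
    cv.notify_all();
    for (auto& t : pool) t.join();
    return 0;
}
```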


🔹 Kafka: Thread Pools Model

  • Thread pools: network threads, I/O threads, replica fetcher threads.
  • Work model:
    • A request arrives → goes into a queue → some thread in the pool picks it up.
    • Threads may run on different CPU cores → they need locks and synchronization to coordinate access to shared structures (like logs and partitions).
  • Implications:
    • More flexible, but extra overhead from context switches, locks, and memory sharing.
    • The OS scheduler decides which threads run on which cores (unless pinned manually).

🔹 Redpanda (Seastar): Shard-per-Core Model

  • No thread pools at all.
  • Instead:
    • Each CPU core runs a single Seastar "reactor" thread.
    • That thread never migrates → it is pinned to the core permanently.
    • Each reactor (aka shard) runs its own event loop and manages all tasks assigned to it: networking, disk I/O, scheduling.
  • Work model:
    • Incoming requests are directed to the shard that owns the partition (no global queue).
    • That shard executes all operations locally, without locks.
    • If work needs to cross cores, shards pass messages explicitly (message passing, not shared-memory locks).
  • Implications:
    • Completely avoids contention → no thread pools, no locks, no queues between workers.
    • Each shard has exclusive ownership of its memory and partitions.
    • Predictable latency (no surprises from OS scheduling).

🔹 Side-by-side Comparison

| Feature | Kafka (Thread Pools) | Redpanda (Shard-per-Core) |
| --- | --- | --- |
| Concurrency model | Multiple thread pools (network, I/O, replication) | One shard (reactor thread) per CPU core |
| Scheduling | OS scheduler decides which thread runs on which core | Threads are pinned → one thread per core, forever |
| Work distribution | Tasks placed into queues, picked up by worker threads | Requests routed directly to the shard that owns the partition |
| Synchronization | Requires locks (shared memory between threads) | No locks → each shard owns its state; cross-core work via message passing |
| Context switches | Frequent; threads may migrate across cores | None (the thread never migrates) |
| Analogy | Call center with a pool of operators picking calls from a queue | Each operator has their own dedicated customers; no queue, no sharing |

🔹 Why Redpanda dropped thread pools

  • Kafka's model = general-purpose and flexible, but it pays the costs of locks + context switching.
  • Redpanda's Seastar model = deterministic, low-latency, NVMe-optimized.
  • By dedicating one reactor thread per core, Redpanda avoids the OS scheduler entirely and fully controls concurrency.

✅ In short:

  • Kafka → thread pools with shared state, queues, and locks.
  • Redpanda → no thread pools, just one pinned reactor thread per core, using message passing instead of locking.

🔹 In Kafka

  • Separate thread pools handle different responsibilities:
    • Network threads → accept producer/consumer socket requests.
    • I/O threads → read/write data to partitions.
    • Replica fetcher threads → replication.
  • These threads share data structures → they need locks + queues.

🔹 In Redpanda (Shard-per-Core model)

  • Each shard = one reactor thread pinned to one CPU core.
  • That shard owns a subset of partitions (log segments).
  • And yes, it handles everything for those partitions:
    • Networking
      • Each shard has its own TCP/HTTP server stack (Seastar provides this).
      • When a producer sends a message for a partition owned by shard 3, the network request is routed directly to shard 3.
      • That shard parses, validates, and queues the write internally.
    • Log append (producer writes)
      • Shard 3 appends the data directly to its NVMe segment using async direct I/O.
      • No locks, no handing off to another thread.
    • Consumer fetches (reads)
      • If a consumer requests data for a partition on shard 3, that same shard serves the request directly from its log segment (or in-memory cache).
      • Again: no global queue, no cross-thread locks.
    • Replication (followers → leaders)
      • If shard 3 owns a leader partition, it handles replication requests from follower brokers itself.
      • Kafka's fetcher threads become shard-owned replication tasks in Redpanda.

🔹 Why this is powerful

  • Everything related to a partition lives in one shard.
  • The shard handles networking, persistence, and serving clients without coordination overhead.
  • If work must cross shards (e.g., partition A on core 2, partition B on core 5), Redpanda uses message passing, not shared locks.

✅ So yes: a shard in Redpanda is responsible for the full lifecycle of the partitions it owns:

  • Accepting producer writes.
  • Appending to disk.
  • Serving consumer fetches.
  • Handling replication.

Kafka splits these into different thread pools → Redpanda collapses them into a single shard reactor per core.
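
Here is a hedged sketch of that routing step using Seastar's public API (the owner_shard table is a hypothetical stand-in for Redpanda's Raft-managed partition metadata; run with --smp 4 so all four shards exist):

```cpp
// Sketch: route a produce request to the shard that owns the partition,
// then do all the work there. Assumes the Seastar library; the static
// partition->shard table is a hypothetical stand-in for real metadata.
#include <seastar/core/app-template.hh>
#include <seastar/core/smp.hh>
#include <seastar/core/future.hh>
#include <array>
#include <iostream>

static const std::array<unsigned, 4> owner_shard = {0, 1, 2, 3};

seastar::future<> produce(unsigned partition) {
    // Hop to the owning shard; parsing, validation, and the log append
    // would all happen there, with no locks.
    return seastar::smp::submit_to(owner_shard[partition], [partition] {
        std::cout << "partition " << partition << " handled on shard "
                  << seastar::this_shard_id() << "\n";
    });
}

int main(int argc, char** argv) {
    seastar::app_template app;
    return app.run(argc, argv, [] {
        return produce(2);  // lands on shard 2, its exclusive owner
    });
}
```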

Isn't Kafka Faster Because of Thread Pools?


🔹 Kafka's Concurrency Model

  • Multiple thread pools can appear to give concurrency:
    • Network threads enqueue requests.
    • I/O threads pick them up and write/read logs.
    • Replica fetcher threads replicate in parallel.
  • But the cost is:
    • locks everywhere (log segments, partition metadata, socket buffers),
    • context switches when handing tasks across pools,
    • cache misses because data may bounce across cores,
    • OS scheduler interference.

So while Kafka can "parallelize" operations on the same partition via pools, the overhead (locks, scheduling, context switching) adds latency and jitter.


🔹 Redpanda's Concurrency Model

  • Each shard (core) is a single-threaded event loop:
    • It owns a set of partitions exclusively.
    • No locks, no hand-offs, no blocking.
  • Concurrency comes from multiple shards running in parallel:
    • If you have 16 cores, you have 16 shards running completely independently.
    • Each shard simultaneously handles networking + producers + consumers + replication for its partitions.
  • For a single partition:
    • Only one shard touches it (so there is no thread-level concurrency on that data).
    • But this actually improves performance: no lock contention, no context switches.

🔹 Why Redpanda isn't slower

  1. Lock-free execution
    • Kafka: "parallel threads," but guarded by locks → effectively serial at the partition level.
    • Redpanda: one shard, lock-free, guaranteed order → faster.
  2. Core-to-core scaling
    • Kafka: extra overhead scaling across cores because threads migrate.
    • Redpanda: scaling is natural; add more cores → more shards → more partitions handled in parallel.
  3. NVMe optimization
    • Kafka's I/O goes through the OS page cache and threads.
    • Redpanda maps shards directly to NVMe queues → multiple cores can hit storage in true parallel, without lock contention.

🔹 Analogy

  • Kafka = a restaurant where multiple waiters share the same kitchen (they need rules/locks to avoid collisions). It looks busy, but there's overhead in coordinating.
  • Redpanda = each waiter has their own kitchen and their own customers. No conflicts, no coordination. Less "fake concurrency," more real parallelism.

✅ Answer: Redpanda is not slower. Even though a shard processes a partition's work serially, that's exactly what Kafka does too (because partitions are single-threaded units of order). The big win is that Redpanda avoids lock contention, context switches, and cache misses, so it scales much better with more cores and NVMe drives.