Aman Goyal


Event-Driven Batch Processing: Building Scalable Workflow Pipelines

Core Concept

Think of it as multiple work queues connected together by events: when one stage finishes a work item, that completion event feeds the next queue.


Why This Pattern Exists

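A single work queue handles one kind of task. Multi-step jobs need workflow orchestration: a way to chain queues so the output of one stage becomes the input of the next.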


Core Patterns


1. Copier

Use case:

Also improves:
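To make the copier concrete, here is a minimal sketch that duplicates one stream of work items into several identical queues. The function name `copy_stream` and the queue names are hypothetical, and in-process `queue.Queue` objects stand in for real pub/sub topics.

```python
from queue import Queue


def copy_stream(source, outputs):
    """Put every work item from `source` onto each queue in `outputs`."""
    for item in source:
        for out in outputs:
            out.put(item)


# Hypothetical downstream queues that each need every item.
thumbnails, transcodes = Queue(), Queue()
copy_stream(["video-1", "video-2"], [thumbnails, transcodes])
```

Each downstream queue can then be drained by its own pool of workers, independently of the other.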


2. Filter

Example:

Implemented as:
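A filter can be sketched as a generator that wraps an existing source and passes along only the items that meet a condition. The name `filter_stream` and the `wants_email` field are assumptions made for illustration.

```python
def filter_stream(source, predicate):
    """Yield only the work items for which `predicate` returns True."""
    for item in source:
        if predicate(item):
            yield item


# Hypothetical signup events; only opted-in users should continue downstream.
signups = [
    {"user": "ana", "wants_email": True},
    {"user": "bo", "wants_email": False},
]
opted_in = list(filter_stream(signups, lambda s: s["wants_email"]))
```

Because the filter only wraps the source, the downstream worker code does not need to change.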


3. Splitter

Example:

Like:
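Unlike a filter, a splitter drops nothing: every item is routed to exactly one output queue. The sketch below assumes a hypothetical `notify` field on each order and uses plain lists in place of real queues.

```python
def split_stream(source, route, outputs):
    """Append each item to the output queue chosen by `route(item)`."""
    for item in source:
        outputs[route(item)].append(item)


# Hypothetical orders, routed by the customer's notification preference.
orders = [
    {"id": 1, "notify": "email"},
    {"id": 2, "notify": "sms"},
    {"id": 3, "notify": "email"},
]
queues = {"email": [], "sms": []}
split_stream(orders, lambda o: o["notify"], queues)
```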


4. Sharder

Benefits:

Same idea as service sharding
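A sharder can be sketched with a stable hash of a key, so the same key always lands on the same shard. The helper names `shard_for` and `shard_stream` are hypothetical. SHA-256 is used here only because Python's built-in `hash()` for strings varies between processes.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a string key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_stream(source, key_fn, num_shards):
    """Distribute items into `num_shards` lists by the hash of their key."""
    shards = [[] for _ in range(num_shards)]
    for item in source:
        shards[shard_for(key_fn(item), num_shards)].append(item)
    return shards


photos = [f"photo-{i}" for i in range(100)]
shards = shard_stream(photos, lambda p: p, 3)
```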


5. Merger

Example:

Opposite of copier
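A merger can be sketched as a function that combines several input streams into one. The round-robin interleaving below is one possible policy chosen for illustration, not the only one. The name `merge_streams` and the sample items are hypothetical.

```python
from itertools import zip_longest


def merge_streams(*sources):
    """Interleave items from all sources round-robin into a single stream."""
    missing = object()  # marks exhausted sources inside zip_longest
    for group in zip_longest(*sources, fillvalue=missing):
        for item in group:
            if item is not missing:
                yield item


merged = list(merge_streams(["repo-a:c1", "repo-a:c2"], ["repo-b:c1"]))
```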


Workflow Mental Model

Input → [Copy] → [Parallel queues] → [Filter/Split] → [Merge] → Output

The workflow forms a DAG (Directed Acyclic Graph): work flows forward through the stages and never loops back.
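One way to check that a workflow really is a DAG, and to get a valid stage order, is a topological sort. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9+). The stage names are hypothetical and loosely follow the diagram above.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on.
workflow = {
    "copy": {"input"},
    "filter": {"copy"},
    "split": {"copy"},
    "merge": {"filter", "split"},
    "output": {"merge"},
}

# static_order() raises graphlib.CycleError if the graph contains a cycle.
order = list(TopologicalSorter(workflow).static_order())
```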


Pub/Sub is the Backbone

Provides:


Kafka Setup (Exact Commands)

Create topics

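# Create three topics (photos-0, photos-1, photos-2), each with 10 partitions
# and a replication factor of 3 (this requires at least three brokers).
for x in 0 1 2; do
  kubectl run kafka --image=solsson/kafka:0.11.0.0 --rm --attach --command -- \
    ./bin/kafka-topics.sh --create --zookeeper kafka-service-zookeeper:2181 \
      --replication-factor 3 --partitions 10 --topic "photos-$x"
done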

Produce messages

# Start an interactive console producer; each line typed is sent to photos-1.
kubectl run kafka-producer --image=solsson/kafka:0.11.0.0 --rm -it --command -- \
    ./bin/kafka-console-producer.sh --broker-list kafka-service-kafka:9092 \
    --topic photos-1

Consume messages

kubectl run kafka-consumer --image=solsson/kafka:0.11.0.0 --rm -it --command -- \
    ./bin/kafka-console-consumer.sh --bootstrap-server kafka-service-kafka:9092 \
    --topic photos-1 \
    --from-beginning

Real Example: User Signup Workflow

Flow:

  1. User signup → verification email
  2. After verification:
    • Copier → welcome email + notifications
  3. Notifications:
    • Filter → email/SMS preferences

Combines: copier (fan-out after verification) and filter (notification preferences).
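The signup flow above can be sketched by composing a copier step with a filter step. All names here (`fan_out_verified`, `allowed_channels`, the `prefs` field) are hypothetical, and plain lists stand in for the real queues.

```python
def fan_out_verified(user, welcome_queue, notification_queue):
    """Copier: send the verified user to both downstream queues."""
    welcome_queue.append(user)
    notification_queue.append(user)


def allowed_channels(user):
    """Filter: keep only the channels the user has opted into."""
    return [c for c in ("email", "sms") if user["prefs"].get(c, False)]


welcome_queue, notification_queue = [], []
user = {"name": "ana", "prefs": {"email": True, "sms": False}}
fan_out_verified(user, welcome_queue, notification_queue)
channels = [allowed_channels(u) for u in notification_queue]
```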


Real-World Problems


1. Uneven Work (Stragglers)

Solution: Work Stealing
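Work stealing can be sketched with one deque per worker: a worker takes from the front of its own queue, and when that is empty it steals from the back of the longest queue. This is a single-threaded illustration under assumed names (`next_item`). A real implementation would also need locking or a concurrent queue.

```python
from collections import deque


def next_item(worker_id, queues):
    """Return the next item for a worker, stealing if its own queue is empty."""
    own = queues[worker_id]
    if own:
        return own.popleft()
    victim = max(queues, key=len)  # the busiest worker's queue
    if victim:
        return victim.pop()  # steal from the opposite end
    return None  # no work anywhere


# Worker 0 is idle while worker 1 has a backlog.
queues = [deque(), deque(["a", "b", "c"])]
stolen = next_item(0, queues)
```

Stealing from the opposite end from the one the owner consumes keeps the two workers from contending for the same item.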


2. Worker Failures

Must ensure:


3. Poison Messages

Solution:
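One common remedy for poison messages is to cap the number of attempts per item and move repeat offenders to a dead-letter list, so they stop blocking the queue. The sketch below is an assumption about how that could look. The names `drain` and `max_attempts` and the sample handler are hypothetical.

```python
from collections import deque


def drain(queue, handler, max_attempts=3):
    """Process (item, attempts) pairs; return the items that kept failing."""
    dead_letters = []
    while queue:
        item, attempts = queue.popleft()
        try:
            handler(item)
        except Exception:
            if attempts + 1 >= max_attempts:
                dead_letters.append(item)  # give up, set aside for inspection
            else:
                queue.append((item, attempts + 1))  # retry later
    return dead_letters


calls = []


def handler(item):
    calls.append(item)
    if item == "bad":
        raise ValueError("cannot parse item")


dead = drain(deque([("ok", 0), ("bad", 0)]), handler)
```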


4. Backlog Problem

Solution: Priority queues
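A priority queue lets urgent items jump ahead of a backlog. The sketch below uses the standard `heapq` module, with a counter as a tie-breaker so items of equal priority stay in arrival order. The class name `PriorityWorkQueue` and the priority values are hypothetical. Lower numbers are served first.

```python
import heapq
import itertools


class PriorityWorkQueue:
    """Work queue that serves the lowest priority number first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # keeps FIFO order within a priority

    def put(self, item, priority):
        heapq.heappush(self._heap, (priority, next(self._counter), item))

    def get(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


q = PriorityWorkQueue()
q.put("backlog-1", priority=5)
q.put("backlog-2", priority=5)
q.put("urgent", priority=0)
first = q.get()
```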


Key Insights


Trade-offs

Pros

Cons


One-line Summary

Event-driven batch systems connect multiple work queues into workflows using patterns like copier, filter, splitter, sharder, and merger, enabling scalable and reliable data processing pipelines.

#Distributed Systems #System Design #Event-Driven #Batch Processing #Kafka #Workflows