Aman Goyal


Coordinated Batch Processing: Combining Parallel Work with Join and Reduce

Core Concept

Think:

“Do work in parallel → then coordinate + combine results”


Why Coordination Is Needed

Sharding lets many workers process data independently and in parallel.

But:

  • Each worker produces only a partial result
  • Something must know when all the work is done, and how to merge the pieces

Need coordination primitives to wait for and combine parallel outputs


Core Patterns


1. Join (Barrier Synchronization)

What it does:

  • Waits until EVERY parallel worker has finished before the next stage runs (a barrier)

Example:

  • Blur the license plate in every image of a batch; only when all images are done does the next stage start

Key properties:

  • Output is complete — every input is accounted for
  • Nothing downstream starts early

Trade-off:

  • The whole batch waits for the slowest worker (straggler), limiting parallelism
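A minimal sketch of a join barrier using Python's standard library. Each worker handles one shard, and the next stage runs only after every shard has finished; `process_shard` and the data are illustrative stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor

def process_shard(shard):
    # Stand-in for real per-shard work (e.g. blurring one image).
    return [item * 2 for item in shard]

shards = [[1, 2], [3, 4], [5, 6]]

with ThreadPoolExecutor() as pool:
    futures = [pool.submit(process_shard, s) for s in shards]
    # The join: block until ALL shards are done before moving on.
    results = [f.result() for f in futures]

# Only now does the next stage see the complete output.
complete = [x for shard in results for x in shard]
print(complete)  # [2, 4, 6, 8, 10, 12]
```

Note that `complete` is available only after the slowest shard finishes — exactly the straggler trade-off described above.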


2. Reduce (VERY IMPORTANT)

What it does:

  • Merges worker outputs incrementally as they arrive, folding partial results into one aggregate
  • Does NOT wait for all data before starting


Example: Word Count

Input:

a: 50
the: 17
a: 30
the: 25

Output:

a: 80
the: 42

Key insight:

  • Reduce can start merging the moment the first outputs arrive — no barrier, no waiting on stragglers
  • Faster than join

Join vs Reduce

Feature              Join           Reduce
Waits for all data   Yes            No
Parallelism          Low            High
Use case             Completeness   Aggregation

3. Sum (special case of Reduce)

Example:

(Seattle, 4M) + (Town, 25k) → (Combined, 4.025M)
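As a sketch, the sum reduce is just a two-argument merge over (name, population) pairs; the names are illustrative.

```python
def sum_pair(a, b):
    # Merge two (name, population) pairs by adding populations.
    return ("Combined", a[1] + b[1])

print(sum_pair(("Seattle", 4_000_000), ("Town", 25_000)))
# ('Combined', 4025000)
```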

4. Histogram (advanced Reduce)

Example:

0 kids: 15%
1 kid: 25%

Combine using:

  • A weighted average — weight each histogram's percentages by the number of people it summarizes
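A minimal sketch of combining two percentage histograms by weighted average; the function name, bucket values, and sample sizes are illustrative assumptions.

```python
def combine_histograms(h1, n1, h2, n2):
    # Weight each histogram's percentages by its sample size,
    # so the big city dominates the small town proportionally.
    total = n1 + n2
    keys = set(h1) | set(h2)
    return {k: (h1.get(k, 0) * n1 + h2.get(k, 0) * n2) / total for k in keys}

city = {0: 15, 1: 25, 2: 60}   # percentages from a 4M-person city
town = {0: 10, 1: 30, 2: 60}   # percentages from a 25k-person town
combined = combine_histograms(city, 4_000_000, town, 25_000)
# combined[0] is just under 15: the tiny town barely moves the average
```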


Key Insight

Reduce is composable → can run multiple times in parallel

[Many outputs] → [Partial reduces] → [Final result]
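The composability property above can be demonstrated directly: reducing in parallel groups and then reducing the partial results gives the same answer as one big reduce. The sample counts are illustrative.

```python
from collections import Counter
from functools import reduce

outputs = [Counter(a=5), Counter(a=3, the=2), Counter(the=4), Counter(a=1)]

# Partial reduces (these could run on different machines):
partial1 = reduce(lambda x, y: x + y, outputs[:2])
partial2 = reduce(lambda x, y: x + y, outputs[2:])

# Final reduce over partial results — same answer as reducing
# everything at once, which is what makes reduce composable.
final = partial1 + partial2
print(dict(final))  # {'a': 9, 'the': 6}
```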

Real Pipeline Example

Image Processing Workflow

Steps:

  1. Shard
    • Distribute images across workers
  2. Multiworker
    • Detect plate
    • Blur plate
  3. Join
    • Wait for ALL images processed
  4. Copier
    • Duplicate work:
      • Delete originals
      • Analyze vehicles
  5. Shard again
    • Distribute analysis
  6. Reduce
    • Aggregate:
      • Vehicle counts
      • Colors

Example Output

{
  "vehicles": {
    "car": 12,
    "truck": 7,
    "motorcycle": 4
  },
  "colors": {
    "white": 8,
    "black": 3,
    "blue": 6,
    "red": 6
  }
}
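The final reduce step of this pipeline can be sketched as follows: assume (hypothetically) each analysis worker emits one (vehicle_type, color) record per image, and the reduce folds the records into combined counts shaped like the output above.

```python
from collections import Counter

analyses = [("car", "white"), ("truck", "black"),
            ("car", "blue"), ("motorcycle", "red")]

vehicles, colors = Counter(), Counter()
for vehicle, color in analyses:  # merge one record at a time as it arrives
    vehicles[vehicle] += 1
    colors[color] += 1

result = {"vehicles": dict(vehicles), "colors": dict(colors)}
```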

Trade-offs

Join

  • Guarantees a complete result, but the whole stage blocks on the slowest shard

Reduce

  • Highly parallel and incremental, but the combine function must be composable (mergeable in any order or grouping)


Mental Model

[ Sharded Work ]
       ↓
   (Join OR Reduce)
       ↓
[ Final Output ]

Key Insights

  • Join buys completeness; reduce buys parallelism
  • Reduce works because combining is composable — partial reduces can themselves be reduced
  • Real pipelines mix both: join when the next stage needs every input, reduce when it only needs an aggregate

One-line Summary

Coordinated batch processing introduces join (wait for all) and reduce (incremental aggregation) patterns to safely combine parallel work into final results.

#Distributed Systems #System Design #Batch Processing #MapReduce #Aggregation