System Design Deep Dive

Gossip Protocol & Distributed Cache

In the context of · Rate Limiting Architecture · production-grade nuances

Gossip Protocol (aka Epidemic Protocol) is a peer-to-peer communication model where nodes periodically exchange state with a random subset of peers — mimicking how rumors spread in a social network.

How It Works

T=0 · Origination

Node N1 gets new state (e.g., counter=42 for client X). It marks this with a vector clock or timestamp.

T=1 · Fan-out Round 1

N1 picks k=2 random peers (N3, N7). Sends a digest of its state. Peers merge if the version is newer.

T=2 · Fan-out Round 2

N3 and N7 now know. Each independently picks 2 more peers. After O(log N) rounds, all nodes converge.

T=3 · Convergence

Full propagation across 100-node cluster in ~7 rounds. No central coordinator. No SPOF.

Three Gossip Variants

SI Model · Push

Susceptible → Infected. Node sends its full state to peers. Simple, but generates redundant traffic once most nodes are already infected.

SIS Model · Pull

Nodes periodically pull state from peers. Better for reconciliation but adds latency. Used in anti-entropy repair.

SIR Model · Push-Pull

Hybrid: push new state, pull diffs. Most efficient — used in Cassandra, Consul, DynamoDB. Eliminates redundant pushes via Removed state.

Gossip in Rate Limiting Context

In distributed rate limiting, each node tracks counters locally. The challenge: how do you share counter state across nodes without a central bottleneck? Gossip is the answer.

Gossip propagation across rate-limiter cluster:

RL-Node-1: count=38
RL-Node-2: count=41
RL-Node-3: count=35
RL-Node-4: count=40

After gossip rounds → each node knows: global_count ≈ 154 → enforce limit=150 → DENY

Key Properties

Convergence Guarantee

Information reaches all N nodes in O(log N) gossip rounds. With N=1000 and round interval=200ms, full convergence in ~2 seconds.

Fault Tolerance

Node failures don't stop propagation. A message will route around failed nodes in the next gossip round. No leader election needed.

Eventual Consistency

Nodes may have stale data during propagation window. For rate limiting: tolerable if you accept a small overage window (soft limits).

Gossip Interval Tradeoff

Short interval (50ms) → fresh data, high bandwidth. Long interval (500ms) → stale data, low bandwidth. Tune per SLA.

Merge Logic (CRDT)

Gossip alone isn't enough — you need a conflict-free merge strategy. Rate limiting uses G-Counters (Grow-only CRDTs):

type GCounter struct {
    Counts map[string]int64  // nodeID → local count (must be initialized)
}

func (g *GCounter) Merge(other *GCounter) {
    for nodeID, count := range other.Counts {
        if count > g.Counts[nodeID] {
            g.Counts[nodeID] = count  // take per-node max → no conflicts
        }
    }
}

func (g *GCounter) Value() int64 {
    var total int64
    for _, v := range g.Counts {
        total += v
    }
    return total  // global count = sum of all per-node counters
}

type RateLimiter struct {
    counter *GCounter  // in practice, one G-Counter per clientID
    limit   int64
}

// On incoming request at node "n1" (production code also needs a mutex:
// Go map writes are not safe under concurrent access):
func (rl *RateLimiter) Allow(clientID string) bool {
    rl.counter.Counts["n1"]++               // local increment
    return rl.counter.Value() <= rl.limit   // check global sum
}
G-Counters are idempotent — receiving the same gossip message twice doesn't corrupt state. This is critical for unreliable networks where duplicates occur.
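A quick demonstration of that idempotence, using a minimal self-contained copy of the G-Counter above: merging the same gossip message twice leaves the value unchanged.

```go
package main

import "fmt"

// Minimal G-Counter mirroring the struct above.
type GCounter struct{ Counts map[string]int64 }

func (g *GCounter) Merge(other *GCounter) {
	for id, c := range other.Counts {
		if c > g.Counts[id] {
			g.Counts[id] = c // per-node max
		}
	}
}

func (g *GCounter) Value() int64 {
	var t int64
	for _, v := range g.Counts {
		t += v
	}
	return t
}

func main() {
	local := &GCounter{Counts: map[string]int64{"n1": 38, "n2": 41}}
	msg := &GCounter{Counts: map[string]int64{"n2": 45, "n3": 35}}

	local.Merge(msg)
	once := local.Value() // n1=38, n2=max(41,45)=45, n3=35 → 118
	local.Merge(msg)      // duplicate delivery over an unreliable network
	twice := local.Value()

	fmt.Println(once, twice, once == twice) // → 118 118 true
}
```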

A distributed cache in the rate limiting context is a shared, low-latency store that multiple rate-limiter nodes read/write to maintain globally consistent counters — replacing local-only or gossip-propagated state with a centralized source of truth.

Architecture Patterns

Pattern A · Redis Cluster (Centralized)

All RL nodes talk to a Redis cluster. Atomic INCR + EXPIRE or Lua scripts for sliding windows. Strong consistency. If the cache cluster goes down, rate limiting goes down with it.


Tradeoffs: strong consistency · SPOF risk · simple

Pattern B · Local Cache + Sync (Hybrid)

Each node has a local cache (HashMap). Gossip or async sync merges state. Gossip interval = consistency window. Used by Cloudflare's rate limiter.


Tradeoffs: high performance · resilient · eventual consistency

Pattern C · Sharded Cache (Token Bucket)

Consistent hash → route client to specific cache shard. Each shard owns its clients fully. Best isolation, harder rebalancing on scale.


Tradeoffs: no hot keys · scalable · rebalance cost

Pattern D · Two-Layer Cache

L1: in-process LRU (sub-ms reads). L2: Redis or Memcached (ms reads). L1 absorbs 90%+ traffic. L2 is source of truth. Write-through or write-back.


Tradeoffs: ultra fast · complex · production-grade

Redis Rate Limit Patterns

-- Fixed Window Counter (Lua, atomic)
-- KEYS[1] = "rl:" .. clientID .. ":" .. window   e.g. rl:user123:202403191400
-- ARGV[1] = limit, ARGV[2] = window size in seconds
local count = redis.call('INCR', KEYS[1])
if count == 1 then
    redis.call('EXPIRE', KEYS[1], tonumber(ARGV[2]))  -- TTL = window size
end
return count <= tonumber(ARGV[1]) and 1 or 0          -- 1 = allowed, 0 = denied

-- Sliding Window Log (ZSET)
-- KEYS[1] = "rl:" .. clientID
local now    = tonumber(ARGV[1])          -- current timestamp, ms
local window = tonumber(ARGV[2])          -- e.g. 60000 ms
local limit  = tonumber(ARGV[3])

redis.call('ZREMRANGEBYSCORE', KEYS[1], 0, now - window)
local count = redis.call('ZCARD', KEYS[1])
if count < limit then
    redis.call('ZADD', KEYS[1], now, now)  -- score=timestamp, member=timestamp
    redis.call('PEXPIRE', KEYS[1], window) -- TTL in ms, avoids fractional seconds
    return 1                               -- allowed
end
return 0                                   -- denied

Cache Failure Modes (Critical for Interviews)

These failure modes are what Staff-level interviewers probe. Know them cold.
Failure Mode: Cache Stampede
  Cause: A key expires and many nodes regenerate it simultaneously
  Impact on rate limiting: All nodes bypass RL, spike to backend
  Mitigation: Probabilistic early expiry, lock-based regeneration, local fallback

Failure Mode: Hot Key Problem
  Cause: Popular clientID → all traffic to one shard
  Impact on rate limiting: Shard overload, high latency, drops
  Mitigation: Key sharding (clientID:shard_N), local cache for hot keys

Failure Mode: Network Partition
  Cause: RL nodes can't reach Redis
  Impact on rate limiting: Fail open (allow all) vs fail closed (deny all)
  Mitigation: Circuit breaker + local fallback counter, configurable fail policy

Failure Mode: Thundering Herd
  Cause: Redis restart or failover
  Impact on rate limiting: Mass counter reset → clients exceed limits temporarily
  Mitigation: Sentinel/Cluster HA, warm-up period, shadow counters

Failure Mode: Clock Skew
  Cause: Nodes have different system clocks
  Impact on rate limiting: Window boundaries misaligned, counter leakage
  Mitigation: NTP sync, use Redis server time (TIME command), logical clocks

Failure Mode: Stale Read
  Cause: Reading from a replica during replication lag
  Impact on rate limiting: Counter looks lower than reality → allow over-limit
  Mitigation: Always write/read from primary, or use quorum reads

Consistent Hashing for Cache Sharding

To shard rate-limit counters across cache nodes without global coordination:

import (
    "hash/crc32"
    "sort"
)

type HashRing struct {
    replicas int                 // virtual nodes per physical node
    ring     map[uint32]string   // hash → nodeID
    sorted   []uint32            // ring positions, ascending
}

func (h *HashRing) GetNode(clientID string) string {
    hash := crc32.ChecksumIEEE([]byte(clientID))
    // find first ring position >= hash (clockwise)
    idx := sort.Search(len(h.sorted), func(i int) bool {
        return h.sorted[i] >= hash
    })
    if idx == len(h.sorted) {
        idx = 0 // wrap around
    }
    return h.ring[h.sorted[idx]]
}

// clientID "user:123" always routes to same cache node
// Node failure → only that node's keys re-routed, not all
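The ring above omits construction. A self-contained sketch with an assumed `AddNode` that inserts virtual replicas per physical node (the `replicas` field exists for exactly this), repeating the lookup for completeness:

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
)

type HashRing struct {
	replicas int
	ring     map[uint32]string
	sorted   []uint32
}

func NewHashRing(replicas int) *HashRing {
	return &HashRing{replicas: replicas, ring: map[uint32]string{}}
}

// AddNode inserts `replicas` virtual points for one physical node so that
// keys spread evenly and a node failure re-routes only its own arcs.
func (h *HashRing) AddNode(nodeID string) {
	for i := 0; i < h.replicas; i++ {
		pos := crc32.ChecksumIEEE([]byte(fmt.Sprintf("%s#%d", nodeID, i)))
		h.ring[pos] = nodeID
		h.sorted = append(h.sorted, pos)
	}
	sort.Slice(h.sorted, func(a, b int) bool { return h.sorted[a] < h.sorted[b] })
}

func (h *HashRing) GetNode(clientID string) string {
	hash := crc32.ChecksumIEEE([]byte(clientID))
	idx := sort.Search(len(h.sorted), func(i int) bool { return h.sorted[i] >= hash })
	if idx == len(h.sorted) {
		idx = 0 // wrap around
	}
	return h.ring[h.sorted[idx]]
}

func main() {
	r := NewHashRing(100)
	r.AddNode("cache-1")
	r.AddNode("cache-2")
	r.AddNode("cache-3")
	fmt.Println(r.GetNode("user:123")) // deterministic: same node on every run
}
```

100 virtual replicas per node is a common choice; fewer gives lumpier key distribution, more costs memory and lookup-table size.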

Rate limiting is the enforcement of a maximum request count per client per time window. The core challenge at scale: counters must be globally consistent, low-latency, and fault-tolerant all at once, and under CAP you cannot fully have all three, so every design picks a tradeoff.

Algorithm Comparison

Fixed Window
  Memory: O(1) · Accuracy: poor (boundary spike) · Burst handling: double burst at window boundary · Distributed fit: easy (INCR/EXPIRE)

Sliding Window Log
  Memory: O(requests) · Accuracy: exact · Burst handling: precise · Distributed fit: heavy (ZSET per user)

Sliding Window Counter
  Memory: O(1) · Accuracy: ~1% error · Burst handling: good approximation · Distributed fit: good (2 counters)

Token Bucket
  Memory: O(1) · Accuracy: good · Burst handling: allows bursts up to bucket size · Distributed fit: needs sync for refill

Leaky Bucket
  Memory: O(1) · Accuracy: good · Burst handling: smooths bursts · Distributed fit: queue-based, stateful

System Architecture · Full Stack

Request flow through the distributed rate limiter:

Client
  → API Gateway (Nginx / Envoy)
  → RL Sidecar (gRPC), backed by a Local Cache (L1, in-process)
  → Redis Cluster (L2 cache)
  ↕ gossip / async sync keeps the RL nodes' local caches aligned

Supporting services: Config Store (etcd / Consul) · Metrics (Prometheus)

Sliding Window Counter (Most Practical)

func (rl *SlidingWindowRL) Allow(ctx context.Context, clientID string) (bool, error) {
    now := time.Now()
    currWindow := now.Truncate(rl.windowSize).Unix()
    prevWindow := currWindow - int64(rl.windowSize.Seconds())

    currKey := fmt.Sprintf("rl:%s:%d", clientID, currWindow)
    prevKey := fmt.Sprintf("rl:%s:%d", clientID, prevWindow)

    pipe := rl.redis.Pipeline()
    incrCmd := pipe.Incr(ctx, currKey)
    prevCmd := pipe.Get(ctx, prevKey)
    pipe.Expire(ctx, currKey, rl.windowSize*2)
    if _, err := pipe.Exec(ctx); err != nil && err != redis.Nil {
        return false, err // Redis unreachable: apply your fail-open/closed policy
    }

    curr := incrCmd.Val()
    prev, _ := strconv.ParseInt(prevCmd.Val(), 10, 64) // missing key → 0

    // weighted count: prev_count × (1 − elapsed_fraction) + curr_count
    elapsed := float64(now.UnixNano()-currWindow*1e9) / float64(rl.windowSize)
    estimated := float64(prev)*(1-elapsed) + float64(curr)

    return estimated <= float64(rl.limit), nil
}

The Fail-Open vs Fail-Closed Decision

Fail Open (Allow)

When cache is unreachable, allow all requests. Protects user experience. Risk: traffic spikes hit backend unprotected. Used for non-critical APIs or when DDoS protection is upstream.

Fail Closed (Deny)

When cache is unreachable, deny requests. Protects backend at cost of availability. Used for billing APIs, quota-critical endpoints, or security-sensitive paths.

Staff-level insight: Most production systems use a local fallback counter — when Redis is unreachable, fall back to in-process counter with a degraded (lower) limit for the partition duration. This is neither pure fail-open nor fail-closed.

Token Bucket with Distributed Refill

type TokenBucket struct {
    capacity   int64
    refillRate float64 // tokens per second
}

var tokenBucketScript = redis.NewScript(`
  -- KEYS[1] = token count, KEYS[2] = last-refill timestamp
  -- ARGV[1] = capacity, ARGV[2] = refill rate (tokens/sec), ARGV[3] = now (seconds)
  local cap    = tonumber(ARGV[1])
  local rate   = tonumber(ARGV[2])
  local now    = tonumber(ARGV[3])
  local tokens = tonumber(redis.call('GET', KEYS[1]) or cap)  -- missing → full bucket
  local last   = tonumber(redis.call('GET', KEYS[2]) or now)  -- missing → no elapsed time

  -- refill tokens based on elapsed time
  local elapsed = math.max(0, now - last)
  tokens = math.min(cap, tokens + elapsed * rate)

  if tokens >= 1 then
    tokens = tokens - 1
    redis.call('SET', KEYS[1], tokens, 'EX', 86400)
    redis.call('SET', KEYS[2], now,    'EX', 86400)
    return 1   -- allowed
  end
  return 0     -- denied
`)

How gossip and distributed cache fit together — and the exact tradeoffs you need to articulate in a Staff-level system design interview.

When to Use Gossip vs. Centralized Cache

Consistency — Gossip-based RL: eventual (100ms–2s lag) · Centralized cache (Redis): strong (atomic ops)
Latency — Gossip: sub-ms (local read) · Redis: 1–5ms (network hop)
Fault tolerance — Gossip: excellent (no SPOF) · Redis: cache failure = outage
Accuracy — Gossip: ±10–20% during propagation · Redis: exact
Throughput — Gossip: unlimited (local reads) · Redis: bounded by the Redis cluster
Best for — Gossip: soft limits, high-scale, geo-distributed · Redis: billing, strict quotas, security
Real-world examples — Gossip: Cloudflare RL, Riak, Cassandra · Redis: Stripe, Shopify, AWS API GW

Hybrid Architecture (Production Pattern)

Tier 1 · Local Counter (Gossip)

Each node maintains a local counter. Gossip sync every 100–500ms. Handles 95%+ of decisions locally. Uses G-Counter CRDT for conflict-free merge.

Tier 2 · Redis (Authoritative)

For strict limits (billing, OAuth tokens), always hit Redis. Use pipeline + Lua for atomicity. Redis Cluster with sentinel for HA.

Limit Partitioning

Split the global limit across nodes. If limit=1000 and 10 nodes: each node enforces 100. Overage possible but bounded. Recalibrate via gossip rounds.
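Recalibration can be as simple as weighting each node's share of the global limit by its observed traffic (the gossiped request counts); a sketch with illustrative numbers:

```go
package main

import "fmt"

// partitionLimit splits a global limit across nodes in proportion to their
// observed (gossiped) request counts. A static split gives limit/N each;
// weighting by share reduces overage when traffic is skewed toward a few
// nodes. Integer division under-allocates slightly, which errs safe.
func partitionLimit(globalLimit int64, observed map[string]int64) map[string]int64 {
	var total int64
	for _, c := range observed {
		total += c
	}
	out := make(map[string]int64, len(observed))
	for node, c := range observed {
		if total == 0 {
			out[node] = globalLimit / int64(len(observed)) // no data yet: even split
			continue
		}
		out[node] = globalLimit * c / total
	}
	return out
}

func main() {
	limits := partitionLimit(1000, map[string]int64{
		"n1": 600, "n2": 300, "n3": 100, // skewed traffic shares
	})
	fmt.Println(limits["n1"], limits["n2"], limits["n3"]) // → 600 300 100
}
```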

Sticky Routing (Alternative)

Route clientID → same RL node via consistent hash at gateway. No cross-node sync needed. Problem: node failure redistributes clients, counters reset.

Production Nuances

Nuance 1 · Gossip Amplification

With k=2 fanout and N=100 nodes, each gossip round generates O(k × N) messages. At 200ms interval: 100 nodes × 2 × 5/sec = 1000 msg/sec. Manageable, but grows with cluster size. Use push-pull to cut redundancy.

Nuance 2 · Counter Accuracy vs. Enforcement

During gossip propagation window, a client could exceed limit across nodes. Accept soft enforcement — allow up to 110% of limit, then deny. Design the product around this: "approximately 100 req/min" not "exactly 100".

Nuance 3 · Key Expiry & Memory

Rate limit counters are ephemeral. Use TTL aggressively. For sliding window: TTL = window_size × 2. Redis keyspace: O(clients × windows). For 1M active clients with a 1-min window that's ~2M keys; at Redis's per-key overhead (closer to 50–100 bytes than the 8-byte value itself), roughly 100–200MB. Still entirely manageable.

Nuance 4 · Geo-Distributed RL

Cross-region gossip adds 50–200ms latency. Options: (a) per-region limits (limit=100 per region), (b) global Redis with CRDT replication, (c) gossip within region + async cross-region sync.

Nuance 5 · Observability

Emit metrics: rl.allowed, rl.denied, rl.counter_value, rl.sync_lag_ms. Alert on sync lag spikes — indicates gossip partition. Trace which node denied vs allowed for debugging.

Interview Answer Framework

When asked "design a distributed rate limiter" — drive the conversation through these 5 axes:
1. REQUIREMENTS CLARIFICATION
    Hard limit (billing) or soft limit (API quota)?
    Per-user, per-IP, or per-API-key?
    Single region or multi-region?
    What's acceptable accuracy? (exact vs ~5% overage)
    What's target latency budget for RL check? (<1ms? <5ms?)

2. ALGORITHM CHOICE
    Soft limit at scale → Sliding Window Counter + Gossip
    Hard limit, low scale → Token Bucket + Redis
    Smooth traffic → Leaky Bucket

3. DATA LAYER
    Redis Cluster for strong consistency
    Local CRDT + Gossip for eventual consistency
    Two-tier: local L1 + Redis L2

4. FAILURE MODES
    Redis down: fail-open or local fallback?
    Network partition: accept stale reads?
    Node crash: counter loss → sticky routing helps

5. SCALING & OPERATIONS
    Consistent hashing for shard assignment
    Gossip for node membership (like Cassandra)
    Metrics, alerting, sync lag monitoring
