What is listmonk?
listmonk is a high-performance, self-hosted newsletter and mailing list manager. It ships as a single Go binary with an embedded Vue.js frontend, backed by PostgreSQL. It has demonstrated production workloads of 7+ million emails per campaign with peak RAM of ~57MB and fractional CPU usage. The project has 19k+ GitHub stars and is built by Kailash Nadh (CTO of Zerodha, India's largest stock broker).
Why Study This for System Design?
Tech Stack
| Layer | Technology | Role |
|---|---|---|
| Language | Go | Backend, CLI, campaign engine |
| Web Framework | labstack/echo v4 | HTTP routing, middleware |
| Database | PostgreSQL | All persistent state, JSONB attributes |
| DB Driver | jmoiron/sqlx + lib/pq | SQL execution, struct scanning |
| SQL Management | knadh/goyesql | Named SQL queries from .sql files |
| Config | knadh/koanf | Multi-source: TOML → env vars → DB |
| Frontend | Vue.js 3 + Buefy | Admin SPA dashboard |
| Asset Embed | knadh/stuffbin | Embed frontend/SQL/i18n into binary |
| Auth | Sessions + OIDC + RBAC | Cookie sessions, SSO, role perms |
| Templating | html/template + Sprig | Dynamic email templates, 100+ funcs |
System Architecture Diagram
Layered Architecture
listmonk follows a clean 4-layer architecture. Each layer has a single responsibility and communicates only with adjacent layers.
Layer 1: HTTP Handlers (cmd/*.go)
Thin handlers that parse HTTP requests (path params, query strings, JSON bodies), call the Core layer, and serialize responses. No business logic here. Each handler is a method on the App struct which holds references to all subsystems. Echo framework provides routing, middleware, and context.
Layer 2: Core Business Logic (internal/core/)
All domain operations: CRUD for subscribers, lists, campaigns, templates. The Core struct wraps the DB and query runner. This layer is pure Go with zero HTTP dependencies — it could be called from CLI, tests, or any other interface. It enforces validation, permission checks, and domain invariants.
Layer 3: Campaign Manager (internal/manager/)
The concurrent campaign processing engine. Runs as a long-lived goroutine that polls the DB for active campaigns. It owns the entire send pipeline: batch fetching, template rendering, rate limiting, worker pool dispatch, progress tracking, error handling. Completely rewritten in v3.0.0 for near-instant pause/cancel and lossless counting.
Layer 4: Data Layer (queries/*.sql + PostgreSQL)
All SQL lives in .sql files, loaded at startup via goyesql. No ORM. This gives full control over query optimization, PostgreSQL-specific features (JSONB, arrays, materialized views), and makes SQL reviewable and versionable. The sqlx library provides struct scanning.
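The named-query convention can be sketched with a tiny hand-rolled parser (illustrative only: knadh/goyesql's actual API differs, and sqlx would execute the resulting strings):

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseNamedQueries scans a .sql file's contents for "-- name: <tag>"
// markers and returns a map of tag to SQL text. This mirrors the idea
// behind knadh/goyesql; it is not the library's real API.
func parseNamedQueries(src string) map[string]string {
	queries := map[string]string{}
	var name string
	var buf strings.Builder
	flush := func() {
		if name != "" {
			queries[name] = strings.TrimSpace(buf.String())
		}
		buf.Reset()
	}
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "-- name:") {
			flush() // close out the previous query, if any
			name = strings.TrimSpace(strings.TrimPrefix(line, "-- name:"))
			continue
		}
		buf.WriteString(line + "\n")
	}
	flush()
	return queries
}

func main() {
	src := `-- name: get-subscriber
SELECT * FROM subscribers WHERE id = $1;

-- name: get-campaign-subscribers
SELECT id, email FROM subscribers WHERE id > $1 ORDER BY id LIMIT $2;`
	q := parseNamedQueries(src)
	fmt.Println(q["get-subscriber"])
}
```

At startup the resulting map is what gets bound to prepared statements; the SQL stays reviewable in plain files.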
Key Architectural Decision: The App Struct
type App struct {
core *core.Core // Business logic layer
fs stuffbin.FileSystem // Embedded filesystem (assets, SQL, i18n)
db *sqlx.DB // PostgreSQL connection pool
queries *models.Queries // Pre-loaded named SQL statements
constants *constants // Runtime config snapshot
manager *manager.Manager // Campaign processing engine
importer *subimporter.Importer // CSV/bulk import processor
notifs *notifs.Notifs // Admin email notifications
i18n *i18n.I18n // Internationalization
bounceProc *bounce.Manager // Bounce email processor
captcha *captcha.Captcha // ALTCHA/hCaptcha
auth *auth.Auth // Sessions + RBAC + OIDC
events *events.Events // SSE event bus
paginator *paginator.Paginator // Cursor/offset pagination
log *log.Logger
}
This is Go's idiomatic alternative to dependency injection containers. All subsystems are initialized in init.go and wired together via this struct. Handlers access them via a.core, a.manager, etc.
Database Schema (ER Diagram)
PostgreSQL schema with ~12 tables, JSONB for extensibility, and materialized views for analytics.
Key Database Design Decisions
1. Custom ENUM Types (12 types)
PostgreSQL ENUMs enforce valid states at the DB level: campaign_status has 6 states (draft, scheduled, running, paused, cancelled, finished), subscriber_status has 3 (enabled, disabled, blocklisted), and subscription_status tracks the per-list relationship (unconfirmed, confirmed, unsubscribed). This is a state machine enforced by the database, not application code.
2. JSONB for Flexible Attributes
subscribers.attribs stores arbitrary key-value data as JSONB, enabling schema-less subscriber segmentation. You can query attribs->>'city' = 'Atlanta' with GIN indexes. The settings table stores all app config as JSONB key-value pairs, enabling runtime configuration changes through the UI without schema migrations.
3. Junction Table Pattern (subscriber_lists)
Many-to-many with a composite primary key PK(subscriber_id, list_id). The junction table carries its own state (subscription_status) and metadata (meta JSONB). Separate indexes on each FK column for efficient queries in both directions.
4. Keyset Pagination Columns
campaigns.last_subscriber_id and max_subscriber_id enable cursor-based pagination for campaign sending. Instead of OFFSET N (an O(N) scan), it uses WHERE id > last_subscriber_id ORDER BY id LIMIT batch_size: an index seek whose cost is independent of how far into the table the cursor is. Critical for sending campaigns to millions of subscribers.
5. Materialized Views for Dashboard
Three materialized views precompute expensive aggregations: mat_dashboard_counts (subscriber/list/campaign totals), mat_dashboard_charts (30-day click/view trends), mat_list_subscriber_stats (per-list subscriber counts by status). Refreshed on cron schedule via REFRESH MATERIALIZED VIEW CONCURRENTLY. Orders-of-magnitude speedup for large databases.
6. Soft References & Denormalization
campaign_views.subscriber_id is nullable with ON DELETE SET NULL — if a subscriber is deleted, their view records persist for analytics. campaign_lists.list_name denormalizes the list name so campaign history survives list deletion. Preserves historical accuracy while allowing entity cleanup.
7. Indexing Strategy
Strategic indexes on: email (case-insensitive unique via LOWER(email)), status columns (for filtered scans), composite (id, status) for campaign batch fetching, DATE(created_at) expression indexes for time-series analytics. Partial unique index on templates(is_default) WHERE is_default = true ensures only one default template.
8. UUID + Serial ID Dual Identity
Internal operations use fast integer serial IDs for joins and pagination. External/public-facing operations use UUIDs (subscriber unsubscribe links, campaign archives, media references). Best of both worlds: performance internally, security externally (UUIDs aren't guessable).
Campaign Processing Pipeline
The campaign engine is the heart of listmonk. Rewritten in v3.0.0 for lossless operation:
Concurrency Model Deep Dive
Producer-Consumer with Channels
// Simplified mental model of the campaign engine.
// Producer: fetches subscriber batches from the DB via keyset pagination.
go func() {
    for {
        batch := db.Query("SELECT ... WHERE id > $1 ORDER BY id LIMIT $2",
            lastSubID, batchSize)
        if len(batch) == 0 {
            close(msgChan) // All subscribers processed; workers drain and exit.
            return
        }
        for _, sub := range batch {
            msg := renderTemplate(campaign, sub)
            msgChan <- msg // Send to worker pool (blocks when buffer is full).
        }
        lastSubID = batch[len(batch)-1].ID
        updateProgress(campaign, lastSubID) // Checkpoint for crash recovery.
    }
}()
// Consumer pool: N goroutines sending messages
for i := 0; i < concurrency; i++ {
go func() {
for msg := range msgChan {
rateLimiter.Wait() // Token bucket
err := messenger.Push(msg) // SMTP/HTTP
if err != nil {
handleRetry(msg, err)
}
atomic.AddInt64(&sent, 1)
}
}()
}
Rate Limiting Strategies
| Strategy | Config | Behavior |
|---|---|---|
| Fixed Rate | app.message_rate = 10 | Max 10 msgs/second globally. Token bucket. |
| Sliding Window | sliding_window + duration + rate | Max N messages within rolling time window (e.g., 10000/hour). |
| Per-SMTP Limits | smtp[].max_conns = 10 | Connection pool per SMTP server. Backpressure via channel blocking. |
| Error Threshold | app.max_send_errors = 1000 | Auto-pause campaign after N cumulative send failures. |
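The fixed-rate strategy boils down to a token bucket. A minimal sketch (not listmonk's implementation; tokenBucket and its fields are invented here):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// tokenBucket is a minimal fixed-rate limiter in the spirit of
// app.message_rate: up to `capacity` burst tokens, refilled at `rate`
// tokens per second.
type tokenBucket struct {
	mu       sync.Mutex
	tokens   float64
	capacity float64
	rate     float64 // tokens per second
	last     time.Time
}

func newTokenBucket(rate, capacity float64) *tokenBucket {
	return &tokenBucket{tokens: capacity, capacity: capacity, rate: rate, last: time.Now()}
}

// allow reports whether one more message may be sent right now.
func (b *tokenBucket) allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	now := time.Now()
	b.tokens += now.Sub(b.last).Seconds() * b.rate // refill since last call
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.last = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	tb := newTokenBucket(10, 10) // like app.message_rate = 10
	sent := 0
	for i := 0; i < 20; i++ {
		if tb.allow() {
			sent++
		}
	}
	fmt.Println(sent) // burst capacity caps immediate sends at ~10
}
```

A blocking Wait() variant, as the worker pool uses, is the same bucket plus a sleep until the next token is due.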
Crash Recovery & Resumption
The campaign stores last_subscriber_id after each batch. On restart, campaigns with status='running' are automatically resumed from the last checkpoint. The v3.0.0 rewrite ensures every single message is counted (not approximated), making pause/resume lossless. The to_send field is computed at campaign start as the total subscriber count for the target lists.
SMTP Connection Pool
Each configured SMTP server maintains a pool of max_conns persistent TCP connections. Connections are reused across messages (SMTP pipelining). idle_timeout and wait_timeout control connection lifecycle. Multiple SMTP servers can be load-balanced under the default "email" messenger, or targeted individually by naming them.
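A bounded connection pool falls out naturally from a buffered channel. A sketch under the assumption that conn stands in for a persistent SMTP connection:

```go
package main

import "fmt"

// conn is a stub for a persistent SMTP connection.
type conn struct{ id int }

// pool hands out reusable connections via a buffered channel, the same
// bounded-pool idea as max_conns per SMTP server. get blocks when every
// connection is busy, which is exactly the backpressure described above.
type pool struct{ conns chan *conn }

func newPool(n int) *pool {
	p := &pool{conns: make(chan *conn, n)}
	for i := 0; i < n; i++ {
		p.conns <- &conn{id: i}
	}
	return p
}

func (p *pool) get() *conn  { return <-p.conns }
func (p *pool) put(c *conn) { p.conns <- c }

func main() {
	p := newPool(2)
	c1, c2 := p.get(), p.get()
	fmt.Println(len(p.conns)) // 0: pool exhausted; a third get() would block
	p.put(c1)
	p.put(c2)
	fmt.Println(len(p.conns)) // 2: connections returned for reuse
}
```

A real pool would add dial-on-demand, health checks, and the idle_timeout/wait_timeout handling the text mentions.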
Go Patterns & Best Practices
// Messenger interface (Strategy pattern)
type Messenger interface {
Name() string
Push(msg Message) error
Flush() error
Close() error
}
// Implementations: email.Emailer, postback.Postback
type App struct {
core *core.Core
manager *manager.Manager
db *sqlx.DB
// ... all subsystems
}
// Methods: func (a *App) GetSubscribers(...)
-- queries/queries.sql

-- name: get-subscriber
SELECT * FROM subscribers WHERE id = $1;

-- name: get-campaign-subscribers
SELECT ... WHERE id > $1 ORDER BY id LIMIT $2;
internal/
    core/         # Business logic (internal: unimportable outside the module)
    manager/      # Campaign engine
    bounce/       # Bounce processing
    auth/         # Authentication
    media/        # Media storage
    subimporter/  # CSV import
sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, syscall.SIGHUP)
go func() {
<-sigChan
srv.Shutdown(ctx) // HTTP
manager.Close() // Campaigns
db.Close() // Postgres
syscall.Exec(...) // Self-replace
}()
// Precedence (last wins):
// 1. CLI flags (--config, --install)
// 2. TOML file (config.toml)
// 3. Env vars (LISTMONK_*)
// 4. Database settings table
ko.Load(file.Provider("config.toml"),
toml.Parser())
ko.Load(env.Provider("LISTMONK_", ...))
Additional Go Idioms Used
| Pattern | Where | Why |
|---|---|---|
| Functional Options | SMTP config, manager setup | Flexible initialization without constructor explosion |
| Context Propagation | HTTP handlers → Core → DB | Request-scoped deadlines, cancellation, auth info |
| sync.Once | Template compilation caching | Thread-safe lazy initialization of expensive resources |
| atomic Operations | Campaign sent counter | Lock-free concurrent counter updates in worker pool |
| embed.FS (stuffbin) | Static assets, SQL, i18n | Single binary deployment with zero external file deps |
| Error Wrapping | Throughout | fmt.Errorf with %w for error chain inspection |
| Table-Driven Tests | Core package | Declarative test cases with expected inputs/outputs |
| Middleware Chain | Echo middleware | Auth → CORS → logging → rate limit → handler |
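The sync.Once idiom from the table, applied to template compilation (the template text is invented for the example):

```go
package main

import (
	"bytes"
	"fmt"
	"html/template"
	"sync"
)

var (
	tplOnce sync.Once
	tpl     *template.Template
)

// getTemplate compiles the template exactly once, no matter how many
// goroutines race to render first.
func getTemplate() *template.Template {
	tplOnce.Do(func() {
		tpl = template.Must(template.New("email").Parse(
			"Hello {{.Name}}, welcome to {{.List}}!"))
	})
	return tpl
}

func render(name, list string) string {
	var buf bytes.Buffer
	getTemplate().Execute(&buf, map[string]string{"Name": name, "List": list})
	return buf.String()
}

func main() {
	// Concurrent renders all share the single compiled template.
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = render("Ada", "Weekly")
		}()
	}
	wg.Wait()
	fmt.Println(render("Ada", "Weekly"))
}
```

The same shape covers any expensive lazy initialization: the Do closure runs once, and every caller after that gets the cached value.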
Scalability & Performance at 1M+ Scale
Proven Production Numbers
listmonk.app states: "A production instance sending 7+ million emails. CPU usage is a fraction of a single core with peak RAM of 57 MB."
What Makes It Fast?
| Technique | Impact | Details |
|---|---|---|
| Keyset Pagination | Constant-cost batch fetch | WHERE id > cursor instead of OFFSET. An index seek whose cost doesn't grow with position in the dataset. |
| Batch Processing | Amortized DB cost | Default batch_size=1000. One query fetches 1000 subscribers. Reduces round-trips 1000x. |
| Connection Pooling | DB: 25, SMTP: 10/srv | max_open=25, max_idle=25 for Postgres. max_conns per SMTP server. Reuse over create. |
| Materialized Views | Instant dashboards | Pre-aggregated stats. REFRESH CONCURRENTLY allows reads during rebuild. |
| Template Caching | Zero recompile | Templates compiled once at startup, cached in memory. Re-compiled only on update. |
| Goroutine Pool | Bounded concurrency | Fixed pool (default 10 workers). No goroutine leak. Channel backpressure for flow control. |
| Streaming Export | Constant memory | Subscriber export writes CSV rows as they're fetched. No full-dataset buffering. |
| Expression Indexes | DATE() fast | idx_clicks_date ON (TIMEZONE('UTC', created_at)::DATE) avoids per-row function eval. |
| Single Binary | Fast startup | No file I/O for assets. stuffbin serves from memory. Startup in milliseconds. |
| Cache Slow Queries | Configurable | v3+ option: enable/disable query caching with custom cron interval for large DBs. |
Bottlenecks & Scaling Limits
| Bottleneck | Limit | Mitigation |
|---|---|---|
| Single Postgres | ~10M subs before slowdown | Materialized views, cache_slow_queries, read replicas |
| Single Process | No horizontal scaling built-in | Run --passive for read-only replicas. One active sender. |
| SMTP Rate Limits | Provider-imposed (SES: 14/sec) | Sliding window limiter, multiple SMTPs, message_rate config |
| Template Rendering | CPU-bound for complex templates | Keep templates simple. Goroutine pool bounds CPU. |
| link_clicks Table | Can grow to billions of rows | DATE expression index. Consider partitioning at scale. |
What Would You Do to Scale to 100M+ Subscribers?
A great system design interview follow-up.
What Happens When 1 Million Requests Hit listmonk?
listmonk handles 6 fundamentally different request types. Each follows a different hot path through the system. Understanding these flows is critical for system design interviews — an interviewer will ask "walk me through what happens when..." and expect you to trace from TCP accept to DB write to response.
Key Insight: listmonk is NOT a typical CRUD API under load. Campaign sending is a push pipeline (server-initiated, async). Tracking pixels and link clicks are the real high-throughput inbound paths — these get 1M+ hits per campaign.
Flow 1: Tracking Pixel Open (Highest Volume — 1M+ per campaign)
When a subscriber opens an email, their client loads a 1x1 transparent PNG. This is the hottest path in the system.
1. EMAIL CLIENT ─── GET /campaign/{campaignUUID}/{subscriberUUID}/px.png ───▶ ECHO ROUTER
   │  // No auth middleware — public endpoint. No session lookup.
   ▼
2. ECHO ROUTER ─── matches route "/campaign/:campUUID/:subUUID/px.png" ───▶ HANDLER: handleCampaignPixel()
   │  // Parse UUIDs from path params. Validate format (fast regex). No DB lookup yet.
   ▼
3. HANDLER ─── checks privacy.disable_tracking setting ───▶ DECISION POINT
   │
   ├── IF tracking disabled: return the 1x1 PNG immediately. No DB write. O(1).
   │
   ├── IF privacy.individual_tracking = false:
   │     Insert into campaign_views with subscriber_id = NULL (anonymous).
   │     // Still counts the view, but can't attribute it to a subscriber.
   │
   └── IF individual tracking enabled:
   ▼
4. DB INSERT ─── INSERT INTO campaign_views (campaign_id, subscriber_id, created_at) ───▶ POSTGRESQL
   │  // campaign_id resolved from UUID via an indexed campaigns lookup.
   │  // subscriber_id resolved from UUID via an indexed subscribers lookup.
   │  // Two UUID→ID lookups + one INSERT = 3 queries total.
   │  // campaign_views has a BIGSERIAL PK — a write-optimized, append-only table.
   ▼
5. RESPONSE ─── 200 OK, Content-Type: image/png, body: 1x1 transparent PNG (68 bytes) ───▶ EMAIL CLIENT
| At 1M Pixel Requests | What Happens | Bottleneck |
|---|---|---|
| Echo HTTP Server | Goroutine-per-request model. 1M requests = 1M goroutines (but short-lived, ~2KB each). Echo's radix tree router matches in O(1). No middleware overhead on public routes (no auth, no session). | Not a bottleneck — Go HTTP server handles 100k+ req/s on a single core. |
| UUID → ID Lookups | Two indexed lookups: campaigns(uuid) and subscribers(uuid). Both have UNIQUE indexes. B-tree lookup = O(log N). | At 1M req/s this is 2M index lookups/sec. Could become the bottleneck. Fix: cache UUID→ID mapping in-process (Go map with RWMutex, or sync.Map). |
| campaign_views INSERTs | 1M INSERT operations. BIGSERIAL PK = append-only. No index updates except on campaign_id and subscriber_id FK indexes. DATE expression index updated per row. | High write pressure. PostgreSQL can do ~10-50k INSERTs/sec depending on hardware. 1M requests would need: batch inserts, async buffering, or write-ahead table with periodic flush. |
| Connection Pool | 25 connections shared across all goroutines. Each INSERT holds a connection for ~1ms. Effective throughput: ~25,000 INSERTs/sec. | Pool exhaustion at high concurrency. Goroutines block waiting for free connection. Solution: increase max_open, or buffer writes in a Go channel and batch-insert. |
| Response | 68-byte PNG from memory (hardcoded). No disk I/O. No template rendering. Fastest possible response after DB write. | Not a bottleneck. |
Flow 2: Link Click Tracking (High Volume — 100K+ per campaign)
Every link in a campaign email is wrapped. When a subscriber clicks, they hit listmonk first, which records the click then redirects to the actual URL.
1. BROWSER ─── GET /link/{linkUUID}/{campaignUUID}/{subscriberUUID} ───▶ ECHO ROUTER
   │  // Public endpoint. No auth. Three UUIDs in the path.
   ▼
2. HANDLER ─── handleLinkRedirect()
   │
   ├── Resolve link UUID → link record (get the actual URL)
   │     // links table: url TEXT NOT NULL UNIQUE. UUID indexed.
   │
   ├── Resolve campaign UUID → campaign_id
   │
   ├── Resolve subscriber UUID → subscriber_id (if individual tracking is on)
   │
   ├── INSERT INTO link_clicks (campaign_id, link_id, subscriber_id)
   │     // BIGSERIAL PK. Indexes on campaign_id, link_id, subscriber_id, DATE.
   │     // 4 index updates per INSERT. Heavier than campaign_views.
   │
   └── 302 REDIRECT → actual URL
         // User lands on the destination page. The redirect is instant.
| At 1M Click Requests | What Happens | Optimization |
|---|---|---|
| 3 UUID Lookups | Link, campaign, and subscriber UUIDs resolved to integer IDs. Three indexed lookups per request. | Cache link UUID→(id, url) in-process. Links are immutable once created — perfect cache candidate with no invalidation needed. |
| link_clicks INSERT | Heavier than campaign_views: 4 index updates per row (campaign_id, link_id, subscriber_id, DATE expression index). | Batch inserts via buffered channel. COPY command for bulk. Consider unlogged table for click data if durability isn't critical. |
| Redirect Latency | User-facing! The subscriber waits for the 302. DB insert is on the critical path — if DB is slow, user sees delay before reaching destination. | Move INSERT off the critical path: write to in-memory buffer, return 302 immediately, flush to DB async. Accept ~1s data delay for instant redirect. |
| Table Growth | link_clicks grows unboundedly. 1M clicks/campaign × 100 campaigns = 100M rows. DATE expression index helps but table gets large. | Time-based partitioning (monthly). Drop partitions older than retention period. Or archive to cold storage. |
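The in-process cache suggested above, sketched with sync.Map (link, resolveLink, and loadFromDB are illustrative names, not listmonk's):

```go
package main

import (
	"fmt"
	"sync"
)

// link is the resolved record for a wrapped URL.
type link struct {
	id  int
	url string
}

var linkCache sync.Map // uuid string → link

// resolveLink memoizes UUID → (id, url). Links are immutable once
// created, so cached entries never need invalidation. loadFromDB is a
// stand-in for the real indexed SELECT.
func resolveLink(uuid string, loadFromDB func(string) link) link {
	if v, ok := linkCache.Load(uuid); ok {
		return v.(link) // cache hit: no DB round-trip
	}
	l := loadFromDB(uuid)
	linkCache.Store(uuid, l)
	return l
}

func main() {
	dbCalls := 0
	load := func(uuid string) link {
		dbCalls++
		return link{id: 7, url: "https://example.com/sale"}
	}
	first := resolveLink("9f2c-uuid", load)  // misses cache, hits the "DB"
	second := resolveLink("9f2c-uuid", load) // served from cache
	fmt.Println(first.url == second.url, dbCalls)
}
```

Under concurrent misses two goroutines may both load the same UUID once; that duplicate work is harmless here, which is why the simpler sync.Map suffices over singleflight.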
Flow 3: Campaign Send Pipeline (1M Outbound Emails)
This is not a request flow — it's a server-initiated push pipeline. But it's what people mean by "1M requests" in the context of listmonk.
MANAGER GOROUTINE (long-lived, polls DB every ~5s)
   │
   ├── SELECT campaigns WHERE status IN ('running','scheduled')
   │     // status column indexed. Cheap scan — usually 0-5 active campaigns.
   ▼
For each active campaign:

BATCH PRODUCER (one goroutine per campaign)
   │  // Loop until all subscribers are processed:
   │
   ├── SELECT subscribers WHERE id > last_subscriber_id
   │     AND id IN (subscriber_lists WHERE list_id IN campaign_lists)
   │     AND status = 'enabled'
   │     ORDER BY id LIMIT 1000   // ← keyset pagination via the PK index
   │
   │  // For each subscriber in the batch:
   ├── TEMPLATE RENDER ─── Go html/template.Execute()
   │     // Inject: subscriber.name, subscriber.attribs, campaign.subject
   │     // Generate: tracking pixel URL, wrapped link URLs
   │     // Sprig functions available. Compiled template cached (sync.Once).
   │     // CPU cost: ~0.1ms per render for simple templates.
   │
   ├── msgChan <- message   // Buffered channel. Blocks if full (backpressure).
   │
   └── UPDATE campaigns SET last_subscriber_id = ?, sent = ?   // Checkpoint after batch
   ║
   ║  channel (buffer = batch_size)
   ▼
WORKER POOL (N = app.concurrency, default 10 goroutines)
   │  // Each worker loops: for msg := range msgChan { ... }
   │
   ├── RATE LIMITER ─── rateLimiter.Wait()
   │     // Token bucket: app.message_rate tokens/sec
   │     // OR sliding window: app.message_sliding_window_rate per window_duration
   │     // Blocks the goroutine until a token is available. Natural throttle.
   │
   ├── MESSENGER.Push(msg) ─── SMTP connection from pool
   │     // Acquires a connection from the pool (max_conns per SMTP server)
   │     // SMTP: EHLO → MAIL FROM → RCPT TO → DATA → message body → QUIT
   │     // Connection reused for the next message (persistent TCP). Pipelining.
   │     // On failure: retry up to max_msg_retries times.
   │
   ├── atomic.AddInt64(&sent, 1)   // Lock-free counter update
   │
   └── ON ERROR: increment error counter. If errors > max_send_errors → PAUSE campaign
1M Emails — Time & Resource Breakdown
| Phase | Operations | Time @ Defaults | Resource |
|---|---|---|---|
| DB Fetch | 1M / 1000 batch_size = 1000 queries | ~1ms/query × 1000 = ~1 second total | 1 DB connection (sequential per campaign) |
| Template Render | 1M renders × ~0.1ms each | ~100 seconds (single-threaded per campaign) | CPU-bound. ~1 core for simple templates. |
| Rate Limit Wait | 1M msgs ÷ 10 msg/sec | ~27.7 hours at default message_rate=10 | Near-zero (goroutine sleep) |
| SMTP Send | 1M connections (reused) × ~50ms avg | 10 workers × 50ms = ~200 msg/sec → ~83 minutes | 10 SMTP TCP connections |
| Progress Updates | 1000 checkpoint UPDATEs | ~1ms each = ~1 second total | 1 DB connection |
| Memory | 10 goroutines + 1000-msg channel buffer + template cache | — | ~50-60 MB peak (proven at 7M scale) |
With a tuned configuration (concurrency=50, message_rate=500, batch_size=5000, max_conns=20 per SMTP × 3 SMTP servers = 60 total connections), estimated throughput is ~500 msgs/sec: 1M emails in ~33 minutes. Memory stays under 100 MB. CPU usage: 2-3 cores.

Flow 4: Public Subscription Form Submit
When a user subscribes via a public form on your website. Lower volume but has more steps.
1. BROWSER ─── POST /subscription/form ───▶ ECHO ROUTER
   │
   ├── CAPTCHA VERIFY ─── ALTCHA proof-of-work check OR hCaptcha API call
   │     // ALTCHA: local CPU verification, no external API call (~1ms)
   │     // hCaptcha: external HTTP call to verify the token (~100-300ms)
   │
   ├── DOMAIN FILTER ─── check email domain against blocklist/allowlist
   │     // privacy.domain_blocklist: ["*.disposable.com", "tempmail.org"]
   │     // In-memory check. O(N) against the list, but lists are tiny.
   │
   ├── UPSERT SUBSCRIBER ─── INSERT ... ON CONFLICT (email) DO UPDATE
   │     // Case-insensitive: idx_subs_email ON LOWER(email)
   │     // If exists: update name/attribs. If new: create with a UUID.
   │
   ├── INSERT subscriber_lists ─── subscribe to the requested lists
   │     // status = 'unconfirmed' for double opt-in lists
   │     // status = 'confirmed' for single opt-in lists
   │
   ├── IF double opt-in: SEND CONFIRMATION EMAIL
   │     // Render the opt-in template with a confirmation link
   │     // Push to SMTP via the messenger (same pool as campaigns)
   │     // Confirmation link: /subscription/optin/{subscriberUUID}/{listUUID}
   │
   └── 200 OK ─── render the success template page
Flow 5: REST API Request (Admin/Programmatic)
API-triggered operations: creating subscribers, managing lists, triggering transactional emails. Authenticated path.
1. CLIENT ─── GET /api/subscribers?page=1&per_page=50 ───▶ ECHO ROUTER
   │  // Authorization header: "username:api-token" (base64)
   ▼
2. MIDDLEWARE CHAIN
   │
   ├── Auth middleware ─── validate the API token
   │     // Look up the user by username in the users table (indexed)
   │     // Verify the token (bcrypt compare, or direct match for API users)
   │     // Load the user's role + permissions from the roles table
   │     // Set the auth context on echo.Context
   │
   ├── Permission check ─── does the user have the "subscribers:get_all" permission?
   │     // Check the role's permissions array. If list-scoped, filter by allowed lists.
   │
   └── CORS middleware ─── check the Origin header against security.cors_origins
   ▼
3. HANDLER ─── GetSubscribers()
   │
   ├── Parse query params ─── page, per_page, query, list_id, order_by
   │
   ├── Build SQL ─── dynamic WHERE clause from search/filter params
   │     // Plain-text "query": ILIKE search on email and name
   │     // SQL-expression "query": parsed as a raw WHERE clause
   │     // Permission-filtered: only subscribers in the user's allowed lists
   │
   ├── EXECUTE SQL ─── via prepared statement from goyesql
   │     // sqlx.Select() for struct scanning
   │     // Separate COUNT(*) query for the total
   │     // OFFSET/LIMIT pagination for the API (not keyset — acceptable for admin UI)
   │
   └── 200 OK ─── JSON response with data[] + total + per_page + page
| At 1M API Requests | Bottleneck Analysis |
|---|---|
| Auth Overhead | Every API request hits the DB for user/role lookup. At 1M req: 1M user queries + 1M role queries. Fix: cache authenticated sessions in-process with TTL. API tokens are static — perfect for caching. |
| OFFSET Pagination | API uses OFFSET/LIMIT (not keyset). page=10000&per_page=50 means scanning 500K rows. Degrades linearly. Acceptable for admin UI (low page numbers) but problematic for bulk API consumers. |
| No API Rate Limiting | No per-consumer rate limiting on the API. A misbehaving client can exhaust the DB connection pool. Fix: per-token rate limiter middleware (token bucket or sliding window). |
| Connection Pool Contention | API, campaign engine, tracking pixels, and bounce processor all share the same 25-connection pool. Under 1M API requests, campaign sending would slow down. Fix: separate pools or connection prioritization. |
Flow 6: Bounce Webhook (SES/SendGrid/Postmark)
After a campaign, bounce notifications flow back from email providers. Volume scales with send volume — expect 2-5% bounce rate.
1. SES/SENDGRID ─── POST /webhooks/bounce/{type} ───▶ ECHO ROUTER
   │
   ├── AUTHENTICATE WEBHOOK ─── verify provider-specific auth
   │     // SES: verify the SNS signature. SendGrid: verify key. Postmark: basic auth.
   │
   ├── PARSE BOUNCE PAYLOAD ─── extract: email, bounce type, campaign ID
   │     // Normalize across providers into the internal bounce_type ENUM
   │
   ├── INSERT INTO bounces ─── (subscriber_id, campaign_id, type, source, meta)
   │     // Subscriber looked up by email. campaign_id is nullable (may be unknown).
   │
   ├── CHECK THRESHOLD ─── count this subscriber's bounces by type
   │     // SELECT COUNT(*) FROM bounces WHERE subscriber_id = ? AND type = ?
   │     // Compare against configured actions: hard.count=1, soft.count=2
   │
   └── IF threshold exceeded: UPDATE subscribers SET status = 'blocklisted'
         // Or DELETE the subscriber if action = 'delete'
         // Cascading: subscriber_lists entries are cleaned up too (FK CASCADE)
Flow 7: Transactional Email API
Single message sends triggered by your application (welcome emails, password resets, order confirmations). Different from campaign sends — synchronous, one-at-a-time.
1. YOUR APP ─── POST /api/tx ───▶ ECHO ROUTER
   │  // Body: { "subscriber_email": "...", "template_id": 5, "data": {...} }
   │  // Auth: API token (required)
   ▼
2. HANDLER
   │
   ├── Resolve subscriber ─── look up by email or ID
   │
   ├── Load TX template ─── from the in-memory template cache
   │
   ├── Render template ─── with subscriber data + the custom data payload
   │
   ├── messenger.Push(msg) ─── synchronous SMTP send
   │     // Blocks until SMTP ACK or error. The caller waits.
   │     // No rate limiting — each TX call is one immediate send.
   │
   └── 200 OK ─── { "data": true }
| At 1M TX Requests | What Breaks |
|---|---|
| Synchronous SMTP | Each TX call blocks until SMTP responds (~50-200ms). With 25 DB connections, max concurrent TX sends = ~25. At 200ms each = ~125 TX/sec. 1M would take ~2.2 hours. Fix: async queue with delivery confirmation callback. |
| No TX Rate Limiting | TX calls bypass the campaign rate limiter. A burst of 10K TX calls would exhaust SMTP connections. Fix: separate TX rate limiter or shared token bucket. |
| Shared SMTP Pool | TX emails share the same SMTP connection pool as campaigns. Heavy TX load can starve campaign sending. Fix: dedicated SMTP server for TX (use named messenger). |
Request Flow Summary — All 7 Flows at 1M Scale
| Flow | Type | Volume Profile | Hot Path Cost | Primary Bottleneck | DB Queries/Req |
|---|---|---|---|---|---|
| Tracking Pixel | Inbound GET | 1M+ per large campaign | 2 UUID lookups + 1 INSERT | DB write throughput | 3 |
| Link Click | Inbound GET → 302 | 100K-500K per campaign | 3 UUID lookups + 1 INSERT | DB write + redirect latency | 4 |
| Campaign Send | Outbound push | 1M emails (async) | Batch fetch + render + SMTP | Rate limiter (intentional) | ~1 per 1000 |
| Subscription | Inbound POST | 100-10K/day | CAPTCHA + upsert + optin email | CAPTCHA verification | 2-4 |
| API CRUD | Inbound REST | Depends on integration | Auth + query + JSON marshal | Auth DB lookup (cacheable) | 3-5 |
| Bounce Webhook | Inbound POST | 2-5% of send volume | Auth + INSERT + threshold check | Negligible at normal rates | 3-4 |
| TX Email | Inbound POST → SMTP | Depends on app | Auth + render + sync SMTP | Synchronous SMTP blocking | 2-3 |
Will Goroutines Run Into Memory Errors?
Short answer: the campaign engine is safe, but the HTTP server is not protected. listmonk has two completely different goroutine models running simultaneously, and they have very different risk profiles.
Go Goroutine Memory Model — The Basics
| Property | Value | Why It Matters |
|---|---|---|
| Initial Stack Size | 2 KB (Go 1.4+) | A goroutine starts tiny. 1000 idle goroutines = ~2 MB. Cheap to create. |
| Stack Growth | Dynamically grows up to 1 GB (default) | Stack doubles when needed (copy-on-grow). A goroutine doing real work (allocating buffers, rendering templates, building HTTP responses) can grow to 8-64 KB each. |
| Heap Allocations | Varies per goroutine workload | Template rendering, JSON marshaling, SQL result scanning all allocate on the heap. These are the real memory consumers — not the goroutine stack itself. |
| GC Pressure | Go GC runs concurrently | High allocation rate from 100K+ goroutines triggers frequent GC cycles. GC latency spikes (STW pauses ~1-5ms) can compound under load. |
| OS Threads | GOMAXPROCS (default = num CPUs) | Goroutines are multiplexed onto OS threads. 1M goroutines still only uses ~8-16 OS threads. This is NOT the bottleneck. |
The Two Goroutine Models in listmonk
Model 1: The Campaign Engine (Fixed Worker Pool)

A fixed pool of worker goroutines (count = app.concurrency), bounded by the channel buffer (batch_size) plus the fixed worker count. Even when sending 7M emails, only 10-50 goroutines are alive.

Memory per campaign: ~10 workers × ~64KB stack + a channel buffer of 1000 messages × ~2KB each = ~2.6 MB total.

Why it's safe: the producer blocks on channel send when the buffer is full (backpressure), and workers block on the rate limiter. The goroutine count never exceeds concurrency. You cannot OOM from campaign sending.

The pattern:
// Fixed pool — goroutine count = concurrency (constant)
for i := 0; i < concurrency; i++ {
go worker(msgChan) // Exactly N goroutines. No more.
}
// Producer blocks if channel full — natural backpressure
msgChan <- msg // Blocks here, NOT by spawning new goroutines
Model 2: The HTTP Server (Goroutine per Connection)

Go's net/http server spawns one goroutine per incoming connection; Echo sits on top of this with no built-in limit. Bounded by: nothing in listmonk's code. If 100K tracking pixel requests arrive simultaneously, Go creates 100K goroutines.
Memory per request: ~8-64KB stack + ~2-10KB heap (UUID parsing, DB result scanning, response writing) = ~50-70KB per concurrent request.
At 100K concurrent: 100K × 70KB = ~7 GB. At 500K concurrent: ~35 GB. At 1M concurrent: ~70 GB → OOM on most machines.
The pattern:
// Go's net/http — UNBOUNDED goroutine creation
func (srv *Server) Serve(l net.Listener) error {
for {
conn, _ := l.Accept()
go srv.serve(conn) // New goroutine for EVERY connection!
} // No limit. No backpressure.
}
When Does OOM Actually Happen?
The key distinction is concurrent vs total requests. 1M requests over an hour is fine. 1M requests in 1 second is a problem.
| Scenario | Concurrent Goroutines | Memory | OOM Risk |
|---|---|---|---|
| Campaign send: 1M emails | 10-50 (fixed pool) | ~3-10 MB | None — bounded by design |
| Tracking pixels: 1M over 24 hours (~12 req/sec avg) | ~12-50 | ~1-3 MB | None — requests complete fast (~5ms each) |
| Tracking pixels: 1M in 1 hour (~278 req/sec avg) | ~278-1,000 | ~20-70 MB | Low — manageable if DB keeps up |
| Tracking pixels: 1M in 1 minute (~16,667 req/sec) | ~5,000-50,000 | ~350 MB - 3.5 GB | Medium — DB becomes the bottleneck; goroutines pile up waiting for connections |
| Tracking pixels: 100K concurrent (spike / DDoS / viral email) | 100,000 | ~7 GB | High — goroutines block on the 25-conn DB pool and stack up in memory |
| Tracking pixels: 1M concurrent (extreme / unrealistic) | 1,000,000 | ~70 GB | OOM crash — Go allocates until killed by the OS |
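The memory columns above are straightforward multiplication. A quick sketch of the arithmetic (the ~70KB-per-request cost is the estimate derived earlier, not a measured constant):

```go
package main

import "fmt"

// estMemGB estimates resident memory for n concurrent request
// goroutines, given a per-request cost in KB (stack + heap).
func estMemGB(n int, perReqKB float64) float64 {
	return float64(n) * perReqKB / (1024 * 1024)
}

func main() {
	for _, n := range []int{100_000, 500_000, 1_000_000} {
		fmt.Printf("%8d concurrent ≈ %.1f GB\n", n, estMemGB(n, 70))
	}
}
```

At 70KB per request this gives roughly 6.7 GB, 33 GB, and 67 GB — the same order of magnitude as the table's rounded figures.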
The Real Danger: Goroutine Pile-Up on DB Pool
The OOM risk isn't from goroutine creation — it's from goroutine accumulation. Here's the cascade failure:
CASCADE FAILURE SCENARIO — 50K req/sec tracking pixel burst

```
t=0s      50,000 requests arrive. Go spawns 50,000 goroutines. (~3.5 GB)
t=0.001s  All 50K goroutines try to acquire a DB connection from the pool
          (max_open=25). 25 get connections. 49,975 BLOCK waiting.
t=0.005s  Each DB query takes ~5ms. First 25 connections freed; next 25
          goroutines proceed. But another 50K requests have arrived.
          Now 99,950 goroutines blocked. (~7 GB)
t=1.0s    At 50K/sec inflow, the 25-conn pool processes ~5,000 req/sec
          (25 × 200 queries/sec). Deficit: 45,000 goroutines/sec
          accumulating. After 1 second: ~45K blocked.
t=10s     ~450K goroutines blocked. Memory: ~30 GB. GC thrashing.
          Latency spikes.
t=20s     ~900K goroutines. OOM killer triggers. Process killed.
          Campaign sending (which shares the same process) dies too.
```
What listmonk Does NOT Have (Protection Gaps)
| Missing Protection | Consequence | What You'd Add |
|---|---|---|
| No HTTP connection limit | Unbounded goroutine creation on burst traffic | Echo middleware: middleware.RateLimiter() or a custom semaphore. Reject with 429 when concurrency exceeds a threshold. Or wrap the listener with netutil.LimitListener (golang.org/x/net/netutil). |
| No request queue / shed | All requests accepted even when DB pool is full — they just block | Load shedding: if DB pool queue > N, return 503 immediately. Fail fast instead of accumulate. context.WithTimeout on DB calls. |
| No goroutine budget | No visibility into goroutine count. No alarm threshold. | Expose runtime.NumGoroutine() as Prometheus metric. Alert when > 10K. Circuit break at 50K. |
| No request timeout | If DB is slow, goroutines hang indefinitely holding memory | Echo's middleware.TimeoutWithConfig with a 5 * time.Second budget. Or http.Server{ReadTimeout: 10 * time.Second, WriteTimeout: 10 * time.Second}. |
| No async write path | Every tracking pixel does a synchronous DB INSERT before responding | Write to in-memory ring buffer. Background goroutine batch-flushes to DB every 100ms. Decouple request handling from DB writes. |
How You'd Fix This — Production-Grade Architecture
PRODUCTION FIX: Bounded Concurrency + Async Writes

```go
// 1. Server-level connection limits
srv := &http.Server{
    Addr:           ":9000",
    ReadTimeout:    10 * time.Second, // Prevent slow-read attacks
    WriteTimeout:   15 * time.Second, // Prevent goroutine hang on slow clients
    IdleTimeout:    60 * time.Second, // Close idle keep-alive connections
    MaxHeaderBytes: 1 << 20,          // 1MB header limit
}

// 2. Concurrency limiter middleware
sem := make(chan struct{}, 10000) // Max 10K concurrent requests
e.Use(func(next echo.HandlerFunc) echo.HandlerFunc {
    return func(c echo.Context) error {
        select {
        case sem <- struct{}{}:
            defer func() { <-sem }()
            return next(c)
        default:
            return c.String(503, "server busy") // Load shedding
        }
    }
})

// 3. Async tracking writes (decouple from request path)
trackChan := make(chan TrackEvent, 100000) // Buffered channel

// Handler: write to channel, return immediately
func handlePixel(c echo.Context) error {
    select {
    case trackChan <- TrackEvent{campID, subID}: // Queued successfully
    default: // Channel full — drop event (acceptable for analytics)
    }
    return c.Blob(200, "image/png", pixel1x1) // Instant response
}

// Background flusher: batch inserts every 100ms
go func() {
    ticker := time.NewTicker(100 * time.Millisecond)
    batch := make([]TrackEvent, 0, 1000)
    for {
        select {
        case ev := <-trackChan:
            batch = append(batch, ev)
            if len(batch) >= 1000 {
                flushBatch(batch) // COPY INTO campaign_views
                batch = batch[:0]
            }
        case <-ticker.C:
            if len(batch) > 0 {
                flushBatch(batch)
                batch = batch[:0]
            }
        }
    }
}()

// 4. DB call timeouts
ctx, cancel := context.WithTimeout(c.Request().Context(), 3*time.Second)
defer cancel()
db.QueryContext(ctx, query, args...) // Cancels if DB is slow

// 5. Monitor goroutine count
go func() {
    for range time.Tick(5 * time.Second) {
        n := runtime.NumGoroutine()
        metrics.Gauge("goroutines", n) // Prometheus metric
        if n > 50000 {
            log.Error("goroutine count critical", "count", n)
        }
    }
}()
```
Interview Answer: "How Does Go Handle 1M Concurrent Requests?"
listmonk's campaign engine avoids this by using a fixed goroutine pool with channel backpressure — the producer-consumer pattern. But the HTTP server uses Go's default unbounded model with no concurrency limit, no request timeout, and no load shedding.
The production fix is three layers: (1) Server-level timeouts (ReadTimeout, WriteTimeout) prevent slow clients from holding goroutines. (2) Concurrency limiter middleware (semaphore channel) caps concurrent requests and returns 503 when saturated. (3) Async write path for hot paths (tracking pixels, link clicks) — decouple the DB write from the HTTP response using a buffered channel with a background batch flusher. This turns a 5ms synchronous DB call into a <50µs channel send.
Interview Framing: "The system handles 1M requests across different hot paths. The campaign send is I/O-bound on SMTP with intentional rate limiting. Tracking pixels and link clicks are the surprise high-throughput paths — they're write-heavy, user-facing, and scale linearly with subscriber count. The architectural insight is that these are append-only writes to analytics tables — perfect candidates for buffered batch inserts, table partitioning, and async processing. The Go HTTP server and goroutine model are never the bottleneck; PostgreSQL write throughput is."
Contention & Concurrency — Where Things Fight Over Shared Resources
Concurrency is about structure (multiple things can run). Contention is about conflict (multiple things fighting for the same resource). listmonk has both, and understanding the contention points is what separates a senior answer from a junior one in interviews.
Resource Contention Map — Every Shared Resource in listmonk
| Shared Resource | Who Competes | Contention Type | Protection Mechanism | Risk Level |
|---|---|---|---|---|
| PostgreSQL Connection Pool max_open = 25 | HTTP handlers, Campaign engine, Bounce processor, Importer, Cron jobs (matview refresh) | Mutex-like (pool internal lock) | Go's database/sql pool with internal mutex. Goroutines block on db.Conn() when pool exhausted. | HIGH — This is the #1 contention point. All subsystems share one pool. No priority, no isolation. |
| SMTP Connection Pool max_conns per server | Campaign workers, TX email handler, Optin confirmation sender, Notification sender | Channel-based semaphore | Buffered channel acts as connection pool. Workers block on channel receive when all connections in use. | MEDIUM — Campaign workers dominate. TX emails can starve during active campaigns. |
| Campaign Message Channel buffer = batch_size | Batch producer (1 goroutine) vs Worker pool (N goroutines) | Channel (CSP) | Go buffered channel. Producer blocks on send if full. Workers block on receive if empty. Lock-free. | LOW — By design. Channel provides natural flow control. No contention, only coordination. |
| Campaign Sent Counter shared int64 | All N worker goroutines (10-50 concurrent) | Atomic CAS | atomic.AddInt64(&sent, 1). Lock-free compare-and-swap at hardware level. No mutex. | NONE — Atomic operations have zero contention overhead. O(1) per operation regardless of concurrency. |
| Template Cache compiled templates in memory | Campaign workers (read) vs Admin updating template (write) | sync.Once / recompile | Templates compiled once (sync.Once). On admin update: recompile and swap pointer. Workers read stale until swap. No read-side lock. | NONE during normal operation. Momentary during recompile (new pointer swap is atomic). |
| Settings / Config app.constants struct | All handlers (read) vs Settings update (write → restart) | Process restart | Settings changes trigger SIGHUP → full process restart. No concurrent read/write possible — the entire process is replaced via syscall.Exec(). | NONE — listmonk avoids the problem entirely by restarting. No RWMutex needed. |
| SSE Event Bus events.Events | Campaign manager (publish) vs Browser clients (subscribe) | Channel per subscriber | Each SSE client gets its own channel. Publisher fans out to all subscriber channels. No shared state between clients. | LOW — Fan-out pattern. Publisher may slow if a client channel is full (slow consumer). |
| CSV Import Processor single goroutine | Import goroutine vs HTTP API (checking import status) | Mutex (likely) | Importer runs as a single goroutine. Status checked via API. Likely uses a mutex or atomic for progress state. | LOW — Single writer, occasional reader. Minimal contention. |
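The "channel-based semaphore" in the SMTP row is worth seeing concretely. A minimal sketch of a buffered channel acting as a connection pool (the `conn` type and pool helpers are hypothetical stand-ins, not listmonk's actual SMTP pool):

```go
package main

import "fmt"

// conn stands in for an SMTP connection.
type conn struct{ id int }

// connPool is a buffered channel used as a semaphore-style pool:
// receive to acquire, send to release. When the channel is empty,
// callers block — exactly how workers queue for SMTP connections.
type connPool chan *conn

func newConnPool(maxConns int) connPool {
	p := make(connPool, maxConns)
	for i := 0; i < maxConns; i++ {
		p <- &conn{id: i}
	}
	return p
}

func (p connPool) acquire() *conn  { return <-p } // blocks when exhausted
func (p connPool) release(c *conn) { p <- c }

func main() {
	pool := newConnPool(2)
	c1, c2 := pool.acquire(), pool.acquire()
	fmt.Println(len(pool)) // 0 — exhausted; the next acquire would block
	pool.release(c1)
	pool.release(c2)
	fmt.Println(len(pool)) // 2 — both connections back in the pool
}
```

No mutex appears anywhere: the channel's own synchronization is the lock, which is why the table calls this coordination rather than contention.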
DB Pool Contention — The Priority Inversion Problem
This is the most important contention point in listmonk and a great interview discussion topic:
PRIORITY INVERSION: All subsystems share 25 DB connections

```
┌──────────────────────────────────────────────────┐
│            sql.DB Pool (max_open=25)             │
│  ┌────┐ ┌────┐ ┌────┐ ┌────┐  ...  ┌────┐        │
│  │conn│ │conn│ │conn│ │conn│       │conn│  ×25   │
│  └──┬─┘ └──┬─┘ └──┬─┘ └──┬─┘       └──┬─┘        │
└─────┼──────┼──────┼──────┼────────────┼──────────┘
      ▼      ▼      ▼      ▼            ▼
  Campaign Campaign Pixel  API       Bounce
   Batch1   Batch2  Track  List      Check
```

Problem scenarios:

1. Campaign sending of large batches holds connections for batch SELECTs (~5-50ms). Meanwhile, admin API requests queue behind campaign queries; the admin dashboard feels slow during active campaigns.
2. 1M tracking pixel INSERTs saturate the pool. Campaign batch fetches can't get a connection; campaign throughput drops and send time extends.
3. A materialized view REFRESH (cron job) takes 30-60 seconds and holds one connection for the entire refresh, leaving 24 for everything else.
4. A bulk subscriber import doing 10K upserts competes with campaign sending for connections. Both slow down.
| Fix Strategy | How | Trade-off |
|---|---|---|
| Separate Connection Pools | Create 3 sql.DB instances: one for campaign engine (10 conns), one for HTTP handlers (10 conns), one for background jobs (5 conns). Each with independent max_open. | More total connections to Postgres. Needs max_connections increase on DB side. Slightly more memory. |
| Connection Priority | Custom pool wrapper that reserves N connections for high-priority callers (campaign engine). HTTP requests use remaining. Implement with two semaphores. | Complex. Can cause HTTP starvation if campaign is too aggressive. |
| Read Replica Split | Route all SELECT queries (subscriber lookups, dashboard, API reads) to a read replica. Writes (INSERTs, UPDATEs) go to primary. | Replication lag (milliseconds). Needs application-level routing. listmonk doesn't support this natively. |
| Context Timeouts | context.WithTimeout(ctx, 3*time.Second) on all DB calls. If a connection isn't available within 3s, fail fast with 503 instead of blocking. | Requests fail under load instead of queuing. Better for latency SLOs. Some data operations may need longer timeouts. |
| PgBouncer | External connection pooler between listmonk and Postgres. Transaction-mode pooling lets the application open many lightweight connections multiplexed onto a smaller set of real Postgres connections. | Additional infrastructure. Transaction-mode pooling breaks session-level features such as prepared statements. Adds ~0.1ms latency. |
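The "Connection Priority" row proposes two semaphores: one shared pool plus a reserve only high-priority callers may touch. A sketch of that wrapper under stated assumptions (the `priorityPool` type is hypothetical — listmonk has no such mechanism):

```go
package main

import "fmt"

// priorityPool reserves `reserved` slots for high-priority callers
// (e.g. the campaign engine); low-priority callers (HTTP handlers)
// may only use the shared portion.
type priorityPool struct {
	shared   chan struct{} // usable by everyone
	reserved chan struct{} // high-priority only
}

func newPriorityPool(total, reserved int) *priorityPool {
	p := &priorityPool{
		shared:   make(chan struct{}, total-reserved),
		reserved: make(chan struct{}, reserved),
	}
	for i := 0; i < total-reserved; i++ {
		p.shared <- struct{}{}
	}
	for i := 0; i < reserved; i++ {
		p.reserved <- struct{}{}
	}
	return p
}

// tryAcquire fails fast (caller returns 503) instead of blocking.
func (p *priorityPool) tryAcquire(highPriority bool) bool {
	select {
	case <-p.shared:
		return true
	default:
	}
	if highPriority { // fall back to the reserved slots
		select {
		case <-p.reserved:
			return true
		default:
		}
	}
	return false
}

func main() {
	p := newPriorityPool(25, 5) // 20 shared + 5 reserved
	for i := 0; i < 20; i++ {
		p.tryAcquire(false) // HTTP handlers drain the shared slots
	}
	fmt.Println(p.tryAcquire(false)) // false — handlers get shed
	fmt.Println(p.tryAcquire(true))  // true — campaign engine still runs
}
```

This combines two fixes from the table: priority reservation and fail-fast load shedding instead of queuing.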
Row-Level Contention in PostgreSQL
| Contention Point | Scenario | What PostgreSQL Does | Impact |
|---|---|---|---|
| campaigns row UPDATE | Campaign manager updates last_subscriber_id and sent after each batch. Admin simultaneously views campaign status. | Row-level lock (MVCC). Writer acquires RowExclusiveLock. Reader sees old snapshot (no block). Writers don't block readers. | None — MVCC handles this perfectly. Reads see consistent snapshot. |
| subscribers UPSERT | CSV import upserting 10K subscribers while public subscription form creates new subscribers. Both touch idx_subs_email unique index. | Each INSERT/UPDATE acquires row lock. Concurrent upserts on different emails: no conflict. Same email: one waits for other's transaction to commit. | Low — conflicts only on same email. Import batches in transactions, so a stuck import blocks other writes to same subscribers. |
| subscriber_lists INSERT | Two campaigns targeting overlapping lists. Both reading subscriber_lists to find recipients. Campaign manager only reads; doesn't write to this table during send. | No contention — campaign send only SELECTs from subscriber_lists. Subscription changes (add/remove) acquire row locks on specific (subscriber_id, list_id) pairs. | None — read-only during campaign processing. |
| campaign_views / link_clicks INSERT | Thousands of concurrent tracking pixel and link click INSERTs. All writing to the same tables. | Append-only tables with BIGSERIAL PK. Each INSERT acquires a nextval() on the sequence (lightweight lock) + index locks. No row-level conflicts. | Sequence lock is a bottleneck at very high insert rates (~50K+/sec). Fix: CACHE 100 on sequence to reduce lock acquisitions. Or batch inserts. |
| settings UPDATE | Admin saves settings while campaign is reading config. | Settings are read at startup and cached in-memory (app.constants). DB write doesn't affect running config. Full process restart needed to pick up changes. | None — decoupled by design. Config is immutable during process lifetime. |
| REFRESH MATERIALIZED VIEW CONCURRENTLY | Cron job refreshes dashboard stats while admin views dashboard. | CONCURRENTLY keyword allows reads during refresh. Creates new version of matview, swaps atomically. Requires UNIQUE index on matview. | None for readers. The refresh itself holds an ExclusiveLock on the matview — two concurrent refreshes would block. |
| bounces threshold CHECK | Multiple bounce webhooks for same subscriber arrive simultaneously. Each does SELECT COUNT(*) FROM bounces WHERE subscriber_id = ? then potentially UPDATE subscribers SET status = 'blocklisted'. | TOCTOU race condition. Two webhooks both count 0 bounces, both insert, both check threshold — subscriber may get N+1 bounces before blocklist triggers. | Minor — subscriber gets one extra email before blocklist. Not dangerous. Fix: SELECT ... FOR UPDATE on subscriber row during bounce processing. |
Go-Level Concurrency Primitives Used
| Primitive | Where Used | Why This Choice | Contention Characteristics |
|---|---|---|---|
| Buffered Channel | Campaign message pipeline SMTP connection pool SSE event fan-out | CSP model. Decouples producer from consumer. Natural backpressure. No explicit locking needed. | Zero contention when buffer isn't full/empty. Contention only at boundaries: producer blocks when full (backpressure), consumer blocks when empty (idle). |
| atomic.AddInt64 | Campaign sent counter Campaign error counter | Lock-free counter. Hardware CAS instruction. No goroutine blocking, ever. | Near-zero. CAS retry on contention (extremely rare, nanoseconds). Outperforms mutex by 10-100x for simple counters. |
| sync.Once | Template compilation One-time initialization | Thread-safe lazy init. First caller executes, all others wait then return cached result. | First call: brief mutex hold during init. All subsequent calls: atomic read (zero contention). Perfect for "compute once, read forever" patterns. |
| database/sql Pool | All PostgreSQL access | Built-in connection pooling. Thread-safe. Handles connection lifecycle. | Internal mutex on connRequests map. Under high concurrency, goroutines queue in FIFO order. This is the primary contention point in the entire system. |
| No explicit Mutex | — | listmonk avoids sync.Mutex and sync.RWMutex in hot paths. Prefers channels, atomics, and immutable data (restart on config change). | Architectural choice: channels for coordination, atomics for counters, process restart for config. Eliminates most mutex contention by design. |
Concurrent Campaign Execution — Overlapping Lists
What happens when two campaigns target overlapping subscriber lists simultaneously?
SCENARIO: Campaign A targets List 1 (500K subs), Campaign B targets List 2 (500K subs). 200K subscribers are on BOTH lists.

```
Campaign A goroutines:      Campaign B goroutines:
  1 batch producer            1 batch producer
  10 send workers             10 send workers
  ──────────────              ──────────────
  11 goroutines               11 goroutines      // Total: 22 goroutines
```

What happens to the 200K overlapping subscribers?

✓ Both campaigns independently fetch and send to them.
✓ The subscriber receives BOTH emails (intentional — different campaigns).
✓ No deduplication across campaigns (by design).
✓ No row locks conflict — campaigns only SELECT from subscriber_lists.
✓ Each campaign has an independent last_subscriber_id cursor.
✓ Each campaign has an independent sent counter (atomic).

Contention points:

1. DB pool: 22 goroutines competing for 25 connections. Batch fetches are 2 long-running SELECTs; workers doing progress UPDATEs cause occasional contention. Mitigation: workers mostly wait on SMTP (I/O bound), not DB.
2. SMTP pool: 20 workers share max_conns connections. If max_conns=10, workers queue for connections. Mitigation: each campaign can use different named SMTP servers.
3. Rate limiter: the GLOBAL rate limit (message_rate) is shared across campaigns. Two campaigns each wanting 100 msg/sec with rate=100 get ~50 each — throughput halves with each concurrent campaign.
4. No campaign-level resource isolation: a slow campaign (complex template) slows all campaigns by holding DB connections longer.
Race Conditions — Known & Potential
| Race Condition | Severity | Description | Fix |
|---|---|---|---|
| Bounce TOCTOU | Low | Two bounce webhooks arrive for same subscriber simultaneously. Both read count=0, both insert, both check threshold — neither triggers blocklist because each sees count=1 when threshold=2. Next bounce will trigger it. | SELECT ... FOR UPDATE on subscriber row. Or INSERT + SELECT COUNT in a single transaction with SERIALIZABLE isolation. |
| Campaign Status Transition | Low | Admin clicks "Pause" while manager is updating sent count. PostgreSQL row-level MVCC prevents corruption — the UPDATE acquires a row lock. But the pause might not take effect until the current batch completes. | Current behavior is acceptable. Campaign checks for pause signal between batches. Near-instant in v3.0.0 rewrite. |
| Duplicate Subscription | None | Two form submissions for same email at the same time. INSERT ... ON CONFLICT (email) DO UPDATE handles this atomically — PostgreSQL serializes at the unique index level. | Already handled by DB unique constraint + upsert. No application-level fix needed. |
| Matview Concurrent Refresh | Low | Two cron triggers fire simultaneously (unlikely but possible). REFRESH MATERIALIZED VIEW CONCURRENTLY acquires ExclusiveLock — second call blocks until first completes. | Not harmful — just wastes time. Could add application-level lock (pg_advisory_lock) to skip if already running. |
| Template Hot-Swap | None | Admin updates template while campaign is mid-send using that template. Campaign workers hold reference to compiled template in memory. Template recompile creates new object; old one is GC'd after campaign finishes. | Safe — Go's GC keeps old template alive as long as goroutines reference it. New campaigns get the updated template. In-flight campaign uses the old version. |
| Subscriber Delete During Send | None | Admin deletes subscriber while campaign is sending to them. Campaign already fetched the batch — subscriber data is in memory. DB INSERT for tracking: subscriber_id FK SET NULL handles gracefully. | Already handled by schema design. ON DELETE SET NULL on campaign_views and link_clicks preserves analytics data. |
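The bounce TOCTOU fix deserves a concrete illustration. This is an in-memory analogue, not listmonk's code: the mutex plays the role that `SELECT ... FOR UPDATE` plays in PostgreSQL, serializing the count/check/blocklist sequence so two concurrent webhooks cannot both act on a stale count:

```go
package main

import (
	"fmt"
	"sync"
)

// subscriber models the bounce-threshold check. The mutex stands in
// for a SELECT ... FOR UPDATE row lock: the second bounce webhook
// waits until the first has committed its count and threshold check,
// eliminating the time-of-check/time-of-use race.
type subscriber struct {
	mu          sync.Mutex
	bounces     int
	blocklisted bool
}

func (s *subscriber) recordBounce(threshold int) {
	s.mu.Lock() // "FOR UPDATE" — the second webhook blocks here
	defer s.mu.Unlock()
	s.bounces++
	if s.bounces >= threshold && !s.blocklisted {
		s.blocklisted = true
	}
}

func main() {
	s := &subscriber{}
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ { // two simultaneous bounce webhooks
		wg.Add(1)
		go func() { defer wg.Done(); s.recordBounce(2) }()
	}
	wg.Wait()
	fmt.Println(s.bounces, s.blocklisted) // 2 true — threshold never missed
}
```

Without the lock, both goroutines could read `bounces=1`, and the threshold of 2 would be skipped — exactly the "N+1 bounces before blocklist" behavior in the table.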
Interview Answer: "How Do You Handle Contention in a Concurrent System?"
1. Application-level: listmonk uses channels (not mutexes) for coordination — the CSP model. The campaign engine is a bounded worker pool where backpressure is built into the channel. Counters use lock-free atomics. Config is immutable — changes trigger a full process restart, completely sidestepping read-write contention. This is an architectural decision to prefer simplicity over fine-grained locking.
2. Connection pool: The single shared DB pool (25 connections) is the primary contention point. All subsystems — HTTP handlers, campaign engine, bounce processor, cron jobs — compete for the same connections. Under high load, goroutines queue behind the pool's internal mutex. The fix is pool isolation: separate pools for campaign engine vs HTTP handlers, or a priority queue that reserves connections for critical paths.
3. Database-level: PostgreSQL MVCC eliminates most row-level contention — readers never block writers and writers never block readers. The real contention is on sequences (BIGSERIAL PKs on high-insert tables like link_clicks) and unique-index locks during concurrent upserts. Mitigate with sequence caching (CACHE 100) and batch inserts.

4. Cross-campaign: The global rate limiter is shared across all campaigns — two concurrent campaigns each get half the throughput. There's no per-campaign resource isolation. At scale, you'd partition resources per campaign: dedicated worker pools, separate rate limiters, and named SMTP servers for high-priority campaigns.
Resilience & High Availability
Failure Modes & Recovery
| Failure | Impact | Recovery Mechanism |
|---|---|---|
| App Crash Mid-Campaign | Campaign paused at last batch | last_subscriber_id persisted per-batch. Auto-resumes on restart. |
| SMTP Server Down | Messages fail for that server | Per-message retry. Auto-pause at error threshold. Multiple SMTP fallback. |
| DB Connection Lost | All operations fail | sqlx auto-reconnect via pool. max_open/max_idle/max_lifetime. |
| Bounce Flood | Sender reputation at risk | Auto-blocklist after N hard bounces. Configurable per bounce type. |
| Config Change Needed | Requires restart | SIGHUP hot restart: graceful shutdown → process self-replace. No downtime. |
| Campaign Stuck | Never finishes | --passive flag for read-only. Admin force status change via API. |
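The first row — auto-resume via a persisted `last_subscriber_id` — is keyset pagination used as a crash-recovery cursor. An in-memory analogue (the `fetchBatch` helper mimics the `WHERE id > $lastID ORDER BY id LIMIT $batch` query; names are illustrative):

```go
package main

import "fmt"

// fetchBatch mimics the keyset query:
//   SELECT ... WHERE id > $lastID ORDER BY id LIMIT $batch
func fetchBatch(ids []int, lastID, batch int) []int {
	out := []int{}
	for _, id := range ids {
		if id > lastID {
			out = append(out, id)
			if len(out) == batch {
				break
			}
		}
	}
	return out
}

func main() {
	subs := []int{1, 2, 3, 4, 5, 6, 7}
	lastID := 0 // persisted per batch, like campaigns.last_subscriber_id

	// First run: process two batches, then "crash".
	for i := 0; i < 2; i++ {
		b := fetchBatch(subs, lastID, 2)
		lastID = b[len(b)-1] // persist the cursor after each batch
	}

	// Restart: resume exactly where the cursor left off — no resends.
	fmt.Println(fetchBatch(subs, lastID, 2)) // [5 6]
}
```

Because the cursor is written to the database after each batch, a crash can lose at most one batch of progress, and resuming never re-sends to subscribers before the cursor.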
What listmonk Does NOT Have (HA Gaps)
Data Integrity Guarantees
PostgreSQL provides ACID transactions for all subscriber/list/campaign mutations. FK constraints with CASCADE ensure referential integrity. ENUM types enforce valid state transitions. The sent counter in v3.0.0 is exact (not approximated), ensuring no duplicate or missed sends on pause/resume.
Idempotent Upgrades
./listmonk --upgrade is idempotent. Running it multiple times has no side effects. Migrations use version checks and are applied sequentially. Critical for automated deployment pipelines (Kubernetes rollouts, CI/CD).
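The version-check idea behind idempotent migrations can be sketched in a few lines. This is a simplified model, not listmonk's implementation (which parses semver and stores the current version in the settings table); the string comparison here is deliberately naive:

```go
package main

import "fmt"

// migration applies one schema delta; version gates make re-runs no-ops.
type migration struct {
	version string
	apply   func()
}

// upgrade applies only migrations newer than the recorded version,
// then records the last one applied — running it twice changes nothing.
func upgrade(current string, migs []migration) string {
	for _, m := range migs {
		if m.version > current { // naive ordering; real code parses semver
			m.apply()
			current = m.version
		}
	}
	return current
}

func main() {
	applied := []string{}
	migs := []migration{
		{"v5.0.0", func() { applied = append(applied, "v5.0.0") }},
		{"v5.1.0", func() { applied = append(applied, "v5.1.0") }},
	}
	v := upgrade("v5.0.0", migs) // only v5.1.0 runs
	v = upgrade(v, migs)         // second run: nothing to do
	fmt.Println(v, applied)      // v5.1.0 [v5.1.0]
}
```

The second invocation is a no-op because every migration's version is at or below the recorded one — the property that makes `--upgrade` safe in CI/CD loops.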
Design Principles & Patterns
GoF & Architectural Patterns Catalog
| Pattern | Category | Where in listmonk |
|---|---|---|
| Strategy | Behavioral | Messenger interface, Media provider interface |
| Observer | Behavioral | SSE events bus for real-time UI updates |
| Template Method | Behavioral | Go html/template with Sprig function injection |
| Producer-Consumer | Concurrency | Campaign batch fetch → channel → worker goroutines |
| Connection Pool | Creational | sqlx DB pool, SMTP connection pool per server |
| Repository | Structural | Core layer wraps all DB access behind domain methods |
| Facade | Structural | App struct provides single entry point to all subsystems |
| Middleware Chain | Structural | Echo middleware: auth → CORS → logging → handler |
| Materialized View | Data | Pre-aggregated dashboard stats, refreshed on cron |
| Cursor Pagination | Data | Keyset pagination via last_subscriber_id bookmark |
Non-Functional Requirements — How listmonk Handles Them
NFRs are the make-or-break qualities that interviewers probe after your functional design. listmonk is an excellent case study because it's a production system handling millions of messages — every NFR decision below was battle-tested, not theoretical.
Security
| NFR Concern | Implementation | Code Reference |
|---|---|---|
| Authentication | Three modes: password login (bcrypt-hashed), OIDC/SSO (Google, Microsoft, Apple), API tokens (username:token header). Sessions stored in PostgreSQL via simplesessions. | internal/auth/auth.go, cmd/auth.go |
| Authorization (RBAC) | Role-based access control with two role types: user roles (global permissions) and list roles (per-list permissions). Permissions defined in permissions.json. Each API endpoint checks permissions via middleware. Users can have different access levels per list. | internal/auth/, roles table, permissions.json |
| CSRF Protection | Cookie-based sessions with SameSite attribute. OIDC flows use state parameter for CSRF prevention. Admin UI is SPA (same-origin API calls). | cmd/auth.go |
| XSS Prevention | Campaign preview iframes sandboxed. Custom CSS/JS injection scoped to admin/public separately. Go's html/template auto-escapes by default. v5.0.2 patched stored XSS via Sprig template injection. | cmd/admin.go, security advisories |
| Secret Management | Passwords masked in UI responses with • characters. Backend merges existing passwords via UUID matching when masked values are submitted. SMTP passwords, S3 keys, OIDC secrets all masked. | cmd/settings.go |
| CORS | Configurable allowed origins via security.cors_origins. Supports wildcard * or specific URLs. Validated and normalized on save. | cmd/settings.go:261-280 |
| CAPTCHA | Two providers: ALTCHA (proof-of-work, privacy-friendly, no external calls) and hCaptcha. Protects public subscription forms from bot abuse. | internal/captcha/ |
| 2FA | TOTP-based two-factor authentication for user accounts. Stored as twofa_type ENUM and twofa_key in users table. | users table, cmd/auth.go |
| Sprig Template Hardening | Dangerous Sprig functions (env, expandenv) removed to prevent environment variable leakage from templates. Patched in v5.0.2 after CVE. | internal/manager/manager.go |
Privacy & GDPR Compliance
| NFR Concern | Implementation | Config Key |
|---|---|---|
| Tracking Controls | privacy.individual_tracking (off by default) controls per-subscriber open/click attribution. privacy.disable_tracking turns off all tracking entirely. When disabled, tracking pixels and link wrapping are skipped. | privacy.individual_tracking, privacy.disable_tracking |
| Self-Service Data Export | Subscribers can export their own data (profile, subscriptions, campaign views, link clicks) via public pages. Exportable fields configurable via privacy.exportable. | privacy.allow_export, privacy.exportable[] |
| Self-Service Data Wipe | Subscribers can request complete deletion of their data. Cascades via FK constraints to remove all associated records. | privacy.allow_wipe |
| Self-Service Blocklist | Subscribers can blocklist themselves, preventing any future emails. Status set to 'blocklisted' in DB. | privacy.allow_blocklist |
| Subscription Preferences | Subscribers can manage their own list subscriptions via public preference pages. | privacy.allow_preferences |
| Unsubscribe Headers | RFC 8058 List-Unsubscribe header added to all campaign emails by default. Required by Gmail/Yahoo for bulk senders. | privacy.unsubscribe_header (default: true) |
| Domain Filtering | Blocklist and allowlist for email domains. Supports wildcard patterns (*.example.com). Applied during subscription and import. Prevents abuse from disposable email domains. | privacy.domain_blocklist[], privacy.domain_allowlist[] |
| IP Recording | Opt-in IP address recording on subscription confirmation. Off by default for privacy. | privacy.record_optin_ip (default: false) |
| Data Ownership | Self-hosted = you own all data. No third-party analytics. No external tracking pixels. PostgreSQL under your control. | Architecture decision |
Observability & Logging
| NFR Concern | Implementation | Gap / Notes |
|---|---|---|
| Logging | Standard Go log.Logger to stdout. Structured log lines with timestamps. Campaign manager logs start/finish/errors per campaign with subscriber IDs. | No structured JSON logging. No log levels (debug/info/warn/error). Basic but functional. |
| Dashboard Analytics | Materialized views provide: subscriber counts by status, campaign stats by status, 30-day link click and view trends, per-list subscriber breakdowns. Refreshed on cron. | mat_dashboard_counts, mat_dashboard_charts, mat_list_subscriber_stats |
| Campaign Tracking | Per-campaign: sent count, open rate (pixel tracking), click-through rate (link wrapping), bounce count by type. Real-time progress via SSE events. | campaign_views, link_clicks, bounces tables |
| Health Checks | HTTP server responds on configured address. About endpoint exposes version, Go runtime stats (CPU, memory alloc, OS memory). DB connectivity implicit in operations. | No dedicated /health or /readyz endpoint. Would need reverse proxy health checks. |
| Real-Time Events | SSE (Server-Sent Events) bus via internal/events/. Frontend receives live campaign progress updates, import status, notifications without polling. | internal/events/events.go |
| Metrics / APM | Not built-in. No Prometheus metrics endpoint, no OpenTelemetry instrumentation. | Gap — would need external instrumentation for production monitoring at scale. |
Operability & Deployment
| NFR Concern | Implementation |
|---|---|
| Zero-Downtime Config Reload | SIGHUP signal triggers graceful shutdown (HTTP drain, campaign flush, DB close) then syscall.Exec() self-replaces process. Campaigns in progress: sets needsRestart flag and shows warning banner — admin restarts later. |
| Idempotent Migrations | --upgrade is safe to run multiple times. Version-checked sequential migrations. Critical for CI/CD pipelines and Kubernetes rolling deployments. |
| Single Binary Distribution | All assets (frontend, SQL, i18n, templates) embedded via stuffbin. One binary + one config.toml + one PostgreSQL. No Node.js, no file dependencies. |
| Docker / Kubernetes | Official Docker image on DockerHub. docker-compose.yml included. Community Helm chart available. Environment variable configuration via LISTMONK_* prefix. |
| Passive Mode | --passive flag runs the app without processing campaigns. Useful for read-only API replicas behind a load balancer while one instance handles sending. |
| Systemd Integration | Ships with listmonk.service and listmonk@.service (template unit for multiple instances). Production-ready process management. |
| Backup Strategy | All state in PostgreSQL — standard pg_dump for backups. Media on filesystem or S3 (backed up via provider tools). No application-level backup mechanism needed. |
Performance (as NFR)
| NFR Concern | Implementation |
|---|---|
| Throughput Tuning | Three knobs: app.concurrency (worker goroutines), app.message_rate (msgs/sec), app.batch_size (DB fetch size). All configurable via UI without code changes. |
| Resource Efficiency | 57MB peak RAM for 7M+ emails. Fractional CPU. Go's goroutines are ~2KB each vs threads at ~1MB. Connection pooling avoids socket exhaustion. |
| Slow Query Mitigation | app.cache_slow_queries enables cron-refreshed materialized views for expensive dashboard aggregations. Configurable interval (default: daily at 3 AM). |
| Connection Limits | DB: max_open=25, max_idle=25, max_lifetime=300s. SMTP: max_conns per server with idle_timeout and wait_timeout. Prevents resource exhaustion. |
| Streaming Operations | CSV subscriber export streams rows as fetched (no buffering entire dataset). Import processes in configurable batch sizes. Constant memory for large operations. |
Internationalization (i18n)
| NFR Concern | Implementation |
|---|---|
| Multi-Language Support | JSON language files in i18n/*.json. Loaded via stuffbin embedded filesystem. Backend uses internal/i18n package with T() and Ts() (with substitutions) functions. |
| Frontend Localization | Vue.js admin dashboard uses vue-i18n with the same JSON language files served from /admin/static/. Dynamically loaded based on user language setting. |
| Public Page Localization | Subscription forms, unsubscribe pages, preference pages all localized. System email templates (opt-in confirmation, notifications) use L() function for translations. |
| Date/Time Localization | Day.js configured with localized relative time strings. Absolute dates use translated day/month names from i18n files. |
Maintainability & Testability
| NFR Concern | Implementation |
|---|---|
| Code Organization | internal/ packages enforce encapsulation. Layered architecture (handlers → core → DB) prevents spaghetti dependencies. Each package has a single responsibility. |
| SQL as First-Class Artifact | All queries in version-controlled .sql files. No generated SQL. No ORM magic. Reviewable, diffable, optimizable independently of Go code. |
| Schema Migrations | Versioned migrations in internal/migrations/. Each version file (e.g., v5.1.0.go) contains the delta. Applied sequentially with version checks. |
| Testability | Core business logic has zero HTTP dependencies — can be unit tested with a DB mock. Interface-based design (Messenger, Media provider) enables test doubles. |
| Dev Environment | .devcontainer/ config for VS Code dev containers. Makefile with make dist build target. Docker Compose for local Postgres. Frontend hot-reload with Vue CLI. |
Error Handling & Fault Tolerance
| Concern | Mechanism | How It Works |
|---|---|---|
| Per-Message Retry | SMTP max_msg_retries (default: 2) | Each failed message is retried N times before being counted as a send error. Retry happens immediately within the same worker goroutine. Failed messages don't block the channel — other workers continue sending. |
| Campaign Error Threshold | app.max_send_errors (default: 1000) | Cumulative send errors tracked per campaign via atomic counter. When threshold is hit, campaign auto-pauses — prevents burning through an entire list when SMTP is misconfigured or provider is throttling. Admin can investigate and resume. |
| SMTP Connection Failure | Connection pool + wait_timeout | If a connection dies mid-send, the pool creates a new one. wait_timeout prevents indefinite blocking waiting for a free connection. idle_timeout closes stale connections proactively. |
| Bounce Classification | Three-tier: soft / hard / complaint | Each bounce type has configurable count and action. Soft bounce: ignore until threshold (transient issues). Hard bounce: blocklist after 1 (permanent — invalid address). Complaint: blocklist after 1 (spam report). Actions: none, blocklist, delete. |
| Crash Recovery | Checkpoint via last_subscriber_id | After each batch, campaign progress is persisted to DB. On process crash/restart, campaigns with status='running' auto-resume from the last checkpoint. The keyset cursor skips already-processed subscribers, so completed batches are never re-sent — only messages in flight within the interrupted batch can repeat. |
| Subscriber Import Errors | Per-row validation + summary | CSV import validates each row (email format, domain blocklist, required fields). Invalid rows are skipped with error details in import log. Valid rows are upserted. Import can be stopped/retried without corrupting data. |
| Template Render Errors | Per-subscriber isolation | If a template fails to render for a specific subscriber (e.g., missing attribute), that single message is marked as error. Other subscribers are unaffected. Campaign continues processing. |
| DB Connection Exhaustion | Pool limits + lifetime | max_open=25 hard-caps total connections. max_lifetime=300s recycles connections preventing stale state. max_idle=25 keeps warm connections ready. sqlx auto-reconnects on transient failures. |
| Graceful Degradation | Campaign pause + passive mode | No circuit breaker pattern, but the error threshold serves a similar purpose: after enough failures, the system stops trying (pauses campaign). --passive mode allows serving the UI/API while campaign sending is disabled. |
What's Missing in Error Handling (Interview Talking Points)
Retries are immediate, with no exponential backoff or jitter (e.g., time.Sleep(baseDelay * 2^attempt + rand)). There is also no per-SMTP-server circuit breaker and no dead letter table for permanently failed messages — all proposed as improvements in the NFR summary below.
Go-Specific Scalability Patterns
| Pattern | Go Implementation | Why It Scales |
|---|---|---|
| Goroutine Worker Pool | Fixed-size pool (default 10) consuming from a buffered channel. for msg := range msgChan { ... } | Goroutines are ~2-4KB stack (vs ~1MB threads). 10 goroutines handle millions of messages. Channel provides natural backpressure — if workers are busy, producer blocks on channel send. No unbounded goroutine creation. |
| Channel-Based Flow Control | Buffered channel between batch producer and send workers. Buffer size = batch_size. | Channels are Go's CSP primitive. They handle synchronization, ordering, and backpressure without explicit locks. If SMTP is slow, channel fills up, producer pauses DB fetching automatically. Self-regulating. |
| Atomic Counters | atomic.AddInt64(&sent, 1) for campaign progress | Lock-free concurrent increment. No mutex contention across 10+ workers updating the same counter. Hardware CAS instruction. O(1) regardless of worker count. |
| sync.Once for Caching | Template compilation cached via sync.Once. Recompiled only on explicit template update. | Thread-safe lazy initialization. First call compiles, subsequent calls return cached result. Zero allocation after first call. Critical for hot-path template rendering. |
| Context Cancellation | context.Context propagated from HTTP request → Core → DB query | Request-scoped timeouts and cancellation. If a client disconnects, the entire call chain cancels — DB query aborted, goroutine freed. Prevents resource leaks on slow queries or abandoned requests. |
| Connection Pool (database/sql) | Go's sql.DB is already a connection pool. sqlx wraps it. max_open=25. | Pool manages connection lifecycle, reuse, health checks. Concurrent goroutines share the pool safely. Idle connections kept warm. Lifetime rotation prevents stale TCP connections. |
| Embedded Filesystem | stuffbin.FileSystem embeds all assets into the binary at compile time. | Zero disk I/O for serving frontend assets. Memory-mapped access. No file descriptor overhead. Eliminates deployment complexity (no asset directory sync). Scales vertically with zero ops burden. |
| Streaming Response | CSV export: csv.Writer wrapping http.ResponseWriter. Rows written as DB cursor advances. | O(1) memory for exporting 10M subscribers. No buffering entire result set. HTTP chunked transfer encoding. Client sees data immediately. DB cursor keeps server-side state. |
Go Scalability Mental Model for Interviews: listmonk proves that a single Go process with goroutine pools + channels + connection pooling can handle millions of operations. The key insight: Go's concurrency model maps perfectly to the producer-consumer pattern. The producer is I/O-bound (DB fetch), workers are I/O-bound (SMTP send), and channels decouple them. You don't need Kafka for this workload — Go channels are an in-process message queue. You add Kafka when you need multi-node distribution or replay guarantees.
High Availability (HA) — What Exists & What Doesn't
| HA Dimension | Current State | How You'd Improve It (Interview Answer) |
|---|---|---|
| Process Availability | ✓ Hot restart via SIGHUP + syscall.Exec(). Systemd auto-restart on crash. Docker restart policies. Millisecond startup time. | Add liveness/readiness probes for K8s. Currently no /healthz — requests to any endpoint implicitly confirm liveness. |
| Database HA | ◐ Relies on external PostgreSQL HA (RDS Multi-AZ, Patroni, pg_auto_failover). Connection pool handles transient failures. | For self-hosted: Patroni + pgBouncer. For cloud: RDS/Cloud SQL with read replicas. listmonk's --passive mode can point to a read replica. |
| Campaign Continuity | ✓ Checkpoint-based resume. Campaigns survive process restart. last_subscriber_id persisted per batch. Status remains 'running' in DB. | For zero-gap: WAL-based approach — log each message to a journal before sending, mark complete after ACK. Current approach has a small window of potential duplicates within a batch. |
| Horizontal Scaling (Read) | ◐ --passive mode serves API/UI without processing campaigns. Multiple passive instances behind a load balancer. | Add session stickiness or shared session store (Redis/PostgreSQL sessions already in DB). API token auth is stateless and scales naturally. |
| Horizontal Scaling (Write/Send) | ✗ Single sender process. No campaign sharding, no distributed locking, no work stealing. | Campaign partitioning: assign subscriber ID ranges to sender nodes. Distributed lock (etcd/Redis) for campaign ownership. Message queue (Kafka/SQS) between fetch and send. |
| SMTP Failover | ◐ Multiple SMTP servers can be configured. Default "email" messenger load-balances across all enabled servers. But no active health checking or automatic failover — if one server is slow, it still gets traffic. | Add per-server health scoring. Circuit breaker per SMTP. Weighted routing based on success rate. Remove unhealthy servers from pool temporarily. |
| Data Durability | ✓ All state in PostgreSQL with ACID guarantees. FK constraints prevent orphaned data. WAL provides crash consistency. Standard pg_dump for backups. | Point-in-time recovery via WAL archiving. Cross-region replication for DR. Media on S3 with cross-region replication. |
| Zero-Downtime Deploys | ✓ Idempotent --upgrade. SIGHUP hot restart. Campaigns auto-resume post-restart. Docker rolling update compatible. | Blue-green deployment: run new version in passive mode, verify, then switch active sender. K8s rolling update with readiness probe. |
HA Architecture for 99.9% Uptime — What You'd Propose in an Interview
┌──────────────┐ ┌──────────────────────────────────────────────┐
│ Load │ │ Application Tier │
│ Balancer │────▶│ ┌─────────────┐ ┌─────────────┐ │
│ (Nginx/ │ │ │ listmonk │ │ listmonk │ │
│ ALB) │ │ │ (active │ │ (passive │ │
│ │ │ │ sender) │ │ read-only) │ × N │
└──────────────┘ │ └──────┬──────┘ └──────┬──────┘ │
└─────────┼────────────────┼──────────────────┘
│ │
┌─────────▼────────────────▼──────────────────┐
│ Database Tier │
│ ┌──────────┐ ┌──────────┐ │
│ │ Postgres │───▶│ Postgres │ │
│ │ Primary │ │ Replica │ (streaming) │
│ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────┘
Active sender: processes campaigns, handles writes
Passive instances: serve UI/API reads, handle public pages
Primary DB: all writes, campaign state
Replica DB: passive instances read from here
NFR Summary Matrix — Interview Quick Reference
When an interviewer asks "how would you handle X?", point to the concrete implementations in the tables above.
The gaps are intentional trade-offs for simplicity. In an interview, acknowledge them and propose solutions: "listmonk prioritizes operational simplicity — a single binary serving 7M+ emails. At scale, I'd add: /healthz endpoint for K8s probes, Prometheus metrics via expvar or echo middleware, structured logging with slog (Go 1.21+), circuit breakers per SMTP server using sony/gobreaker, exponential backoff with jitter on retries, and a dead letter table for failed messages. For horizontal send scaling, I'd introduce Kafka between the batch producer and send workers, with campaign partition assignment via etcd distributed locks."
System Design Interview Cheat Sheet
Use listmonk as a reference when answering questions about designing email/notification systems, producer-consumer pipelines, or self-hosted SaaS alternatives.
If Asked: "Design a Newsletter/Email System"
1. Data Model: subscribers, lists, and campaigns in PostgreSQL. JSONB for flexible per-subscriber attributes, ENUMs for campaign state machines, many-to-many subscriber↔list mapping.
2. Send Pipeline: Producer fetches batches via keyset pagination (WHERE id > cursor). Workers consume from buffered channel. Rate limiter (token bucket + sliding window). SMTP connection pool per server.
3. Tracking: Pixel tracking for opens (1x1 transparent PNG). Link wrapping for click tracking. Privacy toggle to disable/anonymize. Expression indexes on DATE for time-series queries.
4. Bounce Handling: Webhook receivers for SES/SendGrid/Postmark. POP/IMAP mailbox scanning. Auto-blocklist on hard bounce. Configurable thresholds per bounce type.
5. Scale: Materialized views for dashboards. Batch processing amortizes DB cost. Connection pooling. Single process handles 7M+ emails. For 100M+: shard DB, add message queue, horizontal senders.
6. Reliability: Checkpoint-based crash recovery (last_subscriber_id). Idempotent resume. Error threshold auto-pause. ACID transactions for mutations. Graceful hot restart via SIGHUP.
Key Talking Points
| Topic | What to Say | listmonk Reference |
|---|---|---|
| Why PostgreSQL? | JSONB for flexible schemas without NoSQL complexity. ENUMs for state machines. Materialized views for read optimization. ACID for correctness. | subscribers.attribs JSONB, campaign_status ENUM, mat_dashboard_counts |
| Why not an ORM? | Named SQL via goyesql gives full PostgreSQL feature access. No N+1 queries. SQL is reviewable, optimizable. Complex queries don't fit ORM patterns. | queries/*.sql loaded at startup |
| Cursor vs Offset | OFFSET scans N rows then discards. Cursor (keyset) uses WHERE id > X which hits the index directly. O(1) vs O(N). Critical at scale. | campaigns.last_subscriber_id |
| Rate Limiting | Token bucket for steady rate. Sliding window for burst control. Per-connection limits for SMTP backpressure. Layered approach. | message_rate, sliding_window, max_conns |
| Single Binary Trade-offs | Pro: zero-dep deployment, fast startup, simple ops. Con: vertical scaling only. Good for 80% of use cases. | stuffbin embeds all assets |
| When to Add a Queue | Current: in-process channel. At scale: Kafka/SQS decouples fetch from send, enables multi-node senders, provides replay. | Campaign manager uses Go channels |
Quick-Reference: Numbers to Know
| Number | What It Is |
|---|---|
| 7M+ | emails sent in a single production campaign |
| ~57MB | peak RAM during that campaign |
| 10 | default app.concurrency (send worker goroutines) |
| 25 | DB pool max_open / max_idle connections |
| 1000 | default app.max_send_errors before auto-pause |
| 2 | default SMTP max_msg_retries per message |
| ~2KB | goroutine stack, vs ~1MB for an OS thread |
Related System Design Problems
If you study listmonk deeply, you can answer variations of these interview questions:
- Design a notification system (email/SMS/push)
- Design a mailing list manager
- Design a campaign analytics platform
- Design a self-hosted SaaS tool
- Design a producer-consumer pipeline with rate limiting
- Design a system with crash recovery and exactly-once processing
- How would you handle millions of concurrent email sends?
- Design a multi-tenant newsletter platform