What is listmonk?
listmonk is a high-performance, self-hosted newsletter and mailing list manager. It ships as a single Go binary with an embedded Vue.js frontend, backed by PostgreSQL. It has demonstrated production workloads of 7+ million emails per campaign with peak RAM of ~57MB and fractional CPU usage. The project has 19k+ GitHub stars and is built by Kailash Nadh (CTO of Zerodha, India's largest stock broker).
Why Study This for System Design?
Tech Stack
| Layer | Technology | Role |
|---|---|---|
| Language | Go | Backend, CLI, campaign engine |
| Web Framework | labstack/echo v4 | HTTP routing, middleware |
| Database | PostgreSQL | All persistent state, JSONB attributes |
| DB Driver | jmoiron/sqlx + lib/pq | SQL execution, struct scanning |
| SQL Management | knadh/goyesql | Named SQL queries from .sql files |
| Config | knadh/koanf | Multi-source: TOML → env vars → DB |
| Frontend | Vue.js 3 + Buefy | Admin SPA dashboard |
| Asset Embed | knadh/stuffbin | Embed frontend/SQL/i18n into binary |
| Auth | Sessions + OIDC + RBAC | Cookie sessions, SSO, role perms |
| Templating | html/template + Sprig | Dynamic email templates, 100+ funcs |
System Architecture Diagram
Layered Architecture
listmonk follows a clean 4-layer architecture. Each layer has a single responsibility and communicates only with adjacent layers.
Layer 1: HTTP Handlers (cmd/*.go)
Thin handlers that parse HTTP requests (path params, query strings, JSON bodies), call the Core layer, and serialize responses. No business logic here. Each handler is a method on the App struct which holds references to all subsystems. Echo framework provides routing, middleware, and context.
Layer 2: Core Business Logic (internal/core/)
All domain operations: CRUD for subscribers, lists, campaigns, templates. The Core struct wraps the DB and query runner. This layer is pure Go with zero HTTP dependencies — it could be called from CLI, tests, or any other interface. It enforces validation, permission checks, and domain invariants.
Layer 3: Campaign Manager (internal/manager/)
The concurrent campaign processing engine. Runs as a long-lived goroutine that polls the DB for active campaigns. It owns the entire send pipeline: batch fetching, template rendering, rate limiting, worker pool dispatch, progress tracking, error handling. Completely rewritten in v3.0.0 for near-instant pause/cancel and lossless counting.
Layer 4: Data Layer (queries/*.sql + PostgreSQL)
All SQL lives in .sql files, loaded at startup via goyesql. No ORM. This gives full control over query optimization, PostgreSQL-specific features (JSONB, arrays, materialized views), and makes SQL reviewable and versionable. The sqlx library provides struct scanning.
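The named-query convention can be sketched with a tiny hand-rolled parser (illustrative only: knadh/goyesql's actual API differs, and sqlx would execute the resulting strings):

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseNamedQueries scans a .sql file's contents for "-- name: <tag>"
// markers and returns a map of tag to SQL text. This mirrors the idea
// behind knadh/goyesql; it is not the library's real API.
func parseNamedQueries(src string) map[string]string {
	queries := map[string]string{}
	var name string
	var buf strings.Builder
	flush := func() {
		if name != "" {
			queries[name] = strings.TrimSpace(buf.String())
		}
		buf.Reset()
	}
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "-- name:") {
			flush() // close out the previous query, if any
			name = strings.TrimSpace(strings.TrimPrefix(line, "-- name:"))
			continue
		}
		buf.WriteString(line + "\n")
	}
	flush()
	return queries
}

func main() {
	src := `-- name: get-subscriber
SELECT * FROM subscribers WHERE id = $1;

-- name: get-campaign-subscribers
SELECT id, email FROM subscribers WHERE id > $1 ORDER BY id LIMIT $2;`
	q := parseNamedQueries(src)
	fmt.Println(q["get-subscriber"])
}
```

At startup the resulting map is what gets bound to prepared statements; the SQL stays reviewable in plain files.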
Key Architectural Decision: The App Struct
type App struct {
core *core.Core // Business logic layer
fs stuffbin.FileSystem // Embedded filesystem (assets, SQL, i18n)
db *sqlx.DB // PostgreSQL connection pool
queries *models.Queries // Pre-loaded named SQL statements
constants *constants // Runtime config snapshot
manager *manager.Manager // Campaign processing engine
importer *subimporter.Importer // CSV/bulk import processor
notifs *notifs.Notifs // Admin email notifications
i18n *i18n.I18n // Internationalization
bounceProc *bounce.Manager // Bounce email processor
captcha *captcha.Captcha // ALTCHA/hCaptcha
auth *auth.Auth // Sessions + RBAC + OIDC
events *events.Events // SSE event bus
paginator *paginator.Paginator // Cursor/offset pagination
log *log.Logger
}
This is Go's idiomatic alternative to dependency injection containers. All subsystems are initialized in init.go and wired together via this struct. Handlers access them via a.core, a.manager, etc.
Database Schema (ER Diagram)
PostgreSQL schema with ~12 tables, JSONB for extensibility, and materialized views for analytics.
Key Database Design Decisions
1. Custom ENUM Types (12 types)
PostgreSQL ENUMs enforce valid states at the DB level: campaign_status has 6 states (draft, scheduled, running, paused, cancelled, finished), subscriber_status has 3 (enabled, disabled, blocklisted), and subscription_status tracks the per-list relationship (unconfirmed, confirmed, unsubscribed). This is a state machine enforced by the database, not application code.
2. JSONB for Flexible Attributes
subscribers.attribs stores arbitrary key-value data as JSONB, enabling schema-less subscriber segmentation. You can query attribs->>'city' = 'Atlanta' with GIN indexes. The settings table stores all app config as JSONB key-value pairs, enabling runtime configuration changes through the UI without schema migrations.
3. Junction Table Pattern (subscriber_lists)
Many-to-many with a composite primary key PK(subscriber_id, list_id). The junction table carries its own state (subscription_status) and metadata (meta JSONB). Separate indexes on each FK column for efficient queries in both directions.
4. Keyset Pagination Columns
campaigns.last_subscriber_id and max_subscriber_id enable cursor-based pagination for campaign sending. Instead of OFFSET N (an O(N) scan), it uses WHERE id > last_subscriber_id ORDER BY id LIMIT batch_size: an index seek whose cost is independent of how far into the table the cursor is. Critical for sending campaigns to millions of subscribers.
5. Materialized Views for Dashboard
Three materialized views precompute expensive aggregations: mat_dashboard_counts (subscriber/list/campaign totals), mat_dashboard_charts (30-day click/view trends), mat_list_subscriber_stats (per-list subscriber counts by status). Refreshed on cron schedule via REFRESH MATERIALIZED VIEW CONCURRENTLY. Orders-of-magnitude speedup for large databases.
6. Soft References & Denormalization
campaign_views.subscriber_id is nullable with ON DELETE SET NULL — if a subscriber is deleted, their view records persist for analytics. campaign_lists.list_name denormalizes the list name so campaign history survives list deletion. Preserves historical accuracy while allowing entity cleanup.
7. Indexing Strategy
Strategic indexes on: email (case-insensitive unique via LOWER(email)), status columns (for filtered scans), composite (id, status) for campaign batch fetching, DATE(created_at) expression indexes for time-series analytics. Partial unique index on templates(is_default) WHERE is_default = true ensures only one default template.
8. UUID + Serial ID Dual Identity
Internal operations use fast integer serial IDs for joins and pagination. External/public-facing operations use UUIDs (subscriber unsubscribe links, campaign archives, media references). Best of both worlds: performance internally, security externally (UUIDs aren't guessable).
Campaign Processing Pipeline
The campaign engine is the heart of listmonk. Rewritten in v3.0.0 for lossless operation:
Concurrency Model Deep Dive
Producer-Consumer with Channels
// Simplified mental model of the campaign engine.
// Producer: fetches subscriber batches from the DB via keyset pagination.
go func() {
    for {
        batch := db.Query("SELECT ... WHERE id > $1 ORDER BY id LIMIT $2",
            lastSubID, batchSize)
        if len(batch) == 0 {
            close(msgChan) // All subscribers processed; workers drain and exit.
            return
        }
        for _, sub := range batch {
            msg := renderTemplate(campaign, sub)
            msgChan <- msg // Send to worker pool (blocks when buffer is full).
        }
        lastSubID = batch[len(batch)-1].ID
        updateProgress(campaign, lastSubID) // Checkpoint for crash recovery.
    }
}()
// Consumer pool: N goroutines sending messages
for i := 0; i < concurrency; i++ {
go func() {
for msg := range msgChan {
rateLimiter.Wait() // Token bucket
err := messenger.Push(msg) // SMTP/HTTP
if err != nil {
handleRetry(msg, err)
}
atomic.AddInt64(&sent, 1)
}
}()
}
Rate Limiting Strategies
| Strategy | Config | Behavior |
|---|---|---|
| Fixed Rate | app.message_rate = 10 | Max 10 msgs/second globally. Token bucket. |
| Sliding Window | sliding_window + duration + rate | Max N messages within rolling time window (e.g., 10000/hour). |
| Per-SMTP Limits | smtp[].max_conns = 10 | Connection pool per SMTP server. Backpressure via channel blocking. |
| Error Threshold | app.max_send_errors = 1000 | Auto-pause campaign after N cumulative send failures. |
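The fixed-rate strategy boils down to a token bucket. A minimal sketch (not listmonk's implementation; tokenBucket and its fields are invented here):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// tokenBucket is a minimal fixed-rate limiter in the spirit of
// app.message_rate: up to `capacity` burst tokens, refilled at `rate`
// tokens per second.
type tokenBucket struct {
	mu       sync.Mutex
	tokens   float64
	capacity float64
	rate     float64 // tokens per second
	last     time.Time
}

func newTokenBucket(rate, capacity float64) *tokenBucket {
	return &tokenBucket{tokens: capacity, capacity: capacity, rate: rate, last: time.Now()}
}

// allow reports whether one more message may be sent right now.
func (b *tokenBucket) allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	now := time.Now()
	b.tokens += now.Sub(b.last).Seconds() * b.rate // refill since last call
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.last = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	tb := newTokenBucket(10, 10) // like app.message_rate = 10
	sent := 0
	for i := 0; i < 20; i++ {
		if tb.allow() {
			sent++
		}
	}
	fmt.Println(sent) // burst capacity caps immediate sends at ~10
}
```

A blocking Wait() variant, as the worker pool uses, is the same bucket plus a sleep until the next token is due.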
Crash Recovery & Resumption
The campaign stores last_subscriber_id after each batch. On restart, campaigns with status='running' are automatically resumed from the last checkpoint. The v3.0.0 rewrite ensures every single message is counted (not approximated), making pause/resume lossless. The to_send field is computed at campaign start as the total subscriber count for the target lists.
SMTP Connection Pool
Each configured SMTP server maintains a pool of max_conns persistent TCP connections. Connections are reused across messages (SMTP pipelining). idle_timeout and wait_timeout control connection lifecycle. Multiple SMTP servers can be load-balanced under the default "email" messenger, or targeted individually by naming them.
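A bounded connection pool falls out naturally from a buffered channel. A sketch under the assumption that conn stands in for a persistent SMTP connection:

```go
package main

import "fmt"

// conn is a stub for a persistent SMTP connection.
type conn struct{ id int }

// pool hands out reusable connections via a buffered channel, the same
// bounded-pool idea as max_conns per SMTP server. get blocks when every
// connection is busy, which is exactly the backpressure described above.
type pool struct{ conns chan *conn }

func newPool(n int) *pool {
	p := &pool{conns: make(chan *conn, n)}
	for i := 0; i < n; i++ {
		p.conns <- &conn{id: i}
	}
	return p
}

func (p *pool) get() *conn  { return <-p.conns }
func (p *pool) put(c *conn) { p.conns <- c }

func main() {
	p := newPool(2)
	c1, c2 := p.get(), p.get()
	fmt.Println(len(p.conns)) // 0: pool exhausted; a third get() would block
	p.put(c1)
	p.put(c2)
	fmt.Println(len(p.conns)) // 2: connections returned for reuse
}
```

A real pool would add dial-on-demand, health checks, and the idle_timeout/wait_timeout handling the text mentions.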
Go Patterns & Best Practices
// Messenger interface (Strategy pattern)
type Messenger interface {
Name() string
Push(msg Message) error
Flush() error
Close() error
}
// Implementations: email.Emailer, postback.Postback
type App struct {
core *core.Core
manager *manager.Manager
db *sqlx.DB
// ... all subsystems
}
// Methods: func (a *App) GetSubscribers(...)
-- queries/queries.sql

-- name: get-subscriber
SELECT * FROM subscribers WHERE id = $1;

-- name: get-campaign-subscribers
SELECT ... WHERE id > $1 ORDER BY id LIMIT $2;
internal/
    core/         # Business logic (internal: unimportable outside the module)
    manager/      # Campaign engine
    bounce/       # Bounce processing
    auth/         # Authentication
    media/        # Media storage
    subimporter/  # CSV import
sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, syscall.SIGHUP)
go func() {
<-sigChan
srv.Shutdown(ctx) // HTTP
manager.Close() // Campaigns
db.Close() // Postgres
syscall.Exec(...) // Self-replace
}()
// Precedence (last wins):
// 1. CLI flags (--config, --install)
// 2. TOML file (config.toml)
// 3. Env vars (LISTMONK_*)
// 4. Database settings table
ko.Load(file.Provider("config.toml"),
toml.Parser())
ko.Load(env.Provider("LISTMONK_", ...))
Additional Go Idioms Used
| Pattern | Where | Why |
|---|---|---|
| Functional Options | SMTP config, manager setup | Flexible initialization without constructor explosion |
| Context Propagation | HTTP handlers → Core → DB | Request-scoped deadlines, cancellation, auth info |
| sync.Once | Template compilation caching | Thread-safe lazy initialization of expensive resources |
| atomic Operations | Campaign sent counter | Lock-free concurrent counter updates in worker pool |
| embed.FS (stuffbin) | Static assets, SQL, i18n | Single binary deployment with zero external file deps |
| Error Wrapping | Throughout | fmt.Errorf with %w for error chain inspection |
| Table-Driven Tests | Core package | Declarative test cases with expected inputs/outputs |
| Middleware Chain | Echo middleware | Auth → CORS → logging → rate limit → handler |
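The sync.Once idiom from the table, applied to template compilation (the template text is invented for the example):

```go
package main

import (
	"bytes"
	"fmt"
	"html/template"
	"sync"
)

var (
	tplOnce sync.Once
	tpl     *template.Template
)

// getTemplate compiles the template exactly once, no matter how many
// goroutines race to render first.
func getTemplate() *template.Template {
	tplOnce.Do(func() {
		tpl = template.Must(template.New("email").Parse(
			"Hello {{.Name}}, welcome to {{.List}}!"))
	})
	return tpl
}

func render(name, list string) string {
	var buf bytes.Buffer
	getTemplate().Execute(&buf, map[string]string{"Name": name, "List": list})
	return buf.String()
}

func main() {
	// Concurrent renders all share the single compiled template.
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = render("Ada", "Weekly")
		}()
	}
	wg.Wait()
	fmt.Println(render("Ada", "Weekly"))
}
```

The same shape covers any expensive lazy initialization: the Do closure runs once, and every caller after that gets the cached value.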
Scalability & Performance at 1M+ Scale
Proven Production Numbers
listmonk.app states: "A production instance sending 7+ million emails. CPU usage is a fraction of a single core with peak RAM of 57 MB."
What Makes It Fast?
| Technique | Impact | Details |
|---|---|---|
| Keyset Pagination | Constant-cost batch fetch | WHERE id > cursor instead of OFFSET. An index seek whose cost doesn't grow with position in the dataset. |
| Batch Processing | Amortized DB cost | Default batch_size=1000. One query fetches 1000 subscribers. Reduces round-trips 1000x. |
| Connection Pooling | DB: 25, SMTP: 10/srv | max_open=25, max_idle=25 for Postgres. max_conns per SMTP server. Reuse over create. |
| Materialized Views | Instant dashboards | Pre-aggregated stats. REFRESH CONCURRENTLY allows reads during rebuild. |
| Template Caching | Zero recompile | Templates compiled once at startup, cached in memory. Re-compiled only on update. |
| Goroutine Pool | Bounded concurrency | Fixed pool (default 10 workers). No goroutine leak. Channel backpressure for flow control. |
| Streaming Export | Constant memory | Subscriber export writes CSV rows as they're fetched. No full-dataset buffering. |
| Expression Indexes | DATE() fast | idx_clicks_date ON (TIMEZONE('UTC', created_at)::DATE) avoids per-row function eval. |
| Single Binary | Fast startup | No file I/O for assets. stuffbin serves from memory. Startup in milliseconds. |
| Cache Slow Queries | Configurable | v3+ option: enable/disable query caching with custom cron interval for large DBs. |
Bottlenecks & Scaling Limits
| Bottleneck | Limit | Mitigation |
|---|---|---|
| Single Postgres | ~10M subs before slowdown | Materialized views, cache_slow_queries, read replicas |
| Single Process | No horizontal scaling built-in | Run --passive for read-only replicas. One active sender. |
| SMTP Rate Limits | Provider-imposed (SES: 14/sec) | Sliding window limiter, multiple SMTPs, message_rate config |
| Template Rendering | CPU-bound for complex templates | Keep templates simple. Goroutine pool bounds CPU. |
| link_clicks Table | Can grow to billions of rows | DATE expression index. Consider partitioning at scale. |
What Would You Do to Scale to 100M+ Subscribers?
A great system design interview follow-up.
What Happens When 1 Million Requests Hit listmonk?
listmonk handles 6 fundamentally different request types. Each follows a different hot path through the system. Understanding these flows is critical for system design interviews — an interviewer will ask "walk me through what happens when..." and expect you to trace from TCP accept to DB write to response.
Key Insight: listmonk is NOT a typical CRUD API under load. Campaign sending is a push pipeline (server-initiated, async). Tracking pixels and link clicks are the real high-throughput inbound paths — these get 1M+ hits per campaign.
Flow 1: Tracking Pixel Open (Highest Volume — 1M+ per campaign)
When a subscriber opens an email, their client loads a 1x1 transparent PNG. This is the hottest path in the system.
1. EMAIL CLIENT ─── GET /campaign/{campaignUUID}/{subscriberUUID}/px.png ───▶ ECHO ROUTER
   │  // No auth middleware — public endpoint. No session lookup.
   ▼
2. ECHO ROUTER ─── matches route "/campaign/:campUUID/:subUUID/px.png" ───▶ HANDLER: handleCampaignPixel()
   │  // Parse UUIDs from path params. Validate format (fast regex). No DB lookup yet.
   ▼
3. HANDLER ─── checks privacy.disable_tracking setting ───▶ DECISION POINT
   │
   ├── IF tracking disabled: return the 1x1 PNG immediately. No DB write. O(1).
   │
   ├── IF privacy.individual_tracking = false:
   │     Insert into campaign_views with subscriber_id = NULL (anonymous).
   │     // Still counts the view, but can't attribute it to a subscriber.
   │
   └── IF individual tracking enabled:
   ▼
4. DB INSERT ─── INSERT INTO campaign_views (campaign_id, subscriber_id, created_at) ───▶ POSTGRESQL
   │  // campaign_id resolved from UUID via an indexed campaigns lookup.
   │  // subscriber_id resolved from UUID via an indexed subscribers lookup.
   │  // Two UUID→ID lookups + one INSERT = 3 queries total.
   │  // campaign_views has a BIGSERIAL PK — a write-optimized, append-only table.
   ▼
5. RESPONSE ─── 200 OK, Content-Type: image/png, body: 1x1 transparent PNG (68 bytes) ───▶ EMAIL CLIENT
| At 1M Pixel Requests | What Happens | Bottleneck |
|---|---|---|
| Echo HTTP Server | Goroutine-per-request model. 1M requests = 1M goroutines (but short-lived, ~2KB each). Echo's radix tree router matches in O(1). No middleware overhead on public routes (no auth, no session). | Not a bottleneck — Go HTTP server handles 100k+ req/s on a single core. |
| UUID → ID Lookups | Two indexed lookups: campaigns(uuid) and subscribers(uuid). Both have UNIQUE indexes. B-tree lookup = O(log N). | At 1M req/s this is 2M index lookups/sec. Could become the bottleneck. Fix: cache UUID→ID mapping in-process (Go map with RWMutex, or sync.Map). |
| campaign_views INSERTs | 1M INSERT operations. BIGSERIAL PK = append-only. No index updates except on campaign_id and subscriber_id FK indexes. DATE expression index updated per row. | High write pressure. PostgreSQL can do ~10-50k INSERTs/sec depending on hardware. 1M requests would need: batch inserts, async buffering, or write-ahead table with periodic flush. |
| Connection Pool | 25 connections shared across all goroutines. Each INSERT holds a connection for ~1ms. Effective throughput: ~25,000 INSERTs/sec. | Pool exhaustion at high concurrency. Goroutines block waiting for free connection. Solution: increase max_open, or buffer writes in a Go channel and batch-insert. |
| Response | 68-byte PNG from memory (hardcoded). No disk I/O. No template rendering. Fastest possible response after DB write. | Not a bottleneck. |
Flow 2: Link Click Tracking (High Volume — 100K+ per campaign)
Every link in a campaign email is wrapped. When a subscriber clicks, they hit listmonk first, which records the click then redirects to the actual URL.
1. BROWSER ─── GET /link/{linkUUID}/{campaignUUID}/{subscriberUUID} ───▶ ECHO ROUTER
   │  // Public endpoint. No auth. Three UUIDs in the path.
   ▼
2. HANDLER ─── handleLinkRedirect()
   │
   ├── Resolve link UUID → link record (get the actual URL)
   │     // links table: url TEXT NOT NULL UNIQUE. UUID indexed.
   │
   ├── Resolve campaign UUID → campaign_id
   │
   ├── Resolve subscriber UUID → subscriber_id (if individual tracking is on)
   │
   ├── INSERT INTO link_clicks (campaign_id, link_id, subscriber_id)
   │     // BIGSERIAL PK. Indexes on campaign_id, link_id, subscriber_id, DATE.
   │     // 4 index updates per INSERT. Heavier than campaign_views.
   │
   └── 302 REDIRECT → actual URL
         // User lands on the destination page. The redirect is instant.
| At 1M Click Requests | What Happens | Optimization |
|---|---|---|
| 3 UUID Lookups | Link, campaign, and subscriber UUIDs resolved to integer IDs. Three indexed lookups per request. | Cache link UUID→(id, url) in-process. Links are immutable once created — perfect cache candidate with no invalidation needed. |
| link_clicks INSERT | Heavier than campaign_views: 4 index updates per row (campaign_id, link_id, subscriber_id, DATE expression index). | Batch inserts via buffered channel. COPY command for bulk. Consider unlogged table for click data if durability isn't critical. |
| Redirect Latency | User-facing! The subscriber waits for the 302. DB insert is on the critical path — if DB is slow, user sees delay before reaching destination. | Move INSERT off the critical path: write to in-memory buffer, return 302 immediately, flush to DB async. Accept ~1s data delay for instant redirect. |
| Table Growth | link_clicks grows unboundedly. 1M clicks/campaign × 100 campaigns = 100M rows. DATE expression index helps but table gets large. | Time-based partitioning (monthly). Drop partitions older than retention period. Or archive to cold storage. |
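The in-process cache suggested above, sketched with sync.Map (link, resolveLink, and loadFromDB are illustrative names, not listmonk's):

```go
package main

import (
	"fmt"
	"sync"
)

// link is the resolved record for a wrapped URL.
type link struct {
	id  int
	url string
}

var linkCache sync.Map // uuid string → link

// resolveLink memoizes UUID → (id, url). Links are immutable once
// created, so cached entries never need invalidation. loadFromDB is a
// stand-in for the real indexed SELECT.
func resolveLink(uuid string, loadFromDB func(string) link) link {
	if v, ok := linkCache.Load(uuid); ok {
		return v.(link) // cache hit: no DB round-trip
	}
	l := loadFromDB(uuid)
	linkCache.Store(uuid, l)
	return l
}

func main() {
	dbCalls := 0
	load := func(uuid string) link {
		dbCalls++
		return link{id: 7, url: "https://example.com/sale"}
	}
	first := resolveLink("9f2c-uuid", load)  // misses cache, hits the "DB"
	second := resolveLink("9f2c-uuid", load) // served from cache
	fmt.Println(first.url == second.url, dbCalls)
}
```

Under concurrent misses two goroutines may both load the same UUID once; that duplicate work is harmless here, which is why the simpler sync.Map suffices over singleflight.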
Flow 3: Campaign Send Pipeline (1M Outbound Emails)
This is not a request flow — it's a server-initiated push pipeline. But it's what people mean by "1M requests" in the context of listmonk.
MANAGER GOROUTINE (long-lived, polls DB every ~5s)
   │
   ├── SELECT campaigns WHERE status IN ('running','scheduled')
   │     // status column indexed. Cheap scan — usually 0-5 active campaigns.
   ▼
For each active campaign:

BATCH PRODUCER (one goroutine per campaign)
   │  // Loop until all subscribers are processed:
   │
   ├── SELECT subscribers WHERE id > last_subscriber_id
   │     AND id IN (subscriber_lists WHERE list_id IN campaign_lists)
   │     AND status = 'enabled'
   │     ORDER BY id LIMIT 1000   // ← keyset pagination via the PK index
   │
   │  // For each subscriber in the batch:
   ├── TEMPLATE RENDER ─── Go html/template.Execute()
   │     // Inject: subscriber.name, subscriber.attribs, campaign.subject
   │     // Generate: tracking pixel URL, wrapped link URLs
   │     // Sprig functions available. Compiled template cached (sync.Once).
   │     // CPU cost: ~0.1ms per render for simple templates.
   │
   ├── msgChan <- message   // Buffered channel. Blocks if full (backpressure).
   │
   └── UPDATE campaigns SET last_subscriber_id = ?, sent = ?   // Checkpoint after batch
   ║
   ║  channel (buffer = batch_size)
   ▼
WORKER POOL (N = app.concurrency, default 10 goroutines)
   │  // Each worker loops: for msg := range msgChan { ... }
   │
   ├── RATE LIMITER ─── rateLimiter.Wait()
   │     // Token bucket: app.message_rate tokens/sec
   │     // OR sliding window: app.message_sliding_window_rate per window_duration
   │     // Blocks the goroutine until a token is available. Natural throttle.
   │
   ├── MESSENGER.Push(msg) ─── SMTP connection from pool
   │     // Acquires a connection from the pool (max_conns per SMTP server)
   │     // SMTP: EHLO → MAIL FROM → RCPT TO → DATA → message body → QUIT
   │     // Connection reused for the next message (persistent TCP). Pipelining.
   │     // On failure: retry up to max_msg_retries times.
   │
   ├── atomic.AddInt64(&sent, 1)   // Lock-free counter update
   │
   └── ON ERROR: increment error counter. If errors > max_send_errors → PAUSE campaign
1M Emails — Time & Resource Breakdown
| Phase | Operations | Time @ Defaults | Resource |
|---|---|---|---|
| DB Fetch | 1M / 1000 batch_size = 1000 queries | ~1ms/query × 1000 = ~1 second total | 1 DB connection (sequential per campaign) |
| Template Render | 1M renders × ~0.1ms each | ~100 seconds (single-threaded per campaign) | CPU-bound. ~1 core for simple templates. |
| Rate Limit Wait | 1M msgs ÷ 10 msg/sec | ~27.7 hours at default message_rate=10 | Near-zero (goroutine sleep) |
| SMTP Send | 1M connections (reused) × ~50ms avg | 10 workers × 50ms = ~200 msg/sec → ~83 minutes | 10 SMTP TCP connections |
| Progress Updates | 1000 checkpoint UPDATEs | ~1ms each = ~1 second total | 1 DB connection |
| Memory | 10 goroutines + 1000-msg channel buffer + template cache | — | ~50-60 MB peak (proven at 7M scale) |
With a tuned configuration (concurrency=50, message_rate=500, batch_size=5000, max_conns=20 per SMTP × 3 SMTP servers = 60 total connections), estimated throughput is ~500 msgs/sec: 1M emails in ~33 minutes. Memory stays under 100 MB. CPU usage: 2-3 cores.

Flow 4: Public Subscription Form Submit
When a user subscribes via a public form on your website. Lower volume but has more steps.
1. BROWSER ─── POST /subscription/form ───▶ ECHO ROUTER
   │
   ├── CAPTCHA VERIFY ─── ALTCHA proof-of-work check OR hCaptcha API call
   │     // ALTCHA: local CPU verification, no external API call (~1ms)
   │     // hCaptcha: external HTTP call to verify the token (~100-300ms)
   │
   ├── DOMAIN FILTER ─── check email domain against blocklist/allowlist
   │     // privacy.domain_blocklist: ["*.disposable.com", "tempmail.org"]
   │     // In-memory check. O(N) against the list, but lists are tiny.
   │
   ├── UPSERT SUBSCRIBER ─── INSERT ... ON CONFLICT (email) DO UPDATE
   │     // Case-insensitive: idx_subs_email ON LOWER(email)
   │     // If exists: update name/attribs. If new: create with a UUID.
   │
   ├── INSERT subscriber_lists ─── subscribe to the requested lists
   │     // status = 'unconfirmed' for double opt-in lists
   │     // status = 'confirmed' for single opt-in lists
   │
   ├── IF double opt-in: SEND CONFIRMATION EMAIL
   │     // Render the opt-in template with a confirmation link
   │     // Push to SMTP via the messenger (same pool as campaigns)
   │     // Confirmation link: /subscription/optin/{subscriberUUID}/{listUUID}
   │
   └── 200 OK ─── render the success template page
Flow 5: REST API Request (Admin/Programmatic)
API-triggered operations: creating subscribers, managing lists, triggering transactional emails. Authenticated path.
1. CLIENT ─── GET /api/subscribers?page=1&per_page=50 ───▶ ECHO ROUTER
   │  // Authorization header: "username:api-token" (base64)
   ▼
2. MIDDLEWARE CHAIN
   │
   ├── Auth middleware ─── validate the API token
   │     // Look up the user by username in the users table (indexed)
   │     // Verify the token (bcrypt compare, or direct match for API users)
   │     // Load the user's role + permissions from the roles table
   │     // Set the auth context on echo.Context
   │
   ├── Permission check ─── does the user have the "subscribers:get_all" permission?
   │     // Check the role's permissions array. If list-scoped, filter by allowed lists.
   │
   └── CORS middleware ─── check the Origin header against security.cors_origins
   ▼
3. HANDLER ─── GetSubscribers()
   │
   ├── Parse query params ─── page, per_page, query, list_id, order_by
   │
   ├── Build SQL ─── dynamic WHERE clause from search/filter params
   │     // Plain-text "query": ILIKE search on email and name
   │     // SQL-expression "query": parsed as a raw WHERE clause
   │     // Permission-filtered: only subscribers in the user's allowed lists
   │
   ├── EXECUTE SQL ─── via prepared statement from goyesql
   │     // sqlx.Select() for struct scanning
   │     // Separate COUNT(*) query for the total
   │     // OFFSET/LIMIT pagination for the API (not keyset — acceptable for admin UI)
   │
   └── 200 OK ─── JSON response with data[] + total + per_page + page
| At 1M API Requests | Bottleneck Analysis |
|---|---|
| Auth Overhead | Every API request hits the DB for user/role lookup. At 1M req: 1M user queries + 1M role queries. Fix: cache authenticated sessions in-process with TTL. API tokens are static — perfect for caching. |
| OFFSET Pagination | API uses OFFSET/LIMIT (not keyset). page=10000&per_page=50 means scanning 500K rows. Degrades linearly. Acceptable for admin UI (low page numbers) but problematic for bulk API consumers. |
| No API Rate Limiting | No per-consumer rate limiting on the API. A misbehaving client can exhaust the DB connection pool. Fix: per-token rate limiter middleware (token bucket or sliding window). |
| Connection Pool Contention | API, campaign engine, tracking pixels, and bounce processor all share the same 25-connection pool. Under 1M API requests, campaign sending would slow down. Fix: separate pools or connection prioritization. |
Flow 6: Bounce Webhook (SES/SendGrid/Postmark)
After a campaign, bounce notifications flow back from email providers. Volume scales with send volume — expect 2-5% bounce rate.
1. SES/SENDGRID ─── POST /webhooks/bounce/{type} ───▶ ECHO ROUTER
   │
   ├── AUTHENTICATE WEBHOOK ─── verify provider-specific auth
   │     // SES: verify the SNS signature. SendGrid: verify key. Postmark: basic auth.
   │
   ├── PARSE BOUNCE PAYLOAD ─── extract: email, bounce type, campaign ID
   │     // Normalize across providers into the internal bounce_type ENUM
   │
   ├── INSERT INTO bounces ─── (subscriber_id, campaign_id, type, source, meta)
   │     // Subscriber looked up by email. campaign_id is nullable (may be unknown).
   │
   ├── CHECK THRESHOLD ─── count this subscriber's bounces by type
   │     // SELECT COUNT(*) FROM bounces WHERE subscriber_id = ? AND type = ?
   │     // Compare against configured actions: hard.count=1, soft.count=2
   │
   └── IF threshold exceeded: UPDATE subscribers SET status = 'blocklisted'
         // Or DELETE the subscriber if action = 'delete'
         // Cascading: subscriber_lists entries are cleaned up too (FK CASCADE)
Flow 7: Transactional Email API
Single message sends triggered by your application (welcome emails, password resets, order confirmations). Different from campaign sends — synchronous, one-at-a-time.
1. YOUR APP ─── POST /api/tx ───▶ ECHO ROUTER
   │  // Body: { "subscriber_email": "...", "template_id": 5, "data": {...} }
   │  // Auth: API token (required)
   ▼
2. HANDLER
   │
   ├── Resolve subscriber ─── look up by email or ID
   │
   ├── Load TX template ─── from the in-memory template cache
   │
   ├── Render template ─── with subscriber data + the custom data payload
   │
   ├── messenger.Push(msg) ─── synchronous SMTP send
   │     // Blocks until SMTP ACK or error. The caller waits.
   │     // No rate limiting — each TX call is one immediate send.
   │
   └── 200 OK ─── { "data": true }
| At 1M TX Requests | What Breaks |
|---|---|
| Synchronous SMTP | Each TX call blocks until SMTP responds (~50-200ms). With 25 DB connections, max concurrent TX sends = ~25. At 200ms each = ~125 TX/sec. 1M would take ~2.2 hours. Fix: async queue with delivery confirmation callback. |
| No TX Rate Limiting | TX calls bypass the campaign rate limiter. A burst of 10K TX calls would exhaust SMTP connections. Fix: separate TX rate limiter or shared token bucket. |
| Shared SMTP Pool | TX emails share the same SMTP connection pool as campaigns. Heavy TX load can starve campaign sending. Fix: dedicated SMTP server for TX (use named messenger). |
Request Flow Summary — All 7 Flows at 1M Scale
| Flow | Type | Volume Profile | Hot Path Cost | Primary Bottleneck | DB Queries/Req |
|---|---|---|---|---|---|
| Tracking Pixel | Inbound GET | 1M+ per large campaign | 2 UUID lookups + 1 INSERT | DB write throughput | 3 |
| Link Click | Inbound GET → 302 | 100K-500K per campaign | 3 UUID lookups + 1 INSERT | DB write + redirect latency | 4 |
| Campaign Send | Outbound push | 1M emails (async) | Batch fetch + render + SMTP | Rate limiter (intentional) | ~1 per 1000 |
| Subscription | Inbound POST | 100-10K/day | CAPTCHA + upsert + optin email | CAPTCHA verification | 2-4 |
| API CRUD | Inbound REST | Depends on integration | Auth + query + JSON marshal | Auth DB lookup (cacheable) | 3-5 |
| Bounce Webhook | Inbound POST | 2-5% of send volume | Auth + INSERT + threshold check | Negligible at normal rates | 3-4 |
| TX Email | Inbound POST → SMTP | Depends on app | Auth + render + sync SMTP | Synchronous SMTP blocking | 2-3 |
Will Goroutines Run Into Memory Errors?
Short answer: the campaign engine is safe, but the HTTP server is not protected. listmonk has two completely different goroutine models running simultaneously, and they have very different risk profiles.
Go Goroutine Memory Model — The Basics
| Property | Value | Why It Matters |
|---|---|---|
| Initial Stack Size | 2 KB (Go 1.4+) | A goroutine starts tiny. 1000 idle goroutines = ~2 MB. Cheap to create. |
| Stack Growth | Dynamically grows up to 1 GB (default) | Stack doubles when needed (copy-on-grow). A goroutine doing real work (allocating buffers, rendering templates, building HTTP responses) can grow to 8-64 KB each. |
| Heap Allocations | Varies per goroutine workload | Template rendering, JSON marshaling, SQL result scanning all allocate on the heap. These are the real memory consumers — not the goroutine stack itself. |
| GC Pressure | Go GC runs concurrently | High allocation rate from 100K+ goroutines triggers frequent GC cycles. GC latency spikes (STW pauses ~1-5ms) can compound under load. |
| OS Threads | GOMAXPROCS (default = num CPUs) | Goroutines are multiplexed onto OS threads. 1M goroutines still only uses ~8-16 OS threads. This is NOT the bottleneck. |
The Two Goroutine Models in listmonk
Model 1: The Campaign Engine (Fixed Worker Pool)

A fixed pool of worker goroutines (count = app.concurrency), bounded by the channel buffer (batch_size) plus the fixed worker count. Even when sending 7M emails, only 10-50 goroutines are alive.

Memory per campaign: ~10 workers × ~64KB stack + a channel buffer of 1000 messages × ~2KB each = ~2.6 MB total.

Why it's safe: the producer blocks on channel send when the buffer is full (backpressure), and workers block on the rate limiter. The goroutine count never exceeds concurrency. You cannot OOM from campaign sending.

The pattern:
// Fixed pool — goroutine count = concurrency (constant)
for i := 0; i < concurrency; i++ {
go worker(msgChan) // Exactly N goroutines. No more.
}
// Producer blocks if channel full — natural backpressure
msgChan <- msg // Blocks here, NOT by spawning new goroutines
Model 2: The HTTP Server (Goroutine per Connection)

Go's net/http server spawns one goroutine per incoming connection; Echo sits on top of this with no built-in limit. Bounded by: nothing in listmonk's code. If 100K tracking pixel requests arrive simultaneously, Go creates 100K goroutines.
Memory per request: ~8-64KB stack + ~2-10KB heap (UUID parsing, DB result scanning, response writing) = ~50-70KB per concurrent request.
At 100K concurrent: 100K × 70KB = ~7 GB. At 500K concurrent: ~35 GB. At 1M concurrent: ~70 GB → OOM on most machines.
The pattern:
// Go's net/http — UNBOUNDED goroutine creation
func (srv *Server) Serve(l net.Listener) error {
for {
conn, _ := l.Accept()
go srv.serve(conn) // New goroutine for EVERY connection!
} // No limit. No backpressure.
}
When Does OOM Actually Happen?
The key distinction is concurrent vs total requests. 1M requests over an hour is fine. 1M requests in 1 second is a problem.
| Scenario | Concurrent Goroutines | Memory | OOM Risk |
|---|---|---|---|
| Campaign send: 1M emails | 10-50 (fixed pool) | ~3-10 MB | None — bounded by design |
| Tracking pixels: 1M over 24 hours (~12 req/sec avg) | ~12-50 | ~1-3 MB | None — requests complete fast (~5ms each) |
| Tracking pixels: 1M in 1 hour (~278 req/sec avg) | ~278-1,000 | ~20-70 MB | Low — manageable if DB keeps up |
| Tracking pixels: 1M in 1 minute (~16,667 req/sec) | ~5,000-50,000 | ~350 MB - 3.5 GB | Medium — DB becomes the bottleneck; goroutines pile up waiting for connections |
| Tracking pixels: 100K concurrent (spike / DDoS / viral email) | 100,000 | ~7 GB | High — goroutines block on the 25-conn DB pool and stack up in memory |
| Tracking pixels: 1M concurrent (extreme / unrealistic) | 1,000,000 | ~70 GB | OOM crash — Go allocates until killed by the OS |
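The memory columns above are straightforward multiplication. A quick sketch of the arithmetic (the ~70KB-per-request cost is the estimate derived earlier, not a measured constant):

```go
package main

import "fmt"

// estMemGB estimates resident memory for n concurrent request
// goroutines, given a per-request cost in KB (stack + heap).
func estMemGB(n int, perReqKB float64) float64 {
	return float64(n) * perReqKB / (1024 * 1024)
}

func main() {
	for _, n := range []int{100_000, 500_000, 1_000_000} {
		fmt.Printf("%8d concurrent ≈ %.1f GB\n", n, estMemGB(n, 70))
	}
}
```

At 70KB per request this gives roughly 6.7 GB, 33 GB, and 67 GB — the same order of magnitude as the table's rounded figures.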
The Real Danger: Goroutine Pile-Up on DB Pool
The OOM risk isn't from goroutine creation — it's from goroutine accumulation. Here's the cascade failure:
CASCADE FAILURE SCENARIO — 50K req/sec tracking pixel burst

```
t=0s      50,000 requests arrive. Go spawns 50,000 goroutines. (~3.5 GB)
t=0.001s  All 50K goroutines try to acquire a DB connection from the pool
          (max_open=25). 25 get connections. 49,975 BLOCK waiting.
t=0.005s  Each DB query takes ~5ms. First 25 connections freed; next 25
          goroutines proceed. But another 50K requests have arrived.
          Now 99,950 goroutines blocked. (~7 GB)
t=1.0s    At 50K/sec inflow, the 25-conn pool processes ~5,000 req/sec
          (25 × 200 queries/sec). Deficit: 45,000 goroutines/sec
          accumulating. After 1 second: ~45K blocked.
t=10s     ~450K goroutines blocked. Memory: ~30 GB. GC thrashing.
          Latency spikes.
t=20s     ~900K goroutines. OOM killer triggers. Process killed.
          Campaign sending (which shares the same process) dies too.
```
What listmonk Does NOT Have (Protection Gaps)
| Missing Protection | Consequence | What You'd Add |
|---|---|---|
| No HTTP connection limit | Unbounded goroutine creation on burst traffic | Echo middleware: middleware.RateLimiter() or a custom semaphore. Reject with 429 when concurrency exceeds a threshold. Or wrap the listener with netutil.LimitListener (golang.org/x/net/netutil). |
| No request queue / shed | All requests accepted even when DB pool is full — they just block | Load shedding: if DB pool queue > N, return 503 immediately. Fail fast instead of accumulate. context.WithTimeout on DB calls. |
| No goroutine budget | No visibility into goroutine count. No alarm threshold. | Expose runtime.NumGoroutine() as Prometheus metric. Alert when > 10K. Circuit break at 50K. |
| No request timeout | If DB is slow, goroutines hang indefinitely holding memory | Echo's middleware.TimeoutWithConfig with a 5 * time.Second budget. Or http.Server{ReadTimeout: 10 * time.Second, WriteTimeout: 10 * time.Second}. |
| No async write path | Every tracking pixel does a synchronous DB INSERT before responding | Write to in-memory ring buffer. Background goroutine batch-flushes to DB every 100ms. Decouple request handling from DB writes. |
How You'd Fix This — Production-Grade Architecture
PRODUCTION FIX: Bounded Concurrency + Async Writes

```go
// 1. Server-level connection limits
srv := &http.Server{
    Addr:           ":9000",
    ReadTimeout:    10 * time.Second, // Prevent slow-read attacks
    WriteTimeout:   15 * time.Second, // Prevent goroutine hang on slow clients
    IdleTimeout:    60 * time.Second, // Close idle keep-alive connections
    MaxHeaderBytes: 1 << 20,          // 1MB header limit
}

// 2. Concurrency limiter middleware
sem := make(chan struct{}, 10000) // Max 10K concurrent requests
e.Use(func(next echo.HandlerFunc) echo.HandlerFunc {
    return func(c echo.Context) error {
        select {
        case sem <- struct{}{}:
            defer func() { <-sem }()
            return next(c)
        default:
            return c.String(503, "server busy") // Load shedding
        }
    }
})

// 3. Async tracking writes (decouple from request path)
trackChan := make(chan TrackEvent, 100000) // Buffered channel

// Handler: write to channel, return immediately
func handlePixel(c echo.Context) error {
    select {
    case trackChan <- TrackEvent{campID, subID}: // Queued successfully
    default: // Channel full — drop event (acceptable for analytics)
    }
    return c.Blob(200, "image/png", pixel1x1) // Instant response
}

// Background flusher: batch inserts every 100ms
go func() {
    ticker := time.NewTicker(100 * time.Millisecond)
    batch := make([]TrackEvent, 0, 1000)
    for {
        select {
        case ev := <-trackChan:
            batch = append(batch, ev)
            if len(batch) >= 1000 {
                flushBatch(batch) // COPY INTO campaign_views
                batch = batch[:0]
            }
        case <-ticker.C:
            if len(batch) > 0 {
                flushBatch(batch)
                batch = batch[:0]
            }
        }
    }
}()

// 4. DB call timeouts
ctx, cancel := context.WithTimeout(c.Request().Context(), 3*time.Second)
defer cancel()
db.QueryContext(ctx, query, args...) // Cancels if DB is slow

// 5. Monitor goroutine count
go func() {
    for range time.Tick(5 * time.Second) {
        n := runtime.NumGoroutine()
        metrics.Gauge("goroutines", n) // Prometheus metric
        if n > 50000 {
            log.Error("goroutine count critical", "count", n)
        }
    }
}()
```
Interview Answer: "How Does Go Handle 1M Concurrent Requests?"
listmonk's campaign engine avoids this by using a fixed goroutine pool with channel backpressure — the producer-consumer pattern. But the HTTP server uses Go's default unbounded model with no concurrency limit, no request timeout, and no load shedding.
The production fix is three layers: (1) Server-level timeouts (ReadTimeout, WriteTimeout) prevent slow clients from holding goroutines. (2) Concurrency limiter middleware (semaphore channel) caps concurrent requests and returns 503 when saturated. (3) Async write path for hot paths (tracking pixels, link clicks) — decouple the DB write from the HTTP response using a buffered channel with a background batch flusher. This turns a 5ms synchronous DB call into a <50µs channel send.
Interview Framing: "The system handles 1M requests across different hot paths. The campaign send is I/O-bound on SMTP with intentional rate limiting. Tracking pixels and link clicks are the surprise high-throughput paths — they're write-heavy, user-facing, and scale linearly with subscriber count. The architectural insight is that these are append-only writes to analytics tables — perfect candidates for buffered batch inserts, table partitioning, and async processing. The Go HTTP server and goroutine model are never the bottleneck; PostgreSQL write throughput is."
Contention & Concurrency — Where Things Fight Over Shared Resources
Concurrency is about structure (multiple things can run). Contention is about conflict (multiple things fighting for the same resource). listmonk has both, and understanding the contention points is what separates a senior answer from a junior one in interviews.
Resource Contention Map — Every Shared Resource in listmonk
| Shared Resource | Who Competes | Contention Type | Protection Mechanism | Risk Level |
|---|---|---|---|---|
| PostgreSQL Connection Pool max_open = 25 | HTTP handlers, Campaign engine, Bounce processor, Importer, Cron jobs (matview refresh) | Mutex-like (pool internal lock) | Go's database/sql pool with internal mutex. Goroutines block on db.Conn() when pool exhausted. | HIGH — This is the #1 contention point. All subsystems share one pool. No priority, no isolation. |
| SMTP Connection Pool max_conns per server | Campaign workers, TX email handler, Optin confirmation sender, Notification sender | Channel-based semaphore | Buffered channel acts as connection pool. Workers block on channel receive when all connections in use. | MEDIUM — Campaign workers dominate. TX emails can starve during active campaigns. |
| Campaign Message Channel buffer = batch_size | Batch producer (1 goroutine) vs Worker pool (N goroutines) | Channel (CSP) | Go buffered channel. Producer blocks on send if full. Workers block on receive if empty. Lock-free. | LOW — By design. Channel provides natural flow control. No contention, only coordination. |
| Campaign Sent Counter shared int64 | All N worker goroutines (10-50 concurrent) | Atomic CAS | atomic.AddInt64(&sent, 1). Lock-free compare-and-swap at hardware level. No mutex. | NONE — Atomic operations have zero contention overhead. O(1) per operation regardless of concurrency. |
| Template Cache compiled templates in memory | Campaign workers (read) vs Admin updating template (write) | sync.Once / recompile | Templates compiled once (sync.Once). On admin update: recompile and swap pointer. Workers read stale until swap. No read-side lock. | NONE during normal operation. Momentary during recompile (new pointer swap is atomic). |
| Settings / Config app.constants struct | All handlers (read) vs Settings update (write → restart) | Process restart | Settings changes trigger SIGHUP → full process restart. No concurrent read/write possible — the entire process is replaced via syscall.Exec(). | NONE — listmonk avoids the problem entirely by restarting. No RWMutex needed. |
| SSE Event Bus events.Events | Campaign manager (publish) vs Browser clients (subscribe) | Channel per subscriber | Each SSE client gets its own channel. Publisher fans out to all subscriber channels. No shared state between clients. | LOW — Fan-out pattern. Publisher may slow if a client channel is full (slow consumer). |
| CSV Import Processor single goroutine | Import goroutine vs HTTP API (checking import status) | Mutex (likely) | Importer runs as a single goroutine. Status checked via API. Likely uses a mutex or atomic for progress state. | LOW — Single writer, occasional reader. Minimal contention. |
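The "channel-based semaphore" in the SMTP row is worth seeing concretely. A minimal sketch of a buffered channel acting as a connection pool (the `conn` type and pool helpers are hypothetical stand-ins, not listmonk's actual SMTP pool):

```go
package main

import "fmt"

// conn stands in for an SMTP connection.
type conn struct{ id int }

// connPool is a buffered channel used as a semaphore-style pool:
// receive to acquire, send to release. When the channel is empty,
// callers block — exactly how workers queue for SMTP connections.
type connPool chan *conn

func newConnPool(maxConns int) connPool {
	p := make(connPool, maxConns)
	for i := 0; i < maxConns; i++ {
		p <- &conn{id: i}
	}
	return p
}

func (p connPool) acquire() *conn  { return <-p } // blocks when exhausted
func (p connPool) release(c *conn) { p <- c }

func main() {
	pool := newConnPool(2)
	c1, c2 := pool.acquire(), pool.acquire()
	fmt.Println(len(pool)) // 0 — exhausted; the next acquire would block
	pool.release(c1)
	pool.release(c2)
	fmt.Println(len(pool)) // 2 — both connections back in the pool
}
```

No mutex appears anywhere: the channel's own synchronization is the lock, which is why the table calls this coordination rather than contention.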
DB Pool Contention — The Priority Inversion Problem
This is the most important contention point in listmonk and a great interview discussion topic:
PRIORITY INVERSION: All subsystems share 25 DB connections

```
┌──────────────────────────────────────────────────┐
│            sql.DB Pool (max_open=25)             │
│  ┌────┐ ┌────┐ ┌────┐ ┌────┐  ...  ┌────┐        │
│  │conn│ │conn│ │conn│ │conn│       │conn│  ×25   │
│  └──┬─┘ └──┬─┘ └──┬─┘ └──┬─┘       └──┬─┘        │
└─────┼──────┼──────┼──────┼────────────┼──────────┘
      ▼      ▼      ▼      ▼            ▼
  Campaign Campaign Pixel  API       Bounce
   Batch1   Batch2  Track  List      Check
```

Problem scenarios:

1. Campaign sending of large batches holds connections for batch SELECTs (~5-50ms). Meanwhile, admin API requests queue behind campaign queries; the admin dashboard feels slow during active campaigns.
2. 1M tracking pixel INSERTs saturate the pool. Campaign batch fetches can't get a connection; campaign throughput drops and send time extends.
3. A materialized view REFRESH (cron job) takes 30-60 seconds and holds one connection for the entire refresh, leaving 24 for everything else.
4. A bulk subscriber import doing 10K upserts competes with campaign sending for connections. Both slow down.
| Fix Strategy | How | Trade-off |
|---|---|---|
| Separate Connection Pools | Create 3 sql.DB instances: one for campaign engine (10 conns), one for HTTP handlers (10 conns), one for background jobs (5 conns). Each with independent max_open. | More total connections to Postgres. Needs max_connections increase on DB side. Slightly more memory. |
| Connection Priority | Custom pool wrapper that reserves N connections for high-priority callers (campaign engine). HTTP requests use remaining. Implement with two semaphores. | Complex. Can cause HTTP starvation if campaign is too aggressive. |
| Read Replica Split | Route all SELECT queries (subscriber lookups, dashboard, API reads) to a read replica. Writes (INSERTs, UPDATEs) go to primary. | Replication lag (milliseconds). Needs application-level routing. listmonk doesn't support this natively. |
| Context Timeouts | context.WithTimeout(ctx, 3*time.Second) on all DB calls. If a connection isn't available within 3s, fail fast with 503 instead of blocking. | Requests fail under load instead of queuing. Better for latency SLOs. Some data operations may need longer timeouts. |
| PgBouncer | External connection pooler between listmonk and Postgres. Transaction-mode pooling lets the application open many lightweight connections multiplexed onto a smaller set of real Postgres connections. | Additional infrastructure. Transaction-mode pooling breaks session-level features such as prepared statements. Adds ~0.1ms latency. |
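The "Connection Priority" row proposes two semaphores: one shared pool plus a reserve only high-priority callers may touch. A sketch of that wrapper under stated assumptions (the `priorityPool` type is hypothetical — listmonk has no such mechanism):

```go
package main

import "fmt"

// priorityPool reserves `reserved` slots for high-priority callers
// (e.g. the campaign engine); low-priority callers (HTTP handlers)
// may only use the shared portion.
type priorityPool struct {
	shared   chan struct{} // usable by everyone
	reserved chan struct{} // high-priority only
}

func newPriorityPool(total, reserved int) *priorityPool {
	p := &priorityPool{
		shared:   make(chan struct{}, total-reserved),
		reserved: make(chan struct{}, reserved),
	}
	for i := 0; i < total-reserved; i++ {
		p.shared <- struct{}{}
	}
	for i := 0; i < reserved; i++ {
		p.reserved <- struct{}{}
	}
	return p
}

// tryAcquire fails fast (caller returns 503) instead of blocking.
func (p *priorityPool) tryAcquire(highPriority bool) bool {
	select {
	case <-p.shared:
		return true
	default:
	}
	if highPriority { // fall back to the reserved slots
		select {
		case <-p.reserved:
			return true
		default:
		}
	}
	return false
}

func main() {
	p := newPriorityPool(25, 5) // 20 shared + 5 reserved
	for i := 0; i < 20; i++ {
		p.tryAcquire(false) // HTTP handlers drain the shared slots
	}
	fmt.Println(p.tryAcquire(false)) // false — handlers get shed
	fmt.Println(p.tryAcquire(true))  // true — campaign engine still runs
}
```

This combines two fixes from the table: priority reservation and fail-fast load shedding instead of queuing.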
Row-Level Contention in PostgreSQL
| Contention Point | Scenario | What PostgreSQL Does | Impact |
|---|---|---|---|
| campaigns row UPDATE | Campaign manager updates last_subscriber_id and sent after each batch. Admin simultaneously views campaign status. | Row-level lock (MVCC). Writer acquires RowExclusiveLock. Reader sees old snapshot (no block). Writers don't block readers. | None — MVCC handles this perfectly. Reads see consistent snapshot. |
| subscribers UPSERT | CSV import upserting 10K subscribers while public subscription form creates new subscribers. Both touch idx_subs_email unique index. | Each INSERT/UPDATE acquires row lock. Concurrent upserts on different emails: no conflict. Same email: one waits for other's transaction to commit. | Low — conflicts only on same email. Import batches in transactions, so a stuck import blocks other writes to same subscribers. |
| subscriber_lists INSERT | Two campaigns targeting overlapping lists. Both reading subscriber_lists to find recipients. Campaign manager only reads; doesn't write to this table during send. | No contention — campaign send only SELECTs from subscriber_lists. Subscription changes (add/remove) acquire row locks on specific (subscriber_id, list_id) pairs. | None — read-only during campaign processing. |
| campaign_views / link_clicks INSERT | Thousands of concurrent tracking pixel and link click INSERTs. All writing to the same tables. | Append-only tables with BIGSERIAL PK. Each INSERT acquires a nextval() on the sequence (lightweight lock) + index locks. No row-level conflicts. | Sequence lock is a bottleneck at very high insert rates (~50K+/sec). Fix: CACHE 100 on sequence to reduce lock acquisitions. Or batch inserts. |
| settings UPDATE | Admin saves settings while campaign is reading config. | Settings are read at startup and cached in-memory (app.constants). DB write doesn't affect running config. Full process restart needed to pick up changes. | None — decoupled by design. Config is immutable during process lifetime. |
| REFRESH MATERIALIZED VIEW CONCURRENTLY | Cron job refreshes dashboard stats while admin views dashboard. | CONCURRENTLY keyword allows reads during refresh. Creates new version of matview, swaps atomically. Requires UNIQUE index on matview. | None for readers. The refresh itself holds an ExclusiveLock on the matview — two concurrent refreshes would block. |
| bounces threshold CHECK | Multiple bounce webhooks for same subscriber arrive simultaneously. Each does SELECT COUNT(*) FROM bounces WHERE subscriber_id = ? then potentially UPDATE subscribers SET status = 'blocklisted'. | TOCTOU race condition. Two webhooks both count 0 bounces, both insert, both check threshold — subscriber may get N+1 bounces before blocklist triggers. | Minor — subscriber gets one extra email before blocklist. Not dangerous. Fix: SELECT ... FOR UPDATE on subscriber row during bounce processing. |
Go-Level Concurrency Primitives Used
| Primitive | Where Used | Why This Choice | Contention Characteristics |
|---|---|---|---|
| Buffered Channel | Campaign message pipeline SMTP connection pool SSE event fan-out | CSP model. Decouples producer from consumer. Natural backpressure. No explicit locking needed. | Zero contention when buffer isn't full/empty. Contention only at boundaries: producer blocks when full (backpressure), consumer blocks when empty (idle). |
| atomic.AddInt64 | Campaign sent counter Campaign error counter | Lock-free counter. Hardware CAS instruction. No goroutine blocking, ever. | Near-zero. CAS retry on contention (extremely rare, nanoseconds). Outperforms mutex by 10-100x for simple counters. |
| sync.Once | Template compilation One-time initialization | Thread-safe lazy init. First caller executes, all others wait then return cached result. | First call: brief mutex hold during init. All subsequent calls: atomic read (zero contention). Perfect for "compute once, read forever" patterns. |
| database/sql Pool | All PostgreSQL access | Built-in connection pooling. Thread-safe. Handles connection lifecycle. | Internal mutex on connRequests map. Under high concurrency, goroutines queue in FIFO order. This is the primary contention point in the entire system. |
| No explicit Mutex | — | listmonk avoids sync.Mutex and sync.RWMutex in hot paths. Prefers channels, atomics, and immutable data (restart on config change). | Architectural choice: channels for coordination, atomics for counters, process restart for config. Eliminates most mutex contention by design. |
Concurrent Campaign Execution — Overlapping Lists
What happens when two campaigns target overlapping subscriber lists simultaneously?
SCENARIO: Campaign A targets List 1 (500K subs), Campaign B targets List 2 (500K subs). 200K subscribers are on BOTH lists.

```
Campaign A goroutines:      Campaign B goroutines:
  1 batch producer            1 batch producer
  10 send workers             10 send workers
  ──────────────              ──────────────
  11 goroutines               11 goroutines      // Total: 22 goroutines
```

What happens to the 200K overlapping subscribers?

✓ Both campaigns independently fetch and send to them.
✓ The subscriber receives BOTH emails (intentional — different campaigns).
✓ No deduplication across campaigns (by design).
✓ No row locks conflict — campaigns only SELECT from subscriber_lists.
✓ Each campaign has an independent last_subscriber_id cursor.
✓ Each campaign has an independent sent counter (atomic).

Contention points:

1. DB pool: 22 goroutines competing for 25 connections. Batch fetches are 2 long-running SELECTs; workers doing progress UPDATEs cause occasional contention. Mitigation: workers mostly wait on SMTP (I/O bound), not DB.
2. SMTP pool: 20 workers share max_conns connections. If max_conns=10, workers queue for connections. Mitigation: each campaign can use different named SMTP servers.
3. Rate limiter: the GLOBAL rate limit (message_rate) is shared across campaigns. Two campaigns each wanting 100 msg/sec with rate=100 get ~50 each — throughput halves with each concurrent campaign.
4. No campaign-level resource isolation: a slow campaign (complex template) slows all campaigns by holding DB connections longer.
Race Conditions — Known & Potential
| Race Condition | Severity | Description | Fix |
|---|---|---|---|
| Bounce TOCTOU | Low | Two bounce webhooks arrive for same subscriber simultaneously. Both read count=0, both insert, both check threshold — neither triggers blocklist because each sees count=1 when threshold=2. Next bounce will trigger it. | SELECT ... FOR UPDATE on subscriber row. Or INSERT + SELECT COUNT in a single transaction with SERIALIZABLE isolation. |
| Campaign Status Transition | Low | Admin clicks "Pause" while manager is updating sent count. PostgreSQL row-level MVCC prevents corruption — the UPDATE acquires a row lock. But the pause might not take effect until the current batch completes. | Current behavior is acceptable. Campaign checks for pause signal between batches. Near-instant in v3.0.0 rewrite. |
| Duplicate Subscription | None | Two form submissions for same email at the same time. INSERT ... ON CONFLICT (email) DO UPDATE handles this atomically — PostgreSQL serializes at the unique index level. | Already handled by DB unique constraint + upsert. No application-level fix needed. |
| Matview Concurrent Refresh | Low | Two cron triggers fire simultaneously (unlikely but possible). REFRESH MATERIALIZED VIEW CONCURRENTLY acquires ExclusiveLock — second call blocks until first completes. | Not harmful — just wastes time. Could add application-level lock (pg_advisory_lock) to skip if already running. |
| Template Hot-Swap | None | Admin updates template while campaign is mid-send using that template. Campaign workers hold reference to compiled template in memory. Template recompile creates new object; old one is GC'd after campaign finishes. | Safe — Go's GC keeps old template alive as long as goroutines reference it. New campaigns get the updated template. In-flight campaign uses the old version. |
| Subscriber Delete During Send | None | Admin deletes subscriber while campaign is sending to them. Campaign already fetched the batch — subscriber data is in memory. DB INSERT for tracking: subscriber_id FK SET NULL handles gracefully. | Already handled by schema design. ON DELETE SET NULL on campaign_views and link_clicks preserves analytics data. |
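The bounce TOCTOU fix deserves a concrete illustration. This is an in-memory analogue, not listmonk's code: the mutex plays the role that `SELECT ... FOR UPDATE` plays in PostgreSQL, serializing the count/check/blocklist sequence so two concurrent webhooks cannot both act on a stale count:

```go
package main

import (
	"fmt"
	"sync"
)

// subscriber models the bounce-threshold check. The mutex stands in
// for a SELECT ... FOR UPDATE row lock: the second bounce webhook
// waits until the first has committed its count and threshold check,
// eliminating the time-of-check/time-of-use race.
type subscriber struct {
	mu          sync.Mutex
	bounces     int
	blocklisted bool
}

func (s *subscriber) recordBounce(threshold int) {
	s.mu.Lock() // "FOR UPDATE" — the second webhook blocks here
	defer s.mu.Unlock()
	s.bounces++
	if s.bounces >= threshold && !s.blocklisted {
		s.blocklisted = true
	}
}

func main() {
	s := &subscriber{}
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ { // two simultaneous bounce webhooks
		wg.Add(1)
		go func() { defer wg.Done(); s.recordBounce(2) }()
	}
	wg.Wait()
	fmt.Println(s.bounces, s.blocklisted) // 2 true — threshold never missed
}
```

Without the lock, both goroutines could read `bounces=1`, and the threshold of 2 would be skipped — exactly the "N+1 bounces before blocklist" behavior in the table.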
Interview Answer: "How Do You Handle Contention in a Concurrent System?"
1. Application-level: listmonk uses channels (not mutexes) for coordination — the CSP model. The campaign engine is a bounded worker pool where backpressure is built into the channel. Counters use lock-free atomics. Config is immutable — changes trigger a full process restart, completely sidestepping read-write contention. This is an architectural decision to prefer simplicity over fine-grained locking.
2. Connection pool: The single shared DB pool (25 connections) is the primary contention point. All subsystems — HTTP handlers, campaign engine, bounce processor, cron jobs — compete for the same connections. Under high load, goroutines queue behind the pool's internal mutex. The fix is pool isolation: separate pools for campaign engine vs HTTP handlers, or a priority queue that reserves connections for critical paths.
3. Database-level: PostgreSQL MVCC eliminates most row-level contention — readers never block writers and writers never block readers. The real contention is on sequences (BIGSERIAL PKs on high-insert tables like link_clicks) and unique-index locks during concurrent upserts. Mitigate with sequence caching (CACHE 100) and batch inserts.

4. Cross-campaign: The global rate limiter is shared across all campaigns — two concurrent campaigns each get half the throughput. There's no per-campaign resource isolation. At scale, you'd partition resources per campaign: dedicated worker pools, separate rate limiters, and named SMTP servers for high-priority campaigns.
Resilience & High Availability
Failure Modes & Recovery
| Failure | Impact | Recovery Mechanism |
|---|---|---|
| App Crash Mid-Campaign | Campaign paused at last batch | last_subscriber_id persisted per-batch. Auto-resumes on restart. |
| SMTP Server Down | Messages fail for that server | Per-message retry. Auto-pause at error threshold. Multiple SMTP fallback. |
| DB Connection Lost | All operations fail | sqlx auto-reconnect via pool. max_open/max_idle/max_lifetime. |
| Bounce Flood | Sender reputation at risk | Auto-blocklist after N hard bounces. Configurable per bounce type. |
| Config Change Needed | Requires restart | SIGHUP hot restart: graceful shutdown → process self-replace. No downtime. |
| Campaign Stuck | Never finishes | --passive flag for read-only. Admin force status change via API. |
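The first row — auto-resume via a persisted `last_subscriber_id` — is keyset pagination used as a crash-recovery cursor. An in-memory analogue (the `fetchBatch` helper mimics the `WHERE id > $lastID ORDER BY id LIMIT $batch` query; names are illustrative):

```go
package main

import "fmt"

// fetchBatch mimics the keyset query:
//   SELECT ... WHERE id > $lastID ORDER BY id LIMIT $batch
func fetchBatch(ids []int, lastID, batch int) []int {
	out := []int{}
	for _, id := range ids {
		if id > lastID {
			out = append(out, id)
			if len(out) == batch {
				break
			}
		}
	}
	return out
}

func main() {
	subs := []int{1, 2, 3, 4, 5, 6, 7}
	lastID := 0 // persisted per batch, like campaigns.last_subscriber_id

	// First run: process two batches, then "crash".
	for i := 0; i < 2; i++ {
		b := fetchBatch(subs, lastID, 2)
		lastID = b[len(b)-1] // persist the cursor after each batch
	}

	// Restart: resume exactly where the cursor left off — no resends.
	fmt.Println(fetchBatch(subs, lastID, 2)) // [5 6]
}
```

Because the cursor is written to the database after each batch, a crash can lose at most one batch of progress, and resuming never re-sends to subscribers before the cursor.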
What listmonk Does NOT Have (HA Gaps)
Data Integrity Guarantees
PostgreSQL provides ACID transactions for all subscriber/list/campaign mutations. FK constraints with CASCADE ensure referential integrity. ENUM types enforce valid state transitions. The sent counter in v3.0.0 is exact (not approximated), ensuring no duplicate or missed sends on pause/resume.
Idempotent Upgrades
./listmonk --upgrade is idempotent. Running it multiple times has no side effects. Migrations use version checks and are applied sequentially. Critical for automated deployment pipelines (Kubernetes rollouts, CI/CD).
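The version-check idea behind idempotent migrations can be sketched in a few lines. This is a simplified model, not listmonk's implementation (which parses semver and stores the current version in the settings table); the string comparison here is deliberately naive:

```go
package main

import "fmt"

// migration applies one schema delta; version gates make re-runs no-ops.
type migration struct {
	version string
	apply   func()
}

// upgrade applies only migrations newer than the recorded version,
// then records the last one applied — running it twice changes nothing.
func upgrade(current string, migs []migration) string {
	for _, m := range migs {
		if m.version > current { // naive ordering; real code parses semver
			m.apply()
			current = m.version
		}
	}
	return current
}

func main() {
	applied := []string{}
	migs := []migration{
		{"v5.0.0", func() { applied = append(applied, "v5.0.0") }},
		{"v5.1.0", func() { applied = append(applied, "v5.1.0") }},
	}
	v := upgrade("v5.0.0", migs) // only v5.1.0 runs
	v = upgrade(v, migs)         // second run: nothing to do
	fmt.Println(v, applied)      // v5.1.0 [v5.1.0]
}
```

The second invocation is a no-op because every migration's version is at or below the recorded one — the property that makes `--upgrade` safe in CI/CD loops.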
Design Principles & Patterns
GoF & Architectural Patterns Catalog
| Pattern | Category | Where in listmonk |
|---|---|---|
| Strategy | Behavioral | Messenger interface, Media provider interface |
| Observer | Behavioral | SSE events bus for real-time UI updates |
| Template Method | Behavioral | Go html/template with Sprig function injection |
| Producer-Consumer | Concurrency | Campaign batch fetch → channel → worker goroutines |
| Connection Pool | Creational | sqlx DB pool, SMTP connection pool per server |
| Repository | Structural | Core layer wraps all DB access behind domain methods |
| Facade | Structural | App struct provides single entry point to all subsystems |
| Middleware Chain | Structural | Echo middleware: auth → CORS → logging → handler |
| Materialized View | Data | Pre-aggregated dashboard stats, refreshed on cron |
| Cursor Pagination | Data | Keyset pagination via last_subscriber_id bookmark |
Non-Functional Requirements — How listmonk Handles Them
NFRs are the make-or-break qualities that interviewers probe after your functional design. listmonk is an excellent case study because it's a production system handling millions of messages — every NFR decision below was battle-tested, not theoretical.
Security
| NFR Concern | Implementation | Code Reference |
|---|---|---|
| Authentication | Three modes: password login (bcrypt-hashed), OIDC/SSO (Google, Microsoft, Apple), API tokens (username:token header). Sessions stored in PostgreSQL via simplesessions. | internal/auth/auth.go, cmd/auth.go |
| Authorization (RBAC) | Role-based access control with two role types: user roles (global permissions) and list roles (per-list permissions). Permissions defined in permissions.json. Each API endpoint checks permissions via middleware. Users can have different access levels per list. | internal/auth/, roles table, permissions.json |
| CSRF Protection | Cookie-based sessions with SameSite attribute. OIDC flows use state parameter for CSRF prevention. Admin UI is SPA (same-origin API calls). | cmd/auth.go |
| XSS Prevention | Campaign preview iframes sandboxed. Custom CSS/JS injection scoped to admin/public separately. Go's html/template auto-escapes by default. v5.0.2 patched stored XSS via Sprig template injection. | cmd/admin.go, security advisories |
| Secret Management | Passwords masked in UI responses with • characters. Backend merges existing passwords via UUID matching when masked values are submitted. SMTP passwords, S3 keys, OIDC secrets all masked. | cmd/settings.go |
| CORS | Configurable allowed origins via security.cors_origins. Supports wildcard * or specific URLs. Validated and normalized on save. | cmd/settings.go:261-280 |
| CAPTCHA | Two providers: ALTCHA (proof-of-work, privacy-friendly, no external calls) and hCaptcha. Protects public subscription forms from bot abuse. | internal/captcha/ |
| 2FA | TOTP-based two-factor authentication for user accounts. Stored as twofa_type ENUM and twofa_key in users table. | users table, cmd/auth.go |
| Sprig Template Hardening | Dangerous Sprig functions (env, expandenv) removed to prevent environment variable leakage from templates. Patched in v5.0.2 after CVE. | internal/manager/manager.go |
Privacy & GDPR Compliance
| NFR Concern | Implementation | Config Key |
|---|---|---|
| Tracking Controls | privacy.individual_tracking (off by default) controls per-subscriber open/click attribution. privacy.disable_tracking turns off all tracking entirely. When disabled, tracking pixels and link wrapping are skipped. | privacy.individual_tracking, privacy.disable_tracking |
| Self-Service Data Export | Subscribers can export their own data (profile, subscriptions, campaign views, link clicks) via public pages. Exportable fields configurable via privacy.exportable. | privacy.allow_export, privacy.exportable[] |
| Self-Service Data Wipe | Subscribers can request complete deletion of their data. Cascades via FK constraints to remove all associated records. | privacy.allow_wipe |
| Self-Service Blocklist | Subscribers can blocklist themselves, preventing any future emails. Status set to 'blocklisted' in DB. | privacy.allow_blocklist |
| Subscription Preferences | Subscribers can manage their own list subscriptions via public preference pages. | privacy.allow_preferences |
| Unsubscribe Headers | RFC 8058 List-Unsubscribe header added to all campaign emails by default. Required by Gmail/Yahoo for bulk senders. | privacy.unsubscribe_header (default: true) |
| Domain Filtering | Blocklist and allowlist for email domains. Supports wildcard patterns (*.example.com). Applied during subscription and import. Prevents abuse from disposable email domains. | privacy.domain_blocklist[], privacy.domain_allowlist[] |
| IP Recording | Opt-in IP address recording on subscription confirmation. Off by default for privacy. | privacy.record_optin_ip (default: false) |
| Data Ownership | Self-hosted = you own all data. No third-party analytics. No external tracking pixels. PostgreSQL under your control. | Architecture decision |
Observability & Logging
| NFR Concern | Implementation | Gap / Notes |
|---|---|---|
| Logging | Standard Go log.Logger to stdout. Structured log lines with timestamps. Campaign manager logs start/finish/errors per campaign with subscriber IDs. | No structured JSON logging. No log levels (debug/info/warn/error). Basic but functional. |
| Dashboard Analytics | Materialized views provide: subscriber counts by status, campaign stats by status, 30-day link click and view trends, per-list subscriber breakdowns. Refreshed on cron. | mat_dashboard_counts, mat_dashboard_charts, mat_list_subscriber_stats |
| Campaign Tracking | Per-campaign: sent count, open rate (pixel tracking), click-through rate (link wrapping), bounce count by type. Real-time progress via SSE events. | campaign_views, link_clicks, bounces tables |
| Health Checks | HTTP server responds on configured address. About endpoint exposes version, Go runtime stats (CPU, memory alloc, OS memory). DB connectivity implicit in operations. | No dedicated /health or /readyz endpoint. Would need reverse proxy health checks. |
| Real-Time Events | SSE (Server-Sent Events) bus via internal/events/. Frontend receives live campaign progress updates, import status, notifications without polling. | internal/events/events.go |
| Metrics / APM | Not built-in. No Prometheus metrics endpoint, no OpenTelemetry instrumentation. | Gap — would need external instrumentation for production monitoring at scale. |
Operability & Deployment
| NFR Concern | Implementation |
|---|---|
| Zero-Downtime Config Reload | SIGHUP signal triggers graceful shutdown (HTTP drain, campaign flush, DB close) then syscall.Exec() self-replaces process. Campaigns in progress: sets needsRestart flag and shows warning banner — admin restarts later. |
| Idempotent Migrations | --upgrade is safe to run multiple times. Version-checked sequential migrations. Critical for CI/CD pipelines and Kubernetes rolling deployments. |
| Single Binary Distribution | All assets (frontend, SQL, i18n, templates) embedded via stuffbin. One binary + one config.toml + one PostgreSQL. No Node.js, no file dependencies. |
| Docker / Kubernetes | Official Docker image on DockerHub. docker-compose.yml included. Community Helm chart available. Environment variable configuration via LISTMONK_* prefix. |
| Passive Mode | --passive flag runs the app without processing campaigns. Useful for read-only API replicas behind a load balancer while one instance handles sending. |
| Systemd Integration | Ships with listmonk.service and listmonk@.service (template unit for multiple instances). Production-ready process management. |
| Backup Strategy | All state in PostgreSQL — standard pg_dump for backups. Media on filesystem or S3 (backed up via provider tools). No application-level backup mechanism needed. |
Performance (as NFR)
| NFR Concern | Implementation |
|---|---|
| Throughput Tuning | Three knobs: app.concurrency (worker goroutines), app.message_rate (msgs/sec), app.batch_size (DB fetch size). All configurable via UI without code changes. |
| Resource Efficiency | 57MB peak RAM for 7M+ emails. Fractional CPU. Go's goroutines are ~2KB each vs threads at ~1MB. Connection pooling avoids socket exhaustion. |
| Slow Query Mitigation | app.cache_slow_queries enables cron-refreshed materialized views for expensive dashboard aggregations. Configurable interval (default: daily at 3 AM). |
| Connection Limits | DB: max_open=25, max_idle=25, max_lifetime=300s. SMTP: max_conns per server with idle_timeout and wait_timeout. Prevents resource exhaustion. |
| Streaming Operations | CSV subscriber export streams rows as fetched (no buffering entire dataset). Import processes in configurable batch sizes. Constant memory for large operations. |
Internationalization (i18n)
| NFR Concern | Implementation |
|---|---|
| Multi-Language Support | JSON language files in i18n/*.json. Loaded via stuffbin embedded filesystem. Backend uses internal/i18n package with T() and Ts() (with substitutions) functions. |
| Frontend Localization | Vue.js admin dashboard uses vue-i18n with the same JSON language files served from /admin/static/. Dynamically loaded based on user language setting. |
| Public Page Localization | Subscription forms, unsubscribe pages, preference pages all localized. System email templates (opt-in confirmation, notifications) use L() function for translations. |
| Date/Time Localization | Day.js configured with localized relative time strings. Absolute dates use translated day/month names from i18n files. |
Maintainability & Testability
| NFR Concern | Implementation |
|---|---|
| Code Organization | internal/ packages enforce encapsulation. Layered architecture (handlers → core → DB) prevents spaghetti dependencies. Each package has a single responsibility. |
| SQL as First-Class Artifact | All queries in version-controlled .sql files. No generated SQL. No ORM magic. Reviewable, diffable, optimizable independently of Go code. |
| Schema Migrations | Versioned migrations in internal/migrations/. Each version file (e.g., v5.1.0.go) contains the delta. Applied sequentially with version checks. |
| Testability | Core business logic has zero HTTP dependencies — can be unit tested with a DB mock. Interface-based design (Messenger, Media provider) enables test doubles. |
| Dev Environment | .devcontainer/ config for VS Code dev containers. Makefile with make dist build target. Docker Compose for local Postgres. Frontend hot-reload with Vue CLI. |
Error Handling & Fault Tolerance
| Concern | Mechanism | How It Works |
|---|---|---|
| Per-Message Retry | SMTP max_msg_retries (default: 2) | Each failed message is retried N times before being counted as a send error. Retry happens immediately within the same worker goroutine. Failed messages don't block the channel — other workers continue sending. |
| Campaign Error Threshold | app.max_send_errors (default: 1000) | Cumulative send errors tracked per campaign via atomic counter. When threshold is hit, campaign auto-pauses — prevents burning through an entire list when SMTP is misconfigured or provider is throttling. Admin can investigate and resume. |
| SMTP Connection Failure | Connection pool + wait_timeout | If a connection dies mid-send, the pool creates a new one. wait_timeout prevents indefinite blocking waiting for a free connection. idle_timeout closes stale connections proactively. |
| Bounce Classification | Three-tier: soft / hard / complaint | Each bounce type has configurable count and action. Soft bounce: ignore until threshold (transient issues). Hard bounce: blocklist after 1 (permanent — invalid address). Complaint: blocklist after 1 (spam report). Actions: none, blocklist, delete. |
| Crash Recovery | Checkpoint via last_subscriber_id | After each batch, campaign progress is persisted to DB. On process crash/restart, campaigns with status='running' auto-resume from the last checkpoint. The keyset cursor skips already-processed subscribers, so completed batches are never re-sent — only messages in flight within the interrupted batch can repeat. |
| Subscriber Import Errors | Per-row validation + summary | CSV import validates each row (email format, domain blocklist, required fields). Invalid rows are skipped with error details in import log. Valid rows are upserted. Import can be stopped/retried without corrupting data. |
| Template Render Errors | Per-subscriber isolation | If a template fails to render for a specific subscriber (e.g., missing attribute), that single message is marked as error. Other subscribers are unaffected. Campaign continues processing. |
| DB Connection Exhaustion | Pool limits + lifetime | max_open=25 hard-caps total connections. max_lifetime=300s recycles connections preventing stale state. max_idle=25 keeps warm connections ready. sqlx auto-reconnects on transient failures. |
| Graceful Degradation | Campaign pause + passive mode | No circuit breaker pattern, but the error threshold serves a similar purpose: after enough failures, the system stops trying (pauses campaign). --passive mode allows serving the UI/API while campaign sending is disabled. |
What's Missing in Error Handling (Interview Talking Points)
Retries are immediate, with no exponential backoff or jitter (e.g., time.Sleep(baseDelay * 2^attempt + rand)). There is also no per-SMTP-server circuit breaker and no dead letter table for permanently failed messages — all proposed as improvements in the NFR summary below.
Go-Specific Scalability Patterns
| Pattern | Go Implementation | Why It Scales |
|---|---|---|
| Goroutine Worker Pool | Fixed-size pool (default 10) consuming from a buffered channel. for msg := range msgChan { ... } | Goroutines are ~2-4KB stack (vs ~1MB threads). 10 goroutines handle millions of messages. Channel provides natural backpressure — if workers are busy, producer blocks on channel send. No unbounded goroutine creation. |
| Channel-Based Flow Control | Buffered channel between batch producer and send workers. Buffer size = batch_size. | Channels are Go's CSP primitive. They handle synchronization, ordering, and backpressure without explicit locks. If SMTP is slow, channel fills up, producer pauses DB fetching automatically. Self-regulating. |
| Atomic Counters | atomic.AddInt64(&sent, 1) for campaign progress | Lock-free concurrent increment. No mutex contention across 10+ workers updating the same counter. Hardware CAS instruction. O(1) regardless of worker count. |
| sync.Once for Caching | Template compilation cached via sync.Once. Recompiled only on explicit template update. | Thread-safe lazy initialization. First call compiles, subsequent calls return cached result. Zero allocation after first call. Critical for hot-path template rendering. |
| Context Cancellation | context.Context propagated from HTTP request → Core → DB query | Request-scoped timeouts and cancellation. If a client disconnects, the entire call chain cancels — DB query aborted, goroutine freed. Prevents resource leaks on slow queries or abandoned requests. |
| Connection Pool (database/sql) | Go's sql.DB is already a connection pool. sqlx wraps it. max_open=25. | Pool manages connection lifecycle, reuse, health checks. Concurrent goroutines share the pool safely. Idle connections kept warm. Lifetime rotation prevents stale TCP connections. |
| Embedded Filesystem | stuffbin.FileSystem embeds all assets into the binary at compile time. | Zero disk I/O for serving frontend assets. Memory-mapped access. No file descriptor overhead. Eliminates deployment complexity (no asset directory sync). Scales vertically with zero ops burden. |
| Streaming Response | CSV export: csv.Writer wrapping http.ResponseWriter. Rows written as DB cursor advances. | O(1) memory for exporting 10M subscribers. No buffering entire result set. HTTP chunked transfer encoding. Client sees data immediately. DB cursor keeps server-side state. |
Go Scalability Mental Model for Interviews: listmonk proves that a single Go process with goroutine pools + channels + connection pooling can handle millions of operations. The key insight: Go's concurrency model maps perfectly to the producer-consumer pattern. The producer is I/O-bound (DB fetch), workers are I/O-bound (SMTP send), and channels decouple them. You don't need Kafka for this workload — Go channels are an in-process message queue. You add Kafka when you need multi-node distribution or replay guarantees.
High Availability (HA) — What Exists & What Doesn't
| HA Dimension | Current State | How You'd Improve It (Interview Answer) |
|---|---|---|
| Process Availability | ✓ Hot restart via SIGHUP + syscall.Exec(). Systemd auto-restart on crash. Docker restart policies. Millisecond startup time. | Add liveness/readiness probes for K8s. Currently no /healthz — requests to any endpoint implicitly confirm liveness. |
| Database HA | ◐ Relies on external PostgreSQL HA (RDS Multi-AZ, Patroni, pg_auto_failover). Connection pool handles transient failures. | For self-hosted: Patroni + pgBouncer. For cloud: RDS/Cloud SQL with read replicas. listmonk's --passive mode can point to a read replica. |
| Campaign Continuity | ✓ Checkpoint-based resume. Campaigns survive process restart. last_subscriber_id persisted per batch. Status remains 'running' in DB. | For zero-gap: WAL-based approach — log each message to a journal before sending, mark complete after ACK. Current approach has a small window of potential duplicates within a batch. |
| Horizontal Scaling (Read) | ◐ --passive mode serves API/UI without processing campaigns. Multiple passive instances behind a load balancer. | Add session stickiness or shared session store (Redis/PostgreSQL sessions already in DB). API token auth is stateless and scales naturally. |
| Horizontal Scaling (Write/Send) | ✗ Single sender process. No campaign sharding, no distributed locking, no work stealing. | Campaign partitioning: assign subscriber ID ranges to sender nodes. Distributed lock (etcd/Redis) for campaign ownership. Message queue (Kafka/SQS) between fetch and send. |
| SMTP Failover | ◐ Multiple SMTP servers can be configured. Default "email" messenger load-balances across all enabled servers. But no active health checking or automatic failover — if one server is slow, it still gets traffic. | Add per-server health scoring. Circuit breaker per SMTP. Weighted routing based on success rate. Remove unhealthy servers from pool temporarily. |
| Data Durability | ✓ All state in PostgreSQL with ACID guarantees. FK constraints prevent orphaned data. WAL provides crash consistency. Standard pg_dump for backups. | Point-in-time recovery via WAL archiving. Cross-region replication for DR. Media on S3 with cross-region replication. |
| Zero-Downtime Deploys | ✓ Idempotent --upgrade. SIGHUP hot restart. Campaigns auto-resume post-restart. Docker rolling update compatible. | Blue-green deployment: run new version in passive mode, verify, then switch active sender. K8s rolling update with readiness probe. |
HA Architecture for 99.9% Uptime — What You'd Propose in an Interview
┌──────────────┐ ┌──────────────────────────────────────────────┐
│ Load │ │ Application Tier │
│ Balancer │────▶│ ┌─────────────┐ ┌─────────────┐ │
│ (Nginx/ │ │ │ listmonk │ │ listmonk │ │
│ ALB) │ │ │ (active │ │ (passive │ │
│ │ │ │ sender) │ │ read-only) │ × N │
└──────────────┘ │ └──────┬──────┘ └──────┬──────┘ │
└─────────┼────────────────┼──────────────────┘
│ │
┌─────────▼────────────────▼──────────────────┐
│ Database Tier │
│ ┌──────────┐ ┌──────────┐ │
│ │ Postgres │───▶│ Postgres │ │
│ │ Primary │ │ Replica │ (streaming) │
│ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────┘
Active sender: processes campaigns, handles writes
Passive instances: serve UI/API reads, handle public pages
Primary DB: all writes, campaign state
Replica DB: passive instances read from here
NFR Summary Matrix — Interview Quick Reference
When an interviewer asks "how would you handle X?", point to the concrete implementations in the tables above.
The gaps are intentional trade-offs for simplicity. In an interview, acknowledge them and propose solutions: "listmonk prioritizes operational simplicity — a single binary serving 7M+ emails. At scale, I'd add: /healthz endpoint for K8s probes, Prometheus metrics via expvar or echo middleware, structured logging with slog (Go 1.21+), circuit breakers per SMTP server using sony/gobreaker, exponential backoff with jitter on retries, and a dead letter table for failed messages. For horizontal send scaling, I'd introduce Kafka between the batch producer and send workers, with campaign partition assignment via etcd distributed locks."
System Design Interview Cheat Sheet
Use listmonk as a reference when answering questions about designing email/notification systems, producer-consumer pipelines, or self-hosted SaaS alternatives.
If Asked: "Design a Newsletter/Email System"
1. Data Model: subscribers, lists, and campaigns in PostgreSQL. JSONB for flexible per-subscriber attributes, ENUMs for campaign state machines, many-to-many subscriber↔list mapping.
2. Send Pipeline: Producer fetches batches via keyset pagination (WHERE id > cursor). Workers consume from buffered channel. Rate limiter (token bucket + sliding window). SMTP connection pool per server.
3. Tracking: Pixel tracking for opens (1x1 transparent PNG). Link wrapping for click tracking. Privacy toggle to disable/anonymize. Expression indexes on DATE for time-series queries.
4. Bounce Handling: Webhook receivers for SES/SendGrid/Postmark. POP/IMAP mailbox scanning. Auto-blocklist on hard bounce. Configurable thresholds per bounce type.
5. Scale: Materialized views for dashboards. Batch processing amortizes DB cost. Connection pooling. Single process handles 7M+ emails. For 100M+: shard DB, add message queue, horizontal senders.
6. Reliability: Checkpoint-based crash recovery (last_subscriber_id). Idempotent resume. Error threshold auto-pause. ACID transactions for mutations. Graceful hot restart via SIGHUP.
Key Talking Points
| Topic | What to Say | listmonk Reference |
|---|---|---|
| Why PostgreSQL? | JSONB for flexible schemas without NoSQL complexity. ENUMs for state machines. Materialized views for read optimization. ACID for correctness. | subscribers.attribs JSONB, campaign_status ENUM, mat_dashboard_counts |
| Why not an ORM? | Named SQL via goyesql gives full PostgreSQL feature access. No N+1 queries. SQL is reviewable, optimizable. Complex queries don't fit ORM patterns. | queries/*.sql loaded at startup |
| Cursor vs Offset | OFFSET scans N rows then discards. Cursor (keyset) uses WHERE id > X which hits the index directly. O(1) vs O(N). Critical at scale. | campaigns.last_subscriber_id |
| Rate Limiting | Token bucket for steady rate. Sliding window for burst control. Per-connection limits for SMTP backpressure. Layered approach. | message_rate, sliding_window, max_conns |
| Single Binary Trade-offs | Pro: zero-dep deployment, fast startup, simple ops. Con: vertical scaling only. Good for 80% of use cases. | stuffbin embeds all assets |
| When to Add a Queue | Current: in-process channel. At scale: Kafka/SQS decouples fetch from send, enables multi-node senders, provides replay. | Campaign manager uses Go channels |
Quick-Reference: Numbers to Know
| Number | What It Is |
|---|---|
| 7M+ | emails sent in a single production campaign |
| ~57MB | peak RAM during that campaign |
| 10 | default app.concurrency (send worker goroutines) |
| 25 | DB pool max_open / max_idle connections |
| 1000 | default app.max_send_errors before auto-pause |
| 2 | default SMTP max_msg_retries per message |
| ~2KB | goroutine stack, vs ~1MB for an OS thread |
Related System Design Problems
If you study listmonk deeply, you can answer variations of these interview questions:
- Design a notification system (email/SMS/push)
- Design a mailing list manager
- Design a campaign analytics platform
- Design a self-hosted SaaS tool
- Design a producer-consumer pipeline with rate limiting
- Design a system with crash recovery and exactly-once processing
- How would you handle millions of concurrent email sends?
- Design a multi-tenant newsletter platform