TERMS TO KNOW

System Design · Infrastructure Metrics
Scale Triggers · Real-world Anchors

Latency
LATENCY
measured in ms / µs
Time elapsed between sending a request and receiving the first byte of response. The single most human-felt metric — users notice >100ms. Engineers target <1ms inside the data center.
Real numbers
Cache hit (Redis in-memory) ~0.1 – 1 ms
DB read, cached (Postgres) < 5 ms
Message queue (Kafka e2e) < 5 ms
Cross-region network 30 – 150 ms
Scale trigger
Cache > 1ms → add replicas · DB uncached > 5ms → add read replicas
real world
Redis · Memcached · Kafka
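The cheapest way to see where an operation sits on this scale is to time it yourself. A minimal sketch, where the in-process dict lookup stands in for a cache hit; a real measurement needs many samples, not one:

```python
import time

def measure_ms(fn, *args):
    """Time a single call and return elapsed wall-clock milliseconds."""
    start = time.perf_counter()
    fn(*args)
    return (time.perf_counter() - start) * 1000.0

# Hypothetical cache-hit-like operation: an in-memory dict lookup.
cache = {"user:42": b"payload"}
latency_ms = measure_ms(cache.get, "user:42")
```

In practice you would feed thousands of such samples into a histogram rather than trust any single reading.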
P99 / TAIL LATENCY
percentile
The latency experienced by the slowest 1% of requests. Averages lie — P99 reveals outliers that kill user experience. Distributed systems compound tail latency: if one service is slow, every caller waits.
Real numbers
P50 (median, healthy) 1 – 3 ms
P99 (acceptable) < 50 ms
P99.9 (worst 0.1%) < 200 ms
Scale trigger
P99 breaching SLA → investigate slow paths, add capacity, or shed load
real world
Google SRE · AWS CloudWatch · Datadog APM
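P50/P99/P99.9 fall out of raw latency samples with a nearest-rank percentile. A minimal sketch (the sample values are made up to show how one 250 ms outlier dominates the tail while leaving the median untouched):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the value at or below which p% of samples fall."""
    ranked = sorted(samples)
    idx = max(0, math.ceil(p / 100 * len(ranked)) - 1)
    return ranked[idx]

latencies_ms = [1, 2, 2, 3, 3, 3, 5, 8, 40, 250]  # hypothetical samples
p50 = percentile(latencies_ms, 50)   # 3  (median looks healthy)
p99 = percentile(latencies_ms, 99)   # 250 (the tail tells the real story)
```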
Throughput
TPS
transactions / second
Number of complete database transactions committed per second. Each transaction can span multiple reads and writes. The gold-standard throughput metric for relational databases — dictates vertical scale ceiling before sharding.
Real numbers
Small Postgres instance 1k – 5k TPS
Large DB server (max) 50k TPS
Write scale trigger > 10k TPS
Scale trigger
Write > 10k TPS → shard, CQRS, or switch to write-optimized DB
real world
PostgreSQL · MySQL · CockroachDB
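The > 10k TPS trigger turns directly into a shard-count estimate. A back-of-envelope sketch, assuming each shard handles ~10k write TPS and you leave 30% headroom (both numbers are assumptions to replace with your own benchmarks):

```python
import math

def shards_needed(peak_write_tps: float, per_shard_tps: float = 10_000,
                  headroom: float = 0.7) -> int:
    """How many shards keep each one under `headroom` of its write ceiling."""
    return max(1, math.ceil(peak_write_tps / (per_shard_tps * headroom)))

shards_needed(50_000)  # 50k write TPS at 10k/shard with 30% headroom → 8 shards
```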
OPS / SEC
operations / second
Raw read or write operations per second, typically for caches or key-value stores. Broader than TPS — a single transaction can contain many ops. Caches are optimized to maximize this number since they are memory-bound, not disk-bound.
Real numbers
Redis single node 100k+ ops/sec
Memcached cluster 1M+ ops/sec
DynamoDB provisioned up to 40k r/w per table
Scale trigger
Hit rate < 80% or memory > 80% → scale cache cluster
real world
Redis · Memcached · DynamoDB
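The scale trigger above is mechanical to check once you can read hit/miss counters and memory usage from the cache (Redis `INFO` exposes `keyspace_hits`, `keyspace_misses`, and `used_memory`). A sketch of the check itself:

```python
def cache_needs_scaling(hits: int, misses: int,
                        mem_used: float, mem_total: float) -> bool:
    """Apply the two triggers: hit rate < 80% or memory > 80% full."""
    total = hits + misses
    hit_rate = hits / total if total else 1.0
    return hit_rate < 0.80 or (mem_used / mem_total) > 0.80
```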
MSGS / SEC
messages / second
Volume of discrete messages a queue or broker can ingest and deliver per second. Differs from TPS — messages are often fire-and-forget, async. Kafka partitions are the scaling unit: more partitions = higher parallelism.
Real numbers
Kafka single broker 1M msgs/sec
Safe throughput ceiling 800k msgs/sec
RabbitMQ single node ~50k msgs/sec
Scale trigger
Near 800k msgs/sec or consumer lag growing → add brokers + partitions
real world
Kafka · RabbitMQ · Pulsar · SQS
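Since partitions are the scaling unit, sizing reduces to dividing target throughput by per-partition throughput. A sketch, where the 10k msgs/sec per-partition figure is an assumption; measure your own consumers, since partition throughput varies widely with message size and consumer work:

```python
import math

def partitions_needed(target_msgs_per_sec: float,
                      per_partition_msgs_per_sec: float = 10_000) -> int:
    """Partitions bound consumer parallelism: size them for target throughput."""
    return max(1, math.ceil(target_msgs_per_sec / per_partition_msgs_per_sec))

partitions_needed(800_000)  # the 800k safe ceiling above → 80 partitions
```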
Disk I/O
IOPS
I/O operations / second
How many individual read or write operations a disk can handle per second, regardless of data size. Small random reads (databases, indexes) are IOPS-bound. SSDs vastly outperform spinning disks on this metric.
Real numbers
HDD (spinning disk) 100 – 200 IOPS
SATA SSD 5k – 80k IOPS
NVMe SSD (local, cloud) 100k – 1M+ IOPS
AWS io2 Block Express 256k IOPS
Scale trigger
Disk queue length > 1 consistently → upgrade to NVMe or add storage nodes
real world
AWS EBS io2 · NVMe local · Postgres WAL
DISK THROUGHPUT
MB/s or GB/s
Total data transferred to/from disk per second. Matters for large sequential reads: log streaming, analytics scans, backups. Distinct from IOPS — you can have high throughput with few large I/Os, or high IOPS with many small I/Os.
Real numbers
SATA SSD sequential 500 MB/s
NVMe SSD sequential 3 – 7 GB/s
Kafka segment writes ~1 GB/s sequential
Scale trigger
I/O wait > 20% in CPU profile → separate hot data to faster storage tier
real world
Kafka log segments · Snowflake spill · ClickHouse
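The IOPS/throughput distinction is plain arithmetic: throughput = IOPS × I/O size. A sketch contrasting the two workload shapes (the IOPS and I/O sizes are illustrative):

```python
def throughput_mb_s(iops: float, io_size_kb: float) -> float:
    """Disk throughput is IOPS x I/O size: many small ops or few big ones."""
    return iops * io_size_kb / 1024.0

# 200k random 4 KiB reads (DB index lookups) vs 1.5k sequential 2 MiB reads:
random_db = throughput_mb_s(200_000, 4)        # IOPS-bound: ~781 MB/s
sequential_scan = throughput_mb_s(1_500, 2048) # throughput-bound: 3000 MB/s
```

Same disk, opposite bottlenecks: the database saturates IOPS long before the bus, the analytics scan does the reverse.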
CPU
CPU CORES
cores / threads
Parallel execution units. More cores = more concurrent work without context-switching penalty. App servers are typically CPU-bound during request processing. Clock speed (GHz) determines single-threaded performance.
Real numbers
App server range 8 – 64 cores
Typical clock speed 2 – 4 GHz
Max before horizontal scale ~128 cores (2-socket)
Scale trigger
CPU > 70% sustained → add instances (horizontal) before adding cores (vertical)
real world
c6i.8xlarge · Node.js cluster · Go goroutines
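The 70% trigger gives a direct formula for horizontal sizing. A sketch, where requests-per-second per instance is a number you benchmark, not assume:

```python
import math

def instances_needed(peak_rps: float, rps_per_instance: float,
                     cpu_target: float = 0.70) -> int:
    """Scale out so sustained CPU stays under the 70% trigger."""
    return max(1, math.ceil(peak_rps / (rps_per_instance * cpu_target)))

instances_needed(100_000, 5_000)  # 100k RPS at 5k/instance capacity → 29
```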
CONCURRENT CONNECTIONS
conn / instance
Number of simultaneous open connections an app server holds. Async runtimes (Go, Node.js, Nginx) handle far more connections per core than thread-per-request models (Apache prefork). This is often the first ceiling hit in real-time or WebSocket-heavy systems.
Real numbers
Thread-per-request (Java) 1k – 5k conn
Async (Go / Node.js) 100k+ conn
Nginx (static serving) ~1M conn
Scale trigger
Connections near 100k/instance → scale out + load balance
real world
Nginx · Go net/http · Node.js · Durable Objects
Network Bandwidth
BANDWIDTH
Gbps / Mbps
Maximum data transfer rate of a network link. Bandwidth is the pipe width; throughput is how full the pipe is. Cloud instance network bandwidth is a hard ceiling — exceeding it causes packet drops and queue buildup, not graceful degradation.
Real numbers
Small cloud instance 1 – 10 Gbps
Large instance (c6i.32xl) 50 Gbps
Kafka broker (heavy) 10 – 25 Gbps
AWS enhanced networking max 100 Gbps
Scale trigger
Network saturation > 80% → move to larger instance or add nodes for horizontal spread
real world
AWS ENA · Kafka brokers · CDN edge nodes
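Because the ceiling is hard rather than graceful, the 80% check is worth automating. A sketch converting the instance's Gbps limit into bytes/sec for comparison with measured traffic:

```python
def network_saturated(bytes_per_sec: float, link_gbps: float,
                      threshold: float = 0.80) -> bool:
    """Compare measured traffic against the instance's hard bandwidth ceiling."""
    link_bytes_per_sec = link_gbps * 1e9 / 8  # Gbps -> bytes/sec
    return bytes_per_sec / link_bytes_per_sec > threshold

# 10 Gbps link = 1.25 GB/s ceiling; alert once traffic passes 1 GB/s.
network_saturated(1.1e9, 10)  # True: 88% of the pipe
```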
CONSUMER LAG
messages behind
In message queues, the gap between the latest message produced and the last message consumed. Lag is the leading indicator of a system falling behind — it means producers are outpacing consumers. Growing lag = bandwidth or processing bottleneck downstream.
Real numbers
Healthy Kafka consumer lag < 1,000 msgs
Alert threshold > 10k msgs growing
Max Kafka partitions/cluster ~200k partitions
Scale trigger
Lag growing → add consumer instances up to partition count, then add partitions
real world
Kafka consumer groups · SQS ApproximateAgeOfOldestMessage
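Lag is computed per partition as the produced head offset minus the consumer's committed offset, summed across partitions. A sketch (the offset dicts are hypothetical stand-ins for what `kafka-consumer-groups.sh` or the admin API reports):

```python
def consumer_lag(latest_offsets: dict, committed_offsets: dict) -> int:
    """Total messages behind: produced head minus committed position, per partition."""
    return sum(latest_offsets[p] - committed_offsets.get(p, 0)
               for p in latest_offsets)

# Partition 0 is 10 behind, partition 1 is 50 behind → total lag 60.
consumer_lag({0: 1000, 1: 500}, {0: 990, 1: 450})
```

The absolute number matters less than its derivative: a steady 5k lag is fine, a lag growing every sample is the alert.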
Memory & Storage
RAM
GB / TB
Working memory — the fastest storage tier (after CPU cache). Caches live entirely in RAM. DBs use RAM for buffer pools to avoid disk I/O. When RAM fills, systems page to disk, causing latency to spike 1000x.
Real numbers
App server standard 64 – 512 GB
Cache node (Redis) up to 1 TB
DB buffer pool (Postgres) 25–40% of total RAM
Scale trigger
Memory > 80% → scale before paging occurs (paging = catastrophic latency)
real world
Redis ElastiCache · r6i.metal · Postgres shared_buffers
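The buffer-pool rule of thumb above as a one-liner; 25% is the conservative end of the 25–40% range, leaving the rest of RAM for the OS page cache and per-connection memory:

```python
def shared_buffers_gb(total_ram_gb: float, fraction: float = 0.25) -> float:
    """Postgres buffer pool sizing: 25-40% of total RAM, defaulting to 25%."""
    return total_ram_gb * fraction

shared_buffers_gb(512)  # a 512 GB server → 128 GB shared_buffers
```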
STORAGE CAPACITY
TiB / PB
Total persistent data volume a system holds. Databases use it for rows, indexes, WAL logs. Message queues retain it for replay windows (Kafka default: 7 days). Required capacity = replication factor × data size × retention period.
Real numbers
PostgreSQL max practical 64 TiB
Kafka cluster retention up to 50 TB
S3 / object storage unlimited (exabytes)
Scale trigger
Storage > 70% → provision more nodes or offload cold data to object storage
real world
Kafka tiered storage · S3 · Snowflake
Quick Reference — Numbers at a Glance
THE NUMBERS TO MEMORIZE
Latency
Cache hit → ~1 ms
DB cached read → < 5 ms
Same DC network → ~0.5 ms
Cross-region → 30–150 ms
Throughput
Redis → 100k+ ops/sec
Postgres → 50k TPS max
Kafka → 1M msgs/sec
App server → 100k conn
Scale Triggers
CPU → > 70%
Memory → > 80%
Cache hit rate → < 80%
DB writes → > 10k TPS
Storage
Redis node → up to 1 TB RAM
Postgres → 64 TiB
Kafka retention → 50 TB
NVMe → 1M+ IOPS