TERMS TO KNOW

System Design · Infrastructure Metrics
Scale Triggers · Real-world Anchors

Latency
LATENCY
measured in ms / µs
Time elapsed between sending a request and receiving the first byte of response. The single most human-felt metric — users notice >100ms. Engineers target <1ms inside the data center.
Real numbers
Cache hit (Redis in-memory) ~0.1 – 1 ms
DB read, cached (Postgres) < 5 ms
Message queue (Kafka e2e) < 5 ms
Cross-region network 30 – 150 ms
Scale trigger
Cache > 1ms → add replicas · DB uncached > 5ms → add read replicas
real world
Redis · Memcached · Kafka
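The cheapest way to see where an operation sits on this scale is to time it yourself. A minimal sketch, where the in-process dict lookup stands in for a cache hit; a real measurement needs many samples, not one:

```python
import time

def measure_ms(fn, *args):
    """Time a single call and return elapsed wall-clock milliseconds."""
    start = time.perf_counter()
    fn(*args)
    return (time.perf_counter() - start) * 1000.0

# Hypothetical cache-hit-like operation: an in-memory dict lookup.
cache = {"user:42": b"payload"}
latency_ms = measure_ms(cache.get, "user:42")
```

In practice you would feed thousands of such samples into a histogram rather than trust any single reading.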
P99 / TAIL LATENCY
percentile
The latency experienced by the slowest 1% of requests. Averages lie — P99 reveals outliers that kill user experience. Distributed systems compound tail latency: if one service is slow, every caller waits.
Real numbers
P50 (median, healthy) 1 – 3 ms
P99 (acceptable) < 50 ms
P99.9 (worst 0.1%) < 200 ms
Scale trigger
P99 breaching SLA → investigate slow paths, add capacity, or shed load
real world
Google SRE · AWS CloudWatch · Datadog APM
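P50/P99/P99.9 fall out of raw latency samples with a nearest-rank percentile. A minimal sketch (the sample values are made up to show how one 250 ms outlier dominates the tail while leaving the median untouched):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the value at or below which p% of samples fall."""
    ranked = sorted(samples)
    idx = max(0, math.ceil(p / 100 * len(ranked)) - 1)
    return ranked[idx]

latencies_ms = [1, 2, 2, 3, 3, 3, 5, 8, 40, 250]  # hypothetical samples
p50 = percentile(latencies_ms, 50)   # 3  (median looks healthy)
p99 = percentile(latencies_ms, 99)   # 250 (the tail tells the real story)
```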
Throughput
TPS
transactions / second
Number of complete database transactions committed per second. Each transaction can span multiple reads and writes. The gold-standard throughput metric for relational databases — dictates vertical scale ceiling before sharding.
Real numbers
Small Postgres instance 1k – 5k TPS
Large DB server (max) 50k TPS
Write scale trigger > 10k TPS
Scale trigger
Write > 10k TPS → shard, CQRS, or switch to write-optimized DB
real world
PostgreSQL · MySQL · CockroachDB
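The > 10k TPS trigger turns directly into a shard-count estimate. A back-of-envelope sketch, assuming each shard handles ~10k write TPS and you leave 30% headroom (both numbers are assumptions to replace with your own benchmarks):

```python
import math

def shards_needed(peak_write_tps: float, per_shard_tps: float = 10_000,
                  headroom: float = 0.7) -> int:
    """How many shards keep each one under `headroom` of its write ceiling."""
    return max(1, math.ceil(peak_write_tps / (per_shard_tps * headroom)))

shards_needed(50_000)  # 50k write TPS at 10k/shard with 30% headroom → 8 shards
```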
OPS / SEC
operations / second
Raw read or write operations per second, typically for caches or key-value stores. Broader than TPS — a single transaction can contain many ops. Caches are optimized to maximize this number since they are memory-bound, not disk-bound.
Real numbers
Redis single node 100k+ ops/sec
Memcached cluster 1M+ ops/sec
DynamoDB provisioned up to 40k r/w per table
Scale trigger
Hit rate < 80% or memory > 80% → scale cache cluster
real world
Redis · Memcached · DynamoDB
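The scale trigger above is mechanical to check once you can read hit/miss counters and memory usage from the cache (Redis `INFO` exposes `keyspace_hits`, `keyspace_misses`, and `used_memory`). A sketch of the check itself:

```python
def cache_needs_scaling(hits: int, misses: int,
                        mem_used: float, mem_total: float) -> bool:
    """Apply the two triggers: hit rate < 80% or memory > 80% full."""
    total = hits + misses
    hit_rate = hits / total if total else 1.0
    return hit_rate < 0.80 or (mem_used / mem_total) > 0.80
```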
MSGS / SEC
messages / second
Volume of discrete messages a queue or broker can ingest and deliver per second. Differs from TPS — messages are often fire-and-forget, async. Kafka partitions are the scaling unit: more partitions = higher parallelism.
Real numbers
Kafka single broker 1M msgs/sec
Safe throughput ceiling 800k msgs/sec
RabbitMQ single node ~50k msgs/sec
Scale trigger
Near 800k msgs/sec or consumer lag growing → add brokers + partitions
real world
Kafka · RabbitMQ · Pulsar · SQS
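Since partitions are the scaling unit, sizing reduces to dividing target throughput by per-partition throughput. A sketch, where the 10k msgs/sec per-partition figure is an assumption; measure your own consumers, since partition throughput varies widely with message size and consumer work:

```python
import math

def partitions_needed(target_msgs_per_sec: float,
                      per_partition_msgs_per_sec: float = 10_000) -> int:
    """Partitions bound consumer parallelism: size them for target throughput."""
    return max(1, math.ceil(target_msgs_per_sec / per_partition_msgs_per_sec))

partitions_needed(800_000)  # the 800k safe ceiling above → 80 partitions
```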
Disk I/O
IOPS
I/O operations / second
How many individual read or write operations a disk can handle per second, regardless of data size. Small random reads (databases, indexes) are IOPS-bound. SSDs vastly outperform spinning disks on this metric.
Real numbers
HDD (spinning disk) 100 – 200 IOPS
SATA SSD 5k – 80k IOPS
NVMe SSD (local, cloud) 100k – 1M+ IOPS
AWS io2 Block Express 256k IOPS
Scale trigger
Disk queue length > 1 consistently → upgrade to NVMe or add storage nodes
real world
AWS EBS io2 · NVMe local · Postgres WAL
DISK THROUGHPUT
MB/s or GB/s
Total data transferred to/from disk per second. Matters for large sequential reads: log streaming, analytics scans, backups. Distinct from IOPS — you can have high throughput with few large I/Os, or high IOPS with many small I/Os.
Real numbers
SATA SSD sequential 500 MB/s
NVMe SSD sequential 3 – 7 GB/s
Kafka segment writes ~1 GB/s sequential
Scale trigger
I/O wait > 20% in CPU profile → separate hot data to faster storage tier
real world
Kafka log segments · Snowflake spill · ClickHouse
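The IOPS/throughput distinction is plain arithmetic: throughput = IOPS × I/O size. A sketch contrasting the two workload shapes (the IOPS and I/O sizes are illustrative):

```python
def throughput_mb_s(iops: float, io_size_kb: float) -> float:
    """Disk throughput is IOPS x I/O size: many small ops or few big ones."""
    return iops * io_size_kb / 1024.0

# 200k random 4 KiB reads (DB index lookups) vs 1.5k sequential 2 MiB reads:
random_db = throughput_mb_s(200_000, 4)        # IOPS-bound: ~781 MB/s
sequential_scan = throughput_mb_s(1_500, 2048) # throughput-bound: 3000 MB/s
```

Same disk, opposite bottlenecks: the database saturates IOPS long before the bus, the analytics scan does the reverse.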
CPU
CPU CORES
cores / threads
Parallel execution units. More cores = more concurrent work without context-switching penalty. App servers are typically CPU-bound during request processing. Clock speed (GHz) determines single-threaded performance.
Real numbers
App server range 8 – 64 cores
Typical clock speed 2 – 4 GHz
Max before horizontal scale ~128 cores (2-socket)
Scale trigger
CPU > 70% sustained → add instances (horizontal) before adding cores (vertical)
real world
c6i.8xlarge · Node.js cluster · Go goroutines
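The 70% trigger gives a direct formula for horizontal sizing. A sketch, where requests-per-second per instance is a number you benchmark, not assume:

```python
import math

def instances_needed(peak_rps: float, rps_per_instance: float,
                     cpu_target: float = 0.70) -> int:
    """Scale out so sustained CPU stays under the 70% trigger."""
    return max(1, math.ceil(peak_rps / (rps_per_instance * cpu_target)))

instances_needed(100_000, 5_000)  # 100k RPS at 5k/instance capacity → 29
```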
CONCURRENT CONNECTIONS
conn / instance
Number of simultaneous open connections an app server holds. Async runtimes (Go, Node.js, Nginx) handle far more connections per core than thread-per-request models (Apache prefork). This is often the first ceiling hit in real-time or WebSocket-heavy systems.
Real numbers
Thread-per-request (Java) 1k – 5k conn
Async (Go / Node.js) 100k+ conn
Nginx (static serving) ~1M conn
Scale trigger
Connections near 100k/instance → scale out + load balance
real world
Nginx · Go net/http · Node.js · Durable Objects
Network Bandwidth
BANDWIDTH
Gbps / Mbps
Maximum data transfer rate of a network link. Bandwidth is the pipe width; throughput is how full the pipe is. Cloud instance network bandwidth is a hard ceiling — exceeding it causes packet drops and queue buildup, not graceful degradation.
Real numbers
Small cloud instance 1 – 10 Gbps
Large instance (c6i.32xl) 50 Gbps
Kafka broker (heavy) 10 – 25 Gbps
AWS enhanced networking max 100 Gbps
Scale trigger
Network saturation > 80% → move to larger instance or add nodes for horizontal spread
real world
AWS ENA · Kafka brokers · CDN edge nodes
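Because the ceiling is hard rather than graceful, the 80% check is worth automating. A sketch converting the instance's Gbps limit into bytes/sec for comparison with measured traffic:

```python
def network_saturated(bytes_per_sec: float, link_gbps: float,
                      threshold: float = 0.80) -> bool:
    """Compare measured traffic against the instance's hard bandwidth ceiling."""
    link_bytes_per_sec = link_gbps * 1e9 / 8  # Gbps -> bytes/sec
    return bytes_per_sec / link_bytes_per_sec > threshold

# 10 Gbps link = 1.25 GB/s ceiling; alert once traffic passes 1 GB/s.
network_saturated(1.1e9, 10)  # True: 88% of the pipe
```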
CONSUMER LAG
messages behind
In message queues, the gap between the latest message produced and the last message consumed. Lag is the leading indicator of a system falling behind — it means producers are outpacing consumers. Growing lag = bandwidth or processing bottleneck downstream.
Real numbers
Healthy Kafka consumer lag < 1,000 msgs
Alert threshold > 10k msgs growing
Max Kafka partitions/cluster ~200k partitions
Scale trigger
Lag growing → add consumer instances up to partition count, then add partitions
real world
Kafka consumer groups · SQS ApproximateAgeOfOldestMessage
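Lag is computed per partition as the produced head offset minus the consumer's committed offset, summed across partitions. A sketch (the offset dicts are hypothetical stand-ins for what `kafka-consumer-groups.sh` or the admin API reports):

```python
def consumer_lag(latest_offsets: dict, committed_offsets: dict) -> int:
    """Total messages behind: produced head minus committed position, per partition."""
    return sum(latest_offsets[p] - committed_offsets.get(p, 0)
               for p in latest_offsets)

# Partition 0 is 10 behind, partition 1 is 50 behind → total lag 60.
consumer_lag({0: 1000, 1: 500}, {0: 990, 1: 450})
```

The absolute number matters less than its derivative: a steady 5k lag is fine, a lag growing every sample is the alert.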
Memory & Storage
RAM
GB / TB
Working memory — the fastest storage tier (after CPU cache). Caches live entirely in RAM. DBs use RAM for buffer pools to avoid disk I/O. When RAM fills, systems page to disk, causing latency to spike 1000x.
Real numbers
App server standard 64 – 512 GB
Cache node (Redis) up to 1 TB
DB buffer pool (Postgres) 25–40% of total RAM
Scale trigger
Memory > 80% → scale before paging occurs (paging = catastrophic latency)
real world
Redis ElastiCache · r6i.metal · Postgres shared_buffers
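The buffer-pool rule of thumb above as a one-liner; 25% is the conservative end of the 25–40% range, leaving the rest of RAM for the OS page cache and per-connection memory:

```python
def shared_buffers_gb(total_ram_gb: float, fraction: float = 0.25) -> float:
    """Postgres buffer pool sizing: 25-40% of total RAM, defaulting to 25%."""
    return total_ram_gb * fraction

shared_buffers_gb(512)  # a 512 GB server → 128 GB shared_buffers
```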
STORAGE CAPACITY
TiB / PB
Total persistent data volume a system holds. Databases use it for rows, indexes, WAL logs. Message queues retain it for replay windows (Kafka default: 7 days). Required capacity = replication factor × data size × retention period.
Real numbers
PostgreSQL max practical 64 TiB
Kafka cluster retention up to 50 TB
S3 / object storage unlimited (exabytes)
Scale trigger
Storage > 70% → provision more nodes or offload cold data to object storage
real world
Kafka tiered storage · S3 · Snowflake
Quick Reference — Numbers at a Glance
THE NUMBERS TO MEMORIZE
Latency
Cache hit → ~1 ms
DB cached read → < 5 ms
Same DC network → ~0.5 ms
Cross-region → 30–150 ms
Throughput
Redis → 100k+ ops/sec
Postgres → 50k TPS max
Kafka → 1M msgs/sec
App server → 100k conn
Scale Triggers
CPU → > 70%
Memory → > 80%
Cache hit rate → < 80%
DB writes → > 10k TPS
Storage
Redis node → up to 1 TB RAM
Postgres → 64 TiB
Kafka retention → 50 TB
NVMe → 1M+ IOPS