Staff-Level Interview Prep
NAT Gateway
Internals
How AWS NAT Gateway really works under the hood — SNAT, connection tracking, port exhaustion, AZ affinity, and the failure modes interviewers expect you to know.
Connection Tracking · Port Exhaustion · AZ Affinity · Scaling · 5-tuple SNAT
What NAT GW Actually Does
NAT Gateway implements SNAT (Source NAT) — it rewrites the source IP and source port of outbound packets from private EC2s to its own Elastic IP, then tracks the mapping to route replies back. It never touches destination IP — that's DNAT territory (which IGW handles for inbound).
[Diagram: VPC 10.0.0.0/16. Private subnet 10.0.2.0/24 holds EC2-A 10.0.2.10:52341, EC2-B 10.0.2.11:48821, EC2-C 10.0.2.12:61009, with route table 0.0.0.0/0 → nat-xxxxxxxx. Public subnet 10.0.1.0/24 holds the NAT Gateway (EIP 52.23.101.45), whose SNAT engine maps 10.0.2.10:52341 → 52.23.101.45:1024 and 10.0.2.11:48821 → 52.23.101.45:1025, then forwards via the Internet GW to 93.184.216.34:443. Replies are looked up in the conn table.]
① EC2 sends packet with private src IP → ② NAT GW rewrites src to EIP + unique port → ③ IGW forwards; the reply reverses the mapping.
5-Tuple SNAT Translation
Translation unit: 5-tuple — src IP, src port, dst IP, dst port, protocol
What changes: Source IP → EIP · Source port → allocated port
What's unchanged: Destination IP, dst port, protocol
Port range: 1024–65535 per EIP (64,512 ports)
Multiple EIPs: Up to 8 EIPs; scales port space 8×
ICMP: ICMP ID field used as pseudo-port
UDP: Same mechanism; 350s idle timeout
TCP: 350s established; 60s on half-close
NAT GW vs NAT Instance
Management: Fully managed (vs self-managed EC2)
HA: Built-in within an AZ (vs single-point EC2)
Bandwidth: Auto-scales to 100 Gbps
Security Groups: Cannot attach SGs to NAT GW
Port forwarding: Not supported (NAT instance can)
Bastion use: Not possible (NAT instance can)
Cost (idle): ~$0.045/hr always-on vs EC2 stoppable
src/dst check: NAT instance must disable; NAT GW N/A
Connection Tracking Table
NAT GW maintains a per-flow connection tracking table in memory. Every active TCP/UDP flow has an entry keyed by 5-tuple. This is what allows it to demultiplex inbound reply packets back to the correct private EC2 — without any routing table entry pointing inward.
Live Connection Table (representative snapshot)
Private (src) Translated (src) Destination Proto State Idle
10.0.2.10:52341 52.23.101.45:1024 93.184.216.34:443 TCP ESTABLISHED 2s
10.0.2.11:48821 52.23.101.45:1025 93.184.216.34:443 TCP ESTABLISHED 14s
10.0.2.12:61009 52.23.101.45:1026 8.8.8.8:53 UDP ACTIVE 1s
10.0.2.10:44100 52.23.101.45:1027 151.101.1.69:80 TCP TIME_WAIT 58s
10.0.2.11:39002 52.23.101.45:1028 18.210.44.7:5432 TCP SYN_SENT —
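The table above can be sketched in code. This is a hypothetical toy model of SNAT plus connection tracking, not AWS's actual implementation; the EIP and port-allocation scheme are illustrative:

```python
# Toy model of NAT GW SNAT + connection tracking (illustrative only).
# Outbound: src is rewritten to the EIP and a unique allocated port.
# Inbound reply: the reverse mapping restores the private src.
from itertools import count

EIP = "52.23.101.45"

class NatGateway:
    def __init__(self):
        self._next_port = count(1024)   # allocatable range starts at 1024
        # conn table keyed by the translated 5-tuple -> original private src
        self._conn = {}

    def translate_outbound(self, src_ip, src_port, dst_ip, dst_port, proto):
        """Rewrite src to EIP:allocated-port; record the reverse mapping."""
        nat_port = next(self._next_port)
        self._conn[(EIP, nat_port, dst_ip, dst_port, proto)] = (src_ip, src_port)
        return (EIP, nat_port, dst_ip, dst_port, proto)   # dst untouched: SNAT only

    def translate_reply(self, src_ip, src_port, dst_ip, dst_port, proto):
        """Reply arrives addressed to EIP:nat_port; restore the private dst."""
        key = (dst_ip, dst_port, src_ip, src_port, proto)  # mirror of outbound tuple
        orig = self._conn.get(key)
        if orig is None:
            return None                 # no entry -> packet is dropped
        return (src_ip, src_port, *orig, proto)
```

Note how `translate_reply` needs no route table at all: the conn table alone decides where the reply goes.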
Idle Timeouts — The Trap
TCP established: 350 seconds (≈5.8 min)
TCP transitory: 30–350 seconds (FIN / RST)
UDP: 350 seconds
ICMP: 30 seconds
What happens at timeout: Entry purged. Later packets on the flow have no mapping and are dropped
⚠ Long-lived idle TCP connections (DB pools, keep-alives) will be silently dropped. The TCP stack on EC2 thinks the connection is still open; NAT GW has forgotten it. Fix: enable TCP keepalives at the OS or app layer with interval < 350s.
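The purge behavior can be sketched as a toy eviction model (illustrative, not AWS's implementation; in this sketch any packet refreshes the entry, which is exactly what a keepalive probe buys you):

```python
# Toy model of the 350s idle purge. A packet within the window refreshes
# the entry; a packet after the window finds the entry gone and is dropped.
IDLE_TIMEOUT_S = 350

class ConnTable:
    def __init__(self):
        self._last_seen = {}   # flow 5-tuple -> timestamp of last packet

    def packet(self, flow, now):
        """True if the flow is forwarded, False if its entry had expired."""
        seen = self._last_seen.get(flow)
        if seen is not None and now - seen > IDLE_TIMEOUT_S:
            del self._last_seen[flow]   # entry purged: EC2 still thinks it's open
            return False
        self._last_seen[flow] = now     # refresh (what a keepalive probe does)
        return True
```

A keepalive every 60s (as in the sysctl fix below) keeps `now - seen` far under 350, so the entry never expires.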
TCP Keep-Alive Fix
Set at kernel level per EC2 instance (or via launch template user-data):
# Keepalive interval (seconds before first probe)
net.ipv4.tcp_keepalive_time = 60

# Interval between probes
net.ipv4.tcp_keepalive_intvl = 10

# Probes before declaring dead
net.ipv4.tcp_keepalive_probes = 3

# Apply
sysctl -p
✓ Also configure keepalive at the app layer: Go net.Dialer.KeepAlive, PostgreSQL keepalives_idle, MySQL wait_timeout. Belt + suspenders.
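The same settings can also be applied per socket rather than system-wide. A minimal sketch in Python, assuming Linux (TCP_KEEPIDLE / TCP_KEEPINTVL / TCP_KEEPCNT are Linux-specific socket options):

```python
# Per-socket keepalive: probes fire well inside NAT GW's 350s idle window,
# so the conn-table entry is continually refreshed. Linux-only constants.
import socket

def keepalive_socket(idle=60, interval=10, probes=3):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)       # first probe after 60s idle
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)  # then every 10s
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)      # give up after 3 failures
    return s
```

Per-socket settings win over the sysctl defaults for that socket, which is useful when only a few long-lived connections (e.g. a DB pool) need the tighter interval.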
Connection Tracking vs Route Table — A Subtlety

When a reply packet arrives from the internet at IGW, AWS does not use the route table to forward it. The route table only handles outbound new-flow routing. For inbound replies, the connection tracking table in NAT GW is the sole authority. The packet hits IGW → IGW sees the dst IP is NAT GW's EIP → forwards to NAT GW → NAT GW looks up its conn table → rewrites dst back to private IP → delivers to EC2.

This also complicates NACL reasoning for NAT'd return traffic — by the time the reply enters the private subnet, its destination has already been translated back. The subnet NACL sees the private IP (with the remote host as source), never the EIP, so you can filter on the remote address but not on the EIP.

Port Exhaustion — The Scaling Cliff
Each NAT GW EIP has 64,512 ports (1024–65535). A flow is uniquely identified by the 5-tuple from NAT GW's perspective: (EIP, translated-src-port, dst-IP, dst-port, protocol). If many EC2s hammer the same dst-IP:dst-port, the unique dimension is only translated-src-port — and you'll exhaust 64K fast.
Port Math — How You Exhaust
Ports per EIP: 64,512 (1024–65535)
Unique dimension when dst fixed: Only src-port (64,512 slots total)
50 EC2s → same endpoint: 64,512 / 50 = ~1,290 conns each max
Error metric: ErrorPortAllocation in CloudWatch
Impact: New connections silently dropped
🔥 A Lambda function with concurrency=1000 all calling the same RDS endpoint can exhaust 64K ports in seconds. This is a real prod incident pattern.
Mitigations
  • ① Add more EIPs — up to 8 per NAT GW. Multiplies port space 8× to ~500K.
  • ② Reduce unique dst endpoints — use connection pooling (PgBouncer, RDS Proxy) to reduce distinct dst-IP:port combos.
  • ③ Multiple NAT GWs — split subnets; each private subnet routes to its own NAT GW. Shards the connection space.
  • ④ Reuse connections — HTTP/2 multiplexing, long-lived gRPC streams, fewer short-lived TCP connections.
  • ⑤ PrivateLink / VPC Endpoints — S3 and DynamoDB via free gateway endpoints; SQS and other AWS APIs via interface endpoints. Zero NAT GW ports either way, and gateway endpoints carry zero data cost.
  • ⑥ IPv6 + Egress-only IGW — IPv6 flows bypass NAT entirely. No port translation needed.
Port Space Comparison
1 EIP (baseline): 64,512 ports
2 EIPs: 129,024
4 EIPs: 258,048
8 EIPs (max): 516,096
Same workload via PrivateLink (S3/DDB): ∞ — zero NAT ports consumed
CloudWatch Alarms to Set
Metric | Namespace | Alarm threshold | Why
ErrorPortAllocation | AWS/NATGateway | Sum > 0 (5 min) | Port exhaustion is occurring NOW
PacketsDropCount | AWS/NATGateway | Sum > 0 (5 min) | Packets being silently dropped
ActiveConnectionCount | AWS/NATGateway | > 50,000 per EIP | Approaching exhaustion warning
ConnectionAttemptCount | AWS/NATGateway | Spike ratio | Detect connection storms
BytesInFromDestination | AWS/NATGateway | Cost spike detection | Unexpected data transfer cost
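As a sketch, the first (and most urgent) alarm expressed as the keyword arguments for boto3's `put_metric_alarm`; the NAT gateway ID and SNS topic ARN are placeholders, and you would pass the dict as `boto3.client("cloudwatch").put_metric_alarm(**alarm)`:

```python
# Build the ErrorPortAllocation alarm definition from the table above.
# Sum > 0 over a 5-minute period means port exhaustion is happening now.
def port_exhaustion_alarm(nat_gateway_id, sns_topic_arn):
    return {
        "AlarmName": f"natgw-port-exhaustion-{nat_gateway_id}",
        "Namespace": "AWS/NATGateway",
        "MetricName": "ErrorPortAllocation",
        "Dimensions": [{"Name": "NatGatewayId", "Value": nat_gateway_id}],
        "Statistic": "Sum",
        "Period": 300,                                   # 5-minute window
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",    # any failure fires
        "AlarmActions": [sns_topic_arn],
    }
```

The same shape works for PacketsDropCount; only MetricName changes.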
HA & AZ Design — The #1 Mistake
NAT Gateway is highly available within a single AZ. It is not cross-AZ. If you put one NAT GW in AZ-a and your private EC2s in AZ-b route through it, you pay cross-AZ data transfer AND lose outbound connectivity if AZ-a goes down. One NAT GW per AZ is the canonical pattern.
โŒ BAD โ€” Single NAT GW AZ-A Public NAT GW (only one) AZ-A Private EC2-A AZ-B Private EC2-B cross-AZ $0.01/GB + AZ-A outage = down โœ“ GOOD โ€” NAT GW per AZ AZ-A NAT GW-A Private Subnet-A EC2-A RT: 0.0.0.0/0 โ†’ NGW-A AZ-B NAT GW-B Private Subnet-B EC2-B RT: 0.0.0.0/0 โ†’ NGW-B Both NGWs โ†’ same IGW โ†’ Internet
Single NAT GW Risks
AZ failure: All outbound traffic from other AZs fails
Cross-AZ data cost: $0.01/GB for traffic crossing AZ boundary
Latency: Adds cross-AZ hop to every outbound packet
Blast radius: Single point for port exhaustion
NAT GW per AZ Benefits
AZ failure: Only that AZ's egress is affected
Data cost: Traffic stays in-AZ, no cross-AZ charges
Port space: Partitioned: 64K ports × number of AZs
Cost trade-off: +$0.045/hr per additional NAT GW (3 AZs ≈ $99/mo baseline)
ℹ Private NAT Gateway (launched 2021): You can create a NAT GW in a private subnet — no EIP required. Useful for VPC-to-VPC traffic where overlapping CIDRs make peering impossible. Traffic is translated but stays private (no internet egress).
Cost Model & Gotchas
NAT GW is often the surprise budget item in AWS bills. Two charges: hourly ($0.045/hr ≈ $33/mo per gateway) + data processing ($0.045/GB). The data charge applies even for traffic that goes to other AWS services — which gateway VPC endpoints (S3, DynamoDB) eliminate entirely.
Billing Breakdown
Hourly (per NAT GW): $0.045/hr ≈ $32.85/mo
Data processing: $0.045 per GB (both directions)
3 AZs, prod setup: ~$99/mo just for hourly
1 TB/day outbound: ~$1,350/mo data charges
Cross-AZ data via NAT: +$0.01/GB AZ charge on top
VPC Endpoint (S3/DDB): $0.00 per GB — no NAT touch
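The billing lines above combine into a simple model. A sketch using the figures in this section (us-east-1 style prices; rates vary by region, and the ~730 hours/month convention matches the $32.85 figure):

```python
# Back-of-envelope monthly NAT GW cost from the billing breakdown above.
HOURLY = 0.045      # $/hr per NAT GW
DATA = 0.045        # $/GB processed through the gateway
CROSS_AZ = 0.01     # $/GB extra when traffic crosses an AZ boundary

def monthly_cost(gateways, gb_per_day, cross_az_fraction=0.0):
    hourly = gateways * HOURLY * 730                       # ~730 hours/month
    data = gb_per_day * 30 * DATA
    cross = gb_per_day * 30 * cross_az_fraction * CROSS_AZ
    return round(hourly + data + cross, 2)
```

Three idle gateways cost about $98.55/mo; one gateway pushing 1 TB/day lands around $1,382.85/mo, almost all of it the data-processing charge.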
Cost Optimizations
  • S3 / DynamoDB → Gateway Endpoint (free, no NAT)
  • SQS / SNS / ECR / etc. → Interface Endpoint (PrivateLink)
  • Lambda → AWS APIs → keep in VPC with endpoints
  • ECS Fargate → VPC endpoint for ECR image pulls
  • Dev/non-prod → NAT Instance (t3.nano ≈ $3/mo)
  • Batch jobs off-hours → Public subnet + auto-assign public IP (no NAT needed)
Hidden Gotchas Checklist
Gotcha | What Happens | Fix
Idle TCP timeout (350s) | DB pool connections silently killed; app sees stale connection errors | OS TCP keepalive < 350s + app-level keepalive
Port exhaustion | New connections dropped; ErrorPortAllocation spikes | Add EIPs, VPC endpoints, connection pooling
Single NAT GW | Cross-AZ charges + single-AZ SPOF for egress | 1 NAT GW per AZ; route table per private subnet
No SG on NAT GW | Can't restrict which EC2s use it via security group | Use NACLs on private subnet or restrict at EC2 SG egress rules
Cannot log flows | NAT GW itself not visible in VPC Flow Logs by ENI | Enable VPC Flow Logs on private subnet ENIs; CloudWatch NAT GW metrics
EIP cost | Since Feb 2024 every public IPv4 address costs $0.005/hr, attached or not | Release unused EIPs; budget for the ones NAT GWs hold
DNS via NAT | Route 53 Resolver handles DNS; doesn't go through NAT GW | Understand that VPC-resolver DNS is exempt from NAT; the .2 resolver address always works
Staff-Level Interview Q&A
These are the questions that separate Staff candidates from Senior. Focus on the why and trade-offs, not just facts.
Q1 — A private EC2 is making outbound HTTPS calls to an S3 bucket. Describe exactly what happens at each network hop, and identify where you'd optimize cost.
Full hop trace: EC2 → route table → NAT GW (SNAT: private IP → NAT GW's own private IP + ephemeral port) → IGW (1:1 NAT: NAT GW's private IP → EIP) → AWS S3 public endpoint → response reverses the path; conn-table lookup at NAT GW restores the original private IP.

Cost optimization: Every byte processed by NAT GW costs $0.045/GB. S3 is accessible via a Gateway VPC Endpoint at zero cost. Add the endpoint, update the route table with a prefix list entry for S3 → the endpoint, and NAT GW is bypassed entirely. This is the most impactful single change for S3-heavy workloads.
Q2 — Your application has 200 Lambda functions all calling the same RDS PostgreSQL instance. You're seeing intermittent connection failures. What's the root cause and fix?
Root cause — port exhaustion: All 200 Lambdas are behind a single NAT GW EIP, calling the same dst IP:port (RDS endpoint :5432). The 5-tuple unique dimension collapses to just translated src port. 64,512 ports ÷ short-lived Lambda connections = exhaustion fast. CloudWatch ErrorPortAllocation will confirm this.

Fix strategy (layered): ① Deploy RDS Proxy — it pools and multiplexes connections, so Lambdas share a fixed set of real DB connections. This is the primary fix. ② Add more EIPs to NAT GW (up to 8). ③ Consider RDS Proxy via PrivateLink to remove NAT entirely. ④ If Lambdas are creating DB connections on every invocation, fix connection reuse by initialising outside the handler.
Q3 — You have a 3-AZ VPC with one NAT GW in AZ-A. A colleague says "it's highly available because NAT GW auto-scales." Are they correct?
Partially correct, fundamentally wrong: NAT GW is managed and auto-scales bandwidth within its AZ. But it has no cross-AZ failover. If AZ-A has an outage, EC2s in AZ-B and AZ-C lose all outbound internet access — even though they're perfectly healthy.

Correct HA design: Deploy one NAT GW in the public subnet of each AZ. Configure the private route table for each AZ to point 0.0.0.0/0 to that AZ's own NAT GW. Now an AZ failure only isolates that AZ's egress — other AZs are unaffected. Side benefit: eliminates cross-AZ data transfer charges ($0.01/GB).
Q4 — Explain the difference between how NAT GW handles TCP vs UDP from a connection tracking perspective.
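The AZ-affinity rule is mechanically checkable. A sketch over hypothetical data shapes (not an AWS API response) that flags any private subnet whose default route exits through a NAT GW in a different AZ:

```python
# Design check for Q3: every private subnet's 0.0.0.0/0 route should
# target the NAT GW in its OWN AZ. Cross-AZ routes cost $0.01/GB and
# tie the subnet's egress to another AZ's availability.
def cross_az_routes(subnets, natgw_az):
    """subnets: {subnet_id: (az, natgw_id_of_default_route)}
    natgw_az: {natgw_id: az}
    Returns subnet IDs whose egress hops AZs."""
    return [s for s, (az, ngw) in subnets.items() if natgw_az[ngw] != az]
```

In a real audit you'd populate the two dicts from `describe_route_tables` / `describe_nat_gateways` output; the check itself stays this simple.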
TCP — stateful with timeout: TCP connection tracking follows the 3-way handshake. NAT GW tracks SYN, SYN-ACK, and FIN/RST to manage state transitions. Idle timeout is 350s for ESTABLISHED, 60s for transitory states. It tracks the full state machine.

UDP — timeout-only: UDP has no handshake. NAT GW creates an entry on the first packet and purges it after 350s of inactivity. There's no concept of "connection closed." Any packet from the destination to the allocated EIP:port within that window gets forwarded back. After timeout, the same dst sending a packet hits no matching entry → dropped.

Key implication: DNS (UDP :53) is fire-and-forget sub-second so timeout doesn't matter. But long-lived UDP applications (e.g., QUIC/HTTP3, game servers) will see NAT bindings expire silently — apps need to implement their own keepalive probes.
Q5 — Design the egress architecture for a multi-account, 3-region AWS organization with 40 VPCs.
Centralized egress via Transit Gateway: The gold standard for >10 VPCs.

① Each region: One shared "Egress VPC" per region (owned by network account). This VPC has NAT GWs (one per AZ) and an IGW.
② TGW: All 40 spoke VPCs attach to TGW. Spoke VPCs have NO IGW, no NAT GW.
③ TGW route tables: Default route in spoke VPCs → TGW → routed to Egress VPC → NAT GW → IGW.
④ Benefits: All internet egress through a controlled choke point. WAF / firewall appliances can be inserted in Egress VPC. Only N NAT GWs (N = AZs × regions) instead of NAT GWs per VPC.
⑤ Trade-offs: TGW data charge ($0.02/GB). Centralized failure point for egress (mitigated by multi-AZ NAT GWs in Egress VPC). Slightly higher latency due to TGW hop.

For AWS-service traffic: Deploy VPC Endpoints (S3, DDB gateway; SQS/SNS interface) in each spoke VPC — never touch NAT or TGW.
Q6 — Why can't you put a Security Group on a NAT Gateway? What's the implication for security controls?
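The centralized-vs-per-VPC trade-off reduces to arithmetic. A back-of-envelope sketch using this document's prices plus TGW's $0.02/GB data charge; TGW attachment-hour fees and cross-region charges are deliberately omitted, so treat it as illustrative only:

```python
# Per-VPC NAT GWs (hourly cost scales with VPC count) vs centralized
# egress through TGW (hourly cost is flat, but every GB pays TGW's toll).
NATGW_HOURLY, NATGW_DATA, TGW_DATA = 0.045, 0.045, 0.02

def per_vpc_monthly(vpcs, azs, gb_month):
    """Every VPC runs its own NAT GW per AZ; data pays NAT processing only."""
    return round(vpcs * azs * NATGW_HOURLY * 730 + gb_month * NATGW_DATA, 2)

def centralized_monthly(azs, gb_month):
    """One egress VPC: NAT GW per AZ; data pays NAT + TGW processing."""
    return round(azs * NATGW_HOURLY * 730 + gb_month * (NATGW_DATA + TGW_DATA), 2)
```

For 40 VPCs × 3 AZs, per-VPC hourly alone is ~$3,942/mo versus ~$99/mo centralized, so the TGW toll only dominates at very high egress volumes.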
Architecture reason: NAT GW is not an ENI-based resource you manage — it's a managed AWS service that presents as a gateway target in route tables. Security Groups attach to ENIs (Elastic Network Interfaces). Since NAT GW's internal implementation is opaque (multiple ENIs under the hood, managed by AWS), AWS doesn't expose SG attachment.

Security implications & compensating controls:
① EC2 SG egress rules — restrict which destinations your EC2s can reach. The SG is enforced at the EC2's ENI before the packet reaches NAT GW.
② NACLs on private subnet — stateless outbound rules can block specific IP ranges from leaving the subnet toward NAT GW.
③ AWS Network Firewall in the egress path — insert a centralized inspection appliance between private subnet and NAT GW for deep packet inspection, IDS/IPS, domain-based filtering.
④ VPC Flow Logs — audit trail even without SG.

This is a common Staff question to test whether you understand the SG attachment model and can reason about compensating controls.