How to Fix Nginx Out of Memory (OOM) on Google Cloud Run


The Root Cause

Nginx Out of Memory (OOM) on Google Cloud Run occurs when the combined memory footprint of the Nginx master process, its worker processes, and their active connections exceeds the container’s allocated memory limit. Cloud Run provisions resources per container instance, and an Nginx configuration not tuned for these constraints, especially under high concurrency, can quickly exhaust available memory, causing the container to be terminated and restarted.
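To see how quickly this adds up, here is a rough worst-case estimate of connection-buffer memory under common defaults; the worker count, connection count, and per-connection buffer size below are illustrative assumptions, not measured values.

# Worst-case connection buffers: workers x connections x per-connection buffer KB
echo $(( 4 * 1024 * 44 ))   # 180224 KB, roughly 176 MiB, enough to OOM a 256Mi instance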

Quick Fix (CLI)

Increase the memory allocated to your Cloud Run service. This gives Nginx immediate headroom while you tune the configuration.

gcloud run services update YOUR_SERVICE_NAME \
    --memory 512Mi \
    --region YOUR_REGION \
    --project YOUR_PROJECT_ID

(Adjust 512Mi as necessary; common increments are 256Mi, 512Mi, 1Gi, 2Gi. Replace YOUR_SERVICE_NAME, YOUR_REGION, and YOUR_PROJECT_ID.)
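To confirm the new limit took effect, read it back from the service description (the resource path below follows Cloud Run’s Knative-style output; verify it against your gcloud version):

gcloud run services describe YOUR_SERVICE_NAME \
    --region YOUR_REGION \
    --project YOUR_PROJECT_ID \
    --format="value(spec.template.spec.containers[0].resources.limits.memory)"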

Configuration Check

The primary file to optimize is your nginx.conf (or whichever custom Nginx configuration file you copy into your Docker image); then make sure your Dockerfile actually ships those changes.

File to Edit: nginx.conf (or a specific configuration file such as default.conf, if it is included from nginx.conf).

Key Lines to Change: Tune Nginx to use fewer resources. Apply the following changes in your nginx.conf:

# In the main context (outside http/events blocks)
worker_processes 1; # Cloud Run scales containers; one worker per container is often sufficient.

events {
    worker_connections 256; # Reduce from default 1024 to conserve memory per worker.
    multi_accept on;
    use epoll; # Use epoll on Linux for efficient event handling.
}

http {
    # Basic settings
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 15; # Reduce default 75s timeout to free resources faster.
    keepalive_requests 50; # Limit requests per keepalive connection.

    # Reduce memory for client body buffering
    client_body_buffer_size 8k;
    client_max_body_size 1M; # Adjust based on actual payload requirements.

    # Disable or optimize logging to save I/O and memory
    access_log off; # Disable access logs if not strictly necessary for analysis.
    error_log /dev/stderr warn; # Direct error logs to stderr for Cloud Run, set to 'warn' level.

    # For reverse proxy configurations, reduce buffer sizes
    # proxy_buffers 4 8k;
    # proxy_buffer_size 4k;
    # proxy_busy_buffers_size 8k;
    # proxy_buffering off; # Consider disabling if backend is fast and latency is low.

    # Disable gzip if CPU/memory is extremely constrained, or optimize carefully
    # gzip off; 
    # gzip_comp_level 1; # If enabled, lower compression level for less CPU/memory.
}
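Before redeploying, validate the edited file’s syntax. One way, assuming Docker is available locally, is to mount the file read-only into the public nginx:alpine image and run Nginx’s built-in config test:

docker run --rm \
    -v "$PWD/nginx.conf:/etc/nginx/nginx.conf:ro" \
    nginx:alpine nginx -t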

Dockerfile Integration: Ensure your Dockerfile copies the optimized nginx.conf and starts Nginx correctly.

# ... (your base image and other Dockerfile commands) ...

# Copy your optimized Nginx configuration
COPY nginx.conf /etc/nginx/nginx.conf

# If you have custom server blocks, copy them too
# COPY my_server.conf /etc/nginx/conf.d/default.conf

# Cloud Run routes traffic to the port given by the PORT env var (8080 by default);
# your Nginx server block must listen on it. EXPOSE itself is informational only.
EXPOSE 8080

# Command to run Nginx in the foreground
CMD ["nginx", "-g", "daemon off;"]

Verification

After redeploying your service with the memory increase (CLI) and/or Nginx configuration optimizations (Dockerfile), verify the fix by checking Cloud Run logs and metrics.

  1. Check for OOM events in logs. Cloud Run typically reports these as “Memory limit of X MiB exceeded” messages rather than the literal string “OOM”, so search for that text:

    gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=YOUR_SERVICE_NAME AND textPayload:\"memory limit\"" \
        --limit 10 \
        --format="json" \
        --project YOUR_PROJECT_ID

    If this command returns no recent results, the OOM issue is likely resolved.

  2. Monitor Memory Utilization: Open the “Metrics” tab for your Cloud Run service in the Google Cloud Console and watch the “Memory utilization” chart. The memory footprint should now remain stable and below the allocated limit, even under load, and memory-limit terminations should stop appearing in the logs.
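To exercise the service under load without extra tooling, a simple concurrent curl loop works as a smoke test (YOUR_SERVICE_URL is a placeholder for your deployed service URL):

# Fire 200 concurrent requests and print the HTTP status codes (smoke test only)
for i in $(seq 1 200); do
  curl -s -o /dev/null -w "%{http_code}\n" "https://YOUR_SERVICE_URL/" &
done
wait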