How to Fix Terraform Too Many Open Files on AWS EC2


Troubleshooting “Terraform Too Many Open Files” on AWS EC2

As DevOps engineers managing complex infrastructure, encountering “Too Many Open Files” errors can be a frustrating, yet common, occurrence. When working with Terraform on AWS EC2, this error typically indicates that the Terraform process has exceeded its allocated maximum number of file descriptors (FDs). This guide will walk you through understanding, quickly fixing, and permanently resolving this issue.


1. The Root Cause: Why This Happens on AWS EC2

The “Too Many Open Files” error, often manifesting as socket: too many open files or resource temporarily unavailable, occurs when a process attempts to open more file descriptors than the operating system allows it to hold.

On AWS EC2 instances, the root causes for Terraform are typically:

  • Conservative Default Limits: Linux distributions, by default, set a relatively low per-process limit on the maximum number of open files (ulimit -n). This default is sufficient for most standard applications but falls short for demanding, I/O-heavy tools like Terraform.
  • Terraform’s I/O Intensive Nature:
    • State Files & Configuration: Terraform opens numerous local files for its configuration (.tf), modules, and state files (terraform.tfstate).
    • Provider Plugins: Each Terraform provider (e.g., the AWS provider) runs as a separate plugin process and establishes numerous connections to the AWS API. Each API call, S3 object interaction, or EC2 instance metadata query can consume a file descriptor or network socket.
    • Concurrency: Terraform’s ability to provision and manage resources in parallel exacerbates the issue. When managing hundreds or thousands of resources across multiple modules, the cumulative number of open files, network sockets, and inter-process communication (IPC) channels can quickly exceed the default limits.
  • Long-Running Operations: Longer terraform apply or terraform plan operations, especially during large infrastructure deployments or deletions, can keep many file descriptors open for extended periods.

Essentially, Terraform, particularly when dealing with large-scale AWS environments, is an I/O and network-heavy application that can easily push against the default ulimit -n settings on an EC2 instance.
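
Before changing any limits, it can help to confirm that descriptor exhaustion is really what you are hitting. The commands below are a minimal check, assuming Terraform is currently running and <pid> stands in for its process ID; they count the descriptors the process holds and show the limit it is actually running under:

    pgrep -x terraform                               # find the Terraform PID, shown below as <pid>
    ls /proc/<pid>/fd | wc -l                        # number of file descriptors currently open
    cat /proc/<pid>/limits | grep 'Max open files'   # the soft/hard limits applied to this process

If the open-descriptor count is close to the soft limit reported by the last command, raising the limit as described below is the right fix. Keep in mind that provider plugins run as separate processes and hold their own descriptors, so the same check can be repeated for their PIDs.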


2. Quick Fix (CLI)

For immediate relief and to unblock an ongoing deployment, you can temporarily increase the ulimit for your current shell session.

  1. Check Current Limits: First, verify your current limit:

    ulimit -n

    This will likely show a value like 1024 or 4096.

  2. Increase Limit Temporarily: In the same shell session where you plan to run Terraform, increase the nofile (number of open files) limit. A common recommendation for Terraform is 65536 or even 131072.

    ulimit -n 65536

    Note: If you get an “Operation not permitted” (or “permission denied”) error, it means your current hard limit prevents you from setting a higher soft limit. You might need to raise the hard limit as root first, or run this quick fix as root.

  3. Verify New Limit:

    ulimit -n

    Confirm it reflects the new value.

  4. Run Terraform: Now, execute your Terraform command (terraform plan, terraform apply, etc.) within that same shell session.

    terraform apply

Important: This fix is temporary and applies only to the current shell session. If you open a new terminal, log out, or the process is run by a different user or systemd service, the limit will revert to the default.
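
One simple way to guarantee that the raised limit and the Terraform run share the same shell, for example in an ad-hoc deployment script, is a small wrapper. This is a minimal sketch (the script name and the plan/apply workflow are illustrative), assuming 65536 does not exceed the hard limit on your instance:

    #!/usr/bin/env bash
    # run-terraform.sh (hypothetical name): raise the soft nofile limit for this
    # shell, then run Terraform so the child processes inherit the higher limit.
    set -euo pipefail
    ulimit -n 65536              # must not exceed the hard limit (check with: ulimit -Hn)
    terraform plan -out=tfplan
    terraform apply tfplan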


3. Configuration Check (Permanent Solution)

To prevent this issue from recurring, you need to configure the system to provide higher file descriptor limits permanently.

a. System-Wide User Limits (/etc/security/limits.conf)

This is the most common and robust way to set higher limits for specific users or all users.

  1. Edit limits.conf: Open /etc/security/limits.conf with sudo.

    sudo vim /etc/security/limits.conf
  2. Add/Modify Entries: Add the following lines to the end of the file. Replace youruser with the specific user account that runs Terraform, or use * for all non-root users.

    # <domain>      <type>  <item>         <value>
    youruser        soft    nofile          65536
    youruser        hard    nofile          65536
    # Or for all non-root users:
    *               soft    nofile          65536
    *               hard    nofile          65536
    • soft limit: The current limit for the user/process. It can be increased by the user up to the hard limit.
    • hard limit: The absolute maximum a user may raise their soft limit to. An unprivileged user can lower the hard limit but cannot raise it again; only root can increase it.
    • A value of 65536 is usually a good starting point. For very large deployments, you might go higher (e.g., 131072 or 262144).
  3. Ensure pam_limits Module is Enabled: Most modern Linux distributions have this enabled by default, but it’s worth verifying. Check /etc/pam.d/common-session (Debian/Ubuntu) or /etc/pam.d/system-auth (RHEL/CentOS) for a line similar to:

    session    required   pam_limits.so

    If it’s commented out or missing, uncomment/add it.

  4. Apply Changes: For these changes to take effect, you must log out and log back in to the EC2 instance (or reboot the instance). This ensures your session inherits the new limits.
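
If you want to double-check steps 3 and 4 without guessing, the commands below are a quick way to confirm that pam_limits is active and that a fresh session actually picked up the new values (exact PAM file paths vary by distribution):

    grep -R pam_limits /etc/pam.d/ 2>/dev/null   # should show at least one uncommented session line
    # after logging back in as the Terraform user:
    ulimit -Sn                                   # soft limit, should now report 65536
    ulimit -Hn                                   # hard limit, should now report 65536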

b. Systemd Service Unit Files (for CI/CD Agents or Background Services)

If Terraform is executed by a systemd service (e.g., a CI/CD agent like Jenkins, GitLab Runner, or a custom background process), the limits.conf settings generally do not apply, because systemd services bypass the PAM session stack that enforces them. In this case, you must configure the limit directly within the systemd unit file.

  1. Identify the Service: Determine the systemd service name responsible for running Terraform (e.g., jenkins.service, gitlab-runner.service, your-custom-terraform-app.service).

  2. Create an Override or Edit the Service File:

    • Preferred (Override): Create an override file to avoid modifying the original package-managed service file:
      sudo systemctl edit <service_name>
      This will open an editor for a new file like /etc/systemd/system/<service_name>.d/override.conf. Add:
      [Service]
      LimitNOFILE=65536
      # Optional: also raise the process limit
      LimitNPROC=65536
    • Direct Edit (Less Preferred): Directly edit the service file, usually found in /etc/systemd/system/ or /usr/lib/systemd/system/.
      sudo vim /etc/systemd/system/<service_name>.service
      Locate the [Service] section and add or modify the LimitNOFILE directive:
      [Service]
      # ... other directives
      LimitNOFILE=65536
      # Optional: also raise the process limit
      LimitNPROC=65536
  3. Reload Systemd and Restart Service:

    sudo systemctl daemon-reload
    sudo systemctl restart <service_name>
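
To make sure systemd actually loaded the drop-in rather than silently ignoring a misplaced file, it can be worth dumping the effective unit definition; the override and its directives should appear beneath the original unit file (<service_name> is a placeholder, as above):

    sudo systemctl cat <service_name>   # shows the unit file plus any drop-in overrides systemd has loaded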

c. System-Wide File Descriptor Limit (/etc/sysctl.conf)

While /etc/security/limits.conf sets per-process limits, fs.file-max in /etc/sysctl.conf defines the kernel’s maximum number of file handles that can be allocated across the entire system. If this system-wide limit is too low, it can prevent individual processes from getting their requested file descriptors, even if their ulimit -n is set high.

  1. Edit sysctl.conf:

    sudo vim /etc/sysctl.conf
  2. Add/Modify Entry: Add or modify the fs.file-max parameter. A value of 2097152 is a common choice for large or busy hosts.

    fs.file-max = 2097152
  3. Apply Changes:

    sudo sysctl -p

    This applies the changes without requiring a reboot.
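
If you want to see whether the system-wide ceiling is actually under pressure, the kernel exposes its current handle usage alongside the maximum. A quick comparison:

    cat /proc/sys/fs/file-nr   # three fields: allocated handles, unused-but-allocated handles, and the current maximum
    sysctl fs.file-max         # confirms the value applied by sysctl -p

If the first field of file-nr is nowhere near fs.file-max, the bottleneck is almost certainly the per-process ulimit rather than the system-wide limit.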


4. Verification

After implementing any of the permanent fixes, it’s crucial to verify that the changes have taken effect before attempting another terraform apply.

  1. For User Limits (limits.conf):

    • Log out and log back in to the EC2 instance.
    • As the user running Terraform, execute:
      ulimit -n
      It should now reflect your new, higher limit (e.g., 65536).
    • You can also check the limits of your current shell process by inspecting its /proc entry:
      cat /proc/$$/limits | grep 'Max open files'
  2. For Systemd Services (.service files):

    • After restarting the service, check its configuration:
      sudo systemctl show <service_name> | grep LimitNOFILE
      This should show your configured value (e.g., LimitNOFILE=65536).
    • Even better, find the PID of the running Terraform process (or the service’s main process) and check its live limits:
      ps aux | grep terraform # Find the PID
      cat /proc/<terraform_pid>/limits | grep 'Max open files'
  3. For System-Wide Limits (sysctl.conf):

    • Verify the system-wide limit:
      cat /proc/sys/fs/file-max
      This should show the value you set in sysctl.conf.
  4. Run Terraform Again:

    • Execute your terraform apply or terraform plan command on the problematic configuration.
    • Monitor the execution closely. The “Too Many Open Files” error should no longer appear.
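
If you want to keep an eye on descriptor usage while the run is in progress, a simple approach is to poll the Terraform CLI process from a second terminal. This is a minimal sketch, assuming a single terraform process is running (provider plugins are separate processes and hold their own descriptors):

    # refresh every 5 seconds; pgrep -nx matches the newest process named exactly "terraform"
    watch -n 5 'ls /proc/$(pgrep -nx terraform)/fd | wc -l'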

By systematically applying these fixes, you can effectively resolve the “Terraform Too Many Open Files” error, ensuring your infrastructure deployments on AWS EC2 run smoothly and reliably.