Troubleshooting “Terraform Too Many Open Files” on AWS EC2
As DevOps engineers managing complex infrastructure, encountering "Too Many Open Files" errors can be a frustrating, yet common, occurrence. When working with Terraform on AWS EC2, this error typically indicates that the Terraform process has exceeded its allocated maximum number of file descriptors (FDs). This guide will walk you through understanding, quickly fixing, and permanently resolving the issue.
1. The Root Cause: Why This Happens on AWS EC2
The "Too Many Open Files" error, often manifesting as `socket: too many open files` or `resource temporarily unavailable`, occurs when a process attempts to open more file descriptors than the operating system allows.
On AWS EC2 instances, the root causes for Terraform are typically:
- Conservative Default Limits: Linux distributions, by default, set a relatively low limit for the maximum number of open files per process (`ulimit -n`). This default is often sufficient for most standard applications but falls short for demanding tools like Terraform.
- Terraform's I/O Intensive Nature:
  - State Files & Configuration: Terraform opens numerous local files for its configuration (`.tf`), modules, and state files (`terraform.tfstate`).
  - Provider Plugins: Each Terraform provider (e.g., the AWS provider) runs as a separate plugin process and establishes numerous connections to the AWS API. Each API call, S3 object interaction, or EC2 instance metadata query can consume a file descriptor or network socket.
- Concurrency: Terraform's ability to provision and manage resources in parallel exacerbates the issue. When managing hundreds or thousands of resources across multiple modules, the cumulative total of open files, network sockets, and inter-process communication (IPC) channels can quickly exceed the default limits.
- Long-Running Operations: Longer `terraform apply` or `terraform plan` operations, especially during large infrastructure deployments or deletions, can keep many file descriptors open for extended periods.

Essentially, Terraform, particularly when dealing with large-scale AWS environments, is an I/O- and network-heavy application that can easily push against the default `ulimit -n` settings on an EC2 instance.
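To confirm that descriptor exhaustion is really what you are hitting, you can inspect the running Terraform process while a plan or apply is in flight. A minimal sketch, assuming a standard Linux EC2 instance and a single `terraform` binary running (provider plugin processes hold descriptors too, so this count is a lower bound):

```bash
# Find the newest running process named exactly "terraform"
TF_PID=$(pgrep -n -x terraform)

# Soft/hard file descriptor limits applied to that process
grep 'Max open files' "/proc/${TF_PID}/limits"

# Number of file descriptors it currently holds
ls "/proc/${TF_PID}/fd" | wc -l
```

If the descriptor count climbs toward the "Max open files" value during an apply, the limits below are the right thing to raise.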
2. Quick Fix (CLI)
For immediate relief and to unblock an ongoing deployment, you can temporarily increase the ulimit for your current shell session.
1. Check Current Limits: First, verify your current limit:

   ```bash
   ulimit -n
   ```

   This will likely show a value like `1024` or `4096`.

2. Increase Limit Temporarily: In the same shell session where you plan to run Terraform, increase the `nofile` (number of open files) limit. A common recommendation for Terraform is `65536` or even `131072`.

   ```bash
   ulimit -n 65536
   ```

   Note: If you get a "permission denied" error, it means your current hard limit prevents you from setting a higher soft limit. You might need to set the hard limit as root first, or log in as root for this quick fix.

3. Verify New Limit:

   ```bash
   ulimit -n
   ```

   Confirm it reflects the new value.

4. Run Terraform: Now, execute your Terraform command (`terraform plan`, `terraform apply`, etc.) within that same shell session.

   ```bash
   terraform apply
   ```
Important: This fix is temporary and applies only to the current shell session. If you open a new terminal, log out, or the process is run by a different user or systemd service, the limit will revert to the default.
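If Terraform is run interactively from a shared jump host and the higher limit keeps being forgotten, a small wrapper can raise it before every run. This is only a convenience sketch; the script name and the `65536` value are arbitrary choices, not anything Terraform itself requires:

```bash
#!/usr/bin/env bash
# tf.sh - hypothetical wrapper: raise the FD limit for this shell, then run Terraform.
set -euo pipefail

# Fail fast if the hard limit prevents raising the soft limit
ulimit -n 65536 || {
  echo "Could not raise 'ulimit -n' to 65536; raise the hard limit first." >&2
  exit 1
}

# Hand control to Terraform with whatever arguments were passed in
exec terraform "$@"
```

Invoke it as you would Terraform itself, e.g. `./tf.sh plan` or `./tf.sh apply`.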
3. Configuration Check (Permanent Solution)
To prevent this issue from recurring, you need to configure the system to provide higher file descriptor limits permanently.
a. System-Wide User Limits (/etc/security/limits.conf)
This is the most common and robust way to set higher limits for specific users or all users.
1. Edit `limits.conf`: Open `/etc/security/limits.conf` with sudo.

   ```bash
   sudo vim /etc/security/limits.conf
   ```

2. Add/Modify Entries: Add the following lines to the end of the file. Replace `youruser` with the specific user account that runs Terraform, or use `*` for all non-root users.

   ```
   # <domain>  <type>  <item>   <value>
   youruser    soft    nofile   65536
   youruser    hard    nofile   65536
   # Or for all non-root users:
   *           soft    nofile   65536
   *           hard    nofile   65536
   ```

   - `soft` limit: The current limit for the user/process. It can be increased by the user up to the `hard` limit.
   - `hard` limit: The absolute maximum a user can set. Only `root` can increase the hard limit once it's set.
   - A value of `65536` is usually a good starting point. For very large deployments, you might go higher (e.g., `131072` or `262144`).

3. Ensure the `pam_limits` Module is Enabled: Most modern Linux distributions have this enabled by default, but it's worth verifying. Check `/etc/pam.d/common-session` (Debian/Ubuntu) or `/etc/pam.d/system-auth` (RHEL/CentOS) for a line similar to:

   ```
   session required pam_limits.so
   ```

   If it's commented out or missing, uncomment/add it.

4. Apply Changes: For these changes to take effect, you must log out and log back in to the EC2 instance (or reboot the instance). This ensures your session inherits the new limits.
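To double-check both pieces without opening an editor, a couple of `grep` commands are enough (the paths assume a Debian/Ubuntu PAM layout; substitute `system-auth` on RHEL/CentOS):

```bash
# Confirm pam_limits is loaded for login sessions
grep "pam_limits.so" /etc/pam.d/common-session

# Confirm the nofile entries are present in limits.conf and any drop-in files
grep -R "nofile" /etc/security/limits.conf /etc/security/limits.d/ 2>/dev/null
```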
b. Systemd Service Unit Files (for CI/CD Agents or Background Services)
If Terraform is executed by a systemd service (e.g., a CI/CD agent like Jenkins, GitLab Runner, or a custom background process), the limits.conf file might not directly apply to the service’s processes. In this case, you must configure the limit directly within the systemd unit file.
1. Identify the Service: Determine the systemd service name responsible for running Terraform (e.g., `jenkins.service`, `gitlab-runner.service`, `your-custom-terraform-app.service`).

2. Create an Override or Edit the Service File:

   - Preferred (Override): Create an override file to avoid modifying the original package-managed service file:

     ```bash
     sudo systemctl edit <service_name>
     ```

     This opens an editor for a new file like `/etc/systemd/system/<service_name>.d/override.conf`. Add:

     ```ini
     [Service]
     LimitNOFILE=65536
     # Optional: also good for process limits
     LimitNPROC=65536
     ```

   - Direct Edit (Less Preferred): Directly edit the service file, usually found in `/etc/systemd/system/` or `/usr/lib/systemd/system/`.

     ```bash
     sudo vim /etc/systemd/system/<service_name>.service
     ```

     Locate the `[Service]` section and add or modify the `LimitNOFILE` directive:

     ```ini
     [Service]
     # ... other directives
     LimitNOFILE=65536
     # Optional: also good for process limits
     LimitNPROC=65536
     ```

3. Reload Systemd and Restart the Service:

   ```bash
   sudo systemctl daemon-reload
   sudo systemctl restart <service_name>
   ```
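If the instance is provisioned by automation (user data, Ansible, and so on), you may prefer to write the drop-in non-interactively instead of going through `systemctl edit`. A sketch, using `gitlab-runner.service` purely as an example service name:

```bash
# Write the override drop-in directly (equivalent to `systemctl edit`, but scriptable)
sudo mkdir -p /etc/systemd/system/gitlab-runner.service.d
sudo tee /etc/systemd/system/gitlab-runner.service.d/override.conf > /dev/null <<'EOF'
[Service]
LimitNOFILE=65536
EOF

# Pick up the new unit configuration and restart the agent
sudo systemctl daemon-reload
sudo systemctl restart gitlab-runner.service
```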
c. System-Wide File Descriptor Limit (/etc/sysctl.conf)
While /etc/security/limits.conf sets per-process limits, fs.file-max in /etc/sysctl.conf defines the kernel’s maximum number of file handles that can be allocated across the entire system. If this system-wide limit is too low, it can prevent individual processes from getting their requested file descriptors, even if their ulimit -n is set high.
1. Edit `sysctl.conf`:

   ```bash
   sudo vim /etc/sysctl.conf
   ```

2. Add/Modify Entry: Add or modify the `fs.file-max` parameter. A value of `2097152` is a common high value.

   ```
   fs.file-max = 2097152
   ```

3. Apply Changes:

   ```bash
   sudo sysctl -p
   ```

   This applies the changes without requiring a reboot.
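To see how much headroom the instance actually has against this kernel limit, compare the number of allocated file handles with `fs.file-max`; the kernel exposes both through `/proc`:

```bash
# Columns: allocated file handles, unused-but-allocated handles, system-wide maximum
cat /proc/sys/fs/file-nr

# The same maximum, via sysctl
sysctl fs.file-max
```

If the first number in `file-nr` is anywhere near `fs.file-max`, the system-wide limit, not just the per-process one, needs raising.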
4. Verification
After implementing any of the permanent fixes, it’s crucial to verify that the changes have taken effect before attempting another terraform apply.
- For User Limits (`limits.conf`):
  - Log out and log back in to the EC2 instance.
  - As the user running Terraform, execute:

    ```bash
    ulimit -n
    ```

    It should now reflect your new, higher limit (e.g., `65536`).
  - You can also check the limits of your current shell process by inspecting its `/proc` entry:

    ```bash
    cat /proc/$$/limits | grep 'Max open files'
    ```

- For Systemd Services (`.service` files):
  - After restarting the service, check its configuration:

    ```bash
    sudo systemctl show <service_name> | grep LimitNOFILE
    ```

    This should show your configured value (e.g., `LimitNOFILE=65536`).
  - Even better, find the PID of the running Terraform process (or the service's main process) and check its live limits:

    ```bash
    ps aux | grep terraform   # Find the PID
    cat /proc/<terraform_pid>/limits | grep 'Max open files'
    ```

- For System-Wide Limits (`sysctl.conf`):
  - Verify the system-wide limit:

    ```bash
    cat /proc/sys/fs/file-max
    ```

    This should show the value you set in `sysctl.conf`.

- Run Terraform Again:
  - Execute your `terraform apply` or `terraform plan` command on the problematic configuration.
  - Monitor the execution closely. The "Too Many Open Files" error should no longer appear.
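If you manage a fleet of EC2 build hosts, it can be handy to collect these checks in one place. The script below is a hypothetical helper, not part of Terraform or systemd; the service name argument is optional:

```bash
#!/usr/bin/env bash
# check-fd-limits.sh - hypothetical helper: print the limits relevant to
# the "Too Many Open Files" error in one place.
set -u

SERVICE="${1:-}"   # optional: systemd service that runs Terraform

echo "Shell soft limit:       $(ulimit -Sn)"
echo "Shell hard limit:       $(ulimit -Hn)"
echo "Kernel fs.file-max:     $(cat /proc/sys/fs/file-max)"
echo "Kernel usage (file-nr): $(cat /proc/sys/fs/file-nr)"

if [ -n "$SERVICE" ]; then
  echo "Systemd LimitNOFILE:    $(systemctl show "$SERVICE" -p LimitNOFILE --value)"
fi
```

Run it as `./check-fd-limits.sh` for the shell and kernel view, or `./check-fd-limits.sh gitlab-runner.service` to include the systemd service's limit as well.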
By systematically applying these fixes, you can effectively resolve the “Terraform Too Many Open Files” error, ensuring your infrastructure deployments on AWS EC2 run smoothly and reliably.