# How to Fix “Terraform Too Many Open Files” on CentOS 7
For a senior DevOps engineer, few errors are as frustratingly terse as “Too Many Open Files.” When Terraform, your infrastructure-as-code orchestrator, throws this error on a CentOS 7 system, it’s a clear signal that you’ve hit an operating-system resource limit. This guide walks you through diagnosing and permanently resolving the issue.
## 1. The Root Cause: Why This Happens on CentOS 7
The “Too Many Open Files” error, often presented as `Error: resource temporarily unavailable` or a similar message mentioning file descriptors, indicates that the process running Terraform (or a child process it spawns) has exceeded its allowed number of open file descriptors (FDs).
**What are File Descriptors?** In Linux, almost everything is treated as a file. A file descriptor is an abstract handle a process uses to request I/O operations from the operating system. This includes:
- Actual files on disk (e.g., `.tf` files, state files, provider binaries).
- Network sockets (for communicating with cloud APIs, databases, etc.).
- Pipes.
- Devices.
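To see file-descriptor usage concretely, you can inspect a process's entries under `/proc`. A minimal sketch, using the current shell's PID (`$$`) as a stand-in; substitute the PID of a running `terraform` process in practice:

```bash
# Each entry in /proc/<pid>/fd is one open file descriptor.
# $$ is the current shell's PID; substitute a terraform PID in practice.
ls /proc/$$/fd | wc -l

# The symlink targets show what each descriptor points at
# (regular files, sockets, pipes, devices):
ls -l /proc/$$/fd
```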
**Why Terraform Hits This Limit:** Terraform can be quite demanding on file descriptors, especially in complex environments:
- Numerous Resources/Modules: Each resource or data source may involve opening connections or reading configuration files.
- Multiple Providers: Each provider often maintains its own set of connections to various cloud APIs, which consume FDs.
- Large State Files: Reading and writing large `terraform.tfstate` files can be resource-intensive.
- Parallelism: When Terraform operates in parallel across many resources, it can rapidly open a significant number of network connections and local files simultaneously.
- Child Processes: Terraform might invoke external tools (e.g., the `kubectl`, `aws`, or `az` CLIs), and these child processes also consume FDs that count towards the parent’s or user’s limits.
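Independently of any OS tuning, you can reduce how many descriptors Terraform opens at once with its `-parallelism` flag (default 10), which caps concurrent resource operations. A small sketch, using a hypothetical `tf_throttled` helper name:

```bash
# tf_throttled: run a terraform subcommand with reduced concurrency.
# Fewer parallel operations means fewer simultaneous sockets and open files.
tf_throttled() {
  local subcommand=$1
  local workers=${2:-5}   # half of Terraform's default of 10
  terraform "$subcommand" -parallelism="$workers"
}

# Usage: tf_throttled apply 5
```

This does not fix the underlying limit, but it can keep a large run under the ceiling while you apply the permanent changes below.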
**CentOS 7’s Role:** CentOS 7 is configured by default with conservative limits for open file descriptors. For regular users, the soft limit (`ulimit -n`) is often 1024 and the hard limit (`ulimit -Hn`) 4096. While sufficient for many general-purpose tasks, these values are frequently too low for a sophisticated orchestration tool like Terraform managing large infrastructure.
## 2. Quick Fix (CLI)
Before diving into permanent configuration changes, you can apply a temporary fix to allow your current shell session to run Terraform. This is useful for immediate operations or testing.
- **Check Current Limits:**

  ```bash
  ulimit -n    # soft limit
  ulimit -Hn   # hard limit
  ```
- **Increase Limits for the Current Session:** You cannot set the soft limit higher than the hard limit, and only root may raise the hard limit. Beware that `sudo sh -c "ulimit -Hn 65536"` raises the limit only inside that throwaway subshell, not in your shell; to raise the limits of your *current* shell, use `prlimit` (part of util-linux on CentOS 7) instead:

  ```bash
  # As a non-root user you can raise only the soft limit, up to the hard limit:
  ulimit -n 4096

  # To raise both limits of the current shell (PID $$), use prlimit as root:
  sudo prlimit --pid $$ --nofile=65536:65536
  ```

  - Note: Choose a value appropriate for your workload; `65536` is a common starting point for heavy usage.
  - Caveat: This change applies only to the current shell session. It reverts to the default when you close the terminal or start a new session.
- **Run Terraform:** Execute your `terraform plan` or `terraform apply` command within the same shell session where you increased the limits.
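The temporary bump can be wrapped in a small launcher so interactive runs always start with the raised soft limit. A sketch, assuming a hypothetical `run-terraform.sh` wrapper:

```bash
#!/usr/bin/env bash
# run-terraform.sh (hypothetical name): raise the soft nofile limit for this
# process, then hand off to terraform. A non-root user can only raise the
# soft limit up to the current hard limit, so clamp the target value.
WANT=65536
HARD=$(ulimit -Hn)
if [ "$HARD" != "unlimited" ] && [ "$WANT" -gt "$HARD" ]; then
  WANT=$HARD   # clamp: we may not exceed the hard limit without root
fi
ulimit -n "$WANT"
echo "soft nofile limit for this run: $(ulimit -n)"

# Hand off to terraform (uncomment in real use):
# exec terraform "$@"
```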
## 3. Configuration Check (Permanent Solutions)
For a lasting solution, you need to modify system-wide configuration files. The appropriate files depend on whether Terraform is run by a specific user, as part of a service, or requires a system-wide maximum increase.
### 3.1. User-Specific / System-Wide Limits (`/etc/security/limits.conf`)
This file allows you to define resource limits for specific users or groups upon login. This is the most common place to adjust limits for interactive users or CI/CD user accounts.
- **Edit `limits.conf`:**

  ```bash
  sudo vim /etc/security/limits.conf
  ```

- **Add or Modify Entries:** Add the following lines to the end of the file. The `*` applies to all users (except root). You can replace `*` with a specific username (e.g., `terraform_user`) or a group (e.g., `@devops`).

  ```
  # <domain>   <type>   <item>    <value>
  *            soft     nofile    65536
  *            hard     nofile    131072
  ```

  - `nofile`: the maximum number of open file descriptors.
  - `soft`: the limit the system actually enforces for the user.
  - `hard`: the ceiling up to which a user may raise the soft limit; non-root users cannot exceed it.

- **Save and Exit.**

- **Apply Changes:** For these changes to take effect, the user needs to log out and log back in. If this is for a CI/CD user, you may need to restart the CI/CD agent or the entire server, depending on its configuration.
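On CentOS 7, PAM also reads drop-in files from `/etc/security/limits.d/`, which keeps your change out of the main file. The sketch below stages such a fragment in a temporary file so it is safe to dry-run; the file name `90-terraform.conf` and the user `terraform_user` are illustrative placeholders:

```bash
# Stage a limits.d fragment; install it afterwards with, e.g.:
#   sudo install -m 0644 "$FRAGMENT" /etc/security/limits.d/90-terraform.conf
FRAGMENT=$(mktemp)
cat > "$FRAGMENT" <<'EOF'
# <domain>       <type>   <item>    <value>
terraform_user   soft     nofile    65536
terraform_user   hard     nofile    131072
EOF
cat "$FRAGMENT"
```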
### 3.2. Service-Specific Limits (systemd Unit Files)
If Terraform is executed as part of a systemd service (e.g., a Jenkins agent, GitLab runner, or a custom application daemon that invokes Terraform), you need to adjust the systemd service unit directly: the limits defined in `limits.conf` are applied via PAM at login and therefore generally do not apply to services started by systemd.
- **Identify the Service:** Determine the name of the systemd service that runs Terraform (e.g., `jenkins.service`, `gitlab-runner.service`).

- **Edit the Service Unit File:** It’s best practice to create an override file rather than modifying the main unit file, which could be overwritten during updates.

  ```bash
  sudo systemctl edit <service_name>.service
  ```

  This opens an editor for `/etc/systemd/system/<service_name>.service.d/override.conf`.

- **Add `LimitNOFILE`:** Add the following lines:

  ```ini
  [Service]
  LimitNOFILE=65536
  ```

  - Note: `LimitNOFILE` sets both the soft and hard limits to the specified value.

- **Save and Exit.**

- **Reload systemd and Restart the Service:**

  ```bash
  sudo systemctl daemon-reload
  sudo systemctl restart <service_name>.service
  ```
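If you prefer to avoid the interactive editor (e.g., in a provisioning script), the same override can be written directly. The sketch below stages the drop-in under a scratch directory so it is safe to dry-run; in real use the target is `/etc/systemd/system/<service_name>.service.d/override.conf`, written as root, and `gitlab-runner.service` is a stand-in name:

```bash
# Stage a systemd override drop-in (gitlab-runner.service is illustrative).
DROPIN_DIR="$(mktemp -d)/gitlab-runner.service.d"
mkdir -p "$DROPIN_DIR"
cat > "$DROPIN_DIR/override.conf" <<'EOF'
[Service]
LimitNOFILE=65536
EOF
cat "$DROPIN_DIR/override.conf"

# Real deployment (requires root):
#   sudo mkdir -p /etc/systemd/system/<service_name>.service.d
#   sudo cp override.conf /etc/systemd/system/<service_name>.service.d/
#   sudo systemctl daemon-reload && sudo systemctl restart <service_name>.service
```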
### 3.3. Kernel-Level Maximum (`/etc/sysctl.conf`)
There’s a global, system-wide maximum for file descriptors that the entire kernel can allocate (`fs.file-max`). While rare, a very busy server with many processes, each opening many files, can hit this limit. All user and service limits are ultimately capped by `fs.file-max`.
- **Check Current Kernel Limit:**

  ```bash
  cat /proc/sys/fs/file-max
  ```

  On CentOS 7 this is often `200000` or higher by default, which is usually sufficient.

- **Edit `sysctl.conf`:**

  ```bash
  sudo vim /etc/sysctl.conf
  ```

- **Add or Modify Entry:**

  ```
  fs.file-max = 200000
  ```

  - Note: Choose a value significantly higher than your per-process or per-user limits; `200000` to `500000` is more than enough for most systems.

- **Save and Exit.**

- **Apply Changes:**

  ```bash
  sudo sysctl -p
  ```

  This loads the settings from `/etc/sysctl.conf` immediately, so no reboot is required; because the value lives in `sysctl.conf`, it also persists across reboots.
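To judge how close the system actually is to the `fs.file-max` ceiling, the kernel reports current allocation in `/proc/sys/fs/file-nr`:

```bash
# file-nr holds three numbers: allocated handles, free handles
# (always 0 on modern kernels), and the fs.file-max ceiling.
read -r allocated _ maximum < /proc/sys/fs/file-nr
echo "system-wide file handles in use: $allocated of $maximum"
```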
## 4. Verification
After applying your chosen configuration changes, it’s crucial to verify that they have taken effect and that Terraform can now run without issues.
- **Verify New Limits (for Users):**
  - Log out of your SSH session or terminal, then log back in.
  - Run `ulimit -n` and `ulimit -Hn` again to confirm the soft and hard limits are at your desired values (e.g., `65536` and `131072`).
- **Verify New Limits (for Services):** After restarting the systemd service, ask systemd for the effective limit (note that `systemctl status` does not print it; `systemctl show` does):

  ```bash
  systemctl show <service_name>.service -p LimitNOFILE
  ```

  Or, more robustly, find the PID of the running service process and check its limits directly:

  ```bash
  # Find the main PID (replace <service_name>)
  PID=$(systemctl show --property MainPID <service_name>.service | cut -d'=' -f2)
  # Check limits for that PID
  grep 'Max open files' /proc/$PID/limits
  ```

  You should see your configured `LimitNOFILE` value reflected in both the soft and hard limits.
- **Verify Kernel Limit:**

  ```bash
  cat /proc/sys/fs/file-max
  ```

  This should reflect the value you set in `sysctl.conf`.
- **Run Terraform:** Execute the problematic `terraform plan` or `terraform apply` command again. It should now complete successfully without hitting the “Too Many Open Files” error.
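The checks above can be collapsed into one quick audit run from the shell (or user account) that launches Terraform; `$$` below stands in for the PID of a running `terraform` process:

```bash
echo "per-process soft/hard nofile : $(ulimit -Sn) / $(ulimit -Hn)"
echo "kernel fs.file-max           : $(cat /proc/sys/fs/file-max)"
# Per-PID view; substitute a terraform PID for $$ to audit a live run:
grep 'Max open files' "/proc/$$/limits"
```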
By systematically addressing these limits, you’ll ensure your CentOS 7 system can handle the resource demands of complex Terraform operations, allowing you to manage your infrastructure without unnecessary interruptions.