How to Fix Terraform Too Many Open Files on Azure VM


Troubleshooting “Terraform Too Many Open Files” on Azure VMs

Encountering the “Too many open files” error while running Terraform is a clear signal that your operating system’s default resource limits are clashing with the demands of modern infrastructure as code. This guide walks you through diagnosing and resolving this common issue when managing Azure resources from a Linux-based Azure VM.


1. The Root Cause: Why This Happens

Operating systems, by default, impose limits on the number of file descriptors (also known as file handles) that any single process can open. These limits are designed to prevent a runaway process from consuming all available system resources.

Terraform, especially when managing a large number of resources, complex configurations, or interacting with multiple providers and modules, can open a significant number of file descriptors. Each resource (VM, network interface, storage account, etc.), each provider plugin instance, temporary files, state files, and network connections all contribute to this count.

When Terraform’s internal operations exceed the configured “open files” limit for its process, the operating system denies further requests for file descriptors, leading to the dreaded Error: too many open files message. While specific to Azure VMs in this context, the underlying problem is an OS-level configuration, not an Azure-specific bug. Linux distributions commonly used on Azure VMs (Ubuntu, RHEL, CentOS) all have these configurable limits.
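
If you want to confirm the diagnosis before changing anything, the following minimal sketch (assuming a Linux VM where the process is named terraform and /proc is available) shows how many descriptors the running Terraform process currently holds and the limits it is actually subject to:

    # Find the most recently started terraform process
    TF_PID=$(pgrep -n terraform)
    # Count the file descriptors it currently has open
    ls /proc/"$TF_PID"/fd | wc -l
    # Compare against the limits the process is running under
    grep "Max open files" /proc/"$TF_PID"/limits

If the open-descriptor count is close to the soft limit reported by the last command, you have found your culprit.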


2. Quick Fix (CLI): Temporary Limit Increase

For immediate relief and to confirm that file descriptor limits are indeed the problem, you can temporarily increase the limits for your current shell session. This is useful for testing or for one-off Terraform runs.

  1. Check Current Limits: Before making changes, see what your current limits are:

    ulimit -n # Displays the current soft limit for open files
    ulimit -Hn # Displays the current hard limit for open files

    You’ll likely see values like 1024 or 4096.

  2. Increase Limits: Use the ulimit command to increase both the soft and hard limits. The soft limit can never exceed the hard limit. A common recommendation for Terraform is 65536 or higher for complex environments.

    ulimit -Hn 65536 # Raise the hard limit first (only root can raise it above its current value)
    ulimit -Sn 65536 # Then raise the soft limit, up to the hard limit
    • Soft Limit (-Sn): The limit that is actively enforced for your current shell and processes launched from it.
    • Hard Limit (-Hn): The ceiling that the soft limit can be raised to. A non-root user can lower the hard limit, but only root can increase it.
  3. Verify Temporary Change: Run ulimit -n and ulimit -Hn again to ensure the changes have been applied to your session.

  4. Run Terraform: Now, execute your terraform plan or terraform apply command within the same terminal session where you set the ulimit.

Important: This fix is temporary. If you close the terminal, open a new one, or reboot the VM, these limits will revert to the system defaults.
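
If you prefer not to change your whole session, one option is to raise the limit only for the Terraform run itself by wrapping it in a child shell. This is a minimal sketch, assuming the hard limit already permits 65536 (or that you are running as root):

    # The ulimit change applies only inside this child shell
    bash -c 'ulimit -n 65536 && terraform plan'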


3. Configuration Check: Making Limits Permanent

To ensure Terraform runs reliably without manual intervention, you need to configure the system to apply higher limits persistently. This typically involves modifying system configuration files.

A. For Interactive Logins / System-wide

If you run Terraform from interactive login sessions (e.g., over SSH), modify /etc/security/limits.conf.

  1. Edit limits.conf: Open the file with sudo:

    sudo vim /etc/security/limits.conf

    Add the following lines to the end of the file:

    # Increase file descriptor limits for all users
    *    soft    nofile    65536
    *    hard    nofile    65536
    • *: Applies to all users except root. You can specify a particular username (e.g., azureuser) or group (e.g., @devops) if needed; if root also needs the higher limit, it must be listed explicitly.
    • soft: Sets the soft limit.
    • hard: Sets the hard limit.
    • nofile: Specifies the maximum number of open files.
    • 65536: The desired limit. Adjust as necessary, but this is a good starting point.
  2. Save and Exit: Save the changes and exit the editor.

  3. Re-login: For these changes to take effect, you must log out and log back in to the Azure VM. New sessions will inherit the new limits.
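
Note that limits.conf is applied at login by the pam_limits PAM module. It is enabled by default on most Ubuntu/RHEL/CentOS images, but if the new limits do not show up after re-login, it is worth checking. A short sketch:

    # Confirm the pam_limits module is referenced in the PAM configuration
    grep -rl pam_limits.so /etc/pam.d/

    # After logging back in, the new limits should be in effect
    ulimit -n   # expected: 65536
    ulimit -Hn  # expected: 65536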

B. For Systemd Services (e.g., CI/CD Agents)

If your Terraform operations are executed by a background service (e.g., a Jenkins agent, GitLab Runner, or a custom systemd service), modifying limits.conf is not sufficient: systemd does not apply limits.conf to services, and instead sets process limits through unit directives such as LimitNOFILE.

  1. Identify the Service: Determine the systemd service file responsible for running your Terraform workload (e.g., /etc/systemd/system/jenkins.service).

  2. Edit the Service File: Open the service unit file:

    sudo systemctl edit --full <service_name>
    # Or, if you know the file path:
    sudo vim /etc/systemd/system/<service_name>.service

    Locate the [Service] section and add or modify the LimitNOFILE directive:

    [Service]
    # ... other directives ...
    LimitNOFILE=65536
  3. Reload Systemd and Restart Service: After modifying the service file, you need to reload the systemd daemon and restart the service for the changes to take effect:

    sudo systemctl daemon-reload
    sudo systemctl restart <service_name>
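
As an alternative to editing the full unit file, you can keep the change in a drop-in override, which survives package upgrades that replace the original unit. This is a sketch of that approach; <service_name> stands for whatever unit runs your Terraform workload:

    # Opens an editor on an override file (e.g., .../<service_name>.service.d/override.conf)
    sudo systemctl edit <service_name>

    # In the editor, add only the directive you want to override:
    #   [Service]
    #   LimitNOFILE=65536

    # Reload, restart, and confirm the limit systemd will apply
    sudo systemctl daemon-reload
    sudo systemctl restart <service_name>
    systemctl show <service_name> -p LimitNOFILE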

C. Kernel-level Maximum (Less Common, but Important to Check)

While the above steps address per-process limits, there’s also a system-wide kernel limit on the total number of file handles available on the entire VM. If you’re setting very high per-process limits (e.g., >100K) and have many processes, you might hit this.

  1. Check Current Kernel Limit:

    cat /proc/sys/fs/file-max
  2. Increase Kernel Limit (if necessary): If your individual process limits are high and you suspect a system-wide bottleneck, you can increase this by editing /etc/sysctl.conf:

    sudo vim /etc/sysctl.conf

    Add or modify the line:

    fs.file-max = 200000 # Example: Set a higher value

    Apply the change:

    sudo sysctl -p
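
To see how close the whole VM is to the kernel ceiling, compare current allocation against the maximum; in file-nr, the first field is the number of allocated file handles and the last is the fs.file-max value:

    # allocated handles, allocated-but-unused handles, and the fs.file-max ceiling
    cat /proc/sys/fs/file-nr

    # Confirm the new ceiling after running sysctl -p
    sysctl fs.file-max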

4. Verification: How to Test

After implementing the permanent fixes, it’s crucial to verify that the new limits are indeed active and that Terraform can run successfully.

  1. Verify User/System Limits:

    • For interactive logins: Log out and log back in to the Azure VM. Then, run ulimit -n and ulimit -Hn. They should reflect your new 65536 limit.
    • For systemd services: Check the limit systemd applies to the unit with sudo systemctl show <service_name> -p LimitNOFILE; it should report LimitNOFILE=65536.
  2. Verify a Running Process’s Limits: If Terraform is currently running (e.g., during a plan or apply), you can inspect its actual limits:

    • Find the Process ID (PID) of the Terraform process:
      pgrep terraform
    • Inspect its limits:
      cat /proc/<PID>/limits
      Look for Max open files entries. The “soft limit” and “hard limit” should match your configured values.
  3. Run Terraform with Confidence: Execute your terraform plan or terraform apply commands. Terraform should now complete its operations without encountering the “Too many open files” error. If you have a particularly large or complex configuration, try running that specific configuration to truly stress-test the new limits.
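
If you want to guard against regressions (for example, a rebuilt VM image that resets the configuration), a small pre-flight wrapper can warn you before a large run. This is a hypothetical sketch; the 65536 threshold simply mirrors the value used throughout this guide:

    #!/usr/bin/env bash
    # Pre-flight check: warn if the soft open-files limit is below the target
    # used in this guide, then run the apply with any arguments passed through.
    required=65536
    current=$(ulimit -n)
    if [ "$current" -lt "$required" ]; then
        echo "WARNING: soft nofile limit is $current (< $required); large runs may fail." >&2
    fi
    terraform apply "$@"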

By following these steps, you’ll effectively eliminate the “Too many open files” error, allowing your Terraform deployments on Azure VMs to run smoothly and reliably.