How to Fix “Nginx Too Many Open Files” on AWS EC2

Encountering “Too Many Open Files” errors (reported as errno 24, EMFILE, in Nginx logs) is a clear indicator that your Nginx server is hitting file descriptor limits. While this isn’t an AWS-specific error, it’s a common challenge on EC2 instances as traffic scales, because default operating system limits are conservative. This guide walks you through diagnosing and resolving the issue efficiently.


1. The Root Cause

The “Too Many Open Files” error occurs when a process (in this case, Nginx) attempts to open more file descriptors than the operating system or its own configured limits allow. Each active connection, static file served, log file, socket, and even internal Nginx communication channel consumes a file descriptor (FD).

On AWS EC2 instances, the default per-process nofile (number of open files) limit is often conservative (typically a soft limit of 1024). While sufficient for basic workloads, a high-traffic Nginx server can quickly exceed it:

  • High Concurrency: Many concurrent client connections.
  • Proxying: Numerous upstream connections to application servers.
  • Static Assets: Serving a large number of static files simultaneously.
  • Logging: Open log files.

When Nginx tries to exceed this nofile limit, it fails to open new files or accept new connections, leading to errors like accept() failed (24: Too many open files) in your Nginx error logs, resulting in client connection failures and service degradation.
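
Two different ceilings are involved here: the kernel-wide maximum (fs.file-max) and the per-process nofile limit that Nginx actually exhausts. As a quick illustration, you can inspect the system-wide side on any Linux host:

    cat /proc/sys/fs/file-nr
    # Prints three numbers: allocated FDs, allocated-but-unused FDs, and the kernel maximum (fs.file-max).
    # The per-process nofile limit, checked in Section 2.2, is usually the one a busy Nginx server hits first.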


2. Quick Fix (CLI)

Before diving into persistent configuration, let’s identify the problem and understand the current state.

2.1 Identify Symptoms

Check your Nginx error logs, typically located at /var/log/nginx/error.log, for entries similar to:

2023/10/27 10:30:05 [crit] 12345#0: *123456 connect() to 127.0.0.1:8080 failed (24: Too many open files) while connecting to upstream
2023/10/27 10:30:05 [alert] 12345#0: *123456 open file "/path/to/some/file.html" failed (24: Too many open files)
2023/10/27 10:30:05 [emerg] 12345#0: open() "/var/log/nginx/access.log" failed (24: Too many open files)
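
To gauge how frequently the error is occurring, a simple count over the error log (assuming the default path shown above) is often enough:

    sudo grep -c "Too many open files" /var/log/nginx/error.log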

2.2 Diagnose Current Limits

  1. Find Nginx Master Process PID:

    pgrep nginx
    # This will typically return a few PIDs; the lowest one is usually the master process.
    # For example: 12345
  2. Check Actual Limits for the Running Nginx Process:

    cat /proc/<nginx_master_pid>/limits
    # Replace <nginx_master_pid> with the PID you found (e.g., 12345)
    # Look for the 'Max open files' line:
    # Max open files            1024                 1024                 files

    This output shows the Soft and Hard limits for the Nginx process. If these values are low (e.g., 1024 or 4096), you’ve found your culprit.
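
Because the worker processes, not just the master, are the ones accepting connections and opening files, it is worth checking the limit across every Nginx PID in one pass, for example:

    # Print the 'Max open files' line for every Nginx process (master and workers).
    for pid in $(pgrep nginx); do
        echo "PID $pid:"
        sudo grep 'Max open files' /proc/$pid/limits
    done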

2.3 Temporary Relief (if possible)

A direct “quick fix” for a running Nginx process’s ulimit without restarting it is generally not feasible. The ulimit command sets limits for the current shell and processes spawned from it, not for already running services started by systemd or init.d.

However, if you must provide temporary relief before implementing persistent changes and a proper restart, your options are limited to:

  • Restart Nginx (after pre-configuring OS limits): If you quickly implement the OS-level changes (Section 3.1) and then restart Nginx, it will pick up the new limits. This is often the quickest path to resolution during an outage.
  • Scale Out (if load is the primary factor): Temporarily launch more EC2 instances behind your load balancer to distribute traffic and reduce the load on the affected instance. This doesn’t fix the underlying limit but buys time.
  • Reduce Traffic: Block non-essential traffic or enable maintenance mode if possible.
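
One partial exception to the above: on systems that ship util-linux’s prlimit(1), you can raise the nofile limit of the already-running master and worker processes as a stopgap. This is only a sketch; the change applies just to those PIDs, is lost on restart, and does not replace the persistent fix below:

    # Raise the soft and hard nofile limits for every running Nginx process.
    # Not persistent: lost as soon as the processes are restarted.
    for pid in $(pgrep nginx); do
        sudo prlimit --pid "$pid" --nofile=65536:65536
    done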

The real solution involves the persistent configuration changes discussed next, followed by a full Nginx restart.


3. Configuration Check (Persistent Fix)

To permanently resolve “Too Many Open Files,” you need to increase file descriptor limits at multiple levels.

3.1 Operating System Level

A. /etc/sysctl.conf (System-wide maximum files)

This defines the maximum number of file descriptors the entire kernel can allocate. While usually high enough, it’s good practice to ensure it accommodates your needs.

  1. Edit sysctl.conf:
    sudo vi /etc/sysctl.conf
  2. Add or modify the following line (2097152 is a common high value; adjust as needed):
    fs.file-max = 2097152
  3. Apply changes:
    sudo sysctl -p
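
You can read the value back to confirm the kernel picked it up:

    sysctl fs.file-max
    # Expected output: fs.file-max = 2097152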

B. systemd Service Unit File (Recommended for Nginx on modern Linux)

For services managed by systemd (common on modern Linux distributions such as Amazon Linux 2, Ubuntu 18.04+, and RHEL 7+), limits from /etc/security/limits.conf are not applied, because that file is processed by PAM at login rather than by systemd. Setting the limit in the service unit is therefore the most robust way to raise it for Nginx.

  1. Create an override file for the Nginx service:

    sudo systemctl edit nginx

    This will open a new file (e.g., /etc/systemd/system/nginx.service.d/override.conf).

  2. Add the following content:

    [Service]
    LimitNOFILE=65536
    # Also good to increase the process limit for some setups:
    LimitNPROC=65536
    • LimitNOFILE: Sets both Soft and Hard nofile limits.
    • 65536: A commonly recommended value for high-traffic servers. You might go higher (e.g., 131072 or 262144) depending on your application.
  3. Reload systemd to pick up the changes:

    sudo systemctl daemon-reload
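
To confirm systemd has loaded the override (the new limit is applied to the Nginx processes only when the service is restarted, as covered in Section 3.2), read back the unit property:

    systemctl show nginx --property=LimitNOFILE
    # Expected output: LimitNOFILE=65536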

C. /etc/security/limits.conf (Fallback/Legacy)

If systemd is not used, or as an additional safeguard, you can configure limits for the user Nginx runs as (e.g., nginx, www-data, ec2-user).

  1. Identify Nginx User:
    grep "user" /etc/nginx/nginx.conf
    # Typically: user www-data; or user nginx;
  2. Edit limits.conf:
    sudo vi /etc/security/limits.conf
  3. Add the following lines (replace nginx_user with the actual Nginx user):
    nginx_user  soft  nofile  65536
    nginx_user  hard  nofile  65536
    • soft: The current limit, which can be temporarily increased by a user up to the hard limit.
    • hard: The absolute ceiling for the nofile limit.
    • Note: Changes in limits.conf are applied by PAM at login, so they only take effect in new login sessions (log out and back in; a reboot also works). They do not apply to services started directly by systemd, which is why the override in Section 3.1B is the more direct approach for Nginx.
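
If you do rely on limits.conf, one way to spot-check it is to open a fresh login shell as the Nginx user and read the limits back. This sketch assumes your su PAM stack applies pam_limits (configuration varies by distribution) and forces a usable shell, since service accounts often default to nologin:

    # Replace nginx_user with the actual user; -l starts a new login session so PAM re-reads limits.conf.
    sudo su -l -s /bin/bash -c 'ulimit -Sn; ulimit -Hn' nginx_user
    # Expected output: 65536 on both lines.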

3.2 Nginx Configuration Level

After adjusting OS limits, you should also configure Nginx to take advantage of them.

  1. Edit Nginx’s main configuration file:

    sudo vi /etc/nginx/nginx.conf
  2. Add worker_rlimit_nofile (in the main context, outside the http block): This directive tells Nginx to raise the nofile limit of its worker processes. Keep it equal to or below the hard nofile limit set at the OS/systemd level.

    # /etc/nginx/nginx.conf
    
    worker_rlimit_nofile 65536; # Match or be lower than OS hard limit
    
    events {
        worker_connections 4096; # Max connections per worker process
                                # This should be about half of worker_rlimit_nofile
                                # e.g., if you have 8 worker processes and worker_connections=4096,
                                # Nginx can handle 8 * 4096 = 32768 active connections.
                                # Each proxied connection uses at least 2 FDs (client + upstream).
    }
    
    http {
        # ... your existing http configuration
    }
  3. Test Nginx configuration for syntax errors:

    sudo nginx -t

    You should see “syntax is ok” and “test is successful” in the output. (A check of the full merged configuration is shown after this list.)

  4. Restart Nginx to apply changes:

    sudo systemctl restart nginx
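
If your configuration is spread across multiple include files, nginx -T (which tests and then dumps the full merged configuration) is a convenient way to confirm both directives ended up where you expect:

    sudo nginx -T 2>/dev/null | grep -E 'worker_rlimit_nofile|worker_connections'
    # Expected output (values from the example above):
    # worker_rlimit_nofile 65536;
    #     worker_connections 4096;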

4. Verification

After implementing the configuration changes, it’s crucial to verify that they have taken effect and that the problem is resolved.

  1. Check Nginx Service Status:

    sudo systemctl status nginx

    Ensure Nginx is running without errors.

  2. Verify Running Nginx Process Limits: Repeat the steps from the “Quick Fix” section to check the actual limits of the running Nginx processes.

    pgrep nginx # Get Nginx PIDs
    # For each PID, run:
    cat /proc/<nginx_pid>/limits

    You should now see Max open files reflecting your new, higher limit (e.g., 65536).

  3. Monitor Nginx Error Logs: Keep an eye on /var/log/nginx/error.log for any new “Too Many Open Files” errors, especially under load.

    tail -f /var/log/nginx/error.log
  4. Perform Load Testing (Recommended): Use a load testing tool (e.g., ApacheBench (ab), siege, JMeter, Locust) to simulate high traffic to your Nginx server. Monitor its behavior and resource usage (CPU, memory, open file descriptors) during the test.

    • To check open FDs for a given Nginx process: sudo lsof -p <nginx_pid> | wc -l, run for each PID returned by pgrep nginx. The workers (not the master) hold most connection FDs, and the count includes log files, sockets, and libraries, so it will be higher than the number of active connections.
    • To check active connections: netstat -an | grep ":80\|:443" | grep ESTABLISHED | wc -l (on newer systems without net-tools, the equivalent is ss -tan '( sport = :80 or sport = :443 )' | grep -c ESTAB).
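
During a load test, a small watch loop (a sketch; adjust the interval to your needs) makes it easy to see FD usage climb against the new limit:

    # Print a timestamped FD count for every Nginx process every 5 seconds.
    while true; do
        for pid in $(pgrep nginx); do
            printf '%s PID %s: %s open FDs\n' "$(date +%T)" "$pid" "$(sudo ls /proc/$pid/fd | wc -l)"
        done
        sleep 5
    done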

By following these steps, you should effectively resolve the “Nginx Too Many Open Files” error on your AWS EC2 instances, ensuring stable and performant service delivery. Remember to always test configuration changes in a staging environment before deploying to production.