How to Fix Python 504 Gateway Timeout on Ubuntu 20.04


A “504 Gateway Timeout” served in front of a Python application on an Ubuntu 20.04 server is a common signal that your web stack is under strain or misconfigured. This guide walks you through the diagnostic and resolution steps.


Troubleshooting Python 504 Gateway Timeout on Ubuntu 20.04

A 504 Gateway Timeout indicates that an upstream server, crucial for fulfilling a client’s request, failed to respond in a timely manner. In a typical Python web application setup on Ubuntu 20.04, this almost invariably points to a communication breakdown between your reverse proxy (Nginx or Apache) and your Python WSGI application server (Gunicorn, uWSGI, etc.).

1. Root Causes

On Ubuntu 20.04, a Python 504 Gateway Timeout most commonly arises from one of these scenarios:

  • Upstream Application Slowness: Your Python application itself is taking too long to process a request. This could be due to complex computations, long-running database queries, blocking I/O operations, or slow external API calls (a minimal reproducer of this scenario follows this list).
  • WSGI Server Bottleneck/Timeout: The WSGI server (e.g., Gunicorn or uWSGI) is configured with too few workers, or its internal timeout is set lower than the time required for your application to respond. Consequently, the WSGI server kills the worker process before it can send a response back.
  • Reverse Proxy Timeout: Nginx or Apache, acting as the reverse proxy, has a proxy_read_timeout (or equivalent) configured too low. It gives up waiting for a response from the WSGI server before the WSGI server or Python application has completed its task.
  • Resource Exhaustion: The server is experiencing high CPU usage, insufficient RAM, or heavy disk I/O, leading to overall system slowdowns that impact your application’s response time.
  • Deadlocked Application: A bug in your Python application causes a process to freeze or deadlock, preventing it from ever responding.
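
To see how these failure modes interact, it helps to reproduce the timeout on purpose. The sketch below is a minimal reproducer, assuming Flask purely for illustration (any WSGI framework behaves the same way): a route that blocks longer than the default Gunicorn (30 s) and Nginx (60 s) timeouts will reliably draw a 504 from the proxy.

    # slow_app.py: a hypothetical module used only to reproduce the problem
    import time

    from flask import Flask

    app = Flask(__name__)

    @app.route("/slow")
    def slow():
        # Stands in for a heavy computation, slow query, or blocking external call.
        # With default settings, Gunicorn kills this worker at 30 seconds and the
        # reverse proxy answers the client with 504 Gateway Timeout.
        time.sleep(90)
        return "finally done"

Serve this module through the same Gunicorn/Nginx chain and request /slow: it should fail with a 504 before any tuning, and return 200 OK once the timeouts below are raised past 90 seconds.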

2. Quick Fix (CLI)

Before diving into configuration files, these commands can help diagnose or temporarily alleviate the issue:

  1. Check Application Logs:

    • For Nginx: sudo tail -f /var/log/nginx/error.log
    • For Gunicorn (assuming systemd service gunicorn.service): sudo journalctl -u gunicorn -f
    • For uWSGI (assuming systemd service uwsgi.service): sudo journalctl -u uwsgi -f
    • Look for specific errors, worker timeouts, or resource warnings.
  2. Verify Service Status:

    • Check your WSGI server: sudo systemctl status gunicorn (or uwsgi)
    • Check your reverse proxy: sudo systemctl status nginx (or apache2)
    • Ensure all necessary services are running without errors.
  3. Restart Services (for transient issues):

    • sudo systemctl restart gunicorn (or uwsgi)
    • sudo systemctl restart nginx (or apache2)
    • This can often clear temporary deadlocks or resource leaks.
  4. Monitor System Resources:

    • top or htop: Check CPU and memory usage. High load averages or memory swapping can indicate resource bottlenecks.
    • free -h: Verify available RAM.
    • df -h: Check disk space, as a full disk can cause various issues.

3. Configuration Check

Adjusting timeout settings in your reverse proxy and WSGI server configurations is often the most direct solution.

A. Reverse Proxy Configuration

i. Nginx

Nginx’s default proxy_read_timeout is 60 seconds. If your Python application regularly takes longer, Nginx will issue a 504.

  • File Location: Typically in your site’s configuration file: /etc/nginx/sites-available/your_app_domain

  • Parameters to Adjust:

    • proxy_read_timeout: Timeout for reading a response from the upstream server.
    • proxy_connect_timeout: Timeout for connecting to the upstream server.
    • proxy_send_timeout: Timeout for sending a request to the upstream server.
  • Example Configuration (for a site file in sites-available, place these directives in the server block; to apply them to every site, set them in the http block of /etc/nginx/nginx.conf instead):

    server {
        ...
        proxy_read_timeout 300s;  # Increase to 5 minutes
        proxy_connect_timeout 300s;
        proxy_send_timeout 300s;

        location / {
            include proxy_params;
            proxy_pass http://unix:/run/gunicorn.sock; # Adjust for your WSGI server
            # You can also set these timeouts here if they're specific to this location
            # proxy_read_timeout 300s;
            # proxy_connect_timeout 300s;
            # proxy_send_timeout 300s;
        }
    }
  • Apply Changes:

    1. Test Nginx configuration: sudo nginx -t
    2. Reload Nginx: sudo systemctl reload nginx

ii. Apache (with mod_proxy)

If you’re using Apache as a reverse proxy (with mod_proxy and mod_proxy_http enabled, e.g. via sudo a2enmod proxy proxy_http), you’ll need to adjust ProxyTimeout.

  • File Location: Typically in your virtual host configuration: /etc/apache2/sites-available/your_app_domain.conf
  • Parameter to Adjust: ProxyTimeout
  • Example Configuration (inside your <VirtualHost> block):
    <VirtualHost *:80>
        ServerName your_app_domain
        ProxyPreserveHost On

        <Proxy *>
            # Apache 2.4 (Ubuntu 20.04) access control
            Require all granted
        </Proxy>

        ProxyPass / unix:/run/gunicorn.sock|http://localhost:8000/ timeout=300
        ProxyPassReverse / unix:/run/gunicorn.sock|http://localhost:8000/

        # Alternatively, set a single timeout for the whole virtual host
        ProxyTimeout 300
    </VirtualHost>
  • Apply Changes:
    1. Test Apache configuration: sudo apachectl configtest
    2. Reload Apache: sudo systemctl reload apache2

B. WSGI Server Configuration

i. Gunicorn

Gunicorn’s default worker timeout is 30 seconds. This is a common culprit for 504 errors if your application has any long-running requests.

  • Configuration Location: Often within your systemd service file (/etc/systemd/system/gunicorn.service) or a dedicated Gunicorn configuration file (a sketch of the latter follows at the end of this subsection).

  • Parameters to Adjust:

    • --timeout <seconds>: The maximum number of seconds that a worker will be allowed to handle a request.
    • --workers <count>: Number of worker processes. Too few workers can cause requests to queue up and timeout. A common rule of thumb is (2 * CPU_CORES) + 1.
  • Example in gunicorn.service:

    [Service]
    User=youruser
    Group=www-data
    WorkingDirectory=/path/to/your/app
    # --timeout 120 allows up to 2 minutes per request (the default is 30 seconds)
    ExecStart=/path/to/your/venv/bin/gunicorn \
              --access-logfile - \
              --workers 3 \
              --timeout 120 \
              --bind unix:/run/gunicorn.sock \
              your_app_module:app
  • Apply Changes:

    1. Reload systemd daemon: sudo systemctl daemon-reload
    2. Restart Gunicorn: sudo systemctl restart gunicorn
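
As an alternative to packing flags into the unit file, Gunicorn can read the same settings from a Python configuration file passed with -c. The sketch below assumes an illustrative path and derives the worker count from the rule of thumb above:

    # /etc/gunicorn/gunicorn.conf.py (illustrative path), used as:
    #   gunicorn -c /etc/gunicorn/gunicorn.conf.py your_app_module:app
    import multiprocessing

    bind = "unix:/run/gunicorn.sock"
    # (2 * CPU cores) + 1, per the rule of thumb above
    workers = multiprocessing.cpu_count() * 2 + 1
    # Allow up to 2 minutes per request instead of the 30-second default
    timeout = 120
    accesslog = "-"

Command-line flags take precedence over the configuration file, so avoid setting --timeout in both places.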

ii. uWSGI

uWSGI offers similar timeout settings.

  • Configuration Location: Usually in a .ini file, e.g., /etc/uwsgi/apps-available/your_app.ini

  • Parameters to Adjust:

    • harakiri = <seconds>: Workers stuck for longer than this will be killed and recycled. This is the main worker timeout.
    • http-timeout = <seconds>: Timeout for requests served directly by uWSGI’s built-in HTTP router; less relevant when Nginx or Apache sits in front.
    • socket-timeout = <seconds>: Timeout for the communication socket between the reverse proxy and uWSGI.
    • processes = <count>: Number of worker processes.
  • Example in your_app.ini:

    [uwsgi]
    module = your_app_module:app
    callable = app
    master = true
    processes = 3
    socket = /run/uwsgi/your_app.sock
    chmod-socket = 666
    vacuum = true
    # Kill and recycle workers stuck for longer than 120 seconds
    harakiri = 120
    socket-timeout = 120
  • Apply Changes:

    1. Restart uWSGI: sudo systemctl restart uwsgi (or your specific uWSGI app service)

C. Python Application Code

While not a configuration file, addressing the root cause within your Python code is crucial for long-term stability.

  • Identify Bottlenecks: Use profiling tools (e.g., cProfile, py-spy, APM solutions) to pinpoint functions or database queries that consume the most time.
  • Optimize Database Queries: Ensure efficient indexing, reduce N+1 queries, and use select_related/prefetch_related in ORMs.
  • External API Calls: Implement proper timeouts for external API calls within your code. Consider caching external responses (see the first sketch after this list).
  • Asynchronous Tasks: For genuinely long-running operations (e.g., image processing, report generation, sending bulk emails), offload them to an asynchronous task queue (e.g., Celery with RabbitMQ or Redis) rather than blocking the HTTP request-response cycle (see the second sketch after this list).
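
For the external API case, the essential step is to bound how long a worker can wait. A minimal sketch, assuming the requests library and a placeholder URL and fallback:

    import requests

    def fetch_exchange_rates():
        # Hypothetical helper: the URL and timeout budget are placeholders.
        # timeout=(connect, read) keeps one slow upstream from pinning a
        # Gunicorn/uWSGI worker until the proxy gives up with a 504.
        try:
            resp = requests.get("https://api.example.com/rates", timeout=(3, 5))
            resp.raise_for_status()
            return resp.json()
        except requests.Timeout:
            # Fall back to a cached or default value rather than blocking the request
            return None

For work that genuinely needs minutes, return a response quickly and do the heavy lifting in the background. A minimal Celery sketch, assuming a Redis broker and a placeholder task:

    # tasks.py: hypothetical Celery setup with a Redis broker
    from celery import Celery

    celery_app = Celery("tasks", broker="redis://localhost:6379/0")

    @celery_app.task
    def generate_report(report_id):
        # The long-running work lives here, outside the HTTP request/response cycle.
        ...

    # In the view: enqueue and respond immediately (e.g., HTTP 202 Accepted)
    # generate_report.delay(report_id)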

4. Verification

After making any configuration changes:

  1. Test the Endpoint: Use curl or your web browser to access the endpoint that was previously timing out.

    curl -v http://your_domain.com/path/to/slow/endpoint

    Look for a 200 OK response. (A scripted version of this check follows this list.)

  2. Monitor Logs Continuously: Keep an eye on both your Nginx/Apache error logs and your WSGI server logs during testing:

    • sudo tail -f /var/log/nginx/error.log
    • sudo journalctl -u gunicorn -f (or uwsgi)

    Ensure no new 504 errors or upstream communication failures appear.
  3. Performance Monitoring: During heavy load, continue to use top, htop, or more advanced APM tools to observe CPU, memory, and process activity. This helps confirm that your adjustments haven’t just moved the bottleneck elsewhere or created new resource constraints.
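
If you prefer to script the check instead of reading curl output by hand, here is a small sketch, assuming the requests library and a placeholder URL:

    import time

    import requests

    URL = "http://your_domain.com/path/to/slow/endpoint"  # placeholder

    start = time.monotonic()
    # Client-side timeout set slightly above the new server-side limits (300 s here)
    resp = requests.get(URL, timeout=310)
    elapsed = time.monotonic() - start

    print(f"{resp.status_code} in {elapsed:.1f}s")
    assert resp.status_code == 200, "endpoint is still failing"

From the shell, curl -s -o /dev/null -w "%{time_total}\n" against the same URL reports the total request time directly.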

By systematically working through these steps, you should be able to identify and resolve the Python 504 Gateway Timeout on your Ubuntu 20.04 server. Remember to apply changes incrementally and test thoroughly.