How to Fix Ansible Timeout Error on Ubuntu 22.04


Troubleshooting Guide: Ansible Timeout Error on Ubuntu 22.04

An “Ansible Timeout Error” is a common yet frustrating problem. While the message is straightforward, its root cause can be multifaceted. This guide walks you systematically through diagnosing and resolving these timeouts when targeting Ubuntu 22.04 hosts from your Ansible controller.


1. The Root Cause: Why this happens on Ubuntu 22.04

An Ansible timeout error indicates that a connection attempt or a task execution exceeded a predefined time limit. On Ubuntu 22.04, this can stem from several common issues:

  • Network Latency or Congestion: The most frequent culprit. Slow or unstable network links between your Ansible controller and the Ubuntu 22.04 target host can cause SSH connection attempts or data transfer during task execution to exceed the default timeout.
  • Firewall Restrictions (UFW): Ubuntu 22.04 ships with ufw (Uncomplicated Firewall), which is inactive by default. Once it is enabled, if port 22 (SSH) isn’t explicitly open, or if other ports required for specific tasks are blocked, Ansible connections will hang and eventually time out.
  • SSH Server (sshd) Configuration on Target: The OpenSSH server on Ubuntu 22.04 has its own timeout mechanisms. If ClientAliveInterval or ClientAliveCountMax are too low, inactive connections might be prematurely terminated by the target, leading to Ansible timeouts. Conversely, an overloaded sshd or too many concurrent unauthenticated connections (the MaxStartups limit) can cause new connections to be dropped.
  • Target Host Resource Exhaustion: An Ubuntu 22.04 server struggling with high CPU, low memory, or slow disk I/O can delay command execution significantly, causing Ansible tasks to time out even if the initial SSH connection was successful.
  • Ansible’s Default Timeout Settings: Ansible itself has a default connection timeout of 10 seconds. For environments with higher latency or complex initial SSH handshakes, this default is often insufficient. Task execution timeouts are generally controlled by the module or by async operations.
  • DNS Resolution Issues: If your controller cannot quickly resolve the hostname of your Ubuntu 22.04 target, the SSH connection setup will delay, potentially leading to a timeout.
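Several of the causes above can be told apart by how a raw TCP connection to port 22 behaves. The following is a minimal sketch, not part of Ansible: `probe_tcp` is a hypothetical helper name, and it assumes bash (for the `/dev/tcp` pseudo-device) and the coreutils `timeout` command are available on the controller.

```shell
# Hypothetical helper: probe a TCP port with a hard time limit.
# Prints one of: open, timeout, failed
probe_tcp() {
  local host="$1" port="${2:-22}" limit="${3:-5}"
  # bash's /dev/tcp pseudo-device opens a TCP connection; coreutils `timeout`
  # kills the attempt after $limit seconds and then exits with status 124.
  timeout "$limit" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
  case $? in
    0)   echo "open" ;;     # something accepted the connection
    124) echo "timeout" ;;  # no answer within the limit
    *)   echo "failed" ;;   # refused, unreachable, or name not resolvable
  esac
}
```

For example, `probe_tcp your_ubuntu_host 22 5` can be run from the controller. As a rule of thumb, "timeout" tends to point at silently dropped packets (firewall or routing), while an immediate "failed" is more typical of sshd not listening or a hostname that does not resolve.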

2. Quick Fix (CLI)

Before diving into configuration files, these CLI-based steps can help you quickly identify and often mitigate timeout issues for immediate testing.

  1. Increase Ansible’s Connection Timeout (Ad-hoc Command): This is the fastest way to test if a longer timeout resolves the issue.

    # Test with a ping module, setting the connection timeout to 30 seconds
    ansible your_ubuntu_host -m ping -i inventory.ini --timeout 30

    Or, setting the same timeout through an environment variable:

    ANSIBLE_TIMEOUT=30 ansible your_ubuntu_host -m ping -i inventory.ini

    Replace your_ubuntu_host and inventory.ini with your actual host and inventory file.

  2. Verify SSH Connectivity Manually: Attempt to connect directly to the Ubuntu 22.04 host using SSH from your Ansible controller. Add verbose output (-vvv) for detailed debugging.

    ssh -vvv user@your_ubuntu_host

    Look for hangs, specific error messages, or delays in the output. This helps differentiate between an Ansible-specific issue and a fundamental SSH connectivity problem.

  3. Check Network Reachability: A simple ping can confirm basic network connectivity.

    ping -c 4 your_ubuntu_host

    High packet loss or very high latency (e.g., >200ms) indicates a network issue that needs addressing outside of Ansible.
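The reachability check can be turned into a pass/fail test by parsing the ping summary. This is a rough sketch under stated assumptions: `ping_summary` and `link_is_healthy` are hypothetical helper names, the parsing assumes the summary format printed by the Linux iputils `ping`, and the 200 ms default simply mirrors the figure mentioned above.

```shell
# Read `ping -c N host` output on stdin and print "<loss_percent> <avg_rtt_ms>".
# A "?" is printed for any value that could not be found.
ping_summary() {
  awk '
    /packet loss/ {
      for (i = 1; i <= NF; i++)
        if ($i ~ /%$/) { loss = $i; sub(/%/, "", loss) }
    }
    /^rtt|^round-trip/ {
      # e.g. "rtt min/avg/max/mdev = 12.3/23.4/34.5/5.6 ms"
      split($0, parts, "=")
      split(parts[2], v, "/")
      avg = v[2]
    }
    END {
      if (loss == "") loss = "?"
      if (avg == "") avg = "?"
      print loss, avg
    }
  '
}

# Read "<loss> <avg>" on stdin; succeed only with 0% loss and avg RTT <= max_ms.
link_is_healthy() {
  local max_ms="${1:-200}" loss avg
  read -r loss avg
  [ "$loss" = "0" ] && [ "$avg" != "?" ] &&
    awk -v a="$avg" -v m="$max_ms" 'BEGIN { exit !(a <= m) }'
}
```

A possible use is `ping -c 4 your_ubuntu_host | ping_summary | link_is_healthy 200 || echo "check the network first"`. Hosts that block ICMP will fail this check even when SSH works, so treat a failure as a hint rather than proof.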


3. Configuration Check

For persistent solutions, you’ll need to adjust configuration files on both your Ansible controller and the target Ubuntu 22.04 hosts.

3.1. Ansible Controller Configuration (ansible.cfg)

Edit your global ansible.cfg (e.g., /etc/ansible/ansible.cfg) or a project-specific ansible.cfg file.

  1. Increase timeout: Modify the timeout parameter in the [defaults] section to a higher value (e.g., 30 or 60 seconds).

    # /etc/ansible/ansible.cfg or project_path/ansible.cfg
    [defaults]
    timeout = 30

    Explanation: This sets the general timeout for various Ansible operations, including connection establishment.

  2. Configure SSH Arguments for Connection Stability: In the [ssh_connection] section, you can add specific SSH client arguments.

    [ssh_connection]
    # Use ControlPersist for faster subsequent connections
    ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o PreferredAuthentications=publickey -o ConnectTimeout=30
    # Enable pipelining for faster execution of tasks
    # Ensure 'requiretty' is disabled in sudoers if using pipelining with become
    pipelining = True

    Explanation:

    • ControlMaster=auto and ControlPersist=60s: This reuses an existing SSH connection, significantly speeding up subsequent tasks and reducing the chance of timeout on new connections. 60s keeps the master connection open for 60 seconds of inactivity.
    • ConnectTimeout=30: Explicitly sets the SSH client-side connection timeout to 30 seconds.
    • pipelining = True: Reduces the number of SSH operations per task by sending the module to the remote Python interpreter over the existing SSH session instead of first copying it to a temporary file. Requires requiretty to be disabled in sudoers on the target if using become.
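Because Ansible reads only the first configuration file it finds (ANSIBLE_CONFIG, then ./ansible.cfg, then ~/.ansible.cfg, then /etc/ansible/ansible.cfg), it is worth confirming that the file you edited actually contains the values you expect. The sketch below is a small INI reader for that purpose; `cfg_get` is a hypothetical helper, not an Ansible tool, and it handles only simple `key = value` lines.

```shell
# Hypothetical helper: print the value of KEY inside [SECTION] of an INI file.
# Usage: cfg_get FILE SECTION KEY
cfg_get() {
  awk -F '=' -v section="$2" -v key="$3" '
    /^[ \t]*\[/ {
      # Section header: strip the brackets and remember the section name
      gsub(/^[ \t]*\[|\][ \t]*$/, "")
      current = $0
      next
    }
    current == section {
      k = $1
      gsub(/^[ \t]+|[ \t]+$/, "", k)
      if (k == key) {
        # Keep everything after the first "=", since values may contain "="
        v = substr($0, index($0, "=") + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", v)
        print v
      }
    }
  ' "$1"
}
```

For instance, `cfg_get ./ansible.cfg defaults timeout` should print 30 after the change above. To see what Ansible itself resolved, `ansible-config dump --only-changed` lists every setting that differs from the defaults along with its source.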

3.2. Target Ubuntu 22.04 Host SSH Server Configuration (sshd_config)

If the issue is dropped connections, you might need to adjust the server-side SSH keep-alive settings on your Ubuntu 22.04 hosts.

  1. Edit sshd_config:

    sudo nano /etc/ssh/sshd_config

    Add or modify the following lines:

    # /etc/ssh/sshd_config
    ClientAliveInterval 60
    ClientAliveCountMax 5

    Explanation:

    • ClientAliveInterval 60: The sshd server will send a null packet to the client every 60 seconds if no data has been received from the client. This keeps the connection alive through network devices that might otherwise close idle connections.
    • ClientAliveCountMax 5: If 5 consecutive client alive messages are sent without a response from the client, sshd will disconnect the client.
    • Combined, this means the connection will be terminated after 60 * 5 = 300 seconds (5 minutes) of unresponsiveness. Adjust ClientAliveCountMax higher if your network is extremely unreliable, but be mindful of resource usage.

  2. Restart SSH Service: After modifying sshd_config, you must restart the SSH service for changes to take effect. On Ubuntu 22.04 the unit is named ssh; sshd is an alias for it.

    sudo systemctl restart ssh
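A syntax error in sshd_config can leave a host without SSH after a restart, so it may help to validate the file and confirm the effective keep-alive values first. The following is a sketch only: `keepalive_window` and `check_sshd_keepalive` are hypothetical helper names, while `sshd -t` (test mode) and `sshd -T` (extended test mode, which prints the effective configuration with lowercase keys) are standard OpenSSH options.

```shell
# Seconds of client silence sshd tolerates before disconnecting:
# ClientAliveInterval * ClientAliveCountMax
keepalive_window() {
  echo $(( $1 * $2 ))
}

# Validate the edited file, then show the keep-alive values sshd will use.
# Returns non-zero without printing values if the config has a syntax error.
check_sshd_keepalive() {
  sudo sshd -t || return 1
  sudo sshd -T | grep -i '^clientalive'
}
```

With the values above, `keepalive_window 60 5` prints 300, matching the 5 minutes described earlier. Running `check_sshd_keepalive` on the target before restarting the service is a cheap way to catch typos while your current session is still open.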

3.3. Target Ubuntu 22.04 Host Firewall (UFW)

Ensure SSH traffic is explicitly allowed on your Ubuntu 22.04 hosts.

  1. Check UFW Status:

    sudo ufw status

    Look for a rule allowing OpenSSH or port 22.

  2. Allow OpenSSH: If not allowed, enable it:

    sudo ufw allow OpenSSH
    # or if you prefer by port
    # sudo ufw allow 22/tcp
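
  3. Enable UFW (if disabled): If UFW is inactive, it is not filtering traffic and cannot be the cause of the timeout. If you choose to enable it, add the SSH allow rule first, because enabling activates all configured rules and can cut off your own SSH session.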

    sudo ufw enable
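The status check can also be scripted so that the allow rule is only added when it is missing. This is a minimal sketch: `ufw_allows_ssh` is a hypothetical helper, and it assumes the column layout printed by `ufw status` (To / Action / From) and SSH on the default port 22.

```shell
# Reads `ufw status` output on stdin.
# Exit 0: UFW is inactive, or an ALLOW/LIMIT rule for OpenSSH or port 22 exists.
# Exit 1: UFW is active and no such rule was found.
ufw_allows_ssh() {
  awk '
    /^Status: inactive/ { ok = 1 }
    ($2 == "ALLOW" || $2 == "LIMIT") &&
      ($1 == "OpenSSH" || $1 == "22" || $1 == "22/tcp") { ok = 1 }
    END { exit !ok }
  '
}
```

One way to use it on the target is `sudo ufw status | ufw_allows_ssh || sudo ufw allow OpenSSH`. If sshd listens on a non-standard port, the port numbers in the sketch would need to be adjusted.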

4. Verification

After making changes, it’s crucial to verify that the timeout issues are resolved.

  1. Rerun Your Original Playbook/Command: The most direct way to verify is to run the exact Ansible playbook or ad-hoc command that was previously timing out.

    # If using a playbook
    ansible-playbook your_playbook.yml -i inventory.ini
    
    # If using an ad-hoc command
    ansible your_ubuntu_host -m setup -i inventory.ini # (or your specific module)

  2. Monitor with Increased Verbosity: If the issue persists, run your Ansible command with -vvv for detailed output. This can reveal where exactly the process is hanging.

    ansible-playbook your_playbook.yml -i inventory.ini -vvv

    Examine the output for clues like “Authentication failed,” “Connection refused,” or long pauses at specific task points.

  3. Check ControlPersist (if enabled): If you enabled ControlPersist, check for the control socket shortly after a successful run:

    # Sockets live in ~/.ansible/cp by default; file names vary by Ansible version
    ls -l ~/.ansible/cp/

    A socket file in this directory indicates that ControlPersist is active and should speed up subsequent connections. With ControlPersist=60s the socket disappears after 60 seconds of inactivity, so an empty directory later on is expected.

  4. Observe System Logs: On the target Ubuntu 22.04 host, check the SSH daemon logs for connection issues:

    # The unit is named ssh on Ubuntu; -f follows new entries as they arrive
    sudo journalctl -u ssh -f

    Look for messages indicating failed authentication, dropped connections, or MaxStartups warnings.
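When reading -vvv output or logs, the error text usually narrows down which layer is failing. The sketch below maps a few well-known OpenSSH and Ansible messages to a likely cause. `classify_ssh_error` and its labels are hypothetical, and the mapping is a heuristic starting point rather than a definitive diagnosis.

```shell
# Map one line of ssh/Ansible error output to a likely cause (heuristic).
classify_ssh_error() {
  case "$1" in
    *"Connection timed out"*|*"Operation timed out"*)
      echo "network-or-firewall" ;;
    *"Connection refused"*)
      echo "sshd-not-listening" ;;
    *"Could not resolve hostname"*)
      echo "dns" ;;
    *"Permission denied"*|*"Authentication failed"*)
      echo "authentication" ;;
    *"waiting for privilege escalation prompt"*)
      echo "become-prompt" ;;
    *"kex_exchange_identification"*|*"Connection reset by peer"*)
      # Often seen when sshd drops connections early, e.g. under MaxStartups
      echo "sshd-dropping-connections" ;;
    *)
      echo "unknown" ;;
  esac
}
```

For example, passing the line `ssh: connect to host 10.0.0.5 port 22: Connection timed out` yields network-or-firewall, which points back to the network and UFW checks, while become-prompt suggests the SSH connection itself is fine and the delay is in sudo on the target.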

By systematically applying these steps, you should be able to diagnose and resolve most Ansible timeout errors when working with Ubuntu 22.04 hosts. Remember to test changes incrementally and revert if they introduce new issues.