How to Fix Ansible Timeout Error on CentOS 7
Troubleshooting Ansible Timeout Errors on CentOS 7
Few things frustrate a DevOps engineer more than an Ansible run stalling with a “timeout” error. The message is direct, but the underlying causes range from simple network hiccups to subtle system configuration issues. This guide focuses on diagnosing and resolving Ansible timeout issues when targeting CentOS 7 hosts.
1. The Root Cause: Why this Happens on CentOS 7
An Ansible timeout error signifies that the Ansible control node failed to establish an SSH connection or receive a response from the target host within the allotted time. On CentOS 7, several factors frequently contribute to this:
- Aggressive Firewall Rules (firewalld): CentOS 7 uses firewalld by default. A common scenario is that the target host’s firewall is blocking incoming SSH connections (port 22). This is especially true for freshly provisioned VMs or servers.
- SSH Daemon Configuration (sshd_config): The default SSH server configuration on CentOS 7 might have conservative settings for LoginGraceTime (how long the server waits for a login) or lack ClientAliveInterval and ClientAliveCountMax (SSH keep-alive mechanisms), leading to dropped connections or timeouts during prolonged operations, or even during the initial handshake if the network is slow.
- Network Latency or Packet Loss: High latency or significant packet loss between the Ansible control node and the CentOS 7 target can lead to timeouts because SSH handshakes and command executions take longer than expected.
- SELinux Interference: While less common for a pure “timeout” during initial connection, SELinux in enforcing mode can block SSH or subsequent Ansible operations, causing commands to hang and eventually time out if it prevents necessary file access or process execution.
- Target Host Resource Exhaustion: If the CentOS 7 target host is under severe load (CPU, memory, I/O), its ability to respond to SSH connection attempts or execute commands promptly can be impaired, leading to timeouts.
- Ansible Control Node SSH Client/Ansible Configuration: The Ansible control node itself might have a low default SSH timeout in its ansible.cfg or the underlying SSH client configuration (~/.ssh/config or /etc/ssh/ssh_config).
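The causes above tend to surface as different SSH client error strings, so a rough first triage can be scripted. The following is a minimal sketch, not a definitive diagnostic: classify_ssh_error is a hypothetical helper name, and the mapping reflects common behaviour (for example, firewalld rejecting traffic usually shows up as “No route to host”) rather than a guarantee.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map one SSH client error line to its most likely cause.
# The mapping is a heuristic, not an exhaustive or authoritative diagnosis.
classify_ssh_error() {
  local msg="$1"
  case "$msg" in
    *"Connection timed out"*|*"Operation timed out"*)
      echo "packets dropped or very slow path: check firewall DROP rules, routing, latency" ;;
    *"No route to host"*)
      echo "rejected or unroutable: check firewalld on the target and routing" ;;
    *"Connection refused"*)
      echo "nothing listening on the port: check that sshd is running" ;;
    *"Permission denied"*)
      echo "authentication problem, not a timeout: check keys and user" ;;
    *)
      echo "unrecognized: rerun with ssh -v and check target load and SELinux" ;;
  esac
}
```

For example, passing the last line of a failed ssh attempt (such as "ssh: connect to host 10.0.0.5 port 22: Connection refused", where the address is a placeholder) prints the sshd hint.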
2. Quick Fix (CLI)
These commands are for immediate diagnosis and temporary resolution. Use with caution in production environments.
- Verify Basic Connectivity: First, ensure the target host is reachable on the network and on the SSH port.

      # From your Ansible Control Node
      ping <target_host_ip>
      telnet <target_host_ip> 22
      ssh -v <user>@<target_host_ip>

  - ping: Checks basic ICMP reachability.
  - telnet: Confirms whether port 22 is open and listening. A “Connected” message is good.
  - ssh -v: Provides verbose output for the SSH connection attempt, invaluable for diagnosing SSH handshake issues. Look for “Permission denied”, “Connection timed out”, or “Connection refused” messages.
- Temporarily Disable Firewall (on Target CentOS 7 Host): If telnet fails, or ssh -v reports “No route to host” or “Connection timed out”, the firewall is a prime suspect. An immediate “Connection refused” more often means that sshd is not listening on the port, so check the service as well.

      # On the Target CentOS 7 Host (via console or alternative access)
      sudo systemctl stop firewalld
      sudo systemctl disable firewalld  # Optional, but prevents it from starting on reboot

  - Caution: This completely disables the firewall. Do not do this permanently in a production environment.
- Temporarily Set SELinux to Permissive Mode (on Target CentOS 7 Host): If the connection establishes but Ansible tasks hang, SELinux might be interfering.

      # On the Target CentOS 7 Host (via console or alternative access)
      sudo setenforce 0

  - Caution: This sets SELinux to permissive mode, which logs policy violations but does not enforce them. Do not do this permanently in a production environment without proper analysis.
- Increase Ad-Hoc Ansible SSH Timeout: For a quick test, you can override the SSH timeout for a specific Ansible command.

      # From your Ansible Control Node
      ansible -m ping <target_host_group_or_name> -i /path/to/inventory.ini -e ansible_ssh_timeout=30

  - The ansible_ssh_timeout variable sets the SSH connection timeout in seconds. Try increasing it from the default (10 seconds) to 30 or 60.
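On minimal control nodes telnet is often not installed. As a rough alternative for the port check above, bash can attempt the TCP connection itself through its /dev/tcp pseudo-device. This is a sketch under assumptions: probe_ssh_port and interpret_probe_status are hypothetical helper names, and the code relies on bash and the coreutils timeout command being available.

```shell
#!/usr/bin/env bash
# Translate the exit status of the probe into a human-readable verdict.
interpret_probe_status() {
  case "$1" in
    0)   echo "open" ;;
    124) echo "timed out" ;;              # timeout(1) exits 124 when the limit is hit
    *)   echo "refused or unreachable" ;;
  esac
}

# Attempt a TCP connection to host:port (default 22) within a time limit
# (default 5 seconds) using bash's /dev/tcp pseudo-device.
probe_ssh_port() {
  local host="$1" port="${2:-22}" limit="${3:-5}"
  timeout "$limit" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
  interpret_probe_status "$?"
}
```

Calling probe_ssh_port <target_host_ip> prints one of the three verdicts. A “timed out” result points toward dropped packets or latency, while “refused or unreachable” points toward a rejecting firewall or a stopped sshd.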
3. Configuration Check
For robust, permanent solutions, modify the relevant configuration files.
- Target CentOS 7 Host - Firewall (firewalld): Instead of disabling the firewall, properly open the SSH port.

      # On the Target CentOS 7 Host
      sudo firewall-cmd --add-service=ssh --permanent
      sudo firewall-cmd --reload

  - The first command permanently adds the SSH service to the default firewall zone. The second reloads the rules without dropping existing connections.
- Target CentOS 7 Host - SSH Daemon (/etc/ssh/sshd_config): Edit this file to adjust connection parameters.

      # On the Target CentOS 7 Host
      sudo vi /etc/ssh/sshd_config

  Add or modify the following lines:

  - LoginGraceTime 120: The time allowed for a client to authenticate after connecting. The OpenSSH default is 120 seconds, so confirm that it has not been lowered on your hosts, and raise it further only if authentication is genuinely slow.
  - ClientAliveInterval 30: The server sends a keep-alive message to the client if no data has been received within 30 seconds. This keeps the connection alive. The default is 0, which disables these messages.
  - ClientAliveCountMax 5: If 5 consecutive keep-alive messages go unanswered, the server disconnects the client. With the interval above, an unresponsive client is dropped after roughly 30s * 5 = 150s.

  After editing, restart the SSH service:

      sudo systemctl restart sshd
Ansible Control Node - Global Ansible Configuration (
/etc/ansible/ansible.cfgor~/.ansible.cfg): Modify the global timeout settings for Ansible.# On the Ansible Control Node vi /etc/ansible/ansible.cfgUnder the
[defaults]section, add or modify:[defaults] timeout = 30 ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o PreferredAuthentications=publickeytimeout: This is Ansible’s default connection timeout in seconds. Increase it to30,60, or even120for very slow networks.ssh_args:ControlMaster=autoandControlPersist=60s: These settings allow Ansible to reuse existing SSH connections for subsequent tasks within a playbook run, significantly reducing connection overhead and potential timeouts.ControlPersistspecifies how long the master connection remains open after the last client connection closes.
- Ansible Control Node - Inventory File (/path/to/inventory.ini): You can set host-specific timeouts in your inventory for problematic hosts.

      [webservers]
      web1.example.com ansible_host=192.168.1.10 ansible_ssh_timeout=45
      web2.example.com ansible_host=192.168.1.11

  - This ansible_ssh_timeout variable will override the global timeout setting for web1.example.com.
- Target CentOS 7 Host - SELinux (Permanent, if needed): If you definitively determine that SELinux is causing the issue beyond initial SSH (e.g., specific Ansible modules failing with permission denied after connection), you might need to manage it.
  - Recommended: Do not permanently disable SELinux. Instead, identify the specific SELinux denial using sudo ausearch -c sshd -m avc --raw or sudo journalctl -t audit | grep AVC and create a custom SELinux policy module to allow the necessary action.
  - Last Resort (not recommended for production): Change /etc/selinux/config to SELINUX=permissive and reboot. This makes the change persistent but reduces security.
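Before and after editing sshd_config, it helps to see which timeout-related values are actually set in the file. The sketch below assumes a hypothetical helper named show_sshd_timeouts. It prints the first uncommented occurrence of each keyword, since sshd uses the first value it obtains for a keyword. It does not interpret Match blocks or the Keyword=value form, so treat the output of sudo sshd -T as the authoritative effective configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first uncommented value of each
# timeout-related keyword found in an sshd_config-style file.
# Limitation: ignores Match blocks and the "Keyword=value" syntax.
show_sshd_timeouts() {
  local file="${1:-/etc/ssh/sshd_config}"
  awk '
    BEGIN {
      want["logingracetime"] = 1
      want["clientaliveinterval"] = 1
      want["clientalivecountmax"] = 1
    }
    /^[[:space:]]*#/ { next }              # skip commented-out lines
    NF >= 2 {
      key = tolower($1)                    # sshd keywords are case-insensitive
      if ((key in want) && !(key in seen)) {
        seen[key] = 1                      # keep only the first occurrence
        print $1 "=" $2
      }
    }
  ' "$file"
}
```

Running show_sshd_timeouts with no argument reads /etc/ssh/sshd_config. A keyword that does not appear in the output is not set explicitly in that file, so sshd falls back to its compiled-in default.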
4. Verification
After applying any configuration changes, it’s crucial to verify the fix.
- Direct SSH Connectivity Test: From your Ansible control node, ensure you can manually SSH into the target host without issues.

      ssh <user>@<target_host_ip>
Ansible Ad-Hoc Ping Test: Run a simple Ansible ad-hoc command.
ansible -m ping <target_host_group_or_name> -i /path/to/inventory.iniA successful response should look like:
<target_host_ip> | SUCCESS => { "ansible_facts": { "discovered_interpreter_python": "/usr/bin/python" }, "changed": false, "ping": "pong" } -
Run a Simple Playbook: Create and run a minimal playbook to ensure all steps of a typical Ansible run complete without timeouts.
test_playbook.yml:--- - name: Test connectivity and uptime hosts: <target_host_group_or_name> tasks: - name: Get uptime information command: uptime register: uptime_output - name: Print uptime debug: var: uptime_output.stdoutExecute:
ansible-playbook -i /path/to/inventory.ini test_playbook.yml -
Increase Verbosity for Deeper Insight: If issues persist, use Ansible’s verbose output to pinpoint the exact failure point.
ansible -m ping <target_host> -i /path/to/inventory.ini -vvv ansible-playbook -i /path/to/inventory.ini test_playbook.yml -vvvThe
-vvv(or even-vvvv) flag will provide detailed SSH and Ansible execution logs, which can often highlight the specific point of failure or the cause of the delay leading to a timeout.
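To confirm which timeout value is actually sufficient for a host, the ad-hoc ping can be repeated with progressively larger values. The following is a minimal sketch: find_working_timeout is a hypothetical helper name, the candidate values 10, 30, and 60 are assumptions taken from the suggestions in this guide, and it relies on the -T (--timeout) option of the ansible command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the ad-hoc ping with each candidate connection
# timeout in turn and print the first one that succeeds.
# Usage: find_working_timeout <host_or_group> <inventory> [timeouts...]
# Returns 1 if no candidate succeeds.
find_working_timeout() {
  local target="$1" inventory="$2"
  shift 2
  local -a candidates=("$@")
  if [ "${#candidates[@]}" -eq 0 ]; then
    candidates=(10 30 60)                  # assumed defaults for illustration
  fi
  local t
  for t in "${candidates[@]}"; do
    if ansible "$target" -i "$inventory" -m ping -T "$t" >/dev/null 2>&1; then
      echo "$t"
      return 0
    fi
  done
  return 1
}
```

If the function only succeeds at the larger values, that suggests latency or a slow handshake rather than a hard block. If it fails at every value, revisit the firewall and sshd checks instead of raising the timeout further.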
By methodically working through these steps, you should be able to diagnose and resolve most Ansible timeout errors encountered when managing CentOS 7 hosts.