fail2ban is a tool that monitors service log files and bans IP addresses that have attempted a brute-force attack or tried to use the mail server as a spam relay. In the default configuration in Debian GNU/Linux only SSH login attempts are monitored, which works pretty well. But when you try to add more services, you may run into the problem that fail2ban no longer starts up correctly. The log file then contains errors like this:
fail2ban.actions.action: ERROR iptables -N fail2ban-ssh
iptables -A fail2ban-ssh -j RETURN
iptables -I INPUT -p tcp -m multiport --dports ssh -j fail2ban-ssh returned 200
I searched the net but only found more victims of this problem and no solution. So I analyzed what was going on and finally figured it out.
The commands which are executed to set up the iptables rules for each service are located in
actionstart = iptables -N fail2ban-<name>
              iptables -A fail2ban-<name> -j RETURN
              iptables -I INPUT -p <protocol> -m multiport --dports <port> -j fail2ban-<name>
At first I thought this was a strange timing problem and added some sleep calls between the iptables calls, but that didn't help. Then I wrote a small shell script which wrapped iptables and logged the real error messages instead of simply dumping the return code, and got this error message:
iptables: Resource temporarily unavailable.
But more importantly: the messages logged by my wrapper script appeared in a pretty chaotic order, which showed that fail2ban runs iptables simultaneously in multiple threads. I guess it starts a thread for each configured service, and each of these threads then calls iptables to set up the firewall rules. This causes the "Resource temporarily unavailable" error, because iptables doesn't like being executed more than once at the same time.
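A logging wrapper of the kind described above could look roughly like the following. This is only a sketch and not the original script: the function name run_logged, the LOGFILE variable and its default path are assumptions made for illustration. It runs whatever command it is given, records the command line, exit status and message in a log file, and passes the output and exit status through unchanged.

```shell
#!/bin/sh
# Hypothetical logging wrapper; run_logged and LOGFILE are illustrative
# names, not taken from the original script.
LOGFILE="${LOGFILE:-/tmp/iptables-wrapper.log}"

run_logged() {
    # Run the given command and capture stdout and stderr together.
    output=$("$@" 2>&1)
    status=$?

    # Append a timestamp, the shell's PID, the command line, the exit
    # status and the captured message to the log file.
    printf '%s [pid %s] %s -> exit %s: %s\n' \
        "$(date '+%H:%M:%S')" "$$" "$*" "$status" "$output" >> "$LOGFILE"

    # Pass the captured output and the exit status on to the caller.
    if [ -n "$output" ]; then
        printf '%s\n' "$output"
    fi
    return "$status"
}
```

In a real wrapper script the last line would be something like run_logged /sbin/iptables "$@", assuming that is where the real binary lives. Log entries from different PIDs that interleave are then a hint that several calls are running at the same time.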
I'm pretty sure there is a nice way to fix this in the Python code of fail2ban, but I leave that to others. Instead, I fixed it by changing the configuration to this:
actionstart = flock /var/lock/fail2ban -c "iptables -N fail2ban-<name>"
              flock /var/lock/fail2ban -c "iptables -A fail2ban-<name> -j RETURN"
              flock /var/lock/fail2ban -c "iptables -I INPUT -p <protocol> \
              -m multiport --dports <port> -j fail2ban-<name>"
flock tries to acquire a lock on a lock file (I use /var/lock/fail2ban here) and waits until it has got it. Only then does it execute the specified iptables command, and it releases the lock again once the command has finished (the lock file itself stays in place). This synchronizes all calls to iptables during startup and fixes the issue. It may be a good idea to do this with all iptables actions in this file, just in case fail2ban tries to insert two IP bans at the same time. Maybe I will do this if I encounter more iptables errors in the log file later.
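The serializing effect of flock can be tried out without touching iptables at all. The following is a small sketch under a few assumptions: LOCK and OUT are temporary files created just for the demo (not the lock file used above), and the echo/sleep commands merely stand in for the iptables calls. Four jobs are started in parallel, but each one has to wait for the lock before it runs.

```shell
#!/bin/sh
# Hypothetical demo; LOCK and OUT are throwaway temp files.
LOCK=$(mktemp)
OUT=$(mktemp)

for i in 1 2 3 4; do
    # Each background job blocks until it holds the lock, then runs its
    # command; the lock is released when the command has finished.
    flock "$LOCK" -c "echo start $i >> $OUT; sleep 0.2; echo end $i >> $OUT" &
done

# Wait for all background jobs before looking at the result.
wait

cat "$OUT"
```

The order in which the four jobs get the lock is not fixed, but every "start N" line should be followed directly by its matching "end N" line. Without flock the start and end lines of different jobs would be free to interleave.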