FS#79674 - [ipset] iptables rare race condition nullifies simple firewall configs

Attached to Project: Arch Linux
Opened by Daniel Kruyt (danielkruyt) - Tuesday, 12 September 2023, 20:12 GMT
Last edited by Buggy McBugFace (bugbot) - Saturday, 25 November 2023, 20:19 GMT
Task Type: Bug Report
Category: Packages: Extra
Status: Closed
Assigned To: Sébastien Luttringer (seblu)
Architecture: All
Severity: Low
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 0
Private: No

Details

- Overview:

The ipset.service is specified as "Before=(...) iptables.service"; however, this is insufficient: there is a race condition in whether ipset finishes restoring its sets in time. If a referenced ipset doesn't exist, the entire iptables-restore fails.

This can essentially take down a major part of the system firewall in some cases, as it did in mine: I used simple hand-written iptables rules with ipset to whitelist a small number of hosts and otherwise reject everything.

- Package versions and miscellaneous details:

core/iptables-nft 1:1.8
extra/ipset 7.13.1
extra/linux-hardened 6.4.10.hardened1-1
I believe this will also occur on earlier versions and with the legacy iptables backend.
BTRFS filesystem on an SSD, for what it's worth.


- Steps to reproduce:

(essentially what is on the ArchWiki pages for iptables and ipset)

Create some ipsets, and iptables rules that use them:
# ipset --create ...
# iptables -A ...
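
For concreteness, a minimal sketch of such a set and rules that use it (the set name and address here are hypothetical):

# ipset create allowed-hosts hash:ip
# ipset add allowed-hosts 203.0.113.10
# iptables -A INPUT -m set --match-set allowed-hosts src -j ACCEPT
# iptables -A INPUT -j REJECT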

Persist the changes:

# ipset save > /etc/ipset.conf
# iptables-save > /etc/iptables/iptables.rules
# systemctl enable ipset iptables

... then reboot N hundred times until one day you're unlucky: the server is suffering for unknown reasons, and then

# systemctl status iptables

reports a failure due to (roughly) "ipset BLABLABLA doesn't exist", despite

# systemctl status ipset

showing that it loaded successfully.

Apologies, I've only experienced this 'naturally' once, and I didn't save the printed status of iptables.service before rebooting to mitigate the damage. For ease of explanation, I replicated it by deleting one of the ipsets from /etc/ipset.conf. To my memory, the message of the original failure was identical to the output below:

# systemctl status iptables
× iptables.service - IPv4 Packet Filtering Framework
     Loaded: loaded (/usr/lib/systemd/system/iptables.service; enabled; preset: disabled)
     Active: failed (Result: exit-code) since Tue 2023-09-12 21:22:28 SAST; 1min 17s ago
    Process: 320 ExecStart=/usr/bin/iptables-restore /etc/iptables/iptables.rules (code=exited, status=2)
   Main PID: 320 (code=exited, status=2)

Sep 12 21:22:28 hostname systemd[1]: Starting IPv4 Packet Filtering Framework...
Sep 12 21:22:28 hostname iptables-restore[320]: iptables-restore v1.8.9 (nf_tables): Set MY-CUSTOM-IPSET-NAME doesn't exist.
Sep 12 21:22:28 hostname iptables-restore[320]: Error occurred at line: 15
Sep 12 21:22:28 hostname iptables-restore[320]: Try `iptables-restore -h' or 'iptables-restore --help' for more information.
Sep 12 21:22:28 hostname systemd[1]: iptables.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Sep 12 21:22:28 hostname systemd[1]: iptables.service: Failed with result 'exit-code'.
Sep 12 21:22:28 hostname systemd[1]: Failed to start IPv4 Packet Filtering Framework.

When I then inspected the output of

# iptables-save

it showed that all my hand-written rules were missing; I presume iptables-restore is atomic. Only rules added dynamically, e.g. by fail2ban ('redundant' security :-) and Docker, were present.

- Possible solution?

(Forgive me for the crudeness; I am not a systemd wizard.) I thought of putting a 100 ms sleep before attempting to run iptables-restore, i.e. adding the directive

ExecStartPre=/bin/sleep 0.1

to the [Service] section of /usr/lib/systemd/system/iptables.service.

It may be necessary to make this (or another, better) change to the nftables and ip6tables services as well; I haven't delved any deeper at this point.
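
For anyone experimenting with this, a drop-in override keeps the tweak out of /usr/lib (which pacman owns); a minimal sketch (the drop-in file name is arbitrary):

# mkdir -p /etc/systemd/system/iptables.service.d
# cat > /etc/systemd/system/iptables.service.d/delay.conf <<'EOF'
[Service]
ExecStartPre=/bin/sleep 0.1
EOF
# systemctl daemon-reload
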
This task depends upon

Closed by  Buggy McBugFace (bugbot)
Saturday, 25 November 2023, 20:19 GMT
Reason for closing:  Moved
Additional comments about closing:  https://gitlab.archlinux.org/archlinux/packaging/packages/ipset/issues/1
Comment by Sébastien Luttringer (seblu) - Saturday, 30 September 2023, 11:55 GMT
If I understand your issue correctly, the ipset systemd service claims it restored your set, but it didn't.
Unfortunately, we'll need a reproducible case to investigate further.

Using an arbitrary delay in the boot sequence is often a bad solution. The question is why ipset didn't restore your set while displaying no error in the logs.
Comment by Daniel Kruyt (danielkruyt) - Sunday, 01 October 2023, 12:41 GMT
> ipset systemd service claims it restored your set, but it wasn't.

Hmm, my memory of what happened was: `ipset` correctly restored the set, but the restoration occurred _after_ `iptables-restore` was called.

I see now that I didn't write that the sets did exist, in addition to `ipset.service` showing success... It makes me doubt my memory, but I do recall the sets existing when listed with `ipset -L`... it is possible that I am misremembering.

I believed this error was due to the fact that the I/O occurs over NVMe-over-Fabrics, and that a random chance allowed `iptables-restore` to read `/etc/iptables/iptables.rules` faster than `ipset` could read `/etc/ipset.conf`, despite the systemd ordering requirement (i.e. CPU was faster than I/O). However, I just tried to replicate this effect with a sleep in ipset, and it doesn't work: after some reading, I see that `Type=oneshot` makes dependent units start only after the process exits, not after it starts...
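
For illustration, the ordering machinery in question looks roughly like this (paraphrased from memory, not the exact packaged unit; the ExecStart line in particular is an assumption):

/usr/lib/systemd/system/ipset.service (excerpt, paraphrased):

[Unit]
Before=iptables.service ip6tables.service

[Service]
Type=oneshot
RemainAfterExit=yes
# With Type=oneshot, the unit only counts as started once this process exits,
# so units ordered after it should never run while the restore is in flight.
ExecStart=/usr/bin/ipset -f /etc/ipset.conf restore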

I strongly agree that the 100 ms delay is not a valid solution, but at the time I didn't know the above-mentioned fact about `oneshot` services. I am stumped; if there was a race condition involved, it may be in the kernel..? Does anyone know of examples of people testing for race conditions in this kind of userland-kernel intermingling space that I might take a look at?

I will edit my warning on the ipset wiki page to mention that this issue has happened once and has an unknown cause, so use redundant security and, if you need 100% assurance, set up a monitoring script that ensures your basic firewall rules are properly loaded (a sketch below).
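
A minimal sketch of such a check, assuming a hypothetical set name "allowed-hosts" (scheduling via cron or a systemd timer left out):

#!/bin/sh
# Alert if the expected set or the hand-written rules referencing it are gone.
ipset list allowed-hosts >/dev/null 2>&1 \
    || { echo "ipset 'allowed-hosts' is missing" >&2; exit 1; }
iptables-save | grep -q -- '--match-set allowed-hosts' \
    || { echo "iptables rules for 'allowed-hosts' are missing" >&2; exit 1; }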
