FS#56574 - [linux] excessive memory usage with kernel 4.14.3-1

Attached to Project: Arch Linux
Opened by tom (archtom) - Monday, 04 December 2017, 17:50 GMT
Last edited by Toolybird (Toolybird) - Sunday, 28 May 2023, 06:22 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Tobias Powalowski (tpowa)
Jan Alexander Steffens (heftig)
Architecture x86_64
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 1
Private No

Details

Description:
I'm sorry in advance that I am not able to provide a lot of information about this, but installing kernel 4.14.3-1 leads to excessive memory use, tested on two machines. A system that needs around 500 MB now uses around 15 GB. Unfortunately, neither "top" nor "ps aux", run as user or root, shows a process with that high usage.

Downgrading to the latest 4.13 kernel solves the problem.

Additional info:
* package version(s)
All other packages are up to date.

* config and/or log files etc.
Sorry, there is nothing I could provide.

Steps to reproduce:
Update to kernel 4.14.3-1, log in a user, wait a few minutes, then run top or htop to see the memory usage. I can also see it in my conky on the desktop.
This task depends upon

Closed by  Toolybird (Toolybird)
Sunday, 28 May 2023, 06:22 GMT
Reason for closing:  No response
Additional comments about closing:  Plus it's old and stale. Assuming fixed.
Comment by Doug Newgard (Scimmia) - Tuesday, 05 December 2017, 17:37 GMT
Post the top output.

Edit: Actually, free -h would be better for this.
Comment by tom (archtom) - Wednesday, 06 December 2017, 06:42 GMT
Hey,

thanks in advance for the help. First of all: I already tried 4.14.4 from testing and the problem is still there.

Here are the requested outputs from a VirtualBox VM. If I increase the RAM assigned to the VM, the actual RAM used increases too.

Usually the system takes around 500 MB of RAM with all the stuff running.

[user@archvbox ~]$ free -h
              total   used   free  shared  buff/cache  available
Mem:           7.8G   4.4G   2.7G     12M        802M       3.2G
Swap:          4.0G    78M   3.9G

top USER output:
[user@archvbox ~]$ top
top - 07:34:30 up 8 min, 1 user, load average: 0,14, 0,76, 0,53
Tasks: 36 total, 2 running, 34 sleeping, 0 stopped, 0 zombie
%CPU0 : 2,0/1,3 3[|| ]
%CPU1 : 1,3/1,3 3[|| ]
%CPU2 : 1,3/0,0 1[| ]
%CPU3 : 1,4/0,7 2[| ]
GiB Mem : 58,9/7,796 [ ]
GiB Swap: 1,9/4,000 [ ]

PID USER PR NI VIRT RES %CPU %MEM TIME+ S COMMAND
5780 tom 20 0 80,9m 7,7m 0,0 0,1 0:00.03 S systemd
5847 tom 20 0 37,9m 3,7m 0,0 0,0 0:00.02 S `- dbus-daemon
5934 tom 20 0 275,5m 6,9m 0,0 0,1 0:00.00 S `- gvfsd
5940 tom 20 0 404,0m 5,7m 0,0 0,1 0:00.00 S `- gvfsd-fuse
6112 tom 9 -11 856,8m 11,8m 0,0 0,1 0:00.52 S `- pulseaudio
6173 tom 20 0 68,5m 5,8m 0,0 0,1 0:00.00 S `- gconfd-2
6185 tom 20 0 55,3m 4,6m 0,0 0,1 0:00.00 S `- xfconfd
6188 tom 20 0 349,9m 7,1m 0,0 0,1 0:00.01 S `- gvfsd-trash
6809 tom 20 0 408,7m 7,6m 0,0 0,1 0:00.01 S `- zeitgeist-daemo
6828 tom 20 0 272,1m 9,5m 0,0 0,1 0:00.00 S `- zeitgeist-fts
5817 tom 20 0 378,3m 21,5m 0,0 0,3 0:00.28 S openbox
5971 tom 20 0 66,0m 5,2m 0,0 0,1 0:00.17 S compton
5978 tom 20 0 48,2m 0,5m 0,0 0,0 0:00.00 S VBoxClient
5979 tom 20 0 114,9m 4,4m 0,0 0,1 0:00.00 S `- VBoxClient
5989 tom 20 0 48,2m 0,5m 0,0 0,0 0:00.00 S VBoxClient
5990 tom 20 0 48,2m 2,9m 0,0 0,0 0:00.00 S `- VBoxClient
5995 tom 20 0 48,2m 0,5m 0,0 0,0 0:00.00 S VBoxClient
5996 tom 20 0 112,7m 2,4m 0,0 0,0 0:00.00 S `- VBoxClient
6001 tom 20 0 48,2m 0,5m 0,0 0,0 0:00.00 S VBoxClient
6002 tom 20 0 113,2m 2,4m 0,0 0,0 0:00.42 S `- VBoxClient
6088 tom 20 0 201,5m 16,7m 0,0 0,2 0:00.10 S tint2
6089 tom 20 0 609,3m 71,1m 0,0 0,9 0:01.26 S docky
6091 tom 20 0 723,5m 29,5m 0,0 0,4 0:00.11 S volumeicon
6109 tom 20 0 1338,4m 10,7m 1,3 0,1 0:03.50 S conky
6175 tom 20 0 418,9m 24,7m 0,0 0,3 0:00.17 S xfce4-notifyd
6337 tom 20 0 908,9m 37,3m 0,7 0,5 0:02.05 R xfce4-terminal
6341 tom 20 0 24,6m 4,0m 0,0 0,0 0:00.00 S `- bash
8013 tom 20 0 23,1m 4,1m 0,0 0,1 0:00.41 S `- htop
9427 tom 20 0 24,6m 3,9m 0,0 0,0 0:00.00 S `- bash
9463 tom 20 0 40,2m 3,7m 0,7 0,0 0:00.33 S `- top
11531 tom 20 0 24,6m 3,8m 0,0 0,0 0:00.00 S `- bash
11567 tom 20 0 40,3m 3,7m 0,7 0,0 0:00.11 R `- top
12226 tom 20 0 24,6m 3,8m 0,0 0,0 0:00.00 S `- bash
6783 tom 20 0 531,5m 51,9m 0,0 0,6 0:00.62 S kalu

top ROOT output:
top - 07:35:07 up 8 min, 1 user, load average: 0,15, 0,68, 0,51
Tasks: 249 total, 1 running, 149 sleeping, 0 stopped, 0 zombie
%CPU0 : 1,3/0,7 2[| ]
%CPU1 : 0,0/1,3 1[| ]
%CPU2 : 1,3/0,7 2[| ]
%CPU3 : 0,7/0,7 1[ ]
GiB Mem : 58,9/7,796 [ ]
GiB Swap: 1,9/4,000 [ ]

PID USER PR NI VIRT RES %CPU %MEM TIME+ S COMMAND
1 root 20 0 217,7m 5,3m 0,0 0,1 0:01.37 S systemd
406 root 20 0 158,1m 0,0m 0,0 0,0 0:00.00 S `- lvmetad
415 root 20 0 84,9m 2,6m 0,0 0,0 0:00.28 S `- systemd-udevd
491 root 20 0 108,3m 11,2m 0,0 0,1 0:00.36 S `- systemd-journal
535 systemd+ 20 0 150,9m 0,8m 0,0 0,0 0:00.02 S `- systemd-timesyn
541 root 20 0 11,8m 0,0m 0,0 0,0 0:00.30 S `- haveged
542 root 20 0 493,4m 3,8m 0,0 0,0 0:00.08 S `- udisksd
543 dbus 20 0 39,0m 3,2m 0,0 0,0 0:00.41 S `- dbus-daemon
547 root 20 0 19,6m 1,2m 0,0 0,0 0:00.00 S `- crond
549 root 20 0 68,7m 3,0m 0,0 0,0 0:00.08 S `- systemd-logind
556 root 20 0 250,1m 0,0m 0,0 0,0 0:00.16 S `- VBoxService
643 root 20 0 27,0m 0,0m 0,0 0,0 0:00.00 S `- ossec-execd
647 ossec 20 0 28,5m 2,5m 0,0 0,0 0:00.69 S `- ossec-analysisd
651 root 20 0 19,1m 1,9m 0,0 0,0 0:00.12 S `- ossec-logcollec
682 root 20 0 285,0m 1,8m 0,0 0,0 0:00.06 S `- cubesql
687 redis 20 0 45,2m 1,3m 0,0 0,0 0:00.61 S `- redis-server
689 root 20 0 45,9m 0,0m 0,0 0,0 0:00.00 S `- sshd
692 root 20 0 347,2m 5,2m 0,0 0,1 0:00.13 S `- lightdm
5302 root 20 0 656,1m 74,7m 1,3 0,9 0:04.10 S `- Xorg
5654 root 20 0 255,9m 7,1m 0,0 0,1 0:00.01 S `- lightdm
5817 tom 20 0 378,3m 21,5m 0,0 0,3 0:00.28 S `- openbox
693 dnsmasq 20 0 53,2m 0,0m 0,0 0,0 0:00.00 S `- dnsmasq
698 root 20 0 278,4m 4,4m 0,0 0,1 0:00.18 S `- accounts-daemon
706 root 20 0 124,9m 0,0m 0,0 0,0 0:00.00 S `- nginx
707 http 20 0 129,4m 0,0m 0,0 0,0 0:00.00 S `- nginx
708 http 20 0 129,4m 0,0m 0,0 0,0 0:00.00 S `- nginx
709 http 20 0 129,4m 0,0m 0,0 0,0 0:00.00 S `- nginx
710 http 20 0 129,4m 0,0m 0,0 0,0 0:00.00 S `- nginx
711 http 20 0 129,4m 0,6m 0,0 0,0 0:00.01 S `- nginx
797 nobody 20 0 74,8m 1,3m 0,0 0,0 0:00.01 S `- proftpd
809 root 20 0 19,6m 1,9m 0,0 0,0 0:02.03 S `- ossec-syscheckd
813 ossec 20 0 27,1m 0,8m 0,0 0,0 0:00.00 S `- ossec-monitord
1082 root 20 0 27,0m 1,9m 0,0 0,0 0:00.01 S `- ossec-maild
1316 rtkit 21 1 176,9m 0,9m 0,0 0,0 0:00.00 S `- rtkit-daemon

A strange thing is that top shows a different amount of used RAM than htop does:
htop head output:
1 [|| 2.0%] Tasks: 36, 58 thr; 1 running
2 [|| 1.3%] Load average: 0.08 0.60 0.49
3 [|| 1.4%] Uptime: 00:09:24
4 [|| 3.4%]
Mem[||||||||||||||||||||4.37G/7.80G]
Swp[| 78.2M/4.00G]



Comment by Jan Alexander Steffens (heftig) - Wednesday, 06 December 2017, 08:24 GMT
slabtop will report on kernel memory use (sort by cache size by pressing c).
Comment by tom (archtom) - Wednesday, 06 December 2017, 08:47 GMT
slabtop output as root:

Active / Total Objects (% used) : 43515341 / 43774624 (99.4%)
Active / Total Slabs (% used) : 689616 / 689616 (100.0%)
Active / Total Caches (% used) : 80 / 105 (76.2%)
Active / Total Size (% used) : 2758162.48K / 2780157.10K (99.2%)
Minimum / Average / Maximum Object : 0.01K / 0.06K / 8.00K

OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
43292864 43052398 0% 0,06K 676451 64 2705804K kmalloc-64

293412 293412 100% 0,09K 6986 42 27944K kmalloc-96
11284 9588 0% 0,59K 434 26 6944K inode_cache
4440 2111 0% 1,06K 148 30 4736K ext4_inode_cache
24360 13742 0% 0,19K 1160 21 4640K dentry
28980 28980 100% 0,13K 966 30 3864K kernfs_node_cache
18040 17681 0% 0,20K 902 20 3608K vm_area_struct
720 644 0% 3,69K 90 8 2880K task_struct
3976 3190 0% 0,57K 142 28 2272K radix_tree_node
1952 1910 0% 1,00K 122 16 1952K kmalloc-1024
240 230 0% 8,00K 60 4 1920K kmalloc-8192
6768 5838 0% 0,25K 423 16 1692K kmalloc-256
624 620 0% 2,00K 39 16 1248K kmalloc-2048
1610 1610 100% 0,70K 70 23 1120K shmem_inode_cache
16640 15695 0% 0,06K 260 64 1040K anon_vma_chain
9522 8960 0% 0,09K 207 46 828K anon_vma
375 375 100% 2,06K 25 15 800K sighand_cache
184 180 0% 4,00K 23 8 736K kmalloc-4096
1312 1259 0% 0,50K 82 16 656K kmalloc-512
528 480 0% 1,00K 33 16 528K signal_cache
433 343 0% 0,94K 27 17 432K RAW
2247 2247 100% 0,19K 107 21 428K kmalloc-192
3822 3822 100% 0,10K 98 39 392K buffer_head
552 552 100% 0,66K 23 24 368K proc_inode_cache
1890 1625 0% 0,19K 90 21 360K cred_jar
506 506 100% 0,69K 22 23 352K sock_inode_cache
2368 2368 100% 0,12K 74 32 296K kmalloc-128
609 609 100% 0,38K 29 21 232K mnt_cache
3942 3942 100% 0,05K 54 73 216K Acpi-Parse
6272 6272 100% 0,03K 49 128 196K kmalloc-32
4080 4080 100% 0,05K 48 85 192K ftrace_event_field
150 150 100% 1,06K 5 30 160K dmaengine-unmap-128
125 125 100% 1,25K 5 25 160K UDPv6
207 207 100% 0,69K 9 23 144K files_cache
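The kmalloc-64 line alone accounts for essentially all of the lost memory. slabtop's CACHE SIZE column is SLABS × OBJ/SLAB × OBJ SIZE; as a quick sanity check (editor's arithmetic, not part of the original report), plugging in the figures above reproduces the reported 2705804K, i.e. roughly 2.6 GiB of kernel slab memory:

```shell
# Reproduce the kmalloc-64 CACHE SIZE figure from the slabtop columns above:
# 676451 slabs * 64 objects/slab * 64 bytes/object, reported in KiB.
kmalloc64_kib=$((676451 * 64 * 64 / 1024))
echo "kmalloc-64 cache size: ${kmalloc64_kib}K"
```

This also explains why top and ps show no guilty process: the memory is held by the kernel's slab allocator, not by any userspace task.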
Comment by tom (archtom) - Wednesday, 06 December 2017, 08:49 GMT
Additionally, I'm getting this in the logs:

ipset v6.34: Error in line 8313: Kernel error received: Cannot allocate memory

I don't know if this error occurs before or after the kernel memory problem.
Comment by tom (archtom) - Wednesday, 06 December 2017, 10:11 GMT
Ok, if I disable ipset on boot the excessive memory usage does not occur.

The archwiki here

https://wiki.archlinux.org/index.php/Ipset

says to use this script to update sources.

https://github.com/ilikenwf/pg2ipset/blob/master/ipset-update.sh

As soon as I run the script manually, the excessive memory usage starts again. If I then stop the ipset service, the memory usage stays high. Rebooting without running the script resolves it. It does not happen with the 4.13 kernel series. Now I don't know if it is a kernel issue or if the script would have to be changed.

If it is the fault of the script it would be nice to know how to make it work.

Thanks in advance for any help.
Comment by Jan Alexander Steffens (heftig) - Wednesday, 06 December 2017, 10:14 GMT
Does "ipset list" report lots and lots of sets?
Comment by tom (archtom) - Wednesday, 06 December 2017, 10:15 GMT
Starting ipset itself as a service is not a problem, only running the script causes the issue.

Thanks in advance for any help.
Comment by tom (archtom) - Wednesday, 06 December 2017, 10:18 GMT
"ipset list" returns empty when the service is enabled but the script has not been run after reboot.
Comment by loqs (loqs) - Wednesday, 06 December 2017, 18:22 GMT
Comment by tom (archtom) - Monday, 18 December 2017, 17:41 GMT
Any news on this? I tried the latest mainline 4.15-rc4 kernel and the latest stable kernel 4.14.7, and the problem still persists.

Did anyone report this upstream or did anyone solve it by adjusting the script?

Thanks for any help in advance.
Comment by loqs (loqs) - Tuesday, 26 December 2017, 22:12 GMT
Is it still an issue in 4.14.9?
Do you think it is a bug? You did not state how many entries the script is trying to add to ipset. Does the memory usage seem excessive to you for that number of entries?
Did you try reverting the five commits I noted previously to see if one of them was the cause?
As the sole sufferer of the issue, it was probably expected that you would report the issue upstream, or check whether upstream is already aware of it.
Comment by tom (archtom) - Sunday, 07 January 2018, 08:40 GMT
I'm sorry for the long response time, I've been on holidays ;)

Yes, it is still an issue with the latest 4.14.11 kernel.

I'm sorry I cannot really help solve this (except by testing and reporting back), as I don't know how the script triggers the bug or what the exact cause of the error is.
Additionally, I don't know where to report kernel bugs upstream.
It would be really nice if someone could take a closer look at whether the script or the kernel is responsible, and either modify the script in the right place or report it to the kernel bug tracker.

Thanks in advance
Comment by loqs (loqs) - Sunday, 07 January 2018, 12:10 GMT
https://www.kernel.org/doc/html/latest/admin-guide/reporting-bugs.html
Did you try reverting the five commits I noted previously to see if one of them was the cause? That would seem to fall into the category of testing and reporting back.
Arch generally expects the affected user to work with the relevant upstream; others, such as the package maintainer, may choose to work with upstream on a bug at their discretion.
Comment by tom (archtom) - Sunday, 07 January 2018, 12:48 GMT
I'll gladly help solve this as best I can. All I could do was track it down to ipset in combination with the script and the kernel.

Sorry, I don't know how to revert these commits and build the kernel with them reverted.

It would be very nice if someone could pick it up from here.

I will happily help testing a workaround and / or solution.

Comment by loqs (loqs) - Sunday, 07 January 2018, 14:00 GMT
I recommend first setting MAKEFLAGS in /etc/makepkg.conf to reduce build time https://wiki.archlinux.org/index.php/makepkg#Parallel_compilation
bsdtar -xvf linux-4.14.3.r0.g191314edb326-1.src.tar.gz
cd linux
makepkg -rsi # add bootloader entry if needed check the package works apart from the issue
cd linux/src/linux-stable
git revert -n 48596a8ddc46f96afb6a2cd72787cb15d6bb01fc
cd ../..
makepkg -rsief # check again; then revert the remaining commits one at a time with the same method, retesting after each: 7f4f7dd4417d9efd038b14d39c70170db2e0baa0 e23ed762db7ed1950a6408c3be80bc56909ab3d4 e5173418ac597cebe9f7a39adf10be470000b518
If the issue still persists you will need to do a bisection between 4.13 and 4.14.3 https://wiki.archlinux.org/index.php/Bisecting_bugs
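For reference, the bisection workflow linked above boils down to `git bisect run`. A minimal sketch on a throwaway repository (a stand-in for linux-stable, where v4.13 would be the good point, v4.14.3 the bad one, and the run script would build and test a kernel instead of reading a file):

```shell
#!/bin/sh
# Toy demonstration of `git bisect run`: commit 4 of 5 plays the role of the
# commit that introduced the regression.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email tester@example.com
git config user.name tester
for i in 1 2 3 4 5; do
    echo "$i" > state
    git add state
    git commit -qm "commit $i"
done
# HEAD is bad, the root commit is good; the run script must exit 0 for good
# revisions and non-zero (other than 125) for bad ones.
git bisect start HEAD "$(git rev-list --max-parents=0 HEAD)" >/dev/null
git bisect run sh -c 'test "$(cat state)" -lt 4' >/dev/null 2>&1
first_bad=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad: $first_bad"
```

In the real case each bisect step means building and booting the candidate kernel, then checking whether restoring the ipset lists triggers the kmalloc-64 growth, so reverting the handful of suspect commits first (as above) is much cheaper.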
Comment by tom (archtom) - Monday, 08 January 2018, 18:21 GMT
Thanks for the help. I built the original version and as expected the problem still existed.

When reverting commit 48596a8ddc46f96afb6a2cd72787cb15d6bb01fc
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/net/netfilter/ipset?id=48596a8ddc46f96afb6a2cd72787cb15d6bb01fc
everything works as expected again and the problem is gone.

What's the next step?

Can the commit be reverted in the Arch kernel until the issue is resolved upstream?
Who reports and follows up on the problem and the fix for the Linux kernel source?
Comment by loqs (loqs) - Monday, 08 January 2018, 20:15 GMT
Is there a large difference between the number of entries in the set on the kernel with the issue and the kernel without the issue?
$ perl scripts/get_maintainer.pl net/netfilter/ipset/ip_set_hash_ip.c
Pablo Neira Ayuso <pablo@netfilter.org> (maintainer:NETFILTER,commit_signer:1/1=100%)
Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> (maintainer:NETFILTER,commit_signer:1/1=100%,authored:1/1=100%,added_lines:12/12=100%,removed_lines:10/10=100%)
Florian Westphal <fw@strlen.de> (maintainer:NETFILTER)
"David S. Miller" <davem@davemloft.net> (maintainer:NETWORKING [GENERAL])
netfilter-devel@vger.kernel.org (open list:NETFILTER)
coreteam@netfilter.org (open list:NETFILTER)
netdev@vger.kernel.org (open list:NETWORKING [GENERAL])
linux-kernel@vger.kernel.org (open list)
Either report it to one of those mailing lists or to https://bugzilla.netfilter.org, product ipset. Arch in general expects one of those affected by the issue to work with upstream, so that would be you.
Comment by tom (archtom) - Tuesday, 09 January 2018, 07:52 GMT
ok, thanks.

I reported the bug here:

https://bugzilla.netfilter.org/show_bug.cgi?id=1212

and hope it is solved quickly.

I will report back when solved. Thanks for all the help.
Comment by tom (archtom) - Wednesday, 10 January 2018, 20:09 GMT
The bug is already being worked on, and I've been asked to apply a patch to the kernel build on top of a commit, but I'm sorry, I don't know how to do that.

If someone could help me out with that, it would be very nice.

Patch and explanation are shown here:
https://bugzilla.netfilter.org/show_bug.cgi?id=1212

Thanks
Comment by loqs (loqs) - Wednesday, 10 January 2018, 20:36 GMT
bsdtar -xvf linux-4.14.13-1.src.tar.gz
cd linux
makepkg -rsi # will install a patched 4.14.13-1
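(Here the Arch source tarball apparently already carried the fix. Had it not, the usual way to apply a local patch is to add it to the PKGBUILD's source array and checksums so prepare() picks it up. A toy demonstration of the source-array edit, using a stand-in PKGBUILD in a temp directory and a hypothetical patch name ipset-fix.patch:)

```shell
#!/bin/sh
# Toy PKGBUILD in a temp dir; the real one ships in the linux source tarball.
set -e
dir=$(mktemp -d)
cd "$dir"
printf 'source=(linux-4.14.13.tar.xz)\n' > PKGBUILD
: > ipset-fix.patch            # stand-in for the patch from bugzilla #1212
# Prepend the patch file to the source array; on a real PKGBUILD you would
# then run `updpkgsums` (from pacman-contrib) to regenerate the checksums
# before building with makepkg -rsi.
sed -i 's/^source=(/source=(ipset-fix.patch /' PKGBUILD
srcline=$(grep '^source=' PKGBUILD)
echo "$srcline"
```

Depending on the PKGBUILD revision, prepare() may apply listed *.patch sources automatically; otherwise a `patch -Np1 -i` line has to be added there as well.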
Comment by tom (archtom) - Thursday, 11 January 2018, 08:40 GMT
Thanks again for the fast help.

I built the latest kernel following these instructions and the problem is solved ;)))

Thanks again. I will report the same thing to the netfilter bugtracker and hope the patch gets included in the kernel soon.
Comment by loqs (loqs) - Friday, 12 January 2018, 21:42 GMT
The patch https://patchwork.ozlabs.org/patch/859720/ is now awaiting review and should then make its way into future kernels.
Comment by tom (archtom) - Saturday, 13 January 2018, 07:25 GMT
very good, thank you
Comment by Eli Schwartz (eschwartz) - Monday, 22 January 2018, 14:59 GMT
Please don't open duplicates like  FS#57192 

If you want the patch for *this* bug to be backported, ask in *this* bug report.
If you think this bug report should apply to linux-hardened as well, say so in *this* bug report.
Comment by tom (archtom) - Monday, 22 January 2018, 15:23 GMT
Sorry, I didn't know about that. Yes, it would be very nice if the patch could be included in the hardened kernel until it makes it into the stable kernel.

Thanks a lot.
Comment by tom (archtom) - Friday, 02 March 2018, 07:58 GMT
The patch made it into the kernel, but only in the 4.16 branch.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.16-rc3&id=0b8d9073539e217f79ec1bff65eb205ac796723d

I don't know why it wasn't applied to stable and LTS.

As our server still runs out of memory every few days, it would really be nice to see the patch added to the Arch Linux hardened (and regular) kernel until 4.16 is released.

Thanks a lot.
Comment by mattia (nTia89) - Sunday, 27 February 2022, 13:33 GMT
I cannot reproduce the issue. Is it still valid?
