FS#58323 - [linux] NUMA not detected with kernel 4.16.3 on AMD RYZEN 7451

Attached to Project: Arch Linux
Opened by Médéric Boquien (mboquien) - Monday, 23 April 2018, 21:23 GMT
Last edited by Jan Alexander Steffens (heftig) - Wednesday, 02 May 2018, 23:45 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Tobias Powalowski (tpowa)
Jan Alexander Steffens (heftig)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:

NUMA is not detected on an AMD RYZEN 7451. This causes performance issues, as these CPUs were designed to make use of NUMA, with 4 nodes per CPU.

Additional info:
* The issue occurs with at least the default kernels 4.15 and 4.16.
* This is not a hardware/hardware configuration issue. Booting the installation image of Ubuntu Server 18.04 Beta 2, the NUMA nodes are correctly detected.
* I attach the output of dmesg.
* The output of lscpu is the following:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 96
On-line CPU(s) list: 0-95
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 2
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 1
Model name: AMD EPYC 7451 24-Core Processor
Stepping: 2
CPU MHz: 3184.113
CPU max MHz: 2300.0000
CPU min MHz: 1200.0000
BogoMIPS: 4601.86
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 64K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0-95
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr ibpb arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca
* On Ubuntu Server 18.04 Beta 2, lscpu reports 8 nodes (4 per processor).
* The output of numactl --show is the following:
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
cpubind: 0
nodebind: 0
membind: 0
* Please let me know what further information I could provide.
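
As a cross-check independent of lscpu and numactl, the nodes the kernel actually exposes can be counted from sysfs. The following is a minimal sketch, assuming the usual /sys/devices/system/node layout with one nodeN directory per NUMA node; the function name list_numa_nodes is made up for illustration.

```python
import os
import re


def list_numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return the sorted NUMA node IDs found under sysfs_root.

    Returns an empty list if the directory does not exist
    (for example on a kernel built without NUMA support).
    """
    if not os.path.isdir(sysfs_root):
        return []
    node_ids = []
    for name in os.listdir(sysfs_root):
        # Only directories named node<N> are nodes; entries such as
        # "online" or "possible" in the same directory are skipped.
        match = re.fullmatch(r"node(\d+)", name)
        if match:
            node_ids.append(int(match.group(1)))
    return sorted(node_ids)
```

On the affected kernel, len(list_numa_nodes()) would be expected to agree with the "NUMA node(s): 1" line above, and with 8 on a kernel that detects all nodes.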

Steps to reproduce:
Boot an AMD RYZEN 7451.
   dmesg (126.1 KiB)

Closed by  Jan Alexander Steffens (heftig)
Wednesday, 02 May 2018, 23:45 GMT
Reason for closing:  Fixed
Additional comments about closing:  linux 4.16.5-1
Comment by Gustavo Alvarez (sl1pkn07) - Tuesday, 24 April 2018, 15:31 GMT
for record, working(?) on Intel platform (dual Xeon E5-2650-v4)

└───╼ dmesg |grep NUMA
[ 0.000000] NUMA: Initialized distance table, cnt=2
[ 0.000000] NUMA: Node 0 [mem 0x00000000-0x7fffffff] + [mem 0x100000000-0x87fffffff] -> [mem 0x00000000-0x87fffffff]
[ 0.000000] mempolicy: Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
[ 2.040181] pci_bus 0000:00: on NUMA node 0
[ 2.054105] pci_bus 0000:80: on NUMA node 1

greetings
Comment by Jan Alexander Steffens (heftig) - Tuesday, 24 April 2018, 16:26 GMT
Check your firmware (BIOS setup). The Threadripper and EPYC CPUs can run in a mode that hides the NUMA layout.
Comment by Médéric Boquien (mboquien) - Tuesday, 24 April 2018, 16:46 GMT
I updated the BIOS to the latest version and I checked various memory modes, including interleave/channel (which should be the correct one for NUMA), in case the automatic detection did not work. It works out of the box with Ubuntu Server 18.04 Beta 2. I tried to compare the NUMA kernel configuration settings but I did not see any obvious difference that would explain why it works in one case but not the other. Thanks!

P.S: if anyone can edit, please s/RYZEN/EPYC in the original report.
Comment by loqs (loqs) - Tuesday, 24 April 2018, 18:43 GMT
What kernel version is the ubuntu server kernel based on? How many NUMA nodes does linux-lts detect?
Comment by Médéric Boquien (mboquien) - Tuesday, 24 April 2018, 20:03 GMT
* The Ubuntu kernel is "4.15.0-13-generic"
* linux-lts detects just one NUMA node. The outputs of lscpu and numactl --show are similar to those with the latest kernel.
Comment by loqs (loqs) - Tuesday, 24 April 2018, 21:26 GMT
I would suggest building a 4.14/4.15/4.16 kernel without any patches using the Ubuntu config and checking whether that kernel has the issue.
Comment by Médéric Boquien (mboquien) - Wednesday, 25 April 2018, 13:56 GMT
Two important pieces of information:
* Contrary to what I stated yesterday, the LTS kernel does actually detect the NUMA nodes (I forgot to add the LTS kernel to the boot menu, as it is the first time I have installed a non-default kernel in a decade, silly me).
* I compiled kernel 4.16.4 using the Ubuntu config file+oldconfig (choosing the default values for everything). The generated kernel is very large but it also detects the NUMA nodes. I attach the diff between the config file in ABS and the generated file based on the Ubuntu config.

Thanks!
   config.diff (136.5 KiB)
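
A 136 KiB diff is hard to scan by eye, so narrowing the comparison to a family of options can help. The sketch below is only an illustration: parse_kconfig and diff_options are hypothetical helper names, and the two sample strings are not the real config files, just the CONFIG_NODES_SHIFT values quoted in this thread plus a CONFIG_NUMA=y line assumed for the example.

```python
def parse_kconfig(text):
    """Map option name -> value for a kernel .config file.

    'CONFIG_X=value' maps to the string value;
    '# CONFIG_X is not set' maps to None.
    """
    suffix = " is not set"
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = None
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_options(a, b, prefix="CONFIG_"):
    """Return {name: (value_in_a, value_in_b)} for options that differ.

    Options missing from a file are treated like unset ones (None).
    """
    names = sorted(set(a) | set(b))
    return {
        name: (a.get(name), b.get(name))
        for name in names
        if name.startswith(prefix) and a.get(name) != b.get(name)
    }


# Sample fragments only; the values are those quoted in this thread.
arch_sample = "CONFIG_NUMA=y\nCONFIG_NODES_SHIFT=2\n"
ubuntu_sample = "CONFIG_NUMA=y\nCONFIG_NODES_SHIFT=10\n"

for name, (old, new) in diff_options(
    parse_kconfig(arch_sample), parse_kconfig(ubuntu_sample), "CONFIG_NODES"
).items():
    print(f"{name}: {old} -> {new}")
```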
Comment by AK (Andreaskem) - Wednesday, 25 April 2018, 15:20 GMT
This maybe?

-CONFIG_NODES_SHIFT=2
+CONFIG_NODES_SHIFT=10

Eight NUMA nodes would probably require at least 3 here.

See
https://github.com/torvalds/linux/blob/master/include/linux/numa.h
or
https://lkml.org/lkml/2010/3/10/406
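
The arithmetic behind this can be sketched as follows, based on numa.h defining MAX_NUMNODES as (1 << NODES_SHIFT). The two function names are made up for illustration.

```python
def max_numa_nodes(nodes_shift):
    """Number of NUMA nodes a kernel built with this NODES_SHIFT can handle."""
    return 1 << nodes_shift


def min_nodes_shift(node_count):
    """Smallest shift such that (1 << shift) >= node_count."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return (node_count - 1).bit_length()


print(max_numa_nodes(2))   # 4 nodes at most with CONFIG_NODES_SHIFT=2
print(min_nodes_shift(8))  # 3, matching the estimate for eight nodes
```

With CONFIG_NODES_SHIFT=2 the limit is 4 nodes, which is below the 8 nodes of a dual-socket system with 4 nodes per socket.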
Comment by loqs (loqs) - Wednesday, 25 April 2018, 15:34 GMT
config NODES_SHIFT
    int "Maximum NUMA Nodes (as a power of 2)" if !MAXSMP
    range 1 10
    default "10" if MAXSMP
    default "6" if X86_64
    default "3"
    depends on NEED_MULTIPLE_NODES
    ---help---
      Specify the maximum number of NUMA Nodes available on the target
      system. Increases memory reserved to accommodate various tables.

4.15-1 dropped the value from 5 to 2
https://git.archlinux.org/svntogit/packages.git/commit/trunk?h=packages/linux&id=9998d4fe8026c686abe8db9d9c5941d3936af3de
Comment by AK (Andreaskem) - Wednesday, 25 April 2018, 15:50 GMT
On the other hand, linux-lts seems to have CONFIG_NODES_SHIFT=5 and mboquien said that it does not detect all nodes anyway...

edit: Wait, he said that it *does* detect all NUMA nodes. Never mind.
Comment by Médéric Boquien (mboquien) - Wednesday, 25 April 2018, 20:32 GMT
I have built the regular Arch kernel package with CONFIG_NODES_SHIFT set to 3, as suggested by AK. It works: all the NUMA nodes are detected. Now, there must be a reason why it was reduced from 5 to 2. I am wondering whether it would be possible to move it back up to 3 so that people with dual EPYC CPUs would not be affected by the problem. The difference in performance with/without NUMA can be very large in some cases (up to a factor of ~5 in my test case). Thanks!
Comment by loqs (loqs) - Wednesday, 25 April 2018, 22:46 GMT
My understanding is that a value of 3, i.e. 8 NUMA nodes, would exclude 4-socket AMD EPYC systems, which use 4 nodes * 4 sockets = 16,
and 8-socket Intel systems using SNC, which use 2 nodes * 8 sockets = 16
(sub-NUMA cluster mode partitions each chip into 2 NUMA nodes, each with half the cores and using one memory controller).
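
To make the trade-off concrete, here is a small sketch that checks the topologies mentioned in this thread against the shift values discussed (2, 3 and 5). The socket and node counts are taken from the comments as stated, not verified against hardware documentation, and the helper name fits is hypothetical.

```python
# (sockets, NUMA nodes per socket), as described in the comments.
TOPOLOGIES = {
    "dual EPYC": (2, 4),
    "4-socket EPYC (as described)": (4, 4),
    "8-socket Intel with SNC": (8, 2),
}


def fits(sockets, nodes_per_socket, nodes_shift):
    """True if all nodes fit within the 1 << nodes_shift limit."""
    return sockets * nodes_per_socket <= (1 << nodes_shift)


for label, (sockets, per_socket) in TOPOLOGIES.items():
    results = ", ".join(
        f"shift {shift}: {'ok' if fits(sockets, per_socket, shift) else 'too small'}"
        for shift in (2, 3, 5)
    )
    print(f"{label} ({sockets * per_socket} nodes): {results}")
```

Under these assumptions only a shift of 5 covers all three cases, while 3 covers just the dual-socket one.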
Comment by Médéric Boquien (mboquien) - Thursday, 26 April 2018, 15:24 GMT
Indeed. I imagine that CONFIG_NODES_SHIFT was reduced to solve another problem? Otherwise reverting to 5 would certainly be better for people with more NUMA nodes.
