FS#31187 - [linux] - disable NUMA from config files

Attached to Project: Arch Linux
Opened by John (graysky) - Sunday, 19 August 2012, 02:42 GMT
Last edited by Tobias Powalowski (tpowa) - Friday, 12 October 2012, 10:52 GMT
Task Type: Feature Request
Category: Packages: Core
Status: Closed
Assigned To: Tobias Powalowski (tpowa), Thomas Bächler (brain0)
Architecture: All
Severity: Low
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 4
Private: No

Details

Disabling NUMA is beneficial for desktop systems. I have attached results of a gcc/make-based benchmark whose code is linked at the bottom of this request (hosted on GitHub). Here you see the mainline "3.5.2-1-ARCH" kernel vs. two different BFS-patched kernels; one has NUMA enabled per the Arch Linux defaults and the other has it disabled:

http://s19.postimage.org/a8mk5gxgz/3770k.jpg

There is a clear and statistically significant difference in compile times (n=28), with the median gain from disabling NUMA being 344 ms. From my research, unless the hardware has >1 PHYSICAL CPU -- not cores, but physical processors -- it is advantageous to disable NUMA as measured by this non-latency endpoint.

Since the majority of Arch users run workstation/laptop setups, I would ask that the default configs disable NUMA.

Link to benchmark script: https://github.com/graysky2/bin/blob/master/bench
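
For reference, the timing loop is essentially of the following shape (a minimal sketch only, not the exact script linked above; it assumes an extracted make-based source tree in the current directory):

    #!/bin/bash
    # Minimal sketch of a make-based compile-time benchmark (illustrative only).
    runs=28
    for i in $(seq 1 "$runs"); do
        make clean >/dev/null 2>&1
        # GNU time prints elapsed wall-clock seconds (%e) and appends them to the log
        /usr/bin/time -f "%e" -a -o compile_times.log make -j"$(nproc)" >/dev/null 2>&1
    done
    sort -n compile_times.log   # the median sits at the midpoint of this sorted list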

Closed by  Tobias Powalowski (tpowa)
Friday, 12 October 2012, 10:52 GMT
Reason for closing:  Won't implement
Additional comments about closing:  1 second is not worth disabling it
Comment by John (graysky) - Sunday, 19 August 2012, 02:58 GMT
Should have mentioned that the above results are from an Intel 3770K @ 4.5 GHz running with 8 threads.
Comment by Dave Reisner (falconindy) - Sunday, 19 August 2012, 03:05 GMT
Why bother involving patched kernels at all in this comparison? Why isn't there any comparison of the stock config versus the non-NUMA config?

Note that every other major distro enables NUMA for x86_64.
Comment by John (graysky) - Sunday, 19 August 2012, 03:38 GMT
@Dave - The answer to your first question is that I am lazy. I suspect the rank order of the results will be the same with or without the patched kernel. To your second point, I _WILL_ recompile the ARCH kernel without NUMA enabled and repeat the analysis.

Although you are right that other major distros do enable NUMA in their vanilla configs, this is NOT a reason to do it for Arch Linux. I contend that the overwhelming supermajority of Arch users have only 1 physical CPU (again, workstation and laptop users) and thus are impacted detrimentally by this option. In other words, if 99.99999 % of ARCH users do not have >1 CPU, why in the world would we enable an option in the kernel package that benefits only the 0.00001 % of users who do? Just because our peer distros do it does not make it a data-driven and sound decision.

I am tired and am going to bed now, but tomorrow morning I will compile the 3.5.2-1-ARCH linux package via ABS and repeat this study without any non-standard patches to address that point.
Comment by Dave Reisner (falconindy) - Sunday, 19 August 2012, 03:49 GMT
In order to make this a reasonable comparison which presents a compelling reason to disable this, I would think there needs to be:

1. wider testing across more hardware than just an Ivy Bridge processor.
2. wider testing across workloads beyond just Kbuild.

> Although you are right that other major distros do enable NUMA in their vanilla configs, this is NOT a reason to do it for Arch Linux.
There are a lot of things we could potentially disable/enable in our default config to make the kernel faster. Even given enough evidence that this is a good idea™, I'm hard pressed to believe it would make a bigger difference than disabling the many debug options we've recently enabled. The hivemind of the _desktop_ distros clearly has this enabled for a reason.
Comment by John (graysky) - Sunday, 19 August 2012, 06:18 GMT
Insomnia... here are the results you asked for; as you can see, the same statistically significant trend holds with the ARCH kernel with and without NUMA:

http://s19.postimage.org/gp566f1pv/8_19_2012_2_06_53_AM.jpg

To your points raised in your new post:
1) I have repeated these results using the -ck kernel on a Core 2 processor (X3360), with the same rank order. More testing on older hardware or on hardware with fewer cores could be done, but my suspicion is that there would be no surprises, given that these are all single-CPU machines.

2) Agreed that more workloads would be interesting, but given what this particular option does per the manpage (http://www.kernel.org/doc/man-pages/online/pages/man5/numa_maps.5.html), I don't see the need to identify and run these to prove the point.

I can't comment on the debugging options, but with respect to the NUMA option, I feel that enabling it on single-CPU systems is detrimental based on these results. Thank you for your consideration.

3) I respectfully disagree. If all major distros do something because their peer group did it, how is that a justification for doing it in Arch -- particularly considering the data showing that it is a performance regression for users with only one physical CPU?

"...the overwhelming supermajority of Arch users have only 1 physical CPU (again, workstation and laptop users) and thus are impacted detrimentally by this option. In other words, if 99.99999 % of ARCH users do not have >1 CPU, why in the world would we enable an option in the kernel package that benefits only the 0.00001 % of users who do? Just because our peer distros do it does not make it a data-driven and sound decision."
Comment by Jan de Groot (JGC) - Sunday, 19 August 2012, 16:12 GMT
AFAIK you can turn NUMA off by booting with numa=off on the kernel command line. If we disable it in the config, any SMP system with built-in memory controllers will need a kernel rebuild to gain full performance.
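
For reference, that is just a kernel command line edit; a minimal sketch, assuming GRUB is the bootloader in use:

    # Sketch: add numa=off to the kernel command line (assumes GRUB).
    # In /etc/default/grub:
    #   GRUB_CMDLINE_LINUX_DEFAULT="quiet numa=off"
    grub-mkconfig -o /boot/grub/grub.cfg
    # After rebooting, confirm the parameter is active:
    cat /proc/cmdline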

We're talking about a 0.2% speedup by disabling NUMA here. Is that a valid reason to disable NUMA support completely? For me it is not.
Comment by John (graysky) - Sunday, 19 August 2012, 16:27 GMT
@JdG - True that the improvement is small, but that is as measured by this specific benchmark. Other tasks may have a much larger increase. The flip side to your argument for those with SMP systems is that the vast majority of Arch systems will run _slower_ by including this... in my mind, there are WAY more single-processor users than multi-processor users out there. Anyway, I just wanted to bring this up to those who control the package; I will respect your decision to include it or not.
Comment by Thomas Bächler (brain0) - Monday, 08 October 2012, 12:33 GMT
I agree with Jan here. We have users running Arch on multi-socket configurations and they should have all the features they need. We do not have the necessary manpower to build multiple kernels for different workloads and systems, so we provide one that works for everyone.

Have you benchmarked performance with numa=off?
Comment by John (graysky) - Monday, 08 October 2012, 19:46 GMT
@Thomas - Your argument makes sense. I will test with the kernel option you recommend and post back here for completeness. In the meantime, I think it's safe to close this ticket... I can still post once it's closed, no?
Comment by Evangelos Foutras (foutrelis) - Monday, 08 October 2012, 20:20 GMT
> I can still post once it's closed, no?

Nope, you can't do that.
Comment by John (graysky) - Tuesday, 09 October 2012, 21:47 GMT
OK... I tested three conditions using linux-3.6.1-1. Again, the benchmark was run 27 times per test. I tested this on two different machines: one single-CPU machine and one dual-CPU machine.

Conditions:
1) Stock kernel
2) Stock kernel booting with numa=off
3) Stock kernel compiled from ABS with NUMA disabled prior to compiling. For example, `% zgrep NUMA /proc/config.gz` on the running kernel returned '# CONFIG_NUMA is not set'. This was the only modification I made.
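
The rough shape of how condition 3 was produced (a sketch of the ABS route only; paths are illustrative, and it assumes the stock linux PKGBUILD with its config.x86_64 -- in practice dependent NUMA_* options also need resolving, e.g. via make oldconfig):

    # Sketch: rebuild the stock Arch kernel with CONFIG_NUMA turned off (via ABS).
    cd ~/abs/core/linux
    # Illustrative edit of the shipped config; real builds should resolve dependent options:
    sed -i 's/^CONFIG_NUMA=y/# CONFIG_NUMA is not set/' config.x86_64
    makepkg -s --skipinteg              # checksum of the edited config no longer matches
    sudo pacman -U linux-*.pkg.tar.xz   # install the rebuilt package and reboot into it
    # On the running kernel, confirm the option is off:
    zgrep NUMA /proc/config.gz          # should include: # CONFIG_NUMA is not set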

Results for the single-socket machine:
*Booting with 'numa=off' gave the longest compile times.
*The stock kernel gave longer compile times than the same kernel compiled with NUMA disabled in the .config, confirming my initial report.
*The fastest compile times were observed with NUMA disabled in the .config prior to building.
All three comparisons were statistically significant.

Machine 1 = 3770K @ 4.5 GHz
Total CPUs = 1
Physical cores = 4 and hyperthreaded cores = 4
Total cores (physical+virtual) = 8
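
(For reference, these counts correspond to what lscpu reports -- Socket(s), Core(s) per socket, Thread(s) per core, and CPU(s); a quick, generic way to check on any box:)

    # Quick check of socket/core/thread counts (util-linux lscpu):
    lscpu | grep -E '^(CPU\(s\)|Thread\(s\) per core|Core\(s\) per socket|Socket\(s\))'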

http://s19.postimage.org/x5qgb1gqb/3770_ARCH.jpg

Results for the dual-socket machine:
*Booting with 'numa=off' gave the longest compile times.
*The same kernel compiled with NUMA disabled in the .config gave longer compile times than the stock kernel.
*The fastest compile times were observed with the stock kernel.
All three comparisons were statistically significant.

Machine 2 = Dual 5620 @ 2.4 GHz
Total CPUs = 2
Physical cores = 8 and hyperthreaded cores = 8
Total cores (physical+virtual) = 16

http://s19.postimage.org/mwxz57soj/dual_5620.jpg

So there you have it! I still subscribe to the thinking I outlined above, but understand that I am not a developer. Thanks for the consideration.
Comment by Dave Reisner (falconindy) - Tuesday, 09 October 2012, 21:56 GMT
Just so we're clear: you're claiming that this is worth saving less than a second on a kernel compile?
Comment by John (graysky) - Tuesday, 09 October 2012, 22:03 GMT
@falconindy - No, I am simply pointing out that the differences between these conditions are statistically significant as measured by this endpoint. How does this translate into real-world computing? I do not know. Perhaps with other benchmarks one might see an even greater differentiation between the two. I opened this ticket primarily on that general principle. I don't have the data, but I am going to guess that >99 % of Archers use single-socket configurations and would thus benefit from disabling the setting.
