Arch Linux


FS#40143 - [linux] compile kernel with memtest support

Attached to Project: Arch Linux
Opened by Radek Podgorny (rpodgorny) - Tuesday, 29 April 2014, 19:04 GMT
Last edited by Jan de Groot (JGC) - Thursday, 01 May 2014, 12:02 GMT
Task Type Feature Request
Category Kernel
Status Closed
Assigned To Tobias Powalowski (tpowa)
Thomas Bächler (brain0)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

i think it would be nice to have kernel memtest compiled in by default. it is a really handy (and low overhead?) solution to reduce possible memory corruption errors. anyway, you have to explicitly enable it on boot so nothing would change for most users.

currently:

> zcat /proc/config.gz|grep MEMTEST
# CONFIG_MEMTEST is not set

...see http://raid6.com.au/~onlyjob/posts/MEMTEST_explained/ for more info if interested.
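
The check above can be wrapped in a small helper. This is only a sketch: the function name `config_state` is hypothetical (not part of any Arch or kernel tooling), and it assumes the usual kernel config syntax of `CONFIG_X=value` or `# CONFIG_X is not set`.

```shell
# Report the state of one kernel config option, reading config text on stdin.
# Prints the option's value (e.g. "y" or "m"), "not set", or "absent".
# config_state is a hypothetical helper name used for illustration.
config_state() {
    awk -v opt="$1" '
        # Explicitly disabled options appear as a comment line.
        $0 == "# " opt " is not set" { state = "not set" }
        # Enabled options appear as OPTION=value at the start of a line.
        index($0, opt "=") == 1      { state = substr($0, length(opt) + 2) }
        END { print (state == "" ? "absent" : state) }
    '
}

# Usage on a running kernel that exposes /proc/config.gz:
#   zcat /proc/config.gz | config_state CONFIG_MEMTEST
```

With the configuration quoted above, this would be expected to print "not set".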

Closed by Jan de Groot (JGC)
Thursday, 01 May 2014, 12:02 GMT
Reason for closing: Duplicate
Additional comments about closing: FS#11328
Comment by Jan de Groot (JGC) - Tuesday, 29 April 2014, 22:02 GMT
This feature is not really useful. It only detects bad memory at bootup once. Article you pointed out also has a weird conclusion: great for servers that can't afford errors...
First point: important servers run ECC memory
Second point: servers come with EDAC, which reports memory errors whenever they occur.

Imho this is just kernel bloat that we shouldn't enable.

Comment by Radek Podgorny (rpodgorny) - Wednesday, 30 April 2014, 00:01 GMT
well, not really, because:

1) according to the article, it tests all memory that is to be allocated, so not only at boot time. (the truth being i can't find it in the source - but that may be my incompetence)
2) a bad memory check at bootup is a good thing, anyway. see for example this: https://lists.debian.org/debian-kernel/2011/12/msg00121.html
3) not all "servers" are real servers. increasing reliability of commodity junk is a really nice feature.
4) ...not to mention it can be useful on desktop as well. data corruption can be a huge problem anywhere.
5) according to various benchmarks, there should be practically no overhead.
6) the default value for memtest (when compiled in) is 0 which means "disabled" so this should not change anything unless forcefully specified on kernel command line.
99) it's fair to mention, though, that current vanilla has CONFIG_MEMTEST disabled by default (which is actually funny because it used to be enabled by default when it was first introduced to linux).
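
To illustrate point 6: on a kernel built with CONFIG_MEMTEST, the test is requested with the `memtest=N` boot parameter, and a command line without it leaves the test disabled. A rough sketch of checking what a given command line asks for follows. The function name `memtest_passes` is hypothetical, and only the `memtest=N` form is handled, not a bare `memtest`.

```shell
# Print the memtest value requested on a kernel command line read from stdin.
# Prints 0 (disabled) when no memtest=N parameter is present.
# memtest_passes is a hypothetical helper name used for illustration.
memtest_passes() {
    # Put each parameter on its own line, then split on "=".
    tr ' ' '\n' | awk -F= '
        $1 == "memtest" { n = $2 }
        END { print n + 0 }
    '
}

# Usage on a running system:
#   memtest_passes < /proc/cmdline
```

For example, a command line containing `memtest=4` yields 4, and one without the parameter yields 0.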
Comment by Radek Podgorny (rpodgorny) - Wednesday, 30 April 2014, 00:03 GMT
...oh, and:

7) according to the article, debian has this enabled by default so i'd say it's tested well in the wild and there should be nothing to worry about.
Comment by Daniel Micay (thestinger) - Wednesday, 30 April 2014, 00:13 GMT
Why was it disabled by default upstream? Arch usually sticks close to the vanilla configuration, meaning it builds everything as a module if possible and leaves the remaining N or Y switches at the recommended default. There are some exceptions... but they tend to fade away as the kernel gets upgraded and non-recommended settings are lost. If it truly had no measurable overhead and was useful, then I'd expect it to be Y by default upstream. The places where we deviate from this tend to be related to backwards compatibility issues upstream cares about but we don't (like the symlink/hardlink protections).
Comment by Radek Podgorny (rpodgorny) - Wednesday, 30 April 2014, 00:36 GMT
Comment by Jan de Groot (JGC) - Wednesday, 30 April 2014, 06:15 GMT
Memtest only runs at bootup. The page you linked is itself evidence of that: the software it describes failed because of a grown defect.

I've been running some crap hardware with a batch of broken memory modules and a mainboard that silently uses ECC which can't be turned off. Testing the memory won't make such boxes more reliable; memtest86 won't even find the defects after running it for 3 days in a row.
