Please read this before reporting a bug:
https://wiki.archlinux.org/title/Bug_reporting_guidelines
Do NOT report bugs when a package is just outdated, or it is in the AUR. Use the 'flag out of date' link on the package page, or the Mailing List.
REPEAT: Do NOT report bugs for outdated packages!
FS#37089 - [psmisc] 22.20-1 killall complains about some valid regular expressions
Attached to Project:
Arch Linux
Opened by Glenn (grepfor) - Friday, 27 September 2013, 13:05 GMT
Last edited by Eric Belanger (Snowman) - Sunday, 20 October 2013, 15:12 GMT
Details
Regarding the --regexp option of killall(1), the man page says:
-r, --regexp  Interpret process name pattern as an extended regular expression.
The string '*xyz*' (without the single quotes) is a valid extended regular expression, and it is accepted by various tools that accept extended regular expressions, e.g. grep -E:
$ grep -E -e '*xyz*' foo   # Matches as expected
Yet killall complains about it:
$ killall -r '*xyz*'
killall: Bad regular expression: *xyz*
Does the definition of "extended regular expression" used by killall differ from that used by grep -E? Or is this a bug? If the definition differs from that used by grep -E, perhaps the man page could be improved by referencing an appropriate definition of extended RE.
PCRE declares this invalid:
$ pkgfile -r '*foo'
error: failed to compile regex at char 0: nothing to repeat
POSIX ERE (implemented via <regex.h>) declares this invalid too (see attached):
"Invalid preceding regular expression"
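The attached test program isn't reproduced here. A minimal sketch of that kind of check, assuming a POSIX system with <regex.h>, compiles a pattern with regcomp() and REG_EXTENDED and reports whether it was accepted. ere_compiles is a hypothetical helper name, and this is not killall's source; it only shows what the C library's POSIX ERE compiler does with the pattern.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `pattern` compiles as a POSIX
   extended regular expression, 0 otherwise. On failure, regerror()'s
   message is copied into errbuf when errbuf is non-NULL and errlen > 0. */
int ere_compiles(const char *pattern, char *errbuf, size_t errlen)
{
    regex_t re;
    int rc;

    assert(pattern != NULL);
    rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    if (rc != 0) {
        if (errbuf != NULL && errlen > 0)
            regerror(rc, &re, errbuf, errlen);
        return 0;
    }
    regfree(&re);   /* compiled successfully; release it */
    return 1;
}
```

With glibc, ere_compiles("*xyz*", buf, sizeof buf) should return 0 and leave the "Invalid preceding regular expression" message in buf, while ere_compiles("xyz*", NULL, 0) returns 1.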
If anything, "various tools" are being lenient and don't actually follow POSIX ERE. grep appears to treat '*' as a literal when it appears at the start of a pattern.
If you want an actual answer, ask upstream. But it sounds to me like this is working as intended (WAI), and your expectations are incorrect.
Package should be "core", not "extra".
Severity should be "medium", not "low".
Side issue: I could not figure out how to go about editing these fields myself. Is it possible to do so?
http://pubs.opengroup.org/onlinepubs/9699919799/
Section 9.4.3 states the following:
The <asterisk>, <plus-sign>, <question-mark>, and <left-brace> shall be special except when used in a bracket expression (see RE Bracket Expression). Any of the following uses produce undefined results:
1) If these characters appear first in an ERE, or immediately following a <vertical-line>, <circumflex>, or <left-parenthesis>
2) If a <left-brace> is not part of a valid interval expression
So, your statement about '*xyz*' being a valid extended regular expression is incorrect. And the specific mention of "undefined results" means that grep is still in compliance with the ERE standard.
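The first rule quoted from 9.4.3 can be checked mechanically. The sketch below is a hypothetical helper, ere_has_undefined_dup, that only scans the pattern text: it flags '*', '+', '?' or '{' when one appears first or right after '|', '^' or '(', skips bracket expressions and backslash escapes, and does not attempt rule 2 (a '{' that is not a valid interval).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if the ERE places '*', '+', '?' or '{' in a
   position POSIX 9.4.3 calls undefined (first in the pattern, or right
   after '|', '^' or '('). Rule 2 (malformed intervals) is not checked. */
bool ere_has_undefined_dup(const char *re)
{
    char prev = '\0';            /* previous token; '\0' = start of pattern */
    size_t i = 0, n;

    assert(re != NULL);
    n = strlen(re);
    while (i < n) {
        char c = re[i];
        if (c == '\\' && i + 1 < n) {        /* a backslash escape is one atom */
            prev = 'a';
            i += 2;
            continue;
        }
        if (c == '[') {                       /* skip the whole bracket expression */
            i++;
            if (i < n && re[i] == '^') i++;
            if (i < n && re[i] == ']') i++;   /* a leading ']' is literal */
            while (i < n && re[i] != ']') {
                if (re[i] == '[' && i + 1 < n &&
                    (re[i + 1] == ':' || re[i + 1] == '.' || re[i + 1] == '=')) {
                    char delim = re[i + 1];   /* [:class:], [.coll.], [=equiv=] */
                    i += 2;
                    while (i + 1 < n && !(re[i] == delim && re[i + 1] == ']')) i++;
                    i += 2;                   /* skip the closing ":]" etc. */
                } else {
                    i++;
                }
            }
            i++;                              /* skip the closing ']' */
            prev = 'a';
            continue;
        }
        bool dup = (c == '*' || c == '+' || c == '?' || c == '{');
        if (dup && (prev == '\0' || prev == '|' || prev == '^' || prev == '('))
            return true;
        prev = c;
        i++;
    }
    return false;
}
```

For example, it flags '*xyz*', 'a|*b' and '(+b)' but not 'xyz*' or '[*]xyz'. A flagged pattern falls into the "undefined results" case, so an implementation may reject it (as glibc's regcomp does with REG_EXTENDED) or accept it (as grep -E does).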
> So, your statement about '*xyz*' being a valid extended regular expression is incorrect
>
Disagree. It is a valid extended regular expression in the grammar of egrep, since it is accepted by egrep and it behaves there in accordance with its documentation:
http://www.gnu.org/software/grep/manual/grep.html#Regular-Expressions
The operative phrase is "The empty regular expression matches the empty string". Hence, the leading * matches zero or more repetitions of the leading empty RE, and it indeed behaves just that way if you try it. In particular, it does not behave as you stated earlier ("grep appears to treat '*' as a literal"), and I'm curious how you came to that conclusion. It does not seem to be supported by any experiments that I performed using egrep from grep 2.14-2, with POSIXLY_CORRECT set or unset. For example, if the file 'foo' contains the string 'abcdefghi\n', then
$ grep -E '*def' foo
matches exactly the string 'def', as it should, in accordance with the above doc. You can verify this with colorization (--color). If it behaved as you stated, it would not match anything in foo, since foo doesn't contain a literal asterisk.
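The same leading-'*' behavior can be reproduced with glibc's GNU regex API, assuming RE_SYNTAX_EGREP is a fair stand-in for the syntax grep -E uses. grep also has its own DFA matcher, so this is an approximation, not grep's code. egrep_search is a hypothetical helper; re_set_syntax, re_compile_pattern, re_search and re_match are glibc GNU extensions that require _GNU_SOURCE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper (glibc-specific, not POSIX): compile `pattern` in
   egrep syntax and search `text` for it. Returns 1 and sets *start/*len
   on a match, 0 if there is no match, -1 if the pattern is rejected. */
int egrep_search(const char *pattern, const char *text, int *start, int *len)
{
    struct re_pattern_buffer buf;
    int n, pos, rc = 0;

    assert(start != NULL && len != NULL);
    memset(&buf, 0, sizeof buf);          /* the GNU API expects a zeroed buffer */

    re_set_syntax(RE_SYNTAX_EGREP);       /* sets the global re_syntax_options */
    if (re_compile_pattern(pattern, strlen(pattern), &buf) != NULL)
        return -1;                        /* non-NULL return is an error message */

    n = (int)strlen(text);
    pos = re_search(&buf, text, n, 0, n, NULL);    /* search forward from 0 */
    if (pos >= 0) {
        *start = pos;
        *len = re_match(&buf, text, n, pos, NULL); /* length of the match at pos */
        rc = 1;
    }
    regfree(&buf);
    return rc;
}
```

egrep_search("*def", "abcdefghi", &start, &len) should return 1 with start == 3 and len == 3, i.e. exactly 'def'. Passing the same pattern to regcomp() with REG_EXTENDED fails instead, which is the difference between the two syntaxes under discussion.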
In short, '*xyz' is a valid and well-defined ERE within the grammar of egrep: The leading * matches the leading empty RE preceding it, i.e. it matches nothing. That's the way it is documented to behave, it's the way it has historically behaved for many years (AFAIR) and -- at least in a few experiments I did just now -- that is the way it behaves at present.
What you seem to mean is that '*xyz' (or '*xyz*') is not a valid POSIX ERE. No argument. But if you read my original post, you won't see any mention or implication of POSIX at all. I simply stated that it was "a" valid ERE, which it obviously is for egrep (and various other tools), and asked if it was the same ERE definition that killall uses. (I didn't know at the time what definition killall used, because its man page doesn't say.) The reason for asking was precisely that it does behave differently than egrep, and yet egrep (when POSIXLY_CORRECT is set) is presumably POSIX compliant. Yet even when POSIXLY_CORRECT is set, egrep accepts EREs with leading repetition operators like *.
So something is amiss: either the doc, the code, or the interpretation of what POSIXLY_CORRECT means within the grep man page. No argument that killall operates in accordance with the strict POSIX spec. But that isn't what my report asked. It's also why I suggested documenting killall's definition of ERE, precisely to avoid these kinds of definitional ambiguities.
The relevant portion of the POSIX standard regarding this issue seems to me to be not 9.4.3, which you quote above, but 9.5.3, which defines the formal POSIX grammar and states clearly and unambiguously what is required for strict conformance:
"The ERE grammar ==> does not permit <== several constructs that previous sections specify as having undefined results. [ ... ]"
"Implementations are permitted to extend the language to allow [expressions such as '*xyz']. Conforming applications ==> cannot use <== such constructs."
(Emphasis mine).
The last sentence is crystal clear: If an implementation allows '*xyz', then it is non-conforming.
Thus, the central issue here is exactly what is meant by POSIXLY_CORRECT in the grep doc. The man page is ambiguous. First it says "If [POSIXLY_CORRECT is] set, grep behaves as POSIX.2", but then goes on to list several specific POSIX behaviors (none of which have anything to do with this issue). So it is not clear whether the intent of POSIXLY_CORRECT is the general statement "behaves as POSIX.2", or only the specific behaviors mentioned. If the intent is the general statement, then it is not conforming.
I'll check into this more over the weekend and post what I find here.
Still not a discussion that should be taking place here.
> So then it's merely implementation defined.
>
Seems to me just the opposite: In view of 9.5.3, a conforming app has no implementation flexibility at all on this issue: It "cannot use" EREs like '*xyz'. If grep's intention is to be conforming when POSIXLY_CORRECT is set, then it would seem to be presently in violation.
>
> Still not a discussion that should be taking place here.
>
Agree, and will pursue upstream. For completeness' sake, how about keeping it open here until it gets resolved? I'll then post a 1-2 sentence summary and a link to the upstream ML threads. There will be two: One for killall requesting the suggested doc improvement, and one for grep asking for clarification of POSIXLY_CORRECT semantics and/or code behavior change.
http://lists.gnu.org/archive/html/bug-grep/2013-09/msg00028.html
A doc patch was also submitted against killall.1 to clarify its use of EREs and another unrelated doc issue:
http://sourceforge.net/p/psmisc/bugs/59/
which was accepted and scheduled for the next release.
Going forward, I also took on an action item to submit a doc patch against grep, to clarify the semantics of POSIXLY_CORRECT with respect to EREs based on the new (fixed) spec language. But that can't be done until the new language makes it at least to draft form, which will probably take weeks if not months. So there is probably no point in keeping this ticket open any longer.