Arch Linux

FS#37089 - [psmisc] 22.20-1 killall complains about some valid regular expressions

Attached to Project: Arch Linux
Opened by Glenn (grepfor) - Friday, 27 September 2013, 13:05 GMT
Last edited by Eric Belanger (Snowman) - Sunday, 20 October 2013, 15:12 GMT
Task Type: Bug Report
Category: Upstream Bugs
Status: Closed
Assigned To: Eric Belanger (Snowman)
Architecture: All
Severity: Low
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 0
Private: No

Details

Regarding the --regexp option of killall(1), the man page says:

-r, --regexp
Interpret process name pattern as an extended
regular expression.

The string '*xyz*' (without the single quotes) is a valid extended regular expression, and accepted by various tools which accept extended regular expressions, e.g. grep -E:

$ grep -E -e '*xyz*' foo # Matches as expected

Yet killall complains about it:

$ killall -r '*xyz*'
killall: Bad regular expression: *xyz*

Does the definition of "extended regular expression" used by killall differ from that used by grep -E? Or is this a bug?

If the definition differs from that used by grep -E, perhaps the man page could be improved by referencing an appropriate definition of extended RE.

Closed by  Eric Belanger (Snowman)
Sunday, 20 October 2013, 15:12 GMT
Reason for closing:  Deferred
Comment by Dave Reisner (falconindy) - Friday, 27 September 2013, 13:42 GMT
This is *not* a valid regex.

PCRE declares this invalid:

$ pkgfile -r '*foo'
error: failed to compile regex at char 0: nothing to repeat

POSIX ERE (implemented via <regex.h>) declares this invalid too (see attached):

"Invalid preceding regular expression"

If anything, "various tools" are being lenient and don't actually follow POSIX ERE. grep appears to treat '*' as a literal when it appears at the start of a pattern.
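For comparison, a minimal sketch showing how one more engine handles the same pattern. It uses Python's built-in re module, which is neither PCRE nor POSIX ERE, so it says nothing about what killall or grep should do; it only illustrates that engines differ on a leading '*'. The helper name try_compile is made up for this illustration.

```python
import re


def try_compile(pattern):
    """Return None if Python's re accepts the pattern, else the error text."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # '*xyz*' is the pattern from the report; 'xyz*' is a control case.
    for pat in ("*xyz*", "xyz*"):
        print(repr(pat), "->", try_compile(pat) or "accepted")
```

Python's re rejects the leading '*' at compile time, much like the pkgfile example above, and accepts the control pattern.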

If you want an actual answer, ask upstream. But, it sounds to me like this is WAI, and your expectations are incorrect.
   ere.c (0.3 KiB)
Comment by Glenn (grepfor) - Friday, 27 September 2013, 13:53 GMT
Apologies, this report was filed with some incorrect info:

Package should be "core", not "extra".
Severity should be "medium", not "low"

Side issue: I could not figure out how to go about editing these fields myself. Is it possible to do so?
Comment by Eric Belanger (Snowman) - Friday, 27 September 2013, 19:00 GMT
Only devs can edit those fields. As for the issue itself, ask upstream.
Comment by Glenn (grepfor) - Friday, 27 September 2013, 22:00 GMT
Still have questions on this, please re-open. Strong evidence that either grep (with POSIXLY_CORRECT defined) or killall has a POSIX conformance issue. Looks like culprit is probably grep, but don't have time to look into it right now. Please re-open for a few days, will provide details over the weekend. Thx.
Comment by Dave Reisner (falconindy) - Saturday, 28 September 2013, 00:03 GMT
Here's the most recent POSIX standard:

http://pubs.opengroup.org/onlinepubs/9699919799/

Section 9.4.3 states the following:

The <asterisk>, <plus-sign>, <question-mark>, and <left-brace> shall be special except when used in a bracket expression (see RE Bracket Expression). Any of the following uses produce undefined results:
1) If these characters appear first in an ERE, or immediately following a <vertical-line>, <circumflex>, or <left-parenthesis>
2) If a <left-brace> is not part of a valid interval expression

So, your statement about '*xyz*' being a valid extended regular expression is incorrect. And, the specific mention of "undefined results" means that grep is still in compliance with the ERE standard.
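The first case quoted from 9.4.3 can be sketched as a small scanner. This is an illustration only, not a conformance checker: it covers only case 1 (a duplication symbol appearing first, or right after '|', '^', or '('), ignores case 2 (invalid interval expressions), and the function names are hypothetical.

```python
def _skip_bracket(pattern, start):
    """Return the index just past the bracket expression opening at start."""
    j = start + 1
    if j < len(pattern) and pattern[j] == "^":
        j += 1
    if j < len(pattern) and pattern[j] == "]":
        j += 1  # a leading ']' is a literal member of the bracket expression
    while j < len(pattern) and pattern[j] != "]":
        if pattern.startswith(("[:", "[.", "[="), j):
            # Skip a character class, collating symbol, or equivalence class.
            close = pattern.find(pattern[j + 1] + "]", j + 2)
            j = len(pattern) if close == -1 else close + 2
        else:
            j += 1
    return min(j + 1, len(pattern))


def undefined_quantifier_positions(pattern):
    """Return indexes where *, +, ?, or { falls under case 1 of 9.4.3."""
    hits = []
    prev = None  # previous token outside brackets; None at pattern start
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\" and i + 1 < n:
            prev = "literal"  # an escaped character is an ordinary token
            i += 2
            continue
        if ch == "[":
            i = _skip_bracket(pattern, i)
            prev = "literal"  # the characters are not special inside brackets
            continue
        if ch in "*+?{" and (prev is None or prev in ("|", "^", "(")):
            hits.append(i)
        prev = ch
        i += 1
    return hits


if __name__ == "__main__":
    for pat in ("*xyz*", "xyz*", "a|*b", "[*]x"):
        print(repr(pat), undefined_quantifier_positions(pat))
```

For '*xyz*' the scanner flags only index 0: the trailing '*' follows 'z' and is an ordinary repetition.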
Comment by Glenn (grepfor) - Saturday, 28 September 2013, 01:38 GMT
>
> So, your statement about '*xyz*' being a valid extended regular expression is incorrect
>

Disagree. It is a valid extended regular expression in the grammar of egrep, since it is accepted by egrep and it behaves there in accordance with its documentation:

http://www.gnu.org/software/grep/manual/grep.html#Regular-Expressions

The operative phrase is "The empty regular expression matches the empty string". Hence, the leading * matches zero or more repetitions of the leading empty RE, and it indeed behaves just that way if you try it. In particular it does not behave as you stated earlier ("grep appears to treat '*' as a literal") and I'm curious how you came to that conclusion. It does not seem to be supported by any experiments that I performed using egrep from grep 2.14-2, with POSIXLY_CORRECT set or unset. For example, if the file 'foo' contains the string 'abcdefghi\n', then

$ grep -E '*def' foo

matches exactly the string 'def', as it should, in accordance with the above doc. You can verify this with colorization (--color). If it behaved as you stated, it would not match anything in foo, since foo doesn't contain a literal asterisk.

In short, '*xyz' is a valid and well-defined ERE within the grammar of egrep: The leading * matches the leading empty RE preceding it, i.e. it matches nothing. That's the way it is documented to behave, it's the way it has historically behaved for many years (AFAIR) and -- at least in a few experiments I did just now -- that is the way it behaves at present.

What you seem to mean is that '*xyz' (or '*xyz*') is not a valid POSIX ERE. No argument. But if you read my original post, you'll not see any mention or implication of POSIX at all. I simply stated that it was "a" valid ERE, which it obviously is for egrep (and various other tools), and asked if it was the same ERE definition that killall uses. (I didn't know at the time what definition killall used, because its man page doesn't say.) The reason for asking was precisely because it does behave differently than egrep, and yet egrep (when POSIXLY_CORRECT is set) is presumably POSIX compliant. Yet even when POSIXLY_CORRECT is set, egrep accepts EREs with leading repetition operators like *.

So something is amiss: Either doc, code, or interpretation of what POSIXLY_CORRECT means within the grep man page. No argument that killall operates in accordance with strict POSIX spec. But that isn't what my report asked. It's also why I suggested documenting killall's definition of ERE, precisely to avoid these kinds of definitional ambiguities.

The relevant portion of the POSIX standard regarding this issue seems to me to be not 9.4.3, which you quote above, but 9.5.3, which defines the formal POSIX grammar and states clearly and unambiguously what is required for strict conformance:

"The ERE grammar ==> does not permit <== several constructs that previous sections specify as having undefined results. [ ... ]"

"Implementations are permitted to extend the language to allow [expressions such as '*xyz']. Conforming applications ==> cannot use <== such constructs."

(Emphasis mine).

The last sentence is crystal clear: If an implementation allows '*xyz', then it is non-conforming.

Thus, the central issue here is exactly what is meant by POSIXLY_CORRECT in the grep doc. The man page is ambiguous. First it says "If [POSIXLY_CORRECT is] set, grep behaves as POSIX.2", but then goes on to list several specific POSIX behaviors (none of which have anything to do with this issue). So it is not clear whether the intent of POSIXLY_CORRECT is the general statement "behaves as POSIX.2", or only the specific behaviors mentioned. If the intent is the general statement, then it is not conforming.

I'll check into this more over the weekend and post what I find here.


Comment by Dave Reisner (falconindy) - Saturday, 28 September 2013, 01:47 GMT
So then it's merely implementation defined.

Still not a discussion that should be taking place here.
Comment by Glenn (grepfor) - Saturday, 28 September 2013, 15:19 GMT
>
> So then it's merely implementation defined.
>

Seems to me just the opposite: In view of 9.5.3, a conforming app has no implementation flexibility at all on this issue: It "cannot use" EREs like '*xyz'. If grep's intention is to be conforming when POSIXLY_CORRECT is set, then it would seem to be presently in violation.

>
> Still not a discussion that should be taking place here.
>

Agree, and will pursue upstream. For completeness' sake, how about keeping it open here until it gets resolved? I'll then post a 1-2 sentence summary and a link to the upstream ML threads. There will be two: One for killall requesting the suggested doc improvement, and one for grep asking for clarification of POSIXLY_CORRECT semantics and/or code behavior change.
Comment by Glenn (grepfor) - Sunday, 20 October 2013, 14:37 GMT
Update: Resolution of this issue is still in progress. A careful look into it revealed that the source of the confusion is the spec language in POSIX XBD 9.5.3, which has been wrong since 2001. This is essentially (indirectly) what led to the original report here. Gory detail available at:

http://lists.gnu.org/archive/html/bug-grep/2013-09/msg00028.html

A doc patch was also submitted against killall.1 to clarify its use of EREs and to fix another, unrelated doc issue:

http://sourceforge.net/p/psmisc/bugs/59/

which was accepted and scheduled for the next release.

Going forward, I also took as an action item to submit a doc patch against grep, to clarify the semantics of POSIXLY_CORRECT with respect to EREs based on the new (fixed) spec language. But that can't be done until the new language makes it at least to draft form, which will probably be weeks if not months. So probably no point in keeping this ticket open any longer.
