FS#49942 - [grep] not "grepping" with external fixed patterns file

Attached to Project: Arch Linux
Opened by Lacsap (lacsap) - Monday, 04 July 2016, 13:50 GMT
Last edited by Sébastien Luttringer (seblu) - Wednesday, 06 July 2016, 21:45 GMT
Task Type Bug Report
Category Upstream Bugs
Status Closed
Assigned To Sébastien Luttringer (seblu)
Architecture x86_64
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description: starting with grep-2.23-1, grep no longer processes the whole input when given an external file of fixed patterns.

hi,

I have a big (3.3 GB) gzipped file from the NSRL, with fields separated by a single tab:

$ zcat nsrlfiletxt.gz | head -2
sha-1 md5 crc32 filename filesize productcode opsystemcode specialcode
000000206738748edd92c4e3d2e823896700f849 392126e756571ebf112cb1c1cdedf926 ebd105a0 i05002t2.pfb 98865 3095 win

I have a file of fixed patterns (Windows entries only, taken from field 7, opsystemcode):

$ cat win.os
2000 sp 4
2ksp3
dos
...
xp sp2
xphomeedw/sp2
xpprofessw/sp2

My OS is:

$ uname -a
Linux arch 4.4.14-1-lts #1 SMP Fri Jun 24 21:35:25 CEST 2016 x86_64 GNU/Linux

and grep is :

$ grep --version
grep (GNU grep) 2.25
...

$ pacman -Q grep
grep 2.25-2

When I try this:

$ zcat nsrlfiletxt.gz | pv -l | grep --fixed-strings --file=<( sed 's;^.*$;\t&\t;' win.os ) > /opt/nsrl.windows
59,4k 0:00:00 [ 776k/s] [ <=> ]

Only 59.4k lines are processed, with no error!
(sed is applied to win.os to wrap each pattern in tabs, so that a pattern matches only a whole field.)
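
To illustrate what the sed step does, here is a minimal sketch on throwaway data. The sample rows and the temporary file names are hypothetical (they are not taken from the real NSRL file), and `\t` in the sed replacement assumes GNU sed.

```shell
# Hypothetical sample data standing in for win.os and the NSRL rows.
tmp=$(mktemp -d)
printf 'dos\nxp sp2\n' > "$tmp/win.os"
printf 'h1\tm1\tc1\tdos.txt\t10\t1\tdos\t\n'  > "$tmp/rows.tsv"   # field 7 is exactly "dos"
printf 'h2\tm2\tc2\ta.txt\t20\t2\tmsdos\t\n' >> "$tmp/rows.tsv"   # field 7 only contains "dos"

# Wrap each pattern in tabs so that it can match a whole field only.
sed 's;^.*$;\t&\t;' "$tmp/win.os" > "$tmp/patterns"

# Keep the first field (the hash column) of every matching row.
matched=$(grep --fixed-strings --file="$tmp/patterns" "$tmp/rows.tsv" | cut -f1)
echo "$matched"
rm -r "$tmp"
```

Note that the tab-wrapped pattern matches any field that equals the pattern and is surrounded by tabs, not field 7 specifically.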

I downgraded to grep 2.24:

# pacman -U /var/cache/pacman/pkg/grep-2.24-1-x86_64.pkg.tar.xz
...

and retried the same command:

$ zcat nsrlfiletxt.gz | pv -l | grep --fixed-strings --file=<( sed 's;^.*$;\t&\t;' win.os ) > /opt/nsrl.windows
59,4k 0:00:00 [ 863k/s] [ <=> ]

Again, only 59.4k lines are processed, with no error!

I downgraded to grep 2.23:

# pacman -U /var/cache/pacman/pkg/grep-2.23-1-x86_64.pkg.tar.xz
...

and retried the same command:

$ zcat nsrlfiletxt.gz | pv -l | grep --fixed-strings --file=<( sed 's;^.*$;\t&\t;' win.os ) > /opt/nsrl.windows
59,1k 0:00:00 [ 823k/s] [ <=> ]

Only 59.1k lines are processed, with no error!

I downgraded to grep 2.22:

# pacman -U /var/cache/pacman/pkg/grep-2.22-1-x86_64.pkg.tar.xz
...

and retried the same command:

$ zcat nsrlfiletxt.gz | pv -l | grep --fixed-strings --file=<( sed 's;^.*$;\t&\t;' win.os ) > /opt/nsrl.windows
157M 0:04:36 [ 567k/s] [ <=> ]

This time all 157M lines are processed!

So I think a bug was introduced in grep 2.23...

regards.

Closed by  Sébastien Luttringer (seblu)
Wednesday, 06 July 2016, 21:45 GMT
Reason for closing:  Upstream
Comment by Dave Reisner (falconindy) - Monday, 04 July 2016, 14:01 GMT
What happens if you set LANG=C?
Comment by Lacsap (lacsap) - Monday, 04 July 2016, 14:15 GMT
$ echo $LANG
fr_FR.UTF-8

$ zcat nsrlfiletxt.gz | pv -l | LANG=C grep --fixed-strings --file=<( sed 's;^.*$;\t&\t;' win.os ) > /opt/nsrl.windows

works fine :-)
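
As a side note to the workaround above, a minimal sketch of forcing the C locale for the grep invocation only. `LANG` is overridden by `LC_ALL` (and by `LC_CTYPE`) when those are set, so `LC_ALL=C` is used here as the more robust variant; the sample input is hypothetical.

```shell
# Show the locale variables the session currently exports.
printf 'LANG=%s LC_ALL=%s\n' "${LANG:-unset}" "${LC_ALL:-unset}"

# Build a tab-wrapped fixed pattern, as the sed step in the report does.
pattern=$(printf '\tdos\t')

# Force the C locale for this one grep process; the rest of the pipeline
# keeps the session locale. Only the first sample row has a field equal to "dos".
count=$(printf 'a\tdos\tb\nc\tmsdos\td\n' | LC_ALL=C grep --fixed-strings --count "$pattern")
echo "$count"
```

Placing the assignment directly before `grep` scopes it to that single command, so `zcat` and `pv` in the original pipeline would be unaffected.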
Comment by Sébastien Luttringer (seblu) - Wednesday, 06 July 2016, 21:37 GMT
Please report this upstream.
