FS#41530 - [coreutils] uniq fails on this test file

Attached to Project: Arch Linux
Opened by Rasmus Steinke (rasi) - Monday, 11 August 2014, 23:54 GMT
Last edited by Dave Reisner (falconindy) - Tuesday, 12 August 2014, 17:38 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Sébastien Luttringer (seblu)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description: Using this test file, uniq fails to eliminate all duplicate lines.


Additional info:
coreutils 8.23-1
Arch Linux 64bit, testing enabled


Steps to reproduce:
Extract the tar file.
Run "cat test3 | uniq".
Interestingly, it works if you grep for one of the failed entries first.
E.g.: "grep MiMi test3 | uniq" will work.
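For context on the behaviour reported here: uniq only collapses *adjacent* duplicate lines, so any duplicate separated from its twin by another line survives. A minimal sketch with hypothetical data (not the attached test3):

```shell
#!/bin/sh
# uniq removes only runs of identical adjacent lines.
printf 'a\na\nb\na\n' > /tmp/uniq_demo.txt

uniq /tmp/uniq_demo.txt
# Prints:
#   a
#   b
#   a
# The last "a" survives because "b" separates it from the first run.
```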
This task depends upon

Closed by  Dave Reisner (falconindy)
Tuesday, 12 August 2014, 17:38 GMT
Reason for closing:  Works for me
Additional comments about closing:  input to uniq must be "sorted"
Comment by Gerardo Exequiel Pozzi (djgera) - Tuesday, 12 August 2014, 00:58 GMT
Works as expected here.
Comment by Allan McRae (Allan) - Tuesday, 12 August 2014, 03:03 GMT
What locale do you use?
Comment by Dave Reisner (falconindy) - Tuesday, 12 August 2014, 16:42 GMT
Are you expecting that unsorted input will be made unique? (hint: you shouldn't)
Comment by Rasmus Steinke (rasi) - Tuesday, 12 August 2014, 16:53 GMT
huh? it's not unsorted at all... all duplicates are right behind each other...
Comment by Rasmus Steinke (rasi) - Tuesday, 12 August 2014, 16:54 GMT
carnager@caprica ~ > locale -a
C
en_US.utf8
POSIX
carnager@caprica ~ > locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
Comment by Rasmus Steinke (rasi) - Tuesday, 12 August 2014, 16:57 GMT
falconindy: But you have a point. Running "cat test3 | uniq" will not only leave duplicates, it will also mess up the order.
Comment by Rasmus Steinke (rasi) - Tuesday, 12 August 2014, 17:00 GMT
Gerardo: your result is also messed up.
a) The order in your result has changed
b) there are duplicates

original file had this:

2008 • Red Sky Coven • Volume 5
2008 • Red Sky Coven • Volume 5
2008 • Red Sky Coven • Volume 5
2008 • Red Sky Coven • Volume 5
2008 • Red Sky Coven • Volume 5
2008 • Red Sky Coven • Volume 5
2008 • Red Sky Coven • Volume 5
2008 • Red Sky Coven • Volume 5
2008 • Red Sky Coven • Volume 5
2008 • Red Sky Coven • Volume 5
2008 • Red Sky Coven • Volume 5
2008 • Red Sky Coven • Volume 5
2008 • Red Sky Coven • Volume 5
1999 • Red Sky Coven • Volume 3
1999 • Red Sky Coven • Volume 3
1999 • Red Sky Coven • Volume 3
1999 • Red Sky Coven • Volume 3
1999 • Red Sky Coven • Volume 3
1999 • Red Sky Coven • Volume 3
1999 • Red Sky Coven • Volume 3

and your result has this:

2008 • Red Sky Coven • Volume 5
1999 • Red Sky Coven • Volume 3
1995 • Red Sky Coven • Volume 2
1995 • Red Sky Coven • Volume 1
2008 • Red Sky Coven • Volume 5
1999 • Red Sky Coven • Volume 3
1995 • Red Sky Coven • Volume 2
1995 • Red Sky Coven • Volume 1
Comment by Dave Reisner (falconindy) - Tuesday, 12 August 2014, 17:01 GMT
> huh? it's not unsorted at all... all duplicates are right behind each other...
No, not really...

$ sed -n '/^2012 • Girls Aloud • Ten$/=' test3
97
98
101
102
106
110
111
...

Notice the gaps? 98 will be elided, 102 will be elided, 111 will be elided... you still have dupes in the list.
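The standard fixes follow from the point above: either sort the input first so duplicates become adjacent, or filter out repeats while preserving the original order with the well-known awk idiom. A sketch with hypothetical stand-in data (not the attached test3):

```shell
#!/bin/sh
# Stand-in input with non-adjacent duplicates.
printf 'b\na\nb\n' > /tmp/dupes_demo.txt

# 1. Sort first, then dedupe (original order is lost):
sort -u /tmp/dupes_demo.txt
# Prints:
#   a
#   b

# 2. Keep only the first occurrence of each line,
#    preserving the original order:
awk '!seen[$0]++' /tmp/dupes_demo.txt
# Prints:
#   b
#   a
```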
Comment by Rasmus Steinke (rasi) - Tuesday, 12 August 2014, 17:37 GMT
Oh... damn it. Didn't realise those...