FS#9024 - pacman segfault during xorg + nvidia installation

Attached to Project: Pacman
Opened by André Prata (nDray) - Thursday, 27 December 2007, 19:12 GMT
Last edited by Dan McGee (toofishes) - Saturday, 05 January 2008, 23:46 GMT
Task Type: Bug Report
Category: General
Status: Closed
Assigned To: Aaron Griffin (phrakture), Xavier (shining), Dan McGee (toofishes)
Architecture: All
Severity: High
Priority: Normal
Reported Version: 3.0.6
Due in Version: 3.1.0
Due Date: Undecided
Percent Complete: 100%
Votes: 1
Private: No

Details

I was reinstalling Arch on my laptop and doing some new things... Testing stuff, messing with pacman's cache and such... I came to the point where I install X, which I usually do with
pacman -Sy xorg-{server,xinit} xf86-input-{mouse,keyboard} xorg-fonts-{100,75}dpi nvidia synaptics

Pacman just "segfaulted"... I thought I had done something wrong before, screwing up pacman's cache or so, so I just formatted again, since I had only just begun...

To my surprise, doing no "new things" produced the same error...

I began trying to install fewer packages at a time, and realised that pacman -S nvidia by itself, without X installed, made pacman segfault...

I did a --debug run and pacman gracefully tells me that "nvidia-utils provides its own conflict"... I installed all of the above packages relating to X and then, separately, nvidia... It tells me that there's a conflict with libgl.... This shouldn't happen... At all.... It's just stupid.... I need to have both xorg-server and nvidia...

What I can't understand is why yesterday I had these same packages living together on my system...

I provide here the --debug output... Hope you can do something about it...
I'm available for testing, I'm not installing the rest of the system, that's only a spare laptop....

Closed by Dan McGee (toofishes) - Saturday, 05 January 2008, 23:46 GMT
Reason for closing: Fixed
Additional comments about closing: fixed in GIT
Comment by André Prata (nDray) - Thursday, 27 December 2007, 19:40 GMT
I totally get it, now...

After some talk on IRC, MrElending led me to the conclusion that pacman recursively grabs nvidia, then xorg-server, then libgl, but doesn't acknowledge that nvidia already provides libgl...

I guess this should be improved in pacman, e.g. look at provides first, then dependencies, but I'll wait to hear from the devs....
Comment by Aaron Griffin (phrakture) - Saturday, 29 December 2007, 06:13 GMT
Assigning this to everyone. I want to check to see if this use case is fixed in 3.1
Comment by Xavier (shining) - Saturday, 29 December 2007, 13:07 GMT
nvidia doesn't provide libgl, nvidia-utils does, which is a dep of nvidia. But anyway, that's just a detail.
I have a few problems :
1) the debug log isn't full, is it? Where is the segfault? It might help to also have the non debug output of pacman to see more easily what happens.
2) I can't reproduce it in any situation: with pacman 3.0 or with 3.1, with or without the libgl package installed.
Comment by André Prata (nDray) - Saturday, 29 December 2007, 13:14 GMT
what i meant is pacman fetches nvidia, then xorg-server, then libgl, and when it reaches the end of that recursion branch, it starts with nvidia-utils, or something....

This hasn't ever happened before, maybe because earlier pacman fetched nvidia-utils first, I don't know... This was the first time, and adding nvidia-utils to the line solves the problem, so it's good for me anyways...
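
For illustration, the "look at provides before pulling in a dependency" idea could be sketched as below. This is a minimal sketch under stated assumptions, not pacman's resolver: struct pkg, pkg_satisfies and find_satisfier are hypothetical names, and the package data simply mirrors the nvidia-utils/libgl relationship described in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PROVIDES 4

/* Hypothetical package record; not pacman's internal representation. */
struct pkg {
    const char *name;
    const char *provides[MAX_PROVIDES]; /* NULL-terminated when shorter */
};

/* True if the package is named `dep` or lists `dep` among its provides. */
bool pkg_satisfies(const struct pkg *p, const char *dep)
{
    if (strcmp(p->name, dep) == 0) {
        return true;
    }
    for (size_t i = 0; i < MAX_PROVIDES && p->provides[i] != NULL; i++) {
        if (strcmp(p->provides[i], dep) == 0) {
            return true;
        }
    }
    return false;
}

/* Returns the first queued package that satisfies `dep`, or NULL if a
 * new package would have to be pulled in for it. */
const struct pkg *find_satisfier(const struct pkg *queue, size_t n,
                                 const char *dep)
{
    for (size_t i = 0; i < n; i++) {
        if (pkg_satisfies(&queue[i], dep)) {
            return &queue[i];
        }
    }
    return NULL;
}
```

With nvidia-utils already queued and listing libgl in its provides, find_satisfier(queue, n, "libgl") would return nvidia-utils, so no separate libgl package would need to be added.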


About the debug, I did a "pacman --debug -S nvidia > debuginfo".... stderr printed something like "pacman internal error: segfaulted..."... you must know what message i'm talking about...

If you need it that much, I think I could repeat the procedure and store both stdout and stderr...
Comment by Xavier (shining) - Saturday, 29 December 2007, 13:50 GMT
Ok, I thought it was strange the log stopped at this point, but maybe it isn't.
The main problem is that I'm unable to reproduce this bug after several tries. But maybe someone else can.
Comment by André Prata (nDray) - Saturday, 29 December 2007, 13:53 GMT
I believe I will be able to reproduce it....

I may then provide the pacman log... even syslog-ng... I don't usually activate the daemon, but I could do it.... If you need such, I may try to help...
Comment by André Prata (nDray) - Saturday, 29 December 2007, 19:04 GMT
here is the pacman.log and stdout + stderr output....
Comment by André Prata (nDray) - Saturday, 29 December 2007, 19:06 GMT
this time I was installing as always:

xorg-server xorg-xinit xf86-input-mouse xf86-input-keyboard xorg-fonts-100dpi xorg-fonts-75dpi ttf-dejavu nvidia synaptics
Comment by Dan McGee (toofishes) - Saturday, 29 December 2007, 19:20 GMT
We've done a lot of work in this area between 3.0 and 3.1, so it would be hard to say whether or not this issue still exists.
Comment by Xavier (shining) - Saturday, 29 December 2007, 21:58 GMT
Indeed, it's unlikely 3.0 and 3.1 have the same behavior, since this area changed a lot.
But still, it bothers me that I'm unable to reproduce this with 3.0.

nDray, you are using 3.0.6, right?

What I don't get is the following lines in the log:
debug: CONFLICTS:: nvidia-utils conflicts with libgl
debug: CONFLICTS:: nvidia-utils conflicts with libgl

Looking at the code, it shouldn't be possible to get duplicates in the list; there is a check to prevent them.
See the 3.0.6 code, libalpm/conflict.c :
162 if(miss && !_alpm_depmiss_isin(miss, baddeps)) {
163 baddeps = alpm_list_add(baddeps, miss);

And when I run pacman 3.0.6 on my system, this check seems to work fine, because when I look at my debug log, I get only one line:
debug: CONFLICTS:: nvidia-utils conflicts with libgl
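
The quoted check follows an "append only if not already present" pattern. A minimal sketch of that pattern, assuming a fixed-size array and hypothetical names (struct conflict, conflict_isin, conflict_add_unique); it is not the libalpm code and says nothing about where the real 3.0.6 check goes wrong:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_CONFLICTS 16

/* Hypothetical stand-in for a conflict record; not the libalpm type. */
struct conflict {
    const char *pkg;    /* package declaring the conflict */
    const char *target; /* package it conflicts with */
};

struct conflict_list {
    struct conflict items[MAX_CONFLICTS];
    size_t count;
};

/* Compare field by field, by string contents. */
bool conflict_equal(const struct conflict *a, const struct conflict *b)
{
    return strcmp(a->pkg, b->pkg) == 0 && strcmp(a->target, b->target) == 0;
}

/* True if an equal conflict is already stored in the list. */
bool conflict_isin(const struct conflict_list *list,
                   const struct conflict *needle)
{
    for (size_t i = 0; i < list->count; i++) {
        if (conflict_equal(&list->items[i], needle)) {
            return true;
        }
    }
    return false;
}

/* Appends `c` unless it is a duplicate or the list is full.
 * Returns true only when the entry was actually added. */
bool conflict_add_unique(struct conflict_list *list, struct conflict c)
{
    if (list->count >= MAX_CONFLICTS || conflict_isin(list, &c)) {
        return false;
    }
    list->items[list->count++] = c;
    return true;
}
```

Adding the "nvidia-utils conflicts with libgl" entry twice through conflict_add_unique leaves a single entry in the list, which matches the one-line debug output expected here.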
Comment by André Prata (nDray) - Saturday, 29 December 2007, 22:14 GMT
yes, i'm using 3.0.6.... Although i'm installing from a 2007.08-2 CD, the packages were installed from a local source, all the most recent from core....

i can't help you at all about that double check....

The command was exactly:

# PKGX="xorg-server xorg-xinit xf86-input-mouse xf86-input-keyboard xorg-fonts-100dpi xorg-fonts-75dpi ttf-dejavu nvidia synaptics"
# pacman -Sy $PKGX > pkgx.log 2>&1

I could repeat the error over and over....
I did a base install, but not all packages were installed, actually...
I omitted licenses, lilo, nano, syslog-ng, mailx, logrotate, reiserfsprogs, xfsprogs, jfsutils, pcmciautils, gettext, lvm2, and i believe that's it... After that it's the pacman.log....
Comment by Xavier (shining) - Sunday, 30 December 2007, 09:15 GMT
Oh yes, you are right, your report is perfectly correct.
I had 3.1 pacman package installed there, so I was using a locally built 3.0.
I tried on another box where I had the official 3.0 package, and I could reproduce the bug.

I couldn't reproduce the bug here because I made a debug build with : ./configure --enable-debug && make
I rebuilt it with : ./configure && make , and now I can reproduce the bug. thanks.
Comment by Xavier (shining) - Sunday, 30 December 2007, 10:05 GMT
Well, Nagy already found out that depmiss_isin was wrong a while ago, and proposed a fix:
http://www.archlinux.org/pipermail/pacman-dev/2007-October/009687.html
That does prevent duplicate conflicts in the baddeps list in conflict.c (even with a non-debug pacman build), and so would prevent the segfault in sync.c .
But the code in sync.c is not totally safe, because it implicitly assumes that the conflict list contains no duplicates.

In 3.1, that depmiss_isin doesn't exist anymore, but duplicate conflicts are now avoided with the _alpm_conflict_isin function, which works correctly.
And the code in sync.c (around line 600) still segfaults in case of duplicate conflicts. That shouldn't happen though, so it's probably not a big problem.
And anyway, it's quite hard to figure out what that code does, which makes it hard to make it more bulletproof.
Comment by Nagy Gabor (combo) - Thursday, 03 January 2008, 15:38 GMT
The bug is at lines 574-575 of sync.c, because rsync can be NULL, and not only with duplicated conflicts.

Well, lines 556-570 are _really_ odd.
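
As an illustration of the failure mode, a lookup that can come back NULL has to be guarded before it is dereferenced. The sketch below is not the sync.c code: struct target, find_target and drop_target are hypothetical names. It only shows one plausible way a duplicated conflict leads to a NULL result, where the first pass drops the target and the second pass no longer finds it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sync target; field names are illustrative only. */
struct target {
    const char *name;
    int removed; /* set once the target was dropped from the transaction */
};

/* Returns the first still-active target with the given name, or NULL. */
struct target *find_target(struct target *targets, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (!targets[i].removed && strcmp(targets[i].name, name) == 0) {
            return &targets[i];
        }
    }
    return NULL;
}

/* Drops the named target if it is still present.
 * Returns 1 if it was dropped, 0 if the lookup found nothing. */
int drop_target(struct target *targets, size_t n, const char *name)
{
    struct target *t = find_target(targets, n, name);
    if (t == NULL) {
        return 0; /* guard: dereferencing t here would crash */
    }
    t->removed = 1;
    return 1;
}
```

Calling drop_target twice for the same name, as a duplicated conflict entry would, succeeds the first time and returns 0 the second time instead of dereferencing a NULL pointer.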
Comment by Dan McGee (toofishes) - Thursday, 03 January 2008, 15:58 GMT
This seems like the quick fix, but I agree that the code in that section seems quite odd. sync.c is such great code... :)
Comment by Nagy Gabor (combo) - Thursday, 03 January 2008, 16:17 GMT
Well, I'm not sure that the continue; on line 579 should be moved into the if(rsync) block.
Comment by Xavier (shining) - Saturday, 05 January 2008, 09:47 GMT
I made a pactest for this, maybe it could be committed.
Comment by Nagy Gabor (combo) - Saturday, 05 January 2008, 13:26 GMT
OFF(?):
The good old choose from providers question.
1. The funny thing is that "pacman -S nvidia xorg-server" leads to a completely different result.
2. This example shows that removing packages from the target list may be needed (I refer to your sync branch, Xavier)
Comment by Xavier (shining) - Saturday, 05 January 2008, 13:45 GMT
1. I didn't manage to get a different result with pacman -S nvidia xorg-server. Did you use the pactest or your real database?
And which pacman version did you try exactly? git?

2. Yes, that's what I figured, and that's why I made a change in my sync branch, as I said in FS#8899
