FS#23753 - sqlite3 3.7.6-1 produces Firefox 4 segfault

Attached to Project: Arch Linux
Opened by Daniel (AurosGamma) - Wednesday, 13 April 2011, 16:46 GMT
Last edited by Andreas Radke (AndyRTR) - Friday, 15 April 2011, 05:13 GMT
Task Type Bug Report
Category Packages: Testing
Status Closed
Assigned To Andreas Radke (AndyRTR)
Architecture i686
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 8
Private No

Details

Description:

The sqlite3 package from the [testing] repository causes Firefox from [extra] to segfault. The issue is not present with sqlite3 from the [core] repository.

Additional info:
sqlite3 3.7.6-1 [testing]
sqlite3 3.7.5-1 [core]
firefox 4.0-1 [extra]

Steps to reproduce:
1- Install sqlite3 3.7.6-1 from [testing]
2- Try to run firefox from terminal...segfault
3- To have firefox working again install sqlite3 3.7.5-1 from [core]

Closed by  Andreas Radke (AndyRTR)
Friday, 15 April 2011, 05:13 GMT
Reason for closing:  Fixed
Additional comments about closing:  fixed with sqlite3 - 3.7.6.1
Comment by Ionut Biru (wonder) - Wednesday, 13 April 2011, 18:03 GMT
does rebuilding xulrunner fix the issue?
Comment by Ray (ataraxia) - Wednesday, 13 April 2011, 18:09 GMT
FYI, I'm unable to reproduce this on x86_64, using the same package versions.
Comment by Ionut Biru (wonder) - Wednesday, 13 April 2011, 18:11 GMT
Yeah, I cannot reproduce it on i686, but it seems that others can.
Comment by Gordy Campbell (bakerboy) - Wednesday, 13 April 2011, 18:25 GMT
I got this when I did a fresh install with testing enabled. Running x86_64 here. Downgrading the package solved the problem.
Comment by Andreas Radke (AndyRTR) - Wednesday, 13 April 2011, 18:48 GMT
I can't reproduce it. Affected users should check whether recompiling xulrunner fixes it.
Comment by Ray (ataraxia) - Wednesday, 13 April 2011, 18:51 GMT
Based on the stack trace posted in arch-general, it's crashing while checkpointing a database, somewhere under a call to sqlite3_wal_checkpoint_v2(). It's possible to invoke this function directly, using the sqlite3 CLI tool. It would be interesting to see if the crash can be reproduced this way. Unfortunately, the stack trace doesn't show in what mode the checkpoint was called, or which database it was, so there's a lot of trial and error involved here.

Would someone who has this problem do the following:

- Upgrade to the broken sqlite package again
- Verify that firefox still crashes
- For each .sqlite file in your firefox profile directory, do these steps:
1. sqlite3 thatfile
2. check if it is a WAL-mode database or some older kind by doing: pragma journal_mode;
3. Regardless of what the previous step says, try all 3 modes of checkpoint, and see if any crash or complain:
pragma wal_checkpoint(PASSIVE);
pragma wal_checkpoint(FULL);
pragma wal_checkpoint(RESTART);
You will get some numbers as output from these commands. I expect "0|-1|-1" for non-WAL databases, and "0|0|0" for WAL databases, but I don't know what they actually mean.
4. Quit the sqlite3 tool with: .quit
5. Go on to the next file. Remember to check in subdirectories for databases that extensions may create on their own.

It may even be that following this procedure "cleans" a corrupt database, making firefox start to work again afterward.
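The per-file procedure above can also be scripted. The following is only a sketch, not part of the original instructions: it assumes that Python's standard `sqlite3` module is linked against the same system libsqlite3 as Firefox, and the names `inspect_database` and `inspect_profile` are made up for illustration. If the library crashes during a checkpoint, the Python interpreter will segfault as well, which is itself the answer being looked for. Run it with Firefox closed.

```python
import sqlite3
from pathlib import Path

# The three checkpoint modes listed in the procedure above.
CHECKPOINT_MODES = ("PASSIVE", "FULL", "RESTART")


def inspect_database(path):
    """Return (journal_mode, {checkpoint mode: result row}) for one file."""
    # isolation_level=None stops the module from opening implicit transactions.
    conn = sqlite3.connect(str(path), isolation_level=None)
    try:
        journal_mode = conn.execute("PRAGMA journal_mode;").fetchone()[0]
        results = {}
        for mode in CHECKPOINT_MODES:
            # Each pragma returns one row of three integers.
            row = conn.execute(f"PRAGMA wal_checkpoint({mode});").fetchone()
            results[mode] = row
        return journal_mode, results
    finally:
        conn.close()


def inspect_profile(profile_dir):
    """Check every .sqlite file under profile_dir, subdirectories included."""
    report = {}
    for path in sorted(Path(profile_dir).rglob("*.sqlite")):
        try:
            report[str(path)] = inspect_database(path)
        except sqlite3.Error as exc:
            # A locked or unreadable database is reported, not fatal.
            report[str(path)] = ("error", str(exc))
    return report


# Hypothetical usage; replace the path with your own profile directory:
# for name, outcome in inspect_profile("/path/to/firefox/profile").items():
#     print(name, outcome)
```

The report maps each database path to its journal mode and the three result rows, so an unexpected row or an error string is easy to spot.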
Comment by gnudna (gnudna) - Wednesday, 13 April 2011, 19:02 GMT
I got the exact same issue after I switched to testing and installed gnome3. Since my system is untouched, I will try the steps mentioned above and keep you all posted on the outcome.
Comment by Ray (ataraxia) - Wednesday, 13 April 2011, 19:14 GMT
Strangely enough, I was randomly able to reproduce this, when I couldn't earlier. What I had to do was start firefox and then let it sit idle for a few (3 in this case) minutes. It then crashed without further interaction. There were 3 databases open/dirty at crash time: places.sqlite (WAL), cookies.sqlite (WAL), and urlclassifier3.sqlite (journal-delete). I suspect firefox is crashing while doing the kind of housekeeping it only does when idle, or when forced to do immediately. urlclassifier3, the "safe browsing" database, looks especially suspicious to me when thinking along these lines, though it was the cookies DB that actually had work in the WAL waiting to be checkpointed after the crash. Hmm...

I examined these 3 files using my own procedure after the crash, and found nothing interesting. As I'm now able to reproduce it, I'll see if rebuilding xulrunner does anything for me.
Comment by Ray (ataraxia) - Wednesday, 13 April 2011, 19:35 GMT
This definitely appears to be about the urlclassifier database. I tried changing that file from delete to WAL, without any impact. I then deleted this database file entirely, and firefox now reliably segfaults immediately on startup, right after creating this file. This should help others to reproduce the problem.
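For anyone who wants to try this reproduction without losing the existing database, here is a small sketch that renames the file instead of deleting it. Renaming is a substitution for the deletion described above, and it assumes Firefox treats a missing file the same way in both cases. The function name and the `.bak` suffix are hypothetical. Run it with Firefox closed.

```python
from pathlib import Path


def set_aside_urlclassifier(profile_dir, suffix=".bak"):
    """Rename urlclassifier3.sqlite so Firefox recreates it on next start.

    Returns the backup path, or None if the profile has no such file.
    An existing backup with the same name is overwritten on POSIX systems.
    """
    db = Path(profile_dir) / "urlclassifier3.sqlite"
    if not db.exists():
        return None
    backup = db.with_name(db.name + suffix)
    db.rename(backup)
    return backup
```

To restore the old database afterwards, rename the backup to its original name while Firefox is not running.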

xulrunner rebuild is still in progress.
Comment by Ray (ataraxia) - Wednesday, 13 April 2011, 19:38 GMT
I'm actually unable to rebuild xulrunner to test it:

c++ -o nsID.o -c -I../../dist/stl_wrappers -I../../dist/system_wrappers -include /home/ataraxia/junk/xulrunner/src/mozilla-2.0/config/gcc_hidden.h -DOSTYPE=\"Linux2.6\" -DOSARCH=Linux -DTARGET_XPCOM_ABI=\"x86_64-gcc3\" -I/home/ataraxia/junk/xulrunner/src/mozilla-2.0/xpcom/glue/../build -I/home/ataraxia/junk/xulrunner/src/mozilla-2.0/xpcom/glue -I. -I../../dist/include -I../../dist/include/nsprpub -I/usr/include/nspr -I/usr/include/nss -fPIC -fno-rtti -fno-exceptions -Wall -Wpointer-arith -Woverloaded-virtual -Wsynth -Wno-ctor-dtor-privacy -Wno-non-virtual-dtor -Wcast-align -Wno-invalid-offsetof -Wno-variadic-macros -Werror=return-type -fno-strict-aliasing -fshort-wchar -pthread -pipe -DNDEBUG -DTRIMMED -Os -freorder-blocks -fomit-frame-pointer -DMOZILLA_CLIENT -include ../../mozilla-config.h -MD -MF .deps/nsID.pp /home/ataraxia/junk/xulrunner/src/mozilla-2.0/xpcom/glue/nsID.cpp
/home/ataraxia/junk/xulrunner/src/mozilla-2.0/xpcom/glue/nsEnumeratorUtils.cpp:115:27: error: uninitialized const ‘EmptyEnumeratorImpl::kInstance’ [-fpermissive]
/home/ataraxia/junk/xulrunner/src/mozilla-2.0/xpcom/glue/nsEnumeratorUtils.cpp:50:7: note: ‘const class EmptyEnumeratorImpl’ has no user-provided default constructor
make[5]: *** [nsEnumeratorUtils.o] Error 1
Comment by Ionut Biru (wonder) - Wednesday, 13 April 2011, 19:48 GMT
export CFLAGS="$CFLAGS -fpermissive"
export CXXFLAGS="$CXXFLAGS -fpermissive"

before make
Comment by Ray (ataraxia) - Wednesday, 13 April 2011, 21:18 GMT
I rebuilt xulrunner, but that does NOT solve this problem.
Comment by Ionut Biru (wonder) - Wednesday, 13 April 2011, 21:37 GMT
OK, so it is something that mozilla has to fix. I searched their bug tracker and only saw a bug requesting an update to 3.7.6 in their tree.

I guess this is a good opportunity to report it as a bug. As a workaround until it is fixed, I'll disable system sqlite support.
Comment by Daniel (AurosGamma) - Wednesday, 13 April 2011, 22:23 GMT
I'm not sure, but I think that empathy 3.0 is affected too. I was getting some segfaults in it, but I can't test empathy again right now because I don't have my laptop with me. If somebody can try it out and see whether segfaults start showing up, I'd appreciate it.
Greetings
Comment by Ionut Biru (wonder) - Wednesday, 13 April 2011, 23:00 GMT
telepathy-logger or empathy? Can you get a backtrace? (recompile empathy and telepathy-logger with debug)

maybe it is something wrong with sqlite3 in the end...
Comment by Ray (ataraxia) - Wednesday, 13 April 2011, 23:48 GMT
I forgot to post any actual debug output. Here is a stack trace with a debug-enabled sqlite 3.7.6-1, showing garbage (I think?) on the top of the stack, and demonstrating that firefox was executing the SQL statement "PRAGMA user_version = 7;":

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffdeab0700 (LWP 13087)]
0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff46c270e in fcntlSizeHint (pFile=0x7fffe040e950, nByte=32768) at sqlite3.c:27511
#2 0x00007ffff46c27c2 in unixFileControl (id=0x7fffe040e950, op=5, pArg=0x7fffdeaaf580) at sqlite3.c:27559
#3 0x00007ffff46b9cec in sqlite3OsFileControl (id=0x7fffe040e950, op=5, pArg=0x7fffdeaaf580) at sqlite3.c:14017
#4 0x00007ffff46ca9d7 in pager_write_pagelist (pPager=0x7fffe040e808, pList=0x7fffe046d008) at sqlite3.c:40338
#5 0x00007ffff46cc813 in sqlite3PagerCommitPhaseOne (pPager=0x7fffe040e808, zMaster=0x0, noSync=0) at sqlite3.c:42183
#6 0x00007ffff46d54b1 in sqlite3BtreeCommitPhaseOne (p=0x7fffe043e888, zMaster=0x0) at sqlite3.c:50086
#7 0x00007ffff46e2f8f in vdbeCommit (db=0x7fffe040e408, p=0x7fffe124a128) at sqlite3.c:58743
#8 0x00007ffff46e3866 in sqlite3VdbeHalt (p=0x7fffe124a128) at sqlite3.c:59145
#9 0x00007ffff46e8258 in sqlite3VdbeExec (p=0x7fffe124a128) at sqlite3.c:63034
#10 0x00007ffff46e5afc in sqlite3Step (p=0x7fffe124a128) at sqlite3.c:60607
#11 0x00007ffff46e5ce8 in sqlite3_step (pStmt=0x7fffe124a128) at sqlite3.c:60672
#12 0x00007ffff470f88c in sqlite3_exec (db=0x7fffe040e408, zSql=0x7fffdeaafac0 "PRAGMA user_version = 7", xCallback=0, pArg=0x0, pzErrMsg=0x0) at sqlite3.c:86244
#13 0x00007ffff56a83a6 in ?? () from /usr/lib/xulrunner-2.0/libxul.so
#14 0x00007ffff56a8124 in ?? () from /usr/lib/xulrunner-2.0/libxul.so
#15 0x00007ffff5609dae in ?? () from /usr/lib/xulrunner-2.0/libxul.so
#16 0x00007ffff560adca in ?? () from /usr/lib/xulrunner-2.0/libxul.so
#17 0x00007ffff5844d2b in NS_InvokeByIndex_P () from /usr/lib/xulrunner-2.0/libxul.so
#18 0x00007ffff583d0e3 in ?? () from /usr/lib/xulrunner-2.0/libxul.so
#19 0x00007ffff58395a6 in ?? () from /usr/lib/xulrunner-2.0/libxul.so
#20 0x00007ffff580b330 in ?? () from /usr/lib/xulrunner-2.0/libxul.so
#21 0x00007ffff58390e5 in ?? () from /usr/lib/xulrunner-2.0/libxul.so
#22 0x00007ffff63892a3 in ?? () from /usr/lib/libnspr4.so
#23 0x00007ffff7bc9d40 in start_thread () from /lib/libpthread.so.0
#24 0x00007ffff7428aed in clone () from /lib/libc.so.6
#25 0x0000000000000000 in ?? ()

Here's some value-printing showing (at the end) that indeed it is creating the urlclassifier3 database:

(gdb) up
#1 0x00007ffff46c270e in fcntlSizeHint (pFile=0x7fffe040e950, nByte=32768) at sqlite3.c:27511
27511 err = osFallocate(pFile->h, buf.st_size, nSize-buf.st_size);
(gdb) p pFile->h
$1 = 41
(gdb) p buf.st_size
$2 = 0
(gdb) p nSize
$3 = 5242880
(gdb) p pFile
$4 = (unixFile *) 0x7fffe040e950
(gdb) p buf
$5 = {st_dev = 2050, st_ino = 13500855, st_nlink = 1, st_mode = 33188, st_uid = 1000, st_gid = 100, __pad0 = 0, st_rdev = 0, st_size = 0, st_blksize = 4096,
st_blocks = 0, st_atim = {tv_sec = 1302737460, tv_nsec = 409707290}, st_mtim = {tv_sec = 1302737460, tv_nsec = 409707290}, st_ctim = {tv_sec = 1302737460,
tv_nsec = 409707290}, __unused = {0, 0, 0}}
(gdb) p *pFile
$6 = {pMethod = 0x7ffff49690a0, pInode = 0x7fffe0453048, h = 41, dirfd = -1, eFileLock = 4 '\004', ctrlFlags = 0 '\000', lastErrno = 0, lockingContext = 0x0,
pUnused = 0x7fffe0424e68, zPath = 0x7fffe040ea28 "/home/ataraxia/.mozilla/firefox/y6o3ya4i.default/urlclassifier3.sqlite", pShm = 0x0, szChunk = 5242880}
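The raw numbers printed by gdb above are easier to read once decoded. The sketch below uses only values copied from this session. The rounding formula is an assumption inferred from those values (nByte=32768, szChunk=5242880, nSize=5242880), not a statement about what sqlite3.c does.

```python
import stat
from datetime import datetime, timezone

# Values copied from the gdb session above.
st_mode = 33188        # buf.st_mode
st_size = 0            # buf.st_size
st_mtime = 1302737460  # buf.st_mtim.tv_sec
n_byte = 32768         # nByte argument of fcntlSizeHint
sz_chunk = 5242880     # pFile->szChunk, i.e. 5 MiB


def round_up_to_chunk(n_byte, sz_chunk):
    """Round n_byte up to the next multiple of sz_chunk (assumed behaviour)."""
    return ((n_byte + sz_chunk - 1) // sz_chunk) * sz_chunk


# File type and permission bits: a regular file with mode 0644.
print(stat.filemode(st_mode))  # -rw-r--r--

# Modification time in UTC, a few minutes before this comment was posted.
print(datetime.fromtimestamp(st_mtime, tz=timezone.utc).isoformat())
# 2011-04-13T23:31:00+00:00

# Target size and the number of bytes passed to osFallocate.
n_size = round_up_to_chunk(n_byte, sz_chunk)
print(n_size, n_size - st_size)  # 5242880 5242880
```

An empty regular file that was modified moments before the crash fits the reading that Firefox had just created urlclassifier3.sqlite and was asking to grow it to one full chunk.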

Nothing bad happens when I run that same SQL statement myself, on that same file:

$ sqlite3 ~/.mozilla/firefox/y6o3ya4i.default/urlclassifier3.sqlite
SQLite version 3.7.6
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> PRAGMA user_version = 7;
sqlite> PRAGMA user_version;
7

I stupidly did not build my xulrunner with debug symbols - I'm rebuilding it again now and will post a better backtrace. However, this really is starting to look very much like sqlite's problem, and not mozilla's, no?
Comment by Ray (ataraxia) - Thursday, 14 April 2011, 01:59 GMT
A debug-enabled xulrunner did not teach me anything more - the stack trace indeed shows it initializing the urlclassifier DB. I still see values optimized out even in this configuration - clearly xulrunner is too hard for me to debug, and as it takes an hour to build, I'm going to give up on the non-sqlite parts of this.

However, rebuilding sqlite again with -DSQLITE_DEBUG, rather than just disabling strip and adding -g -O0 as I had done before, makes it die on a different SQL statement: "CREATE TABLE IF NOT EXISTS moz_classifier (id INTEGER PRIMARY KEY, domain BLOB, partial_data BLOB, complete_data BLOB, chunk_id INTEGER, table_id INTEGER)"

The stack trace, such as it is:

#0 0x0000000000000000 in ?? ()
#1 0x00007ffff3f0746d in fcntlSizeHint (pFile=0x7fffdee7e150, nByte=2048) at sqlite3.c:27511
#2 0x00007ffff3f0751f in unixFileControl (id=0x7fffdee7e150, op=5, pArg=0x7fffdd4af4e0) at sqlite3.c:27559
#3 0x00007ffff3efca94 in sqlite3OsFileControl (id=0x7fffdee7e150, op=5, pArg=0x7fffdd4af4e0) at sqlite3.c:14017
#4 0x00007ffff3f13120 in pager_write_pagelist (pPager=0x7fffdee7e008, pList=0x7fffdee40808) at sqlite3.c:40338
#5 0x00007ffff3f16232 in sqlite3PagerCommitPhaseOne (pPager=0x7fffdee7e008, zMaster=0x0, noSync=0) at sqlite3.c:42183
#6 0x00007ffff3f23373 in sqlite3BtreeCommitPhaseOne (p=0x7fffdee35428, zMaster=0x0) at sqlite3.c:50086
#7 0x00007ffff3f378a3 in vdbeCommit (db=0x7fffdee7dc08, p=0x7fffdefef1e8) at sqlite3.c:58743
#8 0x00007ffff3f38397 in sqlite3VdbeHalt (p=0x7fffdefef1e8) at sqlite3.c:59145
#9 0x00007ffff3f3eda6 in sqlite3VdbeExec (p=0x7fffdefef1e8) at sqlite3.c:63034
#10 0x00007ffff3f3b0e9 in sqlite3Step (p=0x7fffdefef1e8) at sqlite3.c:60607
#11 0x00007ffff3f3b37c in sqlite3_step (pStmt=0x7fffdefef1e8) at sqlite3.c:60672
#12 0x00007ffff3f71581 in sqlite3_exec (db=0x7fffdee7dc08,
zSql=0x7ffff57c8d80 "CREATE TABLE IF NOT EXISTS moz_classifier (id INTEGER PRIMARY KEY, domain BLOB, partial_data BLOB, complete_data BLOB, chunk_id INTEGER, table_id INTEGER)", xCallback=0, pArg=0x0, pzErrMsg=0x0) at sqlite3.c:86244
#13 0x00007ffff527340e in ExecuteSimpleSQL (this=0x7fffdef19440, aSQLStatement=<value optimized out>)
at /home/ataraxia/junk/xulrunner/src/mozilla-2.0/storage/src/mozStorageConnection.cpp:855
#14 mozilla::storage::Connection::ExecuteSimpleSQL (this=0x7fffdef19440, aSQLStatement=<value optimized out>)
at /home/ataraxia/junk/xulrunner/src/mozilla-2.0/storage/src/mozStorageConnection.cpp:850
#15 0x00007ffff518cb3e in nsUrlClassifierDBServiceWorker::MaybeCreateTables (this=<value optimized out>, connection=0x7fffdef19440)
at /home/ataraxia/junk/xulrunner/src/mozilla-2.0/toolkit/components/url-classifier/src/nsUrlClassifierDBService.cpp:3507
#16 0x00007ffff5190d0f in nsUrlClassifierDBServiceWorker::OpenDb (this=0x7fffdee7b400)
at /home/ataraxia/junk/xulrunner/src/mozilla-2.0/toolkit/components/url-classifier/src/nsUrlClassifierDBService.cpp:3434
#17 0x00007ffff51919a6 in nsUrlClassifierDBServiceWorker::BeginUpdate (this=0x7fffdee7b400, observer=0x7fffdee33900, tables=..., clientKey=...)
at /home/ataraxia/junk/xulrunner/src/mozilla-2.0/toolkit/components/url-classifier/src/nsUrlClassifierDBService.cpp:2921
#18 0x00007ffff54db6d7 in NS_InvokeByIndex_P (that=<value optimized out>, methodIndex=<value optimized out>, paramCount=3, params=<value optimized out>)
at /home/ataraxia/junk/xulrunner/src/mozilla-2.0/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_x86_64_unix.cpp:195
#19 0x00007ffff54d009b in nsProxyObjectCallInfo::Run (this=0x7fffdee351f0) at /home/ataraxia/junk/xulrunner/src/mozilla-2.0/xpcom/proxy/src/nsProxyEvent.cpp:181
#20 0x00007ffff54c9bf2 in nsThread::ProcessNextEvent (this=0x7fffdef19080, mayWait=1, result=0x7fffdd4afe2c)
at /home/ataraxia/junk/xulrunner/src/mozilla-2.0/xpcom/threads/nsThread.cpp:633
#21 0x00007ffff5485f0b in NS_ProcessNextEvent_P (thread=<value optimized out>, mayWait=<value optimized out>) at nsThreadUtils.cpp:250
#22 0x00007ffff54c98a5 in nsThread::ThreadFunc (arg=0x7fffdef19080) at /home/ataraxia/junk/xulrunner/src/mozilla-2.0/xpcom/threads/nsThread.cpp:278
#23 0x00007ffff61ff2a3 in ?? () from /usr/lib/libnspr4.so
#24 0x00007ffff7bc9d40 in start_thread () from /lib/libpthread.so.0
#25 0x00007ffff7428aed in clone () from /lib/libc.so.6
#26 0x0000000000000000 in ?? ()
Comment by gnudna (gnudna) - Thursday, 14 April 2011, 02:45 GMT
I also ran the commands mentioned above against the sqlite3 files generated by firefox and saw nothing special. I even deleted my whole profile, including hidden files, and the issue is still present, so it also happens with clean sqlite3 files generated by firefox4. Hence my comment about downgrading the sqlite3 package as a temporary fix.

Not helpful for debugging, but as a temporary fix: as someone mentioned, running the previous version of sqlite3 fixed my issue. I actually had a copy of the binary in /var/cache/pacman/pkg

Comment by Daniel (AurosGamma) - Thursday, 14 April 2011, 04:23 GMT
Actually, it is not necessary to have a copy of the binary file. If you want to install sqlite3 from [core], just run this in a terminal:

$ pacman -S core/sqlite3

And that's all, the package will be downgraded automatically
Greetings
Comment by Ionut Biru (wonder) - Thursday, 14 April 2011, 11:01 GMT
@Ray, now that you understand what is happening, can you submit a bug to sqlite and see what they have to say?

It would be nice if Daniel could get the backtrace for empathy/telepathy-logger too, mostly to see if it is crashing in the same place.
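Checking whether two crashes happen "in the same place" can be done by comparing the function names at the top of each gdb backtrace. The following is a rough sketch with hypothetical helper names. It assumes the frame layout seen in the traces posted in this task, and the sample lines are shortened copies of those traces.

```python
import re

# Matches gdb frame lines such as "#1 0x... in fcntlSizeHint (pFile=...)".
# The address part is optional because inlined frames omit it.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(\S+) \(")


def frame_functions(backtrace):
    """Return the function name of every frame, innermost first."""
    names = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(2))
    return names


def common_top_frames(bt_a, bt_b):
    """Return the leading frames that both backtraces share."""
    common = []
    for a, b in zip(frame_functions(bt_a), frame_functions(bt_b)):
        if a != b:
            break
        common.append(a)
    return common


first = """\
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff46c270e in fcntlSizeHint (pFile=0x7fffe040e950, nByte=32768) at sqlite3.c:27511
#2 0x00007ffff46c27c2 in unixFileControl (id=0x7fffe040e950, op=5, pArg=0x7fffdeaaf580) at sqlite3.c:27559
"""
second = """\
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff3f0746d in fcntlSizeHint (pFile=0x7fffdee7e150, nByte=2048) at sqlite3.c:27511
#2 0x00007ffff3f0751f in unixFileControl (id=0x7fffdee7e150, op=5, pArg=0x7fffdd4af4e0) at sqlite3.c:27559
"""
print(common_top_frames(first, second))
# ['??', 'fcntlSizeHint', 'unixFileControl']
```

A backtrace from empathy or telepathy-logger whose top frames match this list would suggest the same sqlite code path is involved.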
Comment by Ray (ataraxia) - Thursday, 14 April 2011, 18:21 GMT
Another user at https://bbs.archlinux.org/viewtopic.php?pid=918756#p918756 reports this is fixed in 3.7.6.1-1 in [testing] today, never mind that this wasn't expected. So far, I agree. What crashed instantly, with 100% reproducibility, on 3.7.6-1, works fine for me now.
Comment by Daniel (AurosGamma) - Thursday, 14 April 2011, 18:22 GMT
@Ionut I would gladly do it, but my knowledge isn't good enough to know exactly what I have to do. I've never compiled anything from source except my own (simple) programs, so please excuse me.
Greetings
Comment by Andreas Radke (AndyRTR) - Thursday, 14 April 2011, 19:45 GMT
So this is fixed for everybody with 3.7.6.1 in testing?
Comment by Ionut Biru (wonder) - Thursday, 14 April 2011, 21:32 GMT
we have another confirmation via forums and another via irc.
Comment by jwbirdsong (jwbirdsong) - Thursday, 14 April 2011, 21:57 GMT
and another confirmation here. Seems fine now.
