FS#9063 - xine performance issue: 64 vs 32 bit build

Attached to Project: Arch Linux
Opened by Sander Jansen (GogglesGuy) - Monday, 31 December 2007, 19:35 GMT
Last edited by Aaron Griffin (phrakture) - Wednesday, 23 April 2008, 16:23 GMT
Task Type: Bug Report
Category: Kernel
Status: Closed
Assigned To: Aaron Griffin (phrakture)
Architecture: All
Severity: Critical
Priority: Normal
Reported Version: 2007.08-2
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 1
Private: No

Details

Running the latest kernel, I noticed a severe performance difference between 64 bit and 32 bit. When using xine, trying to stop a playing stream seems to wait for a mutex to be released. This release of the mutex is significantly slower on my two 64 bit machines compared to my 32 bit machine. The two 64 bit machines are an Intel Pentium 4 and an Intel Core 2 Duo, while the 32 bit machine is an Athlon XP 1600 (5 years old!!).

I've attached a log with some timings (CPU ticks and time). 32 bit seems to be more than 10x faster. Someone on the xine mailing list mentioned the following, which may or may not be related:

"I noticed that on a previous release of Xine, but It was not a Xine
issue but a kernel issue (on my hardware at least). The problem was
related to a mutex which took time to release or lock (I didn't remember
exactly) sometimes (the problem was not systematic). I tried several
kernel (from 2.6.16 to 2.6.20) with several configuration (low latency,
BKL or not, 100HZ, 250HZ, 1000HZ), and the only one which didn't suffer
this problem was 2.6.17.14 with this :
CONFIG_PREEMPT=y
# CONFIG_PREEMPT_BKL is not set
CONFIG_HZ_250=y
CONFIG_HZ=250"
   Attachment: timings (2.5 KiB)

Closed by Aaron Griffin (phrakture)
Wednesday, 23 April 2008, 16:23 GMT
Reason for closing: Fixed
Additional comments about closing: Fixed in xine 1.1.12
Comment by Sander Jansen (GogglesGuy) - Monday, 31 December 2007, 19:37 GMT
Note that the timings measure how long it takes to return from the call 'xine_close(stream)'.
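
For reference, a minimal sketch of how such a timing could be taken (this is not the actual harness behind the attached log; using rdtsc for the ticks column and clock() for the time column is an assumption, and stream is assumed to be an open xine stream):

    #include <stdio.h>
    #include <time.h>
    #include <xine.h>

    /* Read the x86 time-stamp counter; a plausible source of the
       "ticks" column in the attached log. */
    static unsigned long long rdtsc(void)
    {
        unsigned int lo, hi;
        __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
        return ((unsigned long long)hi << 32) | lo;
    }

    /* Close the stream and print how long the call took. */
    static void timed_close(xine_stream_t *stream)
    {
        unsigned long long t0 = rdtsc();
        clock_t c0 = clock();
        xine_close(stream);
        printf("ticks: %llu time: %f\n", rdtsc() - t0,
               (double)(clock() - c0) / CLOCKS_PER_SEC);
    }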
Comment by Tobias Powalowski (tpowa) - Thursday, 03 January 2008, 18:24 GMT
Try one of the rc kernels instead:
http://dev.archlinux.org/~tpowa/2.6.24/
Comment by Tobias Powalowski (tpowa) - Friday, 11 January 2008, 10:05 GMT
Is the rc kernel working or not?
Comment by Sander Jansen (GogglesGuy) - Friday, 11 January 2008, 14:15 GMT
Sorry, I haven't had time to compile a custom kernel yet. I'll try to do it this weekend.
Comment by Dale Blount (dale) - Monday, 14 January 2008, 20:10 GMT
You shouldn't have to compile anything, tpowa linked to binary packages.
Comment by Sander Jansen (GogglesGuy) - Monday, 14 January 2008, 20:39 GMT
Ok, you're right, how stupid of me.
I've tried rc-6; the timings didn't seem to improve
(mind you, I used my other 64 bit machine this time to check the timings):

ticks: 1533992227 time: 0.350000
ticks: 1430852572 time: 0.490000
ticks: 1387090612 time: 0.420000
ticks: 1396862580 time: 0.420000
ticks: 1253362432 time: 0.400000
ticks: 1580758372 time: 0.390000
ticks: 1186466767 time: 0.400000
ticks: 1657419000 time: 0.550000
ticks: 2170108147 time: 0.680000
ticks: 2563092405 time: 0.810000
ticks: 1072197525 time: 0.360000
ticks: 2003402902 time: 0.650000
ticks: 2149056405 time: 0.670000
ticks: 672847357 time: 0.220000
ticks: 2741860237 time: 0.800000
Comment by Tobias Powalowski (tpowa) - Saturday, 26 January 2008, 16:06 GMT
Status on the .24 kernel in testing?
Comment by Sander Jansen (GogglesGuy) - Sunday, 27 January 2008, 01:37 GMT
I've upgraded both my 32 bit desktop and 64 bit laptop to kernel 2.6.24-2. The major performance difference is still there.
The new timings are on average 0.02 s (32 bit) vs 0.70 s (64 bit).
Comment by Sander Jansen (GogglesGuy) - Sunday, 27 January 2008, 01:42 GMT
On closer inspection, the timings for 32 bit actually improved over 2.6.23:
ticks: 13366938 time: 0.010000
ticks: 36534152 time: 0.000000
ticks: 34689161 time: 0.000000
ticks: 30580979 time: 0.010000
ticks: 38078827 time: 0.010000
ticks: 36933739 time: 0.010000

The timings for 64 bit seem worse than before:
ticks: 775073295 time: 0.530000
ticks: 1286406621 time: 0.920000
ticks: 458499132 time: 0.320000
ticks: 1180204587 time: 0.830000
ticks: 766595115 time: 0.560000
ticks: 905110875 time: 0.650000
ticks: 859795344 time: 0.630000
ticks: 1374765759 time: 0.980000
ticks: 1314767367 time: 0.900000
Comment by Tobias Powalowski (tpowa) - Sunday, 27 January 2008, 20:25 GMT
Please report this issue on the xine bug tracker. I talked with one of the devs there; he is interested in debugging it.
Comment by Sander Jansen (GogglesGuy) - Sunday, 27 January 2008, 21:40 GMT
I've submitted a bug report on the xine bug tracker:
http://bugs.xine-project.org/show_bug.cgi?id=33
Comment by Sander Jansen (GogglesGuy) - Friday, 22 February 2008, 22:42 GMT
Ok, I haven't heard anything from the xine developers (they seem to be asleep, or they never got my email on their mailing list). I think this issue is related to the use of sched_yield in the xine demuxer code.

Basically, the demuxer thread in xine unlocks a mutex for a short time to allow other threads to lock it. I'm thinking that on multi-core CPUs the sched_yield waiting time is too short to allow the other thread to lock the mutex.

So in short, I think the kernel is OK (unless sched_yield is broken, of course :P); it's not a 64 vs 32 bit issue, but rather a single- vs multi-core issue. The fix I proposed on the xine mailing list (and which seems to work for me) is not to use sched_yield, but to use a sleep instead.
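
To illustrate the pattern described above (a sketch only, not the actual xine demuxer source; the 1 ms sleep interval is an assumed value):

    #include <pthread.h>
    #include <sched.h>
    #include <time.h>

    static pthread_mutex_t demux_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Problematic hand-off: on a multi-core CPU, sched_yield() can
       return immediately because the waiting thread is already
       runnable on another core, so the demuxer thread often re-locks
       the mutex before the waiter ever gets it. */
    static void handoff_with_yield(void)
    {
        pthread_mutex_unlock(&demux_lock);
        sched_yield();
        pthread_mutex_lock(&demux_lock);
    }

    /* Proposed fix: sleep for a short, fixed interval instead, which
       gives the waiting thread a real chance to take the mutex. */
    static void handoff_with_sleep(void)
    {
        struct timespec ts = { 0, 1000000 }; /* 1 ms (assumed value) */
        pthread_mutex_unlock(&demux_lock);
        nanosleep(&ts, NULL);
        pthread_mutex_lock(&demux_lock);
    }

On a single-core machine the yield works by accident, since the waiter can only run once the demuxer thread gives up the CPU; on multiple cores both threads run at once and the race goes the wrong way, which would explain why the old Athlon XP looks "faster" here.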
Comment by Sander Jansen (GogglesGuy) - Tuesday, 25 March 2008, 15:36 GMT
xine-lib 1.1.11 should have this issue fixed.
Comment by Sander Jansen (GogglesGuy) - Wednesday, 23 April 2008, 16:19 GMT
xine-lib 1.1.12 in extra fixes this issue. You may close this bug report. Thanks!
