FS#8721 - BUG: soft lockup detected on CPU#0!

Attached to Project: Arch Linux
Opened by Eric Olsson (emo) - Tuesday, 20 November 2007, 19:15 GMT
Last edited by Tobias Powalowski (tpowa) - Friday, 07 December 2007, 09:47 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Tobias Powalowski (tpowa)
Thomas Bächler (brain0)
Architecture i686
Severity Medium
Priority Normal
Reported Version 2007.08-2
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:
After today's kernel upgrade (20 November 2007), I got the following
errors in my 'dmesg' output:

BUG: soft lockup detected on CPU#0!
[<c0157d9a>] softlockup_tick+0xea/0x120
[<c0135853>] update_process_times+0x33/0x80
[<c0149077>] tick_sched_timer+0x77/0xf0
[<c0143fc3>] hrtimer_interrupt+0x163/0x1f0
[<c0149000>] tick_sched_timer+0x0/0xf0
[<c011ae90>] smp_apic_timer_interrupt+0x50/0x80
[<c0104fa4>] apic_timer_interrupt+0x28/0x30
[<c01206b2>] native_safe_halt+0x2/0x10
[<c0102cbd>] default_idle+0x3d/0x60
[<c0102453>] cpu_idle+0x73/0xe0
[<c0435a6a>] start_kernel+0x30a/0x3a0
[<c0435140>] unknown_bootoption+0x0/0x1f0
=======================

I have not noticed any strange behavior so far, so this may not be a serious bug.
Additional info:
* Kernel26 2.6.23.8-1
* HP nx6325
* 2.6.23-ARCH #1 SMP PREEMPT Sun Nov 18 07:43:05 UTC 2007 i686 Mobile AMD Sempron(tm) Processor 3500+ AuthenticAMD GNU/Linux


Steps to reproduce:

Upgrade to the latest kernel on an HP nx6325.

Closed by  Tobias Powalowski (tpowa)
Friday, 07 December 2007, 09:47 GMT
Reason for closing:  Fixed
Comment by Christian Lønaas (sokkalf) - Tuesday, 20 November 2007, 23:42 GMT
I can confirm this; it happens to me too.

Every time the cpufreq CPU scaling governor is set to "ondemand" or "userspace", I get these "soft lockup" messages every 2-5 seconds. When it is set to "powersave" or "performance", they disappear, which suggests the problem is related to cpufreq and dynamic scaling of the CPU speed.

The system otherwise seems stable, with possibly a small slowdown.

My system:
Kernel: 2.6.23.8-1
2.6.23-ARCH #1 SMP PREEMPT Sun Nov 18 07:43:05 UTC 2007 i686 Intel(R) Pentium(R) M processor 1400MHz GenuineIntel GNU/Linux

I reverted to 2.6.23.1-6 for the time being, which works with no problems.
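
The governor dependence described above can be checked before and after a kernel change. What follows is only a minimal sketch, assuming the standard Linux cpufreq sysfs file /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor is available; the function names and the "suspect" set are illustrative and simply mirror the observation in this comment, not a confirmed diagnosis.

```python
from pathlib import Path

# Standard cpufreq sysfs directory for CPU 0 (assumption: cpufreq is loaded
# and exposes this directory on the affected machine).
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")

# Governors that this comment associates with the "soft lockup" messages.
SUSPECT_GOVERNORS = {"ondemand", "userspace"}


def read_governor(cpufreq_dir=CPUFREQ_DIR):
    """Return the active governor name, or None if it cannot be read."""
    path = Path(cpufreq_dir) / "scaling_governor"
    try:
        return path.read_text().strip()
    except OSError:
        return None


def is_suspect(governor):
    """True if the governor is one of those reported to trigger the messages."""
    return governor in SUSPECT_GOVERNORS


if __name__ == "__main__":
    gov = read_governor()
    print(f"governor: {gov}, associated with lockup messages: {is_suspect(gov)}")
```

The directory is a parameter so the check can be pointed at another CPU's cpufreq directory if needed.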
Comment by Keerthi (keerthi) - Wednesday, 21 November 2007, 04:22 GMT
I can confirm this too.

Linux DELL-XPS-G2 2.6.23-ARCH #1 SMP PREEMPT Sun Nov 18 07:43:05 UTC 2007 i686 Intel(R) Pentium(R) M processor 2.13GHz GenuineIntel GNU/Linux
Comment by Luca Peduto (luca) - Wednesday, 21 November 2007, 06:45 GMT
I can confirm this problem on my laptop.
I found this on the LKML: http://lkml.org/lkml/2007/11/20/188
Comment by João Rodrigues (gothicknight) - Thursday, 22 November 2007, 09:17 GMT
I can confirm this error on a laptop and a PC, both of them using AMD64 single-core processors (i686 arch installed).
Comment by Francois Charette (Firmicus) - Thursday, 22 November 2007, 14:08 GMT
Same here on a desktop
kernel26 2.6.23.8-1
2.6.23-ARCH #1 SMP PREEMPT Sun Nov 18 07:43:05 UTC 2007 i686 AMD Athlon(tm) 64 Processor 3200+ AuthenticAMD GNU/Linux

Apart from filling up /var/log/messages.log, the "bug" does not seem to affect the system.
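
To get a rough idea of how much of the log these messages take up, something like the following could be used. It is a small sketch, assuming the messages appear in the log with exactly the wording shown in this report; the function name is made up for illustration, and the default path is the one mentioned above.

```python
import re
import sys

# Matches the header line of each report as it appears in this task.
LOCKUP_RE = re.compile(r"BUG: soft lockup detected on CPU#(\d+)!")


def count_lockups(log_text):
    """Return a dict mapping CPU number to the number of lockup messages."""
    counts = {}
    for match in LOCKUP_RE.finditer(log_text):
        cpu = int(match.group(1))
        counts[cpu] = counts.get(cpu, 0) + 1
    return counts


if __name__ == "__main__":
    # Default to the log file named in this comment; pass another path as argv[1].
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages.log"
    with open(path, errors="replace") as log_file:
        print(count_lockups(log_file.read()))
```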
Comment by Christian Lønaas (sokkalf) - Thursday, 22 November 2007, 14:56 GMT
This thread on the LKML explains the problem in detail and provides some patches that fix the issue:

http://lkml.org/lkml/2007/11/17/127
Comment by Vladimir Jary (hendrek) - Thursday, 22 November 2007, 18:56 GMT
I have the same problem with an Intel(R) Pentium(R) M processor 1.73GHz. I built a kernel with a patched softlockup.c. The patched kernel does not generate the error messages, but the cpufreq daemon is still not working:
[root@alderaan ~]# /etc/rc.d/cpufreq restart
:: Setting cpufreq governing rules , cpu 0Error setting new values. Common errors:
- Do you have proper administration rights? (super-user?)
- Is the governor you requested available and modprobed?
- Trying to set an invalid policy?
- Trying to set a specific frequency, but userspace governor is not available,
for example because of hardware which cannot be set to a specific frequency
or because the userspace governor isn't loaded?
However
[root@alderaan ~]# cpufreq-set -g ondemand
works fine.
I have attached the modified PKGBUILD and the softlockup.c patch.
Comment by Tobias Powalowski (tpowa) - Friday, 23 November 2007, 08:07 GMT
This will be fixed in the .9 kernel, which should appear soon.
Comment by Raul Antonio Ortega Moran (ramoran) - Saturday, 24 November 2007, 22:05 GMT
Yesterday I upgraded my 32-bit Arch install on my Acer Aspire 5050, and now, when I insert an SD card or any USB drive, I can't mount it and I get this dmesg output:
... (many lines omitted with the same content)
BUG: soft lockup detected on CPU#0!
[<c0157d9a>] softlockup_tick+0xea/0x120
[<c0135853>] update_process_times+0x33/0x80
[<c0149077>] tick_sched_timer+0x77/0xf0
[<c0143fc3>] hrtimer_interrupt+0x163/0x1f0
[<c0149000>] tick_sched_timer+0x0/0xf0
[<c011ae90>] smp_apic_timer_interrupt+0x50/0x80
[<c0104fa4>] apic_timer_interrupt+0x28/0x30
[<de24e628>] fat_ent_read+0x28/0x1c0 [fat]
[<c0237e70>] cfq_set_request+0x0/0x310
[<de24b0c9>] fat_cache_add+0x59/0x120 [fat]
[<de24b333>] fat_get_cluster+0x1a3/0x310 [fat]
[<de24b5a4>] fat_bmap+0x104/0x290 [fat]
[<c022ac11>] elv_insert+0xd1/0x230
[<de2505f5>] fat_get_block+0x55/0x2a0 [fat]
[<c01a3201>] bio_alloc_bioset+0x81/0x150
[<c01a4157>] bio_add_page+0x37/0x50
[<c01a7891>] do_mpage_readpage+0x201/0x710
[<de2505a0>] fat_get_block+0x0/0x2a0 [fat]
[<c015c2c6>] add_to_page_cache+0x66/0xc0
[<c01a7f85>] mpage_readpages+0x95/0x150
[<de2505a0>] fat_get_block+0x0/0x2a0 [fat]
[<de250860>] fat_readpages+0x0/0x20 [fat]
[<c0162bea>] __do_page_cache_readahead+0x1da/0x2d0
[<de2505a0>] fat_get_block+0x0/0x2a0 [fat]
[<c0123a8c>] update_stats_wait_end+0x9c/0xd0
[<c0162f41>] ondemand_readahead+0x111/0x140
[<c015c630>] do_generic_mapping_read+0x170/0x520
[<c0124ac8>] __wake_up+0x38/0x50
[<c015e24c>] generic_file_aio_read+0x10c/0x1b0
[<c015bdd0>] file_read_actor+0x0/0xe0
[<c017e995>] do_sync_read+0xd5/0x120
[<c036115a>] __mutex_lock_slowpath+0x16a/0x2d0
[<c01403b0>] autoremove_wake_function+0x0/0x40
[<c017e8c0>] do_sync_read+0x0/0x120
[<c017f26b>] vfs_read+0xbb/0x140
[<c017f751>] sys_read+0x41/0x70
[<c0104482>] sysenter_past_esp+0x6b/0xa1
=======================
BUG: soft lockup detected on CPU#0!
[<c0157d9a>] softlockup_tick+0xea/0x120
[<c0135853>] update_process_times+0x33/0x80
[<c0149077>] tick_sched_timer+0x77/0xf0
[<c0143fc3>] hrtimer_interrupt+0x163/0x1f0
[<c0149000>] tick_sched_timer+0x0/0xf0
[<c011ae90>] smp_apic_timer_interrupt+0x50/0x80
[<c0104fa4>] apic_timer_interrupt+0x28/0x30
[<c01206b2>] native_safe_halt+0x2/0x10
[<c0102cbd>] default_idle+0x3d/0x60
[<c0102453>] cpu_idle+0x73/0xe0
[<c0435a6a>] start_kernel+0x30a/0x3a0
[<c0435140>] unknown_bootoption+0x0/0x1f0
=======================
BUG: soft lockup detected on CPU#0!
[<c0157d9a>] softlockup_tick+0xea/0x120
[<c0135853>] update_process_times+0x33/0x80
[<c0149077>] tick_sched_timer+0x77/0xf0
[<c0143fc3>] hrtimer_interrupt+0x163/0x1f0
[<c0149000>] tick_sched_timer+0x0/0xf0
[<c011ae90>] smp_apic_timer_interrupt+0x50/0x80
[<c0104fa4>] apic_timer_interrupt+0x28/0x30
[<c01206b2>] native_safe_halt+0x2/0x10
[<c0102cbd>] default_idle+0x3d/0x60
[<c0102453>] cpu_idle+0x73/0xe0
[<c0435a6a>] start_kernel+0x30a/0x3a0
[<c0435140>] unknown_bootoption+0x0/0x1f0
=======================
BUG: soft lockup detected on CPU#0!
[<c0157d9a>] softlockup_tick+0xea/0x120
[<c0135853>] update_process_times+0x33/0x80
[<c0149077>] tick_sched_timer+0x77/0xf0
[<c0143fc3>] hrtimer_interrupt+0x163/0x1f0
[<c0149000>] tick_sched_timer+0x0/0xf0
[<c011ae90>] smp_apic_timer_interrupt+0x50/0x80
[<c0104fa4>] apic_timer_interrupt+0x28/0x30
[<c01206b2>] native_safe_halt+0x2/0x10
[<c0102cbd>] default_idle+0x3d/0x60
[<c0102453>] cpu_idle+0x73/0xe0
[<c0435a6a>] start_kernel+0x30a/0x3a0
[<c0435140>] unknown_bootoption+0x0/0x1f0
=======================
BUG: soft lockup detected on CPU#0!
[<c0157d9a>] softlockup_tick+0xea/0x120
[<c0135853>] update_process_times+0x33/0x80
[<c0149077>] tick_sched_timer+0x77/0xf0
[<c0143fc3>] hrtimer_interrupt+0x163/0x1f0
[<c0149000>] tick_sched_timer+0x0/0xf0
[<c011ae90>] smp_apic_timer_interrupt+0x50/0x80
[<c0104fa4>] apic_timer_interrupt+0x28/0x30
[<c01206b2>] native_safe_halt+0x2/0x10
[<c0102cbd>] default_idle+0x3d/0x60
[<c0102453>] cpu_idle+0x73/0xe0
[<c0435a6a>] start_kernel+0x30a/0x3a0
[<c0435140>] unknown_bootoption+0x0/0x1f0
=======================
BUG: soft lockup detected on CPU#0!
[<c0157d9a>] softlockup_tick+0xea/0x120
[<c0135853>] update_process_times+0x33/0x80
[<c0149077>] tick_sched_timer+0x77/0xf0
[<c0143fc3>] hrtimer_interrupt+0x163/0x1f0
[<c0149000>] tick_sched_timer+0x0/0xf0
[<c011ae90>] smp_apic_timer_interrupt+0x50/0x80
[<c0104fa4>] apic_timer_interrupt+0x28/0x30
[<c01206b2>] native_safe_halt+0x2/0x10
[<c0102cbd>] default_idle+0x3d/0x60
[<c0102453>] cpu_idle+0x73/0xe0
[<c0435a6a>] start_kernel+0x30a/0x3a0
[<c0435140>] unknown_bootoption+0x0/0x1f0
=======================
BUG: soft lockup detected on CPU#0!
[<c0157d9a>] softlockup_tick+0xea/0x120
[<c0135853>] update_process_times+0x33/0x80
[<c0149077>] tick_sched_timer+0x77/0xf0
[<c0143fc3>] hrtimer_interrupt+0x163/0x1f0
[<c0149000>] tick_sched_timer+0x0/0xf0
[<c011ae90>] smp_apic_timer_interrupt+0x50/0x80
[<c012415d>] update_curr+0x12d/0x140
[<c0104fa4>] apic_timer_interrupt+0x28/0x30
[<c024001a>] __copy_to_user_ll+0x3a/0x70
[<c015be95>] file_read_actor+0xc5/0xe0
[<c015c824>] do_generic_mapping_read+0x364/0x520
[<c036115a>] __mutex_lock_slowpath+0x16a/0x2d0
[<c015e24c>] generic_file_aio_read+0x10c/0x1b0
[<c015bdd0>] file_read_actor+0x0/0xe0
[<c017e995>] do_sync_read+0xd5/0x120
[<c036115a>] __mutex_lock_slowpath+0x16a/0x2d0
[<c01403b0>] autoremove_wake_function+0x0/0x40
[<c017e8c0>] do_sync_read+0x0/0x120
[<c017f26b>] vfs_read+0xbb/0x140
[<c017f751>] sys_read+0x41/0x70
[<c0104482>] sysenter_past_esp+0x6b/0xa1
=======================
BUG: soft lockup detected on CPU#0!
[<c0157d9a>] softlockup_tick+0xea/0x120
[<c0135853>] update_process_times+0x33/0x80
[<c0149077>] tick_sched_timer+0x77/0xf0
[<c0143fc3>] hrtimer_interrupt+0x163/0x1f0
[<c0149000>] tick_sched_timer+0x0/0xf0
[<c011ae90>] smp_apic_timer_interrupt+0x50/0x80
[<c0104fa4>] apic_timer_interrupt+0x28/0x30
[<c01206b2>] native_safe_halt+0x2/0x10
[<c0102cbd>] default_idle+0x3d/0x60
[<c0102453>] cpu_idle+0x73/0xe0
[<c0435a6a>] start_kernel+0x30a/0x3a0
[<c0435140>] unknown_bootoption+0x0/0x1f0
=======================
mmc1: card 379c removed
BUG: soft lockup detected on CPU#0!
[<c0157d9a>] softlockup_tick+0xea/0x120
[<c0135853>] update_process_times+0x33/0x80
[<c0149077>] tick_sched_timer+0x77/0xf0
[<c0143fc3>] hrtimer_interrupt+0x163/0x1f0
[<c0149000>] tick_sched_timer+0x0/0xf0
[<c011ae90>] smp_apic_timer_interrupt+0x50/0x80
[<c0104fa4>] apic_timer_interrupt+0x28/0x30
[<c01206b2>] native_safe_halt+0x2/0x10
[<c0102cbd>] default_idle+0x3d/0x60
[<c0102453>] cpu_idle+0x73/0xe0
[<c0435a6a>] start_kernel+0x30a/0x3a0
[<c0435140>] unknown_bootoption+0x0/0x1f0
=======================
BUG: soft lockup detected on CPU#0!
[<c0157d9a>] softlockup_tick+0xea/0x120
[<c0135853>] update_process_times+0x33/0x80
[<c0149077>] tick_sched_timer+0x77/0xf0
[<c0143fc3>] hrtimer_interrupt+0x163/0x1f0
[<c0149000>] tick_sched_timer+0x0/0xf0
[<c011ae90>] smp_apic_timer_interrupt+0x50/0x80
[<c012415d>] update_curr+0x12d/0x140
[<c0104fa4>] apic_timer_interrupt+0x28/0x30
[<c024001a>] __copy_to_user_ll+0x3a/0x70
[<c015be95>] file_read_actor+0xc5/0xe0
[<c015c824>] do_generic_mapping_read+0x364/0x520
[<c015e24c>] generic_file_aio_read+0x10c/0x1b0
[<c015bdd0>] file_read_actor+0x0/0xe0
[<c017e995>] do_sync_read+0xd5/0x120
[<c036115a>] __mutex_lock_slowpath+0x16a/0x2d0
[<c01403b0>] autoremove_wake_function+0x0/0x40
[<c017e8c0>] do_sync_read+0x0/0x120
[<c017f26b>] vfs_read+0xbb/0x140
[<c017f751>] sys_read+0x41/0x70
[<c0104482>] sysenter_past_esp+0x6b/0xa1
=======================

After I remove my card, dmesg says:

... (many lines omitted with the same content)
BUG: soft lockup detected on CPU#0!
[<c0157d9a>] softlockup_tick+0xea/0x120
[<c0135853>] update_process_times+0x33/0x80
[<c0149077>] tick_sched_timer+0x77/0xf0
[<c0143fc3>] hrtimer_interrupt+0x163/0x1f0
[<c0149000>] tick_sched_timer+0x0/0xf0
[<c011ae90>] smp_apic_timer_interrupt+0x50/0x80
[<c0104fa4>] apic_timer_interrupt+0x28/0x30
[<c01206b2>] native_safe_halt+0x2/0x10
[<c0102cbd>] default_idle+0x3d/0x60
[<c0102453>] cpu_idle+0x73/0xe0
[<c0435a6a>] start_kernel+0x30a/0x3a0
[<c0435140>] unknown_bootoption+0x0/0x1f0
=======================

With my old kernel I don't receive these messages. In a VirtualBox VM with Windows XP, however, I can work with USB. How can I send more detailed information? Is this a different bug or not? Thanks, Arch debugging team! It's not urgent for me, and I'd like to help with debugging. Thanks!
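
One way to approach the "is this a different bug?" question is to separate the traces that interrupted the idle loop (like the one in the original report) from those that interrupted real work, such as the reads through the [fat] module above. The sketch below assumes the dmesg format pasted in this comment; the function names are hypothetical, and the classification is only a reading aid, not a verdict on the cause.

```python
import re

# Header text of each report, as pasted in this task.
HEADER = "BUG: soft lockup detected on CPU#"

# One stack frame, e.g. "[<de24e628>] fat_ent_read+0x28/0x1c0 [fat]".
# Group 1 is the function name, group 2 the optional module name.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def split_traces(dmesg_text):
    """Split dmesg text into traces; each trace is a list of (function, module)."""
    traces = []
    current = None
    for line in dmesg_text.splitlines():
        if HEADER in line:
            current = []
            traces.append(current)
            continue
        match = FRAME_RE.search(line)
        if match and current is not None:
            current.append((match.group(1), match.group(2)))
    return traces


def interrupted_idle(trace):
    """True if the trace passes through cpu_idle, as in the original report."""
    return any(func == "cpu_idle" for func, _ in trace)


def modules_in(trace):
    """Return the set of module names that appear in the trace's frames."""
    return {mod for _, mod in trace if mod}
```

Applied to the output above, the traces ending in cpu_idle would match the pattern from the original report, while those containing fat_get_block and sys_read frames would be listed separately with "fat" as an involved module.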
Comment by Tobias Powalowski (tpowa) - Sunday, 25 November 2007, 16:03 GMT
http://www.archlinux.org/~tpowa/2.6.23/
Does this kernel fix your issue?
Comment by Christian Lønaas (sokkalf) - Sunday, 25 November 2007, 17:19 GMT
Seems to work great, thanks tpowa!
Comment by Eric Olsson (emo) - Sunday, 25 November 2007, 17:58 GMT
I tried the kernel you linked to, Tpowa, and it crashed horribly when loading modules. I had to boot from my CD and run pacman -S kernel26 to go back to the one that spits out the error message. The crash doesn't seem to have been logged, though; I have searched through all the files under /var/log and haven't found anything.
Comment by Luca Peduto (luca) - Sunday, 25 November 2007, 18:08 GMT
Hi Tpowa,
this kernel fixes the problem, but now my laptop freezes when I start X.
I have an ATI card with the latest driver:
catalyst 7.11-1
catalyst-utils 7.11-1
Comment by Luca Peduto (luca) - Sunday, 25 November 2007, 18:09 GMT
This is the Xorg log file:
Comment by Christian Lønaas (sokkalf) - Sunday, 25 November 2007, 18:10 GMT
Did you download and install the catalyst-7.11-2-i686.pkg.tar.gz present in the same directory as tpowa's kernel?
Comment by Tobias Powalowski (tpowa) - Sunday, 25 November 2007, 18:20 GMT
Sure, you need the modules in this directory too; the new CFS scheduler is incompatible with old modules.
Comment by Luca Peduto (luca) - Sunday, 25 November 2007, 18:36 GMT
OK, I feel very stupid :-)
Comment by Luca Peduto (luca) - Sunday, 25 November 2007, 18:48 GMT
Thank you Tpowa, the kernel works great!
Comment by Vladimir Jary (hendrek) - Sunday, 25 November 2007, 18:55 GMT
Ok, the problem seems to be fixed. Thank you!
Comment by João Rodrigues (gothicknight) - Tuesday, 27 November 2007, 17:39 GMT
FIXED in kernel26-2.6.23.9-1
Comment by Eric Olsson (emo) - Thursday, 29 November 2007, 20:22 GMT
I don't know what you guys did to get it to work so well, but I've tried both the package from testing via pacman and the one downloaded from the link that tpowa posted here. Both made my poor little laptop crash horribly. Should I just hold my breath and wait for it to come into core? I currently cannot listen to music (xmms) because it crashes my computer :(
