FS#21921 - [kernel26] Computer shutting down due to high temperature
Attached to Project:
Arch Linux
Opened by Dominik (cpcgm) - Monday, 29 November 2010, 15:38 GMT
Last edited by Tobias Powalowski (tpowa) - Tuesday, 14 February 2012, 14:41 GMT
Details
Description: In the last 18 hours, my computer has shut down
about 8 times due to high temperature. The log says:
> kernel: Critical temperature reached (128 C), shutting down.
However, the temperature has stayed at or below about 70 °C, which should not be a problem.
Additional info:
* kernel26 2.6.36.1-3
* ThinkPad X61s
Steps to reproduce: Start the computer. Work. Wait for it to shut down...
Closed by Tobias Powalowski (tpowa)
Tuesday, 14 February 2012, 14:41 GMT
Reason for closing: Upstream
"It seems that temp1 went from 74C to 128C in one second. I find that hard to believe as I was working on my computer at the time of the shutdown and nothing really would warrant that.
I suppose it's possible that there is a hardware issue and the sensor maxes out for no reason, but given that this only started happening with the upgrade to 9.10, I think it is a software issue."
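The argument above is that a jump from 74 °C to 128 °C within one second is not physically plausible. A minimal Python sketch of that kind of plausibility check, assuming one sample per second and an illustrative limit of 10 °C of rise per sample (the limit and the function name `implausible_jumps` are hypothetical, not taken from the kernel or any sensor tool):

```python
from typing import List, Tuple

# Hypothetical upper bound on how much a real temperature can rise between
# two consecutive one-second samples; chosen for illustration only.
MAX_RISE_PER_SAMPLE_C = 10.0


def implausible_jumps(readings: List[float],
                      max_rise: float = MAX_RISE_PER_SAMPLE_C) -> List[Tuple[int, float, float]]:
    """Return (index, previous, current) for every sample that rose by more
    than max_rise degrees Celsius compared with the sample before it."""
    jumps = []
    for i in range(1, len(readings)):
        prev, cur = readings[i - 1], readings[i]
        if cur - prev > max_rise:
            jumps.append((i, prev, cur))
    return jumps


if __name__ == "__main__":
    # The spike described above: 74 C followed one second later by 128 C.
    samples = [72.0, 73.0, 74.0, 128.0]
    for idx, prev, cur in implausible_jumps(samples):
        print(f"sample {idx}: {prev:.0f} C -> {cur:.0f} C looks like a sensor glitch")
```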
On the day it started, I switched from WiFi to a cable-based network and installed gnome-media-pulse and gnome-settings-daemon-pulse. I had already been using pulseaudio before, but I think there was also a package update that day. Is there anything I can do to help triage this bug?
1) bad hardware design.
2) cooling system failure.
3) cooling system needs cleaning.
4) your room temperature is too high.
5) hardware sensors are failing.
6) ACPI DSDT code is doing something bad.
7) Linux thermal driver is confused by ACPI DSDT code.
8) thermal driver is misbehaving on your hardware.
9) thermal driver needs some adjustment for your hardware.
1, 8 - 9: The ThinkPad X61s is not a new machine. I've been using it for three years without any problems, and so have many others. If there haven't been any changes to the drivers, I doubt they are the reason.
ACPI THERMAL DRIVER
M: Zhang Rui <rui.zhang@intel.com>
L: linux-acpi@vger.kernel.org
W: http://www.lesswatts.org/projects/acpi/
S: Supported
F: drivers/acpi/*thermal*
[root@rails log]# cat kernel.log | grep "Apr 27 23:36:"
Apr 27 23:36:20 rails kernel: [ 5681.650766] CPU2: Core temperature above threshold, cpu clock throttled (total events = 1)
Apr 27 23:36:20 rails kernel: [ 5681.650773] CPU3: Core temperature above threshold, cpu clock throttled (total events = 1)
Apr 27 23:36:20 rails kernel: [ 5681.650775] Disabling lock debugging due to kernel taint
Apr 27 23:36:20 rails kernel: [ 5681.651801] CPU2: Core temperature/speed normal
Apr 27 23:36:20 rails kernel: [ 5681.651802] CPU3: Core temperature/speed normal
Apr 27 23:36:27 rails kernel: [ 5688.822292] Critical temperature reached (128 C), shutting down.
Apr 27 23:36:27 rails kernel: [ 5688.828867] Critical temperature reached (128 C), shutting down.
Apr 27 23:36:32 rails kernel: [ 5693.403292] Critical temperature reached (128 C), shutting down.
Apr 27 23:36:32 rails kernel: [ 5693.411178] Critical temperature reached (128 C), shutting down.
Apr 27 23:36:38 rails kernel: [ 5699.683254] e1000e 0000:00:19.0: BAR 0: set to [mem 0xf2500000-0xf251ffff] (PCI address [0xf2500000-0xf251ffff])
Apr 27 23:36:38 rails kernel: [ 5699.683273] e1000e 0000:00:19.0: BAR 1: set to [mem 0xf2525000-0xf2525fff] (PCI address [0xf2525000-0xf2525fff])
Apr 27 23:36:38 rails kernel: [ 5699.683285] e1000e 0000:00:19.0: BAR 2: set to [io 0x1820-0x183f] (PCI address [0x1820-0x183f])
Apr 27 23:36:38 rails kernel: [ 5699.683314] e1000e 0000:00:19.0: restoring config space at offset 0xf (was 0x100, writing 0x10b)
Apr 27 23:36:38 rails kernel: [ 5699.683348] e1000e 0000:00:19.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100107)
Apr 27 23:36:38 rails kernel: [ 5699.683411] e1000e 0000:00:19.0: PME# disabled
Apr 27 23:36:38 rails kernel: [ 5699.683511] e1000e 0000:00:19.0: irq 41 for MSI/MSI-X
Apr 27 23:36:38 rails kernel: [ 5699.764050] e1000e 0000:00:19.0: PME# enabled
But before that, make sure it's not a case of bad hardware, cooling system failure or anything else on Gerardo's list above.
$ uname -a
Linux pluto 2.6.38-ARCH #1 SMP PREEMPT Fri May 13 07:54:18 UTC 2011 i686 Intel(R) Core(TM) i5 CPU M 540 @ 2.53GHz GenuineIntel GNU/Linux
$ dmesg | grep ips
[ 5.257007] intel ips 0000:00:1f.6: CPU TDP doesn't match expected value (found 25, expected 29)
[ 5.257030] intel ips 0000:00:1f.6: PCI INT D -> GSI 19 (level, low) -> IRQ 19
[ 5.257141] intel ips 0000:00:1f.6: failed to get i915 symbols, graphics turbo disabled
[ 5.262832] intel ips 0000:00:1f.6: IPS driver initialized, MCP temp limit 90
[ 5.479463] agpgart-intel 0000:00:00.0: Intel HD Graphics Chipset
[ 10.462485] intel ips 0000:00:1f.6: i915 driver attached, reenabling gpu turbo
[54989.664786] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9087, limit 9000
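The driver's init line above reports "MCP temp limit 90", while the warning prints "limit 9000", so the Avg temp values appear to be hundredths of a degree Celsius (9087 would be about 90.87 °C). A small Python sketch that extracts and converts these values; the regular expression is written against the message format shown in this log, and the helper name `parse_mcp_warning` is hypothetical:

```python
import re
from typing import Optional, Tuple

# Matches the intel_ips warning format seen in the log above, e.g.
# "intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9087, limit 9000"
MCP_RE = re.compile(r"MCP limit exceeded: Avg temp (\d+), limit (\d+)")


def parse_mcp_warning(line: str) -> Optional[Tuple[float, float]]:
    """Return (average, limit) in degrees Celsius, or None if the line is not
    an MCP warning. Raw values are assumed to be hundredths of a degree."""
    m = MCP_RE.search(line)
    if m is None:
        return None
    avg_centi, limit_centi = (int(g) for g in m.groups())
    return avg_centi / 100.0, limit_centi / 100.0


if __name__ == "__main__":
    line = ("[54989.664786] intel ips 0000:00:1f.6: "
            "MCP limit exceeded: Avg temp 9087, limit 9000")
    print(parse_mcp_warning(line))
```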
Current:
$ uname -a
Linux pluto 2.6.38-ARCH #1 SMP PREEMPT Tue Jun 7 06:40:04 UTC 2011 i686 Intel(R) Core(TM) i5 CPU M 540 @ 2.53GHz GenuineIntel GNU/Linux
I was having the shutdown issue with:
$ uname -a
Linux pluto 2.6.38-ARCH #1 SMP PREEMPT Fri May 13 07:54:18 UTC 2011 i686 Intel(R) Core(TM) i5 CPU M 540 @ 2.53GHz GenuineIntel GNU/Linux
Jun 20 20:08:10 rails kernel: [198506.039415] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9094, limit 9000
Jun 20 20:08:15 rails kernel: [198511.039882] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9293, limit 9000
Jun 20 20:08:20 rails kernel: [198516.040387] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9396, limit 9000
Jun 20 20:08:25 rails kernel: [198521.040855] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9168, limit 9000
Jun 20 20:08:30 rails kernel: [198526.041343] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9299, limit 9000
Jun 20 20:08:35 rails kernel: [198531.041838] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9699, limit 9000
Jun 20 20:08:40 rails kernel: [198536.042325] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9799, limit 9000
Jun 20 20:08:45 rails kernel: [198541.042807] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9859, limit 9000
Jun 20 20:08:50 rails kernel: [198546.043280] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9817, limit 9000
Jun 20 20:08:55 rails kernel: [198551.043780] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9899, limit 9000
Jun 20 20:09:00 rails kernel: [198556.044248] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9898, limit 9000
Jun 20 20:09:05 rails kernel: [198561.044738] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9899, limit 9000
Jun 20 20:09:10 rails kernel: [198566.045221] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9900, limit 9000
Jun 20 20:09:15 rails kernel: [198571.045744] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9946, limit 9000
Jun 20 20:09:20 rails kernel: [198576.046225] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9956, limit 9000
Jun 20 20:09:25 rails kernel: [198581.046687] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9999, limit 9000
Jun 20 20:09:28 rails kernel: [198584.371619] Critical temperature reached (100 C), shutting down.
Jun 20 20:09:28 rails kernel: [198584.372733] Critical temperature reached (100 C), shutting down.
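In this log the reported average climbs from 9094 to 9999 over roughly 78 seconds before the 100 C shutdown message. A Python sketch that measures that window from the bracketed kernel timestamps (seconds since boot); it assumes the exact message strings shown above and the hundredths-of-a-degree reading of the MCP values, and the function name `warning_window` is hypothetical:

```python
import re
from typing import Iterable, Optional, Tuple

# Kernel timestamp in brackets, e.g. "[198506.039415]" (seconds since boot).
TS_RE = re.compile(r"\[\s*(\d+\.\d+)\]")
# Message formats copied from the log above.
MCP_RE = re.compile(r"MCP limit exceeded: Avg temp (\d+)")
CRIT_RE = re.compile(r"Critical temperature reached \((\d+) C\)")


def warning_window(lines: Iterable[str]) -> Optional[Tuple[float, float, float, int]]:
    """Return (seconds from the first MCP warning to the first critical
    shutdown message, first MCP average in C, last MCP average in C,
    critical temperature in C), or None if either message is missing."""
    first: Optional[Tuple[float, float]] = None
    last: Optional[Tuple[float, float]] = None
    for line in lines:
        ts_match = TS_RE.search(line)
        if ts_match is None:
            continue
        ts = float(ts_match.group(1))
        mcp = MCP_RE.search(line)
        if mcp:
            # Assumed unit: hundredths of a degree Celsius.
            last = (ts, int(mcp.group(1)) / 100.0)
            if first is None:
                first = last
            continue
        crit = CRIT_RE.search(line)
        if crit and first is not None and last is not None:
            return ts - first[0], first[1], last[1], int(crit.group(1))
    return None


if __name__ == "__main__":
    log = [
        "Jun 20 20:08:10 rails kernel: [198506.039415] intel ips 0000:00:1f.6: "
        "MCP limit exceeded: Avg temp 9094, limit 9000",
        "Jun 20 20:09:25 rails kernel: [198581.046687] intel ips 0000:00:1f.6: "
        "MCP limit exceeded: Avg temp 9999, limit 9000",
        "Jun 20 20:09:28 rails kernel: [198584.371619] "
        "Critical temperature reached (100 C), shutting down.",
    ]
    print(warning_window(log))
```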
$ uname -a
Linux pluto 2.6.38-ARCH #1 SMP PREEMPT Tue Jun 7 06:40:04 UTC 2011 i686 Intel(R) Core(TM) i5 CPU M 540 @ 2.53GHz GenuineIntel GNU/Linux
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/751689
I had almost 10 shutdowns in the last two days. That is unacceptable.
CPU0: Thermal monitoring enabled (TM2)
using mwait in idle threads.
ACPI: Core revision 20110316
Source: http://www.webupd8.org/2011/06/linux-kernel-power-issue-fix.html
I also had the same issue until I applied the fix above.
I guess the only hardware problem that could cause this kind of issue is a temperature sensor that spikes. But if the sensor were defective, why would it not spike all the time? Why does it not spike when I compile, and only when I watch a video? I saw it happen myself once: the temperature (according to the sensor) went from a decent value to 128 °C in one second.
Some of these messages appear in syslog:
[ 3062.696741] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9247, limit 9000
I checked the temperature at that moment and it was close to 80 degrees. Is that right?
acpi -V says the system should only switch to critical mode at 128 degrees, yet it is shutting down at this temperature.
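acpi -V reports the trip points defined for each thermal zone. The same numbers can be read directly from the kernel's thermal sysfs interface, which may help when comparing the critical trip point with the current reading. A minimal sketch, assuming the usual /sys/class/thermal/thermal_zone*/ layout (`temp`, `trip_point_N_type`, `trip_point_N_temp`, values in millidegrees Celsius); the function names are illustrative, and the base path is a parameter so it can also be pointed at a copy of that tree:

```python
import glob
import os
from typing import Dict, List, Optional


def read_text(path: str) -> Optional[str]:
    """Read a sysfs attribute, returning None if it cannot be read
    (some zones can fail to report a value)."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def millideg_to_c(raw: Optional[str]) -> Optional[float]:
    """Convert a millidegree-Celsius string to degrees Celsius."""
    if raw is None:
        return None
    try:
        return int(raw) / 1000.0
    except ValueError:
        return None


def critical_margins(base: str = "/sys/class/thermal") -> List[Dict[str, object]]:
    """For each thermal zone, report the current temperature and the
    'critical' trip point (if any), both in degrees Celsius."""
    zones = []
    for zone in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        critical = None
        for type_path in sorted(glob.glob(os.path.join(zone, "trip_point_*_type"))):
            if read_text(type_path) == "critical":
                # trip_point_N_type pairs with trip_point_N_temp.
                temp_path = type_path[: -len("_type")] + "_temp"
                critical = millideg_to_c(read_text(temp_path))
        zones.append({
            "zone": os.path.basename(zone),
            "temp_c": millideg_to_c(read_text(os.path.join(zone, "temp"))),
            "critical_c": critical,
        })
    return zones


if __name__ == "__main__":
    for z in critical_margins():
        print(z)
```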
* Tried the kernels 2.6.32 (Arch), 2.6.38 (Fedora), 2.6.40 (Fedora), 3.0.X (Arch);
* Blacklisted the intel_ips module;
* Tried the kernel option "processor.ignore_ppc=1";
* Tried the kernel option "pcie_aspm=force";
* Tried the kernel option "i915.i915_enable_rc6=1";
* Updated my BIOS.
Nothing solved my problem, and my computer kept shutting down every day, so I decided to install Windows 7 to see whether the problem was specific to Linux... the same thing happened!
The problem was dust (a lot of dust) in the fan. I cleaned everything, changed the thermal paste (processor and chipset), reassembled my laptop and everything is working like a charm. No more shutdowns or slowdowns. ;-)
My temperatures:
Before:
* Idle: 60°C - 65°C
* Full load: 90°C!
Now:
* Idle: 40°C - 45°C (sometimes the fan switches itself off!)
* Full load: 70°C - 75°C
I'm just sharing my experience in case someone does not know what else to try.
All of a sudden, ASUS Sensor Suite II reported that my MB temperature had reached 128 degrees.
As I found out through a Google search, this is not that uncommon and is purely a hardware-related issue.
It won't damage anything, because the actual temperature is still normal.
With Core i7 and related CPUs, there is no reason for a shutdown. Thermal throttling will handle this gracefully. Yes, it is a severe problem that might indicate cooling issues, but it might well be a faulty CPU that simply overheats all the time or reports temperature incorrectly (like in my desktop). Shutting down is therefore not an option for me.
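The point here is that a single out-of-range reading could be answered with throttling rather than an immediate power-off. The following is a hypothetical policy sketch, not how drivers/acpi/thermal.c or the kernel thermal core actually behave: it throttles above a passive threshold and only chooses shutdown after the critical threshold has been reached on several consecutive samples. The thresholds (90 °C and 100 °C), the sample count (3) and the function name `decide` are illustrative assumptions.

```python
from enum import Enum
from typing import Iterable, List


class Action(Enum):
    NORMAL = "normal"
    THROTTLE = "throttle"
    SHUTDOWN = "shutdown"


def decide(readings: Iterable[float],
           passive_c: float = 90.0,
           critical_c: float = 100.0,
           critical_samples: int = 3) -> List[Action]:
    """Map each temperature sample to an action. Shutdown is only chosen once
    critical_c has been reached on critical_samples consecutive samples, so a
    one-off sensor spike leads to throttling instead of a power-off."""
    actions = []
    streak = 0  # consecutive samples at or above critical_c
    for temp in readings:
        streak = streak + 1 if temp >= critical_c else 0
        if streak >= critical_samples:
            actions.append(Action.SHUTDOWN)
        elif temp >= passive_c:
            actions.append(Action.THROTTLE)
        else:
            actions.append(Action.NORMAL)
    return actions
```

With the defaults, `decide([74, 128, 75, 76])` throttles once on the 128 °C spike and never shuts down. The trade-off is that a genuine thermal runaway is only acted on after three samples at or above 100 °C.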
Jan 30 17:39:41 rails kernel: [103484.595665] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9315, limit 9000
Jan 30 17:40:11 rails kernel: [103514.598568] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 10093, limit 9000
Jan 30 17:40:14 rails kernel: [103517.244267] Critical temperature reached (100 C), shutting down.
Jan 30 17:40:14 rails shutdown[9644]: shutting down for system halt
Jan 30 17:40:14 rails shutdown[9651]: shutting down for system halt
Jan 30 17:40:14 rails kernel: [103517.252225] Critical temperature reached (100 C), shutting down.
Jan 30 17:40:14 rails logger: ACPI group/action undefined: thermal_zone / LNXTHERM:00
Jan 30 17:40:14 rails logger: ACPI group/action undefined: thermal_zone / LNXTHERM:00
Jan 30 17:40:14 rails init: Switching to runlevel: 0
Jan 30 17:40:15 rails acpid: exiting