FS#21921 - [kernel26] Computer shutting down due to high temperature

Attached to Project: Arch Linux
Opened by Dominik (cpcgm) - Monday, 29 November 2010, 15:38 GMT
Last edited by Tobias Powalowski (tpowa) - Tuesday, 14 February 2012, 14:41 GMT
Task Type Bug Report
Category Upstream Bugs
Status Closed
Assigned To Tobias Powalowski (tpowa)
Thomas Bächler (brain0)
Architecture x86_64
Severity Critical
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 6
Private No

Details

Description: In the last 18 hours, my computer has shut down about 8 times due to high temperature. The log says:

> kernel: Critical temperature reached (128 C), shutting down.

However, the temperature has stayed at or below about 70 °C, which should not be a problem.

Additional info:
* kernel26 2.6.36.1-3
* ThinkPad X61s

Steps to reproduce:

Start the computer. Work. Wait for it to shut down...

Closed by  Tobias Powalowski (tpowa)
Tuesday, 14 February 2012, 14:41 GMT
Reason for closing:  Upstream
Comment by Dominik (cpcgm) - Monday, 29 November 2010, 23:50 GMT
The problem persists with the LTS-Kernel.
Comment by Dominik (cpcgm) - Tuesday, 30 November 2010, 12:58 GMT
And also with the kernel option "noacpi".
Comment by Dominik (cpcgm) - Tuesday, 30 November 2010, 13:09 GMT
The problem has been described several times for different Linux distributions. One example from last year, which sounds exactly like the problem I have (I'm also using a docking station) can be found at http://ubuntuforums.org/showpost.php?p=8324617&postcount=3. Quote:

"It seems that temp1 went from 74C to 128C in one second. I find that hard to believe as I was working on my computer at the time of the shutdown and nothing really would warrant that.

I suppose it's possible that there is a hardware issue and the sensor maxes out for no reason, but given that this only started happening with the upgrade to 9.10, I think it is a software issue."
Comment by Dominik (cpcgm) - Wednesday, 01 December 2010, 22:15 GMT
I ran the computer without the docking station and it shut down again. I don't know the reason. It mostly happens during video playback (no matter whether I use gnome-mplayer, vlc or totem), and without the dock it seemed as if I could watch longer than before. There are no problems when compiling code in Eclipse at temperatures of about 80 °C, much higher than the ones at which the computer shuts down.

On the day it started, I switched from WiFi to a wired network and installed gnome-media-pulse and gnome-settings-daemon-pulse. I had been using pulseaudio before, but I think there was also a package update that day. Is there anything I can do to help triage this bug?
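
One way to collect data for triage is to log what the kernel itself reports. The following is only a sketch, assuming the usual sysfs layout in which each /sys/class/thermal/thermal_zone*/temp file holds millidegrees Celsius; the function names and the overridable base directory are illustrative, not part of any tool mentioned in this report.

```python
import glob
import os
import time


def read_zone_temps(base="/sys/class/thermal"):
    """Return {zone_name: degrees_celsius} for every readable thermal zone."""
    temps = {}
    for path in sorted(glob.glob(os.path.join(base, "thermal_zone*", "temp"))):
        zone = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as f:
                # Assumption: the file holds millidegrees C, e.g. "70000" for 70 C.
                temps[zone] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            # Skip zones that cannot be read or do not contain a number.
            continue
    return temps


def log_samples(count, interval=1.0, base="/sys/class/thermal"):
    """Print `count` timestamped readings, one every `interval` seconds."""
    for _ in range(count):
        print(time.strftime("%H:%M:%S"), read_zone_temps(base), flush=True)
        time.sleep(interval)
```

Calling log_samples(3600) with the output redirected to a file would leave a one-second trace to compare against the time of the next shutdown.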
Comment by Dominik (cpcgm) - Monday, 06 December 2010, 13:22 GMT
This is really annoying. Is there anything I can do to help triage? Is anyone at least reading this?
Comment by Gerardo Exequiel Pozzi (djgera) - Tuesday, 07 December 2010, 00:25 GMT
  • Field changed: Summary (Computer shutting down due to high temperature → [kernel26] Computer shutting down due to high temperature)
  • Field changed: Status (Unconfirmed → Assigned)
  • Field changed: Category (Packages: Core → Upstream Bugs)
  • Task assigned to Thomas Bächler (brain0), Tobias Powalowski (tpowa)
Too complex. It could be any of the following:

1) bad hardware design.
2) cooling system failure.
3) cooling system needs cleaning.
4) your room temperature is too high.
5) hardware sensors are failing.
6) ACPI DSDT code is doing something bad.
7) Linux thermal driver is confused by ACPI DSDT code.
8) thermal driver is working badly for your hardware.
9) thermal driver needs some adjustment for your hardware.
Comment by Dominik (cpcgm) - Tuesday, 07 December 2010, 00:41 GMT
1 - 5: Well, as I said, I doubt it's a hardware issue. The temperature of the machine is not as high as it has been; the sensor readings show temperatures between 60 °C and 75 °C. According to Intel, the Core 2 Duo was designed to work even at 100 °C. It's also strange that most of the shutdowns occur while playing movies.

1, 8 - 9: The ThinkPad X61s is not a new machine. I've been using it for three years without any problems, many others have too. If there haven't been any changes to drivers, I doubt they are the reason.
Comment by Gerardo Exequiel Pozzi (djgera) - Tuesday, 07 December 2010, 01:04 GMT
Contact upstream.

ACPI THERMAL DRIVER
M: Zhang Rui <rui.zhang@intel.com>
L: linux-acpi@vger.kernel.org
W: http://www.lesswatts.org/projects/acpi/
S: Supported
F: drivers/acpi/*thermal*
Comment by Dominik (cpcgm) - Tuesday, 15 February 2011, 01:46 GMT
I wrote to Zhang Rui months ago but never heard back...
Comment by Dominik (cpcgm) - Tuesday, 15 February 2011, 01:47 GMT
The shutdowns had stopped for a few weeks, but now I have had another three or four in the last couple of days.
Comment by Peter B. Jørgensen (peder2tm) - Thursday, 10 March 2011, 20:35 GMT
It might be a real thermal problem. See this thread https://bbs.archlinux.org/viewtopic.php?id=112968
Comment by Greg (dolby) - Wednesday, 13 April 2011, 11:51 GMT
Is this still a problem?
Comment by Dominik (cpcgm) - Wednesday, 13 April 2011, 14:50 GMT
The shutdowns are not as frequent as they used to be -- they got down to maybe one every two weeks -- but I wouldn't say the problem is resolved. This again shows that the problem is most likely not connected to my hardware or usage pattern as I didn't change anything.
Comment by Stefan Schick (pommes_) - Thursday, 14 April 2011, 14:02 GMT
I have the same problem on a ThinkPad X201 with an i7-620M processor. Given the different hardware on which the problem occurs, I would say it is mainly a software problem.
Comment by Stefan Schick (pommes_) - Wednesday, 27 April 2011, 21:59 GMT
This time it happened while I was backing up my internal hard disk to a USB disk:

[root@rails log]# cat kernel.log | grep "Apr 27 23:36:"
Apr 27 23:36:20 rails kernel: [ 5681.650766] CPU2: Core temperature above threshold, cpu clock throttled (total events = 1)
Apr 27 23:36:20 rails kernel: [ 5681.650773] CPU3: Core temperature above threshold, cpu clock throttled (total events = 1)
Apr 27 23:36:20 rails kernel: [ 5681.650775] Disabling lock debugging due to kernel taint
Apr 27 23:36:20 rails kernel: [ 5681.651801] CPU2: Core temperature/speed normal
Apr 27 23:36:20 rails kernel: [ 5681.651802] CPU3: Core temperature/speed normal
Apr 27 23:36:27 rails kernel: [ 5688.822292] Critical temperature reached (128 C), shutting down.
Apr 27 23:36:27 rails kernel: [ 5688.828867] Critical temperature reached (128 C), shutting down.
Apr 27 23:36:32 rails kernel: [ 5693.403292] Critical temperature reached (128 C), shutting down.
Apr 27 23:36:32 rails kernel: [ 5693.411178] Critical temperature reached (128 C), shutting down.
Apr 27 23:36:38 rails kernel: [ 5699.683254] e1000e 0000:00:19.0: BAR 0: set to [mem 0xf2500000-0xf251ffff] (PCI address [0xf2500000-0xf251ffff])
Apr 27 23:36:38 rails kernel: [ 5699.683273] e1000e 0000:00:19.0: BAR 1: set to [mem 0xf2525000-0xf2525fff] (PCI address [0xf2525000-0xf2525fff])
Apr 27 23:36:38 rails kernel: [ 5699.683285] e1000e 0000:00:19.0: BAR 2: set to [io 0x1820-0x183f] (PCI address [0x1820-0x183f])
Apr 27 23:36:38 rails kernel: [ 5699.683314] e1000e 0000:00:19.0: restoring config space at offset 0xf (was 0x100, writing 0x10b)
Apr 27 23:36:38 rails kernel: [ 5699.683348] e1000e 0000:00:19.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100107)
Apr 27 23:36:38 rails kernel: [ 5699.683411] e1000e 0000:00:19.0: PME# disabled
Apr 27 23:36:38 rails kernel: [ 5699.683511] e1000e 0000:00:19.0: irq 41 for MSI/MSI-X
Apr 27 23:36:38 rails kernel: [ 5699.764050] e1000e 0000:00:19.0: PME# enabled
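
Lines like the ones above can be pulled out of a log mechanically, which makes it easier to compare the value the kernel acted on with what a monitoring tool showed at the same moment. A minimal sketch; the regular expression is written against the message format quoted in this report, and the function name is illustrative.

```python
import re

# Matches e.g. "Critical temperature reached (128 C), shutting down."
CRITICAL_RE = re.compile(r"Critical temperature reached \((\d+) C\), shutting down")


def critical_temps(lines):
    """Return the reported temperature (in C) from every critical-shutdown line."""
    temps = []
    for line in lines:
        match = CRITICAL_RE.search(line)
        if match:
            temps.append(int(match.group(1)))
    return temps
```

It can be fed an open log file directly, for example critical_temps(open("kernel.log")).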
Comment by Greg (dolby) - Thursday, 28 April 2011, 08:54 GMT
This bug can't be fixed by the Arch developers. You have to report it to the kernel developers.
But before that, make sure it's not a case of bad hardware, cooling system failure or any of the other items on Gerardo's list above.
Comment by Murilo Pereira (mpereira) - Tuesday, 24 May 2011, 18:08 GMT
I was having this issue 4 months ago. After some kernel updates (sorry I didn't keep track of versions), it stopped happening. Now it's back again. CPU temperature goes from 60 to 100 in 3 seconds.

$ uname -a
Linux pluto 2.6.38-ARCH #1 SMP PREEMPT Fri May 13 07:54:18 UTC 2011 i686 Intel(R) Core(TM) i5 CPU M 540 @ 2.53GHz GenuineIntel GNU/Linux

$ dmesg | grep ips
[ 5.257007] intel ips 0000:00:1f.6: CPU TDP doesn't match expected value (found 25, expected 29)
[ 5.257030] intel ips 0000:00:1f.6: PCI INT D -> GSI 19 (level, low) -> IRQ 19
[ 5.257141] intel ips 0000:00:1f.6: failed to get i915 symbols, graphics turbo disabled
[ 5.262832] intel ips 0000:00:1f.6: IPS driver initialized, MCP temp limit 90
[ 5.479463] agpgart-intel 0000:00:00.0: Intel HD Graphics Chipset
[ 10.462485] intel ips 0000:00:1f.6: i915 driver attached, reenabling gpu turbo
[54989.664786] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9087, limit 9000
Comment by Jim Yang (yangjeep) - Monday, 30 May 2011, 14:43 GMT
NEEDINFO: Could the reporter provide a log of CPU usage? That might help.
Comment by Jim Yang (yangjeep) - Monday, 30 May 2011, 14:50 GMT
BTW, the hardware design of some ThinkPad models may cause this problem. To avoid overheating you might need to open the machine and clean the components (especially the fans).
Comment by Dominik (cpcgm) - Monday, 30 May 2011, 14:53 GMT
Yeah. But, as I pointed out several times, this can't just be a hardware issue. The computer shuts down when watching a video at 70°C and keeps running while compiling code at 89°C. Sometimes it shuts down repeatedly, sometimes not at all. It also seems to shut down more often when it's in the UltraBase. That could be due to a higher temperature, but the sensor doesn't show high temperatures.
Comment by Jim Yang (yangjeep) - Monday, 30 May 2011, 15:03 GMT
I think you should report it to kernel.org then.
Comment by Murilo Pereira (mpereira) - Friday, 10 June 2011, 18:27 GMT
The issue was yet again fixed for me with a kernel update. The CPU temperature caps at 61C with tasks requiring 100% CPU on the four cores. I didn't clean, open or handle my machine in any way. The only things that changed were package updates and reboots. So I think the idea that this is a ThinkPad hardware issue can be discarded, at least for my model; I have an X201.

Current:
$ uname -a
Linux pluto 2.6.38-ARCH #1 SMP PREEMPT Tue Jun 7 06:40:04 UTC 2011 i686 Intel(R) Core(TM) i5 CPU M 540 @ 2.53GHz GenuineIntel GNU/Linux


I was having the shutdown issue with:
$ uname -a
Linux pluto 2.6.38-ARCH #1 SMP PREEMPT Fri May 13 07:54:18 UTC 2011 i686 Intel(R) Core(TM) i5 CPU M 540 @ 2.53GHz GenuineIntel GNU/Linux
Comment by Leonid Isaev (lisaev) - Monday, 20 June 2011, 18:48 GMT
Are there any errors regarding BIOS/firmware bugs in dmesg.log? It seems like a problem with a CPU driver... Does the latest kernel work fine?
Comment by Dominik (cpcgm) - Monday, 20 June 2011, 18:51 GMT
There has not been a shutdown with the latest kernel. And as Murilo said, the temperature is much lower. On my computer it's currently 67°C with 10 to 20 percent CPU load. It used to be much higher.
Comment by Stefan Schick (pommes_) - Monday, 20 June 2011, 18:58 GMT
I had a shutdown an hour ago with Kernel 2.6.39.1-1

Jun 20 20:08:10 rails kernel: [198506.039415] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9094, limit 9000
Jun 20 20:08:15 rails kernel: [198511.039882] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9293, limit 9000
Jun 20 20:08:20 rails kernel: [198516.040387] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9396, limit 9000
Jun 20 20:08:25 rails kernel: [198521.040855] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9168, limit 9000
Jun 20 20:08:30 rails kernel: [198526.041343] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9299, limit 9000
Jun 20 20:08:35 rails kernel: [198531.041838] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9699, limit 9000
Jun 20 20:08:40 rails kernel: [198536.042325] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9799, limit 9000
Jun 20 20:08:45 rails kernel: [198541.042807] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9859, limit 9000
Jun 20 20:08:50 rails kernel: [198546.043280] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9817, limit 9000
Jun 20 20:08:55 rails kernel: [198551.043780] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9899, limit 9000
Jun 20 20:09:00 rails kernel: [198556.044248] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9898, limit 9000
Jun 20 20:09:05 rails kernel: [198561.044738] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9899, limit 9000
Jun 20 20:09:10 rails kernel: [198566.045221] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9900, limit 9000
Jun 20 20:09:15 rails kernel: [198571.045744] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9946, limit 9000
Jun 20 20:09:20 rails kernel: [198576.046225] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9956, limit 9000
Jun 20 20:09:25 rails kernel: [198581.046687] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9999, limit 9000
Jun 20 20:09:28 rails kernel: [198584.371619] Critical temperature reached (100 C), shutting down.
Jun 20 20:09:28 rails kernel: [198584.372733] Critical temperature reached (100 C), shutting down.
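
The intel ips figures read most naturally as hundredths of a degree Celsius: the driver's init message quoted earlier in this report says "MCP temp limit 90", while these lines say "limit 9000". Under that assumption (it is an inference from the logs here, not a documented guarantee), a short sketch to convert the lines into degrees; the function name is illustrative.

```python
import re

# Matches e.g. "MCP limit exceeded: Avg temp 9094, limit 9000"
MCP_RE = re.compile(r"MCP limit exceeded: Avg temp (\d+), limit (\d+)")


def mcp_readings(lines):
    """Yield (avg_temp_c, limit_c) pairs, assuming the log uses centidegrees C."""
    for line in lines:
        match = MCP_RE.search(line)
        if match:
            yield int(match.group(1)) / 100.0, int(match.group(2)) / 100.0
```

Read this way, the log above shows the average package temperature climbing from about 91 °C to about 100 °C over 75 seconds before the shutdown at a reported 100 C.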
Comment by Murilo Pereira (mpereira) - Monday, 20 June 2011, 19:03 GMT
I spoke too soon about the issue being fixed for me. The issue re-appeared after a reboot. I didn't update any packages.

$ uname -a
Linux pluto 2.6.38-ARCH #1 SMP PREEMPT Tue Jun 7 06:40:04 UTC 2011 i686 Intel(R) Core(TM) i5 CPU M 540 @ 2.53GHz GenuineIntel GNU/Linux
Comment by Dominik (cpcgm) - Wednesday, 29 June 2011, 14:44 GMT
Me too. I had one shutdown a couple of days ago and another two today :-(. Temperature was only around 73°C. Is there no way to triage this bug?
Comment by Stefan Schick (pommes_) - Wednesday, 29 June 2011, 16:06 GMT
Here is a link to the corresponding bug report in the Ubuntu bug tracker:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/751689

Comment by Dominik (cpcgm) - Sunday, 03 July 2011, 03:16 GMT
I'm not sure it's the same bug. My computer shuts down at temperatures of around 75°C at some times and does not shut down at temperatures of 90°C at other times. The Ubuntu users report temperatures around 100°C which is the maximum operating temperature for the CPU.

I had almost 10 shutdowns in the last two days. That is unacceptable.
Comment by Dominik (cpcgm) - Friday, 08 July 2011, 16:27 GMT
I changed the BIOS setting for processor, thermal and sound control in AC mode from Maximum Power to Balanced. It had always been Balanced in Battery mode. Result: The computer doesn't boot anymore. It stops after:

CPU0: Thermal monitoring enabled (TM2)
using mwait in idle threads.
ACPI: Core revision 20110316
Comment by Ross McDonald (rossmcd) - Thursday, 28 July 2011, 12:16 GMT
Try adding pcie_aspm=force to the kernel command line.
Source: http://www.webupd8.org/2011/06/linux-kernel-power-issue-fix.html
I also had the same issue until I applied the fix above.
Comment by x (onexused) - Sunday, 18 September 2011, 18:34 GMT
If this is still happening, to make sure it's a software issue and not a hardware one, perhaps you could dual-boot an old version (from before this problem started happening) of a stepped-release Linux distro? Or one of the BSDs? Or perhaps a non-*nix OS? I believe Ubuntu 8.04 and Slackware 8.1 are still available.
Comment by Dominik (cpcgm) - Sunday, 18 September 2011, 18:39 GMT
I had a fan problem and changed some other settings so the computer is much cooler (51 ° C) now. That means I can't test it.

I guess the only hardware problem that could cause this kind of issue is a temperature sensor that peaks. But if the sensor were defective, why would it not peak all the time? Why would it not peak when I compile, but only when I watch a video? I saw it myself once: the temperature (according to the sensor) went from a reasonable value to 128 °C in one second.
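
A jump like this can be told apart from ordinary heating by looking at the change between consecutive samples rather than at the absolute value. A sketch, assuming temperatures sampled at a fixed interval (for example once per second); the 20 °C threshold is a hypothetical plausibility limit chosen for illustration, not a value taken from any driver.

```python
def find_spikes(samples, max_step=20.0):
    """Return (index, previous, current) for each jump larger than max_step.

    samples: temperatures in degrees C, taken at a fixed interval.
    """
    spikes = []
    for i in range(1, len(samples)):
        if abs(samples[i] - samples[i - 1]) > max_step:
            spikes.append((i, samples[i - 1], samples[i]))
    return spikes
```

A trace such as [72, 74, 128] is flagged at the last sample, while a steady climb under load such as [60, 70, 80, 89] is not.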
Comment by Stefan Schick (pommes_) - Monday, 19 September 2011, 09:05 GMT
Since kernel 3.0, I think it was, it seems to be fixed here.
Comment by Juan Andres Mucarquer (jamon) - Saturday, 24 September 2011, 20:26 GMT
Nope, Linux boogie 3.0-ARCH #1, Intel(R) Core(TM) i5 CPU M 520 @ 2.40GHz

Some of these messages appear in syslog:
[ 3062.696741] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9247, limit 9000

I checked the temperature at that moment and it was close to 80 degrees. Is that right?
acpi -V says the system should switch to critical mode at 128 degrees, but it is shutting down at this temperature.
Comment by Leonel Freire (leonelfreire) - Saturday, 01 October 2011, 01:46 GMT
I was experiencing the same problem (same messages, PC slow as hell and turning off all the time). I tried a lot of things that I found while researching the problem:

* Tried the kernels 2.6.32 (Arch), 2.6.38 (Fedora), 2.6.40 (Fedora), 3.0.X (Arch);
* Blacklisted the intel_ips module;
* Tried the kernel option "processor.ignore_ppc=1";
* Tried the kernel option "pcie_aspm=force";
* Tried the kernel option "i915.i915_enable_rc6=1";
* Updated my BIOS.
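
When testing kernel options like the ones above, it is worth confirming that each one actually reached the running kernel. A minimal sketch that compares a list of wanted options against the contents of /proc/cmdline; the function names are illustrative.

```python
def missing_options(wanted, cmdline):
    """Return the options from `wanted` that are absent from the kernel command line."""
    present = set(cmdline.split())
    return [opt for opt in wanted if opt not in present]


def read_cmdline(path="/proc/cmdline"):
    """Return the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()
```

For example, missing_options(["pcie_aspm=force", "processor.ignore_ppc=1"], read_cmdline()) returns whichever of the two options is not in effect.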

Nothing solved my problem and my computer was shutting down every day, so I decided to install Windows 7 in order to see whether the problem was just with Linux... the same thing happened!

The problem was dust (a lot of dust) in the fan. I cleaned everything, changed the thermal paste (processor and chipset), reassembled my laptop and everything is working like a charm. No more shutdowns or slowdowns. ;-)

My temperatures:
Before:
* Idle: 60ºC - 65ºC
* Full load: 90ºC!

Now:
* Idle: 40ºC - 45ºC (sometimes the fan switches itself off!)
* Full load: 70ºC - 75ºC

I'm just sharing my experience in case someone does not know what else to try.
Comment by Dominik (cpcgm) - Saturday, 01 October 2011, 12:07 GMT
The shutdowns will of course stop if you manage to lower the temperature. I did the same thing. But still: Something is wrong here. A CPU that's supposed to work at temperatures of 100 ° C shutting down at 75 ° C. A sensor that shows an increase of 50 ° C in one second. And I still don't understand why the shutdowns appeared while watching videos at low temperature and did not appear when compiling at a much higher temperature.
Comment by Peter B. Jørgensen (peder2tm) - Sunday, 02 October 2011, 08:42 GMT
Dominik: Watching video is different, because it is more GPU intensive. Maybe your GPU is overheating.
Comment by Dominik (cpcgm) - Sunday, 02 October 2011, 13:41 GMT
I know, but wouldn't it be a different sensor then? I saw my CPU sensor peaking.
Comment by Tim (blackout23) - Wednesday, 16 November 2011, 20:36 GMT
I have experienced the same with my new PC and ASUS P8Z68-V Pro motherboard while in Windows.
All of a sudden the ASUS Sensor Suite II said that my motherboard temperature had reached 128 degrees.
As I found out from a Google search, this is not that uncommon and is purely a hardware-related issue.
It won't damage anything because the actual temperature is still normal.
Comment by Andrej Podzimek (andrej) - Thursday, 26 January 2012, 10:47 GMT
This is what I did.

With Core i7 and related CPUs, there is no reason for a shutdown. Thermal throttling will handle this gracefully. Yes, it is a severe problem that might indicate cooling issues, but it might well be a faulty CPU that simply overheats all the time or reports temperature incorrectly (like in my desktop). Shutting down is therefore not an option for me.
Comment by Stefan Schick (pommes_) - Monday, 30 January 2012, 17:03 GMT
Since kernel 3.2 the problem seems to be back again.

Jan 30 17:39:41 rails kernel: [103484.595665] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 9315, limit 9000
Jan 30 17:40:11 rails kernel: [103514.598568] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 10093, limit 9000
Jan 30 17:40:14 rails kernel: [103517.244267] Critical temperature reached (100 C), shutting down.
Jan 30 17:40:14 rails shutdown[9644]: shutting down for system halt
Jan 30 17:40:14 rails shutdown[9651]: shutting down for system halt
Jan 30 17:40:14 rails kernel: [103517.252225] Critical temperature reached (100 C), shutting down.
Jan 30 17:40:14 rails logger: ACPI group/action undefined: thermal_zone / LNXTHERM:00
Jan 30 17:40:14 rails logger: ACPI group/action undefined: thermal_zone / LNXTHERM:00
Jan 30 17:40:14 rails init: Switching to runlevel: 0
Jan 30 17:40:15 rails acpid: exiting
