FS#32025 - [linux] 3.6.2 - 3.11.x results laptop 10 degrees warmer
Attached to Project:
Arch Linux
Opened by Bryan (bryan) - Wednesday, 17 October 2012, 06:33 GMT
Last edited by Tobias Powalowski (tpowa) - Thursday, 10 October 2013, 09:51 GMT
Details
Description: With nothing else changed (package, config,
used software), upgrading from kernel 3.5.6 to 3.6.2 results
in my laptop being 10 degrees warmer (~60-65°C). Reverting
to 3.5.6-1-ARCH brings lower temps (~50°C) back.
I'm using kernels from package, unmodified, with only two modules loaded through modules-load.d (ecryptfs and oss-sound):
- Linux version 3.6.2-1-ARCH (tobias@T-POWA-LX) (gcc version 4.7.2 (GCC) ) #1 SMP PREEMPT Fri Oct 12 23:58:58 CEST 2012
- Linux version 3.5.6-1-ARCH (tobias@T-POWA-LX) (gcc version 4.7.1 20120721 (prerelease) (GCC) ) #1 SMP PREEMPT Sun Oct 7 19:30:49 CEST 2012
Laptop is Lenovo ThinkPad X220, model 429137G
Closed by Tobias Powalowski (tpowa)
Thursday, 10 October 2013, 09:51 GMT
Reason for closing: Upstream
All I can see in dmesg are quite a few lines going "CPU0: Package temperature above threshold, cpu clock throttled", "CPU0: Package temperature/speed normal", "CPU0: Package power limit notification", and "CPU0: Package power limit normal" (variations include replacing "Package" by "Core"), which I've certainly never seen before.
Edit: To illustrate the severity, my idle laptop just went from 82°C to 44°C when I downgraded to linux-3.5.6 and rebooted.
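To get a feel for how often the throttling messages quoted above show up, something like the following rough Python sketch could tally them from saved dmesg output. The message wording is taken from the comment above and may differ between kernel versions; the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches the "above threshold" and "power limit notification" messages
# quoted in the comment above; the matching "... normal" lines are ignored.
THROTTLE_RE = re.compile(
    r"CPU\d+: (Package|Core) "
    r"(temperature above threshold|power limit notification)"
)


def count_throttle_events(dmesg_text: str) -> Counter:
    """Count throttle-related messages, keyed by (scope, kind)."""
    counts = Counter()
    for match in THROTTLE_RE.finditer(dmesg_text):
        counts[(match.group(1), match.group(2))] += 1
    return counts
```

Feeding it the text of `dmesg` (for example saved to a file first) gives one count per message type, which makes it easier to compare a good kernel against a bad one.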
Personally, my battery times and heat are fine with 3.6, so the bug is specific to certain hardware and/or drivers.
powertop.hot (12.9 KiB)
If you can be bothered with building 15 kernels and testing, you can bisect it: First, test if it is fine with version 3.5.0 and problematic with 3.6.0, then (preferably built with the same configuration):
$ git bisect start
$ git bisect good v3.5
$ git bisect bad v3.6
Bisecting: 5071 revisions left to test after this (roughly 13 steps)
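As a sanity check on the "roughly 13 steps" figure: each bisection round discards about half of the remaining revisions, so the number of builds grows with the base-2 logarithm of the revision count. A minimal sketch of that arithmetic (an approximation only; git's own estimate may differ by a step, and the function name is made up):

```python
import math


def estimated_bisect_steps(revisions_left: int) -> int:
    """Approximate number of build-and-test rounds still needed."""
    if revisions_left < 1:
        return 0
    # Each round roughly halves the set of candidate revisions.
    return math.ceil(math.log2(revisions_left))


print(estimated_bisect_steps(5071))  # 13
```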
Downgrading to version 3.5.6 fixes the problem, so I will wait for a real fix before switching to the 3.6 branch.
With 3.6.2-1: 2.5GHz across the board (edit: Oddly enough, fewer wakeups and gpu ops/sec)
With 3.5.6-1: 800MHz for all processors
Both of these powertop snapshots were taken at boot time.
powertop-3.6.2-1-ARCH.html (59.1 KiB)
RC6 is no longer active. Look at this:
| GPU           |
| Active 100.0% |
| RC6      0.0% |
| RC6p     0.0% |
| RC6pp    0.0% |

| GPU           |
| Active   0.0% |
| RC6    100.0% |
| RC6p     0.0% |
| RC6pp    0.0% |
See this thread:
http://lists.freedesktop.org/archives/intel-gfx/2012-October/021327.html
Downgrading to 3.6.2 fixed the issue without fault.
This patch fixed it for me. Before, it was at 35W and 96 degrees at idle. Now it's at 8W and 40 degrees.
1. Boot your laptop
2. Powertop reports a discharge rate of 9 ~ 11W. CPU Temp is around ~50C.
3. Suspend the laptop
4. Resume the laptop
5. Powertop reports a discharge rate of 15 ~ 18W. CPU Temp rises up to ~67C.
Only a reboot seems to fix this. Tested with kernels 3.5.6-1, 3.6.2-1 & 3.6.6-1.
- The GPU not using deep sleep states anymore after suspend; RC6 does not get used, it is permanently Active. Cleared on reboot.
- The CPU not using deep sleep states on occasion, even without suspend; a mostly idle core spends all of its time in either C0 or (mostly) C1, C7 being ignored. Cleared by waiting, suspend, or reboot.
- The CPU being stuck at a high frequency. Seems to be caused by suspend. Fiddling with cpupower does nothing. Cleared on reboot.
Here is my powerstat output http://pastebin.com/raw.php?i=8aEtZgxC which shows usage from 21-27W. Powertop shows 23W usage when idle.
My cpu temperature is constantly around 70C, my gpu temp is also in the 70C range.
I noticed really high values in powertop for my snd_hda_intel card, always at 100%. It seems to be this bug (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/877560), so this seems to improve things *slightly*:
In file /etc/modprobe.d/modprobe.conf:
options snd_hda_intel power_save=1 power_save_controller=Y
But as far as I can tell, I don't suffer from the bug where the CPU gets stuck at a high frequency. cpupower seems to be doing its job and keeping the frequency at 800MHz, which is the minimum frequency for my CPU (http://pastebin.com/raw.php?i=2Tvb1hbB). /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq shows 800000, as does /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq.
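The cpufreq sysfs files report kHz, which is why 800000 corresponds to 800MHz. A small sketch of reading and comparing those values, assuming the sysfs path quoted above and the standard cpuinfo_min_freq file next to it; the helper names are made up for illustration:

```python
from pathlib import Path

# Directory quoted in the comment above; other cores use cpu1, cpu2, ...
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def parse_khz(raw: str) -> int:
    """Parse the contents of a cpufreq sysfs file (a kHz value)."""
    return int(raw.strip())


def khz_to_mhz(khz: int) -> float:
    """Convert the kHz value reported by sysfs to MHz."""
    return khz / 1000


def pinned_at_minimum(cur_khz: int, min_khz: int) -> bool:
    """True when the current frequency is at (or below) the minimum."""
    return cur_khz <= min_khz


def read_khz(name: str, base: Path = CPUFREQ_DIR) -> int:
    """Read one cpufreq file, e.g. 'scaling_cur_freq'."""
    return parse_khz((base / name).read_text())
```

On a real machine, `pinned_at_minimum(read_khz("scaling_cur_freq"), read_khz("cpuinfo_min_freq"))` should be true when the governor is holding the core at its lowest frequency.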
I have to set the CPU governor to powersave before suspending if I want GPU power saving to stay active.
No recent kernel update, and nothing else that I can see being related to this. But right now, as my CPU temperature is 10 degrees Celsius higher again, I find this in my dmesg.log:
[ 6.499362] ACPI Warning: 0x0000000000000428-0x000000000000042f SystemIO conflicts with Region \GPIS 1 (20120320/utaddress-251)
[ 6.499367] ACPI Warning: 0x0000000000000428-0x000000000000042f SystemIO conflicts with Region \PMIO 2 (20120320/utaddress-251)
[ 6.499372] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[ 6.499375] ACPI Warning: 0x0000000000000500-0x000000000000057f SystemIO conflicts with Region \GPIO 1 (20120320/utaddress-251)
[ 6.499379] ACPI Warning: 0x0000000000000500-0x000000000000057f SystemIO conflicts with Region \GP01 2 (20120320/utaddress-251)
[ 6.499383] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[ 6.499384] lpc_ich: Resource conflict(s) found affecting gpio_ich
[ 6.499418] ACPI Warning: 0x000000000000e040-0x000000000000e05f SystemIO conflicts with Region \SMB0 1 (20120320/utaddress-251)
[ 6.499423] ACPI Warning: 0x000000000000e040-0x000000000000e05f SystemIO conflicts with Region \_SB_.PCI0.SBUS.SMBI 2 (20120320/utaddress-251)
[ 6.499426] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
I haven't seen this earlier, but I assume it happens whenever it gets hot under the hood. I don't really know what to make of it, though.
https://bugzilla.redhat.com/show_bug.cgi?id=859597
The workaround I (Lenovo T420) found was to use Fedora kernel package 3.5.5-2, but I think that I probably need to go back and get the version of the i915 driver instead
You can tell if it's happening by doing a
cat /sys/kernel/debug/dri/0/i915_drpc_info
a few times and seeing if RC6+ or RC6++ change at all. If those values stay 0, then your box will probably heat up. I wrote a script to check that (attached to the Redhat bugzilla ticket)
dhw
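The check described above (read i915_drpc_info a few times and see whether RC6+ or RC6++ ever move) can be sketched in Python roughly as follows. This is not the script attached to the Red Hat ticket, just an illustration; it assumes the "... residency since boot: N" line format shown elsewhere in this report, and the function names are made up.

```python
import re

# Matches "RC6", "RC6+" and "RC6++" residency lines, but not the
# 'RC6 "Locked to RPn" residency' line.
RESIDENCY_RE = re.compile(
    r"^[ \t]*(RC6\+*) residency since boot:[ \t]*(\d+)", re.MULTILINE
)


def parse_residency(text: str) -> dict:
    """Map counter name to its value for one dump of i915_drpc_info."""
    return {name: int(value) for name, value in RESIDENCY_RE.findall(text)}


def stuck_at_zero(snapshots: list) -> list:
    """Names of counters that read 0 in every snapshot."""
    parsed = [parse_residency(s) for s in snapshots]
    names = list(parsed[0]) if parsed else []
    return [n for n in names if all(p.get(n, 0) == 0 for p in parsed)]
```

Each snapshot would be the text of /sys/kernel/debug/dri/0/i915_drpc_info read a second or so apart (as root, with debugfs mounted).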
Back to kernel 3.5.6.
May be related to this commit: https://bbs.archlinux.org/viewtopic.php?pid=1227526#p1227526
Reverting to 3.5.6.
Machine: Mobo: ASUSTeK model: K53SK version: 1.0 Bios: American Megatrends version: K53SK.203 date: 10/11/2011
CPU: Quad core Intel Core i7-2670QM CPU (-HT-MCP-) clocked at 800.00 MHz
Graphics: Card-1: Intel 2nd Generation Core Processor Family Integrated Graphics Controller
Card-2: Advanced Micro Devices [AMD] nee ATI Whistler LE [AMD Radeon HD 6625M Graphics]
X.org: 1.13.2.901 driver: intel Resolution: 118x37
1. You have a laptop with two GPUs. On boot, you have disabled the discrete card with vgaswitcheroo. After the laptop resumes, the discrete card is turned on again, which explains the rise in temperature.
2. After the laptop resumes, the integrated Intel GPU is 100% active and does not return to RC6 mode.
I use a workaround for scenario 1 and I personally haven't come across scenario 2 since 3.7.6.
I hope you'll all have similar successes with 3.8.3.
I've been pinned to LTS on my laptop until this bug gets resolved, but that's starting to get problematic, as I'm starting to get plagued by other issues as a result of running LTS. (See: https://bugs.archlinux.org/task/34209)
It has happened only once on my laptop, so I'd say it's almost fixed, but not entirely, in my case.
Does anyone know why this broke upstream in the first place?
I mean: could those who still sometimes have a hot laptop be hitting this bug?
If you are having this issue, though your CPU usage and load averages are down at idle levels, then this is a CPU/GPU scaling issue and is part of this bug report.
Thank you.
On a side note, I haven't had this issue with 3.9.0.
Some people already reported that GPU is 100% in running state even if they don't suspend/resume. I didn't experience that before updating to kernel 3.8.11
When I start up the notebook on AC it is fine (20-50% RC6 depending on what I'm doing). After a while the GPU starts going crazy and the device heats up. I did not find out what triggers this behavior. Rebooting fixes it again, suspend/resume doesn't. I also made sure to turn off things like laptop-mode.
kernel options: pcie_aspm=force i915.i915_enable_rc6=1 i915.i915_enable_fbc=1 i915.lvds_downclock=1
Generally nowadays suspend just doesn't work, so I have not been using it very often. :(
$ uname -a
Linux hertz 3.9.2-1-ARCH #1 SMP PREEMPT Sat May 11 20:31:08 CEST 2013 x86_64 GNU/Linux
It runs hot after suspend.
First of all, it doesn't happen every time I continue from suspend, but it does happen once in a while.
Secondly, right now I've just unsuspended, and the computer isn't running crazy hot, but the fan is still running (which is weird, considering I'm just working on some LaTeX in Emacs). I opened powertop and noticed that the "Powered on" and "RC6" counters under GPU fluctuate wildly. One moment it'll look like this: http://i.imgur.com/zTsNJvY.jpg and if I press 'r' to refresh, not even a second later, it can look like this: http://i.imgur.com/78XVVe8.jpg and everything in between. I'm guessing it should be pretty stable at 100% RC6 if I'm just idling?
Could you please specify what exactly needs to be done to reproduce this?
Is it just resuming from suspend, or is there more you need to do?
I am running [testing] and gnome-shell, and pressing the power button to suspend.
Ohh, looking at the 3.10.5 changelog, there's f4332be drm/i915: fix long-standing SNB regression in power consumption after resume v2.
Could be fixed indeed. :)
Generally when resuming after a suspend the graphics card "goes rogue" with respect to RC6+ or RC6++ as I mentioned above. It's possibly due to resuming with the power status *different* than when suspend happened (on AC when suspending, on battery when resuming). I wrote a script to tell me if it's the graphics card or not. Generally a keyboard suspend/resume cycle will bring things back into order.
The script (debugi915.sh) that I've attached is run like:
sudo ./debugi915.sh
This script prints the same information (cat /sys/kernel/debug/dri/0/i915_drpc_info) at 1-second intervals and the output looks something like this:
# ------------------------- Wed Aug 7 12:03:12 CDT 2013
RC information accurate: yes
Video Turbo Mode: yes
HW control enabled: yes
SW control enabled: no
RC1e Enabled: no
RC6 Enabled: yes
Deep RC6 Enabled: no
Deepest RC6 Enabled: no
Current RC state: RC6
Core Power Down: no
RC6 "Locked to RPn" residency since boot: 0
RC6 residency since boot: 2152244865 <<------ The number to watch....
RC6+ residency since boot: 0
RC6++ residency since boot: 0
RC6 voltage: 450mV
RC6+ voltage: 245mV
RC6++ voltage: 245mV
Temp: 63000
If the "RC6 residency since boot" number (the number to watch above) is less than 100,000 and doesn't change, then the graphics card will heat up your box like mad.
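The rule of thumb above can be written down as a short Python sketch. To be clear, this is not the attached debugi915.sh, only an illustration of the same test on two dumps taken some time apart; the 100,000 threshold is the figure quoted above and the function names are made up.

```python
import re

# Plain "RC6 residency since boot" line only (not RC6+ / RC6++).
RC6_LINE = re.compile(
    r"^[ \t]*RC6 residency since boot:[ \t]*(\d+)", re.MULTILINE
)
THRESHOLD = 100_000  # figure quoted in the comment above


def rc6_residency(text: str) -> int:
    """Extract the RC6 residency counter from one i915_drpc_info dump."""
    match = RC6_LINE.search(text)
    if match is None:
        raise ValueError("no 'RC6 residency since boot' line found")
    return int(match.group(1))


def gpu_looks_stuck(earlier: str, later: str, threshold: int = THRESHOLD) -> bool:
    """True if the counter is below the threshold and did not move."""
    before, after = rc6_residency(earlier), rc6_residency(later)
    return after < threshold and before == after
```

With the sample output above (2152244865 and still climbing) this returns False; a small counter that never changes between reads returns True.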
Can someone verify that this script (debugi915.sh) works on Arch? I've submitted the bug report to Fedora (I'm running 19 and it happens frequently).
Edit:
I just did the testing I should have done before I posted this:
uname -a: Linux t420w 3.10.4-300.fc19.x86_64 #1 SMP Tue Jul 30 11:29:05 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
Lenovo Thinkpad T420, 16gb, Type 4177-CTO
Only happens when resuming via opening the lid on battery. Doesn't happen via Fn-F4 or on AC
With 3.10.5 now on board here, so far so good. One instance of the GPU stuck at 100% would kill that, of course. Will report back.
Incidentally, I've boosted to i915.i915_enable_rc6=7 and powertop is indeed reporting the chip in RC6pp.
AfC
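For anyone wondering about the i915.i915_enable_rc6=7 value mentioned above: on kernels of this era the parameter is a bitmask, where (per the module's own parameter description) 1 enables RC6, 2 enables deep RC6 and 4 enables deepest RC6. A small sketch of the decoding follows; the labels and the mapping to powertop's RC6p/RC6pp names are my own shorthand, and the parameter may differ on other kernel versions, so check `modinfo i915` on your system.

```python
# Bit values as described in the i915 module parameter help of this era.
RC6_STAGES = (
    (1, "rc6"),
    (2, "deep rc6 (RC6p)"),
    (4, "deepest rc6 (RC6pp)"),
)


def decode_enable_rc6(value: int) -> list:
    """List the RC6 stages switched on by an i915_enable_rc6 value."""
    if value < 0:
        # A negative value means "use the per-chip default".
        return ["driver default"]
    return [name for bit, name in RC6_STAGES if value & bit]


print(decode_enable_rc6(7))
```

So 1 (as in the kernel options quoted earlier in this report) only asks for plain RC6, while 7 asks for all three stages.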
Disclaimer: I'm on Fedora
Props to the Arch Wiki for having the answer that works for all of us.
Link: https://wiki.archlinux.org/index.php/Intel_Graphics#Choose_acceleration_method
I added the attached file at /etc/X11/xorg.conf.d/20-intel.conf