FS#18815 - [kernel26] CPU Frequency Scaling Stuck [minimum,randomly]

Attached to Project: Arch Linux
Opened by orbisvicis (orbisvicis) - Wednesday, 24 March 2010, 07:07 GMT
Last edited by Jan de Groot (JGC) - Tuesday, 28 September 2010, 14:34 GMT
Task Type Bug Report
Category Upstream Bugs
Status Closed
Assigned To Tobias Powalowski (tpowa), Thomas Bächler (brain0)
Architecture All
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 2
Private No

Details

Description:
This is not 100% reproducible, but it seems to happen mostly within 10 minutes of startup or right after a resume.
Googling, I've found that many people from a wide range of distributions have run into this issue, but all dating back about two years. For example, some links:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/138465
https://bugzilla.kernel.org/show_bug.cgi?id=10564
* ^ exactly identical to my issue
Like most of the reporters, I use a Dell 9400 laptop with an Intel Core Duo T2400 and the A10 BIOS. More details to follow.

Also, I use cpufreqd with the ondemand or performance governors - through the acpi-cpufreq driver - depending on AC power. However, this has nothing to do with cpufreqd or the governors: the frequency is throttled to a minimum whether or not the cpufreqd daemon is running, and with any governor.

When the CPU is throttled ACPI logs to syslog:
Mar 24 00:57:46 cinnabar logger: ACPI group/action undefined: processor / CPU0
Mar 24 00:57:46 cinnabar logger: ACPI group/action undefined: processor / CPU1
And acpi_listen records the processor moving into P-state 2:
processor CPU0 00000080 00000002
processor CPU1 00000080 00000002
The files at "/proc/acpi/processor/CPU?" confirm the switch.
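
A minimal sketch for decoding the acpi_listen lines above, assuming each line has exactly four whitespace-separated fields (class, bus ID, event type in hex, event data in hex). The function name parse_acpi_event is hypothetical, and reading the last field as the P-state index follows the interpretation in this report rather than any ACPI documentation.

```shell
# Decode one acpi_listen line such as:
#   processor CPU0 00000080 00000002
# Assumed layout: <class> <bus id> <event type, hex> <event data, hex>.
parse_acpi_event() {
    # $1: one line of acpi_listen output
    set -- $1                      # split the line into fields
    [ "$#" -eq 4 ] || return 1     # reject lines with an unexpected shape
    # Print the hex fields as a hex type and a decimal data value.
    printf '%s %s type=0x%x data=%d\n' "$1" "$2" "0x$3" "0x$4"
}
```

For example, piping acpi_listen through `while read -r line; do parse_acpi_event "$line"; done` would print one decoded line per event.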

After throttling, the information under "/sys/devices/system/cpu/cpu?/cpufreq/" doesn't change much. The files "/sys/devices/system/cpu/cpu?/cpufreq/cpuinfo_cur_freq" are updated to display the stuck frequency and become immutable. Neither of:
cpufreq-set "..options.."
echo 1833000 >/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq
succeeds in modifying the CPU frequency or the files. However, I can successfully switch governors or drivers, in other words edit any other information stored in "/sys/devices/system/cpu/cpu?/cpufreq/*" that does not concern the current CPU frequency (either through "echo" or "cpufreq-set"). Afterwards cpufreq-info displays the updated information.
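
A rough way to spot the stuck state from a script, assuming it always shows up as the policy ceiling (scaling_max_freq) dropping below the hardware maximum (cpuinfo_max_freq), as in the cpufreq-info output in this report. The function name cpufreq_clamped is hypothetical; values are in kHz, as sysfs reports them.

```shell
# Report whether the cpufreq policy ceiling is below the hardware maximum.
# $1: a cpufreq directory, e.g. /sys/devices/system/cpu/cpu0/cpufreq
cpufreq_clamped() {
    dir=$1
    hw_max=$(cat "$dir/cpuinfo_max_freq") || return 2
    policy_max=$(cat "$dir/scaling_max_freq") || return 2
    if [ "$policy_max" -lt "$hw_max" ]; then
        echo "clamped: policy max $policy_max kHz < hardware max $hw_max kHz"
    else
        echo "not clamped: policy max $policy_max kHz"
    fi
}
```

Running `cpufreq_clamped /sys/devices/system/cpu/cpu0/cpufreq` before and after the throttling should show the difference. Note that a user-chosen lower scaling_max_freq would also be reported as clamped, so this is only a hint.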

These are some errors I was able to collect:
$ cpufreqd -D -V7
cpufreqd_loop : New Rule ("AC Rule"), applying.
cpufreqd_set_profile : Couldn't set profile "Performance High" set for cpu0 (100-100-performance)
cpufreqd_loop : Cannot set policy, Rule unchanged ("none").

$ cpufreq-set -r -f 1.83Ghz
$ cpufreq-set -r -g performance -u 1.83GHz -d 1.83GHz
:: Setting cpufreq governing rules , cpu 0Error setting new values. Common errors:
- Do you have proper administration rights? (super-user?)
- Is the governor you requested available and modprobed?
- Trying to set an invalid policy?
- Trying to set a specific frequency, but userspace governor is not available,
for example because of hardware which cannot be set to a specific frequency
or because the userspace governor isn't loaded?

These are the kernel modules I have loaded. Unloading and reloading any or all of them does not allow me to "unstick" the CPU frequency.
$ lsmod | grep -i freq
cpufreq_powersave 646 0
cpufreq_ondemand 6897 0
acpi_cpufreq 5631 0
freq_table 1955 2 cpufreq_ondemand,acpi_cpufreq
processor 26526 3 acpi_cpufreq

Following is the output of cpufreq-info before and after the throttling.
cpufreq-info before the throttling:
analyzing CPU 0:
driver: acpi-cpufreq
CPUs which run at the same hardware frequency: 0 1
CPUs which need to have their frequency coordinated by software: 0
maximum transition latency: 10.0 us.
hardware limits: 1000 MHz - 1.83 GHz
available frequency steps: 1.83 GHz, 1.33 GHz, 1000 MHz
available cpufreq governors: powersave, ondemand, performance
current policy: frequency should be within 1.83 GHz and 1.83 GHz.
The governor "performance" may decide which speed to use
within this range.
current CPU frequency is 1.83 GHz.

cpufreq-info after the throttling:
analyzing CPU 0:
driver: acpi-cpufreq
CPUs which run at the same hardware frequency: 0 1
CPUs which need to have their frequency coordinated by software: 0
maximum transition latency: 10.0 us.
hardware limits: 1000 MHz - 1.83 GHz
available frequency steps: 1.83 GHz, 1.33 GHz, 1000 MHz
available cpufreq governors: powersave, ondemand, performance
current policy: frequency should be within 1000 MHz and 1000 MHz.
The governor "performance" may decide which speed to use
within this range.
current CPU frequency is 1000 MHz.

List of files in "/sys/devices/system/cpu/cpu?/cpufreq/"
$ ls -lah /sys/devices/system/cpu/cpu?/cpufreq/
/sys/devices/system/cpu/cpu0/cpufreq/:
total 0
drwxr-xr-x 2 root root 0 Mar 24 01:04 .
drwxr-xr-x 7 root root 0 Mar 24 01:04 ..
-r--r--r-- 1 root root 4.0K Mar 24 01:04 affected_cpus
-r-------- 1 root root 4.0K Mar 24 01:08 cpuinfo_cur_freq
-r--r--r-- 1 root root 4.0K Mar 24 01:04 cpuinfo_max_freq
-r--r--r-- 1 root root 4.0K Mar 24 01:04 cpuinfo_min_freq
-r--r--r-- 1 root root 4.0K Mar 24 01:08 cpuinfo_transition_latency
-r--r--r-- 1 root root 4.0K Mar 24 01:08 related_cpus
-r--r--r-- 1 root root 4.0K Mar 24 01:04 scaling_available_frequencies
-r--r--r-- 1 root root 4.0K Mar 24 01:04 scaling_available_governors
-r--r--r-- 1 root root 4.0K Mar 24 01:05 scaling_cur_freq
-r--r--r-- 1 root root 4.0K Mar 24 01:08 scaling_driver
-rw-r--r-- 1 root root 4.0K Mar 24 01:55 scaling_governor
-rw-r--r-- 1 root root 4.0K Mar 24 01:04 scaling_max_freq
-rw-r--r-- 1 root root 4.0K Mar 24 01:04 scaling_min_freq
-rw-r--r-- 1 root root 4.0K Mar 24 01:55 scaling_setspeed

/sys/devices/system/cpu/cpu1/cpufreq/:
... (It's the same as for the other core) ...

Also, sensors shows very cool temperatures, which are expected when running throttled at 1 GHz with the fans at maximum. So there does not seem to be any reason for the BIOS to throttle the CPU.
$ sensors
acpitz-virtual-0
Adapter: Virtual device
temp1: +28.5°C (crit = +99.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0: +26.0°C (crit = +100.0°C)

coretemp-isa-0001
Adapter: ISA adapter
Core 1: +25.0°C (crit = +100.0°C)

As far as I can tell there is no other information in any of the log files.


Other symptoms include:
. every action (including typing) becomes sluggish
. X uses an abnormally high percentage of CPU
. time ticks faster. For example "sleep XY" after throttling is approximately 1.75 times faster than "sleep XY" before throttling. (This doesn't happen during normal frequency scaling).
. fans are forced to high.

Some information about my system:
$ acpitool -c
CPU type : Genuine Intel(R) CPU T2400 @ 1.83GHz
Min/Max frequency : 1833/1833 MHz
Current frequency : 1833 MHz
Frequency governor : performance
Freq. scaling driver : acpi-cpufreq
Cache size : 2048 KB
Bogomips : 3662.69
Bogomips : 3663.30

# of CPU's found : 2

Processor ID : 0
Bus mastering control : yes
Power management : yes
Throttling control : yes
Limit interface : yes
Active C-state : C0
C-states (incl. C0) : 3
Usage of state C1 : 549980 (10.6 %)
Usage of state C2 : 4622981 (89.2 %)
T-state count : 8
Active T-state : T0


Processor ID : 1
Bus mastering control : yes
Power management : yes
Throttling control : yes
Limit interface : yes
Active C-state : C0
C-states (incl. C0) : 3
Usage of state C1 : 305218 (6.0 %)
Usage of state C2 : 4737969 (93.9 %)
T-state count : 8
Active T-state : T0

$ lshw -c cpu
WARNING: you should run this program as super-user.
*-cpu
product: Genuine Intel(R) CPU T2400 @ 1.83GHz
vendor: Intel Corp.
physical id: 1
bus info: cpu@0
version: 6.14.8
serial: 0000-06E8-0000-0000-0000-0000
size: 1833MHz
capacity: 1833MHz
width: 32 bits
capabilities: fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon bts aperfmperf pni monitor vmx est tm2 xtpr pdcm cpufreq
configuration: id=0
*-logicalcpu:0
description: Logical CPU
physical id: 0.1
width: 32 bits
capabilities: logical
*-logicalcpu:1
description: Logical CPU
physical id: 0.2
width: 32 bits
capabilities: logical


Additional info:
. I can't figure out why I'm only seeing these symptoms now. There haven't been any important software updates recently, I haven't touched the BIOS or other operating systems, nor have I modified the hardware.

Thoughts:
Since my bug is almost identical to the kernel bug (see second link) of about two years ago, this is probably a kernel issue, most likely caused when the kernel is unaware that the BIOS changes the CPU frequency. I'm not really sure what to do about this. Most likely I am the only one with these symptoms, and I doubt I have the time to test patches and rebuild kernels.

Closed by  Jan de Groot (JGC)
Tuesday, 28 September 2010, 14:34 GMT
Reason for closing:  No response
Additional comments about closing:  No activity in +1 month. Original reporter no longer affected.
Comment by Sergey (skewml) - Wednesday, 24 March 2010, 09:16 GMT
Try to add "processor.ignore_ppc=1" to your kernel boot command line
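
To confirm after a reboot that the option actually reached the kernel, the command line can be checked. This is a small sketch; the function name has_ignore_ppc is hypothetical, and the optional file argument exists only so the check can be tried against a saved copy of the command line.

```shell
# Succeed if processor.ignore_ppc=1 appears as a word on the kernel command line.
# $1: file holding the command line (defaults to /proc/cmdline)
has_ignore_ppc() {
    for word in $(cat "${1:-/proc/cmdline}"); do
        [ "$word" = "processor.ignore_ppc=1" ] && return 0
    done
    return 1
}
```

Usage: `has_ignore_ppc && echo "ignore_ppc is set"`.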
Comment by orbisvicis (orbisvicis) - Saturday, 27 March 2010, 18:15 GMT
Thanks for that tip, it's taken a while to test since this is not 100% reproducible.

"processor.ignore_ppc=1" helps *somewhat*
. seems to keep the processor in the C0 state and prevent (random) switching to other P-states. Therefore, clock frequency and voltage are maintained at maximum, and "cpufreq-set" and "cpufreqd" work as expected.
. does not affect T-states. Therefore, my processor will (randomly) be throttled to a T6 state, at 25% performance. This is very slow. (I guess this explains why the system clock was running faster and I/O seemed to lag.)
. does not affect BIOS overriding fan speeds. Not even i8kmon can modify the speeds. (it's noisy, but I don't really care)
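
The 25% figure for T6 is consistent with the 8 T-states reported by acpitool, if the states are assumed to be evenly spaced (12.5% per step). That spacing is an assumption; the firmware may define different values. A sketch of the arithmetic, with the hypothetical helper tstate_percent:

```shell
# Approximate duty cycle (in percent, rounded down) of T-state n,
# assuming `count` evenly spaced T-states with T0 = 100%.
# $1: T-state index n, $2: number of T-states
tstate_percent() {
    echo $(( 100 * ($2 - $1) / $2 ))
}
```

With these assumptions, `tstate_percent 6 8` gives 25 and `tstate_percent 0 8` gives 100.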

Apparently the acpi-cpufreq module outputs some information, but I haven't gotten around to checking the _PPC ACPI information from my laptop. Perhaps that information could help clarify why this is happening.
As a stop-gap measure, does anyone know how to manually switch T-states?
Comment by Gerardo Exequiel Pozzi (djgera) - Monday, 09 August 2010, 22:52 GMT
  • Field changed: Status (Assigned → Waiting on Response)
status with 2.6.35?
Comment by orbisvicis (orbisvicis) - Thursday, 12 August 2010, 16:51 GMT
Since I've replaced the fans in my laptop (~2.6.34), this issue no longer affects me. However (aurelieng) also voted for this issue, so it probably shouldn't be closed.

This doesn't mean the actual cause still isn't valid. To rehash and clarify:
My laptop was 'ratcheted' into the highest (lowest performing) P-state. For example, once it entered a C3 state, it would never fall back to a {C0,C1,C2} state, no matter how thermally cool it was. I mean, I could throw my laptop into the freezer (10-15C) and it still wouldn't switch back to its native P-state. Now, is this a kernel issue or a BIOS issue? I'm not sure.

The other problems listed are invalid:
i8k has been broken since 2.6.33 or 2.6.34, different issue: https://bbs.archlinux.org/viewtopic.php?id=96356
cpu T-state throttling *was* working correctly
Comment by orbisvicis (orbisvicis) - Thursday, 12 August 2010, 16:58 GMT
This no longer affects me because my CPU no longer *needs* to be throttled out of its C0 state. I'm sure if I smothered it with a blanket I could reproduce the problem ... but I really don't want to.
