- We use yocto (with mainline kernel 4.4.18) for our i.MX 6SOLO-based custom board.
- We use in-kernel thermal management framework with default thermal governor (step_wise) (drivers/thermal/imx-thermal.c) with defined trip points:
imx_thermal 2000000.aips-bus:tempmon: Commercial CPU temperature grade - max:95C critical:90C passive:85C
- We heat up the board in climate chamber until temperature reaches critical value (90C) and in-kernel thermal management powers it off. After short period of time (when temperature is in range passive(85C)...critical(90C), we power up the board again so the alarm condition is met and imx_thermal interrupt fires up. When we try to read out the temperature from corresponding sysfs file we permanently get the error -EAGAIN:
strace cat /sys/class/thermal/thermal_zone0/temp ... open("/sys/class/thermal/thermal_zone0/temp", O_RDONLY|O_LARGEFILE) = 3 sendfile64(1, 3, NULL, 16777216) = -1 EAGAIN (Resource temporarily unavailable) read(3, 0x7ee5bc00, 4096) = -1 EAGAIN (Resource temporarily unavailable) brk(NULL) = 0x1ad9000 brk(0x1afa000) = 0x1afa000 write(2, "cat: read error: Resource tempor"..., 50cat: read error: Resource temporarily unavailable
root@bibo:~# cat /proc/interrupts CPU0 <snip>
271: 2 GPC 49 Level imx_thermal
- There are a couple of workarounds to enforce the temperature file to be readable. If we explicitly enable the thermal's mode via sysfs (echo enabled > /sys/sys/class/therma/thermal_zone0/mode) the temperature file becomes readable. The same applies to suspend/resume cycles - after resume the temperature file is readable.
Having analyzed/debugged the code (for both mainline and freescale's trees) I figured out the reason of the problem:
In imx_thermal_probe() thermal alarm interrupt is enabled before device's 'mode' field is set to THERMAL_DEVICE_ENABLED while the sensor hardware is already powered up. If alarm condition is met -
the interrupt immediately fires up. During (threaded) interrupt processing imx_get_temp() is called. The field 'mode' is still set to DISABLED, so imx_get_temp() processes such case by special way: sensor is powered up,
measurement is enabled, a reading is taken, after that measurement is DISABLED and temperature sensor is POWERED DOWN.
When processing of alarm interrupt ends, imx_thermal_probe() continues and sets mode field to ENABLED, but in fact the device is powered off!
This leads to broken logic of further calls of imx_get_temp().
The consequences of this bug could be quite serious - the temperature is not read out from the sensor, so in-kernel thermal management is useless - the board is not powered off by thermal management when the CPU is overheated.
Attached is patch against freescale's tree (branch origin/imx_4.1.y_brillo)
Original Attachment has been moved to: 0001-thermal-imx-Fixes-temperature-reading-always-returns.patch.zip