Hi people,
I noticed in a kernel 4.0.7 that I loose CAN packets when an incoming rsync transfer changes my eMMC or SD-Card image. I used CONFIG_FRACE to find why this is the case, and came to this trace:
# tracer: preemptoff
#
# preemptoff latency trace v1.1.5 on 4.0.7
# --------------------------------------------------------------------
# latency: 1046 us, #756/756, CPU#2 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
# -----------------
# | task: mmcqd/0-76 (uid:0 nice:0 policy:0 rt_prio:0)
# -----------------
# => started at: sdhci_do_set_ios
# => ended at: sdhci_do_set_ios
#
#
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| / delay
# cmd pid ||||| time | caller
# \ / ||||| \ | /
mmcqd/0-76 2d..1 1us : _raw_spin_lock_irq <-sdhci_do_set_ios
mmcqd/0-76 2d..2 2us : esdhc_writeb_le <-sdhci_do_set_ios
mmcqd/0-76 2d..2 4us : esdhc_pltfm_set_bus_width <-sdhci_do_set_ios
mmcqd/0-76 2d..2 5us : l2c210_sync <-esdhc_pltfm_set_bus_width
mmcqd/0-76 2d..2 7us : esdhc_writeb_le <-sdhci_do_set_ios
mmcqd/0-76 2d..2 9us : l2c210_sync <-esdhc_writeb_le
mmcqd/0-76 2d..2 10us : esdhc_readw_le <-sdhci_do_set_ios
mmcqd/0-76 2d..2 12us : esdhc_writew_le <-sdhci_do_set_ios
mmcqd/0-76 2d..2 14us : l2c210_sync <-esdhc_writew_le
mmcqd/0-76 2d..2 16us : l2c210_sync <-esdhc_writew_le
mmcqd/0-76 2d..2 18us : esdhc_readw_le <-sdhci_do_set_ios
mmcqd/0-76 2d..2 19us : esdhc_writew_le <-sdhci_do_set_ios
mmcqd/0-76 2d..2 21us : l2c210_sync <-esdhc_writew_le
mmcqd/0-76 2d..2 22us : esdhc_set_uhs_signaling <-sdhci_do_set_ios
mmcqd/0-76 2d..2 24us : pinctrl_select_state <-esdhc_set_uhs_signaling
mmcqd/0-76 2d..2 26us : esdhc_readl_le <-esdhc_pltfm_set_clock
mmcqd/0-76 2d..2 28us : esdhc_writel_le <-esdhc_pltfm_set_clock
mmcqd/0-76 2d..2 29us : l2c210_sync <-esdhc_writel_le
mmcqd/0-76 2d..2 31us : esdhc_readl_le <-esdhc_pltfm_set_clock
mmcqd/0-76 2d..2 33us : esdhc_writel_le <-esdhc_pltfm_set_clock
mmcqd/0-76 2d..2 34us : l2c210_sync <-esdhc_writel_le
mmcqd/0-76 2d..2 36us : l2c210_sync <-esdhc_pltfm_set_clock
mmcqd/0-76 2d..2 37us : __timer_const_udelay <-esdhc_pltfm_set_clock
mmcqd/0-76 2d..2 39us : __timer_delay <-__timer_const_udelay
mmcqd/0-76 2d..2 40us : read_current_timer <-__timer_delay
mmcqd/0-76 2d..2 41us : imx_read_current_timer <-read_current_timer
mmcqd/0-76 2d..2 43us : read_current_timer <-__timer_delay
mmcqd/0-76 2d..2 44us : imx_read_current_timer <-read_current_timer
mmcqd/0-76 2d..2 45us : read_current_timer <-__timer_delay
....
So it seems that esdhc_pltfm_set_clock() somehow waits. And really there is an mdelay() there. What's worse: I put a printk() into this function, just before the mdelay(). And it get's called hundreds of times when I boot, or when I update via rsync. And for each call we have an mdelay() that kills preemption.
This is bad because in the FlexCAN driver the ISR calls napi_schedule(), which in turn clears the CAN RxFIFO. And if this scheduling is taking too long, CAN packets are dropped.
The mdelay() is also bad because it makes eMMC/SD-Card accesses WAY slower than they could be.