emmc dump caused by CQHCI

cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 

emmc dump caused by CQHCI

Jump to solution
314 Views
yan4
Contributor II

Hi NXP Teams

Recently we face a problem casued by CQHCI feature.

kernel version: 6.1.68

platform: S32G3

emmc model: FEMDME008G-A8A39

We used sysbench to perform pressure test on emmc and emmc keeped outputting dump information.

dump messages are below:

[ 232.419161] mmc0: cqhci: timeout for tag 4, qcnt 25
[ 232.423677] mmc0: cqhci: ============ CQHCI REGISTER DUMP ===========
[ 232.430083] mmc0: cqhci: Caps: 0x0000310a | Version: 0x00000510
[ 232.436501] mmc0: cqhci: Config: 0x00001001 | Control: 0x00000000
[ 232.442924] mmc0: cqhci: Int stat: 0x00000000 | Int enab: 0x00000006
[ 232.449346] mmc0: cqhci: Int sig: 0x00000006 | Int Coal: 0x00000000
[ 232.455799] mmc0: cqhci: TDL base: 0x823c3000 | TDL up32: 0x00000000
[ 232.462202] mmc0: cqhci: Doorbell: 0x81dfffff | TCN: 0x00000000
[ 232.468619] mmc0: cqhci: Dev queue: 0x00000000 | Dev Pend: 0x00800000
[ 232.475067] mmc0: cqhci: Task clr: 0x00000000 | SSC1: 0x00011000
[ 232.481469] mmc0: cqhci: SSC2: 0x00000001 | DCMD rsp: 0x00000900
[ 232.487896] mmc0: cqhci: RED mask: 0xfdf9a080 | TERRI: 0x00000000
[ 232.494330] mmc0: cqhci: Resp idx: 0x0000000d | Resp arg: 0x00000000
[ 232.500739] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 232.507158] mmc0: sdhci: Sys addr: 0x9ca3e000 | Version: 0x00000002
[ 232.513578] mmc0: sdhci: Blk size: 0x00000200 | Blk cnt: 0x00000020
[ 232.520001] mmc0: sdhci: Argument: 0x00018000 | Trn mode: 0x00000023
[ 232.526424] mmc0: sdhci: Present: 0x01fd8008 | Host ctl: 0x00000030
[ 232.532847] mmc0: sdhci: Power: 0x00000002 | Blk gap: 0x00000080
[ 232.539279] mmc0: sdhci: Wake-up: 0x00000008 | Clock: 0x0000000f
[ 232.545694] mmc0: sdhci: Timeout: 0x0000008f | Int stat: 0x00000000
[ 232.552126] mmc0: sdhci: Int enab: 0x107f4000 | Sig enab: 0x107f4000
[ 232.558568] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000502
[ 232.564973] mmc0: sdhci: Caps: 0x07eb0000 | Caps_1: 0x0000b407
[ 232.571391] mmc0: sdhci: Cmd: 0x00000d1a | Max curr: 0x00ffffff
[ 232.577837] mmc0: sdhci: Resp[0]: 0x00000000 | Resp[1]: 0xffffffff
[ 232.584242] mmc0: sdhci: Resp[2]: 0x329f5903 | Resp[3]: 0x00d0ffff
[ 232.590660] mmc0: sdhci: Host ctl2: 0x00000008
[ 232.595104] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0xad464408
[ 232.601514] mmc0: sdhci-esdhc-imx: ========= ESDHC IMX DEBUG STATUS DUMP =========
[ 232.609066] mmc0: sdhci-esdhc-imx: cmd debug status: 0x2120
[ 232.614702] mmc0: sdhci-esdhc-imx: data debug status: 0x2200
[ 232.620431] mmc0: sdhci-esdhc-imx: trans debug status: 0x2300
[ 232.626246] mmc0: sdhci-esdhc-imx: dma debug status: 0x2400
[ 232.631888] mmc0: sdhci-esdhc-imx: adma debug status: 0x2510
[ 232.637616] mmc0: sdhci-esdhc-imx: fifo debug status: 0x2680
[ 232.643345] mmc0: sdhci-esdhc-imx: async fifo debug status: 0x2750
[ 232.649595] mmc0: sdhci: ============================================
[ 232.656073] mmc0: running CQE recovery
[ 289.818104] mmc0: running CQE recovery
[ 423.303076] mmc0: cqhci: timeout for tag 31, qcnt 27
[ 423.307677] mmc0: cqhci: ============ CQHCI REGISTER DUMP ===========
[ 423.314085] mmc0: cqhci: Caps: 0x0000310a | Version: 0x00000510
[ 423.320504] mmc0: cqhci: Config: 0x00001001 | Control: 0x00000000
[ 423.326927] mmc0: cqhci: Int stat: 0x00000000 | Int enab: 0x00000006
[ 423.333381] mmc0: cqhci: Int sig: 0x00000006 | Int Coal: 0x00000000
[ 423.339786] mmc0: cqhci: TDL base: 0x823c3000 | TDL up32: 0x00000000
[ 423.346379] mmc0: cqhci: Doorbell: 0xff1fcfff | TCN: 0x00000000
[ 423.352628] mmc0: cqhci: Dev queue: 0x00000000 | Dev Pend: 0x00000400
[ 423.359047] mmc0: cqhci: Task clr: 0x00000000 | SSC1: 0x00011000
[ 423.365465] mmc0: cqhci: SSC2: 0x00000001 | DCMD rsp: 0x00000900
[ 423.371888] mmc0: cqhci: RED mask: 0xfdf9a080 | TERRI: 0x00000000
[ 423.378309] mmc0: cqhci: Resp idx: 0x0000000d | Resp arg: 0x00000000
[ 423.384736] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 423.391160] mmc0: sdhci: Sys addr: 0xa5b7b000 | Version: 0x00000002
[ 423.397583] mmc0: sdhci: Blk size: 0x00000200 | Blk cnt: 0x00000020
[ 423.404005] mmc0: sdhci: Argument: 0x00018000 | Trn mode: 0x00000023
[ 423.410426] mmc0: sdhci: Present: 0x01fd8008 | Host ctl: 0x00000030
[ 423.416849] mmc0: sdhci: Power: 0x00000002 | Blk gap: 0x00000080
[ 423.423280] mmc0: sdhci: Wake-up: 0x00000008 | Clock: 0x0000000f
[ 423.429696] mmc0: sdhci: Timeout: 0x0000008f | Int stat: 0x00000000
[ 423.436119] mmc0: sdhci: Int enab: 0x107f4000 | Sig enab: 0x107f4000
[ 423.442568] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000502
[ 423.449210] mmc0: sdhci: Caps: 0x07eb0000 | Caps_1: 0x0000b407
[ 423.455401] mmc0: sdhci: Cmd: 0x00000d1a | Max curr: 0x00ffffff
[ 423.461812] mmc0: sdhci: Resp[0]: 0x00000000 | Resp[1]: 0xffffffff
[ 423.468236] mmc0: sdhci: Resp[2]: 0x329f5903 | Resp[3]: 0x00d0ffff
[ 423.474657] mmc0: sdhci: Host ctl2: 0x00000008
[ 423.479083] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0xad467408
[ 423.485508] mmc0: sdhci-esdhc-imx: ========= ESDHC IMX DEBUG STATUS DUMP =========
[ 423.493061] mmc0: sdhci-esdhc-imx: cmd debug status: 0x2120
[ 423.498701] mmc0: sdhci-esdhc-imx: data debug status: 0x2200
[ 423.504429] mmc0: sdhci-esdhc-imx: trans debug status: 0x2300
[ 423.510245] mmc0: sdhci-esdhc-imx: dma debug status: 0x2400
[ 423.515887] mmc0: sdhci-esdhc-imx: adma debug status: 0x2510
[ 423.521615] mmc0: sdhci-esdhc-imx: fifo debug status: 0x2680
[ 423.527344] mmc0: sdhci-esdhc-imx: async fifo debug status: 0x2750
[ 423.533593] mmc0: sdhci: ============================================
[ 423.540075] mmc0: running CQE recovery
[ 501.799092] mmc0: cqhci: timeout for tag 31, qcnt 24
[ 501.803698] mmc0: cqhci: ============ CQHCI REGISTER DUMP ===========
[ 501.810106] mmc0: cqhci: Caps: 0x0000310a | Version: 0x00000510
[ 501.816528] mmc0: cqhci: Config: 0x00001001 | Control: 0x00000000
[ 501.822948] mmc0: cqhci: Int stat: 0x00000000 | Int enab: 0x00000006
[ 501.829398] mmc0: cqhci: Int sig: 0x00000006 | Int Coal: 0x00000000
[ 501.835801] mmc0: cqhci: TDL base: 0x823c3000 | TDL up32: 0x00000000
[ 501.842221] mmc0: cqhci: Doorbell: 0xe01fffff | TCN: 0x00000000
[ 501.848666] mmc0: cqhci: Dev queue: 0x00000000 | Dev Pend: 0x00008000
[ 501.855070] mmc0: cqhci: Task clr: 0x00000000 | SSC1: 0x00011000
[ 501.861533] mmc0: cqhci: SSC2: 0x00000001 | DCMD rsp: 0x00000900
[ 501.867916] mmc0: cqhci: RED mask: 0xfdf9a080 | TERRI: 0x00000000
[ 501.874333] mmc0: cqhci: Resp idx: 0x0000000d | Resp arg: 0x00000000
[ 501.880756] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 501.887180] mmc0: sdhci: Sys addr: 0xa5a15000 | Version: 0x00000002
[ 501.893600] mmc0: sdhci: Blk size: 0x00000200 | Blk cnt: 0x00000020
[ 501.900029] mmc0: sdhci: Argument: 0x00018000 | Trn mode: 0x00000023
[ 501.906450] mmc0: sdhci: Present: 0x01fd8008 | Host ctl: 0x00000030
[ 501.912875] mmc0: sdhci: Power: 0x00000002 | Blk gap: 0x00000080
[ 501.919301] mmc0: sdhci: Wake-up: 0x00000008 | Clock: 0x0000000f
[ 501.925717] mmc0: sdhci: Timeout: 0x0000008f | Int stat: 0x00000000
[ 501.932139] mmc0: sdhci: Int enab: 0x107f4000 | Sig enab: 0x107f4000
[ 501.938564] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000502
[ 501.944994] mmc0: sdhci: Caps: 0x07eb0000 | Caps_1: 0x0000b407
[ 501.951435] mmc0: sdhci: Cmd: 0x00000d1a | Max curr: 0x00ffffff
[ 501.957841] mmc0: sdhci: Resp[0]: 0x00000000 | Resp[1]: 0xffffffff
[ 501.964284] mmc0: sdhci: Resp[2]: 0x329f5903 | Resp[3]: 0x00d0ffff
[ 501.970686] mmc0: sdhci: Host ctl2: 0x00000008
[ 501.975106] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0xad463808
[ 501.981529] mmc0: sdhci-esdhc-imx: ========= ESDHC IMX DEBUG STATUS DUMP =========
[ 501.989083] mmc0: sdhci-esdhc-imx: cmd debug status: 0x2120
[ 501.994728] mmc0: sdhci-esdhc-imx: data debug status: 0x2200
[ 502.000451] mmc0: sdhci-esdhc-imx: trans debug status: 0x2300
[ 502.006266] mmc0: sdhci-esdhc-imx: dma debug status: 0x2400
[ 502.011909] mmc0: sdhci-esdhc-imx: adma debug status: 0x2510
[ 502.017636] mmc0: sdhci-esdhc-imx: fifo debug status: 0x2680
[ 502.023365] mmc0: sdhci-esdhc-imx: async fifo debug status: 0x2750
[ 502.029650] mmc0: sdhci: ============================================
[ 502.036090] mmc0: running CQE recovery
[ 607.207068] mmc0: cqhci: timeout for tag 31, qcnt 21
[ 607.211666] mmc0: cqhci: ============ CQHCI REGISTER DUMP ===========
[ 607.218083] mmc0: cqhci: Caps: 0x0000310a | Version: 0x00000510
[ 607.224498] mmc0: cqhci: Config: 0x00001001 | Control: 0x00000000
[ 607.230915] mmc0: cqhci: Int stat: 0x00000000 | Int enab: 0x00000006
[ 607.237344] mmc0: cqhci: Int sig: 0x00000006 | Int Coal: 0x00000000
[ 607.243803] mmc0: cqhci: TDL base: 0x823c3000 | TDL up32: 0x00000000
[ 607.250194] mmc0: cqhci: Doorbell: 0x83ffc1fe | TCN: 0x00000000
[ 607.256635] mmc0: cqhci: Dev queue: 0x00000000 | Dev Pend: 0x00000001
[ 607.263039] mmc0: cqhci: Task clr: 0x00000000 | SSC1: 0x00011000
[ 607.269485] mmc0: cqhci: SSC2: 0x00000001 | DCMD rsp: 0x00000900
[ 607.275915] mmc0: cqhci: RED mask: 0xfdf9a080 | TERRI: 0x00000000
[ 607.282328] mmc0: cqhci: Resp idx: 0x0000000d | Resp arg: 0x00000000
[ 607.288731] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 607.295180] mmc0: sdhci: Sys addr: 0x9cefe000 | Version: 0x00000002
[ 607.301577] mmc0: sdhci: Blk size: 0x00000200 | Blk cnt: 0x00000020
[ 607.308021] mmc0: sdhci: Argument: 0x00018000 | Trn mode: 0x00000023
[ 607.314425] mmc0: sdhci: Present: 0x01fd8008 | Host ctl: 0x00000030
[ 607.320843] mmc0: sdhci: Power: 0x00000002 | Blk gap: 0x00000080
[ 607.327263] mmc0: sdhci: Wake-up: 0x00000008 | Clock: 0x0000000f
[ 607.333725] mmc0: sdhci: Timeout: 0x0000008f | Int stat: 0x00000000
[ 607.340117] mmc0: sdhci: Int enab: 0x107f4000 | Sig enab: 0x107f4000
[ 607.346575] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000502
[ 607.352962] mmc0: sdhci: Caps: 0x07eb0000 | Caps_1: 0x0000b407
[ 607.359405] mmc0: sdhci: Cmd: 0x00000d1a | Max curr: 0x00ffffff
[ 607.365818] mmc0: sdhci: Resp[0]: 0x00000000 | Resp[1]: 0xffffffff
[ 607.372254] mmc0: sdhci: Resp[2]: 0x329f5903 | Resp[3]: 0x00d0ffff
[ 607.378692] mmc0: sdhci: Host ctl2: 0x00000008
[ 607.383103] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0xad463408
[ 607.389505] mmc0: sdhci-esdhc-imx: ========= ESDHC IMX DEBUG STATUS DUMP =========
[ 607.397083] mmc0: sdhci-esdhc-imx: cmd debug status: 0x2120
[ 607.402704] mmc0: sdhci-esdhc-imx: data debug status: 0x2200
[ 607.408454] mmc0: sdhci-esdhc-imx: trans debug status: 0x2300
[ 607.414245] mmc0: sdhci-esdhc-imx: dma debug status: 0x2400
[ 607.419880] mmc0: sdhci-esdhc-imx: adma debug status: 0x2510
[ 607.425607] mmc0: sdhci-esdhc-imx: fifo debug status: 0x2680
[ 607.431335] mmc0: sdhci-esdhc-imx: async fifo debug status: 0x2750
[ 607.437586] mmc0: sdhci: ============================================
[ 607.444076] mmc0: running CQE recovery

 

We noticed that all dumps occurred when CMD13 was sent and then emmc responsed timeout. We tried to use debugfs to find out reason and we noticed that only on the occasion that we execute “cat /sys/kernel/debug/mmc0/mmc0\:0001/status” to send a CMD13 to emmc which takes about 10 seconds to return and display result while in other cases we excute other command such as "cat /sys/kernel/debug/mmc0/mmc0\:0001/state" which returns immediatly.

Afterall, we closed CQHCI and solved the problem. It seems that CQHCI doesn't work well on emmc produced by specific manufacture. We would like to see new progress on  this issue.

Best Regards,

Yan

Tags (3)
0 Kudos
Reply
1 Solution
54 Views
jun13_chen
Contributor III
0 Kudos
Reply
8 Replies
38 Views
chenyin_h
NXP Employee
NXP Employee

Hello, @yan4 

I did not reproduce the issue on local RDB, but once met such issues on other NXP platforms. I suggest disabling the CQHCI temporally.

Thanks for @jun13_chen's inputs, may close the feature with the patch mentioned.

 

BR

Chenyin

0 Kudos
Reply
238 Views
chenyin_h
NXP Employee
NXP Employee

Hello, @yan4 

Thanks a lot for your valuable post, I could understand the status, and would check if it existed on refernece boards.

Thanks


BR

Chenyin

0 Kudos
Reply
75 Views
yan4
Contributor II

Hello,chenyin_h

Have you confirmed whether this problem exits on board  previously mentioned or not? If you need some extra information, I would be glad to provide.

Best Regards

Yan

0 Kudos
Reply
285 Views
chenyin_h
NXP Employee
NXP Employee

Hello, @yan4 

Thanks for your post.

I noticed that the 6.1.68 kernel is used during your test, which seems not a formally supported version from BSP perspective.

May I know if you also met the issue while using the BSP(, for example, BSP41 with 5.15.x or 6.6.x kenel)?

 

BR

Chenyin

0 Kudos
Reply
250 Views
yan4
Contributor II
Hello, chenyin
I need to explain that the kernel I use is customized so it doesn't match the formally supported version of BSP38 which is supposed to be V5.15. However, there is a similar problem in community, link:https://community.nxp.com/t5/i-MX-Processors/The-issues-on-quot-mmc0-cqhci-timeout-for-tag-0-quot/m-....
They encountered the same issue on the I.MX platform, and their BSP version and kernel version are officially supported. So I won't suppose that this issue is caused by a mismatched kernel version. By the way, I noticed that CQHCI seems to be an unstable feature , occasionally disabled and then enabled in linux main branch, but gogdan-gabriel.roman@nxp.com enabled it in 2022. Perhaps we should reassess the necessity of introducing this feature or fix the bugs within it.
Best Regards
Yan
0 Kudos
Reply
282 Views
yan4
Contributor II
Yes, BSP38 with 6.1.168 kernel was used in my test.
0 Kudos
Reply
55 Views
jun13_chen
Contributor III
Tags (1)
0 Kudos
Reply
30 Views
yan4
Contributor II

Hello,jun13_chen

This patch can indeed solve the problem. Thanks a lot.

0 Kudos
Reply