
lx2160ardb: kernel reports problems during heavy disk I/O with PCIe/SATA adapters

Question asked by Daniel Klauer on Jul 17, 2019

Hello,

 

Currently I'm trying to do some benchmarking with the LX2160ARDB and SATA SSDs. For this I have two PCIe x4 to 4-port SATA 3.0 cards with Marvell 88SE9230 RAID controllers, plugged into the LX2160ARDB's PCIe slots (J23 and J28). Each card has 4 Samsung 860 EVO SSDs connected, for a total of 8 SSDs. Testing is done with Linux built from Yocto, btrfs raid0 and fio.
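For reference, the pool was set up roughly like this (a sketch of the commands, not a verbatim transcript; the device names /dev/sd[a-h] and the mount point /mnt/btrfspool match what appears in the logs below):

```shell
# Sketch of the btrfs raid0 pool setup (exact flags are my reconstruction).
# WARNING: mkfs.btrfs destroys all data on the listed devices.
mkfs.btrfs -f -d raid0 -m raid0 /dev/sd[a-h]   # stripe data and metadata across all 8 SSDs
mkdir -p /mnt/btrfspool
mount /dev/sda /mnt/btrfspool                  # mounting any member device brings up the whole pool
```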

 

Sometimes, while writing data to the disks with fio, I see various kernel errors and noticeable delays/hangs that interrupt the writing process for multiple seconds or even minutes. Some examples:

 

root@lx2160ardb:~# fio --name=x --directory=/mnt/btrfspool --numjobs=1 --size=3T --filesize=1G --nrfiles=3K --openfiles=1 --file_service_type=sequential --rw=write --ioengine=sync --bs=1M --direct=0 --fallocate=posix --zero_buffers=0 --write_bw_log=x
x: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=sync, iodepth=1
fio-3.12-dirty
Starting 1 process
x: Laying out IO files (3072 files / total 3145728MiB)
Jobs: 1 (f=1): [W(1)][0.6%][

 

[ 1204.410448] mmc1: Timeout waiting for hardware cmd interrupt.
[ 1204.416186] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 1204.422615] mmc1: sdhci: Sys addr:  0x5cc80000 | Version:  0x00002202
[ 1204.429043] mmc1: sdhci: Blk size:  0x00000200 | Blk cnt:  0x00000000
[ 1204.435471] mmc1: sdhci: Argument:  0x00010000 | Trn mode: 0x00000033
[ 1204.441899] mmc1: sdhci: Present:   0x01f80008 | Host ctl: 0x0000003c
[ 1204.448327] mmc1: sdhci: Power:     0x00000007 | Blk gap:  0x00000000
[ 1204.454755] mmc1: sdhci: Wake-up:   0x00000000 | Clock:    0x00000208
[ 1204.461183] mmc1: sdhci: Timeout:   0x0000000e | Int stat: 0x00000001
[ 1204.467610] mmc1: sdhci: Int enab:  0x037f100f | Sig enab: 0x037f100b
[ 1204.474038] mmc1: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00002202
[ 1204.480465] mmc1: sdhci: Caps:      0x34fa0000 | Caps_1:   0x0000af00
[ 1204.486893] mmc1: sdhci: Cmd:       0x00000d1a | Max curr: 0x00000000
[ 1204.493321] mmc1: sdhci: Resp[0]:   0x00000900 | Resp[1]:  0xffffff8d
[ 1204.499748] mmc1: sdhci: Resp[2]:   0x320f5903 | Resp[3]:  0x00000900
[ 1204.506175] mmc1: sdhci: Host ctl2: 0x00000080
[ 1204.510607] mmc1: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x00000000f9c9820c
[ 1204.517729] mmc1: sdhci: ============================================

 

[...]

 

[ 3565.222434] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:   
[ 3565.228521] rcu:     (detected by 14, t=5252 jiffies, g=610161, q=2482)
[ 3565.234867] rcu: All QSes seen, last rcu_preempt kthread activity 5250 (4295783547-4295778297), jiffies_till_next_fqs=1, root ->qsmask 0x0
[ 3565.247284] kworker/u32:27  R  running task        0  3372      2 0x0000002a
[ 3565.254333] Workqueue: btrfs-endio-write btrfs_endio_write_helper
[ 3565.260414] Call trace:
[ 3565.262851]  dump_backtrace+0x0/0x158
[ 3565.266502]  show_stack+0x14/0x20
[ 3565.269805]  sched_show_task+0x13c/0x168
[ 3565.273716]  rcu_check_callbacks+0x7e0/0x850
[ 3565.277975]  update_process_times+0x2c/0x70
[ 3565.282147]  tick_sched_handle.isra.5+0x3c/0x50
[ 3565.286664]  tick_sched_timer+0x48/0x98
[ 3565.290487]  __hrtimer_run_queues+0x118/0x1a8
[ 3565.294831]  hrtimer_interrupt+0xe4/0x240
[ 3565.298829]  arch_timer_handler_phys+0x2c/0x38
[ 3565.303262]  handle_percpu_devid_irq+0x80/0x138
[ 3565.307779]  generic_handle_irq+0x24/0x38
[ 3565.311776]  __handle_domain_irq+0x60/0xb8
[ 3565.315860]  gic_handle_irq+0x7c/0x178
[ 3565.319596]  el1_irq+0xb0/0x128
[ 3565.322726]  queued_spin_lock_slowpath+0x230/0x2a8
[ 3565.327505]  queued_read_lock_slowpath+0x118/0x120
[ 3565.332285]  _raw_read_lock+0x44/0x48
[ 3565.335935]  btrfs_tree_read_lock+0x40/0x130
[ 3565.340193]  btrfs_search_slot+0x6d4/0x8a0
[ 3565.344278]  btrfs_lookup_csum+0x5c/0x188
[ 3565.348275]  btrfs_csum_file_blocks+0x214/0x580
[ 3565.352794]  add_pending_csums+0x64/0x98
[ 3565.356705]  btrfs_finish_ordered_io+0x2c0/0x810
[ 3565.361309]  finish_ordered_fn+0x10/0x18
[ 3565.365219]  normal_work_helper+0x228/0x240
[ 3565.369390]  btrfs_endio_write_helper+0x10/0x18
[ 3565.373909]  process_one_work+0x1e0/0x318
[ 3565.377907]  worker_thread+0x40/0x428
[ 3565.381557]  kthread+0x124/0x128
[ 3565.384772]  ret_from_fork+0x10/0x18
[ 3565.388337] rcu: rcu_preempt kthread starved for 5250 jiffies! g610161 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200 ->cpu=7
[ 3565.398843] rcu: RCU grace-period kthread stack dump:
[ 3565.403881] rcu_preempt     R    0    10      2 0x00000028
[ 3565.409355] Call trace:
[ 3565.411789]  __switch_to+0xa0/0xe0
[ 3565.415179]  __schedule+0x1e0/0x5c0
[ 3565.418655]  schedule+0x38/0xa0
[ 3565.421784]  schedule_timeout+0x198/0x338
[ 3565.425783]  rcu_gp_kthread+0x428/0x800
[ 3565.429607]  kthread+0x124/0x128
[ 3565.432822]  ret_from_fork+0x10/0x18

 

[...]

 

[ 4510.522428] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 4510.528509] rcu:     (detected by 15, t=241578 jiffies, g=610161, q=96607)
[ 4510.535111] rcu: All QSes seen, last rcu_preempt kthread activity 241578 (4296019875-4295778297), jiffies_till_next_fqs=1, root ->qsmask 0x0
[ 4510.547701] swapper/15      R  running task        0     0      1 0x00000028

 

[...]

an example from another run:

[ 4261.539921] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 4261.546349] mmc1: sdhci: Sys addr:  0x5c580000 | Version:  0x00002202
[ 4261.552777] mmc1: sdhci: Blk size:  0x00000200 | Blk cnt:  0x00000008
[ 4261.559204] mmc1: sdhci: Argument:  0x00010000 | Trn mode: 0x00000033
[ 4261.565631] mmc1: sdhci: Present:   0x01f80008 | Host ctl: 0x0000003c
[ 4261.572059] mmc1: sdhci: Power:     0x00000007 | Blk gap:  0x00000000
[ 4261.578487] mmc1: sdhci: Wake-up:   0x00000000 | Clock:    0x00000208
[ 4261.584914] mmc1: sdhci: Timeout:   0x0000000e | Int stat: 0x00000001
[ 4261.591342] mmc1: sdhci: Int enab:  0x037f100f | Sig enab: 0x037f100b
[ 4261.597770] mmc1: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00002202
[ 4261.604197] mmc1: sdhci: Caps:      0x34fa0000 | Caps_1:   0x0000af00
[ 4261.610625] mmc1: sdhci: Cmd:       0x00000d1a | Max curr: 0x00000000
[ 4261.617052] mmc1: sdhci: Resp[0]:   0x00000900 | Resp[1]:  0x0000000d
[ 4261.623480] mmc1: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
[ 4261.629907] mmc1: sdhci: Host ctl2: 0x00000080
[ 4261.634339] mmc1: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x00000000f9c9820c
[ 4261.641459] mmc1: sdhci: ============================================
[ 4261.648429] mmc1: card 0001 removed
[ 4271.982233] ata11.00: exception Emask 0x0 SAct 0xffc00bff SErr 0x0 action 0x6 frozen
[ 4271.989975] ata11.00: failed command: WRITE FPDMA QUEUED
[ 4271.995330] ata11.00: cmd 61/80:00:80:4a:9c/00:00:2f:00:00/40 tag 0 ncq dma 65536 out
[ 4271.995330]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 4271.996397] sd 3:0:0:0: [sdd] tag#31 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x06
[ 4272.010534] ata11.00: status: { DRDY }
[ 4272.010537] ata11.00: failed command: WRITE FPDMA QUEUED

[ 4272.010542] ata11.00: cmd 61/00:08:00:45:9c/02:00:2f:00:00/40 tag 1 ncq dma 262144 out
[ 4272.010542]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

[ 4272.010544] ata11.00: status: { DRDY }

fio: io_u error on file /mnt/btrfspool/x.0.3042: Input/output error: write offset=195035136, buflen=1048576
[ 4272.010546] ata11.00: failed command: WRITE FPDMA QUEUED
[ 4272.010550] ata11.00: cmd 61/00:10:00:48:9c/01:00:2f:00:00/40 tag 2 ncq dma 131072 out
[ 4272.010550]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 4272.010552] ata11.00: status: { DRDY }

[ 4272.010554] ata11.00: failed command: WRITE FPDMA QUEUED

[ 4272.010558] ata11.00: cmd 61/00:18:00:4b:9c/01:00:2f:00:00/40 tag 3 ncq dma 131072 out

fio: io_u error on file /mnt/btrfspool/x.0.3042: Input/output error: write offset=240123904, buflen=1048576
[ 4272.010558]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 4272.010559] ata11.00: status: { DRDY }
[ 4272.010561] ata11.00: failed command: WRITE FPDMA QUEUED
[ 4272.010565] ata11.00: cmd 61/80:20:80:44:9c/00:00:2f:00:00/40 tag 4 ncq dma 65536 out
[ 4272.010565]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 4272.010566] ata11.00: status: { DRDY }
[ 4272.010568] ata11.00: failed command: WRITE FPDMA QUEUED
[ 4272.010573] ata11.00: cmd 61/00:28:00:4c:9c/01:00:2f:00:00/40 tag 5 ncq dma 131072 out
[ 4272.010573]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 4272.010574] ata11.00: status: { DRDY }
[ 4272.010579] ata11.00: failed command: WRITE FPDMA QUEUED
[ 4272.011649] sd 2:0:0:0: [sdc] tag#31 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x06
[ 4272.011654] sd 2:0:0:0: [sdc] tag#31 CDB: opcode=0x2a 2a 00 2f 9b e8 80 00 00 80 00
[ 4272.011657] print_req_error: I/O error, dev sdc, sector 798746752
[ 4272.011659] sd 1:0:0:0: [sdb] tag#31 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x06
[ 4272.011664] BTRFS error (device sda): bdev /dev/sdc errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
[ 4272.011665] sd 1:0:0:0: [sdb] tag#31 CDB: opcode=0x2a 2a 00 2f 9b e9 80 00 00 80 00
[ 4272.011669] BTRFS warning (device sda): direct IO failed ino 4323 rw 1,34817 sector 0x2f9be900 len 0 err no 10
[ 4272.011671] print_req_error: I/O error, dev sdb, sector 798747008
[ 4272.011676] BTRFS error (device sda): bdev /dev/sdb errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
[ 4272.011680] BTRFS warning (device sda): direct IO failed ino 4323 rw 1,34817 sector 0x2f9bea00 len 0 err no 10
[ 4272.011705] sd 2:0:0:0: [sdc] tag#30 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x06
[ 4272.011708] sd 2:0:0:0: [sdc] tag#30 CDB: opcode=0x2a 2a 00 2f 9b d2 00 00 0a 00 00
[ 4272.011710] print_req_error: I/O error, dev sdc, sector 798740992
[ 4272.011713] BTRFS error (device sda): bdev /dev/sdc errs: wr 2, rd 0, flush 0, corrupt 0, gen 0

 

Some additional information that might be useful:

  • I started out with the Yocto state corresponding to LSDK-19.03 (as per the yocto_2.7 tag in the qoriq-components/yocto-sdk repo). However, this included the linux-qoriq kernel 4.19.26+gc0c214110624, which crashed at boot in the ahci-qoriq driver's fsl_sata_errata_379364() function. That was fixed by upgrading to kernel 4.19.46+g1a4cab2c597d from LSDK-19.06. (Some other components, such as U-Boot and the management-complex firmware, also had to be upgraded to the LSDK-19.06 state to make the new kernel work.)
  • I did not see such problems during disk I/O with only 4 SSDs connected directly to the board's SATA connectors. There were also no problems with 4 SSDs connected to a single PCIe card, in either slot, as opposed to using 2 PCIe cards.
  • The same problems appeared when using 2 PCIe cards with 2 SSDs each (total 4 SSDs instead of 8).
  • We tried powering half of the SSDs from an external power supply, in case the internal one was too weak for 8 SSDs, but it did not seem to make a difference: errors appeared either way.
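To see whether the ATA timeouts cluster on one card's ports or are spread across both controllers, the failed NCQ commands can be counted per ATA port from a saved kernel log (a small sketch; `dmesg.log` is an assumed capture made with e.g. `dmesg > dmesg.log`):

```shell
# Count "failed command" events per ATA port in a saved kernel log,
# to check whether the errors cluster on one controller's ports.
# dmesg.log is an assumed file name, created with e.g. `dmesg > dmesg.log`.
awk '/failed command/ {
        sub(/:$/, "", $3)       # "ata11.00:" -> "ata11.00"
        split($3, f, ".")       # keep the port part, e.g. "ata11"
        count[f[1]]++
     }
     END { for (p in count) print p, count[p] }' dmesg.log
```

If one card's four ports dominate the counts, that would point at a single card or slot rather than a systemic issue.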

 

Does anyone have an idea what the problem could be? I'm pretty much lost at this point. It would be really nice to get all 8 SSDs working reliably, but so far, no luck. The way forward is probably to do more tests to rule out hardware problems with the PCIe cards or the SSDs themselves.

 

But I don't understand why disk I/O would cause mmc1 timeouts. Yes, I'm booting from SD card, but that's mmc0; mmc1 is (as far as I know) the on-board eMMC, which isn't being used at all (when mounting the btrfs raid0, only /dev/sd[a-h] are used).
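One way to double-check which mmc host maps to which card is to walk sysfs (a sketch, assuming a typical sysfs layout; I haven't verified the exact node names on the LX2160ARDB, and the sysfs root is parameterized purely so the sketch can be exercised against a fake tree):

```shell
# List each mmc host, its attached card, and the resulting block device,
# to confirm that mmc0 is the boot SD card and mmc1 the unused eMMC.
# Node names assume a typical sysfs layout (an assumption on my part).
list_mmc_cards() {
    SYSFS=${1:-/sys}
    for host in "$SYSFS"/class/mmc_host/mmc*; do
        [ -d "$host" ] || continue
        echo "== ${host##*/} =="
        # each attached card appears as a subdirectory like mmc1:0001
        for card in "$host"/mmc*:*; do
            [ -d "$card" ] || continue
            echo "  card: $(cat "$card/name" 2>/dev/null)" \
                 "type: $(cat "$card/type" 2>/dev/null)"
            ls "$card/block" 2>/dev/null | sed 's/^/  block dev: /'
        done
    done
}
list_mmc_cards   # inspect the real /sys on the board
```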

 

Thanks in advance for any hints.
