FMC Crash - timeout waiting for Tx confirmation

LucianZ
Contributor II

Hi,

We have identified a possible fmc instability that is described below:

Setup details:

Custom HW based on LS1043A SoC and LSDK 19.06.

We are creating different fmc configs/policies that will be applied based on system events. While doing so, we observed the following scenario, which seems wrong:

If fmc -x is executed while egress traffic is pending, we get unexpected results.

To reproduce the issue in this particular case we used qperf, but the result should be the same with iperf or any other tool that generates egress traffic.

We reproduced the issue by starting a qperf client/server TCP session (the target acts as the client: qperf -ip 19766 -t 40 172.16.65.10 -m 63K -vu -ub tcp_bw) and calling fmc -x during the egress traffic:

 qperf -ip 19766 -t 40 172.16.65.10 -m 63K -vu -ub tcp_bw &
[1] 484
tcp_bw:
fmc -x
[   46.803618] CPU: 1 PID: 485 Comm: fmc Tainted: P                  4.18.45-yocto-standard #1
[   46.811965] Hardware name: LS1043A RDB Board (DT)
[   46.816658] Call trace:
[   46.819103]  dump_backtrace+0x0/0x168
[   46.822756]  show_stack+0x24/0x30
[   46.826062]  dump_stack+0x90/0xb4
[   46.829368]  QmEnqueueCB+0x1a0/0x220
[   46.832934]  FmHcPcdSync+0x124/0x4f0
[   46.836500]  FmPcdHcSync+0x28/0xc8
[   46.839893]  DetachPCD+0x60/0x1a8
[   46.843197]  FM_PORT_DeletePCD+0x70/0x4b8
[   46.847197]  LnxwrpFmPortIOCTL+0x1238/0x31a0
[   46.851457]  fm_ioctls.isra.1+0x5c/0x120
[   46.855369]  fm_ioctl+0x50/0x70
[   46.858502]  do_vfs_ioctl+0xc4/0x858
[   46.862067]  ksys_ioctl+0x84/0xb8
[   46.865372]  sys_ioctl+0x34/0x48
[   46.868591]  __sys_trace_return+0x0/0x4
[   46.872430] cpu 1: ! MINOR FM Error [CPU01, drivers/net/ethernet/freescale/sdk_fman/src/wrapper/lnxwrp_fm_port.c:257 QmEnqueueCB]: Write Access Failed;
[   46.872433] cpu 1: timeout waiting for Tx confirmation
[   46.886174] cpu 1:
[   46.893492] cpu 1: ! MINOR FM-PCD Error [CPU01, drivers/net/ethernet/freescale/sdk_fman/Peripherals/FM/HC/hc.c:225 EnQFrm]: Write Access Failed;
[   46.893495] cpu 1: HC enqueue failed
[   46.906529] cpu 1:
[   46.912285] cpu 1: ! MINOR FM-PCD Error [CPU01, drivers/net/ethernet/freescale/sdk_fman/Peripherals/FM/HC/hc.c:1223 FmHcPcdSync]: Write Access Failed;
[   46.912286] cpu 1:
[   46.925843] cpu 1:
[   46.930129] cpu 1: *** ASSERT_COND failed [CPU01, drivers/net/ethernet/freescale/sdk_fman/Peripherals/FM/HC/hc.c:216 EnQFrm]
[   46.941419] ------------[ cut here ]------------
[   46.946024]
[   46.946024]
[   46.946024] FMD: fatal error, driver can't go on!!!
[   46.946024]
[   46.955348] WARNING: CPU: 1 PID: 485 at drivers/net/ethernet/freescale/sdk_fman/src/xx/xx_arm_linux.c:157 XX_Exit+0x1c/0x28
[   46.966464] Modules linked in: xt_comment xt_tcpudp xt_multiport xt_conntrack iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat ip_tables x_tables af_packet 8021q caam_jr pfuze100_regulator lm90 mcp25xxfd rsa_generic asn1_decoder mpi can_dev caamhash_desc caamalg_desc rng_core authenc aes_ce_blk crypto_simd cryptd aes_ce_cipher crc32_ce ghash_ce gf128mul aes_arm64 sha2_ce sha256_arm64 sha1_ce sha1_generic uio_pdrv_genirq uio ptp_qoriq i2c_imx ptp pps_core spi_fsl_dspi qoriq_thermal layerscape_edac_mod caam error(P) fsl_quadspi edac_core sch_fq_codel optee tee nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack libcrc32c at24 regmap_i2c unix ipv6
[   47.025685] CPU: 1 PID: 485 Comm: fmc Tainted: P                  4.18.45-yocto-standard #1
[   47.034023] Hardware name: LS1043A RDB Board (DT)
[   47.038716] pstate: 40000085 (nZcv daIf -PAN -UAO)
[   47.043497] pc : XX_Exit+0x1c/0x28
[   47.046889] lr : XX_Exit+0x1c/0x28
[   47.050279] sp : ffff00000a20b970
[   47.053582] x29: ffff00000a20b970 x28: 0000000000000000
[   47.058885] x27: 0000000000f00000 x26: 0000000000000010
[   47.064187] x25: ffffcd8ff47fc210 x24: 0000000000000000
[   47.069490] x23: 0000000000000000 x22: 0000000000000000
[   47.074792] x21: 0000000000000000 x20: ffff2ced55298588
[   47.080095] x19: ffffcd8ff4004980 x18: 0000000000000010
[   47.085398] x17: 0000000000000001 x16: 0000000000000007
[   47.090701] x15: ffffffffffffffff x14: ffff2ced55298588
[   47.096004] x13: ffff2cedd5328146 x12: ffff2ced5532814e
[   47.101308] x11: ffff2ced552aa000 x10: ffff00000a20b650
[   47.106610] x9 : 00000000ffffffd0 x8 : ffff2ced54c34ac8
[   47.111912] x7 : 0000000000000005 x6 : 00000000000001b0
[   47.117215] x5 : 0000000000000000 x4 : ffffcd8ffbd86678
[   47.122518] x3 : ffffcd8ffbd86678 x2 : 0000000000000007
[   47.127820] x1 : 25ec2b8646980900 x0 : 0000000000000000
[   47.133122] Call trace:
[   47.135559]  XX_Exit+0x1c/0x28
[   47.138605]  FmHcKgWriteSp+0x304/0x820
[   47.142344]  FmPcdKgUnbindPortToSchemes+0x64/0x2d0
[   47.147126]  DeletePcd+0x104/0x598
[   47.150518]  FM_PORT_DeletePCD+0xa0/0x4b8
[   47.154517]  LnxwrpFmPortIOCTL+0x1238/0x31a0
[   47.158777]  fm_ioctls.isra.1+0x5c/0x120
[   47.162689]  fm_ioctl+0x50/0x70
[   47.165822]  do_vfs_ioctl+0xc4/0x858
[   47.169387]  ksys_ioctl+0x84/0xb8
[   47.172692]  sys_ioctl+0x34/0x48
[   47.175911]  __sys_trace_return+0x0/0x4
[   47.179735] ---[ end trace 2eeac047aa1772c8 ]---
[   48.271540] CPU: 1 PID: 485 Comm: fmc Tainted: P        W         4.18.45-yocto-standard #1
[   48.279881] Hardware name: LS1043A RDB Board (DT)
[   48.284572] Call trace:
[   48.287011]  dump_backtrace+0x0/0x168
[   48.290665]  show_stack+0x24/0x30
[   48.293971]  dump_stack+0x90/0xb4
[   48.297275]  QmEnqueueCB+0x1a0/0x220
[   48.300842]  FmHcKgWriteSp+0x154/0x820
[   48.304581]  FmPcdKgUnbindPortToSchemes+0x64/0x2d0
[   48.309362]  DeletePcd+0x104/0x598
[   48.312753]  FM_PORT_DeletePCD+0xa0/0x4b8
[   48.316753]  LnxwrpFmPortIOCTL+0x1238/0x31a0
[   48.321012]  fm_ioctls.isra.1+0x5c/0x120
[   48.324924]  fm_ioctl+0x50/0x70
[   48.328056]  do_vfs_ioctl+0xc4/0x858
[   48.331622]  ksys_ioctl+0x84/0xb8
[   48.334927]  sys_ioctl+0x34/0x48
[   48.338145]  __sys_trace_return+0x0/0x4
[   48.342683] cpu 1: ! MINOR FM Error [CPU01, drivers/net/ethernet/freescale/sdk_fman/src/wrapper/lnxwrp_fm_port.c:257 QmEnqueueCB]: Write Access Failed;
[   48.342686] cpu 1: timeout waiting for Tx confirmation
[   48.356336] cpu 1:
[   48.363654] cpu 1: ! MINOR FM-PCD Error [CPU01, drivers/net/ethernet/freescale/sdk_fman/Peripherals/FM/HC/hc.c:225 EnQFrm]: Write Access Failed;
[   48.363656] cpu 1: HC enqueue failed
[   48.376692] cpu 1:
[   48.382443] cpu 1: ! MINOR FM-PCD Error [CPU01, drivers/net/ethernet/freescale/sdk_fman/Peripherals/FM/HC/hc.c:1107 FmHcKgWriteSp]: Write Access Failed;
[   48.382445] cpu 1:
[   48.396175] cpu 1:
[   48.400450] cpu 1: ! MAJOR FM-PCD Error [CPU01, drivers/net/ethernet/freescale/sdk_fman/Peripherals/FM/Pcd/fm_kg.c:2146 FmPcdKgUnbindPortToSchemes]: Write Access Failed;
[   48.400452] cpu 1:
[   48.415657] cpu 1:
[   48.419933] cpu 1: ! MAJOR FM-Port Error [CPU01, drivers/net/ethernet/freescale/sdk_fman/Peripherals/FM/Port/fm_port.c:1738 DeletePcd]: Write Access Failed;
[   48.419935] cpu 1:
[   48.434011] cpu 1:
[   48.438289] cpu 1: ! MAJOR FM-Port Error [CPU01, drivers/net/ethernet/freescale/sdk_fman/Peripherals/FM/Port/fm_port.c:5333 FM_PORT_DeletePCD]: Write Access Failed;
[   48.438291] cpu 1:
[   48.453063] cpu 1:
[   48.457339] cpu 1: ! MINOR FM Error [CPU01, drivers/net/ethernet/freescale/sdk_fman/src/wrapper/lnxwrp_ioctls_fm.c:4679 LnxwrpFmPortIOCTL]: Invalid Operation;
[   48.457341] cpu 1: IOCTL FM PORT
[   48.471594] cpu 1:
ERR : Invocation of FM_PORT_DeletePCD for fm0/port/MAC/9 failed with error code 0x00010013
failed to receive results: timed out

The same issue signature was also observed in https://community.nxp.com/t5/QorIQ/FMC-crashes/m-p/494244, but that thread is not marked as solved, and there the issue does not seem to be triggered by the fmc clear option.

On our side the issue is 100% reproducible by generating egress traffic and forcing fmc -x in the middle of a transfer. Our expectation is that fmc and the FMD fail gracefully, without a tainted kernel trace.

Can you please help with this? We think the issue should be easy to reproduce on your side as well.

 

Best regards,

Lucian

7 Replies

AndreiZ
Contributor I

Hello @yipingwang

My name is Andrei and I have picked up this topic, as Lucian is no longer working on this project.

Can you please share the current status? Are you investigating the issue on your side?

Thank you

yipingwang
NXP TechSupport

I could not reproduce this issue with the LSDK 19.06 image; I did not see a kernel crash.

I connected two LS1043 boards together through the RGMII1 port.

On board 1, I executed the following command:

# iperf -s

On board 2, I executed the following commands:

# iperf -c 10.10.10.101 -P 10 -t 30 &

# fmc -x

Please refer to the following console log from board 2.

root@localhost:~# iperf -c 10.10.10.101 -P 10 -t 30 &
[1] 4694
root@localhost:~# ------------------------------------------------------------
Client connecting to 10.10.10.101, TCP port 5001
TCP window size: 153 KByte (default)
------------------------------------------------------------
[ 12] local 10.10.10.100 port 52304 connected with 10.10.10.101 port 5001
[ 8] local 10.10.10.100 port 52300 connected with 10.10.10.101 port 5001
[ 3] local 10.10.10.100 port 52286 connected with 10.10.10.101 port 5001
[ 9] local 10.10.10.100 port 52292 connected with 10.10.10.101 port 5001
[ 7] local 10.10.10.100 port 52296 connected with 10.10.10.101 port 5001
[ 6] local 10.10.10.100 port 52290 connected with 10.10.10.101 port 5001
[ 5] local 10.10.10.100 port 52288 connected with 10.10.10.101 port 5001
[ 10] local 10.10.10.100 port 52294 connected with 10.10.10.101 port 5001
[ 11] local 10.10.10.100 port 52302 connected with 10.10.10.101 port 5001
[ 4] local 10.10.10.100 port 52298 connected with 10.10.10.101 port 5001

root@localhost:~# fmc -x
[ 1011.024803] cpu 1: ! MINOR FM Error [CPU01, /home/jenkins/ci/lsdk/master/all/packages/linux/linux/drivers/net/ethernet/freescale/sdk_fman/src/wrapper/lnxwrp_fm_port.c:256 QmEnqueueCB]: Write Access Failed;
[ 1011.024806] cpu 1: timeout waiting for Tx confirmation
[ 1011.043160] cpu 1:
[ 1011.050490] cpu 1: ! MINOR FM-PCD Error [CPU01, /home/jenkins/ci/lsdk/master/all/packages/linux/linux/drivers/net/ethernet/freescale/sdk_fman/Peripherals/FM/HC/hc.c:225 EnQFrm]: Write Access Failed;
[ 1011.050494] cpu 1: HC enqueue failed
[ 1011.068219] cpu 1:
[ 1011.073983] cpu 1: ! MINOR FM-PCD Error [CPU01, /home/jenkins/ci/lsdk/master/all/packages/linux/linux/drivers/net/ethernet/freescale/sdk_fman/Peripherals/FM/HC/hc.c:1223 FmHcPcdSync]: Write Access Failed;
[ 1011.073985] cpu 1:
[ 1011.092234] cpu 1:
[ 1011.096525] cpu 1: *** ASSERT_COND failed [CPU01, /home/jenkins/ci/lsdk/master/all/packages/linux/linux/drivers/net/ethernet/freescale/sdk_fman/Peripherals/FM/HC/hc.c:216 EnQFrm]
[ 1012.198629] cpu 1: ! MINOR FM Error [CPU01, /home/jenkins/ci/lsdk/master/all/packages/linux/linux/drivers/net/ethernet/freescale/sdk_fman/src/wrapper/lnxwrp_fm_port.c:256 QmEnqueueCB]: Write Access Failed;
[ 1012.198632] cpu 1: timeout waiting for Tx confirmation
[ 1012.216980] cpu 1:
[ 1012.224300] cpu 1: ! MINOR FM-PCD Error [CPU01, /home/jenkins/ci/lsdk/master/all/packages/linux/linux/drivers/net/ethernet/freescale/sdk_fman/Peripherals/FM/HC/hc.c:225 EnQFrm]: Write Access Failed;
[ 1012.224302] cpu 1: HC enqueue failed
[ 1012.242044] cpu 1:
[ 1012.247798] cpu 1: ! MINOR FM-PCD Error [CPU01, /home/jenkins/ci/lsdk/master/all/packages/linux/linux/drivers/net/ethernet/freescale/sdk_fman/Peripherals/FM/HC/hc.c:1107 FmHcKgWriteSp]: Write Access Failed;
[ 1012.247800] cpu 1:
[ 1012.266223] cpu 1:
[ 1012.270499] cpu 1: ! MAJOR FM-PCD Error [CPU01, /home/jenkins/ci/lsdk/master/all/packages/linux/linux/drivers/net/ethernet/freescale/sdk_fman/Peripherals/FM/Pcd/fm_kg.c:2146 FmPcdKgUnbindPortToSchemes]: Write Access Failed;
[ 1012.270501] cpu 1:
[ 1012.290395] cpu 1:
[ 1012.294674] cpu 1: ! MAJOR FM-Port Error [CPU01, /home/jenkins/ci/lsdk/master/all/packages/linux/linux/drivers/net/ethernet/freescale/sdk_fman/Peripherals/FM/Port/fm_port.c:1739 DeletePcd]: Write Access Failed;
[ 1012.294676] cpu 1:
[ 1012.313441] cpu 1:
[ 1012.317721] cpu 1: ! MAJOR FM-Port Error [CPU01, /home/jenkins/ci/lsdk/master/all/packages/linux/linux/drivers/net/ethernet/freescale/sdk_fman/Peripherals/FM/Port/fm_port.c:5334 FM_PORT_DeletePCD]: Write Access Failed;
[ 1012.317723] cpu 1:
[ 1012.337184] cpu 1:
[ 1012.341459] cpu 1: ! MINOR FM Error [CPU01, /home/jenkins/ci/lsdk/master/all/packages/linux/linux/drivers/net/ethernet/freescale/sdk_fman/src/wrapper/lnxwrp_ioctls_fm.c:4680 LnxwrpFmPortIOCTL]: Invalid Operation;
[ 1012.341461] cpu 1: IOCTL FM PORT
[ 1012.360407] cpu 1:
ERR : Invocation of FM_PORT_DeletePCD for fm0/port/MAC/6 failed with error code 0x00010013
root@localhost:~# fmc -x
root@localhost:~# fmc -x
root@localhost:~# uname [ ID] Interval Transfer Bandwidth
[ 12] 0.0-36.1 sec 71.8 MBytes 16.7 Mbits/sec
[ 9] 0.0-36.1 sec 71.7 MBytes 16.7 Mbits/sec
[ 10] 0.0-36.1 sec 71.6 MBytes 16.6 Mbits/sec
[ 11] 0.0-36.1 sec 71.6 MBytes 16.6 Mbits/sec
[ 4] 0.0-36.1 sec 71.6 MBytes 16.6 Mbits/sec
[ 8] 0.0-36.4 sec 71.6 MBytes 16.5 Mbits/sec
[ 3] 0.0-36.4 sec 71.7 MBytes 16.5 Mbits/sec
[ 7] 0.0-36.4 sec 71.6 MBytes 16.5 Mbits/sec
[ 6] 0.0-36.4 sec 71.6 MBytes 16.5 Mbits/sec
[ 5] 0.0-36.4 sec 71.7 MBytes 16.5 Mbits/sec
[SUM] 0.0-36.4 sec 716 MBytes 165 Mbits/sec
-z

root@localhost:~# uname -a
Linux localhost 4.19.46 #1 SMP PREEMPT Thu Jun 27 17:52:01 CST 2019 aarch64 aarch64 aarch64 GNU/Linux

 

0 Kudos
Reply

2,681 Views
LucianZ
Contributor II

Hello @yipingwang, any updates on this topic?

As I mentioned in my reply, the issue is the same; the only difference in your case is a kernel config option.

We're still blocked by this issue.

LucianZ
Contributor II

Hello, I replied a few days ago and then edited my post a bit, and it seems it was somehow lost ...

Back to this topic.

I have looked over your logs and investigated the code, and it looks like in your case the kernel WARN() is not triggered. Probably CONFIG_BUG is disabled (CONFIG_BUG=n) in your build.

 

#ifdef DISABLE_ASSERTIONS
#define ASSERT_COND(_cond)
#else
#define ASSERT_COND(_cond) \
    do { \
        if (!(_cond)) { \
            XX_Print("*** ASSERT_COND failed " PRINT_FORMAT "\r\n", \
                    PRINT_FMT_PARAMS); \
            XX_Exit(1); \
        } \
    } while (0)
#endif /* DISABLE_ASSERTIONS */

void XX_Exit(int status)
{
    WARN(1, "\n\nFMD: fatal error, driver can't go on!!!\n\n");
}
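
To make the effect of the config option easier to see, here is a small user-space sketch of the snippet above. It is only an illustration, not the driver code: SIM_CONFIG_BUG, the simplified WARN(), XX_Print, PRINT_FORMAT and PRINT_FMT_PARAMS definitions, the two counters, and the enqueue_frame() and simulate_failed_enqueue() helpers are all assumptions made up for this example. Only ASSERT_COND and XX_Exit follow the code quoted above.

```c
#include <stdio.h>

/* Hypothetical switch standing in for the kernel's CONFIG_BUG option.
 * Build with -DSIM_CONFIG_BUG=0 to mimic CONFIG_BUG=n. */
#ifndef SIM_CONFIG_BUG
#define SIM_CONFIG_BUG 1
#endif

int assert_msgs; /* "*** ASSERT_COND failed" lines printed */
int warn_count;  /* simulated WARN() reports emitted */

#if SIM_CONFIG_BUG
/* Simplified WARN(): report whenever the condition is true. */
#define WARN(cond, ...) \
    do { \
        if (cond) { \
            warn_count++; \
            fprintf(stderr, "WARNING: " __VA_ARGS__); \
        } \
    } while (0)
#else
/* Without CONFIG_BUG the condition is evaluated but nothing is reported. */
#define WARN(cond, ...) do { (void)(cond); } while (0)
#endif

/* Simplified stand-ins for the FMD print helpers (assumed shapes). */
#define XX_Print(...) (assert_msgs++, printf(__VA_ARGS__))
#define PRINT_FORMAT "[%s:%d %s]"
#define PRINT_FMT_PARAMS __FILE__, __LINE__, __func__

/* Same body as the XX_Exit quoted above: it only warns, it does not stop. */
void XX_Exit(int status)
{
    (void)status;
    WARN(1, "\n\nFMD: fatal error, driver can't go on!!!\n\n");
}

#define ASSERT_COND(_cond) \
    do { \
        if (!(_cond)) { \
            XX_Print("*** ASSERT_COND failed " PRINT_FORMAT "\r\n", \
                     PRINT_FMT_PARAMS); \
            XX_Exit(1); \
        } \
    } while (0)

/* Hypothetical stand-in for a host-command enqueue that timed out. */
int enqueue_frame(void)
{
    return -1;
}

/* Runs the failing path once and returns the number of WARN() reports. */
int simulate_failed_enqueue(void)
{
    int err = enqueue_frame();

    ASSERT_COND(err == 0);
    return warn_count;
}
```

Calling simulate_failed_enqueue() from a small main() should print the ASSERT_COND line in both builds, but the WARNING text only in the default one. That would match the difference between the two console logs in this thread: the assertion fails in both, and only the build with WARN() compiled in shows the backtrace.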

 

If you enable CONFIG_BUG=y, you should get the same signature as in my case.

Even without the kernel WARN(), the MINOR / MAJOR FM-PCD errors are still triggered when fmc -x is used while egress traffic is pending. To us this looks like an instability, and we need to understand its impact.

We are currently creating a production build that uses a dynamic, multi-stage fman/fmc policy/config to control the firewall and data paths. We observed this issue along the way, and with our current understanding it looks like a notable instability.

Can you please provide a deeper analysis of this possible issue?

Thank you,

BR,

Lucian

LucianZ
Contributor II

Hello again,

Back to this topic.

I have analyzed your logs and investigated the code a bit, and it looks like WARN is disabled in your build, so the FMD WARN() is not triggered.

 

 

#ifdef DISABLE_ASSERTIONS
#define ASSERT_COND(_cond)
#else
#define ASSERT_COND(_cond) \
    do { \
        if (!(_cond)) { \
            XX_Print("*** ASSERT_COND failed " PRINT_FORMAT "\r\n", \
                    PRINT_FMT_PARAMS); \
            XX_Exit(1); \
        } \
    } while (0)
#endif /* DISABLE_ASSERTIONS */

void XX_Exit(int status)
{
    WARN(1, "\n\nFMD: fatal error, driver can't go on!!!\n\n");
}

 

 

Maybe in your case the kernel is configured with CONFIG_BUG=n, I am not sure. If you try enabling CONFIG_BUG=y, you should get the same backtrace.

According to the logs, the fman driver does not seem reliable when fmc -x is used during a pending transfer; it triggers MINOR FM-PCD Error, MAJOR FM-PCD Error, and "FMD: fatal error, driver can't go on!!!".

Even without the kernel WARN() backtrace, these FM-PCD MINOR / MAJOR errors still seem to occur, and the fmc tool does not perform the clean operation gracefully.

We are preparing a build for production and have a multi-stage fman/fmc configuration/policy that involves fmc -x, and we want to understand the impact of this instability.

BR,

Lucian

LucianZ
Contributor II

Hello @yipingwang, any updates on this?

Thank you and best regards,

Lucian
