Hello,
We are using an i.MX8QM on a custom board we created, with TJA11454ATK/FD (x2) and TJA1042TK/3 as the CAN transceivers. I implemented a set of throughput tests (using pytest) for all interfaces and bitrates. We are using Yocto (Scarthgap) with kernel version 6.6.28 from Freescale, and all CAN interfaces use the flexcan driver. The test software uses cangen and candump to generate and capture the traffic:
candump -tz -T1700 can1 2>/dev/null 1>/tmp/canlog &
cangen -n10000 can1 -I123 -L8 -Di -g0 -p1
Rationale for some of the parameters:
For the physical setup, we have 7 CAN nodes on the bus, with both ends terminated with 120 Ohm resistors. The nodes are either the transceivers mentioned above or PEAK CAN FD adapters. The cable is twisted pair and less than one meter long. The PEAK CAN FD adapters are connected to a regular PC. For the tests I am referencing here, all nodes except the two involved in the test are powered off.
(Hope I didn't forget any important information on the setup)
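One more piece of the setup: before each test, the interface under test is reconfigured over SocketCAN for the [bitrate, dbitrate] combination being tested, roughly along these lines (the 500k/2M values here are only placeholders, not the actual test parameters):
ip link set can1 down
ip link set can1 type can bitrate 500000 dbitrate 2000000 fd on
ip link set can1 up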
When I run the tests across all interfaces, bitrates (25k, 50k, 100k, 125k, 250k, 1M) and dbitrates (1M, 2M, 5M), I have noticed that a specific subset of tests has problems. The tests run in both directions (send and recv, from the perspective of the board). When the board is the sender, all combinations work fine (this is also why I don't expect the issue to be related to cabling). However, when the board is receiving traffic, the tests with a dbitrate of 2M or 5M (and a bitrate of 25k or 50k, on all interfaces can0, can1, can2; most likely not relevant, but good to mention) sometimes fail.
What I mean is that these tests do not fail consistently, but when these parameters are involved there is a high chance they will. (Keep in mind that the send tests have not failed so far.) Example below.
There are multiple assertions in the tests that could fail them; however, all of the failures follow the same pattern:
- Cangen has sent all of the messages
- Candump has finished collecting and exited
- The transmitted-packet counter of the interface (read with ifconfig) is at least 10000 higher after the test than before it (this asserts that at least that many packets were transmitted).
- The received-packet counter of the interface (read with ifconfig) is at least 10000 higher after the test than before it (this asserts that at least that many packets were received). This is where the tests are failing (so the error is unrelated to candump).
A note at this point: ifconfig on both the receiver and the sender reports 0 errored or dropped packets. Even if data frames were lost due to the high bitrates, I would have expected this to be reported.
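To make that check concrete, the failing receive-side assertion boils down to roughly this sketch (reading the same counters ifconfig reports, from /sys/class/net/can1/statistics instead of parsing ifconfig output):
RX_BEFORE=$(cat /sys/class/net/can1/statistics/rx_packets)
candump -tz -T1700 can1 2>/dev/null 1>/tmp/canlog &
# cangen -n10000 ... runs on the sending node in the meantime
wait
RX_AFTER=$(cat /sys/class/net/can1/statistics/rx_packets)
test $((RX_AFTER - RX_BEFORE)) -ge 10000 || echo "FAIL: only $((RX_AFTER - RX_BEFORE)) frames counted on can1"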
This is just an (un-)educated guess, but the problem seems to be related to some low-level RX queue filling up before it can be emptied.
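If it helps to confirm or rule that out, candump's -d option reports frames dropped on its own receive socket and -e dumps error frames, and iproute2 shows the CAN-specific counters in more detail than ifconfig; something along these lines should show whether frames are being lost above the driver:
candump -tz -d -e -T1700 can1 >/tmp/canlog 2>&1 &
ip -details -statistics link show can1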
I am not an expert on CAN, so if you see something that feels off or need more clarification, feel free to ask.
Kind regards
Hello @dervel
This is not a known issue to us; we need to investigate it further. I set up a similar stress test with two i.MX93 nodes, but haven't reproduced the issue. I'll try with the i.MX8QM MEK once I have the board available.
Just want to double check a few facts:
- The transmitted packet count reported by ifconfig always increases by exactly the number you specified in cangen; no failure on this.
- The received packet count reported by ifconfig is equal to the number of packets printed by candump.
- ifconfig never reports packet errors or drops on either side.
- When it fails, the transmitted and received packet counts reported by ifconfig do not match.
- Which [bitrate, dbitrate] combination is most likely to fail? Does it fail more often when the dbitrate is increased?
A few suggestions on steps to narrow down the issue:
- If possible, monitor the bus with a CAN analyzer to see the packets actually transmitted.
- Compare the CAN_TX signal on the transmitting side with the CAN_RX signal on the receiving side using a logic analyzer.
- Power on more nodes on the bus in listen-only mode when running the test (example below). This should tell us whether increased bus load makes it worse, since errors are more likely at higher FD bitrates.
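For example, an extra node can be put into listen-only mode with iproute2, matching the [bitrate, dbitrate] combination under test (the can0 and 500k/2M values below are just an example):
ip link set can0 down
ip link set can0 type can bitrate 500000 dbitrate 2000000 fd on listen-only on
ip link set can0 up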
Regards
Harvey