Hello,
We are using an i.MX8QM on a custom board that we designed with TJA11454ATK/FD (x2) and TJA1042TK/3 as the CAN transceivers. I implemented a set of throughput tests (using pytest) for all interfaces and bitrates. We are using Yocto (Scarthgap) with kernel version 6.6.28 from Freescale. All CAN interfaces use the flexcan driver. The test setup uses cangen and candump to generate and capture the traffic:
candump -tz -T1700 can1 2>/dev/null 1>/tmp/canlog &
cangen -n10000 can1 -I123 -L8 -Di -g0 -p1
Rationale for some of the parameters:
- -g0: Chosen on purpose to push the interface to its limit
- -p1: Needed so that cangen polls and retries the send when it receives -ENOBUFS
- Redirection to a file is needed so that the output can be parsed
- Even though I am sending the errors to /dev/null, the missing packets can be seen at the system level (before they reach the application)
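For anyone who wants to reproduce this from a script, here is a minimal sketch of how the two commands above could be built and launched from Python. It is not the actual test code: the function names (`candump_cmd`, `cangen_cmd`, `run_capture`) are hypothetical, and it assumes that both tools run on the same host and that can-utils is installed.

```python
import shlex
import subprocess


def candump_cmd(iface, timeout_ms=1700):
    """Build the candump argv used above (-T exits after timeout_ms without traffic)."""
    return ["candump", "-tz", f"-T{timeout_ms}", iface]


def cangen_cmd(iface, count=10000, can_id="123", length=8, poll_ms=1):
    """Build the cangen argv used above (fixed ID and length, incrementing data, no gap)."""
    return ["cangen", f"-n{count}", iface, f"-I{can_id}", f"-L{length}",
            "-Di", "-g0", f"-p{poll_ms}"]


def run_capture(iface, log_path="/tmp/canlog"):
    """Start candump in the background, run cangen, then wait for candump to exit.

    Assumption: capture and generation happen on the same machine.
    """
    with open(log_path, "w") as log:
        dump = subprocess.Popen(candump_cmd(iface), stdout=log,
                                stderr=subprocess.DEVNULL)
        subprocess.run(cangen_cmd(iface), check=True)
        dump.wait(timeout=30)
    return log_path


if __name__ == "__main__":
    # Print the commands instead of running them, so this is safe to try anywhere.
    print(shlex.join(candump_cmd("can1")))
    print(shlex.join(cangen_cmd("can1")))
```

Building the argument lists separately from running them makes it easy to log the exact command line for each test case.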
For the physical setup, we have 7 CAN nodes on the bus, with both ends terminated with a 120 Ohm resistor. The nodes are either the transceivers listed above or PEAK CAN FD adapters. The cable is a twisted pair and is less than one meter long. The PEAK CAN FD adapters are connected to a regular PC. For the tests I am describing here, all nodes except the two involved in the test are switched off.
(Hope I didn't forget any important information on the setup)
When I run the tests across all interfaces, bitrates (25k, 50k, 100k, 125k, 250k, 1M) and dbitrates (1M, 2M, 5M), I notice that a specific subset of tests has problems. The tests run in both directions (send and recv, from the perspective of the board). When the board is the sender, all combinations work fine (this is also why I don't expect the issue to be related to cabling). However, when the board is receiving traffic, the tests with a dbitrate of [2M, 5M] sometimes fail (with bitrate [25k, 50k], on all interfaces [can0, can1, can2]; most likely not relevant, but worth mentioning).
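To make the size of the test matrix concrete, here is a small sketch that enumerates the combinations described above and picks out the subset that sometimes fails. The names (`build_matrix`, `is_suspect`) are illustrative and not the actual test code, and the filter assumes the affected subset is the recv direction with dbitrate 2M or 5M and bitrate 25k or 50k, as described above.

```python
import itertools

INTERFACES = ["can0", "can1", "can2"]
BITRATES = [25_000, 50_000, 100_000, 125_000, 250_000, 1_000_000]
DBITRATES = [1_000_000, 2_000_000, 5_000_000]
DIRECTIONS = ["send", "recv"]  # from the perspective of the board


def build_matrix():
    """Return every (interface, bitrate, dbitrate, direction) combination."""
    return list(itertools.product(INTERFACES, BITRATES, DBITRATES, DIRECTIONS))


def is_suspect(case):
    """True for the combinations that were observed to fail intermittently (assumed filter)."""
    _iface, bitrate, dbitrate, direction = case
    return (direction == "recv"
            and dbitrate in (2_000_000, 5_000_000)
            and bitrate in (25_000, 50_000))


if __name__ == "__main__":
    matrix = build_matrix()
    suspects = [case for case in matrix if is_suspect(case)]
    print(f"{len(matrix)} combinations in total, {len(suspects)} suspect")
```

A list like this can be passed directly to `pytest.mark.parametrize` so that each combination shows up as its own test case.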
To be clear, the affected tests do not fail every time; rather, when these parameters are involved there is a high chance that they will. (Keep in mind that the send tests have not failed so far.) Example below.

I have multiple assertions that can fail these tests. However, all of the failures follow the same pattern:
- cangen has sent all of the messages
- candump has finished collecting and exited
- The transmitted packet count (from ifconfig on the sending interface) is at least 10000 higher after the test than before it (this asserts that at least that many packets were transmitted).
- The received packet count (from ifconfig on the receiving interface) is at least 10000 higher after the test than before it (this asserts that at least that many packets were received). This is where the tests are failing (so the error is unrelated to candump).
A note at this point is that ifconfig at both the receiver and the sender reports 0 messages with errors or dropped. Even if data frames were lost due to the high bitrates of the interfaces I would have expected this to be reported.
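The counter checks above can be sketched as follows. This is not the actual test code: it reads the standard Linux per-interface counters from `/sys/class/net/<iface>/statistics/` instead of parsing ifconfig output, and the helper names (`read_counters`, `counter_deltas`, `missing_frames`) are hypothetical. The `root` parameter exists only so the helpers can be tried against a fake directory.

```python
from pathlib import Path

SYSFS_NET = Path("/sys/class/net")
# Standard netdev statistics files; the error counters are included to see
# whether lost frames show up anywhere besides rx_packets.
COUNTERS = ("rx_packets", "tx_packets", "rx_errors", "rx_dropped",
            "rx_over_errors", "rx_fifo_errors")


def read_counters(iface, root=SYSFS_NET):
    """Snapshot the selected counters of one interface as a dict of ints."""
    stats_dir = Path(root) / iface / "statistics"
    return {name: int((stats_dir / name).read_text()) for name in COUNTERS}


def counter_deltas(before, after):
    """Per-counter difference between two snapshots."""
    return {name: after[name] - before[name] for name in before}


def missing_frames(deltas, expected=10000):
    """Number of frames that never reached rx_packets (0 if all arrived)."""
    return max(expected - deltas["rx_packets"], 0)


if __name__ == "__main__":
    before = read_counters("can1")
    input("Run the test, then press Enter... ")
    deltas = counter_deltas(before, read_counters("can1"))
    print(deltas, "missing:", missing_frames(deltas))
```

Taking a snapshot before and after each run and comparing the deltas keeps the check independent of whatever traffic the interface saw earlier.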
This is just an (un-)educated guess but the problem seems to be related to some rx queue filling up before it can be emptied at a very low level.
I am not an expert on CAN, so if you see something that feels off or you need more clarification, feel free to ask.
Kind regards