Hello,
I ran some tests using the lwip_iperf_enet_qos_bm example. I chose it because there are both M7 and M4 versions; I am not sure, though, what differs from the non-QoS version.
Some data:
MCUXpresso IDE v24.9 [Build 25] [2024-09-26]
SDK 2.16.100
board RT1170-EVKB
My PC is:
cpu: AMD Ryzen Threadripper 2990WX 3.0 GHz (4.3 GHz turbo)
mem: 32GB DDR4
net: Intel I211 Gigabit Network
OS: Xubuntu Linux 24.04
iperf: 2.1.9
To gauge my PC's NIC performance I first ran iperf against another PC. The throughput of my NIC when communicating with that PC is as follows (all figures in Mbit/s):
To explain: on my PC I ran the server, and on the client I used the tradeoff mode (simultaneous bidirectional, which I labelled COMBO) and the dualtest mode (which runs the RX and TX tests sequentially). I verified that dualtest achieves the same performance as the standalone RX and TX tests.
On the board I ran the M7 build from both flash and RAM, and the M4 build only from RAM, because I could not get that example to run from flash. All the projects are attached (including the M4 flash one).
The results are as follows:
To explain: I measured TCP RX and TX using item 4 of the board's menu, and TCP COMBO using item 3. For UDP I did the same with items 9 and 8.
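For reference, the board-side example appears, as far as I can tell, to be built on the standard lwIP "lwiperf" application. Below is a minimal sketch of how menu items like these could map onto its API; the menu numbering is just the one I described above, the report callback, the remote address and the choice of LWIPERF_TRADEOFF are illustrative, and the real example's wiring may well differ:

```c
#include "lwip/apps/lwiperf.h"
#include "lwip/ip_addr.h"

/* Illustrative report callback: lwiperf calls this when a test run finishes. */
static void iperf_report(void *arg, enum lwiperf_report_type report_type,
                         const ip_addr_t *local_addr, u16_t local_port,
                         const ip_addr_t *remote_addr, u16_t remote_port,
                         u32_t bytes_transferred, u32_t ms_duration,
                         u32_t bandwidth_kbitpsec)
{
    LWIP_UNUSED_ARG(arg);
    LWIP_UNUSED_ARG(report_type);
    LWIP_UNUSED_ARG(local_addr);
    LWIP_UNUSED_ARG(local_port);
    LWIP_UNUSED_ARG(remote_addr);
    LWIP_UNUSED_ARG(remote_port);
    LWIP_PLATFORM_DIAG(("iperf done: %u bytes in %u ms, %u kbit/s\n",
                        (unsigned)bytes_transferred, (unsigned)ms_duration,
                        (unsigned)bandwidth_kbitpsec));
}

/* Hypothetical mapping of menu items to lwiperf start calls. */
static void start_test(int menu_item, const ip_addr_t *remote_pc)
{
    switch (menu_item) {
    case 3: /* TCP "COMBO": bidirectional client test; could equally be
             * LWIPERF_DUAL depending on how the example defines this mode. */
        (void)lwiperf_start_tcp_client(remote_pc, LWIPERF_TCP_PORT_DEFAULT,
                                       LWIPERF_TRADEOFF, iperf_report, NULL);
        break;
    case 4: /* TCP RX/TX: board acts as a plain iperf server on port 5001. */
        (void)lwiperf_start_tcp_server_default(iperf_report, NULL);
        break;
    default:
        /* The UDP tests (items 8 and 9) would use the analogous lwiperf
         * UDP start functions available in newer lwIP releases. */
        break;
    }
}
```

The only point of the sketch is that the test mode the board runs (server vs. client, dual vs. tradeoff) is decided in this call, so it is worth checking which mode each menu item actually starts when comparing against the PC-side iperf flags.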
I am not sure about the COMBO numbers, but I consider the TX and RX figures reliable, and I think they are roughly in line with what one should expect from 1 Gbit Ethernet. The only doubt concerns UDP on the M7, where TX is much lower than RX and also much lower than TX on the M4.
The examples are attached. Can you confirm that you get numbers comparable to mine?
Can you explain why UDP TX is slower on the M7 than on the M4? And why, on the M7, UDP TX is 13 times slower than RX?
Is this performance the maximum achievable by this peripheral + bus + core combination, or is the bottleneck in the software, so that working on it could yield more? And how much more: a few percentage points, or many?
Note that the RAM used in the attached examples is exclusively TCM.
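(As a side note on memory placement: where the Ethernet DMA buffers end up is decided by section placement in the project, not only by which RAM the code runs from. A minimal sketch, assuming the SDK's AT_NONCACHEABLE_SECTION_ALIGN helper from fsl_common.h is used; the buffer name, size and alignment below are illustrative and not taken from the attached projects.)

```c
#include "fsl_common.h"

/* Alignment requirement is an assumption; check what the enet_qos driver expects. */
#define EXAMPLE_BUFF_ALIGNMENT 64U

/* The macro places the buffer in the "NonCacheable" section; whether that maps
 * to DTCM, OCRAM or SDRAM is decided by the linker/memory configuration. */
AT_NONCACHEABLE_SECTION_ALIGN(uint8_t g_exampleRxBuff[1518], EXAMPLE_BUFF_ALIGNMENT);
```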
regards
Max
Hi @mastupristi,
Looking at the projects you attached, it is worth noting that you have software checksumming enabled, which has a significant impact on performance.
Please rerun the iperf tests with the default configuration of the latest SDK (2.16.100) to better understand the results.
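For context, software vs. hardware checksumming in lwIP is normally controlled by the checksum options in lwipopts.h. A minimal sketch of what a hardware-offload configuration could look like is below; the values are illustrative, not copied from any particular SDK version:

```c
/* lwipopts.h (sketch): 1 = compute/verify the checksum in software,
 * 0 = leave it to the MAC's checksum offload engine. */
#define CHECKSUM_GEN_IP      0  /* TX IP header checksum inserted by hardware  */
#define CHECKSUM_GEN_UDP     0  /* TX UDP checksum inserted by hardware        */
#define CHECKSUM_GEN_TCP     0  /* TX TCP checksum inserted by hardware        */
#define CHECKSUM_CHECK_IP    0  /* RX IP header checksum verified by hardware  */
#define CHECKSUM_CHECK_UDP   0  /* RX UDP checksum verified by hardware        */
#define CHECKSUM_CHECK_TCP   0  /* RX TCP checksum verified by hardware        */
```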
BR,
Edwin.
Hi @EdwinHz ,
thank you for your reply. However, I think there might have been some misunderstanding.
To begin with, as I clearly stated in my original message, I used the example from SDK 2.16.100 ("lwip_iperf_enet_qos_bm"). Specifically, I made minimal changes to ensure my tests ran as intended and even attached the projects to my post for your convenience. I am curious: are you suggesting that in the official SDK example, hardware checksum offloading is disabled by default? If that is the case, could you kindly clarify where exactly this setting is configured?
That said, I must point out that the main focus of my inquiry wasn’t about how to increase performance (though that could be interesting too). My main concerns were:
While I appreciate the suggestion about hardware checksum offloading, it doesn't seem to address my specific concerns about performance disparities or theoretical limits. If you still believe it’s relevant, could you please explain how it might account for the discrepancies I observed?
Looking forward to your insights.
Best regards,
Max
Hi @mastupristi,
Thanks for your clarifications. The thing is, the disparity you describe is not aligned with the results we have obtained previously. These are the previous results we have for the CM7:
| CM7          | Throughput |
| ------------ | ---------- |
| Iperf UDP TX | 954 Mbps   |
| Iperf UDP RX | 905 Mbps   |
There seems to be something wrong with the iperf UDP TX figure in your tests; this is why I previously suggested disabling the software checksum.
This can be done under Project Properties > C/C++ Build > Settings > Tool Settings > Preprocessor:
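For illustration, a define added there could gate the lwipopts.h checksum settings along these lines; the symbol name USE_SW_CHECKSUM is hypothetical and only meant to show the mechanism, and the actual symbol used by the SDK example (if any) may differ:

```c
/* Sketch of lwipopts.h gating software vs. hardware checksumming on an
 * IDE-level preprocessor symbol (USE_SW_CHECKSUM is a made-up name). */
#if defined(USE_SW_CHECKSUM) && (USE_SW_CHECKSUM != 0)
#define CHECKSUM_GEN_TCP   1   /* compute TCP checksums in software */
#define CHECKSUM_CHECK_TCP 1   /* verify TCP checksums in software  */
#else
#define CHECKSUM_GEN_TCP   0   /* rely on the ENET_QOS checksum offload */
#define CHECKSUM_CHECK_TCP 0
#endif
```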
BR,
Edwin.
Hi @EdwinHz ,
thank you for your response, but I must admit I’m rather baffled.
First, you suggested I modify the preprocessor settings, pointing to a potential checksum issue. However, as I’ve stated (and as you could easily verify by opening the projects I attached to my original message), the settings in my projects are identical to the ones in the SDK example. Indeed, it could not be otherwise since my projects are almost identical to the SDK project from which I derived them. Did you actually check the projects? Or are you basing your suggestion solely on assumptions?
Here is a screenshot:
Second, you referenced “previous results” showing 954 Mbps for UDP TX and 905 Mbps for UDP RX on CM7. This is intriguing, as it’s the first time I’ve heard about these numbers. Could you please provide more details? For instance:
Finally, I feel like my core questions are still being sidestepped. Let me reiterate:
These are critical questions for evaluating whether further development effort is worthwhile, yet I feel they’ve been overlooked in favor of generic suggestions.
I urge you to review the attached projects thoroughly before suggesting changes again. If something in my setup deviates from your “previous results,” I’d appreciate precise feedback based on actual analysis. Otherwise, this back-and-forth isn’t helping us reach a meaningful conclusion.
Looking forward to receiving a detailed and well-informed response.
Best regards,
Max
Hi @mastupristi,
It seems the issue might lie in the SDK's drivers. These are the performance tests done on both SDK v2.16.100 and v2.16.000:
As you can see, performance in the previous 2.16.0 is much higher and in line with the expected results. This is probably the issue you are seeing, and it is what is causing both the poor UDP TX performance and the CM7's underperformance compared to the CM4.
I will investigate this further with the internal SDK team, but in the meantime, if you want to avoid these issues, please use SDK v2.16.0 instead.
BR,
Edwin.
Hi @EdwinHz
just to look at things from another point of view:
this directly compares SDK 2.16.0 and 2.16.1.
you say:
As you can see, performance in the previous 2.16.0 is much higher and in line with the expected results.
This is only true for ENET 1G.
This is probably the issue you are seeing, and it is what is causing both the poor UDP TX performance and the CM7's underperformance compared to the CM4.
How can you be sure of this statement?
What clues lead you to this conclusion?
Looking at the data, I pick up conflicting clues: in my case (UDP on the CM7, ENET QoS, flash, -O3) RX is 13 times faster than TX, while in your case RX is only 1.29 times faster than TX.
Also, you get much better performance with ENET 1G, but I also get much better performance with QoS; yet your results show a very large difference between the two ENETs under otherwise identical conditions. Does this depend on the peripherals or the driver?
In conclusion, there are still several points that need to be clarified.
I find it hard to believe the problems lie (only) in version 2.16.1.
best regards
Max
Hi @mastupristi,
How can you be sure of this statement?
What clues lead you to this conclusion?
The chart that I previously shared with you shows the expected performance for ENET 1G and ENET QoS with the default example code on both SDK 2.16.0 and 2.16.1, and it does not show a low TX figure like the one in your initial chart.
Does this depend on the peripherals or the driver?
This would have to be dependent on the peripherals, yes.
BR,
Edwin.