I've got a board based on the i.MX6Q. It's running the 3.0.35 kernel (via Yocto) and is generally pretty stable. My code sends a lot of UDP packets over a gigabit Ethernet interface. With a simple TX-only test loop using a standard socket and sendto(), I get a reasonable 490 Mbit/s on the wire before the CPU load tops out. Top out it does, though, and from a quick dig with perf I get the feeling that most of the cost is the copying and context switching while sending the packets. I'm pretty sure the Ethernet driver itself is fine, as it's the standard e1000e driver and I can easily saturate the gigabit link running multithreaded iperf.
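For reference, a sendto() test loop of this kind might look roughly like the sketch below. It is not the actual test code: the function names `udp_send_burst` and `mbit_per_sec`, the fill pattern, and the 1472-byte payload cap (assuming a 1500-byte MTU) are all illustrative assumptions.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed: largest UDP payload for a 1500-byte MTU (1500 - 20 IP - 8 UDP). */
#define MAX_PAYLOAD 1472

/* Convert bytes sent over `seconds` into Mbit/s (10^6 bits per second). */
double mbit_per_sec(uint64_t bytes, double seconds)
{
    if (seconds <= 0.0)
        return 0.0;
    return (double)bytes * 8.0 / 1e6 / seconds;
}

/*
 * Hypothetical test loop: send `count` datagrams of `len` payload bytes to
 * ip:port, one sendto() per packet.
 * Returns the payload bytes sent, or -1 on any error.
 */
int64_t udp_send_burst(const char *ip, uint16_t port, size_t len,
                       unsigned long count)
{
    char buf[MAX_PAYLOAD];
    struct sockaddr_in dst;
    int64_t total = 0;
    int fd;

    if (len > sizeof(buf))
        return -1;
    memset(buf, 0xA5, sizeof(buf));      /* arbitrary fill pattern */

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &dst.sin_addr) != 1)
        return -1;

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    for (unsigned long i = 0; i < count; i++) {
        /* One syscall and one user-to-kernel payload copy per packet. */
        ssize_t n = sendto(fd, buf, len, 0,
                           (struct sockaddr *)&dst, sizeof(dst));
        if (n < 0) {
            close(fd);
            return -1;
        }
        total += n;
    }

    close(fd);
    return total;
}
```

Timing the call with clock_gettime(CLOCK_MONOTONIC, ...) and passing the byte count and elapsed seconds to mbit_per_sec() gives a throughput figure. It counts UDP payload only, so the rate on the wire, including headers, is slightly higher.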
So, I was planning on converting my code to use PACKET_MMAP to see what can be achieved, as this seems to work well on other platforms. In a quick and dirty test using packet_mmap.c (from "Linux packet mmap - IwzWiki"), I top out at around 265 Mbit/s before the CPU maxes out, which is pretty bad. Note that the same version of the code compiled on my x86 box behaves exactly as expected. If I run "perf top" on the target, I see that about 60-70% of the CPU time is spent in v7_flush_kern_dcache_area, which doesn't seem right. Does anyone have any experience doing something similar on any of the i.MX6 boards / kernels? Unfortunately I don't have the dev kit, so I can't easily test on a later kernel! (If anyone fancies spending 10 minutes replicating this on their board, it would be appreciated!)
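For anyone wanting to replicate this, a PACKET_MMAP TX ring test boils down to roughly the steps sketched below, assuming the default TPACKET_V1 frame layout. This is not the packet_mmap.c from the wiki; the helper names `tx_ring_geometry`, `tx_ring_open`, and `tx_ring_enqueue` are hypothetical, and opening the AF_PACKET socket needs root or CAP_NET_RAW.

```c
#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Fill a ring request. Blocks must be a multiple of the page size, frames a
 * multiple of TPACKET_ALIGNMENT and larger than the TPACKET_V1 header.
 * Returns 0 on success, -1 if the geometry is invalid.
 */
int tx_ring_geometry(struct tpacket_req *req, unsigned frame_size,
                     unsigned block_size, unsigned block_nr,
                     unsigned page_size)
{
    if (frame_size <= TPACKET_HDRLEN || frame_size % TPACKET_ALIGNMENT != 0)
        return -1;
    if (page_size == 0 || block_size % page_size != 0 ||
        block_size < frame_size || block_nr == 0)
        return -1;

    req->tp_frame_size = frame_size;
    req->tp_block_size = block_size;
    req->tp_block_nr = block_nr;
    req->tp_frame_nr = (block_size / frame_size) * block_nr;
    return 0;
}

/* Create the socket and map the TX ring. Returns MAP_FAILED on error. */
void *tx_ring_open(const struct tpacket_req *req, int *fd_out)
{
    void *ring;
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

    if (fd < 0)
        return MAP_FAILED;
    if (setsockopt(fd, SOL_PACKET, PACKET_TX_RING, req, sizeof(*req)) < 0) {
        close(fd);
        return MAP_FAILED;
    }
    ring = mmap(NULL, (size_t)req->tp_block_size * req->tp_block_nr,
                PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (ring == MAP_FAILED) {
        close(fd);
        return MAP_FAILED;
    }
    *fd_out = fd;
    return ring;
}

/*
 * Copy one complete Ethernet frame into a ring slot and mark it for sending.
 * Returns 0 if queued, -1 if the slot is still busy or the packet is too big.
 */
int tx_ring_enqueue(void *frame, unsigned frame_size,
                    const void *pkt, unsigned len)
{
    struct tpacket_hdr *hdr = frame;
    /* TPACKET_V1 TX data starts after the header, minus the sockaddr_ll. */
    unsigned data_off = TPACKET_HDRLEN - sizeof(struct sockaddr_ll);

    if (hdr->tp_status != TP_STATUS_AVAILABLE)
        return -1;
    if (len > frame_size - data_off)
        return -1;

    memcpy((char *)frame + data_off, pkt, len);
    hdr->tp_len = len;
    __sync_synchronize();   /* publish the data before the status word */
    hdr->tp_status = TP_STATUS_SEND_REQUEST;
    return 0;
}
```

After queuing a batch of frames, a call to send(fd, NULL, 0, 0) asks the kernel to transmit every slot marked TP_STATUS_SEND_REQUEST. Binding the socket to the interface with bind() and a struct sockaddr_ll is left out to keep the sketch short.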