Test command:
dd if=/tmp/Full.run of=/dev/null
4.14 performance:
$ dd if=/tmp/Full.run of=/dev/null
758105+1 records in
758105+1 records out
388149983 bytes (370.2MB) copied, 2.160693 seconds, 171.3MB/s
$ dd if=/tmp/Full.run of=/dev/null
758105+1 records in
758105+1 records out
388149983 bytes (370.2MB) copied, 2.116380 seconds, 174.9MB/s
$ dd if=/tmp/Full.run of=/dev/null
758105+1 records in
758105+1 records out
388149983 bytes (370.2MB) copied, 2.096918 seconds, 176.5MB/s
$ perf record -g dd if=/tmp/Full.run of=/dev/null
758105+1 records in
758105+1 records out
388149983 bytes (370.2MB) copied, 2.197450 seconds, 168.5MB/s
5.10 performance:
$ dd if=/tmp/Full.run of=/dev/null
758105+1 records in
758105+1 records out
388149983 bytes (388 MB, 370 MiB) copied, 3.2701 s, 119 MB/s
$ dd if=/tmp/Full.run of=/dev/null
758105+1 records in
758105+1 records out
388149983 bytes (388 MB, 370 MiB) copied, 3.05876 s, 127 MB/s
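Note that the two dd builds report throughput in different units: the 4.14 output prints "370.2MB" for 388149983 bytes, which is really MiB (388149983 / 1048576 ≈ 370.2), so its "MB/s" figures are MiB/s, while the 5.10 output uses decimal MB/s. A minimal Python sketch that puts both sets of runs on the same scale is shown here. It uses only the byte count and timings quoted above; the perf record run is left out on the assumption that profiling adds overhead, and the helper name mean_mb_per_s is purely illustrative.

```python
# Byte count and elapsed times copied from the dd output above.
SIZE_BYTES = 388149983

runs = {
    "4.14": [2.160693, 2.116380, 2.096918],  # perf record run excluded
    "5.10": [3.2701, 3.05876],
}


def mean_mb_per_s(seconds_list, size_bytes=SIZE_BYTES):
    """Average throughput in decimal MB/s (1 MB = 10**6 bytes)."""
    rates = [size_bytes / s / 1e6 for s in seconds_list]
    return sum(rates) / len(rates)


# Same unit for both kernels, so the numbers are directly comparable.
summary = {kernel: mean_mb_per_s(times) for kernel, times in runs.items()}

# Fraction of 4.14 throughput that is lost on 5.10.
slowdown = 1 - summary["5.10"] / summary["4.14"]

for kernel, rate in summary.items():
    print(f"{kernel}: {rate:.1f} MB/s")
print(f"5.10 is {slowdown:.0%} slower than 4.14")
```

Measured in the same unit, the gap is somewhat larger than the raw dd figures suggest: 5.10 comes out at about one third lower throughput than 4.14.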
After diffing the perf profiles of the two kernels, the report shows that __copy_tofrom_user() accounts for a much larger share of the time on 5.10 than on 4.14.
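One thing worth checking: dd was run without bs=, so it uses 512-byte blocks (758105 full records plus one partial record add up to 388149983 bytes). That is roughly 758 thousand read() calls, each ending in a kernel-to-user copy, so the test is sensitive to per-call copy overhead. The following Python sketch is one way to see whether the regression shrinks with larger blocks. The function names and the list of block sizes are illustrative assumptions, not part of the original test.

```python
import os
import time


def read_throughput(path, block_size):
    """Read `path` sequentially with raw read() calls of `block_size` bytes.

    Returns (total_bytes, read_calls, elapsed_seconds).
    """
    total = 0
    calls = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        while True:
            chunk = os.read(fd, block_size)  # one read() syscall per loop
            if not chunk:                    # empty result means end of file
                break
            total += len(chunk)
            calls += 1
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return total, calls, elapsed


def compare_block_sizes(path, sizes=(512, 4096, 65536, 1 << 20)):
    """Print throughput per block size; 512 matches dd's default."""
    for bs in sizes:
        total, calls, secs = read_throughput(path, bs)
        print(f"bs={bs:>8}: {calls} reads, {total / secs / 1e6:.1f} MB/s")
```

Calling compare_block_sizes("/tmp/Full.run") on both kernels would show whether the gap is mostly per-call (it narrows as the block size grows) or per-byte (it stays about the same). The same comparison can be made with dd itself by adding bs=1M.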
Please use kmalloc(), dma_map_single() and copy_to_user() in your application.
In addition, the latest NXP SDK release for the T1042 is available at https://github.com/nxp-qoriq/yocto-sdk/tree/dunfell. Please check whether the kernel in this release gives you better performance.