Good morning,
We're running a performance test on our 2 platforms, imx7d and imx8mmini, to investigate an issue we're facing.
The test is based on simple communication between 2 threads, implemented with the pthread library, and it works as follows:
1) One thread waits on a condition variable
2) The other thread wakes it up
3) The elapsed time is computed
4) The threads exchange roles
5) The cycle restarts
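The steps above can be sketched roughly as follows. This is a minimal illustration and not our exact test code: the names `run_pingpong`, `worker`, `turn` and `t_signal_ns` are made up for the example, and the choice of `CLOCK_MONOTONIC` for the timestamps is an assumption. A sample is recorded only when the thread actually blocked in `pthread_cond_wait`.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdint.h>
#include <time.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int turn;             /* id (0 or 1) of the thread allowed to proceed */
static int64_t t_signal_ns;  /* timestamp taken just before the last signal */
static int iterations;       /* hand-offs performed by each thread */
static int64_t sum_ns[2];    /* accumulated wake-up time per thread */
static long samples[2];      /* number of measured wake-ups per thread */
static int ids[2] = {0, 1};

static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

static void *worker(void *arg)
{
    int id = *(int *)arg;

    for (int i = 0; i < iterations; i++) {
        int blocked = 0;

        pthread_mutex_lock(&mtx);
        while (turn != id) {             /* 1) wait on the condition variable */
            blocked = 1;
            pthread_cond_wait(&cond, &mtx);
        }
        if (blocked) {                   /* 3) elapsed time since the signal */
            sum_ns[id] += now_ns() - t_signal_ns;
            samples[id]++;
        }
        t_signal_ns = now_ns();
        turn = 1 - id;                   /* 4) exchange roles */
        pthread_cond_signal(&cond);      /* 2) wake the other thread */
        pthread_mutex_unlock(&mtx);
    }                                    /* 5) cycle restarts */
    return NULL;
}

/* Runs the ping-pong and stores the average wake-up time (in us) of each
 * thread in avg_us. Returns 0 on success, -1 if a thread cannot be created. */
int run_pingpong(int iters, double avg_us[2])
{
    pthread_t th[2];

    iterations = iters;
    turn = 0;
    for (int i = 0; i < 2; i++) {
        sum_ns[i] = 0;
        samples[i] = 0;
    }
    for (int i = 0; i < 2; i++)
        if (pthread_create(&th[i], NULL, worker, &ids[i]) != 0)
            return -1;
    for (int i = 0; i < 2; i++)
        pthread_join(th[i], NULL);
    for (int i = 0; i < 2; i++)
        avg_us[i] = samples[i] ? (double)sum_ns[i] / samples[i] / 1000.0 : 0.0;
    return 0;
}
```

Calling `run_pingpong(100000, avg)` from `main` and printing `avg[0]` and `avg[1]` gives one average per thread, comparable to the [T1] and [T2] figures. Build with `-pthread`. Note that the measured time includes re-acquiring the mutex after the wake-up.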
The fact is that we see very different performance between the imx7 and imx8 platforms with this test.
For a deeper analysis, we took these measurements in 3 different conditions:
- threads on the same core
- threads on different cores
- OS decides on its own
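For the first two conditions, a thread can be forced onto a given core with the GNU extension `pthread_setaffinity_np`. The helper below is only a sketch of that approach, with an illustrative name (`pin_thread_to_core`), and is not necessarily how our test sets the affinity. In the third condition no affinity is set at all.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Restricts thread t to a single core.
 * Returns 0 on success or an errno-style error code on failure. */
int pin_thread_to_core(pthread_t t, int core)
{
    cpu_set_t set;

    if (core < 0 || core >= CPU_SETSIZE)
        return EINVAL;

    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(t, sizeof(set), &set);
}
```

For "threads on the same core" both threads would be pinned to the same core number, and for "threads on different cores" to two distinct ones, for example 0 and 1. The call fails if the requested core is offline or outside the process's allowed CPU set.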
The most critical condition, the one in which the 2 platforms perform very differently, is when the threads are on different cores. We even tried taking 2 cores offline on the imx8 platform but, as the data below shows, nothing changes. Can someone help us with this situation? Are we missing something?
 
 
Further details:
We used these system settings to get better system stability for all the tests:
 
- scaling_governor: performance
- dynamic frequency scaling driver: disabled
(i.e. on imx8: echo 0 > /sys/bus/platform/drivers/imx_busfreq/busfreq/enable)
Data
 
imx7d
uname -r: 4.9.11+gf1a31cc

Forcing threads on different cores
[T1] Average is 10 us;
[T2] Average is 14 us;

Forcing threads on same core
[T1] Average is 10 us;
[T2] Average is 14 us;

OS scheduler decides thread-core affinity
[T1] Average is 11 us;
[T2] Average is 15 us;

imx8mmini (all cores online)
uname -r: 4.14.78-imx_4.14.78_1.0.0_ga_dev+g991fec2

Forcing threads on different cores
[T1] Average is 493 us;
[T2] Average is 458 us;

Forcing threads on same core
[T1] Average is 10 us;
[T2] Average is 10 us;

OS scheduler decides thread-core affinity
[T1] Average is 507 us;
[T2] Average is 448 us;

imx8mmini (only 2 cores online)
uname -r: 4.14.78-imx_4.14.78_1.0.0_ga_dev+g991fec2

Forcing threads on different cores
[T1] Average is 474 us;
[T2] Average is 379 us;

Forcing threads on same core
[T1] Average is 10 us;
[T2] Average is 10 us;

OS scheduler decides thread-core affinity
[T1] Average is 480 us;
[T2] Average is 384 us;
 
As the data shows, the difference is significant. We're available to run any further useful tests, just let us know.
Any help would be appreciated, thanks!