It depends on what kind of operations you are asking about.
The Cortex A9 NEON core can do:
1) 4 single-precision multiplies or multiply-accumulates in 2 cycles
2) 4 of any other single-precision operation in 1 cycle.
In practice it's not difficult to reach 25%-30% of that peak throughput.
For a 1 GHz single-core configuration, that gives 1 GHz * 25% * 4 = 1 GFLOPS for simple operations, and 1 GHz * 25% * 4 / 2 = 500M multiply-accumulates per second.
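The arithmetic above can be written as a small helper. This is only a sketch: the function name and parameters are made up for illustration, and it models nothing beyond the clock rate, the 4-wide lanes, the issue cycles quoted above, and an assumed efficiency factor.

```c
#include <assert.h>

/* Hypothetical helper: estimated sustained throughput in operations per second.
 *   clock_hz      core clock in Hz
 *   lanes         single-precision values per NEON operation (4 here)
 *   cycles_per_op cycles to issue one 4-wide operation
 *                 (1 for simple ops, 2 for multiply / multiply-accumulate)
 *   efficiency    fraction of peak actually reached, between 0 and 1
 */
double neon_ops_per_second(double clock_hz, int lanes,
                           int cycles_per_op, double efficiency)
{
    assert(lanes > 0 && cycles_per_op > 0);
    assert(efficiency >= 0.0 && efficiency <= 1.0);
    return clock_hz * efficiency * lanes / cycles_per_op;
}
```

Calling it with (1e9, 4, 1, 0.25) reproduces the 1 GFLOPS figure, and (1e9, 4, 2, 0.25) reproduces the 500M multiply-accumulates per second.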
Getting closer to peak performance requires a lot of hand-optimized NEON code. 25% of peak is what I usually get from GCC NEON intrinsics.
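For reference, here is a minimal sketch of what a multiply-accumulate loop looks like with NEON intrinsics. The function name `mac_f32` is made up for illustration, and the scalar fallback is there only so the code still builds on targets without NEON. It is the straightforward intrinsics version, not a tuned one.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical example: acc[i] += a[i] * b[i] for i in [0, n). */
void mac_f32(float *acc, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    assert(acc && a && b);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Process 4 single-precision values per iteration. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va   = vld1q_f32(a + i);
        float32x4_t vb   = vld1q_f32(b + i);
        float32x4_t vacc = vld1q_f32(acc + i);
        vacc = vmlaq_f32(vacc, va, vb);   /* vacc + va * vb, per lane */
        vst1q_f32(acc + i, vacc);
    }
#endif
    /* Scalar tail (or the whole loop when NEON is unavailable). */
    for (; i < n; ++i)
        acc[i] += a[i] * b[i];
}
```

On 32-bit ARM with GCC this needs NEON enabled at compile time (for example `-mfpu=neon`). A loop like this spends a large share of its time on loads and stores, which is one reason plain intrinsics code tends to stay well below the peak figures.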