Bad HIFI4 Performance on iMX8ULP


ferringer1
Contributor IV

We have the i.MX8ULP eval board and are running a demo app on it:

  • On the HIFI4 core, we have instances of the LC3Plus Encoder/Decoder embedded in the provided imx_audio_framework. 
  • Audio data is provided from the application domain using hardware message boxes for synchronization

These are our observations:

  • Everything works fine. Audio is encoded & decoded correctly
  • Audio data was saved as WAV file to check quality etc - everything as expected
  • Exchanging data with the app domain is not the problem. We measured the message-passing time explicitly.
  • BUT: Performance is VERY BAD! The HIFI4 is running at full speed, i.e. 475MHz, and it takes about 2.5ms to encode and decode one block of audio (16kHz, 16bit, 10ms).
  • We did the same test on an RT600, which also has a HIFI4 core, and there the performance is about a factor of 10 better. One pass takes just approx. 0.27ms!
  • Even when we place data in the DTCM (tightly coupled memory), performance is still a factor of 3 worse. And the TCM is way too small to host all the data of our application.
  • We would expect access to internal RAM to fall somewhere between DDR and TCM, so the performance penalty there will be somewhere between a factor of 3 and 10...
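
To put the numbers above in perspective, here is a minimal sketch in C of the arithmetic behind them. It uses only the figures quoted in this post (2.5ms and 0.27ms per pass, 10ms frames, 475MHz, 16kHz); the function names are made up for illustration, and a mono stream is an assumption, since the channel count is not stated. With these inputs, one pass on the i.MX8ULP uses 25% of real time and roughly 1.19 million core cycles.

```c
#include <assert.h>
#include <stdio.h>

/* Share of real time consumed by one pass: processing time / frame length. */
double dsp_load(double proc_ms, double frame_ms)
{
    assert(frame_ms > 0.0);
    return proc_ms / frame_ms;
}

/* Core cycles spent on one pass (MHz * 1000 = cycles per millisecond). */
double cycles_per_pass(double proc_ms, double clock_mhz)
{
    return proc_ms * clock_mhz * 1000.0;
}

/* Samples per frame and channel. Mono is an assumption, not from the post. */
unsigned samples_per_frame(unsigned rate_hz, unsigned frame_ms)
{
    return rate_hz * frame_ms / 1000u;
}

/* Prints the budget for the figures quoted in the post. */
void print_budget(void)
{
    const double imx_ms = 2.5;     /* i.MX8ULP, data in DDR */
    const double rt600_ms = 0.27;  /* RT600 reference */
    const double frame_ms = 10.0;
    const double cycles = cycles_per_pass(imx_ms, 475.0);
    const unsigned samples = samples_per_frame(16000u, 10u);

    printf("i.MX8ULP load: %.1f %%\n", 100.0 * dsp_load(imx_ms, frame_ms));
    printf("RT600 load:    %.1f %%\n", 100.0 * dsp_load(rt600_ms, frame_ms));
    printf("slowdown:      %.2fx\n", imx_ms / rt600_ms);
    printf("cycles/pass:   %.0f\n", cycles);
    printf("cycles/sample: %.0f\n", cycles / samples);
}
```

Calling print_budget() from main() prints the summary. The RT600 cycle count is left out on purpose, because its DSP clock is not given in this thread.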

Questions:

  • Can you explain this vast deviation?
  • As both RT600 and imx8ulp have the same HIFI4 core, it cannot be the core itself.
  • We suspect it has to do with probably very bad DDR memory access times. Can you confirm this? If so, what can we do about it?
  • We also found some comments in the code, see dsp_wrapper/src/dsp_wrap.c at line 310: there you explicitly exclude certain codecs from processing in the HIFI4 for performance reasons, explicitly and exclusively for the i.MX8ULP! Why is that?
  • Another related question regarding the dsp_wrap.c file: for the i.MX8ULP, you are explicitly choosing CODEC_FSL_MP3_DEC over CODEC_MP3_DEC, and the same for another codec. Why are you doing that? What is the difference between those codec types?
  • Can you comment on the performance of the HIFI4 wrt. I/DTCM, DDR and Internal (Shared) Memory?
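
For anyone who wants to compare the memory regions themselves, the following is a rough sketch of a pointer-chasing read-latency test in portable C. It is not taken from imx_audio_framework, and all names in it are hypothetical. Every load depends on the previous one, so the time per step approximates the access latency of whatever memory holds the table. On the HIFI4 you would additionally have to place the table in the region under test (DTCM, internal RAM, DDR) through your linker setup and would probably read the core's cycle counter instead of clock(); neither is shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Small linear congruential generator, so the sketch needs no rand(). */
uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Fill next[] with a permutation that forms one single cycle over all
 * n entries (Sattolo's algorithm), so the walk visits every slot. */
void build_chain(uint32_t *next, size_t n, uint32_t seed)
{
    assert(n >= 2);
    for (size_t i = 0; i < n; i++)
        next[i] = (uint32_t)i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (lcg_next(&seed) >> 8) % i;  /* 0 <= j < i */
        uint32_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Follow the chain for the given number of steps, starting at slot 0. */
uint32_t chase(const uint32_t *next, size_t steps)
{
    uint32_t idx = 0;
    for (size_t s = 0; s < steps; s++)
        idx = next[idx];  /* each load depends on the previous one */
    return idx;
}

/* Average nanoseconds per dependent load, measured with clock().
 * The final index is stored in *sink so the walk is not optimized away. */
double ns_per_access(const uint32_t *next, size_t steps, uint32_t *sink)
{
    assert(steps > 0);
    clock_t t0 = clock();
    *sink = chase(next, steps);
    clock_t t1 = clock();
    return (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / (double)steps;
}
```

To use it, build a table larger than the data cache with build_chain(), then call ns_per_access() with a step count in the millions, once per memory region. Comparing the results only makes sense with identical table size and step count for every region.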

Thanks and regards,

Markus

5 Replies

ferringer1
Contributor IV

To post the answer from the other support case here: Long story short, DDR performance of HIFI4 is bad because it's running over a NIC. Nothing can be done about it.

I am wondering if this is also true for the Application Domain, as the DDR is accessed via a NIC as well.


ferringer1
Contributor IV

I guess so, thanks!


brian14
NXP TechSupport

Hi @ferringer1

Thank you for contacting NXP Support.

It seems that there is an issue with the data flow; it could be related to the throughput of the audio bus.
To confirm, are you using the i.MX8ULP EVK or a custom board?


ferringer1
Contributor IV
We are using the i.MX8ULP EVK.
We are passing in 10ms worth of audio data via a shared buffer in DDR. Those 10ms worth of audio data are then LC3+ encoded and immediately decoded again, and pushed back to RAM.
To the best of our knowledge, the bad performance only affects the LC3+ algorithms, i.e. the actual processing within the HIFI4 core.
The raw round-trip time of the data, without any LC3+ encoding/decoding, has been measured separately, and doesn't seem to be the problem at all.

brian14
NXP TechSupport

Hi @ferringer1

I noticed that an NXP colleague is already working on your case. If this is correct, would it be OK to close this ticket and let my co-worker continue in your other thread, to avoid any confusion?
