Fine-tuning guidance for S32K396 (like AN5191 for MPC5777C)

ricardofranca
Contributor III

Hello,

Feel free to jump to the questions at the end of this post, as my experiment description is quite verbose and may be superfluous...

I have been doing some benchmarking with the S32K396-BGA-DC1 running in S32DS for S32. My reference for expected timing behavior is an MPC5777CEVB running the same program in S32DS for Power Architecture. I usually run it at 264MHz, but I also tried it a couple of times at 300MHz (not often, because my chip is the old part number that is not supposed to run at 300MHz).

I believe I am running the MPC5777C in its optimal configuration (stack in cache, optimized Flash read/prefetch), as recommended by AN5191. My program takes about 750us to run at 264MHz and about 650us at 300MHz (it is largely sequential code that does not rely very heavily on caches).

As I am a beginner on S32K3, I inserted my code in the Dio_Example_S32K396 example available in "S32K396 AUTOSAR R21-11 RTD 4.0.0 P14 D2403 Example Projects" - its linker directives allocate code/constants in Flash, data in SRAM and stack in DTCM. I also used the configuration tool to set the following clocks (using the "Mode A+" values presented in Table 122 of the S32K39/37 reference manual):

- PLL_PHI0 = 320MHz
- PLL_PHI1 = 480MHz
- CORE_CLK = 160MHz
- AIPS_PLAT_CLK = 80MHz
- AIPS_SLOW_CLK = 40MHz
- HSE_CLK = 80MHz
- DCM_CLK = 40MHz
- LBIST_CLK = 40MHz
- QSPI_MEM_CLK = 320MHz
- CM7_CORE_CLK = 320MHz

 

While I expected the new MCU to beat the old one due to its faster and more efficient core, I was surprised to see that it ran my program faster than the 264MHz MPC5777C but slower than the 300MHz MPC5777C. Both processors had very similar performance when I allocated all my data in the CPU0 DTCM, which is not something that would work for a larger application that would use most of the SRAM.
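As a rough cross-check of these timings, run time can be converted to core cycles (microseconds times megahertz). The helper below is only an illustrative sketch: the name `cycles_from_us` is hypothetical, and it assumes the quoted clock is the one the core actually runs at.

```c
#include <assert.h>
#include <stdint.h>

/* Core cycles consumed by a run of `time_us` microseconds at `clock_mhz` MHz.
 * Microseconds times megahertz gives cycles directly (1e-6 * 1e6 = 1).
 * Hypothetical helper for illustration, not part of any NXP library. */
uint32_t cycles_from_us(uint32_t time_us, uint32_t clock_mhz)
{
    assert(clock_mhz > 0u);
    assert(time_us <= UINT32_MAX / clock_mhz); /* guard against overflow */
    return time_us * clock_mhz;
}
```

With the figures quoted above, `cycles_from_us(750, 264)` is 198000 and `cycles_from_us(650, 300)` is 195000, so the MPC5777C needs roughly the same number of cycles at both frequencies. A run time between 650us and 750us at a 320MHz CM7_CORE_CLK corresponds to between 208000 and 240000 cycles, which may point to cycles spent waiting on memory rather than executing.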

I did not configure the Flash by hand, but it seems the configuration tool (or the example startup code) set its wait states to 5 (which seems to match the 160MHz CORE_CLK), although it did not enable prefetching. I suppose this is a bit critical for my code, as disabling prefetching on the MPC5777C caused a 100us increase in its execution time.

Having said all this, my questions are:

- Can I enable prefetching using the S32DS configuration tool? If this is not possible, what would be the cleanest way of doing this? I do not know where my _start function comes from...

- Given the 5 wait states of the Flash (vs. 3 of the MPC5777C when its peripheral clock runs at 132MHz or even 150MHz), is it correct to assume the MPC5777C Flash memory performs better than the S32K396 one?

- Is there any document like the AN5191 that provides guidance for extracting the best performance of the S32K396?

 

Thanks!

Ricardo

 

petervlna
NXP TechSupport

Hello,

- Can I enable prefetching using the S32DS configuration tool? If this is not possible, what would be the cleanest way of doing this? I do not know where my _start function comes from...

No. Prefetching is not configurable via the S32DS Configuration Tool (as of the latest RTD and SDK versions).

You’ll need to manually configure the Flash prefetch settings in your startup code.
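A minimal sketch of what such a manual step could look like is a read-modify-write on the flash controller's configuration register. Everything specific here is an assumption: the function name and the two bit masks are placeholders, and the real register address and field positions must be taken from the S32K39/37 reference manual.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL bit masks: placeholders only. Take the real register
 * address and field positions from the S32K39/37 reference manual. */
#define FLASH_CODE_PREFETCH_EN  (1u << 4)
#define FLASH_DATA_PREFETCH_EN  (1u << 5)

/* Set the prefetch enable bits with a read-modify-write so that all
 * other fields of the register keep their current values. */
void flash_enable_prefetch(volatile uint32_t *ctrl_reg)
{
    assert(ctrl_reg != 0);
    uint32_t value = *ctrl_reg;
    value |= FLASH_CODE_PREFETCH_EN | FLASH_DATA_PREFETCH_EN;
    *ctrl_reg = value;
}
```

On target, the function would be called early in startup with a pointer to the actual register; on a host build, a plain `uint32_t` variable can stand in for it. Some flash controllers require such a write to be executed from RAM, so check the reference manual before placing the call.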

- Given the 5 wait states of the Flash (vs. 3 of the MPC5777C when its peripheral clock runs at 132MHz or even 150MHz), is it correct to assume the MPC5777C Flash memory performs better than the S32K396 one?

[Attached image: petervlna_0-1755764279642.png]

  • MPC5777C:

    • Based on Power Architecture, with dual e200z7 cores.
    • Has a more advanced Flash controller with better pipelining and prefetch capabilities.
    • Often used in high-performance automotive and industrial applications.
  • S32K396:

    • Based on Arm Cortex-M7, optimized for flexibility and integration.
    • Flash controller is simpler, and performance is more sensitive to wait states and cache configuration.
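To put the wait-state figures from the question in perspective, a simplified model treats one unbuffered flash read as taking (wait states + 1) cycles of the clock the flash interface runs on. The sketch below uses that assumption only; the function name is hypothetical, and the model ignores line width, buffering, prefetch, and caches.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate time of one flash read in picoseconds, assuming the access
 * takes (wait_states + 1) cycles of a clock running at `clock_mhz` MHz.
 * Simplified model for illustration; real controllers overlap accesses. */
uint32_t flash_access_ps(uint32_t wait_states, uint32_t clock_mhz)
{
    assert(clock_mhz > 0u);
    /* One cycle lasts 1000000 / clock_mhz picoseconds. */
    return ((wait_states + 1u) * 1000000u) / clock_mhz;
}
```

Under this model, 5 wait states at 160MHz give 37500 ps (37.5 ns) per read, while 3 wait states give about 30.3 ns at 132MHz and about 26.7 ns at 150MHz. Whether that difference shows up in a real program depends on how effective the buffers, prefetch, and caches are on each device.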

- Is there any document like the AN5191 that provides guidance for extracting the best performance of the S32K396?

The document linked below is the closest equivalent to AN5191 for the S32K3 family:

https://www.google.com/url?sa=t&source=web&rct=j&opi=89978449&url=https://community.nxp.com/pwmxy876...

Best regards,

Peter

 
