In the S32K14x, why start stack in SRAM_U?

Joey_van_Hummel
Contributor III

In the S32K14x series, SRAM is divided into two regions: SRAM_L and SRAM_U. Documentation explicitly mentions that SRAM_L extends downwards. Depending on the SRAM size, SRAM_L start address is lower to facilitate larger memories for SRAM_L. Documentation also states that SRAM_U extends upwards.

Documentation also states:

Misaligned accesses across the 2000_0000h boundary are not supported in the Arm Cortex-M4F architecture.

and

Burst accesses cannot occur across the 2000_0000h boundary that separates the two SRAM arrays. The two arrays should be treated as separate memory ranges for burst accesses.

Also take into account that

Accesses to the SRAM_L and SRAM_U memory ranges outside the amount of RAM on the chip causes the bus cycle to be terminated with an error followed by the appropriate response in the requesting bus master.

Taking all this into account, when looking at software development, SRAM_L to me sounds like the perfect place to put the stack. It extends downwards, so the starting address can be the same for any of the S32K14x series chips. SRAM_U is perfect for the heap in my eyes, as again the starting address can be the same for any of the S32K14x series chips.

This setup also steers clear of crossing the 0x2000_0000 boundary by growing away from it. In addition, if I ever cause a stack overflow, it will be detected: instead of causing heap corruption, the stack in my setup grows away from the heap and into reserved memory ranges.
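To make the proposed layout concrete, here is a minimal C sketch of the address arithmetic. The type `sram_layout_t` and all function names are hypothetical, and the sizes a caller passes in are illustrative placeholders, not values from a datasheet. It only models the nominal RAM range and says nothing about what the hardware does outside it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fixed boundary between SRAM_L and SRAM_U, as quoted from the documentation. */
#define SRAM_BOUNDARY 0x20000000u

/* Hypothetical description of one part; the sizes are placeholders. */
typedef struct {
    uint32_t sram_l_size; /* bytes below the boundary (SRAM_L) */
    uint32_t sram_u_size; /* bytes at and above the boundary (SRAM_U) */
} sram_layout_t;

/* Lowest nominal SRAM_L address: SRAM_L extends downwards from the boundary. */
uint32_t sram_l_base(const sram_layout_t *p)
{
    assert(p->sram_l_size <= SRAM_BOUNDARY); /* avoid unsigned underflow */
    return SRAM_BOUNDARY - p->sram_l_size;
}

/* One past the highest nominal SRAM_U address: SRAM_U extends upwards. */
uint32_t sram_u_limit(const sram_layout_t *p)
{
    return SRAM_BOUNDARY + p->sram_u_size;
}

/* Initial stack pointer for the proposed layout. The Cortex-M stack is
 * full-descending, so SP starts at the boundary itself and the first
 * pushed word lands at 0x1FFFFFFC, regardless of the SRAM_L size. */
uint32_t proposed_initial_sp(void)
{
    return SRAM_BOUNDARY;
}

/* True when addr lies inside the nominal RAM range of the part. */
bool in_nominal_sram(const sram_layout_t *p, uint32_t addr)
{
    return addr >= sram_l_base(p) && addr < sram_u_limit(p);
}
```

With this model, the stack top and the start of the upward-growing region are both `0x20000000` for every part; only `sram_l_base()` and `sram_u_limit()` change with the memory size.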

All seems clear as day.

However, I thought I'd take a look at S32DS' included example projects, and noticed the opposite is done there. The setup is more traditional, with stack starting at the top of SRAM_U, and the heap starting at the bottom of SRAM_L. So, heap and stack grow towards each other and towards the 0x2000_0000 boundary.

Example.png

To me this makes no sense, and I was wondering if anyone could shed some light on why NXP decided to utilize the memory in this way. The argument could be made for having flexible heap and stack sizes, but SRAM_L and SRAM_U are not contiguous memory devices. The boundary limitation sounds like a pain, and the setup I described above provides robustness and portability benefits that NXP's setup does not.

So, my question is: what could the arguments for NXP's setup be?

johnadriaan
Contributor III

Joey,

My five cents (Australia did away with the 1¢ and 2¢ coins decades ago):

I like your two diagrams that show the two areas either converging or diverging. I really like your idea of using the same start addresses for both, regardless of actual chip being used. However...

Ignoring the heap's actual start address as pointed out by danielmartynek, what your second diagram shows is that you're limiting both the heap and stack to less than what could be available. If not all the stack is required, the heap can't use it, and vice versa. The first diagram shows that a large heap and small stack, or vice versa, can co-exist gracefully.

In your version, if either "overflows" you get a hard crash when the processor accesses invalid memory. In the first version, you get a "soft" crash if the two memory areas collide - who knows what will happen! Which is "preferred" is up to the project...
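For the converging layout, nothing in hardware flags the collision by default, so the check would have to be done in software. A minimal sketch, assuming the caller can obtain the current heap end and stack pointer (both function names here are hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Bytes left between the end of the heap (growing up) and the stack
 * pointer (growing down). Returns 0 once the two have met or overlapped. */
uint32_t stack_heap_gap(uint32_t heap_end, uint32_t stack_pointer)
{
    return (stack_pointer > heap_end) ? (stack_pointer - heap_end) : 0u;
}

/* True when the heap and the stack have collided. */
bool stack_heap_collided(uint32_t heap_end, uint32_t stack_pointer)
{
    return stack_heap_gap(heap_end, stack_pointer) == 0u;
}
```

A check like this only catches the problem at the points where it is called, which is exactly why the collision is a "soft" failure rather than an immediate fault.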

While your point about the boundary between the two memory areas is relevant, what you seem to have missed is that it's only for mis-aligned accesses: e.g. a 32-bit access at 0x1FFF'FFFE. Normal aligned accesses would flow over the boundary without a hiccup. Think of it as a tram track in the path that you don't want to step on rather than a brick wall.
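The distinction can be written down as a small C sketch. The function names are made up for illustration, and access sizes are assumed to be 1, 2 or 4 bytes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SRAM_BOUNDARY 0x20000000u

/* True when addr is not a multiple of the access size. */
bool is_misaligned(uint32_t addr, uint32_t size)
{
    assert(size == 1u || size == 2u || size == 4u);
    return (addr % size) != 0u;
}

/* True when the byte range [addr, addr + size) starts below the boundary
 * and ends above it. The sum is done in 64 bits to avoid wrap-around. */
bool straddles_boundary(uint32_t addr, uint32_t size)
{
    return addr < SRAM_BOUNDARY && ((uint64_t)addr + size) > SRAM_BOUNDARY;
}

/* The problematic case: a misaligned access that spans both arrays. */
bool is_unsupported_access(uint32_t addr, uint32_t size)
{
    return is_misaligned(addr, size) && straddles_boundary(addr, size);
}
```

Because `0x20000000` is a multiple of 4, an aligned 1-, 2- or 4-byte access can never straddle it: a 32-bit access at `0x1FFFFFFC` ends exactly at the boundary, while the one at `0x1FFFFFFE` is the tram-track case.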

It's really, really hard to organise the stack such that it uses misaligned accesses (the C/C++ compiler will never allow that to happen), and while easier to happen in the heap, again the compiler usually organises structures so that the multi-byte fields within them are aligned. So, I guess a mis-aligned struct as a local variable could cause a stack problem too. Do you explicitly pack your structs? Or have you rearranged them to avoid compiler padding?

The burst access point is interesting. It says it "cannot" happen, rather than unsupported. I'm unsure what would happen if an attempt is made - it'd probably just process it as two bursts, or perhaps stop the burst and continue in non-burst mode. But again, this is not something that normal C/C++ code has to worry about.

Are you using any low-level driver code, or writing in assembly language?

John

Joey_van_Hummel
Contributor III

Hi John,

Thanks for your reply. I've only started reading the S32K and ARM documentation a few days ago. I come from the S32's older siblings S12 and S08. There's a lot of information to process and there are bound to be things I'll have missed, but I feel like the transition will be well worth the effort.

About detecting overflows, this was more or less a brainfart. We can of course use the MPU to detect overflows, and we don't have to use the bus-faulting mechanism as I described above. I figured it'd be a nice extra, but I can't for the life of me figure out what the added benefit would be.

As for your comment about the boundary: I did see that it was solely for mis-aligned accesses, but coming from the S08 and S12 family, we're used to always using the smallest necessary variable size to accommodate its usage. Usually this meant using 8- or 16-bit variables all over the place. I assume using 8/16-bit variables in a 32-bit processor has a detrimental effect on efficiency, and we should just avoid using smaller variables wherever possible. I actually don't know for sure whether or not the compiler pads 8- and 16-bit variables. I seem to remember from an ARM workshop in college that it doesn't, but that was a while ago. I'll read up on that. We never use packing of structs. We just let the compiler decide whatever is optimal, so we should be fine in that regard.

We do write low-level drivers and we do occasionally write code in assembly. My colleague and I have a strong bias towards writing completely bare-metal code, so we'll probably not be utilizing any CMSIS or other abstraction layers except our own.

I really appreciate your and Daniel's input. So far I've learned a lot. Switching architectures isn't trivial. The other day I was trying to grasp why the PC is initialized with an offset of +1. This was absurd to me, since in the S08 and S12 this would be wrong. Turns out, bit[0] indicates that the address contains Thumb instructions. Same apparently goes for interrupt vector entries.

Thanks and kind regards,

Joey

johnadriaan
Contributor III

Joey,

I used an S08 chip a long time ago, when I was still doing most of my work on the x86. Memories! I too am doing everything bare metal - I found that by the time I understood the way CMSIS stuff worked, I almost had to fully understand the relevant peripheral's register set completely anyway, so I might as well talk to the peripheral myself.

As I understand it, an 8- or 16-bit access is no more problematic for an ARM than a 32-bit. The bus transaction is aligned 32-bit, but the hardware auto-shifts the necessary 8- or 16-bit quantity to where it's needed. A struct that only had 8- and 16-bit quantities would only need to worry about the alignment of 16-bit values, of course. My experience has been that the compiler will put two 16-bit values in the same 32-bit word, or four 8-bit values, or 2x8+1x16, or 1x16+2x8 - but not an 8, 16 then 8. That would pad as 8,(8),16,8 if you know what I mean.
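A quick way to see the padding for yourself is `offsetof` and `sizeof`. This is only a sketch: the struct and function names are invented for the example, and the layout described in the comments assumes an ABI where `uint16_t` has 2-byte alignment, which the `static_assert` checks at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption of this example: 16-bit values are 2-byte aligned. */
static_assert(_Alignof(uint16_t) == 2, "example assumes 2-byte alignment for uint16_t");

/* 16, 8, 8: the members fit one 32-bit word with no padding in between. */
struct ordered_16_8_8 {
    uint16_t a;
    uint8_t  b;
    uint8_t  c;
};

/* 8, 16, 8: the compiler inserts a padding byte after 'a' so that 'b'
 * is 2-byte aligned, i.e. the 8,(8),16,8 layout, plus tail padding. */
struct ordered_8_16_8 {
    uint8_t  a;
    uint16_t b;
    uint8_t  c;
};

/* Total padding in the 8,16,8 struct: its size minus the 4 bytes of members. */
size_t padding_bytes_8_16_8(void)
{
    return sizeof(struct ordered_8_16_8)
         - (sizeof(uint8_t) + sizeof(uint16_t) + sizeof(uint8_t));
}
```

Printing `offsetof(struct ordered_8_16_8, b)` and the two `sizeof` values on the target toolchain shows where the compiler actually placed the members.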

As for the Thumb +1 trick, yes: it takes a little while to get used to ignoring the LSb (odd) bit for code pointers (not data ones!). Function pointers and the like have to be handled with kid gloves...
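In code, handling the Thumb bit comes down to masking or setting bit[0] of a code address. A small sketch with hypothetical helper names, operating on plain 32-bit values rather than real function pointers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit[0] of a code address or vector entry marks Thumb state. */
#define THUMB_BIT 0x1u

/* True when a vector entry or code pointer value has the Thumb bit set. */
bool has_thumb_bit(uint32_t code_pointer)
{
    return (code_pointer & THUMB_BIT) != 0u;
}

/* Address where the first instruction actually lives (bit[0] cleared). */
uint32_t instruction_address(uint32_t code_pointer)
{
    return code_pointer & ~THUMB_BIT;
}

/* Value to store in a vector table entry for code at the given address. */
uint32_t vector_entry_for(uint32_t instr_addr)
{
    assert((instr_addr & THUMB_BIT) == 0u); /* instructions are halfword aligned */
    return instr_addr | THUMB_BIT;
}
```

For example, a handler whose first instruction sits at the made-up address `0x00000410` would appear in the vector table as `0x00000411`. Data pointers never get this treatment.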

John

danielmartynek
NXP TechSupport

Hello Joey,

Could you specify the version of S32DS and the examples that you are referring to?
I have gone through several of them in S32DS 2.2 and I see that it is as follows.

pastedImage_1.png

m_data_2 (SRAM_U):
The Stack starts at the top of the SRAM_U region and grows towards 0x20000000
The Heap starts at the bottom of the SRAM_U region and grows towards 0x20007000

m_data (SRAM_L):

Contains the Interrupt vector table (if enabled), global variables, and a code_ram section for SRAM routines.

And it makes sense, because each region uses a different bus.
It allows accessing both SRAM_U(data) and SRAM_L(code) simultaneously.

pastedImage_2.png

pastedImage_3.png

AN4745 Optimizing Performance on Kinetis K-series MCUs

pastedImage_5.png

Regards,

Daniel

Joey_van_Hummel
Contributor III

Hi Daniel,

Thanks for your reply. I've wrongly assumed the global variables were placed on the heap. I saw global vars placed in SRAM_L and concluded the heap was being set there. I've taken another look and you're right. Heap is placed at 0x2000_0000. Sorry for the misdirection.

Would you say my setup as I described it in my first post (stack growing down from 0x1FFF_FFFF, .data growing up from 0x2000_0000) has any other downsides or technical limitations? We do not use dynamic memory allocations. We also do not care much about access latencies and wait-states when executing from RAM (which is rarely). We mostly want to avoid crossing the 0x2000_0000 border while freeing up as much space for .data and stack, and make use of the reserved space around SRAM to detect any overflows.

With kind regards,

Joey

danielmartynek
NXP TechSupport

Hi Joey,

Sorry for the delayed response, there is one more thing.

Although the RM states this

Accesses to the SRAM_L and SRAM_U memory ranges outside the amount of RAM on the chip causes the bus cycle to be terminated with an error followed by the appropriate response in the requesting bus master.

It is not recommended to rely on this behavior; such accesses should be avoided instead.

In fact, there is an internal memory region at the end of SRAM_U that is mapped and does not trigger a fault exception when accessed.

pastedImage_3.png

The MCU has an MPU module that can be used for the overflows.

Thanks,

BR, Daniel

Joey_van_Hummel
Contributor III

Hi Daniel,

Thanks for taking the time to come back. I'll stick to the traditional setup.

Kind regards,

Joey
