So, I was reading in App Note AN4745 that SRAM_LOWER is faster and that we should try to use it as much as possible, but the MCUXpresso default appears to be to use SRAM_UPPER first.
"All Kinetis K-series devices include two blocks of on-chip SRAM. The first block (SRAM_L) is mapped to the CODE bus, and the second block (SRAM_U) is mapped to the system bus. The memory itself can be accessed in a single cycle, but because instruction accesses to the system bus incurs a one clock delay at the core, SRAM_U instruction accesses take at least two clocks. SRAM_L is the only memory where code or data can be stored and the core is almost always guaranteed a single cycle access. For this reason, it makes sense to use the SRAM_L block as much as possible. This is a good area for storing critical code."
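The app note's advice to keep critical code in SRAM_L can be sketched with MCUXpresso's section macros. This assumes that cr_section_macros.h (the header that also provides __NOINIT(bank)) defines __RAMFUNC(bank), and that the managed linker script recognizes the SRAM_LOWER bank name. IN_SRAM_LOWER and critical_checksum() are hypothetical names for illustration. When the header is not available, for example in a host build, the macro expands to nothing so the function still compiles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Use MCUXpresso's section macros when available (assumed to provide
 * __RAMFUNC(bank)); otherwise fall back to no placement attribute. */
#if defined(__has_include)
#  if __has_include(<cr_section_macros.h>)
#    include <cr_section_macros.h>
#    define IN_SRAM_LOWER __RAMFUNC(SRAM_LOWER)
#  endif
#endif
#ifndef IN_SRAM_LOWER
#  define IN_SRAM_LOWER /* host build: default .text placement */
#endif

/* Hypothetical time-critical routine: 8-bit additive checksum.
 * On target, IN_SRAM_LOWER should make the startup code copy it into
 * SRAM_L so instruction fetches use the single-cycle CODE-bus path. */
IN_SRAM_LOWER uint8_t critical_checksum(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0u);
    uint8_t sum = 0u;
    for (size_t i = 0u; i < len; i++) {
        sum = (uint8_t)(sum + buf[i]);
    }
    return sum;
}
```

Only code that is genuinely on a critical path is worth placing this way, since every byte of it also consumes SRAM_L.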
I was running out of RAM (heavy GUI application), so I went into FreeRTOS (heap_1.c), added the following include statement, and changed one line of code. This should run faster for me, right?
//static uint8_t ucHeap[ configTOTAL_HEAP_SIZE ];              // removed this line
__NOINIT(SRAM_LOWER) uint8_t ucHeap[ configTOTAL_HEAP_SIZE ];  // added this line
Does anyone see any potential issues with this? I haven't noticed any yet.
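One way to confirm that the linker really placed ucHeap in SRAM_L is a runtime range check. The sketch assumes the MK22FN512 layout used by MCUXpresso (SRAM_LOWER at 0x1FFF0000 and SRAM_UPPER at 0x20000000, 64 KB each); check those values against your project's memory configuration. range_in_sram_lower() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed MK22FN512 SRAM_L region: 0x1FFF0000..0x1FFFFFFF (64 KB). */
#define SRAM_LOWER_BASE 0x1FFF0000u
#define SRAM_LOWER_SIZE 0x00010000u

/* True if [start, start + len) lies entirely inside SRAM_L.
 * Written to avoid overflow when computing the end address. */
bool range_in_sram_lower(uintptr_t start, size_t len)
{
    assert(len > 0u);
    return start >= SRAM_LOWER_BASE &&
           len <= SRAM_LOWER_SIZE &&
           start - SRAM_LOWER_BASE <= SRAM_LOWER_SIZE - len;
}
```

On target it could be called once before the scheduler starts, e.g. assert(range_in_sram_lower((uintptr_t)ucHeap, sizeof ucHeap));. Note also that __NOINIT means the startup code does not zero the array; heap_1 keeps its bookkeeping (the next-free offset) in a separate static variable, so it should not depend on ucHeap being zeroed.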
Hi Myke (yes, my CPU is the MK22FN512).
I just read that and agree that is one way to go.
As for heap1 vs. heap4, that is just one of my firmware preferences: when the design allows memory never to be freed, I use heap1. It keeps my test cases smaller because I don't have to test for fragmentation and leak issues.
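The reason heap1 needs no fragmentation or leak testing is that it is essentially a bump allocator: each request is carved off the end of a static array and nothing is ever returned. The sketch illustrates the idea only; it is not the FreeRTOS source, and HEAP_SIZE, HEAP_ALIGN and the bump_* functions are hypothetical (the 8-byte alignment is assumed to match what the Cortex-M4 ports typically use).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_SIZE  256u  /* hypothetical; stands in for configTOTAL_HEAP_SIZE */
#define HEAP_ALIGN 8u    /* assumed port byte alignment */

static _Alignas(HEAP_ALIGN) uint8_t heap[HEAP_SIZE];
static size_t next_free = 0u;  /* offset of the first unused byte */

/* Hand out the next aligned chunk, or NULL if the request cannot fit. */
void *bump_alloc(size_t size)
{
    if (size == 0u) {
        return NULL;
    }
    /* Round up to a multiple of HEAP_ALIGN; detect wrap-around. */
    size_t rounded = (size + (HEAP_ALIGN - 1u)) & ~(size_t)(HEAP_ALIGN - 1u);
    if (rounded < size || rounded > HEAP_SIZE - next_free) {
        return NULL;
    }
    void *p = &heap[next_free];
    next_free += rounded;
    return p;
}

/* heap1 style: freeing is a no-op, so there is nothing to fragment. */
void bump_free(void *p) { (void)p; }

size_t bump_remaining(void) { return HEAP_SIZE - next_free; }
```

Because allocation only ever moves next_free forward, the worst case is simply "the heap runs out", which is easy to test for at startup.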
Do you agree that this LOWER block is faster?
Yes, my CPU is MK22FN512
I have to pay attention to the subject line more, duh. I had the feeling you were talking about the K22 but I didn't see that in the text of your posts.
Thank you for pointing out AN4745.pdf. I hadn't seen it before, and it offers some interesting information (it's now in my folder of useful app notes).
I just deleted a long-winded answer to your question:
Do you agree that this LOWER block is faster?
and I realized that it's probably better to simply state the answer as:
I don't care which block is faster.
When somebody asks about MCU application "speed", the question I ask is what the requirements are for the response time to different stimuli. When you are working with an MCU with a UI, sensors, actuators, and communications, the "speed of operation" is more appropriately defined as the "response time" to different inputs, and the question becomes: does the application respond in an appropriately timely manner?
When doing part selection for an application, I am looking for a device that meets the response time requirements without doing anything special. If a part number is so marginal for the application requirements that I have to worry about the number of clock cycles per block read to meet the requirements, I'm going to look at other devices.
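To put the "clock cycles per block read" concern in numbers: an extra wait state only matters if it pushes a handler past its response-time requirement. This sketch assumes the K22's 120 MHz maximum core clock; cycles_to_ns() and meets_deadline() are hypothetical helpers, and the cycle counts in the checks are made-up figures, not measurements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Convert a cycle count to nanoseconds (truncated) at a given core clock.
 * 64-bit math keeps cycles * 1e9 in range for any realistic handler. */
uint64_t cycles_to_ns(uint64_t cycles, uint32_t core_hz)
{
    assert(core_hz != 0u);
    return (cycles * 1000000000ull) / core_hz;
}

/* Does a handler costing base_cycles, plus extra_wait_cycles of memory
 * wait states, still respond within deadline_ns? */
bool meets_deadline(uint64_t base_cycles, uint64_t extra_wait_cycles,
                    uint32_t core_hz, uint64_t deadline_ns)
{
    return cycles_to_ns(base_cycles + extra_wait_cycles, core_hz) <= deadline_ns;
}
```

For example, 1,200 extra one-cycle penalties at 120 MHz add 10 µs. Against a 100 µs requirement that only matters if the handler was already close to the limit, which is the "marginal part" situation described above.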
Last point on the heap model. I looked at this some time ago and decided to stick with Heap 4 for FreeRTOS. I think we agree that MCU application code shouldn't call malloc/free (memory leaks are too much of a danger for an embedded device with no MMU, regardless of how good a programmer you are), but that doesn't mean RTOS resources don't use these functions. When I looked at the FreeRTOS code, the messaging functions (queues, semaphores, mutexes, notifications, etc.) DO use mallocs and frees to handle different message sizes. If you use Heap 1, I seem to remember that the code for providing buffers for these functions ends up more complex and takes longer to execute than with Heap 4. There may be cases where, if you use a single message size and limit the functions you use, Heap 1 will be more efficient than Heap 4, but that will reduce your options significantly. You should probably ask this question in the FreeRTOS community forum.
Yeah, I wasn't clear in my question regarding speed. What I was really concerned with was the "risk level" of this type of change in a "maintenance release" of this code. I have 6 months of perfectly running released code, and then I go and move RAM banks around. Normally, this kind of change raises my "Spidey Senses".
I will look into the fixed buffer complexity on heap1 because I hadn't thought of that. The code has been running great for 2 days straight so I am going to put it through some more testing but the RAM move appears harmless.
Thanks for your comments and insight, Gary
I would be concerned about the level of risk of specifying blocks in a "maintenance release" as well. Not so much that it is a concern now (it sounds like you're testing and it's going well), but down the road, when somebody (including you) changes or adds to the code, it doesn't work, and you have to re-remember why you specified the blocks and how that mechanism worked, and then figure out how to fix it...
It's much easier turning the whole Kinetis SRAM into one block and, other than ensuring the hardware buffers don't cross the block boundary, don't worry about how the SRAM is used.
This issue came up a couple of months ago in this discussion: "Simple way to reallocate USB stack to SRAM_LOWER". Personally, I recommend redefining SRAM_UPPER so that all the SRAM in the system is available in that software block, with the caveat that all buffers used by hardware (i.e., DMA, USB, networking, etc.) are aligned (as covered in that discussion) so they don't cross the boundary between the two SRAMs. I believe it's the simplest way to make all the SRAM built into the system available to an application.
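The "don't cross the boundary" rule for hardware buffers can be checked directly. On Kinetis K-series parts SRAM_L ends at 0x1FFFFFFF and SRAM_U starts at 0x20000000, so a buffer is safe if it lies entirely on one side. Aligning each buffer to its own power-of-two size is a shortcut to the same result: 0x20000000 is a multiple of every such size up to 512 MB, so an aligned buffer cannot straddle it. crosses_sram_boundary() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* First address of SRAM_U on Kinetis K-series devices. */
#define KINETIS_SRAM_U_START 0x20000000u

/* True if [start, start + len) begins in SRAM_L and ends in SRAM_U. */
bool crosses_sram_boundary(uintptr_t start, size_t len)
{
    assert(len > 0u);
    uintptr_t last = start + (len - 1u);  /* last byte of the buffer */
    return start < KINETIS_SRAM_U_START && last >= KINETIS_SRAM_U_START;
}
```

On target this could guard each DMA or USB buffer at startup, e.g. assert(!crosses_sram_boundary((uintptr_t)usb_buf, sizeof usb_buf)); where usb_buf stands for whatever buffer the driver uses.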
I should point out that not everybody here agrees with this approach but I haven't seen any cases where doing this will be an issue.
Two comments back:
So a better solution is to define the heap in my own main file (freertos_main.c) and then set the RTOS config flag to use my allocated memory, but my question still stands: is this faster now?
// In my freertos_main.c (Allocate 48 KBytes)
__NOINIT(SRAM_LOWER) uint8_t ucHeap[ configTOTAL_HEAP_SIZE ]; // RTOS Memory
// In FreeRTOSConfig.h
#define configTOTAL_HEAP_SIZE ((size_t)(48 * 1024))
#define configAPPLICATION_ALLOCATED_HEAP 1
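With configAPPLICATION_ALLOCATED_HEAP set to 1, FreeRTOS expects the application to define ucHeap itself, as in the freertos_main.c line above. Since the concern is maintenance risk, a compile-time guard can catch a configTOTAL_HEAP_SIZE that no longer fits in SRAM_L after a later change. The linker should already reject an overflowing SRAM_LOWER region; the static_assert just states the reason at the point where the decision was made. The 64 KB SRAM_LOWER size is an assumption for the MK22FN512, and SRAM_LOWER_BYTES is a hypothetical name.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>

/* In the project this comes from FreeRTOSConfig.h; repeated here only so
 * the fragment compiles on its own. */
#ifndef configTOTAL_HEAP_SIZE
#define configTOTAL_HEAP_SIZE ((size_t)(48 * 1024))
#endif

/* Assumed SRAM_LOWER size on MK22FN512 (64 KB); check the MCU settings. */
#define SRAM_LOWER_BYTES ((size_t)(64 * 1024))

static_assert(configTOTAL_HEAP_SIZE <= SRAM_LOWER_BYTES,
              "FreeRTOS heap does not fit in SRAM_LOWER");

/* Room left in SRAM_L for other __DATA/__BSS/__NOINIT(SRAM_LOWER) objects. */
enum { SRAM_LOWER_SPARE_BYTES = (int)(SRAM_LOWER_BYTES - configTOTAL_HEAP_SIZE) };
```

With the 48 KB heap from the post, this leaves 16 KB of SRAM_L for anything else placed there.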