M4-M0 core synchronisation / lock mechanism using mutex or semaphore

Content originally posted in LPCWare by wlamers on Fri Jul 18 01:06:35 MST 2014
I have an application in which both the M4 and M0 core (of an LPC4357) are accessing a (kind of) ring buffer. The buffer is used to store and retrieve data using a pointer mechanism. This requires a sort of lock mechanism preventing simultaneous access of both cores to the pointers/variables. As a side note: there are also some interrupts (mainly on the M4 core) that also access the same buffer pointers/variables.

The latter situation (interrupts) can be dealt with by disabling the interrupts while entering a critical section (a section that accesses the buffer pointers/variables). But the M4-M0 synchronisation is more difficult. I do not want to use the SEV instruction to signal an interrupt to the other core to handle the lock mechanism, mainly due to performance reasons.
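In code, such an interrupt-level critical section could look roughly like this (a sketch assuming a CMSIS-based project, not the actual project code; __get_PRIMASK, __set_PRIMASK and __disable_irq are standard CMSIS-Core intrinsics):

#include <stdint.h>
// The intrinsics below come from the CMSIS core header pulled in by the device header.

static inline void buffer_update_from_thread(void)
{
    uint32_t primask = __get_PRIMASK();   // remember the current interrupt state
    __disable_irq();                      // masks interrupts on this core only
    /* ... read/update the shared buffer pointers/variables ... */
    __set_PRIMASK(primask);               // restore, so nested use behaves correctly
}

Note that this only protects against interrupts on the same core; it does nothing against the other core, which is exactly the problem described below.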

At the moment I have implemented a simple lock mechanism that reads a lock variable in shared memory, accessible by both cores. The problem is that the 'check and set' of the lock variable isn't an atomic operation (it requires an 'if' statement and an assignment). This could possibly break the lock mechanism, causing unpredictable behaviour.

Therefore I need to implement a classic mutex lock mechanism where the test and set of the lock variable (mutex) is atomic. For the M4 this should be possible using the LDREX and STREX (load-exclusive, store-exclusive) instructions; ARM also recommends a DMB memory barrier. But oddly enough the M0 does NOT have these instructions, which makes it impossible to use a mutex or other lock mechanism between the cores in that way. Obviously NXP knew this during the design of the 43xx family, and I assume they have come up with a solution, although I cannot seem to find one. The IPC section of the manual and the application note are of no help, nor do I find much information on the internet.
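For reference, on the M4 alone such an exclusive-access lock would look roughly like this with the CMSIS intrinsics (a sketch; __LDREXW, __STREXW and __DMB are the standard CMSIS-Core names, and buf_lock is just an example lock word). The M0 simply has no counterpart for __LDREXW/__STREXW:

#include <stdint.h>
// Intrinsics come from the CMSIS core header; LDREX/STREX exist on Cortex-M3/M4 only.

volatile uint32_t buf_lock;                     // 0 = free, 1 = taken; lives in shared RAM

static void m4_lock(void)
{
    do {
        while (__LDREXW(&buf_lock) != 0u) { }   // spin until the lock reads as free
    } while (__STREXW(1u, &buf_lock) != 0u);    // 0 means the exclusive store succeeded
    __DMB();                                    // take the lock before touching the buffer
}

static void m4_unlock(void)
{
    __DMB();                                    // finish all buffer accesses first
    buf_lock = 0u;                              // a plain store releases the lock
}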

So to generalize my question: who knows a good way to design an 'atomic' lock mechanism between the two cores (which can also be used in the interrupt service routines)?

Content originally posted in LPCWare by JohnR on Fri Jul 18 05:53:58 MST 2014
Hi wlamers,

You write,


Quote:
I do not want to use the SEV instruction to signal an interrupt to the other core to handle the lock mechanism, mainly due to performance reasons.

Could you explain why you felt this was so?

I am using SEV and interrupts on a M4/M0/M0 system with the LPC4370. The data to be transferred between cores are placed in shared memory. So far the system seems to work without problems and seems a lot easier than the IPC queues suggested in the UM10503 manual.
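As a rough sketch of that pattern (the mailbox layout and the names here are only illustrative, not an NXP API), the sending core writes its message into shared memory and then executes SEV:

#include <stdint.h>
// __DSB()/__SEV() come from the CMSIS core header.

typedef struct {
    volatile uint32_t cmd;
    volatile uint32_t arg;
    volatile uint32_t pending;
} mailbox_t;

// Assumed to be placed in RAM that both cores can see (a linker-script detail).
extern mailbox_t m4_to_m0_mbox;

void m4_notify_m0(uint32_t cmd, uint32_t arg)
{
    m4_to_m0_mbox.cmd = cmd;
    m4_to_m0_mbox.arg = arg;
    __DSB();                        // make sure the payload is written before flagging it
    m4_to_m0_mbox.pending = 1u;
    __DSB();
    __SEV();                        // TXEV: raises the event/interrupt on the M0 core(s)
}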

From the manual

Quote:
A CPU core raises an interrupt to the other CPU core or cores using the TXEV instruction.

Quote:
Since the ARM Cortex-M4 and ARM Cortex-M0 cannot at the same time write to the same location, there is no need for a synchronization object (e.g. a semaphore) in this IPC.

One awkwardness is that only one TXEV is issued by the M4 and it wakes up both M0Sub and M0App. I use a global flag to differentiate between the two cases, but both interrupt handlers still respond and then have to either execute some code or simply return if the event is not meant for them.
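On each M0 that could look something like this sketch (the handler name and the target flag are placeholders; the exact interrupt symbol and the flag-clearing step depend on the device headers and startup code used):

#include <stdint.h>

enum { TARGET_NONE = 0u, TARGET_M0APP = 1u, TARGET_M0SUB = 2u };

// Written by the M4 before it executes SEV; lives in shared RAM.
extern volatile uint32_t ipc_target;

void M0APP_EventHandler(void)        // placeholder name for the INT #1 handler
{
    /* clear the event/interrupt flag here, as the device requires */

    if (ipc_target != TARGET_M0APP) {
        return;                      // the event was meant for the other M0
    }
    ipc_target = TARGET_NONE;
    /* ... process the message the M4 left in shared memory ... */
}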

It would have been better, I think, if instead of M0Sub and M0App having the same interrupt numbers (INT #1), separate numbers had been assigned.

John


Content originally posted in LPCWare by wlamers on Wed Jul 23 03:08:31 MST 2014
Thanks for that info.

Unfortunately my buffer is a bit more complex than a simple ring buffer. In fact it is a buffer that is able to keep blocks in contiguous memory, so it needs some helper functions to determine where to put data and where to get data using buffer pointers (maintained in a struct). Therefore the other core may never access these pointers (e.g. update/write them) at the same time the first core does. Here is where the locking is necessary. I implemented Peterson's algorithm (including some memory barriers) and it seems to work fast enough; I have not run into trouble yet. But to make sure this will never happen in practice, I need a way to check it.
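For reference, such a Peterson lock for the two cores looks roughly like this (a sketch, not the poster's actual code; the variables live in shared RAM, and the __DMB barriers stand in for the missing exclusive-access hardware):

#include <stdint.h>
// __DMB() is the CMSIS barrier intrinsic; it exists on both the M0 and the M4.

volatile uint32_t want[2];            // want[i] = core i wants the lock
volatile uint32_t turn;               // which core yields if both want it at once

void peterson_lock(uint32_t me)       // me = 0 on one core, 1 on the other
{
    uint32_t other = 1u - me;

    want[me] = 1u;
    turn = other;                     // give way if both arrive at the same time
    __DMB();                          // publish want/turn before checking the other core
    while (want[other] && (turn == other)) { }
    __DMB();                          // do not touch the buffer before the test completes
}

void peterson_unlock(uint32_t me)
{
    __DMB();                          // finish all buffer accesses first
    want[me] = 0u;
}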

You mention using unit tests, but I do not have experience with unit testing multi-threaded code. How could I write test code that does this? Do I need some sort of random generator that 'fires' tests at unpredictable times? Could you give me an example?
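One common shape for such a test, as a sketch: let both cores hammer the lock around a read-modify-write of a shared counter, with some pseudo-random jitter inside the critical section, and check afterwards that nothing was lost. The lock calls here stand for whatever mechanism is under test, e.g. the Peterson sketch above:

#include <stdint.h>

extern void peterson_lock(uint32_t me);
extern void peterson_unlock(uint32_t me);

// Shared test state in RAM visible to both cores.
volatile uint32_t shared_counter;
volatile uint32_t done[2];

void lock_stress_task(uint32_t me)    // run with me = 0 on the M4 and me = 1 on the M0
{
    uint32_t lfsr = 0xACE1u + me;     // cheap pseudo-random jitter source

    for (uint32_t i = 0; i < 1000000u; i++) {
        peterson_lock(me);
        uint32_t snapshot = shared_counter;                      // deliberate read-modify-write window
        for (volatile uint32_t d = 0; d < (lfsr & 0x3Fu); d++) { }
        shared_counter = snapshot + 1u;
        peterson_unlock(me);

        lfsr = (lfsr >> 1) ^ ((uint32_t)(-(int32_t)(lfsr & 1u)) & 0xB400u);  // 16-bit LFSR step
    }
    done[me] = 1u;
    // When done[0] and done[1] are both set, shared_counter must be exactly
    // 2000000; any shortfall means mutual exclusion was violated.
}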

Content originally posted in LPCWare by rocketdawg on Tue Jul 22 10:46:31 MST 2014

Quote: wmues
A ringbuffer with a read index and a write index can be used by 2 processes without any other synchronisation, if the accesses to the index registers are non-interruptible.

regards
Wolfgang



I was thinking the same thing.
Google "lockless algorithms".
I would think that one core is the Consumer and the other the Producer.
But lockless algorithms can be simple or complex, and either way they are often as hard as ever to debug.
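A minimal sketch of that single-producer/single-consumer idea (illustrative code, not from the thread): each index is written by exactly one core, and the barriers make sure the data is visible before the index moves. __DMB() is the CMSIS barrier intrinsic.

#include <stdint.h>

#define RB_SIZE 256u                          // power of two so the index masks work

// Lives in shared RAM. Only the producer core writes 'head',
// only the consumer core writes 'tail'.
typedef struct {
    volatile uint32_t head;
    volatile uint32_t tail;
    uint8_t data[RB_SIZE];
} spsc_rb_t;

// Producer core only
int rb_put(spsc_rb_t *rb, uint8_t byte)
{
    uint32_t next = (rb->head + 1u) & (RB_SIZE - 1u);
    if (next == rb->tail) {
        return 0;                             // full
    }
    rb->data[rb->head] = byte;
    __DMB();                                  // data must be visible before the index moves
    rb->head = next;
    return 1;
}

// Consumer core only
int rb_get(spsc_rb_t *rb, uint8_t *byte)
{
    if (rb->tail == rb->head) {
        return 0;                             // empty
    }
    *byte = rb->data[rb->tail];
    __DMB();
    rb->tail = (rb->tail + 1u) & (RB_SIZE - 1u);
    return 1;
}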

The Peterson algorithm does have some execution overhead, but it may very well be shorter than an ISR context save/restore.
I always worry about the M0 core since it is a von Neumann core, so reads and writes to RAM are slower than on the Harvard-architecture M4,
and the CPU burst size is 1. Bus masters share memory in a round-robin scheme, so what does that mean for your timing?
And I certainly do not like the "spin in place" if one were to use this method from within an ISR: you could get an endless loop.

But it might all work just fine. :)
Just write a bunch of unit tests.

Content originally posted in LPCWare by wmues on Sun Jul 20 13:01:41 MST 2014
A ringbuffer with a read index and a write index can be used by 2 processes without any other synchronisation, if the accesses to the index registers are non-interruptible.

regards
Wolfgang

Content originally posted in LPCWare by wlamers on Fri Jul 18 07:13:44 MST 2014
Indeed it does! Only Peterson uses a while loop, which is even better.

I missed that one; he published it in 1981 and I was born in 1982 ;)

Thanks for the tip!

Content originally posted in LPCWare by TheFallGuy on Fri Jul 18 06:57:51 MST 2014
This looks like Peterson's Algorithm:
http://en.wikipedia.org/wiki/Peterson%27s_algorithm

Content originally posted in LPCWare by wlamers on Fri Jul 18 06:49:51 MST 2014

Quote: JohnR
Could you explain why you felt this was so?

I am using SEV and interrupts on a M4/M0/M0 system with the LPC4370. The data to be transferred between cores are placed in shared memory. So far the system seems to work without problems and seems a lot easier than the IPC queues suggested in the UM10503 manual.

Well, this is exactly what I am doing to send commands and messages between the cores. I defined two sections in each project (one @ 0x20008000, 0x200 long, and the second @ 0x20008200, also 0x200 long). This indeed works really well for that purpose.
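In code that could look something like this (a sketch based on the addresses above; the struct layout and the way it is pinned to those addresses are assumptions about the project setup, normally done via a named section or an absolute symbol in the linker script of both projects):

#include <stdint.h>

// One 0x200-byte region per direction, at the addresses mentioned above.
// Both the M4 and M0 projects must agree on this layout.
typedef struct {
    volatile uint32_t cmd;
    volatile uint32_t param;
    volatile uint8_t  payload[0x200 - 2 * sizeof(uint32_t)];
} ipc_region_t;

#define IPC_M4_TO_M0   ((ipc_region_t *)0x20008000u)
#define IPC_M0_TO_M4   ((ipc_region_t *)0x20008200u)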

But in my case I also let the M4 and M0 simultaneously 'work' on a ring buffer. This requires that both the M0 and M4 check and set a lock variable so that they cannot mess things up. I could use the same memory regions as above and send an interrupt to handle the locking, but that takes a lot of overhead (set the message, signal the other core, enter the IRQ, read/set the lock, exit the IRQ, continue). Since this is a very time-critical application, I would be better off just letting one core wait until the resource (buffer) lock is released, after which it can immediately start processing. This saves at least 20-50 instructions.


Quote:
Since the ARM Cortex-M4 and ARM Cortex-M0 cannot at the same time write to the same location, there is no need for a synchronization object (e.g. a semaphore) in this IPC.



I am aware of this, but unfortunately it does not solve the classic 'atomic mutex' problem. For example:

- M4 tests the lock (if (locked)) and it appears to be unlocked, so it can continue
- M0 also tests the lock at the same instant, or one or two clocks later, and it appears to be unlocked as well (the M4 has not set the lock yet)
- M4 sets the lock (locked = true) and continues
- M0 also sets the lock (which was already set by the M4) and continues
- CONFLICT, since both change the buffer pointers etc.!

You see, I need an atomic 'test and set' instruction for both the M0 and M4.

This is nothing new, since all multi-core/multi-thread applications and OS's have this problem and require an atomic test and set instruction.

Well, the M4 has this, but the M0 does not. Maybe there is a clever way around it?


Hmmm, well, writing this up I was thinking the following: what if I just use two additional lock variables in shared memory, one assigned to the M0 and one to the M4, meaning that only the M0 has write access to the first and only the M4 has write access to the second? That way I can create a mechanism that prevents one core from testing and setting the value while the other core is busy doing the same. In code it would look like this:


// Shared variables, readable by both cores:
//   locked       - the actual buffer lock
//   M4_testLock  - written only by the M0, blocks the M4's test-and-set
//   M0_testLock  - written only by the M4, blocks the M0's test-and-set

// M0
while (M0_testLock == true) { }  // wait until the M4 releases the test-and-set lock, which cannot take more than a few clocks
M4_testLock = true;              // claim the test-and-set lock towards the M4, since we are going to do a test and set
if (locked == false) {           // test for lock
    locked = true;               // set lock
} else {
    // dummy: lock already held
}
M4_testLock = false;             // release the test-and-set lock

// M4 (vice versa)
while (M4_testLock == true) { }  // wait until the M0 releases the test-and-set lock, which cannot take more than a few clocks
M0_testLock = true;              // claim the test-and-set lock towards the M0, since we are going to do a test and set
if (locked == false) {           // test for lock
    locked = true;               // set lock
} else {
    // dummy: lock already held
}
M0_testLock = false;             // release the test-and-set lock



Could this work, or am I overlooking something? Or does anyone know a better way?

Content originally posted in LPCWare by wlamers on Fri Jul 18 02:35:29 MST 2014
Yes I have, but the suggested method relies on interrupts and/or 'messaging' the other core, both of which I want to avoid for performance reasons. Or are you implying something else here that I may have overlooked?


Quote: wlamers
The IPC section of the manual and the application note is of no help, nor do I find much information on the internet.


Content originally posted in LPCWare by TheFallGuy on Fri Jul 18 01:39:48 MST 2014
Have you read chapter 2 of the User Manual (UM10503)? Section 2.5 describes an IPC Protocol example.