RPMSG, Increasing number of buffers (RPMSG_NUM_BUFS)

niranjanbc · 06-14-2017

Below are the changes I have made in the imx_rpmsg.c file:

#define RPMSG_NUM_BUFS (1024)
+#define RPMSG_BUF_SIZE (512)
+#define RPMSG_BUFS_SPACE (RPMSG_NUM_BUFS * RPMSG_BUF_SIZE)
+
+/*
+ * The alignment between the consumer and producer parts of the vring.
+ * Note: this is part of the "wire" protocol. If you change this, you need
+ * to update your BIOS image as well
+ */
+#define RPMSG_VRING_ALIGN (4096)
+.

.

struct imx_rpmsg_vproc *rpdev = &imx_rpmsg_vprocs[i];
+
+ if (!strcmp(rpdev->rproc_name, "m4")) {
+ ret = of_device_is_compatible(np, "fsl,imx7d-rpmsg");
+ ret |= of_device_is_compatible(np, "fsl,imx6sx-rpmsg");
+ if (ret) {
+ /* hardcodes here now. */
+ rpdev->vring[0] = 0xBF800000;//0xBFFF0000;
+ rpdev->vring[1] = 0xBF880000;//0xBFFF8000;
+ }
+ } else {
+ break;
+ }

I want to increase the number of buffers. What other changes do I have to make for that to work?

If I change the buffer number back to 512, the rpmsg driver works fine with the above memory allocation.

I have now allocated the last 8MB of the shared DDR memory for RPMSG.

Do I need to change RPMSG_VRING_ALIGN? What should the value be, and how is it calculated?

If I change RPMSG_VRING_ALIGN, which other files are affected, and what other changes do I have to make?
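On the alignment question: in the split-virtqueue layout defined by the virtio specification, each vring holds the descriptor table, then the avail ring, then the used ring starting at the next multiple of the vring alignment. So RPMSG_VRING_ALIGN (VRING_ALIGN on the M4 side) decides where both cores look for the used ring. The sketch below mirrors the arithmetic of Linux's vring_init()/vring_size(); the struct and function names are hypothetical, and num is the descriptor count of one vring (num_descs in platform_info.c). How your imx_rpmsg.c maps RPMSG_NUM_BUFS onto per-vring descriptor counts is an assumption you should check in your kernel tree.

```c
#include <assert.h>
#include <stddef.h>

/* Split-virtqueue element sizes from the virtio spec. */
#define VRING_DESC_SIZE 16u /* u64 addr, u32 len, u16 flags, u16 next */

struct vring_offsets {
    size_t avail; /* avail ring starts right after the descriptor table */
    size_t used;  /* used ring starts at the next multiple of 'align' */
};

static size_t align_up(size_t x, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
    return (x + align - 1) & ~(align - 1);
}

/* Offsets inside one vring of 'num' descriptors, relative to the vring base
 * (rpdev->vring[x] on Linux, VRINGx_BASE on the M4). */
static struct vring_offsets compute_vring_offsets(size_t num, size_t align)
{
    struct vring_offsets o;
    o.avail = VRING_DESC_SIZE * num;
    /* avail ring = flags + idx + ring[num] + used_event, all 16-bit */
    o.used = align_up(o.avail + 2 * (3 + num), align);
    return o;
}
```

With 512 descriptors, a 0x1000 alignment puts the used ring at offset 12288 while 0x80 puts it at 9344, so cores that disagree on the alignment read the used ring from different addresses. The alignment itself does not depend on the buffer count (the offsets already grow with num); it only has to be identical on both sides.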

b50844 · 06-16-2017

Hi Niranjanbc,

If you change RPMSG_NUM_BUFS, it is necessary to also change this on the remote/microcontroller/coprocessor side.

Have you changed it there as well? (Both VRING base address and number of RPMsg buffers)

Regards,
Marek

niranjanbc · 06-16-2017

Yes, I did change it in Platform_info.c:

/*
* Linux requires the ALIGN to 0x1000(4KB) instead of 0x80
*/
#define VRING_ALIGN 0x1000

/*
* Linux has a different alignment requirement, and it has 512 buffers instead of 32 for the 2 rings
*/
#define VRING0_BASE 0xBF800000
#define VRING1_BASE 0xBF880000

/* IPI_VECT here defines VRING index in MU */
#define VRING0_IPI_VECT 0
#define VRING1_IPI_VECT 1

#define MASTER_CPU_ID 0
#define REMOTE_CPU_ID 1

....

.......

....

struct hil_proc proc_table []=
{
/* CPU node for remote context */
{
/* CPU ID of master */
MASTER_CPU_ID,

/* Shared memory info - Last field is not used currently */
{
(void*)SHM_ADDR, SHM_SIZE, 0x00
},

/* VirtIO device info */ /*struct proc_vdev*/
{
2, (1<<VIRTIO_RPMSG_F_NS), 0, /*num_vring, dfeatures, gfeatures*/

/* Vring info */ /*struct proc_vring*/
{
/*[0]*/
{ /* TX */
NULL, (void*)VRING0_BASE/*phy_addr*/, 512/*num_descs*/, VRING_ALIGN/*align*/,
/*struct virtqueue, phys_addr, num_descs, align*/
{
/*struct proc_intr*/
VRING0_IPI_VECT,0,0,NULL
}
},
/*[1]*/
{ /* RX */
NULL, (void*)VRING1_BASE, 512, VRING_ALIGN,
{
VRING1_IPI_VECT,0,0,NULL
}
}
}
},

/* Number of RPMSG channels */
1, /*num_chnls*/

/* RPMSG channel info - Only channel name is expected currently */
{
{"rpmsg-openamp-demo-channel"} /*chnl name*/
},

/* HIL platform ops table. */
&proc_ops, /*struct hil_platform_ops*/

/* Next three fields are for future use only */
0,
0,
NULL
},

/* CPU node for remote context */
{
/* CPU ID of remote */
REMOTE_CPU_ID,

/* Shared memory info - Last field is not used currently */
{
(void*)SHM_ADDR, SHM_SIZE, 0x00
},

/* VirtIO device info */
{
2, (1<<VIRTIO_RPMSG_F_NS), 0,
{
{/* RX */
NULL, (void*)VRING0_BASE, 512, VRING_ALIGN,
{
VRING0_IPI_VECT,0,0,NULL
}
},
{/* TX */
NULL, (void*)VRING1_BASE, 512, VRING_ALIGN,
{
VRING1_IPI_VECT,0,0,NULL
}
}
}
},

/* Number of RPMSG channels */
1,

/* RPMSG channel info - Only channel name is expected currently */
{
{"rpmsg-openamp-demo-channel"}
},

b50844 · 06-16-2017

...then it should work...

Legacy OpenAMP RPMsg is, however, no longer supported on our platforms. Have you considered trying RPMsg-Lite (GitHub - NXPmicro/rpmsg-lite: RPMsg implementation for small MCUs)?

I will be able to provide more help if you try to port your remote side to RPMsg Lite.

Regards,
Marek

dry · 06-19-2018

Hi Marek,

Just to understand: when you say not supported on our platforms, which ones do you mean?

The original post and this discussion date from June 2017, and NXP's Linux and FreeRTOS BSPs were released later that year:

linux-imx-4.9.11_1.0.0_ga and FreeRTOS_BSP_1.0.1_iMX7D. Up to now (20.06.2018) I haven't spotted any release updates to those. (Yes, I'm concerned with iMX7xxx.)

I did not see any reference to RPMsg-Lite in the releases I mentioned above, and both the FreeRTOS example setup and the Linux kernel setup came with the OpenAMP setup.

So, from what you wrote, it seems NXP stopped supporting it before it was even released?

Or is that limited to some iMXxx parts with a Cortex-M0+? Is OpenAMP RPMsg, as in the release files, still supported for other SoCs, specifically iMX7D and iMX6SX (and anything else using those BSPs)?

It would be great to get feedback on this, as I'm a bit stunned by the possibility of it no longer being supported.

b50844 · 06-21-2018

Hi D.RY,

With the current Linux implementation, you are free to use either OpenAMP RPMsg or RPMsg-Lite; however, RPMsg-Lite is the preferred implementation, since it has a smaller footprint for smaller AMP microcontrollers.
If your project for some reason requires OpenAMP RPMsg, the protocol is binary compatible with what is available on the Linux side.
Can you please share the exact platform you would like to use, so that I can help you more? Why do you need OpenAMP RPMsg in your design? Is it a new project or a legacy one?

Thanks,
Marek

dry · 06-21-2018

Hi Marek,

The platform is iMX7D, using official BSP from NXP as referred to in previous post, with minor patches for custom board.

> Why do you need OpenAMP RPMsg...

This is because I/we have not seen any reference to RPMsg-Lite in any of the documents contained in the BSP as released. As such, some time was already invested in understanding the RPMsg port on Linux and FreeRTOS as provided in the BSP files, as well as in porting existing MCC communication to RPMsg, with some adaptation.

( I have to note MCC was used successfully ).

Whether RPMsg-Lite is lighter on memory resources may or may not be a concern for us. What is concerning is that you write NXP does not support OpenAMP RPMsg anymore.

>Is it a new project or a legacy one?

The project is relatively new, to be completed this year or so. We took the latest official NXP release (as far as I can still see; was there another official Linux and FreeRTOS BSP release from NXP that moved to RPMsg-Lite?). In fact, that is also the latest release used by at least a few SoM/CoM vendors, who base their trees on the official NXP release. Hence my puzzlement.

>the protocol is binary compatible ..

Does this mean that the actual underlying structure of RPMsg (the ring buffers in shared memory between the two cores, the iMX's MU usage, the virtio usage on the Linux side, the device and endpoint creation and management) is changed under RPMsg-Lite, while the higher-level rules like the handshake, name service and device/endpoint establishment stayed the same?

b50844 · 06-21-2018

Hi D.ry,

As a matter of fact, whether you use RPMsg-Lite or OpenAMP RPMsg, nothing changes on the Linux side.
In both cases you need to set up the location of the shared memory (vring base addresses) in the same way, and the handshake and name service remain the same. The only difference with RPMsg-Lite is that you don't have to use the name service if you want to set up your endpoints manually, but you can use it if you want. So, to answer your last question first: yes, MU usage, virtio usage etc. remain the same.

The only other difference is that RPMsg-Lite can be used to create multiple independent instances of RPMsg with different buffer sizes etc. This is used on some platforms to completely separate communication of different criticality: audio can have a separate RPMsg instance, non-time-critical data another instance, etc. This possibility was added because all endpoints in one RPMsg instance share the same underlying vrings, which makes it possible for one endpoint to block another by not freeing/consuming its received buffers and depleting the available buffer pool.

However, AFAIK the main reason for moving towards RPMsg-Lite was its smaller footprint, which is not an issue for i.MX devices, but it is nice to have a single implementation for Cortex-M devices (bare-metal/FreeRTOS/other RTOS ones).
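The depletion problem described above can be shown with a toy model. This is not the RPMsg or RPMsg-Lite API: shared_pool, pool_take() and pool_release() are hypothetical names standing for the receive buffers that all endpoints of one RPMsg instance share through the same vrings.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not the RPMsg API: all endpoints of one RPMsg instance draw
 * receive buffers from the same vring-backed pool. */
struct shared_pool {
    unsigned capacity;
    unsigned free_bufs;
};

/* A message arrives for some endpoint: it needs a free buffer from the pool. */
static bool pool_take(struct shared_pool *p)
{
    if (p->free_bufs == 0)
        return false; /* pool depleted: no endpoint of this instance can receive */
    p->free_bufs--;
    return true;
}

/* The endpoint has consumed its message and hands the buffer back. */
static void pool_release(struct shared_pool *p)
{
    assert(p->free_bufs < p->capacity); /* releasing more than was taken is a bug */
    p->free_bufs++;
}
```

If one endpoint takes all four buffers of a four-buffer pool and never releases them, pool_take() fails for every other endpoint of that instance until a buffer is returned; giving each class of traffic its own instance gives it its own pool.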

Actually, RPMsg-Lite does not run as part of the Cortex-A Linux BSP; it runs on the M4, and therefore it might not be mentioned in the BSP documentation. Earlier in this thread, there is a reference to this repository: GitHub - EmbeddedRPC/erpc-imx-demos: eRPC demos for i.MX devices
You should find an example there of how to set everything up for Linux to M4 communication.

I understand your concerns, it should be however straight forward to initialize RPMsg-Lite and to use it. If you run into any actual issues, you are free to let me know here and I will help you as much as I can.

Regards,
Marek

dry · 06-21-2018

Hello Marek,

> ...with the only difference that RPMsg-Lite can be used to create multiple independent instances of RPMsg with different buffer size etc.

This is in fact needed; I also saw the fixed, hardcoded buffer sizes as a limitation. So you are saying this is fixed now, and you don't have to stick to fixed 256-byte buffers, for example? As I saw it, this is a limitation for channels with different purposes, as is the fixed total size of the tx/rx queues for every device created (some devices may not need such long queues).

About independent instances: what I saw in the current RPMsg Linux setup from the BSP is that you can already create multiple virtual RPMsg devices if you increase the allocated memory in the Linux device tree and increase the vdev-nums property. That way I could create 2 virtual RPMsg devices with separate rx/tx queues (4 queues, or ring buffers, in total), which are independent in the way you explained RPMsg-Lite does it. Extra setup on the FreeRTOS side would then be required to accommodate the extra remote device on Linux. In this way, you only use the one (default) endpoint that Linux creates automatically for each vdev.

Does RPMsg-Lite do more than above to separate these devices / channels ?

> ...completely separate communication with different criticality

What I saw (from the code) as limiting channel independence (in the current RPMsg in Linux from the BSP) is that RPMsg reserves just one MU word and interrupt for kicking (waking) and one word for receiving. No matter which virtual RPMsg device data is sent on or received from, there is one MU word to kick any device queue and one other MU word it receives on (albeit with buffering; I think Linux has a ~10-word buffer for receiving before processing). Thus the RPMsg vdevs are not really independent.

Is this different in RPMsg-Lite? Does it reserve more of the MU for implementing the communication?

b50844 · 06-24-2018

Hello D.RY,

> This is in fact needed, and also I saw the same fixed hardcoded buffer sizes are limitations. So you saying this is fixed now, and you don't have to stick to fixed 256 bytes buffers for example? As I saw, for different purpose channels this is limitation, as well as fixed total size of tx/rx queues for any device created (as some devs may not need such long queues).

-> Yes, you can have more instances, each with a different buffer size (256, 512, 2048...). However, the Linux implementation does not currently support this; only the RPMsg-Lite side does.
In Linux kernel the size is hard-coded: linux-fslc/virtio_rpmsg_bus.c at 4.17.x+fslc · Freescale/linux-fslc · GitHub - line 918 (vrp->buf_size = MAX_RPMSG_BUF_SIZE;)
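Whatever buffer size an instance uses, each buffer also carries the RPMsg header, which in Linux's struct rpmsg_hdr is src, dst and reserved (32-bit each) plus len and flags (16-bit each), 16 bytes in total. A minimal sketch of the resulting payload limit; rpmsg_hdr_layout and rpmsg_max_payload() are illustrative names, not kernel or RPMsg-Lite symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout of the RPMsg header that precedes the payload in every buffer. */
struct rpmsg_hdr_layout {
    uint32_t src;
    uint32_t dst;
    uint32_t reserved;
    uint16_t len;
    uint16_t flags;
};

_Static_assert(sizeof(struct rpmsg_hdr_layout) == 16, "RPMsg header is 16 bytes");

/* Largest payload one buffer of buf_size bytes can carry. */
static uint32_t rpmsg_max_payload(uint32_t buf_size)
{
    assert(buf_size > sizeof(struct rpmsg_hdr_layout));
    return buf_size - (uint32_t)sizeof(struct rpmsg_hdr_layout);
}
```

For the hard-coded 512-byte buffers this gives 496 bytes of payload, which as far as I know is also RPMsg-Lite's default RL_BUFFER_PAYLOAD_SIZE; 256- and 2048-byte buffers give 240 and 2032 bytes.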

> About independent instances. What I saw in the current RPMsg Linux setup from BSP is that you can create already multiple virtual devices for RPMsg if you increase the allocated memory in the Linux device tree, and increase vdev-nums property. Thus I could create 2 virtual RPMSG devices, having separate rx/tx queues (4 queues or ringbuffers in total, therefore), and thus independent in how you explained RPMsg-Lite does it. And thus extra setup on the FreeRTOS side would be required to accommodate the extra remote device on Linux. In this way, you only use one (default) end point which gets automatically created per each vdev by the Linux.

> Does RPMsg-Lite do more than above to separate these devices / channels ?

-> You can use more than just the default endpoint, but yes, this is how it works with RPMsg-Lite as well.

> What I saw (from code) as limiting for channel independence (in the current RPMsg in Linux from BSP) is that, RPMsg gets reserved just one word & interrupt from MU for kicking(awaking) and one word for receiving. No matter on which virtual RPMsg device data is sent on or received from, there is one MU word to kick any device queue, and one other MU word it receives on (albeit with buffering, I think on Linux ~10 word buffer when receiving before processing). Thus there is no really independent RPMsg vdevs.

> Is this different then on the RPMsg-Lite? Does it reserve more of MU for implementing the comm?

The implementation of "independent" RPMsg instances is always very BSP-dependent. In the case of i.MX, all instances share the same MU interrupt, and the vring number is passed via the 32-bit MU transmit register. However, the interrupt processing should not block, so the instances can be regarded as almost independent. Of course, when the MU lifecycle state is changed or the interrupt is disabled, all instances stop working...
Other MU transmit registers still belong to the same MU peripheral, so moving new instances to a new transmit register would make them more independent of the original one, since each would have its own interrupt line. However, they would still share one MU peripheral; e.g. if its clock is disabled, all instances stop working.
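A sketch of how a single 32-bit MU word can carry the vring number for several instances, and why a blocking handler affects all of them. The 16-bit shift, NUM_VRINGS and all function names are assumptions for illustration; check how your imx_rpmsg.c and RPMsg-Lite platform layer actually pack the vring index before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the vring index travels in the upper 16 bits of the 32-bit
 * MU message. Check the shift used by your BSP. */
#define MU_VRING_SHIFT 16u
#define NUM_VRINGS     4u /* hypothetical: two instances x (tx, rx) */

typedef void (*vring_handler)(uint32_t vring_id);

/* Sender side: the "kick" written to the MU transmit register. */
static uint32_t mu_encode_kick(uint32_t vring_id)
{
    assert(vring_id < NUM_VRINGS);
    return vring_id << MU_VRING_SHIFT;
}

static uint32_t mu_decode_vring(uint32_t mu_msg)
{
    return mu_msg >> MU_VRING_SHIFT;
}

/* Receiver side: one MU interrupt serves every instance. Decode the index and
 * call that vring's handler; a handler that blocks delays all other instances. */
static void mu_rx_isr(uint32_t mu_msg, const vring_handler handlers[NUM_VRINGS])
{
    uint32_t id = mu_decode_vring(mu_msg);
    if (id < NUM_VRINGS && handlers[id] != NULL)
        handlers[id](id);
}
```

Because every handler runs from the same interrupt, the instances are only as independent as their handlers are short.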

Regards,
Marek

dry · 06-26-2018

Hi Marek,

Thanks for taking the time to reply and provide all the useful details.

Small question:

> ...Of course, when MU lifecycle state is changed...

What is the MU lifecycle?

niranjanbc · 06-16-2017

You mean I should change to RPMsg-Lite on the M4 (remote) side only and keep the A9 side the same?

dusancervenka-b · 06-17-2017

Hi niranjanbc,

Yes, that means keep RPMsg on the Linux side and use RPMsg-Lite on the M4 side. We also have an example of its use for i.MX here: GitHub - EmbeddedRPC/erpc-imx-demos: eRPC demos for i.MX devices. However, this example uses our eRPC framework; you don't need to use it. On the Linux side I think Python is used, but what matters for you is the other (M4) side.

niranjanbc · 06-19-2017

Thanks for the response, Dusan.

I am not using Python; I am doing a system call from C to install the endpoint module.

Since I have the issue on the Linux side, how could changing to RPMsg-Lite on the FreeRTOS side fix it?

b50844 · 06-19-2017

Hi Niranjanbc,

Actually, the problem is not limited to the Linux side. Accurate settings are required on both sides for RPMsg to work correctly. You can look at the repository suggested by Dusan; it also contains an RPMsg-only example using RPMsg-Lite on the M4 side and imx_rpmsg.c plus a kernel module exporting RPMsg endpoints to user space.

When you change something on the Linux side, it creates issues on the other side if all the settings are not changed there accordingly... As we don't support OpenAMP RPMsg for the M4, I suggest you use RPMsg-Lite on the M4, so that we can help you better.

Regards,
Marek

niranjanbc · 06-20-2017

Hi Marek

On the Linux/A9 side I want to configure the memory map below:

1. 0x80000000 - 0xBEFFFFFF --> Linux usable memory

2. 0xBF000000 - 0xBF7FFFFF --> RPMSG Shared memory

3. 0xBF800000 - 0xBFFFFFFF --> M4 Data memory

For the above memory map, I made the following Linux-side and M4-side changes:

1. Change in the dts file:

         memory {
        linux,usable-memory = <0x80000000 0x3f000000>;
        reg = <0x80000000 0x3f800000>;
     };

2. Change in imx_rpmsg.c:

                 /* hardcodes here now. */
              rpdev->vring[0] = 0xBF7F0000;//0xBFFF0000;
              rpdev->vring[1] = 0xBF7F8000;//0xBFFF8000;

Change on the M4 side: I don't know yet, because I have not started porting to RPMsg-Lite.

3. Change in the IAR linker file.

Am I missing anything here?

The vrings above are placed at the end of the RPMSG shared memory. Is that correct? The reference code does the same.

Will the remaining space for the RPMSG buffers be calculated and used automatically by the RPMSG driver, or do I need to specify it? If I need to, where is that part of the code located?

Can you please comment on this?
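A quick way to sanity-check a memory map like this one is to write each range as (base, size) and check containment and overlap. A minimal sketch; the region struct and helper names are hypothetical, and the values are the ones from this post (Linux usable memory, the dts reg, the 8MB shared region, the M4 region and the two 32KB vrings).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct region {
    uint32_t base;
    uint32_t size;
};

/* Exclusive end address, in 64 bits so a region ending at 4GB does not wrap. */
static uint64_t region_end(struct region r)
{
    assert(r.size != 0); /* an empty region is almost certainly a typo */
    return (uint64_t)r.base + r.size;
}

static bool region_contains(struct region outer, struct region inner)
{
    return inner.base >= outer.base && region_end(inner) <= region_end(outer);
}

static bool regions_overlap(struct region a, struct region b)
{
    return a.base < region_end(b) && b.base < region_end(a);
}
```

With these values, usable memory, shared memory and M4 memory do not overlap, the dts reg covers Linux plus the shared region but not the M4 region, and both vrings sit in the last 64KB of the shared region. This only checks the static layout.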

b50844 · 06-22-2017

Hi Niranjanbc,

What you set by hardcoding rpdev->vring[x] is the place where the VRING structures (used ring buffer, avail ring buffer and an array of buffer descriptors) will be placed. The number of buffers makes the VRING structure grow, so 32kB is normally allocated for a single rpdev->vring[x], as you can see and as you did correctly.
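To see how one vring grows with the number of buffers, one can reproduce Linux's vring_size() arithmetic and round up to whole 4KB pages. A sketch assuming a 0x1000 vring alignment (the VRING_ALIGN value in the M4 code earlier in this thread); vring_bytes() and vring_pages_bytes() are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SZ 4096u

/* Same arithmetic as Linux's vring_size(num, align) for a split virtqueue:
 * 16-byte descriptors + 16-bit avail ring, rounded up to 'align',
 * then the used ring of 8-byte elements plus three 16-bit fields. */
static size_t vring_bytes(size_t num, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t desc_avail = 16 * num + 2 * (3 + num);
    return ((desc_avail + align - 1) & ~(align - 1)) + 2 * 3 + 8 * num;
}

/* Whole pages needed for one vring (round the byte count up to PAGE_SZ). */
static size_t vring_pages_bytes(size_t num, size_t align)
{
    return (vring_bytes(num, align) + PAGE_SZ - 1) / PAGE_SZ * PAGE_SZ;
}
```

256 descriptors per vring need 3 pages (12KB), 512 need 5 pages (20KB), and 1024 need exactly 8 pages, so a 32kB slot per vring still holds up to 1024 descriptors; 2048 would not fit.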

The buffers pointed to from the array of buffer descriptors are allocated using the kernel API dma_alloc_coherent() (linux/drivers/rpmsg/virtio_rpmsg_bus.c - Elixir - Free Electrons, line 892), so the allocated memory area is uncached (https://www.kernel.org/doc/Documentation/DMA-API.txt). You actually don't have control over where it will be placed, and the rpmsg driver in the Linux kernel does not calculate in any way that it should be placed in the remainder, as you suggest. On the other hand, you can be sure it will be in DDR, as it is big, but as it is dynamically allocated, you don't know where it will be placed before you call dma_alloc_coherent().

Did I clarify a little bit the situation?

Regards,
Marek

dry · 10-29-2018

Hi Marek,

I would like to clarify your answers.

Marek Novak wrote:
> ....
> What you set by hardcoding the rpdev->vring[x] is a place, where the VRING structures (used ring buffer, avail ring buffer and an array of descriptors of buffers) will be placed. The number of buffers makes the VRING structure grow, so 32kB

In the iMX7D Linux kernel BSP, the device tree reserves 32KB for each vring, yet in the Linux kernel code, vring_size() after page alignment is 3 pages (0x3000, or 12KB). The actual vring_size() calculation is close to what the OpenAMP RPMsg VirtIO transport layer says it should be.

So why is so much more reserved for each vring in the device tree? Why 32KB?

> dma_alloc_coherent() ( linux/drivers/rpmsg/virtio_rpmsg_bus.c - Elixir - Free Electrons line 892 ) so the allocated memory area is uncached (https://www.kernel.org/doc/Documentation/DMA-API.txt ). You actually don't have control where it will be placed and rpmsg driver in ..... you don't know where it will be placed before you call dma_alloc_coherent().

So it is entirely up to the Linux kernel to allocate the data buffers anywhere in its total RAM/DDR? You cannot fix them to a fixed segment? (What if you want them not in DDR?)

On the iMX7D SABRE board, the default NXP device tree has these holes in Linux memory:

linux,usable-memory = <0x80000000 0x1ff00000>,
<0xa0000000 0x1ff00000>;

And rpmsg at:

reg = <0xbfff0000 0x10000 >;

I see 1MB of each 512MB block is chopped off/reserved. Are those holes then not used by dma_alloc_coherent() for the rpmsg buffers? (I guess not.)

I'm wondering why 2MB is taken away if rpmsg only needs 64KB.
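The size and position of the holes follow directly from the usable-memory list. A minimal sketch, assuming DDR ends at 0xC0000000 (1GB starting at 0x80000000; this is an assumption about the board, so adjust ddr_end for yours); mem_range and gap_after() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mem_range {
    uint64_t base;
    uint64_t size;
};

/* Gap between the end of range i and the start of range i+1, or between the
 * last range and 'ddr_end'. Ranges must be sorted and must not overlap. */
static uint64_t gap_after(const struct mem_range *r, size_t n, size_t i, uint64_t ddr_end)
{
    assert(i < n);
    uint64_t end = r[i].base + r[i].size;
    uint64_t next = (i + 1 < n) ? r[i + 1].base : ddr_end;
    assert(next >= end);
    return next - end;
}
```

With the two usable-memory ranges above, each gap is 1MB (2MB in total), and the 64KB rpmsg reg at 0xbfff0000 lies entirely inside the second one.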

b50844 · 10-29-2018

Hi D.RY,

It's been a while I did not go through the code you mention, but as far as I know:

- I think the reserved size is higher than actually needed for your setup, so that it doesn't have to be changed for other setups (with a higher number of buffers), since the required size grows with the number of buffers, I guess...

- I expressed myself ambiguously: the allocation is controlled by vdev->dev.parent->parent, which actually ends up here: Linux source code: kernel/dma/coherent.c (v4.19) - Bootlin, where the allocation is done from memory reserved for the device. So, to my limited understanding, you don't know where it will be placed, but you know the pool it will be in: not anywhere in the DDR, but in the memory reserved for the vdev.

- Yes, 2MB are reserved and only 64kB is used. The reason is the same as in the first point: so that the device tree doesn't need to be changed every time the RPMsg settings are modified. If you need every MB of DDR, you can reduce the size. And yes, the 2MB carveout is also used for allocating the rpmsg buffers, as explained in my second point, so proceed carefully when reducing the size. (AFAIK, the buffer size is 512B and the number of buffers is 512, so at least 256kB is needed!)
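The buffer space, separate from the vrings, is simply the number of buffers times the buffer size; virtio_rpmsg_bus.c uses the first half for receive and the second half for transmit. A minimal sketch of that arithmetic; rpmsg_pool_bytes() and pool_fits() are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bytes of buffer space (not counting the vrings): num_bufs buffers of
 * buf_size bytes, split evenly between rx and tx. */
static uint32_t rpmsg_pool_bytes(uint32_t num_bufs, uint32_t buf_size)
{
    assert(num_bufs % 2 == 0);
    return num_bufs * buf_size;
}

/* Does the buffer space fit in a reserved region of region_bytes? */
static bool pool_fits(uint32_t num_bufs, uint32_t buf_size, uint32_t region_bytes)
{
    return rpmsg_pool_bytes(num_bufs, buf_size) <= region_bytes;
}
```

512 buffers of 512 bytes need 256kB, which cannot fit in a 64KB region but fits easily in a 2MB carveout; the 1024 buffers from the start of this thread (RPMSG_BUFS_SPACE in that imx_rpmsg.c change) would need 512kB.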

Regards & thanks for the analysis,
Marek

dry · 10-30-2018

Hi Marek,

Thank you for replying and helping out.

I need to clarify this part:

> - Yes, 2MB are reserved and only 64kB is used -> the reason is the same as in the first point - so that it is not needed to change device tree everytime the RPMsg settings is modified. If you need every MB of DDR saved, you can reduce the size. And yes, the carveout of 2MB is used also for allocation of rpmsg buffers, as explained in my second point, so proceed carefully when reducing the size. (AFAIK, the size of the buffer is 512B and the number of buffers is 512, so at least 256kB is needed!)
> ...

As you can see from the device tree, only 64KB is reserved for the RPMsg device (the "vdev").

Thus, as you pointed out, the 512x512 (256KB) buffer space cannot fit into that device memory, so it does not fit into the vdev's allocated pool. The pool from which dma_alloc_coherent_xxxx allocates those buffers can't be the same one, then (unless it assumes memory beyond that region?). So that point I don't understand.

Also, my understanding is that if you specify the usable-memory range/size to Linux as in the device tree I referred to above, that is the only memory it will use, and it will be unaware of what's outside it. I'm not completely sure about this, but I think it makes sense? This is how we could reserve DDR space for M4 code if we need to.

It would be strange, then, for the kernel to allocate memory from space that is 1) marked as not usable, and 2) not specified as part of the RPMsg device in its device node (the rpmsg vdev section, which only specifies 64KB, as we see).

I don't know if I understood your point 2) above correctly. I took it as saying the kernel would allocate from those 2MB that are chopped off.

> ...but in the memory reserved for the vdev.

But as I explained, I'm not seeing it. If you meant it reserves the device memory elsewhere in DDR, then OK.

niranjanbc · 06-22-2017

Thanks, Marek.

I have completed porting to RPMsg-Lite. Below is the code in my FreeRTOS application for RPMsg-Lite init and receiving data. I have not increased the buffer number; I have kept the default 512 for RPMSG in Linux on the A9 core.

But it is not working; it works fine if I switch back to legacy OpenAMP RPMsg in FreeRTOS.

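struct rpmsg_lite_instance *rpmsgM4Instance;
struct rpmsg_lite_endpoint *rpmsgEpt;
rpmsg_queue_handle rpmsgQ; /* rpmsg_queue_handle is already a pointer typedef */

rpmsgM4Instance = rpmsg_lite_remote_init((void *)(0xBF7F0000), RL_PLATFORM_IMX6SX_M4_LINK_ID, init_flags);
assert(rpmsgM4Instance != RL_NULL); /* assert success; "== NULL" fails on every successful init */
PRINTF("RPMSG Lite Initialized\n\r");

rpmsgQ = rpmsg_queue_create(rpmsgM4Instance);
assert(rpmsgQ != RL_NULL);

/* The queue only receives messages through an endpoint that uses rpmsg_queue_rx_cb.
 * LOCAL_EPT_ADDR is a placeholder for your endpoint address. If the Linux side
 * creates its channel from a name-service announcement, announce this endpoint too. */
rpmsgEpt = rpmsg_lite_create_ept(rpmsgM4Instance, LOCAL_EPT_ADDR, rpmsg_queue_rx_cb, rpmsgQ);
assert(rpmsgEpt != RL_NULL);

for (;;)
{
    rpmsg_queue_recv_nocopy(rpmsgM4Instance, rpmsgQ, &src, &rx_buf, &len, RL_BLOCK);
    /* ... */
    rpmsg_queue_nocopy_free(rpmsgM4Instance, rx_buf);
}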
Is there anything I am missing in the above code?


i.MX6SoloX

Linux