RPMsg vqueue /vring shared memory overhead and data buffers

dry · 10-25-2018

UPDATED

In the context of the i.MX7D Linux and FreeRTOS OpenAMP RPMsg implementation, as provided by NXP (unmodified, and not the alternative Lite version):

In the default Linux device tree, a total of 0x10000 / 64KB is reserved for the vrings, split evenly between Tx and Rx at 0x8000 / 32KB each (the 0x8000 is hardcoded in imx_rpmsg.c).

This fixed location holds only the management overhead of the vrings, not the data buffers - is this correct?

The buffers for the actual data are allocated by the kernel (virtio_rpmsg_bus.c, dma_alloc_coherent) from other memory, and this is 512 buffers x 512 bytes per buffer = 256KB (for Tx & Rx together).

For the vrings, from Linux imx_rpmsg.c it says:

/* With 256 buffers, our vring will occupy 3 pages */
#define RPMSG_RING_SIZE ((DIV_ROUND_UP(vring_size(RPMSG_NUM_BUFS / 2, \
RPMSG_VRING_ALIGN), PAGE_SIZE)) * PAGE_SIZE)

That is 3 x 0x1000 = 0x3000 / 12288 bytes, or 12KB per vring (one each for Tx and Rx).

Why then is 32KB reserved in the device tree for each queue?

Also,

/*
* For now, allocate 256 buffers of 512 bytes for each side. each buffer
* will then have 16B for the msg header and 496B for the payload.
* This will require a total space of 256KB for the buffers themselves, and
* 3 pages for every vring (the size of the vring depends on the number of
* buffers it supports).
*/

Is the total space overhead in this setup then:

512 * 16 = 8192 / 0x2000 for total RPMsg messages (Tx & Rx queues) +

2 * 0x3000 = 0x6000 / 24576 for vrings

= 0x8000 / 32KB, which is 12.5% of the 256KB of data buffers.

Is this right, or is something missing?

I can't work out why 64KB is reserved for the vrings in the device tree though ... Where does it come from?

In addition (I didn't count this the first time), Linux allocates a virtqueue for managing each shared vring (rp_find_vq calls vring_new_virtqueue). Tracing that for the default setup of 256 buffers, at least another 2132 bytes are allocated per virtqueue ("at least" because the virtqueue holds other data pointers, which may be assigned memory allocated by parts of the code I didn't trace). That is another 4264 bytes for the 2 virtqueues on the Linux side.

On the FreeRTOS side, virtqueues are also created for the vrings (rpmsg_rdev_create_virtqueues in remote_device.c), and I traced that to 4224 bytes (after malloc alignment) per queue.

Adding up: 32768 + 12712 = 45480 bytes, or about 44.4KB, where 12712 = 4264 (Linux) + 2 x 4224 (FreeRTOS). (There is also other state management, such as rdevs, channels, etc.)

So the overhead is now ~17% relative to the 256KB data space.

b36401 · 11-14-2019

The throughput of RPMsg is about 5 Mbps.

dry · 01-11-2020

Hey Victor,

How did you measure it, and what's your test system?

I also wanted to do a test but never got to it.

dry · 10-26-2018

Also, the shared vring size should be the same on both sides, but there is a slight difference between the Linux calculation and the FreeRTOS one:

Linux :

static inline unsigned vring_size(unsigned int num, unsigned long align)
{
    return ((sizeof(struct vring_desc) * num + sizeof(__virtio16) * (3 + num)
             + align - 1) & ~(align - 1))
           + sizeof(__virtio16) * 3 + sizeof(struct vring_used_elem) * num;
}

At runtime this gives 10246.

FreeRTOS:

static inline int vring_size(unsigned int num, unsigned long align)
{
    int size;

    size = num * sizeof(struct vring_desc);
    size += sizeof(struct vring_avail) + (num * sizeof(uint16_t)) +
            sizeof(uint16_t);
    size = (size + align - 1) & ~(align - 1);
    size += sizeof(struct vring_used) +
            (num * sizeof(struct vring_used_elem)) + sizeof(uint16_t);
    return (size);
}

This gives 10254, which is 8 bytes more.

Align is 4096 in both cases. If I use the protocol definition (RPMsg Messaging Protocol · OpenAMP/open-amp Wiki · GitHub) and align after the avail ring (as done above), I get 10244.

Shouldn't they be identical? This is the size of the structure in shared memory.

(The alignment pads the gap between the avail and used rings, and each vring sits at a fixed address with some extra margin between them, so maybe a few extra bytes don't matter as long as the reserved region is larger?)


i.MX7Dual