Hi All,
I need to play 4 x 1080p streams concurrently, but the VPU fails to allocate memory. I'm getting the following kernel error message:
Physical memory allocation error!
The "gpumem=" command line argument explained in this post does not affect the behaviour. I tried 64M, 128M and 256M without any success.
Also, this patch does not solve the problem.
Are there any other suggestions?
Thanks,
Tarek
You can reduce the amount of memory the VPU allocates to a minimum by setting the frame-plus property.
For example:
gst-launch filesrc location=wheel.mp4 typefind=true ! aiurdemux ! vpudec frame-plus=1 ! mfw_isink axis-left=0 axis-top=0 disp-width=960 disp-height=540 &
gst-launch filesrc location=wheel.mp4 typefind=true ! aiurdemux ! vpudec frame-plus=1 ! mfw_isink axis-left=960 axis-top=0 disp-width=960 disp-height=540 &
gst-launch filesrc location=wheel.mp4 typefind=true ! aiurdemux ! vpudec frame-plus=1 ! mfw_isink axis-left=0 axis-top=540 disp-width=960 disp-height=540 &
gst-launch filesrc location=wheel.mp4 typefind=true ! aiurdemux ! vpudec frame-plus=1 ! mfw_isink axis-left=960 axis-top=540 disp-width=960 disp-height=540
This command plays 4 x 1080p streams concurrently, each scaled to 960x540 in one quadrant of the display.
I'm using a Wandboard Solo.
I get this error when I combine 1080p video with an active OpenGL context.
e.g.:
gst-launch filesrc location=sintel_trailer-1080p.mp4 typefind=true ! aiurdemux ! vpudec ! glimagesink
=> "Physical memory allocation error!"
(Note that it works with mfw_v4lsink, but I need the combined-with-OpenGL use case.)
It's working with:
gst-launch filesrc location=sintel_trailer-1080p.mp4 typefind=true ! aiurdemux ! vpudec frame-plus=1 ! glimagesink
FYI, I still get:
[WARN] VPU iram is less than needed, some parts don't use iram
... but it's rendering properly anyway!
The problem is that I want to use this within Qt. (glimagesink is just a simple test to check that frame-plus=1 works.)
Qt doesn't build the whole pipeline manually; rather, it lets GStreamer define most of it and just adds the 'end pieces'.
Therefore I think it's practically impossible to add frame-plus=1 to vpudec... unless one restructures QtMultimedia significantly.
Is there a way to set that as an environment variable or something like that?
So something like:
export GST_VPU_FRAME_PLUS=1; gst-launch ...
would be awesome! :-)
Alternate question: can I somehow adjust the size of the memory in question? (GPU mem? VPU mem? IPU mem?)
I've seen the Android mem-alloc page, but gpumem=... does not seem to work on Linux.
I've tried fbmem=100M, but it's not working either.
I also tried adding the patch (an adjusted version) from here: Re: GStreamer crashing on i.MX6 (Boundary Devices Nitrogen6x)
... to board-wand.c.
Nothing changes.
Hi Thomas,
1. For your first suggestion, you can modify gst-fsl-plugins to set the default value to 1. I think changing the number "6" to "1" at line 182 of vpudec.c will do it.
2. For your alternative, you can modify the kernel memory map to give more space to the DMA zone and reduce the vmalloc zone. The VPU allocates its memory from the DMA zone, and at some point there is not enough left for it.
If you look at the kernel messages at boot time, you will see the memory map. Something like this:
Memory: 640MB 256MB = 896MB total
Memory: 896808k/896808k available, 151768k reserved, 0K highmem
Virtual kernel memory layout:
vector : 0xffff0000 - 0xffff1000 ( 4 kB)
fixmap : 0xfff00000 - 0xfffe0000 ( 896 kB)
DMA : 0xe0a00000 - 0xffe00000 ( 500 MB)
vmalloc : 0xc0800000 - 0xde400000 ( 476 MB)
lowmem : 0x80000000 - 0xc0000000 (1024 MB)
pkmap : 0x7fe00000 - 0x80000000 ( 2 MB)
modules : 0x7f000000 - 0x7fe00000 ( 14 MB)
.init : 0x80008000 - 0x8003c000 ( 208 kB)
.text : 0x8003c000 - 0x80a95b64 (10599 kB)
.data : 0x80a96000 - 0x80af9ca0 ( 400 kB)
.bss : 0x80af9cc4 - 0x80b47eac ( 313 kB)
Hope that helps
Hmm, I tried to adjust the memory... this is roughly what I've tried:
This version doesn't boot; without the modifications to the DMA size, it boots.
The kernel error:
[    0.423988] ------------[ cut here ]------------
[    0.428696] WARNING: at arch/arm/mm/dma-mapping.c:174 consistent_init+0x60/0xd0()
[    0.436201] Modules linked in:
[    0.439353] [<c0043f3c>] (unwind_backtrace+0x0/0xf4) from [<c006ec60>] (warn_slowpath_common+0x54/0x64)
[    0.448787] [<c006ec60>] (warn_slowpath_common+0x54/0x64) from [<c006ed0c>] (warn_slowpath_null+0x1c/0x24)
[    0.458506] [<c006ed0c>] (warn_slowpath_null+0x1c/0x24) from [<c000bab0>] (consistent_init+0x60/0xd0)
[    0.467772] [<c000bab0>] (consistent_init+0x60/0xd0) from [<c00375c0>] (do_one_initcall+0x11c/0x174)
[    0.476963] [<c00375c0>] (do_one_initcall+0x11c/0x174) from [<c00089ec>] (kernel_init+0xc0/0x144)
[    0.485870] [<c00089ec>] (kernel_init+0xc0/0x144) from [<c003da7c>] (kernel_thread_exit+0x0/0x8)
[    0.494712] ---[ end trace 1b75b31a2719ed1c ]---
[    0.499427] ------------[ cut here ]------------
[    0.504068] WARNING: at arch/arm/mm/dma-mapping.c:174 consistent_init+0x60/0xd0()
[    0.511589] Modules linked in:
[    0.514694] [<c0043f3c>] (unwind_backtrace+0x0/0xf4) from [<c006ec60>] (warn_slowpath_common+0x54/0x64)
[    0.524141] [<c006ec60>] (warn_slowpath_common+0x54/0x64) from [<c006ed0c>] (warn_slowpath_null+0x1c/0x24)
[    0.533834] [<c006ed0c>] (warn_slowpath_null+0x1c/0x24) from [<c000bab0>] (consistent_init+0x60/0xd0)
[    0.543104] [<c000bab0>] (consistent_init+0x60/0xd0) from [<c00375c0>] (do_one_initcall+0x11c/0x174)
[    0.552272] [<c00375c0>] (do_one_initcall+0x11c/0x174) from [<c00089ec>] (kernel_init+0xc0/0x144)
[    0.561189] [<c00089ec>] (kernel_init+0xc0/0x144) from [<c003da7c>] (kernel_thread_exit+0x0/0x8)
[    0.570000] ---[ end trace 1b75b31a2719ed1d ]---
Are you sure it's the kernel DMA size? I've checked my other boards (Nitrogen and SABRE Auto): neither has a bigger DMA size, yet the problem doesn't occur on them.
nitrogen:
[    0.000000]     vector  : 0xffff0000 - 0xffff1000   (   4 kB)
[    0.000000]     fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
[    0.000000]     DMA     : 0xf4600000 - 0xffe00000   ( 184 MB)
[    0.000000]     vmalloc : 0xc0800000 - 0xf2000000   ( 792 MB)
[    0.000000]     lowmem  : 0x80000000 - 0xc0000000   (1024 MB)
[    0.000000]     pkmap   : 0x7fe00000 - 0x80000000   (   2 MB)
[    0.000000]     modules : 0x7f000000 - 0x7fe00000   (  14 MB)
[    0.000000]       .init : 0x80008000 - 0x80035000   ( 180 kB)
[    0.000000]       .text : 0x80035000 - 0x8067e550   (6438 kB)
[    0.000000]       .data : 0x80680000 - 0x806d7160   ( 349 kB)
[    0.000000]        .bss : 0x806d7184 - 0x80721818   ( 298 kB)
AI:
    vector  : 0xffff0000 - 0xffff1000   (    4 kB)
    fixmap  : 0xfff00000 - 0xfffe0000   (  896 kB)
    DMA     : 0xf4600000 - 0xffe00000   (  184 MB)
    vmalloc : 0xea800000 - 0xf2000000   (  120 MB)
    lowmem  : 0x80000000 - 0xea000000   ( 1696 MB)
    pkmap   : 0x7fe00000 - 0x80000000   (    2 MB)
    modules : 0x7f000000 - 0x7fe00000   (   14 MB)
      .init : 0x80008000 - 0x8003e000   (  216 kB)
      .text : 0x8003e000 - 0x80b0d54c   (11070 kB)
      .data : 0x80b0e000 - 0x80b7a070   (  433 kB)
       .bss : 0x80b7a094 - 0x80bee4d0   (  466 kB)
Sorry for the late response. I was out of office for a week.
This patch looks good; unfortunately, it's not for the kernel and board I was looking for.
I was looking for linux-wandboard (git://github.com/johnweber/linux.git), and the board is wandboard-solo.
Also, a 500 MB DMA size is not applicable (the wandboard-solo only has 512 MB of RAM). What I need is 'enough for 1 x 1080p while something else is using the GPU as well' (within a Qt/OpenGL application).
Dear Michael,
I have run into a VPU memory allocation failure too. :'(
Test environment:
- LTIB, version : 3.0.35-4.1.0
- mxc_vpu_test.out (jpeg decoding test)
I applied the "DMA memory size" patch (increasing it from 184M to 500M).
But I still have the memory issue.
From what I've heard, the VPU memory size is currently limited to 32M.
So I'm trying to increase the VPU memory size from 32M to 35M.
Is that possible? If so, how can I increase the VPU memory size to 35M?
Thank you.
BRs,
Jessie.
Hi,
You can try to increase the GPU memory size in the board file. It's currently 128M.
static struct viv_gpu_platform_data imx6q_gpu_pdata __initdata = {
.reserved_mem_size = SZ_128M,
};
For the DMA zone size:
The change you showed in Kconfig, plus setting the proper size in the kernel config menu, should do the trick.
Michel
This is the patch created by LeonardoSandovalGonzalez for the change suggested by Thomas.
Signed-off-by: Leonardo Sandoval <leo.san.gon@gmail.com>
---
src/video/vpu/src/vpudec.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/video/vpu/src/vpudec.c b/src/video/vpu/src/vpudec.c
index 6066ef8..2f0af18 100644
--- a/src/video/vpu/src/vpudec.c
+++ b/src/video/vpu/src/vpudec.c
@@ -179,7 +179,7 @@ static GstsutilsOptionEntry g_vpudec_option_table[] = {
G_STRUCT_OFFSET (VpuDecOption, adaptive_drop), "true"},
{PROP_FRAMES_PLUS, "frame-plus", "addtionlal frames",
"set number of addtional frames for smoothly playback", G_TYPE_INT,
- G_STRUCT_OFFSET (VpuDecOption, bufferplus), "6", "1", STR_MAX_INT},
+ G_STRUCT_OFFSET (VpuDecOption, bufferplus), "1", "1", STR_MAX_INT},
{PROP_OUTPUT_FORMAT, "output-format", "output format",
"set raw format for output",
G_TYPE_ENUM,
--
Actually, it was saving enough memory, but the pipeline seems to break down in the QtMultimedia case; the lower the value, the sooner it breaks down.
6 seems to be a good value after all ;-)
The provided patch solves the VPU allocation problem, but I think another element still needs a similar fix. Perhaps it's the IPU?
The VPU itself allocates 10 MB per pipeline, but the total memory allocated by a single pipeline is about 90 MB.
Where is this memory going? Is it the IPU?
Could you post your pipeline? Maybe LeonardoSandovalGonzalez can take a look.
The problem can be reproduced by running 3 pipelines using:
gst-launch playbin2 uri=file:///1080p-file video-sink=mfw_isink
My exact pipeline is:
appsrc ! typefinder ! vpudec ! mfw_isink
But I don't think this problem is specific to my pipeline.
There are several limitations when trying to analyse a problem like this.
One is VPU throughput.
The other is DDR throughput.
The VPU's maximum decoding throughput is 99,532,800 pixels/s.
So, for example, with dual 1080p playback at 30fps:
| Dual 1080p | Format | X | Y | Rate (fps) | Pixel rate (pixels/s) |
|---|---|---|---|---|---|
| 1080p raw | 1080p@30 | 1920 | 1080 | 30 | 62,208,000 |
| 1080p raw | 1080p@30 | 1920 | 1080 | 30 | 62,208,000 |
| Total | | | | | 124,416,000 |
This exceeds the supported throughput at the default VPU frequency (266 MHz).
It can be handled with the VPU at 352 MHz, but that configuration is still under test.
So, for example, at 24fps:
| Dual 1080p | Format | X | Y | Rate (fps) | Pixel rate (pixels/s) |
|---|---|---|---|---|---|
| 1080p raw | 1080p@24 | 1920 | 1080 | 24 | 49,766,400 |
| 1080p raw | 1080p@24 | 1920 | 1080 | 24 | 49,766,400 |
| Total | | | | | 99,532,800 |
This is right at the limit.
DDR use
The theoretical peak bandwidth of 32-bit DDR3 is 3200 MB/s.
Estimated bus utilization is about 50%, so about 1600 MB/s is available.
Decoding one 1080p video at 30fps uses:
| DDR load | Details | MB/s |
|---|---|---|
| VPU decode | H.264 1080p 30fps | 383.0 |
| VDOA | 1080p YUV420 tiled -> YUV422 raster | 210.0 |
| Display refresh, background | 1920 x 1080 x 60 x 2; combining done in the DP | 248.8 |
| Display refresh, overlay | 1920 x 1080 x 60 x 2; combining done in the DP | 248.8 |
The display DDR load is BW = FW x FH x fps x data format (2 bytes/pixel); the two display rows use this formula for 1080p60 YUV422 output (FW = 1920, FH = 1080, fps = 60).
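That is a total of 1090.7 MB/s of DDR bandwidth for a single stream. A second 30fps stream adds at least another 593 MB/s of decode and VDOA traffic (383.0 + 210.0), pushing the total to at least 1684 MB/s, above the ~1600 MB/s available, so even dual playback at 30fps is not possible.
The maximum possible 1080p playback is therefore dual playback, with both videos at <=24fps.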