Memory Reserved on i.MX6 for VPU

Tarek
Senior Contributor I

Hi All,

I need to play 4 x 1080p streams concurrently but the VPU fails to allocate memory. I'm getting the following kernel error message:

Physical memory allocation error!

The "gpumem=" command line argument explained in this post does not affect the behaviour. I tried 64M,128M and 256M without any success.

Also this patch does not solve the problem.

Are there any other suggestions?

Thanks,

Tarek

1 Solution
Tarek
Senior Contributor I

You can reduce the amount of memory the VPU allocates to a minimum by setting the frame-plus property on vpudec.

For example:

gst-launch filesrc location=wheel.mp4 typefind=true ! aiurdemux ! vpudec frame-plus=1 ! mfw_isink axis-left=0 axis-top=0 disp-width=960 disp-height=540 &
gst-launch filesrc location=wheel.mp4 typefind=true ! aiurdemux ! vpudec frame-plus=1 ! mfw_isink axis-left=960 axis-top=0 disp-width=960 disp-height=540 &
gst-launch filesrc location=wheel.mp4 typefind=true ! aiurdemux ! vpudec frame-plus=1 ! mfw_isink axis-left=0 axis-top=540 disp-width=960 disp-height=540 &
gst-launch filesrc location=wheel.mp4 typefind=true ! aiurdemux ! vpudec frame-plus=1 ! mfw_isink axis-left=960 axis-top=540 disp-width=960 disp-height=540

This command can play 4 x 1080p streams
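
The four command lines above differ only in the mfw_isink window position. As a rough sketch (the helper name grid_commands and its parameters are made up for illustration, and a 1920x1080 display is assumed), the same commands can be generated for any grid:

```python
def grid_commands(path, cols=2, rows=2, screen_w=1920, screen_h=1080, frame_plus=1):
    """Build one gst-launch command line per tile of a cols x rows grid.

    Only element and property names that appear in the post above are used;
    the function itself is a hypothetical helper, not part of any plugin.
    """
    tile_w = screen_w // cols
    tile_h = screen_h // rows
    commands = []
    for row in range(rows):
        for col in range(cols):
            commands.append(
                "gst-launch filesrc location={path} typefind=true ! aiurdemux ! "
                "vpudec frame-plus={fp} ! mfw_isink axis-left={x} axis-top={y} "
                "disp-width={w} disp-height={h}".format(
                    path=path, fp=frame_plus,
                    x=col * tile_w, y=row * tile_h,
                    w=tile_w, h=tile_h))
    return commands


if __name__ == "__main__":
    # Joining with " & " makes the shell start every pipeline except the
    # last one in the background, as in the command shown above.
    print(" & ".join(grid_commands("wheel.mp4")))
```

With the defaults this reproduces the 2 x 2 layout of 960x540 windows used above.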

senykthomas
Contributor III

I'm using wandsolo.

I do get this error when I combine 1080p video with an active opengl context.

e.g.:

gst-launch filesrc location=sintel_trailer-1080p.mp4 typefind=true ! aiurdemux ! vpudec ! glimagesink

  => " Physical memory allocation error! "

(note that it would work with mfw_v4lsink, but I'm after the combined-with-opengl-use-case)

It's working with:

gst-launch filesrc location=sintel_trailer-1080p.mp4 typefind=true ! aiurdemux ! vpudec frame-plus=1 ! glimagesink

fyi: I still get:

[WARN] VPU iram is less than needed, some parts don't use iram

... but it's rendering properly anyway!

The problem is that I want to use this within Qt. (glimagesink is just an example, a simple test to show that frame-plus=1 is working.)

Qt doesn't build the whole pipeline manually; rather, it lets gstreamer define most of it and just adds the 'end pieces'.

Therefore I think it's practically impossible to add frame-plus=1 to vpudec ... unless one restructures QtMultimedia significantly.

Is there a way to set that as environment variable or something like that?

  So something like:

export GST_VPU_FRAME_PLUS=1; gst-launch ...

would be awesome! :smileyhappy:

Alternate question: can I somehow adjust the size of the memory in question? (gpu mem? vpu mem? ipu mem?)

I've seen the Android mem-alloc page, but gpumem=... seems not to work on Linux.

I've tried fbmem=100M, but it's not working either.

I also tried adding the patch (an adjusted version) from here: Re: GStreamer crashing on i.MX6 (Boundary Devices Nitrogen6x)

  .. to board-wand.c

Nothing changes.

Tarek
Senior Contributor I

Hi Thomas,

1. For your first suggestion, you can modify gst-fsl-plugins to set the default value to 1. I think changing the number "6" to "1" in vpudec.c line 182 will do.

2. For your alternative solution, you can modify the kernel memory map to give more space to the DMA zone and reduce the VMALLOC zone. The VPU allocates its memory from the DMA zone, and at some point there is not enough left for it.

If you look at kernel messages at boot time you will see the memory map. Something like this:

Memory: 640MB 256MB = 896MB total

Memory: 896808k/896808k available, 151768k reserved, 0K highmem

Virtual kernel memory layout:

    vector  : 0xffff0000 - 0xffff1000   (   4 kB)

    fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)

    DMA     : 0xe0a00000 - 0xffe00000   ( 500 MB)

    vmalloc : 0xc0800000 - 0xde400000   ( 476 MB)

    lowmem  : 0x80000000 - 0xc0000000   (1024 MB)

    pkmap   : 0x7fe00000 - 0x80000000   (   2 MB)

    modules : 0x7f000000 - 0x7fe00000   (  14 MB)

      .init : 0x80008000 - 0x8003c000   ( 208 kB)

      .text : 0x8003c000 - 0x80a95b64   (10599 kB)

      .data : 0x80a96000 - 0x80af9ca0   ( 400 kB)

       .bss : 0x80af9cc4 - 0x80b47eac   ( 313 kB)
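
To compare the zones across boards it can help to recompute the sizes from the address ranges rather than trusting the rounded column. A minimal sketch, assuming the log lines have the shape shown above (the function name region_sizes_mb is hypothetical):

```python
import re

# Matches lines such as "    DMA     : 0xe0a00000 - 0xffe00000   ( 500 MB)",
# with or without a leading "[0.000000]" style kernel timestamp.
LAYOUT_LINE = re.compile(
    r"^\s*(?:\[\s*[\d.]+\]\s*)?(?P<name>[.\w]+)\s*:\s*"
    r"0x(?P<start>[0-9a-fA-F]+)\s*-\s*0x(?P<end>[0-9a-fA-F]+)")


def region_sizes_mb(log_text):
    """Return {region name: size in MB} for every layout line in log_text."""
    sizes = {}
    for line in log_text.splitlines():
        match = LAYOUT_LINE.match(line)
        if match:
            size = int(match.group("end"), 16) - int(match.group("start"), 16)
            sizes[match.group("name")] = size / (1024.0 * 1024.0)
    return sizes


if __name__ == "__main__":
    sample = (
        "    DMA     : 0xe0a00000 - 0xffe00000   ( 500 MB)\n"
        "    vmalloc : 0xc0800000 - 0xde400000   ( 476 MB)\n"
    )
    for name, size in sorted(region_sizes_mb(sample).items()):
        print("%-8s %.0f MB" % (name, size))
```

Feeding it the output of dmesg from each board gives directly comparable DMA and vmalloc figures.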

Hope that helps

senykthomas
Contributor III

hmm I tried to adjust the memory ... this is roughly what I've tried:

http://pastebin.com/SMRHpVu4

.. this version doesn't boot .. without the modifications on the DMA size, it's booting.

the kernel error:

[0.423988] ------------[ cut here ]------------
[0.428696] WARNING: at arch/arm/mm/dma-mapping.c:174 consistent_init+0x60/0xd0()
[0.436201] Modules linked in:
[0.439353] [<c0043f3c>] (unwind_backtrace+0x0/0xf4) from [<c006ec60>] (warn_slowpath_common+0x54/0x64)
[0.448787] [<c006ec60>] (warn_slowpath_common+0x54/0x64) from [<c006ed0c>] (warn_slowpath_null+0x1c/0x24)
[0.458506] [<c006ed0c>] (warn_slowpath_null+0x1c/0x24) from [<c000bab0>] (consistent_init+0x60/0xd0)
[0.467772] [<c000bab0>] (consistent_init+0x60/0xd0) from [<c00375c0>] (do_one_initcall+0x11c/0x174)
[0.476963] [<c00375c0>] (do_one_initcall+0x11c/0x174) from [<c00089ec>] (kernel_init+0xc0/0x144)
[0.485870] [<c00089ec>] (kernel_init+0xc0/0x144) from [<c003da7c>] (kernel_thread_exit+0x0/0x8)
[0.494712] ---[ end trace 1b75b31a2719ed1c ]---
[0.499427] ------------[ cut here ]------------
[0.504068] WARNING: at arch/arm/mm/dma-mapping.c:174 consistent_init+0x60/0xd0()
[0.511589] Modules linked in:
[0.514694] [<c0043f3c>] (unwind_backtrace+0x0/0xf4) from [<c006ec60>] (warn_slowpath_common+0x54/0x64)
[0.524141] [<c006ec60>] (warn_slowpath_common+0x54/0x64) from [<c006ed0c>] (warn_slowpath_null+0x1c/0x24)
[0.533834] [<c006ed0c>] (warn_slowpath_null+0x1c/0x24) from [<c000bab0>] (consistent_init+0x60/0xd0)
[0.543104] [<c000bab0>] (consistent_init+0x60/0xd0) from [<c00375c0>] (do_one_initcall+0x11c/0x174)
[0.552272] [<c00375c0>] (do_one_initcall+0x11c/0x174) from [<c00089ec>] (kernel_init+0xc0/0x144)
[0.561189] [<c00089ec>] (kernel_init+0xc0/0x144) from [<c003da7c>] (kernel_thread_exit+0x0/0x8)
[0.570000] ---[ end trace 1b75b31a2719ed1d ]---

Are you actually sure it's the kernel-DMA size?... I've checked on my other boards (nitrogen and sabreauto) none of them has a bigger DMA size, but the problem doesn't occur on them ...?

nitrogen:

[0.000000] vector  : 0xffff0000 - 0xffff1000   (   4 kB)
[0.000000] fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
[0.000000] DMA : 0xf4600000 - 0xffe00000   ( 184 MB)
[0.000000] vmalloc : 0xc0800000 - 0xf2000000   ( 792 MB)
[0.000000] lowmem  : 0x80000000 - 0xc0000000   (1024 MB)
[0.000000] pkmap   : 0x7fe00000 - 0x80000000   (   2 MB)
[0.000000] modules : 0x7f000000 - 0x7fe00000   (  14 MB)
[0.000000]   .init : 0x80008000 - 0x80035000   ( 180 kB)
[0.000000]   .text : 0x80035000 - 0x8067e550   (6438 kB)
[0.000000]   .data : 0x80680000 - 0x806d7160   ( 349 kB)
[0.000000]    .bss : 0x806d7184 - 0x80721818   ( 298 kB)

sabreauto (AI):

vector  : 0xffff0000 - 0xffff1000   (   4 kB)
fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
DMA : 0xf4600000 - 0xffe00000   ( 184 MB)
vmalloc : 0xea800000 - 0xf2000000   ( 120 MB)
lowmem  : 0x80000000 - 0xea000000   (1696 MB)
pkmap   : 0x7fe00000 - 0x80000000   (   2 MB)
modules : 0x7f000000 - 0x7fe00000   (  14 MB)
  .init : 0x80008000 - 0x8003e000   ( 216 kB)
  .text : 0x8003e000 - 0x80b0d54c   (11070 kB)
  .data : 0x80b0e000 - 0x80b7a070   ( 433 kB)
   .bss : 0x80b7a094 - 0x80bee4d0   ( 466 kB)

ChucoChe
NXP Employee

Did changing any of the memory settings help?

Michel

Tarek
Senior Contributor I

In short, with frame-plus=1 you can play 4x1080p streams without problems. With changing the memory map you can play 8x1080p but the system is unstable.

YixingKong
Senior Contributor IV

Tarek

Has your issue been resolved? If yes, we are going to close the discussion in 3 days. If you still need help, please feel free to contact Freescale.

Thanks,
Yixing

Tarek
Senior Contributor I

Yes please close

senykthomas
Contributor III

Sorry for the late response. I was out of office for a week.

This patch looks good, but unfortunately it's not for the kernel and board I was looking for.

My kernel is linux-wandboard (git://github.com/johnweber/linux.git) and the board is wandboard-solo.

Also, a 500 MB DMA zone is not applicable (the wandboard-solo only has 512 MB of RAM). My case is 'enough for 1 x 1080p while something else is using the GPU as well' (within a Qt/OpenGL application).

ChucoChe
NXP Employee

Hi Thomas,

     You could ping timesys on the issue. They may provide the proper patch.

Michel

Jessie_Lee
NXP Employee

Dear Michael.

I have run into a VPU memory allocation failure too. :smileycry:

Test environment

- LTIB, version : 3.0.35-4.1.0

- mxc_vpu_test.out (jpeg decoding test)

I applied the "dma memory size" patch (increasing it from 184M to 500M).

But I still have the memory issue.

As far as I have heard, the VPU memory size is currently limited to 32M.

So I would like to test increasing the VPU memory size from 32M to 35M.

Is that possible? If yes, how can I increase the VPU memory size to 35M?

Thank you.

BRs,

Jessie.

ChucoChe
NXP Employee

Hi,

You can try to increase the GPU memory size in the boardfile. Currently it's 128M.

static struct viv_gpu_platform_data imx6q_gpu_pdata __initdata = {

          .reserved_mem_size = SZ_128M,

};

For the dma zone size:

The change you showed in Kconfig, plus setting the proper size in the kernel config menu should do the trick.

diff --git a/arch/arm/plat-mxc/Kconfig b/arch/arm/plat-mxc/Kconfig
index 32408ed..39e9cde 100755
--- a/arch/arm/plat-mxc/Kconfig
+++ b/arch/arm/plat-mxc/Kconfig
@@ -164,7 +164,7 @@ config CLK_DEBUG
 config DMA_ZONE_SIZE
         int "DMA memory zone size"
-        range 0 184
+        range 0 220
         default 24
         help
           This is the size in MB for the DMA zone. The DMA zone is used for


Michel

Tarek
Senior Contributor I

Hi Michel,

Unfortunately the menuconfig setting doesn't work. The values are hard-coded in the drivers, so any value set in menuconfig has no effect.

You need to modify the Linux source code.

Thanks

FranciscoCarril
Contributor V

This is the patch created by LeonardoSandovalGonzalez for the change suggested by Thomas.

Signed-off-by: Leonardo Sandoval <leo.san.gon@gmail.com>
---
 src/video/vpu/src/vpudec.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/video/vpu/src/vpudec.c b/src/video/vpu/src/vpudec.c
index 6066ef8..2f0af18 100644
--- a/src/video/vpu/src/vpudec.c
+++ b/src/video/vpu/src/vpudec.c
@@ -179,7 +179,7 @@ static GstsutilsOptionEntry g_vpudec_option_table[] = {
  G_STRUCT_OFFSET (VpuDecOption, adaptive_drop), "true"},
  {PROP_FRAMES_PLUS, "frame-plus", "addtionlal frames",
          "set number of addtional frames for smoothly playback", G_TYPE_INT,
- G_STRUCT_OFFSET (VpuDecOption, bufferplus), "6", "1", STR_MAX_INT},
+ G_STRUCT_OFFSET (VpuDecOption, bufferplus), "1", "1", STR_MAX_INT},
  {PROP_OUTPUT_FORMAT, "output-format", "output format",
          "set raw format for output",
  G_TYPE_ENUM,
--
1.7.9.5

senykthomas
Contributor III

Thanks for the pointer!

That's working ... apparently still not enough memory saved for QtMultimedia .. :smileywink:

senykthomas
Contributor III

actually .. it was saving enough memory .. but the pipeline seems to break down (in the QtMultimedia case) .. the lower the value, the 'easier'/faster it breaks down.

6 seems to be a good value after all :smileywink:

Tarek
Senior Contributor I

The patch provided solves the VPU allocation problem but I think there is still another element that needs a similar fix. Perhaps it's the IPU!!?

The VPU itself is allocating 10MB for each pipeline but the total memory allocated by a single pipeline is about 90MB?

Where is this memory going? is it the IPU?

FranciscoCarril
Contributor V

Could you post your pipeline? Maybe LeonardoSandovalGonzalez can take a look.

Tarek
Senior Contributor I

The problem can be reproduced if you run 3 pipelines using

gst-launch playbin2 uri=file:///1080p-file video-sink=mfw_isink.

My exact pipeline is:

appsrc ! typefinder  ! vpudec ! mfw_isink

But I don't think this problem is specific to my pipeline

FranciscoCarril
Contributor V

There are several limitations to consider when analysing a problem like this.

One is VPU throughput.

The other is DDR throughput.

The VPU's maximum decoding throughput is

99,532,800 pixels/s

So for example with Dual 1080p playback @30fps:

Dual 1080p              X      Y      Rate   Pixel Rate
1080p Raw   1080p@30    1920   1080   30     62,208,000
1080p Raw   1080p@30    1920   1080   30     62,208,000
Total                                        124,416,000

This exceeds the supported throughput at the default VPU frequency (266 MHz).

It can be handled if the VPU frequency is raised to 352 MHz, but that is still under test.

So for example at 24fps

Dual 1080p              X      Y      Rate   Pixel Rate
1080p Raw   1080p@24    1920   1080   24     49,766,400
1080p Raw   1080p@24    1920   1080   24     49,766,400
Total                                        99,532,800

Is right on the limit.
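
The pixel-rate arithmetic in the two tables is just width x height x fps x number of streams. A small sketch for checking other combinations against the limit quoted above (the function names are invented for this example, and the limit is simply the figure from this post):

```python
# Maximum VPU decoding throughput as quoted in the post above (pixels/s).
VPU_MAX_PIXELS_PER_SECOND = 99532800


def pixel_rate(width, height, fps, streams=1):
    """Pixels per second the decoder has to produce for the given streams."""
    return width * height * fps * streams


def fits_vpu(width, height, fps, streams, limit=VPU_MAX_PIXELS_PER_SECOND):
    """True if the combined pixel rate does not exceed the quoted limit."""
    return pixel_rate(width, height, fps, streams) <= limit


if __name__ == "__main__":
    for fps in (30, 24):
        rate = pixel_rate(1920, 1080, fps, streams=2)
        verdict = "within" if fits_vpu(1920, 1080, fps, 2) else "over"
        print("dual 1080p@%d: %d pixels/s (%s the limit)" % (fps, rate, verdict))
```

This only models raw pixel throughput; it says nothing about memory allocation or DDR bandwidth.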

DDR use

The theoretical peak bandwidth of 32-bit DDR3 is 3200 MB/s.

Estimated bus utilization is about 50%, so about 1600 MB/s is available.

Decoding one 1080p video at 30fps uses:

DDR load (VPU decode)
  H.264 1080p 30fps                       383.0 MB/s

DDR load (VDOA)
  1080p YUV420 tiled -> YUV422 raster     210.0 MB/s

Display refresh (display DDR load)
  BW = FW x FH x fps x data format (2 bytes/pixel)
  For 1080p60, YUV422 output this would be:

  FW     FH     fps   bytes/pix   BW
  1920   1080   60    2           248.8 MB/s   background - combining done in the DP
  1920   1080   60    2           248.8 MB/s   overlay - combining done in the DP

This adds up to a total DDR load of 1090.7 MB/s, so even dual playback at 30fps is not possible.

The max 1080p  playback possible is dual playback, and the videos need to be at <=24fps.
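
The DDR figures above can be added up with a few lines of code. This is only a sketch: the constants are the numbers quoted in this post, the function names are made up, and scaling the decode and VDOA loads linearly with the number of streams is my own assumption, not something the post states.

```python
# Figures quoted in the post above, all in MB/s.
VPU_DECODE_MB_S = 383.0   # H.264 1080p 30fps decode
VDOA_MB_S = 210.0         # 1080p YUV420 tiled -> YUV422 raster
DDR_PEAK_MB_S = 3200.0    # theoretical peak
BUS_UTILIZATION = 0.5     # estimated usable fraction


def display_bandwidth_mb_s(frame_width, frame_height, fps, bytes_per_pixel=2):
    """BW = FW x FH x fps x bytes per pixel, in MB/s (1 MB = 10^6 bytes)."""
    return frame_width * frame_height * fps * bytes_per_pixel / 1e6


def total_ddr_load_mb_s(streams=1, display_layers=2):
    """Total DDR load for `streams` decoded videos plus display refresh.

    Assumption (not from the post): decode and VDOA loads scale linearly
    with the stream count, while the 1080p60 display layers stay constant.
    """
    per_stream = VPU_DECODE_MB_S + VDOA_MB_S
    display = display_layers * display_bandwidth_mb_s(1920, 1080, 60)
    return streams * per_stream + display


if __name__ == "__main__":
    budget = DDR_PEAK_MB_S * BUS_UTILIZATION
    for streams in (1, 2):
        load = total_ddr_load_mb_s(streams)
        print("%d stream(s): %.1f MB/s of %.0f MB/s available"
              % (streams, load, budget))
```

Under that linear-scaling assumption, a second 30fps stream pushes the load past the estimated 1600 MB/s budget, which is consistent with the conclusion above.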