Long running vpu task with memory leak bug on imx6

15,833 views
michaelb_
Contributor III

I have an app that displays a long-running RTSP stream (H.264 encoded). The GStreamer pipeline is as follows:

gst-launch rtspsrc location=rtsp://axis-cam1/axis-media/media.amp ! rtph264depay ! vpudec low-latency=true ! mfw_isink axis-left=40 axis-top=20 disp-width=960 disp-height=540 sync=false

In parallel, my app does other work and also plays some videos or short RTSP streams. This works very well.

After about 1.5 hours, though, all temporary GStreamer activities that are started in parallel to the long-running task no longer get any VPU memory. This is the error:

aiurdemux194:si: page allocation failure: order:11, mode:0xd1

[<8004c944>] (unwind_backtrace+0x0/0xf4) from [<800cf614>] (warn_alloc_failed+0xd4/0x10c)

[<800cf614>] (warn_alloc_failed+0xd4/0x10c) from [<800d2094>] (__alloc_pages_nodemask+0x540/0x6e4)

[<800d2094>] (__alloc_pages_nodemask+0x540/0x6e4) from [<8004f658>] (__dma_alloc+0x9c/0x2fc)

[<8004f658>] (__dma_alloc+0x9c/0x2fc) from [<8004fbf0>] (dma_alloc_coherent+0x60/0x68)

[<8004fbf0>] (dma_alloc_coherent+0x60/0x68) from [<80465260>] (vpu_alloc_dma_buffer+0x2c/0x54)

[<80465260>] (vpu_alloc_dma_buffer+0x2c/0x54) from [<804656ac>] (vpu_ioctl+0x424/0x8c0)

[<804656ac>] (vpu_ioctl+0x424/0x8c0) from [<80111a90>] (do_vfs_ioctl+0x3b4/0x530)

[<80111a90>] (do_vfs_ioctl+0x3b4/0x530) from [<80111c40>] (sys_ioctl+0x34/0x60)

[<80111c40>] (sys_ioctl+0x34/0x60) from [<80045f80>] (ret_fast_syscall+0x0/0x30)

Mem-info:

DMA per-cpu:

CPU    0: hi:   90, btch:  15 usd:  87

CPU    1: hi:   90, btch:  15 usd:  87

CPU    2: hi:   90, btch:  15 usd:   0

CPU    3: hi:   90, btch:  15 usd:  83

Normal per-cpu:

CPU    0: hi:  186, btch:  31 usd:  32

CPU    1: hi:  186, btch:  31 usd:  32

CPU    2: hi:  186, btch:  31 usd:  63

CPU    3: hi:  186, btch:  31 usd:  62

HighMem per-cpu:

CPU    0: hi:   90, btch:  15 usd:  89

CPU    1: hi:   90, btch:  15 usd:  83

CPU    2: hi:   90, btch:  15 usd:  77

CPU    3: hi:   90, btch:  15 usd:  78

active_anon:9812 inactive_anon:35 isolated_anon:0

active_file:15255 inactive_file:8527 isolated_file:0

unevictable:0 dirty:0 writeback:0 unstable:2

free:386119 slab_reclaimable:411 slab_unreclaimable:1799

mapped:5509 shmem:48 pagetables:219 bounce:0

DMA free:71672kB min:616kB low:768kB high:924kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolateds

lowmem_reserve[]: 0 1244 1593 1593

Normal free:1249804kB min:4212kB low:5264kB high:6316kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB o

lowmem_reserve[]: 0 0 2794 2794

HighMem free:223000kB min:348kB low:640kB high:936kB active_anon:39248kB inactive_anon:140kB active_file:61020kB inactive_file:34108kB unevio

lowmem_reserve[]: 0 0 0 0

DMA: 22*4kB 90*8kB 21*16kB 82*32kB 7*64kB 137*128kB 69*256kB 3*512kB 8*1024kB 5*2048kB 3*4096kB 0*8192kB 0*16384kB 0*32768kB = 71672kB

Normal: 1*4kB 53*8kB 16*16kB 6*32kB 6*64kB 4*128kB 1*256kB 1*512kB 2*1024kB 2*2048kB 3*4096kB 2*8192kB 4*16384kB 35*32768kB = 1249772kB

HighMem: 14*4kB 10*8kB 437*16kB 386*32kB 374*64kB 27*128kB 10*256kB 7*512kB 8*1024kB 7*2048kB 4*4096kB 2*8192kB 1*16384kB 3*32768kB = 223000B

23829 total pagecache pages

0 pages in swap cache

Swap cache stats: add 0, delete 0, find 0/0

Free swap  = 0kB

Total swap = 0kB

524288 pages of RAM

387015 free pages

73287 reserved pages

1169 slab pages

8545 pages shared

0 pages swap cached

Physical memory allocation error!

Physical memory allocation error!

Alignment trap: multiqueue194:s (5204) PC=0x3ae495b4 Instr=0xe8810018 Address=0x00002a0f FSR 0x801

or this:

rtpjitterbuffer: page allocation failure: order:11, mode:0xd1

[<8004c944>] (unwind_backtrace+0x0/0xf4) from [<800cf614>] (warn_alloc_failed+0xd4/0x10c)

[<800cf614>] (warn_alloc_failed+0xd4/0x10c) from [<800d2094>] (__alloc_pages_nodemask+0x540/0x6e4)

[<800d2094>] (__alloc_pages_nodemask+0x540/0x6e4) from [<8004f658>] (__dma_alloc+0x9c/0x2fc)

[<8004f658>] (__dma_alloc+0x9c/0x2fc) from [<8004fbf0>] (dma_alloc_coherent+0x60/0x68)

[<8004fbf0>] (dma_alloc_coherent+0x60/0x68) from [<80465260>] (vpu_alloc_dma_buffer+0x2c/0x54)

[<80465260>] (vpu_alloc_dma_buffer+0x2c/0x54) from [<804656ac>] (vpu_ioctl+0x424/0x8c0)

[<804656ac>] (vpu_ioctl+0x424/0x8c0) from [<80111a90>] (do_vfs_ioctl+0x3b4/0x530)

[<80111a90>] (do_vfs_ioctl+0x3b4/0x530) from [<80111c40>] (sys_ioctl+0x34/0x60)

[<80111c40>] (sys_ioctl+0x34/0x60) from [<80045f80>] (ret_fast_syscall+0x0/0x30)

Mem-info:

DMA per-cpu:

CPU    0: hi:   90, btch:  15 usd:   0

CPU    1: hi:   90, btch:  15 usd:   0

CPU    2: hi:   90, btch:  15 usd:   0

CPU    3: hi:   90, btch:  15 usd:   0

Normal per-cpu:

CPU    0: hi:  186, btch:  31 usd:   0

CPU    1: hi:  186, btch:  31 usd:   0

CPU    2: hi:  186, btch:  31 usd:   0

CPU    3: hi:  186, btch:  31 usd:   0

HighMem per-cpu:

CPU    0: hi:   90, btch:  15 usd:  14

CPU    1: hi:   90, btch:  15 usd:   0

CPU    2: hi:   90, btch:  15 usd:   0

CPU    3: hi:   90, btch:  15 usd:   0

active_anon:10583 inactive_anon:35 isolated_anon:0

active_file:4074 inactive_file:7742 isolated_file:0

unevictable:0 dirty:1 writeback:0 unstable:0

free:397922 slab_reclaimable:398 slab_unreclaimable:1788

mapped:5165 shmem:47 pagetables:227 bounce:0

DMA free:70760kB min:616kB low:768kB high:924kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolatedo

lowmem_reserve[]: 0 1244 1593 1593

Normal free:1251960kB min:4212kB low:5264kB high:6316kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB o

lowmem_reserve[]: 0 0 2794 2794

HighMem free:268968kB min:348kB low:640kB high:936kB active_anon:42332kB inactive_anon:140kB active_file:16296kB inactive_file:30968kB unevio

lowmem_reserve[]: 0 0 0 0

DMA: 28*4kB 91*8kB 26*16kB 86*32kB 5*64kB 131*128kB 68*256kB 3*512kB 10*1024kB 4*2048kB 3*4096kB 0*8192kB 0*16384kB 0*32768kB = 70760kB

Normal: 113*4kB 58*8kB 40*16kB 17*32kB 4*64kB 2*128kB 1*256kB 2*512kB 1*1024kB 1*2048kB 2*4096kB 3*8192kB 4*16384kB 35*32768kB = 1252148kB

HighMem: 269*4kB 315*8kB 268*16kB 35*32kB 1*64kB 60*128kB 21*256kB 0*512kB 1*1024kB 2*2048kB 1*4096kB 1*8192kB 2*16384kB 6*32768kB = 268908kB

11863 total pagecache pages

0 pages in swap cache

Swap cache stats: add 0, delete 0, find 0/0

Free swap  = 0kB

Total swap = 0kB

524288 pages of RAM

398106 free pages

73287 reserved pages

1142 slab pages

8331 pages shared

0 pages swap cached

Physical memory allocation error!

Physical memory allocation error!

I am using kernel 3.0.35 from yocto on a cubox-i4pro device.

vpudec versions:

    plugin: 3.0.10

    wrapper: 1.0.45(VPUWRAPPER_ARM_LINUX Build on Apr 15 2014 09:52:34)

    vpulib: 5.4.20

    firmware: 2.3.10.40778

The same happens when I use a long-running VPU encode job without a display element (instead of a decode job). If the GStreamer pipeline does not include a VPU element, there is no problem.

This seems to be a memory leak problem, I guess? Where should I report it? Is there an official bug tracker?
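A note on reading the traces above: order:11 means the kernel was asked for 2^11 contiguous pages, which is 8192 kB with 4 kB pages. Mode 0xd1 appears to correspond to GFP_KERNEL | GFP_DMA on this kernel generation, so the request would have to be served from the DMA zone. The per-zone summary lines ("DMA: 28*4kB 91*8kB ...") list how many free blocks of each size remain. The following Python sketch parses such a line and checks whether a request of a given order could be satisfied. The function names are illustrative and not part of any kernel tool, and the 4 kB page size is an assumption.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages, as the "N*4kB" entries in the log suggest


def parse_buddy_line(line):
    """Parse 'ZONE: 28*4kB 91*8kB ... = 70760kB' into (zone, {block_kb: count})."""
    zone, _, rest = line.partition(":")
    blocks = {int(size): int(count)
              for count, size in re.findall(r"(\d+)\*(\d+)kB", rest)}
    return zone.strip(), blocks


def largest_free_block_kb(blocks):
    """Size in kB of the largest block that has at least one free instance."""
    return max((size for size, count in blocks.items() if count > 0), default=0)


def can_satisfy(blocks, order):
    """True if some free block is large enough for a 2**order page request."""
    need_kb = PAGE_KB << order
    return any(count > 0 and size >= need_kb for size, count in blocks.items())


# DMA zone line copied from the second trace above.
line = ("DMA: 28*4kB 91*8kB 26*16kB 86*32kB 5*64kB 131*128kB 68*256kB "
        "3*512kB 10*1024kB 4*2048kB 3*4096kB 0*8192kB 0*16384kB 0*32768kB = 70760kB")
zone, blocks = parse_buddy_line(line)
print(zone, "largest free block:", largest_free_block_kb(blocks), "kB")
print("order-11 request possible:", can_satisfy(blocks, 11))
```

In both traces the largest free block in the DMA zone is 4096 kB, so an 8192 kB request cannot be served even though roughly 70 MB of the zone is free. That would be consistent with fragmentation of the DMA zone rather than (or in addition to) a leak.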

18 replies

8,262 views
peterbauer
Contributor I

I also see this problem on the Utilite Pro from CompuLab with kernel 3.0.35. My current workaround has two parts:

1. A kernel patch originating from the wandboard project, which solves DMA memory fragmentation problems:

iMX6 VPU: Use a DMA allocation pool, instead of kernel default allocator · 7cbd06b · wandboard-org/l...

2. Dropping disk caches during video playback via the VPU with a small Perl script:

Utilite & Trim-Slice Users Forum - View topic - Ubuntu 12.04, Video playing with Totem

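#!/usr/bin/perl
use strict;
use warnings;

# Every 10 seconds: flush dirty pages, then drop page cache, dentries and inodes.
while (1) {
  sleep 10;
  print "dropping cache\n";
  system("sync && echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null");
}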

Below is a test case that reproduces part of the problem (a large disk cache stops video playback!?):

Ubuntu 12.04.3 from Jan2014 with latest kernel

Linux utilite-desktop 3.0.35-cm-fx6-5.3-unoff-unsup-00001-gb0886b4 #427 SMP Thu Feb 6 11:27:43 IST 2014 armv7l armv7l armv7l GNU/Linux

Test case:

precondition: device rebooted

  video playback via gplay and totem is working

  Swap partition with 2 GB on SSD is active

  login as user utilite

Action 1: start firefox

  firefox http://www.google.com&

Reaction: firefox opens web page

Action 2: create large file

  dd if=/dev/zero bs=1M of=test.txt count=4000

Reaction: file created on disc (SSD)

Action 3: gplay /path_to_file/video.mp4

Reaction: video is played (stop video playback with control-C)

Action 4: Repeat Actions 2 and 3 two times

Reaction: gplay stops with memory allocation failure

I really hope the situation will improve with kernel 3.10.17.

Best Regards,
Peter Bauer

http://bitkistl.blogspot.com

8,262 views
ottoblom
Contributor III

I see something similar when playing MPEG2 TS RTP streams on the 4.1.0 BSP. After a day or so I get:

BUG: Bad page state in process udpsrc0:src  pfn:50612

page:8c00c254 count:31 mapcount:-1946107299 mapping:8bf86cb8 index:0x8bf84938

page flags: 0xba(error|uptodate|dirty|lru|slab)

[<8004ae34>] (unwind_backtrace+0x0/0xf8) from [<800c9e60>] (bad_page+0x9c/0xf8)

[<800c9e60>] (bad_page+0x9c/0xf8) from [<800ca3d8>] (get_page_from_freelist+0x424/0x518)

[<800ca3d8>] (get_page_from_freelist+0x424/0x518) from [<800cab70>] (__alloc_pages_nodemask+0xf0/0x6cc)

[<800cab70>] (__alloc_pages_nodemask+0xf0/0x6cc) from [<800df924>] (handle_pte_fault+0x5e0/0x7d8)

This looks like the same type of problem that you are experiencing. Any word from Freescale on whether a new BSP is in the making? This seems like a fairly major problem for anyone using the VPU.

8,262 views
michaelb_
Contributor III

I found an interesting post about a very similar (or identical) problem on the wandboard: Wandboard - Freescale i.MX6 ARM Cortex-A9 Opensource Community Development Board - January 17 2014 -...

8,262 views
jamesbone
NXP TechSupport

Let me ask our VPU experts internally whether they have any comments on this.

8,262 views
michaelb_
Contributor III

That would be great. I am currently testing a workaround: cancelling the encoding process every hour and restarting it. That means a short unavailability of the service. Not nice, but if it works, maybe I can live with it.
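The hourly restart described above could be sketched as a small supervisor. This is only an illustration in Python, not the poster's actual code: run_with_restarts is a hypothetical helper, and whether the VPU memory is really returned when the process exits is exactly what such a test would have to show.

```python
import subprocess
import time


def run_with_restarts(cmd, max_runtime_s, cycles=None, pause_s=1.0):
    """Run cmd, stop it after max_runtime_s seconds, then start it again.

    cycles=None loops forever; otherwise the function returns the list of
    exit codes after that many runs.
    """
    exit_codes = []
    while cycles is None or len(exit_codes) < cycles:
        proc = subprocess.Popen(cmd)
        try:
            # Returns early if the process ends on its own.
            proc.wait(timeout=max_runtime_s)
        except subprocess.TimeoutExpired:
            # Ask the process to stop so it can release its resources.
            proc.terminate()
            try:
                proc.wait(timeout=10)
            except subprocess.TimeoutExpired:
                # Last resort if it ignores the termination request.
                proc.kill()
                proc.wait()
        exit_codes.append(proc.returncode)
        time.sleep(pause_s)
    return exit_codes
```

For the decode case, cmd could be the gst-launch pipeline from the first post split into an argument list, with max_runtime_s=3600. The short pause between runs is a guess at how long the driver might need to release its buffers.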

8,262 views
jamesbone
NXP TechSupport

Hello Michael,

I got a response from the internal team. They mention that we are aware of a defect in 3.0.35 where free memory is depleted by faulty cache allocations and frees.


Have a great day,
Jaime

8,262 views
michaelb_
Contributor III

Hi Jaime

Very good. But does that mean: "yeah, we know, but we don't know what to do either" or "yes, we are working on it and will have a patch soon"?

8,262 views
jamesbone
NXP TechSupport

Hello.

We recommend moving to 3.10.17 if possible. As I said, we are aware of memory management problems in 3.0.35, but identifying the patch (or patch set) has been difficult. There seems to be a pervasive problem with kmem_cache_alloc (and kmem_cache_free). In 3.10.17 that entire memory management section (slab) has been redone, and the defect has not been replicated there.


Have a great day,
Jaime

8,262 views
michaelb_
Contributor III

Hi Jaime

Actually, I moved to kernel 3.10.30 about a week ago (the current stable version for the cubox-i). I still see the same problem, though.

I guess, if you cannot replicate the problem, that will make it difficult to solve. :-(

I can get another kernel backtrace. Maybe it differs from the one from 3.0.35 and can help identify whether it is still the exact same issue.

Michael

8,262 views
michaelb_
Contributor III

OK, I just did it again on the 3.10.30 kernel. There is a difference: there is no longer a kernel oops.

Otherwise, the application still doesn't get VPU memory and segfaults. This is the syslog:

May 12 18:26:16 cubox-i user.err kernel: mxc_vpu 2040000.vpu: Physical memory allocation error!

May 12 18:26:16 cubox-i user.err kernel: mxc_vpu 2040000.vpu: Physical memory allocation error!

May 12 18:27:00 cubox-i user.err kernel: mxc_vpu 2040000.vpu: Physical memory allocation error!

May 12 18:27:00 cubox-i user.err kernel: mxc_vpu 2040000.vpu: Physical memory allocation error!

May 12 18:27:48 cubox-i user.err kernel: mxc_vpu 2040000.vpu: Physical memory allocation error!

May 12 18:27:48 cubox-i user.err kernel: mxc_vpu 2040000.vpu: Physical memory allocation error!

May 12 18:28:33 cubox-i user.err kernel: mxc_vpu 2040000.vpu: Physical memory allocation error!

May 12 18:28:33 cubox-i user.err kernel: mxc_vpu 2040000.vpu: Physical memory allocation error!

May 12 18:28:33 cubox-i user.err kernel: mxc_vpu 2040000.vpu: Physical memory allocation error!

May 12 18:28:33 cubox-i user.err kernel: mxc_vpu 2040000.vpu: Physical memory allocation error!

May 12 18:28:34 cubox-i user.warn kernel: Alignment trap: multiqueue156:s (14293) PC=0x65adcfc8 Instr=0xe8810011 Address=0x00002f0f FSR 0x80

I will try to get a new 3.15 kernel in the near future and see whether the behavior is still the same. Without knowing much about memory allocation, I am still wondering whether it could be a problem in the gst-fsl libs.

8,262 views
ottoblom
Contributor III

That is the big question... I'm having the same problem btw

8,262 views
igorpadykov
NXP Employee

Hi Michael,

The system may be running out of contiguous memory.

Try clearing out the disk caches:


# for n in 1 2 3 ; do echo $n > /proc/sys/vm/drop_caches ; done

8,262 views
michaelb_
Contributor III

Good idea. But, unfortunately, doesn't help.

This does not seem to be related to VM caches, but rather to VPU memory and DMA allocation. Maybe the DMA pool is exhausted? But then the general question is: why does the long-running VPU job exhaust any resources? Sounds like a leak somewhere, doesn't it?

My application regularly starts short GStreamer jobs. Normally, this is no problem at all, EXCEPT when there is a parallel long-running VPU job.
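One way to tell exhaustion from fragmentation would be to log /proc/buddyinfo while the long-running pipeline is active. The following is a minimal Python sketch, assuming a single memory node and the usual "Node N, zone NAME count count ..." layout, where the i-th count is the number of free blocks of order i. The helper names are made up for this example.

```python
import re

# One line per zone, e.g. "Node 0, zone      DMA     28     91     26 ..."
_LINE = re.compile(r"Node\s+\d+,\s+zone\s+(\S+)\s+(.*)")


def parse_buddyinfo(text):
    """Map zone name -> list of free-block counts, indexed by allocation order."""
    zones = {}
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            zones[match.group(1)] = [int(n) for n in match.group(2).split()]
    return zones


def highest_free_order(counts):
    """Largest order with at least one free block, or -1 if none is free."""
    for order in range(len(counts) - 1, -1, -1):
        if counts[order] > 0:
            return order
    return -1


def free_pages(counts):
    """Total number of free pages in a zone (a block of order i holds 2**i pages)."""
    return sum(count << order for order, count in enumerate(counts))


def read_buddyinfo(path="/proc/buddyinfo"):
    """Read and parse the live file; only meaningful on a Linux target."""
    with open(path) as handle:
        return parse_buddyinfo(handle.read())
```

Sampling read_buddyinfo() once a minute and logging highest_free_order and free_pages for the DMA zone should show which case applies. If the highest free order drops below 11 while the total stays roughly constant, the zone is fragmenting. If the total shrinks steadily, something is holding on to memory.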

8,262 views
igorpadykov
NXP Employee

Hi Michael, for DMA allocation, some patches were provided in the thread below (patch_v4l2_issue.zip, patch-v4l.zzip.zip):

GStreamer crashing on i.MX6 (Boundary Devices Nitrogen6x)

8,262 views
michaelb_
Contributor III

That looks pretty interesting. It will, however, not be very simple to apply, I guess. I am using a cubox-i and am currently on kernel 3.10.30. I guess I need to show that patch to a kernel hacker to see whether it can be applied to my system.

8,262 views
michaelb_
Contributor III

Hmm, I just checked by going back to the 3.0.35 kernel (I use both 3.0.35 and 3.10.30). This patch (https://community.freescale.com/servlet/JiveServlet/download/323480-258660/patch-v4l.zzip.zip) is already included in the kernel. So, no, it does not help. :-(

8,262 views
Wlodek_D_
Senior Contributor II

Hello,

Thank you for your post, however please consider moving it to the right community place (e.g. i.MX Community ) to get it visible for active members.

For details please see general advice Where to post a Discussion?

Thank you for using Freescale Community. 

8,262 views
michaelb_
Contributor III

done
