i.MX6: async behavior of glTexDirectVIVMap

i.MX6: async behavior of glTexDirectVIVMap

Jump to solution
11,541 Views
senykthomas
Contributor III

I'm currently trying to implement a QtQuick 2.0 video backend that uses glTexDirectVIVMap, so that we can get to <10% (maybe <5%) CPU load for 1080p video playback.

I'm already rather far along and it does look promising!

Here is the interesting part of the code:

http://pastebin.com/rPjUrb2N

I'm just pasting part of the code for simplicity reasons.

'bind()' is the function which is called with an active glContext and should result in a glBindTexture call.

All the other code is glue-code and not important for this question.

(If someone wants the code nevertheless, just ask me. It will eventually be open-sourced anyway, so there's no problem in sharing the unfinished version.)
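For context, without opening the pastebin: the mapping part of 'bind()' boils down to roughly the following sketch. The 'logical'/'physical' names are placeholders for the virtual and physical addresses of the decoded frame handed over by the VPU/gstreamer buffer; the real bookkeeping is in the linked code.

    #include <GLES2/gl2.h>
    #include <GLES2/gl2ext.h>   // GL_VIV_direct_texture: glTexDirectVIVMap, glTexDirectInvalidateVIV

    // Sketch only: map an externally allocated NV12 frame into 'texture' and bind it.
    void bindMappedFrame(GLuint texture, GLsizei width, GLsizei height,
                         GLvoid *logical, GLuint physical)
    {
        glBindTexture(GL_TEXTURE_2D, texture);
        glTexDirectVIVMap(GL_TEXTURE_2D, width, height, GL_VIV_NV12,
                          &logical, &physical);
        glTexDirectInvalidateVIV(GL_TEXTURE_2D);
    }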

I still have two major problems.

This question is about my struggle with glTexDirectVIVMap:

It seems that glTexDirectVIVMap is mapping the memory in the background...?

What I mean is that it's completely asynchronous ... I think.

It seems that the time it takes to do the map depends heavily on the chip (GC800 or GC2000, i.MX6 Solo vs. Quad) and also(!) very much on what else is going on on the GPU.

It can vary

  from < 1/24 second (== smooth playback)

  to ~1/4 second (the only way to get non-stuttering playback is to reduce the playback speed to ~1/6).

This is a huge problem.

*edit - forgot to describe the problem*

The problem is that the frames start to 'stutter' as soon as the delay is too high.

On the solo it's stuttering all the time.

On the quad it's only stuttering if something else is rendered in the background (which results in 60 fps OpenGL rendering instead of 24 fps).

*edit - end*

I have a couple of possible solutions in mind ... but for each one either I don't have enough knowledge to implement it, or it has massive drawbacks.

Possible solutions:

1. I'm doing something wrong with glTexDirectVIVMap or glTexDirectInvalidateVIV

... if someone spots an error, let me know.

... I've tried glTexDirectVIV as well, without success; I think it's not meant for this use case (== where the memory allocation happens somewhere else).

2. queue with a specific value.

I could do an N-buffering approach (a rough sketch follows below).

So this would mean one would:

  1. take a new frame

  2. map it to texture

  3. push the texture onto a FIFO

  4. pop the oldest texture from the FIFO (and free its memory)

During all this time the 'oldest texture' is used for rendering.

This could be configurable but at the end of the day one would need to find a reasonable value for N.

It looks like the max value needed for N could be ~6.

... this is just a guess!

If frames come along and the queue is full, one would drop them.

6 frames are 0.25 seconds (6 frames at 24 fps).

Meaning one would need to buffer 0.25 seconds of audio as well.
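A very rough sketch of option 2 (the MappedFrame type, the value of N and the releaseBuffer callback are placeholders; the 'drop when the queue is full' case from above is left out):

    #include <deque>
    #include <GLES2/gl2.h>

    // Sketch of option 2 (N-buffering): newly mapped textures go into a FIFO,
    // rendering always uses the oldest entry, and once more than N frames are
    // queued, the oldest one is retired and its decoder buffer freed.
    struct MappedFrame {
        GLuint texture;   // texture the frame was mapped into
        void  *buffer;    // decoder buffer backing that texture
    };

    static const size_t N = 6;   // the guessed upper bound from above
    static std::deque<MappedFrame> fifo;

    // Called for every incoming decoded frame; returns the texture to render with.
    GLuint pushFrame(const MappedFrame &frame, void (*releaseBuffer)(void *))
    {
        fifo.push_back(frame);                    // steps 2./3.: map + enqueue
        while (fifo.size() > N) {                 // step 4.: retire the oldest
            releaseBuffer(fifo.front().buffer);
            fifo.pop_front();
        }
        return fifo.front().texture;              // the oldest texture is rendered
    }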

3. queue until map is done.

Same thing as '2.', but rather than guessing 'N', one could use (if it exists) some API to get the state of the memory mapping?

A GL_MAPPING_DONE flag or something like that?

Then one would pop the oldest texture as soon as the second-oldest texture is ready.
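I'm not aware of a GL_MAPPING_DONE query, but if the Vivante driver exposes EGL_KHR_fence_sync (an assumption I have not verified on the GC800/GC2000 driver), option 3 could be approximated by inserting a fence after the draw that uses a newly mapped texture and only retiring/freeing a buffer once its fence has signalled. Note that this only tells when the GPU has finished the commands that use the texture; whether that also covers the asynchronous part of glTexDirectVIVMap is exactly the open question. Rough sketch:

    #include <EGL/egl.h>
    #include <EGL/eglext.h>    // EGL_KHR_fence_sync (availability on the BSP not verified)
    #include <GLES2/gl2.h>

    struct PendingFrame {
        GLuint     texture;    // texture that was mapped and drawn with
        EGLSyncKHR fence;      // signals once the GPU has processed the draw
    };

    static PFNEGLCREATESYNCKHRPROC     pEglCreateSyncKHR;
    static PFNEGLCLIENTWAITSYNCKHRPROC pEglClientWaitSyncKHR;
    static PFNEGLDESTROYSYNCKHRPROC    pEglDestroySyncKHR;

    void initFenceExtension()
    {
        pEglCreateSyncKHR     = (PFNEGLCREATESYNCKHRPROC)eglGetProcAddress("eglCreateSyncKHR");
        pEglClientWaitSyncKHR = (PFNEGLCLIENTWAITSYNCKHRPROC)eglGetProcAddress("eglClientWaitSyncKHR");
        pEglDestroySyncKHR    = (PFNEGLDESTROYSYNCKHRPROC)eglGetProcAddress("eglDestroySyncKHR");
    }

    // Call right after issuing the draw that samples the newly mapped texture.
    PendingFrame fenceFrame(EGLDisplay dpy, GLuint texture)
    {
        PendingFrame p = { texture, pEglCreateSyncKHR(dpy, EGL_SYNC_FENCE_KHR, NULL) };
        glFlush();   // make sure the fence actually reaches the GPU queue
        return p;
    }

    // Non-blocking poll: returns true once the GPU is done with the frame,
    // i.e. the texture's backing buffer could be popped and freed.
    bool frameDone(EGLDisplay dpy, PendingFrame &p)
    {
        if (pEglClientWaitSyncKHR(dpy, p.fence, 0, 0) != EGL_CONDITION_SATISFIED_KHR)
            return false;
        pEglDestroySyncKHR(dpy, p.fence);
        return true;
    }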

1 Solution
5,061 Views
karina_valencia
NXP Apps Support
Re: i.MX6: async behavior of glTexDirectVIVMap

Prabhu Sundararaj (Employee):

The VPU is expected to create 7 buffers for 1080p video, but we see that 13 textures are created. So the current way of mapping is correct; there is no harm in it.

When you run sintel_trailer-1080p.mp4, the incoming frame addresses do not arrive in round-robin fashion, since this video is high profile.

The VPU will take whichever free buffer is available.

In this case I can see the same address being repeated for 3 consecutive frames. That means both the GPU and the VPU are accessing the buffer, and this is causing the distortion.

For this case I added a check: if the same address repeats immediately for the next frame, do a glFinish before

    glBindTexture(GL_TEXTURE_2D, bitsToTextureMap.value(vF.bits()));
    glTexDirectInvalidateVIV(GL_TEXTURE_2D);

This fixed the problem to some extent.
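To make that check concrete, a minimal sketch of how it could sit inside bind() (bitsToTextureMap and vF are the names from the snippet above; the static lastBits variable and the QHash key type are assumptions, and the frame is assumed to be already map()ed):

    #include <QHash>
    #include <QVideoFrame>
    #include <GLES2/gl2.h>
    #include <GLES2/gl2ext.h>   // glTexDirectInvalidateVIV (GL_VIV_direct_texture)

    // Sketch only: glFinish when the decoder hands back the same buffer it
    // used for the previous frame, because the GPU may still be reading it.
    GLuint bindWithRepeatCheck(const QVideoFrame &vF,
                               const QHash<const uchar *, GLuint> &bitsToTextureMap)
    {
        static const uchar *lastBits = 0;   // hypothetical: previous frame's buffer
        const uchar *bits = vF.bits();

        if (bits == lastBits)
            glFinish();                     // drain the GL queue before the buffer is reused
        lastBits = bits;

        glBindTexture(GL_TEXTURE_2D, bitsToTextureMap.value(bits));
        glTexDirectInvalidateVIV(GL_TEXTURE_2D);
        return bitsToTextureMap.value(bits);
    }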

I also tried a baseline-profile video, where the addresses were not repeated; there the buffers were used in round-robin fashion. But I can still see the distortion when I add parallel GLES drawings.

Here are my findings:

      - eglSwapBuffers does not wait for the frame to complete. It flushes the command buffer and gives control back to the application for the next frame. So the command buffers, including the texture to be rendered, are still in the GPU queue and not yet rendered by the GPU. In the meantime the VPU can also take control of the buffer and decode into it. This asynchronous behavior is causing the problem.

      - One solution is to make sure the frame is completely processed by the GPU before giving control back to the VPU to process the next frame, like below:

            else {
                glFinish();
                glBindTexture(GL_TEXTURE_2D, bitsToTextureMap.value(vF.bits()));
                glTexDirectInvalidateVIV(GL_TEXTURE_2D);
                return bitsToTextureMap.value(vF.bits());
            }

      - glTexImage2D will work fine in this case, because the texture data is copied into GPU memory, so there are no race conditions.

      - Please let me know whether the glFinish solves the problem on the wandsolo; I would also like to know the performance impact.


0 Kudos
24 Replies
757 Views
senykthomas
Contributor III

I can only agree with all of Volker's statements :smileyhappy:

It's not solved for me either ... I've got something running, but it's not ideal, and it's certainly not a solution for the wandsolo.

0 Kudos
556 Views
volki
Contributor III

Unfortunately the glFinish does not seem to solve our issue. The pipeline I'm currently using is still the same as described before (a rough sketch of the render-thread side follows after the list):

  1. Create two textures using glTexDirectVIV with GL_VIV_NV12
  2. Define one texture as the current backbuffer that the gstreamer handoff callback copies its data to using memcpy (from within the callback thread)
  3. After the data has been copied, post an event to the main rendering thread telling it to call glTexDirectInvalidateVIV with the backbuffer texture bound
  4. After glTexDirectInvalidateVIV has been called in the main rendering thread, the front/backbuffer textures are swapped and the previous backbuffer texture is rendered
  5. Now that rendering is done using the previous backbuffer texture, the previously used frontbuffer texture becomes the new backbuffer texture and processing continues with the handoff callback described in point 2
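The render-thread side of steps 3-5 might, very roughly, look like this (tex and backIndex are placeholder names; the gstreamer/Qt event plumbing is omitted):

    #include <GLES2/gl2.h>
    #include <GLES2/gl2ext.h>   // glTexDirectInvalidateVIV

    // Sketch of the render-thread side of the double buffering described above.
    // tex[] holds the two GL_VIV_NV12 textures created with glTexDirectVIV;
    // backIndex selects the texture the handoff callback currently fills.
    static GLuint tex[2];
    static int    backIndex = 0;

    // Called in the main rendering thread when the 'frame copied' event arrives;
    // returns the texture to draw this frame with.
    GLuint onFrameCopied()
    {
        glBindTexture(GL_TEXTURE_2D, tex[backIndex]);
        glTexDirectInvalidateVIV(GL_TEXTURE_2D);   // step 3: publish the new data
        backIndex = 1 - backIndex;                 // step 4: swap front/back
        return tex[1 - backIndex];                 // draw with the just-updated texture
    }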

As mentioned, the rendering is fine as long as no other GUI elements are rendered using OpenGL. As soon as we e.g. rotate another normal OpenGL texture, rendered as a simple rectangle sprite, the video rendering starts stuttering.

Adding a call to glFinish before the glTexDirectInvalidateVIV, but after binding the texture, did not make any difference.

Never mind, it seems that the problem occurs in my case because I only used double buffering and had to do the updates via glTexDirectInvalidateVIV asynchronously to the gstreamer callback in my main rendering loop. As I only have two video buffers, it could happen that the callback thread posted more than one frame-changed event even though it always used only the backbuffer as reference. Since I swap the buffers in the main rendering thread, it could happen that more buffer swaps were rendered than actually occurred. So the bug in this case is on my side, sorry for the rumors.

I only wonder what's the best way to do synchronous rendering of the gstreamer data. One option would be to stall the gstreamer callback until the rendering thread has presented the buffer, which might introduce several problems within the gstreamer pipeline processing (not sure about that); a sketch of this option follows below. Another one would be to queue all incoming data, but there I see two problems. Either the main rendering thread is capable of higher frame rates than the video frame rate; in that case it could happen that, due to synchronization problems between the gstreamer threads and the rendering thread, the rendering processes several frame-changed events faster than they should be presented (because several events are queued in the event loop). Or the main rendering thread is slower; then you have to drop frames, otherwise you would end up filling your memory.
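For the first option (stalling the handoff callback until the rendering thread has presented the frame), a minimal sketch with a condition variable could look like this; the names and the event plumbing are hypothetical, and whether blocking the handoff callback is acceptable for the gstreamer pipeline is exactly what I'm unsure about:

    #include <condition_variable>
    #include <functional>
    #include <mutex>

    // Hypothetical shared state between the gstreamer handoff thread and the
    // render thread; in real code this would live in the video backend class.
    static std::mutex              mtx;
    static std::condition_variable cv;
    static bool                    frameInFlight = false;

    // Handoff callback thread: marks the frame as in flight, posts the
    // 'frame copied' event via postEvent(), then stalls until it was presented.
    void publishFrameAndWait(const std::function<void()> &postEvent)
    {
        std::unique_lock<std::mutex> lock(mtx);
        frameInFlight = true;
        postEvent();   // e.g. queue the event to the main rendering thread
        cv.wait(lock, [] { return !frameInFlight; });
    }

    // Render thread: call after glTexDirectInvalidateVIV, the buffer swap and
    // the draw for this frame have been issued (or after glFinish, if full
    // synchronization with the GPU is wanted).
    void framePresented()
    {
        std::lock_guard<std::mutex> lock(mtx);
        frameInFlight = false;
        cv.notify_one();
    }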

0 Kudos
558 Views
senykthomas
Contributor III

I tried the N-buffering ... it didn't help at all (even with 12 buffers) :smileyhappy:

I've got no clue what's going on here.

... I've checked the data buffers, their creation and deletion ... the GstBuffers and everything. Everything comes in in order, gets passed on in order and gets deleted in order.

If I replace the glTexDirectVIVMap with

glTexImage2D(GL_TEXTURE_2D, 0, GL_LUMINANCE, vF.width(), vF.height(), 0, GL_LUMINANCE, GL_UNSIGNED_BYTE, constBits);

it runs flawlessly.

I'm out of ideas ...

Does anyone have an idea what I could try, test, change or look at?

0 Kudos