GPU affinity and Vulkan

lavusedu
Contributor II

Hi,

I'm experimenting with the imx8qmmek board and running into issues with GPU affinity support in Vulkan.

In OpenCL I can use the VIV_MGPU_AFFINITY environment variable to run my compute workload on either chip or on both. This makes a measurable difference in performance and power consumption, so I am confident it is working.
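
A minimal C sketch of selecting the affinity from inside a test program instead of the shell. It assumes the driver reads VIV_MGPU_AFFINITY from the process environment when it initializes (the Vulkan driver excerpt quoted later in this thread does so via gcoOS_GetEnv), so the variable has to be set before the first OpenCL context or Vulkan instance is created. The helper name select_mgpu_affinity is hypothetical, and the accepted values are simply the three settings used in this thread.

```c
/* Hypothetical helper: accepts only the VIV_MGPU_AFFINITY values used in
 * this thread ("0" = combined, "1:0"/"1:1" = independent on core 0/1) and
 * exports the chosen value into this process's environment. Assumes the GPU
 * driver reads the variable when it initializes, so call this before
 * creating any OpenCL context or Vulkan instance. */
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <string.h>

int select_mgpu_affinity(const char *value)
{
    static const char *const allowed[] = { "0", "1:0", "1:1" };
    size_t i;

    for (i = 0; i < sizeof allowed / sizeof allowed[0]; ++i) {
        if (strcmp(value, allowed[i]) == 0) {
            /* Overwrite any value inherited from the shell. */
            return setenv("VIV_MGPU_AFFINITY", value, 1) == 0 ? 0 : -1;
        }
    }
    return -1; /* anything else, e.g. "1:2", is rejected */
}
```

For example, calling select_mgpu_affinity("1:0") at the top of main, before clCreateContext or vkCreateInstance, is intended to restrict that process to core 0, assuming the driver honours the variable. Exporting the variable in the shell has the same effect and is simpler for one-off runs.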

However, when I run an equivalent compute workload through Vulkan, I get the same performance as OpenCL with affinity set to only one chip. The environment variable does not seem to affect the computation at all. I also cannot see the two chips as separate devices in Vulkan, either as individual physical devices or through vkEnumeratePhysicalDeviceGroupsKHR, which the driver supports.

I tested this on the imx-5.10.72-2.2.0 and imx-5.15.32-2.0.0 Yocto builds, and neither lets me use both chips at once in Vulkan. Is there something I'm missing that would let me run a compute workload in Vulkan on both chips, as I can in OpenCL?

Bio_TICFSL
NXP TechSupport

Hello lavusedu,

I looked into the combined/independent mode setting in the Vulkan driver. This is what I found:

```c
/* Defaults: combined mode, core 0. */
option->affinityMode = __VK_MGPU_AFFINITY_COMBINE;
option->affinityCoreID = 0;

gcoOS_GetEnv(gcvNULL, "VIV_MGPU_AFFINITY", &affinity);
if (affinity)
{
    gctSIZE_T length;

    gcoOS_StrLen(affinity, &length);
    if (length >= 1)
    {
        if (affinity[0] == '0')
        {
            /* "0": use both GPU cores (combined mode). */
            option->affinityMode = __VK_MGPU_AFFINITY_COMBINE;
        }
        else if ((affinity[0] == '1') && (affinity[1] == ':'))
        {
            /* "1:N": independent mode on core N. */
            if ((affinity[2] != '0') || (affinity[2] != '1'))
            {
                option->affinityMode = __VK_MGPU_AFFINITY_INDEPENDENT;
                option->affinityCoreID = affinity[2] - '0';
            }
        }
    }
}
```

I think the code has a bug in the line `if ((affinity[2] != '0') || (affinity[2] != '1'))`. It should be:

`if ((affinity[2] == '0') || (affinity[2] == '1'))`
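
The problem with the original condition can be checked mechanically: no character equals both '0' and '1', so at least one of the two `!=` tests is always true. The sketch below counts how many char values each version of the condition accepts; the function names are illustrative only.

```c
#include <limits.h>

/* Condition as written in the quoted driver code: always true, because no
 * character equals both '0' and '1'. */
int original_check(char c)
{
    return (c != '0') || (c != '1');
}

/* Corrected condition: true only for '0' or '1'. */
int corrected_check(char c)
{
    return (c == '0') || (c == '1');
}

/* Counts how many of the possible char values the given check accepts. */
int count_accepted(int (*check)(char))
{
    int count = 0;
    int i;

    for (i = CHAR_MIN; i <= CHAR_MAX; ++i) {
        if (check((char)i)) {
            ++count;
        }
    }
    return count;
}
```

The original condition accepts every char value (256 on typical platforms), while the corrected one accepts only '0' and '1'.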

 

As far as I can tell, if the variable is set as described in the graphics manual, the code above will always put the GPU in combined mode.

 

Can you try `export VIV_MGPU_AFFINITY=1:2` to see whether the application runs only on GPU 1?
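
To see what the two versions of the check do with concrete inputs, here is a standalone C sketch that mirrors the structure of the quoted driver code. The enum, struct and function names are hypothetical stand-ins for the driver's own types (`__VK_MGPU_AFFINITY_COMBINE`, `option->affinityCoreID`, and so on), and plain `strlen` replaces `gcoOS_StrLen`.

```c
#include <string.h>

/* Hypothetical stand-ins for the driver's __VK_MGPU_AFFINITY_* values. */
enum mgpu_mode { MGPU_COMBINE, MGPU_INDEPENDENT };

struct mgpu_option {
    enum mgpu_mode mode;
    int core_id;
};

/* Mirrors the quoted driver logic. When `fixed` is nonzero the core check
 * uses the corrected condition (core must be '0' or '1'); otherwise it uses
 * the original, always-true condition. */
struct mgpu_option parse_mgpu_affinity(const char *affinity, int fixed)
{
    struct mgpu_option option = { MGPU_COMBINE, 0 }; /* driver defaults */

    if (affinity != NULL && strlen(affinity) >= 1) {
        if (affinity[0] == '0') {
            option.mode = MGPU_COMBINE;
        } else if (affinity[0] == '1' && affinity[1] == ':') {
            char core = affinity[2];
            int accept = fixed ? (core == '0' || core == '1')
                               : (core != '0' || core != '1');
            if (accept) {
                option.mode = MGPU_INDEPENDENT;
                option.core_id = core - '0';
            }
        }
    }
    return option;
}
```

In this sketch, "0", "1:0" and "1:1" produce the same result with either check; they differ only for out-of-range input. With the original check, "1:2" selects independent mode with core ID 2, although the thread only mentions two GPU cores (0 and 1); with the corrected check, it leaves the defaults (combined mode) in place.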

 

In the meantime, the developers will work on this issue.

 

Regards

lavusedu
Contributor II

Hello,

Thank you for your answer.

Unfortunately, setting the environment variable to a higher core number (e.g. `VIV_MGPU_AFFINITY=1:2`) causes both OpenCL and Vulkan to segfault. In addition, with Vulkan the kernel reports an internal error in the logs with this call trace:

```
Call trace:
gckKERNEL_Dispatch+0x57c/0x13f0
gckDEVICE_Dispatch+0x84/0x1d0
drv_ioctl+0x2b8/0x460
__arm64_sys_ioctl+0xac/0xf0
invoke_syscall+0x48/0x114
el0_svc_common.constprop.0+0xd4/0xfc
do_el0_svc+0x2c/0x94
el0_svc+0x28/0x80
el0t_64_sync_handler+0xa8/0x130
el0t_64_sync+0x1a0/0x1a4
```

Bio_TICFSL
NXP TechSupport

I changed the code and rebuilt the Vulkan library for you to try. Can you test the attached Vulkan library? It is based on 5.10.72-2.2.0.

Regards

lavusedu
Contributor II

I tried the library you sent by replacing the library path in the ICD manifest under `/etc/vulkan/icd.d/`. Even with this library, the problem persisted in my benchmark: the GPU took the same time to complete it with all three settings, `VIV_MGPU_AFFINITY=0`, `VIV_MGPU_AFFINITY=1:0` and `VIV_MGPU_AFFINITY=1:1`.
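
When comparing `VIV_MGPU_AFFINITY=0`, `1:0` and `1:1`, it helps to time many iterations and exclude first-run overhead. Below is a minimal C timing harness using POSIX `clock_gettime` with `CLOCK_MONOTONIC`. The callback type, function names and the CPU placeholder workload are hypothetical; in a real benchmark the callback would submit the Vulkan (or OpenCL) dispatch and wait for completion (for example with vkQueueWaitIdle or clFinish) so that GPU time is included. Because the affinity setting is read from the environment, each setting is assumed to be measured in a separate process.

```c
#define _POSIX_C_SOURCE 200112L
#include <time.h>

/* Hypothetical callback type: runs one iteration of the workload and must
 * return only after the work has completed. */
typedef void (*workload_fn)(void *ctx);

/* Placeholder workload for trying the harness without a GPU: sums the
 * integers below *(long *)ctx. Replace with the real dispatch. */
void cpu_placeholder_workload(void *ctx)
{
    long n = *(const long *)ctx;
    volatile long sum = 0; /* volatile so the loop is not optimised away */
    long k;

    for (k = 0; k < n; ++k) {
        sum += k;
    }
}

/* Returns the mean wall-clock time per iteration in seconds, or -1.0 on
 * error. One untimed warm-up run is done first so that costs incurred only
 * on the first run (e.g. lazy initialization) are excluded. */
double mean_seconds_per_run(workload_fn run, void *ctx, int iterations)
{
    struct timespec start, end;
    int i;

    if (iterations <= 0) {
        return -1.0;
    }
    run(ctx); /* warm-up, not timed */

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0) {
        return -1.0;
    }
    for (i = 0; i < iterations; ++i) {
        run(ctx);
    }
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0) {
        return -1.0;
    }
    return ((double)(end.tv_sec - start.tv_sec)
            + (double)(end.tv_nsec - start.tv_nsec) / 1e9) / iterations;
}
```

Running the same binary once per VIV_MGPU_AFFINITY value and comparing the reported means shows whether combined mode actually changes the per-dispatch time.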

It no longer segfaulted with `VIV_MGPU_AFFINITY=1:2`.

However, putting the GPU under load and running my benchmark several times in a row caused a kernel driver crash, with the kernel logging a memory dump. This never happens with the current driver.

Bio_TICFSL
NXP TechSupport

You should not use `export VIV_MGPU_AFFINITY=1:2`; it is not a valid configuration. Last time I asked you to try it only for testing, to see whether the GPU could enter independent mode. Please disregard that.

1. With my libvulkan.so, do you get the GPU dump with all three configurations, `VIV_MGPU_AFFINITY=0`, `VIV_MGPU_AFFINITY=1:0` and `VIV_MGPU_AFFINITY=1:1`?

That would not make sense, since I only changed one line, to let Vulkan enter independent mode when `VIV_MGPU_AFFINITY=1:0` or `VIV_MGPU_AFFINITY=1:1`.

2. What is your benchmark application?

Regards

 

lavusedu
Contributor II

OK, never mind the GPU dump. My main problem is that the GPU performs the same in combined and independent modes when running Vulkan compute. With OpenCL, combined mode is faster (because it uses both chips), but with Vulkan, combined mode is as slow as if only one chip were in use.

Bio_TICFSL
NXP TechSupport

Hello,

If you have any more questions, please open a case at www.nxp.com.

Regards

 
