Problem compiling onnx model using GLOW compiler: constant not found

2,556 Views
LuisMedina-08
Contributor II

I'm trying to compile a custom ONNX model for use on an i.MX RT1060, but when I try to compile it with the command line

model-compiler.exe -model=VSGmodel\politicaVSG.onnx -model-input=Estado,float,[1,6] -emit-bundle=source -backend=CPU -target=arm -mcpu=cortex-m7 -float-abi=hard

I get an error that reads "could not find constant with name 278", which I think is related to the issue mentioned in "[Reshape] Allow the shape to be computed from constants · Issue #3827 · pytorch/glow".

The problem seems to be a gather operation in my model, and I don't know whether there is a way to solve the issue or whether it is an incompatibility with GLOW. My guess is that the GLOW toolkit available from NXP is based on an old version of GLOW in which issue #3827 had not yet been fixed.

Any thoughts on how I could solve this issue? 

I attach my ONNX model (compressed in a zip) in case it helps.

0 Kudos
7 Replies

2,530 Views
david_piskula
NXP Employee

Hello @LuisMedina-08,

I believe that our GLOW version is not so old that this fix would not be included. Usually, the last official GLOW commit available is taken and ported to the SDK and there have been several SDK releases since then. I will check with the developers which commit was used in SDK 2.10 exactly.

However, it does seem like the behavior is very similar to the issue you linked to and I couldn't find anything else about it either. Can you provide more information about the model structure? Do you know exactly which operation at which location inside the graph is causing this error or were you just guessing with the gather operation?

Best Regards,

David

2,525 Views
LuisMedina-08
Contributor II

Thanks for your prompt reply. When I load the model in Netron, the constant the compiler cannot find appears to be the output of a "Where" node (specifically node "Where_186" in the graph). This node, along with others preceding it, results from translating a gather operation in my original PyTorch model into an equivalent set of operations in the ONNX model.
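The same lookup can also be done without a viewer. The following is a minimal sketch, assuming the `onnx` Python package is installed; the helper names are made up for illustration, and the stand-in nodes at the bottom are hypothetical (only the name "Where_186" and the tensor name "278" come from this thread).

```python
from types import SimpleNamespace


def find_producer(nodes, tensor_name):
    """Return the first node that lists tensor_name among its outputs, or None."""
    for node in nodes:
        if tensor_name in node.output:
            return node
    return None


def find_consumers(nodes, tensor_name):
    """Return every node that reads tensor_name as an input."""
    return [node for node in nodes if tensor_name in node.input]


def inspect_onnx(path, tensor_name):
    """Load an ONNX file and report who produces and who consumes a tensor."""
    import onnx  # third-party package, imported lazily so the helpers above stay standalone

    graph = onnx.load(path).graph
    return find_producer(graph.node, tensor_name), find_consumers(graph.node, tensor_name)


# Hypothetical stand-in nodes with the same attributes as ONNX NodeProto objects.
# The input names and the Reshape consumer are invented for the demonstration.
demo_nodes = [
    SimpleNamespace(name="Where_186", op_type="Where",
                    input=["275", "276", "277"], output=["278"]),
    SimpleNamespace(name="Reshape_187", op_type="Reshape",
                    input=["270", "278"], output=["279"]),
]

producer = find_producer(demo_nodes, "278")
print(producer.name, producer.op_type)
print([n.name for n in find_consumers(demo_nodes, "278")])
```

On the real model, a call such as `inspect_onnx("VSGmodel/politicaVSG.onnx", "278")` would return the producing node and the nodes that consume its output, which helps show whether the value feeds a shape input.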

The structure of the model is basically a multi-layer ANFIS and, if it helps, the gather operation is used to expand a tensor of activation degrees so that it can be combined with a tensor of consequents of fuzzy rules.

From the issue I linked in the original post and similar issues listed in the Glow repository, the problem seems to be shape tensors that are not constant. However, a solution was found and implemented last year, so the problem should not persist.

Perhaps the issue I have with my model is not the same, but I don't have enough information to know better.

0 Kudos

2,497 Views
david_piskula
NXP Employee

Hello @LuisMedina-08,

I had a call with one of our engineers who provided me with some useful information about the older issue that seems similar to yours.

The most important thing is that GLOW is a static compiler and therefore doesn't work with dynamic shapes and values and requires constants everywhere.

In the case of the Reshape issue you linked, the reshape computation itself was dynamic but all of the inputs of the computation were static. The issue was solved by adding a constant folding mechanism, which allows GLOW to compute the dynamic value during compilation.
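As a rough illustration of the idea (a toy sketch, not GLOW's actual implementation), constant folding can be pictured as repeatedly evaluating every node whose inputs are all known at compile time. The graph format, the operator table, and all names below are hypothetical.

```python
# Toy operator table working on small integer lists (think: shape tensors).
OPS = {
    "Mul": lambda a, b: [x * y for x, y in zip(a, b)],
    "Concat": lambda *parts: [v for part in parts for v in part],
}


def fold_constants(nodes, constants):
    """Evaluate nodes whose inputs are all constant.

    nodes: list of (op, input_names, output_name) tuples.
    constants: dict mapping tensor name to its compile-time value.
    Returns (all known constants, nodes that still need runtime inputs).
    """
    known = dict(constants)
    remaining = list(nodes)
    progress = True
    while progress:
        progress = False
        for node in list(remaining):
            op, inputs, output = node
            if all(name in known for name in inputs):
                known[output] = OPS[op](*(known[name] for name in inputs))
                remaining.remove(node)
                progress = True
    return known, remaining


# The target shape of the Reshape is "computed", but only from constants,
# so it can be folded. The Reshape itself depends on the runtime input "x".
constants = {"batch": [1], "rows": [2], "cols": [3]}
nodes = [
    ("Mul", ["rows", "cols"], "flat"),
    ("Concat", ["batch", "flat"], "target_shape"),
    ("Reshape", ["x", "target_shape"], "y"),
]

known, remaining = fold_constants(nodes, constants)
print(known["target_shape"])         # [1, 6]
print([op for op, _, _ in remaining])  # ['Reshape']
```

After folding, the Reshape sees a constant target shape, which is what a static compiler needs. If any node in the shape computation depended on "x", it would stay in `remaining` and the shape would not become a constant.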

Your issue seems similar in nature. However, the constant folding fix is already present in the SDK GLOW (the engineer confirmed this by testing a model with the same reshape computation as mentioned in the Reshape issue). I am not familiar with the Where operator, but if your problem were exactly the same, the model would compile successfully.

(By the way, we checked, and the Where operator is supported by the GLOW included in our SDK.)

So, in order to move forward, I think we need to determine the following:

  1. Is your model static or dynamic? If it's dynamic, it won't work with GLOW.
  2. Are there any dynamically computed values in the model? (like the computations of shapes)
  3. If yes, do they have static inputs? If so, perhaps the issue could be solved the same way the Reshape issue was solved.
  4. If yes, but some of the inputs are not static, would it be possible to change the model architecture so that they are static instead?
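A first check for question 1 could look like the sketch below, assuming the `onnx` Python package is available; the helper names are made up for illustration. It only inspects the declared shapes of the graph inputs, so it cannot answer questions 2 to 4, which require looking at the shape computations inside the graph.

```python
def dynamic_dims(shape):
    """Return the positions of dimensions that are not fixed positive integers."""
    return [i for i, d in enumerate(shape) if not (isinstance(d, int) and d > 0)]


def input_shapes_from_onnx(path):
    """Collect the declared shape of every graph input of an ONNX file."""
    import onnx  # third-party package, imported lazily

    model = onnx.load(path)
    shapes = {}
    for inp in model.graph.input:
        dims = []
        for d in inp.type.tensor_type.shape.dim:
            # dim_value holds a fixed size, dim_param a symbolic name such as "batch".
            dims.append(d.dim_value if d.HasField("dim_value") else (d.dim_param or None))
        shapes[inp.name] = dims
    return shapes


# Stand-alone examples using plain lists in place of a real model.
print(dynamic_dims([1, 6]))            # []      fully static
print(dynamic_dims(["batch", 6]))      # [0]     symbolic first dimension
print(dynamic_dims([None, 6, "seq"]))  # [0, 2]  unknown and symbolic dimensions
```

For a real file, `{name: dynamic_dims(shape) for name, shape in input_shapes_from_onnx(path).items()}` lists the dynamic dimensions per input; an empty list for every input means the declared input shapes are static.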

I have no experience with ANFIS networks. Perhaps there is some kind of mechanism in their architecture, which makes them incompatible with GLOW.

Regards,

David

0 Kudos

2,492 Views
LuisMedina-08
Contributor II

Thanks for the insight. I was able to modify the model to always use tensors of the same size, which solved the issue of the missing constant but also revealed another issue, this one related to the "GatherElements" operation in the ONNX model. Now when I run the command to compile the model, I get an error that reads "Error Code: MODEL_LOADER_UNSUPPORTED_OPERATOR" and "Error Message: Failed to load operator GatherElements".

According to https://github.com/pytorch/glow/pull/5285, this should no longer be a problem, since support for this operator was added on May 10th of this year. Any ideas on what the issue could be?

0 Kudos

2,467 Views
david_piskula
NXP Employee

Hello @LuisMedina-08,

The GLOW in our SDK is based on a late June release. However, the pull request you linked to adds only partial support for the operator:

  1. It adds support specifically for the torch.gather operator.
  2. It adds support for the interpreter backend of GLOW, whereas we use the CPU backend of GLOW instead.

Could you try replacing GatherElements with a different gather operator, such as Gather or GatherND, or would that not work?
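One possible way to attempt such a replacement is to express the element-wise gather as a lookup into a flattened tensor with precomputed flat indices. The sketch below shows only the index arithmetic, in plain Python on nested lists, for a 2-D tensor gathered along axis 1; the function names are hypothetical. Whether an equivalent PyTorch rewrite (for example, indexing a reshaped tensor with the flat indices) is exported as Gather rather than GatherElements depends on the exporter version, and whether the SDK's GLOW then accepts it has not been confirmed in this thread, so both are assumptions to verify in Netron and with the compiler.

```python
def gather_elements_axis1(data, index):
    """Reference semantics of an element-wise gather along axis 1 (2-D case):
    out[i][j] = data[i][index[i][j]]."""
    return [[data[i][k] for k in row] for i, row in enumerate(index)]


def gather_via_flat_index(data, index):
    """Same result, computed as a plain lookup into the flattened data.

    Element (i, k) of a row-major 2-D tensor sits at flat position i * n_cols + k.
    """
    n_cols = len(data[0])
    flat = [value for row in data for value in row]
    flat_index = [[i * n_cols + k for k in row] for i, row in enumerate(index)]
    return [[flat[f] for f in row] for row in flat_index]


data = [[10, 11, 12],
        [20, 21, 22]]
index = [[2, 0],
         [1, 1]]

print(gather_elements_axis1(data, index))  # [[12, 10], [21, 21]]
print(gather_via_flat_index(data, index))  # [[12, 10], [21, 21]]
```

If the index tensor is itself a constant of the model, the flat indices can be computed once ahead of time, which also keeps the computation static.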

Regards,

David

0 Kudos

2,464 Views
LuisMedina-08
Contributor II

Unfortunately, I can't change which Gather operator the ONNX model uses, as it is chosen by PyTorch during the translation of the original model to the ONNX format. Given this, I assume the only option is to wait for a future update of the GLOW compiler that adds support for this operator. Thanks for the support.

0 Kudos

2,440 Views
david_piskula
NXP Employee

I see. In that case, yes, you will have to wait until support for this operator is implemented. Perhaps you could raise the issue on GitHub and let the GLOW developers know that there is a need for this operator, so that they might decide to prioritize it over others.

Good luck,

David

0 Kudos