Thanks for getting back to me on this!
Below is a typical frame with details of the vertex count and primitive count. Both are reasonably stable across frames, except for a few frames where the input primitive count is '12' and the 'trivially reject count' is zero. Either way, the counts are far lower than 300-500k.
However, I don't know how to see the bytes per vertex; is it possible that this figure is very high for some reason?

With respect to 'CPU-side culling', do you know how I could check that? We aren't doing any graphics programming directly; we are drawing using Qt controls. Is there anything we should check about our use of Qt?
Thanks,
Tony