We do not have "official experimental data". The difference in performance between 32-bit and 64-bit bus interfaces depends greatly on the application. For example, if the core is the only master accessing the DDR memory, the memory bus throughput is only slightly affected (the difference is less than 10%). For reference, a typical DDR read access at 1600 MT/s takes 26 bus clocks in the 64-bit case and 30 bus clocks in the 32-bit case.
The worst case is when DMA is the only master on the DDR interface. Data cycles (bursts) can then run back-to-back on the bus, and because each transfer on a 32-bit interface carries half as many bytes, the performance of the DDR interface in the 32-bit case can theoretically drop to 50% of the 64-bit case.
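The 50% figure follows directly from peak-bandwidth arithmetic: peak bandwidth is the number of bytes per transfer multiplied by the transfer rate. The sketch below only illustrates that arithmetic for 1600 MT/s; the function name peak_bandwidth_mb_s is a hypothetical helper, and real sustained bandwidth is always lower than these theoretical peaks.

```python
def peak_bandwidth_mb_s(bus_width_bits: int, transfer_rate_mt_s: int) -> float:
    """Theoretical peak DDR bandwidth in MB/s (1 MB = 10**6 bytes).

    Hypothetical helper: bytes per transfer multiplied by million transfers per second.
    """
    return bus_width_bits / 8 * transfer_rate_mt_s

bw64 = peak_bandwidth_mb_s(64, 1600)  # 8 bytes/transfer at 1600 MT/s
bw32 = peak_bandwidth_mb_s(32, 1600)  # 4 bytes/transfer at 1600 MT/s
print(bw64, bw32, bw32 / bw64)        # 12800.0 6400.0 0.5
```

This upper bound is only approached with back-to-back bursts (the DMA-only case); with a single core issuing individual accesses, the fixed per-access overhead dominates, which is why the gap there is much smaller.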
In a typical case, when multiple masters access the memory concurrently, the difference in performance lies somewhere between these two extremes.
If you expect relatively high data traffic through the memory and the application has timing constraints on data processing, we typically recommend running tests with both bus configurations on a development board that is close to your target application, to determine whether the DDR bus width is a bottleneck.
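As a rough starting point for such a test, here is a minimal sketch, assuming a Linux board with Python 3 available, that times a bulk copy of a buffer much larger than the caches. The function name copy_bandwidth_mb_s and the 64 MiB default are hypothetical choices, not part of any vendor tool. It only exercises a streaming, core-driven access pattern; it does not reproduce DMA or multi-master traffic, so a benchmark built from your real workload remains the better comparison.

```python
import time

def copy_bandwidth_mb_s(size_mb: int = 64, repeats: int = 5) -> float:
    """Best-of-N bulk copy bandwidth in MB/s, counting bytes read plus bytes written.

    Hypothetical helper: size_mb should be well above the last-level cache size
    so that the copy actually streams through DDR rather than the caches.
    """
    size = size_mb * 1024 * 1024
    src = bytearray(b"\xa5") * size  # non-zero fill so every source page is really backed
    dst = bytearray(size)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        dst[:] = src  # one bulk copy: reads `size` bytes and writes `size` bytes
        best = min(best, time.perf_counter() - t0)
    return 2 * size / best / 1e6

if __name__ == "__main__":
    print(f"copy bandwidth: {copy_bandwidth_mb_s():.0f} MB/s")
```

Running the same script on both bus configurations and comparing the results gives a first indication; taking the best of several runs reduces noise from page faults on the first pass and from scheduling.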
Regards,
Bulat