I think the Cycle Counter in the SWO Counters view of MCUXpresso is the right way to compare the two versions.
It measures the number of CPU cycles executed between two breakpoints.
Following the ARM SWO Performance Counters approach, I measured the CRC-16/CCITT-FALSE calculation (lines 197 and 198) in the crc example of MCUXpresso SDK_2.4.1_FRDM-K64F. The hardware-implemented CRC calculation takes 201 cycles (core clock 120 MHz).
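
For the software side of the comparison, a plain bitwise CRC-16/CCITT-FALSE routine could be placed between the same two breakpoints. The code below is only a minimal sketch and is not taken from the SDK: the names crc16_ccitt_false and cycles_to_ns are hypothetical, and your own software implementation (for example a table-driven one) may differ and give a different cycle count. The second helper just converts a cycle count into time for a given core clock.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the MCUXpresso SDK.
 * Bitwise CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF,
 * no input/output reflection, no final XOR. */
uint16_t crc16_ccitt_false(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;

    for (size_t i = 0; i < len; i++) {
        /* Bring the next byte into the top of the CRC register. */
        crc ^= (uint16_t)((uint16_t)data[i] << 8);

        /* Process the byte one bit at a time, MSB first. */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u) {
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}

/* Hypothetical helper: convert a measured cycle count to nanoseconds
 * for a given core clock in Hz (integer division, rounds down). */
uint32_t cycles_to_ns(uint32_t cycles, uint32_t core_hz)
{
    return (uint32_t)(((uint64_t)cycles * 1000000000ull) / core_hz);
}
```

As a sanity check, the standard test string "123456789" should give 0x29B1 with this routine, and cycles_to_ns(201, 120000000) gives 1675, so the 201 cycles measured above correspond to roughly 1.7 us at 120 MHz.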

Could you please share screenshots of your measurement results for both the software and the hardware CRC functions?
You could also attach your test project here so that we can check it.
Best Regards,
Robin