CAN is inherently a 'broadcast' protocol, with guaranteed delivery. In a properly configured system ALL nodes receive ALL messages at the 'bus' layer (higher layers can then filter them). For a 'good' transfer, every receiver that gets the frame without error asserts ACK in the ACK slot. That being said, 'message delivered' requires that 'at least' one node return ACK, and that NO node flag an error (CAN has no explicit NAK; a receiver that detects an error sends an error frame, all receivers discard the message, and the transmitter retries). So CANUSB can indeed supply an ACK for all messages from any other transmitter. You might look to see if you have loopback (LPB) or 'listen only' (LOM) enabled in your nodes: these will prevent normal ACK generation.
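To make the ACK behaviour concrete, here is a minimal sketch of the ACK slot as a wired-AND bus. It is only a toy model, not any controller's real logic; the function names and the (frame_ok, listen_only) tuple are made up for illustration.

```python
DOMINANT, RECESSIVE = 0, 1  # on CAN, a dominant bit overrides a recessive one


def bus_level(driven_levels):
    """Wired-AND bus: dominant if ANY node drives dominant."""
    return DOMINANT if DOMINANT in driven_levels else RECESSIVE


def ack_slot_level(receivers):
    """Bus level during the ACK slot.

    receivers: list of (frame_ok, listen_only) tuples, one per receiving
    node (hypothetical representation, for illustration only).
    """
    levels = [RECESSIVE]  # the transmitter itself sends recessive in the ACK slot
    for frame_ok, listen_only in receivers:
        # A receiver drives dominant only if it got the frame without error
        # AND it is allowed to drive the bus (i.e. not in listen-only mode).
        if frame_ok and not listen_only:
            levels.append(DOMINANT)
        else:
            levels.append(RECESSIVE)
    return bus_level(levels)


def transmitter_sees_ack(receivers):
    """The transmitter treats the frame as acknowledged if the slot reads dominant."""
    return ack_slot_level(receivers) == DOMINANT
```

With one normal receiver, transmitter_sees_ack([(True, False)]) is True. If the only other node is in listen-only mode, transmitter_sees_ack([(True, True)]) is False, which is the situation to rule out when ACKs go missing.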
You might also try all these functional experiments at a MUCH lower speed, and 'work up' to your full speed. The fully distributed arbitration, semi-synchronous reception and response-slot allocation impose some pretty tough physical requirements, all of which are eased at a slower speed. If you need to go faster than 125 kbit/s, there are no shortcuts: you must fully understand the bit timing elements and the external propagation delays, along with the bus-length limits they impose.
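As a rough illustration of why lower speed is easier, here is a sketch of the usual bit-timing arithmetic. It assumes the common segment model (1 sync quantum + propagation segment + phase segment 1 + phase segment 2) and a time quantum of BRP / f_clk; some controllers add a fixed divider or number the segments differently, so check your data sheet. The 150 ns node delay and 5 ns/m cable delay are assumed ballpark figures, not values for your hardware.

```python
def bit_timing(f_clk_hz, brp, prop_seg, phase_seg1, phase_seg2):
    """Derive bit rate, sample point and propagation window from segment settings.

    Assumes tq = brp / f_clk and a bit of 1 + prop_seg + phase_seg1 + phase_seg2
    time quanta (the sync segment is always 1 tq).
    """
    tq_ns = brp * 1e9 / f_clk_hz
    tq_per_bit = 1 + prop_seg + phase_seg1 + phase_seg2
    bit_time_ns = tq_per_bit * tq_ns
    return {
        "tq_ns": tq_ns,
        "bit_rate": 1e9 / bit_time_ns,
        # The sample point sits at the end of phase segment 1.
        "sample_point": (1 + prop_seg + phase_seg1) / tq_per_bit,
        "prop_time_ns": prop_seg * tq_ns,
    }


def max_bus_length_m(prop_time_ns, node_delay_ns=150.0, cable_ns_per_m=5.0):
    """Rough bus-length limit from the propagation segment.

    The propagation segment has to cover the ROUND trip: cable out and back,
    plus the transceiver/controller loop delay of both end nodes.
    node_delay_ns (per node) and cable_ns_per_m are assumed typical values.
    """
    one_way_cable_ns = prop_time_ns / 2 - node_delay_ns
    return max(0.0, one_way_cable_ns / cable_ns_per_m)
```

For example, bit_timing(16_000_000, 8, 7, 4, 4) gives 125 kbit/s with the sample point at 75%, and under the assumed delays max_bus_length_m allows roughly 320 m. Keeping the same segments but setting brp to 1 gives 1 Mbit/s and shrinks that to under 14 m, which is why the clock source and propagation delays matter so much at speed.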
I had asked previously what your ultimate CAN-peripheral clock source is in each node --- I still think that would be important to know here!