Hello Jing,
I read the document you mentioned, but it was not much help: the only step-by-step part was for LPC MCUs. I did figure it out on my own, however. The K64F MCU also needs the trace clock enabled, and the document does not mention this. I will include what I did here in case anybody else has the same issue.
In the clock initialization part of the code, add the following define:
#define SIM_TRACE_CLK_SEL_CORE_SYSTEM_CLK 1U /*!< Trace clock select: Core/system clock */
Then set the trace clock source to the core/system clock with the line below:
/* Set debug trace clock source. */
CLOCK_SetTraceClock(SIM_TRACE_CLK_SEL_CORE_SYSTEM_CLK);
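For anyone curious what that call does at register level, here is a minimal host-side sketch. It is an assumption-laden mock, not SDK code: the register is a plain variable (mock_sim_sopt2 is a hypothetical name), and the TRACECLKSEL position (bit 12 of SIM_SOPT2, 0 = MCGOUTCLK, 1 = core/system clock) is taken from my reading of the K64F reference manual, so check it against your device header.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical mock of the K64F SIM_SOPT2 register. On the MCU this is
 * memory-mapped; here it is a plain variable so the logic can run on a PC. */
static uint32_t mock_sim_sopt2 = 0u;

/* Assumed TRACECLKSEL field position (bit 12); verify in your device header. */
#define MOCK_SIM_SOPT2_TRACECLKSEL_SHIFT 12u
#define MOCK_SIM_SOPT2_TRACECLKSEL_MASK  (1u << MOCK_SIM_SOPT2_TRACECLKSEL_SHIFT)

#define SIM_TRACE_CLK_SEL_CORE_SYSTEM_CLK 1U /* Trace clock select: Core/system clock */

/* Read-modify-write like CLOCK_SetTraceClock(src): clear the TRACECLKSEL
 * field, then write the new source, leaving all other SOPT2 bits intact. */
static void mock_set_trace_clock(uint32_t src)
{
    assert(src <= 1u); /* assumed: 0 = MCGOUTCLK, 1 = core/system clock */
    mock_sim_sopt2 = (mock_sim_sopt2 & ~MOCK_SIM_SOPT2_TRACECLKSEL_MASK) |
                     ((src << MOCK_SIM_SOPT2_TRACECLKSEL_SHIFT) &
                      MOCK_SIM_SOPT2_TRACECLKSEL_MASK);
}
```

The point of the read-modify-write is that SOPT2 also holds unrelated clock selections, so only the one field should change.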
Lastly, enable the clock for port A and set up pin 36 (PTA2) for SWO trace:
/* Port A Clock Gate Control: Clock enabled */
CLOCK_EnableClock(kCLOCK_PortA);
/* PORTA2 (pin 36) is configured as TRACE_SWO */
PORT_SetPinMux(PORTA, 2U, kPORT_MuxAlt7);
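Again only as a rough host-side sketch with mocked registers (all mock_* names are hypothetical), these two calls boil down to setting the PORTA clock gate (assumed to be bit 9 of SIM_SCGC5) and writing 7 (ALT7) into the MUX field (bits 10:8) of PORTA_PCR2. The order matters: on Kinetis parts, touching a port's registers while its clock gate is off produces a bus fault, which the mock models with an assert.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical mocks of SIM_SCGC5 and the 32 PORTA pin control registers. */
static uint32_t mock_sim_scgc5 = 0u;
static uint32_t mock_porta_pcr[32];

#define MOCK_SIM_SCGC5_PORTA_MASK (1u << 9)                          /* assumed PORTA gate bit */
#define MOCK_PORT_PCR_MUX_SHIFT   8u
#define MOCK_PORT_PCR_MUX_MASK    (0x7u << MOCK_PORT_PCR_MUX_SHIFT)  /* MUX field, bits 10:8 */
#define MOCK_MUX_ALT7             7u

/* Like CLOCK_EnableClock(kCLOCK_PortA): open the port A clock gate. */
static void mock_enable_porta_clock(void)
{
    mock_sim_scgc5 |= MOCK_SIM_SCGC5_PORTA_MASK;
}

/* Like PORT_SetPinMux(PORTA, pin, mux): replace only the MUX field and keep
 * the other PCR bits (pull enable/select, drive strength, ...) unchanged. */
static void mock_porta_set_pin_mux(uint32_t pin, uint32_t mux)
{
    assert(pin < 32u && mux <= 7u);
    /* Models the hardware rule: port A must be clocked before its PCRs are written. */
    assert((mock_sim_scgc5 & MOCK_SIM_SCGC5_PORTA_MASK) != 0u);
    mock_porta_pcr[pin] = (mock_porta_pcr[pin] & ~MOCK_PORT_PCR_MUX_MASK) |
                          ((mux << MOCK_PORT_PCR_MUX_SHIFT) & MOCK_PORT_PCR_MUX_MASK);
}
```

Calling mock_enable_porta_clock() and then mock_porta_set_pin_mux(2u, MOCK_MUX_ALT7) mirrors the two SDK lines above in the same order.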
With the Multilink FX, the above still does not give interrupt traces, although it does allow debugging. With a J-Link, however, everything works.