Kinetis Microcontrollers Knowledge Base

Discussions

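The previous article described one way to secure (lock) a Kinetis device in detail. This article follows up with a second method. The principle is unchanged: security is still enabled by modifying the contents of the flash address range 0x400~0x40F. Since enabling security always comes down to rewriting the flash locations that are loaded into the registers, there is some freedom in how those locations get written. Appending the flash configuration data to the end of the interrupt vector table, as in Method 1, is only one option.

Another option is to define a constant at a fixed address. Constants themselves are nothing new (font bitmaps for an LCD or fixed lookup tables are usually declared const so that they live in flash and save RAM), but placing one at a specific address is less common, because the compiler normally assigns addresses automatically. If the flash configuration data is defined as a constant and forced to the address starting at 0x400, the result is equivalent to Method 1. This is where the "@" operator comes in.

The steps for Method 2 are listed below, again using a K60 locked from the IAR environment as the example:

1. Open main.c in the project to be secured and add the constant definition shown in the original screenshot before the main function. It places the FlashConfig array in the ".flashConfig" section; FlashConfig[11] corresponds to address 0x40C.

2. The .flashConfig section has to be defined manually in the IAR linker configuration file (.icf) that belongs to the project. As shown in the original screenshot, three parts need to be added; then save the file.

3. The first two steps complete the additions, but two important points must be observed (highlighted in red in the original post):

(1) With Method 2, make sure that the last 16 bytes have not been added to the end of the interrupt vector table in vectors.c, that is, the four CONFIG_x configuration entries must not be present. Otherwise the build fails, because the two definitions conflict. Method 1 and Method 2 are mutually exclusive: use one or the other, never both.

(2) Optimization also has to be considered. The constant is defined in the .flashConfig section but is never referenced by the program, so it is discarded and the work is wasted. Setting the IAR optimization level to low or none does not help, because an unreferenced symbol is dropped regardless. Fortunately IAR offers a way out: the Keep symbols field on the Options -> Linker -> Input tab (see the original screenshot). Adding FlashConfig there forces the symbol to be kept.

4. Build, download and debug. After the download the debugger can no longer enter the debug window. This is expected, because the chip is now secured, and the image is ready for mass production.

I hope these two articles are helpful. Enjoy!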
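The constant definition from step 1 is only available as a screenshot in the original post, so the following is a minimal sketch of what such a definition could look like, not the author's exact code. It assumes a 16-byte array covering 0x400~0x40F, the K60 flash configuration field layout (FSEC at 0x40C, with only SEC = 0b10 meaning "unsecured"), the IAR `@ "section"` placement syntax with `__root` as an alternative to the Keep symbols setting, and an .icf placement line that is shown in a comment. The helper `flash_config_is_secured` is hypothetical and only exists so the layout can be checked on a host compiler.

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_CONFIG_BASE 0x400u /* first address of the flash configuration field */
#define FSEC_ADDRESS      0x40Cu /* security byte named in the article */
#define FSEC_SEC_MASK     0x03u  /* FSEC[SEC] field (assumed from the K60 manual) */
#define FSEC_SEC_UNSECURE 0x02u  /* only SEC = 0b10 leaves the part unsecured */

#if defined(__ICCARM__)
/* IAR build: __root keeps the unreferenced object, "@" places it in the section.
   Assumed matching line in the .icf file:
   place at address mem:0x400 { readonly section .flashConfig }; */
__root const uint8_t FlashConfig[16] @ ".flashConfig" =
#else
/* Host build: same data with default placement, used only to check the layout. */
const uint8_t FlashConfig[16] =
#endif
{
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, /* 0x400-0x407: backdoor key */
    0xFF, 0xFF, 0xFF, 0xFF,                         /* 0x408-0x40B: FPROT3..FPROT0 */
    0xFF, /* 0x40C FSEC: SEC = 0b11 (secured), mass erase still enabled */
    0xFF, /* 0x40D FOPT */
    0xFF, /* 0x40E FEPROT */
    0xFF  /* 0x40F FDPROT */
};

/* Returns 1 if the FSEC byte of a 16-byte configuration field selects security. */
int flash_config_is_secured(const uint8_t cfg[16])
{
    assert(cfg != 0);
    uint8_t fsec = cfg[FSEC_ADDRESS - FLASH_CONFIG_BASE];
    return (fsec & FSEC_SEC_MASK) != FSEC_SEC_UNSECURE;
}
```

Changing the FSEC byte to 0xFE would leave the device unsecured, which is a convenient way to test the placement before locking a board for real.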

The Kinetis K70 MCU family includes 512KB-1MB of flash memory, a single precision floating point unit, Graphic LCD Controller, IEEE 1588 Ethernet, full- and high-speed USB 2.0 On-The-Go with device charge detect, hardware encryption, tamper detection capabilities and a NAND flash controller. 256-pin devices include a DRAM controller for system expansion. The Kinetis K70 family is available in 196 and 256 pin MAPBGA packages. For more information visit www.freescale.com/kinetis

The official demo source code for the KL25 contains an I2C driver generated with Processor Expert (PE) but no bare-metal I2C driver, which makes it hard to get started for users who are not used to generating code with PE. I therefore ported the K60 bare-metal I2C driver to the KL25 for reference. The code ran well on the K60 board but not on the FRDM-KL25Z, and two fairly typical problems came up during the port. They are recorded here in the hope that other users who meet the same problems can locate and solve them quickly.

Test hardware:
TWR-K60D100M board: K60 + MMA8451 (the MMA8451 is a three-axis accelerometer connected to the K60 over the I2C bus; the K60 is the master and the MMA8451 is the slave)
FRDM-KL25Z board: KL25 + MMA8451
Development environment: IAR 6.6

1. Repeated start cannot be generated when I2Cx_F[MULT] is non-zero

Problem: The I2C demo in the K60 sample code (attachment 1) reads data from the on-board MMA8451 accelerometer over I2C and controls the transfer by polling the I2C flag bits. It runs without problems on the TWR-K60D100M board. With almost the same driver code on the FRDM-KL25Z board, the program always stops at an i2c_wait(I2C0_B) call in Function 1 below (marked in red in the original post). Inside that function, shown as Function 2, it spins at while((p->S & I2C_S_IICIF_MASK)==0), waiting forever for the transfer-complete interrupt flag IICIF to be set.

Function 1.

    u8 hal_dev_mma8451_read_reg(u8 addr)
    {
        u8 result;

        i2c_start(I2C0_B);
        i2c_write_byte(I2C0_B, I2C_ADDR_MMA8451 | I2C_WRITE);
        i2c_wait(I2C0_B);
        i2c_get_ack(I2C0_B);

        i2c_write_byte(I2C0_B, addr);
        i2c_wait(I2C0_B);
        i2c_get_ack(I2C0_B);

        i2c_repeated_start(I2C0_B);
        i2c_write_byte(I2C0_B, I2C_ADDR_MMA8451 | I2C_READ);
        i2c_wait(I2C0_B);
        i2c_get_ack(I2C0_B);

        i2c_set_rx_mode(I2C0_B);
        i2c_give_nack(I2C0_B);
        result = i2c_read_byte(I2C0_B);
        i2c_wait(I2C0_B);

        i2c_stop(I2C0_B);
        result = i2c_read_byte(I2C0_B);
        pause();
        return result;
    }

Function 2.

    void i2c_wait(I2C_MemMapPtr p)
    {
        while((p->S & I2C_S_IICIF_MASK)==0)
            ; // wait flag
        p->S |= I2C_S_IICIF_MASK; // clear flag
    }

Cause: My first guess was that the preceding i2c_write_byte() transfer had not completed, so IICIF was never set. Capturing the bus with an oscilloscope showed that the KL25 did not generate a repeated start signal when i2c_repeated_start(I2C0_B) was executed. After some searching, the answer turned up in the Kinetis L errata: "Repeat start cannot be generated if the I2Cx_F[MULT] field is set to a non-zero value." In other words, when I2Cx_F[MULT] is set to a non-zero value, the I2C master cannot generate a repeated start. The I2C_init() function of the application happened to set I2Cx_F[MULT] = 01, which is exactly the condition described in the errata.

Solution: The MULT field of the I2Cx_F register defines the multiplier factor mul, which is used together with the SCL divider to generate the I2C baud rate. The errata gives two possible workarounds:

1) Configure I2Cx_F[MULT] to zero if a repeated start has to be generated.
2) Temporarily set I2Cx_F[MULT] to zero immediately before setting the Repeat START bit in the I2C C1 register (I2Cx_C1[RSTA]=1), and restore I2Cx_F[MULT] to its original value after the repeated start has occurred.

Using the first workaround, I changed I2Cx_F[MULT] from 01 to 00. After that the same code ran correctly on the FRDM-KL25Z board and read the data of the on-board MMA8451.

2. Timing sequence of an I2C single-byte read

Problem: In Function 1 above, the KL25 reads the MMA8451 as follows: send the slave address with the write command -> send the register address to access -> send a repeated start -> send the slave address with the read command -> read the data returned by the slave. This matches the MMA8451 single-byte read timing shown in Figure 1. The part of Figure 1 marked with a red box deserves attention, however. According to Figure 1, the master returns the NAK after it has read DATA[7:0] from the slave, to indicate that this is the last byte it wants to receive, and then sends the stop signal to end the transfer. Code written along these lines looks like Section 2 below: it first reads the data with i2c_read_byte(I2C0_B) and then sends the NACK with i2c_give_nack(I2C0_B). From the point of view of the actual physical timing on the KL25, this order is wrong. The correct order is Section 1: the NACK must be set up with i2c_give_nack(I2C0_B) before the data is read with i2c_read_byte(I2C0_B).

Section 1.

    i2c_set_rx_mode(I2C0_B);
    i2c_give_nack(I2C0_B);            // line1
    result = i2c_read_byte(I2C0_B);   // line2
    i2c_wait(I2C0_B);                 // line3
    i2c_stop(I2C0_B);                 // line4
    result = i2c_read_byte(I2C0_B);   // line5

Section 2.

    i2c_set_rx_mode(I2C0_B);
    result = i2c_read_byte(I2C0_B);
    i2c_wait(I2C0_B);
    i2c_give_nack(I2C0_B);
    i2c_stop(I2C0_B);

Cause: The NACK requested by the master is only pushed onto the bus after the next data byte has been received. The KL25 reference manual describes it as follows: "the No acknowledge signal is sent to the bus after the following receiving data byte (if FACK is cleared)". The NACK therefore has to be requested before the read.

Analysis: I tested both sequences and captured the waveforms with an oscilloscope. With the Section 1 code (Figure 2), the NACK (1) appears on the bus at the ninth clock pulse, and after the stop signal the bus is idle (SCL and SDA both high). With the Section 2 code (Figure 3), an ACK (0) appears at the ninth clock pulse. That tells the slave that more data is expected, so the master keeps waiting for another byte from the MMA8451 and the bus blocks, which clearly contradicts the intention of reading a single byte.

In short, the order of operations in Section 1 is correct for the KL I2C module, and the physical timing differs from the schematic timing in Figure 1. This needs particular attention.

Figure 1. MMA8451 single-byte read timing diagram
Figure 2. Timing produced by the Section 1 code
Figure 3. Timing produced by the Section 2 code

To make these problems easy to verify, the attachments include the K60 I2C sample code, the KL25 sample code and the Kinetis L errata for I2C.
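The post applies the first errata workaround. The second one, clearing MULT only around the repeated start, could be sketched roughly as follows. This is a hedged illustration: `i2c_regs_t` is a hypothetical two-register stand-in for the real peripheral map, the bit positions (MULT in F[7:6], RSTA in C1[2]) are assumed from the KL25 reference manual, and the point at which MULT may safely be restored on real hardware is not stated in the post.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two I2C registers involved in the erratum. */
typedef struct {
    volatile uint8_t F;  /* frequency divider: MULT in bits 7:6, ICR in bits 5:0 */
    volatile uint8_t C1; /* control register 1: RSTA is bit 2 */
} i2c_regs_t;

#define I2C_F_MULT_MASK  0xC0u
#define I2C_C1_RSTA_MASK 0x04u

/* Errata workaround 2: force MULT to 0 just before requesting the repeated
   start, then put the original divider setting back. */
void i2c_repeated_start_workaround(i2c_regs_t *p)
{
    assert(p != 0);
    uint8_t saved_f = p->F;

    p->F = (uint8_t)(saved_f & (uint8_t)~I2C_F_MULT_MASK); /* MULT = 00, ICR unchanged */
    p->C1 |= I2C_C1_RSTA_MASK;                             /* request repeated start */

    /* A real driver may have to wait here until the repeated start has
       actually appeared on the bus; that delay is an open assumption. */
    p->F = saved_f;                                        /* restore MULT */
}
```

Compared with workaround 1, this keeps the configured baud rate for normal traffic, at the cost of briefly changing the SCL timing around each repeated start.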

Introduction

Even with the prevalence of universal asynchronous receiver/transmitter (UART) peripherals on microcontrollers (MCUs), bit banged UART algorithms are still used. The reasons for this vary from application to application. Sometimes it is simply because more UARTs are needed than the selected device provides. Maybe application or layout restrictions require certain pins to be used for the UART functions but the device does not route UART pins to the required package pins. Maybe the application requires a non-standard or proprietary UART scheme. Whatever the reason, there are applications where a bit banged UART is used, typically as a pure software implementation (a timer is used and the MCU core controls a GPIO pin directly). A better alternative may be to use the Flextimer (FTM) or Timer/PWM Module (TPM) to take advantage of the features of these peripherals and possibly offload the CPU.

This document will explain and provide a sample application of how to emulate a UART using the FTM or TPM peripheral. A Kinetis SDK example (for the TWR-K22F120M and FRDM-K22F platforms) and a baremetal legacy code example (for the FRDM-KL26Z) are provided here.

UART protocol

Before creating an application to emulate a UART, the UART protocol and encoding must be understood. The UART protocol is an asynchronous protocol that typically includes a start bit, payload (of 7-10 data bits), and a stop bit but does allow for many variations on the number of stop bits and what/how to transfer the data. For this document and application example, the focus will be UART transmission that follows 1 start bit, 8 data bits, 1 stop bit, no parity, and no flow control. The data will be transmitted least significant bit (LSB) first. The following image is a block diagram of this transmission.

However, this doesn't specify what the transmission looks like electrically. The figure below shows a screenshot of an oscilloscope capture of a UART transmission.
The data transmitted is 0x55 or a "U" in the ASCII representation. Notice that the transmission line is initially a logic high, and then transitions low to signal the start of the transmission. The transmission line must stay low for one bit width for the receiver to detect it. Then there are 8 data bits, followed by 1 stop bit. In the case shown above, the data bits are 0x55 or 0b0101_0101. Remember that the transmissions are sent LSB first, so the screenshot shows 1-0-1-0-1-0-1-0. The last transition high marks the beginning of the stop bit and the line remains in that state until the start of the next transmission. The receiver, being asynchronous, does not require any type of identifying transition to mark the end of the stop bit.

FTM/TPM configuration

The first question many may ask when beginning a project like this is "How do I configure the FTM/TPM when emulating a UART?". The answer depends on the aspect of the problem you are trying to solve. Transmitting and receiving characters require two different configurations. Transmission requires a configuration that manipulates the output pin at specific points in time. Receiving characters requires a configuration that samples the receive pin and measures the time between pin transitions. The FTM and TPM have the modes listed in the following table:

The FTM and TPM have four different modes that manipulate an output: Output compare (no pulse), Output compare (with pulse), Edge-aligned PWM, and Center-aligned PWM. Neither PWM mode is ideal for the requirements of the application. This is because the PWM modes are designed to produce a continuous waveform and are always going to return to the initialized state once during the cycle of the waveform. However, the UART protocol may have continuous 1's or 0's in the data without pin transitions between them.
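To make the frame layout concrete, here is a small host-side sketch (not part of the original sample application) that expands a byte into the ten line levels of an 8N1 frame, LSB first, and counts how many pin transitions the frame contains. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define UART_FRAME_BITS 10u /* 1 start bit + 8 data bits + 1 stop bit */

/* Fill levels[0..9] with the line level of each bit period, in time order. */
void uart_frame_levels(uint8_t data, uint8_t levels[UART_FRAME_BITS])
{
    assert(levels != 0);
    levels[0] = 0u;                                   /* start bit: line goes low */
    for (unsigned i = 0; i < 8u; i++) {
        levels[1u + i] = (uint8_t)((data >> i) & 1u); /* data bits, LSB first */
    }
    levels[9] = 1u;                                   /* stop bit: line returns high */
}

/* Count level changes in the frame, starting from an idle (high) line. */
unsigned uart_frame_transitions(const uint8_t levels[UART_FRAME_BITS])
{
    assert(levels != 0);
    unsigned count = 0;
    uint8_t previous = 1u;
    for (unsigned i = 0; i < UART_FRAME_BITS; i++) {
        if (levels[i] != previous) {
            count++;
        }
        previous = levels[i];
    }
    return count;
}
```

For 0x55 the line changes at the start of every bit period (ten transitions), whereas 0xFF produces only two, which is the kind of long run without transitions that the PWM modes cannot reproduce.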
The output compare mode (high-true or low-true pulse modes) is designed to only manipulate the pin once, and only produces pulses that are one FTM/TPM clock cycle in duration. So this is obviously not desirable for the application.

The output compare mode (Set/Clear/Toggle on match) is promising. This mode manipulates the output pin every cycle. There are three different options: clear output on match, set output on match, and toggle output on match. Neither "clear output on match" nor "set output on match" is ideal, as either would require configuration changes during the transmission of a character. The "toggle output on match" option, however, can be used and is the selected configuration mode for this sample application.

To receive characters, there is only one mode that is intuitive: the input capture mode. This mode records the timer count value on an edge transition of the selected input pin. Similar to the output compare mode chosen for the transmit functionality, the input capture mode has three sub-modes: capture on rising edge, capture on falling edge, and capture on either edge. It is clear from the descriptions that capture on either edge should be selected.

Transmit encoding

The selection of the FTM/TPM mode is moderately intuitive, but using this mode to emulate a UART transmission is not. There are two issues that make this a little tricky.

1) The output pin is initialized low. However, the UART protocol needs the pin to begin in a logical high state.
2) The pin transitions on every cycle provided the channel value is less than the value of the MOD register. Due to continuous strings of 1's or 0's, it is necessary to have periods where the pin does not transition.

Both of these points have workarounds.

Output pin initialization

For the first issue, the channel interrupt is first enabled and the channel value register is loaded with a value much less than the value in the MOD register.
Then in the channel interrupt service routine, the pin is sampled to ensure that it is in the logic high state and the channel interrupt is disabled (and will not be re-enabled throughout the life of the application). The code for this interrupt service routine is as follows.

Output pin control

For the second issue, a method of not transitioning the pin value while allowing the timer to continue counting normally is necessary. The Output Compare mode uses the channel value register to determine when the pin transition occurs. If a value greater than MOD is written to the channel value register, the channel value will never match the count register and thus, a pin transition will never occur. So, when a series of continuous 1's or 0's need to be transmitted, a value greater than the value in the MOD register can be written to the channel value register to keep the output pin in its current state.

However, when a value greater than MOD is written to the channel value register, no channel match will occur (which means channel interrupts will not occur). So the timer overflow interrupt must be used to continue writing values. This requires the updates to the output pin to be planned ahead of time and makes the transmission algorithm a little tricky. The following diagram displays which values should be written to the channel value register at which points in time to generate the appropriate pulses.

Writing a function to translate a number into the appropriate series of MOD/2 and MOD+1 values can be a little tricky. To do this, we must first notice that MOD/2 needs to be written when changes on the transmission pin are needed and MOD+1 needs to be written when pin transitions are not desired. So, what logical function can we use to determine when a change has happened? XOR is the correct answer. So what two values need to be XOR'd together? One value is obviously the value that we want to send. But what is the second value?
It turns out that the second value is a shifted version of the value that we want to send. Specifically, the second value is the desired value to send shifted to the left by one. (You can think of it as sort of a "future" value of the desired value). The following pictures show how to determine the queue to use for the transmission.

Receive decoding

The receive functionality has an advantage over the transmit functions in that it is possible to use DMA for the reception of characters. This is because the receive function takes advantage of the input capture functionality of the FTM / TPM and therefore can use the channel match interrupt. The example application provided with this document implements a DMA method and a non-DMA method for reception. First, the non-DMA method will be discussed. Before discussing the specifics of gathering the input pulse widths, some details of the receive pin need to be discussed.

Detecting the start bit

The receive pin needs to be able to determine when the start of the packet transmission begins. To do this, the receive pin is configured as an FTM / TPM pin. At the same time, the GPIO interrupt functionality is configured on the same pin for a falling edge interrupt. The GPIO interrupt capabilities are enabled in any digital mode, so the GPIO interrupt will still be able to be routed to the Nested Vectored Interrupt Controller (NVIC).

The pin interrupt is used to start the FTM / TPM clock when a new character reception begins. In the GPIO interrupt for this pin, the FTM / TPM counter register is reset and the clock to the FTM / TPM is turned on. The code for the GPIO interrupt service routine is shown below.

Receiving characters without DMA

Now, when receiving characters and not using DMA, the first thing to understand is that the Interrupt Service Routine (ISR) will be used and it will mainly be used to record the captured count values.
The interrupt service routine also tracks the current receive character length and resets the counter register. This is so that the values in the receive queue reflect the time since the last pin transition. The interrupt function for the non-DMA application is shown below.

Notice that the first two actions in the ISR are resetting the count register and clearing the channel event interrupt flag. Then the channel value is stored in the receive pulse width array (this is simply an array that holds the receive pulse widths of the current character being received). Next, recvQueueLength, the variable which holds the current length of the character being received, is updated to reflect the latest character length.

The next step is to determine if the full character has been received. This is determined by comparing recvQueueLength to RECV_QUEUE_THRESH, the threshold obtained by multiplying the number of expected bits by the expected bit width and adding another bit width (for the start bit). If recvQueueLength is greater than RECV_QUEUE_THRESH, then a semaphore, recvdChar, is set to indicate that a full character has been received. The FTM / TPM clock is turned off, and the pin interrupt functionality of the receive pin is enabled.

The final step in the interrupt routine is to increment the receive queue index, recvQueueIndex. This variable points to the current entry in the receive queue array.

Using DMA to receive characters

When using DMA, the receive FTM / TPM interrupt is much different. The interrupt routine simply needs to clear the channel interrupt flag, stop the FTM / TPM timer, disable the DMA channel, and set the received character semaphore. The character is then decoded outside of the interrupt routine. The interrupt function when using DMA is shown below:

Decoding the received pulse widths

Once the array of pulse widths has been populated, the received character needs to be translated into a single number.
This varies slightly when using DMA and when not using DMA. However, the basic principle is the same. The number of bits in a single entry is determined by dividing by the expected bit width. The result is translated into a temporary array that contains 1's and 0's, which is then used to shift the appropriate number of 1's and 0's into the returned char variable. A temporary array is needed because the values are shifted into the UART LSB first, so the bits must be physically flipped from the first entry to the last. There is no logical operation that will do this automatically. The algorithm to perform this translation is shown below.

In this algorithm, note that recvPulseWidth is the array that contains the raw count value of the pulse width. The array tempRxChar holds the decoded character in reverse order and rxChar is a char variable that holds the received character.

Conclusion

This document provides an overview of the UART protocol and describes a method for creating a software UART using the timing features of the FTM or TPM peripheral. This method allows for accurate timing while not relying entirely on the CPU and the latency associated with interrupts and the GPIO pins. The receive function is open to further optimization by using DMA, which can further unload the CPU.
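The listings referred to in this document are not reproduced here, so the following host-side sketch illustrates the two core ideas under stated assumptions; it is not the code from the attached sample application. `tx_build_queue` applies the XOR of the frame with its left-shifted copy to decide, per bit period, whether MOD/2 (toggle) or MOD+1 (hold) is written to the channel value register, assuming one timer period equals one bit time and the line idles high. `rx_decode` turns edge-to-edge pulse widths into a byte by way of a temporary bit array; the names recvPulseWidth, tempRxChar and rxChar are borrowed from the text, everything else is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FRAME_BITS 10u /* start + 8 data + stop */

/* Transmit: queue[k] is the channel value to use during bit period k. */
void tx_build_queue(uint8_t data, uint16_t mod, uint16_t queue[FRAME_BITS])
{
    assert(queue != 0 && mod < 0xFFFFu);
    /* Bit k of frame = line level in period k: bit 0 is the start bit (0),
       bits 1..8 are the data LSB first, bit 9 is the stop bit (1). */
    uint32_t frame = (1u << 9) | ((uint32_t)data << 1);
    /* Shifted copy = level of the preceding period; the line is high before the start bit. */
    uint32_t previous = (frame << 1) | 1u;
    uint32_t toggles = (frame ^ previous) & 0x3FFu;

    for (unsigned k = 0; k < FRAME_BITS; k++) {
        queue[k] = ((toggles >> k) & 1u) ? (uint16_t)(mod / 2u)  /* match occurs: pin toggles */
                                         : (uint16_t)(mod + 1u); /* never matches: pin holds */
    }
}

/* Receive: recvPulseWidth[i] is the count between consecutive edges, the first
   entry starting at the falling edge of the start bit. */
uint8_t rx_decode(const uint16_t recvPulseWidth[], unsigned count, uint16_t bitWidth)
{
    assert(recvPulseWidth != 0 && bitWidth > 0u);
    uint8_t tempRxChar[FRAME_BITS];
    unsigned pos = 0;
    uint8_t level = 0u; /* the first pulse is the low start bit */

    for (unsigned i = 0; i < count && pos < FRAME_BITS; i++) {
        unsigned bits = (recvPulseWidth[i] + bitWidth / 2u) / bitWidth; /* round to whole bits */
        for (; bits > 0u && pos < FRAME_BITS; bits--) {
            tempRxChar[pos++] = level;
        }
        level ^= 1u; /* every captured edge flips the line level */
    }
    while (pos < FRAME_BITS) {
        tempRxChar[pos++] = 1u; /* no further edges: line stays high through the stop bit */
    }

    uint8_t rxChar = 0u;
    for (unsigned i = 0; i < 8u; i++) {
        rxChar |= (uint8_t)(tempRxChar[1u + i] << i); /* skip start bit, LSB first */
    }
    return rxChar;
}
```

Rounding each pulse width to the nearest whole bit gives the decoder some tolerance against baud rate mismatch and capture jitter, which a plain integer division would not.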

The attached zip file contains software that accompanies the document UART Emulation Using the FTM or TPM. It contains two sample applications: one that uses the TPM, and one that uses the FTM. The TPM example targets the FRDM-KL26Z development board and is written in baremetal code. The FTM example targets the TWR-K22F120M and FRDM-K22F and is written using the Kinetis SDK 1.0 release. Installation instructions are contained within the zip package. Unzip the package to an empty folder and then copy the appropriate folders to the appropriate locations on your PC per the instructions located in the zip file.

Curve25519 is a Montgomery elliptic curve. Much network and IoT software, Apple HomeKit for example, uses it in the Diffie-Hellman algorithm for key exchange. On a Kinetis MCU with security features, calculating the shared secret on Curve25519 with a pure software algorithm (based on mbedTLS) takes 180 ms. That is faster than other 256-bit elliptic curves in software: with the Weierstrass curve BP256R1, the shared secret calculation takes more than 1200 ms. With LTC ECC hardware acceleration, calculating the shared secret on a 256-bit elliptic curve takes only 16 ms, far less than the software algorithm, so it makes sense to use the LTC to accelerate Curve25519 as well. The LTC, however, only supports curves in Weierstrass form, and Curve25519 is a Montgomery curve. Although the LTC cannot be used with Curve25519 directly, it can be used after mapping the curve to Weierstrass form. The curve parameters, transformation formulas, example code and test results below show how and why to do it.

1. Curve parameters:

Curve25519 in Montgomery form: Y^2 = X^3 + A*X^2 + X
Fp = 0x7fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffed
A = 486662
Gx = 9
Gy = 0x20ae19a1b8a086b4e01edd2c7748d14c923d4d7e6d7c61b229e9c5a27eced3d9
Order of G point = 0x1000000000000000000000000000000014def9dea2f79cd65812631a5cf5d3ed

Curve25519 in Weierstrass form: Y^2 = X^3 + a*X + b
Fp = 0x7fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffed
a = 0x2aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa984914a144
b = 0x7b425ed097b425ed097b425ed097b425ed097b425ed097b4260b5e9c7710c864
Gx = 0x2aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaad245a
Gy = 0x20ae19a1b8a086b4e01edd2c7748d14c923d4d7e6d7c61b229e9c5a27eced3d9
Order of G point = 0x1000000000000000000000000000000014def9dea2f79cd65812631a5cf5d3ed

2. Calculation formulas:

x_w: x-coordinate in Weierstrass form
y_w: y-coordinate in Weierstrass form
x_m: x-coordinate in Montgomery form
y_m: y-coordinate in Montgomery form (not needed)
a_m: coefficient of the Montgomery equation ( Y^2 = X^3 + a_m * X^2 + X )
a_w: coefficient a of the Weierstrass equation ( Y^2 = X^3 + a*X + b )
b_w: coefficient b of the Weierstrass equation ( Y^2 = X^3 + a*X + b )

a) x_w = (x_m + a_m/3) % p
b) y_w^2 = x_w^3 + a_w*x_w + b_w
c) x_m = (x_w - a_m/3) % p

For reference, see:
https://en.wikipedia.org/wiki/Curve25519
https://en.wikipedia.org/wiki/Montgomery_curve

3. Example code:

    // public and private at Montgomery end
    #define M255_d "0x7178DAC11D42AA5F39B10A62A8584DB0C8864564ADC9DF84EC0B13D9AEC220F8"
    #define M255_Qx "0x3BA5048381744348D84E754B9944ABE080B37F7D4158DCE60CD79F66B98AB89E"

    // public and private at Weierstrass end
    #define WTS255_d "0x09CC5CCF43C656C1309EE5A3491D5A8361607CEEB0C9B2B31A575E0FEF2B8835"
    #define WTS255_Qx "0x3F4BDE110EE7AF71EF428D1018D188E35BAFB019F34F84E6465C5194B363DC2D"
    #define WTS255_Qy "0x7540577CE6F920354E2A9D38CE88847D7447E66FA4D188AC75CB63C17210B718"
    #define WTS255_Qx_TO_M255_Qx "0x14A13366643D04C74497E2656E26DE38B105056F48A4DA3B9BB1A6EA08B6B7DC"

    // a_m / 3 mod p
    #define AM_INV3 "0x2aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaad2451"

    int ecdh_wts_curve_end( )
    {
        unsigned int ticks;
        int ret = 0;
        size_t blen = 0, blen_peer = 0;
        ecdh_context ecdh;
        ecdh_context ecdh_peer; // to_wts255
        ecdh_context ecdh_peer_m255;
        mpi R;

        mpi_init(&R);
        ecdh_init( &ecdh);
        ecdh_init( &ecdh_peer);
        ecdh_init( &ecdh_peer_m255);

        MPI_CHK(ecp_use_known_dp( &ecdh.grp, ECP_DP_WTS25519 ));
        MPI_CHK(ecp_use_known_dp( &ecdh_peer.grp, ECP_DP_WTS25519 ));
        MPI_CHK(ecp_use_known_dp( &ecdh_peer_m255.grp, ECP_DP_M255 ));

        blen = set_hash_buff(/*TEST_ECP_GRP_ID*/ECP_DP_WTS25519, &secret_buf, ecp_name);
        if(blen == 0)
        {
            ret = -1;
            goto cleanup;
        }

        // own key pair on the Weierstrass curve
        mpi_read_string(&ecdh.d, 16, WTS255_d);
        mpi_read_string(&ecdh.Q.X, 16, WTS255_Qx);
        mpi_read_string(&ecdh.Q.Y, 16, WTS255_Qy);
        mpi_lset(&ecdh.Q.Z, 1);

        // peer key pair on the Montgomery curve
        mpi_read_string(&ecdh_peer_m255.d, 16, M255_d);
        mpi_read_string(&ecdh_peer_m255.Q.X, 16, M255_Qx);
        mpi_init(&ecdh_peer_m255.Q.Y);
        mpi_lset(&ecdh_peer_m255.Q.Z, 1);

        // map M255 point to WTS255 point
        my_timer_start();
        // x_w = (x_m + a_m/3) % p
        mpi_read_string(&R, 16, AM_INV3);
        mpi_add_mpi(&ecdh_peer.Q.X, &ecdh_peer_m255.Q.X, &R);
        mpi_mod_mpi(&ecdh_peer.Q.X, &ecdh_peer.Q.X, &ecdh_peer_m255.grp.P);
        // y_w^2 = x_w^3 + a_w*x_w + b_w (ecdh_peer_m255.Q.Y is used as scratch space)
        mpi_lset(&R, 3);
        mpi_exp_mod (&ecdh_peer_m255.Q.Y , &ecdh_peer.Q.X, &R, &ecdh_peer_m255.grp.P, NULL);
        mpi_mul_mpi(&R, &ecdh_peer.grp.A, &ecdh_peer.Q.X);
        mpi_mod_mpi(&R, &R, &ecdh_peer.grp.P);
        mpi_add_mpi(&ecdh_peer_m255.Q.Y, &ecdh_peer_m255.Q.Y, &R);
        mpi_add_mpi(&ecdh_peer_m255.Q.Y, &ecdh_peer_m255.Q.Y, &ecdh_peer.grp.B);
        mpi_mod_mpi(&ecdh_peer_m255.Q.Y, &ecdh_peer_m255.Q.Y, &ecdh_peer.grp.P);
        mpi_mod_sqrt(&ecdh_peer.Q.Y, &ecdh_peer_m255.Q.Y, &ecdh_peer_m255.grp.P);
        // z = 1
        mpi_lset(&ecdh_peer.Q.Z, 1);

        MPI_CHK(ecp_copy(&ecdh.Qp, &ecdh_peer.Q));
        MPI_CHK(ecdh_calc_secret_wts2mont( &ecdh, &blen, secret_buf, blen, myrand, NULL));

        // x_m = (x_w - a_m/3) % p
        mpi_read_string(&R, 16, AM_INV3);
        mpi_sub_mpi(&ecdh_peer_m255.Q.X, &ecdh.Q.X, &R);
        mpi_mod_mpi(&ecdh_peer_m255.Q.X, &ecdh_peer_m255.Q.X, &ecdh_peer_m255.grp.P);
        ticks = my_timer_stop();

        // print out message
        polarssl_printf("Weierstrass curve shared secret:\n");
        mpi_printf_string( &ecdh.z, 16);
        polarssl_printf("%s ecdh peer to peer: %lu ticks, %d ms (%d) \n", ecp_name , (unsigned long)ticks, ticks / (CLOCK_SYS_GetPitFreq(0) / 1000),CLOCK_SYS_GetPitFreq(0) );

    cleanup:
        if( ret !=0 )
            polarssl_printf( "%s test Unexpected error, return code = %08X\n", ecp_name, ret );
        mpi_free(&R);
        ecdh_free( &ecdh);
        ecdh_free( &ecdh_peer);
        ecdh_free( &ecdh_peer_m255);
        return( ret );
    }

    int ecdh_mont_curve_end( )
    {
        int verbose = 1;
        unsigned int ticks;
        int ret = 0;
        size_t blen = 0, blen_peer = 0;
        ecdh_context ecdh;
        ecp_point Q_peer; // peer public point

        ecdh_init( &ecdh);
        ecp_point_init( &Q_peer);

        MPI_CHK(ecp_use_known_dp( &ecdh.grp, ECP_DP_M255 ));

        blen_peer = set_hash_buff(ECP_DP_M255, &secret_buf_peer, ecp_name);
        if(blen_peer == 0)
        {
            ret = -1;
            goto cleanup;
        }

        mpi_read_string(&ecdh.d, 16, M255_d);
        mpi_read_string(&ecdh.Q.X, 16, M255_Qx);
        mpi_init(&ecdh.Q.Y); // don't care Y, only init it
        mpi_lset(&ecdh.Q.Z, 1);

        mpi_read_string(&Q_peer.X, 16, WTS255_Qx_TO_M255_Qx);
        mpi_init(&Q_peer.Y);
        mpi_lset(&Q_peer.Z, 1);

        MPI_CHK(ecp_copy(&ecdh.Qp, &Q_peer));

        my_timer_start();
        MPI_CHK(ecdh_calc_secret( &ecdh, &blen_peer, secret_buf_peer, blen_peer, myrand, NULL));
        ticks = my_timer_stop();

        polarssl_printf("%s ecdh peer to peer: %lu ticks, %d ms (%d) \n", ecp_name , (unsigned long)ticks, ticks / (CLOCK_SYS_GetPitFreq(0) / 1000),CLOCK_SYS_GetPitFreq(0) );
        polarssl_printf("Montgomery curve shared secret:\n");
        mpi_printf_string( &ecdh.z, 16);
        polarssl_printf( "passed\n" );

    cleanup:
        if( ret !=0 && verbose != 0 )
            polarssl_printf( "%s test Unexpected error, return code = %08X\n", ecp_name, ret );
        ecdh_free( &ecdh);
        ecp_point_free( &Q_peer);
        if( verbose != 0 )
            polarssl_printf( "\n" );
        return( ret );
    }

4. Test results:

1) Test result of Curve25519 in Weierstrass form with LTC:

2) Test result of Curve25519 in Montgomery form with the software algorithm:

The shared secret is the same in Weierstrass form with the LTC and in Montgomery form with the software algorithm: "0x1454BDCD6A94D6336AA5A76F3CB40BBE12B65A2CDC9DA6B478948906638896D1". The calculation with the LTC, however, is about ten times faster than the software one.
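The mapping formulas in section 2 can be checked on a toy curve that fits in 64-bit integers. The sketch below is not the LTC or mbedTLS code from the post: the prime 1009 and the coefficient 6 are arbitrary illustration values, all function names are made up, and the Weierstrass coefficients are derived with the standard substitution x = x_w - a_m/3, which gives a_w = (3 - a_m^2)/3 and b_w = (2*a_m^3 - 9*a_m)/27. It exhaustively verifies that every point of the Montgomery curve lands on the Weierstrass curve after x_w = (x_m + a_m/3) % p.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint64_t p, a_m; } mont_curve;      /* y^2 = x^3 + a_m*x^2 + x (mod p) */
typedef struct { uint64_t p, a_w, b_w; } wts_curve;  /* y^2 = x^3 + a_w*x + b_w (mod p) */

/* base^exp mod p; safe for p < 2^31 because products stay below 2^64. */
static uint64_t mod_pow(uint64_t base, uint64_t exp, uint64_t p)
{
    uint64_t result = 1u % p;
    base %= p;
    while (exp > 0u) {
        if (exp & 1u) result = result * base % p;
        base = base * base % p;
        exp >>= 1;
    }
    return result;
}

/* Modular inverse via Fermat's little theorem (p must be prime). */
static uint64_t mod_inv(uint64_t v, uint64_t p) { return mod_pow(v, p - 2u, p); }

/* a_m / 3 mod p, the constant the example code calls AM_INV3. */
uint64_t a_m_div_3(mont_curve m) { return m.a_m % m.p * mod_inv(3u, m.p) % m.p; }

wts_curve mont_to_wts_curve(mont_curve m)
{
    assert(m.p > 3u && m.p < (1u << 16));
    uint64_t p = m.p, A = m.a_m % p;
    uint64_t A2 = A * A % p, A3 = A2 * A % p;
    wts_curve w;
    w.p = p;
    w.a_w = (3u + p - A2) % p * mod_inv(3u, p) % p;                /* (3 - A^2) / 3 */
    w.b_w = (2u * A3 + p - 9u * A % p) % p * mod_inv(27u, p) % p;  /* (2A^3 - 9A) / 27 */
    return w;
}

uint64_t x_m_to_x_w(mont_curve m, uint64_t x_m) { return (x_m % m.p + a_m_div_3(m)) % m.p; }
uint64_t x_w_to_x_m(mont_curve m, uint64_t x_w) { return (x_w % m.p + m.p - a_m_div_3(m)) % m.p; }

/* Number of Montgomery points whose image is NOT on the Weierstrass curve. */
unsigned mapping_mismatches(mont_curve m)
{
    wts_curve w = mont_to_wts_curve(m);
    uint64_t p = m.p, A = m.a_m % p;
    unsigned bad = 0;
    for (uint64_t x = 0; x < p; x++) {
        uint64_t rhs_m = (x * x % p * x + A * x % p * x + x) % p;
        uint64_t xw = x_m_to_x_w(m, x);
        uint64_t rhs_w = (xw * xw % p * xw + w.a_w * xw + w.b_w) % p;
        for (uint64_t y = 0; y < p; y++) {
            if (y * y % p == rhs_m && y * y % p != rhs_w) bad++;
        }
    }
    return bad;
}
```

The same relation holds for the real parameters: the Weierstrass Gx listed in section 1 is the Montgomery Gx = 9 plus AM_INV3.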

How to byte program SPI flash via QSPI

The QSPI module is used in many Kinetis MCUs, such as the K8x, K27/28 and KL8x. QSPI extends the internal flash range and can run at high speed. Compared to DSPI, QSPI is very complex and often takes a lot of time to learn. The KSDK contains two QSPI demos, which show how to program SPI flash in DMA mode and in polling mode. Both of them program the QSPI flash from an array of words. But can the QSPI module program SPI flash byte by byte? Yes, and this article shows how to do it.

Device: FRDM-KL82Z
Tool: MCUXpresso IDE
Debug firmware: JLINK

I built the test project based on KL82 SDK/driver_example/qspi/polling_transfer. To byte program the SPI flash, a new LUT item must be added.

    uint32_t lut[FSL_FEATURE_QSPI_LUT_DEPTH] =
        {/* Seq0 :Quad Read */
         /* CMD:   0xEB - Quad Read, Single pad */
         /* ADDR:  0x18 - 24bit address, Quad pads */
         /* DUMMY: 0x06 - 6 clock cycles, Quad pads */
         /* READ:  0x80 - Read 128 bytes, Quad pads */
         … …
         [32] = QSPI_LUT_SEQ(QSPI_CMD, QSPI_PAD_1, 0x02, QSPI_ADDR, QSPI_PAD_1, 0x18),
         [33] = QSPI_LUT_SEQ(QSPI_WRITE, QSPI_PAD_1, 0x1, 0, 0, 0),
         …
         /* Match MISRA rule */
         [63] = 0};

This item tells the system how to program a single byte. When the data is written to the TX buffer, the byte must be written four times. This is because a write transaction on the flash with a data size of less than 32 bits leads to the removal of four data entries from the TX buffer. Only the valid bits are used; the remaining bits are discarded.

Before programming starts, the data size must be set:

    QSPI_SetIPCommandSize(EXAMPLE_QSPI,1);

After the byte program, the result can be seen starting at 0x68000000. The demo project is attached. After running it, you will find that 0x03 has been written to 0x68000005.
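The attached project is not reproduced here, so the following host-side sketch only illustrates one possible reading of "write the byte four times": it prepares four 32-bit TX buffer entries for a single byte. Replicating the byte into every byte lane is an assumption made so that the value is present whichever lane the controller treats as valid. The helper names are hypothetical, and the division by four assumes that each LUT sequence occupies four LUT words, which would make the new item at index 32 sequence 8.

```c
#include <assert.h>
#include <stdint.h>

#define LUT_WORDS_PER_SEQUENCE 4u /* assumption: four LUT words per sequence */
#define TX_ENTRIES_PER_BYTE    4u /* "the byte must be written four times" */

/* Sequence number that a LUT word index belongs to. */
unsigned lut_index_to_sequence(unsigned lut_index)
{
    return lut_index / LUT_WORDS_PER_SEQUENCE;
}

/* Prepare the four TX buffer entries for programming one byte. */
void fill_tx_entries(uint8_t value, uint32_t entries[TX_ENTRIES_PER_BYTE])
{
    assert(entries != 0);
    for (unsigned i = 0; i < TX_ENTRIES_PER_BYTE; i++) {
        entries[i] = (uint32_t)value * 0x01010101u; /* same byte in all four lanes */
    }
}
```

In the demo, entries prepared this way would be handed to the QSPI driver's TX buffer write before the IP command for the new LUT sequence is executed, with the command size set to 1 as shown above.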

1. How Calibration Works

There are three main sub-blocks important in understanding how the Kinetis SAR module works: a capacitive DAC, a comparator, and the SAR engine that controls the module. Of those blocks, the DAC is most susceptible to variations that can cause linearity problems in the SAR.

The DAC is architected with three sets of binary-weighted capacitors arrayed in banks, as in Figure 1. The capacitors that represent the most significant bits of the SAR (B15:B11) are connected directly to the inputs of the comparator. The next bank of five capacitors (B10:B6) is connected to the top plate of the MSB array through an intentionally oversized scaling capacitor. The final six capacitors that make up the least significant bits of the SAR (B5:B0) are correspondingly connected to the top plate of the middle bank of capacitors through another scaling capacitor.

Figure 1. Arrangement of DAC capacitors

Only the MSB capacitor bank is calibrated. Because the first scaling capacitor is intentionally oversized, each of the non-calibrated MSB capacitors will have an effective capacitance too small to yield accurate results. However, because they are always too small, we can measure the amount of error that each of those capacitors would cause individually, and add that back in to the result.

Calibration starts with the smallest of the MSB capacitors, B11. The SAR samples Vrefl on all of the capacitors that are lower-than or equal-to the capacitor under test (CUT), while connecting all of the smaller capacitors to Vrefh. The top plate of all of the MSB capacitors is held at VDDA while this happens. After the sampling phase is complete, the top plates of the MSB capacitors are allowed to float, and the bottom plates of the MSBs not under test are connected to Vrefl. This allows charge to redistribute from the CUT to the smaller capacitors.
Finally, an 11-bit SAR algorithm (corresponding with the 11 capacitors that are smaller than the MSB array) is performed, which produces a result that indicates the amount of error that the CUT has compared to an ideally sized capacitor. This process is repeated for each of the five MSBs on both the plus-side and minus-side DACs, and the five error values that are reported correspond to the five MSBs accordingly. All of these error values are about the same magnitude, with a unit of 16-bit LSBs. See Figure 2 for an example.

Figure 2. Example of calibration on bit 11

The DAC MSB error is cumulative. That is, if bit 11 of the DAC is set, then the error is simply the error of that bit. However, if bit 12 of the DAC is set, the total error is equivalent to the error reported on bit 12, plus the error reported on bit 11. For each MSB the error is calculated as below, where Ex is the error found during the calibration for its corresponding MSB bit:

When bit 11 of the DAC is set: CLx0 = E0
When bit 12 of the DAC is set: CLx1 = E1 + E0
When bit 13 of the DAC is set: CLx2 = E2 + E1 + 2E0
When bit 14 of the DAC is set: CLx3 = E3 + E2 + 2E1 + 4E0
When bit 15 of the DAC is set: CLx4 = E4 + 2E3 + 4E2 + 8E1 + 16E0

Figure 3. Effect of calibration error on ADC response

These are the values that are then placed in each of the CLxx calibration result registers. Figure 3 shows how the errors would accumulate if all of the CLxx registers were set to zero. The offset and gain registers are calculated based on these values as well. Because of this, the gain and offset registers calibrate only for errors internal to the SAR itself. Self-calibration does not compensate for board-level or system-level gain or offset issues.

2. Recommended Calibration Procedure

From the above description it is evident that the calibration procedure is in effect several consecutive analog-to-digital conversions. These are susceptible to all of the same sources of error as any ADC conversion.
Because what is primarily being measured is the error in the size of the MSB capacitors, the recommendation is to configure the SAR so as to make the most accurate conversions possible in the environment that the SAR is being calibrated in. Noise is the primary cause of run-to-run variation in this process, so steps should be taken to reduce the impact of noise during calibration, such as:

- All digital IO should be silent and unnecessary modules should be disabled.
- The Vrefh should be as stable and as high a voltage as possible, since higher Vrefh means larger ADC code widths. An isolated Vrefh pin would be ideal. Lacking that, using an isolated VDDA as the reference would be preferable to using VREFO.
- The clock used should be as noise free as possible, and less than or equal to 6 MHz. For this purpose the order of desirable clock sources for calibration would be OSC > PLL > FLL > ASYNC.
- The hardware averaging should be set to the maximum of 32 samples.
- The Low Power Conversion bit should be set to 0.
- The calibration should be done at room temperature.

The High Speed Conversion and Sample Time Adder settings will not have much effect in most situations, and the Diff and Mode bits are completely ignored by the calibration routine.

The calibration values should be taken for each instance of the SAR on a chip under the above conditions. They should be stored in nonvolatile memory and then written into their appropriate registers whenever the ADC register values are cleared.

In some instances, the system noise present will still cause the calibration routine to exhibit greater than desired run-to-run variation. One rule of thumb is to repeat calibration several times and look at the CLx0 registers. If the value reported in that register varies by more than three, the following procedure can be implemented:

- Run the calibration routine several times, twenty to forty times.
- Place the value of each of the calibration registers into a corresponding array.
- Perform a bubble sort on each array and find the median value for each of the calibration registers.
- Use these median values as described for typical calibration results.
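The rule of thumb and the median procedure above can be sketched in C as follows. This is only an illustrative sketch: the helper names (`spread_u16`, `bubble_sort_u16`, `median_u16`) are hypothetical and not part of any SDK, the samples are assumed to have already been read out of one calibration register into a `uint16_t` array after each run, and for an even number of runs the lower of the two middle values is returned so that the result is always a value that was actually observed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: difference between the largest and smallest sample.
 * Used for the rule of thumb (e.g. CLx0 varying by more than three). */
static uint16_t spread_u16(const uint16_t *v, size_t n)
{
    assert(v != NULL || n == 0);
    if (n == 0) {
        return 0;
    }
    uint16_t lo = v[0];
    uint16_t hi = v[0];
    for (size_t i = 1; i < n; ++i) {
        if (v[i] < lo) lo = v[i];
        if (v[i] > hi) hi = v[i];
    }
    return (uint16_t)(hi - lo);
}

/* Hypothetical helper: in-place ascending bubble sort of one register's samples. */
static void bubble_sort_u16(uint16_t *v, size_t n)
{
    assert(v != NULL || n == 0);
    for (size_t pass = 0; pass + 1 < n; ++pass) {
        int swapped = 0;
        for (size_t i = 0; i + 1 < n - pass; ++i) {
            if (v[i] > v[i + 1]) {
                uint16_t tmp = v[i];
                v[i] = v[i + 1];
                v[i + 1] = tmp;
                swapped = 1;
            }
        }
        if (!swapped) {
            break; /* already sorted, stop early */
        }
    }
}

/* Hypothetical helper: sorts the samples and returns the median.
 * Assumption: for an even count the lower middle element is returned. */
static uint16_t median_u16(uint16_t *v, size_t n)
{
    if (n == 0) {
        return 0;
    }
    bubble_sort_u16(v, n);
    return v[(n - 1) / 2];
}
```

In use, one array per calibration register would be filled over the twenty to forty runs; if `spread_u16` on the CLx0 samples exceeds three, `median_u16` of each array gives the value to store in nonvolatile memory.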

Hi, I have a project created with Processor Expert and CodeWarrior 10.2 for the TWR-K20 demo kit. Because I have some problems using the Processor Expert USB HID Keyboard Host of the USB stack 4.1.1, I need to add the non-PE USB HID Keyboard Host to the project instead. Can anyone tell me how to do it? I would very much appreciate a simple 'PE' example project with the non-PE USB HID keyboard host stack added. Thank you! Stanley

Kinetis Microcontrollers Knowledge Base

A Brief Look at IP Protection Methods: Securing the Kinetis K60 (Scheme 2)

Freescale Kinetis K70 MCU: FPU Demo

Two Tips for Debugging the KL25's I2C Module

UART Emulation Using the FTM or TPM

Software UART (using the FTM or TPM)

How to use the Kinetis LTC ECC HW to accelerate Curve25519

How to byte program SPI flash via QSPI

Freedom K20 example code

16-bit SAR ADC calibration

USB HID Keyboard Host of the USB stack 4.1.1

KE02_I2C_Slave_PE_LDD.zip

Interrupts appear to be disabled when debugging.pdf