LPC865 DMA Bus access cycles including Descriptor fetch


shoichi_kojima
Contributor I

Dear support staff,

I'm trying to use DMA channels on the LPC865 to generate contiguous SPI master Tx traffic with chained descriptors, plus some other DMA channels that make periodic write accesses to other peripherals using self-chained descriptors. These DMA channels run endlessly WITHOUT ANY CPU assistance.

I've been running the same kind of DMA scenarios on the LPC845, whose maximum bus frequency is limited to 30 MHz. On the LPC865, I expected the improved maximum bus frequency of 60 MHz to help. To get better descriptor fetch/execution performance from the DMAC, all descriptors are placed in SRAM (expected to be zero-wait), not in flash ROM (which needs three AHB cycles at 60 MHz).

Despite the bus frequency improvement on the LPC865, I could not get the bus performance I expected.

Have there been any changes between the DMA bus master and the AHB (/APB) slaves it works with, especially on the path to SRAM?

Or is there any document that explains in detail the bus access cycles for descriptor fetches and for the resulting DMA transfers (source and destination)?

 

Alice_Yang
NXP TechSupport

Hello @shoichi_kojima 

What about the frequency of the SPI?

The main consideration is the SPI, because DMA is much faster than SPI.

 

BR

Alice

shoichi_kojima
Contributor I

Thank you for your response.

>What about the frequency of SPI?

The SPI Tx rate would be 13.3 Mbps. To reduce the AHB bus access load, the SPI frame length is 16 bits.

>The main consideration is SPI.

>Because DMA is much faster than SPI.

Yes, I expect so.

I have already achieved that SPI Tx rate and frame format on the LPC845, whose maximum clock is 30 MHz. The LPC845 provides two separate SRAM slaves of 8 KB each.

So, on the LPC845, I placed most of the DMA-related memory resources in SRAM1: the descriptor table for the DMAC itself and the linked descriptor images that form the ping-pong DMA scenario, which are also fetched by the DMAC.

Only the SPI Tx payload data is located in the other bank, SRAM0, which is also used by the MTB when it is enabled.

 

Frankly, my real target is I3C, not SPI.

But at the moment I cannot obtain suitable I3C external devices.

When I3C is actually used, the I3C slave device must at least respond with ACK on the bus, which is a little complicated for the early stages of an I3C system design. I3C is expected to run at about 12.5 Mbps.

So, for this early stage, I'm using SPI Tx-only instead, to generate I/O traffic similar to I3C, because I do not need to care about ACK/NAK responses from imaginary target devices connected to the LPC865.

shoichi_kojima
Contributor I

Hi, Alice-san,

I think I should give more detailed information about the clocks and about how the DMAC is used.

The DMA scenario for SPI Tx is as follows:

The transfer rate of 13.3 Mbps is achieved by feeding the FRG clock to the SPI.

The PLL is used to supply the 60 MHz HCLK for the CPU, DMAC and bus matrix.

I set the PLL VCO frequency to 240 MHz, and the 1/4-divided output is fed to the AHB domain.

For connectivity peripherals such as SPI (and I3C in the near future), FRG1 is used. Here too the PLL output is the source, taken from the 120 MHz branch it shares with the AHB clock path; this branch is named "sys_pll0_clk" in Fig 7 of user manual UM11607. The LPC845 has the same path.

In the LPC865, a new divider named "PLL Divider" is implemented in the path to the FRGs.

To stay close to the LPC845 implementation, the divider ratio is set to 1/1 so that "pll_div_clk" feeds the FRGs at 120 MHz.

To get an SPI transfer clock of 13.3 MHz, FRG1 is used and the divider inside the SPI is not used ("DIV.DIVVAL" is 0).
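As a quick sanity check of the clock figures above, the arithmetic can be sketched in Python. The variable names are illustrative only (not register or SDK names), and two points are assumptions: that the 120 MHz branch is the VCO divided by 2, and that the overall ratio from that branch to SCK is 9 (120 MHz / 9 is about 13.33 MHz).

```python
# Clock tree arithmetic from the figures quoted in this post.
# Variable names are illustrative only; they are not register or SDK names.
PLL_VCO_HZ = 240_000_000

ahb_hz = PLL_VCO_HZ // 4          # 1/4 of the VCO clocks the CPU, DMAC and bus matrix
frg_input_hz = PLL_VCO_HZ // 2    # assumed: the 120 MHz branch is VCO / 2
pll_divider = 1                   # "PLL Divider" set to 1/1, as described above

TOTAL_DIVIDE = 9                  # assumed overall ratio: 120 MHz / 9 = 13.33 MHz
spi_bit_rate_hz = frg_input_hz / pll_divider / TOTAL_DIVIDE

print(f"AHB clock:    {ahb_hz / 1e6:.1f} MHz")
print(f"FRG input:    {frg_input_hz / pll_divider / 1e6:.1f} MHz")
print(f"SPI bit rate: {spi_bit_rate_hz / 1e6:.2f} Mbps")
```

The sketch says nothing about which block (PLL Divider, FRG or SPI DIVVAL) should perform the division; it only confirms that the quoted rates are mutually consistent.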

Clock source selection will probably become an issue in my later trials with I3C.

In the LPC865, "I3CFCLKSEL" offers a choice among FRO, main clock, FRG0 clock, FRG1 clock, etc., if section 8.6.26 of UM11607 is correct.

But one thing that worries me is "Fig 8. Clock generation", which shows only two choices, "fro" and "external_clk". (I need confirmation of which one is correct.)

Even in the worst case, where I cannot get clear information, I guess I can choose the FRO.

So, in my current implementation, I set the FRO frequency to 60 MHz to have enough divider settings to choose from.

Anyway, I should now describe the ping-pong DMA scenario in a little more detail.

I use two descriptors cross-linked to each other as an endless loop with an asymmetric number of transfers. Both perform 16-bit-wide single transfers, each invoked by an SPI Tx request. One is 16 halfwords long and the other is 9 halfwords long.

The SPI frame format has no intentional delay.

The DMAC makes a descriptor fetch access at the end of each described number of transfers.
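The cross-linked descriptor loop described above can be pictured with a small host-side toy model in Python. This is only a sketch of the linking logic: `Descriptor` and `run_chain` are hypothetical names, and nothing here reflects the real descriptor memory layout or register fields of the DMAC.

```python
from dataclasses import dataclass
from itertools import islice

@dataclass
class Descriptor:
    name: str
    transfers: int    # number of 16-bit single transfers before the link is followed
    next_index: int   # index of the linked descriptor in the list

def run_chain(descriptors, start=0):
    """Yield (name, transfer number, fetch_needed) for each SPI Tx request."""
    index = start
    while True:
        d = descriptors[index]
        for n in range(d.transfers):
            # The linked descriptor has to be fetched once this one is exhausted.
            yield d.name, n, n == d.transfers - 1
        index = d.next_index

# Ping and pong link to each other, giving an endless loop.
chain = [Descriptor("ping", 16, 1), Descriptor("pong", 9, 0)]
events = list(islice(run_chain(chain), 50))
fetches = sum(1 for _, _, fetch in events if fetch)
print(f"{len(events)} requests, {fetches} descriptor fetches")
```

A self-linked descriptor is the same model with a single entry whose `next_index` points to itself.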

 

For a more precise analysis, I should mention one additional DMA channel.

As you may remember, I asked about using the FTM in place of the SCT.

I need at least one PWM-capable timer peripheral, preferably with a comparator, that is completely independent of the SPI/I3C connectivity function. I need two PWM outputs with cascaded periods for measurement and for scheduling data transfers.

With the SCT, I could do that using the H/L-combined/cascaded setting of one SCT, leaving the other SCT almost completely free. But on the LPC865, the FTM does not offer H/L-combined/cascaded usage within a single FTM instance.

So I decided to try a bit-banging-by-DMA style of implementation.

One FTM works simply as the PWM function with the shorter/upper period, and its INIT trigger event is used as the request/trigger for another DMA channel with a self-linked descriptor, giving endless operation.

The source is an H/L-flipping image, and the destination would be the GPIO peripheral of the LPC865. As you know, however, the "GPIO" peripheral is located OUTSIDE the AHB, where the DMAC can NOT make write accesses; only the CPU can.

So I had to find a reasonable alternative. I now use the FTM's SWOCTRL register as the DMA destination to flip a pin between H and L according to a preprogrammed sequence stored as a data table in flash ROM. FTM1's INIT_TRIG is used as the request trigger for that DMA channel. A somewhat painful and tricky implementation, you may say...

As my first test of DMA with a self-linked descriptor, I ran this and confirmed the periodic H/L flip on an MCU pin, as expected. It seemed to be a success.

Next, I added another DMA channel for SPI Tx. The DMA channel priorities of the two are intentionally set differently.

The SPI Tx channel has lower priority than the channel for the tricky H/L toggle triggered by FTM1's INIT_TRIG.

Although the SPI in the LPC8xx does NOT have a dedicated FIFO of any depth, the Tx data buffer just behind the Tx shift register can at least hold one Tx halfword, which is 16 bit times long. Even if SOME other DMA channel with higher priority is triggered, the Tx signal observed on MOSI should hardly be disturbed.

(At that step, I noticed that the number of available priority bits is reduced to two in the LPC865, while the LPC845's DMAC provides three. So I adjusted the priorities to fit the 2-bit range.)

 

At first glance, the result of using these two DMA channels was that the H/L flip stopped and no longer worked. I checked it from several viewpoints:

Request/trigger assignment, AHB bus traffic and occupancy/conflicts, not only for source and destination accesses but also around descriptor fetches, etc.

But I got no idea...

 

A few days later, without any particular intention, I manually set a non-zero DIVVAL for that SPI through the debugger's peripheral window while the MCU and its DMAC were running, and I happened to see the H/L flip on the fake PWM output. (In the corresponding page of UM11607, Table 288 for example, there is no explanation of the case where the clock source is something other than PCLK. But I know the same is true of many predecessors' manuals, such as UM11029 for the LPC845, so it is no surprise.)

The DIVVAL that allows H/L flips through SWOCTRL is fairly large, such as 0x40 or 0x50, and the larger the DIVVAL, the closer the H/L flip period comes to the expected value.

This observed change in behavior might be an important hint, I expect.

 

One additional thing I should correct from my previous report about the similar implementation on the LPC845, though it is not a big thing:

I had also placed the data to be sent on MOSI in the same SRAM bank.

So almost all the data memory is allocated in a single SRAM slave. Even with that concentrated placement, no strange behavior was observed on the LPC845, even though the bus clock is 30 MHz.

This concentration is practically the same on the LPC865, which provides only a single SRAM slave. I expected better system behavior from the improved bus clock frequency of 60 MHz on the LPC865.

Any suggestions for analysis?

Thanks

 

shoichi_kojima
Contributor I

Sorry, I forgot one thing about the timing relation between the FTM and the SPI ping-pong.

The cycle period set for the FTM is exactly the same as that of the 16-halfword 'Ping' portion plus the 9-halfword 'Pong' portion.

So the LPC865 DMA is required to fetch a linked next descriptor three times in total within the 352-bit-time period of the SPI Tx transfer running at 13.3 Mbps (= 120 MHz / 9).
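The period figures can be checked with a few lines of Python. This is a sketch using only the numbers quoted in the thread; the names are illustrative. Note that 352 bit times equals 22 frames of 16 bits, which matches the 16 + 6 split mentioned later in the thread, whereas a 16 + 9 split would come to 400 bit times.

```python
BITS_PER_FRAME = 16
SPI_BIT_RATE_HZ = 120_000_000 / 9   # 13.33 Mbps, as quoted above
AHB_HZ = 60_000_000

def loop_bit_times(ping_frames, pong_frames):
    """Length of one ping + pong pass in SPI bit times."""
    return (ping_frames + pong_frames) * BITS_PER_FRAME

period_bits = 352                               # figure quoted in the post
period_s = period_bits / SPI_BIT_RATE_HZ        # about 26.4 us
period_ahb_cycles = period_s * AHB_HZ           # about 1584 AHB cycles

print(loop_bit_times(16, 6), loop_bit_times(16, 9))
print(f"{period_s * 1e6:.1f} us, {period_ahb_cycles:.0f} AHB cycles per FTM period")
```

So the three descriptor fetches mentioned above have to fit, together with all the data transfers, into roughly 1584 AHB cycles per FTM period.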

shoichi_kojima
Contributor I

Hi, Alice-san,

It has been almost a month with no reply from the NXP TechSupport staff...

I have made a little progress, but my questions and problems around DMA usage in the LPC865 are not yet solved.

In the following report, I describe MY understanding of, and comments on, the DMAC in the LPC8xx series. If I'm wrong, any corrections and comments are very welcome.

 

Before describing the DMA and bus performance issues: I found some issues in another area of the LPC865.

○ SPI transfer clock

To get 13.3 Mbps (a little faster than I3C) from SPI Tx, I tried setting the divider elements that feed the clock to the SPI. 13.3 MHz is 120 MHz / 9, which I had also done on the LPC845.

When I watched SPI1.SCK, I found a very narrow SCK clock pulse on the LPC865.

I also found that the contiguous SPI Tx (MOSI output) driven by DMA was occasionally missing.

At first, I suspected that some change in DMA management conditions in the LPC865 caused this.

But I found another cause, seen only on the LPC865 and not on the LPC845.

The following is a close-up of the LPC865's SPI clock source path.

The "PLL Divider" is newly placed in the path before the FRG.

[Image: shoichi_kojima_0-1715134277860.png]

I initially set SYSPLLDIV to just 1/1, because this element does not exist on the LPC845, which was my reference implementation. There, the 1/9 division was done by the FRG, and it worked on the LPC845.

But when I saw the narrow-duty SPI1.SCK waveform on the LPC865, I tried using this PLL Divider to get 1/9 of the PLL output.

When I set 1/3 in the PLL Divider and another 1/3 division inside the FRG, the SPI MOSI and SCK outputs ran seamlessly and contiguously, as I expected. DMA is used for SPI Tx with the endless ping-pong loop scenario.

So this would be one of the hidden reasons why I could NOT get 13.3 Mbps SPI Tx.

I guess the several small changes in the LPC865 are meaningful for SOME reasons. But at the same time, SOME other changes considered "risk-free" might have unexpectedly bad effects...

After the above changes, I could confirm that SPI Tx driven seamlessly by the DMAC (and by it alone) really works.

As the next step, I tried adding other DMA events on other DMA channels.

As one of them, I have been trying the MUXed trigger event from the FTM.

To get cascaded PWM timer behavior using only a single FTM, another DMA channel, triggered by the FTM's INIT_TRIG, updates the FTM's SWOCTRL. The update sequence is a pre-coded table in flash ROM, and a self-linked descriptor gives endless operation.

The destination SWOCTRL register in the FTM is NOT on the APB, where the SPI is located, but on the AHB (then via the AIPS-Lite bridge). By using the SWOCTRL register, an unused FTM output can be used like a DMA-driven GPIO.

I set the FTM's PWM cycle (through its MOD register) to 22 * 16 SPI bit times.

One PWM output is generated by the FTM itself, and the other, virtually cascaded PWM output waveform is generated by the update sequence written to its SWOCTRL register.

The SPI is set to 16-bit framing with RXIGNORE for Tx-only operation. The endless circular ping-pong scenario uses a cross-linked descriptor pair with asymmetric lengths, 16 Tx and 6 Tx, so that by-Ping and by-Pong can easily be distinguished in the MOSI waveform alongside the FTM-related PWM output signals.

With CNTIN.INIT set to 0, FTMx_INIT_ITRIG is connected through DMA_ITRIG_INMUX[n] to a free DMA channel n with no dedicated peripheral hardware request enabled, only the hardware trigger from DMA_ITRIG_MUX[n].

 

Let me confirm the triggers/requests to the DMAC (common to the LPC8xx series).

One is the hardware REQUEST from SPI Tx to its dedicated DMA channel.

(A "REQUEST" is basically level-sensitive; the acknowledge signal from the DMAC may clear it internally.)

The other is a TRIG event signal from the FTM.

A TRIG is basically edge-sensitive; no acknowledge from the DMAC is required.

The user manuals published by NXP sometimes do not explain these clearly; they are a little fuzzy and confusing...

If my understanding is wrong, please correct me. --> To the NXP TechSupport staff.

Anyway, I then tried using two DMA channels: one driven by REQUEST and the other by TRIG.

Regarding how often they occur, 22 REQUESTs from SPI Tx occur during one TRIG interval from the FTM.

When multiple DMA channels (requests/triggers) are used, the priority of each channel has to be set carefully relative to the others.

 

Here I would like to ask NXP TechSupport for confirmation.

To proceed with the implementation design above, the explanation of the "TRIGTYPE" field in the DMA channel configuration should be read carefully.

But as far as I have tested on the LPC8xx series DMAC, the explanation given there seems to be the opposite of the actual behavior...

When I use a timer-related event (essentially an edge-trigger signal in nature), I need to set that field to "1" to get the expected behavior on the LPC8xx. On the other hand, when I use a peripheral request, I need to set it to "0".

I have believed that this issue is already well known among LPC8xx customers... Am I wrong?

 

Going back to the issue of DMA usage with the FTM and SPI...

The occurrence ratio is 1:22.

The SPI Tx has a one-frame-deep buffer in front of the shift register inside the SPI peripheral. Since I set 16-bit framing for the SPI and the LPC865 system runs with a 60 MHz core/AHB clock, this buffer depth in time is (1/13.3 Mbps) * 16 --> 1.2 us --> 73.4 AHB cycles.
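The per-frame budget can be recomputed as follows. This is a small Python sketch with illustrative names, using only the rates quoted above. With the exact 120 MHz / 9 bit rate it gives 72 AHB cycles, and with the rounded 13.3 Mbps about 72.2, so the budget is close to, but slightly below, the 73.4 quoted above.

```python
AHB_HZ = 60_000_000
BITS_PER_FRAME = 16

def frame_budget_cycles(spi_bit_rate_hz):
    """AHB cycles that elapse while one buffered 16-bit frame is shifted out."""
    return BITS_PER_FRAME / spi_bit_rate_hz * AHB_HZ

exact = frame_budget_cycles(120_000_000 / 9)   # exact 13.33 Mbps rate
rounded = frame_budget_cycles(13.3e6)          # rounded rate used in the post
per_transfer = exact / 2                       # two single transfers per frame time

print(f"exact: {exact:.1f}, rounded: {rounded:.2f}, per transfer: {per_transfer:.1f}")
```

If two single DMA transfers must complete inside one frame time, each has about 36 AHB cycles on average, which still has to cover any descriptor fetch that falls in the same window.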

Here I need the precise number of cycles that one DMA transfer consumes.

Within this 73-AHB-cycle period, I need to complete at least two DMA transfers; each of them is a single transfer, not a burst.

Regarding priority, if I expect low jitter on the pseudo-cascaded PWM produced by DMA writes to SWOCTRL triggered from the FTM, I think the priority of that DMA channel must be higher than that of the DMA channel serving SPI Tx.

 

I do NOT think that 73 AHB cycles is a hopeless budget for the activity of two DMA channels.

But unfortunately, I could not get an acceptable result...

So, what was observed?

 

The SPI Tx MOSI and SCK signals are good, as expected: no extra slack time, sufficiently small jitter, etc. When I observe them together with the PWM generated purely by the FTM, the timing relation among them is stable and constant.

But the SWOCTRL-driven pseudo PWM output produced by the other DMA channel is sometimes missing, or almost no signal flip is observed on the pin...

Even though the DMA priority of that channel is set higher...

I remember that the priority field in the LPC845 is three bits long, not two bits as in the LPC865.

So I guess some modifications were also made around or inside the DMAC of the LPC865, including its priority arbitration logic...

 

Additionally, to get hints about the cause of this behavior, I intentionally set a non-zero DIVVAL in the SPI peripheral. Doing this makes the DMA requests from SPI Tx occur less frequently.

The results are...

When I set DIVVAL larger than 0x15, I could observe a PWM wave whose frequency is closer to the expected one.

But if DIVVAL is equal to or less than 0x15, the pseudo PWM wave is no longer observed.

From these results, I guess that if the trigger timing from the FTM conflicts with the request level pulse from SPI Tx, the FTM trigger is completely ignored. The more often they conflict, the longer the interval seen on the pseudo PWM output signal.

I consider these results very serious at this moment.

ITRIG-type event signaling MAY always be ignored/lost in favor of dedicated REQUESTs from pre-wired peripherals.

 

Probably something is wrong in my understanding or in my trial methods and conditions.

If anyone on the TechSupport team finds something, please correct me.

 

Regards,

 

shoichi_kojima
Contributor I

Dear TechSupport staff,

 

How do you analyze my last report?

The value 0x15 (22 - 1 in decimal) for SPI.DIVVAL seems to be a kind of watershed...

With that setting, the level-sensitive, hard-wired REQUEST source occurs at about the same frequency as the ITRIG event routed through DMAIMUX to another DMA channel whose priority is set higher. If this relationship crosses the watershed, the ITRIG-driven DMA request is NEVER taken...

In other words, REQUESTs and ITRIGs that occur at the same interval can NOT be handled by the LPC865's DMAC channels, regardless of their priority difference.
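Why 0x15 lines up as the watershed can be seen with a little arithmetic, sketched here in Python. The sketch rests on two assumptions: that DIVVAL is minus-one encoded (the SPI clock is divided by DIVVAL + 1, which should be checked against the DIV register description in UM11607), and that the FTM period stays at the 22 * 16 undivided SPI bit times given in the earlier post. Names are illustrative.

```python
BITS_PER_FRAME = 16
FTM_PERIOD_BITS = 22 * 16   # FTM trigger interval, in undivided SPI bit times

def request_interval_bits(divval):
    """SPI Tx request interval in undivided bit times (assumes divide by DIVVAL + 1)."""
    return BITS_PER_FRAME * (divval + 1)

for divval in (0x00, 0x14, 0x15, 0x16, 0x40):
    interval = request_interval_bits(divval)
    per_trigger = FTM_PERIOD_BITS / interval
    print(f"DIVVAL=0x{divval:02X}: request every {interval} bit times, "
          f"{per_trigger:.2f} requests per FTM trigger")
```

Under these assumptions, DIVVAL = 0x15 is exactly the point where one SPI Tx request arrives per FTM trigger; below it the SPI requests are more frequent than the trigger, above it they are less frequent.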

This is VERY SERIOUS for an MCU product that advertises a DMAC with multiple DMA channels in its headline specifications.

 

I still have questions and need information about the ACTUAL number of bus clocks that one DMA transfer consumes on the LPC865.

But I have gradually realized that there is something MORE CRITICAL in the LPC865 behind the problem I'm facing.

Time-triggered DMA is a very fundamental function that an MCU with a DMAC should support. On the LPC845, I confirmed that it really works using the SCTimer and SPI Tx.

But on the LPC865, the FTM seems to be the only reasonable alternative to the SCTimer as a time trigger generator. The FTM seems to have been transplanted from Freescale resources, not NXP's. Even if the FTM works fine by ITSELF, it may NOT always work as a timing generator that forms part of the MCU's elements...

I could not find good examples demonstrating DMA channels triggered by events from the FTM in the currently provided NXP MCUXpresso SDK.

If those combined use cases have really been tested, I believe some examples must exist, at least in your lab.

Such examples would be much better suited to me and my case.

 

BR,
