IMX53 on recent 4.4.x kernels.



Noel_V
Contributor III

Hi all,

I've been facing a problem for several weeks now.

I have ported a recent kernel (4.4.x branch) to one of our custom ARM (i.MX53-based) boards.

This board has been running kernel 2.6.35.x for some years, and we decided we needed a newer kernel (for Wi-Fi operation). So far, so good.

When running a new kernel, 4.4.61 (or any newer 4.4.x), I see a much higher CPU load than with the old 2.6.35 kernel when running the same applications.

Comparing systems is difficult, I know, but the only thing that has changed is the kernel: same hardware, same GCC compiler, same C library, same user-space applications. Both boards I compare are identical, running at the same clock speed with the same amount of RAM and storage. I am 100% sure of this.

When comparing 2.6.35.x with 4.4.x, I see that a large fraction of the CPU time is consumed as system time (according to top).

I know that 2.6.35 -> 4.4.x is a big step; there is no discussion about that, and I want to understand what is going on.

But I did not expect the same user applications (for example top/htop, or my custom application) to use that much CPU on exactly the same hardware configuration.

{I did not recompile any application; the only thing that has been upgraded is the kernel: same compiler, same C library, etc.}

The reaction time of my custom app is very bad when I compare 2.6.35 with 4.4.x, even with (approximately) the same kernel configuration.

I compare the reaction speed of both kernels with the same user application, and I can't understand why an empty/clean Linux system (custom app removed) just running top (or htop) on an ARM CPU at 1 GHz shows 31% CPU load on a 4.4 kernel and 1% on a 2.6.35 kernel. {I show the top/htop results here because everyone knows these tools, but the real issue I'm fighting is that everything responds very badly on these 4.4.x kernels!}

 

Whatever you do on a 4.4.x kernel, it quickly uses a lot of CPU, and because of this high CPU load all applications (even basic system tools like top/htop/watch) do NOT run smoothly!

Both systems I compare have been stripped down to the bare minimum required.

(I could send you a screenshot of both if you want: exactly the same processes are running, same builds, same versions, built with the same compiler, running on the same hardware. The single change is the kernel version, 2.6.35 vs 4.4.x!)

I have exactly the same number of tasks/processes (in htop) and exactly the same process layout on both.

On the 4.4.x kernel I see a high CPU load, mostly "system" load (i.e. time spent in the kernel); on the 2.6.35 kernel the CPU load is far lower, running exactly the same applications.

On the 4.4.x system top/htop show up to 31% system load, while on 2.6.35 the load is 1%, with the same processes running!

That is 31% of a 1 GHz CPU spent on a nearly empty system, while the same hardware with a 2.6.35 kernel (identical setup, same process tree) shows about 1% CPU load.
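For reference, top and htop derive their "sys" figure roughly like this: they sample the cumulative per-CPU jiffy counters in /proc/stat twice and divide the growth of the system field by the growth of all fields. The sketch below reproduces that calculation (it is not the actual top/htop source), so the sys share can be logged on both kernels without running top itself, which on the 4.4.x board was itself showing 23 to 33% CPU. It reads only the first seven fields of the aggregate cpu line; the struct and function names are illustrative.

```c
#include <assert.h>
#include <stdio.h>

/* Cumulative jiffy counters from the aggregate "cpu" line of /proc/stat.
 * Only the first seven fields are read; later fields (steal, guest, ...)
 * are ignored in this sketch. */
struct cpu_sample {
    unsigned long long user, nice, system, idle, iowait, irq, softirq;
};

/* Parse a "cpu ..." line. Returns 0 on success, -1 on a malformed line. */
int parse_cpu_line(const char *line, struct cpu_sample *s)
{
    assert(line != NULL && s != NULL);
    int n = sscanf(line, "cpu %llu %llu %llu %llu %llu %llu %llu",
                   &s->user, &s->nice, &s->system, &s->idle,
                   &s->iowait, &s->irq, &s->softirq);
    return n == 7 ? 0 : -1;
}

/* Sum of all counted jiffies in one sample. */
unsigned long long cpu_total(const struct cpu_sample *s)
{
    return s->user + s->nice + s->system + s->idle +
           s->iowait + s->irq + s->softirq;
}

/* Share of elapsed jiffies spent in the kernel ("sys") between two samples. */
double sys_percent(const struct cpu_sample *before, const struct cpu_sample *after)
{
    unsigned long long dt = cpu_total(after) - cpu_total(before);
    if (dt == 0)
        return 0.0;
    return 100.0 * (double)(after->system - before->system) / (double)dt;
}

/* Read the current aggregate counters from /proc/stat (Linux only). */
int read_cpu_sample(struct cpu_sample *s)
{
    char line[256];
    FILE *f = fopen("/proc/stat", "r");
    if (f == NULL)
        return -1;
    int rc = (fgets(line, sizeof line, f) != NULL) ? parse_cpu_line(line, s) : -1;
    fclose(f);
    return rc;
}
```

Typical use: call read_cpu_sample(), sleep(1), call it again and print sys_percent() of the two samples.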

 

 

The difference between the two kernels is so large that you can feel it in almost every action: display drawing, reaction to key input, etc.

(Since I cannot include pictures here, I've included some live textual output.)

Look at the results below.

Best Regards

Any hints? I've been digging for a long time, and whatever I try, the CPU load stays high!

 

 

 

====================================4.4.x=========================

 

Kernel 4.4.x

 

uname -a

Linux DU11 4.4.73 #3 PREEMPT Tue Jul 4 09:06:46

#top

Mem: 176076K used, 844800K free, 0K shrd, 44K buff, 44K cached
CPU: 0% usr 31% sys 0% nic 68% idle 0% io 0% irq 0% sirq
Load average: 1.31 1.41 1.43 1/85 12125
PID PPID USER STAT VSZ %VSZ %CPU COMMAND
12125 21246 root R 1184 0% 23% top
12142 1 root S 3728 0% 0% /usr/sbin/mosquitto -c /etc/mosquitto/

# htop

CPU[##********** 40.0%] Tasks: 32, 0 thr; 1 running
Mem[|||||** 142/978MB] Load average: 7.20 7.18 7.12 Swp[ 0/0MB] Uptime: 16:22:48

PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
30886 root 20 0 1064 576 432 R 33.0 0.1 0:00.06 htop

 

====================================2.6.35=========================

 

 

Kernel 2.6.35

uname -a

Linux OLD 2.6.35.3 #1 PREEMPT Tue Jun 14 13:45:24

#top

Mem: 97064K used, 904376K free, 0K shrd, 0K buff, 46160K cached
CPU: 0% usr 0% sys 0% nic 99% idle 0% io 0% irq 0% sirq
Load average: 1.41 1.27 1.26 2/70 18808
PID PPID USER STAT VSZ %VSZ %CPU COMMAND
18696 30503 root R 1176 0% 0% top
842 1 root S 3928 0% 0% /usr/sbin/mosquitto -c /etc/mosquitto/

#htop

CPU[#* 1.5%] Tasks: 32, 0 thr; 1 running
Mem[||||** 126/996MB] Load average: 1.21 1.23 1.30 Swp[ 0/0MB] Uptime: 2 days, 00:05:53

PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
13431 root 20 0 1076 772 628 R 1.0 0.1 0:00.66 htop

Noel_V
Contributor III

Hello  Fabio,

Good news: Russell's recommendation (turning off LOCKDEP) did it.

Both kernels, the old 2.6.35 and the new 4.4.x, now run at the same speed.

Thanks for your assistance and feedback!

4.4.x kernel test results:

# /bin/testcode2
TestCode-1
Going to loop 20000000 times.
2438110-2420843 = >17267 ms
#
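A quick way to confirm that a kernel build does not carry this kind of debug overhead is to scan the .config it was built from. The sketch below is illustrative: LOCKDEP is the option named in this thread, while the other symbols in the list (CONFIG_PROVE_LOCKING, CONFIG_DEBUG_LOCK_ALLOC, CONFIG_LOCK_STAT) are real lock-debugging options added here as assumptions; the exact set Russell recommended removing is not listed in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `config` (the text of a kernel .config) contains the
 * exact line "<opt>=y", 0 otherwise. */
int config_enabled(const char *config, const char *opt)
{
    size_t len = strlen(opt);
    const char *p = config;
    while (p != NULL && *p != '\0') {
        if (strncmp(p, opt, len) == 0 && strncmp(p + len, "=y", 2) == 0 &&
            (p[len + 2] == '\n' || p[len + 2] == '\0'))
            return 1;
        p = strchr(p, '\n');  /* advance to the next line */
        if (p != NULL)
            p++;
    }
    return 0;
}

/* Assumed list of lock-debugging options; only LOCKDEP comes from the thread. */
static const char *const debug_opts[] = {
    "CONFIG_LOCKDEP", "CONFIG_PROVE_LOCKING",
    "CONFIG_DEBUG_LOCK_ALLOC", "CONFIG_LOCK_STAT",
};

/* Print every listed option that is enabled; return how many were found. */
int report_debug_options(const char *config)
{
    int found = 0;
    for (size_t i = 0; i < sizeof debug_opts / sizeof debug_opts[0]; i++) {
        if (config_enabled(config, debug_opts[i])) {
            printf("%s=y (debug feature, adds runtime overhead)\n", debug_opts[i]);
            found++;
        }
    }
    return found;
}
```

Feed report_debug_options() the contents of the .config used for the build; a non-zero result means at least one of the listed debug features is still compiled in.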

Best Regards

Noel

Noel_V
Contributor III

Hi Fabio,

Nice to hear. As you might expect, as soon as I read Russell's answer I was almost immediately convinced that his hints made sense (if those debug features were enabled, turning them off would be a huge performance gain).

I have not found time to experiment with it yet: it's a national holiday in Belgium and we are starting our annual holidays (we are closed for 3 weeks, and I'm out of the country for almost 2 weeks).

I'll definitely try it after the holidays and report back my findings.

NOTE: your seconds make sense too (I mentioned it before, but it is easily overlooked: we are running at 1 GHz).

This is why my measurements came out at 18 seconds (18 × 1000/800 ≈ 22), so our 2.6.35 loop times were exactly the same as yours, given that I was at 1 GHz and you were probably at 800 MHz.

But since you told me 4.4.x is not really supported by NXP (and is thus not a good choice), I might port all my patches to a newer kernel (and there are quite a few: it's a patch set of more than 30,000 lines for our hardware).

I don't know why, but when I started working on the port I asked on an i.MX forum whether to start with 4.4.x or 4.9 (the most recent kernel available when I started), and some people told me to go for 4.4.x.

But 4.4.x now seems to have been the wrong choice.

Whenever I have to make such a decision again, how do I choose a good kernel version? (There are so many, and some people recommend version xxx while others recommend version yyyy.)

Anyway, thank you for all the support so far. You can count on it: when I'm back at work, I'll report back to you.

Hopefully, I can bring good news.

Thanks

One minor (less important) question: do you know why the BogoMIPS value shown at startup is so different on 2.6 and 4.x kernels? (In my case, 2.6.35 shows 999 BogoMIPS and 4.4.x shows 66.)

fabio_estevam
NXP Employee

Hi Noel,

I am running MX53 at 1GHz too.

If you plan to use the mainline kernel, it would be better to spend some time upstreaming all the drivers you need to get your hardware supported. Submit the patches to the appropriate lists for review.

Once all the drivers are available in the mainline kernel, you can also submit your dts. This allows you to run future kernels very easily, and you will not have to worry again about porting your patches to kernel 4.x.y, because your hardware will be supported by the kernel.

About the BogoMIPS value: the lower number you see is normal. It is the result of using timer-based delays.

The commit that introduced this behavior is this one:

commit 1119c84aa3053bd415ba731ada1ecef24c8f82a2
Author: Sebastian Andrzej Siewior <bigeasy at linutronix.de>
Date: Wed Jan 22 12:35:44 2014 +0100

ARM: imx: enable delaytimer on the imx timer

The imx can support timer-based delays, so implement this.
Skips past jiffy calibration.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>

Please see this thread:
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/245630.html 
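To make the arithmetic concrete: with timer-based delays the ARM kernel sets loops_per_jiffy from the timer rate (lpj = timer_hz / HZ) instead of calibrating a busy loop, and the boot message prints lpj / (500000 / HZ) BogoMIPS, which reduces to timer_hz / 500000. The number therefore reflects the delay timer's clock rather than the 1 GHz CPU clock, and says nothing about performance. The sketch below reproduces the boot-log arithmetic; the 33.33 MHz timer rate and HZ=100 used in the example are assumptions for illustration, not values stated in the thread.

```c
#include <assert.h>
#include <stdio.h>

/* With a registered timer-based delay, loops_per_jiffy is derived from
 * the timer frequency instead of being calibrated with a busy loop. */
unsigned long timer_lpj(unsigned long timer_hz, unsigned long hz)
{
    assert(hz > 0);
    return timer_hz / hz;
}

/* Whole and hundredths parts of BogoMIPS as printed at boot:
 * lpj / (500000 / HZ) and (lpj / (5000 / HZ)) % 100 (assumes HZ <= 5000). */
unsigned long bogomips_whole(unsigned long lpj, unsigned long hz)
{
    return lpj / (500000 / hz);
}

unsigned long bogomips_frac(unsigned long lpj, unsigned long hz)
{
    return (lpj / (5000 / hz)) % 100;
}

void print_bogomips(unsigned long lpj, unsigned long hz)
{
    printf("%lu.%02lu BogoMIPS (lpj=%lu)\n",
           bogomips_whole(lpj, hz), bogomips_frac(lpj, hz), lpj);
}
```

With the assumed values, print_bogomips(timer_lpj(33333333UL, 100), 100) prints "66.66 BogoMIPS (lpj=333333)".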

Enjoy your holidays!
Noel_V
Contributor III

Hi Fabio,

Well, comparing your QSB (with all the applications you have running) with mine (running my rootfs) does not really make sense.

My system is not comparable with yours because I have other things running than you do.

Bottom line: you should compare YOUR QSB with the old 2.6.x kernel and an identical rootfs (let's call it your-rootfs) against YOUR QSB with a recent kernel and the same rootfs (your-rootfs).

If you do this, you will see that your new 4.x.x kernel is at least 3.3 times slower than the 2.6.35 kernel (or even slower).

When comparing systems (4.4.x vs 2.6.35) here on my desk, I make sure that both systems (old and new kernel) have the same applications running {I have exactly the same process tree/layout in both comparisons}.

When I strip the systems here (old 2.6.35 and new 4.4.x) down to the bare minimum (which I assume is what you have running), the test code takes about 40,000 ms on the new 4.4.x kernel (approximately your measured time), but less than 7,000 ms on the old 2.6.35 kernel. That is almost a factor of 6 (40,000 / 7,000 ≈ 5.7).

I have already reported the kernel speed issue to the kernel list (I tried to get the attention of the kernel developers), but it does not seem to be their concern; they say everything is working fine.

It is nearly impossible that nobody has noticed this before!

If you take it down to the real facts (numbers measured with test code of fewer than 10 lines of C; no rocket science!), I would need AT LEAST a quad core on 4.4.x kernels to get the same user-interface feel as with my old 2.6.35 kernel on a single-core i.MX53, with the SAME applications running (keeping in mind that I only changed the kernel!).

For 4.x kernels, I see that the results of the test code (the one you have been testing) are very unstable (they fluctuate a lot): as soon as there is even the smallest load on the kernel, they tend to increase a lot, while on the older 2.6.x kernel the kernel's own load has much less impact on the test-code results.
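Because the 4.x results fluctuate this much, a single run per kernel is easy to misread. One option (a suggestion, not something used in the thread) is to repeat each measurement several times per kernel and compare the minimum and median rather than one number. A minimal helper, with illustrative names:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

struct spread {
    long long min, median, max;
};

static int cmp_ll(const void *a, const void *b)
{
    long long x = *(const long long *)a;
    long long y = *(const long long *)b;
    return (x > y) - (x < y);
}

/* Sort the `n` run times (ms) in place and return min, median and max.
 * For an even n the lower of the two middle values is reported. */
struct spread run_spread(long long *ms, size_t n)
{
    assert(ms != NULL && n > 0);
    qsort(ms, n, sizeof ms[0], cmp_ll);
    struct spread s = { ms[0], ms[(n - 1) / 2], ms[n - 1] };
    return s;
}

void print_spread(const char *label, struct spread s)
{
    printf("%s: min %lld ms, median %lld ms, max %lld ms\n",
           label, s.min, s.median, s.max);
}
```

The input would be, for example, the elapsed times from ten runs of the test program on each kernel; a wide gap between min and max on 4.x would quantify the instability described above.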

I will try to report it once more; you never know. But it feels like talking to a concrete wall :-)

{I've spent many weeks on this issue, trying to optimize all kinds of things and trying to locate the bottleneck in the Linux kernel code, etc.}

{Ditto for the NAND flash issue I reported (you will find the NAND issue here on the forum too; you responded to it). I've reported that to the mtd list multiple times, without a single reaction :-(}

Best Regards

Noel

fabio_estevam
NXP Employee
NXP Employee

Hi Noel,

Thanks for starting the thread on the linux-arm-kernel list.

I implemented Russell King's suggestions to remove some debugging features from the defconfig, and now your application's execution time went from 41 to 22 seconds with kernel 4.13-rc1:

http://lists.infradead.org/pipermail/linux-arm-kernel/2017-July/520419.html 

I get the same 22 seconds with kernel 2.6.35.

fabio_estevam
NXP Employee
NXP Employee

Hi Noel,

I am trying to help you, but please keep in mind that NXP does not support kernel 4.4 on mx53; that's why I am asking you to report the problem on the linux-arm-kernel list.

When reporting to mailing lists, it is also a good idea to Cc the maintainers of the particular driver. For example, for the mx53 case you can get the list of names by running:

./scripts/get_maintainer.pl -f arch/arm/mach-imx/mach-imx53.c

For the NAND issue you reported earlier:

./scripts/get_maintainer.pl -f drivers/mtd/nand/mxc_nand.c

Hope this helps.

Noel_V
Contributor III

Hi Fabio,

I've double-checked my U-Boot version, and it already contains the L2-cache enable patch!

Use case 1 (simple test program: a simple C loop with a kernel call)

I just rebuilt the sample code (I had already scratched my old test, but since it is only a couple of C lines, it was restored quickly).

For this simple test program I get:

For 2.6.35:

# uname -a
Linux OLD 2.6.35.3 #1 PREEMPT Tue Jun 14 13:45:24 CEST 2016 armv7l GNU/Linux
# testcode2
TestCode-1
Going to loop 20000000 times.
319456382-319438304 = >18078 ms

For 4.4.x:

# uname -a
Linux DU11 4.4.76 #1 PREEMPT Fri Jul 14 08:19:47 CEST 2017 armv7l GNU/Linux
# testcode2
TestCode-1
Going to loop 20000000 times.
230307-169002 = >61305 ms

That is 3.39 times as long as on 2.6.35 (61305 / 18078), and this was a lucky shot; most of the time it is even slower.

The code behind the test is attached (main.c).

The C code is no rocket science (just a simple loop probing kernel time); it simply shows that the newer kernels (4.4.x) are VERY SLOW compared to 2.6.35.
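The attached main.c is not reproduced in this thread. The following is a hypothetical stand-in written only to match the posted output format ("Going to loop N times." followed by "end-start = >elapsed ms"). It assumes that the "kernel call" is getppid(), chosen here because each call is a real system call, and that the timestamps are milliseconds from CLOCK_MONOTONIC; both are guesses, not the original code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Current CLOCK_MONOTONIC time in milliseconds (assumed timestamp source). */
long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Hypothetical stand-in for testcode2: enter the kernel `loops` times and
 * print the elapsed time in the same format as the posted output. */
long long run_testcode(long loops)
{
    volatile pid_t sink = 0;  /* stores each result so the loop is kept */

    assert(loops > 0);
    printf("TestCode-1\nGoing to loop %ld times.\n", loops);
    long long start = now_ms();
    for (long i = 0; i < loops; i++)
        sink = getppid();  /* assumed "kernel call": one system call per pass */
    long long end = now_ms();
    (void)sink;
    printf("%lld-%lld = >%lld ms\n", end, start, end - start);
    return end - start;
}
```

Calling run_testcode(20000000L) from main() and building once lets the same binary run unchanged on both kernels, which is how the comparison in this thread was done.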

For use case 2, sharing the code won't help, because it is a huge program that requires very specific hardware (like an FPGA) and custom drivers that won't run on a standard QSB. (But believe me: if you can get the main.c test code for use case 1 as fast on a recent kernel as it is on 2.6.35, use case 2 will run much more smoothly too.)

For use case 3 (browser reaction):

I'm comparing the browser speed on an older 2.6.35.x kernel with the browser speed on a recent 4.4.x kernel; in this comparison NEITHER has graphics acceleration!

I know and understand that comparing systems is very difficult. This is why I'm running the SAME user applications (I did not even rebuild them); the one and only difference is the kernel. Bottom line, it all comes down to the kernel: the recent kernel itself consumes way more CPU power.

{The 4.4.x kernel is configured as similarly as possible to the 2.6.35 one. I have experimented with almost every kernel option I can think of, and it all behaves the same: no drastic improvement from changing 4.4.x kernel options!}

As said before, whatever you do on a 4.4.x kernel, it quickly uses a lot of CPU, and because of this high CPU load all applications (even basic system tools like top/htop/watch) do NOT run smoothly!

Best Regards

Noel

fabio_estevam
NXP Employee
NXP Employee

Hi Noel,

Just tried your code on 4.12.1:

# ./test
TestCode-1
Going to loop 20000000 times.
100346-59630 = >40716 ms

Better than your 4.4 result, but still worse than 2.6.35.

I would suggest reporting this issue to the linux-arm-kernel list.

Noel_V
Contributor III

Hi Fabio,

OK, let me try to explain some more.

For some unknown reason I get very high CPU loads on recent kernels (I've been looking into the kernel for many hours/days/weeks but can't find the cause).

I've been trying to locate the bottleneck where the big difference is, and the only thing I see is that for the same application (not even rebuilt), the system CPU load is very high on the new kernel.

As said before, new kernels seem to spend much more CPU in kernel time than the older 2.6.35 kernel.

(I have even checked almost every kernel option and experimented with high-resolution timers/HZ, etc.; nothing helps. New kernels seem to easily eat CPU power.)

Real-life situations:

For example: I made a simple test program that just queries the clock with clock_gettime(CLOCK_MONOTONIC, &tp) 10 million times (about 10 lines of C code, so it is not big).

Summary of this small test (10 million clock_gettime(CLOCK_MONOTONIC, &tp) calls):

2.6.35 kernel: 98 seconds, system load 40%

4.4.75 kernel: 280 seconds, system load 99%
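A sketch of the clock_gettime() test described above, extended (as an assumption, not part of the original test) to report the process's own user and system CPU time via getrusage(), so the user/system split can be read directly instead of from top. The struct and function names are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>

struct bench_result {
    double wall_s;  /* elapsed wall-clock time */
    double user_s;  /* CPU time spent in user space */
    double sys_s;   /* CPU time spent in the kernel on behalf of the process */
};

static double timeval_to_s(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Call clock_gettime(CLOCK_MONOTONIC) `calls` times and return wall time
 * plus the user/system CPU time this process accumulated meanwhile. */
struct bench_result bench_clock_gettime(long calls)
{
    struct bench_result r;
    struct rusage ru0, ru1;
    struct timespec t0, t1, tp;

    assert(calls > 0);
    getrusage(RUSAGE_SELF, &ru0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < calls; i++)
        clock_gettime(CLOCK_MONOTONIC, &tp);  /* the call being measured */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    getrusage(RUSAGE_SELF, &ru1);

    r.wall_s = (double)(t1.tv_sec - t0.tv_sec) + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    r.user_s = timeval_to_s(ru1.ru_utime) - timeval_to_s(ru0.ru_utime);
    r.sys_s = timeval_to_s(ru1.ru_stime) - timeval_to_s(ru0.ru_stime);
    return r;
}

void print_bench(long calls, struct bench_result r)
{
    double cpu = r.user_s + r.sys_s;
    printf("%ld calls: %.3f s wall, %.3f s user, %.3f s sys (%.0f%% of CPU time in kernel)\n",
           calls, r.wall_s, r.user_s, r.sys_s, cpu > 0.0 ? 100.0 * r.sys_s / cpu : 0.0);
}
```

Running print_bench(10000000L, bench_clock_gettime(10000000L)) on each kernel would show whether the extra time is spent in the kernel (sys) or in user space.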

Second example: I take our user application, let it run 100,000 main loops, and record the start and end time.

With an old (2.6.35) kernel, those 100,000 main loops take 31,000 ms to 34,000 ms.

With a recent (4.4.x) kernel they take 80,000 ms to 95,000 ms (almost three times as long for the same application)!

Summary:

2.6.35 - 100,000 main loops: 31,000 to 34,000 ms - total system load 33% (25% user, 8% system)

4.4.x - 100,000 main loops: 80,000 to 95,000 ms - total system load 93% (32% user, 61% system)

Third example: when I run a web browser, it runs smoothly on the 2.6.35 kernel; the same (100% identical) web browser on a 4.4.x kernel is slow, very slow. There are plenty of examples.

When I compare, I compare exactly the same systems: same hardware (same compilers, C libraries, etc.). The one and ONLY change is the kernel version.

{I did not rebuild the applications; I just put a new kernel on the system. That is the one and only difference.}

I see the same behavior on the QSB: recent kernels use lots of CPU compared to 2.6.35 (and it's not a little: a factor of 3, which is a lot; too much, if you ask me).

And even the smallest load on the system shows up in htop/top.

Did you try running top and htop simultaneously? You should definitely see a difference on the QSB.

(To compare easily, the best thing you can do is to set up two systems side by side and watch the difference live; you will see that recent kernels are very unresponsive compared to the older 2.6.35.)

Regards Noel

PS: I've included a screenshot to show you the difference (look carefully: same number of tasks running, 100% same builds; the only difference between both is the kernel version!)

fabio_estevam
NXP Employee

Hi Noel,

I remember getting slow performance when using the mainline kernel on mx53, and the reason was that the L2 cache was not enabled in mainline. Kernel 2.6.35 does enable the L2 cache.

After enabling the L2 cache in the bootloader, the performance greatly improved:

http://git.denx.de/?p=u-boot.git;a=commitdiff;h=4867b634b7c0e5ede258b4998fa4b2710e7daacf;hp=723ec69a... 

So make sure the L2 cache is enabled in your bootloader. Apply the patch above, or move to a recent U-Boot (2017.07) and test again on mx53qsb.
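If it is unclear whether the L2 cache is actually active at run time, one user-space check (a generic technique, not something proposed in the thread) is a pointer-chasing latency test: walk a randomly linked buffer and time each dependent load for growing buffer sizes. With L2 working there should be an intermediate latency step between buffers that fit in L1 and buffers that only fit in DRAM; with L2 disabled, latency should jump straight from L1 to DRAM levels. The 64-byte line size is an assumption, and the cache sizes to look for should be taken from the i.MX53 reference manual.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define LINE_BYTES 64  /* assumed cache-line size */

/* Average latency in ns of a dependent load when chasing pointers through
 * a buffer of `bytes` bytes in random line order; -1.0 if malloc fails. */
double chase_ns(size_t bytes, long steps)
{
    const size_t per = LINE_BYTES / sizeof(size_t);  /* size_t slots per line */
    size_t n = bytes / LINE_BYTES;                   /* number of lines */
    assert(n >= 2 && steps > 0);

    size_t *buf = malloc(n * LINE_BYTES);
    size_t *order = malloc(n * sizeof *order);
    if (buf == NULL || order == NULL) {
        free(buf);
        free(order);
        return -1.0;
    }

    /* Random visiting order (Fisher-Yates shuffle) defeats prefetching. */
    for (size_t i = 0; i < n; i++)
        order[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        size_t t = order[i];
        order[i] = order[j];
        order[j] = t;
    }
    /* Each visited line stores the slot index of the next line: one cycle. */
    for (size_t i = 0; i < n; i++)
        buf[order[i] * per] = order[(i + 1) % n] * per;

    size_t p = order[0] * per;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long s = 0; s < steps; s++)
        p = buf[p];  /* each load depends on the previous one */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    volatile size_t sink = p;  /* keep the chase from being optimised out */
    (void)sink;
    free(order);
    free(buf);
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}

/* Print latency for buffers from 4 KiB up to max_bytes, doubling each time. */
void sweep(size_t max_bytes, long steps)
{
    for (size_t b = 4096; b <= max_bytes; b *= 2)
        printf("%8zu KiB: %6.2f ns/load\n", b / 1024, chase_ns(b, steps));
}
```

For example, sweep(4 * 1024 * 1024, 2000000) on each board prints one latency per buffer size; comparing where the steps occur shows whether an L2 level is present.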

If this still does not help, please share the C code for use cases 1 and 2.

For 3, I think this is expected, because there is currently no GPU acceleration in the mainline kernel, so all graphics rendering in mainline is done by the CPU instead of the GPU.

fabio_estevam
NXP Employee

Hi Noel,

I have just tested kernel 4.12.1 + U-Boot 2017.05 on an imx53-qsb board, and htop reports 2%.

Regards,

Fabio Estevam
