Problem with DirectFB (Multi App Core) on Coldfire


AllenBlack
Contributor I
I'm seeing an issue when I try to run any DirectFB (Multi Application Core) application (with Fusion library and the linux-fusion kernel module) on a Coldfire MCF5484. The processor is on a custom board that is based on and similar to Freescale's M5485EVB board. This is the same board (and same project) that has been mentioned in many of jkimble's posts to this forum. We are using Freescale's Coldfire M547X_8X BSP (released 7/2008 with 2.6.25 kernel) with LTIB for our development.

NOTE: You may notice that the DirectFB version in the printout below is v1.4.0. Initially I was using v1.1.0 (the version that comes with the M547X_8X BSP's LTIB), and it behaved exactly the same as v1.4.0 in all respects. The version of the linux-fusion kernel module matters too: with DFB v1.1.0 I was using linux-fusion v7.0.1, and with DFB v1.4.0 I'm currently using linux-fusion v8.1.1 (these are both the most recent versions available). The linux-fusion kernel module is NOT normally included in the 2.6.25 kernel that comes with the M547X_8X BSP, so it was obtained directly from the directfb.org website and added manually to the kernel tree, which was then rebuilt by LTIB.

From the Fusion library source code and the debugging messages (shown below), it looks like a successful call to mmap() is followed by a segfault on the first access to the newly mapped shared memory area.

root@freescale:/usr/local/UCP# ./UCP_GUI --dfb:debug
[  NO NAME         0.000] (  383) Direct/Main:        direct_initialize() called...
[  NO NAME         0.013] (  383) Direct/Thread:          direct_thread_set_name( 'Main Thread' )
[  NO NAME         0.022] (  383) Direct/Thread:            -> attaching unknown thread 383
[  NO NAME         0.031] (  383) Direct/Mem:                   +   110 bytes [thread.c:369 in direct_thread_set_name()]
[  NO NAME         0.043] (  383) Direct/Mem:                   +    12 bytes [thread.c:386 in direct_thread_set_name()] -> 0x80379188 "             "
[Main Thread       0.059] (  383) Direct/Main:        ...initializing now.
[Main Thread       0.067] (  383) Direct/Signals:         Initializing...

   ~~~~~~~~~~~~~~~~~~~~~~~~~~| DirectFB 1.4.0 |~~~~~~~~~~~~~~~~~~~~~~~~~~
        (c) 2001-2009  The world wide DirectFB Open Source Community
        (c) 2000-2004  Convergence (integrated media) GmbH
      ----------------------------------------------------------------

[Main Thread       0.077] (  383) DirectFB/Core:      dfb_core_create...
[Main Thread       0.109] (  383) Direct/Main:            direct_initialize() called...
[Main Thread       0.118] (  383) Direct/Main:            ...2 references now.
(*) DirectFB/Core: Multi Application Core. (2009-06-30 19:16) [ DEBUG ][ TRACE ]
[Main Thread       0.136] (  383) Direct/Modules:         direct_modules_explore_directory( 'systems' )
[Main Thread       0.153] (  383) Direct/Mem:                   +    44 bytes [modules.c:229 in direct_modules_explore_directory()]
[Main Thread       0.166] (  383) Direct/Mem:                   +    21 bytes [modules.c:237 in direct_modules_explore_directory()] -> 0x8037a2b0 "  "
[Main Thread       0.203] (  383) Direct/Modules:             Loading '/usr/lib/directfb-1.4-0/systems/libdirectfb_fbdev.so'...
[Main Thread       0.241] (  383) Direct/Modules:                 Registering 'fbdev' ('systems')...
[Main Thread       0.252] (  383) Direct/Mem:                           +     6 bytes [modules.c:134 in direct_modules_register()] -> 0x8037a658 "   "
[Main Thread       0.268] (  383) Direct/Modules:                 ...registered.
[Main Thread       0.276] (  383) Direct/Mem:                   +    44 bytes [modules.c:229 in direct_modules_explore_directory()]
[Main Thread       0.289] (  383) Direct/Mem:                   +    22 bytes [modules.c:237 in direct_modules_explore_directory()] -> 0x8037a698 "  "
[Main Thread       0.306] (  383) Direct/Modules:             Loading '/usr/lib/directfb-1.4-0/systems/libdirectfb_devmem.so'...
[Main Thread       0.329] (  383) Direct/Modules:                 Registering 'devmem' ('systems')...
[Main Thread       0.339] (  383) Direct/Mem:                           +     7 bytes [modules.c:134 in direct_modules_register()] -> 0x8037aa40 "   "
[Main Thread       0.356] (  383) Direct/Modules:                 ...registered.
[Main Thread       0.367] (  383) Direct/Mem:               +    42 bytes [core.c:297 in dfb_core_create()]
[Main Thread       0.379] (  383) Direct/Mem:               +    24 bytes [thread.c:113 in direct_thread_add_init_handler()]
[Main Thread       0.392] (  383) Fusion/Main:            fusion_enter( 0, 45, 0x8037a9b4 )
(*) Fusion/SHM: Using MADV_REMOVE (2.6.25.0 >= 2.6.19.2)
[Main Thread       0.410] (  383) Direct/Main:                direct_initialize() called...
[Main Thread       0.419] (  383) Direct/Main:                ...3 references now.
[Main Thread       0.432] (  383) Fusion/Main:              -> Fusion ID 0x00000001
[Main Thread       0.443] (  383) Fusion/Main:              -> shared area at 0x20000000, size 2300
[  383:    0.453] --> Caught signal 11 (at 0x20000004, invalid permissions) <--
[  383: -STACK- ]
  #0  0x800e1eec in signal_handler () from /usr/lib/libdirect-1.4.so.0 [0x800d2000]
  #1  0x80102680 in fusion_enter () from /usr/lib/libfusion-1.4.so.0 [0x800f8000]
  #2  0x8019aa8c in dfb_core_create () from /usr/lib/libdirectfb-1.4.so.0 [0x80120000]
[Main Thread       0.642] (  383) Fusion/Main:                fusion_fork_handler_prepare()
[Main Thread       0.678] (  383) Fusion/Main:                fusion_fork_handler_parent()
[Main Thread       0.701] (  384) Fusion/Main:                fusion_fork_handler_child()
sh: line 1: nm: command not found
  #3  0x800014d8 in DirectFBCreate () from ./UCP_GUI [0x80000000]

Aborted


Below is the code segment from the file "fusion.c" in the Fusion library (a part of the DirectFB package, not to be confused with the linux-fusion kernel module). Judging by the debugging messages, the mmap() call succeeds. I believe the segfault occurs several lines below it, at the first attempt to write to the shared memory area. I'm guessing there may be an architecture-dependent virtual memory issue here, or I may be completely off base with that guess.

     if (id == FUSION_ID_MASTER) {
          int shared_fd;
         
          snprintf( buf, sizeof(buf), "%s/fusion.%d.core",
                    fusion_config->tmpfs ? : "/dev/shm", world_index );
          
          /* Open shared memory file. */        
          shared_fd = open( buf, O_RDWR | O_CREAT | O_TRUNC, 0660 );
          if (shared_fd < 0) {
               D_PERROR( "Fusion/Init: Couldn't open shared memory file!\n" );
               ret = DR_INIT;
               goto error;
          }

          if (fusion_config->shmfile_gid != (gid_t)-1) {
               if (fchown( shared_fd, -1, fusion_config->shmfile_gid ) != 0)
                    D_INFO( "Fusion/Init: Changing owner on %s failed... continuing on.\n", buf );
          }
        
          fchmod( shared_fd, 0660 );
          ftruncate( shared_fd, sizeof(FusionWorldShared) );
         
          /* Map shared area. */
          shared = mmap( (void*) 0x20000000 + 0x2000 * world_index, sizeof(FusionWorldShared),
                         PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, shared_fd, 0 );
          if (shared == MAP_FAILED) {
               D_PERROR( "Fusion/Init: Mapping shared area failed!\n" );
               close( shared_fd );
               ret = DR_INIT;
               goto error;
          }
         
          close( shared_fd );
         
          D_DEBUG_AT( Fusion_Main, "  -> shared area at %p, size %zu\n", shared, sizeof(FusionWorldShared) );
         
          /* Initialize reference counter. */
          shared->refs = 1;
         
          /* Set ABI version. */
          shared->world_abi = abi_version;


I should mention that Single Application Core DirectFB applications (both version 1.1.0 and 1.4.0) worked perfectly fine. It's only when we "./configure --enable-multi" and build DirectFB as a Multi Application Core that we are seeing problems.
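
One way to narrow this down outside DirectFB is a standalone probe that repeats the pattern from the fusion.c excerpt above: create a file, map it MAP_SHARED (optionally MAP_FIXED at a chosen address), then write to offset 4. The following is only a minimal sketch. The function name probe_shared_mapping and the scratch file path are hypothetical, and it maps a regular file, so it does not exercise any mmap handler inside the linux-fusion module.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of Fusion.
 * Maps `len` bytes of a scratch file at `path` as a shared, writable
 * mapping and writes to offset 4 (the offset of the faulting access).
 * addr != NULL: force the address with MAP_FIXED, as fusion.c does.
 *               MAP_FIXED silently replaces anything already mapped there.
 * addr == NULL: let the kernel choose the address.
 * Returns 0 if the write and read-back succeed, -1 on any failure.
 */
int probe_shared_mapping(const char *path, void *addr, size_t len)
{
    if (len < 2 * sizeof(unsigned int))
        return -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0660);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    if (ftruncate(fd, (off_t)len) != 0) {
        perror("ftruncate");
        close(fd);
        unlink(path);
        return -1;
    }

    int flags = MAP_SHARED | (addr ? MAP_FIXED : 0);
    void *base = mmap(addr, len, PROT_READ | PROT_WRITE, flags, fd, 0);
    if (base == MAP_FAILED) {
        perror("mmap");
        close(fd);
        unlink(path);
        return -1;
    }

    /* The mapping stays valid after the descriptor and name are gone. */
    close(fd);
    unlink(path);

    fprintf(stderr, "mapped at %p, touching offset 4\n", base);

    volatile unsigned int *words = base;
    words[1] = 1;                 /* same offset as the 0x20000004 fault */
    int ok = (words[1] == 1);

    munmap(base, len);
    return ok ? 0 : -1;
}
```

Calling it once with NULL and once with (void *) 0x20000000 and a length of 2300 separates two cases: if only the fixed-address call faults on the target, the address is suspect; if both fault, shared writable file mappings themselves are suspect; if neither faults, attention shifts to the Fusion device and kernel module.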

I have several questions:

1. - Has anyone ever used DirectFB in Multi Application Core mode on a Coldfire?
2. - Does anyone know the significance of address 0x20000000 in the mmap() call in "fusion.c"? And if this might collide with something architecturally significant in the Coldfire's VM implementation?
3. - And obviously, can anyone help provide some insight to what I've described above?

Thanks!
Allen
3 Replies

AllenBlack
Contributor I

NOTE: I'm responding to my own question to post direct responses that I received from Niels Roest, one of the developers at directfb.org. He declined to participate in the Freescale Coldfire forum but gave me permission to post his thoughts here:

 

 

/************* Niels Roest's response # 1 ****************/

 

Hi Allen,
just to give my thoughts on the matter.
Martin forwarded your mail, including the problem description at

http://forums.freescale.com/freescale/board/message?board.id=CFCOMM&thread.id=7192

Your analysis is correct; but I have no ready solution so we need to dig a bit.
in fusion.c, the offending "shared->refs = 1" explains the address of 0x20000004.
In answer to your question "Does anyone know the significance of address 0x20000000 in the mmap() call in fusion.c?": this is a reasonably arbitrary address, chosen to prevent the need to remap and recalculate addresses when passing them between processes. It has worked reasonably nicely in the past, only the sh4 chip and some mipses had a collision and were remapped, I believe, but only for the shared memory pools, not for this "fusion world" shared area of 2300 bytes.

What I would be interested in is your memory mapping right after the mmap.
If you put a sleep just before the write to "shared->refs", and dump /proc/<processnumber>/maps (or mmaps, or whatever), I am curious to see if there are any collisions at this address.

Greets
Niels

 

/************* My "follow-up" response # 1 ****************/

 

 Hi Niels,

Thanks very much for your prompt response! ...

I added a debugging printf() and a 30 second sleep just before "shared->refs = 1" in "fusion.c"... Here are my results while running "dfbinfo"...

 

[NOTE: not posted because it looks the same as the output in my initial question/posting in this forum]

 

 ... and the output from "ps" and "cat /proc/<pid>/maps" while "dfbinfo" was sleeping (just prior to the segfault):

root@freescale:~# ps
  PID  Uid     VmSize Stat Command
   1 root        840 S   init
   2 root            SW< [kthreadd]
   3 root            SW< [ksoftirqd/0]
   4 root            SW< [watchdog/0]
   5 root            SW< [events/0]
   6 root            SW< [khelper]
  36 root            SW< [kblockd/0]
  45 root            SW< [kseriod]
  63 root            SW  [pdflush]
  64 root            SW  [pdflush]
  65 root            SW< [kswapd0]
 119 root            SW< [aio/0]
 250 root            SW< [mtdblockd]
 258 root            SW< [spi_coldfire]
 271 root            SW< [rpciod/0]
 292 root        616 S   /sbin/syslogd
 295 root        592 S   /sbin/klogd
 317 root        712 S   /usr/sbin/inetd
 319 bin         440 S   /sbin/portmap
 323 root        688 S   /usr/sbin/dropbear
 329 root       1384 S   -sh
 332 root        968 S   /usr/sbin/dropbear
 333 root        968 S   /usr/sbin/dropbear
 335 root       1360 S   -sh
 336 root       1360 S   -sh
 355 root       1296 S   dfbinput --dfb:debug
 356 root        792 R   ps
root@freescale:~#
root@freescale:~# cat /proc/355/maps
20000000-20002000 rw-s 00000000 00:0b 3975544    /dev/fusion/0
80000000-80004000 r-xp 00000000 00:0b 3917515    /usr/bin/dfbinput
80004000-80006000 rw-p 00002000 00:0b 3917515    /usr/bin/dfbinput
80006000-8001e000 r-xp 00000000 00:0b 3883884    /lib/ld-2.5.so
8001e000-80020000 rw-p 00016000 00:0b 3883884    /lib/ld-2.5.so
80020000-80022000 rw-p 80020000 00:00 0
80024000-800fe000 r-xp 00000000 00:0b 3917529    /usr/lib/libdirectfb-1.4.so.0.0.0
800fe000-80108000 rw-p 000d8000 00:0b 3917529    /usr/lib/libdirectfb-1.4.so.0.0.0
80108000-8010a000 rw-p 80108000 00:00 0
8010a000-8012c000 r-xp 00000000 00:0b 3917532    /usr/lib/libfusion-1.4.so.0.0.0
8012c000-80130000 rw-p 00020000 00:0b 3917532    /usr/lib/libfusion-1.4.so.0.0.0
80130000-80152000 r-xp 00000000 00:0b 3917526    /usr/lib/libdirect-1.4.so.0.0.0
80152000-80156000 rw-p 00020000 00:0b 3917526    /usr/lib/libdirect-1.4.so.0.0.0
80156000-80158000 rw-p 80156000 00:00 0
80158000-8015a000 r-xp 00000000 00:0b 3884005    /lib/libdl-2.5.so
8015a000-8015c000 rw-p 00000000 00:0b 3884005    /lib/libdl-2.5.so
8015c000-8016a000 r-xp 00000000 00:0b 3884029    /lib/libpthread-0.10.so
8016a000-8016e000 rw-p 0000c000 00:0b 3884029    /lib/libpthread-0.10.so
8016e000-801b0000 rw-p 8016e000 00:00 0
801b0000-802ac000 r-xp 00000000 00:0b 3883946    /lib/libc-2.5.so
802ac000-802b4000 rw-p 000fa000 00:0b 3883946    /lib/libc-2.5.so
802b4000-802b6000 rw-p 802b4000 00:00 0
802b6000-802c8000 r-xp 00000000 00:0b 3917925    /usr/lib/libz.so.1.2.3
802c8000-802ca000 rw-p 00010000 00:0b 3917925    /usr/lib/libz.so.1.2.3
802ca000-803ca000 rw-p 802ca000 00:00 0
803ca000-803dc000 r-xp 00000000 00:0b 3925656    /usr/lib/directfb-1.4-0/systems/libdirectfb_fbdev.so
803dc000-803e0000 rw-p 00010000 00:0b 3925656    /usr/lib/directfb-1.4-0/systems/libdirectfb_fbdev.so
bf8ee000-bf918000 rw-p bffd6000 00:00 0          [stack]
root@freescale:~#


Based on that, it looks to me like it should be working OK (i.e. - it doesn't look like virtual address 0x20000000 is conflicting with anything)... I suspect that virtual address 0x20000000 may "just not work" for some reason on the Coldfire's MMU... I've tried a few other addresses (without really understanding what I was doing) such as 0xA0000000, 0xC0000000, 0x40000000... If I remember correctly, the mmap() call returned "failure" in all those cases (so no segfault since Fusion didn't try to access the mapping)...
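
The sleep-and-cat procedure used above can also be done from inside the process, which avoids racing against the 30 second window. This is a small sketch, assuming /proc is mounted on the target; dump_self_maps is a hypothetical name and would have to be called from a debugging patch placed just before the write to shared->refs.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical debugging helper: copy /proc/self/maps to `out`.
 * Returns the number of complete lines copied, or -1 if the file
 * could not be opened.
 */
int dump_self_maps(FILE *out)
{
    FILE *in = fopen("/proc/self/maps", "r");
    if (!in)
        return -1;

    char line[512];
    int lines = 0;

    while (fgets(line, sizeof line, in)) {
        fputs(line, out);
        if (strchr(line, '\n'))   /* count only complete lines */
            lines++;
    }

    fclose(in);
    return lines;
}
```

A call such as dump_self_maps(stderr) prints the same table that "cat /proc/<pid>/maps" shows, at exactly the point in the code where it is called.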

Thanks again for your help!
Allen

 

/************* Niels Roest's response # 2 ****************/

 

Hi Allen.

your "maps" looks exactly like mine, except that my node is called /dev/fusion0, and my shared size is 0x1000 instead of 0x2000 in your case. This is just PC, in my case.
I am starting to think that our implementation of the mmap is not working in your case.

It might be worth a try to see if you can get a closer look at fusion_mmap in fusiondev.c (fusion kernel module); put some printk's in there, you can always compare with your PC (and use e.g. X11 as directfb-system), which should work I hope.
Alternatively, you can try to map it to 0x9000.0000, which _should_ be working..

Not sure what else
I leave it to you to post it to a forum, though the Freescale guys might be the more logical approach here.
Would be interested in the solution.
hth
Niels 

 

/************* My "follow-up" response # 2 ****************/

 

Thanks Niels!

I'm a bit confused as to why your character device node would be at /dev/fusion0 rather than /dev/fusion/0 ... Is that because you're working with "development code" and /dev/fusion0 is going to be the new convention?...

I understand that you see 0x1000 for the shared memory size as that's the PAGE_SIZE for x86 and most other architectures, while I believe ColdFire (m68k) has a PAGE_SIZE = 0x2000 (for reasons I don't understand yet)...
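
The difference in mapped size follows from rounding the 2300-byte shared area up to a whole number of pages. A minimal sketch of that arithmetic is below; the function names are hypothetical, and the page sizes 0x1000 and 0x2000 are the ones quoted in this thread.

```c
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Round len up to a multiple of page; page must be a power of two. */
size_t round_up_to_page(size_t len, size_t page)
{
    return (len + page - 1) & ~(page - 1);
}

/* Print the running system's page size and the span `len` bytes occupy. */
void report_mapping_span(size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return;

    printf("page size 0x%lx: %zu bytes occupy 0x%zx\n",
           (unsigned long)page, len,
           round_up_to_page(len, (size_t)page));
}
```

With these definitions round_up_to_page(2300, 0x1000) is 0x1000 and round_up_to_page(2300, 0x2000) is 0x2000, matching the 20000000-20002000 line in the maps output above and the one-page mapping reported on the PC.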

I did try changing the fixed address for the mmap() call to 0x90000000... But I still get a segfault attempting to access address 0x90000004...

So, yes, I believe I will soon be adding many printk's to the linux-fusion kernel module (and to the kernel) to increase my understanding of the code involved... And hopefully find a solution...

Let me know if you think of anything else I should be trying...

Thanks!
Allen

 

/************* Niels Roest's response # 3 ****************/

 

Hi Allen,
no sorry - our resident fusion guru is being ill at home.
I would do as you suggest - going the printk road.

/dev/fusion0 was renamed /dev/fusion/0 or the other way round, I wasn't around at that time.
The major number was also changed at that time, but I do not think you have such conflict since the "enter" ioctl has already completed successfully with sane output at that stage.

Greets
Niels 

 


jkmahan
Contributor III

This was resolved with a patch but I don't recall the name of the patch.

 

You can find the resolution if you read the thread titled "MCF5485EVB, Linux 2.6.25 kernel: MMap broken..." started on 11/17/2008.  Nabendu posts the changes to cf_pgtable.h to fix this.  In my patch I don't think I added the CF_PAGE_DIRTY flag (but it's been a while..)


AllenBlack
Contributor I

Thanks for your prompt response!

 

I probably should have mentioned that I am familiar with the forum thread and have already implemented the patch you mentioned... I work directly (although remotely) with jkimble and this was the first thing we both thought of when I saw this issue several days ago...

 

Just now I rechecked and rebuilt my kernel to make sure it contained Nabendu's change to "cf_pgtable.h"... and it does... and that patch does not help or change the issue I am seeing...

 

I'm guessing that you should be able to reproduce the issue using (mostly) the released Coldfire M547X_8X BSP and the "default" packages from LTIB's GPP... Just enable and build the DirectFB from LTIB (version 1.1.0)... Run "dfbinfo" on your target to see that DirectFB works in "Single Application Core" mode...

 

Then edit the .spec file for DirectFB to add "--enable-multi" to the ./configure line and rebuild the package... At this point you need to get the kernel module "linux-fusion" from here:

 

 http://www.directfb.org/downloads/Core/linux-fusion/linux-fusion-7.0.1.tar.gz

 

Version 7.0.1 of the linux-fusion kernel module is the version that matches the DirectFB internal API for DirectFB v1.1.0 ... I just extracted the tarball directly into my kernel source tree in the directory ltib-xxxxx/rpm/BUILD/linux-2.6.25/drivers/char/fusion... and modified the ltib-xxxxx/rpm/BUILD/linux-2.6.25/drivers/char/Kconfig file so it would build "fusion.ko" along with the kernel... Then I let LTIB rebuild the kernel...

 

Then you will need to create a device node on your target like this:

 

mknod /dev/fusion/0 c 250 0

 

And load the module like this:

 

modprobe fusion

 

And then try to run "dfbinfo" again (now in Multi Application Core mode) and I believe you will see the issue...
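
Before running "dfbinfo" it may be worth confirming that the node created in the steps above really is a character device with the expected numbers. The following is a sketch only; char_device_numbers is a hypothetical helper, and the major number 250 is simply the value used in the mknod step, which should be compared against what the loaded module reports in /proc/devices.

```c
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/*
 * Hypothetical helper: look up the major/minor numbers of a character
 * device node.
 * Returns 0 and fills *maj and *min if `path` is a character device,
 * -1 if it does not exist or is some other kind of file.
 */
int char_device_numbers(const char *path, unsigned int *maj, unsigned int *min)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    if (!S_ISCHR(st.st_mode))
        return -1;

    *maj = major(st.st_rdev);
    *min = minor(st.st_rdev);
    return 0;
}
```

For the node described above, char_device_numbers("/dev/fusion/0", &maj, &min) would be expected to return 0 with maj set to 250 and min set to 0.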

 

I am trying to "cross post" these questions on the "directfb-users" mailing list but so far I have not even been able to join the list (i.e. - no confirmation e-mail to my join request yet)...

 

I still don't understand the significance of the Fusion library's call to mmap() with a fixed address of 0x20000000 + "some offset" (see the code segment from "fusion.c" in my original post)... I'm more familiar with calling mmap() with NULL as the first argument to let the kernel return whatever virtual address it wants to give the process...
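
Niels's explanation in the first reply (a fixed address avoids remapping and recalculating addresses passed between processes) can be illustrated with a small sketch. It maps the same file twice with a NULL hint, so the two views land at different addresses, much as they could in two unrelated processes. A raw pointer taken in one view is meaningless in the other, while an offset from the base still works. The function name and scratch path are hypothetical, and this is not how Fusion itself is written.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical demonstration: returns 1 if a value written through one
 * view of a shared file is found through a second view at the same
 * offset while the two views sit at different addresses, 0 if not,
 * and -1 on setup failure.
 */
int offsets_survive_remapping(const char *path)
{
    const size_t len = 4096;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        unlink(path);
        return -1;
    }

    /* Two independent views of the same file; the kernel picks both addresses. */
    char *a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    char *b = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    unlink(path);

    if (a == MAP_FAILED || b == MAP_FAILED) {
        if (a != MAP_FAILED) munmap(a, len);
        if (b != MAP_FAILED) munmap(b, len);
        return -1;
    }

    int *item = (int *)(a + 128);          /* a pointer valid only in view A */
    *item = 42;

    size_t offset = (size_t)((char *)item - a);
    int seen = *(int *)(b + offset);       /* same data, reached via view B */

    printf("view A %p, view B %p, offset %zu, value %d\n",
           (void *)a, (void *)b, offset, seen);

    int ok = (a != b) && (seen == 42);
    munmap(a, len);
    munmap(b, len);
    return ok;
}
```

If every process maps the area at one agreed address with MAP_FIXED, the base is identical everywhere, so plain pointers stored in the shared area stay valid without converting to and from offsets. The cost is that the chosen address must be usable on every architecture.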

 

I wonder if there is some "quirk" in the Coldfire kernel implementation of memory management that is conflicting with assumptions made in the DirectFB fusion or linux-fusion kernel module code...

 

This level and area of kernel code is really unfamiliar and cryptic to me so... Thanks for any help or ideas anyone can give!

 

Allen
