Getting started with ALSA SWPDM Plugin

Why SWPDM?

To process human voice, the data captured by the microphones needs the best possible audio resolution. This means that a 16-bit resolution is not enough to capture all the information required to process the voice properly. Voice processing requires a peripheral capable of capturing data at 32-bit resolution across the most common sample rates (16 kHz, 44.1 kHz, 48 kHz, etc.).

The i.MX8M family includes a peripheral that fulfills those requirements, called MICFIL. MICFIL converts PDM (Pulse-Density Modulation) data to PCM (Pulse-Code Modulation) data. The PDM format encodes the analog signal as a stream of single-bit samples, where the density of 1s over time represents the signal amplitude: a higher proportion of 1s means a higher amplitude, and a higher proportion of 0s means a lower one. PCM, on the other hand, encodes each sample in 8, 16, or 32 bits. The advantage of PDM is that PDM microphones are cheaper to produce than PCM microphones, but a hardware or software stage is then needed to convert PDM to PCM, since PDM data cannot be processed directly. This is the purpose of the MICFIL peripheral.

However, MICFIL is not the same on every SoC of the family. While the i.MX8M Plus supports a 32-bit resolution, its smaller siblings do not: the i.MX8M Mini and i.MX8M Nano have a MICFIL that only supports a resolution of up to 16 bits. This is enough for most use cases, but not for voice processing.

Nevertheless, not everything is lost. As mentioned previously, the PDM to PCM conversion can be done in hardware or in software, and NXP also provides the algorithm in software. Therefore, if a Mini or Nano is used for voice processing, it is strongly recommended to use the ALSA SWPDM Plugin and avoid the MICFIL peripheral.
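The plugin type is called cicFilter, which suggests a cascaded integrator-comb (CIC) decimator. To illustrate how a software PDM to PCM conversion can work in general, here is a minimal sketch of a generic 3-stage CIC decimator in C. It is not NXP's implementation: the names cic_t, cic_init, and cic_process are hypothetical, the decimation ratio is only assumed to play a role similar to the plugin's OSR parameter, and a real converter would add compensation filtering, gain scaling, and multi-channel deinterleaving.

```c
#include <stddef.h>
#include <stdint.h>

#define CIC_STAGES 3 /* N: number of integrator/comb pairs */

/* Hypothetical state of a generic CIC decimator (not the plugin's code). */
typedef struct {
    uint64_t integ[CIC_STAGES];     /* integrators, updated at the PDM (input) rate */
    uint64_t comb_prev[CIC_STAGES]; /* comb delay elements, updated at the PCM rate */
    unsigned decim;                 /* R: decimation ratio                          */
    unsigned phase;                 /* input bits seen since the last output        */
} cic_t;

/* Reinterprets a modulo-2^64 value as a signed integer without relying on
 * implementation-defined conversions. */
static int64_t cic_to_signed(uint64_t v)
{
    return v <= (uint64_t)INT64_MAX ? (int64_t)v
                                    : -(int64_t)(UINT64_MAX - v) - 1;
}

static void cic_init(cic_t *c, unsigned decim)
{
    for (int i = 0; i < CIC_STAGES; i++) {
        c->integ[i] = 0;
        c->comb_prev[i] = 0;
    }
    c->decim = decim;
    c->phase = 0;
}

/* Consumes n PDM bits (one per byte, 0 or 1) and writes PCM samples to out.
 * Samples beyond out_cap are discarded. Returns the number written. */
static size_t cic_process(cic_t *c, const uint8_t *bits, size_t n,
                          int64_t *out, size_t out_cap)
{
    size_t produced = 0;
    for (size_t k = 0; k < n; k++) {
        /* Map the PDM bit to +1 / -1 (as a modulo-2^64 value). */
        uint64_t x = bits[k] ? 1u : UINT64_MAX;
        /* Integrator cascade at the high rate. Unsigned wrap-around is
         * intentional: the CIC output stays exact under modular arithmetic. */
        for (int i = 0; i < CIC_STAGES; i++) {
            c->integ[i] += x;
            x = c->integ[i];
        }
        /* Decimate: keep only every decim-th integrator output. */
        if (++c->phase < c->decim)
            continue;
        c->phase = 0;
        /* Comb cascade (differential delay of 1) at the low rate. */
        for (int i = 0; i < CIC_STAGES; i++) {
            uint64_t d = x - c->comb_prev[i];
            c->comb_prev[i] = x;
            x = d;
        }
        if (produced < out_cap)
            out[produced++] = cic_to_signed(x);
    }
    return produced;
}
```

With decim = 4, a stream of all 1s (maximum density) settles at R^N = 4^3 = 64 after a short transient, while an alternating 1010 pattern (zero mean) settles at 0. The integrators deliberately wrap around, which is the usual way CIC filters are implemented in fixed-width registers.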

Using the Plugin

To use the plugin, the DTB must be changed to imx8mm-evk-8mic-swpdm.dtb when using the i.MX8M Mini, or to imx8mn-evk-8mic-swpdm.dtb when using the i.MX8M Nano.

To do so, follow these steps.

The example below is for the Mini. For the Nano, the steps are the same; only the DTB name changes to imx8mn-evk-8mic-swpdm.dtb.

# Stop at the U-Boot prompt, then edit the fdtfile variable
u-boot=> edit fdtfile
edit: imx8mm-evk-8mic-swpdm.dtb
u-boot=> saveenv
u-boot=> boot

The DTB change is required to disable MICFIL, so that Linux can receive the raw data and send it to the plugin.

However, the plugin is not enabled by default; users need to explicitly add it to their ALSA pipeline. This is done by adding the following device to /etc/asound.conf:

pcm.cic {
	type cicFilter
	slave "hw:imxswpdmaudio,0"
	delay 100000
	gain 0
	OSR 48
}

Where:

pcm.cic: An arbitrary name that allows ALSA to find the requested device when it is passed with the -D flag to arecord or aplay.
type cicFilter: The plugin type, named after the algorithm it implements.
slave: Name of the physical or virtual device controlled by the cicFilter plugin. The recommendation is to always connect the actual hardware device to this plugin.
delay: Amount of time, in microseconds, during which the plugin does not write to the buffer while still performing the conversion. The value can be between 100 µs and 1,000,000 µs. If the property is removed from the structure, the delay is set to 0.
gain: A value between 0 and 100.
OSR: The oversampling ratio, which is related to signal quality because it increases the PDM sample rate. A higher value achieves better audio quality; however, keep in mind that it also requires more memory to store the additional data produced by oversampling. The valid OSR values are 48, 64, 96, 128, and 192.
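The parameter ranges listed above can be captured in a small pre-flight check, for example in a tool that generates asound.conf entries. This is a hypothetical sketch: the struct and function names are made up, the ranges are taken only from the list above, and whether the plugin itself rejects or clamps out-of-range values is not stated here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of a "type cicFilter" block in /etc/asound.conf. */
typedef struct {
    long delay_us; /* 0 (property removed) or 100 .. 1,000,000 */
    int gain;      /* 0 .. 100 */
    int osr;       /* 48, 64, 96, 128 or 192 */
} cic_filter_cfg;

static bool cic_osr_is_valid(int osr)
{
    static const int valid[] = {48, 64, 96, 128, 192};
    for (size_t i = 0; i < sizeof valid / sizeof valid[0]; i++)
        if (valid[i] == osr)
            return true;
    return false;
}

/* Returns true if every field lies in the range documented above. */
static bool cic_cfg_is_valid(const cic_filter_cfg *cfg)
{
    bool delay_ok = cfg->delay_us == 0 ||
                    (cfg->delay_us >= 100 && cfg->delay_us <= 1000000);
    bool gain_ok = cfg->gain >= 0 && cfg->gain <= 100;
    return delay_ok && gain_ok && cic_osr_is_valid(cfg->osr);
}
```

With this check, the pcm.cic example above ({100000, 0, 48}) passes, while an OSR of 50 or a delay of 50 µs is rejected.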

With all of this in place, the only thing left is to test the plugin by running the following command:

$ arecord -D cic -c4 -r16000 -f s32_le --period-size=96 -d5 -v test.wav

Integration With AFE

The next and final step is integrating the plugin with AFE and VoiceSeeker. Integrating SWPDM requires applying a patch to the SWPDM repository. The patch changes the period sizes allowed by the plugin. By default, the plugin only allows the following values:

3 channels x 4 bytes (S32_LE) x 16 samples = 192 bytes
2 channels x 4 bytes (S32_LE) x 48 samples = 384 bytes
4 channels x 4 bytes (S32_LE) x 48 samples = 768 bytes
4 channels x 4 bytes (S32_LE) x 96 samples = 1,536 bytes
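The arithmetic behind these values is channels x bytes per sample x samples per period. The helpers below (hypothetical names) reproduce it and also compute how long one period lasts, which for the earlier arecord test (-c4 -r16000 --period-size=96) gives 1,536 bytes every 6 ms. The only assumption is the fixed 4-byte S32_LE sample, which is the plugin's only output format.

```c
#include <stddef.h>

/* The plugin only outputs S32_LE, i.e. 4 bytes per sample. */
#define SWPDM_BYTES_PER_SAMPLE 4u

/* Bytes in one period = channels x bytes per sample x samples per period. */
static size_t swpdm_period_bytes(unsigned channels, unsigned samples)
{
    return (size_t)channels * SWPDM_BYTES_PER_SAMPLE * samples;
}

/* Duration of one period in microseconds (integer division truncates). */
static unsigned long swpdm_period_us(unsigned samples, unsigned rate_hz)
{
    return (unsigned long)samples * 1000000ul / rate_hz;
}
```

For example, swpdm_period_bytes(4, 96) is 1536 and swpdm_period_us(96, 16000) is 6000.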

Although AFE and VoiceSeeker are extremely configurable, a period of 48 or 96 samples is too small for the algorithm. This means that SWPDM must support a bigger period size, not the other way around. By applying the attached patch, the plugin supports period sizes from 64 bytes (1 channel and 16 samples) up to 16,384 bytes (4 channels and 1024 samples). However, the number of samples can vary depending on the OSR value and the number of channels.
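A quick way to sanity-check a period size before configuring AFE is sketched below. It combines the patched limits above with two restrictions of the plugin (at most 4 channels, and a period that is a multiple of 16 samples). The function name is hypothetical, and the sketch assumes the 64 to 16,384 byte limits apply to the total byte count; because the usable number of samples also depends on the OSR, passing this check does not guarantee the plugin will accept the value.

```c
#include <stdbool.h>
#include <stddef.h>

#define SWPDM_MAX_CHANNELS     4u
#define SWPDM_S32_BYTES        4u     /* S32_LE */
#define SWPDM_MIN_PERIOD_BYTES 64u    /* 1 ch x 16 samples x 4 bytes   */
#define SWPDM_MAX_PERIOD_BYTES 16384u /* 4 ch x 1024 samples x 4 bytes */

/* Hypothetical check of a (channels, samples) period against the patched
 * plugin's documented limits. */
static bool swpdm_patched_period_ok(unsigned channels, unsigned samples)
{
    if (channels == 0 || channels > SWPDM_MAX_CHANNELS)
        return false;
    /* The algorithm requires a multiple of 16 samples per period. */
    if (samples == 0 || samples % 16 != 0)
        return false;
    size_t bytes = (size_t)channels * SWPDM_S32_BYTES * samples;
    return bytes >= SWPDM_MIN_PERIOD_BYTES && bytes <= SWPDM_MAX_PERIOD_BYTES;
}
```

For instance, 4 channels with 256 samples (4,096 bytes) passes, while 4 channels with 1,040 samples (16,640 bytes) or any period of 100 samples fails.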

Once the patch has been applied, the plugin must be installed in /usr/lib/alsa-lib (if the repository is being built in a standalone environment).

AFE opens a device called mic to capture the microphones' input. This device can have anything below it. By default, it has the following definition in /etc/asound.conf (after following the steps described in the TODO.md file).

# mic represents the physical source (capture)
pcm.mic
{
	type plug
	slave.pcm "hw:micfilaudio,0"
}

This device opens the MICFIL driver, but in this case MICFIL is disabled, so the device definition must change. The cic device definition above can be copied and then one parameter tweaked: the delay must be set to 0, either by removing the property or by setting it explicitly in the structure. Forgetting this step might cause underrun issues. The device definition becomes:

pcm.mic {
	type cicFilter
	slave "hw:imxswpdmaudio,0"
	delay 0
	gain 0
	OSR 48
}

The last step is to run AFE with VoiceSeeker as usual:

$ /unit_tests/nxp-afe/voice_ui_app &
$ /unit_tests/nxp-afe/afe libvoiceseekerlight &

Considerations and Restrictions

With all that said, there are a few things left to mention: the considerations and restrictions of the plugin itself. These are good to know before adding the plugin to any application.

The plugin is supported starting with Linux BSP 5.15.32.
Currently, the plugin supports up to 4 channels only.
The plugin only outputs the S32_LE format (if another format is required, use MICFIL).
With the above patch applied, the period size must be a multiple of 16 due to a limitation of the algorithm itself rather than of the plugin.
The driver only allows one microphone per data line, while MICFIL allows two microphones per data line.
The SWPDM Plugin is based on the ALSA External Plugin type I/O Plugin. This means it also has the restrictions of this ALSA plugin type, the most important one being:
- "The I/O-type plugin is a PCM plugin to work as the input or output terminal point, i.e. as a user-space PCM driver". In other words, there can't be any device/plugin on top of it, not even a "plug" type.