[Bug]: reference channels shift (become disordered) across multiple recordings #309
Hi @gusido! I looked briefly through the https://github.com/respeaker/seeed-voicecard/blob/master/ac108.c code, and there have been quite a few changes that might affect channel order. I'm almost done with the issue backlog; after that I will be working on the issues we were able to reproduce during internal testing. @HinTak, do you have any idea what might be causing the channel shift? It looks similar to #301, which I wasn't able to reproduce, but this one seems to affect the reference channels rather than the recording channels. |
Yes, #301 and quite a few others that were closed without resolution. AFAIK this is generic to multichannel (>2) capture and playback on the Pi. See the same problem on a different device, with more and better discussion, in Audio-Injector/Octo#1. The Audio Injector people at least left the issue open for years, for other people to read about it... |
Hi there. |
Hi, I am having a similar problem with the Respeaker-4-mic-array for Raspberry Pi, though with the recording channels rather than the "reference" channels. Is there any active development or troubleshooting going on? I am using the 64-bit kernel and tested with Audacity on 2 different Raspberry Pis and arrays. I get two different permutations: 1-2-3-4 or 3-4-1-2. I couldn't find a reliable way to induce the switch between these permutations, but it eventually happens if I try hard enough (basically by restarting the capture in Audacity, or in the script at https://github.com/spatialaudio/python-sounddevice/blob/0.4.1/examples/plot_input.py, until it does). Please let me know if it would be appropriate to open a new issue. A stable microphone order is crucial for our application, and we would be really happy to reach a solution as soon as possible. Best regards, |
Hi, I had the same problem with the Respeaker-4-mic-array and made this workaround. In the zip file, ac108.c and seeed-voicecard.c are modified. Yours faithfully, |
@JaPhoton this solved my problem. Thanks a lot! |
Hello, can you help out with the Respeaker-6-mic-array? Thank you so much |
@rnehrboss |
The 6-mic is a nightmare: the order seems totally random, so it's currently not much good for the DelaySum/TDOA beamformer I have hacked together. Is @JaPhoton or anyone else hosting a repo with the channel fixes? As said above, it's impossible to use with rotating channels. |
You have stated that the AC108 is EOL; would the ES7210 (http://www.everest-semi.com/pdf/ES7210%20PB.pdf) be an alternative?
Is anyone still looking at this? It's been close to a year, and the ReSpeaker 6 is still essentially unusable because of the inconsistent channel order. |
The only thing I can say is: make sure you buy with PayPal, so that at least you can get a refund, because it's completely unusable if you have a random channel order. PS: Respeaker, please fix this with a new revision and supply mic daughter boards. Why limit the board to the geometry you supply, which is a bad choice, with onboard mics that are near impossible to isolate? Or at least be honest and remove them from the store. |
I also have the same problem. |
I ordered via PayPal and got a refund, as that seemed a better idea.
I have forked this repository and applied the necessary changes there. If I recall correctly, things then worked like a charm. If you click on my profile you should be able to find it. Good luck!
On Wed, 13 Apr 2022 at 01:45, rnehrboss wrote:
@JaPhoton Great work. Did this fix get merged in? If not, how do we make the new files using the C source code files you provided?
@egaznep Looks like you got it working too. Do you mind sharing the recompile and installation steps using @JaPhoton's code?
Thanks!
|
Hey folks, I opened a PR that cleans up and improves the above patch, since it was breaking the output for my ReSpeaker 6-mic. Let me know if you had the same problem and whether this fixes it :) P.S. I opened the PR against HinTak's repo, as it's more up to date and we might want to batch multiple changes. Let me know if you'd prefer a PR against this repo instead. |
Well... it turns out the previous solution worked about 80% of the time, so I bit the bullet and implemented automatic loopback channel detection in the ec project (that's realistically how most of us would use it anyway). Please check the PR above this comment and play with the other features I added. Let me know how it goes :) |
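The general idea behind loopback channel detection can be sketched as follows: correlate every captured channel with the signal that was played back and keep the channels that match it best. This is a minimal sketch, assuming the capture is a NumPy array of shape (frames, channels) and the played signal is known; `find_loopback_channels` is a hypothetical name, and this is not the code from the ec project or the PR mentioned above.

```python
import numpy as np


def find_loopback_channels(recording, reference, n_refs=2):
    """Return the indices (ascending) of the n_refs channels that best match `reference`.

    recording: array of shape (frames, channels) captured from the card.
    reference: 1-D array holding the signal that was played back.
    Each channel is scored by the peak of its normalized cross-correlation
    with the reference, so an unknown playback-to-capture delay does not matter.
    """
    rec = np.asarray(recording, dtype=np.float64)
    ref = np.asarray(reference, dtype=np.float64)
    ref = ref - ref.mean()
    ref_norm = np.linalg.norm(ref)
    scores = []
    for ch in range(rec.shape[1]):
        x = rec[:, ch] - rec[:, ch].mean()
        denom = np.linalg.norm(x) * ref_norm
        if denom == 0.0:
            # A silent channel cannot carry the loopback signal
            scores.append(0.0)
            continue
        corr = np.correlate(x, ref, mode="full")
        scores.append(np.max(np.abs(corr)) / denom)
    # Take the highest-scoring channels, then report them in ascending order
    best = np.argsort(scores)[::-1][:n_refs]
    return sorted(int(i) for i in best)
```

Because the score is normalized, the loopback channels are found even if their gain differs from the microphones'; a real implementation would likely also want a minimum score threshold to reject captures where nothing was played.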
I checked out the linux-4.19-or-less branch instead of master and ran sudo ./install.sh --compat-kernel, which changed the kernel version from 5.10.17-v7l+ to 4.19. The order then became correct... |
FWIW, even "placebo-style" random whitespace changes are guaranteed to be correct at least 25% of the time, since there are only 4 sync positions. Besides, I don't think the original was as poor as 25%; the failure was more like occasional (i.e. 80% correct). So I think "80% correct" is just placebo. |
One more comment on this issue. I think many users need to log in to the Raspberry Pi remotely from their laptop (which is not on 24/7; you might close it), start a screen session, run their script in that session, and finally press Ctrl-A Ctrl-D to detach from it. This way the script keeps running even after the SSH session disconnects. However, in our tests this process may lead to a channel shift after you press Ctrl-A Ctrl-D to detach from the screen session. The solution is to not use |
Disclaimer: I don't work for Seeed Studio. FWIW, comments like "this issue happens in this other situation I care about too" aren't helpful. The problem is well understood, I think: various components of the hardware just fake it and pack 2-channel 176k data to and from 8-channel 44k data. There are 4 ways of doing it. The driver starts and stops the components together, so most of the time it is correct. However, when the system is busy (in any situation, hence naming your "favourite" situation is not helpful) and stutters a bit, they go out of sync and you get one of the other 3 of the 4 ways of packing 2x176k into 8x44k. I think the only correct way to fix this is to fix the other bug about the kernel panic with spinlocks. That addresses the scheduling problem. |
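The "4 ways of packing" can be illustrated with a toy model: each 8-channel frame is carried as 4 consecutive stereo frames at 4x the rate, and the receiver has to guess which stereo frame starts each 8-channel frame. This is a simplified sketch under that assumption only (plain integers, no hardware or driver API involved); `pack`, `unpack` and `channel_order` are hypothetical names.

```python
import numpy as np

N_CH = 8                 # logical channels seen by ALSA
STEREO = 2               # channels per physical I2S frame
RATIO = N_CH // STEREO   # 4 stereo frames per logical frame, hence 4x the rate


def pack(frames):
    """Flatten (n, 8) logical frames into (n * 4, 2) stereo frames."""
    return np.asarray(frames).reshape(-1, STEREO)


def unpack(stereo, offset):
    """Regroup stereo frames into 8-channel frames, starting at `offset`.

    `offset` is the number of stereo frames (0..3) the receiver is off by
    when it starts grouping; leftover frames at the end are dropped.
    """
    usable = (len(stereo) - offset) // RATIO * RATIO
    return stereo[offset:offset + usable].reshape(-1, N_CH)


def channel_order(offset, n_frames=3):
    """Report which original channel lands in each output position."""
    frames = np.tile(np.arange(N_CH), (n_frames, 1))  # channel c carries the value c
    return unpack(pack(frames), offset)[0].tolist()


for off in range(RATIO):
    print(off, channel_order(off))
```

Running it prints `0 [0, 1, 2, 3, 4, 5, 6, 7]`, `1 [2, 3, 4, 5, 6, 7, 0, 1]`, `2 [4, 5, 6, 7, 0, 1, 2, 3]` and `3 [6, 7, 0, 1, 2, 3, 4, 5]`. Only offset 0 restores the original order, so a start position picked at random is right 1 time in 4. In this model every wrong guess rotates the channels by a multiple of 2, so a pair such as the reference channels 6 and 7 moves together.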
The spinlock issue is #251 |
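This exact issue has bitten me in my current project. With the 4-mic ReSpeaker card I observe an occasional channel swap while recording. While reading the issue and the BCM I2S block description, I came across the following:
"If a FIFO error occurs in a two channel frame, then channel synchronisation may be lost which may result in a left right audio channel swap. RXSYNC and TXSYNC status bits are provided to help determine if channel slip has occurred. They indicate if the number of words in the FIFO is a multiple of a full frame (taking into account where we are in the current frame being transferred). This assumes that an integer number of frames data has been sent/read from the FIFOs."
It's the only way I can imagine things going out of sync: swapping the L/R parts of the I2S frame would permute mics 1-2-3-4 into 3-4-1-2, which is what I observe. A sample scenario would be the FIFO overflowing at start, e.g. if the codec starts pushing data before DMA is started. Is this what's happening here? Is there anything else (GitHub issue, forum thread, whatever) that sheds some light on this topic? |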
This bit us a while back. Someone made a firmware fix. We grabbed that
and have used it ever since without issue.
Sorry, I don't have a link to the fix. You should be able to find it by searching back. It was probably late 2021 or early 2022.
|
Are you referring to this particular comment? #309 (comment) I will certainly give it a try then. |
Yep, I think that's it.
Good luck.
|
Already explained that the code change is rubbish: |
Working for us. Before, the channels were totally random; after the driver change they seem to be 100% accurate. We now have many, many units in the field. |
Hmm, I'm still missing an explanation of what exactly is causing the issue; I'd love to gain a deeper understanding. @HinTak, you write about "4 sync positions": how can there be 4 sync positions, anyway? I must be missing something important here, but my understanding so far was that the mis-synchronization is due to the I2S input FIFO going out of sync, and the FIFO is 32 bits wide, so that would only explain a 1-2-3-4 to 3-4-1-2 swap. I'm confused here. @rnehrboss, do you remember whether you tried the original code from JaPhoton's zip, or the one from jacopomaroli's pull request? |
I think it was the zip from JaPhoton. I kind of recall needing to rebuild
the driver. I don't have a great memory though.
|
@codepainters AFAIK it is an artifact of trying to pack and unpack 8-channel audio as 2-channel at 4x frequency. (The 4-channel device has 8 channels with 4 empty). So there are 4 ways of doing it, with a bias to the sync position. So even if you do it without any synchronisation, you would still be 25% correct. |
I'm still confused about where such a (mis)synchronization could happen:
The only place in this chain that is susceptible to misalignment is the FIFO, as documented by Broadcom (of course we have 2 TDM slots per I2S channel):
|
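Ok, I've done my homework, and I think I understand a bit more. @HinTak wrote:
"AFAIK it is an artifact of trying to pack and unpack 8-channel audio as 2-channel at 4x frequency. (The 4-channel device has 8 channels with 4 empty). So there are 4 ways of doing it, with a bias to the sync position. So even if you do it without any synchronisation, you would still be 25% correct."
AFAIK the trick of running stereo I2S at 4x the nominal frequency that you refer to is what e.g. Audio Injector Octo does. It overcomes the sync issue by using an additional CPLD as the BCLK and LRCK source (i.e. both the Raspberry Pi and the codec are configured as slaves); as far as I understand, the CPLD takes care of starting the stream at the right moment.
However, the 4-Mic ReSpeaker is slightly different. The codec itself is configured with an LRCK frequency equal to the sampling frequency. Each frame is 128 bits long, with 4 slots of 32 bits each. I've confirmed this by checking the codec registers, as well as with a scope (yellow: data, blue: LRCK):
[image: s1]
You can clearly see 4 slots of 32 bits each, with the last 8 bits of each slot zeroed (the codec seems to produce 24-bit samples). Actually, it puzzled me for a while: the Pi's I2S interface can handle up to 2 slots of 32 bits per frame, so how is it even possible to handle 4 slots per frame? Here's the tricky part (from the "BCM2711 ARM Peripherals" document):
"Note that in frame sync slave mode there are two synchronising methods. The legacy method is used when the frame length = 0. In this case the internal frame logic has to detect the incoming PCM_FS signal and reset the internal frame counter at the start of every frame. The logic relies on the PCM_FS to indicate the length of the frame and so can cope with adjacent frames of different lengths. However, this creates a short timing path that will corrupt the PCM_DOUT for one specific frame/channel setting. The preferred method is to set the frame length to the expected length. Here the incoming PCM_FS is used to resynchronise the internal frame counter and this eliminates the short timing path."
And the ReSpeaker driver sets the I2S frame length to 64 bits. Thus, each 128-bit frame from the codec is in fact consumed as 2 consecutive 64-bit frames, 2 slots each, and the receiver effectively re-synchronizes every second 64-bit frame. This has interesting implications:
- Channel rotation is not caused at the I2S transport level (as is the case with the 4x fs trick): the receiver always synchronizes at the same slot (where the 128-bit frame starts).
- I've found no information on how the I2S receiver behaves right after it is enabled: whether it waits for the first LRCK pulse before receiving anything, or starts deserializing the bitstream at a random place and only regains synchronization on the first LRCK pulse.
  - In the first case, it should be enough to enable LRCK only after the receiver is ready (with the FIFO emptied) to get the proper channel order. I suppose that's exactly what the patch discussed above tries to achieve.
  - In the second case, if any 32-bit words are written to the FIFO before the first LRCK pulse, then there's no way to reliably synchronize the channels without extra hardware.
Unfortunately, Broadcom's document doesn't give enough details, so some more experimentation is necessary. |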
Wow, great analysis.
I'll be curious to know what your experimentation reveals. I'm pretty sure that when we put the fix in, it started acting correctly. Like I said, we tested a bunch and now have many in the field.
Although now you have me second-guessing...
|
Does anyone even know which TDM format is in use, or whether it is even a true TDM format? |
I'm not sure what you mean. I'm quite confident now about the format used by the codec. I've checked the registers (I got a full AC108 datasheet from X-Powers) and done some oscilloscope measurements; everything matches. As stated before, it's 128 bits per frame, 4 slots, 32 bits per slot, LRCK pulse width = 1 BCLK period, 1 BCLK period delay. The AC108 manual calls it PCM mode A (SR = WORD_SIZE = 32, LRCK mode Short): Here's a trace that confirms the LRCK and BCLK polarities: That's pretty much all you need to know about the format. |
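To make the format concrete, here is a sketch that builds the ideal bit sequence for PCM mode A frames as described above (4 slots of 32 bits, 24-bit samples with the 8 LSBs zeroed, a 1-BCLK LRCK pulse, data delayed by 1 BCLK) and computes the BCLK rate this implies. It assumes MSB-first slots and an active-high LRCK pulse, which the description above does not state; `pcm_mode_a_stream` and the other helpers are hypothetical and not part of any driver.

```python
SLOTS = 4                       # TDM slots per frame
SLOT_BITS = 32                  # bits per slot
SAMPLE_BITS = 24                # valid bits per sample; the remaining 8 LSBs are zero
FRAME_BITS = SLOTS * SLOT_BITS  # 128 BCLK cycles per frame


def bclk_hz(fs):
    """BCLK frequency when LRCK runs at the sampling frequency fs."""
    return fs * FRAME_BITS


def slot_bits(sample):
    """Encode a signed 24-bit sample MSB-first into a 32-bit slot."""
    word = (sample & 0xFFFFFF) << (SLOT_BITS - SAMPLE_BITS)  # zero the 8 LSBs
    return [(word >> (SLOT_BITS - 1 - i)) & 1 for i in range(SLOT_BITS)]


def pcm_mode_a_stream(frames):
    """Return (lrck, data) bit lists, one entry per BCLK cycle.

    frames: list of 4-tuples of signed 24-bit samples.
    LRCK is high for a single BCLK at the start of each frame. Data is delayed
    by one BCLK, so slot 1's MSB appears on the cycle after the pulse and the
    last bit of a frame shares its cycle with the next frame's pulse. The very
    last bit of the final frame (an always-zero LSB) falls outside the window.
    """
    bits = []
    for frame in frames:
        assert len(frame) == SLOTS
        for sample in frame:
            bits.extend(slot_bits(sample))
    lrck = [1 if i % FRAME_BITS == 0 else 0 for i in range(len(bits))]
    data = [0] + bits[:-1]  # 1-BCLK data delay
    return lrck, data


print(bclk_hz(48000))  # 6144000
```

At 48 kHz, the rate used in the arecord command in this issue, a 128-bit frame implies a 6.144 MHz BCLK, twice what a plain 2 x 32-bit stereo frame would need.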
Thanks. I just wanted to understand the root cause.
I'm not sure whether we will go down the rabbit hole; it's quite deep :) We need a reliable solution, and all the multichannel Raspberry Pi interfaces seem to be hacks for now. I'm not sure it is worth the effort. |
The Espressif docs have good examples, but I've always wondered what the difference is between I2S ports that do and don't support TDM mode. I noticed this with the ESP32, which doesn't support TDM mode... |
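I've done a simple experiment, see https://github.com/codepainters/rpi-i2s-experiments. Basically I send I2S frames to the Pi, only enabling LRCK after some number of frames, to check how the receiver behaves. This confirmed my understanding: the I2S receiver starts deserializing at a random place in the stream, and only regains synchronization on the first LRCK pulse.
Given the above, I've no idea how to solve this very issue. With a CPLD/FPGA it could be possible to precisely gate the BCLK and LRCK clocks, but that's a hardware mod.
At that point I decided to give up: even if there's any software-only solution, it would be an ugly hack. For our project we've decided to build a simple I2S to USB interface and use regular USB Audio Device drivers. |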
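The failure mode confirmed by this experiment can be modelled in a few lines: if the receiver pushes some words into the FIFO before it locks onto the first LRCK pulse, everything after them is shifted by that many words, and a consumer that groups the FIFO into 4-slot frames from the first word sees the slots rotated. This is a toy model under those assumptions (one FIFO word per 32-bit slot, the consumer never realigns); `observed_slot_order` is a hypothetical name, and it does not model the BCM2711 I2S block itself.

```python
def observed_slot_order(pre_sync_words, slots=4):
    """Slot numbers (1-based) that land in consumer channels 0..slots-1.

    pre_sync_words: 32-bit words pushed into the FIFO before the first LRCK
    pulse, i.e. deserialized from a random point in the bitstream (junk).
    After the pulse the receiver is aligned, so slots 1..slots follow in order.
    """
    fifo = ["junk"] * pre_sync_words
    for _ in range(pre_sync_words // slots + 2):
        fifo.extend(range(1, slots + 1))
    # The consumer groups words from the start of the FIFO in blocks of `slots`;
    # report the first block that no longer contains junk.
    first_clean = -(-pre_sync_words // slots)  # ceiling division
    start = first_clean * slots
    return fifo[start:start + slots]


for k in range(4):
    print(k, observed_slot_order(k))
```

This prints `0 [1, 2, 3, 4]`, `1 [4, 1, 2, 3]`, `2 [3, 4, 1, 2]` and `3 [2, 3, 4, 1]`. In this model, 2 stray words reproduce the 3-4-1-2 order reported earlier in the thread, and the order is correct only when the number of stray words is a multiple of 4.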
We'll go back and look at our application. Pretty sure it was random
channel assignment, and with the driver fix, I was pretty sure that it
became reliable. Will follow up and let you know.
|
This also works for the 6-mic array. Thanks for sharing. |
I think there are still other reasons that can cause the channel shift problem. But for the software version, I think we should use the |
For those who just want to avoid this problem, I tried the following steps with a Raspberry Pi 4B and the 6-Mic array and got it to work (following what @beitong95 shared):
And I found the Python script provided by the wiki (which uses
play_and_record.py

```python
import argparse
import os
import wave

import numpy as np
import sounddevice as sd


class Recorder:
    """Collects the blocks delivered by the sounddevice input callback."""

    def __init__(self, channels, samplerate, chunk_size):
        self.channels = channels
        self.samplerate = samplerate
        self.chunk_size = chunk_size
        self.frames = []

    def callback(self, indata, frames, time, status):
        if status:
            print(status)
        # indata is reused by sounddevice after the callback returns, so copy it
        self.frames.append(indata.copy())


# Setting up argument parser
parser = argparse.ArgumentParser()
parser.add_argument("--filename", type=str, default="output")
parser.add_argument("--playPath", type=str, default="./sequence.npy")  # audio to be played
parser.add_argument("--savePath", type=str, default="./recordings/")
parser.add_argument("--playDevice", type=int, default=5)
parser.add_argument("--recDevice", type=int, default=0)
args = parser.parse_args()

SAMPLE_RATE = 48000
CHANNELS = 8
WAVE_OUTPUT_FILENAME = args.filename.strip()
file_path = args.playPath.strip()
chunk_size = 1024

data = np.load(file_path)

recorder = Recorder(channels=CHANNELS, samplerate=SAMPLE_RATE, chunk_size=chunk_size)
stream = sd.InputStream(
    samplerate=SAMPLE_RATE,
    channels=CHANNELS,
    dtype='int16',
    blocksize=chunk_size,
    callback=recorder.callback,
    device=args.recDevice,
)

print("Recording...")
with stream:
    # Play the test sequence while the input stream is open; block until playback ends
    sd.play(data, samplerate=SAMPLE_RATE, device=args.playDevice)
    sd.wait()

base_filename = os.path.splitext(WAVE_OUTPUT_FILENAME)[0].strip()
filename = f"{base_filename}.wav"
save_dir = args.savePath.strip()
os.makedirs(save_dir, exist_ok=True)  # the output directory may not exist yet
full_save_path = os.path.join(save_dir, filename)
print(f"Saving to {full_save_path}")

with wave.open(full_save_path, 'wb') as wf:
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(2)  # 16-bit resolution, matching dtype='int16'
    wf.setframerate(SAMPLE_RATE)
    wf.writeframes(np.concatenate(recorder.frames).tobytes())

print("Done recording")
```

Hope this will be useful :) |
I am not convinced by just downgrading. I think there was a kernel bug between 5.4 and 5.10 where different parts of the hardware were initialized and deinitialized in the wrong order. As this issue is sensitive to how the driver is initialized, that bug may give better sync precisely because it is wrong... Anyway, I haven't seen a "convincing" answer yet, just a lot of voodoo, i.e. "dance naked under the next full moon in an open grass field and your device will work" :-). |
Essentially you are bit-banging TDM mode over an I2S channel that doesn't support hardware TDM. From Espressif to TI you can get devices that do or don't support hardware TDM, and none suggest bit-banging TDM on a standard I2S. Here is an old Cirrus Logic app note, https://gab.wallawalla.edu/~larry.aamodt/engr432/cirrus_logic_TDM_AN301.pdf, "Time Division Multiplexed Audio Interface: A Tutorial". As far as I can gather, the L/R clock in TDM mode is not an L/R clock but the frame sync. Yes, many things can be bit-banged without hardware support, but there are reasons why these have specific hardware needs to run as expected. The app note clearly states how the frame synchronization pulse should be timed, and it is the same with all hardware TDM I2S: the L/R clock works in a totally different manner, because it isn't an L/R clock but a pulse denoting the start of the multichannel frame... |
Describe the bug
To Reproduce
Steps to reproduce the behavior:
arecord -D hw:CARD=seeed8micvoicec,DEV=0 -d 3 -r 48000 -c 8 -f s32_le test.wav
Expected behavior
The reference channels should always be channels 6 and 7 (counting from 0).
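One way to check this expectation on a given capture is to load the `test.wav` produced by the arecord command above (8 channels, S32_LE) with Python's `wave` module and NumPy and list the most energetic channels. This sketch assumes something is being played back during the capture, so that the reference channels carry signal while the microphones are comparatively quiet; that is an assumption, and in a noisy room an RMS ranking can pick the wrong channels. `load_wav_s32` and `loudest_channels` are hypothetical helper names.

```python
import wave

import numpy as np


def load_wav_s32(path_or_file):
    """Read a 32-bit PCM WAV into an int32 array of shape (frames, channels)."""
    with wave.open(path_or_file, "rb") as wf:
        if wf.getsampwidth() != 4:
            raise ValueError("expected 32-bit samples (arecord -f S32_LE)")
        n_ch = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    # WAV samples are little-endian and interleaved frame by frame
    return np.frombuffer(raw, dtype="<i4").reshape(-1, n_ch)


def loudest_channels(samples, n=2):
    """Indices (ascending) of the n channels with the highest RMS level."""
    rms = np.sqrt(np.mean(samples.astype(np.float64) ** 2, axis=0))
    return sorted(int(i) for i in np.argsort(rms)[-n:])
```

With playback running, `loudest_channels(load_wav_s32("test.wav"))` would be expected to return `[6, 7]`; any other pair suggests the channel order has shifted for that recording.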
Platform
Relevant log output
No response