LNL HDA alsabat capture failed, "Peak freq too low" #9164
Comments
@fredoh9 how many jack connectors do we have on the RVP? Is the same 3.5mm jack connector used for the HDaudio and SoundWire reworks? |
Only one jack. I don't remember any RVP with more than one jack.
I don't know. I looked at |
I just noticed someone disabled ba-lnlm-rvp-hda-02, on which the two failures were found... |
@plbossart We have only one jack on the RVP; I don't remember any RVP with multiple jacks. @marc-hb For LNLM_RVP_HDA, the RVP doesn't have an HDA codec, so we attached an external AIOC. Hence the jack used for the HDA codec is the one on the external AIOC. |
From @ssavati : silicon being upgraded. |
@marc-hb The ba-lnlm-rvp-hda-02 board is now up with the silicon upgrade. It uses community firmware. |
Very recent one today:
|
Got one reproduction of this. I had to run the test 200+ times to hit the problem, so the occurrence rate would seem to be below 1%. In the one failing case, the captured bat.wav looks good but the analysis fails, so it is not clear yet what is happening, but the error looks similar to the one in the original report. @jsarha, if you can hit it, please add data. @marc-hb is this roughly in line with the occurrence rate you've seen? |
I would guess the reproduction rate is somewhere between 0.1% and 5% so, yes: 1% falls in that range :-) |
Did 249 more runs on ba-lnlm-rvp-hda-02, no failures. I'll pick a machine again tomorrow and run for a couple of hours more. It bothers me a bit that there is an obvious glitch in the audio in both of the referenced occurrences, but they are not at all similar. It's very hard to imagine a common cause for both of them. |
Here is one more occurrence, very much like the one in #9164 (comment). A gap of around 0.2 seconds (one a bit longer, the other shorter) in the middle of the capture, then a little glitch ~0.015s before the signal starts to come back in a ramp. |
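For long dropouts like this one, scanning the sample values of bat.wav for runs of near-silence can locate the gap without opening a waveform editor. The following is only a minimal sketch on synthetic data: `find_silent_gaps`, the `eps` and `min_gap_s` thresholds and the 48 kHz rate are assumptions made for illustration, not part of alsabat or sof-test. A real capture would first have to be decoded from the WAV file and scaled to [-1, 1], and the gap there may contain noise rather than digital zero.

```python
import math

def find_silent_gaps(samples, rate_hz, eps=1e-3, min_gap_s=0.05):
    """Return (start_seconds, length_seconds) for each run of near-silent samples.

    Runs shorter than min_gap_s are ignored, so ordinary zero crossings
    of the sine wave are not reported.
    """
    gaps = []
    start = None
    for i, s in enumerate(samples):
        if abs(s) < eps:
            if start is None:
                start = i          # a quiet run begins here
        else:
            if start is not None and (i - start) / rate_hz >= min_gap_s:
                gaps.append((start / rate_hz, (i - start) / rate_hz))
            start = None
    # A quiet run that lasts until the end of the capture
    if start is not None and (len(samples) - start) / rate_hz >= min_gap_s:
        gaps.append((start / rate_hz, (len(samples) - start) / rate_hz))
    return gaps

# Synthetic stand-in for a capture: 1 s of an 821 Hz sine with a 0.2 s hole.
RATE = 48000   # assumed sample rate
FREQ = 821     # test tone used in this thread (-F 821)
signal = [0.5 * math.sin(2 * math.pi * FREQ * n / RATE) for n in range(RATE)]
signal[24000:33600] = [0.0] * 9600
gaps = find_silent_gaps(signal, RATE)
for start_s, length_s in gaps:
    print(f"gap at {start_s:.3f}s, length {length_s:.3f}s")
```

This only catches the "long gap" failure type; it will not see a short phase jump where the signal never goes quiet.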
That was a bit under 300 test runs this morning, and one occurrence. |
occurrence_2024-06-14-10.zip Here are three more occurrences, but I do not think it's the same issue. They look more like test setup failures to me. The bat.wav files look perfectly OK, but for some reason the validation fails. |
I'm not familiar with alsabat but I've been told that it is much more sensitive than the human eye or even ear. |
I did not try to analyze it with my ears, but did frequency analysis with Audacity, and the frequency peak was there exactly in the right place, with no other local peaks. |
There's a small glitch in occurrence_2024-06-14-11-46.zip around 0.6740, which explains the alsabat fail. |
occurrence-2024-06-17-11-04.zip Two more cases. One has no immediately obvious fault, but probably has some subtle discontinuity in the sine wave somewhere. The other has the obvious gap pattern. The gap is a bit wider this time, about 700ms. |
Can you please summarize how you found (with Audacity?) what @jsarha didn't? |
I think the same tools can be used. I missed this as well at first, as my expectation was a big visible gap or a repeating glitch pattern. I noticed the glitch when listening to the file. Then I did frequency analysis in Audacity on small segments to narrow the search space further, and found the exact point by manual analysis of the sample values (zooming into the waveform display and/or exporting the sample values to a text file). |
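The last manual step (hunting for the exact sample where the sine breaks) could also be scripted. One possible approach, sketched below under stated assumptions, relies on the fact that a pure sine of known frequency obeys the recurrence x[n] = 2·cos(ω)·x[n-1] − x[n-2], so any sample that deviates from that prediction marks a discontinuity. The function name `find_discontinuities`, the `threshold` value and the 48 kHz rate are made up for illustration; this is not how alsabat itself analyzes the signal, and on a real capture the threshold would have to sit above the noise and quantization floor.

```python
import math

def find_discontinuities(samples, freq_hz, rate_hz, threshold=0.01):
    """Return sample indices that do not fit a pure sine of freq_hz.

    Uses the two-term sine recurrence x[n] = k*x[n-1] - x[n-2],
    with k = 2*cos(2*pi*freq_hz/rate_hz).
    """
    k = 2.0 * math.cos(2.0 * math.pi * freq_hz / rate_hz)
    hits = []
    for n in range(2, len(samples)):
        predicted = k * samples[n - 1] - samples[n - 2]
        if abs(samples[n] - predicted) > threshold:
            hits.append(n)
    return hits

# Synthetic stand-in for a capture: 821 Hz sine with 24 samples (0.5 ms) lost.
RATE = 48000   # assumed sample rate
FREQ = 821     # test tone used in this thread (-F 821)
clean = [0.5 * math.sin(2 * math.pi * FREQ * n / RATE) for n in range(RATE)]
glitched = clean[:20000] + clean[20024:]   # drop samples 20000..20023
hits = find_discontinuities(glitched, FREQ, RATE)
if hits:
    print(f"first discontinuity at sample {hits[0]} ({hits[0] / RATE:.4f}s)")
```

Because the check is local (three consecutive samples), it reports the splice point itself rather than a whole analysis window, which is what makes small glitches easy to miss in a segment-wise frequency plot.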
Not sure how much light this sheds on the issue, but I first ran 673 successful rounds of the alsabat test [1] using sof-hda-benchmark-gain32.tplg, and then quit the test script without a single failure. Then I restored the original sof-hda-generic-ace1-4ch.tplg and was able to run 263 rounds before I hit the error. BUT, the error I hit is of a completely new class. The signal is simply cut off in the middle of the sample, and it does not resume. This was all with FW commit 3da8e64. The test logs and the failed test case are in the attached zip. testlogs-and-failed-testcase.zip
[1] TPLG=/lib/firmware/intel/sof-ipc4-tplg/sof-hda-generic-ace1-4ch.tplg MODEL=LNLM_RVP_HDA SOF_TEST_INTERVAL=5 ~/sof-test/test-case/check-alsabat.sh -p hw:sofhdadsp,0 -c hw:CODEC,0 -C 2 -F 821 |
https://sof-ci.01.org/linuxpr/PR5075/build3763/devicetest/index.html Also daily test run 42929?model=LNLM_RVP_NOCODEC&testcase=check-alsabat-nocodec-32bits-599 |
Oh, this is yet another new type of failure. There is a gap of only ~6ms and, a bit suspiciously, the signal continues from exactly the same phase after the gap. Buffer under-run somewhere? There are a couple of FW log messages like these in the middle of the test log (so not at setup or tear-down time):
|
Today I ran another 680 rounds of the alsabat test with sof-hda-benchmark-gain32.tplg, to be sure that I did not just get lucky last Friday. E.g. run again this test: To complete this test, I ran another 230 rounds with the same daily build (20240623/sof-28a5265568a8-1), this time with the standard generic-ace1-4ch topology, and hit the "phase shift" error again [1]. It starts to look like the issue does not show up with the gain widget only. I'll try some other benchmark topologies next, to see if I can find the problematic widget that way. |
I became suspicious about last week's findings and decided to try them again with enough cycles to know with reasonable certainty in which configurations the issue happens and in which it does not. So I ran the test again using the following configurations:
This now starts to point either to the HDA interface or to the USB audio device that is used for capture in the DUT. It would be nice to be able to test this with a loop-back cable from the RVP line-out to line-in. [1] alsabat-benchmark-gain32.zip |
@jsarha when I look at the captured waveform for DSPless I don't see anything suspicious; there's one nice sine wave with a 0.08 FS value? |
@plbossart there is an obvious glitch at 1.313s: |
@jsarha indeed, this looks like a 0.5ms loss of data. |
To pull together all findings: the phase shift glitch is seen on all of these configurations but one. The reproduction rate varies between 1/200 and 1/1000. The nocodec test ran for more than 1200 cycles (about 8 hours) without a glitch.
One thing common to all failed setups is the USB audio device doing the capture. We should still do one more test with an analog loop-back cable connected back to the SOF HDA audio device to rule out a USB audio failure. The other failure type is the long gap (hundreds of milliseconds) followed by a ramp when the test signal resumes. My theory there is that the test script, which starts the playback and capture parts asynchronously, somehow gets the timing wrong, and the capture catches the gap between the tests. |
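How much a clean run of N cycles actually rules out depends on the underlying failure rate. As a rough sanity check, and assuming every test cycle fails independently with the same probability (an assumption that may not hold on real hardware), the chance of seeing zero failures in N runs is (1 − p)^N. The helper names below are hypothetical and only illustrate the arithmetic for the rates quoted above.

```python
import math

def prob_no_failure(rate, runs):
    """Probability of zero failures in `runs` independent runs at failure rate `rate`."""
    return (1.0 - rate) ** runs

def runs_for_confidence(rate, confidence=0.95):
    """Smallest number of clean runs needed to see at least one failure
    with probability `confidence`, if the true failure rate is `rate`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - rate))

# Rates and run count quoted in this thread.
p_clean_at_1_in_200 = prob_no_failure(1 / 200, 1200)
p_clean_at_1_in_1000 = prob_no_failure(1 / 1000, 1200)
print(f"P(1200 clean runs | rate 1/200)  = {p_clean_at_1_in_200:.4f}")
print(f"P(1200 clean runs | rate 1/1000) = {p_clean_at_1_in_1000:.4f}")
print(f"clean runs needed at 1/1000 for 95%: {runs_for_confidence(1 / 1000)}")
```

Under that independence assumption, 1200 clean nocodec cycles are strong evidence against a 1/200 rate but still leave a sizable chance that a 1/1000 glitch simply did not show up.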
Recent reproduction: https://sof-ci.01.org/softestpr/PR1217/build594/devicetest/index.html?model=LNLM_RVP_HDA&testcase=check-alsabat-headset-capture-599 EDIT: we never had any such issues with our USB audio setup before LNL. |
This looks very similar on... MTL: |
@marc-hb Is it possible to get logs from IPC payloads? |
IPC logs should be enabled on LNL. Where are they missing? |
Are IPC logs missing in today's failure https://sof-ci.01.org/softestpr/PR1224/build686/devicetest/index.html?model=LNLM_RVP_HDA&testcase=check-alsabat-headset-playback-997? |
@marc-hb these logs will be good, thanks! |
We can stress this after upgrading to new loopback dongles. |
@lgirdwood do you know when we can expect the upgrade? |
Probably Q4. |
Keeping this open, although it likely has the same root cause as #9449. Let's see another week of daily plan results and conclude then. |
Not seen in PR/daily tests for a week, closing. |
Originally posted by @marc-hb in #9123 (comment)
https://sof-ci.01.org/sofpr/PR9151/build4789/devicetest/index.html?model=LNLM_RVP_HDA&testcase=check-alsabat-headset-capture-997 (ba-lnlm-rvp-hda-02)