
2023 ICASSP Clarity Challenge for Speech Enhancement with Hearing Aids

@author daniel.oh

@date 2023.03.30

1. Summary

This repository is for the 2023 ICASSP Clarity Challenge for speech enhancement with hearing aids. It uses two additional repositories: an ML model for speech enhancement and the hearing-aid modules. The main model is Conv-TasNet trained with Permutation Invariant Training (PIT), as sketched below.
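For reference, here is a minimal sketch of utterance-level PIT (uPIT, see 3.1) with an SI-SNR objective; the function names and tensor shapes are illustrative, not this repository's actual training code.

    # Minimal uPIT sketch with an SI-SNR objective. Shapes and names are
    # illustrative, not this repository's actual API.
    import itertools

    import torch

    def si_snr(est, ref, eps=1e-8):
        """Scale-invariant SNR in dB over the last (time) axis."""
        ref = ref - ref.mean(dim=-1, keepdim=True)
        est = est - est.mean(dim=-1, keepdim=True)
        proj = (est * ref).sum(-1, keepdim=True) * ref / (ref.pow(2).sum(-1, keepdim=True) + eps)
        noise = est - proj
        return 10 * torch.log10(proj.pow(2).sum(-1) / (noise.pow(2).sum(-1) + eps) + eps)

    def upit_loss(est, ref):
        """est, ref: (batch, n_src, time); pick the best permutation per utterance."""
        n_src = est.shape[1]
        per_perm = [
            -si_snr(est[:, list(perm), :], ref).mean(dim=1)  # (batch,) per permutation
            for perm in itertools.permutations(range(n_src))
        ]
        return torch.stack(per_perm, dim=1).min(dim=1).values.mean()

The permutation is chosen once per utterance rather than per frame, which is what the "utterance-level" in uPIT refers to.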

Denoised S03603 wav file

2. Baseline

Before starting, clone https://github.com/ooshyun/Speech-Enhancement-Pytorch into the "mllib" folder, and add ./recipes/icassp_2023/MLbaseline.

2.1 Evaluate (clarity/recipes/icassp_2023/baseline)

  • clarity/recipes/icassp_2023/baseline/enhance.py: processes each mixture, selected by scene name, into a denoised signal and saves it as a wav file. Participants should implement their own enhancement mechanism in this file (a minimal sketch follows this list).

  • clarity/recipes/icassp_2023/baseline/evaluate.py: amplifies and compresses each enhanced signal, then scores it against the clean and "anechoic" (dereverberated) signals using the HASPI/HASQI metrics. The scores are saved as .csv files.

  • clarity/recipes/icassp_2023/baseline/report.py: loads the .csv files and averages the HASPI/HASQI scores.
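The sketch below illustrates that enhance-then-write flow; the directory layout, the "*_mix_*.wav" pattern, and the denoise callable are illustrative stand-ins, not the baseline's actual interfaces.

    # Minimal sketch of the enhance step. Paths, the file pattern, and
    # denoise() are illustrative stand-ins, not the baseline's API.
    from pathlib import Path

    import soundfile as sf

    def enhance_all(scene_dir: Path, out_dir: Path, denoise) -> None:
        out_dir.mkdir(parents=True, exist_ok=True)
        for mix_path in sorted(scene_dir.glob("*_mix_*.wav")):
            mix, sr = sf.read(mix_path)      # (time, channels)
            enhanced = denoise(mix, sr)      # participant's own enhancement
            out_name = mix_path.name.replace("mix", "enhanced")
            sf.write(out_dir / out_name, enhanced, sr)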

2.2 Dataset

  • S0XXXX: scene name

    • XXX_mix_XXX.wav: mixture of the target speaker and interfering speakers
    • XXX_interferer_XXX.wav: interfering speakers only
    • XXX_target_XXX.wav: clean target speech
    • XXX_anechoic_XXX.wav: anechoic (echo-free) target speech
  • L0XXXX: listener ID, used to load the corresponding hearing-loss profile

  • Tree of dataset folder

2.3 Pipeline

  1. Dataset
  2. Dataloader
  3. Trainer
  4. Model
  5. Evaluation
  6. Submission file
  • Used libraries: julius, librosa, torchaudio (a loading/cropping sketch follows)
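The sketch below illustrates the dataset/dataloader end of this pipeline with those libraries; the file name, the 16 kHz model rate, and the crop logic are assumptions for illustration (the segment length matches 3.1).

    # Illustrative loading + random 4 s crop; the file name and the 16 kHz
    # model rate are assumptions, not the repository's fixed choices.
    import julius
    import torch
    import torchaudio

    wav, sr = torchaudio.load("S03603_mix_CH1.wav")  # (channels, time)
    wav = julius.resample_frac(wav, sr, 16000)       # resample to an assumed 16 kHz
    segment = 4 * 16000                              # 4 s segment, as in 3.1
    start = torch.randint(0, max(1, wav.shape[-1] - segment), (1,)).item()
    crop = wav[..., start:start + segment]           # random crop of the waveform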

2.4 Model Research

3. Results

To test the three models, manually change the model parameters in ./mllib/src/model/conv_tasnet.py.

3.1 Parameters

  • Reference: https://github.com/JusperLee/Conv-TasNet/tree/9eac70d28a5dba61172ad39dd9fb90caa0d1a45f

  • 1 epoch: 316 + 56 steps

  • randomly cropping the waveform

  • dataset channels 0, 1, 2, 3

  • skip: False

  • segment: 4

  • normalization: z-score

  • utterance-level Permutation Invariant Training (uPIT)

  • Saved models

        Model ID          N    L   B    H    P  X  R  Norm  Causal  Batch
        20230220-100114   128  40  128  256  3  7  2  gLN   X       16
        20230221-231507   512  32  128  512  3  8  3  gLN   X       4
        20230223-140053   512  40  128  512  3  8  4  gLN   X       4
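As a reading aid, the sketch below maps these columns onto the standard Conv-TasNet hyperparameters (names follow Luo & Mesgarani, 2019). The import path and constructor signature are assumptions about this repository's API, and "Causal: X" is read here as non-causal.

    # Hedged sketch: instantiating the largest saved configuration. The import
    # path and constructor are assumptions, not the repository's confirmed API.
    from mllib.src.model.conv_tasnet import ConvTasNet  # hypothetical path

    model = ConvTasNet(
        N=512,         # encoder filters
        L=40,          # encoder filter length (samples)
        B=128,         # bottleneck channels
        H=512,         # channels inside each separation block
        P=3,           # kernel size inside each separation block
        X=8,           # conv blocks per repeat
        R=4,           # repeats
        norm="gLN",    # global layer norm, as in the table
        causal=False,  # "Causal: X" read as non-causal (assumption)
    )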
    

3.2 Denoise

The detailed results can be viewed in ./tensorboard. To denoise a noisy sound file, use inference.ipynb with the Conv-TasNet checkpoint in result/model; a minimal sketch of that flow follows.
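In the sketch below, the checkpoint path, the model construction, and the (batch, n_src, time) output layout are assumptions; inference.ipynb has the actual steps.

    # Minimal inference sketch under the assumptions stated above.
    import torch
    import torchaudio

    from mllib.src.model.conv_tasnet import ConvTasNet  # hypothetical path

    model = ConvTasNet(N=512, L=40, B=128, H=512, P=3, X=8, R=4, norm="gLN", causal=False)
    model.load_state_dict(torch.load("result/model/best.pt", map_location="cpu"))  # illustrative path
    model.eval()

    noisy, sr = torchaudio.load("S03603_mix_CH1.wav")   # illustrative file name
    with torch.no_grad():
        est_sources = model(noisy.unsqueeze(0))[0]      # assumed (n_src, time)
    torchaudio.save("S03603_enhanced.wav", est_sources[0:1], sr)  # keep the target estimate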

4. Concepts in Clarity Challenge

4.1 Amplification

  • NAL-R
  • Per-frequency gain and bias, per-frequency/HL bias, 0.31 × HL gain
  • 1-D interpolation (a gain sketch follows)
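A hedged sketch of NAL-R style insertion gains using textbook NAL-R constants (0.31 × HL, a three-frequency-average term, and per-frequency corrections); the baseline's exact tables and interpolation grid may differ.

    # Hedged NAL-R sketch: textbook constants, NOT necessarily the baseline's.
    import numpy as np

    cfs = np.array([250, 500, 1000, 2000, 4000, 6000])     # audiogram freqs (Hz)
    corr = np.array([-17.0, -8.0, 1.0, -1.0, -2.0, -2.0])  # per-frequency bias (dB)

    def nalr_gains(hl):
        """hl: hearing loss (dB HL) at cfs; returns prescribed gain per frequency."""
        x = 0.15 * np.mean(hl[1:4])  # 0.15 x three-frequency average (500/1k/2k Hz)
        return np.clip(x + 0.31 * hl + corr, 0.0, None)

    # "interpolate 1D": expand the 6 gains to a dense grid for filter design
    hl = np.array([30.0, 35.0, 40.0, 45.0, 50.0, 55.0])    # example audiogram
    freqs = np.linspace(0.0, 8000.0, 129)
    dense_gains = np.interp(freqs, cfs, nalr_gains(hl))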

4.2 Compression

  • attack 5, release 20, attenuation 0.0001, threshold 1, makeup gain 1
  • Compression depending on the RMS:

    if rms_i > self.threshold:
        # pull the gain toward the threshold, weighted by the attenuation
        temp_comp = (rms_i * self.attenuation) + (
            (1 - self.attenuation) * self.threshold
        )
        # smooth the change with the attack coefficient
        curr_comp = (curr_comp * (1 - self.attack)) + (temp_comp * self.attack)
    else:
        # relax back toward unity with the release coefficient
        curr_comp = (1 * self.release) + curr_comp * (1 - self.release)

    ...

    signal * np.array(comp_ratios) * self.makeup_gain
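For context, here is a self-contained version of that loop; the trailing RMS window and the treatment of attack/release as per-sample smoothing coefficients in (0, 1] are assumptions, not the baseline's exact unit handling.

    # Self-contained sketch of the RMS-triggered compressor above.
    # Assumptions: attack/release are already per-sample smoothing
    # coefficients, and rms is taken over a trailing window.
    import numpy as np

    def compress(signal, threshold=1.0, attenuation=1e-4,
                 attack=0.1, release=0.05, makeup_gain=1.0, win=256):
        curr_comp, comp_ratios = 1.0, []
        for i in range(len(signal)):
            frame = signal[max(0, i - win):i + 1]
            rms_i = np.sqrt(np.mean(frame ** 2))
            if rms_i > threshold:
                temp_comp = rms_i * attenuation + (1 - attenuation) * threshold
                curr_comp = curr_comp * (1 - attack) + temp_comp * attack
            else:
                curr_comp = release + curr_comp * (1 - release)
            comp_ratios.append(curr_comp)
        return signal * np.array(comp_ratios) * makeup_gain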

4.3 HASPI

  • x: original, y: amplified
  1. Ear Model

    • itype = 0, channels = 32
    1. Cochlear model parameters
    2. If itype = 0, HL = 0; otherwise HL = HL
    3. Resample to 24 kHz (Resamp24kHz; its result length differs from resampy's resample)
    4. Align the processed and original signals
    5. [HASQI] Amplify using NAL-R
    6. Cochlear model for the middle ear (LP 5000 Hz, HP 350 Hz)
    7. Auditory filter bank for each channel
      • Gammatone bands (middle ear, max HL BW)
      • Cochlear compression
  2. Envelope filter

  3. Cepstral Coefficients

  4. Modulation Filter

  5. Correlation

  6. Neural Feedforward (a conceptual sketch of steps 3 and 5 follows)
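Steps 3 and 5 carry the core similarity measurement. The sketch below is a conceptual illustration of cepstral coefficients plus correlation, not the baseline's exact implementation; the function names, cosine basis, and array shapes are assumptions.

    # Conceptual sketch: log band envelopes projected onto low-order cosine
    # bases, then matching coefficient tracks correlated across time.
    import numpy as np

    def cepstral_coeffs(log_env, n_cep=6):
        """log_env: (n_bands, n_frames) log envelopes -> (n_cep, n_frames)."""
        n_bands = log_env.shape[0]
        k = np.arange(n_bands)
        basis = np.array(
            [np.cos(np.pi * j * (k + 0.5) / n_bands) for j in range(n_cep)]
        )
        return basis @ log_env  # DCT-like projection, one vector per frame

    def cepstral_correlation(ref_env, proc_env, n_cep=6):
        """Mean normalized correlation of matching coefficient tracks."""
        r = cepstral_coeffs(ref_env, n_cep)
        p = cepstral_coeffs(proc_env, n_cep)
        corrs = []
        for j in range(1, n_cep):  # skip the DC-like 0th coefficient
            a = r[j] - r[j].mean()
            b = p[j] - p[j].mean()
            denom = np.sqrt((a ** 2).sum() * (b ** 2).sum()) + 1e-12
            corrs.append(float((a * b).sum() / denom))
        return float(np.mean(corrs))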

4.4 HASQI

  1. Ear Model: itype = 1 (NAL-R gain applied)
  2. Smoothing
  3. Mel correlation (cepstral correlation)
  4. Spectral difference (long-term spectra)
  5. Temporal correlation
  6. Segment cross-covariance
  7. Nonlinear/linear performance
  8. nonlinear × linear → max of the middle nonlinear term × the middle linear term (a schematic follows)
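Step 8 is the final combination; the schematic below shows only its multiplicative form. The 0.5 weights inside the linear term are placeholders, not the published HASQI v2 coefficients.

    # Schematic of HASQI's final combination (steps 7-8). The 0.5 weights are
    # PLACEHOLDERS, not the published HASQI v2 coefficients.
    def hasqi_combine(cep_corr, bm_sync, spec_diff, slope_diff):
        nonlinear = (cep_corr ** 2) * bm_sync                        # envelope terms
        linear = max(0.0, 1.0 - 0.5 * spec_diff - 0.5 * slope_diff)  # spectral terms
        return nonlinear * linear                                    # step 8: product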

5. Reference

The detailed references are given in each Python file.
