Neural-mask-estimation

Key features

- LSTM-based neural mask estimation for designing an MVDR beamformer [1, 4]
- On-the-fly data augmentation
- Pre-trained model
- Speaker-aware mask training [2]
- SNR-based reference microphone selection for MVDR [1, 4]
- Small-scale sample training data
  - You can run experiments on any data by replacing the sample data.
  - WHAM! noise data [2], LibriSpeech, and LJ Speech are included as sample noise and clean speech data.

How to use

Run generate_validate_data.py.
- Put the validation data (noise and clean speech) in ./dataset/validate/*
- This produces validation_features/speech_mask.npy, validation_features/noise_mask.npy, and validation_features/val_spec.npy
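
How the speech and noise masks are defined is not described here. As a minimal sketch, assuming ideal binary masks derived from the magnitude spectrograms of the clean speech and the noise, targets of this kind could be computed as follows. The function name `ideal_binary_masks` and the 0 dB threshold are hypothetical and may differ from what generate_validate_data.py actually does.

```python
import numpy as np

def ideal_binary_masks(speech_mag, noise_mag, threshold_db=0.0):
    """Return (speech_mask, noise_mask) for magnitude spectrograms of equal shape.

    A time-frequency bin is assigned to speech when its local SNR exceeds
    threshold_db, and to noise otherwise. This rule is an assumption made
    for illustration only.
    """
    eps = 1e-12  # avoids division by zero and log of zero in silent bins
    local_snr_db = 20.0 * np.log10((speech_mag + eps) / (noise_mag + eps))
    speech_mask = (local_snr_db > threshold_db).astype(np.float32)
    noise_mask = 1.0 - speech_mask
    return speech_mask, noise_mask

# Toy example: 2 frames x 3 frequency bins
speech = np.array([[1.0, 0.1, 2.0], [0.0, 0.5, 0.5]])
noise = np.array([[0.5, 0.5, 0.5], [0.5, 0.5, 0.1]])
speech_mask, noise_mask = ideal_binary_masks(speech, noise)
```

With binary masks the two outputs are complementary, so every bin belongs to exactly one of them.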
Run train.py.
- Put the training data (noise and clean speech) in ./dataset/train/*
- This produces model/neaural_mask_estimator{}.hdf5, where {} is the epoch number
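
On-the-fly data augmentation usually means building a new noisy mixture every time a training example is drawn. The sketch below shows one common way to do that, by mixing a clean utterance with a randomly cropped noise segment at a random SNR. The function name `mix_at_random_snr` and the SNR range are assumptions for illustration and are not taken from train.py.

```python
import numpy as np

def mix_at_random_snr(speech, noise, rng, snr_range_db=(-5.0, 15.0)):
    """Mix a clean utterance with a random noise segment at a random SNR.

    Returns (mixture, scaled_noise, snr_db). The default SNR range is an
    assumed example value.
    """
    # Tile the noise if it is shorter than the speech, then crop at a random offset
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    start = rng.randint(0, len(noise) - len(speech) + 1)
    noise = noise[start:start + len(speech)]

    snr_db = rng.uniform(*snr_range_db)
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    # Scale the noise so that 10*log10(speech_power / scaled_noise_power) == snr_db
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    scaled_noise = gain * noise
    return speech + scaled_noise, scaled_noise, snr_db

# Toy signals standing in for one clean utterance and one noise recording
rng = np.random.RandomState(0)
speech = rng.randn(16000)
noise = rng.randn(4000)
mixture, scaled_noise, snr_db = mix_at_random_snr(speech, noise, rng)
```

Returning the scaled noise alongside the mixture makes it possible to compute mask targets for the same example.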
Run predict.py.
- This performs mask estimation, designs the MVDR beamformer, and outputs the enhanced speech
- Put the multi-channel data to be beamformed in ./dataset/data_for_beamforming/*
- Results are written to ./result/*
  - enhencement_all_channels.wav is the result without channel selection
  - enhacement_snr_select.wav is the result with channel selection
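
The beamforming step can be sketched as below. This is a minimal NumPy illustration of a mask-based MVDR beamformer in the formulation w = (Φn⁻¹ Φs / trace(Φn⁻¹ Φs)) u_ref, with the reference microphone chosen as the one whose filter gives the highest estimated output SNR. It is a sketch under those assumptions, not the code in predict.py or the beamformer folder. The function name `mask_based_mvdr`, the diagonal loading value, and the way the SNR is pooled over frequency are all hypothetical.

```python
import numpy as np

def mask_based_mvdr(Y, speech_mask, noise_mask, diag_load=1e-6):
    """Mask-based MVDR with SNR-based reference microphone selection.

    Y: complex STFT of shape (channels, frames, freqs).
    speech_mask, noise_mask: shape (frames, freqs), values in [0, 1].
    Returns (enhanced STFT of shape (frames, freqs), reference mic index).
    """
    C = Y.shape[0]

    def masked_covariance(mask):
        # Phi[f, c, d] = sum_t mask[t, f] * Y[c, t, f] * conj(Y[d, t, f]) / sum_t mask[t, f]
        num = np.einsum('tf,ctf,dtf->fcd', mask, Y, Y.conj())
        return num / np.maximum(mask.sum(axis=0), 1e-8)[:, None, None]

    phi_s = masked_covariance(speech_mask)
    phi_n = masked_covariance(noise_mask)
    # Diagonal loading keeps the noise covariance well conditioned
    scale = np.trace(phi_n, axis1=1, axis2=2).real / C
    phi_n = phi_n + diag_load * scale[:, None, None] * np.eye(C)

    G = np.linalg.solve(phi_n, phi_s)  # Phi_n^{-1} Phi_s for every frequency
    # Column r of W[f] is the MVDR filter that uses microphone r as reference
    W = G / (np.trace(G, axis1=1, axis2=2)[:, None, None] + 1e-12)

    # Estimated output SNR of each candidate reference, pooled over frequency
    sig = np.einsum('fcr,fcd,fdr->r', W.conj(), phi_s, W).real
    noi = np.einsum('fcr,fcd,fdr->r', W.conj(), phi_n, W).real
    ref = int(np.argmax(sig / noi))

    enhanced = np.einsum('fc,ctf->tf', W[:, :, ref].conj(), Y)  # w^H y
    return enhanced, ref

# Toy data: one source active in the first half of the frames, plus sensor noise
rng = np.random.RandomState(0)
C, T, F = 3, 200, 5
active = np.zeros((T, F))
active[:T // 2] = 1.0
source = rng.randn(T, F) + 1j * rng.randn(T, F)
steering = rng.randn(C, 1, F) + 1j * rng.randn(C, 1, F)
noise = 0.5 * (rng.randn(C, T, F) + 1j * rng.randn(C, T, F))
Y = steering * (source * active) + noise
enhanced, ref = mask_based_mvdr(Y, active, 1.0 - active)
```

In practice the masks would come from the trained estimator and the enhanced STFT would be inverted back to a waveform before being written to ./result/.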

Speaker-aware mask estimation

Run adapt.py.
- Prepare a target speaker list and a non-target speaker list (e.g., sp1_list.txt, sp2_list.txt)
- This produces the speaker-aware model ./model/speaker_2.hdf5

Run speaker_aware_mask_predict.py.
- You can compare the mask results before and after adaptation
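
Beyond inspecting the masks visually, the before/after comparison can be summarized numerically. The helper below is a hypothetical sketch, not part of speaker_aware_mask_predict.py. It reports the mean absolute change per time-frequency bin and the fraction of bins whose speech/non-speech decision flips at an assumed threshold of 0.5.

```python
import numpy as np

def compare_masks(mask_before, mask_after, threshold=0.5):
    """Summarize how much a mask changed after speaker adaptation."""
    diff = mask_after - mask_before
    flipped = (mask_before > threshold) != (mask_after > threshold)
    return {
        "mean_abs_change": float(np.mean(np.abs(diff))),
        "flipped_ratio": float(np.mean(flipped)),
    }

# Toy masks: 2 frames x 2 frequency bins
before = np.array([[0.9, 0.2], [0.6, 0.4]])
after = np.array([[0.8, 0.1], [0.3, 0.7]])
stats = compare_masks(before, after)
```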

References:

[1] Exploring Practical Aspects of Neural Mask-Based Beamforming for Far-Field Speech Recognition
	- https://www.microsoft.com/en-us/research/uploads/prod/2018/04/ICASSP2018-Christoph.pdf

[2] WHAM!: Extending Speech Separation to Noisy Environments
	- https://arxiv.org/abs/1907.01160

[3] The Hitachi/JHU CHiME-5 System: Advances in Speech Recognition for Everyday Home Environments Using Multiple Microphone Arrays
	- http://spandh.dcs.shef.ac.uk/chime_workshop/papers/CHiME_2018_paper_kanda.pdf

[4] Improved MVDR Beamforming Using Single-Channel Mask Prediction Networks
	- https://www.merl.com/publications/docs/TR2016-072.pdf

Requirements:

- python 3.6.7+
- numpy 1.14.3
- soundfile 0.9.0
- pyroomacoustics 0.1.21
- librosa 0.6.2
- tensorflow 1.9.0
- scipy 1.2.0
- cython 0.25.2
- matplotlib 3.6.7

AkojimaSLP/Neural-mask-estimation
