Updating README with citations and links to papers #47

Merged 1 commit on Sep 6, 2024
29 changes: 27 additions & 2 deletions coraal-multi/README.md
@@ -9,9 +9,34 @@ The multi-coraal dataset is a 10-hour subset of [CORAAL](https://oraal.uoregon.e
# Tooling
For alignment and WER calculation, we used the [fstalign](https://github.com/revdotcom/fstalign/tree/master) tool. We strongly recommend this tool for getting started quickly with the *Earnings-21* dataset.
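
For readers unfamiliar with the metric, WER (word error rate) is the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The following is a minimal Python sketch of that definition only. It is not how fstalign is implemented, it applies no text normalization, and the function name `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count.

    Illustrative only: words are split on whitespace and compared exactly,
    with no normalization of case, punctuation, or spelling variants.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion (reference word missing)
                    curr[j - 1] + 1,     # insertion (extra hypothesis word)
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical sentences, not taken from the dataset.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

A dedicated alignment tool remains the better choice for real evaluation, since this sketch reports a single number and gives no per-word alignment.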

To find the recordings we used for this work, see the [DataSelection Jupyter Notebook](https://github.com/revdotcom/speech-datasets/blob/main/coraal-multi/code/DataSelection.ipynb), which shows our data selection process as well as the files that we ultimately selected for transcription. The audio files can then be downloaded from the official [CORAAL](https://oraal.uoregon.edu/coraal) website.


# Cite this Dataset
This dataset has been accepted to Interspeech 2024.
The paper describing our methods and results can be found on [arXiv](https://arxiv.org/abs/2409.03059) and on the [ISCA Archive](https://www.isca-archive.org/interspeech_2024/heuser24_interspeech.html).

If you use this work, please cite the following:
## Text Citation
```
Heuser, A., Kendall, T., del Rio, M., McNamara, Q., Bhandari, N., Miller, C., Jetté, M. (2024) Quantification of stylistic differences in human- and ASR-produced transcripts of African American English. Proc. Interspeech 2024, 4538-4542, doi: 10.21437/Interspeech.2024-2300
Kendall, Tyler and Charlie Farrington. 2023. The Corpus of Regional African American Language. Version 2023.06. Eugene, OR: The Online Resources for African American Language Project. [https://doi.org/10.7264/1ad5-6t35]
```

## BibTeX Citation
```
@inproceedings{heuser24_interspeech,
title = {Quantification of stylistic differences in human- and ASR-produced transcripts of African American English},
author = {Annika Heuser and Tyler Kendall and Miguel {del Rio} and Quinn McNamara and Nishchal Bhandari and Corey Miller and Migüel Jetté},
year = {2024},
booktitle = {Interspeech 2024},
pages = {4538--4542},
doi = {10.21437/Interspeech.2024-2300},
}
@article{kendall2018corpus,
title={The corpus of regional {A}frican {A}merican {L}anguage},
author={Kendall, Tyler and Farrington, Charlie},
journal={Version 2023.06},
year={2023}
}
```