
CAF-Annotator

CAF-Annotator is an annotation tool designed to help researchers and students in the field of Second Language Acquisition (SLA) efficiently transcribe, annotate, and compute CAF (Complexity, Accuracy, and Fluency) measures for audio files.

Motivation

Currently, when extracting CAF measures from audio files, researchers often use a complex workflow involving multiple tools: Praat for annotation, a separate tool for transcription, and tools like Coh-Metrix to calculate transcription-based measures. This fragmented workflow can be time-consuming and cumbersome. CAF-Annotator aims to simplify and streamline this process by integrating the entire workflow into a single web application, enabling researchers to perform transcription, annotation, and computation tasks within a unified interface.

Features

  • User-friendly interface for annotating audio files
  • Automatic transcription, annotation, and pause detection
  • Cross-platform compatibility

Requirements

  • Python 3.9 or greater

Getting started

  1. Clone the repository:

    git clone https://github.com/0ldriku/CAF-Annotator.git
  2. Navigate to the project directory:

    cd CAF-Annotator
  3. Create a virtual environment:

    python -m venv .venv
  4. Activate the virtual environment:

    • For Linux/Mac:

      source .venv/bin/activate
    • For Windows:

      .venv\Scripts\activate
  5. Install the required dependencies:

    pip install -r requirements.txt
  6. Run the application:

    python app.py
  7. Access the application through your web browser at http://localhost:5000 (or the appropriate URL).

Compatibility

Works best with Chrome and Edge. Safari is not recommended.

Workflow Instructions

  1. Prepare Audio Files and Transcriptions:

    • Place your audio files in /files. Then click the transcribe button to transcribe the audio files. The transcriptions will be stored in /results/transcriptions.
      • If your ethics review allows you to upload audio files to the internet, you can use this Google Colab notebook. However, make sure to carefully consider the privacy implications and get explicit permission before uploading any audio data, especially if it contains personally identifiable information or sensitive content.
  2. Segment Text:

    • Segment the text into the units you want to analyze.
    • You can't change the words in this step. You can only add or remove lines.
    • Auto segmentation can be done with NLP tools like Stanza or spaCy, but they may not be as accurate for spoken language as they are for written text. Therefore, I decided to let the user manually adjust the segments for now. If you want, you can use those tools to segment the text and load the data in the next steps; a minimal spaCy sketch is included after this list.
  3. Adjust Segment Boundaries:

    • Load the transcribed audio.
    • Load the segmentation. After selecting the audio file, the system will automatically detect the segmentation files. Simply click the "Load Small/Big Segments" button to proceed.
    • Since the faster-whisper tool is not perfect, you may need to add or delete regions or adjust segment boundaries.
  4. Edit Text:

    • Edit the text in track 2 if necessary. The text in track 3 is not used to compute the CAF measures, so it does not require editing.
  5. Pause Detection:

    • Press the pause detection button to automatically detect pause durations.
    • Verify the accuracy of detected pauses and adjust manually if necessary.
  6. Dysfluency Detection:

    • Press the dysfluency detection button to automatically detect dysfluencies.
    • Verify the accuracy of detected dysfluencies and adjust manually if necessary.
    • Dysfluencies are detected based on the results of whisper-timestamped (see its GitHub repository). The tool may not detect all dysfluencies, so manual verification is necessary.
  7. Annotate Accuracy and Dysfluency:

    • Annotate the accuracy in track 5 and dysfluency in track 6.
  8. Save Annotations:

    • Save all annotations upon completion.
  9. Compute CAF Measures:

    • Load the annotations and compute the CAF measures.
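
If you want to bootstrap step 2 automatically, the sketch below uses spaCy's sentence segmenter to produce a big-segment file. This is only a starting point, not part of the app: the file names follow the conventions in the Data Structure section, the audio file name is hypothetical, and the one-segment-per-line output format is my reading of the segmentation step, so verify both against your own files.

    # Sketch: auto-generate a big-segment file from a transcription with spaCy.
    # Requires: pip install spacy && python -m spacy download en_core_web_sm
    # "sample.wav" is a hypothetical file name.
    import spacy

    nlp = spacy.load("en_core_web_sm")

    with open("results/transcriptions/sample.wav.subtitles.txt", encoding="utf-8") as f:
        text = f.read()

    # One sentence per line, words unchanged (step 2 only allows adding/removing lines).
    segments = [sent.text.strip() for sent in nlp(text).sents if sent.text.strip()]

    with open("results/textfiles/sample.wav.bigsegment.txt", "w", encoding="utf-8") as f:
        f.write("\n".join(segments))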

About the Tracks in the Annotation Step

Annotation Screenshot

  • Track 1: Waveform

    • Displays the waveform of the audio.
  • Track 2: Small Segments Annotation

    • Used for annotating small segments of the audio, such as clauses or any other user-defined segments.
  • Track 3: Large Segments Annotation

    • Used for annotating large segments of the audio, which can include sentences, AS-units, or any other user-defined segments.
  • Track 4: Pause Duration Annotation

    • CAF-Annotator supports automatic pause detection. Users can click the "Auto Detect Pause" button to automatically detect pause durations. Pauses within small segments are labeled as "M," and pauses between large segments are labeled as "E." Users can also manually annotate pause durations by clicking the "+ PAUSE" button. An illustrative sketch of this labeling scheme follows this list.
  • Track 5: Accuracy Annotation

    • This track is designated for annotating accuracy in the audio. Currently, this can only be done manually by the user.
  • Track 6: Dysfluency Annotation

    • This track is designated for annotating dysfluency. Currently, this can only be done manually by the user.
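
For illustration only, the "M"/"E" labeling scheme on Track 4 can be sketched from word-level timestamps as follows. This is not the app's implementation: the 0.25-second gap threshold, the data layout, and the simplification that every pause not at a large-segment boundary is mid-segment are all assumptions made for the example.

    # Sketch: classify silent gaps between consecutive words as mid-segment ("M")
    # or end-of-large-segment ("E") pauses. Threshold and layout are assumptions.
    PAUSE_THRESHOLD = 0.25  # seconds; set this to your own research criterion

    def label_pauses(words, large_segment_ends):
        """words: list of (text, start, end); large_segment_ends: word indices
        that close a large segment."""
        pauses = []
        for i in range(len(words) - 1):
            gap = words[i + 1][1] - words[i][2]  # next word's start minus current word's end
            if gap >= PAUSE_THRESHOLD:
                label = "E" if i in large_segment_ends else "M"
                pauses.append({"start": words[i][2], "end": words[i + 1][1], "label": label})
        return pauses

    words = [("so", 0.00, 0.31), ("I", 0.35, 0.42), ("went", 1.10, 1.45), ("home", 1.50, 1.95)]
    print(label_pauses(words, large_segment_ends={1}))
    # -> [{'start': 0.42, 'end': 1.1, 'label': 'E'}]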

Data Structure

  • /files
    Place your audio files here.

  • /results/transcriptions
    Contains transcriptions of the audio files.

    • [filename].[extension].transcribe.json: File with word-level timestamps.
    • [filename].[extension].subtitles.txt: Plain text file containing only text without timestamps.
  • /results/textfiles
    Contains segmented files.

    • [filename].[extension].bigsegment.txt: Contains big segments of text.
    • [filename].[extension].smallsegment.txt: Contains small segments of text.
  • /results/matchedjson
    Contains JSON files with word-level timestamps of segmented files.

    • [filename].[extension].bigsegment.matched.json: Big segment file with timestamps.
    • [filename].[extension].smallsegment.matched.json: Small segment file with timestamps.
  • /results/adjustedRegions

    • [filename].[extension].regionData.json: Contains all annotations added in Step 2 (Annotate). Computations in Step 3 are based on this file.
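
For example, after a hypothetical recording files/sample.wav has been transcribed, segmented, and annotated, the naming conventions above produce:

    files/sample.wav
    results/transcriptions/sample.wav.transcribe.json
    results/transcriptions/sample.wav.subtitles.txt
    results/textfiles/sample.wav.bigsegment.txt
    results/textfiles/sample.wav.smallsegment.txt
    results/matchedjson/sample.wav.bigsegment.matched.json
    results/matchedjson/sample.wav.smallsegment.matched.json
    results/adjustedRegions/sample.wav.regionData.json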

Supported CAF Measures

For academic purposes, it is important to review the computation methods used. Please check the compute_caf_values() function defined in app.py for the detailed implementation.

The results of the annotation process are saved in JSON format. This file contains the transcriptions, timestamps, and annotations of each audio track. Although the tool offers basic CAF metrics, current Natural Language Processing (NLP) tools may not adequately capture the unique linguistic features and errors typical of L2 learners' speech. Users can implement and apply their preferred methods and metrics based on the JSON file.

  • Syntactic complexity

    • Mean length of clause: The mean number of words produced per segment.
  • Speed fluency

    • Speech rate: The total number of words produced divided by the total audio duration (words per second).
    • Articulation rate: The total number of words produced divided by the total phonation time (i.e., total speech duration excluding pauses).
  • Breakdown fluency (if the user segments the small segments as clauses)

    • Mid-clause pause ratio: The total number of unfilled pauses within clauses divided by the total number of words.
    • Final-clause pause ratio: The total number of unfilled pauses between clauses divided by the total number of words.
    • Mid-clause pause duration: Mean duration of pauses within clauses, expressed in seconds.
    • Final-clause pause duration: Mean duration of pauses between clauses, expressed in seconds.
  • Repair fluency

    • Dysfluency rate: The total number of dysfluencies divided by the total speech duration (dysfluencies per second). Dysfluencies are detected by whisper-timestamped, but the accuracy is not guaranteed.
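
As a minimal illustration of two of these computations, the sketch below derives mean length of clause from a *.smallsegment.txt file (one clause per line) and speech rate from the word count and audio duration. The file name and the 60-second duration are hypothetical; the authoritative implementation is compute_caf_values() in app.py.

    # Sketch: mean length of clause and speech rate from a small-segment file.
    # File name and audio duration are placeholders; see compute_caf_values()
    # in app.py for the tool's actual computation.
    with open("results/textfiles/sample.wav.smallsegment.txt", encoding="utf-8") as f:
        clauses = [line.split() for line in f if line.strip()]

    total_words = sum(len(clause) for clause in clauses)
    mean_length_of_clause = total_words / len(clauses)  # words per clause
    speech_rate = total_words / 60.0                    # words per second (60 s audio assumed)

    print(f"MLC: {mean_length_of_clause:.2f}, speech rate: {speech_rate:.2f} words/s")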

CAF Measures in Progress

I plan to implement the following measures on a separate page. Since initializing the NLP tools can be time-consuming, keeping them on their own page avoids slowing down the main workflow.

Important Note: The labeling of content words and the total word count heavily depend on the configuration of the NLP tools. I strongly recommend verifying the labeling results before proceeding with the analysis.

  • Syntactic complexity

    • Mean length of noun phrase: The mean number of words per noun phrase.
  • Lexical complexity

    • Measure of textual lexical diversity (MTLD): The mean length of sequential word strings in a text that maintain a given type-token ratio value.
    • CELEX log frequency: The averaged logarithmic frequency of content words produced in a text based on the CELEX corpus.
    • Lexical density: The proportion of content words to the total words produced.
  • Accuracy

    • I have not yet found a way to automatically detect accuracy; for now, the user can only annotate accuracy manually.
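
As one example of what this page could compute, lexical density can be sketched with spaCy's part-of-speech tags. Treating NOUN, VERB, ADJ, and ADV as content words is an assumption (one common convention); as noted above, verify the labeling against your own coding scheme.

    # Sketch: lexical density = content words / total words, via spaCy POS tags.
    # The content-word tag set is an assumption; adjust it to your coding scheme.
    import spacy

    nlp = spacy.load("en_core_web_sm")
    CONTENT_POS = {"NOUN", "VERB", "ADJ", "ADV"}

    doc = nlp("well I went to the store and bought some really fresh bread")
    tokens = [t for t in doc if t.is_alpha]
    content = [t for t in tokens if t.pos_ in CONTENT_POS]
    print(f"lexical density: {len(content) / len(tokens):.2f}")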

CPU Usage in the Transcription Step

Many researchers in this field may not have a GPU or CUDA installed, so the default setting for whisper-timestamped is to use CPU mode for audio transcription. On an Apple M1 Max, it takes about 3 minutes to transcribe a 1-minute audio file. However, if your CPU is not very powerful, transcribing audio can be time-consuming. For those without a powerful computer, or who prefer not to strain their system, I recommend using Google Colab, which provides free access to T4 GPUs. I have created a Google Colab notebook that implements this feature. You can access the notebook here.
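
For reference, CPU-mode transcription with whisper-timestamped looks roughly like the sketch below. The model size and audio path are placeholders; the app wires this up for you, so this is only for readers who want to run the step by hand.

    # Sketch: CPU transcription with whisper-timestamped (the app's default mode).
    # Model size and audio path are placeholders.
    import whisper_timestamped as whisper

    audio = whisper.load_audio("files/sample.wav")
    model = whisper.load_model("base", device="cpu")
    result = whisper.transcribe(model, audio)

    for segment in result["segments"]:
        for word in segment["words"]:
            print(word["text"], word["start"], word["end"])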

Experimental Features

Gemini API Integration

The AI segmentation feature is experimental, and its effectiveness heavily depends on the prompt provided. I have included an example prompt that you can use to segment text into clauses or sentences; modify it according to your needs. To use this feature, you need access to the Gemini API. Google offers free usage of Gemini; for details, please visit this link. To integrate the API, enter your API key in the /static/js/geminiapi.js file.
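
The app makes this call from /static/js/geminiapi.js, but the idea can be sketched in Python with Google's google-generativeai package. The model name and prompt wording below are illustrative assumptions, not the app's bundled prompt.

    # Sketch: asking Gemini to segment a transcript into clauses, one per line.
    # Model name and prompt are assumptions; the app performs the equivalent
    # call from /static/js/geminiapi.js using the user's API key.
    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")
    model = genai.GenerativeModel("gemini-1.5-flash")

    prompt = (
        "Segment the following transcript into clauses. "
        "Output one clause per line and do not change any words:\n\n"
        "well I went to the store and bought some bread because I was hungry"
    )
    print(model.generate_content(prompt).text)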

Acknowledgements

Transcription

Annotation

Google Colab Notebook

Contributing

Contributions are welcome! If you have any suggestions, bug reports, or feature requests, please open an issue or submit a pull request.

License

This project is licensed under the MIT License.

Contact

For any questions or inquiries, please feel free to post here or contact me.