Strange Behaviors #77

Open
Boetty opened this issue Nov 8, 2024 · 2 comments

Comments

Boetty commented Nov 8, 2024

Hi everyone, first of all: I really appreciate this code. It has helped me a lot. Thank you for that.

Issues:

  1. The model sometimes responds that it has been trained until 2023, sometimes only until 2021. Why?

  2. The model sometimes replies that it can't retrieve information from the internet because it's not connected, but other times it works... Why?

  3. The model "stumbles" at the beginning of the audio initialization. When you click Record and make a request, the model initially returns strange words and responses. Only after some time does it respond correctly... Why?

Thank you very much,

Stefan

Boetty commented Nov 8, 2024

Additional:

The model sometimes reads phone numbers and sentences incorrectly in the audio output, even though they're correct in the text messages. Example text: Phone number: 1234567890, audio output: "1 2 45 789 54." Why?

Boetty commented Nov 9, 2024

Update:

To address the initialization issue where random noise or unintended audio data was sent immediately after starting, we made the following changes in main.ts:

Delay in Starting Real-Time Messages: We added a 1-second delay in the start_realtime() function before invoking handleRealtimeMessages(). This delay allows the audio system to stabilize before sending any initial audio data to the model, reducing the chance of random noise being processed as valid input.

Delay in Starting the Audio Recorder: Within the resetAudio() function, a 500-millisecond delay was introduced before starting the actual audio recording. This provides time for the audio recorder to initialize fully and ensures that no random noise or unintentional sounds are captured at the moment of starting the recording.

Noise Filtering in the Audio Buffer: In the processAudioRecordingBuffer() function, we implemented a noise filter that checks whether the audio buffer contains meaningful audio data before sending it. The function forwards a buffer only if at least one byte value exceeds a threshold (10 in the example), which is meant to prevent low-level noise or silence from being interpreted as input.
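As a point of comparison for the threshold check described above, here is a minimal sketch of an energy-based gate. It assumes the recorder delivers 16-bit little-endian PCM, which the post does not state. Under that assumption, a per-byte comparison can misfire: a very quiet negative sample such as -3 is stored as the bytes 0xFD 0xFF, both of which are greater than 10. The names `rmsOfPcm16` and `isLoudEnough` and the default threshold of 200 are hypothetical and would need tuning.

```typescript
// Hypothetical helper: root-mean-square level of a buffer that is ASSUMED
// to hold 16-bit little-endian PCM samples (not confirmed by the post).
function rmsOfPcm16(data: ArrayBuffer): number {
  const view = new DataView(data);
  const sampleCount = Math.floor(data.byteLength / 2);
  if (sampleCount === 0) {
    return 0;
  }
  let sumSquares = 0;
  for (let i = 0; i < sampleCount; i++) {
    // Read each sample as a signed 16-bit integer (little-endian).
    const sample = view.getInt16(i * 2, true);
    sumSquares += sample * sample;
  }
  return Math.sqrt(sumSquares / sampleCount);
}

// Hypothetical gate: the default threshold is an arbitrary starting point.
function isLoudEnough(data: ArrayBuffer, rmsThreshold: number = 200): boolean {
  return rmsOfPcm16(data) >= rmsThreshold;
}
```

If this approach fits, `isLoudEnough(data)` could stand in for the `uint8Array.some(...)` condition in the example. Note that dropping quiet buffers entirely also removes the silence a server-side voice activity detector may rely on to detect the end of a turn, so this is a trade-off rather than a clear improvement.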

Example:

```typescript
// main.ts

async function start_realtime() {
  const { endpoint, apiKey, deploymentOrModel } = await fetchConfigFromProxy();

  realtimeStreaming = new LowLevelRTClient(new URL(endpoint), { key: apiKey }, { deployment: deploymentOrModel });

  try {
    await realtimeStreaming.send(createConfigMessage());
  } catch (error) {
    makeNewTextBlock("[Connection error]: Please check the proxy endpoint.", "system-response");
    setFormInputState(InputState.ReadyToStart);
    return;
  }

  // Reset audio recorder and start it with a slight delay to avoid noise
  await resetAudio(true);

  // Delay to ensure initial random signals are not sent immediately
  setTimeout(() => {
    handleRealtimeMessages();
  }, 1000); // 1-second delay
}

async function resetAudio(startRecording: boolean) {
  recordingActive = false;
  if (audioRecorder) {
    audioRecorder.stop();
  }
  if (audioPlayer) {
    audioPlayer.clear();
  }
  audioRecorder = new Recorder(processAudioRecordingBuffer);
  audioPlayer = new Player();
  audioPlayer.init(24000);

  if (startRecording) {
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

    // Delay before starting recording to avoid initial random noise
    setTimeout(() => {
      audioRecorder.start(stream);
      recordingActive = true;
    }, 500); // 500ms delay
  }
}

function processAudioRecordingBuffer(data: Buffer) {
  const uint8Array = new Uint8Array(data);

  // Check if buffer contains actual audio content (threshold set to filter out noise)
  if (uint8Array.some((sample) => sample > 10)) { // Adjust threshold as needed
    combineArray(uint8Array);
    if (buffer.length >= 4800) {
      const toSend = new Uint8Array(buffer.slice(0, 4800));
      buffer = new Uint8Array(buffer.slice(4800));
      const regularArray = String.fromCharCode(...toSend);
      const base64 = btoa(regularArray);
      if (recordingActive) {
        realtimeStreaming.send({
          type: "input_audio_buffer.append",
          audio: base64,
        });
      }
    }
  }
}
```
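One detail worth noting in the code above: both delays are scheduled with `setTimeout` and are not awaited, so `await resetAudio(true)` resolves before the recorder has actually started, and the 1-second timer begins counting while the 500 ms timer is still pending. If the intent is for the caller to wait until recording has begun, a promise-based delay is one option. The sketch below is illustrative only; `delay` and `startAfterDelay` are hypothetical names, and the `start` callback stands in for `audioRecorder.start(stream)`.

```typescript
// Resolve after `ms` milliseconds so that callers can `await` the pause.
function delay(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Hypothetical wrapper: waits, then runs the supplied start callback.
// In the issue's code the callback would start the recorder and set
// recordingActive = true.
async function startAfterDelay(start: () => void, ms: number): Promise<void> {
  await delay(ms);
  start();
}
```

With a helper like this, `resetAudio` could `await startAfterDelay(...)` so that its returned promise settles only once recording is active, and `start_realtime` could `await delay(1000)` before calling `handleRealtimeMessages()`.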
