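⚡ A high-performance asynchronous API for automatic speech recognition (ASR) and translation. There is no need to purchase the Whisper API: it runs inference with a locally hosted Whisper model, supports multi-GPU concurrency, and is designed for distributed deployment. It also includes built-in crawlers for social media platforms such as TikTok and Douyin, enabling seamless media processing across multiple social platforms and providing a powerful, scalable solution for automated processing of media content data.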

Python 268 28 Updated Dec 18, 2024

huggingface / data-is-better-together

Let's build better datasets, together!

Jupyter Notebook 235 29 Updated Dec 20, 2024

deepseek-ai / DeepSeek-VL2

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Python 503 22 Updated Dec 18, 2024

xingchensong / S3Tokenizer

Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice

Python 203 24 Updated Dec 22, 2024

modelscope / ClearerVoice-Studio

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 1,846 129 Updated Dec 23, 2024

NVIDIA / Megatron-LM

Ongoing research training transformer models at scale

Python 10,912 2,440 Updated Dec 23, 2024

gabrielchua / open-notebooklm

Forked from knowsuchagency/pdf-to-podcast

Convert any PDF into a podcast episode!

Python 1,700 185 Updated Dec 7, 2024

SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 8,210 1,057 Updated Dec 22, 2024

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 32,452 4,945 Updated Dec 24, 2024

alaskasquirrel / Chinese-Podcasts

Podcasts 🎧 Programming, design, vlogs, music, interviews, blogs...

1,983 109 Updated Oct 6, 2023

Doriandarko / o1-engineer

o1-engineer is a command-line tool designed to assist developers in managing and interacting with their projects efficiently. Leveraging the power of OpenAI's API, this tool provides functionalitie…

Python 2,845 293 Updated Dec 16, 2024

lamm-mit / PDF2Audio

Jupyter Notebook 1,106 139 Updated Sep 24, 2024

knowsuchagency / pdf-to-podcast

Convert any PDF into a podcast episode!

Python 631 262 Updated Nov 15, 2024

wangxuqi / Prompt-Engineering-Guide-Chinese

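Prompt Engineering Guide, derived from the English version, with an added section on AIGC prompts; translated and updated to lower the learning barrier for students.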

MDX 1,090 116 Updated Sep 14, 2024

microsoft / autogen

A programming framework for agentic AI 🤖 PyPi: autogen-agentchat Discord: https://aka.ms/autogen-discord Office Hour: https://aka.ms/autogen-officehour

Python 36,283 5,242 Updated Dec 24, 2024

kyutai-labs / moshi

Python 7,031 550 Updated Dec 20, 2024

ictnlp / LLaMA-Omni

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,673 184 Updated Nov 14, 2024