Starred repositories

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

426 18 Updated Dec 14, 2024
Python 109 10 Updated Aug 13, 2024

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python 3,755 233 Updated Dec 4, 2024

The repo provides information about KeSpeech dataset.

129 7 Updated Oct 13, 2022

A generative world for general-purpose robotics & embodied AI learning.

Python 19,016 1,385 Updated Dec 24, 2024

HunyuanVideo: A Systematic Framework For Large Video Generation Model

Python 6,616 493 Updated Dec 23, 2024

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Python 2,672 159 Updated Dec 18, 2024

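⚡ A high-performance asynchronous API for automatic speech recognition (ASR) and translation. No need to purchase the Whisper API: inference runs on a locally hosted Whisper model, with support for multi-GPU concurrency and a design aimed at distributed deployment. It also includes built-in crawlers for social media platforms such as TikTok and Douyin, enabling seamless media processing across multiple social platforms and providing a powerful, scalable solution for automated processing of media content data.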

Python 268 28 Updated Dec 18, 2024

Let's build better datasets, together!

Jupyter Notebook 235 29 Updated Dec 20, 2024

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Python 503 22 Updated Dec 18, 2024

Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice

Python 203 24 Updated Dec 22, 2024

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 1,846 129 Updated Dec 23, 2024

Ongoing research training transformer models at scale

Python 10,912 2,440 Updated Dec 23, 2024

Convert any PDF into a podcast episode!

Python 1,700 185 Updated Dec 7, 2024

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 8,210 1,057 Updated Dec 22, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 32,452 4,945 Updated Dec 24, 2024

Podcasts 🎧 Programming, design, vlogs, music, interviews, blogs...

1,983 109 Updated Oct 6, 2023

o1-engineer is a command-line tool designed to assist developers in managing and interacting with their projects efficiently. Leveraging the power of OpenAI's API, this tool provides functionalitie…

Python 2,845 293 Updated Dec 16, 2024
Jupyter Notebook 1,106 139 Updated Sep 24, 2024

Convert any PDF into a podcast episode!

Python 631 262 Updated Nov 15, 2024

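Prompt Engineering Guide, derived from the English version, with an added section on AIGC prompts; translated and updated to lower the learning barrier for students.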

MDX 1,090 116 Updated Sep 14, 2024

A programming framework for agentic AI 🤖 PyPi: autogen-agentchat Discord: https://aka.ms/autogen-discord Office Hour: https://aka.ms/autogen-officehour

Python 36,283 5,242 Updated Dec 24, 2024
Python 7,031 550 Updated Dec 20, 2024

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,673 184 Updated Nov 14, 2024

SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling

Python 924 53 Updated Dec 9, 2024
TypeScript 29 4 Updated Aug 17, 2024
Python 6 Updated Aug 25, 2024

Speech, Language, Audio, Music Processing with Large Language Model

Python 619 56 Updated Dec 24, 2024

Official implementation of the paper "Acoustic Music Understanding Model with Large-Scale Self-supervised Training".

Python 321 18 Updated Apr 20, 2024

High-speed downloads from a mirror site using HuggingFace's official download tool.

Python 892 83 Updated Oct 12, 2024