llm-datasets

Here are 12 public repositories matching this topic...

neo4j-labs / text2cypher

A collection of text2cypher datasets, evaluations, and fine-tuning instructions

neo4j graph cypher cypher-query-language llm llms llm-training llm-datasets text2cypher

Updated Jun 13, 2024
Jupyter Notebook

dsdanielpark / open-llm-datasets

Repository for organizing datasets and papers used in Open LLM.

natural-language-processing datasets large-language-models llm llm-training llm-datasets

Updated Jul 6, 2023

discus-labs / discus

A data-centric AI package for ML/AI. Get the best high-quality data for the best results. Discord: https://discord.gg/t6ADqBKrdZ

python openai gpt synthetic-data fine-tuning synthetic-dataset-generation ner-data huggingface-transformers gpt-4 large-language-models llms llm-training llm-datasets fine-tuning-llm

Updated Nov 20, 2023
Python

asimsinan / LLM-Research

A collection of LLM-related papers, theses, tools, datasets, courses, open-source models, and benchmarks

arxiv-papers large-language-models llm llms llm-datasets llm-tools buyuk-dil-modelleri llm-research llm-theses llm-benchmarking llm-frameworks

Updated Oct 8, 2024
Python

altunenes / rustysozluk

Efficiently fetch entries from eksisozluk.com and run sentiment analysis on them (Turkish only), using Rust

rust scraper sentiment-analysis turkish eksisozluk rust-lang webscraping eksi-sozluk reqwest duyguanalizi rust-scraping llm-training llm-datasets

Updated Feb 8, 2024
Rust

DefinetlyNotAI / LLM_Data

Python source code of several very popular repositories, gathered in this repo as plain localdocs for training code AI

c data cpp cuda jupyter-notebook python3 code-examples llm llm-datasets data-dum programming-data programming-data-sets llm-code

Updated Dec 12, 2024
Python

tiddly-gittly / TiddlyWiki-LLM-dataset

WikiText syntax dataset generation pipeline and open dataset for auto UI generation in TiddlyWiki. (WIP)

dataset tiddlywiki wikitext llm llm-training llm-datasets

Updated Nov 20, 2024
TypeScript

Synthetically generating intent-aware information-seeking dialogues. Useful for tasks such as training and evaluating user intent predictors, with the option of training and evaluating on real human dialogues. The backbone LLM of SOLID is Zephyr-7b-beta.

solid dataset-generation conversational-ai intent-classification llm-training llm-inference llm-datasets llm-dialogs llm-conversations zephyr-7b-beta intent-aware-conversation-generation solid-rl

Updated Aug 18, 2024
Python

redblock-ai / parrot-python

PARROT (Performance Assessment of Reasoning and Responses On Trivia) is a novel benchmarking framework designed to evaluate Large Language Models (LLMs) on real-world, complex, and ambiguous QA tasks.

benchmarking-framework llm-inference llm-datasets llm-qa-document llm-benchmarking