This README outlines the process of generating the Schedule dataset, which includes creating schedules, generating dialogues, and producing datasets of varying difficulty levels.
- Generate schedules for multiple people
- Create dialogues based on the generated schedules
- Generate datasets (easy, medium, and hard) using schedules and dialogues
- Import dialogues into a database
After that, you can open the iAgents platform to view the imported dialogues between people and run evaluations with the datasets at each difficulty level.
The schedule_generate.py script creates schedules for a group of people:
- Generates random activities for each person
- Adds routine activities (sleep, lunch)
- Schedules shared activities between people
- Outputs the schedules to 'schedule_data_list.jsonl'
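The steps above can be sketched as follows. This is a minimal illustration, not the actual contents of schedule_generate.py: the record fields (`name`, `schedule`, `activity`, `start`, `end`, `with`), the activity pool, and the hour values are all assumptions made for the example.

```python
import json
import random

# Hypothetical activity pool; the real script may use a different set.
ACTIVITIES = ["gym", "reading", "meeting", "shopping"]


def build_schedule(name, rng):
    """Return one person's schedule: routine activities plus random ones."""
    # Routine activities (hours are assumed values for illustration).
    schedule = [
        {"activity": "sleep", "start": 0, "end": 7, "with": []},
        {"activity": "lunch", "start": 12, "end": 13, "with": []},
    ]
    # Two random one-hour activities in distinct afternoon slots.
    for start in rng.sample(range(14, 20), k=2):
        schedule.append({
            "activity": rng.choice(ACTIVITIES),
            "start": start,
            "end": start + 1,
            "with": [],
        })
    return {"name": name, "schedule": schedule}


def add_shared_activity(person_a, person_b, activity, start, end):
    """Add the same activity to both schedules, each naming the other person."""
    for person, other in ((person_a, person_b), (person_b, person_a)):
        person["schedule"].append({
            "activity": activity,
            "start": start,
            "end": end,
            "with": [other["name"]],
        })


def generate(names, path, seed=0):
    """Build schedules for all names and write one JSON object per line."""
    rng = random.Random(seed)
    people = [build_schedule(name, rng) for name in names]
    # Pair people up (1st with 2nd, 3rd with 4th, ...) for a shared dinner.
    for a, b in zip(people[::2], people[1::2]):
        add_shared_activity(a, b, "dinner", 20, 21)
    with open(path, "w", encoding="utf-8") as f:
        for person in people:
            f.write(json.dumps(person) + "\n")
    return people
```

Calling `generate(["Alice", "Bob"], "schedule_data_list.jsonl")` would write two lines, one per person, each containing the routine, random, and shared entries.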
Using the generated schedules, dialogue_generate.py creates conversations between pairs of people:
- Reads schedules from 'schedule_data_list.jsonl'
- Queries an LLM to generate dialogues between people based on their schedules
- Outputs the dialogues to 'dialogue.csv'
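A rough sketch of this read, query, write loop is shown below. The CSV columns (`sender`, `receiver`, `message`), the prompt wording, and the `ask_llm` callable are assumptions for illustration; the real script calls whichever LLM backend iAgents is configured with.

```python
import csv
import json
from itertools import combinations


def load_schedules(path):
    """Read one JSON object per line from the schedule file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def build_prompt(person_a, person_b):
    """Compose a prompt asking for a chat grounded in both schedules."""
    return (
        f"Write a short chat between {person_a['name']} and {person_b['name']} "
        f"that is consistent with their schedules.\n"
        f"{person_a['name']}: {json.dumps(person_a['schedule'])}\n"
        f"{person_b['name']}: {json.dumps(person_b['schedule'])}"
    )


def generate_dialogues(schedules, ask_llm):
    """Query the LLM once per pair of people and collect the replies.

    ask_llm is a hypothetical callable (prompt -> text) standing in for
    the real LLM backend.
    """
    rows = []
    for a, b in combinations(schedules, 2):
        reply = ask_llm(build_prompt(a, b))
        rows.append({"sender": a["name"], "receiver": b["name"], "message": reply})
    return rows


def write_dialogues(rows, path):
    """Write the collected dialogues to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["sender", "receiver", "message"])
        writer.writeheader()
        writer.writerows(rows)
```

Passing the LLM call in as a function keeps the sketch testable offline: a stub such as `lambda prompt: "hi"` can replace the real backend.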
Three scripts generate datasets of varying difficulty levels:
- dataset_generate_easy.py: Creates easy-level questions
- dataset_generate_medium.py: Creates medium-level questions
- dataset_generate_hard.py: Creates hard-level questions
Each script:
- Reads schedules from 'schedule_data_list.jsonl'
- Reads dialogues from 'dialogue.csv'
- Generates questions and answers based on the schedules and dialogues
- Outputs the datasets to JSONL files (dataset_easy.jsonl, dataset_medium.jsonl, dataset_hard.jsonl)
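As an illustration of the question-generation step, the sketch below derives question and answer pairs from schedules only and writes them as JSONL. The record fields, the question template, and the way `scale` caps the sample count are assumptions; the real scripts also draw on the dialogues, and what distinguishes easy, medium, and hard questions is not reproduced here.

```python
import json


def make_questions(schedules, difficulty, scale):
    """Build up to `scale` question/answer samples from schedule entries."""
    samples = []
    for person in schedules:
        for entry in person["schedule"]:
            samples.append({
                "difficulty": difficulty,
                "question": f"What is {person['name']} doing at {entry['start']}:00?",
                "answer": entry["activity"],
            })
    # `scale` is treated here as the maximum number of samples to keep.
    return samples[:scale]


def write_jsonl(samples, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for sample in samples:
            f.write(json.dumps(sample) + "\n")
```

For example, `write_jsonl(make_questions(schedules, "easy", 100), "dataset_easy.jsonl")` would produce at most 100 samples.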
The Schedule_import_db.py script imports the generated dialogues into a MySQL database:
- Creates necessary database tables (users, friendships, chats)
- Reads dialogues from 'dialogue.csv'
- Inserts data into the database tables
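The import flow can be sketched as below. To keep the example self-contained it uses Python's built-in `sqlite3` module as a stand-in for MySQL, so the SQL dialect differs slightly from what the real script would run. The table names come from the list above, but every column name and the CSV header (`sender`, `receiver`, `message`) are assumptions.

```python
import csv
import sqlite3

# Assumed schema; only the table names are taken from the README.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS friendships (
    user_id INTEGER NOT NULL,
    friend_id INTEGER NOT NULL,
    UNIQUE (user_id, friend_id)
);
CREATE TABLE IF NOT EXISTS chats (
    id INTEGER PRIMARY KEY,
    sender_id INTEGER NOT NULL,
    receiver_id INTEGER NOT NULL,
    message TEXT NOT NULL
);
"""


def user_id(conn, name):
    """Insert the user if missing and return their id."""
    conn.execute("INSERT OR IGNORE INTO users (name) VALUES (?)", (name,))
    return conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchone()[0]


def import_dialogues(conn, csv_path):
    """Create the tables, then load users, friendships, and chats from the CSV."""
    conn.executescript(SCHEMA)
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            sender = user_id(conn, row["sender"])
            receiver = user_id(conn, row["receiver"])
            # Store each friendship once, with the smaller id first.
            conn.execute(
                "INSERT OR IGNORE INTO friendships (user_id, friend_id) VALUES (?, ?)",
                (min(sender, receiver), max(sender, receiver)),
            )
            conn.execute(
                "INSERT INTO chats (sender_id, receiver_id, message) VALUES (?, ?, ?)",
                (sender, receiver, row["message"]),
            )
    conn.commit()
```

With a MySQL driver the overall shape would be similar, though the placeholder style and the `INSERT OR IGNORE` syntax would need to change.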
To generate the complete dataset, run the scripts in the following order, since each step reads the output of the previous ones:
- Run schedule_generate.py
- Run dialogue_generate.py
- Run dataset_generate_easy.py, dataset_generate_medium.py, and dataset_generate_hard.py
- (Optional) Run Schedule_import_db.py to import dialogues into a database
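The run order above could be automated with a small wrapper such as the one below. This is a sketch that assumes each script runs without command-line arguments from the current directory; the `import_db` switch is a hypothetical convenience for the optional last step.

```python
import subprocess
import sys

# Scripts in the order they must run; each consumes earlier outputs.
PIPELINE = [
    "schedule_generate.py",
    "dialogue_generate.py",
    "dataset_generate_easy.py",
    "dataset_generate_medium.py",
    "dataset_generate_hard.py",
]
OPTIONAL = ["Schedule_import_db.py"]


def build_commands(import_db=False):
    """Return the list of commands to run, using the current interpreter."""
    scripts = PIPELINE + (OPTIONAL if import_db else [])
    return [[sys.executable, script] for script in scripts]


def run_pipeline(import_db=False):
    """Run each script in order and stop at the first failure."""
    for command in build_commands(import_db):
        subprocess.run(command, check=True)
```

Calling `run_pipeline(import_db=True)` would run all six scripts; `check=True` makes a failing step raise `subprocess.CalledProcessError` so later steps do not run on incomplete inputs.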
Note:
- Ensure all required dependencies are installed and database configurations are set up correctly before running the scripts.
- Adjust scale in each script to set the number of samples to generate.
- All scripts require the LLM backend to be configured as described in the iAgents README.