Do you want to fine-tune the most powerful language model in the world? Do you have a lot of data and a lot of patience? Do you like to watch numbers go up and down? If you answered yes to any of these questions, then this repo is for you!
This is the second part of the soup-to-nuts Fine-tuning GPT-3.5 from scratch series. The first part is here: https://github.com/BenderScript/meraki_genai_community_scraper
In this part, we will fine-tune GPT-3.5 Turbo based on the data scraped from the Meraki Community Forum.
This repo contains a Python script that lets you fine-tune GPT-3.5 Turbo on your own data. GPT-3.5 Turbo is the latest and greatest version of GPT-3, with more parameters, more speed, and more fun. It can generate anything from code to poetry to memes, and it can even talk to you!
Before you start, make sure you have the following:
- A valid OpenAI API key
- A data file in JSONL format. Each line should contain a JSON object with a `messages` field, and each message should have a `role` and a `content`. See the example below for more details.
- There is an example `data.jsonl` file in this repo that was generated by the scraper.
```json
{"messages": [{"role": "system", "content": "You are a tech support person for the Meraki product line. You can answer questions about the features, specifications, installation, configuration, and troubleshooting of the Meraki products. You are polite, professional, and helpful. You use clear and simple language and provide relevant links or resources when possible."}, {"role": "user", "content": "What is the issue you are facing when trying to create a GPO to automatically connect computers to a network with WPA2-Personal and PSK?"}, {"role": "assistant", "content": "Here is the link that provides documentation on how to configure RADIUS Authentication with WPA2-Enterprise for Meraki MR devices: https://documentation.meraki.com/MR/Encryption_and_Authentication/Configuring_RADIUS_Authentication_with_WPA2-Enterprise. You can find the original post with the solution by visiting this link: /t5/Wireless-LAN/Domain-Group-Policy-to-connect-to-SSID/m-p/211692#M29207."}]}
{"messages": [{"role": "system", "content": "You are a tech support person for the Meraki product line. You can answer questions about the features, specifications, installation, configuration, and troubleshooting of the Meraki products. You are polite, professional, and helpful. You use clear and simple language and provide relevant links or resources when possible."}, {"role": "user", "content": "Is it possible for wireless clients to be enrolled in both Meraki System Manager (SM) and MS Intune at the same time for authentication?"}, {"role": "assistant", "content": "Yes, you are correct. By following the information provided in the link [link](https://documentation.meraki.com/General_Administration/Cross-Platform_Content/Trusted_Access_for_Secure_Wireless_Connectivity), you can achieve authentication and authorization with certificate-level security. This method also utilizes SM licenses and is designed to coexist with InTune."}]}
```
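Before uploading, it can help to sanity-check that every line of the data file parses as JSON and has the `messages`/`role`/`content` shape shown above. This is a minimal sketch, not part of the repo (`validate_jsonl_line` and `bad_lines` are hypothetical helpers):

```python
import json


def validate_jsonl_line(line: str) -> bool:
    """Return True if the line is a JSON object with a well-formed messages list."""
    try:
        obj = json.loads(line)
    except json.JSONDecodeError:
        return False
    messages = obj.get("messages")
    if not isinstance(messages, list) or not messages:
        return False
    # Every message needs a role from the chat-format set and string content.
    return all(
        isinstance(m, dict)
        and m.get("role") in {"system", "user", "assistant"}
        and isinstance(m.get("content"), str)
        for m in messages
    )


def bad_lines(path: str) -> list:
    """Return the (1-based) line numbers in a JSONL file that fail validation."""
    with open(path, encoding="utf-8") as f:
        return [i for i, line in enumerate(f, 1)
                if line.strip() and not validate_jsonl_line(line)]
```

Running `bad_lines("data.jsonl")` before kicking off a job can save you a failed upload.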
- Clone the repo and install the dependencies:

```bash
git clone https://github.com/your_username/fine-tuning-gpt-3.5-turbo.git
cd fine-tuning-gpt-3.5-turbo
pip install -r requirements.txt
```
- Save your OpenAI API key in a `.env` file:

```bash
echo "OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" > .env
```
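The script presumably picks this key up with a dotenv-style loader from `requirements.txt`. For illustration only, here is a dependency-free sketch of the same behavior (real dotenv libraries also handle comments, quoting, and interpolation):

```python
import os


def load_env(path: str = ".env") -> None:
    """Read simple KEY=VALUE lines from a .env file into os.environ."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Don't clobber variables already set in the real environment.
            os.environ.setdefault(key.strip(), value.strip().strip('"'))
```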
- Run the script:

```bash
python3 main.py
```

It will walk you through the entire process. Sit back and relax as the script uploads your data file, creates a fine-tuning job, and monitors its progress. You will see something like this:
```text
Status: running Trained Tokens: None Job ID: ftjob-pfWRH9QrmN0XRKgbsZ7tI9s6
Status: running Trained Tokens: None Job ID: ftjob-pfWRH9QrmN0XRKgbsZ7tI9s6
Status: running Trained Tokens: None Job ID: ftjob-pfWRH9QrmN0XRKgbsZ7tI9s6
Status: running Trained Tokens: None Job ID: ftjob-pfWRH9QrmN0XRKgbsZ7tI9s6
Status: running Trained Tokens: None Job ID: ftjob-pfWRH9QrmN0XRKgbsZ7tI9s6
Status: succeeded Trained Tokens: 19272
```
Possible statuses are: "validating_files", "running", "succeeded"
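Those status lines come from a polling loop. As a sketch of what such a loop might look like with the `openai` Python SDK (v1.x style; the exact calls and intervals in `main.py` may differ, and the job ID below is just the one from the sample output):

```python
import time


def describe(status, trained_tokens, job_id) -> str:
    """Format one progress line like the output shown above."""
    return f"Status: {status} Trained Tokens: {trained_tokens} Job ID: {job_id}"


def wait_for_job(client, job_id: str, poll_seconds: int = 30) -> str:
    """Poll a fine-tuning job until it reaches a terminal status."""
    while True:
        job = client.fine_tuning.jobs.retrieve(job_id)
        print(describe(job.status, job.trained_tokens, job.id))
        if job.status in ("succeeded", "failed", "cancelled"):
            return job.status
        time.sleep(poll_seconds)


if __name__ == "__main__":
    from openai import OpenAI  # requires OPENAI_API_KEY in the environment
    client = OpenAI()
    # A job ID like this comes back from client.fine_tuning.jobs.create(...)
    wait_for_job(client, "ftjob-pfWRH9QrmN0XRKgbsZ7tI9s6")
```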
- Bypass fine-tuning and provide your own question:

```bash
python3 main.py --bypass_fine_tuning --model_id ft:gpt-3.5-turbo-0613:personal:meraki-wlan-1:abcdefg --question "When was the feature (https://documentation.meraki.com/MR/AP_Neighbors) / (https://community.meraki.com/t5/Feature-Announcements/New-AP-Neighbors-feature-now-available/ba-p/211919) supposed to be released"
```
"Show that this thing actually works and stop stalling," you say? Fine, here is an example of a question and answer generated by the fine-tuned model:
```bash
python3 main.py --bypass_fine_tuning --model_id ft:gpt-3.5-turbo-0613:personal:meraki-wlan-1:8D2ZmbMz --question "When was the feature (https://documentation.meraki.com/MR/AP_Neighbors) / (https://community.meraki.com/t5/Feature-Announcements/New-AP-Neighbors-feature-now-available/ba-p/211919) supposed to be released"
```
> The AP Neighbors feature was released on May 11, 2020. You can find more information about this feature on the Meraki Community post (https://community.meraki.com/t5/Feature-Announcements/New-AP-Neighbors-feature-now-available/ba-p/211919).
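Under the hood, asking the fine-tuned model a one-off question like this boils down to a single chat-completion call that reuses the same system prompt as the training data. A hedged sketch using the `openai` v1.x SDK (the prompt text is abbreviated and the flow in `main.py` may differ):

```python
def build_messages(system_prompt: str, question: str) -> list:
    """Assemble the chat payload: training-time system prompt, then the user question."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


if __name__ == "__main__":
    from openai import OpenAI  # requires OPENAI_API_KEY in the environment
    client = OpenAI()
    messages = build_messages(
        "You are a tech support person for the Meraki product line.",
        "When was the AP Neighbors feature supposed to be released?",
    )
    # The model id is the ft:... identifier printed when the job succeeds.
    resp = client.chat.completions.create(
        model="ft:gpt-3.5-turbo-0613:personal:meraki-wlan-1:8D2ZmbMz",
        messages=messages,
    )
    print(resp.choices[0].message.content)
```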
For comparison, here is the GPT-3.5 Turbo response without fine-tuning:
> I don't have access to real-time data or updates beyond my last knowledge update in September 2021. To find out the specific release date of the feature you mentioned on the Meraki platform, I recommend visiting the official Meraki website, checking their official documentation, or reaching out to their support or community forums for the most up-to-date information. Release dates for software features can change, and it's best to refer to the most recent and official sources for the latest information.
It is actually hard to steer ChatGPT toward the fine-tuned data if you have a small data set. I had to try many times before I got an acceptable answer, but once I fine-tuned the model with 500 Q&As it was easy.
Ref: https://community.openai.com/t/how-does-fine-tuning-really-work/39972