Jatmo

Fine-tuning base models to build robust task-specific models

Installation

  • Create a Python 3.9 virtual env: python3.9 -m venv env && source env/bin/activate
  • Install the required packages: pip install --upgrade pip && pip install openai dill tqdm tiktoken datasets
  • Install this package: python setup.py install
  • Export your OpenAI API key: export OPENAI_API_KEY=[your key]

You can now use the functions in this project by importing them with import jatmo.

Usage

The two main functions are jatmo, which runs the framework on an existing dataset, and jatmo_synthetic, which first generates a synthetic dataset from the task prompt.

jatmo

You can run Jatmo on an input dataset by using the jatmo function. It returns the IDs of the generated models and the run config.

from jatmo import jatmo

# Load inputs into a list of strings
model_ids, config = jatmo(inputs, task="Determine whether the following comment is positive or negative.")
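
For completeness, here is a fuller, hypothetical sketch of the same call: the input file name is made up, and the use of the legacy openai.Completion API is an assumption (the base models in the Results table are completion-style models).

import openai
from jatmo import jatmo

# Hypothetical input file: one comment per line.
with open("comments.txt") as f:
    inputs = [line.strip() for line in f if line.strip()]

model_ids, config = jatmo(inputs, task="Determine whether the following comment is positive or negative.")

# The returned IDs name OpenAI fine-tuned models, so they can be queried
# directly (assuming the pre-1.0 openai client, which reads OPENAI_API_KEY).
response = openai.Completion.create(model=model_ids[0], prompt="This movie is awesome!", max_tokens=5)
print(response["choices"][0]["text"])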

jatmo_synthetic

Alternatively, jatmo_synthetic generates a synthetic dataset before fine-tuning, and likewise returns the IDs of the generated models and the run config. You can pass it a single few-shot example or several examples (see the sketch after the example below).

from jatmo import jatmo_synthetic

model_ids, config = jatmo_synthetic(task="Determine whether the following comment is positive or negative.", few_shot_examples="This movie is awesome!")
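
To pass several examples, a list is the natural shape. Whether few_shot_examples accepts a list is an assumption here; check the function's docstring.

from jatmo import jatmo_synthetic

# Assumed: few_shot_examples also accepts a list of examples.
model_ids, config = jatmo_synthetic(
    task="Determine whether the following comment is positive or negative.",
    few_shot_examples=[
        "This movie is awesome!",
        "I want my money back, this was dreadful.",
    ],
)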

Common resources

You can use from prompt_injection_defense.server import init_servers, kill_servers to get the functions that start and kill OpenAI servers. These let you issue requests in parallel, which speeds up chat completions. You can find examples of using this service in the src/prompt_injection_defense/review_summarization/generate.py file.
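
A minimal sketch of the intended lifecycle; the exact signatures and return values of init_servers and kill_servers are assumptions, so defer to the generate.py example above for the real calling convention.

from prompt_injection_defense.server import init_servers, kill_servers

# Assumed: init_servers returns a handle that kill_servers tears down.
servers = init_servers()
try:
    # Issue parallel chat-completion requests here.
    pass
finally:
    # Always stop the worker servers, even if a request fails.
    kill_servers(servers)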

You can use from prompt_injection_defense.server import rate_completions to run the rating algorithm (using GPT-3.5-turbo). The docstring in the prompt_injection_defense.server file documents its usage.

Results

| | Code summarization | Sentiment Analysis | Review Summarization | Translation | News Summarization | Toxicity Detection | Toxicity Detection (w/ GPT-generated label) | Sentence Similarity |
|---|---|---|---|---|---|---|---|---|
| Dataset | The Stack | IMDB Reviews | Amazon Reviews | Gutenberg Project | CNN/DM | Kaggle Toxic Comment Classification Challenge | Kaggle Toxic Comment Classification Challenge | Semantic Textual Similarity Benchmark |
| FT training size / test size | 400 / 100 | 1000 / 100 | 400 / 100 | 400 / 100 | 400 / 100 | 400 / 100 | 400 / 100 | |
| Base Model | curie | curie | davinci_002 | davinci_002 | davinci_002 | davinci_002 | davinci_002 | davinci_002 |
| Quality vs Baseline | Better than GPT | No Quality Loss | No Quality Loss | No Quality Loss | Better than GPT (87%->92%) | No Quality Loss (86%) | No Quality Loss (86%) | No Quality Loss |
| Success Rate PI [Start/GPT] | 98% | 100% | 98% | 100% | 99% | 89% | 89% | 99% |
| Success Rate PI [Start/FT] | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Success Rate PI [End/GPT] | 96% | 99% | 100% | 100% | 100% | 84% | 84% | 100% |
| Success Rate PI [End/FT] | 0% | 0% | 2% | 0% | 0% | 0% | 0% | 0% |
| Success Rate PI [Middle/GPT] | 12% | 89% | 93% | 52% | 71% | 85% | 85% | |
| Success Rate PI [Middle/FT] | 0% | 0% | 0% | 0% | 0% | 0% | 0% | |

PI = prompt injection; Start, End, and Middle give the position of the injection in the input. GPT columns report the GPT baseline and FT columns the fine-tuned Jatmo model. Blank cells were not reported.
