Code and supplementary material for the paper Prompt Optimisation with Random Sampling.
Most up-to-date version: Strings from the Library of Babel: Random Sampling as a Strong Baseline for Prompt Optimisation.
If you use this repository in your research, please cite our paper:
@inproceedings{lu-etal-2024-strings,
title = "Strings from the Library of Babel: Random Sampling as a Strong Baseline for Prompt Optimisation",
author = "Lu, Yao and
Wang, Jiayi and
Tang, Raphael and
Riedel, Sebastian and
Stenetorp, Pontus",
editor = "Duh, Kevin and
Gomez, Helena and
Bethard, Steven",
booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
month = jun,
year = "2024",
address = "Mexico City, Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.naacl-long.122",
pages = "2221--2231",
}
Want to implement it from scratch? The core logic for generating random separators takes fewer than 10 lines of code; the three sampling modes are sketched below.
- Random vocabulary mode
import random
from transformers import GPT2Tokenizer
prompt = "this is a good movie [Answer:] positive"
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
vocab_size = tokenizer.vocab_size
# sample a random separator length, then draw that many token ids uniformly from the vocabulary
separator_length = random.randint(1, 5)
random_separator_ids = random.sample(range(vocab_size), separator_length)
random_separator_text = tokenizer.decode(random_separator_ids, skip_special_tokens=True)
# substitute the sampled separator for the [Answer:] placeholder
random_prompt = prompt.replace("[Answer:]", random_separator_text)
# evaluate on training set
# ...
- Random without context mode
import random
from transformers import GPT2Tokenizer, GPT2LMHeadModel
prompt = "this is a good movie [Answer:] positive"
model = GPT2LMHeadModel.from_pretrained('gpt2')
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
# random length for separator
separator_length = random.randint(1, 5)
# sample free-running tokens from the language model (no conditioning context)
random_separator_ids = model.generate(do_sample=True, max_new_tokens=separator_length)[0]
# skip special tokens so the BOS token is not included in the separator
random_separator_text = tokenizer.decode(random_separator_ids, skip_special_tokens=True)
random_prompt = prompt.replace("[Answer:]", random_separator_text)
# evaluate on training set
# ...
- Random with context mode
import random
from transformers import GPT2Tokenizer, GPT2LMHeadModel
prompt = "this is a good movie [Answer:] positive"
# follow OPRO's examples format https://arxiv.org/abs/2309.03409
context = "I like this movie <INS> positive\nI don't like this movie <INS>\n"
model = GPT2LMHeadModel.from_pretrained('gpt2')
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
# random length for separator
separator_length = random.randint(1, 5)
context_input_ids = tokenizer.encode(context, return_tensors='pt')
# condition the language model on the context, then sample a short continuation
generated_ids = model.generate(context_input_ids, do_sample=True, max_new_tokens=separator_length)[0]
# keep only the newly generated tokens, dropping the context prefix
random_separator_ids = generated_ids[context_input_ids.shape[1]:]
random_separator_text = tokenizer.decode(random_separator_ids, skip_special_tokens=True)
random_prompt = prompt.replace("[Answer:]", random_separator_text)
# evaluate on training set and keep the best-performing prompt (see the selection-loop sketch below)
# ...
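Each snippet above ends with an evaluation step: many random separators are sampled and the one that performs best on the training set is kept. Below is a minimal sketch of that outer selection loop, using the random vocabulary mode as an example. The helper `evaluate_accuracy` is a hypothetical placeholder for whatever task-specific scoring you use (it is not part of this repository), and the number of candidates is a hyperparameter you would tune yourself.

import random
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
vocab_size = tokenizer.vocab_size
prompt = "this is a good movie [Answer:] positive"

def sample_random_separator():
    # random vocabulary mode: uniformly sample token ids, as in the first snippet above
    separator_length = random.randint(1, 5)
    ids = random.sample(range(vocab_size), separator_length)
    return tokenizer.decode(ids, skip_special_tokens=True)

def evaluate_accuracy(candidate_prompt):
    # hypothetical placeholder: replace with your own evaluation of the prompt
    # on a small labelled training set (e.g. accuracy of the model's predictions)
    return 0.0

best_prompt, best_score = None, float("-inf")
for _ in range(100):  # number of random candidates; a hyperparameter, not fixed by the repo
    candidate = prompt.replace("[Answer:]", sample_random_separator())
    score = evaluate_accuracy(candidate)
    if score > best_score:
        best_prompt, best_score = candidate, score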