Merge pull request #508 from QData/example_bug_fix
bug fix for codes in example folder
qiyanjun authored Aug 3, 2021
2 parents fd7117d + 392576e commit f64df48
Showing 16 changed files with 112 additions and 43 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -45,4 +45,5 @@ checkpoints/
# vim
*.swp

.vscode
.vscode
*.csv
18 changes: 11 additions & 7 deletions README.md
@@ -71,7 +71,11 @@ or a specific command using, for example,
textattack attack --help
```

The [`examples/`](examples/) folder includes scripts showing common TextAttack usage for training models, running attacks, and augmenting a CSV file. The [documentation website](https://textattack.readthedocs.io/en/latest) contains walkthroughs explaining basic usage of TextAttack, including building a custom transformation and a custom constraint..
The [`examples/`](examples/) folder includes scripts showing common TextAttack usage for training models, running attacks, and augmenting a CSV file.


The [documentation website](https://textattack.readthedocs.io/en/latest) contains walkthroughs explaining basic usage of TextAttack, including building a custom transformation and a custom constraint.


### Running Attacks: `textattack attack --help`

@@ -88,7 +92,7 @@ textattack attack --recipe textfooler --model bert-base-uncased-mr --num-example

*DeepWordBug on DistilBERT trained on the Quora Question Pairs paraphrase identification dataset*:
```bash
textattack attack --model distilbert-base-uncased-qqp --recipe deepwordbug --num-examples 100
textattack attack --model distilbert-base-uncased-cola --recipe deepwordbug --num-examples 100
```

*Beam search with beam width 4 and word embedding transformation and untargeted goal function on an LSTM*:
@@ -323,7 +327,9 @@ For example, given the following as `examples.csv`:
"it's a mystery how the movie could be released in this condition .", 0
```

The command `textattack augment --csv examples.csv --input-column text --recipe embedding --pct-words-to-swap .1 --transformations-per-example 2 --exclude-original`
The command
```bash
textattack augment --input-csv examples.csv --output-csv output.csv --input-column text --recipe embedding --pct-words-to-swap .1 --transformations-per-example 2 --exclude-original
```
will augment the `text` column by altering 10% of each example's words, generating twice as many augmentations as original inputs, and exclude the original inputs from the
output CSV. (All of this will be saved to `augment.csv` by default.)
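
For readers who prefer the Python API, the same augmentation recipe can be driven from code roughly as follows (a minimal sketch assuming `textattack.augmentation.EmbeddingAugmenter`; CSV reading and writing are omitted):

```python
from textattack.augmentation import EmbeddingAugmenter

# Mirrors the CLI flags above: embedding recipe, 10% of words swapped,
# two augmentations generated per input example.
augmenter = EmbeddingAugmenter(pct_words_to_swap=0.1, transformations_per_example=2)
augmented = augmenter.augment("it's a mystery how the movie could be released in this condition .")
print(augmented)  # a list of two augmented strings
```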

@@ -453,7 +459,7 @@ create a short file that loads them as variables `model` and `tokenizer`. The `
be able to transform string inputs to lists or tensors of IDs using a method called `encode()`. The
model must take inputs via the `__call__` method.

##### Model from a file
##### Custom Model from a file
To experiment with a model you've trained, you could create the following file
and name it `my_model.py`:
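
A minimal `my_model.py`, mirroring the two-line stub shown later in this PR under `docs/1start/multilingual-visualization.md` (a sketch; replace each loader with your own code):

```python
# my_model.py
model = load_your_model_with_custom_code()          # must accept inputs via __call__
tokenizer = load_your_tokenizer_with_custom_code()  # must expose an encode() method
```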

@@ -488,14 +494,12 @@ which maintains both a list of tokens and the original text, with punctuation. W



#### Dataset via Data Frames (*coming soon*)
#### Dataset loading via other mechanisms: see [the datasets API documentation](https://textattack.readthedocs.io/en/latest/api/datasets.html)



### Attacks and how to design a new attack

The `attack_one` method in an `Attack` takes as input an `AttackedText`, and outputs either a `SuccessfulAttackResult` if it succeeds or a `FailedAttackResult` if it fails.


We formulate an attack as consisting of four components: a **goal function** which determines if the attack has succeeded, **constraints** defining which perturbations are valid, a **transformation** that generates potential modifications given an input, and a **search method** which traverses through the search space of possible perturbations. The attack attempts to perturb an input text such that the model output fulfills the goal function (i.e., indicating whether the attack is successful) and the perturbation adheres to the set of constraints (e.g., grammar constraint, semantic similarity constraint). A search method is used to find a sequence of transformations that produce a successful adversarial example.
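
As a concrete illustration of the four components in the Python API (a sketch assuming the textattack 0.3 interface and a public `textattack/...` checkpoint on the HuggingFace hub; not code from this PR):

```python
import transformers

from textattack import Attack
from textattack.constraints.pre_transformation import (
    RepeatModification,
    StopwordModification,
)
from textattack.goal_functions import UntargetedClassification
from textattack.models.wrappers import HuggingFaceModelWrapper
from textattack.search_methods import GreedyWordSwapWIR
from textattack.transformations import WordSwapEmbedding

# Wrap a pre-trained classifier so TextAttack can query it.
model = transformers.AutoModelForSequenceClassification.from_pretrained("textattack/bert-base-uncased-imdb")
tokenizer = transformers.AutoTokenizer.from_pretrained("textattack/bert-base-uncased-imdb")
model_wrapper = HuggingFaceModelWrapper(model, tokenizer)

goal_function = UntargetedClassification(model_wrapper)       # when has the attack succeeded?
constraints = [RepeatModification(), StopwordModification()]  # which perturbations are valid?
transformation = WordSwapEmbedding(max_candidates=50)         # how may the input be modified?
search_method = GreedyWordSwapWIR(wir_method="delete")        # how is the space explored?

attack = Attack(goal_function, constraints, transformation, search_method)
print(attack)  # prints the attack's components
```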

6 changes: 2 additions & 4 deletions README_ZH.md
@@ -82,7 +82,7 @@ textattack attack --recipe textfooler --model bert-base-uncased-mr --num-example
*DeepWordBug on DistilBERT trained on the Quora Question Pairs dataset*:

```bash
textattack attack --model distilbert-base-uncased-qqp --recipe deepwordbug --num-examples 100
textattack attack --model distilbert-base-uncased-cola --recipe deepwordbug --num-examples 100
```

*Untargeted attack on an LSTM trained on the MR dataset, using beam search with beam width 4 and a word-embedding transformation*:
@@ -315,7 +315,7 @@ Among TextAttack's components are many easy-to-use data augmentation tools. `textattack.Aug
"it's a mystery how the movie could be released in this condition .", 0
```

The command `textattack augment --csv examples.csv --input-column text --recipe embedding --pct-words-to-swap .1 --transformations-per-example 2 --exclude-original`
The command `textattack augment --input-csv examples.csv --output-csv output.csv --input-column text --recipe embedding --pct-words-to-swap .1 --transformations-per-example 2 --exclude-original`
will augment the `text` column, modifying about 10% of the words in each example, generating twice as many samples as the original inputs, and excluding the original CSV inputs from the output file. (By default, all results are saved to `augment.csv`.)

After augmentation, here are the contents of `augment.csv`:
@@ -454,8 +454,6 @@ dataset = [('Today was....', 1), ('This movie is...', 0), ...]

### What is an attack & how to design a new one

The `attack_one` method in an `Attack` takes an `AttackedText` as input; if the attack succeeds it returns a `SuccessfulAttackResult`, and if it fails it returns a `FailedAttackResult`.


We divide an attack into four components: a **goal function** that defines what counts as a successful attack, **constraints** that define which perturbations are valid, a **transformation** that generates candidate perturbations of the input text, and a **search method** that traverses the search space of possible perturbations. Each attack tries to perturb the input text so that it satisfies the goal function (i.e., the attack is judged successful) while the perturbation satisfies the constraints (e.g., grammar constraints, semantic-similarity constraints). Finally, the search method selects high-quality adversarial examples from all feasible transformation results.

2 changes: 1 addition & 1 deletion docs/0_get_started/command_line_usage.md
@@ -40,7 +40,7 @@ For example, given the following as `examples.csv`:

The command:
```
textattack augment --csv examples.csv --input-column text --recipe eda --pct-words-to-swap .1 \
textattack augment --input-csv examples.csv --output-csv output.csv --input-column text --recipe eda --pct-words-to-swap .1 \
--transformations-per-example 2 --exclude-original
```
will augment the `text` column with 10% of words edited per augmentation, twice as many augmentations as original inputs, and exclude the original inputs from the
12 changes: 10 additions & 2 deletions docs/1start/attacks4Components.md
@@ -12,8 +12,16 @@ This modular design enables us to easily assemble attacks from the literature wh
![two-categorized-attacks](/_static/imgs/intro/01-categorized-attacks.png)




- You can create a new attack (in one line of code!) by composing members of the four components we propose, for instance:

```bash
# Shows how to build an attack from components and use it on a pre-trained model on the Yelp dataset.
textattack attack --attack-n --model bert-base-uncased-yelp --num-examples 8 \
--goal-function untargeted-classification \
--transformation word-swap-wordnet \
--constraints edit-distance^12 max-words-perturbed^max_percent=0.75 repeat stopword \
--search greedy
```

### Goal Functions

60 changes: 56 additions & 4 deletions docs/1start/multilingual-visualization.md
@@ -3,14 +3,66 @@ TextAttack Extended Functions (Multilingual)



## TextAttack supports multiple model types besides HuggingFace models and our TextAttack models:

- Example attacking TensorFlow models @ [https://textattack.readthedocs.io/en/latest/2notebook/Example_0_tensorflow.html](https://textattack.readthedocs.io/en/latest/2notebook/Example_0_tensorflow.html)
- Example attacking scikit-learn models @ [https://textattack.readthedocs.io/en/latest/2notebook/Example_1_sklearn.html](https://textattack.readthedocs.io/en/latest/2notebook/Example_1_sklearn.html)
- Example attacking AllenNLP models @ [https://textattack.readthedocs.io/en/latest/2notebook/Example_2_allennlp.html](https://textattack.readthedocs.io/en/latest/2notebook/Example_2_allennlp.html)
- Example attacking Keras models @ [https://textattack.readthedocs.io/en/latest/2notebook/Example_3_Keras.html](https://textattack.readthedocs.io/en/latest/2notebook/Example_3_Keras.html)


## Multilingual Support

- see example code: [https://github.com/QData/TextAttack/blob/master/examples/attack/attack_camembert.py](https://github.com/QData/TextAttack/blob/master/examples/attack/attack_camembert.py) for using our framework to attack French-BERT.

- see tutorial notebook: [https://textattack.readthedocs.io/en/latest/2notebook/Example_4_CamemBERT.html](https://textattack.readthedocs.io/en/latest/2notebook/Example_4_CamemBERT.html) for using our framework to attack French-BERT.
- see the tutorial notebook for using our framework to attack French-BERT: [https://textattack.readthedocs.io/en/latest/2notebook/Example_4_CamemBERT.html](https://textattack.readthedocs.io/en/latest/2notebook/Example_4_CamemBERT.html)

- see the example code for using our framework to attack French-BERT: [https://github.com/QData/TextAttack/blob/master/examples/attack/attack_camembert.py](https://github.com/QData/TextAttack/blob/master/examples/attack/attack_camembert.py)



## User-defined custom inputs and models


### Custom Datasets: Dataset from a file

Loading a dataset from a file is very similar to loading a model from a file. A 'dataset' is any iterable of `(input, output)` pairs.
The following example would load a sentiment classification dataset from file `my_dataset.py`:

```python
dataset = [('Today was....', 1), ('This movie is...', 0), ...]
```

You can then run attacks on samples from this dataset by adding the argument `--dataset-from-file my_dataset.py`.


#### Custom Model from a file
To experiment with a model you've trained, you could create the following file
and name it `my_model.py`:

```python
model = load_your_model_with_custom_code() # replace this line with your model loading code
tokenizer = load_your_tokenizer_with_custom_code() # replace this line with your tokenizer loading code
```

Then, run an attack with the argument `--model-from-file my_model.py`. The model and tokenizer will be loaded automatically.
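
Putting the two flags together, an attack on your own model and dataset might look like this (a sketch; the recipe and example count are illustrative):

```bash
textattack attack --recipe textfooler --model-from-file my_model.py \
    --dataset-from-file my_dataset.py --num-examples 10
```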



## User-defined custom attack components

The [documentation website](https://textattack.readthedocs.io/en/latest) contains walkthroughs explaining basic usage of TextAttack, including building a custom transformation and a custom constraint.

- custom transformation example @ [https://textattack.readthedocs.io/en/latest/2notebook/1_Introduction_and_Transformations.html](https://textattack.readthedocs.io/en/latest/2notebook/1_Introduction_and_Transformations.html)

- custom constraint example @ [https://textattack.readthedocs.io/en/latest/2notebook/2_Constraints.html#A-custom-constraint](https://textattack.readthedocs.io/en/latest/2notebook/2_Constraints.html#A-custom-constraint)




## Visualizing TextAttack-generated examples


- You can visualize the generated adversarial examples using the visualization methods we provide here: [https://textattack.readthedocs.io/en/latest/2notebook/2_Constraints.html](https://textattack.readthedocs.io/en/latest/2notebook/2_Constraints.html)

## We have built a new WebDemo for visualizing TextAttack-generated examples
- If you prefer a web app, we have also built a new WebDemo, [TextAttack-WebDemo Github](https://github.com/QData/TextAttack-WebDemo), for visualizing adversarial examples generated by TextAttack.

- [TextAttack-WebDemo Github](https://github.com/QData/TextAttack-WebDemo)
2 changes: 1 addition & 1 deletion docs/3recipes/attack_recipes_cmd.md
@@ -36,7 +36,7 @@ textattack attack --recipe textfooler --model bert-base-uncased-mr --num-example

*DeepWordBug on DistilBERT trained on the Quora Question Pairs paraphrase identification dataset*:
```bash
textattack attack --model distilbert-base-uncased-qqp --recipe deepwordbug --num-examples 100
textattack attack --model distilbert-base-uncased-cola --recipe deepwordbug --num-examples 100
```

*Beam search with beam width 4 and word embedding transformation and untargeted goal function on an LSTM*:
5 changes: 4 additions & 1 deletion docs/3recipes/augmenter_recipes_cmd.md
@@ -38,7 +38,10 @@ and the number of augmentations per input example. It outputs a CSV in the same
"it's a mystery how the movie could be released in this condition .", 0
```

The command `textattack augment --csv examples.csv --input-column text --recipe embedding --pct-words-to-swap .1 --transformations-per-example 2 --exclude-original`
The command
```
textattack augment --input-csv examples.csv --output-csv output.csv --input-column text --recipe embedding --pct-words-to-swap .1 --transformations-per-example 2 --exclude-original
```
will augment the `text` column by altering 10% of each example's words, generating twice as many augmentations as original inputs, and exclude the original inputs from the
output CSV. (All of this will be saved to `augment.csv` by default.)

14 changes: 7 additions & 7 deletions examples/attack/attack_camembert.py
@@ -4,6 +4,7 @@
import numpy as np
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification, pipeline

from textattack import Attacker
from textattack.attack_recipes import PWWSRen2019
from textattack.datasets import HuggingFaceDataset
from textattack.models.wrappers import ModelWrapper
@@ -20,11 +21,11 @@ class HuggingFaceSentimentAnalysisPipelineWrapper(ModelWrapper):
[[0.218262017, 0.7817379832267761]
"""

def __init__(self, pipeline):
self.pipeline = pipeline
def __init__(self, model):
self.model = model

def __call__(self, text_inputs):
raw_outputs = self.pipeline(text_inputs)
raw_outputs = self.model(text_inputs)
outputs = []
for output in raw_outputs:
score = output["score"]
@@ -55,7 +56,6 @@ def __call__(self, text_inputs):
recipe.transformation.language = "fra"

dataset = HuggingFaceDataset("allocine", split="test")
for idx, result in enumerate(recipe.attack_dataset(dataset)):
print(("-" * 20), f"Result {idx+1}", ("-" * 20))
print(result.__str__(color_method="ansi"))
print()

attacker = Attacker(recipe, dataset)
results = attacker.attack_dataset()
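
For readers updating older scripts, the removed per-result loop maps onto the new `Attacker` API roughly like this (a sketch assuming the textattack 0.3 interface; the `AttackArgs` values are illustrative):

```python
from textattack import AttackArgs, Attacker

attack_args = AttackArgs(num_examples=10)          # cap how many samples are attacked
attacker = Attacker(recipe, dataset, attack_args)  # `recipe` and `dataset` as built above
results = attacker.attack_dataset()                # logs each result as it is produced
```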
2 changes: 1 addition & 1 deletion examples/attack/attack_from_components.sh
@@ -3,5 +3,5 @@
# model on the Yelp dataset.
textattack attack --attack-n --goal-function untargeted-classification \
--model bert-base-uncased-yelp --num-examples 8 --transformation word-swap-wordnet \
--constraints edit-distance^12 max-words-perturbed:max_percent=0.75 repeat stopword \
--constraints edit-distance^12 max-words-perturbed^max_percent=0.75 repeat stopword \
--search greedy
20 changes: 10 additions & 10 deletions examples/augmentation/augment.csv
Original file line number Diff line number Diff line change
@@ -1,11 +1,11 @@
text,label
"the rock is destined to be the 21st century's novel conan and that he's go to make a splash yet greater than arnold schwarzenegger , jean- claud van damme or steven segal.",1
"the rock is destined to be the 21st century's novo conan and that he's going to make a splash yet greater than arnold schwarzenegger , jean- claud van damme or stephens segal.",1
the gorgeously elaborate continuation of 'the lord of the rings' triad is so massive that a column of words cannot adequately describe co-writer/director pete jackson's expanded vision of j . r . r . tolkien's middle-earth .,1
the gorgeously elaborate continuation of 'the lordy of the rings' trilogy is so huge that a column of words cannot adequately describe co-writer/superintendent peter jackson's enlargements vision of j . r . r . tolkien's middle-earth .,1
take care of my cat offers a cheerfully different slice of asian cinema .,1
take care of my cat offers a refreshingly different slice of asian cinemas .,1
a technically well-made suspenser . . . but its abrupt fall in iq points as it races to the finish line demonstrating simply too discouraging to let slide .,0
a technologically well-made suspenser . . . but its abrupt dip in iq points as it races to the finish line proves simply too discouraging to let slide .,0
it's a mystery how the cinematography could be released in this condition .,0
it's a mystery how the movies could be released in this condition .,0
"the rock is destined to be the new conan and that he's going to make a splash even greater than arnold , jean- claud van damme or steven segal.",1
"the rock is destined to be the 21st century's new conan and that he's going to caravan make a splash even greater than arnold schwarzenegger , jean- claud van damme or steven segal.",1
the gorgeously rarify continuation of 'the lord of the rings' trilogy is so huge that a column of give-and-take cannot adequately describe co-writer/director shaft jackson's expanded vision of j . r . r . tolkien's middle-earth .,1
the gorgeously elaborate of 'the of the rings' trilogy is so huge that a column of words cannot adequately describe co-writer/director peter jackson's expanded of j . r . r . tolkien's middle-earth .,1
take care different my cat offers a refreshingly of slice of asian cinema .,1
take care of my cat offers a different slice of asian cinema .,1
a technically well-made suspenser . . . but its abrupt drop in iq points as it races to the finish IT line proves simply too discouraging to let slide .,0
a technically well-made suspenser . . . but its abrupt drop in iq points as it races to the finish demarcation proves plainly too discouraging to let slide .,0
it's pic a mystery how the movie could be released in this condition .,0
it's a mystery how the movie could in released be this condition .,0
2 changes: 1 addition & 1 deletion examples/augmentation/augment.sh
@@ -1,2 +1,2 @@
#!/bin/bash
textattack augment --csv examples.csv --input-column text --recipe eda --pct-words-to-swap .1 --transformations-per-example 2 --exclude-original --overwrite
textattack augment --input-csv examples.csv --output-csv output.csv --input-column text --recipe eda --pct-words-to-swap .1 --transformations-per-example 2 --exclude-original --overwrite
2 changes: 0 additions & 2 deletions examples/augmentation/example.csv

This file was deleted.

4 changes: 4 additions & 0 deletions examples/train/train_lstm_imdb_sentiment_classification.sh
@@ -0,0 +1,4 @@
#!/bin/bash
# Trains TextAttack's LSTM model on the IMDB sentiment classification dataset for 50 epochs. This is a basic
# demonstration of our training script and `datasets` integration.
textattack train --model-name-or-path lstm --dataset imdb --epochs 50 --learning-rate 1e-5
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
#!/bin/bash
# Trains TextAttack's LSTM model on the rotten_tomatoes sentiment classification dataset for 50 epochs. This is a basic
# demonstration of our training script and `datasets` integration.
textattack train --model-name-or-path lstm --dataset rotten_romatoes --epochs 50 --learning-rate 1e-5
textattack train --model-name-or-path lstm --dataset rotten_tomatoes --epochs 50 --learning-rate 1e-5
1 change: 1 addition & 0 deletions textattack/training_args.py
@@ -360,6 +360,7 @@ def _add_parser_args(cls, parser):
# Arguments that are needed if we want to create a model to train.
parser.add_argument(
"--model-name-or-path",
"--model",
type=str,
required=True,
help='Name or path of the model we want to create. "lstm" and "cnn" will create TextAttack\'s LSTM and CNN models while'
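
With the new `--model` alias in place, the training scripts in this PR could equivalently be invoked as (a sketch mirroring the IMDB script above):

```bash
textattack train --model lstm --dataset imdb --epochs 50 --learning-rate 1e-5
```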
