Riveter 💪 is a Python package that measures social dynamics between personas mentioned in a collection of texts.
The package identifies and extracts the subjects, verbs, and direct objects in texts; it performs coreference resolution on the personas mentioned in the texts (e.g., clustering "Elizabeth Bennet" and "she" together as one persona); and it measures social dynamics between the personas by referencing a given lexicon. The package currently includes lexica for Sap et al.'s connotation frames of power and agency and Rashkin et al.'s connotation frames of perspective, effect, value, and mental state, but you can also load your own custom lexicon.
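To make the idea of lexicon-based scoring concrete, here is a minimal sketch in plain Python. It does not use Riveter and is not the package's implementation: the toy lexicon values, the pre-extracted triples, and the `score_personas` helper are all invented for illustration, and the sketch simply sums an agent score for each subject and a theme score for each direct object.

```python
from collections import defaultdict

# Hypothetical toy lexicon: verb -> (agent score, theme score).
# These values are invented and are not taken from any published lexicon.
toy_lexicon = {
    "command": (1, -1),
    "obey": (-1, 1),
}

# Hypothetical (subject, verb, direct object) triples, as if they had already
# been extracted and coreferent mentions resolved to one persona name.
triples = [
    ("Elizabeth Bennet", "command", "Mr. Collins"),
    ("Mr. Collins", "obey", "Elizabeth Bennet"),
    ("Elizabeth Bennet", "visit", "Mr. Collins"),  # verb not in the lexicon, so skipped
]


def score_personas(triples, lexicon):
    """Sum agent scores for subjects and theme scores for direct objects."""
    totals = defaultdict(int)
    for subject, verb, direct_object in triples:
        if verb not in lexicon:
            continue
        agent_score, theme_score = lexicon[verb]
        totals[subject] += agent_score
        totals[direct_object] += theme_score
    return dict(totals)


scores = score_personas(triples, toy_lexicon)
print(scores)
```

Whether and how Riveter normalizes or weights such sums is not covered by this sketch; see the paper for the actual scoring.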
The name Riveter is inspired by "Rosie the Riveter," the allegorical figure who came to represent American women working in factories and at other industrial jobs during World War II. Rosie the Riveter has become an iconic symbol of power and shifting gender roles — subjects that the Riveter package aims to help users measure and explore.
Watch our two-minute demo video here: link
Check out our demo notebook here: link
To skip local installation and get started immediately, you can use this Google Colab notebook.
- Python 3.9
- numpy
- pandas
- seaborn
- matplotlib
- spacy-experimental
These instructions have been tested on OSX machines. We have not tested these instructions in other environments.
- We strongly recommend creating a new virtual environment. Activate this environment before installing and before running the code.
conda create -n riveterEnv python=3.9
conda activate riveterEnv
- Download this repo by using the Git command below or by downloading the repository manually (click the green Code button above, select Download ZIP, and then unzip the downloaded directory).
git clone https://github.com/maartensap/riveter-nlp.git
cd riveter-nlp
Note: If installing on a Mac, you will need Xcode installed to run git from the command line.
- Install spacy-experimental and the spaCy model files.
pip install -U spacy-experimental
pip install https://github.com/explosion/spacy-experimental/releases/download/v0.6.0/en_coreference_web_trf-3.4.0a0-py3-none-any.whl#egg=en_coreference_web_trf
python -m spacy download en_core_web_sm
- Install pandas and seaborn.
conda install pandas
conda install seaborn
To use Riveter 💪, see the examples in our demo notebook.
This notebook includes both toy and realistic examples and all of the most important function calls.
If you want a quick start:
riveter = Riveter()
riveter.load_sap_lexicon('power')
riveter.train(texts, text_ids)
persona_score_dict = riveter.get_score_totals()
Note: Here are some instructions for how to run demo.ipynb from the riveterEnv conda environment that you created during installation.
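The quick start passes `texts` and `text_ids` to `train`. A minimal sketch of preparing those two inputs is shown below, assuming they are parallel lists in which `text_ids[i]` identifies `texts[i]`. The `corpus` dictionary and its contents are hypothetical.

```python
# Hypothetical corpus: document ID -> document text.
corpus = {
    "doc_1": "I will never forget the day she walked in.",
    "doc_2": "He thanked the doctor.",
}

# Build two parallel lists so that text_ids[i] identifies texts[i]
# (assumed to be the layout the quick start expects).
text_ids = list(corpus.keys())
texts = [corpus[doc_id] for doc_id in text_ids]

print(list(zip(text_ids, texts)))
```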
Get the final scores for all the entities, above some frequency threshold across the dataset.
Name | Type | Description |
---|---|---|
frequency_threshold | integer | Optional: Entities must be matched to at least this many verbs to appear in the output. |
RETURNS | dictionary | Dictionary of entities and their total scores. |
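Because the result is described as a plain dictionary of entities and scores, it can be post-processed with ordinary Python. The sketch below ranks personas from highest to lowest score. The dictionary contents are invented stand-ins, and `rank_personas` is a hypothetical helper, not part of Riveter.

```python
# Stand-in for the dictionary of entities and total scores described above;
# the names and values are invented for illustration.
persona_score_dict = {"i": 0.25, "doctor": 0.6, "she": -0.1, "nurse": 0.6}


def rank_personas(score_dict, top_n=None):
    """Return (persona, score) pairs sorted from highest to lowest score.

    Ties are broken alphabetically by persona name so the order is stable.
    """
    ranked = sorted(score_dict.items(), key=lambda item: (-item[1], item[0]))
    return ranked if top_n is None else ranked[:top_n]


top_two = rank_personas(persona_score_dict, top_n=2)
print(top_two)
```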
Create a bar plot showing the final scores across the dataset.
Name | Type | Description |
---|---|---|
number_of_scores | integer | Optional: Show only the top or bottom number of scores. |
title | string | Optional: Plot title. |
frequency_threshold | integer | Optional: Entities must be matched to at least this many verbs to appear in the output. |
Get the final scores for all the entities, above some frequency threshold in a single document.
Name | Type | Description |
---|---|---|
doc_id | string or integer | Show results for this document ID. |
frequency_threshold | integer | Optional: Entities must be matched to at least this many verbs to appear in the output. |
RETURNS | dictionary | Nested dictionary of document IDs, entities, and their total scores. |
Create a bar plot showing the final scores for a single document.
Name | Type | Description |
---|---|---|
doc_id | string or integer | Show results for this document ID. |
number_of_scores | integer | Optional: Show only the top or bottom number of scores. |
title | string | Optional: Plot title. |
frequency_threshold | integer | Optional: Entities must be matched to at least this many verbs to appear in the output. |
Get all the verbs, their frequencies, and whether they contributed positively or negatively to the final scores for every entity, computed across the whole dataset.
Name | Type | Description |
---|---|---|
RETURNS | dictionary | Nested dictionary of entities, positive or negative contribution, verbs, and counts. |
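A nested dictionary like this can be awkward to scan, so one option is to flatten it into rows. The sketch below assumes the nesting order given above (entity, then polarity, then verb, then count). The polarity keys `"positive"` and `"negative"` and all values are assumptions made for illustration, so check the actual keys in your own output.

```python
# Stand-in for the nested dictionary described above. The polarity keys
# "positive" and "negative" are assumed, and all counts are invented.
persona_polarity_verb_counts = {
    "doctor": {"positive": {"treat": 3, "advise": 1}, "negative": {"fear": 1}},
    "i": {"negative": {"beg": 2}},
}


def flatten_verb_counts(nested):
    """Flatten entity -> polarity -> verb -> count into sorted row tuples."""
    rows = []
    for persona, polarities in nested.items():
        for polarity, verb_counts in polarities.items():
            for verb, count in verb_counts.items():
                rows.append((persona, polarity, verb, count))
    # Sort by persona, then by descending count, then alphabetically by verb.
    rows.sort(key=lambda row: (row[0], -row[3], row[2]))
    return rows


rows = flatten_verb_counts(persona_polarity_verb_counts)
for row in rows:
    print(row)
```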
Create a heatmap showing the verb counts for a single persona.
Name | Type | Description |
---|---|---|
persona | string | The entity whose results will be shown in the plot. |
figsize | tuple | Optional: Figure dimensions, e.g. (2, 4). |
output_path | string | Optional: Where to save the plot as a file. |
Get the total counts for the entities (all entity matches, whether or not they were matched to a lexicon verb).
Name | Type | Description |
---|---|---|
RETURNS | dictionary | Dictionary of entities and integer counts. |
Get the entity counts for a single document.
Name | Type | Description |
---|---|---|
doc_id | string or integer | Show results for this document ID |
RETURNS | dictionary | Dictionary of entities and integer counts. |
Get the verb counts (verbs that were matched to the lexicon) for a single document.
Name | Type | Description |
---|---|---|
doc_id | string or integer | Show results for this document ID |
RETURNS | dictionary | Dictionary of verbs and integer counts. |
Get the noun subject counts for a single document.
Name | Type | Description |
---|---|---|
doc_id | string or integer | Show results for this document ID |
matched_only | boolean | If true, return only the subjects that were matched to identified entities. |
RETURNS | dictionary | Dictionary of noun subjects and integer counts. |
Get the direct object counts for a single document.
Name | Type | Description |
---|---|---|
doc_id | string or integer | Show results for this document ID |
matched_only | boolean | If true, return only the direct objects that were matched to identified entities. |
RETURNS | dictionary | Dictionary of direct objects and integer counts. |
Get the full entity cluster from neuralcoref.
Name | Type | Description |
---|---|---|
persona | string | Show results for this entity. |
RETURNS | dictionary | Dictionary of the main entity string and all of its string matches. |
Load the verb lexicon from Sap et al., 2017.
Name | Type | Description |
---|---|---|
dimension | string | Select the lexicon: "power" or "agency". |
Load the verb lexicon from Rashkin et al., 2016.
Name | Type | Description |
---|---|---|
dimension | string | Select the lexicon: ["effect", "state", "value", "writer_perspective", "reader_perspective", "agent_theme_perspective", "theme_agent_perspective"]. |
Load your own verb lexicon.
Name | Type | Description |
---|---|---|
lexicon_path | string | Path to the lexicon; this should be a TSV file. |
verb_column | string | Column in the TSV that contains the verb. This should be in the same form as the Rashkin lexicon, e.g. "have", "take". |
agent_column | string | Column containing the agent score (positive or negative number). |
theme_column | string | Column containing the theme score (positive or negative number). |
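As a rough illustration of a file in this shape, the sketch below writes a small TSV with one verb column and two score columns, then reads it back to check the layout. The column names (`verb`, `agent`, `theme`), the scores, and the presence of a header row are assumptions for this example, not requirements confirmed by the package.

```python
import csv
import os
import tempfile

# Hypothetical column names and scores, invented for illustration.
lexicon_rows = [
    {"verb": "have", "agent": 1, "theme": -1},
    {"verb": "take", "agent": 1, "theme": 0},
]

# Write the lexicon as a tab-separated file with a header row.
lexicon_path = os.path.join(tempfile.mkdtemp(), "custom_lexicon.tsv")
with open(lexicon_path, "w", newline="", encoding="utf-8") as handle:
    writer = csv.DictWriter(
        handle, fieldnames=["verb", "agent", "theme"], delimiter="\t"
    )
    writer.writeheader()
    writer.writerows(lexicon_rows)

# Read the file back to confirm the layout; csv returns every value as a string.
with open(lexicon_path, newline="", encoding="utf-8") as handle:
    loaded = list(csv.DictReader(handle, delimiter="\t"))

print(loaded)
```

With a file like this, the three column parameters above would be given the header names used in the file.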
Find all the documents matched to the verb.
Name | Type | Description |
---|---|---|
target_verb | string | The verb you'd like to match. |
RETURNS | (list, list) | List of matched document IDs, list of matched document texts. |
Find all the documents matched to the persona.
Name | Type | Description |
---|---|---|
target_persona | string | The persona you'd like to match. |
RETURNS | (list, list) | List of matched document IDs, list of matched document texts. |
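Both lookups above return a pair of lists. Assuming the two lists are aligned by position, they can be paired up with `zip`. The values below are invented stand-ins for a returned result.

```python
# Stand-ins for the two returned lists; the IDs and texts are invented.
matched_ids = ["doc_3", "doc_7"]
matched_texts = ["The doctor treated her.", "They treated him kindly."]

# Pair each document ID with its text, assuming the lists are index-aligned.
matches_by_id = dict(zip(matched_ids, matched_texts))

for doc_id, text in matches_by_id.items():
    print(f"{doc_id}: {text}")
```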
This package was created by an interdisciplinary team including Maria Antoniak, Anjalie Field, Jimin Mun, Melanie Walsh, Lauren F. Klein, and Maarten Sap. You can find our paper writeup at the following URL: http://maartensap.com/pdfs/antoniak2023riveter.pdf
Use the following BibTeX to cite the paper:
@article{antoniak2023riveter,
title={Riveter: Measuring Power and Social Dynamics Between Entities},
author={Antoniak, Maria and Field, Anjalie and Mun, Ji Min and Walsh, Melanie and Klein, Lauren F. and Sap, Maarten},
year={2023},
url={http://maartensap.com/pdfs/antoniak2023riveter.pdf}
}