This repository contains the code used in the following paper (arXiv version), accepted at ICML 2022:

    @misc{https://doi.org/10.48550/arxiv.2206.10140,
      doi       = {10.48550/ARXIV.2206.10140},
      url       = {https://arxiv.org/abs/2206.10140},
      author    = {Kamigaito, Hidetaka and Hayashi, Katsuhiko},
      keywords  = {Machine Learning (cs.LG), Artificial Intelligence (cs.AI), Computation and Language (cs.CL), Social and Information Networks (cs.SI), FOS: Computer and information sciences},
      title     = {Comprehensive Analysis of Negative Sampling in Knowledge Graph Representation Learning},
      publisher = {arXiv},
      year      = {2022},
      copyright = {arXiv.org perpetual, non-exclusive license}
    }
Note that the original PMLR version of our paper mistakenly drops |D| in Eqs. (10), (12), and (13) due to typos (erratum). Please refer to the latest arXiv version of our paper for the corrected equations.
We modified KGE-HAKE (Zhang et al., 2020) and KnowledgeGraphEmbedding (Sun et al., 2019) to implement our code.
The modified KnowledgeGraphEmbedding is located in ./KnowledgeGraphEmbedding and the modified KGE-HAKE in ./KGE-HAKE.
Our code requires the following:
- Python 3.6+
- PyTorch 1.0+
- NumPy 1.15.4+
You can fulfill these requirements by running:

    pip install -r requirements.txt
To rerun RESCAL, ComplEx, DistMult, TransE, and RotatE:
- Move to ./KnowledgeGraphEmbedding.
- Run the setting files under ./KnowledgeGraphEmbedding/settings/ for each model (see the sketch after this list).
- After training, you can test the trained models by running ./KnowledgeGraphEmbedding/eval.sh.
- The evaluation results are stored in test.log in each model directory.
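Putting the steps together, a minimal sketch of the whole procedure (the settings file and model directory names are placeholders; this assumes the settings files are shell scripts that wrap the training commands, as in the original repositories):

```bash
cd ./KnowledgeGraphEmbedding
# train: pick the actual file under settings/ for the model you want to reproduce
bash settings/<setting_file_for_your_model>.sh
# test all trained models
bash eval.sh
# inspect the evaluation results of one model
cat <model_directory>/test.log
```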
To rerun HAKE:
- Move to ./KGE-HAKE.
- Run the setting files under ./KGE-HAKE/settings/ for each model (see the sketch after this list).
- After training, you can test the trained models by running ./KGE-HAKE/eval.sh.
- The evaluation results are stored in test.log in each model directory.
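The corresponding sketch for HAKE (again, the settings file and model directory names are placeholders):

```bash
cd ./KGE-HAKE
bash settings/<setting_file_for_hake>.sh   # train
bash eval.sh                               # test
cat <model_directory>/test.log             # evaluation results
```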
RESCAL, ComplEx, DistMult, TransE, and RotatE in ./KnowledgeGraphEmbedding

You can run the following scripts:
- run.sh trains a model using the self-adversarial negative sampling (SANS) loss function.
- run_wo_adv.sh trains a model using the NS loss in Eq. (3) of our paper with uniform noise.
- run_wo_adv_sum.sh trains a model using the NS loss in Eq. (2) of our paper with uniform noise.
The above scripts run testing after the final training epoch, so the reported result is for the model obtained at the last epoch. If you need to evaluate the model that achieved the best validation MRR, please use the method described in Testing Models.
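For example, switching from the SANS loss to the plain NS loss with uniform noise only requires swapping the script name. A minimal sketch, assuming the modified scripts keep the positional interface of the original KnowledgeGraphEmbedding run.sh; the hyperparameters below are placeholders, and the settings files contain the exact values used in our paper:

```bash
# hypothetical hyperparameters for illustration only; copy the real invocation from ./settings/
bash run_wo_adv.sh train RotatE FB15k-237 0 0 1024 256 1000 9.0 1.0 0.00005 100000 16 -de
```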
HAKE in ./KGE-HAKE

You can run the following scripts:
- runs.sh trains a model using the self-adversarial negative sampling (SANS) loss function.
- runs_wo_adv.sh trains a model using the NS loss in Eq. (3) of our paper with uniform noise.
- runs_wo_adv_sum.sh trains a model using the NS loss in Eq. (2) of our paper with uniform noise.
The above scripts run testing after the final training epoch, so the reported result is for the model obtained at the last epoch. If you need to evaluate the model that achieved the best validation MRR, please use the method described in Testing Models.
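The same substitution works for HAKE. A minimal sketch (the argument list is a placeholder and should be copied from the corresponding file under ./KGE-HAKE/settings/):

```bash
# train HAKE with the NS loss of Eq. (2) and uniform noise;
# <arguments> stands for the hyperparameter list given in the settings file
bash runs_wo_adv_sum.sh <arguments>
```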
In the training scripts of both ./KnowledgeGraphEmbedding and ./KGE-HAKE, you can enable the subsampling methods described in our paper with the following options:
- --default_subsampling: the subsampling already included in ./KnowledgeGraphEmbedding.
- --freq_based_subsampling: frequency-based subsampling described in Eq. (12) of our paper.
- --uniq_based_subsampling: unique-based subsampling described in Eq. (13) of our paper.
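For example, appending one of these flags to a training command enables the corresponding subsampling. A minimal sketch for ./KnowledgeGraphEmbedding, assuming the flag is added to the python -u codes/run.py call that a settings file ultimately executes; all other arguments below are placeholder values in the style of the original Sun et al. (2019) entry point, not the settings used in our paper:

```bash
# hypothetical run: RotatE on FB15k-237 with frequency-based subsampling (Eq. (12))
python -u codes/run.py --do_train --do_valid --do_test --cuda \
    --data_path data/FB15k-237 --model RotatE \
    -n 256 -b 1024 -d 1000 -g 9.0 -a 1.0 -adv \
    -lr 0.00005 --max_steps 100000 --test_batch_size 16 -de \
    -save models/RotatE_FB15k-237_freq \
    --freq_based_subsampling
```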
RESCAL, ComplEx, DistMult, TransE, and RotatE in ./KnowledgeGraphEmbedding

You can test a trained model in ${MODEL_DIRECTORY} by using the following command:

    python -u codes/run.py --do_test --cuda -init ${MODEL_DIRECTORY}
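For example, if a model trained by one of the above scripts was saved under models/RotatE_FB15k-237_0 (a hypothetical directory name; use the -save path of your own run):

```bash
MODEL_DIRECTORY=models/RotatE_FB15k-237_0   # hypothetical save directory
python -u codes/run.py --do_test --cuda -init ${MODEL_DIRECTORY}
```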
HAKE in ./KGE-HAKE

You can test a trained model in ${MODEL_DIRECTORY} by using the following command:

    python -u codes/runs.py --do_test --cuda -init ${MODEL_DIRECTORY}
Other options are described in ./KGE-HAKE/README.md and ./KnowledgeGraphEmbedding/README.md.
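If the entry points follow the original repositories, they are argparse-based, so you can also list the available options directly from the command line:

```bash
# run from ./KnowledgeGraphEmbedding
python -u codes/run.py --help
# run from ./KGE-HAKE
python -u codes/runs.py --help
```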