From 1c9cb8eadcaa9b62986cb478dc5629752790650f Mon Sep 17 00:00:00 2001
From: simonepri
Date: Fri, 20 Mar 2020 19:05:27 +0000
Subject: [PATCH] Create readme.md

---
 readme.md | 190 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 190 insertions(+)
 create mode 100644 readme.md

diff --git a/readme.md b/readme.md
new file mode 100644
index 0000000..4c47898
--- /dev/null
+++ b/readme.md
@@ -0,0 +1,190 @@
+

+ datasets-knowledge-embedding +

+ 📝 A collection of common datasets used in knowledge embedding +

+
+
+## Datasets
+
+This project collects different datasets used in various knowledge embedding related papers.
+It also standardizes the format of these datasets, making it easier to use them in the evaluation of new works.
+
+The datasets can be downloaded from the [release page][release].
+For licensing information, please refer to the original dataset license file.
+
+
+### COUNTRIES-S1
+This dataset was introduced in [On Approximate Reasoning Capabilities of Low-Rank Vector Spaces](https://www.aaai.org/ocs/index.php/SSS/SSS15/paper/view/10257).
+The link to the original dataset as released by the authors is unknown but a copy has been taken from [here](https://github.com/TimDettmers/ConvE/tree/master/countries).
+
+| Entities | Relation Types | Edges | Train Edges | Validation Edges | Test Edges |
+|----------|----------------|-------|-------------|------------------|------------|
+| 271 | 2 | 1159 | 1111 | 24 | 24 |
+
+[![Download COUNTRIES-S1.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/COUNTRIES-S1.tgz
+)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/COUNTRIES-S1.tgz) [![Download COUNTRIES-S1-ID.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/COUNTRIES-S1-ID.tgz
+)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/COUNTRIES-S1-ID.tgz)
+
+
+### COUNTRIES-S2
+This dataset was introduced in [On Approximate Reasoning Capabilities of Low-Rank Vector Spaces](https://www.aaai.org/ocs/index.php/SSS/SSS15/paper/view/10257).
+The link to the original dataset as released by the authors is unknown but a copy has been taken from [here](https://github.com/TimDettmers/ConvE/tree/master/countries).
+
+| Entities | Relation Types | Edges | Train Edges | Validation Edges | Test Edges |
+|----------|----------------|-------|-------------|------------------|------------|
+| 271 | 2 | 1111 | 1063 | 24 | 24 |
+
+[![Download COUNTRIES-S2.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/COUNTRIES-S2.tgz
+)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/COUNTRIES-S2.tgz) [![Download COUNTRIES-S2-ID.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/COUNTRIES-S2-ID.tgz
+)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/COUNTRIES-S2-ID.tgz)
+
+### COUNTRIES-S3
+This dataset was introduced in [On Approximate Reasoning Capabilities of Low-Rank Vector Spaces](https://www.aaai.org/ocs/index.php/SSS/SSS15/paper/view/10257).
+The link to the original dataset as released by the authors is unknown but a copy has been taken from [here](https://github.com/TimDettmers/ConvE/tree/master/countries).
+
+| Entities | Relation Types | Edges | Train Edges | Validation Edges | Test Edges |
+|----------|----------------|-------|-------------|------------------|------------|
+| 271 | 2 | 1033 | 985 | 24 | 24 |
+
+[![Download COUNTRIES-S3.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/COUNTRIES-S3.tgz
+)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/COUNTRIES-S3.tgz) [![Download COUNTRIES-S3-ID.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/COUNTRIES-S3-ID.tgz
+)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/COUNTRIES-S3-ID.tgz)
+
+### FB15K
+This dataset was introduced in [Translating Embeddings for Modeling Multi-relational Data](https://dl.acm.org/doi/10.5555/2999792.2999923).
+The original dataset as released by the authors is available [here](https://everest.hds.utc.fr/doku.php?id=en:transe).
+
+| Entities | Relation Types | Edges | Train Edges | Validation Edges | Test Edges |
+|----------|----------------|-------|-------------|------------------|------------|
+| 14951 | 1345 | 592213 | 483142 | 50000 | 59071 |
+
+[![Download FB15K.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/FB15K.tgz
+)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/FB15K.tgz) [![Download FB15K-ID.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/FB15K-ID.tgz
+)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/FB15K-ID.tgz)
+
+### FB15K-237
+This dataset was introduced in [Observed versus latent features for knowledge base and text inference](https://www.aclweb.org/anthology/W15-4007/).
+The original dataset as released by the authors is available [here](https://www.microsoft.com/en-us/download/details.aspx?id=52312).
+
+| Entities | Relation Types | Edges | Train Edges | Validation Edges | Test Edges |
+|----------|----------------|-------|-------------|------------------|------------|
+| 14541 | 237 | 310116 | 272115 | 17535 | 20466 |
+
+[![Download FB15K-237.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/FB15K-237.tgz
+)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/FB15K-237.tgz) [![Download FB15K-237-ID.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/FB15K-237-ID.tgz
+)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/FB15K-237-ID.tgz)
+
+### KINSHIP
+This dataset was introduced in [Learning systems of concepts with an infinite relational model](https://dl.acm.org/doi/10.5555/1597538.1597600).
+The original dataset as released by the authors is available [here](http://www.charleskemp.com/code/irm.html).
+
+| Entities | Relation Types | Edges | Train Edges | Validation Edges | Test Edges |
+|----------|----------------|-------|-------------|------------------|------------|
+| 104 | 25 | 10686 | 8544 | 1068 | 1074 |
+
+[![Download KINSHIP.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/KINSHIP.tgz
+)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/KINSHIP.tgz) [![Download KINSHIP-ID.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/KINSHIP-ID.tgz
+)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/KINSHIP-ID.tgz)
+
+### NATIONS
+This dataset was introduced in [Learning systems of concepts with an infinite relational model](https://dl.acm.org/doi/10.5555/1597538.1597600).
+The original dataset as released by the authors is available [here](http://www.charleskemp.com/code/irm.html).
+
+| Entities | Relation Types | Edges | Train Edges | Validation Edges | Test Edges |
+|----------|----------------|-------|-------------|------------------|------------|
+| 14 | 55 | 1992 | 1592 | 199 | 201 |
+
+[![Download NATIONS.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/NATIONS.tgz
+)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/NATIONS.tgz) [![Download NATIONS-ID.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/NATIONS-ID.tgz
+)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/NATIONS-ID.tgz)
+
+### UMLS
+This dataset was introduced in [Learning systems of concepts with an infinite relational model](https://dl.acm.org/doi/10.5555/1597538.1597600).
+The original dataset as released by the authors is available [here](http://www.charleskemp.com/code/irm.html).
+
+| Entities | Relation Types | Edges | Train Edges | Validation Edges | Test Edges |
+|----------|----------------|-------|-------------|------------------|------------|
+| 135 | 46 | 6529 | 5216 | 652 | 661 |
+
+[![Download UMLS.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/UMLS.tgz
+)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/UMLS.tgz) [![Download UMLS-ID.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/UMLS-ID.tgz
+)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/UMLS-ID.tgz)
+
+### WN18
+This dataset was introduced in [Translating Embeddings for Modeling Multi-relational Data](https://dl.acm.org/doi/10.5555/2999792.2999923).
+The original dataset as released by the authors is available [here](https://everest.hds.utc.fr/doku.php?id=en:transe).
+
+| Entities | Relation Types | Edges | Train Edges | Validation Edges | Test Edges |
+|----------|----------------|-------|-------------|------------------|------------|
+| 41105 | 18 | 151442 | 141442 | 5000 | 5000 |
+
+[![Download WN18.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/WN18.tgz
+)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/WN18.tgz) [![Download WN18-ID.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/WN18-ID.tgz
+)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/WN18-ID.tgz)
+
+### WN18RR
+This dataset was introduced in [Convolutional 2D Knowledge Graph Embeddings](https://arxiv.org/abs/1707.01476).
+The original dataset as released by the authors is available [here](https://github.com/TimDettmers/ConvE).
+
+| Entities | Relation Types | Edges | Train Edges | Validation Edges | Test Edges |
+|----------|----------------|-------|-------------|------------------|------------|
+| 41105 | 11 | 93003 | 86835 | 3034 | 3134 |
+
+[![Download WN18RR.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/WN18RR.tgz
+)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/WN18RR.tgz) [![Download WN18RR-ID.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/WN18RR-ID.tgz
+)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/WN18RR-ID.tgz)
+
+### YAGO3-10
+This dataset was introduced in [Convolutional 2D Knowledge Graph Embeddings](https://arxiv.org/abs/1707.01476).
+The original dataset as released by the authors is available [here](https://github.com/TimDettmers/ConvE).
+
+| Entities | Relation Types | Edges | Train Edges | Validation Edges | Test Edges |
+|----------|----------------|-------|-------------|------------------|------------|
+| 123182 | 37 | 1089040 | 1079040 | 5000 | 5000 |
+
+[![Download YAGO3-10.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/YAGO3-10.tgz
+)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/YAGO3-10.tgz) [![Download YAGO3-10-ID.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/YAGO3-10-ID.tgz
+)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/YAGO3-10-ID.tgz)
+
+
+## Add a new dataset
+
+To add a new dataset to this collection, first create three files named `train.tsv`, `valid.tsv`, and `test.tsv`, containing the edges of the train, validation, and test splits respectively.
+The files must contain tab-separated triples of the form `(head entity, relation, tail entity)`.
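As a rough illustration of the layout described above, the sketch below writes three toy split files and checks that every line has exactly three non-empty tab-separated fields. The triples, the temporary directory, the one-triple-per-line layout, and the `check_triples` helper are all assumptions made for this example; none of them is part of this repository or of `build.sh`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory for the toy files (removed when the script exits).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical triples: head entity, relation, tail entity, separated by tabs.
printf 'rome\tlocatedin\titaly\nparis\tlocatedin\tfrance\n' > "$workdir/train.tsv"
printf 'berlin\tlocatedin\tgermany\n' > "$workdir/valid.tsv"
printf 'madrid\tlocatedin\tspain\n' > "$workdir/test.tsv"

# check_triples FILE: exit 0 only if every line has exactly three
# non-empty tab-separated fields; report each offending line otherwise.
check_triples() {
  awk -F'\t' '
    NF != 3 || $1 == "" || $2 == "" || $3 == "" {
      printf "%s:%d: expected 3 non-empty tab-separated fields\n", FILENAME, FNR
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}

for split in train valid test; do
  check_triples "$workdir/$split.tsv" && echo "$split.tsv: ok"
done
```

Files that pass this check have the three-column layout described above; the check says nothing about whether the entities and relations themselves are meaningful.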
+
+Once you have done this, process the three files with the following bash script.
+
+```bash
+bash build.sh train.tsv valid.tsv test.tsv .
+```
+
+The script uses the [datasets-knowledge-embedding][github:simonepri/datasets-knowledge-embedding] tool under the hood.
+
+
+## Authors
+
+- **Simone Primarosa** - [simonepri][github:simonepri]
+
+See also the list of [contributors][contributors] who participated in this project.
+
+
+## License
+
+This project is licensed under the MIT License - see the [license][license] file for details.
+
+
+[license]: https://github.com/simonepri/datasets-knowledge-embedding/tree/master/license
+[contributors]: https://github.com/simonepri/datasets-knowledge-embedding/contributors
+[release]: https://github.com/simonepri/datasets-knowledge-embedding/releases/latest
+
+[github:simonepri]: https://github.com/simonepri
+
+[github:simonepri/datasets-knowledge-embedding]: https://github.com/simonepri/datasets-knowledge-embedding