Nicolas Heist committed Dec 15, 2019
1 parent be3635a, commit 6e163cf
Showing 8 changed files with 753 additions and 157 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -18,7 +18,6 @@ nltk = "*" | |
pynif = "*" | ||
tables = "*" | ||
xgboost = "*" | ||
owlready2 = "*" | ||
|
||
[dev-packages] | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,44 +1,44 @@ | ||
# CaLiGraph | ||
|
||
TODO: Intro-Text | ||
\- A Large Semantic Knowledge Graph from Wikipedia Categories and Listpages \- | ||
|
||
## Purpose | ||
todo | ||
For information about the general idea, extraction statistics, and resources of CaLiGraph, visit the [CaLiGraph website](http://caligraph.org). | ||
|
||
## Configuration | ||
### Prerequisites | ||
- Python 3 | ||
- pipenv (https://pipenv.readthedocs.io/en/latest/) | ||
|
||
Note: If you have problems with your pipenv installation, you can also run the code directly via python. Just make sure to install all the dependencies given in `Pipfile` and `Pipfile.lock`. | ||
|
||
### System Requirements | ||
- You need a machine with at least 100 GB of RAM as we load most of DBpedia in memory to speed up the extraction | ||
- During the first execution of an extraction you need a stable internet connection as the required DBpedia files are downloaded automatically | ||
|
||
### Setup | ||
- In the project source directory, create and initialize a virtual environment with pipenv (run in terminal): | ||
|
||
- Create virtual environment with pipenv | ||
``` | ||
pipenv install | ||
``` | ||
|
||
- Download the spacy corpus: | ||
``` | ||
pipenv run python -m spacy download en_core_web_lg | ||
``` | ||
|
||
- Download the wordnet corpus of nltk (run in python): | ||
- If you have not downloaded them already, you have to fetch the latest corpora for spaCy and nltk-wordnet (run in terminal): | ||
``` | ||
import nltk | ||
nltk.download('wordnet') | ||
pipenv run python -m spacy download en_core_web_lg # download the most recent corpus of spaCy | ||
pipenv run python -c 'import nltk; nltk.download("wordnet")' # download the wordnet corpus of nltk | ||
``` | ||
|
||
### Basic Configuration Options | ||
|
||
Use `config.yaml` for configuration of the application. | ||
You can configure the application-specific parameters as well as logging- and file-related parameters in `config.yaml`. | ||
|
||
## Usage | ||
|
||
- Run the application with pipenv: | ||
Run the extraction with pipenv: | ||
|
||
``` | ||
pipenv run . | ||
pipenv run python3 . | ||
``` | ||
|
||
## License | ||
MIT. | ||
https://opensource.org/licenses/MIT | ||
All the required resources, like DBpedia files, will be downloaded automatically during execution. | ||
CaLiGraph is serialized in N-Triple format. The resulting files are placed in the `results` folder. |
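
Since the README states that the output is serialized as N-Triples in the `results` folder, the following is a minimal sketch of how such files could be read back with the Python standard library. It is not part of this commit. The function names `parse_line` and `iter_triples` are hypothetical, the `*.nt` file pattern is an assumption (the actual file names and any compression are not given here), and the regular expression is a simplification that only handles IRI subjects and predicates, not blank nodes or escape sequences.

```python
import re
from pathlib import Path
from typing import Iterator, Optional, Tuple

Triple = Tuple[str, str, str]

# Simplified N-Triples pattern (assumption: subject and predicate are IRIs).
# The object is captured as-is up to the terminating " ." of the statement.
_TRIPLE = re.compile(r'^\s*<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')


def parse_line(line: str) -> Optional[Triple]:
    """Return (subject, predicate, object), or None for blank, comment,
    or unsupported lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith('#'):
        return None
    match = _TRIPLE.match(line)
    if match is None:
        return None
    subj, pred, obj = match.groups()
    # Strip the angle brackets from IRI objects; literals are kept verbatim.
    if obj.startswith('<') and obj.endswith('>'):
        obj = obj[1:-1]
    return subj, pred, obj


def iter_triples(results_dir: str = 'results', pattern: str = '*.nt') -> Iterator[Triple]:
    """Yield triples from all files in results_dir matching pattern.
    Both defaults are assumptions about the output layout."""
    for path in sorted(Path(results_dir).glob(pattern)):
        with path.open(encoding='utf-8') as handle:
            for line in handle:
                triple = parse_line(line)
                if triple is not None:
                    yield triple
```

For anything beyond a quick inspection, a full RDF parser is the safer choice, because the pattern above does not validate the input against the N-Triples grammar.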