This is the code implementation for the semester project *Understanding and Visualizing Graph Neural Networks*. All commands should be executed within the `src/run` subfolder. The relevant configuration files for the experiments are in `src/configs`.
- Training GCN with the joint loss function:

  ```
  python fixpoint.py --dataset cora --fixpoint_loss --exp_times 10
  ```

  where `cora` is the dataset name and may be changed to `pubmed` or `citeseer`. If `--fixpoint_loss` is set, the GCN is trained with the proposed joint loss function; otherwise it is trained with the normal cross-entropy loss for classification. `--exp_times` specifies how many times the experiment is repeated; the result shown in the final report is the average of 10 experiments.
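As a sketch of the idea behind the joint loss (not the repository's actual implementation), the objective can be viewed as a classification loss plus a fixed-point consistency penalty on the node embeddings; the function names and the weighting factor `lam` below are illustrative assumptions:

```python
import numpy as np

def cross_entropy(probs, labels):
    # standard classification loss on predicted class probabilities
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def fixpoint_penalty(Z, Z_next):
    # penalize the distance between embeddings before and after one more
    # propagation step, encouraging the GCN to converge to a fixed point
    return np.mean((Z - Z_next) ** 2)

def joint_loss(probs, labels, Z, Z_next, lam=1.0):
    # hypothetical weighting `lam` between the two terms
    return cross_entropy(probs, labels) + lam * fixpoint_penalty(Z, Z_next)
```

When the fixed-point penalty is zero, the joint loss reduces to the plain cross-entropy term, which matches the behavior described above when `--fixpoint_loss` is not set.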
- To visualize the accuracy on the three citation datasets, run the above command for each dataset and then open `notebooks/fixedpoint_visualization.ipynb`. Visualization of test accuracy, taken from the final report:
| Dataset  | GCN  | SSE  | GCN with joint loss function |
| --- | --- | --- | --- |
| Cora     | 81.5 | 79.0 | 70.3 |
| PubMed   | 81.2 | 79.7 | 69.0 |
| CiteSeer | 79.4 | 75.8 | 72.5 |
- Executing the node embedding identifiability experiment:

  ```
  python identifiability.py --dataset cora --knn 1 --repeat_times 5 --max_gcn_layers 10
  ```

  where `--dataset` selects the dataset and can be chosen from `cora`, `pubmed`, and `citeseer`; `--knn` sets the k for the k-nearest-neighbour search after recovering the input node features; `--repeat_times` sets how many times the experiment is repeated; and `--max_gcn_layers` sets the maximum number of GCN layers used in the experiment.
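A minimal sketch of the k-nearest-neighbour identifiability check described above, assuming the recovered and original feature matrices share the same row order (the helper name and Euclidean distance choice are illustrative, not taken from the repository):

```python
import numpy as np

def knn_identifiable_rate(recovered, original, k=1):
    """Fraction of nodes whose recovered feature vector has its own
    original feature vector among its k nearest neighbours.
    Both inputs are (num_nodes, dim) arrays with matching row order."""
    # pairwise Euclidean distances between recovered and original rows
    dists = np.linalg.norm(recovered[:, None, :] - original[None, :, :], axis=-1)
    # indices of the k closest original rows for each recovered row
    nearest = np.argsort(dists, axis=1)[:, :k]
    hits = [i in nearest[i] for i in range(len(recovered))]
    return float(np.mean(hits))
```

A perfect recovery yields a rate of 1.0, while mixed-up recoveries lower the rate, which is one way to quantify how identifiable the node embeddings are.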
- Results are visualized in the notebook `notebooks/identifiability_visualization.ipynb`. Example visualization results for the cora dataset are shown below:
- Training the 100-layer GCN:

  ```
  python gnn_n_100layerGCN.py --dataset cora --exp_times 10 --num_random_features 10
  ```

  The parameter `--dataset` can be chosen from 7 node classification datasets, namely `cora`, `pubmed`, `citeseer`, `amazon_photo`, `amazon_computers`, `coauthors_cs`, and `coauthors_physics`. The 100-layer GCN can be trained several times, as determined by `--exp_times`; for each training trial the trained model is tested with 10 different random features, a number that can be changed via `--num_random_features`.
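The evaluation loop over random features can be sketched as follows; `model_fn` is a hypothetical stand-in for the trained 100-layer GCN (a function from a feature matrix to predicted labels), and the Gaussian feature distribution is an assumption:

```python
import numpy as np

def random_feature_accuracies(model_fn, labels, num_nodes, feat_dim,
                              num_random_features=10, seed=0):
    """Evaluate a trained model on several random input feature matrices,
    mirroring what `--num_random_features` controls."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(num_random_features):
        X = rng.standard_normal((num_nodes, feat_dim))  # fresh random features
        preds = model_fn(X)
        accs.append(float(np.mean(preds == labels)))
    return accs
```

Each entry of the returned list is the test accuracy under one draw of random features, so repeating this for every training trial gives the full grid of results the later GNN-N computation consumes.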
- Training the 3-layer MLP:

  ```
  python gnn_n_3layerMLP.py --dataset cora --exp_times 10
  ```

  As in the 100-layer GCN experiments, `--dataset` can be chosen from the 7 node classification datasets and `--exp_times` determines how many times the experiment is repeated.
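The role of `--exp_times` in both scripts amounts to a simple repeat-and-average loop; the sketch below is a generic illustration (the callable `run_once` stands in for one full train-and-test cycle, which is an assumption about the scripts' structure):

```python
import numpy as np

def repeat_experiment(run_once, exp_times=10, seed=0):
    """Repeat a stochastic training/evaluation run `exp_times` times and
    report the mean accuracy alongside the individual results."""
    rng = np.random.default_rng(seed)
    accs = [run_once(rng) for _ in range(exp_times)]
    return float(np.mean(accs)), accs
```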
- Computing GNN-N values:

  ```
  python gnn_n.py --dataset cora --mlp_exp_times 10 --gcn_exp_times 10 --gcn_num_random_features 10
  ```

  In this step the experimental probabilities are computed, from which the GNN-N value is derived. `--mlp_exp_times` must be set to the same value as the `--exp_times` used in the 3-layer MLP experiment; likewise, `--gcn_exp_times` and `--gcn_num_random_features` must match the `--exp_times` and `--num_random_features` used in the 100-layer GCN experiments, respectively.
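One of the empirical quantities entering such a computation is a repeating rate over repeated runs; the sketch below estimates a per-node agreement probability from stacked prediction runs (this is one plausible notion of a repeating rate, not necessarily the exact definition used in the report):

```python
import numpy as np

def repeating_rate(pred_runs):
    """Average, over nodes, of how often a node's prediction equals its
    majority prediction across runs. `pred_runs` has shape
    (num_runs, num_nodes) and contains predicted labels."""
    pred_runs = np.asarray(pred_runs)
    rates = []
    for node_preds in pred_runs.T:  # iterate over nodes
        vals, counts = np.unique(node_preds, return_counts=True)
        rates.append(counts.max() / len(node_preds))
    return float(np.mean(rates))
```

A rate of 1.0 means every run agrees on every node; lower values indicate that the model's predictions vary across repetitions, which is the kind of instability the GNN-N measure is built from.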
- After executing the experiments and computing GNN-N values for all 7 datasets, you can visualize the results using `notebooks/gnn_n_3layerMLP_visualization.ipynb`. The visualization of GNN-N values for the 7 node classification datasets is shown in the following:
- The notebook `notebooks/gnn_n_3layerMLP_visualization.ipynb` is used to visualize the results of the 3-layer MLP experiments, for example the test accuracy and repeating rates of the 3-layer MLP:
- Results of the 100-layer GCN experiments can be visualized in `notebooks/gnn_n_100layerGCN_visualization.ipynb`. The following image shows the visualization of accuracy, R-RR, and RO-RR. This notebook also visualizes TT-RR with heatmaps; the following image shows the TT-RR of the cora dataset and is taken from the appendix of the final report.