Code for the paper "Detecting Edge and Node Anomalies with Temporal GNNs", Proceedings of the 3rd GNNet Workshop@CoNEXT 2024.
This repository contains the code implementing GCN-GRU for node and edge anomaly detection on graph data, together with the four real-world datasets with injected anomalies used in the paper. The code is organized as follows.
gcn-gru/
+-- scripts/
| +-- preprocessing/
| | +-- preprocessing.py
| +-- tgnn/
| | +-- gcngru.py
| | +-- models.py
| +-- utils/
| | +-- utils.py
+-- notebooks/
| +-- demo.ipynb
| ...
+-- data/
| ...
- `preprocessing.py`: functions to preprocess data
- `gcngru.py`: wrapper class for the base models
- `models.py`: definition of the base models (GCN, GCN-GRU for nodes, edges and both)
- `utils.py`: utility functions
- `demo.ipynb`: example of a single training and testing run for anomaly detection (node-only, edge-only and both)
- Each file named `adjs_anom_dataSet` is a list of matrices (one per snapshot). Each matrix contains the original edges plus the injected anomalies. These matrices represent both the graph and the features.
- Each file named `anomalies_edges_idx_dataSet` is a list of boolean arrays (one per snapshot): True means the edge is anomalous, False means it is normal. These arrays are the edge ground truth.
- Each file named `anomalies_nodes_idx_dataSet` is a list of boolean arrays (one per snapshot): True means the node is anomalous, False means it is normal. These arrays are the node ground truth.
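A minimal loading sketch (not part of the repository): it assumes the files under `data/` are pickled Python lists and uses the `uci` dataset name as a placeholder; the actual serialization and file paths may differ.

```python
import pickle

# Hypothetical file names, following the naming pattern described above (dataset "uci").
with open("data/adjs_anom_uci", "rb") as f:
    adjs = pickle.load(f)          # list of adjacency matrices, one per snapshot
with open("data/anomalies_edges_idx_uci", "rb") as f:
    edge_labels = pickle.load(f)   # list of boolean arrays: True = anomalous edge
with open("data/anomalies_nodes_idx_uci", "rb") as f:
    node_labels = pickle.load(f)   # list of boolean arrays: True = anomalous node

print(len(adjs), "snapshots loaded")
```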
| Dataset | Bipartite | Docs | Event |
|---|---|---|---|
| reddit | Y | | Social posting |
| webbrowsing | Y | WebBrowsing | Web browsing |
| stackoverflow | N | StackOverflow | Community interaction |
| uci | N | UCI | Messages on social network |
The notebook `demo.ipynb` allows you to perform a single training and test experiment. To use it, specify the desired dataset and the model parameters. The results are printed, and the anomaly scores for edges and nodes are saved.
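As an illustration (not code from the notebook), the saved anomaly scores could be compared against the ground-truth boolean arrays snapshot by snapshot, e.g. with ROC AUC; the variable names and score format below are assumptions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical inputs: one score array and one boolean label array per test snapshot.
# In practice these would be the scores saved by demo.ipynb and the arrays from
# anomalies_edges_idx_<dataset>.
edge_scores = [np.array([0.1, 0.8, 0.3, 0.9]), np.array([0.2, 0.7, 0.4, 0.1])]
edge_labels = [np.array([False, True, False, True]), np.array([False, True, False, False])]

aucs = [roc_auc_score(y, s) for y, s in zip(edge_labels, edge_scores)]
print("mean edge ROC AUC:", float(np.mean(aucs)))
```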
In `demo.ipynb`, the variable `splits` is a tuple of 5 values:

- `history`: number of snapshots used as history
- `train_start`: first training snapshot ID minus 1
- `train_end`: last training snapshot ID
- `val`: number of snapshots used as validation
- `test`: final snapshot ID

For example:
`splits = (10, 9, 19, 5, 29)`
This means that:

- the history starts at $t_0$ and ends at $t_9$
- the training starts at $t_{10}$ and ends at $t_{19}$
- the validation starts at $t_{20}$ and ends at $t_{24}$
- the test starts at $t_{25}$ and ends at $t_{29}$
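A small sketch (not from the repository) of how these five values map to snapshot indices, following the example above; the variable names are assumptions.

```python
# Assumed interpretation of the splits tuple, based on the example above.
history, train_start, train_end, val, test = (10, 9, 19, 5, 29)

history_snapshots = range(0, history)                          # t_0  .. t_9
train_snapshots   = range(train_start + 1, train_end + 1)      # t_10 .. t_19
val_snapshots     = range(train_end + 1, train_end + 1 + val)  # t_20 .. t_24
test_snapshots    = range(train_end + 1 + val, test + 1)       # t_25 .. t_29
```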