add minibatch tutorial
kayzliu committed Feb 4, 2024
1 parent 86e59dd commit 2654c80
Showing 3 changed files with 59 additions and 1 deletion.
2 changes: 1 addition & 1 deletion docs/api_cc.rst
@@ -10,7 +10,7 @@ Key Attributes of a fitted detector:

* :attr:`pygod.detector.Detector.decision_score_`: The outlier scores of the input data. Outliers tend to have higher scores.
* :attr:`pygod.detector.Detector.label_`: The binary labels of the input data. 0 stands for inliers and 1 for outliers.
* :attr:`threshold_` : The determined threshold for binary classification. Scores above the threshold are outliers.
* :attr:`pygod.detector.Detector.threshold_` : The determined threshold for binary classification. Scores above the threshold are outliers.

**Input of PyGOD**: Please pass in a `PyG Data object <https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.data.Data.html#torch_geometric.data.Data>`_.
See `PyG data processing examples <https://pytorch-geometric.readthedocs.io/en/latest/notes/introduction.html#data-handling-of-graphs>`_.
1 change: 1 addition & 0 deletions docs/index.rst
@@ -120,6 +120,7 @@ GADNR 2024 GNN+AE Yes :class:`pygod.detector.GADN
install
tutorials/index
api_cc
minibatch

.. toctree::
:maxdepth: 3
57 changes: 57 additions & 0 deletions docs/minibatch.rst
@@ -0,0 +1,57 @@
Efficient GPU Training
======================

To train deep detectors efficiently, we usually use
`CUDA <https://developer.nvidia.com/cuda-toolkit>`_ to accelerate
training on a GPU. PyGOD provides a ``gpu`` parameter for
``DeepDetector``. During initialization, we can set ``gpu`` to the index
of an available GPU. The default is ``gpu=-1``, which trains the
detector on the CPU. Here is an example of initializing ``DOMINANT`` on
the first GPU (index ``0``):

.. code:: python

   from pygod.detector import DOMINANT

   DOMINANT(gpu=0)

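As a minimal sketch of choosing the ``gpu`` value, the hypothetical helper
below (``pick_gpu`` is not part of PyGOD) returns the preferred index when
PyTorch reports a usable CUDA device with that index, and falls back to
``-1`` (CPU) otherwise, including when PyTorch is not installed:

.. code:: python

   def pick_gpu(preferred=0):
       """Return a value for the ``gpu`` parameter (hypothetical helper)."""
       try:
           import torch
       except ImportError:
           return -1  # no PyTorch available, so stay on the CPU
       # Use the preferred index only if CUDA works and the index exists.
       if torch.cuda.is_available() and 0 <= preferred < torch.cuda.device_count():
           return preferred
       return -1


   gpu = pick_gpu()

The result can then be passed on directly, for example ``DOMINANT(gpu=gpu)``.
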
However, training deep detectors on large-scale graphs can be
memory-intensive, especially for detectors that rely on adjacency
matrix reconstruction. In such cases, full-batch training may result in
an out-of-memory (OOM) error. To avoid this, we divide the large graph
into minibatches and train the detector on each batch. PyGOD provides a
``batch_size`` parameter for ``DeepDetector``, which lets users adjust
the size of each batch to fit the available GPU memory. We recommend
setting ``batch_size`` to the largest value that does not cause OOM. For
instance, to train ``DOMINANT`` with batches of 64 nodes:

.. code:: python

   DOMINANT(gpu=0, batch_size=64)

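To see why full-batch training can run out of memory, the
back-of-the-envelope sketch below estimates the size of a dense adjacency
matrix and the number of minibatches per epoch. It is an illustration
only: the helper names are hypothetical, and it assumes a dense ``N x N``
matrix with 4 bytes per entry, which may not match what a particular
detector actually allocates.

.. code:: python

   import math


   def dense_adj_bytes(num_nodes, bytes_per_entry=4):
       """Bytes needed for a dense num_nodes x num_nodes matrix (assumed float32)."""
       return num_nodes * num_nodes * bytes_per_entry


   def num_batches(num_nodes, batch_size):
       """Minibatches per epoch when every node is a center node exactly once."""
       return math.ceil(num_nodes / batch_size)


   n = 100_000
   print(dense_adj_bytes(n) / 1e9)  # 40.0 (GB)
   print(num_batches(n, 64))        # 1563

Under these assumptions, a graph with 100,000 nodes would need about 40 GB
for a single dense adjacency matrix, more than many GPUs offer, while
batches of 64 center nodes split an epoch into 1563 much smaller steps.
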
Unlike other data modalities, the output of each node in a graph relies
on its neighbors. PyGOD uses the PyG data loader
``torch_geometric.loader.NeighborLoader`` to load both the center
nodes and their neighbor nodes for each minibatch. However, computation
on the neighbor nodes adds significant overhead and reduces training
efficiency. Thus, neighbor sampling is crucial to reduce the overhead.
PyGOD provides a ``num_neigh`` parameter for ``DeepDetector``, which
specifies how many neighbors are sampled at each layer of the detector.
The default value of ``num_neigh`` is ``-1``, which samples all
neighbors of the center node. To sample 5 neighbors at each layer, we
can initialize ``DOMINANT`` like this:

.. code:: python

   DOMINANT(gpu=0, batch_size=64, num_neigh=5)

We can also sample a different number of neighbors at each layer by
setting ``num_neigh`` to a list, but the length of the list has to match
the number of layers ``num_layers``:

.. code:: python

   DOMINANT(gpu=0, batch_size=64, num_layers=2, num_neigh=[5, 3])

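The effect of ``num_neigh`` on the size of a minibatch can be sketched
with plain arithmetic. The hypothetical helpers below (not part of PyGOD
or PyG) expand an integer ``num_neigh`` into a per-layer list, check the
list length against ``num_layers``, and compute an upper bound on the
number of nodes one minibatch may load. The bound assumes that every node
has at least as many neighbors as requested and that no sampled node
repeats, so real batches are usually smaller.

.. code:: python

   def expand_num_neigh(num_neigh, num_layers):
       """Return one fan-out value per layer (hypothetical helper)."""
       if isinstance(num_neigh, int):
           return [num_neigh] * num_layers
       if len(num_neigh) != num_layers:
           raise ValueError("length of num_neigh must equal num_layers")
       return list(num_neigh)


   def max_nodes_per_batch(batch_size, num_neigh, num_layers):
       """Upper bound on center plus sampled neighbor nodes in one minibatch."""
       fanouts = expand_num_neigh(num_neigh, num_layers)
       if any(f < 0 for f in fanouts):
           raise ValueError("no fixed bound when all neighbors are sampled (-1)")
       total = frontier = batch_size
       for fanout in fanouts:
           frontier *= fanout  # nodes newly reached at this hop
           total += frontier
       return total


   print(max_nodes_per_batch(64, [5, 3], 2))  # 64 + 320 + 960 = 1344
   print(max_nodes_per_batch(64, 5, 2))       # 64 + 320 + 1600 = 1984

Shrinking the fan-out of the deeper layer, as in ``[5, 3]``, lowers the
bound because the number of sampled nodes grows multiplicatively with each
hop.
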
To learn more, read PyG's tutorial on
`Scaling GNNs via Neighbor Sampling <https://pytorch-geometric.readthedocs.io/en/latest/tutorial/neighbor_loader.html>`_.
