This post covers a new approach that uses generative adversarial networks for semi-supervised learning on graphs. First come my reading notes, followed by some simple experiments built on them.
Semi-supervised Learning on Graphs with Generative Adversarial Nets https://github.com/THUDM/GraphSGAN
Using Generative Adversarial Nets (GANs) to help semi-supervised learning on graphs. The curse of dimensionality may make propagation across different clusters easier, so the authors want to generate more fake nodes between the clusters.
Idea: use a GAN to generate fake nodes in the low-density gap areas between clusters.
The adversarial objective: the D function is trained to score real samples high and generated (fake) samples low, while the G function is trained against the second term so that its fakes fool D.
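The formula in question is presumably the standard GAN minimax objective:

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

D maximizes both expectations, while G minimizes the second one, i.e. it tries to push $D(G(z))$ toward 1.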
The aim is to weaken the effect of propagation across density gaps by adding fake nodes inside them.
Problem: a plain GAN cannot use the graph's topology, so the G function does not know where the low-density areas are.
Solution: first use embedding techniques such as DeepWalk, LINE, or NetMF to learn latent distributed representations. Once the graph topology is captured this way, that information can be used to train the GAN.
Training tricks:
- Batch Normalization (BN) for gradient stability
- Weight Normalization for a trainable weight scale
- Additive Gaussian noise in the D function (the classifier in this paper) during training
- Neighbour fusion
Classifier (D function): a softmax over the real classes plus one additional fake class.
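To make the pieces concrete, here is a minimal PyTorch sketch of a generator/classifier pair using the tricks above; the layer sizes, `z_dim`, and `noise_std` are my own illustrative choices, not the paper's exact architecture:

```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

class Generator(nn.Module):
    def __init__(self, z_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256),
            nn.BatchNorm1d(256),  # BN for gradient stability
            nn.ReLU(),
            nn.Linear(256, out_dim),
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    def __init__(self, in_dim, num_classes, noise_std=0.05):
        super().__init__()
        self.noise_std = noise_std
        self.body = nn.Sequential(
            weight_norm(nn.Linear(in_dim, 256)),  # trainable weight scale
            nn.ReLU(),
            weight_norm(nn.Linear(256, 256)),
            nn.ReLU(),
        )
        self.head = nn.Linear(256, num_classes + 1)  # +1: the extra fake class

    def forward(self, x):
        if self.training:
            # additive Gaussian noise on the classifier input during training
            x = x + torch.randn_like(x) * self.noise_std
        return self.head(self.body(x))
```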
Loss function:
- D function: $L_D = loss_{sup} + \lambda_0 loss_{un} + \lambda_1 loss_{ent} + loss_{pt}$
  - $loss_{sup}$, for labeled nodes: minimize the cross entropy against the true labels
  - $loss_{un}$, for separating real from fake: real nodes should not be mapped into the low-density (fake) area while generated nodes should be, i.e. minimize $p(M|x)$ and maximize $p(M|g(z))$, where $M$ denotes the fake class
  - $loss_{ent}$, for unlabeled nodes: reduce the entropy over the real classes so that the predicted labels become definite
  - $loss_{pt}$, for the low-density area: widen the gap by maximizing the pairwise cosine distance (the pull-away term)
- G function: $L_G = loss_{fm} + \lambda_2 loss_{pt}$
  - $loss_{fm}$: generated nodes should lie in the low-density area: minimize the distance to the central point (feature matching)
  - $loss_{pt}$: generated samples should not collapse onto the single central point: the same pull-away term as in the D loss above
- The hyper-parameters $\lambda_0$, $\lambda_1$, $\lambda_2$ control the relative strength of the different terms.
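As a hypothetical sketch of how these terms could be computed (assuming the fake class $M$ is the last logit and `feats_fake` are intermediate discriminator features of generated samples; the exact formulations in the paper and repo may differ):

```python
import torch
import torch.nn.functional as F

def pull_away_term(feats):
    # loss_pt: mean squared pairwise cosine similarity; minimizing it
    # maximizes the pairwise cosine distance, spreading samples apart.
    f = F.normalize(feats, dim=1)
    sim = f @ f.t()
    n = f.size(0)
    return (sim.pow(2).sum() - n) / (n * (n - 1))

def d_loss(logits_lab, y_lab, logits_unl, logits_fake, feats_fake,
           lam0=1.0, lam1=1.0, eps=1e-8):
    # loss_sup: cross entropy on the labeled nodes.
    loss_sup = F.cross_entropy(logits_lab, y_lab)
    # p(M|.): probability of the fake class, taken here to be the last logit.
    p_fake_unl = F.softmax(logits_unl, dim=1)[:, -1]
    p_fake_gen = F.softmax(logits_fake, dim=1)[:, -1]
    # loss_un: real nodes avoid the fake class, generated nodes take it.
    loss_un = -(torch.log(1 - p_fake_unl + eps).mean()
                + torch.log(p_fake_gen + eps).mean())
    # loss_ent: low entropy over the real classes -> definite predictions.
    p_real = F.softmax(logits_unl[:, :-1], dim=1)
    loss_ent = -(p_real * torch.log(p_real + eps)).sum(dim=1).mean()
    return loss_sup + lam0 * loss_un + lam1 * loss_ent + pull_away_term(feats_fake)

def g_loss(feats_fake, central_point, lam2=1.0):
    # loss_fm: pull the mean generated feature toward the central point.
    loss_fm = (feats_fake.mean(dim=0) - central_point).pow(2).sum()
    return loss_fm + lam2 * pull_away_term(feats_fake)
```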
For the random walks that feed the word2vec model, I set the number of walk rounds to 10, because each node has on average 2 edges, and the walk length to 400, because each class contains roughly 400 nodes:
```python
import random
from gensim.models import Word2Vec

def walker(G, walk_length, start_node):
    # One truncated random walk starting from start_node;
    # node ids are stored as strings so they can act as word2vec tokens.
    walk = [str(start_node)]
    while len(walk) < walk_length:
        cur = int(walk[-1])
        cur_nbrs = list(G.neighbors(cur))
        if len(cur_nbrs) > 0:
            walk.append(str(random.choice(cur_nbrs)))
        else:
            break
    return walk

# From https://github.com/shenweichen/GraphEmbedding/blob/master/ge/walker.py
def _simulate_walks(G, nodes, num_walks, walk_length):
    # num_walks rounds; in each round every node starts one walk.
    walks = []
    for _ in range(num_walks):
        random.shuffle(nodes)
        for v in nodes:
            walks.append(walker(G, walk_length=walk_length, start_node=v))
    return walks

def walk2vec(G, num_walks, walk_length):
    # DeepWalk: treat walks as sentences and train skip-gram word2vec
    # with hierarchical softmax (sg=1, hs=1).
    walks = _simulate_walks(G, list(G.nodes()), num_walks, walk_length)
    return Word2Vec(walks, sg=1, hs=1)
```
The dataset (Cora) has 2708 nodes and 5429 edges with 7 classes.
Neighbour Fusion
```python
import numpy as np

def neighbour_fusion(node, nblist, fmatrix, alpha):
    # Mix a node's own feature vector with the mean of its neighbours':
    # alpha * own features + (1 - alpha) * neighbour mean.
    total = np.zeros(len(fmatrix[0]))
    for idx in nblist:
        total = total + fmatrix[idx]
    return total * (1 - alpha) / len(nblist) + alpha * fmatrix[node]
```
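In equation form, the fusion above computes

$$f'(v) = \alpha\, f(v) + \frac{1-\alpha}{|N(v)|} \sum_{u \in N(v)} f(u),$$

where $N(v)$ is the neighbour list passed in as `nblist`.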
A quick test:

```python
import networkx as nx

G = nx.Graph(adj)  # adj: the dataset's adjacency matrix, loaded earlier
Ln = list(G.neighbors(1))
f = neighbour_fusion(1, Ln, features, 0.9)  # features: raw node feature matrix
a = list(G.nodes())
c = walk2vec(G, 10, 400)
f = np.append(f, c.wv['1'])  # concatenate fused features with the node's embedding
```
- walk2vec(G, 10, 400): acc = 67%. This is far from the 83% reported in the paper. My first guess was that the network structure information was over-sampled, so I tuned the parameters of my DeepWalk implementation:
  - walk2vec(G, 10, 100): acc = 68.5%
  - walk2vec(G, 10, 10): acc = 71.2%
  - walk2vec(G, 10, 5): acc = 64.5%
  - walk2vec(G, 10, 7): acc = 65%
  - walk2vec(G, 10, 15): acc = 65.4%
- This did not seem to help; I still suspect the network representation is insufficient.
Next, tuning the word2vec embedding dimension (size):
- size = 1000: acc = 70%
- size = 1500: acc = 70%
- size = 2000: acc = 73%
- size = 2500: acc = 72%
- size = 3000: acc = 71%
- size = 5000: acc = 67.5%
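For reference, this dimension is set through word2vec's size parameter, which gensim 4.0 renamed to vector_size:

```python
# gensim 3.x uses `size`; gensim >= 4.0 uses `vector_size` instead.
model = Word2Vec(walks, sg=1, hs=1, size=2000)
```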
The best result still does not come close to the paper's 83%. I noticed that the generator loss is hard to bring down while the classifier loss drops quickly, so I turned to tuning the algorithm's hyper-parameters.
- The DeepWalk above is a simple self-implemented version built on word2vec. To compare different network embeddings, the next step is to use the OpenNE toolkit to systematically generate different embeddings and try them.