Akash Mittal1∗, Anuj Dhawan1∗, Sourav Medya2, Sayan Ranu1, Ambuj Singh2
1 Indian Institute of Technology Delhi, 2 University of California, Santa Barbara
1 {cs1150208, Anuj.Dhawan.cs115, sayanranu}@cse.iitd.ac.in, 2 {medya, ambuj}@cs.ucsb.edu
arXiv:1903.03332v1 [cs.LG] 8 Mar 2019
Baselines: We denote our method as GCOMB, the greedy algorithm as GR, and the state-of-the-art method from [Khalil et al., 2017] as SoA. For SoA, we use the code shared by the authors. Note that computing the optimal solution is not feasible, since the problems being learned, such as MCP and MVC, are NP-hard. GR guarantees a 1 − 1/e approximation for both MCP and MVC (optimization version). Finally, given a budget b, we select b nodes uniformly at random; this baseline is called Random.

Other Settings: The GCN is trained for 200 epochs with a learning rate of 0.0005, a dropout rate of 0.1, and a convolution depth (K) of 2. For training the n-step Q-learning neural network, n is set to 2 and a learning rate of 0.0001 is used. In each epoch of training, 8 training examples are sampled uniformly from the replay memory M, as described in Alg. 3. The Q-learning network is trained on 5-10 graph instances with b = 30.

4.2 Comparison with State-of-the-art
First, we benchmark GCOMB against the state-of-the-art learning method of [Khalil et al., 2017] (SoA) on the bipartite graph model proposed in the same paper. We measure the quality of the solution sets obtained by the two methods on both MCP and MVC. For a fair comparison, we keep the training time (1 hour), the training dataset, and the testing dataset the same for both methods. Furthermore, to measure robustness, all results are reported by averaging the quality over 5 training instances.

As Khalil et al. report results on small graphs, we first test both GCOMB and SoA on relatively small graphs. Both methods are trained on a graph with 1000 nodes and tested on the MCP problem (Section 2) with budget 15. As presented in Table 2, GCOMB outperforms SoA across all graph sizes. Next, we show the results for larger BA datasets on the MVC problem with budget b = 30. Both methods are trained on BA graphs with 1k nodes. Table 3 shows that our method consistently produces better results, with the quality being up to 41% better than SoA. Furthermore, SoA fails to scale on large graphs (100k nodes and beyond).

Table 3: MVC: Comparison of our method with the state-of-the-art method on large BA graphs. Gain is defined as the ratio between the coverage produced by GCOMB and that of SoA.

GCOMB is not only better in quality but also more robust. This property is captured in Table 4, which shows the variance in quality across the different training instances. The variance of SoA is up to 3 times higher than that of GCOMB. Overall, GCOMB is up to 41% better in quality, more scalable, and more robust when compared to SoA.

Next, we move to evaluating GCOMB on large graphs. We omit SoA from the next set of experiments since it crashes on these datasets due to scalability issues.

4.3 Comparison with Greedy
In this section, we benchmark GCOMB against GR.

Quality: Table 5 shows the results on the MCP problem over multiple synthetic and real datasets. GCOMB is trained on BA-1000. The "Ratio" column in Table 5 denotes the ratio between the quality produced by GCOMB and that of GR. While GCOMB consistently produces quality above 80% on synthetic graphs, it improves further to 91% on real datasets. For the MVC problem, the ratios are around 50% (Table 6).

Efficiency (MVC): The real benefit of using GCOMB for combinatorial optimization problems comes from the property that it is significantly faster than greedy. Specifically, once the parameters have been learned in the training phase and the node embeddings have been constructed on the unseen graph, for any given budget b, we only need to pass the top-10% of nodes through the Q-learning neural network. This is an extremely lightweight task compared to the greedy computation, and therefore significantly faster.

To bring out this aspect, we present the online running times of GCOMB and greedy in Fig. 2b as the budget is increased. As clearly visible, GCOMB is more than 7 times faster, with a lower growth rate than GR. The quality, although slightly lower than GR's, grows at par with it (Fig. 2a).

Impact of dimension (MCP): We run GCOMB for MCP while varying the node embedding dimension in the GCN. This experiment is performed on BP datasets of different sizes; the model is trained on a BP graph with 1000 nodes.
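The online step described under Efficiency (MVC) above is the source of GCOMB's speed: at inference time, only a small fraction of the nodes is scored by the Q-network. The following is a minimal sketch of that idea; the function names, the `top_frac` parameter, and the use of the embedding norm as the proxy ranking are illustrative assumptions, not the authors' code.

```python
import numpy as np

def select_solution(embeddings, q_score, budget, top_frac=0.1):
    """Pick `budget` nodes while scoring only a pruned candidate set.

    embeddings : (n, d) array of precomputed node embeddings
    q_score    : callable mapping an (m, d) array to m scalar scores
                 (stands in for the trained Q-learning network)
    """
    n = embeddings.shape[0]
    # Cheap proxy ranking prunes the candidate set before the Q-network
    # is consulted; here the embedding norm is an illustrative stand-in.
    proxy = np.linalg.norm(embeddings, axis=1)
    k = max(budget, int(top_frac * n))       # keep at least `budget` nodes
    candidates = np.argsort(proxy)[-k:]      # top-k nodes by proxy score
    # Only the surviving candidates pass through the (expensive) scorer.
    scores = q_score(embeddings[candidates])
    best = candidates[np.argsort(scores)[-budget:]]
    return sorted(best.tolist())
```

The point of the sketch is the cost profile: the expensive scorer touches only max(b, n/10) nodes per query, independent of how the proxy ranking is produced.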
Graph     GR      Random   GCOMB   Ratio
BA-100k   14490   270      7975    .55
BA-500k   30723   268      14911   .49
LB        14787   238      7468    .50
LG        76793   416      29531   .38

Table 6: MVC: Comparison of our method with baselines on large graphs. Ratio denotes the ratio between the quality produced by GCOMB and GR.

Figure 3: Quality of the solution sets produced by GCOMB and GR on temporal networks. (a) MVC, (b) MCP. [Plots of coverage against snapshot Gi; panels omitted.]
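The ratios reported above (Tables 5 and 6) are taken against the greedy baseline GR, which repeatedly picks the node covering the most yet-uncovered items and carries the 1 − 1/e guarantee noted under Baselines. A minimal sketch of that baseline for maximum coverage follows; the data layout and function name are illustrative, not the authors' implementation.

```python
def greedy_max_coverage(cover_sets, budget):
    """GR baseline: repeatedly pick the node with the largest marginal gain.

    cover_sets : dict mapping node id -> set of items that node covers
    budget     : number of nodes to select
    Returns (selected node ids, total number of items covered).
    """
    covered, selected = set(), []
    for _ in range(budget):
        # Node whose set adds the most items not yet covered.
        node = max(cover_sets, key=lambda v: len(cover_sets[v] - covered))
        if not cover_sets[node] - covered:
            break  # no node adds anything; stop early
        selected.append(node)
        covered |= cover_sets[node]
    return selected, len(covered)
```

A quality ratio like the "Ratio" column would then be the coverage achieved by the learned method divided by the coverage returned by this routine for the same budget.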
Figure 2: MVC: (a) Quality and (b) running time of GCOMB and GR on BA graphs with 1 million nodes against budget. [Plots of coverage and time (sec.) against budget b for GCOMB, GR, and Random; panels omitted.]

Table 7 presents the results of this experiment: the quality of GCOMB is quite stable and does not vary much across dimensions.

Graph    d = 64   d = 128   d = 256
BP-4k    2678     2668      2589
BP-5k    3344     3296      3250
BP-10k   6603     6565      6443

Table 7: MCP: Performance of GCOMB with varying embedding dimensions.

4.4 Application: Temporal Networks
The primary motivation behind GCOMB is the observation that many real-world networks change with time although the underlying model remains the same. Temporal networks are one such example. In this section, we show that GCOMB is ideally suited to this scenario. Specifically, we establish that if GCOMB is trained on today's network snapshot, it can accurately predict solutions to combinatorial optimization problems on future network snapshots.

To demonstrate GCOMB's predictive power on temporal networks, we consider a question-answer platform, namely the sx-mathoverflow (SM) dataset (Table 1). The nodes correspond to users, and an edge at time t depicts an interaction between two users (e.g., answering or commenting on a question) at time t. The data consists of interactions over a span of 2350 days. To generate train and test graphs, we split the span of 2350 days into 55 equal spans (around 42 days each) and consider all interactions (edges) inside a span for the corresponding graph. We denote these as G = {Gi | i = 1, 2, ..., 55}. We use the number of interactions between a pair of users as the weight of the edge between them. The edge weight is used as a feature (activity of a user) in the GCN. For training, we pick the graphs G0, ..., G4 and G21, ..., G26 for the MCP and MVC problems, respectively, and then test on all of the remaining graphs.

Figure 3 presents the performance of GCOMB. As visible, the quality of GCOMB's solution sets is almost identical to that of GR across all test graphs.

5 Previous Work
Many combinatorial graph problems lead to NP-hardness [Karp, 1972]. There are classical NP-hard problems on graphs, such as Minimum Vertex Cover, Minimum Set Cover, and the Travelling Salesman Problem (TSP), as well as other important combinatorial problems with many applications. Examples include finding top influential nodes [Kempe et al., 2003], maximizing the centrality of nodes [Yoshida, 2014], and optimizing networks [Medya et al., 2018; Dilkina et al., 2011].

There has been recent interest in solving graph combinatorial problems with neural networks and reinforcement learning [Bello et al., 2016; Khalil et al., 2017]. Learning-based approaches are useful in producing good empirical results for NP-hard problems. The methods proposed in [Bello et al., 2016] are generic, do not exploit structural properties of graphs, and use sample-inefficient policy gradient methods. Khalil et al. [Khalil et al., 2017] investigated the same problem with a network embedding approach combined with a reinforcement learning technique. The same problem has been studied by Li et al. [Li et al., 2018] via a supervised approach using GCNs. Among other interesting work, He et al. studied the problem of learning a node selection policy for branch-and-bound algorithms [He et al., 2014]. Other examples include the pointer networks proposed by Vinyals et al. [Vinyals et al., 2015] and the reinforcement learning approaches of Silver et al. [Silver et al., 2016] to learn strategies for the game of Go.

6 Conclusion
In this paper, we have proposed a deep reinforcement learning based framework called GCOMB to learn algorithms for combinatorial problems on graphs. GCOMB first generates node embeddings through Graph Convolutional Networks (GCNs). These embeddings encode the effect of a node on the budget-constrained solution set. Next, these embeddings are fed to a neural network to learn a Q-function and predict the solution set. Through extensive experiments on both real and synthetic datasets containing up to a million nodes, we show that GCOMB is up to 41% better in quality than the state-of-the-art neural method, as well as more robust and more scalable. In addition, GCOMB is up to 7 times faster than the greedy approach while being of comparable quality. Overall, with these qualities, GCOMB can be operationalized on real dynamic networks to solve practical problems at scale.

References
[Apollonio and Simeone, 2014] Nicola Apollonio and Bruno Simeone. The maximum vertex coverage problem on bipartite graphs. Discrete Applied Mathematics, 165:37–48, 2014.
[Arora et al., 2017] Akhil Arora, Sainyam Galhotra, and Sayan Ranu. Debunking the myths of influence maximization: An in-depth benchmarking study. In Proceedings of the 2017 ACM International Conference on Management of Data, pages 651–666. ACM, 2017.
[Bello et al., 2016] Irwan Bello, Hieu Pham, Quoc V Le, Mohammad Norouzi, and Samy Bengio. Neural combinatorial optimization with reinforcement learning. arXiv preprint arXiv:1611.09940, 2016.
[Dilkina et al., 2011] Bistra Dilkina, Katherine J. Lai, and Carla P. Gomes. Upgrading shortest paths in networks. In Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, pages 76–91. Springer, 2011.
[Goyal et al., 2011] Amit Goyal, Wei Lu, and Laks VS Lakshmanan. Simpath: An efficient algorithm for influence maximization under the linear threshold model. In 2011 IEEE 11th International Conference on Data Mining, pages 211–220. IEEE, 2011.
[Guha et al., 2002] Sudipto Guha, Refael Hassin, Samir Khuller, and Einat Or. Capacitated vertex covering with applications. In Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 858–865. Society for Industrial and Applied Mathematics, 2002.
[Hamilton et al., 2017] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pages 1024–1034, 2017.
[He et al., 2014] He He, Hal Daume III, and Jason M Eisner. Learning to search in branch and bound algorithms. In Advances in Neural Information Processing Systems, pages 3293–3301, 2014.
[Jung et al., 2012] Kyomin Jung, Wooram Heo, and Wei Chen. IRIE: Scalable and robust influence maximization in social networks. In 2012 IEEE 12th International Conference on Data Mining, pages 918–923. IEEE, 2012.
[Karp, 1972] Richard M Karp. Reducibility among combinatorial problems. In Complexity of Computer Computations, pages 85–103. Springer, 1972.
[Kempe et al., 2003] David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. In KDD, 2003.
[Khalil et al., 2017] Elias Khalil, Hanjun Dai, Yuyu Zhang, Bistra Dilkina, and Le Song. Learning combinatorial optimization algorithms over graphs. In Advances in Neural Information Processing Systems, pages 6348–6358, 2017.
[Li et al., 2018] Zhuwen Li, Qifeng Chen, and Vladlen Koltun. Combinatorial optimization with graph convolutional networks and guided tree search. In Advances in Neural Information Processing Systems, pages 537–546, 2018.
[Medya et al., 2018] Sourav Medya, Jithin Vachery, Sayan Ranu, and Ambuj Singh. Noticeable network delay minimization via node upgrades. Proceedings of the VLDB Endowment, 11(9):988–1001, 2018.
[Mnih et al., 2013] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
[Riedmiller, 2005] Martin Riedmiller. Neural fitted Q iteration: First experiences with a data efficient neural reinforcement learning method. In European Conference on Machine Learning, pages 317–328. Springer, 2005.
[Silver et al., 2016] David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
[Sutton and Barto, 2018] Richard S Sutton and Andrew G Barto. Reinforcement Learning: An Introduction. MIT Press, 2018.
[Vinyals et al., 2015] Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. Pointer networks. In Advances in Neural Information Processing Systems, pages 2692–2700, 2015.
[Wilder et al., 2018] Bryan Wilder, Han Ching Ou, Kayla de la Haye, and Milind Tambe. Optimizing network structure for preventative health. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, pages 841–849, 2018.
[Yoshida, 2014] Yuichi Yoshida. Almost linear-time algorithms for adaptive betweenness centrality using hypergraph sketches. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1416–1425. ACM, 2014.