a question #2
Hey, thanks for the comments. I'm currently running experiments on the Travelling Salesman Problem, and along the way I've been improving the code quite a bit. As soon as I get some reasonable results that are comparable to those from the paper, I'll put together a more detailed write-up (probably a blog post!). This was quite difficult to implement from scratch lol. To do that, you'd just have to write a script that loads the model from a checkpoint (I have this code written in
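For reference, the generic PyTorch save/load pattern such a script relies on looks roughly like the sketch below; the stand-in module and file name are placeholders, not the repo's actual classes or paths:

```python
import torch
import torch.nn as nn

# Stand-in model: the real repo uses a pointer-network class, but the save/load pattern is the same.
model = nn.LSTM(input_size=2, hidden_size=128, batch_first=True)

# After training, save the weights once (the file name here is hypothetical):
torch.save(model.state_dict(), "model_checkpoint.pt")

# A separate inference script rebuilds the model with the same arguments and loads the weights back:
model = nn.LSTM(input_size=2, hidden_size=128, batch_first=True)
model.load_state_dict(torch.load("model_checkpoint.pt"))
model.eval()  # disable dropout etc. before decoding new inputs
```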
Thank you for your answer, and I'm looking forward 👀 to your blog post!
Hello, would you be interested in exchanging implementation ideas? If so, I can send you an email... Best regards,
I'm kindly asking that we keep the discussion here, because it will be beneficial for everyone involved.
@ricgama I implemented the LSTM cell in the decoder myself to add the "glimpse" and "pointer" attention components manually. I took inspiration from https://github.com/MaximumEntropy/Seq2Seq-PyTorch/blob/master/model.py
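In rough outline, a decoder step of that kind advances an `nn.LSTMCell` by hand and applies a glimpse and a pointer attention head over the encoder outputs. The sketch below is illustrative only, with assumed shapes and names rather than the repo's exact code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Attention(nn.Module):
    """Additive attention: u_j = v^T tanh(W_ref e_j + W_q q), as in the pointer-network papers."""
    def __init__(self, dim):
        super().__init__()
        self.w_ref = nn.Linear(dim, dim, bias=False)
        self.w_q = nn.Linear(dim, dim, bias=False)
        self.v = nn.Parameter(torch.randn(dim))

    def forward(self, query, refs):
        # query: (batch, dim), refs: (batch, seq_len, dim)
        u = torch.tanh(self.w_ref(refs) + self.w_q(query).unsqueeze(1))  # (batch, seq_len, dim)
        return u.matmul(self.v)                                          # (batch, seq_len) logits

class DecoderStep(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.cell = nn.LSTMCell(dim, dim)
        self.glimpse = Attention(dim)   # refines the query with a weighted sum of encoder outputs
        self.pointer = Attention(dim)   # produces the distribution over input cities

    def forward(self, x, hidden, enc_outputs):
        h, c = self.cell(x, hidden)
        g_logits = self.glimpse(h, enc_outputs)
        g = torch.bmm(F.softmax(g_logits, dim=1).unsqueeze(1), enc_outputs).squeeze(1)
        logits = self.pointer(g, enc_outputs)   # mask already-visited cities here before the softmax
        return logits, (h, c)
```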
ok, thanks for the reply. As I followed https://github.com/spro/practical-pytorch/tree/master/seq2seq-translation, I've used a PyTorch cell for the decoder as well. I think it's the same, just a different programming style.
I'm not 100% sure, but I don't think it is the same. The attending and pointing mechanisms, as indicated in the paper, take as input both the encoder LSTM's hidden state (context, in the NCO for RL paper,
My decoder class looks like this:
Maybe I'm wrong, but I think that it does what you have described. What do you think?
I see, you're calling
@ricgama How is it going for you? I have yet to get results on TSP20. The hyperparameters mentioned in the paper don't seem to be working for me :(
Hello @pemami4911,
@ricgama Is there a particular difficulty with using GitHub issues for you? I find communicating via Issues quite nice, actually. Yeah, for n=10, I tried halving the size of the network to 64 and batch sizes of 64, and for learning rates of ~3.5e-4 I was able to get a little bit of improvement. But for n=20, and for the most part for n=10, the tour length stays the same. I'm plotting the average negative log-likelihood of the selected tours, which helps to see whether it's learning something or not. On n=10, the NLL hangs at 15.10; on my best run, it dropped all the way down to ~5 for a little while, but the test tour length stayed around 5. I've been training the regular 128-dim network with lr 1e-3 on n=20 for about ~30 hours now (~850K minibatches of size 128), and the NLL has stayed flat and the tour length hasn't budged :/
@ricgama are you setting the logits of the already selected nodes during decoding to -infty to ensure only valid tours are used?
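Concretely, the kind of masking being asked about can be done by filling visited positions with `-inf` before the softmax. This is a generic illustration with assumed tensor names, not the repo's code:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)                    # pointer logits for a batch of 4 graphs with 10 cities
visited = torch.zeros(4, 10, dtype=torch.bool)
visited[:, 0] = True                           # suppose city 0 was already selected

masked_logits = logits.masked_fill(visited, float("-inf"))
probs = F.softmax(masked_logits, dim=1)        # visited cities get exactly zero probability
next_city = torch.multinomial(probs, 1)        # sampling can now only return unvisited cities
```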
@ricgama Also, did you do a distributed implementation? Or single-threaded on a GPU?
@pemami4911
Ok, let me know if you find any bugs. Maybe I have a similar one, then. I've tried to double- and triple-check how I implemented the equations, and it all seems right. For the RL version, I call 1 epoch a random generation of 10,000 minibatches of size 128 of TSP2D graphs; basically, I randomly generate training data on the fly. I randomly generate 1,000 test graphs at the beginning of a run and keep them the same thereafter; after each epoch, I shuffle this test set and evaluate the progress of my model. I hope this is correct, but I'm not 100% sure.
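In code, that data scheme might look something like this sketch (the sizes come from the description above; everything else is illustrative):

```python
import torch

def random_tsp_batch(batch_size=128, num_cities=20):
    # city coordinates drawn uniformly from the unit square: (batch, num_cities, 2)
    return torch.rand(batch_size, num_cities, 2)

# Fixed held-out set, generated once per run and reused after every epoch:
test_graphs = torch.rand(1000, 20, 2)

# One "epoch" in the sense used above: 10,000 freshly sampled minibatches.
for step in range(10_000):
    batch = random_tsp_batch()
    # ...actor/critic update on `batch` goes here...
```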
@unnir Yep! That's the general framework. The critic helps to reduce the variance when updating the actor (policy) parameters.

@ricgama I contacted the authors of Learning Combinatorial Optimization Algorithms Over Graphs, as they present some results comparing this to their proposed algorithm, and got the following reply about their attempt to replicate the results: "We also use single GPU card for PN-AC (as Hanjun mentioned we use K80 for experiments). It is indeed painful to make Bello et al.'s method work on TSP2D. We carefully read their paper many times, and used all the tricks they mentioned. We also found it not working at all, and tried to use exponential moving average as the critic, instead of a separate critic neural network. Finally it works (the loss goes down), but not as good as the performance reported in their paper."
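For what it's worth, the exponential-moving-average baseline mentioned in that quote is simple to drop in. Here is a hedged sketch; the decay value, function name, and global-variable bookkeeping are my own assumptions, not values from the paper or the repo:

```python
import torch

beta = 0.8            # EMA decay; illustrative, not the paper's setting
baseline = None

def reinforce_loss(log_probs, tour_lengths):
    # log_probs: (batch,) summed log-probabilities of each sampled tour
    # tour_lengths: (batch,) lengths of those tours (the quantity we want to minimize)
    global baseline
    batch_mean = tour_lengths.mean().detach()
    baseline = batch_mean if baseline is None else beta * baseline + (1 - beta) * batch_mean
    advantage = (tour_lengths - baseline).detach()
    return (advantage * log_probs).mean()   # minimizing this pushes down the expected tour length
```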
and from Irwan Bello via email - "You can get to 6.3 average TSP50 tour length pretty easily (a few hours on a CPU). To get to 5.95, you will need to train longer, e.g 2 days with 8 gpus." |
@pemami4911 well, we are not alone :) I think I can give it a try with an exponential moving average...
@pemami4911 One question: do you calculate the negative log-likelihood with respect to the ground truth or to the selected path?
There's no ground truth in the RL case! So I am computing it w.r.t. the selected paths.
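A small illustration of what that means in practice: sum the log-probabilities the decoder assigned to the cities it actually sampled, then negate and average over the batch. The function and argument names here are assumptions:

```python
import torch

def tour_nll(probs_per_step, actions_per_step):
    # probs_per_step[t]: (batch, seq_len) decoder distribution at step t
    # actions_per_step[t]: (batch,) long tensor of the city ids sampled at step t
    log_p = 0.0
    for probs, actions in zip(probs_per_step, actions_per_step):
        log_p = log_p + torch.log(probs.gather(1, actions.unsqueeze(1)).squeeze(1) + 1e-10)
    return (-log_p).mean()
```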
Hello,
Hello,
but now the loss blows up to
I had loss going to NaN when there was (what I believe to be) a race condition occurring when the decoder would select an invalid tour by sampling a city that it had just selected. You can check out my
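One generic guard for that failure mode (not necessarily the fix used in this repo) is to resample whenever the drawn city is already in the partial tour, falling back to a greedy choice over unvisited cities:

```python
import torch

def sample_next_city(probs, visited, max_tries=10):
    # probs: (batch, seq_len) decoder distribution; visited: (batch, seq_len) bool mask
    for _ in range(max_tries):
        idx = torch.multinomial(probs, 1).squeeze(1)               # (batch,)
        if not visited[torch.arange(probs.size(0)), idx].any():    # every sampled city is new
            return idx
    # fall back to the most likely unvisited city if sampling keeps failing
    return probs.masked_fill(visited, 0.0).argmax(dim=1)
```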
I implemented the exponential moving average critic but it didn’t seem to fix anything |
I'm almost sure that the problem is in the mask, but I still haven't figured out how to fix it. I guess that if the mask is done properly, it will work in supervised learning as well as in RL...
Yeah, if the mask is not implemented properly, it will affect both the SL and RL cases. That was one of the trickier parts to get right, I think.
Hi,
Hmm, I would just recommend using the reward function I have in
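For TSP, the quantity that reward is based on is just the length of the closed tour. A generic version (not copied from the repo) might look like this:

```python
import torch

def tour_length(coords, tour):
    # coords: (batch, num_cities, 2) city coordinates
    # tour: (batch, num_cities) long tensor, a permutation of city indices
    ordered = coords.gather(1, tour.unsqueeze(2).expand(-1, -1, 2))
    rolled = torch.roll(ordered, shifts=-1, dims=1)        # next city, wrapping back to the start
    return (ordered - rolled).norm(dim=2).sum(dim=1)       # (batch,) total length of each tour
```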
Thanks for pointing this out. I haven't tested on the sorting task in a while; I'll take a look soon.
Per tick means that "0" = 5,000, "1" = 10,000, ..., "9" = 50,000. The dashed line represents the results from Bello et al. EDIT: Also, I should point out that the Google Drive dataset is for the supervised learning scenario; for RL, you need to be generating your data on the fly (since you need a lot more graphs to learn with RL than with SL!).
@unnir Just had to flip the sign of the sorting reward! Check out the latest commit on the master branch. This is because for sorting you want to maximize the reward (in TSP, you minimize). I also updated it so that you generate a new training set every epoch, but sorting converges after only a few steps.
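To make the sign convention concrete, a sorting reward you maximize could look like the sketch below; this is a hypothetical version based on the fraction of correctly ordered adjacent pairs, and the repo's actual reward may differ:

```python
import torch

def sorting_reward(selected_values):
    # selected_values: (batch, seq_len) values in the order the pointer network emitted them
    correct_pairs = (selected_values[:, 1:] >= selected_values[:, :-1]).float()
    return correct_pairs.mean(dim=1)   # 1.0 means the sequence is fully sorted; higher is better
```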
Hello, your code has benefited me a lot. Do you have a program for the knapsack problem? I look forward to your help. My email is quhengedu@mail.hfut.edu.cn
Hi,
first of all, thank you for your repo!
I just want to ask: is there a source where I can learn more about the architecture of your work, besides the mentioned paper?
I mean, maybe you wrote a blog post about your algorithm or something like that.
Kind regards,
Paddy
Update:
Also, could you please tell me how I can reuse the model I've trained? I mean, I finished the training and want to apply it to sorting a list, e.g.:
And the output would be a sorted list: