Hello, great work!
In the paper and the code, you refer to an "annealing speed" lambda ranging over 10*(2^i) for i = 0, 1, ..., 8 (that is, 10, 20, 40, ..., 2560).
What does this refer to? Is it a learning rate annealer, meaning it controls how often you reduce the learning rate?
I tried to read the code but could not figure out what seek does (the lambda parameter is used to update this data seek thing).
Thank you very much!