I'm trying to fill in the pseudocode.py that DeepMind attached to its MuZero paper. For the network part I used a structure very similar to yours in this repository, but I can't get the training process to converge.
By the way, I tried it with Connect4 instead of tic-tac-toe.
@YuriCat, could you please take a look in case I made a mistake? https://github.com/Zeta36/muzero
Thank you!!