Which is better: machine learning on human-generated data, or reinforcement learning from scratch?
Please Mario, help us...
PyTorch Conv3D neural network on 3K frames |
Proximal Policy Optimization on 5K frames |

What a fool, Mario!
Your CNN got destroyed and your RL can't finish the level. While you are trying to fix it, I will enjoy my time with Peach MWAHAHAHA!
My broooo, let me explain...
- Your CNN predicts an action from a sequence of frames, BUT it doesn't know whether that action helps you beat the game. Sadly, not all samples come from winning runs,
- You could remove the last frames before each failure, because you don't want the CNN to learn those,
- Play the game more, and don't forget to record the data to feed the CNN!
- I think Toad has more ideas about how to improve your skills here,
- While you are melting your brain cells, just run your PPO model more. I've got a good feeling. Take care, my Bro!
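
The "remove the last frames before you fail" tip could look roughly like the sketch below. It is only an illustration, not the code used for these runs: the function name `trim_failed_run`, the `won` flag, and the default of 30 dropped steps are all assumptions, and it presumes each recorded run is stored as two parallel lists (one frame and one action per step).

```python
from typing import List, Tuple


def trim_failed_run(
    frames: List,
    actions: List,
    won: bool,
    drop_last: int = 30,  # assumed value; tune it to how early the fatal mistake happens
) -> Tuple[List, List]:
    """Drop the final `drop_last` steps of a run that ended in failure.

    Winning runs are returned unchanged. Runs shorter than `drop_last`
    come back empty, so they contribute nothing to the training set.
    """
    if len(frames) != len(actions):
        raise ValueError("frames and actions must have the same length")
    if won or drop_last <= 0:
        return frames, actions
    keep = max(len(frames) - drop_last, 0)
    return frames[:keep], actions[:keep]
```

Calling it on every recorded run before building the frame sequences keeps the moves that led straight into a pit or a Goomba out of the CNN's training data. How many steps to drop is a guess; too few keeps the bad moves, too many throws away good play.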


Let's-a go! More PPO runs...
50K frames |
100K frames |
500K frames |
1M frames |
1.5M frames |
Nope, still no...

Distance travelled during Mario's adventures
