Which is better: machine learning on human-generated data, or reinforcement learning from scratch?

Please Mario, help us...

PyTorch Conv3D neural network on 3K frames

Proximal Policy Optimization on 5K frames

What a fool, Mario!

Your CNN got destroyed and your RL can't finish the level. While you are trying to fix it, I will enjoy my time with Peach MWAHAHAHA!

My broooo, let me explain...

  • Your CNN predicts an action from a sequence of frames, BUT it doesn't know whether that action is good for beating the game. Sadly, not all samples come from winning runs.
  • You could remove the last frames before each failure, because you don't want the CNN to learn those moves.
  • Play the game more, and don't forget to record the data to feed the CNN!
  • I think Toad has more ideas about how to improve your skills here.
  • While you are melting your brain cells, just do more runs of your PPO model. I've got a good feeling. Take care, my Bro!
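
Here is a minimal sketch of the first two tips: drop the last frames of every failed run, then pair each window of consecutive frames with the action taken on its last frame. It assumes each recorded run is stored as a list of frames, a list of actions (one per frame), and a win flag. The names `Episode`, `trim_episode`, `make_samples`, `build_dataset`, and the default values for `window` and `n_drop` are hypothetical and not taken from the original project.

```python
from typing import Any, List, NamedTuple, Tuple


class Episode(NamedTuple):
    """One recorded human run (hypothetical storage layout)."""
    frames: List[Any]   # frame data, e.g. image arrays
    actions: List[int]  # action taken at each frame
    won: bool           # True if Mario finished the level


def trim_episode(ep: Episode, n_drop: int) -> Episode:
    """Drop the last n_drop frames of a failed run; keep winning runs whole."""
    if ep.won:
        return ep
    keep = max(len(ep.frames) - n_drop, 0)
    return Episode(ep.frames[:keep], ep.actions[:keep], ep.won)


def make_samples(ep: Episode, window: int) -> List[Tuple[List[Any], int]]:
    """Pair each window of consecutive frames with the action at its last frame."""
    samples = []
    for end in range(window, len(ep.frames) + 1):
        samples.append((ep.frames[end - window:end], ep.actions[end - 1]))
    return samples


def build_dataset(episodes: List[Episode], window: int = 4,
                  n_drop: int = 30) -> List[Tuple[List[Any], int]]:
    """Trim every run, then collect (frame window, action) training samples."""
    dataset = []
    for ep in episodes:
        dataset.extend(make_samples(trim_episode(ep, n_drop), window))
    return dataset
```

How many frames to drop is a judgment call: too few and the CNN still imitates the mistakes, too many and you throw away good play. A run shorter than `n_drop` simply contributes no samples.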

Let's-a go! More PPO runs...

50K frames

100K frames

500K frames

1M frames

1.5M frames

Nope, still no...

Distance travelled during Mario's adventures