That's just not how neural networks learn. They generally can't "watch" others to learn.
In the example above, at no point would the neural network be watching the video footage. The reference footage is only used to (automatically) score how well the network does during training. It's not fed into the network itself in any shape or form.
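To make the distinction concrete, here is a minimal sketch of that kind of setup. Everything in it is made up for illustration: `REFERENCE` stands in for poses extracted from the footage, `policy` is a three-parameter stand-in for the network, and the training loop is plain random hill climbing rather than whatever a real system would use. The point is only where the reference data shows up.

```python
import math
import random

# Hypothetical stand-in for poses extracted from the reference footage:
# one target joint angle per frame of a 20-frame motion cycle.
REFERENCE = [math.sin(2 * math.pi * t / 20) for t in range(20)]


def policy(params, phase):
    """Tiny stand-in for the network. Its only input is the phase of the cycle."""
    a, b, c = params
    return a * math.sin(phase) + b * math.cos(phase) + c


def score(params):
    """Reward: negative squared error against the reference motion.

    This is the only place the reference data is read.
    """
    total = 0.0
    for t, target in enumerate(REFERENCE):
        phase = 2 * math.pi * t / len(REFERENCE)
        total -= (policy(params, phase) - target) ** 2
    return total


def train(steps=2000, seed=0):
    """Random hill climbing: keep a perturbation only if the score improves."""
    rng = random.Random(seed)
    params = [0.0, 0.0, 0.0]
    best = score(params)
    for _ in range(steps):
        candidate = [p + rng.gauss(0, 0.05) for p in params]
        candidate_score = score(candidate)
        if candidate_score > best:
            params, best = candidate, candidate_score
    return params, best
```

Note that `policy` has no argument through which the reference could reach it. The reference only shapes the number that `score` returns, and that number decides which parameters get kept.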