Alper Ahmetoglu

Open-ended learning with RL?

This is harder than I thought. Anyway, can I build an open-ended algorithm on top of the current reinforcement learning stack, one that would serve as a baseline for other open-ended algorithms? By open-ended I mean (in my terms) that an agent keeps learning new representations for interacting with an environment, indefinitely. If I want to use RL, I would need to devise a reward function, most likely one that takes the agent's sensorimotor data as input. I most probably cannot say "Okay, this torch.randn(...) is your target; the closer you get, the more reward" because that random vector might be unattainable. Train a GAN on sensorimotor data and use its output as the reward? Maybe, but no. Trying to re-experience previously seen trajectories? That sounds good in words, but I think there would be some technicalities in making it work.
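To make the "unattainable random vector" worry concrete, here is a minimal sketch, not a worked-out method. It assumes (my assumption, purely for illustration) that every dimension of the sensorimotor state is confined to [-1, 1], uses the standard library's random.gauss as a stand-in for torch.randn, and all function names are hypothetical. It compares a random target against a target drawn from states the agent has already visited, which is the simplest reading of "re-experience previously seen trajectories".

```python
import math
import random


def distance_reward(state, target):
    """Negative Euclidean distance: the closer to the target, the more reward."""
    return -math.dist(state, target)


def random_target(dim, rng):
    """Stand-in for torch.randn(dim): an unbounded standard-normal target."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def replayed_target(visited_states, rng):
    """Alternative: reuse a state the agent has already visited."""
    return rng.choice(visited_states)


def best_attainable_reward(target, low=-1.0, high=1.0):
    """Best possible reward if each state dimension is confined to [low, high].

    The closest reachable state is the target clipped to the bounds
    (the bounds themselves are an illustrative assumption).
    """
    closest = [min(max(t, low), high) for t in target]
    return distance_reward(closest, target)


rng = random.Random(0)
dim = 8

# Pretend these are sensorimotor states collected from past trajectories.
visited = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(100)]

g_random = random_target(dim, rng)
g_replay = replayed_target(visited, rng)

# Strictly negative whenever any coordinate of the random target leaves the bounds.
print("random target:  ", best_attainable_reward(g_random))
# Equals zero: a previously visited state is reachable by construction.
print("replayed target:", best_attainable_reward(g_replay))
```

The replayed target is always attainable in this toy setting, but the sketch says nothing about the technicalities: actually reaching that state again, or deciding which past states are worth revisiting.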

POET [1] keeps the reward function fixed but varies the environment (for the same task). The open-ended learning work by DeepMind [2] builds a whole new parameterizable environment, and they also set the reward randomly (if I remember correctly). Still, the reward function is not set by the agent. In the real world you cannot change your environment arbitrarily, and you cannot set a reward function without understanding that world.

By the way, for the interested: