I have started this thing where I will summarize the papers that drew my attention.
TL; DR: Generate multiple environment-agent pairs and simultaneously train agents while mutating environments. Also, transfer agents between environments if they perform better. The result is good solutions to extremely challenging environments that cannot be found with direct optimization for the task. This is an open-ended process because the problem is ever-evolving, can be run indefinitely.
- POET  video: https://www.youtube.com/watch?v=D1WWhQY9N4g
- Enhanced  POET video: https://www.youtube.com/watch?v=QlPnjpYghAk
Problem and Motivation
The main motivation of this work is the following hypothesis: advancements in history were not achieved in a monotonously increasing fashion. Some ideas were discovered while aiming for a different, possibly unrelated objective. The paper gives such an example, "Microwaves were invented not by food-heating researchers but by those studying radar", emphasizing that there might be some stepping stones that are not necessarily good for the specific problem at hand but might be promising for another.
The current machine learning paradigm first selects or defines a task with its objectives, such as classifying images in ImageNet into 1000 different classes, then designs the solution for the specific problem. Another example might be the game of chess where the challenge is to beat the opponent. However, when we define the problem, we only optimize for the specific problem, disallowing other interesting solutions. We can think of this as a closed-ended process where we solve the problem only once. Although transfer learning or meta-learning approaches allow transferring solutions to other domains, these methods themselves are closed-ended since we select from which problem to which problem we should transfer and so on. Authors advocate for open-ended methods in which the algorithm generates its own problem and solution (and they make an analogy to GANs ). Such a process is desirable since we can turn it on and run it indefinitely to find solutions to diverse problems that we do not anticipate beforehand.
The outline of POET.
This paper proposes an algorithm, Paired Open-Ended Trailblazer (POET), which generates environment-agent pairs (problem-solution pairs), and trains agents on these environments (solves the problem) while simultaneously generating more complex environments through mutation. Furthermore, it allows for transfer between the solutions and the problems thus allowing to find solutions that are like stepping stones, not good for the paired environment but better for an- other one. The general outline is shown in the above figure. The algorithm is tested on a 2D bipedal walking environment. The results show that
- POET can find solutions to very challenging problems (which it automatically generates) that cannot be solved with direct optimization.
- An alternative direct-path curriculum building method (e.g. start from easy ones, then increase difficulty) cannot solve the very challenging and the extremely challenging settings suggesting the cross-transfer of POET is essential.
Contributions in Enhanced-POET
Enhanced-POET makes some incremental changes on the first paper (POET):
- Environment encoding (EE) defines an environment, for example, 3 gaps and 5 obstacles. This was hand-designed in POET. In this paper, it is encoded with CPPN  that allows creating more and more complex environments at a very fine granularity. This is also a more generic design that is domain-independent.
- Environment characterization (EC) is a way to assess the diversity of the environment. In the first paper, this is measured with distances to k-nearest neighbors in the environment encoding (EE) space. Because EE space is hand-designed, EC is also hand-designed since it depends on EE. In this work, the authors propose Performance of All Transferred Agents EC (PATA-EC), which is independent of EE and the domain. In short, all agents are evaluated in an environment and sorted by their performance. Then, this vector of performances is used as EC. Because this is independent of EE, more diverse environments can be generated.
- Accumulated number of novel environments created and solved, ANNECS, is proposed to assess the quality of the open-ended algorithms.
- Minor modifications to cross-environment agent transfer to increase computational efficiency.
- Wang, R., Lehman, J., Clune, J., & Stanley, K. O. (2019). Paired open-ended trailblazer (poet): Endlessly generating increasingly complex and diverse learning environments and their solutions. arXiv preprint arXiv:1901.01753.
- Wang, R., Lehman, J., Rawal, A., Zhi, J., Li, Y., Clune, J., & Stanley, K. (2020, November). Enhanced POET: Open-ended reinforcement learning through unbounded invention of learning challenges and their solutions. In International Conference on Machine Learning (pp. 9940-9951). PMLR.
- Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2014). Generative adversarial networks. arXiv preprint arXiv:1406.2661.
- Stanley, K. O. (2007). Compositional pattern producing networks: A novel abstraction of development. Genetic programming and evolvable machines, 8(2), 131-162.