Discovering Predictive Relational Object Symbols with Symbolic Attentive Layers

Abstract: In this paper, we propose and implement a new deep learning architecture that discovers symbolic representations of objects and their relations from the self-supervised, continuous interaction of a manipulator robot with multiple objects in a tabletop environment. The key feature of the model is that it naturally handles a varying number of objects and explicitly maps object-object relations into the symbolic domain. The model employs a self-attention layer that computes discrete attention weights from object features; these weights are treated as relational symbols between objects. The relational symbols are then used to aggregate the learned object symbols and predict the effects of executed actions on each object. The result is a pipeline that forms object symbols and relational symbols from a dataset of object features, actions, and effects in an end-to-end manner. We compare the proposed architecture with state-of-the-art symbol discovery methods in a simulated tabletop environment in which the robot must discover symbols related to the relative positions of objects in order to predict the observed effects. Our experiments show that the proposed architecture outperforms the baselines in effect prediction while forming not only object symbols but also relational symbols. Furthermore, we analyze the learned symbols and the relational patterns between objects to understand how the model interprets the environment. Our analysis shows that the learned symbols relate to the relative positions of objects, object types, and their horizontal alignment on the table, reflecting the regularities in the environment.
Comparison of Models
The proposed model is shown in the top panel. The encoder and the self-attention module take object features as input and process them in parallel. The encoder outputs an object symbol z_i for the object o_i, and the self-attention module outputs the query vector q_i and the key vector k_i, which are used as in Equation 1 to calculate relational symbols. For comparison, we also provide high-level outlines of [1] and [2] in the bottom panel in (ii) and (iii), respectively.
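To make the caption concrete, here is a minimal sketch of how binary relational symbols could be derived from query and key vectors and then used to aggregate object symbols. Equation 1 is not reproduced on this page, so the scaled dot product followed by a hard threshold is an assumption rather than the paper's exact formulation, and the function names (`relational_symbols`, `aggregate`) are hypothetical. Training such a layer would also need a differentiable relaxation of the threshold, which is not shown.

```python
import math


def relational_symbols(queries, keys):
    """Return an n x n binary matrix r with r[i][j] = 1 if q_i . k_j / sqrt(d) > 0.

    ASSUMPTION: the hard threshold stands in for whatever discretization
    Equation 1 of the paper actually uses.
    """
    scale = math.sqrt(len(queries[0]))
    return [
        [1 if sum(a * b for a, b in zip(q, k)) / scale > 0 else 0 for k in keys]
        for q in queries
    ]


def aggregate(relations, object_symbols):
    """For each object i, sum the symbols z_j of all objects j with r[i][j] = 1."""
    dim = len(object_symbols[0])
    aggregated = []
    for row in relations:
        acc = [0] * dim
        for r_ij, z_j in zip(row, object_symbols):
            if r_ij:
                acc = [a + z for a, z in zip(acc, z_j)]
        aggregated.append(acc)
    return aggregated


# Toy example with three objects; all values are made up for illustration.
queries = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
keys = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]
object_symbols = [[1, 0], [0, 1], [1, 1]]  # binary z_i from a hypothetical encoder

relations = relational_symbols(queries, keys)
context = aggregate(relations, object_symbols)
```

Because both functions iterate over whatever list of objects they receive, the same code applies to scenes with any number of objects, which is the property the abstract emphasizes.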

  1. Ahmetoglu, A., Seker, M. Y., Piater, J., Oztop, E., & Ugur, E. (2022). DeepSym: Deep symbol generation and rule learning for planning from unsupervised robot interaction. Journal of Artificial Intelligence Research, 75, 709-745.
  2. Ahmetoglu, A., Oztop, E., & Ugur, E. (2022). Learning multi-object symbols for manipulation with attentive deep effect predictors. arXiv preprint arXiv:2208.01021.