Alper Ahmetoglu

RT-1: Robotics Transformer for Real-World Control at Scale

#reading

TL;DR: Transformers are the hammer and real-world control is the nail?

Motivation

Training a general model (nowadays that means a transformer) with a very general objective (like next-word prediction for GPT-3) has led to models that can solve downstream tasks with almost no additional training. This is still missing in robotics (probably because of the limited data available), and this paper is a contribution towards that aim: it proposes the Robotics Transformer [1].

Comments

I liked the set of experiments in the paper. They test the method along different axes, which is useful for understanding its limitations: (1) generalization to new objects, new environment conditions, etc., (2) its capacity to absorb information from heterogeneous data, (3) its long-horizon performance, and (4) how performance changes with data size and diversity. A very cool set of experiments. It seems like this is a better architecture for robotics than BC-Z [2] and Gato [3].

Yet, I don't think scale is all we need, and I don't think the idea of training a large transformer on a large dataset will work in robotics (as it has partially worked for LLMs). LLMs still fail on many tasks, and you would not use one when you need reliable outcomes. It's okay to use one while coding (the worst outcome is a wrong code suggestion), but you would not ask it to tidy up the kitchen. And the data (and its variety) that GPT-3 was trained on is huge. Can we collect real-world interaction data for all kinds of problems? Just compare the size and variety of the data available in each case, and even with all that data, GPT-3 still fails to do things reliably. I get the idea of scaling, but in the long run, it is probably easier to solve the problem with something else.

References

  1. Brohan, A., Brown, N., Carbajal, J., Chebotar, Y., Dabis, J., Finn, C., ... & Zitkovich, B. (2022). RT-1: Robotics Transformer for Real-World Control at Scale. arXiv preprint arXiv:2212.06817.
  2. Jang, E., Irpan, A., Khansari, M., Kappler, D., Ebert, F., Lynch, C., ... & Finn, C. (2022, January). BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning. In Conference on Robot Learning (pp. 991-1002). PMLR.
  3. Reed, S., Zolna, K., Parisotto, E., Colmenarejo, S. G., Novikov, A., Barth-Maron, G., ... & de Freitas, N. (2022). A Generalist Agent. arXiv preprint arXiv:2205.06175.