Deep neural networks based on self-attention are revolutionizing robotics with their ability to perform "open world" reasoning across multiple modalities, including text and images, and their ability ...