Google’s DeepMind team is at the forefront of advancements in robotics, addressing the complex challenges that arise when teaching robots to perform seemingly simple tasks. While humans effortlessly handle tasks with countless variations, robots lack the inherent understanding and adaptability such actions require.
To overcome this limitation, the robotics industry has traditionally focused on repeatable tasks in structured environments, where robots can excel. However, recent breakthroughs in robotic learning are revolutionizing the industry and paving the way for more versatile and adaptable systems.
Last year, Google DeepMind’s robotics team introduced Robotics Transformer, also known as RT-1, a groundbreaking system that trained robots from Everyday Robots on tasks like picking, placing, and opening drawers. The team leveraged a database of 130,000 demonstrations to achieve an impressive 97% success rate across more than 700 tasks.
Now, DeepMind is unveiling the next evolution in robotic learning: RT-2. Vincent Vanhoucke, DeepMind’s Distinguished Scientist and Head of Robotics, explains that the new system enables robots to transfer concepts learned from relatively small datasets to different scenarios. RT-2 exhibits enhanced generalization, going beyond the specific robotic data it was trained on. It can interpret new commands and respond to user instructions with rudimentary reasoning, such as identifying object categories or understanding high-level descriptions. This allows RT-2 to determine the most suitable tools for novel tasks by drawing on existing contextual information.
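DeepMind has described RT-2 as a vision-language-action model that represents robot actions as text tokens, so the same model that answers questions about an image can also emit motor commands. The sketch below is purely illustrative, not DeepMind's code: it assumes a hypothetical scheme in which each action dimension is discretized into 256 integer bins spanning [-1.0, 1.0], and shows how such token output could be decoded back into continuous values.

```python
def decode_action_tokens(token_string):
    """Decode a space-separated string of action tokens into continuous values.

    Hypothetical scheme for illustration: each token is an integer bin in
    [0, 255], and bin b maps linearly onto the range [-1.0, 1.0]. A real
    vision-language-action model would emit such tokens alongside ordinary
    text tokens from the same vocabulary.
    """
    bins = [int(t) for t in token_string.split()]
    if any(b < 0 or b > 255 for b in bins):
        raise ValueError("action token out of range")
    return [b / 255.0 * 2.0 - 1.0 for b in bins]

# A made-up model output for three action dimensions
# (e.g. gripper dx, dy, open/close in this toy setup):
action = decode_action_tokens("128 255 0")
```

In this toy encoding, `"255"` decodes to the maximum value 1.0 and `"0"` to the minimum -1.0; the appeal of the approach is that action prediction becomes ordinary next-token prediction, so web-scale language pretraining transfers directly.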
One illustrative scenario is when a robot is asked to throw away trash. Traditionally, the user would need to teach the robot to identify what qualifies as trash and then train it to perform the disposal task, a laborious and challenging process for systems expected to handle a variety of tasks.
RT-2 overcomes this limitation by leveraging knowledge from a vast corpus of web data. It already has an understanding of what trash is and can identify it without explicit training. Moreover, the system comprehends how to execute the task of throwing away the trash, despite never being explicitly trained for that action. The abstract concept of trash, whether a bag of chips or a banana peel, is something RT-2 grasps through its vision-language training data.
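The zero-shot recognition described above can be illustrated with a toy version of a common vision-language technique: embedding an image and a set of candidate labels in a shared vector space, then picking the label whose embedding is most similar. The embeddings below are made-up stand-ins for a real encoder's outputs; only the mechanism is the point.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy label embeddings standing in for a vision-language encoder's
# text embeddings; a real system would compute these from the labels.
label_embeddings = {
    "trash": [0.9, 0.1, 0.0],
    "food":  [0.1, 0.9, 0.0],
}

def classify(image_embedding):
    """Return the label whose embedding is closest to the image embedding."""
    return max(label_embeddings,
               key=lambda label: cosine(image_embedding, label_embeddings[label]))

# Made-up image embedding for a banana peel; in this toy space it sits
# closer to "trash" than to "food", so no trash-specific training is needed.
banana_peel = [0.8, 0.2, 0.1]
print(classify(banana_peel))  # -> trash
```

Because the labels are compared in embedding space rather than learned as fixed output classes, new categories can be recognized simply by naming them, which is the property that lets a web-trained model identify trash it was never explicitly taught.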
RT-2’s success rate on novel tasks has significantly improved, from 32% with RT-1 to 62%, marking an exciting leap in robotic learning capabilities. This progress signals a bright future for the robotics industry as it moves toward more adaptable and intelligent robots capable of handling a wide array of tasks with ease and precision.