MIT researchers have developed a new robot training technique that could cut the time and cost of training while improving robots' ability to adapt to unfamiliar tasks and environments. The approach, called Heterogeneous Pretrained Transformers (HPT), uses a transformer model to combine data from many different sources into a single representation, allowing robots to learn from a much broader range of experience.
Traditionally, robot training has relied on collecting data for each specific robot and task in controlled settings, a process that is costly, time-consuming and narrow in scope. Lead researcher Lirui Wang, a graduate student in electrical engineering and computer science at MIT, points to a challenge specific to robotics: the diversity of domains, data types and robot hardware. According to Wang, HPT addresses this by mapping disparate data sources into a "shared language" that generative AI models can interpret.
The HPT architecture is designed to unify different data types, such as camera images, language instructions and depth maps. Inspired by the transformer models behind large language models, HPT lets robots process both visual and proprioceptive inputs, proprioception being the robot's sense of the position and movement of its own body.
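The article does not describe HPT's internals beyond this. As a rough illustration of what mapping different inputs into a "shared language" can mean, the sketch below gives each robot its own small encoders that turn camera features and joint readings into the same number of fixed-width tokens, which a single shared transformer could then process. Everything here is an assumption for illustration, not the MIT team's code: the names `ModalityTokenizer` and `build_token_sequence` are hypothetical, the token sizes are arbitrary, and random linear projections stand in for learned encoders.

```python
import random
from typing import Dict, List

Token = List[float]

# Hypothetical sizes chosen for illustration; they are not HPT's settings.
SHARED_DIM = 8            # width of every token in the shared space
TOKENS_PER_MODALITY = 4   # each modality contributes the same number of tokens


def random_matrix(rows: int, cols: int, rng: random.Random) -> List[List[float]]:
    """Small random weights standing in for a learned projection."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix: List[List[float]], vector: List[float]) -> List[float]:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ModalityTokenizer:
    """Hypothetical robot- and modality-specific encoder.

    Maps a raw feature vector of a fixed length to TOKENS_PER_MODALITY
    tokens of width SHARED_DIM, so different sensors end up in one format.
    """

    def __init__(self, input_dim: int, rng: random.Random) -> None:
        self.input_dim = input_dim
        # One linear projection per output token.
        self.projections = [
            random_matrix(SHARED_DIM, input_dim, rng)
            for _ in range(TOKENS_PER_MODALITY)
        ]

    def __call__(self, features: List[float]) -> List[Token]:
        if len(features) != self.input_dim:
            raise ValueError(
                f"expected {self.input_dim} features, got {len(features)}"
            )
        return [matvec(p, features) for p in self.projections]


def build_token_sequence(
    tokenizers: Dict[str, ModalityTokenizer],
    observation: Dict[str, List[float]],
) -> List[Token]:
    """Concatenate every modality's tokens into one sequence.

    Modalities are visited in sorted order so the layout is deterministic.
    A shared transformer (not shown) would consume this sequence.
    """
    sequence: List[Token] = []
    for name in sorted(observation):
        sequence.extend(tokenizers[name](observation[name]))
    return sequence


rng = random.Random(0)

# Robot A: 32 camera features, 7 joints. Robot B: 48 camera features, 6 joints.
robot_a = {"vision": ModalityTokenizer(32, rng), "proprioception": ModalityTokenizer(7, rng)}
robot_b = {"vision": ModalityTokenizer(48, rng), "proprioception": ModalityTokenizer(6, rng)}

seq_a = build_token_sequence(robot_a, {"vision": [0.5] * 32, "proprioception": [0.1] * 7})
seq_b = build_token_sequence(robot_b, {"vision": [0.2] * 48, "proprioception": [0.3] * 6})

# Both robots now produce 8 tokens of width 8, despite different raw inputs.
print(len(seq_a), len(seq_b), len(seq_a[0]))  # prints: 8 8 8
```

In this sketch only the small per-robot encoders depend on the hardware; everything downstream sees sequences of the same shape. Giving each modality the same number of tokens is also one simple way to keep a handful of joint readings from being outnumbered by visual features.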
In initial tests, robots trained with HPT outperformed those trained with conventional approaches by more than 20 percent, in both simulated and real-world environments. The system's versatility was most evident when robots encountered tasks outside their training data, where they showed greater adaptability to novel challenges.
The research team built a large pretraining dataset drawn from 52 datasets, which together contain more than 200,000 robot trajectories across multiple categories. Combining human demonstrations, simulations and real-world scenarios gives robots a broader base of experience to learn from.
A key innovation in HPT is its handling of proprioception, an often-overlooked aspect of robotic design. By balancing the robot’s proprioceptive inputs with visual data, HPT enables more nuanced, dexterous movements. “This balance between proprioception and vision gives robots the agility they need to perform complex tasks,” said Wang.
Looking ahead, the team plans to extend HPT so that it can learn from unlabelled data, bringing it closer to the adaptability of large language models. Their ultimate vision is ambitious: a universal robot brain that could be downloaded and deployed on different robots without any task-specific training.
While the research is still in its early stages, the MIT team is optimistic. They believe scaling this approach could lead to transformative advancements in robotic capabilities, similar to the leaps seen in large language models.
Reference: https://www.artificialintelligence-news.com/news/mit-breakthrough-could-transform-robot-training/