“Robots need to be able to deal with uncertainty if they’re going to be useful to us in the future. They need to be able to deal with unexpected situations and that’s sort of the goal of a general purpose or multi-purpose robot, and that’s just hard,” said Robert Playter, CEO of Boston Dynamics, in an interview with Lex Fridman last year.
Playter’s assessment captures just how hard robotics is. Boston Dynamics, which began developing general purpose robots in the early 2000s, introduced its humanoid Atlas only in 2013. Beyond the struggle to secure investment in robotics, training robots remains a persistent challenge.
Simulation for Robots
Simulated training is the most commonly adopted technique to equip general purpose robots for the real world. Virtual environments that mimic real-world conditions are created to develop, test and refine the algorithms that control the robots.
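For a sense of what this looks like in practice, below is a minimal sketch of the simulate-act-observe loop, assuming the open-source Gymnasium toolkit with its Pendulum-v1 balancing task standing in for a robot simulator. The random policy and episode count are purely illustrative, not what any company mentioned here actually uses.

```python
# Minimal sketch of a simulated training/evaluation loop.
# Assumes the open-source Gymnasium toolkit (pip install gymnasium);
# Pendulum-v1 is a stand-in for a robot balance task.
import gymnasium as gym

env = gym.make("Pendulum-v1")

for episode in range(3):
    obs, info = env.reset(seed=episode)
    total_reward = 0.0
    done = False
    while not done:
        # A real pipeline would query a learned policy here; random
        # torques are sampled just to show the simulate-act-observe loop.
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        done = terminated or truncated
    print(f"episode {episode}: return {total_reward:.1f}")

env.close()
```

In an actual pipeline, the rollouts collected this way feed a learning algorithm that updates the policy between episodes, and the refined policy is then transferred to physical hardware.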
“Simulation works very well for certain aspects. They work well in simulation for tasks like walking and doing backflips, where you need to balance your robot. And that is the only way,” said Mankaran Singh, founder of Flow Drive, which develops autonomous driving technology.
However, tasks that can be learned through imitation, such as folding shirts, do not require a simulated environment.
Simulation is Not the Only Way
CynLr Robotics, a Bengaluru-based deep-tech company that is building robotic arms, believes simulation is not the only way to train its robots. “There are so many layers of perception and fundamental intuition using perception that are still missing. These are capabilities that we should focus on to be able to make them more autonomous,” said Gokul NA, founder of CynLr.
Meanwhile, NVIDIA’s Isaac Sim, powered by Omniverse, is a robotics simulation platform that provides a virtual environment to design, test and train AI-based robots.
“We do leverage those [Omniverse] technologies as a tool, but you can’t say a tool is the solution,” said Gokul. The limitations become apparent when these robots are brought into the real world.
“When you bring from a simulated assumption to reality, it doesn’t work. It doesn’t work at all, because it has never learned that. It has learned something else independently. Your mistakes are what it has learned, what you have left out,” he said.
He attributes this gap to machines lacking the cognitive layers that aid in understanding objects and environments, which leads to discrepancies between what is seen and what is understood.
Imitation learning, where a user demonstrates a task for the robot to copy, is another common method for training robots. However, it too has limitations: a robot trained to pick up a white-coloured mug, for instance, may fail to pick up mugs of other colours.
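One common form of imitation learning is behaviour cloning, where a supervised model is fitted to state-action pairs recorded from demonstrations. The sketch below uses entirely synthetic data and a hypothetical colour feature to illustrate the distribution-shift problem behind the mug example; it is not how any of the systems mentioned here are actually trained.

```python
# Behaviour-cloning sketch with synthetic data, illustrating why a policy
# trained only on demonstrations of a white mug can fail on other colours.
# The features, grasp target and colour encoding are all hypothetical;
# real systems learn from camera images and robot states.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def make_demos(n, colour):
    """Fake demonstrations: [mug_x, mug_y, colour] -> grasp offset."""
    pos = rng.uniform(-1.0, 1.0, size=(n, 2))
    col = np.full((n, 1), colour)      # 1.0 = white, 0.0 = dark
    states = np.hstack([pos, col])
    actions = pos * 0.9                # the "correct" grasp tracks the mug
    return states, actions

# Every demonstration shows a white mug (colour feature fixed at 1.0).
X_train, y_train = make_demos(500, colour=1.0)
policy = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
policy.fit(X_train, y_train)

# Evaluation: white mugs are in-distribution, dark mugs are not.
X_white, y_white = make_demos(100, colour=1.0)
X_dark, y_dark = make_demos(100, colour=0.0)
print(f"grasp error on white mugs: {np.abs(policy.predict(X_white) - y_white).mean():.3f}")
print(f"grasp error on unseen dark mugs: {np.abs(policy.predict(X_dark) - y_dark).mean():.3f}")
```

Because the demonstrations never vary the colour input, the learned policy has no basis for handling mugs outside that narrow distribution, which is precisely the limitation the example above describes.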
Arm and Humanoid Robots
Similarly, the form factor of general purpose robots plays a major role in how they are trained. Robotic arms, for instance, demand sophisticated manipulation capabilities, something that most companies overlook.
Gokul believes that today’s robots, especially robotic arms, are more like ‘record and playback machines’: sophisticated at manipulation but lacking in perception. “Most cases where you want to commercially deploy these robots, you don’t need legs. Wheels are more than enough, but you need more capability with the hands,” said Gokul, referring to the humanoids currently under development.
With 2024 shaping up to be the year of robotics, many players such as Figure AI, Tesla, Unitree, and Apptronik are focusing on building humanoid robots, while Google DeepMind and other research institutes are training and developing arm-based robots to execute multiple functions.
AutoRT, SARA-RT, and RT-Trajectory are a few of the robotics research systems Google DeepMind has released. Stanford University introduced Mobile ALOHA, a system designed to replicate bimanual mobile manipulation tasks requiring whole-body control, with cooking as the main task demonstrated.
NVIDIA: The Robot-Enabler
In addition to Omniverse, GPU giant NVIDIA is aggressively investing in robotics and recently unveiled GR00T, a general-purpose foundation model for humanoid robots. Robots powered by GR00T are engineered to understand natural language and mimic human movements by observing human actions.
“Building foundation models for general humanoid robots is one of the most exciting problems to solve in AI today,” said NVIDIA chief Jensen Huang, at GTC 2024.
NVIDIA is even building a comprehensive AI platform for the leading humanoid robot companies, including OpenAI-backed 1X Technologies, Agility Robotics, Boston Dynamics, Figure AI, Unitree Robotics and many more.
NVIDIA is not alone; other players are also enabling the robot training ecosystem. OpenAI-backed Physical Intelligence, which recently raised $70 million in funding, is an emerging startup working to bring general-purpose AI into the physical world.