Researchers at the U.S. Army Research Laboratory and the Robotics Institute at Carnegie Mellon University developed a new technique to quickly teach robots novel traversal behaviors with minimal human oversight.
The technique allows mobile robot platforms to navigate their environments autonomously while carrying out the actions a human would expect of the robot in a given situation.
The study’s experiments were recently published and presented at the Institute of Electrical and Electronics Engineers’ International Conference on Robotics and Automation, held in Brisbane, Australia.
ARL researchers Drs. Maggie Wigness and John Rogers engaged in face-to-face discussions with hundreds of conference attendees during their two-and-a-half-hour interactive presentation.
According to Wigness, one of the research team’s goals in autonomous systems research is to provide reliable autonomous robot teammates to the Soldier.
“If a robot acts as a teammate, tasks can be accomplished faster and more situational awareness can be obtained,” Wigness said. “Further, robot teammates can be used as an initial investigator for potentially dangerous scenarios, thereby keeping Soldiers further from harm.”
To achieve this, Wigness said, the robot must be able to use its learned intelligence to perceive, reason and make decisions.
“This research focuses on how robot intelligence can be learned from a few human example demonstrations,” Wigness said. “The learning process is fast and requires minimal human demonstration, making it an ideal learning technique for on-the-fly learning in the field when mission requirements change.”
ARL and CMU researchers focused their initial investigation on learning robot traversal behaviors with respect to the robot’s visual perception of terrain and objects in the environment.
More specifically, the robot was taught how to navigate between various points in the environment while staying near the edge of a road, and also how to traverse covertly using buildings as cover.
According to the researchers, given different mission tasks, the most appropriate learned traversal behavior can be activated during robot operation.
These behaviors are learned using inverse optimal control, also commonly referred to as inverse reinforcement learning, a class of machine learning methods that seeks to recover a reward function given a known optimal policy.
In this case, a human demonstrates the optimal policy by driving a robot along a trajectory that best represents the behavior to be learned.
These example trajectories are then related to visual terrain and object features, such as grass, roads and buildings, to learn a reward function defined over those environmental features.
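The article does not spell out the study’s formulation, so the following is only a minimal sketch of the general idea under several assumptions. The terrain is a small grid of hypothetical one-hot classes (grass, road, building), and a demonstration is a sequence of grid cells. The learned function is expressed as a per-cell cost (the negative of a reward) that is linear in those features, so that an ordinary shortest-path planner can stand in for the “optimal policy.” The weights are fit with a simple perceptron-style feature-matching update in the spirit of max-margin planning, which is not necessarily the method the ARL and CMU team used. All function names here (`learn_cost_weights`, `shortest_path` and so on) are hypothetical.

```python
import heapq
import numpy as np

# Hypothetical terrain classes standing in for the perceived features
# mentioned in the article (grass, roads, buildings).
GRASS, ROAD, BUILDING = 0, 1, 2
N_FEATURES = 3


def features(terrain):
    """One-hot feature vector per cell: shape (rows, cols, N_FEATURES)."""
    return np.eye(N_FEATURES)[terrain]


def shortest_path(cost, start, goal):
    """Dijkstra on a 4-connected grid; entering a cell costs cost[cell]."""
    rows, cols = cost.shape
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if d > dist[cell]:
            continue  # stale heap entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols:
                nd = d + cost[nxt]
                if nd < dist.get(nxt, float("inf")):
                    dist[nxt] = nd
                    prev[nxt] = cell
                    heapq.heappush(heap, (nd, nxt))
    # Walk the predecessor links back from the goal.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


def feature_counts(feats, path):
    """Sum of feature vectors over every cell visited by a path."""
    return sum(feats[r, c] for r, c in path)


def learn_cost_weights(terrain, demos, iters=50, lr=0.1, min_w=0.05):
    """Adjust linear cost weights until the planner reproduces the demos.

    If a demonstration uses more of a feature than the current plan, that
    feature's cost is lowered; if it uses less, the cost is raised.
    Weights are clipped at min_w so every cell cost stays positive.
    """
    feats = features(terrain)
    w = np.ones(N_FEATURES)
    for _ in range(iters):
        grad = np.zeros(N_FEATURES)
        for demo in demos:
            plan = shortest_path(feats @ w, demo[0], demo[-1])
            grad += feature_counts(feats, demo) - feature_counts(feats, plan)
        w = np.maximum(w - lr * grad, min_w)
    return w


if __name__ == "__main__":
    # Toy map: road (1) along the top and both sides, grass (0) inside.
    terrain = np.array([[1, 1, 1, 1, 1],
                        [1, 0, 0, 0, 1],
                        [1, 0, 0, 0, 1]])
    # A "stay on the road" demonstration from (2, 0) to (2, 4).
    demo = [(2, 0), (1, 0), (0, 0), (0, 1), (0, 2),
            (0, 3), (0, 4), (1, 4), (2, 4)]
    w = learn_cost_weights(terrain, [demo])
    print("weights [grass, road, building]:", w)
    print("planned path:", shortest_path(features(terrain) @ w, demo[0], demo[-1]))
```

On this toy map the untrained planner cuts straight across the grass. A single road-following demonstration pushes the road weight below the grass weight, after which the planner reproduces the demonstrated route. Demonstrations of a different behavior, such as hugging buildings for cover, would yield a different weight vector, which is one plausible way separate behaviors could be stored and later selected per mission.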
While similar research exists in the field of robotics, ARL’s work stands apart because of the conditions it targets.
“The challenges and operating scenarios that we focus on here at ARL are extremely unique compared to other research being performed,” Wigness said. “We seek to create intelligent robotic systems that reliably operate in warfighter environments, meaning the scene is highly unstructured, possibly noisy, and we need to do this given relatively little a priori knowledge of the current state of the environment. The fact that our problem statement is so different than so many other researchers allows ARL to make a huge impact in autonomous systems research. Our techniques, by the very definition of the problem, must be robust to noise and have the ability to learn with relatively small amounts of data.”
According to Wigness, this preliminary research has helped the researchers demonstrate the feasibility of quickly learning an encoding of traversal behaviors.
“As we push this research to the next level, we will begin to focus on more complex behaviors, which may require learning from more than just visual perception features,” Wigness said. “Our learning framework is flexible enough to use a priori intel that may be available about an environment. This could include information about areas that are likely visible by adversaries or areas known to have reliable communication. This additional information may be relevant for certain mission scenarios, and learning with respect to these features would enhance the intelligence of the mobile robot.”
The researchers are also exploring how this type of behavior learning transfers between different mobile platforms.
Their evaluation to date has been performed with a small unmanned Clearpath Husky robot, which has a visual field of view that is relatively low to the ground.
“Transferring this technology to larger platforms will introduce new perception viewpoints and different platform maneuvering capabilities,” Wigness said. “Learning to encode behaviors that can be easily transferred between different platforms would be extremely valuable given a team of heterogeneous robots. In this case, the behavior can be learned on one platform instead of each platform individually.”
This research is funded through the Army’s Robotics Collaborative Technology Alliance, or RCTA, which brings together government, industrial and academic institutions to address research and development required to enable the deployment of future military unmanned ground vehicle systems ranging in size from man-portables to ground combat vehicles.
“ARL is positioned to actively collaborate with other members of the RCTA, leveraging the efforts of top researchers in academia to work on Army problems,” Rogers said. “This particular research effort was the synthesis of several components of the RCTA with our internal research; it would not have been possible if we didn’t work together so closely.”
Ultimately, this research is crucial for the future battlefield, where Soldiers will be able to rely on robots with more confidence to assist them in executing missions.
“The capability for the Next Generation Combat Vehicle to autonomously maneuver at optempo in the battlefield of the future will enable powerful new tactics while removing risk to the Soldier,” Rogers said. “If the NGCV encounters unforeseen conditions which require teleoperation, our approach could be used to learn to autonomously handle these types of conditions in the future.”