Waypoint

A Question of Perception (Pt 2) — Talking Robo-Collaboration & Future Challenges & More with Prof. Scaramuzza

Last month we caught up with Prof. Davide Scaramuzza, the head of the University of Zurich’s Robotics and Perception Group. Here we continue our discussion, covering terrestrial/flying robot collaboration, how his work complements that of EPFL’s Prof. Floreano, redundancy research, and the key technical challenges of agility and event-based vision.

Hello again professor. Let’s pick up by talking about your work on terrestrial and flying robots working together. How are the terrestrial machines navigating their environment? Are they using the on-the-fly maps being made by their flying colleagues? And what kind of communication link are these two robots using?

What we know as a society is that collaboration and cooperation are better than individual entities operating alone. This is also true for a team of collaborating ground and aerial robots.

Ground robots have a limited view because they are constrained to move on the ground. But they can carry heavy payloads, such as sensors and batteries, and thus have longer endurance. An aerial robot, meanwhile, can provide an overhead view of the environment and can overcome all the obstacles that stop ground robots, but it has limited payload and endurance. If we combine them, we get a system that benefits from the advantages of both robots.

So, the idea we are currently researching is how to use an aerial robot as an external flying camera that provides the ground robot with an overview of the environment, helping it to better plan its actions.

“The aerial robot effectively acts as an external flying eye that continuously localises the ground robot within the global map built by the aerial robot,” Prof. Scaramuzza explains. (Photo: Alain Herzog.)

In one of our research publications, the drone searches for a victim and builds a map of the environment that a rover then utilises to plan the shortest path to the victim and deliver a first-aid kit. Since there are obstacles on the way, we demonstrated how the ground robot removes them by using its onboard robotic arm. These robots communicate through Wi-Fi. Since the ground robot can carry heavier payloads, most computation is done on the PC onboard the ground robot.

The aerial robot also builds a semantic map, in which each obstacle is labelled as fixed or removable. The ground robot then computes the shortest path to the victim, taking into account the time needed to remove each obstacle and place it somewhere else. Because removable obstacles sit among fixed obstacles, the removal time varies depending on the path the rover has to drive in order to set the obstacle down clear of the fixed ones.
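(For technically minded readers, here is a rough editorial sketch of that kind of planning step: the semantic map is treated as a small grid in which fixed obstacles are impassable and removable obstacles add a time penalty, and the shortest path is searched over total traversal time. The grid, the fixed removal time and all other values are assumptions for illustration; as the professor notes above, the real removal time varies with where each obstacle can be placed, and this is not the group’s actual planner.)

# Hypothetical editorial sketch, not the planner from the research above:
# shortest-path search over a semantic grid map in which fixed obstacles
# are impassable and removable obstacles add an assumed removal time.
import heapq

FREE, FIXED, REMOVABLE = 0, 1, 2

def plan(grid, start, goal, step_time=1.0, removal_time=30.0):
    """Dijkstra over traversal time; entering a REMOVABLE cell costs the
    normal step time plus a fixed, assumed removal time."""
    rows, cols = len(grid), len(grid[0])
    best = {start: 0.0}
    queue = [(0.0, start, [start])]
    while queue:
        t, cell, path = heapq.heappop(queue)
        if cell == goal:
            return t, path
        if t > best.get(cell, float("inf")):
            continue
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            kind = grid[nr][nc]
            if kind == FIXED:
                continue  # fixed obstacles can never be crossed
            nt = t + step_time + (removal_time if kind == REMOVABLE else 0.0)
            if nt < best.get((nr, nc), float("inf")):
                best[(nr, nc)] = nt
                heapq.heappush(queue, (nt, (nr, nc), path + [(nr, nc)]))
    return float("inf"), []

# Toy map: a removable crate (2) sits in the only gap of a fixed wall (1).
grid = [[0, 1, 0],
        [0, 2, 0],
        [0, 1, 0]]
print(plan(grid, start=(0, 0), goal=(0, 2)))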

And how is the terrestrial system actually identifying and moving these removable obstacles? What is the technology at play there?

The rover uses a limited-range laser scanner to precisely locate the obstacle and grasp it with its onboard arm.

Could you help us understand how your work on computer vision differs in approach from that of the LIS lab at EPFL, here in French-speaking Switzerland, led by Prof. Floreano, where the team has recently been working on optical flow techniques using curved artificial compound eyes? What are the pros and cons of these different approaches to robot navigation?

Our research is different but complementary to Floreano’s. Optical flow is concerned with estimating the speed of pixels between two consecutive frames. The curved artificial compound eye could one day be used as an additional sensing modality to compute optical flow. Visual SLAM instead estimates the 3D position of pixels and, thus, tracks pixels over multiple frames.

Optical flow is useful to detect and reactively avoid obstacles, as demonstrated by Professor Floreano. SLAM instead is useful for precisely following a given trajectory.

I can explain the difference with a simple example. Let’s say you are sitting in your office and you need to go to the elevator. There are different ‘algorithms’ you can follow to get there. For example, in my case I can simply follow the left wall, and following the left wall will take me directly to the elevator. This is called ‘reactive navigation’.

Then there is another way, called map-based navigation: I know where I am, I know where the destination is, and I just have to use my sensors to localise myself and navigate to the elevator.
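(As an editorial aside, a toy version of that map-based loop might look like the sketch below: a pose estimate, for example from visual SLAM, plus a known list of waypoints yields a simple steering command. The function, gains and thresholds are all assumptions for illustration, not the group’s code.)

# Hypothetical editorial sketch of map-based navigation: a pose estimate
# (for example from visual SLAM) plus a known list of waypoints yields a
# simple proportional steering command. All gains/values are assumptions.
import math

def follow(pose, waypoints, gain=1.0, reached_dist=0.2):
    """pose = (x, y, heading in radians); returns (forward_speed, turn_rate)."""
    x, y, heading = pose
    if not waypoints:
        return 0.0, 0.0                 # trajectory finished
    wx, wy = waypoints[0]
    dx, dy = wx - x, wy - y
    dist = math.hypot(dx, dy)
    if dist < reached_dist:
        waypoints.pop(0)                # waypoint reached, target the next one
        return 0.0, 0.0
    bearing = math.atan2(dy, dx)
    # wrap the heading error into [-pi, pi) before applying the gain
    error = (bearing - heading + math.pi) % (2 * math.pi) - math.pi
    return gain * dist, gain * error

print(follow((0.0, 0.0, 0.0), [(1.0, 1.0), (2.0, 0.0)]))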

Optical flow belongs to the first category. Reactive navigation based on optical flow is how bees and flies navigate. They do not build a map; they use optical flow to avoid (i.e. to react to) obstacles, and they simply follow the smell to reach their goal.

Optical flow works like this: it tracks pixels between the current image and the next image and tells you the direction in which each pixel is moving. It does not build maps.
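(Again as an editorial aside, the snippet below shows the idea in its most basic form: dense optical flow between two consecutive frames, computed here with OpenCV’s Farneback method purely for illustration. The frame file names are placeholders, and this is not the group’s pipeline.)

# Editorial illustration of dense optical flow between two consecutive
# frames, computed with OpenCV's Farneback method. The frame file names
# are placeholders.
import cv2

prev = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# Positional parameters: pyramid scale 0.5, 3 levels, window 15,
# 3 iterations, poly_n 5, poly_sigma 1.2, flags 0.
# flow[y, x] gives (dx, dy): how far that pixel moved between the frames.
flow = cv2.calcOpticalFlowFarneback(prev, curr, None, 0.5, 3, 15, 3, 5, 1.2, 0)

magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])
print("mean pixel displacement:", float(magnitude.mean()), "pixels/frame")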

Could you also expand upon your work into redundancy?

We are currently working on integrating range, GPS, event-based vision, and inertial sensors as additional sensory sources. Redundancy is of utmost importance when you want to achieve robustness: if a sensor fails, the other sensors should take over to finish the task, or to recover and stabilise the motion.

In our recent work, we used a combination of a camera, inertial sensors, and a range sensor to recover and stabilise a quadrotor after an agile [fast] maneuver in the air. In another project, we instead used dense reconstruction and inertial sensors to locate landing spots and execute autonomous landing.
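(To make the redundancy idea concrete, here is a deliberately simplified editorial sketch, not the group’s estimator: several altitude readings are fused by inverse-variance weighting, and a sensor that stops reporting is simply skipped so the remaining sensors take over. All sensor names and noise values are assumptions.)

# Deliberately simplified editorial sketch of sensor redundancy: several
# altitude readings are fused by inverse-variance weighting, and a sensor
# that stops reporting is simply skipped. Names and noise values assumed.

def fuse(measurements):
    """measurements: dict of sensor name -> (value, variance), or None if
    that sensor produced no reading this cycle (e.g. it has failed)."""
    num, den = 0.0, 0.0
    for reading in measurements.values():
        if reading is None:
            continue                    # failed sensor: the others take over
        value, var = reading
        weight = 1.0 / var              # trust low-noise sensors more
        num += weight * value
        den += weight
    return num / den if den > 0 else None

# All sensors healthy
print(fuse({"range": (1.02, 0.01), "vision": (0.98, 0.04), "baro": (1.30, 0.50)}))
# The range sensor drops out mid-flight: the estimate degrades but survives
print(fuse({"range": None, "vision": (0.98, 0.04), "baro": (1.30, 0.50)}))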

Are there already commercial entities, companies that have been born as a result of RPG’s research, that are using your innovations in the real world? If so, which and what applications are they targeting?

We have a startup devoted to the commercialisation of visual-inertial navigation solutions. However, I cannot disclose the applications yet.

What is the key technical challenge you’re facing right now? Or, if you can’t say, what is the next big hurdle to overcome in the fields of computer vision and autonomous robotics? (I see your most recent video online is on pre-integrating inertial measurements between selected keyframes to accelerate computation.) What will be some key themes moving forwards, and what work is still needed to bring these developments to fruition?

Agility and event-based vision are the next big challenges. Fast drone maneuvers are currently confined to controlled environments.

In recent years, we have witnessed quadrotors that can juggle a ball or dance in swarms; however, all those demonstrations use motion capture systems or radio-based localisation. These localisation systems are extremely valuable for evaluating control strategies or for entertainment; however, achieving those maneuvers using only onboard sensors, without any external infrastructure, remains unsolved.

One of the main obstacles is a technological limitation: current onboard sensors still have large latencies (50 to 200 ms), which puts a hard bound on the maximum achievable agility of a flying machine. If you consider that the actuators of a quadrotor have a latency in the order of 10 ms, you can understand that standard vision is still too slow to be used for agile navigation, meaning navigation as agile as a bird’s.
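(A quick back-of-the-envelope calculation, using the latency figures quoted above together with assumed flight speeds, shows how far a drone travels before its control loop can even begin to react.)

# How far a drone travels "blind" during the sensing latency, using the
# 50-200 ms figures quoted above and the ~10 ms actuator latency for
# comparison. Flight speeds are assumed values.
for speed in (5.0, 10.0, 20.0):                   # metres per second
    for latency_ms in (200.0, 100.0, 50.0, 10.0):
        blind = speed * latency_ms / 1000.0       # metres travelled before reacting
        print(f"{speed:4.1f} m/s, {latency_ms:5.0f} ms latency -> {blind:.2f} m")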

In this context, event-based vision is a viable solution, because the latency of an event-based camera is just a few microseconds. The challenge, however, is that the output is not standard frames but asynchronous events, which means a paradigm shift is needed to deal with these sensors. This is what I am currently researching.
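(For readers unfamiliar with event cameras, the editorial sketch below shows the kind of data such a sensor produces: a stream of asynchronous, timestamped per-pixel events rather than frames, here accumulated naively into an image just to visualise where brightness changed. The event record and all values are hypothetical, and this is not any specific algorithm from the group.)

# Toy editorial illustration of event-camera output: a stream of
# asynchronous (x, y, timestamp, polarity) events rather than frames,
# here accumulated naively into an image-like array. All values synthetic.
from collections import namedtuple
import numpy as np

Event = namedtuple("Event", ["x", "y", "t_us", "polarity"])   # polarity is +1 or -1

def accumulate(events, width, height, window_us=10_000):
    """Sum the polarities of events from the last window_us microseconds
    into a 2D array, just to visualise where brightness changed."""
    img = np.zeros((height, width), dtype=np.int32)
    latest = events[-1].t_us if events else 0
    for ev in events:
        if latest - ev.t_us <= window_us:
            img[ev.y, ev.x] += ev.polarity
    return img

# A handful of synthetic events (timestamps in microseconds)
events = [Event(10, 5, 1_000, +1), Event(11, 5, 1_250, +1), Event(40, 20, 9_800, -1)]
print(accumulate(events, width=64, height=32).sum())   # net brightness-change count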

So to wrap up, professor: you are a member of the National Centre of Competence in Research (NCCR) Robotics. How does that organisation bring together the work of different research institutions?

NCCR Robotics comprises four institutions across Switzerland: ETH, UZH, IDSIA and EPFL. Each participating institution contributes complementary expertise: UZH covers perception and navigation for ground and flying robots, ETH covers locomotion and navigation of legged robots, EPFL covers locomotion, learning and control of flying robots, amphibious robots and robot arms, and IDSIA covers the distributed navigation of robot swarms, learning, and human-robot interaction.

Professor, we should leave it there and let you get on. Thanks so much for your time.

You’re very welcome.

===========================
About Professor Davide Scaramuzza

Professor Scaramuzza, born in Italy in 1980, is Assistant Professor of Robotics at the University of Zurich. He is also the founder and director of the university’s Robotics and Perception Group, where he leads cutting-edge research on low-latency vision and visually guided micro aerial vehicles.

He received his PhD in Robotics and Computer Vision from ETH Zurich and completed postdocs at both ETH Zurich and the University of Pennsylvania. Between 2009 and 2012 he led the European project SFLY, which introduced the world’s first autonomous navigation of micro quadrotors in GPS-denied environments using vision as the main sensor modality. For his research contributions he has been awarded an SNSF-ERC Starting Grant, the IEEE Robotics and Automation Early Career Award, and a Google Research Award. He also co-authored the book Introduction to Autonomous Mobile Robots (MIT Press, 2011) and is co-founder of the Zurich-Eye startup, dedicated to the commercialisation of visual-inertial navigation solutions.
===========================
