Friday, December 17, 2010

Limitations of the Kinect

or "Why do we still need other sensors if the Kinect is so awesome?"

The Kinect is a great sensor and a game changer for mobile robotics, but it won't make every other sensor obsolete. Though maybe I can use it to convince someone to let me take apart a Swiss Ranger 4000.

Field of View

The field of view is an important consideration for sensors because if you can't see enough features you can't use scan matching/ICP to estimate your change in position. Imagine you are in a giant white room with only one wall, and that wall is perfectly flat. If you walk along the wall there is no way to determine your movement except by counting how many steps you take. With a robot you have wheel slip and encoder errors that build up over time unless there are landmarks that can be used to bound the error.


The depth image on the Kinect has a field of view of 57.8°, whereas the Hokuyo lasers have between 240° and 270°, and the Neato XV-11's LIDAR has a full 360° view.

In addition to being able to view more features, the wider field of view also allows the robot to efficiently build a map without holes. A robot with a narrower field of view will constantly need to maneuver to fill in the missing pieces to build a complete map.

One solution would be to add more Kinects to increase the field of view. The main problem is the sheer volume of data. 640 x 480 x 30fps x (3 bytes of color + 2 bytes of depth) puts us close to the maximum speed of the USB bus, at least to the point where you are only going to get good performance with one Kinect per bus. My laptop has two USB buses that have accessible ports, and you might get four separate buses on a desktop. Even if you downsample until it works computationally, you still have to deal with power requirements (unless your robot is powered by a reactor) and with possible interference from reflections.
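To put a number on "close to the maximum speed of the USB bus", here is a rough back-of-the-envelope sketch of the arithmetic above. It assumes a USB 2.0 bus at its nominal 480 Mbit/s signalling rate (usable throughput is lower in practice) and an uncompressed stream at the byte counts quoted above; the function names are just for illustration.

```python
# Back-of-the-envelope data rate for one Kinect, using the figures quoted above.
WIDTH, HEIGHT, FPS = 640, 480, 30
BYTES_PER_PIXEL = 3 + 2  # 3 bytes of color + 2 bytes of depth

# Assumption: nominal USB 2.0 high-speed signalling rate; real throughput is lower.
USB2_SIGNALLING_BITS_PER_S = 480e6


def kinect_bits_per_second(width=WIDTH, height=HEIGHT, fps=FPS,
                           bytes_per_pixel=BYTES_PER_PIXEL):
    """Raw, uncompressed stream size in bits per second."""
    return width * height * fps * bytes_per_pixel * 8


def max_kinects_per_bus(bus_bits_per_s=USB2_SIGNALLING_BITS_PER_S):
    """How many full-rate streams fit on one bus, ignoring protocol overhead."""
    return int(bus_bits_per_s // kinect_bits_per_second())


if __name__ == "__main__":
    rate = kinect_bits_per_second()
    print(f"one Kinect: {rate / 1e6:.1f} Mbit/s ({rate / 8e6:.1f} MB/s)")
    print(f"share of the bus: {rate / USB2_SIGNALLING_BITS_PER_S:.0%}")
    print(f"Kinects per bus: {max_kinects_per_bus()}")
```

Even before protocol overhead is counted, a second full-rate stream does not fit on the same bus, which is where the one-Kinect-per-bus rule of thumb comes from.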

Another, simpler approach is to mount the Kinect on a servo so it can pan horizontally. This, however, reduces the robot to intermittent motion where it is constantly stopping and scanning. Depending on your robot's mechanical design you may be better off rotating the robot.
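As a rough illustration of how much stopping and scanning that means, the sketch below counts the pan positions needed to match a wider sensor's field of view. The 5° overlap between neighbouring views is a made-up margin for stitching scans together, not a measured requirement, and `pan_stops` is a hypothetical helper.

```python
import math

KINECT_FOV_DEG = 57.8  # depth field of view quoted above


def pan_stops(target_fov_deg, sensor_fov_deg=KINECT_FOV_DEG, overlap_deg=5.0):
    """Number of stop-and-scan positions needed to sweep target_fov_deg.

    overlap_deg is a hypothetical margin shared by neighbouring views so
    that consecutive scans can be registered against each other.
    """
    if overlap_deg >= sensor_fov_deg:
        raise ValueError("overlap must be smaller than the sensor field of view")
    if target_fov_deg <= sensor_fov_deg:
        return 1
    # Each additional stop only adds the part of the view that is not overlap.
    step = sensor_fov_deg - overlap_deg
    return 1 + math.ceil((target_fov_deg - sensor_fov_deg) / step)


if __name__ == "__main__":
    for target in (240, 270, 360):
        print(f"{target} degrees: {pan_stops(target)} stops")
```

Under these assumptions, matching a Hokuyo takes five or six stops per sweep, so the servo approach trades hardware cost for a lot of standing still.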

Range

The minimum range for the Kinect is about 0.6m and the maximum range is somewhere between 4-5m depending on how much error your application can handle. The Hokuyo URG-04LX-UG01 works from 0.06m to 4m with 1% error, and the UTM-30LX works from 0.1m to 60m. The XV-11 LIDAR does 0.2m to 6m. So the Kinect will have the same problems as the more expensive laser range finders in terms of being able to see the end of a long hallway, but the bigger problem will be the Kinect's close range blind spot. I'm sure it wouldn't be hard to imagine the dangers of a soft squishy dynamic obstacle approaching a robot from behind, then standing within 0.6m (2ft) while the robot turns and drives forward over the now no longer dynamic obstacle. It can also make maneuvering in confined spaces difficult without a priori knowledge of the environment.
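The blind spot is easy to see if you line the quoted ranges up against a few obstacle distances. This is only a sketch using the numbers from the paragraph above; the Kinect's maximum is taken as 4m, the conservative end of the 4-5m estimate, and `sensors_that_see` is a hypothetical helper.

```python
# Working ranges in metres, taken from the figures quoted above.
# The Kinect maximum is given as 4-5 m; 4.0 is the conservative end.
SENSOR_RANGES_M = {
    "Kinect": (0.6, 4.0),
    "Hokuyo URG-04LX-UG01": (0.06, 4.0),
    "Hokuyo UTM-30LX": (0.1, 60.0),
    "Neato XV-11 LIDAR": (0.2, 6.0),
}


def sensors_that_see(distance_m, ranges=SENSOR_RANGES_M):
    """Names of the sensors whose working range contains distance_m."""
    return [name for name, (near, far) in ranges.items()
            if near <= distance_m <= far]


if __name__ == "__main__":
    for d in (0.3, 1.0, 5.5, 20.0):
        print(f"{d:>5.1f} m: {sensors_that_see(d) or 'nothing'}")
```

At 0.3m every sensor in the table except the Kinect still reports the obstacle, which is exactly the case of someone standing right behind the robot.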

One solution to this would be to add an additional laser projector to the Kinect so that the baseline could be adjustable and the minimum range could be closer. Another approach would be to place the sensor on the robot looking downward at a point high enough to ensure that the dynamic obstacles and their parents were detectable at all times.

The maximum range will be limited by the need to be eye-safe. The power output of the Kinect laser is spread out as a set of points projected over a large surface area, while more traditional 2D laser scanners direct their entire power output to a single point. The 2D laser scanner will generally be capable of a longer range, provided its time-of-flight measurements are accurate. The other major limit to the Kinect's maximum range is the baseline: to see farther, the Kinect would have to be made wider to increase the distance between the laser projector and the IR imager.
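The baseline argument can be made concrete with the standard triangulation relation z = f * b / d, where a fixed disparity error turns into a depth error that grows with the square of the distance. The sketch below assumes the Kinect behaves like an ideal triangulating sensor; the 7.5cm baseline, 580 pixel focal length and 1/8 pixel disparity error are illustrative placeholder values, not calibration results.

```python
def depth_error_m(z_m, baseline_m, focal_px, disparity_error_px):
    """First-order depth uncertainty of a triangulating sensor.

    From z = f * b / d, a disparity error dd maps to
    dz ~= z**2 * dd / (f * b).
    """
    return (z_m ** 2) * disparity_error_px / (focal_px * baseline_m)


if __name__ == "__main__":
    # Placeholder values for illustration only (assumptions, not measurements).
    BASELINE_M = 0.075
    FOCAL_PX = 580.0
    DISPARITY_ERROR_PX = 0.125

    for z in (1.0, 2.0, 4.0, 8.0):
        narrow = depth_error_m(z, BASELINE_M, FOCAL_PX, DISPARITY_ERROR_PX)
        wide = depth_error_m(z, 2 * BASELINE_M, FOCAL_PX, DISPARITY_ERROR_PX)
        print(f"z = {z:.0f} m: error {narrow * 100:.1f} cm, "
              f"with doubled baseline {wide * 100:.1f} cm")
```

Doubling the distance quadruples the error, while doubling the baseline only halves it, so pushing the usable range out quickly means a much wider sensor.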

Environmental

The environmental challenges for the outdoor use of laser scanners have been fairly well studied, with rain and dust being known problems. Changing lighting conditions, from clouds passing overhead, can also wreak havoc with some sensors. I would be interested in seeing experimental results using the Kinect outdoors, during the day, in adverse weather. However, it should work well at night with good weather.

One important question is, can the Kinect see snow?


Computation and Thermodynamics

(updated: 22:28 EST 12/17)

Having designed and built several mobile robots I can safely say that once all the software is debugged, all the electronics are rewired and actually labeled, and the mechanisms are all lubricated, the biggest problem is thermodynamics. Battery technologies are a set of trade-offs that in the end give you some amount of potential energy stored in a constant volume of space, having a constant mass.

The amount of operating time for the robot is limited by how fast you convert that potential energy into kinetic energy or heat. On the electromechanical side you can recover some energy by using regenerative braking to convert kinetic energy back into potential energy and heat. While researchers have made some progress on reversible computing, there is currently no regenerative computing, so all of the energy spent computing ends up as heat. So as we add computation power to the robot we are effectively decreasing the operating time.

As the compute power is increased the run time is decreased, so to make up for it you can add more batteries. As you add more batteries the mass of the robot increases, so the motors need more power to accelerate the robot. So to make up for it you can add more batteries.
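That loop can be written down as a toy model. The sketch below assumes drive power grows linearly with total mass, and every number in it (pack energy, pack mass, watts per kilogram) is a made-up placeholder rather than a measurement from any real robot.

```python
def run_time_h(n_packs, compute_w, base_mass_kg=10.0, pack_mass_kg=1.0,
               pack_energy_wh=100.0, drive_w_per_kg=5.0):
    """Hours of operation for a robot carrying n_packs battery packs.

    Drive power is assumed to grow linearly with total mass; every default
    value here is a hypothetical placeholder, not a measurement.
    """
    total_mass_kg = base_mass_kg + n_packs * pack_mass_kg
    total_power_w = compute_w + drive_w_per_kg * total_mass_kg
    return n_packs * pack_energy_wh / total_power_w


if __name__ == "__main__":
    for compute_w in (20.0, 100.0):
        times = [run_time_h(n, compute_w) for n in (1, 2, 5, 10, 20)]
        print(f"{compute_w:.0f} W of compute:",
              ", ".join(f"{t:.2f} h" for t in times))
```

In this model each extra pack buys less run time than the one before it, and the total can never exceed pack_energy_wh / (drive_w_per_kg * pack_mass_kg), which is the point where the batteries are only carrying themselves.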

In terms of testing algorithms and getting research done, offloading computation to a ground station is a valid solution. However, it may not be a practical solution for regular operation since wireless networks may not have complete coverage or reliability.

As you may imagine these problems are worse if your robot is flying. On the upside, cooling the robot is easier.

Results

With a little bit of work the Kinect can provide more than enough sensing capabilities to put your robot into the big leagues, but traditional laser range finders still have a use and at the very least make solving some of the problems easier.

References

This paper has important details on the XV-11 LIDAR
Kinect calibration technical details can be found here.

Submit your ideas or corrections in the comments. Clarifications available upon request.

11 comments:

shimniok said...

Wow! Great article on many levels!!

Quick correction: "wreak havoc" is the correct spelling.

I Heart Robotics said...

Good catch. I think that's the first typo anyone has caught. It reminds me of when I learned that "to pore over something" was spelled correctly.

I'm still going to blame my spell checker for the error.

RobotNV said...

The world is not uniform vertically.

It's much easier to find landmarks for matching if you have a 3D view that Kinect/PSDK sensors give you rather than 2D view that Hokuyos give.

It's just that current crop of software has been optimized to use Hokuyos. But give it 6 months and we shall see where the software will be.

Let's go back though and say we accept that the world is uniform vertically. Then why not break Kinect's view vertically into 7 regions to cover 360 degrees (with even some overlap) using 6 mirrors positioned at different angles? The data can then be adjusted into one plane using software.

I Heart Robotics said...

Keep in mind that I am not anti-Kinect. I'm actively using it for research and I think that it is an important sensor for mobile robotics and it provides a low cost entry point for DIY robotics.

3D generally won't get you more features along a hallway, which is where you are going to have problems. RGB helps since you can see things like signs taped to the wall, which is why a lot of the 3D (6DoF) SLAM algorithms are leaning toward visual SLAM. The 2D laser is probably going to give better results under these conditions.

I'm not sure it's going to take six months, but the results from the Kinect are going to be nothing short of amazing. However I think, assuming you can afford it, there are advantages to having a 2D laser in addition to the Kinect/PSDK. For example, on a ground robot it should be straightforward to fuse the pose estimates from Kinect vSLAM and a 2D laser scan matcher to get an improved pose estimate. Not to mention preventing you from running someone over.

Taking into account that you are dealing with multiple views from the cameras and the projector, using two mirrors to improve the field of view is not unreasonable. From a manufacturing perspective, it may not be feasible to build an arrangement of 6-7 mirrors but I would be interested in seeing diagrams to the contrary.

RobotNV said...

Imagine a horizontal hexagon. Each side represents the bottom edge of a mirror mounted at 45 degrees. The Kinect would be pointed upward, facing this combination of 6 mirrors.

Side view showing 2 mirrors of 6:
\ /
\ / - mirror
\___/


__o_o__0_
| Kinect |
-----------

Eight mirrors might work better.

RobotNV said...

This blog has eaten my spaces.
Better diagram is here:
http://forums.trossenrobotics.com/showthread.php?p=44922#post44922

I Heart Robotics said...

I think I see what you're suggesting.

However, I still suspect that this design will be annoying to manufacture and there may be projective geometry issues.

Imagine you hold a mirror in front of your eyes: your eyes can create a stereo depth image from the image in the mirror.

If you hold up two mirrors so that each eye can only see the image in one mirror, you may or may not be able to resolve an image, depending on how the mirrors are arranged.

I suspect, based on my understanding of your diagram, that the four mirrors pointing left and right may not provide the same view to each camera. So you may be able to reflect the points from the laser projector into a 360 degree pattern, but can you see them from the depth camera?

That said, I think there may be something to this mirror idea given the right geometry.

Anonymous said...

Thanks for writing this article in response to my initial query on another post. This is an interesting topic and the only place I've seen it discussed.

Whether through a mirror arrangement (as one poster suggests) or multiple kinects (perhaps down sampled to prevent USB saturation) - I think fov is less of a concern. Isn't the update rate much higher on the kinect than the other sensors? so downsampling may not be a big deal in some applications where USB bus resources are limited? Otherwise other strategies e.g. splitting the work among multiple CPU's on a ground based robot should be fine (PR2 has several)

The biggest (only?) disadvantage I see is the range limitations and I haven't heard of any way to easily address this on the kinect yet.

ian

I Heart Robotics said...

- Downsampling
As far as I know, the OpenNI drivers do not currently offer a way to get less data from the Kinect. So you can downsample to improve computational performance, but you still use up the USB bandwidth. I'll try to follow up on this later this week or next.

- Update Rate
The Kinect offers a maximum rate of 30Hz.
A Hokuyo UTM-30LX has a scan rate of 40Hz.
A Hokuyo URG-04LX-UG01 has an update rate of 10Hz.
A SwissRanger 4000 does 50Hz.

- Range Issues
I don't see a way to solve the range issues without a compromise. The Kinect is very similar to a stereo camera. For it to be usable at shorter ranges the distance between the laser projector and the IR camera needs to be smaller, and to detect things at a longer range the camera and projector need to be farther apart. One trick humans use to improve near-range depth perception is that the eyes can move, so you become cross-eyed if you look at a finger in front of your nose.

Another thing to consider is that while the RGB image does not contain depth information, it does contain visual information. It probably wouldn't be robust enough for lawyers in the US but one option would be to use a face detector to avoid running over humans that may not be within sensing range.

Anonymous said...

RANGING CLOSER

you can put a plastic rectangular prism at an angle in front of the kinect ir sender and thereby offset it sideways.

this way you can trick the sensor into ranging stuff closer or farther away than what the unit was intended for...

and if you put the prism on a servo or something you can scan both closer and farther...

you obviously need to do some calibration of the output data so that you know how to interpret it.

hope it helps...

I Heart Robotics said...

The new Nyko Zoom should help widen the field of view and allow for objects to be ranged at closer distances.

Do you have any test results or references for the prism idea?