Capturing Large-scale panoramas with sparse visual features

Research and Development Project Report
Vision based control strategies for Quadcopter Navigation

Author: Anirudh Vemula
Supervisor: Prof. Sharat Chandran

A report submitted in partial fulfilment of the requirements for the degree of B.Tech in the Department of Computer Science and Engineering, Indian Institute of Technology Bombay

May 2015
Contents

1 Introduction
  1.1 Motivation
  1.2 The Problem
2 Related Work
3 Description of the existing system
  3.1 Quadcopter
  3.2 IMU
  3.3 Monocular SLAM & PTAM
  3.4 Extended Kalman Filter
  3.5 PID Control
4 Improvements made in the system
  4.1 Auto Scale Initialization
  4.2 Reducing Speed
  4.3 User Interface
  4.4 Convex Hull and Plane fitting
  4.5 Path Planning
5 Results
  5.1 Single photo
  5.2 Indoor planar scene
  5.3 Outdoor planar scene
  5.4 State estimation with and without PTAM
6 Conclusion
7 Future work
1 Introduction

1.1 Motivation

In recent years, unmanned aerial vehicles (UAVs) have been a major focus of active research, since they extend our capabilities in a variety of areas. Applications such as surveillance, medical evacuation and search-and-rescue missions are some of the areas where the use of UAVs is being explored. The major challenge in developing UAVs is to extract useful information about the vehicle's current position and orientation in a robust manner. In the aforementioned applications, human lives depend on UAVs achieving stable flight and navigating accurately. To accomplish this, we need the vehicles to be aware of their own location and to move reliably.
In our work, we use a cheap, off-the-shelf quadcopter as a testbed. The control of the quadcopter during autonomous flight depends on several variables such as its current position, velocity and orientation. These variables can be partly estimated using on-board inertial sensors. However, measurements from the inertial sensors are noisy, making a consistently accurate estimation of pose nearly impossible. To overcome this drawback, we need an absolute reference such as GPS for outdoor applications. In indoor environments, where no GPS is available, we resort to visual tracking methods to estimate the pose of the quadcopter. Our work applies a specific visual tracking method, Parallel Tracking and Mapping (PTAM) [9], to the problem of localizing a quadcopter in indoor and outdoor environments.
1.2 The Problem

In tasks such as capturing panoramas of large structures, using a hand-held camera is difficult, as it is hard to reach distant parts of the structure to take a close, frontal shot. A quadcopter makes the task simpler, since it can reach such distant parts of the structure and capture the required shots. The major challenge in realizing this scenario is navigating the quadcopter accurately without compromising either image quality or the quadcopter's safety. We need the quadcopter to cover the region effectively, so that the resulting pictures can be stitched into a panorama.

The problem we tackle in this work is to accurately navigate the quadcopter according to a given flight plan. The flight of the quadcopter should be safe, stable and reliable. A flight plan essentially entails a set of points that the quadcopter needs to cover steadily, in a given order. We realize this goal by using PTAM in conjunction with measurements from on-board sensors to localize the quadcopter and determine its position and orientation at all times. Our work improves on an existing application developed by TU Munich [3], making it more autonomous and user-friendly and enabling more accurate navigation in tasks such as using the quadcopter to take pictures of large-scale planar structures.
2 Related Work

There have been several works dealing with using visual feedback to aid the localization of aerial vehicles. Accurate pose estimation of a moving rigid object using a calibrated stereo vision setup is described in [10]; however, its real-time performance is not sufficient for quadcopter motion control. In [7], a position controller for a quadcopter is implemented using an optical tracking system at MIT. This tracking system is nevertheless very expensive and not transportable. 3-D pose estimation and control of a quadcopter using only visual feedback is done in [4], based on one camera fixed on the ground and an additional camera mounted on the quadcopter facing downwards. The feature detection in this system is very sensitive to lighting conditions and is therefore not reliable enough.

The work on which ours is based is [9], which tracks the camera in scenes without any prior map and subsequently builds a long-term map in which the tracked feature points are constantly re-visited. [9] uses a hand-held camera to detect feature points in the current frame and uses them to track the camera's position at all times. Concurrently, the program builds and refines a map that includes all the detected feature points, so that tracking is made easy when the points are re-visited. Another important work, [5], deals with the application of PTAM to the problem of quadcopter localization. In [5], the quadcopter achieves localization by using PTAM in conjunction with the on-board inertial sensors to estimate the quadcopter's pose at all times. However, the system developed in [5] is not capable of executing a given flight plan accurately and reliably.
3 Description of the existing system

In the following sections, we describe the existing system developed by TU Munich [3], to which we have made improvements.
3.1 Quadcopter

Quadcopters are compact rotorcraft that can be used for indoor and outdoor inspection and exploration tasks. Like a conventional helicopter they can hover; however, they have additional advantages such as small size, low power consumption, vertical take-off without a tail rotor and dynamic manoeuvrability. In recent times, there has been explosive growth of interest in quadcopters, spurred by the availability of cheap and open-source platforms. Quadcopters are popularly called drones, and we use the two terms interchangeably in this report.

The quadcopter used in this work is a "Parrot AR Drone 2" [2], with the control driver "ardrone_autonomy" [1] from ROS (Robot Operating System). We chose the Parrot AR Drone 2 because it is a low-cost, off-the-shelf quadcopter with a large amount of software support available for its control. All computation is performed off-board on a ground-based laptop, and no modifications to the hardware or on-board software are required.

The drone is equipped with a 1280x720 HD camera (92° diagonal FOV) which can be used to record videos or capture pictures. The drone communicates with external drivers (in our case, ardrone_autonomy) over Wi-Fi. Using ROS and the ardrone_autonomy driver, we can access the image stream sent by the drone over Wi-Fi, which is used by PTAM and other visual tracking methods. Note that although the camera is HD, the image sent over Wi-Fi is downscaled to 640x480 (VGA) for faster communication.
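For concreteness, the following is a minimal sketch of how the image stream can be read through ROS. It is an illustrative snippet, not part of the tum_ardrone application; the topic name /ardrone/front/image_raw is the one ardrone_autonomy normally publishes and may need adjusting for a different driver setup.

```python
# Minimal sketch: subscribing to the drone's 640x480 image stream through
# ROS and the ardrone_autonomy driver (illustrative, not the tum_ardrone code).
import rospy
from sensor_msgs.msg import Image
from cv_bridge import CvBridge

bridge = CvBridge()

def on_image(msg):
    # Convert the ROS image message to an OpenCV BGR array.
    frame = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
    rospy.loginfo("received frame %dx%d", msg.width, msg.height)

rospy.init_node("image_listener")
# Assumed topic name; check `rostopic list` for your driver configuration.
rospy.Subscriber("/ardrone/front/image_raw", Image, on_image)
rospy.spin()
```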
3.2 IMU

The main controller board, the inertial measurement unit (IMU), contains inertial sensors consisting of three gyroscopes, one for each axis of rotation, and three accelerometers, one for each axis of translation. The IMU estimates the absolute roll and pitch angles by combining the acceleration vector with the angular speeds given by the gyroscopes. A relative yaw angle is estimated by integrating the angular speed measured by the gyroscope corresponding to the yaw axis. In addition to these sensors, the drone is equipped with an ultrasound sensor underneath to measure ground altitude accurately.
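As a toy illustration of these estimates (not the drone's on-board filter), roll and pitch can be recovered from the direction of gravity in the accelerometer reading, while yaw is only available as a relative angle obtained by integrating the yaw-rate gyroscope:

```python
import numpy as np

def attitude_from_imu(accel, gyro_yaw_rate, prev_yaw, dt):
    """Toy illustration of the IMU attitude estimates described above
    (not the drone's on-board filter)."""
    ax, ay, az = accel                        # accelerations in the body frame
    roll = np.arctan2(ay, az)                 # absolute, from the gravity vector
    pitch = np.arctan2(-ax, np.sqrt(ay**2 + az**2))
    yaw = prev_yaw + gyro_yaw_rate * dt       # relative only; drifts over time
    return roll, pitch, yaw
```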
3.3 Monocular SLAM & PTAM

Using the IMU alone to localize the drone's pose and position w.r.t. a ground reference frame is very inaccurate, as the IMU measurements are very noisy and lead to large errors. We have observed that the drone drifts a lot when only the IMU is used for localization, and the estimated position differs significantly from the actual position along the X and Y axes (the Z estimate is accurate thanks to the ultrasound altitude sensor). The major advantage of the IMU sensors is that they provide the absolute scale of the environment, which alternative approaches such as monocular SLAM fail to provide.

Since the IMU is error-prone, we need an alternative for localization. The most popular method is monocular simultaneous localization and mapping (SLAM), where the system localizes itself w.r.t. a ground reference frame while simultaneously creating a map of its surrounding environment. A particular challenge for monocular SLAM is that the scale of the map needs to be estimated from additional metric sensors such as the IMU, as it cannot be recovered from vision alone.

For monocular SLAM, the approach used in our work is PTAM [9]. PTAM tracks the camera position by detecting feature points in the current frame without any prior map of the environment and subsequently builds a long-term map in which the detected keypoints can be constantly re-visited. The tracking and mapping procedures run in parallel threads, leading to fast, robust and accurate camera tracking, while the map is refined and expanded as new regions are explored.

In PTAM, after map initialization, the visual map is rotated such that the xy-plane corresponds to the horizontal plane according to the accelerometer data, and scaled such that the average keypoint depth is 1. The positions of all detected keypoints are stored in the map and can be used to estimate the position of the drone itself in future iterations. Throughout the tracking procedure, the scale of the map is estimated from IMU measurements using the method described in [5]. Once initialized, the detected keypoints are shown in the application as in Figure 1(a), and the map built is shown in the application as in Figure 1(b).
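The normalization step can be sketched as follows. This is an illustrative reconstruction, assuming the accelerometer supplies a rotation that aligns gravity with the map's vertical axis; it is not PTAM's actual code.

```python
import numpy as np

def normalize_map(keypoints_3d, R_gravity):
    """Illustrative sketch of the map normalization described above.
    keypoints_3d: (N, 3) array of mapped points; R_gravity: rotation that
    aligns the accelerometer's gravity direction with the map's z axis,
    so the xy-plane becomes horizontal."""
    pts = keypoints_3d @ R_gravity.T           # gravity-aligned map frame
    mean_depth = np.mean(np.abs(pts[:, 2]))    # average keypoint depth (assumed along z)
    return pts / mean_depth                    # rescale so the mean depth is 1
```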
Figure 1: Screenshots of the PTAM application [5]. (a) Camera feed with detected keypoints. (b) PTAM 3-D map view.
3.4 Extended Kalman Filter

The extended Kalman filter (EKF) is used to fuse all available pose information from the IMU sensors and PTAM. It includes the full motion model of the quadcopter's flight dynamics and its reaction to control commands. Given an initial pose estimate and the control command, the EKF combines the sensor data and SLAM data to provide new pose and velocity estimates. The elimination of drift obtained by fusing monocular SLAM with the IMU through the EKF is shown in Figure 2 [5]. In the first graph, we compare the paths estimated by the EKF and by raw IMU values with the ground truth (target trajectory). In the second graph, the drone's target position is fixed (in this case, x = 0 m, y = 0 m) and the drone is repeatedly pushed away from its target. With the EKF, the estimate shows the drone returning to the target position each time, whereas with raw IMU the estimated state is entirely erroneous, showing a final position very far from the target.
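A bare-bones sketch of this predict/update fusion is shown below. The real filter in [3, 5] contains the full quadcopter motion model; here f, h and their Jacobians F, H stand in for a generic linearized model, with PTAM's pose estimate entering through the update step.

```python
import numpy as np

class SimpleEKF:
    """Generic EKF sketch in the spirit of the fusion described above
    (illustrative; not the filter used in [3, 5])."""
    def __init__(self, x0, P0):
        self.x, self.P = x0, P0

    def predict(self, f, F, Q, u):
        self.x = f(self.x, u)                    # propagate state with control input u
        self.P = F @ self.P @ F.T + Q            # propagate covariance

    def update(self, z, h, H, R):
        y = z - h(self.x)                        # innovation: PTAM pose minus prediction
        S = H @ self.P @ H.T + R
        K = self.P @ H.T @ np.linalg.inv(S)      # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(len(self.x)) - K @ H) @ self.P
```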
Figure 2: Elimination of odometry drift through SLAM and EKF, picture courtesy of [5]

3.5 PID Control

Based on the position and velocity estimates from the EKF, we apply PID (proportional-integral-derivative) control to steer the quadcopter towards the desired goal location. The PID error is the distance between the current estimated position and the desired position, and the controller acts to minimize this error. For each of the four degrees of freedom we employ a separate PID controller; a minimal sketch of such a controller follows the command list below. The existing system allows us to control the drone autonomously through a short scripting language with which we can issue control commands to the drone. Some of the commands essential to our work are listed below:

• takeoff : Command the drone to perform a vertical take-off.
• goto x y z yaw : Command the drone to go to the position given by (x, y, z) with the given yaw.
• autoInit : Auto-initialization of PTAM for the drone.
• moveByRel x y z yaw : From the current position, move by x, y, z units in the respective directions and by yaw in the yaw axis.
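The per-axis controller mentioned above can be sketched as follows. The structure is standard PID; the gains in the usage comment are placeholders, not the values used by tum_ardrone.

```python
class PID:
    """Minimal per-axis PID controller sketch; one instance is used for each
    of the four degrees of freedom (x, y, z, yaw)."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral, self.prev_error = 0.0, 0.0

    def step(self, target, current, dt):
        error = target - current                       # distance to the desired position
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Usage (placeholder gains, tuned per axis in practice):
# pid_x, pid_y = PID(0.5, 0.0, 0.3), PID(0.5, 0.0, 0.3)
# pid_z, pid_yaw = PID(0.6, 0.0, 0.2), PID(1.0, 0.0, 0.0)
```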
4 Improvements made in the system

Although the existing system has been shown to eliminate odometry drift, its navigation of the quadcopter according to a given flight plan is inaccurate and erratic. The prominent reasons for this behaviour are the absence of an initialization procedure that ensures high scale accuracy, and the absence of a low motion speed that avoids jerky motion.
4.1 Auto Scale Initialization

The current initialization of PTAM in the existing system involves the drone capturing two keyframes with a sufficient number of keypoints detected across both of them. The distance between the camera positions for these two keyframes must be small (of the order of 10 cm), so that there are enough correspondences across the two frames. Once a sufficient number of keypoints is detected, the initialization procedure ends and the drone hovers at its current position. This procedure only ensures good PTAM map quality, not high scale accuracy, but we need high scale accuracy to reliably estimate the pose of the quadcopter. Scale inaccuracy also leads to inaccurate navigation, which may end up damaging the quadcopter itself.

To increase the scale accuracy, we have developed an auto scale initialization procedure which tries to improve the scale accuracy to 100%, thereby correcting the map and enabling accurate navigation of the drone. Note also that when the scale accuracy is at its maximum, the 3-D world coordinates of the detected keypoints are also very accurate and can therefore be used in further computation with minimal error.
Our procedure starts with the vertical take-off of the drone and then captures two keyframes differing by 10 cm in height. Once the two keyframes with enough detected keypoints have been taken, we move the drone up and down in an oscillatory fashion until the scale accuracy reaches 100%.

The up and down movement of the drone corrects the scale because the ultrasound sensor used to measure altitude is relatively accurate; the scale estimation method described in [5] uses the measurement from this sensor to compute the scale of the map. For practical purposes, we have also included a maximum number of oscillations, as in some cases the drone may never reach maximum accuracy through up-down motion alone. When the drone has oscillated a sufficient number of times and the scale is still inaccurate, we stop the procedure and leave it to the user to improve the accuracy (a good way is to displace the drone forward/backward and start the procedure again). The drone movement for auto scale initialization is shown in Figures 3(a) and 3(b). To allow users to perform this procedure autonomously, we have created a command in the scripting language described in Section 3.5, listed below; an illustrative outline of the procedure is sketched after Figure 3.

• autoScaleInit : Initializes the PTAM of the drone and ensures that the scale accuracy is 100%.

Figure 3: Drone movement for improving scale accuracy. (a) Drone moving up. (b) Drone moving down.
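The outline below illustrates the behaviour of autoScaleInit as described above. The helper names (take_keyframe, move_relative, scale_accuracy) are hypothetical; the real command is implemented inside the tum_ardrone state machine.

```python
def auto_scale_init(drone, max_oscillations=10, target_accuracy=1.0):
    """Illustrative outline of the auto scale initialization described above
    (hypothetical helper methods on `drone`; not the actual implementation)."""
    drone.takeoff()
    drone.take_keyframe()                   # first keyframe
    drone.move_relative(z=0.10)             # move roughly 10 cm up
    drone.take_keyframe()                   # second keyframe
    for _ in range(max_oscillations):
        if drone.scale_accuracy() >= target_accuracy:
            break                           # scale considered fully accurate
        drone.move_relative(z=+0.30)        # oscillate up ...
        drone.move_relative(z=-0.30)        # ... and down; the ultrasound altimeter
                                            # anchors the metric scale estimate
    drone.hover()                           # stop and leave further correction to the user
```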
4.2 Reducing Speed

The default speeds of the drone (in the roll and pitch axes) are high, leading to jerky motion. For the problem at hand, jerky motion during the initialization procedure can result in poor tracking. Moreover, for any video recording or photo capturing application for which we intend to use the drone, it needs to move steadily and reliably from one point to another. Hence, the pitch and roll velocity magnitudes have been decreased by introducing a control factor which is multiplied with the commanded values before they are published to the drone (by ardrone_autonomy, the control driver), as sketched below.
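The idea can be sketched as follows, assuming the control commands are published as a geometry_msgs/Twist (as ardrone_autonomy expects); the factor value is illustrative, not the one used in our system.

```python
from geometry_msgs.msg import Twist

SPEED_FACTOR = 0.3   # illustrative control factor; tune for the platform

def limit_speed(cmd):
    """Scale roll/pitch commands before they are published to the driver
    (sketch of the idea, not the actual modification)."""
    limited = Twist()
    limited.linear.x = cmd.linear.x * SPEED_FACTOR   # pitch (forward/backward)
    limited.linear.y = cmd.linear.y * SPEED_FACTOR   # roll (left/right)
    limited.linear.z = cmd.linear.z                  # climb rate left unchanged
    limited.angular.z = cmd.angular.z                # yaw rate left unchanged
    return limited
```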
4.3 User Interface

In our work, we focus on the use of autonomous drones in the context of capturing panoramas of large-scale planar structures. In such an application, the drone needs to cover a user-selected region in the current frame. We created an interactive user interface which shows the image stream from the drone and allows the user to click the corners of the region that needs to be covered.

In a parallel thread, the application continuously extracts the 3-D world coordinates of the keypoints detected in the current frame. The corresponding 2-D image coordinates of these keypoints in the current frame are obtained by applying the projection matrix to their 3-D coordinates. The projection matrix is computed from the camera calibration (intrinsics) and the current pose of the drone (extrinsics).

For each user-selected point in the current frame, the nearest keypoint in 2-D image space is extracted, and that keypoint's corresponding 3-D coordinates in world space are stored as the corner point desired by the user. The selection is verified by showing the nearest detected keypoint in the UI, so that the user, if unsatisfied, can delete the last selection and re-select.
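A sketch of this selection step is given below (illustrative, not the application's code); P denotes the 3x4 projection matrix built from the camera intrinsics and the drone's current pose.

```python
import numpy as np

def nearest_keypoint(click_uv, keypoints_3d, P):
    """Project the mapped keypoints into the current frame and pick the one
    closest to the user's click (sketch of the corner-selection step)."""
    pts_h = np.hstack([keypoints_3d, np.ones((len(keypoints_3d), 1))])  # homogeneous coords
    proj = (P @ pts_h.T).T
    uv = proj[:, :2] / proj[:, 2:3]          # 2-D image coordinates of the keypoints
    idx = np.argmin(np.linalg.norm(uv - np.asarray(click_uv), axis=1))
    return keypoints_3d[idx], uv[idx]        # 3-D corner to store; 2-D point shown in the UI
```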
4.4 Convex Hull and Plane fitting

From the selected 3-D corner points, a 3-D convex hull is extracted using the popular Jarvis march algorithm [8]. An instance of the convex hull of 4 user-selected points is shown in Figure 4. After extracting the 3-D convex hull, all keypoints lying inside this hull are found and a plane is fit to these points using RANSAC [6].

Figure 4: Convex hull from 4 points

This plane is used to generate the final target points for the drone to capture a large-scale planar structure. The plane also gives us an estimate of how far the drone should move forward, ensuring the drone's safety without compromising image quality.
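A minimal RANSAC plane fit in the spirit of [6] is sketched below; the threshold and iteration count are illustrative, as the report does not fix them.

```python
import numpy as np

def ransac_plane(points, iters=200, threshold=0.02):
    """Fit a plane n.x + d = 0 to (N, 3) points by RANSAC and return the
    model with the largest inlier set (illustrative parameter values)."""
    best_plane, best_inliers = None, 0
    for _ in range(iters):
        sample = points[np.random.choice(len(points), 3, replace=False)]
        n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        if np.linalg.norm(n) < 1e-9:
            continue                          # degenerate (collinear) sample
        n = n / np.linalg.norm(n)
        d = -n @ sample[0]
        inliers = np.sum(np.abs(points @ n + d) < threshold)
        if inliers > best_inliers:
            best_plane, best_inliers = (n, d), inliers
    return best_plane
```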
4.5 Path Planning

The drone now needs to cover the user-selected region in small steps, so that there is sufficient overlap between successive captured frames without compromising image quality. This overlap is essential for stitching successive frames into a larger panoramic picture. The drone should move steadily between steps so that the pictures are captured without motion blur. This requires finding target points for the drone to move to, so that it can hover at these points and capture frames. In the process, care must be taken that only the user-selected region is covered and nothing else.

To achieve this, the selected planar region is divided into a rectangular grid with overlapping cells, where each cell is later captured as an image. The overlap between two successive grid cells in both the horizontal and vertical directions is 50%. The size of a grid cell is set manually by the user.

The coordinates of the grid cell corners (estimated using PTAM) and the camera intrinsics (obtained from camera calibration) are then used to calculate the camera extrinsics. In practice, we select more than just the grid cell corners (also the centers of a 2x2 grid inside the cell) to get more accurate camera extrinsics. From the extrinsics, we obtain the target point for the drone to move to, so that the entire grid cell, and nothing else, appears in the current frame. These target points are used to send control commands (the goto command) to the drone, so that the drone moves steadily from one point to another along each row, taking pictures which are ultimately stitched into a panorama.
For instance, in a grid as shown in Figure 5(a), the cells are AEIG, jkwv, EBHI, qxyp, rstu, xlmy, and so on. Note that successive cells in the horizontal and vertical directions have 50% overlap. Figure 5(b) shows the drone camera position (x, y, z), as computed from the camera extrinsics, required to capture just the grid cell AEIG; a simplified sketch of this extrinsics computation follows.
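The extrinsics computation can be viewed as a small PnP problem: the 3-D grid-cell corners (from PTAM) are required to land on the image corners, and solving for the camera pose that realizes this correspondence gives the target position. The sketch below is a simplified illustration using only the four cell corners and OpenCV's solvePnP, not the exact computation used in our system.

```python
import numpy as np
import cv2

def camera_pose_for_cell(cell_corners_3d, image_size, K):
    """Illustrative sketch: solve for the camera pose from which a grid cell
    exactly fills the frame. cell_corners_3d must be ordered to match the
    image corners below (top-left, top-right, bottom-right, bottom-left)."""
    w, h = image_size
    image_pts = np.array([[0, 0], [w, 0], [w, h], [0, h]], dtype=np.float64)
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(cell_corners_3d, dtype=np.float64), image_pts, K, None)
    R, _ = cv2.Rodrigues(rvec)
    cam_pos_world = (-R.T @ tvec).ravel()    # drone target position (x, y, z) in world frame
    return cam_pos_world
```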
5 Results

Figure 5: (a) Grid. (b) Camera position to capture a grid cell.

We have used our system in the context of capturing panoramas using the drone instead of the customary hand-held camera. In this section, we show some of the results obtained when our system is used in conjunction with the AR Drone 2 to capture scenes of differing size and nature. Through these results, we show the clear advantages of using a quadcopter for these tasks compared to a usual hand-held camera.
5.1 Single photo

In this scenario, we use the drone to take a picture of a single poster. To achieve this, we use the auto scale initialization procedure to localize the drone and then send an appropriate control command (goto x y z yaw, in this case) for the drone to fly to a position right in front of the poster. In Figures 6(a) and 6(b), we observe that the picture taken with a hand-held camera shows the poster at an angle, since it is taken from the ground while the poster is high up on the wall. With the drone camera, we get a shot without any inclination, which is suitable for stitching into a panorama.
5.2 Indoor planar scene

We have also tested the drone in an indoor scene where a collection of posters is put up on a wall. The user uses our UI to divide the user-selected region into a grid and generate target points, as described in Section 4.5. These target points are used to generate appropriate control commands so that the drone captures the entire wall, and the constituent frames are stitched together to form a panorama such as Figure 7.

Figure 6: Single poster pictures. (a) Hand-held camera. (b) Drone camera.

Figure 7: Indoor planar scene panorama
5.3 Outdoor planar scene

In an outdoor scene, our UI-based program does not work well, because wind affects the flight in a random fashion, which in some cases may lead to the drone getting damaged. Hence, we only use the auto scale initialization procedure to localize the drone and then move the drone manually (by sending goto commands by hand) to capture the entire scene. One such instance is shown in Figure 8(a); the ground truth is shown in Figure 8(b).

Figure 8: (a) Outdoor planar scene panorama. (b) Outdoor planar scene ground truth.
5.4 State estimation with and without PTAM

In this section, we quantify the amount of drift eliminated by using PTAM in conjunction with the IMU, compared to using only the IMU for state estimation. In this scenario, we move the drone along a fixed path and observe how well the state estimates reflect the path. The state of the drone, in this scenario, is just its world coordinates (x, y, z).

In the following graphs, we show the amount of drift seen in our own experiments, in which we observe how accurate the state estimation is with PTAM (and proper scale initialization) and without PTAM (just the IMU and manual control). Figure 9(a) shows the state estimates during a path flown by the drone under manual control, and Figure 9(b) shows the same graph from a top view to clearly show the large drift in the Y-direction. Even in Figure 9(a), we can see that there is a lot of noise in the estimated state.

Figure 9: State estimation using IMU only. (a) Front view. (b) Top view.

Figure 10(a) shows the state estimates during a path flown by the drone autonomously using PTAM, and Figure 10(b) shows the same graph from a side view to clearly show the much smaller drift in the Y-direction.

Figure 10: State estimation using IMU and PTAM (with scale initialized). (a) Front view. (b) Top view.
6 Conclusion

In this report, we presented a more autonomous and user-friendly way to navigate a quadcopter in unstructured, GPS-denied environments, requiring neither artificial landmarks nor prior knowledge of the environment. We have also ensured the safety, stability and reliability of the quadcopter's flight by decreasing flight speeds, achieving high scale accuracy and maintaining a safe distance from obstacles (by fitting a plane to the detected keypoints, which lie on the obstacles).

To examine the importance of scale accuracy and of using PTAM for localization, we have quantified the amount of drift eliminated by our approach compared to using only the IMU for localization. We have tested our system in the context of capturing panoramas in diverse scenarios, both indoor and outdoor. The resulting panoramas are close to the ground truth, without missing regions or discontinuities.

Through this work, we have stressed the importance of a drone with localization based on both visual tracking and on-board sensors to realize accurate navigation in applications such as panorama capture. This can be extended to several other applications where the pose and image data obtained from the drone can be exploited to get better results.
7 Future work

Our system is currently still not entirely error-free. Although the drift is mostly eliminated, the motion is still somewhat erratic, especially in outdoor scenarios where the wind disturbs the drone's motion. An interesting avenue to pursue would be exploring other sources of visual feedback from the drone's environment to aid its localization.

Also, our approach of dividing the region into a grid and extracting target points from the camera intrinsics fails in outdoor situations where the motion is erratic and the drone may suddenly lose track of its position for a few moments; if a command is published to the drone during such a moment, it may move erratically and possibly get damaged. Hence, our system needs to be more robust and safe when the drone flies autonomously outdoors. The goal of making drone flight completely autonomous is rich with challenges that are ready to be tackled by future researchers in this area, and our work has taken a small step towards the larger goal of fully autonomous drones.
References

[1] ardrone_autonomy. http://wiki.ros.org/ardrone_autonomy.
[2] Parrot AR Drone 2. http://ardrone2.parrot.com/.
[3] tum_ardrone. http://wiki.ros.org/tum_ardrone.
[4] Erdinç Altuğ, James P. Ostrowski, and Camillo J. Taylor. Control of a quadrotor helicopter using dual camera visual feedback. The International Journal of Robotics Research, 24(5):329–341, 2005.
[5] Jakob Engel, Jürgen Sturm, and Daniel Cremers. Accurate figure flying with a quadrocopter using onboard visual and inertial sensing. 2012.
[6] Martin A. Fischler and Robert C. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981.
[7] Daniel Gurdan, Jan Stumpf, Michael Achtelik, K.-M. Doth, Gerd Hirzinger, and Daniela Rus. Energy-efficient autonomous four-rotor flying robot controlled at 1 kHz. In Robotics and Automation, 2007 IEEE International Conference on, pages 361–366. IEEE, 2007.
[8] Ray A. Jarvis. On the identification of the convex hull of a finite set of points in the plane. Information Processing Letters, 2(1):18–21, 1973.
[9] Georg Klein and David Murray. Parallel tracking and mapping for small AR workspaces. In Mixed and Augmented Reality, 2007. ISMAR 2007. 6th IEEE and ACM International Symposium on, pages 225–234. IEEE, 2007.
[10] R. Laganière, S. Gilbert, and G. Roth. Robust object pose estimation from feature-based stereo. Instrumentation and Measurement, IEEE Transactions on, 55(4):1270–1280, 2006.