Capturing Large-scale panoramas with sparse visual features
Transcript
Capturing Large-scale panoramas with sparse visual features
Research and Development Project Report Vision based control strategies for Quadcopter Navigation Author: Anirudh Vemula Supervisor: Prof. Sharat Chandran A report submitted in partial fulfilment of the requirements for the degree of B.Tech in the Department of Computer Science and Engineering Indian Institute of Technology Bombay May 2015 Contents 1 Introduction 1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2 The Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 2 2 2 Related Work 3 3 Description of the existing system 3.1 Quadcopter . . . . . . . . . . . . 3.2 IMU . . . . . . . . . . . . . . . . 3.3 Monocular SLAM & PTAM . . . 3.4 Extended Kalman Filter . . . . . 3.5 PID Control . . . . . . . . . . . . 4 Improvements made in the system 4.1 Auto Scale Initialization . . . . . 4.2 Reducing Speed . . . . . . . . . . 4.3 User Interface . . . . . . . . . . . 4.4 Convex Hull and Plane fitting . . 4.5 Path Planning . . . . . . . . . . . 5 Results 5.1 Single photo . . . . . 5.2 Indoor planar scene . 5.3 Outdoor planar scene 5.4 State estimation with . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . and without PTAM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 4 4 5 6 6 . . . . . 7 8 9 9 10 11 . . . . 11 12 12 14 14 6 Conclusion 15 7 Future work 16 1 1 1.1 Introduction Motivation In recent years, the use of unmanned aerial vehicle (UAV) has been a major focus of active research, since they extend our capability in a variety of areas. Applications such as surveillance, medical evacuation and search-and-rescue missions are some such areas where use of UAVs is explored. The major challenge in developing UAVs is to extract useful information about its current position and orientation in a robust manner. In the aforementioned applications, human lives depend on UAVs to achieve stable flight and navigate accurately. To accomplish this, we need the vehicles to be self-aware of its own location and move reliably. In our work, we use a cheap, off-the-shelf quadcopter as testbed. The control of quadcopter during autonomous flights is dependent on several variables such as its current position, velocity and its orientation. These variables can be partly estimated using on-board inertial sensors. However, measurements from the inertial sensors are erroneous making a steadily accurate estimation of pose nearly impossible. To overcome this drawback, we need an absolute reference such as GPS for outdoor applications. In indoor environments, where no GPS is available, we resort to visual tracking methods to estimate pose of the quadcopter. Our work deals with a specific visual tracking method, Parallel tracking and mapping (PTAM) [9], applied to the problem of localizing a quadcopter in indoor and outdoor environments. 1.2 The Problem In tasks such as capturing panoramas of large structures, the use of hand-held camera is difficult as it is hard to reach distant parts of the structure to take a close, frontal shot. Instead, the use of quadcopter makes the task simpler as it can reach such distant parts of the structure and capture required shots. But, the major challenge to realize this scenario is to navigate the quadcopter accurately without compromising on both image quality and the quadcopter’s safety. We need the quadcopter to effectively cover the region such that the resulting pictures can be stitched into a panorama. The problem we tackle in this work is to accurately navigate the quadcopter according to a given flight plan. The flight of the quadcopter should be safe, stable and reliable. A flight plan would essentially entail a set of 2 points that the quadcopter needs to cover steadily in an order. We realize this goal by using PTAM in conjunction with measurements from on-board sensors, to localize the quadcopter and determine its position and orientation at all times. Our work improves on an already existing application developed by TU Munich, [3], to make it more autonomous, user-friendly and enable users to achieve more accurate navigation on the quadcopter, when applied to tasks such as using the quadcopter to take pictures of large-scale planar structures. 2 Related Work There have been several works dealing with using visual feedback to aid in the localization of aerial vehicles. An accurate pose estimation of a moving rigid object using a calibrated stereo vision setup is described in [10]. However, the real time capability of it is not sufficient for quadcopter motion control. In [7], a position controller for quadcopter is implemented using an optical tracking system at MIT. This tracking system is nevertheless very expensive and not transportable. 3D-pose estimation and control of quadcopter using only visual feedback is done in [4] based on one camera fixed on the ground and an additional camera mounted on the quadcopter facing downwards. The feature detection in this system is very sensitive to lighting condition and therefore, is not reliable enough. The paper on which our work is based on is [9], where they attempt to track the camera in scenes without any prior map and subsequently, build a long-term map in which the tracked feature points are constantly re-visited. [9] uses a hand-held camera to detect feature points in the current frame and use them to track the camera’s position at all times. Concurrently, the program builds and refines a map to include all the feature points detected so that tracking is made easy when the points are re-visited. Another important work, [5], deals with application of PTAM to the problem of quadcopter localization. In [5], quadcopter achieves localization by using PTAM in conjunction with the on-board inertial sensors to estimate the quadcopter’s pose at all times. However, the system developed in [5] is not capable of executing a given flight plan accurately and reliably. 3 3 Description of the existing system In the following sections, we describe the existing system developed by TU Munich, [3], to which we have made improvements. 3.1 Quadcopter Quadcopters are compact rotor craft air vehicles that can be used for indoor and outdoor inspection and exploration tasks. Like a conventional helicopter they can hover, however, they have other advantages such as small size, low power consumption, vertical take-off without tail rotor and dynamic manoeuvrability. In recent times, there has been an explosive growth of interest in quadcopters spurred by the availability of cheap and open-source quadcopters. Popularly, quadcopters are called drones and we will use these terms interchangeably in this report. The quadcopter applied in this work is a “Parrot AR Drone 2”, [2], with a control driver “ardrone autonomy”, [1], from ROS (Robot operating system). We used “Parrot AR Drone 2” as it is a low-cost, off-the-shelf quadcopter with a large amount of software support available for its control. All the computation is performed off-board on a ground-based laptop and no modifications to hardware or on-board software are required. The drone is equipped with a 1280x720 HD camera (92◦ diagonal FOV) which can be used to record videos or capture pictures. The drone communicates with external drivers (or in our case, the driver ardrone autonomy) using Wi-Fi. We can access the image stream sent by the drone through WiFi, using ROS and the ardrone autonomy driver, which is used by PTAM and other visual tracking methods. Note that the image sent over Wi-Fi is of 640x480 VGA quality, although the camera is of HD quality, for faster communication. 3.2 IMU The main controller board, Inertial measurement unit (IMU), contains inertial sensors consisting of three gyroscopes for each axis of rotation and three accelerometer sensors for each axis of translation. The IMU estimates the absolute roll and pitch angles by combining the acceleration vector and angular speeds given by the gyroscopes. A relative yaw angle is estimated by integrating the angular speed measured by the gyroscope corresponding to 4 yaw axis. In addition to these sensors, the drone is equipped with ultrasound sensor underneath to measure ground altitude accurately. 3.3 Monocular SLAM & PTAM Using IMU alone to localize the drone’s pose and position, w.r.t. a ground reference frame, is very inaccurate as the IMU measurements are very noisy and leads to high error. It has been observed that the drone drifts a lot when only IMU is used for localization and the estimated position differs from the actual position in the X and Y axes significantly (the Z axis estimate is accurate due to accurate ultrasound altitude sensors). The major advantage of using IMU sensors is that they provide absolute scale of the environment, which other alternative approaches such as monocular SLAM fail to provide. Since the IMU is error-prone, we need an alternative for localization. The most popular method is monocular simultaneous localization and mapping (SLAM) where the system localizes itself, w.r.t a ground reference frame, and also creates a map of its surrounding environment simultaneously. A particular challenge for monocular SLAM is, that the scale of the map needs to estimated from additional metric sensors such as IMU, as it cannot be recovered from vision alone. For monocular SLAM, the approach used in our work is PTAM, [9]. PTAM tracks the camera position by detecting feature points in the current frame without any prior map of the environment and subsequently, build a long-term map in which the detected keypoints can be constantly re-visited. The tracking and mapping procedures are done in parallel threads leading to fast, robust and accurate camera tracking, all while refining the map and expanding it as new regions are explored. In PTAM, after the map initialization, the visual map is rotated such that the xy-plane corresponds to the horizontal plane according to the accelerometer data, and scale it such that the average keypoint depth is 1. The positions of all the detected keypoints are stored in the map and can be used to estimate the position of the drone itself in future iterations. Throughout the tracking procedure, the scale of the map is estimated using the method described in [5], using IMU measurements. Once initialized, the detected keypoints are shown in the application as in Figure 1(a). The map built is shown in the application similar to Figure 1(b). 5 (a) Camera feed with detected keypoints (b) PTAM 3-D Map view Figure 1: Screenshots of the PTAM application, [5] 3.4 Extended Kalman Filter The extended kalman filter (EKF) is used to fuse all available data from the IMU sensors and PTAM, regarding pose estimates of the drone. It includes the full motion model of the quadcopter’s flight dynamics and reaction to control commands. Given an initial pose estimate and the control command, the EKF, using sensor data and SLAM data, provides us with the new pose and velocity estimates. The elimination of drift obtained by using monocular SLAM in conjunction with IMU through EKF is shown in Figure 2, [5]. In the first graph, we compare the path estimated by both EKF and raw IMU values with the groundtruth (target trajectory). In the second graph, the drone’s target position is fixed (in this case, x = 0m, y = 0m) and it is repeatedly pushed away from its target. We can observe in case of EKF it shows that it returns to the target position again but in the case of raw IMU, the estimated state is entirely erroneous showing the final position to be very far from the target position. 3.5 PID Control Based on the position and velocity estimates from the EKF, we apply PID (proportional-integral-derivative) control to steer the quadcopter towards the desired goal location. In PID control, the error is measured as the distance between the current measured position and the desired position and this 6 Figure 2: Elimination of odometry drift through SLAM and EKF, picture courtesy of [5] error is minimized. For each of the four degrees of freedom, we employ a separate PID controller. The existing system allows us to autonomously control the drone by providing a short scripting language using which we can issue control commands to the drone. Some of the commands, which are essential to our work, are listed below: • takeoff : Command the drone to perform a vertical take-off • goto x y z yaw : Command the drone to go to the position given by (x, y, z) with the given yaw. • autoInit : Auto-initialization of PTAM for the drone • moveByRel x y z yaw : From the current position, move by x, y, z units in respective directions and by yaw in the yaw axis. 4 Improvements made in the system Although, the existing system has been shown to eliminate odometry drift, given a flight plan the navigation of the quadcopter is inaccurate and erratic. The prominent reasons for such behaviour is the absence of an initialization procedure that ensures high scale accuracy and slow motion speed to avoid jerky motion. 7 4.1 Auto Scale Initialization The current initialization of PTAM in the existing system, involves the drone capturing two keyframes in which sufficient keypoints are detected across both of them. The distance between the camera positions for these two keyframes must be small (of the order of 10cm), so that there are enough correspondences across the two frames. Once it can detect sufficient number of keypoints, the initialization procedure ends and the drone hovers at its current position. This procedure ensures just good PTAM map quality and not high scale accuracy, but we need high scale accuracy to reliably estimate the pose of the quadcopter. The scale inaccuracy also leads to inaccurate navigation by the quadcopter, which may end up damaging the quadcopter itself. To increase the scale accuracy, we have developed an auto scale initialization procedure which tries to improve the scale accuracy to 100%, thereby correcting the map and enabling accurate navigation of the drone. Also note that when the scale accuracy is maximum, the 3-D world coordinates of the detected keypoints are also very accurate and hence, can be used in further computation without any error. Our procedure starts with the vertical take-off of the drone and then capture two keyframes differing by 10cm in height. Once we are done taking two keyframes with enough number of keypoints detected, we proceed to move the drone up and down in an oscillatory fashion until the scale accuracy reaches 100%. The up and down movement of the drone corrects the scale as the ultrasonic sensor used to measure altitude is relatively accurate. The scale estimation method, described in [5], uses the measurement from this sensor to compute the scale of the map created. For practical purposes, we have also included a maximum number of oscillations as in some cases, the drone may never reach maximum accuracy only by up-down motion. So, when the drone oscillates sufficient number of times and the scale is still inaccurate, we stop the procedure and it is left to the user to improve the accuracy (A good way is to displace the drone forward/backward and start the procedure again). The drone movement for auto scale initialization are shown in Figure 3(a) and 3(b). To aid users to perform this procedure autonomously, we have created a command in the scripting language, described in the above section, as below: • autoScaleInit : Initializes the PTAM of drone and ensures that the 8 (a) Drone moving up (b) Drone moving down Figure 3: Drone movement for improving scale accuracy scale accuracy is 100% 4.2 Reducing Speed The default speeds of the drone (in roll and pitch axes) are high, leading to jerky motion. For the problem at hand, the jerky motion in the initialization procedure can end up with poor tracking capability. Also, for any video recording or photo capturing application that we intend to use the drone for, we need it to reliably move from one point to another point steadily. Hence, the pitch and roll velocity magnitudes have been decreased, by introducing a control factor which is multiplied to the values before being published to the drone (by ardrone autonomy, the control driver) 4.3 User Interface In our work, we focus on the use of autonomous drones in the context of capturing panoramas of large-scale planar structures. In such an application, the drone needs to cover a user-selected region in the current frame. We created a interactive user interface which shows the image stream from the drone and allows the user to click corners of the region that needs to be covered. In a parallel thread, the application always extracts the 3-D world coordinates of the keypoints detected in current frame. The corresponding 2-D image coordinates of these keypoints, in the current frame, are obtained by 9 Figure 4: Convex hull from 4 points applying the projection matrix to their 3-D coordinates. The projection matrix is computed from camera calibration (intrinsics) and current pose of the drone (extrinsics). For each user-selected point in the current frame, the nearest keypoint in 2-D image space is extracted and the keypoint’s corresponding 3-D coordinates in world space is stored, as the corner point desired by the user. The selection is verified by showing the nearest keypoint detected in the UI, so that the user, if unsatisfied, can delete his last selection and re-select. 4.4 Convex Hull and Plane fitting From the selected 3D corner points, a 3-D convex hull is extracted using the popular Jarvis march algorithm, [8]. An instance of the convex hull of 4 user-selected points is shown in Figure 4. After extracting the 3D convex hull, all the keypoints lying inside this hull are found and a plane is fit to these points using RANSAC, [6]. This plane is used to generate the final target points, for the drone to capture a large-scale planar structure. The plane also gives us an estimate of how far the drone should move forward, to ensure the drone’s safety without compromising on image quality. 10 4.5 Path Planning The drone now needs to cover the user-selected region in small steps so that there is sufficient overlap between successive captured frames without compromising on image quality. This overlap is essential for stitching successive frames into a larger panoramic picture. The drone should move steadily between each step to capture the pictures without motion blur. This requires us to find the target points for the drone to move to, so that it can hover at these points and capture frames. In the process, care must be taken so that only the user-selected region must be covered and nothing else. To achieve this, the planar region selected is divided into a rectangular grid with overlapping cells, where each cell is captured as an image later. The overlap between two successive grid cells in both the horizontal and vertical direction in this rectangular grid is 50%. The size of the grid cell is set manually by the user. The coordinates of the grid cell corners (estimated using PTAM) and the camera intrinsics (obtained from camera calibration) are then used to calculate the camera extrinsics. In practice, we select more than just the grid cell corners (centers of 2x2 grid inside this cell) to get more accurate camera extrinsics. From the extrinsics, we get the target point for the drone to move to, so that the entire grid cell is shown in the current frame without any other background in the current frame. These target points are used to send control commands (the goto command) to the drone, so that the drone moves from one point to another along each row steadily taking pictures, which are ultimately stitched to a panorama. For instance, in a grid as shown in Figure 5(a) the cells are AEIG, jkwv, EBHI, qxyp, rstu, xlmy, . . . . Note that successive cells in horizontal and vertical direction have 50% overlap. Figure 5(b) shows where the drone camera position (x,y,z) as computed by camera extrinsics, to capture just the grid cell AEIG. 5 Results We have used our system in the context of capturing panoramas using the drone instead of the customary camera. In this section, we show some of the results obtained when our system is used in conjunction with AR Drone2, to capture scenes of differing size and nature. Through these results, we show 11 (a) Grid (b) Camera position to capture grid cell Figure 5: Single poster pictures the obvious advantages that we get from using a quadcopter in these tasks when compared to a usual hand-held camera. 5.1 Single photo In this scenario, we use the drone to take the picture of a single poster. To achieve this, we use the auto-scale initialization procedure to localize the drone and then send appropriate control command (goto x y z yaw, in this case) for the drone to fly to the position right in front of the poster. In Figure 6(a) and 6(b), we observe that in the picture taken with a handheld camera the poster is shown with an angle as it is taken from ground, whereas the poster is far up on the wall. But with the drone camera, we get a shot which doesn’t have any inclination and is suitable for stitching into a panorama. 5.2 Indoor planar scene We have also tested out the drone in an indoor scene where there are a collection of posters put up on the wall and the user uses our UI to divide the user-selected region into a grid and generate target points, as described in the section 4.5. These target points are used to generate appropriate control commands so that the drone captures the entire wall and the constituent frames are stitched together to form a panorama such as Figure 7. 12 (a) Hand-held Camera (b) Drone Camera Figure 6: Single poster pictures Figure 7: Indoor planar scene panorama 13 (a) Outdoor planar scene panorama (b) Outdoor planar scene ground truth Figure 8: Single poster pictures 5.3 Outdoor planar scene In an outdoor scene, our UI program doesn’t work well due to the effect of wind which will affect the flight in a random fashion which may lead to the drone getting damaged in some cases. Hence, we just use the auto scale initialization procedure to localize the drone and then proceed to move the drone manually (by sending goto commands manually) to capture the entire scene. One such instance is shown in Figure 8(a). The ground truth is shown in Figure 8(b). 5.4 State estimation with and without PTAM In this section, we quantify the amount of drift eliminated by using PTAM in conjunction with IMU, when compared to using only IMU for state estimation. In the current scenario, we move the drone along a fixed path 14 (a) Front view (b) Top view Figure 9: State estimation using IMU only and observe how well the state estimate values reflect the path. State of the drone, in this scenario, is just its world coordinates (x,y,z). In the following graphs, we show the amount of drift that is seen in our own experiments where we observe how accurate state estimation is, when used with PTAM (and proper scale initialization) and without PTAM (just IMU and manual control). Figure 9(a) shows the state estimations during a path taken by the drone in manual control and Figure 9(b) shows the same graph from a top view to show the high amount of drift in Y-direction clearly. Even in Figure 9(a), we can see that there is a lot of noise in the state estimated. Figure 10(a) shows the state estimations during a path taken by the drone using PTAM autonomously and Figure 10(b) shows the same graph from a side view to show the less amount of drift in Y-direction clearly. 6 Conclusion In this report, we presented a more autonomous and user-friendly way to navigate quadcopter in unstructured, GPS-denied environments and does not require artificial landmarks nor prior knowledge of the environment. We have also ensured the safety, stability and reliability of the quadcopter’s flight by decreasing flight speeds, achieving high scale accuracy and maintaining a safe distance from obstacles (by fitting plane to the keypoints detected, 15 (a) Front view (b) Top view Figure 10: State estimation using IMU and PTAM (with scale initialized) which are part of obstacles). To examine the importance of scale accuracy and using PTAM for localization, we have quantified the amount of drift that is eliminated by using our approach when compared to using only IMU for localization. We have tested our system in the context of capturing panoramas, in diverse scenarios: indoor and outdoor scenes. The resulting panoramas are close to the ground truth without any missing regions and discontinuities. Using this work, we have stressed the importance of using drone with localization based on both visual tracking and on-board sensors to realize accurate navigation, in applications such as panorama capture. This can be extended to several other applications where the pose and image data obtained from the drone can be exploited to get better results. 7 Future work Our system currently is still not entirely error-prone. Although, the drift is mostly eliminated, the motion is still a little erratic especially in outdoor scenarios, where the wind disturbs the drone’s motion. An interesting avenue to pursue would be exploring other sources of visual feedback that we can obtain from the drone’s environment to aid in its localization. Also, our approach of dividing the region into grid and extracting target points from camera intrinsics fails in case of outdoor situation where the 16 motion is erratic and the drone may lose track of its position suddenly for a few moments, in which if a command is published to the drone, it would move erratically and may possibly get damaged. Hence, our system needs to be more robust and safe in case of outdoor situations when the drone is autonomous. The goal of making the drone flight completely autonomous is rich with challenges that are ready to be tackled by future researchers in this area and our work has taken a small step towards the larger goal of fully autonomous drones. References [1] Ardrone autonomy. http://wiki.ros.org/ardrone_autonomy. [2] Parrot ar drone2. http://ardrone2.parrot.com/. [3] Tum ardrone. http://wiki.ros.org/tum_ardrone. [4] Erdinç Altuğ, James P Ostrowski, and Camillo J Taylor. Control of a quadrotor helicopter using dual camera visual feedback. The International Journal of Robotics Research, 24(5):329–341, 2005. [5] Jakob Engel, Jürgen Sturm, and Daniel Cremers. Accurate figure flying with a quadrocopter using onboard visual and inertial sensing. IMU, 320:240, 2012. [6] Martin A Fischler and Robert C Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981. [7] Daniel Gurdan, Jan Stumpf, Michael Achtelik, K-M Doth, Gerd Hirzinger, and Daniela Rus. Energy-efficient autonomous four-rotor flying robot controlled at 1 khz. In Robotics and Automation, 2007 IEEE International Conference on, pages 361–366. IEEE, 2007. [8] Ray A Jarvis. On the identification of the convex hull of a finite set of points in the plane. Information Processing Letters, 2(1):18–21, 1973. [9] Georg Klein and David Murray. Parallel tracking and mapping for small ar workspaces. In Mixed and Augmented Reality, 2007. ISMAR 2007. 17 6th IEEE and ACM International Symposium on, pages 225–234. IEEE, 2007. [10] R Laganiere, S Gilbert, and G Roth. Robust object pose estimation from feature-based stereo. Instrumentation and Measurement, IEEE Transactions on, 55(4):1270–1280, 2006. 18