538 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 24, NO. 2, FEBRUARY 2015
Fusing Inertial Sensor Data in an Extended
Kalman Filter for 3D Camera Tracking
Arif Tanju Erdem, Member, IEEE, and Ali Özer Ercan, Member, IEEE
Abstract— In a setup where camera measurements are used to estimate 3D egomotion in an extended Kalman filter (EKF) framework, it is well known that inertial sensors (i.e., accelerometers and gyroscopes) are especially useful when the camera undergoes fast motion. Inertial sensor data can be fused in the EKF with the camera measurements either in the correction stage (as measurement inputs) or in the prediction stage (as control inputs). In the literature, typically only one type of inertial sensor is employed in the EKF, or when both are employed, they are fused in the same stage. In this paper, we provide an extensive performance comparison of every possible combination of fusing accelerometer and gyroscope data as control or measurement inputs, using the same data set collected at different motion speeds. In particular, we compare the performance of the different approaches based on 3D pose errors, in addition to the camera reprojection errors commonly reported in the literature, which provides further insight into the strengths and weaknesses of each approach. We show, using both simulated and real data, that it is always better to fuse both sensors in the measurement stage, and that in particular, the accelerometer helps more with 3D position tracking accuracy, whereas the gyroscope helps more with 3D orientation tracking accuracy. We also propose a simulated data generation method, which is beneficial for the design and validation of tracking algorithms involving both camera and inertial measurement unit measurements in general.
Index Terms— Inertial sensor fusion, extended Kalman filter, 3D camera tracking, inertial measurement unit, accelerometer, gyroscope.
I. INTRODUCTION

ACCURATE 3D tracking is important for many applications, including navigation, visualization, human-computer interaction, and augmented reality. Although various methods have been proposed for 3D tracking, those that use GPS or cellular technologies are not suitable for indoor applications. Methods using IR light and RF signals require the placement of IR light emitters or RFID tags in the scene, which may not be acceptable, or even possible, for some applications such as cultural heritage. Computer-vision-based tracking methods that rely on camera measurements alone do not suffer from these problems, but they perform well only under slow motion. Fast camera motion may result in blurred features that cannot be localized accurately, thereby degrading tracking accuracy. Inertial sensors (i.e., accelerometers and gyroscopes), on the other hand, measure the derivatives of motion, and their signals are more reliable under fast motion since their SNR improves with the amount of motion. However, 3D pose estimation using inertial sensors alone suffers from drift. Thus, it is suggested to fuse inertial sensor data with camera measurements for 3D tracking.

Manuscript received August 5, 2013; revised December 31, 2013, June 21, 2014, and September 28, 2014; accepted November 26, 2014. Date of publication December 12, 2014; date of current version January 8, 2015. This work was supported by the Scientific and Technological Research Council of Turkey under Grant EEEAG-110E053. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Dimitrios Tzovaras.

The authors are with Özyeğin University, Istanbul 34662, Turkey (e-mail: email@example.com; firstname.lastname@example.org).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIP.2014.2380176
There are many approaches in the literature to fusing inertial sensor data with camera data. One popular approach is to fuse them in an extended Kalman filter (EKF). An EKF has two stages, namely the time update (i.e., prediction) stage and the measurement update (i.e., correction) stage. Hence, there are two alternative ways to fuse inertial sensor data in an EKF: the first option is to use the inertial sensor data in the correction stage, which we refer to as using the data as measurement input; the second option is to use the inertial sensor data in the prediction stage, which we refer to as using the data as control input. Therefore, there are a total of eight possible approaches for fusing accelerometer and gyroscope data in an EKF framework: both used as control inputs; both used as measurement inputs; one used as control input while the other is used as measurement input; and finally, only one used as control or measurement input while the other is not used at all.
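The distinction between the two fusion options can be illustrated with a toy one-axis example (this is a simplified sketch for intuition, not the paper's full 3D filter; all numeric values are illustrative). When the gyroscope is a control input, its reading drives the motion model in the prediction stage and its noise enters as process noise; when it is a measurement input, the state carries the angular rate, prediction uses a generic constant-rate model, and the reading is fused in the correction stage:

```python
import numpy as np

dt = 0.01  # sample period in seconds (illustrative value)

def predict_control(x, P, w_meas, q_gyro):
    """Gyro as CONTROL input: state x = [theta]; the reading drives the model."""
    x = x + np.array([w_meas * dt])          # integrate the measured rate
    P = P + np.array([[q_gyro * dt**2]])     # gyro noise enters as process noise
    return x, P

def predict_const_rate(x, P, q):
    """Generic constant-rate model: state x = [theta, omega]; no sensor used here."""
    F = np.array([[1.0, dt], [0.0, 1.0]])
    return F @ x, F @ P @ F.T + q * np.eye(2)

def correct(x, P, z, H, R):
    """Standard Kalman correction; gyro as MEASUREMENT input uses H = [0, 1]."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

# Toy run: constant true rotation rate observed through a noisy gyro.
rng = np.random.default_rng(0)
true_rate, sigma_g = 0.5, 0.05
x1, P1 = np.zeros(1), np.eye(1)          # filter with gyro as control input
x2, P2 = np.zeros(2), np.eye(2)          # filter with gyro as measurement input
H_gyro = np.array([[0.0, 1.0]])          # the gyro observes omega only
for _ in range(1000):
    w = true_rate + sigma_g * rng.standard_normal()
    x1, P1 = predict_control(x1, P1, w, sigma_g**2)
    x2, P2 = predict_const_rate(x2, P2, 1e-6)
    x2, P2 = correct(x2, P2, np.array([w]), H_gyro, np.array([[sigma_g**2]]))

# Both angle estimates should approach true_rate * 1000 * dt = 5.0 rad.
print(x1[0], x2[0], x2[1])
```

Both filters track the angle here; the comparison in this paper concerns which arrangement performs better once camera measurements, biases, and full 3D dynamics are involved.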
Five of the above eight combinations have been investigated in the literature, namely: fusing both inertial sensor data streams as measurement or as control inputs, fusing only accelerometer data as measurement or as control input, and fusing only gyroscope data as measurement input. One of these works compares three of these cases (both sensors fused as measurement inputs, both fused as control inputs, and only gyroscope data fused as measurement input) and suggests that all three perform similarly well at fast and slow speeds, except that the gyroscope-only measurement-input case results in poor tracking quality at fast speeds. Another compares two cases (both sensors fused as measurement inputs versus only gyroscope data fused as measurement input) and concludes that fusing both sensors significantly outperforms the gyroscope-only case. Our previous work fuses only accelerometer data, and suggests that fusing accelerometer data either as measurement or as control input brings about a similar improvement in tracking accuracy.
To the best of our knowledge, the three remaining cases,

1057-7149 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.