A sparser reduced set density estimator by introducing weighted l1 penalty termby Gang Yang, Yan Wang, Lei Guo

Pattern Recognition Letters

About

Year
2015
DOI
10.1016/j.patrec.2015.01.016
Subject
Artificial Intelligence / Computer Vision and Pattern Recognition / Signal Processing / Software

Text

Accepted Manuscript

A sparser reduced set density estimator by introducing weighted l1 penalty term

Gang Yang , Yan Wang , Lei Guo

PII: S0167-8655(15)00042-2

DOI: 10.1016/j.patrec.2015.01.016

Reference: PATREC 6167

To appear in: Pattern Recognition Letters

Received date: 9 April 2014

Accepted date: 26 January 2015

Please cite this article as: Gang Yang , Yan Wang , Lei Guo , A sparser reduced set density estimator by introducing weighted l1 penalty term, Pattern Recognition Letters (2015), doi: 10.1016/j.patrec.2015.01.016

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT

AC

CE

PT

ED

M

AN

US

CR

IP

T 1

Highlights • Adding the regularization term to the reduced set density estimator will increase sparsity and decrease the complexity. • Weighted 1l norm minimization show superior performance in increasing sparsity while obtaining remarkable accuracy. • An iterative algorithm is proposed to solve the corresponding convex optimization problem efficiently. • The proper choice of parameters and the trade-off between sparsity and accuracy in the algorithm are discussed.

ACCEPTED MANUSCRIPT

AC

CE

PT

ED

M

AN

US

CR

IP

T 1

Pattern Recognition Letters j our nal hom epage: w w w . elsev ier . c om

A sparser reduced set density estimator by introducing weighted 1 l penalty term

Gang Yang a , Yan Wang a, , and Lei Guob a Intelligent Control Laboratory, School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China b Science and Technology on Aircraft Control Laboratory, School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China  Corresponding author. Tel.: (+86)135-2187-9682; e-mail: w-yan@buaa.edu.cn

AB ST R ACT

Reduced set density estimator (RSDE), employing a small percentage of available data samples, is an effective and important nonparametric technique for probability density function estimation. Despite that RSDE has been demonstrated to perform better in the computational complexity compared with several existing approaches, it still faces the critical challenge in practical applications when training the estimator on large data sets. Dealing with its high complexity both in time and space, a sparser reduced set density estimator with weighted 1 l penalty term (RSDE-WL1) is proposed in this paper. By introducing the weighted 1l norm of the estimated coefficients as additional penalty term, the sparse weights of density estimation are estimated, in which small weight coefficients are more likely to be driven to zero. The proposed iterative algorithm is used to solve the corresponding convex optimization problem efficiently. Some discussions about the choice of parameters and the trade-off between sparsity and accuracy are also given. The simulations of several examples demonstrate that the proposed RSDE-WL1 is superior to the related methods including the RSDE in sparsity and complexity. While requiring less number of kernels in density estimation, our approach is comparable to the full-sample optimized Parzen Window estimator in terms of test accuracy.

Keywords: Reduced set density estimator; Weighted 1 l penalty term; Sequential minimal optimization; Sparsity 2014 Elsevier Ltd. All rights reserved.

ACCEPTED MANUSCRIPT

AC

CE

PT

ED

M

AN

US

CR

IP

T 2 1. Introduction

The estimation of probability density function from observed data samples of unknown distributions is a fundamental problem in machine learning, pattern recognition, data analysis and many fields of engineering [1,2]. It is known that both the parametric and nonparametric methods can be used to estimate the density.

The parametric methods assume that the samples are drawn from the identical distribution with unknown parameters to be estimated. However, in many practical applications, it is often difficult to obtain the prior assumption of the underlying distribution. On the contrary, nonparametric methods can be achieved only from given training samples without introducing any prior knowledge of underlying density [3]. Due to the merits above, nonparametric techniques have attracted much attention, as they can be utilized to estimate the density of arbitrary shapes as long as adequate samples are given.

The Kernel Density Estimation (KDE), referred to as the

Parzen Window (PW) estimator, is a simple but remarkably accurate nonparametric density estimation method [4]. KDE is frequently used for machine learning and computer vision problems, such as classification [5], clustering [6], independent component analysis [7], background subtraction [8] and image segmentation [9]. The celebrated PW estimator can be regarded as a special case of the finite mixture model (FMM) [10], in which the number of mixtures is equivalent to that of training data samples and all the mixing weights are equal. Because of using the entire samples, the PW estimator suffers from high computational cost both in time and space, and becomes intractable for subsequent use.

To improve the computational efficiency, there has been a considerable interest into research on the sparse KDE. The support vector machine (SVM) estimator has been proposed by

Weston et al. [11, 12], in which the problem of density estimation is formulated into a supervised learning model. The optimization in SVM method is to solve a constrained quadratic optimization problem, which induce sparsity by the nature of SVM. Girolami and He [13] present the reduced set density estimator (RSDE) to solve the problem by constructing a sparse KDE that employs a small percentage of available data samples. It is optimal in the 2L sense in that the integrated square error between the unknown true density and the RSDE is minimized in training the estimator.