CAES Lezione 3 - Elementi di Teoria della Stima

Transcript

Prof. Michele Scarpiniti
Dip. INFOCOM - “Sapienza” Università di Roma
http://ispac.ing.uniroma1.it/scarpiniti/index.htm
[email protected]
Rome, 10 March 2010
Outline
1. Introduction
   - Basic Definition
   - Classical and Bayesian estimation

2. Statistical Models
   - AR Model
   - MA Model
   - ARMA Model
Introduction
Introduction to Estimation Theory
In several applications the statistical properties (cdf, pdf, acf, PSD, etc.) of some stochastic processes are not known, so it is very important to estimate such properties from measured data. Estimation theory deals with this kind of problem.

Given a SP x[n], let us pose x = [x[n]]_{n=0}^{N−1}, a sequence of N values of x[n], and suppose we want to estimate a statistical parameter θ ∈ Θ, where Θ is the parameter space, using a function h(·), called the estimator, whose output is denoted by θ̂:

    θ̂ = h(x).

In general the problem is the estimation of a set of L unknown parameters θ = [θ[l]]_{l=0}^{L−1} from a series of N observations x = [x[n]]_{n=0}^{N−1}, by means of an estimation function or estimator h(·), such that θ̂ = h(x).

Summarizing:

1. θ ∈ Θ is the vector of the parameters that we want to estimate. It can be a random variable or an unknown deterministic constant;
2. h(x) is the estimator (the rule that estimates the parameters from the observations);
3. θ̂ is the result of the estimation, θ̂ = h(x). It is always a RV.
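To make the notation concrete, here is a minimal Python sketch (an assumed example, not from the slides) of an estimator θ̂ = h(x): the sample mean of N observations, with hypothetical Gaussian data generated via NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: N observations x[0..N-1] of a SP whose unknown
# deterministic mean theta we want to estimate.
theta_true = 2.0
N = 1000
x = theta_true + rng.standard_normal(N)

# The estimator h(.): the rule mapping the observations to the estimate.
def h(x):
    return np.mean(x)   # theta_hat = h(x), itself a RV over realizations

theta_hat = h(x)
print(f"theta = {theta_true}, theta_hat = {theta_hat:.4f}")
```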
Introduction to Estimation Theory
The estimator is a RV that can be described by the sampling distribution f_{x;θ}(x; θ). The sampling distribution gives information on the goodness of an estimator: in fact a good estimator is concentrated around the true value and has minimum variance.

If θ is a deterministic parameter, we have the classical estimation theory. In this case θ represents a parametric dependence of f_{x;θ}(x; θ) on the measured data x.

If θ is a stochastic parameter, characterized by its own pdf f_θ(θ), which collects all the a priori knowledge and is known as the a priori pdf, we have the Bayesian estimation theory. Then the joint distribution can be factored, using Bayes’ theorem, as

    f_{x,θ}(x, θ) = f_{x|θ}(x|θ) f_θ(θ) = f_{θ|x}(θ|x) f_x(x)

where f_{x|θ}(x|θ) is the conditional pdf, which represents the knowledge taken from the data x conditioned on the knowledge of θ.
Unbiased estimator
The sampling distribution f_{x;θ}(x; θ) is not always known: in this sense we can use the expectation E{θ̂}, the variance var{θ̂} (or σ²_θ̂) and the mean square error mse{θ̂} as indirect measures of the goodness of the estimator.

An estimator is said to be unbiased if

    E(θ̂) = θ

and the difference

    b(θ̂) ≜ E(θ̂) − θ

is defined as the bias.

Remark
The bias can be due to a systematic error, i.e. a measurement error. An unbiased estimator is not necessarily a “good” estimator.
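As a numerical illustration (an assumed example, not from the slides), the 1/N sample variance is a biased estimator of the variance, while the 1/(N−1) version is unbiased; averaging over many realizations approximates E(θ̂).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative check: the sample variance with 1/N is biased, while the
# 1/(N-1) version is unbiased.
theta_true = 4.0          # true variance of the data
N, trials = 10, 100_000
x = 2.0 * rng.standard_normal((trials, N))   # Gaussian samples, variance 4

var_biased   = np.mean((x - x.mean(axis=1, keepdims=True))**2, axis=1)
var_unbiased = var_biased * N / (N - 1)

# E(theta_hat) approximated by averaging over many trials
print("E{biased}   ~", var_biased.mean())    # below 4: bias b = -theta/N
print("E{unbiased} ~", var_unbiased.mean())  # close to 4: zero bias
```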
Variance and MSE of an estimator
The variance of an estimator is defined as

    var(θ̂) ≜ σ²_θ̂ = E{|θ̂ − E(θ̂)|²}

which measures the dispersion of the pdf of θ̂ around its mean value.

The mean square error (MSE) of an estimator is defined as

    mse(θ̂) = E{|θ̂ − θ|²}

where θ is the true value of the parameter. It measures the mean quadratic dispersion of the estimator from the true value. It can be decomposed as

    mse(θ̂) = σ²_θ̂ + |b(θ̂)|²

Proof.

    E{|θ̂ − θ|²} = E{|θ̂ − θ + E(θ̂) − E(θ̂)|²} = E{|[θ̂ − E(θ̂)] + [E(θ̂) − θ]|²}
                = E{|θ̂ − E(θ̂)|²} + |E(θ̂) − θ|² = σ²_θ̂ + |b(θ̂)|²

where the cross term vanishes because E(θ̂) − θ is deterministic and E{θ̂ − E(θ̂)} = 0.
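A quick numerical check (same assumed setup as above, not from the slides) that mse(θ̂) = σ²_θ̂ + |b(θ̂)|² holds for the biased 1/N sample-variance estimator:

```python
import numpy as np

rng = np.random.default_rng(1)

# Numerical check of mse = var + |bias|^2 for the 1/N sample variance.
theta_true, N, trials = 4.0, 10, 200_000
x = 2.0 * rng.standard_normal((trials, N))
theta_hat = np.mean((x - x.mean(axis=1, keepdims=True))**2, axis=1)

mse  = np.mean((theta_hat - theta_true)**2)
var  = np.var(theta_hat)
bias = np.mean(theta_hat) - theta_true

print(f"mse = {mse:.4f}, var + bias^2 = {var + bias**2:.4f}")  # nearly equal
```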
Minimum variance unbiased (MVU) estimator
Ideally we want an estimator with zero MSE. Unfortunately such an estimator does not exist in general: the estimator minimizing the MSE can depend on the unknown parameter θ itself, and is therefore not realizable. In this sense the best estimator is taken to be not the one with minimum MSE, but the one with zero bias and, among these, minimum variance, so that

    mse(θ̂) = σ²_θ̂

This estimator is called the minimum variance unbiased (MVU) estimator. Such an estimator, however, may not exist.

Remark
A good estimator should be unbiased and have minimum variance. These two properties often conflict: reducing the variance, the bias can increase. This situation is known as the bias-variance trade-off.
Consistent estimator
An estimator is said to be weakly consistent if, increasing the length N of the sample, we have

    lim_{N→∞} P{|h(x) − θ| > ε} = 0,  ∀ε > 0

An estimator is said to be strongly consistent if, increasing the length N of the sample, we have

    P{ lim_{N→∞} h(x) = θ } = 1

A sufficient condition for weak consistency is that

    lim_{N→∞} E{h(x)} = θ,    lim_{N→∞} var{h(x)} = 0.

In this way the sampling distribution tends to an impulse around the true value of the parameter.
Confidence interval
A confidence interval (CI) is a particular kind of interval estimate of a parameter. Instead of estimating the parameter by a single value, an interval likely to include the parameter is given. Moreover, increasing the data length N, the sampling distribution tends to a Gaussian distribution by the central limit theorem.

Once the sampling distribution is known, it is possible to evaluate the probability of an interval (−∆, ∆).

This interval, the confidence interval, indicates that the estimator θ̂ falls within the interval (−∆, ∆) around θ with probability 1 − β, i.e. with confidence (1 − β) · 100%.
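A sketch of a confidence-interval computation (an assumed example), for Gaussian data with known σ so that the sampling distribution of the sample mean is exactly Gaussian; stats.norm.ppf gives the half-width ∆ for confidence (1 − β)·100%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical example: 95% CI for the mean of Gaussian data, known sigma.
sigma, N, beta = 2.0, 100, 0.05
x = 5.0 + sigma * rng.standard_normal(N)

theta_hat = x.mean()
delta = stats.norm.ppf(1 - beta / 2) * sigma / np.sqrt(N)  # half-width delta

print(f"theta_hat = {theta_hat:.3f}, "
      f"{(1 - beta) * 100:.0f}% CI = ({theta_hat - delta:.3f}, {theta_hat + delta:.3f})")
```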
Classical and Bayesian estimation
Now we can analyze several methods for the estimation of the unknown parameters. As we have said, there exist two approaches to estimation theory:

Classical estimation theory, where the parameter θ is deterministic. In particular we will introduce:
1. Maximum likelihood or ML estimation.

Bayesian estimation theory, where the parameter θ is a random variable. In particular we will introduce:
1. Maximum a posteriori or MAP estimation;
2. Minimum mean square error or MMSE estimation;
3. Minimum absolute error or MAE estimation.
Maximum likelihood (ML) estimation
The maximum likelihood (ML) estimation consists in the determination of θ_ML through the maximization of the sampling distribution f_{x;θ}(x; θ), here called the likelihood function L_θ:

    L_θ = f_{x;θ}(x; θ)

Let us note that if f_{x;θ}(x; θ₁) > f_{x;θ}(x; θ₂) then θ₁ is “more plausible” than θ₂. In this way the ML paradigm states that the estimate θ_ML is the most plausible given the observations x. Usually one considers the natural logarithm of the likelihood, ln L_θ = ln f_{x;θ}(x; θ). Then the estimate is obtained as

    θ_ML = arg max_{θ∈Θ} {ln L_θ}

that is, solving the following equation:

    θ_ML ≜ θ such that ∂ ln f_{x;θ}(x; θ) / ∂θ = 0
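A small sketch of ML estimation for an assumed model (i.i.d. Gaussian data with known variance and unknown mean θ): maximizing ln L_θ on a grid agrees with the closed-form solution of ∂ ln f/∂θ = 0, which is the sample mean.

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed model: x[n] = theta + Gaussian noise, sigma known.
sigma, N = 1.5, 500
theta_true = 3.0
x = theta_true + sigma * rng.standard_normal(N)

def log_likelihood(theta):
    # ln L_theta = ln f(x; theta) for i.i.d. Gaussian data
    return -0.5 * np.sum((x - theta)**2) / sigma**2 \
           - N * np.log(np.sqrt(2 * np.pi) * sigma)

# Numerical maximization on a grid vs. the closed form x.mean()
grid = np.linspace(0, 6, 6001)
theta_ml = grid[np.argmax([log_likelihood(t) for t in grid])]
print("grid ML:", theta_ml, " closed form:", x.mean())
```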
Maximum a posteriori (MAP) estimation
In the maximum a posteriori (MAP) estimation the parameter θ is characterized by an a priori pdf f_θ(θ). The knowledge given by measures on the data modifies this probability, conditioning it on the data x itself: f_{θ|x}(θ|x), known as the pdf of θ conditioned a posteriori by the measures x. The MAP estimation consists in the determination of the maximum of the a posteriori pdf f_{θ|x}(θ|x). Usually one considers the natural logarithm of this pdf:

    θ_MAP ≜ θ such that ∂ ln f_{θ|x}(θ|x) / ∂θ = 0

Now, by Bayes’ theorem,

    f_{θ|x}(θ|x) = f_{x|θ}(x|θ) f_θ(θ) / f_x(x)

and because f_x(x) does not depend on θ, we have

    θ_MAP ≜ θ such that ∂/∂θ [ln f_{x|θ}(x|θ) + ln f_θ(θ)] = 0
Minimum mean square error (MMSE) estimation
In the minimum mean square error (MMSE) estimation the target is the minimization of the MSE

    mse(θ̂) = E{|h(x) − θ|²} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} |h(x) − θ|² f_{x,θ}(x, θ) dθ dx

Remembering that

    f_{x,θ}(x, θ) = f_{θ|x}(θ|x) f_x(x)

we obtain the following function to be minimized:

    mse(θ̂) = ∫_{−∞}^{∞} f_x(x) [ ∫_{−∞}^{∞} |h(x) − θ|² f_{θ|x}(θ|x) dθ ] dx

Because both integrands are positive and the outer integral does not depend on h(x), we can minimize the inner one only:

    ∫_{−∞}^{∞} |h(x) − θ|² f_{θ|x}(θ|x) dθ
Minimum mean square error (MMSE) estimation
Differentiating with respect to h(x) and setting the result to zero,

    2 ∫_{−∞}^{∞} [h(x) − θ] f_{θ|x}(θ|x) dθ = 0

that is

    h(x) ∫_{−∞}^{∞} f_{θ|x}(θ|x) dθ = ∫_{−∞}^{∞} θ f_{θ|x}(θ|x) dθ

Because ∫_{−∞}^{∞} f_{θ|x}(θ|x) dθ = 1, we finally obtain

    θ_MMSE ≜ h(x) = ∫_{−∞}^{∞} θ f_{θ|x}(θ|x) dθ = E(θ|x)

Hence the MMSE estimate is the expectation of θ conditioned on the data x. It is usually a nonlinear function of the data, with the exception of Gaussian data: in that case θ_MMSE is a linear function of x.
Minimum absolute error (MAE) estimation
The MSE cost function is not the only possible one; we can choose other cost functions. For example, widely used in the literature is the minimum absolute error or MAE criterion

    mae(θ̂) = E(|h(x) − θ|)

whose minimizer can be interpreted as the median of the a posteriori distribution. Summarizing:

- θ_MAP corresponds to the maximum (mode) of the a posteriori distribution;
- θ_MAE corresponds to the median of the a posteriori distribution;
- θ_MMSE corresponds to the mean of the a posteriori distribution.
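To see the three Bayesian estimators actually differ, here is an illustrative sketch with an assumed model (Gaussian likelihood, exponential a priori pdf, so the a posteriori pdf is asymmetric): the mode (MAP), median (MAE) and mean (MMSE) of a grid-evaluated posterior.

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed model: exponential prior f(theta) = exp(-theta), theta >= 0,
# Gaussian likelihood. The asymmetric posterior separates mode/median/mean.
sigma, N = 1.0, 5
theta_true = 0.8
x = theta_true + sigma * rng.standard_normal(N)

theta = np.linspace(0, 5, 50_001)
dtheta = theta[1] - theta[0]
log_post = -0.5 * np.sum((x[:, None] - theta)**2, axis=0) / sigma**2 - theta
post = np.exp(log_post - log_post.max())
post /= post.sum() * dtheta                     # normalize f(theta|x)

theta_map  = theta[np.argmax(post)]             # mode of the posterior
cdf = np.cumsum(post) * dtheta
theta_mae  = theta[np.searchsorted(cdf, 0.5)]   # median of the posterior
theta_mmse = np.sum(theta * post) * dtheta      # mean of the posterior

print(f"MAP={theta_map:.3f}  MAE={theta_mae:.3f}  MMSE={theta_mmse:.3f}")
```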
Linear minimum mean square error (MMSE) estimation
As we have remarked, the MMSE estimator is usually a nonlinear function of the data x. If we want a linear estimator, we have to impose a linear constraint, for example that the estimator is a linear combination of the measures:

    θ*_MMSE ≜ h(x) = Σ_{i=0}^{N−1} h_i · x[i]

where the coefficients h_i, called weights, can be estimated by minimizing the MSE:

    h_opt ≜ h such that ∂/∂h_j E{ |θ − Σ_{i=0}^{N−1} h_i x[i]|² } = 0

Let us pose e = θ − θ*_MMSE = θ − Σ_{i=0}^{N−1} h_i x[i]; then, imposing ∂E{e²}/∂h_j = 0 for j = 0, 1, . . . , N − 1, we obtain

    E{e · x[j]} = 0

that is: the error e is orthogonal to the data vector x (orthogonality principle).
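A sketch of the linear MMSE estimator for an assumed toy problem (x[i] = θ + v[i]): the orthogonality principle leads to the normal equations R h = r, with R = E{x xᵀ} and r = E{θ x}, solved here with sample averages.

```python
import numpy as np

rng = np.random.default_rng(5)

# Assumed toy problem: estimate the RV theta from x[i] = theta + v[i]
# with a linear estimator sum_i h_i x[i]. E{e x[j]} = 0 gives R h = r.
N, trials = 4, 200_000
theta = rng.standard_normal(trials)                  # zero-mean RV, var 1
v = 0.5 * rng.standard_normal((trials, N))           # measurement noise
x = theta[:, None] + v                               # one row per trial

R = x.T @ x / trials                                 # sample E{x x^T}
r = x.T @ theta / trials                             # sample E{theta x}
h = np.linalg.solve(R, r)                            # optimal weights h_i

e = theta - x @ h                                    # estimation error
print("weights:", h)                                 # ~ equal weights
print("E{e x[j]}:", (x * e[:, None]).mean(axis=0))   # ~ 0: orthogonality
```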
The Cramér-Rao lower bound (CRLB)
The Cramér-Rao lower bound (CRLB) or information inequality expresses the minimum value of the variance that can be obtained in the estimation of the parameters θ.

Given a vector of RVs and an unbiased estimator θ̂ = h(x), characterized by the covariance matrix C_θ = cov(θ̂) = E[(θ − θ̂)(θ − θ̂)ᵀ], we can define the Fisher information matrix J:

    J(i, j) = −E{ ∂² ln f_{x;θ}(x; θ) / ∂θ[i]∂θ[j] }   for i, j = 0, 1, . . . , L − 1

Then the Cramér-Rao lower bound (CRLB) is expressed by the following inequality:

    C_θ ≥ J⁻¹

An estimator that attains this bound with equality is called fully efficient and is a minimum variance unbiased (MVU) estimator too.
The Cramér-Rao lower bound (CRLB)
Often the Cramér-Rao lower bound is limited to the variances only. In this case the elements on the principal diagonal of cov(θ̂) have to satisfy the condition

    var(θ̂[i]) ≥ 1 / J(i, i)   for i = 0, 1, . . . , L − 1

For a scalar (mono-dimensional) parameter we have

    var(θ̂) ≥ 1 / E{ [∂ ln f_{x;θ}(x; θ) / ∂θ]² }

or, alternatively,

    var(θ̂) ≥ 1 / ( −E{ ∂² ln f_{x;θ}(x; θ) / ∂θ² } )
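A numerical check of the CRLB in the standard Gaussian-mean example (an assumed example, not worked in the slides): for x[n] ~ N(θ, σ²) i.i.d., the Fisher information is J = N/σ², so var(θ̂) ≥ σ²/N, and the sample mean attains it (fully efficient).

```python
import numpy as np

rng = np.random.default_rng(6)

# Assumed example: i.i.d. Gaussian data with unknown mean, known sigma.
sigma, N, trials = 2.0, 50, 100_000
x = 1.0 + sigma * rng.standard_normal((trials, N))

theta_hat = x.mean(axis=1)                  # sample-mean estimator
print("var(theta_hat) ~", theta_hat.var())  # matches the bound below
print("CRLB = sigma^2/N =", sigma**2 / N)
```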
Statistical Models

Remark
An extremely powerful paradigm for the characterization of time series is to consider them as the output of an LTI filter driven by white noise at the input.
Wold’s theorem
A stationary random sequence x[n] can be represented as the output of an LTI filter with impulse response h[n] whose input is a white noise η[n]:

    x[n] = Σ_{k=0}^{∞} h[k] η[n − k]

Such a sequence is called a linear stochastic process, or linear process.

If H(e^{jω}) is the frequency response of h[n], then the PSD of x[n] is

    R_xx(e^{jω}) = |H(e^{jω})|² σ_η²

where σ_η² is the variance of the white noise η[n].
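A sketch of a linear process and its PSD with an arbitrary (assumed) filter: white noise passed through an LTI system, with a Welch estimate compared against |H(e^{jω})|² σ_η² (the one-sided Welch density carries a factor 2 with respect to the two-sided PSD).

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(7)

# White noise eta[n] filtered by an arbitrary stable LTI filter.
sigma_eta = 1.0
eta = sigma_eta * rng.standard_normal(2**16)
b, a = [1.0, 0.5], [1.0, -0.7]                 # assumed filter coefficients
x = signal.lfilter(b, a, eta)                  # the linear process x[n]

f, Pxx = signal.welch(x, nperseg=1024)         # one-sided PSD estimate (fs = 1)
w, H = signal.freqz(b, a, worN=2 * np.pi * f)  # H(e^jw) on the same grid
k = 100
print(f"Welch: {Pxx[k]:.3f}  vs  2|H|^2 sigma^2: "
      f"{2 * np.abs(H[k])**2 * sigma_eta**2:.3f}")
```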
Autoregressive (AR) model
The autoregressive (AR) model of a time series is characterized by the following equation

    x[n] = −Σ_{k=1}^{p} a[k] x[n − k] + η[n]

which describes an AR model of order p, indicated as AR(p). The coefficients a = [a[1], a[2], . . . , a[p]] are called the autoregressive parameters.

The frequency response of this filter is (an all-pole filter)

    H(e^{jω}) = 1 / ( 1 + Σ_{k=1}^{p} a[k] e^{−jωk} )
Autoregressive (AR) model
Hence the PSD is

    R_xx(e^{jω}) = σ_η² / | 1 + Σ_{k=1}^{p} a[k] e^{−jωk} |²

It is possible to show that the auto-correlation function of the AR(p) model satisfies the following equations:

    r[k] = −Σ_{l=1}^{p} a[l] r[k − l],          k ≥ 1
    r[0] = −Σ_{l=1}^{p} a[l] r[l] + σ_η²,       k = 0

which can be rewritten in matrix form as

    [ r[0]     r[1]     ···  r[p−1] ] [ a[1] ]       [ r[1] ]
    [ r[1]     r[0]     ···  r[p−2] ] [ a[2] ]       [ r[2] ]
    [  ⋮        ⋮        ⋱     ⋮    ] [  ⋮   ]  = −  [  ⋮   ]
    [ r[p−1]   r[p−2]   ···  r[0]   ] [ a[p] ]       [ r[p] ]
Autoregressive (AR) model
In addition we have

    σ_η² = r[0] + Σ_{k=1}^{p} a[k] r[k]

hence, if we know the acf coefficients r[k] for k = 0, 1, . . . , p, then the AR parameters can be estimated from the previous p equations, known as the Yule-Walker equations.

Example: given the first-order AR process x[n] = −a[1] x[n − 1] + η[n], we have

    r[k] = −a[1] r[k − 1],  k ≥ 1   ⇒   r[k] = r[0] (−a[1])^k,  k > 0

Then from σ_η² = r[0] + a[1] r[1] we obtain

    r[k] = σ_η² / (1 − a²[1]) · (−a[1])^{|k|}

hence

    R_xx(e^{jω}) = σ_η² / |1 + a[1] e^{−jω}|²
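A sketch of AR parameter estimation through the Yule-Walker equations for an assumed AR(2) process, using the sample acf and the Toeplitz system above.

```python
import numpy as np
from scipy import signal, linalg

rng = np.random.default_rng(8)

# Assumed AR(2) process: x[n] = -a[1]x[n-1] - a[2]x[n-2] + eta[n].
a_true = np.array([-0.75, 0.5])                      # a[1], a[2]
eta = rng.standard_normal(2**16)
x = signal.lfilter([1.0], np.r_[1.0, a_true], eta)   # synthesize the process

p = 2
# Sample acf r[0..p]
r = np.array([np.dot(x[:len(x) - k], x[k:]) / len(x) for k in range(p + 1)])
R = linalg.toeplitz(r[:p])                           # [[r0, r1], [r1, r0]]
a_hat = -np.linalg.solve(R, r[1:])                   # Yule-Walker solution
sigma2_hat = r[0] + a_hat @ r[1:]                    # noise variance estimate

print("a_hat:", a_hat, " sigma^2_hat:", sigma2_hat)  # ~ a_true, ~ 1.0
```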
Moving average (MA) model
The moving average (MA) model of a time series is characterized by the following equation

    x[n] = Σ_{k=0}^{q} b[k] η[n − k]

which describes an MA model of order q, indicated as MA(q). The coefficients b = [b[0], b[1], . . . , b[q]] are called the moving average parameters.

The frequency response of this filter is (an all-zero filter)

    H(e^{jω}) = Σ_{k=0}^{q} b[k] e^{−jωk}
Moving average (MA) model
Hence the PSD is

    R_xx(e^{jω}) = σ_η² | Σ_{k=0}^{q} b[k] e^{−jωk} |²

The auto-correlation function of the MA(q) model is

    r[k] = σ_η² Σ_{l=0}^{q−|k|} b[l] b[l + |k|],   |k| ≤ q
    r[k] = 0,                                      |k| > q
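A quick check of the MA(q) acf formula for an assumed MA(2) example (σ_η² = 1): the sample acf of filtered white noise against σ_η² Σ b[l]b[l+|k|], which vanishes for |k| > q.

```python
import numpy as np

rng = np.random.default_rng(9)

# Assumed MA(2) coefficients; sigma_eta^2 = 1.
b = np.array([1.0, 0.6, -0.3])                       # b[0], b[1], b[2]
eta = rng.standard_normal(2**18)
x = np.convolve(eta, b, mode="valid")                # the MA(2) process

for k in range(5):
    r_hat = np.dot(x[:len(x) - k], x[k:]) / len(x)   # sample acf
    r_theo = np.sum(b[:len(b) - k] * b[k:]) if k < len(b) else 0.0
    print(f"k={k}: sample {r_hat:+.3f}  theory {r_theo:+.3f}")
```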
Autoregressive moving average (ARMA) model
The autoregressive moving average (ARMA) model of a time series is characterized by the following equation

    x[n] = −Σ_{k=1}^{p} a[k] x[n − k] + Σ_{k=0}^{q} b[k] η[n − k]

which describes an ARMA model of order (p, q), indicated as ARMA(p, q), where p is the degree of the denominator and q the degree of the numerator, respectively.

The PSD is the following:

    R_xx(e^{jω}) = σ_η² |H(z)|²_{z=e^{jω}}
                 = σ_η² · |b[0] + b[1]e^{−jω} + ··· + b[q]e^{−jqω}|² / |1 + a[1]e^{−jω} + ··· + a[p]e^{−jpω}|²
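A sketch of the ARMA PSD evaluated with SciPy's freqz for arbitrary (assumed) coefficients; the ratio of squared polynomial magnitudes, scaled by σ_η², is exactly the formula above.

```python
import numpy as np
from scipy import signal

# Assumed ARMA(2, 1) coefficients.
sigma2 = 1.0
b = [1.0, 0.4]           # numerator:   b[0] + b[1] e^{-jw}              (q = 1)
a = [1.0, -0.6, 0.2]     # denominator: 1 + a[1] e^{-jw} + a[2] e^{-j2w} (p = 2)

w, H = signal.freqz(b, a, worN=512)   # H(e^jw) on w in [0, pi)
Rxx = sigma2 * np.abs(H)**2           # the ARMA PSD on that grid
print(Rxx[:4])
```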
References
S. M. Kay. Fundamentals of Statistical Signal Processing: Estimation Theory. Prentice Hall, Upper Saddle River, NJ, 1998.

D. G. Manolakis, V. K. Ingle, S. M. Kogon. Statistical and Adaptive Signal Processing. Artech House, Norwood, MA, 2005.

B. Widrow, S. D. Stearns. Adaptive Signal Processing. Prentice Hall, 1985.
