Notes on computing the Fisher Information matrix for MARSS models. Part IV Implementing the Recursion in Harvey 1989
This is part of a series on computing the Fisher Information for Multivariate Autoregressive State-Space Models. Part I: Background, Part II: Louis 1982, Part III: Harvey 1989, Background, Part IV: Harvey 1989, Implementation.
Citation: Holmes, E. E. 2017. Notes on computing the Fisher Information matrix for MARSS models. Part IV Implementing the Recursion in Harvey 1989.
Part III introduced the approach of Harvey (1989) for computing the expected and observed Fisher Information matrices using the prediction-error form of the log-likelihood. Here I show the Harvey (1989) recursion (page 143) for computing the derivatives that appear in his equations.
Derivatives needed for the 2nd derivative of the conditional log-likelihood
Equations 3.4.66 and 3.4.69 in Harvey (1989) have first and second derivatives of $v_t$ and $F_t$ with respect to $\theta_i$ and $\theta_j$. These in turn involve derivatives of the parameter matrices and of $\tilde{x}_{t|t}$ and $\tilde{V}_{t|t}$. Harvey shows all the first derivatives, and it is easy to compute the second derivatives by taking the derivatives of the first.
The basic idea of the recursion is simple, if a bit tedious.
First we set up matrices for all the first derivatives of the parameters.
Then, starting from t=1 and working forward, we do the recursion (described below) for all $\theta_i$ and store the first derivatives of $v_t$, $F_t$, $\tilde{x}_{t|t}$ and $\tilde{V}_{t|t}$ with respect to $\theta_i$.
Then we go through the parameter vector a second time to get all the second derivatives with respect to $\theta_i$ and $\theta_j$.
We input the first and second derivatives of $v_t$ and $F_t$ into equations 3.4.66 and 3.4.69 to get the observed Fisher Information at time t and add it to the Fisher Information from the previous time step. The Fisher Information matrix is symmetric, so we can use an outer loop from $\theta_1$ to $\theta_p$ (p is the number of parameters) and an inner loop from $\theta_i$ to $\theta_p$. That is $p(p+1)/2$ passes through the inner loop for each time step.
The end result will be the observed Fisher Information matrix computed using equation 3.4.66 and using equation 3.4.69.
Outline of the loops in the recursion
This is a forward recursion starting at t=1. We will save the previous time step's $\partial v_t/\partial\theta_i$ and $\partial F_t/\partial\theta_i$: that is $p \times 2$ ($n \times 1$) vectors and $p \times 2$ ($n \times n$) matrices. We do not need to store all the previous time steps since this is a one-pass recursion, unlike the Kalman smoother, which is forward-backward.
Set-up
Number of parameters = p.
- Create Iijt and oIijt, which are p x p matrices.
- Create dvit, which is an n x p matrix: n innovations and p $\theta_i$.
- Create d2vijt, which is an n x p x p array: n innovations and p $\theta_i$.
- Create dFit, which is an n x n x p array: the n x n Sigma matrix and p $\theta_i$.
- Create d2Fijt, which is an n x n x p x p array: the n x n Sigma matrix and p $\theta_i$.
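In R, this set-up might look like the following (a minimal sketch; the dimensions here are illustrative, and n and p would come from the model and the length of the parameter vector):

n = 3; p = 5                              # illustrative dimensions
Iijt  = matrix(0, p, p)                   # p x p running sums for the two
oIijt = matrix(0, p, p)                   # information-matrix versions (3.4.66, 3.4.69)
dvit   = matrix(0, n, p)                  # d v_t / d theta_i
d2vijt = array(0, dim = c(n, p, p))       # d2 v_t / (d theta_i d theta_j)
dFit   = array(0, dim = c(n, n, p))       # d F_t / d theta_i
d2Fijt = array(0, dim = c(n, n, p, p))    # d2 F_t / (d theta_i d theta_j)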
- Outer loop from t=1 to t=T.
- Inner loop over all MARSS parameter matrices: x0, V0, Z, a, R, B, u, Q. This is, e.g., par$Z, a vector of the estimated parameter elements in Z.
- Inner loop over the parameters within a parameter matrix, so, e.g., over the rows of the column vector par$Z.
- Keep track of which parameter element I am on via a counter p.
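A minimal sketch of that loop skeleton in R (MLEobj is assumed to be a fitted MARSS model object; the element names in MLEobj$par and the name TT for the number of time steps are my assumptions, not code from these notes):

for (t in 1:TT) {                                   # TT = number of time steps
  p = 0                                             # counter over all theta_i
  for (elem in c("x0", "V0", "Z", "A", "R", "B", "U", "Q")) {  # assumed par names (a, u capitalized)
    npar = length(MLEobj$par[[elem]])               # number of estimated parameters in this matrix
    if (npar == 0) next
    for (i in 1:npar) {                             # loop over the parameter vector, e.g. par$Z
      p = p + 1
      # compute the derivatives for theta_p at time t and store them in
      # column/slice p of dvit, dFit, and the state-derivative arrays
    }
  }
}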
The form of the parameter derivatives
Within the recursion, we have terms like $\partial M/\partial\theta_i$, where M is some parameter matrix. We can write M in vec form as $\operatorname{vec}(M) = f + D\theta_m$, where $\theta_m$ is the vector of parameters that appear in M. This is the way that matrices are written in Holmes (2010). The derivative of this with respect to a single parameter $\theta_i$ is
$$\frac{\partial\operatorname{vec}(M)}{\partial \theta_i} = D\,\frac{\partial \theta_m}{\partial \theta_i},$$
where $\partial\theta_m/\partial\theta_i$ is a column vector with a 1 in the position of $\theta_i$ and 0s elsewhere (all 0s if $\theta_i$ does not appear in M). So in MARSS, $\partial M/\partial\theta_i$ would be
dthetai = matrix(0, ip, 1)            # ip = number of parameters in theta_m
dthetai[i, ] = 1                      # set up the d theta_i indicator
dvecM = D %*% dthetai                 # d vec(M)/d theta_i; the constant f drops out
dM = unvec(dvecM, dim(M))             # unvec needed only if M is a matrix, not a column vector
The reason is that MARSS allows any linear constraint of the form $\alpha + \beta_1 a + \beta_2 b$, etc. The vec form lets me work with a generic linear constraint without having to know the exact form of that constraint. The model and parameters are all specified in vec form with f, D, and p matrices (lower case denotes a column vector; p here is the column vector of parameters).
The second derivative of a parameter matrix with respect to $\theta_i$ and $\theta_j$ is always 0, since f and D contain no parameters, only constants.
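As a concrete illustration (my own example, not one from Holmes 2010): take a 2 x 2 diagonal Q with unequal variances q1 and q2. Then $\operatorname{vec}(Q) = f + D\theta_m$ with f a 4 x 1 vector of zeros and D a 4 x 2 indicator matrix, and the derivative with respect to q1 is:

f = matrix(0, 4, 1)                              # no constant part in Q
D = matrix(0, 4, 2); D[1, 1] = 1; D[4, 2] = 1    # vec(Q) = f + D %*% c(q1, q2)
dthetai = matrix(0, 2, 1); dthetai[1, ] = 1      # derivative with respect to q1
dQ = matrix(D %*% dthetai, 2, 2)                 # unvec by refilling column-wise; dQ/dq1 = diag(1, 0)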
Derivatives of the innovations and variance of innovations
Equation 3.4.71b in Harvey shows $\partial v_t/\partial\theta_i$. Store the result in dvit[,p]. $\tilde{x}_{t|t-1}$ is the one-step-ahead prediction of the states output by the Kalman filter; in MARSSkf it is xtt1[,t]. Next, use equation 3.4.73 to get $\partial F_t/\partial\theta_i$. Store the result in dFit[,,p]. $\tilde{V}_{t|t-1}$ is the one-step-ahead prediction covariance output by the Kalman filter; in MARSSkf it is denoted Vtt1[,,t].
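For reference, here is a sketch of what these two derivatives look like, assuming the MARSS observation equation $y_t = Z_t x_t + a_t + \varepsilon_t$ with $\varepsilon_t \sim \textrm{MVN}(0, R_t)$, so that $v_t = y_t - Z_t\tilde{x}_{t|t-1} - a_t$ and $F_t = Z_t\tilde{V}_{t|t-1}Z_t^\top + R_t$ (this is my rendering of 3.4.71b and 3.4.73, not a quote from Harvey):
$$\frac{\partial v_t}{\partial \theta_i} = -\frac{\partial Z_t}{\partial \theta_i}\tilde{x}_{t|t-1} - Z_t\frac{\partial \tilde{x}_{t|t-1}}{\partial \theta_i} - \frac{\partial a_t}{\partial \theta_i}$$
$$\frac{\partial F_t}{\partial \theta_i} = \frac{\partial Z_t}{\partial \theta_i}\tilde{V}_{t|t-1}Z_t^\top + Z_t\frac{\partial \tilde{V}_{t|t-1}}{\partial \theta_i}Z_t^\top + Z_t\tilde{V}_{t|t-1}\frac{\partial Z_t^\top}{\partial \theta_i} + \frac{\partial R_t}{\partial \theta_i}$$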
Recursion for derivatives of states and variance of states
If t=1
Case 1. $\pi = x_0$ is treated as a parameter and $V_0 = 0$. For any $\theta_i$ that is not in $\pi$, B, u, Z or a, $\partial v_1/\partial\theta_i = 0$. For any $\theta_i$ that is not in Z, R or Q, $\partial F_1/\partial\theta_i = 0$ (an n x n matrix of zeros).
From equation 3.4.73a:
$$\frac{\partial \tilde{x}_{1|0}}{\partial \theta_i} = \frac{\partial B_1}{\partial \theta_i}\pi + B_1\frac{\partial \pi}{\partial \theta_i} + \frac{\partial u_t}{\partial \theta_i}$$
From equation 3.4.73b and using $V_0 = 0$:
$$\frac{\partial \tilde{V}_{1|0}}{\partial \theta_i} = \frac{\partial B_1}{\partial \theta_i}V_0 B_1^\top + B_1\frac{\partial V_0}{\partial \theta_i}B_1^\top + B_1 V_0\frac{\partial B_1^\top}{\partial \theta_i} + \frac{\partial (G_t Q_t G_t^\top)}{\partial \theta_i} = \frac{\partial (G_t Q_t G_t^\top)}{\partial \theta_i}$$
Case 2. $\pi = x_{1|0}$ is treated as a parameter and $V_{1|0} = 0$. Then $\partial \tilde{x}_{1|0}/\partial\theta_i = \partial\pi/\partial\theta_i$ and $\partial \tilde{V}_{1|0}/\partial\theta_i = 0$.
Case 3. $x_0$ is specified by a fixed prior: $x_0 = \pi$ and $V_0 = \Lambda$. The derivatives of $\pi$ and $\Lambda$ are 0 because they are fixed.
From equation 3.4.73a and using $x_0 = \pi$ and $\partial\pi/\partial\theta_i = 0$:
$$\frac{\partial \tilde{x}_{1|0}}{\partial \theta_i} = \frac{\partial B_1}{\partial \theta_i}\pi + B_1\frac{\partial \pi}{\partial \theta_i} + \frac{\partial u_t}{\partial \theta_i} = \frac{\partial B_1}{\partial \theta_i}\pi + \frac{\partial u_t}{\partial \theta_i}$$
From equation 3.4.73b and using $V_0 = \Lambda$ and $\partial\Lambda/\partial\theta_i = 0$:
$$\frac{\partial \tilde{V}_{1|0}}{\partial \theta_i} = \frac{\partial B_1}{\partial \theta_i}V_0 B_1^\top + B_1\frac{\partial V_0}{\partial \theta_i}B_1^\top + B_1 V_0\frac{\partial B_1^\top}{\partial \theta_i} + \frac{\partial (G_t Q_t G_t^\top)}{\partial \theta_i} = \frac{\partial B_1}{\partial \theta_i}\Lambda B_1^\top + B_1 \Lambda\frac{\partial B_1^\top}{\partial \theta_i} + \frac{\partial (G_t Q_t G_t^\top)}{\partial \theta_i}$$
Case 4. $x_{1|0}$ is specified by a fixed prior: $x_{1|0} = \pi$ and $V_{1|0} = \Lambda$. Then $\partial \tilde{V}_{1|0}/\partial\theta_i = 0$ and $\partial \tilde{x}_{1|0}/\partial\theta_i = 0$.
Case 5. Estimate V0 or V1|0. That is unstable (per Harvey 1989, somewhere). I do not allow that in the MARSS package.
When coding this recursion, I will loop through the MARSS parameter matrices (x0, V0, Z, a, R, B, u, Q) and, within that loop, loop through the individual parameters in each parameter vector. Say Q is diagonal and unequal: it has m variance parameters, and I loop through each of them.
Now we have $\partial \tilde{x}_{1|0}/\partial\theta_i$ and $\partial \tilde{V}_{1|0}/\partial\theta_i$ for t=1 and we can proceed.
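For example, at t=1 under Case 3 (fixed prior), a minimal R sketch, where dB, dU, and dGQGt are the derivatives of $B_1$, $u_1$, and $G_1 Q_1 G_1^\top$ with respect to the current $\theta_i$ (computed with the vec-form code above), and pi0 and Lam hold $\pi$ and $\Lambda$; the object names are mine, not MARSSkf output:

dxtt1 = dB %*% pi0 + dU                                    # d xtilde_{1|0}/d theta_i; d pi/d theta_i = 0
dVtt1 = dB %*% Lam %*% t(B) + B %*% Lam %*% t(dB) + dGQGt  # d Vtilde_{1|0}/d theta_i; d Lambda/d theta_i = 0
# dxtt1 and dVtt1 then go into 3.4.71b and 3.4.73 to give dvit[, p] and dFit[, , p]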
If t>1
The derivative of $\tilde{x}_{t|t-1}$ is given by 3.4.73a in Harvey. We then take the derivative of that expression to get the second partial derivative. In these equations, $\tilde{x}_{t|t}$ is output by the Kalman filter; in MARSSkf it is called xtt[,t], and $\tilde{x}_{t-1|t-1}$ would be xtt[,t-1]. The derivatives of $\tilde{x}_{t-1|t-1}$ come from the next part of the recursion (below).
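A sketch of these two derivatives, assuming the MARSS state equation gives the prediction $\tilde{x}_{t|t-1} = B_t\tilde{x}_{t-1|t-1} + u_t$ (my rendering of 3.4.73a, not a quote from Harvey):
$$\frac{\partial \tilde{x}_{t|t-1}}{\partial \theta_i} = \frac{\partial B_t}{\partial \theta_i}\tilde{x}_{t-1|t-1} + B_t\frac{\partial \tilde{x}_{t-1|t-1}}{\partial \theta_i} + \frac{\partial u_t}{\partial \theta_i}$$
Differentiating again with respect to $\theta_j$ and dropping the second derivatives of the parameter matrices (which are zero):
$$\frac{\partial^2 \tilde{x}_{t|t-1}}{\partial \theta_i \partial \theta_j} = \frac{\partial B_t}{\partial \theta_i}\frac{\partial \tilde{x}_{t-1|t-1}}{\partial \theta_j} + \frac{\partial B_t}{\partial \theta_j}\frac{\partial \tilde{x}_{t-1|t-1}}{\partial \theta_i} + B_t\frac{\partial^2 \tilde{x}_{t-1|t-1}}{\partial \theta_i \partial \theta_j}$$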
The derivative of $\tilde{V}_{t|t-1}$ is given by 3.4.73b in Harvey. The second derivative of $\tilde{V}_{t|t-1}$ is obtained by taking the derivative of that expression and eliminating any second derivatives of the parameter matrices (which are zero). In these derivatives, $\tilde{V}_{t|t}$ is output by the Kalman filter; in MARSSkf it is called Vtt[,t], and $\tilde{V}_{t-1|t-1}$ would be Vtt[,t-1]. The derivatives of $\tilde{V}_{t-1|t-1}$ come from the rest of the recursion (below).
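A sketch of these, using the prediction variance $\tilde{V}_{t|t-1} = B_t\tilde{V}_{t-1|t-1}B_t^\top + G_tQ_tG_t^\top$ (my rendering of 3.4.73b; the first derivative matches the t=1 equations above):
$$\frac{\partial \tilde{V}_{t|t-1}}{\partial \theta_i} = \frac{\partial B_t}{\partial \theta_i}\tilde{V}_{t-1|t-1}B_t^\top + B_t\frac{\partial \tilde{V}_{t-1|t-1}}{\partial \theta_i}B_t^\top + B_t\tilde{V}_{t-1|t-1}\frac{\partial B_t^\top}{\partial \theta_i} + \frac{\partial (G_tQ_tG_t^\top)}{\partial \theta_i}$$
Differentiating again with respect to $\theta_j$ and dropping the second derivatives of the parameter matrices:
$$\begin{aligned}\frac{\partial^2 \tilde{V}_{t|t-1}}{\partial \theta_i \partial \theta_j} ={}& \frac{\partial B_t}{\partial \theta_i}\frac{\partial \tilde{V}_{t-1|t-1}}{\partial \theta_j}B_t^\top + \frac{\partial B_t}{\partial \theta_i}\tilde{V}_{t-1|t-1}\frac{\partial B_t^\top}{\partial \theta_j} + \frac{\partial B_t}{\partial \theta_j}\frac{\partial \tilde{V}_{t-1|t-1}}{\partial \theta_i}B_t^\top + B_t\frac{\partial^2 \tilde{V}_{t-1|t-1}}{\partial \theta_i \partial \theta_j}B_t^\top \\ &+ B_t\frac{\partial \tilde{V}_{t-1|t-1}}{\partial \theta_i}\frac{\partial B_t^\top}{\partial \theta_j} + \frac{\partial B_t}{\partial \theta_j}\tilde{V}_{t-1|t-1}\frac{\partial B_t^\top}{\partial \theta_i} + B_t\frac{\partial \tilde{V}_{t-1|t-1}}{\partial \theta_j}\frac{\partial B_t^\top}{\partial \theta_i}\end{aligned}$$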
The rest of the recursion equations are the same for all t.
From equation 3.4.74a we get $\partial \tilde{x}_{t|t}/\partial\theta_i$. $\tilde{V}_{t|t-1}$ is output by the Kalman filter; in MARSSkf it is called Vtt1[,,t]. $v_t$ are the innovations; in MARSSkf they are called Innov[,t].
From equation 3.4.74b we get $\partial \tilde{V}_{t|t}/\partial\theta_i$.
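Sketches of these two updates, assuming 3.4.74a and 3.4.74b are the contemporaneous filter updates $\tilde{x}_{t|t} = \tilde{x}_{t|t-1} + \tilde{V}_{t|t-1}Z_t^\top F_t^{-1}v_t$ and $\tilde{V}_{t|t} = \tilde{V}_{t|t-1} - \tilde{V}_{t|t-1}Z_t^\top F_t^{-1}Z_t\tilde{V}_{t|t-1}$, and using $\partial F_t^{-1}/\partial\theta_i = -F_t^{-1}(\partial F_t/\partial\theta_i)F_t^{-1}$ (writing $P_t = \tilde{V}_{t|t-1}$ for compactness; my rendering, not a quote from Harvey):
$$\frac{\partial \tilde{x}_{t|t}}{\partial \theta_i} = \frac{\partial \tilde{x}_{t|t-1}}{\partial \theta_i} + \frac{\partial P_t}{\partial \theta_i}Z_t^\top F_t^{-1}v_t + P_t\frac{\partial Z_t^\top}{\partial \theta_i}F_t^{-1}v_t - P_tZ_t^\top F_t^{-1}\frac{\partial F_t}{\partial \theta_i}F_t^{-1}v_t + P_tZ_t^\top F_t^{-1}\frac{\partial v_t}{\partial \theta_i}$$
$$\begin{aligned}\frac{\partial \tilde{V}_{t|t}}{\partial \theta_i} ={}& \frac{\partial P_t}{\partial \theta_i} - \frac{\partial P_t}{\partial \theta_i}Z_t^\top F_t^{-1}Z_tP_t - P_t\frac{\partial Z_t^\top}{\partial \theta_i}F_t^{-1}Z_tP_t \\ &+ P_tZ_t^\top F_t^{-1}\frac{\partial F_t}{\partial \theta_i}F_t^{-1}Z_tP_t - P_tZ_t^\top F_t^{-1}\frac{\partial Z_t}{\partial \theta_i}P_t - P_tZ_t^\top F_t^{-1}Z_t\frac{\partial P_t}{\partial \theta_i}\end{aligned}$$
The second derivatives follow the same pattern: differentiate again with respect to $\theta_j$ and drop the (zero) second derivatives of the parameter matrices.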
- Repeat for the next element in the parameter matrix.
- Repeat for the next parameter matrix (still at time t).
- Loop over i = 1 to p.
- Loop over j = i to p.
Compute $I_{ij}(\theta)_t$ and add it to the value from the previous time step. This is equation 3.4.69 with the expectation dropped. Store it in Iij[i,j] and Iij[j,i]:
$$I_{ij}(\theta)_t = I_{ji}(\theta)_t = \frac{1}{2}\operatorname{tr}\!\left[F_t^{-1}\frac{\partial F_t}{\partial \theta_i}F_t^{-1}\frac{\partial F_t}{\partial \theta_j}\right] + \left(\frac{\partial v_t}{\partial \theta_i}\right)^{\!\top} F_t^{-1}\frac{\partial v_t}{\partial \theta_j}$$
Add this to the previous value to get the new $I_{ij}(\theta)$: $I_{ij}(\theta) = I_{ij}(\theta) + I_{ij}(\theta)_t$.
- Repeat for next j.
- Repeat for next i.
- Repeat for next t.
At the end, $I_{ij}(\theta)$ is the observed Fisher Information matrix.
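A minimal R sketch of the increment at one time step, assuming Ft holds $F_t$, dvit and dFit are filled as described above, and Iij is the p x p running sum (these object names are mine, not MARSSkf output):

Ftinv = solve(Ft)                      # F_t^{-1}, computed once per time step
for (i in 1:p) {
  for (j in i:p) {
    incr = 0.5 * sum(diag(Ftinv %*% dFit[, , i] %*% Ftinv %*% dFit[, , j])) +
           drop(t(dvit[, i]) %*% Ftinv %*% dvit[, j])
    Iij[i, j] = Iij[i, j] + incr       # equation 3.4.69 with the expectation dropped
    Iij[j, i] = Iij[i, j]              # the information matrix is symmetric
  }
}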
Note that parameters in Q and R do not appear in $\partial v_t/\partial\theta_i$, but all the other parameters do. So the second term in $I_{ij}(\theta)_t$ is always zero when $\theta_i$ or $\theta_j$ is a Q or R parameter. Similarly, u and a do not appear in $\partial F_t/\partial\theta_i$, but all the other parameters do. So the first (trace) term is always zero when $\theta_i$ or $\theta_j$ is a u or a parameter. This means that there is always zero covariance between u or a and Q or R. But this will not be the case between Q or R and B or Z.
Part of the motivation for implementing the Harvey (1989) recursion is that currently MARSS computes a numerical estimate of the Fisher Information matrix using one of R's functions to return the Hessian, and that often fails with errors. I might improve the numerical approach by constraining it. If I am only estimating u, a, Q and R, I could do a two-step process: get the Hessian for u and a while holding the variances at their MLEs, and then repeat for Q and R with u and a at their MLEs.