I have been working for the last five years on a project having to do with the upwelling zones off the coast of India. As part of this, I have been working on ways to visualize the upwelling zone—a spatial and temporal visualization problem. I have some RMarkdown files showing how to create these in R using data from the NOAA ERDDAP data server.
The first example I have is how to make a movie (or gif) from remote-sensing data.
I have two vignettes that show how to do this:
The second example is how to make a colorbar over time and space. Here is an example of chlorophyll off the coast of India in 2010. It is not very interesting since there are many NAs, but you get the idea.
This is part of a series on computing the Fisher Information for Multivariate Autoregressive State-Space Models. Part I: Background; Part II: Louis 1982; Part III: Harvey 1989, Background; Part IV: Harvey 1989, Implementation.
Citation: Holmes, E. E. 2017. Notes on computing the Fisher Information matrix for MARSS models. Part IV Implementing the Recursion in Harvey 1989.
Part III introduced the approach of Harvey (1989) for computing the expected and observed Fisher Information matrices using the prediction error form of the log-likelihood function. Here I show the Harvey (1989) recursion on page 143 for computing the derivatives that appear in his equations.
Equations 3.4.66 and 3.4.69 in Harvey (1989) have first and second derivatives of $v_t$ and $F_t$ with respect to $\theta_i$ and $\theta_j$. These in turn involve derivatives of the parameter matrices and of $\tilde{x} _ {t\vert t}$ and $\tilde{V} _ {t\vert t}$. Harvey shows all the first derivatives, and it is easy to compute the second derivatives by taking the derivatives of the first.
The basic idea of the recursion is simple, if a bit tedious.
First we set up matrices for all the first derivatives of the parameters.
Then starting from t=1 and working forward, we will do the recursion (described below) for all $\theta_i$ and we store the first derivatives of $v_t$, $F_t$, $\tilde{x} _ {t\vert t}$ and $\tilde{V} _ {t\vert t}$ with respect to $\theta_i$.
Then we go through the parameter vector a second time, to get all the second derivatives with respect to $\theta_i$ and $\theta_j$.
We input the first and second derivatives of $v_t$ and $F_t$ into equations 3.4.66 and 3.4.69 to get the observed Fisher Information at time t and add it to the Fisher Information from the previous time step. The Fisher Information matrix is symmetric, so we can use an outer loop from $\theta_1$ to $\theta_p$ ($p$ is the number of parameters) and an inner loop from $\theta_i$ to $\theta_p$. That will be $p(p+1)/2$ passes for each time step.
The end result will be the observed Fisher Information matrix computed using equation 3.4.66 and using equation 3.4.69.
This is a forward recursion starting at t=1. We will save the current and previous time step's $\partial v_t / \partial \theta_i$ and $\partial F_t / \partial \theta_i$. That will be $p \times 2$ ($n \times 1$) vectors and $p \times 2$ ($n \times n$) matrices. We do not need to store all the previous time steps since this is a one-pass recursion, unlike the Kalman smoother, which is forward-backward.
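The stored derivatives can be allocated as arrays indexed by a parameter counter, matching the `dvit[,p]` and `dFit[,,p]` names used below (a minimal sketch; the dimensions here are illustrative):

```r
# Storage for the first derivatives at one time step:
# p = number of estimated parameters, n = number of observation rows.
p <- 3
n <- 2
dvit <- matrix(0, nrow = n, ncol = p)    # column i holds dv_t / dtheta_i (n x 1)
dFit <- array(0, dim = c(n, n, p))       # slice i holds dF_t / dtheta_i (n x n)
```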
Number of parameters = p.
Outer loop over all MARSS parameters: x0, V0, Z, a, R, B, u, Q. Each of these is a vector of the estimated parameter elements in that matrix, e.g. par$Z for Z.
Inner loop over the individual parameters within a parameter matrix, e.g. over the rows of the column vector par$Z.
Keep track of which parameter element I am on via a p counter.
Within the recursion, we have terms like $\partial M/\partial \theta_i$, where $M$ is some parameter matrix. We can write $M$ as $vec(M)=f+D\theta_m$, where $\theta_m$ is the vector of parameters that appear in $M$. This is the way that matrices are written in Holmes (2010). So, for example,
\begin{equation}M = \begin{bmatrix}a & 0\\ 0.5+2a & b\end{bmatrix}, \quad \theta_m = \begin{bmatrix}a\\ b\end{bmatrix}\end{equation}
is written in vec form as
\begin{equation}vec(M) = \begin{bmatrix}0\\ 0.5\\ 0\\ 0\end{bmatrix} + \begin{bmatrix}1 & 0\\ 2 & 0\\ 0 & 0\\ 0 & 1\end{bmatrix}\begin{bmatrix}a\\ b\end{bmatrix} = f + D\theta_m\end{equation}
The derivative of this with respect to $\theta_i=a$ is
\begin{equation}\frac{\partial vec(M)}{\partial a} = D\begin{bmatrix}1\\ 0\end{bmatrix} = \begin{bmatrix}1\\ 2\\ 0\\ 0\end{bmatrix}\end{equation}
So in MARSS, $\partial M/\partial \theta_i$ would be
dthetai = matrix(0, ip, 1)  # ip = number of parameters in theta_m
dthetai[i, ] = 1  # indicator: the position of theta_i in theta_m
dM = unvec(D %*% dthetai, dim(M))  # f is constant so it drops out; only needed if M is a matrix
The reason is that MARSS allows any linear constraint of the form $\alpha+\beta_1 a + \beta_2 b$, etc. The vec form allows me to work with a generic linear constraint without having to know the exact form of that constraint. The model and parameters are all specified in vec form with f, D, and p matrices (lower case = column vector).
The second derivative of a parameter matrix with respect to $\theta_j$ is always 0, since the first derivative above has no parameters in it, only constants.
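As a concrete check, the vec parameterization can be worked through for a small invented matrix (this 2 x 2 example with elements $a$, $0.5+2a$, and $b$ is mine, not from MARSS; `unvec` is written out here rather than taken from MARSS internals):

```r
# vec(M) = f + D theta_m for M = [ a, 0; 0.5+2a, b ] with theta_m = (a, b).
unvec <- function(v, dims) matrix(v, dims[1], dims[2])  # stand-in for MARSS's unvec

f <- matrix(c(0, 0.5, 0, 0), 4, 1)
D <- matrix(c(1, 2, 0, 0,        # column for a
              0, 0, 0, 1),       # column for b
            nrow = 4, ncol = 2)
theta_m <- matrix(c(1.5, -0.3), 2, 1)  # a = 1.5, b = -0.3

M <- unvec(f + D %*% theta_m, c(2, 2))

# dM/da: indicator on the first element of theta_m
dthetai <- matrix(c(1, 0), 2, 1)
dM <- unvec(D %*% dthetai, c(2, 2))    # the constant f drops out of the derivative
```

Here `dM` recovers the pattern of $a$ in $M$: a 1 where $a$ appears alone and a 2 where $2a$ appears.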
Equation 3.4.71b in Harvey shows $\partial v_t / \partial \theta_i$. Store the result in dvit[,p]. $\tilde{x} _ {t\vert t-1}$ is the one-step-ahead state prediction output by the Kalman filter; in MARSSkf it is xtt1[,t].
Next, use equation 3.4.73 to get $\partial F_t / \partial \theta_i$. Store the result in dFit[,,p]. $\tilde{V} _ {t\vert t-1}$ is the one-step-ahead prediction covariance output by the Kalman filter; in MARSSkf it is denoted Vtt1[,,t].
Case 1. $\pi=x_0$ is treated as a parameter and $V_0 = 0$. For any $\theta_i$ that is not in $\pi$, $Z$, or $a$, $\partial v_1/\partial \theta_i = 0$. For any $\theta_i$ that is not in $Z$ or $R$, $\partial F_1/\partial \theta_i = 0$ (an $n \times n$ matrix of zeros).
From equation 3.4.73a: \begin{equation} \frac{\partial \tilde{x}_{1\vert 0}}{\partial\theta_i } = \frac{\partial B_1}{\partial \theta_i} \pi + B_1 \frac{\partial \pi}{\partial \theta_i} + \frac{\partial u_t}{\partial \theta_i}\end{equation}
From equation 3.4.73b and using $V_0 = 0$: \begin{equation} \frac{\partial \tilde{V}_{1\vert 0}}{\partial\theta_i } = \frac{\partial B_1}{\partial \theta_i} V_0 B_1^\top + B_1 \frac{\partial V_0}{\partial \theta_i} B_1^\top + B_1 V_0 \frac{\partial B_1^\top}{\partial \theta_i} + \frac{\partial (G_t Q_t G_t^\top)}{\partial \theta_i} = \frac{\partial (G_t Q_t G_t^\top)}{\partial \theta_i}\end{equation}
Case 2. $\pi=x_{1\vert 0}$ is treated as a parameter and $V_{1\vert 0}=0$. \begin{equation}\frac{\partial \tilde{x} _ {1\vert 0}}{\partial \theta_i}=\frac{\partial \pi}{\partial \theta_i} \text{ and } \partial V_{1\vert 0}/\partial\theta_i = 0.\end{equation}
Case 3. $x_0$ is specified by a fixed prior. $x_0=\pi$ and $V_0=\Lambda$. The derivatives of these are 0, because they are fixed.
From equation 3.4.73a and using $x_0 = \pi$ and $\partial \pi/\partial \theta_i = 0$: \begin{equation} \frac{\partial \tilde{x}_{1\vert 0}}{\partial\theta_i } = \frac{\partial B_1}{\partial \theta_i} \pi + B_1 \frac{\partial \pi}{\partial \theta_i} + \frac{\partial u_t}{\partial \theta_i}=\frac{\partial B_1}{\partial \theta_i} \pi + \frac{\partial u_t}{\partial \theta_i}\end{equation}
From equation 3.4.73b and using $V_0 = \Lambda$ and $\partial \Lambda/\partial \theta_i = 0$: \begin{equation}\begin{split} \frac{\partial \tilde{V}_{1\vert 0}}{\partial\theta_i } &= \frac{\partial B_1}{\partial \theta_i} V_0 B_1^\top + B_1 \frac{\partial V_0}{\partial \theta_i} B_1^\top + B_1 V_0 \frac{\partial B_1^\top}{\partial \theta_i} + \frac{\partial (G_t Q_t G_t^\top)}{\partial \theta_i}\\ &= \frac{\partial B_1}{\partial \theta_i} \Lambda B_1^\top + B_1 \Lambda \frac{\partial B_1^\top}{\partial \theta_i} + \frac{\partial (G_t Q_t G_t^\top)}{\partial \theta_i}\end{split}\end{equation}
Case 4. $x_{1\vert 0}$ is specified by a fixed prior. $x_{1\vert 0}=\pi$ and $V_{1\vert 0} = \Lambda$. $\partial V_{1\vert 0}/\partial\theta_i = 0$ and $\partial x_{1\vert 0}/\partial\theta_i = 0$.
Case 5. Estimating $V_0$ or $V_{1\vert 0}$ is unstable (per Harvey 1989), so I do not allow it in the MARSS package.
When coding this recursion, I will loop through the MARSS parameters (x0, V0, Z, a, R, B, u, Q) and, within that loop, loop through the individual parameters within the parameter vector. So say Q is diagonal and unequal: it has m variance parameters, and I'll loop through each.
Now we have $\frac{\partial \tilde{x} _ {1\vert 0}}{\partial \theta_i}$ and $\frac{\partial \tilde{V} _ {1\vert 0}}{\partial \theta_i}$ for $t=1$ and we can proceed.
The derivative of $\tilde{x} _ {t\vert t-1}$ is (3.4.73a in Harvey)
\begin{equation} \frac{\partial \tilde{x} _ {t\vert t-1}}{\partial\theta_i } = \frac{\partial B_t}{\partial \theta_i} \tilde{x} _ {t-1\vert t-1} + B_t \frac{\partial \tilde{x} _ {t-1\vert t-1}}{\partial \theta_i} + \frac{\partial u_t}{\partial \theta_i}\end{equation}
Then we take the derivative of this to get the second partial derivative.
In the equations, $\tilde{x} _ {t\vert t}$ is output by the Kalman filter. In MARSSkf, it is called xtt[,t]; $\tilde{x} _ {t-1\vert t-1}$ would be xtt[,t-1]. The derivatives of $\tilde{x} _ {t-1\vert t-1}$ come from the next part of the recursion (below).
The derivative of $\tilde{V} _ {t\vert t-1}$ is (3.4.73b in Harvey)
\begin{equation} \frac{\partial \tilde{V} _ {t\vert t-1}}{\partial\theta_i } = \frac{\partial B_t}{\partial \theta_i} \tilde{V} _ {t-1\vert t-1} B_t^\top + B_t \frac{\partial \tilde{V} _ {t-1\vert t-1}}{\partial \theta_i} B_t^\top + B_t \tilde{V} _ {t-1\vert t-1} \frac{\partial B_t^\top}{\partial \theta_i} + \frac{\partial (G_t Q_t G_t^\top)}{\partial \theta_i}\end{equation}
The second derivative of $\tilde{V} _ {t\vert t-1}$ is obtained by taking the derivative of 3.4.73b with respect to $\theta_j$ and eliminating any second derivatives of parameter matrices (which are zero):
In the derivatives, $\tilde{V} _ {t\vert t}$ is output by the Kalman filter. In MARSSkf, it is called Vtt[,,t]; $\tilde{V} _ {t-1\vert t-1}$ would be Vtt[,,t-1]. The derivatives of $\tilde{V} _ {t-1\vert t-1}$ come from the rest of the recursion (below).
From equation 3.4.74a:
$\tilde{V} _ {t\vert t-1}$ is output by the Kalman filter. In MARSSkf, it is called Vtt1[,,t]. $v_t$ are the innovations; in MARSSkf, they are called Innov[,t].
From equation 3.4.74b:
Loop over j = i to p.
Compute $I_{ij}(\theta)$ and add it to the previous time step's value. This is equation 3.4.69 with the expectation dropped. Store in Iij[i,j] and Iij[j,i]. \begin{equation}I _ {ij}(\theta) _ t = I _ {ji}(\theta) _ t = \frac{1}{2} tr\left[ F_t^{-1}\frac{\partial F_t}{\partial \theta_i}F_t^{-1}\frac{\partial F_t}{\partial \theta_j}\right] + \left(\frac{\partial v_t}{\partial \theta_i}\right)^\top F_t^{-1}\frac{\partial v_t}{\partial \theta_j}\end{equation}
Add this on to previous one to get new $I_{ij}(\theta)$: \begin{equation}I_{ij}(\theta) = I_{ij}(\theta) + I_{ij}(\theta)_t\end{equation}
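A minimal sketch of one time step of this accumulation, with the symmetric double loop over $i$ and $j$ (the names `Finv`, `dv`, `dF`, and the toy values are illustrative stand-ins for the derivatives computed by the recursion, not MARSS internals):

```r
# One time step's contribution to I_ij: equation 3.4.69 with the expectation dropped.
info_increment <- function(Finv, dFi, dFj, dvi, dvj) {
  0.5 * sum(diag(Finv %*% dFi %*% Finv %*% dFj)) +
    as.numeric(t(dvi) %*% Finv %*% dvj)
}

p <- 2; n <- 2
Finv <- solve(diag(2, n))                      # F_t^{-1} for a toy F_t = 2*I
dv <- list(matrix(c(1, 0)), matrix(c(0, 1)))   # dv_t/dtheta_i, one n x 1 per parameter
dF <- list(diag(1, n), matrix(0, n, n))        # dF_t/dtheta_i, one n x n per parameter

Iobs <- matrix(0, p, p)
for (i in 1:p) for (j in i:p) {                # p(p+1)/2 passes per time step
  inc <- info_increment(Finv, dF[[i]], dF[[j]], dv[[i]], dv[[j]])
  Iobs[i, j] <- Iobs[i, j] + inc
  Iobs[j, i] <- Iobs[i, j]                     # Fisher Information is symmetric
}
```

Across time steps, the `Iobs[i, j] <- Iobs[i, j] + inc` line is what carries the running sum.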
At the end, $I_{ij}(\theta)$ is the observed Fisher Information Matrix.
Note that $Q$ and $R$ do not appear in $\partial v_t/\partial \theta_i$, but all the other parameters do. So the second term in $I_{ij}(\theta)$ is always zero between $Q$ or $R$ and any other parameter. Likewise, $u$ and $a$ do not appear in $F_t$, so the first term in $I_{ij}(\theta)$ is always zero between $u$ or $a$ and any other parameter. This means that there is always zero covariance between $u$ or $a$ and $Q$ or $R$. But this will not be the case between $Q$ or $R$ and $B$ or $Z$.
Part of the motivation for implementing the Harvey (1989) recursion is that currently in MARSS I use a numerical estimate of the Fisher Information matrix, obtained with one of R's functions for returning the Hessian, and that often returns errors. I might improve it by constraining the problem. If I am only estimating $u$, $a$, $Q$ and $R$, I could do a two-step process: get the Hessian holding the variances at their MLEs, and then repeat with $u$ and $a$ at their MLEs.
The Sloan Fellowship is a prestigious award for early-career STEM scientists at U.S. and Canadian academic institutions. The data set I am using is one I assembled on the Baccalaureate origins of the Sloan Fellows who received their undergraduate education in the U.S., which was approximately 50% of the fellows. The data were collected by looking up the CVs of the fellows, the names of which are posted in the Sloan Foundation press releases. I combined the Baccalaureate origin data with data on the undergraduate institutions from the Scorecard database on U.S. Baccalaureate institutions.
The data were then filtered to only include fellows who received their undergraduate degree in the U.S. and from either a research university or a liberal arts college (LAC). This is important to understand when looking at the results—the liberal arts colleges are being compared to research universities. This filtering only excluded a handful of fellows who received their undergraduate degrees in the U.S.; basically all Sloan fellows received their undergraduate degrees at a research university or liberal arts college.
This is my second attempt at this analysis. In my first attempt, I used the Equality of Opportunity tier groups (e.g. 'Highly selective private') and also tiers based on ACT scores (31, 32, etc.). The most important difference is that I previously used total undergraduate enrollment as the 'size' of the school. I realized that the number of students with high SAT Math scores would be a better 'size' to use. The nature of Sloan Fellowships means that it is unlikely that an awardee would have had an SAT Math score under 700 as an undergraduate; these are people who went to graduate school in fields that require quantitative skills. This second analysis uses the number of undergraduates with an SAT Math score between 700 and 800.
I will be showing a series of barplots where I look at the Sloan production within groups of schools. I am not concerned with individual schools, but rather the production within a whole group of schools.
Standardizing by the SAT math scores does not completely remove the selectivity effect—the effect that more selective schools have a higher production of future Sloan fellows. However it removes much of the effect.
Four schools have produced the lion’s share of the future Sloan fellows: MIT, CalTech, Harvard and Princeton. This is particularly the case in math, physics and economics where 36% of Sloan fellows received their undergraduate degrees from these four schools. In the other fields, 20% received their undergraduate degrees from these schools. Because these groups have an outsized effect on the results for their ‘group’, I removed them for the rest of the analysis and looked at where the other Sloan fellows received their undergraduate education.
The most striking result with these four outlier institutions removed is that Liberal Arts Colleges (LACs) produce an unusually high number of future Sloan fellows given their enrollment size. Approximately 25% of future Sloan fellows in math, physics and economics get their undergraduate degrees at a LAC (8% if we include the four outlier schools). Similarly in the other fields, 25% of future Sloan fellows went to a LAC as an undergraduate (15% if we include the four outlier schools). This is a high percentage if we consider the enrollment size of LACs. LACs have a higher per capita production of future Sloan fellows than the elite private schools (which includes the Ivies minus Harvard and Princeton). Production of future Sloan fellows is not restricted to ‘elite’ LACs. The mid-tier LACs also have higher production than the elite private schools, which are considerably more selective.
That small Liberal Arts Colleges produce an unusually high number of graduates who go on to get PhDs in STEM is well known and has been reported in other studies. What my analysis indicates is that this higher-than-expected production is also seen when we look at a select group of highly productive early-career research scientists in academia. These individuals are many years past their undergraduate education: they have finished a PhD, finished post-doctoral training, gained an academic position, and published ground-breaking research. Those in neuroscience, chemistry, and ocean science will also have successfully established a productive research group ('lab') of students and post-doctoral researchers.
The other big take-home is that public institutions, even the very selective ones with SAT scores similar to the private schools, have low production of future Sloan fellows in all of the fields. Across the board, Liberal Arts Colleges outproduce them on this metric. However, public institutions do outproduce the less selective private research universities across the board. In fact, the less selective private research universities by and large do not produce future Sloan fellows.
This analysis focuses on comparing Liberal Arts colleges to research universities. The idea is to examine whether there is a difference in the production of future Sloan Fellows based on the type of undergraduate institution (research university versus liberal arts college).
I used the Carnegie Classifications to label each institution by type: Liberal Arts colleges, technical research universities, and research universities that are not technical institutes. The research universities were further separated into public and private, but the technical institutes were not. Only private Liberal Arts colleges were included in the analysis, as there were too few public Liberal Arts colleges with data.
I divided schools into three SAT Math score bands based on the upper 75% math scores.
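A sketch of how such bands could be built with R's `cut()`. The 685/755 cut points are my guesses inferred from the groupings in the tables below, not values stated in the post:

```r
# Assign each school's SAT Math upper-75% score to one of three bands.
sat_math75 <- c(800, 770, 730, 700, 680, 640)
band <- cut(sat_math75,
            breaks = c(-Inf, 685, 755, Inf),   # hypothetical cut points
            labels = c("Band 3", "Band 2", "Band 1"))
```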
Some Liberal Arts colleges were missing SAT data in the Scorecard database and I filled those in from www.collegedata.com.
I created ‘tiers’ based on the Carnegie Classifications and the SAT Math bands.
The number of undergraduates with SAT Math 700-800 (‘Size’) was estimated by multiplying the total undergraduate enrollment (‘Tot.Size’) by the fraction of incoming freshman with SAT Math scores between 700-800 (‘SAT.700.800’).
| | Size | Tot.Size | SAT.700.800 |
|---|---|---|---|
brandeis university | 2129 | 3715 | 0.57 |
case western reserve university | 2662 | 4807 | 0.55 |
dartmouth college | 2703 | 4184 | 0.65 |
rice university | 3112 | 3888 | 0.80 |
tufts university | 3250 | 5143 | 0.63 |
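As a check, the 'Size' column can be reproduced (to within rounding of the reported fraction) from the other two columns, e.g. for Rice:

```r
tot_size    <- 3888   # Tot.Size: total undergraduate enrollment
sat_700_800 <- 0.80   # fraction of incoming freshmen with SAT Math 700-800
size <- tot_size * sat_700_800
# gives 3110.4, close to the tabled Size of 3112 (the tabled value
# presumably uses an unrounded fraction)
```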
When I compute the per-capita rate of Sloan Fellow production, I will divide the number of Fellows by the 'Size' column. This is an attempt to compare apples to apples. Yes, more Sloan Fellows attended Princeton as undergraduates. Is this something about Princeton, or simply that Princeton has an unusually high fraction of undergraduates who are strong academically, especially in math?
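For example, using two rows from the tables below, the per-capita comparison looks like this:

```r
# Sloan Fellows per 1000 undergraduates with SAT Math 700-800
# (Fellows and 'Size' values taken from the tables below).
fellows <- c(princeton = 27, berkeley = 9)
size    <- c(princeton = 4181, berkeley = 14125)
per_1000 <- 1000 * fellows / size
```

Dividing by 'Size' rather than total enrollment is what makes the comparison per capita among academically strong students.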
The number of undergraduates (in thousands) in each bar is shown in parentheses below the number of Sloan Fellows. The bottom bar plot shows the per capita production of future Sloan Fellows.
The T1 and Pr1 groups have the highest per-capita production of future Sloan Fellows, but these groups are dominated by a few institutions, namely MIT, CalTech, Harvard and Princeton. These four institutions produce 35.8 percent of the future Sloan fellows in math, physics and economics.
What about the other 64.2 percent of Sloan fellows? To look at this, I remove MIT, CalTech, Harvard and Princeton. With these four outlier schools removed, the Liberal Arts colleges have the highest per-capita production in all SAT Math bands compared to research universities (private or public) in the same SAT Math band.
This shows the same bar-plot for Neuroscience, Chemistry, and Ocean Sciences. The number of undergraduates (n, in thousands) and the number of institutions (i) are shown below the number of Sloan Fellows for each bar.
MIT, CalTech, Harvard and Princeton produce 21.5 percent of the future Sloan fellows in neuroscience, chemistry, computational biology and ocean science.
However, the sample sizes are small (only 13 fellows in the L1 group on the left), and the difference between 2 versus 3 Sloan Fellows looks large in the plot but can occur due to chance. Williams and Swarthmore are schools one would expect to have more Sloan Fellows, but one would also expect Amherst and Bowdoin to be similarly high.
With these 4 schools removed, the Liberal Arts colleges have the highest per-capita production in all SAT Math bands.
This shows the institutions in each group along with the number of Sloan Fellows. I’ve added the SAT Math upper 75% score for each school. The Tot.Size is the undergraduate enrollment and ‘Size’ is the estimated number of undergraduates with an SAT score of 700-800.
| | Fellows | Tot.Size | Size | SAT.Math75 | SAT.700.800 |
|---|---|---|---|---|---|
california institute of technology | 12 | 983 | 983 | 800 | 1.00 |
georgia institute of technology-main campus | 2 | 13996 | 9043 | 770 | 0.65 |
massachusetts institute of technology | 10 | 4476 | 4217 | 800 | 0.94 |
rensselaer polytechnic institute | 2 | 5557 | 3590 | 770 | 0.65 |
| | Fellows | Tot.Size | Size | SAT.Math75 | SAT.700.800 |
|---|---|---|---|---|---|
colorado school of mines | 0 | 4383 | 1613 | 720 | 0.37 |
illinois institute of technology | 0 | 3046 | 1359 | 740 | 0.45 |
stevens institute of technology | 0 | 2842 | 1466 | 745 | 0.52 |
| | Fellows | Tot.Size | Size | SAT.Math75 | SAT.700.800 |
|---|---|---|---|---|---|
california polytechnic state university-san luis obispo | 0 | 19177 | 3308 | 680 | 0.17 |
florida institute of technology | 0 | 3348 | 462 | 660 | 0.14 |
lawrence technological university | 0 | 2798 | 277 | 650 | 0.10 |
michigan technological university | 0 | 5576 | 997 | 680 | 0.18 |
new jersey institute of technology | 0 | 6748 | 1003 | 670 | 0.15 |
virginia polytechnic institute and state university | 0 | 24191 | 4325 | 680 | 0.18 |
| | Fellows | Tot.Size | Size | SAT.Math75 | SAT.700.800 |
|---|---|---|---|---|---|
brandeis university | 1 | 3715 | 2129 | 770 | 0.57 |
brown university | 6 | 6264 | 3886 | 780 | 0.62 |
carnegie mellon university | 2 | 5819 | 4364 | 800 | 0.75 |
case western reserve university | 1 | 4807 | 2662 | 760 | 0.55 |
columbia university | 7 | 8100 | 6075 | 790 | 0.75 |
cornell university | 4 | 14195 | 9171 | 770 | 0.65 |
dartmouth college | 2 | 4184 | 2703 | 770 | 0.65 |
duke university | 2 | 6480 | 4860 | 790 | 0.75 |
georgetown university | 1 | 7211 | 3993 | 760 | 0.55 |
harvard university | 33 | 7236 | 5753 | 800 | 0.80 |
johns hopkins university | 1 | 6039 | 4188 | 770 | 0.69 |
northeastern university | 0 | 13492 | 8527 | 760 | 0.63 |
northwestern university | 1 | 8725 | 6544 | 790 | 0.75 |
princeton university | 27 | 5258 | 4181 | 800 | 0.80 |
rice university | 5 | 3888 | 3112 | 790 | 0.80 |
stanford university | 11 | 7018 | 5264 | 790 | 0.75 |
tufts university | 0 | 5143 | 3250 | 760 | 0.63 |
university of chicago | 12 | 5729 | 4694 | 800 | 0.82 |
university of notre dame | 1 | 8427 | 5445 | 770 | 0.65 |
university of pennsylvania | 2 | 10678 | 7476 | 780 | 0.70 |
university of southern california | 0 | 18392 | 10184 | 760 | 0.55 |
vanderbilt university | 0 | 6818 | 5756 | 800 | 0.84 |
washington university in st louis | 4 | 6913 | 5836 | 800 | 0.84 |
yale university | 8 | 5473 | 4105 | 800 | 0.75 |
| | Fellows | Tot.Size | Size | SAT.Math75 | SAT.700.800 |
|---|---|---|---|---|---|
bentley university | 0 | 4190 | 836 | 690 | 0.20 |
boston college | 0 | 9483 | 4232 | 740 | 0.45 |
boston university | 0 | 16457 | 6247 | 730 | 0.38 |
emory university | 0 | 7730 | 3865 | 750 | 0.50 |
george washington university | 0 | 10433 | 2608 | 700 | 0.25 |
lehigh university | 0 | 5034 | 2247 | 740 | 0.45 |
new york university | 2 | 24539 | 10478 | 740 | 0.43 |
santa clara university | 0 | 5447 | 1634 | 710 | 0.30 |
southern methodist university | 0 | 6340 | 1901 | 710 | 0.30 |
tulane university | 0 | 7892 | 1973 | 700 | 0.25 |
university of miami | 0 | 10828 | 4110 | 730 | 0.38 |
university of tulsa | 0 | 3441 | 860 | 700 | 0.25 |
| | Fellows | Tot.Size | Size | SAT.Math75 | SAT.700.800 |
|---|---|---|---|---|---|
american university | 0 | 7094 | 797 | 660 | 0.11 |
baylor university | 0 | 13801 | 1936 | 670 | 0.14 |
bethel university-saint paul | 0 | 2936 | 260 | 640 | 0.09 |
brigham young university-provo | 0 | 27163 | 4686 | 680 | 0.17 |
butler university | 0 | 4013 | 434 | 650 | 0.11 |
chapman university | 0 | 6211 | 479 | 650 | 0.08 |
creighton university | 0 | 3977 | 553 | 665 | 0.14 |
drake university | 0 | 3290 | 551 | 670 | 0.17 |
drexel university | 0 | 16681 | 2340 | 670 | 0.14 |
emerson college | 0 | 3757 | 290 | 650 | 0.08 |
fordham university | 0 | 8485 | 1464 | 680 | 0.17 |
gonzaga university | 0 | 4754 | 367 | 650 | 0.08 |
loyola marymount university | 0 | 6064 | 682 | 660 | 0.11 |
marquette university | 0 | 8212 | 634 | 650 | 0.08 |
rockhurst university | 0 | 1671 | 233 | 650 | 0.14 |
rollins college | 0 | 2670 | 313 | 660 | 0.12 |
saint louis university | 0 | 7822 | 1162 | 670 | 0.15 |
seattle university | 0 | 4415 | 350 | 640 | 0.08 |
spring arbor university | 0 | 2632 | 327 | 645 | 0.12 |
syracuse university | 0 | 14768 | 1802 | 660 | 0.12 |
texas christian university | 0 | 8600 | 763 | 650 | 0.09 |
university of dallas | 0 | 1324 | 105 | 640 | 0.08 |
university of dayton | 0 | 8305 | 658 | 640 | 0.08 |
university of denver | 0 | 5629 | 633 | 660 | 0.11 |
university of detroit mercy | 0 | 2677 | 481 | 670 | 0.18 |
yeshiva university | 0 | 2814 | 518 | 680 | 0.18 |
| | Fellows | Tot.Size | Size | SAT.Math75 | SAT.700.800 |
|---|---|---|---|---|---|
university of california-berkeley | 9 | 27126 | 14125 | 770 | 0.52 |
university of california-san diego | 2 | 24801 | 11887 | 760 | 0.48 |
university of illinois at urbana-champaign | 2 | 31875 | 23906 | 790 | 0.75 |
university of michigan-ann arbor | 3 | 28217 | 15624 | 760 | 0.55 |
| | Fellows | Tot.Size | Size | SAT.Math75 | SAT.700.800 |
|---|---|---|---|---|---|
college of william and mary | 0 | 6256 | 2671 | 740 | 0.43 |
cuny bernard m baruch college | 0 | 14420 | 2955 | 690 | 0.20 |
michigan state university | 1 | 38395 | 8535 | 690 | 0.22 |
missouri university of science and technology | 0 | 6418 | 1604 | 700 | 0.25 |
ohio state university-main campus | 0 | 43733 | 16601 | 730 | 0.38 |
purdue university-main campus | 0 | 29977 | 7494 | 700 | 0.25 |
rutgers university-new brunswick | 1 | 34094 | 8524 | 700 | 0.25 |
stony brook university | 1 | 16170 | 4765 | 710 | 0.29 |
suny at binghamton | 0 | 13372 | 3343 | 700 | 0.25 |
university of california-davis | 0 | 27547 | 6887 | 700 | 0.25 |
university of california-irvine | 0 | 24474 | 5394 | 690 | 0.22 |
university of california-los angeles | 1 | 29627 | 12550 | 750 | 0.42 |
university of california-santa barbara | 3 | 20237 | 5699 | 710 | 0.28 |
university of connecticut | 0 | 18016 | 3767 | 690 | 0.21 |
university of florida | 2 | 31879 | 6666 | 690 | 0.21 |
university of iowa | 2 | 21486 | 5372 | 700 | 0.25 |
university of maryland-college park | 3 | 26532 | 10072 | 730 | 0.38 |
university of minnesota-twin cities | 2 | 30135 | 13205 | 750 | 0.44 |
university of north carolina at chapel hill | 0 | 17908 | 6141 | 720 | 0.34 |
university of pittsburgh-pittsburgh campus | 0 | 18474 | 3785 | 690 | 0.20 |
university of texas at austin | 5 | 38914 | 12705 | 720 | 0.33 |
university of texas at dallas | 0 | 14300 | 3575 | 700 | 0.25 |
university of virginia-main campus | 6 | 15515 | 6625 | 740 | 0.43 |
university of washington-seattle campus | 0 | 29468 | 7367 | 700 | 0.25 |
university of wisconsin-madison | 1 | 29302 | 12046 | 740 | 0.41 |
| | Fellows | Tot.Size | Size | SAT.Math75 | SAT.700.800 |
|---|---|---|---|---|---|
arizona state university-tempe | 0 | 39316 | 3487 | 640 | 0.09 |
auburn university | 0 | 20514 | 1918 | 645 | 0.09 |
clemson university | 0 | 17083 | 2819 | 680 | 0.16 |
college of new jersey | 0 | 6407 | 899 | 670 | 0.14 |
cuny city college | 0 | 12175 | 1680 | 660 | 0.14 |
cuny hunter college | 0 | 15778 | 1560 | 650 | 0.10 |
florida state university | 0 | 32432 | 1489 | 640 | 0.05 |
indiana university-bloomington | 0 | 32252 | 4209 | 660 | 0.13 |
iowa state university | 2 | 28336 | 5568 | 680 | 0.20 |
miami university-oxford | 0 | 15454 | 2763 | 680 | 0.18 |
mississippi state university | 0 | 15800 | 1662 | 640 | 0.11 |
north carolina state university at raleigh | 0 | 22925 | 3572 | 680 | 0.16 |
north dakota state university-main campus | 0 | 11763 | 1981 | 665 | 0.17 |
pennsylvania state university-main campus | 2 | 39958 | 5938 | 670 | 0.15 |
southern illinois university-carbondale | 0 | 13171 | 1480 | 640 | 0.11 |
texas a and m university-college station | 0 | 46941 | 6975 | 670 | 0.15 |
university at buffalo | 0 | 19488 | 1729 | 650 | 0.09 |
university of alabama at birmingham | 0 | 11383 | 1279 | 640 | 0.11 |
university of alabama in huntsville | 0 | 5451 | 589 | 650 | 0.11 |
university of california-riverside | 1 | 18784 | 2186 | 650 | 0.12 |
university of california-santa cruz | 0 | 16277 | 2015 | 650 | 0.12 |
university of central florida | 0 | 52280 | 3022 | 640 | 0.06 |
university of cincinnati-main campus | 0 | 23795 | 3284 | 660 | 0.14 |
university of colorado boulder | 0 | 25873 | 3376 | 660 | 0.13 |
university of delaware | 0 | 18222 | 2223 | 660 | 0.12 |
university of georgia | 0 | 26738 | 3751 | 670 | 0.14 |
university of houston | 0 | 31643 | 2180 | 640 | 0.07 |
university of illinois at chicago | 0 | 16635 | 2296 | 660 | 0.14 |
university of kentucky | 0 | 21725 | 2114 | 640 | 0.10 |
university of maryland-baltimore county | 0 | 11274 | 1582 | 670 | 0.14 |
university of massachusetts-amherst | 0 | 21864 | 3068 | 670 | 0.14 |
university of massachusetts-lowell | 0 | 12190 | 840 | 640 | 0.07 |
university of michigan-dearborn | 0 | 6906 | 945 | 658 | 0.14 |
university of michigan-flint | 0 | 6565 | 1304 | 680 | 0.20 |
university of minnesota-duluth | 0 | 9120 | 887 | 640 | 0.10 |
university of missouri-columbia | 1 | 27276 | 3764 | 660 | 0.14 |
university of missouri-kansas city | 0 | 8127 | 1645 | 680 | 0.20 |
university of missouri-st louis | 0 | 8936 | 1435 | 660 | 0.16 |
university of nebraska-lincoln | 0 | 19979 | 2889 | 660 | 0.14 |
university of new orleans | 0 | 6742 | 1218 | 670 | 0.18 |
university of north carolina wilmington | 0 | 12686 | 582 | 640 | 0.05 |
university of oklahoma-norman campus | 0 | 20538 | 3329 | 670 | 0.16 |
university of puerto rico-rio piedras | 0 | 12086 | 2179 | 673 | 0.18 |
university of south carolina-columbia | 0 | 24623 | 2494 | 660 | 0.10 |
university of tennessee-knoxville | 1 | 21396 | 1695 | 640 | 0.08 |
university of utah | 1 | 22804 | 2654 | 650 | 0.12 |
university of vermont | 0 | 9958 | 985 | 650 | 0.10 |
| | Fellows | Tot.Size | Size | SAT.Math75 | SAT.700.800 |
|---|---|---|---|---|---|
amherst college | 1 | 1792 | 1158 | 770 | 0.65 |
bowdoin college | 0 | 1797 | 1202 | 770 | 0.67 |
carleton college | 2 | 2042 | 1131 | 760 | 0.55 |
claremont mckenna college | 0 | 1293 | 897 | 770 | 0.69 |
grinnell college | 0 | 1670 | 925 | 760 | 0.55 |
harvey mudd college | 2 | 802 | 756 | 800 | 0.94 |
illinois wesleyan university | 0 | 1883 | 1043 | 760 | 0.55 |
pomona college | 0 | 1635 | 1134 | 770 | 0.69 |
swarthmore college | 1 | 1530 | 989 | 770 | 0.65 |
williams college | 0 | 2019 | 1224 | 770 | 0.61 |
| | Fellows | Tot.Size | Size | SAT.Math75 | SAT.700.800 |
|---|---|---|---|---|---|
augustana college | 0 | 2469 | 560 | 690 | 0.23 |
barnard college | 0 | 2556 | 767 | 710 | 0.30 |
bates college | 1 | 1773 | 466 | 703 | 0.26 |
bryn mawr college | 0 | 1303 | 467 | 730 | 0.36 |
bucknell university | 0 | 3528 | 1210 | 720 | 0.34 |
colby college | 0 | 1847 | 633 | 720 | 0.34 |
colgate university | 2 | 2865 | 1432 | 750 | 0.50 |
college of the holy cross | 0 | 2754 | 531 | 690 | 0.19 |
colorado college | 1 | 2036 | 611 | 710 | 0.30 |
connecticut college | 0 | 1875 | 469 | 700 | 0.25 |
davidson college | 0 | 1765 | 605 | 720 | 0.34 |
dickinson college | 0 | 2315 | 579 | 700 | 0.25 |
franklin and marshall college | 0 | 2182 | 859 | 730 | 0.39 |
hamilton college | 0 | 1890 | 945 | 740 | 0.50 |
haverford college | 0 | 1187 | 594 | 740 | 0.50 |
kalamazoo college | 0 | 1431 | 315 | 690 | 0.22 |
kenyon college | 0 | 1651 | 330 | 690 | 0.20 |
lafayette college | 0 | 2471 | 847 | 720 | 0.34 |
lawrence university | 1 | 1483 | 546 | 730 | 0.37 |
macalester college | 2 | 2053 | 844 | 735 | 0.41 |
middlebury college | 0 | 2498 | 1067 | 740 | 0.43 |
mount holyoke college | 0 | 2161 | 828 | 735 | 0.38 |
oberlin college | 0 | 2961 | 1015 | 720 | 0.34 |
occidental college | 0 | 2023 | 506 | 700 | 0.25 |
pitzer college | 0 | 1076 | 381 | 720 | 0.35 |
reed college | 1 | 1335 | 526 | 730 | 0.39 |
rhodes college | 0 | 2007 | 411 | 690 | 0.20 |
scripps college | 0 | 962 | 289 | 710 | 0.30 |
smith college | 0 | 2563 | 879 | 720 | 0.34 |
soka university of america | 0 | 411 | 118 | 710 | 0.29 |
st olaf college | 0 | 2990 | 645 | 690 | 0.22 |
trinity college | 0 | 2262 | 566 | 700 | 0.25 |
union college of schenectady, ny | 0 | 2204 | 780 | 720 | 0.35 |
university of richmond | 0 | 3223 | 1105 | 720 | 0.34 |
vassar college | 1 | 2389 | 1123 | 740 | 0.47 |
washington and lee university | 0 | 1880 | 868 | 730 | 0.46 |
wellesley college | 0 | 2172 | 969 | 740 | 0.45 |
wesleyan university | 1 | 2907 | 1241 | 740 | 0.43 |
wheaton college of wheaton, il | 0 | 2402 | 600 | 700 | 0.25 |
whitman college | 1 | 1480 | 430 | 710 | 0.29 |
School | Fellows | Tot.Size | Size | SAT.Math75 | SAT.700.800 |
---|---|---|---|---|---|
agnes scott college | 0 | 802 | 78 | 640 | 0.10 |
albion college | 0 | 1249 | 111 | 640 | 0.09 |
austin college | 0 | 1272 | 126 | 650 | 0.10 |
bard college | 0 | 2021 | 361 | 680 | 0.18 |
beloit college | 0 | 1225 | 102 | 645 | 0.08 |
centre college | 0 | 1379 | 254 | 680 | 0.18 |
college of wooster | 0 | 2029 | 302 | 670 | 0.15 |
concordia college at moorhead | 0 | 2296 | 49 | 660 | 0.02 |
cornell college | 0 | 1082 | 126 | 650 | 0.12 |
denison university | 0 | 2265 | 391 | 680 | 0.17 |
depauw university | 0 | 2185 | 246 | 660 | 0.11 |
earlham college | 0 | 942 | 162 | 680 | 0.17 |
furman university | 0 | 2796 | 341 | 660 | 0.12 |
gettysburg college | 0 | 2437 | 352 | 680 | 0.14 |
gordon college | 0 | 1703 | 166 | 640 | 0.10 |
gustavus adolphus college | 0 | 2455 | 447 | 675 | 0.18 |
hendrix college | 0 | 1322 | 173 | 660 | 0.13 |
hobart william smith colleges | 0 | 2344 | 247 | 670 | 0.11 |
hope college | 0 | 3312 | 358 | 650 | 0.11 |
knox college | 0 | 1376 | 102 | 660 | 0.07 |
lewis and clark college | 0 | 2039 | 242 | 670 | 0.12 |
luther college | 0 | 2326 | 226 | 640 | 0.10 |
muhlenberg college | 0 | 2332 | 262 | 660 | 0.11 |
ripon college | 0 | 820 | 173 | 680 | 0.21 |
sarah lawrence college | 0 | 1366 | 258 | 680 | 0.19 |
sewanee-university of the south | 0 | 1616 | 125 | 650 | 0.08 |
skidmore college | 0 | 2612 | 467 | 680 | 0.18 |
st john’s college | 0 | 429 | 77 | 680 | 0.18 |
st lawrence university | 0 | 2282 | 278 | 660 | 0.12 |
thomas aquinas college | 0 | 378 | 17 | 640 | 0.05 |
transylvania university | 0 | 1014 | 140 | 660 | 0.14 |
university of puget sound | 0 | 2550 | 226 | 650 | 0.09 |
wabash college | 0 | 923 | 82 | 640 | 0.09 |
washington college | 0 | 1408 | 97 | 640 | 0.07 |
westmont college | 1 | 1297 | 115 | 640 | 0.09 |
wheaton college of norton, ma | 0 | 1575 | 219 | 665 | 0.14 |
willamette university | 0 | 2009 | 199 | 650 | 0.10 |
wofford college | 0 | 1654 | 131 | 640 | 0.08 |
School | Fellows | Tot.Size | Size | SAT.Math75 | SAT.700.800 |
---|---|---|---|---|---|
california institute of technology | 9 | 983 | 983 | 800 | 1.00 |
georgia institute of technology-main campus | 0 | 13996 | 9043 | 770 | 0.65 |
massachusetts institute of technology | 21 | 4476 | 4217 | 800 | 0.94 |
rensselaer polytechnic institute | 0 | 5557 | 3590 | 770 | 0.65 |
School | Fellows | Tot.Size | Size | SAT.Math75 | SAT.700.800 |
---|---|---|---|---|---|
colorado school of mines | 0 | 4383 | 1613 | 720 | 0.37 |
illinois institute of technology | 0 | 3046 | 1359 | 740 | 0.45 |
stevens institute of technology | 0 | 2842 | 1466 | 745 | 0.52 |
School | Fellows | Tot.Size | Size | SAT.Math75 | SAT.700.800 |
---|---|---|---|---|---|
california polytechnic state university-san luis obispo | 1 | 19177 | 3308 | 680 | 0.17 |
florida institute of technology | 0 | 3348 | 462 | 660 | 0.14 |
lawrence technological university | 0 | 2798 | 277 | 650 | 0.10 |
michigan technological university | 0 | 5576 | 997 | 680 | 0.18 |
new jersey institute of technology | 0 | 6748 | 1003 | 670 | 0.15 |
virginia polytechnic institute and state university | 0 | 24191 | 4325 | 680 | 0.18 |
School | Fellows | Tot.Size | Size | SAT.Math75 | SAT.700.800 |
---|---|---|---|---|---|
brandeis university | 3 | 3715 | 2129 | 770 | 0.57 |
brown university | 6 | 6264 | 3886 | 780 | 0.62 |
carnegie mellon university | 3 | 5819 | 4364 | 800 | 0.75 |
case western reserve university | 0 | 4807 | 2662 | 760 | 0.55 |
columbia university | 5 | 8100 | 6075 | 790 | 0.75 |
cornell university | 7 | 14195 | 9171 | 770 | 0.65 |
dartmouth college | 6 | 4184 | 2703 | 770 | 0.65 |
duke university | 3 | 6480 | 4860 | 790 | 0.75 |
georgetown university | 1 | 7211 | 3993 | 760 | 0.55 |
harvard university | 22 | 7236 | 5753 | 800 | 0.80 |
johns hopkins university | 2 | 6039 | 4188 | 770 | 0.69 |
northeastern university | 0 | 13492 | 8527 | 760 | 0.63 |
northwestern university | 1 | 8725 | 6544 | 790 | 0.75 |
princeton university | 13 | 5258 | 4181 | 800 | 0.80 |
rice university | 4 | 3888 | 3112 | 790 | 0.80 |
stanford university | 12 | 7018 | 5264 | 790 | 0.75 |
tufts university | 2 | 5143 | 3250 | 760 | 0.63 |
university of chicago | 4 | 5729 | 4694 | 800 | 0.82 |
university of notre dame | 1 | 8427 | 5445 | 770 | 0.65 |
university of pennsylvania | 3 | 10678 | 7476 | 780 | 0.70 |
university of southern california | 3 | 18392 | 10184 | 760 | 0.55 |
vanderbilt university | 0 | 6818 | 5756 | 800 | 0.84 |
washington university in st louis | 6 | 6913 | 5836 | 800 | 0.84 |
yale university | 4 | 5473 | 4105 | 800 | 0.75 |
School | Fellows | Tot.Size | Size | SAT.Math75 | SAT.700.800 |
---|---|---|---|---|---|
bentley university | 0 | 4190 | 836 | 690 | 0.20 |
boston college | 1 | 9483 | 4232 | 740 | 0.45 |
boston university | 0 | 16457 | 6247 | 730 | 0.38 |
emory university | 1 | 7730 | 3865 | 750 | 0.50 |
george washington university | 0 | 10433 | 2608 | 700 | 0.25 |
lehigh university | 1 | 5034 | 2247 | 740 | 0.45 |
new york university | 4 | 24539 | 10478 | 740 | 0.43 |
santa clara university | 0 | 5447 | 1634 | 710 | 0.30 |
southern methodist university | 1 | 6340 | 1901 | 710 | 0.30 |
tulane university | 0 | 7892 | 1973 | 700 | 0.25 |
university of miami | 0 | 10828 | 4110 | 730 | 0.38 |
university of tulsa | 0 | 3441 | 860 | 700 | 0.25 |
School | Fellows | Tot.Size | Size | SAT.Math75 | SAT.700.800 |
---|---|---|---|---|---|
american university | 0 | 7094 | 797 | 660 | 0.11 |
baylor university | 1 | 13801 | 1936 | 670 | 0.14 |
bethel university-saint paul | 0 | 2936 | 260 | 640 | 0.09 |
brigham young university-provo | 0 | 27163 | 4686 | 680 | 0.17 |
butler university | 0 | 4013 | 434 | 650 | 0.11 |
chapman university | 0 | 6211 | 479 | 650 | 0.08 |
creighton university | 0 | 3977 | 553 | 665 | 0.14 |
drake university | 0 | 3290 | 551 | 670 | 0.17 |
drexel university | 0 | 16681 | 2340 | 670 | 0.14 |
emerson college | 0 | 3757 | 290 | 650 | 0.08 |
fordham university | 0 | 8485 | 1464 | 680 | 0.17 |
gonzaga university | 1 | 4754 | 367 | 650 | 0.08 |
loyola marymount university | 0 | 6064 | 682 | 660 | 0.11 |
marquette university | 1 | 8212 | 634 | 650 | 0.08 |
rockhurst university | 0 | 1671 | 233 | 650 | 0.14 |
rollins college | 0 | 2670 | 313 | 660 | 0.12 |
saint louis university | 1 | 7822 | 1162 | 670 | 0.15 |
seattle university | 0 | 4415 | 350 | 640 | 0.08 |
spring arbor university | 0 | 2632 | 327 | 645 | 0.12 |
syracuse university | 0 | 14768 | 1802 | 660 | 0.12 |
texas christian university | 0 | 8600 | 763 | 650 | 0.09 |
university of dallas | 0 | 1324 | 105 | 640 | 0.08 |
university of dayton | 0 | 8305 | 658 | 640 | 0.08 |
university of denver | 0 | 5629 | 633 | 660 | 0.11 |
university of detroit mercy | 0 | 2677 | 481 | 670 | 0.18 |
yeshiva university | 0 | 2814 | 518 | 680 | 0.18 |
School | Fellows | Tot.Size | Size | SAT.Math75 | SAT.700.800 |
---|---|---|---|---|---|
university of california-berkeley | 11 | 27126 | 14125 | 770 | 0.52 |
university of california-san diego | 0 | 24801 | 11887 | 760 | 0.48 |
university of illinois at urbana-champaign | 6 | 31875 | 23906 | 790 | 0.75 |
university of michigan-ann arbor | 5 | 28217 | 15624 | 760 | 0.55 |
School | Fellows | Tot.Size | Size | SAT.Math75 | SAT.700.800 |
---|---|---|---|---|---|
college of william and mary | 4 | 6256 | 2671 | 740 | 0.43 |
cuny bernard m baruch college | 0 | 14420 | 2955 | 690 | 0.20 |
michigan state university | 2 | 38395 | 8535 | 690 | 0.22 |
missouri university of science and technology | 0 | 6418 | 1604 | 700 | 0.25 |
ohio state university-main campus | 2 | 43733 | 16601 | 730 | 0.38 |
purdue university-main campus | 0 | 29977 | 7494 | 700 | 0.25 |
rutgers university-new brunswick | 1 | 34094 | 8524 | 700 | 0.25 |
stony brook university | 0 | 16170 | 4765 | 710 | 0.29 |
suny at binghamton | 0 | 13372 | 3343 | 700 | 0.25 |
university of california-davis | 3 | 27547 | 6887 | 700 | 0.25 |
university of california-irvine | 1 | 24474 | 5394 | 690 | 0.22 |
university of california-los angeles | 5 | 29627 | 12550 | 750 | 0.42 |
university of california-santa barbara | 0 | 20237 | 5699 | 710 | 0.28 |
university of connecticut | 0 | 18016 | 3767 | 690 | 0.21 |
university of florida | 1 | 31879 | 6666 | 690 | 0.21 |
university of iowa | 1 | 21486 | 5372 | 700 | 0.25 |
university of maryland-college park | 9 | 26532 | 10072 | 730 | 0.38 |
university of minnesota-twin cities | 2 | 30135 | 13205 | 750 | 0.44 |
university of north carolina at chapel hill | 6 | 17908 | 6141 | 720 | 0.34 |
university of pittsburgh-pittsburgh campus | 0 | 18474 | 3785 | 690 | 0.20 |
university of texas at austin | 2 | 38914 | 12705 | 720 | 0.33 |
university of texas at dallas | 0 | 14300 | 3575 | 700 | 0.25 |
university of virginia-main campus | 3 | 15515 | 6625 | 740 | 0.43 |
university of washington-seattle campus | 1 | 29468 | 7367 | 700 | 0.25 |
university of wisconsin-madison | 6 | 29302 | 12046 | 740 | 0.41 |
School | Fellows | Tot.Size | Size | SAT.Math75 | SAT.700.800 |
---|---|---|---|---|---|
arizona state university-tempe | 1 | 39316 | 3487 | 640 | 0.09 |
auburn university | 0 | 20514 | 1918 | 645 | 0.09 |
clemson university | 0 | 17083 | 2819 | 680 | 0.16 |
college of new jersey | 0 | 6407 | 899 | 670 | 0.14 |
cuny city college | 0 | 12175 | 1680 | 660 | 0.14 |
cuny hunter college | 1 | 15778 | 1560 | 650 | 0.10 |
florida state university | 0 | 32432 | 1489 | 640 | 0.05 |
indiana university-bloomington | 2 | 32252 | 4209 | 660 | 0.13 |
iowa state university | 0 | 28336 | 5568 | 680 | 0.20 |
miami university-oxford | 0 | 15454 | 2763 | 680 | 0.18 |
mississippi state university | 1 | 15800 | 1662 | 640 | 0.11 |
north carolina state university at raleigh | 1 | 22925 | 3572 | 680 | 0.16 |
north dakota state university-main campus | 0 | 11763 | 1981 | 665 | 0.17 |
pennsylvania state university-main campus | 7 | 39958 | 5938 | 670 | 0.15 |
southern illinois university-carbondale | 0 | 13171 | 1480 | 640 | 0.11 |
texas a and m university-college station | 2 | 46941 | 6975 | 670 | 0.15 |
university at buffalo | 0 | 19488 | 1729 | 650 | 0.09 |
university of alabama at birmingham | 0 | 11383 | 1279 | 640 | 0.11 |
university of alabama in huntsville | 0 | 5451 | 589 | 650 | 0.11 |
university of california-riverside | 1 | 18784 | 2186 | 650 | 0.12 |
university of california-santa cruz | 1 | 16277 | 2015 | 650 | 0.12 |
university of central florida | 1 | 52280 | 3022 | 640 | 0.06 |
university of cincinnati-main campus | 0 | 23795 | 3284 | 660 | 0.14 |
university of colorado boulder | 2 | 25873 | 3376 | 660 | 0.13 |
university of delaware | 2 | 18222 | 2223 | 660 | 0.12 |
university of georgia | 1 | 26738 | 3751 | 670 | 0.14 |
university of houston | 0 | 31643 | 2180 | 640 | 0.07 |
university of illinois at chicago | 0 | 16635 | 2296 | 660 | 0.14 |
university of kentucky | 0 | 21725 | 2114 | 640 | 0.10 |
university of maryland-baltimore county | 1 | 11274 | 1582 | 670 | 0.14 |
university of massachusetts-amherst | 0 | 21864 | 3068 | 670 | 0.14 |
university of massachusetts-lowell | 0 | 12190 | 840 | 640 | 0.07 |
university of michigan-dearborn | 0 | 6906 | 945 | 658 | 0.14 |
university of michigan-flint | 0 | 6565 | 1304 | 680 | 0.20 |
university of minnesota-duluth | 0 | 9120 | 887 | 640 | 0.10 |
university of missouri-columbia | 1 | 27276 | 3764 | 660 | 0.14 |
university of missouri-kansas city | 0 | 8127 | 1645 | 680 | 0.20 |
university of missouri-st louis | 0 | 8936 | 1435 | 660 | 0.16 |
university of nebraska-lincoln | 0 | 19979 | 2889 | 660 | 0.14 |
university of new orleans | 1 | 6742 | 1218 | 670 | 0.18 |
university of north carolina wilmington | 0 | 12686 | 582 | 640 | 0.05 |
university of oklahoma-norman campus | 1 | 20538 | 3329 | 670 | 0.16 |
university of puerto rico-rio piedras | 0 | 12086 | 2179 | 673 | 0.18 |
university of south carolina-columbia | 0 | 24623 | 2494 | 660 | 0.10 |
university of tennessee-knoxville | 1 | 21396 | 1695 | 640 | 0.08 |
university of utah | 2 | 22804 | 2654 | 650 | 0.12 |
university of vermont | 1 | 9958 | 985 | 650 | 0.10 |
School | Fellows | Tot.Size | Size | SAT.Math75 | SAT.700.800 |
---|---|---|---|---|---|
amherst college | 0 | 1792 | 1158 | 770 | 0.65 |
bowdoin college | 1 | 1797 | 1202 | 770 | 0.67 |
carleton college | 1 | 2042 | 1131 | 760 | 0.55 |
claremont mckenna college | 0 | 1293 | 897 | 770 | 0.69 |
grinnell college | 0 | 1670 | 925 | 760 | 0.55 |
harvey mudd college | 1 | 802 | 756 | 800 | 0.94 |
illinois wesleyan university | 1 | 1883 | 1043 | 760 | 0.55 |
pomona college | 2 | 1635 | 1134 | 770 | 0.69 |
swarthmore college | 3 | 1530 | 989 | 770 | 0.65 |
williams college | 4 | 2019 | 1224 | 770 | 0.61 |
School | Fellows | Tot.Size | Size | SAT.Math75 | SAT.700.800 |
---|---|---|---|---|---|
augustana college | 0 | 2469 | 560 | 690 | 0.23 |
barnard college | 0 | 2556 | 767 | 710 | 0.30 |
bates college | 1 | 1773 | 466 | 703 | 0.26 |
bryn mawr college | 1 | 1303 | 467 | 730 | 0.36 |
bucknell university | 0 | 3528 | 1210 | 720 | 0.34 |
colby college | 1 | 1847 | 633 | 720 | 0.34 |
colgate university | 1 | 2865 | 1432 | 750 | 0.50 |
college of the holy cross | 0 | 2754 | 531 | 690 | 0.19 |
colorado college | 1 | 2036 | 611 | 710 | 0.30 |
connecticut college | 1 | 1875 | 469 | 700 | 0.25 |
davidson college | 0 | 1765 | 605 | 720 | 0.34 |
dickinson college | 0 | 2315 | 579 | 700 | 0.25 |
franklin and marshall college | 0 | 2182 | 859 | 730 | 0.39 |
hamilton college | 1 | 1890 | 945 | 740 | 0.50 |
haverford college | 0 | 1187 | 594 | 740 | 0.50 |
kalamazoo college | 0 | 1431 | 315 | 690 | 0.22 |
kenyon college | 0 | 1651 | 330 | 690 | 0.20 |
lafayette college | 1 | 2471 | 847 | 720 | 0.34 |
lawrence university | 1 | 1483 | 546 | 730 | 0.37 |
macalester college | 4 | 2053 | 844 | 735 | 0.41 |
middlebury college | 3 | 2498 | 1067 | 740 | 0.43 |
mount holyoke college | 0 | 2161 | 828 | 735 | 0.38 |
oberlin college | 1 | 2961 | 1015 | 720 | 0.34 |
occidental college | 0 | 2023 | 506 | 700 | 0.25 |
pitzer college | 1 | 1076 | 381 | 720 | 0.35 |
reed college | 2 | 1335 | 526 | 730 | 0.39 |
rhodes college | 0 | 2007 | 411 | 690 | 0.20 |
scripps college | 0 | 962 | 289 | 710 | 0.30 |
smith college | 2 | 2563 | 879 | 720 | 0.34 |
soka university of america | 0 | 411 | 118 | 710 | 0.29 |
st olaf college | 1 | 2990 | 645 | 690 | 0.22 |
trinity college | 0 | 2262 | 566 | 700 | 0.25 |
union college of schenectady, ny | 0 | 2204 | 780 | 720 | 0.35 |
university of richmond | 0 | 3223 | 1105 | 720 | 0.34 |
vassar college | 0 | 2389 | 1123 | 740 | 0.47 |
washington and lee university | 0 | 1880 | 868 | 730 | 0.46 |
wellesley college | 1 | 2172 | 969 | 740 | 0.45 |
wesleyan university | 2 | 2907 | 1241 | 740 | 0.43 |
wheaton college of wheaton, il | 0 | 2402 | 600 | 700 | 0.25 |
whitman college | 0 | 1480 | 430 | 710 | 0.29 |
School | Fellows | Tot.Size | Size | SAT.Math75 | SAT.700.800 |
---|---|---|---|---|---|
agnes scott college | 0 | 802 | 78 | 640 | 0.10 |
albion college | 0 | 1249 | 111 | 640 | 0.09 |
austin college | 0 | 1272 | 126 | 650 | 0.10 |
bard college | 0 | 2021 | 361 | 680 | 0.18 |
beloit college | 0 | 1225 | 102 | 645 | 0.08 |
centre college | 0 | 1379 | 254 | 680 | 0.18 |
college of wooster | 0 | 2029 | 302 | 670 | 0.15 |
concordia college at moorhead | 1 | 2296 | 49 | 660 | 0.02 |
cornell college | 0 | 1082 | 126 | 650 | 0.12 |
denison university | 0 | 2265 | 391 | 680 | 0.17 |
depauw university | 0 | 2185 | 246 | 660 | 0.11 |
earlham college | 0 | 942 | 162 | 680 | 0.17 |
furman university | 1 | 2796 | 341 | 660 | 0.12 |
gettysburg college | 1 | 2437 | 352 | 680 | 0.14 |
gordon college | 0 | 1703 | 166 | 640 | 0.10 |
gustavus adolphus college | 1 | 2455 | 447 | 675 | 0.18 |
hendrix college | 0 | 1322 | 173 | 660 | 0.13 |
hobart william smith colleges | 0 | 2344 | 247 | 670 | 0.11 |
hope college | 1 | 3312 | 358 | 650 | 0.11 |
knox college | 0 | 1376 | 102 | 660 | 0.07 |
lewis and clark college | 0 | 2039 | 242 | 670 | 0.12 |
luther college | 1 | 2326 | 226 | 640 | 0.10 |
muhlenberg college | 0 | 2332 | 262 | 660 | 0.11 |
ripon college | 0 | 820 | 173 | 680 | 0.21 |
sarah lawrence college | 0 | 1366 | 258 | 680 | 0.19 |
sewanee-university of the south | 0 | 1616 | 125 | 650 | 0.08 |
skidmore college | 0 | 2612 | 467 | 680 | 0.18 |
st john’s college | 0 | 429 | 77 | 680 | 0.18 |
st lawrence university | 0 | 2282 | 278 | 660 | 0.12 |
thomas aquinas college | 0 | 378 | 17 | 640 | 0.05 |
transylvania university | 0 | 1014 | 140 | 660 | 0.14 |
university of puget sound | 1 | 2550 | 226 | 650 | 0.09 |
wabash college | 0 | 923 | 82 | 640 | 0.09 |
washington college | 0 | 1408 | 97 | 640 | 0.07 |
westmont college | 0 | 1297 | 115 | 640 | 0.09 |
wheaton college of norton, ma | 0 | 1575 | 219 | 665 | 0.14 |
willamette university | 0 | 2009 | 199 | 650 | 0.10 |
wofford college | 0 | 1654 | 131 | 640 | 0.08 |
The Sloan Fellowship is a prestigious award for early career scientists at U.S. and Canadian academic institutions. Each year 126 scientists are recognized. From the Sloan Fellowship website: “These 126 early-career scholars represent the most promising scientific researchers working today. Their achievements and potential place them among the next generation of scientific leaders in the U.S. and Canada.” Forty-three Sloan Fellows have gone on to win a Nobel Prize and 16 have won the Fields Medal in mathematics.
I looked up the CVs of all the Sloan Fellowship winners for all available years (2008-2016) in all fields: Math, Physics, Chemistry, Economics, Neuroscience, Computational Biology and Ocean Sciences. The names of the awardees are listed in the Sloan Foundation press releases. I noted the school where each awardee got their undergraduate degree. Here I analyze the patterns of the undergraduate institutions of those awardees who received their undergraduate degree in the United States (about 50% of awardees). I supplemented these data with information on the undergraduate institutions from the Scorecard database of U.S. baccalaureate institutions.
I will be showing a series of cumulative plots that look at Sloan production within groups of schools. I am not concerned with individual schools, but rather with production within a whole group of schools. Cumulative plots help one see patterns in rare events, like Sloan fellows. Within a group, I sort the schools by size (this step is not important) and then take the cumulative sum of school size and of the number of Sloan fellows. I then plot the cumulative size (enrollment) against the cumulative number of Sloan fellows.
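The analysis in this post is done in R, but the cumulative-sum construction is simple enough to sketch in a few lines. Here is a minimal Python illustration of the steps above; the school names and counts are made up for illustration, not the post's data.

```python
# Minimal sketch of the cumulative-plot construction: within one tier
# group, sort schools by enrollment, then take running sums of
# enrollment and of Sloan fellow counts. The (x, y) pairs are the
# points of the cumulative plot. All numbers here are made up.

schools = [
    ("school A", 1200, 0),   # (name, enrollment, fellows)
    ("school B", 2900, 1),
    ("school C", 1800, 0),
    ("school D", 5200, 2),
]

# Sort by enrollment (as noted above, this step is not essential).
schools.sort(key=lambda s: s[1])

cum_size, cum_fellows, points = 0, 0, []
for name, size, fellows in schools:
    cum_size += size
    cum_fellows += fellows
    points.append((cum_size, cum_fellows))

print(points)  # x = cumulative enrollment, y = cumulative fellows
```

Plotting `points` for each tier group on the same axes gives the cumulative plots shown below.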
For the first analysis, I used the tiers defined by the Equality of Opportunity Project (EOP). These tiers include a relatively small set of schools (relative to all the private and public schools). I’m not sure how this set was defined, but it was certainly not defined using the number of future Sloan Fellows. Notice that the mean upper 75% ACT scores are similar across the groups within a tier (elite or highly selective). The EOP did not separate out the Liberal Arts colleges, so I created separate tiers for these colleges. I did not use the EOP ‘highly selective public’ group since there are too many name differences between the EOP and Scorecard databases.
Mean upper 75% ACT is 34.5.
## [1] "brown university"
## [2] "columbia university"
## [3] "cornell university"
## [4] "dartmouth college"
## [5] "duke university"
## [6] "harvard university"
## [7] "princeton university"
## [8] "stanford university"
## [9] "university of chicago"
## [10] "university of pennsylvania"
## [11] "yale university"
## [12] "massachusetts institute of technology"
## [13] "california institute of technology"
Mean upper 75% ACT is 33.1.
## [1] "boston college"
## [2] "brandeis university"
## [3] "carnegie mellon university"
## [4] "case western reserve university"
## [5] "emory university"
## [6] "georgetown university"
## [7] "johns hopkins university"
## [8] "lehigh university"
## [9] "new york university"
## [10] "northeastern university"
## [11] "northwestern university"
## [12] "rice university"
## [13] "tufts university"
## [14] "tulane university"
## [15] "university of miami"
## [16] "university of notre dame"
## [17] "university of rochester"
## [18] "university of southern california"
## [19] "vanderbilt university"
## [20] "wake forest university"
## [21] "washington university in st louis"
Mean upper 75% ACT is 33.
## [1] "college of william and mary"
## [2] "university of california-berkeley"
## [3] "university of california-los angeles"
## [4] "university of michigan-ann arbor"
## [5] "university of north carolina at chapel hill"
## [6] "university of virginia-main campus"
Mean upper 75% ACT is 32.5.
## [1] "amherst college" "barnard college"
## [3] "bates college" "bowdoin college"
## [5] "bryn mawr college" "bucknell university"
## [7] "carleton college" "claremont mckenna college"
## [9] "colby college" "colgate university"
## [11] "college of the holy cross" "colorado college"
## [13] "davidson college" "franklin and marshall college"
## [15] "grinnell college" "hamilton college"
## [17] "haverford college" "illinois wesleyan university"
## [19] "kenyon college" "macalester college"
## [21] "middlebury college" "oberlin college"
## [23] "pomona college" "reed college"
## [25] "scripps college" "swarthmore college"
## [27] "vassar college" "washington and lee university"
## [29] "wellesley college" "wesleyan university"
## [31] "whitman college" "williams college"
Mean upper 75% ACT is 29.9.
## [1] "american university" "baylor university"
## [3] "boston university" "chapman university"
## [5] "college of new jersey" "elon university"
## [7] "emerson college" "fordham university"
## [9] "george washington university" "gonzaga university"
## [11] "loyola university chicago" "loyola university maryland"
## [13] "loyola university new orleans" "marquette university"
## [15] "pepperdine university" "providence college"
## [17] "quinnipiac university" "santa clara university"
## [19] "southern methodist university" "syracuse university"
## [21] "texas christian university" "university of richmond"
## [23] "university of san diego" "university of tulsa"
## [25] "villanova university"
Mean upper 75% ACT is 30.35.
This shows the cumulative number of Sloan fellows in each tier group. The x-axis is the cumulative total number of undergraduates in the tier group; I’ve ordered the institutions by size and then summed the sizes and numbers of Sloan fellows. Cumulative plots help one see patterns for rare events, and Sloan fellows are rare. A higher line means higher per capita production in that tier group. The ‘Ivy Plus’ group greatly outproduced the other tiers in future Sloan Fellows in Math and Physics; however, as you will see in the next blog post, this is actually due to just four institutions: MIT, Caltech, Harvard and Princeton.
Again, the ‘Ivy Plus’ group had the highest production of Sloan Fellows. However, for these fields, the ‘Elite’ Liberal Arts colleges have a higher per capita (per undergraduate) production of future Sloan Fellows than the ‘Elite’ private and public schools.
This uses the Equality of Opportunity Project ‘elite private’ and ‘highly selective private’ tiers, but only looks at the Liberal Arts colleges. The mean upper 75% ACT for the ‘elite’ Liberal Arts colleges is 32.5 and the mean for the ‘highly selective’ schools is 30.35. The more selective LACs (elite) outproduce the less selective LACs.
These patterns across the tier groups might simply represent the pool of undergraduates in the different tier groups and not represent any additional value added by the institutions.
For this analysis, I ignore the EOP tiers and simply plot the cumulative sum of Sloan Fellows for schools with an upper 75% ACT score of 35, 34, …, or 31 (as reported in the Scorecard data). The plot clearly shows that the upper 75% ACT score is a predictor of the number of future Sloan Fellows. I will need to correct for differences in the selectivity of each school: if a school has more students with very high ACT scores, it would be expected to produce more future Sloan Fellows simply because it has a larger ‘pool’.
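To make the ‘pool’ argument concrete, here is a toy Python calculation (all numbers are made up, not the post’s data): group schools by upper 75% ACT score and compare fellows per undergraduate across the groups.

```python
# Toy illustration of why the 'pool' correction matters: aggregate
# enrollment and fellow counts by upper 75% ACT score, then compare
# per capita Sloan production across score groups. Made-up data.
from collections import defaultdict

schools = [
    # (upper-75% ACT, enrollment, fellows)
    (35, 5000, 10),
    (35, 7000, 12),
    (33, 9000, 4),
    (33, 15000, 5),
    (31, 20000, 2),
]

totals = defaultdict(lambda: [0, 0])   # act -> [enrollment, fellows]
for act, size, fellows in schools:
    totals[act][0] += size
    totals[act][1] += fellows

per_capita = {act: f / n for act, (n, f) in totals.items()}
for act in sorted(per_capita, reverse=True):
    print(act, round(per_capita[act], 5))
```

In this toy data the per capita rate rises with the ACT group, which is the pattern the plot shows; it is why raw fellow counts alone cannot separate the effect of the school from the effect of its student pool.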
Before doing that (in the next post), I will do a preliminary analysis of the effect of ‘school type’. I abandon the Equality of Opportunity Project tier groups and switch to the Carnegie School Classifications. I use only research universities and baccalaureate colleges (liberal arts colleges). This information is part of the Scorecard data, along with information on whether the institution is public or private.
The idea is to examine whether there is a difference in the production of future Sloan Fellows based on the type of institution (research university versus undergraduate institution).
For this analysis, I used all schools in the above categories with upper 75% ACT of 34 or 33. Scroll below the plot to see the schools in each group. I labelled schools as either LAC (baccalaureate), private (Carnegie Classification 15, 16 or 18 and private) or public (Carnegie Classification 15, 16 or 18 and public).
This plot shows that within this very high ACT group, LACs and private research universities have similar per capita Sloan production but public schools are lower even with the same upper 75% ACT scores.
## [1] "northwestern university"
## [2] "university of notre dame"
## [3] "johns hopkins university"
## [4] "washington university in st louis"
## [5] "dartmouth college"
## [6] "columbia university"
## [7] "cornell university"
## [8] "duke university"
## [9] "carnegie mellon university"
## [10] "university of pennsylvania"
## [11] "brown university"
## [12] "vanderbilt university"
## [13] "rice university"
## [14] "stanford university"
## [15] "university of southern california"
## [16] "georgetown university"
## [17] "boston college"
## [18] "brandeis university"
## [19] "northeastern university"
## [20] "tufts university"
## [21] "case western reserve university"
## [1] "university of california-berkeley"
## [2] "university of california-los angeles"
## [3] "georgia institute of technology-main campus"
## [4] "university of michigan-ann arbor"
## [5] "college of william and mary"
## [6] "university of virginia-main campus"
## [1] "pomona college" "amherst college"
## [3] "williams college" "haverford college"
## [5] "swarthmore college" "claremont mckenna college"
## [7] "scripps college" "wesleyan university"
## [9] "grinnell college" "bowdoin college"
## [11] "wellesley college" "carleton college"
## [13] "hamilton college" "vassar college"
## [15] "reed college" "middlebury college"
## [17] "washington and lee university"
For the next analyses, I control for the (estimated) number of undergraduates with high math SAT scores at each institution. Math SAT scores are not a perfect metric of future STEM productivity, but they are highly correlated with the career prerequisite for a Sloan fellowship, namely a PhD in a STEM field.
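In the tables above, the ‘Size’ column is an estimate of the number of undergraduates scoring 700-800 on the math SAT, roughly total enrollment times the 700-800 fraction. A minimal Python sketch of the per-pool normalization (the numbers below are illustrative, not the post’s data):

```python
# Sketch of the correction described above: estimate each school's
# pool of high-math-SAT undergraduates and express Sloan production
# per member of that pool. Rows follow the table layout used in this
# post: (school, fellows, total enrollment, fraction scoring 700-800
# on SAT Math). The data are made up for illustration.

rows = [
    ("school X", 4, 7000, 0.80),
    ("school Y", 1, 20000, 0.10),
]

rates = {}
for school, fellows, tot_size, frac_700_800 in rows:
    pool = tot_size * frac_700_800      # estimated high-SAT pool size
    rates[school] = fellows / pool      # fellows per high-SAT undergraduate

for school, rate in rates.items():
    print(f"{school}: {rate:.5f}")
```

Normalizing by the estimated pool rather than by total enrollment asks whether a high-SAT student at one type of school is more likely to become a Sloan Fellow than a comparable student elsewhere.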
School | Fellows | Size |
---|---|---|
brown university | 5 | 6264 |
california institute of technology | 12 | 983 |
columbia university | 6 | 8100 |
cornell university | 4 | 14195 |
dartmouth college | 1 | 4184 |
duke university | 1 | 6480 |
harvard university | 23 | 7236 |
massachusetts institute of technology | 9 | 4476 |
princeton university | 22 | 5258 |
stanford university | 9 | 7018 |
university of chicago | 9 | 5729 |
university of pennsylvania | 1 | 10678 |
yale university | 8 | 5473 |
School | Fellows | Size |
---|---|---|
boston college | 0 | 9483 |
brandeis university | 0 | 3715 |
carnegie mellon university | 2 | 5819 |
case western reserve university | 1 | 4807 |
emory university | 0 | 7730 |
georgetown university | 0 | 7211 |
johns hopkins university | 1 | 6039 |
lehigh university | 0 | 5034 |
new york university | 2 | 24539 |
northeastern university | 0 | 13492 |
northwestern university | 1 | 8725 |
rice university | 4 | 3888 |
tufts university | 0 | 5143 |
tulane university | 0 | 7892 |
university of miami | 0 | 10828 |
university of notre dame | 1 | 8427 |
university of rochester | 0 | 6074 |
university of southern california | 0 | 18392 |
vanderbilt university | 0 | 6818 |
wake forest university | 0 | 4861 |
washington university in st louis | 4 | 6913 |
School | Fellows | Size |
---|---|---|
amherst college | 0 | 1792 |
barnard college | 0 | 2556 |
bates college | 1 | 1773 |
bowdoin college | 0 | 1797 |
bryn mawr college | 0 | 1303 |
bucknell university | 0 | 3528 |
carleton college | 2 | 2042 |
claremont mckenna college | 0 | 1293 |
colby college | 0 | 1847 |
colgate university | 1 | 2865 |
college of the holy cross | 0 | 2754 |
colorado college | 1 | 2036 |
davidson college | 0 | 1765 |
franklin and marshall college | 0 | 2182 |
grinnell college | 0 | 1670 |
hamilton college | 0 | 1890 |
haverford college | 0 | 1187 |
illinois wesleyan university | 0 | 1883 |
kenyon college | 0 | 1651 |
macalester college | 1 | 2053 |
middlebury college | 0 | 2498 |
oberlin college | 0 | 2961 |
pomona college | 0 | 1635 |
reed college | 1 | 1335 |
scripps college | 0 | 962 |
swarthmore college | 1 | 1530 |
vassar college | 0 | 2389 |
washington and lee university | 0 | 1880 |
wellesley college | 0 | 2172 |
wesleyan university | 1 | 2907 |
whitman college | 1 | 1480 |
williams college | 0 | 2019 |
School | Fellows | Size |
---|---|---|
college of william and mary | 0 | 6256 |
university of california-berkeley | 9 | 27126 |
university of california-los angeles | 1 | 29627 |
university of michigan-ann arbor | 3 | 28217 |
university of north carolina at chapel hill | 0 | 17908 |
university of virginia-main campus | 4 | 15515 |
School | Fellows | Size |
---|---|---|
brown university | 5 | 6264 |
california institute of technology | 9 | 983 |
columbia university | 3 | 8100 |
cornell university | 5 | 14195 |
dartmouth college | 6 | 4184 |
duke university | 2 | 6480 |
harvard university | 16 | 7236 |
massachusetts institute of technology | 14 | 4476 |
princeton university | 9 | 5258 |
stanford university | 7 | 7018 |
university of chicago | 2 | 5729 |
university of pennsylvania | 2 | 10678 |
yale university | 3 | 5473 |
School | Fellows | Size |
---|---|---|
boston college | 1 | 9483 |
brandeis university | 3 | 3715 |
carnegie mellon university | 1 | 5819 |
case western reserve university | 0 | 4807 |
emory university | 1 | 7730 |
georgetown university | 0 | 7211 |
johns hopkins university | 2 | 6039 |
lehigh university | 1 | 5034 |
new york university | 3 | 24539 |
northeastern university | 0 | 13492 |
northwestern university | 1 | 8725 |
rice university | 2 | 3888 |
tufts university | 2 | 5143 |
tulane university | 0 | 7892 |
university of miami | 0 | 10828 |
university of notre dame | 1 | 8427 |
university of rochester | 3 | 6074 |
university of southern california | 3 | 18392 |
vanderbilt university | 0 | 6818 |
wake forest university | 0 | 4861 |
washington university in st louis | 6 | 6913 |
School | Fellows | Size |
---|---|---|
amherst college | 0 | 1792 |
barnard college | 0 | 2556 |
bates college | 1 | 1773 |
bowdoin college | 1 | 1797 |
bryn mawr college | 1 | 1303 |
bucknell university | 0 | 3528 |
carleton college | 1 | 2042 |
claremont mckenna college | 0 | 1293 |
colby college | 1 | 1847 |
colgate university | 1 | 2865 |
college of the holy cross | 0 | 2754 |
colorado college | 0 | 2036 |
davidson college | 0 | 1765 |
franklin and marshall college | 0 | 2182 |
grinnell college | 0 | 1670 |
hamilton college | 1 | 1890 |
haverford college | 0 | 1187 |
illinois wesleyan university | 1 | 1883 |
kenyon college | 0 | 1651 |
macalester college | 4 | 2053 |
middlebury college | 3 | 2498 |
oberlin college | 1 | 2961 |
pomona college | 1 | 1635 |
reed college | 2 | 1335 |
scripps college | 0 | 962 |
swarthmore college | 3 | 1530 |
vassar college | 0 | 2389 |
washington and lee university | 0 | 1880 |
wellesley college | 1 | 2172 |
wesleyan university | 2 | 2907 |
whitman college | 0 | 1480 |
williams college | 4 | 2019 |
School | Fellows | Size |
---|---|---|
college of william and mary | 4 | 6256 |
university of california-berkeley | 8 | 27126 |
university of california-los angeles | 5 | 29627 |
university of michigan-ann arbor | 4 | 28217 |
university of north carolina at chapel hill | 3 | 17908 |
university of virginia-main campus | 3 | 15515 |
This is part of a series on computing the Fisher Information for Multivariate Autoregressive State-Space Models. Part I: Background, Part II: Louis 1982, Part III: Harvey 1989, Background, Part IV: Harvey 1989, Implementation.
Citation: Holmes, E. E. 2016. Notes on computing the Fisher Information matrix for MARSS models. Part III: Overview of Harvey 1989.
Part II discussed the approach of Louis (1982), which uses the full-data likelihood and its first derivative, the same derivative that appears in the M-step of the EM algorithm. The conclusion of Part II was that the approach is doable but computationally expensive because it scales at least with $T^2$.
Here I will review the more common approach (Harvey 1989, pages 140-142, section 3.4.5 Information matrix) which uses the prediction error form of the likelihood function to calculate the observed Fisher Information $\mathcal{I}(\hat{\theta},y)$. A related paper is Cavanaugh and Shumway (1996), which presents an approach for calculating the expected Fisher Information.
Harvey (1989), pages 140-142, shows how to write the Hessian of the log-likelihood function using the prediction error form of the likelihood. The prediction error form is:
The Hessian of the log-likelihood can then be written as
and this can be written in terms of derivatives of the innovations $v_t$ and the variance of the innovations $F_t$. This is shown in Equation 3.4.66 in Harvey (1989). There are a couple of differences between the equation below and 3.4.66 in Harvey. First, 3.4.66 has a typo: the $[I - F_t v_t v_t^\top]$ should be within the trace (as below). Second, I have written out the derivative with respect to $\theta_j$ that appears in the first trace term.
The Fisher Information matrix is the negative of the expected value (over all possible data) of \ref{hessian}:
Thus for the Fisher Information matrix, we take the expectation (over all possible data) of the sum (over t) of Equation 3 (3.4.66 in Harvey 1989). On pages 141-142, Harvey shows that the expected value of Equation 3 can be simplified and the i,j element of the Fisher Information matrix can be written as (Equation 3.4.69 in Harvey 1989):
Equation \ref{Iij} (3.4.69 in Harvey 1989) is the Fisher Information and is evaluated at the true parameter values $\theta$. We do not know $\theta$ and instead we estimate the Fisher Information using our estimates of $\theta$. The two estimates of $I(\theta)$ that are used are called the expected and observed Fisher Information matrices. The expected Fisher Information is
and the observed Fisher Information is
The $\vert_{\theta=\hat{\theta}}$ means ‘evaluated at’. $l_t$ is a function of $\theta$; we take the derivative of that function and then evaluate the derivative at $\theta = \hat{\theta}$. The expectation (which is an integral) is over the possible values of the data $y$, which are generated from the model with parameters $\theta$.
The observed Fisher Information drops the expectation; the expected Fisher Information does not. The expectation is taken over all possible data, and we have only one observed data set. At first blush, it may seem impossible to compute the expectation and that we must always use the observed Fisher Information. However, for some models one can write down the expectations analytically. One could also simulate from the MLEs to get the expectations; this is the idea behind bootstrapping. In a bootstrapping approach, one uses the MLEs to generate data. This is an approximation, since what we would really like is to simulate data from the true parameters, but the mean and variance of data generated from the MLEs versus data generated from the true parameters often have nice asymptotic properties.
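To see the distinction concretely, here is a minimal sketch (an i.i.d. Poisson example, not a MARSS model; NumPy assumed) of the bootstrapping idea: the observed Information depends on the particular data set, and averaging it over data sets simulated from the model recovers the expected Information.

```python
import numpy as np

rng = np.random.default_rng(42)
lam, n, nsim = 3.0, 50, 20000   # true Poisson rate, sample size, bootstrap replicates

# Poisson log-likelihood: l(lam) = sum(y) log(lam) - n lam (+ const).
# Observed Information, -l''(lam) = sum(y) / lam^2, depends on the data;
# expected Information, E[sum(Y)] / lam^2 = n / lam, is an integral over data sets.
obs_FI = np.array([rng.poisson(lam, n).sum() / lam**2 for _ in range(nsim)])

expected_FI = n / lam
print(expected_FI)        # 16.66...
print(obs_FI.mean())      # close to 16.66: averaging the observed Information
                          # over simulated data sets recovers the expected Information
```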
However, it is common to use the observed Fisher Information matrix. This is what one is using when one uses the Hessian of the log-likelihood function evaluated at the MLEs. To get an analytical equation for the observed Fisher Information matrix, we use Equation 3 for $l_t$ and take the sum to get the Hessian of the log-likelihood function (\ref{hessian}). This is the same Hessian that you can get numerically. In R, you can use the fdHess function in the nlme package or the optim function.
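For the numerical route, the post works in R (fdHess in nlme, or optim). The same central-difference idea can be sketched as follows, with a simple normal log-likelihood where the observed Fisher Information at the MLE has a known closed form (a Python illustration, not the MARSS computation):

```python
import numpy as np

def num_hessian(f, theta, h=1e-4):
    """Central-difference Hessian of a scalar function f at theta."""
    p = len(theta)
    H = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            tpp = theta.copy(); tpp[i] += h; tpp[j] += h
            tpm = theta.copy(); tpm[i] += h; tpm[j] -= h
            tmp = theta.copy(); tmp[i] -= h; tmp[j] += h
            tmm = theta.copy(); tmm[i] -= h; tmm[j] -= h
            H[i, j] = (f(tpp) - f(tpm) - f(tmp) + f(tmm)) / (4 * h * h)
    return H

rng = np.random.default_rng(1)
y = rng.normal(2.0, 1.5, size=200)

def loglik(theta):                    # normal log-likelihood, theta = (mu, sigma)
    mu, sig = theta
    return -len(y) * np.log(sig) - np.sum((y - mu) ** 2) / (2 * sig**2)

mle = np.array([y.mean(), y.std()])   # ML estimates (std uses 1/n, matching ML)
I_obs = -num_hessian(loglik, mle)     # observed Fisher Information = -(Hessian at the MLE)
# Analytically, at the MLE this is diag(n / sigma_hat^2, 2 n / sigma_hat^2).
```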
Equation \ref{Iij} (Equation 3.4.69 in Harvey) is a simplification of the expected value of the sum of equation 3. The simplification occurs because a number of terms in equation 3 drop out or cancel when you take the expectation (see the bottom of page 141 in Harvey 1989). The only terms that remain are those shown in equation \ref{Iij}. Harvey (1989) does not say how to compute the expectation in equation \ref{Iij} (his 3.4.69). Cavanaugh and Shumway (1996) do not say how to compute it either and suggest that it is infeasible (page 1, in the paragraph after their equation 1). Instead they say that you can drop the expectation in equation \ref{Iij} and get the observed Fisher Information:
This, however, is halfway between the expected Fisher Information matrix and the observed Fisher Information matrix, because equation \ref{Iij} is what you get after taking the expectation and dropping some of the terms in equation 3. If you compare what you get from equation \ref{obsIij} with a numerical estimate of the Hessian of the log-likelihood function at the MLE, you will see that they are different. The variance of the former is less than the variance of the latter. This is what you would expect, since the former has had the expectation applied to some of the terms in equation 3 (Harvey’s 3.4.66).
This does not mean that equation \ref{obsIij} should not be used, but rather that if you compare it to the output from a numerically computed Hessian, they will not be the same.
In Part IV, I show Harvey’s recursion for computing the first derivatives of $v_t$ and $F_t$ needed in equations 3 and \ref{Iij}. I extend this recursion to get the second derivative also. Once we have all these, we can use equation \ref{observedFisherInformation2} with equation 3 to compute the observed Fisher Information matrix and use equation \ref{Iij} to compute the “observed/expected” Fisher Information.
We can compute the Hessian of the log-likelihood by using a for loop of $i$ from 1 to $p$ with an inner for loop for $j$ from $i$ to $p$. The Hessian is symmetric so the inner loop only needs to go from $i$ to $p$. However, we can also write the Hessian for time step $t$ in a single line without any for loops using the Jacobian matrices for our derivatives. With the $t$ subscripts of $F$ and $v$ dropped:
This may or may not be faster, but it is more concise. Go to Part IV to see how to compute these Jacobians using Harvey’s recursion.
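As an illustration of the loop-free form (with random symmetric matrices standing in for $F$ and its derivatives, not Harvey's actual recursion output; NumPy assumed), one of the trace terms can be computed both with the double loop and in a single line with the Jacobian of $vec(F)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 3, 4

A = rng.normal(size=(n, n))
F = A @ A.T + n * np.eye(n)                # symmetric positive-definite stand-in for F_t
dF = rng.normal(size=(n, n, p))
dF = (dF + dF.transpose(1, 0, 2)) / 2      # symmetric stand-ins for dF/dtheta_i
Finv = np.linalg.inv(F)

# Double loop: the (i,j) element is (1/2) tr(Finv dF_i Finv dF_j)
loop = np.empty((p, p))
for i in range(p):
    for j in range(p):
        loop[i, j] = 0.5 * np.trace(Finv @ dF[:, :, i] @ Finv @ dF[:, :, j])

# Single line: J_F is the (n^2 x p) Jacobian of vec(F); use
# vec(Finv dF_j Finv) = (Finv kron Finv) vec(dF_j) and tr(A'B) = vec(A)' vec(B).
JF = dF.reshape(n * n, p, order="F")       # column i is the column-major vec of dF_i
vec_form = 0.5 * JF.T @ np.kron(Finv, Finv) @ JF

print(np.allclose(loop, vec_form))         # True
```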
Note, I am going to drop the $t$ subscript on $F$ and $v$ because things are going to get cluttered; $v_1$ will refer to the 1st element of the $n \times 1$ column vector $v$ and $F_{11}$ is the (1,1) element of the matrix $F$. There has to be a loop to go through all the $F_t$ and $v_t$ for $t=1$ to $T$.
The first term of equation 3 is
The second term of equation 3 is
All the matrices within the traces above are symmetric. The trace of a product is invariant under cyclic permutation, and when the matrices are symmetric it is also invariant under reversing their order: if A, B, C, and D are symmetric matrices, $tr(ABCD) = tr(BCDA) = tr(ADCB)$, etc. Thus the second term can be rearranged to match the middle term in the first term. Terms 1 + 2 of equation 3 can thus be written as
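A quick numerical check of these trace manipulations (random symmetric matrices; a sketch, not part of the recursion):

```python
import numpy as np

rng = np.random.default_rng(7)

def sym(n=4):
    """A random symmetric matrix."""
    M = rng.normal(size=(n, n))
    return M + M.T

A, B, C, D = sym(), sym(), sym(), sym()
tr = lambda M: float(np.trace(M))

# Cyclic invariance holds for any matrices:
assert np.isclose(tr(A @ B @ C @ D), tr(B @ C @ D @ A))
# For symmetric matrices the order can also be reversed
# (transpose the product, then cycle):
assert np.isclose(tr(A @ B @ C), tr(A @ C @ B))
assert np.isclose(tr(A @ B @ C @ D), tr(A @ D @ C @ B))
```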
We can write the first trace of equation \ref{term12eqn3} as a vector product using the relation $tr(A^\top B) = vec(A)^\top vec(B)$. Note that the matrices in the traces in equation \ref{term12eqn3} are symmetric. If A is symmetric, $A^\top = A$ and $tr(AB) = vec(A)^\top vec(B)$.
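The vec-trace identity is easy to verify numerically (column-major vec, matching the math convention; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(4, 4))
B = rng.normal(size=(4, 4))
vec = lambda M: M.flatten(order="F")    # stack the columns of M

# tr(A'B) = vec(A)' vec(B) for any A, B:
assert np.isclose(np.trace(A.T @ B), vec(A) @ vec(B))
# and when A is symmetric, tr(AB) = vec(A)' vec(B):
S = A + A.T
assert np.isclose(np.trace(S @ B), vec(S) @ vec(B))
```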
That is for the i,j element. This matrix is symmetric so it is also the j,i element. The derivative of $vec(F)$ with respect to $\theta$ (as opposed to the j-th element of $\theta$) is the Jacobian matrix of $vec(F)$.
The full matrix for the first part of equation \ref{term12eqn3} is then
The middle trace of equation \ref{term12eqn3} is similar to the first and we end up with:
We can write this in terms of the Jacobian of $vec(F)$:
The third part of equation \ref{term12eqn3} involves the second derivatives $\partial^2 F/\partial\theta_i \partial\theta_j$.
Again this is the i,j term. The term on the bottom line on the right is the $(\theta_i,\theta_j)$ term of the Jacobian of the vec of the Jacobian of F:
The full matrix for the second part of term 1 + 2 in Equation 3 is then
The subscript on the $I$ indicates the size of the identity matrix. In this case, it is a $p \times p$ matrix.
With the $t$ subscripts dropped, the 3rd term of equation 3 is <div> \begin{equation}\label{term3eqn3} \frac{1}{2} tr\left[ F^{-1}\frac{\partial F}{\partial \theta_i}F^{-1} \left( \frac{\partial v}{\partial \theta_j}v^\top + v\frac{\partial v^\top}{\partial \theta_j}\right) \right] \end{equation} </div> Using the same procedure as for the above terms, we can write this in terms of vecs. If $b$ and $a$ are $n \times 1$ column vectors,
Thus,
and
When A is symmetric, $tr(AB) = vec(A)^\top vec(B)$. Thus term 3 of equation 3 can be written as
This is the i,j term of the Fisher Information matrix from term 3 in equation 3. To get all terms, we use the Jacobian of vec(F) as above and the Jacobian of v:
where $J_F$ is defined in equation \ref{JF} and $J_v$ is
The 4th term of equation 3 is
This is for the i,j term of the Fisher Information matrix. An equation for all terms can be written as a function of the Jacobian of $vec(J_v)$:
The right side of equation \ref{term4eqn3}, $F^{-1}v$, is an $n \times 1$ matrix. We need to write this as the $np \times p$ matrix:
Thus the full matrix for the i,j terms in the Fisher Information matrix from term 4 of equation 3 is
Term 5 of equation 3 is
This is a scalar and thus its vec is equal to itself. We can rewrite equation \ref{term5eqn3} using the following relation:
Thus equation \ref{term5eqn3} is
This is for the i,j term of the Fisher Information matrix. For the full matrix, we use the Jacobian of v (equation \ref{Jv}) and the Jacobian of vec(F) (equation \ref{JF}):
Term 6 is
This is for the i,j term of the Fisher Information matrix and we can write it immediately as the full matrix in terms of the Jacobian of v:
Putting all the terms together, we have the full observed Fisher Information matrix:
We can simplify this a little by noting that all terms are symmetric matrices and the transpose of a symmetric matrix is equal to itself.
Thus the full observed Fisher Information matrix is
This is part of a series on computing the Fisher Information for Multivariate Autoregressive State-Space Models. Part I: Background, Part II: Louis 1982, Part III: Harvey 1989, Background, Part IV: Harvey 1989, Implementation.
Citation: Holmes, E. E. 2016. Notes on computing the Fisher Information matrix for MARSS models. Part II: Louis 1982. Technical Report. https://doi.org/10.13140/RG.2.2.35694.72000
So how do we compute $I(\hat{\theta})$ or $\mathcal{I}(\hat{\theta},y)$ (in Part I)? In particular, can we use the analytical derivatives of the full log-likelihood that are part of the EM algorithm? Many researchers have worked on this idea. My notes here were influenced by EM Algorithm: Confidence Intervals which is on the same topic. This blog post is mainly a discussion of the result by Louis (1982) on calculation of the Fisher Information matrix from the ‘score’ function that one takes the derivative of in the M-step of the EM algorithm.
The ‘score’ function used in the EM algorithm for a MARSS model is
It is the expected value, taken over the hidden random variable $X$, of the full-data log-likelihood at $Y=y$ [3]; ‘full’ means it is a function of all the random variables in the model, including the hidden or latent variables. ${x, y}$ is the full ‘data’: the left side of the $x$ state equation and of the $y$ observation equation. We take the expectation of this full-data likelihood conditioned on the observed data $y$ and on $\theta_j$, the value of $\theta$ at the j-th iteration of the EM algorithm. Although $Q(\theta \vert \theta_j)$ looks a bit hairy, the full-data likelihood is often easy to write down, considerably easier than the data likelihood $f(y\vert\theta)$. The hard part is usually the expectation step; however, for MARSS models the Kalman filter-smoother computes the expectations involving $X$, and Holmes (2010) shows how to compute the expectations involving $Y$, which come up when there are missing values in the data set (missing time steps, say).
In the M-step of the EM algorithm, we take the derivative of $Q(\theta \vert \theta_j)$ with respect to $\theta$ and solve for the $\theta$ where
It would be nice if one could use the following to compute the observed Fisher Information
$Q(\theta \vert \hat{\theta})$ is our score function at the end of the EM algorithm, when $\theta = \hat{\theta}$. $Q$ is a function of $\theta$, the model parameters, and will have terms like $E(X\vert Y=y, \hat{\theta})$, the expected value of $X$ conditioned on $Y=y$ and the MLE. Those are the expectations coming out of the Kalman filter-smoother. We take the second derivative of $Q$ with respect to $\theta$. That is straightforward for the MARSS equations: you take the first derivative of $Q$ with respect to $\theta$, which you already have from the update (M-step) equations, and then take the derivative of that with respect to $\theta$.
Conceptually, this
looks a bit like the observed Fisher Information:
except that instead of the data likelihood $f(y\vert\theta)$, we use the expected likelihood $E_{X\vert Y,\hat{\theta} } [\log f_{XY}(X,y\vert\theta) ]$. The expected likelihood is the full likelihood with the $X$ and $XX^\top$ random variables replaced by their expected values assuming $\theta = \hat{\theta}$ and $Y=y$. The problem is that $E_{X\vert Y,\theta } [\log f(X,y\vert\theta) ]$ is a function of $\theta$ and by fixing it at $\hat{\theta}$ we are not accounting for the uncertainty in that expectation. What we need is something like
Information with $X$ fixed at expected value - Information on expected value of $X$
We account for the fact that we have over-estimated the information from the data by treating the hidden random variable as fixed. The same issue arises when we compute confidence intervals using the estimate of the variance without accounting for the fact that this is an estimate and thus has uncertainty. Louis (1982) and Oakes (1999) are concerned with how to do this correction or adjustment.
The following is equations 3.1, 3.2 and 3.3 in Louis (1982) translated to the MARSS case. In the MARSS model, we have two random variables, $X(t)$ and $Y(t)$. The joint distribution of ${X(t), Y(t) }$ conditioned on $X(t-1)$ is multivariate normal. Our full data set includes all time steps, ${X, Y }$.
Let’s call the full state ${x, y}$, the values of $X$ and $Y$ at all times $t$. The full state can be an unconditional random variable ${X,Y}$ or a conditional random variable ${X,y}$ (conditioned on $Y=y$). The equation near the top of page 227 of Louis (1982) becomes
$f(.\vert\theta)$ is the probability distribution of the random variable conditioned on $\theta$. $\lambda$ is the full likelihood; ‘full’ means it includes both $x$ and $y$. $\lambda^\ast$ is the likelihood of $y$ alone. It is defined by the marginal distribution of $y$ [1]: the integral over $X$ on the right side of \ref{lambday}. For a MARSS model, the data likelihood can be written easily as a function of the Kalman filter recursions (which is why you can write a recursion for the information matrix based on derivatives of $\lambda^\ast$; see Part III).
Next equation down. Louis doesn’t say this and his notation is not totally clear, but the expectation right above section 3 (and in his eqn 3.1) is a conditional expectation. This is critical to know to follow his derivation of equation 3.1 in the appendix. $\theta_j$ is his $\theta(0)$; it is the value of $\theta$ at the last EM iteration.
My ‘expectation’ notation is a little different than Louis’. The subscript on the $E$ shows what is being integrated ($X$) and what are the conditionals.
The term $f_{X\vert Y}(x\vert Y=y,\theta_j)$ is the probability of $x$ conditioned on $Y=y$ and $\theta=\theta_j$. The subscript on $f$ indicates that we are using the probability distribution of $x$ conditioned on $Y=y$. For the EM algorithm, we need to distinguish between $\theta$ and $\theta_j$ because we maximize with respect to $\theta$ not $\theta_j$. If we just need the expectation at $\theta$, no maximization step, then we just use $\theta$ in $f(.\vert\theta)$ and the subscript on E.
Before moving on with the derivation, notice that in \ref{expLL}, we fix $y$, the data. We are not treating that as a random variable. We could certainly treat $E_{\theta_j}[ \lambda( {X, y}, \theta)]$ as some function $g(y)$ and consider the random variable $g(Y)$. But Louis (1982) will not go that route. $y$ is fixed. Thus we are talking about the observed Fisher Information rather than the expected Fisher Information. The latter would take an expectation over the possible $y$ generated by our model with parameters at the MLE.
Now we can derive equation 3.1 in Louis (1982). I am going to combine the info in Louis’ section 3.1 and the appendix on the derivation of 3.1. Before proceeding: Louis uses ‘denominator’ format for his matrix derivatives; I normally use numerator format, but I will follow his convention here. $\theta$ is a column vector of parameters and the likelihood $f(.\vert\theta)$ is scalar. Under denominator format, $f^\prime(.\vert\theta) = df(.\vert\theta)/d\theta$ is a column vector. $f^{\prime\prime}(.\vert\theta) = d^2f(.\vert\theta)/d\theta d\theta^\top$ is a matrix in Hessian format (the first $d\theta$ goes 1 to $n$ down columns and the second $d\theta$ goes 1 to $n$ across rows).
Take the derivative of \ref{lambdaz} with respect to $\theta$ to define $S(z,\theta)$.
Take the derivative of the far right side of \ref{lambday} with respect to $\theta$ to define $S^\ast (y,\theta)$. For the last step (far right), I used $f_Y(y\vert\theta) = \int_X f_{XY}(x,y\vert\theta)dx$, the definition of the marginal distribution [1], to change the denominator.
Now multiply the integrand in the numerator by $f_{XY}(x,y\vert\theta)/f_{XY}(x,y\vert\theta)$. The last step (far right) uses \ref{Sz}.
We combine \ref{Sy} and \ref{intfprime}:
The second to last step used the fact that $f_Y(y\vert\theta)$ does not involve $x$ thus we can bring it into the integral. This gives us $f_{XY}(x,y\vert\theta)/f_Y(y\vert\theta)$. This is the probability of $x$ conditioned on $Y=y$ [2].
The last step in the derivation of equation 3.1 is to recognize that the far right side of \ref{Sstar} is the conditional expectation in 3.1. Louis does not actually write out the expectation in 3.1 and the notation is rather vague. But the expectation in equation 3.1 is the conditional expectation on the far right side of \ref{Sstar}.
using my notation for a conditional expectation, which is slightly different than Louis’. At the MLE, $S^\ast (y,\hat{\theta})=0$, since that is how the MLE is defined (it is where the derivative of the data likelihood is zero).
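Louis’ equation 3.1 can be checked numerically on a toy latent-variable model (a hypothetical Bernoulli-normal mixture, not a MARSS model; NumPy assumed): the score of the marginal likelihood equals the conditional expectation of the full-data score.

```python
import numpy as np

# Toy model: X ~ Bernoulli(q), Y | X=k ~ N(mu_k, 1), theta = mu1.
# Check: d/dtheta log f_Y(y|theta) = E_{X|Y=y}[ d/dtheta log f_XY(X, y|theta) ].
q, mu0, mu1, y = 0.4, -1.0, 2.0, 0.7
phi = lambda z: np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)   # standard normal density

def logfy(m1):                        # marginal log-likelihood of y as a function of mu1
    return np.log((1 - q) * phi(y - mu0) + q * phi(y - m1))

h = 1e-6
S_star = (logfy(mu1 + h) - logfy(mu1 - h)) / (2 * h)     # score of f(y|theta), numerically

# Full-data score: d/dmu1 log f(x, y) = 1{x=1} (y - mu1); take its conditional mean:
post1 = q * phi(y - mu1) / ((1 - q) * phi(y - mu0) + q * phi(y - mu1))  # P(X=1 | Y=y)
E_S = post1 * (y - mu1)

print(S_star, E_S)                    # the two agree
```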
The meat of Louis 1982 is equation 3.2. The observed Fisher Information matrix \ref{obsFI} is
The first three terms on the left just show that these are all notations referring to the observed Fisher Information. The 4th term is one of the ways we can compute the observed Fisher Information at $\theta$, and the far right term shows that derivative explicitly.
We start by taking the second derivative of \ref{lambdaz} with respect to $\theta$ to define $B(x,y,\theta)$. We use $S^\prime(z,\theta)$ as written in \ref{Sz}.
The transpose on $d\theta$ is there because we are taking the second derivative $d^2 l/d\theta d\theta^\top$ (the Hessian of the log-likelihood); $d\theta\, d\theta$ wouldn’t make sense, as that would be a column vector times a column vector.
To do the derivative on the far right side of \ref{B1}, we first need to recognize the form of the equation. $f_{XY}^\prime(x,y\vert\theta)$ is a column vector and $f(x,y\vert\theta)$ is a scalar, thus the thing we are taking the derivative of has the form $\overrightarrow{h}(\theta)/g(\theta)$; the arrow over $h$ indicates that it is a (column) vector while $g()$ is a scalar. Using the quotient rule for vector derivatives, we have
Thus we can write the equation for the negative of $B(x,y,\theta)$ as
Let’s return to \ref{obsFI32} and take the derivative of $\lambda^{\ast \prime}(y,\theta)$ with respect to $\theta$ using the form shown in equation \ref{Sy}. I have replaced the integral in the denominator by $f_Y(y\vert\theta)$ and used the same quotient rule used for \ref{B2}.
The last substitution uses \ref{Sy}. Thus,
Let’s look at the integral of the second derivative of $f_{XY}(x,y\vert\theta)$ in \ref{B4}:
This is the conditional expectation $E_{X\vert Y,\theta} [ f_{XY}^{\prime\prime}(x,y\vert\theta)/f_{XY}(x,y\vert\theta) ]$ that we see 5 lines above the references in Louis (1982). Using \ref{B2} we can write this in terms of $B(x,y\vert\theta)$:
Combining \ref{B4}, \ref{B5}, and \ref{B6}, we can write the equation above the references in Louis:
The negative of this is the observed Fisher Information (\ref{obsFI32}) which gives us equation 3.2 in Louis (1982):
Louis states that “The first term in (3.2) is the conditional expected full data observed information matrix, while the last two produce the expected information for the conditional distribution of X given $X \in R$.” His X is my ${X,Y}$ and $X \in R$ means $Y=y$ in my context. He writes this in simplified form with $X$ replaced by $XY$:
Let’s see how this is the case.
The full data observed information matrix is
This is simply the definition that Louis gives to $B(x,y,\theta)$. We do not know $x$ so we do not know the full data observed Information matrix. But we have the distribution of $x$ conditioned on our data $y$.
is thus the expected full data observed information matrix conditioned on our observed data $y$.
So this is the first part of his statement.
The second part of his statement takes a bit more effort to work out. First we substitute $S^\ast (y\vert\theta)$ with $E_{X\vert Y,\theta} [ S(X,y\vert\theta) ]$ from \ref{Louise3p1}. This gives us:
Using the computational form of the variance, $var(X)=E(XX^\top)-E(X)E(X)^\top$, we can see that \ref{ES1} is the conditional variance of $S(X,y\vert\theta)$.
But the variance of the score (the first derivative of the log-likelihood) is the expected Fisher Information [4]. In this case, it is the expected Fisher Information of the hidden state $X$, where we specify that $X$ has the conditional distribution $f_{X\vert Y} (X \vert Y=y,\theta)$. Thus we have the second part of Louis’ statement.
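Louis’ equation 3.2 can likewise be checked on a toy latent-variable model (a hypothetical Bernoulli-normal mixture, not a MARSS model; NumPy assumed), where both sides of the identity can be written in closed form:

```python
import numpy as np

# Toy model: X ~ Bernoulli(q), Y | X=k ~ N(mu_k, 1), theta = mu1.
# Check Louis 3.2:  I(theta, y) = E[-B | y] - var(S | y).
q, mu0, mu1, y = 0.4, -1.0, 2.0, 0.7
phi = lambda z: np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)   # standard normal density
f = lambda m1: (1 - q) * phi(y - mu0) + q * phi(y - m1)  # marginal density of y

# Left side: observed Information = -(second derivative of log f(y|theta)), numerically.
h = 1e-4
I_obs = -(np.log(f(mu1 + h)) - 2 * np.log(f(mu1)) + np.log(f(mu1 - h))) / h**2

# Right side: full-data score S = 1{x=1}(y - mu1), second derivative B = -1{x=1}.
post1 = q * phi(y - mu1) / f(mu1)           # P(X=1 | Y=y)
E_negB = post1                              # E[-B | y]
var_S = post1 * (1 - post1) * (y - mu1)**2  # conditional variance of S
print(I_obs, E_negB - var_S)                # the two sides agree
```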
The main result in Louis (1982) (\ref{Louismain}) can be written
The M-step of the EM algorithm involves the first derivative of the log-likelihood with respect to $\theta$, $S(X,y\vert\theta)$, since it involves setting this derivative to zero:
With the MARSS model, $S(X,y\vert\theta)$ is analytical and we can also compute $B(X,y\vert\theta)$, the second derivative, analytically.
The difficulty arises with this term: $var_{X\vert Y,\theta} [ S(X,y\vert\theta) ]$. The $S(X,y\vert\theta)$ is a summation from $t=1$ to $T$ that involves $X_t$ or $X_t X_{t-1}^\top$ for some parameters. When we do the cross-product, we will end up with terms like $E[ X_t X_{t+k}^\top ]$ and $E[ X_t X_t^\top X_{t+k}X_{t+k}^\top ]$. The latter is not a problem; all the random variables in a MARSS models are multivariate normal and the k-th central moments can be expressed in terms of the first and second moments [5], but that will still leave us with terms like $E[ X_t X_{t+k}^\top ]$, which are the smoothed covariance between $X$ at time $t$ and $t+k$ conditioned on all the data ($t=1:T$).
Computing these is not hard; they are the n-step-apart smoothed covariances. Harvey (1989), page 148, discusses how to use the Kalman filter to get the n-step-ahead prediction covariances, and a similar approach can presumably be used to get the $V(t,t+k)$ smoothed covariances. However, this will end up being computationally expensive because we will need all of the $t,t+k$ combinations, i.e., {1,3}, {1,4}, …, {2,3}, {2,4}, etc. That will be a lot: $T + (T-1) + (T-2) + \dots$, i.e., $T(T+1)/2$ smoothed covariances.
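The count of required covariances is easy to verify:

```python
# One smoothed covariance V(t, t+k) for every pair t <= t+k <= T;
# the total is T(T+1)/2, so the cost grows as T^2.
T = 100
pairs = [(t, s) for t in range(1, T + 1) for s in range(t, T + 1)]
print(len(pairs))                   # 5050 for T = 100, matching T(T+1)/2
```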
Lystig and Hughes (2012) and Duan and Fulop (2011) discuss this issue in a related application of the approach in Louis (1982). They suggest that you do not need to include covariances with a large time separation because the covariance goes to zero; you just need to include enough time steps.
I think the approach of Louis (1982) is not viable for MARSS models. The derivatives $B(x,y\vert\theta)$ and $S(x,y\vert\theta)$ are straightforward (if tedious) to compute analytically following the approach in Holmes (2010). But computing all the n-step smoothed covariances is going to be very slow, and each computation involves many matrix multiplications. However, one could compute $\mathcal{I}(\theta,y)$ via simulation using \ref{Louismain2}. It is easy enough to simulate $X$ using the MLEs and then compute $B(x_b,y\vert\theta)$ and $S(x_b,y\vert\theta)$ for each simulation, where $x_b$ is the bootstrapped $x$ time series and $y$ is the data. I don’t think it makes sense to do that for MARSS models, since there are two recursion approaches for computing the observed and expected Fisher Information using $f(y\vert\theta)$ and the Kalman filter equations (Harvey 1989, pages 140-142; Cavanaugh and Shumway 1996).
Given a joint probability distribution of ${X,Y}$, the marginal distribution of $Y$ is $\int_X f(X,Y) dx$. Discussions of the estimators for MARSS models often use the property of the marginal distributions of a multivariate normal without actually stating that this property is being used. The step in the derivation will just say, ‘Thus’ with no indication of what property was just used.
Reviewed here: http://fourier.eng.hmc.edu/e161/lectures/gaussianprocess/node7.html
If you have a joint likelihood of some random variables and you want the likelihood of a subset of those random variables, then you compute the marginal distribution, i.e., you integrate over the random variables you want to get rid of: \begin{equation} L(\theta\vert y) = \int_X L(\theta\vert X,Y) p(x\vert Y=y, \theta_j) dx \vert_{Y=y}. \end{equation} So we integrate out $X$ from the full likelihood and then set $Y=y$ to get the likelihood.
The marginal likelihood is a little different. The marginal likelihood is used when you want to get rid of some of the parameters, nuisance parameters. The integral you use is different: \begin{equation} L(\theta_1\vert y) = \int_{\theta_2} p(y\vert\theta_1,\theta_2) p(\theta_2\vert\theta_1)d\theta_2 \end{equation}
This presumes that you have $p(\theta_2\vert\theta_1)$. The expected likelihood is different yet again: \begin{equation} E_{X,Y\vert Y=y,\theta_j} [L(\theta\vert X,Y) ] = \int_X L(\theta\vert X,Y) p(x\vert Y=y, \theta_j) dx.\end{equation}
On the surface it looks like the equation for $L(\theta\vert y)$ but it is different. $\theta_j$ is not $\theta$. It is the parameter value at which we are computing the expected value of $X$. Maximizing the $E_{X,Y\vert Y=y,\theta_j} [L(\theta\vert X,Y)]$ will increase the likelihood but will not take you to the MLE. You have to embed this maximization in the EM algorithm that walks up the likelihood surface.
$P(A\vert B) = P(A \cap B)/P(B)$
I normally think about $Y$ as being partially observed (missing values) so I also take the expectation over $Y(2)$ conditioned on $Y(1)$, where (1) means observed and (2) means missing. In Holmes (2010), this is done in order to derive general EM update equations for the missing values case. But my notation is getting hairy, so for this write-up, I’m treating $Y$ as fully observed; so no $Y(2)$ and I’ve dropped the integrals (expectations) over $Y(2)$.
http://people.missouristate.edu/songfengzheng/Teaching/MTH541/Lecture%20notes/Fisher_info.pdf
See the Wikipedia page on the Multivariate Normal Distribution
Ng, Krishnan and McLachlan 2004. The EM algorithm. Section 3.5 discusses standard errors approaches PDF http://hdl.handle.net/10419/22198
Efron and Hinkley 1978. Assessing the accuracy of the maximum likelihood estimator: observed versus expected Fisher Information. They argue that the observed Fisher Information is better than expected Fisher Information in many/some cases. They also argue for the likelihood ratio method for CIs. PDF
Hamilton 1994. PDF
Hamilton’s exposition assumes you know the marginal distribution of a multivariate normal. Scroll down to the bottom of this page to see that: http://fourier.eng.hmc.edu/e161/lectures/gaussianprocess/node7.html
Meilijson 1989. Fast improvement to the EM algorithm on its own terms. PDF
Oakes 1999. Direct calculation of the information matrix via the EM algorithm. PDF
Ho, Shumway and Ombao 2006. The state-space approach to modeling dynamic processes. Chap 7 in Models for Intensive Longitudinal Data. This has a brief statement that Oakes 1999 derivatives are hard to compute. It doesn’t say why. It says nothing about Louis 1982. Link
Louis 1982. Finding the observed information matrix when using the EM algorithm. So elegant; alas, MARSS deals with time series data and that makes this ugly. PDF Another PDF
Lystig and Hughes 2012. Exact computation of the observed information matrix for hidden Markov models. This paper helped me better understand why Louis 1982 is hard for MARSS models. Abstract
Duan and Fulop 2011. A stable estimator for the information matrix under EM for dependent data. This paper also helped me better understand why Louis 1982 is hard for MARSS models. PDF Journal
Naranjo 2007. State-space models with exogenous variables and missing data, PhD U of FL. I didn’t use this directly, but it helped me understand what I did need to use. PDF
Dempster, Laird, Rubin 1977. Maximum likelihood for incomplete data via the EM algorithm. I didn’t really use this but looked up more info on the ‘score’ function Q in this paper, and that helped me understand Louis 1982 better PDF
van Dyk, Meng and Rubin 1995. Maximum likelihood estimation via the ECM algorithm: computing the asymptotic variance. I didn’t use but it looks promising. PDF
Cavanaugh and Shumway 1996. On computing the expected Fisher Information Matrix for state-space model parameters.
Harvey 1989, pages 140-143, Section 3.4.5 Information matrix in Forecasting, structural time series models and the Kalman filter.
This is part of a series on computing the Fisher Information for Multivariate Autoregressive State-Space Models. Part I: Background, Part II: Louis 1982, Part III: Harvey 1989, Background, Part IV: Harvey 1989, Implementation.
Citation: Holmes, E. E. 2016. Notes on computing the Fisher Information matrix for MARSS models. Part I Background. Technical Report. https://doi.org/10.13140/RG.2.2.27306.11204/1
The Fisher Information is defined as

$$I(\theta) = \mathrm{E}\left[ \left( \frac{\partial}{\partial \theta} \log f(Y;\theta) \right)^2 \right]$$
In words, it is the expected value (taken over all possible data) of the square of the gradient (first derivative) of the log-likelihood surface at $\theta$. It is a measure of how much information the data (from our experiment or monitoring) have about $\theta$. The log-likelihood surface is for a fixed set of data with $\theta$ varying. Its peak is at the MLE, which is generally not $\theta$, so the surface has some gradient (slope) at $\theta$. The Fisher Information is the expected value (over possible datasets) of those squared gradients.
It can be shown[1] that the Fisher Information can also be written as

$$I(\theta) = -\mathrm{E}\left[ \frac{\partial^2}{\partial \theta^2} \log f(Y;\theta) \right]$$
So the Fisher Information is the average (over possible data) convexity of the log-likelihood surface at $\theta$. That doesn’t quite make sense to me: when I imagine the surface, the convexity at a non-peak value $\theta$ is not intuitively the information. The gradient squared I understand, but the convexity at a non-peak?
Note, my $y$ should be understood to be a multi-dimensional dataset (multiple sites over multiple time points, say) comprising multiple samples. Often in this case the Fisher Information is written $I_n(\theta)$, and if the data points are all independent, $I(\theta)=\frac{1}{n} I_n(\theta)$. However, I’m not using that notation. My $I(\theta)$ refers to the Fisher Information for a dataset, not for individual data points within that dataset.
We do not know $\theta$, so we need to use an estimator for the Fisher Information. A common approach is to use $I(\hat{\theta})$, the Fisher Information at the MLE $\hat{\theta}$, as an estimator of $I(\theta)$.
This is called the expected Fisher Information and is computed at the MLE:

$$I(\hat{\theta}) = -\mathrm{E}\left[ \frac{\partial^2}{\partial \theta^2} \log f(Y;\theta) \right] \Bigg\vert_{\theta=\hat{\theta}}$$
That $\vert_{\theta=\hat{\theta}}$ at the end means that after doing the derivative with respect to $\theta$, we replace $\theta$ with $\hat{\theta}$. It would not make sense to do the substitution before since $\hat{\theta}$ is a fixed value and so you cannot take the derivative with respect to it.
This is a viable approach if you can take the derivative of the log-likelihood with respect to $\theta$ and can take the expectation over the data. You could always do that expectation using simulation of course. You just need to be able to simulate data from your model with $\hat{\theta}$.
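That simulation idea can be sketched in a few lines of R. This is a minimal illustration with a toy model I made up for this purpose (it is not MARSS-specific): $y_i \sim \textrm{Normal}(\theta, 1)$ with $n = 50$. Simulate many datasets at $\hat{\theta}$, compute the Hessian of the negative log-likelihood at $\hat{\theta}$ for each dataset with `stats::optimHess()`, and average. For this toy model the expected Fisher Information is exactly $n$, so we can check the answer.

```r
## Sketch: expected Fisher Information by simulation for a toy model
## y ~ Normal(theta, 1), n = 50. Assumes theta.hat is our MLE.
set.seed(1)
n <- 50
theta.hat <- 2

## negative log-likelihood for one dataset y
negLL <- function(theta, y) -sum(dnorm(y, mean = theta, sd = 1, log = TRUE))

nsim <- 500
hess <- numeric(nsim)
for (i in seq_len(nsim)) {
  y <- rnorm(n, mean = theta.hat, sd = 1)  # simulate data at theta-hat
  ## Hessian of the negative log-likelihood, evaluated at theta-hat
  hess[i] <- stats::optimHess(theta.hat, negLL, y = y)[1, 1]
}
## expected Fisher Information = average over simulated datasets
I.expected <- mean(hess)
I.expected  # should be close to n = 50 for this toy model
```

For a MARSS model, the `negLL` here would be replaced by the prediction-error log-likelihood from the Kalman filter, and the simulation step would simulate from the state-space model at $\hat{\theta}$.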
Another approach is to drop the expectation. This is termed the observed Fisher Information:

$$\mathcal{I}(\hat{\theta}, y) = -\frac{\partial^2}{\partial \theta^2} \log f(y;\theta) \Bigg\vert_{\theta=\hat{\theta}}$$
where $y$ is the one dataset we collected. The observed Fisher Information is the curvature of the log-likelihood function around the MLE. When you estimate the variance of the MLEs from the Hessian of the log-likelihood (output from, say, a Newton method or any other algorithm that uses the Hessian of the log-likelihood), you are using the observed Fisher Information matrix. Efron and Hinkley (1978) (and, they say in their article, Fisher) argue that the observed Fisher Information is a better estimate of the variance of $\hat{\theta}$[2][3], while Cavanaugh and Shumway (1996) show results from MARSS models indicating that the expected Fisher Information has lower mean squared error (though it may be more biased; mean squared error measures both bias and precision).
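In R, this is what you get from `optim()` with `hessian = TRUE`: when you minimize the negative log-likelihood, the returned Hessian at the optimum is the observed Fisher Information. A minimal sketch with a toy model (normal data with known sd = 1, estimating the mean; not MARSS-specific):

```r
## Sketch: observed Fisher Information from the Hessian returned by optim()
set.seed(1)
y <- rnorm(100, mean = 5, sd = 1)  # the one dataset we collected

## negative log-likelihood; the MLE of theta is mean(y)
negLL <- function(theta) -sum(dnorm(y, mean = theta, sd = 1, log = TRUE))

fit <- optim(par = 0, fn = negLL, method = "BFGS", hessian = TRUE)
I.obs <- fit$hessian[1, 1]  # observed Fisher Information at the MLE
var.theta <- 1 / I.obs      # estimated variance of theta-hat
sqrt(var.theta)             # standard error; analytically 1/sqrt(100) = 0.1
```

For a MARSS model, `negLL` would be the negative of the prediction-error log-likelihood, and the Hessian would be a matrix over all the parameters in $\theta$, inverted to get the variance-covariance matrix of $\hat{\theta}$.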
So how do we compute $I(\hat{\theta})$ or $\mathcal{I}(\hat{\theta},y)$? In particular, I am interested in whether I can use the analytical derivatives of the full log-likelihood that are part of the EM algorithm to compute the Fisher Information. Notes on computing the Fisher Information matrix for MARSS models. Part II.
See any detailed write-up on Fisher Information. For example page 2 of these Lecture Notes on Fisher Information. $\hookleftarrow$
The motivation for computing the Fisher Information is to get an estimate of the variance of $\hat{\theta}$ for standard errors on the parameter estimates, say. $var(\hat{\theta}) \xrightarrow{P} \frac{1}{I(\theta)}$. $\hookleftarrow$
Note I am using the notation of Cavanaugh and Shumway (1996). Efron and Hinkley (1978) use $\mathscr{I}(\theta)$ for the expected Fisher Information and $I(\theta)$ for the observed Fisher Information. Cavanaugh and Shumway (1996) use $I(\theta)$ for the expected Fisher Information and $\mathcal{I}(\theta,Y)$ for the observed Fisher Information. I use the same notation as Cavanaugh and Shumway (1996) except that they use $I_n()$ and $\mathcal{I}_n$ to be explicit that the data have $n$ data points. I drop the $n$ since I am interested in the Fisher Information of the dataset not individual data points and if I need to use the information of the j-th data point, I would just write $I_j()$. The other difference is that I use $y$ to refer to the data. In my notation, $Y$ is the random variable ‘data’ and $y$ is a particular realization of that random variable. In some cases, I use $y(1)$. That is when the random variable $Y$ is only partially observed (meaning there are missing data points or time steps); $y(1)$ is the observed portion of $Y$. $\hookleftarrow$
Efron and Hinkley 1978. Assessing the accuracy of the maximum likelihood estimator: observed versus expected Fisher Information. This paper argues that the observed Fisher Information is better than expected Fisher Information in many/some cases. The same paper argues for using the likelihood ratio method for CIs. PDF
Cavanaugh and Shumway 1996. On computing the expected Fisher Information Matrix for state-space model parameters.
```r
## adapted from code by Felix Schonbrodt
## http://www.nicebread.de/finally-tracking-cran-packages-downloads/

## ======================================================================
## Step 1: Download all log files
## ======================================================================
# start & end dates: 12 months prior to current date
this.year <- as.numeric(format(Sys.time(), "%Y"))
start <- as.Date(paste(this.year - 1, "-", format(Sys.time(), "%m-%d"), sep = ""))
today <- as.Date(Sys.time())
all_days <- seq(start, today, by = "day")
year <- as.POSIXlt(all_days)$year + 1900
# name the urls by date so missing days map to the right file
urls <- paste0("http://cran-logs.rstudio.com/", year, "/", all_days, ".csv.gz")
names(urls) <- as.character(all_days)

# create the download folder if needed, then only download the files you don't have
dir.create("CRANlogs", showWarnings = FALSE)
missing_days <- setdiff(as.character(all_days),
                        tools::file_path_sans_ext(dir("CRANlogs"), TRUE))
for (i in seq_along(missing_days)) {
  print(paste0(i, "/", length(missing_days)))
  download.file(urls[missing_days[i]],
                paste0("CRANlogs/", missing_days[i], ".csv.gz"))
}

## ======================================================================
## Step 2: Load single data files into one big data.table
##
## NOTE: this step takes FOREVER to run
## ======================================================================
file_list <- list.files("CRANlogs", full.names = TRUE)
logs <- list()
for (file in file_list) {
  print(paste("Reading", file, "..."))
  logs[[file]] <- read.table(file, header = TRUE, sep = ",", quote = "\"",
                             dec = ".", fill = TRUE, comment.char = "",
                             as.is = TRUE)
}

# rbind together all files
library(data.table)
dat <- rbindlist(logs)

# add some keys and define variable types
dat[, date := as.Date(date)]
dat[, package := factor(package)]
dat[, country := factor(country)]
dat[, weekday := weekdays(date)]
dat[, week := strftime(as.POSIXlt(date), format = "%Y-%W")]
setkey(dat, package, date, week, country)

save(dat, file = "CRANlogs/CRANlogs.RData")

# for later analyses: load the saved data.table
# load("CRANlogs/CRANlogs.RData")

## ======================================================================
## Step 3: Plot results
## ======================================================================
library(reshape)  # for cast()

# vector of pkgs to compare
pkgs <- c("MARSS", "dlm")
# vector of plot colors
clr <- seq(length(pkgs))

# downloads of selected pkgs by week (unique IPs per week)
com1 <- dat[J(pkgs), length(unique(ip_id)), by = c("week", "package")]
# total downloads to date
com1[, sum(V1), by = package]
# cumulative downloads by week
com1$C1 <- (com1[, cumsum(V1), by = package])$V1
# nicer form for plotting
plotdat <- cast(com1, week ~ package, value = "C1")

# plot cumulative downloads over time
matplot(plotdat, type = "l", lty = "solid", lwd = 2, col = clr,
        ylab = "Cumulative downloads", xlab = "Week")
legend(x = "topleft", legend = colnames(plotdat)[-1],
       lty = "solid", lwd = 2, col = clr)
```