The Philosophical Gourmet Report (PGR) purports to be a ranking of faculty reputation based on the opinions of “research-active” faculty, which it surely is. But the legitimacy of the PGR as an accurate measure of opinion relies in part on how well the PGR pool of evaluators reflects the population of research-active faculty. Addressing this point, Brian Leiter remarks (here) on the selection procedure for the PGR evaluator pool:
Evaluators were selected with an eye to balance, in terms of area, age and educational background—though since, in all cases, the opinions of research-active faculty were sought, there was, necessarily, a large number of alumni of the top programs represented. Approximately half those surveyed were philosophers who had filled out the surveys in previous years; the other half were nominated by members of the Advisory Board, who picked research-active faculty in their fields.
However, the PGR evaluator pool is not balanced with respect to educational background, and the claim that the educational imbalance observed in the 2011 PGR is a necessary consequence of soliciting the opinion of research-active faculty is false.
Now, to criticize the composition of the PGR evaluator pool is not to criticize the individual members making up the pool. The focus of the discussion at C&I (here, here, here, here, and even here; see also Andrew Gelman’s post here) has been on methodology. The point instead is this: if you are interested in an accurate picture of professional opinion but oversubscribe from some parts of the profession and systematically omit other parts altogether, you might end up with an accurate assessment. But knowing this much about your methods, there is no reason to believe that you will. That, in a nutshell, is the nature of the PGR sampling problem.
The PGR sampling problem is not breaking news. It has been a stubbornly durable feature of the survey since its inception, as Richard Heck pointed out long ago and as Zachary Ernst has since elaborated, remarking (and I paraphrase) that the snowball sampling method used by the PGR, while suitable for surveying pimps and dope dealers, is indefensible for academics practicing philosophy out in the open.
What might be new is a data set for the 2011 PGR rater pool for you to see for yourself, and a visualization of the degree of the educational background bias within the PGR rater pool, presented below.
We start with Kieran Healy’s excellent series on the 2006 PGR data set, which provides several insights into how to think about the data for the PGR. In those posts he is pretty careful to signal that his remarks are descriptive (i.e., based on the properties of the rater pool) and not inferential (i.e., representative of corresponding properties in the profession), but he does consider some objections to using the PGR to draw such inferences and the bulk of the posts are taken up with investigating whether the data set can rule those objections out. And in some cases, the data does just that. Although it would be better to move to an open data model for the PGR, these posts are the next best thing.
In his Ratings and Specialties post, Professor Healy looks at various ways to measure “cross-field” consensus. In one exercise he reconstructs the PGR ranking from the point of view of each specialty. What would the PGR ranking of departments look like if just the Ethicists were in charge? What would it look like if just the Philosophers of Language were? The Kantians? And so on. Those rankings were then aggregated to see how much variation there is across the specialties, which gives a sense of how much (or how little) consensus there is across them. The box and whisker plots (top 25: .png, .pdf; total population: .png, .pdf) give a picture of this. Except for the top 6 departments, and a handful rounding out the bottom, the answer is that the rankings vary quite a lot: people vote according to whom they recognize, and invariably those are the people working in their own area(s) of specialization. The flip side of this appears to be a version of the closed world assumption: if a rater hasn’t heard of you, then you probably aren’t worth hearing from. This was my favorite of Healy’s posts, although I suspect that the observed volatility is attenuated by the unrepresentative composition of the evaluator pool. So, let’s turn to that.
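As an aside, the per-specialty exercise described above can be sketched in a few lines of Python. This is not Healy's code, and the numbers are invented: the department names, the specialties, and the scores are all hypothetical placeholders, and I am assuming that each specialty's raters can be summarized by a mean score per department. The sketch ranks departments separately within each specialty and then reports the spread of each department's rank, which is roughly what a box and whisker plot displays.

```python
from statistics import median

# Hypothetical mean scores given to each department by raters of each specialty.
# These values are invented for illustration; they are NOT PGR data.
scores = {
    "ethics":   {"Dept A": 4.5, "Dept B": 3.0, "Dept C": 3.8, "Dept D": 2.1},
    "language": {"Dept A": 4.4, "Dept B": 3.9, "Dept C": 2.5, "Dept D": 3.1},
    "kant":     {"Dept A": 4.6, "Dept B": 2.2, "Dept C": 3.0, "Dept D": 3.5},
}


def ranks_within(specialty_scores):
    """Rank departments from 1 (highest score) downward within one specialty."""
    ordered = sorted(specialty_scores, key=specialty_scores.get, reverse=True)
    return {dept: position for position, dept in enumerate(ordered, start=1)}


def rank_spread(all_scores):
    """Summarize how much each department's rank varies across specialties."""
    per_specialty = {name: ranks_within(s) for name, s in all_scores.items()}
    departments = next(iter(all_scores.values())).keys()
    summary = {}
    for dept in departments:
        ranks = [per_specialty[name][dept] for name in per_specialty]
        summary[dept] = {
            "min": min(ranks),
            "median": median(ranks),
            "max": max(ranks),
        }
    return summary


spread = rank_spread(scores)
for dept, stats in spread.items():
    print(dept, stats)
```

In this toy example one department is ranked first by every specialty while the others move around by two places, which mirrors the qualitative pattern described above: agreement at the very top and volatility below it.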
Healy considers whether voting patterns in the 2006 data are correlated with the “social location” of the evaluators, which is their place of employment or “Home” institution. But another question is whether the voting patterns are correlated with the “imprinting location” of raters (i.e., where they earned their PhDs), once the cross-specialty volatility is controlled for. Judging from some reverse engineering of the data that is available, it would be surprising if PhD institution and voting patterns did not turn out to be correlated. In short, while it may be true that where you stand does not depend on where you sit, it still may be that where you stand depends on from whom you learned to stand on your own.
This point can be illustrated by comparing two graphs. This first graph, with the blue ink blots, overlays the distribution of the Home Institutions in the 2011 PGR pool with Healy’s PCA of the 2006 data set. In Healy’s terms, what this graph does is overlay the distribution of “social locations” (but for the 2011 data set) over his heuristic for viewing how departments and subfields differ from one another (based on his 2006 analysis).
This picture is consistent with a happy view of the PGR, for it paints the picture of a wide and reasonably diverse sample of the profession. Even the centers of concentration of the raters are outside the center of the plot, which further contributes to a sense of balance. I suspect that something like this is what some have in mind when they counter that the PGR simply tells it like it is, like it or not, and is what bucks up their confidence to label dissenters as idiosyncratic cranks on the fringe of the field longing to be accepted by the mainstream. Although more telling of the critic than those criticized, one can be forgiven for thinking that there is some truth behind the bluster.
But that case starts to crumble when one takes a look instead at where the raters earned their PhDs. For here we see an unusually high concentration around a small cluster of universities, and this fact runs counter to the claim that the evaluator pool is “educationally balanced”.
Here we see not only a high concentration around a small number of imprinter institutions, but a cluster around the center of the PCA. This invites several questions, most of which are unanswerable without open PGR datasets. One immediate question in this context is the degree to which the oversubscription of a handful of ‘imprinter institutions’ is driving this PCA analysis and the clustering of green ink blots around the center, as this seems to be evidence for an effect from the educational imbalance in the evaluator pool.
To put a finer point on this, there are 299 evaluators and 126 universities represented in the 2011 PGR rater pool: 113 individual Home institutions (blue) and 58 PhD institutions (green). But of the 736 individual rankings submitted for the 33 areas of specialization covered by the PGR, nearly half (48.5%) were submitted by alumni from just 8 universities.
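For a rough sense of scale, here is the arithmetic behind those figures as a short Python sketch. The counts come from the paragraph above; the "even spread" baseline is my own assumption, added only for comparison, and it is not offered as the right benchmark for the profession.

```python
# Figures reported in the text above.
total_rankings = 736      # individual specialty rankings submitted
top_share = 0.485         # share submitted by alumni of the top 8 PhD institutions
phd_institutions = 58     # distinct PhD institutions in the rater pool
top_institutions = 8

# Approximate number of rankings submitted by alumni of those 8 institutions.
top_rankings = round(total_rankings * top_share)

# Hypothetical baseline (an assumption for illustration only): rankings
# spread evenly across all PhD institutions represented in the pool.
even_share = top_institutions / phd_institutions

# How many times larger the observed share is than the even-spread share.
ratio = top_share / even_share

print(f"Rankings from the top 8: about {top_rankings} of {total_rankings}")
print(f"Even-spread share for 8 of 58 institutions: {even_share:.1%}")
print(f"Observed share relative to even spread: {ratio:.1f}x")
```

So roughly 357 of the 736 rankings trace back to 8 PhD institutions, about three and a half times what an even spread across the 58 institutions in the pool would give.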
Now, you might maintain that a hard-nosed look at the research-active faculty will bear out that concentration, and that it will also bear out the 8 institutions in that set. But there are gaps in that set and holes in that argument—holes that appear even by the PGR’s own measure. Exploring that point is next.
– Gregory Wheeler
 Healy’s caveats apply here, as well as the additional warning that I am overlaying the 2011 rater pool with the departmental positions determined by the 2006 data. Leiter reports that the PGR is remarkably stable, so this shouldn’t be too far off from a new PCA constructed from the 2011 data.
 Note that evaluators are prohibited from directly ranking both their home institution and their PhD institution, but this does not eliminate the effects of an unbalanced evaluator pool.
 There are some discrepancies on the PGR website between the main list of evaluators and the evaluators listed for the individual specialties. The total number of evaluators that I can account for in the specialty rankings is 299, but Leiter’s list of “nearly 270 evaluators who completed the overall faculty quality survey” numbers 302, and there is one missing. Professor Leiter was contacted about discrepancies on the PGR website but did not reply.