## Open Position at Carnegie Mellon University

September 28, 2014

### Open Rank, Tenure-Track or Tenured Position

#### Department of Philosophy, Carnegie Mellon University

Open Rank, Tenure-Track or Tenured Position, with a preference for junior candidates, beginning August 2015.

The Department welcomes applications from scholars in any branch of mathematical or scientific philosophy; it also invites scholars from related fields (e.g., computer science, mathematics, statistics, psychology, linguistics, or economics), whose work bears upon or is motivated by methodological or foundational issues.

Responsibilities. Exemplary research and publication, teaching two courses per semester (4/yr), graduate student supervision, and some committee work.

Carnegie Mellon is an equal opportunity, affirmative action employer with particular interest in identifying women, minority, individuals with disabilities and veteran applicants for faculty positions.

Deadline for applications is November 1, 2014. Applicants should electronically submit an application letter, C.V., description of research plans, research writing sample, and summary of teaching experience (preferably as a single PDF file) to:

phil-search-open@andrew.cmu.edu

For junior applicants, at least three confidential letters of reference should also be forwarded to this email address. Questions and inquiries may be directed to this email address as well.

## Letters

September 26, 2014

Several philosophers have written to ask,

Has Brian Leiter ever threatened legal action against you for your criticisms of the PGR?

No, just insults. However, it is my understanding that he tried to lean on Columbia statistician Andrew Gelman over this criticism and discussion of the PGR. How he came to think that would work is beyond me.

Have you followed the recent call to replace Leiter as editor of the PGR? A growing consensus seems to be that the PGR is useful but just needs new management, whereas a minority of people like you and Richard Heck argue that the PGR is a flawed idea.

The genius of Leiter’s defense of the PGR has always been to make it about critics not liking where they came out in the ranking, sliding right past engagement with any and all substantive questions to do with whether the PGR made sense to begin with.  And let’s face it, that strategy worked. The community now accepts the idea that philosophy faculty are things to be ranked, that students benefit from doing this every two years, and that the PGR is a sound instrument to serve this goal.   That is embarrassing.

-Gregory Wheeler

## EBM+ blog

September 25, 2014

Readers of this blog may be interested in the blog of the EBM+ consortium, accessible at www.ebmplus.org

EBM+ is a consortium whose members are keen to develop the methods of evidence-based medicine to better handle evidence of mechanisms.

## 2014 Ig Nobel Prize Highlights

September 20, 2014

ECONOMICS PRIZE [ITALY]: ISTAT — the Italian government’s National Institute of Statistics, for proudly taking the lead in fulfilling the European Union mandate for each country to increase the official size of its national economy by including revenues from prostitution, illegal drug sales, smuggling, and all other unlawful financial transactions between willing participants.

NEUROSCIENCE PRIZE [CHINA, CANADA]: Jiangang Liu, Jun Li, Lu Feng, Ling Li, Jie Tian, and Kang Lee, for trying to understand what happens in the brains of people who see the face of Jesus in a piece of toast.

PSYCHOLOGY PRIZE [AUSTRALIA, UK, USA]: Peter K. Jonason, Amy Jones, and Minna Lyons, for amassing evidence that people who habitually stay up late are, on average, more self-admiring, more manipulative, and more psychopathic than people who habitually arise early in the morning.

PUBLIC HEALTH PRIZE [CZECH REPUBLIC, JAPAN, USA, INDIA]: Jaroslav Flegr, Jan Havlíček and Jitka Hanušova-Lindova, and to David Hanauer, Naren Ramakrishnan, Lisa Seyfried, for investigating whether it is mentally hazardous for a human being to own a cat.

ARCTIC SCIENCE PRIZE [NORWAY, GERMANY]: Eigil Reimers and Sindre Eftestøl, for testing how reindeer react to seeing humans who are disguised as polar bears.

The full list of winners is here.

## Philosophy Security Advisory System II

September 10, 2014

We at PSAS central are raising the PSAS alert to HIGH.

Since the introduction of PSAS, the terrorist threat to philosophers has evolved from an epistemologically challenged terror organization located in the ungovernable Great Lakes region of North America to a loosely organized confederation of Overly Sensitive Internet Stormtroopers (OSIS).  OSIS has staged a series of spectacular coordinated attacks, prompting us to update our advisory system.

During this LEVEL ORANGE alert, philosophers are advised to stay calm and to avoid keyboards.  To stay safe, remember: If you see something, say nothing.

## Minds and Machines 24(3) 2014

August 12, 2014

• A Taxonomy of Errors for Information Systems (Giuseppe Primiero)
• Practical Intractability: A Critique of the Hypercomputation Movement (Aran Nayebi)
• The Logic of Knowledge and the Flow of Information (Simon D’Alfonso)
• From Interface to Correspondence: Recovering Classical Representations in a Pragmatic Theory of Semantic Information (Orlin Vakarelov)
• Smooth Yet Discrete: Modeling Both Non-transitivity and the Smoothness of Graded Categories With Discrete Classification Rules (Bert Baumgaertner)
• Book Review: Alvin Plantinga, Where the Conflict Really Lies: Science, Religion, and Naturalism (Bradford McCall)
• Book Review: Pete Mandik, This is Philosophy of Mind: An Introduction (Matteo Colombo)

## An Introduction to Likelihoodist, Bayesian, and Frequentist Methods (2/2)

August 6, 2014

(Cross-posted from gandenberger.org)

### Introduction

##### My goal in this post and the previous one in this series is to provide a short, self-contained introduction to likelihoodist, Bayesian, and frequentist methods that is readily available online and accessible to someone with no special training who wants to know what all the fuss is about.

In the previous post in this series, I gave a motivating example that illustrates the enormous costs of the failure of philosophers, statisticians, and scientists to reach consensus on a reasonable, workable approach to statistical inference. I then used a fictitious variant on that example to illustrate how likelihoodist, Bayesian, and frequentist methods work in a simple case.

In this post, I discuss a stranger case that better illustrates how likelihoodist, Bayesian, and frequentist methods come apart. This post is considerably more technical than the previous one, and I fear that those with no special training will find it tough going. I would love to get feedback on how I can make it more accessible.

For those who want to go deeper into these topics, the first chapter of Elliott Sober’s Evidence and Evolution would be a great next step. Royall (1997), Howson and Urbach (2006), and Mayo (1996) provide good contemporary defenses of likelihoodist, Bayesian, and frequentist methods, respectively.

### Review

Statistical inference is an attempt to evaluate a set of probabilistic hypotheses about the behavior of some data-generating mechanism. It is perhaps the most tractable and well-studied kind of inductive inference.

The three leading approaches to statistical inference are Bayesian, likelihoodist, and frequentist. All three use likelihood functions, where the likelihood function for a datum $E$ on a set of hypotheses $\mathbf{H}$ is $\Pr(E|H)$ (the probability of $E$ given $H$) considered as a function of $H$ as it varies over the set $\mathbf{H}$. However, they use likelihood functions in different ways and for different immediate purposes. Likelihoodists and Bayesians use them in ways that conform to the Likelihood Principle, according to which the evidential meaning of $E$ with respect to $\mathbf{H}$ depends only on the likelihood function of $E$ on $\mathbf{H}$, while frequentists use them in ways that violate the Likelihood Principle (see Gandenberger 2014).

Likelihoodists use likelihood functions to characterize data as evidence. Their primary interpretive tool is the Law of Likelihood, which says that $E$ favors $H_1$ over $H_2$ if and only if their likelihood ratio $\mathcal{L}=\Pr(E|H_1)/\Pr(E|H_2)$ on $E$ is greater than 1, with $\mathcal{L}$ measuring the degree of favoring. Two major advantages of this approach are (1) it conforms to the Likelihood Principle and (2) it uses only the quantity $\mathcal{L}$, which is often objective because scientists often consider hypotheses that entail particular probability distributions over possible observations (for instance, the hypothesis that the mean of a normal distribution with a particular variance is zero). Even when the likelihood function is not objective, it is often easier to evaluate in a way that produces a fair degree of intersubjective agreement than the prior probabilities that Bayesians use. The great weakness of the likelihoodist approach is that it only yields a measure of evidential favoring, and not any immediate guidance about what one should believe or do.
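To make the Law of Likelihood concrete, here is a minimal sketch in Python. It assumes independent observations from a normal distribution with known standard deviation and compares two point hypotheses about the mean. The function names and the data values are hypothetical, chosen only for illustration.

```python
import math


def log_normal_pdf(x, mean, sd):
    """Log density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2 * math.pi))


def likelihood_ratio(data, mean_1, mean_2, sd):
    """Pr(E|H_1) / Pr(E|H_2) for independent normal observations with known sd."""
    log_lr = sum(
        log_normal_pdf(x, mean_1, sd) - log_normal_pdf(x, mean_2, sd)
        for x in data
    )
    return math.exp(log_lr)


# Hypothetical data; H_1 says the mean is 1, H_2 says the mean is 0.
data = [0.9, 1.4, 0.2, 1.1]
lr = likelihood_ratio(data, mean_1=1.0, mean_2=0.0, sd=1.0)
print("E favors H_1 over H_2" if lr > 1 else "E does not favor H_1 over H_2")
```

Swapping the two hypotheses inverts the ratio, so the same function reports the degree of favoring in either direction.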

Bayesians use likelihood functions to update probability distributions in accordance with Bayes’s theorem. Their approach fits nicely with the likelihoodist approach in that the ratio of the “posterior probabilities” (that is, the probabilities after updating on the evidence) $\Pr(H_1|E)/\Pr(H_2|E)$ on $E$ equals the ratio of the prior probabilities $\Pr(H_1)/\Pr(H_2)$ times the likelihood ratio $\mathcal{L}=\Pr(E|H_1)/\Pr(E|H_2)$. The Bayesian approach conforms to the Likelihood Principle, and unlike the likelihoodist approach it can be used directly to decide what to believe or do. Its great weakness is that using it requires supplying prior probabilities, which are generally based on either an individual’s subjective opinions or some objective but contentious formal rule that is intended to represent a neutral perspective.
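The relationship between prior odds, likelihood ratio, and posterior odds can be sketched in a few lines of Python. The function names and the numbers are hypothetical; the sketch assumes a finite set of mutually exclusive and exhaustive hypotheses.

```python
def posterior_odds(prior_1, prior_2, likelihood_1, likelihood_2):
    """Posterior odds of H_1 to H_2: prior odds times the likelihood ratio."""
    return (prior_1 / prior_2) * (likelihood_1 / likelihood_2)


def posterior_probabilities(priors, likelihoods):
    """Bayes's theorem over a finite, exhaustive set of hypotheses."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical numbers: Pr(H_1) = 0.2, Pr(H_2) = 0.8, Pr(E|H_1) = 0.9, Pr(E|H_2) = 0.3.
odds = posterior_odds(0.2, 0.8, 0.9, 0.3)
post = posterior_probabilities([0.2, 0.8], [0.9, 0.3])
print(odds, post[0] / post[1])  # the two numbers agree up to rounding
```

The prior enters only through the prior odds, which is why two Bayesians who agree on the likelihood function can still disagree about the posterior.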

Frequentists use likelihood functions to design experiments that are in some sense guaranteed to perform well in repeated applications in the long run, no matter what the truth may be. Frequentist tests, for instance, control both the probability of rejecting the “null hypothesis” if it is true (often at the 5% level) and the probability of failing to reject it if it is false to a degree that one would hate to miss (often at the 20% level). They violate the Likelihood Principle, but they provide immediate guidance for belief or action without appealing to a prior probability distribution.
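As a rough illustration of how such error rates are controlled, the sketch below assumes a one-sided test of a zero mean against a positive mean, normal data with known standard deviation, and a fixed sample size. These are my simplifying assumptions, and the function names are hypothetical. It uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
import math
from statistics import NormalDist


def required_sample_size(effect, sd, alpha=0.05, beta=0.20):
    """Smallest n for a one-sided z-test with type I error alpha and type II error beta at `effect`."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(1 - beta)
    return math.ceil(((z_alpha + z_beta) * sd / effect) ** 2)


def error_rates(n, effect, sd, alpha=0.05):
    """(type I, type II) error rates of the test that rejects when the sample mean exceeds the cutoff."""
    se = sd / math.sqrt(n)
    cutoff = NormalDist().inv_cdf(1 - alpha) * se
    type_1 = 1 - NormalDist(0.0, se).cdf(cutoff)   # reject although the mean is 0
    type_2 = NormalDist(effect, se).cdf(cutoff)    # fail to reject although the mean is `effect`
    return type_1, type_2


n = required_sample_size(effect=0.5, sd=1.0)
print(n, error_rates(n, effect=0.5, sd=1.0))
```

Both error rates are properties of the procedure over repeated use, fixed before any data are seen; that is the sense in which the guarantee is "long run."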

### A Strange Example

Warning: I am about to describe an example that is difficult to understand without some specialized training. If you get lost, you can skip to where it says “upshot,” which tells you everything you need to know for the rest of the post.

Suppose we were to take a series of observations from a normal distribution with unknown mean and known positive variance. In other words, suppose we were to take a series of observations at random from a population that follows a “bell-shaped curve,” and we know the size and shape of the curve but not the location of its center. Suppose further that instead of deciding in advance on a fixed number of observations to take, we decided to keep sampling until the average observed value $\bar{x}$ was at least a certain distance from zero, where that distance started at some constant $k$ times the square root of the variance and decreased at the rate $1/\sqrt{n}$ as the sample size $n$ increased. Armitage (1961) pointed out that two things will happen in such an experiment:

• The experiment will end “almost surely” after a finite number of observations, no matter what the true mean may be. That is, the probability that the experiment goes on forever, with the mean of the observed values never getting far enough from zero to end the experiment, is zero. (It does not follow that it is impossible for the experiment to go on forever: it is possible to get an endless string of 0 observations, for instance. Hence the phrase “almost surely.”)
• When the experiment ends, the likelihood ratio for the hypothesis $H_{\bar{X}}$ that the true mean is the observed sample mean against the hypothesis $H_0$ that the true mean is zero on the observed data will be at least $e^{\frac{1}{2}k^2}$.
##### Upshot: Given enough time and resources, it is possible to design an experiment that will with probability one yield a result that according to the Law of Likelihood favors some hypothesis over a particular hypothesis $H_0$ to whatever degree one likes, even if $H_0$ is true.
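
The bound in the second bullet can be checked directly. As a sketch, write $\sigma^2$ for the known variance (a symbol not used above), $x_1,\dots,x_n$ for the observations, and $n$ for the sample size at which the experiment stops. The normalizing constants cancel in the likelihood ratio, leaving

$$\frac{\Pr(E|H_{\bar{x}})}{\Pr(E|H_0)}=\exp\left(\frac{\sum_{i=1}^{n}x_i^2-\sum_{i=1}^{n}(x_i-\bar{x})^2}{2\sigma^2}\right)=\exp\left(\frac{n\bar{x}^2}{2\sigma^2}\right).$$

The experiment stops only when $|\bar{x}|\geq k\sigma/\sqrt{n}$, that is, when $n\bar{x}^2\geq k^2\sigma^2$. Substituting gives a likelihood ratio of at least $e^{\frac{1}{2}k^2}$.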

Caveat: No one would ever run this experiment, and the average number of observations required to get a high degree of evidential favoring is enormous. Thus, one might be inclined to dismiss this example as irrelevant to statistical practice. It is nevertheless useful for illustrating and pressing on the principles that underlie Bayesian, frequentist, and likelihoodist approaches to statistical inference.
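The stopping rule is easy to simulate. The following Python sketch is illustrative only: the function names, the cap `max_n`, and the default parameters are my own choices. The cap is there because some runs take a very long time when the true mean is zero. Capped runs return `None` and are excluded from the check.

```python
import math
import random


def run_armitage(k, sd=1.0, true_mean=0.0, max_n=10_000, rng=None):
    """Sample until |sample mean| >= k*sd/sqrt(n); return (n, sample_mean), or None if capped."""
    rng = rng or random.Random()
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(true_mean, sd)
        mean = total / n
        if abs(mean) >= k * sd / math.sqrt(n):
            return n, mean
    return None  # cap reached before the stopping condition was met


def log_likelihood_ratio(n, mean, sd=1.0):
    """log of Pr(E|H_xbar)/Pr(E|H_0), which equals n*mean^2 / (2*sd^2)."""
    return n * mean ** 2 / (2 * sd ** 2)


if __name__ == "__main__":
    k = 1.5
    rng = random.Random(0)  # fixed seed so the run is repeatable
    outcomes = [run_armitage(k, rng=rng) for _ in range(100)]
    stopped = [o for o in outcomes if o is not None]
    print(f"{len(stopped)} of {len(outcomes)} runs stopped before the cap")
    # Every stopped run meets the bound log LR >= k^2 / 2, even though H_0 is true.
    print(all(log_likelihood_ratio(n, m) >= k ** 2 / 2 - 1e-9 for n, m in stopped))
```

Raising `k` raises the guaranteed likelihood ratio but makes runs that hit the cap more common, which is the practical face of the caveat above.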

Note: Following standard notation in statistics, I use $\bar{X}$ to refer to the sample mean as a random variable and $\bar{x}$ to refer to the particular realized value of that random variable.

### A Likelihoodist Take on the Strange Example

This example looks bad for likelihoodists. It shows that they are committed to the possibility of an experiment that has probability one of producing evidence that is as misleading as one likes with respect to the comparison between $H_{\bar{X}}$ and $H_0$. Frequentists avoid such possibilities: their primary aim is to control the probability that a given experiment will yield a misleading result. The great frequentist statistician David Cox went so far as to claim that “it might be argued that” this example “is enough to refute” the Likelihood Principle (2006).

Let us not be too hasty, however. The experiment has probability one of producing evidence that favors some hypothesis over $H_0$ to whatever degree one likes, even if $H_0$ is true. It does not have probability one of producing evidence that favors any particular hypothesis over $H_0$ to any particular degree. In fact, if $H_0$ is true, then the probability that any experiment produces evidence that favors any particular alternative hypothesis $H_a$ over $H_0$ to degree $k$ is at most $1/k$ (Royall 2000).
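The $1/k$ bound can be checked exactly in a simple special case. The sketch below assumes a fixed sample size, normal data with known standard deviation, and a fixed alternative with positive mean. The bound itself is more general, and the function name is hypothetical. It computes the probability, when $H_0$ is true, that the likelihood ratio for $H_a$ over $H_0$ reaches $k$.

```python
import math
from statistics import NormalDist


def prob_misleading(k, alt_mean, n, sd=1.0):
    """P(likelihood ratio of H_a over H_0 >= k) when H_0 (mean 0) is true, for fixed n."""
    # For alt_mean > 0, the ratio is >= k exactly when the sample mean is at least
    # alt_mean/2 + sd^2 * ln(k) / (n * alt_mean).
    threshold = alt_mean / 2 + sd ** 2 * math.log(k) / (n * alt_mean)
    return 1 - NormalDist(0.0, sd / math.sqrt(n)).cdf(threshold)


k = 8
worst = max(
    prob_misleading(k, alt_mean=a / 10, n=n)
    for a in range(1, 51)
    for n in (1, 5, 25, 100)
)
print(worst, "<=", 1 / k)
```

In this normal case the worst-case probability is well below $1/k$, which suggests that the general bound is conservative here.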

The fact that this experiment has probability one of producing evidence that favors some hypothesis over $H_0$ to some degree according to the Law of Likelihood even if $H_0$ is true is not a point against the Law of Likelihood. Even perfectly ordinary experiments do that, and it is clear that they do so not because the Law of Likelihood is wrong but because the evidence they produce is bound to be at least slightly misleading. Consider an experiment that involves taking a fixed number of observations from a normal distribution with unknown mean and known variance. The probability that the sample mean will be exactly equal to the mean of the distribution is zero, simply because the distribution is continuous. The Law of Likelihood will say that the evidence favors the hypothesis that the true mean equals the sample mean over the hypothesis that it equals zero even if it does in fact equal zero. But we are not inclined to reject the Law of Likelihood on those grounds: it seems to be correctly characterizing the evidential meaning of (probably only slightly) misleading data.

What makes the Armitage example apparently more problematic is that it has probability one of producing evidence that favors some hypothesis over $H_0$ to whatever degree one likes, even if $H_0$ is true. Thus, it seems to allow one to create not just misleading evidence, but arbitrarily highly misleading evidence at will, from the perspective of someone who accepts the Law of Likelihood. But this gloss on what the example shows is selective and misleading. The evidence is arbitrarily misleading with respect to the comparison between the random hypothesis $H_{\bar{X}}$ and $H_0$, if $H_0$ is true. But it is not arbitrarily misleading with respect to the difference between the mean posited by the most favored hypothesis $H_{\bar{x}}$ and the true mean. In fact, it merely trades off one dimension of misleadingness against another: as one increases the degree to which the evidence is guaranteed to favor $H_{\bar{X}}$ over $H_0$, one thereby decreases the expected difference between the final sample mean $\bar{x}$ and the true mean of 0.

In the absence of any principled way to weigh misleadingness along one dimension against misleadingness along the other, there is no principled argument for the claim (nor is it intuitively clear) that the Armitage example is any more misleading for those who accept the Law of Likelihood than the perfectly ordinary fixed-sample-size experiment that no one takes to refute the Law of Likelihood. Thus, it is at least unclear that the Armitage example refutes the Law of Likelihood either.

This example does, however, illustrate the point that it would be a mistake to adopt an unqualified rule of rejecting any hypothesis $H_0$ against any other hypothesis $H_1$ if and only if the degree to which one’s total evidence favors $H_1$ over $H_0$ exceeds some threshold. More generally, it does not seem to be possible to provide good norms of belief or action on the basis of likelihood functions alone, as I argue here. Relating likelihood functions to belief or action in a general way that conforms to the Likelihood Principle seems to require appealing to prior probabilities, as a Bayesian would do.

### A Bayesian Take on the Strange Example

Armitage has provided a recipe for producing evidence with an arbitrarily large likelihood ratio $\Pr(E|H_{\bar{X}})/\Pr(E|H_0)$ even when $H_0$ is true. Bayesian updating on new evidence has the effect of multiplying the ratio of the probabilities for a pair of hypotheses by their likelihood ratio on that evidence. That is, in this case, $\Pr(H_{\bar{x}}|E)/\Pr(H_0|E)=\Pr(H_{\bar{x}})/\Pr(H_0)\times\Pr(E|H_{\bar{x}})/\Pr(E|H_0)$. Doesn’t the Armitage example thus provide a recipe for producing an arbitrarily large posterior probability ratio $\Pr(H_{\bar{X}}|E)/\Pr(H_0|E)$ on the Bayesian approach?

No. There are two problems. First, because the mean of the distribution is a continuous parameter, a Bayesian is likely to have credence zero in both the realized value of $H_{\bar{x}}$ and $H_0$. We should be dealing with probability density functions rather than discrete probability functions. (See previous post.) Second, the probability density at $H_{\bar{x}}$ varies with $\bar{x}$. Because proper probability distributions integrate to one, for any positive constant $c$ the ratio $p(H_{\bar{x}})/p(H_0)$ of the prior probability densities has to be less than $c$ for some $\bar{x}$, provided that $p(H_0)$ is not zero. Thus, the Armitage example does not provide a recipe for producing an arbitrarily large ratio of posterior probability density values $p(H_{\bar{x}}|E)/p(H_0|E)$ on the Bayesian approach.

The Armitage example does not even provide a recipe for causing the probability the Bayesian assigns to $H_0$ to decrease. That probability will decrease if and only if the Bayesian likelihood ratio $p(\bar{x}|H_0)/p(\bar{x}|\neg H_0)$ is less than one. (This likelihood ratio is Bayesian because $p(\bar{x}|\neg H_0)$ depends on a prior probability distribution over the possible true mean values. It is a ratio of probability densities because the sample space is continuous. This fact raises some technical issues, but we need not worry about them here; see Hacking 1965, 57, 66-70; Berger and Wolpert 1988, 32-6; and Pawitan 2001, 23-4.) This result is not inevitable, and indeed is guaranteed to have probability less than one if $H_0$ is true. Moreover, the expected value of that likelihood ratio is guaranteed to be at least one if $H_0$ is true (Pawitan 2001, 239).
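To see how the Bayesian likelihood ratio behaves at the Armitage stopping boundary, here is a sketch under one specific assumption of mine: under $\neg H_0$ the mean has a normal prior centered at zero with standard deviation `prior_sd`. Nothing above commits a Bayesian to that prior, and the function names are hypothetical. Because the stopping rule does not affect the likelihood function, the ratio depends on the data only through $\bar{x}$ and $n$.

```python
import math


def bayes_factor_null(sample_mean, n, sd=1.0, prior_sd=1.0):
    """p(xbar|H_0) / p(xbar|not H_0), assuming a N(0, prior_sd^2) prior on the mean under not-H_0."""
    v0 = sd ** 2 / n            # variance of the sample mean given H_0
    v1 = v0 + prior_sd ** 2     # marginal variance of the sample mean given not-H_0
    log_bf = 0.5 * math.log(v1 / v0) - 0.5 * sample_mean ** 2 * (1 / v0 - 1 / v1)
    return math.exp(log_bf)


def boundary_mean(k, n, sd=1.0):
    """Smallest |sample mean| at which the Armitage experiment stops at sample size n."""
    return k * sd / math.sqrt(n)


k = 3.0
for n in (1, 100, 10_000, 1_000_000):
    print(n, bayes_factor_null(boundary_mean(k, n), n))
```

Under this assumed prior, a sample mean sitting exactly on the boundary yields a ratio below one when the experiment stops early and above one when it stops very late. A late stop at the boundary therefore raises the probability of $H_0$ rather than lowering it.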

The Armitage example does provide a recipe for causing the probability density ratio $p(H_{\mu_0})/p(H_0)$ to increase by any factor one likes for some hypothesis $H_{\mu_0}$ positing a particular value $\mu_0$ other than 0 for the mean of the distribution, even if $H_0$ is true, provided that the prior probability density function is positive everywhere. The recipe works only for some value or other of $\mu_0$, though, not for any particular value fixed in advance. However, it is not clear that a Bayesian should be troubled by this result. If he or she puts positive prior probability on $H_0$ and a continuous prior probability distribution everywhere else, then $p(H_{\mu_0})/\Pr(H_0)$ will remain zero. If he or she puts positive probability on $H_0$ and on some countable number of alternatives to $H_0$, then it is not inevitable that the result of the experiment will favor any of those alternatives over $H_0$. (The axioms of probability prohibit putting positive probability on an uncountable number of alternatives.) If he or she does not put positive probability on $H_0$, then he or she has no reason to be particularly concerned about the possibility of being misled with respect to $H_0$ and some alternative to it.

See Basu (1975, 43-7) for further discussion.

### A Frequentist Take on the Strange Example

The chief difference between frequentist treatments of the Armitage example, on the one hand, and Bayesian and likelihoodist treatments, on the other hand, is that frequentists maintain that the fact that the experiment has a bizarre stopping rule and the fact that the hypothesis $H_{\bar{x}}$ was not designated for consideration independently of the data are relevant to what one can say about $H_{\bar{x}}$ in relation to $H_0$ in light of the experiment’s outcome. Neither of those facts makes a difference to the likelihood function, so neither of them makes a difference to what one can say about $H_{\bar{x}}$ in relation to $H_{0}$ on a likelihoodist or Bayesian approach, or on any other approach that conforms to the Likelihood Principle. However, they do make a difference to long-run error rates with respect to $H_{\bar{X}}$ and $H_{0}$, and thus to what one can say about $H_{\bar{x}}$ in relation to $H_{0}$ on a frequentist approach that is designed to control long-run error rates.

A frequentist would typically refuse to say anything about $H_{\bar{x}}$ in relation to $H_0$ in light of the outcome of an instance of the Armitage experiment. He or she would insist that if one wanted to test $H_0$ against $H_{\bar{x}}$, then one would have to start over with a procedure that controlled long-run error rates with respect to those particular, fixed hypotheses. Some frequentists make some allowances for hypotheses that are not predesignated (e.g. Mayo 1996, Ch. 9), but they would never allow a procedure that has probability one of rejecting $H_0$ even if it is true, such as one that says to reject $H_0$ in favor of $H_{\bar{x}}$ if and only if the likelihood ratio of the latter to the former exceeds some threshold. Violations of predesignation are permitted, if at all, only when the probability of erroneously rejecting the null hypothesis is kept suitably low.

A frequentist could draw conclusions about a fixed pair of hypotheses from an experiment with Armitage’s bizarre stopping rule. He or she would reject a fixed null hypothesis against a fixed alternative if and only if the likelihood ratio of the latter against the former exceeded some constant threshold chosen to keep the probability of rejecting the null hypothesis if it is true acceptably low. The likelihood ratio would depend not only on the observed sample mean, but also on the number of observations. Such a test is sensible from Bayesian and likelihoodist perspectives. In testing one point hypothesis against another, frequentists respect the Likelihood Principle within but not across experiments; they use likelihood-ratio cutoffs in the tests they sanction, but they allow their cutoffs to vary across experiments involving the same hypotheses in the same decision-theoretic context and do not allow any conclusions to be drawn at all when predesignation requirements are grossly violated.

There is something intuitively strange about the idea that facts about stopping rules and predesignation are relevant to what conclusions one would be warranted in drawing from an experimental outcome. It seems natural to think that the degree to which data warrant a conclusion is a relation between the data and the conclusion only. From a frequentist perspective, it also depends on what the intentions of the experimenters were regarding when to end the experiment and which hypotheses to consider. The dependency on stopping rules is particularly strange: it makes the conclusions one may draw from the data depend on counterfactuals about what the experimenters would have done if the data had been different. How could such counterfactuals about the experimenter’s behavior be relevant to the significance of the actual data for the hypotheses in question? (See Mayo 1996, Ch. 10 for a frequentist response to this objection.)

### Conclusion

Some frequentists consider the strange example discussed here to be a counterexample to the Likelihood Principle. However, I have argued that likelihoodist and Bayesian treatments of it are defensible, whereas frequentist violations of the Likelihood Principle are problematic.