
Quantum Times Article on the PBR Theorem

I recently wrote an article (pdf) for The Quantum Times (Newsletter of the APS Topical Group on Quantum Information) about the PBR theorem. There is some overlap with my previous blog post, but the newsletter article focuses more on the implications of the PBR result, rather than the result itself. Therefore, I thought it would be worth reproducing it here. Quantum types should still download the original newsletter, as it contains many other interesting things, including an article by Charlie Bennett on logical depth (which he has also reproduced over at The Quantum Pontiff). APS members should also join the TGQI, and if you are at the March meeting this week, you should check out some of the interesting sessions they have organized.

Note: Due to the appearance of this paper, I would weaken some of the statements in this article if I were writing it again. The results of the paper imply that the factorization assumption is essential to obtain the PBR result, so this is an additional assumption that needs to be made if you want to prove things like Bell’s theorem directly from psi-ontology rather than using the traditional approach. When I wrote the article, I was optimistic that a proof of the PBR theorem that does not require factorization could be found, in which case teaching PBR first and then deriving other results like Bell as a consequence would have been an attractive pedagogical option. However, due to the necessity for stronger assumptions, I no longer think this.

OK, without further ado, here is the article.

PBR, EPR, and all that jazz

In the past couple of months, the quantum foundations world has been abuzz about a new preprint entitled “The Quantum State Cannot be Interpreted Statistically” by Matt Pusey, Jon Barrett and Terry Rudolph (henceforth known as PBR). Since I wrote a blog post explaining the result, I have been inundated with more correspondence from scientists and more requests for comment from science journalists than at any other point in my career. Reaction to the result amongst quantum researchers has been mixed, with many people reacting negatively to the title, which can be misinterpreted as an attack on the Born rule. Others have managed to read past the title, but are still unsure whether to credit the result with any fundamental significance. In this article, I would like to explain why I think that the PBR result is the most significant constraint on hidden variable theories that has been proved to date. It provides a simple proof of many other known theorems, and it supercharges the EPR argument, converting it into a rigorous proof of nonlocality that has the same status as Bell’s theorem. Before getting to this though, we need to understand the PBR result itself.

What are Quantum States?

One of the most debated issues in the foundations of quantum theory is the status of the quantum state. On the ontic view, quantum states represent a real property of quantum systems, somewhat akin to a physical field, albeit one with extremely bizarre properties like entanglement. The alternative to this is the epistemic view, which sees quantum states as states of knowledge, more akin to the probability distributions of statistical mechanics. A psi-ontologist (as supporters of the ontic view have been dubbed by Chris Granade) might point to the phenomenon of interference in support of their view, and also to the fact that pretty much all viable realist interpretations of quantum theory, such as many-worlds or Bohmian mechanics, include an ontic state. The key argument in favor of the epistemic view is that it dissolves the measurement problem, since the fact that states undergo a discontinuous change in the light of measurement results does not then imply the existence of any real physical process. Instead, the collapse of the wavefunction is more akin to the way that classical probability distributions get updated by Bayesian conditioning in the light of new data.

Many people who advocate a psi-epistemic view also adopt an anti-realist or neo-Copenhagen point of view on quantum theory in which the quantum state does not represent knowledge about some underlying reality, but rather it only represents knowledge about the consequences of measurements that we might make on the system. However, there remained the nagging question of whether it is possible in principle to construct a realist interpretation of quantum theory that is also psi-epistemic, or whether the realist is compelled to think that quantum states are real. PBR have answered this question in the negative, at least within the standard framework for hidden variable theories that we use for other no go results such as Bell’s theorem. As with Bell’s theorem, there are loopholes, so it is better to say that PBR have placed a strong constraint on realist psi-epistemic interpretations, rather than ruling them out entirely.

The PBR Result

To properly formulate the result, we need to know a bit about how quantum states are represented in a hidden variable theory. In such a theory, quantum systems are assumed to have real pre-existing properties that are responsible for determining what happens when we make a measurement. A full specification of these properties is what we mean by an ontic state of the system. In general, we don’t have precise control over the ontic state so a quantum state corresponds to a probability distribution over the ontic states. This framework is illustrated below.

Representation of a quantum state in an ontic model

In an ontic model, a quantum state (indicated heuristically on the left as a vector in the Bloch sphere) is represented by a probability distribution over ontic states, as indicated on the right.

A hidden variable theory is psi-ontic if knowing the ontic state of the system allows you to determine the (pure) quantum state that was prepared uniquely. Equivalently, the probability distributions corresponding to two distinct pure states do not overlap. This is illustrated below.

Psi-ontic model

Representation of a pair of quantum states in a psi-ontic model

A hidden variable theory is psi-epistemic if it is not psi-ontic, i.e. there must exist an ontic state that is possible for more than one pure state, or, in other words, there must exist two nonorthogonal pure states with corresponding distributions that overlap. This is illustrated below.

Psi-epistemic model

Representation of nonorthogonal states in a psi-epistemic model

These definitions of psi-ontology and psi-epistemicism may seem a little abstract, so a classical analogy may be helpful. In Newtonian mechanics the ontic state of a particle is a point in phase space, i.e. a specification of its position and momentum. Other ontic properties of the particle, such as its energy, are given by functions of the phase space point, i.e. they are uniquely determined by the ontic state. Likewise, in a hidden variable theory, anything that is a unique function of the ontic state should be regarded as an ontic property of the system, and this applies to the quantum state in a psi-ontic model. The definition of a psi-epistemic model as the negation of this is very weak, e.g. it could still be the case that most ontic states are only possible in one quantum state and just a few are compatible with more than one. Nonetheless, even this very weak notion is ruled out by PBR.

The proof of the PBR result is quite simple, but I will not review it here because it is summarized in my blog post and the original paper is also very readable. Instead, I want to focus on its implications.

Size of the Ontic State Space

A trivial consequence of the PBR result is that the cardinality of the ontic state space of any hidden variable theory, even for just a qubit, must be infinite, in fact continuously so. This is because there must be at least one ontic state for each quantum state, and there is a continuous infinity of the latter. The fact that there must be infinitely many ontic states was previously proved by Lucien Hardy under the name “Ontological Excess Baggage theorem”, but we can now view it as a corollary of PBR. If you think about it, this property is quite surprising because we can only extract one or two bits from a qubit (depending on whether we count superdense coding) so it would be natural to assume that a hidden variable state could be specified by a finite amount of information.

Hidden variable theories provide one possible method of simulating a quantum computer on a classical computer by simply tracking the value of the ontic state at each stage in the computation. This enables us to sample from the probability distribution of any quantum measurement at any point during the computation. Another method is to simply store a representation of the quantum state at each point in time. This second method is clearly inefficient, as the number of parameters required to specify a quantum state grows exponentially with the number of qubits. The PBR theorem tells us that the hidden variable method cannot be any better, as it requires an ontic state space that is at least as big as the set of quantum states. This conclusion was previously drawn by Alberto Montina using different methods, but again it now becomes a corollary of PBR. This result falls short of saying that any classical simulation of a quantum computer must have exponential space complexity, since we usually only have to simulate the outcome of one fixed measurement at the end of the computation and our simulation does not have to track the slice-by-slice causal evolution of the quantum circuit. Indeed, pretty much the first nontrivial result in quantum computational complexity theory, proved by Bernstein and Vazirani, showed that quantum circuits can be simulated with polynomial memory resources. Nevertheless, this result does reaffirm that we need to go beyond slice-by-slice simulations of quantum circuits in looking for efficient classical algorithms.

Supercharged EPR Argument

As emphasized by Harrigan and Spekkens, a variant of the EPR argument favoured by Einstein shows that any psi-ontic hidden variable theory must be nonlocal. Thus, prior to Bell’s theorem, the only open possibility for a local hidden variable theory was a psi-epistemic theory. Of course, Bell’s theorem rules out all local hidden variable theories, regardless of the status of the quantum state within them. Nevertheless, the PBR result now gives an arguably simpler route to the same conclusion by ruling out psi-epistemic theories, allowing us to infer nonlocality directly from EPR.

A sketch of the argument runs as follows. Consider a pair of qubits in the singlet state. When one of the qubits is measured in an orthonormal basis, the other qubit collapses to one of two orthogonal pure states. By varying the basis that the first qubit is measured in, the second qubit can be made to collapse in any basis we like (a phenomenon that Schrödinger called “steering”). If we restrict attention to two possible choices of measurement basis, then there are four possible pure states that the second qubit might end up in. The PBR result implies that the sets of possible ontic states for the second system for each of these pure states must be disjoint. Consequently, the sets of possible ontic states corresponding to the two distinct choices of basis are also disjoint. Thus, the ontic state of the second system must depend on the choice of measurement made on the first system and this implies nonlocality because I can decide which measurement to perform on the first system at spacelike separation from the second.
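The steering step can be checked numerically. The following is a minimal sketch, assuming the two measurement choices are the Z and X bases (the article does not fix a particular pair); the names `steer` and `overlap` are illustrative helpers, not anything from the PBR paper.

```python
import math

SQRT_HALF = 1 / math.sqrt(2)

# Singlet state (|01> - |10>)/sqrt(2), stored as amplitudes psi[a][b]
# for first-qubit value a and second-qubit value b.
SINGLET = [[0.0, SQRT_HALF], [-SQRT_HALF, 0.0]]

# Assumed measurement choices on the first qubit: each row is a basis vector.
Z_BASIS = [[1.0, 0.0], [0.0, 1.0]]
X_BASIS = [[SQRT_HALF, SQRT_HALF], [SQRT_HALF, -SQRT_HALF]]


def steer(psi, basis):
    """Return (probability, collapsed second-qubit state) for each outcome."""
    results = []
    for e in basis:
        # Project the first qubit onto <e| and read off the second qubit.
        unnormalized = [
            sum(complex(e[a]).conjugate() * psi[a][b] for a in range(2))
            for b in range(2)
        ]
        prob = sum(abs(c) ** 2 for c in unnormalized)
        state = [c / math.sqrt(prob) for c in unnormalized]
        results.append((prob, state))
    return results


def overlap(u, v):
    """Squared inner product |<u|v>|^2 of two single-qubit states."""
    return abs(sum(complex(a).conjugate() * b for a, b in zip(u, v))) ** 2


z_states = [state for _, state in steer(SINGLET, Z_BASIS)]
x_states = [state for _, state in steer(SINGLET, X_BASIS)]
```

Within each basis the two collapsed states are orthogonal, while any state from one basis has overlap 1/2 with any state from the other. These are the four distinct pure states to which the disjointness argument is applied.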

PBR as a proto-theorem

We have seen that the PBR result can be used to establish some known constraints on hidden variable theories in a very straightforward way. There is more to this story than I can possibly fit into this article, and I suspect that every major no-go result for hidden variable theories may fall under the rubric of PBR. Thus, even if you don’t care a fig about fancy distinctions between ontic and epistemic states, it is still worth devoting a few braincells to the PBR result. I predict that it will become viewed as the basic result about hidden variable theories, and that we will end up teaching it to our students even before such stalwarts as Bell’s theorem and Kochen-Specker.


Can the quantum state be interpreted statistically?

A new preprint entitled The Quantum State Cannot be Interpreted Statistically by Pusey, Barrett and Rudolph (henceforth known as PBR) has been generating a significant amount of buzz in the last couple of days. Nature posted an article about it on their website, Scott Aaronson and Lubos Motl blogged about it, and I have been seeing a lot of commentary about it on Twitter and Google+. In this post, I am going to explain the background to this theorem and outline exactly what it entails for the interpretation of the quantum state. I am not going to explain the technicalities in great detail, since these are explained very clearly in the paper itself. The main aim is to clear up misconceptions.

First up, I would like to say that I find the use of the word “Statistically” in the title to be a rather unfortunate choice. It is liable to make people think that the authors are arguing against the Born rule (Lubos Motl has fallen into this trap in particular), whereas in fact the opposite is true. The result is all about reproducing the Born rule within a realist theory. The question is whether a scientific realist can interpret the quantum state as an epistemic state (state of knowledge) or whether it must be an ontic state (state of reality). It seems to show that only the ontic interpretation is viable, but, in my view, this is a bit too quick. On careful analysis, it does not really rule out any of the positions that are advocated by contemporary researchers in quantum foundations. However, it does answer an important question that was previously open, and confirms an intuition that many of us already held. Before going into more detail, I also want to say that I regard this as the most important result in quantum foundations in the past couple of years, well deserving of a good amount of hype if anything is. I am not sure I would go as far as Antony Valentini, who is quoted in the Nature article saying that it is the most important result since Bell’s theorem, or David Wallace, who says that it is the most significant result he has seen in his career. Of course, these two are likely to be very happy about the result, since they already subscribe to interpretations of quantum theory in which the quantum state is ontic (de Broglie-Bohm theory and many-worlds respectively) and perhaps they believe that it poses more of a dilemma for epistemicists like myself than it actually does.

Classical Ontic States

Before explaining the result itself, it is important to be clear on what this epistemic/ontic state business is all about and why it matters. It is easiest to introduce the distinction via a classical example, for which the interpretation of states is clear. Therefore, consider the Newtonian dynamics of a single point particle in one dimension. The trajectory of the particle can be determined by specifying initial conditions, which in this case consist of a position \(x(t_0)\) and momentum \(p(t_0)\) at some initial time \(t_0\). These specify a point in the particle’s phase space, which consists of all possible pairs \((x,p)\) of positions and momenta.

Classical Ontic State

The ontic state space for a single classical particle, with the initial ontic state marked.

Then, assuming we know all the relevant forces, we can compute the position and momentum \((x(t),p(t))\) at some other time \(t\) using Newton’s laws or, equivalently, Hamilton’s equations. At any time \(t\), the phase space point \((x(t),p(t))\) can be thought of as the instantaneous state of the particle. It is clearly an ontic state (state of reality), since the particle either does or does not possess that particular position and momentum, independently of whether we know that it possesses those values[1]. The same goes for more complicated systems, such as multiparticle systems and fields. In all cases, I can derive a phase space consisting of configurations and generalized momenta. This is the space of ontic states for any classical system.
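To make the idea of an ontic trajectory concrete, here is a small sketch that integrates Hamilton’s equations for a single particle. The harmonic spring force, the unit mass and the leapfrog integrator are all assumptions chosen for illustration; any force law would do.

```python
import math


def evolve(x, p, force, mass, dt, steps):
    """Advance the ontic state (x, p) in time with the leapfrog scheme."""
    for _ in range(steps):
        p += 0.5 * dt * force(x)   # half kick: dp/dt = -dH/dx
        x += dt * p / mass         # drift:     dx/dt =  dH/dp
        p += 0.5 * dt * force(x)   # half kick
    return x, p


def spring_force(x, k=1.0):
    """Hypothetical force law: a harmonic spring with constant k."""
    return -k * x


def energy(x, p, mass=1.0, k=1.0):
    """An ontic property: a function of the phase space point."""
    return p * p / (2 * mass) + 0.5 * k * x * x


x0, p0 = 1.0, 0.0                                   # initial ontic state
x1, p1 = evolve(x0, p0, spring_force, mass=1.0, dt=0.001, steps=1000)

# Exact solution at t = 1 for unit mass and unit spring constant.
exact_x, exact_p = math.cos(1.0), -math.sin(1.0)
```

The point is only that the pair `(x1, p1)` is fixed by `(x0, p0)` and the forces, whether or not anyone knows its value, and that quantities such as `energy` are functions of that pair.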

Classical Epistemic States

Although the description of classical mechanics in terms of ontic phase space trajectories is clear and unambiguous, we are often, indeed usually, more interested in tracking what we know about a system. For example, in statistical mechanics, we may only know some macroscopic properties of a large collection of systems, such as pressure or temperature. We are interested in how these quantities change over time, and there are many different possible microscopic trajectories that are compatible with this. Generally speaking, our knowledge about a classical system is determined by assigning a probability distribution over phase space, which represents our uncertainty about the actual point occupied by the system.

A classical epistemic state

An epistemic state of a single classical particle. The ellipses represent contour lines of constant probability.

We can track how this probability distribution changes using Liouville’s equation, which is derived by applying Hamilton’s equations weighted with the probability assigned to each phase space point. The probability distribution is pretty clearly an epistemic state. The actual system only occupies one phase space point and does not care what probability we have assigned to it. Crucially, the ontic state occupied by the system would be regarded as possible by us in more than one probability distribution, in fact it is compatible with infinitely many.

Overlapping epistemic states

Epistemic states can overlap, so each ontic state is possible in more than one epistemic state. In this diagram, the two phase space axes have been schematically compressed into one, so that we can sketch the probability density graphs of epistemic states. The ontic state marked with a cross is possible in both epistemic states sketched on the graph.

Quantum States

We have seen that there are two clear notions of state in classical mechanics: ontic states (phase space points) and epistemic states (probability distributions over the ontic states). In quantum theory, we have a different notion of state — the wavefunction — and the question is: should we think of it as an ontic state (more like a phase space point), an epistemic state (more like a probability distribution), or something else entirely?

Here are three possible answers to this question:

  1. Wavefunctions are epistemic and there is some underlying ontic state. Quantum mechanics is the statistical theory of these ontic states in analogy with Liouville mechanics.
  2. Wavefunctions are epistemic, but there is no deeper underlying reality.
  3. Wavefunctions are ontic (there may also be additional ontic degrees of freedom, which is an important distinction but not relevant to the present discussion).

I will call options 1 and 2 psi-epistemic and option 3 psi-ontic. Advocates of option 3 are called psi-ontologists, in an intentional pun coined by Chris Granade. Options 1 and 3 share a conviction of scientific realism, which is the idea that there must be some description of what is going on in reality that is independent of our knowledge of it. Option 2 is broadly anti-realist, although there can be some subtleties here[2].

The theorem in the paper attempts to rule out option 1, which would mean that scientific realists should become psi-ontologists. I am pretty sure that no theorem on Earth could rule out option 2, so that is always a refuge for psi-epistemicists, at least if their psi-epistemic conviction is stronger than their realist one.

I would classify the Copenhagen interpretation, as represented by Niels Bohr[3], under option 2. One of his famous quotes is:

There is no quantum world. There is only an abstract physical description. It is wrong to think that the task of physics is to find out how nature is. Physics concerns what we can say about nature…[4]

and “what we can say” certainly seems to imply that we are talking about our knowledge of reality rather than reality itself. Various contemporary neo-Copenhagen approaches also fall under this option, e.g. the Quantum Bayesianism of Carlton Caves, Chris Fuchs and Ruediger Schack; Anton Zeilinger’s idea that quantum physics is only about information; and the view presently advocated by the philosopher Jeff Bub. These views are safe from refutation by the PBR theorem, although one may debate whether they are desirable on other grounds, e.g. the accusation of instrumentalism.

Pretty much all of the well-developed interpretations that take a realist stance fall under option 3, so they are in the psi-ontic camp. This includes the Everett/many-worlds interpretation, de Broglie-Bohm theory, and spontaneous collapse models. Advocates of these approaches are likely to rejoice at the PBR result, as it apparently rules out their only realist competition, and they are unlikely to regard anti-realist approaches as viable.

Perhaps the best known contemporary advocate of option 1 is Rob Spekkens, but I also include myself and Terry Rudolph (one of the authors of the paper) in this camp. Rob gives a fairly convincing argument that option 1 characterizes Einstein’s views in this paper, which also gives a lot of technical background on the distinction between options 1 and 2.

Why be a psi-epistemicist?

Why should the epistemic view of the quantum state be taken seriously in the first place, at least seriously enough to prove a theorem about it? The most naive argument is that, generically, quantum states only predict probabilities for observables rather than definite values. In this sense, they are unlike classical phase space points, which determine the values of all observables uniquely. However, this argument is not compelling because determinism is not the real issue here. We can allow there to be some genuine stochasticity in nature whilst still maintaining realism.

An argument that I personally find motivating is that quantum theory can be viewed as a noncommutative generalization of classical probability theory, as was first pointed out by von Neumann. My own exposition of this idea is contained in this paper. Even if we don’t always realize it, we are always using this idea whenever we generalize a result from classical to quantum information theory. The idea is so useful, i.e. it has such great explanatory power, that it would be very puzzling if it were a mere accident, but it does appear to be just an accident in most psi-ontic interpretations of quantum theory.  For example, try to think about why quantum theory should be formally a generalization of probability theory from a many-worlds point of view.  Nevertheless, this argument may not be compelling to everyone, since it mainly entails that mixed states have to be epistemic. Classically, the pure states are the extremal probability distributions, i.e. they are just delta functions on a single ontic state. Thus, they are in one-to-one correspondence with the ontic states. The same could be true of pure quantum states without ruining the analogy[5].

A more convincing argument concerns the instantaneous change that occurs after a measurement — the collapse of the wavefunction. When we acquire new information about a classical epistemic state (probability distribution) say by measuring the position of a particle, it also undergoes an instantaneous change. All the weight we assigned to phase space points that have positions that differ from the measured value is rescaled to zero and the rest of the probability distribution is renormalized. This is just Bayesian conditioning. It represents a change in our knowledge about the system, but no change to the system itself. It is still occupying the same phase space point as it was before, so there is no change to the ontic state of the system. If the quantum state is epistemic, then instantaneous changes upon measurement are unproblematic, having a similar status to Bayesian conditioning. Therefore, the measurement problem is completely dissolved within this approach.
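As a concrete illustration of Bayesian conditioning, here is a minimal sketch on a discretized phase space. The four-cell grid and its prior weights are made-up numbers, and `condition_on_position` is a hypothetical helper name.

```python
def condition_on_position(prior, measured_x):
    """Condition a distribution over (x, p) cells on the event x == measured_x."""
    # Zero out every phase space point whose position disagrees with the data...
    kept = {(x, p): w for (x, p), w in prior.items() if x == measured_x}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("measured position had zero prior probability")
    # ...and renormalize what is left.
    return {point: w / total for point, w in kept.items()}


# Hypothetical prior over four phase space cells, keyed by (x, p).
prior = {(0, -1): 0.1, (0, 1): 0.3, (1, -1): 0.2, (1, 1): 0.4}

ontic_state = (0, 1)   # the point the particle actually occupies
posterior = condition_on_position(prior, measured_x=ontic_state[0])
```

The distribution changes discontinuously, but `ontic_state` is never touched by the update. Only our description of it changes.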

Finally, if we allow a more sophisticated analogy between quantum states and probabilities, in particular by allowing constraints on how much may be known and allowing measurements to locally disturb the ontic state, then we can qualitatively explain a large number of phenomena that are puzzling for a psi-ontologist very simply within a psi-epistemic approach. These include: teleportation, superdense coding, and much of the rest of quantum information theory. Crucially, it also includes interference, which is often held as a convincing reason for psi-ontology. This was demonstrated in a very convincing way by Rob Spekkens via a toy theory, which is recommended reading for all those interested in quantum foundations. In fact, since this paper contains the most compelling reasons for being a psi-epistemicist, you should definitely make sure you read it so that you can be more shocked by the PBR result.

Ontic models

If we accept that the psi-epistemic position is reasonable, then it would be superficially reasonable to pick option 1 and try to maintain scientific realism. This leads us into the realm of ontic models for quantum theory, otherwise known as hidden variable theories[6]. A pretty standard framework for discussing such models has existed since John Bell’s work in the 1960’s, and almost everyone adopts the same definitions that were laid down then. The basic idea is that systems have properties. There is some space \(\Lambda\) of ontic states, analogous to the phase space of a classical theory, and the system has a value \(\lambda \in \Lambda\) that specifies all its properties, analogous to the phase space points. When we prepare a system in some quantum state \(\Ket{\psi}\) in the lab, what is really happening is that an ontic state \(\lambda\) is sampled from a probability distribution \(\mu(\lambda)\) over \(\Lambda\) that depends on \(\Ket{\psi}\).

Representation of a quantum state in an ontic model

In an ontic model, a quantum state (indicated heuristically on the left as a vector in the Bloch sphere) is represented by a probability distribution over ontic states, as indicated on the right.

We also need to know how to represent measurements in the model[7]. For each possible measurement that we could make on the system, the model must specify the outcome probabilities for each possible ontic state. Note that we are not assuming determinism here. The measurement is allowed to be stochastic even given a full specification of the ontic state. Thus, for each measurement \(M\), we need a set of functions \(\xi^M_k(\lambda)\), where \(k\) labels the outcome. \(\xi^M_k(\lambda)\) is the probability of obtaining outcome \(k\) in a measurement of \(M\) when the ontic state is \(\lambda\). In order for these probabilities to be well defined the functions \(\xi^M_k\) must be nonnegative and they must satisfy \(\sum_k \xi^M_k(\lambda) = 1\) for all \(\lambda \in \Lambda\). This normalization condition is very important in the proof of the PBR theorem, so please memorize it now.

Overall, the probability of obtaining outcome \(k\) in a measurement of \(M\) when the system is prepared in state \(\Ket{\psi}\) is given by

\[\mbox{Prob}(k|M,\Ket{\psi}) = \int_{\Lambda} \xi^M_k(\lambda) \mu(\lambda) d\lambda, \]
which is just the average of the outcome probabilities over the ontic state space.
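For a finite ontic state space the integral becomes a sum, which is easy to write down in code. The sketch below assumes a hypothetical three-point ontic space with made-up numbers; it is meant only to show the bookkeeping (nonnegativity, normalization at every \(\lambda\), and the average over \(\mu\)), not a model that reproduces quantum theory.

```python
def outcome_probabilities(mu, xi):
    """Average the response functions xi[k][lam] over the distribution mu[lam]."""
    n = len(mu)
    if any(m < 0 for m in mu) or abs(sum(mu) - 1.0) > 1e-9:
        raise ValueError("mu must be a probability distribution")
    for lam in range(n):
        column = [row[lam] for row in xi]
        # Normalization condition: sum_k xi_k(lambda) = 1 for every lambda.
        if any(v < 0 for v in column) or abs(sum(column) - 1.0) > 1e-9:
            raise ValueError(f"response functions not normalized at lambda={lam}")
    return [sum(row[lam] * mu[lam] for lam in range(n)) for row in xi]


# Hypothetical three-point ontic space and a two-outcome measurement.
mu = [0.5, 0.5, 0.0]
xi = [
    [1.0, 0.5, 0.0],   # xi_1(lambda)
    [0.0, 0.5, 1.0],   # xi_2(lambda)
]
probs = outcome_probabilities(mu, xi)
```

Note that the measurement is stochastic at the middle ontic state, where both outcomes have probability 1/2, so nothing here presupposes determinism.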

If the model is going to reproduce the predictions of quantum theory, then these probabilities must match the Born rule.  Suppose that the \(k\)th outcome of \(M\) corresponds to the projector \(P_k\).  Then, this condition boils down to

\[\Bra{\psi} P_k \Ket{\psi} = \int_{\Lambda} \xi^M_k(\lambda) \mu(\lambda) d\lambda,\]

and this must hold for all quantum states, and all outcomes of all possible measurements.

Constraints on Ontic Models

Even disregarding the PBR paper, we already know that ontic models expressible in this framework have to have a number of undesirable properties. Bell’s theorem implies that they have to be nonlocal, which is not great if we want to maintain Lorentz invariance, and the Kochen-Specker theorem implies that they have to be contextual. Further, Lucien Hardy’s ontological excess baggage theorem shows that the ontic state space for even a qubit would have to have infinite cardinality. Following this, Montina proved a series of results, which culminated in the claim that there would have to be an object satisfying the Schrödinger equation present within the ontic state (see this paper). This latter result is close to the implication of the PBR theorem itself.

Given these constraints, it is perhaps not surprising that most psi-epistemicists have already opted for option 2, denouncing scientific realism entirely. Those of us who cling to realism have mostly decided that the ontic state must be a different type of object than it is in the framework described above. We could discard the idea that individual systems have well-defined properties, or the idea that the probabilities that we assign to those properties should depend only on the quantum state. Spekkens advocates the first possibility, arguing that only relational properties are ontic. On the other hand, I, following Huw Price, am partial to the idea of epistemic hidden variable theories with retrocausal influences, in which case the probability distributions over ontic states would depend on measurement choices as well as which quantum state is prepared. Neither of these possibilities is ruled out by the previous results, and they are not ruled out by PBR either. This is why I say that their result does not rule out any position that is seriously held by any researchers in quantum foundations. Nevertheless, until the PBR paper, there remained the question of whether a conventional psi-epistemic model was possible even in principle. Such a theory could at least have been a competitor to Bohmian mechanics. This possibility has now been ruled out fairly convincingly, and so we now turn to the basic idea of their result.

The Result

Recall from our classical example that each ontic state (phase space point) occurs in the support of more than one epistemic state (Liouville distribution), in fact infinitely many. This is just because probability distributions can have overlapping support. Now, consider what would happen if we restricted the theory to only allow epistemic states with disjoint support. For example, we could partition phase space into a number of disjoint cells and only consider probability distributions that are uniform over one cell and zero everywhere else.

Restricted classical theory

A restricted classical theory in which only the distributions indicated are allowed as epistemic states. In this case, each ontic state is only possible in one epistemic state, so it is more accurate to say that the epistemic states represent a property of the ontic state.

Given this restriction, the ontic state determines the epistemic state uniquely. If someone tells you the ontic state, then you know which cell it is in, so you know what the epistemic state must be. Therefore, in this restricted theory, the epistemic state is not really epistemic. Its image is contained in the ontic state, and it would be better to say that we were talking about a property of the ontic state, rather than something that represents knowledge. According to the PBR result, this is exactly what must happen in any ontic model of quantum theory within the Bell framework.
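The restricted theory is simple enough to write down directly. In this sketch the phase space is compressed to one axis and the cell boundaries in `CELL_EDGES` are arbitrary made-up values; the function shows that the ontic state alone fixes the epistemic state.

```python
from bisect import bisect_right

# Hypothetical partition into cells [0, 1), [1, 2.5), [2.5, 4).
CELL_EDGES = [0.0, 1.0, 2.5, 4.0]


def epistemic_state_of(ontic_state):
    """Return the unique allowed distribution whose support contains ontic_state."""
    if not CELL_EDGES[0] <= ontic_state < CELL_EDGES[-1]:
        raise ValueError("ontic state lies outside the partitioned region")
    i = bisect_right(CELL_EDGES, ontic_state) - 1
    lo, hi = CELL_EDGES[i], CELL_EDGES[i + 1]
    # Uniform over one cell and zero everywhere else.
    return {"cell": i, "support": (lo, hi), "density": 1.0 / (hi - lo)}
```

Because `epistemic_state_of` is an ordinary function of the ontic state, the "epistemic state" it returns is on the same footing as the energy of a classical particle: a property of the system rather than a record of anyone's knowledge.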

Here is the analog of this in ontic models of quantum theory.  Suppose that two nonorthogonal quantum states \(\Ket{\psi_1}\) and \(\Ket{\psi_2}\) are represented as follows in an ontic model:

Figure: Representation of nonorthogonal states in a psi-epistemic model.

Because the distributions overlap, there are ontic states that are compatible with more than one quantum state, so this is a psi-epistemic model.

In contrast, if, for every pair of quantum states \(\Ket{\psi_1},\Ket{\psi_2}\), the probability distributions do not overlap, i.e. the representation of each pair looks like this

Figure: Representation of a pair of quantum states in a psi-ontic model.

then the quantum state is uniquely determined by the ontic state, and it is therefore better regarded as a property of \(\lambda\) rather than a representation of knowledge.  Such a model is psi-ontic.  The PBR theorem states that all ontic models that reproduce the Born rule must be psi-ontic.

Sketch of the proof

In order to establish the result, PBR make use of the following idea. In an ontic model, the ontic state \(\lambda\) determines the probabilities for the outcomes of any possible measurement via the functions \(\xi^M_k\). The Born rule probabilities must be obtained by averaging these conditional probabilities with respect to the probability distribution \(\mu(\lambda)\) representing the quantum state. Suppose there is some measurement \(M\) that has an outcome \(k\) to which the quantum state \(\Ket{\psi}\) assigns probability zero according to the Born rule. Then, it must be the case that \(\xi^M_k(\lambda) = 0\) for every \(\lambda\) in the support of \(\mu(\lambda)\). Now consider two quantum states \(\Ket{\psi_1}\) and \(\Ket{\psi_2}\) and suppose that we can find a two-outcome measurement such that the first state gives zero Born rule probability to the first outcome and the second state gives zero Born rule probability to the second outcome. Suppose also that there is some \(\lambda\) that is in the support of both the distributions, \(\mu_1\) and \(\mu_2\), that represent \(\Ket{\psi_1}\) and \(\Ket{\psi_2}\) in the ontic model. Then, we must have \(\xi^M_1(\lambda) = \xi^M_2(\lambda) = 0\), which contradicts the normalization assumption \(\xi^M_1(\lambda) + \xi^M_2(\lambda) = 1\).
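To spell out the step from a zero Born rule probability to \(\xi^M_k(\lambda) = 0\), here is the averaging condition written as an integral. The symbol \(\Lambda\) for the space of ontic states is notation assumed here for illustration, and "for every \(\lambda\)" should strictly be read as "for almost every \(\lambda\)".
\[\int_{\Lambda} \xi^M_k(\lambda) \mu(\lambda) \, d\lambda = 0, \qquad \xi^M_k(\lambda) \geq 0, \qquad \mu(\lambda) \geq 0.\]
The integral of a nonnegative function can only vanish if the integrand vanishes almost everywhere, so \(\xi^M_k(\lambda)\mu(\lambda) = 0\), and hence \(\xi^M_k(\lambda) = 0\) wherever \(\mu(\lambda) > 0\).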

Now, it is fairly easy to see that there is no such measurement for a pair of nonorthogonal states, because this would mean that they could be distinguished with certainty, so we do not have a result quite yet. The trick to get around this is to consider multiple copies. Consider then, the four states \(\Ket{\psi_1}\otimes\Ket{\psi_1}, \Ket{\psi_1}\otimes\Ket{\psi_2}, \Ket{\psi_2}\otimes\Ket{\psi_1}\) and \(\Ket{\psi_2}\otimes\Ket{\psi_2}\) and suppose that there is a four-outcome measurement such that \(\Ket{\psi_1}\otimes\Ket{\psi_1}\) gives zero probability to the first outcome, \(\Ket{\psi_1}\otimes\Ket{\psi_2}\) gives zero probability to the second outcome, and so on. In addition to this, we make an independence assumption that the probability distributions representing these four states must satisfy. Let \(\lambda\) be the ontic state of the first system and let \(\lambda'\) be the ontic state of the second. The independence assumption states that the probability densities representing the four quantum states in the ontic model are \(\mu_1(\lambda)\mu_1(\lambda'), \mu_1(\lambda)\mu_2(\lambda'), \mu_2(\lambda)\mu_1(\lambda')\) and \(\mu_2(\lambda)\mu_2(\lambda')\). This is a reasonable assumption because there is no entanglement between the two systems and we could do completely independent experiments on each of them. Assuming there is an ontic state \(\lambda\) in the support of both \(\mu_1\) and \(\mu_2\), there will be some nonzero probability that both systems occupy this ontic state whenever any of the four states is prepared. But, in this case, all four functions \(\xi^M_1,\xi^M_2,\xi^M_3\) and \(\xi^M_4\) must have value zero when both systems are in this state, which contradicts the normalization \(\sum_k \xi^M_k = 1\).

This argument works for the pair of states \(\Ket{\psi_1} = \Ket{0}\) and \(\Ket{\psi_2} = \Ket{+} = \frac{1}{\sqrt{2}} \left ( \Ket{0} + \Ket{1}\right )\). In this case, the four outcome measurement is a measurement in the basis:

\[\Ket{\phi_1} = \frac{1}{\sqrt{2}} \left ( \Ket{0}\otimes\Ket{1} + \Ket{1} \otimes \Ket{0} \right )\]
\[\Ket{\phi_2} = \frac{1}{\sqrt{2}} \left ( \Ket{0}\otimes\Ket{-} + \Ket{1} \otimes \Ket{+} \right )\]
\[\Ket{\phi_3} = \frac{1}{\sqrt{2}} \left ( \Ket{+}\otimes\Ket{1} + \Ket{-} \otimes \Ket{0} \right )\]
\[\Ket{\phi_4} = \frac{1}{\sqrt{2}} \left ( \Ket{+}\otimes\Ket{-} + \Ket{-} \otimes \Ket{+} \right ),\]

where \(\Ket{-} = \frac{1}{\sqrt{2}} \left ( \Ket{0} - \Ket{1}\right )\). It is easy to check that \(\Ket{\phi_1}\) is orthogonal to \(\Ket{0}\otimes\Ket{0}\), \(\Ket{\phi_2}\) is orthogonal to \(\Ket{0}\otimes\Ket{+}\), \(\Ket{\phi_3}\) is orthogonal to \(\Ket{+}\otimes\Ket{0}\), and \(\Ket{\phi_4}\) is orthogonal to \(\Ket{+}\otimes\Ket{+}\). Therefore, the argument applies and there can be no overlap in the probability distributions representing \(\Ket{0}\) and \(\Ket{+}\) in the model.
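For readers who would rather let a computer do the "easy to check" part, here is a minimal sketch in plain Python. It builds the four vectors \(\Ket{\phi_k}\) and the four product preparations as lists of real amplitudes, then checks that the \(\Ket{\phi_k}\) form an orthonormal basis and that the \(k\)th preparation gives zero probability to the \(k\)th outcome. The helper names (`kron`, `superpose`, `inner`) are made up for this illustration.

```python
from math import sqrt, isclose

def kron(a, b):
    """Tensor product of two state vectors given as lists of real amplitudes."""
    return [x * y for x in a for y in b]

def superpose(a, b):
    """Return the vector (a + b) / sqrt(2)."""
    return [(x + y) / sqrt(2) for x, y in zip(a, b)]

def inner(a, b):
    """Inner product; all amplitudes here are real, so no conjugation is needed."""
    return sum(x * y for x, y in zip(a, b))

s = 1 / sqrt(2)
zero, one = [1.0, 0.0], [0.0, 1.0]
plus, minus = [s, s], [s, -s]

# The four measurement basis vectors |phi_1>, ..., |phi_4> from the text.
phi = [
    superpose(kron(zero, one), kron(one, zero)),
    superpose(kron(zero, minus), kron(one, plus)),
    superpose(kron(plus, one), kron(minus, zero)),
    superpose(kron(plus, minus), kron(minus, plus)),
]
# The four preparations |0>|0>, |0>|+>, |+>|0>, |+>|+>, in the same order.
preparations = [kron(zero, zero), kron(zero, plus), kron(plus, zero), kron(plus, plus)]

# Gram matrix of the phi vectors: the 4x4 identity if they are orthonormal.
gram = [[inner(a, b) for b in phi] for a in phi]
# Born rule probability that preparation k yields outcome k.
excluded = [inner(p, q) ** 2 for p, q in zip(phi, preparations)]

print(gram)
print(excluded)
```

Up to floating point rounding, the Gram matrix is the identity and every entry of `excluded` is zero, which is exactly the property the argument needs.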

To establish psi-ontology, we need a similar argument for every pair of states \(\Ket{\psi_1}\) and \(\Ket{\psi_2}\). PBR establish that such an argument can always be made, but the general case is more complicated and requires more than two copies of the system. I refer you to the paper for details where it is explained very clearly.


The PBR theorem rules out psi-epistemic models within the standard Bell framework for ontological models. The remaining options are to adopt psi-ontology, remain psi-epistemic and abandon realism, or remain psi-epistemic and abandon the Bell framework. One of the things that a good interpretation of a physical theory should have is explanatory power. For me, the epistemic view of quantum states is so explanatory that it is worth trying to preserve it. Realism too is something that we should not abandon too hastily. Therefore, it seems to me that we should be questioning the assumptions of the Bell framework by allowing more general ontologies, perhaps involving relational or retrocausal degrees of freedom. At the very least, this option is the path less travelled, so we might learn something by exploring it more thoroughly.

  1. There are actually subtleties about whether we should think of phase space points as instantaneous ontic states. For one thing, the momentum depends on the first derivative of position, so maybe we should really think of the state being defined on an infinitesimal time interval. Secondly, the fact that momentum appears is because Newtonian mechanics is defined by second order differential equations. If it were higher order then we would have to include variables depending on higher derivatives in our definition of phase space. This is bad if you believe in a clean separation between basic ontology and physical laws. To avoid this, one could define the ontic state to be the position only, i.e. a point in configuration space, and have the boundary conditions specified by the position of the particle at two different times. Alternatively, one might regard the entire spacetime trajectory of the particle as the ontic state, and regard the Newtonian laws themselves as a mere pattern in the space of possible trajectories. Of course, all these descriptions are mathematically equivalent, but they are conceptually quite different and they lead to different intuitions as to how we should understand the concept of state in quantum theory. For present purposes, I will ignore these subtleties and follow the usual practice of regarding phase space points as the unambiguous ontic states of classical mechanics.
  2. The subtlety is basically a person called Chris Fuchs. He is clearly in the option 2 camp, but claims to be a scientific realist. Whether he is successful at maintaining realism is a matter of debate.
  3. Note, this is distinct from the orthodox interpretation as represented by the textbooks of Dirac and von Neumann, which is also sometimes called the Copenhagen interpretation. Orthodoxy accepts the eigenvalue-eigenstate link. Observables can sometimes have definite values, in which case they are objective properties of the system. A system has such a property when it is in an eigenstate of the corresponding observable. Since every wavefunction is an eigenstate of some observable, it follows that this is a psi-ontic view, albeit one in which there are no additional ontic degrees of freedom beyond the quantum state.
  4. Sourced from Wikiquote.
  5. But note that the resulting theory would essentially be the orthodox interpretation, which has a measurement problem.
  6. The terminology “ontic model” is preferred to “hidden variable theory” for two reasons. Firstly, we do not want to exclude the case where the wavefunction is ontic, but there are no extra degrees of freedom (as in the orthodox interpretation). Secondly, it is often the case that the “hidden” variables are the ones that we actually observe rather than the wavefunction, e.g. in Bohmian mechanics the particle positions are not “hidden”.
  7. Generally, we would need to represent dynamics as well, but the PBR theorem does not depend on this.

The Choi-Jamiolkowski Isomorphism: You’re Doing It Wrong!

As the dear departed Quantum Pontiff used to say: New Paper Dance! I am pretty happy that this one has finally been posted because it is my first arXiv paper since I returned to work, and also because it has gone through more rewrites than Spiderman: The Musical.

What is the paper about, I hear you ask? Well, mathematically, it is about an extremely simple linear algebra trick called the Choi-Jamiolkowski isomorphism. This is actually two different results: the Choi isomorphism and the Jamiolkowski isomorphism, but people have a habit of lumping them together. This trick is so extremely well-known to quantum information theorists that it is not even funny. One of the main points of the paper is that you should think about what the isomorphism means physically in a new way. Hence the “you’re doing it wrong” in the post title.

First Level Isomorphisms

For the uninitiated, here is the simplest way of describing the Choi isomorphism in a single equation:
\[\Ket{j}\Bra{k} \qquad \qquad \equiv \qquad \qquad \Ket{j} \otimes \Ket{k},\]
i.e. the isomorphism works by turning a bra into a ket. The thing on the left is an operator on a Hilbert space \(\mathcal{H}\) and the thing on the right is a vector in \(\mathcal{H} \otimes \mathcal{H}\), so the isomorphism says that \(\mathcal{L}(\mathcal{H}) \equiv \mathcal{H} \otimes \mathcal{H}\), where \(\mathcal{L}(\mathcal{H})\) is the space of linear operators on \(\mathcal{H}\).

Here is how it works in general. If you have an operator \(U\) then you can pick a basis for \(\mathcal{H}\) and write \(U\) in this basis as
\[U = \sum_{j,k} U_{j,k} \Ket{j}\Bra{k},\]
where \(U_{j,k} = \Bra{j}U\Ket{k}\). Then you just extend the above construction by linearity and write down a vector
\[\Ket{\Phi_U} = \sum_{j,k} U_{j,k} \Ket{j} \otimes \Ket{k}.\]
It is pretty obvious that we can go in the other direction as well: starting with a vector in \(\mathcal{H}\otimes\mathcal{H}\), we can write it out in a product basis, turn the second ket into a bra, and then we have an operator.
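The two directions can be sketched in a few lines of plain Python, representing an operator as a nested list of matrix elements \(U_{j,k}\) and a vector on the tensor product as a flat list in which position \(jd + k\) holds the amplitude of \(\Ket{j} \otimes \Ket{k}\). The function names and the example unitary are made up for this illustration. The sketch also checks that, for a unitary, tracing out the second factor leaves the identity, i.e. the vector is maximally entangled up to normalization.

```python
from math import sqrt

def op_to_vec(U):
    """Send U = sum_jk U[j][k] |j><k| to the amplitudes of sum_jk U[j][k] |j>|k>."""
    d = len(U)
    return [U[j][k] for j in range(d) for k in range(d)]

def vec_to_op(v):
    """Reverse direction: read the amplitude of |j>|k> as the matrix element <j|U|k>."""
    d = round(sqrt(len(v)))
    return [[v[j * d + k] for k in range(d)] for j in range(d)]

def reduced_first(v):
    """Partial trace of |v><v| over the second tensor factor."""
    d = round(sqrt(len(v)))
    return [[sum(v[j * d + k] * complex(v[l * d + k]).conjugate() for k in range(d))
             for l in range(d)] for j in range(d)]

s = 1 / sqrt(2)
U = [[s, s], [1j * s, -1j * s]]  # a hypothetical example unitary

phi_U = op_to_vec(U)             # the vector |Phi_U>
round_trip = vec_to_op(phi_U)    # back to the operator
rho_first = reduced_first(phi_U) # equals U U^dagger, the identity for unitary U

print(round_trip == U)
print(rho_first)
```

Note that the flattening order is where the basis dependence of the Choi isomorphism is hiding: a different product basis would give a different list of amplitudes for the same operator.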

So far, this is all pretty trivial linear algebra, but when we think about what this means physically it is pretty weird. One of the things that is represented by an operator in quantum theory is dynamics, in particular a unitary operator represents the dynamics of a closed system for a discrete time-step. One of the things that is represented by a vector on a tensor product Hilbert space is a pure state of a bipartite system. It is fairly easy to see that (up to normalization) unitary operators get mapped to maximally entangled states under the isomorphism, so, in some sense, a maximally entangled state is “the same thing” as a unitary operator. This is weird because there are some things that make sense for dynamical operators that don’t seem to make sense for states and vice-versa. For example, dynamics can be composed. If \(U\) represents the dynamics from \(t_0\) to \(t_1\) and \(V\) represents the dynamics from \(t_1\) to \(t_2\), then the dynamics from \(t_0\) to \(t_2\) is represented by the product \(VU\). Using the isomorphism, we can define a composition for states, but what on earth does this mean?

Before getting on to that, let us briefly pause to consider the Jamiolkowski version of the isomorphism. The Choi isomorphism is basis dependent. You get a slightly different state if you write down the operator in a different basis. To make things basis independent, we replace \(\mathcal{H}\otimes\mathcal{H}\) by \(\mathcal{H}\otimes\mathcal{H}^*\). \(\mathcal{H}^*\) denotes the dual space to \(\mathcal{H}\), i.e. it is the space of bras instead of the space of kets. In Dirac notation, the Jamiolkowski isomorphism looks pretty trivial. It says
\[\Ket{j}\Bra{k} \qquad \qquad \equiv \qquad \qquad \Ket{j} \otimes \Bra{k}.\]
This is axiomatic in Dirac notation, because we always assume that tensor product symbols can be omitted without changing anything. However, this version of the isomorphism is going to become important later.

Conventional Interpretation: Gate Teleportation

In quantum information, the Choi isomorphism is usually interpreted in terms of “gate teleportation”. To understand this, we first reformulate the isomorphism slightly. Let \(\Ket{\Phi^+}_{AA'} = \sum_j \Ket{jj}_{AA'}\), where \(A\) and \(A'\) are quantum systems with Hilbert spaces of the same dimension. The vectors \(\Ket{j}\) form a preferred basis, and this is the basis in which the Choi isomorphism is going to be defined. Note that \(\Ket{\Phi^+}_{AA'}\) is an (unnormalized) maximally entangled state. It is easy to check that the isomorphism can now be reformulated as
\[\Ket{\Phi_U}_{AA'} = I_A \otimes U_{A'} \Ket{\Phi^+}_{AA'},\]
where \(I_A\) is the identity operator on system \(A\). The reverse direction of the isomorphism is given by
\[U_A \Ket{\psi}\Bra{\psi}_A U_A^{\dagger} = \Bra{\Phi^+}_{A'A''} \left ( \Ket{\psi}\Bra{\psi}_{A''} \otimes \Ket{\Phi_U}\Bra{\Phi_U}_{A'A} \right )\Ket{\Phi^+}_{A'A''},\]
where \(A^{\prime\prime}\) is yet another quantum system with the same Hilbert space as \(A\).

Now let’s think about the physical interpretation of the reverse direction of the isomorphism. Suppose that \(U\) is the identity. In that case, \(\Ket{\Phi_U} = \Ket{\Phi^+}\) and the reverse direction of the isomorphism is easily recognized as the expression for the output of the teleportation protocol when the \(\Ket{\Phi^+}\) outcome is obtained in the Bell measurement. It says that \(\Ket{\psi}\) gets teleported from \(A^{\prime\prime}\) to \(A\). Of course, this outcome only occurs some of the time, with probability \(1/d^2\), where \(d\) is the dimension of the Hilbert space of \(A\) (the Bell measurement has \(d^2\) equally likely outcomes), a fact that is obscured by our decision to use an unnormalized version of \(\Ket{\Phi^+}\).

Now, if we let \(U\) be a nontrivial unitary operator then the reverse direction of the isomorphism says something more interesting. If we use the state \(\Ket{\Phi_U}\) rather than \(\Ket{\Phi^+}\) as our resource state in the teleportation protocol, then, upon obtaining the \(\Ket{\Phi^+}\) outcome in the Bell measurement, the output of the protocol will not simply be the input state \(\Ket{\psi}\), but it will be that state with the unitary \(U\) applied to it. This is called “gate teleportation”. It has many uses in quantum computing. For example, in linear optics implementations, it is impossible to perform every gate in a universal set with 100% probability. To avoid damaging your precious computational state, you can apply the indeterministic gates to half of a maximally entangled state and keep doing so until you get one that succeeds. Then you can teleport your computational state using the resulting state as a resource and end up applying the gate that you wanted. This allows you to use indeterministic gates without having to restart the computation from the beginning every time one of these gates fails.
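Here is a minimal sketch of the gate teleportation identity in plain Python, using the teleportation form of the resource state, \(I \otimes U\) applied to the unnormalized \(\Ket{\Phi^+}\). The function names, the choice of the Hadamard gate and the input state are all assumptions made for illustration. Only the \(\Ket{\Phi^+}\) outcome is modelled, and everything is left unnormalized as in the text.

```python
from math import sqrt

def apply(U, v):
    """Matrix-vector product for a nested-list matrix and a list vector."""
    return [sum(U[r][c] * v[c] for c in range(len(v))) for r in range(len(U))]

def gate_teleport(U, psi):
    """Output on A when A'' and A' are projected onto the unnormalized <Phi+|."""
    d = len(psi)
    # resource[j][a] is the amplitude of |j>_{A'} |a>_A in (I (x) U)|Phi+> = sum_j |j> (x) U|j>.
    resource = [apply(U, [1.0 if c == j else 0.0 for c in range(d)]) for j in range(d)]
    # <Phi+|_{A'A''} = sum_m <m|_{A'} <m|_{A''} picks out psi[m] * resource[m][a].
    return [sum(psi[m] * resource[m][a] for m in range(d)) for a in range(d)]

s = 1 / sqrt(2)
U = [[s, s], [s, -s]]   # Hadamard gate, as an example
psi = [0.6, 0.8j]       # hypothetical input state on A''

teleported = gate_teleport(U, psi)  # state left on A after the Phi+ outcome
direct = apply(U, psi)              # U applied directly to psi

print(teleported)
print(direct)
```

The two printed vectors agree up to rounding, which is the content of the reverse direction of the isomorphism: the gate has been applied without ever acting on the input system directly.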

Using this interpretation of the isomorphism, we can also come up with a physical interpretation of the composition of two states. It is basically a generalization of entanglement swapping. If you take \(\Ket{\Phi_U}\) and \(\Ket{\Phi_{V}}\) and perform a Bell measurement across the output system of the first and the input system of the second then, upon obtaining the \(\Ket{\Phi^+}\) outcome, you will have the state \(\Ket{\Phi_{UV}}\). In this way, you can perform your entire computational circuit in advance, before you have access to the input state, and then just teleport your input state into the output register as the final step.

In this way, the Choi isomorphism leads to a correspondence between a whole host of protocols involving gates and protocols involving entangled states. We can also define interesting properties of operations, such as the entanglement of an operation, in terms of the states that they correspond to. We then use the isomorphism to give a physical meaning to these properties in terms of gate teleportation. However, one weak point of the correspondence is that it transforms something deterministic (the application of a unitary operation) into something indeterministic (getting the \(\Ket{\Phi^+}\) outcome in a Bell measurement). Unlike the teleportation protocol, gate teleportation cannot be made deterministic by applying correction operations for the other outcomes, at least not if we want these corrections to be independent of \(U\). The states you get for the other outcomes involve nasty things like \(U^*, U^T, U^\dagger\) applied to \(\Ket{\psi}\), depending on exactly how you construct the Bell basis, e.g. choice of phases. These can typically not be corrected without applying \(U\). In particular, that would screw things up in the linear optics application wherein \(U\) can only be implemented non-deterministically.

Before turning to our alternative interpretation of Choi-Jamiolkowski, let’s generalize things a bit.

Second Level Isomorphisms

In quantum theory we don’t just have pure states, but also mixed states that arise if you have uncertainty about which state was prepared, or if you ignore a subsystem of a larger system that is in a pure state. These are described by positive, trace-one, operators, denoted \(\rho\), called density operators. Similarly, dynamics does not have to be unitary. For example, we might bring in an extra system, interact them unitarily, and then trace out the extra system. These are described by Completely-Positive, Trace-Preserving (CPT) maps, denoted \(\mathcal{E}\). These are linear maps that act on the space of operators, i.e. they are operators on the space of operators, and are often called superoperators.

Now, the set of operators on a Hilbert space is itself a Hilbert space with inner product \(\left \langle N, M \right \rangle = \Tr{N^{\dagger}M}\). Thus, we can apply Choi-Jamiolkowski on this space to define a correspondence between superoperators and operators on the tensor product. We can do this in terms of an orthonormal operator basis with respect to the trace inner product, but it is easier to just give the teleportation version of the isomorphism. We will also generalize slightly to allow for the possibility that the input and output spaces of our CPT map may be different, i.e. it may involve discarding a subsystem of the system we started with, or bringing in extra ancillary systems.

Starting with a CPT map \(\mathcal{E}_{B|A}: \mathcal{L}(\mathcal{H}_A) \rightarrow \mathcal{L}(\mathcal{H}_B)\) from system \(A\) to system \(B\), we can define an operator on \(\mathcal{H}_A \otimes \mathcal{H}_B\) via
\[\rho_{AB} = \mathcal{E}_{B|A'} \otimes \mathcal{I}_{A} \left ( \Ket{\Phi^+}\Bra{\Phi^+}_{AA'}\right ),\]
where \(\mathcal{I}_A\) is the identity superoperator. This is a positive operator, but it is not quite a density operator as it satisfies \(\PTr{B}{\rho_{AB}} = I_A\), which implies that \(\PTr{AB}{\rho_{AB}} = d\) rather than \(\PTr{AB}{\rho_{AB}} = 1\). This is analogous to using unnormalized states in the pure-state case. The reverse direction of the isomorphism is then given by
\[\mathcal{E}_{B|A} \left ( \sigma_A \right ) = \Bra{\Phi^+}_{A'A}\sigma_{A'} \otimes \rho_{AB}\Ket{\Phi^+}_{A'A}.\]
This has the same interpretation in terms of gate teleportation (or rather CPT-map teleportation) as before.

The Jamiolkowski version of this isomorphism is given by
\[\varrho_{AB} = \mathcal{E}_{B|A'} \otimes \mathcal{I}_{A} \left ( \Ket{\Phi^+}\Bra{\Phi^+}_{AA'}^{T_A}\right ),\]
where \(^{T_A}\) denotes the partial transpose in the basis used to define \(\Ket{\Phi^+}\). Although it is not obvious from this formula, this operator is independent of the choice of basis, as \(\Ket{\Phi^+}\Bra{\Phi^+}_{AA'}^{T_A}\) is actually the same operator for any choice of basis. I’ll keep the reverse direction of the isomorphism a secret for now, as it would give a strong hint towards the punchline of this blog post.
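One way to see the basis independence claimed above, sketched here assuming that the basis \(\Ket{j}\) is orthonormal, is to write out the maximally entangled state and transpose on \(A\):
\[\Ket{\Phi^+}\Bra{\Phi^+}_{AA'}^{T_A} = \sum_{j,k} \left ( \Ket{j}\Bra{k}_A \otimes \Ket{j}\Bra{k}_{A'} \right )^{T_A} = \sum_{j,k} \Ket{k}\Bra{j}_A \otimes \Ket{j}\Bra{k}_{A'}.\]
Acting on a product basis vector, this sends \(\Ket{a}_A \otimes \Ket{b}_{A'}\) to \(\Ket{b}_A \otimes \Ket{a}_{A'}\), so it is the operator that swaps the two systems, and that operator can be defined without reference to any particular basis.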

Probability Theory

I now want to give an alternative way of thinking about the isomorphism, in particular the Jamiolkowski version, that is in many ways conceptually clearer than the gate teleportation interpretation. The starting point is the idea that quantum theory can be viewed as a noncommutative generalization of classical probability theory. This idea goes back at least to von Neumann, and is at the root of our thinking in quantum information theory, particularly in quantum Shannon theory. The basic idea of the generalization is that probability distributions \(P(X)\) get mapped to density operators \(\rho_A\) and sums over variables become partial traces. Therefore, let’s start by thinking about whether there is a classical analog of the isomorphism, and, if so, what its interpretation is.

Suppose we have two random variables, \(X\) and \(Y\). We can define a conditional probability distribution of \(Y\) given \(X\), \(P(Y|X)\), as a positive function of the two variables that satisfies \(\sum_Y P(Y|X) = 1\) independently of the value of \(X\). Given a conditional probability distribution and a marginal distribution, \(P(X)\), for \(X\), we can define a joint distribution via
\[P(X,Y) = P(Y|X)P(X).\]
Conversely, given a joint distribution \(P(X,Y)\), we can find the marginal \(P(X) = \sum_Y P(X,Y)\) and then define a conditional distribution
\[P(Y|X) = \frac{P(X,Y)}{P(X)}.\]
Note, I’m going to ignore the ambiguities in this formula that occur when \(P(X)\) is zero for some values of \(X\).

Now, suppose that \(X\) and \(Y\) are the input and output of a classical channel. I now want to think of the probability distribution of \(Y\) as being determined by a stochastic map \(\Gamma_{Y|X}\) from the space of probability distributions over \(X\) to the space of probability distributions over \(Y\). Since \(P(Y) = \sum_{X} P(X,Y)\), this has to be given by
\[P(Y) = \Gamma_{Y|X} \left ( P(X)\right ) = \sum_X P(Y|X) P(X),\]
\[\Gamma_{Y|X} \left ( \cdot \right ) = \sum_{X} P(Y|X) \left ( \cdot \right ).\]
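To make the classical formulas concrete, here is a small sketch in plain Python. The channel (a binary symmetric channel with flip probability 0.1) and the input distribution are made-up numbers, and the function names are hypothetical. It applies the stochastic map to \(P(X)\), builds the joint distribution, and then recovers the conditional distribution from the joint, assuming \(P(X) > 0\) for every value of \(X\) as in the text.

```python
from math import isclose

# Hypothetical channel: cond[x][y] = P(Y=y | X=x), a binary symmetric channel.
P_Y_given_X = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}
# Hypothetical input distribution P(X).
P_X = {0: 0.25, 1: 0.75}

def stochastic_map(cond, p_x):
    """Gamma_{Y|X}: P(Y) = sum_X P(Y|X) P(X)."""
    ys = sorted({y for row in cond.values() for y in row})
    return {y: sum(cond[x][y] * p_x[x] for x in p_x) for y in ys}

def joint(cond, p_x):
    """P(X,Y) = P(Y|X) P(X)."""
    return {(x, y): cond[x][y] * p_x[x] for x in p_x for y in cond[x]}

def conditional(p_xy):
    """P(Y|X) = P(X,Y) / P(X); assumes P(X) > 0 for every x that appears."""
    p_x = {}
    for (x, _), p in p_xy.items():
        p_x[x] = p_x.get(x, 0.0) + p
    cond = {}
    for (x, y), p in p_xy.items():
        cond.setdefault(x, {})[y] = p / p_x[x]
    return cond

P_Y = stochastic_map(P_Y_given_X, P_X)
P_XY = joint(P_Y_given_X, P_X)
recovered = conditional(P_XY)

print(P_Y)
print(recovered)
```

With these numbers \(P(Y=0) = 0.9 \times 0.25 + 0.1 \times 0.75 = 0.3\), and the recovered conditional distribution matches the channel we started with, up to rounding.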

What we have here is a correspondence between a positive function of two variables (the conditional probability distribution) and a linear map that acts on the space of probability distributions (the stochastic map). This looks analogous to the Choi-Jamiolkowski isomorphism, except that, instead of a joint probability distribution, which would be analogous to a quantum state, we have a conditional probability distribution. This suggests that we made a mistake in thinking of the operator in the Choi isomorphism as a state. Maybe it is something more like a conditional state.

Conditional States

Let’s just plunge in and make a definition of a conditional state, and then see how it makes sense of the Jamiolkowski isomorphism. For two quantum systems, \(A\) and \(B\), a conditional state of \(B\) given \(A\) is defined to be a positive operator \(\rho_{B|A}\) on \(\mathcal{H}_A \otimes \mathcal{H}_B\) that satisfies
\[\PTr{B}{\rho_{B|A}} = I_A.\]
This is supposed to be analogous to the condition \(\sum_Y P(Y|X) = 1\). Notice that this is exactly how the operators that are Choi-isomorphic to CPT maps are normalized.

Given a conditional state, \(\rho_{B|A}\), and a reduced state \(\rho_A\), I can define a joint state via
\[\rho_{AB} = \sqrt{\rho_A} \rho_{B|A} \sqrt{\rho_A},\]
where I have suppressed the implicit \(\otimes I_B\) required to make the products well defined. The conjugation by the square root ensures that \(\rho_{AB}\) is positive, and it is easy to check that \(\PTr{AB}{\rho_{AB}} = 1\).
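For completeness, here is the check, using only the cyclic property of the trace and the normalization \(\PTr{B}{\rho_{B|A}} = I_A\):
\[\PTr{AB}{\sqrt{\rho_A} \rho_{B|A} \sqrt{\rho_A}} = \PTr{AB}{\rho_A \rho_{B|A}} = \PTr{A}{\rho_A \PTr{B}{\rho_{B|A}}} = \PTr{A}{\rho_A} = 1.\]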

Conversely, given a joint state, I can find its reduced state \(\rho_A = \PTr{B}{\rho_{AB}}\) and then define the conditional state
\[\rho_{B|A} = \sqrt{\rho_A^{-1}} \rho_{AB} \sqrt{\rho_A^{-1}},\]
where I am going to ignore cases in which \(\rho_A\) has any zero eigenvalues so that the inverse is well-defined (this is no different from ignoring the division by zero in the classical case).

Now, suppose you are given \(\rho_A\) and you want to know what \(\rho_B\) should be. Is there a linear map that tells you how to do this, analogous to the stochastic map \(\Gamma_{Y|X}\) in the classical case? The answer is obviously yes. We can define a map \(\mathfrak{E}_{B|A}: \mathcal{L} \left ( \mathcal{H}_A\right ) \rightarrow \mathcal{L} \left ( \mathcal{H}_B\right )\) via
\[\mathfrak{E}_{B|A} \left ( \rho_A \right ) = \PTr{A}{\rho_{B|A} \rho_A},\]
where we have used the cyclic property of the trace to combine the \(\sqrt{\rho_A}\) terms, or
\[\mathfrak{E}_{B|A} \left ( \cdot \right ) = \PTr{A}{\rho_{B|A} (\cdot)}.\]
The map \(\mathfrak{E}_{B|A}\) so defined is just the Jamiolkowski isomorphic map to \(\rho_{B|A}\) and the above equation gives the reverse direction of the Jamiolkowski isomorphism that I was being secretive about earlier.
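As a sanity check, consider the special case (an illustration of my own, not taken from the paper) in which everything is diagonal in fixed orthonormal bases \(\Ket{x}_A\) and \(\Ket{y}_B\):
\[\rho_A = \sum_x P(x) \Ket{x}\Bra{x}_A, \qquad \rho_{B|A} = \sum_{x,y} P(y|x) \Ket{x}\Bra{x}_A \otimes \Ket{y}\Bra{y}_B.\]
Then \(\PTr{B}{\rho_{B|A}} = \sum_x \left ( \sum_y P(y|x) \right ) \Ket{x}\Bra{x}_A = I_A\), and
\[\mathfrak{E}_{B|A} \left ( \rho_A \right ) = \PTr{A}{\rho_{B|A} \rho_A} = \sum_y \left ( \sum_x P(y|x) P(x) \right ) \Ket{y}\Bra{y}_B,\]
which is just the classical stochastic map \(\Gamma_{Y|X}\) written in operator form.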

The punchline is that the Choi-Jamiolkowski isomorphism should not be thought of as a mapping between quantum states and quantum operations, but rather as a mapping between conditional quantum states and quantum operations. It is no more surprising than the fact that classical stochastic maps are determined by conditional probability distributions. If you think of it in this way, then your approach to quantum information will become conceptually simpler in a lot of ways. These ways are discussed in detail in the paper.

Causal Conditional States

There is a subtlety that I have glossed over so far that I’d like to end with. The map \(\mathfrak{E}_{B|A}\) is not actually completely positive, which is why I did not denote it \(\mathcal{E}_{B|A}\), but when preceded by a transpose on \(A\) it defines a completely positive map. This is because the Jamiolkowski isomorphism is defined in terms of the partial transpose of the maximally entangled state. Also, so far I have been talking about two distinct quantum systems that exist at the same time, whereas in the classical case, I talked about the input and output of a classical channel. A quantum channel is given by a CPT map \(\mathcal{E}_{B|A}\) and its Jamiolkowski representation would be
\[\mathcal{E}_{B|A} \left (\rho_A \right ) = \PTr{A}{\varrho_{B|A}\rho_A},\]
where \(\varrho_{B|A}\) is the partial transpose over \(A\) of a positive operator and it satisfies \(\PTr{B}{\varrho_{B|A}} = I_A\). This is the appropriate notion of a conditional state in the causal scenario, where you are talking about the input and output of a quantum channel rather than two systems at the same time. The two types of conditional state are related by a partial transpose.

Despite this difference, a good deal of unification is achieved between the way in which acausally related (two subsystems) and causally related (input and output of channels) degrees of freedom are described in this framework. For example, we can define a “causal joint state” as
\[\varrho_{AB} = \sqrt{\rho_A} \varrho_{B|A} \sqrt{\rho_A},\]
where \(\rho_A\) is the input state to the channel and \(\varrho_{B|A}\) is the operator that is Jamiolkowski-isomorphic to the CPT map. This unification is another main theme of the paper, and allows a quantum version of Bayes’ theorem to be defined that is independent of the causal scenario.

The Wonderful World of Conditional States

To end with, here is a list of some things that become conceptually simpler in the conditional states formalism developed in the paper:

  • The Born rule, ensemble averaging, and quantum dynamics are all just instances of a quantum analog of the formula \(P(Y) = \sum_X P(Y|X)P(X)\).
  • The Heisenberg picture is just a quantum analog of \(P(Z|X) = \sum_Y P(Z|Y)P(Y|X)\).
  • The relationship between prediction and retrodiction (inferences about the past) in quantum theory is given by the quantum Bayes’ theorem.
  • The formula for the set of states that a system can be ‘steered’ to by making measurements on a remote system, as in EPR-type experiments, is just an application of the quantum Bayes’ theorem.

If this has whetted your appetite, then this and much more can be found in the paper.

Foundations Mailing Lists

Bob Coecke has recently set up an email mailing list for announcements in the foundations of quantum theory (conference announcements, job postings and the like). You can subscribe by sending a blank email to The mailing list is moderated so you will not get inundated by messages from cranks.

On a similar note, I thought I would mention the philosophy of physics mailing list, which has been going for about seven years and also often features announcements that are relevant to the foundations of quantum theory. Obviously, the focus is more on the philosophy side, but I have often heard about interesting conferences and workshops via this list.

Job/Course/Conference Announcements

Here are a few announcements that have arrived in my inbox in the past few days.

Perimeter Scholars International

Canada’s Perimeter Institute for Theoretical Physics (PI), in partnership with the University of Waterloo, welcomes applications to the Master’s level course, Perimeter Scholars International (PSI). Exceptional students with an undergraduate honours degree in Physics, Math, Engineering or Computer Science are encouraged to apply. Students must have a minimum of 3 upper level undergraduate or graduate courses in physics. PSI recruits a diverse group of students and especially encourages applications from qualified women candidates. The due date for applications to PSI is February 1st, 2011. Complete details are available at

Foundations Postdocs

Also a reminder that it is currently postdoc hiring season at Perimeter Institute. Although the deadline for applications has passed, they will always consider applications from qualified candidates if not all positions have been filled. Anyone looking for a postdoc in quantum foundations should definitely apply. In fact, if you are looking for a foundations job and you have not applied to PI then you must be quite mad, since there are not a lot of foundations positions in physics to be had elsewhere. Details are here.

Quantum Interactions

I will admit that this next conference announcement is a little leftfield, but some of the areas it covers are very interesting and worthwhile in my opinion, particularly the biological and artificial intelligence applications.




The Fifth International Symposium on Quantum Interaction (QI’2011), 27-29 June 2011, Aberdeen, United Kingdom.

Quantum Interaction (QI) is an emerging field which is applying quantum theory (QT) to domains such as artificial intelligence, human language, cognition, information retrieval, biology, political science, economics, organisations and social interaction.

After highly successful previous meetings (QI’2007 at Stanford, QI’2008 at Oxford, QI’2009 at Saarbruecken, QI’2010 at Washington DC), the Fifth International Quantum Interaction Symposium will take place in Aberdeen, UK from 27 to 29 June 2011.

This symposium will bring together researchers interested in how QT addresses problems in non-quantum domains. QI’2011 will also include a half day tutorial session on 26 June 2011, with a number of leading researchers delivering tutorials on the foundations of QT, the application of QT to human cognition and decision making, and QT inspired semantic information processing.

***Call for Papers***

We are seeking submissions of high-quality and original research papers that have not been previously published and are not under review for another conference or journal. Papers should address, but are not limited to, one or more of the following broad content areas:

– Artificial Intelligence (Logic, planning, agents and multi-agent systems)

– Biological or Complex Systems

– Cognition and Brain (memory, cognitive processes, neural networks, consciousness)

– Decision Theory (political, psychological, cultural, organisational, social sciences)

– Finance and Economics (decision-making, mergers, corporate cultures)

– Information Processing and Retrieval

– Language and Linguistics

The post-conference proceedings of QI’2011 will be published by Springer in its Lecture Notes in Computer Science (LNCS) series. Authors will be required to submit a final version 14 days after the conference to reflect the comments made at the conference. We will also consider organizing a special issue for a suitable journal to publish selected best papers.

***Important Dates***

28th March 2011: Abstract submission deadline

1st April 2011: Paper submission deadline

1st May 2011: Notification of acceptance

1st June 2011: Camera-Ready Copy

26th June 2011: Tutorial Session

27th – 29th June 2011: Conference


Authors are invited to submit research papers up to 12 pages. All submissions should be prepared in English using the LNCS template, which can be downloaded from

Please submit online at:


Steering Committee:

Peter Bruza (Queensland University of Technology, Australia)

William Lawless (Paine College, USA)

Keith van Rijsbergen (University of Glasgow, UK)

Donald Sofge (Naval Research Laboratory, USA)

Dominic Widdows (Google, USA)

General Chair:

Dawei Song (Robert Gordon University, UK)

Programme Committee Chair:

Massimo Melucci (University of Padua, Italy)

Publicity Chair:

Sachi Arafat (University of Glasgow, UK)

Proceedings Chair:

Ingo Frommholz (University of Glasgow, UK)

Local Organization co-Chairs:

Jun Wang and Peng Zhang (Robert Gordon University, UK)

Quantum Foundations Meetings

Prompted in part by the Quantum Pontiff’s post about the APS March meeting, I thought it would be a good idea to post one of my extremely irregular lists of interesting conferences about the foundations of quantum theory that are coming up. A lot of my usual sources for this sort of information have become defunct in the couple of years I was away from work, so if anyone knows of any other interesting meetings then please post them in the comments.

  • March 21st-25th 2011: APS March Meeting (Dallas, Texas) – Includes a special session on Quantum Information For Quantum Foundations. Abstract submission deadline Nov. 19th.
  • April 29th-May 1st 2011: New Directions in the Foundations of Physics (Washington DC) – Always one of the highlights of the foundations calendar, but invite only.
  • May 2nd-6th 2011: 5th Feynman Festival (Brazil) – Includes foundations of quantum theory as one of its topics, but likely there will be more quantum information/computation talks. Registration deadline Feb. 1st, Abstract submission deadline Feb. 15th.
  • July 25th-30th 2011: Frontiers of Quantum and Mesoscopic Thermodynamics (Prague, Czech Republic) – Not strictly a quantum foundations conference, but there are a few foundations speakers and foundations of thermodynamics is interesting to many quantum foundations people.

A Reading List on the Foundations of Probability and Statistics

The continuing saga of time-travel in the quantum universe has been delayed because I have been working hard to finish writing a paper. Rest assured, it is coming in the next week or two. For now, I have been getting more interested in the foundations of probability and statistics. More accurately, I have always been interested in (and opinionated about) the subject, but I have recently become interested in reading more widely around it, in the hope that I will actually come to know what I am talking about. The literature on this subject is vast, so I have decided to concentrate on the arguments for different conceptions of probability and how they are used to justify statistical methodology. I have also decided to concentrate on books and collections rather than listing references to original papers, except in a few instances where I could not find a collection containing an important paper. The references are generally to the most recent edition of the texts rather than the originals. I have added comments to the references that I know something about, and will add more as I read through them. If anyone thinks I have missed something vital then please mention it in the comments.

Disclosure: All links to Amazon are affiliate links.

General Introductions

T. L. Fine, Theories of Probability (Academic Press 1973)
READ This is an excellent book, but it is not for the faint of heart. Fine does not view any of the major approaches to probability as adequate, so some parts of the book are a bit ranty, but personally I do love a good rant. It covers most of the major approaches to probability theory in full gory mathematical detail. This includes axiomatic, relative frequency, algorithmic complexity, classical, logical and subjective approaches. Fairly unique to this text is a comprehensive treatment of comparative probability, where you just have a relation of “more probable than” rather than a quantitative measure of probability. This occurs right at the beginning of the book, and may put some readers off as it is extremely technical and unfamiliar. However, once Fine gets to the more familiar territory of quantitative probability, the book becomes a lot more readable. If you are interested in the mathematical foundations of probability then you will not find a better book. A final warning is that some sections of the book are a bit out of date, since it was written in the 1970s and there has been a lot of progress in some areas since then, e.g. in maximum entropy methods and algorithmic complexity. Nevertheless, nobody has done such a comprehensive job of covering the mathematics since this book was published.
Maria Galavotti, A Philosophical Introduction to Probability (Stanford: Center for the Study of Language and Information Publications 2005)
READ A better title for this book would be “A Historico-Philosophical Introduction to Probability”. Galavotti covers all the standard interpretations of probability: classical, frequentist, propensity, logical, subjective; but she does so by focussing on the people that developed these views. Each chapter consists of sections devoted to individual researchers in the foundations of probability, beginning with a potted biography followed by a description of their view. This contrasts with other introductory texts, which tend to focus on a specific version of each viewpoint, e.g. von Mises’ theory of frequentism is usually discussed in detail, with only passing mentions of the other proponents like Venn and Reichenbach. This historical approach is useful as an entry point to the historical literature and has the advantage that it covers a wider variety of opinions than other introductory texts. There are several people who are frequently mentioned in the modern literature, but usually without a detailed description of their viewpoint. From this point of view, I found the accounts of Reichenbach, Jeffreys and Ramsey very useful. Reichenbach was a frequentist, but he took a Bayesian approach to statistical inference. Given the close association between frequentism and classical statistics on the one hand, and subjectivism and Bayesian statistics on the other, it is easy to overlook the possibility of Reichenbach’s position and to view criticisms of classical statistics as criticisms of frequentism in general. The treatment of Ramsey is especially good, as this is an area in which Galavotti has done significant scholarship. Ramsey is one of the originators of the subjective view of probability, but he is usually viewed as a pluralist about probability because he made comments to the effect that a different account of probability is required for science in his published essay. Unfortunately, Ramsey died before he completed his account of probability in the sciences. Using unpublished notebooks as sources, Galavotti argues that Ramsey was not a pluralist, and that his account of scientific probability would have been in terms of the stability of subjective probabilities. This is not completely conclusive, but represents an interesting alternative to the usual account of Ramsey’s viewpoint.

However, there are three negatives about Galavotti’s approach in this book. Firstly, given the number of viewpoints she discusses, many of the discussions are too brief to give a real understanding of the subtleties involved. Secondly, her treatment of the basic features of probability theory at the beginning of the book is rather awkward, and would be confusing for someone who had never encountered probability before (part of the awkwardness may have to do with the fact that this is a translation of the Italian original). Thirdly, mathematics is eschewed in this book, even where it would be extremely helpful. In some cases, the main objections to a viewpoint are that the mathematics does not say what the proponents would like it to say, and it is not possible to do justice to these arguments without writing down an equation or two. Therefore, even though this book has “Introduction” in the title, I cannot recommend it as a first textbook on the subject. It would be better to read something like Hacking or Gillies first, and then to use this as supplementary reading to get some historical context. Overall, this is a distinctive and original work, and provides a useful complement to more conventional textbooks on the subject.

Donald Gillies, Philosophical Theories of Probability (Routledge 2000)
READ This is the best introductory textbook on the foundations of probability from a philosophy point of view that I have read. The first part of the book treats most of the best known theories of probability: Classical, logical, frequentist, subjective Bayesian and propensities. The only mainstream interpretation that is missing is a discussion of Lewis’ conception of objective chances and the principal principle. This is a shame as it is currently one of the most fashionable, particularly amongst the philosophers of quantum theory that I associate with. Gillies does a good job of explaining the distinction between objective and subjective approaches to probability and the discussion of the merits and criticisms of each view is largely balanced and measured. Places where mathematical technicalities come up, such as infinite sample spaces and limit theorems, are described accurately. Even though the mathematical technicalities are omitted, as is appropriate in an introductory book, what he says about them is conceptually accurate. The second part of the book lays out the author’s own views on probability, which include a defence of a pluralist approach where different interpretations of probability are appropriate for different subject areas. He comes down in favour of a propensity-long run frequency view of objective chances and a subjectivist view of other probabilities, with a spectrum of other possibilities in between. There is much that I disagree with in this part of the book, but this is not a big criticism because almost all philosophical textbooks become controversial when the author discusses their own views. For completeness, here are the main points that I disagree with:

  1. I think the argument that exchangeability cannot justify statistical methodology is based on a double standard about the degree to which interpretations of probability are allowed to be approximate. Frequentist theories are given far more lenience in the degree to which they only have to approximate reality.
  2. I do not think that the “intersubjective” interpretation of probability is distinct from the usual subjective one. The distinction is based on a misunderstanding about what an “agent” is in the subjective theory. It is not necessarily an individual human being, but could be a well-programmed computer or a community that has approximately shared values. Thus, the intersubjective theory is just a special case of the usual subjective one.
  3. I do not agree with the pluralist view of probability. For example, the argument that probabilities in economics are fundamentally different from those in natural sciences is based on our ability to conduct repeatable experiments. This is a feature of our epistemic situation and not a feature of reality. For example, we could imagine a race of aliens that are able to create multiple copies of the planet Earth that are identical in all factors that are relevant for economics. They could then perform experiments about economics that have the same status as the experiments that we perform in physics. I also think that Gillies’ distinction fails to take account of the way probabilities are used in modern subjects such as quantum information theory, where you surely have subjective probabilities infecting our description of natural physical systems.

Despite these criticisms, which would take a whole article to explain fully, this is still an extremely good introductory text.

Ian Hacking, An Introduction to Probability and Inductive Logic (CUP 2001)
READ A general introduction geared towards philosophy students. Would be suitable for undergrads with no prior exposure to probability, but perhaps a little logic and/or naive set theory. A bit simplistic for anyone with a stronger background than that, but the later chapters may be useful to those unfamiliar with the different philosophical approaches to probability theory.
Alan Hájek, Interpretations of Probability, The Stanford Encyclopedia of Philosophy (Spring 2010 Edition), Edward N. Zalta (ed.),
READ As usual for the Stanford Encyclopedia of Philosophy, this is a good summary and starting point for references.
D. H. Mellor, Probability: A Philosophical Introduction (Routledge 2005)
READ I have to admit that this textbook on the philosophy of probability left me feeling more confused than I was when I started reading it. Perhaps this is because this is definitely a philosophy textbook and Mellor does not shirk on the philosophical jargon from time to time, e.g. “Humean theory of causality”. One thing that I liked about the approach taken in this book is that Mellor introduces three kinds of probability (objective chances, epistemic probability and credences) right at the beginning and then goes on to discuss how each interpretation of probability reads them. This is in contrast to most other treatments, which make a broad distinction between objective and subjective interpretations and then go on to discuss each interpretation on its own without any common thread. Mellor’s approach is better because each of the interpretations of probability has its own scope, e.g. von Mises denies the relevance of anything but chances and subjectivists deny everything except credences, so this makes it clear when the different interpretations are discussing the same thing, and when they are attempting to reduce one concept to another. The only problem with this approach is that it suggests that there definitely are three types of probability and hence it effectively presupposes that a pluralist approach is going to be needed. I would prefer to say that there appear to be three types of probability statement in the language we use to discuss the theory, and that an interpretation of probability has to supply a meaning for each kind of statement, without assuming at the outset that the different types of statement actually do correspond to distinct concepts. Unlike Gillies, this book does include extensive discussion of the principal principle and its relatives, which is a good thing. However, I found Mellor’s discussion of things like limits and infinite sample spaces to be much more misleading than the discussion in Gillies. For example, when he introduces the concept of limit he uses an example of a function that tends to its limit uniformly and from one side. This is unlike probabilistic limits which can be subject to large fluctuations. He also suggests at one point that the only probabilities that make sense on an infinite sample space are zero and one, before going on to correct himself. Whilst he eventually does get the concepts right, these sorts of statements are apt to confuse. Now, in an introductory philosophical text, I do not expect every mathematical concept to be treated in full rigour, but Gillies shows that it is possible to discuss these concepts at a heuristic level without saying anything inaccurate. Finally, Mellor has a tendency to assume that the mathematics does say what the advocates of each interpretation want it to say and then goes on to criticize at a more conceptual level, whereas I think that some of the most effective arguments against interpretations of probability are just that the mathematics does not say what they need it to say. For all these reasons, I would recommend this as a supplementary text, particularly for philosophers, but not as a first introductory text on the subject.
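
To make the contrast in the Mellor entry concrete, here is a minimal Python sketch comparing a sequence that approaches its limit steadily from one side with the running relative frequency of heads in simulated coin flips. Everything in it is an illustrative assumption: the function names, the particular sequence 0.5 + 1/n, and the fixed seed are not taken from Mellor's or Gillies' books.

```python
import random

def one_sided_sequence(n_terms, limit=0.5):
    """Illustrative sequence limit + 1/n: approaches `limit` steadily from above."""
    return [limit + 1.0 / n for n in range(1, n_terms + 1)]

def running_frequency(n_flips, p=0.5, seed=0):
    """Relative frequency of heads after each of n_flips simulated coin flips."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    heads = 0
    freqs = []
    for n in range(1, n_flips + 1):
        if rng.random() < p:
            heads += 1
        freqs.append(heads / n)
    return freqs

def crossings(values, limit=0.5):
    """Count how often successive values switch sides of `limit`."""
    sides = [v > limit for v in values if v != limit]
    return sum(a != b for a, b in zip(sides, sides[1:]))

if __name__ == "__main__":
    smooth = one_sided_sequence(1000)
    freqs = running_frequency(1000)
    print("one-sided sequence crossings:", crossings(smooth))
    print("coin-flip frequency crossings:", crossings(freqs))
    print("largest deviation in last 100 flips:",
          max(abs(f - 0.5) for f in freqs[-100:]))
```

The first count is zero by construction. The relative frequency, by contrast, will typically wander back and forth across 0.5, and how often it does so depends on the seed, which is the sort of behaviour a one-sided example hides.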

General Collections

These are collections of papers that are not specific to a single approach. Collections on specific approaches are listed in the appropriate sections.

Antony Eagle (ed.), Philosophy of Probability: Contemporary Readings (Routledge 2010). Due to be published Nov. 19th.
UNREAD Contains many of the classic papers including de Finetti, Popper and Lewis, as well as modern commentary.


Ian Hacking, The Emergence of Probability: A Philosophical Study of Early Ideas about Probability, Induction and Statistical Inference, 2nd edition (CUP 2006)
READ In this book, Hacking takes a look at the emergence of the concept of probability during the enlightenment. The history goes up to Bernoulli’s proof of the first limit theorem. Hacking’s goal in looking at the history is to defend a philosophical thesis. Modern debates on the foundations of probability and statistics are focussed on whether probability should be regarded as an objective physical concept (what Hacking calls aleatory probability), usually framed in terms of frequencies or propensities, or as an epistemic concept concerning our knowledge and belief. Hacking argues that it was essential to the development of the probability concept that both of these ideas arose in tandem. The history is fascinating and Hacking’s argument provides a lot of context for modern debates.
Ian Hacking, The Taming of Chance (CUP 1990).
UNREAD My impression is that it covers a later period of history than the previous text.
David Salsburg, The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century (Holt McDougal 2002)
READ This is a popular-level book about mathematical statistics. It is a very difficult subject to write a popular book about. Most popular science books are about the weird and wonderful things that we have discovered about reality, but this book is about the evolution of the methods that we use to justify such discoveries, and hence it is one level more abstract. It is really difficult to convey the content of the different approaches without using much mathematics. Salsburg’s approach is historical and he largely tackles developments in chronological order. The first half of the book covers Pearson, Fisher and Neyman-Pearson(II) and the invention of mathematical statistics in the twentieth century. He paints a vivid picture of the personalities involved and conveys how their ideas revolutionized the scientific method. I learnt a lot from this part of the book. In particular, I did not realize the extent to which the attempt to prove Darwinian evolution was a driving force in the development of mathematical statistics and I also did not realize how important UCL was in the early development of the subject. The second part of the book covers more modern developments and is less successful. The reason for this is that research in statistics has expanded into a vast number of different directions and is still a work in progress. Therefore, it is not clear what future generations will think the most important developments are. Salsburg deals with this by basing most of the remaining chapters on the life and theory of individual statisticians. However, these sketches are too brief to leave much of a lasting impression on the reader. There is also a brief discussion of classical vs. Bayesian statistics. Whilst the Bayesian methodology is given some credit, Salsburg is pretty firmly in the classical camp and the dismissal of Bayesianism essentially boils down to the fact that he thinks it is too subjective. 
I would have liked to have seen a more balanced discussion of this. Overall, this is an interesting book and I recommend it to anyone interested in the foundations of probability and statistics as supplementary reading. Salsburg is to be admired for attempting a popularization of such an important topic, but I have my doubts about how much readers with no prior background in statistics will get out of the book.
Stephen M. Stigler, The History of Statistics: The Measurement of Uncertainty before 1900 (Harvard University Press 1986)
UNREAD A very well regarded treatment of the early history of statistics.
Jan von Plato, Creating Modern Probability: Its Mathematics, Physics and Philosophy in Historical Perspective (CUP 1994)
UNREAD Said to be selective in its treatment of the history. Attempts to unify von Mises with de Finetti at the end of the book.

@scidata pointed me to this correspondence (pdf) between Fermat and Pascal, which provides a record of early ideas about probability.

Classical (Laplacian) Approach To Probability

Pierre Simon Marquis de Laplace, A Philosophical Essay on Probabilities (Dover 1996)
UNREAD One of the earliest attempts to outline the theory of probability. Origin of the principle of indifference.


Richard von Mises, Probability, Statistics and Truth (Dover 1981)
UNREAD The canonical work on the ensemble-based, frequentist approach to probability.

Subjective/Personalist Bayesianism

As you may be able to detect from the structure of the list, this is my current favored approach, and I have a particular fondness for the works of de Finetti and Jeffrey. This may change as I read further into the subject.

José M. Bernardo and Adrian F. M. Smith, Bayesian Theory (Wiley 2000)
READ (well, at least the first few chapters). The modern technical “bible” of subjective Bayesianism. Contains a very intricate decision theoretic derivation of probability theory that is much more complicated than Savage as well as virtually every theorem that crops up in subjective foundations.
Bruno de Finetti, Theory of Probability: A Critical Introductory Treatment, 2 volumes (Wiley 1990)
READ. Despite the title, this is not really suitable for those without a background in probability theory or the foundational debate. Contains the loss-function approach where one takes previsions (the subjective correlate of expectation values) as fundamental rather than probabilities. Also contains extensive discussion of the finite vs. countable additivity debate and the de Finetti representation theorem.
Bruno de Finetti, Probabilism (1989), Erkenntnis, 31:169-223.
PARTLY READ. English translation of Probabilismo, which was de Finetti’s first work on subjective probability from 1931. Needs to be read in conjunction with Richard Jeffrey, Reading Probabilismo (1989), Erkenntnis, 31:225-237.
Bruno de Finetti, Philosophical Lectures on Probability, edited by Alberto Mura (Springer 2008)
READ Based on transcripts of a graduate course given by de Finetti in 1979. This is for die-hard de Finetti fans only. He was obviously pretty senior when he gave this course and there is a lot of repetition. It is useful if you are a scholar who wants to pin down precisely what the later de Finetti’s ideas on fundamental topics were. Everyone else should read de Finetti’s textbook instead.
Richard Jeffrey, Subjective Probability: The Real Thing (CUP 2004). Free pdf version
READ A very readable introduction to the basics of the subjective approach. Also discusses Jeffrey conditioning (a generalization of Bayes’ rule) and applications to confirmation theory.
Richard Jeffrey, The Logic of Decision, 2nd edition (University of Chicago Press 1990)
READ A philosopher’s account of the decision theoretic foundations of subjective probability. Jeffrey’s approach to decision theoretic foundations differs from the more commonly used approach of Savage in that he ascribes both probabilities and utilities to propositions, whereas Savage assigns probabilities to “states of the world” and utilities to “acts”. In general, Jeffrey also allows utilities to change as the state of belief changes, which helps to solve problems with the Bayesian treatment of things like the prisoner’s dilemma and Newcomb’s paradox. The representation theorems in Jeffrey’s approach are not as strong as in Savage’s, i.e. the probability function is not quite unique unless utilities are unbounded. Nevertheless, this is an interesting and arguably more realistic approach to decision theory as it should be applied in the real world. Finally, this book contains a comprehensive treatment of Jeffrey conditioning, which is a generalization of Bayesian conditioning to the case where an observation does not make any event in the sample space certain.
H. E. Kyburg and Howard E. Smokler (eds.), Studies in Subjective Probability (Wiley 1964)
READ This collection is mainly of historical interest in my view. The most relevant paper for contemporary Bayesianism is de Finetti’s, which is available from a variety of other sources. The collection starts with an excerpt from Venn’s book, which sets the stage by outlining common objections to subjective approaches to probability (Venn was one of the first to present a detailed relative frequency theory). The other paper that I found interesting is Ramsey’s, since this was the first paper to present the modern subjective approach to probability based on Dutch book and decision theoretic arguments.
Leonard J. Savage, The Foundations of Statistics (Dover 1972)
READ The canonical work on decision theoretic foundations of the subjective approach.
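
Since Jeffrey conditioning comes up in two of the entries above, a small worked example may help. The sketch below assumes the standard form of the rule, in which the new probability of an event A is the sum over cells E_i of a partition of q_i P(A | E_i), where q_i is the new probability assigned to E_i. The function name `jeffrey_update` and all of the numbers are hypothetical, made up purely for illustration.

```python
from fractions import Fraction as F

def jeffrey_update(prior, partition, new_weights):
    """Jeffrey conditioning on a finite sample space.

    prior:       dict mapping each outcome to its prior probability
    partition:   dict mapping a cell label to the set of outcomes in that cell
    new_weights: dict mapping a cell label to its new probability (summing to 1)
    Assumes every cell has nonzero prior probability.
    """
    posterior = {}
    for cell, outcomes in partition.items():
        cell_prior = sum(prior[o] for o in outcomes)
        for o in outcomes:
            # Probabilities conditional on the cell are unchanged;
            # only the weight of the cell itself moves.
            posterior[o] = new_weights[cell] * prior[o] / cell_prior
    return posterior

# Hypothetical numbers: a cloth is green or blue, and of good or bad quality.
prior = {
    ("green", "good"): F(3, 10), ("green", "bad"): F(1, 10),
    ("blue", "good"): F(2, 10), ("blue", "bad"): F(4, 10),
}
by_colour = {
    "green": {o for o in prior if o[0] == "green"},
    "blue": {o for o in prior if o[0] == "blue"},
}

# A glimpse in poor light shifts P(green) from 4/10 to 7/10
# without making either colour certain.
uncertain = jeffrey_update(prior, by_colour, {"green": F(7, 10), "blue": F(3, 10)})

# If the glimpse made "green" certain, the rule reduces to ordinary conditioning.
certain = jeffrey_update(prior, by_colour, {"green": F(1), "blue": F(0)})

p_good = sum(p for (colour, quality), p in uncertain.items() if quality == "good")
print(p_good)  # 5/8
```

With these made-up numbers the probability of good quality moves from 1/2 to 5/8, even though no event in the sample space has become certain. Setting the new weight of one cell to 1 recovers ordinary Bayesian conditioning as a special case.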

Logical Probabilities

Rudolf Carnap, Logical Foundations of Probability (University of Chicago Press 1950)
UNREAD Supposedly one of the best worked out treatments of logical probability.
John Maynard Keynes, A Treatise On Probability (MacMillan 1921) – free ebook available from Project Gutenberg
UNREAD Supposedly more readable than Carnap. WARNING – Because this book is out of copyright there are numerous editions available from online bookstores that are of dubious quality. This is why I am not linking to any of the dozens of versions on Amazon. The best advice is to use the Gutenberg ebook or look for an edition from a reputable publisher in a bricks and mortar bookshop. (Irrelevant fact: according to my mother, my maternal grandmother worked as a maid for Keynes.)

Objective Bayesianism and MaxEnt

Arguably, objective Bayesianism is the same thing as logical probabilities, but since I rarely hear people mention Jaynes and Cox in the same breath as Carnap and Keynes, I have decided to give the former their own section. Jaynes, in particular, is far more focussed on methodology and applications than the earlier authors.

Richard T. Cox, The Algebra of Probable Inference (The Johns Hopkins Press 1961)
UNREAD Contains the Cox axioms that characterize probability theory as an extension of logic.
Solomon Kullback, Information Theory and Statistics (Dover 1968)
UNREAD Origin of the minimization of relative entropy as an update rule in statistics. Closely related to MaxEnt.
Edwin T. Jaynes, Probability Theory: The Logic of Science (CUP 2003)
UNREAD The doyen of MaxEnt, in his own words.

Propensities and Objective Chances

Charles Sanders Peirce, Philosophical Writings of Peirce, edited by Justus Buchler (Dover 1955)
UNREAD Peirce foreshadows the propensity concept of Popper in papers 11-14.
Karl R. Popper, The Propensity Interpretation of the Calculus of Probability and the Quantum Theory (1957) in S. Körner (ed.), The Colston Papers, 9: 65–70 and The Propensity Interpretation of Probability (1959) British Journal for the Philosophy of Science, 10: 25–42
UNREAD Introduces the idea of propensities. Interestingly, for Popper, quantum mechanics provides a strong motivation for the need for single-case probabilities.
Karl R. Popper, The Logic of Scientific Discovery (Routledge Classics 2002)
UNREAD Chapter 8 explains his views on probability in comparison with other approaches.

David Lewis, Philosophical Papers: Volume II (OUP 1986). Also available online if you have a subscription to Oxford Scholarship Online.
UNREAD Contains a reprint of the 1980 paper that introduced the principal principle, as well as a paper on conditional probabilities.

Application to the Problem of Induction and Philosophy of Science

Most of the philosophy-based introductory texts cover this subject, but these texts are specifically focussed on understanding the scientific method.

John Earman, Bayes or Bust? A Critical Examination of Bayesian Confirmation Theory (MIT Press 1992)
READ This is a landmark book in the philosophy of Bayesian inference as an approach to the confirmation of scientific theories. Earman is a Bayesian (at least sometimes) and he gives an honest appraisal of the successes and failures of Bayesianism in this context. The book starts with an analysis of Bayes’ original paper, which I did not personally find very interesting because I am more interested in contemporary theory than historical analysis. The second chapter gives a good overview of Bayesian methodology. The remainder of the book discusses the successes and failures of the Bayesian approach. I found the discussion of “convergence to the truth” results, done in terms of martingale theory, particularly insightful, since I have only previously seen this discussed in terms of exchangeability, and the assumptions in the martingale approach seem more reasonable (at least for non-statistical hypotheses which is all that Earman discusses). Earman argues that the “problem of old evidence” is subsumed into the more general problem of how to incorporate new theories into a Bayesian analysis. It is the latter that is the real obstacle to obtaining scientific objectivity in the Bayesian approach. Also interesting is the discussion of the role of a more sophisticated version of Sherlock Holmes style eliminative induction in science, illustrated with a case study of experimental tests of General Relativity. At the end of the book, the Bayesian approach is compared to formal learning theory, with neither really winning the battle. Subject to a mathematical conjecture, Earman shows that formal learning theory does not really have an edge over Bayesianism. Learning theory has developed significantly since the publication of this book, so it would be interesting to see where this debate stands today. The conclusion of the book is rather pessimistic. Bayesianism seems to provide a better account of scientific inference than its rivals, but it does not really license scientific objectivity.
Colin Howson and Peter Urbach, Scientific Reasoning: The Bayesian Approach, 2nd edition (Open Court 1993)
READ This is a great book that argues for a Bayesian approach to scientific methodology. Most of the other major approaches to probability are well criticized and it reads well as an introduction to the whole area. I would have liked to see more mathematical detail in some sections, but this is a book for philosophy students and it does have good pointers to the literature where you can follow up on details. Particularly insightful are the chapters that criticize classical statistical methodology, e.g. estimators, confidence intervals, least-squares regression, etc. This goes far beyond the usual myopic focus on idealized coin-flips and covers many topics that are relevant to the design of real scientific experiments, e.g. randomization, sampling, etc. My only complaint is that they kind of wuss out at the end of the book by arguing for a von Mises-style relative-frequency interpretation of objective chances, connecting it to Bayesian probabilities by an asymptotic Dutch book argument that I found unconvincing because it does not refer to a bet whose outcome can be decided (similar remarks apply to the argument for countable additivity). Despite this reservation, this book is valuable ammunition for researchers who want to be Bayesian about everything.
Brian Skyrms, Choice and Chance: An Introduction to Inductive Logic, 4th edition (Wadsworth 1999)
READ This is not a text about probability per se, but about how to go about formulating a calculus of inductive inference, in close parallel to the calculus of deductive logic. The usual problems of induction are extensively discussed, so this would be a great companion to a first course in the philosophy of science. Probability is introduced towards the end of the book and, of course, the whole approach taken by this book biases the discussion towards logical (Keynes/Carnap) approaches to probability. Ultimately, I think that the problems addressed by this book are best treated by a subjective Bayesian approach, and that the construction of an objective calculus of induction is doomed to failure. However, a lot is learnt from the attempt, so I would heartily recommend this book to new students of the foundations of scientific methodology.


Krzysztof Burdzy, The Search for Certainty: On the Clash of Science and Philosophy of Probability (World Scientific 2009)
UNREAD I do love a good rant and Burdzy certainly seems to have a lot of them stored up when it comes to the foundations of probability. He argues that neither the frequentist nor the subjectivist foundation can account for the practice of actual probabilists and statisticians. He also offers his own account, but the criticism seems to be the main point.

Mathematical Foundations

Eventually, a bit of rigorous measure-theoretic probability is needed, so…

A. N. Kolmogorov, Foundations of the Theory of Probability, Second English Edition (Chelsea 1956)
UNREAD Classic text from the originator of measure-theoretic probability.
David Williams, Probability with Martingales (CUP 1991)
UNREAD A lively modern treatment of rigorous probability theory that has been recommended to me many times.