An atom named Fred: defending the total evidence requirement principle.

ReasonMeThis March 10, 2021 113 Comments

Warning: this particular article is going to be quite a bit more incomprehensible than my other articles. It is part of a discussion Alex Popescu, I, and a poster that goes by Disagreeable Me have been having on some tricky aspects of probabilistic inferences, particularly in the context of accounting for fine-tuning. The following passage from a philosophy paper (Fine-tuning, multiple universes, and the "this universe" objection, by N. Manson and M. Thrush) was brought up, which makes an argument that contradicts my position:

Let us return to White’s point about total evidence. It requires some qualification. Suppose that we are given a one-kilogram sample of matter and are told to determine how many uranium atoms it contains, if any. Unfortunately, our Geiger counter is broken. Luckily for us, however, we have at our disposal an amazing resource: a uranium oracle. This gifted individual knows the state of each and every uranium atom, and even has names for them. We leave the oracle in a room with the sample and come back an hour later. The oracle tells us that just one uranium atom decayed: Fred. From the fact that Fred decayed we deduce that one uranium atom decayed. Can we proceed to use half-life calculations to estimate the number of uranium atoms in the sample? Not if we are required to reason from the fact that Fred decayed rather than the fact that some uranium atom or other decayed. Since the presence of other uranium atoms makes it no more likely that Fred should have decayed, the fact that Fred decayed doesn’t confirm the hypothesis that the sample contains the calculated number of uranium atoms. Indeed, we are not even entitled to conclude that there is more than one uranium atom in the sample. The extra information that it was Fred who decayed blocks such inferences, even though the extra information itself is quite compatible with the conclusion that there are many uranium atoms. Whatever the obligation to consider our total evidence amounts to, it should not block inferences of the above sort.

I could not disagree more with this analysis, because the alleged difficulties arising from using the more specific knowledge disappear, I claim, very quickly if we perform the probabilistic analysis rigorously. Define:

F = the unique id (name) of the decayed atom, i.e. Fred, although in reality, given the huge potential number of atoms, we should probably imagine F to be some long id number, so maybe F=657644398765555789866778.
D1(X) = "Only 1 atom decayed and it had id X".
D1 = "Only 1 atom decayed".
S = the hypothesis that there is only a small number of uranium atoms
M = many uranium atoms (the numbers are not important for the logic)

Normally we would use D1, not D1(F), to evaluate the competing hypotheses about the number of atoms, since in real life we are not given the atoms' names. The thesis the above passage advances is that using the full information available to us in our peculiar situation, i.e. D1(F), would actually lead to an incorrect result.

This is not so: there is nothing wrong with using the full datum D1(F). To prove that, recall that we use Bayes' theorem to update our credences for M and S. One convenient way to talk about what happens is to focus on the ratio of the credence for M to the credence for S. So what happens to this ratio during the update? Bayes tells us it gets multiplied by the Bayes factor, P(E | M) / P(E | S), where E is the evidence we update on. This is a convenient way to do the math, because to calculate the Bayes factor we don't need to know or care about the values of the prior credences at all.
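To make the odds form of the update concrete, here is a minimal sketch. The likelihood values and the function names are hypothetical, chosen only for illustration; the point is that the same Bayes factor multiplies the odds M:S whatever the priors are.

```python
from fractions import Fraction

def bayes_factor(p_e_given_m, p_e_given_s):
    """Likelihood ratio P(E | M) / P(E | S)."""
    return p_e_given_m / p_e_given_s

def update_odds(prior_odds_m_to_s, p_e_given_m, p_e_given_s):
    """Posterior odds M:S = prior odds M:S times the Bayes factor."""
    return prior_odds_m_to_s * bayes_factor(p_e_given_m, p_e_given_s)

# Hypothetical likelihoods of some evidence E under each hypothesis.
p_e_m, p_e_s = Fraction(3, 10), Fraction(1, 10)
factor = bayes_factor(p_e_m, p_e_s)

# Two observers with different priors are shifted by the same factor.
posterior_even = update_odds(Fraction(1, 1), p_e_m, p_e_s)
posterior_skeptic = update_odds(Fraction(1, 9), p_e_m, p_e_s)
print(factor, posterior_even, posterior_skeptic)
```

Exact fractions are used so that the comparison of factors is not blurred by floating-point rounding.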

So then what do we need to do to disprove the assertion that using D1(F) instead of D1 leads to an error? Our mission will be complete if we show that the Bayes factors are the same. Which means we just need to prove the

Claim. P(D1(F) | M) / P(D1(F) | S) = P(D1 | M) / P(D1 | S).

Proof. It is assumed in the story that the id number (name) of the atom carries no particular significance relevant to M or S (otherwise it would be foolish of us not to use it!). In other words, if one atom decayed (D1), the chance that its id number would be F is independent of how many other atoms are in the sample. Let's call this chance PF, then we can express the assumption of insignificance like this:

P(D1(F) | M) / P(D1 | M) = PF = P(D1(F) | S) / P(D1 | S),

which immediately, after a trivial rearrangement of terms, proves the claim.
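A quick numerical check of the claim, as a minimal sketch rather than part of the proof. It assumes a specific model that the article does not spell out: ids come from a fixed pool of `universe` possible atoms, the sample is a uniformly random subset of n of them, and each atom in the sample decays independently with probability p during the hour. All names and numbers are hypothetical.

```python
from fractions import Fraction

def p_d1(n, p):
    """P(D1 | n atoms): exactly one of n independent atoms decays."""
    return n * p * (1 - p) ** (n - 1)

def p_d1_named(n, p, universe):
    """P(D1(F) | n atoms): exactly one atom decays and it has the given id."""
    # Assumed model: the named atom is in the sample with chance n / universe.
    in_sample = Fraction(n, universe)
    # It decays while the other n - 1 atoms in the sample do not.
    return in_sample * p * (1 - p) ** (n - 1)

p = Fraction(1, 1000)        # hypothetical decay probability per atom
universe = 10**6             # hypothetical number of possible ids
n_small, n_many = 10, 500    # hypothetical sample sizes for S and M

bf_d1 = p_d1(n_many, p) / p_d1(n_small, p)
bf_named = p_d1_named(n_many, p, universe) / p_d1_named(n_small, p, universe)
pf = p_d1_named(n_small, p, universe) / p_d1(n_small, p)
print(bf_d1 == bf_named, pf)
```

In this model the ratio P(D1(F) | n) / P(D1 | n) comes out as 1/universe regardless of n, which plays the role of PF, so the two Bayes factors agree exactly.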

113 Comments

Alex Popescu, March 11, 2021 at 4:18 AM
Hey Dmitriy,

I wrote a reply to you on Philip's site. Basically, I think the authors are saying that we know S, and we can't reason to C (the correct half-life calculation) on the basis of F, because F entails a drastically higher half-life calculation than C. That's because satisfying F requires that we meet the 1/N condition that you specified.

I also used quite sloppy language in my reply, so hopefully this makes it clearer that I think the authors are talking about coming to the correct half-life calculations in a given sample S, and not debating S vs M. I think in the end this tells us that TER must be combined with some other approaches for determining relevancy of evidence; TER is just one aspect here.

Alex Popescu, March 11, 2021 at 5:52 AM
Hey Dmitriy,

Now that I think about it more, I'm pretty sure that the authors are just saying that reasoning on the basis of Fred "blocks" us from inferring a hypothesis here, meaning that we have no reason to prefer M over S. Whereas, in reality, knowing that one atom decayed, we do have a reason to prefer one or the other (i.e. if the average decay time is longer, then S is more likely). So, by showing that the S and M calculations are the same, you've inadvertently confirmed their point. It's confusing, but I take back the claim that the authors screwed this up.

Alex Popescu, March 11, 2021 at 11:33 AM
Dmitriy,

"what I showed is that updating credences on D1(F) gives the same result as doing it on D"

Now I am the one confused here. I thought you showed this: P(D1 | M) = N * P(D1(F) | M)

Which to me basically says that the two calculations are not the same. One takes N into account; the other does not. The probability of F given M is going to be the probability that a random atom decays AND that the random atom is Fred (a 1/N chance). The probabilities are obviously different, we agree, so I'm not sure how this is supposed to demonstrate the ultimate conclusion of yours.

In any case, the real concern is that reasoning on the basis of Fred means that M is not more likely than S, correct? That's because N * P(D1(F) | M) = N * P(D1(F) | S). Since N is just related to Fred, it follows that M is no more likely than S. Because of course having a huge jar of uranium atoms increases the decay rate, but also correspondingly increases N (so that the number of ways in which the decayed atom could fail to be Fred increases). As such, they cancel out and M ends up being just as likely as S.

But notice that reasoning on the basis that some atom decayed allows us to answer how likely it was for a general atom to decay. In which case it most certainly makes a difference whether M or S is the case. With M the higher decay rate means it is more likely that a general atom will have decayed. So you do get different results! And the authors' point was that this different result, which entails that in the Fred case S and M are equally likely (and also any other hypothesis regarding the number of atoms), necessitates that we are blocked from choosing one such hypothesis over another.

My point on the other hand was different, but still problematic. Which was that if we reason on the basis of half-life observations instead, we get incorrect results. Both cases of the incorrect/blocked reasoning are troublesome and seem to demonstrate the authors' point, though I argue there are solutions of course. Do you disagree with one or both of the points I made?

Disagreeable Me, March 11, 2021 at 1:59 PM
Hi Dmitriy,

Nice analysis. At first I thought there must be some mistake in your result, but then I was able to explain it to myself.

I think the Fred argument works on the intuition that we don't have a probability distribution in mind in advance for the name of the particular atom. That the name is Fred is then supposed to be meaningless, of no consequence. It is true that the size of the sample has no impact on whether Fred will decay. But what you have shown is that this is not the end of the story, because if we learn that Fred decayed, then we also learn that Fred was in the sample, which is likelier if the sample is larger.

In the end, as you say, we get to the same conclusion. Nice work.

Alex Popescu, March 11, 2021 at 2:10 PM
Dmitriy, this statement is ambiguous: "This gifted individual knows the state of each and every uranium atom, and even has names for them"

It can mean all the uranium atoms in the sample, or all the possible uranium atoms in some hypothetical uranium heaven :)

I am pretty sure it is the former. But so what if it is not? Again, you have just constructed a particular interpretation where N does not change if this is so. But I believe that you have missed the most crucial point: the very fact that we can construct the scenario so that N changes (in the former definition) proves the authors' point (even if they themselves missed such a construction)! I gave an interpretation where our evidence does change, and that's enough to demonstrate the point (remember I only need to give one such instance). Also, if you remember, I already showed that reasoning on the particular does matter when we are trying to deduce the correct decay rate in a given sample.

In any case, we have two possible interpretations of the authors' statement: on yours they are incorrect, on mine they are correct. To me it is clear that the principle of charity requires the latter, no?

That said, do you have arguments against the fact that my updated definition for N (which of course is what I think the authors were saying anyway) shows that reasoning from Fred does matter? As well as my point about how such reasoning gives us an incorrect conclusion in the decay rate in a given sample?

Disagreeable Me, March 11, 2021 at 2:38 PM
Hi Alex,

> Fred is guaranteed to be in the sample because we are given his existence.

Where does it say that? I think that's the wrong interpretation.

> It can mean all the uranium atoms in the sample, or all the possible uranium atoms in some hypothetical uranium heaven :)

> I am pretty sure it is the former.

I'm pretty sure it's all uranium atoms that exist (i.e. in the universe).

If N is the number of atoms in the sample, such that N varies with M and S, then the size of N gives the size of the sample and so there's no reason to use the evidence of decay at all.

If we don't know the size of N, then we are obliged (I think) to work with N = the number of possible Ids which is just the number of uranium atoms in the universe.

Alex Popescu, March 11, 2021 at 4:14 PM
Hey DM and Dmitriy,

So to sum up my position succinctly; to wrap things up with a neat bow on top (as I once heard Dmitriy put it) :)

I am only asking you to grant me Dmitriy's exact same setup, but additionally specifying that N must be the number of total uranium atoms in the sample. So in other words, Fred could be taken out of the entirety of the possible set of all named uranium atoms. If you do this, you will see that the N for M and S are different, entailing that the ultimate conclusion "that reasoning based on D is the same as based on F" is false. Again, I just need to come up with a single such case like the above, to prove my point.

The only possible objection I have heard to this was DM's point that if we say that N is equal to the number of total uranium atoms, then since we already know N, we don't need the decay rate to come to the correct calculations. But this isn't actually true. That's because while we know N for any given stipulated sample, we can't know which sample set of uranium atoms is better unless we take into account the decay rate calculations (based on the time that it took for Fred to pop up). The latter is still vital to our deducing the correct number of uranium atoms. So modifying the definition of N doesn't change matters.

@DM: I've noticed, in my re-reading of my posts, that many of them can come off as brusque if not downright rude. My apologies for that, sometimes I'm really rushed or in a hurry when posting (especially when I'm conversing with you for some reason). I also have a bad habit of saying mean things like "I think you're confused" which can obviously be phrased more pleasantly. I think I took up this nasty habit from my mentors that I admire, but I'm trying to change it. Hopefully your move (to Britain no?) goes well if it has not already been undertaken.

Best,

Alex

ReasonMeThis, March 11, 2021 at 8:24 PM
Hi Alex and DM,

I just finished reworking the proof and completely got rid of N. This actually starts to look like a more general defense of TER, specifically of my contention that if we do Bayes by including known but irrelevant info, we never get an error, we get the same result we would get if we excluded such info.

This proposed counterexample doesn't work:
"I am only asking you to grant me Dmitriy's exact same setup, but additionally specifying that N must be the number of total uranium atoms in the sample. So in other words, Fred could be taken out of the entirety of the possible set of all named uranium atoms. If you do this, you will see that the N for M and S are different, entailing that the ultimate conclusion "that reasoning based on D is the same as based on F" is false. Again, I just need to come up with a single such case like the above, to prove my point."

It's true now that the answer we get on F (in my notation on D1(F)) differs from the answer based on D (D1). But it's the former that's correct, not the latter! So no contradiction with TER.

Alex Popescu, March 11, 2021 at 8:41 PM
Hey guys, there still seems to be some confusion over what I am talking about. I am not saying that we have to accept that there are N possible names outside the sample, but that nevertheless we should change the definition of N to be the number of atoms in the analysis. I am saying that if we make the additional stipulation that the authors' claim about the oracle's knowledge is actually meant to be interpreted as being about the atoms in the sample, then we can see that reasoning from the particular does make a difference. So not only should we adopt my proposed analysis, but the principle of charity would actually demand that we interpret the authors in the way I proposed.

"But then you have different values of N. You have N_m and N_s. Dmitriy's analysis just has one N. You can't really translate the findings of one analysis to another. Can you?"

Yes that's the whole point (that we get different N values for M and S). And of course you can, here's the math (I'm trying to keep as much of it as close to RMT's to avoid confusion):

Let

F = the unique id (name) of the decayed atom, i.e. Fred, where a decayed atom has a name iff it exists in the sample.
D1(X) = "Only 1 atom decayed and it had id X".
D1 = "Only 1 atom decayed".
S = the hypothesis that there is only a small number of uranium atoms in the sample
M = many uranium atoms in the sample (the numbers are not important for the logic)
N1= The number of uranium atoms in M
N2= The number of uranium atoms in S

We can quickly see as before that: P(D1 | M) = N1 * P(D1(F) | M)
and P(D1 | S) = N2 * P(D1(F) | S)

But note that N1 and N2 have different values; that's because they are dependent on the values of M and S we stipulated (we can choose any M or S that we want). As RMT said, the numbers are not important for the logic. What's important is that we know that the N values for M and S must be different. Therefore we can derive in

Step 1:

[N1 * P(D1(F) | M)] / [N2 * P(D1(F) | S)] ≠ P(D1(F) | M) / P(D1(F) | S)

Step 2:

N1 * P(D1(F) | M) = P(D1 | M)
N2 * P(D1(F) | S) = P(D1 | S)

From Step 1 & Step 2 you get: P(D1 | M) / P(D1 | S) = K * [P(D1(F) | M) / P(D1(F) | S)]

Where K = N1/N2 and K ≠ 1

Conclusion: you can't get

P(D1| M)/P(D1|S) = P(D1(F)| M)/P(D1(F)| S)

Because now the right hand side of the equation includes a constant (not equal to 1).
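The relation between the two factors in this variant can be checked numerically. This is a minimal sketch under assumptions that go beyond what is stated above: each atom decays independently with probability p, names exist only inside the sample, and exactly one atom in each candidate sample is called Fred, so that given exactly one decay the decayed atom is Fred with chance 1/N. The numbers are hypothetical.

```python
from fractions import Fraction

def p_exactly_one(n, p):
    """P(D1 | n atoms): exactly one of n independent atoms decays."""
    return n * p * (1 - p) ** (n - 1)

def p_exactly_one_named(n, p):
    """P(D1(F) | n atoms) when the decayed atom is Fred with chance 1/n."""
    return p_exactly_one(n, p) / n

p = Fraction(1, 1000)       # hypothetical decay probability per atom
n_many, n_small = 500, 10   # hypothetical N1 (for M) and N2 (for S)

bf_unnamed = p_exactly_one(n_many, p) / p_exactly_one(n_small, p)
bf_named = p_exactly_one_named(n_many, p) / p_exactly_one_named(n_small, p)
k = Fraction(n_many, n_small)
print(bf_unnamed == k * bf_named)
```

The sketch only confirms the algebra: under these assumptions the two Bayes factors differ by K = N1/N2. Which of the two factors is the correct one for this setup is what the rest of the thread goes on to debate.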

Best,

Alex

Alex Popescu, March 11, 2021 at 9:29 PM
Finally, here's another issue with a similar analysis, to demonstrate that such problems can crop up elsewhere.

Let
F = the unique id (name) of the decayed atom, i.e. Fred, where a decayed atom has a name iff it exists in the sample.
D1(X) = "At least some atom decayed and it had id X".
D1 = "At least some atom decayed".
C = the hypothesis that the decay rate in sample S is x.
S= fixed sample size of uranium atoms (let's just say 1 kg of pure uranium)
We want to know whether we should privilege C or C2 on the basis of D1 or F.

Assume C is the correct half-life calculation. Assume we observe a small time period over which a small percentage of the total number of uranium atoms decayed. We don't know what this percentage is, or how many such atoms may have decayed. We just know that Fred decayed.

We get P(C| D1) = P(C| D1(F)) *K

Where K is the value of the constant needed to make up for the fact that D1(F) occurring, given C, is much rarer than D1 occurring. So we see that we do get different results regarding the likelihood of C being the case depending on whether we use D1 or D1(F). If P(C| D1) is correct, then P(C| D1(F)) can't be.

To put this another way:
The probability that C has a high value (high decay rate) is going to be higher given D1(F) than just D1. So using the former would predict a huge decay rate, on account of the fact that we would presume that many such atoms regularly decay to explain the improbability of our observing D1(F) in that small time period.

Obviously, P(C| D1) is correct since our knowing the information about Fred is completely irrelevant. As TER demands we use Fred, this is a real problem. If we don't add a qualifier that TER is about taking into account all *relevant evidence, then we are in trouble because we have no way to discard D1(F).

Alex Popescu, March 11, 2021 at 9:32 PM
If you guys wish, I can do a formal analysis of the "this planet" objection as well, because I think it works too (and is actually easier to follow). I can come up with many other such examples as well. But I'll stop here, in case you want to address what I wrote.

ReasonMeThis, March 11, 2021 at 10:48 PM
Alex,

I don't get the "Why?..." part at all. But I am pretty sure the conclusion in your first paragraph above is not right.

"You still didn't address the point, I feel, that P(M) =P(S) given D1(F)". I agreed above that under your assumptions about N1 and N2 that conclusion follows from TER. But I added that, contrary to what the authors say, that's not a problem for TER. It would only be a problem if that conclusion was incorrect under your assumptions. But it is correct, and the answer based on noTER is incorrect, showing that it is noTER, not TER, that has a problem.

ReasonMeThis, March 11, 2021 at 11:28 PM
Hey Alex,

Let's focus on the first part first, I think it's vital:
"Are you sure you want to say that? :)

Remember that it is stipulated that only one atom decayed in the specified time interval, which was Fred. Are you really trying to argue that we can't, on the basis of that information, deduce whether M or S is more likely?"

Yes:) Consider
S = one atom, Fred
M = two atoms, Fred and Franny
Suppose the priors are 50-50. Suppose the decay rate is small.

If you are told "Franny decayed", the correct posteriors are 100:0.
If you are told "Fred decayed", the correct posteriors are 50-50.
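The Fred and Franny numbers can be reproduced with a few lines. This is a minimal sketch, assuming each atom decays independently with a small probability p (the value 1/1000 is hypothetical) and that the names in each candidate sample are known in advance. The last line, the update on "exactly one atom decayed" with the name withheld, is added here only for comparison.

```python
from fractions import Fraction

def posterior_m(prior_m, like_m, like_s):
    """Posterior probability of M from its prior and the two likelihoods."""
    weighted_m = prior_m * like_m
    return weighted_m / (weighted_m + (1 - prior_m) * like_s)

p = Fraction(1, 1000)      # hypothetical small decay probability per atom
prior_m = Fraction(1, 2)   # 50-50 priors, as in the example

# "Franny decayed": impossible under S (no Franny), probability p under M.
post_franny = posterior_m(prior_m, p, Fraction(0))
# "Fred decayed": probability p under both hypotheses.
post_fred = posterior_m(prior_m, p, p)
# "Only Fred decayed": under M, Franny must also have survived.
post_only_fred = posterior_m(prior_m, p * (1 - p), p)
# "Exactly one atom decayed", name withheld: two candidates under M.
post_unnamed = posterior_m(prior_m, 2 * p * (1 - p), p)
print(post_franny, post_fred, post_only_fred, post_unnamed)
```

With small p, "only Fred decayed" leaves the posterior for M within a hair of 1/2, while the nameless datum would push it to roughly 2/3, so in this setup the name does carry information.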

ReasonMeThis, March 12, 2021 at 12:22 AM
In my example, adding "only" doesn't change the analysis because of my stipulation that the decay rate is small.

In your example, this is no longer true:
" The problem is that:

"D1(F)| M is just as likely to occur as D1(F)| S."

As a result, the TER doesn't reach the erroneous conclusion that S and M are 50-50.

ReasonMeThis, March 12, 2021 at 1:29 PM
Hey Alex,

It seems like by D(F) you meant something other than D1(F) because D1(F) certainly entails V.

The oracle tells us D1(F), and the authors erroneously argued that using the full info the oracle gives leads to an error. Do you agree that even in your scenario using D1(F), not D(F), is ok?

ReasonMeThis, March 12, 2021 at 6:03 PM
Hey Alex,

I got confused very quickly, can we do this in tiny chunks?

"Because they want TER to be able to tell them which of these, D(F) or D1, is better. And it seems like TER can't do that."

Why do you think that is what they want? D(F) doesn't entail D1, or vice versa, so what does this have to do with TER? TER doesn't talk about choosing between pieces of evidence A and B when neither of them entails the other.

ReasonMeThis, March 12, 2021 at 9:32 PM
"This isn't the end story, because one can say that our using D(F) and V (or D1(F) for simplicity) still gets us the results we want even if we don't have D."

That was exactly the point of my post. My contention was that the authors erroneously disagree with this. Do you agree that they do claim using D(F) and V blocks the correct inference?

Let's get through this little chunk, and after that I’m very curious to hear what undesirable consequences this would have, and in what sense TER would be incomplete.

Alex Popescu, March 12, 2021 at 11:42 PM
One possible objection I can anticipate to the above is that TER isn't meant to apply to compound propositions. If A entails part of B, we can't use TER to discard that part of B that A entails. This, in effect, is what you have been doing by trying to deflect the authors' criticism by incorporating the evidence of D into some other thing, be it D1F (only one atom decayed and it was Fred) or V (only one atom decayed).

However, this objection relating to compound propositions cannot work. Firstly, because such compound propositions are equivalent to the conjunction of their component parts. So, we might ask why we could reason from the former but not from the latter. If we reason from the latter, then we would see that interpretation A requires us to get rid of the conjunct that is entailed by the stronger piece of evidence.

Furthermore, we can just stick the weak piece of evidence into a larger proposition in order to save it. For example, in the multiverse argument, we could embed the evidence of 'some universe is fine tuned' in the proposition 'some universe is fine tuned and 2 + 2 = 4'. Then we can't use TER to discard the combined proposition, and we would be stuck with the wrong conclusions. So we do have to adopt B after all.

ReasonMeThis, March 13, 2021 at 12:16 AM
Hey Alex,

I don't think we need anything like B. A is closer to my position, but it seems a bit imprecise to me. I would propose

C) TER is meant to be interpreted absolutely: Meaning that it is never erroneous to use Bayes with the total evidence. In our case that means using everything the oracle said, D(F) & D1.

I think our discussion has been impeded somewhat by a lack of precision on both our parts, so let me make sure you know what I mean by "use": I mean "perform Bayesian update on". And I switched to D1 from V, because I am not sure what V is now. D1 = "Exactly one atom decayed". So now the precise technical meaning of C is:

----------------------------------------------------------------------------------------------------
C) TER: Given a situation with competing hypotheses M and S, the correct Bayes factor to update odds ratio M:S is always equal to

P(E | M) / P(E | S), where
E = total evidence available in the situation
----------------------------------------------------------------------------------------------------

In our example E = D(F) & D1. Importantly, note that C doesn't prescribe HOW that Bayes factor must be calculated. In particular, we can first update just on D(F) and then on D1, or the other way around, or on both simultaneously. That is a question of the most convenient technique, and has nothing to do with C.
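That the order of updating does not matter follows from the chain rule, P(A & B | H) = P(A | H) * P(B | A & H). A minimal sketch with a made-up likelihood table (the numbers, the binary events A and B, and the helper `prob` are all hypothetical and have nothing to do with the actual decay probabilities):

```python
from fractions import Fraction as Fr

# likelihood[h][(a, b)] = P(A=a, B=b | h); each row sums to 1.
likelihood = {
    "M": {(1, 1): Fr(1, 10), (1, 0): Fr(3, 10), (0, 1): Fr(2, 10), (0, 0): Fr(4, 10)},
    "S": {(1, 1): Fr(3, 10), (1, 0): Fr(1, 10), (0, 1): Fr(1, 10), (0, 0): Fr(5, 10)},
}

def prob(h, a=None, b=None):
    """P(A=a, B=b | h), marginalising over any argument left as None."""
    return sum(
        pr for (x, y), pr in likelihood[h].items()
        if (a is None or x == a) and (b is None or y == b)
    )

# Update on A & B in one step.
joint = prob("M", 1, 1) / prob("S", 1, 1)

# Update on A first, then on B given A.
factor_a = prob("M", a=1) / prob("S", a=1)
factor_b_given_a = (prob("M", 1, 1) / prob("M", a=1)) / (prob("S", 1, 1) / prob("S", a=1))

# Update on B first, then on A given B.
factor_b = prob("M", b=1) / prob("S", b=1)
factor_a_given_b = (prob("M", 1, 1) / prob("M", b=1)) / (prob("S", 1, 1) / prob("S", b=1))

print(joint, factor_a * factor_b_given_a, factor_b * factor_a_given_b)
```

The intermediate factors differ between the two orders, but their product is the same total Bayes factor in every case.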

So now, do you feel your objection against A applies to C? I don't think it does:

"We are forced to always consider the likelihood of D(F) being true; which means we can't come to the correct conclusion regarding the sample size (P(M) = P(S))"
Given the technical definition of C, we definitely can.

ReasonMeThis, March 13, 2021 at 1:35 AM
First can we clarify this:
" it is not enough to show that updating based on the total evidence can give correct credence's (I agree with this)."
If you agree with this then it logically follows that if updating on partial evidence gives different credences, then they must be incorrect.

ReasonMeThis, March 13, 2021 at 9:38 AM
Hey Alex,

I think if we analyze your case carefully, all problems will disappear:
"
In our case we can construct a simple scenario:
T = D(F) & D1
Y= This atom's name was Fred, and it had a 1/N chance of forming.

Now suppose we took our T and subtracted Y from it. Call this new hypothesis absent Y, the X hypothesis. According to you, it can never be wrong to update X using Y. But in this case updating based on Y would mean we have to take into account a new condition (1/N) which screws up the calculation. Of course we don't update based on Y, because we all understand that Y is irrelevant. But if you don't stipulate that TER is just about *relevant evidence, you will run into problems like the above.
"

- I don't quite understand what you mean by forming and N. Is N the unknown number of atoms?
- Most importantly, whatever Y exactly means, it doesn't seem to be entailed by D(F)&D1

ReasonMeThis, March 13, 2021 at 4:23 PM
Ok, let's keep things as separate as possible: I propose Y shouldn't be part of D1&D(F),
Y = "Probability of an atom to be named Fred is 1/N, where N = number of atoms in the sample"
D(F) = "One of the decayed atoms is named Fred"
D1 = "Exactly one atom decayed"

Now the crucial part is: In my article I assumed Y to be the background knowledge, so the two Bayes factors I proved to be equal were:

factor for updating from knowing Y to knowing Y&D(F) EQUALS
factor for updating from knowing Y to knowing Y&D(F)&D1 (C1)

Why like that? First, because in the story what the oracle gave us was, I felt, the quantifiable statement D(F)&D1, while Y, the information about the setup, was known separately. And second, because it doesn't matter in the slightest for the specific purpose of proving that using all info doesn't mess things up: claim (C1) is mathematically equivalent to

factor for updating from knowing 0 to knowing Y&D(F) EQUALS
factor for updating from knowing 0 to knowing Y&D(F)&D1 (C0)

and also equivalent to

factor for updating from knowing Y&D(F) to knowing Y&D(F)&D1 EQUALS 1 (C2)

I feel like you conceived of the whole situation somewhat differently, but do my definitions make sense to you, and do you agree that all three claims above are true?

ReasonMeThis, March 13, 2021 at 6:22 PM
Ok, I understand now that your objection is specifically about updating on T-Y vs on T, but one key part of your objection is still unclear to me. Are you saying that:
1. T-Y gives the correct answer for M:S; T gives a different, incorrect answer, OR
2. T gives the correct answer; T-Y gives a different, incorrect answer even though Y is irrelevant

I have good responses for both, I am pretty sure :)

Alex Popescu, March 13, 2021 at 8:28 PM
Forgive me for diving into some semantics here, but I think it very important that we specify exactly what we think the function of TER is, and what it should be doing.

We seem to agree that TER tells us to take into account Y as evidence, in the case of T-Y vs T. So what does 'taking into account' evidence really mean? Typically, philosophers just think that evidence is any piece of information which modifies the likelihood of the hypothesis being correct. Or, in other words, has bearing on the probabilistic outcomes of the hypothesis. This definition suits us well for our conversation today.

In the multiverse case, we see that the additional piece of information, that the universe which was fine tuned was 'our universe' (call this extra information F), must be taken into account when we invoke TER. But if we are arguing on the basis of the likelihood that the multiverse makes some universe fine tuned, then F wouldn't qualify as evidence. F fails to qualify because it doesn't modify the likelihood of the former being correct. Therefore, because we know that F *must be evidence, per TER, we have to introduce an extra condition which modifies our hypothesis. This extra condition is that our hypothesis must take into account the likelihood of F being the case (and we see that the multiverse doesn't make F more likely).

Analogously, taking into account Y at first doesn't seem to impact the likelihood of the hypothesized sample size being correct. That's because the extra information about Fred has no obvious bearing on the likelihood of M or S being correct. But, similarly to the multiverse case, TER tells us that Y *must be construed as evidence (i.e. it must impact the likelihood of our hypothesis). The only way to get around this (just like in the multiverse case) is to modify our hypothesis by introducing the extra condition of 1/N, which our hypothesis must now meet.

Once we do so we can see that we are blocked in our estimates as to the correct sample size; all because we relied on TER to inform us what pieces of information should be counted as evidence or not.

ReasonMeThis, March 13, 2021 at 9:22 PM
And I have argued that if we do the analysis rigorously then T turns out to give the right answer, which would dissolve 1. I think we can quickly show that for your example. I claim that whenever using T results in no change for M:S, then it's not because TER has an issue - it's because no change in the odds is actually the provably correct answer for that situation.

With your example there are two possibilities:
a) If the following two assumptions hold:
a1. The chance of more than one atom decaying is negligible on M,
a2. Exactly one atom in the sample is Fred.

then the result of no change for M vs S, given by T, is actually the right answer in that situation, and 1 is dissolved.

b) The two assumptions above don't hold. In that case T does not result in no change for M vs S.

Which possibility do you want? If it's b, then we need to know what replaces those two assumptions.

ReasonMeThis, March 13, 2021 at 9:36 PM
I have always used the word evidence in this discussion to mean any known information. I then distinguish between relevant evidence, which modifies credences, and irrelevant evidence, which doesn't. I take TER to mean the correct credences are given by updating on all evidence, which by definition also implies that the same correct credences are given by updating on all *relevant* evidence.

We can demonstrate that pretty easily for your example if you pick possibility a.

ReasonMeThis, March 13, 2021 at 11:29 PM
No, Y is not irrelevant, here's the calculation, with b1 and b2.

P(T | M) = P(Y | M) * P(D1 | Y&M) * P(D(F) | D1&Y&M) = blah * minuscule * 1/N2
P(T | S) = P(Y | S) * P(D1 | Y&S) * P(D(F) | D1&Y&S) = blah * normal * 1/N1

Bayes factor = minuscule / normal * N1/N2 << 1

This is the correct Bayes factor, though it might seem to you that N1/N2 should not be there. But one thing is clear right away: the inference to S is not blocked since the factor is much less than one.
Disagreeable Me March 14, 2021 at 6:57 PM
I skipped this discussion as I couldn't keep up with the frequency of posts, but it looks like you guys reached agreement at the end. Good news!

So, just to confirm, we're all agreed then that Dmitriy's original analysis was correct?

If so, TER isn't in trouble from Fred. I still have my doubts about how it fares in observer selection scenarios like ObserverCoin, as I don't think I can accept that the particular identity of the observer matters when any observer could have made analogous deductions. The inference to a multiverse only from the observation of one world still seems to me like something has gone wrong somewhere, a bit like Dmitriy pointed out in his analysis of the paper's Fred argument.

So I'm wondering if a similar trick could be employed there to what Dmitriy did here for Fred, to show that actually the particular identity of the observer does not matter. If it's there, it'll be subtle.

Unfortunately I can't port the analysis over. It's not really analogous to Fred because (1) we can only ever observe one world and (2) the whole enterprise is predicated on the fact that the observer's world was created in the first place.
Alex Popescu March 14, 2021 at 9:32 PM
Also, I now realize why Y has the counterintuitive effect of making our credences in S even stronger. It's because Dmitriy did his analysis by starting with our knowledge of Y, which technically didn't represent the actual scenario, where we learn Y last. If we constructed the scenario in the manner I envisioned, we would see that Y is irrelevant and doesn't change things.
If:

P(T | M) = P(D1 | M) * P(D(F) | D1&M) * P(Y | D1&D(F)&M) = minuscule * Blah * Blah
P(T | S) = P(D1 | S) * P(D(F) | D1&S) * P(Y | D1&D(F)&S) = normal * Blah * Blah

The knowledge of Y, incorporated in the third factor, contributes nothing to the ratio, since the evidence we had has no bearing on the likelihood of Y being true. So the Bayes factor remains the same (minuscule/normal), which accords with our intuition that Y shouldn't change things. So why then does reversing the order and starting with Y change things? That's because Dmitriy's analysis assumes that we started with the knowledge of Y. Our knowing Y first would indeed modify the probabilities of D(F).

Without Y, asking what the probability of D(F) is would be a useless question. Our knowing Y first means that we knew that there could possibly exist a decayed atom named Fred, which had a 1/N chance of forming. Therefore, the later incorporation of the knowledge that our first atom is named Fred is an extraordinarily improbable event. The conjunction of the two events D(F) and D1 is now more improbable than just D1, so it makes sense that knowledge of Y (if we start with it) modifies the probabilities even further in S's favour.
ReasonMeThis March 14, 2021 at 10:46 PM
Hey Alex,

"You can see my fallacy of reasoning I'm sure. The principal difference is that P(D1 | Y&M) was low when we needed it to be high, and P(D1 | Y&S) was higher when we needed it to be lower. Therefore the introduction of the knowledge of Y compounded, and did not reverse the Bayes Factor. Again, I still find it bizarre that knowledge of Y should somehow increase our credence in S by an even bigger factor. Let me know if I am misinterpreting that.

Sorry for all the comments on your blog; I hope I didn't ruin your site. You'll have to give me some time, but I have every intention of fulfilling my promise I laid out in my last email. As soon as I'm able, I'll get right onto the noble task of buying a hat, and well you know the rest...."

Haha:) And no, the page is not too slow. I'm wondering if eventually Blogger would create a "see older/newer comments" feature like what appeared on Philip's blog. And about why knowledge of Y should increase our credence in S further, it's probably not super critical to get a great intuitive feel for this, but I think I can give a reasonable explanation.

First, it's important to realize that it's the combination of Y with D(F) that creates this effect; either one by itself wouldn't. Then let's remember exactly what Y amounts to: "a sample of either size contains exactly one Fred". That statement couldn't possibly be known for all names (id numbers); in fact, it could only be known for at most N1 names. As an example, suppose it was known for N1 names. Then, because hearing D(X) for X NOT one of those N1 names would immediately increase our credence in M to 100%, it stands to reason that if X IS one of those names (such as Fred), hearing it should immediately decrease our credence in M.
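
The "stands to reason" step can be checked with a toy calculation. This is only a sketch under stated assumptions: the values of N1, N2 and the 50/50 prior are hypothetical, the decayed atom is assumed equally likely to be any atom in the sample, and only the naming factor is modelled (the P(D1) factor from the earlier calculation is deliberately left out).

```python
from fractions import Fraction

# Hypothetical placeholders: S has N1 atoms, M has N2 atoms, and Y is
# known for exactly N1 names (each such name appears once in either sample).
N1, N2 = 10, 1000
prior_M = Fraction(1, 2)
prior_S = 1 - prior_M

# Chance that the decayed atom carries one of the N1 known names.
p_known_M = Fraction(N1, N2)   # under M most atoms carry other names
p_known_S = Fraction(1)        # under S every atom carries a known name

def posterior_M(like_M, like_S):
    """Bayes' rule for M against S given the two likelihoods."""
    numerator = prior_M * like_M
    return numerator / (numerator + prior_S * like_S)

post_if_known = posterior_M(p_known_M, p_known_S)            # e.g. "Fred"
post_if_unknown = posterior_M(1 - p_known_M, 1 - p_known_S)  # some other name

# The average of the two posteriors, weighted by how likely each report is,
# must equal the prior, so if one report raises M the other must lower it.
p_known = prior_M * p_known_M + prior_S * p_known_S
expected_posterior = (p_known * post_if_known
                      + (1 - p_known) * post_if_unknown)

print(post_if_known, post_if_unknown, expected_posterior)  # 1/101 1 1/2
```

In this toy setup an unknown name sends the credence in M to 1, a known name drops it from 1/2 to 1/101, and the expected posterior stays at the prior.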
ReasonMeThis March 14, 2021 at 11:23 PM
Hey Alex and DM,

Just a couple of quick points:
"So I'm wondering if a similar trick could be employed there to what Dmitriy did here for Fred, to show that actually the particular identity of the observer does not matter. If it's there, it'll be subtle."
In some sense it doesn't, but not in the way that helps Steven's camp. It's probably better to just do Bayes carefully. With Alex's analysis:
"
Then P(T|M) = P(O|M) * P(D| O & M)
And P(T|S) = P(O|S) * P(D|O & S)
",
first we should remember that D&O = D, so mathematically there is no need to go through the intermediate step of O at all. The math then simplifies significantly.

Secondly, I would say it's tricky and potentially confusing to say that T = D&O (=D). We have to be super duper careful how exactly we interpret D. This is easier to express in the IVF case:
D1 = the roll of the dice for my embryo was lucky
D0 = my embryo was picked / created to do dice rolls on in the first place

I think it's easy to make the mistake of interpreting T as just D1, but it should be D0&D1.
Disagreeable Me March 15, 2021 at 2:50 PM
Thanks guys.

> Since Dmitriy showed that taking into account the total evidence can't be wrong,

I don't see how he can have shown that. He showed that taking TER into account in cases analogous to Fred doesn't lead you to the wrong conclusion after all. That doesn't mean that TER works in anthropic/observer selection cases, as these are not perfectly analogous to Fred.

As I mentioned on Philip's blog, I think I'm done discussing TER. I'll just sum up by saying that it seems to work for non-anthropic scenarios, but I'm not convinced it works for anthropic scenarios. Doing Bayes carefully doesn't really answer the question, as whether you think TER applies will influence how you think Bayes ought to be done.

TER is a principle which we have good reason to adopt, I agree. But the inference from "my universe exists" to "many universes probably exist" with no other evidence to support it other than the fact that your universe would be more likely to exist if lots of universes exist seems to me to be so backwards that I'd be inclined to adopt an additional clarifying principle something like "When applying TER in cases of anthropic reasoning, the specifics of an observer's identity are not counted as evidence". I'm not sure I've worded that perfectly, but the idea is just that you shouldn't count as relevant evidence specific details of your circumstances that have no bearing on the general form of the argument. This is just an assertion grounded in my intuitions, but then I would say so are principles such as TER.

If there is no subtle analytical trick by which it can be shown that the observer's identity is as irrelevant as Fred's, I'm not sure how one could argue for or against that view except by appeal to intuition. I can understand the appeal of trying to settle the question with some sort of ensemble analysis, but I think that also runs into problems because I think your intuitions about this are going to be baked into how you conduct the analysis, and because we can only conduct an ensemble analysis from a God's eye view when we're trying to decide what an observer within the ensemble should believe.

Again, I think it may be as intractable as the Sleeping Beauty problem.

I've done a bit more work on my ensemble simulation, so I do expect to get back on that at some point.
ReasonMeThis March 15, 2021 at 6:19 PM
DM: "TER is a principle which we have good reason to adopt, I agree. But the inference from "my universe exists" to "many universes probably exist" with no other evidence to support it other than the fact that your universe would be more likely to exist if lots of universes exist seems to me to be so backwards that I'd be inclined to adopt an additional clarifying principle something like "When applying TER in cases of anthropic reasoning, the specifics of an observer's identity are not counted as evidence". I'm not sure I've worded that perfectly, but the idea is just that you shouldn't count as relevant evidence specific details of your circumstances that have no bearing on the general form of the argument. "

I wonder what you think of this part, and the comment above it:
"
Or, to be more precise,

T = D0 & D1 & irrelevant info about my hair color etc.

DM, it's this last part that makes it true that the details of the observer's identity don't matter, the relevant bit is D0, just the fact that my embryo was picked / created."

In other words I agree that the details are actually irrelevant. But I think we disagree about D0.

---------
Alex,

my intuition is completely the opposite: I think it doesn't matter in what order you learned facts A and B. The credences should only be determined by the fact that you now know A&B.
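
For plain Bayesian conditioning this is easy to check numerically. The sketch below uses an arbitrary, made-up joint distribution over a hypothesis H and two facts A and B; nothing about it is specific to Fred, and the weights are placeholders.

```python
from fractions import Fraction

# Hypothetical joint weights over (H, A, B); any positive weights would do.
weights = {
    (True, True, True): 1, (True, True, False): 3,
    (True, False, True): 2, (True, False, False): 4,
    (False, True, True): 6, (False, True, False): 1,
    (False, False, True): 2, (False, False, False): 1,
}
total = sum(weights.values())
joint = {outcome: Fraction(w, total) for outcome, w in weights.items()}

def condition(dist, index, value):
    """Condition a distribution on the variable at `index` equalling `value`."""
    kept = {k: p for k, p in dist.items() if k[index] == value}
    norm = sum(kept.values())
    return {k: p / norm for k, p in kept.items()}

def prob_h(dist):
    """Total probability of the outcomes in which H is true."""
    return sum(p for k, p in dist.items() if k[0])

# Learn A first and then B, or B first and then A.
a_then_b = prob_h(condition(condition(joint, 1, True), 2, True))
b_then_a = prob_h(condition(condition(joint, 2, True), 1, True))

print(a_then_b, b_then_a)  # 1/7 1/7
```

Both orders land on P(H | A&B), because conditioning twice just keeps the outcomes where A and B both hold and renormalises.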
Alex Popescu March 15, 2021 at 10:09 PM
I think it might be helpful to clarify that
A) Our universe must meet *some* identity condition(s).
B) Every object or class of object must meet *some* identity condition(s).

A means that there must be some necessary condition which our universe satisfies and which makes it distinct from every other possible (or actual) universe. To refer to our universe is to refer (by default) to the universe which meets this/these unique condition(s). But notice that this condition can be as simple as the unique space-time coordinates of our own universe.

So if our universe exists within a multiverse, then our universe is the universe which has the space-time coordinates X within the multi-dimensional inflationary landscape. What's important is that we only need one such condition, so Dmitriy's argument just entails that extra metaphysical conditions which may be unique to our universe are unnecessary. I'm getting the sense that I should also stress B, because it seems to me, DM, that you believe that our including particular evidence, or our referring to a particular entity, somehow inherently involves special pleading.

But it makes little sense to say that we can't reason on the basis of details of identity, because every possible and actual entity has identity conditions. There are identity conditions tied to the multiverse, and to every other possible/actual universe within the multiverse (i.e. their own space-time coordinates). Thus, to say that referring to an entity on the basis of its identity conditions is to necessarily invoke specialness on that entity is, I think, confused.

Rather, what makes an entity special is not its holding identity conditions (because every entity does so), but rather its holding certain *types* of identity conditions. I earlier pointed out that every universe within the multiverse has its own identity condition unique to its location. To make a claim that our universe is special is just to say that additional identity conditions are warranted for our universe which go above and beyond normal universe conditions. For example, if we specified that every other universe can be referred to by its location within the multiverse ensemble, but that our universe can only be referred to as the thing which meets condition x (where condition x could be the fact that earth exists), then we are engaged in special pleading on the part of our universe.

But notice that we're doing no such thing. No special pleading is therefore committed because our universe is construed as belonging to a general reference class that includes all other possible/actual universes. The same applies to the particular observer; we aren't invoking special identity conditions exclusive to the observer in question (i.e. she has to have black hair), but using the same type of condition relevant to all other observers (i.e. their location within the multiverse).

The bottom line is that it would actually be special pleading to argue that reasoning on the basis of certain identity conditions (i.e. space/time coordinates within the multiverse ensemble) is erroneous, but reasoning on different identity conditions like in the 'some universe' case is okay. Why can we talk about the identity details of specific types of objects (i.e. universes) and specific types of embryos (i.e. the embryos in the IVF trial) but not specific types of universes or more narrow categories of embryos (if you do in fact object to the IVF case)? Some reasoning must be given for that, otherwise that's just special pleading in the extreme.
Disagreeable Me March 15, 2021 at 11:10 PM
Distinguishing embryos by order in the trial means that you're taking the ordinal of the embryo as evidence to be reasoned with, even though that's just a specific detail that is only of any significance because it is tied to your identity. That's exactly the kind of detail I want to throw out in anthropic reasoning.

My own ideas on identity are radical, and I'm not trying to bring them into the debate. But I don't actually agree that our own universe or any object must meet some identity conditions. I don't think identity is anything more than a useful concept in practical situations. I don't think it's a real metaphysical thing. I don't think there's any fact of the matter as to whether I'm the same person I was yesterday, for example. You say that our universe is a universe with certain coordinates, but I think you run into two problems -- (1) there is no objective basis from which to define objective coordinates, so it's not clear to me that our universe would indeed have objective coordinates in an inflationary string landscape and (2) in some cosmologies (e.g. the MUH), universes are not merely at different coordinates but completely disconnected.

I'm not accusing you of engaging in special pleading. If anyone is engaged in special pleading it's me. I'm saying I'm happy to adopt TER except in the special case of Anthropic reasoning, where I plead that we may not use the evidence of our specific identity to conclude the kinds of things you want to conclude.

> Why are some identity conditions special/off-limits, but not others? Some reason has to be given for that.

You can use any conditions you like as long as you're not bringing them in just because they're "your" conditions and for no other reason than that. Fine tuning is a condition I think it's OK to bring in because it's interesting/surprising from an (imaginary) objective standpoint and not just because it's a feature of your universe.

Otherwise, and again, this is not strictly analogous, but the mistake you seem to me to be making is being amazed that of all the UUIDs you could have generated just now, the one you got was e9983ab1-2a56-41b8-892b-887c5285e507. The problem with being amazed by this is that there is nothing at all significant about it other than the fact that this is the one that was just generated. The same thing happens, I feel, when we take as significant any property of our perspective that is only significant because it is a property of our perspective. If we must not take it as significant then we must, I feel, take it as irrelevant, which means not taking it as evidence in Bayesian arguments.
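
For anyone who wants to see the UUID point in numbers, here is a small Python sketch. It assumes the UUID in question is a random (version 4) one, which has 122 random bits; whichever value comes out, its prior probability was the same tiny number as for any other value.

```python
import uuid

# A version-4 UUID has 128 bits, 6 of which are fixed by the format,
# leaving 122 random bits.
RANDOM_BITS = 122
p_specific = 2.0 ** -RANDOM_BITS

generated = uuid.uuid4()

# Every possible value was equally improbable beforehand, so the particular
# value printed here carries no extra significance on that basis alone.
print(generated, p_specific)
```

The improbability figure is identical for every possible output, which is why the particular value by itself is not surprising.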
ReasonMeThis March 16, 2021 at 2:06 AM
Hey guys,

I will try to organize my thoughts on how to justify including D0 as part of the evidence into something coherent enough to make a new post. I think it can always be denied that this is the right thing to do without ending up in a contradiction, but I think the intellectual price tag on this denial is huge.

It's important to realize that the inclusion of D0 is equivalent to the ensemble principle, and moreover equivalent to the Self-Indication Assumption. Of course it's not blasphemy to deny SIA, Bostrom for example does argue against it in Anthropic Bias.
Disagreeable Me March 16, 2021 at 2:09 PM
Hi Alex,

> So an object/class having identity conditions is a necessary feature of our being able to refer/quantify over such a class

Agreed that it's a question of semantics and with the above. But that being the case, different people or different conversations can use different identity conditions. My point being that there is no fact of the matter over what the right identity conditions are. So anything you consider to be an identity condition, I might consider not to be and vice versa.

But this is neither here nor there. I was just checking that you weren't taking for granted assumptions about identity we may not all share. Perhaps you aren't.

> when it's actually not relevant/significant, is the same thing as saying that we are engaged in special pleading.

I don't think so, but now we're just arguing about the semantics of "special pleading". Special pleading is agreeing that some principle applies in general, but not in some specific case. That is what I am arguably doing in accepting TER in general but not in the specific case of anthropic reasoning. I don't think I can cast you as engaging in special pleading. There is no general principle you're accepting but rejecting in some special case. You're not even taking this universe to be special in any objective sense. We all agree that it is special from the subjective perspective of the observer. What I disagree with is reasoning from this subjective perspective and treating it as more than "some universe" by using its specific identity as evidence, when from an objective perspective it would be interchangeable with any other universe in some class without changing anything at all about the reasoning.
Alex Popescu March 16, 2021 at 3:43 PM
Hey DM,

"Any inference that we make knowing only that some observable is some fungible instance of a class should not be blocked *just because* we know that it is a specific instance of the class"

Since you define fungible as fungible with respect to the hypothesis, this just means that every instance of a class must confirm the hypothesis. Which is another way of saying that: "Any inference based on a particular cannot invalidate/block a logically necessary hypothesis about that general class". That's because contingent hypotheses about a class could be potentially overturned by particular evidence.

That's an obvious truth, but one which isn't so helpful here on account of its generality. A more helpful and narrower construal would be "Any inference based on a particular cannot invalidate/overturn a hypothesis about the general class holding a necessary property; assuming the hypothesis is sound". The latter type of hypothesis is still logically necessary (meaning it is either necessarily true or false); while being a more relevant sub-category.

So the reason knowing the identity of the president cannot invalidate the hypothesis about all human presidents being mortal is that such a hypothesis is logically necessary (assuming that humans can't be immortal by definition); it has nothing to do with reasoning based on the particular being special or different.

If, on the other hand, we had a contingent hypothesis, such as one where we try to estimate the half-life of a particle based on it having decayed in a sample consisting of a mix of U-235 & U-238 & Plutonium & Thorium nuclides, then particular evidence overrules inferences based on general evidence. Suppose we only knew the general evidence "some atom in the sample decayed"; this, at best, gives a 1/4 chance that the half-life of the atom is w, x, y, or z. Once we know "this specific plutonium atom decayed", however, we can give a better probabilistic inference which overturns the previous one.

This isn't going to be true for all particular evidence in all cases of contingent hypotheses obviously, because not all such evidence is relevant (e.g. knowing that our atom exists at coordinates x,y,z). But the way to see whether such evidence is relevant is to do the math and find out if the probabilistic outcomes are affected, and so TER tells us that taking into account irrelevant evidence can never be bad because it's just a superficial thing (it doesn't change anything).

But in our case we aren't dealing with a logically necessary hypothesis or irrelevant evidence (we all agree that D modifies the likelihood of M being true, compared to O); so we can't use your principle to argue that reasoning on the particular in those instances is blocked/invalidated.
ReasonMeThis March 16, 2021 at 4:17 PM
Hey Alex and DM, I think this puts it really well:
"This isn't going to be true for all particular evidence in all cases of contingent hypotheses obviously, because not all such evidence is relevant (e.g. knowing that our atom exists at coordinates x,y,z). But the way to see whether such evidence is relevant is to do the math and find out if the probabilistic outcomes are affected, and so TER tells us that taking into account irrelevant evidence can never be bad because it's just a superficial thing (it doesn't change anything)"

About DM's proposed principle, I don't feel I entirely understand the exact meaning of fungible. But I am curious if the principle is disconfirmed by the example with 100 sleeping patients and the doctor killing 99 if he flips tails. This, or an example like this, is one of the crazy anthropic scenarios I wanted to use to understand why and how we must include D0 (our existence). Check out my first installment: https://www.reasonmethis.com/2021/03/anthropic-reasoning-1-i-am-therefore-i.html

It starts slow, but I think it might be good to go step by step, if only to see which exact step is the one where we first start to disagree. Perhaps we could keep talking in the comments section of that new blurb, since I think this page is becoming a bit too large for browsers to handle smoothly.