Warning: this particular article is going to be quite a bit more incomprehensible than my other articles. It is part of a discussion Alex Popescu, I, and a poster that goes by Disagreeable Me have been having on some tricky aspects of probabilistic inferences, particularly in the context of accounting for fine-tuning. The following passage from a philosophy paper (Fine-tuning, multiple universes, and the "this universe" objection, by N. Manson and M. Thrush) was brought up, which makes an argument that contradicts my position:
Let us return to White’s point about total evidence. It requires some qualification. Suppose that we are given a one-kilogram sample of matter and are told to determine how many uranium atoms it contains, if any. Unfortunately, our Geiger counter is broken. Luckily for us, however, we have at our disposal an amazing resource: a uranium oracle. This gifted individual knows the state of each and every uranium atom, and even has names for them. We leave the oracle in a room with the sample and come back an hour later. The oracle tells us that just one uranium atom decayed: Fred. From the fact that Fred decayed we deduce that one uranium atom decayed. Can we proceed to use half-life calculations to estimate the number of uranium atoms in the sample? Not if we are required to reason from the fact that Fred decayed rather than the fact that some uranium atom or other decayed. Since the presence of other uranium atoms makes it no more likely that Fred should have decayed, the fact that Fred decayed doesn’t confirm the hypothesis that the sample contains the calculated number of uranium atoms. Indeed, we are not even entitled to conclude that there is more than one uranium atom in the sample. The extra information that it was Fred who decayed blocks such inferences, even though the extra information itself is quite compatible with the conclusion that there are many uranium atoms. Whatever the obligation to consider our total evidence amounts to, it should not block inferences of the above sort.
I could not disagree more with this analysis, because the alleged difficulties arising from using the more specific knowledge disappear, I claim, very quickly if we perform the probabilistic analysis rigorously. Define:
- F = the unique id (name) of the decayed atom, i.e. Fred, although in reality, given the huge potential number of atoms, we should probably imagine F to be some long id number, so maybe F=657644398765555789866778.
- D1(X) = "Only 1 atom decayed and it had id X".
- D1 = "Only 1 atom decayed".
- S = the hypothesis that there is only a small number of uranium atoms
- M = many uranium atoms (the numbers are not important for the logic)
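The definitions above can be checked numerically. What follows is a minimal sketch, not the proof from the post. It assumes a toy model in which the sample's atoms are drawn uniformly from a fixed pool of `pool_size` possible ids (the same pool under S and M), and in which each atom decays independently within the hour with probability `p`. The function names and all numbers are illustrative assumptions.

```python
from fractions import Fraction

def p_d1(n, p):
    """P(D1 | sample has n atoms): exactly one of the n atoms decays."""
    return n * p * (1 - p) ** (n - 1)

def p_d1_fred(n, p, pool_size):
    """P(D1(F) | sample has n atoms): Fred is in the sample, decays, and no other atom does."""
    # Assumption: the n atoms are a uniform draw from pool_size possible ids.
    p_fred_in_sample = Fraction(n, pool_size)
    return p_fred_in_sample * p * (1 - p) ** (n - 1)

p = Fraction(1, 1000)        # hypothetical per-atom decay probability in one hour
pool_size = 10**6            # hypothetical number of possible ids
n_small, n_many = 10, 500    # hypothetical sample sizes for S and M

# Bayes factor for M over S, once on D1 and once on D1(F).
bf_d1 = p_d1(n_many, p) / p_d1(n_small, p)
bf_d1_fred = p_d1_fred(n_many, p, pool_size) / p_d1_fred(n_small, p, pool_size)
print(float(bf_d1), float(bf_d1_fred))
```

Under these assumptions the two Bayes factors coincide, because the pool size enters both hypotheses as the same factor and cancels in the ratio.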
113 Comments
Hey Dmitriy,
I wrote a reply to you on Philip's site. Basically, I think the authors are saying that we know S, and we can't reason to C (the correct half-life calculation) on the basis of F, because F entails a drastically higher half-life calculation than C. That's because satisfying F requires that we meet the 1/N condition that you specified.
I also used quite sloppy language in my reply, so hopefully this makes it clearer that I think the authors are talking about coming to the correct half-life calculations in a given sample S, and not debating S vs M. I think in the end this tells us that TER must be combined with some other approaches for determining relevancy of evidence; TER is just one aspect here.
Hey Alex, I think this:
"Can we proceed to use [blah blah] to estimate the number of uranium atoms in the sample?" is phrased a bit weirdly but I think it's still clear that the problem is to estimate the number of atoms.
I agree; good catch! But notice that the authors screwing up this particular example doesn't invalidate the more general point of theirs. In my 'steel man' example, I think what I showed does follow. This does in my opinion tell us something interesting, namely that they've got the wrong interpretation of TER. Saying that we need to rely on all the available evidence, doesn't entail P:(if B entails A, then we should reason on B over A if possible).
I think we need to realize that we have to take into account all *relevant evidence; once we do so we can see that having an independent line of evidence for B outside A, doesn't have to dispute TER (as long as A is not relevant). Whereas, if we believe that TER is just principle P then it would dispute it, because we can show in certain cases where we have independent evidence (e.g. this planet scenario) that the principle P objection doesn't hold.
Hey Dmitriy,
Now that I think about it more, I'm pretty sure that the authors are just saying that reasoning on the basis of Fred "blocks" us from inferring a hypothesis here, meaning that we have no reason to prefer M over S. Whereas, in reality, knowing that one atom decayed, we do have a reason to prefer one or the other (i.e. if the average decay time is longer, then S is more likely). So, by showing that the S and M calculations are the same, you've inadvertently confirmed their point. It's confusing, but I take back the claim that the authors screwed this up.
Hey Alex,
what I showed is that updating credences on D1(F) gives the same result as doing it on D1. Therefore if the latter gives correct results, the former must too. Therefore this:
"Can we proceed to use half-life calculations to estimate the number of uranium atoms in the sample? Not if we are required to reason from the fact that Fred decayed rather than the fact that some uranium atom or other decayed."
is incorrect. Right? And I have to think more about your point about P and "this planet", I'm still not completely clear on it but I will reread your points and think it through. But I am nevertheless sure that it cannot lead to errors to use all available evidence, the results should be exactly the same as using all relevant evidence.
In other words, both of these: P(D1 | M) = N * P(D1(F) | M) and similarly
P(D1 | S) = N * P(D1(F) | S)
Can't be true at the same time. If P(D1 | M) = N * P(D1(F) | M) then we are stipulating an N based on M. But notice that this no longer holds: P(D1 | S) = N * P(D1(F) | S)
That's because introducing M increases the number of N compared to S; so you cannot be using the same values of N for both. You are equivocating between the two I believe.
Dmitriy,
"what I showed is that updating credences on D1(F) gives the same result as doing it on D1"
Now I am the one confused here. I thought you showed this: P(D1 | M) = N * P(D1(F) | M)
Which to me basically says that the two calculations are not the same. One takes N into account; the other does not. The probability that F given M is going to be the probability that (a random atom decays) AND that the random atom is Fred (1/N chance). The probabilities are obviously different we agree, so I'm not sure how this is supposed to demonstrate the ultimate conclusion of yours.
In any case, the real concern is that reasoning on the basis of Fred means that M is not more likely than S, correct? That's because N * P(D1(F) | M) = N * P(D1(F) | S). Since N is just related to Fred, it follows that M is no more likely than S. Because of course having a huge jar of uranium atoms increases the decay rate, but also correspondingly increases N (so that the number of possibilities that an atom couldn't be Fred increases). As such, they cancel out and M ends up being as likely as S.
But notice that reasoning on the basis that some atom decayed allows us to answer how likely it was for a general atom to decay. In which case it most certainly makes a difference whether M or S is the case. With M the higher decay rate means it is more likely that a general atom will have decayed. So you do get different results! And the authors' point was that this different result, which entails that in the Fred case S and M are equally likely (and also any other hypothesis regarding the number of atoms), necessitates that we are blocked from choosing one such hypothesis over another.
My point on the other hand was different, but still problematic. Which was that if we reason on the basis of half life observations instead, we get incorrect results. Both cases of the incorrect/blocked reasoning are troublesome and seem to demonstrate the authors' point; though I argue there are solutions of course. Do you disagree with one or both of the points I made?
Hey Alex,
I think my article turned out to be even more incomprehensible than I anticipated in the introduction :) Briefly,
N is defined as the total number of possible ids an atom can have, so it's the same for M and S.
"Now I am the one confused here. I thought you showed this: P(D1 | M) = N * P(D1(F) | M)
Which to me basically says that the two calculations are not the same. "
No, that was just an intermediate step to prove the equation that says the two Bayes factors are equal, which entails the final credence for M, say, is the same in the two calculations.
I should rewrite things to make all that clear in the post.
Dmitriy, I think I understood the analysis. I just didn't realize that you meant the values for N to be the same in both S and M. Hence why I disagreed that we reached your ultimate conclusion from what you showed (the stuff I highlighted); as for my points on that, see my post below.
Hi Dmitriy,
Nice analysis. At first I thought there must be some mistake in your result, but then I was able to explain it to myself.
I think the Fred argument works on the intuition that we don't have a probability distribution in mind in advance for the name of the particular atom. That the name is Fred is then supposed to be meaningless, of no consequence. It is true that the size of the sample has no impact on whether Fred will decay. But what you have shown is that this is not the end of the story, because if we learn that Fred decayed, then we also learn that Fred was in the sample, which is likelier if the sample is larger.
In the end, as you say, we get to the same conclusion. Nice work.
DM,
Fred is guaranteed to be in the sample because we are given his existence. So this: "because if we learn that Fred decayed, then we also learn that Fred was in the sample, which is likelier if the sample is larger"
Actually entails that there are more possibilities for atoms to be non-Fred (if we are using the most charitable definition of N), but notice this is counterbalanced by the decay rate. So I argue we do indeed get the same P(F) by both M and S.
In any case, Dmitriy is trying to show that P(D) is no different from P(F). Because M and S do yield a difference for P(D); Dmitriy must show that they yield a difference for P(F). So he's not actually trying to show that M and S yield the same conclusion for P(F)!
Dmitriy, this statement is ambiguous: "This gifted individual knows the state of each and every uranium atom, and even has names for them"
It can mean all the uranium atoms in the sample, or all the possible uranium atoms in some hypothetical uranium heaven :)
I am pretty sure it is the former. But so what if it is not? Again, you have just constructed a particular interpretation where N does not change if this is so. But I believe that you have missed the most crucial point: the very fact that we can construct the scenario so that N changes (in the former definition) proves the authors' point (even if they themselves missed such a construction)! I gave an interpretation where our evidence does change, and that's enough to demonstrate the point (remember I only need to give one such instance). Also, if you remember, I already showed that reasoning on the particular does matter when we are trying to deduce the correct decay rate in a given sample.
In any case, we have two possible interpretations of the authors' statement; on yours they are incorrect, on mine they are correct. To me it is clear that the principle of charity requires the latter, no?
That said, do you have arguments against the fact that my updated definition for N (which of course is what I think the authors were saying anyway) shows that reasoning from Fred does matter? As well as my point about how such reasoning gives us an incorrect conclusion in the decay rate in a given sample?
Hi Alex,
> Fred is guaranteed to be in the sample because we are given his existence.
Where does it say that? I think that's the wrong interpretation.
> It can mean all the uranium atoms in the sample, or all the possible uranium atoms in some hypothetical uranium heaven :)
> I am pretty sure it is the former.
I'm pretty sure it's all uranium atoms that exist (i.e. in the universe).
If N is the number of atoms in the sample, such that N varies with M and S, then the size of N gives the size of the sample and so there's no reason to use the evidence of decay at all.
If we don't know the size of N, then we are obliged (I think) to work with N = the number of possible Ids which is just the number of uranium atoms in the universe.
DM,
"Where does it say that? I think that's the wrong interpretation"
See here: "The oracle tells us that just one uranium atom decayed: Fred. From the fact that Fred decayed... Can we proceed to use half-life calculations to estimate the number of uranium atoms in the sample"
You wrote:
"If N is the number of atoms in the sample, such that N varies with M and S, then the size of N gives the size of the sample and so there's no reason to use the evidence of decay at all"
This doesn't follow at all. M and S are arbitrary, we are stipulating their size because we are trying to do this:
"estimate the number of uranium atoms in the sample"
We're just trying to know what size set is best. And what the authors are trying to show is that given any stipulated sample, M or S (for which we know the values), we can't prefer one over the other by relying on Fred.
Also, you do realize that you are arguing against the authors and your own point on the other blog that the Fred objection has merit right? I am actually trying to defend them :)
Hi Alex,
> The oracle tells us that just one uranium atom decayed: Fred. From the fact that Fred decayed
I don't take that to mean that the existence of Fred (especially in the sample) is a given.
Here, what I mean by "givens" are what we know before our evidence comes in. The way you're reading it, the fact that Fred decayed is also a given, so we are guaranteed that Fred would decay.
We don't know that Fred is in the sample, or maybe even that Fred exists, until the oracle tells us that Fred decayed. So when the oracle tells us that Fred decayed, we also learn that Fred is in the sample. This arguably gives us reason to update on the size of the sample (at least when coupled with the knowledge that we learn that Fred is in the sample because Fred decayed and that the same would go for any other atom).
> This doesn't follow at all. M and S are arbitrary, we are stipulating their size because we are trying to do this:
I don't understand what your point has to do with my point. I accept all that. Nevertheless, if you interpret N to be the number of atoms in the sample, then N picks out M vs S. The only sensible interpretation is that N is the number of possible names of atoms inside or outside of the sample.
> Also, you do realize that you are arguing against the authors and your own point on the other blog that the Fred objection has merit right? I am actually trying to defend them :)
Of course. And I could say the same to you. It would suit me if the Fred objection had merit but I no longer think it does. What this goes to show is that neither of us are simply trying to win a debate. This is not a contest. We're all just trying to find the truth. As such, I have interest in defending only arguments I think work, and I'm happy to acknowledge any point that goes against my "side".
So while I recant on Fred, I'm now retreating to the tentative position that perhaps TER works in all cases where observer selection effects don't come into play.
DM,
"Here, what I mean by "givens" are what we know before our evidence comes in."
In this case I say the evidence of Fred is given, because we need Fred to be true before we can even begin our analysis, per the authors' stipulation (they said given that the oracle has told us x, estimate y etc...). In any case it doesn't matter that our definitions are different.
"I don't understand what your point has to do with my point. I accept all that. Nevertheless, if you interpret N to be the number of atoms in the sample, then N picks out M vs S. The only sensible interpretation is that N is the number of possible names of atoms inside or outside of the sample."
I think this is backwards in terms of what is picking out what; also I am in fact saying that N is the number of atoms in the sample. My point is that we are trying to infer the sample, call it S. So we come up with competing sample sizes S1, S2, S3, and then see which is the better explanation. Notice that we are the ones trying to show which sample is correct. Therefore we stipulate the values of S1, S2 etc... From those values we can deduce the number of N (the number of atoms that Fred could have been selected from), since N just stands for the total number of uranium atoms in the sample (as I defined it).
"Of course. And I could say the same to you."
Okay got it! I just wasn't sure if you were recanting on Fred or not. Also I think we are losing sight of the big picture here (including Dmitriy); which is that it doesn't matter even if the authors are wrong! Why? Because the authors are just trying to show that they can come up with a single instance which violates their interpretation of TER. Therefore your point about this:
"> It can mean all the uranium atoms in the sample, or all the possible uranium atoms in some hypothetical uranium heaven :)
> I am pretty sure it is the former."
Is irrelevant. What matters is that I can construct a "steel man" version of their argument, wherein if N is defined to be what I said it is, if Fred is given, if you grant me all the stipulations that I ask for, and which you are arguing that the authors don't believe, then it follows that we can actually show that reasoning on the basis of Fred blocks the hypothesis selection. That is enough, because I just want to show that the author's general claim has merit. Not necessarily their specific example.
I should clarify that the S1, and S2 samples stand for the number of total uranium atoms predicted to exist in the 1 kg matter block. In case that wasn't clear
Also: I intended to quote you saying this:
"I'm pretty sure it's all uranium atoms that exist (i.e. in the universe)."
But I just ended up quoting myself, oops!
In other words, yes N is analogous to our stipulated sample. But the point is that we don't come up with S by just guessing on what we think N should be, we come up with S by trying to see if it reasonably correlates with the half-life measurements. So it's not " and so there's no reason to use the evidence of decay at all." because the evidence of decay informs S and N.
Hey DM and Dmitriy,
So to sum up my position succinctly; to wrap things up with a neat bow on top (as I once heard Dmitriy put it) :)
I am only asking you to grant me Dmitriy's exact same setup, but additionally specifying that N must be the number of total uranium atoms in the sample. So in other words, Fred could be taken out of the entirety of the possible set of all named uranium atoms. If you do this, you will see that the N for M and S are different, entailing that the ultimate conclusion "that reasoning based on D is the same as based on F" is false. Again, I just need to come up with a single such case like the above, to prove my point.
The only possible objection I have heard to this was DM's point that if we say that N is equal to the number of total uranium atoms, then since we already know N, we don't need the decay rate to come to the correct calculations. But this isn't actually true. That's because while we know N for any given stipulated sample; we can't know which sample set of uranium atoms is better unless we take into account the decay rate calculations (based on the time that it took for Fred to pop up). The latter is still vital to our deducing the correct number of uranium atoms. So modifying the definition of N doesn't change matters.
@DM: I've noticed, in my re-reading of my posts, that many of them can come off as brusque if not downright rude. My apologies for that, sometimes I'm really rushed or in a hurry when posting (especially when I'm conversing with you for some reason). I also have a bad habit of saying mean things like "I think you're confused" which can obviously be phrased more pleasantly. I think I took up this nasty habit from my mentors that I admire, but I'm trying to change it. Hopefully your move (to Britain no?) goes well if it has not already been undertaken.
Best,
Alex
Hi Alex,
> because we need Fred to be true before we can even begin our analysis
I don't know what this means. "Fred" is a name. It's neither true nor false. You could be saying that Fred must exist or that Fred must be in the sample or that Fred has decayed.
We know that Fred is in the sample at exactly the same time as we know that Fred has decayed. They have exactly the same status as givens or evidence.
> My point is that we are trying to infer the sample
If by that you mean we are trying to infer the sample size, I agree. On your interpretation, that would mean we are trying to infer N.
Well, OK, but then we can't use N in our calculations, otherwise it would be trivial to infer N. But Dmitriy is using N in his calculations. I don't think you can use this to show that Dmitriy's point is invalidated just because N changes for M vs S -- his analysis would have to change because what he means by N is not what you mean by N.
If the Oracle only knows the names of the atoms in the sample, then nothing substantial has changed, you've just made it slightly more complicated than it needs to be. There is still some set of possible names that are not known to be in the sample, it's just that now the Oracle doesn't know their names.
I don't think this changes anything. I still learn that Fred is in the sample when the Oracle tells me Fred has decayed.
I completely agree that you can steel man by setting up any scenario you want to demonstrate the point that you (or rather I!) am trying to make, but by insisting that N is equal to the number of uranium atoms in the sample rather than the number of possible uranium atoms anywhere, you're not changing the scenario so much as misinterpreting Dmitriy's analysis.
Or at least that's how it comes off to me. If you think their argument works with a different interpretation, that's fine but I think we need a fresh analysis.
> That's because while we know N for any given stipulated sample;
But then you have different values of N. You have N_m and N_s. Dmitriy's analysis just has one N. You can't really translate the findings of one analysis to another. Can you?
If you read back over Dmitriy's equations with this interpretation in mind, I don't think they make sense.
Hmm, but Dmitriy now seems to have rewritten the post not to use N at all so it's not so easy to check.
Hi Alex and DM,
I just finished reworking the proof and completely got rid of N. This actually starts to look like a more general defense of TER, specifically of my contention that if we do Bayes by including known but irrelevant info, we never get an error, we get the same result we would get if we excluded such info.
This proposed counterexample doesn't work:
"I am only asking you to grant me Dmitriy's exact same setup, but additionally specifying that N must be the number of total uranium atoms in the sample. So in other words, Fred could be taken out of the entirety of the possible set of all named uranium atoms. If you do this, you will see that the N for M and S are different, entailing that the ultimate conclusion "that reasoning based on D is the same as based on F" is false. Again, I just need to come up with a single such case like the above, to prove my point."
It's true now that the answer we get on F (in my notation on D1(F)) differs from the answer based on D (D1). But it's the former that's correct, not the latter! So no contradiction with TER.
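The contention above, that including known but irrelevant information in a Bayesian update gives the same result as excluding it, can be illustrated with a short sketch. It assumes that "irrelevant" means the extra detail multiplies every hypothesis's likelihood by the same factor `c`. The `posterior` helper and all numbers are hypothetical.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes' rule over a finite set of hypotheses."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())
    return {h: joint[h] / total for h in joint}

priors = {"S": Fraction(1, 2), "M": Fraction(1, 2)}

# Hypothetical likelihoods of the coarse evidence (e.g. D1) under each hypothesis.
coarse = {"S": Fraction(1, 100), "M": Fraction(1, 5)}

# Finer evidence (e.g. D1(F)) whose extra detail costs the same factor c under S and M.
c = Fraction(1, 10**6)
fine = {h: c * coarse[h] for h in coarse}

post_coarse = posterior(priors, coarse)
post_fine = posterior(priors, fine)
print(post_coarse == post_fine)
```

The common factor cancels in the normalisation. If the factor differs between hypotheses, the extra detail is no longer irrelevant in this sense, and the two posteriors can differ.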
I'm not sure what you mean by the former being correct but not the latter; they should both be correct no? All equations are correct in that they correctly give us what is stipulated. D1(F) for M is correct and D1(F) for S is correct, as well as D1 for S and M. N is irrelevant to the probabilities for D1; so they don't suddenly become incorrect on our changing the definition of N.
The difference is that D1(F) for M and S are equivalent. If you do the math, you'll see that the constant K works out to be the exact difference between P(D1| M)/P(D1|S). So P(D1(F)| M)=P(D1(F)| S). This was the point of the entire analysis, not that this would give us incorrect results, but that because P(D1(F)| M)=P(D1(F)| S) we wouldn't be able to choose M over S on knowing Fred.
And this is because if M is 2S, then on M we have twice the N in S. This means that the number of atoms from the pool that Fred is drawn is doubled, so that N1/N2 =2. But also note that the decay rate is twice as fast for M relative to S. So D1(F)| M is just as likely to occur as D1(F)| S.
Hey guys; there still seems to be some confusion over what I am talking about. I am not saying that we have to accept that there are N possible names outside the sample, but that nevertheless we should change the definition of N to be the number of atoms in the analysis. I am saying that if we make the additional stipulation that the authors' claim about the oracle's knowledge is actually meant to be interpreted as being about the atoms in the sample; then we can see that reasoning from the particular does make a difference. So not only should we adopt my proposed analysis, but the principle of charity would actually demand that we interpret the authors in the way I proposed.
"But then you have different values of N. You have N_m and N_s. Dmitriy's analysis just has one N. You can't really translate the findings of one analysis to another. Can you?"
Yes that's the whole point (that we get different N values for M and S). And of course you can, here's the math (I'm trying to keep as much of it as close to RMT's to avoid confusion):
Let
F = the unique id (name) of the decayed atom, i.e. Fred, where a decayed atom has a name iff it exists in the sample.
D1(X) = "Only 1 atom decayed and it had id X".
D1 = "Only 1 atom decayed".
S = the hypothesis that there is only a small number of uranium atoms in the sample
M = many uranium atoms in the sample (the numbers are not important for the logic)
N1= The number of uranium atoms in M
N2= The number of uranium atoms in S
We can quickly see as before that: P(D1 | M) = N1* P(D1(F) | M)
and P(D1 | S) = N2*P(D1(F) | S)
But note that N1 and N2 have different values; that's because they are dependent on the values of M and S we stipulated (we can choose any M or S that we want). As RMT said, the numbers are not important for the logic. What's important is that we know that the N values for M and S must be different. Therefore we can derive in
Step 1:
[N1 * P(D1(F) | M)] / [N2 * P(D1(F) | S)] ≠ P(D1(F) | M) / P(D1(F) | S)
Step 2:
N1* P(D1(F) | M) = P(D1| M)
N2*P(D1(F) | S) = P(D1| S)
From Step 1 & Step 2 you get: P(D1| M)/P(D1| S) = K[P(D1(F) | M)/P(D1(F) | S)]
Where K = N1/N2; where K ≠ 1
Conclusion: you can't get
P(D1| M)/P(D1|S) = P(D1(F)| M)/P(D1(F)| S)
Because now the right hand side of the equation includes a constant (not equal to 1).
Best,
Alex
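The relation between the two Bayes factors in the derivation above can be checked on a toy model. This is only a sketch under stated assumptions: Fred is guaranteed to be in the sample, N1 and N2 are the atom counts under M and S, and each atom decays independently with probability `p` during the observation window. The function names and numbers are illustrative.

```python
from fractions import Fraction

def p_d1(n, p):
    """P(D1 | n atoms): exactly one of the n atoms decays."""
    return n * p * (1 - p) ** (n - 1)

def p_d1_fred_in_sample(n, p):
    """P(D1(F) | n atoms), assuming Fred is known to be one of the n atoms."""
    return p * (1 - p) ** (n - 1)

p = Fraction(1, 10**9)   # hypothetical per-atom decay probability in the window
n2, n1 = 1000, 2000      # hypothetical atom counts under S and M (M is 2S)
k = Fraction(n1, n2)

bf_d1 = p_d1(n1, p) / p_d1(n2, p)
bf_fred = p_d1_fred_in_sample(n1, p) / p_d1_fred_in_sample(n2, p)
print(float(bf_d1), float(bf_fred), float(k))
```

In this model the Bayes factor on D1 equals K = N1/N2 times the Bayes factor on D1(F). The factor on D1(F) comes out close to 1 when `p` is tiny, so under these assumptions Fred's decay barely discriminates between M and S.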
Alex,
All of that is true, but it is not a demonstration that TER is incorrect. The point I was trying to make before is that if the conclusion based on including all data (D1(F)) differs from one based on incomplete data (D1), then we should adopt the former conclusion. In that sense the latter conclusion is incorrect.
Dmitriy, I agree that TER says we should adopt D1(F). But that's not the problem that the authors describe; did you read my other comment? The problem is that:
"D1(F)| M is just as likely to occur as D1(F)| S. And this is because if M is 2S, then on M we have twice the N in S. This means that the number of atoms from the pool that Fred is drawn is doubled, so that N1/N2 =2. But also note that the decay rate is twice as fast for M relative to S."
So we can't privilege M over S. Even though of course there is only one correct sample size, and we can come to a reasonable conclusion on the correct sample size if we instead reasoned from just D1.
This is why I feel my interpretation of N is even more charitable, because their conclusions don't even make sense on your interpretation of N.
Finally here's another issue with a similar analysis; to demonstrate that such problems can crop up elsewhere.
Let
F = the unique id (name) of the decayed atom, i.e. Fred, where a decayed atom has a name iff it exists in the sample.
D1(X) = "At least some atom decayed and it had id X".
D1 = "At least some atom decayed".
C = the hypothesis that the decay rate in sample S is x.
S= fixed sample size of uranium atoms (let's just say 1 kg of pure uranium)
We want to know whether we should privilege C or C2 on the basis of D1 or F.
Assume C is the correct half-life calculation. Assume we observe a small time period over which a small percentage of the total number of uranium atoms decayed. We don't know what this percentage is, or how many such atoms may have decayed. We just know that Fred decayed.
We get P(C| D1) = P(C| D1(F)) *K
Where K is the value of the constant needed to make up for the fact that P(D1(F)) occurring, given C, is much rarer than P(D1) occurring. So we see that we do get different results regarding the likelihood of C being the case if we use D1 or D1(F). If P(C| D1) is correct, then P(C| D1(F)) can't be.
To put this another way:
The probability that C has a high value (high decay rate) is going to be higher given D1(F) than just D1. So using the former would predict a huge decay rate on account of the fact that we would presume that many such atoms regularly decay to explain the improbability of our observing D1(F) in that small time period
Obviously, P(C| D1) is correct since our knowing the information about Fred is completely irrelevant. As TER demands we use Fred, this is a real problem. If we don't add a qualifier that TER is about taking into account all *relevant evidence, then we are in trouble because we have no way to discard D1(F).
Now to be fair, I'm not really sure how I feel about this. I think it's the worst of all my potential objections. Because our knowing that some atom decayed in a time interval only puts a higher probabilistically-limited bound on what the decay rate is likely to be. We're not really going to be able to derive P(C) with any degree of high likelihood from that sparse information.
Nevertheless, it still follows that using D1(F) is silly because it artificially constrains our analysis to make the higher bound of the half life extremely low. If the elapsed time is stipulated to be super low, then the higher bound may be ridiculously low (i.e. more probable than not that half life decay rate is less than 5 seconds, given that Fred occurred in first .1 seconds).
Whereas, let's assume that we knew absolutely nothing about uranium decay, except that the half life can't be more than 100 billion years. In that case, artificially limiting our higher bounds to such a low range will obviously get erroneous results.
If you guys wish, I can do a formal analysis of the "this planet" objection as well, because I think it works too (and is actually easier to follow). I can come up with many other such examples as well. But I'll stop here; in case you wanted to address what I wrote.
In your last example I don't see how you concluded
"We get P(C| D1) = P(C| D1(F)) *K
Where K is the value of the constant needed to make up for the fact that P(D1(F)) occurring, given C, is much rarer than P(D1) occurring. "
If you flip C and D1.. then the equation is correct.
On a separate note, the way you defined things P(D1|C) = almost 1 for hypothesized decay rates above a certain threshold, which is not necessarily a problem. But the first thing I believe is.
I didn't derive it; I just assumed it intuitive given that D1(F) is more unlikely on C than D1. So K is derived from the fact that P(D1| C) >P(D1(F)| C). If D1 is more likely instantiated by C than D1(F); then it follows that knowing D1 makes C more likely than knowing D1(F) would. Why? Because the population class of D1 and D1(F) are the same size.
In any case, my follow-up comment elaborated more on how I envisioned this scenario to work. But I think this is irrelevant until we finish the discussion about the authors. You still didn't address the point, I feel, that P(M) =P(S) given D1(F). And so, as the authors write:
"Since the presence of other uranium atoms makes it no more likely that Fred should have decayed, the fact that Fred decayed doesn’t confirm the hypothesis that the sample contains the calculated number of uranium atoms. Indeed, we are not even entitled to conclude that there is more than one uranium atom in the sample. The extra information that it was Fred who decayed blocks such inferences"
Alex,
I don't get the "Why?..." part at all. But I am pretty sure the conclusion in your first paragraph above is not right.
"You still didn't address the point, I feel, that P(M) =P(S) given D1(F)". I agreed above that under your assumptions about N1 and N2 that conclusion follows from TER. But I added that, contrary to what the authors say, that's not a problem for TER. It would only be a problem if that conclusion was incorrect under your assumptions. But it is correct, and the answer based on noTER is incorrect, showing that it is noTER, not TER, that has a problem.
Dmitriy,
Are you sure you want to say that? :)
Remember that it is stipulated that only one atom decayed in the specified time interval, which was Fred. Are you really trying to argue that we can't, on the basis of that information, deduce whether M or S is more likely? I agree of course that TER prevents us from doing so, but the point is that if we have a good method for discerning between M and S (relying on D1 alone); then we can actually choose the correct one. In that case, it seems obvious that we should either abandon or qualify TER.
As for this: "I don't get the "Why?..." part at all. But I am pretty sure the conclusion in your first paragraph above is not right."
Let me briefly illustrate with a thought experiment. Suppose that there were 100 students (S), 10 of whom read books (B) and 5 of whom liked to take walks (W). If the population class of all the walkers and book readers is the same size, let's say 1000, then it follows that if B, there is a 1% chance of being a student, and if W, then there is a 0.5% chance of being a student. We see that if S, then B is more likely to be instantiated than W, and also that knowing that we are a walker makes us less likely to be a student than knowing that we are a book reader.
I am similarly stipulating that the population class for D1 and D1(F) is the same size, and so from "P(D1| C) >P(D1(F)| C)" it must follow that: "knowing D1 makes C more likely than knowing D1(F) would"
For the same reason as in our thought experiment.
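The arithmetic in the student example can be checked with a short Python sketch. The head-counts are the ones stipulated above; the variable names are illustrative. The snippet only shows what follows once the equal-population-size premise is granted, not whether that premise actually holds for D1 and D1(F).

```python
# Head-counts stipulated in the thought experiment.
student_readers = 10   # students who read books
student_walkers = 5    # students who like to take walks
all_readers = 1000     # everyone who reads books (B)
all_walkers = 1000     # everyone who takes walks (W), stipulated equal to B

# P(S | B) and P(S | W) are simple frequencies within each population.
p_student_given_reader = student_readers / all_readers
p_student_given_walker = student_walkers / all_walkers

# Because the two populations are the same size, the ratio of posteriors
# reduces to the ratio of student counts, i.e. P(B | S) / P(W | S).
posterior_ratio = p_student_given_reader / p_student_given_walker

print(p_student_given_reader, p_student_given_walker, posterior_ratio)
```

If the two populations were not the same size, the posterior ratio would no longer track the likelihood ratio alone, which is why the equal-size stipulation carries the weight in this argument.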
Hey Alex,
Let's focus on the first part first, I think it's vital:
"Are you sure you want to say that? :)
Remember that it is stipulated that only one atom decayed in the specified time interval, which was Fred. Are you really trying to argue that we can't, on the basis of that information, deduce whether M or S is more likely?"
Yes:) Consider
S = one atom, Fred
M = two atoms, Fred and Franny
Suppose the priors are 50-50. Suppose the decay rate is small.
If you are told "Franny decayed", the correct posteriors are 100:0.
If you are told "Fred decayed", the correct posteriors are 50-50.
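A minimal Python sketch of the Fred/Franny example, assuming each atom decays independently with the same small probability p during the interval. The value of p and the function name are illustrative assumptions, not part of the original example.

```python
def posterior_many(prior_many, like_many, like_single):
    """Posterior probability of M given the likelihood of the report under M and S."""
    joint_many = prior_many * like_many
    joint_single = (1.0 - prior_many) * like_single
    return joint_many / (joint_many + joint_single)

p = 1e-3  # assumed small per-atom decay probability

# Told "Franny decayed": impossible under S (there is no Franny), probability p under M.
post_franny = posterior_many(0.5, like_many=p, like_single=0.0)

# Told "Fred decayed": Fred is in both samples, so the likelihood is p under each.
post_fred = posterior_many(0.5, like_many=p, like_single=p)

print(post_franny, post_fred)
```

The first posterior puts all the weight on M and the second leaves the 50-50 prior untouched, matching the two cases stated above.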
I agree, but I'm not sure how this is supposed to refute the authors' point? So in this case, knowing Franny is mutually exclusive to S. Therefore, knowing ~Fred is mutually exclusive to S.
But in our stipulated scenario we have:
S= one uranium atom, Fred
M= 10^500 uranium atoms, including Fred
I have modified the value of M to make this more obvious. Suppose we knew that in the time interval of ten seconds, only Fred decayed. Suppose that we knew that the half life of uranium was ten seconds (remember that the correct half life value is also given). In this case, M is obviously the incorrect option.
I actually don't think the mutual exclusivity is relevant, never mind that. So the reason the posterior is 50/50 is because you have stipulated "Fred decayed" whereas we know "only Fred decayed" in actual fact. Per the authors:
"The oracle tells us that just one uranium atom decayed: Fred"
In my example, adding "only" doesn't change the analysis because of my stipulation that the decay rate is small.
In your example, this is no longer true:
" The problem is that:
"D1(F)| M is just as likely to occur as D1(F)| S."
As a result, the TER doesn't reach the erroneous conclusion that S and M are 50-50.
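Both regimes discussed above can be compared numerically. The sketch below assumes each atom decays independently with the same probability p during the interval and that Fred belongs to both samples; the function name and the numbers are illustrative. M is shrunk from 10^500 atoms to 10^6 so that ordinary floats can handle it, which only understates the effect.

```python
import math

def log_bf_only_fred(p, n_many, n_single=1):
    """Natural log of P(only Fred decayed | M) / P(only Fred decayed | S).

    Under either hypothesis Fred decays (probability p) and every other atom
    survives (probability 1 - p each), so Fred's factor p cancels in the ratio.
    """
    return (n_many - n_single) * math.log1p(-p)

# Small decay probability, M = {Fred, Franny}: adding "only" barely matters.
bf_small_rate = math.exp(log_bf_only_fred(p=1e-6, n_many=2))

# Interval equal to the half-life (p = 0.5) and a huge sample: M is crushed.
log_bf_half_life = log_bf_only_fred(p=0.5, n_many=10**6)

print(bf_small_rate, log_bf_half_life)
```

With a small decay rate the factor stays close to 1, so "Fred decayed" and "only Fred decayed" give nearly the same posterior. With the interval equal to the half-life, the log factor is hugely negative, so updating on "only Fred decayed" strongly favors S over M.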
Dmitriy,
If you mean that taking into account the evidence that other atoms did not decay (call this V) leads to the conclusion that P(M) ≠ P(S), and is still compatible with D(F), so that we can reason as such: P(D(F) + V) entails (P(M) ≠ P(S)); then I agree with you. But notice this isn't what the authors are disputing; they are saying that reasoning on the basis of P(D(F)) just means stipulating the conditions under which P(M) or P(S) come out true. And it's still correct that if you require that (D(F)) has to be the case, then knowing V doesn't modify the impact that P(M) or P(S) have on D(F) being fulfilled. I hope we agree on that, even if we can also agree that knowing V gives us strong independent evidence for realizing that P(M) ≠ P(S).
But according to the authors, accepting such independent evidence would be rejecting the consideration that we should take into account the conditions set by D(F). I do agree with you (as I mentioned in the email) that the way in which they construe TER is too narrow. I still agree that taking into account all the evidence is going to be best. They're just stipulating that if A entails B, then reasoning solely on A should be better than B (call this P). But notice that P is meant to be used comparatively, absent one's background knowledge. Meaning we can just "hide" the knowledge of V as background knowledge, and then invoke P (wherein we get the problems).
What you are doing is pointing out that there is background information here that, when taken into consideration, will do away with the problems. I agree with this, but the point of the author's interpretation of TER, and the reason it was considered to be so helpful/useful; is that it was supposed to show that everything else being equal (i.e. no matter what we have in our background knowledge) P is a great way of resolving matters without having to dig up any potential auxiliary hypotheses. And it's still a blow to show that we can't do that.
You've just demonstrated that taking into account V, or all the evidence, works regardless of whether we are trying to show P(D(F)) or P(D). Which is another way of saying that V works regardless of whether we take the evidence of P(D(F)) or P(D) into account. That's because determining the former is a matter of backwards probabilistic inference (BPI); we resolve it by asking whether M or S makes P(D(F)) or P(D) more likely. If yes, then the latter is evidence of the former. So what we are trying to show through M or S is the same as the evidence we reason from.
Therefore, the above is not enough to solve the dilemma that TER means to solve (regarding which evidence should be taken into consideration), because we are using BPI, which means that we need to also demonstrate whether we should in fact use P(D(F)) or P(D) to solve the question about the number of uranium atoms. So TER is incomplete in that sense, and I hope you can see that P was of no help in resolving the question it purported to solve in this case.
Now of course you might say that we don't need TER for that, and the answer is obvious that we should reason to the likelihood of P(D) being the case in order to solve the actual question posed, and I agree that there are alternative commonsensical ways of deriving that. I laid them out in Philip's blog, but the point is that we need to go above TER to do that.
Hence, I agree that the authors are wrong to say that TER needs to be modified in the manner they envisioned. I laid this all out in my email I sent you, saying that I disagreed with their modifications but thought that this still demonstrated that the narrow way of using TER (principle P) is incomplete. You seemed to disagree with what I was saying in your reply; so I assumed you meant to defend P as it was being used (i.e. it's true no matter what we hold back in our background knowledge). If that's not the case, then my apologies. My other attacks were just against P too; I think we might have just been talking past each other :)
Best,
Alex
Hey Alex,
It seems like by D(F) you meant something other than D1(F) because D1(F) certainly entails V.
The oracle tells us D1(F), and the authors erroneously argued that using the full info the oracle gives leads to an error. Do you agree that even in your scenario using D1(F), not D(F), is ok?
Hey Dmitriy,
I took D1(F) to mean that one atom decayed which is Fred. If this is supposed to mean only one atom decayed which is Fred; then of course that entails V. But just knowing that Fred decayed D(F), isn't going to be enough. I think the authors were just trying to construe TER as a very narrow principle P approach; I'm sure they would agree that using D1(F), if I've interpreted it correctly, is okay and gives you good results. Rather, they're arguing that reasoning from this: "the fact that Fred decayed" or D(F) blocks the inference argument. Why is this a problem? Because they want TER to be able to tell them which of these, D(F) or D1, is better. And it seems like TER can't do that.
You'll notice that you showed that using D1(F) works correctly in both circumstances where we are trying to show the likelihood of D1 or the likelihood of D(F) on the basis of M or S. But we need an additional step to get to this :"we need to also demonstrate whether we should in fact use P(D(F)) or P(D) to solve the question about the number of uranium atoms."
I think the bottom line is that the authors are simply too demanding; they are expecting TER to solve something it was not meant to solve (principle P). I argue that TER isn't meant to be used to make comparative choices between two sets of evidence (on the grounds that one entails the other) without modifications. It worked in the multiverse Bayesian analysis, and it works for choosing between D1(F) and D1, but doesn't for D(F) and D1.
That doesn't mean that taking into account the total evidence is bad of course. It just means that TER isn't the sole approach we can use to determine relevancy; we need to rely on other things. Whereas they prefer to modify the principle in ways that are favorable for their argument, another approach that would work for us is to modify principle P by saying that we should only take into account all *relevant evidence. Or another way of putting that is P, ceteris paribus. So it only works if we already controlled for other ways of taking into account relevancy. In that case, D(F) would be discarded because it's irrelevant; of course this admits that TER is incomplete.
In other words; I interpret the authors to be saying something like this.
Scenario 1) Suppose absolutely everything above was stipulated except for the name of the particle. So we know at least one particle decayed (D), and we know V. Here I distinguish D from D1 (only one particle decayed).
Scenario 2) Suppose we now know the name of the particle, Fred
It seems like going from scenario 1 to scenario 2 is just to take into account irrelevant information, normally we would discard it. But if we took into account D(F) then according to the authors we would need to discard D. Because of BPI, that means we can't reason to the likelihood that D is the case, because saying the former is just the same as saying that we are taking D to be evidence. Therefore, we can only reason to the likelihood that Fred is the case, which obviously isn't useful to us.
And notice, all this is fully compatible with the claim that just relying on D1(F) can give us the right answer. Because D1(F) is just: only one particle decayed, D1, and his name was Fred, D(F).
My apologies for the equivocation in definition. In my first reply above I was using D1 in the way I originally used it; to mean this: D1 = "At least some atom decayed"
Later, I re-read your post and realized you meant D1 (and D1(F)) to take into account the 'only' clause. So I switched D1 to D to make the distinction clearer. Just substitute D for D1 in my first post above. Apologies.
Hey Alex,
I got confused very quickly, can we do this in tiny chunks?
"Because they want TER to be able to tell them which of these, D(F) or D1, is better. And it seems like TER can't do that."
Why do you think that is what they want? D(F) doesn't entail D1, or vice versa, so what does this have to do with TER? TER doesn't talk about choosing between evidences A and B when neither of them entails the other.
Hey Dmitriy,
Sorry about that, I addressed this in my comment above: "In my first reply above I was using D1 in the way I originally used it; to mean this: D1 = "At least some atom decayed" Later, I re-read your post and realized you meant D1 (and D1(F)) to take into account the 'only' clause. So I switched D1 to D to make the distinction clearer. Just substitute D for D1, in my first post above."
I can't remember if you initially wrote D1 as just being an atom decayed, and then included the 'only clause' in your updated revision; or it's possible I simply misread it from the beginning. To simplify from now on I will use the following terminology:
D1 = "Only one atom decayed"
D = "At least one atom decayed"
D(F) = "Fred decayed"
D1(F) = "Only Fred decayed"
V = "Only one atom decayed"
So as you can see, I meant to say that TER wrongly tells us to use D(F) over D. Remember that the phrase "updating on total evidence" is relative. This is because what is being updated is going to depend on how we construct our hypotheses and what gets included in background knowledge. In your case, you decided to merge D(F) and V into D1(F), but we can just separate them so we have D and V, but where we are missing D(F).
In this new scenario I constructed, which is the same as yours but just using different terminology for the pieces of evidence; we can see that updating on total evidence (given that we have D and V), just means adding D(F). So then we have a case where using TER means including D(F) over D. Hopefully that makes sense.
This isn't the end story, because one can say that our using D(F) and V (or D1(F) for simplicity) still gets us the results we want even if we don't have D. But in fact, doing so will have undesirable consequences (it will mean TER is incomplete). Consequences that I argue entail that the Bayesian analysis for the multiverse (that updates on D) is also incomplete. However, I won't get into that just yet.
When I refer to the Bayesian analysis of the multiverse updating on D, I am of course referring to the evidence that was used in the multiverse case; which was, "our universe is fine tuned".
"This isn't the end story, because one can say that our using D(F) and V (or D1(F) for simplicity) still gets us the results we want even if we don't have D."
That was exactly the point of my post. My contention was that the authors erroneously disagree with this. Do you agree that they do claim using D(F) and V blocks the correct inference?
Let's get through this little chunk, and after that I’m very curious to hear what undesirable consequences this would have, and in what sense TER would be incomplete.
Okay moving on then. Assuming you agree that we should use TER to discard D in favor of D(F). Then we are left with D(F) and V. We are "stuck" with reasoning towards the likelihood of D(F) being the case (not what we wanted), unless we use V to derive D. "At least" just means not less than, and so D can actually be derived from V.
If you're wondering why we don't use D(F) to derive D and then reason from the latter; that wouldn't make any sense since that directly overturns the whole point of the TER principle. Note in the multiverse case, the whole point was that we need to abandon reasoning on some universe (analogous to D) because we had evidence in the form of this universe (analogous to D(F)). Now there are two possible scenarios:
A) TER is meant to be interpreted absolutely: Meaning that it is never okay to use D as long as one has D(F), because D(F) entails D.
or we add a caveat to TER in the form of
B) TER isn't applicable in instances where one has independent evidence for the weaker evidence: Meaning that one can use D, as long as one has independent evidence for D (here in the form of V). Under B, we wouldn't invoke TER to discard D if one has V.
If we use interpretation A then we are unfortunately stuck. Even though we have V, we couldn't use it to derive D again under interpretation A, because we would still be forced to invoke TER again and discard D on the grounds of D(F). Hence, we can never incorporate D as evidence assuming A. Consequently, we couldn't ask what is the likelihood of D having been true on M or S, because that would assume we had D in the first place (of course you can still do the calculations and they would be valid, but they remain unsound as long as we lack D). We are forced to always consider the likelihood of D(F) being true; which means we can't come to the correct conclusion regarding the sample size (P(M) = P(S)).
Therefore, I argue that we must use interpretation B. So we use TER to reject D at first, but our having V, along with D(F), means we can still reason to D being true because we have independent evidence for D (in the form of V). Therefore, under B, we aren't 'stuck'. The problem with this approach of adding a caveat to TER, is that TER becomes incomplete.
We can see why this is problematic if we go back to the multiverse argument: in so far as we relied on TER, it was to back up the 'this universe' objection. Assuming however that we had independent evidence for some universe being fine tuned in the form of the multiverse hypothesis (M), then we could use M to argue for "some universe is fine tuned (O)". And under interpretation B, we could no longer use TER to refute the Bayesian analysis that relies on O. Hence, if we ever had independent evidence for M, then the Bayesian analysis goes out the window.
That doesn't mean that the failure of the Bayes' analysis demonstrates that there really is a dependence between P and M. Of course not, the failure of the Bayesian analysis to illustrate the independence in all instances, doesn't therefore imply that there is a dependence. This is why the ensemble is so important, because it can demonstrate the independence in all cases, even if we had good grounds for M.
You'll note that I already wrote about this on Philip's blog a few days ago; so I basically settled my opinions on this issue then and haven't changed them since. I hope that all this made sense and was easier to follow in comparison to my other posts; I wonder what you think. Basically, I made this same case in my email I sent you 2 days ago where I similarly argued that the Bayesian analysis is incomplete because we need to adopt interpretation B. I wonder if you now agree, or still think we can get away with reasoning on A.
As for the authors, I'm guessing they think interpretation A is correct. Which of course I don't agree with. They do try to 'save' TER by introducing some caveats of their own (which naturally are very favorable to their argument). I argue we don't need any of that, as long as we stick with B.
My apologies I must add yet another caveat! In my original post; I used V to mean "no other atoms decayed", I changed it here to mean "only one atom decayed" in the hopes that it simplified things. Unfortunately this would actually invalidate my own analysis! Because obviously we then could just reason to V. However this again is just a peculiarity of the way we define our postulates/evidences. If we instead modify V to be "only Fred decayed" or my original version, we could then see that we still need D (i.e. we need the knowledge that a single generic atom decayed).
In fact, we could still use the original version of (only one atom decayed), as long as we realize that this is a composite proposition. It has multiple parts: the knowledge that a single atom must have decayed, and the knowledge that no other atoms decayed. The problem is that under interpretation A, the knowledge of Fred invalidates the knowledge of the former portion (that a single atom must have decayed) which was 'masked' or incorporated in V.
So then we would be forced to modify our V evidence, discarding the portion that the knowledge of Fred invalidated. Because, TER tells us that reasoning on the basis of some piece of evidence entailed by Fred (and part of V is entailed by Fred) is bad so long as we had Fred.
One possible objection I can anticipate to the above, is that TER isn't meant to apply to compound propositions. If A entails part of B, we can't use TER to discard that part of B that A entails. This, in effect, is what you have been doing by trying to deflect the authors' criticism by incorporating the evidence of D into some other thing, be it D1F (only one atom decayed and it was Fred) or V (only one atom decayed).
However this objection relating to compound propositions cannot work. Firstly, because such compound propositions are equivalent to the conjunction of their component parts. So, we might ask why we can reason from the former but not the latter. If we reason from the latter, then we would see that interpretation A requires us to get rid of the conjunct part that is entailed by the stronger piece of evidence.
Furthermore, we can just stick the weak piece of evidence into a larger proposition in order to save it. For example, in the multiverse argument, we could emplace the evidence of 'some universe is fine tuned' into the proposition 'some evidence is fine tuned and 2 +2'. Then we can't use TER to discard the combined proposition, and we would be stuck with the wrong conclusions. So we do have to adopt B after all.
*some universe is fine tuned and 2 +2 =4
Hey Alex,
I don't think we need anything like B. A is closer to my position, but it seems a bit imprecise to me. I would propose
C) TER is meant to be interpreted absolutely: Meaning that it is never erroneous to use Bayes with the total evidence. In our case that means using everything the oracle said, D(F) & D1.
I think our discussion has been impeded somewhat by a lack of precision on both our parts, so let me make sure you know what I mean by "use": I mean "perform Bayesian update on". And I switched to D1 from V, because I am not sure what V is now. D1 = "Exactly one atom decayed". So now the precise technical meaning of C is:
----------------------------------------------------------------------------------------------------
C) TER: Given a situation with competing hypotheses M and S, the correct Bayes factor to update odds ratio M:S is always equal to
P(E | M) / P(E | S), where
E = total evidence available in the situation
----------------------------------------------------------------------------------------------------
In our example E = D(F) & D1. Importantly, note that C doesn't prescribe HOW that Bayes factor must be calculated. In particular, we can first update just on D(F) and then on D1, or the other way around, or on both simultaneously. That is a question of the most convenient technique, and has nothing to do with C.
So now, do you feel your objection against A applies to C? I don't think it does:
"We are forced to always consider the likelihood of D(F) being true; which means we can't come to the correct conclusion regarding the sample size (P(M) = P(S))"
Given the technical definition of C, we definitely can.
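The remark that C does not prescribe the order of updating can be illustrated with a toy model. The sketch below assumes a sample of n atoms (Fred included) whose atoms each decay independently with probability p; the model, the function names and the numbers are illustrative assumptions rather than anything fixed by the discussion.

```python
import math

def likelihoods(p, n):
    """Likelihoods under a sample of n atoms (Fred included), independent decays."""
    fred = p                                  # P(D(F) | H)
    exactly_one = n * p * (1 - p) ** (n - 1)  # P(D1 | H)
    one_given_fred = (1 - p) ** (n - 1)       # P(D1 | D(F), H)
    fred_given_one = 1 / n                    # P(D(F) | D1, H), by symmetry
    joint = p * (1 - p) ** (n - 1)            # P(D(F) & D1 | H)
    return fred, exactly_one, one_given_fred, fred_given_one, joint

def bayes_factors(p, n_many, n_single):
    """Bayes factor M:S computed three ways: D(F) then D1, D1 then D(F), jointly."""
    m = likelihoods(p, n_many)
    s = likelihoods(p, n_single)
    fred_then_one = (m[0] / s[0]) * (m[2] / s[2])
    one_then_fred = (m[1] / s[1]) * (m[3] / s[3])
    jointly = m[4] / s[4]
    return fred_then_one, one_then_fred, jointly

fred_first, one_first, both = bayes_factors(p=0.01, n_many=50, n_single=1)
print(fred_first, one_first, both)
```

All three routes give the same factor, since each is just a different factorization of P(D(F) & D1 | H) under the product rule.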
I think we can derive a stronger interpretation of TER here. If TER is just "it is never erroneous to use Bayes with the total evidence", then we can derive the principle that if using an analysis with the total evidence conflicts with using an analysis with (total evidence - some evidence), then we should prefer the former. Otherwise, the latter being true would make the former erroneous, violating your definition. So it is not enough to show that updating based on the total evidence can give correct credences (I agree with this).
You also have to defend this part: Suppose the total evidence (T) is composed of X (all the relevant evidence) and Y (a piece of irrelevant evidence). If it turns out that doing an analysis based on Y + X yields a contrary conclusion to an analysis based on just X, we have to logically adopt the former for the reasons I mentioned.
In our case we can construct a simple scenario:
T = D(F) & D1
Y= This atom's name was Fred, and it had a 1/N chance of forming.
Now suppose we took our T and subtracted Y from it. Call this new hypothesis absent Y, the X hypothesis. According to you, it can never be wrong to update X using Y. But in this case updating based on Y would mean we have to take into account a new condition (1/N) which screws up the calculation. Of course we don't update based on Y, because we all understand that Y is irrelevant. But if you don't stipulate that TER is just about *relevant evidence, you will run into problems like the above.
In other words, you assume that we are supposed to understand that updating based on Y wouldn't change anything because it's obvious that Y is irrelevant. But that doesn't follow if you interpret TER too literally, and the whole point of this thought experiment was to pick out a piece of evidence which is obviously irrelevant. But suppose the irrelevant evidence was not obviously so (like in the multiverse case), then it would be more difficult and it would seem that TER would lead us to the incorrect conclusion.
In the multiverse case, you used the absolute interpretation of TER to reason that the 'this universe' evidence entailed our adopting a new condition which the Bayesian analysis had to meet (M had to explain why our particular universe...). For some reason, we can't use TER here to say that Y tells us we have a new condition of 1/N names that the hypothesis must explain. And you haven't justified why we couldn't do that on your preferred interpretation.
I wrote: "Call this new hypothesis absent Y, the X hypothesis"
I meant: "Call this new proposition absent Y, the X proposition"
First can we clarify this:
"it is not enough to show that updating based on the total evidence can give correct credences (I agree with this)."
If you agree with this then it logically follows that if updating on partial evidence gives different credences, then they must be incorrect.
Discard that. What I meant is that it is not obvious what conditions we should take into account when we use a piece of evidence. For example, we might think that 'taking into account' the evidence D1(F) just means we need to consider the condition that only one atom decayed (the hypothesis must explain this). It is only when we separate the evidence into component parts that we can see that taking into account just the fact that Fred has a 1/N chance of forming must mean stipulating that our hypothesis must meet the 1/N condition.
So using the total evidence can be done without stipulating the 1/N condition because the nature of 'taking into account' evidence is vague. That is what I meant, but let's discard that because I certainly did not mean to say that what you showed is correct. The problem is that your principle entails what I wrote above, that even if we separate the 1/N part from the total evidence, we should still update based on such evidence.
So I argue that the process of "updating based on evidence" is fundamentally vague. I argue that you are exploiting this vagueness to your benefit, so that it is not obvious when we engage in such Bayesian updating for a large composite piece of evidence, that we must take the condition 1/N into account. In fact, I would agree that we don't necessarily have to; that's the nature of vagueness after all.
But by decomposing strands of evidence into their component parts, we can then get a more intuitive feel that updating based on the component part must necessitate our adopting the condition 1/N for the hypothesis in question. Which would give erroneous results.
Hey Alex,
I think if we analyze your case carefully all problems will disappear:
"
In our case we can construct a simple scenario:
T = D(F) & D1
Y= This atom's name was Fred, and it had a 1/N chance of forming.
Now suppose we took our T and subtracted Y from it. Call this new hypothesis absent Y, the X hypothesis. According to you, it can never be wrong to update X using Y. But in this case updating based on Y would mean we have to take into account a new condition (1/N) which screws up the calculation. Of course we don't update based on Y, because we all understand that Y is irrelevant. But if you don't stipulate that TER is just about *relevant evidence, you will run into problems like the above.
"
- I don't quite understand what you mean by forming and N. Is N the unknown number of atoms?
- Most importantly, whatever Y exactly means, it doesn't seem to be entailed by D(F)&D1
To clarify, in the story there is no literal forming, only decaying Do you mean the probability that it would be named Fred was 1/something?
Yes the latter, that the probability that the atom would be Fred out of all the possible names is 1/N. Where here N is contingent on the size of the uranium sample.
And I assumed that Y was in D(F); that you were granting that our knowing D(F) came about from the oracle telling us etc... But if all we literally know is D(F) and D1, and we didn't know anything about any oracle, then the solution is simple. Just add Y to the total evidence T.
So X = D(F) and D1, and T = D(F) & D1 & Y.
Without Y we don't consider D(F) to be special because we don't know the conditions under which Fred could have happened. We might assume that all decayed atoms are named Fred etc...
However in reality I meant D(F) to be inclusive of Y. For D(F) to mean that we know Fred decayed, and of course it is presumed that we already know what Fred is (i.e. that he has a 1/N chance of decaying out of all the atoms). In that case, taking Y out would modify D(F); so I just neatly expressed the remainder as X.
Where this "then the solution is simple. Just add Y to the total evidence T.
So X = D(F) and D1, and T = D(F) & D1 & Y."
Of course means that we have X, and then at some later date learn Y. Once we incorporate Y we can see that our hypothesis about the correct uranium sample size gets screwed up if we use interpretation A. In a similar manner to the this universe objection (although in that case we aren't screwing things up).
Because incorporating Y in our Bayesian analysis (i.e. updating on Y), means we take into account the 1/N condition.
Ok, let's keep things as separate as possible, I propose Y shouldn't be part of D1&D(F),
Y = "Probability of an atom to be named Fred is 1/N, where N = atoms in sample"
D(F) = "One of the decayed atoms is named Fred"
D1 = "Exactly one atom decayed"
Now the crucial part is: In my article I assumed Y to be the background knowledge, so the two Bayes factors I proved to be equal were:
factor for updating from knowing Y to knowing Y&D(F) EQUALS
factor for updating from knowing Y to knowing Y&D(F)&D1 (C1)
Why like that? First, because in the story what the oracle gave us was, I felt, the quantifiable statement D(F)&D1, while Y, the information about the setup, was known separately. And second, because it doesn't matter in the slightest for the specific purpose of proving that using all info doesn't mess things up: claim (C1) is mathematically equivalent to
factor for updating from knowing 0 to knowing Y&D(F) EQUALS
factor for updating from knowing 0 to knowing Y&D(F)&D1 (C0)
and also equivalent to
factor for updating from knowing Y&D(F) to knowing Y&D(F)&D1 EQUALS 1 (C2)
I feel like you conceived of the whole situation somewhat differently, but do my definitions make sense to you, and do you agree that all three claims above are true?
Of course you can construct the scenario so that Y is inclusive to both analyses.
So that we're going from Y to knowing Y&D(F)&D1 or Y&D(F)
In that case, additionally updating on D1 obviously doesn't change things. Nor vice versa (i.e. updating on D(F) if we had D1). I never disputed any of that; I am saying that updating on irrelevant evidence (in this case Y) is going to change things. You just showed that updating on relevant evidence doesn't change things (which I completely agree with). The whole point is that you are supposed to grant my stipulation that we have a piece of irrelevant evidence (Y) with which we wish to update on. Constructing your scenario so that you deliberately ignore this (by saying that we already have Y in our background) completely misses the point of my objection.
That's because interpretation A is absolute; meaning that you should be able to show that we have to use (T) over (T-S), where S is some evidence, for ALL possible cases. Just because you showed that in your case, where S is either D(F), or D1 & D(F), that the TER principle works, doesn't mean that it works for all cases. You need to defend it in my case. So your objections based on what you think the oracle actually showed, what the authors actually meant, are inapplicable; concentrate instead on my case.
I constructed our scenario so that we had T, but then subtracted Y from it. The analysis that includes Y will differ from the analysis that doesn't in one important key factor (the former and more inclusive one, will come to the conclusion that we must take into account the 1/N condition). Under your interpretation of TER we would have to adopt that condition; which leads to an erroneous conclusion.
So in this scenario assume that we had T-Y. Meaning the oracle told us everything, including D(F), except that Fred had a 1/N chance of decaying, where N is the number of uranium atoms in the sample. We also don't know that there are other named atoms (for all we know D(F) just means that every possible decayed atom is named Fred). Now assume that the oracle gave us this additional information of Y (we have T). In going from T-Y to T, we have to accept T because T entails T-Y.
The problem is that our hypothesis makes a claim about the correct number of uranium atoms in the sample. We had good grounds to believe this hypothesis on the T-Y evidence, but if we update on Y; then we no longer have good grounds because M =S= any other hypothesis. Updating on Y means that all hypotheses regarding the number of uranium atoms are equally likely to be true. And so we can't reason to the correct conclusion.
And the reason that updating based on relevant evidence (which your analysis does) doesn't change things in this case is because both Y&D(F) and Y&D(F)&D1 tell us that we should reason to the same conclusion that M =S. That's because both cases incorporate the totally irrelevant piece of information, Y, as evidence.
Ok, I understand now that your objection is specifically about updating on T-Y vs on T, but one key part of your objection is still unclear to me. Are you saying that:
1. T-Y gives the correct answer for M:S; T gives a different, incorrect answer, OR
2. T gives the correct answer; T-Y gives a different, incorrect answer even though Y is irrelevant
I have good responses for both I am pretty sure :)
I am saying 1. So let's say that hypothesis L is the correct answer as to the size of the sample concerning the total number of uranium atoms. T-Y gives us a high probability for L. T can't do this because all hypotheses about the size of the size become equally likely.
And this is because taking into account the piece of information Y as evidence is precisely something we should not do. Y is irrelevant, therefore it's not actual evidence, and so we shouldn't update on Y. But notice the whole point of TER is to tell us what is relevant or not (i.e. what pieces of information should constitute evidence).
In the multiverse case, proponents of the some universe line of reasoning would say that the information that "this universe is fine tuned" is irrelevant and should be discarded (it shouldn't count as evidence). Invoking TER allows us to refute this and argue that actually it should count as evidence. So, why can't we utilize TER here to say that what seems like an irrelevant piece of information Y, actually needs to be counted as evidence? Meaning we need to take into account the 1/N condition, and update our credences accordingly.
*because all hypotheses about the size of the sample become equally likely.
Forgive me for diving into some semantics here, but I think it very important that we specify exactly what we think the function of TER is, and what it should be doing.
We seem to agree that TER tells us to take into account Y as evidence, in the case of T-Y vs T. So what does 'taking into account' evidence really mean? Typically, philosophers just think that evidence is any piece of information which modifies the likelihood of the hypothesis being correct. Or, in other words, has bearing on the probabilistic outcomes of the hypothesis. This definition suits us well for our conversation today.
In the multiverse case, we see that the additional piece of information, that the universe which was fine tuned was 'our universe' (call this extra information F), must be taken into account when we invoke TER. But if we are arguing on the basis of the likelihood that the multiverse makes some universe fine tuned, then F wouldn't qualify as evidence. F fails to qualify because it doesn't modify the likelihood of the former being correct. Therefore, because we know that F *must be evidence, per TER, we have to introduce an extra condition which modifies our hypothesis. This extra condition is that our hypothesis must take into account the likelihood of F being the case (and we see that the multiverse doesn't make F more likely).
Analogously, taking into account Y at first doesn't seem to impact the likelihood of the hypothesized sample size being correct. That's because the extra information about Fred has no obvious bearing on the likelihood of M or S being correct. But, similarly to the multiverse case, TER tells us that Y *must be construed as evidence (i.e. it must impact the likelihood of our hypothesis). The only way to get around this (just like in the multiverse case) is to modify our hypothesis by introducing the extra condition of 1/N, which our hypothesis must now meet.
Once we do so we can see that we are blocked in our estimates as to the correct sample size; all because we relied on TER to inform us what pieces of information should be counted as evidence or not.
To elaborate on:
"F fails to qualify because it doesn't modify the likelihood of the former being correct."
I mean: F fails to qualify because knowledge of F doesn't impact the likelihood of some universe being fine tuned on account of M (what the 'some universe' reasoners are trying to ascertain).
And so in order to reconcile the fact that TER tells us that F is evidence (it modifies the likelihood that M has on the result); we introduce our new condition that M must meet (it must explain why our particular universe is fine tuned). In this way, the likelihood of M explaining this new condition plummets, and so we see that F modifies the probability from before (i.e. F is evidence).
And I have argued that if we do the analysis rigorously then T turns out to give the right answer, which would dissolve 1. I think we can quickly show that for your example. I claim that whenever using T results in no change for M:S, then it's not because TER has an issue - it's because no change in the odds is actually the provably correct answer for that situation.
With your example there are two possibilities:
a) If the following two assumptions hold:
a1. The chance of more than one atom decaying is negligible on M,
a2. Exactly one atom in the sample is Fred.
then the result of no change for M vs S, given by T, is actually the right answer in that situation, and 1 is dissolved.
b) The two assumptions above don't hold. In that case T does not result in no change for M vs S.
Which possibility do you want? If it's b, then we need to know what replaces those two assumptions.
You can definitely construct the scenario as in A to get around the fact that T doesn't modify things. Again I feel this misses the point, you need to show that TER works in all cases. So proving it works in A, conveniently constructed to work for you, isn't enough.
So let's go with B; let's go with my original scenario a few posts back.
b1. Half-life is small, and M is huge. The chance of more than one atom decaying is ~100% on M,
b2. There is exactly one atom in the sample time period which decayed; this is Fred.
Meaning that T-Y can properly eliminate M. But incorporating Y means we need to introduce the 1/N condition, and so we can't properly eliminate M; even though M is obviously not the right answer.
I have always used the word evidence in this discussion to mean any known information. I then distinguish between relevant evidence, which modifies credences, and irrelevant evidence, which doesn't. I take TER to mean the correct credences are given by updating on all evidence, which by definition also implies that the same correct credences are given by updating on all *relevant* evidence.
We can demonstrate that pretty easily for your example if you pick possibility a.
Ok, let's do b then. Then we can show that T doesn't actually block the inference to S, and TER isn't in trouble. Do you still accept these:
"Y = "Probability of an atom to be named Fred is 1/N, where N = atoms in sample"
D(F) = "One of the decayed atoms is named Fred"
D1 = "Exactly one atom decayed"
"
Yes I do naturally
Do you mean to say that since Y is irrelevant evidence, it won't modify our credences?
No, Y is not irrelevant, here's the calculation, with b1 and b2.
P(T | M) = P(Y | M) * P(D1 | Y&M) * P(D(F) | D1&Y&M) = blah * miniscule * 1/N2
P(T | S) = P(Y | S) * P(D1 | Y&S) * P(D(F) | D1&Y&S) = blah * normal * 1/N1
Bayes factor = miniscule / normal * N1/N2 << 1
This is the correct Bayes factor, though it might seem to you that N1/N2 should not be there. But one thing is clear right away: the inference to S is not blocked since the factor is much less than one.
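For anyone who wants to see rough numbers, here is a minimal Python sketch of the calculation above. It assumes (this is not in the story) that each atom decays independently with the same probability p during the hour, so P(D1 | N) is a binomial probability, and that P(Y | M) = P(Y | S), so the "blah" cancels. The values of N1, N2 and p are purely illustrative, picked so that many decays are expected on M, as in b1.

```python
from math import exp, log, log1p

def log_p_exactly_one_decay(n_atoms, p):
    # log of the binomial probability of exactly one decay
    # among n_atoms independent atoms (assumed model)
    return log(n_atoms) + log(p) + (n_atoms - 1) * log1p(-p)

def log_likelihood_T(n_atoms, p):
    # log P(D1 | Y, N) + log P(D(F) | D1, Y, N), with P(D(F) | D1, Y, N) = 1/N.
    # The common factor P(Y | N) is left out because it is assumed
    # equal under M and S and cancels in the Bayes factor.
    return log_p_exactly_one_decay(n_atoms, p) - log(n_atoms)

N1 = 100        # hypothetical sample size under S
N2 = 10_000     # hypothetical sample size under M
p = 0.01        # hypothetical per-atom decay probability during the hour

log_bf_without_fred = log_p_exactly_one_decay(N2, p) - log_p_exactly_one_decay(N1, p)
log_bf_total = log_likelihood_T(N2, p) - log_likelihood_T(N1, p)

print("Bayes factor M:S from D1 alone:", exp(log_bf_without_fred))
print("Bayes factor M:S from T:", exp(log_bf_total))
print("ratio of the two:", exp(log_bf_total - log_bf_without_fred), "vs N1/N2 =", N1 / N2)
```

Under these assumptions both factors come out far below one, and the factor from T differs from the D1-only factor by N1/N2, which is the extra term in the formula.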
Hey Dmitriy,
I agree with the validity and soundness of your analysis. I do find it bizarre that knowledge of Y should modify our relative credences even higher in the direction of S though, because it gives us N1/N2. Whereas without Y, then P(F) just comes out to more blah blah.
I assumed that Y is problematic on account of its introduction having a similar impact to the knowledge of our universe being fine tuned (for the multiverse case). If you'll permit a detour, going back to the multiverse case we find:
Let O = Some universe is fine tuned
Let D = Our universe is fine tuned
M= Multiverse
S= Single universe
If T = O + D
Then P(T|M) = P(O|M) * P(D| O & M)
And P(T|S) = P(O|S) * P(D|O & S)
This comes out to:
P(T|M) = High * Low
P(T|S) = Low * High
Naturally, P(D|O & S) comes out to high because O tells us the one universe in S is fine tuned, which has to be D. I assume the probabilities cancel out, so that knowledge of D means the Bayes factor is now 1/1; whereas before when we only had evidence of O, this came out to high/low.
Yours was a really great and easy way to demonstrate this; thanks. As you can see from the above; I basically assumed that since the knowledge of D entailed the probability of P(D| O & M) to be low, and since we concluded from this that knowledge of D flipped the Bayes factor away from the favour of M; well then I thought that since P(D(F) | D1&Y&M) was low we would also have to flip the Bayes factor from before.
You can see my fallacy of reasoning I'm sure. The principal difference is that P(D1 | Y&M) was low when we needed it to be high, and P(D1 | Y&S) was higher when we needed it to be lower. Therefore the introduction of the knowledge of Y compounded, and did not reverse the Bayes Factor. Again, I still find it bizarre that knowledge of Y should somehow increase our credence in S by an even bigger factor. Let me know if I am misinterpreting that.
Sorry for all the comments on your blog; I hope I didn't ruin your site. You'll have to give me some time, but I have every intention of fulfilling my promise I laid out in my last email. As soon as I'm able, I'll get right onto the noble task of buying a hat, and well you know the rest....
:)
I skipped this discussion as I couldn't keep up with the frequency of posts, but it looks like you guys reached agreement at the end. Good news!
So, just to confirm, we're all agreed then that Dmitriy's original analysis was correct?
If so, TER isn't in trouble from Fred. I still have my doubts about how it fares in observer selection scenarios like ObserverCoin, as I don't think I can accept that the particular identity of the observer matters when any observer could have made analogous deductions. The inference to a multiverse only from the observation of one world still seems to me like something has gone wrong somewhere, a bit like Dmitriy pointed out in his analysis of the paper's Fred argument.
So I'm wondering if a similar trick could be employed there to what Dmitriy did here for Fred, to show that actually the particular identity of the observer does not matter. If it's there, it'll be subtle.
Unfortunately I can't port the analysis over. It's not really analogous to Fred because (1) we can only ever observe one world and (2) the whole enterprise is predicated on the fact that the observer's world was created in the first place.
Hey DM,
We did reach an agreement in the end yes. In light of which, I don't think this assertion of yours is correct:
"The inference to a multiverse only from the observation of one world still seems to me like something has gone wrong somewhere"
Since Dmitriy showed that taking into account the total evidence can't be wrong, it can't be that using the evidence that our particular universe is fine tuned in addition to previous evidence is wrong. So, if I can show that we can in fact reason from *both the fact that some universe is fine tuned, and the evidence that our particular universe is fine tuned; then we need to adopt the results of that analysis. When we do so; we'll immediately realize that the extra information of our particular universe being fine tuned does affect things in the manner we argued. See my analysis below.
Here is what I earlier wrote:
Let O = Some universe is fine tuned
Let D = Our universe is fine tuned
M= Multiverse
S= Single universe
T = Total evidence available
If T = O + D
Then P(T|M) = P(O|M) * P(D| O & M)
And P(T|S) = P(O|S) * P(D|O & S)
Basically this is the probability that the total evidence we have (O + D) would be true on M and S. Notice that O gets incorporated as evidence when we reason to the probability of D being true. That's because this is to simulate our first learning O and then asking what the probability of O being true on M or S is, and then we later incorporate D as evidence. But can actually do the analysis backwards (starting with D), where we learn D first and then O later, and we would get the same results.
So, since Dmitriy showed that taking into account additional evidence can never be erroneous, and since my analysis shows that taking into account D as well as O means that M is not more preferable to S; then we have to ultimately accept the conclusion that fine tuning is not evidence for M.
Let's do the math:
Before we had D, when we were just reasoning from O we got:
P(O|M) = High
and P(O|S) = Low
The Bayes factor for M/S is High/Low. Meaning that M is indeed more preferable to S. Now let's incorporate D:
Then P(T|M) = P(O|M) * P(D| O & M)
And P(T|S) = P(O|S) * P(D|O & S)
For P(T|M) we get = High * P
For P(T|S) we get = P * High
Where P = fine tuning constant
P(D|O & S) = High, ~100%, because we know a universe is fine tuned, and there's just one universe (S); so obviously the fine tuned universe has to be ours. Similarly, P(O|M) is also ~100%.
Therefore, the Bayes Factor is now 1/1. So the incorporation of evidence D, changes the Bayes factor so that M is no longer preferable to S. And notice that we weren't doing what you said we were doing (i.e. ignoring O and just reasoning on D), we just incorporated new evidence in addition to the old, and got the above results.
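The bookkeeping in this comment can be made concrete with a small Python sketch. It rests on assumptions that are not stated in the thread: each universe is fine tuned independently with the same probability p, M contains n universes, and "our universe" is treated as one fixed, designated universe that exists under both hypotheses. It reproduces the comment's arithmetic only and does not model any selection step such as how "our" universe came to be picked.

```python
def likelihoods(p, n_universes):
    # P(O | H): at least one of n independent universes is fine tuned
    p_O = 1 - (1 - p) ** n_universes
    # P(D | H): the designated universe ("ours") is fine tuned; D entails O
    p_D = p
    # P(D | O & H), from the definition of conditional probability
    p_D_given_O = p_D / p_O
    # P(T | H) = P(O | H) * P(D | O & H)
    return p_O, p_D_given_O, p_O * p_D_given_O

p = 1e-6       # hypothetical chance that any given universe is fine tuned
n = 10**7      # hypothetical number of universes under M

O_M, D_given_O_M, T_M = likelihoods(p, n)
O_S, D_given_O_S, T_S = likelihoods(p, 1)

print("Bayes factor M:S from O alone:", O_M / O_S)
print("Bayes factor M:S from T = O & D:", T_M / T_S)
```

With these inputs the factor from O alone is large while the factor from T is 1 up to rounding, matching the High/Low and 1/1 figures in the comment.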
*But we can actually do the analysis backwards
Also I now realize why Y has the counterintuitive effect of making our credences in S even stronger. It's because Dmitriy did his analysis by starting with our knowledge of Y first; which technically didn't represent the actual scenario where we learn Y last. If we constructed the scenario in the manner I envisioned, we would see that Y is irrelevant and doesn't change things.
If:
P(T | M) = P(D1 | M) * P(D(F) | D1&M) * P(Y | D1 & D(F) & M) = miniscule * Blah * Blah
P(T | S) = P(D1 | S) * P(D(F) | D1&S) * P(Y | D1 & D(F) & S) = normal * Blah * Blah
The knowledge of Y, incorporated in the third column, is 0/0 since the evidence we had had no bearing on the likelihood of Y being true. So the Bayes factor remains the same (miniscule/normal), which accords with our intuition that Y shouldn't change things. So why then does reversing the order and starting with Y change things? That's because Dmitriy's analysis assumes that we started with the knowledge of Y first. Our knowing Y first would indeed modify the probabilities of D(F).
Without Y, asking what the probability of D(F) is would be a useless question. Our knowing Y first means that we knew that there could possibly exist a decayed atom named Fred, which had a 1/N chance of forming. Therefore, the later incorporation of the knowledge that our first atom is named Fred is an extraordinarily improbable event. The conjunction of the two events D(F) and D1 is now more improbable than just D1; so it makes sense that knowledge of Y (if we start with it first) modifies the probabilities even further to S' favour.
Actually my analysis didn't presuppose any particular order of learning information. The formulas
P(T | M) = P(Y | M) * P(D1 | Y&M) * P(D(F) | D1&Y&M) = blah * miniscule * 1/N2
P(T | S) = P(Y | S) * P(D1 | Y&S) * P(D(F) | D1&Y&S) = blah * normal * 1/N1
are mathematical truths independent of the actual order of obtaining evidence. For that reason your formulas are just as correct and must give an identical final Bayes factor:
"P(T | M) = P(D1 | M) * P(D(F) | D1&M) * P(Y | D1 & D(F) & M) = miniscule * Blah * Blah2
P(T | S) = P(D1 | S) * P(D(F) | D1&S) * P(Y | D1 & D(F) & S) = normal * Blah * Blah1"
It might be a bit confusing, but we can't actually assume that blah1=blah2. So we can't conclude that learning Y doesn't affect credences.
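A toy model may help show how the two orderings can agree while blah1 and blah2 differ. The Python sketch below is built on assumptions that are not in the thread: there are only two possible naming setups, Y (exactly one Fred in the sample) and a hypothetical alternative in which every atom is called Fred; the prior q for Y is the same under M and S; and P(D1 | N) does not depend on the naming setup. All numbers are illustrative.

```python
def factors(n_atoms, p_one_decay, q=0.5):
    # q: hypothetical prior that the naming setup is Y; the only alternative
    # considered here is "every atom is called Fred"
    p_DF_given_Y = 1 / n_atoms      # P(D(F) | D1, Y, N)
    p_DF_given_alt = 1.0            # P(D(F) | D1, not-Y, N)

    # Order 1: Y, then D1, then D(F)
    order1 = (q, p_one_decay, p_DF_given_Y)

    # Order 2: D1, then D(F), then Y (last factor from Bayes' theorem)
    p_DF = q * p_DF_given_Y + (1 - q) * p_DF_given_alt
    p_Y_last = q * p_DF_given_Y / p_DF
    order2 = (p_one_decay, p_DF, p_Y_last)
    return order1, order2

def product(xs):
    out = 1.0
    for x in xs:
        out *= x
    return out

# Hypothetical sizes and P(D1 | N) values, chosen only for illustration
S_order1, S_order2 = factors(n_atoms=100, p_one_decay=0.37)
M_order1, M_order2 = factors(n_atoms=10_000, p_one_decay=1e-40)

print("P(T | S), both orders:", product(S_order1), product(S_order2))
print("P(T | M), both orders:", product(M_order1), product(M_order2))
print("blah1 = P(Y | D1 & D(F) & S):", S_order2[2])
print("blah2 = P(Y | D1 & D(F) & M):", M_order2[2])
```

In this toy model the two orderings give the same P(T | H) for each hypothesis, but the last factor in the second ordering depends on the sample size, so the N1/N2 term does not disappear when Y is placed last; it is carried by the second and third columns together.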
Interesting. I would say that you are not so much disputing the fact that the ordering of the formulas can be used to represent the actual order of obtaining evidence; so much as disputing our natural intuitions that the order in which we obtain evidence should matter.
That's because to say that we received the first piece of information (P1) before the second piece of information (P2), is just to say that we reasoned to the probability of P2 being true with the knowledge of P1. Therefore, P(P2| P1 & K) where K is background knowledge/other evidence. It can't both be true that we received P1 before P2, and the probability that P1 is the case depends on our knowledge of P2 (as it would if we did it backwards). So if there is a dispute here, it must be with our natural intuitions which tell us that the order of events matters.
For instance, we have the intuition that learning D(F) after we learned Y is more special than if we were just told Y afterwards. If we just got rid of D1 to simplify things, we can see:
P(T | S) = P(Y | S) * P(D(F) |Y&S) = Blah * 1/N1
P(T | S) = P(D(F) | S) * P(Y | D(F) & S) = Blah * Blah
Further, it seems like P(Y | D1 & D(F) & M) and P(Y | D1 & D(F) & S) can't possibly have different probabilistic outcomes (blah 1 & 2). The only difference between the blah 1 and blah 2 cases is S versus M; therefore knowledge of S and M must have bearing on the likelihood of Y being true. But that doesn't seem to follow. Why would the belief that the sample is one size or the other affect the likelihood that the atoms have names, and furthermore that these names are possibly contingent on the size of S/M?
It seems to me that it can't, and the same holds true for the shortened version above. The only way I could even see the remotest impact on probabilistic likelihood, is if the hypothesis carries existential import and our evidence grants us existential proof. For example, if the hypothesis is that I exist and like to chew bubble gum and eat cheese grown on the moon, then the evidence of my existence might remotely increase the chance of the hypothesis being correct because it at least verifies one of the conjuncts.
But this isn't true in our case or for the shortened version; neither Y nor D(F) are existential claims. I propose what's going on here is that the mathematical provability of order of formulas being irrelevant hinges on a frequentist interpretation of the probabilities. If we adopt such a viewpoint, then P(D(F) | D1&M) & P(D(F) | D1&S) end up equaling 1/N2 and 1/N1 respectively. This would mean that we don't have to argue that blah 1 and blah 2 are different, because now if they were the same then re-arranging the order of events yields no difference. This also squares well with our natural intuition that the order of knowledge is important, because such intuitions are of course epistemic.
I wonder what you think?
Hey Alex,
"You can see my fallacy of reasoning I'm sure. The principal difference is that P(D1 | Y&M) was low when we needed it to be high, and P(D1 | Y&S) was higher when we needed it to be lower. Therefore the introduction of the knowledge of Y compounded, and did not reverse the Bayes Factor. Again, I still find it bizarre that knowledge of Y should somehow increase our credence in S by an even bigger factor. Let me know if I am misinterpreting that.
Sorry for all the comments on your blog; I hope I didn't ruin your site. You'll have to give me some time, but I have every intention of fulfilling my promise I laid out in my last email. As soon as I'm able, I'll get right onto the noble task of buying a hat, and well you know the rest...."
Haha:) And no, the page is not too slow. I'm wondering if eventually Blogger would create a "see older/newer comments" feature like what appeared on Philip's blog. And about why knowledge of Y should increase our credence in S further, it's probably not super critical to get a great intuitive feel for this, but I think I can give a reasonable explanation.
First it's important to realize that it's the combination of Y with D(F) that creates this effect, either one by itself wouldn't. Then let's remember exactly what Y amounts to: "either size sample contains exactly one Fred". That statement couldn't possibly be known for all names (id numbers), in fact it could only be known for at most N1 names. As an example, suppose it was known for N1 names. Then, because hearing D(X) for X NOT one of those N1 names would immediately increase our credence in M to 100%, it stands to reason that if X IS one of those names (such as Fred) hearing it should immediately decrease our credence in M.
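The id-number picture in this explanation can be sketched in a few lines of Python. The setup is an assumption made for illustration: atoms carry id numbers, the smaller sample holds ids 1..N1 and the larger one ids 1..N2, the single decayed atom is equally likely to be any atom in the sample, and M and S start at even odds. Only the effect of hearing the name is modelled; the likelihood of D1 itself is left out.

```python
from fractions import Fraction

def posterior_M(name_id, n1, n2, prior_M=Fraction(1, 2)):
    # Likelihood of hearing this particular id for the single decayed atom
    like_S = Fraction(1, n1) if name_id <= n1 else Fraction(0)
    like_M = Fraction(1, n2) if name_id <= n2 else Fraction(0)
    num = prior_M * like_M
    return num / (num + (1 - prior_M) * like_S)

N1, N2 = 100, 10_000     # hypothetical sample sizes under S and M
fred = 7                 # an id present in either sample, standing in for Fred
other = 5_000            # an id present only in the larger sample

print("P(M | decayed atom is Fred):", posterior_M(fred, N1, N2))
print("P(M | decayed atom is #5000):", posterior_M(other, N1, N2))
```

Hearing an id that only the larger sample could contain pushes the credence in M to 1, while hearing an id that both samples contain pulls it below the prior, by odds of N1/N2.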
Hey Alex and DM,
Just a couple of quick points:
"So I'm wondering if a similar trick could be employed there to what Dmitriy did here for Fred, to show that actually the particular identity of the observer does not matter. If it's there, it'll be subtle."
In some sense it doesn't, but not in the way that helps Steven's camp. It's probably better to just do Bayes carefully. With Alex's analysis:
"
Then P(T|M) = P(O|M) * P(D| O & M)
And P(T|S) = P(O|S) * P(D|O & S)
",
first we should remember that D&O = D, so mathematically there is no need to go through the intermediate step of O at all. The math then simplifies significantly.
Secondly, I would say it's tricky and potentially confusing to say that T = D&O (=D). We have to be super duper careful how exactly we interpret D. This is easier to express in the IVF case:
D1 = the roll of the dice for my embryo was lucky
D0 = my embryo was picked / created to do dice rolls on in the first place
I think it's easy to make the mistake of interpreting T as just D1, but it should be D0&D1.
Or, to be more precise,
T = D0 & D1 & irrelevant info about my hair color etc.
DM, it's this last part that makes it true that the details of the observer's identity don't matter, the relevant bit is D0, just the fact that my embryo was picked / created.
"first we should remember that D&O = D, so mathematically there is no need to go through the intermediate step of O at all. The math then simplifies significantly."
Oh yes I of course completely agree. But it's one thing to say that TER tells us to reason on D because of the entailment, and another thing to realize that adding D can't make any difference because of that entailment. I got the sense that DM thought that reasoning just on D is to ignore O, and so putting O and D side by side in the above format hopefully showed that this cannot be the case.
I appreciate your clearing all this up.
Thanks guys.
> Since Dmitriy showed that taking into account the total evidence can't be wrong,
I don't see how he can have showed that. He showed that taking TER into account in cases analogous to Fred doesn't lead you to the wrong conclusion after all. That doesn't mean that TER works in anthropic/observer selection cases as these are not perfectly analogous to Fred.
As I mentioned on Philip's blog, I think I'm done discussing TER. I'll just sum up by saying that it seems to work for non-anthropic scenarios, but I'm not convinced it works for anthropic scenarios. Doing Bayes carefully doesn't really answer the question, as whether you think TER applies will influence how you think Bayes ought to be done.
TER is a principle which we have good reason to adopt, I agree. But the inference from "my universe exists" to "many universes probably exist" with no other evidence to support it other than the fact that your universe would be more likely to exist if lots of universes exist seems to me to be so backwards that I'd be inclined to adopt an additional clarifying principle something like "When applying TER in cases of anthropic reasoning, the specifics of an observer's identity are not counted as evidence". I'm not sure I've worded that perfectly, but the idea is just that you shouldn't count as relevant evidence specific details of your circumstances that have no bearing on the general form of the argument. This is just an assertion grounded in my intuitions, but then I would say so are principles such as TER.
If there is no subtle analytical trick by which it can be shown that the observer's identity is as irrelevant as Fred's, I'm not sure how one could argue for or against that view except by appeal to intuition. I can understand the appeal of trying to settle the question with some sort of ensemble analysis, but I think that also runs into problems because I think your intuitions about this are going to be baked into how you conduct the analysis, and because we can only conduct an ensemble analysis from a God's eye view when we're trying to decide what an observer within the ensemble should believe.
Again, I think it may be as intractable as the Sleeping Beauty problem.
I've done a bit more work on my ensemble simulation, so I do expect to get back on that at some point.
"
I don't see how he can have showed that. "
I didn't mean that Dmitriy proved that TER is sound in all cases. I meant that since we all accept this (what you wrote here):
"TER is a principle which we have good reason to adopt, I agree"
It follows that TER is applicable until we have good reason to reject it. The point is that there are sound mathematical reasons to grant TER, and it has been so extensively verified that it is only reasonable to place the burden of proof squarely on the shoulders of those who argue for exceptions. Of course, this is precisely what the above authors were trying to do; by showing that they failed Dmitriy denied that such cases can constitute exceptions.
So we have no good grounds to adopt this clause: "that I'd be inclined to adopt an additional clarifying principle something like 'When applying TER in cases of anthropic reasoning, the specifics of an observer's identity are not counted as evidence' ", or at least you haven't provided any.
Furthermore, such an exception can only be warranted if you argue that all ensemble cases which involve such selection effects are erroneous. However, I thought that you granted that we showed (through the ensemble) that the IVF and other such reasoning were sound. Meaning that the likelihood of being born had no bearing on the probability of the multi-IVF scenario being true. So the Bayesian reasoning on particular evidence which verifies such results cannot be erroneous if you grant that. Therefore, we can't argue for such a general clause.
Naturally, you can continue to add sub-clauses upon sub-clauses; to modify your exclusionary class to become smaller. So, maybe you could continue to say that Bayesian analysis on anthropic reasoning is incorrect in cases like the multiverse where a single population is being invoked. But this just constitutes special pleading unless you have good grounds for showing such a thing. Not to mention that I think the multiverse case is exactly analogous to the IVF and other such scenarios; I showed we don't actually have to draw the S and M observers from the same population (which I feel you never adequately addressed).
In any case, I think all these criticisms are irrelevant until you give us good reasons why we should introduce exceptional clauses for x, and why Bayesian reasoning on the evidence of 'this universe being fine tuned' falls into x.
The top part of my comment appears to have been cut off; I wrote:
"Hey DM,
> Since Dmitriy showed that taking into account the total evidence can't be wrong,
I don't see how he can have showed that."
DM: "TER is a principle which we have good reason to adopt, I agree. But the inference from "my universe exists" to "many universes probably exist" with no other evidence to support it other than the fact that your universe would be more likely to exist if lots of universes exist seems to me to be so backwards that I'd be inclined to adopt an additional clarifying principle something like "When applying TER in cases of anthropic reasoning, the specifics of an observer's identity are not counted as evidence". I'm not sure I've worded that perfectly, but the idea is just that you shouldn't count as relevant evidence specific details of your circumstances that have no bearing on the general form of the argument. "
I wonder what you think of this part, and the comment above it:
"
Or, to be more precise,
T = D0 & D1 & irrelevant info about my hair color etc.
DM, it's this last part that makes it true that the details of the observer's identity don't matter, the relevant bit is D0, just the fact that my embryo was picked / created."
In other words I agree that the details are actually irrelevant. But I think we disagree about D0.
---------
Alex,
my intuition is completely the opposite, I think it doesn't matter in what order you learned facts A and B. The credences should only be determined by the fact that you now know A&B.
I think this all hinges on how DM interprets this statement of his:
"the specifics of an observer's identity are not counted as evidence"
I agree that you showed that irrelevant details about an observer (hair color etc.) don't matter. But I'm not sure if DM disagrees with that; he might just be interpreting his statement about identity conditions to mean anything that relies on the fact that Ex is an observer. So maybe DM feels that just saying ExPX, where something exists which is a particular observer, is a detail which should be left out. I.e. the very fact that we are noting that the observer in question is a particular one is to run afoul of this statement: "the specifics of an observer's identity are not counted as evidence".
If I'm correct and that's what DM means, then DM would need to renounce the results of the ensemble analysis for the IVF case and others. If I have the incorrect interpretation, then in any case you showed that extra particular evidence, like metaphysical conditions necessary for the existence of the observer, is irrelevant.
Indeed, I would not regard D0 as relevant because it relies on details of the observer's identity, i.e. that it was *my* embryo that was picked/created.
To elaborate a little.
Suppose we discount all the specifics of an observer's identity, e.g. genome, hair colour, position in space, etc as irrelevant. What are we left with to distinguish one embryo from another? Nothing, I would say. If you truly agree with me that all those details are irrelevant, then there is literally nothing relevant to distinguish one embryo from another. So we can't reason from the fact that it was *my* embryo, because embryos are fungible, interchangeable at this point. All we know is that there was some embryo.
That's not true, we can distinguish every embryo by their order in the trial etc. The point is that we should discard all irrelevant identity details (e.g. their hair etc.) but keep the relevant ones (like their location in the order of the IVF sequence). See my comment below for more elaborate follow-up.
So, D is indeed meant to pick out a condition (so we shouldn't discard all identity conditions), but we presume that the condition is generic to every other embryo/universe (i.e. every other embryo has a location in the trial, but not necessarily every one has hair).
I think it might be helpful to clarify that
A) Our universe must meet *some identity condition(s).
B) Every object or class of object must meet *some identity condition(s).
A means that there must be some necessary condition which our universe holds; that makes it distinct from every other possible (or actual) universe. To refer to our universe is to refer (by default) to the universe which meets this/these unique condition(s). But notice that this condition can be as simple as the unique space-time coordinates of our own universe.
So if our universe exists within a multiverse, then our universe is the universe which has the space-time coordinates X within the multi-dimensional inflationary landscape. What's important is that we only need one such condition, so Dmitriy's argument just entails that extra metaphysical conditions which may be unique to our universe are unnecessary. I'm getting the sense that I should also stress B. Because it seems to me DM, that you believe that our action of including particular evidence, or our referring to a particular entity is somehow inherently engaged in special pleading.
But it makes little sense to say that we can't reason on the basis of details of identity; that's because every possible and actual entity has identity conditions. There are identity conditions tied to the multiverse, and to every other possible/actual universe within the multiverse (i.e. their own space-time coordinates). Thus, to say that referring to an entity on the basis of its identity conditions is to necessarily invoke specialness on that entity is, I think, confused.
Rather, what makes an entity special is not its holding identity conditions (because every entity does so), but rather its holding certain *types of identity conditions. I earlier pointed out that every universe within the multiverse has its own identity condition unique to its location. To make a claim that our universe is special is just to say that additional identity conditions are warranted for our universe which go above and beyond normal universe conditions. For example, if we specified that every other universe can be referred to by its location within the multiverse ensemble, but that our universe can only be referred to as the thing which meets condition x (where condition x could be the fact that earth exists); then we are engaged in special pleading on the part of our universe.
But notice that we're doing no such thing. No special pleading is therefore committed because our universe is construed as belonging to a general reference class that includes all other possible/actual universes. The same applies to the particular observer; we aren't invoking special identity conditions exclusive to the observer in question (i.e. she has to have black hair), but using the same type of condition relevant to all other observers (i.e. their location within the multiverse).
The bottom line is that it would actually be special pleading to argue that reasoning on the basis of certain identity conditions (i.e. space/time coordinates within the multiverse ensemble) is erroneous, but reasoning on different identity conditions like in the 'some universe' case is okay. Why can we talk about the identity details of specific types of objects (i.e. universes) and specific types of embryos (i.e. the embryos in IVF trial) but not specific types of universes or more narrow categories of embryos (if you do in fact object to the IVF case)? Some reasoning must be given for that, otherwise that's just special pleading in the extreme.
Meaning, why can we rely on identity conditions/details to refer to the class of 'some universe' and 'some embryo', but not in other cases? Why are some identity conditions special/off-limits, but not others? Some reason has to be given for that.
In this case, the identity conditions for "some universe" and "some embryo" are whatever necessary conditions constitute a universe (as opposed to some other object like a generic banana), and the condition that (is an embryo in trial X) respectively.
Distinguishing embryos by order in the trial means that you're taking the ordinal of the embryo as evidence to be reasoned with, even though that's just a specific detail that is only of any significance because it is tied to your identity. That's exactly the kind of detail I want to throw out in anthropic reasoning.
My own ideas on identity are radical, and I'm not trying to bring them into the debate. But I don't actually agree that our own universe or any object must meet some identity conditions. I don't think identity is anything more than a useful concept in practical situations. I don't think it's a real metaphysical thing. I don't think there's any fact of the matter as to whether I'm the same person I was yesterday, for example. You say that our universe is a universe with certain coordinates, but I think you run into two problems -- (1) there is no objective basis from which to define objective coordinates, so it's not clear to me that our universe would indeed have objective coordinates in an inflationary string landscape and (2) in some cosmologies (e.g. the MUH), universes are not merely at different coordinates but completely disconnected.
I'm not accusing you of engaging in special pleading. If anyone is engaged in special pleading it's me. I'm saying I'm happy to adopt TER except in the special case of Anthropic reasoning, where I plead that we may not use the evidence of our specific identity to conclude the kinds of things you want to conclude.
> Why are some identity conditions special/off-limits, but not others? Some reason has to be given for that.
You can use any conditions you like as long as you're not bringing them in just because they're "your" conditions and for no other reason than that. Fine tuning is a condition I think it's OK to bring in because it's interesting/surprising from an (imaginary) objective standpoint and not just because it's a feature of your universe.
Otherwise, and again, this is not strictly analogous, but the mistake you seem to me to be making is being amazed that of all the UUIDs you could have generated just now, the one you got was e9983ab1-2a56-41b8-892b-887c5285e507. The problem with being amazed by this is that there is nothing at all significant about it other than the fact that this is the one that was just generated. The same thing happens, I feel, when we take as significant any property of our perspective that is only significant because it is a property of our perspective. If we must not take it as significant then we must, I feel, take it as irrelevant, which means not taking it as evidence in Bayesian arguments.
DM,
I think you're conflating two different things here: the identity conditions and the evidence we possess. TER has to do with the latter, while appeals to specialness have to do with the former.
"But I don't actually agree that our own universe or any object must meet some identity conditions. I don't think identity is anything more than a useful concept in practical situations. I don't think it's a real metaphysical thing"
So this is exactly what I was not saying. Identity conditions are just a feature of semantics; whether they carry metaphysical/existential import has to do with your own personal beliefs (which are irrelevant here). So an object/class having identity conditions is a necessary feature of our being able to refer/quantify over such a class; otherwise your reference is ambiguous (it could refer to multiple things). The nature of such conditions can be vague if you feel that objects/classes are inherently vague things (e.g. I am not the same person I was yesterday).
Your point about my identity criterion failing doesn't make an impact unless you're trying to argue that it is incoherent to refer to/talk about a universe, or a particular embryo.
"Distinguishing embryos by order in the trial means that you're taking the ordinal of the embryo as evidence to be reasoned with, even though that's just a specific detail that is only of any significance because it is tied to your identity"
So this isn't the case at all. I'm not saying that the nature of the embryos by their order in the trial needs to be taken into account (or is significant) only because of their identity conditions; I'm saying this evidence has to be taken into account because of TER. It is TER which makes the evidence of this universe significant, not anything to do with the way we construe our universe's identity. Where the identity conditions come into play is to show that it is possible to refer to a particular class without engaging in special pleading. Basically, to address critiques that our universe must be some special thing by default of its identity.
So my point is that we can't be engaged in special pleading because we are invoking TER (so we have good reasons), UNLESS you believed that our universe is some special thing by default. Obviously, I hope you agree that you have the burden of proof to give reasons for why we should make exceptions for TER. My post about the identity conditions was to counter the assertion that we are engaged in special pleading (i.e. we think x is significant when it is not) despite our invoking TER.
The only way you could argue that is if you believed the identity conditions of our universe are such that, simply making reference to our universe is to talk about a special object which doesn't belong to a general class involving other universes. I hope you agree that this is clearly not the case (that our universe can belong to a general class of other universes).
So that's what "thinking that x is significant because of its identity details/conditions" really means. To say that we must think D is significant merely because we took it into account IS to say that our universe/our embryo has special identity conditions. So it seems to have turned out that you weren't making such an argument after all, because you don't think these things have special identity conditions. In which case, we are back where we started regarding TER. But now it is up to you to show that TER fails in this case.
Also I failed to address this:
"I'm not accusing you of engaging in special pleading."
But you later do go on to write this:
"but the mistake you seem to me to be making is being amazed that of all the UUIDs you could have generated just now, the one you got was e9983ab1-2a56-41b8-892b-887c5285e507. The problem with being amazed by this is that there is nothing at all significant about it other than the fact that this is the one that was just generated."
To say that we are finding some piece of evidence significant (and arguing that the multiverse/IVF must explain it) when it's actually not relevant/significant, is the same thing as saying that we are engaged in special pleading. If it's not actually significant, then we're engaged in fallacious reasoning in assigning some importance to this class of evidence.
It's specious to say "the multiverse fails as an explanation on account of it not being able to explain this thing", simply because we think this thing is special. And making an exception for "this thing" for no reason other than that we think it special (i.e. identity), is what you seem to be accusing us of doing. Of course the real reason is TER.
Hey guys,
I will try to organize my thoughts on how to justify including D0 as part of the evidence into something coherent enough to make a new post. I think it can always be denied that this is the right thing to do without ending up in a contradiction, but I think the intellectual price tag on this denial is huge.
It's important to realize that the inclusion of D0 is equivalent to the ensemble principle, and moreover equivalent to the Self-Indication Assumption. Of course it's not blasphemy to deny SIA, Bostrom for example does argue against it in Anthropic Bias.
Sounds great! Also, I can't wait for the prospective doomsday argument article (no pressure).
:)
Hi Alex,
> So an object/class having identity conditions is a necessary feature of our being able to refer/quantify over such a class
Agreed that it's a question of semantics and with the above. But that being the case, different people or different conversations can use different identity conditions. My point being that there is no fact of the matter over what the right identity conditions are. So anything you consider to be an identity condition, I might consider not to be and vice versa.
But this is neither here nor there. I was just checking that you weren't taking for granted assumptions about identity we may not all share. Perhaps you aren't.
> when it's actually not relevant/significant, is the same thing as saying that we are engaged in special pleading.
I don't think so, but now we're just arguing about the semantics of "special pleading". Special pleading is agreeing that some principle applies in general, but not in some specific case. That is what I am arguably doing in accepting TER in general but not in the specific case of anthropic reasoning. I don't think I can cast you as engaging in special pleading. There is no general principle you're accepting but rejecting in some special case. You're not even taking this universe to be special in any objective sense. We all agree that it is special from the subjective perspective of the observer. What I disagree with is reasoning from this subjective perspective and treating it as more than "some universe" by using its specific identity as evidence, when from an objective perspective it would be interchangeable with any other universe in some class without changing anything at all about the reasoning.
To bring it back to Fred for a second. Dmitriy showed that the evidence that it was Fred does not block the correct inference, so TER is not violated. But I think we should agree that the fact that it was Fred was nevertheless irrelevant, and we would have been as well off only knowing that it was "some" uranium atom.
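One way to check the Fred claim numerically is the minimal sketch below. None of the numbers come from the discussion: the per-atom decay chance `P_DECAY`, the hypothetical pool of `POOL` nameable atoms, and the candidate atom counts are all assumptions chosen for illustration. The key modeling assumption is that the chance of Fred being in the sample at all grows in proportion to the number of atoms in it. Whether that step is legitimate is exactly what is in dispute in this thread, so the sketch only shows what follows if it is granted.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Bayes' rule over a finite set of hypotheses (both arguments are dicts)."""
    joint = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(joint.values())
    return {h: w / total for h, w in joint.items()}

# All values below are illustrative assumptions, not figures from the thread.
P_DECAY = Fraction(1, 100)        # chance that any given atom decays within the hour
POOL = 1000                       # hypothetical number of nameable atoms that could be in the sample
CANDIDATE_N = [1, 50, 100, 200]   # hypotheses about how many uranium atoms the sample holds
prior = {n: Fraction(1, len(CANDIDATE_N)) for n in CANDIDATE_N}

# Generic evidence: exactly one of the n atoms decayed.
some_atom_decayed = {
    n: n * P_DECAY * (1 - P_DECAY) ** (n - 1) for n in CANDIDATE_N
}

# Specific evidence: Fred is in the sample (assumed chance n / POOL),
# Fred decayed, and none of the other n - 1 atoms did.
fred_decayed = {
    n: Fraction(n, POOL) * P_DECAY * (1 - P_DECAY) ** (n - 1) for n in CANDIDATE_N
}

generic = posterior(prior, some_atom_decayed)
specific = posterior(prior, fred_decayed)
print(generic == specific)
```

Under these assumptions the two likelihoods differ only by the constant factor `1 / POOL`, which cancels on normalization, so the posterior over atom counts is the same whether we condition on "some atom decayed" or on "Fred decayed". If instead the chance of Fred being present is taken to be independent of the atom count, the factor of `n` disappears from `fred_decayed` and the two posteriors come apart, which is the situation the Manson and Thrush passage describes.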
My strong intuition is that the same applies in anthropic reasoning, and that the specific fact that our universe is "this" universe is irrelevant, and we would be as well off only knowing (or only pretending we know) that it was "some" universe. If this can be made to accord with TER somehow, then that would be nice. If it can't, then so much the worse for TER, because I'm more inclined to accept this principle than to accept TER.
I don't see why the following principle is any less compelling than TER.
"Any inference that we make knowing only that some observable is some fungible instance of a class should not be blocked when we know that it is a specific instance of the class"
But it's less snappy because it's more subtle and I need to explain it. The word "fungible" here is important. What I mean is that the instances of the class should be fungible with respect to the inference. For example, if I know only that some human is the current US President, then the inference that the US President is mortal is justifiable, and cannot be overturned by knowing the specific identity of the US president, because the members of the class "human" are fungible with respect to mortality. But the members of the class "human" are not fungible with respect to the inference "The US President's name is probably not Joe Biden", so knowing that the US President is Joe Biden can block the original inference.
What this means, I think, is that for an inference from generic information to be blocked by specific information, there should be some special property of the specific information that disconfirms or casts doubt on the original inference, such as "This X is called Joe Biden". Whatever this property is, it should not have the character that we could find analogous disconfirming properties on any other member of the class. It would be crazy (I feel) to have an inference which follows from "Some observable is of the class C" blocked by any and all possible evidence of the form "Some observable is X which is in the class C".
A better wording of my proposed principle would be
"Any inference that we make knowing only that some observable is some fungible instance of a class should not be blocked *just because* we know that it is a specific instance of the class"
Hey DM,
"Any inference that we make knowing only that some observable is some fungible instance of a class should not be blocked *just because* we know that it is a specific instance of the class"
Since you define fungible as fungible with respect to the hypothesis, this just means that every instance of a class must confirm the hypothesis. Which is another way of saying that: "Any inference based on a particular cannot invalidate/block a logically necessary hypothesis about that general class". That's because contingent hypotheses about a class could be potentially overturned by particular evidence.
That's an obvious truth, but one which isn't so helpful here on account of its generality. A more helpful and narrower construal would be "Any inference based on a particular cannot invalidate/overturn a hypothesis about the general class holding a necessary property; assuming the hypothesis is sound". The latter type of hypothesis is still logically necessary (meaning it is either necessarily true or false); while being a more relevant sub-category.
So the reason knowing the identity of the president cannot invalidate the hypothesis about all human presidents being mortal is that such a hypothesis is logically necessary (assuming that humans can't be immortal by definition); it has nothing to do with reasoning based on the particular being special or different.
If on the other hand, we had a contingent hypothesis, such as one where we try to estimate the half life of a particle based on it having decayed in a sample consisting of a mix of U-235 & U-238 & Plutonium & Thorium nuclides; then particular evidence overrules inferences based on general evidence. Suppose we only knew the general evidence "some atom in the sample decayed"; this, at best, gives a 1/4 chance that the half life of the atom is w,x,y, or z. Once we know, "this specific plutonium atom decayed" however, we can give a better probabilistic inference which overturns the previous one.
This isn't going to be true for all particular evidence in all cases of contingent hypotheses obviously, because not all such evidence is relevant (e.g. knowing that our atom exists at coordinates x,y,z). But the way to see whether such evidence is relevant is to do the math and find out if the probabilistic outcomes are affected, and so TER tells us that taking into account irrelevant evidence can never be bad because it's just a superficial thing (it doesn't change anything).
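"Doing the math" on the mixed-sample example can be sketched as follows. This is only an illustration under stated assumptions: the uniform prior of 1/4 per species is taken from the figure quoted above and ignores differences in abundance and decay rate, the 1/1000 chance of finding an atom at the given coordinates is a made-up number, and `is_relevant` is a hypothetical helper name.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Bayes' rule over a finite set of hypotheses (both arguments are dicts)."""
    joint = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(joint.values())
    return {h: w / total for h, w in joint.items()}

def is_relevant(prior, likelihood):
    """Evidence counts as relevant only if conditioning on it moves the posterior."""
    return posterior(prior, likelihood) != prior

SPECIES = ["U-235", "U-238", "plutonium", "thorium"]

# Assumed state of knowledge after "some atom in the sample decayed":
# a 1/4 chance for each species.
prior = {s: Fraction(1, 4) for s in SPECIES}

# "This specific plutonium atom decayed": impossible unless the species is plutonium.
saw_plutonium = {s: Fraction(int(s == "plutonium")) for s in SPECIES}

# "The atom sits at coordinates x,y,z": assumed equally likely for every species.
at_coordinates = {s: Fraction(1, 1000) for s in SPECIES}

print(is_relevant(prior, saw_plutonium))
print(is_relevant(prior, at_coordinates))
```

The plutonium evidence moves the posterior, while the coordinate evidence has the same likelihood under every hypothesis and so cancels out on normalization. Including it is harmless, which is the sense in which taking irrelevant evidence into account "doesn't change anything".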
But in our case we aren't dealing with a logically necessary hypothesis or irrelevant evidence (we all agree that D modifies the likelihood of M being true, compared to O); so we can't use your principle to argue that reasoning on the particular in those instances is blocked/invalidated.
Hey Alex and DM, I think this puts it really well:
"This isn't going to be true for all particular evidence in all cases of contingent hypotheses obviously, because not all such evidence is relevant (e.g. knowing that our atom exists at coordinates x,y,z). But the way to see whether such evidence is relevant is to do the math and find out if the probabilistic outcomes are affected, and so TER tells us that taking into account irrelevant evidence can never be bad because it's just a superficial thing (it doesn't change anything)"
About DM's proposed principle, I don't feel I entirely understand the exact meaning of fungible. But I am curious if the principle is disconfirmed by the example with 100 sleeping patients and the doctor killing 99 if he flips tails. This, or an example like this, is one of the crazy anthropic scenarios I wanted to use to understand why and how we must include D0 (our existence). Check out my first installment: https://www.reasonmethis.com/2021/03/anthropic-reasoning-1-i-am-therefore-i.html
It starts slow, but I think it might be good to go step by step, if only to see which exact step is the one where we first start to disagree. Perhaps we could keep talking in the comments section of that new blurb - I think this page is becoming a bit too large for browsers to handle smoothly.