Probability and prediction

Introduction

This session is about how we can best make predictions based on the information we have. What are our best guesses at unknown quantities given the data we observe? If we have new information, how do we update our guesses? Key to this analysis is an important formula known as Bayes' Theorem, which you'll learn about this week.

1. A puzzling probability problem

2. Bayes' Theorem: switching round tree diagrams

3. Making predictions: part 1

4. Making predictions: part 2

5. Bayes in action: the Prosecutor's Fallacy

1. A puzzling probability problem

I have three coins in my pocket. On one of them, both sides are "heads"; the second coin has two "tails" and the last coin is normal and has "heads" on one side, "tails" on the other.

I take out one of the coins at random and place it on a table. It shows heads.

What's the probability that the other side is also heads?

Have a look at this simulation, which repeats the experiment many times. Are you still convinced by your original answer?

Most people, when they see this problem for the first time, give an answer of 1/2, but the simulation might convince you that the other side shows heads around 2/3 of the time, and in fact this is the correct answer to my original question.
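Here is a minimal Python sketch of a simulation like the one mentioned above. It assumes that each coin is equally likely to be drawn and that either of its two faces is equally likely to land face up. The function name `simulate_three_coins` is illustrative only.

```python
import random

# The three coins from the puzzle, each written as its two faces.
COINS = [("H", "H"), ("T", "T"), ("H", "T")]

def simulate_three_coins(trials: int, seed: int = 0) -> float:
    """Estimate P(other side is heads | visible side is heads) by repeated trials."""
    rng = random.Random(seed)
    shows_heads = 0
    both_heads = 0
    for _ in range(trials):
        coin = rng.choice(COINS)          # take a coin out at random
        side = rng.randrange(2)           # which face ends up on top
        up, down = coin[side], coin[1 - side]
        if up == "H":                     # only count trials where we see heads
            shows_heads += 1
            if down == "H":
                both_heads += 1
    return both_heads / shows_heads

if __name__ == "__main__":
    print(simulate_three_coins(100_000))
```

With enough trials the printed proportion should settle close to 2/3 rather than 1/2.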

Have a think about why. Can you come up with a convincing explanation? Below is one way of explaining it:

The information we're given (that the coin shows heads) makes it twice as likely that the double-heads coin was chosen as the heads-tails coin. The situation "the heads-tails coin was chosen and it shows heads" has a probability of 1/3 × 1/2 = 1/6 (think of a branch of a tree diagram), whereas the situation "the double-heads coin was chosen and it shows heads" has a probability of 1/3, because if it's chosen it always shows heads. So, given that we see heads, the probability that we're looking at the double-heads coin, and hence that the other side is also heads, is (1/3) / (1/3 + 1/6) = 2/3.
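The same argument can be checked exactly by listing outcomes. This small sketch assumes, as in the puzzle, that all six combinations of coin and face-up side are equally likely; the names used are illustrative only.

```python
from fractions import Fraction

# Each coin listed as its two faces; 3 coins x 2 faces = 6 equally likely outcomes.
COINS = {"double-heads": ("H", "H"), "double-tails": ("T", "T"), "normal": ("H", "T")}

def p_other_side_heads() -> Fraction:
    outcomes = []
    for faces in COINS.values():
        for side in (0, 1):
            outcomes.append((faces[side], faces[1 - side]))  # (face up, face down)
    # Condition on seeing heads, then count how often the hidden face is heads too.
    seen_heads = [o for o in outcomes if o[0] == "H"]
    favourable = [o for o in seen_heads if o[1] == "H"]
    return Fraction(len(favourable), len(seen_heads))

print(p_other_side_heads())  # 2/3
```

Two of the three heads-up outcomes come from the double-heads coin (one for each of its sides), which is where the "twice as likely" comes from.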

What is the flaw in the argument below?

If I see heads, it's equally likely to be the normal coin or the double-heads coin, and so half the time there will be another head and half the time it will be a tail.

Here's a video explanation of another famous counterintuitive problem, usually called the Monty Hall Problem. Watch the video and make sure you understand the explanation. Here's another, very succinct, explanation of the problem:

If you always switch, the only times you don't win are when you initially picked the door with the car behind it, and that happens only 1/3 of the time.
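A quick simulation makes the succinct explanation concrete. This sketch assumes the standard rules of the game: the car is behind a random door, the host always opens a door that hides a goat and wasn't picked by the contestant (choosing at random when he has two options), and the switch is always offered. The function name `monty_hall` is hypothetical.

```python
import random

def monty_hall(trials: int, switch: bool, seed: int = 0) -> float:
    """Fraction of games won when the contestant always sticks or always switches."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        # The host opens a door that is neither the contestant's pick nor the car.
        opened = rng.choice([d for d in range(3) if d != pick and d != car])
        if switch:
            # Move to the one door that is still closed and not already picked.
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += pick == car
    return wins / trials

print("stick: ", monty_hall(100_000, switch=False))
print("switch:", monty_hall(100_000, switch=True))
```

Sticking should win roughly a third of the time and switching roughly two thirds.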

2. Bayes' Theorem: switching round tree diagrams

Have a think about this problem:

Every day I roll a die to decide how to travel to school. If I roll a six, I walk; otherwise I cycle. If I cycle, the probability that I arrive late is one seventh, whereas if I walk, the probability that I'm late is three fifths.

One day, you see me arrive late to school. What's the probability that I walked to school that day?
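Once you've had a go, you could check your answer with this sketch, which does the tree-diagram calculation exactly using Python's `fractions` module (so it gives the answer away). The probabilities come straight from the problem; the variable names are illustrative.

```python
from fractions import Fraction

# Values taken from the problem statement.
p_walk = Fraction(1, 6)              # roll a six -> walk
p_cycle = 1 - p_walk                 # otherwise cycle
p_late_given_walk = Fraction(3, 5)
p_late_given_cycle = Fraction(1, 7)

# The two tree-diagram branches that end in "late".
p_walk_and_late = p_walk * p_late_given_walk       # 1/6 * 3/5 = 1/10
p_cycle_and_late = p_cycle * p_late_given_cycle    # 5/6 * 1/7 = 5/42
p_late = p_walk_and_late + p_cycle_and_late

# Of all the late days, what fraction come from the "walk" branch?
p_walk_given_late = p_walk_and_late / p_late
print(p_walk_given_late)  # 21/46
```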

Here's a video discussion of the problem.

Now watch this video explaining a result known as Bayes' Theorem, named after the 18th-century statistician and church minister Thomas Bayes. Then have a go at these questions. Solutions here.
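For reference, here is the usual statement of Bayes' Theorem for two events A and B with P(B) > 0, together with the form in which P(B) is expanded over the two branches of a tree diagram (A and its complement A^c). The letters A and B are generic placeholders, not necessarily the notation used in the video.

```latex
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
            = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid A^{c})\,P(A^{c})}
```

Taking A = "I walk" and B = "I'm late" turns the right-hand side into exactly the tree-diagram calculation for the school-travel problem.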

3. Making predictions: part 1

Look at the picture below:

Presented with just this information, how many volumes would you guess there are in the full set? All we know for certain is that the answer is at least 13. But we could perhaps make an intuitive guess at which is the most likely out of a set of options, say 13, 16, 30 or 200.

Of these four choices, which do you think is the most sensible guess? Why?

We're going to analyse this problem with some code, but we'll need to start by making an assumption, which you may or may not agree with:

Assume that this is a random set of four numbers from the list 1, 2, 3, ..., n, where n is the quantity which we want to estimate.

Here are some ideas for an estimator of n, if all we're presented with is a sample of four numbers from the list:

A: double the mean of the set of four numbers and round to a whole number;
B: add together the smallest and largest of the four numbers;
C: increase the largest of the four numbers by 25% and round to a whole number;
D: come up with your own estimator...

For the sets below, I picked a value of n and generated 10 random choices of 4 'volumes'; four of them are shown here:

6, 10, 20, 19

4, 11, 13, 14

16, 14, 8, 7

19, 15, 13, 20

So each set represents seeing four particular volumes of a series in a book shop.

Let's work out the values of each of the estimators A, B, C and D for each of the four sets above:

6, 10, 20, 19: A = 28, B = 26, C = 25, D = ??

4, 11, 13, 14: A = 21, B = 18, C = 18, D = ??

16, 14, 8, 7: A = 23, B = 23, C = 20, D = ??

19, 15, 13, 20: A = 34, B = 33, C = 25, D = ??

Which of the estimators gives the most consistent results? The value of n I used was in fact 22. Now that you know this, which of the estimators gives the most accurate results? [Hint: what's the average value of each of the estimators A, B, C and D?]
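This small Python sketch recomputes estimators A, B and C for the four sets, along with their averages, which may help with the hint. One assumption to flag: the values above appear to round halves upwards (22.5 becomes 23 for A on the third set), so the sketch uses a hypothetical `round_half_up` helper. Python's built-in `round` uses round-half-to-even and would give 22 there.

```python
import math
from fractions import Fraction

def round_half_up(x: Fraction) -> int:
    # Round to the nearest whole number, with halves going up (e.g. 22.5 -> 23).
    return math.floor(x + Fraction(1, 2))

def estimator_a(sample):
    # A: double the mean, rounded to a whole number.
    return round_half_up(Fraction(2 * sum(sample), len(sample)))

def estimator_b(sample):
    # B: smallest plus largest.
    return min(sample) + max(sample)

def estimator_c(sample):
    # C: largest increased by 25%, rounded to a whole number.
    return round_half_up(Fraction(max(sample)) * Fraction(5, 4))

SETS = [(6, 10, 20, 19), (4, 11, 13, 14), (16, 14, 8, 7), (19, 15, 13, 20)]

for name, est in [("A", estimator_a), ("B", estimator_b), ("C", estimator_c)]:
    values = [est(s) for s in SETS]
    average = Fraction(sum(values), len(values))
    print(name, values, "average:", float(average))
```

For these four sets the printed averages are 26.5 for A, 25 for B and 22 for C.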

If we want a good estimator of an unknown value, we want one which is both accurate (in the sense of being close to the true value on average) and consistent (doesn't vary wildly for different sets of data).
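One way to measure both qualities is by simulation: fix n, draw many random samples of four distinct numbers from 1, 2, ..., n, and record the mean (accuracy) and standard deviation (consistency) of each estimator. This is a rough sketch in that spirit, not the notebook's own code; `assess` and the other names are hypothetical, and the rounding again assumes halves go up.

```python
import math
import random
import statistics
from fractions import Fraction

def round_half_up(x: Fraction) -> int:
    # Halves round upwards, e.g. 22.5 -> 23.
    return math.floor(x + Fraction(1, 2))

ESTIMATORS = {
    "A": lambda s: round_half_up(Fraction(2 * sum(s), len(s))),     # double the mean
    "B": lambda s: min(s) + max(s),                                 # smallest + largest
    "C": lambda s: round_half_up(Fraction(max(s)) * Fraction(5, 4)),  # largest + 25%
}

def assess(n: int, trials: int = 20_000, sample_size: int = 4, seed: int = 0):
    """Return {name: (mean, standard deviation)} for each estimator."""
    rng = random.Random(seed)
    results = {name: [] for name in ESTIMATORS}
    for _ in range(trials):
        sample = rng.sample(range(1, n + 1), sample_size)  # distinct "volumes"
        for name, est in ESTIMATORS.items():
            results[name].append(est(sample))
    return {name: (statistics.mean(v), statistics.stdev(v)) for name, v in results.items()}

for name, (mean, sd) in assess(22).items():
    print(f"{name}: mean {mean:.2f}, standard deviation {sd:.2f}")
```

For what it's worth, a short calculation shows that before rounding each of A, B and C has an average value of n + 1 (not n) over all possible samples, so with n = 22 expect the simulated means to sit a little above 22; C should show the smallest standard deviation.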

Use this notebook to investigate various estimators for this problem. There's a challenge for you at the end. Have a go at these questions using the notebook.

4. Making predictions: part 2

Suppose I see a coin tossed 15 times, and it lands on heads 11 times. What would you think about the underlying probability of getting heads? Here are some possible answers:

The probability of getting heads is 11/15.
The probability of getting heads is 1/2; the evidence above is not compelling enough to dissuade me from this view.
My best guess at the probability was 1/2 before I saw the coin tossed; perhaps I now need to adjust this estimate upwards.

Maybe a better question would be: given this information, what's the probability that the coin is fair?

To start thinking about how we analyse this situation, watch this video.

Then work through this sheet to explore why, in some senses, our best estimate in the situation above should be 12/17.
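One standard route to 12/17, though not necessarily the one the sheet takes, is to start from a uniform prior on the probability p of heads. After h heads in n tosses, the posterior mean of p is then (h + 1)/(n + 2), which gives (11 + 1)/(15 + 2) = 12/17. This sketch computes that exactly using the Beta function and checks it with a simple midpoint-rule integral; the uniform prior is the key assumption and the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def beta_fn(a: int, b: int) -> Fraction:
    # Beta function for positive integers: B(a, b) = (a-1)! (b-1)! / (a+b-1)!
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))

def posterior_mean(heads: int, tosses: int) -> Fraction:
    """Posterior mean of P(heads), assuming a uniform prior on [0, 1]."""
    tails = tosses - heads
    # Posterior density is proportional to p^heads (1-p)^tails, so
    # E[p] = integral of p^(heads+1)(1-p)^tails / integral of p^heads (1-p)^tails.
    return beta_fn(heads + 2, tails + 1) / beta_fn(heads + 1, tails + 1)

def posterior_mean_numeric(heads: int, tosses: int, steps: int = 100_000) -> float:
    # The same ratio of integrals, approximated with the midpoint rule.
    tails = tosses - heads
    num = den = 0.0
    for i in range(steps):
        p = (i + 0.5) / steps
        weight = p**heads * (1 - p)**tails
        num += p * weight
        den += weight
    return num / den

print(posterior_mean(11, 15))  # 12/17
print(posterior_mean_numeric(11, 15))
```

Compared with the raw proportion 11/15, this estimate is pulled slightly towards 1/2 by the prior.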

5. Bayes in action: the Prosecutor's Fallacy

Suppose that a certain DNA profile matches only one in a million people, and a man stands accused of a crime based on the fact that his DNA matches some found at the crime scene. What is wrong with the following statement?

The probability that the man is innocent is only one in a million.

This is an example of getting a conditional probability the wrong way round. The one-in-a-million figure is the probability that a DNA match is found given that a person is innocent. It is not the same as the probability that a person is innocent given a match, which is what we really need to know to decide whether to convict someone of a crime. Juries are required to decide 'beyond reasonable doubt' whether someone is guilty, so it's vital to be able to get a handle on the second probability.
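To see how far apart the two probabilities can be, here is a sketch with purely hypothetical numbers. It assumes the culprit is one of 10 million people, all equally likely before the DNA evidence, that the guilty person always matches, and that each innocent person matches with probability one in a million. It ignores all other evidence and any chance of laboratory error; the function name is illustrative.

```python
from fractions import Fraction

def p_innocent_given_match(population: int, match_rate: Fraction) -> Fraction:
    """P(innocent | DNA match) under a deliberately simple hypothetical model.

    Exactly one person in `population` is guilty, everyone is equally likely to be
    that person beforehand, the guilty person always matches, and each innocent
    person matches with probability `match_rate`.
    """
    p_guilty = Fraction(1, population)
    p_innocent = 1 - p_guilty
    p_match = p_guilty * 1 + p_innocent * match_rate   # law of total probability
    return p_innocent * match_rate / p_match           # Bayes' Theorem

# Illustrative numbers only: 10 million possible culprits, a one-in-a-million profile.
p = p_innocent_given_match(10_000_000, Fraction(1, 1_000_000))
print(float(p))
```

Under these assumptions we'd expect about ten innocent people to match for every guilty one, so the probability of innocence given a match comes out at roughly 10/11 (about 0.91), nowhere near one in a million.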

Read the article here and then watch the first 5 minutes 30 seconds of this video for an analysis of the probabilities.