### Truth and Lies

### Are You Biased?

The short answer to this question, as we shall see, is almost certainly yes.

By carefully choosing my words, I accidentally on purpose made it look like all these deaths occurred in 2016. They didn’t. In fact, Boeing reported that these were the cumulative deaths from 1959 to 2016, and not just in the US, but worldwide. The average figure (for what it’s worth – it’s ridiculously easy to lie with averages, as we shall see later) is just 526 worldwide airline passenger deaths per year, and the annual accident rate per million departures has declined from around 50 in 1959 to around 1 in 2016. Clearly it’s safe to fly – and becoming increasingly so. You see, I told the truth, but I did not tell the whole truth. I deliberately deceived by using a number of different biasing techniques.

I was highly selective in reporting only the facts I wanted you to see and added in some other stuff to artificially inflate the figure in the reader’s mind. And I did this without actually telling a lie. Never mind that Boeing’s numbers were recorded, tallied, analysed, calculated, graphed and nicely reported; if you have a mind to misrepresent the data, it is not difficult to do so. Newspapers and politicians do it all the time, and they are constantly rewarded for it (more newspaper sales, more votes). Bias is a type of error in which a measure or observation is systematically different from the ‘truth’ (whatever that is). It can affect any stage of the research process, from your literature review to measuring and recording your data, from analysing and interpreting your results to publishing them in a journal, thesis, report or newspaper article.

Imagine you sent out questionnaires to a thousand households, asking a single, simple question: Do you like to answer questionnaires?

I hope it’s obvious to you that those people who do enjoy questionnaires will answer that they do, while those who do not will file the paperwork carefully in the nearest waste-paper bin. The former group will have selected themselves into the study, while the latter will have excluded themselves. Any analyses, results and interpretations based upon these flawed data will have almost zero validity. This may be a very extreme example, but selection bias like this is deliberately inserted into data every day by unscrupulous commercial organisations that are more interested in selling their wares to you than in presenting the truth.
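
The questionnaire thought-experiment is easy to simulate. A minimal sketch (the 30% figure is invented purely for illustration):

```python
import random

random.seed(1)

# Hypothetical population: only 30% of households actually enjoy questionnaires.
population = [random.random() < 0.30 for _ in range(1000)]

true_rate = sum(population) / len(population)

# Self-selection: people who enjoy questionnaires return them,
# everyone else bins the paperwork.
responses = [likes for likes in population if likes]

surveyed_rate = sum(responses) / len(responses)

print(f"True rate of questionnaire-lovers: {true_rate:.0%}")
print(f"Rate among those who responded:    {surveyed_rate:.0%}")  # always 100%
```

However few people genuinely enjoy questionnaires, the survey of respondents will always report 100% – the data collection mechanism guarantees it.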

Consider the dieting programme that rejects the data of all those who drop out of the trial – most of those who drop out likely do so because it wasn’t working, whereas those who successfully lose weight are likely to stay in. Report your results, data be damned! It is not just a simple matter of saying ‘I’m not biased, so therefore I won’t bias my study’, though – bias may be conscious (deliberate) or unconscious (accidental, careless or negligent), and there are literally hundreds of ways in which you can bias your research. And ignorance is no excuse. If you’re not experienced in data gathering, a more experienced practitioner will easily spot bias in your processes and tear your study to shreds, rendering your 3-year study worthless. Not a nice feeling! And just because you’ve made calculations to sixteen decimal places, don’t think for one second that your numbers are in any way valid.
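
The dropout trick is just as easy to demonstrate. In this invented trial the programme does nothing at all – the true average weight change is zero – yet a completers-only analysis reports a healthy loss:

```python
import random

random.seed(42)

# Hypothetical diet trial: the programme has no real effect.
# Weight changes in kg, centred on zero (negative = loss).
changes = [random.gauss(0, 3) for _ in range(200)]

# Attrition: participants who aren't losing weight drop out,
# and their data are quietly discarded.
completers = [c for c in changes if c < 0]

true_mean = sum(changes) / len(changes)
reported_mean = sum(completers) / len(completers)

print(f"True mean change (everyone):     {true_mean:+.2f} kg")
print(f"Reported mean (completers only): {reported_mean:+.2f} kg")
```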

A calculation based on poor assumptions and flawed measurements, however precise it may seem, is just as guilty of being biased as if you’d just pulled the number out of a hat. Incidentally, a few years back I did some analysis for a colleague who wanted me to check some of his results.

I shall call him Paco, for no other reason than that is not his name. Despite having precisely the same data and having used the same analysis methods, I was unable to verify some of Paco’s results. When I asked why some were statistically significant in his analyses where they were not in mine, he replied ‘well, if I exclude these 2 patients from the analysis the p-value becomes significant’. On enquiring why those patients should be excluded from the analysis, he said ‘because when I do, the p-value becomes significant…’. Oh dear – my good friend Paco was guilty of omission bias. He didn’t make things any better for himself, though; he was quite unrepentant, insisting that there was a very good scientific reason why he should exclude these patients from the study, but that he just hadn’t found it yet!
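
Paco’s trick can be sketched with an exact permutation test on invented data – two ‘awkward’ patients are all that stand between a null result and a ‘significant’ one:

```python
from itertools import combinations

def perm_test(a, b):
    """Exact two-sided permutation test on the difference of means."""
    pooled = a + b
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    total = sum(pooled)
    n = len(pooled)
    assignments = list(combinations(range(n), len(a)))
    count = 0
    for idx in assignments:
        sa = sum(pooled[i] for i in idx)
        diff = abs(sa / len(a) - (total - sa) / (n - len(a)))
        if diff >= observed - 1e-12:
            count += 1
    return count / len(assignments)

# Invented data: the treatment group looks better than control,
# except for two inconvenient patients at the end.
treatment = [6.1, 6.4, 6.0, 6.3, 6.2, 2.0, 2.2]
control = [5.0, 5.1, 4.9, 5.2, 5.0, 5.1, 4.8]

p_full = perm_test(treatment, control)
p_excluded = perm_test(treatment[:5], control)  # quietly drop two patients

print(f"p-value with all patients:       {p_full:.3f}")
print(f"p-value with two 'excluded':     {p_excluded:.4f}")
```

With all fourteen patients the difference is nowhere near significant; exclude the two awkward ones and the p-value collapses below 0.05. Same data, same test – only the omission changed.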

Another way of adding bias to your study is by consistently understating or overstating a particular measure. If your measurement errors are systematic rather than random, you are guilty of measurement bias, also known as observational bias. You might spot this at unscrupulous weight-loss clubs, where they ‘fix’ the scales at the first weighing to read a higher weight, then alter the scales to deliberately record a lower weight at subsequent weigh-ins.

This ensures that a small but consistent weight loss is recorded, keeping the happy customer coming back for more. A real-world example of measurement bias came in 1628, when the Swedish ship Vasa sank less than a mile into her maiden voyage, killing around 30 people. A recent investigation into the sinking of what was considered to be the most powerful warship in the world discovered that the ship is thinner on the port side than the starboard side. Apparently, the workers on the starboard side used rulers that were calibrated in Swedish feet (12 inches), while workers on the port side used rulers calibrated in Amsterdam feet (11 inches). Of course, in these days of great computing power and automated software these kinds of mistakes just don’t happen.

Or do they? A catastrophic and very expensive example of measurement bias came in 1999, when the Mars Climate Orbiter was lost as it travelled too close to the planet’s atmosphere. An investigation found that the $125m probe was lost because the NASA team used metric units while a contractor used Imperial measurements. In other words, despite an abundance of space in that region of the solar system, the probe missed and crashed into Mars instead. Oops… I particularly enjoy the story of the British rock band Black Sabbath, who, in 1983, had a replica of Stonehenge made for their stage show. Unfortunately, it was so big that it got in the way of the band and very few of the replica stones would fit on the stage.

The legend goes that there was a mix-up between metres and feet, so the entire structure was more than three times bigger than it should have been. This was parodied the following year in the riotous mockumentary This Is Spinal Tap, where the rock group ordered a model of Stonehenge for their stage show, but the note written on a napkin said 18’’ (18 inches) rather than 18’ (18 feet).

When it comes to reporting results, you can bias the outcome by highlighting favourable evidence that seems to confirm a particular position while suppressing evidence to the contrary. This is known as cherry picking, also known as suppressing evidence, and may be intentional or unintentional. In public debate, though, and particularly in politics, it is rarely unintentional.
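
Before moving on, the measurement biases above are easy to put into numbers – a sketch using invented, rounded figures:

```python
# The rigged scales: the customer's true weight never changes,
# only the scale's bias does.
true_weight = 80.0                    # kg (invented)
first_weigh_in = true_weight + 1.5    # scale set to read heavy at sign-up
later_weigh_in = true_weight - 0.5    # scale set to read light afterwards
apparent_loss = first_weigh_in - later_weigh_in

# The Vasa: the same '10 feet' measured with two different rulers.
starboard_inches = 10 * 12   # Swedish foot   = 12 inches
port_inches = 10 * 11        # Amsterdam foot = 11 inches

# Mars Climate Orbiter: pound-force seconds read as newton seconds
# (1 lbf*s is about 4.448 N*s).
error_factor = 4.448

# Stonehenge: 18 metres built where 18 feet was meant.
oversize = 1 / 0.3048        # metres per foot, inverted: ~3.3x too big

print(f"Apparent weight loss:      {apparent_loss:.1f} kg (true loss: 0.0 kg)")
print(f"Hull mismatch per 10 feet: {starboard_inches - port_inches} inches")
print(f"Mars impulse misread by:   {error_factor:.2f}x")
print(f"Stonehenge scale error:    {oversize:.1f}x in every dimension")
```

In every case the error is systematic – a constant offset or scale factor – which is exactly what makes measurement bias so hard to spot from inside the data.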

After Donald Trump’s inauguration as President of the United States, the White House Press Secretary, Sean Spicer, held a press briefing and angrily accused the media of deliberately underestimating the size of the inauguration crowd, stating that it was the “largest audience to ever witness an inauguration – period…”. To back up his claims, Spicer (who was subsequently – and mercilessly – parodied by comedienne Melissa McCarthy on the US comedy show Saturday Night Live) claimed that 420,000 people rode the Washington DC Metro on inauguration day in 2017 compared to 317,000 in 2013.

This statement led to a series of highly public (and, to the casual observer, highly amusing) arguments between the press and the White House staff about the meaning of the words ‘truth’ and ‘facts’. When Spicer quoted the number 317,000, he was attributing it to Barack Obama’s 2013 inauguration, and it was a correct number, although it was cherry picked. The 317,000 referred to the number of people who travelled on the DC Metro on the morning of Obama’s inauguration; the whole-day figure was 782,000. Conversely, no one knows where the figure of 420,000 for Trump’s inauguration came from.

The official figures for ridership of the DC Metro on Trump’s inauguration day were 193,000 in the morning and 570,000 for the whole day – lower than the figures for any of the previous four inaugurations! In an interview the following day, Trump’s campaign strategist Kellyanne Conway defended Spicer’s statements, saying “Sean Spicer gave alternative facts to these claims”. Prior to Trump’s inauguration, the term ‘alternative facts’ meant something different. It was generally understood to be ‘OK, so you’ve shown me your facts, now let me show you mine’, and the two sets of alternative facts (all of which were, actually, factual) would be examined side by side. The world now sees ‘alternative facts’ as a synonym for deliberate falsehoods or intentional lies.

For those interested in spotting bias in news reports and research studies, it is not enough simply to inspect the numbers being presented. It is useful to dig deeper and ask questions such as: Who is the author? What do they have to gain? Who sponsored or commissioned the work? Has the author diligently highlighted potential sources of bias in their report? If not, why not? It is only by asking such questions that you can uncover the author’s motives and come to your own conclusion about the validity of their arguments.

### The Average Human Being Has Only One Testicle

A physicist, an engineer and a statistician on a hunting trip spot a deer in a clearing. The physicist shoulders his rifle, takes aim and misses the deer by 5 feet to the left. The engineer grabs the rifle and says, “You forgot to account for the wind”. He takes aim and misses the deer by 5 feet to the right.

The statistician suddenly jumps up into the air, shouting excitedly “We got it, we got it!”. When it comes to misdirection there is no better way than using averages. The term ‘average’ is used an awful lot in the popular press – and even appears far too often in respected peer-reviewed journals. You see, there isn’t a single meaning to the word ‘average’; it can refer to any of several different measures, and they can give very different answers. In general, when someone uses the word ‘average’ they mean some kind of middle value. Let’s look at an example to see how the three common measures of average (mean, median and mode) can give wildly varying answers.

Let’s say that there are ten people who live on your street and that, on average, everyone is a millionaire. Happy days! Now what if I told you that three are out of work and flat broke, six are employed on normal wages of about £20-30k, and the legendary rock musician Ozzie von Hendrix, who earned £10 million last year, lives in the mansion on the cliff at the end of the street? That would rather change the whole complexion of the neighbourhood, wouldn’t it? You see, it all depends which measure of average you choose to use.

If you’re a real estate agent and you’re trying to sell a house in the street, it will do your cause the power of good to tell prospective buyers that this is a really good area with an average income of over a million quid (the mean). On the other hand, you could argue to the local tax commission that the residents cannot afford any rate rises because the average resident of this street is unemployed and hasn’t earned a bean in the past year (the mode in these data is £0).

And yet the numbers don’t lie – you could argue that on average you are all millionaires (mean), earn a standard living of £25k (median) and at the same time you are all broke (mode). This is why you should develop a nervous tic whenever you hear the word ‘average’ without an explanation of which specific measure is being used.

There is a specific case, though, when mean, median and mode are all the same, and that is when your data are normally distributed. It is useful to understand when data are not likely to fit a normal distribution when trying to figure out if someone is pulling a fast one. Heights of men are normally distributed, as are blood pressure measurements and IQ scores. Beware, though, the data distributions that sound like they should be normal but in fact hide two or more distinct distributions, such as annual salaries. It is not difficult to rationalise annual salaries in a company as being normally distributed – the janitors and cleaners are the lowliest paid, the managers are more highly paid and the bulk of the workforce have a wage somewhere in between. Sounds normal to me. Ah, yes, but what about the directors and executives?

Typically, they have an entirely different pay scale.

If you were to graph all the salaries with income on the x-axis and frequency (the number of workers in a particular pay band) on the y-axis, there would be a large hump on the left representing the wages of the ordinary employees, a gap, and then a smaller hump on the right corresponding to the wages of the executives. If you were to roll all wages into a single ‘company average salary’, the executive salaries would drag the mean to the right, suggesting a higher ‘average’ salary. The median salary, though, would hardly be affected by the executive wages.
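
A sketch with invented salary data shows how little the median cares about the executives:

```python
from statistics import mean, median

# Invented bimodal salary data: 90 ordinary employees, 10 executives.
employees = [22_000 + 200 * i for i in range(90)]        # £22,000 .. £39,800
executives = [250_000 + 50_000 * i for i in range(10)]   # £250,000 .. £700,000
everyone = employees + executives

print(f"Median, employees only: £{median(employees):,.0f}")
print(f"Median, everyone:       £{median(everyone):,.0f}")
print(f"Mean, everyone:         £{mean(everyone):,.0f}")
```

Adding the ten executive salaries nudges the median by a few hundred pounds but more than doubles the mean – which is precisely why the choice of measure is a choice of story.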

If the intention of the analyst was to try to downplay the average company wages, perhaps for some sort of tax audit, she might omit the executive wages (remember selection bias from above?) and then use the median. Alternatively, to exaggerate the average salary, perhaps for recruitment purposes, she would include executive salaries and use the mean. Better still, make sure that all executive bonuses, dividends and payments-in-kind are included too – that would make for an even higher average! Whenever the word ‘average’ is used, we should be sceptical – it is very often used to mislead. Like the man who drowned in a pool of water, the average depth of which was one inch. Or that the average human being has only one testicle. We should also spot when a report specifies which measure of average is being used for some figures but not all: ‘Our survey said that median age was 27 years old and average income was £32k’.

The attempt to hide a rogue ‘average’ income in there amongst a ‘median’ age should not fool the wary, and questions should be asked as to the validity and motives of the report.

### Give Ps A Chance

A physicist, a chemist and a statistician spot a fire in the waste-paper basket. The physicist decides that to put out the fire they must cool the materials until the temperature is lower than the ignition temperature. The chemist disagrees, deciding that they should eliminate one of the reactants by cutting off the oxygen supply.

With the debate raging, the physicist and chemist are alarmed to see the statistician running around starting fires in all the other waste paper baskets. “What are you doing?” they both scream, to which the statistician replies, “Trying to get an adequate sample size…”. What does it mean when a survey finds that “66.7% of people agree that…”? You can be pretty sure that there were only three people in the survey and one of them disagreed. What if one person in the study changed their mind? Well, then, either 100% or 33.3% agreed. These percentage swings are huge, and – I’m sure you will agree – the survey has almost no credibility.
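
The fragility of tiny samples can be put in numbers. A short sketch using the standard error of a proportion, sqrt(p(1-p)/n):

```python
from math import sqrt

def swing(agree, n):
    """Percentage-point change if one respondent flips their answer."""
    return 100 * abs((agree + 1) / n - agree / n)

def std_err(p, n):
    """Standard error of an estimated proportion."""
    return sqrt(p * (1 - p) / n)

# With three respondents, one change of mind moves the result by 33 points.
print(f"n=3:    one flipped answer moves the result {swing(2, 3):.1f} points")
print(f"n=1000: one flipped answer moves the result {swing(667, 1000):.1f} points")

# The uncertainty shrinks like 1/sqrt(n).
print(f"SE of 66.7% from n=3:    +/-{100 * std_err(2/3, 3):.1f} points")
print(f"SE of 66.7% from n=1000: +/-{100 * std_err(2/3, 1000):.1f} points")
```

The same headline figure of 66.7% carries an uncertainty of roughly ±27 points from three respondents, but only about ±1.5 points from a thousand.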

The number of participants in a poll, study or survey, then, is critically important in determining its validity. We call this the sample size. If you’re reading this book then you will likely have at least a nodding acquaintance with statistics and will understand why we need a sample – most of the time it is practically, ethically or economically infeasible to poll the entire population. Instead, we take a small representative sample and use it to infer results for the whole population. Ah, yes, but how small?

There is a whole branch of statistics dedicated to calculating sample sizes in all sorts of situations, and there is just as much art in the calculations as science. What can be said with certainty, though, is that the larger the sample size, the higher your confidence in the results – though the results might not be what you wish to find. If you’re a smart, conscientious medical researcher whose job it is to save lives and whatnot, then you will try to have as large a sample size as your ethics committee will allow. Conversely, if you’re a researcher who works for a company whose job it is to sell as much product as possible, then your main role might be to show how your product is ‘better’ or ‘preferred’ over rival products, even if it isn’t. Not to worry – you have statistics and small sample sizes on your side! This reminds me of the research that led to Dr Andrew Wakefield being struck off the UK medical register for, amongst other things, his 1998 research paper claiming a link between the measles, mumps and rubella (MMR) vaccine and the appearance of autism in children – research that has since been shown to be fraudulent.

The paper was reported widely in the media and panic ensued as millions of parents across the US, UK and beyond refused to have their children vaccinated with the MMR vaccine. After vaccination rates dropped, incidences of two of the three diseases increased greatly. As a direct result, many children who were not immunised became very ill, some were permanently injured and there were even deaths reported across the globe. During an investigation into the study, all sorts of unethical and unauthorised shenanigans were uncovered, but there were two things in particular that really stood out for me.

The first was that the paper had a sample size of just 12 children. As we’ve already established, small sample sizes are the playground of the unscrupulous, and any medical study that has only 12 samples should be taken with a pinch – correction – a huge cartful of salt. The second was that the paper was published in The Lancet – one of the most prestigious peer-reviewed publications in the world. Really. They should have known better. To their credit, The Lancet eventually retracted the paper, stating that it was “utterly false” and that the journal had been “deceived”. One of the big problems with this research was that with only 12 patients it was very easy to ‘turn a blind eye’ to a piece of evidence or ‘tweak’ the odd bit of data here and there and – hey presto – a highly significant result. It would have been much harder to deceive the research community if the sample size had been much larger.

OK, take a deep breath, count to ten and we’ll continue with a lighter subject… The bottom line about sample sizes is that the larger the sample, the higher the confidence in the results. Sure, but how do we measure this confidence? Many years ago, the British statistician Ronald Aylmer Fisher invented the p-value. Well, actually, that is disputed, but I’m not getting into it. Fisher, at the very least, popularised the p-value as a measure of confidence in a particular hypothesis test. We all know how to do this: you make a hypothesis that there is no effect (the null hypothesis) and then try to disprove it. The p-value, a number between 0 and 1, tells us how likely we would be to see a result at least as extreme as ours if the null hypothesis were true. If the p-value is large (greater than 0.05), our data are quite consistent with the null hypothesis and our confidence in a significant result is low. On the other hand, if the p-value is small (less than 0.05), our data would be unlikely if the null hypothesis were true, and we can be reasonably confident in rejecting it and concluding that there is evidence of a significant result. But where did this arbitrary figure of 0.05 come from?

This came directly from Fisher himself, who said that: …it is convenient to draw the line at about the level at which we can say: “Either there is something in the treatment, or a coincidence has occurred such as does not occur more than once in twenty trials”… And of course, 1 in 20 expressed as a decimal is 0.05. In seven years as a statistical consultant in one of the largest teaching hospitals in Europe, not a day went by when I did not receive a dataset with the instructions ‘I need a p-value for my study’. Since Fisher introduced the p-value in 1925 it has become one of the cornerstones of statistical analysis, and any study that does not have a p-value attached to it is deemed worthless. So why is it that a seemingly important number such as “our survey found a 57% increase…” is not accompanied by a p-value? Do they have something to hide? You can bet your sweet tush that they do! Take the example of the cat food commercial, where they claim, “9 out of 10 cats prefer MiaoMix cat food”.

To what? Budgie seed? And how many times did they repeat the experiment of putting cats in front of two bowls of food and waiting until 9 out of 10 went to the bowl that they wanted them to go to? If they had reported that ‘90 out of 100 cats preferred…’ it would be much more believable, but it is very much easier to get 9 out of 10 cats to go to the ‘right’ bowl than 90 out of a 100, especially when you spray the ‘wrong’ bowl with cat repellent. You see, sample size matters, and so does the p-value. Where is the p-value? Nowhere to be seen! How many repeats did they conduct on the experiment? No idea – they won’t tell us. And just where did they file the other results from all these repeat experiments? To quote the late, lamented Douglas Adams: They were on public display in the display department beneath the cellar in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard’…

So the next time you see a report, paper, thesis or whatever, take the time to check out the sample size and ask yourself if the sample is large enough to permit any reliable conclusion, then follow up with the question ‘what is the probability that this result is true?’. Look for whether a p-value is reported. If the study is statistically significant, you can bet your mother’s top lip that the authors will include a p-value and anything else that would add weight to their assertions.

If they are not there, be very, very suspicious.

With the debate raging, the physicist and chemist are alarmed to see the statistician running around starting fires in all the other waste paper baskets. “What are you doing?” they both scream, to which the statistician replies, “Trying to get an adequate sample size…”. What does it mean when a survey finds that “66.7% of people agree that…”? You can be pretty sure that there were only three people in the survey and one of them disagreed. What if one person in the study changed their mind? Well, then, either 100% or 33.3% agreed. These percentage swings are huge, and – I’m sure you will agree – the survey has almost no credibility.
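Since we're talking numbers, the swing is easy to quantify. A quick sketch (in Python, purely for illustration – the survey sizes here are my own made-up numbers):

```python
# How far can one changed answer move a reported percentage?
# Illustrates the "66.7% of people agree" problem at different sample sizes.

def percent_agree(agree: int, total: int) -> float:
    """Percentage of respondents who agree, to one decimal place."""
    return round(100 * agree / total, 1)

for total in (3, 30, 300):
    agree = round(total * 2 / 3)              # two-thirds agree, as in the example
    before = percent_agree(agree, total)
    after = percent_agree(agree - 1, total)   # one respondent changes their mind
    print(f"n={total:>3}: {before}% -> {after}%")
```

With three respondents, one change of heart swings the headline figure by more than 33 percentage points; with three hundred, by less than half a point.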

The number of participants in a poll, study or survey, then, is critically important in determining its validity. We call this the sample size. If you’re reading this book then you will likely have at least a nodding acquaintance with statistics and will understand why we need a sample – it’s because most of the time it’s practically, ethically or economically infeasible to poll the entire population. Instead, we take a small representative sample and then scale up to infer the results for the whole population. Ah, yes, but how small?

There is a whole branch of statistics dedicated to calculating sample sizes in all sorts of situations, and there is just as much art in the calculations as science. What can be said with certainty, though, is that the larger the sample size the higher your confidence in the results – though the results might not be what you wish to find. If you’re a smart, conscientious medical researcher whose job it is to save lives and whatnot, then you will try to have as large a sample size as your ethics committee will allow. Conversely, if you’re a researcher who works for a company whose job it is to sell as much product as possible, then your main role might be to show how your product is ‘better’ or ‘preferred’ over rival products, even if it isn’t. Not to worry – you have statistics and small sample sizes on your side! This reminds me of a piece of research that led to Dr Andrew Wakefield being struck off the UK medical register for (amongst other things) his 1998 research paper claiming a link between the measles, mumps and rubella (MMR) vaccine and the appearance of autism in children – research that has since been shown to be fraudulent.

The paper was reported widely in the media and panic ensued as millions of parents across the US, UK and beyond refused to have their children vaccinated with the MMR vaccine. After vaccination rates dropped, incidences of two of the three diseases increased greatly. As a direct result, many children who were not immunised became very ill, some were permanently injured and there were even deaths reported across the globe. During an investigation into the study, all sorts of unethical and unauthorised shenanigans were uncovered, but there were two things in particular that really stood out for me.

The first was that the paper had a sample size of just 12 children. As we’ve already established, small sample sizes are the playground of the unscrupulous, and any medical study that has only 12 samples should be taken with a pinch – correction – a huge cartful of salt. The second was that the paper was published in the Lancet – one of the most prestigious peer-reviewed publications in the world. Really. They should have known better. To their credit, the Lancet eventually retracted the paper claiming that it was “utterly false” and that the journal had been “deceived”. One of the big problems with this research was that with only 12 patients it was very easy to ‘turn a blind eye’ to a piece of evidence or ‘tweak’ the odd bit of data here and there and – hey presto – a highly significant result. It would have been much harder to deceive the research community if the sample size had been much higher.
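As an aside, the ‘whole branch of statistics’ dedicated to sample sizes does have some simple workhorses. One of the most common answers the question: how many people must I poll to pin a survey percentage down to a given margin of error? A sketch using the standard normal-approximation formula (the margins below are my own illustrative choices):

```python
import math

def sample_size_for_proportion(margin: float, p: float = 0.5, z: float = 1.96) -> int:
    """Smallest n that estimates a proportion to within +/-margin at ~95%
    confidence (z = 1.96), using the normal approximation.
    p = 0.5 is the worst case and gives the largest n."""
    return math.ceil(z * z * p * (1 - p) / (margin * margin))

# To pin a survey percentage down to within +/-3 points:
print(sample_size_for_proportion(0.03))   # roughly a thousand respondents
# ...and to within +/-10 points:
print(sample_size_for_proportion(0.10))
```

Halving the margin of error roughly quadruples the required sample – which is why tight, honest surveys are expensive, and why twelve children (or ten cats) is cheap.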

OK, take a deep breath, count to ten and we’ll continue with a lighter subject… The bottom line about sample sizes is that the larger the sample, the higher the confidence in the results. Sure, but how do we measure this confidence? Many years ago, the British statistician Ronald Aylmer Fisher invented the p-value. Well, actually, that is disputed, but I’m not getting into it. Fisher, at the very least, popularised the p-value as a measure of evidence in a hypothesis test. We all know how to do this: you make a hypothesis that there is no effect (the null hypothesis) and then try to disprove it. The p-value, a number between 0 and 1, tells us the probability of seeing a result at least as extreme as the one we got if the null hypothesis were true (note that this is not quite the same thing as the probability that the null hypothesis is correct – a distinction that trips up even professional researchers). If the p-value is large (greater than 0.05), our data are entirely consistent with the null hypothesis and we have little ground for claiming a significant result. On the other hand, if the p-value is small (smaller than 0.05), our data would be surprising if the null hypothesis were true, so we reject the null hypothesis and conclude that there is evidence of a significant result. But where did this arbitrary figure of 0.05 come from?
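If that definition feels slippery, a p-value can always be estimated by brute force: simulate a world in which the null hypothesis is true and count how often that world throws up a result at least as extreme as yours. A sketch (the ‘9 heads in 10 tosses’ scenario is my own illustration, not Fisher’s):

```python
import random

random.seed(42)

# Null hypothesis: the coin is fair. Observed: 9 heads in 10 tosses.
# The p-value is the probability, under the null, of a result at least
# this extreme (here, one-sided: 9 or more heads).
observed_heads, n_tosses = 9, 10

trials = 100_000
at_least_as_extreme = 0
for _ in range(trials):
    heads = sum(random.random() < 0.5 for _ in range(n_tosses))
    if heads >= observed_heads:
        at_least_as_extreme += 1

p_value = at_least_as_extreme / trials
print(f"Estimated one-sided p-value: {p_value:.4f}")
# The exact answer is (C(10,9) + C(10,10)) / 2**10 = 11/1024, about 0.011
```

About one simulated fair-coin world in a hundred produces 9 or more heads – that rarity is the p-value, nothing more mystical than that.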

This came directly from Fisher himself, who said that: …it is convenient to draw the line at about the level at which we can say: “Either there is something in the treatment, or a coincidence has occurred such as does not occur more than once in twenty trials”… And of course, 1 in 20 expressed as a decimal is 0.05. In seven years as a statistical consultant in one of the largest teaching hospitals in Europe, not a day went by when I did not receive a dataset with the instructions ‘I need a p-value for my study’. Since Fisher introduced the p-value in 1925 it has become one of the cornerstones of statistical analysis, and any study that does not have a p-value attached to it is deemed worthless. So why is it that seemingly important numbers such as “our survey found a 57% increase…” are not accompanied by a p-value? Do they have something to hide? You can bet your sweet tush that they do! Take the example of the cat food commercial, where they claim, “9 out of 10 cats prefer MiaoMix cat food”.

To what? Budgie seed? And how many times did they repeat the experiment of putting cats in front of two bowls of food and waiting until 9 out of 10 went to the bowl that they wanted them to go to? If they had reported that ‘90 out of 100 cats preferred…’ it would be much more believable, but it is very much easier to get 9 out of 10 cats to go to the ‘right’ bowl than 90 out of 100, especially when you spray the ‘wrong’ bowl with cat repellent. You see, sample size matters, and so does the p-value. Where is the p-value? Nowhere to be seen! How many times did they repeat the experiment? No idea – they won’t tell us. And just where did they file the other results from all these repeat experiments? To quote the late, lamented Douglas Adams: They were on public display in the display department beneath the cellar in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard’…
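The gulf between ‘9 out of 10’ and ‘90 out of 100’ can be made precise. If the null hypothesis is that cats have no preference (each bowl equally likely), the exact one-sided binomial p-values for the two claims are worlds apart (a sketch with my own worked numbers):

```python
from math import comb

def binomial_p_value(successes: int, n: int, p: float = 0.5) -> float:
    """Exact one-sided p-value: probability of `successes` or more
    out of n trials when each trial succeeds with probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(successes, n + 1))

print(f"9 out of 10 cats:   p = {binomial_p_value(9, 10):.6f}")
print(f"90 out of 100 cats: p = {binomial_p_value(90, 100):.2e}")
```

A single 9-out-of-10 run sits at p ≈ 0.011 – borderline, and easy enough to fish for by quietly repeating the experiment – while a genuine 90 out of 100 would be overwhelming evidence. Which is rather the point.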

So the next time you see a report, paper, thesis or whatever, take the time to check out the sample size and ask yourself if the sample is large enough to permit any reliable conclusion, then follow up with the question ‘how likely is it that this result arose by chance alone?’. Look for whether a p-value is reported. If the study is statistically significant, you can bet your mother’s top lip that the authors will include a p-value and anything else that would add weight to their assertions.

If they are not there, be very, very suspicious.

The Confidence Trickster “Look at the difference in heights on this bar chart,” says the researcher, “there must be a significant difference between them – this is twice the height of the control sample.” “Ah yes,” says the statistician, “but where is your confidence about the result?” “Oh, I’m very confident about the result…” explains the researcher. The statistician face-palms and exclaims “d’oh!”. Suppose you measure the heights of 100 randomly chosen men and calculate that the mean height is 175cm. If you repeat the experiment (with 100 different men, obviously), will you get precisely the same answer as you did first time? If you get a different answer, how do you know which one is correct? Are either of them correct? Would you rather be doing something more fun instead? No, wait, there isn’t anything more fun – statistics is the most fun anyone can have by themselves… In statistics, confidence isn’t about your ego; it is a calculation of how sure you are that the result represents the ‘true’ measure – that is, the answer you would get if you measured the whole population. If you repeat the height measurement experiment lots of times, you will get many different answers.

Most will be quite similar and a few will be quite different, but each of the answers gives us more information about the true measure. Are all the results quite similar or is there a lot of variation? Take the middle half by ranking all the results from smallest to largest and discarding the upper and lower quarters. If that middle half falls between 150cm and 200cm, how confident will you be in stating that the mean height is 175cm? Not very. OK, so let us instead say that the middle half of the results lie between 170cm and 180cm. Now how confident are you that the mean height is 175cm? Feeling a little more smug now, aren’t we?

By now, you should be starting to get the sense that the calculated mean is just one possible measure out of many. In other words, it is not likely to be the ‘true’ value. In fact, you’re never likely to know exactly what the true value is, and the best you can do is give some estimate of how confident you are that it lies between some lower and upper measurement. As above, you could perhaps say that you estimate with 50% confidence that the ‘true’ height of men lies between 170cm and 180cm (since 50% of the means lie between these values).

These confidence intervals (170-180cm) are quite tight and the level of confidence (50%) in the estimate of the true mean is quite low; it is the same probability as tossing a coin – I don’t know about you, but I’m not going to put my shirt on it. To increase confidence in the result we can widen the confidence intervals. We might then find that 75% of the results lie between 160cm and 190cm, or that 95% of the results fall in the range 150-200cm. In this latter example, we can say that 95% of experiments carried out just like this one will include the true mean, but 5% will not – there remains a 1 in 20 chance that our confidence interval does not include the true mean.
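The ‘repeat the experiment many times and take the middle slice’ idea is easy to simulate. A sketch, assuming a made-up population of men with mean height 175cm and standard deviation 7cm (my numbers, chosen only for illustration):

```python
import random
import statistics

random.seed(1)

def sample_mean_height(n: int = 100, mu: float = 175.0, sigma: float = 7.0) -> float:
    """Mean height of one random sample of n men from the assumed population."""
    return statistics.fmean(random.gauss(mu, sigma) for _ in range(n))

# Repeat the 100-man experiment many times and rank the resulting means.
means = sorted(sample_mean_height() for _ in range(10_000))

def middle_interval(values, level):
    """Empirical interval containing the central `level` fraction of values."""
    tail = (1 - level) / 2
    lo = values[int(tail * len(values))]
    hi = values[int((1 - tail) * len(values)) - 1]
    return lo, hi

for level in (0.50, 0.75, 0.95):
    lo, hi = middle_interval(means, level)
    print(f"middle {level:.0%} of sample means: {lo:.1f}cm to {hi:.1f}cm")
```

Note how the interval must widen as the confidence level rises, exactly as described above.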

Let’s have another look at the example from the top of this chapter, where the researcher pointed out that his results were twice the size of the control sample. It is very common to see this kind of graph, accompanied by a lot of excitement. The more important the discovery, the more excitement goes along with it. You can feel the researcher sensing that their career is about to take off and shoot through the stratosphere. Until the more experienced researcher coughs politely and apologises for asking a silly question, but ‘umm, why aren’t there any 95% Confidence Intervals on the graph?’. Now picture the same graph of the sample and the control, redrawn with 95% confidence intervals. Are we still confident that we have made a major discovery? The height of the sample is still twice that of the control, but my – look how wide those confidence intervals are. You see, the measurement of the mean is not important, but the confidence intervals are, and this is why sampling is so important to any study. With a small sample size, you can achieve pretty much any mean value you desire.
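That last claim – that a small sample can deliver pretty much any mean you desire – is easy to demonstrate: draw lots of samples of different sizes from one and the same population and see how far apart their means land. A sketch with assumed numbers (population mean 10, standard deviation 6, both invented for illustration):

```python
import random
import statistics

random.seed(7)

# One population; every sample below is drawn from it.
population_mean, sigma = 10.0, 6.0

def one_sample_mean(n: int) -> float:
    """Mean of a single random sample of size n."""
    return statistics.fmean(random.gauss(population_mean, sigma) for _ in range(n))

ranges = {}
for n in (3, 10, 100):
    means = [one_sample_mean(n) for _ in range(1_000)]
    ranges[n] = (min(means), max(means))
    print(f"n={n:>3}: sample means ranged from {ranges[n][0]:.1f} to {ranges[n][1]:.1f}")
```

A determined cherry-picker armed with samples of three can find runs whose mean is nowhere near the truth; with samples of a hundred, the means barely budge.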

The confidence intervals will betray you, but only if you publish them with your study. Remember the “9 out of 10 cats” example from earlier? Given that the company commissioning the research is likely very rich (otherwise they wouldn’t be spending hundreds of thousands of pounds on a large advertising campaign), don’t you think they could have afforded to do a study with more than 10 cats? Of course they could, but getting the result they desire depends on a small sample size – and on the fact that, every once in a while, a small sample will deliver that result by chance. If they had used 100 cats, how likely do you think it would have been for them to be able to publish a result that says “90 out of 100 cats preferred MiaoMix cat food (p=0.0001, 95% confidence intervals: 87.3-92.7% in over 1000 repeat experiments)”? Believe me – if these were their results they would be shouting them from the rooftops, and you would be left in absolutely no doubt that their cat food is the best. These are not their results though. They state ‘9 out of 10’ because they cannot get 90 out of 100. There is no p-value because their results are not significant.

They make no mention of the repeatability of their results because their actual results vary extremely widely. They don’t publish their confidence intervals because that would actually have the opposite effect – cat owners would be disgusted and would switch to another brand. Telling the truth in a commercial setting can be the equivalent of business suicide! A few years ago, I was attending a business course and I was chatting with the presenter at the break. Unusually, he became very interested when I told him that I was a statistician – they generally begin looking for the nearest available exit.

Incidentally, that reminds me of a story that my old professor told me and my colleagues when I was a physics undergraduate student. He said that if we are at a party and are asked what we do for a living, under no circumstances should we say that we are physicists. If we have any designs on settling down with a wife and having 2.3 kids, or even just settling for an occasional girlfriend or being invited to any other party, we should say that we are lorry drivers or postmen. I must say that it is pretty good advice, and I get invited to at least two parties most years. Now I’m a statistician I tell the same story to my students, although of course I substitute ‘statistics’ for ‘physics’. But I digress – back to the business course. The presenter, a former Chief Financial Officer of a very successful FTSE 100 tech company (before they were acquired by a much larger Japanese rival), told me the story of how the company spent millions of pounds trying to figure out why the reported accuracy of their device was lower than that of their main rival. They looked at every component, all the hardware, the software, the testing methods – everything. They were convinced that every component in their device was at least as good as the equivalent component in their rival’s device, and in many cases superior. And yet their numbers didn’t tally.

They finally turned to a statistician to lead the investigation. He found that, although the reported accuracy of their device appeared worse than that of their rival’s, the confidence intervals of the competing devices were, in fact, pretty much identical – their device was not inferior to their rival’s product after all! Now if only they had consulted a statistician in the first place. After all, we are pretty easy to spot – we’re the uncomfortable looking guys at parties claiming to be lorry drivers and postmen…

Pay No Attention To That Man Behind The Curtain… The British illusionist Derren Brown once famously claimed that he could flip a coin fairly, under controlled conditions and – in front of multiple cameras recording continuously from multiple angles – have it come up heads ten times in a row.

Then he did it. You can imagine him putting his fingers to his temples, saying “now I want all of you at home to focus…”, ignoring the fact that the footage had been pre-recorded and the audience was being asked to mentally influence an event that had already taken place some weeks earlier. “Focus…” he might have said, “and I want you to picture in your mind the image of heads coming up each and every time”. “Focus…” he might have repeated, because he didn’t actually say any of this, I’m using artistic licence to raise the tension for dramatic effect, “…focus, and visualise a head coming up with each new coin toss…”. “Are we ready? Let’s count them off together,” he says, as he tosses the first coin into the bowl. “1 – that’s the first head”, he says as he picks up the coin and prepares for the second toss. “2, that’s 2, that’s the second head” “3. That’s 3 heads” “4. That’s 4 heads in a row”. *Sniff*, he sniffed nonchalantly … … “8. That’s 8 now, just 2 more to go”. A little rub of the nose and a shrug of the shoulders.

“9 heads. Last one coming up”, he said confidently. “10. That’s 10 heads in a row. Thank you very much ladies and gentlemen. Good night, and don’t forget – I’ll be here all week…” Ta-daaaaa! Ten heads in a row. It’s a modern day miracle! Then he revealed how he had done it. The likelihood of ten heads coming up in a row is about one in 1000 (you can do the maths if you’d like), but if you flip enough coins you will eventually get those ten heads that you are looking for. Start from the beginning, with the camera rolling (not live, of course – we’re pre-recording this), and count the heads: “1, 2, oops, a tails. Start again. 1, 2, 3, oh bugger, another tails. Start again…” and so on and so forth, etc., etc., ad nauseam until you eventually get the ten heads in a row. Once you have this footage, simply edit the film to show only the last ten coin flips, conveniently ignoring the previous 963 flips before the successful sequence emerged. In his explanation, after several hours of flipping coins and looking at nothing but the bottom of the bowl, he is heard to exclaim, “I can’t see the bowl anymore!”.

The filming lasted for over nine hours until he got the run of ten heads he was seeking. No doubt he ended up with a crippling migraine and woke up screaming “Tails!” in the middle of the night. You see, Mr Derren Brown, magician, illusionist, con man, told no lies. He told the truth, but he did not tell you the whole truth. He used the camera to lie by omission, showing you only the things he wanted you to see, while leaving out other inconvenient truths. He left it to your imagination to either deduce the truth, which any self-respecting mathematician or statistician would have done, or to be truly astounded by the amazing spectacle. For the record, your author wasn’t fooled – I knew what was going on. I guess you can fool some of the people some of the time…
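The arithmetic behind the trick checks out in a few lines. A simulation sketch (the session count and random seed are my own, not Derren Brown’s):

```python
import random

random.seed(0)

# Probability that ten fair coin flips all come up heads:
print(f"P(10 heads in a row) = 1/{2 ** 10}")   # 1/1024 – the 'about one in 1000'

def flips_until_run(run_length: int = 10) -> int:
    """Flip a fair coin until `run_length` consecutive heads appear;
    return the total number of flips it took."""
    streak = flips = 0
    while streak < run_length:
        flips += 1
        streak = streak + 1 if random.random() < 0.5 else 0
    return flips

# Average the length of many simulated 'filming sessions'.
sessions = [flips_until_run() for _ in range(2_000)]
print(f"average flips needed: {sum(sessions) / len(sessions):.0f}")
# Theory: the expected wait for a run of 10 heads is 2**11 - 2 = 2046 flips.
```

A couple of thousand flips per success is entirely consistent with nine-plus hours of filming – no trickery needed beyond the edit suite.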

A famous example of lying by omission came by way of the then President of the United States, Bill Clinton.

When asked about his infamous affair with a White House intern, Monica Lewinsky, President Clinton answered, “there’s nothing going on between us”. In his grand jury testimony he insisted that he had answered this question truthfully because at the time he was asked there was no ongoing relationship with Miss Lewinsky – it had ended months previously. There was then a long debate as to the meaning of the word ‘is’, as to whether it only pertains to the present or whether it can be construed to also include events in the past. Hilarious! I guess the interviewer should have been more careful with the question, and should have insisted on also using the word ‘was’ in the dialogue. Then we could have debated the meaning of that word too! You don’t necessarily need to lie to get your point across, though; there are other ways. Take the example that “83% of dentists recommend ToothWhyte toothpaste”. Do they indeed? Well, it must be really good stuff and better than all that other rubbish then, mustn’t it? If that were truly the case, then surely all the other toothpaste companies would have tested ToothWhyte for themselves and figured out their secret ingredient. Once they’ve done that, then their toothpaste will be just as good.

Except that ToothWhyte don’t need a super-expensive secret ingredient when they have statistics on their side. All they need to do is poll some dentists (who are no more experts on different brands of commercial toothpaste than the average Josephine on the street) and ask them “Which toothpastes in this list would you recommend (choose all that apply)?”. Dentists are, of course, going to recommend all reputable toothpastes, and many of them would likely tick the ToothWhyte box as well as some of the others. ToothWhyte would be likely to score highly; after all, that was what the study was designed to do. As a precursor question, they only need ask, “Have you heard of ToothWhyte toothpaste before?”.

Any dentist who answers that they have not would be excluded from the study. While ToothWhyte received 83% of recommendations, their competitors could perhaps be scoring 90-100%, putting ToothWhyte at the bottom of the list. Nevertheless, there is no need to confuse the issue by reporting all these other irrelevant results, is there? All that your audience needs to know is that 83% of dentists recommend our lovely toothpaste.

This is called wilful misdirection, where you selectively report only those results that make your point and bury those that do not. The customer might want an answer to the question “is ToothWhyte the best toothpaste for protecting and whitening my teeth?”. The manufacturer of said toothpaste doesn’t want to answer the question though. They cannot answer ‘Yes’ because it isn’t true, and they can’t answer ‘No’ because they will lose money. What they do instead is point to a result that answers a different question, one that looks similar to – but is distinctly different from – the one posed.

The question written at the top of the dentists’ poll may well have been “which of these toothpastes do you recommend?”, but the answers elicited from the structure of the poll were more closely aligned to the question “do you recognise any of these commercial brands of toothpaste?”. Do you think it likely that a real toothpaste company would engage in such practices as in this fictitious example? In 2007, Colgate was ordered by the Advertising Standards Authority of the UK to drop their claim that “more than 80% of dentists recommend Colgate”. This was deemed to be inherently misleading, as another competitor’s brand was recommended almost as much as the Colgate brand by the dentists surveyed. Colgate had surveyed dentists about which toothpastes they would recommend, and allowed them to select one or more brands.

The ASA said that the claim would be understood by readers to mean that 80% of dentists recommend Colgate over and above other brands, and the remaining 20% would recommend different brands. Apparently, the survey script also informed the dentists that an independent research company was carrying out the research, which was not true. Oh dear… Here’s another wonderful example of wilful misdirection by the ex-Chancellor of the Exchequer. I don’t recall the exact details, so I’ve made up the numbers here, but I distinctly remember the mathematical sleight of hand trick that he used to misdirect. He was delivering the budget statement live from Westminster and announcing that he was going to increase funding for some government department or other, and he proceeded to set out how much extra funding he was giving them.

Then he did it. You can imagine him putting his fingers to his temples, saying “now

I want all of you at home to focus…”, ignoring the fact that the footage had been pre-recorded and the audience was being asked to mentally influence an event that had already taken place some weeks earlier. “Focus…” he might have said, “and I want you to picture in your mind the image of heads coming up each and every time”. “Focus…” he might have repeated, because he didn’t actually say any of this, I’m using artistic licence to raise the tension for dramatic effect, “…focus, and visualise a head coming up with each new coin toss…”. “Are we ready? Let’s count them off together,” he says, as he tosses the first coin into the bowl. “1 – that’s the first head”, he says as he picks up the coin and prepares for the second toss. “2, that’s 2, that’s the second head” “3. That’s 3 heads” “4. That’s 4 heads in a row”. *Sniff*, he sniffed nonchalantly … … “8. That’s 8 now, just 2 more to go”. A little rub of the nose and a shrug of the shoulders.

“9 heads. Last one coming up”, he said confidently. “10.

That’s 10 heads in a row. Thank you very much ladies and gentlemen. Good night, and don’t forget – I’ll be here all week…” Ta-daaaaa! Ten heads in a row. It’s a modern day miracle! Then he revealed how he had done it. The likelihood of ten heads coming up in a row is about one in 1,024 (a half multiplied by itself ten times – you can do the maths if you’d like), but if you flip enough coins you will eventually get those ten heads that you are looking for. Start from the beginning, with the camera rolling (not live, of course – we’re pre-recording this), and count the heads: “1, 2, oops, a tails. Start again. 1, 2, 3, oh bugger, another tails. Start again…” and so on and so forth, ad nauseam, until you eventually get the ten heads in a row. Once you have this footage, simply edit the film to show only the last ten coin flips, conveniently ignoring the previous 963 flips before the successful sequence emerged. In his explanation, after several hours of flipping coins and looking at nothing but the bottom of the bowl, he is heard to exclaim, “I can’t see the bowl anymore!”.

The filming lasted for over nine hours until he got the run of ten heads he was seeking. No doubt he ended up with a crippling migraine and woke up screaming “Tails!” in the middle of the night. You see, Mr Derren Brown, magician, illusionist, con man, told no lies. He told the truth, but he did not tell you the whole truth. He used the camera to lie by omission, showing you only the things he wanted you to see, while leaving out other inconvenient truths. He left it to your imagination to either deduce the truth, which any self-respecting mathematician or statistician would have done, or to be truly astounded by the amazing spectacle. For the record, your author wasn’t fooled – I knew what was going on. I guess you can fool some of the people some of the time…
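The arithmetic behind the trick is easy to check for yourself. Here is a short Python sketch (entirely my own illustration – the function name and seed are invented): it computes the roughly one-in-a-thousand probability of ten consecutive heads, the average number of flips you would need before such a run appears, and simulates the tedious process Mr Brown endured off-camera.

```python
import random

def flips_until_streak(target=10, seed=42):
    """Flip a fair coin until `target` heads appear in a row.
    Returns the total number of flips that were needed."""
    rng = random.Random(seed)
    streak = 0
    flips = 0
    while streak < target:
        flips += 1
        if rng.random() < 0.5:  # heads extends the run
            streak += 1
        else:                   # tails resets the run to zero
            streak = 0
    return flips

# Probability of 10 heads in a row on any single attempt:
p = 0.5 ** 10            # 1/1024 – "about one in 1000"

# Expected number of flips before a run of n heads appears
# (a standard result for fair-coin runs): 2^(n+1) - 2
expected = 2 ** 11 - 2   # = 2046 flips on average

print(p, expected, flips_until_streak())
```

Run it a few times with different seeds and you will see why the filming took the best part of a day: on average you need around two thousand flips, and an unlucky session can take far more.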

A famous example of lying by omission came by way of the then President of the United States, Bill Clinton.

When asked about his infamous affair with a White House intern, Monica Lewinsky, President Clinton answered, “there’s nothing going on between us”. In his grand jury testimony he insisted that he had answered this question truthfully because at the time he was asked there was no ongoing relationship with Miss Lewinsky – it had ended months previously. There was then a long debate as to the meaning of the word ‘is’ – whether it pertains only to the present or whether it can be construed to also include events in the past. Hilarious! I guess the interviewer should have been more careful with the question, and should have insisted on also using the word ‘was’ in the dialogue. Then we could have debated the meaning of that word too! You don’t necessarily need to lie to get your point across, though; there are other ways. Take the example that “83% of dentists recommend ToothWhyte toothpaste”. Do they indeed? Well, it must be really good stuff and better than all that other rubbish then, mustn’t it? If that were truly the case, then surely all the other toothpaste companies would have tested ToothWhyte for themselves and figured out its secret ingredient. Once they’d done that, their toothpaste would be just as good.

Except that ToothWhyte don’t need a super-expensive secret ingredient when they have statistics on their side. All they need to do is poll some dentists (who are no more experts on different brands of commercial toothpaste than the average Josephine on the street) and ask them “Which toothpastes in this list would you recommend (choose all that apply)?”. Dentists are, of course, going to recommend all reputable toothpastes, and many of them would likely tick the ToothWhyte box as well as some of the others. ToothWhyte would be likely to score highly; after all, that was what the study was designed to do. As a precursor question, they only need ask, “Have you heard of ToothWhyte toothpaste before?”.

Any dentist that answers that they have not would be excluded from the study. While ToothWhyte received 83% of recommendations, their competitors could perhaps be scoring 90–100%, putting ToothWhyte at the bottom of the list. Nevertheless, there’s no need to confuse the issue by reporting all these other irrelevant results, is there? All that your audience needs to know is that 83% of dentists recommend our lovely toothpaste.
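The poll design described above is easy to mimic in a few lines. Here is a minimal Python sketch (the brand names, dentist count and recommendation probability are all invented for illustration) showing how a ‘choose all that apply’ poll lets every reputable brand score above 80% at the same time:

```python
import random

def run_poll(n_dentists=200,
             brands=("ToothWhyte", "BrandA", "BrandB"),
             recommend_prob=0.85, seed=1):
    """Simulate a 'choose all that apply' poll: each dentist ticks
    every reputable brand independently with high probability."""
    rng = random.Random(seed)
    counts = {b: 0 for b in brands}
    for _ in range(n_dentists):
        for b in brands:
            if rng.random() < recommend_prob:
                counts[b] += 1
    # Report each brand's share as a percentage of dentists polled
    return {b: round(100 * c / n_dentists) for b, c in counts.items()}

print(run_poll())
```

Every brand lands in the mid-eighties, yet an advert quoting only one of those numbers (“85% of dentists recommend ToothWhyte!”) quietly implies an exclusive preference that the poll never measured.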

This is called wilful misdirection, where you selectively report only those results that make your point and bury those that do not. The customer might want an answer to the question “is ToothWhyte the best toothpaste for protecting and whitening my teeth?”. The manufacturer of said toothpaste doesn’t want to answer the question though. They cannot answer ‘Yes’ because it isn’t true, and they can’t answer ‘No’ because they will lose money. What they do instead is point to a result that answers a different question, one that looks similar to – but is distinctly different from – the one posed.

The question written at the top of the dentists’ poll may well have been “which of these toothpastes do you recommend?”, but the answers elicited from the structure of the poll were more closely aligned to the question “do you recognise any of these commercial brands of toothpaste?”. Do you think it likely that a real toothpaste company would engage in such practices as in this fictitious example? In 2007, Colgate was ordered by the Advertising Standards Authority of the UK to drop their claim that “more than 80% of dentists recommend Colgate”. This was deemed to be inherently misleading, as another competitor’s brand was recommended almost as much as the Colgate brand by the dentists surveyed. Colgate had surveyed dentists about which toothpastes they would recommend, and allowed them to select one or more brands.

The ASA said that the claim would be understood by readers to mean that 80% of dentists recommend Colgate over and above other brands, and the remaining 20% would recommend different brands. Apparently, the survey script also informed the dentists that an independent research company was carrying out the research, which was not true. Oh dear… Here’s another wonderful example of wilful misdirection by the ex-Chancellor of the Exchequer. I don’t recall the exact details, so I’ve made up the numbers here, but I distinctly remember the mathematical sleight of hand trick that he used to misdirect. He was delivering the budget statement live from Westminster and announcing that he was going to increase funding for some government department or other, and he proceeded to set out how much extra funding he was giving them.

He announced that he was going to give an extra £10 million per year over the next 5 years. He then stated that this gives an additive £150 million over the next 5 years, over and above the already agreed funding. Wait, what? What just happened? How do 5 lots of 10 make 150? It didn’t take long before the press jumped on this and started asking questions.

The Chancellor’s explanation was thus: in the first year, the department will get an extra £10 million in funding. Over the first 2 years, they will get an extra £20 million; over the first 3 years, an extra £30 million; and so on. Cumulatively, they will get an additive £150 million (i.e. 10 + 20 + 30 + 40 + 50 = 150). He was repeatedly asked whether that meant an extra £150 million of new money was going to be given to the department, and he would not directly answer the question, only repeating that there would be an additive £150 million in funding. He never repeated the figure without the word additive attached to it. What he really said was that he was giving an extra £50 million in funding, but he wanted it to look much bigger, so he used inappropriate mathematics to get an artificially inflated answer and presented that rather than the true figure that he didn’t want to present. Don’t we just love politicians? Lying by omission and wilful misdirection are very simple devices to avoid telling the whole truth or answering the question.
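The sleight of hand becomes obvious when you write the two sums out side by side. A few lines of Python (my own illustration, using the made-up figures from the story) separate the genuine new money from the Chancellor’s ‘additive’ total:

```python
# Extra new money given to the department, in £m per year
annual_extra = [10, 10, 10, 10, 10]

# The honest total: 10 + 10 + 10 + 10 + 10 = 50
true_total = sum(annual_extra)

# The Chancellor's trick: build the running totals for each year...
cumulative = [sum(annual_extra[:i + 1]) for i in range(len(annual_extra))]
# cumulative == [10, 20, 30, 40, 50]

# ...then sum those running totals, double-counting the early years
additive_total = sum(cumulative)

print(true_total, additive_total)  # prints: 50 150
```

Each year’s £10 million gets counted once for every remaining year of the programme, which is how £50 million of real money inflates into a headline-friendly £150 million.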

If you use small sample sizes it is easy to repeat your test until you get the result you want, then selectively report only those results you want your audience to see. Alternatively, you can simply design your study to answer a different question to the one that seems to be asked. Oh yes, and name-drop as often as you can. ‘These data came from a Harvard study, don’t you know…’ Did they indeed? And were the data analysed, interpreted and published by Harvard too? Naming of reputable sources gives the appearance of legitimacy to a study and makes the work look genuine, even if it has been totally fudged!

