Date: 2019-08-21
“I only believe in statistics that I doctored myself.” - Winston Churchill
Statistics are everywhere and influence all of us. Daily we are confronted with ratings of cars, hospitals, schools and political candidates that may affect our decisions.
Some experts credit the British Prime Minister Benjamin Disraeli with the phrase: “There are three kinds of lies: lies, damned lies, and statistics” but the expression has been around almost as long as the word statistics. The problem is not the statistics, but how they can be misused or misunderstood. Let me illustrate five of the most common issues.
1. The inappropriate use of percentages without having the underlying numbers. Suppose we start a new retail company in 2017. In 2018 we had a good year, and we claim that we are the fastest-growing retailer in the U.S. with a 400 percent growth rate. That sounds impressive until we find out that the underlying base numbers were $50,000 in 2017 and $250,000 in 2018. Never rely on impressive percentages without the base numbers.
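For readers who like to check the arithmetic, here is a minimal Python sketch of the example. Growing from $50,000 to $250,000 is five times the base, which works out to a 400 percent increase. The second company, `big_rival`, and its figures are purely hypothetical, added only to show how a small percentage on a large base can dwarf a large percentage on a small one.

```python
def growth_percent(old: float, new: float) -> float:
    """Percent change from old to new, measured against old."""
    if old == 0:
        raise ValueError("growth is undefined when the base is zero")
    return (new - old) / old * 100.0


# Figures from the retail example above.
startup = {"2017": 50_000, "2018": 250_000}

# Hypothetical large competitor (made-up numbers, for contrast only).
big_rival = {"2017": 1_000_000_000, "2018": 1_050_000_000}

for name, sales in (("startup", startup), ("big_rival", big_rival)):
    pct = growth_percent(sales["2017"], sales["2018"])
    added = sales["2018"] - sales["2017"]
    print(f"{name}: {pct:.0f} percent growth, ${added:,} in added sales")
```

In this sketch the startup's percentage is 80 times the rival's, yet the hypothetical rival added 250 times as many dollars. The percentage alone tells you very little.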
2. Polls where a difference is claimed when there is no evidence for it. Almost daily, we see political surveys that track candidates for the presidency. Even if the polls are taken scientifically using a random sample of the population, there naturally tend to be changes from one poll to the next.
For example, we are told a sample of 600 likely voters was taken on candidate Smootley. (Given the sample size, the report indicates a margin of error of plus or minus 4 percentage points.) Last week his/her approval rating was 44 percent.
This week the approval rating has fallen to 42 percent. The media reporter might then go on to say that this could mean big trouble for Smootley. This claim is absolute nonsense. It is inexcusable for a journalist to make this mistake. Even a first-semester statistics student will tell you that, given the margin of error, this week’s “true” approval value could be as high as 46 percent or as low as 38 percent, a range that comfortably includes last week’s 44.
The margin of error is critical. Incidentally, if we want to cut it in half, we would have to quadruple the sample size to 2,400. So when you see reports of relatively small changes, don’t take them with a grain of salt; take them with the whole salt shaker.
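The poll arithmetic can be sketched in a few lines of Python. This is a rough illustration, not the pollster's actual method: it assumes the textbook formula for a simple random sample, a 95 percent confidence level (`z = 1.96`) and a worst-case 50/50 split (`p = 0.5`), none of which the poll report spells out. Under those assumptions, 600 respondents give roughly plus or minus 4 points, and 2,400 respondents give roughly plus or minus 2.

```python
import math


def margin_of_error(n: int, z: float = 1.96, p: float = 0.5) -> float:
    """Margin of error for a sample proportion, as a fraction.

    z = 1.96 (95 percent confidence) and p = 0.5 (worst case) are
    assumptions; the column does not state them.
    """
    return z * math.sqrt(p * (1 - p) / n)


def ranges_overlap(a: float, b: float, margin: float) -> bool:
    """True if a +/- margin and b +/- margin share any values."""
    return abs(a - b) <= 2 * margin


moe_600 = margin_of_error(600)
moe_2400 = margin_of_error(2400)

print(f"n = 600:  +/- {moe_600 * 100:.1f} points")
print(f"n = 2400: +/- {moe_2400 * 100:.1f} points")

# Last week's 44 percent versus this week's 42 percent.
print("Ranges overlap:", ranges_overlap(0.44, 0.42, moe_600))
```

Because the margin shrinks with the square root of the sample size, four times the respondents buys only half the uncertainty. The two approval ratings sit well inside each other's range, so the "drop" may be nothing more than sampling noise.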
3. Manipulated non-random samples. This example is based on an actual university in the southwest U.S. A few years ago, the university proudly declared, “We have the top-rated faculty in the nation.”
Sounds impressive until you find out this is a very small school with under 40 faculty and that the number of students rating the professors varied. In some cases, a faculty member was only rated by 3 or fewer students. But one faculty member (who may have been a reasonably good teacher) encouraged his 16 or so students to go online and rate him.
The school’s overall average rating soared to the top because all the individual ratings were simply pooled without weighting, so one professor’s 16 presumably glowing ratings counted for far more than anyone else’s. Unfortunately, CBS News picked up the story, apparently without checking the facts and methodology. At best, the university and then CBS were inept in using this result. At worst, it was unethical for the university to advertise it on its website and in promotional materials.
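To see how one enthusiastic class can move a pooled average, consider this small Python sketch. The ratings below are entirely made up (five invented professors on an assumed 1-to-5 scale); they are not the university's data and are meant only to show the mechanism.

```python
from statistics import mean

# Hypothetical ratings, invented for illustration only.
ratings = {
    "Prof A": [3, 4, 3],
    "Prof B": [2, 3],
    "Prof C": [4, 3, 3],
    "Prof D": [3],
    "Prof E": [5] * 16,  # the professor who asked his class to vote
}

# Pooled: every individual rating counts once, so Prof E dominates.
all_ratings = [r for scores in ratings.values() for r in scores]
pooled_average = mean(all_ratings)

# Per professor: each faculty member counts once, whatever the turnout.
per_professor_average = mean(mean(scores) for scores in ratings.values())

print(f"Pooled average:        {pooled_average:.2f}")
print(f"Per-professor average: {per_professor_average:.2f}")
```

With these invented numbers, the pooled figure comes out near 4.3 while the per-professor figure is closer to 3.4. Same faculty, same ratings, very different headline.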
4. Drawing a conclusion based on one or very few studies. Ever wonder about those health claims? Coffee is bad for you; coffee will prevent colon cancer; coffee does not affect your health. On and on it goes. Why all the confusion? One reason is that the media too often pick up on a single study.
Different studies of the same question often reach conflicting results. Things are usually not black and white. And you must consider the source. Was the study done by someone with a vested interest, or by researchers who don’t have an ax or any coffee to grind? How was the study done, how was the sample selected, and so forth?
One of the most powerful techniques in research is what we call a “meta” study, or meta-analysis. What that means is we do a study of studies. Rather than looking at just one or two studies on a topic, we comprehensively examine 20, 30, 40 or more. Taken together, these studies show where the preponderance of the evidence lies, and that can rightfully be the basis for good decision-making.
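One common way to combine studies, though by no means the only one, is to average their results while giving more weight to the more precise studies (inverse-variance weighting). The Python sketch below assumes that approach. The five effect sizes and standard errors are hypothetical, invented to show the idea; they do not come from any real coffee research.

```python
import math


def pooled_estimate(effects, std_errors):
    """Inverse-variance weighted average of study effects.

    Returns the pooled effect and its standard error. This is a simple
    fixed-effect sketch; real meta-analyses involve more checks.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical studies: (effect, standard error). Made-up numbers.
effects = [0.30, -0.10, 0.05, 0.12, -0.02]
std_errors = [0.20, 0.15, 0.05, 0.10, 0.08]

pooled, pooled_se = pooled_estimate(effects, std_errors)
print(f"Pooled effect: {pooled:.3f} (standard error {pooled_se:.3f})")
```

Any one of these invented studies, read alone, could produce a headline in either direction. The pooled estimate is steadier than any single study because its standard error is smaller than that of even the most precise one.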
5. Statistical significance may or may not be of any practical importance. In everyday use, the word significant means something is important or noteworthy. In statistics, a difference can be statistically significant, meaning it is unlikely to be due to chance alone, and still be too small to matter.
For example, suppose the local school district is considering an expensive new math curriculum that will cost $200 a student. The company trying to sell us the product provides us with research from West Overshoe University that proves that the curriculum produces a “statistically significant increase” in test scores.
In fact, their claim is valid. But should we buy it? We simply need more information. If we take a large enough sample, we might find that the scores went up from 78 to 79, and that is a statistically significant result. It is doubtful that the school district would want to spend $200 a student for so modest a gain.
On the other hand, if the results went up from 78 to 92, not only would it likely be statistically significant, but it would be of practical importance and might justify the investment.
If something is not statistically significant, that is the end of the game. But if someone claims statistical significance (whether it has to do with nutrition, growing hair, test score improvement or a myriad of other claims), you need to demand more information. Just how substantial is the increase or decrease you are seeking?
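The test-score example can be put in rough numbers with a short Python sketch. It assumes a simple two-group z-test, a standard deviation of 10 points in test scores, and group sizes of 50 and 5,000 students; all three are assumptions chosen for illustration, not figures from the vendor's research.

```python
import math


def compare_groups(mean_old, mean_new, sd, n_per_group):
    """Two-sided z-test for a difference in means, plus effect size.

    Assumes equal group sizes and a known, shared standard deviation.
    Returns (p_value, effect_size), where effect_size is the gain
    measured in standard deviations.
    """
    std_error = sd * math.sqrt(2.0 / n_per_group)
    z = (mean_new - mean_old) / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    effect_size = (mean_new - mean_old) / sd
    return p_value, effect_size


SD = 10.0  # assumed spread of test scores

scenarios = [
    ("78 to 79, 50 students per group", 78, 79, 50),
    ("78 to 79, 5,000 students per group", 78, 79, 5000),
    ("78 to 92, 50 students per group", 78, 92, 50),
]

for label, old, new, n in scenarios:
    p_value, effect = compare_groups(old, new, SD, n)
    verdict = "significant" if p_value < 0.05 else "not significant"
    print(f"{label}: p = {p_value:.2g} ({verdict}), effect = {effect:.1f} SD")
```

Under these assumptions, the one-point gain is not significant with 50 students per group but becomes significant with 5,000, even though the gain itself stays at a tenth of a standard deviation. The sample size changed the verdict; the benefit to students did not change at all. The 14-point gain, by contrast, is both significant and large.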
In the end, understanding basic statistics is essential for all of us. Always ask questions, and be wary of those few people who, through either ignorance or intent, misuse statistics.
