About two years ago, Faiza Sani from Kano graduated from secondary school with nine As in WAEC. If we take such a brilliant result as the definition of success in school, we may then be interested in the factors that account for it, in the hope that by reverse engineering the result (taking the whole apart), we may be able to create many more Faizas.
In other words, you may want to know whether it was Faiza’s IQ, hard work, upbringing, excellent teachers, schools (she did her primary at Esteem School Abuja and secondary at Olumawo School Abuja), peers or environment that accounted for the result. It’s reasonable to conclude that many factors contributed to such success.
She herself explained that she studied her books at home regularly. This may explain the performance. Yet we would still want to know how much each factor contributed. For example, how much did Faiza’s IQ contribute to the result?
Many of us would guess that it was a lot. But research has shown that IQ contributes only 20% to success in school. If this holds, it means that 80% of Faiza’s result was due to factors other than IQ. But which factors? To answer questions like this, statisticians use techniques such as factor analysis. And there are many other questions that statistics can answer.
Yet, despite its power or because of it, we often get overwhelmed by statistics. Some may even want to study it but don’t know where to start. If only there were a guide to show us the key principles of statistics.
Actually, there is.
Kaiser Fung, author of the book “Numbers Rule Your World,” argues that statistics is based on five key principles: variation; correlation and causation; patterns; accounting for group differences; and the trade-off between two types of errors. So if you want to understand statistics, you may want to get a handle on these five.
Let’s take them one by one.
VARIATIONS
Statisticians and data scientists look for variation in data. To them, the variation around the average is often more important than the average itself. For example, if a group that includes Dangote has a median annual income of N1 million, Dangote, the richest black person, would easily stand out and become a variation of interest.
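To make the Dangote example concrete, here is a minimal Python sketch. The incomes are invented for illustration (they are not real data), and the "more than ten times the median" rule for flagging an outlier is just an assumption chosen to keep the example simple.

```python
from statistics import mean, median

# Hypothetical annual incomes in naira: nine ordinary earners plus one billionaire.
incomes = [800_000, 900_000, 950_000, 1_000_000, 1_000_000,
           1_000_000, 1_050_000, 1_100_000, 1_200_000, 5_000_000_000]

typical = median(incomes)  # barely affected by the one extreme income
average = mean(incomes)    # dragged far upward by the one extreme income

# Flag anyone earning more than ten times the median (an arbitrary cut-off).
outliers = [income for income in incomes if income > 10 * typical]

print(f"median: N{typical:,.0f}")
print(f"mean:   N{average:,.0f}")
print("variation of interest:", outliers)
```

The median stays at N1 million while the mean jumps to about N500 million, so neither number alone tells the story. It is the gap between one person and everyone else that is interesting.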
CAUSATION AND CORRELATION
Statistics is also interested in correlation and causation. Epidemiologists, for example, want to know the cause of a disease. When some citizens of the United States tested positive for the bacterium E. coli, the experts were able to trace the cause to the consumption of bagged spinach. How did they do it?
They did extensive interviews with five of the patients and found that the food four of them had in common was spinach. They consulted other statistics and found that during normal times, only one in five people from the area ate spinach in a week. Four out of five is far more than chance would predict, and that is how they were able to pin down the cause of the outbreak.
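How surprising is "four out of five" when the normal rate is one in five? A rough way to check is the binomial formula. The sketch below assumes each patient's spinach habit is independent of the others, which is a simplification, and the function name is only illustrative. It shows why the coincidence looked suspicious; it is not a reconstruction of the investigators' full method.

```python
from math import comb

def prob_at_least(k, n, p):
    """Chance that at least k of n independent people do something
    that each person does with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

baseline_rate = 1 / 5  # one in five people eat spinach in a normal week
chance = prob_at_least(4, 5, baseline_rate)

print(f"Chance of 4 or more spinach eaters among 5 people: {chance:.4%}")
```

Under these assumptions the chance is well below one percent, so spinach showing up that often among the patients is hard to dismiss as luck.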
In the case of correlation, we find a use for it even in everyday life. For instance, there’s a positive correlation between height and weight, which means the taller you are, the heavier you’re likely (pay attention to this word) to be.
So if you want to transport tall friends to school, market or a conference, you may want to look for a car that can accommodate heavy people, but if you’re expecting shorter comrades, even a little Kia would do.
But note that correlation is not as powerful as causation. Some unrelated things can be correlated and therefore appear related. For example, ice cream sales are correlated with burglary. That is, the higher the sales, the higher the number of houses burgled. But we can’t say that more ice cream consumption causes theft. Sometimes you can’t find an explanation for a crazy correlation. However, if you look deeper, sometimes you can. In this case, people consume more ice cream in the summer to cool down. And for the same reason, it’s also the time when people leave their doors and windows open.
And as we read in the autobiography of Malcolm X, you’re more likely to be burgled if the thief can see inside your house. The fact that the doors are not locked helps the case of the degenerate guest.
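The ice cream story can be imitated with a small simulation. In the sketch below, every number is made up: daily temperature drives both ice cream sales and burglaries, and neither of the two affects the other. The helper functions `pearson` and `residuals` are written out by hand for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def residuals(ys, xs):
    """What is left of ys after subtracting a straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

rng = random.Random(42)  # fixed seed so the run is repeatable
temperature = [rng.uniform(15, 40) for _ in range(365)]          # one year of days
ice_cream = [20 + 3 * t + rng.gauss(0, 8) for t in temperature]  # depends on heat only
burglaries = [2 + 0.4 * t + rng.gauss(0, 2) for t in temperature]  # depends on heat only

raw = pearson(ice_cream, burglaries)
adjusted = pearson(residuals(ice_cream, temperature),
                   residuals(burglaries, temperature))

print(f"correlation, ignoring temperature: {raw:.2f}")
print(f"correlation, after removing temperature: {adjusted:.2f}")
```

The first number should come out strongly positive and the second close to zero: once the common cause (hot weather) is accounted for, the link between ice cream and burglary largely disappears.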
PATTERNS
Statisticians look for patterns. Suppose students from a particular school sitting a WAEC exam get an unusually high number of the same questions right and an unusually high number of the same questions wrong, especially where answers were erased and changed from wrong to right and vice versa. WAEC’s statisticians would be able to detect that the teachers probably gave them the answers, because some patterns are simply not probable, i.e. the probability of them happening by chance is near zero.
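To get a feel for "near zero", consider a deliberately simplified calculation. It assumes multiple-choice questions with four options, that every student in the group got the question wrong, and that a student who does not know the answer picks among the three wrong options at random and independently. Real students do not guess this evenly, so treat the figures as an illustration, not as how an examination body actually works.

```python
def chance_identical_wrong_answers(students, questions, options=4):
    """Chance that `students` independent guessers all pick the same
    wrong option on each of `questions` questions they all got wrong."""
    wrong_options = options - 1
    # The first student can pick any wrong option; each of the others
    # must match it, with probability 1 / wrong_options.
    per_question = (1 / wrong_options) ** (students - 1)
    return per_question ** questions

two_students = chance_identical_wrong_answers(students=2, questions=10)
whole_class = chance_identical_wrong_answers(students=30, questions=10)

print(f"2 students, 10 shared wrong answers:  {two_students:.2e}")
print(f"30 students, 10 shared wrong answers: {whole_class:.2e}")
```

Even for just two students the chance is about one in sixty thousand, and for a whole class it is vanishingly small. That is the kind of pattern that points to something other than luck.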
ACCOUNTING FOR GROUP DIFFERENCES
Statisticians also account for group differences. Let’s go back to the WAEC example. If WAEC finds that a question is not fair to a certain section of the country, maybe because of the way it was worded, it may remove the question so that it doesn’t contribute to the total score.
For example, suppose many southerners failed a Government question that mentions Uthman Dan Fodio and many northerners got it right. WAEC may remove it because it disadvantaged people who are not northerners.
But before doing that, they wouldn’t simply compare all the northern students with all the southern students. That wouldn’t be a fair comparison, because the two groups may differ in overall ability; instead, they would compare high-performing students from both regions with each other, and low-performing students from both regions with each other.
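Here is a small sketch of why comparing like with like matters. The counts are entirely hypothetical. They are arranged so that the question is equally hard for both regions within each ability band, but the northern sample happens to contain more high performers.

```python
# (region, ability band) -> (answered correctly, sat the question)
# Hypothetical counts, for illustration only.
results = {
    ("north", "high"): (240, 300),
    ("north", "low"): (40, 100),
    ("south", "high"): (80, 100),
    ("south", "low"): (120, 300),
}

def pass_rate(region, band=None):
    """Share of students in a region (optionally one band) who got it right."""
    correct = total = 0
    for (r, b), (c, t) in results.items():
        if r == region and (band is None or b == band):
            correct += c
            total += t
    return correct / total

# Naive comparison: lump every student in a region together.
pooled_gap = pass_rate("north") - pass_rate("south")

# Fairer comparison: high performers against high, low against low.
band_gaps = {band: pass_rate("north", band) - pass_rate("south", band)
             for band in ("high", "low")}

print(f"gap when all students are pooled: {pooled_gap:.0%}")
for band, gap in band_gaps.items():
    print(f"gap among {band} performers: {gap:.0%}")
```

Pooled together, the north appears to do 20 percentage points better, yet within each band there is no gap at all. In this made-up case the question is fair; the apparent difference comes only from the mix of students in each group.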
TRADE-OFF AND TWO ERRORS
Finally, statisticians are concerned with, and often face, a trade-off between two types of errors. Let’s use another examination body, NECO, for this example. In testing whether students cheated or not, two types of errors can create a problem: the false positive and the false negative. A false positive occurs when students who didn’t cheat are accused of cheating, and a false negative occurs when those who cheated are allowed to go.
What to do?
Testers, as in the case of athletes using performance-enhancing drugs, use methods that show a positive result only when the evidence DEFINITIVELY shows cheating. They do this with the knowledge that they are allowing some of the cheats to go. That’s a trade-off. They are allowing some cheats to go so that the innocent are not falsely punished.
There are many other things that statisticians are interested in, but this list is a good beginning. To learn more statistics, here are some accessible resources I recommend: Statistics for People Who (Think They) Hate Statistics, Naked Statistics, Slippery Math in Public Affairs, The Cartoon Guide to Statistics and the KhanAcademy.org statistics videos.
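The trade-off can be seen by sliding a cut-off up and down. In the sketch below, the "suspicion scores" are invented, and the idea that a detector produces a single score from 0 to 100 is an assumption made to keep the example short.

```python
# Hypothetical suspicion scores (0-100) from an imaginary cheating detector.
honest_scores = [5, 12, 18, 25, 31, 40, 47, 55, 62, 70]
cheater_scores = [35, 48, 58, 66, 74, 81, 88, 93, 97, 99]

def error_counts(threshold):
    """Accuse everyone scoring at or above the threshold, then count mistakes."""
    false_positives = sum(score >= threshold for score in honest_scores)
    false_negatives = sum(score < threshold for score in cheater_scores)
    return false_positives, false_negatives

table = {threshold: error_counts(threshold) for threshold in (30, 50, 75)}

for threshold, (fp, fn) in table.items():
    print(f"threshold {threshold}: {fp} innocent accused, {fn} cheats missed")
```

A low threshold catches every cheat but accuses six innocent students; a high one accuses no innocent student but lets five cheats go. No setting removes both errors at once, which is why testers have to decide which mistake they would rather make.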