Are you right or wrong? How statistics, through testing and significance, allows us to evaluate opinions and hypotheses

 

 

 

Suppose you have an opinion. Let’s define “opinion” first, so that we share common ground. The word comes from Latin and means a concept that one or more people form about particular facts, phenomena or manifestations when, lacking a criterion of absolute certainty for judging their nature (or their causes, their qualities, etc.), they propose a personal interpretation that they believe to be correct and to which they therefore give their assent, while admitting the possibility of being mistaken.

This last clause of the definition is debatable, because I have never heard anyone, especially an adult, admit that they might change their mind in the future.

So, suppose you have an opinion in the sense just defined.

Examples of opinions

 

I sell more to customers with tattoos 

I earn more since the government introduced bonus x

Customers who come from channel y make me earn more than those who come from channel z

Men who listen to metal music offer fewer dinners

 

You can transform an opinion into a hypothesis, and a hypothesis can be tested. In the time of Galileo Galilei, the experience of the senses was enough to test hypotheses, but when things become more abstract it no longer suffices. Rejecting or accepting a hypothesis or opinion in statistics is much less complicated than proving a theorem, even though a theorem, once proven, holds as a truth that transcends space and time and therefore carries much greater weight than statistical significance.

 

As I said, in statistics we can reject or accept a hypothesis (strictly speaking, we reject it or fail to reject it). I won’t weigh the discussion down with the full statistical lexicon (null hypothesis, alternative hypothesis, false positives and false negatives). I’ll just say that in statistics the “innocent until proven guilty” principle applies. Translating this legal concept into statistical language: “means not different, correlation non-existent, variables useless and model useless until proven otherwise”. To take stock: we have an opinion transformed into a hypothesis, and a hypothesis that we can test with a statistical test. A statistical test gives us a score from which we can calculate a statistical significance.
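To make the chain "opinion, hypothesis, test, score, significance" concrete, here is a minimal sketch in Python for the opinion "customers from channel y make me earn more than those from channel z". The revenue figures are invented purely for illustration, the function name is hypothetical, and the permutation test is just one simple choice of test for this sketch, not the only way to do it.

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Null hypothesis ("innocent until proven guilty"): the two groups have
    the same mean, so the group labels are interchangeable.
    """
    rng = random.Random(seed)
    mean = lambda values: sum(values) / len(values)
    observed = abs(mean(group_a) - mean(group_b))  # the score of the test
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign the labels at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# Hypothetical revenue per customer (made-up numbers, for illustration only).
channel_y = [120, 135, 150, 110, 160, 145, 155, 130]
channel_z = [100, 95, 115, 105, 90, 120, 98, 108]

p = permutation_p_value(channel_y, channel_z)
print(f"p-value: {p:.4f}")
```

The p-value estimates how often a difference at least as large as the observed one would appear if the channel labels made no difference at all. A small value leads us to reject the "no difference" hypothesis; a large one means we have not found enough evidence against it.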

 

I need to say what significance in statistics is and what it is not.

 

I’ll start with what it is not.

 

  • First of all, it is not proof that an observed result is not due to chance, where the observed result can be a mean, a correlation or a variable included in a model.
  • Statistical significance is certainly not the oracle of Delphi. An oracle was a being or entity regarded as a source of wise advice and of prophecies, with the added characteristic of infallibility. Someone who knows little about statistics might see significance as a divine, immutable judgment.

 

 

 

  • Furthermore, statistical significance is no guarantee that, if you increase the number of observations in an experiment, you will reach the same conclusion in terms of significance (for example, a p-value of 0.05 or less), whatever you are measuring. If you collect more observations to check a mean, the significance will not necessarily stay the same; the same goes for correlations and for the usefulness of including a variable in a model.
  • Connected to the previous point, statistical significance is not a mirror of your desires: significance can change depending on the model and, above all, on the tool you use. Unfortunately, researchers and others often use only the tools that confirm their hypotheses, producing results in bad faith. Here is an example that has often come up on YouTube: correlations that are significant with the classical method (with the p-value) often lose part of that significance with the Bayesian method. And although the Bayesian method is more rigorous than the classical one, if I compute partial correlations with the classical method, that is, the linear correlation method that evaluates a correlation net of the effect of the other variables, I again get significance different from that of the two previous methods.

 

 

 

  • Statistical significance is not an indicator of a causal relationship. Establishing causality requires particularly advanced methodologies and tools, and it is no coincidence that you will very rarely hear me talk about causality in my videos or in the podcast. At most I say that some models try to explore these causal relationships, and I usually mention the network model, the path model and the structural equation model. Note that the order in which I listed these models is not random.
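The points above on partial correlation and on causality can be illustrated with a small sketch. It assumes simulated data in which a third variable z drives both x and y; the helper names are hypothetical and the data are generated, not real. It compares only the correlation coefficients, not their p-values.

```python
import random
from math import sqrt

def pearson(x, y):
    """Plain Pearson linear correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def partial_correlation(x, y, z):
    """Correlation between x and y net of the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Simulated data (an assumption of this sketch): z influences both x and y,
# while x and y have no direct link to each other.
rng = random.Random(1)
z = [rng.gauss(0, 1) for _ in range(500)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

r_plain = pearson(x, y)
r_partial = partial_correlation(x, y, z)
print(f"plain correlation:   {r_plain:.2f}")
print(f"partial correlation: {r_partial:.2f}")
```

In this setup the plain correlation between x and y is strong, yet it shrinks towards zero once z is taken into account. The same data therefore tell two different stories depending on the tool, and neither number by itself says anything about what causes what.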

 

At this point, having filtered out the non-meanings of significance (excuse the pun), I’ll tell you what significance is.

 

  • The statistical significance of a variable, in the case of a model, indicates whether the variable has a significant effect on the response variable, i.e. whether it helps to explain your variable of interest after the effects of the other variables have been taken into account. You will have noticed that in telling you what significance is I still used the words “significant effect”: you can replace “significant” with “statistically relevant” or “important”, as you prefer.

 

In classical, or frequentist, statistics we associate significance with the p-value, which someone has called the Holy Grail of classical statistics. In the case of a model, each variable has its own p-value. The p-value ranges from 0 to 1 because it is a probability. It is calculated from the probability distribution associated with the statistical test used. Who tells us which probability distribution is associated with each type of statistical test? Theorems give us this answer, so theory helps practice.
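As a minimal sketch of how a p-value is obtained from the distribution associated with a test, take a one-sample z-test, whose score follows the standard normal distribution under the "no difference" hypothesis. The observed mean, the known standard deviation and the sample sizes are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(sample_mean, null_mean, sd, n):
    """z score for a sample mean, assuming a known standard deviation."""
    return (sample_mean - null_mean) / (sd / sqrt(n))

def two_sided_p_from_z(z):
    """Probability of a score at least this extreme under the standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# The same hypothetical observed difference (0.2 standard deviations),
# evaluated with different numbers of observations.
results = {}
for n in (25, 100, 400):
    z = one_sample_z(sample_mean=0.2, null_mean=0.0, sd=1.0, n=n)
    results[n] = two_sided_p_from_z(z)
    print(f"n={n:3d}  z={z:.2f}  p={results[n]:.5f}")
```

The observed difference is identical in the three cases, yet the p-value shrinks as the number of observations grows. This is one reason why significance should not be read as a fixed property of the phenomenon being measured.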

 

In general, a p-value below 0.05 is taken to indicate statistical significance. I would add that the threshold also depends on the field: biologists and psychologists, for example, open the champagne as soon as they see a significance of this kind and flood the literature with publications that can easily be dismantled because of this rather rigid cutoff, whereas particle physics applies a much more stringent level. If you have seen some of my videos on YouTube, you know that I generally use the Bayes factor (BF), which comes from the Bayesian approach, to quantify significance. It is more rigorous than the classical approach, so there is less risk of drawing incorrect conclusions. The BF ranges from 0 to infinity, so its interpretation is slightly more complicated than that of the classical p-value.
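To give a feel for the Bayes factor next to the p-value, here is a sketch for the simplest textbook case: a proportion tested against 0.5, with a uniform prior on the proportion under the alternative. The choice of prior, the function names and the counts (61 successes in 100 trials) are assumptions of this example, not the setup used in the videos.

```python
from math import comb

def bayes_factor_10(successes, trials):
    """BF10 for a binomial proportion.

    H0: the success probability is exactly 0.5.
    H1: the success probability is unknown, with a uniform prior on [0, 1].
    """
    likelihood_h0 = comb(trials, successes) * 0.5 ** trials
    # With a uniform prior, the marginal likelihood of any count is 1 / (trials + 1).
    marginal_h1 = 1 / (trials + 1)
    return marginal_h1 / likelihood_h0

def exact_two_sided_p(successes, trials):
    """Exact two-sided binomial p-value under H0 (success probability 0.5)."""
    k = max(successes, trials - successes)
    upper_tail = sum(comb(trials, i) for i in range(k, trials + 1)) * 0.5 ** trials
    return min(1.0, 2 * upper_tail)

# Hypothetical counts, for illustration only.
successes, trials = 61, 100
p_value = exact_two_sided_p(successes, trials)
bf10 = bayes_factor_10(successes, trials)
print(f"p-value: {p_value:.4f}")
print(f"BF10:    {bf10:.2f}")
```

A BF10 above 1 favours the alternative, a value below 1 favours the "no difference" hypothesis, and a value close to 1 means the data barely discriminate between the two. With these counts the p-value falls below 0.05 while BF10 stays between 1 and 2, so a result that counts as significant in the classical sense corresponds here to very weak evidence.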

 

Having a significance indicator such as the p-value or the BF is not enough to draw definitive conclusions. In the podcast episode on how statistics can age, I mention a check, sequential analysis, which you also see very often in my videos and which is used to keep an eye on how significance changes over time or with the number of observations.
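The idea of watching significance as observations accumulate can be sketched as follows. This is a simplified stand-in, not the sequential analysis shown in the videos: it recomputes a p-value with a normal approximation, the function name is hypothetical, and the data are simulated so that the "no difference" hypothesis is actually true.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def sequential_p_values(observations, null_mean=0.0, start=10):
    """Recompute a two-sided p-value each time a new observation arrives.

    Uses a normal approximation to the one-sample test, which is only a
    rough guide for small samples.
    """
    trajectory = []
    for n in range(start, len(observations) + 1):
        seen = observations[:n]
        z = (mean(seen) - null_mean) / (stdev(seen) / sqrt(n))
        p = 2 * (1 - NormalDist().cdf(abs(z)))
        trajectory.append((n, p))
    return trajectory

# Simulated observations with true mean 0, so there is no real effect to find.
rng = random.Random(42)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]

trajectory = sequential_p_values(data)
below_threshold = [n for n, p in trajectory if p < 0.05]
print(f"final p-value: {trajectory[-1][1]:.3f}")
print(f"sample sizes with p < 0.05: {below_threshold}")
```

Plotting or listing the trajectory shows how much the p-value can wander as data come in. Stopping the collection at the first moment it dips below 0.05 would be a way of torturing the data, which is why the whole path matters and not a single snapshot.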

 

Let’s get to the most painful part of this article: what are the limits of the significance derived from a test statistic? A tool that seems so powerful for testing hypotheses and opinions surely has an Achilles’ heel.

 

Yes, it does: the quantity and quality of the data. Some things, such as Einstein’s general relativity, received their latest confirmation only a few years ago, through scientific satellites that collected data. Certain opinions and statements therefore cannot be verified or refuted because the material to work with is missing: the data. Without data there is no processing, no information and no knowledge (preferably rigorous knowledge).

 

When data are missing, various things can still be done to cushion the situation, but the strategy varies case by case. One example is filling in the missing data with imputation techniques, but here we are playing with fire. There are also tools for managing significance in the presence of extreme data, variables not considered, variables of interest on which people lie more frequently (the three S’s, which alliterate only in the original Italian: sex, money and substances), unstable models, etc.

 

Precisely because significance can change depending on how you investigate the data, or torture them (this is especially true for economists, psychologists and biologists), you have one more reason to rely on a professional to evaluate significance.

 
