Tuesday, April 16, 2013

Critique of correlative reason

(Translated from Critique de la raison corrélative).

Statistics provides counts, averages and totals; it also provides a measure of dispersion for quantitative variables, the standard deviation; finally, it provides a measure of the relationship between two quantitative variables, the correlation (for qualitative variables, the equivalent of the correlation is the chi-squared statistic, χ²).

I spare the reader the mathematical expressions of these concepts: they are found in textbooks.
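
For readers who would rather see these measures computed than written out, here is a minimal sketch in Python with numpy and scipy (the data are invented purely for the illustration):

    import numpy as np
    from scipy.stats import chi2_contingency

    # Two quantitative variables observed on the same individuals (invented data).
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    print("count:", x.size)                                     # counts
    print("mean of x:", x.mean())                               # averages
    print("total of y:", y.sum())                               # totals
    print("standard deviation of x:", x.std())                  # dispersion
    print("correlation of x and y:", np.corrcoef(x, y)[0, 1])   # relationship

    # For two qualitative variables, the chi-squared statistic is computed
    # on their contingency table (here an invented 2 x 2 table).
    table = np.array([[20, 30],
                      [35, 15]])
    chi2, p_value, dof, expected = chi2_contingency(table)
    print("chi2:", chi2, "p-value:", p_value)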

When there is an affine relationship (Y = aX + b, with a ≠ 0) between two variables X and Y, the absolute value of their correlation coefficient is equal to 1: they are "correlated".

When the two variables are independent, the correlation coefficient is equal to zero: they are not correlated. When the relationship exists but is fuzzy, the absolute value of the correlation coefficient lies somewhere between 0 and 1.
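
A minimal numerical illustration of these three cases, with simulated data (the coefficients and the noise level are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=1000)

    y_affine = 2.0 * x + 1.0                           # exact affine relationship
    y_fuzzy  = 2.0 * x + 1.0 + rng.normal(size=1000)   # affine relationship blurred by noise
    y_indep  = rng.normal(size=1000)                   # no relationship at all

    print(np.corrcoef(x, y_affine)[0, 1])   # equal to 1
    print(np.corrcoef(x, y_fuzzy)[0, 1])    # somewhere between 0 and 1
    print(np.corrcoef(x, y_indep)[0, 1])    # close to 0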

*     *
Confronted with the descriptions that statistics provides, we are like those children who always want to know why things are as they are: we want to know the causes. Felix qui potuit rerum cognoscere causas [1]!

Nevertheless some statisticians (Karl Pearson in the wake of Ernst Mach, Jean-Paul Benzécri) criticize the concept of cause, because any causal explanation involves assumptions they consider "subjective" or "ideological". They cultivate statistics without causality and push the contemplative position of the statistician to the extreme, sometimes to the point of mysticism: Benzécri thinks that the observation of correlations reveals "the pure diamond of true nature".

Yet when they must act - driving their car, brushing their teeth, etc. - they certainly anticipate the results of their action, which implies postulating a causality...

We will continue regardless of their objections.

*     *

Correlation is an indication of causality: if X is the cause of Y, maybe X and Y are correlated; conversely if X is irrelevant to Y, maybe their correlation will be zero.

We must say "maybe" because:
  • corr(X, Y) = corr(Y, X): being symmetric, the correlation does not indicate the direction of causality; it does not distinguish the "predictor" from the "dependent variable" (see the sketch after this list);
  • there may be functional relationships that are not affine: the correlation does not tell the whole story;
  • there may be a functional relationship (affine or not) between two variables even when they are not connected by a causal relationship;
  • there may be a causal relationship between two variables without any obvious functional relationship appearing;
  • the notion of "cause" itself is open to several interpretations, located at varying degrees of depth.
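
To make the first of these points concrete, here is a small sketch with simulated data in which X "causes" Y by construction: the correlation coefficient is exactly the same whichever variable we treat as the cause.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(size=500)
    y = 3.0 * x + rng.normal(size=500)   # here X "causes" Y by construction

    # The coefficient is identical in both directions: it says nothing
    # about which variable drives the other.
    print(np.corrcoef(x, y)[0, 1])
    print(np.corrcoef(y, x)[0, 1])
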
Nonlinear functional relationship

Consider for example a mobile launched in a vacuum and subjected to the action of gravity. The equation of its movement, in a suitably chosen reference frame, is X = (1/2)gT².

Suppose we observe X and T at regular intervals, the observed positions of the mobile forming a "population" on which we build statistics.

If the observed values of T are symmetric with respect to zero, the correlation between X and T is zero: this is what happens when the relationship is a symmetric relationship of the second degree.

Thus the nullity of the correlation can either be due to the independence of the two variables, or mask a functional relationship which is not affine.

A clever statistician will see that the speed of the mobile and time are correlated, since V = gT, and that will put him on the trail of a correct model. But not all statisticians are clever.
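
A small simulation of this free-fall example (the units and the sampling of T are arbitrary):

    import numpy as np

    g = 9.81
    t = np.linspace(-5.0, 5.0, 101)   # T observed at regular intervals, symmetric about zero
    x = 0.5 * g * t**2                # position: second-degree, symmetric in T
    v = g * t                         # speed: affine in T

    print(np.corrcoef(t, x)[0, 1])    # essentially zero: the parabola hides the link
    print(np.corrcoef(t, v)[0, 1])    # equal to 1: the affine relationship shows up

    # Keeping only the positive values of T (as in the "Stages of causality"
    # section below) makes a strong correlation between X and T reappear.
    mask = t > 0
    print(np.corrcoef(t[mask], x[mask])[0, 1])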

Functional relationship without cause

One and the same phenomenon may be the cause of two others, which then appear correlated without there being any causal relationship between them.

In episodes of economic growth (or decline), for example, many variables are correlated without being connected by any causality, because they are all driven by the same trend.
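
Here is a sketch of that situation, with two invented series (the names and numbers are made up) that have nothing to do with each other except that both follow the same trend:

    import numpy as np

    rng = np.random.default_rng(2)
    years = np.arange(40)
    trend = 0.03 * years              # a common growth trend

    # Two series driven by the same trend but otherwise independent.
    ice_cream_sales = 100 * np.exp(trend + 0.02 * rng.normal(size=40))
    engineer_headcount = 50 * np.exp(trend + 0.02 * rng.normal(size=40))

    # Strong correlation, yet neither variable causes the other:
    # the common trend does all the work.
    print(np.corrcoef(ice_cream_sales, engineer_headcount)[0, 1])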

Cause without apparent functional relationship

If the evolution of one variable causes changes in the value of another, there will obviously be a functional relationship between them, but it can be hidden by a time lag: we will not find this relationship if we observe the two variables at the same date; to find it, one has to shift one of the variables by some weeks or months.

This is the case, for example, in the relationship between inventory levels and production, between demand and investment, etc. The clever econometrician knows how to identify such shifts; the naive econometrician (they exist) sees nothing.
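
A sketch of such a hidden relationship, with a simulated series Y that simply reproduces X with a delay of six periods (the lag and the noise level are arbitrary):

    import numpy as np

    rng = np.random.default_rng(3)
    n, lag = 200, 6
    x_full = rng.normal(size=n + lag)             # X observed over n + lag dates

    # Y reacts to X with a delay of `lag` periods, plus a little noise.
    y = x_full[:n] + 0.1 * rng.normal(size=n)     # Y at dates lag, ..., lag + n - 1
    x_same_date = x_full[lag:lag + n]             # X at those same dates

    # Observed at the same date, nothing appears...
    print(np.corrcoef(x_same_date, y)[0, 1])      # close to zero

    # ...but shifting X back by `lag` periods reveals the relationship.
    print(np.corrcoef(x_full[:n], y)[0, 1])       # close to 1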

Stages of causality

Take the example of the mobile in free fall. If we consider only positive values of T, a correlation between X and T appears. Can we say that T is the cause of X?

The naive answer is "yes": the more time passes, the farther the mobile falls. But a physicist, thinking more deeply, will say that the cause lies not in time but in the acceleration of gravity g.

He may go further and explain this acceleration following Newton, g = km/d²: this provides a model of more general scope. He may also explain the force, following Einstein, by the curvature of space and gravitational waves. String theory provides further hypotheses to explain the propagation of these waves...

Hence the cause can be found in various theories, each of which considers the phenomenon with assumptions of a different depth. It is the same, of course, in economics: the expression of the cause that we should retain corresponds to the scope, the depth, of the model that we have built.

Let us add that, at the same level of depth, a cause can still articulate several interdependent layers, each of them obeying a logic of its own (see Aristotle and the Business).

Economics and Econometrics

Econometrics is based entirely on the exploitation of correlations: the logit models that econometricians are fond of today, or elementary forms of regression, provide what is needed to calibrate equations and produce projections.
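
For the record, here is what the calibration of such a logit equation can look like in practice, with simulated data and the statsmodels library (any tool that estimates a logit would do; the variable names are invented for the illustration):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    n = 500

    # An explanatory variable (say, household income) and a binary outcome
    # (say, purchase of a durable good), simulated from a known logit model.
    income = rng.normal(size=n)
    true_prob = 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * income)))
    purchase = rng.binomial(1, true_prob)

    # Calibration of the logit equation by maximum likelihood.
    exog = sm.add_constant(income)
    result = sm.Logit(purchase, exog).fit(disp=False)
    print(result.params)                        # estimated intercept and slope

    # Projection: predicted purchase probabilities for new income levels.
    new_levels = sm.add_constant(np.array([-1.0, 0.0, 1.0]))
    print(result.predict(new_levels))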

Econometrics is thus subject to the ambiguities of correlation. Learned econometricians know how to avoid the pitfalls - such as the overall correlation of the variables with one another, and with time, during periods of growth or recession, or the time lags that mask correlations, etc. - but it remains difficult to avoid them all.

Econometrics is not enough to identify the causalities at work: one must know economic theory and be skilled in the choice of assumptions. An economic model is nothing but the staging of a bundle of assumptions.

*     *
Pure statistical description calls for interpretation, which requires:
  1. being aware of the choices that guided the observation, and knowing for what action it was organized;
  2. being aware of the extent of the defects of the data (e.g. any population census has a bias of about 1%, or 600,000 people in France);
  3. possessing a theoretical background sufficient to interpret correlations by judiciously choosing the assumptions about causality.
____________
[1] Virgil, Georgics, II, 489.
