Is Big Data Science?

The Elevate festival, held in Graz every year, always provides plenty of food for thought. This year it was all about data and the algorithms that process it.

One topic discussed on several panels was the use of (big) data collected on social networks such as Facebook to conduct (social) studies. The discussion often came down to the question of whether it is possible to prove a hypothesis just by looking at data.

Epistemology (Erkenntnistheorie) is a complex topic, but one worth thinking about, especially as a scientist.

During the panel "Autonomous Machines - Controlled People?", panelist Claudia Wagner highlighted the difference between a causal model and a statistical model. A statistical (data-driven) model, for example one that draws inferences directly from data, looks for connections between data points: correlations. But correlation does not imply causation. A causal model, in contrast, goes further: it looks for the cause. It tries to explain the effect and to make sure the effect really was produced by the cause it identified.
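To make the distinction concrete, here is a minimal Python sketch. It is a toy simulation of my own, not anything presented on the panel, and it assumes a hypothetical hidden variable `z` that drives both `x` and `y`, while `x` itself has no effect on `y`. A purely statistical look at the observational data finds a strong correlation between `x` and `y`; setting `x` at random, as a controlled experiment would, makes that correlation disappear.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def observe(n, rng):
    """Observational data: a hidden confounder z drives both x and y.
    x has no causal effect on y at all."""
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)               # hypothetical hidden common cause
        xs.append(z + rng.gauss(0, 0.5))  # x depends on z plus noise
        ys.append(z + rng.gauss(0, 0.5))  # y depends on z plus noise
    return xs, ys


def intervene(n, rng):
    """Interventional data: x is assigned at random (like a randomized
    experiment), which cuts its link to z. y is generated exactly as before."""
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        xs.append(rng.gauss(0, 1))        # x assigned independently of z
        ys.append(z + rng.gauss(0, 0.5))  # same mechanism for y as above
    return xs, ys


rng = random.Random(42)
obs_r = pearson(*observe(10_000, rng))
int_r = pearson(*intervene(10_000, rng))
print(f"observational correlation:  {obs_r:.2f}")
print(f"interventional correlation: {int_r:.2f}")
```

With these assumed noise levels, the observational correlation should come out close to 0.8 (the shared variance 1 divided by the total variance 1.25 of each variable), while the interventional one should be close to 0. The data alone cannot tell the two situations apart; only knowledge of how the data was generated, or an experiment, can.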

So for a statistical model to be of scientific value, further research into the actual cause of the observed effect is needed. In purely data-driven studies this often does not happen, because it is not easily possible: establishing a cause usually requires more than the observational data at hand, for example a controlled experiment.

“Empirically observed covariation is a necessary but not sufficient condition for causality.” [Tufte 2006]