Data science and machine learning have been growing strongly for the past decade. We argue that to make the most of this exciting field we should resist the temptation to assume that forecasting can be reduced to brute-force data analytics. The reason, as we illustrate below, is that modelling requires mastering the art of selecting relevant variables.
More specifically, we investigate the subtle relation between “data and models” by focussing on the role played by algorithmic complexity, which helped make mathematically rigorous the long-standing idea that to understand an empirical phenomenon is to describe the rules which generate its data in terms that are “simpler” than the data themselves.
A key issue in appraising the relation between algorithmic complexity and algorithmic learning concerns a much needed clarification of the related but distinct concepts of compressibility, determinism and predictability. To this end we illustrate that the evolution law of a chaotic system is compressible, but a generic initial condition for it is not, making the time series generated by chaotic systems incompressible in general. Hence knowledge of the rules which govern an empirical phenomenon is not sufficient for predicting its outcomes. In turn this implies that there is more to understanding phenomena than learning such rules, even from data alone. Understanding can be achieved only in those cases in which we are capable of “good modelling”.
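The contrast can be made concrete with a minimal sketch, not taken from the paper: the generating rule of a chaotic system fits in one line of code, yet the time series it produces from a generic initial condition barely compresses. Here the logistic map (at parameter r = 3.99, chosen as a stand-in for “a chaotic system”) plays the role of the evolution law, and `zlib` serves as a practical, imperfect proxy for algorithmic compressibility.

```python
import zlib

def logistic_series(x0: float, n: int, r: float = 3.99) -> bytes:
    """Iterate the logistic map x -> r*x*(1-x) and quantize each value to one byte."""
    out = []
    x = x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        out.append(min(255, int(x * 256)))  # crude 8-bit quantization
    return bytes(out)

n = 4096
chaotic = logistic_series(0.3, n)        # series from a generic initial condition
periodic = bytes([100, 200] * (n // 2))  # a trivially regular series, for contrast

# Compression ratio: compressed size / original size (lower = more compressible)
ratio_chaotic = len(zlib.compress(chaotic, 9)) / n
ratio_periodic = len(zlib.compress(periodic, 9)) / n
```

The periodic series shrinks to a tiny fraction of its length, while the chaotic one stays close to incompressible, even though the program that generated it is a few lines long: the rule is compressible, its generic output is not.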
Clearly, the very idea of algorithmic complexity rests on Turing’s seminal analysis of computation. This motivates our remarks on this extremely telling example of analogy-based abstract modelling which is nonetheless heavily informed by empirical facts.
To cite this article: Hosni Hykel, Vulpiani Angelo (2020/1). Random Thoughts about Complexity, Data and Models. In De Glas Michel & Lassègue Jean (Eds), Looking Back at Turing: His Heritage Today, Intellectica, 72, (pp. 111-122), DOI: n/a.