
14 February 2019

Important limitations of Machine Learning

We live in an age of information abundance, and the press talks about big data, machine learning and artificial intelligence all the time. There is a lot of hype around the subject, because researchers and enthusiasts want to make money from what they know about it by selling courses, giving talks and so on. Machine learning does have important applications, but the applicability of these tools is overstated. Many of them believe that a large database plus a computational algorithm is enough to solve a wide range of problems in many fields. For that to work, however, two hypotheses must hold:

1) Given a large enough sample from some distribution, we can approximate that distribution efficiently with a statistical model, an assumption usually justified by appeal to the universal approximation theorem.

2) A statistical sample of a phenomenon is enough to automate, reason about and predict that phenomenon.
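To make hypothesis 1 concrete, here is a minimal Python sketch (not from the original source) of the kind of existence argument behind the universal approximation theorem: a one-hidden-layer network of steep sigmoids is written down by hand, with no training, so that it approximates a known target function on [0, 1]. The target sin(2πx), the grid construction and the steepness factor `50 * n_units` are illustrative assumptions, and the helper names are hypothetical.

```python
import math

def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def build_step_network(f, n_units: int, steepness: float = None):
    """Hand-build a one-hidden-layer sigmoid network approximating f on [0, 1].

    Each hidden unit is a steep sigmoid "step" centred between two grid points;
    its output weight is the jump of f across that step (hypothetical construction).
    """
    if steepness is None:
        steepness = 50.0 * n_units  # assumption: steeper steps as the grid gets finer
    grid = [i / n_units for i in range(n_units + 1)]
    bias = f(grid[0])
    units = []
    for i in range(1, n_units + 1):
        centre = (grid[i - 1] + grid[i]) / 2.0
        jump = f(grid[i]) - f(grid[i - 1])
        units.append((centre, jump))

    def network(x: float) -> float:
        # output = bias + sum of weighted hidden sigmoid activations
        return bias + sum(w * sigmoid(steepness * (x - c)) for c, w in units)

    return network

def max_error(f, g, n_points: int = 1001) -> float:
    # Worst-case absolute error on an evenly spaced grid over [0, 1]
    xs = [i / (n_points - 1) for i in range(n_points)]
    return max(abs(f(x) - g(x)) for x in xs)

def target(x: float) -> float:
    return math.sin(2 * math.pi * x)

errors = {n: max_error(target, build_step_network(target, n)) for n in (5, 20, 80)}
for n, err in errors.items():
    print(f"{n:3d} hidden units -> max error {err:.4f}")
```

The error shrinks as hidden units are added, which is the kind of guarantee the theorem gives. Note, though, that the weights here are read directly off the known function f, which is precisely what a learning algorithm cannot do when f is unknown.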

In most cases, however, these two hypotheses do not hold in reality:

Problem 1: Universal approximation is not universal

There is a theoretical result known as the universal approximation theorem. In summary, it states that any continuous function (on a compact domain) can be approximated to arbitrary precision by a composition of (at least) three levels of real functions, e.g. a multilayer perceptron with sigmoidal activation. This is a mathematical statement, but a rather existential one: it does not say whether such an approximation would be practical, or achievable with, say, gradient descent. It merely states that such an approximation exists. As with many existential arguments, its applicability to the real world is limited. In the real world, we work with strictly finite (in fact even "small") systems that we simulate on a computer. Even though the models we use are "small", their parameter spaces are large enough that we cannot possibly search any significant portion of them. Instead, we use gradient descent methods to actually find a solution, but there is no guarantee that we will find the best solution (or even a good one).[..]
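The gap between "an approximation exists" and "gradient descent finds it" shows up even in one dimension. In this hedged Python sketch, the loss `f(x) = x**4 - 3*x**2 + x` is a hypothetical toy (not a neural network) with a global minimum near x ≈ -1.30 and a worse local minimum near x ≈ 1.13; plain gradient descent, with an assumed learning rate of 0.01, ends up in whichever basin it starts in.

```python
def f(x: float) -> float:
    # Hypothetical non-convex loss: one global and one local minimum
    return x**4 - 3 * x**2 + x

def grad_f(x: float) -> float:
    # Analytic derivative of f
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0: float, lr: float = 0.01, steps: int = 2000) -> float:
    # Plain gradient descent: repeatedly step against the gradient
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

results = {x0: gradient_descent(x0) for x0 in (-2.0, 2.0)}
for x0, x_end in results.items():
    print(f"start={x0:+.1f}  end={x_end:+.4f}  loss={f(x_end):+.4f}")
```

Started at +2.0, the method settles in the local minimum and stays there, because the gradient vanishes; nothing in the update rule signals that a better solution exists elsewhere. Real models have millions of parameters rather than one, so their basin structure cannot be inspected like this.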

Problem 2: The central limit theorem has limits

The second issue is that whenever a dataset is created, it is assumed that the data it contains give a complete description of the phenomenon, and that there is a (high-dimensional) distribution concentrated on some low-dimensional manifold which captures the essence of the phenomenon. This strong belief is partly rooted in the central limit theorem: a mathematical result about averages of random variables. [..] But there is fine print: the theorem assumes that the original random variables are independent and identically distributed, with finite expected value and finite variance. Not all distributions have those properties! In particular, critical phenomena exhibit fat-tailed distributions, which often have unbounded variance and undefined expected values (not to mention that many real-world samples are not actually independent or identically distributed).
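The fine print can be checked numerically. This Python sketch (an illustration, not from the original source) compares sample means of a normal distribution, which satisfies the theorem's assumptions, with sample means of a standard Cauchy distribution, a classic fat-tailed example whose mean and variance are undefined. The spread of the means is measured by the interquartile range (IQR) over repeated samples; the sample sizes, the 300 repetitions and the seed are arbitrary choices.

```python
import math
import random
import statistics

def sample_normal(rng: random.Random) -> float:
    # Finite mean and variance: the central limit theorem applies
    return rng.gauss(0.0, 1.0)

def sample_cauchy(rng: random.Random) -> float:
    # Standard Cauchy via the inverse CDF tan(pi * (U - 1/2)):
    # fat tails, undefined mean and variance
    return math.tan(math.pi * (rng.random() - 0.5))

def iqr_of_sample_means(sampler, n: int, reps: int = 300, seed: int = 0) -> float:
    """Draw `reps` samples of size n, average each one, return the IQR of those averages."""
    rng = random.Random(seed)
    means = [statistics.fmean(sampler(rng) for _ in range(n)) for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

spread = {}
for name, sampler in (("normal", sample_normal), ("cauchy", sample_cauchy)):
    for n in (10, 1000):
        spread[(name, n)] = iqr_of_sample_means(sampler, n)
        print(f"{name:6s} n={n:5d}  IQR of sample mean = {spread[(name, n)]:.3f}")
```

For the normal case the IQR of the mean shrinks roughly like 1/√n, as the central limit theorem predicts. For the Cauchy case it does not shrink at all: the mean of n independent standard Cauchy variables is itself standard Cauchy, so its IQR stays near 2 no matter how much data is collected.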

Problem 3: The world is always changing (in technical terms: the world is non-stationary)


Aside from the fact that many real-world phenomena may exhibit rather hairy fat-tailed distributions, there is another quirk that datasets often fail to capture: phenomena are not always stationary. In other words, the world keeps evolving, changing the context and consequently altering fundamental statistical properties of many real-world phenomena. Therefore a stationary distribution (a snapshot) taken at a given time may not remain valid indefinitely[...]
These things happen all the time, potentially changing daily/weekly/monthly patterns of behaviour. The pace at which things change may be too quick for statistical models to follow, even if they retrain online. Granted, some aspects of behaviour, and some people, are very stable: some people work in the same place for many years and have very stable patterns of behaviour. But this cannot be assumed in general.
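A minimal Python sketch of the snapshot problem, under loudly stated assumptions: the "phenomenon" is a synthetic series whose mean drifts linearly by 0.01 per step with Gaussian noise, the snapshot model is simply the mean of the first 500 observations, and the retrained model is the mean of a rolling 100-step window. All names and numbers are hypothetical.

```python
import random
import statistics

def drifting_series(n: int, drift_per_step: float = 0.01, noise: float = 1.0, seed: int = 0) -> list:
    """Synthetic non-stationary data: the mean grows linearly with time (hypothetical model)."""
    rng = random.Random(seed)
    return [drift_per_step * t + rng.gauss(0.0, noise) for t in range(n)]

def mean_abs_error(predict, data, start: int, end: int) -> float:
    # predict(t) only uses observations before index t
    return statistics.fmean(abs(data[t] - predict(t)) for t in range(start, end))

series = drifting_series(2000)
snapshot = statistics.fmean(series[:500])  # frozen "snapshot" model, fit once
WINDOW = 100

def static_model(t: int) -> float:
    return snapshot

def rolling_model(t: int) -> float:
    # Retrained at every step on the most recent WINDOW observations
    return statistics.fmean(series[t - WINDOW:t])

static_early = mean_abs_error(static_model, series, 500, 700)
static_late = mean_abs_error(static_model, series, 1800, 2000)
rolling_late = mean_abs_error(rolling_model, series, 1800, 2000)
print(f"snapshot, just after training: {static_early:.2f}")
print(f"snapshot, much later:          {static_late:.2f}")
print(f"rolling window, much later:    {rolling_late:.2f}")
```

The frozen snapshot degrades steadily as the world moves away from the period it was fit on. The rolling model does much better but still lags: under this linear-drift assumption its estimate is biased by roughly drift × window / 2, so when the drift is fast relative to the window, even online retraining falls behind.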


Source: here

