Supervision : Christophe MARSALA
Co-supervision : Marcin DETYNIECKI
Adaptive machine learning algorithms for data streams subject to concept drifts
In this thesis, we investigate the problem of supervised classification on a data stream subject to concept drifts. A data stream is a source that continuously (and potentially endlessly) emits data. Learning from such streams is a tremendous challenge: the learning algorithm must be able to learn from sequential data and must achieve good predictive performance under the constraints of limited running time and memory.
Another major challenge is that the hidden probability distribution (the concept) that generates the observations may change over time (concept drift). This means that the observations used for learning can no longer be assumed to be i.i.d. (independent and identically distributed), and that a successful learning algorithm must be flexible enough to adapt to these changing distributions. To deal with these challenges, we claim that a successful learning algorithm must combine several characteristics: it must be able to learn and adapt continuously, it should not make any assumption about the nature of the concept or the expected type of drift, and it should be allowed to abstain from prediction when necessary.
On-line learning algorithms are the natural choice for handling data streams: their update mechanism lets them continuously revise the learned model using the latest data.
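The following is a minimal sketch of that continuous update, assuming a hypothetical learner that keeps one running centroid per class and moves it a fraction `alpha` towards each new example (it is not one of the thesis algorithms). It is evaluated with the test-then-train (prequential) protocol: every example is first predicted, then learned from. The toy stream `drifting_stream` swaps its labels at `drift_at` to mimic an abrupt concept drift.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Point = Tuple[float, ...]


class RunningCentroidClassifier:
    """Hypothetical on-line learner: one running centroid per class.

    Each update moves the centroid of the observed class a fraction `alpha`
    of the way towards the new point, so recent data weigh more than old data.
    """

    def __init__(self, alpha: float = 0.1) -> None:
        self.alpha = alpha
        self.centroids: Dict[int, List[float]] = {}

    def predict(self, x: Point) -> Optional[int]:
        if not self.centroids:
            return None  # nothing learned yet
        # Nearest centroid in squared Euclidean distance.
        return min(
            self.centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, self.centroids[c])),
        )

    def update(self, x: Point, y: int) -> None:
        if y not in self.centroids:
            self.centroids[y] = list(x)
            return
        centroid = self.centroids[y]
        for i, xi in enumerate(x):
            centroid[i] += self.alpha * (xi - centroid[i])


def prequential_hits(model: RunningCentroidClassifier, stream) -> List[int]:
    """Test-then-train: predict each example first, then learn from it."""
    hits = []
    for x, y in stream:
        hits.append(int(model.predict(x) == y))
        model.update(x, y)
    return hits


def drifting_stream(n: int, drift_at: int) -> Iterator[Tuple[Point, int]]:
    """Toy stream whose labels are swapped at `drift_at` (an abrupt drift)."""
    for t in range(n):
        side = t % 2
        label = side if t < drift_at else 1 - side
        yield (float(side), float(side)), label


if __name__ == "__main__":
    hits = prequential_hits(RunningCentroidClassifier(alpha=0.1), drifting_stream(1000, 500))
    print("accuracy before drift:", sum(hits[:500]) / 500)
    print("accuracy on last 100:", sum(hits[-100:]) / 100)
```

Because the model is updated after every example, it errs right after the drift and then recovers on its own once the centroids have moved; a smaller `alpha` remembers longer but recovers more slowly.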
The instance-based (IB) structure also has properties that make it extremely well suited to data streams with drifting concepts. IB algorithms make very few assumptions about the nature of the concept they are trying to learn. This gives them great flexibility, which makes them able to learn a wide range of concepts. Another strength is that keeping some of the past observations in memory provides valuable meta-information that an algorithm can exploit. Furthermore, the IB structure allows the adaptation process to rely on hard evidence of obsolescence; as a result, the algorithm can adapt to concept changes without explicitly detecting the drifts.
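To illustrate adaptation driven by evidence of obsolescence rather than by a drift detector, here is a minimal sketch of a k-nearest-neighbour learner with bounded memory. The forgetting rule is a hypothetical one chosen for illustration, not the rule used in the thesis: a stored instance is discarded when a newer instance lands within `radius` of it with a different label. The parameters `k`, `radius` and `max_size` are likewise assumptions.

```python
from collections import Counter, deque
from math import dist
from typing import Deque, Optional, Tuple

Point = Tuple[float, ...]


class EvidenceForgettingKNN:
    """Hypothetical instance-based learner with bounded memory.

    Stored instances are forgotten when a newer labelled instance lands
    within `radius` of them with a different label (evidence that they
    are obsolete), or when the memory cap `max_size` is reached (oldest first).
    """

    def __init__(self, k: int = 3, radius: float = 0.5, max_size: int = 200) -> None:
        self.k = k
        self.radius = radius
        # deque(maxlen=...) silently drops the oldest item when full.
        self.memory: Deque[Tuple[Point, int]] = deque(maxlen=max_size)

    def predict(self, x: Point) -> Optional[int]:
        if not self.memory:
            return None
        nearest = sorted(self.memory, key=lambda item: dist(item[0], x))[: self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    def update(self, x: Point, y: int) -> None:
        # Keep an instance unless it is close to x and contradicts label y.
        kept = [(p, lbl) for p, lbl in self.memory if lbl == y or dist(p, x) > self.radius]
        self.memory.clear()
        self.memory.extend(kept)
        self.memory.append((x, y))


if __name__ == "__main__":
    model = EvidenceForgettingKNN()
    for _ in range(10):
        model.update((0.0, 0.0), 0)
        model.update((1.0, 1.0), 1)
    print(model.predict((0.0, 0.0)))  # 0
    model.update((0.0, 0.0), 1)       # a single example from the new concept
    print(model.predict((0.0, 0.0)))  # 1, no drift detector involved
```

The memory itself is what makes this possible: the learner can point at the specific stored instances that the new observation contradicts and remove only those, leaving the still-valid part of the concept untouched.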
Finally, in this thesis we stress the importance of allowing the learning algorithm to abstain from prediction in this framework. Drifts can generate a great deal of uncertainty, and at times an algorithm may lack the information necessary to predict accurately. In these cases, rather than outputting a prediction at all costs, we argue that it may be better to disconnect the algorithm automatically by allowing it to abstain from prediction.
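One simple way to realise abstention is a reject option on a neighbour vote. The sketch below assumes a hypothetical agreement threshold `min_agreement`; it is not the decision rule of the Droplets algorithm. It returns `None` (abstain) when the majority label is not supported strongly enough, and reports the two quantities that abstention trades off: accuracy on the answered cases and coverage (the fraction of cases answered).

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def predict_or_abstain(neighbour_labels: Sequence[int], min_agreement: float = 0.75) -> Optional[int]:
    """Return the majority label, or None (abstain) if its support is too weak."""
    if not neighbour_labels:
        return None  # no information at all: abstain
    label, count = Counter(neighbour_labels).most_common(1)[0]
    if count / len(neighbour_labels) < min_agreement:
        return None  # neighbours disagree too much, e.g. right after a drift
    return label


def accuracy_and_coverage(predictions: List[Optional[int]], truths: List[int]) -> Tuple[float, float]:
    """Accuracy over answered cases, and fraction of cases answered."""
    answered = [(p, t) for p, t in zip(predictions, truths) if p is not None]
    coverage = len(answered) / len(truths) if truths else 0.0
    accuracy = sum(p == t for p, t in answered) / len(answered) if answered else float("nan")
    return accuracy, coverage


if __name__ == "__main__":
    print(predict_or_abstain([1, 1, 1, 0]))  # 1 (3 of 4 neighbours agree)
    print(predict_or_abstain([1, 0, 1, 0]))  # None (only 2 of 4 agree)
```

Raising `min_agreement` makes the classifier answer less often but, typically, more reliably on the cases it does answer.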
Defence : 12/04/2017 - 14h - Site Jussieu - 405-24/25
Jury members :
M. Joao Gama, [Rapporteur]
Mme. Ludmila Kuncheva, [Rapporteur]
M. Bernd Amann
M. Albert Bifet
M. Antoine Cornuéjol
M. Vincent Lemaire
M. Marcin Detyniecki
M. Christophe Marsala
- P.‑X. Loeffel : “Algorithmes de machine learning adaptatifs pour flux de données sujets à des changements de concept” [Adaptive machine learning algorithms for data streams subject to concept drifts], thesis, defence 12/04/2017, supervision Marsala, Christophe, co-supervision Detyniecki, Marcin (2017)
- P.‑X. Loeffel, V. Lemaire, Ch. Marsala, M. Detyniecki : “Improving the Prediction Cost of Drift Handling Algorithms by Abstaining”, IEEE International Conference on Data Mining (ICDM 2016), Barcelona, Spain (2016)
- P.‑X. Loeffel, Ch. Marsala, M. Detyniecki : “Memory management for data streams subject to concept drift”, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Bruges, Belgium (2016)
- P.‑X. Loeffel, Ch. Marsala, M. Detyniecki : “Classification with a reject option under Concept Drift: the Droplets Algorithm”, Proceedings of the IEEE International Conference on Data Science and Advanced Analytics (DSAA'2015), Paris, France (2015)