Clustering based active learning for evolving data streams

Apprentissage actif sur flux des données basé sur clustering

Ienco, D. ; Bifet, A. ; Zliobaite, I. ; Pfahringer, B.

Document type
Conference paper with proceedings
Language
English
Author affiliation
IRSTEA MONTPELLIER UMR TETIS FRA ; YAHOO RESEARCH BARCELONA SPAIN ; AALTO UNIVERSITY AND HELSINKI INSTITUTE FOR INFORMATION TECHNOLOGY FIN ; UNIVERSITY OF WAIKATO HAMILTON NZL
Year
2013
Abstract
Data labeling is an expensive and time-consuming task, so choosing which instances to label is becoming increasingly important. In the active learning setting, a classifier is trained by asking for labels for only a small fraction of all instances. While many works address this issue in non-streaming scenarios, few do so in the data stream setting. In this paper we propose a new active learning approach for evolving data streams based on a pre-clustering step that selects the most informative instances for labeling. We consider a batch-incremental setting: when a new batch arrives, we first cluster the examples and then select the best instances to train the learner. The clustering step makes it possible to cover the whole data space and avoids oversampling examples from only a few areas. We compare our method against state-of-the-art active learning strategies on real datasets. The results highlight the improvement in performance achieved by our proposal. Experiments on parameter sensitivity are also reported.
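
The batch-incremental procedure described in the abstract (cluster each incoming batch, then pick instances for labeling so that every region of the data space is represented) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which clustering algorithm or selection criterion the paper uses, so plain k-means and "closest to the centroid first, round-robin over clusters" are assumptions made here for illustration only, and the names `kmeans`, `select_for_labeling`, `k` and `budget` are hypothetical.

```python
import random


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (an assumed choice); requires k <= len(points).

    Returns (centroids, assignments), where assignments[i] is the
    cluster index of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k batch instances
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sqdist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # leave the centroid of an empty cluster unchanged
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    # Final assignment so that labels agree with the returned centroids.
    assign = [min(range(k), key=lambda c: sqdist(p, centroids[c]))
              for p in points]
    return centroids, assign


def select_for_labeling(batch, k, budget, seed=0):
    """Return indices of at most `budget` instances of `batch` to label.

    Assumed criterion: inside each cluster, instances are ranked by
    distance to the centroid; clusters are then visited round-robin so
    that no single area of the data space is oversampled.
    """
    centroids, assign = kmeans(batch, k, seed=seed)
    queues = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        members.sort(key=lambda i: sqdist(batch[i], centroids[c]))
        if members:
            queues.append(members)
    selected, rank = [], 0
    while len(selected) < budget and any(rank < len(q) for q in queues):
        for q in queues:
            if rank < len(q) and len(selected) < budget:
                selected.append(q[rank])
        rank += 1
    return selected


# Toy batch with two well-separated groups of instances.
batch = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
to_label = select_for_labeling(batch, k=2, budget=2)
```

In a stream setting, a function like `select_for_labeling` would be called once per arriving batch, and only the returned instances would be sent to the labeling oracle before updating the learner.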
Conference
Discovery Science, 06/10/2013 - 09/10/2013, Singapore, SGP
Publisher
Springer Verlag