Set characterization-selection towards classification based on interaction index
Murillo, J. ; Guillaume, S. ; Spetale, F. ; Tapia, E. ; Bulacio, P.
Document type
Peer-reviewed journal article
Author affiliation
UNIVERSIDAD NACIONAL DE ROSARIO CIFASIS-CONICET ARG ; IRSTEA MONTPELLIER UMR ITAP FRA ; UNIVERSIDAD NACIONAL DE ROSARIO CIFASIS-CONICET ARG ; UNIVERSIDAD NACIONAL DE ROSARIO CIFASIS-CONICET ARG ; UNIVERSIDAD NACIONAL DE ROSARIO CIFASIS-CONICET ARG
Abstract
In many real-world datasets, both the individual and the coordinated action of features may be relevant for class identification. In this paper, a computational strategy for relevant feature selection based on the characterization of redundant or complementary features is proposed. The characterization is achieved using fuzzy measures and an interaction index computed from fuzzy measure coefficients. Fuzzy measure identification requires raw data to be turned into confidence degrees. This key step is carried out by considering the distributions of feature values across all the classes. Fuzzy measure coefficients are then estimated with an improved version of the Heuristic Least Mean Squares algorithm that includes efficient management of untouched coefficients. Then, a generalization of the Shapley index to an arbitrary number of features is used. Simulation experiments on synthetic datasets are performed to study the behavior of this generalized interaction index. For extreme datasets, containing either redundant or complementary features as well as noise, the index value is given by a mathematical formula. This result is used to motivate feature selection guidelines that take feature interactions into account. Experimental results on benchmark datasets show that the proposal allows for the design of compact, interpretable and competitive classification models.
Fuzzy Sets and Systems, vol. 270, p. 74 - 89
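The abstract mentions an interaction index computed from fuzzy measure coefficients, generalizing the Shapley index to any number of features. As a minimal sketch, and assuming the generalization meant is the standard Grabisch interaction index (the record itself does not give the formula), the computation could look like the following. The function names and the two-feature toy measures are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations
from math import factorial


def subsets(items):
    """Yield every subset of `items` as a frozenset (including the empty set)."""
    items = tuple(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def interaction_index(mu, features, group):
    """Grabisch interaction index of `group` for the fuzzy measure `mu`.

    `mu` maps every frozenset of features to its coefficient.
    For a single feature this reduces to the Shapley value.
    (Assumed formula; hypothetical helper, not the authors' code.)
    """
    n = len(features)
    group = frozenset(group)
    a = len(group)
    rest = frozenset(features) - group
    total = 0.0
    for b in subsets(rest):
        weight = factorial(n - len(b) - a) * factorial(len(b)) / factorial(n - a + 1)
        # Alternating sum over subsets of the group: the discrete derivative of mu at b.
        delta = sum((-1) ** (a - len(c)) * mu[b | c] for c in subsets(group))
        total += weight * delta
    return total


features = ("x1", "x2")
E, A, B, AB = frozenset(), frozenset({"x1"}), frozenset({"x2"}), frozenset({"x1", "x2"})

# Toy measures (assumed for illustration): each feature alone suffices (redundant),
# or only the pair carries information (complementary).
mu_redundant = {E: 0.0, A: 1.0, B: 1.0, AB: 1.0}
mu_complementary = {E: 0.0, A: 0.0, B: 0.0, AB: 1.0}

pair_redundant = interaction_index(mu_redundant, features, {"x1", "x2"})
pair_complementary = interaction_index(mu_complementary, features, {"x1", "x2"})
shapley_x1 = interaction_index(mu_redundant, features, {"x1"})
```

In this sketch a negative pair index signals redundancy and a positive one signals complementarity, which matches the two extreme cases the abstract refers to; the enumeration is exponential in the number of features and is only meant for small feature sets.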