Séminaire Données et APprentissage Artificiel
Automated Feature Weighting in Naive Bayes for High-dimensional Data Classification
Speaker: Shengrui Wang (Université de Sherbrooke)
This talk presents our recent work on feature weighting for high-dimensional data classification (and clustering). The first part concerns the Naive Bayes (NB) classifier. In many real-world applications, high dimensionality poses a major challenge to conventional NB classifiers, because features can be noisy or redundant and are often relevant only locally, to particular classes. We propose an automated feature weighting solution that enables the NB method to deal effectively with high-dimensional data. First, a locally weighted probability model will be presented that implements a soft feature selection scheme. Then an optimization algorithm will be presented that finds the weights in linear time, based on a logit-normal prior distribution and the maximum a posteriori (MAP) principle. Experimental studies will show the effectiveness and suitability of the proposed model for high-dimensional data classification.
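To make the idea of class-specific feature weights concrete, here is a minimal sketch of a generic weighted Naive Bayes scorer for categorical features. It is not the speaker's model: the weights are assumed to be given as an input (the talk's MAP optimization with a logit-normal prior is not reproduced), each weight is assumed to act as an exponent on the corresponding conditional probability, and all function names, the Laplace smoothing parameter `alpha`, and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_counts(X, y):
    """Collect class counts and per-(class, feature) value counts."""
    class_counts = Counter(y)
    value_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    for row, label in zip(X, y):
        for d, value in enumerate(row):
            value_counts[(label, d)][value] += 1
    return class_counts, value_counts

def weighted_log_posterior(x, label, class_counts, value_counts,
                           weights, n_values, alpha=1.0):
    """Unnormalized score: log P(c) + sum_d w[c][d] * log P(x_d | c).

    weights[c][d] in [0, 1] is an assumed, externally supplied weight;
    a weight of 0 switches feature d off for class c (soft selection).
    """
    n_c = class_counts[label]
    score = math.log(n_c / sum(class_counts.values()))
    for d, value in enumerate(x):
        # Laplace-smoothed estimate of P(x_d = value | class = label)
        p = (value_counts[(label, d)][value] + alpha) / (n_c + alpha * n_values[d])
        score += weights[label][d] * math.log(p)
    return score

def predict(x, class_counts, value_counts, weights, n_values):
    """Return the class with the highest weighted score."""
    return max(
        class_counts,
        key=lambda c: weighted_log_posterior(
            x, c, class_counts, value_counts, weights, n_values
        ),
    )

# Toy data: feature 0 determines the class, feature 1 is pure noise.
X = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
y = ["pos", "pos", "neg", "neg"]
n_values = [2, 2]  # number of distinct values per feature
class_counts, value_counts = train_counts(X, y)

# Hypothetical weights that keep feature 0 and ignore feature 1.
weights = {"pos": [1.0, 0.0], "neg": [1.0, 0.0]}
label = predict(("a", "y"), class_counts, value_counts, weights, n_values)
```

With all weights set to 1 the scorer reduces to ordinary Naive Bayes, and with all weights set to 0 only the class prior remains, which is why intermediate weights behave as a soft form of feature selection.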
In the second part of this talk, I will briefly present our work on central clustering of categorical data with automated feature weighting. A novel kernel-density-based definition of the cluster center is proposed, using a Bayes-type probability estimator. Then an algorithm called k-centers is proposed; it incorporates a new feature weighting scheme in which each attribute is automatically assigned a weight measuring its individual contribution to the clusters.
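As a rough illustration of what a probabilistic "center" for one categorical attribute can look like, the sketch below shrinks the observed category frequencies in a cluster toward the uniform distribution. This is an assumed, generic smoothed estimator, not the definition used in the talk: the abstract does not state the exact kernel or estimator, and the function name and the `bandwidth` parameter are hypothetical.

```python
from collections import Counter

def smoothed_center(values, categories, bandwidth):
    """Return a probability for each category of one attribute in one cluster.

    bandwidth in [0, 1] is an assumed smoothing parameter:
    0 gives the raw frequencies, 1 gives the uniform distribution.
    """
    counts = Counter(values)
    n = len(values)
    m = len(categories)
    return {
        c: bandwidth / m + (1.0 - bandwidth) * counts[c] / n
        for c in categories
    }

# One attribute (colour) observed on three objects of a cluster.
center = smoothed_center(["r", "r", "g"], ["r", "g", "b"], bandwidth=0.3)
```

Unlike a mode, which keeps only the most frequent category, such a center keeps a full distribution per attribute, so an unseen category still receives a small nonzero probability.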