Séminaire Données et APprentissage Artificiel
New Perspectives in Social Data Management / Understanding Similarity Metrics in Neighbour-based Recommender Systems
Speaker(s): Sihem Amer-Yahia / Arjen de Vries (LIG/CWI)
**Please note that _there are two talks_ (at 15:00 and 15:45)**
_(15:00) [Sihem Amer-Yahia](http://membres-liglab.imag.fr/amery/), DR CNRS, Laboratoire d'Informatique de Grenoble_
The web has evolved from a technology platform into a social milieu where a mix of factual, opinion and behavior data interleave. A number of social applications are being built to analyze and extract value from this data, which encourages us to do data-driven research.
I will describe a perspective on why and how social data management is fundamentally different from data management as it is taught in school today. More specifically, I'll talk about social data preparation, social data exploration and social application validation.
This talk is based on published and ongoing work with colleagues at LIG, UT Austin, U. of Trento, U. of Tacoma, and Google Research.
_(15:45) [Arjen de Vries](http://homepages.cwi.nl/~arjen/), Professor, Centrum Wiskunde & Informatica, Amsterdam (CWI)_
Recommender systems aim to predict the content that a user would like, based on observations of users' online behaviour. Research in the CWI Information Access group addresses different aspects of this problem, ranging from how to measure recommendation results, to how recommender systems relate to information retrieval models, to how to build effective recommender systems (note: we won the ACM RecSys 2013 News Recommender Systems challenge!). We would like to develop a general methodology to diagnose the weaknesses and strengths of recommender systems. In this talk, I discuss the initial results of an analysis of the core component of collaborative filtering recommenders: the similarity metric used to find the most similar users (neighbours), who provide the basis for the recommendation to be made. The purpose is to shed light on the question of why certain user similarity metrics have been found to perform better than others. We have studied statistics computed over the distance distribution in the neighbourhood, as well as properties of the nearest neighbour graph. The features identified correlate strongly with measured prediction performance; however, we have not yet discovered how to deploy this knowledge to actually improve the recommendations made.
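For readers unfamiliar with neighbour-based collaborative filtering, the following is a minimal sketch of the general idea and not the speaker's method. It assumes a tiny, made-up ratings table and two commonly used user similarity metrics (cosine and Pearson, both computed over co-rated items only). All names (`ratings`, `cosine`, `pearson`, `neighbours`, `predict`) are hypothetical and chosen for illustration.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}. Not taken from the talk.
ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 4, "b": 3, "c": 5, "d": 4},
    "carol": {"a": 1, "b": 5, "d": 2},
}

def cosine(u, v):
    """Cosine similarity restricted to the items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def pearson(u, v):
    """Pearson correlation over co-rated items (means taken over those items)."""
    common = sorted(set(u) & set(v))
    if len(common) < 2:
        return 0.0
    mean_u = sum(u[i] for i in common) / len(common)
    mean_v = sum(v[i] for i in common) / len(common)
    cov = sum((u[i] - mean_u) * (v[i] - mean_v) for i in common)
    dev_u = sqrt(sum((u[i] - mean_u) ** 2 for i in common))
    dev_v = sqrt(sum((v[i] - mean_v) ** 2 for i in common))
    if dev_u == 0 or dev_v == 0:
        return 0.0
    return cov / (dev_u * dev_v)

def neighbours(data, user, sim, k=2):
    """Return the k most similar other users as (similarity, name) pairs."""
    scored = [(sim(data[user], data[o]), o) for o in data if o != user]
    return sorted(scored, reverse=True)[:k]

def predict(data, user, item, sim, k=2):
    """Similarity-weighted average of the neighbours' ratings for one item."""
    votes = [(s, data[o][item]) for s, o in neighbours(data, user, sim, k)
             if item in data[o] and s > 0]
    total = sum(s for s, _ in votes)
    if total == 0:
        return None  # no positively correlated neighbour rated the item
    return sum(s * r for s, r in votes) / total

for metric in (cosine, pearson):
    print(metric.__name__, neighbours(ratings, "alice", metric),
          predict(ratings, "alice", "d", metric))
```

On this toy data the two metrics disagree about the neighbourhood: cosine treats both other users as fairly similar to `alice`, while Pearson gives `carol` a negative score, so the predicted rating for item `d` differs between the two. This is only meant to illustrate why the choice of similarity metric can matter.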
benjamin.piwowarski (at) null