Séminaire Données et APprentissage Artificiel

Ranking and Scoring XML Data

Monday, 19 June 2006
A. Marian (Rutgers University, USA)

XML repositories are usually queried on both structure and content. Due to the structural heterogeneity of XML, queries are often interpreted approximately and the k best answers are returned ranked by scores. In this context, it is important to provide both efficient top-k query evaluation algorithms to return answers as early as possible, and adequate scoring functions to return high-quality results. The efficiency of top-k query algorithms relies on using scores to prune irrelevant answers as early as possible in the evaluation process. In such a setting, evaluating the same query plan for all answers might be too rigid: at any time in the evaluation, answers have gone through the same number and sequence of operations, which limits the speed at which scores grow. Therefore, adaptive query processing that permits different plans for different partial matches and maximizes the best scores is more appropriate. We propose an architecture and adaptive algorithms for efficiently computing top-k matches to XML queries. Computing answer scores in XML is an active area of research that oscillates between pure content scoring, such as the well-known tf*idf, and taking structure into account. However, none of the existing proposals fully accounts for relaxations on the query structure, or combines structure with content to score query answers. We propose novel XML scoring methods that are inspired by tf*idf and that account for both structure and content while considering query relaxations.

More information here …
Javier.Diaz (at) lip6.fr