- Computer Science Laboratory

LIP6 1999/011

  • Habilitation «La reconnaissance vocale et son mentor : l'évaluation» (Speech recognition and its mentor: evaluation)
  • M.-J. Caraty
  • 67 pages - 04/07/1999 - document in English - http://www.lip6.fr/lip6/reports/1999/lip6.1999.011.ps.gz 308 KB
  • Contact Marie-Jose.Caraty (at) lip6.fr
  • Former theme: APA
  • The work presented in this document covers ten years of post-PhD research in computer speech processing at the Pattern Recognition and Artificial Intelligence Laboratory (LAFORIA), now part of the Computer Science Laboratory of Pierre et Marie Curie University (LIP6). The title presents evaluation as the mentor of speech recognition research. Evaluation is often taken into account in the work described. Among the evaluation methodologies identified (adequacy, diagnostic, quantitative, qualitative), speech recognition follows a paradigm tied to quantitative evaluation. Its principle can be summarized as 'common task, common data, common evaluation'; evaluation plans have accelerated the development of both research and speech technology. In large-vocabulary, speaker-independent speech recognition, Hidden Markov Model (HMM) based systems are the state of the art. For such a problem, mastering Markovian technology is not easy. The studies are described through the development of our own voice dictation system, our choices, and our experience of an evaluation plan. Related to qualitative evaluation, other studies are presented in the field of HMM-based hybrid systems. The first deals with temporal control in HMM-based systems. The second uses the K-nearest neighbors decision rule as an alternative to the Gaussian estimation of output probabilities. After the state of the art in speech recognition, the studies are presented from the earliest to the most recent. These studies concern the representation space and the decision process. The first work deals with the formant-based representation of speech, the design of an adapted dissimilarity measure based on perceptual criteria, and the quantitative and qualitative evaluations carried out.
Another work, on a numeric-symbolic approach, is a first experiment in applying symbolic learning to speech recognition. The studies on speaker recognition address an issue dual to speech recognition. Finally, to deal with the non-stationarity of the speech signal, the most recent work proposes an extension of any representation space based on temporal multi-resolution. From inertia measures and centroid computation, many applications are found in speech recognition. In the conclusion of the document, our knowledge of speech technology is applied to a field wider than speech recognition: information retrieval in multimedia documents. One of the perspectives addresses the evolution of voice dictation systems: a key is found in software engineering, with the need to develop reusable software components.