Measuring and Improving the Quality of Comparable Corpora

Thesis
LIG
Bo Li
Tuesday, June 26, 2012
Technical production: Djamel Hadji | All rights reserved

Unlike previous studies that exploit comparable corpora, the work presented in this thesis aims to enhance the quality of the comparable corpus itself in order to improve the performance of the NLP tasks that use it. The approach is advantageous because it can be combined with any existing algorithm that makes use of comparable corpora. We concentrate on the following aspects: (1) We propose a comparability measure that quantifies the degree of comparability of a comparable corpus. This measure is developed within a simple probabilistic framework and correlates well with gold-standard comparability levels. (2) Building on the proposed comparability measure, we develop two methods to improve the quality of any given comparable corpus. The effectiveness of these methods is confirmed both by the comparability scores and by the quality of the bilingual lexicons extracted from the enhanced comparable corpora. (3) Finally, the extracted lexicons are used to enhance a novel information-based cross-language information retrieval (CLIR) model.
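The abstract does not give the formula of the comparability measure. As an illustration only, here is a minimal sketch of one dictionary-based measure in the same probabilistic spirit: it estimates the probability that a word covered by a bilingual dictionary finds at least one of its translations on the other side of the corpus. The function names `invert_dictionary` and `comparability`, the representation of each corpus side as a vocabulary set, and the dictionary format (source word mapped to a set of target translations) are all hypothetical assumptions, not the thesis's implementation.

```python
from collections import defaultdict
from typing import Dict, Set


def invert_dictionary(dictionary: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Build a target->source dictionary from a source->target one."""
    inverse: Dict[str, Set[str]] = defaultdict(set)
    for src, translations in dictionary.items():
        for tgt in translations:
            inverse[tgt].add(src)
    return dict(inverse)


def comparability(source_vocab: Set[str], target_vocab: Set[str],
                  dictionary: Dict[str, Set[str]]) -> float:
    """Hypothetical sketch: share of dictionary-covered words (both sides)
    whose translation appears in the other side of the corpus."""
    inverse = invert_dictionary(dictionary)

    # Source words the dictionary can translate, and how many of them
    # have at least one translation present in the target vocabulary.
    covered_src = [w for w in source_vocab if w in dictionary]
    hits_src = sum(1 for w in covered_src if dictionary[w] & target_vocab)

    # Same check in the other direction, using the inverted dictionary.
    covered_tgt = [w for w in target_vocab if w in inverse]
    hits_tgt = sum(1 for w in covered_tgt if inverse[w] & source_vocab)

    denominator = len(covered_src) + len(covered_tgt)
    if denominator == 0:
        return 0.0  # no word can be checked: treat the corpus as not comparable
    return (hits_src + hits_tgt) / denominator


if __name__ == "__main__":
    source = {"chat", "chien", "maison", "voiture"}
    target = {"cat", "house", "tree"}
    dictionary = {"chat": {"cat"}, "chien": {"dog"},
                  "maison": {"house", "home"}, "arbre": {"tree"}}
    # 2 of 3 covered source words and 2 of 3 covered target words match.
    print(round(comparability(source, target, dictionary), 3))  # 0.667
```

Words absent from the dictionary are ignored rather than counted as misses, so the score reflects the corpus content and not the dictionary's coverage; a real measure would likely need to handle this choice, and word frequencies, more carefully.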
