Learning Information Retrieval Functions and Parameters on Unlabeled Collections

Thèse
 - 
LIG
Parantapa Goswami
Lundi 06 octobre 2014
Réalisation technique : Djamel Hadji | Tous droits réservés

The present study focuses on (a) predicting parameters of already existing standard IR models and (b) learning new IR functions.

We first explore various statistical methods to estimate the collection parameter of family of information based models. This parameter determines  the behavior of a term in the collection. In earlier studies, it was set to the average number of documents where the term appears, without full justification. We introduce here a fully formalized estimation method which leads to improved versions of these models over the original ones. But the method developed is applicable only to estimate the collection parameter under the information model framework.

To alleviate this we propose a transfer learning approach which can predict values for any parameter for any IR model. This approach uses relevance judgments on a past collection to learn a regression function which can infer parameter values for each single query on a new unlabeled target collection. The proposed method not only outperforms the standard IR models with their default parameter values, but also yields either better or at par performance with popular parameter tuning methods which use relevance judgments on target collection.

We then investigate the application of transfer learning based techniques to directly transfer relevance information from a source collection to derive a "pseudo-relevance" judgment on an unlabeled target collection. From this derived pseudo-relevance a ranking function is learned using any standard learning algorithm which can rank documents in the target collection. In various experiments the learned function outperformed standard IR models as well as other state-of-the-art transfer learning based algorithms.

Though a ranking function learned through a learning algorithm is effective still it has a predefined form based on the learning algorithm used. We thus introduce an exhaustive discovery approach to search ranking functions from a space of simple functions. Through experimentation we found that some of the discovered functions are highly competitive with respect to standard IR models.

L'UMS MI2S a fermé le 31 décembre 2016, les vidéos hébergées sur son site le sont maintenant sur le site de GRICAD. Conformément à la loi informatique et libertés du 6 janvier 1978 modifiée, vous pouvez exercer vos droits de rétraction ou de modification relatifs aux autorisations validées par MI2S auprès de l'UMS GRICAD.