An improvement in cross-language document retrieval based on statistical models
Wang L.-Y.; Wong D.F.; Chao L.S.
2012-12-01
Conference Name: The 24th Conference on Computational Linguistics and Speech Processing (ROCLING 2012)
Source Publication: Proceedings of the 24th Conference on Computational Linguistics and Speech Processing, ROCLING 2012
Pages: 144-155
Conference Date: September 2012
Conference Place: Chung-Li, Taiwan
Abstract

This paper presents a method for cross-language document retrieval that integrates three statistical models: a translation model, a query generation model, and a document retrieval model. Given a document in the source language, a statistical machine translation model first translates it into the target language. The query generation model then selects the most relevant words in the translated document as a query. Finally, the document retrieval model scores all documents in the target language, mainly by computing the similarity between the query and each document. This method can efficiently address translation ambiguity and query expansion for disambiguation, both of which are critical in Cross-Language Information Retrieval. In addition, the proposed model has been extensively evaluated on retrieving documents in two difficult cases: 1) long texts, which may cause the model to over-generate queries; and 2) texts with similar content under the same topic, which are hard for the retrieval model to distinguish. After comparing different strategies, the experimental results show strong performance, with average precision close to 100%. The method is significant both for cross-language search on the Internet and for producing parallel corpora for statistical machine translation systems.
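
The query generation and document scoring steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the source document has already been translated by an external statistical machine translation system, uses a smoothed tf-idf weighting (tf-idf appears in the keywords, but the exact formula here is an assumption), and uses cosine similarity as a stand-in for the paper's retrieval score. All function names and the parameter `k` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would use proper segmentation.
    return text.lower().split()


def idf_table(docs_tokens):
    # Smoothed inverse document frequency over the target-language collection.
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(tokens, idf):
    # Terms unseen in the collection get weight 0: they cannot match any document.
    tf = Counter(tokens)
    return {t: (c / len(tokens)) * idf.get(t, 0.0) for t, c in tf.items()}


def generate_query(translated_tokens, idf, k=5):
    # Keep the k highest-weighted words of the translated document as the query.
    vec = tfidf_vector(translated_tokens, idf)
    top = sorted(vec.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    return dict(top)


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(translated_text, target_docs, k=5):
    # Score every target-language document against the generated query
    # and return the index of the best match together with all scores.
    docs_tokens = [tokenize(d) for d in target_docs]
    idf = idf_table(docs_tokens)
    query = generate_query(tokenize(translated_text), idf, k)
    scores = [cosine(query, tfidf_vector(t, idf)) for t in docs_tokens]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores
```

Limiting the query to `k` words is one simple way to guard against the over-generation of queries on long texts that the abstract mentions; the strategies the paper actually compares are not specified on this page.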

Keywords: Cross-language Document Retrieval; Document Translation-based; Statistical Machine Translation; Tf-idf
Language: English
Document Type: Conference paper
Collection: DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE
Affiliation: Universidade de Macau
Recommended Citation
GB/T 7714
Wang L.-Y., Wong D.F., Chao L.S. An improvement in cross-language document retrieval based on statistical models[C], 2012: 144-155.