Edit distance: A new data selection criterion for domain adaptation in SMT
Wang L. (2); Wong D.F. (2); Chao L.S. (2); Xing J. (2); Lu Y. (2); Trancoso I. (1)
2013-12-23
Conference Name: International Conference Recent Advances in Natural Language Processing, RANLP 2013
Source Publication: Proceedings of Recent Advances in Natural Language Processing
Pages: 727-732
Conference Date: 7-13 September 2013
Conference Place: Hissar, Bulgaria
Abstract

This paper aims at effective use of training data by extracting sentences from large general-domain corpora to adapt statistical machine translation systems to domain-specific data. We regard this task as a problem of filtering training sentences with respect to the target domain via different similarity metrics. Thus, we give new insights into when a data selection model can best benefit in-domain translation. Based on an investigation of state-of-the-art similarity metrics, we propose edit distance as a new data selection criterion for this task. To evaluate this proposal, we compare it with other methods on a large dataset. Comparative experiments are conducted on the Chinese-English travel dialog domain, and the results indicate that the proposed approach achieves a significant improvement over the baseline system (+4.36 BLEU) as well as over the best rival model (+1.23 BLEU), while using a much smaller training subset. This study may have a significant impact on mining very large corpora in a computationally limited environment.
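
As an illustration of the idea only, the following minimal sketch filters general-domain sentences by their edit-distance similarity to in-domain sentences. The abstract does not give the paper's exact formulation, so everything here is an assumption made for illustration: word-level Levenshtein distance, normalization by the longer sentence length, the best-match scoring rule, the 0.5 threshold, and the function names `edit_distance`, `similarity`, and `select_sentences`.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences (assumed word-level)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, tok_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Map edit distance to [0, 1]; this normalization is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def select_sentences(general, in_domain, threshold=0.5):
    """Keep general-domain sentences whose best match against any
    in-domain sentence reaches the (hypothetical) threshold."""
    in_tokens = [s.split() for s in in_domain]
    selected = []
    for sentence in general:
        tokens = sentence.split()
        best = max((similarity(tokens, ref) for ref in in_tokens), default=0.0)
        if best >= threshold:
            selected.append(sentence)
    return selected
```

For example, with the in-domain sentence "where is the hotel", the general-domain sentence "where is the station" differs by one substitution out of four tokens and would be kept, while an unrelated sentence about the stock market would be dropped. The brute-force comparison is quadratic in the number of sentence pairs, so a real system over a large corpus would need a cheaper candidate retrieval step first.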

Language: English
Document Type: Conference paper
Collection: DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE
Affiliation:
1. Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
2. Universidade de Macau
Recommended Citation
GB/T 7714
Wang L., Wong D.F., Chao L.S., et al. Edit distance: A new data selection criterion for domain adaptation in SMT[C], 2013: 727-732.