Rules Design in Word Segmentation of Chinese Micro-Blog
Zong Hao; Derek F. Wong; Lidia S. Chao
2012
Conference Name: The Second CIPS-SIGHAN Joint Conference on Chinese Language Processing
Source Publication: Proceedings of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing
Pages: 90–94
Conference Date: 20-21 Dec. 2012
Conference Place: Tianjin, China
Abstract

This paper proposes a Hidden Markov Model (HMM) based tokenizer for Chinese micro-blog texts. Compared with standard Chinese text, micro-blog text contains more uncertainty, which generally arises from bloggers' irregular usage (such as network slang, dialect words, miswritten characters, and mixtures of foreign words and symbols). The lack of an annotated training corpus is a further restriction on solving this task. Because segmenting micro-blogs is therefore much more difficult than segmenting general text, we present an HMM-based segmentation model integrated with pre-correction and post-correction modules. The evaluation results show that the proposed approach achieves an F-measure of 90.98% on a test set of 5,000 sentences.
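
To make the pipeline named in the abstract concrete, the following is a minimal sketch of an HMM segmenter wrapped in a pre-correction and a post-correction step. It is not the authors' implementation: the abstract does not describe their rules or model details, so the BMES tag set, the add-one smoothing, the `PROTECTED` pattern (URLs, @mentions, Latin/digit runs), the repeated-character merge rule, and the three-sentence toy corpus are all assumptions chosen for illustration.

```python
import math
import re

STATES = "BMES"  # Begin / Middle / End of a multi-character word, Single-character word
ALLOWED = {"B": "ME", "M": "ME", "E": "BS", "S": "BS"}  # legal tag transitions
NEG_INF = float("-inf")
# Hypothetical pre-correction rule: keep URLs, @mentions and Latin/digit runs whole.
PROTECTED = re.compile(r"https?://[A-Za-z0-9./_%?=&#-]+|@[A-Za-z0-9_]+|[A-Za-z0-9]+")


def tag_word(word):
    """Map one non-empty word to its BMES tag string, e.g. two characters -> 'BE'."""
    return "S" if len(word) == 1 else "B" + "M" * (len(word) - 2) + "E"


def log_norm(counts):
    """Turn a dict of counts into a dict of log-probabilities."""
    total = sum(counts.values())
    return {k: math.log(v / total) for k, v in counts.items()}


def train(corpus):
    """Estimate add-one-smoothed HMM parameters from pre-segmented sentences."""
    start = {"B": 1, "S": 1}
    trans = {a: {b: 1 for b in ALLOWED[a]} for a in STATES}
    emit = {s: {} for s in STATES}
    for words in corpus:
        tags = "".join(tag_word(w) for w in words)
        if not tags:
            continue
        start[tags[0]] += 1
        for prev, cur in zip(tags, tags[1:]):
            trans[prev][cur] += 1
        for ch, tag in zip("".join(words), tags):
            emit[tag][ch] = emit[tag].get(ch, 0) + 1
    totals = {s: sum(emit[s].values()) for s in STATES}
    vocab = len({ch for s in STATES for ch in emit[s]}) + 1  # +1 slot for unseen characters

    def emit_lp(state, ch):
        return math.log((emit[state].get(ch, 0) + 1) / (totals[state] + vocab))

    return log_norm(start), {a: log_norm(trans[a]) for a in STATES}, emit_lp


def viterbi(text, model):
    """Return the most probable legal BMES tag sequence for a non-empty string."""
    start, trans, emit_lp = model
    scores = [{s: start.get(s, NEG_INF) + emit_lp(s, text[0]) for s in STATES}]
    back = []
    for ch in text[1:]:
        row, ptr = {}, {}
        for cur in STATES:
            prev = max(STATES, key=lambda p: scores[-1][p] + trans[p].get(cur, NEG_INF))
            row[cur] = scores[-1][prev] + trans[prev].get(cur, NEG_INF) + emit_lp(cur, ch)
            ptr[cur] = prev
        scores.append(row)
        back.append(ptr)
    tags = [max("ES", key=lambda s: scores[-1][s])]  # a sentence must end a word
    for ptr in reversed(back):
        tags.append(ptr[tags[-1]])
    return tags[::-1]


def cut(text, tags):
    """Split text after every character tagged E or S."""
    words, buf = [], ""
    for ch, tag in zip(text, tags):
        buf += ch
        if tag in "ES":
            words.append(buf)
            buf = ""
    return words + [buf] if buf else words


def post_correct(words):
    """Hypothetical post-correction rule: merge runs of one repeated character."""
    merged = []
    for w in words:
        if merged and len(w) == 1 and merged[-1] == w * len(merged[-1]):
            merged[-1] += w
        else:
            merged.append(w)
    return merged


def segment(sentence, model):
    """Pre-correct (protect spans), HMM-segment the rest, then post-correct."""
    words, pos = [], 0
    for m in list(PROTECTED.finditer(sentence)) + [None]:
        chunk = sentence[pos:m.start()] if m else sentence[pos:]
        for piece in chunk.split():
            words += cut(piece, viterbi(piece, model))
        if m:
            words.append(m.group())
            pos = m.end()
    return post_correct(words)


# Toy training data, invented for this sketch only.
TOY_CORPUS = [["我", "喜欢", "微博"], ["今天", "天气", "很", "好"], ["我们", "喜欢", "自然语言"]]
MODEL = train(TOY_CORPUS)
print(segment("我喜欢微博 http://t.cn/abc 哈哈哈", MODEL))
```

The transition table is restricted to legal BMES moves so that decoding always yields a well-formed segmentation, and the rule-based steps sit outside the HMM so they can be changed without retraining. With so little training data the printed segmentation is not meaningful; the sketch only shows how the three stages fit together.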

Language: English
Document Type: Conference paper
Collection: Faculty of Science and Technology
DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE
Affiliation: NLP2CT Research Group, Department of Computer and Information Science, University of Macau, Macau SAR, China
First Author Affiliation: University of Macau
Recommended Citation
GB/T 7714
Zong Hao, Derek F. Wong, Lidia S. Chao. Rules Design in Word Segmentation of Chinese Micro-Blog[C], 2012: 90–94.