Text classification of web based news articles by using Turkish grammatical features
Date
2012
Access Rights
info:eu-repo/semantics/closedAccess
Abstract
The dimension of the feature vectors used by the classification methods in the literature directly affects their time performance. This study explains how to reduce the dimension of the feature vector using Turkish grammar rules without compromising success rates. Word stems are selected as features, and the feature vector is weighted by word frequency. The effect on classification of selecting word stems of different lengths and types is investigated; the success rate is highest when the noun-type stems of maximum length are selected as features. When this selection is combined with other dimension-reduction methods, the dimension of the feature vector is reduced by 97.46%. With the reduced feature vector, the Naive Bayes, SVM, C4.5, and RF classification methods generally achieve better success rates, and the best performance, 92.73%, is obtained with the Naive Bayes method. © 2012 IEEE.
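The pipeline described in the abstract (keep only the longest noun-type stem of each word, weight features by frequency, classify with Naive Bayes) can be illustrated with a minimal sketch. This is not the authors' implementation. The stem lexicon `CANDIDATES`, the function names, the class labels, and the three-document corpus are all hypothetical; a real system would take stem candidates from a Turkish morphological analyzer, and the classifier here is a plain multinomial Naive Bayes with add-one smoothing, chosen only for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon: surface word -> candidate (stem, part of speech) pairs.
# In practice these candidates would come from a Turkish morphological analyzer.
CANDIDATES = {
    "oyuncular": [("oy", "noun"), ("oyun", "noun"), ("oyuncu", "noun")],
    "takımlar": [("takım", "noun")],
    "maçta": [("maç", "noun")],
    "kazandı": [("kazan", "verb")],
    "borsada": [("borsa", "noun")],
    "hisseler": [("his", "noun"), ("hisse", "noun")],
    "yükseldi": [("yüksel", "verb")],
}

def longest_noun_stem(word):
    """Return the longest noun-type stem candidate, or None if there is none."""
    nouns = [stem for stem, pos in CANDIDATES.get(word, []) if pos == "noun"]
    return max(nouns, key=len) if nouns else None

def featurize(text):
    """Map a document to stem frequencies; words without a noun stem are dropped."""
    stems = (longest_noun_stem(word) for word in text.split())
    return Counter(stem for stem in stems if stem is not None)

def train(docs):
    """Collect per-class document counts and stem frequencies from (text, label) pairs."""
    class_counts = Counter()
    stem_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        feats = featurize(text)
        class_counts[label] += 1
        stem_counts[label].update(feats)
        vocab.update(feats)
    return class_counts, stem_counts, vocab

def predict(model, text):
    """Pick the class with the highest multinomial Naive Bayes log score."""
    class_counts, stem_counts, vocab = model
    feats = featurize(text)
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(stem_counts[label].values()) + len(vocab)
        for stem, freq in feats.items():
            if stem in vocab:
                score += freq * math.log((stem_counts[label][stem] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical three-document training corpus.
model = train([
    ("oyuncular maçta kazandı", "sports"),
    ("takımlar maçta kazandı", "sports"),
    ("borsada hisseler yükseldi", "economy"),
])
```

In this toy setup, `featurize("oyuncular maçta kazandı")` keeps only `oyuncu` and `maç`: the shorter noun candidates and the verb stem are discarded, which is the kind of vocabulary shrinkage the abstract attributes to grammar-based selection.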
Description
2012 20th Signal Processing and Communications Applications Conference, SIU 2012 -- 18 April 2012 through 20 April 2012 -- Fethiye, Mugla -- 90786
Keywords
Classification methods, Feature vectors, Grammar rules, Naive Bayes, News articles, Text classification, Time performance, Turkish, Web based, Word frequencies, Signal processing, Classifiers
Source
2012 20th Signal Processing and Communications Applications Conference, SIU 2012, Proceedings