Text classification of web based news articles by using Turkish grammatical features

Date

2012

Journal Title

Journal ISSN

Volume Title

Publisher

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

The dimension of the feature vectors used by classification methods in the literature directly affects time performance. This study explains how to reduce the dimension of the feature vector by using Turkish grammar rules without compromising success rates. Word stems are selected as features, and the feature vector is weighted by word frequency. The effect on classification of selecting word stems of different lengths and types is investigated; the success rate is highest when noun-type stems of maximum length are selected as features. When this selection is applied together with other dimension-reduction methods, the dimension of the feature vector is decreased to 97.46%. With the reduced feature vector, better success rates are generally obtained from the Naive Bayes, SVM, C4.5 and RF classification methods, and the best performance achieved is 92.73%, obtained with the Naive Bayes method. © 2012 IEEE.
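The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `ANALYSES` table is a hypothetical stand-in for the output of a Turkish morphological analyzer, the toy documents and the labels `sports` and `economy` are invented for illustration, and the small multinomial Naive Bayes class with add-one smoothing is an assumed variant, since the abstract does not say which one was used.

```python
import math
from collections import Counter, defaultdict

# Hypothetical analyzer output: token -> candidate (stem, part of speech) pairs.
# A real system would obtain these from a Turkish morphological analyzer.
ANALYSES = {
    "takımı": [("takı", "Noun"), ("takım", "Noun")],
    "maçı": [("maç", "Noun")],
    "kazandı": [("kazan", "Verb")],
    "gol": [("gol", "Noun")],
    "borsa": [("borsa", "Noun")],
    "dolar": [("dolar", "Noun")],
    "faizi": [("faiz", "Noun")],
    "yükseldi": [("yüksel", "Verb")],
}


def select_stem(token, analyses=ANALYSES):
    """Return the longest noun stem of a token, or None if it has no noun analysis."""
    nouns = [stem for stem, pos in analyses.get(token, []) if pos == "Noun"]
    return max(nouns, key=len) if nouns else None


def featurize(tokens, analyses=ANALYSES):
    """Build a word-frequency vector (a Counter) over the selected noun stems."""
    stems = (select_stem(t, analyses) for t in tokens)
    return Counter(s for s in stems if s is not None)


class TinyMultinomialNB:
    """Multinomial Naive Bayes with add-one smoothing (assumed variant)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(doc)
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            total = sum(self.term_counts[label].values())
            score = math.log(n / n_docs)  # log prior
            for term, freq in doc.items():
                if term not in self.vocab:
                    continue  # ignore stems never seen in training
                p = (self.term_counts[label][term] + 1) / (total + len(self.vocab))
                score += freq * math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training set, invented for illustration only.
TRAIN = [
    (["takımı", "maçı", "kazandı"], "sports"),
    (["gol", "maçı"], "sports"),
    (["borsa", "yükseldi"], "economy"),
    (["dolar", "faizi", "yükseldi"], "economy"),
]

model = TinyMultinomialNB().fit(
    [featurize(tokens) for tokens, _ in TRAIN],
    [label for _, label in TRAIN],
)
```

In this sketch the verbs are dropped before weighting, so the eight distinct surface tokens collapse to six noun-stem features. A new document is classified with `model.predict(featurize(tokens))`.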

Description

2012 20th Signal Processing and Communications Applications Conference, SIU 2012 -- 18 April 2012 through 20 April 2012 -- Fethiye, Mugla -- 90786

Keywords

Classification methods, Feature vectors, Grammar rules, Naive Bayes, News articles, Text classification, Time performance, Turkish, Web based, Word frequencies, Signal processing, Classifiers

Source

2012 20th Signal Processing and Communications Applications Conference, SIU 2012, Proceedings

WoS Q Value

Scopus Q Value

Volume

Issue

Citation