Author, genre and gender identification using deep learning algorithms

Bektaş, Melike

dc.contributor.advisor	Tüfekci, Pınar
dc.contributor.author	Bektaş, Melike
dc.date.accessioned	2022-04-06T06:49:27Z
dc.date.available	2022-04-06T06:49:27Z
dc.date.issued	2020
dc.identifier.uri	https://tez.yok.gov.tr/UlusalTezMerkezi/TezGoster?key=wf-FPgY-5qjHEzEoOgvMs7TBFectlKaGs_WtHX3z1Hg9se_MhMuHe5t6-GSQx29A
dc.identifier.uri	https://hdl.handle.net/20.500.11776/4119
dc.description.abstract	Today, the growing volume of data has created a need to classify that data. Classification is the process of grouping data with similar characteristics into categories. This study aims to carry out a comprehensive modeling study in which Turkish news texts are classified by author, genre and gender, using machine learning and deep learning algorithms as classifiers. To this end, a total of 14 new large-scale, multi-class datasets were first created from the opinion columns of a newspaper's columnists, for use in author identification, genre identification and gender identification. These datasets (7 for author identification, 6 for genre identification and 1 for gender identification) were passed through natural language processing steps specific to Turkish and thereby prepared for the modeling phase, in which the classifiers were applied and the highest achievable accuracies were investigated. In the modeling phase, two machine learning algorithms, Multinomial Naive Bayes (MNB) and Random Forest (RF), and two deep learning algorithms, Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM), were applied to the datasets as classifiers for author, genre and gender identification in Turkish texts. In addition, the hyperparameter values giving the highest performance for each classifier were sought through extensive experimentation. Finally, the performance of the best model for each dataset was determined using accuracy, precision and recall.
At the end of the modeling phase, the best model for author identification across all datasets used CNN as the classifier on the AI-TNKU-7 dataset, reaching 95.81% accuracy. For genre identification, the best model used LSTM as the classifier on the GI-TNKU-6 dataset and reached 96.73% accuracy. For gender identification, the best model used LSTM as the classifier and reached 88.68% accuracy.	en_US
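The abstract says the datasets were passed through "natural language processing steps specific to Turkish" but does not list them in this record. As a purely hypothetical illustration (not the thesis's pipeline), the sketch below shows two steps that commonly matter for Turkish text: locale-correct lowercasing, since Python's default `str.lower()` maps `I` to `i` instead of Turkish `ı`, and stop-word filtering. The function names `turkish_lower` and `preprocess` and the tiny `STOP_WORDS` sample are assumptions made for this example.

```python
import re

# Hypothetical stop-word sample for illustration only; the thesis's own
# list (if it used one) is not given in this record.
STOP_WORDS = {"ve", "bir", "bu", "da", "de", "ile", "için"}


def turkish_lower(text: str) -> str:
    """Lowercase with Turkish casing rules.

    str.lower() alone maps "I" to "i" and "İ" to "i" plus a combining dot,
    both wrong for Turkish, so the dotless/dotted capitals are mapped first.
    """
    return text.replace("I", "ı").replace("İ", "i").lower()


def preprocess(text: str) -> list[str]:
    """Lowercase, keep only runs of letters, then drop stop words."""
    text = turkish_lower(text)
    # In Python 3, \w is Unicode-aware by default; [^\W\d_] keeps letters
    # such as ç, ğ, ı, ö, ş, ü and drops digits, underscores and punctuation.
    tokens = re.findall(r"[^\W\d_]+", text)
    return [tok for tok in tokens if tok not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("IŞIK ve İstanbul, bu yazının konusu!"))
```

A fuller pipeline for Turkish would typically also handle its rich suffixation (stemming or morphological analysis); that step is omitted here because the record does not say which approach the thesis took.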
dc.description.abstract	Today, the growing volume of data has created a need to classify that data. Classification is the process of grouping data with similar characteristics into categories. This study aims to carry out a comprehensive modeling study in which Turkish news texts are classified by author, genre and gender, using machine learning and deep learning algorithms as classifiers. For this purpose, a total of 14 new large-scale, multi-class datasets were first created from the columns of a newspaper's columnists, for use in author identification, genre identification and gender identification. These datasets (7 for author identification, 6 for genre identification and 1 for gender identification) were passed through natural language processing steps specific to Turkish to prepare them for the modeling phase, in which the classifiers were applied and the highest achievable accuracies were investigated. In the modeling phase, two machine learning algorithms, Multinomial Naive Bayes (MNB) and Random Forest (RF), and two deep learning algorithms, Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM), were applied to the datasets as classifiers for author, genre and gender identification in Turkish texts. In addition, the hyperparameter values giving the highest performance for each classifier were sought through extensive experimentation. The performance of the best model for each dataset was then evaluated using accuracy, precision and recall. For author identification, the CNN algorithm achieved the highest accuracy of the algorithms used, 95.81%, on the AI-TNKU-7 dataset.
For genre identification, the LSTM algorithm achieved 96.73% accuracy on the GI-TNKU-6 dataset, and deep learning algorithms also outperformed the machine learning algorithms on the other genre identification datasets. For gender identification, the LSTM algorithm outperformed the other classifiers, achieving 88.68% accuracy.	en_US
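The abstract names Multinomial Naive Bayes as one of the two machine learning classifiers and reports accuracy, precision and recall, but gives no implementation details. The following pure-Python sketch is a hypothetical illustration, not the thesis's code: it trains a multinomial Naive Bayes model with additive smoothing on already-tokenized documents and scores it. The class name `SimpleMultinomialNB`, the smoothing value `alpha=1.0`, the choice of macro-averaged precision and recall, and the toy Turkish sports/economy ("spor"/"ekonomi") documents are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


class SimpleMultinomialNB:
    """Multinomial Naive Bayes over token lists with additive smoothing.

    Illustrative sketch only; alpha=1.0 (Laplace smoothing) is an assumption.
    """

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        class_counts = Counter(labels)
        # log P(c): fraction of training documents in each class
        self.log_prior = {
            c: math.log(class_counts[c] / len(docs)) for c in self.classes
        }
        # Count token occurrences per class (bag of words)
        token_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            token_counts[label].update(doc)
        # log P(t | c) with additive smoothing over the whole vocabulary
        self.log_likelihood = {}
        for c in self.classes:
            denom = sum(token_counts[c].values()) + self.alpha * len(self.vocab)
            self.log_likelihood[c] = {
                tok: math.log((token_counts[c][tok] + self.alpha) / denom)
                for tok in self.vocab
            }
        return self

    def predict(self, docs):
        predictions = []
        for doc in docs:
            scores = {}
            for c in self.classes:
                score = self.log_prior[c]
                for tok in doc:
                    if tok in self.vocab:  # tokens unseen in training are skipped
                        score += self.log_likelihood[c][tok]
                scores[c] = score
            predictions.append(max(scores, key=scores.get))
        return predictions


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_precision_recall(y_true, y_pred):
    """Unweighted mean of per-class precision and recall (macro averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / len(labels), sum(recalls) / len(labels)


# Hypothetical toy data: already-preprocessed snippets labelled by genre
train_docs = [["maç", "gol", "takım"], ["gol", "lig", "takım"],
              ["borsa", "faiz", "enflasyon"], ["faiz", "dolar", "borsa"]]
train_labels = ["spor", "spor", "ekonomi", "ekonomi"]
test_docs = [["takım", "gol"], ["faiz", "borsa"], ["dolar", "gol", "takım"]]
test_labels = ["spor", "ekonomi", "ekonomi"]

model = SimpleMultinomialNB(alpha=1.0).fit(train_docs, train_labels)
preds = model.predict(test_docs)
precision, recall = macro_precision_recall(test_labels, preds)
print(preds, accuracy(test_labels, preds), precision, recall)
```

Scores are accumulated in log space so that multiplying many small per-token probabilities does not underflow. In practice a library implementation, such as scikit-learn's `MultinomialNB` with its `alpha` parameter, would normally be used instead of a hand-written one.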
dc.language.iso	tur	en_US
dc.publisher	Tekirdağ Namık Kemal Üniversitesi	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	en_US
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.title	Derin öğrenme algoritmaları kullanarak yazar, tür ve cinsiyet tanıma	en_US
dc.title.alternative	Author, genre and gender identification using deep learning algorithms	en_US
dc.type	masterThesis	en_US
dc.department	Institutes, Graduate School of Natural and Applied Sciences, Department of Computer Engineering	en_US
dc.identifier.startpage	1	en_US
dc.identifier.endpage	77	en_US
dc.institutionauthor	Bektaş, Melike
dc.relation.publicationcategory	Thesis	en_US
dc.identifier.yoktezid	654853	en_US

Files in this item:

Name: 654853.pdf
Size: 2.239 MB
Format: PDF
Description: Full Text

This item appears in the following collection(s):

Graduate School of Natural and Applied Sciences Thesis Collection [1953]