CNN vs. LSTM for Turkish text classification

Publisher

Institute of Electrical and Electronics Engineers Inc.

Access Rights

info:eu-repo/semantics/openAccess

Abstract

In this paper, the efficiency of two state-of-the-art text classification techniques, Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM), is investigated for Turkish text classification. The effect of the main preprocessing steps, such as tokenization, stop-word elimination, and stemming, is also studied. Several experiments were performed on the "TTC-3600" dataset, and both CNN and LSTM were observed to support the Turkish language efficiently and to achieve quite good performance. Regarding data preprocessing, the results indicate that preprocessing improves performance; for the Turkish language, however, it is preferable to exclude stemming. In addition, a comparison of feature extraction techniques for processing Turkish shows that Word2Vec outperforms TF-IDF. © 2021 IEEE.
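As a rough illustration of the preprocessing and TF-IDF steps named in the abstract, the following is a minimal sketch using only the Python standard library. It is not the authors' pipeline: the function names, the tiny stop-word list, and the sample sentences are all hypothetical, and stemming is deliberately left out, in line with the abstract's finding for Turkish.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stop-word list for illustration only;
# the list actually used in the paper is not given in the abstract.
STOP_WORDS = {"ve", "bir", "bu", "ile", "için"}


def tokenize(text):
    """Lowercase the text and split it on runs of non-word characters."""
    # Python's default lower() maps "I" to "i"; Turkish expects dotless "ı",
    # so the two Turkish-specific capitals are mapped by hand first.
    text = text.replace("İ", "i").replace("I", "ı").lower()
    return [tok for tok in re.split(r"[^\w]+", text) if tok]


def preprocess(text):
    """Tokenize and drop stop words; no stemming is applied."""
    return [tok for tok in tokenize(text) if tok not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses the plain textbook form tf * log(N / df), without smoothing.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    raw = ["Bu bir spor haberi ve maç sonucu", "Ekonomi haberi ve borsa"]
    docs = [preprocess(text) for text in raw]
    for doc, w in zip(docs, tf_idf(docs)):
        print(doc, w)
```

A term that occurs in every document (here "haberi") receives a weight of zero under this unsmoothed formula, which is the usual reason library implementations add smoothing. Word2Vec, the alternative the abstract reports as performing better, learns dense vectors from context rather than counting terms and is not sketched here.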

Keywords

Convolutional Neural Networks, Long Short-Term Memory, Natural Language Processing, Text Classification, Turkish Language

Source

2021 International Conference on INnovations in Intelligent SysTems and Applications, INISTA 2021 - Proceedings

Citation

Yayla, M., Diyar Demirkol, M., Alqaraleh, S. (2021). CNN vs. LSTM for Turkish text classification. 2021 International Conference on INnovations in Intelligent SysTems and Applications, INISTA 2021 - Proceedings: Code 172175.
