CNN vs. LSTM for Turkish text classification
Abstract
This paper investigates how well two state-of-the-art text classification techniques, Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM), support Turkish text classification. It also studies the effect of the main preprocessing steps, such as tokenization, stop-word elimination, and stemming. Several experiments were performed on the "TTC-3600" dataset, and both CNN and LSTM were observed to handle Turkish well and to achieve quite good performance. Regarding data preprocessing, the results indicate that preprocessing improves performance; for Turkish, however, it is preferable to exclude stemming. In addition, a comparison of feature extraction techniques for Turkish shows that Word2Vec outperforms TF-IDF. © 2021 IEEE.
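To make the preprocessing and TF-IDF terms in the abstract concrete, the following is a minimal sketch of tokenization, stop-word elimination (with stemming left out, in line with the reported finding), and a textbook TF-IDF weighting. It is not the authors' pipeline. The function names, the five-word stop-word list, and the plain `tf * log(N / df)` formula are all assumptions made for illustration, and the paper may have used different tools and settings.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stop-word list for illustration only (an assumption,
# not the list used in the paper).
STOP_WORDS = {"ve", "bir", "bu", "için", "ile"}


def tokenize(text):
    """Lowercase with Turkish casing rules and split into word tokens."""
    # Map dotted and dotless capital I explicitly, because str.lower()
    # follows the default (non-Turkish) case mapping for these letters.
    text = text.replace("İ", "i").replace("I", "ı").lower()
    # For str patterns, \w matches Unicode letters such as ç, ğ, ı, ö, ş, ü.
    return re.findall(r"\w+", text)


def preprocess(text):
    """Tokenize, then drop stop words. No stemming step is applied."""
    return [token for token in tokenize(text) if token not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Uses relative term frequency times log(N / df), a common textbook
    variant chosen here as an assumption.
    """
    tokenized = [preprocess(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights
```

Calling `tf_idf` on a small list of Turkish sentences returns one dictionary of term weights per sentence; terms that occur in every document receive a weight of zero under this variant.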










