BERT Based Topic-Specific Crawler
Date
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Institute of Electrical and Electronics Engineers Inc.
Access Rights
info:eu-repo/semantics/openAccess
Abstract
Retrieving information through search engines is one of the main applications of the Internet. To speed up retrieval of the required information (web pages), a topic-specific crawler is essential so that only the relevant pages are fetched and indexed. This paper presents a multi-threaded web crawler that uses Sentence Bidirectional Encoder Representations from Transformers (S-BERT). S-BERT is used to calculate the similarity between the predefined classes and the text of the downloaded web pages. This provides a lightweight model compared with using word embeddings with deep learning for text classification. © 2021 IEEE.
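The similarity step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the topic classes, the threshold, and all function names are hypothetical, and a toy bag-of-words vector stands in for the S-BERT sentence embedding so that the example runs without downloading a model. What it shows is the general shape of the approach: embed each predefined class once, embed each downloaded page, keep the page only if its best cosine similarity clears a threshold, and score pages in several threads.

```python
import math
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical topic classes; the abstract does not list the real ones.
TOPICS = {
    "sports": "football match team player league score goal",
    "finance": "stock market bank investment interest rate currency",
}


def embed(text):
    """Toy bag-of-words vector (assumption: stand-in for an S-BERT embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Class vectors are computed once and reused for every page.
TOPIC_VECTORS = {name: embed(desc) for name, desc in TOPICS.items()}


def classify(page_text, threshold=0.2):
    """Return (topic, score); topic is None when the page is off-topic."""
    page_vec = embed(page_text)
    scored = ((name, cosine(page_vec, vec)) for name, vec in TOPIC_VECTORS.items())
    best, score = max(scored, key=lambda pair: pair[1])
    return (best if score >= threshold else None, score)


def classify_pages(pages, workers=4):
    """Score already-downloaded page texts using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(classify, pages))


if __name__ == "__main__":
    pages = [
        "The team scored a late goal to win the football match",
        "Weather today is sunny",
    ]
    for text, (topic, score) in zip(pages, classify_pages(pages)):
        print(f"{topic!s:8} {score:.2f}  {text}")
```

In an actual crawler, `embed` would presumably be replaced by a sentence-embedding model (for example, the `encode` method of a `SentenceTransformer` from the sentence-transformers library, which is an assumption here rather than something the abstract states), and pages rejected by `classify` would be neither indexed nor used to extend the crawl frontier.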
Description
Keywords
document classification, search engine, text categorization, text classification, topic-specific crawler, web crawler
Source
Proceedings - 2021 Innovations in Intelligent Systems and Applications Conference, ASYU 2021
WoS Q Value
Scopus Q Value
Volume
Issue
Citation
Tawil, Y., Alqaraleh, S. (2021). BERT Based Topic-Specific Crawler. Proceedings - 2021 Innovations in Intelligent Systems and Applications Conference, ASYU 2021: Code 174400.










