BERT Based Topic-Specific Crawler

Publisher

Institute of Electrical and Electronics Engineers Inc.

Access Rights

info:eu-repo/semantics/openAccess

Abstract

Retrieving information through search engines is one of the main applications of the Internet. To speed up retrieval of the required information (web pages), a topic-specific crawler is essential so that only the relevant pages are fetched and indexed. This paper presents a multi-threaded web crawler that uses Sentence Bidirectional Encoder Representations from Transformers (S-BERT). S-BERT is used to calculate the similarity between the predefined classes and the text of the downloaded web pages. This provides a lightweight model compared with using word embeddings with deep learning for text classification. © 2021 IEEE.
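
The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record gives no code, so the function names (embed, cosine, best_topic, crawl), the 0.2 threshold, and the fetch callable are all hypothetical. To stay self-contained, the sketch replaces the S-BERT encoder with a toy bag-of-words vector; only the overall flow (fetch pages in threads, score each page against the predefined classes, keep the relevant ones) mirrors the description.

```python
import math
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def embed(text):
    """Toy stand-in for an S-BERT encoder (assumption): a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_topic(page_text, topics):
    """Return (class_name, score) for the class description most similar to the page."""
    page_vec = embed(page_text)
    scores = {name: cosine(page_vec, embed(desc)) for name, desc in topics.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


def crawl(urls, fetch, topics, threshold=0.2, workers=4):
    """Fetch pages in parallel threads; keep those whose best score meets the threshold.

    `fetch` is a hypothetical callable mapping a URL to page text.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        texts = list(pool.map(fetch, urls))
    kept = {}
    for url, text in zip(urls, texts):
        name, score = best_topic(text, topics)
        if score >= threshold:
            kept[url] = name
    return kept
```

In a setup closer to the paper, embed would presumably be replaced by a real sentence encoder and the class descriptions would be encoded once up front; the threshold would then need tuning for that encoder's similarity scale.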

Keywords

document classification, search engine, text categorization, text classification, topic-specific crawler, web crawler

Source

Proceedings - 2021 Innovations in Intelligent Systems and Applications Conference, ASYU 2021

Citation

Tawil, Y., Alqaraleh, S. (2021). BERT Based Topic-Specific Crawler. Proceedings - 2021 Innovations in Intelligent Systems and Applications Conference, ASYU 2021: Code 174400.
