BERT Based Topic-Specific Crawler
Citation
Tawil, Y., Alqaraleh, S. (2021). BERT Based Topic-Specific Crawler. Proceedings - 2021 Innovations in Intelligent Systems and Applications Conference, ASYU 2021: Code 174400.
Abstract
Nowadays, retrieving specific information through search engines is one of the most popular applications of the Internet. To speed up retrieval of the required information (web pages), a topic-specific crawler is essential, since it fetches and indexes only the relevant pages. This paper presents a multi-threaded web crawler that uses Sentence Bidirectional Encoder Representations from Transformers (S-BERT). S-BERT is used to calculate the similarity between the predefined classes and the text of the downloaded web pages. This yields a more lightweight model than text classification that combines word embeddings with deep learning. © 2021 IEEE.
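The relevance test described in the abstract (embed each predefined class and each downloaded page, then compare them by similarity) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_embed` is a hypothetical bag-of-words stand-in for an S-BERT encoder so the sketch runs without downloading a model, and the class descriptions, the 0.3 threshold, and the helper names `classify_page` and `filter_relevant` are all illustrative. The thread pool only parallelises classification of pages that have already been fetched; downloading is omitted.

```python
import math
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Mapping, Optional, Tuple

# Sparse vector: token -> weight.
Embedding = Mapping[str, float]


def toy_embed(text: str) -> Embedding:
    """Hypothetical stand-in for an S-BERT encoder: a bag-of-words count vector.

    A real S-BERT model returns a dense sentence embedding; this toy version
    only keeps the sketch self-contained and runnable.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Embedding, b: Embedding) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_page(
    page_text: str,
    class_descriptions: Dict[str, str],
    embed: Callable[[str], Embedding] = toy_embed,
    threshold: float = 0.3,  # assumed cut-off; the abstract gives no value
) -> Tuple[Optional[str], float]:
    """Return (best_class, best_score); best_class is None for off-topic pages."""
    page_vec = embed(page_text)
    best_class: Optional[str] = None
    best_score = 0.0
    for label, description in class_descriptions.items():
        score = cosine(page_vec, embed(description))
        if score > best_score:
            best_class, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_class, best_score


def filter_relevant(
    pages: Dict[str, str],
    class_descriptions: Dict[str, str],
    max_workers: int = 4,
    threshold: float = 0.3,
) -> Dict[str, str]:
    """Classify already-fetched pages in a thread pool; keep only on-topic URLs."""
    urls = list(pages)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(
            pool.map(
                lambda url: classify_page(pages[url], class_descriptions, threshold=threshold),
                urls,
            )
        )
    return {url: label for url, (label, _score) in zip(urls, results) if label is not None}
```

For example, with hypothetical classes `{"sports": "football match goal team league", "finance": "stock market shares bank interest"}`, a page reading "The team scored a late goal in the league match" is labelled `"sports"`, while "recipe for baking bread" shares no tokens with either class and is discarded. To move closer to the S-BERT setting, one could swap `toy_embed` for a `sentence_transformers.SentenceTransformer` model's `encode` method and `cosine` for a dense-vector similarity such as `sentence_transformers.util.cos_sim`; the choice of model and threshold would still be assumptions.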