BERT Based Topic-Specific Crawler
| dc.contributor.author | Tawil, Yahya | |
| dc.contributor.author | Alqaraleh, Saed | |
| dc.contributor.institutionauthor | Tawil, Yahya | |
| dc.contributor.institutionauthor | Alqaraleh, Saed | |
| dc.date.accessioned | 2023-03-13T05:54:14Z | |
| dc.date.available | 2023-03-13T05:54:14Z | |
| dc.date.issued | 2021 | en_US |
| dc.department | HKÜ, Faculty of Engineering, Department of Computer Engineering | en_US |
| dc.description.abstract | Retrieving information through search engines is very popular and is one of the main applications of the Internet. To speed up retrieval of the required information (web pages), a topic-specific crawler is essential so that only the relevant pages are fetched and indexed. This paper presents a multithreaded web crawler that uses Sentence Bidirectional Encoder Representations from Transformers (S-BERT). S-BERT is used to calculate the similarity between the predefined classes and the text of the downloaded web pages. This provides a lightweight model compared to using word embeddings with deep learning for text classification. © 2021 IEEE. | en_US |
| dc.identifier.citation | Tawil, Y., Alqaraleh, S. (2021). BERT Based Topic-Specific Crawler. Proceedings - 2021 Innovations in Intelligent Systems and Applications Conference, ASYU 2021: Code 174400. | en_US |
| dc.identifier.doi | 10.1109/ASYU52992.2021.9599076 | |
| dc.identifier.isbn | 978-166543405-8 | |
| dc.identifier.orcid | 0000-0003-0321-0866 | en_US |
| dc.identifier.orcid | 0000-0002-7146-3905 | en_US |
| dc.identifier.scopus | 2-s2.0-85123208764 | |
| dc.identifier.scopusquality | N/A | |
| dc.identifier.uri | https://hdl.handle.net/20.500.11782/3119 | |
| dc.indekslendigikaynak | Scopus | |
| dc.language.iso | en | |
| dc.publisher | Institute of Electrical and Electronics Engineers Inc. | en_US |
| dc.relation.ispartof | Proceedings - 2021 Innovations in Intelligent Systems and Applications Conference, ASYU 2021 | |
| dc.relation.publicationcategory | Conference Item - International - Institutional Faculty Member | en_US |
| dc.rights | info:eu-repo/semantics/openAccess | en_US |
| dc.subject | document classification | en_US |
| dc.subject | search engine | en_US |
| dc.subject | text categorization | en_US |
| dc.subject | text classification | en_US |
| dc.subject | topic-specific crawler | en_US |
| dc.subject | web crawler | en_US |
| dc.title | BERT Based Topic-Specific Crawler | |
| dc.type | Conference Object | |
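
The abstract describes scoring downloaded pages against predefined classes by embedding similarity, with the work spread across threads. The following is only a minimal sketch of that idea, not the authors' implementation: the topic descriptions, the threshold value, the example URLs, and the function names (`embed`, `classify`, `classify_pages`) are all hypothetical, and a toy bag-of-words vector stands in for the S-BERT sentence embedding so that the example runs without any model download.

```python
import math
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical class descriptions; the record does not list the paper's classes.
TOPICS = {
    "sports": "football match team player score league",
    "technology": "software computer programming network data",
}


def embed(text):
    """Toy bag-of-words vector. Assumption: stands in for an S-BERT embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Class vectors are computed once and shared read-only by all worker threads.
TOPIC_VECTORS = {name: embed(desc) for name, desc in TOPICS.items()}


def classify(page_text, threshold=0.2):
    """Return (topic, score); topic is None when no class reaches the threshold."""
    vec = embed(page_text)
    scored = ((name, cosine(vec, tv)) for name, tv in TOPIC_VECTORS.items())
    name, score = max(scored, key=lambda pair: pair[1])
    return (name if score >= threshold else None, score)


def classify_pages(pages, workers=4):
    """Score already-downloaded pages in parallel; pages maps URL -> page text."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(classify, pages.values())
        return dict(zip(pages.keys(), results))


# Hypothetical in-memory pages; a real crawler would fetch these over HTTP.
PAGES = {
    "https://example.com/a": "The team won the football match with a late score.",
    "https://example.com/b": "A recipe for lentil soup with lemon.",
}

if __name__ == "__main__":
    for url, (topic, score) in classify_pages(PAGES).items():
        print(url, topic, round(score, 3))
```

In this sketch a page whose best score falls below the threshold is labeled `None` and would be skipped rather than indexed. Replacing `embed` with a real sentence-embedding model would change only that one function; the similarity and threading logic would stay the same.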
Files
Original bundle
- Name: makale - yayıncı sürümü70.pdf
- Size: 334.91 KB
- Format: Adobe Portable Document Format
- Description: Article File
License bundle
- Name: license.txt
- Size: 1.44 KB
- Format: Item-specific license agreed upon to submission