Open-set 3D semantic instance maps for vision language navigation–O3D-SIM

| Field | Value | Language |
|---|---|---|
| dc.contributor.author | Nanwani, Laksh | |
| dc.contributor.author | Gupta, Kumaraditya | |
| dc.contributor.author | Mathur, Aditya | |
| dc.contributor.author | Agrawal, Swayam | |
| dc.contributor.author | Hafez, A. H. Abdul | |
| dc.contributor.author | Krishna, K. Madhava | |
| dc.date.accessioned | 2024-09-18T07:18:55Z | |
| dc.date.available | 2024-09-18T07:18:55Z | |
| dc.date.issued | 2024 | en_US |
| dc.department | HKÜ, Faculty of Engineering, Department of Computer Engineering | en_US |
| dc.description.abstract | Humans excel at forming mental maps of their surroundings, equipping them to understand object relationships and navigate based on language queries. Our previous work SI Maps (Nanwani L, Agarwal A, Jain K, et al. Instance-level semantic maps for vision language navigation. In: 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN). IEEE; 2023 Aug.) showed that having instance-level information and the semantic understanding of an environment helps significantly improve performance for language-guided tasks. We extend this instance-level approach to 3D while increasing the pipeline's robustness and improving quantitative and qualitative results. Our method leverages foundational models for object recognition, image segmentation, and feature extraction. We propose a representation that results in a 3D point cloud map with instance-level embeddings, which bring in the semantic understanding that natural language commands can query. Quantitatively, the work improves upon the success rate of language-guided tasks. At the same time, we qualitatively observe the ability to identify instances more clearly and leverage the foundational models and language and image-aligned embeddings to identify objects that, otherwise, a closed-set approach wouldn't be able to identify. Project Page - https://smart-wheelchair-rrc.github.io/o3d-sim-webpage. © 2024 The Robotics Society of Japan. | en_US |
| dc.identifier.citation | Nanwani L., Gupta K., Mathur A., Agrawal S., Hafez A.H.A. & Krishna K.M. (2024). Open-set 3D semantic instance maps for vision language navigation–O3D-SIM. Advanced Robotics. https://doi.org/10.1080/01691864.2024.2395926. | en_US |
| dc.identifier.doi | 10.1080/01691864.2024.2395926 | |
| dc.identifier.issn | 01691864 | |
| dc.identifier.orcid | 0000-0002-1908-5521 | en_US |
| dc.identifier.scopus | 2-s2.0-85202594595 | |
| dc.identifier.scopusquality | Q2 | |
| dc.identifier.uri | https://doi.org/10.1080/01691864.2024.2395926 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.11782/4443 | |
| dc.identifier.wos | WOS:001303480500001 | |
| dc.identifier.wosquality | Q3 | |
| dc.indekslendigikaynak | Web of Science | |
| dc.indekslendigikaynak | Scopus | |
| dc.language.iso | en | |
| dc.publisher | Robotics Society of Japan | en_US |
| dc.relation.ispartof | Advanced Robotics | |
| dc.relation.publicationcategory | Article - International Peer-Reviewed Journal - Institutional Faculty Member | en_US |
| dc.rights | info:eu-repo/semantics/openAccess | en_US |
| dc.subject | 3D scene understanding | en_US |
| dc.subject | language guidance | en_US |
| dc.subject | LLMs | en_US |
| dc.subject | Open-vocabulary | en_US |
| dc.subject | robotic perception | en_US |
| dc.title | Open-set 3D semantic instance maps for vision language navigation–O3D-SIM | |
| dc.type | Article | |

Files

Original bundle

Name: 1010800169186420242395926.pdf
Size: 4.29 MB
Format: Adobe Portable Document Format
Description: Article file

License bundle

Name: license.txt
Size: 1.44 KB
Format: Item-specific license agreed upon to submission