Prompt-guided DETR with RoI-pruned masked attention for open-vocabulary object detection

Prompt-OVD is an efficient and effective DETR-based framework for open-vocabulary object detection that uses class embeddings from CLIP as prompts, guiding the Transformer decoder to detect objects in both base and novel classes. In addition, our RoI-pruned masked attention helps leverage the zero-shot classification ability of the Vision Transformer-based CLIP, improving detection performance at minimal computational cost. Our experiments on the OV-COCO and OV-LVIS datasets show that Prompt-OVD achieves 21.2 times faster inference than the first end-to-end open-vocabulary detection method (OV-DETR), while also achieving higher AP than four two-stage methods operating within similar inference-time ranges. We release the code at https://github.com/DISL-Lab/Prompt-OVD.
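The abstract does not spell out how RoI-pruned masked attention works internally. The sketch below shows only the general idea, under stated assumptions, and is not the authors' implementation (that is in the released repository). It assumes a square image split into a ViT patch grid, boxes given as `(x1, y1, x2, y2)` in pixels, and that a patch is "inside" an RoI when its center lies in the box. The CLIP `[CLS]` query then attends only to the surviving patches, so patches outside the RoI are never scored. The helpers `roi_patch_indices` and `masked_cls_attention` are hypothetical names, and the code uses a single head and no batching.

```python
import math


def roi_patch_indices(box, image_size, patch_size):
    """Hypothetical helper: flat indices of patches whose centers fall inside box.

    Assumes a square image of image_size pixels and box = (x1, y1, x2, y2) in pixels.
    If the box is too small to contain any patch center, falls back to the
    single patch that contains the box center.
    """
    x1, y1, x2, y2 = box
    grid = image_size // patch_size
    kept = []
    for row in range(grid):
        for col in range(grid):
            cx = (col + 0.5) * patch_size  # patch center, x
            cy = (row + 0.5) * patch_size  # patch center, y
            if x1 <= cx <= x2 and y1 <= cy <= y2:
                kept.append(row * grid + col)
    if not kept:
        # Fallback: keep the patch under the box center, clamped to the grid.
        col = min(grid - 1, max(0, int(((x1 + x2) / 2) // patch_size)))
        row = min(grid - 1, max(0, int(((y1 + y2) / 2) // patch_size)))
        kept.append(row * grid + col)
    return kept


def masked_cls_attention(cls_query, patch_keys, patch_values, kept):
    """Hypothetical single-head attention of the [CLS] query over kept patches only.

    Pruning the patches outside the RoI is equivalent to giving them a
    -inf attention logit, but their scores are never computed at all.
    """
    scale = 1.0 / math.sqrt(len(cls_query))
    scores = [sum(q * k for q, k in zip(cls_query, patch_keys[i])) * scale for i in kept]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    out = [0.0] * len(patch_values[0])
    for w, i in zip(weights, kept):
        for j, v in enumerate(patch_values[i]):
            out[j] += (w / total) * v
    return out


if __name__ == "__main__":
    # 224-pixel image with 32-pixel patches gives a 7 x 7 grid (49 patches).
    kept = roi_patch_indices((0, 0, 64, 64), image_size=224, patch_size=32)
    keys = [[0.0, 0.0] for _ in range(49)]      # equal scores, so uniform weights
    values = [[float(i)] for i in range(49)]    # value = patch index
    print(kept, masked_cls_attention([1.0, 1.0], keys, values, kept))
```

With a 64 x 64 box in the top-left corner, only patches 0, 1, 7 and 8 survive. Because every key is zero here, the `[CLS]` output is their plain average, and patches outside the box contribute nothing. Restricting the token set per RoI, rather than cropping and re-encoding each region, is one plausible way such a design keeps the extra cost small. This is an assumption for illustration, not a claim about the paper's exact mechanism.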
Publisher: ELSEVIER SCI LTD
Issue Date: 2024-11
Language: English
Article Type: Article
Citation: PATTERN RECOGNITION, v.155
ISSN: 0031-3203
DOI: 10.1016/j.patcog.2024.110648
URI: http://hdl.handle.net/10203/322447
Appears in Collection: IE-Journal Papers