Publications

(2023). FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection.

(2022). Phrase-level Prediction for Video Temporal Localization. Proceedings of the 2022 International Conference on Multimedia Retrieval.