[CVPR 2025] Context-Aware Multimodal Pretraining

Paper: https://arxiv.org/abs/2411.15099 Authors: Karsten Roth, Zeynep Akata, Dima Damen, Ivana Balažević*, Olivier J. Hénaff* ...

PromptHMR | CVPR 2025 | Meshcapade

Next in our #CVPR2025 lineup: PromptHMR ✨ Drop a video and watch it blossom into crisp 3D people, even when limbs are ...

[CVPR 2025] LongVALE: Vision-Audio-Language-Event Benchmark

We propose LongVALE, the first time-

CVPR 2025: AIpparel: A Multimodal Foundation Model for Digital Garments

CVPR 2025

PersonaBooth (CVPR 2025)

PersonaBooth: Personalized Text-to-Motion Generation (

How to nitpick multimodal AI evaluations (CVPR 2025 Tutorial Excerpt)

My part of the

[CVPR 2025] Question-Aware Gaussian Experts for Audio-Visual Question Answering (Highlight)

Project Page: https://aim-skku.github.io/QA-TIGER/ Abstract: Audio-Visual Question Answering (AVQA) requires not only ...

CVPR 2025: How to Merge Your Multimodal Models Over Time?

Paper: https://arxiv.org/abs/2412.06712 Code: https://github.com/ExplainableML/fomo_in_flux

CVPR 2025 Highlights: AI, Computer Vision, and What’s Next

Experience

[CVPR 2025] Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models

Virtual presentation of our recent work "Towards Zero-Shot Anomaly Detection and Reasoning with

[CVPR 2025] HuMoCon: Concept Discovery for Human Motion Understanding

This is the official video of the

[CVPR 2025] SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model

We introduce SeqAfford, a

HyperDUM CVPR 2025 presentation

Abstract: Uncertainty Quantification (UQ) is crucial for ensuring the reliability of machine learning models deployed in real-world ...

CVPR 2025: Compositional Caching for Training-free Open-vocabulary Attribute Detection

CVPR 2025

[CVPR 2026]

Disentangle-then-Align: Non-Iterative Hybrid

[CVPR 2026] Boosting Reasoning in Large Multimodal Models via Activation Replay

A brief introduction to our paper. See https://arxiv.org/abs/2511.19972 for more.

[CVPR 2025] FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in VQA

Visual question answering (VQA) systems face significant challenges when adapting to real-world data shifts, especially in ...