1.
Self-Prior Guided Pixel Adversarial Networks for Blind Image Inpai..
[559]
|
2.
Set Prediction Guided by Semantic Concepts for Diverse Video Capti..
[471]
|
3.
Cross-Architecture Knowledge Distillation
[274]
|
4.
Improving metric-based few-shot learning with dynamically scaled s..
[262]
|
5.
Learning Semantics-Grounded Vocabulary Representation for Video-Te..
[258]
|
6.
Multi-scale self-attention-based feature enhancement for detection..
[256]
|
7.
Hierarchical Curriculum Learning for No-Reference Image Quality As..
[254]
|
8.
Dynamic adjustment of hyperparameters for anchor-based detection o..
[240]
|
9.
CSC-Unet: A Novel Convolutional Sparse Coding Strategy Based Neura..
[233]
|
10.
Consistent4D: Consistent 360° Dynamic Object Generation from Mono..
[228]
|
11.
RSI-Net: Two-Stream Deep Neural Network for Remote Sensing Images-..
[224]
|
12.
PolarFormer: Multi-Camera 3D Object Detection with Polar Transform..
[215]
|
13.
Learn from Noise: Detecting Deepfakes via Regional Noise Consisten..
[215]
|
14.
An Experimental Study on Exploring Strong Lightweight Vision Trans..
[214]
|
15.
A Closer Look at Self-Supervised Lightweight Vision Transformers
[210]
|
16.
UniGen: Unified Generative Pre-training for Multilingual Multimoda..
[197]
|
17.
Chinese Title Generation for Short Videos: Dataset, Metric and Alg..
[195]
|
18.
ZoomTrack: Target-aware Non-uniform Resizing for Efficient Visual ..
[190]
|
19.
Temporal Correlation Meets Embedding: Towards a 2nd Generation of ..
[190]
|
20.
Semantics-enhanced Cross-modal Masked Image Modeling for Vision-La..
[178]
|
21.
MobileIQA: Exploiting Mobile-level Diverse Opinion Network For No-..
[169]
|
22.
A Closer Look at Self-Supervised Lightweight Vision Transformers
[161]
|
23.
BEV2PR: BEV-Enhanced Visual Place Recognition with Stru..
[161]
|
24.
Unifying Latent and Lexicon Representations for Effective Video-Te..
[158]
|
25.
A-Teacher: Asymmetric Network for 3D Semi-Supervised Object Detect..
[156]
|
26.
MIBench: Evaluating Multimodal Large Language Models over Multiple..
[152]
|
27.
How to Make Cross Encoder a Good Teacher for Efficient Image-Text ..
[152]
|
28.
One-Stage Anchor-Free Online Multiple Target Tracking With Deforma..
[149]
|
29.
NFT1000: A Cross-Modal Dataset for Non-Fungible Token Retrieval
[148]
|
30.
Exploiting Contextual Objects and Relations for 3D Visual Groundin..
[141]
|
31.
PromptIQA: Boosting the Performance and Generalization for No-Refe..
[140]
|
32.
GMC-IQA: Exploiting Global-correlation and Mean-opinion Consistenc..
[128]
|
33.
BEV2PR: BEV-Enhanced Visual Place Recognition with Structural Cues
[121]
|
34.
SEAGULL: No-reference Image Quality Assessment for Regions of Inte..
[107]
|
35.
How to Make Cross Encoder a Good Teacher for Efficient Image-Text ..
[103]
|
36.
iESTA: Instance-Enhanced Spatial-Temporal Alignment for Video Copy..
[94]
|
37.
HSTrack: Bootstrap End-to-End Multi-Camera 3D Multi-object Trackin..
[90]
|
38.
VQ-Map: Bird's-Eye-View Map Layout Estimation in Tokenized Discret..
[76]
|
39.
Development of Infant Brain Functional Connectome Gradient during ..
[68]
|
40.
Efficient Transformer tracking on edge devices with linearly decomposed attention
[48]
|
41.
EA-VTR: Event-Aware Video-Text Retrieval
[46]
|
42.
Content-decoupled Contrastive Learning-based Implicit Degradation ..
[28]
|
43.
Task-aware Attentional Dynamic Alignment for Few-Shot Compressed V..
[23]
|
44.
Two-stream transformer tracking with messengers
[7]
|
45.
PromptIQA: Boosting the Performance and Generalization for No-R..
[2]
|