Improving Robustness of Semantic Segmentation to Motion-Blur Using Class-Centric Augmentation
3DAvatarGAN Bridging Domains for Personalized Editable Avatars
Topology-Guided Multi-Class Cell Context Generation for Digital Pathology
Affection Learning Affective Explanations for Real-World Visual Dat
ShapeTalk A Language Dataset and Framework for 3D Shape Edits
Canonical Fields Self-Supervised Learning of Pose-Canonicalized Neural Fields
Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving
Multi-Realism Image Compression With a Conditional Generato
Interactive Cartoonization With Controllable Perceptual Factors
LINe Out-of-Distribution Detection by Leveraging Important Neurons
Neural Kaleidoscopic Space Sculpting
Neural Rate Estimator and Unsupervised Learning for Efficient Distributed Imag
Balanced Product of Calibrated Experts for Long-Tailed Recognition
HRDFuse Monocular 360° Depth Estimation by Collaboratively Learning Holistic-With-Regional Depth
MetaCLUE Towards Comprehensive Visual Metaphors Research
Hierarchical B-Frame Video Coding Using Two-Layer CANF Without Motion Coding
Look Radiate and Learn Self-Supervised Localisation via Radio-Visual Correspondenc
Is BERT Blind Exploring the Effect of Vision-and-Language Pretraining on
DC2 Dual-Camera Defocus Control by Learning To Refocus
RenderDiffusion Image Diffusion for 3D Reconstruction Inpainting and Generation
RangeViT Towards Vision Transformers for 3D Semantic Segmentation in Autonomous
PanoHead Geometry-Aware 3D Full-Head Synthesis in 360°
ZBS Zero-Shot Background Subtraction via Instance-Level Background Modeling and Foregroun
Deep Curvilinear Editing Commutative and Nonlinear Image Manipulation for Pretrain
BUFFER Balancing Accuracy Efficiency and Generalizability in Point Cloud Registration
CIRCLE Capture in Rich Contextual Environments
Ham2Pose Animating Sign Language Notation Into Pose Sequences
Learning Geometric-Aware Properties in 2D Representation Using Lightweight CAD Models
HierVL Learning Hierarchical Video-Language Embeddings
MaLP Manipulation Localization Using a Proactive Schem
Spider GAN Leveraging Friendly Neighbors To Accelerate GAN Training
Self-Supervised Learning From Images With a Joint-Embedding Predictive Architectu
TarViS A Unified Approach for Target-Based Video Segmentation
Generalizable Local Feature Pre-Training for Deformable Shape Analysis
Understanding and Improving Features Learned in Deep Functional Maps
HyperReel High-Fidelity 6-DoF Video With Ray-Conditioned Sampling
SpaText Spatio-Textual Representation for Controllable Image Generation
TempSAL - Uncovering Temporal Information for Deep Saliency Prediction
High-Res Facial Appearance Capture From Polarized Smartphone Images
A New Dataset Based on Images Taken by Blind Peopl
Test of Time Instilling Video-Language Models With a Sense o
Affordances From Human Videos as a Versatile Representation for Robotics
AUNet Learning Relations Between Action Units for Face Forgery Detection
Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation
FFHQ-UV Normalized Facial UV-Texture Dataset for 3D Face Reconstruction
GLeaD Improving GANs With a Generator-Leading Task
High-Fidelity Facial Avatar Reconstruction From Monocular Video With Generative Priors
Learning Personalized High Quality Volumetric Head Avatars From Monocular RGB
Masked Autoencoders Enable Efficient Knowledge Distillers
Sliced Optimal Partial Transport
Bayesian Posterior Approximation With Stochastic Ensembles
Learning Visual Representations via Language-Guided Sampling
AdaMAE Adaptive Masking for Efficient Spatiotemporal Learning With Masked Autoencoders
DualRefine Self-Supervised Depth and Pose Estimation Through Iterative Epipolar Sampling
Learning To Exploit Temporal Structure for Biomedical Vision-Language Processing
Neural Pixel Composition for 3D-4D View Synthesis From Multi-Views
All Are Worth Words A ViT Backbone for Diffusion Models
CiCo Domain-Aware Sign Language Retrieval via Cross-Lingual Contrastive Learning
DexArt Benchmarking Generalizable Dexterous Manipulation With Articulated Objects
Object Discovery From Motion-Guided Tokens
SINE Semantic-Driven Image-Based NeRF Editing With Prior-Guided Editing Fiel
A Large-Scale Homography Benchmark
Finding Geometric Models by Clustering in the Consensus Spac
Attribute-Preserving Face Dataset Anonymization via Latent Code Optimization
Two-View Geometry Scoring Without Correspondences
Pseudo-Label Guided Contrastive Learning for Semi-Supervised Medical Image Segmentation
MaskSketch Unpaired Structure-Guided Masked Image Generation
RMLVQA A Margin Loss Approach for Visual Question Answering With
Galactic Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-per-Secon
Kernel Aware Resampl
Blowing in the Wind CycleNet for Human Cinemagraphs From Still
FlexiViT One Model for All Patch Sizes
A Light Touch Approach to Teaching Transformers Multi-View Geometry
CCuantuMM Cycle-Consistent Quantum-Hybrid Matching of Multiple Shapes
Person Image Synthesis via Denoising Diffusion Model
Sketch2Saliency Learning To Detect Salient Objects From Human Drawings
NoPe-NeRF Optimising Neural Radiance Field With No Pose Prio
Shortcomings of Top-Down Randomization-Based Sanity Checks for Evaluations of D
Probabilistic Debiasing of Scene Graphs
BEDLAM A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animat
Align Your Latents High-Resolution Video Synthesis With Latent Diffusion Models
Architectural Backdoors in Neural Networks
Meta Omnium A Benchmark for General-Purpose Learning-To-Learn
Neural Part Priors Learning To Optimize Part-Based Object Completion in
Instant Multi-View Head Capture Through Learnable Registration
DejaVu Conditional Regenerative Learning To Enhance Dense Prediction
Open-Set Likelihood Maximization for Few-Shot Learning
ALSO Automotive Lidar Self-Supervision by Occupancy Estimation
CR-FIQA Face Image Quality Assessment by Learning Sample Relative Classifiability
A-La-Carte Prompt Tuning APT Combining Distinct Data via Composable Prompting
Accelerated Coordinate Encoding Learning to Relocalize in Minutes Using RGB
A Probabilistic Framework for Lifelong Test-Time Adaptation
Open-Vocabulary Attribute Detection
Omni3D A Large Benchmark and Model for 3D Object Detection
InstructPix2Pix Learning To Follow Image Editing Instructions
Leveraging Inter-Rater Agreement for Classification in the Presence of Noisy
Learning and Aggregating Lane Graphs for Urban Automated Driving
LASP Text-to-Text Optimization for Language-Aware Soft Prompting of Vision
Rate Gradient Approximation Attack Threats Deep Spiking Neural Networks
Introducing Competition To Boost the Transferability of Targeted Adversarial Examples
Ensemble-Based Blackbox Attacks on Dense Prediction
MARLIN Masked Autoencoder for Facial Video Representation LearnINg
Multi-Centroid Task Descriptor for Dynamic Class Incremental Inferenc
NeuDA Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction
Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon
Orthogonal Annotation Benefits Barely-Supervised Medical Image Segmentation
RIAV-MVS Recurrent-Indexing an Asymmetric Volume for Multi-View Stereo
Source-Free Adaptive Gaze Estimation by Uncertainty Reduction
A New Comprehensive Benchmark for Semi-Supervised Video Anomaly Detection an
CiaoSR Continuous Implicit Attention-in-Attention Network for Arbitrary-Scale Image Super-Resolution
Contrastive Mean Teacher for Domain Adaptive Object Detectors
Event-Guided Person Re-Identification via Sparse-Dense Complementary Learning
HexPlane A Fast Representation for Dynamic Scenes
Iterative Proposal Refinement for Weakly-Supervised Video Grounding
Multi-View Azimuth Stereo via Tangent Space Consistency
Observation-Centric SORT Rethinking SORT for Robust Multi-Object Tracking
Physics-Guided ISO-Dependent Sensor Noise Modeling for Extreme Low-Light Photography
Real-Time Neural Light Field on Mobile Devices
Recurrent Homography Estimation Using Homography-Guided Image Warping and Focus Transform
Self-Supervised Learning for Multimodal Non-Rigid 3D Shape Matching
SeSDF Self-Evolved Signed Distance Field for Implicit 3D Clothed Human
SVGformer Representation Learning for Continuous Vector Graphics Using Transformers
Three Guidelines You Should Know for Universally Slimmable Self-Supervised Learning
Few-Shot Semantic Image Synthesis With Class Affinity Trans
Towards Better Decision Forests Forest Alternating Optimization
Generalizing Dataset Distillation via Deep Generative Prio
Enlarging Instance-Specific and Class-Specific Information for Open-Set Action Recognition
CoMFormer Continual Learning in Semantic and Panoptic Segmentation
Unifying Short and Long-Term Tracking With Graph Hierarchies
An Image Quality Assessment Dataset for Portraits
LayoutDM Transformer-Based Diffusion Model for Layout Generation
Persistent Nature A Generative Model of Unbounded 3D Worlds
Recognizability Embedding Enhancement for Very Low-Resolution Face Recognition and Quality
Seeing With Sound Long-range Acoustic Beamforming for Multimodal Scene Understanding
Continuous Landmark Detection With 3D Queries
1000 FPS HDR Video With a Spike-RGB Hybrid Cam
An Erudite Fine-Grained Visual Classification Model
Depth Estimation From Indoor Panoramas With Neural Scene Representation
Domain Generalized Stereo Matching via Hierarchical Visual Transformation
L-CoIns Language-Based Colorization With Instance Awareness
Making Vision Transformers Efficient From a Token Sparsification View
Pointersect Neural Rendering With Cloud-Ray Intersection
Histopathology Whole Slide Image Analysis With Heterogeneous Graph Representation Learning
Equivalent Transformation and Dual Stream Network Construction for Mobile Imag
AVFace Towards Detailed Audio-Visual 4D Face Reconstruction
Data-Free Sketch-Based Image Retrieval
Learning To Generate Text-Grounded Mask for Open-World Semantic Segmentation From
Rebalancing Batch Normalization for Exemplar-Based Class-Incremental Learning
Privacy-Preserving Representations Are Not Enough Recovering Scene Content From Cam
BoxTeacher Exploring High-Quality Pseudo Labels for Weakly Supervised Instance Segmentation
M6Doc A Large-Scale Multi-Format Multi-Type Multi-Layout Multi-Language Multi-Annotation Category Dataset
Out-of-Candidate Rectification for Weakly Supervised Semantic Segmentation
Panoptic Compositional Feature Field for Editable Scene Rendering With Network-In
SDFusion Multimodal 3D Shape Completion Reconstruction and Generation
VindLU A Recipe for Effective Video-and-Language Pretraining
WildLight In-the-Wild Inverse Rendering With a Flashlight
Activating More Pixels in Image Super-Resolution Transform
Affordance Grounding From Demonstration Video To Target Imag
AnchorFormer Point Cloud Completion From Discriminative Nodes
A Unified Knowledge Distillation Framework for Deep Directed Graphical Models
Better CMOS Produces Clearer Images Learning Space-Variant Blur Estimation fo
Beyond Appearance A Semantic Controllable Self-Supervised Learning Framework for Human-Centric
Boosting Semi-Supervised Learning by Exploiting All Unlabeled Dat
Boundary Unlearning Rapid Forgetting of Deep Networks via Shifting th
Cascaded Local Implicit Transformer for Arbitrary-Scale Super-Resolution
Cascade Evidential Learning for Open-World Weakly-Supervised Temporal Action Localization
CLIP2Scene Towards Label-Efficient 3D Scene Understanding by CLI
DAA A Delta Age AdaIN Operation for Age Estimation vi
DBARF Deep Bundle-Adjusting Generalizable Neural Radiance Fields
DeepMapping2 Self-Supervised Large-Scale LiDAR Map Optimization
Detecting Human-Object Contact in Images
DisCo-CLIP A Distributed Contrastive Loss for Memory Efficient CLIP Training
Divide and Conquer Answering Questions With Object Factorization and Compositional
DPF Learning Dense Prediction Fields With Weak Supervision
Effective Ambiguity Attack Against Passport-Based DNN Intellectual Property Protection Schemes
Elastic Aggregation for Federated Optimization
End-to-End 3D Dense Captioning With Vote2Cap-DET
Enhanced Multimodal Representation Learning With Cross-Modal KD
Enhanced Training of Query-Based Object Detection via Selective Query Recollection
Executing Your Commands via Motion Diffusion in Latent Spac
Extracting Class Activation Maps From Non-Discriminative Features As Well
FFF Fragment-Guided Flexible Fitting for Building Complete Protein Structures
From Node Interaction To Hop Interaction New Effective and Scalabl
Generative Semantic Segmentation
GM-NeRF Learning Generalizable Model-Based Neural Radiance Fields From Multi-View Images
gSDF Geometry-Driven Signed Distance Functions for 3D Hand-Object Reconstruction
Hand Avatar Free-Pose Hand Animation and Rendering From Monocular Video
HNeRV A Hybrid Neural Representation for Videos
Human Guided Ground-Truth Generation for Realistic Image Super-Resolution
Imitation Learning As State Matching via Differentiable Physics
Implicit Neural Head Synthesis via Controllable Local Deformation Fields
Improved Test-Time Adaptation for Domain Generalization
iQuery Instruments As Queries for Audio-Visual Sound Separation
LargeKernel3D Scaling Up Kernels in 3D Sparse CNNs
Learning a Deep Color Difference Metric for Photographic Images
Learning a Sparse Transformer Network for Effective Image Deraining
Learning From Unique Perspectives User-Aware Saliency Modeling
Learning the Distribution of Errors in Stereo Matching for Joint
Local-to-Global Registration for Bundle-Adjusting Neural Radiance Fields
MagicNet Semi-Supervised Multi-Organ Segmentation via Magic-Cube Partition and Recovery
MammalNet A Large-Scale Video Benchmark for Mammal Recognition and Behavio
Masked Image Training for Generalizable Deep Image Denoising
Meta-Causal Learning for Single Domain Generalization
Mixed Autoencoder for Self-Supervised Visual Representation Learning
MobileNeRF Exploiting the Polygon Rasterization Pipeline for Efficient Neural Fiel
Mod-Squad Designing Mixtures of Experts As Modular Multi-Task Learners
Movies2Scenes Using Movie Metadata To Learn Scene Representation
Multivariate Multi-Frequency and Multimodal Rethinking Graph Neural Networks for Emotion
NeuralEditor Editing Neural Radiance Fields via Manipulating Point Clouds
Novel-View Acoustic Synthesis
OvarNet Towards Open-Vocabulary Object Attribute Recognition
PAniC-3D Stylized Single-View 3D Reconstruction From Portraits of Anime Characters
PiMAE Point Cloud and Image Interactive Masked Autoencoders for 3D
Private Image Generation With Dual-Purpose Auxiliary Classifi
RankMix Data Augmentation for Weakly Supervised Learning of Classifying Whol
Revisiting Multimodal Representation in Contrastive Learning From Patch and Token
Run Don't Walk Chasing Higher FLOPS for Faster Neural Networks
ScaleDet A Scalable Multi-Dataset Object Detecto
Seeing Beyond the Brain Conditional Diffusion Model With Sparse Mask
SeqTrack Sequence to Sequence Learning for Visual Object Tracking
SparseViT Revisiting Activation Sparsity for Efficient High-Resolution Vision Transform
TexPose Neural Texture Learning for Self-Supervised 6D Object Pose Estimation
The Dark Side of Dynamic Routing Neural Networks Towards Efficiency
Towards Modality-Agnostic Person Re-Identification With Descriptive Query
Train-Once-for-All Personalization
Transfer Knowledge From Head to Tail Uncertainty Calibration Under Long-Tail
TrojDiff Trojan Attacks on Diffusion Models With Diverse Targets
Understanding and Improving Visual Prompting A Label-Mapping Perspectiv
Unsupervised Inference of Signed Distance Functions From Single Sparse Point
Unsupervised Sampling Promoting for Stochastic Human Trajectory Prediction
UV Volumes for Real-Time Rendering of Editable Free-View Human Performanc
ViewNet A Novel Projection-Based Backbone With View Pooling for Few-Shot
Viewpoint Equivariance for Multi-View 3D Object Detection
ViLEM Visual-Language Error Modeling for Image-Text Retrieval
VoxelNeXt Fully Sparse VoxelNet for 3D Object Detection and Tracking
Are Deep Neural Networks SMARTer Than Second Graders
Reproducible Scaling Laws for Contrastive Language-Image Learning
Image Quality-Aware Diagnosis via Meta-Knowledge Co-Embedding
Automatic High Resolution Wire Segmentation and Removal
AdamsFormer for Spatial Action Localization in the Futu
BEV-SAN Accurate BEV 3D Object Detection via Slice Attention Networks
HDR Imaging With Spatially Varying Signal-to-Noise Ratios
Adversarial Normalization I Can Visualize Everything ICE
Balanced Energy Regularization Loss for Out-of-Distribution Detection
Balanced Spherical Grid for Egocentric View Synthesis
Dynamic Neural Network for Multi-Task Learning Searching Across Diverse Network
Local-Guided Global Paired Similarity Representation for Visual Reinforcement Learning
MAIR Multi-View Attention Inverse Rendering With 3D Spatially-Varying Lighting Estimation
N-Gram in Swin Transformers for Efficient Lightweight Image Super-Resolution
Progressive Random Convolutions for Single Domain Generalization
Restoration of Hand-Drawn Architectural Drawings Using Latent Space Mapping With
TMO Textured Mesh Acquisition of Objects With a Mobile Devic
Context-Aware Relative Object Queries To Unify Video Instance and Panoptic
How to Backdoor Diffusion Models
SceneTrilogy On Human Scene-Sketch and Its Complementarity With Photo an
What Can Human Sketches Do for Object Detection
STDLens Model Hijacking-Resilient Federated Learning for Object Detection
Generative Bias for Robust Visual Question Answering
Implicit 3D Human Mesh Recovery Using Consistency With Pose an
itKD Interchange Transfer-Based Knowledge Distillation for 3D Object Detection
Learning Adaptive Dense Event Stereo From the Image Domain
Look Around for Anomalies Weakly-Supervised Anomaly Detection via Context-Motion Relational
PartDistillation Learning Parts From Instance Segmentation
Transformer-Based Unified Recognition of Two Hands Manipulating Objects
Learning Human-to-Robot Handovers From Point Clouds
Regularization of Polynomial Networks for Image Recognition
Shakes on a Plane Unsupervised Depth Estimation From Unstabilized Photography
Parallel Diffusion Models of Operator and Image for Blind Invers
Solving 3D Inverse Problems Using Pre-Trained 2D Diffusion Models
BUOL A Bottom-Up Framework With Occupancy-Aware Lifting for Panoptic 3D
Command-Driven Articulated Object Understanding and Manipulation
GFPose Learning 3D Human Pose Prior With Gradient Fields
UniHCP A Unified Model for Human-Centric Perceptions
RealImpact A Dataset of Impact Sound Fields for Real Objects
Where We Are and What We're Looking At Query Bas
Combining Implicit-Explicit View Correlation for Light Field Semantic Segmentation
Learning To Dub Movies via Hierarchical Prosody Models
Structured 3D Features for Reconstructing Controllable Avatars
The Differentiable Lens Compound Lens Search Over Glass Surfaces an
Seasoning Model Soups for Robustness to Adversarial and Natural Distribution
Biomechanics-Guided Facial Action Unit Detection Through Force Modeling
Feature Aggregated Queries for Transformer-Based Video Object Detectors
KD-DLGAN Data Limited Image Generation via Knowledge Distillation
Learning Joint Latent Space EBM Prior Model for Multi-Layer Generato
Multi-Modal Gait Recognition via Effective Spatial-Temporal Feature Fusion
Neuralizer General Neuroimage Analysis Without Re-Training
MoFusion A Framework for Denoising-Diffusion-Based Motion Synthesis
Disentangling Writer and Character Styles for Handwriting Generation
Hybrid Neural Rendering for Large-Scale Scenes With Motion Blu
Nighttime Smartphone Reflective Flare Removal Using Optical Center Symmetry Prio
SLOPER4D A Scene-Aware Dataset for Global 4D Human Pose Estimation
Improving Selective Visual Question Answering by Learning From Your Peers
Thermal Spread Functions TSF Physics-Guided Material Classification
Learning Expressive Prompting With Residuals for Vision Transformers
Weakly-Supervised Domain Adaptive Semantic Segmentation With Prototypical Contrastive Learning
TimeBalance Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition
3D Highlighter Localizing Regions on 3D Shapes via Text Descriptions
Objaverse A Universe of Annotated 3D Objects
Phone2Proc Bringing Robust Robots Into Our Chaotic Worl
Meta-Tuning Loss Functions and Data Augmentation for Few-Shot Object Detection
3D-Aware Conditional Image Synthesis
Harmonious Teacher for Cross-Domain Object Detection
Learning Detailed Radiance Manifolds for High-Fidelity and 3D-Consistent Portrait Synthesis
NeRDi Single-View NeRF Synthesis With Language-Guided Diffusion As General Imag
PointVector A Vector Representation in Point Cloud Analysis
SE-ORNet Self-Ensembling Orientation-Aware Network for Unsupervised Point Cloud Shape Correspondenc
Therbligs in Action Video Understanding Through Motion Primitives
Cross-Domain Image Captioning With Discriminative Finetuning
Learning a Depth Covariance Function
Reliability in Semantic Segmentation Are We on the Right Track
DrapeNet Garment Generation and Self-Supervised Draping
Unbalanced Optimal Transport A Unified Framework for Object Detection
IterativePFN True Iterative Point Cloud Filtering
CAP Robust Point Cloud Classification via Semantic and Structural Modeling
DiffusionRig Learning Personalized Priors for Facial Appearance Editing
Exploring Structured Semantic Prior for Multi Label Recognition With Incomplet
HGFormer Hierarchical Grouping Transformer for Domain Generalized Semantic Segmentation
Hidden Gems 4D Radar Scene Flow Learning Using Cross-Modal Supervision
Mitigating Task Interference in Multi-Task Learning via Explicit Task Routing
Network Expansion for Practical Training Acceleration
PLA Language-Driven Open-Vocabulary 3D Scene Understanding
Revisiting the P3P Problem
Visual Dependency Transformers Dependency Tree Emerges From Reversed Attention
Robust Mean Teacher for Continual and Gradual Test-Time Adaptation
Sphere-Guided Training of Neural Implicit Surfaces
Adversarial Robustness via Random Projection Filters
Benchmarking Robustness of 3D Object Detection to Common Corruptions
DisWOT Student Architecture Search for Distillation WithOut Training
Fast Monocular Scene Reconstruction With Global-Sparse Local-Dense Grids
Federated Incremental Semantic Segmentation
Implicit Identity Leakage The Stumbling Block to Improving Deepfake Detection
MaskCLIP Masked Self-Distillation Advances Contrastive Language-Image Pretraining
Residual Degradation Learning Unfolding Framework With Mixing Priors Across Spectral
Rethinking Optical Flow From Geometric Matching Consistent Perspectiv
The Enemy of My Enemy Is My Friend Exploring Invers
Weakly Supervised Video Representation Learning With Unaligned Text for Sequential
GaitGCI Generative Counterfactual Intervention for Gait Recognition
Multiplicative Fourier Level of Detail
Teaching Structured Vision Language Concepts to Vision Languag
Quantitative Manipulation of Custom Attributes on 3D-Aware Image Synthesis
Federated Learning With Data-Agnostic Distribution Fusion
RWSC-Fusion Region-Wise Style-Controlled Fusion Network for the Prohibited X-Ray Security
Burstormer Burst Image Restoration and Enhancement Transform
Modular Memorability Tiered Representations for Video Memorability Prediction
Adaptive Sparse Convolutional Networks With Global Context Enhancement for Fast
Avatars Grow Legs Generating Smooth Human Motion From Sparse Tracking
Conditional Generation of Audio From Video via Foley Analogies
Dual-Bridging With Adversarial Noise Generation for Domain Adaptive rPPG Estimation
Efficient Mask Correction for Click-Based Interactive Image Segmentation
Global and Local Mixture Consistency Cumulative Learning for Long-Tailed Visual
Learning To Render Novel Views From Wide-Baseline Stereo Pairs
Minimizing the Accumulated Trajectory Error To Improve Dataset Distillation
No One Left Behind Improving the Worst Categories in Long-Tail
Object-Goal Visual Navigation via Effective Exploration of Relations Among Historical
On-the-Fly Category Discovery
Rethinking the Approximation Error in 3D Surface Fitting for Point
SuperDisco Super-Class Discovery Improves Visual Recognition for the Long-Tail
Weak-Shot Object Detection Through Mutual Knowledge Trans
StepFormer Self-Supervised Step Discovery and Localization in Instructional Videos
DKM Dense Kernelized Feature Matching for Geometry Estimation
G-MSM Unsupervised Multi-Shape Matching With Graph-Based Affinity Priors
Why Is the Winner the Best
EvShutter Transforming Events for Unconstrained Rolling Shutter Correction
DepGraph Towards Any Structural Pruning
Efficient Robust Principal Component Analysis via Block Krylov Iteration an
EVA Exploring the Limits of Masked Visual Representation Learning at
Learning Analytical Posterior Probability for Human Mesh Recovery
Self-Supervised Non-Uniform Kernel Estimation With Flow-Based Motion Prior for Blin
TBP-Former Learning Temporal Bird's-Eye-View Pyramid for Joint Perception and Prediction
You Can Ground Earlier Than See An Effective and Efficient
ARCTIC A Dataset for Dexterous Bimanual Hand-Object Manipulation
Joint Appearance and Motion Learning for Efficient Rolling Shutter Correction
OpenGait Revisiting Gait Recognition Towards Better Practicality
PMR Prototypical Modal Rebalance for Multimodal Learning
PointListNet Deep Learning on 3D Point Lists
SelfME Self-Supervised Motion Learning for Micro-Expression Recognition
Quantum Multi-Model Fitting
Generative Diffusion Prior for Unified Image Restoration and Enhancement
Masked Auto-Encoders Meet Generative Adversarial Networks and Beyon
CRAFT Concept Recursive Activation FacTorization for Explainability
Don't Lie to Me Robust and Efficient Explainability With Verifi
3D Spatial Multimodal Knowledge Accumulation for Scene Graph Prediction in
AeDet Azimuth-Invariant Multi-View 3D Object Detection
Detecting Backdoors in Pre-Trained Encoders
Dynamic Generative Targeted Attacks With Pattern Injection
ERNIE-ViLG 2.0 Improving Text-to-Image Diffusion Model With Knowledge-Enhanced Mixture-of-Denoising-Experts
Evolved Part Masking for Self-Supervised Learning
Generating Aligned Pseudo-Supervision From Non-Aligned Data for Image Restoration in
Learning Federated Visual Prompt in Null Space for MRI Reconstruction
MaskCon Masked Contrastive Learning for Coarse-Labelled Dataset
Mutual Information-Based Temporal Difference Learning for Human Pose Estimation in
Network-Free Unsupervised Semantic Segmentation With Synthetic Images
Neural Dependencies Emerging From Learning Massive Categories
NVTC Nonlinear Vector Transform Coding
OT-Filter An Optimal Transport Filter for Learning With Noisy Labels
Probing Sentiment-Oriented Pre-Training Inspired by Human Sentiment Perception Mechanism
RONO Robust Discriminative Learning With Noisy Labels for 2D-3D Cross-Modal
Self-Supervised Video Forensics by Audio-Visual Anomaly Detection
Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification
Uncertainty-Aware Vision-Based Metric Cross-View Geolocalization
Semi-Supervised Learning Made Simple With Self-Supervised Clustering
Tree Instance Segmentation With Temporal Contour Graph
Plateau-Reduced Differentiable Path Tracing
System-Status-Aware Adaptive Network for Online Streaming Video Understanding
Unified Pose Sequence Modeling
Reconstructing Signing Avatars From Video Using Linguistic Priors
Leveraging Temporal Context in Low Representational Power Regimes
Batch Model Consolidation A Multi-Task Model Consolidation Framework
Probing Neural Representations of Scene Perception in a Hippocampally Dependent
K-Planes Explicit Radiance Fields in Space Time and Appearanc
The Best Defense Is a Good Offense Adversarial Augmentation Against
VIVE3D Viewpoint-Independent Video Editing Using 3D-Aware GANs
Controllable Light Diffusion for Portraits
An Empirical Study of End-to-End Video-Language Transformers With Masked Visual
Auto-CARD Efficient and Robust Codec Avatar Driving for Real-Time Mobil
Learning a Simple Low-Light Image Enhancer From Paired Low-Light Instances
Learning Semantic Relationship Among Instances for Image-Text Matching
Neural Transformation Fields for Arbitrary-Styled Font Generation
sRGB Real Noise Synthesizing With Neighboring Correlation-Aware Noise Model
StyleAdv Meta Style Adversarial Training for Cross-Domain Few-Shot Learning
Tell Me What Happened Unifying Text-Guided Video Completion via Multimodal
You Do Not Need Additional Priors or Regularizers in Retinex-Bas
CoWs on Pasture Baselines and Benchmarks for Language-Driven Zero-Shot Object
CNVid-3.5M Build Filter and Pre-Train the Large-Scale Public Chinese Video-Text
Collaborative Noisy Label Cleaner Learning Scene-Aware Trailers for Multi-Modal Highlight
Adaptive Zone-Aware Hierarchical Planner for Vision-Language Navigation
AsyFOD An Asymmetric Adaptation Paradigm for Few-Shot Domain Adaptive Object
Backdoor Defense via Adaptively Splitting Poisoned Dataset
Back to the Source Diffusion-Driven Adaptation To Test-Time Corruption
Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception
Decompose More and Aggregate Better Two Closer Looks at Frequency
DKT Diverse Knowledge Transfer Transformer for Class Incremental Learning
Exploring Data Geometry for Continual Learning
Flexible-Cm GAN Towards Precise 3D Dose Prediction in Radiotherapy
Generalized Relation Modeling for Transformer Tracking
High-Fidelity and Freely Controllable Talking Head Video Generation
Implicit Diffusion Models for Continuous Super-Resolution
MIST Multi-Modal Iterative Spatial-Temporal Transformer for Long-Form Video Question Answering
SurfelNeRF Neural Surfel Radiance Fields for Online Photorealistic Reconstruction o
The ObjectFolder Benchmark Multisensory Learning With Neural and Real Objects
ULIP Learning a Unified Representation of Language Images and Point
VisFusion Visibility-Aware Online 3D Scene Reconstruction From Videos
Uncurated Image-Text Datasets Shedding Light on Demographic Bias
Samples With Low Loss Curvature Improve Data Efficiency
Transformer-Based Learned Optimization
Recurrent Vision Transformers for Object Detection With Event Cameras
Dense-Localizing Audio-Visual Events in Untrimmed Videos A Large-Scale Benchmark an
GAPartNet Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable an
Human Pose As Compositional Tokens
Learning Neural Volumetric Representations of Dynamic Humans in Minutes
PartManip Learning Cross-Category Generalizable Part Manipulation Policy From Point Clou
Hyperbolic Contrastive Learning for Visual Representations Beyond Objects
Improving Zero-Shot Generalization and Robustness of Multi-Modal Models
Policy Adaptation From Foundation Model Feedback
Learned Two-Plane Perspective Prior Based Image Resampling for Efficient Object
Real-Time Evaluation in Online Continual Learning A New Ho
Learning Neural Parametric Head Models
Iterative Next Boundary Detection for Instance Segmentation of Tree Rings
Latency Matters Real-Time Action Forecasting Transform
ImageBind One Embedding Space To Bind Them All
OmniMAE Single Model Masked Pretraining on Images and Videos
Interactive Segmentation of Radiance Fields
Video Compression With Entropy-Constrained Neural Representations
Continuous Pseudo-Label Rectified Domain Adaptive Semantic Segmentation With Implicit Neural
DiffPose Toward More Reliable 3D Pose Estimation
MMG-Ego4D Multimodal Generalization in Egocentric Action Recognition
SkyEye Self-Supervised Bird's-Eye-View Semantic Mapping Using Monocular Frontal View Images
LiDAR-in-the-Loop Hyperparameter Optimization
Leveraging per Image-Token Consistency for Vision-Language Pre-Training
Rethinking Image Super Resolution From Long-Tailed Distribution Learning Perspectiv
Finetune Like You Pretrain Improved Finetuning of Zero-Shot Vision Models
Towards Practical Plug-and-Play Diffusion Models
PaCa-ViT Learning Patch-to-Cluster Attention in Vision Transformers
HOOD Hierarchical Graphs for Generalized Modelling of Clothing Dynamics
Image Super-Resolution Using T-Tetromino Pixels
Self-Supervised Implicit Glyph Attention for Text Recognition
StyleSync High-Fidelity Generalized and Personalized Lip Sync in Style-Based Generato
MACARONS Mapping and Coverage Anticipation With RGB Online Self-Supervision
PCT-Net Full Resolution Image Harmonization Using Pixel-Wise Color Transformations
TruFor Leveraging All-Round Clues for Trustworthy Image Forgery Detection an
NIFF Alleviating Forgetting in Generalized Few-Shot Object Detection via Neural
ObjectMatch Robust Registration Using Canonical Object Correspondences
Modernizing Old Photos Using Multiple References via Photorealistic Style Trans
ALOFT A Lightweight MLP-Like Architecture With Dynamic Low-Frequency Transform fo
Class Attention Transfer Based Knowledge Distillation
Dealing With Cross-Task Class Discrimination in Online Continual Learning
DINN360 Deformable Invertible Neural Network for Latitude-Aware 360deg Image Rescaling
Distilling Cross-Temporal Contexts for Continuous Sign Language Recognition
From Images to Textual Prompts Zero-Shot Visual Question Answering With
GANmouflage 3D Object Nondetection With Texture Fields
HandNeRF Neural Radiance Fields for Animatable Interacting Hands
Hierarchical Fine-Grained Image Forgery Detection and Localization
Improving Robustness of Vision Transformers by Reducing Sensitivity To Patch
Knowledge Distillation for 6D Pose Estimation by Aligning Distributions o
Learning a Practical SDR-to-HDRTV Up-Conversion Using New Dataset and Degradation
ShadowDiffusion When Degradation Prior Meets Diffusion Model for Shadow Removal
Texts as Images in Prompt Tuning for Multi-Label Image Recognition
Vid2Avatar 3D Avatar Reconstruction From Videos in the Wild vi
Zero-Shot Generative Model Adaptation via Image-Specific Prompt Learning
Class Prototypes Based Contrastive Learning for Classifying Multi-Label and Fine-Grain
Visual Programming Compositional Visual Reasoning Without Training
Mobile User Interface Element Detection via Adaptively Prompt Tuning
MSINet Twins Contrastive Search of Multi-Scale Interaction for Object ReID
Preserving Linear Separability in Continual Learning by Backward Feature Projection
Text With Knowledge Graph Augmented Transformer for Video Captioning
ViP3D End-to-End Visual Trajectory Prediction via 3D Agent Queries
Unified Keypoint-Based Action Recognition Framework via Structured Keypoint Pooling
Best of Both Worlds Multimodal Contrastive Learning With Tabular an
Rigidity-Aware Detection for 6D Object Pose Estimation
Shape-Constraint Recurrent Flow for 6D Object Pose Estimation
A Strong Baseline for Generalized Few-Shot Semantic Segmentation
Hierarchical Neural Memory Network for Low Latency Event Processing
In-Hand 3D Object Scanning From an RGB Sequenc
Efficient Verification of Neural Networks Against LVM-Based Specifications
ABCD Arbitrary Bitwise Coefficient for De-Quantization
AstroNet When Astrocyte Meets Artificial Neural Network
AutoAD Movie Description in Context
FAME-ViL Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
FashionSAP Symbols and Attributes Prompt for Fine-Grained Fashion Vision-Language Pre-Training
High-Fidelity 3D Human Digitization From Single 2K Resolution Images
High-Fidelity Event-Radiance Recovery via Transient Event Frequency
Learning a 3D Morphable Face Reflectance Model From Low-Cost Dat
Multiscale Tensor Decomposition and Rendering Equation Encoding for View Synthesis
Noisy Correspondence Learning With Meta Similarity Correction
Reinforcement Learning-Based Black-Box Model Inversion Attacks
Dual Alignment Unsupervised Domain Adaptation for Video-Text Retrieval
Learning Attention As Disentangler for Compositional Zero-Shot Learning
Semidefinite Relaxations for Robust Multiview Triangulation
Neighborhood Attention Transform
A Generalized Framework for Video Instance Segmentation
CARTO Category and Joint Agnostic Reconstruction of ARTiculated Objects
3D Video Object Detection With Learnable Object-Centric Global Optimization
Align and Attend Multimodal Summarization With Dual Contrastive Losses
Analyzing and Diagnosing Pose Estimation With Attributions
A Rotation-Translation-Decoupled Solution for Robust and Efficient Visual-Inertial Initialization
Camouflaged Object Detection With Feature Decomposition and Edge Reconstruction
CLIP-S4 Language-Guided Self-Supervised Semantic Segmentation
Compositor Bottom-Up Clustering and Compositing for Robust Part and Object
D2Former Jointly Learning Hierarchical Detectors and Contextual Descriptors via Agent-Bas
Dynamic Focus-Aware Positional Queries for Semantic Segmentation
FastInst A Simple Query-Based Model for Real-Time Instance Segmentation
Few-Shot Geometry-Aware Keypoint Localization
Frustratingly Easy Regularization on Representation Can Boost Deep Reinforcement Learning
Geometric Visual Similarity Learning in 3D Medical Image Self-Supervised Pre-Training
Grad-PU Arbitrary-Scale Point Cloud Upsampling via Gradient Descent With Learn
MSF Motion-Guided Sequential Fusion for Efficient 3D Object Detection From
Primitive Generation and Semantic-Related Alignment for Universal Zero-Shot Segmentation
Semantic-Promoted Debiasing and Background Disambiguation for Zero-Shot Instance Segmentation
Towards Scalable Neural Representation for Diverse Videos
MOVES Manipulated Objects in Video Enable Segmentation
Model-Agnostic Gender Debiased Image Captioning
3D Concept Learning and Reasoning From Multi-View Images
ACL-SPC Adaptive Closed-Loop System for Self-Supervised Point Cloud Completion
Watch or Listen Robust Audio-Visual Speech Recognition With Visual Corruption
Evading DeepFake Detectors via Adversarial Statistical Consistency
Mask3D Pre-Training 2D Vision Transformers by Learning Masked 3D Priors
MIC Masked Image Consistency for Context-Enhanced Domain Adaptation
Learning Locally Editable Virtual Humans
Four-View Geometry With Unknown Radial Distortion
Towards Compositional Adversarial Robustness Generalizing Adversarial Training to Composite Semantic
NS3D Neuro-Symbolic Grounding of 3D Objects and Relations
PosterLayout A New Benchmark and Approach for Content-Aware Visual-Textual Presentation
ReVISE Self-Supervised Speech Resynthesis With Visual Input for Universal an
Adaptive Assignment for Geometry Aware Local Feature Matching
Anchor3DLane Learning To Regress 3D Anchors for Monocular 3D Lan
Boosting Accuracy and Robustness of Student Models via Adaptive Adversarial
Clover Towards a Unified Video-Language Alignment and Fusion Model
Collaborative Diffusion for Multi-Modal Face Generation and Editing
Contrastive Semi-Supervised Learning for Underwater Image Restoration via Reliable Bank
CP3 Channel Pruning Plug-In for Point-Based Networks
Diffusion-Based Generation Optimization and Planning in 3D Scenes
Diversity-Aware Meta Visual Prompting
Divide and Adapt Active Domain Adaptation via Customized Learning
Egocentric Audio-Visual Object Localization
End-to-End Video Matting With Trimap Propagation
Feature Shrinkage Pyramid for Camouflaged Object Detection With Transformers
Generic-to-Specific Distillation of Masked Autoencoders
Implicit Identity Driven Deepfake Face Swapping Detection
Improving Table Structure Recognition With Visual-Alignment Sequential Coordinate Modeling
Inverting the Imaging Process by Learning an Implicit Camera Model
KiUT Knowledge-Injected U-Transformer for Radiology Report Generation
Learning Accurate 3D Shape Based on Stereo Polarimetric Imaging
Learning Sample Relationship for Exposure Correction
Learning To Measure the Point Cloud Reconstruction Loss in
Local Implicit Ray Function for Generalizable Radiance Field Representation
Neural Kernel Surface Reconstruction
Neural Voting Field for Camera-Space 3D Hand Pose Estimation
Not All Image Regions Matter Masked Vector Quantization for Autoregressiv
Parametric Implicit Face Representation for Audio-Driven Facial Reenactment
Progressive Spatio-Temporal Alignment for Efficient Event-Based Motion Estimation
QuantArt Quantizing Image Style Transfer Towards High Visual Fidelity
RefSR-NeRF Towards High Fidelity and Super Resolution View Synthesis
Rethinking Federated Learning With Domain Shift A Prototype View
Rethinking Few-Shot Medical Segmentation A Vector Quantization View
Revisiting Residual Networks for Adversarial Robustness
Robust Generalization Against Photon-Limited Corruptions via Worst-Case Sharpness Minimization
Self-Supervised AutoFlow
Semi-Supervised 2D Human Pose Estimation Driven by Position Inconsistency Pseudo
SemiCVT Semi-Supervised Convolutional Vision Transformer for Semantic Segmentation
ShapeClipper Scalable 3D Shape Learning From Single-View Images via Geometric
Siamese DET
Style Projected Clustering for Domain Generalized Semantic Segmentation
T-SEA Transfer-Based Self-Ensemble Attack on Object Detection
Towards Accurate Image Coding Improved Autoregressive Image Generation With Dynamic
Tracking Multiple Deformable Objects in Egocentric Videos
Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction
Twin Contrastive Learning With Noisy Labels
VoP Text-Video Co-Operative Prompt Tuning for Cross-Modal Retrieval
Weakly Supervised Temporal Sentence Grounding With Uncertainty-Guided Self-Training
SOOD Towards Semi-Supervised Oriented Object Detection
Bridging Search Region Interaction With Template for RGB-T Tracking
Unifying Layout Generation With a Decoupled Diffusion Model
SplineCam Exact Visualization and Characterization of Deep Network Geometry an
GeoVLN Learning Geometry-Enhanced Visual Representation With Slot Attention for Vision-and-Languag
SimpSON Simplifying Photo Cleanup With Single-Click Distracting Object Segmentation Network
Architecture Dataset and Model-Scale Agnostic Data-Free Meta-Learning
A Dynamic Multi-Scale Voxel Flow Network for Video Prediction
Collaboration Helps Camera Overtake LiDAR in 3D Detection
Complexity-Guided Slimmable Decoder for Efficient Deep Video Compression
Continuous Sign Language Recognition With Correlation Network
Dense Network Expansion for Class Incremental Learning
Density-Insensitive Unsupervised Domain Adaption on 3D Object Detection
Discriminator-Cooperated Feature Map Distillation for GAN Compression
Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos
GFIE A Dataset and Baseline for Gaze-Following From 2D to
Label-Free Liver Tumor Segmentation
NeRF-RPN A General Framework for Object Detection in NeRFs
Physically Realizable Natural-Looking Clothing Textures Evade Person Detectors via 3D
Planning-Oriented Autonomous Driving
Point2Pix Photo-Realistic Point Cloud Rendering via Neural Radiance Fields
REVEAL Retrieval-Augmented Visual-Language Pre-Training With Multi-Source Multimodal Knowledge Memory
Self-Guided Diffusion Models
TriVol Point Cloud Rendering via Triple Volumes
You Only Segment Once Towards Real-Time Panoptic Segmentation
Meta-Explore Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding
Text2Scene Text-Driven Indoor Scene Stylization With Part-Aware Details
Local 3D Editing via 3D Distillation of CLIP Knowledg
Fresnel Microfacet BRDF Unification of Polari-Radiometric Surface-Body Reflection
expOSE Accurate Initialization-Free Projective Factorization Using Exponential Regularization
Scalable Detailed and Mask-Free Universal Photometric Stereo
3D Shape Reconstruction of Semi-Transparent Worms
ScaleFL Resource-Adaptive Federated Learning With Heterogeneous Clients
LayoutDM Discrete Diffusion Model for Controllable Layout Generation
Towards Flexible Multi-Modal Document Models
Bias in Pruned Vision Models In-Depth Analysis and Countermeasures
Exact-NeRF An Exploration of a Precise Volumetric Parameterization for Neural
Improving Image Recognition by Retrieving From Web-Scale Image-Text Dat
Exemplar-FreeSOLO Enhancing Unsupervised Instance Segmentation With Exemplars
Efficient Movie Scene Detection Using State-Space Transformers
RelightableHands Efficient Neural Relighting of Articulated Hand Models
SfM-TTR Using Structure From Motion for Test-Time Refinement of Single-View
Normal-Guided Garment UV Prediction for Human Re-Texturing
A Data-Based Perspective on Transfer Learning
A Meta-Learning Approach to Predicting Performance and Data Requirements
DART Diversify-Aggregate-Repeat Training Improves Generalization of Neural Networks
Enhanced Stable View Synthesis
OneFormer One Transformer To Rule Universal Image Segmentation
VectorFusion Text-to-SVG by Abstracting Pixel-Based Diffusion Models
VGFlow Visibility Guided Flow Network for Human Reposing
Difficulty-Based Sampling for Debiased Contrastive Representation Learning
Unsupervised Contour Tracking of Live Cells by Mechanical and Cycl
FlexNeRF Photorealistic Free-Viewpoint Rendering of Moving Humans From Sparse Views
Adversarial Counterfactual Visual Explanations
Beyond mAP Towards Better Evaluation of Instance Segmentation
DistractFlow Improving Optical Flow Estimation via Realistic Distractions and Pseudo-Labeling
Enhancing Multiple Reliability Measures via Nuisance-Extended Information Bottleneck
WinCLIP Zero-Few-Shot Anomaly Classification and Segmentation
Context-Based Trit-Plane Coding for Progressive Image Compression
Genie Show Me the Data for Quantization
Polarimetric iToF Measuring High-Fidelity Depth Through Scattering Medi
A2J-Transformer Anchor-to-Joint Transformer Network for 3D Interacting Hand Pose Estimation
AligNeRF High-Fidelity Neural Radiance Fields via Alignment-Aware Training
A Probabilistic Attention Model With Occlusion-Aware Texture Regression for 3D
Color Backdoor A Robust Poisoning Attack in Color Spac
Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval
DartBlur Privacy Preservation With Detection Artifact Suppression
DoNet Deep De-Overlapping Network for Cytology Instance Segmentation
Fair Federated Medical Image Segmentation via Client Contribution Estimation
Hierarchical Discriminative Learning Improves Visual Representations of Biomedical Microscopy
HumanGen Generating Human Radiance Fields With Explicit Priors
Instant-NVR Instant Neural Volumetric Rendering for Human-Object Interactions From Monocul
InstantAvatar Learning Avatars From Monocular Video in 60 Seconds
LayoutFormer Conditional Graphic Layout Generation via Constraint Serialization and Decoding
Masked and Adaptive Transformer for Exemplar Based Image Translation
MixPHM Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering
MotionDiffuser Controllable Multi-Agent Motion Prediction Using Diffusion
Neural Intrinsic Embedding for Non-Rigid Point Cloud Matching
Robust Outlier Rejection for 3D Registration With Variational Bayes
Self-Supervised Pre-Training With Masked Shape Prediction for 3D Scene Understanding
StyleIPSB Identity-Preserving Semantic Basis of StyleGAN for High Fidelity Fac
Understanding and Constructing Latent Modality Structures in Multi-Modal Representation Learning
Learning Attribute and Class-Specific Representation Duet for Fine-Grained Fashion Analysis
MSMDFusion Fusing LiDAR and Camera at Multiple Scales With Multi-Depth
DETRs With Hybrid Matching
Think Twice Before Driving Towards Scalable Decoders for End-to-End Autonomous
Deep Graph Reprogramming
A Unified Pyramid Recurrent Network for Video Frame Interpolation
Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training
Deep Incomplete Multi-View Clustering With Cross-View Partial Sample and Prototy
DNF Decouple and Feedback Network for Seeing in the Dark
Fast Contextual Scene Graph Generation With Unbiased Context Augmentation
Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-Commerc
Long-Tailed Visual Recognition via Self-Heterogeneous Integration With Knowledge Excavation
Multi-Level Logit Distillation
Perspective Fields for Single Image Camera Calibration
Randomized Adversarial Training via Taylor Expansion
ReDirTrans Latent-to-Latent Translation for Gaze and Head Redirection
RefCLIP A Universal Teacher for Weakly Supervised Referring Expression Comprehension
TensoIR Tensorial Inverse Rendering
Video-Text As Game Players Hierarchical Banzhaf Interaction for Cross-Modal Representation
Are Binary Annotations Sufficient Video Moment Retrieval via Hierarchical Uncertainty-Bas
MAP Multimodal Uncertainty-Aware Vision-Language Pre-Training Model
Multispectral Video Semantic Segmentation A Benchmark Dataset and Baselin
Seeing What You Miss Vision-Language Pre-Training With Semantic Completion Learning
Spatial-Temporal Concept Based Explanation of 3D ConvNets
Ultra-High Resolution Segmentation With Ultra-Rich Context A Novel Benchmark
ESLAM Efficient Dense SLAM System Based on Hybrid Representation o
Self-Supervised Representation Learning for CAD
AnyFlow Arbitrary Scale Optical Flow With Implicit Neural Representation
Devils on the Edges Selective Quad Attention for Scene Graph
On the Importance of Accurate Geometry Data for Dense 3D
Distilling Vision-Language Pre-Training To Collaborate With Weakly-Supervised Temporal Action Localization
Human-Art A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes
Principles of Forgetting in Domain-Incremental Semantic Segmentation in Adverse Weath
BiasBed - Rigorous Texture Bias Evaluation
GeoNet Benchmarking Unsupervised Adaptation Across Geographies
A New Path Scaling Vision-and-Language Navigation With Synthetic Instructions an
Benchmarking Self-Supervised Learning on Diverse Pathology Datasets
Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification Segmentation
Meta-Learning With a Geometry-Adaptive Precondition
Scaling Up GANs for Text-to-Image Synthesis
Soft-Landing Strategy for Alleviating the Task Discrepancy Problem in Temporal
Superclass Learning With Representation Enhancement
The Dialog Must Go On Improving Visual Dialog via Generativ
Variational Distribution Learning for Unsupervised Text-to-Image Generation
BlendFields Few-Shot Example-Driven Facial Modeling
Invertible Neural Skinning
Self-Correctable and Adaptable Inference for Generalizable Human Pose Estimation
DynamicStereo Consistent Dynamic Depth From Stereo Videos
C-SFDA A Curriculum Learning Aided Self-Training Framework for Efficient Sourc
MED-VT Multiscale Encoder-Decoder Video Transformer With Application To Object Segmentation
HOLODIFFUSION Training a 3D Diffusion Model Using 2D Images
FIANCEE Faster Inference of Adversarial Networks via Conditional Early Exits
HARP Personalized Hand Reconstruction From a Monocular RGB Video
Teleidoscopic Imaging System for Microscale 3D Shape Reconstruction
Imagic Text-Based Real Image Editing With Diffusion Models
2PCNet Two-Phase Consistency Training for Day-to-Night Unsupervised Domain Adaptive Object
Mask-Free Video Instance Segmentation
Neural Preset for Color Style Trans
VILA Learning Image Aesthetics From User Comments With Vision-Language Pretraining
Localized Semantic Feature Mixers for Efficient Pedestrian Detection in Autonomous
Q How To Specialize Large Vision-Language Models to Data-Scarce VQA
Temporally Consistent Online Depth Estimation Using Point-Based Fusion
MaPLe Multi-Modal Prompt Learning
Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting
StyleGAN Salon Multi-View Latent Optimization for Pose-Invariant Hairstyle Trans
Towards Unified Scene Text Spotting Based on Sequence Generation
Achieving a Better Stability-Plasticity Trade-Off via Auxiliary Networks in Continual
Bridging the Gap Between Model Explanations in Partially Annotated Multi-Label
Coreset Sampling From Open-Set for Fine-Grained Self-Supervised Learning
DATID-3D Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generativ
DCFace Synthetic Face Generation With Dual Condition Diffusion Model
Demystifying Causal Features on Adversarial Examples and Causal Inoculation fo
Diffusion Video Autoencoders Toward Temporally Consistent Face Video Editing vi
Event-Based Video Frame Interpolation With Cross-Modal Asymmetric Bidirectional Motion Fields
Feature Separation and Recalibration for Adversarial Robustness
Generalizable Implicit Neural Representations via Instance Pattern Composers
Grounding Counterfactual Explanation of Image Classifiers to Textual Concept Spac
HIER Metric Learning Beyond Class Labels via Hierarchical Regularization
Improving Cross-Modal Retrieval With Set of Diverse Embeddings
MAGVLT Masked Generative Vision-and-Language Transform
NeuralField-LDM Scene Generation With Hierarchical Latent Diffusion Models
On the Stability-Plasticity Dilemma of Class-Incremental Learning
Open-Set Representation Learning Through Combinatorial Embedding
PartMix Regularization Strategy To Learn Part Discovery for Visible-Infrared Person
Re-Thinking Federated Active Learning Based on Inter-Class Diversity
Region-Aware Pretraining for Open-Vocabulary Object Detection With Vision Transformers
Relational Context Learning for Human-Object Interaction Detection
Sampling Is Matter Point-Guided 3D Human Mesh Reconstruction
Shepherding Slots to Objects Towards Stable and Robust Object-Centric Learning
Single Domain Generalization for LiDAR Semantic Segmentation
SMPConv Self-Moving Point Representations for Continuous Convolution
Spatio-Focal Bidirectional Disparity Estimation From a Dual-Pixel Imag
The Devil Is in the Points Weakly Semi-Supervised Instance Segmentation
VNE An Effective Method for Improving Deep Representation by Manipulating
Critical Learning Periods for Multisensory Integration in Deep Networks
X3KD Knowledge Distillation Across Modalities Tasks and Stages for Multi-Cam
Two-Way Multi-Label Loss
Explaining Image Classifiers With Multiscale Directional Image Representation
Picture That Sketch Photorealistic Image Generation From Abstract Sketches
Multi-Label Compound Expression Recognition C-EXPR Database and Network
Solving Relaxations of MAP-MRF Problems Combinatorial In-Face Frank-Wolfe Directions
Octree Guided Unoriented Surface Reconstruction
Efficient Frequency Domain-Based Transformers for High-Quality Image Deblurring
Indescribable Multi-Modal Spatial Evaluato
LaserMix for Semi-Supervised LiDAR Semantic Segmentation
Understanding Masked Autoencoders via Hierarchical Latent Variable Models
Understanding Masked Image Modeling via Learning Occlusion Invariant Featu
vMAP Vectorised Object Mapping for Neural Field SLAM
One-Shot Model for Mixed-Precision Quantization
Cross-Image-Attention for Conditional Embeddings in Deep Metric Learning
Passive Micron-Scale Time-of-Flight With Sunlight Interferometry
Swept-Angle Synthetic Wavelength Interferometry
MELTR Meta Loss Transformer for Learning To Fine-Tune Video Foundation
Iterative Vision-and-Language Navigation
PaletteNeRF Palette-Based Appearance Editing of Neural Radiance Fields
Putting People in Their Place Affordance-Aware Human Insertion Into Scenes
StarCraftImage A Dataset for Prototyping Spatial Reasoning Methods for Multi-Agent
Learning To Predict Scene-Level Implicit 3D From Posed RGBD Dat
Multi-Concept Customization of Text-to-Image Diffusion
Few-Shot Referring Relationships in Videos
MethaneMapper Spectral Absorption Aware Hyperspectral Transformer for Methane Detection
Normalizing Flow Based Feature Synthesis for Outlier-Aware Object Detection
IS-GGT Iterative Scene Graph Generation With Generative Transformers
HAAV Hierarchical Aggregation of Augmented Views for Image Captioning
Weakly Supervised Semantic Segmentation via Adversarial Learning of Classifier an
Probabilistic Prompt Learning for Dense Prediction
Renderable Neural Radiance Map for Visual Navigation
Spherical Transformer for LiDAR-Based 3D Recognition
Fantastic Breaks A Dataset of Paired 3D Scans of Real-Worl
SCOOP Self-Supervised Correspondence and Optimization-Based Scene Flow
Self-Supervised Geometry-Aware Encoder for Style-Based 3D GAN Inversion
Vision Transformers Are Good Mask Auto-Labelers
Simultaneously Short- and Long-Term Temporal Modeling for Semi-Supervised Video Semantic
FitMe Deep Photorealistic 3D Morphable Model Avatars
FFCV Accelerating Training by Removing Data Bottlenecks
BAAM Monocular 3D Pose and Shape Reconstruction With Bi-Contextual Attention
Decomposed Cross-Modal Distillation for RGB-Based Temporal Action Detection
Decompose Adjust Compose Effective Normalization by Playing With Frequency fo
DP-NeRF Deblurred Neural Radiance Field With Physical Scene Priors
Exploring Discontinuity for Video Frame Interpolation
Fix the Noise Disentangling Source Feature for Controllable Domain Translation
Human Pose Estimation in Extremely Low-Light Conditions
Im2Hands Learning Attentive Implicit Representation of Interacting Two-Hand Shapes
Learning Geometry-Aware Representations by Sketching
Learning Rotation-Equivariant Features for Visual Correspondenc
Multimodal Prompting With Missing Modalities for Visual Recognition
Paired-Point Lifting for Enhanced Privacy-Preserving Visual Localization
Revisiting Self-Similarity Structural Embedding for Image Retrieval
Shape-Aware Text-Driven Layered Video Editing
Single View Scene Scale Estimation Using Scale Fiel
TTA-COPE Test-Time Adaptation for Category-Level Object Pose Estimation
A Hierarchical Representation Network for Accurate and Detailed Face Reconstruction
Blind Video Deflickering by Neural Filtering With a Flawed Atlas
EFEM Equivariant Neural Field Expectation Maximization for 3D Object Segmentation
PyramidFlow High-Resolution Defect Contrastive Localization Using Pyramid Normalizing Flow
RGBD2 Generative Scene Synthesis via Incremental View Inpainting Using RGBD
SliceMatch Geometry-Guided Aggregation for Cross-View Pose Estimation
SeaThru-NeRF Neural Radiance Fields in Scattering Medi
Data-Efficient Large Scale Place Recognition With Graded Similarity Supervision
GamutMLP A Lightweight MLP for Color Loss Recovery
Music-Driven Group Choreography
Adaptive Plasticity Improvement for Continual Learning
CrowdCLIP Unsupervised Crowd Counting via Vision-Language Model
HelixSurf A Robust and Efficient Neural Implicit Surface Learning o
Open-Vocabulary Semantic Segmentation With Mask-Adapted CLI
StyLess Boosting the Transferability of Adversarial Examples
Unknown Sniffer for Object Detection Dont Turn a Blind Ey
Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving
Bootstrapping Objectness From Videos by Relaxed Common Fate and Visual
Adaptive Channel Sparsity for Federated Learning Under System Heterogeneity
AttentionShift Iteratively Estimated Part-Based Attention Map for Pointly Supervised Instanc
A Light Weight Model for Active Speaker Detection
EMT-NAS Transferring Architectural Knowledge Between Tasks From Different Datasets
High-Fidelity Clothed Avatar Reconstruction From a Single Imag
Revisiting Rolling Shutter Bundle Adjustment Toward Accurate and Fast Solution
BiasAdv Bias-Adversarial Augmentation for Model Debiasing
Learning Optical Expansion From Scale Matching
PanoSwin A Pano-Style Swin Transformer for Panorama Understanding
ShadowNeuS Neural SDF Reconstruction by Shadow Ray Supervision
Actionlet-Dependent Contrastive Learning for Unsupervised Skeleton-Based Action Recognition
Adaptive Human Matting for Dynamic Videos
Being Comes From Not-Being Open-Vocabulary Text-to-Motion Generation With Wordless Training
Bit-Shrinking Limiting Instantaneous Sharpness for Improving Post-Training Quantization
Catch Missing Details Image Reconstruction With Frequency Augmented Variational Autoenco
CLIP Is Also an Efficient Segmenter A Text-Driven Approach fo
Collaborative Static and Dynamic Vision-Language Streams for Spatio-Temporal Video Grounding
Cross-Domain 3D Hand Pose Estimation With Dual Modalities
Deep Frequency Filtering for Domain Generalization
DynamicDet A Unified Dynamic Architecture for Object Detection
ERM-KTP Knowledge-Level Machine Unlearning via Knowledge Trans
Harmonious Feature Learning for Interactive Hand-Object Pose Estimation
Interventional Bag Multi-Instance Learning on Whole-Slide Pathological Images
Learning To Detect Mirrors From Videos via Dual Correspondences
Magic3D High-Resolution Text-to-3D Content Creation
Memory-Friendly Scalable Super-Resolution via Rewinding Lottery Ticket Hypothesis
Meta Architecture for Point Cloud Analysis
Multimodality Helps Unimodality Cross-Modal Few-Shot Learning With Multimodal Models
Neural Scene Chronology
One-Stage 3D Whole-Body Mesh Recovery With Component Aware Transform
Optimal Transport Minimization Crowd Localization on Density Maps for Semi-Supervis
PCR Proxy-Based Contrastive Replay for Online Class-Incremental Continual Learning
Supervised Masked Knowledge Distillation for Few-Shot Transformers
Towards Fast Adaptation of Pretrained Contrastive Models for Multi-Channel Video-Languag
Video Test-Time Adaptation for Action Recognition
Vision Transformers Are Parameter-Efficient Audio-Visual Learners
Zero-Shot Everything Sketch-Based Image Retrieval and in Explainable Styl
Guiding Pseudo-Labels With Uncertainty Estimation for Source-Free Unsupervised Domain Adaptation
3D Line Mapping Revisit
AdaptiveMix Improving GAN Training via Feature Space Shrinkag
Ambiguity-Resistant Semi-Supervised Learning for Dense Object Detection
A Soma Segmentation Benchmark in Full Adult Fly Brain
Bitstream-Corrupted JPEG Images Are Restorable Two-Stage Compensation and Alignment Framework
Building Rearticulable Models for Arbitrary 3D Objects From 4D Point
CIGAR Cross-Modality Graph Reasoning for Domain Adaptive Object Detection
Class Adaptive Network Calibration
Continual Detection Transformer for Incremental Object Detection
COT Unsupervised Domain Adaptation With Clustering and Optimal Transport
DA Wand Distortion-Aware Selection Using Neural Mesh Parameterization
DegAE A New Pretraining Paradigm for Low-Level Vision
Delving Into Discrete Normalizing Flows on SO3 Manifold for Probabilistic
Delving Into Shape-Aware Zero-Shot Semantic Segmentation
Delving StyleGAN Inversion for Image Editing A Foundation Latent Spac
Detecting Backdoors During the Inference Stage Based on Corruption Robustness
Diversity-Measurable Anomaly Detection
DualVector Unsupervised Vector Font Synthesis With Dual-Part Representation
EfficientViT Memory Efficient Vision Transformer With Cascaded Group Attention
Explicit Visual Prompting for Low-Level Structure Segmentations
Exploring the Relationship Between Architectural Design and Adversarially Robust Generalization
FAC 3D Representation Learning via Foreground Aware Feature Contrast
Few-Shot Non-Line-of-Sight Imaging With Signal-Surface Collaborative Regularization
Fine-Grained Face Swapping via Regional GAN Inversion
FlatFormer Flattened Window Attention for Efficient Point Cloud Transform
FlowGrad Controlling the Output of Generative ODEs With Gradients
Generating Anomalies for Video Anomaly Detection With Prompt-Based Feature Mapping
GEN Pushing the Limits of Softmax-Based Out-of-Distribution Detection
GRES Generalized Referring Expression Segmentation
Hierarchical Prompt Learning for Multi-Task Learning
Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object
Humans As Light Bulbs 3D Human Reconstruction From Thermal Reflection
InstMove Instance Motion for Object-Centric Video Segmentation
Joint HDR Denoising and Fusion A Real-World Mobile HDR Imag
Learned Image Compression With Mixed Transformer-CNN Architectures
Learning Customized Visual Models With Retrieval-Augmented Knowledg
Learning Orthogonal Prototypes for Generalized Few-Shot Semantic Segmentation
LEMaRT Label-Efficient Masked Region Transform for Image Harmonization
Marching-Primitives Shape Abstraction From Signed Distance Function
MarS3D A Plug-and-Play Motion-Aware Model for Semantic Segmentation on Multi-Scan
MixMAE Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical
MixTeacher Mining Promising Labels With Mixed Scale Teacher for Semi-Supervis
ML2P-Encoder On Exploration of Channel-Class Correlation for Multi-Label Zero-Shot Learning
MMVC Learned Multi-Mode Video Compression With Block-Based Prediction Mode Selection
Multiple Instance Learning via Iterative Self-Paced Supervised Contrastive Learning
NeUDF Leaning Neural Unsigned Distance Fields With Volume Rendering
NoisyQuant Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers
OSAN A One-Stage Alignment Network To Unify Multimodal Alignment an
PartSLIP Low-Shot Part Segmentation for 3D Point Clouds via Pretrain
PD-Quant Post-Training Quantization Based on Prediction Difference Metric
PolyFormer Referring Image Segmentation As Sequential Polygon Generation
Pose-Disentangled Contrastive Learning for Self-Supervised Facial Representation
PoseExaminer Automated Testing of Out-of-Distribution Robustness in Human Pose an
Progressive Neighbor Consistency Mining for Correspondence Pruning
Progressive Semantic-Visual Mutual Adaption for Generalized Zero-Shot Learning
Promoting Semantic Connectivity Dual Nearest Neighbors Contrastive Learning for Unsupervis
Reducing the Label Bias for Timestamp Supervised Temporal Action Segmentation
Revisiting Temporal Modeling for CLIP-Based Image-to-Video Knowledge Transferring
RIATIG Reliable and Imperceptible Adversarial Text-to-Image Generation With Natural Prompts
Robust Dynamic Radiance Fields
SAP-DETR Bridging the Gap Between Salient Points and Queries-Based Transform
SCOTCH and SODA A Transformer Video Shadow Detection Framework
Semantic Ray Learning a Generalizable Semantic Field With Cross-Reprojection Attention
Semi-Weakly Supervised Object Kinematic Motion Prediction
SimpleNet A Simple Network for Image Anomaly Detection and Localization
Single Image Depth Prediction Made Better A Multivariate Gaussian Tak
Slimmable Dataset Condensation
SlowLiDAR Increasing the Latency of LiDAR-Based Detection Using Adversarial Examples
Soft Augmentation for Image Classification
Spectral Bayesian Uncertainty for Image Super-Resolution
StyleRF Zero-Shot 3D Style Transfer of Neural Radiance Fields
SynthVSR Scaling Up Visual Speech Recognition With Synthetic Supervision
Target-Referenced Reactive Grasping for Dynamic Objects
TWINS A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness
Unsupervised Continual Semantic Adaptation Through Neural Rendering
VLPD Context-Aware Pedestrian Detection via Vision-Language Semantic Self-Supervision
What You Can Reconstruct From a Shadow
3D-Aware Face Swapping
3D-Aware Multi-Class Image-to-Image Translation With NeRFs
3D Cinemagraphy From a Single Imag
ACSeg Adaptive Conceptualization for Unsupervised Semantic Segmentation
Adjustment and Alignment for Unbiased Open Set Domain Adaptation
Adversarially Masking Synthetic To Mimic Real Adaptive Noise Injection fo
AMT All-Pairs Multi-Field Transforms for Efficient Frame Interpolation
An In-Depth Exploration of Person Re-Identification and Gait Recognition in
Are Data-Driven Explanations Robust Against Out-of-Distribution Dat
AShapeFormer Semantics-Guided Object-Level Active Shape Encoding for 3D Object Detection
Azimuth Super-Resolution for FMCW Radar in Autonomous Driving
A Simple Baseline for Video Restoration With Grouped Spatial-Temporal Shift
A Whac-a-Mole Dilemma Shortcuts Come in Multiples Where Mitigating On
BBDM Image-to-Image Translation With Brownian Bridge Diffusion Models
BioNet A Biologically-Inspired Network for Face Recognition
Boosting Low-Data Instance Segmentation by Unsupervised Pre-Training With Saliency Prompt
Boosting Weakly-Supervised Temporal Action Localization With Text Information
Causally-Aware Intraoperative Imputation for Overall Survival Time Prediction
Center Focusing Network for Real-Time LiDAR Panoptic Segmentation
Class Balanced Adaptive Pseudo Labeling for Federated Semi-Supervised Learning
Compressing Volumetric Radiance Fields to 1 MB
Correlational Image Modeling for Self-Supervised Visual Pre-Training
DANI-Net Uncalibrated Photometric Stereo by Differentiable Shadow Handling Anisotropic Reflectanc
DATE Domain Adaptive Product Seeker for E-Commerc
Decoupled Multimodal Distilling for Emotion Recognition
Deep Random Projector Accelerated Deep Image Prio
Diffusion-SDF Text-To-Shape via Voxelized Diffusion
Discrete Point-Wise Attack Is Not Enough Generalized Manifold Adversarial Attack
Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection
DISC Learning From Noisy Labels via Dynamic Instance-Specific Selection an
DropKey for Vision Transform
DSFNet Dual Space Fusion Network for Occlusion-Robust 3D Dense Fac
DynaMask Dynamic Mask Selection for Instance Segmentation
Dynamic Graph Enhanced Contrastive Learning for Chest X-Ray Report Generation
DynIBaR Neural Dynamic Image-Based Rendering
Edge-Aware Regional Message Passing Controller for Image Forgery Localization
Efficient and Explicit Modelling of Image Hierarchies for Image Restoration
Efficient Multimodal Fusion via Interactive Prompting
Ego-Body Pose Estimation via Ego-Head Pose Estimation
Exploring the Effect of Primitives for Compositional Generalization in Vision-and-Languag
FCC Feature Clusters Compression for Long-Tailed Visual Recognition
Generalized Deep 3D Shape Prior via Part-Discretized Diffusion Process
GLIGEN Open-Set Grounded Text-to-Image Generation
Guided Recommendation for Model Fine-Tuning
Hard Sample Matters a Lot in Zero-Shot Quantization
ImageNet-E Benchmarking Neural Network Robustness via Attribute Editing
Improving Vision-and-Language Navigation by Generating Future-View Image Semantics
Inverse Rendering of Translucent Objects Using Physical and Neural Renderers
KERM Knowledge Enhanced Reasoning for Vision-and-Language Navigation
LAVENDER Unifying Video-Language Understanding As Masked Language Modeling
Learning Distortion Invariant Representation for Image Restoration From a Causality
Learning Generative Structure Prior for Blind Text Image Super-Resolution
Learning Steerable Function for Efficient Image Resampling
Learning To Fuse Monocular and Multi-View Cues for Multi-Frame Depth
Less Is More Reducing Task and Model Complexity for 3D
Lift3D Synthesize 3D Training Data by Lifting 2D GAN to
Lite DETR An Interleaved Multi-Scale Encoder for Efficient DET
LOCATE Localize and Transfer Object Parts for Weakly Supervised Affordanc
LoGoNet Towards Accurate 3D Object Detection With Local-to-Global Cross-Modal Fusion
Long Range Pooling for 3D Large-Scale Scene Understanding
MAGE MAsked Generative Encoder To Unify Representation Learning and Imag
Mask DINO Towards a Unified Transformer-Based Framework for Object Detection
MDQE Mining Discriminative Query Embeddings To Segment Occluded Instances on
MEGANE Morphable Eyeglass and Avatar Network
Metadata-Based RAW Reconstruction via Implicit Neural Functions
MobileBrick Building LEGO for 3D Reconstruction on Mobile Devices
MoDAR Using Motion Forecasting for 3D Object Detection in Point
Modeling Inter-Class and Intra-Class Constraints in Novel Class Discovery
MSeg3D Multi-Modal 3D Semantic Segmentation for Autonomous Driving
Multi-View Inverse Rendering for Large-Scale Real-World Indoor Scenes
Neuralangelo High-Fidelity Neural Surface Reconstruction
Neural Video Compression With Diverse Contexts
NIKI Neural Inverse Kinematics With Invertible Neural Networks for 3D
NLOST Non-Line-of-Sight Imaging With Transform
OmniCity Omnipotent City Understanding With Multi-Level and Multi-View Images
One-Shot High-Fidelity Talking-Head Synthesis With Deformable Neural Radiance Fiel
One-to-Few Label Assignment for End-to-End Dense Detection
On the Effectiveness of Partial Variance Reduction in Federated Learning
Open-Set Semantic Segmentation for Point Clouds via Adversarial Prototype Framework
OVTrack Open-Vocabulary Multiple Object Tracking
Patch-Based 3D Natural Scene Generation From a Single Exampl
Photo Pre-Training but for Sketch
Physical-World Optical Adversarial Attacks on 3D Face Recognition
PillarNeXt Rethinking Network Designs for 3D Object Detection in LiDA
Polarized Color Image Denoising
PREIM3D 3D Consistent Precise Image Attribute Editing From a Singl
ProxyFormer Proxy Alignment Assisted Point Cloud Completion With Missing Part
Referring Image Matting
Regularize Implicit Neural Representation by Itsel
Rethinking Feature-Based Knowledge Distillation for Face Recognition
Rethinking Out-of-Distribution OOD Detection Masked Image Modeling Is All You
Robust Model-Based Face Reconstruction Through Weakly-Supervised Outlier Segmentation
Scaling Language-Image Pre-Training via Masking
ScarceNet Animal Pose Estimation With Scarce Annotations
SCConv Spatial and Channel Reconstruction Convolution for Feature Redundancy
SECAD-Net Self-Supervised CAD Reconstruction by Learning Sketch-Extrude Operations
Self-Supervised Blind Motion Deblurring With Deep Expectation Maximization
SGLoc Scene Geometry Encoding for Outdoor LiDAR Localization
SHS-Net Learning Signed Hyper Surfaces for Oriented Normal Estimation o
Sibling-Attack Rethinking Transferable Adversarial Attacks Against Face Recognition
SIM Semantic-Aware Instance Mask Generation for Box-Supervised Instance Segmentation
Source-Free Video Domain Adaptation With Spatial-Temporal-Historical Consistency Learning
Spatial-Then-Temporal Self-Supervised Learning for Video Correspondenc
Spatially Adaptive Self-Supervised Learning for Real-World Image Denoising
Spectral Enhanced Rectangle Transformer for Hyperspectral Image Denoising
SteerNeRF Accelerating NeRF Rendering via Smooth Viewpoint Trajectory
StyleGene Crossover and Mutation of Region-Level Facial Genes for Kinshi
Super-CLEVR A Virtual Benchmark To Diagnose Domain Robustness in Visual
SViTT Temporal Learning of Sparse Video-Text Transformers
Task-Specific Fine-Tuning via Variational Information Bottleneck for Weakly-Supervised Pathology Whol
Token Boosting for Robust Self-Supervised Visual Transformer Pre-Training
ToThePoint Efficient Contrastive Learning of 3D Point Clouds via Recycling
Towards Benchmarking and Assessing Visual Naturalness of Physical World Adversarial
Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting
Trade-Off Between Robustness and Accuracy of Vision Transformers
Uni-Perceiver v2 A Generalist Model for Large-Scale Vision and Vision-Languag
Unified Mask Embedding and Correspondence Learning for Self-Supervised Video Segmentation
VoxFormer Sparse Voxel Transformer for Camera-Based 3D Semantic Scene Completion
Weakly Supervised Class-Agnostic Motion Prediction for Autonomous Driving
WINNER Weakly-Supervised hIerarchical decompositioN and aligNment for Spatio-tEmporal Video gRounding
Beyond Attentive Tokens Incorporating Token Importance and Diversity for Efficient
CapDet Unifying Dense Captioning and Open-World Detection Pretraining
NeuralUDF Learning Unsigned Distance Fields for Multi-View Reconstruction of Surfaces
PointClustering Unsupervised Point Cloud Pre-Training Using Transformation Invariance in Clustering
All-in-Focus Imaging From Event Focal Stack
Spatio-Temporal Pixel-Level Contrastive Learning-Based Source-Free Domain Adaptation for Video Semantic
High Fidelity 3D Hand Shape Reconstruction via Scalable Graph Frequency
Camouflaged Instance Segmentation via Explicit De-Camouflaging
Class-Incremental Exemplar Compression for Class-Incremental Learning
Constrained Evolutionary Diffusion Filter for Monocular Endoscope Tracking
GeoLayoutLM Geometric Pre-Training for Visual Information Extraction
GradMA A Gradient-Memory-Based Accelerated Federated Learning With Alleviated Catastrophic Forgetting
Leverage Interactive Affinity for Affordance Learning
MOT Masked Optimal Transport for Partial Domain Adaptation
RaBit Parametric Modeling of 3D Biped Cartoon Characters With
Semantic-Conditional Diffusion Networks for Image Captioning
SIEDOB Semantic Image Editing by Disentangling Object and Backgroun
Towards Generalisable Video Moment Retrieval Visual-Dynamic Injection to Image-Text Pre-Training
Zero-Shot Model Diagnosis
Content-Aware Token Sharing for Efficient Semantic Segmentation With Vision Transformers
Decomposed Soft Prompt Guided Fusion Enhancing for Compositional Zero-Shot Learning
Learning Spatial-Temporal Implicit Neural Representations for Event-Guided Video Super-Resolution
LinK Linear Kernel for LiDAR-Based 3D Perception
Markerless Camera-to-Robot Pose Estimation via Self-Supervised Sim-to-Real Trans
Neuron Structure Modeling for Generalizable Remote Physiological Measurement
Open-Vocabulary Point-Cloud Object Detection Without 3D Annotation
PADA Jointly Sampling Path and Data for Consistent NAS
Robust and Scalable Gaussian Process Regression and Its Applications
Specialist Diffusion Plug-and-Play Sample-Efficient Fine-Tuning of Text-to-Image Diffusion Models To
TransFlow Transformer As Flow Learn
Uncertainty-Aware Optimal Transport for Semantically Coherent Out-of-Distribution Detection
Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images
Improving Generalization With Domain Convex Gam
Unbiased Multiple Instance Learning for Weakly Supervised Video Anomaly Detection
Box-Level Active Detection
Controllable Mesh Generation Through Sparse Latent Point Diffusion Models
Heterogeneous Continual Learning
Tunable Convolutions With Parametric Multi-Loss Optimization
Transfer4D A Framework for Frugal Motion Capture and Deformation Trans
Self-Supervised Image-to-Point Distillation via Semantically Tolerant Contrastive Loss
NIRVANA Neural Implicit Representations of Videos With Adaptive Networks an
DualRel Semi-Supervised Mitochondria Segmentation From a Prototype Perspectiv
Chat2Map Efficient Scene Mapping From Multi-Ego Conversations
Change-Aware Sampling and Contrastive Learning for Satellite Images
Zero-Shot Noise2Noise Efficient Image Denoising Without Any Dat
BEV-Guided Multi-Modality Fusion for Driving Perception
Doubly Right Object Recognition A Why Prompt for Visual Rationales
Leapfrog Diffusion Model for Stochastic Trajectory Prediction
Computational Flash Photography Through Intrinsics
SLACK Stable Learning of Augmentations With Cold-Start and KL Regularization
3D Human Mesh Estimation From Virtual Markers
3D Video Loops From Asynchronous Input
Annealing-Based Label-Transfer Learning for Open World Object Detection
CAT LoCalization and IdentificAtion Cascade Detection Transformer for Open-World Object
CREPE Can Vision-Language Foundation Models Reason Compositionally
Curvature-Balanced Feature Manifold Learning for Long-Tailed Classification
DiGeo Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection
Dynamic Aggregated Network for Gait Recognition
OTAvatar One-Shot Talking Face Avatar With Controllable Tri-Plane Rendering
ProD Prompting-To-Disentangle Domain Knowledge for Cross-Domain Few-Shot Image Classification
Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspectiv
Symmetric Shape-Preserving Autoencoder for Unsupervised Real Scene Point Cloud Completion
Towards Better Gradient Consistency for Neural Signed Distance Functions vi
Language-Guided Music Recommendation for Video via Prompt Analogies
Spring A High-Resolution High-Detail Dataset and Benchmark for Scene Flow
Gated Multi-Resolution Transfer Network for Burst Restoration and Enhancement
Deep Polarization Reconstruction With PDAVIS Events
Exploring and Utilizing Pattern Imbalanc
LightPainter Interactive Portrait Relighting With Freehand Scribbl
Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration
PC2 Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction
RealFusion 360deg Reconstruction of Any Object From a Single Imag
Modality-Invariant Visual Odometry for Embodied Vision
Detection Hub Unifying Object Detection Datasets via Query Adaptation on
NeAT Learning Neural Implicit Surfaces With Arbitrary Topologies From Multi-View
On Distillation of Guided Diffusion Models
Data-Driven Feature Tracking for Event Cameras
DivClust Controlling Diversity in Deep Clustering
Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures
Guided Depth Super-Resolution by Deep Anisotropic Diffusion
Progressively Optimized Local Radiance Fields for Robust View Synthesis
Unsupervised Space-Time Network for Temporally-Consistent Segmentation of Multiple Motions
Realistic Saliency Guided Image Enhancement
FedSeg Class-Heterogeneous Federated Learning for Semantic Segmentation
Recurrence Without Recurrence Stable Video Landmark Detection With Deep Equilibrium
Alias-Free Convnets Fractional Shift Invariance via Polynomial Activations
MobileVOS Real-Time Video Object Segmentation Contrastive Learning Meets Knowledge Distillation
Deep Dive Into Gradients Better Optimization for 3D Object Detection
NeurOCS Neural NOCS Supervision for Monocular 3D Object Localization
SPIn-NeRF Multiview Segmentation and Perceptual Inpainting With Neural Radiance Fields
ActMAD Activation Matching To Align Distributions for Test-Time-Training
Ranking Regularization for Critical Rare Classes Minimizing False Positives at
NULL-Text Inversion for Editing Real Images Using Guided Diffusion Models
Learning Action Changes by Measuring Verb-Adverb Textual Relationships
Gazeformer Scalable Effective and Fast Prediction of Goal-Directed Human Attention
Bringing Inputs to Shared Domains for 3D Interacting Hands Recovery
Query-Dependent Video Representation for Moment Retrieval and Highlight Detection
Large-Capacity and Flexible Video Steganography via Invertible Neural Network
Audio-Visual Grouping Network for Sound Localization From Mixtures
Continuous Intermediate Token Learning With Implicit Motion Manifold for Keyfram
Event-Based Shape From Polarization
Learning Correspondence Uncertainty via Differentiable Nonlinear Least Squares
Deep Deterministic Uncertainty A New Simple Baselin
Open Vocabulary Semantic Segmentation With Patch Aligned Contrastive Learning
DiffRF Rendering-Guided 3D Radiance Field Diffusion
Bridging Precision and Confidence A Train-Time Loss for Calibrating Object
EC2 Emergent Communication for Embodied Control
Progressive Backdoor Erasing via Connecting Backdoor and Adversarial Attacks
I2MVFormer Large Language Model Generated Multi-View Document Supervision for Zero-Shot
Tangentially Elongated Gaussian Belief Propagation for Event-Based Incremental Optical Flow
Post-Processing Temporal Action Detection
Unbiased Scene Graph Generation in Videos
3D-POP An Automated Annotation Approach to Facilitate Markerless 2D-3D
Unite and Conquer Plug Play Multi-Modal Synthesis Using Diffusion
Sparse Multi-Modal Graph Transformer With Shared-Context Processing for Representation Learning
DF-Platter Multi-Face Heterogeneous Deepfake Dataset
ProtoCon Pseudo-Label Refinement via Online Clustering and Prototypical Consistency fo
PIP-Net Patch-Based Intuitive Prototypes for Interpretable Image Classification
DARE-GRAM Unsupervised Domain Adaptation Regression by Aligning Inverse Gram Matrices
ISBNet A 3D Point Cloud Instance Segmentation Network With Instance-Aw
Efficient Scale-Invariant Generator With Column-Row Entangled Pixel Synthesis
Micron-BERT BERT-Based Facial Micro-Expression Recognition
Re-Thinking Model Inversion Attacks Against Deep Neural Networks
TIPI Test Time Adaptation With Transformation Invarianc
Bilateral Memory Consolidation for Continual Learning
Learning 3D Scene Priors With 2D Supervision
HOICLIP Efficient Knowledge Transfer for HOI Detection With Vision-Language Models
Trap Attention Monocular Depth Estimation With Manual Traps
Domain Expansion of Image Generators
Visibility Constrained Wide-Band Illumination Spectrum Design for Seeing-in-the-Dark
Conditional Image-to-Video Generation With Latent Flow Diffusion Models
NUWA-LIP Language-Guided Image Inpainting With Defect-Free VQGAN
PATS Patch Area Transportation With Subdivision for Local Feature Matching
Disentangled Representation Learning for Unsupervised Neural Quantization
Adaptive Global Decay Process for Event Cameras
Temporal Consistent 3D LiDAR Representation Learning for Semantic Perception in
Neural Congealing Aligning Images to a Joint Semantic Atlas
AssemblyHands Towards Egocentric Activity Understanding via 3D Hand Pose Estimation
BlackVIP Black-Box Visual Prompting for Robust Transfer Learning
Recovering 3D Hand Mesh Sequence From a Single Blurry Imag
Towards Universal Fake Image Detectors That Generalize Across Generative Models
Towards Building Self-Aware Object Detectors via Reliable Uncertainty Quantification an
Detection of Out-of-Distribution Samples Using Binary Neuron Activation Patterns
Cross-GAN Auditing Unsupervised Identification of Attribute Level Similarities and Differences
Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation
DyNCA Real-Time Dynamic Texture Synthesis Using Neural Cellular Automat
B-Spline Texture Coefficients Estimator for Screen Content Image Super-Resolution
Implicit View-Time Interpolation of Stereo Videos Using Multi-Plane Disparities an
Visual Localization Using Imperfect 3D Models From the Internet
Backdoor Cleansing With Unlabeled Dat
DPE Disentanglement of Pose and Expression for General Video Portrait
Standing Between Past and Future Spatio-Temporal Modeling for Multi-Camera 3D
Unsupervised 3D Point Cloud Representation Learning by Triangle Constrained Contrast
BAEFormer Bi-Directional and Early Interaction Transformers for Birds Eye View
Boundary-Aware Backward-Compatible Representation via Adversarial Learning in Image Retrieval
Cloud-Device Collaborative Adaptation to Continual Changing Environments in the Real-Worl
Deep Discriminative Spatial and Temporal Network for Efficient Video Deblurring
Fine-Grained Image-Text Matching by Cross-Modal Hard Aligning Network
Slide-Transformer Hierarchical Vision Transformer With Local Self-Attention
Stitchable Neural Networks
Towards Open-World Segmentation of Parts
Learning To Name Classes for Vision and Language Models
All-in-One Image Restoration for Unknown Degradations Using Adaptive Discriminative Filters
BiFormer Learning Bilateral Motion Estimation via Bilateral Transformer for 4K
Dual-Path Adaptation From Image to Video Transformers
LANIT Language-Driven Image-to-Image Translation for Unlabeled Dat
Mask-Guided Matting in the Wil
Multi-Modal Representation Learning With Text-Driven Soft Masks
Perception-Oriented Single Image Super-Resolution Using Optimal Objective Estimation
RGB No More Minimally-Decoded JPEG Vision Transformers
Self-Positioning Point-Based Transformer for Point Cloud Understanding
Temporal Interpolation Is All You Need for Dynamic Neural Radianc
Training Debiased Subnetworks With Contrastive Weight Pruning
ViPLO Vision Transformer Based Pose-Conditioned Self-Loop Graph for Human-Object Interaction
Learning To Retain While Acquiring Combating Distribution-Shift in Adversarial Data-F
Sequential Training of GANs Against GAN-Classifiers Reveals Correlated Knowledge Gaps
Multiclass Confidence and Localization Calibration for Object Detection
DeepLSD Line Segment Detection and Refinement With Deep Image Gradients
Shape Pose and Appearance From a Single Image via Bootst
Megahertz Light Steering Without Moving Parts
StyleRes Transforming the Residuals for Real Image Editing With StyleGAN
CLIPPING Distilling CLIP-Based Models With a Student Base for Video-Languag
Re-Basin via Implicit Sinkhorn Differentiation
Hierarchical Dense Correlation Distillation for Few-Shot Segmentation
On the Convergence of IRLS and Its Variants in Outlier-Robust
OpenScene 3D Scene Understanding With Open Vocabularies
Perception and Semantic Aware Regularization for Sequential Confidence Calibration
Representing Volumetric Videos As Dynamic MLP Maps
Trajectory-Aware Body Interaction Transformer for Multi-Person Pose Forecasting
Use Your Head Improving Long-Tail Video Recognition
pCON Polarimetric Coordinate Networks for Neural Scene Representations
Object Pop-Up Can We Infer 3D Objects and Their Poses
HyperCUT Video Sequence From a Single Blurry Image Using Unsupervis
Wavelet Diffusion Models Are Fast and Scalable Image Generators
iDisc Internal Discretization for Monocular Depth Estimation
Rethinking Video ViTs Sparse Video Tubes for Joint Image an
SegLoc Learning Segmentation-Based Representations for Privacy-Preserving Visual Localization
Handwritten Text Generation From Visual Archetypes
DynaFed Tackling Client Data Heterogeneity With Global Dynamics
Frame Interpolation Transformer and Uncertainty Guidanc
GlassesGAN Eyewear Personalization Using Synthetic Appearance Discovery and Targeted Subspac
Robust Unsupervised StyleGAN Image Restoration
Handy Towards a High Fidelity 3D Hand Shape and Appearanc
Enhancing Deformable Local Features by Jointly Learning To Detect an
Computationally Budgeted Continual Learning What Does Matt
DINER Depth-Aware Image-Based NEural Radiance Fields
Dynamic Conceptional Contrastive Learning for Generalized Category Discovery
Adaptive Data-Free Quantization
End-to-End Vectorized HD-Map Construction With Piecewise Bezier Curv
Fuzzy Positive Learning for Semi-Supervised Semantic Segmentation
Bi-Level Meta-Learning for Few-Shot Domain Generalization
Class-Balancing Diffusion Models
Deep Graph-Based Spatial Consistency for Robust Non-Rigid Point Cloud Registration
FreeSeg Unified Universal and Open-Vocabulary Image Segmentation
Ground-Truth Free Meta-Learning for Deep Compressive Sampling
Learning To Exploit the Sequence-Specific Prior Knowledge for Image Processing
MotionTrack Learning Robust Short-Term and Long-Term Motions for Multi-Object Tracking
Reliable and Interpretable Personalized Federated Learning
Robust 3D Shape Classification via Non-Local Graph Attention Network
CafeBoost Causal Feature Boost To Eliminate Task-Induced Bias for Class
Graph Representation for Order-Aware Visual Transformation
Looking Through the Glass Neural Surface Reconstruction Against High Specul
PSVT End-to-End Multi-Person 3D Pose and Shape Estimation With Progressiv
REC-MV REconstructing 3D Dynamic Cloth From Monocular Videos
Diverse 3D Hand Gesture Prediction From Body Dynamics by Bilateral
Motion Information Propagation for Neural Video Compression
Real-Time 6K Image Rescaling With Rate-Distortion Optimization
Bias Mimicking A Simple Sampling Approach for Bias Mitigation
Neumann Network With Recursive Kernels for Single Image Defocus Deblurring
A Characteristic Function-Based Method for Bottom-Up Human Pose Estimation
How To Prevent the Poor Performance Clients for Personalized Federat
Learning To Segment Every Referring Object Point by Point
Modality-Agnostic Debiasing for Single Domain Generalization
SketchXAI A First Look at Explainability for Human Sketches
Towards Robust Tampered Text Detection in Document Image New Dataset
Upcycling Models Under Domain and Category Shift
MoDi Unconditional Motion Synthesis From Diverse Dat
Filtering Distillation and Hard Negatives for Vision-Language Pre-Training
Ambiguous Medical Image Segmentation Using Diffusion Models
Learning Partial Correlation Based Deep Visual Representation for Image Classification
Make-a-Story Visual Memory Conditioned Consistent Story Generation
Infinite Photorealistic Worlds Using Procedural Generation
On the Benefits of 3D Pose and Tracking for Human
NaQ Leveraging Narrations As Queries To Supervise Episodic Memory
PACO Parts and Attributes of Common Objects
Overlooked Factors in Concept-Based Explanations Dataset Choice Concept Learnability an
SmallCap Lightweight Image Captioning Prompted With Retrieval Augmentation
PIRLNav Pretraining With Imitation and RL Finetuning for ObjectNav
Visual DNA Representing and Comparing Images Using Distributions of Neuron
Hybrid Active Learning via Deep Clustering for Video Action Detection
NoisyTwins Class-Consistent and Diverse Image Generation Through StyleGANs
FaceLit Neural 3D Relightable Faces
Masked Representation Learning for Domain Generalized Stereo Matching
TranSG Transformer-Based Skeleton Graph Prototype Contrastive Learning With Structure-Trajectory Prompt
Fine-Tuned CLIP Models Are Efficient Video Learners
Synthesizing Photorealistic Virtual Humans Through Cross-Modal Disentanglement
Understanding Deep Generative Models With Generalized Empirical Likelihoods
Decoupled Semantic Prototypes Enable Learning From Diverse Annotation Types fo
Trace and Pace Controllable Pedestrian Animation via Guided Trajectory Diffusion
Autonomous Manipulation Learning for Similar Deformable Objects via Only On
Crossing the Gap Domain Generalization for Image Captioning
Defining and Quantifying the Emergence of Sparse Concepts in DNNs
Focus on Details Online Multi-Object Tracking With Diverse Fine-Grained Representation
Improving Fairness in Facial Albedo Estimation via Visual-Textual Cues
Masked Jigsaw Puzzle A Versatile Position Embedding for Vision Transformers
Proposal-Based Multiple Instance Learning for Weakly-Supervised Temporal Action Localization
TinyMIM An Empirical Study of Distilling MIM Pre-Trained Models
VolRecon Volume Rendering of Signed Ray Distance Functions for Generalizabl
CoralStyleCLIP Co-Optimized Region and Layer Selection for Image Editing
Masked Wavelet Representation for Compact Neural Radiance Fields
NeRFLight Fast and Light Neural Radiance Fields Using a Sh
PivoTAL Prior-Driven Supervision for Weakly-Supervised Temporal Action Localization
Novel Class Discovery for 3D Point Cloud Semantic Segmentation
UMat Uncertainty-Aware Single Image High Resolution Material Captu
Conjugate Product Graphs for Globally Optimal 2D-3D Shape Matching
Boundary-Enhanced Co-Training for Weakly Supervised Semantic Segmentation
Proximal Splitting Adversarial Attack for Semantic Segmentation
PermutoSDF Fast Multi-View Reconstruction With Implicit Surfaces Using Permutohedral Lattices
FJMP Factorized Joint Multi-Agent Motion Prediction Over Learned Directed Acyclic
MM-Diffusion Learning Multi-Modal Diffusion Models for Joint Audio and Video
EventNeRF Neural Radiance Fields From a Single Colour Event Cam
BITE Beyond Priors for Improved Three-D Dog Pose Estimation
DreamBooth Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
GazeNeRF 3D-Aware Gaze Redirection With Neural Radiance Fields
Token Contrast for Weakly-Supervised Semantic Segmentation
Egocentric Auditory Attention Localization in Conversations
Token Turing Machines
Instant Domain Augmentation for LiDAR Semantic Segmentation
OCELOT Overlapped Cell on Tissue Dataset for Histopathology
RobustNeRF Ignoring Distractors With Robust Losses
CUDA Convolution-Based Unlearnable Datasets
Re-IQA Unsupervised Learning for Image Quality Assessment in the Wil
CLIP for All Things Zero-Shot Sketch-Based Image Retrieval Fine-Grained o
Exploiting Unlabelled Photos for Stronger Fine-Grained SBI
Pic2Word Mapping Pictures to Words for Zero-Shot Composed Image Retrieval
Prefix Conditioning Unifies Language and Label Supervision
RUST Latent Neural Scene Representations From Unposed Imagery
CLIP-Sculptor Zero-Shot Generation of High-Fidelity and Diverse Shapes From Natural
Structured Kernel Estimation for Photon-Limited Deconvolution
WIRE Wavelet Implicit Neural Representations
Simulated Annealing in Early Layers Leads to Better Generalization
Fake It Till You Make It Learning Transferable Representations From
Parameter Efficient Local Implicit Image Function Network for Face Segmentation
OrienterNet Visual Localization in 2D Public Maps With Neural Matching
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
Prompt-Guided Zero-Shot Anomaly Action Recognition Using Pretrained Deep Skeleton Features
Unsupervised Intrinsic Image Decomposition With LiDAR Intensity
OReX Object Reconstruction From Planar Cross-Sections Using Neural Fields
Re-GAN Data-Efficient GANs Training via Architectural Reconfiguration
A Large-Scale Robustness Analysis of Video Action Recognition Models
Safe Latent Diffusion Mitigating Inappropriate Degeneration in Diffusion Models
Simple Cues Lead to a Strong Multi-Object Track
HuManiFlow Ancestor-Conditioned Normalising Flows on SO3 Manifolds for Human Pos
Independent Component Alignment for Multi-Task Learning
Leveraging Hidden Positives for Unsupervised Semantic Segmentation
AVFormer Injecting Vision Into Frozen Speech Models for Zero-Shot AV-AS
MixNeRF Modeling a Ray With Mixture Density for Novel View
DeAR Debiasing Vision-Language Models With Additive Residuals
HouseDiffusion Vector Floorplan Generation via a Diffusion Model With Discret
Similarity Maps for Self-Training Weakly-Supervised Phrase Grounding
HaLP Hallucinating Latent Positives for Skeleton-Based Self-Supervised Learning of Actions
CLIP2Protect Protecting Facial Privacy Using Text-Guided Makeup via Adversarial Latent
Evading Forensic Classifiers With Attribute-Conditioned Adversarial Faces
Incrementer Transformer for Class-Incremental Semantic Segmentation With Knowledge Distillation Focusing
Joint Video Multi-Frame Interpolation and Deblurring Under Unknown Exposure Tim
Post-Training Quantization on Diffusion Models
Detecting and Grounding Multi-Modal Media Manipulation
Prompting Large Language Models With Answer Heuristics for Knowledge-Based Visual
ReasonNet End-to-End Driving With Temporal and Global Reasoning
Tensor4D Efficient Neural 4D Decomposition for High-Fidelity Dynamic Reconstruction an
Cant Steal Cont-Steal Contrastive Stealing Attacks Against Image Encoders
PixHt-Lab Pixel Height Based Light Effect Generation for Image Compositing
Structure Aggregation for Cross-Spectral Stereo Image Guided Denoising
DeepMAD Mathematical Architecture Design for Deep Convolutional Neural Network
DiffTalk Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation
DiGA Distil To Generalize and Then Adapt for Domain Adaptiv
Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation With
Equiangular Basis Vectors
Fine-Grained Audible Video Description
GINA-3D Learning To Generate Implicit Neural Assets in the Wil
Global-to-Local Modeling for Video-Based 3D Human Pose and Shape Estimation
Learning Human Mesh Recovery in 3D Scenes
LidarGait Benchmarking 3D Gait Recognition With Point Clouds
MoStGAN-V Video Generation With Temporal Motion Styles
PointCMP Contrastive Mask Prediction for Self-Supervised Learning on Point Clou
Progressive Transformation Learning for Leveraging Virtual Images in Training
Self-Supervised 3D Scene Flow Estimation Guided by Superpoints
StructVPR Distill Structural Knowledge With Weighting Samples for Visual Plac
X-Avatar Expressive Human Avatars
PLIKS A Pseudo-Linear Inverse Kinematic Solver for 3D Human Body
Listening Human Behavior 3D Human Pose Estimation With Acoustic Signals
Learning Decorrelated Representations Efficiently Using Fast Fourier Transform
Diffusion-Based Signed Distance Fields for 3D Shape Generation
Deep Depth Estimation From Thermal Imag
Local Connectivity-Based Density Estimation for Face Clustering
NIPQ Noise Proxy-Based Integrated Pseudo-Quantization
SDC-UDA Volumetric Unsupervised Domain Adaptation Framework for Slice-Direction Continuous Cross-Modality
FlowFormer Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation
Learning 3D-Aware Image Synthesis With Unknown Pose Distribution
Make Landscape Flatter in Differentially Private Federated Learning
Matching Is Not Enough A Two-Stage Framework for Category-Agnostic Pos
Top-Down Visual Attention From Analysis by Synthesis
Transformer Scale Gate for Semantic Segmentation
TriDet Temporal Action Detection With Relative Boundary Modeling
GraVoS Voxel Selection for 3D Point-Cloud Detection
3D Neural Field Generation Using Triplane Diffusion
Learning Common Rationale To Improve Self-Supervised Representation for Fine-Grained Visual
Unsupervised Volumetric Animation
Panoptic Lifting for 3D Scene Understanding With Neural Fields
Adaptive Annealing for Robust Geometric Estimation
Unsupervised Object Localization Observing the Background To Discover Objects
Depth Estimation From Camera Image and mmWave Radar Point Clou
EVAL Explainable Video Anomaly Localization
High-Fidelity Guided Image Synthesis With Latent Diffusion Models
Multi Domain Learning for Motion Magnification
Polynomial Implicit Neural Representations for Large Diverse Datasets
Common Pets in 3D Dynamic New-View Synthesis of Real-Life Deformabl
SparsePose Sparse-View Camera Pose Regression and Refinement
Angelic Patches for Improving Third-Party Object Detector Performanc
Fully Self-Supervised Depth Estimation From Defocus Clu
CODA-Prompt COntinual Decomposed Attention-Based Prompting for Rehearsal-Free Continual Learning
ConStruct-VL Data-Free Continual Structured VL Concepts Learning
Visual Prompt Tuning for Generative Transfer Learning
Integral Neural Networks
Role of Transients in Two-Bounce Non-Line-of-Sight Imaging
Diffusion Art or Digital Forgery Investigating Data Replication in Diffusion
Advancing Visual Grounding With Scene Knowledge Benchmark and Metho
DIFu Depth-Guided Implicit Function for Clothed Human Reconstruction
EcoTTA Memory-Efficient Continual Test-Time Adaptation via Self-Distilled Regularization
Efficient Hierarchical Entropy Model for Learned Point Cloud Compression
Learning With Fantasy Semantic-Aware Virtual Contrastive Constraint for Few-Shot Class-Incremental
Multi-Mode Online Knowledge Distillation for Self-Supervised Visual Representation Learning
ObjectStitch Object Compositing With Diffusion Model
OPE-SR Orthogonal Position Encoding for Designing a Parameter-Free Upsampling Modul
Optimal Proposal Learning for Deployable End-to-End Pedestrian Detection
Optimization-Inspired Cross-Attention Transformer for Compressive Sensing
Robust Single Image Reflection Removal Against Adversarial Attacks
Unsupervised Deep Asymmetric Stereo Matching With Spatially-Adaptive Self-Similarity
SinGRAF Learning a 3D Generative Radiance Field for a Singl
MarginMatch Improving Semi-Supervised Learning with Pseudo-Margins
Non-Contrastive Unsupervised Learning of Physiological Signals From Video
Unicode Analogies An Anti-Objectivist Visual Reasoning Challeng
How You Feelin Learning Emotions and Mental States in Movi
Learning Articulated Shape With Keypoint Pseudo-Labels From Web Images
CrOC Cross-View Online Clustering for Dense Visual Representation Learning
The Wisdom of Crowds Temporal Progressive Attention for Early Action
BASiS Batch Aligned Spectral Embedding Spac
Omnimatte3D Associating Objects and Their Effects in Unconstrained Monocular Video
ScanDMM A Deep Markov Model of Scanpath Prediction for 360deg
Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment
Bi-Directional Feature Fusion Generative Adversarial Network for Ultra-High Resolution Pathological
BKinD-3D Self-Supervised 3D Keypoint Discovery From Multi-View Videos
Co-Speech Gesture Synthesis by Reinforcement Learning With Contrastive Pre-Trained Rewards
Consistent Direct Time-of-Flight Video Depth Super-Resolution
Correspondence Transformers With Asymmetric Feature Learning and Matching Flow Super-Resolution
Decoupling Learning and Remembering A Bilevel Memory Framework With Knowledg
DeFeeNet Consecutive 3D Human Motion Prediction With Deviation Feedback
Event-Based Frame Interpolation With Ad-Hoc Deblurring
Hierarchical Semantic Contrast for Scene-Aware Video Anomaly Detection
Indiscernible Object Counting in Underwater Scenes
Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning
Learning Semantic-Aware Disentangled Representation for Flexible 3D Human Body Editing
Masked Motion Encoding for Self-Supervised Video Representation Learning
MISC210K A Large-Scale Dataset for Multi-Instance Semantic Correspondenc
MOSO Decomposing MOtion Scene and Object for Video Prediction
Next3D Generative Neural Texture Rasterization for 3D-Aware Head Avatars
Pose Synchronization Under Multiple Pair-Wise Relative Poses
RefTeacher A Strong Baseline for Semi-Supervised Referring Expression Comprehension
Regularizing Second-Order Influences for Continual Learning
Rethinking Domain Generalization for Face Anti-Spoofing Separability and Alignment
Single Image Backdoor Inversion via Robust Smoothed Classifiers
Stimulus Verification Is a Universal and Effective Sampler in Multi-Modal
TRACE 5D Temporal Regression of Avatars With Dynamic Cameras in
Ultrahigh Resolution Image/Video Matting With Spatio-Temporal Sparsity
MixSim A Hierarchical Framework for Mixed Reality Traffic Simulation
S3C Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning
Language Adaptive Weight Generation for Multi-Task Visual Grounding
Physics-Driven Diffusion Models for Impact Sound Synthesis From Videos
Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information
High-Resolution Image Reconstruction With Latent Diffusion Models From Human Brain
Breaching FedMD Image Recovery via Paired-Logits Inversion Attack
Visual Atoms Pre-Training Vision Transformers With Sinusoidal Waves
Efficient View Synthesis and 3D-Based Multi-Frame Denoising With Multiplane Featu
3D Human Pose Estimation With Spatio-Temporal Criss-Cross Attention
ABLE-NeRF Attention-Based Rendering With Learnable Embeddings for Neural Radiance Fiel
A New Benchmark On the Utility of Synthetic Data With
Contrastive Grouping With Transformer for Referring Image Segmentation
DETR With Additional Global Aggregation for Cross-Domain Weakly Supervised Object
Fair Scratch Tickets Finding Fair Sparse Networks Without Weight Training
FLAG3D A 3D Fitness Activity Dataset With Language Instruction
Graph Transformer GANs for Graph-Constrained House Generation
HumanBench Towards General Human-Centric Perception With Projector Assisted Pretraining
Intrinsic Physical Concepts Discovery With Object-Centric Predictive Models
Label Information Bottleneck for Label Enhancement
Master Meta Style Transformer for Controllable Zero-Shot and Few-Shot Artistic
NeuMap Neural Coordinate Mapping by Auto-Transdecoder for Camera Localization
Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation
Parts2Words Learning Joint Embedding of Point Clouds and Texts by
Uncertainty-Aware Unsupervised Image Deblurring With Deep Residual Prio
Unifying Vision Text and Layout for Universal Document Processing
Visual Recognition by Request
Weakly Supervised Posture Mining for Fine-Grained Classification
What Happened 3 Seconds Ago Inferring the Past With Thermal
You Need Multiple Exiting Dynamic Early Exiting for Accelerating Unifi
Interactive and Explainable Region-Guided Radiology Report Generation
Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigg
Distilling Neural Fields for Real-Time Articulated Shape Reconstruction
Hierarchical Semantic Correspondence Networks for Video Paragraph Grounding
Language-Guided Audio-Visual Source Separation via Trimodal Consistency
Learning on Gradients Generalized Artifacts Representation for GAN-Generated Images Detection
Sample-Level Multi-View Graph Clustering
SMOC-Net Leveraging Camera Pose for Self-Supervised Monocular Object Pose Estimation
Temporal Attention Unit Towards Efficient Spatiotemporal Predictive Learning
Boosting Transductive Few-Shot Fine-Tuning With Margin-Based Uncertainty Weighting and Probability
GALIP Generative Adversarial CLIPs for Text-to-Image Synthesis
Siamese Image Modeling for Self-Supervised Vision Representation Learning
Weakly Supervised Monocular 3D Object Detection Using Multi-View Projection an
ViTs for SITS Vision Transformers for Satellite Image Time Series
Jedi Entropy-Based Localization and Removal of Adversarial Patches
Logical Implications for Visual Question Answering Consistency
CaPriDe Learning Confidential and Private Decentralized Learning Based on Encryption-Friendly
Defending Against Patch-Based Backdoor Attacks on Self-Supervised Learning
Full or Weak Annotations An Adaptive Strategy for Budget-Constrained Annotation
FLEX Full-Body Grasping Without Full-Body Grasps
Generating Part-Aware Editable 3D Shapes Without 3D Supervision
Learning To Zoom and Unzoom
CABM Content-Aware Bit Mapping for Single Image Super-Resolution Network With
GeoMAE Masked Geometric Target Prediction for Self-Supervised Point Cloud Pre-Training
GradICON Approximate Diffeomorphisms via Gradient Inverse Consistency
Integrally Pre-Trained Transformer Pyramid Networks
Manipulating Transfer Learning for Property Inferenc
Modeling the Distributional Uncertainty for Salient Object Detection Models
Multi-Object Manipulation via Object-Centric Neural Scattering Functions
ResFormer Scaling ViTs With Multi-Resolution Training
Robot Structure Prior Guided Temporal Attention for Camera-to-Robot Pose Estimation
Trainable Projected Gradient Method for Robust Fine-Tuning
Revisiting Reverse Distillation for Anomaly Detection
Energy-Efficient Adaptive 3D Sensing
ORCa Glossy Objects As Radiance-Field Cameras
Breaking the Object in Video Object Segmentation
TeSLA Test-Time Self-Learning With Automatic Adversarial Augmentation
Seeing Through the Glass Neural 3D Reconstruction of Object Insi
ReLight My NeRF A Dataset for Novel View Synthesis an
NeRF-Supervised Deep Stereo
Co-Training 2L Submodels for Visual Recognition
3D Human Pose Estimation via Intuitive Physics
Edges to Shapes to Concepts Adversarial Augmentation for Robust Vision
Hubs and Hyperspheres Reducing Hubness and Improving Transductive Few-Shot Learning
On the Effects of Self-Supervision and Contrastive Alignment in D
FREDOM Fairness Domain Adaptation Approach to Semantic Scene Understanding
SPARF Neural Radiance Fields From Sparse and Noisy Poses
CLIPPO Image-and-Language Understanding From Pixels Only
Consistent View Synthesis With Pose-Guided Diffusion Models
EDGE Editable Dance Generation From Music
Improving Visual Representation Learning Through Perceptual Understanding
Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
SUDS Scalable Urban Dynamic Scenes
A Bag-of-Prototypes Representation for Dataset-Level Applications
Learning From Noisy Labels With Decoupled Meta Label Purifi
Learning With Noisy Labels via Self-Supervised Adversarial Noisy Masking
Toward Accurate Post-Training Quantization for Image Super Resolution
Visual Query Tuning Towards Effective Usage of Intermediate Representations fo
DeGPR Deep Guided Posterior Regularization for Multi-Class Cell Detection an
Learning Situation Hyper-Graphs for Video Question Answering
SCADE NeRFs from Space Carving With Ambiguity-Aware Depth Estimates
Dynamic Inference With Grounding Based Vision and Language Models
Patch-Craft Self-Supervised Training for Correlated Image Denoising
ASPnet Action Segmentation With Shared-Private Representation of Multiple Data Sources
Tracking Through Containers and Occluders in the Wil
CUF Continuous Upsampling Filters
MobileOne An Improved One Millisecond Mobile Backbon
GeneCIS A Benchmark for General Conditional Image Similarity
Test Time Adaptation With Regularized Loss for Weakly Supervised Salient
JRDB-Pose A Large-Scale Dataset for Multi-Person Pose Estimation and Tracking
CLIP the Gap A Single Domain Generalization Approach for Object
Learning Transformations To Reduce the Geometric Shift in Object Detection
PIVOT Prompting for Video Continual Learning
Connecting Vision and Language With Video Localized Narratives
Multi-Sensor Large-Scale Dataset for Multi-View 3D Reconstruction
A-Cap Anticipation Captioning With Commonsense Knowledg
Instance Relation Graph Guided Source-Free Domain Adaptive Object Detection
Mask-Free OVIS Open-Vocabulary Instance Segmentation Without Manual Mask Annotations
EDICT Exact Diffusion Inversion via Coupled Transformations
Teaching Matters Investigating the Role of Supervision in Vision Transformers
Gated Stereo Joint Depth Estimation From Gated and Wide-Baseline Activ
3Mformer Multi-Order Multi-Mode Transformer for Skeletal Action Recognition
Accelerating Vision-Language Pretraining With Free Language Modeling
Adapting Shortcut With Normalizing Flow An Efficient Tuning Framework fo
Adaptive Patch Deformation for Textureless-Resilient Multi-View Stereo
All in One Exploring Unified Video-Language Pre-Training
AltFreezing for More General Video Face Forgery Detection
ALTO Alternating Latent Topologies for Implicit 3D Reconstruction
Are We Ready for Vision-Centric Driving Streaming Perception The ASA
ARO-Net Learning Implicit Fields From Anchored Radial Observations
AttriCLIP A Non-Incremental Learner for Incremental Knowledge Learning
AutoRecon Automated 3D Object Discovery and Reconstruction
A Practical Stereo Depth System for Smart Glasses
A Practical Upper Bound for the Worst-Case Attribution Deviations
BAD-NeRF Bundle Adjusted Deblur Neural Radiance Fields
Balancing Logit Variation for Long-Tailed Semantic Segmentation
BEV-LaneDet An Efficient 3D Lane Detection Based on Virtual Cam
Bi-Directional Distribution Alignment for Transductive Zero-Shot Learning
Bi-LRFusion Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection
Binary Latent Diffusion
CF-Font Content Fusion for Few-Shot Font Generation
Clothed Human Performance Capture With a Double-Layer Neural Radiance Fields
Co-SLAM Joint Coordinate and Sparse Parametric Encodings for Neural Real-Tim
Compacting Binary Neural Networks by Sparse Kernel Selection
Complete 3D Human Reconstruction From a Single Incomplete Imag
Compression-Aware Video Super-Resolution
Conflict-Based Cross-View Consistency for Semi-Supervised Semantic Segmentation
Consistent-Teacher Towards Reducing Inconsistent Pseudo-Targets in Semi-Supervised Object Detection
Context-Aware Pretraining for Efficient Blind Image Decomposition
Cooperation or Competition Avoiding Player Domination for Multi-Target Robustness vi
Cut and Learn for Unsupervised Object Detection and Instance Segmentation
DaFKD Domain-Aware Federated Knowledge Distillation
Decoupling-and-Aggregating for Image Exposure Correction
DeepVecFont-v2 Exploiting Transformers To Synthesize Vector Fonts With Higher Quality
Deep Arbitrary-Scale Image Super-Resolution via Scale-Equivariance Pursuit
Deep Factorized Metric Learning
Deep Hashing With Minimal-Distance-Separated Hash Centers
Deep Learning of Partial Graph Matching via Differentiable Top-K
Detecting Everything in the Open World Towards Universal Object Detection
Dionysus Recovering Scene Structures by Dividing Into Semantic Pieces
DR2 Diffusion-Based Robust Degradation Remover for Blind Face Restoration
DSVT Dynamic Sparse Voxel Transformer With Rotated Sets
Dynamically Instance-Guided Adaptation A Backward-Free Approach for Test-Time Domain Adaptiv
Dynamic Graph Learning With Content-Guided Spatial-Frequency Relation Reasoning for Deepfak
EfficientSCI Densely Connected Network With Space-Time Factorization for Large-Scale Video
F2-NeRF Fast Neural Radiance Field Training With Free Camera Trajectories
FeatureBooster Boosting Feature Descriptors With a Lightweight Neural Network
Feature Alignment and Uniformity for Test Time Adaptation
FEND A Future Enhanced Distribution-Aware Contrastive Learning Framework for Long-Tail
Few-Shot Learning With Visual Distribution Calibration and Cross-Modal Distribution Alignment
Flow Supervision for Deformable NeRF
FrustumFormer Adaptive Instance-Aware Resampling for Multi-View 3D Detection
Generalist Decoupling Natural and Robust Generalization
Generalized UAV Object Detection via Frequency Domain Disentanglement
Glocal Energy-Based Learning for Few-Shot Open-Set Recognition
Gradient-Based Uncertainty Attribution for Explainable Bayesian Deep Learning
Hard Patches Mining for Masked Image Modeling
Hunting Sparsity Density-Guided Contrastive Learning for Semi-Supervised Semantic Segmentation
HypLiLoc Towards Effective LiDAR Pose Regression With Hyperbolic Fusion
Imagen Editor and EditBench Advancing and Evaluating Text-Guided Image Inpainting
Images Speak in Images A Generalist Painter for In-Context Visual
Image as a Foreign Language BEiT Pretraining for Vision an
Image Cropping With Spatial-Aware Feature and Rank Consistency
Improving Generalization of Meta-Learning With Inverted Regularization at Inner-Level
Improving Robust Generalization by Direct PAC-Bayesian Bound Minimization
InternImage Exploring Large-Scale Vision Foundation Models With Deformable Convolutions
JAWS Just a Wild Shot for Cinematic Transfer in Neural
LANA A Language-Capable Navigator for Instruction Following and Generation
Learning Bottleneck Concepts in Image Classification
Learning Conditional Attributes for Compositional Zero-Shot Learning
Learning To Detect and Segment for Open Vocabulary Object Detection
Learning Transformation-Predictive Representations for Detection and Description of Local Features
LG-BPN Local and Global Blind-Patch Network for Self-Supervised Real-World Denoising
LiDAR2Map In Defense of LiDAR-Based Semantic Map Construction Using Onlin
LipFormer High-Fidelity and Generalizable Talking Face Generation With a Pre-Learn
Look Before You Match Instance Understanding Matters in Video Object
LP-DIF Learning Local Pattern-Specific Deep Implicit Function for 3D Objects
Masked Image Modeling With Local Multi-Scale Reconstruction
Masked Video Distillation Rethinking Masked Feature Modeling for Self-Supervised Video
MCF Mutual Correction Framework for Semi-Supervised Medical Image Segmentation
MDL-NAS A Joint Multi-Domain Learning Framework for Vision Transform
MeMaHand Exploiting Mesh-Mano Interaction for Single Image Two-Hand Reconstruction
MetaMix Towards Corruption-Robust Continual Learning With Temporally Self-Adaptive Data Transformation
MetaViewer Towards a Unified Multi-View Representation
METransformer Radiology Report Generation by Transformer With Multiple Learnable Expert
MHPL Minimum Happy Points Learning for Active Source Free Domain
Model Barrier A Compact Un-Transferable Isolation Domain for Model Intellectual
MoLo Motion-Augmented Long-Short Contrastive Learning for Few-Shot Action Recognition
Multi-Agent Automated Machine Learning
Multi-Modal Learning With Missing Modality via Shared-Specific Feature Modelling
Multilateral Semantic Relations Modeling for Image Text Retrieval
Multimodal Industrial Anomaly Detection via Hybrid Fusion
NeMo Learning 3D Neural Motion Fields From Multiple Video Instances
Neural Fields Meet Explicit Geometric Representations for Inverse Rendering o
Neural Koopman Pooling Control-Inspired Temporal Dynamics Encoding for Skeleton-Based Action
Neural Residual Radiance Fields for Streamably Free-Viewpoint Videos
NeuWigs A Neural Dynamic Model for Volumetric Hair Capture an
Non-Line-of-Sight Imaging With Signal Superresolution Network
Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection
Omni Aggregation Networks for Lightweight Image Super-Resolution
On Calibrating Semantic Segmentation Models Analyses and an Algorithm
On the Pitfall of Mixup for Uncertainty Calibration
Open-Set Fine-Grained Retrieval via Prompting Vision-Language Evaluato
Out-of-Distributed Semantic Pruning for Robust Semi-Supervised Learning
PDPP Projected Diffusion for Procedure Planning in Instructional Videos
PET-NeuS Positional Encoding Tri-Planes for Neural Surfaces
Pixels Regions and Objects Multiple Enhancement for Salient Object Detection
PlaneDepth Self-Supervised Depth Estimation via Orthogonal Planes
Position-Guided Text Prompt for Vision-Language Pre-Training
Practical Network Acceleration With Tiny Sets
Privacy-Preserving Adversarial Facial Features
Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis
Propagate and Calibrate Real-Time Passive Non-Line-of-Sight Tracking
ProphNet Efficient Agent-Centric Motion Forecasting With Anchor-Informed Proposals
ProTeGe Untrimmed Pretraining for Video Temporal Grounding by Video Temporal
PyPose A Library for Robot Learning With Physics-Based Optimization
Raw Image Reconstruction With Learned Compact Metadat
Rethinking the Correlation in Few-Shot Segmentation A Buoys View
Rethinking the Learning Paradigm for Dynamic Facial Expression Recognition
RIFormer Keep Your Vision Backbone Effective but Removing Token Mix
Robust Multiview Point Cloud Registration With Reliable Pose Graph Initialization
RODIN A Generative Model for Sculpting 3D Digital Avatars Using
Scene-Aware Egocentric 3D Human Pose Estimation
Score Jacobian Chaining Lifting Pretrained 2D Diffusion Models for 3D
Seeing What You Said Talking Face Generation Guided by
Selective Structured State-Spaces for Long-Form Video Understanding
Semantic Scene Completion With Cleaner Sel
Semi-Supervised Parametric Real-World Image Harmonization
Sharpness-Aware Gradient Matching for Domain Generalization
SmartAssign Learning a Smart Knowledge Assignment Strategy for Deraining an
Spatial-Frequency Mutual Learning for Face Super-Resolution
SunStage Portrait Reconstruction and Relighting Using the Sun as
Task Difficulty Aware Parameter Allocation Regularization for Lifelong Learning
Towards Domain Generalization for Multi-View 3D Object Detection in Bird-Eye-View
Towards Professional Level Crowd Annotation of Expert Domain Dat
Towards Transferable Targeted Adversarial Examples
Turning Strengths Into Weaknesses A Certified Robustness Inspired Attack Framework
Two-Stream Networks for Weakly-Supervised Temporal Action Localization With Semantic-Aware Mechanisms
VideoMAE V2 Scaling Video Masked Autoencoders With Dual Masking
VL-SAT Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph
YOLOv7 Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors
Zero-Shot Pose Transfer for Unrigged Stylized 3D Characters
Active Exploration of Multimodal Complementarity for Few-Shot Action Recognition
Learning Neural Duplex Radiance Fields for Real-Time View Synthesis
Vita-CLIP Video and Text Adaptive CLIP via Multimodal Prompting
Virtual Occlusions Through Implicit Depth
Power Bundle Adjustment for Large-Scale 3D Reconstruction
Removing Objects From Neural Radiance Fields
Masked Autoencoding Does Not Help Natural Language Supervision at Scal
Adaptive Graph Convolutional Subspace Clustering
Autoregressive Visual Tracking
CFA Class-Wise Calibrated Fair Adversarial Training
Enhancing the Self-Universality for Transferable Targeted Attacks
Fine-Grained Classification With Noisy Labels
Focused and Collaborative Feedback Integration for Interactive Image Segmentation
iCLIP Bridging Image Classification and Contrastive Language-Image Pre-Training for Visual
Inferring and Leveraging Parts From Object Shape for Improving Semantic
Joint Token Pruning and Squeezing Towards More Aggressive Compression o
LEGO-Net Learning Regular Rearrangements of Objects in Rooms
MMANet Margin-Aware Distillation and Modality-Aware Regularization for Incomplete Multimodal Learning
Physically Adversarial Infrared Patches With Learnable Shapes and Locations
Sparsifiner Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers
Super-Resolution Neural Operato
TAPS3D Text-Guided 3D Textured Shape Generation From Pseudo Supervision
Text-Guided Unsupervised Latent Transformation for Multi-Attribute Image Manipulation
Towards Realistic Long-Tailed Semi-Supervised Learning Consistency Is All You N
3D Human Keypoints Estimation From Point Clouds in the Wil
Event-Based Blurry Frame Interpolation Under Blind Exposu
PersonNeRF Personalized Reconstruction From Photo Collections
BundleSDF Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects
CAP-VSTNet Content Affinity Preserved Versatile Style Trans
Crowd3D Towards Hundreds of People Reconstruction From a Single Imag
DIP Dual Incongruity Perceiving Network for Sarcasm Detection
Hierarchical Temporal Transformer for 3D Hand Pose Estimation and Action
Highly Confident Local Structure Based Consensus Graph Learning for Incomplet
Learnable Skeleton-Aware 3D Point Cloud Sampling
Recognizing Rigid Patterns of Unlabeled Point Clouds by Complete an
Black-Box Sparse Adversarial Attack via Multi-Objective Optimisation
Behind the Scenes Density Fields for Single View Reconstruction
Initialization Noise in Image Gradients and Saliency Maps
Heat Diffusion Based Multi-Scale and Geometric Structure-Aware Transformer for Mesh
ConvNeXt V2 Co-Designing and Scaling ConvNets With Masked Autoencoders
Differentiable Shadow Mapping for Efficient Inverse Graphics
Aligning Bag of Regions for Open-Vocabulary Object Detection
Asymmetric Feature Fusion for Image Retrieval
Attention-Based Point Cloud Edge Sampling
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition With Pre-Trained Vision-Languag
Boosting Detection in Crowd Analysis via Underutilized Output Features
Cap4Video What Can Auxiliary Captions Do for Text-Video Retrieval
CHMATCH Contrastive Hierarchical Matching and Robust Adaptive Threshold Boosted Semi-Supervis
Co-Salient Object Detection With Uncertainty-Aware Group Exchange-Masking
CORA Adapting CLIP for Open-Vocabulary Detection With Region Prompting an
Deep Stereo Video Inpainting
Discriminating Known From Unknown Objects via Structure-Enhanced Recurrent Variational AutoEnco
DropMAE Masked Autoencoders With Spatial-Attention Dropout for Tracking Tasks
EDA Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding
Fast Point Cloud Generation With Straight Flows
GANHead Towards Generative Animatable Neural Head Avatars
High-Fidelity 3D Face Generation From Natural Language Descriptions
Incremental 3D Semantic Scene Graph Prediction From RGB Sequences
Learning Semantic-Aware Knowledge Guidance for Low-Light Image Enhancement
Logical Consistency and Greater Descriptive Power for Facial Hair Attribut
MagicPony Learning Articulated 3D Animals in the Wil
Masked Scene Contrast A Scalable Framework for Unsupervised 3D Representation
Multiview Compressive Coding for 3D Reconstruction
NeFII Inverse Rendering for Reflectance Decomposition With Near-Field Indirect Illumination
Neural Fourier Filter Bank
NewsNet A Novel Dataset for Hierarchical Temporal Segmentation
OmniObject3D Large-Vocabulary 3D Object Dataset for Realistic Perception Reconstruction an
Pix2map Cross-Modal Retrieval for Inferring Street Maps From Images
PointConvFormer Revenge of the Point-Based Convolution
Referring Multi-Object Tracking
RIDCP Revitalizing Real Image Dehazing via High-Quality Codebook Priors
SCoDA Domain Adaptive Shape Completion for Real Scans
Semi-Supervised Stereo-Based 3D Object Detection via Cross-View Consensus
Semi-Supervised Video Inpainting With Cycle Consistency Constraints
Sparsely Annotated Semantic Segmentation With Adaptive Gaussian Mixtures
Spatiotemporal Self-Supervised Learning for Point Clouds in the Wil
STMixer A One-Stage Sparse Action Detecto
Switchable Representation Learning Framework With Self-Compatibility
Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models
Unsupervised Visible-Infrared Person Re-Identification via Progressive Graph Matching and Alternat
Virtual Sparse Convolution for Multimodal 3D Object Detection
DiffusioNeRF Regularizing Neural Radiance Fields With Denoising Diffusion Models
SQUID Deep Feature In-Painting for Unsupervised Anomaly Detection
Neural Lens Modeling
3D Semantic Segmentation in the Wild Learning Generalized Models fo
CutMIB Boosting Light Field Super-Resolution via Multi-View Image Blending
DLBD A Self-Supervised Direct-Learned Binary Descripto
Endpoints Weight Fusion for Class Incremental Semantic Segmentation
Level-S2fM Structure From Motion on Neural Level Set of Implicit
LSTFE-Net Long Short-Term Feature Enhancement Network for Video Small Object Detection
Masked Images Are Counterfactual Samples for Robust Fine-Tuning
SCPNet Semantic Scene Completion on Point Clou
Structured Sparsity Learning for Efficient Video Super-Resolution
Towards Effective Visual Representations for Partial-Label Learning
VecFontSDF Learning To Reconstruct and Synthesize High-Quality Vector Fonts vi
Active Finetuning Exploiting Annotation Budget in the Pretraining-Finetuning Paradigm
Adversarially Robust Neural Architecture Search for Graph Neural Networks
An Actor-Centric Causality Graph for Asynchronous Temporal Inference in Grou
Blemish-Aware and Progressive Face Retouching With Limited Paired Dat
Category Query Learning for Human-Object Interaction Classification
DINER Disorder-Invariant Implicit Neural Representation
Exploring and Exploiting Uncertainty for Incomplete Multi-View Classification
GP-VTON Towards General Purpose Virtual Try-On via Collaborative Local-Flow Global-Parsing
High-Fidelity 3D GAN Inversion by Pseudo-Multi-View Optimization
MAESTER Masked Autoencoder Guided Segmentation at Pixel Resolution for Accurat
OmniVidar Omnidirectional Depth Estimation From Multi-Fisheye Images
On Data Scaling in Masked Image Modeling
Poly-PC A Polyhedral Network for Multiple Point Cloud Tasks at
RA-CLIP Retrieval Augmented Contrastive Language-Image Pre-Training
Revealing the Dark Secrets of Masked Image Modeling
SmartBrush Text and Shape Guided Object Inpainting With Diffusion Model
Towards a Smaller Student Capacity Dynamic Distillation for Efficient Imag
Toward Stable Interpretable and Lightweight Hyperspectral Super-Resolution
Unpaired Image-to-Image Translation With Shortest Path Regularization
VideoTrack Learning To Track Objects via Video Transform
Visibility Aware Human-Object Interaction Tracking From Single RGB Cam
CodeTalker Speech-Driven 3D Facial Animation With Discrete Motion Prio
SVFormer Semi-Supervised Video Transformer for Action Recognition
CAPE Camera View Position Embedding for Multi-View 3D Object Detection
CASP-Net Rethinking Video Saliency Prediction From an Audio-Visual Consistency Perceptual
FedDM Iterative Distribution Matching for Communication-Efficient Federated Learning
Learning Compact Representations for LiDAR Completion and Generation
Neural Map Prior for Autonomous Driving
Similarity Metric Learning for RGB-Infrared Group Re-Identification
ECON Explicit Clothed Humans Optimized via Normal Integration
Egocentric Video Task Translation
Freestyle Layout-to-Image Synthesis
GarmentTracking Category-Level Garment Pose Tracking
IMP Iterative Matching and Pose Estimation With Adaptive Pooling
SFD2 Semantic-Guided Feature Detection and Description
Stare at What You See Masked Image Modeling Without Reconstruction
Abstract Visual Reasoning An Algebraic Approach for Solving Ravens Progressiv
A Unified Spatial-Angular Structured Light for Single-View Acquisition of Sh
Bias-Eliminating Augmentation Learning for Debiased Federated Learning
Binarizing Sparse Convolutional Networks for Efficient Point Cloud Analysis
Constructing Deep Spiking Neural Networks From Artificial Neural Networks With
CXTrack Improving 3D Point Cloud Tracking With Contextual Information
DisCoScene Spatially Disentangled Generative Radiance Fields for Controllable 3D-Aware Scen
Dream3D Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Imag
Dynamic Coarse-To-Fine Learning for Oriented Tiny Object Detection
EqMotion Equivariant Multi-Agent Motion Prediction With Invariant Interaction Reasoning
Gaussian Label Distribution Learning for Spherical Image Object Detection
Generating Features With Increased Crop-Related Diversity for Few-Shot Object Detection
Grid-Guided Neural Radiance Fields for Large Urban Scenes
H2ONet Hand-Occlusion-and-Orientation-Aware Network for Real-Time 3D Hand Mesh Reconstruction
HandsOff Labeled Dataset Generation With No Additional Human Annotations
High-Fidelity Generalized Emotional Talking Face Generation With Multi-Modal Emotion Spac
Iterative Geometry Encoding Volume for Stereo Matching
JacobiNeRF NeRF Shaping With Mutual Information Gradients
Learning Dynamic Style Kernels for Artistic Style Trans
Learning Imbalanced Data With Vision Transformers
Learning Multi-Modal Class-Specific Tokens for Weakly Supervised Dense Object Localization
Learning Open-Vocabulary Semantic Segmentation Models From Natural Language Supervision
Learning To Generate Image Embeddings With User-Level Differential Privacy
Low-Light Image Enhancement via Structure Modeling and Guidanc
MEDIC Remove Model Backdoors via Importance Driven Cloning
Meta Compositional Referring Expression Segmentation
MM-3DScene 3D Scene Understanding by Customizing Masked Modeling With Informative-Preserv
Multi-View Adversarial Discriminator Mine the Non-Causal Factors for Object Detection
MV-JAR Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training
NeuralLift-360 Lifting an In-the-Wild 2D Photo to a 3D Object
OmniAvatar Geometry-Guided Controllable 3D Head Synthesis
Open-Vocabulary Panoptic Segmentation With Text-to-Image Diffusion Models
PIDNet A Real-Time Semantic Segmentation Network Inspired by PID Controllers
Probabilistic Knowledge Distillation of Face Ensembles
Q-DETR An Efficient Low-Bit Quantized Detection Transform
Seeing Electric Network Frequency From Events
Side Adapter Network for Open-Vocabulary Semantic Segmentation
Toward RAW Object Detection A New Benchmark and a New
Uncovering the Missing Pattern Unified Framework Towards Trajectory Imputation an
UniDexGrasp Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation
Unsupervised 3D Shape Reconstruction by Part Retrieval and Assembly
Unsupervised Domain Adaption With Pixel-Level Discriminator for Image-Aware Layout Generation
V2V4Real A Real-World Large-Scale Dataset for Vehicle-to-Vehicle Cooperative Perception
Video Dehazing via a Multi-Range Temporal Alignment Network With Physical
Visual-Tactile Sensing for In-Hand Object Reconstruction
Where Is My Wallet Modeling Object Proposal Sets for Egocentric
Zero-Shot Dual-Lens Super-Resolution
Zero-Shot Object Counting
Habitat-Matterport 3D Semantics Dataset
Behavioral Analysis of Vision-and-Language Navigation Agents
BEVFormer v2 Adapting Modern Image Backbones to Birds-Eye-View Recognition vi
BEVHeight A Robust Framework for Vision-Based Roadside 3D Object Detection
BiCro Noisy Correspondence Rectification for Multi-Modality Data via Bi-Directional Cross-Modal
Bootstrap Your Own Prior Towards Distribution-Agnostic Novel Class Discovery
Complementary Intrinsics From Neural Radiance Fields and CNNs for Outdoo
Context De-Confounded Emotion Recognition
ContraNeRF Generalizable Neural Radiance Fields for Synthetic-to-Real Novel View Synthesis
DeCo Decomposition and Reconstruction for Compositional Temporal Grounding via Coarse-To-Fin
Diffusion Probabilistic Model Made Slim
Directional Connectivity-Based Segmentation of Medical Images
Efficient On-Device Training via Gradient Filtering
FreeNeRF Improving Few-Shot Neural Rendering With Free Frequency Regularization
GD-MAE Generative Decoder for MAE Pre-Training on LiDAR Point Clouds
Geometry and Uncertainty-Aware 3D Point Cloud Class-Incremental Semantic Segmentation
Global Vision Transformer Pruning With Hessian-Aware Saliency
Good Is Bad Causality Inspired Cloth-Debiasing for Cloth-Changing Person Re-Identification
HOTNAS Hierarchical Optimal Transport for Neural Architecture Search
IDGI A Framework To Eliminate Explanation Noise From Integrated Gradients
Improving Visual Grounding by Encouraging Consistent Gradient-Based Explanations
K3DN Disparity-Aware Kernel Estimation for Dual-Pixel Defocus Deblurring
Language in a Bottle Language Model Guided Concept Bottlenecks fo
Learning Event Guided High Dynamic Range Video Reconstruction
MIANet Aggregating Unbiased Instance and General Information for Few-Shot Semantic
Modeling Entities As Semantic Points for Visual Information Extraction in
NeRFVS Neural Radiance Fields for Free View Synthesis via Geometry
Neural Vector Fields Implicit Representation by Explicit Learning
Neural Volumetric Memory for Visual Locomotion Control
Object Pose Estimation With Statistical Guarantees Conformal Keypoint Detection an
Paint by Example Exemplar-Based Image Editing With Diffusion Models
Panoptic Video Scene Graph Generation
POEM Reconstructing Hand in a Point Embedded Multi-View Stereo
Progressive Open Space Expansion for Open-Set Model Attribution
Pruning Parameterization With Bi-Level Optimization for Efficient Semantic Segmentation on
PVT-SSD Single-Stage 3D Object Detector With Point-Voxel Transform
QPGesture Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gestu
Reconstructing Animatable Categories From Videos
ReCo Region-Controlled Text-to-Image Generation
Relational Space-Time Query in Long-Form Videos
Resource-Efficient RGBD Aerial Tracking
Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation
RILS Masked Visual Reconstruction in Language Semantic Spac
Semantic Human Parsing via Scalable Semantic Transfer Over Multiple Label
TINC Tree-Structured Implicit Neural Compression
TopDiG Class-Agnostic Topological Directional Graph Extraction From Remote Sensing Images
Towards Bridging the Performance Gaps of Joint Energy-Based Models
Towards Effective Adversarial Textured 3D Meshes on Physical Face Recognition
UniSim A Neural Closed-Loop Sensor Simulato
VectorFloorSeg Two-Stream Graph Attention Network for Vectorized Roughcast Floorplan Segmentation
Vector Quantization With Self-Attention for Quality-Independent Representation Learning
Vid2Seq Large-Scale Pretraining of a Visual Language Model for Dens
Video Event Restoration Based on Keyframes for Video Anomaly Detection
Visual Recognition-Driven Image Restoration for Multiple Degradation With Intrinsic Semantics
A Unified HDR Imaging Method With Pixel and Patch Level
CIMI4D A Large Multimodal Climbing Motion Dataset Under Human-Scene Interactions
GCFAgg Global and Cross-View Feature Aggregation for Multi-View Clustering
Linking Garment With Person via Semantically Associated Landmarks for Virtual
Long-Term Visual Localization With Mobile Sensors
NeRF-DS Neural Radiance Fields for Dynamic Specular Objects
PlenVDB Memory Efficient VDB-Based Radiance Fields for Fast Training an
SMAE Few-Shot Learning for HDR Deghosting With Saturation-Aware Masked Autoencoders
Towards Trustable Skin Cancer Diagnosis via Rewriting Models Decision
Two-Shot Video Object Segmentation
Universal Instance Perception As Object Discovery and Retrieval
DetCLIPv2 Scalable Open-Vocabulary Object Detection Pre-Training via Word-Region Alignment
Explicit Boundary Guided Semi-Push-Pull Contrastive Learning for Supervised Anomaly Detection
HGNet Learning Hierarchical Geometry From Points Edges and Surfaces
Hi-LASSIE High-Fidelity Articulated Shape and Skeleton Discovery From Sparse Imag
Large-Scale Training Data Search for Object Re-Identification
Local Implicit Normalizing Flow for Arbitrary-Scale Image Super-Resolution
Teacher-Generated Spatial-Attention Labels Boost Robustness and Accuracy of Contrastive Models
Visual-Language Prompt Tuning With Knowledge-Guided Context Optimization
Meta-Personalizing Vision-Language Models To Find Named Instances in Video
AccelIR Task-Aware Image Compression for Accelerating Neural Restoration
Affordance Diffusion Synthesizing Hand-Object Interactions
Decoupling Human and Camera Motion From Videos in the Wil
DeepSolo Let Transformer Decoder With Explicit Points Solo for Text
DistilPose Tokenized Pose Regression With Heatmap Distillation
Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles
NEF Neural Edge Fields for 3D Parametric Curve Reconstruction From
Partial Network Cloning
PVO Panoptic Visual Odometry
Self-Supervised Super-Plane for Neural 3D Reconstruction
Mapping Degeneration Meets Label Evolution Learning Infrared Small Target Detection
1 VS 100 Parameter-Efficient Low Rank Adapter for Dense Predictions
3D GAN Inversion With Facial Symmetry Prio
AGAIN Adversarial Training With Attribution Span Enlargement and Hybrid Featu
GIVL Improving Geographical Inclusivity of Vision-Language Models With Pre-Training Methods
Gloss Attention for Gloss-Free Sign Language Translation
Hi4D 4D Instance Segmentation of Close Human Interaction
Multi-Space Neural Radiance Fields
NeRFInvertor High Fidelity NeRF-GAN Inversion for Single-Shot Real Image Animation
A Simple Framework for Text-Supervised Semantic Segmentation
Generating Holistic 3D Human Motion From Speech
MIME Human-Aware 3D Scene Generation
NAR-Former Neural Architecture Representation Learning Towards Holistic Attributes Prediction
Towards Artistic Image Aesthetics Assessment A Large-Scale Dataset an
Weakly-Supervised Single-View Image Relighting
A General Regret Bound of Preconditioned Gradient Method for DNN
Cross-Guided Optimization of Radiance Fields With Multi-View Image Super-Resolution fo
Towards End-to-End Generative Modeling of Long Videos With Memory-Efficient Bidirectional
Light Source Separation and Intrinsic Image Decomposition Under AC Illumination
Rawgment Noise-Accounted RAW Augmentation Enables Recognition in a Wide Variety
Deformable Mesh Transformer for 3D Human Mesh Recovery
Castling-ViT Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision
UTM A Unified Multiple Object Tracking Model With Identity-Aware Featu
Bi3D Bi-Domain Active Learning for Cross-Domain 3D Object Detection
Devil Is in the Queries Advancing Mask Transformers for Real-Worl
Robust Test-Time Adaptation in Dynamic Scenarios
You Are Catching My Attention Are Vision Transformers Bad Learners
Connecting the Dots Floorplan Reconstruction Using Two-Level Queries
IFSeg Image-Free Semantic Segmentation via Vision-Language Model
Accidental Light Probes
ACR Attention Collaboration-Based Regressor for Arbitrary Two-Hand Reconstruction
Adaptive Spot-Guided Transformer for Consistent Local Feature Matching
ANetQA A Large-Scale Benchmark for Fine-Grained Compositional Reasoning Over Untrimm
Block Selection Method for Using Feature Norm in Out-of-Distribution Detection
Boost Vision Transformer With GPU-Friendly Sparsity and Quantization
CelebV-Text A Large-Scale Facial Text-Video Dataset
Data-Free Knowledge Distillation via Feature Exchange and Activation Region Constraint
Distribution Shift Inversion for Out-of-Distribution Prediction
DyLiN Making Light Field Networks Dynamic
Efficient Loss Function by Minimizing the Detrimental Effect of Floating-Point
Foundation Model Drives Weakly Incremental Learning for Semantic Segmentation
Fusing Pre-Trained Language Models With Multimodal Prompts Through Reinforcement Learning
Graphics Capsule Learning Hierarchical 3D Face Representations From 2D Images
Hint-Aug Drawing Hints From Foundation Vision Transformers Towards Boosted Few-Shot
How To Prevent the Continuous Damage of Noises To Model
Learning Procedure-Aware Video Representation From Instructional Videos and Their Narrations
MAGVIT Masked Generative Video Transform
Mind the Label Shift of Augmentation-Based Graph OOD Generalization
MonoHuman Animatable Human Neural Field From Monocular Video
MVImgNet A Large-Scale Dataset of Multi-View Images
On the Difficulty of Unpaired Infrared-to-Visible Video Translation Fine-Grained Content-Rich
OSRT Omnidirectional Image Super-Resolution With Distortion-Aware Transform
Overcoming the Trade-Off Between Accuracy and Plausibility in 3D Han
PanelNet Understanding 360 Indoor Environment via Panel Representation
PEAL Prior-Embedded Explicit Attention Learning for Low-Overlap Point Cloud Registration
Phase-Shifting Coder Predicting Accurate Orientation in Oriented Object Detection
Range-Nullspace Video Frame Interpolation With Focalized Motion Estimation
Rotation-Invariant Transformer for Point Cloud Matching
Semi-Supervised Domain Adaptation With Source Label Adaptation
Task Residual for Tuning Vision-Language Models
TOPLight Lightweight Neural Networks With Task-Oriented Pretraining for Visible-Infrared Recognition
Turning a CLIP Model Into a Scene Text Detecto
V2X-Seq A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception an
Video Probabilistic Diffusion Models in Projected Latent Spac
X-Pruner eXplainable Pruning for Vision Transformers
Zero-Shot Referring Image Segmentation With Global-Local Context Features
Hierarchical Video-Moment Retrieval and Step-Captioning
Train/Test-Time Adaptation With Retrieval
Discovering the Real Association Multimodal Causal Reasoning in Video Question
AutoLabel CLIP-Based Framework for Open-Set Video Domain Adaptation
OCTET Object-Aware Counterfactual Explanations
3D-Aware Facial Landmark Detection via Multi-View Consistent Training on Synthetic
CLIP2 Contrastive Language-Image-Point Pretraining From Real-World Point Cloud Dat
ConZIC Controllable Zero-Shot Image Captioning by Sampling-Based Polishing
Deep Fair Clustering via Maximizing and Minimizing Mutual Information Theory
Distilling Focal Knowledge From Imperfect Expert for 3D Object Detection
Learning Transferable Spatiotemporal Representations From Natural Script Knowledg
PEFAT Boosting Semi-Supervised Medical Image Classification via Pseudo-Loss Estimation an
Real-Time Multi-Person Eyeblink Detection in the Wild for Untrimmed Video
SceneComposer Any-Level Semantic Image Synthesis
Feature Representation Learning With Adaptive Displacement Generation and Transformer Fusion
3D-Aware Object Goal Navigation via Simultaneous Exploration and Identification
3D Registration With Maximal Cliques
Accelerating Dataset Distillation via Model Augmentation
Aligning Step-by-Step Instructional Diagrams to Video Demonstrations
Analyzing Physical Impacts Using Transient Surface Wave Imaging
A Loopback Network for Explainable Microvascular Invasion Classification
Backdoor Defense via Deconfounded Representation Learning
Blind Image Quality Assessment via Vision-Language Correspondence A Multitask Learning
Boosting Verified Training for Robust Image Classifications via Abstraction
Boosting Video Object Segmentation via Space-Time Correspondence Learning
CLAMP Prompt-Based Contrastive Learning for Connecting Language and Animal Pos
Class Relationship Embedded Learning for Source-Free Unsupervised Domain Adaptation
CloSET Modeling Clothed Humans on Continuous Surface With Explicit Templat
Coaching a Teachable Student
Complete-to-Partial 4D Distillation for Self-Supervised Point Cloud Sequence Representation Learning
CompletionFormer Depth Completion With Convolutions and Vision Transformers
DA-DETR Domain Adaptive Detection Transformer With Information Fusion
Decoupling MaxLogit for Out-of-Distribution Detection
Delivering Arbitrary-Modal Semantic Segmentation
Dense Distinct Query for End-to-End Object Detection
DeSTSeg Segmentation Guided Denoising Student-Teacher for Anomaly Detection
DiffCollage Parallel Generation of Large Content With Diffusion Models
Differentiable Architecture Search With Random Features
Dimensionality-Varying Diffusion Process
Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-In
Document Image Shadow Removal Guided by Color-Aware Backgroun
Efficient Map Sparsification Based on 2D and 3D Discretized Grids
Efficient RGB-T Tracking via Cross-Modality Distillation
Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly Supervis
Exploring Intra-Class Variation Factors With Learnable Cluster Prompts for Semi-Supervis
Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video
Federated Domain Generalization With Generalization Adjustment
Frame-Event Alignment and Fusion Network for High Frame Rate Tracking
Frame Flexible Network
Frequency-Modulated Point Cloud Rendering With Easy Editing
Generalization Matters Loss Minima Flattening via Parameter Hybridization for Efficient
Generating Human Motion From Textual Descriptions With Discrete Representations
GeoMVSNet Learning Multi-View Stereo With Geometry Perception
Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization
GrowSP Unsupervised Semantic Segmentation of 3D Point Clouds
Hyperspherical Embedding for Point Cloud Completion
Implicit Surface Contrastive Clustering for LiDAR Point Clouds
Improving Graph Representation for Point Cloud Segmentation via Attentive Filtering
Improving the Transferability of Adversarial Samples by Path-Augmented Metho
Ingredient-Oriented Multi-Degradation Learning for Image Restoration
Inversion-Based Style Transfer With Diffusion Models
Layout-Based Causal Inference for Object Navigation
Learning 3D Representations From 2D Pre-Trained Models via Image-to-Point Mask
Learning Debiased Representations via Conditional Attribute Interpolation
Learning Emotion Representations From Verbal and Nonverbal Communication
Learning Neural Proto-Face Field for Disentangled 3D Face Modeling in
Learning To Generate Language-Supervised and Open-Vocabulary Scene Graph Using Pre-Train
Lite-Mono A Lightweight CNN and Transformer Architecture for Self-Supervised Monocul
LOGO A Long-Form Video Dataset for Group Action Quality Assessment
Lookahead Diffusion Probabilistic Models for Refining Mean Estimation
LVQAC Lattice Vector Quantization Coupled With Spatially Adaptive Companding fo
MD-VQA Multi-Dimensional Quality Assessment for UGC Live Videos
MetaPortrait Identity-Preserving Talking Head Generation With Fast Personalized Adaptation
Modeling Video As Stochastic Processes for Fine-Grained Video Representation Learning
MOTRv2 Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors
MP-Former Mask-Piloted Transformer for Image Segmentation
Multi-View Stereo Representation Revist Region-Aware MVSNet
Nerflets Local Radiance Fields for Efficient Structure-Aware 3D Scene Representation
NeuralDome A Neural Modeling Pipeline on Multi-View Human-Object Interactions
NICO Towards Better Benchmarking for Domain Generalization
Object Detection With Self-Supervised Scene Adaptation
Painting 3D Nature in 2D View Synthesis of Natural Scenes
PeakConv Learning Peak Receptive Field for Radar Semantic Segmentation
PHA Patch-Wise High-Frequency Augmentation for Transformer-Based Person Re-Identification
PointCert Point Cloud Classification With Deterministic Certified Robustness Guarantees
PointDistiller Structured Knowledge Distillation Towards Efficient and Compact 3D Detection
PRISE Demystifying Deep Lucas-Kanade With Strongly Star-Convex Constraints for Multimodel
PromptCAL Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel
Prompt Generate Then Cache Cascade of Foundation Models Makes Strong
Prototypical Residual Networks for Anomaly Detection and Localization
Quantum-Inspired Spectral-Spatial Pyramid Network for Hyperspectral Image Classification
Real-Time Controllable Denoising for Image and Video
Ref-NPR Reference-Based Non-Photorealistic Radiance Fields for Controllable Scene Stylization
Regularized Vector Quantization for Tokenized Image Synthesis
Revisiting Rotation Averaging Uncertainties and Robust Losses
Revisiting the Stack-Based Inverse Tone Mapping
SadTalker Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Singl
Seeing a Rose in Five Thousand Ways
Semi-DETR Semi-Supervised Object Detection With Detection Transformers
SINE SINgle Image Editing With Text-to-Image Diffusion Models
Skinned Motion Retargeting With Residual Perception of Motion Semantics
Starting From Non-Parametric Networks for 3D Point Cloud Analysis
Structural Multiplane Image Bridging Neural View Synthesis and 3D Reconstruction
Text-Visual Prompting for Efficient 2D Temporal Video Grounding
TokenHPE Learning Orientation Tokens for Efficient Head Pose Estimation vi
Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors
Towards Unbiased Volume Rendering of Neural Implicit Surfaces With Geometry
Towards Unsupervised Object Detection From LiDAR Point Clouds
Transferable Adversarial Attacks on Vision Transformers With Token Gradient Regularization
Transforming Radiance Field With Lipschitz Network for Photorealistic 3D Scen
Two-Stage Co-Segmentation Network Based on Discriminative Representation for Recovering Human
Uni3D A Unified Baseline for Multi-Dataset 3D Object Detection
UniDAformer Unified Domain Adaptive Panoptic Segmentation Transformer via Hierarchical Mask
Unlearnable Clusters Towards Label-Agnostic Unlearnable Examples
VQACL A Novel Visual Question Answering Continual Learning Setting
Weakly Supervised Segmentation With Point Annotations for Histopathology Images vi
Weakly Supervised Video Emotion Detection and Prediction via Cross-Modal Temporal
WeatherStream Light Transport Automation of Single Image Deweathering
Wide-Angle Rectification via Content-Aware Conformal Mapping
ARKitTrack A New Diverse Dataset for Tracking Using Mobile RGB-D
Augmentation Matters A Simple-Yet-Effective Approach to Semi-Supervised Semantic Segmentation
CDDFuse Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion
Comprehensive and Delicate An Efficient Transformer for Image Restoration
DiffSwap High-Fidelity and Controllable Face Swapping via 3D-Aware Masked Diffusion
DNeRV Modeling Inherent Dynamics via Difference Neural Representation for Videos
Exploring Incompatible Knowledge Transfer in Few-Shot Image Generation
Few-Shot Class-Incremental Learning via Class-Aware Bilateral Distillation
High-Frequency Stereo Matching Network
Improved Distribution Matching for Dataset Condensation
Instance-Specific and Model-Adaptive Supervision for Semi-Supervised Semantic Segmentation
Learning Anchor Transformations for 3D Garment Animation
Learning Video Representations From Large Language Models
MetaFusion Infrared and Visible Image Fusion via Meta-Feature Embedding From
Minimizing Maximum Model Discrepancy for Transferable Black-Box Targeted Attacks
OmniAL A Unified CNN Framework for Unsupervised Anomaly Localization
Open Set Action Recognition via Multi-Label Evidential Learning
PoseFormerV2 Exploring Frequency Domain for Efficient and Robust 3D Human
Quality-Aware Pre-Trained Models for Blind Image Quality Assessment
Re2TAL Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization
Representation Learning for Visual Object Tracking by Masked Appearance Trans
Rethinking Gradient Projection Continual Learning Stability Plasticity Feature Spac
Search-Map-Search A Frame Selection Paradigm for Action Recognition
Semi-Supervised Hand Appearance Recovery via Structure Disentanglement and Dual Adversarial
Streaming Video Model
The Resource Problem of Using Linear Layer Leakage Attack in
Towards Better Stability and Adaptability Improve Online Self-Training for Model
Zero-Shot Text-to-Parameter Translation for Game Character Auto-Creation
Both Style and Distortion Matter Dual-Path Unsupervised Domain Adaptation fo
CAMS CAnonicalized Manipulation Spaces for Category-Level Functional Hand-Object Manipulation Synthesis
Curricular Contrastive Regularization for Physics-Aware Single Image Dehazing
CVT-SLR Contrastive Visual-Textual Transformation for Sign Language Recognition With Variational
EditableNeRF Editing Topologically Varying Neural Radiance Fields by Key Points
EXIF As Language Learning Cross-Modal Associations Between Images and Cam
FeatER An Efficient Network for Human Reconstruction via Feature Map-Bas
HairStep Transfer Synthetic to Real Using Strand and Depth Maps
HS-Pose Hybrid Scope Feature Extraction for Category-Level Object Pose Estimation
LayoutDiffusion Controllable Diffusion Model for Layout-to-Image Generation
Learning Visibility Field for Detailed 3D Human Reconstruction and Relighting
NeuFace Realistic 3D Neural Face Rendering From Multi-View Images
NeuralPCI Spatio-Temporal Neural Field for 3D Point Cloud Multi-Frame Non-Lin
Open-Category Human-Object Interaction Pre-Training via Language Modeling Framework
PointAvatar Deformable Point-Based Head Avatars From Videos
POTTER Pooling Attention Transformer for Efficient Human Mesh Recovery
Prototype-Based Embedding Network for Scene Graph Generation
TrojViT Trojan Insertion in Vision Transformers
Where Is My Spot Few-Shot Image Generation via Latent Subspac
Decentralized Learning With Multi-Headed Distillation
Blur Interpolation Transformer for Real-World Motion From Blu
Identity-Preserving Talking Face Generation With Landmark and Appearance Priors
Understanding Imbalanced Semantic Segmentation Through Neural Collaps
Adaptive Sparse Pairwise Loss for Object Re-Identification
BEVDC Birds-Eye View Assisted Training for Depth Completion
Class-Conditional Sharpness-Aware Minimization for Deep Long-Tailed Recognition
Efficient Second-Order Plane Adjustment
Exploring Motion Ambiguity and Alignment for High-Quality Video Frame Interpolation
How Can Objects Help Action Recognition
Human Body Shape Completion With Implicit Shape and Flow Learning
HyperMatch Noise-Tolerant Semi-Supervised Learning via Relaxed Contrastive Constraint
Improving Weakly Supervised Temporal Action Localization by Bridging Train-Test G
Instance-Aware Domain Generalization for Face Anti-Spoofing
Interactive Segmentation As Gaussion Process Classification
Joint Visual Grounding and Tracking With Natural Language Specification
Learning Discriminative Representations for Skeleton Based Action Recognition
MonoATT Online Monocular 3D Object Detection With Adaptive Token Transform
Multi-Granularity Archaeological Dating of Chinese Bronze Dings Based on
NeRFLix High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-Viewpoint
NeRF in the Palm of Your Hand Corrective Augmentation fo
Neural Texture Synthesis With Guided Correspondenc
Non-Contrastive Learning Meets Language-Image Pre-Training
OcTr Octree-Based Transformer for 3D Object Detection
Procedure-Aware Pretraining for Instructional Video Understanding
Query-Centric Trajectory Prediction
Relightable Neural Human Assets From Multi-View Gradient Illuminations
RepMode Learning to Re-Parameterize Diverse Experts for Subcellular Structure Prediction
Revisiting Prototypical Network for Cross Domain Few-Shot Learning
Shifted Diffusion for Text-to-Image Generation
SparseFusion Distilling View-Conditioned Diffusion for 3D Reconstruction
STAR Loss Reducing Semantic Ambiguity in Facial Landmark Detection
Texture-Guided Saliency Distilling for Unsupervised Salient Object Detection
The Treasure Beneath Multiple Annotations An Uncertainty-Aware Edge Detecto
UDE A Unified Driving Engine for Human Motion Generation
UniDistill A Universal Cross-Modality Knowledge Distillation Framework for 3D Object
Unsupervised Cumulative Domain Adaptation for Foggy Scene Optical Flow
ZegCLIP Towards Adapting CLIP for Zero-Shot Semantic Segmentation
Deep Semi-Supervised Metric Learning With Mixed Label Propagation
GKEAL Gaussian Kernel Embedded Analytic Learning for Few-Shot Class Incremental
Towards Stable Human Pose Estimation via Cross-View Fusion and Foot
BiFormer Vision Transformer With Bi-Level Routing Attention
Conditional Text Image Generation With Diffusion Models
Confidence-Aware Personalized Federated Learning via Variational Expectation Maximization
ConQueR Query Contrast Voxel-DETR for 3D Object Detection
Continual Semantic Segmentation With Automatic Memory Sample Selection
Curricular Object Manipulation in LiDAR-Based Object Detection
E2PN Efficient SE3-Equivariant Point Network
EXCALIBUR Encouraging and Evaluating Embodied Exploration
I2-SDF Intrinsic Indoor Scene Reconstruction and Editing via Raytracing in
IPCC-TP Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory
Knowledge Combination To Learn Rotated Detection Without Rotated Annotation
Learning Weather-General and Weather-Specific Features for Image Restoration Under Multipl
LightedDepth Video Depth Estimation in Light of Limited Inference View
NerVE Neural Volumetric Edges for Parametric Curve Extraction From Point
Occlusion-Free Scene Recovery via Neural Radiance Fields
OpenMix Exploring Outlier Samples for Misclassification Detection
Patch-Mix Transformer for Unsupervised Domain Adaptation A Game Perspectiv
PMatch Paired Masked Image Modeling for Dense Geometric Matching
Probability-Based Global Cross-Modal Upsampling for Pansharpening
R2Former Unified Retrieval and Reranking Transformer for Place Recognition
ScaleKD Distilling Scale-Aware Knowledge in Small Object Detecto
STMT A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition
Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation
TopNet Transformer-Based Object Placement Network for Image Compositing
Transductive Few-Shot Learning With Prototype-Based Label Propagation by Iterative Graph
TryOnDiffusion A Tale of Two UNets
Understanding the Robustness of 3D Object Detection With Birds-Eye-View Representations
VDN-NeRF Resolving Shape-Radiance Ambiguity via View-Dependence Normalization
Visual Prompt Multi-Modal Tracking
Instant Volumetric Head Avatars
Multi-View Reconstruction Using Signed Ray Distance Functions SRDF
AutoFocusFormer Image Segmentation off the Gri
PROB Probabilistic Objectness for Open World Object Detection
CLOTH4D A Dataset for Clothed Human Reconstruction
Generalized Decoding for Pixel Image and Languag
Natural Language-Assisted Sign Language Recognition