WACV 2024 Proceedings

PDA-RWSR Pixel-Wise Degradation Adaptive Real-World Super-Resolution
ScanEnts3D Exploiting Phrase-to-3D-Object Correspondences for Improved Visio-Linguistic Models in 3D
VD-GR Boosting Visual Dialog With Cascaded Spatial-Temporal Multi-Modal Graphs
ParticleNeRF A Particle-Based Encoding for Online Neural Radiance Fields
Real Time GAZED Online Shot Selection and Editing of Virtual
SeaTurtleID2022 A Long-Span Dataset for Reliable Sea Turtle Re-Identification
ARNIQA Learning Distortion Manifold for Image Quality Assessment
Reference-Based Restoration of Digitized Analog Videotapes
Causal Analysis for Robust Interpretability of Neural Networks
Unsupervised Co-Generation of Foreground-Background Segmentation From Text-to-Image Synthesis
OptFlow Fast Optimization-Based Scene Flow Estimation Without Supervision
Cross-Feature Contrastive Loss for Decentralized Deep Learning on Heterogeneous Data
A Coarse-To-Fine Pseudo-Labeling C2FPL Framework for Unsupervised Video Anomaly Detection
Optimizing Long-Term Robot Tracking With Multi-Platform Sensor Fusion
OVeNet Offset Vector Network for Semantic Segmentation
P-Age Pexels Dataset for Robust Spatio-Temporal Apparent Age Classification
Self-Supervised Learning With Masked Autoencoders for Teeth Segmentation From Intra-Oral
DDAM-PS Diligent Domain Adaptive Mixer for Person Search
Domain Generalization by Rejecting Extreme Augmentations
Late to the Party On-Demand Unlabeled Personalized Federated Learning
Self-Supervised Learning for Visual Relationship Detection Through Masked Bounding Box
Elusive Images Beyond Coarse Analysis for Fine-Grained Recognition
Amodal Intra-Class Instance Segmentation Synthetic Datasets and Benchmark
High-Fidelity Zero-Shot Texture Anomaly Localization Using Feature Correspondence Analysis
Blurry Video Compression A Trade-Off Between Visual Enhancement and Dat
Hybrid Sample Synthesis-Based Debiasing of Classifier in Limited Data Setting
TransFed A Way To Epitomize Focal Modulation Using Transformer-Based Federat
Continuous Adaptation for Interactive Segmentation Using Teacher-Student Architecture
Enhancing Multi-View Pedestrian Detection Through Generalized 3D Feature Pulling
Beyond Self-Attention Deformable Large Kernel Attention for Medical Image Segmentation
EmoStyle One-Shot Facial Expression Editing Using Continuous Emotion Parameters
Neural Echos Depthwise Convolutional Filters Replicate Biological Receptive Fields
Temporally-Consistent Video Semantic Segmentation With Bidirectional Occlusion-Guided Feature Propagation
Partial Binarization of Neural Networks for Budget-Aware Efficient Learning
AMEND Adaptive Margin and Expanded Neighborhood for Efficient Generalized Category
United We Stand Divided We Fall UnityGraph for Unsupervised Procedu
Weakly-Supervised Representation Learning for Video Alignment and Analysis
ProcSim Proxy-Based Confidence for Robust Similarity Learning
Fixed Pattern Noise Removal for Multi-View Single-Sensor Infrared Cam
FOSSIL Free Open-Vocabulary Semantic Segmentation Through Synthetic References Retrieval
MoRF Mobile Realistic Fullbody Avatars From a Monocular Video
EfficientAD Accurate Visual Anomaly Detection at Millisecond-Level Latencies
Beyond Active Learning Leveraging the Full Potential of Human Interaction
Multi-Source Domain Adaptation for Object Detection With Prototype-Based Mean Teacher
Learning Class and Domain Augmentations for Single-Source Open-Domain Generalization
Adversarial Likelihood Estimation With One-Way Flows
IKEA Ego 3D Dataset Understanding Furniture Assembly Actions From Ego-View
Volumetric Disentanglement for 3D Scene Manipulation
PETIT-GAN Physically Enhanced Thermal Image-Translating Generative Adversarial Network
NOMAD A Natural Occluded Multi-Scale Aerial Dataset for Emergency Respons
What's Outside the Intersection Fine-Grained Error Analysis for Semantic Segmentation
Guided Distillation for Semi-Supervised Instance Segmentation
EvDNeRF Reconstructing Event Data With Dynamic Neural Radiance Fields
TriPlaneNet An Encoder for EG3D Inversion
HALSIE Hybrid Approach to Learning Segmentation by Simultaneously Exploiting Imag
Multi-View Classification Using Hybrid Fusion and Mutual Distillation
ArtQuest Countering Hidden Language Biases in ArtVQA
Feed-Forward Latent Domain Adaptation
From Chaos to Calibration A Geometric Mutual Information Approach To
STYLIP Multi-Scale Style-Conditioned Prompt Learning for CLIP-Based Domain Generalization
FOUND Foot Optimization With Uncertain Normals for Surface Deformation Using
SupeRVol Super-Resolution Shape and Reflectance Estimation in Inverse Volume Rendering
Investigating the Role of Attribute Context in Vision-Language Models fo
MEGANet Multi-Scale Edge-Guided Attention Network for Weak Boundary Polyp Segmentation
UOW-Vessel A Benchmark Dataset of High-Resolution Optical Satellite Images fo
CHAI Craters in Historical Aerial Images
Spiking Denoising Diffusion Probabilistic Models
What Decreases Editing Capability Domain-Specific Hybrid Refinement for Improved GAN
ClusterFix A Cluster-Based Debiasing Approach Without Protected-Group Supervision
Pixel-Grounded Prototypical Part Networks
Location-Aware Self-Supervised Transformers for Semantic Segmentation
WildlifeDatasets An Open-Source Toolkit for Animal Re-Identification
Unsupervised and Semi-Supervised Co-Salient Object Detection via Segmentation Frequency Statistics
Learning-Based Spotlight Position Optimization for Non-Line-of-Sight Human Localization and Postu
PhISH-Net Physics Inspired System for High Resolution Underwater Image Enhancement
BEVMap Map-Aware BEV Modeling for 3D Perception
Fast Sun-Aligned Outdoor Scene Relighting Based on TensoRF
FLORA Fine-Grained Low-Rank Architecture Search for Vision Transformer
LibreFace An Open-Source Toolkit for Deep Facial Expression Analysis
Shape From Shading for Robotic Manipulation
Continual Learning of Unsupervised Monocular Depth From Videos
3D Reconstruction of Interacting Multi-Person in Clothing From a Singl
NCIS Neural Contextual Iterative Smoothing for Purifying Adversarial Perturbations
Stereo Matching in Time 100 FPS Video Stereo Matching fo
A Sequential Learning-Based Approach for Monocular Human Performance Captu
Depth From Asymmetric Frame-Event Stereo A Divide-and-Conquer Approach
Letting 3D Guide the Way 3D Guided 2D Few-Shot Imag
Longformer Longitudinal Transformer for Alzheimer's Disease Classification With Structural MRIs
Panelformer Sewing Pattern Reconstruction From 2D Garment Images
Pixel Matching Network for Cross-Domain Few-Shot Segmentation
Residual Graph Convolutional Network for Bird's-Eye-View Semantic Segmentation
SCUNet Swin-UNet and CNN Bottleneck Hybrid Architecture With Multi-Fusion Dens
Show Your Face Restoring Complete Facial Images From Partial Observations
Training-Free Layout Control With Cross-Attention Guidance
FIRE Food Image to REcipe Generation
Classifying Cable Tendency With Semantic Segmentation by Utilizing Real an
Masking Improves Contrastive Self-Supervised Learning for ConvNets and Saliency Tells
Re-Evaluating LiDAR Scene Flow
Dual Domain Diffusion Guidance for 3D CBCT Metal Artifact Reduction
P2D Plug and Play Discriminator for Accelerating GAN Frameworks
Bipartite Graph Diffusion Model for Human Interaction Generation
Interactive Network Perturbation Between Teacher and Students for Semi-Supervised Semantic
RMFER Semi-Supervised Contrastive Learning for Facial Expression Recognition With Reaction
Slice and Conquer A Planar-to-3D Framework for Efficient Interactive Segmentation
Assessing Neural Network Robustness via Adversarial Pivotal Tuning
Single Domain Generalization via Normalised Cross-Correlation Based Convolutions
PreciseDebias An Automatic Prompt Engineering Approach for Generative AI To
Membership Inference Attack Using Self Influence Functions
Simple Post-Training Robustness Using Test Time Augmentations and Random Forest
BSRAW Improving Blind RAW Image Super-Resolution
ZRG A Dataset for Multimodal 3D Residential Rooftop Understanding
LatentPaint Image Inpainting in Latent Space With Diffusion Models
PsyMo A Dataset for Estimating Self-Reported Psychological Traits From Gait
PECoP Parameter Efficient Continual Pretraining for Action Quality Assessment
TransRadar Adaptive-Directional Transformer for Real-Time Multi-View Radar Semantic Segmentation
Automated Camera Calibration via Homography Estimation With GNNs
IR-FRestormer Iterative Refinement With Fourier-Based Restormer for Accelerated MRI Reconstruction
Harnessing the Power of Multi-Lingual Datasets for Pre-Training Towards Enhancing
Limited Data Unlimited Potential A Study on ViTs Augmented by
FishTrack23 An Ensemble Underwater Dataset for Multi-Object Tracking
RGBT-Dog A Parametric Model and Pose Prior for Canine Body
RGB-X Object Detection via Scene-Specific Fusion Modules
Learning the What and How of Annotation in Video Object
How Do Deepfakes Move Motion Magnification for Deepfake Source Detection
Expanding Hyperspherical Space for Few-Shot Class-Incremental Learning
Ray Deformation Networks for Novel View Synthesis of Refractive Objects
Textual Alchemy CoFormer for Scene Text Understanding
Context in Human Action Through Motion Complementarity
CycleCL Self-Supervised Learning for Periodic Videos
AnyStar Domain Randomized Universal Star-Convex 3D Instance Segmentation
Nardin A One-Shot Learning Approach To Document Layout Segmentation of Ancient
Plaen Contrastive Learning for Multi-Object Tracking With Transformers
Improving Fairness Using Vision-Language Driven Image Augmentation
Estimating Fog Parameters From an Image Sequence Using Non-Linear Optimisation
PATROL Privacy-Oriented Pruning for Collaborative Inference Against Model Inversion Attacks
CARE Counterfactual-Based Algorithmic Recourse for Explainable Pose Correction
ProS Facial Omni-Representation Learning via Prototype-Based Self-Distillation
Do VSR Models Generalize Beyond LRS3
Learning Saliency From Fixations
Physical-Space Multi-Body Mesh Detection Achieved by Local Alignment and Global
Understanding Dark Scenes by Contrasting Multi-Modal Observations
A Multimodal Benchmark and Improved Architecture for Zero Shot Learning
Semantic Generative Augmentations for Few-Shot Counting
RobustCLEVR A Benchmark and Framework for Evaluating Robustness in Object-Centric
Evidential Uncertainty Quantification A Variance-Based Perspective
Mining and Unifying Heterogeneous Contrastive Relations for Weakly-Supervised Actor-Action Segmentation
Towards More Realistic Membership Inference Attacks on Large Diffusion Models
Tracking Skiers From the Top to the Bottom
HMP Hand Motion Priors for Pose and Shape Estimation From
POISE Pose Guided Human Silhouette Extraction Under Occlusions
Real-Time 6-DoF Pose Estimation by an Event-Based Camera Using Activ
Driving Through the Concept Gridlock Unraveling Explainability Bottlenecks in Automat
A Generic and Flexible Regularization Framework for NeRFs
Leveraging Bitstream Metadata for Fast Accurate Generalized Compressed Video Quality
Nested Diffusion Processes for Anytime Image Generation
DR10K Transfer Learning Using Weak Labels for Grading Diabetic Retinopathy
Mixing Gradients in Neural Networks as a Strategy To Enhanc
Exploiting the Signal-Leak Bias in Diffusion Models
Data Augmentation for Object Detection via Controllable Diffusion Models
Dynamic Multimodal Information Bottleneck for Multimodality Classification
Face Presentation Attack Detection by Excavating Causal Clues and Adapting
CryoRL Reinforcement Learning Enables Efficient Cryo-EM Data Collection
DeVos Flow-Guided Deformable Transformer for Video Object Segmentation
Towards Addressing the Misalignment of Object Proposal Evaluation for Vision-Languag
Seeing Stars Learned Star Localization for Narrow-Field Astrometry
3D Face Style Transfer With a Hybrid Solution of NeRF
RankDVQA Deep VQA Based on Ranking-Inspired Hybrid Training
MagneticPillars Efficient Point Cloud Registration Through Hierarchized Birds-Eye-View Cell Correspondenc
Deep Optics for Optomechanical Control Policy Design
dacl10k Benchmark for Semantic Bridge Damage Segmentation
Unsupervised Event-Based Video Reconstruction
InfraParis A Multi-Modal and Multi-Task Autonomous Driving Dataset
Automated Sperm Assessment Framework and Neural Network Specialized for Sperm
DTrOCR Decoder-Only Transformer for Optical Character Recognition
Few-Shot Generative Model for Skeleton-Based Human Action Synthesis Using Cross-Domain
Unsupervised Model-Based Learning for Simultaneous Video Deflickering and Deblotching
AssemblyNet A Point Cloud Dataset and Benchmark for Predicting Part
Generalizing to Unseen Domains in Diabetic Retinopathy Classification
Unified Concept Editing in Diffusion Models
An Empirical Investigation Into Benchmarking Model Multiplicity for Trustworthy Machin
CLIPAG Towards Generator-Free Text-to-Image Generation
Self-Supervised Representation Learning With Cross-Context Learning Between Global and Hypercolumn
STEP - Towards Structured Scene-Text Spotting
SphereCraft A Dataset for Spherical Keypoint Detection Matching and Cam
Towards a Dynamic Vision Sensor-Based Insect Camera T
FacadeNet Conditional Facade Synthesis via Selective Editing
Co-Speech Gesture Detection Through Multi-Phase Sequence Labeling
SigmML Metric Meta-Learning for Writer Independent Offline Signature Verification in
Do We Still Need Non-Maximum Suppression Accurate Confidence Estimates an
Beyond RGB A Real World Dataset for Multispectral Imaging in
So You Think You Can Track
Plasticity-Optimized Complementary Networks for Unsupervised Continual Learning
Domain Aligned CLIP for Few-Shot Classification
Separable Self and Mixed Attention Transformers for Efficient Object Tracking
SynergyNet Bridging the Gap Between Discrete and Continuous Representations fo
ISAR A Benchmark for Single- and Few-Shot Object Instance Segmentation
Active Batch Sampling for Multi-Label Classification With Binary User Feedback
Whats in the Flow Exploiting Temporal Motion Cues for Unsupervis
Learning Intra-Class Multimodal Distributions With Orthonormal Matrices
PressureVision Estimating Fingertip Pressure From Diverse RGB Images
WATCH Wide-Area Terrestrial Change Hypercube
The Paleographer's Eye ex machina Using Computer Vision To Assist
TIAM - A Metric for Evaluating Alignment in Text-to-Image Generation
JOADAA Joint Online Action Detection and Action Anticipation
Boosting Weakly Supervised Object Detection Using Fusion and Priors From
Reducing the Side-Effects of Oscillations in Training of Quantized YOLO
Robust Object Detection in Challenging Weather Conditions
Torque Based Structured Pruning for Deep Neural Network
You Can Run but Not Hide Improving Gait Recognition With
Deep Metric Learning With Chance Constraints
Single Frame Semantic Segmentation Using Multi-Modal Spherical Images
Complex Organ Mask Guided Radiology Report Generation
Solving the Plane-Sphere Ambiguity in Top-Down Structure-From-Motion
Tracking Tiny Insects in Cluttered Natural Environments Using Refinable Recurrent
Watch Where You Head A View-Biased Domain Gap in Gait
BoostRad Enhancing Object Detection by Boosting Radar Reflections
Efficient MAE Towards Large-Scale Vision Transformers
Hybrid Neural Diffeomorphic Flow for Shape Representation and Generation vi
ProxEdit Improving Tuning-Free Real Image Editing With Proximal Guidance
Diffusion-Based Generation of Histopathological Whole Slide Images at a Gigapixel
LInKs Lifting Independent Keypoints - Partial Pose Lifting for Occlusion
Learning To Generate Training Datasets for Robust Semantic Segmentation
FinderNet A Data Augmentation Free Canonicalization Aided Loop Detection an
Improving Graph Networks Through Selection-Based Convolution
Text-Guided Face Recognition Using Multi-Granularity Cross-Modal Contrastive Learning
Bridging Generalization Gaps in High Content Imaging Through Online Self-Supervis
ArcAid Analysis of Archaeological Artifacts Using Drawings
Attentive Prototypes for Source-Free Unsupervised Domain Adaptive 3D Object Detection
Active Learning With Task Consistency and Diversity in Multi-Task Networks
Monocular 3D Object Detection With LiDAR Guided Semi Supervised Activ
NITEC Versatile Hand-Annotated Eye Contact Dataset for Ego-Vision Interaction
Registered and Segmented Deformable Object Reconstruction From a Single View
PromptonomyViT Multi-Task Prompt Learning Improves Video Transformers Using Synthetic Scen
Prototype Learning for Explainable Brain Age Prediction
LidarCLIP or How I Learned To Talk to Point Clouds
Learning Transferable Representations for Image Anomaly Localization Using Dense Pretraining
Sound3DVDet 3D Sound Source Detection Using Multiview Microphone Array an
MS-EVS Multispectral Event-Based Vision for Deep Learning Based Face Detection
CLID Controlled-Length Image Descriptions With Limited Data
Random Walks for Temporal Action Segmentation With Timestamp Supervision
Rotation-Constrained Cross-View Feature Fusion for Multi-View Appearance-Based Gaze Estimation
Learn To Unlearn for Deep Neural Networks Minimizing Unlearning Interferenc
CLRerNet Improving Confidence of Lane Detection With LaneIoU
Concept-Centric Transformers Enhancing Model Interpretability Through Object-Centric Concept Learning Within
Robust Eye Blink Detection Using Dual Embedding Video Vision Transformer
D4 Detection of Adversarial Diffusion Deepfakes Using Disjoint Ensembles
Framework-Agnostic Semantically-Aware Global Reasoning for Segmentation
Multi-Modal Gaze Following in Conversational Scenarios
Natural Light Can Also Be Dangerous Traffic Sign Misinterpretation Un
Removing the Quality Tax in Controllable Face Generation
Scale-Adaptive Feature Aggregation for Efficient Space-Time Video Super-Resolution
Semantic Fusion Augmentation and Semantic Boundary Detection A Novel Approach
SOAP Cross-Sensor Domain Adaptation for 3D Object Detection Using Stationary
Bias and Diversity in Synthetic-Based Face Recognition
Efficient Explainable Face Verification Based on Similarity Score Argument Backpropagation
Are Natural Domain Foundation Models Useful for Medical Image Classification
Synthesizing Anyone Anywhere in Any Pose
CSAM A 2.5D Cross-Slice Attention Module for Anisotropic Volumetric Medical
Expanding Expressiveness of Diffusion Models With Limited Data via Self-Distillation
Embodied Human Activity Recognition
Learning To Adapt CLIP for Few-Shot Monocular Depth Estimation
ReCLIP Refine Contrastive Language Image Pre-Training With Source Free Domain
Temporal Context Enhanced Referring Video Object Segmentation
Booster-SHOT Boosting Stacked Homography Transformations for Multiview Pedestrian Detection With
EASUM Enhancing Affective State Understanding Through Joint Sentiment and Emotion
ReConPatch Contrastive Patch Representation Learning for Industrial Anomaly Detection
Adaptive Deep Neural Network Inference Optimization With EENet
Tunable Hybrid Proposal Networks for the Open World
Think Before You Simulate Symbolic Reasoning To Orchestrate Neural Computation
Learnable Cube-Based Video Encryption for Privacy-Preserving Action Recognition
Visually Guided Audio Source Separation With Meta Consistency Learning
Specular Object Reconstruction Behind Frosted Glass by Differentiable Rendering
Controlling Rate Distortion and Realism Towards a Single Comprehensive Neural
CCMR High Resolution Optical Flow Estimation via Coarse-To-Fine Context-Guided Motion
Stochastic Binary Network for Universal Domain Adaptation
M33D Learning 3D Priors Using Multi-Modal Masked Autoencoders for 2D
Composite Diffusion whole >= Σparts
Robust Unsupervised Domain Adaptation Through Negative-View Regularization
Designing a Hybrid Neural System To Learn Real-World Crack Segmentation
Text-to-Image Models for Counterfactual Explanations A Black-Box Approach
WaveMixSR Resource-Efficient Neural Network for Image Super-Resolution
EResFD Rediscovery of the Effectiveness of Standard Convolution for Lightweight
Training-Free Content Injection Using H-Space in Diffusion Models
USDN A Unified Sample-Wise Dynamic Network With Mixed-Precision and Early-Exit
Back to Optimization Diffusion-Based Zero-Shot 3D Human Pose Estimation
Neural Image Compression Using Masked Sparse Visual Representation
Army of Thieves Enhancing Black-Box Model Extraction via Ensemble Bas
iBARLE imBalance-Aware Room Layout Estimation
Unsupervised 3D Pose Estimation With Non-Rigid Structure-From-Motion Modeling
Semantic Labels-Aware Transformer Model for Searching Over a Large Collection
S3AD Semi-Supervised Small Apple Detection in Orchard Environments
High-Fidelity Pseudo-Labels for Boosting Weakly-Supervised Segmentation
Iterative Multi-Granular Image Editing Using Diffusion Models
ConfTrack Kalman Filter-Based Multi-Person Tracking by Utilizing Confidence Score o
SC-MIL Supervised Contrastive Multiple Instance Learning for Imbalanced Classification in
Improving Fairness in Deepfake Detection
Robust Feature Learning and Global Variance-Driven Classifier Alignment for Long-Tail
Intrinsic Hand Avatar Illumination-Aware Hand Appearance and Shape Reconstruction From
Critical Gap Between Generalization Error and Empirical Error in Activ
MetaSeg MetaFormer-Based Global Contexts-Aware Network for Efficient Semantic Segmentation
Privacy-Enhancing Person Re-Identification Framework - A Dual-Stage Approach
ShadowSense Unsupervised Domain Adaptation and Feature Fusion for Shadow-Agnostic T
HaGRID -- HAnd Gesture Recognition Image Dataset
Deep Visual-Genetic Biometrics for Taxonomic Classification of Rare Species
The Background Also Matters Background-Aware Motion-Guided Objects Discovery
Real-Time Weakly Supervised Video Anomaly Detection
AvatarOne Monocular 3D Human Animation
Synergizing Contrastive Learning and Optimal Transport for 3D Point Clou
Label Augmentation As Inter-Class Data Augmentation for Conditional Image Synthesis
Revisiting Latent Space of GAN Inversion for Robust Real Imag
Soft Curriculum for Learning Conditional GANs With Noisy-Labeled and Uncurat
MIDAS Mixing Ambiguous Data With Soft Labels for Dynamic Facial
INCODE Implicit Neural Conditioning With Prior Knowledge Embeddings
Robust TRISO-Fueled Pebble Identification by Digit Recognition
Leveraging Synthetic Data To Learn Video Stabilization Under Adverse Conditions
Estimating Blood Alcohol Level Through Facial Features for Driver Impairment
Graph Neural Networks for End-to-End Information Extraction From Handwritten Documents
A Hybrid Graph Network for Complex Activity Detection in Video
CamoFocus Enhancing Camouflage Object Detection With Split-Feature Focal Modulation an
Spectroformer Multi-Domain Query Cascaded Transformer Network for Underwater Image Enhancement
Lightweight Delivery Detection on Doorbell Cameras
Improving Normalization With the James-Stein Estimator
Adaptive Latent Diffusion Model for 3D Medical Image to Imag
A* Atrous Spatial Temporal Action Recognition for Real Time Applications
Controllable Text-to-Image Synthesis for Multi-Modality MR Images
Efficient Semantic Matching With Hypercolumn Correlation
Enhancing Diverse Intra-Identity Representation for Visible-Infrared Person Re-Identification
Exploring Adversarial Robustness of Vision Transformers in the Spectral Perspective
Human Motion Aware Text-to-Video Generation With Explicit Camera Control
Implicit Neural Image Stitching With Enhanced and Blended Feature Reconstruction
Learning Residual Elastic Warps for Image Stitching Under Dirichlet Boundary
LensNeRF Rethinking Volume Rendering Based on Thin-Lens Camera Model
MICS Midpoint Interpolation To Learn Compact and Separated Representations fo
Offline-to-Online Knowledge Distillation for Video Instance Segmentation
Randomized Adversarial Style Perturbations for Domain Generalization
Token Fusion Bridging the Gap Between Token Pruning and Token
Out-of-Distribution Detection With Logical Reasoning
Masked Event Modeling Self-Supervised Pretraining for Event Cameras
Spatio-Temporal Filter Analysis Improves 3D-CNN for Action Classification
SGRec3D Self-Supervised 3D Scene Graph Learning via Object-Level Scene Reconstruction
RecycleNet Latent Feature Recycling Leads to Iterative Decision Refinement
Multi-Class Segmentation From Aerial Views Using Recursive Noise Diffusion
Top-Down Beats Bottom-Up in 3D Instance Segmentation
SimA Simple Softmax-Free Attention for Vision Transformers
ZIGNeRF Zero-Shot 3D Scene Representation With Invertible Generative Neural Radianc
MAELi Masked Autoencoder for Large-Scale LiDAR Point Clouds
Image Denoising and the Generative Accumulation of Photons
ATS Adaptive Temperature Scaling for Enhancing Out-of-Distribution Detection Methods
AU-Aware Dynamic 3D Face Reconstruction From Videos With Transformer
Textron Weakly Supervised Multilingual Text Detection Through Data Programming
C2AIR Consolidated Compact Aerial Image Haze Removal
Learning To Detour Shortcut Mitigating Augmentation for Weakly Supervised Semantic
Self-Supervised Learning of Semantic Correspondence Using Web Videos
A Generative Multi-Resolution Pyramid and Normal-Conditioning 3D Cloth Draping
Empowering Unsupervised Domain Adaptation With Large-Scale Pre-Trained Vision-Language Models
Gradient-Guided Knowledge Distillation for Object Detectors
Fast Diffusion EM A Diffusion Model for Blind Inverse Problems
ENTED Enhanced Neural Texture Extraction and Distribution for Reference-Based Blin
Label-Free Synthetic Pretraining of Object Detectors
Adaptive Manifold for Imbalanced Transductive Few-Shot Learning
GLAD Global-Local View Alignment and Background Debiasing for Unsupervised Video
Hard Sample-Aware Consistency for Low-Resolution Facial Expression Recognition
HELA-VFA A Hellinger Distance-Attention-Based Feature Aggregation Network for Few-Shot Classification
Meta-Learned Kernel for Blind Super-Resolution Kernel Estimation
PIDiffu Pixel-Aligned Diffusion Model for High-Fidelity Clothed Human Reconstruction
PoseDiff Pose-Conditioned Multimodal Diffusion Model for Unbounded Scene Synthesis From
Pruning From Scratch via Shared Pruning Module and Nuclear Norm-Bas
RADIO Reference-Agnostic Dubbing Video Synthesis
Re-VoxelDet Rethinking Neck and Head Architectures for High-Performance Voxel-Based 3D
Real-Time User-Guided Adaptive Colorization With Vision Transformer
Semi-Supervised Scene Change Detection by Distillation From Feature-Metric Alignment
Sharp-NeRF Grid-Based Fast Deblurring Neural Radiance Fields Using Sharpness Prior
UGPNet Universal Generative Prior for Image Restoration
UNSPAT Uncertainty-Guided SpatioTemporal Transformer for 3D Human Pose and Sh
Self-Sampling Meta SAM Enhancing Few-Shot Medical Image Segmentation With Meta-Learning
Learning To Read Analog Gauges From Synthetic Data
Linking Convolutional Kernel Size to Generalization Bias in Face Analysis
Multi-View 3D Object Reconstruction and Uncertainty Modelling With Neural Sh
Progressive Hypothesis Transformer for 3D Human Mesh Recovery
CAMOT Camera Angle-Aware Multi-Object Tracking
MetaVers Meta-Learned Versatile Representations for Personalized Federated Learning
Common Diffusion Noise Schedules and Sample Steps Are Flawed
Ego2HandsPose A Dataset for Egocentric Two-Hand 3D Global Pose Estimation
FastSR-NeRF Improving NeRF Efficiency on Consumer Devices With a Simpl
MPT Mesh Pre-Training With Transformers for Human Pose and Mesh
Restoring Degraded Old Films With Recursive Recurrent Transformer Networks
Spiking Neural Networks for Active Time-Resolved SPAD Imaging
Annotation-Free Audio-Visual Segmentation
Bi-Directional Training for Composed Image Retrieval via Text Prompt Learning
BPKD Boundary Privileged Knowledge Distillation for Semantic Segmentation
Detecting Content Segments From Online Sports Streaming Events Challenges an
Dynamic Token-Pass Transformers for Semantic Segmentation
Efficient Feature Distillation for Zero-Shot Annotation Object Detection
FarSight A Physics-Driven Whole-Body Biometric System at Large Distance an
Generation of Upright Panoramic Image From Non-Upright Panoramic Image
Global Occlusion-Aware Transformer for Robust Stereo Matching
LatentDR Improving Model Generalization Through Sample-Aware Latent Degradation and Restoration
Let the Beat Follow You - Creating Interactive Drum Sounds
Rethinking Knowledge Distillation With Raw Features for Semantic Segmentation
Revisiting Token Pruning for Object Detection and Instance Segmentation
Tackling Data Bias in MUSIC-AVQA Crafting a Balanced Dataset fo
U3DS3 Unsupervised 3D Semantic Scene Segmentation
Wakening Past Concepts Without Past Data Class-Incremental Learning From Onlin
Bridging the Gap Between Multi-Focus and Multi-Modal A Focused Integration
Controlling Character Motions Without Observable Driving Source
Controlling Virtual Try-On Pipeline Through Rendering Policies
CPSeg Finer-Grained Image Semantic Segmentation via Chain-of-Thought Language Prompting
Disentangled Pre-Training for Image Matting
Efficient Layout-Guided Image Inpainting for Mobile Use
Enforcing Sparsity on Latent Space for Robust and Explainable Representations
Mitigate Domain Shift by Primary-Auxiliary Objectives Association for Generalizing Person
Neural Style Protection Counteracting Unauthorized Neural Style Trans
OTAS Unsupervised Boundary Detection for Object-Centric Temporal Action Segmentation
PromptAD Zero-Shot Anomaly Detection Using Text Prompts
Repetitive Action Counting With Motion Feature Learning
Robust Source-Free Domain Adaptation for Fundus Image Segmentation
SDNet An Extremely Efficient Portrait Matting Model via Self-Distillation
Steering Prototypes With Prompt-Tuning for Rehearsal-Free Continual Learning
Task-Oriented Human-Object Interactions Generation With Implicit Neural Representations
TCP Triplet Contrastive-Relationship Preserving for Class-Incremental Learning
Video Instance Matting
VMFormer End-to-End Video Matting With Transformer
A Neural Height-Map Approach for the Binocular Photometric Stereo Problem
Hierarchical Text Spotter for Joint Text Spotting and Layout Analysis
PlantPlotGAN A Physics-Informed Generative Adversarial Network for Plant Disease Prediction
Differentially Private Video Activity Recognition
Zero-Shot Video Moment Retrieval From Frozen Vision-Language Models
Deblur-NSFF Neural Scene Flow Fields for Blurry Dynamic Scenes
Hierarchical Diffusion Autoencoders and Disentangled Image Manipulation
SBCFormer Lightweight Network Capable of Full-Size ImageNet Classification at 1
SLoSH Set Locality Sensitive Hashing via Sliced-Wasserstein Embeddings
Towards Visual Saliency Explanations of Face Verification
Modality-Aware Representation Learning for Zero-Shot Sketch-Based Image Retrieval
Uncertainty-Weighted Loss Functions for Improved Adversarial Attacks on Semantic Segmentation
CL-MAE Curriculum-Learned Masked Autoencoders
Missing Modality Robustness in Semi-Supervised Multi-Modal Semantic Segmentation
SSVOD Semi-Supervised Video Object Detection With Sparse Annotations
OE-CTST Outlier-Embedded Cross Temporal Scale Transformer for Weakly-Supervised Video Anomaly
LIVENet A Novel Network for Real-World Low-Light Image Denoising an
Taming Normalizing Flows
One Style Is All You Need To Generate a Video
Mini but Mighty Finetuning ViTs With Mini Adapters
MonoProb Self-Supervised Monocular Depth Estimation With Interpretable Uncertainty
Universal Test-Time Adaptation Through Weight Ensembling Diversity Weighting and Prio
Training-Based Model Refinement and Representation Disagreement for Semi-Supervised Object Detection
Object Aware Contrastive Prior for Interactive Image Segmentation
Indoor Visual Localization Using Point and Line Correspondences in Dens
A Geometry Loss Combination for 3D Human Pose Estimation
Beyond SOT Tracking Multiple Generic Objects at Once
Learning Low-Rank Latent Spaces With Simple Deterministic Autoencoder Theoretical an
CVTHead One-Shot Controllable Head Avatar With Vertex-Feature Transformer
MACP Efficient Model Adaptation for Cooperative Perception
Joint 3D Shape and Motion Estimation From Rolling Shutter Light-Fiel
HalluciDet Hallucinating RGB Modality for Person Detection Through Privileged Information
Context-Based Interpretable Spatio-Temporal Graph Convolutional Network for Human Motion Forecasting
Stereo Conversion With Disparity-Aware Warping Compositing and Inpainting
MotionAGFormer Enhancing 3D Human Pose Estimation With a Transformer-GCNFormer Network
On the Fly Neural Style Smoothing for Risk-Averse Domain Generalization
HyperMix Out-of-Distribution Detection and Classification in Few-Shot Settings
Latent Feature-Guided Diffusion Models for Shadow Removal
Fixing Overconfidence in Dynamic Neural Networks
Increasing Biases Can Be More Efficient Than Increasing Weights
Hyperbolic vs Euclidean Embeddings in Few-Shot Learning Two Sides o
Wino Vidi Vici Conquering Numerical Instability of 8-Bit Winograd Convolution
Bag of Tricks for Fully Test-Time Adaptation
Diff2Lip Audio Conditioned Diffusion Models for Lip-Synchronization
Small Objects Matters in Weakly-Supervised Semantic Segmentation
Prompting Classes Exploring the Power of Prompt Class Learning in
Self-Supervised Learning for Place Representation Generalization Across Appearance Changes
Interactive Segmentation for Diverse Gesture Types Without Context
CAD - Contextual Multi-Modal Alignment for Dynamic AVQA
SEMA Semantic Attention for Capturing Long-Range Dependencies in Egocentric Lifelogs
PatchRefineNet Improving Binary Segmentation by Incorporating Signals From Optimal Patch-Wis
BigSmall Efficient Multi-Task Learning for Disparate Spatial and Temporal Physiological
Reverse Knowledge Distillation Training a Large Model Using a Small
TAMPAR Visual Tampering Detection for Parcel Logistics in Postal Supply
Implicit Neural Representation for Change Detection
Diverse Imagenet Models Transfer Better
MFT Long-Term Tracking of Every Pixel
ICF-SRSR Invertible Scale-Conditional Function for Self-Supervised Real-World Single Image Super-Resolution
Contrastive Viewpoint-Aware Shape Learning for Long-Term Person Re-Identification
Debiasing Calibrating and Improving Semi-Supervised Learning Performance via Simple Ensembl
Diffusion in the Dark A Diffusion Model for Low-Light Text
Domain Generalisation via Risk Distribution Matching
FocusTune Tuning Visual Localization Through Focus-Guided Sampling
Robust Learning via Conditional Prevalence Adjustment
SequenceMatch Revisiting the Design of Weak-Strong Augmentations for Semi-Supervised Learning
VideoFACT Detecting Video Forgeries Using Attention Scene Context and Forensic
MoP-CLIP A Mixture of Prompt-Tuned CLIP Models for Domain Incremental
Generalization by Adaptation Diffusion-Based Domain Extension for Domain-Generalized Semantic Segmentation
Triplet Attention Transformer for Spatiotemporal Predictive Learning
HashReID Dynamic Network With Binary Codes for Efficient Person Re-Identification
Effective Restoration of Source Knowledge in Continual Test Time Adaptation
3D-Aware Talking-Head Video Motion Transfer
Scene Text Image Super-Resolution Based on Text-Conditional Diffusion Models
Prototypical Contrastive Network for Imbalanced Aerial Image Segmentation
StyleGenes Discrete and Efficient Latent Distributions for GANs
Automated Monitoring of Ear Biting in Pigs by Tracking Individuals
Defending Object Detection Models Against Image Distortions
FRoG-MOT Fast and Robust Generic Multiple-Object Tracking by IoU an
DiffBody Diffusion-Based Pose and Shape Editing of Human Images
Unsupervised Domain Adaptation of MRI Skull-Stripping Trained on Adult Dat
Guided Cluster Aggregation A Hierarchical Approach to Generalized Category Discovery
MarsLS-Net Martian Landslides Segmentation Network and Benchmark Dataset
Domain Adaptive 3D Shape Retrieval From Monocular Images
Exploring the Impact of Rendering Method and Motion Quality on
Learning Visual Body-Shape-Aware Embeddings for Fashion Compatibility
Revisiting Pixel-Level Contrastive Pre-Training on Scene Images
Synthesizing Coherent Story With Auto-Regressive Latent Diffusion Models
Zero-Shot Building Attribute Extraction From Large-Scale Vision and Language Models
Can CLIP Help Sound Source Localization
Fully-Automatic Reflection Removal for 360-Degree Images
Grafting Vision Transformers
Hard-Label Based Small Query Black-Box Adversarial Attack
Layer-Wise Auto-Weighting for Non-Stationary Test-Time Adaptation
Localization and Manipulation of Immoral Visual Cues for Safe Text-to-Imag
Point-DynRF Point-Based Dynamic Radiance Fields From a Monocular Video
Shape-Guided Diffusion With Inside-Outside Attention
CrashCar101 Procedural Generation for Damage Assessment
Motion Matters Neural Motion Transfer for Better Camera Physiological Measurement
PHG-Net Persistent Homology Guided Medical Image Classification
CGAPoseNet+GCAN A Geometric Clifford Algebra Network for Geometry-Aware Camera Pos
StyleAvatar Stylizing Animatable Head Avatars
An Analysis of Initial Training Strategies for Exemplar-Free Class-Incremental Learning
Simple Token-Level Confidence Improves Caption Correctness
Embedding Task Structure for Action Detection
Frequency Attention for Knowledge Distillation
I-AI A Controllable Interpretable AI System for Decoding Radiologists
LP-OVOD Open-Vocabulary Object Detection by Linear Probing
MixtureGrowth Growing Neural Networks by Recombining Learned Parameters
NVAutoNet Fast and Accurate 360° 3D Visual Perception for Sel
Fast and Interpretable Face Identification for Out-of-Distribution Data Using Vision
ZEETAD Adapting Pretrained Vision-Language Model for Zero-Shot End-to-End Temporal Action
Multi-Level Attention Aggregation for Aesthetic Face Relighting
Beyond Classification Definition and Density-Based Estimation of Calibration in Object
Towards Accurate Disease Segmentation in Plant Images A Comprehensive Dataset
ConeQuest A Benchmark for Cone Segmentation on Mars
DISCO Distributed Inference With Sparse Communications
Revolutionize the Oceanic Drone RGB Imagery With Pioneering Sun Glint
Shape-Biased CNNs Are Not Always Superior in Out-of-Distribution Robustness
Design Choices for Enhancing Noisy Student Self-Training
Vision Transformer for Multispectral Satellite Imagery Advancing Landcover Classification
Online Class-Incremental Learning for Real-World Food Image Classification
ENIGMA-51 Towards a Fine-Grained Understanding of Human Behavior in Industrial
G-CASCADE Efficient Cascaded Graph Convolutional Decoding for 2D Medical Imag
MIST Medical Image Segmentation Transformer With Convolutional Attention Mixing CAM
Semi-Supervised Semantic Depth Estimation Using Symbiotic Transformer and NearFarMix Augmentation
Image Labels Are All You Need for Coarse Seagrass Segmentation
Towards Realistic Generative 3D Face Models
Fingervein Verification Using Convolutional Multi-Head Attention Network
Multispectral Imaging for Differential Face Morphing Attack Detection A Preliminary
Source-Guided Similarity Preservation for Online Person Re-Identification
Continual Atlas-Based Segmentation of Prostate MRI
Activity-Based Early Autism Diagnosis Using a Multi-Dataset Supervised Contrastive Learning
MaskConver Revisiting Pure Convolution Model for Panoptic Segmentation
Attention-Guided Prototype Mixing Diversifying Minority Context on Imbalanced Whole Sli
GC-VTON Predicting Globally Consistent and Occlusion Aware Local Flows With
MOPA Modular Object Navigation With PointGoal Agents
Towards Domain-Aware Knowledge Distillation for Continual Model Generalization
Differentiable JPEG The Devil Is in the Details
Content-Aware Image Color Editing With Auxiliary Color Restoration Tasks
MuSHRoom Multi-Sensor Hybrid Room Dataset for Joint 3D Reconstruction an
Segment Anything From Space
VEATIC Video-Based Emotion and Affect Tracking in Context Dataset
Salient Object Detection for Images Taken by People With Vision
MotionGPT Human Motion Synthesis With Improved Diversity and Realism vi
Recognition of Unseen Bird Species by Learning From Field Guides
Effects of Markers in Training Datasets on the Accuracy o
Time To Shine Fine-Tuning Object Detection Models With Synthetic Advers
FuseCap Leveraging Large Language Models for Enriched Fused Image Captions
ClipSitu Effectively Leveraging CLIP for Conditional Predictions in Situation Recognition
Efficient Expansion and Gradient Based Task Inference for Replay F
Interaction Region Visual Transformer for Egocentric Action Anticipation
Describe Images in a Boring Way Towards Cross-Modal Sarcasm Generation
TriCoLo Trimodal Contrastive Loss for Text To Shape Retrieval
On the Importance of Large Objects in CNN Based Object
Rank2Tell A Multimodal Driving Dataset for Joint Importance Ranking an
Leveraging Task-Specific Pre-Training To Reason Across Images and Videos
Auto-BPA An Enhanced Ball-Pivoting Algorithm With Adaptive Radius Using Contextual
MAdVerse A Hierarchical Dataset of Multi-Lingual Ads From Diverse Sources
Enhancing Multimodal Compositional Reasoning of Visual Language Models With Generativ
POP-VQA - Privacy Preserving On-Device Personalized Visual Question Answering
SICKLE A Multi-Sensor Satellite Imagery Dataset Annotated With Multiple Key
On Manipulating Scene Text in the Wild With Diffusion Models
Aligning Non-Causal Factors for Transformer-Based Source-Free Domain Adaptation
A Visual Active Search Framework for Geospatial Exploration
Benchmark Generation Framework With Customizable Distortions for Image Classifier Robustness
Open-Set Object Detection by Aligning Known Class Representations
Collage Diffusion
BirdSAT Cross-View Contrastive Masked Autoencoders for Bird Species Classification an
Edge Inference With Fully Differentiable Quantized Mixed Precision Neural Networks
Detection Defenses An Empty Promise Against Adversarial Patch Attacks on
IndustReal A Dataset for Procedure Step Recognition Handling Execution Errors
Identifying Label Errors in Object Detection Datasets by Loss Inspection
OOD Aware Supervised Contrastive Learning
REALM Robust Entropy Adaptive Loss Minimization for Improved Single-Sample Test-Tim
Ordinal Classification With Distance Regularization for Robust Brain Age Prediction
IDD-AW A Benchmark for Safe and Robust Segmentation of Driv
RIMeshGNN A Rotation-Invariant Graph Neural Network for Mesh Classification
Improved Topological Preservation in 3D Axon Segmentation and Centerline Detection
Analyzing the Domain Shift Immunity of Deep Homography Estimation
Assist Is Just As Important as the Goal Image Resurfacing
Favoring One Among Equals - Not a Good Idea Many-to-On
CXR-IRGen An Integrated Vision and Language Model for the Generation
DiffCLIP Leveraging Stable Diffusion for Language Grounded 3D Classification
Med-DANet V2 A Flexible Dynamic Architecture for Efficient Medical Volumetric
Multitask Vision-Language Prompt Tuning
Towards Diverse and Consistent Typography Generation
Video-kMaX A Simple Unified Approach for Online and Near-Online Video
Egocentric Action Recognition by Capturing Hand-Object Contact and Object State
Benchmarking Out-of-Distribution Detection in Visual Question Answering
Conditional Velocity Score Estimation for Image Restoration
Few-Shot Shape Recognition by Learning Deep Shape-Aware Features
Training-Free Object Counting With Prompts
Have We Ever Encountered This Before Retrieving Out-of-Distribution Road Obstacles
Asymmetric Image Retrieval With Cross Model Compatible Ensembles
FPGAN-Control A Controllable Fingerprint Generator for Training With Synthetic Data
ArcGeo Localizing Limited Field-of-View Images Using Cross-View Matching
Opinion Unaware Image Quality Assessment via Adversarial Convolutional Variational Autoenco
Vikriti-ID A Novel Approach for Real Looking Fingerprint Data-Set Generation
Deep Plug-and-Play Nighttime Non-Blind Deblurring With Saturated Pixel Handling Schemes
Joint Depth Prediction and Semantic Segmentation With Multi-View SAM
Lightweight Thermal Super-Resolution and Object Detection for Robust Perception in
PAIR Perception Aided Image Restoration for Natural Driving Conditions
Brainomaly Unsupervised Neurologic Disease Detection Utilizing Unannotated T1-Weighted Brain MR
Uncertainty Estimation in Instance Segmentation With Star-Convex Shapes
LipAT Beyond Style Transfer for Controllable Neural Simulation of Lipstick
Discriminator-Free Unsupervised Domain Adaptation for Multi-Label Image Classification
Learning Robust Deep Visual Representations From EEG Brain Recordings
SynthProv Interpretable Framework for Profiling Identity Leakag
Data-Centric Debugging Mitigating Model Failures via Targeted Image Retrieval
Hardware Aware Evolutionary Neural Architecture Search Using Representation Similarity Metric
Deep Image Fingerprint Towards Low Budget Synthetic Image Detection an
Gradient Coreset for Federated Learning
Computer Vision on the Edge Individual Cattle Identification in Real-Tim
MSCC Multi-Scale Transformers for Camera Calibration
Overcoming Catastrophic Forgetting for Multi-Label Class-Incremental Learning
StyleGAN-Fusion Diffusion Guided Domain Adaptation of Image Generators
SyntheWorld A Large-Scale Synthetic Dataset for Land Cover Mapping an
Single-Image Deblurring Trajectory and Shape Recovery of Fast Moving Objects
Visual Narratives Large-Scale Hierarchical Classification of Art-Historical Images
pSTarC Pseudo Source Guided Target Clustering for Fully Test-Time Adaptation
Learning Generalizable Perceptual Representations for Data-Efficient No-Reference Image Quality Assessment
OmniVec Learning Robust Representations With Cross Modal Sharing
Holistic Representation Learning for Multitask Trajectory Anomaly Detection
Training Ensembles With Inliers and Outliers for Semi-Supervised Active Learning
Diffused Heads Diffusion Models Beat GANs on Talking-Face Generation
A Closer Look at Robustness of Vision Transformers to Backdoo
Diffuse and Restore A Region-Adaptive Diffusion Model for Identity-Preserving Blin
LaughTalk Expressive 3D Talking Head Generation With Laughter
Defense Against Adversarial Cloud Attack on Remote Sensing Salient Object
Improved Techniques for Quantizing Deep Networks With Adaptive Bit-Widths
NeRFEditor Differentiable Style Decomposition for 3D Scene Editing
Rethinking Visibility in Human Pose Estimation Occluded Pose Reasoning vi
RSMPNet Relationship Guided Semantic Map Prediction
Towards Better Structured Pruning Saliency by Reorganizing Convolution
FastCLIPstyler Optimisation-Free Text-Based Image Style Transfer Using Style Representations
GRIT GAN Residuals for Paired Image-to-Image Translation
Face Identity-Aware Disentanglement in StyleGAN
Adapt Your Teacher Improving Knowledge Distillation for Exemplar-Free Continual Learning
Few-Shot Event Classification in Images Using Knowledge Graphs for Prompting
Diffusion Models Meet Image Counter-Forensics
Active Transfer Learning for Efficient Video-Specific Human Pose Estimation
Appearance-Based Curriculum for Semi-Supervised Learning With Multi-Angle Unlabeled Data
Kaizen Practical Self-Supervised Continual Learning With Continual Fine-Tuning
Semantic-Aware Video Representation for Few-Shot Action Recognition
Discovering and Mitigating Biases in CLIP-Based Image Editing
Weakly-Supervised Deepfake Localization in Diffusion-Generated Images
Cross-Domain Few-Shot Incremental Learning for Point-Cloud Recognition
SciOL and MuLMS-Img Introducing a Large-Scale Multimodal Scientific Dataset an
PrivObfNet A Weakly Supervised Semantic Segmentation Model for Data Protection
RGB-D Mapping and Tracking in a Plenoxel Radiance Field
Complementary-Contradictory Feature Regularization Against Multimodal Overfitting
360BEV Panoramic Semantic Mapping for Indoor Birds-Eye View
Learning To Compose SuperWeights for Neural Parameter Allocation Search
Lets Observe Them Over Time An Improved Pedestrian Attribute Recognition
GraphGraph A Nested Graph-Based Framework for Early Accident Anticipation
Leveraging Next-Active Objects for Context-Aware Anticipation in Egocentric Videos
C-CLIP Contrastive Image-Text Encoders To Close the Descriptive-Commentative G
Object Re-Identification From Point Clouds
Using Early Readouts To Mediate Featural Bias in Distillation
Permutation-Aware Activity Segmentation via Unsupervised Frame-To-Segment Alignment
PointCT Point Central Transformer Network for Weakly-Supervised Point Cloud Semantic
3D Super-Resolution Model for Vehicle Flow Field Enrichment
Query-Guided Attention in Vision Transformers for Localizing Objects Using
Arbitrary-Resolution and Arbitrary-Scale Face Super-Resolution With Implicit Representation Networks
2D Feature Distillation for Weakly- and Semi-Supervised 3D Semantic Segmentation
Occlusion Sensitivity Analysis With Augmentation Subspace Perturbation in Deep Featu
Controllable Image Synthesis of Industrial Data Using Stable Diffusion
Beyond Document Page Classification Design Datasets and Challenges
MobileNVC Real-Time 1080p Neural Video Compression on a Mobile Device
GC-MVSNet Multi-View Multi-Scale Geometrically-Consistent Multi-View Stereo
Evaluation of Video Masked Autoencoders Performance and Uncertainty Estimations fo
Causal Feature Alignment Learning To Ignore Spurious Background Features
Can You Even Tell Left From Right Presenting a New
CoD Coherent Detection of Entities From Images With Multiple Modalities
GraphFill Deep Image Inpainting Using Graphs
Meta-Learned Attribute Self-Interaction Network for Continual and Generalized Zero-Shot Learning
Attention Modules Improve Image-Level Anomaly Detection for Industrial Inspection
TEGLO High Fidelity Canonical Texture Mapping From Single-View Images
Toward Planet-Wide Traffic Camera Calibration
Fine-Grained Alignment for Cross-Modal Recipe Retrieval
Improving Open-Set Semi-Supervised Learning With Self-Supervision
3D Human Pose Estimation With Two-Step Mixed-Training Strategy
Continual Test-Time Domain Adaptation via Dynamic Sample Selection
Customizing 360-Degree Panoramas Through Text-to-Image Diffusion Models
Distortion-Disentangled Contrastive Learning
Efficient Transferability Assessment for Selection of Pre-Trained Detectors
FreMIM Fourier Transform Meets Masked Image Modeling for Medical Imag
GazeGNN A Gaze-Guided Graph Neural Network for Chest X-Ray Classification
Hyb-NeRF A Multiresolution Hybrid Encoding for Neural Radiance Fields
Improving the Effectiveness of Deep Generative Dat
Learning Quality Labels for Robust Image Classification
Maximum Knowledge Orthogonality Reconstruction With Gradients in Federated Learning
Multimodality-Guided Image Style Transfer Using Cross-Modal GAN Inversion
Neural Textured Deformable Meshes for Robust Analysis-by-Synthesis
Painterly Image Harmonization via Adversarial Residual Learning
RS2G Data-Driven Scene-Graph Extraction and Embedding for Robust Autonomous Perception
Self-Annotated 3D Geometric Learning for Smeared Points Removal
Sparse Convolutional Networks for Surface Reconstruction From Noisy Point Clouds
TSP-Transformer Task-Specific Prompts Boosted Transformer for Holistic Scene Understanding
VCISR Blind Single Image Super-Resolution With Video Compression Synthetic Data
Density-Based Flow Mask Integration via Deformable Convolution for Video Peopl
Exploiting CLIP for Zero-Shot HOI Detection Requires Knowledge Distillation at
Interpretable Object Recognition by Semantic Prototype Analysis
Constrained Probabilistic Mask Learning for Task-Specific Undersampled MRI Reconstruction
Approximating Intersections and Differences Between Linear Statistical Shape Models Using
FATE Feature-Agnostic Transformer-Based Encoder for Learning Generalized Embedding Spaces in
HAMMER Learning Entropy Maps To Create Accurate 3D Models in
Best of Both Worlds Learning Arbitrary-Scale Blind Super-Resolution via Dual
From Denoising Training To Test-Time Adaptation Enhancing Domain Generalization fo
Second-Order Graph ODEs for Multi-Agent Trajectory Forecasting
The Growing Strawberries Dataset Tracking Multiple Objects With Biological Development
Gradual Source Domain Expansion for Unsupervised Domain Adaptation
Camera-Independent Single Image Depth Estimation From Defocus Blur
Link Prediction for Flow-Driven Spatial Networks
ECSIC Epipolar Cross Attention for Stereo Image Compression
Sketch-Based Video Object Localization
A Robust Diffusion Modeling Framework for Radar Camera 3D Object
Correlation-Aware Active Learning for Surgery Video Segmentation
HD-Fusion Detailed Text-to-3D Generation Leveraging Multiple Noise Estimation
Learning Better Keypoints for Multi-Object 6DoF Pose Estimation
Masked Collaborative Contrast for Weakly Supervised Semantic Segmentation
MIVC Multiple Instance Visual Component for Visual-Language Models
RPCANet Deep Unfolding RPCA Based Infrared Small Target Detection
CLIP-DIY CLIP Dense Inference Yields Open-Vocabulary Semantic Segmentation For-Free
MITFAS Mutual Information Based Temporal Feature Alignment and Sampling fo
PMI Sampler Patch Similarity Guided Frame Selection for Aerial Action
TSA2 Temporal Segment Adaptation and Aggregation for Video Harmonization
DREAM Visual Decoding From Reversing Human Visual System
Beyond Fusion Modality Hallucination-Based Multispectral Fusion for Pedestrian Detection
SAM Fewshot Finetuning for Anatomical Segmentation in Medical Images
Sign Language Production With Latent Motion Transformer
Glance To Count Learning To Rank With Anchors for Weakly-Supervis
HDMNet A Hierarchical Matching Network With Double Attention for Large-Scal
DPPMask Masked Image Modeling With Determinantal Point Processes
GIPCOL Graph-Injected Soft Prompting for Compositional Zero-Shot Learning
GTP-ViT Efficient Vision Transformers via Graph-Based Token Propagation
Personalized Face Inpainting With Diffusion Models by Parallel Visual Attention
Rethink Cross-Modal Fusion in Weakly-Supervised Audio-Visual Video Parsing
Self-Supervised Edge Detection Reconstruction for Topology-Informed 3D Axon Segmentation an
Self-Supervised Relation Alignment for Scene Graph Generation
SpectralCLIP Preventing Artifacts in Text-Guided Style Transfer From a Spectral
Active Learning for Single-Stage Object Detection in UAV Images
Convolutional Masked Image Modeling for Dense Prediction Tasks on Pathology
Foundation Model Assisted Weakly Supervised Semantic Segmentation
Improving Vision-and-Language Reasoning via Spatial Relations Modeling
Latent-Guided Exemplar-Based Image Re-Colorization
MGM-AE Self-Supervised Learning on 3D Shape Using Mesh Graph Mask
PolyMaX General Dense Prediction With Mask Transformer
Robust Category-Level 3D Pose Estimation From Diffusion-Enhanced Synthetic Data
SCoRD Subject-Conditional Relation Detection With Text-Augmented Data
SimpliMix A Simplified Manifold Mixup for Few-Shot Point Cloud Classification
AFTer-SAM Adapting SAM With Axial Fusion Transformer for Medical Imaging
Universal Semi-Supervised Model Adaptation via Collaborative Consistency Training
Group-Wise Contrastive Bottleneck for Weakly-Supervised Visual Representation Learning
3SD Self-Supervised Saliency Detection With No Labels
Self-Supervised Denoising Transformer With Gaussian Process
Minimizing Layerwise Activation Norm Improves Generalization in Federated Learning
PathLDM Text Conditioned Latent Diffusion Model for Histopathology
Concurrent Band Selection and Traversability Estimation From Long-Wave Hyperspectral Imagery
FIRe Fast Inverse Rendering Using Directional and Signed Distance Functions
Label Shift Estimation for Class-Imbalance Problem A Bayesian Approach
LAVSS Location-Guided Audio-Visual Spatial Audio Separation
Unsupervised Exemplar-Based Image-to-Image Translation and Cascaded Vision Transformers for Tagg
FG-Net Facial Action Unit Detection With Generalizable Pyramidal Features
Augment the Pairs Semantics-Preserving Image-Caption Pair Augmentation for Grounding-Based Vision
Optical Flow Domain Adaptation via Target Style Transfer
Real-Time Polyp Detection in Colonoscopy Using Lightweight Transformer
Cross-Attention Between Satellite and Ground Views for Enhanced Fine-Grained Robot
FAKD Feature Augmented Knowledge Distillation for Semantic Segmentation
Rethinking Multimodal Content Moderation From an Asymmetric Angle With Mixed-Modality
StreamMapNet Streaming Mapping Network for Vectorized Online HD Map Construction
Understanding Hyperbolic Metric Learning Through Hard Negative Sampling
Denoising and Selecting Pseudo-Heatmaps for Semi-Supervised Human Pose Estimation
DocReal Robust Document Dewarping of Real-Life Images via Attention-Enhanced Control
Evolve Enhancing Unsupervised Continual Learning With Multiple Experts
When 3D Bounding-Box Meets SAM Point Cloud Instance Segmentation With
Refine and Redistribute Multi-Domain Fusion and Dynamic Label Assignment fo
Cheating Depth Enhancing 3D Surface Anomaly Detection via Depth Simulation
Alleviating Foreground Sparsity for Semi-Supervised Monocular 3D Object Detection
Can Vision-Language Models Be a Good Guesser Exploring VLMs fo
Contextual Affinity Distillation for Image Anomaly Detection
D3GU Multi-Target Active Domain Adaptation via Enhancing Domain Alignment
DECDM Document Enhancement Using Cycle-Consistent Diffusion Models
Domain Generalization With Correlated Style Uncertainty
DR2 Disentangled Recurrent Representation Learning for Data-Efficient Speech Video Synthesis
Generated Distributions Are All You Need for Membership Inference Attacks
Handformer2T A Lightweight Regression-Based Model for Interacting Hands Pose Estimation
Improving the Fairness of the Min-Max Game in GANs Training
Improving the Leaking of Augmentations in Data-Efficient GANs via Adaptiv
Incorporating Physics Principles for Precise Human Motion Prediction
Instruct Me More Random Prompting for Visual In-Context Learning
Movie Genre Classification by Language Augmentation and Shot Sampling
Multimodal Channel-Mixing Channel and Spatial Masked AutoEncoder on Facial Action
Object-Centric Video Representation for Long-Term Action Anticipation
On the Quantification of Image Reconstruction Uncertainty Without Training Data
Open-NeRF Towards Open Vocabulary NeRF Decomposition
Patch-Based Selection and Refinement for Early Object Detection
PGVT Pose-Guided Video Transformer for Fine-Grained Action Recognition
PMVC Promoting Multi-View Consistency for 3D Scene Reconstruction
Preserving Image Properties Through Initializations in Diffusion Models
Semantic Transfer From Head to Tail Enlarging Tail Margin fo
Sequential Transformer for End-to-End Video Text Detection
Text-to-Image Editing by Image Information Removal
WalkFormer Point Cloud Completion via Guided Walks
BALF Simple and Efficient Blur Aware Local Feature Detector
Deep Subdomain Alignment for Cross-Domain Image Classification
Leveraging the Power of Data Augmentation for Transformer-Based Tracking
Polarimetric PatchMatch Multi-View Stereo
SemST Semantically Consistent Multi-Scale Image Translation via Structure-Texture Alignment
THInImg Cross-Modal Steganography for Presenting Talking Heads in Images
Unsupervised Domain Adaptation for Semantic Segmentation With Pseudo Label Self-Refinement
CAILA Concept-Aware Intra-Layer Adapters for Compositional Zero-Shot Learning
TPSeNCE Towards Artifact-Free Realistic Rain Generation for Deraining and Object
Lightweight Portrait Matting via Regional Attention and Refinement
4K-Resolution Photo Exposure Correction at 125 FPS With 8K Parameters
FELGA Unsupervised Fragment Embedding for Fine-Grained Cross-Modal Association
CATS Combined Activation and Temporal Suppression for Efficient Network Inference
Consistent Multimodal Generation via a Unified GAN Framework
ShARc Shape and Appearance Recognition for Person Identification In-the-Wild
SSP Semi-Signed Prioritized Neural Fitting for Surface Reconstruction From Unorient
Unsupervised Graphic Layout Grouping With Transformers
Multimodal Deep Learning for Remote Stress Estimation Using CCT-LSTM
Learning To Recognize Occluded and Small Objects With Partial Inputs
