
Contact

Office
2 Technology Park Dr.
NVIDIA
Westford, MA 01886
Email

Links

Meet Eclipse

Cat tickers

Hang Su

I'm a research scientist in the Learning and Perception Research (LPR) team at NVIDIA Research. I completed my Ph.D. in the Computer Vision Lab at UMass Amherst. Before that, I obtained my master's degree from Brown University and my bachelor's degree from Peking University.

I work in the areas of computer vision, graphics, and deep learning. In particular, I am interested in bringing together the strengths of 2D, 3D, and 4D visual perception, and in equipping AI agents with visual information from all modalities. Our work on 3D shape recognition won first place in the SHREC'16 Large-Scale 3D Shape Retrieval Contest, our work on point cloud processing received a CVPR Best Paper Honorable Mention, and our work on monocular scene flow estimation was a CVPR Best Paper Award Candidate.

Research Projects

L4P: Low-level 4D Vision Perception Unified

L4P

A unified framework for low-level 4D vision perception tasks, bridging 2D, 3D, and temporal understanding for autonomous navigation and robotics applications.

project page arXiv video code

Abhishek Badki*, Hang Su*, Bowen Wen, and Orazio Gallo, "L4P: Low-level 4D Vision Perception Unified", arXiv 2025.

Zero-shot Monocular Scene Flow Estimation in the Wild

Zero-shot Scene Flow

A novel approach for zero-shot monocular scene flow estimation that works in challenging real-world scenarios without requiring training on specific datasets.

🏆 CVPR 2025 Oral, Best Paper Award Candidate

project page arXiv video code

Yiqing Liang, Abhishek Badki*, Hang Su*, James Tompkin, and Orazio Gallo, "Zero-shot Monocular Scene Flow Estimation in the Wild", CVPR 2025.
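
For readers unfamiliar with the task: monocular scene flow is the per-pixel 3D motion of the scene as seen from a single camera. The toy snippet below only illustrates the output representation, composing scene flow from depth maps and optical flow with known intrinsics; it is not the paper's method, which predicts scene flow zero-shot.

    import numpy as np

    def backproject(px, depth, fx, fy, cx, cy):
        # Pixel coordinates plus depth -> 3D points in the camera frame.
        x = (px[..., 0] - cx) / fx * depth
        y = (px[..., 1] - cy) / fy * depth
        return np.stack([x, y, depth], axis=-1)

    def scene_flow(px, depth_t, depth_t1_warped, flow, intrinsics):
        # depth_t1_warped: depth at frame t+1, sampled at the flow-displaced pixels.
        p_t = backproject(px, depth_t, *intrinsics)
        p_t1 = backproject(px + flow, depth_t1_warped, *intrinsics)
        return p_t1 - p_t          # 3D displacement of each pixel from t to t+1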

HybridDepth: Robust Depth Fusion

HybridDepth

A depth estimation approach that fuses depth from focus, computed from camera focal stacks, with single-image priors to produce robust metric depth.

code pdf arXiv

Ashkan Ganj, Hang Su, and Tian Guo, "HybridDepth: Robust Metric Depth Fusion by Leveraging Depth from Focus and Single-Image Priors", WACV 2025.

Ashkan Ganj, Hang Su, and Tian Guo, "Toward Robust Depth Fusion for Mobile AR With Depth from Focus and Single-Image Priors", ISMAR 2024.
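
One ingredient of this kind of fusion, shown here as a hypothetical helper rather than the papers' learned pipeline, is aligning a relative single-image depth map to metric depth-from-focus estimates with a confidence-weighted scale and shift:

    import numpy as np

    def align_relative_to_metric(rel_depth, dff_depth, conf):
        # Weighted least-squares scale s and shift t such that
        # s * rel_depth + t best matches the metric depth-from-focus map.
        w = np.sqrt(conf).ravel()
        x, y = rel_depth.ravel(), dff_depth.ravel()
        A = np.stack([x, np.ones_like(x)], axis=1) * w[:, None]
        s, t = np.linalg.lstsq(A, y * w, rcond=None)[0]
        return s * rel_depth + t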

Surround View Synthesis for In-Car Parking Visualization

Surround View Synthesis

Comprehensive surround-view visualizations to assist with in-car parking and navigation scenarios.

video

Abhishek Badki*, Hang Su*, Jan Kautz, and Orazio Gallo, "Surround View Synthesis for In-Car Parking Visualization", NVIDIA GTC 2024.

BlobGEN-3D: Compositional 3D-Consistent Freeview Image Generation

BlobGEN-3D

A novel framework for compositional 3D-consistent freeview image generation using 3D blobs, enabling high-quality novel view synthesis with improved geometric consistency.

project page paper

Chao Liu, Weili Nie, Sifei Liu, Abhishek Badki, Hang Su, Morteza Mardani, Benjamin Eckart, and Arash Vahdat, "BlobGEN-3D: Compositional 3D-Consistent Freeview Image Generation with 3D Blobs", SIGGRAPH Asia 2024.

FoVA-Depth: Field-of-View Agnostic Depth Estimation

FoVA-Depth

Field-of-view agnostic depth estimation method that enables robust cross-dataset generalization, addressing the challenge of varying camera parameters at inference time and across different datasets.

🏆 3DV 2024 Oral

project page arXiv code

Daniel Lichy, Hang Su, Abhishek Badki, Jan Kautz, and Orazio Gallo, "FoVA-Depth: Field-of-View Agnostic Depth Estimation for Cross-Dataset Generalization", 3DV 2024.

nvTorchCam: Camera-Agnostic Differentiable Geometric Vision

nvTorchCam

An open-source PyTorch library providing camera-agnostic differentiable geometric vision operations, enabling seamless integration of various camera models in deep learning pipelines.

project page arXiv code

Daniel Lichy, Hang Su, Abhishek Badki, Jan Kautz, and Orazio Gallo, "nvTorchCam: An Open-source Library for Camera-Agnostic Differentiable Geometric Vision", arXiv 2024.
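
The gist of "camera-agnostic" geometric code, illustrated with purely hypothetical helpers rather than the library's actual API: geometric operations are written against interchangeable unprojection/projection callables, so the same pipeline runs with, for example, a pinhole or an equidistant fisheye model.

    import torch

    def pinhole_unproject(uv, depth, fx, fy, cx, cy):
        # Pixel (u, v) at planar depth z -> 3D point in the camera frame.
        x = (uv[..., 0] - cx) / fx
        y = (uv[..., 1] - cy) / fy
        return torch.stack([x, y, torch.ones_like(x)], dim=-1) * depth.unsqueeze(-1)

    def fisheye_equidistant_unproject(uv, dist, f, cx, cy):
        # Equidistant fisheye (r = f * theta): pixel plus ray distance -> 3D point.
        du, dv = uv[..., 0] - cx, uv[..., 1] - cy
        r = torch.sqrt(du * du + dv * dv).clamp_min(1e-8)
        theta = r / f
        dirs = torch.stack([torch.sin(theta) * du / r,
                            torch.sin(theta) * dv / r,
                            torch.cos(theta)], dim=-1)
        return dirs * dist.unsqueeze(-1)

Downstream code (e.g., plane-sweep cost volumes) that accepts such callables never needs to branch on the camera model.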

Mobile AR Depth Estimation: Challenges & Prospects

Mobile AR Depth

A study examining the challenges and future prospects of depth estimation for mobile augmented reality applications, covering both technical and practical considerations.

arXiv

Ashkan Ganj, Yiqin Zhao, Hang Su, and Tian Guo, "Mobile AR Depth Estimation: Challenges & Prospects", International Workshop on Mobile Computing Systems and Applications (HotMobile) 2024.

Pixel-Adaptive Convolution

PAC

PAC is a content-adaptive operation that generalizes standard convolution and bilateral filters.

project page video arXiv code

Hang Su, Varun Jampani, Deqing Sun, Orazio Gallo, Erik Learned-Miller, and Jan Kautz, "Pixel-Adaptive Convolutional Neural Networks", CVPR 2019.
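
For the curious, a minimal and unoptimized sketch of the operation, using a fixed Gaussian adapting kernel on the guidance features (the released code supports learned kernels and is far more efficient):

    import torch
    import torch.nn.functional as F

    def pac2d(x, guide, weight):
        # x:      (B, C_in, H, W) input features v
        # guide:  (B, C_g, H, W)  guidance features f
        # weight: (C_out, C_in, k, k) spatially shared filter W
        B, C, H, W = x.shape
        k = weight.shape[-1]
        x_un = F.unfold(x, k, padding=k // 2).view(B, C, k * k, H, W)
        g_un = F.unfold(guide, k, padding=k // 2).view(B, guide.shape[1], k * k, H, W)
        # Adapting kernel K(f_i, f_j): a Gaussian on guidance-feature differences.
        K = torch.exp(-0.5 * ((g_un - guide.unsqueeze(2)) ** 2).sum(1))  # (B, k*k, H, W)
        # v'_i = sum_j K(f_i, f_j) * W[p_i - p_j] * v_j
        return torch.einsum('bckhw,bkhw,ock->bohw', x_un, K, weight.view(-1, C, k * k))

With K fixed to all ones this reduces to standard convolution; with W fixed to an averaging filter it becomes an (unnormalized) bilateral filter.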

Half&Half Benchmarks

Half&Half

Making intelligent decisions about unseen objects given only partial observations is a fundamental component of visual common sense. In this work, we formalize prediction tasks critical to visual common sense and introduce the Half&Half benchmarks to measure an agent's ability to perform these tasks.

pdf

Ashish Singh*, Hang Su*, SouYoung Jin, Huaizu Jiang, Chetan Manjesh, Geng Luo, Ziwei He, Li Hong, Erik G. Learned-Miller, and Rosemary Cowell, "Half&Half: New Tasks and Benchmarks for Studying Visual Common Sense", CVPR 2019 Workshop on Vision Meets Cognition.

SPLATNet: Sparse Lattice Networks for Point Cloud Processing

SPLATNET

A fast, end-to-end trainable neural network that operates directly on point clouds and also supports joint 2D-3D processing.

🏆 CVPR 2018 Oral, Best Paper Award Honorable Mention

🏆 NVAIL Pioneering Research Award

project page video pdf arXiv code

Hang Su, Varun Jampani, Deqing Sun, Subhransu Maji, Evangelos Kalogerakis, Ming-Hsuan Yang, and Jan Kautz, "SPLATNet: Sparse Lattice Networks for Point Cloud Processing", CVPR 2018.
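
A heavily simplified sketch of the splat-process-slice pattern behind the paper's bilateral convolution layers, here on a regular voxel grid rather than the permutohedral lattice, with a per-cell feature transform standing in for the true lattice-neighborhood convolution:

    import torch

    def splat_process_slice(points, feats, transform, cell=0.1):
        # points: (N, 3) coordinates; feats: (N, C) point features.
        idx = torch.floor(points / cell).long()                     # lattice coordinates
        cells, inv = torch.unique(idx, dim=0, return_inverse=True)  # occupied cells only
        acc = torch.zeros(len(cells), feats.shape[1])
        acc.index_add_(0, inv, feats)        # splat: scatter-add point features onto cells
        acc = transform(acc)                 # process on the sparse lattice
        return acc[inv]                      # slice: gather features back to the points

    out = splat_process_slice(torch.rand(1024, 3), torch.rand(1024, 16),
                              torch.nn.Linear(16, 16))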

End-to-end Face Detection and Cast Grouping in Movies Using Erdős–Rényi Clustering

erdos-renyi

An end-to-end system for detecting and clustering faces by identity in full-length movies.

🏆 ICCV 2017 Spotlight

project page pdf arXiv code

SouYoung Jin, Hang Su, Chris Stauffer, and Erik Learned-Miller, "End-to-end face detection and cast grouping in movies using Erdős–Rényi clustering", ICCV 2017.
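
The verification procedure is the heart of the paper; as a crude stand-in for the clustering step only, one can threshold pairwise similarities between face descriptors and take connected components as identities (hypothetical helper, not the paper's Erdős–Rényi formulation):

    import numpy as np
    from scipy.sparse import csr_matrix
    from scipy.sparse.csgraph import connected_components

    def cluster_faces(embeddings, thresh=0.7):
        # embeddings: (N, D) L2-normalized face descriptors.
        sim = embeddings @ embeddings.T                  # cosine similarities
        adj = csr_matrix(sim > thresh)                   # edge iff similarity above threshold
        _, labels = connected_components(adj, directed=False)
        return labels                                    # cluster id (identity) per face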

Multi-view CNN (MVCNN) for 3D Shape Recognition

MVCNN architecture

A novel CNN architecture that combines information from multiple views of a 3D shape into a single, compact shape descriptor, offering state-of-the-art performance on a range of recognition tasks.

🏆 Ranked #1 in the SHREC'16 Large-Scale 3D Shape Retrieval Contest

project page video pdf arXiv code

Hang Su, Subhransu Maji, Evangelos Kalogerakis, and Erik Learned-Miller, "Multi-view Convolutional Neural Networks for 3D Shape Recognition", ICCV 2015.

M. Savva et al., "SHREC'16 Track: Large-Scale 3D Shape Retrieval from ShapeNet Core55", Eurographics Workshop on 3D Object Retrieval, 2016.
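
The central idea, view pooling, fits in a few lines: render V views of the shape, run a shared 2D CNN on each, and fuse with an element-wise max before the classifier. The sketch below uses a ResNet-18 backbone purely for brevity; the original work used VGG-style networks.

    import torch
    import torch.nn as nn
    from torchvision import models

    class MVCNNSketch(nn.Module):
        def __init__(self, num_classes=40):                 # e.g., ModelNet40
            super().__init__()
            backbone = models.resnet18(weights=None)
            self.features = nn.Sequential(*list(backbone.children())[:-1])
            self.classifier = nn.Linear(512, num_classes)

        def forward(self, views):                            # views: (B, V, 3, H, W)
            B, V = views.shape[:2]
            f = self.features(views.flatten(0, 1)).flatten(1)   # shared CNN on every view
            f = f.view(B, V, -1).max(dim=1).values              # view pooling (element-wise max)
            return self.classifier(f)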

The SUN Attribute Database: Beyond Categories for Deeper Scene Understanding

Scene Attributes

The first large-scale scene attribute database.

project page pdf

Genevieve Patterson, Chen Xu, Hang Su, and James Hays, "The SUN Attribute Database: Beyond Categories for Deeper Scene Understanding", IJCV, May 2014.

Earlier Projects

Layered Global-Local (GLOC) Model for Image Part Labeling with Occlusion 2014

Global-Local Occlusion Model

Learning and reasoning about visual occlusions (e.g., on faces) using a deep graphical model. Co-advised by Professor Vangelis Kalogerakis and Professor Erik Learned-Miller.

We created an extension of the LFW Part Labels dataset that provides 7 part labels for 2,927 portrait photos.

data (lfw-parts-v2)

Face & Pose Detection Using a Deformable Part-based Model 2012 Summer @eHarmony

Face Detection

In this project, I implemented a human face and body detection system in C++ based on the paper "Face detection, pose estimation and landmark localization in the wild" (X. Zhu and D. Ramanan, CVPR 2012). The implementation achieves 0.95 recall and 0.90 precision on eHarmony's user profile photos.

code

Photo Quality Assessment on User Profile Photos 2012

Photo Quality Assessment

The goal of this project is to automatically distinguish high-quality professional photos from low-quality snapshots.

We focus on assessing the quality of photos that contain faces (e.g., user profile photos) and propose several image features particularly useful for this task, such as skin smoothness, composition, and bokeh. Experiments show that, with small modifications, these features are also useful for assessing other types of photos.
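
As an illustration of one such cue (a hypothetical simplification, not the project's actual feature), a bokeh score can be approximated by comparing Laplacian-variance sharpness inside the face region against the background:

    import cv2
    import numpy as np

    def bokeh_score(img_bgr, face_rect):
        # Large values suggest a sharp subject on a blurred background.
        gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
        lap = cv2.Laplacian(gray, cv2.CV_64F)
        x, y, w, h = face_rect
        mask = np.zeros(gray.shape, dtype=bool)
        mask[y:y + h, x:x + w] = True
        return lap[mask].var() / (lap[~mask].var() + 1e-6)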

Front Vehicle Detection Using Onboard Camera 2010-2011

Vehicle Detection & Road Segmentation

Onboard vehicle detection plays a key role in collision prevention and autonomous driving. Camera-based detection techniques have proven effective and economical and have broad application prospects.

This project focuses on front vehicle detection using onboard cameras. Shadow-based hypothesis generation is combined with HOG-based hypothesis verification to achieve a real-time system. We also integrate a passing-vehicle detection component based on optical flow, as well as road surface segmentation.
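
A toy sketch of such a pipeline (thresholds are illustrative, and svm_predict stands in for a classifier trained on HOG features of vehicle rears):

    import cv2

    def detect_front_vehicles(frame, svm_predict, shadow_thresh=60, min_width=30):
        # Hypothesis generation: dark under-vehicle shadow blobs in the lower image half.
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        h = gray.shape[0]
        _, shadow = cv2.threshold(gray[h // 2:, :], shadow_thresh, 255, cv2.THRESH_BINARY_INV)
        contours, _ = cv2.findContours(shadow, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        hog = cv2.HOGDescriptor((64, 64), (16, 16), (8, 8), (8, 8), 9)
        detections = []
        for c in contours:
            x, y, w, _ = cv2.boundingRect(c)
            top = h // 2 + y - w                     # square candidate box above the shadow
            if w < min_width or top < 0:
                continue
            # Hypothesis verification: HOG features plus classifier on the candidate box.
            patch = cv2.resize(gray[top:h // 2 + y, x:x + w], (64, 64))
            if svm_predict(hog.compute(patch)):
                detections.append((x, top, w, w))
        return detections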

3D Modelling of Peking University Campus 2008

One of the 3D models

With almost 100 beautifully modeled 3D buildings on the Peking University campus, our team won the top prize in the 2008 Google International Model Your Campus Competition.

project page
