I'm a research scientist in the Learning and Perception Research (LPR) team at NVIDIA Research. I completed my Ph.D. in the Computer Vision Lab at UMass Amherst, advised by Prof. Erik Learned-Miller. I obtained my master's degree in Computer Science from Brown University and my bachelor's degree in Intelligent Science and Technology from Peking University.
I work in the areas of computer vision, graphics, and machine learning, and in particular, I am interested in bringing together the strengths of 2D and 3D visual information for learning richer and more flexible representations. Our work on 3D shape recognition won first place in the SHREC '16 Large-Scale 3D Shape Retrieval Contest, and I am a recipient of a CVPR Best Paper Honorable Mention Award for our work on point cloud processing.
PAC is a content-adaptive operation that generalizes standard convolution and bilateral filters.
Hang Su, Varun Jampani, Deqing Sun, Orazio Gallo, Erik Learned-Miller, and Jan Kautz, "Pixel-Adaptive Convolutional Neural Networks", CVPR 2019.
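The idea behind PAC can be illustrated with a toy 1-D sketch (my own simplification, not the released code): each tap of a spatially invariant filter is re-weighted by a kernel on per-pixel guidance features. A Gaussian is used here for illustration; the function name and parameters are likewise just for this sketch.

```python
import numpy as np

def pixel_adaptive_conv(x, f, w, sigma=1.0):
    # Toy 1-D pixel-adaptive convolution (illustrative sketch only).
    # x: (N,) input signal; f: (N,) per-pixel guidance features;
    # w: (K,) spatially invariant filter weights, with K odd.
    r = len(w) // 2
    out = np.zeros(len(x), dtype=float)
    for i in range(len(x)):
        acc = 0.0
        for k in range(-r, r + 1):
            j = i + k
            if 0 <= j < len(x):
                # Content-adaptive modulation: a Gaussian on the guidance
                # features (chosen here for illustration).
                adapt = np.exp(-0.5 * ((f[i] - f[j]) / sigma) ** 2)
                acc += adapt * w[k + r] * x[j]
        out[i] = acc
    return out
```

With constant guidance features the modulation is 1 everywhere and this reduces to an ordinary convolution; with uniform filter weights it behaves like a (range-only) bilateral filter, which is the sense in which PAC generalizes both.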
Making intelligent decisions about unseen objects given only partial observations is a fundamental component of visual common sense. In this work, we formalize prediction tasks critical to visual common sense and introduce the Half&Half benchmarks to measure an agent's ability to perform these tasks.
Ashish Singh*, Hang Su*, SouYoung Jin, Huaizu Jiang, Chetan Manjesh, Geng Luo, Ziwei He, Li Hong, Erik G. Learned-Miller, and Rosemary Cowell, "Half&Half: New Tasks and Benchmarks for Studying Visual Common Sense", CVPR 2019 Workshop on Vision Meets Cognition (to appear).
A fast, end-to-end trainable neural network that operates directly on point clouds and also supports joint 2D-3D processing.
Awarded "Best Paper Honorable Mention" at CVPR'18!
NVAIL Pioneering Research Award
project page video pdf arXiv code
Hang Su, Varun Jampani, Deqing Sun, Subhransu Maji, Evangelos Kalogerakis, Ming-Hsuan Yang, and Jan Kautz, "SPLATNet: Sparse Lattice Networks for Point Cloud Processing", CVPR 2018 (oral).
An end-to-end system for detecting and clustering faces by identity in full-length movies.
SouYoung Jin, Hang Su, Chris Stauffer, and Erik Learned-Miller, "End-to-end face detection and cast grouping in movies using Erdős–Rényi clustering", ICCV 2017 (spotlight).
A novel CNN architecture that combines information from multiple views of a 3D shape into a single, compact shape descriptor, offering state-of-the-art performance in a range of recognition tasks.
Ranked #1 in a SHREC'16 contest!
project page video pdf arXiv code
Hang Su, Subhransu Maji, Evangelos Kalogerakis, and Erik Learned-Miller, "Multi-view Convolutional Neural Networks for 3D Shape Recognition", ICCV 2015.
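The core aggregation step, view pooling, can be sketched in a few lines (a simplification for illustration; in the actual architecture this pooling sits between two halves of a CNN rather than on final features):

```python
import numpy as np

def view_pool(view_features):
    """Element-wise max over per-view CNN features.

    view_features: (V, D) array, one D-dimensional feature row per
    rendered view of the shape. Taking the max across views yields a
    single (D,) descriptor that is invariant to view order and to the
    number of views.
    """
    return np.asarray(view_features).max(axis=0)
```

For example, pooling features from two views, `[[1, 5], [3, 2]]`, keeps the strongest response per dimension, giving `[3, 5]`.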
M. Savva, et al., "SHREC’16 Track: Large-Scale 3D Shape Retrieval from ShapeNet Core55", Eurographics Workshop on 3D Object Retrieval, 2016.
The first large-scale scene attribute database.
G. Patterson, C. Xu, H. Su, J. Hays, "The SUN Attribute Database: Beyond Categories for Deeper Scene Understanding", IJCV, May 2014.
Learning and reasoning about visual occlusions (e.g. on faces) using a deep graphical model. Co-advised by Professor Vangelis Kalogerakis and Professor Erik Learned-Miller.
We created an extension of the LFW Part Labels dataset, providing 7 part labels for 2,927 portrait photos.
data (lfw-parts-v2)
In this project, I implemented a human face and body detection system in C++ based on the paper "Face detection, pose estimation and landmark localization in the wild" (X. Zhu and D. Ramanan, CVPR 2012). The implementation achieves 0.95 recall and 0.90 precision on eHarmony's user profile photos.
The goal of this project is to automatically distinguish high quality professional photos from low quality snapshots.
We focus on assessing the quality of photos that contain faces (e.g. user profile photos). We propose several image features particularly useful for this task, such as skin smoothness, composition, and bokeh. Experiments show that, with small modifications, they are also useful for assessing other types of photos.
Onboard vehicle detection plays a key role in collision prevention and autonomous driving. Camera-based detection techniques have proven effective and economical, with broad prospects for application.
This project focuses on front vehicle detection using onboard cameras. Hypothesis generation based on shadows and hypothesis verification based on HOG features are combined to achieve a real-time system. We also introduce and integrate a passing vehicle detection component using optical flow, as well as road surface segmentation.
With almost 100 beautifully modeled 3D buildings on the Peking University campus, our team won the top prize in the 2008 Google International Model Your Campus Competition.