

2 Technology Park Dr.
Westford, MA 01886



Hang Su

I'm a research scientist in the Learning and Perception Research (LPR) team at NVIDIA Research. I completed my Ph.D. study in the Computer Vision Lab at UMass Amherst, advised by Prof. Erik Learned-Miller. I obtained my master's degree in Computer Science from Brown University and my bachelor's degree in Intelligent Science and Technology from Peking University.

I work in the areas of computer vision, graphics, and machine learning, and in particular, I am interested in bringing together the strengths of 2D and 3D visual information for learning richer and more flexible representations. Our work on 3D shape recognition won first place in the SHREC '16 Large-Scale 3D Shape Retrieval Contest, and I am a recipient of a CVPR Best Paper Honorable Mention Award for our work on point cloud processing.

Recent & Upcoming

I joined NVIDIA as a research scientist.
Talk at CVPR tutorial on Learning Representations via Graph-structured Networks.
Our paper on Half&Half Benchmarks accepted to CVPR 2019 Workshop on Vision Meets Cognition.
Our paper on PACNet accepted to CVPR 2019.


Pixel-Adaptive Convolution


PAC is a content-adaptive operation that generalizes standard convolution and bilateral filters.
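In simplified form, PAC modulates a fixed spatial kernel at every output pixel by an adapting kernel computed on guidance features. A minimal single-channel NumPy sketch, assuming a Gaussian adapting kernel (the paper also supports other kernel forms and learned guidance features):

```python
import numpy as np

def pixel_adaptive_conv(x, f, w, sigma=1.0):
    """Single-channel pixel-adaptive convolution (PAC) sketch.

    The fixed spatial kernel w is modulated, at every output pixel, by a
    Gaussian adapting kernel on guidance features f (a simplifying
    assumption; other kernel forms are possible).

    x: (H, W) input image
    f: (H, W) guidance features (e.g. color or learned features)
    w: (k, k) spatial convolution weights, k odd
    """
    k = w.shape[0]
    r = k // 2
    H, W = x.shape
    xp = np.pad(x, r)  # zero-pad input
    fp = np.pad(f, r)  # zero-pad guidance
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + k, j:j + k]
            fpatch = fp[i:i + k, j:j + k]
            # adapting kernel: Gaussian on guidance-feature differences
            K = np.exp(-0.5 * ((fpatch - f[i, j]) / sigma) ** 2)
            out[i, j] = np.sum(K * w * patch)
    return out
```

With constant guidance, K is identically 1 and this reduces to standard convolution; with constant w, it behaves like a bilateral filter — the two special cases PAC generalizes.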

project page video arXiv code

Hang Su, Varun Jampani, Deqing Sun, Orazio Gallo, Erik Learned-Miller, and Jan Kautz, "Pixel-Adaptive Convolutional Neural Networks", CVPR 2019.

Half&Half Benchmarks


Making intelligent decisions about unseen objects given only partial observations is a fundamental component of visual common sense. In this work, we formalize prediction tasks critical to visual common sense and introduce the Half&Half benchmarks to measure an agent's ability to perform these tasks.


Ashish Singh*, Hang Su*, SouYoung Jin, Huaizu Jiang, Chetan Manjesh, Geng Luo, Ziwei He, Li Hong, Erik G. Learned-Miller, and Rosemary Cowell, "Half&Half: New Tasks and Benchmarks for Studying Visual Common Sense", CVPR 2019 Workshop on Vision Meets Cognition (to appear).

SPLATNet: Sparse Lattice Networks for Point Cloud Processing


A fast and end-to-end trainable neural network that directly works on point clouds and can also do joint 2D-3D processing.
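At its core, the network "splats" point features onto a sparse lattice, convolves there, and "slices" the results back to the points. A toy NumPy sketch of the splat/slice idea, using nearest-cell rounding in place of SPLATNet's barycentric splatting onto a sparse permutohedral lattice:

```python
import numpy as np

def splat_to_grid(points, feats, cell=1.0):
    """Scatter ("splat") per-point features onto lattice cells by
    rounding coordinates, accumulating features in each occupied cell.
    A simplified stand-in: SPLATNet splats with barycentric weights
    onto a sparse permutohedral lattice.

    points: (N, d) coordinates; feats: (N, c) per-point features.
    Returns a dict mapping occupied cell indices to accumulated features.
    """
    keys = np.floor(points / cell).astype(int)
    cells = {}
    for k, f in zip(map(tuple, keys), feats):
        acc = cells.setdefault(k, np.zeros(feats.shape[1]))
        acc += f
    return cells

def slice_from_grid(cells, points, c, cell=1.0):
    """Gather ("slice") accumulated cell features back to each point."""
    keys = np.floor(points / cell).astype(int)
    return np.stack([cells.get(tuple(k), np.zeros(c)) for k in keys])
```

Convolutions on the sparse lattice between the splat and slice steps (omitted here) give the network its efficiency on unordered point sets.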

Awarded "Best Paper Honorable Mention" at CVPR'18!

NVAIL Pioneering Research Award

project page video pdf arXiv code

Hang Su, Varun Jampani, Deqing Sun, Subhransu Maji, Evangelos Kalogerakis, Ming-Hsuan Yang, and Jan Kautz, "SPLATNet: Sparse Lattice Networks for Point Cloud Processing", CVPR 2018 (oral).

End-to-end Face Detection and Cast Grouping in Movies Using Erdős–Rényi Clustering


An end-to-end system for detecting and clustering faces by identity in full-length movies.

project page pdf arXiv code

SouYoung Jin, Hang Su, Chris Stauffer, and Erik Learned-Miller, "End-to-end face detection and cast grouping in movies using Erdős–Rényi clustering", ICCV 2017 (spotlight).

Multi-view CNN (MVCNN) for 3D Shape Recognition

MVCNN architecture

A novel CNN architecture that combines information from multiple views of a 3D shape into a single and compact shape descriptor offering state-of-the-art performance in a range of recognition tasks.
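The core idea is simple to state: a shared CNN processes each rendered view, and an element-wise max across views ("view pooling") merges the per-view features into one descriptor. A minimal sketch, with feature_fn standing in for the first part of the network:

```python
import numpy as np

def mvcnn_descriptor(views, feature_fn):
    """Sketch of the MVCNN pipeline: a shared CNN (here an arbitrary
    feature_fn placeholder) maps each rendered view to a feature vector,
    and element-wise max over views ("view pooling") merges them into a
    single compact shape descriptor.

    views: (V, H, W) array of rendered images of one 3D shape.
    feature_fn: maps one (H, W) view to a (D,) feature vector.
    """
    feats = np.stack([feature_fn(v) for v in views])  # (V, D)
    return feats.max(axis=0)  # view pooling: max across views
```

Because the max is permutation-invariant, the descriptor does not depend on the order in which views are rendered.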

Ranked #1 in a SHREC'16 contest!

project page video pdf arXiv code

Hang Su, Subhransu Maji, Evangelos Kalogerakis, and Erik Learned-Miller, "Multi-view Convolutional Neural Networks for 3D Shape Recognition", ICCV 2015.

M. Savva et al., "SHREC'16 Track: Large-Scale 3D Shape Retrieval from ShapeNet Core55", Eurographics Workshop on 3D Object Retrieval, 2016.

The SUN Attribute Database: Beyond Categories for Deeper Scene Understanding

Scene Attributes

The first large-scale scene attribute database.


G. Patterson, C. Xu, H. Su, and J. Hays, "The SUN Attribute Database: Beyond Categories for Deeper Scene Understanding", IJCV, May 2014.

Earlier Projects

Layered Global-Local (GLOC) Model for Image Parts Labelling with Occlusion 2014

Global-Local Occlusion Model

Learning and reasoning about visual occlusions (e.g. on faces) using a deep graphical model. Co-advised by Professor Vangelis Kalogerakis and Professor Erik Learned-Miller.

We created an extension to the LFW Part Labels dataset, providing 7 part labels for 2,927 portrait photos.

data (lfw-parts-v2)

Face & Pose Detection Using Deformable Part-based Model 2012 Summer (@eHarmony)

Face Detection

In this project, I implemented in C++ a human face and body detection system based on the paper "Face detection, pose estimation and landmark localization in the wild" (X. Zhu and D. Ramanan, CVPR 2012). The implementation achieves 0.95 recall and 0.90 precision on eHarmony’s user profile photos.


Photo Quality Assessment on User Profile Photos 2012

Photo Quality Assessment

The goal of this project is to automatically distinguish high quality professional photos from low quality snapshots.

We focus on assessing the quality of photos that contain faces (e.g., user profile photos). We propose several image features particularly useful for this task, e.g., skin smoothness, composition, and bokeh. Experiments show that, with small modifications, they are also useful for assessing other types of photos.

Front Vehicle Detection Using Onboard Camera 2010-2011

Vehicle Detection & Road Segmentation

Onboard vehicle detection plays a key role in collision prevention and autonomous driving. Camera-based detection techniques have proven effective and economical, and have broad application prospects.

This project focuses on front vehicle detection using onboard cameras. Hypothesis generation based on shadows and hypothesis verification based on HOG features are combined to achieve a real-time system. We also introduce and integrate a passing vehicle detection component using optical flow, as well as road surface segmentation.

3D Modelling of Peking University Campus 2008

One of the 3D models

With almost 100 beautifully modeled 3D buildings on the Peking University campus, our team won the top prize in the 2008 Google International Model Your Campus Competition.

project page
