Sean Fanello, Christoph Rhemann, Jonathan Taylor, Sofien Bouaziz, Adarsh Kowdle, Rohit Pandey, Sergio Orts-Escolano, Paul Debevec, Shahram Izadi



3D capture and rendering of humans has progressed remarkably in the last few years, reaching a level of quality that is approaching that of Image-Based Rendering (IBR) approaches. These systems usually rely on multiview capture setups and sophisticated pipelines to build a consistent, parameterized mesh of the performer with reflectance properties. The final goal of these approaches is to render high-quality, photo-realistic humans that can match the quality of Hollywood productions, without any manual intervention or post-processing.

However, despite steady progress and encouraging results, these 3D capture systems still face important challenges and limitations. For example, translucent and transparent objects cannot be easily captured, and reconstructing thin structures (e.g. hair) is still very challenging even with high-resolution depth sensors. At the same time, the computer vision community has turned its attention to deep learning techniques to overcome the limitations of geometric approaches.

In this tutorial, we will show how to combine geometric pipelines with recent advances in neural rendering to construct disentangled 3D representations for photo-realistic renderings of humans from novel viewpoints and under desired lighting conditions. We will walk the audience through the current state of the art in 3D performance capture, highlighting the pros and cons of the various techniques.

In particular, in the first part of the tutorial we will focus on the capture system, which is the foundation for any machine learning method that relies on supervised ground-truth data. We will consider the hardware design choices for cameras, sensors, lighting, and depth estimation algorithms. We will then describe all the steps needed to select and design the right depth sensing technology for a given application.

In the second part, we will detail state-of-the-art methods to reconstruct humans with high fidelity. We will focus on topics such as 3D reconstruction, parametric and non-parametric tracking, mesh parameterization and compression. We will also detail traditional methods to compute reflectance and material properties of arbitrary objects.

In the third part of the tutorial, we will show how deep learning can be applied to overcome the limitations of the traditional capture and rendering pipelines. We will detail recent trends in disentangled representations for human capture, with particular emphasis on pose, viewpoint and lighting. Finally, we will discuss multiple applications enabled by performance capture systems and machine learning.

When and Where

June 14th - 9:10 am PDT - Virtual, hosted on the official CVPR Website.
Talks are pre-recorded and attendees can watch them asynchronously.
Organizers will periodically check the comments to answer questions.


Morning Session:
3D Capture Systems for Groundtruth Generation
9:10 - 9:30    Sean Fanello
9:30 - 10:00   Adarsh Kowdle
10:00 - 10:30  Jay Busch, Matt Whalen
10:30 - 10:45  Coffee Break
10:45 - 11:30  Sean Fanello
11:30 - 12:10  Alex Ma
12:10 - 13:30  Lunch Break

Afternoon Session:
Deep Learning meets Light Stage
13:30 - 14:00  Paul Debevec
14:00 - 14:20  Yinda Zhang
14:20 - 14:40  Anastasia Tkach
14:40 - 15:00  Sergio Orts-Escolano
15:00 - 15:20  Chloe LeGendre
15:20 - 15:40  Danhang Tang
15:40 - 16:00  Coffee Break
16:00 - 16:20  Abhimitra Meka
16:20 - 17:00  Rohit Pandey

Please contact Shahram Izadi or Sean Fanello if you have any questions.