A new paper from the Max Planck Institute and MIT proposes a technique to achieve true disentanglement of Neural Radiance Fields (NeRF) content from the illumination that was present when the data was captured, allowing ad hoc environment maps to completely relight a NeRF scene:
The new approach uses the popular open-source 3D animation program Blender to create a “virtual light stage”, where many iterations of possible lighting scenarios are rendered and ultimately trained into a special layer of the NeRF model that can accommodate any environment map the user wants to use to illuminate the scene.
The approach was tested against the Mitsuba 2 inverse rendering framework, and also against the prior works PhySG, RNR, Neural-PIL and NeRFactor (using only a direct illumination model), and obtained the best scores:
The paper states:
“Our qualitative and quantitative results demonstrate a clear advance in terms of scene parameter recovery as well as the synthesis quality of our approach under new views and lighting conditions compared to the previous state of the art.”
The researchers say they will eventually release the code for the project.
The Need for NeRF Editability
This type of disentanglement has proven to be a notable challenge for Neural Radiance Fields researchers, since NeRF is essentially a photogrammetry technique that calculates the pixel value along thousands of possible paths from a viewpoint, assigning color and density values and assembling a matrix of those values into a volumetric representation. At its core, NeRF is defined by lighting.
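To see why lighting is so deeply entangled, consider the standard NeRF-style volume rendering step, shown here as a minimal NumPy sketch (illustrative, not the paper's code): the per-sample radiance values already contain the scene's illumination at capture time, so every composited pixel inherits it.

```python
import numpy as np

def composite_ray(rgb, sigma, deltas):
    """Classic NeRF-style volume rendering for a single ray.

    rgb:    (N, 3) radiance predicted at each of N samples along the ray;
            this radiance already has the capture-time lighting baked in
    sigma:  (N,)   volume density at each sample
    deltas: (N,)   spacing between consecutive samples
    """
    alpha = 1.0 - np.exp(-sigma * deltas)                    # per-sample opacity
    # Transmittance: how much light survives to reach each sample
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(axis=0)              # final pixel color
```

Because the network only ever outputs this pre-lit radiance, there is no native handle for changing the illumination afterwards.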
In fact, despite its impressive visuals and enthusiastic adoption by NVIDIA, NeRF is uniquely “rigid” – in CGI terms, “baked”. Therefore, the research community has focused on improving its tractability and versatility in this regard over the past 12-18 months.
In terms of significance, the stakes for this type of milestone are high, and include the possibility of transforming the visual effects industry from a creative and collaborative model centered on mesh generation, motion dynamics and texturing, to a model built around inverse rendering, where the VFX pipeline is fed by real-world photos of real things (or even, possibly, synthesized models), rather than estimated, hand-crafted approximations.
For now, there is relatively little cause for concern among the visual effects community, at least from Neural Radiance Fields. NeRF has only nascent capabilities in terms of rigging, nesting, depth control, articulation… and certainly lighting as well. The accompanying video for another new paper, which offers rudimentary deformations of NeRF geometry, illustrates the huge chasm between the current state of the art in CGI and pioneering efforts in neural rendering techniques.
Sifting the Elements
Nevertheless, since it is necessary to start somewhere, the researchers of the new paper have adopted CGI as an intermediate control and production mechanism – now a common approach to handling the rigid latent spaces of GANs and the almost impenetrable, linear networks of NeRF.
Indeed, the central challenge is to convert Global Illumination (GI, which has no direct applicability in neural rendering) into an equivalent Precomputed Radiance Transfer (PRT, which can be adapted for neural rendering) representation.
GI is a now-venerable CGI rendering technique that models how light bounces off surfaces onto other surfaces, and incorporates those areas of reflected light into a render, for added realism.
PRT is used as an intermediate lighting function in the new approach, and the fact that it is a discrete and modifiable component is what enables the disentanglement. The new method models the material of the NeRF object with a learned PRT.
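The appeal of PRT for this task is easiest to see in miniature. In its classic form, a point's outgoing radiance is just a dot product between precomputed (here, learned) transfer coefficients and the environment lighting expressed in the same basis – a sketch under assumed, illustrative names, not the paper's implementation:

```python
import numpy as np

def prt_relight(transfer, env_sh):
    """Minimal Precomputed Radiance Transfer sketch (illustrative names).

    transfer: (P, K) learned transfer coefficients for P surface points,
              encoding how each point responds to K lighting basis terms
    env_sh:   (K,)   an environment map projected onto the same K-term
              (e.g. spherical-harmonic) lighting basis
    """
    # Outgoing radiance is a simple dot product, so swapping env_sh
    # relights every point without touching the learned transfer.
    return transfer @ env_sh
```

Because the lighting enters only through `env_sh`, any new environment map can be substituted after training – exactly the modifiability that makes disentanglement possible.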
The actual scene lighting from the original data is recovered as an environment map in the process, and the scene geometry itself is extracted as a signed distance field (SDF), which will eventually provide a traditional mesh that Blender can operate on in the virtual light stage.
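An SDF simply stores, at every point, the distance to the nearest surface, signed negative inside the object – a toy example with an analytic sphere (illustrative only; the paper recovers its SDF from images):

```python
import numpy as np

def sphere_sdf(pts, radius=0.5):
    """Signed distance to a sphere: negative inside, positive outside."""
    return np.linalg.norm(pts, axis=-1) - radius

# Sample the SDF on a coarse grid; the zero level set is the surface,
# which a mesher such as marching cubes would triangulate for Blender.
xs = np.linspace(-1.0, 1.0, 32)
grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1)
dist = sphere_sdf(grid)
surface_cells = np.abs(dist) < (2.0 / 31.0)  # cells straddling the zero crossing
```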
The first step in the process is to extract the scene geometry from the available multi-view images by implicit surface reconstruction, via techniques used in the 2021 NeuS research collaboration.
In order to develop a Neural Radiance Transfer Field (NRTF, which will host the illumination data), the researchers used the Mitsuba 2 differentiable path tracer.
This facilitates the joint optimization of a Bidirectional Scattering Distribution Function (BSDF), as well as the generation of an initial environment map. Once the BSDF is created, the path tracer can be used in Blender (see embedded video directly above) to create one-light-at-a-time (OLAT) renders of the virtual scene.
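OLAT capture is worth the effort because light transport is linear: a render under any mix of lights equals the sum of the individual OLAT renders weighted by each light's intensity, so the virtual light stage effectively spans the space of possible lighting conditions. A toy NumPy illustration (random arrays standing in for real renders):

```python
import numpy as np

# Light transport is linear, so a render under any lighting mix equals the
# sum of one-light-at-a-time (OLAT) renders scaled by light intensities.
rng = np.random.default_rng(0)
olat = rng.random((4, 8, 8, 3))               # 4 lights, tiny 8x8 RGB renders
intensities = np.array([1.0, 0.5, 2.0, 0.0])  # dim, boost, or switch off lights
relit = (intensities[:, None, None, None] * olat).sum(axis=0)
```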
The NRTF is then trained with a combined loss between the photoreal material effects and the synthetic data, which remain disentangled from each other.
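Schematically, such a combined objective pairs a photometric term on the real captures with a weighted term on the synthetic OLAT renders. The sketch below is hypothetical – the function name, mean-squared-error terms and weighting are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def joint_loss(pred_real, real_imgs, pred_olat, olat_imgs, w_synth=0.1):
    """Hypothetical combined objective: a photometric term on real captures
    plus a weighted term on synthetic OLAT renders (illustrative only)."""
    l_real = np.mean((pred_real - real_imgs) ** 2)    # real-photo fidelity
    l_synth = np.mean((pred_olat - olat_imgs) ** 2)   # synthetic supervision
    return l_real + w_synth * l_synth
```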
The path to enlightenment
The training requirements for this technique, while significantly lower than the original NeRF training times, are not negligible. On an NVIDIA Quadro RTX 8000 with 48GB of VRAM, preliminary training for initial light and texture estimation takes 30 minutes; the OLAT training (i.e. training on the virtual light stage captures) takes eight hours; and the final joint optimization between the disentangled synthetic and real data takes another 16 hours to reach optimal quality.
Moreover, the resulting neural representation cannot operate in real time, taking, according to the researchers, “several seconds per image.”
The researchers conclude:
“Our results demonstrate a marked improvement over the current state of the art, while future work may involve further improvement in runtime and joint reasoning of the geometry, material and illumination of the scene.”
First published July 28, 2022.