NeRF: Neural Radiance Fields network (Mildenhall et al., 2020).
Maps 3D coordinates (and optionally viewing directions) to color and density values using Fourier positional encoding followed by an MLP with skip connections. This is the core network architecture used in Neural Radiance Fields for novel view synthesis.
Architecture
Coordinates [batch, 3]
      |
+-----v---------------------+
| Fourier Encoding          |  gamma(p) = [p, sin(2^0*pi*p), cos(2^0*pi*p), ...]
+---------------------------+
      |
      v
[batch, 3 * (2*L + 1)]
      |
+-----v---------------------+
| Dense Layer 1             |  ReLU
| Dense Layer 2             |  ReLU
| ...                       |
| Dense Layer K (skip_layer)|  Concatenate encoded input, ReLU
| ...                       |
| Dense Layer N             |  ReLU
+---------------------------+
      |
      +---> Density sigma [batch, 1]
      |
      +---> Feature -> concat(directions_encoded) -> Dense -> RGB [batch, 3]
      |
      v
Output [batch, 4] (RGB + density)
Differences from Vision Models
Unlike other vision models in Edifice, NeRF does not take image inputs. Instead, it takes raw 3D coordinates and, optionally, viewing directions, and maps each point to a color and a density value.
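To make the coordinate input concrete, the Fourier encoding step, gamma(p) with L frequency bands, can be sketched in plain Elixir for a single point. This is only an illustration and not Edifice's implementation, which builds an Axon model and presumably works on batched tensors. The module name `FourierEncodingSketch`, its function names, and the ordering of sine and cosine terms within each band are assumptions made for this example.

```elixir
defmodule FourierEncodingSketch do
  # Hypothetical helper module, not part of Edifice.

  # Encodes one point (a list of floats) as
  #   [p, sin(2^0*pi*p), cos(2^0*pi*p), ..., sin(2^(L-1)*pi*p), cos(2^(L-1)*pi*p)]
  # Assumed layout: the raw values first, then for each band the sines of
  # all coordinates followed by the cosines of all coordinates.
  def encode(point, num_frequencies)
      when is_list(point) and is_integer(num_frequencies) and num_frequencies >= 0 do
    bands =
      for k <- 0..(num_frequencies - 1)//1,
          fun <- [&:math.sin/1, &:math.cos/1],
          p <- point do
        fun.(:math.pow(2, k) * :math.pi() * p)
      end

    point ++ bands
  end

  # Width of the encoded vector: input_dim * (2 * L + 1).
  def encoded_dim(input_dim, num_frequencies) do
    input_dim * (2 * num_frequencies + 1)
  end
end

encoded = FourierEncodingSketch.encode([0.5, -0.25, 1.0], 10)
IO.inspect(length(encoded))
# prints 63, i.e. 3 * (2 * 10 + 1)
```

With the default of 10 frequency bands, each 3D coordinate therefore expands to 63 values before it reaches the first dense layer.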
Usage
# With viewing directions
model = NeRF.build(
  hidden_size: 256,
  num_layers: 8,
  skip_layer: 4,
  num_frequencies: 10,
  use_viewdir: true
)
# Without viewing directions
model = NeRF.build(use_viewdir: false)
References
- Mildenhall et al., "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis" (ECCV 2020)
- https://arxiv.org/abs/2003.08934
Summary
Types
@type build_opt() ::
  {:coord_dim, pos_integer()}
  | {:dir_dim, pos_integer()}
  | {:hidden_size, pos_integer()}
  | {:num_frequencies, pos_integer()}
  | {:num_layers, pos_integer()}
  | {:skip_layer, pos_integer()}
  | {:use_viewdir, boolean()}
Options for build/1.
Functions
Build a NeRF network.
Options
- :coord_dim - Coordinate input dimension (default: 3)
- :dir_dim - Viewing direction dimension (default: 3)
- :hidden_size - Hidden layer size (default: 256)
- :num_layers - Number of MLP layers (default: 8)
- :skip_layer - Layer index for skip connection (default: 4)
- :num_frequencies - Number of Fourier frequency bands (default: 10)
- :use_viewdir - Whether to use viewing direction input (default: true)
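The way the documented defaults combine with caller-supplied options can be sketched with `Keyword.validate!/2` from the Elixir standard library (available since Elixir 1.13). This is a minimal sketch, not how `build/1` is actually implemented. The module `NeRFOptionsSketch` and its `with_defaults/1` function are hypothetical. Only the option names and default values come from the list above.

```elixir
defmodule NeRFOptionsSketch do
  # Hypothetical helper module, not part of Edifice.
  # Defaults copied from the documented option list.
  @defaults [
    coord_dim: 3,
    dir_dim: 3,
    hidden_size: 256,
    num_layers: 8,
    skip_layer: 4,
    num_frequencies: 10,
    use_viewdir: true
  ]

  # Fills in defaults for missing keys and raises ArgumentError
  # when an unknown option is passed.
  def with_defaults(opts \\ []) do
    Keyword.validate!(opts, @defaults)
  end
end

opts = NeRFOptionsSketch.with_defaults(use_viewdir: false)
IO.inspect(Keyword.fetch!(opts, :use_viewdir))
IO.inspect(Keyword.fetch!(opts, :hidden_size))
```

In this sketch a misspelled key such as `hidden: 128` raises instead of being silently ignored. Whether the real `build/1` behaves the same way is not stated in this documentation.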
Returns
An Axon model that takes "coordinates" (and optionally "directions") inputs
and outputs [batch, 4] (RGB + density).
@spec output_size(keyword()) :: pos_integer()
Get the output size of a NeRF model.
Always returns 4 (RGB + density).